When a problem turns into cool code

The goal

Generate some random data for an application.  The value we’re focusing on here, is a short “about me” paragraph.

What we have

  • Around 40 pre-populated rows.
  • Free reign to do as we want.

First solution

Create a collection of unique words from existing records by splitting the existing about me strings on spaces, removing special characters and building a unique list of words using an associative array.

This gets run once when the package initializes.  After that the word_list is retained in memory.

As each new record is created we generate the about value as a random number of random words from the word list.

The problem

Even though the words are real, the values are gibberish.

We decided to create a table of “generic” sentences and build the new ‘about me’ value by randomly selecting sentences.

With this change, we can drop the build_word_list function and just do a simple bulk collect from the sentences table.  Then we can change generate about to use the sentences list instead of a word list.

This works fine except, now that we have a much sorter list to choose from, we wind up with a lot of duplicate sentences.  We’d like to only have unique sentences.

Ideas

Associative Array
  • Create an associative array to keep track of the the used indexes.
  • Generate the index number of the sentence to use separately.
  • If the new index number is already used regenerate the index.

The risk here is a long running process if we repeatedly hit the same random number.  It’s a low risk, but it’s there.

Copy the Collection and delete used items

Make a copy of the master collection and collection.delete(x) as the sentences are used.

Of course the problem is, deleting an item doesn’t collapse the collection and we’ll get an exception if we hit an empty index.  There are several ways to get around this, but now our code is getting long.

Copy the Collection using the Table() function

If we make a copy of the master collection by selecting from it with the Table() function, we can make a collection that is randomized.  Also, if we use FETCH NEXT we can just get the number of rows needed.

Cool, but…

We can do better

Let’s get rid of the CAST

Now, let’s use an analytic function and get rid of the new collection and the loop that goes with it.

Note, that we used a nested SELECT statement.  This allows us to still use FETCH NEXT to limit the number of sentences we send to LISTAGG.

We could have selected directly from the sentences table, but that would mean an extra trip back to the database which may add up when generating 10,000 rows.

In the end

We added a small table of sentences, removed an extra function and generated semi-coherent sounding profiles.

And we look cool doing it.

 

Leave a Reply