Random Number Generation Part Two

In the last part, we discussed the reasons you might want to create your own pseudo-random number generator.  In this part, I’m going to discuss my implementation and how it relates to some of the other well-known RNG algorithms.  In part three (yeah, I decided to split it into even more parts!) I’ll discuss the uses of “filtered random number generation” in games to help make some of your generation appear more random to your users.  That’s one of my favorite topics in this subject, so it should be pretty fun.

 

Important Qualities in a Generator

I wanted to create an algorithm that serves all the purposes in the games I’m creating.  This means the generator needs to produce nice, uniformly distributed random numbers for generating level data, seeding economic systems, etc.  It also needs to handle gameplay tasks such as dice rolls, random index selection into arrays, and returning values within a range.  Given these two sets of use cases, we can sit back and figure out a list of requirements for our algorithm.  Here are a few of the important ones, with short definitions.

No Easily Discernible Patterns – We need the period of the function to be long enough that we don’t end up with strange artifacts from repeating patterns.  If the period of our generator were something like 10, for example (a really bad generator), then after just 10 uses the function has reset and will produce the same string of 10 numbers over and over again.  (Note: this is probably the most important requirement for one of these algorithms.  You want a period that is sufficiently large; I’ll discuss this a little more later.)

Random Parity Distribution – You want to make sure your output is fairly evenly distributed across even and odd numbers.  If you’re not careful, it’s actually quite easy to create an algorithm that produces only odd numbers, only even numbers, or strictly alternating parity.  This becomes a huge problem if you start using your generator to produce small numbers (4- or 6-sided die rolls that only ever come up odd, for example, would be fairly horrible).
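As a quick illustration of how easy this failure is to hit, here’s a small C sketch of a deliberately bad generator (hypothetical, not mine): a power-of-two LCG with an odd multiplier and an odd increment.  The parity of its state strictly alternates, so any die roll built from its low bit flips between even and odd on every call.

```c
#include <stdint.h>

/* Deliberately bad generator: odd multiplier, odd increment, implicit
   modulus 2^32.  odd*odd + odd is even, and even*odd + odd is odd, so
   the state's parity flips on every single call. */
static uint32_t lcg_state = 12345;

static uint32_t bad_next(void) {
    lcg_state = lcg_state * 1103515245u + 12345u;
    return lcg_state;
}

/* bad_next() & 1 therefore produces 0, 1, 0, 1, ... forever. */
```

A coin flip or d6 built on top of this would be laughably predictable, which is exactly the failure mode described above.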

Completely Deterministic Given Known Seed Data – To be able to create worlds dynamically, rather than just storing the raw data on disk after generation, you need to be able to reproduce the data like we discussed earlier.  This means we can’t use any mid-generation entropy such as the current time, player movements, or player score.  You can still use these sources of entropy for setting the seed, as long as you store the seed somewhere so re-initialization is easy when it’s needed.

Uniform Distribution of Values – This requirement is based on your use cases and common data sets.  For the ranges you care about, you need to make sure your generator doesn’t favor values in certain sections of the range.  If you commonly generate a number between 1 and 100, but most numbers produced by your algorithm are larger than 80, that’s not ideal.

Given these four major requirements, we can sit down and figure out how to tackle each category, either in the design of the algorithm or in the testing strategy we’ll use to determine whether it fails any of them.

 


My Algorithm

So I’ll start with the algorithm and then go into the ways I verified each of the categories above.  There was a little trial and error in designing it, but since I wrote a fairly strong test suite (for my purposes, anyway), it was easy to plug a new algorithm in and test the results.

I decided to create a “Fibonacci style” generator.  The basic idea is to use more than one previously generated value when generating the next value, similar to how the Fibonacci sequence works.  I chose this because I wanted to shoot for a period higher than 2^32.  If you only use one previously generated value (the most recent one) when generating a new number, your maximum period is equal to the total number of possible seed values.  For a single value, that’s the size of the variable holding the data (in my case an unsigned int, so 2^32).

Since I’m using two values, my function’s theoretical period ceiling is 2^32 * 2^32 (all possible combinations of two values).  As you can see, the more values you add, the higher your maximum ceiling.  I felt 2^64 was a big enough ceiling that I should be able to get an actual period greater than 2^32.
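To make the idea of a period concrete, here’s a C sketch (the brute-force approach and all names are mine) that measures the period of any generator whose effective state is its last two outputs: step the generator until the initial pair of outputs recurs.  This is only practical for small periods; establishing a bound like 2^42 takes far more patience or cleverer bookkeeping.

```c
#include <stdint.h>

/* Brute-force period measurement for a generator whose state is its
   last two outputs.  Returns the period, or 0 if no repeat was seen
   within `limit` steps. */
uint64_t measure_period(uint32_t (*next)(void), uint64_t limit) {
    uint32_t a0 = next(), b0 = next();  /* initial output pair */
    uint32_t a = a0, b = b0;
    for (uint64_t steps = 1; steps <= limit; ++steps) {
        uint32_t x = next();
        a = b;
        b = x;
        if (a == a0 && b == b0)
            return steps;  /* state pair recurred: period found */
    }
    return 0;
}

/* Toy generator with an obvious period of 5, for demonstration. */
static uint32_t mod5_state = 0;
static uint32_t mod5_next(void) { return mod5_state++ % 5u; }
```

Running `measure_period(mod5_next, 100)` on the toy generator returns 5, matching its obvious cycle.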

So here’s the algorithm I’ve designed:

V  = 0xF2EB19A03
X0 = #seedValue1
X1 = #seedValue2
Xn = ((Xn-2 >> 1) * V) + Xn-1

The hard-coded value V is basically a randomly chosen large number used as a multiplication operand.  I chose the number fairly arbitrarily; I just made sure it was odd and large enough to cause unsigned int overflow often.  It was important for this constant to be odd so that when it’s multiplied with the first operand, the result can be either even or odd.  If the constant were even, the result would be even every time, and we’d fail our parity requirement.

The next part is the shift by one bit.  This is my source of “random parity distribution”: it lets us use a previously generated number as a source of random bits with which to randomly choose the parity.

We then simply add the previously generated result, and we have our new value.
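Here’s a minimal C sketch of the recurrence as I understand it (the function names are mine).  Note that the constant V is wider than 32 bits, so I’ve stored it in a 64-bit constant, with the result truncated back to 32 bits to get the unsigned int overflow behavior described above:

```c
#include <stdint.h>

/* State: the two most recently generated values. */
static uint32_t x_prev2;  /* X(n-2) */
static uint32_t x_prev1;  /* X(n-1) */

/* The article's multiplier.  It is wider than 32 bits, so it is kept
   in a 64-bit constant here (my assumption about the intent). */
static const uint64_t V = 0xF2EB19A03ULL;

/* Seed with two known values so the sequence is reproducible. */
void rng_seed(uint32_t seed1, uint32_t seed2) {
    x_prev2 = seed1;  /* X0 */
    x_prev1 = seed2;  /* X1 */
}

/* Xn = ((X(n-2) >> 1) * V) + X(n-1), truncated to 32 bits; the
   truncation is the unsigned int overflow the article relies on. */
uint32_t rng_next(void) {
    uint32_t xn = (uint32_t)(((uint64_t)(x_prev2 >> 1) * V) + x_prev1);
    x_prev2 = x_prev1;
    x_prev1 = xn;
    return xn;
}
```

Reseeding with the same pair reproduces the same sequence, which is exactly the determinism requirement from earlier.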

 

Testing

We’ve argued above that the parity of the output is sufficiently random and patternless.  Next, we need to show that the generated numbers are uniform across the ranges we care about.

The range I currently care about is -100000 to 100000, which is equivalent to 0 to 200000.  The test runs 40,000,000 iterations (200,000 distinct values in the range, 200 passes each).  I then check whether any value in the range has been hit fewer than half the expected number of times, measure how often this happens, and use that as a uniformity score across the range.  I’m okay with a margin of error of 50% across this range.
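A sketch of that test in C might look like the following (the names and structure are mine; this isn’t the article’s actual harness).  It takes any generator as a function pointer and returns the fraction of values hit fewer than half the expected number of times:

```c
#include <stdint.h>
#include <string.h>

#define RANGE_SIZE 200000u  /* distinct values 0..199999 */
#define PASSES     200u     /* expected hits per value */

/* Fraction of values in the range hit fewer than half the expected
   number of times (a uniformity score: lower is better). */
double uniformity_score(uint32_t (*next)(void)) {
    static uint32_t hits[RANGE_SIZE];  /* static: too large for the stack */
    memset(hits, 0, sizeof hits);

    for (uint32_t i = 0; i < RANGE_SIZE * PASSES; ++i)
        hits[next() % RANGE_SIZE]++;

    uint32_t underfilled = 0;
    for (uint32_t v = 0; v < RANGE_SIZE; ++v)
        if (hits[v] < PASSES / 2)
            underfilled++;

    return (double)underfilled / RANGE_SIZE;
}

/* Stand-in generator for demonstration: a plain counter, which is
   perfectly uniform, so its score is exactly 0. */
static uint32_t seq_state = 0;
static uint32_t seq_next(void) { return seq_state++; }
```

Plugging in the real generator instead of the counter gives the score discussed below.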

Upon testing, this algorithm actually performed better than I expected.  It turns out the margin of error is actually very close to 30%, and the numbers are more evenly distributed than I thought over a large sample.

As of the writing of this article, I’ve verified that the period of this function is greater than 2^42.  I haven’t run the full test, but this makes me happy enough for my purposes; I might update this article in the future if I get around to letting the full test run.

 

Limitations

The above algorithm works great for the range 0 to 200000.  It also appears to work pretty well for larger ranges, but there is a limitation.

Since the algorithm uses unsigned int overflow as part of the algorithm, values can get into a “flip flop” state where the numbers produced alternate between “near max” (~2^32) and “small” (~2^16).  Since this wasn’t an important use case, I decided it wasn’t enough of a reason to modify the algorithm.  However, I did want to implement a solution I was happy with, just in case the need came up.

What I decided to do was have my generator also support a “generate large number” function.  The way this function works is fairly simple:

The function takes the number of bits as a parameter.  It then uses the generator to produce a coin flip (0 or 1) and sets that as a bit of the result.  The algorithm looks like this:

unsigned int result = 0;
for (int i = 0; i < bits; ++i) {
    result = (result << 1) + coinflip();
}
return result;

Since we’ve shown that we can uniformly distribute values in ranges of size 200000 or smaller, we can use the range 0 to 1 and set the bits of an unsigned int randomly.  This allows us to produce uniformly distributed, arbitrarily sized unsigned int values at random.

So the generator works in “fast mode” for ranges up to 200000 units in size.  Beyond that it switches over to “slow mode”, using the above algorithm to produce larger values and mod them down to the required range.  There’s also support for always operating in “slow mode”, for implementations that have widely varying ranges.
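Putting the pieces together, a C sketch of that fast/slow dispatch might look like this (all names, and the exact cutoff handling, are my assumptions; `next` is any 32-bit generator passed in as a function pointer):

```c
#include <stdint.h>

#define FAST_RANGE_LIMIT 200000u  /* largest range served by "fast mode" */

/* Coin flip: the generator's low bit. */
static uint32_t coinflip(uint32_t (*next)(void)) {
    return next() & 1u;
}

/* Slow mode: build a value bit-by-bit from coin flips. */
static uint32_t large_number(uint32_t (*next)(void), int bits) {
    uint32_t result = 0;
    for (int i = 0; i < bits; ++i)
        result = (result << 1) + coinflip(next);
    return result;
}

/* Returns a value in [0, range_size), choosing fast or slow mode. */
uint32_t random_in_range(uint32_t (*next)(void), uint32_t range_size) {
    if (range_size <= FAST_RANGE_LIMIT)
        return next() % range_size;                 /* fast mode */
    return large_number(next, 32) % range_size;     /* slow mode */
}

/* Stand-in generator for the usage example: a plain counter
   (hypothetical, just to make the sketch self-contained). */
static uint32_t demo_state = 0;
static uint32_t demo_next(void) { return demo_state++; }
```

Slow mode costs one full generator call per bit, which is why it’s worth keeping the fast path for the common small ranges.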

This was a fun experiment, and apparently the large number generation (setting bits randomly) is a similar process to this algorithm.

Feel free to leave comments here; I’d love to hear what you think of this algorithm and anything you’d like to change.  Or go ahead and use it in your projects and let me know how it goes!  And of course, if you’d like more details, I can elaborate in the comments or update the document.  I know it’s pretty long-winded and mostly a brain dump, so I might have to come back later and fix a ton of grammatical errors.

Of course, part 3 is still coming in the next couple of days.  In part 3 we’ll digress a bit and talk about applications of random numbers and filtered random generators.
