## Whitepaper - Pseudo Random Number Generators

### ABSTRACT

• Random numbers are used for many cryptographic purposes, but people often do not pay enough attention to generating them securely. Random numbers that can be guessed can break the whole cryptosystem, and are therefore a critical factor. The paper first discusses pseudorandom numbers and generators and where they are used; this first section also covers algorithms and seeding. Cryptographically secure pseudorandom number generators and attacks against them are also presented in general terms. The analysis part introduces random number generator tests such as Diehard, NIST's test suite, ENT, and Michal Zalewski's attractor analysis tool. The data that came with the attractor analysis tool was also tested with ENT, and the results are presented at the end of this paper.

### Introduction

• Random numbers are used for many cryptographic purposes, such as the generation of session keys, initialization vectors, and public keys. The security of cryptographic systems depends on the generation of unpredictable numbers. Usually people do not pay enough attention to generating secure random numbers. Random numbers that can be guessed become the system's weakest point and break the whole cryptosystem. Algorithms and protocols cannot compensate for bad random numbers.
• A pseudorandom number generator (PRNG) is a mechanism that produces random-looking numbers. The numbers are called pseudorandom because truly random numbers come from non-deterministic sources such as radioactive decay and atmospheric noise. Computers are designed to execute strictly defined sets of commands in repeatable and accurate ways. Thus, any fixed algorithm can be used to produce exactly the same results on another computer, which can then be used to predict output values, assuming the attacker can reconstruct the internal state of the remote system. Even if the target pseudorandom number generator function is not known, sooner or later the algorithm will start generating the exact same sequence over again, because a given algorithm has only a limited number of possible internal states. PRNGs are typically constructed from other cryptographic primitives, such as block ciphers, hash functions, and stream ciphers.
• A good pseudorandom number generator will produce a sequence of numbers that cannot easily be guessed or determined by an adversary. A cryptographically secure generator should produce numbers that cannot be predicted even if an attacker knows which algorithm the generator uses. A good generator also has to have good statistical properties, a long period, and an efficient implementation, and it should be reproducible and portable.
• A PRNG needs a seed, or key, as input and then produces a stream of bits or numbers determined completely by that seed. This is why seeding is very important when a PRNG is used for cryptographic purposes. Internally, a PRNG needs a mechanism for collecting unpredictable samples, a mechanism for using those samples to update its internal state or seed, and a mechanism for using some part of its internal state or seed to generate pseudorandom outputs. In some PRNG designs the same mechanism does all three tasks; in others, the mechanisms are clearly separated.

Algorithms

• There exist many algorithms for generating pseudorandom numbers. The linear congruential, linear feedback shift register, and related generators are not good enough for cryptographic purposes, but with care they can be used for simulation. Some related generators, called lagged Fibonacci generators, have extremely long periods but are nevertheless insecure. By contrast, the Blum-Blum-Shub generator is believed to be secure, since its security is provably reducible to the quadratic residuosity problem modulo a composite number whose factorization is not known, which is believed to be hard.
• Linear congruential generators achieve maximal period if the increment, multiplier, and modulus are properly chosen. They are also fast, but unfortunately totally predictable.
• Linear feedback shift registers are easily implemented in digital hardware and popular because of that, but their sequential bits are linear, which makes them useless for encryption.
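The predictability of linear congruential generators is easy to demonstrate: the entire internal state is the last output, so one observed value determines every later one. A minimal sketch, using the textbook ANSI C `rand()` constants purely for illustration:

```python
def lcg(seed, a=1103515245, c=12345, m=2**31):
    # Linear congruential generator: state = (a*state + c) mod m.
    # The whole internal state equals the last output, so an observer
    # who sees one value can predict every subsequent one.
    state = seed
    while True:
        state = (a * state + c) % m
        yield state

gen = lcg(42)
observed = next(gen)                              # attacker sees one output...
predicted = (1103515245 * observed + 12345) % 2**31
assert predicted == next(gen)                     # ...and predicts the next exactly
```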

Seeding

• The information collected for the seed does not need to be truly random, but it must be unguessable and unpredictable. Some software-based PRNGs use hardware-based RNGs to generate their seeds. Typical seeds such as the current time, the number of clock ticks since reboot, the process ID, or user input provide only a limited number of random bits and are therefore insecure.
• Some protocols can "leak" details of the PRNG state. For example, linear congruential generators essentially reveal their entire internal state with every output. Given the deterministic nature of a PRNG, this can compromise security unless reseeding is carried out appropriately. Even cryptographically secure algorithms such as Blum-Blum-Shub fail as pseudorandom number generators if the seed is known.
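Why a time-based seed is insecure can be sketched concretely: if the seed came from a narrow window of clock values, an attacker simply enumerates the window. This sketch uses Python's non-cryptographic `random` module and a made-up timestamp value:

```python
import random

def outputs_from_seed(seed, n=4):
    # Deterministic outputs for a given seed (Mersenne Twister)
    rng = random.Random(seed)
    return [rng.getrandbits(32) for _ in range(n)]

# Victim seeds with a "current time" style value (hypothetical timestamp)
victim_seed = 1_600_000_123
observed = outputs_from_seed(victim_seed)

# Attacker who knows the seed came from a narrow time window
# enumerates every candidate until the outputs match.
recovered = None
for candidate in range(1_600_000_000, 1_600_001_000):
    if outputs_from_seed(candidate) == observed:
        recovered = candidate
        break
assert recovered == victim_seed
```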

Cryptographic PRNGs

• A good PRNG is designed in such a way that given enough entropy, it keeps producing numbers that are hard to guess, even if an attacker knows the full details of the algorithm.
• One cryptographic PRNG is Blum-Blum-Shub. Its security is based on a simple mathematical problem: the difficulty of factoring a composite number N=pq, where p and q are large unknown primes. Other cryptographic PRNGs tend to base their security on block ciphers or hash functions. They are much faster, but it is harder to analyze their security accurately.
• Another cryptographic PRNG is Tiny, a simple evolution of the Yarrow PRNG. It works by encrypting a counter using AES to produce successive blocks of output. It is believed to be secure against all known cryptographic attacks.

Yarrow and Tiny

• Yarrow relies on the one-way hash function SHA-1 and the block cipher three-key triple-DES. There are four major components: an entropy accumulator, which collects samples from entropy sources in two SHA-1 pools; a reseed mechanism, which periodically reseeds the 160-bit key with new entropy from the pools; a generation mechanism, which generates PRNG outputs from the key using triple-DES; and reseed control, which determines when a reseed is to be performed.

• Generation mechanism: it needs an n-bit counter value C. To generate the next n-bit output block R, increment C and encrypt it with the block cipher E_K() using the key K.
• C <- (C+1) mod 2^n;  R <- E_K(C)
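The generation step can be sketched as follows. Since the Python standard library provides no block cipher, HMAC-SHA-256 stands in here for the keyed function E_K; this is an illustrative assumption, not the cipher Yarrow or Tiny actually uses:

```python
import hmac
import hashlib

class CounterPRNG:
    # Counter-mode generation: C <- (C+1) mod 2^n; R <- E_K(C),
    # with HMAC-SHA-256 standing in for the block cipher keyed with K.
    def __init__(self, key: bytes, n_bits: int = 128):
        self.key = key
        self.counter = 0
        self.mod = 1 << n_bits
        self.n_bytes = n_bits // 8

    def next_block(self) -> bytes:
        self.counter = (self.counter + 1) % self.mod
        c = self.counter.to_bytes(self.n_bytes, "big")
        return hmac.new(self.key, c, hashlib.sha256).digest()[: self.n_bytes]

prng = CounterPRNG(b"seed-derived-key")        # key derivation omitted
blocks = [prng.next_block() for _ in range(3)]
assert len(set(blocks)) == 3                   # distinct counters, distinct blocks
```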

• Tiny is a simple evolution of the Yarrow PRNG and also provides facilities for dealing with entropy. Tiny works by encrypting a counter using AES to produce successive blocks of output. The key and the initial state are derived from the seed data. Tiny uses a few leftover blocks of output from the user's request to reseed itself.

Blum-Blum-Shub

• The Blum-Blum-Shub generator is provably secure under the assumption that it is hard to factor large numbers into primes.
• Choose two large secret primes p and q, both congruent to 3 mod 4, and compute the product n=pq. For a given seed s_0, compute the sequence s_0, s_1, ... by the recursive formula s_{i+1} = s_i^2 mod n. From this sequence of numbers in the range 0, 1, ..., n-1, create a sequence of pseudorandom bits b_i = s_i mod 2.
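The recurrence can be sketched directly; the toy primes below are far too small for real security and are chosen only for illustration:

```python
def bbs_bits(p, q, seed, k):
    # Blum-Blum-Shub: s_{i+1} = s_i^2 mod n, output bit b_i = s_i mod 2.
    # p and q must be primes congruent to 3 mod 4; 10007 and 10039 are
    # toy values, vastly smaller than any cryptographically useful size.
    n = p * q
    s = seed % n
    bits = []
    for _ in range(k):
        s = pow(s, 2, n)        # square modulo n
        bits.append(s & 1)      # take the low-order bit
    return bits

print(bbs_bits(10007, 10039, 1234567, 16))  # 16 pseudorandom bits
```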
• Because the primes must be of a similar size to RSA keys, you will almost certainly be doing the modular arithmetic with a software library, since the internal state will be too big for a primitive type. This makes the Blum-Blum-Shub algorithm impractical for many uses, as it is excruciatingly slow in practice.
• At a higher level, there are two kinds of attacks against a PRNG that has a high-quality seed: cryptanalytic attacks and attacks against the internal state. Attackers prefer to attack the PRNG rather than mount computationally intense attacks such as computing discrete logarithms or factoring large numbers.
• A direct cryptanalytic attack is any attack that can distinguish generator output from truly random data. This kind of attack is applicable to most uses of PRNGs. If a PRNG passes statistical tests, it is likely that it is not vulnerable to such cryptanalytic attacks.
• An input-based attack occurs when an attacker is able to control, or use knowledge of, the PRNG's inputs to cryptanalyze the PRNG. This attack type can be further divided into known-input, replayed-input, and chosen-input attacks. Using many sources of entropy for the seed values can mitigate this attack.
• A state compromise extension attack succeeds when the attacker is able to recover unknown PRNG outputs from before the internal state was compromised, or outputs from after the PRNG has collected a sequence of inputs that the attacker cannot guess. A backtracking attack uses the compromise of the generator's internal state at some point to learn previous PRNG outputs. A permanent compromise attack occurs if, once an attacker compromises the internal state at some point, all future and past internal state values are vulnerable. An iterative guessing attack uses knowledge of the internal state at some point, together with the intervening PRNG outputs, to learn the internal state at a later point when the inputs collected during this span of time are guessable by the attacker. A meet-in-the-middle attack is essentially a combination of an iterative guessing attack and a backtracking attack. These are the reasons why the internal state should be well guarded.
• If an attacker knows the seed that a generator uses to create its random outputs, and can observe all of the outputs, he can test different generators to see which one creates the same output. An attacker can also directly guess the starting point, use a precomputed list of likely starting points, or initialize the PRNG with closely related inputs.
• The different kinds of cryptanalytic attacks are described in more detail in .

• One way to predict upcoming random numbers might be Michal Zalewski's spoofing-set construction based on phase space analysis. It was not tested here, but it appeared to work.

### Methodology

Collecting (sampling)

Analysing

• The quality of random numbers and random number generators can be measured in a variety of ways. One common method is to compute the information density, or entropy, in a series of numbers. The higher the entropy in a series of numbers is, the more difficult it is to predict a given number on the basis of the preceding numbers in the series. A sequence of good random numbers will have a high level of entropy, although a high level of entropy does not guarantee randomness.
• There are many statistical test suites for checking PRNGs. The tests will not always agree, but they give a direction: by comparing results, you may get a hint of whether the PRNG in use is good or not. Here are a few examples of statistical tests:

Diehard

• DIEHARD is a battery of tests for randomness developed by George Marsaglia. The tests are:

• Birthday spacing test

• Choose m birthdays in a year of n days. List the spacings between the birthdays. If j is the number of values that occur more than once in that list, then j is asymptotically Poisson distributed with mean m^3/(4n). Experience shows n must be quite large, say n >= 2^18, for comparing the results to the Poisson distribution with that mean. This test uses n=2^24 and m=2^9, so that the underlying distribution for j is taken to be Poisson with lambda=2^27/2^26=2. A sample of 500 j's is taken, and a chi-square goodness of fit test provides a p-value. The first test uses bits 1-24 (counting from the left) from integers in the specified file. Then the file is closed and reopened. Next, bits 2-25 are used to provide birthdays, then 3-26 and so on to bits 9-32. Each set of bits provides a p-value, and the nine p-values provide a sample for a KSTEST.

• Overlapping 5-permutation test

• This is the OPERM5 test. It looks at a sequence of one million 32-bit random integers. Each set of five consecutive integers can be in one of 120 states, for the 5! possible orderings of five numbers. Thus the 5th, 6th, 7th,... numbers each provide a state. As many thousands of state transitions are observed, cumulative counts are made of the number of occurrences of each state. Then the quadratic form in the weak inverse of the 120x120 covariance matrix yields a test equivalent to the likelihood ratio test that the 120 cell counts came from the specified (asymptotically) normal distribution with the specified 120x120 covariance matrix (with rank 99). This version uses 1,000,000 integers, twice.
• Binary rank test for 31x31 matrices

• The leftmost 31 bits of 31 random integers from the test sequence are used to form a 31x31 binary matrix over the field {0,1}. The rank is determined. That rank can be from 0 to 31, but ranks < 28 are rare, and their counts are pooled with those for rank 28. Ranks are found for 40,000 such random matrices and a chi-square test is performed on counts for ranks 31, 30, 29 and <= 28.

• Binary rank test for 32x32 matrices

• A random 32x32 binary matrix is formed, each row a 32-bit random integer. The rank is determined. That rank can be from 0 to 32; ranks less than 29 are rare, and their counts are pooled with those for rank 29. Ranks are found for 40,000 such random matrices and a chi-square test is performed on counts for ranks 32, 31, 30 and <= 29.

• Binary rank test for 6x8 matrices

• From each of six random 32-bit integers from the generator under test, a specified byte is chosen, and the resulting six bytes form a 6x8 binary matrix whose rank is determined. That rank can be from 0 to 6, but ranks 0, 1, 2, 3 are rare; their counts are pooled with those for rank 4. Ranks are found for 100,000 random matrices, and a chi-square test is performed on counts for ranks 6, 5 and <= 4.

• Bitstream test

• The file under test is viewed as a stream of bits. Call them b1, b2, ... . Consider an alphabet with two "letters", 0 and 1, and think of the stream of bits as a succession of overlapping 20-letter "words". Thus the first word is b1b2...b20, the second is b2b3...b21, and so on. The bitstream test counts the number of missing 20-letter (20-bit) words in a string of 2^21 overlapping 20-letter words. There are 2^20 possible 20-letter words. For a truly random string of 2^21+19 bits, the number of missing words j should be (very close to) normally distributed with mean 141,909 and sigma 428. Thus (j-141909)/428 should be a standard normal variate (z score) that leads to a uniform [0,1) p-value. The test is repeated twenty times.
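The counting step of this test can be sketched as follows. Only the count of missing words is computed here; the real test then standardizes j against the mean and sigma quoted above:

```python
def missing_20bit_words(bits):
    # Slide a 20-bit window one bit at a time, mark every word seen;
    # the bitstream statistic j is the number of words never seen.
    seen = bytearray(1 << 20)             # one flag per possible 20-bit word
    word = 0
    for i, b in enumerate(bits):
        word = ((word << 1) | b) & ((1 << 20) - 1)
        if i >= 19:                       # a full 20-bit word is available
            seen[word] = 1
    return (1 << 20) - sum(seen)

# A constant stream visits only one word, so almost every word is missing
assert missing_20bit_words([0] * 1000) == (1 << 20) - 1
```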

• Overlapping pairs sparse occupancy test

• The OPSO test considers 2-letter words from an alphabet of 1024 letters. Each letter is determined by a specified ten bits from a 32-bit integer in the sequence to be tested. OPSO generates 2^21 (overlapping) 2-letter words (from 2^21+1 "keystrokes") and counts the number of missing words---that is, 2-letter words which do not appear in the entire sequence. That count should be very close to normally distributed with mean 141,909, sigma 290. Thus (missingwrds-141909)/290 should be a standard normal variable. The OPSO test takes 32 bits at a time from the test file and uses a designated set of ten consecutive bits. It then restarts the file for the next designated 10 bits, and so on.

• Overlapping quadruples sparse occupancy test

• The test OQSO is similar, except that it considers 4-letter words from an alphabet of 32 letters, each letter determined by a designated string of 5 consecutive bits from the test file, elements of which are assumed 32-bit random integers. The mean number of missing words in a sequence of 2^21 four-letter words (2^21+3 "keystrokes") is again 141909, with sigma = 295. The mean is based on theory; sigma comes from extensive simulation.

• DNA test

• The DNA test considers an alphabet of 4 letters C, G, A, T, determined by two designated bits in the sequence of random integers being tested. It considers 10-letter words, so that as in OPSO and OQSO, there are 2^20 possible words, and the mean number of missing words from a string of 2^21 (overlapping) 10-letter words (2^21+9 "keystrokes") is 141909. The standard deviation sigma=339 was determined as for OQSO by simulation. (Sigma for OPSO, 290, is the true value (to three places), not determined by simulation.)

• Count the 1's test on a stream of bytes

• Consider the file under test as a stream of bytes (four per 32-bit integer). Each byte can contain from 0 to 8 1's, with probabilities 1,8,28,56,70,56,28,8,1 over 256. Now let the stream of bytes provide a string of overlapping 5-letter words, each "letter" taking values A,B,C,D,E. The letters are determined by the number of 1's in a byte: 0, 1, or 2 yield A, 3 yields B, 4 yields C, 5 yields D, and 6, 7, or 8 yield E. Thus we have a monkey at a typewriter hitting five keys with various probabilities (37,56,70,56,37 over 256). There are 5^5 possible 5-letter words, and from a string of 256,000 (overlapping) 5-letter words, counts are made on the frequencies for each word. The quadratic form in the weak inverse of the covariance matrix of the cell counts provides a chi-square test: Q5-Q4, the difference of the naive Pearson sums of (OBS-EXP)^2/EXP on counts for 5- and 4-letter cell counts.

• Count the 1's test for specific bytes

• Consider the file under test as a stream of 32-bit integers. From each integer, a specific byte is chosen, say the leftmost bits 1 to 8. Each byte can contain from 0 to 8 1's, with probabilities 1,8,28,56,70,56,28,8,1 over 256. Now let the specified bytes from successive integers provide a string of (overlapping) 5-letter words, each "letter" taking values A,B,C,D,E. The letters are determined by the number of 1's in that byte: 0, 1, or 2 -> A, 3 -> B, 4 -> C, 5 -> D, and 6, 7, or 8 -> E. Thus we have a monkey at a typewriter hitting five keys with various probabilities: 37,56,70,56,37 over 256. There are 5^5 possible 5-letter words, and from a string of 256,000 (overlapping) 5-letter words, counts are made on the frequencies for each word. The quadratic form in the weak inverse of the covariance matrix of the cell counts provides a chi-square test: Q5-Q4, the difference of the naive Pearson sums of (OBS-EXP)^2/EXP on counts for 5- and 4-letter cell counts.

• Parking lot test

• In a square of side 100, randomly "park" a car---a circle of radius 1. Then try to park a 2nd, a 3rd, and so on, each time parking "by ear". That is, if an attempt to park a car causes a crash with one already parked, try again at a new random location. (To avoid path problems, consider parking helicopters rather than cars.) Each attempt leads to either a crash or a success, the latter followed by an increment to the list of cars already parked. If we plot n: the number of attempts, versus k: the number successfully parked, we get a curve that should be similar to those provided by a perfect random number generator. Theory for the behavior of such a random curve seems beyond reach, and as graphics displays are not available for this battery of tests, a simple characterization of the random experiment is used: k, the number of cars successfully parked after n=12,000 attempts. Simulation shows that k should average 3523 with sigma 21.9 and is very close to normally distributed. Thus (k-3523)/21.9 should be a standard normal variable, which, converted to a uniform variable, provides input to a KSTEST based on a sample of 10.
• Minimum distance test

• It does this 100 times: choose n=8000 random points in a square of side 10000. Find d, the minimum distance between the (n^2-n)/2 pairs of points. If the points are truly independent uniform, then d^2, the square of the minimum distance, should be (very close to) exponentially distributed with mean 0.995. Thus 1-exp(-d^2/.995) should be uniform on [0,1) and a KSTEST on the resulting 100 values serves as a test of uniformity for random points in the square. Test numbers = 0 mod 5 are printed but the KSTEST is based on the full set of 100 random choices of 8000 points in the 10000x10000 square.

• 3Dspheres test

• Choose 4000 random points in a cube of edge 1000. At each point, center a sphere large enough to reach the next closest point. Then the volume of the smallest such sphere is (very close to) exponentially distributed with mean 120pi/3. Thus the radius cubed is exponential with mean 30. (The mean is obtained by extensive simulation). The 3DSPHERES test generates 4000 such spheres 20 times. Each min radius cubed leads to a uniform variable by means of 1-exp(-r^3/30.), then a KSTEST is done on the 20 p-values.
• Squeeze test

• Random integers are floated to get uniforms on [0,1). Starting with k=2^31, the test finds j, the number of iterations necessary to reduce k to 1, using the reduction k=ceiling(k*U), with U provided by floating integers from the file being tested. Such j's are found 100,000 times, then counts for the number of times j was <= 6, 7, ..., 47, >= 48 are used to provide a chi-square test for cell frequencies.

• Overlapping sums test

• Integers are floated to get a sequence U(1),U(2),... of uniform [0,1) variables. Then overlapping sums S(1)=U(1)+...+U(100), S(2)=U(2)+...+U(101), ... are formed. The S's are virtually normal with a certain covariance matrix. A linear transformation of the S's converts them to a sequence of independent standard normals, which are converted to uniform variables for a KSTEST. The p-values from ten KSTESTs are given still another KSTEST.
• Runs test

• It counts runs up, and runs down, in a sequence of uniform [0,1) variables, obtained by floating the 32-bit integers in the specified file. This example shows how runs are counted: .123,.357,.789,.425,.224,.416,.95 contains an up-run of length 3, a down-run of length 2 and an up-run of (at least) 2, depending on the next values. The covariance matrices for the runs-up and runs-down are well known, leading to chi-square tests for quadratic forms in the weak inverses of the covariance matrices. Runs are counted for sequences of length 10,000. This is done ten times. Then repeated.
• Craps test

• It plays 200,000 games of craps, finds the number of wins and the number of throws necessary to end each game. The number of wins should be (very close to) a normal with mean 200000p and variance 200000p(1-p), with p=244/495. Throws necessary to complete the game can vary from 1 to infinity, but counts for all > 21 are lumped with 21. A chi-square test is made on the no.-of-throws cell counts. Each 32-bit integer from the test file provides the value for the throw of a die, by floating to [0,1), multiplying by 6 and taking 1 plus the integer part of the result.

• NOTE: Most of the tests in DIEHARD return a p-value, which should be uniform on [0,1) if the input file contains truly independent random bits. Those p-values are obtained by p=F(X), where F is the assumed distribution of the sample random variable X---often normal. But that assumed F is just an asymptotic approximation, for which the fit will be worst in the tails. Thus you should not be surprised with occasional p-values near 0 or 1, such as .0012 or .9983. When a bit stream really FAILS BIG, you will get p's of 0 or 1 to six or more places. By all means, do not, as a Statistician might, think that a p < .025 or p > .975 means that the RNG has "failed the test at the .05 level". Such p's happen among the hundreds that DIEHARD produces, even with good RNG's. So keep in mind that " p happens".

• The generator to be tested must produce a binary file of 10 or 11 megabytes, which is the input Diehard expects.

ENT

• ENT applies various tests to sequences of bytes stored in files and reports the results of those tests. ENT does not test the generator itself, only a sequence of bytes. ENT tests the following things:

• Entropy

• The information density of the contents of the file, expressed as a number of bits per character.
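The entropy figure ENT reports can be approximated by the standard Shannon formula over byte frequencies. A minimal sketch:

```python
import math
from collections import Counter

def entropy_bits_per_byte(data: bytes) -> float:
    # Shannon entropy of the empirical byte distribution, in bits per byte;
    # 8.0 is the maximum, reached when all 256 byte values are equally likely.
    counts = Counter(data)
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# A constant stream carries no information; a uniform one reaches 8 bits/byte
assert entropy_bits_per_byte(b"\x00" * 100) == 0.0
assert abs(entropy_bits_per_byte(bytes(range(256))) - 8.0) < 1e-9
```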
• Chi-square test

• The chi-square test is the most commonly used test for the randomness of data, and is extremely sensitive to errors in pseudorandom sequence generators. The chi-square distribution is calculated for the stream of bytes in the file and expressed as an absolute number and a percentage which indicates how frequently a truly random sequence would exceed the value calculated. We interpret the percentage as the degree to which the sequence tested is suspected of being non-random. If the percentage is greater than 99% or less than 1%, the sequence is almost certainly not random. If the percentage is between 99% and 95% or between 1% and 5%, the sequence is suspect. Percentages between 90% and 95% and 5% and 10% indicate the sequence is "almost suspect".
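The underlying statistic can be sketched as follows. This computes only the raw chi-square value against a uniform byte distribution (255 degrees of freedom); ENT additionally converts it to the exceedance percentage discussed above:

```python
def chi_square_bytes(data: bytes) -> float:
    # Pearson chi-square of observed byte counts against the uniform
    # expectation of len(data)/256 occurrences per byte value.
    n = len(data)
    expected = n / 256
    counts = [0] * 256
    for b in data:
        counts[b] += 1
    return sum((c - expected) ** 2 / expected for c in counts)

# Perfectly uniform counts give 0; heavily skewed data gives a huge value
assert chi_square_bytes(bytes(range(256)) * 2) == 0.0
assert chi_square_bytes(b"\x00" * 256) > 10000
```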
• Arithmetic mean value

• This is simply the result of summing all the bytes in the file and dividing by the file length. If the data are close to random, this should be about 127.5. If the mean departs from this value, the values are consistently high or low.
• Monte Carlo value for Pi

• Each successive sequence of six bytes is used as 24 bit X and Y co-ordinates within a square. If the distance of the randomly-generated point is less than the radius of a circle inscribed within the square, the six-byte sequence is considered a "hit". The percentage of hits can be used to calculate the value of Pi. For very large streams (this approximation converges very slowly), the value will approach the correct value of Pi (Pi=3.1415926535...) if the sequence is close to random.
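A sketch of the idea, using a quarter-circle variant that is equivalent in expectation to the inscribed-circle formulation; the hash-derived byte stream below is only a stand-in for the file under test:

```python
import hashlib

def monte_carlo_pi(data: bytes) -> float:
    # Interpret successive 6-byte groups as 24-bit X and Y coordinates in
    # the unit square; the fraction with x^2 + y^2 <= 1 estimates pi/4.
    scale = float((1 << 24) - 1)
    hits = total = 0
    for i in range(0, len(data) - 5, 6):
        x = int.from_bytes(data[i:i + 3], "big") / scale
        y = int.from_bytes(data[i + 3:i + 6], "big") / scale
        total += 1
        if x * x + y * y <= 1.0:
            hits += 1
    return 4.0 * hits / total

# SHA-256 outputs serve as a stand-in for a good random byte stream
data = b"".join(hashlib.sha256(i.to_bytes(4, "big")).digest() for i in range(200))
print(monte_carlo_pi(data))  # close to 3.14 for random-looking input
```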
• Serial correlation coefficient

• This quantity measures the extent to which each byte in the file depends upon the previous byte. For random sequences, this value (which can be positive or negative) will, of course, be close to zero. A non-random byte stream will yield a serial correlation coefficient on the order of 0.5. Wildly predictable data such as uncompressed bitmaps will exhibit serial correlation coefficients approaching 1.
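This can be sketched with the usual Pearson formula applied to each byte and its successor (pairing the last byte with the first so every byte has a successor, a simplifying assumption):

```python
def serial_correlation(data: bytes) -> float:
    # Pearson correlation between each byte and the next one, with the
    # last byte wrapped around to pair with the first.
    n = len(data)
    x = list(data)
    y = x[1:] + x[:1]                 # successor of each byte, wrapping
    sx = sum(x)
    sxx = sum(v * v for v in x)
    sxy = sum(a * b for a, b in zip(x, y))
    num = n * sxy - sx * sx           # y is a permutation of x, so sums match
    den = n * sxx - sx * sx
    return num / den if den else 0.0

# Monotonic data is highly predictable; the coefficient approaches 1
assert serial_correlation(bytes(range(256)) * 4) > 0.9
```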
NIST's statistical test suite

• NIST's statistical test suite contains 16 statistical tests for random number generators. To run all of the tests, the user needs to know how the PRNG works: the tests require, for example, the sequence length in bits and the block length. This test suite is not suitable for black-box testing where the generator is unknown. The suite performs the following tests:

• Frequency (monobits) test

• The focus of the test is the proportion of zeroes and ones for the entire sequence. The purpose of this test is to determine whether the number of ones and zeros in a sequence is approximately the same as would be expected for a truly random sequence. The test assesses the closeness of the fraction of ones to ½; that is, the number of ones and zeroes in a sequence should be about the same.
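The monobit test reduces to a closed formula: sum the bits as ±1 and compare against a normal reference distribution. A minimal sketch:

```python
import math

def monobit_p_value(bits):
    # NIST frequency (monobit) test: S = sum of +/-1 values;
    # p-value = erfc(|S| / sqrt(2n)). Small p means too imbalanced.
    n = len(bits)
    s = sum(1 if b else -1 for b in bits)
    return math.erfc(abs(s) / math.sqrt(2 * n))

# A perfectly balanced sequence passes; an all-ones sequence fails badly
assert monobit_p_value([0, 1] * 500) == 1.0
assert monobit_p_value([1] * 1000) < 0.01
```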
• Test for frequency within a block

• The focus of the test is the proportion of zeroes and ones within M-bit blocks. The purpose of this test is to determine whether the frequency of ones in an M-bit block is approximately M/2.
• Runs test

• The focus of this test is the total number of zero and one runs in the entire sequence, where a run is an uninterrupted sequence of identical bits. A run of length k means that a run consists of exactly k identical bits and is bounded before and after with a bit of the opposite value. The purpose of the runs test is to determine whether the number of runs of ones and zeros of various lengths is as expected for a random sequence. In particular, this test determines whether the oscillation between such substrings is too fast or too slow.
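A minimal sketch of the NIST runs statistic (assuming the frequency test has already passed, as the NIST suite requires before this test is applied):

```python
import math

def runs_test_p(bits):
    # NIST runs test: V = total number of runs; compare against the
    # expected 2*n*pi*(1-pi) runs, where pi is the fraction of ones.
    n = len(bits)
    pi = sum(bits) / n
    v = 1 + sum(1 for i in range(1, n) if bits[i] != bits[i - 1])
    num = abs(v - 2 * n * pi * (1 - pi))
    return math.erfc(num / (2 * math.sqrt(2 * n) * pi * (1 - pi)))

# Strict alternation oscillates too fast and fails; runs of length two
# hit the expected run count exactly
assert runs_test_p([0, 1] * 500) < 0.01
assert runs_test_p([0, 0, 1, 1] * 250) == 1.0
```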
• Test for longest run of ones in a block

• The focus of the test is the longest run of ones within M-bit blocks. The purpose of this test is to determine whether the length of the longest run of ones within the tested sequence is consistent with the length of the longest run of ones that would be expected in a random sequence. Note that an irregularity in the expected length of the longest run of ones implies that there is also an irregularity in the expected length of the longest run of zeroes. Long runs of zeroes were not evaluated separately due to a concern about statistical independence among the tests.
• Random binary matrix rank test

• The focus of the test is the rank of disjoint sub-matrices of the entire sequence. The purpose of this test is to check for linear dependence among fixed length substrings of the original sequence.
• Discrete Fourier transform (spectral) test

• The focus of this test is the peak heights in the discrete Fast Fourier Transform. The purpose of this test is to detect periodic features (i.e., repetitive patterns that are near each other) in the tested sequence that would indicate a deviation from the assumption of randomness.
• Non-overlapping (aperiodic) template matching test

• The focus of this test is the number of occurrences of pre-defined target substrings. The purpose of this test is to reject sequences that exhibit too many occurrences of a given non-periodic (aperiodic) pattern. For this test and for the Overlapping Template Matching test, an m-bit window is used to search for a specific m-bit pattern. If the pattern is not found, the window slides one bit position. For this test, when the pattern is found, the window is reset to the bit after the found pattern, and the search resumes.
• Overlapping (periodic) template matching test

• The focus of this test is the number of pre-defined target substrings. The purpose of this test is to reject sequences that show deviations from the expected number of runs of ones of a given length. Note that when there is a deviation from the expected number of ones of a given length, there is also a deviation in the runs of zeroes. Runs of zeroes were not evaluated separately due to a concern about statistical independence among the tests. For this test and for the Non-overlapping Template Matching test, an m-bit window is used to search for a specific m-bit pattern. If the pattern is not found, the window slides one bit position. For this test, when the pattern is found, the window again slides one bit, and the search is resumed.
• Maurer's universal statistical test

• The focus of this test is the number of bits between matching patterns. The purpose of the test is to detect whether or not the sequence can be significantly compressed without loss of information. An overly compressible sequence is considered to be non-random.
• Lempel-Ziv Complexity test

• The focus of this test is the number of cumulatively distinct patterns (words) in the sequence. The purpose of the test is to determine how far the tested sequence can be compressed. The sequence is considered to be non-random if it can be significantly compressed. A random sequence will have a characteristic number of distinct patterns.
• Linear complexity test

• The focus of this test is the length of a generating feedback register. The purpose of this test is to determine whether or not the sequence is complex enough to be considered random. Random sequences are characterized by a longer feedback register. A short feedback register implies non-randomness.
• Serial test

• The focus of this test is the frequency of each and every overlapping m-bit pattern across the entire sequence. The purpose of this test is to determine whether the number of occurrences of the 2^m overlapping m-bit patterns is approximately the same as would be expected for a random sequence. The patterns can overlap.
• Approximate entropy test

• The focus of this test is the frequency of each and every overlapping m-bit pattern. The purpose of the test is to compare the frequency of overlapping blocks of two consecutive/adjacent lengths (m and m+1) against the expected result for a random sequence.
• Cumulative sum (cusum) test

• The focus of this test is the maximal excursion (from zero) of the random walk defined by the cumulative sum of adjusted (-1, +1) digits in the sequence. The purpose of the test is to determine whether the cumulative sum of the partial sequences occurring in the tested sequence is too large or too small relative to the expected behavior of that cumulative sum for random sequences. This cumulative sum may be considered as a random walk. For a random sequence, the random walk should be near zero. For non-random sequences, the excursions of this random walk away from zero will be too large.
• Random excursions test

• The focus of this test is the number of cycles having exactly K visits in a cumulative sum random walk. The cumulative sum random walk is derived from the partial sums of the (0,1) sequence after it has been adjusted to (-1, +1). A random excursion of a random walk consists of a sequence of n steps of unit length, taken at random, that begins at and returns to the origin. The purpose of this test is to determine if the number of visits to a state within a random walk exceeds what one would expect for a random sequence.
• Random excursions variant test

• The focus of this test is the number of times that a particular state occurs in a cumulative sum random walk. The purpose of this test is to detect deviations from the expected number of occurrences of various states in the random walk.
• Strange attractors

• In his article Strange Attractors and TCP/IP Sequence Number Analysis, Michal Zalewski presented one way to examine the randomness of supposedly random numbers. He described a method of constructing spoofing sets based on phase space analysis and the presence of function attractors. The results of his examinations should not be considered a reliable metric for comparing the relative strength of the initial sequence number generators of different operating systems, but the tests do give a direction. The resulting 3D pictures help one judge whether the numbers can be predicted. Attractor analysis tools are also available for users to test their own random numbers.
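Zalewski's phase-space construction maps each initial sequence number (ISN) sample to a 3D point built from three consecutive first-order differences; a minimal sketch of that coordinate transform (the plotting itself is omitted, and the function name is ours):

```python
def phase_space_points(seq):
    """Delayed-coordinate transform used in attractor analysis:
    point n is (s[n]-s[n-1], s[n-1]-s[n-2], s[n-2]-s[n-3])."""
    return [(seq[n] - seq[n - 1],
             seq[n - 1] - seq[n - 2],
             seq[n - 2] - seq[n - 3])
            for n in range(3, len(seq))]
```

A strongly patterned generator collapses into a small attractor in this space: a pure counter (ISNs increasing by a constant) maps every sample onto one single point, while a good generator fills the space as a diffuse cloud.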

Identifying

• Identifying the used PRNG scheme based on collected and analyzed data
• Making a wild guess [e.g. a platform-provided PRNG?]

Predicting

• Chosen samples (case study)

### Results

• The sequence numbers that came with Strange Attractors and TCP/IP Sequence Number Analysis were also tested with another tool, ENT, which gave somewhat different results: ENT's chi-square test disqualified most of the operating systems.
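ENT's chi-square test compares the observed byte-value frequencies against the uniform expectation; a rough sketch of the underlying statistic follows (ENT itself additionally converts it into the percentage of time a truly random stream would exceed it, which is the figure it reports):

```python
def chi_square_bytes(data):
    """Chi-square statistic over byte values: sum over all 256 values of
    (observed - expected)^2 / expected, with expected = len(data) / 256."""
    expected = len(data) / 256
    counts = [0] * 256
    for byte in data:
        counts[byte] += 1
    return sum((c - expected) ** 2 / expected for c in counts)
```

A perfectly uniform byte distribution yields 0, while a stream stuck on a single value yields a statistic near the sample size times 256, which ENT flags as overwhelmingly non-random.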

• Strange Attractors and TCP/IP Sequence Number Analysis showed that, at that moment, OpenBSD-current (2.9 release) and Linux 2.2.1x got relatively good results. When the same operating systems were tested with ENT, the results were slightly different. OpenBSD-current also passed ENT's tests. Linux did not, because it failed the chi-square test badly. Solaris 7 (tcp_strong_iss=2) with a fixed source IP likewise passed all of ENT's tests but failed the attractor analysis, which reported a large 66 % attack feasibility. The results for Solaris 7 (tcp_strong_iss=2) with changing source IP addresses were the opposite: the attractor analysis gave 0 % attack feasibility and it therefore seemed to generate satisfactorily random numbers, while ENT's chi-square value of 97,5 % gave reason to believe that the sequence numbers are predictable. Only two of the tested operating systems clearly passed all of ENT's tests: OpenBSD-current and Solaris 7 (tcp_strong_iss=2) with a fixed source IP.
• Unfortunately, it was not possible to test the strange attractor data with the other test suites: there was not enough material for DIEHARD, and since the operation of the generators was unknown, NIST's test suite was not suitable either. That suite requires the sequence length in bits and a block length, both of which were unknown.

Case study

• Sampling analysis
• (Optionally identification)
• (Optionally prediction)
• Attractor analysis data tested using ENT
| Operating system | Attack feasibility % by attractor analysis | Entropy | Chi-square | Arithmetic mean value | Monte Carlo value for π | Serial correlation coefficient | Conclusion by ENT |
| --- | --- | --- | --- | --- | --- | --- | --- |
| AIX-4.3 | 100,00 | 6,73 | 0,01 | 97,65 | 3,576 | -0,1595 | predictable |
| BSDI-3.0 | 18,00 | 7,99 | 0,01 | 112,54 | 3,23 | -0,003258 | predictable |
| BSDI-4.0 | 18,00 | 7,999 | 0,01 | 127,156 | 3,163 | 0,001726 | predictable |
| FreeBSD-4.0 | 1,0 | 7,99 | 0,01 | 126,19 | 3,1369 | -0,0048 | predictable |
| HPUX10-0 | 100,00 | 6,119 | 23 | 92,34 | 3,57 | -0,0837 | predictable |
| HPUX10-1 | 100,00 | 7,26 | 0,01 | 99,2011 | 3,53 | -0,144 | predictable |
| HPUX10-2 | 100,00 | 7,2 | 0,01 | 98,48 | 3,587 | -0,1158 | predictable |
| HPUX10-20 | 100,00 | 6,72 | 0,01 | 96,07 | 3,5778 | -0,1354 | predictable |
| HPUX11-0 | 100,00 | 7,999 | 0,01 | 128,89 | 3,11 | -0,001185 | predictable |
| IOS-12.0 | 2,06 | 7,9988 | 0,01 | 127,69 | 3,1319 | 0,001591 | predictable |
| IOS-12.0-2 | 20,00 | 7,9980 | 0,50 | 127,215 | 3,1267 | 0,000573 | predictable |
| IOS-short | - | 7,97 | 0,01 | 128,0176 | 3,068 | -0,019268 | predictable |
| IRIX-short | 20,00 | 7,86 | 0,01 | 106,28 | 3,5265 | -0,081983 | predictable |
| Linux-2.2 | below 0,05 | 7,8776 | 0,01 | 132,25 | 2,79 | -0,001192 | predictable |
| Linux-2.2.5 | - | 7,8296 | 0,01 | 110,40 | 3,5 | -0,057136 | predictable |
| Linux-2.4 | - | 7,987736 | 0,01 | 127,8807 | 3,0957 | 0,000402 | predictable |
| MacOS | 89,00 | 7,997 | 0,01 | 128,81 | 3,10029 | -0,003215 | predictable |
| MacOS-X-1.0 | 17,00 | 7,99 | 0,01 | 127,93 | 3,1638 | 0,001190 | predictable |
| OpenBSD-2.8 | 3,05 | 7,99 | 0,01 | 127,4406 | 3,11474 | 0,000127 | predictable |
| OpenBSD-current | 0,00 | 7,995790 | 75,00 | 127,0587 | 3,144 | 0,001798 | random |
| Solaris7 | 100,00 | 7,9971 | 0,01 | 125,74 | 3,1688 | 0,001726 | predictable |
| Solaris7-ISS-0 | 100,00 | 6,711 | 0,01 | 154,45 | 2,384 | -0,122412 | predictable |
| Solaris7-ISS-1 | 2,08 | 7,994 | 0,01 | 126,434 | 3,1111 | -0,000024 | predictable |
| Solaris7-ISS-2 | 100,00 | 7,9993 | 99,00 | 127,37 | 3,1344 | -0,002421 | suspect |
| Solaris7-ISS-2-IP1 | 66,00 | 7,999006 | 25,00 | 127,5993 | 3,135656 | -0,002193 | random |
| Solaris7-ISS-2-IP2 | 0,00 | 7,998483 | 97,50 | 127,3678 | 3,13485 | 0,001699 | suspect |
| Solaris8 | 100,00 | 7,999442 | 0,10 | 127,19 | 3,1397 | -0,00648 | predictable |
| Solaris8-ISS-0 | 100,00 | 7,769873 | 0,01 | 127,054 | 3,1598 | -0,000752 | predictable |
| Solaris8-ISS-1 | 100,00 | 7,999 | 0,01 | 126,43 | 3,164 | -0,000889 | predictable |
| Solaris8-ISS-2 | - | 7,999651 | 97,50 | 127,41 | 3,1404977 | 0,000511 | suspect |
| Windows-2000 | 12,08 | 7,995 | 0,01 | 131,10 | 3,0252 | -0,003199 | predictable |
| Windows-95 | 100,00 | 6,91377 | 0,01 | 92,23 | 3,622 | -0,1127 | predictable |
| Windows-98SE | 100,00 | 6,9595 | 0,01 | 92,15 | 3,640 | -0,1169 | predictable |
| Windows-NT4-SP3 | 97,00 | 6,435 | 0,01 | 65,92 | 3,998 | -0,002068 | predictable |
| Windows-NT4-SP6 | - | 6,75 | 0,01 | 92,09 | 3,75 | -0,1054 | predictable |
| Windows-NT4-SP6a | 15 | 7,994 | 0,01 | 128,50 | 3,14915 | -0,001630 | predictable |

### Analysis

Good(?) generators

• Yarrow 

• Tiny  pp. 237-238
• Blum-Blum-Shub? (cryptographically secure, but period is not maximal! Statistically good or bad?)
• NIST approved random number generators: 

  o DSS, FIPS 186-2 appendix 3.1
  o DSS, FIPS 186-2 appendix 3.2
  o ANSI X9.31 appendix A
  o ANSI X9.62 annex A.4
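For reference, Blum-Blum-Shub iterates x_{i+1} = x_i² mod n with n = pq and p, q ≡ 3 (mod 4), emitting the least significant bit of each state. A toy sketch follows; the tiny primes are for illustration only, and a real deployment needs large secret primes and a seed coprime to n:

```python
def bbs_bits(p, q, seed, count):
    """Blum-Blum-Shub sketch: repeated squaring modulo n = p*q,
    one output bit (the LSB of the state) per squaring."""
    n = p * q
    x = seed * seed % n          # square the seed into a quadratic residue
    out = []
    for _ in range(count):
        x = x * x % n
        out.append(x & 1)
    return out

# Toy parameters only: 11 and 23 are both congruent to 3 mod 4.
bits = bbs_bits(11, 23, 3, 16)
```

Its security rests on the hardness of factoring n, which is why the primes must be large and secret in practice; with a tiny modulus such as 253 the state space is trivially searchable.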

Bad generators

• Linear congruential generators are susceptible to state attacks because they essentially reveal their entire internal state with every output. Also, if the coefficients of a linear feedback shift register are chosen carelessly, the period will not be long enough; in that case the PRNG's weakness does not depend on the seed. If the seed is known, even PRNGs such as Tiny and Blum-Blum-Shub are completely predictable. Linear congruential PRNGs are the simplest, and possibly the worst, of the mathematical library PRNG algorithms.
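The state-disclosure weakness is easy to demonstrate. The sketch below uses a textbook LCG (glibc-style constants, chosen here purely for illustration): because each output is the full state, one captured value lets an attacker reproduce every later output:

```python
def lcg(seed, n, a=1103515245, c=12345, m=2 ** 31):
    """Textbook linear congruential generator x' = (a*x + c) mod m.
    Each output value IS the new internal state -- the core weakness."""
    out, x = [], seed
    for _ in range(n):
        x = (a * x + c) % m
        out.append(x)
    return out

observed = lcg(seed=42, n=6)          # attacker captures these outputs
recovered_state = observed[0]         # a single output reveals the state
predicted = lcg(seed=recovered_state, n=5)
assert predicted == observed[1:]      # the rest of the stream is predicted
```

Cryptographic PRNGs avoid this by keeping the internal state larger than, and computationally unlinkable from, each output.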
• One analysis of PRNGs is included in Zalewski's Strange Attractors and TCP/IP Sequence Number Analysis. The analysis does not identify PRNGs but looks at the randomness of sequence numbers.



• John Viega, Gary McGraw. Building Secure Software: How to Avoid Security Problems the Right Way. (2002). Addison-Wesley.



• John Kelsey, Bruce Schneier, Niels Ferguson. (August 1999). "Yarrow-160: Notes on the Design and Analysis of the Yarrow Cryptographic Pseudorandom Number Generator". Counterpane. http://www.counterpane.com/yarrow.html.



• Paul Garrett. Making, Breaking Codes: An Introduction to Cryptology. (2001). Prentice Hall, Inc.









• An interdivisional project in the Information Technology Laboratory at NIST. (page last update: December 8, 2000). "Random Number Generation and Testing". http://csrc.nist.gov/rng/index.html.


