Thursday, July 31, 2008

Uniform random numbers

Random numbers are used extensively in modern computing, for example they are used in testing programs, simulations, cryptology and in randomized algorithms like randomized primality testing.

True randomness on a computing machine is impossible, because the numbers are generated by an algorithm, but it is sufficient to produce numbers that appear to be random, these are called pseudo random numbers, they satisfy many of the properties of true random numbers. Algorithms used to generate random numbers are called pseudo random number generators (PRNG). To be sure that the numbers are "random enough", they must pass some statistical tests.

Uniformly distributed random numbers
A uniformly distributed random sequence A1, A2, ..., Am generated from X1, X2, ..., Xn satisfies:
  1. Ai is equally likely to be X1, X2, ..., Xn, i = 1, 2, ..., m.
  2. The average of the generated numbers is (∑i = 1 Ai)/ m.
After generating n numbers, the previous two properties hold, more stronger properties do not:
  1. The sum of any two consecutive numbers is equally likely to be even or odd.
  2. When n numbers are generated, there should be duplicates in the sequence.
Unfortunately, the number of times the sum of two consecutive integers is even is always more than the number of times the sum is odd, and the sequence is duplicate free. All pseudo random number generators fail some statistical tests.

The simplest uniform generator: Linear Congruential Generator
This generator was first introduced in 1951, it is a pseudo random number generator, the generated sequence X1, X2, ..., Xn satisfies:

Xi+1 = A[Xi (mod M)]

The (i+1)th number is generated by multiplying its previous number by a constant
A and computing the remainder of the division of the result be another constant M. It is easy to see that all of the generated numbers are less than M. To start generating the numbers, we must start with an initial value called the seed, say X0, but take care that if X0 = 0 then, you have a sequence of zeroes (which is not random).

If M is prime
Notice that if M is prime, then no zeroes are generated because, primes have no non-trivial divisors, let M = 7, A = 5 and X0 = 1, the first 8 numbers are:
5, 25, 20, 30, 10, 15, 5, 25, ...

value Notice that the sequence is repeated after 6 (which is M - 1) numbers are generated, this is called the period of the sequence, the choice of A matters, not all values of A will generated a full-period sequence, but several choices of A are valid. A full-period is achieved of all the non-zero integers less than M are generated.

To generate a long sequence with a full-period, choose M to be a large prime, very popular choice of A and A are 231 - 1 and 48,271.

Overflow problem
If we use 32-bit integers in our computation, overflow will occur after a sufficient number of values are generated, so, we would no longer be sure if a full-period is achieved. A solution to this problem is to use 64-bit integers (called long in Java), but this is inefficient computationally, a slight change in the main equation helps us overcome the overflow problem:

Xi+1 = A[Xi (mod ℑ)] - ℜ⌊Xi / ℑ⌋ + Mδ(Xi)

Where ℑ and ℜ are the quotient and the remainder of M / A.
  • δ(Xi) = 0, if A[Xi (mod ℑ)] - ℜ⌊Xi/ℑ⌋ ≥ 0
  • δ(Xi) = 0, if A[Xi (mod ℑ)] - ℜ⌊Xi/ℑ⌋ ≤ 0
When the difference between the first two terms is less than zero, then δ(Xi) is defined to be 1, and when the difference is greater than 0, δ(Xi) is defined to be 0.

Pseudo code
M ← 19
A ← 5
ℑ ← ⌊M/A⌋
ℜ ← M (mod A)
for i = 1 to i = n step 1
begin
  • Xi ← A×[Xi-1 (mod ℑ)] - ℜ×⌊Xi-1/ℑ⌋
  • if Xi ≤ 0 then Xi ← Xi + M
end

References
  1. Data Structures & Problem Solving Using Java (1998).
  2. Wikipedia.

Wednesday, July 30, 2008

A test for linear independence

Long live Gram-Schmidt algorithm
What is meant by dependence and independence here is linear dependence and independence.
Given a set S of vectors, and a inner product function.
Required to decide: S is an independent set.

I've always thought that deciding whether a set is dependent or not requires computing determinants which is not an easy task, but there is a more reliable method, which is the Gram-Schmidt algorithm.

The whole idea is that, applying the algorithm on a dependent set will always result in a zero vector in the output set, what is left now is to check if the zero vector exists in the output set.

Example
Suppose we are now in the 2-D Euclidean space (R two), given the set S:
S = { v1=(1,2) , v2=(2,4) }
It is easy to see that S is a dependent set because, (2,4) = 2(1,2).
Applying the Gram-Schmidt algorithm on S:
  1. u1 = v1 = (1,2)
  2. u2 = v2 - Proj(u1 , v2)
    = (2,4) - < (1,2) , (2,4) > / < (1,2) , (1,2) >
    = (2,4) - (10 / 5) x (1,2)
    = (2,4) - 2(1,2) = (0,0)
The output set is I = { u1=(1,2) , u2=(0,0) }, which contains a zero vector.
Checking for the zero vector can be done more efficiently.
The expected run-time for the Gram-Schmidt algorithm is cubic, which is the same that of the most efficient determinant algorithm (probably the Gauss Elimination method), but is more reliable and it always works.

Muhammad El-Halaby