Random numbers in C++

Throughout school and early programming experience, I always got random numbers like this:

double irand = (double) rand() / RAND_MAX;

This is the method provided by all the textbooks and tutorials for beginners. However, almost all of them include a mention of the fact that the rand() function uses a relatively weak pseudo-random number generator, and that this method should not be used in problems requiring industrial strength random number generation. So my questions:

How good is the rand() function? Does it provide an equal probability of returning all positive integers less than RAND_MAX?

Is there any reasonably painless way to get an industrial strength random number generator into my program?

rand() is industrial strength. Unless you really know what you’re doing, or you have an external source of entropy, you’re only going to make things worse by trying to improve on it.

Wel, if you want something really “industrial strength”, you won’t want to be using a software pseudorandom number generator anyway. (cf. von Neumann’s famous quote on the subject :slight_smile: ) You’ll want to use true random numbers from an outside source. Here’s a list of facilities provided in the OpenBSD operating system, for example.

http://pintday.org/hack/crypto/random.shtml

Here’s a pseudorandom number generator that looks better than “rand”:

http://www.math.keio.ac.jp/~matumoto/emt.html

And finally, here’s a site that generates some true random bits online

http://www.fourmilab.ch/hotbits/

Enjoy!

On preview: ultrafilter: do you have a cite for rand being good? I too had always heard that rand wasn’t that great. No disrespect to you intended, of course.

No cite right now (I’ll dig tomorrow), but I’ve always heard that rand() is good enough for most people. If you’re doing security work, you’ll want something stronger, but otherwise there’s no need.

I’m not a programmer but I have to ask a slight hijack question here while we discuss random numbers. But I’ve heard it true that a computer can’t generate a completely random number on its own. Is that true or not?

That’s pretty much true. No matter what algorithm you program in to the computer, a random number function with no “outside influence” will essentially boil down to a function that takes one number as input, and outputs another number. And since the instructions are fixed, inputting the same number will always give the same output. So if you know the first “seed” input value, you will always be able to get the same sequence of random numbers back.

Moreover, software random number generators must have a “period” after which they will start repeating themselves, which real random numbers obviously don’t. Good algorithms have a long period, on the order of 2^32 numbers (I think), but still, it’s not really random.

Oh, then I agree with you. I’d definitely say it’s good enough for most everyday applications - I use it myself for stuff like rolling dice in games, etc. So, no hurry for the cite (though it would be interesting to see how rand does stack up).

I guess the real question here is: ITR champion, what are you using the random numbers for?

Schneier says they’re pretty bad, but doesn’t comment on the specifics. It’s surprisingly difficult to find much online.

There is a whole hierarchy of various random number generators in Unix, and to be perfectly clear, rand() is at the very bottom of the barrel. Nothing worse in any Unix library.

Check your system libraries for other random number generators, e.g., randu, random, etc.

And for a seed, use a hash of the whole system clock, not just the low bits. Use the highest precision clock call available.

I remember that this was really obvious with QBasic, but once you used “Randomize Timer”, the numbers became pretty “random”, imho. That said, I still don’t believe true randomness exists, so anything that “seems” to be random like a person rolling a die or a randomize timer function is good enough for me.

For this particular project, I’m using them to create a variant of a random graph known as a scale-free network. (You can read about the concept here.) My current network sizes are only about 50,000 nodes and 150,000 edges, so I don’t think there’s any real reason to be concerned about the precision of the random number generation. I may move up to much larger networks at some point. Mostly it was just an FYI-type quesiton, though.

Curiously, the C Standard doesn’t state how rand() needs to be implemented. So the function can (and apparently does) vary between implementations.

But in general, linear congruential random number generators (definition in the link below) aren’t very good, and rand() is almost always implemented that way.

More information than you’d ever want to know about this topic is at: this link .

Technically, all numbers produced by deterministic process (which might mean all processes above the quantum level – I’m way out of my depth at that point) are “pseudorandom” rather than random. But the usual suspect is not nearly so complicated as you probably guess. This one’s from the paper above:



SEED = (1103515245*SEED + 12345) mod 2**31 
X = SEED 

Note that given any number in the sequence, you can trivially (assuming you know where 32-bit integers overflow) produce the next one: practically a definition of non-randomness. The constants above aren’t magic: others will work, but if they’re not carefully chosen you get very small periods (number of generations until a repeat), or obviously sequential numbers (consider replacing 1103515245 and 12345 with 1 & 1, respectively, for example).

Also interesting, if you know (or guess) that a generator is of this form, you can feed it a couple numbers and COMPUTE the constants used from the sequence. This is one of the “bad” parts: consider such an algorithm used in a casino game, for example. You (as the casino owner) really, really, really don’t want someone watching a couple hands of video poker and being able to predict the cards!

A more common problem is that people misuse this: Consider a 16-bit system, where rand() produces numbers from 0 to 32767. You’re a programmer who needs ransom numbers from one to 10000. Easy enough:


X = rand() % 1000;

Admit it, we’ve all done this. But note that 32767 isn’t evenly divisible by 1000,
which means that even if rand() produces numbers that are equally likely (which it likely does – often each number is produced once in a generation), in YOUR code, numbers from 0 - 767 are more likely (by one time in 33) than numbers from 768 - 999. That’s a relatively small difference, but generate enough numbers and people will start noticing that your computer game’s aliens tend to veer left.

More accurate is:


X = (int)(double(rand()) / double(RAND_MAX) * 1000);

Although there’s still a potential problem here, since 1000 is now a possibility (i.e. you’re generating 0 to 1000, not 0 to 999. Assuming people notice, they then try the hack:


X = (int)(double(rand()) / double(RAND_MAX) * 1000) % 1000;

Which is actually worse: 0 is now TWICE as likely as any other number in the sequence, since 0 maps to it, and so does 1000.

It’s actually pretty hard to get this right, surprisingly so for a single-line algorithm.

A last note, since I’ve rambled a long time: the “Numerical Recipes in <your favorite computer language>” books provide algorithms and explanations for a number of alternative random number generators.

Dangnabbit. The instance of “10000” above should be “1000”, just like in all the other examples. Sorry.

I agree that rand() is generally “good enough,” but I’d also like to emphasize this:

If you don’t set the seed manually, you’ll get the same sequence of numbers each time you run the program. It’s great when you’re building the program, as it’s nice to get the same results every time, but not so great in a finished project.

(Note: I haven’t used rand() in C++ lately, but I assume it works the same way as in C)

I understand that every Intel CPU has for some time included a random number generator based on… the thermal value of a resistor? Something like that. Don’t know of any libraries that use it, and I’d have to be really sure of its random uniformity across all chip variants before I used it in anything important.

The Boost library has a huge array of random number generators, it’s quite impressive: www.boost.org

Krueger numbers: 65539 This one is esoteric, but notorious. It is the generator for a sequence of random numbers that Donald Knuth (The Art of Computer Programming, vol. 2) claimed “its very name RANDU is enough to bring dismay into the eyes and stomachs of many computer scientists!”

When I was in grad school I needed to do some heavy-duty Monte Carlo calculations. These require high-quality RNGs. Not cryptographic-grade, but each data point ended using more than 10[sup]8[/sup] random numbers. Obviously, system RNGs with periods of only 2[sup]32[/sup]~10[sup]9[/sup] would not be appropriate.

I did some literature research, and ended up with a paper by Marsaglia and Zaman (Comp. in Phys. 8, 117 (1994), sorry couldn’t find an online version). I coded it, and it worked well. It combines a linear congruential and a Fibonacci sequence for an overall period of around 2[sup]192[/sup]~10[sup]57[/sup]. BTW, all the better RNGs carefully combine two or more classes of simpler generators. This avoids the weaknesses of any single generator.

Here is a copy of the core of the C++ code. It’s written by me and is public domain, so feel free to use or abuse it (a version is published in my dissertation, but you’d never find it). The kernel is from Marsaglia and Zaman (recoded into C++). Each object of Random_t defines a separate instance of the RNG. The state can be initialized in its entirety, with a single integer, or with bytes pulled from /dev/urandom. An object can generate a uniformly distributed integer (0 to 2[sup]32[/sup]-1), a uniformly distributed double (0 <= X < 1.0), or a normally distributed double (mean = 0.0, variance = 1.0).

Class header file Random.h:



// -- Random.h

#ifndef __Random_h__
#define __Random_h__

#include <vector>

class Random_t
 {
  public:
  typedef unsigned int Int_t; // MUST be 32 bits
  typedef std::vector<Int_t> State_t;

  private:
  Int_t
    n, // multiplicative modulo seed
    v, w, x, y, z, // Fibonacci modulo seeds
    c; // Fibonacci carry bit

  public:
  Random_t(void); // initialize from /dev/urandom
  Random_t(Int_t S); // initialize from one integer
  Random_t(const State_t& S); // initialize full state

  Int_t operator()(void) {return next();} // random integer
  double Uniform(void); // uniform distribution, 0.0 <= X < 1.0
  double Normal(void); // normal distribution, mean = 0.0, variance = 1.0

  State_t State(void) const; // returns current state vector

  private:
  Int_t next(void); // internal state updater
 };

#endif // __Random_h__

// -- Random.h


Class code file Random.cpp:



// -- Random.cpp

#include <cmath>
#include <fstream>
#include "Random.h"

namespace // private namespace
{

template<typename Type_t>
void ReadBinary(std::istream& S, Type_t& X)
 {
  S.read((char*)(&X),sizeof(Type_t));
 }

} // private namespace

Random_t::Random_t(void)
 {
  std::ifstream fin("/dev/urandom");
  ReadBinary(fin,n);
  ReadBinary(fin,v); ReadBinary(fin,w); ReadBinary(fin,x);
  ReadBinary(fin,y); ReadBinary(fin,z); ReadBinary(fin,c);
 }

Random_t::Random_t(Int_t S): n(S), v(n+2), w(v+3), x(w+5), y(x+7), z(y+11),
  c(z+13)
 {
  for(size_t i=0; i<16; ++i) z=next(); // mix things for a better state
 }

Random_t::Random_t(const State_t& S)
 {
  State_t s=S;
  while(s.size()<7) s.push_back(s.size()); // fill out if state is too small
  n=s[0]; v=s[1]; w=s[2]; x=s[3]; y=s[4]; z=s[5]; c=s[6];
 }

double Random_t::Uniform(void)
 {
  static const double base=1.0/(Int_t(-1)+1.0);
  return (next()*base+next())*base; // we need more than 32 bits
 }

double Random_t::Normal(void)
 {
  static bool isCache=false;
  static double cache=0.0;
  if(isCache)
   {
    isCache=false;
    return cache;
   }
  else
   {
    double v1, v2, vv;
    do
     {
      v1=Uniform()*2.0-1.0;
      v2=Uniform()*2.0-1.0;
      vv=v1*v1+v2*v2;
     }
      while(vv>=1.0 || vv==0.0);
    vv=sqrt(-2.0*log(vv)/vv);

    isCache=true; cache=v1*vv;
    return v2*vv;
   }
 }

Random_t::State_t Random_t::State(void) const
 {
  State_t s(7);
  s[0]=n; s[1]=v; s[2]=w; s[3]=x; s[4]=y; s[5]=z; s[6]=c;
  return s;
 }

Random_t::Int_t Random_t::next(void)
// from G. Marsaglia, A. Zaman, Comp. in Phys. 8, 117 (1994)
// combines two independent pseudo-random number generators
// for a net period of about 2^192 (about 10^58)
// DO NOT TINKER with this function!!!  (it breaks easily)
 {
  Int_t s=v+c; c=(y<s); s=y-s; if(c) s-=10;
  v=w; w=x; x=y; y=z; z=s;
  n=n*69069+1013904243;
  return s+n;
 }

// -- Random.cpp


And some test code TestRandom.cpp:



// -- TestRandom.cpp

#include <cstdlib>
#include <iostream>
#include "Random.h"

int main(int ArgNum, const char* Args[])
 {
  std::size_t num=10;
  if(ArgNum>1) num=std::atol(Args[1]);

  Random_t rnd1, rnd2, rnd3;

  std::cout.precision(21);
  for(; num; --num)
    std::cout<<rnd1()<<"	"<<rnd2.Uniform()<<"	"<<rnd3.Normal()<<std::endl;

  return 0;
 }

// -- TestRandom.cpp


It even gets messier: the above code introduces a very serious bias if the range is close to (but not equal to) RAND_MAX. Say RAND_MAX is 127, and you want number from 0-99. Since you have 128 values mapped to 100, you will have 28 values which have twice the probability of showing up. The only way to remove this bias entirely is to discard any result greater than the highest integer multiple of the maximum of your range. Then you can use the remainder operator (% – note that this is not the modulo operator) without error, like so:


int RandVal(int max_value)
{
  //returns range 0-(max_value-1)
  int max_rand = int( floor(RAND_MAX/double(max_value))*max_value );

  int x;
  while ((x = rand()) >= max_rand) { }
  x %= max_value;

  return x;
} 

Of course, you have no guarantees on worst case time for the function to return.

I think there’s only a very few people who understand pseudo-random number generators (how they work and where they can be used meaningfully). One of the them is Donald Knuth. You can find some good work of his here including source code for some pseudo-random generators.