Fast Splittable Pseudorandom Number Generators

September 12, 2013
2:50 pm - 4:00 pm
Halligan 102
Speaker: Guy L. Steele, Jr., Oracle Labs

Abstract

Sometimes it is useful for a computer program to be able to "roll the dice" or "flip a coin"—that is, behave as if can make a random choice, or more generally, a series of random choices. One approach is to provide a computer with hardware that uses some physical process (such as radioactive decay) to make such choices. Another approach is to use a (deterministic) mathematical algorithm to generate a sequence of numbers that "appear" to be random—that is, the sequence is highly likely to satisfy many statistical criteria that would be expected of a truly random sequence. Such a deterministically produced sequence is said to be "pseudorandom". This approach has a long history, going back to the work of John von Neumann in the 1940s. Many algorithms have been proposed, with various trade-offs between speed and quality.

Most algorithms for generating pseudorandom sequences have been sequential. Today, with multicore computers and multithreaded GPUs quite common, an interesting problem is to be able to use many tasks to collectively generate pseudorandom sets or sequences. It does not suffice simply to have each task use a copy of the same algorithm, because this will generate duplicate values and, worse yet, duplicate subsequences. Additional mathematical theory is required to guarantee that a collection of tasks (probably) produces results of the same quality as a single talk using a sequential algorithm.

In this talk we describe object-oriented "splittable" generators of pseudorandom numbers having two methods: one to generate a new value in a pseudorandom sequence, as is customary, and another to "split" the generator into two generators that can be used in parallel. The goal is that a set of generator objects produced by recursive splitting can be used collectively to generate a sequence of pseudorandom numbers having the same good statistical properties as a sequence produced by a single such generator.

We have developed such an algorithm suitable for use in the Java collection libraries as an improvement over the existing class java.util.Random: the class SplittableRandom is roughly an order of magnitude faster than java.util.Random when used as a single generator in sequential code, and furthermore has much better parallel speedup behavior than java.util.Random when used to produce parallel streams to be operated upon by map and reduce methods. As for quality, sequences produced by SplittableRandom pass the DieHarder test suite (sequences produced by java.util.Random do not). While we believe SplittableRandom does not produce sequences of cryptographic quality, it appears to be a good alternative to java.util.Random for "everyday" use, such as in Monte Carlo methods, and we hope to include it in JDK8. In this talk we discuss our original inspiration for this algorithm (a 2012 paper by Leiserson, Schardl, and Sukha) and the sequence of engineering steps and intuitive leaps we took to hill-climb our way to our final version. (Joint work with Doug Lea, Claire Alvis, and Christine Flood.)

BIO Guy L. Steele Jr. is a Software Architect at Oracle. He received his A.B. in applied mathematics from Harvard College (1975), and his S.M. and Ph.D. in computer science and artificial intelligence from M.I.T. (1977 and 1980). He has also been an assistant professor of computer science at Carnegie-Mellon University; a member of technical staff at Tartan Laboratories in Pittsburgh, Pennsylvania; and a senior scientist at Thinking Machines Corporation in Cambridge, Massachusetts. He joined Sun Microsystems in 1994 as a Distinguished Engineer and was named a Sun Fellow in 2003. Sun Microsystems was acquired by Oracle in 2010, and he is now a member of Oracle Labs.

He is author or co-author of five books: _Common Lisp: The Language_, _C: A Reference Manual_, _The Hacker's Dictionary_, _The High Performance Fortran Handbook_, and _The Java Language Specification_. All are still in print. He has been praised for an especially clear and thorough writing style in explaining the details of programming languages.

The Association for Computing Machinery awarded him the 1988 Grace Murray Hopper Award and named him an ACM Fellow in 1994. He was elected a Fellow of the American Association for Artificial Intelligence in 1990. He led the team that received a 1990 Gordon Bell Prize honorable mention for achieving the fastest speed to that date for a production application: 14.182 Gigaflops. He was also awarded the 1996 ACM SIGPLAN Programming Languages Achievement Award. In 2001 he was elected to the National Academy of Engineering of the United States of America. In 2002 he was elected to the American Academy of Arts and Sciences. In 2011 he was named an IEEE Fellow.

He has served on accredited standards committees X3J11 (C language) and X3J3 (Fortran), and served as chairman of X3J13 (Common Lisp). He was also a member of the IEEE committee that produced the IEEE Standard for the Scheme Programming Language, IEEE Std 1178-1990. He was a representative to the High Performance Fortran Forum, which produced the High Performance Fortran specification in May, 1993.

At Oracle Labs, he is responsible for research in language design and implementation strategies, and architectural and software support for programming languages.