Random Uniform Experiment for the Masses

Updated 2019-06-20

Abstract

Anyone can fire up a random number generator (RNG) of their choice and produce a massive amount of high-quality random numbers for private consumption. There are, however, applications where this approach does not work: namely, when there is a practical need to convince the masses that a given sequence of numbers is truly random, in the sense that its generation was unbiased by practical and/or political concerns, and was not doctored to accomplish some specific goal. This document offers a blueprint for a distributed network of independent agents who employ encryption to combine their individual contributions into a single random sequence in a way that makes it impractical to game the outcome. As long as just one of the individual contributors is honest and secure, the result is truly random in the sense above, and this lack of bias can be trusted by any member of the public who happens to trust that at least one contributor is honest and secure. The process is described with enough detail to implement a fully automated, user-facing distributed network on top of a bare-bones stack: any GNU- or BSD-like operating system which provides shell scripting, strong encryption, a Web browser, and a Web server.

Motivation

There are some applications where it is desirable to produce a sequence of random bits in such a way that the public at large will find it easier to be convinced of the lack of bias. One way to do that is by enlisting the help of trusted public figures and/or experts, who will monitor the random experiment and testify to its outcome. This, however, is very slow, very expensive, and open to an attack by the party that conducts the experiment. One way to sidestep the last issue is to conduct several independent experiments and then mix the outcomes, but that in turn exacerbates the costs, while creating a whole new race problem: a contributor who can see some or all of the other entries can attempt to doctor their own entry in a way that biases the final result in a specific way. To address these concerns, we will outline a procedure whereby independent participants submit independent entries, which are then mixed into the final outcome. Strong symmetric encryption is used by each participant to conceal his/her entry until the moment when all participants have finished their submissions. Strong public-key signatures authenticate the entries and provide the assurance that no entry can be altered after the symmetric keys are disclosed.

One application of such publicly trusted randomness has to do with "magic seeds" within cryptographic tools and PRNGs. Known also as nothing-up-my-sleeve numbers, these are somewhat or totally arbitrary constants which are supposed to provide a generic (at the very least, unbiased) starting point. The behavior of the cryptographic algorithm evolves from that point, possibly guided also by an external seed. There is a well-justified concern that all such magic seeds are suspect, in the sense that they may have been doctored by their authors to look generic while in fact being designed to produce a highly non-generic state. This secret knowledge can then be used by the creators of the algorithm to break encryption and/or bias PRNGs in ways known only to them. A publicly trusted network of independent RNGs can alleviate these concerns if magic seeds can be derived from its output in a cryptographically secure way. Far from being a purely theoretical concern, magic seeds are found in many heavily used PRNGs, starting with /dev/urandom in many Linux kernels.
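
To make the idea concrete, here is a hedged sketch of one such derivation, assuming GNU coreutils and a hypothetical file outcome.bin holding a published outcome of the experiment described below: the magic seed is taken to be the SHA-256 hash of the outcome prefixed with an algorithm-specific context string, so anyone can recompute it.

  # Derive a 256-bit nothing-up-my-sleeve constant from a published outcome.
  # "my-cipher-sbox-v1" is a made-up context string; outcome.bin is hypothetical.
  printf 'my-cipher-sbox-v1' | cat - outcome.bin | sha256sum | cut -d' ' -f1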

Another (rather obvious) application of publicly trusted randomness is the standard gambling scenario, in which several parties want to flip one coin, but no party trusts the others to do it right. Using the procedure described here, each participant flips his/her own personal coin, and then they mix the results into a final outcome trusted by all of them. Alternatively, they can outsource the experiment to a distributed network they all happen to trust.

Experiment Description and Facilitation

We will work with a number of independent participants, each of whom will contribute their own version of a random sequence; these versions will later be mixed to produce the final outcome. Participants have to agree on the length of the binary sequence, and each participant has to be able to verify the public key signatures of all other participants. In practice, each participant can be a computer connected to the Internet. All of the logic described below can be easily implemented using pretty much any OS shell, such as bash, so that the entire process is automated and repeated at regular intervals.
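
As a sketch of the one-time setup this requires, assuming gpg and hypothetical key files and identities, each participant imports the public keys of all the others and checks the fingerprints against copies obtained out of band:

  # Import fellow participants' public keys (file names are hypothetical).
  gpg --import bob.pub.asc carol.pub.asc
  # Compare the printed fingerprint against an out-of-band copy.
  gpg --fingerprint bob@example.org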

We will also use a host to facilitate communication. The host will not generate or sign anything, and its only role is simplifying the communication network topology, as well as the explanations that go with it. The host can be replaced in a straightforward way by a distributed communication protocol, if some application requires that.

The outcome of the experiment is a binary sequence of agreed-upon length which is supposed to be a nothing-up-my-sleeve number, along with the crypto-signed history of the experiment.

  1. Each participant generates a random binary sequence of the agreed-upon length, encrypts it with a symmetric cipher, and publishes the ciphertext (see the participant-side sketch after this list). The generation step is unique to each participant and falls beyond the scope of this article. For a symmetric cipher, one can use gpg, which provides more than 10 of them. Publishing can be done via a Web server such as lighttpd, or the ciphertext can be submitted to the experiment host using curl.
  2. The host collates ciphertext submissions into a single file phase1.tar and publishes it using a Web server.
  3. Each participant downloads and saves phase1.tar with something like wget, verifies that the archive contains their entry with userland tools like bash and diff, signs phase1.tar with a public key cipher (gpg has several), and publishes the signature.
  4. The host collects all signatures, collates phase1.tar and its signatures into a single file phase2.tar, and publishes it. No new entries can be added after this point, or else gaming the outcome becomes possible. Note that phase2.tar contains all encrypted entries, as well as every participant's signature of this specific collection of encrypted entries, so any subsequent change to any of the entries, or to the collection as a whole, will be detectable by anyone.
  5. Each participant saves phase2.tar, extracts phase1.tar and its signatures, and makes sure that it is the same file as before and that all participants have signed it correctly. If everything looks OK, the participant publishes his/her symmetric key.
  6. The host collects the symmetric keys, decrypts the binary sequences, mixes them (XOR will do), and publishes the final outcome and phase2.tar in a public-friendly form, along with all of the symmetric keys, so that the entire experiment can be verified by any member of the public who also happens to trust some or all of the participants (see the host-side sketch after this list).
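
To make the participant side concrete, here is a minimal participant-side sketch in bash covering steps 1, 3, and 5. The host URL, the upload endpoints, the file-naming conventions, and the 32-byte entry length are all assumptions made for illustration and are not prescribed anywhere in this article; /dev/urandom is merely a placeholder for whatever RNG a participant prefers.

  #!/bin/bash
  # Participant-side sketch of one round (steps 1, 3, and 5). Hypothetical
  # host, endpoints, and file names; not a definitive implementation.
  set -eu
  HOST=https://rune.example.org   # hypothetical experiment host
  LEN=32                          # agreed-upon entry length, in bytes
  ME=alice                        # this participant's identifier

  # Step 1: generate an entry and a long one-time key, encrypt, publish.
  head -c "$LEN" /dev/urandom > entry.bin                  # placeholder RNG
  head -c 64 /dev/urandom | base64 | tr -d '\n' > key.txt  # one-time symmetric key
  gpg --batch --yes --pinentry-mode loopback --symmetric --cipher-algo AES256 \
      --passphrase-file key.txt --output "$ME.entry.gpg" entry.bin
  curl -sS -F "file=@$ME.entry.gpg" "$HOST/submit/entries"

  # Step 3: fetch the collated ciphertexts, confirm our entry is intact, sign.
  curl -sSO "$HOST/phase1.tar"
  mkdir -p phase1 && tar -xf phase1.tar -C phase1
  cmp "phase1/$ME.entry.gpg" "$ME.entry.gpg"               # set -e aborts on mismatch
  gpg --batch --yes --detach-sign --output "$ME.phase1.sig" phase1.tar
  curl -sS -F "file=@$ME.phase1.sig" "$HOST/submit/signatures"

  # Step 5: re-check phase1.tar and every signature, then disclose the key.
  curl -sSO "$HOST/phase2.tar"
  mkdir -p phase2 && tar -xf phase2.tar -C phase2
  cmp phase2/phase1.tar phase1.tar
  for sig in phase2/*.phase1.sig; do
      gpg --verify "$sig" phase2/phase1.tar                # signers' keys imported beforehand
  done
  curl -sS -F "file=@key.txt" "$HOST/submit/keys"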
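
The host side of step 6 can be just as plain. The host-side sketch below assumes the disclosed keys were saved next to the ciphertexts under matching, hypothetical names, and mixes the plaintext entries with a byte-wise XOR:

  #!/bin/bash
  # Host-side sketch of step 6: decrypt every entry and XOR them together.
  set -eu
  LEN=32    # agreed-upon entry length, in bytes

  # Decrypt each entry with its disclosed key (NAME.entry.gpg / NAME.key.txt
  # is an assumed naming convention).
  for ct in *.entry.gpg; do
      name=${ct%.entry.gpg}
      gpg --batch --yes --pinentry-mode loopback --decrypt \
          --passphrase-file "$name.key.txt" --output "$name.entry.bin" "$ct"
  done

  # XOR all plaintext entries, one byte at a time, into the final outcome.
  hex=""
  for i in $(seq 0 $((LEN - 1))); do
      acc=0
      for pt in *.entry.bin; do
          byte=$(xxd -p -s "$i" -l 1 "$pt")   # one byte of this entry, in hex
          acc=$(( acc ^ 0x$byte ))
      done
      hex="$hex$(printf '%02x' "$acc")"
  done
  printf '%s' "$hex" | xxd -r -p > outcome.bin
  xxd -p outcome.bin                           # the outcome, in human-readable hex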

Discussion

The experiment is extended in time, and it requires a lot of communication, cooperation, and coordination. Because of that, it is rather brittle when it comes to participants doing something weird or dropping out midway. In most cases, the only thing to do when an error is detected is to toss everything and start over, which is why full automation is highly desirable. For example, if one of the participants refuses to disclose the symmetric key at step (5), then the final result becomes suspect: having seen all the other keys, this participant has the luxury of either disclosing or not disclosing their entry, thereby gaining the ability to choose between two different final outcomes, regardless of how entries are mixed. The best way out is to discard everything, and do whatever is necessary to improve reliability for the next round.


In general, all tools used for this project have to be as cryptographically strong as possible. The symmetric cipher, for example, has to allow for long keys, and the implementation of the process should enforce a minimum key length. This cipher should also be highly resistant to attempts to craft a ciphertext that can be decrypted into several different plaintexts using several different keys.
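
As an illustration of what enforcing a minimum key length could mean in practice, the automation might simply refuse to accept a disclosed key below some agreed floor; the 64-character threshold below is an assumed policy, not something fixed by this article:

  # Reject a disclosed symmetric key shorter than the agreed floor.
  MIN_KEY_CHARS=64                             # assumed policy
  if [ "$(wc -c < key.txt)" -lt "$MIN_KEY_CHARS" ]; then
      echo "disclosed key is too short; discard the round" >&2
      exit 1
  fi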

At the same time, some issues can be fixed on the fly without compromising the integrity of the result. For example, if an entry is decrypted successfully but its size is wrong, then it is OK to trim or pad it in an agreed-upon way, since no entry can ever be disqualified based on its contents.
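
One possible, entirely hypothetical, convention: truncate to the agreed length and zero-pad anything missing. Zero padding cannot bias the XOR mix, since the padded bytes leave every other entry's bits untouched:

  # Normalize a decrypted entry to the agreed length of LEN bytes.
  LEN=32
  head -c "$LEN" entry.bin > normalized.bin                 # drop any excess bytes
  size=$(wc -c < normalized.bin)
  if [ "$size" -lt "$LEN" ]; then
      head -c $((LEN - size)) /dev/zero >> normalized.bin   # zero-pad the rest
  fi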

At least some of the ciphers implemented by gpg have magic seeds in them. It is far from clear how much practical danger the use of these algorithms poses. Ideally, the experiment described above should not rely on any magic seeds within either the crypto-tools or the RNGs those tools are using, unless these magic seeds are themselves publicly trusted: that is, obtained via a procedure similar to the one described here (RUNE).


Ideas presented in this article are a mere distillation of those contained in an unpublished 2012 proposal by yours truly, which described a manual, human-driven version of the same uniform experiment. This publication was precipitated by the arrival of a similar 2019 project, the League of Entropy (LoE), which attempts to provide the same type of robot-generated trusted randomness. The League of Entropy is powered by drand "beacons", which play a role similar to that of our participants. There does not seem to be a dedicated communication host, so the communication is less centralized than what is described above, not that it makes any practical difference. (For all we know, LoE's communication protocol assigns, or will assign in the future, a temporary host for each round, just to speed up and/or simplify the information exchange, and no one will notice.) It is worthwhile, though, to look at some of the significant differences between the approaches used by RUNE and LoE.

Perhaps the greatest differences are found within the technical implementation: LoE appears to be significantly more complicated, making use of secure communication protocols, shared secrets, etc. A fair look at these details would require a lot of research and a whole other article, so we will stop here and instead focus on issues facing the end users.

One difference worth mentioning is the way the outcomes are published. LoE lets users check the validity of the result, but only via the beacon software, drand. We contend it would be better to publish the results in text form, with crypto-signatures easily verifiable by hand or via shell scripting, using off-the-shelf tools such as gpg. Inserting yet another piece of software and/or a communication protocol between the end users and the verification bits may be beneficial for some use cases, but we struggle to see how this is useful for an RNG that produces dozens of bytes per minute. To put it more bluntly: a mere user of the RNG should not have to run a "beacon", and for the volumes in question, there is hardly an excuse for not publishing everything in a human-readable form which is also formatted for easy batch processing. At present, from the user's point of view at least, it is not very clear exactly how the verification of the results is accomplished at LoE, which is precisely what happens when a crucial part of the verification process is folded into an ad hoc software client+server.
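
For contrast, here is what verification by hand could look like for a round published RUNE-style, assuming the host serves phase2.tar at a hypothetical URL and the verifier has already imported the participants' public keys:

  # Fetch the published round and check every participant's signature.
  wget https://rune.example.org/phase2.tar     # hypothetical URL
  mkdir -p phase2 && tar -xf phase2.tar -C phase2
  for sig in phase2/*.phase1.sig; do
      gpg --verify "$sig" phase2/phase1.tar
  done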

Another interesting difference is that RUNE, unlike LoE, stresses how important the identities of the participants/beacons are, as well as the way the public perceives them. We could argue it is nearly pointless for the League to enlist the help of willing strangers, because the public has no reason to trust agents they don't know, and so their contribution to the "trust pool" is not that great. And there's risk too: a larger pool of rogue participants may find it easier to exploit certain kinds of software bugs and weaknesses in the experimental design.

Instead, to make the process more trustworthy, we should probably choose participants carefully, in a way that reassures specific strata of end users. For example, if lotteries and other gambling enterprises express a lot of interest, then perhaps an independent agency tasked with regulating gambling should be included as a participant. Similarly, a leading cryptographer, or a leading physical scientist specializing in simulations, will likely reassure their respective fields more effectively than anyone else.


Linked below for historical reasons (mostly), the original 2012 text is basically unedited. (Sorry!)

rune23 proposal