Date: Wed, 12 Dec 2018 17:47:05 +0200 From: George Kadianakis desnacked@riseup.net
I followed your suggestion and turned the tests into deterministic by sampling from a deterministic randomness source. I verified that all the crypto_rand() call outputs are now the same between the 32-bit mingw build and the 64-bit gcc one: [...]
Although it is worthwhile to separate (a) a nondeterministic entropy source from (b) a deterministic PRNG with seeds you can save to get reproducible results...it turns out all that was a red herring, and the culprit was an incautious linear interpolation I had written in the binning for stochastic tests. Fortunately the problematic code is limited to the tests and does not run in the tor daemon!
Lerping -- computing t |---> a + (b - a)*t = (1 - t)*a + t*b -- is deceptive in its apparent simplicity, and it just happened that no trouble arose in binary64 arithmetic, which is what you get on just about every machine on the planet _except_ 32-bit x86 in binary80 mode -- which mingw32 uses. (And the m68k floating-point coprocessor, but one doesn't see those much these days.) Fortunately you don't have to use Windows to run tests -- `gcc -m32' on an amd64 system should do just as well, and you can make sure you're in binary80 mode with
uint32_t cwsave, cw = 0x037f; asm volatile("fnstcw %0; fldcw %1" : "=m"(cwsave) : "m"(cw)); ... asm volatile("fldcw %0" : : "m"(cwsave));
(On NetBSD, this is spelled `fp_prec_t p = fpsetprec(FP_PE); ...; fpsetprec(p)', but I'm not sure anyone else has adopted that.)
Binary80 arithmetic tickled a problem in the lerp used for binning -- a problem which I took some pains to avoid in sample_uniform_interval (see the long theorem in the comments there). By `binning' I mean dividing the support of each distribution into histogram bins, starting at the 1st percentile and ending at the 99th percentile, and uniformly spaced in the support with a linear interpolation to pick the bin boundaries.
We then use the CDF or SF to compute the probability a sample falls into each bin, count a lot of samples, and use a psi test to assess whether the sample counts are close enough to the expected counts. (There's nothing magic about uniform spacing; I just pulled it out of my arse, and it helped me catch bugs.)
When beta is small, the log-logistic distribution has a very fat tail, so the 1st percentile and the 99th percentile are very far away (e.g., for alpha=e and beta=1/10, the 1st percentile is ~2.5e-20 and the 99th percentile is ~3e+20). The naive lerp I had written in bin_cdfs to compute an equispaced partition between these two points overshot in this case in binary80 arithmetic, which led to an intermediate NaN computation, which made the test fail 100% of the time.
The attached patch uses a slightly less naive lerp. It's not necessarily monotonic but that doesn't really matter here since we're only examining 100 histogram bins -- overcounting by the width of one or two consecutive floating-point numbers is essentially inconsequential at this scale for these tests.