Hello Riastradh,
as discussed on IRC, Appveyor recently started failing the LogLogistic stochastic tests on 32-bit builds: https://github.com/torproject/tor/pull/576 https://ci.appveyor.com/project/torproject/tor/builds/20897462
I managed to reproduce the breakage by cross-compiling Tor and running the tests under Wine, using this script of ahf's: https://github.com/ahf/tor-win32/
Here are my findings:
The following two test cases are breaking 100% reproducibly:
  ok = test_stochastic_log_logistic_impl(M_E, 1e-1);
  ok = test_stochastic_log_logistic_impl(exp(-10), 1e-2);
The breakage seems to be caused by the beta parameter. In particular, it seems like the test will break with any beta <= 0.26 and succeed with any beta >= 0.27. The range in between is still unclear ;)
I haven't managed to figure out what's actually offending the test, but since I can reproduce it, I can do some digging if you have any ideas.
FWIW, I haven't noticed any other stochastic test breakage.
Cheers!
George Kadianakis <desnacked@riseup.net> writes:

> Hello Riastradh,
>
> as discussed on IRC, Appveyor recently started failing the LogLogistic
> stochastic tests on 32-bit builds: [...]
>
> The following two test cases are breaking 100% reproducibly:
>
>   ok = test_stochastic_log_logistic_impl(M_E, 1e-1);
>   ok = test_stochastic_log_logistic_impl(exp(-10), 1e-2);
Aaaaand here are some updates:
I followed your suggestion and made the tests deterministic by sampling from a deterministic randomness source. I verified that all the crypto_rand() call outputs are now identical between the 32-bit mingw build and the 64-bit gcc one: https://github.com/asn-d6/tor/commit/3d8c86c2f08ad2cc7ed030bbf8e11b110351f5c...
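(For readers following along, the shape of the idea is roughly the following -- an illustrative sketch only, not the code in that commit; the generator here is just a stand-in for whatever the commit actually hooks into crypto_rand():)

    #include <stdint.h>
    #include <string.h>

    /*
     * Sketch, not the actual commit: a tiny fixed-seed generator
     * (splitmix64) standing in for the entropy source, so that every
     * build draws an identical byte stream.
     */
    static uint64_t rng_state = 42;	/* fixed seed */

    static uint64_t
    splitmix64(void)
    {
    	uint64_t z = (rng_state += 0x9e3779b97f4a7c15ULL);
    	z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
    	z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
    	return z ^ (z >> 31);
    }

    static void
    deterministic_rand(void *out, size_t n)
    {
    	uint8_t *p = out;

    	while (n > 0) {
    		uint64_t r = splitmix64();
    		size_t k = (n < 8 ? n : 8);
    		memcpy(p, &r, k);	/* byte order is host-dependent */
    		p += k;
    		n -= k;
    	}
    }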
I then focused on the test_stochastic_log_logistic_impl(M_E, 1e-1) test case and tried to figure out where the deviation was happening between 64-bit gcc and 32-bit mingw... That took a while but I finally got some figures. Check out my commit that adds some printfs as well: https://github.com/asn-d6/tor/commit/36999c640fe824ab9fb85b5d2cd15017a97a532...
So using the output from that commit, I noticed that log_logistic_sample() would often give different outputs on the two systems. In particular, sometimes the x value would differ even with the same (s, p0) pair, and other times the x value would be the same but the final alpha*pow(x, 1/beta) value would differ. Even though this is the case, the test only fails for certain values of beta (as mentioned in my previous email).
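(The values below are C99 hex floats; the instrumentation amounts to something like the following line -- a sketch with the variable names guessed from the sampler's locals, not the exact code in the commit:)

    /*
     * Sketch of the debugging printf: %a prints the exact bits of a
     * double as a hex float, so cross-platform diffs are bit-precise.
     */
    printf("beta: %a s: %u, p0: %a x: %a alpha*pow(x, 1/beta): %a\n",
        beta, (unsigned)s, p0, x, alpha*pow(x, 1/beta));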
I now inline various such failure cases and one correct one:
Case #1 (same x, different sample value):
mingw-32: beta: 0x1.999999999999ap-4 s: 3122729323, p0: 0x1.68d18a44b82fbp-1 x: 0x1.d686a1e7fa35p+0 alpha*pow(x, 1/beta): 0x1.2affd5bfff433p+10
gcc-64: beta: 0x1.999999999999ap-4 s: 3122729323, p0: 0x1.68d18a44b82fbp-1 x: 0x1.d686a1e7fa35p+0 alpha*pow(x, 1/beta): 0x1.2affd5bfff434p+10
Case #2 (same x, different sample value):
mingw-32: beta: 0x1.999999999999ap-4 s: 738208646, p0: 0x1.a1ecd53def5d3p-2 x: 0x1.068987864c2aep-2 alpha*pow(x, 1/beta): 0x1.bfba380255bb8p-19
gcc-64: beta: 0x1.999999999999ap-4 s: 738208646, p0: 0x1.a1ecd53def5d3p-2 x: 0x1.068987864c2aep-2 alpha*pow(x, 1/beta): 0x1.bfba380255bb9p-19
Case #3 (different x, different sample value):
mingw-32: beta: 0x1.999999999999ap-4 s: 95364755, p0: 0x1.575b5ea720e3cp-1 x: 0x1.fb7949976ab04p+0 alpha*pow(x, 1/beta): 0x1.3e605e169e8cbp+11
gcc-64: beta: 0x1.999999999999ap-4 s: 95364755, p0: 0x1.575b5ea720e3cp-1 x: 0x1.fb7949976ab03p+0 alpha*pow(x, 1/beta): 0x1.3e605e169e8c5p+11
Case #4 (different x, different sample value):
mingw-32: beta: 0x1.999999999999ap-4 s: 2082443965, p0: 0x1.530a8759113bp-2 x: 0x1.42989e50ac641p+2 alpha*pow(x, 1/beta): 0x1.b724d48bf0f6cp+24
gcc-64: beta: 0x1.999999999999ap-4 s: 2082443965, p0: 0x1.530a8759113bp-2 x: 0x1.42989e50ac64p+2 alpha*pow(x, 1/beta): 0x1.b724d48bf0f5ep+24
Case #5 (different x, different sample value):
mingw-32: beta: 0x1.999999999999ap-4 s: 443038967, p0: 0x1.b0124b971bbf3p-4 x: 0x1.1f5b72f5f6a3ep+4 alpha*pow(x, 1/beta): 0x1.143a16cdae94fp+43
gcc-64: beta: 0x1.999999999999ap-4 s: 443038967, p0: 0x1.b0124b971bbf3p-4 x: 0x1.1f5b72f5f6a3fp+4 alpha*pow(x, 1/beta): 0x1.143a16cdae958p+43
Case #6 (same sample value):
mingw-32: beta: 0x1.999999999999ap-4 s: 2932701594, p0: 0x1.b407f600e6d87p-1 x: 0x1.7bb183ccc47efp-1 alpha*pow(x, 1/beta): 0x1.181016f03c09p-3
gcc-64: beta: 0x1.999999999999ap-4 s: 2932701594, p0: 0x1.b407f600e6d87p-1 x: 0x1.7bb183ccc47efp-1 alpha*pow(x, 1/beta): 0x1.181016f03c09p-3
Date: Wed, 12 Dec 2018 17:47:05 +0200
From: George Kadianakis <desnacked@riseup.net>
> I followed your suggestion and made the tests deterministic by
> sampling from a deterministic randomness source. I verified that all
> the crypto_rand() call outputs are now identical between the 32-bit
> mingw build and the 64-bit gcc one: [...]
Although it is worthwhile to separate (a) a nondeterministic entropy source from (b) a deterministic PRNG with seeds you can save to get reproducible results... it turns out all that was a red herring, and the culprit was an incautious linear interpolation I had written in the binning for the stochastic tests. Fortunately the problematic code is limited to the tests and does not run in the tor daemon!
Lerping -- computing t |---> a + (b - a)*t = (1 - t)*a + t*b -- is deceptive in its apparent simplicity. It just happened that no trouble arose in binary64 arithmetic, which is what you get on just about every machine on the planet _except_ 32-bit x86 in binary80 mode -- which mingw32 uses. (And the m68k floating-point coprocessor, but one doesn't see those much these days.)

Fortunately you don't have to use Windows to run the tests -- `gcc -m32' on an amd64 system should do just as well, and you can make sure you're in binary80 mode with:
    uint32_t cwsave, cw = 0x037f;	/* x87 default: round-to-nearest,
    					 * 64-bit (binary80) precision,
    					 * all exceptions masked */
    asm volatile("fnstcw %0; fldcw %1"	/* save the old control word,
    					 * load ours */
        : "=m"(cwsave) : "m"(cw));
    ...
    asm volatile("fldcw %0"		/* restore the caller's control
    					 * word */
        : : "m"(cwsave));
(On NetBSD, this is spelled `fp_prec_t p = fpsetprec(FP_PE); ...; fpsetprec(p)', but I'm not sure anyone else has adopted that.)
Binary80 arithmetic tickled a problem in the lerp used for binning -- a problem which I took some pains to avoid in sample_uniform_interval (see the long theorem in the comments there). By `binning' I mean dividing the support of each distribution into histogram bins, starting at the 1st percentile and ending at the 99th percentile, uniformly spaced in the support, with a linear interpolation to pick the bin boundaries.
We then use the CDF or SF to compute the probability a sample falls into each bin, count a lot of samples, and use a psi test to assess whether the sample counts are close enough to the expected counts. (There's nothing magic about uniform spacing; I just pulled it out of my arse, and it helped me catch bugs.)
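(A statistic in this family -- e.g., the log-likelihood-ratio G-statistic, 2*sum_i O_i*ln(O_i/E_i) over observed counts O_i and expected counts E_i -- looks like the following generic sketch; tor's actual psi_test may differ in detail:)

    #include <math.h>
    #include <stddef.h>

    /*
     * Generic sketch of a psi/G-style goodness-of-fit statistic, not
     * tor's exact psi_test: compare observed bin counts against the
     * expected counts E_i = nsamples * prob[i].
     */
    static double
    psi_statistic(const size_t *counts, const double *prob,
        size_t nbins, size_t nsamples)
    {
    	double psi = 0;
    	size_t i;

    	for (i = 0; i < nbins; i++) {
    		if (counts[i] == 0)
    			continue;	/* 0 * ln(0/E) -> 0 in the limit */
    		psi += (double)counts[i] *
    		    log((double)counts[i]/(nsamples*prob[i]));
    	}
    	return 2*psi;
    }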
When beta is small, the log-logistic distribution has a very fat tail, so the 1st and 99th percentiles are very far apart (e.g., for alpha=e and beta=1/10, the 1st percentile is ~2.5e-20 and the 99th is ~3e+20). The naive lerp I had written in bin_cdfs to compute an equispaced partition between these two points overshot in this case in binary80 arithmetic, which led to an intermediate NaN computation, which made the test fail 100% of the time.
The attached patch uses a slightly less naive lerp. It's not necessarily monotonic, but that doesn't really matter here, since we're only examining 100 histogram bins -- overcounting by the width of one or two consecutive floating-point numbers is essentially inconsequential at this scale for these tests.
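(The patch itself is attached rather than inlined. Schematically -- my paraphrase, on the assumption that `less naive' means switching between the two classic lerp forms quoted earlier:)

    /*
     * Sketch only -- see the attached patch for the real change.  The
     * one-sided form is monotonic in t but can land beyond b at t = 1
     * after rounding; the two-sided form hits a and b exactly at the
     * endpoints, at the cost of monotonicity in between.
     */
    double
    lerp_one_sided(double a, double b, double t)
    {
    	return a + (b - a)*t;	/* monotonic; may overshoot b */
    }

    double
    lerp_two_sided(double a, double b, double t)
    {
    	return (1 - t)*a + t*b;	/* exact at t = 0, 1; not monotonic */
    }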
Date: Thu, 13 Dec 2018 03:25:30 +0000
From: Taylor R Campbell <campbell+tor-dev@mumble.net>
> Binary80 arithmetic tickled a problem in the lerp used for binning --
> [...]
Correction: making the lerp less naive does address this problem, but the problem arises only when binary80 arithmetic and binary64 arithmetic are mixed -- which is what you get with C double on 32-bit x86 when the x87 unit is in binary80 mode, so that (a) double is binary64, but (b) intermediate expressions are evaluated in binary80.
The attached program shows this by putting either one (bad) or two (good) intermediate expressions in double variables. On x86, if you compile it with -mfpmath=387 (the default on 32-bit), you'll see the bad result, a negative answer; if you compile it with -mfpmath=sse (the default on 64-bit), you'll see only good results, zero. Convert everything to use long double (and %Le) instead, so that all the arithmetic and intermediate quantities are binary80, and it's fine.
    % cc -o loser -mfpmath=387 loser.c && ./loser
    bad -2.45760000000000000e+04
    good 0.00000000000000000e+00
    % cc -o loser -mfpmath=sse loser.c && ./loser
    bad 0.00000000000000000e+00
    good 0.00000000000000000e+00
(This is why I don't like x87 and the automagic evaluation of expressions in higher precision...)
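(For readers without the attachment: the following self-contained program is a reconstruction in the same spirit, not the original loser.c -- its numbers differ from the transcript above -- but it exhibits the same class of discrepancy, via double rounding of a single intermediate expression. It assumes the x87 is in its default 64-bit-precision mode, as on Linux; on systems that default the x87 to 53-bit precision, such as NetBSD, first switch to binary80 mode as described earlier.)

    #include <stdio.h>

    int
    main(void)
    {
    	/*
    	 * 2^52 and 1.5 - 2^-52 are both exactly representable in
    	 * binary64; their exact sum, 2^52 + 1.5 - 2^-52, is not.
    	 */
    	volatile double x = 0x1p52;
    	volatile double y = 0x1.8p0 - 0x1p-52;

    	/*
    	 * With -mfpmath=387, x + y is first rounded to binary80 in the
    	 * x87 register (giving 2^52 + 1.5 exactly) and then rounded
    	 * again to binary64 at the store, where the tie breaks to
    	 * even: 2^52 + 2.  With -mfpmath=sse, the sum is rounded once,
    	 * directly to binary64: 2^52 + 1.
    	 */
    	volatile double sum = x + y;
    	double correct = 0x1p52 + 1;	/* correctly rounded binary64 sum */

    	printf("error %.17e\n", sum - correct);	/* 387: 1, sse: 0 */
    	return 0;
    }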