My Review of Steven Koonin's Book on Climate

Steven Koonin, who was Under Secretary for Science at the Department of Energy in the Obama administration, recently wrote a somewhat controversial book on climate science titled Unsettled. I was curious to read what kind of criticism a physicist who partly worked in the field would have, even though I believe that global warming is real and that humans have an influence on it. It turns out that some of his remarks regarding models are relevant way beyond climate science, while his treatment of some other subjects is less convincing.

The first half of the book explains how the influence of humans on climate is not as obvious as many graphs suggest, even those published by well-known groups such as the CSSR or the IPCC. A recurring theme is plots presented without context or with the wrong context. Often, when the time span is extended further into the past, a very different picture emerges, of cyclic variations with cycles of different lengths and amplitudes. Other times, the choice of measure is not appropriate. Yet another aspect is the uncertainty of the models, which seems too rarely discussed in reports and papers, as well as the tweaking of model parameters that is necessary.

When it comes to CO2, Koonin explains that the decrease in cooling power is much lower from 400 ppm to 800 ppm than from 0 ppm to 400 ppm because, at 400 ppm (the 2019 average), most of the light frequencies are already absorbed. The doubling corresponds only to a 0.8% change in heat interception. 1% is however not so small, as it corresponds to a variation of 3 degrees Celsius (or kelvins). Furthermore, while humans may increase warming with greenhouse gases, they also increase cooling via aerosols, the latter being much less well known and measured (with those taken into account, we arrive at the 1 degree Celsius impact). The author also presents some plots of CO2 on a scale of millions of years to explain that we live in an era of low CO2 concentration. While an interesting fact, this may be a bit misleading since, if we look up to 800,000 years back, the CO2 concentration oscillated between roughly 170 and 300 ppm, with an explosion only in recent decades. The climate.gov website provides more context around the latter plot:

In fact, the last time the atmospheric CO₂ amounts were this high was more than 3 million years ago, when temperature was 2°–3°C (3.6°–5.4°F) higher than during the pre-industrial era, and sea level was 15–25 meters (50–80 feet) higher than today.

What I found disappointing is the absence of the famous 1982 plot from Exxon (p. 7), see also the following article with nice red lines for 2019.

The Exxon 1982 PPM Plot.

On this plot, the relation between the temperature increase and the CO2 ppm looks linear, which would be somewhat more alarming.

The conclusion from the first half of the book may be summarized as follows: while global warming is real, it is not so obvious how much of it is due to humans. The author agrees on maybe around 1 degree Celsius, and suggests it is not necessarily accelerating. The causes are much less clear. There are some interesting subtleties: temperature increases in some places and decreases in others; around cities, the increase is mostly due to the expansion of the cities themselves (more buildings).

I found the second half to be in contradiction with the first half, although this is clearly not the author's intent: the second half focuses on how to address human influence on the climate and, several times, suggests a strong influence of human emissions on the climate, while the first half of the book was all about minimizing that aspect. This is especially true for carbon emissions, where it is suggested in the first half that additional emissions will have a comparatively small impact.

The overall message is relatively simple: reducing emissions won't help much, as the concentration will only increase in the coming decades (but then, would it be so bad to think beyond the coming decades?). Also, the scale of emission reductions necessary to limit the increase to 2 degrees is not realistic at all in the world we live in. Instead, we'd be better off trying to adapt.

Overall, the author denounces a lack of scientific integrity, which is too often absent or not strongly present enough. Having reviewed many papers, and published some in specialized journals, I can't say I am surprised. Peer review is important, but perhaps not good enough by itself, especially in the short run. Over decades, however, I think the system may correct itself.

Github and SSH setup

GitHub recently stopped accepting account passwords for Git operations, which in practice means using either a personal access token or SSH public/private keys. As I use GitHub to host this blog, I was impacted.

The SSH setup on Linux is not very complicated, and is relatively well documented on GitHub itself, but the steps are not all listed in one simple place, and some Google searching is still required to find out how to set up multiple private keys for different servers or different repositories.

Here is my short guide. In bash, do the following:

ssh-keygen -t ed25519 -C "your_email@example.com"

When prompted for the file in which to save the key, make sure to give a specific name, e.g. /home/you/.ssh/id_customrepo_ed25519

eval "$(ssh-agent -s)"
ssh-add ~/.ssh/id_customrepo_ed25519

Create ~/.ssh/config with

Host github-as-customrepo
        HostName github.com
        User git
        IdentityFile ~/.ssh/id_customrepo_ed25519
        IdentitiesOnly yes

Finally, follow the official documentation to test the connection, e.g. ssh -T git@github-as-customrepo. Then update the git URL via

git remote set-url origin git@github-as-customrepo:customrepo/customrepo.github.io.git

Covid-19 Fake News

One thing that motivated me to get vaccinated is the fake news propaganda against the Covid-19 vaccines.

A mild example relates to the data from Israel about the delta variant. This kind of article, with the title “Covid 19 Case Data in Israel, a Troubling Trend”, puts the emphasis on doubts about the effectiveness of the vaccine:

the vaccine appears to have a negligible effect on an individual as to whether he/she catches the current strain. Moreover, the data indicates that the current vaccines used (Moderna, Pfizer-BioNTech, AstraZeneca) may have a decreasing effect on reduced hospitalizations and death if one does get infected with the Delta variant.

CNBC (far from the best news site) is much more balanced and precise in contrast:

However, the two-dose vaccine still works very well in preventing people from getting seriously sick, demonstrating 88% effectiveness against hospitalization and 91% effectiveness against severe illness, according to the Israeli data published Thursday.

Much more shocking is a tweet from Prashant Bhushan (whoever that is), which was relayed widely enough that I got to see it:

Public Health Scotland have revealed that 5,522 people have died within twenty-eight days of having a Covid-19 vaccine within the past 6 months in Scotland alone! This is 9 times the people who died due to Covid from March 2020 till Jan 2021 in Scotland!

It points to an article in the Daily Expose (worse than a tabloid?) with a link to the actual study. Giving an actual link to the source is a very good practice, even if, in this case, the authors of the news article write a lengthy text which says the opposite of the paper they cite as their source. Indeed, p. 27 of the paper says

Using the 5-year average monthly death rate (by age band and gender) from 2015 to 2019 for comparison, 8,718 deaths would have been expected among the vaccinated population within 28 days of receiving their COVID-19 vaccination. This means the observed number of deaths is lower than expected compared with mortality rates for the same time period in previous years…

Why do people write such fake news articles?

I can imagine 3 kinds of motivations:

  1. The author wants to make money or be famous. Writing controversial things, even if completely false, seems to work wonders to attract readership, as long as there are readers curious about the controversy. The author will gather page views and followers easily, which they may then monetize.
  2. The author is just someone so convinced of their beliefs that everything is interpreted through a very narrow band-pass filter. They will read and pay attention only to the words in the text that validate their beliefs and forget everything else. I know examples (not on this particular subject) in my own family.
  3. Troll farms. There are many troll farms (we always hear of the Russian ones, but they really exist in most countries), where people are paid to spread disinformation on social media. A simple example is the fake reviews on Amazon. Sometimes the goal is to promote the ideas of a particular political party. Sometimes this political party is financed by a foreign country with specific interests.

The third motivation may act as an amplifier of the first two.

Why is fake news successful?

  • People do not trust the official government message. On the coronavirus, the French government was initially quite bad at presenting the situation, with the famous “masks are useless” (because they did not have any stock), a couple of months before enforcing masks everywhere. Too many decisions are purely political. A good example is the policy to close a school class on a single positive test when the full school is being tested for coronavirus. The false positive rate guarantees that at least 1/3 of the classes will close. Initially the policy was 3 positive tests, which was much more rational, but not as obvious to understand for the masses. My son ended up being one of those false positives, and it wasn’t a particularly nice experience.
  • People do not trust traditional media anymore. The quality of standard newspapers dwindled with the rise of the internet, and with it the trust of their readers. Also, perhaps fairly, it is not clear whether classic traditional media present better news than free modern social media. There are exceptions of course (typically monthly publications).
  • Social media bias: almost all social networks act as a narrow band-pass filter. If you look at one cat video on YouTube, you will end up with cat videos on your YouTube front page every day. Controversial subjects and videos are given considerable space. A main reason for this behavior is that these networks are optimized for page views, that is, to attract your attention.
  • Generalized attention deficit disorder due to the ubiquity of smartphones. Twitter got famous because of its 140-character message limit. Newspaper articles are perhaps five times shorter than they used to be in the early 1990s. The time we have is limited, and a lot of it is spent reading and writing short updates in various WhatsApp groups or other social network apps.
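The school-testing claim in the first bullet above can be checked with a little arithmetic. The sketch below assumes independent tests, a per-test false positive rate of 1.5% and classes of 27 pupils, none of whom is actually infected; these numbers are illustrative assumptions of mine, not official figures.

```python
from math import comb

def prob_at_least(k, n, p):
    """Probability of at least k positives among n independent tests,
    each positive with probability p (binomial model)."""
    return 1.0 - sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))

# Assumed, illustrative inputs (not official figures).
FALSE_POSITIVE_RATE = 0.015
CLASS_SIZE = 27

close_on_one = prob_at_least(1, CLASS_SIZE, FALSE_POSITIVE_RATE)
close_on_three = prob_at_least(3, CLASS_SIZE, FALSE_POSITIVE_RATE)
print(f"closing on 1 positive:  {close_on_one:.3f}")
print(f"closing on 3 positives: {close_on_three:.4f}")
```

With these assumed inputs, about a third of the healthy classes close under the single-positive rule, against less than 1% under the three-positive rule.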

Conclusion

This was not the only motivator. Another strong one is a Russian friend who caught Covid-19 in July 2021. He was not very pro-vaccination, but thought, once it was too late, “I really should have applied for the vaccine”, as he realized it impacted the whole circle of relations around him. Then, one by one, his whole family caught Covid. Fortunately, no one died.

Are traditional banks ready for the 21st century?

This is a follow-up on my parents' phishing scam.

After several weeks, my parents and I were finally able to have a real world meeting with the advisor at the bank. The advisor is a young woman with an obvious background in sales.

In order to process the paperwork around the reimbursement of the phishing scam, the main issue was that the bank requested the original phishing e-mail, and my mother had deleted it. It turns out that, in Thunderbird, deleted e-mails are not removed from disk until the mailbox is compacted. I was thus able to recover the deleted e-mail. Interestingly, the deleted e-mail was not in the trash file, but directly in the inbox file.
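As a rough sketch of this kind of recovery, the snippet below scans a Thunderbird mailbox file (mbox format) with Python's standard mailbox module and lists the messages that are flagged as deleted but still present on disk. It assumes that Thunderbird marks such messages with the 0x0008 bit of the X-Mozilla-Status header; this is my understanding rather than something verified here, and the file path in the usage comment is hypothetical. It is not the exact procedure I followed.

```python
import mailbox

# Assumption: bit 0x0008 of X-Mozilla-Status means "deleted, awaiting compaction".
EXPUNGED_FLAG = 0x0008

def deleted_messages(mbox_path):
    """Yield (subject, sender) for messages flagged as deleted but still on disk."""
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for message in box:
            status = message.get("X-Mozilla-Status", "0000")
            try:
                flags = int(status, 16)
            except ValueError:
                continue  # unexpected header content: skip the message
            if flags & EXPUNGED_FLAG:
                yield message.get("Subject", ""), message.get("From", "")
    finally:
        box.close()

# Usage (hypothetical path to the profile's Inbox file):
# for subject, sender in deleted_messages("/home/you/.thunderbird/xxxx.default/Mail/Inbox"):
#     print(sender, "|", subject)
```

Working on a copy of the mailbox file, with Thunderbird closed, avoids any risk of the file being compacted or modified during the scan.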

At first, I sent the message source as PDF, along with a screenshot to the bank. The bank asked to forward the e-mail instead. Interestingly, it turns out that once a phishing e-mail is registered in the various anti-phishing filters, it is not possible to forward (directly or in attachment) or copy paste the email as most servers (including the bank servers) will detect the new email as potential phishing and directly reject the e-mail (reject error).

So the standard process of the bank is not really appropriate. It can be followed only during the very short period (more luck than anything) where the phishing e-mail has not yet been registered in the anti-phishing filters. A simple Google search showed me that many other institutions follow a similar process, where they ask to forward the phishing e-mail to a specific e-mail address.

The good news so far is that, officially, the advisor pushed the fraud case forward, although it is not entirely clear at this point whether it will be processed properly. What shocked me more was the speech of the bank advisor. First, she scared us by telling stories of various credit card frauds (obviously not related to our bank credentials fraud). When she mentioned insurance against credit card fraud, without selling it to us yet, her goal became clear. Then she tried to comfort us by saying that this kind of fraud only happens once, because once you have been confronted with a fraud, you are more careful and do not make the same mistake again, which is not a convincing argument at all in my view. She indirectly kept blaming my mother for entering her credentials, but was much less clear about strong authentication at the bank, especially when I explained that, at another bank, I receive a text message with a unique code to enter each time I try to wire money to a new account, something this bank obviously did not implement properly. Strong authentication is now part of European law (PSD2, known as DSP2 in French). Then she found every possible argument not to close some of my parents' accounts (they have far too many pointless accounts at this bank). And finally she tried to sell us some special managed account for stock trading (managed entirely by the bank, with my parents' money), claiming the economy was really great in these COVID-19 times.

Overall, there is a really important problem with bank fraud in the 21st century. The system currently in place expects most people, including 80-year-olds, to be tech experts who know how to carefully inspect any suspicious e-mail header, such as the e-mail address in the From field (which could also be forged better than it was in the phishing e-mail my parents received). It expects everybody to carefully check the address in the browser (which, interestingly, may also be forged with look-alike Unicode characters). It relies on somewhat buggy phone applications, which constantly change, and are different for every bank. It expects people never to click on a link, but then banks themselves send e-mails with links to promote the various products they sell. Banks should really think of adopting a more unified, standard approach to authentication, such as FreeOTP, based on HOTP and TOTP.
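To give an idea of how simple these standards are, here is a minimal sketch of HOTP (RFC 4226) and TOTP (RFC 6238) with HMAC-SHA1, using only the Python standard library. The function names and defaults are mine; a real deployment would also need secure provisioning of the shared secret, rate limiting and tolerance for clock drift, none of which is shown here.

```python
import hashlib
import hmac
import struct
import time

def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password (RFC 4226) with HMAC-SHA1."""
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation: offset from the last nibble
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10**digits).zfill(digits)

def totp(secret: bytes, now=None, step: int = 30, digits: int = 6) -> str:
    """Time-based one-time password (RFC 6238): HOTP on the current time step."""
    if now is None:
        now = time.time()
    return hotp(secret, int(now) // step, digits)

# Shared secret of the RFC test vectors; a real secret would be random.
secret = b"12345678901234567890"
print(hotp(secret, 0))        # first RFC 4226 test vector
print(totp(secret, now=59))   # code valid during the time step containing t = 59 s
```

The server and the phone share only the secret and a clock (or a counter), so the same code can be computed on both sides without any network exchange.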

My mother became so paranoid that she no longer recognizes a valid message from the bank as normal, standard communication. And we are left with a bank advisor who is not far from being a fraudster herself. One may wonder if things would really be much worse with a bitcoin wallet.

As I am writing this, my parents have been fully reimbursed by the bank; the phishing e-mail was really all the bank needed.

Yesterday, Pirates Took Over My Parents' Bank Account

This is the story of their hack.

Yesterday evening, I received a call from my mother, frantic over the phone. She says she sees alerts of withdrawals from her bank account on her phone, with new alerts every 5 minutes or so. I try to ask her if she recently clicked on some e-mail related to her bank. She is so panicked that I don't manage to get an answer. While I try to understand whether those alerts are real or not, my wife immediately suggests that my mother should call her bank. On the phone, I ask

  • did you call the bank? You should really call the bank right now if you did not.
  • I don’t have any number to call them, she replies.

After a 5-second Google search, my wife finds a number to call in case of phishing at this bank. I start spelling out the number to my mother. Before I finish, my mother replies

  • ah this number does not work.
  • so you already had this number and tried to call it? I ask
  • yes, it does not work.

She starts shouting and asks me to come over. I hang up and tell her I will call back shortly, when she is calmer. I call her back 1 minute later and tell her I will come over.

In the meantime, my wife attempts to call the number. She stumbles upon some bot asking for bank credentials or, alternatively, whether she wants to speak to a person. She opts for a person and ends up with someone who hangs up before she has the chance to say a word. She then calls the international number, just below that first number. Bingo, someone helpful is there. She asks the person to call my parents.

When I arrive at my parents' place, the person from the bank has already reached my mother and closed internet access to her bank account, to the great relief of everyone. Then I search the computer, her phone and her tablet for any suspicious text message or e-mail received that day. I cannot find any. She did receive some legitimate e-mails from the bank, but only alerts about what was happening in the evening. It started with a message about a new device being allowed to access the bank account website.

I then have the idea to look into the browser history. What is the first page of the day being consulted, around noon?

A phishing website with my mother’s bank name as title.

Then I try to find out how she managed to stumble upon that site. I don’t find anything. And when I ask her, it’s not entirely clear at first, there may have been another email she received. She may have clicked on that email. And she may have given various personal information on, what she believed to be, the bank website. Ok, the classic phishing story then. I tell my parents that they know they should never click on a link in an e-mail. My father then asks “but what do you mean exactly by a link?”. I fail to understand the true meaning of the question at the time, and show him what is a link exactly and elaborate further.

It does not stop there. Out of curiosity, I look at the whois information for this phishing web site. It is registered at GoDaddy and there is not much information, except some Arabic-looking name servers and Saudi Arabia as the country of registration. When I mention this to my father later on, he says:

  • This might be a coincidence, but yesterday, I gave two checks (from the same bank) to the guy in charge of the repair (or replacement?) of the water softener, whom I had contacted. He has an Arabic name.

I know his tendency to be “racist”, and tell him it is probably unrelated. And then we think a bit more about the situation, and there is indeed a strange coincidence, as the phishing e-mail (which I never saw, since my mother may have deleted it on purpose) was “from” the same bank, only 1 day later. How could the hackers know which bank my parents use? They did not receive any phishing e-mail for any other bank. The time and place point towards some sort of targeting.

I go back home, and my wife and I discuss all this further. And she asks me:

  • If the two are related, how could the water softener guys have the e-mail address of your parents?

Good question. I call my father and ask him. It turns out he had received an e-mail (a SPAM) from the water softener company and replied to it. This is how he contacted them. And perhaps, this explains why he wanted to know more about what “clicking on a link” means. I guess he knows now.

Although I have no real proof, I am quite confident the water softener SPAM and the bank hack are very closely related. I did not think phishing was so “targeted”, and again it was my wife who told me that targeting is apparently common in phishing. All this targeting makes me think of another story, involving an 80-year-old member of the family, where the special forces broke into his house around 3 a.m. a few months back, shouting “target, target”, pointing their big guns, and arresting everybody in the house. But that's a story for another time.

Remarkable Coincidences, Bad Book?

I stumbled upon a new short book, Financial Models in Production by O. Kettani and A. Reghai. One page attracted my attention:

A page from Kettani and Reghai's book.

This is the same example as the one I used on my blog, where I also present Li's SOR method combined with the good initial guess from Stefanica. The idea has also been expanded on in Jherek Healy's book. What is shocking is that, besides reusing my example, they reuse my timing for Jäckel, although my implementation is in Google Go and my timing was done on some older laptop. The numbers given are thus highly inconsistent. Of course, none of this is mentioned anywhere, and the book does not reference my blog.

I also find that the description of how they improve the implied volatility algorithm (detailed on the next page) does not make much sense. After this kind of stuff, you can't really trust anything that is in the book…

Worse, perhaps, is that the authors advertise their “novel” technique in otherwise decent talks and conferences, such as the one from mathfinance. Here is a quote:

Enforced Numerical Monotonicity (ENM) beats Jäckel’s implied volatility calculations – an implied vol calculator that never breaks and automatically fits vanilla option prices.

It is really unfortunate that the world we live in encourages such boasting. Papers always need to present some novel ideas to be published, but there is too often no check on whether the idea actually works, or is worth it. The temptation to make exaggerated claims is very high for authors. In the end, it becomes difficult to sort the good from the bad.

Bad papers and the roots of high degree polynomials

I was wondering what exactly the eigenvalues of the Mersenne-Twister random number generator transition matrix were. An article by K. Savvidy sparked my interest in this. The article mentions a poor entropy (the sum of the logs of the moduli of the eigenvalues whose modulus is greater than 1), with eigenvalues falling almost on the unit circle.

The eigenvalues are also the roots of the characteristic polynomial. It turns out that the characteristic polynomial is also what we use for jumping ahead in the random number sequence. There is a twist however: for jump-ahead we use it in F2 (modulo 2), while for the entropy we are interested in the characteristic polynomial in Z (no modulo), specified in Appendix A of the Mersenne-Twister paper. The roots of the two polynomials are of course very different.

Now the degree of the polynomial is 19937, which is quite high. I searched for some techniques to compute quickly the roots, and found the paper “Efficient high degree polynomial root finding using GPU”, whose main idea is relatively simple: use the Aberth method, with a Gauss-Seidel like iteration (instead of a Jacobi like iteration) for parallelization. Numerical issues are supposedly handled by taking the log of the polynomial and its derivative in the formulae.

When I tried this, I immediately encountered numerical issues due to the limited precision of 64-bit floating point numbers. How to evaluate the log of the polynomial (and its derivative) in a stable way? It’s just not a simple problem at all. Furthermore, the method is not particularly fast either compared to some other alternatives, such as calling eigvals on the companion matrix, a formulation which tends to help avoiding limited precision issues. And it requires a very good initial guess (in my case, on the unit circle, anything too large blows up).
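For reference, here is a small sketch of the plain Aberth iteration with Gauss-Seidel style updates (each root estimate is reused as soon as it is updated), together with the entropy defined above. It is written in pure Python for a tiny polynomial, without the log trick of the paper, and is only meant to illustrate the technique: the starting circle and the stopping rule are simple choices of mine, and nothing here would survive degree 19937 in 64-bit floating point.

```python
import cmath
import math

def horner(coeffs, z):
    """Evaluate p(z) and p'(z); coeffs are ordered from the highest degree down."""
    p, dp = 0j, 0j
    for c in coeffs:
        dp = dp * z + p
        p = p * z + c
    return p, dp

def aberth_roots(coeffs, max_iter=200, tol=1e-13):
    """Approximate all roots simultaneously with Aberth's method."""
    n = len(coeffs) - 1
    # Start on a circle that encloses all roots (Cauchy bound), slightly rotated.
    radius = 1.0 + max(abs(c / coeffs[0]) for c in coeffs[1:])
    roots = [radius * cmath.exp(1j * (2 * math.pi * k / n + 0.4)) for k in range(n)]
    for _ in range(max_iter):
        largest_step = 0.0
        for k in range(n):
            p, dp = horner(coeffs, roots[k])
            if p == 0:
                continue  # already an exact root
            newton = p / dp
            # Gauss-Seidel flavour: roots[j] for j < k are already updated.
            repulsion = sum(1 / (roots[k] - roots[j]) for j in range(n) if j != k)
            step = newton / (1 - newton * repulsion)
            roots[k] -= step
            largest_step = max(largest_step, abs(step))
        if largest_step < tol:
            break
    return roots

def entropy(roots):
    """Sum of log|root| over the roots outside the unit circle."""
    return sum(math.log(abs(z)) for z in roots if abs(z) > 1)

# (x - 1)(x - 2)(x - 3) = x^3 - 6x^2 + 11x - 6
example_roots = aberth_roots([1, -6, 11, -6])
print(sorted(z.real for z in example_roots), entropy(example_roots))
```

On this toy cubic the iteration behaves well; the difficulties described above appear when evaluating the polynomial and its derivative at high degree.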

The authors in the paper do not mention which polynomials they actually have tested, only the degree of some “full polynomial” and some “sparse polynomial”, and claim their technique works with full polynomials of degree 1 000 000 ! This may be true for some very specific polynomial where the log gives an accurate value, but is just plain false for the general case.

I find it a bit incredible that this gets published, although I am not too surprised, since the bar for publication is low in many journals (see this enlightening post by J. Healy). Even in more serious journals, referees almost never actually try the method in question, so they have to blindly trust the results and focus mostly on the style and the presentation of ideas.

Fortunately, some papers are very good, such as Fast and backward stable computation of roots of polynomials, Part II: backward error analysis; companion matrix and companion pencil. In this case, the authors even provide a library in Julia, so the claims can be easily verified and, unsurprisingly, it works very well and is (very) fast. It also supports multiple precision, if needed. For the specific case of the Mersenne-Twister polynomial, it leads to the correct entropy value working only with 64-bit floats, even though many eigenvalues have a not-so-small error. It is still relatively fast (compared to a standard LinearAlgebra.eigvals) using quadruple precision (128-bit), and there, the error in the eigenvalues is small.

Overall, I found with this method an entropy of 10.377 (quite different from what is stated in K. Savvidy's paper), although the plot of the distribution looks similar (but with a different scale: the total number of eigenvalues reported in K. Savvidy's paper just does not add up to 19937, which is somewhat puzzling). A naive companion matrix solution led to 10.482. More problematic, if we look directly for the eigenvalues of the Mersenne-Twister transition matrix (Appendix A of the MT paper), we find 10.492. Perhaps this is again an issue with the limited precision of 64-bit floats.

Distribution of the eigenvalues of the Mersenne-Twister.

Below is the Mersenne-Twister polynomial, expressed in Julia code.

using DynamicPolynomials
import AMRVW
using Quadmath

@polyvar t
n = 624
m = 397
cp = DynamicPolynomials.Polynomial((t^n+t^m)*(t^(n-1)+t^(m-1))^31+(t^n+t^m)*(t^(n-1)+t^(m-1))^30+(t^n+t^m)*(t^(n-1)+t^(m-1))^29+(t^n+t^m)*(t^(n-1)+t^(m-1))^28+(t^n+t^m)*(t^(n-1)+t^(m-1))^27+(t^n+t^m)*(t^(n-1)+t^(m-1))^26+(t^n+t^m)*(t^(n-1)+t^(m-1))^24+(t^n+t^m)*(t^(n-1)+t^(m-1))^23+(t^n+t^m)*(t^(n-1)+t^(m-1))^18+(t^n+t^m)*(t^(n-1)+t^(m-1))^17+(t^n+t^m)*(t^(n-1)+t^(m-1))^15+(t^n+t^m)*(t^(n-1)+t^(m-1))^11+(t^n+t^m)*(t^(n-1)+t^(m-1))^6+(t^n+t^m)*(t^(n-1)+t^(m-1))^3+(t^n+t^m)*(t^(n-1)+t^(m-1))^2+1)

c = zeros(Float128,DynamicPolynomials.degree(terms(cp)[1])+1)
for te in terms(cp)
  c[DynamicPolynomials.degree(te)+1] = coefficient(te)
end
v128 = AMRVW.roots(c)
sum(x -> log(abs(x)),filter(x -> abs(x) > 1, v128))

Disaster Capitalism - Summer Reading Review

Several years ago, I read the book No Logo by Naomi Klein. I did not find it particularly good, but it did raise a valid concern overall. This summer I read The Shock Doctrine: The Rise of Disaster Capitalism. It suffers from some of the same flaws as No Logo, namely a lot of repetition of the same idea. Here, the underlying idea is that neoliberalism does not work in practice, and often ends up being some kind of corporatism. At the same time, it is suggested that some mild socialism is often much better for the people, although the latter is not backed by concrete examples in the book. The former is backed by numerous documents, and is analyzed across time and countries. It starts with Chile under Pinochet, the prototypical example where force is required to impose neoliberalism, then moves around South America in general, with some cases where strong inflation may be enough for the people to accept neoliberalism. Then it continues with China under Deng Xiaoping, which I find too much of a stretch to make a case about any kind of neoliberalism. Russia under Yeltsin is next, and it ends with the war in Iraq and the USA.

The most interesting chapters are probably the first one and the one about Iraq. The first chapter explains the creation of the shock therapy treatment by psychiatrists and how it morphed into a CIA “interrogation” technique. The chapter on Iraq explains in detail how people high up in the government reduced the public military staff and budget, and at the same time significantly increased the budget for contractors and external companies, which were closely linked to members of the government. It also makes you understand why it ended up being such a massive failure, even though it was presented as a Marshall plan for the Middle East by the American government.

The worst chapters are definitely the introduction and the conclusion. The introduction is just not interesting, and the conclusion says that things are becoming better for the socialists, with the changes in South America (Chavez, Ecuador, Bolivia), none of which has really stood the test of time since the book was written.

One annoying aspect is that Milton Friedman is often made out to be some sort of devil, and Jeffrey Sachs is portrayed during the first half as his acolyte, but then appears much more balanced when the author has an actual interview with him. The author however did not rewrite the previous chapters, so there is some inconsistency there.

More annoying is that no positive aspect of neoliberal ideas is presented, and socialism is often presented as a better alternative, without any proof. There are so many daily life examples that show where socialism is worse than liberalism. Recently, I had to contact a company for issues with my roof. The owner of this small company did not hesitate to say

“It is difficult to find people for this kind of job, because it is not always easy with the cold or the heat. People prefer to work at the city hall, where they are always three to do anything: one to carry the tools, one to watch, and one to actually do the work. In my company, we have to do everything alone”.

Another example that struck me recently is how bad the school books are. Although those are not written by public workers, they need some sort of approval from them, and it ends up being a very small circle who can actually have books accepted and distributed to schools. Only a few books are accepted and those will easily sell in quantities of 10K or more. In contrast, holiday study books for children, which parents are free to buy or not at any shop, are amazingly good. Indeed, if they were bad, almost nobody would buy them.

That being said, I don't think liberalism is always good and socialism always bad either. There is probably a delicate balance somewhere.

More on random number generators

My previous post described the recent view on random number generators, with a focus on the Mersenne-Twister war.

Since then, I have noticed another front in the war of the random number generators:

Also, I found it interesting that Monte-Carlo simulations run at the Los Alamos National Laboratory relied on a relatively simple linear congruential generator (LCG) producing 24- or 48-bit integers for at least 40 years. LCGs are today touted as some of the worst random number generators, exhibiting strong patterns in 2D projections. Also, the period chosen was very small by today's standards: 7E13.
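To make the structure of such a generator concrete, here is a minimal sketch of a 48-bit multiplicative LCG. The multiplier 5^19 and the modulus 2^48 are the parameters commonly quoted for the historical Los Alamos (MCNP) generator; I use them here as an assumption, not as a verified description of the production code. With an odd seed, a multiplicative LCG of this kind modulo 2^48 has a period of 2^46, which is consistent with the 7E13 figure.

```python
# Assumed parameters (commonly quoted for the historical MCNP generator).
MULTIPLIER = 5**19
MODULUS = 2**48
PERIOD = MODULUS // 4  # 2**46 for an odd seed and a multiplier equal to 5 modulo 8

def lcg_stream(seed, count):
    """Yield `count` uniforms in (0, 1) from x <- MULTIPLIER * x mod MODULUS."""
    state = seed % MODULUS
    if state % 2 == 0:
        raise ValueError("the seed must be odd to reach the full period")
    for _ in range(count):
        state = (MULTIPLIER * state) % MODULUS
        yield state / MODULUS

print(f"period = 2^46 = {PERIOD:.3e}")
print(list(lcg_stream(1, 3)))
```

The whole state is a single 48-bit integer, which explains both the speed of the generator and its short period.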

Regarding the conversion from integers to floating point numbers, I remember reading somewhere someone arguing for also generating numbers close to 0 (with the appropriate distribution), while most implementations just generate down to \(2^{-32}\) or \(2^{-53}\) (the latter being the machine epsilon). I see one major issue with the idea: if you stumble upon a tiny number (perhaps you're unlucky) like \(10^{-100}\), then it may significantly distort your result (for example if you call the inverse cumulative normal to generate normal numbers and calculate the mean), because your sample size may not be large enough to compensate. Perhaps for the same kind of reason, it may be better to use only 32 bits (or fewer). The consequence is that tail events are bound to be underestimated by computers. In a way this is similar to Sobol, which generates with a precision of \(2^{-L}\), for \(2^{L} - 1\) samples.
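The orders of magnitude involved can be made concrete with the inverse cumulative normal of Python's standard library. The sketch below compares the most extreme normal value reachable when the smallest uniform is \(2^{-32}\), \(2^{-53}\) or \(10^{-100}\), and the shift a single such draw induces on the mean of a sample; the sample size of 1000 is an arbitrary choice of mine.

```python
from statistics import NormalDist

inv_cdf = NormalDist().inv_cdf  # inverse cumulative normal distribution

def mean_shift(u, sample_size):
    """Contribution of a single uniform draw u to the mean of sample_size normals."""
    return inv_cdf(u) / sample_size

SAMPLE_SIZE = 1000  # arbitrary, for illustration
for label, u in [("2^-32", 2.0**-32), ("2^-53", 2.0**-53), ("1e-100", 1e-100)]:
    print(f"u = {label:7s} normal = {inv_cdf(u):8.3f} "
          f"mean shift = {mean_shift(u, SAMPLE_SIZE):.5f}")
```

The normal value is around -6 for \(2^{-32}\) and around -8 for \(2^{-53}\), but beyond -21 for \(10^{-100}\).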

Finally, some personal tests convinced me that a very long period, such as the one in MT, may not be a good idea, as, in the case of MT, the generator is then slower to recover from a bad state. For Well19937a (period of \(2^{19937}-1\)), it may take 5000 numbers and the excess-0 state is very pronounced. While this is way better than the old MersenneTwister, which requires more than 700 000 numbers to recover from the bad state (the newer dSFMT is also much better, although about twice as slow as Well19937a to recover), it may still be problematic in some cases. For example, if you are particularly unlucky, and pick a bad initial state (which may actually have good properties in terms of number of 0 bits and 1 bits) and your simulation is small (16K or even 64K numbers), there may be a visible impact of this excess-0 state on the simulation results. For Well1024a (period of \(2^{1024}-1\)), full bit balance recovery takes around 500 numbers and the excess-0 state is much milder, so as to be a non-issue really.
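One simple way to look at the excess-0 state is to measure the fraction of 1 bits over successive windows of outputs, and to count how many numbers are consumed before it comes back close to 0.5. The sketch below is a rough illustration of this kind of test, not the exact code I used; the class `BitBalance`, the window size and the tolerance are arbitrary choices. The JDK does not ship a Well19937a, so the example uses `java.util.Random` as a stand-in; any generator exposing a `nextInt()` method (for example the Well19937a of Apache Commons Math) could be plugged in instead.

```java
import java.util.Random;
import java.util.function.IntSupplier;

// Sketch of a bit-balance check on the 32-bit outputs of a generator.
final class BitBalance {
    // Fraction of 1 bits among the next `count` 32-bit outputs of `gen`.
    static double onesFraction(IntSupplier gen, int count) {
        long ones = 0;
        for (int i = 0; i < count; i++) {
            ones += Integer.bitCount(gen.getAsInt());
        }
        return ones / (32.0 * count);
    }

    // Number of outputs consumed before the first window of `window` outputs
    // whose fraction of 1 bits is within `tol` of 0.5, or -1 if no such
    // window is found within `maxOutputs` outputs.
    static long outputsUntilBalanced(IntSupplier gen, int window, double tol, long maxOutputs) {
        for (long used = 0; used + window <= maxOutputs; used += window) {
            if (Math.abs(onesFraction(gen, window) - 0.5) <= tol) {
                return used;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        Random stub = new Random(42L); // stand-in generator, not Well19937a
        System.out.println(outputsUntilBalanced(stub::nextInt, 100, 0.02, 1_000_000L));
    }
}
```

A single window close to 0.5 is of course a crude criterion; plotting the fraction window by window gives a better picture of the recovery.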

Example with a seed manufactured to go into the excess-0 state.

Below is an example of a (manufactured) bad seed for Well19937a, which leads to the excess-0 state after ~1000 numbers; the state then lasts for ~3000 numbers.

     int[] seed = { 1097019443, 321950666, -1456208324, -695055366, -776027098, 1991742627, 1792927970, 1868278530,
                456439811, 85545192, -1102958393, 1274926688, 876782718, -775511822, 1563069059, 1325885775, 1463966395,
                2088490152, 382793542, -2132079651, 1612448076, -1831549496, 1925428027, 2056711268, 108350926,
                1369323267, 149925491, 1803650776, 614382824, 2065025020, 1307415488, -535412012, -1628604277,
                1678678293, -516020113, -1021845340, -793066208, -802524305, -921860953, -1163555006, -1922239490,
                1767557906, -759319941, -245934768, 939732201, -455619338, 1110635951, -86428700, 1534787893,
                -283404203, 227231030, -313408533, 556636489, -673801666, 710168442, 870157845, 1109322330, -1059935576,
                -513162043, 1192536003, -1602508674, 1246446862, 1913473951, 1960859271, 782284340, 122481381,
                -562235323, 202010478, -221077141, -1910492242, -138670306, -2038651468, 664298925, -156597975,
                -48624791, 1658298950, 802966298, -85599391, -406693042, 1340575258, 1456716829, -1747179769,
                1499970781, 1626803166, -687792918, -1283063527, 733224784, 193833403, -230689121, 775703471, 808035556,
                337484408, -518187168, -2136806959, -2115195080, -2137532162, 873637610, 216187601, -477469664,
                -1324444679, 1339595692, 378607523, 2100214039, 701299050, -178243691, 1858430939, 1595015688,
                2139167840, 500034546, -1316251830, 1619225544, 1075598309, 1300570196, -327879940, 414752857,
                -145852840, -1287095704, 355046097, 886719800, -20251033, 1202484569, -96793140, 1846043325, 1192691985,
                928549445, 2049152139, -1431689398, 348315869, -1582112142, -1867019110, 808920631, -342499619,
                -1714951676, 279967346, 385626112, 416794895, -578394455, -1827493006, -2020649044, -396940876,
                937037281, -385129309, -1905687689, -526697401, -1362989274, 1111153207, 27104439, 115923124,
                -1759234934, 495392989, 1848408810, 655641704, 1484391560, 128171526, -91609018, 647891731, 1451120112,
                882107541, 1391795234, -1635408453, 936540423, 564583769, 379407298, -1829214977, 1416544842, 81232193,
                -936231221, 1193495035, 1076101894, 860381190, 728390389, -511922164, -1588243268, -142612440,
                1018644290, 292363137, 475075683, -2071023028, -1224051451, -891502122, 1575411974, -123928662,
                1080946339, 962151951, -1309758596, -558497752, -2126110624, -73575762, -2078269964, -676979806,
                -1165971705, 557833742, -828399554, -1023609625, -482198028, 1700021748, 25284256, -826748852,
                -2139877059, -1280388862, -1521749976, 738911852, -1676794665, -1595369910, -748407377, -709662760,
                680897802, 2094081, -1889225549, -1101409768, -1620191266, 506408464, 1833777989, 244154307,
                -1406840497, -860371799, 1337820797, 614831742, 1965416365, 2044401180, -459642558, -339576217,
                -1599807697, -689958382, 1544444702, 872938368, 871179645, -957732397, 958439335, -770544793,
                -1363785888, -1484683703, 2021823060, -1871739595, -1355536561, -926333946, -1552155978, -171673777,
                993986110, -727417527, 1065139863, 517970706, -453434939, -424362471, 1823459056, -48408572, 863024600,
                190046576, 90264753, 1667010014, -529079929, -1269908431, -2073435303, -1123302722, -1986096205,
                -173411290, -693808986, -1618071944, 990740121, 2120678917, -203702980, -1186456799, -776433190,
                172239859, 126482680, 2048550654, 266718714, 913094204, -937686511, -2096719726, 627687384, 533376951,
                -1413352057, 1900628390, -244457985, 896712029, -1232645079, 1109406070, 1857772786, 86662738,
                -488754308, 360849611, 1187200060, -341213227, 1705204161, -121052077, 1122608367, 2118749875,
                243072462, 204425155, 1386222650, 2037519370, 93424131, -785650065, 45913153, -448515509, -1312863705,
                -834086187, -2101474931, 1478985081, 1288703145, -1705562554, -1758416930, 1440392126, 1783362885,
                279032867, -610479214, 223124643, -367215836, 2140908029, -780932174, 581404379, -1741002899,
                2035577655, -1060511248, 1765488586, -380048770, 1175692479, -1645645388, 1865881815, 2052353285,
                -492798850, -1250604575, -2077294162, 1768141964, 1457680051, -141958370, -1333097647, -285257998,
                -2063867587, 1338868565, -304572592, -1272025276, 1687010269, -1301492878, -931017010, -1303123181,
                -1963883357, 1920647644, 2009096326, 2094563567, 1137063943, -1003295201, -382759268, 1879016739,
                -153929025, -1008981939, -646846913, 1209637755, 1560292706, 725377476, -1457854811, 264360697,
                -197926409, -908579207, -894726681, 194950082, -1631939812, 1620763228, -659722026, 208285727,
                1389336301, -1900616308, 1690406628, 1688632068, -717888847, -1202067733, -2039964596, 1885630763,
                475497380, -488949843, -1679189364, -1358405375, 2132723, -1164703873, -1727721852, 1747612544,
                -885752188, -1450470713, 791640674, 996275741, 397386006, -1977069145, -1841011156, -431458913,
                47865163, 1200765705, 1962743423, 1933688124, -1165500082, -1969953200, 597796878, 1379082884,
                -737292673, 1776141019, 1882257528, -991388501, -1357999809, 497686068, 314237824, -882469634,
                2142408833, -1624234776, -292985482, -412114618, 380982413, -1351123340, 1799246791, 491394003,
                496521378, 1074735076, 1131599274, -1379708840, -256028322, 118705543, 58715272, -449189848, 35299724,
                -1440805390, -893785929, 217256482, 640658194, -1786418454, 1111743603, -2027083091, 2022760758,
                -1001437881, -202791246, 636755388, 1243592208, 1858140407, 1909306942, 1350401794, 188044116,
                1740393120, -2013242769, 207311671, 1861876658, -962016288, -865105271, -15675046, -1273011788, 9226838,
                906253170, -1561651292, -300491515, -409022139, 611623625, 1529503331, 943193131, -1180448561, 88712879,
                1630557185, -17136268, -1208615326, 428239158, 256807260, -918201512, 2022301052, -1365374556,
                -877812100, 2029921285, -1949144213, 2053000545, -563019122, 224422509, 741141734, -1881066890,
                -280201419, 1959981692, 302762817, 477313942, 358330821, -1944532523, -980437107, -1520441951,
                -613267979, -1540746690, -1180123782, -1604767026, 1407644227, -926603589, 1418723393, 2045743273,
                -309117167, 949946922, -105868551, -487483019, 1715251004, -221593655, 2116115055, -1676820052,
                394918360, -2111378352, 1723004967, -224939951, -730823623, -200901038, -2133041681, 1627616686,
                -637758336, -1423029387, 1400407571, 861573924, 1521965068, -614045374, 412378545, 2056842579,
                -225546161, 1660341981, 1707828405, -513776239, -115981255, -1996145379, -2009573356, 44694054,
                616913659, 1268484348, -980797111, -464314672, 1545467677, 174095876, -1260470858, 1508450002,
                1730695676, -613360716, 2086321364, -144957473, 202989102, 54793305, -1011767525, 2017450362,
                -761618523, 1572980186, -138358580, 1111304359, 1367056877, 1231098679, 2088262724, 1767697297,
                -921727838, 1743091870, 974339502, 1512597341, -1908845304, 1632152668, -987957372, 1394083911,
                433477830, 579364091, -27455347, -772772319, -478108249, 641973067, -1629332352, 1599105133, 1191519125,
                862581799, -850973024, -188136014, -398642147, 513836556, 1899961764, 2110036944, 512068782,
                -1988800041, -2054857386, 321551840, -1717823978, -1311127543, 373759091, 71650043, 565005405,
                1033674609, 1344695234, 709315126, 1711256293, -1226183001, -1451283945, 628494029, 1635747262,
                -689919247, 1091991202, 1283978365, 749078685, 1987661236, 1992010052, -2003794364, 2099683431,
                267011343, -1326783466, 678839392, -312043613, 1565061780, 178873340, -719911279, -314555472,
                -231514590, 161027711, 1080368165, 1660461722, -337050383, 399572447, -1555785489, -1502682588,
                2143158964, 592925741, -980213649, -724779906, 395465301, 635561967, 700445106, 1198493979, 1707436053,
                149364933, -1767142986, 1950272542, -819076405, 687992680, 1960992977, 1342528780, -2110840904,
                340172712, -486861654 };

The war of the random number generators

These days, there seems to be some sort of small war to define which modern random number generator is good enough to recommend for simulations. Historically, the Mersenne-Twister (MT hereafter) won this war. It is used by default in many scientific libraries and software packages, even though there have been a few issues with it:

  1. A bad initial seed may make it generate a sequence of low quality for at least as many as 700K numbers.
  2. It is slow to jump-ahead, making parallelization not so practical.
  3. It fails some TestU01 Bigcrush tests, mostly related to the F2 linear algebra, the algebra of the Mersenne-Twister.

It turns out that before MT (1997), a lot of the alternatives were much worse, except, perhaps, RANLUX (1993), which is quite slow because it needs to skip many points of the generated sequence.

The first issue has been mostly solved by more modern variants such as MT-64, Well19937a or Memt19937. The warm-up needed has thus been significantly shortened, and a better seed initialization is present in those algorithms. It is not clear however that it has been fully solved, as there are very few studies analyzing the quality with many different seeds; I found only a summary of one test, p.7 of this presentation.

The second issue may be partly solved by considering a shorter period, such as in the Well1024 generator.

The third issue may or may not be a real issue in practice. Those tests can be seen as tailored to make MT (and F2 algebra based generators) fail, and as not all that practical. However, Vigna exposes the problem on some more concrete examples in his recent paper, which has the provocative title “It Is High Time We Let Go Of The Mersenne Twister”. Before that paper, Vigna and his arch-enemy O’Neill regularly advised letting go of the Mersenne-Twister and using a generator they created instead. For Vigna, the generator is some variant of xorshift, the most recent being xoshiro256**, and for O’Neill, it is one of her numerous PCG algorithms. In both cases, a flurry of generators is proposed, and it seems that a lot of them are poor (Vigna strongly criticizes PCG on his personal page; O’Neill does something similar against xorshift variants, sometimes with the help of Lemire). The recommended choice for each has evolved over the years. For a reader or a user, it then looks like both are risky/unproven. The authors of MT recently also added their own opinion on a specific xorshift variant (xorshift128+), with their papers “Again, random numbers fall mainly in the planes: xorshift128+ generators” and “Pseudo random number generators: attention for a newly proposed generator”. An important insight of the latter paper is that it is not enough for a generator to pass a good test suite like BigCrush to be good.

So what is recommended then?

A good read on the subject is another paper, with the title “Pseudorandom Number Generation: Impossibility and Compromise”, also from the MT authors, explaining the challenge of defining what a good random number generator is. It ignores however the MRG family of generators studied by L’Ecuyer, whose MRG32k3a is also relatively widely used, and has been around for a while now without any obvious defect being proven against it (good track record). This generator has a relatively fast jump-ahead, which is one of the reasons why it regained popularity with the advent of GPUs, and it does not fail TestU01 BigCrush. It is a bit slower than MT, but not by much, especially with this implementation from Vigna (3x faster than the original double based implementation).
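For reference, the MRG32k3a recurrence is short enough to be written in a few lines. The following is my own minimal sketch with integer arithmetic on `long`, using the constants published by L’Ecuyer; it is not the implementation from Vigna mentioned above, and the simplistic seeding (the same value copied in the six state words) is an assumption made for brevity.

```java
// Minimal sketch of MRG32k3a with 64-bit integer arithmetic.
final class Mrg32k3a {
    static final long M1 = 4294967087L;  // 2^32 - 209
    static final long M2 = 4294944443L;  // 2^32 - 22853
    static final long A12 = 1403580L;
    static final long A13N = 810728L;
    static final long A21 = 527612L;
    static final long A23N = 1370589L;
    static final double NORM = 1.0 / (M1 + 1);

    private long s10, s11, s12, s20, s21, s22;

    // Simplistic seeding (assumption): requires 0 < seed < M2.
    Mrg32k3a(long seed) {
        if (seed <= 0 || seed >= M2) {
            throw new IllegalArgumentException("seed must be in (0, M2)");
        }
        s10 = s11 = s12 = s20 = s21 = s22 = seed;
    }

    double nextDouble() {
        // First component: x_n = (A12 * x_{n-2} - A13N * x_{n-3}) mod M1.
        long p1 = Math.floorMod(A12 * s11 - A13N * s10, M1);
        s10 = s11; s11 = s12; s12 = p1;
        // Second component: y_n = (A21 * y_{n-1} - A23N * y_{n-3}) mod M2.
        long p2 = Math.floorMod(A21 * s22 - A23N * s20, M2);
        s20 = s21; s21 = s22; s22 = p2;
        // Combination: the difference is mapped to [1, M1], hence a result in (0, 1).
        long diff = p1 > p2 ? p1 - p2 : p1 - p2 + M1;
        return diff * NORM;
    }

    public static void main(String[] args) {
        Mrg32k3a rng = new Mrg32k3a(12345L);
        for (int i = 0; i < 5; i++) {
            System.out.println(rng.nextDouble());
        }
    }
}
```

The products stay well below \(2^{63}\), so no overflow can occur, and `Math.floorMod` keeps each component in the non-negative range.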

There are not many studies on block based crypto-generators such as AES or ChaCha for simulation. They have become a bit more trendy (thanks to Salmon’s paper), as they are trivial to use in a parallel Monte-Carlo simulation (the jump-ahead is as fast as the generation of one number). In theory the uniformity should be good, since otherwise it would be a channel of attack.

The conclusion of the presentation referenced earlier in this post is also very relevant:

  • use the best sequential generators (i.e. MT, MRG32k3a or some Well),
  • test the stochastic variability by changing generator,
  • do not parallelize by inputting different seeds (prefer a jump-ahead or a tested substream approach).
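To illustrate the last point, here is a sketch of a jump-ahead on the simplest possible case, a multiplicative LCG modulo \(2^{48}\), where jumping \(n\) steps only requires computing \(a^n\) by repeated squaring. The multiplier \(5^{19}\) is an assumption chosen for the example, and the class `LcgJump` is hypothetical. For MT or MRG32k3a the principle is the same, but the jump involves matrix or polynomial arithmetic instead of a scalar power.

```java
// Sketch of jump-ahead and substreams for s_{n+1} = A * s_n mod 2^48.
final class LcgJump {
    static final long A = 19073486328125L;   // 5^19 (assumed multiplier)
    static final long MASK = (1L << 48) - 1; // modulus 2^48 expressed as a bit mask

    // One sequential step of the generator.
    static long step(long s) {
        return (s * A) & MASK;
    }

    // a^n mod 2^48 by square-and-multiply, in O(log n) multiplications.
    static long powMod48(long a, long n) {
        long result = 1L;
        long base = a & MASK;
        while (n > 0) {
            if ((n & 1L) == 1L) {
                result = (result * base) & MASK;
            }
            base = (base * base) & MASK;
            n >>>= 1;
        }
        return result;
    }

    // State reached after n sequential steps from s, without doing the steps.
    static long jump(long s, long n) {
        return (s * powMod48(A, n)) & MASK;
    }

    // Start states of `streams` consecutive substreams of length `stride`.
    // They do not overlap as long as streams * stride stays below the period.
    static long[] substreamStarts(long seed, int streams, long stride) {
        long[] starts = new long[streams];
        long jumpMultiplier = powMod48(A, stride);
        long s = seed & MASK;
        for (int i = 0; i < streams; i++) {
            starts[i] = s;
            s = (s * jumpMultiplier) & MASK;
        }
        return starts;
    }

    public static void main(String[] args) {
        // Four workers, each owning a block of one million numbers.
        for (long start : substreamStarts(1L, 4, 1_000_000L)) {
            System.out.println(start);
        }
    }
}
```

Each worker then starts from its own state and consumes at most `stride` numbers, so that the parallel run uses exactly the same sequence as a sequential run would.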
