Sobol with 64-bit integers

A while ago, I wondered how to make an implementation of Sobol support 64-bit integers (long) and double-precision floating-point numbers. Sobol is the most widely used quasi-random number generator (QRNG) for (quasi) Monte-Carlo simulations.

The standard Sobol algorithms are all coded with 32-bit integers and produce double floating-point numbers that cannot be smaller than \( 2^{-31} \). I was recently looking back at the internals of Sobol generators, and noticed that generating with 64-bit integers would not help much.

A key to understanding why is to analyze exactly what output Sobol generates. If one asks for a sequence of N numbers (in dimension D), each output will be a multiple of \( 2^{-L} \), where L is the base-2 logarithm of N (rounded up to an integer). The 32 bits only become useful when \( N > 2^{31} \). An implementation with 64-bit integers only becomes useful if we query an extremely long sequence (longer than 1 billion numbers), which is not all that practical in reality. Furthermore, the direction numbers would then require double the amount of memory (which may be relatively large for 20K dimensions).
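
This is easy to check with scipy (a small sketch, assuming scipy ≥ 1.7 for the qmc module): the first \( 2^m \) points of the one-dimensional sequence are exactly the multiples of \( 2^{-m} \).

    import numpy as np
    from scipy.stats import qmc

    m = 3
    points = qmc.Sobol(d=1, scramble=False).random_base2(m)   # first 2^3 = 8 points
    print(np.sort(points.ravel()))                            # 0, 1/8, 2/8, ..., 7/8
    print(np.all(points * 2**m == np.floor(points * 2**m)))   # True: all multiples of 2^-m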

Another interesting detail I learnt recently is to avoid skipping the first point when scrambling (or randomization) is applied, as per this article from Owen (2020). In a non-randomized, non-scrambled setting, we typically skip the first point because it is 0, and 0 is often problematic, for example when we need to apply the inverse cumulative distribution function to simulate a specific distribution. What I find slightly surprising is that there is no symmetry between 1 and 0: the point (1, 1) is never generated by a two-dimensional Sobol generator, but (0, 0) is the first point. If we apply some sort of inverse cumulative distribution function (and no scrambling), the result would then be skewed towards negative infinity.
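
A quick illustration of the asymmetry, again with scipy's qmc module: without scrambling, the first point is exactly 0 and maps to \( -\infty \) under the normal inverse cumulative distribution function, while with scrambling the first point is (almost surely) not 0 and should be kept.

    from scipy.stats import norm, qmc

    plain = qmc.Sobol(d=2, scramble=False).random_base2(m=4)
    print(plain[0], norm.ppf(plain[0]))              # [0. 0.] -> [-inf -inf]

    scrambled = qmc.Sobol(d=2, scramble=True, seed=123).random_base2(m=4)
    print(scrambled[0], norm.ppf(scrambled[0]))      # a regular point, finite quantiles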

Intel failure and the future of computing

What has been happening to the INTC stock today may be revealing of the future. The stock dropped more than 16%, mainly because Intel announced that their 7nm process does not work (well) and that they may rely on an external foundry for their processors. Initially, in 2015, they thought they would have an 8nm process by 2017, and 7nm by 2018. They are more than 3 years late.

Intel used to be a leader in the manufacturing process for microprocessors. While the company has its share of internal problems, it may also be that we are starting to hit a barrier, where it becomes very difficult to improve on existing processes. The end of Moore’s law has been announced many times; it was already a subject 20 years ago. Today, it may be real, if there is only a single company (TSMC) capable of manufacturing processors with a 5nm process.

It will be interesting to see what this means for software in general. I suppose clever optimizations and good parallelization may start playing a much more important role. Perhaps we will also see more enthusiasm for specialized processors, somewhat similar to GPUs and neural network processors.

March 9, 2020 crash - where will CAC40 go?

The stock market crashed by more than 7% on March 9, 2020. It is one of the largest drops since September 2001. I looked at BNP warrant prices on the French CAC40 index, with a maturity of March 20, 2020, to see what they would tell us about the market direction on the day of the crash. This is really a not-so-scientific experiment.

The quotes I got were quite noisy. I applied a few different techniques to imply the probability density from the option prices:

  • The one-step Andreasen-Huge with some regularization. The regularization is a bit too mild here, and yet, in terms of implied vol, this was the worst fit among the techniques.
  • The stochastic collocation onto a septic polynomial; no regularization is needed here. The error in implied volatilities is similar to Andreasen-Huge, even though the implied density is much smoother. I however discovered a small issue with the default optimal monotonic fit, and had to tweak the optimal polynomial a little; more on this later in this post.
  • Some RBF collocation of the implied vols. Best fit, with regularization, and a very smooth density, which however becomes negative at high strikes.
  • Some experimental Andreasen-Huge like technique, with a minimal grid and good regularization.

Implied probability density for March 20, CAC40 warrants.

The implied forward price was around 4745.5. Interestingly, this corresponds to the first small peak visible on the blue and red plots. The strongest peak is located a little beyond 5000, which could mean that the market believes the index will go back up above 5000. This does not mean that you should buy, however, as there are other, more complex explanations. For example, the peak could be a latent phenomenon related to the previous days.

Now here is the same plot with the “raw” optimal collocation polynomial:

Implied probability density for March 20, CAC40 warrants with raw optimal polynomial collocation.

Notice the spurious peak. In terms of implied volatility, this corresponds to a strange, unnatural angle in the smile. The reason for this unnatural peak lies in the details of the stochastic collocation technique: the polynomial we fit is monotonic, but ends up with a slope close to zero at some point in order to better fit the data. If the slope is exactly zero, there is a discontinuity in the density. Here, the very low slope translates into the peak. The fix is simply to impose a floor on the slope (although it may not be obvious in advance how large this floor should be).
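
To see why a near-zero slope produces a peak: the stochastic collocation represents the price as \(F = g(Z)\), where \(Z\) is a standard normal variable and \(g\) is the monotonic collocation polynomial, so the implied density reads

$$ f(x) = \frac{\phi\left(g^{-1}(x)\right)}{g'\left(g^{-1}(x)\right)}\,, $$

which blows up wherever \(g'\) gets close to zero, and becomes discontinuous if \(g'\) vanishes.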

Collocation polynomials for March 20, CAC40 warrants.

And for the curious, here are the implied vol plots:

Implied vol fits for March 20, CAC40 warrants.

42

Today my 6-year-old son came home with some math homework. The stated goal was to learn the different ways to make 10 out of smaller numbers. I was impressed. Immediately, I wondered

how many ways are there to make 10 out of smaller numbers?

This is one of the beauties of maths: a very simple problem, which a 6-year-old can understand, may actually be quite fundamental. If you want to solve this in the general case, for any number instead of 10, you end up with the partition function. And in order to compute it, you will probably learn recurrence relations. So what is the answer for 10?

42

Then I looked at one exercise they did in class, which was simply to find the different ways to pay 10 euros with bills of 10 and 5 euros and coins of 2 and 1 euro.
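
Both counts are small enough to compute with the classic counting recurrence over the allowed parts; here is a quick python sketch (a toy illustration, certainly not a homework solution):

    def count_ways(total, parts):
        # ways[v] = number of ways to write v as a sum of the allowed parts,
        # ignoring the order of the parts (the usual coin-change recurrence)
        ways = [1] + [0] * total
        for p in parts:
            for v in range(p, total + 1):
                ways[v] += ways[v - p]
        return ways[total]

    print(count_ways(10, range(1, 11)))   # the partition number p(10) = 42
    print(count_ways(10, [1, 2, 5, 10]))  # ways to pay 10 euros with 1, 2, 5 and 10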

Numba, Pypy Overrated?

Many benchmarks show impressive performance gains with the use of Numba or Pypy. Numba allows compiling specific methods just-in-time, while Pypy takes the approach of compiling/optimizing the full python program: you use it just like the standard python runtime. From those benchmarks, I imagined that those tools would easily improve the performance of my 2D Heston PDE solver. The initialization part of my program contains nested for loops over several tens of thousands of elements. To my surprise, numba did not improve anything (and I had to isolate the code, as it would not work on the vectorized 2D numpy array manipulations). I surmise it does not play well with scipy sparse matrices. Pypy did not behave better: the solver actually became slower than with the standard python interpreter, up to twice as slow, for example, in the main solver loop, which only does matrix multiplications and LU solves of sparse systems. I did not necessarily expect any performance improvement in this specific loop, since it only consists of a few calls to expensive scipy routines. But I did not expect a 2x performance drop either.
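
For what it is worth, the pattern that seems to work with numba is to keep the jitted loops on plain numpy arrays and to assemble the scipy sparse matrix outside of the jitted code. A rough sketch (a toy one-dimensional stencil repeated over a 2D grid, not my actual Heston operator):

    import numpy as np
    from numba import njit
    from scipy import sparse

    @njit(cache=True)
    def build_coo(nx, ny):
        # nested loops over the grid nodes, using plain numpy arrays only,
        # since numba does not understand scipy.sparse objects
        n = nx * ny
        rows = np.empty(3 * n, dtype=np.int64)
        cols = np.empty(3 * n, dtype=np.int64)
        vals = np.empty(3 * n)
        k = 0
        for j in range(ny):
            for i in range(nx):
                idx = j * nx + i
                rows[k] = idx; cols[k] = idx; vals[k] = -2.0; k += 1
                if i > 0:
                    rows[k] = idx; cols[k] = idx - 1; vals[k] = 1.0; k += 1
                if i < nx - 1:
                    rows[k] = idx; cols[k] = idx + 1; vals[k] = 1.0; k += 1
        return rows[:k], cols[:k], vals[:k]

    rows, cols, vals = build_coo(200, 100)
    A = sparse.coo_matrix((vals, (rows, cols)), shape=(200 * 100, 200 * 100)).tocsr()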

While I am sure that there are good use cases, especially for numba, I was a bit disappointed that it would likely require significant changes to my code to have any effect (possibly not doing the initialization with scipy sparse matrices, but then it is not so clear how). Also, I find most benchmarks dumb. For example, the 2D ODE solver of the latter uses a simple dense 2D numpy array. On real code, things are not as simple.

Next, I should try Julia again out of curiosity. I tried it several years ago in the context of a simple Monte-Carlo simulation, and my experiment at the time was not all that encouraging, but it may be much better suited to building PDE solvers, and porting my python code should not be too much work. I am curious to see whether it ends up being faster, and whether the accessible LU solvers are any good. If Julia delivers, it is a bit of a shame for python, since python has so many excellent, high-quality numerical libraries. Also, when I look back at my old Monte-Carlo test, I bet it would benefit greatly from numba, somewhat paradoxically compared to my current experiment.

About

I just moved my blog to a static website created with Hugo; I explain the reasons why here. You can find more about me on my linkedin profile. In addition, you might find the following interesting:

Fixing NaNs in Quadprog

Out of curiosity, I tried quadprog as an open-source quadratic programming convex optimizer, as it looks fast and the code stays relatively simple. I however stumbled on cases where the algorithm would return NaNs even though my inputs seemed straightforward. Other libraries such as CVXOPT did not have any issue with those inputs.

Searching on the web, I found that I was not the only one to stumble on this kind of issue with quadprog. In particular, in 2014, Benjamen Tyner gave a simple example in R where solve.QP returns NaNs while the input is very simple: an identity matrix with small perturbations off the diagonal. Here is a copy of his example:

    library(quadprog)

    n <- 66L

    set.seed(6860)
    X <- matrix(1e-20, n, n)  # near-identity matrix: tiny perturbations off the diagonal
    diag(X) <- 1
    Dmat <- crossprod(X)      # quadratic term t(X) %*% X
    y <- seq_len(n)
    dvec <- crossprod(X, y)   # linear term t(X) %*% y

    Amat <- diag(n)
    bvec <- y + runif(n)      # all n constraints are treated as equalities (meq = n)

    sol <- solve.QP(Dmat, dvec, Amat, bvec, meq = n)

    print(sol$solution) # this gives all NaNs

Other people stumbled on similar issues.

In my specific case, I was able to debug the quadprog algorithm and find the root cause: two variables \(g_c\) and \(g_s\) can become very small, and their squares become essentially zero, creating a division by zero. If, instead of computing \( \frac{g_s^2}{g_c^2} \), we compute \( \left(\frac{g_s}{g_c}\right)^2 \), then the division by zero is avoided, as the two variables are of the same order.
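
The underflow is easy to reproduce with plain IEEE double precision numbers (a toy illustration of the issue, not the actual quadprog code):

    import numpy as np

    gs, gc = np.float64(1e-200), np.float64(2e-200)  # tiny, but of the same order of magnitude
    print(gs**2 / gc**2)   # nan (with a runtime warning): both squares underflow to 0.0
    print((gs / gc)**2)    # 0.25: the ratio is well defined, and squaring it is safe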

Sample code change [on github](https://github.com/cran/quadprog/pull/1/commits/7f51915f7c662c7fac3d4e2ab067cfbc292767f8).

While it probably does not cover all the possible NaN cases, it did fix all the cases I stumbled upon.

On the Probability of a Netflix Stock Crash

This is a follow-up to my previous post, where I explored the probability of a TSLA stock crash, reproducing the results of Timothy Klassen.

According to the implied cumulative distribution, TSLA has around a 15% chance of crashing below $100. Is this really high compared to other stocks? Or is the interpretation of the data erroneous?

Here I take a look at NFLX (Netflix). Below is the implied volatility according to three different models.

Black volatility implied from Jan 2020 NFLX options.

Again, the shape is not too difficult to fit, and all models give nearly the same fit, within the bid-ask spread as long as we include an inverse relative bid-ask spread weighting in the calibration.
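
By inverse relative bid-ask spread weighting, I mean weights roughly proportional to the inverse of the relative quote spread, for example (when working directly in implied volatilities)

$$ w_i = \frac{\sigma_{\mathrm{mid},i}}{\sigma_{\mathrm{ask},i} - \sigma_{\mathrm{bid},i}}\,, $$

so that tight quotes count more in the calibration objective, and wide, unreliable quotes count less.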

The corresponding implied cumulative density is

Cumulative density implied from Jan 2020 NFLX options.

More explicitly, we obtain

| Model | Probability of NFLX < $130 |
|---|---|
| Collocation | 3.3% |
| Lognormal Mixture | 3.6% |
| Andreasen-Huge regularized | 3.7% |

The probability of NFLX moving below $130 is significantly smaller than the probability of TSLA moving below $100 (both levels are around one third of the respective spot prices).

For completeness, the implied probability density is

Probability density implied from Jan 2020 NFLX options.

I did not particularly optimize the regularization constant here. The lognormal mixture can have a tendency to produce too-large bumps in the density at specific strikes, suggesting that it could also benefit from some regularization.

What about risky stocks like SNAP (Snapchat)? It is more challenging to imply the density for SNAP, as there are far fewer traded options on the NASDAQ: fewer strikes, and lower volume. Below is an attempt.

Black volatility implied from Jan 2020 SNAP options.

Cumulative density implied from Jan 2020 SNAP options.

We find a lower probability of moving below one third of the current price than with TSLA: around 12% for SNAP vs 15% for TSLA.

On the Probability of a TSLA Stock Crash

Timothy Klassen had an interesting post on linkedin recently, with the title “the options market thinks there is a 16% chance that Tesla will not exist in January 2020”. As I was also recently looking at the TSLA options, I was a bit intrigued. I looked at the option chain on July 10th, and implied the European volatility from the American option prices. I then fit a few of my favorite models: Andreasen-Huge with Tikhonov regularization, the lognormal mixture, and a polynomial collocation of degree 7. This results in the following graph

Black volatility implied from Jan 2020 TSLA options.

The shape is not too difficult to fit, and all models give nearly the same fit, within the bid-ask spread, as long as we include an inverse relative bid-ask spread weighting in the calibration. Note that I removed the quote for strike 30, as its bid-ask spread was unusually large and would only introduce noise.

Like Timothy Klassen, we can look at the implied cumulative density of each model.
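
Recall that, in terms of undiscounted call prices \(C(K) = E\left[(F_T-K)^+\right]\), the implied cumulative distribution and density are obtained by differentiating with respect to the strike:

$$ P(F_T \leq K) = 1 + \frac{\partial C}{\partial K}(K)\,, \qquad p(K) = \frac{\partial^2 C}{\partial K^2}(K)\,. $$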

Cumulative density implied from Jan 2020 TSLA options.

More explicitly, we obtain

| Model | Probability of TSLA < $100 | Probability of TSLA < $15 |
|---|---|---|
| Collocation | 15.4% | 7.2% |
| Lognormal Mixture | 15.2% | 8.0% |
| Andreasen-Huge regularized | 15.0% | 7.7% |

It would have been great if Timothy Klassen had shown the implied density as well, in the spirit of my earlier posts on a similar subject.

Probability density implied from Jan 2020 TSLA options.

In particular, the choice of model has a much stronger impact on the implied density. Even though Andreasen-Huge has more modes, its fit in terms of volatilities is not better than the lognormal mixture model. We could reduce the number of modes by increasing the constant of the Tikhonov regularization, at the cost of a slightly worse fit.

The collocation produces a significantly simpler shape. This is not necessarily a drawback, since the fit is quite good and more importantly, it does not have a tendency to overfit (unlike the two other models considered).

| Model | Weighted root mean square error in vols |
|---|---|
| Collocation | 0.00484 |
| Lognormal Mixture | 0.00432 |
| Andreasen-Huge regularized | 0.00435 |

The most interesting part of T. Klassen’s plot is the comparison with the stock price across time. It is expected that the probability of going under $100 will be larger when the stock price moves down, which is what happened around March 27, 2018. But it is less expected that when the stock comes back up higher (above $350 in mid-June), the probability of going under $100 stays higher than it was before March 27, 2018, when the stock price was actually lower. In fact, there seems to be a signal in his time series, where the probability of going under $100 increases significantly one week before March 27, that is, one week before the actual drop, suggesting that the drop was priced in the options.

The Fourth Moment of the Normal SABR Model

I was wondering if I could use the SABR moments to calibrate a model to SABR parameters directly. It turns out that the SABR moments have relatively simple expressions when \(\beta=0\), that is, for the normal SABR model (with no absorption). This is for the pure SABR stochastic volatility model, not the Hagan approximation. For the Hagan approximation, we would need to use the replication by vanilla options to compute the moments.

I first saw a simple derivation of the fourth moment in Xavier Charvet and Yann Ticot’s 2011 paper Pricing with a smile: an approach using Normal Inverse Gaussian distributions with a SABR-like parameterisation. I then discovered a more recent (2013) paper from Lorella Fatone et al., The use of statistical tests to calibrate the normal SABR model with simple formulae for the third and fourth moments (sci-hub.nu is your friend to read this). I had actually derived the third moment myself previously, based on the straightforward approach from Xavier Charvet, which consists merely in applying the Ito-Doeblin formula several times. Funnily, Xavier Charvet and Yann Ticot make it Irish by calling it the Ito-Dublin formula.

Now, the two papers lead to different formulae for the fourth moment. I did not have the courage to find out where exactly the mistake of Lorella Fatone et al. lies, as their paper is much more abstract and follows a somewhat convoluted path to derive the formula. But I could not find a mistake in Xavier Charvet and Yann Ticot’s paper. They do not directly give the fourth moment, but it is trivial to derive the centered fourth moment from their formulae. By centered, I mean the fourth moment of the process \(F(t)-F(0)\), where \(F(0)\) is the initial forward price.

$$ \frac{A}{\nu^2}\left(e^{6\nu^2 t}-1\right)+2\frac{B}{\nu^2}\left(e^{3\nu^2 t}-1\right)+6\frac{C}{\nu^2}\left(e^{\nu^2 t}-1\right) $$
with
$$ A = \frac{\alpha^4 (1+ 4\rho^2)}{5 \nu^2}\,,\qquad B = -\frac{2 \rho^2 \alpha^4}{\nu^2}\,,\qquad C = - A - B\,. $$

Fatone et al. include a factor \(\frac{1}{3}\) in front of \(\rho^2\), which I believe is a mistake. There is however no error in their third moment.
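
As a sanity check, the formula above is easy to compare against a brute-force Monte-Carlo simulation of the normal SABR dynamics \(dF = \alpha_t dW_1\), \(d\alpha_t = \nu \alpha_t dW_2\), with \(d\langle W_1, W_2\rangle = \rho\, dt\). A rough python sketch (the parameters are arbitrary):

    import numpy as np

    def sabr_normal_m4(alpha, nu, rho, t):
        # centered fourth moment E[(F(t)-F(0))^4] from the formula above
        A = alpha**4 * (1.0 + 4.0 * rho**2) / (5.0 * nu**2)
        B = -2.0 * rho**2 * alpha**4 / nu**2
        C = -A - B
        return (A / nu**2 * (np.exp(6 * nu**2 * t) - 1)
                + 2 * B / nu**2 * (np.exp(3 * nu**2 * t) - 1)
                + 6 * C / nu**2 * (np.exp(nu**2 * t) - 1))

    def sabr_normal_m4_mc(alpha, nu, rho, t, n_paths=200_000, n_steps=500, seed=42):
        # Euler step for F, exact lognormal step for the volatility, correlated increments
        rng = np.random.default_rng(seed)
        dt = t / n_steps
        f = np.zeros(n_paths)
        vol = np.full(n_paths, alpha)
        for _ in range(n_steps):
            z1 = rng.standard_normal(n_paths)
            z2 = rho * z1 + np.sqrt(1.0 - rho**2) * rng.standard_normal(n_paths)
            f += vol * np.sqrt(dt) * z1
            vol *= np.exp(-0.5 * nu**2 * dt + nu * np.sqrt(dt) * z2)
        return np.mean(f**4)

    alpha, nu, rho, t = 0.01, 0.4, -0.3, 1.0
    print(sabr_normal_m4(alpha, nu, rho, t))
    print(sabr_normal_m4_mc(alpha, nu, rho, t))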

Unfortunately, my quick mapping idea does not appear to work so well for long maturities.
