Choi, Huh and Su have a very good paper entitled Tighter uniform bounds for Black–Scholes implied volatility and the applications to root-finding. What’s particularly great is that it gives both a decent lower bound and a proof a monotone convergence using Newton’s method starting from this lower bound.
The industry standard for solving the Black-Scholes implied volatility is Peter Jäckel Let’s be rational. I was wondering if one could not create a robust mathematically backed solver at least as fast and accurate as Jäckel’s algorithm.
It turns out that it is not so difficult to extend the monotone convergence proof to the Euler-Chebyshev method on the logarithm of normalized price. This is however not numerically robust enough when the normalized call option price (c) is close to 1: c is between 0 and 1, and when close to 1, the misbehaves numerically due to the limited floating point accuracy. It is not a convergence issue, but a true floating point arithmetic issue. A remedy is to switch to the log of the complementary normalized price (1-c). But then the Euler-Chebyshev method is not monotone on it. Fortunately, it turns out that Halley’s method is monotone on the logarithm of the complementary normalized price. Overall, when combined, the methods lead to a much faster pricing: 3 iterations are enough to reach machine epsilon level of accuracy.
Newton, Euler-Chebyshev and Halley's methods from Iterative Methods for the Solution of Equations by Traub (1964).
For a truly robust solver, this is not enough. There are more floating point or practical convergence issues: a Bachelier based refinement for tiny prices is necessary, and polishing via the very accurate Jäckel Black-Scholes price function allows to reach closer to machine epsilon accuracy. This last polishing step end up relatively expensive compared to the rest (20% of computational time).
Above, I have described the journey behind creating this new solver. The reverse journey also tells an interesting tale. We could start from the Black-Scholes option price, and notice that a really accurate Black-Scholes formula implementation (close to machine epsilon) is not trivial. The textbook formula based on the normal CDF, often used in various software systems used at banks or hedge funds, is not very accurate in many cases. One should really use Peter Jäckel’s implementation everywhere, otherwise it is pointless to attempt to solve with such a high accuracy (in the paper, I thus make the Jäckel polishing step optional).
Error in the Black-Scholes price with different formulas: textbook CDF, more advanced erfcx, and Jäckel's expansion. On those realistic examples, we hit Jäckel's expansion zone.
I named the resulting solver with the somewhat cryptic name Thiophene, as Thiophene is a chemical compound with formula \[ C_4 - H_4 - S \], and C-H-S are the initials of Choi, Huh and Su.
Conclusion
A mathematical proof of convergence is great to have, but far from enough to obtain a robust solver. The paper on the Thiophene implied volatlity solver is available at https://arxiv.org/abs/2605.22427 and example Java code here.
Several years ago, I had explored accuracy and performance of different ways to imply the Black-Scholes volatility. Jherek Healy proposed some improvements over my naive algorithm on his blog. Recently, a Linkedin post mentioned a new paper from Wolfgang Schadner which presents an almost explicit formula for the implied volatility. Almost because it actually relies on some implementation of the quantile function for the inverse Gaussian distribution: it requires a specialized numerical algorithm in practice, such as the one from Giner and Smyth.
The idea is actually not new, the same formula has been used back in 2016 by Jaehyuk Choi in his github repository.
If we have a good inverse Gaussian distribution function, the algorithm is clean and simple:
using Distributions
export impliedVolatilityIG
const N = Normal()
function impliedVolatilityIG(isCall::Bool, price::T, forward, strike, timeToExpiry, df) where {T}
c, ex = normalizePrice(isCall, price, forward, strike, df)
if c >=1/ ex || c >=1 throw(DomainError(c, string("price higher than intrinsic value ", 1/ ex)))
elseif c <=0 throw(DomainError(c, "price is negative"))
end x = log(ex)
returnif abs(x) <4* eps(T)
2* quantile(N, (c +1) /2) / sqrt(timeToExpiry)
else mu =2/ abs(x)
u = cquantile(InverseGaussian(mu), c)
2/ sqrt(u * timeToExpiry)
endend
function normalizePrice(isCall::Bool, price, f, strike, df)
c = price / (f * df)
ex = f / strike
if!isCall
if ex <=1 c = c +1-1/ ex # put call parityelse#duality + put call parity c = ex * c
ex =1/ ex
endelseif ex >1# use c(-x0, v) c = (f * (c -1) + strike) / strike # in out duality, c = ex*c + 1 - ex ex =1/ ex
endendreturn c, ex
end
I was wondering how accurate it was especially for very out-of-the-money options, and how fast it was in practice. Below I compare various implementations in Julia:
Peter Jaeckel, which is arguably the industry standard.
Jherek Healy, which consists in (log-)Householder iterations on top of a good explcit initial guess.
The inverse Gaussian approach using Julia’s Distribution.jl package (as in the above code).
The inverse Gaussian approach using a basic port of statsmod R algorithm, adapted to compute the complementary quantile function. This corresponds to a different implementation of cquantile(InverseGaussian(mu),c).
I consider call optiun prices for a fixed strike K = 200 for a forward F = 100, and a time to maturity T = 1.0 year, varying the volatility from 0.02 to 4.0 in steps of 0.01. This constitute 399 samples.
Algorithm
vol RMSD
vol MAXAD
Price MAXAD
Rel. Price MAXAD
Time
Jaeckel
8.66e-16
4.00e-15
4.26e-14
1.81e-12
104 μs
Healy
1.57e-15
9.77e-15
8.53e-14
3.02e-10
107 μs
IG Distribution
7.37e-16
5.41e-15
2.84e-14
8.99e-11
246 μs
IG statsmod
7.77e-16
7.55e-15
8.24e-13
3.03e-10
311 μs
The inverse Gaussian approach is competitive but still twice slower than Peter Jaeckel’s algorithm on this example. How does the error in volatility actually look like?
Error in impled volatility of the various algorithms
The accuracy of the inverse Gaussian looks as good as Jackel’s algorithm, when using Julia’s package Distibutions.jl (it is slightly less accurate towards low vol, but slightly more accurate for high vols).
It is revealing to also look at the case of varying strikes for a fixed volatility. We consider a volatility fixed to 0.1 (or 10%) and vary the strike from 100 to 500 by steps of 1.
Algorithm
vol RMSD
vol MAXAD
Price MAXAD
Rel. Price MAXAD
Time
JAECKEL
6.50e-16
2.73e-15
7.11e-15
6.59e-12
154 μs
HEALY
6.56e-16
2.73e-15
3.55e-14
6.59e-12
110 μs
IG Distribution
1.21e-15
7.76e-15
1.78e-14
1.95e-11
1704 μs
IG statsmod
8.68e-16
4.27e-15
1.06e-14
1.32e-11
288 μs
On this example the Julia Distributions.jl leads to a performance degradation by a factor of 10 and is less accurate. It would be worth investigating exactly where lies the bottleneck, as statsmod custom implementation keeps a good performance profile.
Error in impled volatility of the various algorithms
Because it originates from the CG community, I had assumed that this was faster than the more classic scrambling ACM Algorithm 823 by Hickernell and Hong. I was wrong. It may be faster for specific use cases, but in general it is (much) slower, especially for the kind of (quasi) Monte-Carlo simulation that we use in quantitative finance.
The slowness is because we can’t just generate the next number from the sequence, we must jump to each number instead. The shuffling is implemented by jumping to some hash based location. And jumping is somewhat significantly slower than getting the next number. Furthermore, the grey code optimization can not be applied anymore.
In my code, for arbirary dimension (on the example to integrate successively a function from dimension 1 to dimension 5000 using a 100 steps on the dimension side), the Burley implementation (actually optimized to skip faster a la Quantlib instead of the original code from Burley) is 3 times slower than Algorithm 823.
It is also not obviously better on the example of Ilya M. Sobol’, Danil Asotsky in Construction and Comparison
of High-Dimensional Sobol’ Generators (Equation 6.1 and Figures 8 and 9), also reproduced by Peter Caspers.
$$
I_1 = \int_{[0,1]^d} \prod_{i=1}^d \left( 1 + 0.01(x_i - 0.5) \right) dx_1 dx_2 \dots dx_d .
$$
Absolute error in integral I1 computed with different Sobol sequences
In the Figure above, I did not use 30031 points of the sequence as in the original test, but the closest power of 2, that is 32768, which is supposed to be more optimal (the choice of 30031 is strange but actually does not really change the outcome). Except for Sobol-Harase, all use Joe Kuo new-joe-kuo-6.21201 direction numbers. Sobol-Harase uses Shin Harase niederreiter-nut-s21201.txt, which interestingly, does not fare as well as the more classic Joe Kuo set.
The results vary quite a bit depending on the random number generator used, or its seed. In the above, the Well1024a generator with default seed for the seeds of Burley’s algorithm has the lowest maximum error, but if we change the seed to 123456, this is not true anymore. Algorithm 823 (Owen and Owen-Faure-Tezuka) is not worse, and its results are also dependent on the seed (although I am under the impression that the impact is not as large). This is something that was not shown in Peter Caspers Blog.
I struggled a bit having Jack Audio Connection Kit working in Opensuse Tumbleweed.
My error was to install the jack package. The solution is actually extremely simple: use pipewire-jack instead of jack.
The state of the art of Sobol scrambling has changed slightly recently, thanks to the paper from Brent Burley of Walt Disney Studios Practical Hash-based Owen Scrambling.
Before that, ACM Algorithm 823 by Hickernell and Hong was the usual reference. Brent Burley’s algorithm is supposedly both faster and with better properties. In particular, it performs both shuffling and scrambling.
Peter Caspers implemented the new algorithm in Quantlib. It’s neat to have some decent implementation in CPP. In applying the algorithm of the paper, a few sub-optimal choices were made:
The base RNG for seeds (for the shuffling and scrambling) is Mersenne-Twister. Contrary to what I thought at the beginning, it is not used for skipping/jumping as the seeds are only set in the initialization, and are then constant for all numbers in the sequence. The skipping/jumping logic is however absent from the Quantlib CPP code, which is a bit unfortunate, but is almost trivial to code (just update the nextSequenceCounter_ variable).
There is a 4-by-4 seeding logic which looks odd. The paper code processes 4 dimensions as it is specialized to CG, although even then, it processes those 4 dimensions slightly differently: a much simpler hash function is used (it is given in the code URL inside the paper) and the seed value is not updated by the hash function: the seed is merely hashed with the dimension number, for 4 dimensions.
I don’t really see what there is to gain to follow this 4-by-4 seeding. It would be simpler and less concerning to simply use the RNG to produce N seeds where N is the number of dimensions (instead of N/4). The only possible gain with the 4-by-4 seeding is some memory (as only N/4 seeds are kept in memory), but the total amount is not significant: for 21000 dimensions, keeping all in memory takes around 656 KB (yes, 640 KB is not all we need!). The cost of 4-by-4 seeding is a potentially less random scrambling.
On the same subject, there is another new low discrepancy sequence which appears to have some good properties as well as being as fast to generate as Sobol, it’s called SZ Sequence.
Recently, I have spent some time on simple neural networks. The idea is to employ them as universal function approximators for some problems appearing in quantitative finance. There are some great papers on it such as the one from Liu et al. (2019) or Horvath et al. (2019) Deep Learning Volatility or Rosenbaum & Zhang (2021).
Incidentally, I met Liu back when I was finishing my PhD in TU Delft around 2020.
I thought I would try out what Julia offers in terms of library for neural networks. Being a very trendy subject, and Julia a modern language for the scientific community, I had imagined the libraris to be of good quality (like the many I have been using in the past). Surprisingly, I was wrong.
First, I tried SimpleChain, which just crashed (core dumped!) on a very simple example. I did not bother finding the root cause, I decided to look for another library. I then tried LUX. The execution kept going forever without returning with the @compile keyword. I probably was not being lucky, even though my code was only 10s of lines long. So I decided to use Flux, which actually is as simple to use as Pytorch, a well-known library in the Python world.
Things works with Flux, and I do manage to do many experiments. But the performance is not great. This was another surprise: Pytorch was actually faster for many tasks on the CPU (no GPU). For example, to train a multi-layer-perceptron with 4 hidden layers of 200 neurons, Julia was taking several hours, until I hit CTRL+C and launched the same training in Pytorch (which took 15 minutes). I think it may be something with AdamW optimizer and relatively wide (but not really compared to LMM stuff) networks.
Overall, the experience is pretty disappointing. Maybe it shows how much effort has been put in the Python ecosystem.
The optimization is on the x’s instead of the y’s, meaning the strike axis is fixed.
An original approach to avoid spurious modes, although I would have liked more details on it, with concrete examples of the penalty. Penalties are often challenging to get right.
A nice analysis of the asymptotic behavior in the wings, showing that an exponentially quadratic form for extrapolation corresponds to linear slopes in implied variance.
I had also explored a fixed strike axis initially (even back in 2014 - I uploaded some old notes on SSRN around this). I found it to be somewhat unstable back then, but it may well be that I did not put enough efforts into it, especially since optimizing the y’s instead of the x’s allowed for a straightforward use of B-splines, which is very attractive. It is very direct to impose monotonicity with B-splines.
The main advantage of optimizing on the x’s instead of the y’s is that the knots are defined in the more usual strike space.
Now I find interesting that someone else stumbled upon essentially the same idea starting from another point of view. There may be more merits to this parameterization than I thought.
The challenges I found with stochastic collocations were:
spurious spike when the collocation function derivative is close to zero which happens on occasion. This is not really like an extra mode - it is not smooth. Penalty or minimal slope are possible workarounds, but challenging to make robust.
dependency on the choice of knots, especially if nearly exact interpolation is required.
more minor, non-predictable runtime: the non-linear optimization is sometimes very fast, sometimes much slower. This is the case of most arbitrage-free interpolation techniques.
Leif Andersen and Mark Lake recently proposed the use of Non-Uniform Fast Fourier Transform for option pricing via the characteristic function. Fourier techniques are most commonly used for pricing vanilla options under the Heston model, in order to calibrate the model. They can be applied to other models, typically with known characteristic function, but also with numerically solved characteristic function as in the rough Heston model, and to different kind of payoffs, for example variance/volatility swaps, options. The subject has been vastly explored already so what’s new with this paper?
At first, it was not obvious to me. The paper presents 3 different approaches, two where the probability density and cumulative density is first computed at a (not so sparse) set of knots and the payoff is integrated over it. The remaining approach is the classic one presented in the Carr and Madan paper from 1999 as well as in most of the litterature on the subject. The only additional trick is really the use of NUFFT along with some clever adaptive quadrature to compute the integral.
I thought the main cost in the standard Fourier techniques was the evaluation of the characteristic function at many points, because the characteristic function is usually relatively complicated. And, in practice, it mainly is. So what does the new NUFFT algorithm bring? Surely the characteristic function must still be evaluated at the same points.
The NUFFT becomes interesting to compute the option price at many many strikes at the same time. The strikes do not need to be equidistant and can be (almost) arbitrarily chosen. In Andersen and Lake paper, even a very fast technique such as the COS method reaches a threshold of option per second throughput, as the number of strikes is increased, mainly because of the repeated evaluation of the sum over different strikes, so the cost becomes proportional to the number of strikes.
Evaluating at many many points becomes not much more expensive than evaluating at a few points only. This is what opens up the possibilities for the first 2 approaches based on the density.
It turns out that it is not very difficult to rewrite the COS method so that it can make use of the NUFFT as well. I explain how to do it here. Below is the throughput using NFFT.jl Julia package:
Throughput on the Heston model with M=256 points.
While a neat trick, it is however not 100% clear to me at this point where this is really useful: calibration typically involve a small number of strikes per maturity.
According to the litterature, the initial forward variance curve is typically built from the implied volatilities through the variance swap replication: for each maturity, the price of a newly issued variance swap is computed to this maturity, using the implied volatilities for this maturity. Then we may compute the forward variance by interpolating linearly the variance swap prices and differentiating. This leads to the plot below.
Forward variance curves for SPX500 as of October 2024.
One immediate question is how much does the truncation range play a role in this forward variance curve? Above, we plot the replications prices based on filtering the input data up to 10 Delta / 90 Delta for each maturity. Beyond this range, liquidity is usually much lower and the reliability of the input quotes may not be as good. The short end, and not so short forward variance up to 4 years is quite different. The long end is more aligned with something that looks like a basis spread between the two curves.
We may also calibrate the forward variance curve model directly to the implied volatilities during the model calibration, at the same time as the other stochastic volatility parameters (using a time-dependent correlation and vol-of-vol). This time-dependent Heston parameterization has effectively 3 parameters per expiry. In this case, we may want to also filter the input data to focus on the 10 Delta / 90 Delta range as we expect the stochastic volatility model to fit well where it matters the most - where the vanillas are liquid.
In the figure above, we also plot the forward variance curve implied by the model, through a calibration against vanilla options. We can see it is again quite different.
If we were to use the true replication based forward variance curve, we would effectively attempt to fit far away vols in the wings, which somehow seems wrong. Now if the model was able to fit everything well there would be no issue, but with 3 parameters per maturity, a choice has to be made. And the choice of the full replication does not look particularly great, as we fit exactly something that is not liquid.
I recently saw a news about a great new simulation scheme for the Heston model by Abi Jaber.
The paper suggests it is better than the popular alternatives such as the QE scheme of Leif Andersen. Reading it quickly, perhaps too quickly, I had the impression it would be more accurate especially when the number of time-steps is small.
The scheme is simple to implement so I decided to spend a few minutes to try it out. I had some test example for the DVSS2X scheme pricing a vanilla at-the-money option with Heston parameters v0=0.04, kappa=0.5, theta=0.04, rho=-0.9, sigma=1.0, and a time to maturity of 10 years. I don’t remember exactly where those parameters come from, possibly from Andersen paper. My example was using 8 time-steps per year, which is not that much. And here are the results with 1M paths (using scrambled Sobol):
N
Scheme
Price
Error
1M
DVSS2X
13.0679
-0.0167
IVI
13.0302
-0.0545
4M
DVSS2X
13.0645
-0.0202
IVI
13.0416
-0.0431
With Sobol scrambling and 1 million paths, the standard error of the Monte-Carlo simulation is lower than 0.01 and the error with this new IVI scheme is (much) larger than 3 standard deviations, indicating that the dominating error in the simulation is due to the discretization.
It is not only less accurate, but also slower, because it requires 3 random numbers per time-step, compared to 2 random numbers for QE or DVSS2X. The paper is very well written, and this small example may not be representative but it does cast some doubts about how great is this new scheme in practice.
While writing this, I noticed that the paper actually uses this same example, it corresponds to their Case 3 and it is indeed not obvious from the plots in the paper that this new IVI scheme is significanly better. There is one case, deep in the money (strike=60%), and very few time-steps (2 per year for example):
N
Steps/Year
Scheme
Price
Error
4M
2
DVSS2X
44.1579
-0.1721
IVI
44.2852
-0.0449
4
DVSS2X
44.2946
-0.0353
IVI
44.3113
-0.0187
8
DVSS2X
44.3275
-0.0025
IVI
44.3239
-0.0061
So the new scheme works reasonably well for (very) large time-steps, better than DVSS2 and likely better than QE (although, again, it is around 1.5x more costly). For smaller steps (but not that small), it may not be as accurate as QE and DVSS2. This is why QE was such a big deal at the time, it was significantly more accurate than a Euler discretization and allowed to use much less time-steps: from 100 or more to 10 (a factor larger than 10). IVI may be an improvement for very large step sizes, but it will matter much less for typical exotics pricing where observation dates are at worst yearly.
Update June 19, 2025
Out of curiosity I wondered how it behaved on my forward start option test. In the Table below I use 4M paths.
Scheme
Steps/Year
Price
DVSS2X
4+1
0.0184
80
0.0196
QE
4+1
0.0190
160
0.0196
IVI
4+1
0.0116
80
0.0185
160
0.0191
Clearly, the IVI scheme is not adequate here, it seems to converge very slowly. The price with 4+1 steps is very off, especially compared to the other schemes. The implementation is fairly straighforward, so the IVI scheme may well have a flaw.