Calibrating Heston to Variance Swaps - a bad idea?

An interesting idea to calibrate the Heston model in a more stable manner and reduce the calibration time is to make use of variance swap prices. Indeed, there is a simple formula for the theoretical price of a variance swap in the Heston model.

It is not perfect, since it approximates the variance swap price by the expectation of the integrated variance process over time. In particular, it does not take into account possible jumps (obviously), the finiteness of the replication, or the discreteness of the observations. But it may be good enough. Thanks to this formula, we can calibrate three parameters of the Heston model, the initial variance, the long-term mean variance, and the speed of mean reversion, to the term-structure of variance swaps. We do not need market prices of variance swaps; we may simply use a replication based on market vanilla option prices, such as the model-free replication of Fukasawa.
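
For reference, the expectation of the annualized integrated variance under Heston is the well-known expression $$ V(T) = \frac{1}{T}\mathbb{E}\left[\int_0^T v_t\, dt\right] = \theta + (v_0-\theta)\frac{1-e^{-\kappa T}}{\kappa T}, $$ where \( v_0 \) is the initial variance, \( \theta \) the long-term mean variance and \( \kappa \) the speed of mean reversion. The resulting term-structure is necessarily monotone in \( T \), which already hints at the limitation discussed below.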

The idea was studied in the paper Heston model: the variance swap calibration by F. Guillaume and W. Schoutens back in 2014. The authors, however, only used the variance swap calibration as an initial guess. How bad would it be to fix those parameters altogether, as in a variance curve approach?

It turns out it can be quite bad, since the Heston model cannot represent many shapes of the variance swap term-structure. Below is an example of a term-structure calibration on SPX as of October 2024.

Term-structure of variance swap prices.

The term-structure is V-shaped, and Heston cannot fit this well. The best fit leads to nonsensical parameters, with a nearly zero kappa (mean-reversion speed) and an exploding long-term mean theta. The next figure shows why it is nonsensical: because the kappa is very small, the variance very often reaches zero.

Sample path of the variance process (in terms of vol) using the Euler scheme with full truncation.

The calibration using inverse-vega weights on vanilla option prices leads to a fit of the variance swap term-structure that is not much worse, but it exhibits a much too high, unrealistic vol-of-vol of 184%, while a calibration on equally weighted option prices does not fit the term-structure well at all.

Somewhat interestingly, the Double-Heston model fits the term-structure much better, but it is far from obvious that the resulting calibrated parameters are more realistic, as the calibration typically leads to a process with a very small kappa or a process with a very small theta (although, as there are two processes, it may be more acceptable).

Previously, I had explored a similar subject for the Schobel-Zhu model. It turns out that Heston is not much more practical either.

OpenSuse Tumbleweed to the Rescue

I recently upgraded a desktop computer and, to my surprise, the new motherboard was not fully supported by most Linux distributions. The main culprit was the network adapter, although the Secure Boot setup gave me lots of trouble as well. I had only a small USB key (2GB), and most (all?) live distributions do not fit on 2GB anymore. With the exception of the Ubuntu images, I did not manage to boot from the USB hard drive onto which I had dumped the live ISO, due to Secure Boot related issues, even when disabling the feature or changing the settings in the BIOS. Booting from the small USB key worked, but only with Secure Boot enabled.

So I had to find a distribution with a small image and a recent enough kernel (6.13+) to support the network adapter. I tried Fedora Rawhide, which has a small network install image, but it just did not work, because it is too alpha.

I thought most rolling distributions would fit the bill. It turns out they don't: EndeavourOS and Manjaro have relatively recent live installers, but (a) they are not small enough to fit on the USB key and (b) the images are a few months old, with a kernel that is too old.

OpenSuse Tumbleweed provides a daily updated network installer ISO (and a live CD). This is the only thing that worked, and it worked well.

Overall I was a bit surprised that it was so challenging to find an installable distribution with the latest kernel.

Saving Us - Book Review

I had the opportunity to receive a free book on climate change through the company I am working for. I had not heard of that book before; it is called Saving Us and is written by an actual climate scientist (Katharine Hayhoe). Unfortunately, written by a scientist does not mean that it is a scientific book, and it is not. The author does not spend much effort explaining the physics or the reports, but focuses on how to convince people that this is an important problem to tackle. It is mildly interesting at first, as it presents the problem from a psychological angle. But it quickly becomes repetitive. The method is always the same: connect with what is important to an audience who initially rejects climate warming (mostly the phenomenon, sometimes proposals around it), and make them understand that climate warming plays a role in their life, in the very matters they care about. It is presented as a series of personal experiences of the author, a list of examples.

There is a small part on taxing carbon; it is only 3 pages long, and quite optimistic. It amounts to a socialist approach (which I am not necessarily opposed to). There is no mention of carbon trading, and no mention of what can go wrong with taxing or trading carbon.

On the global warming subject, I much preferred the more controversial book Unsettled by Steve Koonin, sometimes quoted by climate warming deniers (who may not have read it at all) and often denounced by climate warming supporters (who may have read it too closely). The arguments made were clearer and more scientific. Even if that book was not that great either, I found it helped in discovering the subject from various angles.

Overall it is a bit strange that this kind of book is given away by companies. On one hand, I can see how it resonates with business values, such as how to deal with antagonistic people or how to talk to the people around you. On the other hand, it feels like a waste of time and money. There are much better books or courses that can teach you more in a shorter time. For example, I attended by accident two 4-hour courses around Myers-Briggs, and found them to have a much bigger impact.

Monotonicity of the Black-Scholes Option Prices in Practice

It is well known that vanilla option prices must increase when we increase the implied volatility. Recently, a post on the Wilmott forums wondered about the true accuracy of Peter Jaeckel's implied volatility solver, and whether it was truly IEEE 754 compliant. In fact, the author noticed some inaccuracy in the option price itself. Unfortunately I cannot reply to the forum, as its login process does not seem to be working anymore, so I am left to blog about it.

It can be quite challenging to evaluate the accuracy. For example, one potential error that the author makes is to use numbers that are not exactly representable in the IEEE 754 standard, such as 4.45 or 0.04. In Julia, and I believe in Python's mpmath as well, there is a difference between BigFloat(4.45) and BigFloat("4.45"):

BigFloat(4.45)   = 4.45000000000000017763568394002504646778106689453125
BigFloat("4.45") = 4.449999999999999999999999999999999999999999999999999999999999999999999999999972

If we work only with the usual 64-bit floats, we likely want to estimate the accuracy of an implementation using BigFloat(4.45) as input rather than BigFloat("4.45"). To compute a vanilla option price, the naive (textbook) way would be through the cumulative normal distribution function, which looks like

using Distributions

# Undiscounted Black-Scholes call price; voltte is the total volatility sigma*sqrt(T).
function bs(strike, forward, voltte)
    d1 = (log(forward / strike) + voltte^2 / 2) / voltte
    return forward * cdf(Normal(), d1) - strike * cdf(Normal(), d1 - voltte)
end

It turns out that this produces a relatively high error for out-of-the-money prices. In fact, the prices are far from monotonic: if we increase the vol by 1 ulp a hundred times, the price frequently decreases.
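
Here is a minimal Julia sketch of this kind of check, using the bs function above (a sketch, not necessarily the exact script behind the plots):

# Bump the vol by i ulps, i = 0..100, with strike 2.0 and forward 1.0.
prices = [bs(2.0, 1.0, nextfloat(0.125, i)) for i in 0:100]
# Count how often the price decreases while the vol strictly increases.
println(count(<(0), diff(prices)), " decreases out of ", length(prices) - 1)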

Peter Jaeckel proposes several improvements: the first, simple one is to use the scaled complementary error function erfcx rather than the cumulative normal distribution function directly. A second improvement is to use Taylor expansions for small prices. It turns out that on the example with strike K=2.0, forward F=1.0, vol=0.125, the Taylor expansion is being used.
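
For illustration, here is one way to use erfcx in the Black-Scholes formula, based on the identity \( \Phi(z) = \frac{1}{2} e^{-z^2/2}\,\mathrm{erfcx}(-z/\sqrt{2}) \) (a sketch, not necessarily identical to Peter Jaeckel's formulation):

using SpecialFunctions

# Undiscounted call price via erfcx. With x = log(F/K) and s = voltte,
# F*exp(-d1^2/2) = K*exp(-d2^2/2) = sqrt(F*K)*exp(-(x^2/s^2 + s^2/4)/2),
# so the tiny exponential factor is computed once and factored out.
function bs_erfcx(strike, forward, voltte)
    x = log(forward / strike)
    d1 = x / voltte + voltte / 2
    d2 = d1 - voltte
    factor = 0.5 * sqrt(forward * strike) * exp(-0.5 * (x^2 / voltte^2 + voltte^2 / 4))
    return factor * (erfcx(-d1 / sqrt(2)) - erfcx(-d2 / sqrt(2)))
end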

The error is much lower, with Taylor being even a little bit better. But the price is still not monotonic.

What would the implied vols corresponding to those prices look like? If we apply Peter Jaeckel's implied vol solver to the naive Black-Scholes formula, we end up with an accuracy limited by the naive formula.

If we apply it to the exact prices, or to the erfcx or Taylor prices, we still end up with a non-monotonic implied vol:

There is no visible difference in accuracy if we start from the erfcx prices and apply the SR Householder solver of Jherek Healy:

It should not be too surprising: if we compose the Black-Scholes formula with the associated solver from Peter Jaeckel, we do not obtain the identity function. Because the option prices are not monotonic with the vol, several vols may be solutions for the same price, and the solver is guaranteed not to recover the input vol exactly, which leads to the numerical noise we see on the plots. In contrast, it is remarkable that if we compose exp with log we obtain a nice monotonic shape:

The prerequisite for a more accurate solver would be a Black-Scholes function which is actually monotonic.

Copilot vs ChatGPT on the Optimal Finite Difference Step-Size

When computing the derivative of a function by finite difference, which step size is optimal? The answer depends on the kind of difference (forward, backward or central), and the degree of the derivative (first or second typically for finance).

For the first derivative, the result is very quick to find (it’s on wikipedia). For the second derivative, it’s more challenging. The Lecture Notes of Karen Kopecky provide an answer. I wonder if Copilot or ChatGPT would find a good solution to the question:

“What is the optimal step size to compute the second derivative of a function by centered numerical differentiation?”

Here is the answer of Copilot:

and the one of ChatGPT:
and a second answer from ChatGPT:

Copilot always fails to provide a valid answer. ChatGPT v4 proves to be quite impressive: it is able to reproduce some of the reasoning presented in the lecture notes.

Interestingly, with centered differences, for a given accuracy, the optimal step size is different for the first and for the second derivative. It may, at first, seem like a good idea to use a different step size for each. In reality, when both the first and the second derivative of the function are needed (for example the Delta and Gamma greeks of a financial product), it is rarely a good idea to use a different step size for each. Firstly, it will be slower, since the function will be evaluated at more points. Secondly, if there is a discontinuity in the function or in its derivatives, the first derivative may be estimated without going over the discontinuity while the second derivative may be estimated going over the discontinuity, leading to puzzling results.
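
For reference, the usual error-balancing argument (along the lines of the lecture notes) goes as follows for a function evaluated with relative accuracy \( \epsilon \). For the centered first derivative, the total error is roughly $$ E_1(h) = \frac{\epsilon |f(x)|}{h} + \frac{h^2}{6}\left|f'''(x)\right|, \quad h_1^\star = \left(\frac{3\epsilon |f(x)|}{|f'''(x)|}\right)^{1/3} \propto \epsilon^{1/3}, $$ while for the centered second derivative $$ E_2(h) = \frac{4\epsilon |f(x)|}{h^2} + \frac{h^2}{12}\left|f^{(4)}(x)\right|, \quad h_2^\star = \left(\frac{48\epsilon |f(x)|}{|f^{(4)}(x)|}\right)^{1/4} \propto \epsilon^{1/4}. $$ With 64-bit floats (\( \epsilon \approx 2.2\times 10^{-16} \)) and derivatives of order one, this gives step sizes of roughly \( 10^{-5} \) and \( 3\times 10^{-4} \) respectively.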

News on the COS Method Truncation

The COS method is a fast way to price vanilla European options under stochastic volatility models with a known characteristic function. There are alternatives, explored in previous blog posts. A main advantage of the COS method is its simplicity. But this comes at the expense of finding the correct values for the truncation level and the (associated) number of terms.

A related issue with the COS method, or with its fancier wavelet cousin the SWIFT method, is that it may require a huge (>65K) number of points to reach a reasonable accuracy for some somewhat extreme choices of Heston parameters. I provide an example in a recent paper (see Section 5).

Gero Junike recently wrote several papers on how to find good estimates for those two parameters. Gero derives a slightly different formula for the put option, by centering the distribution on \( \mathbb{E}[\ln S] \). It is closer to my own improved COS formula, where I center the integration on the forward. The estimate for the truncation is larger than the one we are used to (for example, the estimate based on four cumulants from Mike Staunton), and the estimate for the number of points is very conservative.

The bigger issue with this new estimate is that it relies on an integration of a function of the characteristic function, very much like the original problem we are trying to solve (the price of a vanilla option). This is in order to estimate \( \lVert f^{(20)} \rVert_{\infty} \). Interestingly, evaluating this integral is not always trivial: the double exponential quadrature in Julia fails. I found that reusing the transform from \( (0,\infty) \) to \( (-1,1) \) of Andersen and Lake, along with a Gauss-Legendre quadrature on 128 points, seemed to be ok (at least for some values of the Heston parameters; it may be fragile, I am not sure).
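
As an illustration only, here is a minimal sketch of a 128-point Gauss-Legendre quadrature on \( (0,\infty) \) after a generic algebraic change of variables \( x = (1+t)/(1-t) \); this is a stand-in, not necessarily the exact Andersen-Lake transform:

using FastGaussQuadrature

# Integrate g over (0, Inf) by mapping t in (-1,1) to x = (1+t)/(1-t), dx = 2/(1-t)^2 dt.
function integrate_halfline(g; n = 128)
    t, w = gausslegendre(n)
    x = (1 .+ t) ./ (1 .- t)
    return sum(w .* g.(x) .* 2 ./ (1 .- t) .^ 2)
end

integrate_halfline(x -> exp(-x))  # should be close to 1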

While very conservative, it seems to produce the desired accuracy on the extreme example mentioned in the paper: it leads to N=756467 points and an upper truncation at b=402.6 for a relative tolerance of 1E-4. Of course, on such examples the COS method is not fast anymore. For comparison, the Joshi-Yang technique with 128 points produces the same accuracy in 235 μs, versus 395 ms for the COS method on this example, that is, a factor of more than 1000 (on many other examples the COS method behaves significantly better, of course).

Furthermore, as stated in Gero Junike's paper, the estimate fails for less smooth distributions, such as that of the Variance Gamma (VG) model.

Variance Swap Term-Structure under Schobel-Zhu

I never paid too much attention to it, but the term-structure of variance swaps is not always realistic under the Schobel-Zhu stochastic volatility model.

This is not fundamentally the case with the Heston model; the Heston model is merely extremely limited, producing only a flat shape or a downward-sloping exponential shape.

Under the Schobel-Zhu model, the price of a newly issued variance swap reads $$ V(T) = \left[\left(v_0-\theta\right)^2-\frac{\eta^2}{2\kappa}\right]\frac{1-e^{-2\kappa T}}{2\kappa T}+2\theta(v_0-\theta)\frac{1-e^{-\kappa T}}{\kappa T}+\theta^2+\frac{\eta^2}{2\kappa}, $$ where \( \eta \) is the vol of vol, \( v_0 \) the initial vol, \( \theta \) the long-term mean vol, and \( \kappa \) the speed of mean reversion.

When \( T \to \infty \), we have \( V(T) \to \theta^2 + \frac{\eta^2}{2\kappa} \). Unless \( \kappa \) is very large, or the vol of vol is very small, the second term will often dominate. In plain words, the prices of long-term variance swaps are almost purely dictated by the vol of vol when the speed of mean reversion is lower than 1.0.
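
For experimentation, the formula above is straightforward to transcribe (a direct Julia transcription, using the same parameter names):

# Fair value V(T) of a newly issued variance swap under Schobel-Zhu, with initial vol v0,
# long-term mean vol theta, mean-reversion speed kappa and vol of vol eta.
function vswap_schobel_zhu(T; v0, theta, kappa, eta)
    a = ((v0 - theta)^2 - eta^2 / (2kappa)) * (1 - exp(-2kappa * T)) / (2kappa * T)
    b = 2theta * (v0 - theta) * (1 - exp(-kappa * T)) / (kappa * T)
    return a + b + theta^2 + eta^2 / (2kappa)
end

vswap_schobel_zhu(10.0; v0 = 0.2, theta = 0.2, kappa = 0.5, eta = 0.3)  # example usage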

Below is an example of a fit to the market. If the vol of vol and kappa are exogenous, and we calibrate only the initial vol v0 plus the long-term vol theta to the term-structure of variance swaps, then we end up with upward shapes for the term-structure, regardless of theta. Only when we add the vol of vol to the calibration do we find a reasonable solution. The solution is, however, not really better than the one corresponding to the Heston fit with only 2 parameters. It thus looks overparameterized.

Term-structure of variance swap prices on Russell 2000 index

There are some cases where the Schobel-Zhu model allows for a better fit than Heston, and makes use of the flexibility due to the vol of vol.

Term-structure of variance swap prices on SPX 500 index

It is awkward that the variance swap term-structure depends on the vol of vol in the Schobel-Zhu model.

New Basket Expansions and Cash Dividends

In the previous post, I presented a new stochastic expansion for the prices of Asian options. The stochastic expansion is generalized to basket options in the paper, and can thus be applied on the problem of pricing vanilla options with cash dividends.

I have updated the paper with comparisons to more direct stochastic expansions for pricing vanilla options with cash dividends, such as the one of Etoré and Gobet, and my own refinement of it.

The second and third order basket expansions turn out to be significantly more accurate than the previous, more direct stochastic expansions, even on extreme examples.

Extreme case with large volatility (80% on 1 year) and two unrealistically large dividends of 25 (spot=100). VGn and VLn are stochastic expansions of order n using two different proxies. EG3 and LL3 are the third order expansions of Etore-Gobet and Le Floc'h. Deelstra is the refined Curran moment matching technique.

Long maturity (10y) with dividend (amount=2) every six month. VGn and VLn are stochastic expansions of order n using two different proxies. EG3 and LL3 are the third order expansions of Etore-Gobet and Le Floc'h. Deelstra is the refined Curran moment matching technique.

New Approximations for the Prices of Asian and Basket Options

Many years ago, I had applied the stochastic expansion technique of Etore and Gobet to a refined proxy, in order to produce more accurate prices for vanilla options with cash dividends under the Black-Scholes model with deterministic jumps at the dividend dates. Any approximation for vanilla basket option prices can also be applied on this problem, and the sophisticated Curran geometric conditioning was found to be particularly competitive in The Pricing of Vanilla Options with Cash Dividends as a Classic Vanilla Basket Option Problem.

Recently, I had the idea of applying the same stochastic expansion technique to the prices of Asian options, and more generally to the prices of vanilla basket options. It works surprisingly well for Asian options. A main advantage is that the expansion is straightforward to implement: there is no numerical solving or numerical quadrature needed.

VGn and VLn are stochastic expansions of order n using two different proxies

It works a little bit less well for basket options. Even though I found a better proxy for those, the expansions behave less well with large volatility (really, large total variance), regardless of the proxy. I notice now that this was also true for the case of discrete dividends, where the accuracy clearly deteriorates somewhat significantly in “extreme” examples such as a vol of 80% for an option maturity of 1 year (see Figure 3).

I did not compare the direct stochastic expansion for discrete dividends with the one going through the vanilla basket expansion; maybe for a next post.

Easy Mistake With the Log-Euler Discretization On Black-Scholes

In the Black-Scholes model with a term-structure of volatilities, the Log-Euler Monte-Carlo scheme is not necessarily exact.

This happens if you have two assets \(S_1\) and \(S_2\), with two different time varying volatilities \(\sigma_1(t), \sigma_2(t) \). The covariance from the Ito isometry from \(t=t_0\) to \(t=t_1\) reads $$ \int_{t_0}^{t_1} \sigma_1(s)\sigma_2(s) \rho ds, $$ while a naive log-Euler discretization may use $$ \rho \bar\sigma_1(t_0) \bar\sigma_2(t_0) (t_1-t_0). $$ In practice, the \( \bar\sigma_i(t_0) \) are calibrated such that the vanilla option prices are exact, meaning $$ \bar{\sigma}_i^2(t_0)(t_1-t_0) = \int_{t_0}^{t_1} \sigma_i^2(s) ds.$$

As such, the covariance of the log-Euler scheme does not match the covariance from the Ito isometry unless the volatilities are constant in time. This means that the prices of European vanilla basket options are going to be slightly off, even though they are not path dependent.
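
As a hypothetical numerical illustration (piecewise-constant vols with arbitrarily chosen values), the two covariances over a single step can be compared directly:

# Vols are piecewise-constant on [0, 1], switching at t = 0.5; rho is the correlation.
sigma1(t) = t < 0.5 ? 0.10 : 0.30
sigma2(t) = t < 0.5 ? 0.40 : 0.20
rho, dt = 0.5, 1.0e-4
ts = 0:dt:1-dt
cov_ito = rho * sum(sigma1(t) * sigma2(t) * dt for t in ts)   # Ito isometry, ≈ 0.0250
# Effective vols calibrated to reprice the vanillas over [0, 1]
sigma1bar = sqrt(sum(sigma1(t)^2 * dt for t in ts))
sigma2bar = sqrt(sum(sigma2(t)^2 * dt for t in ts))
cov_naive = rho * sigma1bar * sigma2bar                       # naive log-Euler, ≈ 0.0354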

The fix is of course to use the true covariance matrix between \(t_0\) and \(t_1\).

The issue may also happen if, instead of using the square root of the covariance matrix in the Monte-Carlo discretization, one uses the square root of the correlation matrix.
