Dearbitraging a weak smile on SVI with Damghani's method

Yesterday, I wrote about some calendar spread arbitrages with SVI. Today I am looking at the famous example of butterfly spread arbitrage from Axel Vogt. $$(a, b, m, \rho, \sigma) = (−0.0410, 0.1331, 0.3586, 0.3060, 0.4153)$$ The parameters obey the weak no-arbitrage constraint of Gatheral, and yet produce a negative density, or equivalently, a negative denominator in the local variance Dupire formula. Those parameters are mentioned in Jim Gatheral and Antoine Jacquier paper on arbitrage free SVI volatility surfaces and also in Damghani’s paper dearbitraging a weak smile.

Gatheral proposes to adjust just two of the parameters (corresponding to call wing and minimum variance) and run an optimizer with a large penalty on arbitrage to find the best-fit SVI parameters. The arbitrage can be easily detected by just computing the local variance denominator, which is not much more costly than computing the implied variance (the analytical derivatives come almost for free):

It is also possible include this penalty directly in the Nelder-Mead minimization of Zeliade’s quasi explicit calibration. In this case, all the parameters will be optimized. It is also not much slower than the more classic minimization without penalty. While it results a much lower RMSE, the solution might not be as visually pleasing as Gatheral’s one.

I was curious to see what Damghani’s technique would produce on this same example. Instead of doing the minimization with the full denominator evaluation, Damghani uses a simpler no-arbitrage necessary constraint on the first derivative:

Now, it seems twisted to not enforce directly the real no arbitrage constraint in the minimization, which is actually not much more complex to evaluate, and to instead rely on some weak condition, that we make stronger by steps. There would be no loop over the minimization needed. Does this produce any good result?

implied variance with Axel Vogt SVI parameters

The graph is clear, the technique produces crap. This is the local variance denominator:

local variance denominator g with Axel Vogt SVI parameters

Maybe I misunderstood something big in this paper, but so far it was just a waste of time.

Arbitrage in Zeliade's SVI example

Zeliade wrote an excellent paper about the calibration of the SVI parameterization for the volatility surface in 2008. I just noticed recently that their example calibration actually contained strong calendar spread arbitrages. This is not too surprising if you look at the parameters, they vary wildly between the first and the second expiry.

T a b rho m s
0.082 0.027 0.234 0.068 0.100 0.028
0.16 0.030 0.125 -1.0 0.074 0.050
0.26 0.032 0.094 -1.0 0.093 0.041

The calendar spread arbitrage is very visible in total variance versus log-moneyness graph: in those coordinates if lines crosses, there is an arbitrage. This is because the total variance should be increasing with the expiry time.

Arbitrage in Zeliade's example

Why does this happen?

This is typically because the range of moneyness of actual market quotes for the first expiry is quite narrow, and looks more like a smile than a skew. The problem is that SVI is then quite bad at extrapolating this smile, likely because the SVI wings were used to fit well the curvature and have nothing to do with any actual market wings.

A consequence is that that the local volatility will be undefined in the right wing of the second and third expiries, if we keep the first expiry. It is interesting to look at what happens to the implied volatility if we decide either to:

  • set the undefined local volatility to zero
  • take the absolute value of the local variance
  • search for the closest defined local volatility on the log-moneyness axis
  • search for the closest positive local variance on the expiry axis and interpolate it linearly.

Total variance after flooring the local variance at zero.

Setting the undefined local volatility to zero makes the second, third, fourth expiry wings higher: the total variance is close to constant between expiries. One could have expected that a zero vol would lead to a lower implied vol, the opposite happens, because the original local vol is negative, so flooring it zero is like increasing it.

We can now deduce that taking the absolute value of the local variance is just going to push the implied variance even higher, and will therefore create a larger bias. Similarly, searching for closest local volatility is not going to improve anything.

Fixing the local volatility after the facts produces only a forward looking fix: the next expiries are going to be adjusted as a result.

Instead, in this example, the first maturity should be adjusted. A simple adjustment would be to cap the total variance of the first expiry so that it is never higher than the next expiry, before computing the local volatility. Although the resulting implied volatility will not be C2 or even C1 at the cap, the local volatility can be computed analytically on the left side, before the cap, and also analytically after the cap, on the right side. More care needs to be taken if the next expiry also needs to be capped (for example because there is another calendar spread arbitrage between expiry two and expiry three). In this case, the analytical calculation must be split in three zones: first-nocap + second-nocap, first-cap+second-nocap, first-cap+second-cap. So in reality having non C2 extrapolation can work well with local volatility if we are careful enough to avoid the artificial spike at the points of discontinuity.

There is yet another solution to produce a similar effect while still working at the local volatility level: if there is an arbitrage with the previous expiry at a given moneyness, we compute the local volatility ignoring the previous expiry (eventually extrapolating in constant manner) and we override the previous expiry local volatility for this moneyness. In terms of implied variance, this would correspond to removing the arbitrageable part of a given expiry, and replacing it with a linear interpolation between encompassing expiries, working backwards in time.

Dupire Local Volatility with Cash Dividends Part 2

I had a look at how to price under Local Volatility with Cash dividends in my previous post. I still had a somewhat large error in my FDM price. After too much time, I managed to find the culprit, it was the extrapolation of the prices when applying the jump continuity condition \(V(S,t_\alpha^-) = V(S-\alpha, t_\alpha^+) \) for an asset \(S\) with a cash dividend of amount \(\alpha\) at \( t_\alpha \).

I stumbled in the meantime on another alternative to compute the Dupire local volatility with cash dividends in the book of Lorenzo Bergomi, one can rely on the classic Gatheral formulation in terms of total implied variance \(w(y,T)\) function of log-moneyness:

In this case the total implied variance corresponds to the Black volatility of the pure process (the process without dividend jumps), that is, the Black volatility corresponding to (shifted) market option prices. If the reference data consists of model volatilities for the spot model (with known jumps at dividend dates), the market prices can be obtained by using a good approximation of the spot model, for example the improved Etore-Gobet expansion of this paper, not with the Black formula directly.

In theory, it should be more robust to work directly with implied variances as there is not the problem of dealing with a very small numerator and denominator of the option prices equivalent formula. In practice, if we rely on one of the Etore-Gobet expansions, it can be much faster to work directly with option prices if we are careful when computing the ratio, as this ratio can be obtained in closed form. In theory as well, we need to use a fine discretisation in time to represent the pure Black equivalent smile accurately. In practice, if we are not too bothered by a mismatch with the true theoretical model, introducing volatility slices just before/at the dividends is enough to reproduce the market prices exactly, as long as we make sure that those obey the option price continuity at the dividend \(C(S_0,K,t_{\alpha}^-) = C(S_0, K-\alpha,t_{\alpha}^+)\). The difference is that the interpolation in time (linear in total variance) is going to describe a slightly different dynamic from the true spot process. The advantage, is that then, it is much faster.

I thought it would be interesting to have a look at the various volatilities. My initial spot volatility is a simple smile constant in time:

Model volatility. Before,At = before the dividend, at the dividend. Log scale for strikes.

Actually it’s not all that constant since there is the need to introduce a shift at the dividend date to obey the option price continuity relationship. The pure process model volatility is however constant.

Pure process model volatility.

The equivalent pure process Black vols looks like this:

Pure process Black volatility

Notice how it does not jump at the dividend date, and how the cash dividend results in a lower Black volatility at maturity. The local volatility is:

Dupire Local Volatility under the spot model with jump at dividend date = 3.5

While the one of the pure process is:

Pure Process Local Volatility under the spot model with jump at dividend date = 3.5

It falls towards zero for low pure strikes or equivalently near the market strike corresponding to the dividend amount.

So far, this is not too far from the textbook theory. Now it happens that there is another subtlety related to the dividend policy. The dividend policy defines what happens when the spot price is lower than the cash dividend. Haug, Haug and Lewis defined two policies, liquidator and survivor as

\(V(S,K,t_\alpha^-) = V(0,K, t_\alpha^+) \) for the liquidator - the stock drops to zero. \(V(S,K,t_\alpha^-) = V(S,K, t_\alpha^+) \) for the survivor - the dividend is not paid.

Not applying any particular policy would mean that the stock price can become negative: \(V(S,K,t_\alpha^-) = V(S-\alpha,K, t_\alpha^+) \) also when \( S < \alpha \).

Which one should we use? The approximation formulae (when not adjusted by the call price at strike 0) actually correspond to the “no dividend policy” rule. It can be verified numerically on extreme scenarios. It appears then natural that the finite difference scheme should also follow the same policy. And the volatility slice after the dividend date is really a constant size shift of the volatility slice just before.

It becomes however more surprising if the model of reference is the liquidator (or the survivor) policy, that is, the market option prices are computed with a nearly exact numerical method according to the liquidator policy. In this case the volatility slice after the dividend date is not merely a constant size shift anymore, as the dividend policy will impact the option price continuity relationship.

Liquidator model volatility. Before,At = before the dividend, at the dividend. Notice the difference in the BeforeShifted curve compared to the graph with no dividend policy.

It turns out, that then, using the same policy in the FDM scheme at the dividend dates will actually create a bias, while discarding the dividend policy will make the method converge to the correct price. How can this be? My interpretation is that the Dupire local volatility already includes the dividend policy effect, and therefore it should not be taken into account once more in the finite difference scheme dividend jump condition. It is only a partial explanation, since imposing a liquidator policy via the jump condition seems to actually never work with Dupire (on non flat vols), even if the Dupire local volatility does not include the dividend policy effect.

Dupire Local Volatility under the spot model with liquidator policy and jump at dividend date = 3.5

Of course, this is not true when the volatility is constant until the option maturity (no smile, no Dupire). In this case, the policy must be enforced via the jump condition in the finite difference scheme.

Note that this effect is not always simple to see, in my example, it is visible, especially on Put options of low strikes, because the volatility surface wings imply a high volatility when the strike is low and the option maturity is relatively long (5 years). For short maturities or lower volatilities, the dividend policy impact on the price is too small to be noticed.

Dupire Local Volatility with Cash Dividends

The Dupire equation for local volatility has been derived under the assumption of Martingality, that means no dividends or interest rates. The extension to continuous dividend yield is described in many papers or books:

With cash dividends however, the Black-Scholes formula is not valid anymore if we suppose that the asset jumps at the dividend date of the dividend amount. There are various relatively accurate approximations available to price an option supposing a constant (spot) volatility and jumps, for example, this one.

Labordère, in his paper Calibration of local stochastic volatility models to market smiles, describes a mapping to obtain the market local volatility corresponding to the model with jumps, from the local volatility of a pure Martingale. Assuming no interest rates and no proportional dividend, the equations looks particularly simple, it can be simplified to:

While it appears very simple, in practice, it is not so much. For example, let’s consider a single maturity smile (a smile, constant in time) as the market reference spot vols. Which volatility should be used in the formula for \( \hat{C} \)? logically, it should be the volatility corresponding to the market option price of strike K and maturity T. The numerical derivatives will therefore make use of 4 distinct volatilities for K, K+dK, K-dK and T+dT. In the pricing formula, we can wonder if at T+dT, the price should include the eventual additional dividend or not (as it is infinitesimal, probably not).

It turns out that applying the above formula leads to jumps in time in the local volatility, around the dividend date, even though our initial market vols were flat in time.

Dupire Local Volatility under the spot model with jump at dividend date = 3.5 on a single constant in time spot smile.

The mistake is not that clear. It turns out that, when using a single volatility slice, extrapolated in constant manner, the option continuity relationship around the dividend maturity is not respected. We must have

Dupire Local Volatility under the spot model with jump at dividend date = 3.5, introducing a slice before the dividend date to enforce the price continuity relationship.

Look, when we shift by the dividend!

The slightly funny shape of the local volatility in the left wing, before the dividend, is due to linear extrapolation use.

Interestingly, there is quite a difference in the local volatility for low strikes, depending on the dividend policy, here is how it looks for a single dividend with liquidator vs survivor policy.

Dupire Local Volatility under the spot model with jump at dividend date = 3.5 on a single constant in time spot smile.

The analytical approximations lead to another different local volatility from both policies, but don’t differ much from each other. Guyon and Labordere approximation based on the skew averaging technique of Piterbarg, not displayed here, is the worst. Gocsei-Sahel approximation has some issues for the lower strikes, the Etore-Gobet expansions on strike or on Lehman and the Zhang approximation are very stable and accurate, except for the very low strikes, where the dividend policy starts playing a more important role. Those differences however don’t impact the prices very much as the prices with the various methods (excepting the Guyon-Labordère approximation) are very close, and differ of a magnitude much smaller than the error to the true price.

The continuous yield approach consist in first building the equivalent Black volatility and use the regular Dupire on it. This is qualitatively different from a cash dividend jump, and is closer to a proportional dividend jump. Note that there is still a jump in the continuous yield local volatility:

Dupire Local Volatility under the forward model.

The huge spike at 3.5 don’t allow us to see much about what’s happening around:

Dupire Local Volatility under the forward model.

Overall, when the dividends happens early, the Dupire formula for cash dividends works well, but when the dividends are closer to maturity there is a marked bias in the prices, that does not disappear with more steps in the FDM. Typically, at \(0.7T\), the absolute price error is around 0.01 and more or less constant accross strikes. In contrast, the classic continuous yield Dupire behaves well, despite the spike, and the error decreases with the number of steps, I obtain around 5E-4 with 500 steps.

I would have expected a much better accuracy from Labordère’s approach, it’s still not entirely clear to me if there is not an error lurking somewhere. This is how an apparently simple formula can become a nightmare to use in practice.

Update: A follow-up to this post where I resolve the mysterious remaining error.

SVI, SABR, or parabola on AAPL?

In a previous post, I took a look at least squares spline and parabola fits on AAPL 1m options market volatilities. I would have imagined SVI to fit even better since it has 5 parameters, and SABR to do reasonably well.

It turns out that the simple parabola has the lowest RMSE, and SVI is not really better than SABR on that example.

SVI, SABR, least squares parabola fitted to AAPL 1m options

Note that this is just one single example, unlikely to be representative of anything, but I thought this was interesting that in practice, a simple parabola can compare favorably to more complex models.

Adaptive Filon quadrature for stochastic volatility models

A while ago, I have applied a relatively simple adaptive Filon quadrature to the problem of volatility swap pricing. The Filon quadrature is an old quadrature from 1928 that allows to integrate oscillatory integrand like \(f(x)\cos(k x) \) or \(f(x)\sin(k x) \). It turns out that combined with an adaptive Simpson like method, it has many advantages over more generic adaptive quadrature methods like Gauss-Lobatto, which is often used on similar problems.

Flinn derived a similar quadrature, but taking into account the derivative values, which are likely to be easily available on those problems. This produces a even more robust quadrature and of higher convergence order.

I was curious how practical this would be on Heston or other stochastic volatility models with a known characteristic function. It turns out it produces a very competitive performance-wise and very stable option pricing method: it can be up to five times faster than an adaptive Gauss-Lobatto within a calibration. I found it faster and more robust than the Cos method (especially as it is quite tricky to guess properly the truncation of the Cos method).

If you are interested, you will find much more details in this document.

Least Squares Rational Function

In my paper “Fast and Accurate Analytic Basis Point Volatility”, I use a table of Chebyshev polynomials to provide an accurate representation of some function. This is an idea I first saw in the Faddeeva package to represent the cumulative normal distribution with high accuracy, and high performance. It is also simple to find out the Chebyshev polynomials, and which intervals are the most appropriate for those, which makes this technique quite appealing.

Still, it would have been nice to have also a more visually appealing rational function representation, as W. Shaw did for the cumulative normal distribution (again). Popular algorithms to find the best rational function representation seem to be minimax or Remez. But I could not find an open-source library where those were implemented. There is an interesting implementation in chebfun but this depends on Matlab.

The Numerical recipe book provides a simple algorithm in chapter 5.13, not looking for the best possible rational function, but just for one that would be “good enough”. Interestingly, the first part of the algorithm merely computes a least squares solution on some chebyshev like nodes. I however quickly noticed funny behaviors with the code: it could produce a worse fit for a higher order numerator or denominator. Then I tried some least squares python code which ended up being just buggy: I am not sure what the code actually does with all the numpy and scipy magic, it gives solutions with poles in the data, and clearly not the least squares solution. I can’t fully understand why one would propose such a code.

It turns out that I had an alternative very basic least squares polynomial fit implementation, which is based on this matrix representation. I wondered if it would be as prone to errors as Numerical recipe code (where they use SVD internally to solve).

The answer is: it depends. It depends on the solver used. If I use a QR solver, then the implementation looks robust on my test data, much more than Numerical recipe code. If I use LU, it just fails in some cases. If I use SVD, it is sometimes better sometimes worse than Numerical recipe.

Interestingly, I thought, that, maybe, a gradient descent could allow to regain the lost accuracy with SVD, using as starting point, the SVD guess. However, it does not work, it just converges more or less to the same (bad) solution.

Another interesting point, is that the using QR decomposition (instead of SVD) in the Numerical recipe implementation resulted in a much better solution, better even than my basic least squares fit, which looked robust, but was actually not so much.

Armed with this improved Numerical recipe code (labeled NRI), here is a comparison of fit of my naive code (with QR) against the improved numerical recipe (NRI) for a polynomial of degree 20. We can visually see the difference (when we zoom)!

Least squares polynomial fit of degree 20 zoomed.

The RMSE difference on the full interval [0,1] on 1000 equidistant points is huge:

Method RMSE
Polynomial Naive 0.234
Polynomial NRI 0.039
Rational NRI 0.001

A 10/10 rational function (numerator of degree 10, denominator of degree 10) gives a much better fit than a polynomial of degree 20. It is interesting to look at the error visually to understand how large is the difference:

Least squares polynomial fit error of degree 20.

I still can draw a conclusion to my quest for a rational function approximation: I won’t find a good one with the change of variables I am using in my paper, as I would imagine the least squares solution to be at worst around one order of magnitudes away from the minimax. The errors I got suggest I would need a few rational functions, maybe 3 or more, and then it does not look all that appealing compared to the table of Chebyshev polynomials.

I thought this was a good example of how relatively simple numerical tasks can be challenging in practice.

Update - The paper now contains a solution for the normal volatility inversion problem with only 3 rational functions.

Least Squares Spline for Volatility Interpolation

I am experimenting a bit with least squares splines. Existing algorithms (for example from the NSWC Fortran library) usually work with B-splines, a relatively simple explanation of how it works is given in this paper (I think this is how De Boor coded it in the NSWC library). Interestingly there is an equivalent formulation in terms of standard cubic splines, although it seems that the pseudo code on that paper has errors.

Least squares splines give a very good fit for option implied volatilities with only a few parameters. In theory, the number of parameters is N+2 where N is the number of interpolation points. I tried on some of my AAPL 1 month option chain, with only 3 points (so 2 splines or 5 free parameters).

least squares spline on 1m AAPL options.

It would be interesting to add the natural constraints so that it can be linearly extrapolated. Maybe for next time.

The Mystic Parabola

I recently had some fun trying to work directly with the option chain from the Nasdaq website. The data there is quite noisy, but a simple parabola can still give an amazing fit. I will consider the options of maturity two years as illustration. I also relied on a simple implied volatility algorithm that can be summarized in the following steps:

  • Compute a rough guess for the forward price by using interest, borrow curves and by extrapolating the dividends.
  • Imply the forward from the European Put-Call parity relationship on the mid prices of the two strikes closes to the rough forward guess. A simple linear interpolation between the two strikes can be used to compute the forward.
  • Compute the Black implied volatilities as if the option were European using P. Jaeckel algorithm.
  • Calibrate the proportional dividend amount or the growth rate by minimizing, for example with a Levenberg-Marquardt minimizer, the difference between model and mid-option prices corresponding to the three strikes closest to the forward. The parameters in this case are effectively the dividend amount and the volatilities for Put and Call options (the same volatility is used for both options). The initial guess stems directly from the two previous steps. American option prices are computed by the finite difference method.
  • Solve numerically the volatilities one by one with the TOMS748 algorithm so that the model prices match the market mid out-of-the-money option prices.

Then I just fit a least squares parabola in variance on log-moneyness, using options trading volumes as weights and obtain the following figure:

least squares parabola on 2y AAPL options.

Isn’t the fit just amazing? Even if I found it surprising, it’s probably not so surprising. The curve has to be smooth, somewhat monotone, and will be therefore like a parabola near the money. While there is no guarantee it will fit that well far away, it’s actually a matter of scale. Short maturities will lead to not so great fit in the wings, while long maturities will correspond to a narrower range of scaled strikes and match better a parabola.

Yahoo Finance Implied Volatility

The option chain on Yahoo finance shows an implied volatility number for each call or put option in the last column. I was wondering a bit how they computed that number. I did not exactly find out their methodology, especially since we don’t even know the daycount convention used, but I did find that it was likely just garbage.

A red-herring is for example the large discrepancy between put vols and call vols. For example strike 670, call vol=50%, put vol=32%. This suggests that the two are completely decoupled, and they use some wrong forward (spot price?) to obtain those numbers. If I compute the implied volatilities using put-call parity close to the money to find out the implied forward price, I end up with ask vols of 37% and 34% or call and put mid vols of 33%. By considering the put-call parity, I assume European option prices, which is not correct in this case. It turns out however, that with the low interest rates we live in, there is nearly zero additional value due to the American early exercise.

I am not sure what use people can have of Yahoo implied volatilities.

Previous

Next