Yesterday, I wrote about some calendar spread arbitrages with SVI. Today I am looking at the famous example of butterfly spread arbitrage from Axel Vogt.
$$(a, b, m, \rho, \sigma) = (−0.0410, 0.1331, 0.3586, 0.3060, 0.4153)$$
The parameters obey the weak no-arbitrage constraint of Gatheral, and yet produce a negative density, or equivalently, a negative denominator in the local variance Dupire formula.
Those parameters are mentioned in Jim Gatheral and Antoine Jacquier paper on arbitrage free SVI volatility surfaces and also in Damghani’s paper dearbitraging a weak smile.
Gatheral proposes to adjust just two of the parameters (corresponding to call wing and minimum variance) and run an optimizer with a large penalty on arbitrage to find the best-fit SVI parameters.
The arbitrage can be easily detected by just computing the local variance denominator, which is not much more costly than computing the implied variance (the analytical derivatives come almost for free):
It is also possible include this penalty directly in the Nelder-Mead minimization of Zeliade’s quasi explicit calibration. In this case, all the parameters will be optimized. It is also not much slower than the more classic minimization without penalty.
While it results a much lower RMSE, the solution might not be as visually pleasing as Gatheral’s one.
I was curious to see what Damghani’s technique would produce on this same example. Instead of doing the minimization with the full denominator evaluation, Damghani uses a simpler no-arbitrage necessary constraint
on the first derivative:
Now, it seems twisted to not enforce directly the real no arbitrage constraint in the minimization, which is actually not much more complex to evaluate, and to instead rely on some weak condition, that we make stronger by steps.
There would be no loop over the minimization needed. Does this produce any good result?
The graph is clear, the technique produces crap. This is the local variance denominator:
Maybe I misunderstood something big in this paper, but so far it was just a waste of time.
Zeliade wrote an excellent paper about the calibration of the SVI parameterization for the volatility surface in 2008. I just noticed recently
that their example calibration actually contained strong calendar spread arbitrages. This is not too surprising if you look at the parameters,
they vary wildly between the first and the second expiry.
T
a
b
rho
m
s
0.082
0.027
0.234
0.068
0.100
0.028
0.16
0.030
0.125
-1.0
0.074
0.050
0.26
0.032
0.094
-1.0
0.093
0.041
The calendar spread arbitrage is very visible in total variance versus log-moneyness graph:
in those coordinates if lines crosses, there is an arbitrage. This is because the total variance should be increasing with the expiry time.
Why does this happen?
This is typically because the range of moneyness of actual market quotes for the first expiry is quite narrow, and looks more like a smile than a skew. The problem is that
SVI is then quite bad at extrapolating this smile, likely because the SVI wings were used to fit well the curvature and have nothing to do with any actual market wings.
A consequence is that that the local volatility will be undefined in the right wing of the second and third expiries, if we keep the first expiry.
It is interesting to look at what happens to the implied volatility if we decide either to:
set the undefined local volatility to zero
take the absolute value of the local variance
search for the closest defined local volatility on the log-moneyness axis
search for the closest positive local variance on the expiry axis and interpolate it linearly.
Setting the undefined local volatility to zero makes the second, third, fourth expiry wings higher: the total variance is close to constant between expiries. One could have expected
that a zero vol would lead to a lower implied vol, the opposite happens, because the original local vol is negative, so flooring it zero is like increasing it.
We can now deduce that taking the absolute value of the local variance is just going to push the implied variance even higher, and will therefore create a larger bias. Similarly, searching for closest local volatility
is not going to improve anything.
Fixing the local volatility after the facts produces only a forward looking fix: the next expiries are going to be adjusted as a result.
Instead, in this example, the first maturity should be adjusted. A simple adjustment would be to cap the total variance of the first expiry so that it is never higher than the next expiry, before computing the local volatility.
Although the resulting implied volatility will not be C2 or even C1 at the cap, the local volatility can be computed analytically on the left side, before the cap, and also analytically after the cap, on the right side.
More care needs to be taken if the next expiry also needs to be capped (for example because there is another calendar spread arbitrage between expiry two and expiry three). In this case, the analytical calculation must be split in three zones: first-nocap + second-nocap, first-cap+second-nocap, first-cap+second-cap.
So in reality having non C2 extrapolation can work well with local volatility if we are careful enough to avoid the artificial spike at the points of discontinuity.
There is yet another solution to produce a similar effect while still working at the local volatility level: if there is an arbitrage with the previous expiry at a given moneyness,
we compute the local volatility ignoring the previous expiry (eventually extrapolating in constant manner) and we override the previous expiry local volatility for this moneyness.
In terms of implied variance, this would correspond to removing the arbitrageable part of a given expiry, and replacing it with a linear interpolation between encompassing expiries, working backwards in time.
I had a look at how to price under Local Volatility with Cash dividends in my previous post. I still had a somewhat large error in my FDM price. After too much time, I managed to find the culprit, it was the extrapolation of the prices when applying the jump continuity condition \(V(S,t_\alpha^-) = V(S-\alpha, t_\alpha^+) \) for an asset \(S\) with a cash dividend of amount \(\alpha\) at \( t_\alpha \).
I stumbled in the meantime on another alternative to compute the Dupire local volatility with cash dividends in the book of Lorenzo Bergomi, one can rely on the
classic Gatheral formulation in terms of total implied variance \(w(y,T)\) function of log-moneyness:
In this case the total implied variance corresponds to the Black volatility of the pure process (the process without dividend jumps), that is, the Black volatility corresponding to (shifted) market option prices. If the reference data consists of model volatilities for the spot model (with known jumps at dividend dates), the market prices can be obtained by using a good approximation of the spot model, for example the improved Etore-Gobet expansion of this paper, not with the Black formula directly.
In theory, it should be more robust to work directly with implied variances as there is not the problem of dealing with a very small numerator and denominator of the option prices equivalent formula. In practice, if we rely on one of the Etore-Gobet expansions, it can be much faster to work directly with option prices if we are careful when computing the ratio, as this ratio can be obtained in closed form. In theory as well, we need to use a fine discretisation in time to represent the pure Black equivalent smile accurately. In practice, if we are not too bothered by a mismatch with the true theoretical model, introducing volatility slices just before/at the dividends is enough to reproduce the market prices exactly, as long as we make sure that those obey the option price continuity at the dividend \(C(S_0,K,t_{\alpha}^-) = C(S_0, K-\alpha,t_{\alpha}^+)\). The difference is that the interpolation in time (linear in total variance) is going to describe a slightly different dynamic from the true spot process. The advantage, is that then, it is much faster.
I thought it would be interesting to have a look at the various volatilities. My initial spot volatility is a simple smile constant in time:
Actually it’s not all that constant since there is the need to introduce a shift at the dividend date to obey the option price continuity relationship. The pure process model volatility is however constant.
The equivalent pure process Black vols looks like this:
Notice how it does not jump at the dividend date, and how the cash dividend results in a lower Black volatility at maturity. The local volatility is:
While the one of the pure process is:
It falls towards zero for low pure strikes or equivalently near the market strike corresponding to the dividend amount.
So far, this is not too far from the textbook theory. Now it happens that there is another subtlety related to the dividend policy. The dividend policy defines what happens when the spot price is lower than the cash dividend. Haug, Haug and Lewis defined two policies, liquidator and survivor as
\(V(S,K,t_\alpha^-) = V(0,K, t_\alpha^+) \) for the liquidator - the stock drops to zero.
\(V(S,K,t_\alpha^-) = V(S,K, t_\alpha^+) \) for the survivor - the dividend is not paid.
Not applying any particular policy would mean that the stock price can become negative:
\(V(S,K,t_\alpha^-) = V(S-\alpha,K, t_\alpha^+) \) also when \( S < \alpha \).
Which one should we use? The approximation formulae (when not adjusted by the call price at strike 0) actually correspond to the “no dividend policy” rule. It can be verified numerically on extreme scenarios. It appears then natural that the finite difference scheme should also follow the same policy. And the volatility slice after the dividend date is really a constant size shift of the volatility slice just before.
It becomes however more surprising if the model of reference is the liquidator (or the survivor) policy, that is, the market option prices are computed with a nearly exact numerical method according to the liquidator policy. In this case the volatility slice after the dividend date is not merely a constant size shift anymore, as the dividend policy will impact the option price continuity relationship.
It turns out, that then, using the same policy in the FDM scheme at the dividend dates will actually create a bias, while discarding the dividend policy will make the method converge to the correct price. How can this be? My interpretation is that the Dupire local volatility already includes the dividend policy effect, and therefore it should not be taken into account once more in the finite difference scheme dividend jump condition. It is only a partial explanation, since imposing a liquidator policy via the jump condition seems to actually never work with Dupire (on non flat vols), even if the Dupire local volatility does not include the dividend policy effect.
Of course, this is not true when the volatility is constant until the option maturity (no smile, no Dupire). In this case, the policy must be enforced via the jump condition in the finite difference scheme.
Note that this effect is not always simple to see, in my example, it is visible, especially on Put options of low strikes, because the volatility surface wings imply a high volatility when the strike is low and the option maturity is relatively long (5 years). For short maturities or lower volatilities, the dividend policy impact on the price is too small to be noticed.
The Dupire equation for local volatility has been derived under the assumption of Martingality, that means no dividends or interest rates.
The extension to continuous dividend yield is described in many papers or books:
With cash dividends however, the Black-Scholes formula is not valid anymore if we suppose that the asset jumps at the dividend date of the dividend amount. There are various relatively accurate
approximations available to price an option supposing a constant (spot) volatility and jumps, for example, this one.
Labordère, in his paper Calibration of local stochastic volatility models to market smiles, describes a mapping to obtain the market local volatility corresponding to the model with jumps, from the local volatility of a pure Martingale. Assuming no interest rates and no proportional dividend, the equations looks particularly simple, it can be simplified to:
While it appears very simple, in practice, it is not so much. For example, let’s consider a single maturity smile (a smile, constant in time) as the market reference spot vols. Which volatility should be used in the formula for \( \hat{C} \)? logically, it should be the volatility corresponding to the market option price of strike K and maturity T. The numerical derivatives will therefore make use of 4 distinct volatilities for K, K+dK, K-dK and T+dT. In the pricing formula,
we can wonder if at T+dT, the price should include the eventual additional dividend or not (as it is infinitesimal, probably not).
It turns out that applying the above formula leads to jumps in time in the local volatility, around the dividend date, even though our initial market vols were flat in time.
The mistake is not that clear. It turns out that, when using a single volatility slice, extrapolated in constant manner, the option continuity relationship around the dividend maturity is not respected. We must have
The slightly funny shape of the local volatility in the left wing, before the dividend, is due to linear extrapolation use.
Interestingly, there is quite a difference in the local volatility for low strikes, depending on the dividend policy, here is how it looks for a single dividend with liquidator vs survivor policy.
The analytical approximations lead to another different local volatility from both policies, but don’t differ much from each other. Guyon and Labordere approximation based on the skew averaging technique of Piterbarg, not displayed here, is the worst. Gocsei-Sahel approximation has some issues for the lower strikes, the Etore-Gobet expansions on strike or on Lehman and the Zhang approximation are very stable and accurate, except for the very low strikes, where the dividend policy starts playing a more important role.
Those differences however don’t impact the prices very much as the prices with the various methods (excepting the Guyon-Labordère approximation) are very close, and differ of a magnitude much smaller than the error to the true price.
The continuous yield approach consist in first building the equivalent Black volatility and use the regular Dupire on it. This is qualitatively different from a cash dividend jump, and is closer to a proportional dividend jump. Note that there is still a jump in the continuous yield local volatility:
The huge spike at 3.5 don’t allow us to see much about what’s happening around:
Overall, when the dividends happens early, the Dupire formula for cash dividends works well, but when the dividends are closer to maturity there is a marked bias in the prices, that does not disappear with more steps in the FDM. Typically, at \(0.7T\), the absolute price error is around 0.01 and more or less constant accross strikes. In contrast, the classic continuous yield Dupire behaves well, despite the spike, and the error decreases with the number of steps, I obtain around 5E-4 with 500 steps.
I would have expected a much better accuracy from Labordère’s approach, it’s still not entirely clear to me if there is not an error lurking somewhere. This is how an apparently simple formula can become a nightmare to use in practice.
Update: A follow-up to this post where I resolve the mysterious remaining error.
In a previous post, I took a look at least squares spline and parabola fits on AAPL 1m options market volatilities. I would have imagined SVI to fit even better since it has 5 parameters, and SABR to do reasonably well.
It turns out that the simple parabola has the lowest RMSE, and SVI is not really better than SABR on that example.
Note that this is just one single example, unlikely to be representative of anything, but I thought this was interesting that in practice, a simple parabola can compare favorably to more complex models.
A while ago, I have applied a relatively simple adaptive Filon quadrature to the problem of volatility swap pricing. The Filon quadrature is an old quadrature from 1928 that allows to integrate oscillatory integrand like \(f(x)\cos(k x) \) or \(f(x)\sin(k x) \). It turns out that combined with an adaptive Simpson like method, it has many advantages over more generic adaptive quadrature methods like Gauss-Lobatto, which is often used on similar problems.
Flinn derived a similar quadrature, but taking into account the derivative values, which are likely to be easily available on those problems. This produces a even more robust quadrature and of higher convergence order.
I was curious how practical this would be on Heston or other stochastic volatility models with a known characteristic function. It turns out it produces a very competitive performance-wise and very stable option pricing method: it can be up to five times faster than an adaptive Gauss-Lobatto within a calibration. I found it faster and more robust than the Cos method (especially as it is quite tricky to guess properly the truncation of the Cos method).
If you are interested, you will find much more details in this document.
In my paper “Fast and Accurate Analytic Basis Point Volatility”,
I use a table of Chebyshev polynomials to provide an accurate representation of some function. This is
an idea I first saw in the Faddeeva package to
represent the cumulative normal distribution with high accuracy, and high performance. It is also
simple to find out the Chebyshev polynomials, and which intervals are the most appropriate for those, which
makes this technique quite appealing.
Still, it would have been nice to have also a more visually appealing rational function representation, as W. Shaw
did for the cumulative normal distribution (again). Popular algorithms to find the best rational function representation
seem to be minimax or Remez. But I could not find an open-source library where those were implemented. There is an
interesting implementation in chebfun but this depends on Matlab.
The Numerical recipe book provides a simple algorithm in chapter 5.13, not looking for the best possible rational function, but just for one
that would be “good enough”. Interestingly, the first part of the algorithm merely computes a least squares solution
on some chebyshev like nodes. I however quickly noticed funny behaviors with the code: it could produce a worse fit
for a higher order numerator or denominator. Then I tried some least squares python code which ended up being just buggy:
I am not sure what the code actually does with all the numpy and scipy magic, it gives solutions with poles in the data, and clearly not the least squares solution. I can’t fully understand why one would propose such a code.
It turns out that I had an alternative very basic least squares polynomial fit implementation, which is based on this matrix representation.
I wondered if it would be as prone to errors as Numerical recipe code (where they use SVD internally to solve).
The answer is: it depends. It depends on the solver used. If I use a QR solver, then the implementation looks robust on my test data,
much more than Numerical recipe code. If I use LU, it just fails in some cases. If I use SVD, it is sometimes better sometimes worse than Numerical recipe.
Interestingly, I thought, that, maybe, a gradient descent could allow to regain the lost accuracy with SVD, using
as starting point, the SVD guess. However, it does not work, it just converges more or less to the same (bad) solution.
Another interesting point, is that the using QR decomposition (instead of SVD)
in the Numerical recipe implementation resulted in a much better solution, better even than my basic least squares fit,
which looked robust, but was actually not so much.
Armed with this improved Numerical recipe code (labeled NRI), here is a comparison of fit of my naive code (with QR) against the improved numerical recipe
(NRI) for a polynomial of degree 20. We can visually see the difference (when we zoom)!
The RMSE difference on the full interval [0,1] on 1000 equidistant points is huge:
Method
RMSE
Polynomial Naive
0.234
Polynomial NRI
0.039
Rational NRI
0.001
A 10/10 rational function (numerator of degree 10, denominator of degree 10) gives a much better fit than a polynomial of degree 20. It is interesting to look at the error
visually to understand how large is the difference:
I still can draw a conclusion to my quest for a rational function approximation: I won’t find a good one with the
change of variables I am using in my paper, as I would imagine the least squares solution to be at worst around one order of magnitudes
away from the minimax. The errors I got suggest I would need a few rational functions, maybe 3 or more, and then it does not look
all that appealing compared to the table of Chebyshev polynomials.
I thought this was a good example of how relatively simple numerical tasks can be challenging in practice.
Update - The paper now contains a solution for the normal volatility inversion problem with only 3 rational functions.
I am experimenting a bit with least squares splines. Existing algorithms (for example from the NSWC Fortran library) usually work
with B-splines, a relatively simple explanation of how it works is given in this paper (I think this is how De Boor coded it in the NSWC library).
Interestingly there is an equivalent formulation in terms of standard cubic splines, although it seems that the
pseudo code on that paper has errors.
Least squares splines give a very good fit for option implied volatilities with only a few parameters.
In theory, the number of parameters is N+2 where N is the number of interpolation points.
I tried on some of my AAPL 1 month option chain, with only 3 points (so 2 splines or 5 free parameters).
It would be interesting to add the natural constraints so that it can be linearly extrapolated. Maybe for next time.
I recently had some fun trying to work directly with the option chain from the Nasdaq website.
The data there is quite noisy, but a simple parabola can still give an amazing fit. I will consider the options of maturity two years as illustration.
I also relied on a simple implied volatility algorithm that can be summarized in the following steps:
Compute a rough guess for the forward price by using interest, borrow curves and by extrapolating the dividends.
Imply the forward from the European Put-Call parity relationship on the mid prices of the two strikes closes to the rough forward guess. A simple linear interpolation between the two strikes can be used to compute the forward.
Compute the Black implied volatilities as if the option were European using P. Jaeckel algorithm.
Calibrate the proportional dividend amount or the growth rate by minimizing, for example with a Levenberg-Marquardt minimizer, the difference between model and mid-option prices corresponding to the three strikes closest to the forward. The parameters in this case are effectively the dividend amount and the volatilities for Put and Call options (the same volatility is used for both options). The initial guess stems directly from the two previous steps. American option prices are computed by the finite difference method.
Solve numerically the volatilities one by one with the TOMS748 algorithm so that the model prices match the market mid out-of-the-money option prices.
Then I just fit a least squares parabola in variance on log-moneyness, using options trading volumes as weights and obtain the following figure:
Isn’t the fit just amazing?
Even if I found it surprising, it’s probably not so surprising. The curve has to be smooth, somewhat monotone, and will be therefore like a parabola near the money. While there is no guarantee it will fit that well far away, it’s actually a matter of scale. Short maturities will lead to not so great fit in the wings, while long maturities will correspond to a narrower range of scaled strikes and match better a parabola.
The option chain on Yahoo finance shows an implied volatility number for each call or put option in the last column.
I was wondering a bit how they computed that number. I did not exactly find out their methodology, especially since we don’t even know the daycount convention used, but
I did find that it was likely just garbage.
A red-herring is for example the large discrepancy between put vols and call vols. For example strike 670, call vol=50%, put vol=32%.
This suggests that the two are completely decoupled, and they use some wrong forward (spot price?) to obtain those numbers. If I compute
the implied volatilities using put-call parity close to the money to find out the implied forward price, I end up with ask vols of 37% and 34% or call and put mid vols of 33%.
By considering the put-call parity, I assume European option prices, which is not correct in this case. It turns out however, that with the low interest rates we live in, there is nearly zero additional value due to the American early exercise.
I am not sure what use people can have of Yahoo implied volatilities.