Where I work, there used to be quite a bit of confusion about which rates one should use as input to a local volatility Monte Carlo simulation.
In particular, there is a paper in the Journal of Computational Finance by Andersen and Brotherton-Ratcliffe, “The Equity Option Volatility Smile: a Finite Difference Approach”, which explains that one should use specially tailored rates in the finite difference scheme in order to reproduce exact bond prices and exact forward contract prices.
Code has been updated and rolled back, and people have complained about it. But nobody really made the effort to simply write down clearly what is going on, or even to write a unit test around it. So it was just FUD, until this paper.
In short, for the log-Euler scheme one can use the intuitive forward drift r1*t1 - r0*t0 over the step (the log of the ratio of discount factors), but for the Euler scheme one needs a less intuitive forward drift rate to reproduce a nearly exact forward price.
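To make this concrete, here is a small sketch (my own illustration, not the paper's notation) of per-step drift rates that reproduce the forward price \(F(t_0,t_1)=S_0 e^{r_1 t_1 - r_0 t_0}\) over a step \([t_0,t_1]\), for both schemes:

```java
// Sketch (own illustration): per-step drift rates that reproduce the forward
// F = S0 * exp(r1*t1 - r0*t0) over one time step [t0, t1].
public final class ForwardDriftRates {
    // log-Euler: ln S1 = ln S0 + (mu - 0.5*sigma^2)*dt + sigma*sqrt(dt)*Z,
    // so E[S1] = S0 * exp(mu*dt): the plain forward rate works directly.
    static double logEulerDrift(double r0, double t0, double r1, double t1) {
        return (r1 * t1 - r0 * t0) / (t1 - t0);
    }

    // Euler: S1 = S0 * (1 + mu*dt + sigma*sqrt(dt)*Z),
    // so E[S1] = S0 * (1 + mu*dt): mu must satisfy 1 + mu*dt = exp(r1*t1 - r0*t0).
    static double eulerDrift(double r0, double t0, double r1, double t1) {
        double dt = t1 - t0;
        return Math.expm1(r1 * t1 - r0 * t0) / dt;
    }
}
```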
I have been wondering whether there is any better alternative to the standard Sobol (+ Brownian bridge) quasi-random sequence generator for the Monte Carlo simulation of financial derivatives.
Here is what I found:
Scrambled Sobol. The idea is to re-randomize the quasi-random numbers slightly. It can provide better uniformity properties and allows for a real estimate of the standard error. There are many ways to do this: the simple Cranley-Patterson rotation, which consists in adding a pseudo-random number modulo 1 (see the sketch after this list), Owen scrambling (permutations of the digits), and simplifications of it to achieve a reasonable speed. This is all very well described in Owen's quasi-Monte Carlo document.
Lattice rules. This is another family of quasi-random sequences which, so far, was not very well adapted to finance problems. A presentation from Giles & Kuo suggests this is changing.
Fast PCA. An alternative to the Brownian bridge is the standard PCA construction. The problem with PCA is its O(n^2) cost. A speedup is possible in the case of equidistant time steps, and this paper shows it can be generalized. But the data in it shows it is only advantageous for more than 1024 steps, which is not so interesting in finance.
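As an illustration of the simplest scrambling mentioned in the first item, a Cranley-Patterson rotation just adds a fixed pseudo-random shift, modulo 1, to every point of the sequence; here is a minimal sketch with illustrative names:

```java
import java.util.Random;

// Minimal sketch of a Cranley-Patterson rotation: add a fixed pseudo-random shift
// to every quasi-random point, modulo 1. Repeating with independent shifts gives
// i.i.d. randomized estimates, hence a usable standard error.
public final class CranleyPatterson {
    private final double[] shift;

    CranleyPatterson(int dimension, long seed) {
        Random rng = new Random(seed);
        shift = new double[dimension];
        for (int i = 0; i < dimension; i++) {
            shift[i] = rng.nextDouble();
        }
    }

    // y_i = (x_i + u_i) mod 1, applied coordinate by coordinate to one Sobol point
    double[] rotate(double[] sobolPoint) {
        double[] y = new double[sobolPoint.length];
        for (int i = 0; i < sobolPoint.length; i++) {
            double v = sobolPoint[i] + shift[i];
            y[i] = v >= 1.0 ? v - 1.0 : v;
        }
        return y;
    }
}
```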
Recently, I completed a project that I had initially estimated at around 2 months in nearly 4 hours. This morning I fixed the few remaining bugs, looked at the clock, and was surprised it was still so early and that I still had so many hours left in the day.
Now I have more time to polish the details and go beyond the initial goal (I think this scares my manager a bit), but I could stop now (and I believe some people do this often) and management would be satisfied.
What’s interesting is that everybody bought the 2-month estimate without question (I almost believed it myself). This reminded me of my productivity zero post.
Glasserman and Yu (GY) give a relatively simple algorithm to compute lower and upper bounds of the price of a Bermudan option through Monte Carlo.
I always thought producing an upper bound was very computationally intensive, and that the standard Longstaff-Schwartz algorithm was already quite precise. The GY algorithm is not much slower than Longstaff-Schwartz, but what's a bit tricky is the choice of basis functions: they have to be martingales. This is the detail I overlooked at first, and I then could not understand why my results were so bad. I looked for a coding mistake for several hours before figuring out that my basis functions were not martingales. Still, it is possible to find good martingales for the simple Bermudan put option case, and GY actually propose some in another paper.
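To make the martingale constraint concrete, here is a small sketch of two basis functions that are martingales under Black-Scholes dynamics with constant rate r, dividend yield q and volatility sigma; the choice and the names are my own illustration, in the spirit of (but not copied from) the bases GY propose:

```java
import org.apache.commons.math3.distribution.NormalDistribution;

// Illustrative martingale basis functions under Black-Scholes dynamics.
public final class MartingaleBasis {
    private static final NormalDistribution N01 = new NormalDistribution();

    // e^{-(r-q)t} S_t is a martingale since E[S_t] = S_0 e^{(r-q)t}.
    static double discountedAsset(double t, double s, double r, double q) {
        return Math.exp(-(r - q) * t) * s;
    }

    // e^{-rt} * EuropeanPut(t, S_t) is a martingale for t <= T by the tower property.
    static double discountedEuropeanPut(double t, double s, double strike, double T,
                                        double r, double q, double sigma) {
        return Math.exp(-r * t) * blackScholesPut(s, strike, T - t, r, q, sigma);
    }

    // Black-Scholes European put with continuous dividend yield q
    static double blackScholesPut(double s, double k, double tau,
                                  double r, double q, double sigma) {
        if (tau <= 0) {
            return Math.max(k - s, 0);
        }
        double sqrtTau = Math.sqrt(tau);
        double d1 = (Math.log(s / k) + (r - q + 0.5 * sigma * sigma) * tau) / (sigma * sqrtTau);
        double d2 = d1 - sigma * sqrtTau;
        return k * Math.exp(-r * tau) * N01.cumulativeProbability(-d2)
             - s * Math.exp(-q * tau) * N01.cumulativeProbability(-d1);
    }
}
```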
Here are some preliminary results where I compare the influence of the random number generator and the different methods. I include results for GY using in-the-money paths only for the regression (-In suffix) or all paths (no suffix).
Value using 16k training paths and Sobol: GY-Low-In is very close to LS.
Error using 16k training paths: a high number of simulations is not that useful.
Error using 1m simulation paths: GY basis functions require less training than LS.
One can see that the upper bound is not that precise compared to the lower bound estimate, and that using only in-the-money paths makes a big difference. The GY regression is good with only 1k paths; LS requires 10x more.
Surprisingly, I noticed that the Brownian bridge variance reduction applied to Sobol increased the GY low estimate, sometimes making it slightly higher than the Longstaff-Schwartz price.
Already the name United Nations should be suspicious, but now they are shown to have spread cholera to Haiti, as if the country did not have enough suffering. They have a nice building in New York and used to have a popular representative, but unfortunately, for poor countries, they never really achieved much. In Haiti, there were many stories of rapes and corruption by U.N. members more than 10 years ago. The movie The Whistleblower suggests it was the same in the Balkans. I am sure it has not changed much since.
Dropbox worked well, but the company decided to blacklist it; I suppose some people abused it. While looking for an alternative, I found Ubuntu One. It’s funny that I never tried it before even though I use Ubuntu. I did not think it was a Dropbox replacement, but it is. And you get 5GB instead of Dropbox's 2GB limit, which is enough for me (I was a bit above the 2GB limit). It works well under Linux as well as on Android, and there is an iOS app I have not yet tried. It also works on Windows and Mac.
In the book Monte Carlo Methods in Financial Engineering, Glasserman explains that if one reuses the paths used in the optimization of the exercise boundary parameters (in this case, the result of the regression in the Longstaff-Schwartz method) to compute the Monte Carlo mean value, one introduces a bias: the estimate will be biased high because it includes knowledge about future paths.
However, Longstaff and Schwartz seem to just reuse the paths in their paper, and Glasserman himself, when presenting the Longstaff-Schwartz method later in the book, uses the same paths for the regression and for the Monte Carlo mean value.
How large is this bias? What is the correct methodology?
I have tried, with Sobol quasi-random numbers, to evaluate that bias on a simple Bermudan put option of maturity 180 days, exercisable at 30, 60, 120 and 180 days, using a Black-Scholes volatility of 20% and a dividend yield of 6%. As a reference I use a finite difference solver based on TR-BDF2.
I found it particularly difficult to evaluate: should we use the same total number of paths for the two methods, or the same number of paths for the Monte Carlo mean computation only? Should we use the same number of paths for the regression and for the Monte Carlo mean computation, or should the mean computation use many more paths?
I tried those combinations and was able to clearly see the bias in only one case: a large number of paths for the Monte Carlo mean computation compared to the number of paths used for the regression, with a fixed total of 256*1024+1 paths and 32*1024+1 paths for the regression.
Those numbers are too good to be real. If one reduces the total number of paths or the number of regression paths too much, the result is not precise enough to see the bias. For example, using 4K paths for the regression leads to 2.83770 vs 2.83767. Using 4K paths for the regression and only 16K paths in total leads to 2.8383 vs 2.8387. Using 32K paths for the regression and increasing to 1M paths in total leads to 2.838539 vs 2.838546.
For this example the Longstaff-Schwartz price is biased low, so the slight increase due to path reuse is not very visible and most of the time does not deteriorate the overall accuracy. But as a result of reusing the paths, the Longstaff-Schwartz price might end up higher than the real value.
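For reference, the out-of-sample methodology (fit the exercise policy on training paths, then average discounted payoffs over independent paths) can be wired as in this minimal sketch, where PolicyFitter and the returned pathwise payoff function are hypothetical abstractions around the Longstaff-Schwartz regression:

```java
import java.util.Arrays;
import java.util.function.ToDoubleFunction;

// Sketch of out-of-sample pricing: the regression only ever sees the training paths,
// and the Monte Carlo mean is taken over the remaining, independent paths.
public final class OutOfSamplePricer {
    interface PolicyFitter {
        /** Fits the exercise policy and returns the pathwise discounted payoff under it. */
        ToDoubleFunction<double[]> fit(double[][] trainingPaths);
    }

    static double price(double[][] paths, int trainingSize, PolicyFitter fitter) {
        double[][] training = Arrays.copyOfRange(paths, 0, trainingSize);
        double[][] pricing = Arrays.copyOfRange(paths, trainingSize, paths.length);
        ToDoubleFunction<double[]> discountedPayoff = fitter.fit(training);
        double sum = 0;
        for (double[] path : pricing) {
            sum += discountedPayoff.applyAsDouble(path); // the policy never saw these paths
        }
        return sum / pricing.length;
    }
}
```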
In finance, because one often discretizes the log process instead of the process itself for Monte Carlo simulation, the Math.exp function can be called a lot (millions of times per simulation) and can become a bottleneck. I have noticed that the simpler Euler discretization was around 30% faster for local volatility Monte Carlo, because it avoids the use of Math.exp.
Can we improve on the speed of the JDK exp? At first it would seem that the JDK just calls the processor's exp through an intrinsic, which should be difficult to beat. But what if one is OK with a bit less accuracy? Could a simple Chebyshev polynomial expansion be faster?
Out of curiosity, I tried a Chebyshev polynomial expansion with 10 coefficients stored in a final double array. I computed the coefficients using a precise quadrature (Newton-Cotes) and ended up with absolute and relative accuracies of around 1E-9 and 1E-10 on [-1,1].
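For reference, here is a minimal sketch of such an approximation; it computes the coefficients with Chebyshev-Gauss nodes (instead of the Newton-Cotes quadrature I used) and evaluates the truncated series with Clenshaw's recurrence. The class name is illustrative:

```java
// Sketch: Chebyshev approximation of exp on [-1,1], evaluated with Clenshaw's recurrence.
// With N = 10 the truncation error is on the order of 1e-9 on [-1,1].
public final class ChebyshevExp {
    private static final int N = 10;
    private static final double[] COEFFS = computeCoefficients(N);

    // c_k = (2/n) * sum_j exp(cos(theta_j)) * cos(k*theta_j), theta_j = pi*(j+0.5)/n
    private static double[] computeCoefficients(int n) {
        double[] c = new double[n];
        for (int k = 0; k < n; k++) {
            double sum = 0;
            for (int j = 0; j < n; j++) {
                double theta = Math.PI * (j + 0.5) / n;
                sum += Math.exp(Math.cos(theta)) * Math.cos(k * theta);
            }
            c[k] = 2.0 / n * sum;
        }
        return c;
    }

    // Clenshaw recurrence for c_0/2 + sum_{k>=1} c_k T_k(x); valid only for x in [-1,1]
    public static double exp(double x) {
        double b1 = 0, b2 = 0;
        for (int k = N - 1; k >= 1; k--) {
            double b0 = 2 * x * b1 - b2 + COEFFS[k];
            b2 = b1;
            b1 = b0;
        }
        return x * b1 - b2 + 0.5 * COEFFS[0];
    }
}
```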
Here are the results for a simple sum of exp over 10M random numbers:
0.75s for Math.exp sum=1.7182816693332244E7
0.48s for ChebyshevExp sum=1.718281669341388E7
0.40s for FastMath.exp sum=1.7182816693332244E7
So while this simple implementation is actually faster than Math.exp (but only works within [-1,1]), FastMath from Apache commons math, which relies on a table lookup algorithm, is faster still (in addition to being more precise and not limited to [-1,1]).
Of course, if I use only 5 coefficients, the speed is better, but the relative error becomes around 1e-4, which is unlikely to be satisfying for a finance application.
0.78s for Math.exp sum=1.7182816693332244E7
0.27s for ChebyshevExp sum=1.718193001875838E7
0.40s for FastMath.exp sum=1.7182816693332244E7
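For context, the timings above come from a loop of the following shape; this is my own reconstruction of the harness, not the original code, and a more rigorous benchmark would use proper warm-up (e.g. JMH):

```java
import java.util.Random;
import org.apache.commons.math3.util.FastMath;

// Rough benchmark harness: sum exp over 10M uniform numbers and time each implementation.
public class ExpBenchmark {
    public static void main(String[] args) {
        int n = 10_000_000;
        double[] x = new double[n];
        Random rng = new Random(42);
        for (int i = 0; i < n; i++) x[i] = rng.nextDouble();

        long t0 = System.nanoTime();
        double sumJdk = 0;
        for (int i = 0; i < n; i++) sumJdk += Math.exp(x[i]);
        System.out.printf("%.2fs for Math.exp sum=%s%n", (System.nanoTime() - t0) * 1e-9, sumJdk);

        t0 = System.nanoTime();
        double sumFast = 0;
        for (int i = 0; i < n; i++) sumFast += FastMath.exp(x[i]);
        System.out.printf("%.2fs for FastMath.exp sum=%s%n", (System.nanoTime() - t0) * 1e-9, sumFast);
    }
}
```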
Scaling: scaling the call price appropriately allows one to increase the maximum attainable precision significantly, because the Carr-Madan formula operates on log(Forward) and log(Strike) directly, not on their ratio, and alpha is multiplied by log(Forward) (the formula is recalled below, after the Put item). I simply scale by the spot: the call price is S_0*max(S/S_0 - K/S_0, 0). Here are the results for Lord-Kahl, Kahl-Jaeckel (the more usual way, limited to machine epsilon accuracy) and the Forde-Jacquier-Lee ATM implied volatility, without scaling, for a maturity of 1 day:
Strike  | Lord-Kahl               | Kahl-Jaeckel            | Forde-Jacquier-Lee
62.5    | 2.919316809400033E-34   | 8.405720564041985E-12   | 0.0
68.75   | -8.923683388191852E-28  | 1.000266536266281E-11   | 0.0
75.0    | -3.2319611910032E-22    | 2.454925152051146E-12   | 0.0
81.25   | 1.9401743410877718E-16  | 2.104982854689297E-12   | 0.0
87.5    | -Infinity               | -1.6480150577535824E-11 | 0.0
93.75   | Infinity                | 1.8277663826893331E-9   | 1.948392142070432E-9
100.0   | 0.4174318393886519      | 0.41743183938679845     | 0.4174314959743768
106.25  | 1.326968012594355E-11   | 7.575717830832218E-11   | 1.1186618909114702E-11
112.5   | -5.205783145942609E-21  | 2.5307755890935368E-11  | 6.719872683111381E-45
118.75  | 4.537094156599318E-25   | 1.8911094912255066E-11  | 3.615356241778357E-114
125.0   | 1.006555799739525E-27   | 3.2365221613872563E-12  | 2.3126009701775733E-240
131.25  | 4.4339539263484925E-31  | 2.4794388764348696E-11  | 0.0
One can see negative prices and meaningless prices away from the money. With scaling this changes to:
Strike  | Lord-Kahl               | Kahl-Jaeckel            | Forde-Jacquier-Lee
62.5    | 2.6668642552659466E-182 | 8.405720564041985E-12   | 0.0
68.75   | 7.156278101597845E-132  | 1.000266536266281E-11   | 0.0
81.25   | 7.863105641534119E-55   | 2.104982854689297E-12   | 0.0
87.5    | 7.073641308465115E-28   | -1.6480150577535824E-11 | 0.0
93.75   | 1.8375145950924849E-9   | 1.8277663826893331E-9   | 1.948392142070432E-9
100.0   | 0.41743183938755385     | 0.41743183938679845     | 0.4174314959743768
106.25  | 1.3269785342953315E-11  | 7.575717830832218E-11   | 1.1186618909114702E-11
112.5   | 8.803247187972696E-42   | 2.5307755890935368E-11  | 6.719872683111381E-45
118.75  | 5.594342441346233E-90   | 1.8911094912255066E-11  | 3.615356241778357E-114
125.0   | 7.6539757567179276E-149 | 3.2365221613872563E-12  | 2.3126009701775733E-240
131.25  | 0.0                     | 2.4794388764348696E-11  | 0.0
One can now see that the Jacquier-Lee approximation quickly becomes not very good.
Put: the put option price can be computed using the exact same Carr-Madan formula, but with a negative alpha instead of a positive one. When I derived this result (by just reproducing the Carr-Madan steps with the put payoff instead of the call payoff), I was surprised, but it works.
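For reference, the damped integral being reused here is the standard Carr-Madan representation (my transcription, with \(k=\ln K\) and \(\phi_T\) the characteristic function of \(\ln S_T\)); it also makes explicit why the spot scaling above helps, since \(\alpha\) multiplies \(k\) in the damping factor:

$$ C(k) = \frac{e^{-\alpha k}}{\pi}\int_0^{\infty} \Re\left[e^{-ivk}\,\frac{e^{-rT}\phi_T\left(v-(\alpha+1)i\right)}{\alpha^2+\alpha-v^2+i(2\alpha+1)v}\right]dv $$

Following Lee's classification of the damping parameter, \(\alpha>0\) yields the call, while a sufficiently negative \(\alpha\) (below \(-1\)) yields the put from the same integrand, consistent with the derivation mentioned above.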
In my previous post, I explored the Lord-Kahl method to compute call option prices under the Heston model. One of the advantages of this method is to go beyond machine epsilon accuracy and to be able to compute very far out-of-the-money prices or very short maturities. The standard methods to compute the Heston price are based on a sum/difference of terms that are both far from 0, and will therefore have their accuracy limited by machine epsilon even if the integration itself is very precise.
However, the big trick in it is to find the optimal alpha to use in the integration. A suboptimal alpha will often lead to high inaccuracy because of strong oscillations that appear in the integrand. So the method is only as robust as the root finding for the optimal alpha.
The original paper looks at the Riccati equation for B, where B is the following term in the characteristic function:
$$\phi(u) = e^{iuf+A(u,t)+B(u,t)\sigma_0}$$
Its solution defines the \(\alpha_{max}\) where the characteristic function explodes. The Riccati equation is complex-valued but not complicated:
$$ dB/dt = \hat{\alpha}(u)-\beta(u) B+\gamma B^2 $$
I initially did not understand its role (computing \(\alpha_{max}\) so that, later, one can compute \(\alpha_{optimal}\) with a good bracketing). The bracketing is particularly important in order to use a decent solver, like the Brent solver; otherwise one is left with, mostly, Newton's method. It turns out that I explored a reduced function, which is quite a bit simpler than the Riccati equation and seems to work in all the cases I have found/tried: solve $$1/B = 0$$ If B explodes, \(\phi\) will explode. The trick, like when solving the Riccati equation, is to have either a good starting point (for Newton) or, better, a bracketing. It turns out that Lord and Kahl give a bracketing for \(1/B\), even if they don't present it like this: their \(\tau_{D+}\) on page 10 for the lower bracket, and \(\tau_+\) for the upper bracket. \(\tau_+\) will make \(1/B\) explode, exactly. One could also find the next periods by adding \(4\pi/t\) instead of \(2\pi/t\) like they do to move from \(\tau_{D+}\) to \(\tau_+\), but this is of little interest as we don't want to go past the first explosion.
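As an illustration of the wiring only (not of my actual implementation), the bracketed root search can be handed to the Brent solver of Apache commons math; here oneOverB stands for the \(1/B\) function above, and the bracket arguments for \(\tau_{D+}\) and \(\tau_+\):

```java
import org.apache.commons.math3.analysis.UnivariateFunction;
import org.apache.commons.math3.analysis.solvers.BrentSolver;

// Illustrative wiring: solve 1/B = 0 on a given bracket with the commons math Brent solver.
// The function and the bracket values are supplied from elsewhere.
public final class ExplosionFinder {
    static double findExplosion(UnivariateFunction oneOverB, double lowerBracket, double upperBracket) {
        BrentSolver solver = new BrentSolver(1e-12); // absolute accuracy
        return solver.solve(1000, oneOverB, lowerBracket, upperBracket);
    }
}
```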
It’s quite interesting to see that my simple approach is actually closely related to the more involved Riccati approach: the starting point could be the same, though it is much more robust to just use the Brent solver on the bracketed function. I actually believe that the Riccati equation explodes at the same points, except maybe for some rare combinations of Heston parameters.
From a coding perspective, I found Apache commons math to be a decent library for complex calculus or for solving/minimizing functions. The complex part was better than our in-house implementation: for example, the square root was more precise in commons math, and the solvers are robust. It even made me think that it is often a mistake to reinvent the wheel. It's good to choose the best possible implementations/algorithms, but reinventing a Brent solver??? a linear interpolator??? The commons math library also imposes a good structure, whereas in-house code tends to be messy (no real interfaces, or many different ones). I believe the right approach is to use and embrace/extend Apache commons math; if some algorithms are badly coded or do not perform well, then write your own using the same kind of interfaces as commons math (or some other good maths library). The next part of this series on the Lord-Kahl method is here.