The Role of Erfcx and Erfc in the Black-Scholes Formula

This note compares special-function choices at the level that matters to the implied volatility solvers: the normalized Black beta price, not standalone erfcx(x).

For x = log(F/K) <= 0 and total volatility s = sigma sqrt(T), the beta-space OTM call price is

B(x, s) = 0.5 * (exp(x/2) * erfc(q1) - exp(-x/2) * erfc(q2))
q1 = -(x/s + s/2) / sqrt(2)
q2 = -(x/s - s/2) / sqrt(2)

The corresponding forward-normalized OTM-call price is c = B(x, s) / sqrt(ex), with ex = exp(x).

Implementations Compared

Mixed Erfc/Erfcx Fallback Logic

The mixed formula starts from the direct beta-space Black expression:

2B = exp(x/2) * erfc(q1) - exp(-x/2) * erfc(q2)

with

h = x / s
t = s / 2
q1 = -(h + t) / sqrt(2)
q2 = -(h - t) / sqrt(2)

For either term, the stable tail identity is:

exp( x/2) * erfc(q1) = exp(-0.5 * (h^2 + t^2)) * erfcx(q1)
exp(-x/2) * erfc(q2) = exp(-0.5 * (h^2 + t^2)) * erfcx(q2)

because q1^2 = 0.5 * (h + t)^2, q2^2 = 0.5 * (h - t)^2, and x = h*s = 2*h*t. The mixed implementation uses that identity term by term instead of forcing both terms through one representation.

The branch threshold is Cody’s 0.46875, the same split used by the classic error-function approximations. Arguments below it are kept in the ordinary erfc representation; arguments above it use scaled erfcx so that the small tail probability is not represented as a nearly-underflowed erfc multiplied by a compensating exponential.

if q1 < 0.46875 and q2 < 0.46875:
    2B = exp(x/2) * erfc(q1)
       - exp(-x/2) * erfc(q2)

if q1 < 0.46875 and q2 >= 0.46875:
    2B = exp(x/2) * erfc(q1)
       - exp(-0.5 * (h^2 + t^2)) * erfcx(q2)

if q1 >= 0.46875 and q2 < 0.46875:
    2B = exp(-0.5 * (h^2 + t^2)) * erfcx(q1)
       - exp(-x/2) * erfc(q2)

if q1 >= 0.46875 and q2 >= 0.46875:
    2B = exp(-0.5 * (h^2 + t^2)) * (erfcx(q1) - erfcx(q2))

On the valid Black domain, the third formal case is actually unreachable. Since t = s/2 >= 0,

q2 - q1 = sqrt(2) * t >= 0,

so q1 >= rho implies q2 >= rho as well. The erfcx, erfc branch would require q1 >= rho and q2 < rho at the same time; that would need negative t, not merely positive h = x/s. It is kept in the branch table only as the symmetric algebraic case.

So “mixed” does not mean that erfc is a fallback after erfcx fails. It means each of the two Black terms is routed to the representation that is locally better conditioned. Negative or central arguments stay with erfc, where erfc(q) is order one and cheap. Positive tail arguments switch to erfcx, where the scaled value remains order 1/q and avoids underflow.

This explains two patterns in the tables below. In the low-volatility OTM tail, direct erfc loses many ulps because it subtracts two tiny tail probabilities after ordinary scaling; the mixed erfcx versions are much better. In the high-volatility upper-price window, the pure-erfcx identity is worse because it exposes the cancellation in erfcx(q1) - erfcx(q2) even when ordinary erfc would have kept one side well behaved.

Reference And Local Windows

Reference values are Julia BigFloat at 512-bit precision, rounded back to Float64. ULP error is the positive-Float64 bit distance to that rounded reference. Each window contains 512 consecutive Float64 values in either s or x, with the other coordinate fixed.

id zone axis x s
central-atm central s -1.000000e-2 2.000000e-1
tiny-near-atm tiny-near-atm s -1.000000e-8 1.000000e-4
erfc-threshold branch-transition s -1.000000e-1 1.364500e-1
moderate-otm moderate-otm s -1.000000e0 5.000000e-1
low-vol-otm lower-tail s -1.000000e-1 2.000000e-2
deep-otm lower-tail s -3.000000e0 7.000000e-1
far-otm lower-tail s -6.000000e0 1.000000e0
high-vol upper-price s -1.000000e-2 1.000000e1
very-high-vol upper-price s -1.000000e0 5.000000e1
near-overflow-vol upper-price s -1.000000e0 7.500000e1
x-central x-window x -1.000000e-2 2.000000e-1
x-lower-tail x-window x -4.000000e0 8.000000e-1

Global Accuracy

backend samples invalid exact_bits p50 p95 p99 max max_rel worst_window
Default/current 6144 0 2301 1 15 24 40 7.4e-15 far-otm
Jaeckel acc libm+Commons 6144 0 2301 1 15 24 40 7.4e-15 far-otm
Jaeckel acc Cody+Cody 6144 0 2291 1 14 22 40 7.4e-15 far-otm
Jaeckel acc Johnson+Johnson 6144 0 2296 1 15 25 42 7.8e-15 far-otm
Jaeckel acc libm+Cody 6144 0 2291 1 14 22 40 7.4e-15 far-otm
Jaeckel acc libm+Johnson 6144 0 2296 1 15 25 42 7.8e-15 far-otm
Jaeckel I/II + pure Commons 6144 0 1277 4 686 1202 1570 2.9e-13 near-overflow-vol
Commons erfcx-mixed 6144 0 1212 5 2823 3019 3068 5.2e-13 tiny-near-atm
IG statmod upper 6144 0 1158 7 6372 6568 7666 1.5e-12 low-vol-otm
Cody erfcx-mixed 6144 0 1180 6 2823 3019 3068 5.2e-13 tiny-near-atm
Johnson erfcx-mixed 6144 0 1161 6 2823 3019 3068 5.2e-13 tiny-near-atm
libm erfc direct 6144 0 1116 10 3049 5952 10329 2.0e-12 low-vol-otm
Commons pure-erfcx 6144 0 255 13 1776 1972 2021 3.4e-13 tiny-near-atm
Cody pure-erfcx 6144 0 174 15 9968 10164 10213 1.7e-12 tiny-near-atm
Johnson pure-erfcx 6144 0 257 19 59120 59316 59365 1.0e-11 tiny-near-atm

The headline is different from the raw erfcx comparison. In the normalized Black formula, the largest errors in these local windows are not the raw negative-erfcx overflow tail. They are tiny near-ATM and low-volatility OTM cancellation zones. The default is much better there because it uses Jaeckel Region I/II expansions before falling back to the mixed erfc/erfcx formula. Once that region-dispatched formula is kept, backend choice barely moves the result: all Jaeckel-accurate variants have global p50 = 1 ulp and max = 40 to 42 ulps on this sample set.

Jaeckel Accurate Backend Sensitivity

The Jaeckel accurate path is not just the mixed formula with a different erfcx. It routes first through:

That dispatch is why the special-function backend differences collapse compared with the generic mixed or pure-erfcx formulas.

Selected Jaeckel-accurate rows:

backend global p50 p95 p99 max tiny max low-vol max erfc-threshold max far-otm max
Default/current 1 15 24 40 1 17 2 40
Jaeckel acc libm+Commons 1 15 24 40 1 17 2 40
Jaeckel acc Cody+Cody 1 14 22 40 1 17 2 40
Jaeckel acc Johnson+Johnson 1 15 25 42 1 17 2 42
Jaeckel acc libm+Cody 1 14 22 40 1 17 2 40
Jaeckel acc libm+Johnson 1 15 25 42 1 17 2 42

The paired Cody rows match the libm+Cody rows in aggregate, and the paired Johnson rows match the libm+Johnson rows in aggregate. In these windows, the small residual backend sensitivity comes from erfcx, not from swapping the ordinary erf/erfc implementation. Cody is fractionally better than Commons by p95/p99 in the far-OTM-dominated aggregate; Johnson is fractionally worse, with a 42-ulp maximum instead of 40. Those differences are tiny next to the thousands of ulps lost by the non-Jaeckel mixed/direct formulas in the tiny near-ATM and low-vol OTM windows.

Jaeckel I/II With Pure-Erfcx Fallback

A tempting simplification is to keep Jaeckel Regions I and II, but replace the Region III/middle fallback with the pure-erfcx identity:

B = 0.5 * exp(-0.5 * (h^2 + t^2)) * (erfcx(q1) - erfcx(q2))

With Apache Commons erfcx, this is not as accurate as the default (Apache Commons erfc and erfcx with all Jaeckel zones) overall:

backend global p50 p95 p99 max tiny max low-vol max high-vol max very-high max near-overflow max
Default/current 1 15 24 40 1 17 1 0 0
Jaeckel acc libm+Commons 1 15 24 40 1 17 1 0 0
Jaeckel I/II + pure Commons 4 686 1202 1570 1 17 60 745 1570

The simplification keeps the important Region II tiny near-ATM behavior and the Region I/II lower-tail behavior, so those windows match the default. The loss comes from middle/fallback cases, especially the high-volatility upper-price windows. There the pure-erfcx identity exposes cancellation in erfcx(q1) - erfcx(q2), while the mixed fallback keeps the locally better erfc representation for negative or central arguments.

It is not meaningfully faster in this diagnostic either. On the same optimized local-window timing run, the pure fallback row was 24.83 ns/call, compared with 24.80 ns/call for the same included Jaeckel formula with the mixed Commons fallback and 25.07 ns/call for the Default/current path. That difference is inside normal microbenchmark noise, and the generic pure-erfcx kernel was slower than the generic mixed Commons kernel (21.32 vs 19.80 ns/call).

Inverse Gaussian Formula Experiment

For x < 0, the Black beta price can be written as an inverse-Gaussian upper tail. In the statmod mean-one parameterization, let

q = -2x / s^2
dispersion = -2 / x
Y ~ IG(mean = 1, dispersion)

Then

B(x, s) = exp(x/2) * P(Y > q).

The statmod upper-tail formula is

P(Y > q) = Phi(-(q-1)/sqrt(dispersion*q))
         - exp(2/dispersion) * Phi(-(q+1)/sqrt(dispersion*q)).

Substituting q=-2x/s^2 and dispersion=-2/x gives the ordinary Black expression in a different parameterization. The temporary Rust harness ports the statmod .pinvgauss core branch for this scalar mean-one upper tail, including its asymptotic right-tail override, and evaluates the normal log tails with the local Apache Commons erfcx support.

One Rust-side numerical improvement was kept: statmod writes the upper-tail combination as a + log1p(-exp(b-a)); the harness uses the equivalent a + log1mexp(b-a), switching to log(-expm1(b-a)) when b-a is close to zero. That slightly reduces cancellation in the tiny near-ATM and low-volatility OTM windows, but it does not change the conclusion.

The one public-wrapper branch that looks relevant at first glance is statmod’s small-CV gamma approximation:

if mean * dispersion < 1e-14:
    pgamma(q, shape=1/(mean*dispersion), scale=(mean*dispersion)*mean)

For this Black identity mean=1 and dispersion=-2/x, so the branch would need x < -2e14. A finite Float64 normalized Black input derived from ex=exp(x)>0 has x >= log(5e-324) ~= -744.44. The gamma approximation therefore cannot trigger on the valid Black domain tested here; implementing it would not change these results.

After installing R, the temporary harness was checked against the actual R code by sourcing the CRAN statmod 1.5.2 R/invgauss.R file. The statmod package was not installed as an R library on this machine, so this is a source-level check of the same public R implementation rather than a namespace-loaded package check. On the 6144 Black-mapped local-window samples, R’s public pinvgauss(..., lower.tail=FALSE, log.p=TRUE) and the literal .pinvgauss core agree exactly in price space, confirming that no wrapper special case is active on this domain.

The remaining Rust-vs-R price differences come from the normal log-tail backend and the final cancellation. Comparing prices directly:

comparison exact p50 p95 p99 max max_rel worst
Rust log1mexp price vs R pinvgauss 3020 1 648 2406 7348 1.4e-12 low-vol-otm
Rust literal price vs R core literal 4671 0 64 2461 7421 1.4e-12 low-vol-otm
R wrapper vs R core literal price 6144 0 0 0 0 0.0e0 central-atm
R log1mexp core vs R literal price 4115 0 648 648 648 1.1e-13 tiny-near-atm

Against the Julia BigFloat references, literal R/statmod does not improve the Black price accuracy versus the literal Rust port; it lands on the same conclusion as the Rust experiment:

backend/formula exact p50 p95 p99 max max_rel worst
Rust IG with log1mexp 1158 7 6372 6568 7666 1.5e-12 low-vol-otm
Rust literal statmod 1174 7 7017 7212 7739 1.5e-12 low-vol-otm
R pinvgauss literal statmod 1691 8 7017 7214 7739 1.5e-12 low-vol-otm
R core with log1mexp 1665 8 6373 6570 7666 1.5e-12 low-vol-otm

So R fidelity is good for the relevant scalar statmod formula, and the only measurable Rust-side deviation kept here is the log1mexp substitution. It slightly improves the IG formula relative to literal R/statmod, but the formula remains far less accurate than the Jaeckel Region I/II expansions in the sensitive Black windows.

This representation is not Jaeckel-accurate on the sensitive windows:

backend global p50 p95 p99 max tiny max low-vol max high-vol max very-high max near-overflow max
Default/current 1 15 24 40 1 17 1 0 0
Commons erfcx-mixed 5 2823 3019 3068 3068 593 1 0 0
IG statmod upper 7 6372 6568 7666 6608 7666 1 0 0

The inverse-Gaussian form is fine in the high-volatility upper-price windows, but it is worse than the mixed erfc/erfcx formula in the tiny near-ATM and low-volatility OTM cancellation windows. It is also slower in this scalar Rust port: 63.00 ns/call on the same local-window timing harness, compared with about 25 ns/call for the Jaeckel-dispatched default and about 20 ns/call for the generic Commons mixed formula.

Selected Windows

Black formula local window examples.

Selected 512-ULP window stats:

window = tiny-near-atm

backend p50 p95 p99 max
Default/current 0 1 1 1
Commons erfcx-mixed 2864 3048 3064 3068
IG statmod upper 6404 6588 6604 6608
Cody erfcx-mixed 2864 3048 3064 3068
Johnson erfcx-mixed 2864 3048 3064 3068
libm erfc direct 2864 3048 3064 3068
Johnson pure-erfcx 59161 59345 59361 59365

window = low-vol-otm

backend p50 p95 p99 max
Default/current 6 14 16 17
Commons erfcx-mixed 146 429 548 593
IG statmod upper 2299 5108 7130 7666
Cody erfcx-mixed 254 683 885 1345
Johnson erfcx-mixed 180 493 649 831
libm erfc direct 3526 6657 7242 10329

window = erfc-threshold

backend p50 p95 p99 max
Default/current 1 2 2 2
Commons erfcx-mixed 7 18 23 28
IG statmod upper 9 24 28 32
Cody erfcx-mixed 8 22 28 31
Johnson erfcx-mixed 20 43 57 74
libm erfc direct 7 21 27 34

window = high-vol

backend p50 p95 p99 max
Default/current 1 1 1 1
Commons erfcx-mixed 1 1 1 1
IG statmod upper 1 1 1 1
Cody erfcx-mixed 1 1 1 1
Johnson erfcx-mixed 1 1 1 1
libm erfc direct 1 1 1 1
Johnson pure-erfcx 39 56 56 57

Interpretation by region:

Timing

Timing was measured on the same 6144 local-window samples, best of five passes with 2000 sweeps per pass, using cargo run --release.

backend ns/call
Default/current 25.07
Jaeckel acc libm+Commons 24.80
Jaeckel acc Cody+Cody 26.55
Jaeckel acc Johnson+Johnson 25.39
Jaeckel acc libm+Cody 23.44
Jaeckel acc libm+Johnson 24.28
Jaeckel I/II + pure Commons 24.83
Commons erfcx-mixed 19.80
IG statmod upper 63.00
Cody erfcx-mixed 17.92
Johnson erfcx-mixed 19.27
libm erfc direct 26.89
Commons pure-erfcx 21.32
Cody pure-erfcx 19.20
Johnson pure-erfcx 19.59

These timings are for the formula kernels on this artificial local-window mix, not for the full IV solvers. The temporary Jaeckel backend rows include a small runtime selector used only by the diagnostic harness. The default is slower than the generic mixed formulas because it does more routing and uses the Region I/II expansion machinery, but the accuracy gain in tiny near-ATM and low-volatility OTM zones is large.

Takeaways

References

(…) I originally used an erfcx routine derived from DERFC in SLATEC, but I have since replaced it with a much faster routine written by me which uses a combination of continued-fraction expansions and a lookup table of Chebyshev polynomials. For speed, I implemented a similar algorithm for Im[w(x)] of real x, since this comes up frequently in the other error functions.

Comments

comments powered by Disqus