The Role of Erfcx and Erfc in the Black-Scholes Formula

Sat, May 30, 2026

This note compares special-function choices at the level that matters to the implied volatility solvers: the normalized Black beta price, not standalone erfcx(x).

For x = log(F/K) <= 0 and total volatility s = sigma sqrt(T), the beta-space OTM call price is

B(x, s) = 0.5 * (exp(x/2) * erfc(q1) - exp(-x/2) * erfc(q2))
q1 = -(x/s + s/2) / sqrt(2)
q2 = -(x/s - s/2) / sqrt(2)

The corresponding forward-normalized OTM-call price is c = B(x, s) / sqrt(ex), with ex = exp(x).

Implementations Compared

Default/current: Jaeckel Region I/II expansions and falls back to the Apache Commons mixed erfc/erfcx formula in the middle region.
Jaeckel acc libm+Commons: the same Jaeckel region-dispatched, included into the temporary harness, using system erf/erfc and the local Apache Commons erfcx.
Jaeckel acc Cody+Cody: the same Jaeckel region-dispatched formula, using Cody erf, erfc, and erfcx from jaeckel = 0.2.0.
Jaeckel acc Johnson+Johnson: the same Jaeckel region-dispatched formula, using Johnson/Faddeeva erf, erfc, and erfcx through errorfunctions = 0.2.0.
Jaeckel acc libm+Cody and Jaeckel acc libm+Johnson: the same formula with system erf/erfc fixed and only erfcx swapped, to isolate the scaled-erfc backend effect.
Jaeckel I/II + pure Commons: Jaeckel Region I/II expansions are kept, but the Region III/middle fallback is replaced by the always-erfcx identity with the local Apache Commons erfcx.
IG statmod upper: the inverse-Gaussian upper-tail representation of the Black beta price, using a scalar Rust port of the CRAN statmod 1.5.2 .pinvgauss mean-one upper-tail log-probability core, with the final log(1-exp(r)) evaluated by a Rust log1mexp helper near cancellation. This is not a full statmod port: it omits R vectorization, NA/Inf handling, finite/infinite-mean special cases, and R’s exact pnorm implementation. The public-wrapper small-CV gamma approximation is also omitted because, under the Black mapping here, it would require dispersion=-2/x < 1e-14, or x < -2e14, far outside the finite Float64 x=log(ex) domain. No purpose-built Rust inverse-Gaussian crate was found in cargo search inverse-gaussian / cargo search invgauss; statrs 0.18.0 source also did not contain an inverse-Gaussian/Wald distribution implementation.
Commons erfcx-mixed: the Cody/Jaeckel threshold formula with the local Apache Commons Numbers erfcx port and system erfc.
Cody erfcx-mixed: the same threshold formula using jaeckel = 0.2.0 erfcx_cody.
Johnson erfcx-mixed: the same threshold formula using errorfunctions = 0.2.0 Johnson/Faddeeva erfcx.
libm erfc direct: the direct formula above, using system C erfc through FFI. Rust std does not provide erfc or erfcx.
Commons/Cody/Johnson pure-erfcx: the always-erfcx identity 0.5 * exp(-0.5*(h^2+t^2)) * (erfcx(q1) - erfcx(q2)), included to show why the mixed formula avoids some high-volatility tail problems.

Mixed Erfc/Erfcx Fallback Logic

The mixed formula starts from the direct beta-space Black expression:

2B = exp(x/2) * erfc(q1) - exp(-x/2) * erfc(q2)

with

h = x / s
t = s / 2
q1 = -(h + t) / sqrt(2)
q2 = -(h - t) / sqrt(2)

For either term, the stable tail identity is:

exp( x/2) * erfc(q1) = exp(-0.5 * (h^2 + t^2)) * erfcx(q1)
exp(-x/2) * erfc(q2) = exp(-0.5 * (h^2 + t^2)) * erfcx(q2)

because q1^2 = 0.5 * (h + t)^2, q2^2 = 0.5 * (h - t)^2, and x = h*s = 2*h*t. The mixed implementation uses that identity term by term instead of forcing both terms through one representation.

The branch threshold is Cody’s 0.46875, the same split used by the classic error-function approximations. Arguments below it are kept in the ordinary erfc representation; arguments above it use scaled erfcx so that the small tail probability is not represented as a nearly-underflowed erfc multiplied by a compensating exponential.

if q1 < 0.46875 and q2 < 0.46875:
    2B = exp(x/2) * erfc(q1)
       - exp(-x/2) * erfc(q2)

if q1 < 0.46875 and q2 >= 0.46875:
    2B = exp(x/2) * erfc(q1)
       - exp(-0.5 * (h^2 + t^2)) * erfcx(q2)

if q1 >= 0.46875 and q2 < 0.46875:
    2B = exp(-0.5 * (h^2 + t^2)) * erfcx(q1)
       - exp(-x/2) * erfc(q2)

if q1 >= 0.46875 and q2 >= 0.46875:
    2B = exp(-0.5 * (h^2 + t^2)) * (erfcx(q1) - erfcx(q2))

On the valid Black domain, the third formal case is actually unreachable. Since t = s/2 >= 0,

q2 - q1 = sqrt(2) * t >= 0,

so q1 >= rho implies q2 >= rho as well. The erfcx, erfc branch would require q1 >= rho and q2 < rho at the same time; that would need negative t, not merely positive h = x/s. It is kept in the branch table only as the symmetric algebraic case.

So “mixed” does not mean that erfc is a fallback after erfcx fails. It means each of the two Black terms is routed to the representation that is locally better conditioned. Negative or central arguments stay with erfc, where erfc(q) is order one and cheap. Positive tail arguments switch to erfcx, where the scaled value remains order 1/q and avoids underflow.

This explains two patterns in the tables below. In the low-volatility OTM tail, direct erfc loses many ulps because it subtracts two tiny tail probabilities after ordinary scaling; the mixed erfcx versions are much better. In the high-volatility upper-price window, the pure-erfcx identity is worse because it exposes the cancellation in erfcx(q1) - erfcx(q2) even when ordinary erfc would have kept one side well behaved.

Reference And Local Windows

Reference values are Julia BigFloat at 512-bit precision, rounded back to Float64. ULP error is the positive-Float64 bit distance to that rounded reference. Each window contains 512 consecutive Float64 values in either s or x, with the other coordinate fixed.

id	zone	axis	x	s
central-atm	central	s	-1.000000e-2	2.000000e-1
tiny-near-atm	tiny-near-atm	s	-1.000000e-8	1.000000e-4
erfc-threshold	branch-transition	s	-1.000000e-1	1.364500e-1
moderate-otm	moderate-otm	s	-1.000000e0	5.000000e-1
low-vol-otm	lower-tail	s	-1.000000e-1	2.000000e-2
deep-otm	lower-tail	s	-3.000000e0	7.000000e-1
far-otm	lower-tail	s	-6.000000e0	1.000000e0
high-vol	upper-price	s	-1.000000e-2	1.000000e1
very-high-vol	upper-price	s	-1.000000e0	5.000000e1
near-overflow-vol	upper-price	s	-1.000000e0	7.500000e1
x-central	x-window	x	-1.000000e-2	2.000000e-1
x-lower-tail	x-window	x	-4.000000e0	8.000000e-1

Global Accuracy

backend	samples	exact_bits	p50	p95	p99	max	max_rel	worst_window
Default/current	6144	2301	1	15	24	40	7.4e-15	far-otm
Jaeckel acc libm+Commons	6144	2301	1	15	24	40	7.4e-15	far-otm
Jaeckel acc Cody+Cody	6144	2291	1	14	22	40	7.4e-15	far-otm
Jaeckel acc Johnson+Johnson	6144	2296	1	15	25	42	7.8e-15	far-otm
Jaeckel acc libm+Cody	6144	2291	1	14	22	40	7.4e-15	far-otm
Jaeckel acc libm+Johnson	6144	2296	1	15	25	42	7.8e-15	far-otm
Jaeckel I/II + pure Commons	6144	1277	4	686	1202	1570	2.9e-13	near-overflow-vol
Commons erfcx-mixed	6144	1212	5	2823	3019	3068	5.2e-13	tiny-near-atm
IG statmod upper	6144	1158	7	6372	6568	7666	1.5e-12	low-vol-otm
Cody erfcx-mixed	6144	1180	6	2823	3019	3068	5.2e-13	tiny-near-atm
Johnson erfcx-mixed	6144	1161	6	2823	3019	3068	5.2e-13	tiny-near-atm
libm erfc direct	6144	1116	10	3049	5952	10329	2.0e-12	low-vol-otm
Commons pure-erfcx	6144	255	13	1776	1972	2021	3.4e-13	tiny-near-atm
Cody pure-erfcx	6144	174	15	9968	10164	10213	1.7e-12	tiny-near-atm
Johnson pure-erfcx	6144	257	19	59120	59316	59365	1.0e-11	tiny-near-atm

The headline is different from the raw erfcx comparison. In the normalized Black formula, the largest errors in these local windows are not the raw negative-erfcx overflow tail. They are tiny near-ATM and low-volatility OTM cancellation zones. The default is much better there because it uses Jaeckel Region I/II expansions before falling back to the mixed erfc/erfcx formula. Once that region-dispatched formula is kept, backend choice barely moves the result: all Jaeckel-accurate variants have global p50 = 1 ulp and max = 40 to 42 ulps on this sample set.

Jaeckel Accurate Backend Sensitivity

The Jaeckel accurate path is not just the mixed formula with a different erfcx. It routes first through:

Region I: asymptotic expansion times the normalized vega. This path is independent of erf, erfc, and erfcx.
Region II: small-t expansion times the normalized vega. This path uses erfcx only in the central y_prime branch; deeper left-tail y_prime is rational.
Region III/middle: the mixed Cody/Jaeckel erfc/erfcx formula described above.

That dispatch is why the special-function backend differences collapse compared with the generic mixed or pure-erfcx formulas.

Selected Jaeckel-accurate rows:

backend	global p50	p95	p99	max	tiny max	low-vol max	erfc-threshold max	far-otm max
Default/current	1	15	24	40	1	17	2	40
Jaeckel acc libm+Commons	1	15	24	40	1	17	2	40
Jaeckel acc Cody+Cody	1	14	22	40	1	17	2	40
Jaeckel acc Johnson+Johnson	1	15	25	42	1	17	2	42
Jaeckel acc libm+Cody	1	14	22	40	1	17	2	40
Jaeckel acc libm+Johnson	1	15	25	42	1	17	2	42

The paired Cody rows match the libm+Cody rows in aggregate, and the paired Johnson rows match the libm+Johnson rows in aggregate. In these windows, the small residual backend sensitivity comes from erfcx, not from swapping the ordinary erf/erfc implementation. Cody is fractionally better than Commons by p95/p99 in the far-OTM-dominated aggregate; Johnson is fractionally worse, with a 42-ulp maximum instead of 40. Those differences are tiny next to the thousands of ulps lost by the non-Jaeckel mixed/direct formulas in the tiny near-ATM and low-vol OTM windows.

Jaeckel I/II With Pure-Erfcx Fallback

A tempting simplification is to keep Jaeckel Regions I and II, but replace the Region III/middle fallback with the pure-erfcx identity:

B = 0.5 * exp(-0.5 * (h^2 + t^2)) * (erfcx(q1) - erfcx(q2))

With Apache Commons erfcx, this is not as accurate as the default (Apache Commons erfc and erfcx with all Jaeckel zones) overall:

backend	global p50	p95	p99	max	tiny max	low-vol max	high-vol max	very-high max	near-overflow max
Default/current	1	15	24	40	1	17	1	0	0
Jaeckel acc libm+Commons	1	15	24	40	1	17	1	0	0
Jaeckel I/II + pure Commons	4	686	1202	1570	1	17	60	745	1570

The simplification keeps the important Region II tiny near-ATM behavior and the Region I/II lower-tail behavior, so those windows match the default. The loss comes from middle/fallback cases, especially the high-volatility upper-price windows. There the pure-erfcx identity exposes cancellation in erfcx(q1) - erfcx(q2), while the mixed fallback keeps the locally better erfc representation for negative or central arguments.

It is not meaningfully faster in this diagnostic either. On the same optimized local-window timing run, the pure fallback row was 24.83 ns/call, compared with 24.80 ns/call for the same included Jaeckel formula with the mixed Commons fallback and 25.07 ns/call for the Default/current path. That difference is inside normal microbenchmark noise, and the generic pure-erfcx kernel was slower than the generic mixed Commons kernel (21.32 vs 19.80 ns/call).

Inverse Gaussian Formula Experiment

For x < 0, the Black beta price can be written as an inverse-Gaussian upper tail. In the statmod mean-one parameterization, let

q = -2x / s^2
dispersion = -2 / x
Y ~ IG(mean = 1, dispersion)

Then

B(x, s) = exp(x/2) * P(Y > q).

The statmod upper-tail formula is

P(Y > q) = Phi(-(q-1)/sqrt(dispersion*q))
         - exp(2/dispersion) * Phi(-(q+1)/sqrt(dispersion*q)).

Substituting q=-2x/s^2 and dispersion=-2/x gives the ordinary Black expression in a different parameterization. The temporary Rust harness ports the statmod .pinvgauss core branch for this scalar mean-one upper tail, including its asymptotic right-tail override, and evaluates the normal log tails with the local Apache Commons erfcx support.

One Rust-side numerical improvement was kept: statmod writes the upper-tail combination as a + log1p(-exp(b-a)); the harness uses the equivalent a + log1mexp(b-a), switching to log(-expm1(b-a)) when b-a is close to zero. That slightly reduces cancellation in the tiny near-ATM and low-volatility OTM windows, but it does not change the conclusion.

The one public-wrapper branch that looks relevant at first glance is statmod’s small-CV gamma approximation:

if mean * dispersion < 1e-14:
    pgamma(q, shape=1/(mean*dispersion), scale=(mean*dispersion)*mean)

For this Black identity mean=1 and dispersion=-2/x, so the branch would need x < -2e14. A finite Float64 normalized Black input derived from ex=exp(x)>0 has x >= log(5e-324) ~= -744.44. The gamma approximation therefore cannot trigger on the valid Black domain tested here; implementing it would not change these results.

After installing R, the temporary harness was checked against the actual R code by sourcing the CRAN statmod 1.5.2 R/invgauss.R file. The statmod package was not installed as an R library on this machine, so this is a source-level check of the same public R implementation rather than a namespace-loaded package check. On the 6144 Black-mapped local-window samples, R’s public pinvgauss(..., lower.tail=FALSE, log.p=TRUE) and the literal .pinvgauss core agree exactly in price space, confirming that no wrapper special case is active on this domain.

The remaining Rust-vs-R price differences come from the normal log-tail backend and the final cancellation. Comparing prices directly:

comparison	exact	p50	p95	p99	max	max_rel	worst
Rust log1mexp price vs R pinvgauss	3020	1	648	2406	7348	1.4e-12	low-vol-otm
Rust literal price vs R core literal	4671	0	64	2461	7421	1.4e-12	low-vol-otm
R wrapper vs R core literal price	6144	0	0	0	0	0.0e0	central-atm
R log1mexp core vs R literal price	4115	0	648	648	648	1.1e-13	tiny-near-atm

Against the Julia BigFloat references, literal R/statmod does not improve the Black price accuracy versus the literal Rust port; it lands on the same conclusion as the Rust experiment:

backend/formula	exact	p50	p95	p99	max	max_rel	worst
Rust IG with log1mexp	1158	7	6372	6568	7666	1.5e-12	low-vol-otm
Rust literal statmod	1174	7	7017	7212	7739	1.5e-12	low-vol-otm
R pinvgauss literal statmod	1691	8	7017	7214	7739	1.5e-12	low-vol-otm
R core with log1mexp	1665	8	6373	6570	7666	1.5e-12	low-vol-otm

So R fidelity is good for the relevant scalar statmod formula, and the only measurable Rust-side deviation kept here is the log1mexp substitution. It slightly improves the IG formula relative to literal R/statmod, but the formula remains far less accurate than the Jaeckel Region I/II expansions in the sensitive Black windows.

This representation is not Jaeckel-accurate on the sensitive windows:

backend	global p50	p95	p99	max	tiny max	low-vol max	high-vol max
Default/current	1	15	24	40	1	17	1
Commons erfcx-mixed	5	2823	3019	3068	3068	593	1
IG statmod upper	7	6372	6568	7666	6608	7666	1

The inverse-Gaussian form is fine in the high-volatility upper-price windows, but it is worse than the mixed erfc/erfcx formula in the tiny near-ATM and low-volatility OTM cancellation windows. It is also slower in this scalar Rust port: 63.00 ns/call on the same local-window timing harness, compared with about 25 ns/call for the Jaeckel-dispatched default and about 20 ns/call for the generic Commons mixed formula.

Selected Windows

Selected 512-ULP window stats:

window = tiny-near-atm

backend	p50	p95	p99	max
Default/current	0	1	1	1
Commons erfcx-mixed	2864	3048	3064	3068
IG statmod upper	6404	6588	6604	6608
Cody erfcx-mixed	2864	3048	3064	3068
Johnson erfcx-mixed	2864	3048	3064	3068
libm erfc direct	2864	3048	3064	3068
Johnson pure-erfcx	59161	59345	59361	59365

window = low-vol-otm

backend	p50	p95	p99	max
Default/current	6	14	16	17
Commons erfcx-mixed	146	429	548	593
IG statmod upper	2299	5108	7130	7666
Cody erfcx-mixed	254	683	885	1345
Johnson erfcx-mixed	180	493	649	831
libm erfc direct	3526	6657	7242	10329

window = erfc-threshold

backend	p50	p95	p99	max
Default/current	1	2	2	2
Commons erfcx-mixed	7	18	23	28
IG statmod upper	9	24	28	32
Cody erfcx-mixed	8	22	28	31
Johnson erfcx-mixed	20	43	57	74
libm erfc direct	7	21	27	34

window = high-vol

backend	p50	p95	p99	max
Default/current	1	1	1	1
Commons erfcx-mixed	1	1	1	1
IG statmod upper	1	1	1	1
Cody erfcx-mixed	1	1	1	1
Johnson erfcx-mixed	1	1	1	1
libm erfc direct	1	1	1	1
Johnson pure-erfcx	39	56	56	57

Interpretation by region:

Tiny near-ATM: backend choice is almost irrelevant for the mixed formula; all mixed/direct forms lose about 3000 ulps. The default Region II expansion stays within 1 ulp.
Low-volatility OTM tail: direct erfc is worst because it subtracts two close tail probabilities. The erfcx-mixed formulas are much better, and Commons is best among the three erfcx backends in this window. The default expansion is better again.
Branch transition: this is where backend differences are visible but modest. Commons and Cody are close; Johnson is weaker but still far from the direct-erfc lower-tail failure.
Upper-price high-volatility: the mixed formula and direct-erfc formula are all within 1 ulp in this window. The pure-erfcx identity is worse because it exposes cancellation/rounding in erfcx(q1) - erfcx(q2) instead of routing negative-side arguments through erfc.

Timing

Timing was measured on the same 6144 local-window samples, best of five passes with 2000 sweeps per pass, using cargo run --release.

backend	ns/call
Default/current	25.07
Jaeckel acc libm+Commons	24.80
Jaeckel acc Cody+Cody	26.55
Jaeckel acc Johnson+Johnson	25.39
Jaeckel acc libm+Cody	23.44
Jaeckel acc libm+Johnson	24.28
Jaeckel I/II + pure Commons	24.83
Commons erfcx-mixed	19.80
IG statmod upper	63.00
Cody erfcx-mixed	17.92
Johnson erfcx-mixed	19.27
libm erfc direct	26.89
Commons pure-erfcx	21.32
Cody pure-erfcx	19.20
Johnson pure-erfcx	19.59

These timings are for the formula kernels on this artificial local-window mix, not for the full IV solvers. The temporary Jaeckel backend rows include a small runtime selector used only by the diagnostic harness. The default is slower than the generic mixed formulas because it does more routing and uses the Region I/II expansion machinery, but the accuracy gain in tiny near-ATM and low-volatility OTM zones is large.

Takeaways

A raw erfcx(x) benchmark overstates the relevance of the negative-erfcx tail for the stable Black price. The mixed formula avoids much of that tail by using erfc on negative-side arguments.
The most important Black-formula accuracy zones here are tiny near-ATM and low-volatility OTM, where formula structure dominates backend choice.
For a generic erfcx-mixed Black formula, Apache Commons is the best of the three tested erfcx backends on the low-vol OTM window, Cody is usually close, and Johnson is somewhat weaker near the branch transition.
For the full Jaeckel accurate formula, all tested backend combinations are essentially tied: global p50 = 1 ulp, max = 40 to 42 ulps, and the sensitive tiny near-ATM window stays within 1 ulp for every backend.
In these sampled windows, swapping Cody or Johnson erf/erfc together with erfcx gives the same aggregate result as keeping libm erf/erfc and swapping only erfcx.
Keeping Jaeckel Regions I/II but using a pure Commons-erfcx fallback is simpler, but not accuracy-equivalent to the default: it matches the default in tiny near-ATM and lower-tail windows, then loses up to 1570 ulps in the high-volatility upper-price fallback region.
The pure-erfcx fallback is not a clear speed win either; in this release-mode diagnostic it was effectively tied with the mixed Jaeckel fallback, while the generic pure-erfcx kernel was slower than the generic mixed Commons kernel.
The inverse-Gaussian/statmod representation is a valid Black formula, but not an accuracy replacement for Jaeckel’s expanded evaluator here: even with a Rust log1mexp cancellation tweak, it loses up to 6608 ulps in the tiny near-ATM window and 7666 ulps in the low-volatility OTM window, and the straightforward scalar port is much slower.
The default remains the accuracy-favored choice because the Jaeckel Region I/II expansions reduce errors from thousands of ulps to tens of ulps or better in the sensitive windows.

References

For the Cody Netlib Slatec code Rational Chebyshev approximations for the error function by W. J. Cody, Math. Comp., 1969, PP. 631-638
For the Apache Commons Number code, this issue description gives a good overview of the implementation, which is a mix of Boost rational functions and Cody rational function to obtain the best accuracy.
Steven G. Johnson mentions in his code the following:

(…) I originally used an erfcx routine derived from DERFC in SLATEC, but I have since replaced it with a much faster routine written by me which uses a combination of continued-fraction expansions and a lookup table of Chebyshev polynomials. For speed, I implemented a similar algorithm for Im[w(x)] of real x, since this comes up frequently in the other error functions.

TensorFlow makes use of a Chebyshev approximation from M. Shepherd and J. Laframboise, Chebyshev approximation of (1 + 2 * x) * exp(x^2) * erfc(x) (1981)