Which erfcx?

Sun, May 24, 2026

The use of erfcx instead of direct erfc or CDF in a Black-Scholes implied volatility solver leads to gain in accuracy and performance in general. But which erfcx should we use?

This note compares practical erfcx implementations for Rust implied volatility solvers:

Commons: the local Rust port of Apache Commons Numbers BoostErf.erfcx
Cody: the Cody rational approximation exposed by the jaeckel = 0.2.0 crate during this comparison, originally coming from Netlib SLATEC library.
Johnson: Steven G. Johnson’s Faddeeva implementation, as exposed by errorfunctions = 0.2.0, faddeeva-sys = 0.1.0, and Julia SpecialFunctions.erfcx(::Float64)

The Commons-style implementation is a local Rust port of the Apache Commons Numbers/BoostErf approach. It computes the small central region through the local erf rational approximation and uses exp_m1(x*x) to avoid cancellation in exp(x^2) * erfc(x) near zero. For positive inputs it uses several rational subintervals: a central polynomial/rational erf branch, erfc-style rational branches around 0.5..1.5, 1.5..2.5, and 2.5..4, and a rational function taken from Rational Chebyshev approximations for the error function by W. J. Cody, Math. Comp., 1969, PP. 631-638 beyond that. For very large positive x, it returns 1 / (sqrt(pi) * x). For negative x, it uses the reflected form with a split-square exponential, returns Inf below the finite overflow cutoff, and otherwise computes 2 exp(x^2) - erfcx(|x|) in a way that preserves more bits near the overflow boundary.

The Cody implementation used here is the jaeckel crate’s Cody/CALERF-style implementation. It has three main positive-side rational regions: AB for |x| <= 0.46875, CD for 0.46875 < |x| <= 4, and PQ as an asymptotic rational in 1/x^2 for |x| > 4. It uses smoothed exponentials by splitting x onto a 1/16 grid before applying the residual exponential. For negative x, it reflects with erfcx(-x) = 2 exp(x^2) - erfcx(x) and clamps below its XNEG threshold to f64::MAX.

The rare Cody spike in the finite negative overflow-boundary window is not caused by using the wrong rational interval but is deliberate. Cody/CALERF does use the PQ asymptotic rational for the positive |x| > 4 part, but the negative-side fix-up is still the reflected 2 exp(x^2) - erfcx(|x|) form with an XNEG overflow guard. The published IEEE double CALERF constant is the rounded XNEG = -26.628D0; the public Rust crate uses a refined XNEG = -26.6287357137514 with a source comment noting that the original value was -26.628. The plot spike is therefore boundary/guard behavior inherited from the CALERF-style negative branch, with a refined but still conservative cutoff near the true finite Float64 overflow boundary. A literal Cody reference would clamp even earlier rather than use a separate high-accuracy negative-tail expansion in that range.

errorfunctions = 0.2.0 is a close Rust translation of Steven G. Johnson’s Faddeeva C/C++ package. Its real erfcx implementation uses:

a 100-interval Chebyshev table after the map y = 4 / (4 + x) for 0 <= x <= 50
a continued-fraction expansion for x > 50
the reflection identity erfcx(-x) = 2 exp(x^2) - erfcx(x) for negative x

Julia SpecialFunctions.erfcx(::Float64) calls Faddeeva_erfcx_re from libopenspecfun, and faddeeva-sys binds the same Johnson C symbol. On the sampled Float64 grid below, errorfunctions, faddeeva-sys, and Julia SpecialFunctions were bit-identical.

Generic Erfcx Accuracy

Reference: Julia BigFloat erfcx(x) at 256-bit precision, rounded back to Float64. The grid used 7021 total samples, of which 6272 had finite rounded Float64 references.

backend	samples	max_ulp	max_rel	max_abs	worst_x
Commons	6272	1	1.2e-16	4.990e291	-2.6600845000e1
Cody	6272	337	3.7e-14	6.726e294	-2.6628735714e1
Johnson	6272	491	5.5e-14	1.614e212	-2.2769845000e1

The large absolute errors occur where erfcx(x) itself is enormous. The relative errors are still small, but the ULP differences matter because the IV tests include very sensitive tiny and saturated cases.

512-ULP Local Window Accuracy

The coarse grid above is useful for broad coverage, but it does not answer the more local question: what happens across consecutive representable inputs near the sensitive regions? I ran a temporary Rust verifier that generated 512 consecutive Float64 inputs around 37 representative centers, including the old coarse-grid worst tail locations and the finite overflow boundary. References are Julia BigFloat erfcx(x) at 256-bit precision rounded back to Float64. ULP error is the positive-Float64 bit-distance from that rounded reference.

The windows covered 18,944 input values. The finite-reference tables below use 18,176 values; 768 inputs from the overflow-boundary windows round to Inf and are excluded from the finite-error percentiles.

Global local-window stats:

backend	samples	exact_bits	p50	p95	p99	max	max_rel	worst_x
Commons	18176	13682	0	1	1	2	3.6e-16	2.5000000000e0
Cody	18176	10375	0	2	3	41238	4.6e-12	-2.6628735714e1
Johnson	18176	6490	1	395	509	511	5.7e-14	-2.6628700000e1

The p95/p99 picture is clearer than the max alone. Commons-style stays within 1 ulp at p99 and within 2 ulps at max. Cody is usually close, but has a rare near-overflow-tail spike. Johnson is the opposite shape: no huge single outlier here, but the negative tail is broadly hundreds of ulps away.

In the mid negative tail around x = -24, Johnson is consistently about 510 ulps away while Cody and Default/Commons stay close to the rounded BigFloat reference.

region	backend	samples	p95	p99	max
negative near-overflow tail	Commons	256	1	1	1
negative near-overflow tail	Cody	256	19083	36126	41238
negative near-overflow tail	Johnson	256	487	506	511
negative tail	Commons	512	1	1	1
negative tail	Cody	512	4	5	5
negative tail	Johnson	512	510	510	510

Zone definitions:

neg-near-overflow x < -26
neg-tail -26 <= x < -6.1
neg-transition -6.1 <= x < -0.5
central -0.5 <= x <= 0.5
pos-core 0.5 < x < 4
pos-tail 4 <= x <= 50
pos-far-tail x > 50

Per-zone local-window stats:

zone	backend	samples	exact_bits	p50	p95	p99	max	max_rel	worst_x
neg-near-overflow	Commons	3584	2682	0	1	1	1	1.9e-16	-2.6e1
neg-near-overflow	Cody	3584	1611	1	2	3	41238	4.6e-12	-2.7e1
neg-near-overflow	Johnson	3584	34	218	460	496	511	5.7e-14	-2.7e1
neg-tail	Commons	3584	2885	0	1	1	1	2.1e-16	-6.1e0
neg-tail	Cody	3584	2069	0	2	4	5	6.3e-16	-2.4e1
neg-tail	Johnson	3584	872	26	509	510	510	5.7e-14	-2.4e1
neg-transition	Commons	2048	1446	0	1	1	2	2.3e-16	-5.0e-1
neg-transition	Cody	2048	1155	0	1	2	3	3.4e-16	-5.0e-1
neg-transition	Johnson	2048	1291	0	15	24	26	3.6e-15	-6.1e0
central	Commons	2049	1649	0	1	1	2	3.6e-16	5.0e-1
central	Cody	2049	1313	0	1	2	3	3.6e-16	-5.0e-1
central	Johnson	2049	902	1	3	4	5	7.2e-16	2.5e-1
pos-core	Commons	2559	1772	0	1	1	2	3.6e-16	2.5e0
pos-core	Cody	2559	982	1	3	4	5	6.6e-16	2.5e0
pos-core	Johnson	2559	793	1	2	3	4	7.2e-16	5.0e-1
pos-tail	Commons	2049	1294	0	1	1	1	2.0e-16	5.0e1
pos-tail	Cody	2049	1291	0	1	1	1	2.0e-16	5.0e1
pos-tail	Johnson	2049	1047	0	1	2	2	4.1e-16	2.0e1
pos-far-tail	Commons	2303	1954	0	1	1	1	1.9e-16	1.0e8
pos-far-tail	Cody	2303	1954	0	1	1	1	1.9e-16	1.0e8
pos-far-tail	Johnson	2303	1551	0	1	1	2	3.1e-16	1.0e2

Local-window timing on the same 18,944 samples, best of five passes with 1000 sweeps per pass:

backend	ns/call
Commons	9.86
Cody	10.04
Johnson	6.28

So yes: the tail is the right place to look. The worst Johnson behavior is the negative tail and near-overflow tail. Cody has a rare sharper spike right at the finite overflow boundary. Commons is the most stable over these consecutive-ULP windows.

Generic Erfcx Timing

Mixed grid, 24003 samples, best of five timing passes, 100 sweeps per pass.

backend	ns/call
Commons	6.29
Cody	6.55
Johnson/errorfunctions	6.05
Johnson/faddeeva-sys	4.92
Julia SpecialFunctions	4.62

The Julia number is measured inside one Julia process launched by the Rust diagnostic program; it is a Julia-loop function-call timing, not per-call Rust-to-Julia FFI overhead.

Impact on ThiopheneIV Implied Volatility Solver

Values are max error in ULPs of reference total volatility on a specific set of options (see the ThiopheneIV paper for the definition of the exact zones). The rust implementation of the solver is available here

ThiopheneIV max ulp

backend	CLY-3D	CLY-20	CLY-80	Jaeckel	Market	Corners	Stress	HighVol
Cody	190	52	8	142	520	557	285	2
Commons	133	49	8	63	177	329	138	2
Johnson	400	71	16	97	1210	537	484	3

ThiopheneIV+ max ulp

backend	CLY-3D	CLY-20	CLY-80	Jaeckel	Market	Corners	Stress	HighVol
Cody	23	5	4	14	30	41	33	2
Commons	23	5	4	13	29	41	33	2
Johnson	22	5	7	13	29	41	32	3

The polished ThiopheneIV+ path is robust across erfcx choices. The unpolished fast solver is where Commons is clearly better for benchmark.

Impact on ThiopheneIV Performance

ThiopheneIV ns/call

backend	CLY-3D	CLY-20	CLY-80	Jaeckel	Market	Corners	Stress	HighVol
Cody	146	146	148	146	147	155	155	150
Commons	155	147	147	156	153	159	157	151
Johnson	179	180	188	211	203	185	204	161

ThiopheneIV+ ns/call

backend	CLY-3D	CLY-20	CLY-80	Jaeckel	Market	Corners	Stress	HighVol
Cody	199	200	199	185	193	210	205	187
Commons	202	203	197	197	200	214	210	189
Johnson	233	229	245	249	251	240	254	210

Cody is slightly faster than Commons for ThiopheneIV, but Commons is close and much more accurate. Johnson is slower in the solver context despite competitive standalone erfcx timing, most likely because its branch/table structure interacts differently with the full pricing paths.

Conclusion

Use the Commons erfcx implementation. It gives the best generic erfcx accuracy against the BigFloat reference among the tested Rust-callable implementations, the most stable 512-consecutive-ULP local-window profile, the best unpolished ThiopheneIV accuracy, and passes the strict reference/corner suite. Cody remains a good fast approximation, but it is of slighly lower accuracy (which impacts non-polished ThiopheneIV implied vols) and has a near-overflow-tail spike (very likely without impact on my use case).