Which erfcx?

The use of erfcx instead of direct erfc or CDF in a Black-Scholes implied volatility solver leads to gain in accuracy and performance in general. But which erfcx should we use?

This note compares practical erfcx implementations for Rust implied volatility solvers:

The Commons-style implementation is a local Rust port of the Apache Commons Numbers/BoostErf approach. It computes the small central region through the local erf rational approximation and uses exp_m1(x*x) to avoid cancellation in exp(x^2) * erfc(x) near zero. For positive inputs it uses several rational subintervals: a central polynomial/rational erf branch, erfc-style rational branches around 0.5..1.5, 1.5..2.5, and 2.5..4, and an asymptotic rational in 1/x^2 beyond that. For very large positive x, it returns 1 / (sqrt(pi) * x). For negative x, it uses the reflected form with a split-square exponential, returns Inf below the finite overflow cutoff, and otherwise computes 2 exp(x^2) - erfcx(|x|) in a way that preserves more bits near the overflow boundary.

The Cody implementation used here is the jaeckel crate’s Cody/CALERF-style implementation. It has three main positive-side rational regions: AB for |x| <= 0.46875, CD for 0.46875 < |x| <= 4, and PQ as an asymptotic rational in 1/x^2 for |x| > 4. It uses smoothed exponentials by splitting x onto a 1/16 grid before applying the residual exponential. For negative x, it reflects with erfcx(-x) = 2 exp(x^2) - erfcx(x) and clamps below its XNEG threshold to f64::MAX.

Cody near-overflow spike.

The rare Cody spike in the finite negative overflow-boundary window is not caused by using the wrong rational interval but is deliberate. Cody/CALERF does use the PQ asymptotic rational for the positive |x| > 4 part, but the negative-side fix-up is still the reflected 2 exp(x^2) - erfcx(|x|) form with an XNEG overflow guard. The published IEEE double CALERF constant is the rounded XNEG = -26.628D0; the public Rust crate uses a refined XNEG = -26.6287357137514 with a source comment noting that the original value was -26.628. The plot spike is therefore boundary/guard behavior inherited from the CALERF-style negative branch, with a refined but still conservative cutoff near the true finite Float64 overflow boundary. A literal Cody reference would clamp even earlier rather than use a separate high-accuracy negative-tail expansion in that range.

errorfunctions = 0.2.0 is a close Rust translation of Steven G. Johnson’s Faddeeva C/C++ package. Its real erfcx implementation uses:

Julia SpecialFunctions.erfcx(::Float64) calls Faddeeva_erfcx_re from libopenspecfun, and faddeeva-sys binds the same Johnson C symbol. On the sampled Float64 grid below, errorfunctions, faddeeva-sys, and Julia SpecialFunctions were bit-identical.

Generic Erfcx Accuracy

Reference: Julia BigFloat erfcx(x) at 256-bit precision, rounded back to Float64. The grid used 7021 total samples, of which 6272 had finite rounded Float64 references.

backend samples invalid max_ulp max_rel max_abs worst_x
Commons 6272 0 1 1.2e-16 4.990e291 -2.6600845000e1
Cody 6272 0 337 3.7e-14 6.726e294 -2.6628735714e1
Johnson 6272 0 491 5.5e-14 1.614e212 -2.2769845000e1

The large absolute errors occur where erfcx(x) itself is enormous. The relative errors are still small, but the ULP differences matter because the IV tests include very sensitive tiny and saturated cases.

512-ULP Local Window Accuracy

The coarse grid above is useful for broad coverage, but it does not answer the more local question: what happens across consecutive representable inputs near the sensitive regions? I ran a temporary Rust verifier that generated 512 consecutive Float64 inputs around 37 representative centers, including the old coarse-grid worst tail locations and the finite overflow boundary. References are Julia BigFloat erfcx(x) at 256-bit precision rounded back to Float64. ULP error is the positive-Float64 bit-distance from that rounded reference.

The windows covered 18,944 input values. The finite-reference tables below use 18,176 values; 768 inputs from the overflow-boundary windows round to Inf and are excluded from the finite-error percentiles.

Global local-window stats:

backend samples invalid exact_bits p50 p95 p99 max max_rel worst_x
Commons 18176 0 13682 0 1 1 2 3.6e-16 2.5000000000e0
Cody 18176 0 10375 0 2 3 41238 4.6e-12 -2.6628735714e1
Johnson 18176 0 6490 1 395 509 511 5.7e-14 -2.6628700000e1

The p95/p99 picture is clearer than the max alone. Commons-style stays within 1 ulp at p99 and within 2 ulps at max. Cody is usually close, but has a rare near-overflow-tail spike. Johnson is the opposite shape: no huge single outlier here, but the negative tail is broadly hundreds of ulps away.

In the mid negative tail around x = -24, Johnson is consistently about 510 ulps away while Cody and Default/Commons stay close to the rounded BigFloat reference.

region backend samples p95 p99 max
negative near-overflow tail Commons 256 1 1 1
negative near-overflow tail Cody 256 19083 36126 41238
negative near-overflow tail Johnson 256 487 506 511
negative tail Commons 512 1 1 1
negative tail Cody 512 4 5 5
negative tail Johnson 512 510 510 510

Zone definitions:

Per-zone local-window stats:

zone backend samples invalid exact_bits p50 p95 p99 max max_rel worst_x
neg-near-overflow Commons 3584 0 2682 0 1 1 1 1.9e-16 -2.6e1
neg-near-overflow Cody 3584 0 1611 1 2 3 41238 4.6e-12 -2.7e1
neg-near-overflow Johnson 3584 0 34 218 460 496 511 5.7e-14 -2.7e1
neg-tail Commons 3584 0 2885 0 1 1 1 2.1e-16 -6.1e0
neg-tail Cody 3584 0 2069 0 2 4 5 6.3e-16 -2.4e1
neg-tail Johnson 3584 0 872 26 509 510 510 5.7e-14 -2.4e1
neg-transition Commons 2048 0 1446 0 1 1 2 2.3e-16 -5.0e-1
neg-transition Cody 2048 0 1155 0 1 2 3 3.4e-16 -5.0e-1
neg-transition Johnson 2048 0 1291 0 15 24 26 3.6e-15 -6.1e0
central Commons 2049 0 1649 0 1 1 2 3.6e-16 5.0e-1
central Cody 2049 0 1313 0 1 2 3 3.6e-16 -5.0e-1
central Johnson 2049 0 902 1 3 4 5 7.2e-16 2.5e-1
pos-core Commons 2559 0 1772 0 1 1 2 3.6e-16 2.5e0
pos-core Cody 2559 0 982 1 3 4 5 6.6e-16 2.5e0
pos-core Johnson 2559 0 793 1 2 3 4 7.2e-16 5.0e-1
pos-tail Commons 2049 0 1294 0 1 1 1 2.0e-16 5.0e1
pos-tail Cody 2049 0 1291 0 1 1 1 2.0e-16 5.0e1
pos-tail Johnson 2049 0 1047 0 1 2 2 4.1e-16 2.0e1
pos-far-tail Commons 2303 0 1954 0 1 1 1 1.9e-16 1.0e8
pos-far-tail Cody 2303 0 1954 0 1 1 1 1.9e-16 1.0e8
pos-far-tail Johnson 2303 0 1551 0 1 1 2 3.1e-16 1.0e2

Local-window timing on the same 18,944 samples, best of five passes with 1000 sweeps per pass:

backend ns/call
Commons 9.86
Cody 10.04
Johnson 6.28

So yes: the tail is the right place to look. The worst Johnson behavior is the negative tail and near-overflow tail. Cody has a rare sharper spike right at the finite overflow boundary. Commons is the most stable over these consecutive-ULP windows.

Generic Erfcx Timing

Mixed grid, 24003 samples, best of five timing passes, 100 sweeps per pass.

backend ns/call
Commons 6.29
Cody 6.55
Johnson/errorfunctions 6.05
Johnson/faddeeva-sys 4.92
Julia SpecialFunctions 4.62

The Julia number is measured inside one Julia process launched by the Rust diagnostic program; it is a Julia-loop function-call timing, not per-call Rust-to-Julia FFI overhead.

Impact on ThiopheneIV Implied Volatility Solver

Values are max error in ULPs of reference total volatility on a specific set of options (see the ThiopheneIV paper for the definition of the exact zones). The rust implementation of the solver is available here

ThiopheneIV max ulp

backend CLY-3D CLY-20 CLY-80 Jaeckel Market Corners Stress HighVol
Cody 190 52 8 142 520 557 285 2
Commons 133 49 8 63 177 329 138 2
Johnson 400 71 16 97 1210 537 484 3

ThiopheneIV+ max ulp

backend CLY-3D CLY-20 CLY-80 Jaeckel Market Corners Stress HighVol
Cody 23 5 4 14 30 41 33 2
Commons 23 5 4 13 29 41 33 2
Johnson 22 5 7 13 29 41 32 3

The polished ThiopheneIV+ path is robust across erfcx choices. The unpolished fast solver is where Commons is clearly better for benchmark.

Impact on ThiopheneIV Performance

ThiopheneIV ns/call

backend CLY-3D CLY-20 CLY-80 Jaeckel Market Corners Stress HighVol
Cody 146 146 148 146 147 155 155 150
Commons 155 147 147 156 153 159 157 151
Johnson 179 180 188 211 203 185 204 161

ThiopheneIV+ ns/call

backend CLY-3D CLY-20 CLY-80 Jaeckel Market Corners Stress HighVol
Cody 199 200 199 185 193 210 205 187
Commons 202 203 197 197 200 214 210 189
Johnson 233 229 245 249 251 240 254 210

Cody is slightly faster than Commons for ThiopheneIV, but Commons is close and much more accurate. Johnson is slower in the solver context despite competitive standalone erfcx timing, most likely because its branch/table structure interacts differently with the full pricing paths.

Conclusion

Use the Commons erfcx implementation. It gives the best generic erfcx accuracy against the BigFloat reference among the tested Rust-callable implementations, the most stable 512-consecutive-ULP local-window profile, the best unpolished ThiopheneIV accuracy, and passes the strict reference/corner suite. Cody remains a good fast approximation, but it is of slighly lower accuracy (which impacts non-polished ThiopheneIV implied vols) and has a near-overflow-tail spike (very likely without impact on my use case).

Comments

comments powered by Disqus