Which erfcx?
The use of erfcx instead of direct erfc or CDF in a Black-Scholes implied volatility solver leads to gain in accuracy and performance in general. But which erfcx should we use?
This note compares practical erfcx implementations for Rust implied volatility solvers:
- Commons: the local Rust port of Apache Commons Numbers
BoostErf.erfcx - Cody: the Cody rational approximation exposed by the
jaeckel = 0.2.0crate during this comparison, originally coming from Netlib. - Johnson: Steven G. Johnson’s Faddeeva implementation, as exposed by
errorfunctions = 0.2.0,faddeeva-sys = 0.1.0, and JuliaSpecialFunctions.erfcx(::Float64)
The Commons-style implementation is a local Rust port of the Apache Commons Numbers/BoostErf approach. It computes the small central region through the local erf rational approximation and uses exp_m1(x*x) to avoid cancellation in exp(x^2) * erfc(x) near zero. For positive inputs it uses several rational subintervals: a central polynomial/rational erf branch, erfc-style rational branches around 0.5..1.5, 1.5..2.5, and 2.5..4, and an asymptotic rational in 1/x^2 beyond that. For very large positive x, it returns 1 / (sqrt(pi) * x). For negative x, it uses the reflected form with a split-square exponential, returns Inf below the finite overflow cutoff, and otherwise computes 2 exp(x^2) - erfcx(|x|) in a way that preserves more bits near the overflow boundary.
The Cody implementation used here is the jaeckel crate’s Cody/CALERF-style implementation. It has three main positive-side rational regions: AB for |x| <= 0.46875, CD for 0.46875 < |x| <= 4, and PQ as an asymptotic rational in 1/x^2 for |x| > 4. It uses smoothed exponentials by splitting x onto a 1/16 grid before applying the residual exponential. For negative x, it reflects with erfcx(-x) = 2 exp(x^2) - erfcx(x) and clamps below its XNEG threshold to f64::MAX.

Cody near-overflow spike.
The rare Cody spike in the finite negative overflow-boundary window is not caused by using the wrong rational interval but is deliberate. Cody/CALERF does use the PQ asymptotic rational for the positive |x| > 4 part, but the negative-side fix-up is still the reflected 2 exp(x^2) - erfcx(|x|) form with an XNEG overflow guard. The published IEEE double CALERF constant is the rounded XNEG = -26.628D0; the public Rust crate uses a refined XNEG = -26.6287357137514 with a source comment noting that the original value was -26.628. The plot spike is therefore boundary/guard behavior inherited from the CALERF-style negative branch, with a refined but still conservative cutoff near the true finite Float64 overflow boundary. A literal Cody reference would clamp even earlier rather than use a separate high-accuracy negative-tail expansion in that range.
errorfunctions = 0.2.0 is a close Rust translation of Steven G. Johnson’s Faddeeva C/C++ package. Its real erfcx implementation uses:
- a 100-interval Chebyshev table after the map
y = 4 / (4 + x)for0 <= x <= 50 - a continued-fraction expansion for
x > 50 - the reflection identity
erfcx(-x) = 2 exp(x^2) - erfcx(x)for negativex
Julia SpecialFunctions.erfcx(::Float64) calls Faddeeva_erfcx_re from libopenspecfun, and faddeeva-sys binds the same Johnson C symbol. On the sampled Float64 grid below, errorfunctions, faddeeva-sys, and Julia SpecialFunctions were bit-identical.
Generic Erfcx Accuracy
Reference: Julia BigFloat erfcx(x) at 256-bit precision, rounded back to Float64. The grid used 7021 total samples, of which 6272 had finite rounded Float64 references.
| backend | samples | invalid | max_ulp | max_rel | max_abs | worst_x |
|---|---|---|---|---|---|---|
| Commons | 6272 | 0 | 1 | 1.2e-16 | 4.990e291 | -2.6600845000e1 |
| Cody | 6272 | 0 | 337 | 3.7e-14 | 6.726e294 | -2.6628735714e1 |
| Johnson | 6272 | 0 | 491 | 5.5e-14 | 1.614e212 | -2.2769845000e1 |
The large absolute errors occur where erfcx(x) itself is enormous. The relative errors are still small, but the ULP differences matter because the IV tests include very sensitive tiny and saturated cases.
512-ULP Local Window Accuracy
The coarse grid above is useful for broad coverage, but it does not answer the more local question: what happens across consecutive representable inputs near the sensitive regions? I ran a temporary Rust verifier that generated 512 consecutive Float64 inputs around 37 representative centers, including the old coarse-grid worst tail locations and the finite overflow boundary. References are Julia BigFloat erfcx(x) at 256-bit precision rounded back to Float64. ULP error is the positive-Float64 bit-distance from that rounded reference.
The windows covered 18,944 input values. The finite-reference tables below use 18,176 values; 768 inputs from the overflow-boundary windows round to Inf and are excluded from the finite-error percentiles.
Global local-window stats:
| backend | samples | invalid | exact_bits | p50 | p95 | p99 | max | max_rel | worst_x |
|---|---|---|---|---|---|---|---|---|---|
| Commons | 18176 | 0 | 13682 | 0 | 1 | 1 | 2 | 3.6e-16 | 2.5000000000e0 |
| Cody | 18176 | 0 | 10375 | 0 | 2 | 3 | 41238 | 4.6e-12 | -2.6628735714e1 |
| Johnson | 18176 | 0 | 6490 | 1 | 395 | 509 | 511 | 5.7e-14 | -2.6628700000e1 |
The p95/p99 picture is clearer than the max alone. Commons-style stays within 1 ulp at p99 and within 2 ulps at max. Cody is usually close, but has a rare near-overflow-tail spike. Johnson is the opposite shape: no huge single outlier here, but the negative tail is broadly hundreds of ulps away.
In the mid negative tail around x = -24, Johnson is consistently about 510 ulps away while Cody and Default/Commons stay close to the rounded BigFloat reference.
| region | backend | samples | p95 | p99 | max |
|---|---|---|---|---|---|
| negative near-overflow tail | Commons | 256 | 1 | 1 | 1 |
| negative near-overflow tail | Cody | 256 | 19083 | 36126 | 41238 |
| negative near-overflow tail | Johnson | 256 | 487 | 506 | 511 |
| negative tail | Commons | 512 | 1 | 1 | 1 |
| negative tail | Cody | 512 | 4 | 5 | 5 |
| negative tail | Johnson | 512 | 510 | 510 | 510 |
Zone definitions:
- neg-near-overflow x < -26
- neg-tail -26 <= x < -6.1
- neg-transition -6.1 <= x < -0.5
- central -0.5 <= x <= 0.5
- pos-core 0.5 < x < 4
- pos-tail 4 <= x <= 50
- pos-far-tail x > 50
Per-zone local-window stats:
| zone | backend | samples | invalid | exact_bits | p50 | p95 | p99 | max | max_rel | worst_x |
|---|---|---|---|---|---|---|---|---|---|---|
| neg-near-overflow | Commons | 3584 | 0 | 2682 | 0 | 1 | 1 | 1 | 1.9e-16 | -2.6e1 |
| neg-near-overflow | Cody | 3584 | 0 | 1611 | 1 | 2 | 3 | 41238 | 4.6e-12 | -2.7e1 |
| neg-near-overflow | Johnson | 3584 | 0 | 34 | 218 | 460 | 496 | 511 | 5.7e-14 | -2.7e1 |
| neg-tail | Commons | 3584 | 0 | 2885 | 0 | 1 | 1 | 1 | 2.1e-16 | -6.1e0 |
| neg-tail | Cody | 3584 | 0 | 2069 | 0 | 2 | 4 | 5 | 6.3e-16 | -2.4e1 |
| neg-tail | Johnson | 3584 | 0 | 872 | 26 | 509 | 510 | 510 | 5.7e-14 | -2.4e1 |
| neg-transition | Commons | 2048 | 0 | 1446 | 0 | 1 | 1 | 2 | 2.3e-16 | -5.0e-1 |
| neg-transition | Cody | 2048 | 0 | 1155 | 0 | 1 | 2 | 3 | 3.4e-16 | -5.0e-1 |
| neg-transition | Johnson | 2048 | 0 | 1291 | 0 | 15 | 24 | 26 | 3.6e-15 | -6.1e0 |
| central | Commons | 2049 | 0 | 1649 | 0 | 1 | 1 | 2 | 3.6e-16 | 5.0e-1 |
| central | Cody | 2049 | 0 | 1313 | 0 | 1 | 2 | 3 | 3.6e-16 | -5.0e-1 |
| central | Johnson | 2049 | 0 | 902 | 1 | 3 | 4 | 5 | 7.2e-16 | 2.5e-1 |
| pos-core | Commons | 2559 | 0 | 1772 | 0 | 1 | 1 | 2 | 3.6e-16 | 2.5e0 |
| pos-core | Cody | 2559 | 0 | 982 | 1 | 3 | 4 | 5 | 6.6e-16 | 2.5e0 |
| pos-core | Johnson | 2559 | 0 | 793 | 1 | 2 | 3 | 4 | 7.2e-16 | 5.0e-1 |
| pos-tail | Commons | 2049 | 0 | 1294 | 0 | 1 | 1 | 1 | 2.0e-16 | 5.0e1 |
| pos-tail | Cody | 2049 | 0 | 1291 | 0 | 1 | 1 | 1 | 2.0e-16 | 5.0e1 |
| pos-tail | Johnson | 2049 | 0 | 1047 | 0 | 1 | 2 | 2 | 4.1e-16 | 2.0e1 |
| pos-far-tail | Commons | 2303 | 0 | 1954 | 0 | 1 | 1 | 1 | 1.9e-16 | 1.0e8 |
| pos-far-tail | Cody | 2303 | 0 | 1954 | 0 | 1 | 1 | 1 | 1.9e-16 | 1.0e8 |
| pos-far-tail | Johnson | 2303 | 0 | 1551 | 0 | 1 | 1 | 2 | 3.1e-16 | 1.0e2 |
Local-window timing on the same 18,944 samples, best of five passes with 1000 sweeps per pass:
| backend | ns/call |
|---|---|
| Commons | 9.86 |
| Cody | 10.04 |
| Johnson | 6.28 |
So yes: the tail is the right place to look. The worst Johnson behavior is the negative tail and near-overflow tail. Cody has a rare sharper spike right at the finite overflow boundary. Commons is the most stable over these consecutive-ULP windows.
Generic Erfcx Timing
Mixed grid, 24003 samples, best of five timing passes, 100 sweeps per pass.
| backend | ns/call |
|---|---|
| Commons | 6.29 |
| Cody | 6.55 |
| Johnson/errorfunctions | 6.05 |
| Johnson/faddeeva-sys | 4.92 |
| Julia SpecialFunctions | 4.62 |
The Julia number is measured inside one Julia process launched by the Rust diagnostic program; it is a Julia-loop function-call timing, not per-call Rust-to-Julia FFI overhead.
Impact on ThiopheneIV Implied Volatility Solver
Values are max error in ULPs of reference total volatility on a specific set of options (see the ThiopheneIV paper for the definition of the exact zones). The rust implementation of the solver is available here
ThiopheneIV max ulp
| backend | CLY-3D | CLY-20 | CLY-80 | Jaeckel | Market | Corners | Stress | HighVol |
|---|---|---|---|---|---|---|---|---|
| Cody | 190 | 52 | 8 | 142 | 520 | 557 | 285 | 2 |
| Commons | 133 | 49 | 8 | 63 | 177 | 329 | 138 | 2 |
| Johnson | 400 | 71 | 16 | 97 | 1210 | 537 | 484 | 3 |
ThiopheneIV+ max ulp
| backend | CLY-3D | CLY-20 | CLY-80 | Jaeckel | Market | Corners | Stress | HighVol |
|---|---|---|---|---|---|---|---|---|
| Cody | 23 | 5 | 4 | 14 | 30 | 41 | 33 | 2 |
| Commons | 23 | 5 | 4 | 13 | 29 | 41 | 33 | 2 |
| Johnson | 22 | 5 | 7 | 13 | 29 | 41 | 32 | 3 |
The polished ThiopheneIV+ path is robust across erfcx choices. The unpolished fast solver is where Commons is clearly better for benchmark.
Impact on ThiopheneIV Performance
ThiopheneIV ns/call
| backend | CLY-3D | CLY-20 | CLY-80 | Jaeckel | Market | Corners | Stress | HighVol |
|---|---|---|---|---|---|---|---|---|
| Cody | 146 | 146 | 148 | 146 | 147 | 155 | 155 | 150 |
| Commons | 155 | 147 | 147 | 156 | 153 | 159 | 157 | 151 |
| Johnson | 179 | 180 | 188 | 211 | 203 | 185 | 204 | 161 |
ThiopheneIV+ ns/call
| backend | CLY-3D | CLY-20 | CLY-80 | Jaeckel | Market | Corners | Stress | HighVol |
|---|---|---|---|---|---|---|---|---|
| Cody | 199 | 200 | 199 | 185 | 193 | 210 | 205 | 187 |
| Commons | 202 | 203 | 197 | 197 | 200 | 214 | 210 | 189 |
| Johnson | 233 | 229 | 245 | 249 | 251 | 240 | 254 | 210 |
Cody is slightly faster than Commons for ThiopheneIV, but Commons is close and much more accurate. Johnson is slower in the solver context despite competitive standalone erfcx timing, most likely because its branch/table structure interacts differently with the full pricing paths.
Conclusion
Use the Commons erfcx implementation. It gives the best generic erfcx accuracy against the BigFloat reference among the tested Rust-callable implementations, the most stable 512-consecutive-ULP local-window profile, the best unpolished ThiopheneIV accuracy, and passes the strict reference/corner suite. Cody remains a good fast approximation, but it is of slighly lower accuracy (which impacts non-polished ThiopheneIV implied vols) and has a near-overflow-tail spike (very likely without impact on my use case).