Intel overstates FPU accuracy
For nearly 20 years Intel has claimed high accuracy for the transcendental floating point instructions in its PC processor products. Intel documentation for the 1993 Pentium states: "On the Pentium processor, the worst case error on functions is less than 1 ulp when rounding to the nearest-even and less than 1.5 ulps when rounding in other modes." This claim has never been true for the instructions fsin, fcos, fsincos, and fptan. The red regions in the plots below show ranges where the error exceeds 1.0 ulp.
The instructions produce large errors when the argument is large.
The instructions produce large errors when the argument is near certain multiples of pi/2.
The instructions produce larger-than-claimed errors even for small arguments that are far from multiples of pi/2.
While these instructions are documented to support arguments in the range of -2^63 to 2^63, results are not uniformly accurate even when the argument range is restricted to 0 to 2pi.
The 1995 Intel publication Pentium Processor Family Developer's Manual Volume 3: Architecture and Programming Manual states:
On the Pentium processor, the worst case error on functions is less than 1 ulp when rounding to the nearest-even and less than 1.5 ulps when rounding in other modes. The functions are guaranteed to be monotonic, with respect to the input operands, throughout the domain supported by the instruction. See Appendix G for detailed information on transcendental accuracy.
The scatter plots included in Appendix G clearly show sin/cos/tan errors of less than one ulp for arguments up to the supported size of 2^63. Yet past and current Intel processors produce large errors when the argument is large. For example, on an Intel Core i7 processor the error for fcos(9223372035620657689) is 1881514444958111198875746304 ulp.
Here are FPU error scatter plots made from a modern Intel processor.
Only one Intel publication partially acknowledges the problem. The 1999 edition of Intel Architecture Software Developer's Manual Volume 1: Basic Architecture states:
The trigonometric instructions may use a 66-bit approximation to the true value of pi to reduce the magnitude of the input argument. In this case, the final computed result can vary considerably from the true mathematically precise result.
This statement, which never again appears in a public Intel document, acknowledges the argument reduction problem. However, the entirely separate problem of large errors near argument multiples of pi/2 is not addressed.
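The effect of a shortened pi on large arguments can be sketched with exact rational arithmetic. The Python below is a simplified model and not Intel's actual algorithm: it assumes the reduction subtracts whole multiples of 2*pi, and that the 66-bit constant is pi truncated to 66 significant bits. The helper names pi_scaled and reduce_arg are hypothetical. The argument is the fcos example quoted earlier.

```python
import math
from fractions import Fraction

def pi_scaled(bits, guard=32):
    """Return pi * 2**bits truncated to an integer, via Machin's formula."""
    one = 1 << (bits + guard)

    def atan_inv(n):
        # arctan(1/n) * one, from the series 1/n - 1/(3 n^3) + 1/(5 n^5) - ...
        term = one // n
        total = term
        k, sign = 1, 1
        while term:
            term //= n * n
            k += 2
            sign = -sign
            total += sign * (term // k)
        return total

    # Machin: pi = 16*arctan(1/5) - 4*arctan(1/239); drop the guard bits.
    return (16 * atan_inv(5) - 4 * atan_inv(239)) >> guard

# High-precision reference value of pi.
PI_REF = Fraction(pi_scaled(256), 1 << 256)
# Assumed 66-bit constant: 2 integer bits plus 64 fraction bits.
PI_66 = Fraction(pi_scaled(64), 1 << 64)

def reduce_arg(x, pi):
    """Reduce x into [0, 2*pi) by subtracting whole periods, exactly."""
    x = Fraction(x)
    period = 2 * pi
    return x - (x // period) * period

x = 9223372035620657689  # argument quoted in the text for fcos
r_ref = reduce_arg(x, PI_REF)
r_66 = reduce_arg(x, PI_66)

# Double precision is enough here to show the scale of the discrepancy.
cos_ref = math.cos(float(r_ref))
cos_66 = math.cos(float(r_66))

print(f"shift in reduced argument: {float(r_66 - r_ref):.6e}")
print(f"cos using reference pi   : {cos_ref:.6e}")
print(f"cos using 66-bit pi      : {cos_66:.6e}")
```

Under this model the reduced argument shifts by roughly 0.012 radians, because the tiny error in pi is multiplied by the number of periods removed. A cosine that should be very close to zero then comes out near 0.012 in magnitude, which is consistent in scale with the fcos error quoted above.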
A Windows utility demonstrates these larger-than-expected errors. The utility uses the GNU MPFR library to calculate the expected result. It then executes the Intel FPU instruction and calculates the error in ulps, as described in the Intel documentation. Both calculations use round-to-nearest mode, so the maximum error expected from the Intel instruction is 1.0 ulp.
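The ulp measurement itself can be sketched in a few lines. This Python is not the utility's source. It substitutes the standard fractions module for MPFR, and it tests math.cos in double precision because Python cannot issue x87 instructions directly. It also assumes one ulp is 2^(k-(p-1)) for a p-bit significand, where 2^k <= |exact| < 2^(k+1), so p = 64 for the x87 extended format. The names cos_exact and ulp_error are hypothetical.

```python
import math
from fractions import Fraction

def cos_exact(x, terms=30):
    """Taylor series for cos(x) in exact rationals; suitable for small |x| only."""
    x2 = Fraction(x) ** 2
    term = Fraction(1)
    total = Fraction(1)
    for n in range(1, terms):
        # Next term: multiply by -x^2 / ((2n-1) * 2n).
        term *= -x2 / ((2 * n - 1) * (2 * n))
        total += term
    return total

def ulp_error(exact, computed, precision=64):
    """Return |computed - exact| in ulps of the exact result's binade."""
    exact = Fraction(exact)
    if exact == 0:
        raise ValueError("ulp is undefined for an exact result of zero")
    mag = abs(exact)
    # Find k with 2**k <= mag < 2**(k+1).
    k = mag.numerator.bit_length() - mag.denominator.bit_length()
    if Fraction(2) ** k > mag:
        k -= 1
    ulp = Fraction(2) ** (k - (precision - 1))
    return abs(Fraction(computed) - exact) / ulp

# Stand-in for the instruction under test: the platform's double-precision cos.
reference = cos_exact(Fraction(1))
measured = ulp_error(reference, math.cos(1.0), precision=53)
print(f"math.cos(1.0) error: {float(measured):.3f} ulp")
```

Measuring against the binade of the exact result, rather than the computed one, keeps the ulp size fixed even when the computed value is badly wrong.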
Scatter plots made from the output of this utility give a visual representation of the results.
Here are example failures from an Intel Core i7 processor.
The accuracy limitations of the Intel FPU trigonometric functions are known in the software community and are acknowledged by Intel engineers:
Processor vendors: Document the limitations of the fsin, fsincos, fcos, and fptan instructions.
Compiler and library vendors: Follow the example of OpenCL and allow the software developer to choose between slower, high-accuracy software functions and faster but less accurate hardware instructions. For example, the OpenCL function native_cos is fast but not accurate, while the OpenCL function cos is slower but meets its accuracy specification.