Fast Software LFSR
portable C and x86 optimized versions

 

How fast can a software LFSR run on a modern (2012) x86 processor? Here are measurements from an Intel Core i7-2600K running at 4.0 GHz. The numbers are measured nanoseconds per LFSR update cycle.

 

      LFSR Width    GPR (portable)      XMM  (AVX)       YMM (AVX2)
         bits         MS    gcc         MS    gcc        MS    gcc

          32          1.5   0.8         0.8   0.7         ?     ?
          64          1.5   0.8         1.0   1.0         ?     ?
         128          1.4   1.3         1.3   1.3         ?     ?
         256          2.1   2.2         1.8   2.0         ?     ?
         512          7.8   3.7         4.8   2.9         ?     ?
        1024         21.0  11.3        11.3   5.8         ?     ?
        2048         42.3  22.0        22.3  13.0         ?     ?
        4096         86.0  81.0        48.8  44.4         ?     ?
 

Notes:

1) The LFSR code and speed of execution are independent of polynomial selection.

2) LFSR width 8192, 16384 ... were not coded because primitive polynomials needed for maximal length sequence generation are not available. Once (2^8192 - 1) is successfully factored, LFSR width 8192 will be possible.

3) MS: Microsoft Visual Studio 2012 compiler.

4) gcc: version 4.7.3 (mingw-w64-bin-x86_64-20121103.7z) from http://www.drangon.org/mingw/.
note: gcc 4.8.0 and 4.8.1 produce slower code for this benchmark/processor combination.

5) Results are for x64 build. The code also executes properly when compiled for x86.

6) The YMM (AVX2) columns will be filled out when the Intel Haswell processor is released in 2013. A significant speed improvement can be expected for LFSR width 256 and greater. The AVX2 code is included and works properly with Intel Software Development Emulator 5.31.

7) The software first runs the GPR (general purpose register) version of the code. Then the XMM version is run if both processor and OS support AVX. Finally, the YMM coded functions run if both the processor and OS support AVX2.

 

 Revision History

version 1.0   11/12/2012  Initial version.

version 1.1   11/20/2012  Remove -flto from the gcc build command because AVX instructions were being generated where not intended, causing a crash on systems where AVX is not present.

version 1.2   12/30/2012  Add 4096 bit LFSR. Switch from VS2010 to VS2012 for Microsoft build.

version 1.3   08/14/2013  Correct allocation size for static array in lfsrymm.c. Add version number display to test program startup. Add obj file cleanup to test batch file.