ppsearch 2.4

November 18, 2012: version 2.4 - performance improvements

Execution time in seconds for "ppsearch.exe bits=2 maxbits=929 loop" on an Intel Core i7 2600K at 4.5 GHz:

                version 2.3      version 2.4
                SSE     AVX      SSE     AVX
Microsoft       402     148      344     122
gcc             n/a     n/a      348     111
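The Microsoft rows of the table give the relative speedup of version 2.4 over version 2.3. These are simple ratios of old to new execution time, not separate measurements:

$$
\frac{402}{344} \approx 1.17 \quad (\text{SSE}), \qquad \frac{148}{122} \approx 1.21 \quad (\text{AVX})
$$

On this benchmark, that makes version 2.4 roughly 17% faster with SSE and 21% faster with AVX.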

A new function, shiftLeftThenXor, replaces a shiftLeft/xor sequence in the optimized reduction function. The total number of shift and xor operations is unchanged, but eliminating the memory load/store for the xor operation improves performance.
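A minimal C sketch of the difference follows. It assumes polynomials over GF(2) are stored as arrays of 64-bit words, with bit i holding the coefficient of x^i. The names shiftLeft and shiftLeftThenXor mirror this note, but every body here is a hypothetical illustration, not ppsearch's code. The helpers shiftLeftXorTwoStep and reduceMod are likewise invented for the example. The two-step form stores each shifted word into a scratch buffer and then loads it again for the xor. The fused form keeps the carried bits in a register and xors straight into the destination.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* GF(2) polynomials as arrays of 64-bit words: bit i of the array is the
   coefficient of x^i (word 0 holds the lowest 64 coefficients). */

/* Two-step form, part 1: out = src << shift (bits past outWords dropped). */
static void shiftLeft(uint64_t *out, size_t outWords,
                      const uint64_t *src, size_t srcWords, unsigned shift)
{
    size_t wordShift = shift / 64;
    unsigned bitShift = shift % 64;
    for (size_t j = 0; j < outWords; j++) {
        uint64_t w = 0;
        if (j >= wordShift) {
            size_t k = j - wordShift;               /* source word for slot j */
            if (k < srcWords)
                w = src[k] << bitShift;
            if (bitShift != 0 && k >= 1 && k - 1 < srcWords)
                w |= src[k - 1] >> (64 - bitShift); /* bits carried up from below */
        }
        out[j] = w;
    }
}

/* Two-step form, part 2: every word of tmp is stored, then reloaded. */
static void shiftLeftXorTwoStep(uint64_t *dst, size_t dstWords,
                                const uint64_t *src, size_t srcWords,
                                unsigned shift, uint64_t *tmp)
{
    shiftLeft(tmp, dstWords, src, srcWords, shift);
    for (size_t j = 0; j < dstWords; j++)
        dst[j] ^= tmp[j];
}

/* Fused form: dst ^= src << shift in one pass, carrying the spilled bits
   in a register instead of a scratch buffer. Bits past dst are dropped. */
static void shiftLeftThenXor(uint64_t *dst, size_t dstWords,
                             const uint64_t *src, size_t srcWords, unsigned shift)
{
    size_t wordShift = shift / 64;
    unsigned bitShift = shift % 64;
    uint64_t carry = 0;
    for (size_t i = 0; i < srcWords && i + wordShift < dstWords; i++) {
        uint64_t w = src[i];
        dst[i + wordShift] ^= (w << bitShift) | carry;
        carry = bitShift ? w >> (64 - bitShift) : 0;
    }
    if (srcWords + wordShift < dstWords)
        dst[srcWords + wordShift] ^= carry;         /* top partial word */
}

/* Bit-at-a-time reduction of a modulo p (degree degP, no bits above degP),
   in place: each set bit at degree >= degP, from the top down, is cancelled
   by XORing in a shifted copy of p. Illustration only, not the optimized
   reduction the release note describes. */
static void reduceMod(uint64_t *a, size_t aWords,
                      const uint64_t *p, size_t pWords, unsigned degP)
{
    assert(degP < pWords * 64 && ((p[degP / 64] >> (degP % 64)) & 1));
    for (size_t bit = aWords * 64; bit-- > degP; )
        if ((a[bit / 64] >> (bit % 64)) & 1)
            shiftLeftThenXor(a, aWords, p, pWords, (unsigned)(bit - degP));
}
```

For example, reducing x^64 (a = {0, 1}) modulo x^2 + x + 1 (p = {7}, degP = 2) leaves a = {2, 0}, which is x, because x^3 ≡ 1 modulo that polynomial.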
Add support for the gcc compiler. The gcc build runs about 10% faster with AVX (the Microsoft build is slightly faster when AVX is not available).
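The 10% figure can be checked against the version 2.4 timings in the table:

$$
\frac{t_{\text{Microsoft, AVX}}}{t_{\text{gcc, AVX}}} = \frac{122}{111} \approx 1.10, \qquad \frac{t_{\text{gcc, SSE}}}{t_{\text{Microsoft, SSE}}} = \frac{348}{344} \approx 1.01
$$

So gcc is about 10% faster with AVX, and the Microsoft build is about 1% faster without it.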
When the processor supports AVX, the program runs AVX-optimized versions of all functions. Versions 2.2 and 2.3 use the AVX clmul instruction but do not enable the AVX enhancements to SSE instructions.
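One way to build and select AVX versions of functions is sketched below using gcc/clang extensions. This is an assumption about technique, not a description of how ppsearch dispatches. The same C source is compiled twice. The second copy carries `__attribute__((target("avx")))`, so the compiler may emit VEX-encoded (AVX) forms of its vector instructions. A function pointer is then chosen at startup with `__builtin_cpu_supports`. The names xorWordsSse, xorWordsAvx, and selectXorWords are hypothetical. `__builtin_cpu_supports` requires gcc 4.8 or later (or clang). With Microsoft's compiler the detection would use `__cpuid` and `_xgetbv` instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical kernel, baseline build: XOR src into dst word by word.
   Compiled for the default target (SSE2 on x86-64). */
static void xorWordsSse(uint64_t *dst, const uint64_t *src, size_t n)
{
    assert(n == 0 || (dst != NULL && src != NULL));
    for (size_t i = 0; i < n; i++)
        dst[i] ^= src[i];
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
/* Same source compiled with AVX enabled: any vector instructions the
   compiler generates here may use the VEX (AVX) encoding. */
__attribute__((target("avx")))
static void xorWordsAvx(uint64_t *dst, const uint64_t *src, size_t n)
{
    assert(n == 0 || (dst != NULL && src != NULL));
    for (size_t i = 0; i < n; i++)
        dst[i] ^= src[i];
}
#define HAVE_AVX_VARIANT 1
#endif

typedef void (*XorWordsFn)(uint64_t *, const uint64_t *, size_t);

/* Choose the AVX build when the CPU reports AVX support, else the baseline. */
static XorWordsFn selectXorWords(void)
{
#ifdef HAVE_AVX_VARIANT
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx"))
        return xorWordsAvx;
#endif
    return xorWordsSse;
}
```

A program would call selectXorWords() once at startup and route every later call through the returned pointer, so the CPU check is not repeated in inner loops.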
Tune the code that screens candidate polynomials using small divisors: remove duplicates from the small divisor list, replace 32-bit registers with 64-bit registers in the function that divides by a small polynomial, and run this screen on all polynomials instead of only those of degree 96 or higher.
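A sketch of such a screen follows, assuming the same word layout as above (bit i of the array is the coefficient of x^i). If a small divisor of lower degree divides the candidate exactly, the candidate has a proper factor. It therefore cannot be irreducible, let alone primitive, and can be rejected. The names modSmall, dedupe, and hasSmallFactor are hypothetical. The bit-serial remainder loop is a simple stand-in for ppsearch's divider. It does show that a 64-bit running remainder can accommodate divisors up to degree 63.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Degree of a nonzero GF(2) polynomial held in one 64-bit word. */
static unsigned degree64(uint64_t v)
{
    unsigned d = 0;
    while (v >>= 1) d++;
    return d;
}

/* Remainder of poly (nWords words) modulo a small divisor over GF(2).
   The divisor has degree d, 1 <= d <= 63, so the running remainder
   (at most d + 1 bits before each reduction) fits in one 64-bit register. */
static uint64_t modSmall(const uint64_t *poly, size_t nWords, uint64_t divisor)
{
    unsigned d = degree64(divisor);
    assert(d >= 1 && d <= 63);
    uint64_t r = 0;
    for (size_t w = nWords; w-- > 0; ) {          /* highest word first */
        for (int b = 63; b >= 0; b--) {           /* highest bit first */
            r = (r << 1) | ((poly[w] >> b) & 1);  /* bring down next coefficient */
            if ((r >> d) & 1)
                r ^= divisor;                     /* subtract (XOR) the divisor */
        }
    }
    return r;
}

static int cmpU64(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/* Sort the divisor list and drop duplicates in place; returns the new count. */
static size_t dedupe(uint64_t *list, size_t n)
{
    if (n == 0) return 0;
    qsort(list, n, sizeof *list, cmpU64);
    size_t out = 1;
    for (size_t i = 1; i < n; i++)
        if (list[i] != list[out - 1])
            list[out++] = list[i];
    return out;
}

/* Returns 1 if any divisor leaves a zero remainder. Every divisor must have
   lower degree than poly, so a zero remainder means a proper factor exists. */
static int hasSmallFactor(const uint64_t *poly, size_t nWords,
                          const uint64_t *divisors, size_t count)
{
    for (size_t i = 0; i < count; i++)
        if (modSmall(poly, nWords, divisors[i]) == 0)
            return 1;
    return 0;
}
```

For example, x^3 + 1 (poly = {9}) equals (x + 1)(x^2 + x + 1), so modSmall returns 0 for divisors 3 and 7. It returns 3, which is x + 1, for x^2 + 1 (divisor 5).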
Correct a loop count that was one too large in function multiplyPolynomialAvx, for a slight performance improvement.
Correct some errors in the debug output logged when the verbose option is used.

source code

factor files: these are derived from a list available here and reformatted with a utility. Selected factorizations beyond the 2^1200 - 1 cutoff of the Cunningham tables have also been added. Unlike the factorizations up to 2^1200 - 1, these additional factorizations do not come from a recognized source, so the primality of their factors should be confirmed before they are used.
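For factors that fit in 64 bits, the check can be sketched in C with a deterministic Miller-Rabin test. Using the first twelve primes as bases is known to be sufficient for every 64-bit integer. The sketch also confirms that a list of factors multiplies back to 2^bits - 1. It relies on the gcc/clang `unsigned __int128` type. The names isPrime64 and checkFactorization are hypothetical, and no particular layout of the factor files is assumed. Factors wider than 64 bits need arbitrary-precision arithmetic, for example GMP's `mpz_probab_prime_p`, which performs a probabilistic test.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* (a * b) mod m without overflow, via the gcc/clang 128-bit integer type. */
static uint64_t mulMod(uint64_t a, uint64_t b, uint64_t m)
{
    return (uint64_t)((unsigned __int128)a * b % m);
}

static uint64_t powMod(uint64_t base, uint64_t exp, uint64_t m)
{
    uint64_t result = 1 % m;
    base %= m;
    while (exp) {
        if (exp & 1) result = mulMod(result, base, m);
        base = mulMod(base, base, m);
        exp >>= 1;
    }
    return result;
}

/* Deterministic Miller-Rabin for any 64-bit n. */
static int isPrime64(uint64_t n)
{
    static const uint64_t bases[] = {2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37};
    const size_t nb = sizeof bases / sizeof bases[0];
    if (n < 2) return 0;
    for (size_t i = 0; i < nb; i++)       /* small primes and their multiples */
        if (n % bases[i] == 0) return n == bases[i];
    uint64_t d = n - 1;                   /* write n - 1 = d * 2^s, d odd */
    unsigned s = 0;
    while ((d & 1) == 0) { d >>= 1; s++; }
    for (size_t i = 0; i < nb; i++) {
        uint64_t x = powMod(bases[i], d, n);
        if (x == 1 || x == n - 1) continue;
        int composite = 1;
        for (unsigned r = 1; r < s; r++) {
            x = mulMod(x, x, n);
            if (x == n - 1) { composite = 0; break; }
        }
        if (composite) return 0;          /* bases[i] is a witness */
    }
    return 1;
}

/* Hypothetical check of one entry: every listed factor is prime and the
   product equals 2^bits - 1. Limited to bits <= 63 in this sketch. */
static int checkFactorization(unsigned bits, const uint64_t *factors, size_t count)
{
    assert(bits >= 2 && bits <= 63);
    uint64_t target = (UINT64_C(1) << bits) - 1, product = 1;
    for (size_t i = 0; i < count; i++) {
        if (!isPrime64(factors[i])) return 0;
        if (product > target / factors[i]) return 0;   /* product would exceed target */
        product *= factors[i];
    }
    return product == target;
}
```

For example, the factor list {3, 5, 17, 257, 65537} passes for 2^32 - 1. Dropping the last factor fails the product check.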

sample Win64 executable built with the gcc 4.7.3 compiler.

sample Win64 executable built with the Microsoft Visual Studio 2010 compiler.