The calculations are performed using the standard representation for multiple-precision real numbers, with radix 10000. Multiple-precision multiplications use a Fast Fourier Transform (FFT) based method. Inverses and square roots are computed with Newton's iterative algorithm.
My goal in writing Schnell_pi was to make the program portable to (almost) any platform, yet as fast as possible on a decent PC. Hence, the program is written entirely in C (without a single line of assembler code), but the design is optimized for the Intel x86 architecture and the GNU/Linux operating system. Every design decision was made to improve speed, at the expense of memory. I also focused on the fastest computation of 1M digits; the program could therefore be tuned differently for a larger (or smaller) number of digits, giving better performance there. However, the gain would be rather small.
Compared to the other AGM programs for computing Pi, Schnell_pi is typically two times faster. This is the consequence of various optimizations, among which the most important ones are:
Although Schnell_pi has been optimized for a Pentium III machine, it should also run on Pentium Pro, Pentium II and similar processors. It has been successfully tested on Pentium II and Athlon processors. I do not know whether it runs on the original Pentium or older processors.
Schnell_pi works entirely in memory (the disk is used only for the final writing of the digits). The amount of memory needed is roughly 10.375 bytes per digit. With 256 MBytes of RAM, you can therefore compute 16 M digits of Pi. It is mandatory to have enough real memory available. Otherwise, paging and swapping will take place and slow the program down tremendously. Schnell_pi behaves very badly with virtual memory. If it starts swapping, it will basically never finish and should be killed.
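The memory rule above can be turned into a quick estimate. The following C sketch is only an illustration under stated assumptions: the factor 10.375 is read as bytes per digit, the function names are made up for this example, and the candidate digit counts are restricted to powers of two between 16 and 2^29, the range the program handles.

```c
#include <assert.h>
#include <stdint.h>

/* Estimated RAM in bytes for a run, using the 10.375 bytes-per-digit
 * figure (10.375 = 83/8, so the integer arithmetic is exact for
 * digit counts that are multiples of 8). */
uint64_t schnell_pi_bytes_needed(uint64_t digits)
{
    return digits * 83u / 8u;
}

/* Largest power-of-two digit count whose estimate fits in ram_bytes.
 * Returns 0 if even 16 digits would not fit. */
uint64_t max_pow2_digits(uint64_t ram_bytes)
{
    uint64_t best = 0;
    for (uint64_t d = 16; d <= (UINT64_C(1) << 29); d <<= 1) {
        if (schnell_pi_bytes_needed(d) > ram_bytes)
            break;
        best = d;
    }
    return best;
}
```

With 256 MBytes (taken here as 256 * 2^20 bytes), `max_pow2_digits` gives 2^24 = 16 M digits, since 32 M digits would already need about 348 million bytes. The estimate ignores the memory used by the operating system and other processes, so in practice some margin is needed.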
After the actual calculation of Pi, Schnell_pi writes the computed digits to a file named "pi_final", in a format very similar to the one used by Xavier Gourdon's PiFast program. The size of the file is roughly 1.4 times the number of digits. Warning: for big computations, the reported computation time might be wrong. The program uses the C function time(), and because the time is stored in microseconds in a 32-bit integer, the result is only known modulo 4294.97 seconds. Hence, you may have to add an integer multiple of 4294.97 seconds to the time reported in pi_final to get the actual computation time!
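The wraparound arithmetic can be sketched as follows. A 32-bit counter of microseconds wraps every 2^32 microseconds, that is 4294.967296 seconds. Note that the standard C time() returns whole seconds; the behaviour described here matches what clock() does on 32-bit Linux, where CLOCKS_PER_SEC is 1000000, but which call the program really makes has not been checked for this example. All function names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* A 32-bit microsecond counter wraps every 2^32 us = 4294.967296 s. */
#define WRAP_SECONDS (4294967296.0 / 1e6)

/* Elapsed seconds between two raw 32-bit microsecond readings.
 * Unsigned subtraction is exact modulo 2^32, so a single wrap
 * between the two readings is handled correctly. */
double elapsed_mod_wrap(uint32_t start_us, uint32_t end_us)
{
    return (double)(uint32_t)(end_us - start_us) / 1e6;
}

/* Number of wraps implied by an independent rough estimate of the
 * run time (e.g. read from a wall clock), rounded to nearest. */
unsigned wraps_from_estimate(double reported_seconds, double rough_seconds)
{
    double k = (rough_seconds - reported_seconds) / WRAP_SECONDS;
    return k <= 0.0 ? 0u : (unsigned)(k + 0.5);
}

/* True duration once the number of wraps is known. */
double corrected_seconds(double reported_seconds, unsigned wraps)
{
    assert(reported_seconds >= 0.0 && reported_seconds < WRAP_SECONDS);
    return reported_seconds + wraps * WRAP_SECONDS;
}
```

For example, if pi_final reports 100 s but the run took roughly two hours and 25 minutes by the wall clock (about 8700 s), `wraps_from_estimate(100.0, 8700.0)` gives 2, and the corrected time is 100 + 2 * 4294.967296, about 8689.93 s.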
Because I want Schnell_pi to be as fast as possible, it uses only FFTs whose length is a power of 2. Hence, the number of computed digits is also a power of 2 (minus the last few digits, which are inaccurate because of rounding errors and are not printed). Schnell_pi can compute as few as 1k (=1024) digits (even fewer: it works down to 16 digits) and a theoretical maximum of 512 M (about 536 870 900) digits. The upper limit exists because Linux is a 32-bit operating system on the x86 architecture. With Alpha or Itanium processors, the limit is most probably higher, although I have not yet had time to test it. The calculation of 512 M digits requires more than 5 Gbytes of RAM, which I do not have. Hence, Schnell_pi has been tested up to 256 M digits: the computed digits agree with those computed with Pi_AGM_2.3.1 and with the ones published by Kanada. Even for 256 M digits, the rounding errors in the floating-point FFTs are not a problem, contrary to Carey Bloodworth's claim.
Schnell_pi is currently (August 10, 2001) the fastest program in the world on a PC (15 s on a Pentium III 800 MHz for more than one million digits).
There is no version for M$ Windows, and there will never be any. My program is (I hope) well written in C and optimized for the Intel Pentium III architecture and a reasonable operating system, which rules out Windows-bullshit.
Neither the author of Schnell_pi nor the distributor of the binary is responsible if anything goes wrong with your computer!