FFTW [40, 37], the "Faster Fourier Transform in the West," is arguably the best public-domain FFT package available. It features both real and complex multidimensional transforms and is available in sequential, multithreaded, and parallel versions. FFTW uses runtime optimization of the desired transform to adapt to the runtime platform. Furthermore, it claims that the optimizer will become more sophisticated over time.
Since Fourier transforms are typically executed many times on different data, FFTW has separate create/destroy and execute calls. A notable feature of the create call is a flag with values FFTW_ESTIMATE and FFTW_MEASURE, which determines the dynamic choice of a suitable implementation of the desired transform. With the former value, the package picks an implementation at essentially no cost, but probably with suboptimal performance. The latter value instructs FFTW to run and measure the execution time of several FFTs in order to find the best way to compute the desired transform. This process may take several seconds, depending on the platform and the size of the transform. Further flags FFTW_PATIENT and FFTW_EXHAUSTIVE can give even "more optimal" performance. Transform implementations found through this search mechanism are stored in a datatype fftw_plan; plans can be exported and imported between runs in a mechanism called "wisdom."
Also influencing the speed of FFTW is the fact that it can take advantage of SIMD instructions, such as SSE/SSE2 (Intel), 3DNow! (AMD), and Altivec (PowerPC). The user must align data correctly, as described in the manual.
FFTW is written in C, but wrapper code is provided to facilitate an interface to Fortran.
As of this writing, version 3 of the package does not yet support MPI parallelism, but version 2 does. The two versions have slightly different calling conventions.