3.1. FFTW3 Library#
The FFTW3 library is a C/C++ library for computing DFT (and some other related transforms) in one or more dimensions, of arbitrary input size, and of both real- and complex-valued data. The FFTW3 library has been highly optimized to run with different levels of precision and under a number of CPU architectures. We will employ in most cases the multi-threaded, single-precision, 1-d FFT library functions.
The nice thing about FFTW3 is that a detailed manual [2] is available. One should consult the manual to check out the many different settings, configurations, and usages of the library. Here, we briefly summarize the steps to use FFTW3 to do a single-precision, 1-d FFT calculation:
Allocate memory to hold the input and output arrays
fftwf_complex* in = (fftwf_complex*) fftwf_malloc(sizeof(fftw_complex) * N); fftwf_complex* out = (fftwf_complex*) fftwf_malloc(sizeof(fftw_complex) * N);
The method
fftwf_malloc
should be used to work best with SIMD instructions. The typefftwf_complex
is the compatible withstd::complex<float>
.Instantiate a plan for the FFT using library methods such as
fftwf_plan_dft_1d
:fftwf_plan p = fftwf_plan_dft_1d(N, in, out, FFTW_FORWARD, FFTW_ESTIMATE);
where
N
specifies the size of the FFT,FFTW_FORWARD
is a flag specifying forward FFT should be calculated, andFFTW_ESTIMATE
is another flag that tells FFTW3 the strategy that it should use to speed up the FFT implementation.Enter the FFT input data to the input array
in
, calculate the FFT by executing the FFT plan:fftwf_execute(p);
and the FFT output will be stored in the array
out
after execution.After all FFT calculations are done, clean up by destroying the plan and freeing up the arrays:
fftwf_destroy_plan(p); fftwf_free(in); fftwf_free(out);
Let \(N\) be the FFT size and \(x[n]\) and \(X_k\) be the inputs of the forward (
FFTW_FORWARD
) and inverse (FFTW_BACKWARD
) FFT plans above, respectively. Executing the forward plan calculates\[\begin{equation*} \sum_{n=0}^{N-1} x_n e^{-j \frac{2\pi n k}{N}}, \end{equation*}\]while executing the inverse plan calculates
\[\begin{equation*} \sum_{k=0}^{N-1} X_k e^{j \frac{2\pi n k}{N}}. \end{equation*}\]Note that one must remember to divide the output of the inverse FFT calculation obtained from executing the inverse plan by \(N\) in order to get the “standard” inverse FFT result.
While FFTW3 allows any FFT size, I would recommend sticking to powers of 2 for most efficient calculations.
In addition to single-precision arithmetic with the
float
type, FFTW3 also supports doing calculation with thedouble
andlong double
types. To usedouble
andlong double
, simply replace the prefixfftwf
in each method or type name byfftw
andfftwl
, respectively.The method
fftwf_plan_dft_1d
above is one of the many basic interfaces that FFTW3 provides. A basic interface computes a single transform of contiguous data. See [2] for other basic interfaces provided. FFTW3 also provides two more levels of interfaces — advanced and guru. An advanced interface computes multiple transforms over contiguous or strided data. A guru interface supports the most general data layouts, multiplicities, and strides.For later purposes, we often need to perform multiple FFT calculations on a batch of samples. To avoid overheads incurred in calling a basic interface method multiple times, it is beneficial to use advanced or guru interface instead. For ease of later use and as examples to show the use of the advanced and guru interface, I have implemented the class
fft
which serves as a simple wrapper to use an advanced interface for doing multi-threaded, single-precision, 1-d FFT calculations on a batch of samples in the header file fft.hpp. There are two constructors implemented:One uses the advanced interface to do multiple FFTs on a long 1-d array of samples.
The other uses the guru interface to do multiple FFTs along the rows of a 2-d array of samples.
Steps 1-2 and 4 above are implemented in the constructors and destructor of
fft
whereas the FFT calculations on the whole batch can be executed by calling the class methodfft::calculate()
. The “divide-by-\(N\)” normalization mentioned above is included infft::calculate()
for inverse FFT.To use the wrapper:
Instantiate the
fft
object:// Example 1: // To perform nblocks forward fftsize-point FFTs on a 1-d // array of samples use the following constructor: fft fwd_fft(nthreads, fftsize, nblocks, fftsize, fftsize, false); // // Example 2: // To perform nblocks inverse fftsize-point FFTs along each // of the D rows of a 2-d array of samples use the following constructor: fft inv_fft(nthreads, fftsize, nblocks, fftsize, fftsize, D, true); // // Both examples use nthreads threads and move forward // fftsize steps to get to the next input and output FFT block
Get pointer to the input and output arrays:
std::complex<float>* in = fwd_fft.get_in(); std::complex<float>* out = fwd_fft.get_out();
Perform FFT calculations:
// Load samples into the array pointed to by in fwd_fft.calculate(); // Output stored in the array pointed to by out
Refer to fft.hpp and the example code
test_fft.cpp
for detailed information about how to use the wrapper.