Fast autocorrelation in R - behind the gist

I was doing some signal processing in R a few weeks ago and was somewhat disappointed in how long it was taking to run my experiments. Digging around a little, it seemed like the vast majority of time was being taken up in autocorrelation and cross-correlation computations.

In particular, I was computing a “Type-II” autocorrelation, defined as

$y_i = \sum_{j=0}^{n-1-i}{x_jx_{j+i}},\; 0 \leq i \lt n$

The inefficient way to compute this in directly in R is

brute force autocorrelation

acf.II <- function(signal)
{
  n <- length(signal)
  sapply(0:(n-1), function(lag) as.vector(signal[1:(n-lag)] %*% signal[(1+lag):n]))
}

But autocorrelation can be computed efficiently via FFT. You can get this efficient computation by calling the convolve function the right way.

autocorrelation using fft

acf.II <- function(signal)
{
  result <- convolve(signal,signal,type='open')
  result[-1:-length(result)/2]
}

But this version was still taking a long time and I had to poke into the convolve implementation to finally figure out why.

When convolve is called for type='open' with real arguments, it simplifies to the following.

convolve simplified

convolve.simplified <- function(x,y)
{
  nx <- length(x)
  ny <- length(y)
  x <- c(rep.int(0, ny - 1), x)
  y <- c(y, rep.int(0, nx - 1))

  x <- fft(fft(x) * Conj(fft(y)), inverse = TRUE)
  Re(x)/(nx+ny-1)
}

Can you see the two problems?

The thing that jumped out at me first is that it computes the same FFT twice when it is used for autocorrelation. That should give a 2x speedup.

The second took me a bit more trial and error investigation, including trying to benchmark the internal fft function against fftw. I was testing them with different sized vectors when it finally dawned on me that the length of my input vector was a power of 2 but the zero padding inside of convolve was creating a length that was no longer a power of 2. The FFT implementation is a bit slower for those lengths so convolve is doing three slow FFT operations when I just need two fast ones.

With that knowledge, I whipped up the following. Finally, I have a fast autocorrelation that runs in about 35% of the time as the convolve version did.

autocorrelation

# fast auto-correlation
acf.II <- function(x)
{
  n <- length(x)
  x <- c(x, rep.int(0, n))
  transform <- fft(x)
  x <- fft(Conj(transform) * transform, inverse = TRUE)
  Re(x)[1:n]/(2*n)
}

# fast cross-correlation
ccf.II <- function(x,y)
{
  nx <- length(x)
  ny <- length(y)
  x <- c(x, rep.int(0, ny))
  y <- c(rep.int(0, nx), y)
  x <- fft(fft(x) * Conj(fft(y)), inverse = TRUE)
  Re(x)[-1]/(nx+ny)
}