behind the gist

thoughts on code, analysis and craft

Correct autocorrelation in Objective-C

After some algorithm design in R, I was ready to implement the result on iOS using the Accelerate framework. The R version already gave me some performance challenges and the Accelerate version had its own twists.

The equivalent R code that I wanted to implement was

fast autocorrelation
1
2
3
4
5
6
7
8
acf.II <- function(x)
{
  n <- length(x)
  x <- c(x, rep.int(0, n))
  transform <- fft(x)
  x <- fft(Conj(transform) * transform, inverse = TRUE)
  Re(x)[1:n]/(2*n)
}

At a high level, the correct Accelerate functions to give the same result are something like the following, but unit testing showed this wasn’t generating the correct result.

simplified Accelerate code
1
2
3
4
5
6
7
8
9
10
11
float dsp_acf_II_fft_unscaled(const FFTSetup setup, float const* signal, DSPSplitComplex* temp, float* acf, vDSP_Length n)
{
  ...
  vDSP_vclr(...); // pad with 0
  vDSP_ctoz(...); // convert to correct representation for fft
  vDSP_fft_zrip(..., FFT_FORWARD); // compute transform
  vDSP_zvcmul(...); // compute Conj(transform) * transform
  vDSP_fft_zrip(..., FFT_INVERSE); // inverse transform
  vDSP_ztoc(...); // convert back to the desired output representation
  ...
}

I tracked down the culprit to the zvcmul function. It actually works fine, when given correct inputs. The problem is that the FFT computation does not generate a “plain” complex vector as expected by zvcmul. Instead, it generates a special packed representation. We therefore need a special version of zvcmul that works with this packed representation.

zvcmul for packed complex arrays
1
2
3
4
5
6
7
8
9
10
11
12
13
14
void dsp_zvcmul_packed(DSPSplitComplex const* a, vDSP_Stride stride_a,
                       DSPSplitComplex const* b, vDSP_Stride stride_b,
                       DSPSplitComplex const* c, vDSP_Stride stride_c, vDSP_Length n)
{
  if (n < 1) {
    return;
  }

  float dc_real = a->realp[0] * b->realp[0];
  float nyquist_real = a->imagp[0] * b->imagp[0];
  vDSP_zvcmul(a, stride_a, b, stride_b, c, stride_c, n);
  c->realp[0] = dc_real;
  c->imagp[0] = nyquist_real;
}

The full implementation of the autocorrelation is then available in this gist.