After some algorithm design in R, I was ready to implement the result on iOS using the
Accelerate framework.
The R version had already given me some performance challenges, and the Accelerate version had its own twists.
The equivalent R code that I wanted to implement was
fast autocorrelation
acf.II <- function(x) {
  n <- length(x)
  x <- c(x, rep.int(0, n))
  transform <- fft(x)
  x <- fft(Conj(transform) * transform, inverse = TRUE)
  Re(x)[1:n] / (2 * n)
}
At a high level, the sequence of Accelerate calls needed to reproduce this result looks something like the following, but unit testing showed that it did not produce the correct output.
simplified Accelerate code
float dsp_acf_II_fft_unscaled(const FFTSetup setup, float const *signal,
                              DSPSplitComplex *temp, float *acf, vDSP_Length n)
{
    ...
    vDSP_vclr(...);                    // pad with 0
    vDSP_ctoz(...);                    // convert to correct representation for fft
    vDSP_fft_zrip(..., FFT_FORWARD);   // compute transform
    vDSP_zvcmul(...);                  // compute Conj(transform) * transform
    vDSP_fft_zrip(..., FFT_INVERSE);   // inverse transform
    vDSP_ztoc(...);                    // convert back to the desired output representation
    ...
}
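As an aside on the unit testing: one straightforward reference to compare against is a direct O(n²) loop that computes the same sums as the R function. The sketch below is my own illustration; acf_reference is a hypothetical helper, not part of the original code.

#include <Accelerate/Accelerate.h>

// Hypothetical O(n^2) reference used only in tests: acf[k] is the sum over t
// of signal[t] * signal[t + k], the same quantity the R function above
// returns. The FFT-based routine should match this up to the constant scale
// factor that the "unscaled" Accelerate version leaves to the caller.
static void acf_reference(float const *signal, float *acf, vDSP_Length n)
{
    for (vDSP_Length k = 0; k < n; ++k) {
        float sum = 0.0f;
        for (vDSP_Length t = 0; t + k < n; ++t) {
            sum += signal[t] * signal[t + k];
        }
        acf[k] = sum;
    }
}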
I tracked the culprit down to the zvcmul function. It actually works fine when given correct inputs. The problem is that the FFT computation does not produce the “plain” complex vector that zvcmul expects; instead, the real-to-complex FFT emits a special packed representation in which the purely real DC and Nyquist components share the first element. We therefore need a special version of zvcmul that works with this packed representation.
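To sketch what such a version could look like (my own reconstruction under the assumption that the forward transform was done with vDSP_fft_zrip, not the post's final code): vDSP_fft_zrip stores the DC term in realp[0] and the Nyquist term in imagp[0], so that element has to be squared separately, while vDSP_zvcmul can handle the remaining, genuinely complex bins.

#include <Accelerate/Accelerate.h>

// Hypothetical helper (not the original implementation): conjugate
// self-multiply of a vDSP_fft_zrip result. After a forward real FFT of the
// zero-padded 2*n-point signal, the split-complex vector holds n bins, but
// bin 0 is packed: realp[0] is the DC term and imagp[0] is the Nyquist term,
// both purely real.
static void acf_zvcmul_packed(const DSPSplitComplex *in,
                              const DSPSplitComplex *out,
                              vDSP_Length nBins)
{
    // DC and Nyquist are real, so Conj(x) * x is simply the square of each;
    // read them before the bulk multiply.
    const float dc      = in->realp[0];
    const float nyquist = in->imagp[0];

    // Conjugate-multiply the transform with itself. Bin 0 comes out wrong
    // here because it is treated as an ordinary complex number; it is
    // overwritten below.
    vDSP_zvcmul(in, 1, in, 1, out, 1, nBins);

    // Repack bin 0 with the correctly squared DC and Nyquist values.
    out->realp[0] = dc * dc;
    out->imagp[0] = nyquist * nyquist;
}

In the pipeline above, a helper along these lines would take the place of the plain vDSP_zvcmul call, leaving the rest of the forward/inverse FFT sequence unchanged.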