Analysing the Serato DJ timecode signal

Serato DJ (and previously Scratch Live) is one of the leading DJ software applications out there, and in my personal experience by far the most widely used. I have played almost all of my DJ gigs with the Serato software since January 2007. Serato DJ and the old Scratch Live, together with an appropriate sound card, form a so-called DVS setup, which is short for Digital Vinyl System. Nowadays these software solutions do much more than what the name DVS implies, but the core function is still the same as it was in 2004 when Serato Scratch Live was introduced.

The fundamental idea is to take a traditional DJ setup with either turntables or CD players (or both), but instead of playing actual records (vinyl or CD), the music comes from a computer. The main point is that the playback is still controlled with the vinyl records or CDs, just like you would with real records, but the actual audio data is stored digitally on the computer. So instead of music, the physical records used with a DVS contain a timecode signal that is sent to the software, which in turn controls the playback of the audio file. Here’s an old example picture of a DVS setup using the Rane SL2 USB interface (sound card):

Rane Serato DVS setup


Alright, that’s enough introduction, let’s get down to business. What I want to know is how it actually works, and how accurate it can be. To control the playback of the digital audio files and enable things like scratching, where the direction and speed of the playback change rapidly, the timecode has to describe the current playback speed, direction and position accurately and reliably. The signal found on a Serato control vinyl or CD is a regular stereo audio signal, and the official description of it goes like this:

“There are two parts to the Serato DJ Control Vinyl: The directional tone, and the NoiseMap™. Listening to the control vinyl, the directional tone is the 1 kHz tone. The NoiseMap™ sounds like random noise over the top of the tone. The directional tone provides the current speed and direction of the record, while the NoiseMap™ tells the software precisely where on the record the needle is currently.”

Indeed, when comparing the control signal to a pure 1 kHz tone, it sounds quite noisy.


But how does it actually work? Well, let’s fire up Matlab and have a closer look.















From the frequency spectrum we can confirm that the signal is indeed exactly 1 kHz, and looking at the waveforms, both channels contain the same sine wave, but the phase of the right channel is 90 degrees ahead. This is confirmed in the “padded right channel” picture, where the right channel has been shifted 90 degrees backwards and the waveforms line up nicely. The distinctly varying amplitudes of the sine wave suggest the use of amplitude-shift keying (ASK) modulation, which is a digital version of amplitude modulation. From here we can already deduce two of the three control parameters: speed and direction.



Detecting speed changes is really straightforward. Changing the speed, aka pitch, linearly affects the frequency, so a 1% speed increase corresponds to the frequency:

 f_{+1\%} = 1000 \ Hz \times 1.01 = 1010 \ Hz

And a 16% speed decrease:

 f_{-16\%} = 1000 \ Hz \times 0.84 = 840 \ Hz

Therefore one can just do a quick Fourier transform, for example using the STFT, to get the frequency and then calculate the corresponding speed. The performance-limiting factor here is the frequency resolution of the Fourier transform, i.e. how small a change in frequency can be detected and how long it takes. The frequency resolution of the FFT is simply:

 r = \dfrac{F_s}{N},  where  F_s  is the sampling frequency and  N  is the number of FFT points (samples).
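To get a feel for the numbers, here is a minimal sketch that tabulates the resolution formula for a few FFT lengths and then reads the speed of a synthetic tone from the FFT peak. The FFT lengths 4096 and 1024 and the 1010 Hz test tone are just my example values, and the signal is a clean sine, not the real control signal.

```matlab
% Sketch: frequency resolution vs. window length (example values assumed)
Fs = 44100;                      % sampling frequency (Hz)
N  = [44100 4096 1024];          % FFT lengths to compare (assumed examples)
r  = Fs ./ N;                    % frequency resolution r = Fs/N (Hz)
T  = N ./ Fs;                    % time needed to collect N samples (s)
speed_step = 100 * r / 1000;     % smallest visible speed step (% of 1 kHz)

% Read the speed of a synthetic +1 % tone from the FFT peak
t = (0:Fs-1) / Fs;               % one second of samples
x = sin(2*pi*1010*t);            % 1010 Hz test tone (assumption)
X = abs(fft(x));
[~, k] = max(X(1:Fs/2));         % strongest bin in the positive half
f_est = (k - 1) * Fs / numel(x); % bin index -> frequency (Hz)
speed = f_est / 1000;            % relative playback speed
```

With a one second window the peak lands exactly on the 1010 Hz bin, so speed comes out as 1.01. The shorter windows update faster but can only resolve coarser steps, which is the trade-off discussed here.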

So to get a resolution of 1 Hz, which corresponds to a 0.1% speed change, we would need  N = 44100  when  F_s = 44100 \ Hz . This means it would take one second to read a change in the speed with this accuracy (since we need 44100 samples for the FFT). One second between speed readings is not good. We also don’t need the whole frequency spectrum all the way to 20 000 Hz, which we are now calculating every time. If the playback speed stays between half and double the normal speed, the frequency will be between 500 Hz and 2000 Hz. Therefore, for the FFT calculation, the sampling rate could be lowered to just above 4000 Hz, which gives a usable frequency range of 0-2000 Hz that covers the range of the control signal. This cuts the FFT size needed for 1 Hz resolution to about 4000 points, which saves computation, but it does not shorten the measurement: 4000 samples at 4000 Hz still span one second, because  r = \frac{F_s}{N} = \frac{1}{T}  depends only on the length  T  of the analysed window. What we can do is trade accuracy for speed. Taking 4000 samples at the original 44100 Hz rate, the time between speed measurements is ideally:

 t = \frac{4000}{44100} = 0.0907 \ s \ or \ approx. \ 91 \ ms

and the frequency resolution becomes  r = \frac{44100}{4000} \approx 11 \ Hz , or roughly a 1.1% speed change.

Of course there will be some overhead from all the calculations, but I think an overall update interval of around a hundred milliseconds should be enough even for fast scratching, as it would allow the system to detect ten speed changes per second. Serato probably has a better, more optimized detection and calculation method for this; these calculations were just to see that it’s feasible. One possible method would be to stay in the time domain and look at the length of one period, since  f = \frac{1}{T} . Also, most DJ hardware uses an ADC sampling rate of 48 kHz or higher instead of the 44.1 kHz used here, which changes the results a bit.
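The time domain idea can be sketched roughly like this: find the upward zero crossings, interpolate between the two samples around each crossing, and turn the distance between crossings into a frequency. This is only my illustration on a clean synthetic 840 Hz tone (the -16% example from earlier); the start phase of 0.3 rad is an arbitrary assumption, and I have no idea whether Serato does anything like this.

```matlab
% Sketch: speed from the period length, f = 1/T (synthetic test tone)
Fs = 44100;                          % sampling frequency (Hz)
f_true = 840;                        % -16 % speed example
t = (0:Fs/10-1) / Fs;                % 100 ms of samples
x = sin(2*pi*f_true*t + 0.3);        % arbitrary start phase (assumption)

% samples just before each upward zero crossing
idx = find(x(1:end-1) < 0 & x(2:end) >= 0);

% linear interpolation gives a sub-sample crossing position
frac = -x(idx) ./ (x(idx+1) - x(idx));
tc = (idx - 1 + frac) / Fs;          % crossing times (s)

T = diff(tc);                        % length of each period (s)
f_est = 1 ./ T;                      % one frequency estimate per period
speed = mean(f_est) / 1000;          % relative playback speed
```

Each period gives its own estimate, so at normal speed there would be a new reading about once per millisecond. On the real signal the amplitude modulation and noise would make the crossings less clean, so some averaging would probably be needed.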



Detecting the direction is also very simple. There’s a reason for the 90 degree phase shift between the channels: when the signal is played forward, the right channel’s sine wave is ahead, and when played backwards, the left channel is ahead by 90 degrees. This could be detected in many ways, and ideally it only takes two consecutive samples from both channels to determine the direction, which would be really fast. Even if we take one whole period of the signal to detect the direction, it would still only take around one millisecond at normal speed.
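One of the many possible ways is to treat each stereo sample as a point (right, left) on a circle and check which way it rotates between consecutive samples. The sketch below does that with the sign of a 2-D cross product on ideal sine waves. The helper name dir_of is made up for this example, and averaging over about 10 ms is just my choice to make the result robust.

```matlab
% Sketch: direction from the rotation of consecutive (right, left) samples
Fs = 44100;                          % sampling frequency (Hz)
t = (0:440) / Fs;                    % about 10 ms of samples
left  = sin(2*pi*1000*t);            % left channel
right = sin(2*pi*1000*t + pi/2);     % right channel, 90 degrees ahead

% cross product of consecutive sample pairs: positive when the phase
% advances (forward), negative when it runs backwards (hypothetical helper)
dir_of = @(l, r) sign(mean(r(1:end-1).*l(2:end) - l(1:end-1).*r(2:end)));

d_fwd = dir_of(left, right);                  % forward playback
d_rev = dir_of(fliplr(left), fliplr(right));  % same samples in reverse
```

Here d_fwd is +1 and d_rev is -1. Each term of the product only needs two consecutive stereo samples, which matches the idea above. Swapping the left and right arguments flips the sign.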






Now comes the hard part. This is where the so-called NoiseMap comes in. We need to have some sort of sequence or pattern in the signal that tells us where we are, i.e. a position marker.




Firstly, there’s what appears to be amplitude shift keying (the spikes in the waveform). The simplest way to interpret this would be to just look at the positive part and assume two amplitude levels: one period encodes one binary value, where a high amplitude corresponds to binary one and a low amplitude to zero. This gives us a binary sequence that follows the amplitudes:



The first 64 binary values extracted with this method:


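The per-period thresholding can be tried out on a synthetic signal where the answer is known. In this sketch I build a 1 kHz sine whose amplitude is 1.0 for a one and 0.5 for a zero (both levels are my assumptions) and read the bits back with the same 0.8 threshold. I use 48 kHz here only because it gives a whole number of samples per period.

```matlab
% Sketch: two-level ASK on a synthetic 1 kHz carrier (assumed levels)
Fs = 48000; f0 = 1000;               % 48 samples per period
spp = Fs / f0;                       % samples per period
bits = [1 0 1 1 0 0 1 0];            % example bit pattern (assumption)

amp = 0.5 + 0.5*bits;                % 1 -> 1.0, 0 -> 0.5
env = kron(amp, ones(1, spp));       % hold each level for one period
n = 0:numel(env)-1;
x = env .* sin(2*pi*f0*n/Fs);        % amplitude-keyed sine wave

% decode: compare the peak of every period against the threshold
frames = reshape(x, spp, []);        % one column per period
decoded = double(max(frames, [], 1) > 0.8);
```

The decoded vector matches bits exactly. On the real control signal the periods don't line up with a fixed sample grid, which is why the full listing further down splits the signal at zero crossings instead of using reshape.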

This could work in practice: for example, we could use a 16-bit sequence as one position marker. Reading this marker would take 16 periods, which corresponds to 16 milliseconds at normal speed (one period is  \frac{1}{1000} \ s ). The A-side on the Serato vinyl is 10 minutes (600 s) long and the B-side is 15 minutes (900 s), so there are 600 000 and 900 000 wave periods per side. There would be

 \frac{1000}{16} = 62.5

position markers per one second, which sounds reasonable, and

 \frac{600 000}{16} = 37500   (side A), and

 \frac{900 000}{16} = 56250   (side B)

position markers for the whole track, which is doable since with 16 bits we can represent

 2^{16} = 65536

different values. Of course, by using more amplitude levels, and for example by taking the absolute value of the sine wave, it would be possible to pack a lot more data into one period.
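The same arithmetic as a few lines of Matlab, in case you want to play with other marker lengths. The variable names are mine, and the one-bit-per-period scheme is still just my guess at how the signal could be read.

```matlab
% Sketch: capacity check for fixed-length position markers
f0 = 1000;                           % carrier frequency (periods per second)
bits_per_marker = 16;                % assumed marker length
side_len = [600 900];                % side A and side B length (s)

periods = side_len * f0;             % wave periods per side
markers = periods / bits_per_marker; % position markers per side
marker_time = bits_per_marker / f0;  % time to read one marker (s)
markers_per_sec = 1 / marker_time;   % markers per second at normal speed
capacity = 2^bits_per_marker;        % distinct values one marker can hold
min_bits = ceil(log2(markers));      % smallest marker length that is enough
```

Both sides need at least 16 bits for every marker to be unique, so 16 is also the shortest marker length that works with this scheme.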


However, in this old interview from 2011, Dylan Wood from Serato says the following:

Behind the Scenes: Interview with Serato R&D

“The difference between timecode and our noisemap is that our noisemap is not a sequential series of markers that the software counts to know where it is, it’s more a pattern. Imagine that timecode is like going along a road, and you [need to] pass a certain number of markers until you know where you are. Our unique noise map is more like an island – if you’re a fisherman and you know that island really really well, you can be plonked at any point on that island and know immediately where you are, it’s more like a map, a picture that our software knows REALLY well.”


Maybe this is just marketing talk, but this answer suggests that instead of a one-dimensional sequence like the one we just obtained, the noisemap is more akin to a 2D pattern, i.e. a picture. Also, what has been bothering me is why the control signal sounds so noisy when the frequency spectrum shows only a single tone. So let’s look at the spectrogram of the signal to see what is happening with the frequency in relation to time.

First, here is a reference spectrogram for a regular 1 kHz tone:



Serato control signal:






There is clearly something interesting going on here. The first Serato spectrogram looks a little bit like a QR code, and zooming in, there seem to be patterns in the spectrum. This might indeed be what Serato calls their noisemap. This seems like a good place to stop my analysis, since everything further would just be pure speculation.



Here’s all the Matlab magic:

% Akseli Lukkarila / DJ Esgrove

clear all;

% read audio file, 10 sec sample without the silence in the beginning
[xt, Fs] = audioread('Serato_Control_CD.wav', [1363, 1363+10*44100]); 

N = max(size(xt));    % samples
Ts = 1/Fs;            % sample time 
t = 0:Ts:(N-1)*Ts;    % time vector

xt = xt/max(xt(:,1)); % normalize to 1

xleft = xt(:, 1);     % left channel
xright = xt(:, 2);    % right channel

%% Time plots

plot(t, xleft); grid on;
title({'Serato Control Signal';'left channel'});
xlabel('Time (ms)');
axis([0 0.1 -1.05 1.05]);
set(gca,'XTickLabel',{0 10 20 30 40 50 60 70 80 90 100});
set(gcf, 'PaperUnits','centimeters', 'PaperPosition',[0 0 16 9])
print(gcf, 'Serato_left', '-dpng', '-r300'); 

plot(t, xright, 'r'); grid on;
title({'Serato Control Signal';'right channel'});
xlabel('Time (ms)');
axis([0 0.1 -1.05 1.05]);
set(gca,'XTickLabel',{0 10 20 30 40 50 60 70 80 90 100});
set(gcf, 'PaperUnits','centimeters', 'PaperPosition',[0 0 16 9])
print(gcf, 'Serato_right', '-dpng', '-r300'); 

plot(t, xt); grid on;
title({'Serato Control Signal';'stereo'});
xlabel('Time (ms)');
axis([0 0.1 -1.05 1.05]);
set(gca,'XTickLabel',{0 10 20 30 40 50 60 70 80 90 100});
legend('Left', 'Right', 'Location', 'Best');
set(gcf, 'PaperUnits','centimeters', 'PaperPosition',[0 0 16 9])
print(gcf, 'Serato_stereo', '-dpng', '-r300'); 

plot(t, xt); grid on;
title({'Serato Control Signal';'stereo'});
xlabel('Time (ms)');
axis([0 0.01 -1.05 1.05]);
set(gca,'XTick', 0:0.001:0.01);
set(gca,'XTickLabel',{0 1 2 3 4 5 6 7 8 9 10});
legend('Left', 'Right', 'Location', 'Best');
set(gcf, 'PaperUnits','centimeters', 'PaperPosition',[0 0 16 9])
print(gcf, 'Serato_stereo2', '-dpng', '-r300'); 

%% Frequency plots

Xf = 2*abs((fft(xleft)/N)); % single-sided magnitude
Xf = 20*log10(Xf); % dB (log10, not the natural logarithm log)

f0 = Fs/N; % frequency resolution (Hz)
f1 = 0:f0:(N-1)*f0; % frequency vector

semilogx(f1, Xf); grid on;
title({'Serato Scratch Live Control Signal';'frequency spectrum'});
ylabel('Magnitude (dB)');
xlabel('Frequency (Hz)');
axis([20 20000 -90 3]);
set(gca,'XTick',[20 50 100 200 500 1000 2000 5000 10000 20000])
set(gca,'XTickLabel',{20 50 100 200 500 '1k' '2k' '5k' '10k' '20k'})
set(gcf, 'PaperUnits','centimeters', 'PaperPosition',[0 0 16 9])
print(gcf, 'Serato_fft', '-dpng', '-r300'); 

plot(f1, Xf); grid on;
title({'Serato Scratch Live Control Signal';'frequency spectrum'});
ylabel('Magnitude (dB)');
xlabel('Frequency (Hz)');
axis([900 1100 -90 3]);
set(gca,'XTick',[950 1000 1050])
set(gca,'XTickLabel',{950 1000 1050})
set(gcf, 'PaperUnits','centimeters', 'PaperPosition',[0 0 16 9])
print(gcf, 'Serato_fft2', '-dpng', '-r300'); 

%% Pure sine wave reference

% export 10 sec clip
audiowrite('Serato Control Signal clip.wav', xt, Fs);

% 1 kHz sine wave for reference
sinref = zeros(max(size(t)), 2);
sinref(:,1) = sin(2*pi*1000.*t);
sinref(:,2) = sin(2*pi*1000.*t);
audiowrite('1 kHz sine reference.wav', sinref, Fs)

%% Spectrograms

% reference sine
spectrogram(sinref(:,1),1024, 512,[],Fs,'yaxis','MinThreshold',-100)
title('1 kHz sine, 1024 sample window'); axis([0 10 0 2]);
set(gcf, 'PaperUnits','centimeters', 'PaperPosition',[0 0 16 9])
print(gcf, 'Sine_spectrogram1', '-dpng', '-r300'); 

spectrogram(sinref(:,1),2^12, 2^11,[],Fs,'yaxis','MinThreshold',-100)
title('1 kHz sine, 4096 sample window'); axis([0 10 0 2]);
set(gcf, 'PaperUnits','centimeters', 'PaperPosition',[0 0 16 9])
print(gcf, 'Sine_spectrogram2', '-dpng', '-r300'); 

% serato
spectrogram(xleft(1:441001),1024, 512,[],Fs,'yaxis', 'MinThreshold',-100)
title('Serato timecode, 1024 sample window'); axis([0 10 0 2]);
set(gcf, 'PaperUnits','centimeters', 'PaperPosition',[0 0 16 9])
print(gcf, 'Serato_spectrogram1', '-dpng', '-r300'); 

spectrogram(xleft(1:441001),2^12, 2^11,[],Fs,'yaxis', 'MinThreshold',-100)
title('Serato timecode, 4096 sample window'); axis([0 2 0 2]);
set(gcf, 'PaperUnits','centimeters', 'PaperPosition',[0 0 16 9])
print(gcf, 'Serato_spectrogram2', '-dpng', '-r300'); 

spectrogram(xleft,2^11, 2^10,[],Fs,'yaxis', 'MinThreshold',-100)
title('Serato timecode, 2048 sample window'); axis([0.01 0.2 0 2]);
set(gcf, 'PaperUnits','centimeters', 'PaperPosition',[0 0 16 9])
print(gcf, 'Serato_spectrogram3', '-dpng', '-r300'); 

%% Phase shift

samples90deg = 44100/(4*1000); % samples for 1/4 wavelength for 1 kHz sine

% samples90deg = 11.025, rounded to 11 whole samples

% pad right channel 11 samples = 90 deg phase shift backwards
% -> waveforms should overlap
rightpad = padarray(xright,11,'pre'); % insert 11 zeros to beginning
padded = rightpad(1:441001);

plot(t, xleft, t, padded); grid on;
title({'Serato Scratch Live Control Signal';'padded right channel'});
xlabel('Time (ms)');
axis([0 0.02 -1.05 1.05]);
set(gca,'XTick', 0:0.001:0.02);
set(gca,'XTickLabel',{0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20});
legend('Left', 'Right', 'Location', 'Best');
set(gcf, 'PaperUnits','centimeters', 'PaperPosition',[0 0 16 9])
print(gcf, 'Serato_padded', '-dpng', '-r300'); 

%% Amplitude shift keying demodulation

plot(t, xleft); grid on;
title({'Serato Control Signal';'Amplitude shift keying'});
xlabel('Time (s)');
axis([0 0.2 -0.05 1.05]);
set(gcf, 'PaperUnits','centimeters', 'PaperPosition',[0 0 16 9])
print(gcf, 'Serato_ASK', '-dpng', '-r300'); 

% 1 kHz period in samples
period = 44100/1000; % = 44.1

plot(xleft); grid on; 
title({'Serato Control Signal';'amplitude shift keying'});
axis([439 1101 -0.05 1.05]);
set(gcf, 'PaperUnits','centimeters', 'PaperPosition',[0 0 16 9])
print(gcf, 'Serato_ASK2', '-dpng', '-r300'); 

% simple demodulation assuming every period encodes one bit
bit_array = []; a = 1;
for i = 2:max(size(xleft))
    % take one period of signal and convert to bit value
    if xleft(i-1) < 0 && xleft(i) >= 0 % upward zero crossing
        period = xleft(a:i); a = i; % one period
        if any(period > 0.8) % has a value larger than 0.8
            bit_array = [bit_array 1]; % append 1
        else
            bit_array = [bit_array 0]; % append 0
        end
    end
end

% Convert bit array to square wave representation
square = ones(441,1)*bit_array; % replicate each value 441 times
square = square(:); % array to column

s = 1:132000; % sample vector
plot(s, xleft(1:132000)); hold on
plot(s/10, square(1:132000), 'r', 'LineWidth', 0.7); grid on; 
title({'Serato Control Signal';'ASK demodulation'});
xlabel('Bits'); axis([1 2205 -0.05 1.05]);
set(gca,'XTick', 0:44.1:2205); set(gca,'XTickLabel',[]);
set(gca,'YTick', [0 1]); set(gca,'YTickLabel',[0 1]);
set(gcf, 'PaperUnits','centimeters', 'PaperPosition',[0 0 16 9])
print(gcf, 'Serato_bitmask', '-dpng', '-r300'); 

%% Bit array

x = 0:1:64;
stem(x, bit_array(1:max(size(x))), 'filled'); grid on; 
title({'Serato Control Signal';'bit values'});
xlabel('bit'); axis([0 64 -0.1 1.1]);
set(gca,'XTick', 0:8:64);
set(gca,'YTick', [0 1]); set(gca,'YTickLabel',[0 1]);
set(gcf, 'PaperUnits','centimeters', 'PaperPosition',[0 0 16 9])
print(gcf, 'Serato_bitvalues', '-dpng', '-r300'); 

%% Direction

t = 0:1/60000:0.1;

right = sin(2*pi*1000.*t);
left = sin(2*pi*1000.*t - pi/2);

plot(t(16:121), left(16:121), 'b'); hold on;
plot(t, right, 'r'); grid on;
title({'Serato Control Signal';'direction forward'});
xlabel('Time (ms)');
axis([0 0.002 -1.05 1.05]);
set(gca,'XTickLabel',{0 0.25 0.5 0.75 1 1.25 1.5 1.75 2});
legend('Left', 'Right', 'Location', 'Best');
set(gcf, 'PaperUnits','centimeters', 'PaperPosition',[0 0 16 9])
print(gcf, 'Serato_direction', '-dpng', '-r300'); 



2 thoughts on “Analysing the Serato DJ timecode signal”

  • Reticuli

    Would having the two output signals from the deck/player 180 degrees out of absolute phase (i.e. inverted polarity) cause Serato to play in reverse? Does its calibration mode and settings have the ability to compensate for this?

    • Akseli Lukkarila Post author

      I think inverting the polarity of the stereo signal wouldn’t affect the direction detection, but of course that depends on how it is implemented in the software. It is definitely possible to code it so that changing the polarity would not matter, as the phase shift between the left and right channel stays the same. It’s worth noting also that 180 degrees of phase shift in either direction is not strictly speaking the same as inverted polarity, phase shift just delays or advances the signal, whereas inverting the polarity means mirroring the signal (negative values become positive and vice versa). Sure, for a pure sine tone the end result is essentially the same but for more complex signals it is not.

If you mean delaying just one of the channels, then yes, it will invert the direction. Crossing the inputs (RCA connectors) does this also, meaning connecting the left channel to the right channel input and vice versa. There are no settings to combat this in the software, you just have to connect everything correctly. If for some reason the deck is running in the wrong direction, then it can be fixed by swapping the left and right signal connectors as mentioned above.