Detect beats, extract amplitude data from an audio file, using Node.js


Fork on Github

The Problem: Code my TJBot robot such that it listens to and enjoys good music. For a robot to actually enjoy music, it needs to ... well ... become aware of the beats (usually peaks) in said song, and then react to them. Update: here is my robot waving/dancing to the song.

The general theory is simple: convert the sound file into an array of continuous signal values, identify the values that occur above a set peak threshold, and voila ... you now know where the beats are!

In my search, I came across a really helpful blog post about beat detection (basically identifying the peaks above a given threshold, computing their frequency of occurrence, and estimating the overall beats/timing of the song). For consistency with other applications, I needed my app to work with Node.js, so the primary task was finding an appropriate library to assist with decoding an audio file. The post above relies on the HTML5 Web Audio API standard, which is not supported natively in Node.js. However, some good samaritans have started the amazing work of creating a Node.js port (web-audio-api), and that's what I finally used!

Converting sound (MP3, WAV, etc.) to a buffer array of PCM data

To store sound in digital format, analog sound data is represented using pulse-code modulation (PCM), a method for digitally representing sampled analog signals. It is the standard form of digital audio in computers, compact discs, digital telephony, and other digital audio applications.

In a PCM stream, the amplitude of the analog signal is sampled regularly at uniform intervals, and each sample is quantized to the nearest value within a range of digital steps.
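As a toy illustration of the sampling and quantization described above (my own example, not from the post; the sample rate and bit depth here are arbitrarily small for readability), here is one cycle of a sine wave digitized into 4-bit PCM:

```javascript
// Sketch: sample one cycle of a 1 Hz sine wave at 8 samples/second and
// quantize each sample to the nearest of 16 levels (4-bit PCM).
// All numbers here are illustrative, not from the original post.
var sampleRate = 8;      // samples per second
var levels = 16;         // 4-bit quantization => 16 digital steps
var samples = [];
for (var n = 0; n < sampleRate; n++) {
  var analog = Math.sin(2 * Math.PI * n / sampleRate);        // value in [-1, 1]
  // snap to the nearest of the available digital steps
  var quantized = Math.round(analog * (levels / 2 - 1)) / (levels / 2 - 1);
  samples.push(quantized);
}
console.log(samples);
```

A real WAV file does exactly this, just with thousands of samples per second and 16 or more bits per sample.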

An important thing to note here is that to interpret the PCM data we extract from our audio, we need to know the sample rate (i.e. the number of samples taken per second). As noted above, the web-audio-api library is used, via its decodeAudioData method.

var fs = require('fs')
var AudioContext = require('web-audio-api').AudioContext
var context = new AudioContext()
var pcmdata = [] ;
var samplerate = 0 ;

var soundfile = "sounds/sound.wav"

function decodeSoundFile(soundfile){
  console.log("decoding sound file ", soundfile, " ..... ")
  fs.readFile(soundfile, function(err, buf) {
    if (err) throw err
    context.decodeAudioData(buf, function(audioBuffer) {
      console.log(audioBuffer.numberOfChannels, audioBuffer.length, audioBuffer.sampleRate, audioBuffer.duration);
      pcmdata = audioBuffer.getChannelData(0) ; // samples from the first channel
      samplerate = audioBuffer.sampleRate ;     // store sample rate
      findPeaks(pcmdata, samplerate)
    }, function(err) { throw err })
  })
}

At the end of this, the variable pcmdata contains an array of values that represent your song/sound. The decodeAudioData method also returns the song's number of channels, length, sample rate, and duration.
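Since pcmdata holds one value per sample, a sample's index maps directly to its time offset in the song. Two small helpers (my own naming, not from the post) make that relationship explicit:

```javascript
// Sketch: convert between a sample index in the PCM array and the time
// (in seconds) at which it occurs. Helper names are my own, not the post's.
function indexToTime(index, sampleRate) {
  return index / sampleRate;
}

function timeToIndex(seconds, sampleRate) {
  return Math.round(seconds * sampleRate);
}

// e.g. at 44100 Hz, sample 88200 occurs 2 seconds into the song,
// and a 0.05-second window is 2205 samples wide
console.log(indexToTime(88200, 44100)); // 2
console.log(timeToIndex(0.05, 44100));  // 2205
```

This is the arithmetic the peak-extraction loop below relies on when it steps through the array in chunks.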

(Naive) Peak Extraction

For my use case, the intent is to recognize high-energy points or peaks within a song. These can be representative of a recurring beat or of instruments between beats (drums, snares, etc.). To do this, I loop through pcmdata using steps and a time sequence that correspond to the sample rate. For example, if the sample rate is 44100, then I analyze the PCM data in steps of 44100 samples each second. For better granularity (to ensure I catch changes in beats), I analyze at a smaller interval of 0.05 seconds. During each interval, I compute the maximum signal and subsequently compare it with the max signal from the previous 0.05-second interval. If the difference is more than a threshold (here 0.3), then I consider this to be a local peak. A pretty naive but practical approach. Further work can be done on the proper way to ascertain the threshold.

function findPeaks(pcmdata, samplerate){
  var interval = 0.05 * 1000 ; // sampling window size, in milliseconds
  var index = 0 ;
  var step = Math.round( samplerate * (interval/1000) ); // samples per window
  var max = 0 ;
  var prevmax = 0 ;
  var prevdiffthreshold = 0.3 ;

  //loop through song in time with sample rate
  var samplesound = setInterval(function() {
    if (index >= pcmdata.length) {
      clearInterval(samplesound) ;
      console.log("finished sampling sound")
      return ;
    }

    // find the max signal within the current window
    for(var i = index; i < index + step ; i++){
      max = pcmdata[i] > max ? pcmdata[i].toFixed(1) : max ;
    }

    // Spot a significant increase? Potential peak
    var bars = getbars(max) ;
    if(max - prevmax >= prevdiffthreshold){
      bars = bars + " == peak == "
    }

    // Print out mini equalizer on the command line
    console.log(bars, max)
    prevmax = max ; max = 0 ; index += step ;
  }, interval, pcmdata);
}
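On the open question of ascertaining the threshold: one option is to derive it from the data itself rather than hard-coding 0.3. The sketch below (my own suggestion, not part of the original post) flags a window as a peak when its maximum exceeds the mean of all window maxima by k standard deviations:

```javascript
// Sketch: derive a peak cutoff from the per-window maxima themselves,
// using mean + k * stddev as an adaptive threshold. This is a suggested
// refinement, not the approach used in the post above.
function adaptiveThreshold(windowMaxima, k) {
  var n = windowMaxima.length;
  var mean = windowMaxima.reduce(function(a, b) { return a + b; }, 0) / n;
  var variance = windowMaxima.reduce(function(a, b) {
    return a + (b - mean) * (b - mean);
  }, 0) / n;
  return mean + k * Math.sqrt(variance);
}

// e.g. with mostly quiet windows and two loud ones, only the loud
// windows clear the adaptive cutoff
var maxima = [0.1, 0.12, 0.7, 0.11, 0.68, 0.09];
var cutoff = adaptiveThreshold(maxima, 1.0);
var peaks = maxima.filter(function(m) { return m > cutoff; });
console.log(cutoff.toFixed(2), peaks); // 0.58 [ 0.7, 0.68 ]
```

The trade-off is that this requires a first pass over the song (or a running estimate) before peaks can be flagged, whereas the fixed threshold works on a single streaming pass.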

To visualize the signals obtained within each sampling window, I print out bars to represent the max signal within the window.

function getbars(val){
  var bars = ""
  for (var i = 0 ; i < val*50 + 2 ; i++){
    bars = bars + "|";
  }
  return bars ;
}

To ascertain correctness, I play the sound file being analyzed.

var exec = require('child_process').exec ;

function playsound(soundfile){
  // linux or raspi
  // var create_audio = exec('aplay '+soundfile, {maxBuffer: 1024 * 500}, function (error, stdout, stderr) {
  var create_audio = exec('ffplay -autoexit '+soundfile, {maxBuffer: 1024 * 500}, function (error, stdout, stderr) {
    if (error !== null) {
      console.log('exec error: ' + error);
    } else {
      //console.log(" finished ");
    }
  });
}

Next Steps

Beat extraction. The general approach is to identify the most repetitive peaks as indicative of a beat, and then compute the inter-peak interval. Sixty peaks within a 60-second interval might correspond to 60 beats per minute. Some good references that have implemented beat detection using the HTML5 Web Audio API (not Node.js) can be found here (by Jose Perez) and here (by Joe Sullivan).
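The inter-peak-interval idea can be sketched as follows: given the timestamps (in seconds) of detected peaks, take the median gap between consecutive peaks and invert it. This is my own rough sketch of the approach those references describe, not their code:

```javascript
// Sketch: estimate beats per minute from peak timestamps (in seconds).
// Uses the median inter-peak gap so that occasional spurious or missed
// peaks do not skew the estimate. My own sketch, not the referenced code.
function estimateBPM(peakTimes) {
  var gaps = [];
  for (var i = 1; i < peakTimes.length; i++) {
    gaps.push(peakTimes[i] - peakTimes[i - 1]);
  }
  if (gaps.length === 0) return 0; // not enough peaks to estimate
  gaps.sort(function(a, b) { return a - b; });
  var median = gaps[Math.floor(gaps.length / 2)];
  return 60 / median; // one beat per median-gap seconds
}

// e.g. peaks every 0.5 seconds => 120 BPM
console.log(estimateBPM([0, 0.5, 1.0, 1.5, 2.0])); // 120
```

A fuller implementation would also cluster the gaps (a song's drums produce gaps at multiples of the beat interval) before picking the dominant one.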




About Vykthur

Mobile and Web App Developer and Researcher. Passionate about learning, teaching, and recently - writing.
This entry was posted in Programming, Research, Tutorials, Uncategorized.