The algorithm presented in this article detects musical scales and chords from an audio signal. It is based on pattern recognition algorithms and a Hidden Markov Model classifier. The algorithm exploits the relations between tones, scales, chords and progressions to support the detection process and provide more accurate results.
The problem studied here belongs to harmonic analysis, the study of the harmonic structure of a musical composition. Today many original songs are performed and reworked by different performers, yet the audience can easily recognize the original song or its tonal expression. A song can be performed at a different tempo and with different musical instruments and still be recognized by a human audience.
Recognizing the musical scale or chord structure of a song is not an easy task even for experienced musicians, especially without a musical instrument in hand. This ability is frequently considered a gift. The chord recognition problem can be separated into two parts: the first is the extraction of measurable data (vectors) from the audio signal, and the second is the processing and recognition of this data.
The Discrete Fourier Transform (DFT) converts an audio signal represented in the time domain into the frequency domain, as a series of frequencies each with a specific intensity. The Fast Fourier Transform (FFT) is an algorithm for computing the DFT and, as the name implies, it is optimized for fast processing. Here the FFT is applied to the input audio signal and a set of frequencies is calculated as a result. From this set we need to keep only the frequencies of interest. Humans can hear sounds in the range of roughly 15 Hz to 20 kHz, so all frequencies outside the audible range are filtered out.
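As a minimal illustration of this step, the sketch below computes the magnitude spectrum of one excerpt and keeps only the strong components inside the audible band. The function names, the relative threshold and the naive pure-Python DFT are assumptions chosen for readability, not the implementation used in the article; a practical system would call an FFT routine instead.

```python
import cmath
import math

SAMPLE_RATE = 44100  # Hz, the sampling rate quoted in the article


def dft_magnitudes(frame):
    """Naive O(n^2) DFT; returns the magnitudes of bins 0..n//2."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        mags.append(abs(acc))
    return mags


def audible_peaks(frame, sample_rate=SAMPLE_RATE, lo=15.0, hi=20000.0,
                  rel_threshold=0.5):
    """Return (frequency, magnitude) pairs inside the audible band.

    rel_threshold is a hypothetical cut-off: a bin is kept only if its
    magnitude is at least this fraction of the strongest bin.
    """
    mags = dft_magnitudes(frame)
    n = len(frame)
    strongest = max(mags)
    peaks = []
    for k, mag in enumerate(mags):
        freq = k * sample_rate / n  # centre frequency of bin k
        if lo <= freq <= hi and strongest > 0 and mag >= rel_threshold * strongest:
            peaks.append((freq, mag))
    return peaks
```

For a pure sine wave whose frequency falls exactly on a DFT bin, the function returns that single frequency; real recordings produce several peaks per excerpt, which are then mapped to tones.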
In music theory each tone is identified with a specific frequency. There are 12 tones in total, and they repeat in different octaves.
Because of their imperfections, musical instruments cannot always produce sounds at those exact frequencies, and environmental conditions also affect and can modify the shape of the sound. Considering those limitations, the algorithm uses categorization techniques to assign each frequency to one of the 12 tones. For further simplification, tones are folded into a single octave; for example, tones C1 and C3 are treated as the single tone C.
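One common way to perform this categorization is to round each frequency to the nearest semitone of the equal-tempered scale and then discard the octave. The sketch below assumes the standard tuning reference A4 = 440 Hz and a tone list starting at A with 0-based indices; both are illustrative choices, and the article does not state which categorization technique it actually uses.

```python
import math

# The 12 tones in chromatic order, starting from A (assumed ordering).
TONES = ["A", "A#", "B", "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#"]
A4_HZ = 440.0  # assumed tuning reference


def tone_index(freq_hz):
    """Map a frequency to the index (0..11) of the nearest tone."""
    semitones_from_a4 = round(12 * math.log2(freq_hz / A4_HZ))
    return semitones_from_a4 % 12  # fold all octaves into one


def tone_name(freq_hz):
    return TONES[tone_index(freq_hz)]
```

With this mapping, 32.70 Hz (C1) and 130.81 Hz (C3) both yield C, and a slightly detuned 445 Hz is still classified as A.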
For each excerpt of the audio signal the algorithm calculates the number of occurrences of each of the 12 tones. With that information a vector of 12 elements (the chroma vector) can be created. Each element of the vector contains the number of occurrences of one tone in the given excerpt.
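The counting step can be sketched as follows. The input is assumed to be the list of frequencies detected in one excerpt, and the nearest-semitone mapping relative to A4 = 440 Hz is an assumption of this example rather than a detail given in the article.

```python
import math


def chroma_vector(frequencies, a4_hz=440.0):
    """Count how often each of the 12 tones occurs in one excerpt.

    Index 0 is A, index 3 is C, index 11 is G# (assumed ordering).
    """
    chroma = [0] * 12
    for freq in frequencies:
        if freq <= 0:
            continue  # skip values that cannot be mapped to a tone
        index = round(12 * math.log2(freq / a4_hz)) % 12
        chroma[index] += 1
    return chroma
```

For example, one C1 (32.70 Hz), two C2 (65.41 Hz) and one A4 (440 Hz) give a count of 3 at the position of C and 1 at the position of A.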
A high percentage of today's pop music is composed according to a predefined set of rules. Those rules are the subject of harmonic analysis in music theory. Tones can be grouped into chords or scales. For simplification we assume here that a chord is composed of no more than three different tones. Additionally, chords are divided into two groups: minor and major. Other groups of chords exist, but for simplicity we analyze only those two. There are 12 chords in each group, 24 in total.
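The 24 chords can be enumerated from the standard triad intervals: a major triad consists of the root plus the tones 4 and 7 semitones above it, and a minor triad of the root plus the tones 3 and 7 semitones above it. The helper names and the tone ordering in this sketch are illustrative assumptions.

```python
TONES = ["A", "A#", "B", "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#"]


def triad(root, minor=False):
    """Return the three tone names of the major or minor triad on `root`."""
    third = 3 if minor else 4  # semitones from the root to the third
    return [TONES[(root + step) % 12] for step in (0, third, 7)]


def all_chords():
    """Build the 24 detectable chords: 12 minor and 12 major triads."""
    chords = {}
    for root, name in enumerate(TONES):
        chords[name + "m"] = triad(root, minor=True)
        chords[name] = triad(root)
    return chords
```

Here `all_chords()["C"]` is `["C", "E", "G"]` and `all_chords()["Am"]` is `["A", "C", "E"]`.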
The sequence of chords in a song is a predictable process that follows certain rules. From a set of chords a musical scale can be identified, and the scale can in turn support prediction of the next chords in the song. The interactive circle of fifths is one method of detecting a chord progression. The numerals on the inner circle denote the chord order in the progression; an uppercase numeral is a major chord, a lowercase one a minor chord. So in this example the first chord could be C, the next D minor, then E minor and so on. This knowledge can further support the detection process.
Hidden Markov Model
A discrete Markov process model is used here to detect the musical scale from a series of tones. The system is implemented as a classifier based on a Hidden Markov Model. The model works with 24 symbols (the set of all detectable chords) and 12 states (the number of detectable scales).
The chord detection process starts with sampling the audio signal. The sampling frequency is 44.1 kHz, which produces 44,100 samples per second. A DFT is performed on every excerpt of samples, which yields a spectrum: the distribution of frequencies in that excerpt. The next step is the calculation of the chroma vector. In this step tones are collected from all octaves; for example, one occurrence of C1 and two occurrences of C2 result in three occurrences of C. Afterwards the values in the chroma vector are normalized to the range 0 to 100.
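Two small pieces of this pipeline can be sketched directly: cutting the sample stream into excerpts and normalizing a chroma vector. The excerpt length of 4096 samples and the choice to scale the largest element to 100 are assumptions of this example; the article specifies only the target range.

```python
def split_frames(samples, frame_size=4096):
    """Cut the sample stream into consecutive, non-overlapping excerpts.

    frame_size is a hypothetical value; a trailing partial excerpt is dropped.
    """
    last_start = len(samples) - frame_size
    return [samples[i:i + frame_size]
            for i in range(0, last_start + 1, frame_size)]


def normalize_chroma(counts):
    """Scale a chroma vector so that its largest element becomes 100."""
    peak = max(counts)
    if peak == 0:
        return [0.0] * len(counts)  # silent excerpt: nothing to scale
    return [100.0 * c / peak for c in counts]
```

At 44.1 kHz an excerpt of 4096 samples covers roughly 93 ms of audio, so about ten chroma vectors are produced per second.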
Once the chroma vector has been identified, the first part of the analysis is finished. The next part is chord detection from the given chroma vector.
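Each tone can be associated with a numerical value in chromatic order, for example A=1, A#=2, …, G#=12; the binary representation of a chord is then a 12-element sequence with a 1 at the position of every tone the chord contains and a 0 elsewhere.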
Taking into consideration the chroma vector calculated in the previous step and the binary values from the table above, the problem of chord detection reduces to finding the binary sequence that the chroma vector corresponds to. This pattern matching approach has its own disadvantages in practice. Some musical experts claim that the total number of chords that can be produced on a piano is 8400, while the proposed system can detect only 24. The remaining 8376 chords need to be classified into one of the 24 detectable chords. The solution used here is the minimal sequence distance approach, that is, choosing the sequence with the highest probability.
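The matching step can be sketched as a nearest-template search. The article does not name the distance measure, so this example assumes a squared Euclidean distance between the chroma vector (rescaled from 0..100 to 0..1) and each binary chord sequence; the function names and the 0-based tone ordering starting at A are likewise illustrative.

```python
TONES = ["A", "A#", "B", "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#"]


def binary_template(root, minor=False):
    """12-element binary sequence with a 1 for each tone of the triad."""
    bits = [0] * 12
    for step in (0, 3 if minor else 4, 7):
        bits[(root + step) % 12] = 1
    return bits


def closest_chords(chroma):
    """Return every chord whose template is at minimal distance.

    `chroma` holds 12 values in the range 0..100. More than one chord is
    returned when the minimal distance is shared.
    """
    best, best_dist = [], None
    for minor in (True, False):
        for root, name in enumerate(TONES):
            template = binary_template(root, minor)
            dist = sum((c / 100.0 - b) ** 2 for c, b in zip(chroma, template))
            label = name + ("m" if minor else "")
            if best_dist is None or dist < best_dist - 1e-9:
                best, best_dist = [label], dist
            elif abs(dist - best_dist) <= 1e-9:
                best.append(label)
    return best
```

A chroma vector containing exactly C, E and G matches only the chord C, while a vector containing just C and E is equally close to Am and C, so both are returned as candidates.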
Experimental results with different datasets showed that the minimal sequence distance can sometimes be the same for multiple candidates; in that case we use music theory knowledge to define the chord progression more accurately.
Detecting the musical scale of a song is very important because it largely determines the order of the previous and next chords in the progression. If the detected scale is C, then the probability that a chord outside the scale, such as F#, is played is minimal; in this case, if the system hesitates between C and F#, it will select chord C based on its probability distribution in scale C.
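A simplified version of this tie-breaking rule is sketched below. Instead of a full probability distribution it uses plain scale membership: candidates that belong to the detected scale are preferred. The six chords per major scale (major chords on degrees I, IV and V, minor chords on ii, iii and vi) follow standard music theory; the function names are hypothetical.

```python
TONES = ["A", "A#", "B", "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#"]

# Semitone offsets from the tonic of a major scale.
MAJOR_DEGREES = (0, 5, 7)  # I, IV, V are major chords
MINOR_DEGREES = (2, 4, 9)  # ii, iii, vi are minor chords


def chords_in_scale(tonic):
    """Return the set of major and minor chords of the major scale on `tonic`."""
    root = TONES.index(tonic)
    majors = {TONES[(root + d) % 12] for d in MAJOR_DEGREES}
    minors = {TONES[(root + d) % 12] + "m" for d in MINOR_DEGREES}
    return majors | minors


def resolve(candidates, tonic):
    """Keep only the candidates that belong to the detected scale.

    If none of them belongs to the scale, the list is returned unchanged.
    """
    allowed = chords_in_scale(tonic)
    in_scale = [c for c in candidates if c in allowed]
    return in_scale if in_scale else list(candidates)
```

With scale C detected, `resolve(["C", "F#"], "C")` returns `["C"]`, which mirrors the example in the text.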
For the scale detection problem we use an HMM-based classifier with 12 classes and 24 symbols. If we assign a numerical value to each chord in chromatic order, for example Am=0, A#m=1, …, A=12, A#=13, …, G#=23, then each scale can be represented as a sequence of numbers. For example, scale C contains Am=0, Dm=5, Em=7, C=15, F=20 and G=22, so the final sequence for this scale is 0, 5, 7, 15, 20, 22. Those sequences are used to train the classifier that is later used in the detection process.
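The sketch below shows how such sequences can be generated and how a discrete HMM with 12 states and 24 symbols could use them. It is only an illustration under stated assumptions: the emission and transition probabilities are set by hand (the values of `eps` and `stay` are hypothetical) rather than learned from training data as in the article, and the scale is read off with the normalized forward algorithm.

```python
TONES = ["A", "A#", "B", "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#"]
N_STATES, N_SYMBOLS = 12, 24  # scales and chords


def scale_sequence(tonic):
    """Chord numbers (Am=0 ... G#=23) of the six chords of a major scale."""
    minors = [(tonic + d) % 12 for d in (2, 4, 9)]       # ii, iii, vi
    majors = [12 + (tonic + d) % 12 for d in (0, 5, 7)]  # I, IV, V
    return sorted(minors + majors)


def emission(state, symbol, eps=0.06):
    """P(chord | scale): in-scale chords share 1 - eps, the other 18 share eps."""
    if symbol in scale_sequence(state):
        return (1.0 - eps) / 6
    return eps / (N_SYMBOLS - 6)


def transition(i, j, stay=0.9):
    """P(next scale j | current scale i): songs mostly stay in one scale."""
    return stay if i == j else (1.0 - stay) / (N_STATES - 1)


def scale_posterior(chords):
    """Normalized forward algorithm: P(current scale | chords so far).

    `chords` must be a non-empty list of chord numbers; the prior is uniform.
    """
    belief = [emission(s, chords[0]) / N_STATES for s in range(N_STATES)]
    total = sum(belief)
    belief = [b / total for b in belief]
    for obs in chords[1:]:
        belief = [
            emission(j, obs)
            * sum(belief[i] * transition(i, j) for i in range(N_STATES))
            for j in range(N_STATES)
        ]
        total = sum(belief)
        belief = [b / total for b in belief]
    return belief


def detect_scale(chords):
    """Return the name of the most probable scale for a chord sequence."""
    belief = scale_posterior(chords)
    return TONES[max(range(N_STATES), key=belief.__getitem__)]
```

Here `scale_sequence(3)` reproduces the sequence 0, 5, 7, 15, 20, 22 given for scale C, and the chord sequence C, F, G, Am (15, 20, 22, 0) is classified as scale C because no other major scale contains all four chords.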
At the beginning of a song the classifier uses only chords that were detected with full certainty as input to the scale detector. Once the scale is detected, that knowledge is used to support the chord detection process.
Chord Analyzer is a sample application built on this chord detection algorithm. The application can detect chords from an audio signal coming from a microphone, web camera, external audio input or internal audio device. The user first selects the audio source from a list.
Detections are displayed in three sections: Chord, Scale and Chroma vector. The Chord section displays the currently detected chord; if the system cannot decide on a single chord with confidence, all candidates are displayed here. The Scale section displays the detected scale together with all detected chords from that scale. The Chroma vector section shows the currently calculated chroma vector.
Results, limitations and improvements
In experimental testing this approach gives satisfactory results under specific conditions. For chords played on a single instrument the detection rate is 90%. The detection rate drops significantly if the song is played by multiple instruments, and it is lower for fast songs than for slower ones. We should also consider that the system uses knowledge from only two types of musical scales (minor and major), whereas music theory identifies many more scales and chord orders. In addition, a song can often use more than one musical scale.
The approach to chord detection presented in this article uses knowledge of musical scales and chord progressions to support the detection of the next chords. The algorithm can be further developed with parallel algorithms and with knowledge from multiple scales. Music, like any other art, does not follow strict mathematical rules, so detecting the next chord in a progression will remain a matter of probability.