Retrieving musical records from a corpus of Information, using an audio input as a query is not an easy task. Various approaches try to solve the problem modelling the query and the corpus of Information as an array of hashes calculated from the chroma features of the audio input.
Scope of this talk is to introduce a novel approach in calculating such hashes, considering the intervals of the most intense pitches of sequential chroma vectors.
Building on the theoretical introduction, a prototype will show you this approach in action with Apache Solr with a sample dataset and the benefits of positional queries.
Challenges and future works will follow up as conclusive considerations.
1 of 42
Download to read offline
More Related Content
Interval Hashing Based Ranking
1. London Information Retrieval Meetup
21 OCT 2019
Music Information Retrieval Take 2
Interval Hashing Based Ranking
Andrea Gazzarini, Software Engineer
21th October 2019
2. London Information Retrieval Meetup
? Software Engineer (1999-)
? ¡°Hermit¡± Software Engineer (2010-)
? Java & Information Retrieval Passionate
? Apache Qpid (past) Committer
? Husband & Father
? Bass Player
Andrea Gazzarini, ¡°Gazza¡±
3. London Information Retrieval Meetup
Sease
Search Services
¡ñ Open Source Enthusiasts
¡ñ Apache Lucene/Solr experts
! Community Contributors
¡ñ Active Researchers
¡ñ Hot Trends : Learning To Rank, Document Similarity,
Search Quality Evaluation, Relevancy Tuning
4. London Information Retrieval Meetup
?Music Information Retrieval (MIR)
? Query by Humming and Cover Song Detection
? Chroma Feature
? Music Language Essentials
? Ranked Interval Hashing
? Next steps
? Q&A
Agenda
5. London Information Retrieval Meetup
MIR is concerned with the extraction, analysis and usage of information about any kind of music
entity (e.g. a song or a music artist) on any representation level (for example, audio signal, symbolic MIDI
representation of a piece of music, or name of a music artist).¡±
Schedl, M.: Automatically extracting, analyzing and visualizing information on music artists from the world wide web.
Dissertation, Johannes Kepler University, Wien (2003)
Music information retrieval (MIR) is the interdisciplinary science of retrieving information from
music. MIR is a small but growing ?eld of research with many real-world applications. Those involved in
MIR may have a background in in musicology, psychoacoustics, psychology, academic music study,
signal processing, informatics, machine learning, optical music recognition, computational intelligence or
some combination of these.
https://en.wikipedia.org/wiki/Music_information_retrieval
Music Information Retrieval (MIR)
6. London Information Retrieval Meetup
AUDIO IDENTIFICATION
GENRE IDENTIFICATION
TRANSCRIPTION RECOMMENDATION
SYMBOLIC SIMILARITY
MOOD
SOURCE SEPARATION
INSTRUMENT RECOGNITION
TEMPO ESTIMATION
SCORE ALIGNMENT
BEAT TRACKING
QUERY BY HUMMINGAUDIO IDENTIFICATION
INSTRUMENT RECOGNITION
GENRE IDENTIFICATION
TRANSCRIPTION RECOMMENDATION
TEMPO ESTIMATION
SONG STRUCTURE SCORE ALIGNMENT
COVER SONG DETECTION
SYMBOLIC SIMILARITY
KEY DETECTION
BEAT TRACKING
MOOD
SOURCE SEPARATION
Music Information Retrieval (MIR)
7. London Information Retrieval Meetup
? Music Information Retrieval (MIR)
?Query By Humming / Cover Song Detection
? Chroma Feature
? Music Language Essentials
? Ranked Interval Hashing
? Next steps
? Q&A
Agenda
8. London Information Retrieval Meetup
Query by humming (QbH) is a music retrieval
system that branches off the original classification
systems of title, artist, composer, and genre.
It normally applies to songs or other music with a
distinct single theme or melody. The system involves
taking a user-hummed melody (input query) and
comparing it to an existing database.
(Wikipedia)
What is it?
Query by Humming/Playing
The main challenge of such query paradigm is the
difference, in terms of audio signal, between the
input query and the stored dataset.
9. London Information Retrieval Meetup
A cover song is an alternative version/recording of
a previously recorded song.
A cover can differ from the original song in several
features: key, timbre, tempo, structure, arrangement,
language, even melody and lyrics.
The goal of a cover song detection search algorithm
is to identify, for a query audio track, recordings that
belong to the same conceptual work/composition.
The main challenges are:
? find a set of features that capture the common
aspects of the target compositions
? give a rank to each response items (generally not
so important)
What is it?
Cover Song Detection
11. London Information Retrieval Meetup
? Music Information Retrieval (MIR)
? Query by Humming and Cover Song Detection
?Chroma Feature
? Music Language Essentials
? Ranked Interval Hashing
? Next steps
? Q&A
Agenda
12. London Information Retrieval Meetup
Chroma features are a powerful representation for
music audio in which the entire spectrum is
projected onto 12 bins representing the 12 distinct
semitones (or chroma) of the musical octave.
The base assumption is that in music, notes exactly
one octave apart are perceived as ¡°similar¡±. As
consequence of that, working on the distribution of
chroma without the absolute frequency can give
useful musical information about the audio
It¡¯s a kind of analysis which bridges between low-
level and middle-level features, moving the audio
signal representation towards something which is
more readable, from a functional perspective.
Chroma Feature
Chroma Features (1/5)
Source: Wikipedia
13. London Information Retrieval Meetup
Chroma Features (2/5)
C
D
D# D
DD
D
D
DC
C
C E
F
F#
A
A#
B
F#
F#
BB
EF
D
C
D
F
14. London Information Retrieval Meetup
A Chroma Matrix consists of a 12 * tn matrix where
each vector represents the twelve pitches
strengths for a given instant t.
Sometimes the matrix is inverted so the same
information described above is available at row-level.
Chroma features can be computed using different
tools available on the internet including:
? LabROSA audio analysis tools (partially portable
to Octave)
? opensource tools written in several languages
Chroma Matrix
Chroma Features (3/5)
Pitches strengths at instant t1 where you can see the top (i.e. most intense) 3 classes
are A#, B and A
15. London Information Retrieval Meetup
Time
C
D
E
F
G
A
B
C#
D#
F#
G#
A#
A A A A A A C A F F F F F F FG C C C C C C D C B B B B B B C B
N
O
I
S
E
Chroma Features (4/5)
17. London Information Retrieval Meetup
? Music Information Retrieval (MIR)
? Query by Humming and Cover Song Detection
? Chroma Feature
?Music Language Essentials
? Ranked Interval Hashing
? Next steps
? Q&A
Agenda
18. London Information Retrieval Meetup
A note is used for denoting a sound, its pitch and duration
A sound is the audio signal produced by a vibrating body
Notes are associated to graphical symbols (for indicating the pitch and the duration)
Two notes with the same fundamental frequency in a ratio of any integer power of two are perceived as similar. As
consequence of that, we say they belong to the same pitch class
A note is also used for denoting a pitch class. The Western Music Theory individuates 12 pitch classes
Notes and Pitch classes are associated to mnemonic codes (e.g. C,D,E,F,G,A,B or DO,RE,MI,FA,SOL,LA,SI)
C D E
F G A B
C
B A
G F E D
C
C#
D# F# G#
A#
Bb Ab Gb Eb Db
Music Language Essentials
The distance between two subsequent notes is called a semitone
19. London Information Retrieval Meetup
Text Music
Letter Note
Word
Phrase
Sentence
Chord
Ghost Note
Phrase
Text vs Music
20. London Information Retrieval Meetup
Time Signature
Key Signature
Clef
Tempo
Note
Reference Chord
Chord
Score music representation
21. London Information Retrieval Meetup
? Music Information Retrieval (MIR)
? Query by Humming and Cover Song Detection
? Chroma Feature
? Music Language Essentials
?Ranked Interval Hashing
? Next steps
? Q&A
Agenda
22. London Information Retrieval Meetup
We¡¯ve already seen Chroma Features analysis
normalises the entire spectrogram in classes,
loosing the absolute frequency.
That introduces a normalisation which actually puts
on the same layer notes belonging to the same
class.
This strictly relates to a concept known in music
theory as ¡°Inversion¡±
¡°Inversion, in music, is the rearrangement of the top-
to-bottom elements in an interval, a chord, a melody,
or a group of contrapuntal lines of music.¡±
(Source: Encyclopedia Britannica)
Inversions
Assumption #1: Inversions as ¡°Synonyms¡±
=
= =
AND =
IF
AND
THEN
¡« ¡«
=¡« =¡« =¡«
23. London Information Retrieval Meetup
In Music Theory, an interval is defined as the
distance between two notes.
In Western Music Theory the minimum distance
between two notes is called semitone. As
consequence of that, an interval measures the
number of semitones that occurs between two
notes.
The intervals concept allows to translate any
sequence of notes.
If we have a melody starting from a given note (e.g.
C) we can translate the whole sequence using a
different starting point (e.g. A): the most important
thing is that intervals must remain the same.
Intervals
Assumption #2: Intervals instead of Notes!
24. London Information Retrieval Meetup
A simple matrix which defines all the possible
intervals between two notes.
Each interval has a precise name in Western Music
Theory (e.g. 0 = unison), but for our purposes seeing
the number (of semitones) is more important
The interval between two arbitrary notes can be
easily computed by using the following formula:
(defn semitones [x y]
(if (>= y x)
(- y x)
(+ (- 11 (- x 1)) y)))
Intervals Table
Intervals Table
C C# D D# E F F# G G# A A# B
C 0 1 2 3 4 5 6 7 8 9 10 11
C# 11 0 1 2 3 4 5 6 7 8 9 10
D 10 11 0 1 2 3 4 5 6 7 8 9
D# 9 10 11 0 1 2 3 4 5 6 7 8
E 8 9 10 11 0 1 2 3 4 5 6 7
F 7 8 9 10 11 0 1 2 3 4 5 6
F# 6 7 8 9 10 11 0 1 2 3 4 5
G 5 6 7 8 9 10 11 0 1 2 3 4
G
#
4 5 6 7 8 9 10 11 0 1 2 3
A 3 4 5 6 7 8 9 10 11 0 1 2
A# 2 3 4 5 6 7 8 9 10 11 0 1
B 1 2 3 4 5 6 7 8 9 10 11 0
25. London Information Retrieval Meetup
For each Chroma Vector we keep the top k
most intense pitch energies.
Each class is denoted by its ordinal position
(the examples on the left provides, for
simplicity, the corresponding mnemonic code
as well).
The underlying assumption is that in order
to represent and summarise a chroma
vector, the stronger a class energy is, the
better.
? INPUT => n Chroma vectors
? OUTPUT => n Rank vectors
Top K classes
Step #1: Ranks
0.28956729767718553 0.29918483818230696 0.29750046253718804 0.26970075757159021
0.26570971680808941 0.28762092331555628 0.27230350897098865 0.26479060237734442
0.24563184404410343 0.35945972816613164 0.20234795661509283 0.19691813882136551
0.21751928582300245 0.31682129361484279 0.21327678582822374 0.19345434437896755
0.19067933460430175 0.19899201366683592 0.20626346569405868 0.11096279563650845
0.18864723689615112 0.15732581183385463 0.16582685461399227 0.13454776152251802
0.17659990800974107 0.12781101466846442 0.12998904712393031 0.17361679104251077
0.15805875199726557 0.11070659008562342 0.12210923914829071 0.14658222225322309
0.14652990682934677 0.10462518086147332 0.11147004101461088 0.11962189455960689
0.13012578265891636 0.08802297436725277 0.08795354261988087 0.07726694847307212
0.10615064267867232 0.06285819867921958 0.06280150731135973 0.04137698779765421
0.07978125487239938 0.04066800213125138 0.03900988562456922 0.01849582605529331
0 - C 2 - D 0 - C 0 - C
1 - C# 3 - D# 1 - C# 1 - C#
2 - D 0 - C 3 - D# 2 - D
3 - D# 1 - C# 4 - E 3 - D#
4 - E 4 - E 2 - D 6 - F#
26. London Information Retrieval Meetup
For each pair of Rank vectors, for each
pair of pitch classes sharing the same
index, we compute the corresponding
interval.
The underlying assumption is that we want
to have a relative model which works
regardless the input key/tonality.
? INPUT => n Rank Vectors
? OUTPUT => n - 1 Interval Vectors
Distances
Step #2: Intervals
0 - C 2 - D 0 - C 0 - C
1 - C# 3 - D# 1 - C# 1 - C#
2 - D 0 - C 3 - D# 2 - D
3 - D# 1 - C# 4 - E 3 - D#
4 - E 4 - E 2 - D 6 - F#
(defn semitones [x y]
(if (>= y x) (- y x) (+ (- 11 (- x 1)) y)))
2 10 0
2 10 0
10 3 11
10 3 11
0 10 4
27. London Information Retrieval Meetup
Step #3: Hashing
Each Interval Vector is reduced to a
number (hash).
The hash uniquely represents a given
intervals distribution.
? INPUT => n-1 Interval Vectors
? OUTPUT => 1 Hash Vector
Summarisation
2 10 0 ¡
2 10 0 ¡
10 3 11 ¡
10 3 11 ¡
0 10 4 ¡
¡Æxi *12^i
18476 213106 103536 ¡
28. London Information Retrieval Meetup
Step #4: Partitions
The hash vector is partitioned using a size and an
overlap parameters (see examples on the left).
An audio input is therefore represented as a list of
partitions derived from the hash vector where
? each partition is a term
? ordering is preserved:
? in the term metadata (e.g. position increment)
? implicitly in the term value
Bags of Bags of Words
39483 4149 1440 3213 33232 44256 6654 4432 4149 7843 ¡
Partition of 3 hashes without overlap
[39483, 4149, 1440] [3213, 33232, 44256] [6654,4432, 4149], ¡
Partition of 3 hashes with overlap = 1
[39483, 4149, 1440] [1440, 3213, 33232] [33232, 44256, 6654], ¡
Partition of 5 hashes without overlap
[39483,4149,1440,3213,33232] [44256,6654,4432,4149,7843] ¡
(Examples)
29. London Information Retrieval Meetup
Step #5: Multiple choices for the output
{
"title": "Zombie",
"hashes": [¡°39483,4149,1440¡±, ¡°3213,33232,44256¡±, ¡°6654,4432, 4149¡±, ¡ ]
}
Only one partition type
Only one partition type with different overlaps
{
"title": "Zombie",
¡°hashes_overlap_0¡±: [¡°39483,4149,1440¡±, ¡°3213,33232,44256¡±, ¡°6654,4432, 4149¡±, ¡ ],
¡°hashes_overlap_1¡±: [¡°39483,4149,1440¡±, ¡°1440,3213,33232¡±, ¡°33232,44256,6654¡±, ¡ ],
¡
}
Multiple partition types
{
"title": "Zombie",
¡°hashes_3¡±: [¡°39483,4149,1440¡±, ¡°3213,33232,44256¡±, ¡°6654,4432, 4149¡±, ¡ ],
¡°hashes_5¡±: [¡°39483,4149,1440,3213,33232¡±, ¡°33232,44256,6654,4432,4149¡±, ¡ ],
¡
}
30. London Information Retrieval Meetup
Sample test query
Sample Data
Sample test data
Money, Pink Floyd
https://www.youtube.com/watch?v=-0kcet4aPpQ
{Q1 - Main Bass Riff (original tonality)
Q2 - Main Bass Riff (different tonality)
Q3 - Descendent pentatonic scale (different tonality)
Zombie, Cranberries
https://www.youtube.com/watch?v=6Ejga4kJUts
{
Q1 - Main Bass Riff (original tonality)
Q2 - Main Melody
Q3 - Chorus
Stand By Me, John Lennon
https://www.youtube.com/watch?v=YqB8Dm65X18
{Q1 - Main Bass Riff (different tonality)
Q2 - Main Melody
32. London Information Retrieval Meetup
Nice shot #1
Q2 - Money (Pink Floyd): main bass riff (different tonality)
Q1 - Money (Pink Floyd): main bass riff (original tonality)
Same melody (bass riff) in different tonalities yield similar results!
P@1 P@2 P@10 NDCG
33. London Information Retrieval Meetup
Nice shot #2
Q2 - Zombie (Cranberries): main melody (different tonality)
Q1 - Zombie (Cranberries): main bass riff (original tonality)
Position matters!
P@1 P@2 P@10 NDCG
34. London Information Retrieval Meetup
Nice shot #3
Q1 - Zombie (Cranberries): main bass riff (original tonality)
Good similarity even if the query pattern is not in result.
P@1
35. London Information Retrieval Meetup
To be improved
Q1 - Zombie (Cranberries): main bass riff (original tonality)
Same query as before, red items have a very different ¡°shape¡±
P@1
36. London Information Retrieval Meetup
To be improved
If we were looking for ¡°Money¡± these would look like great results¡
¡but unfortunately the input query here is the ¡°Stand by me¡± bass line
37. London Information Retrieval Meetup
? Music Information Retrieval (MIR)
? Query by Humming and Cover Song Detection
? Chroma Feature
? Music Language Essentials
? Ranked Interval Hashing
?Next Steps
? Q&A
Agenda
38. London Information Retrieval Meetup
Further areas of investigation
? Different types of Chroma Features
? ¡°Stop¡± chroma vectors / intervals / energies
? Input data understanding & pre-processing
? Complementary approaches
? Additional audio features
? Query time
? Machine Learning
39. London Information Retrieval Meetup
Chroma Matrix Clean Up
Original Threshold = 50% Threshold = 70% Threshold = 80% Top frequency
40. London Information Retrieval Meetup
? Music Information Retrieval (MIR)
? Query by Humming and Cover Song Detection
? Chroma Feature
? Music Language Essentials
? Ranked Interval Hashing
? Next Steps
?Q&A
Agenda
41. London Information Retrieval Meetup
Links
Chroma Feature Analysis and Synthesis
https://labrosa.ee.columbia.edu/matlab/chroma-ansyn/
Chroma (Java)
https://github.com/adiyoss/Chroma
CHROMA TOOLBOX: MATLAB IMPLEMENTATIONS FOR
EXTRACTING VARIANTS OF CHROMA-BASED AUDIO FEATURES
https://www.academia.edu/30760846/
CHROMA_TOOLBOX_MATLAB_IMPLEMENTATIONS_FOR_EXTRACTING_VARIANTS_OF_CH
ROMA-BASED_AUDIO_FEATURES
FALCON: FAst Lucene-based Cover sOng identification
https://www.researchgate.net/publication/221571301_FALCON_FAst_Lucene-based_Cover_sOng_identification
Music Theory
https://en.wikipedia.org/wiki/Music_theory
Inversions
https://en.wikipedia.org/wiki/Inversion_(music)
Intervals
https://en.wikipedia.org/wiki/Interval_(music)
Chroma Feature
https://en.wikipedia.org/wiki/Chroma_feature
42. London Information Retrieval Meetup
21 OCT 2019
Thank you!
Music Information Retrieval Take 2
Interval Hashing Based Ranking
Andrea Gazzarini, Software Engineer
21th October 2019