Invited talk at Music Technology Group, Universitat Pompeu Fabra, Barcelona, Spain, 2011/12/15.
This talk describes a computational framework for automatically predicting the emotion we perceive when listening to music. Based on this framework, novel emotion-based music organization, browsing, and retrieval methods can be built to give users an intuitive, easy-to-use, and effective way to access music information.
1. Dec. 2011 @ MTG, UPF
Dimensional
Music Emotion
Recognition
Yi-Hsuan Yang
Assistant Research Fellow
Music & Audio Computing (MAC) Lab
Research Center for IT Innovation
Academia Sinica
2. Music & Emotion
Music conveys emotion and modulates our mood
Music emotion recognition (MER)
Understand how humans perceive/feel emotion when listening to music
Develop systems for emotion-based music retrieval
3. Why Do We Listen to Music?
Motive Ratio
to express, release, and influence emotions 47%
to relax and settle down 33%
for enjoyment, fun, and pleasure 22%
as company and background sound 16%
because it makes me feel good 13%
because it's a basic need, I can't live without it 12%
because I like/love music 11%
to get energized 9%
to evoke memories 4%
Expression, Perception, and Induction of Musical Emotions: A Review and a Questionnaire Study of Everyday Listening, Patrik N. Juslin and Petri Laukka, Journal of New Music Research, 2004
4. Categories of Emotion
Expressed (intended) emotion
What a performer tries to express
Perceived emotion
What a listener perceives as being expressed in music
Usually the same as the expressed emotion
Felt (induced) emotion
What a listener actually feels
Strongly influenced by the context of music listening
(environment, mood)
9. Categorical vs. Dimensional
Categorical approach
  Pros: intuitive; natural language; atomic description
  Cons: lacks a unifying model; ambiguous; subjective; difficult to offer fine-grained differentiation
Dimensional approach
  Pros: focuses on a few dimensions; good for a user interface
  Cons: less intuitive; semantic loss in the projection; difficult to obtain ground truth
10. Q: No Consensus on Mood Taxonomy
Work, number of classes, and emotion descriptions:
  Katayose et al. [icpr98], 4: gloomy, urbane, pathetic, serious
  Feng et al. [sigir03], 4: happy, angry, fear, sad
  Li et al. [ismir03] / Wieczorkowska et al. [imtci04], 13: happy, light, graceful, dreamy, longing, dark, sacred, dramatic, agitated, frustrated, mysterious, passionate, bluesy
  Wang et al. [icsp04], 6: joyous, robust, restless, lyrical, sober, gloomy
  Tolos et al. [ccnc05], 3: happy, aggressive, melancholic+calm
  Lu et al. [taslp06], 4: exuberant, anxious/frantic, depressed, content
  Yang et al. [mm06], 4: happy, angry, sad, relaxed
  Skowronek et al. [ismir07], 12: arousing, angry, calming, carefree, cheerful, emotional, loving, peaceful, powerful, sad, restless, tender
  Wu et al. [mmm08], 8: happy, light, easy, touching, sad, sublime, grand, exciting
  Hu et al. [ismir08], 5: passionate, cheerful, bittersweet, witty, aggressive
  Trohidis et al. [ismir08], 6: surprised, happy, relaxed, quiet, sad, angry
12. Granularity of Emotion Description
Small set of emotion classes
  Insufficient compared to the richness of our perception
  e.g., happy, sad, angry, relaxed
Large set of emotion classes
  Difficult to obtain reliable ground truth data
  e.g., Acerbic, Aggressive, Ambitious, Amiable, Angry, Bittersweet, Bright, Brittle, Calm, Carefree, Cathartic, Cerebral, Cheerful, Circular, Clinical, Cold, Confident, Delicate, Dramatic, Dreamy, Druggy, Earnest, Eccentric, Elegant, Energetic, Enigmatic, Epic, Exciting, Exuberant, Fierce, Fiery, Fun, Gentle, Gloomy, Greasy, Happy, ...
13. Sol: Describing Emotions in Emotion Space
Arousal: activation, activity; energy and stimulation level
Valence: pleasantness; positive and negative affective states
[psp80]
14. The Dimensional Approach
Strength
No need to decide which and how many emotion categories to use
Generalizes MER from the categorical domain to a real-valued domain
Easy to compare different computational models
(Figure: the valence-arousal plane)
15. The Dimensional Approach
Weakness
Semantic loss due to projection
Blurs important psychological distinctions
3rd dimension: potency [psy07]
  e.g., angry vs. afraid, proud vs. shameful, interested vs. disappointed
4th dimension: unpredictability
  e.g., surprised, tense vs. afraid, contempt vs. disgust
16. Music Retrieval in VA Space
Provides a simple means for a 2D user interface
  Pick a point
  Draw a trajectory
Useful for mobile devices with small display space
Demo
(Figure: picking a point and drawing a trajectory in the valence-arousal plane; a retrieval sketch follows after this slide)
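A minimal sketch of the "pick a point" retrieval idea described on the slide above, assuming songs already have predicted valence-arousal (VA) values: the query point is matched to the nearest songs by Euclidean distance. The library, titles, and values are made-up placeholders, not the talk's actual system.

```python
# Sketch: retrieve the songs whose predicted VA values lie closest to a query point.
import numpy as np

# Hypothetical library: each row is (valence, arousal) predicted for one song.
va = np.array([[0.8, 0.6],    # e.g. an upbeat song
               [-0.5, 0.7],   # aggressive
               [-0.6, -0.4],  # sad
               [0.4, -0.5]])  # calm
titles = ["upbeat", "aggressive", "sad", "calm"]

def retrieve(query, k=2):
    """Return the k songs nearest to the query point in VA space."""
    dist = np.linalg.norm(va - np.asarray(query), axis=1)
    return [titles[i] for i in np.argsort(dist)[:k]]

print(retrieve((0.7, 0.5)))   # a "happy/excited" query point
```

A trajectory query can be handled the same way by retrieving the nearest song for each point along the drawn path.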
17. Q: How to Predict Emotion Values?
Transformation-based approach [mm06]
  Consider the four quadrants of the VA plane
  Perform 4-class mood classification
  Apply the following transformation, where uk denotes the likelihood of quadrant k:
    arousal = u1 + u2 - u3 - u4
    valence = u1 + u4 - u2 - u3
  Not rigorous (a small sketch of this transformation follows after this slide)
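A minimal sketch of the quadrant-to-VA transformation shown on the slide above, assuming the usual quadrant numbering of the VA plane (Q1 = positive valence, high arousal; Q2 = negative valence, high arousal; Q3 = negative valence, low arousal; Q4 = positive valence, low arousal). The likelihood values are made up.

```python
# Sketch: convert 4-class quadrant likelihoods into (valence, arousal) values,
# following the transformation on the slide above. u[k] is the classifier's
# likelihood for quadrant k+1.
def quadrants_to_va(u):
    u1, u2, u3, u4 = u
    arousal = u1 + u2 - u3 - u4
    valence = u1 + u4 - u2 - u3
    return valence, arousal

print(quadrants_to_va([0.6, 0.2, 0.1, 0.1]))  # -> (0.4, 0.6): a happy-ish, energetic song
```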
18. Sol: Perform Regression
Given a feature vector x, predict a numerical value y
Given N training inputs (xi, yi), 1 <= i <= N, where xi is the feature vector and yi is the numerical value to be predicted, train a regression model f(·) such that the following mean squared error (MSE) is minimized (a sketch follows after this slide):
  min_f (1/N) * sum_{i=1..N} (yi - f(xi))^2
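A minimal sketch of the regression step described on the slide above: fit a model f(·) to minimize the MSE between predicted and annotated emotion values. It uses support vector regression (SVR), one of the regressors compared later in the talk; the features and labels below are random placeholders, not real audio data or the talk's dataset.

```python
# Sketch: train a regressor to minimize the MSE on (feature, emotion value) pairs.
import numpy as np
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))          # 100 clips x 20 audio features (fake)
y = rng.uniform(-1.0, 1.0, size=100)    # annotated valence in [-1, 1] (fake)

model = SVR(kernel="rbf").fit(X, y)     # f(.) fitted to the training data
print("training MSE:", mean_squared_error(y, model.predict(X)))
```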
19. Computational Framework [taslp08]
Predict the VA values
  Train a regression model f(·) that minimizes the mean squared error (MSE):
    min_f (1/N) * sum_{i=1..N} (yi - f(xi))^2
    yi: numerical emotion value (target); xi: feature vector (input); f(xi): prediction result (output)
  One regressor for valence; one for arousal
  e.g., linear regression: f(xi) = w^T xi + b, i.e., emotion value = sum_j wj xij + b
(Block diagram: training data go through feature extraction, and their manual annotations feed regressor training; test data go through feature extraction and the trained regressor for automatic prediction of the emotion value. A sketch follows after this slide.)
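A minimal sketch of the linear-regression instance of the framework on the slide above: one regressor for valence and one for arousal, each of the form f(x) = w^T x + b, fitted by least squares (which minimizes the MSE). The feature and annotation matrices are random placeholders.

```python
# Sketch: fit two linear regressors (valence, arousal) by least squares.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))              # audio features (fake)
Y = rng.uniform(-1, 1, size=(100, 2))       # columns: valence, arousal (fake)

Xb = np.hstack([X, np.ones((len(X), 1))])   # append a column of 1s for the bias b
W, *_ = np.linalg.lstsq(Xb, Y, rcond=None)  # least-squares solution, one column per target

def predict(x):
    """Return (valence, arousal) for one feature vector x."""
    return np.append(x, 1.0) @ W

print(predict(X[0]))
```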
20. Obtain Music Emotion Rating
Manual annotation
Rate the VA values of each song
Ordinal rating scale
Scroll bar
(Block diagram: the framework of the previous slide; this slide concerns the manual-annotation stage)
21. Evaluation of Emotion Rating
User study
  1240 Chinese pop songs, each 30 seconds long
  666 subjects; each rates 8 randomly selected songs
Subjective evaluation
  Easiness of annotating emotion
  Within-subject reliability: compared with the same subject's annotations one month later
  Between-subject reliability: compared with other subjects' annotations
Results, rated from 1 to 5 (strongly disagree to strongly agree):
  Emotion rating: easiness 2.82, within-subject reliability 2.92, between-subject reliability 2.81
22. AnnoEmo: GUI for Emotion Rating [hcm07]
Encourages differentiation between songs
Drag & drop a song to modify its annotation; click a song to listen again
Demo
23. Cognitive Load is Still High
Determining exact VA values is not that easy
Difficult to ensure consistency
  Does dist(0.5, 0.8) = dist(-0.2, 0.1) in terms of our emotion perception?
  Does a rating of 0.7 mean the same thing to two different subjects?
(Figure: example ratings 0.5, 0.8, -0.2, and 0.1 marked on a scale from -1 to 1)
24. Sol: Ranking Instead of Rating [taslp11a]
Determine the position of a song
  by its relative ranking with respect to other songs
  rather than by exact emotion values
(Figure: songs ordered from positive valence (= 1) to negative valence (= -1), contrasting relative ranking with exact rating:
  Oh Happy Day
  I Want to Hold Your Hand - The Beatles
  I Feel Good - James Brown
  What a Wonderful World - Louis Armstrong
  Into the Woods - My Morning Jacket
  The Christmas Song
  C'est La Vie
  Labita - Lisa Ono
  Just the Way You Are - Billy Joel
  Perfect Day - Lou Reed
  When a Man Loves a Woman - Michael Bolton
  Smells Like Teen Spirit - Nirvana)
25. Ranking-Based Emotion Annotation
Emotion tournament
  Requires only n - 1 pairwise comparisons ("Which song is more positive?")
  The global ordering can later be approximated by a greedy algorithm [jair99] (a sketch follows after this slide)
(Figure: example tournament over eight songs a-h; the accumulated scores a=0, b=3, c=1, d=0, e=0, f=7, g=0, h=1 give the ordering f > b > c = h > a = d = e = g)
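A minimal sketch of turning pairwise "which is more positive?" judgments into a global ordering by sorting on accumulated scores. This is a simplified stand-in for the greedy ordering algorithm cited on the slide [jair99], and the comparisons below are made up for illustration.

```python
# Sketch: derive a global ordering of songs from pairwise comparison outcomes.
from collections import defaultdict

comparisons = [("f", "b"), ("b", "c"), ("f", "h"), ("b", "a"),
               ("c", "d"), ("f", "e"), ("h", "g")]   # (winner, loser) pairs

score = defaultdict(int)
for winner, loser in comparisons:
    score[winner] += 1          # one point per comparison won
    score[loser] += 0           # ensure the loser also appears in the dict

ordering = sorted(score, key=score.get, reverse=True)
print(ordering)                 # e.g. ['f', 'b', 'c', 'h', ...]
```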
31. Q: Subjective Issue
(Figure: scatter of annotations in the valence-arousal plane; each circle represents the emotion annotation given to a music piece by one subject)
32. Sol: Probabilistic MER [taslp11b]
Predicts the probability distribution P(e|d) of the perceived emotion e for a music piece d (a sketch follows after this slide)
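A simplified illustration of the idea of a probabilistic prediction P(e|d): instead of a single VA point, a song is described by a distribution over the VA plane. Here a bivariate Gaussian is fitted to made-up subject annotations of one piece; this is only an illustration, not the method of [taslp11b].

```python
# Sketch: model a song's perceived emotion as a distribution over the VA plane.
import numpy as np
from scipy.stats import multivariate_normal

# Hypothetical (valence, arousal) annotations of one song by several subjects.
annotations = np.array([[0.6, 0.4], [0.4, 0.5], [0.7, 0.2], [0.5, 0.6], [0.3, 0.3]])

mean = annotations.mean(axis=0)
cov = np.cov(annotations, rowvar=False)
p_e_given_d = multivariate_normal(mean=mean, cov=cov)

print("density at (0.5, 0.4):", p_e_given_d.pdf([0.5, 0.4]))
```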
33. Sol: Personalized MER [sigir09]
From P(e|d) to P(e|d, u)
General regressor → personalized regressor
Utilize user feedback (a sketch follows after this slide)
(Block diagram: the training/prediction framework of slide 19 extended with a personalization stage; user feedback collected during emotion-based retrieval is used to adapt the predicted emotion values)
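A minimal sketch of one way to personalize a general emotion regressor with a user's feedback: learn a small residual model on the user's corrections and add it to the general prediction. This is only an illustration, not the method of [sigir09]; the data and models are placeholders.

```python
# Sketch: general regressor + per-user residual model trained on feedback.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X_all, y_all = rng.normal(size=(200, 20)), rng.uniform(-1, 1, 200)
general = SVR().fit(X_all, y_all)                  # trained on everyone's annotations (fake)

# A few songs re-annotated by one user (feedback).
X_fb, y_fb = X_all[:10], y_all[:10] + 0.3          # this user rates things more positively
residual = Ridge(alpha=1.0).fit(X_fb, y_fb - general.predict(X_fb))

def personalized_predict(x):
    x = x.reshape(1, -1)
    return general.predict(x) + residual.predict(x)

print(personalized_predict(X_all[11]))
```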
34. Evaluation Setup
Training data
195 Western/Japanese/Chinese pop songs
25-sec segment that is representative of the song
If the segment is too long, the emotion may not be homogeneous
If it is too short, the listener may not hear enough
Manual annotation
253 subjects; each rates 12 songs
Rate the VA values in 11 ordinal levels (0, 1, 2, ..., 10)
Each song is annotated by 10+ subjects
Ground truth obtained by averaging
35. Quantitative Result
Method: R2 of valence / R2 of arousal
  Multiple linear regression: 0.109 / 0.568
  AdaBoost.RT [ijcnn04]: 0.117 / 0.553
  SVR (support vector regression) [sc04]: 0.222 / 0.570
  SVR + RReliefF (feature selection) [ml03]: 0.254 / 0.609
Result
  R2: squared correlation between y and f(x) (a sketch follows after this slide)
  Valence prediction is challenging
    Valence: 0.25 ~ 0.35
    Arousal: 0.60 ~ 0.85
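A minimal sketch of the R2 measure used in the table above: the squared Pearson correlation between the annotated values y and the predictions f(x). The numbers below are made up.

```python
# Sketch: compute R^2 as the squared correlation between annotations and predictions.
import numpy as np

y   = np.array([0.8, -0.2, 0.1, 0.6, -0.5])   # ground-truth valence (fake)
f_x = np.array([0.6, -0.1, 0.3, 0.4, -0.4])   # predicted valence (fake)

r = np.corrcoef(y, f_x)[0, 1]   # Pearson correlation coefficient
print("R^2 =", r ** 2)
```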
36. Qualitative Result
(Figure: example songs placed in the valence-arousal plane by the regressor, including:
  No No No Part 2 - Beyonce
  Out Ta Get Me - Guns N' Roses
  You're Crazy - Guns N' Roses
  All Of Me - 50 Cent
  Bodies - Sex Pistols
  New York Giants - Big Pun
  I've Got To See You Again - Norah Jones
  Mammas Don't Let Your Babies Grow Up To Be Cowboys - Willie Nelson
  If Only In The Heaven's Eyes - NSYNC
  Live For The One I Love - Celine Dion
  The Last Resort - The Eagles
  Why Do I Have To Choose - Willie Nelson)
37. Missing 1: Temporal Context of Music
Sweet Anticipation by David Huron
Music's most expressive qualities probably relate to structural changes across time
Music emotion can also vary within an excerpt [tsmc06] (a sketch follows after this slide)
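A minimal sketch of one way to capture emotion variation within an excerpt, as mentioned on the slide above: slide a window over the audio, extract features per window, and predict a VA value for each window. The feature extractor and the trained regressors are placeholders, not the talk's actual system.

```python
# Sketch: per-window VA prediction yielding an emotion trajectory over time.
import numpy as np

def extract_features(segment):
    # Placeholder for real audio feature extraction (e.g. timbre and rhythm features).
    return np.array([segment.mean(), segment.std()])

def emotion_trajectory(audio, sr, valence_model, arousal_model, win_sec=1.0):
    """Return one (valence, arousal) prediction per non-overlapping window."""
    hop = int(win_sec * sr)
    trajectory = []
    for start in range(0, len(audio) - hop + 1, hop):
        x = extract_features(audio[start:start + hop]).reshape(1, -1)
        trajectory.append((valence_model.predict(x)[0], arousal_model.predict(x)[0]))
    return trajectory

if __name__ == "__main__":
    class _Dummy:                      # stand-in for trained regressors
        def predict(self, x):
            return x.sum(axis=1)
    audio = np.random.default_rng(0).normal(size=5 * 22050)   # 5 seconds of fake audio
    print(emotion_trajectory(audio, 22050, _Dummy(), _Dummy()))
```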
38. Missing 2: Context of Music Listening
Listening mood/context
Familiarity/associated memory
Preference of the singer/performer/song
Social relationship
39. Conclusion
A computational framework for predicting numerical
emotion values
Generalizes MER from categorical to dimensional
Resolves some issues of emotion description
Rank instead of rate
2D user interface for music retrieval
Valence & subjectivity
Content & context
Acknowledgement
Prof. Homer Chen, National Taiwan University
40. Reference
Music Emotion Recognition, CRC Press, 2011
A regression approach to music emotion recognition, IEEE TASLP, 2008 (cited by 76)
Ranking-based emotion recognition for music organization and retrieval, IEEE TASLP, 2011
Prediction of the distribution of perceived music emotions using discrete samples, IEEE TASLP, 2011
Exploiting online tags for music emotion classification, ACM TOMCCAP, 2011
Machine recognition of music emotion: A review, ACM TIST, 2012