Invited talk at Music Technology Group, Universitat Pompeu Fabra, Barcelona, Spain, 2011/12/15.
This talk describes a computational framework for automatically predicting the emotion we perceive when listening to music. Based on this framework, novel emotion-based music organization, browsing, and retrieval methods can be built to give users an intuitive, easy-to-use, and effective way to access music information.
1. Dec. 2011 @ MTG, UPF
Dimensional
Music Emotion
Recognition
Yi-Hsuan Yang
Assistant Research Fellow
Music & Audio Computing (MAC) Lab
Research Center for IT Innovation
Academia Sinica
2. Music & Emotion
Music conveys emotion and modulates our mood
Music emotion recognition (MER)
Understand how humans perceive/feel emotion when listening to music
Develop systems for emotion-based music retrieval
3. Why Do We Listen to Music?
Motive Ratio
to express, release, and influence emotions 47%
to relax and settle down 33%
for enjoyment, fun, and pleasure 22%
as company and background sound 16%
because it makes me feel good 13%
because it's a basic need, I can't live without it 12%
because I like/love music 11%
to get energized 9%
to evoke memories 4%
Expression, Perception, and Induction of Musical Emotions: A Review and a Questionnaire Study of Everyday Listening, Patrik N. Juslin and Petri Laukka, Journal of New Music Research, 2004
4. Categories of Emotion
Expressed (intended) emotion
What a performer tries to express
Perceived emotion
What a listener perceives as being expressed in music
Usually the same as the expressed emotion
Felt (induced) emotion
What a listener actually feels
Strongly influenced by the context of music listening
(environment, mood)
9. Categorical vs. Dimensional
Categorical approach
  Pros: intuitive; natural language; atomic description
  Cons: lacks a unifying model; ambiguous; subjective; difficult to offer fine-grained differentiation
Dimensional approach
  Pros: focuses on a few dimensions; good for a user interface
  Cons: less intuitive; semantic loss in the projection; difficult to obtain ground truth
10. Q: No Consensus on Mood Taxonomy
Work, number of classes, and emotion descriptions:
  Katayose et al. [icpr98], 4: gloomy, urbane, pathetic, serious
  Feng et al. [sigir03], 4: happy, angry, fear, sad
  Li et al. [ismir03] / Wieczorkowska et al. [imtci04], 13: happy, light, graceful, dreamy, longing, dark, sacred, dramatic, agitated, frustrated, mysterious, passionate, bluesy
  Wang et al. [icsp04], 6: joyous, robust, restless, lyrical, sober, gloomy
  Tolos et al. [ccnc05], 3: happy, aggressive, melancholic+calm
  Lu et al. [taslp06], 4: exuberant, anxious/frantic, depressed, content
  Yang et al. [mm06], 4: happy, angry, sad, relaxed
  Skowronek et al. [ismir07], 12: arousing, angry, calming, carefree, cheerful, emotional, loving, peaceful, powerful, sad, restless, tender
  Wu et al. [mmm08], 8: happy, light, easy, touching, sad, sublime, grand, exciting
  Hu et al. [ismir08], 5: passionate, cheerful, bittersweet, witty, aggressive
  Trohidis et al. [ismir08], 6: surprised, happy, relaxed, quiet, sad, angry
12. Granularity of Emotion Description
Small set of emotion classes
  Insufficient compared to the richness of our perception
  e.g., happy, sad, angry, relaxed
Large set of emotion classes
  Difficult to obtain reliable ground truth data
  e.g., Acerbic, Aggressive, Ambitious, Amiable, Angry, Bittersweet, Bright, Brittle, Calm, Carefree, Cathartic, Cerebral, Cheerful, Circular, Clinical, Cold, Confident, Delicate, Dramatic, Dreamy, Druggy, Earnest, Eccentric, Elegant, Energetic, Enigmatic, Epic, Exciting, Exuberant, Fierce, Fiery, Fun, Gentle, Gloomy, Greasy, Happy, ...
13. Sol: Describing Emotions in Emotion Space
Arousal: activation, activity; energy and stimulation level
Valence: pleasantness; positive and negative affective states
[psp80]
14. The Dimensional Approach
Strength
No need to decide which and how many emotion categories to use
Generalizes MER from the categorical domain to a real-valued domain
Easy to compare different computational models
(Figure: the valence-arousal plane)
15. The Dimensional Approach
Weakness
Semantic loss due to projection
Blurs important psychological distinctions
3rd dimension: potency [psy07]
  e.g., angry vs. afraid, proud vs. shameful, interested vs. disappointed
4th dimension: unpredictability
  e.g., surprised, tense vs. afraid, contempt vs. disgust
16. Music Retrieval in VA Space
Provides a simple means for a 2D user interface
  Pick a point
  Draw a trajectory
Useful for mobile devices with small display space
Demo
(Figure: picking a point and drawing a trajectory in the valence-arousal plane; a retrieval sketch follows after this slide)
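A minimal sketch of the "pick a point" retrieval idea described on the slide above, assuming songs already have predicted valence-arousal (VA) values: the query point is matched to the nearest songs by Euclidean distance. The library, titles, and values are made-up placeholders, not the talk's actual system.

```python
# Sketch: retrieve the songs whose predicted VA values lie closest to a query point.
import numpy as np

# Hypothetical library: each row is (valence, arousal) predicted for one song.
va = np.array([[0.8, 0.6],    # e.g. an upbeat song
               [-0.5, 0.7],   # aggressive
               [-0.6, -0.4],  # sad
               [0.4, -0.5]])  # calm
titles = ["upbeat", "aggressive", "sad", "calm"]

def retrieve(query, k=2):
    """Return the k songs nearest to the query point in VA space."""
    dist = np.linalg.norm(va - np.asarray(query), axis=1)
    return [titles[i] for i in np.argsort(dist)[:k]]

print(retrieve((0.7, 0.5)))   # a "happy/excited" query point
```

A trajectory query can be handled the same way by retrieving the nearest song for each point along the drawn path.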
17. Q: How to Predict Emotion Values?
Transformation-based approach [mm06]
  Consider the four quadrants of the VA plane
  Perform 4-class mood classification
  Apply the following transformation, where uk denotes the likelihood of quadrant k:
    arousal = u1 + u2 - u3 - u4
    valence = u1 + u4 - u2 - u3
  Not rigorous (a small sketch of this transformation follows after this slide)
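A minimal sketch of the quadrant-to-VA transformation shown on the slide above, assuming the usual quadrant numbering of the VA plane (Q1 = positive valence, high arousal; Q2 = negative valence, high arousal; Q3 = negative valence, low arousal; Q4 = positive valence, low arousal). The likelihood values are made up.

```python
# Sketch: convert 4-class quadrant likelihoods into (valence, arousal) values,
# following the transformation on the slide above. u[k] is the classifier's
# likelihood for quadrant k+1.
def quadrants_to_va(u):
    u1, u2, u3, u4 = u
    arousal = u1 + u2 - u3 - u4
    valence = u1 + u4 - u2 - u3
    return valence, arousal

print(quadrants_to_va([0.6, 0.2, 0.1, 0.1]))  # -> (0.4, 0.6): a happy-ish, energetic song
```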
18. Sol: Perform Regression
Given a feature vector x, predict a numerical value y
Given N training inputs (xi, yi), 1 <= i <= N, where xi is the feature vector and yi is the numerical value to be predicted, train a regression model f(·) such that the following mean squared error (MSE) is minimized (a sketch follows after this slide):
  min_f (1/N) * sum_{i=1..N} (yi - f(xi))^2
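A minimal sketch of the regression step described on the slide above: fit a model f(·) to minimize the MSE between predicted and annotated emotion values. It uses support vector regression (SVR), one of the regressors compared later in the talk; the features and labels below are random placeholders, not real audio data or the talk's dataset.

```python
# Sketch: train a regressor to minimize the MSE on (feature, emotion value) pairs.
import numpy as np
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))          # 100 clips x 20 audio features (fake)
y = rng.uniform(-1.0, 1.0, size=100)    # annotated valence in [-1, 1] (fake)

model = SVR(kernel="rbf").fit(X, y)     # f(.) fitted to the training data
print("training MSE:", mean_squared_error(y, model.predict(X)))
```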
19. Computational Framework [taslp08]
Predict the VA values
  Train a regression model f(·) that minimizes the mean squared error (MSE):
    min_f (1/N) * sum_{i=1..N} (yi - f(xi))^2
    yi: numerical emotion value (target); xi: feature vector (input); f(xi): prediction result (output)
  One regressor for valence; one for arousal
  e.g., linear regression: f(xi) = w^T xi + b, i.e., emotion value = sum_j wj xij + b
(Block diagram: training data go through feature extraction, and their manual annotations feed regressor training; test data go through feature extraction and the trained regressor for automatic prediction of the emotion value. A sketch follows after this slide.)
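A minimal sketch of the linear-regression instance of the framework on the slide above: one regressor for valence and one for arousal, each of the form f(x) = w^T x + b, fitted by least squares (which minimizes the MSE). The feature and annotation matrices are random placeholders.

```python
# Sketch: fit two linear regressors (valence, arousal) by least squares.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))              # audio features (fake)
Y = rng.uniform(-1, 1, size=(100, 2))       # columns: valence, arousal (fake)

Xb = np.hstack([X, np.ones((len(X), 1))])   # append a column of 1s for the bias b
W, *_ = np.linalg.lstsq(Xb, Y, rcond=None)  # least-squares solution, one column per target

def predict(x):
    """Return (valence, arousal) for one feature vector x."""
    return np.append(x, 1.0) @ W

print(predict(X[0]))
```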
20. Obtain Music Emotion Rating
Manual annotation
Rate the VA values of each song
Ordinal rating scale
Scroll bar
(Block diagram: the framework of the previous slide; this slide concerns the manual-annotation stage)
21. Evaluation of Emotion Rating
User study
  1240 Chinese pop songs, each 30 seconds long
  666 subjects; each rates 8 randomly selected songs
Subjective evaluation
  Easiness of annotating emotion
  Within-subject reliability: compared with the same subject's annotations one month later
  Between-subject reliability: compared with other subjects' annotations
Results, rated from 1 to 5 (strongly disagree to strongly agree):
  Emotion rating: easiness 2.82, within-subject reliability 2.92, between-subject reliability 2.81
22. AnnoEmo: GUI for Emotion Rating [hcm07]
Encourages differentiation between songs
Drag & drop a song to modify its annotation; click a song to listen again
Demo
23. Cognitive Load is Still High
Determining exact VA values is not that easy
Difficult to ensure consistency
  Does dist(0.5, 0.8) = dist(-0.2, 0.1) in terms of our emotion perception?
  Does a rating of 0.7 mean the same thing to two different subjects?
(Figure: example ratings 0.5, 0.8, -0.2, and 0.1 marked on a scale from -1 to 1)
24. Sol: Ranking Instead of Rating [taslp11a]
Determine the position of a song
  by its relative ranking with respect to other songs
  rather than by exact emotion values
(Figure: songs ordered from positive valence (= 1) to negative valence (= -1), contrasting relative ranking with exact rating:
  Oh Happy Day
  I Want to Hold Your Hand - The Beatles
  I Feel Good - James Brown
  What a Wonderful World - Louis Armstrong
  Into the Woods - My Morning Jacket
  The Christmas Song
  C'est La Vie
  Labita - Lisa Ono
  Just the Way You Are - Billy Joel
  Perfect Day - Lou Reed
  When a Man Loves a Woman - Michael Bolton
  Smells Like Teen Spirit - Nirvana)
25. Ranking-Based Emotion Annotation
Emotion tournament
  Requires only n - 1 pairwise comparisons ("Which song is more positive?")
  The global ordering can later be approximated by a greedy algorithm [jair99] (a sketch follows after this slide)
(Figure: example tournament over eight songs a-h; the accumulated scores a=0, b=3, c=1, d=0, e=0, f=7, g=0, h=1 give the ordering f > b > c = h > a = d = e = g)
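A minimal sketch of turning pairwise "which is more positive?" judgments into a global ordering by sorting on accumulated scores. This is a simplified stand-in for the greedy ordering algorithm cited on the slide [jair99], and the comparisons below are made up for illustration.

```python
# Sketch: derive a global ordering of songs from pairwise comparison outcomes.
from collections import defaultdict

comparisons = [("f", "b"), ("b", "c"), ("f", "h"), ("b", "a"),
               ("c", "d"), ("f", "e"), ("h", "g")]   # (winner, loser) pairs

score = defaultdict(int)
for winner, loser in comparisons:
    score[winner] += 1          # one point per comparison won
    score[loser] += 0           # ensure the loser also appears in the dict

ordering = sorted(score, key=score.get, reverse=True)
print(ordering)                 # e.g. ['f', 'b', 'c', 'h', ...]
```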
31. Q: Subjective Issue
(Figure: scatter of annotations in the valence-arousal plane; each circle represents the emotion annotation given to a music piece by one subject)
32. Sol: Probabilistic MER [taslp11b]
Predicts the probability distribution P(e|d) of the perceived emotion e for a music piece d (a sketch follows after this slide)
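A simplified illustration of the idea of a probabilistic prediction P(e|d): instead of a single VA point, a song is described by a distribution over the VA plane. Here a bivariate Gaussian is fitted to made-up subject annotations of one piece; this is only an illustration, not the method of [taslp11b].

```python
# Sketch: model a song's perceived emotion as a distribution over the VA plane.
import numpy as np
from scipy.stats import multivariate_normal

# Hypothetical (valence, arousal) annotations of one song by several subjects.
annotations = np.array([[0.6, 0.4], [0.4, 0.5], [0.7, 0.2], [0.5, 0.6], [0.3, 0.3]])

mean = annotations.mean(axis=0)
cov = np.cov(annotations, rowvar=False)
p_e_given_d = multivariate_normal(mean=mean, cov=cov)

print("density at (0.5, 0.4):", p_e_given_d.pdf([0.5, 0.4]))
```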
33. Sol: Personalized MER [sigir09]
From P(e|d) to P(e|d, u)
General regressor → personalized regressor
Utilize user feedback (a sketch follows after this slide)
(Block diagram: the training/prediction framework of slide 19 extended with a personalization stage; user feedback collected during emotion-based retrieval is used to adapt the predicted emotion values)
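A minimal sketch of one way to personalize a general emotion regressor with a user's feedback: learn a small residual model on the user's corrections and add it to the general prediction. This is only an illustration, not the method of [sigir09]; the data and models are placeholders.

```python
# Sketch: general regressor + per-user residual model trained on feedback.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X_all, y_all = rng.normal(size=(200, 20)), rng.uniform(-1, 1, 200)
general = SVR().fit(X_all, y_all)                  # trained on everyone's annotations (fake)

# A few songs re-annotated by one user (feedback).
X_fb, y_fb = X_all[:10], y_all[:10] + 0.3          # this user rates things more positively
residual = Ridge(alpha=1.0).fit(X_fb, y_fb - general.predict(X_fb))

def personalized_predict(x):
    x = x.reshape(1, -1)
    return general.predict(x) + residual.predict(x)

print(personalized_predict(X_all[11]))
```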
34. Evaluation Setup
Training data
195 Western/Japanese/Chinese pop songs
25-sec segment that is representative of the song
If the segment is too long, the emotion may not be homogeneous
If it is too short, the listener may not hear enough
Manual annotation
253 subjects; each rates 12 songs
Rate the VA values in 11 ordinal levels (0, 1, 2, ..., 10)
Each song is annotated by 10+ subjects
Ground truth obtained by averaging
35. Quantitative Result
Method: R2 of valence / R2 of arousal
  Multiple linear regression: 0.109 / 0.568
  AdaBoost.RT [ijcnn04]: 0.117 / 0.553
  SVR (support vector regression) [sc04]: 0.222 / 0.570
  SVR + RReliefF (feature selection) [ml03]: 0.254 / 0.609
Result
  R2: squared correlation between y and f(x) (a sketch follows after this slide)
  Valence prediction is challenging
    Valence: 0.25 ~ 0.35
    Arousal: 0.60 ~ 0.85
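A minimal sketch of the R2 measure used in the table above: the squared Pearson correlation between the annotated values y and the predictions f(x). The numbers below are made up.

```python
# Sketch: compute R^2 as the squared correlation between annotations and predictions.
import numpy as np

y   = np.array([0.8, -0.2, 0.1, 0.6, -0.5])   # ground-truth valence (fake)
f_x = np.array([0.6, -0.1, 0.3, 0.4, -0.4])   # predicted valence (fake)

r = np.corrcoef(y, f_x)[0, 1]   # Pearson correlation coefficient
print("R^2 =", r ** 2)
```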
36. Qualitative Result
(Figure: example songs placed in the valence-arousal plane by the regressor, including:
  No No No Part 2 - Beyonce
  Out Ta Get Me - Guns N' Roses
  You're Crazy - Guns N' Roses
  All Of Me - 50 Cent
  Bodies - Sex Pistols
  New York Giants - Big Pun
  I've Got To See You Again - Norah Jones
  Mammas Don't Let Your Babies Grow Up To Be Cowboys - Willie Nelson
  If Only In The Heaven's Eyes - NSYNC
  Live For The One I Love - Celine Dion
  The Last Resort - The Eagles
  Why Do I Have To Choose - Willie Nelson)
37. Missing 1: Temporal Context of Music
Sweet Anticipation by David Huron
Music's most expressive qualities probably relate to structural changes across time
Music emotion can also vary within an excerpt [tsmc06] (a sketch follows after this slide)
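A minimal sketch of one way to capture emotion variation within an excerpt, as mentioned on the slide above: slide a window over the audio, extract features per window, and predict a VA value for each window. The feature extractor and the trained regressors are placeholders, not the talk's actual system.

```python
# Sketch: per-window VA prediction yielding an emotion trajectory over time.
import numpy as np

def extract_features(segment):
    # Placeholder for real audio feature extraction (e.g. timbre and rhythm features).
    return np.array([segment.mean(), segment.std()])

def emotion_trajectory(audio, sr, valence_model, arousal_model, win_sec=1.0):
    """Return one (valence, arousal) prediction per non-overlapping window."""
    hop = int(win_sec * sr)
    trajectory = []
    for start in range(0, len(audio) - hop + 1, hop):
        x = extract_features(audio[start:start + hop]).reshape(1, -1)
        trajectory.append((valence_model.predict(x)[0], arousal_model.predict(x)[0]))
    return trajectory

if __name__ == "__main__":
    class _Dummy:                      # stand-in for trained regressors
        def predict(self, x):
            return x.sum(axis=1)
    audio = np.random.default_rng(0).normal(size=5 * 22050)   # 5 seconds of fake audio
    print(emotion_trajectory(audio, 22050, _Dummy(), _Dummy()))
```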
38. Missing 2: Context of Music Listening
Listening mood/context
Familiarity/associated memory
Preference of the singer/performer/song
Social relationship
39. Conclusion
A computational framework for predicting numerical
emotion values
Generalizes MER from categorical to dimensional
Resolves some issues of emotion description
Rank instead of rate
2D user interface for music retrieval
Valence & subjectivity
Content & context
Acknowledgement
Prof. Homer Chen, National Taiwan University
40. Reference
Music Emotion Recognition, CRC Press, 2011
A regression approach to music emotion recognition, IEEE TASLP, 2008 (cited by 76)
Ranking-based emotion recognition for music organization and retrieval, IEEE TASLP, 2011
Prediction of the distribution of perceived music emotions using discrete samples, IEEE TASLP, 2011
Exploiting online tags for music emotion classification, ACM TOMCCAP, 2011
Machine recognition of music emotion: A review, ACM TIST, 2012