The presentation of "Flickr Distance" given by Xian-Sheng Hua in the best paper session of ACM Multimedia 2008
1 of 41
Downloaded 142 times
More Related Content
Flickr Distance
1. ACM Multimedia 2008
Lei Wu, Xian-Sheng Hua, Nenghai Yu, Wei-Ying Ma, Shipeng Li
Microsoft Research Asia
University of Science and Technology of China
October 28, 2008
6. Image Similarity/Distance
Numerous efforts have been made.
Concept Similarity/Distance
Olympic
Cat Sports
Paw Tiger
More and more used, but not well studied.
6
8. WordNet
150,000 words
WordNet Distance
Quite a few methods to get it in WordNet
Basic idea is to measure the length of the
path between two words
Pros and Cons
Pros: Built by human experts, so close to
human perception
Cons: Coverage is limited and difficult to
extend
8
9. Normalized Google Distance (NGD)
Reflects the concurrency of two words in Web documents
Defined as
Pros and Cons
Pros: Easy to get and huge coverage
Cons: Only reflects concurrency in textual documents. Not really
concept distance (semantic relationship)
9
10. Concept Pairs Google Distance
Airplane ¨C Dog 0.2562
Football ¨C Soccer 0.1905
Horse ¨C Donkey 0.2147
Airplane ¨C Airport 0.3094
Car ¨C Wheel 0.3146
10
11. Image Tag Concurrence Distance (Qi, Hua, et al. ACMMM07)
Reflects the frequency of two tags
occur in the same images
Based on the same idea of NGD
Mostly is sparse (> 95% are zero in
the similarity matrix)
Pros and Cons
Pros: Images are taken into account
Cons: a) Tags are sparse so visual
concurrency is not well reflected
b) Training data is difficult to get similarity matrix: 500 tags
similarity matrix: 50 tags
11
12. Concept Pairs Google Distance
Tag Concurrence Distance
Airplane ¨C Dog 0.2562 0.8532
Football ¨C Soccer 0.1905 0.1739
Horse ¨C Donkey 0.2147 0.4513
Airplane ¨C Airport 0.3094 0.1833
Car ¨C Wheel 0.3146 0.9617
12
13. table ¡ª ping- horse ¡ª donkey car ¡ª wheel airplane ¡ª airport
tennis pong
Synonymy Visually Similar Meronymy Concurrency
different words but similar things or part and exist at the same
the same meaning things of same type the whole scene/place
13
14. Image tag concurrence
distance implicitly uses
image information, but
Concept Distance tags are too sparse
Mine from image tags
Google distance¡¯s coverage
is very high, but it is for
text domain
Mine from text documents
WordNet distance is good,
but coverage is too low
Mine from ontology
14
15. Can we mine concept distance
from image content?
15
16. Semantic concept distance is based on human¡¯s cognition
To of humanconcept distanceinformation
80%
mine cognition comes from visual from a
large tagged image collection
There are around 2.8 billion photos on Flickr (by Sep 08)
based on image content
In average each Flickr image has around 8 tags
bear, fur, grass, tree polar bear, water, sea polar bear, fighting, usa
16
17. Concept A: Airplane Concept B: Airport
Concept Model A Concept Model B
Flickr Distance (A, B)
17
18. FlickrDistance
0.5151
0.0315
0.4231
0.0576
0.0708
Flickr Distance is able to cover the
four different semantic relationships
Synonymy, Visually Similar, Meronymy, and Concurrency
18
19. R1: A Good Image Collection
Large
High coverage, especially on real-life world
With tags
Real-life photos reflect human¡¯s perception
Based on collective intelligence from grassroots
19
20. R2: A Good Concept Representation or Model
Based on image content
Can cover wider concept relationships
Can handle large-concept set
Concept Models
Discriminative SVM, Boosting, ¡
Generative
Global Feature
Local Feature
w/o Spatial Relation Bag-of-Words (pLSA, LDA), ¡
w/ Spatial Relation 2D HMM, MRF, ¡ 20
21. VLM ¨C Visual Language Model
Spatial-relation sensitive
Efficient
Concept Models object variations
Can handle
Discriminative SVM, Boosting, ¡
Generative
Global Feature
Local Feature
w/o Spatial Relation Bag-of-Words, ¡
w/ Spatial Relation 2D HMM, MRF, ¡ 21
22. I am talking about statistical language model.
Unigram Model P?wx w1w2 ? wn ? ? P?wx ?
Bigram Model P?wx w1w2 ? wn ? ? p?wx wx?1 ?
Trigram Model P?wx w1w2 ? wn ? ? P?wx wx?1 wx ?2 ?
22
23. Visual Word Generation
1 0 0 1 0 0 1 0
Patch ? Gradient Texture Histogram Hashing ? Visual Word
Image ? Patch
Unigram Model P?wxy w11w12 ? wmn ? ? P?wxy ?
Bigram Model P?wxy w11w12 ? wmn ? ? P wxy wx ?1, y ? ?
Trigram Model P?wxy w11w12 ? wmn ? ? p?w xy wx?1, y wx, y ?1 ?
Trigram VLM is estimated by directly counting from sufficient samples of each category.
To avoid the bias in the sampling, back-off smoothing method is adopted.
23
26. Latent-Topic VLM Training
Solved by EM algorithm,
The objective function is to maximize the joint distribution of concept
and its visual word arrangement Aw
maximize p? Aw , C ?
£½? ? ?
P wxy wx ?1, y wx , y ?1 , d C
j ?
d C ?C
j x, y
Estimate the posteriors of the hidden topics Maximize the likelihood of visual arrangement
26
28. Kullback ¨C Leibler (KL) divergence
Good, but not symmetric
topic distance
Jensen ¨CShannon (JS) divergence
Better, as it is symmetric
And, square root of JS divergence is a metric, so is Flickr Distance
topic distance
concept distance 28
29. Concept A: Airplane Concept B: Airport
Tag search in
Flickr
Concept Model A LT-VLM Concept Model B
Jensen-Shannon
Divergence
Flickr Distance (A, B)
29
31. Images
6,400,000 from Flickr
Concepts
130,000,000 different tags
10,000,000 filtered tags
1,000 randomly-selected tags
Comparison
Normalized Google Distance (NGD)
Tag Concurrence Distance (TCD)
Flickr Distance (FD)
31
32. Ground-Truth
12 persons are asked to score semantic correlation of each concept pair
Average scores are taken as ground-truth
Evaluate Accuracy of ¡°Relative Distance Pairs¡±
Step 1: Find all distance pairs D(a,b) and D(c,d)
Step 2: Check whether the order of D(a,b) and D(c,d) is consistent with ground-truth
Correct Rate
0.56
0.55
0.54
0.53
0.52
0.51
0.5
0.49
0.48
0.47
NGD TCD FD
32
33. Ground-Truth
WordNet Distance
Only 497 concepts (overlap of WordNet and the 1000 concepts)
Evaluate Accuracy of ¡°Relative Distance Pairs¡±
Step 1: Find all distance pairs D(a,b) and D(c,d)
Step 2: Check whether the order of D(a,b) and D(c,d) is consistent with ground-truth
Correct Rate
0.54
0.53
0.52
0.51
0.5
0.49
0.48
0.47
0.46
0.45
NGD TCD FD
33
34. Concept Clustering
23 concepts;
3 groups ¨C (1) outer space, (2) animal and (3) sports
Normalized Google Distance Tag Concurrence Distance Flickr Distance
Group1 Group2 Group3 Group 1 Group2 Group3 Group1 Group2 Group3
bears bowling baseball moon baseball basketball moon bears baseball
horses dolphin basketball space donkey bears Saturn dolphin basketball
bowling
moon donkey football Venus softball space donkey football
dolphin
space Saturn golf whale wolf football Venus golf snake
sharks soccer golf horses soccer
snake tennis horses sharks bowling
softball volleyball Saturn spiders softball
spiders sharks tennis volleyball
soccer
turtle whale
spiders
Venus tennis wolf
whale turtle
wolf volleyball
34
35. Based on an approach using concept relation
Dual Cross-Media Relevance Model (DCMRM, J. Liu et al. ACMMM 2007)
On 79 concepts / 79,000 images
NGD-DCMRM TCD-DCMRM FD-DCMRM
1200
960
1000
correct keywords
Total number of
800
600
423
400 354
301 310
212 186 212 193
200
55 53 57
0
1 2 3 4
NGD-DCMRM 55 212 212 301
TCD-DCMRM 53 186 193 310
FD-DCMRM 57 354 423 960
The number of correctly annotated keywords at the first N words
35
36. To Improve Tagging Quality
Eliminating tag incompletion, noises, and ambiguity
500 images / 10 recommended tags per image
0.78
0.758
0.76
0.74
0.72
0.7
0.68 0.665
0.66 0.652
0.64
0.62
0.6
0.58
NGD Tag Concurrent Distance Flickr Distance
Precision @ 10
36
37. Why VLM divergence can estimate concept distance?
Why FD works well even tags are not complete?
room patterns
Computer VLM: computer patterns of trigrams
distribution other patterns
room patterns TV patterns other patterns
TV
room patterns screen patterns other patterns
Office
37
38. If we find similar patterns in the images
associated with different concepts,
the corresponding concept
relationships can be discovered.
Computer Office
38
39. Flickr Distance
A novel approach to discover semantic relationships
from image content
based on real-life images from the Web
based on collective intelligence from grassroots
A distance more consistent with human¡¯s perception
A measurement more effective in many applications
39