CERTH @ MediaEval 2012 Social
Event Detection Task
Manos Schinas, Georgios Petkos, Symeon Papadopoulos,
Yiannis Kompatsiaris



Pisa, 4-5 October 2012
The problem
   Identify social events in tagged photo collections:
       Challenge 1: Technical Events @ Germany
       Challenge 2: Soccer matches @ Madrid, Hamburg
       Challenge 3: Indignados protest @ Madrid

   Alternative formulation:
       Represent a collection of photos as a graph, where items
        with high probability to belong to the same event are
        connected.
       Each event forms a dense sub-graph in it.
       This points to community detection as a method for
        addressing the problem.



Approach

 Step 1




 Step 2




 Step 3




Graph Creation (1)

 Graph creation is based on a Same Class
  model
   A classifier which predicts whether two images
    belong to the same event or not
   Support Vector Machine classifier trained with the
    data of the 2011 challenge
   Input features: dissimilarities across user, title, tags,
    description, time taken, GIST, SURF/VLAD
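
A minimal sketch of how the pairwise input features might be assembled, assuming only the metadata fields (user, title, tags, time taken). The `Photo` record, its field names, and the choice of Jaccard distance and hours as units are illustrative assumptions, not the authors' implementation; the visual features (GIST, SURF/VLAD) and the SVM itself are left out.

```python
from dataclasses import dataclass


@dataclass
class Photo:
    # Hypothetical record; field names are illustrative only.
    user: str
    title: str
    tags: set
    taken: float  # capture time in seconds since the epoch


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|; defined as 0.0 when both inputs are empty."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def pair_features(p, q):
    """Dissimilarity vector for one photo pair (input to a same-event classifier)."""
    return [
        0.0 if p.user == q.user else 1.0,                      # user
        jaccard_distance(p.title.lower().split(),
                         q.title.lower().split()),             # title
        jaccard_distance(p.tags, q.tags),                      # tags
        abs(p.taken - q.taken) / 3600.0,                       # hours apart
    ]
```

Each labelled pair (same event or not) would yield one such vector, and a binary classifier is then trained on those vectors.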

Graph Creation (2)

 Use the same class model to connect the items
  of the collection that belong to the same event
 Retrieve candidate neighbours (~350) to
  reduce computational cost
     50 with respect to textual features
     150 with respect to time
     50 with respect to location (when it exists)
     100 with respect to visual features
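
The candidate retrieval above can be sketched as a union of per-modality nearest neighbours. The budgets come from the slide; the function names, the brute-force search, and the convention that a distance function returns `None` when a modality is missing are assumptions made for illustration (a real system would use an index rather than scanning every item).

```python
import heapq

# Per-modality budgets taken from the slide (about 350 candidates in total).
BUDGETS = {"text": 50, "time": 150, "location": 50, "visual": 100}


def candidate_neighbours(query, items, distances, budgets=BUDGETS):
    """Union of the k nearest items per modality.

    `distances` maps a modality name to a function d(a, b) that returns a
    number, or None when the modality is unavailable (e.g. no geotag).
    """
    candidates = set()
    for modality, k in budgets.items():
        dist = distances[modality]
        scored = []
        for other in items:
            if other == query:
                continue
            d = dist(query, other)
            if d is not None:  # skip pairs where the modality is missing
                scored.append((d, other))
        candidates.update(other for _, other in heapq.nsmallest(k, scored))
    return candidates
```

Only the pairs formed with these candidates need to be scored by the same class model, instead of all pairs in the collection.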

Event Partitioning and Expansion (1)
 Event partitioning
   The nodes of the graph are clustered into
    candidate events by using the Structural Clustering
    Algorithm for Networks (SCAN).
   The items clustered together by SCAN are used to
    obtain an aggregate representation of each
    candidate social event.
   Split the candidate events that exceed a
    predefined time range into shorter events.
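
As a rough illustration of the quantity SCAN is built on, the sketch below computes the structural similarity of two nodes and the set of core nodes. The adjacency-dict representation and the parameter names `eps` and `mu` are assumptions; the full algorithm (growing clusters from cores, labelling hubs and outliers) and the parameter values used in the runs are not shown.

```python
import math


def structural_similarity(adj, u, v):
    """|N[u] & N[v]| / sqrt(|N[u]| * |N[v]|), where N[x] is x plus its neighbours."""
    nu = adj[u] | {u}
    nv = adj[v] | {v}
    return len(nu & nv) / math.sqrt(len(nu) * len(nv))


def core_nodes(adj, eps, mu):
    """Nodes whose eps-neighbourhood (the node itself included) has >= mu members."""
    cores = set()
    for u in adj:
        close = sum(
            1 for v in adj[u] | {u}
            if structural_similarity(adj, u, v) >= eps
        )
        if close >= mu:
            cores.add(u)
    return cores
```

Nodes that share most of their neighbours score close to 1, which is why densely connected groups of photos end up in the same candidate event.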


Event Partitioning and Expansion (2)
 Expansion of the candidate events set
   Each image that does not belong to any event
    forms a single-item event.
   Merge these single-item events into larger clusters
    by checking location and time.
   Add the new events to the set of candidate
    events.
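
One possible reading of the merging step is a greedy pass over the single-item events in time order. The thresholds `max_km` and `max_hours`, the tuple layout, and the rule of comparing against the last photo of each cluster are all assumptions for illustration; the slides do not state how location and time are checked, and this sketch assumes every photo has a geotag.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def merge_singletons(photos, max_km=1.0, max_hours=12.0):
    """Group (id, (lat, lon), hours) tuples that are close in space and time.

    Returns a list of clusters, each a list of photo ids.
    """
    clusters = []
    for photo in sorted(photos, key=lambda p: p[2]):
        for cluster in clusters:
            last = cluster[-1]  # most recent photo already in this cluster
            if (photo[2] - last[2] <= max_hours
                    and haversine_km(photo[1], last[1]) <= max_km):
                cluster.append(photo)
                break
        else:
            clusters.append([photo])  # no compatible cluster: start a new one
    return [[p[0] for p in c] for c in clusters]
```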




Event Filtering (1)
 Filter in two ways:
   By using geo-location (if it exists)
   By using tag-based models
 Geo-location Filtering
   Discard events that are not contained in the
    bounding box of the specific challenge
   30% of candidate events are discarded
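
The geo-location filter can be sketched as a bounding-box test. The box coordinates below are rough illustrative values, not the ones used in the runs, and representing an event by a single optional `(lat, lon)` point is an assumption. Events without a location are kept, in line with the editor's note about favouring recall.

```python
# (south, west, north, east) in degrees; rough illustrative box around Madrid.
MADRID_BBOX = (40.3, -3.9, 40.6, -3.5)


def in_bbox(point, bbox):
    """True when a (lat, lon) point lies inside the (south, west, north, east) box."""
    lat, lon = point
    south, west, north, east = bbox
    return south <= lat <= north and west <= lon <= east


def geo_filter(events, bbox):
    """Keep ids of events inside `bbox`; events with no location are kept too."""
    kept = []
    for event_id, location in events:
        if location is None or in_bbox(location, bbox):
            kept.append(event_id)
    return kept
```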




Event Filtering (2)
 Tag-based filtering
   Build term models by finding the 500 dominant
    terms for the specific locations and event types.
   We collect images from Flickr that are relevant to
    the location or the type of event of interest.
   Images for Madrid, Hamburg and Germany
   Images for indignados, soccer and technical
    events



Event Filtering (3)
 Tag-based filtering
   Probability of appearance


   We compute the ratio of the probability of
    appearance in the focus set over the probability of
    appearance in the reference set.
   Keep the 500 terms with the highest ratio
   Jaccard similarity between a tag model and an
    event's terms
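
A minimal sketch of the term-model construction and matching, under stated assumptions: the probability of appearance is taken to be the fraction of images in a set that carry the term (the slide's formula is not reproduced here), and a small `floor` value is used for terms absent from the reference set to avoid division by zero. Function names are illustrative.

```python
from collections import Counter


def appearance_prob(tag_sets):
    """Fraction of images in which each term appears (assumed definition)."""
    counts = Counter(t for tags in tag_sets for t in set(tags))
    n = len(tag_sets)
    return {t: c / n for t, c in counts.items()}


def term_model(focus, reference, size=500, floor=1e-6):
    """Terms with the highest focus/reference probability ratio."""
    p_focus = appearance_prob(focus)
    p_ref = appearance_prob(reference)
    ratio = {t: p / max(p_ref.get(t, 0.0), floor) for t, p in p_focus.items()}
    ranked = sorted(ratio, key=lambda t: (-ratio[t], t))  # ties broken by name
    return set(ranked[:size])


def jaccard(a, b):
    """|a & b| / |a | b|; 0.0 when both sets are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0
```

A candidate event would then be scored by `jaccard(event_terms, model)` against each tag model; how that score is thresholded is not specified in the slides.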


Evaluation




Notation
Run 1: Same class model trained with 10000 pairs of images.
Run 2: Same class model trained with 30000 pairs of images.
Run 3: Same class model of run 1 with a post-processing step.


Discussion (1)
 Moving from a smaller (run 1) to a larger (run
  2) training dataset does not seem to improve
  most of the performance measures, which suggests overfitting
 Method fails in challenge 1 because these
  events are different from those of the training
  dataset
 A good tag model has to be used for
  classification in the post-filtering step


Discussion (2)
 Future actions:
   train the same class model with a richer set of
    data
   explore different graph construction strategies
    and community detection algorithms.
 Ways to improve:
   better topic classification methods
   more sophisticated methods for location
    estimation

Questions





Editor's Notes

  • #6: If a photo cannot be matched to any city, do not filter it out (bias towards higher recall).