This document proposes representing videos with local feature clusters to capture the spatial and temporal information that bag-of-features representations discard. Local features are grouped into clusters according to their proximity in space and time, and each cluster is then represented independently with its own bag-of-features histogram, allowing actions to be localized. An experiment on classifying 7 actions from TRECVID videos found that the optimal number of clusters varies by action class, with performance generally improving up to 8-16 clusters and declining as more clusters are added.
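A minimal sketch of the pipeline described above, under stated assumptions: the document only says features are grouped by spatio-temporal proximity, so k-means over (x, y, t) coordinates is used here as an illustrative clustering choice, and the function name `cluster_bof_descriptor`, the per-cluster L1 normalization, and all parameter values are hypothetical rather than taken from the source.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_bof_descriptor(positions, word_ids, vocab_size, n_clusters=8):
    """Group local features into spatio-temporal clusters and build one
    bag-of-features histogram per cluster.

    positions : (N, 3) array of (x, y, t) feature locations
    word_ids  : (N,) array of visual-word indices, each in [0, vocab_size)
    Returns an (n_clusters, vocab_size) array of L1-normalized histograms.
    """
    # Cluster features by spatio-temporal proximity. K-means is an
    # assumption; the source only states proximity-based grouping.
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(positions)

    # Represent each cluster independently as a bag-of-features histogram.
    histograms = np.zeros((n_clusters, vocab_size))
    for cluster_id in range(n_clusters):
        words = word_ids[labels == cluster_id]
        counts = np.bincount(words, minlength=vocab_size)
        total = counts.sum()
        if total > 0:
            histograms[cluster_id] = counts / total  # L1 normalization
    return histograms

# Usage example with synthetic data: 500 features in a 320x240 video
# spanning 100 frames, quantized against a 100-word visual vocabulary.
rng = np.random.default_rng(0)
positions = rng.random((500, 3)) * [320, 240, 100]  # x, y, t scales
word_ids = rng.integers(0, 100, size=500)
descriptor = cluster_bof_descriptor(positions, word_ids, vocab_size=100)
print(descriptor.shape)  # (8, 100): one histogram per cluster
```

The resulting per-cluster histograms could then be fed to a classifier; varying `n_clusters` (e.g., over 1-32) would reproduce the kind of sweep the experiment describes, where 8-16 clusters worked best for most action classes.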