Spatio-temporal local feature clusters
Iveel
Intro
• Overview
• Video as a cloud of feature points
• Clusters of feature points
• Video representation
• Classification
• Decision making
• Result
Overview
• The Bag-of-Features (BOF) representation ignores the spatio-temporal configuration of the video.
• The proposed approach integrates spatio-temporal structure into the video representation.
• Local features are grouped by spatio-temporal proximity; each group is referred to as a cluster.
• Each cluster is independently represented as a BOF, referred to as a cluster-level BOF.
• This allows the action to be localized within the video segment.
Video
• A video segment can be viewed as a cloud of local features in 3D space (x, y, t).
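A minimal sketch of this view, assuming each local feature carries an (x, y, t) position and a fixed-length descriptor. The slides do not name the descriptor type, so a plain tuple stands in for it. `LocalFeature`, `as_point_cloud` and the `time_scale` factor (which converts frames into pixel-comparable units before distances are measured) are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]


@dataclass
class LocalFeature:
    """One local feature: its position in (x, y, t) plus its appearance descriptor."""
    x: float
    y: float
    t: float                       # frame index (or time stamp)
    descriptor: Tuple[float, ...]  # descriptor type is not specified in the slides


def as_point_cloud(features: Sequence[LocalFeature], time_scale: float = 1.0) -> List[Point3D]:
    """Return the 3D point cloud of a video segment.

    time_scale is a hypothetical weight that puts frames and pixels on a
    comparable scale, so spatial and temporal distances can be mixed.
    """
    return [(f.x, f.y, f.t * time_scale) for f in features]


segment = [
    LocalFeature(10.0, 20.0, 0.0, (0.1, 0.9)),
    LocalFeature(12.0, 22.0, 1.0, (0.2, 0.8)),
    LocalFeature(200.0, 50.0, 30.0, (0.7, 0.3)),
]
cloud = as_point_cloud(segment, time_scale=2.0)
print(cloud)  # [(10.0, 20.0, 0.0), (12.0, 22.0, 2.0), (200.0, 50.0, 60.0)]
```

Only the positions enter the grouping step; the descriptors are carried along for the later histogram step.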
Local feature grouping
• Intuition: features that lie close together (in the spatio-temporal domain) are more likely to correspond to the same object, while distant ones are less likely to.
• To exploit this idea, a tree-based clustering is used to group local features by their spatio-temporal proximity.
In this example, the local feature points are grouped into two clusters (red and blue).
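The slides do not say which tree-clustering algorithm is used. As a hypothetical stand-in, the sketch below builds a binary tree by recursive median splits along the axis of largest spread (k-d-tree style), so tree level L yields up to 2^L clusters. The cluster numbers 1, 2, 4, 8, 16 and 32 studied later are powers of two, which fits such a binary tree but does not confirm it. `split_node` and `tree_clusters` are illustrative names, and the points are assumed to be already scaled so that x, y and t are comparable.

```python
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]


def split_node(indices: List[int], points: Sequence[Point3D]) -> Tuple[List[int], List[int]]:
    """Split one tree node in two at the median of its widest axis (x, y or t)."""
    spreads = []
    for axis in range(3):
        values = [points[i][axis] for i in indices]
        spreads.append(max(values) - min(values))
    axis = spreads.index(max(spreads))            # first axis with the largest spread
    ordered = sorted(indices, key=lambda i: points[i][axis])
    mid = len(ordered) // 2
    return ordered[:mid], ordered[mid:]


def tree_clusters(points: Sequence[Point3D], depth: int) -> List[List[int]]:
    """Return the leaves at tree level `depth`: up to 2**depth clusters of point indices."""
    level = [list(range(len(points)))]
    for _ in range(depth):
        next_level: List[List[int]] = []
        for node in level:
            if len(node) < 2:                     # a single point cannot be split further
                next_level.append(node)
                continue
            left, right = split_node(node, points)
            next_level.extend([left, right])
        level = next_level
    return level


points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 1.0),          # one compact group
          (100.0, 100.0, 50.0), (101.0, 100.0, 51.0), (100.0, 101.0, 52.0)]  # a distant group
print(tree_clusters(points, depth=1))  # [[0, 2, 1], [3, 5, 4]]
```

A median split balances cluster sizes rather than searching for gaps between groups; an agglomerative hierarchy cut at the desired number of clusters would follow the proximity intuition more literally, at a higher cost.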
Cluster-level BOF
• Once the local features are grouped into clusters, each cluster is represented using the BOF approach (referred to as a cluster-level BOF).
• A frequency histogram is generated over the local descriptors that belong to a particular cluster.
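The histogram step can be sketched as follows, assuming a visual vocabulary (`codebook`) has already been learned from training descriptors (commonly with k-means; the slides do not say how) and that each descriptor is assigned to its nearest codeword by Euclidean distance. The optional L1 normalization is an added assumption that keeps clusters with different numbers of features comparable. `nearest_word` and `cluster_bof` are hypothetical names.

```python
from typing import List, Sequence


def nearest_word(descriptor: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Index of the codeword closest to `descriptor` (squared Euclidean distance)."""
    best, best_d = 0, float("inf")
    for w, centre in enumerate(codebook):
        d = sum((a - b) ** 2 for a, b in zip(descriptor, centre))
        if d < best_d:
            best, best_d = w, d
    return best


def cluster_bof(descriptors: Sequence[Sequence[float]],
                codebook: Sequence[Sequence[float]],
                normalize: bool = True) -> List[float]:
    """Frequency histogram of visual words over the descriptors of one cluster."""
    hist = [0.0] * len(codebook)
    for desc in descriptors:
        hist[nearest_word(desc, codebook)] += 1.0
    total = sum(hist)
    if normalize and total > 0:
        hist = [h / total for h in hist]          # L1 normalization (assumption)
    return hist


codebook = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]   # toy 3-word vocabulary
cluster_descs = [(0.1, 0.0), (0.9, 1.1), (1.0, 0.8), (0.1, 0.9)]
print(cluster_bof(cluster_descs, codebook, normalize=False))  # [1.0, 2.0, 1.0]
print(cluster_bof(cluster_descs, codebook))                   # [0.25, 0.5, 0.25]
```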
Training & Learning
• At each scale, an SVM classifier is trained on the cluster-level BOFs.
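The slides do not specify the SVM kernel or solver. The sketch below assumes a linear SVM trained with Pegasos-style stochastic subgradient descent on the hinge loss, kept in pure Python so it is self-contained; in practice a library solver would normally be used. It is binary (labels +1/-1), so the 7 action classes would need, for example, one-vs-rest training, and, reading "at each scale" as one cluster number (an assumption), a separate set of classifiers per cluster number. `train_linear_svm` and `decision` are hypothetical names.

```python
import random
from typing import List, Sequence


def train_linear_svm(X: Sequence[Sequence[float]], y: Sequence[int],
                     lam: float = 0.01, epochs: int = 200, seed: int = 0) -> List[float]:
    """Linear SVM (hinge loss + L2 penalty) trained with Pegasos-style SGD.

    X holds cluster-level BOF vectors and y their labels in {-1, +1}.
    There is no bias term, as in basic Pegasos; append a constant 1.0
    feature to every vector if an offset is needed.
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)                       # decreasing step size
            margin = y[i] * decision(w, X[i])           # margin under the current w
            w = [(1.0 - eta * lam) * wj for wj in w]    # step on (lam/2)*||w||^2
            if margin < 1.0:                            # hinge loss active: move toward y*x
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def decision(w: Sequence[float], x: Sequence[float]) -> float:
    """Signed SVM score; its sign is the predicted label."""
    return sum(wj * xj for wj, xj in zip(w, x))


X = [[0.8, 0.2], [0.9, 0.1], [0.7, 0.3],   # toy BOFs of the positive class
     [0.2, 0.8], [0.1, 0.9], [0.3, 0.7]]   # toy BOFs of the negative class
y = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X, y)
print([1 if decision(w, x) > 0 else -1 for x in X])  # [1, 1, 1, -1, -1, -1]
```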
Experimental study
• Action segments from TRECVID SED (Surveillance Event Detection) are used for training and testing.
• 7 action classes: CellToEar, Embrace, ObjectPut, PeopleMeet, PeopleSplitUp, PersonRuns, Pointing.
• Training: 210 video segments in total
  • 30 video segments per action class
• Testing: 138 video segments in total
  • approx. 20 video segments per action class
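The dataset sizes above are consistent with each other; since 138 is not a multiple of 7, the test split cannot be exactly balanced:

```latex
\begin{aligned}
N_{\text{train}} &= 7 \times 30 = 210 \\
N_{\text{test}} / 7 &= 138 / 7 \approx 19.7 \approx 20
\end{aligned}
```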
Experimental study
• A spatio-temporal bounding box of the action is manually drawn for every segment in both the training and test sets.
Experiment 1 - Cluster number vs. performance
• This experiment studies the optimal number of clusters.
• Six cluster numbers are tested: 1, 2, 4, 8, 16 and 32.
• For example, a cluster number of 16 means that the video segment is divided into 16 sub-regions (clusters), each with its own BOF histogram (cluster-level BOF). Each cluster-level BOF is then annotated using the bounding-box information.
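The slides say only that each cluster-level BOF is annotated from the bounding box. One plausible rule, sketched below as an assumption, labels a cluster with the segment's action when at least a given fraction (a hypothetical `threshold=0.5`) of its points fall inside the manually drawn (x, y, t) box, and with a hypothetical negative label otherwise. `BoxST` and `annotate_cluster` are illustrative names.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

Point3D = Tuple[float, float, float]


@dataclass
class BoxST:
    """Spatio-temporal bounding box [x0, x1] x [y0, y1] x [t0, t1] (hypothetical layout)."""
    x0: float
    x1: float
    y0: float
    y1: float
    t0: float
    t1: float

    def contains(self, p: Point3D) -> bool:
        x, y, t = p
        return (self.x0 <= x <= self.x1
                and self.y0 <= y <= self.y1
                and self.t0 <= t <= self.t1)


def annotate_cluster(points: Sequence[Point3D], box: BoxST, action: str,
                     threshold: float = 0.5, negative: str = "background") -> str:
    """Label a cluster with `action` if a large enough share of its points lies in the box."""
    if not points:
        return negative
    inside = sum(1 for p in points if box.contains(p))
    return action if inside / len(points) >= threshold else negative


box = BoxST(0, 50, 0, 50, 0, 10)
near = [(10, 10, 1), (20, 20, 2), (60, 60, 3)]   # 2 of 3 points inside the box
far = [(100, 100, 5), (120, 90, 6)]              # no points inside
print(annotate_cluster(near, box, "Embrace"))    # Embrace
print(annotate_cluster(far, box, "Embrace"))     # background
```

With more clusters, each one covers a smaller sub-region, so the labels follow the bounding box more tightly; the threshold decides how clusters straddling the box edge are treated.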
Experiment 1 - Cluster number vs. performance: CellToEar
Experiment 1 - Cluster number vs. performance: Embrace
Experiment 1 - Cluster number vs. performance: ObjectPut
Experiment 1 - Cluster number vs. performance: PeopleMeet
Experiment 1 - Cluster number vs. performance: PeopleSplitUp
Experiment 1 - Cluster number vs. performance: PersonRuns
Experiment 1 - Cluster number vs. performance: Pointing
Conclusion
• The results are based on cluster-level BOFs.
• To give segment-level results, the predictions for the cluster-level BOFs that belong to the same video segment must be aggregated appropriately.
• The naïve approach is to assign each segment the action class that receives the most votes from its clusters.
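The naïve majority-vote aggregation can be sketched as below, assuming each cluster has already received a predicted label. Skipping clusters predicted as a non-action label (the hypothetical `"background"`) and breaking ties in favor of the class seen first are added assumptions; the slides do not address either case. `segment_label` is an illustrative name.

```python
from collections import Counter
from typing import Iterable, Optional


def segment_label(cluster_labels: Iterable[str],
                  ignore: Iterable[str] = ("background",)) -> Optional[str]:
    """Assign a video segment the action class with the most votes from its clusters."""
    skip = set(ignore)
    votes = Counter(label for label in cluster_labels if label not in skip)
    if not votes:
        return None  # no cluster voted for an action class
    # most_common() lists equal counts in first-seen order, so ties go to the class seen first
    return votes.most_common(1)[0][0]


labels = ["Embrace", "PeopleMeet", "Embrace", "background", "background", "background"]
print(segment_label(labels))  # Embrace
```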
