Scaling-up and Speeding-up Video Analytics Inside Database Engine
Qiming Chen (1), Meichun Hsu (1), Rui Liu (2), and Weihong Wang (2)
(1) HP Labs, Palo Alto, California, USA
(2) HP Labs, Beijing, China
Hewlett Packard Co.

Motivation
  • Video has become an indispensable carrier of information
      - For business perception, decision, and action
  • Existing video analysis applications generally fail to scale
      - The database is treated as a storage engine rather than a computation engine
      - Transferring massive amounts of data is the bottleneck
  • A unified platform is required by
      - The demand for near-real-time responses to enable operational BI
      - Data-intensive transformation and analysis

Our Approach
  • Push video processing down to the database engine
      - Faster data access, less data transfer
  • User-Defined Functions (UDFs) as wrappers for video analysis and search operations

Problems with UDF (1)
  • Lack of formal support for relational input and output
      - Unaware of relation schema
      - Unable to model complex applications
      - Unable to be composed with relational operators in a SQL query
  • Typically executed in the tuple-wise pipeline of query processing
      - Performance penalty for certain applications
      - Prohibits data-parallel computation inside the function body

Problems with UDF (2)
  • Dilemma between UDF execution efficiency and ease of coding
      - UDFs must use system-internal data objects and system calls
      - Encoding DBMS data into strings to pass to UDFs incurs significant overhead

Our Solutions
  • Supporting Relation-Valued Functions (RVFs) at the SQL level
      - E.g., an SVM classifier as an RVF
      - Relations as input and output
      - Easier application modeling
      - Higher execution efficiency
      - Makes it possible to explore parallelism
  • RVF invocation patterns
      - Mechanisms for applying an RVF's input/output
      - High-level APIs are provided
  • Invocation pattern-oriented RVF containers
      - Support running RVFs in query processing
  (slide 6, dated 9/1/2009)
Video Pattern Recognition Process
Video Retrieval Process

Video Classification by SVM
Tables:
  • Features [featureID, imageID, featureType, feature]
  • Models [modelID, featureType, concept, model]
  • Labels [imageID, concept, nearness]

SVM by Scalar UDF: the Inefficiency
  • Classify using a conventional scalar UDF

    SELECT imageID, concept, AVG(nearness)
    FROM (SELECT imageID, featureID, concept,
                 classify0(f.featureType, m.concept, f.feature, m.model) AS nearness
          FROM Features f, Models m
          WHERE f.featureType = m.featureType)
    GROUP BY imageID, concept;

  • For each feature of each image, its nearness score to each concept is computed
  • The resulting nearness measures are aggregated by an average function
  • Inefficiency of execution
      - The model cannot be cached
      - The model is retrieved for each feature
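
The cost described on this slide can be illustrated outside the database. The following is a minimal Python sketch, not the authors' implementation: `decode_model`, the comma-separated weight encoding, the dot-product score, and the sample rows are all assumptions made for illustration. It mimics the scalar-UDF query and counts how often a model is decoded.

```python
from collections import defaultdict

decode_calls = 0  # how many times a model blob gets decoded

def decode_model(blob):
    """Hypothetical decoding step: a model is a comma-separated weight string."""
    global decode_calls
    decode_calls += 1
    return [float(w) for w in blob.split(",")]

def classify0(feature_type, concept, feature, model_blob):
    """Scalar-UDF stand-in: sees one joined row, so it decodes the model every call."""
    weights = decode_model(model_blob)
    return sum(f * w for f, w in zip(feature, weights))  # assumed score: dot product

# Features [featureID, imageID, featureType, feature]
features = [
    (1, "img1", "color", [1.0, 0.0]),
    (2, "img1", "color", [0.0, 1.0]),
    (3, "img2", "color", [1.0, 1.0]),
]
# Models [modelID, featureType, concept, model]
models = [
    (1, "color", "sky", "1.0,0.0"),
    (2, "color", "grass", "0.0,1.0"),
]

def scalar_udf_query(features, models):
    """Join on featureType, call the UDF per joined row, then AVG per (imageID, concept)."""
    groups = defaultdict(list)
    for _fid, image_id, f_type, feature in features:
        for _mid, m_type, concept, blob in models:
            if f_type == m_type:
                groups[(image_id, concept)].append(
                    classify0(f_type, concept, feature, blob))
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}

result = scalar_udf_query(features, models)
print(decode_calls, result)
```

With 3 features and 2 models of the same type, the model is decoded once per joined row (6 times here), which is the per-feature retrieval the slide points to.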

RVFs as Relational Operators
  • A simple RVF definition

    DEFINE RVF f (R1, R2, k) RETURN R3 {
      Relation R1 (/*schema*/);
      Relation R2 (/*schema*/);
      int k;
      Relation R3 (/*schema*/);
      PROCEDURE fn(/*dll name*/);
      RETURN MODE SET_MODE;
      INVOCATION PATTERN BLOCK
    }

  • An RVF can be naturally composed with relational operators or sub-queries

    SELECT * FROM RVF1(RVF2(Q1, Q2), Q3);

SVM by Relation-Valued Function

    SELECT imageID, concept, AVG(nearness)
    FROM (SELECT imageID, featureID, concept, nearness
          FROM classify1(
                 SELECT * FROM Features,
                 SELECT concept, model, featureType FROM Models))
    GROUP BY imageID, concept;
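
As a rough illustration of why relation-valued input helps, here is a Python sketch of a block-style function that receives both relations whole. It is an assumption-laden stand-in rather than the actual `classify1`: the weight-string model encoding, the dot-product score, and the returned `decode_count` are invented for the example.

```python
from collections import defaultdict

def classify1(features, models):
    """Block-pattern stand-in: both relations arrive whole, so each model is decoded once.

    features rows: (featureID, imageID, featureType, feature)
    models rows:   (concept, model, featureType), as in the sub-query on the slide
    """
    decoded = defaultdict(list)  # featureType -> [(concept, weights)]
    decode_count = 0
    for concept, blob, f_type in models:
        decoded[f_type].append((concept, [float(w) for w in blob.split(",")]))
        decode_count += 1

    rows = []  # output relation: (imageID, featureID, concept, nearness)
    for feature_id, image_id, f_type, feature in features:
        for concept, weights in decoded.get(f_type, []):
            nearness = sum(f * w for f, w in zip(feature, weights))
            rows.append((image_id, feature_id, concept, nearness))
    return rows, decode_count

features = [
    (1, "img1", "color", [1.0, 0.0]),
    (2, "img1", "color", [0.0, 1.0]),
    (3, "img2", "color", [1.0, 1.0]),
]
models = [("sky", "1.0,0.0", "color"), ("grass", "0.0,1.0", "color")]

rows, decode_count = classify1(features, models)

# Outer query: AVG(nearness) GROUP BY imageID, concept
groups = defaultdict(list)
for image_id, _fid, concept, nearness in rows:
    groups[(image_id, concept)].append(nearness)
averages = {key: sum(v) / len(v) for key, v in groups.items()}
print(decode_count, averages)
```

Here the number of decodes equals the number of model rows (2), independent of how many features are classified.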

RVF Invocation Patterns
  • Invocation pattern
      - Mechanism to deal with the input/output of an RVF
      - Generalization of the limited forms
  • Purposes
      - Ensuring that its interaction with the query executor is defined at a high level
      - Making it possible to provide high-level APIs
      - Shielding UDF developers from DBMS system-internal details

Patterns Defined
  • Basic pattern
      - Per-tuple pattern
      - Block pattern
  • Complex pattern
      - CartProdProbe (Cartesian product probe)

CartProdProbe Pattern

    SELECT r.imageID, r.concept, AVG(r.nearness)
    FROM (Features f CROSS APPLY classify2(
            f.featureID, f.featureType, f.feature,
            SELECT concept, model, featureType FROM Models)) r
    GROUP BY r.imageID, r.concept;

  • The Features table is fed into the RVF tuple by tuple; the Models table is fed in as a whole
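
The tuple-by-tuple plus whole-relation feeding can be sketched in Python as follows. This is only an illustration under assumptions: `Classify2`, `cross_apply`, the weight-string model encoding, and the dot-product score are hypothetical names and choices, not the prototype's API. The small Models relation is loaded on the first call and reused for every Features tuple.

```python
from collections import defaultdict

class Classify2:
    """CartProdProbe stand-in: caches the whole Models relation on the first call."""

    def __init__(self):
        self._models = None   # featureType -> [(concept, weights)]
        self.model_loads = 0  # how many times the Models relation was loaded

    def __call__(self, feature_id, feature_type, feature, models):
        if self._models is None:  # first call only
            self._models = defaultdict(list)
            for concept, blob, f_type in models:
                weights = [float(w) for w in blob.split(",")]
                self._models[f_type].append((concept, weights))
            self.model_loads += 1
        return [(concept, sum(f * w for f, w in zip(feature, weights)))
                for concept, weights in self._models.get(feature_type, [])]

def cross_apply(features, rvf, models):
    """Feed Features one tuple at a time; pass Models as a whole relation."""
    for feature_id, image_id, f_type, feature in features:
        for concept, nearness in rvf(feature_id, f_type, feature, models):
            yield image_id, concept, nearness

features = [
    (1, "img1", "color", [1.0, 0.0]),
    (2, "img1", "color", [0.0, 1.0]),
    (3, "img2", "color", [1.0, 1.0]),
]
models = [("sky", "1.0,0.0", "color"), ("grass", "0.0,1.0", "color")]

classify2 = Classify2()
groups = defaultdict(list)
for image_id, concept, nearness in cross_apply(features, classify2, models):
    groups[(image_id, concept)].append(nearness)
averages = {key: sum(v) / len(v) for key, v in groups.items()}
print(classify2.model_loads, averages)
```

The function is called once per feature (3 times), yet the Models relation is loaded only once.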

RVF Container
  • An extension of the query executor for supporting RVF execution
  • Invocation pattern-specific
  • Argument evaluation
  • Return value wrapping
  • Memory context switching
  • Data conversion
  • Initial data preparation
  • Cross-call data passing
  • Final cleanup
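
A few of the container duties listed above (initial data preparation, data conversion, cross-call data passing, final cleanup) can be sketched in Python. This is a toy analogy only: `PerTupleContainer` and its hook names are hypothetical, and a real container lives inside the query executor and also handles memory contexts and return value wrapping, which are omitted here.

```python
class PerTupleContainer:
    """Hypothetical per-tuple container: the user supplies only init/call/cleanup hooks."""

    def __init__(self, init, call, cleanup, convert=float):
        self.init = init
        self.call = call
        self.cleanup = cleanup
        self.convert = convert

    def run(self, rows):
        state = self.init()  # initial data preparation
        results = []
        try:
            for row in rows:
                args = [self.convert(v) for v in row]    # data conversion
                results.extend(self.call(state, *args))  # state carries cross-call data
        finally:
            self.cleanup(state)  # final cleanup, even if a call fails
        return results

events = []

def init():
    events.append("init")
    return {"factor": 10.0, "calls": 0}

def scale(state, value):
    """User logic: no knowledge of how rows are fetched or converted."""
    state["calls"] += 1
    return [(value * state["factor"],)]

def cleanup(state):
    events.append(f"cleanup after {state['calls']} calls")

container = PerTupleContainer(init, scale, cleanup)
out = container.run([("1",), ("2.5",)])  # raw string inputs, converted by the container
print(out, events)
```

The user function `scale` never touches row fetching or conversion, which is the separation of concerns the container is meant to provide.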

Performance Gain in SVM Classification by Using RVF
  • The SVM query using the RVF outperforms the one using a conventional scalar UDF

Support In-RVF Data-Parallel SVM Learning

    INSERT INTO Models
    SELECT modelID + 1, feature_type, concept_name,
           svm_learning(
             SELECT feature, nearness
             FROM TrainFeatures f, TrainLables l
             WHERE l.imageID = f.imageID
               AND l.concept = concept_name
               AND f.featureType = feature_type)
    FROM Models
    WHERE modelID = (SELECT max(modelID) FROM Models);

  • SVM learning speed-up in multi-core RVF
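
Because the learning function receives its training set as a whole relation, it is free to partition that relation and process the parts concurrently. The Python sketch below shows only that partition-and-merge shape. It is not SVM learning: per-label centroids are substituted as a deliberately simple stand-in, and `partial_sums`, `learn_centroids`, and the sample rows are hypothetical. Note also that threads in CPython do not give true multi-core speed-up for pure-Python work; a native implementation would use real parallel workers.

```python
from concurrent.futures import ThreadPoolExecutor

def partial_sums(chunk):
    """Per-partition work: sum the feature vectors separately for each label."""
    sums = {}
    for feature, label in chunk:
        total, count = sums.get(label, ([0.0] * len(feature), 0))
        sums[label] = ([t + f for t, f in zip(total, feature)], count + 1)
    return sums

def learn_centroids(rows, workers=4):
    """Stand-in for a learning function: partition the input relation, process
    partitions concurrently, then merge the partial results."""
    chunks = [rows[i::workers] for i in range(workers)]
    chunks = [c for c in chunks if c]  # drop empty partitions
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sums, chunks))

    merged = {}
    for part in partials:
        for label, (total, count) in part.items():
            m_total, m_count = merged.get(label, ([0.0] * len(total), 0))
            merged[label] = ([a + b for a, b in zip(m_total, total)], m_count + count)
    return {label: [t / count for t in total]
            for label, (total, count) in merged.items()}

# Training relation: (feature, nearness), with nearness used as a +1/-1 label
train = [
    ([1.0, 0.0], 1),
    ([3.0, 0.0], 1),
    ([0.0, 2.0], -1),
    ([0.0, 4.0], -1),
]
centroids = learn_centroids(train, workers=2)
print(centroids)
```

A tuple-at-a-time scalar UDF has no such opportunity, since it never sees more than one row of its input at once.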

Summary
  • Video analysis system inside a database engine
      - Leverages UDFs to push down video analytics
  • RVFs, a language-level extension
      - Improve the capability of application modeling
      - Increase execution efficiency and cache use
      - Make it possible to explore computation parallelism
  • RVF container and its associated APIs
      - Separate analytics logic from system administration and programming efforts
  • Prototyped on PostgreSQL
