Scaling-up and Speeding-up Video Analytics Inside Database Engine
Qiming Chen1, Meichun Hsu1, Rui Liu2, and Weihong Wang2
1 HP Labs, Palo Alto, California, USA
2 HP Labs, Beijing, China
Hewlett Packard Co.
Motivation
- Video has become an indispensable carrier of information
  - For business perception, decision and action
- Existing video analysis applications generally fail to scale
  - Database is treated as a storage engine rather than a computation engine
  - Transfer of massive amounts of data is the bottleneck
- A unified platform is required by
  - The demand for near real-time responses to enable Operational BI
  - Data-intensive transformation and analysis
Our Approach
- Push down video processing to database engine
  - Faster data access, less data transfer
- User Defined Functions (UDFs) as wrappers of video analysis and search operations
Problems with UDF (1)
- Lack of formal support for relational input and output
  - Unaware of relation schema
  - Unable to model complex applications
  - Unable to be composed with relational operators in a SQL query
- Typically executed in the tuple-wise pipeline in query processing
  - Performance penalty for certain applications
  - Prohibits data-parallel computation inside the function body
Problems with UDF (2)
- Dilemma between UDF execution efficiency and ease of coding
  - For efficiency, a UDF must use system-internal data objects and system calls
  - Encoding DBMS data into strings to pass to UDFs incurs significant overhead
Our Solutions
- Supporting Relation-Valued Functions (RVF) at SQL level
  - E.g. SVM classifier as RVF
  - Relations as input and output
  - Easier application modeling
  - Higher execution efficiency
  - Makes it possible to exploit parallelism
- RVF invocation pattern
  - Mechanisms for handling RVF input/output
  - High-level APIs are provided
- Invocation pattern-oriented RVF containers
  - Support RVF running in query processing
9/1/2009
Video Pattern Recognition Process
Video Retrieval Process
Video Classification by SVM
- Tables:
  - Features [featureID, imageID, featureType, feature]
  - Models [modelID, featureType, concept, model]
  - Labels [imageID, concept, nearness]
SVM by Scalar UDF: the Inefficiency
- Classify using conventional scalar UDF

    SELECT imageID, concept, AVG (nearness)
    FROM (SELECT imageID, featureID, concept,
                 classify0 (f.featureType, m.concept, f.feature, m.model) AS nearness
          FROM Features f, Models m
          WHERE f.featureType = m.featureType)
    GROUP BY imageID, concept;

- For each feature of each image, its nearness score to each concept is computed
- The resulting nearness measures are aggregated by an average function
- Inefficiency of execution
  - Model cannot be cached
  - Model is retrieved for each feature
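The inefficiency can be illustrated outside any DBMS with a minimal Python sketch. It assumes a toy `decode_model` step (hypothetical, standing in for deserializing an SVM model) and a simple dot-product score in place of a real SVM; neither comes from the original system. The only purpose is to count how often the model is decoded when a scalar function is called once per joined row.

```python
# Toy illustration: a scalar UDF sees one joined row at a time,
# so it has to decode the model on every call.
decode_calls = 0

def decode_model(blob):
    """Hypothetical stand-in for deserializing an SVM model."""
    global decode_calls
    decode_calls += 1
    return [float(x) for x in blob.split(",")]

def classify0(feature_type, concept, feature, model_blob):
    """Scalar-UDF style: every argument arrives as a plain value."""
    weights = decode_model(model_blob)  # repeated for every joined row
    return sum(w * x for w, x in zip(weights, feature))

# Features: (featureID, imageID, featureType, feature)
features = [
    (1, "img1", "color", [1.0, 0.0]),
    (2, "img1", "color", [0.0, 1.0]),
    (3, "img2", "color", [1.0, 1.0]),
]
# Models: (modelID, featureType, concept, model)
models = [
    (1, "color", "sky", "0.5,0.5"),
    (2, "color", "grass", "1.0,-1.0"),
]

# Equivalent of the join on featureType plus one scalar call per row.
rows = [
    (img, fid, concept, classify0(ftype, concept, feat, blob))
    for (fid, img, ftype, feat) in features
    for (_mid, mtype, concept, blob) in models
    if ftype == mtype
]

# Equivalent of GROUP BY imageID, concept with AVG(nearness).
groups = {}
for img, _fid, concept, nearness in rows:
    groups.setdefault((img, concept), []).append(nearness)
averages = {key: sum(vals) / len(vals) for key, vals in groups.items()}

print(decode_calls)  # one decode per joined row, not one per model
```

With three features and two models of the same type, the model is decoded once per (feature, model) pair, even though only two distinct models exist.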
RVFs as Relational Operators
- A simple RVF definition

    DEFINE RVF f (R1, R2, k) RETURN R3 {
        Relation R1 (/*schema*/);
        Relation R2 (/*schema*/);
        int k;
        Relation R3 (/*schema*/);
        PROCEDURE fn(/*dll name*/);
        RETURN MODE SET_MODE;
        INVOCATION PATTERN BLOCK
    }

- RVF can be naturally composed with relational operators or sub-queries

    SELECT * FROM RVF1(RVF2(Q1, Q2), Q3);
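The composition property can be sketched in Python by modeling a relation as a list of dictionaries and an RVF as a function from relations to a relation. The functions `rvf2` (a join-like step) and `rvf1` (a filter-like step) are hypothetical examples chosen for illustration, not RVFs from the original work.

```python
# A relation is modeled as a list of dict rows;
# an RVF maps one or more relations to a relation.
def rvf2(q1, q2):
    """Hypothetical RVF: pair rows of q1 and q2 that share the same key."""
    by_key = {}
    for row in q2:
        by_key.setdefault(row["key"], []).append(row)
    return [
        {"key": a["key"], "x": a["x"], "y": b["y"]}
        for a in q1
        for b in by_key.get(a["key"], [])
    ]

def rvf1(r, q3):
    """Hypothetical RVF: keep rows of r whose key appears in q3."""
    allowed = {row["key"] for row in q3}
    return [row for row in r if row["key"] in allowed]

q1 = [{"key": 1, "x": "a"}, {"key": 2, "x": "b"}]
q2 = [{"key": 1, "y": 10}, {"key": 2, "y": 20}]
q3 = [{"key": 2}]

# Mirrors SELECT * FROM RVF1(RVF2(Q1, Q2), Q3)
result = rvf1(rvf2(q1, q2), q3)
print(result)
```

Because both input and output are relations, the output of one function can feed another or a relational operator without any string encoding in between.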
SVM by Relation-Valued Function

    SELECT imageID, concept, AVG(nearness)
    FROM (SELECT imageID, featureID, concept, nearness
          FROM classify1(
              SELECT * FROM Features,
              SELECT concept, model, featureType FROM Models))
    GROUP BY imageID, concept;
RVF Invocation Patterns
- Invocation pattern
  - Mechanism to deal with input/output of RVF
  - Generalization of the limited forms
- Purposes
  - Ensuring that its interaction with the query executor is defined at a high level
  - Making it possible to provide high-level APIs
  - Shielding UDF developers from DBMS system internal details
Patterns Defined
- Basic pattern
  - Per-tuple pattern
  - Block pattern
- Complex pattern
  - CartProdProbe (Cartesian product probe)
CartProdProbe Pattern

    SELECT r.imageID, r.concept, AVG(r.nearness)
    FROM (Features f CROSS APPLY classify2 (f.featureID, f.featureType, f.feature,
              SELECT concept, model, featureType FROM Models)) r
    GROUP BY r.imageID, r.concept;

- Features table is fed into the RVF tuple by tuple; Models table is fed in as a whole
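A minimal Python sketch of this pattern follows. The driver `cart_prod_probe` and the helper `prepare_models` are hypothetical names introduced here, and a dot product stands in for the SVM decision function. The sketch shows only the control flow: the whole-relation argument is prepared once, then each probe tuple is scored against the cached result.

```python
def cart_prod_probe(probe_rows, whole_relation, prepare, per_tuple):
    """Hypothetical driver: prepare the whole-relation argument once,
    then call per_tuple for each probe row and yield every returned row."""
    state = prepare(whole_relation)  # runs once, before the first probe row
    for row in probe_rows:
        yield from per_tuple(row, state)

prepare_calls = 0

def prepare_models(models):
    """Index models by featureType; counted to show it runs only once."""
    global prepare_calls
    prepare_calls += 1
    index = {}
    for concept, weights, ftype in models:
        index.setdefault(ftype, []).append((concept, weights))
    return index

def classify2(feature_row, model_index):
    """Score one feature against every cached model of the same type."""
    fid, img, ftype, feat = feature_row
    for concept, weights in model_index.get(ftype, []):
        nearness = sum(w * x for w, x in zip(weights, feat))
        yield (img, fid, concept, nearness)

# Features: (featureID, imageID, featureType, feature)
features = [
    (1, "img1", "color", [1.0, 0.0]),
    (2, "img1", "color", [0.0, 1.0]),
    (3, "img2", "color", [1.0, 1.0]),
]
# Projection of Models: (concept, model, featureType)
models = [
    ("sky", [0.5, 0.5], "color"),
    ("grass", [1.0, -1.0], "color"),
]

scored = list(cart_prod_probe(features, models, prepare_models, classify2))
print(prepare_calls, len(scored))
```

Compared with a scalar function called per joined row, the Models side is touched once regardless of how many feature tuples are probed.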
RVF Container
- An extension of query executor for supporting RVF execution
- Invocation pattern-specific
  - Argument evaluation
  - Return value wrapping
  - Memory context switching
  - Data conversion
  - Initial data preparation
  - Cross-call data passing
  - Final cleanup
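The division of labor between a container and the user function can be sketched in Python as below. The class `RVFContainer` and its `init`/`call`/`cleanup` hooks are hypothetical and do not reproduce the original APIs; memory context switching has no counterpart in this sketch and is omitted.

```python
class RVFContainer:
    """Hypothetical container: owns the call lifecycle around a user function."""

    def __init__(self, init, call, cleanup):
        self.init, self.call, self.cleanup = init, call, cleanup

    def run(self, input_rows):
        context = self.init()  # initial data preparation
        output = []
        try:
            for raw in input_rows:
                args = tuple(raw)  # argument evaluation and data conversion
                result = self.call(context, *args)  # context carries cross-call data
                output.append({"value": result})  # return value wrapping
        finally:
            self.cleanup(context)  # final cleanup, even if a call fails
        return output

log = []

def init():
    log.append("init")
    return {"total": 0}

def running_sum(context, x):
    """User logic only: no knowledge of how rows arrive or leave."""
    context["total"] += x
    return context["total"]

def cleanup(context):
    log.append("cleanup")

container = RVFContainer(init, running_sum, cleanup)
out = container.run([(1,), (2,), (3,)])
print(out, log)
```

The user function contains only the analytics logic; everything about how arguments arrive, how state survives between calls, and how results are wrapped lives in the container.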
Performance Gain in SVM Classification by Using RVF
- SVM query using RVF outperforms that using conventional scalar UDF
Support In-RVF Data Parallelism: SVM Learning

    INSERT INTO Models
    SELECT modelID + 1, feature_type, concept_name, svm_learning (
        SELECT feature, nearness
        FROM TrainFeatures f, TrainLables l
        WHERE l.imageID = f.imageID AND l.concept = concept_name
          AND f.featureType = feature_type)
    FROM Models
    WHERE modelID = (SELECT max(modelID) FROM Models);

- SVM learning is sped up in a multi-core RVF
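Because the function receives its input as a whole relation, it is free to partition the rows and process the partitions concurrently. The Python sketch below shows that structure only: it computes a mean feature vector from per-partition partial sums, which is a deliberately simple stand-in and not SVM training. The function names are hypothetical, and a thread pool is used for brevity; in CPython, threads illustrate the structure but do not give a CPU speed-up for pure-Python arithmetic.

```python
from concurrent.futures import ThreadPoolExecutor

def partial_sum(chunk):
    """Sum the feature vectors in one partition; return (sums, row count)."""
    total = [0.0] * len(chunk[0])
    for vec in chunk:
        for i, x in enumerate(vec):
            total[i] += x
    return total, len(chunk)

def parallel_mean(rows, workers=4):
    """Hypothetical in-function data parallelism over a non-empty input relation."""
    size = max(1, (len(rows) + workers - 1) // workers)
    chunks = [rows[i:i + size] for i in range(0, len(rows), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(partial_sum, chunks))
    count = sum(n for _, n in parts)
    dim = len(parts[0][0])
    # Combine the per-partition sums into one global mean vector.
    return [sum(p[0][i] for p in parts) / count for i in range(dim)]

rows = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
mean = parallel_mean(rows, workers=2)
print(mean)
```

A function invoked tuple by tuple never sees more than one row at a time, so this kind of partitioning is only available when the whole relation is passed in.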
Summary
- Video analysis system inside a database engine
  - Leverage UDFs to push down video analytics
- RVFs, a language-level extension
  - Improve the capability of application modeling
  - Increase execution efficiency and cache use
  - Make it possible to exploit computation parallelism
- RVF container and its associated APIs
  - Separate analytics logic from system administration and programming efforts
- Prototyped on PostgreSQL

