Eye in the Sky: Real-time Drone Surveillance System
(DSS) for Violent Individuals Identification using
ScatterNet Hybrid Deep Learning Network
Amarjot Singh et al.

????? ???????

Data Science & Business Analytics ???
0. Summary
1. Feature Pyramid Network
2. SHDL networks - Human pose estimation
3. Support Vector Machine - Detect violent individuals
4. Aerial Violent Individual(AVI) dataset
5. Experiments
0. Summary
1. Detect human regions with FPN
2. Regress keypoint coordinates from the human regions with the SHDL network
3. Classify violent behavior using the key-points
1. FPN: Feature Pyramid Networks
"Eye in the Sky" ???? human region? ???? ??
?FPN ??? ??? ??? object detection task?? pyramid ??? ? ?? ?? ??
??? ??? ??, ??? ???? ??
??? FPN ??? Faster R-CNN ??? ???? ?? cost ???? ?? ??? ??
- Runs at 6 FPS on a GPU (COCO dataset)
?Pyramid ??? ???? ?? ?? ? ??
- Featurized image pyramid
- Single feature map
- Pyramidal feature hierarchy
- Feature Pyramid Network
1. FPN: Feature Pyramid Networks
Pyramid ?? (1) Featurized image pyramid
?hand craft feature? ???? ?? ?????
??? ??, scale? pyramid ??
???? pyramid? ? level???
????? feature ??
?? ?? ???? ?? Object detection ??
?????? ??, ?? ??
1. FPN: Feature Pyramid Networks
Pyramid ?? (2) Single feature map
?????? feature? ???? ?????
??? feature layer? ???? ???? ??
?CNN? ?? ??? ??, ??, ?? ???
invariant?? ??? ??? ????? ?? ??
???? ??? ??? ???? ???? ????
?? ?? ??? ??? ??? ? ??
1. FPN: Feature Pyramid Networks
Pyramid ?? (3) Pyramid feature hierarchy
?Multi-scale feature representation ??
??? ???? ?? ??? ? ?? ???? ?
?feature? ???? ? ??? ?? ???
????? object detection ??
?????? "SSD: Single Shot MultiBox Detector"
1. FPN: Feature Pyramid Networks
Pyramid ?? (4) Feature Pyramid Network
?? ??? FPN ??
??? ???? Feature? ???? ??????
????? FPN? Upsampling ??? ??
?Feature map? Upsampling ?????
lateral connection?? ??
?? ???? ????? Object detection
?Multi-scale feature representation????
?? ? ? ????? ???? ??
(Spatial Info)
(Semantic Info)
1. FPN: Feature Pyramid Networks
FPN ?? ?? ??
?????? ResNet ??? ??
???? stride ??? receptive field ??
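- Upsampling uses simple nearest-neighbor upsampling (factor of 2)
- The corresponding feature map passes through a 1x1 conv, followed by element-wise addition
- A 3x3 conv is then applied to produce the final feature map P
- Two 1x1 convs are applied to P to predict class and bbox
Source: github.com/hwkim94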
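The merge step described on this slide (nearest-neighbor upsampling by 2, a 1x1 conv on the lateral feature map, then element-wise addition) can be sketched in plain Python. This is a toy illustration on nested lists, not the FPN implementation: the function names, the `[C][H][W]` layout, and the example weights are all assumptions, and the final 3x3 conv is left out for brevity.

```python
def conv1x1(fmap, weights):
    """1x1 convolution. fmap is [C_in][H][W]; weights is [C_out][C_in]."""
    height, width = len(fmap[0]), len(fmap[0][0])
    return [
        [[sum(w[c] * fmap[c][y][x] for c in range(len(fmap)))
          for x in range(width)]
         for y in range(height)]
        for w in weights
    ]


def upsample_nearest_2x(fmap):
    """Nearest-neighbor upsampling by a factor of 2 in both spatial dims."""
    out = []
    for channel in fmap:
        rows = []
        for row in channel:
            wide = [v for v in row for _ in range(2)]  # repeat each column
            rows.append(wide)
            rows.append(list(wide))                    # repeat each row
        out.append(rows)
    return out


def add(a, b):
    """Element-wise addition of two [C][H][W] maps with identical shapes."""
    return [[[a[c][y][x] + b[c][y][x] for x in range(len(a[c][y]))]
             for y in range(len(a[c]))]
            for c in range(len(a))]


def merge_level(top_down, bottom_up, lateral_weights):
    """Upsample the coarser map and add the 1x1-projected lateral map."""
    return add(upsample_nearest_2x(top_down), conv1x1(bottom_up, lateral_weights))


# Hypothetical toy inputs: a 1-channel 2x2 coarse map and a 2-channel 4x4 fine map.
coarse = [[[1.0, 2.0], [3.0, 4.0]]]
fine = [[[1.0] * 4 for _ in range(4)], [[2.0] * 4 for _ in range(4)]]
lateral = [[0.5, 0.25]]  # projects 2 channels down to 1

merged = merge_level(coarse, fine, lateral)
```

The 1x1 conv is what makes the addition possible: it brings the lateral map to the same channel count as the upsampled map.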
1. FPN: Feature Pyramid Networks
FPN Application ??
- RPN: the predictor head is attached to each level of the FPN
- 5 levels in total, each with 3 anchor ratios {1:2, 1:1, 2:1}
-> 15 anchors
- IoU threshold
  - 0.7 or above: positive
  - 0.3 or below: negative
- The predictor head parameters are shared across all levels
- Pretrained on the MS COCO 80-category detection dataset
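The anchor count and IoU labeling rule on this slide can be sketched as follows. The box format `(x1, y1, x2, y2)`, the function names, and the "ignored" label for anchors that fall between the two thresholds are assumptions made for illustration; the slide states only the 0.7 and 0.3 thresholds.

```python
LEVELS = 5
RATIOS = ["1:2", "1:1", "2:1"]

# One anchor shape per (level, ratio) pair: 5 x 3 = 15 anchors.
ANCHORS = [(level, ratio) for level in range(LEVELS) for ratio in RATIOS]


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def label_anchor(anchor, gt_boxes, pos_thr=0.7, neg_thr=0.3):
    """Label an anchor by its best IoU against the ground-truth boxes."""
    best = max((iou(anchor, gt) for gt in gt_boxes), default=0.0)
    if best >= pos_thr:
        return "positive"
    if best <= neg_thr:
        return "negative"
    return "ignored"  # assumption: in-between anchors are not used for training


gt = [(0, 0, 10, 10)]
labels = [label_anchor(box, gt) for box in [(0, 0, 10, 10), (0, 0, 10, 5), (20, 20, 30, 30)]]
```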
2. SHDL : ScatterNet Hybrid Deep Network
(1) ScatterNet : ??? ?? ??? [Dual-tree wavelet ScatterNet]? ??? ??
?CNN ???? Input
image? ???? ?
?? Conv block ??
(Coarse to fine)
?DT-CWT ??? ???
feature ???? 2?
layer? ??
?Hand crafted feature
? ? ????, CNN?
??? ?? ??? ??
??? ?? ??
2. SHDL : ScatterNet Hybrid Deep Network
(1) ScatterNet : ??? ?? ??? [Dual-tree wavelet ScatterNet]? ??? ??
? Input signal x? dual-tree complex wavelets? ???? feature ??
?j : scale ??. 2? scale ??
- r: rotation (orientation). 6 orientations in total: 15°, 45°, 75°, 105°, 135°, 165°
??? ???? scale, rotation? ????, ?? wavelet transform ??
?L2 normalization?, Log transform, Smoothing? ???? ??
??? ???? ? ???? coefficients? concatenate ? vector
2. SHDL : ScatterNet Hybrid Deep Network
(2) Regression Network : CNN ?? ??
??? ScatterNet? output? ???? ?? CNN
???? ??
- 4 conv blocks, each: { Convolution, ReLU, Pooling, Normalization }
- 2 fully connected layers (+ dropout): 1024 and 2048 hidden units
2. SHDL : ScatterNet Hybrid Deep Network
(2) Regression Network : ?? ??
- (x, y) coordinates for 14 key-points -> regression of 28 values
- Stochastic gradient descent
??? layer? output? prior? ???? PCANet ????? ??
- Tukey's Biweight loss function ?? - ???? ??
\[
f(x) =
\begin{cases}
x \left( 1 - \dfrac{x^2}{c^2} \right)^2 & \text{for } |x| < c \\
0 & \text{for } |x| > c
\end{cases}
\]
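A minimal sketch of the piecewise function on this slide, assuming the standard Tukey biweight form x(1 - x^2/c^2)^2 inside |x| < c and 0 outside. The default tuning constant c = 4.6851 is a commonly used value and is an assumption here; the slide does not state which c was used.

```python
def tukey_biweight(x, c=4.6851):
    """Tukey's biweight function: residuals beyond c contribute nothing."""
    if abs(x) < c:
        return x * (1.0 - (x / c) ** 2) ** 2
    return 0.0


# Small residuals pass through almost unchanged; large ones are cut to zero.
small = tukey_biweight(0.1)
large = tukey_biweight(10.0)
```

Because the function returns 0 for large residuals, a single badly mislabeled key-point cannot dominate the update.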
3. Violent individual classification
An SVM trained on the key-points classifies 6 classes (5 violent + 1 neutral)
?SHDL network?? ??? keypoint?
?? ???? SVM ??
?6? ??? ?? : 5?? ?? ??, ?? ??
??? ??
- Gaussian kernel
- C = 14
- gamma = 0.00002
- 5-fold cross validation
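To make the hyperparameters above concrete, here is a standard-library sketch of the Gaussian (RBF) kernel with gamma = 0.00002 and of a 5-fold index split. It does not train an SVM; in practice C and gamma would be passed to an SVM solver. The helper names and the interleaved fold assignment are assumptions.

```python
import math

C = 14           # soft-margin penalty; would be handed to the SVM solver
GAMMA = 0.00002  # Gaussian kernel width parameter


def gaussian_kernel(u, v, gamma=GAMMA):
    """k(u, v) = exp(-gamma * ||u - v||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * squared_distance)


def k_fold_indices(n_samples, k=5):
    """Return k (train_indices, validation_indices) pairs."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    splits = []
    for i in range(k):
        train = sorted(j for fold in (folds[:i] + folds[i + 1:]) for j in fold)
        splits.append((train, folds[i]))
    return splits


splits = k_fold_indices(10)
similarity = gaussian_kernel([0.0, 0.0], [100.0, 0.0])  # exp(-0.2)
```

With a gamma this small, the kernel decays slowly: two feature vectors 100 units apart still have a similarity of about 0.82.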
4. Aerial Violent Individual(AVI) Dataset
?? task? ???? ?? ?? ???? ??
- 2,000 images (up to 10 people each)
- 10,863 humans in total
- 48% (5,124) are engaged in violent activities
- 5 violent activities: Punching, Stabbing, Shooting, Kicking, Strangling
- 14 key-point annotations per human
- Captured by drone from heights of 2, 4, 6, and 8 m
???? ?? ??? ????, ??? ????
?? ???? ??? ? ?? ??? ??
5. Experiments
(1) Human detection with FPN: 97.2% accuracy
- Pretrained on the MS COCO dataset, then fine-tuned
- Of the 10,863 humans in the AVI dataset, 10,558 were detected -> 97.2%
(2) SHDL ?? ??
- Human regions detected by the FPN are resized to 120 x 80 and normalized before use
- The 10,558 regions are split train:validation:test = 6:2:2
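The two figures on this slide can be checked with a few lines of arithmetic. The detection rate follows directly from the counts; the per-split sizes are not given in the slides, so the rounding rule below (floor for train and validation, remainder to test) is an assumption.

```python
detected, total = 10_558, 10_863
detection_rate = round(100 * detected / total, 1)  # percent, one decimal


def split_counts(n, ratios=(6, 2, 2)):
    """Split n items by integer ratios; the remainder goes to the last split."""
    whole = sum(ratios)
    train = n * ratios[0] // whole
    validation = n * ratios[1] // whole
    test = n - train - validation
    return train, validation, test


train_n, val_n, test_n = split_counts(detected)
```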
5. Experiments
(2) SHDL Key-point regression ??
?Distance from GT : Ground Truth ??? ?? ??? ?? ???? ???? ??? ???
???? ??? ???? ???? 5?? ???? ???? ? ?? ??? ??
5. Experiments
(2) SHDL Key-point regression ??
?Distance from GT ?? d=5? ???? ?? accuracy
??? ? ?? ??? ?? ?? ???? ??
- CN: Coordinate network
- CNE: Coordinate extended network

Method      SHDL    CN      CNE     SpatialNet
Accuracy    87.6%   79.6%   80.1%   83.4%
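The "Distance from GT" accuracy used on these slides can be sketched as the fraction of predicted key-points that lie within distance d of their ground-truth location. That the distance is Euclidean and in pixels, and that the threshold is inclusive, are assumptions; the function name is hypothetical.

```python
import math


def accuracy_within(pred, gt, d=5.0):
    """Fraction of key-points whose prediction is within d of the ground truth."""
    hits = sum(
        1
        for (px, py), (gx, gy) in zip(pred, gt)
        if math.hypot(px - gx, py - gy) <= d
    )
    return hits / len(gt)


# Toy example: distances are 0, 5, and about 14.1, so two of three count as hits.
pred = [(0.0, 0.0), (3.0, 4.0), (10.0, 10.0)]
gt = [(0.0, 0.0), (0.0, 0.0), (0.0, 0.0)]
acc = accuracy_within(pred, gt, d=5.0)
```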
5. Experiments
(3) Violent individuals identification with SVM
?AVI ????? ??
?? ??? ?? ??
Method   Punching   Kicking   Strangling   Shooting   Stabbing
DSS      89%        94%       85%          82%        92%
Surya    80%        84%       73%          73%        79%

Number of violent individuals per image
Method   1       2       3       4       5
DSS      94.1%   90.6%   88.3%   87.8%   84.0%
?????? ????
??? ??????
??? ??
3D human pose estimation in video with temporal
convolutions and semi-supervised training
Dario Pavllo et al.

????? ???????

Data Science & Business Analytics ???
1. Introduction
2. Temporal Dilated Convolutional model
3. Semi-supervised approach
4. Experiments
1. Introduction
Dilated Convolution? ??? 2d->3D mapping Semi-supervised ??
??? : 3D human pose estimation in video
- Problem formulation: Mapping
  - 2D keypoint detection -> 3D pose estimation
?2D?? 3D???? mapping? ? ???? ?? ???? RNN ??? ??
- Main contribution
  - 3D human pose estimation in video based on dilated temporal convolutions on 2D keypoint trajectories
  - A semi-supervised approach which exploits unlabeled video
2. Temporal dilated convolutional model
2D joint coordinates? Sequence? ?? 3D joint? ??? ??
2. Temporal dilated convolutional model
?? ??
- Input data: 243 (frames) x 34 (17 joints * 2 dims (x, y))
- 4 residual blocks, 0.25 dropout rate, 243 frames, filter size 3, output feature 1024
- TCN layer notation
  - e.g. 2J, 3d1, 1024 => 2J input channels, conv filter size 3, dilation 1, 1024 output channels
- Because VALID convolutions are used, tensor shapes do not match at the skip connections
-> The residual is sliced to match the shape of the block output.
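The 243-frame figure can be reproduced from the filter size and the dilation schedule. The sketch below assumes one initial layer with dilation 1 (the "3d1" example) followed by four residual blocks whose dilations triple each time (3, 9, 27, 81); that schedule and the helper names are assumptions, since the slide lists only the block count, filter size, and frame count.

```python
KERNEL = 3
DILATIONS = [1, 3, 9, 27, 81]  # assumed: first layer, then 4 residual blocks


def receptive_field(kernel, dilations):
    """Number of input frames that influence one output frame."""
    return 1 + sum((kernel - 1) * d for d in dilations)


def valid_lengths(n_frames, kernel, dilations):
    """Sequence length after each VALID (unpadded) dilated convolution."""
    lengths = [n_frames]
    for d in dilations:
        lengths.append(lengths[-1] - (kernel - 1) * d)
    return lengths


def slice_residual(frames, kernel, dilation):
    """Trim the skip-connection input symmetrically to the conv output length."""
    trim = (kernel - 1) * dilation // 2
    return frames[trim:len(frames) - trim]


rf = receptive_field(KERNEL, DILATIONS)
lengths = valid_lengths(243, KERNEL, DILATIONS)
residual = slice_residual(list(range(241)), KERNEL, dilation=3)
```

Each VALID convolution shortens the sequence by (kernel - 1) * dilation frames, which is why the residual has to be sliced by the same amount before it is added back.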
2. Temporal dilated convolutional model
Normal convolution(Acausal) for train
???? ? ????? ???? ?? ??
2. Temporal dilated convolutional model
Causal convolution for test
?test ? ?? ?? ??? ??????? ????? ???? ??
2. Temporal dilated convolutional model
Padding with replica of the boundary frames
????? frame? ???? padding(?? ???? Acausal)
????? ? ?? ???? zero padding? ?? ? loss? ? ??? ?
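Padding with replicas of the boundary frames can be sketched as below. The symmetric split for the acausal case and left-only padding for the causal case follow from the definitions of those convolutions; the function names and the default receptive field of 243 are assumptions for illustration.

```python
def pad_replicate(frames, left, right):
    """Pad a sequence by repeating its first and last frames."""
    return [frames[0]] * left + list(frames) + [frames[-1]] * right


def pad_for_model(frames, receptive_field=243, causal=False):
    """Pad so that a VALID model emits one output per input frame."""
    total = receptive_field - 1
    if causal:
        # Causal: only past frames may be used, so all padding goes on the left.
        return pad_replicate(frames, total, 0)
    # Acausal: past and future context are split evenly around each frame.
    return pad_replicate(frames, total // 2, total - total // 2)


acausal = pad_for_model([0] * 10)
causal = pad_for_model([5, 6, 7], receptive_field=5, causal=True)
```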
3. Semi-supervised approach
Supervised, Unsupervised loss ? ?? ???? ??? ???
?Batch? Labeled, Unlabeled ??? ??
?Supervised loss
?Ground truth 3d joint ??
?Unsupervised loss(+Regularizer)
?Autoencoder ??? ??
?encoder: 3D pose estimator
- The predicted 3D joints are projected back to 2D and a reconstruction loss is computed
- An L2 loss on bone length is added
Reconstruction error
MPJPE (Mean Per-Joint Position Error): the mean Euclidean distance between the predicted and ground-truth joint positions
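MPJPE is the Euclidean distance between each predicted joint and its ground-truth position, averaged over joints. A minimal sketch for a single pose, with joints given as (x, y, z) tuples (the data layout is an assumption):

```python
import math


def mpjpe(pred, gt):
    """Mean per-joint position error for one pose."""
    distances = [math.dist(p, g) for p, g in zip(pred, gt)]
    return sum(distances) / len(distances)


# Toy pose with two joints: errors of 0 and 5, so the mean is 2.5.
pred_pose = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
gt_pose = [(0.0, 0.0, 0.0), (1.0, 3.0, 4.0)]
error = mpjpe(pred_pose, gt_pose)
```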
3. Semi-supervised approach
Trajectory model
?Trajectory model? 2D pose? ?????
=> 3D trajectory? ???? ????
?? ??? ??? 2D -> 3D mapping? ???
trajectory ??? ??
?Unlabeled data? back projection? ??
3D trajectory?? ???? reconstruct
?Back projection? ???? ?? ???
Reconstruction error
3. Semi-supervised approach
Loss function
?Supervised loss
- MPJPE against the 3D ground truth
?Global trajectory loss
?Camera?? Ground-truth depth??
??? ?? ?? ???? ??
- Uses the Weighted Mean Per-Joint Position Error (WMPJPE)
\[
E = \lVert f(x) - y \rVert
\]
Reconstruction error
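A hedged sketch of the weighted trajectory loss: each sample's position error is scaled by the inverse of its ground-truth depth from the camera, as in the Pavllo et al. paper, so that distant subjects contribute less. Treating the third coordinate as depth and the function name are assumptions.

```python
import math


def wmpjpe(pred_traj, gt_traj):
    """Position error weighted by inverse ground-truth depth (z), averaged."""
    errors = [
        math.dist(p, g) / g[2]  # g[2] is assumed to be the positive camera depth
        for p, g in zip(pred_traj, gt_traj)
    ]
    return sum(errors) / len(errors)


# Toy trajectory: errors 3 and 5 at depths 2 and 10 give weighted errors 1.5 and 0.5.
pred_traj = [(0.0, 0.0, 2.0), (0.0, 0.0, 10.0)]
gt_traj = [(0.0, 3.0, 2.0), (0.0, 5.0, 10.0)]
loss = wmpjpe(pred_traj, gt_traj)
```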
4. Experiments
(1) Datasets used: Human3.6M, HumanEva-I
- Human3.6M
  - 3.6 million video frames
  - 11 subjects (7 annotated with 3D poses)
  - Each subject performs 15 actions
??? ????
- 3 subjects, 3 actions (Walk, Jog, Box)
- Uses a 15-joint skeleton
4. Experiments
(3) 2D pose estimation : Mask R-CNN & Cascaded pyramid network
- Backbone model
  - Mask R-CNN with ResNet-101-FPN
  - Cascaded Pyramid Network with ResNet-50
??? ??
- Pre-trained on the MS COCO dataset
- Fine-tuned on Human3.6M
4. Experiments
(4) Results - Qualitative
- Top: video frames overlaid with the estimated 2D pose
- Bottom: 3D joint mapping
4. Experiments
(4) Results - Reconstruction error
??? Joint ??? Ground truth ??? ????? ??(MPJPE)
????? ? ?? ??? ??? ??,
? ?? ??? ?? [24]??? Ground truth? ??? ??
4. Experiments
(4) Results
Semi supervised ??? ?? ??? ?
? ? ?? ??? ??
[?? ??? ??]
Supervised ??? ?????
? ???? ?? 14.7mm ?? ?? ??

