10. 动画认识の流れ – Sparse, Dense and Deep
1) Laptev, I. and Lindeberg, T. "Space-Time Interest Points," International Conference on Computer Vision (ICCV), pp.432–439, 2003.
2) Laptev, I., Marszalek, M., Schmid, C. and Rozenfeld, B. "Learning realistic human actions from movies," IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.1–8, 2008.
3) Klaser, A., Marszalek, M. and Schmid, C. "A Spatio-Temporal Descriptor Based on 3D-Gradients," British Machine Vision Conference (BMVC), 2008.
4) Wang, H., Klaser, A., Schmid, C. and Liu, C.-L. "Action recognition by dense trajectories," IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.3169–3176, 2011.
5) Wang, H. and Schmid, C. "Action Recognition with Improved Trajectories," International Conference on Computer Vision (ICCV), pp.3551–3558, 2013.
6) Simonyan, K. and Zisserman, A. "Two-Stream Convolutional Networks for Action Recognition in Videos," Neural Information Processing Systems (NIPS), 2014.
7) Wang, L., Qiao, Y. and Tang, X. "Action Recognition with Trajectory-Pooled Deep-Convolutional Descriptors," IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
8) Tran, D., Bourdev, L., Fergus, R., Torresani, L. and Paluri, M. "Learning Spatiotemporal Features with 3D Convolutional Networks," International Conference on Computer Vision (ICCV), 2015.
9) Wang, L., Xiong, Y., Wang, Z., Qiao, Y., Lin, D., Tang, X. and Van Gool, L. "Temporal Segment Networks: Towards Good Practices for Deep Action Recognition," European Conference on Computer Vision (ECCV), 2016.
10) Carreira, J. and Zisserman, A. "Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset," IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
11) Hara, K., Kataoka, H. and Satoh, Y. "Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet?," IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
[Figure: timeline of video recognition approaches — Sparse Space-Time features → Dense Space-Time features → Deeply-Learned Representations]
34. Two-Stream ConvNets: Basic Facts
• Proposers
– Karen Simonyan (at Oxford when published; now at DeepMind)
– NIPS 2014
• Method
– Applies a CNN not only to RGB frames but also to flow images, onto which temporal information is projected (a minimal sketch follows the citation below)
Simonyan, K. and Zisserman, A. “Two-Stream Convolutional Networks for Action Recognition in Videos,” Neural Information
Processing Systems (NIPS), 2014.
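To make the two-stream idea concrete, here is a minimal PyTorch sketch, not the paper's exact architecture: one small CNN over a single RGB frame (spatial stream), another over a stack of optical-flow frames (temporal stream), fused by averaging class scores. All layer sizes, the flow-stack length, and the fusion choice are illustrative assumptions.

```python
import torch
import torch.nn as nn

def make_stream(in_channels: int, num_classes: int) -> nn.Module:
    """Small CNN; the spatial stream sees RGB (3 ch), the temporal stream
    sees L stacked flow fields (2*L ch: x and y components per frame)."""
    return nn.Sequential(
        nn.Conv2d(in_channels, 32, kernel_size=7, stride=2, padding=3),
        nn.ReLU(inplace=True),
        nn.MaxPool2d(2),
        nn.Conv2d(32, 64, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.AdaptiveAvgPool2d(1),
        nn.Flatten(),
        nn.Linear(64, num_classes),
    )

class TwoStreamNet(nn.Module):
    def __init__(self, num_classes: int = 101, flow_len: int = 10):
        super().__init__()
        self.spatial = make_stream(3, num_classes)               # appearance
        self.temporal = make_stream(2 * flow_len, num_classes)   # stacked flow

    def forward(self, rgb, flow):
        # Late fusion by averaging per-stream class scores (score averaging
        # is one of the fusion schemes discussed in the paper).
        return (self.spatial(rgb).softmax(-1)
                + self.temporal(flow).softmax(-1)) / 2

net = TwoStreamNet()
scores = net(torch.randn(1, 3, 224, 224), torch.randn(1, 20, 224, 224))
```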
40. Integrating IDT and Two-Stream ConvNets: TDD
• TDD (Trajectory-pooled Deep-convolutional Descriptors)
– Trajectory extraction is identical to IDT
– TDD then samples values from the convolutional feature maps along each trajectory (see the sketch after the figure below)
[Figure: pipeline comparison — IDT: feature extraction (HOG, HOF, MBH, Traj.) along trajectories → Fisher Vectors (FVs); TDD: feature extraction from conv maps (spa4, spa5, tem3, tem4) along the same trajectories → Fisher Vectors (FVs)]
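The core operation of TDD is sampling conv-map activations along IDT trajectories. Below is a minimal numpy sketch of that trajectory pooling under assumed shapes (per-frame feature maps plus one 15-frame trajectory of pixel coordinates); the real method additionally normalizes the feature maps before sampling and encodes the pooled descriptors with Fisher vectors.

```python
import numpy as np

def trajectory_pool(fmaps: np.ndarray, traj: np.ndarray, img_size=(240, 320)):
    """fmaps: (T, C, H, W) per-frame conv maps; traj: (T, 2) of (x, y) pixel
    coords. Returns a (T*C,) descriptor sampled along the trajectory."""
    T, C, H, W = fmaps.shape
    sy, sx = H / img_size[0], W / img_size[1]  # image -> map coordinate scale
    desc = []
    for t, (x, y) in enumerate(traj):
        # Nearest-neighbour sampling at the trajectory point (bilinear
        # interpolation would be a natural refinement).
        i = min(int(round(y * sy)), H - 1)
        j = min(int(round(x * sx)), W - 1)
        desc.append(fmaps[t, :, i, j])
    return np.concatenate(desc)

# Toy usage: a 15-frame trajectory (IDT's default length) over 64-channel maps.
d = trajectory_pool(np.random.rand(15, 64, 30, 40), np.random.rand(15, 2) * 200)
assert d.shape == (15 * 64,)
```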
48. What Actions are Needed? (ICCV 2017)
• What kinds of actions are needed for human action recognition?
– Offers recommendations for annotation and algorithm design
– Concludes that multi-label annotation, more detailed descriptions, and object / human-joint information are important
Experiments that inform strategy for dataset creation and method design
49. What Makes a Video a Video (CVPR 2018)
• Does video recognition actually capture motion at all?
– Selects / generates key frames from a video and recognizes from those
– Concludes that models are not learning motion but are in fact selecting frames that are easy to classify from appearance alone
Have effective motion features, in fact, still not been learned?
54. Problem Setting of the Proposed Method
• Insert a transitional action (TA; Transitional Action) between two actions
– The TA contains hints for prediction: recognition earlier in time than early action recognition
– Recognizing the TA is itself a prediction of the next action: more stable than action prediction
[Figure: timeline t1–t12 spanning "Walk straight" (Action) → "Walk straight – Cross" (Transitional action) → "Cross" (Action). 【Proposal】 short-term action prediction recognizes "cross" at time t5; 【Previous works】 early action recognition recognizes "cross" only at time t9, a gap of Δt]
55. Problem Setting of the Proposed Method (cont.)
Method                            Formulation
Action recognition                f(F^A_{1...t}) → A_t
Early action recognition          f(F^A_{1...t−L}) → A_t
Action prediction                 f(F^A_{1...t}) → A_{t+L}
Transitional action recognition   f(F^TA_{1...t}) → A_{t+L}
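The four formulations above differ only in what is observed and which label is output. A schematic Python sketch of that contrast, with illustrative names only (f stands for any sequence classifier; F_A and F_TA are per-frame features of the action and of the transitional action; none of this is the paper's implementation):

```python
def action_recognition(f, F_A, t):
    # f(F^A_{1...t}) -> A_t : label the action currently being observed
    return f(F_A[:t])

def early_action_recognition(f, F_A, t, L):
    # f(F^A_{1...t-L}) -> A_t : same label, but from a truncated observation
    return f(F_A[:t - L])

def action_prediction(f, F_A, t, L):
    # f(F^A_{1...t}) -> A_{t+L} : observe the current action, output the next
    return f(F_A[:t])

def transitional_action_recognition(f, F_TA, t, L):
    # f(F^TA_{1...t}) -> A_{t+L} : recognizing the TA itself implies the next action
    return f(F_TA[:t])
```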