Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields
|?ѧ?ѧԺ?ѧϵо
gUӑѧ
βо
?Ұ
Bibliographic information
• Paper: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields
  - https://arxiv.org/abs/1611.08050
• Authors: Zhe Cao, Tomas Simon, Shih-En Wei, Yaser Sheikh
  - The Robotics Institute, Carnegie Mellon University
• Submitted: 24 Nov 2016
• CVPR 2017 Oral
• Slides
• Video
• Unless otherwise noted, material is quoted from the paper, slides, and video above.
Ǥ뤳
? βλꥢ륿ƶǤ.
? ʸФ.
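Abstract
• Proposes a method for efficiently detecting the 2D poses of multiple people in an image.
• Key points
  - Part Affinity Fields (PAFs): learn to associate body parts with individuals.
  - Bottom-up approach: encodes global context, so high accuracy and realtime performance are kept regardless of the number of people.
  - Sequential Prediction with Learned Spatial Context: the CNN predictions are refined by repeating the same unit.
  - Jointly Learning Parts detection and Parts Association: part locations and their association are learned jointly.
• Results
  - 1st place in the COCO 2016 keypoints challenge; exceeds the previous SotA on the MPII Multi-Person benchmark in both efficiency and accuracy.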
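Introduction
• In prior work, detecting the body parts of individuals in complex multi-person scenes is known to be a hard problem.
  - The number of people and their scale are unknown.
  - People interact with each other.
  - Runtime grows with the number of people.
• Top-down approach: detect the people first, then estimate the pose of each person individually.
  - Drawback: if person detection fails, the pose cannot be estimated.
  - Drawback: runtime is proportional to the number of people.
• Bottom-up approach (the one used in this paper).
  - Advantage: robust to early commitment.
  - Advantage: runtime can be decoupled from the number of people.
  - Drawback: previous methods did not directly use global contextual cues from other body parts and other people, and the final inference that connects the parts was costly and inefficient.
• This paper takes a bottom-up approach and achieves SotA accuracy in multi-person pose estimation.
  - Part Affinity Fields (PAFs) represent the degree of association between parts: a set of 2D vector fields that encode the location and orientation of limbs.
  - Inferring the bottom-up representations of detection and association at the same time encodes enough global context. This gives high-quality results at low computational cost.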
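Method
• A feed-forward network predicts confidence maps $S$ of the body part locations (b) and affinity fields $L$ that encode the degree of association between parts (c).
  - $S = (S_1, S_2, \ldots, S_J)$ is a set of $J$ confidence maps, one for each of the $J$ parts.
    - $S_j \in \mathbb{R}^{w \times h}$, $j \in \{1 \ldots J\}$
  - $L = (L_1, L_2, \ldots, L_C)$ is a set of $C$ vector fields, one for each limb (pair of parts).
    - $L_c \in \mathbb{R}^{w \times h \times 2}$, $c \in \{1 \ldots C\}$
• Based on the confidence maps and affinity fields, bipartite matching (computing the association between two sets of parts) is performed (d).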
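Method > Simultaneous Detection and Association
• The network simultaneously predicts confidence maps for part detection and affinity fields that encode the association between parts.
• Two branches: [top] predicts the confidence maps, [bottom] predicts the affinity fields.
• The image is first processed by convolutional layers (initialized with the first 10 layers of VGG-19 and fine-tuned) to produce feature maps $F$, which become the input of Stage 1.
• Stage 1 produces the detection confidence maps $S^1 = \rho^1(F)$ and the part affinity fields $L^1 = \phi^1(F)$. ($\rho$ and $\phi$ are the CNNs of each stage.)
• In later stages, the two outputs of the previous stage are concatenated with the original feature maps $F$ to form the next input.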
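Method > Simultaneous Detection and Association
• As the stages progress, the confidence maps (figure) become more accurate.
• The first branch repeatedly predicts the part confidence maps and the second branch the PAFs. A loss function (L2 loss) is applied to each branch.
• Loss functions at stage $t$
  - $S^*$, $L^*$: ground-truth confidence maps and affinity fields.
  - $W(p)$: binary mask, 0 when pixel $p$ has no annotation. It avoids penalizing correct predictions.
• Overall objective: the sum of the losses over all stages.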
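The equation images on this slide did not survive extraction. As a reference sketch, the stage recursion and losses can be written as below, following the formulation in the cited paper and using the symbols defined above ($T$, the number of stages, is the only symbol not named on the slide).

```latex
\[
\begin{aligned}
S^t &= \rho^t(F, S^{t-1}, L^{t-1}), \qquad
L^t = \phi^t(F, S^{t-1}, L^{t-1}), \qquad \forall t \ge 2 \\
f_S^t &= \sum_{j=1}^{J} \sum_{p} W(p) \cdot \left\| S_j^t(p) - S_j^*(p) \right\|_2^2 \\
f_L^t &= \sum_{c=1}^{C} \sum_{p} W(p) \cdot \left\| L_c^t(p) - L_c^*(p) \right\|_2^2 \\
f &= \sum_{t=1}^{T} \left( f_S^t + f_L^t \right)
\end{aligned}
\]
```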
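Method > Confidence Maps for Part Detection
• To evaluate the loss in Eq. (5), the ground-truth maps $S^*$ are generated from the annotated 2D keypoints.
• With multiple people, the confidence map should have a peak for every visible part $j$ of every person $k$.
• First, an individual confidence map $S^*_{j,k}$ (ground-truth values) is generated for each person $k$.
  - With $x_{j,k} \in \mathbb{R}^2$ the ground-truth position of part $j$ of person $k$, the value of $S^*_{j,k}$ at $p \in \mathbb{R}^2$ is defined as $S^*_{j,k}(p) = \exp\left(-\|p - x_{j,k}\|_2^2 / \sigma^2\right)$.
    - $\sigma$: controls the spread of the peak.
  - To match the output format of the network, the maps of all people are combined into a single confidence map per part with a max operator: $S^*_j(p) = \max_k S^*_{j,k}(p)$.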
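A minimal sketch of the ground-truth construction described above: one Gaussian peak per person, then a pixel-wise maximum. The function names, the grid size, and the value of sigma are illustrative assumptions, not taken from the slides.

```python
import math

def person_confidence_map(width, height, keypoint, sigma):
    """Gaussian peak exp(-||p - x||^2 / sigma^2) centred on one keypoint."""
    kx, ky = keypoint
    return [[math.exp(-((x - kx) ** 2 + (y - ky) ** 2) / sigma ** 2)
             for x in range(width)]
            for y in range(height)]

def aggregate_max(maps):
    """Pixel-wise maximum over the per-person maps of one part."""
    height, width = len(maps[0]), len(maps[0][0])
    return [[max(m[y][x] for m in maps) for x in range(width)]
            for y in range(height)]

# Hypothetical example: part j of two people at (2, 2) and (6, 2) on a 9x5 grid.
maps = [person_confidence_map(9, 5, kp, sigma=1.5) for kp in [(2, 2), (6, 2)]]
target = aggregate_max(maps)
```

Indexing is `target[y][x]`, so both annotated positions keep a peak value of 1 after aggregation.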
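Method > Confidence Maps for Part Detection
• Taking the maximum of the confidence maps rather than the average keeps nearby peaks distinct.
• At test time, non-maximum suppression is applied to the predicted confidence maps to obtain the candidate locations of each body part.
  - Supplement: among overlapping responses above a threshold, non-maximum suppression keeps only the one with the highest confidence.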
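A minimal sketch of peak extraction by non-maximum suppression, assuming a 3x3 neighbourhood and a fixed score threshold. Both choices, and the function name, are illustrative assumptions rather than details given in the slides.

```python
def find_peaks(conf_map, threshold=0.1):
    """Return (x, y, score) for pixels that reach the threshold and are
    strictly larger than all of their 3x3 neighbours."""
    height, width = len(conf_map), len(conf_map[0])
    peaks = []
    for y in range(height):
        for x in range(width):
            score = conf_map[y][x]
            if score < threshold:
                continue
            neighbours = [conf_map[ny][nx]
                          for ny in range(max(0, y - 1), min(height, y + 2))
                          for nx in range(max(0, x - 1), min(width, x + 2))
                          if (nx, ny) != (x, y)]
            if all(score > n for n in neighbours):
                peaks.append((x, y, score))
    return peaks

# Hypothetical confidence map with two separate peaks.
conf_map = [
    [0.0, 0.1, 0.0, 0.0, 0.0],
    [0.1, 0.9, 0.2, 0.6, 0.1],
    [0.0, 0.1, 0.0, 0.1, 0.0],
]
peaks = find_peaks(conf_map)
```

Because the comparison is strict, a flat plateau yields no peak. A real implementation would need a tie-breaking rule.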
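Method > Part Affinity Fields for Part Association
• Without knowing the number of people, how should the detected parts be connected to each other? (a)
  - Taking the midpoint between parts: when several people are crowded together, the parts cannot be connected correctly (b).
    - It encodes only the position, not the orientation, so its representational power is limited.
• Part affinity fields: a representation that keeps both position and orientation (c).
  - A 2D vector field for each limb (a connection between two parts).
  - Each limb has its own affinity field joining the two parts it connects.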
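Method > Part Affinity Fields for Part Association
• How the vector values are determined
  - $x_{j_1,k}$, $x_{j_2,k}$: ground-truth positions of parts $j_1$ and $j_2$ of limb $c$ for person $k$.
  - If a point $p$ lies on limb $c$, $L^*_{c,k}(p)$ is the unit vector pointing from $j_1$ to $j_2$. At every point $p$ not on limb $c$ it is the zero vector. That is, $L^*_{c,k}(p) = v$ if $p$ is on limb $c,k$ and $0$ otherwise.
    - $v = (x_{j_2,k} - x_{j_1,k}) / \|x_{j_2,k} - x_{j_1,k}\|_2$ (unit vector in the direction of the limb)
  - Whether a point is on limb $c$ is decided by the thresholds $0 \le v \cdot (p - x_{j_1,k}) \le l_{c,k}$ and $|v_\perp \cdot (p - x_{j_1,k})| \le \sigma_l$.
    - $\sigma_l$ is the limb width in pixels, and $l_{c,k}$ is the Euclidean distance between the two parts.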
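A minimal sketch of the ground-truth field for a single limb of a single person, applying the two threshold tests above to every pixel. The function name and the example coordinates are illustrative assumptions, and the two part positions are assumed to be distinct.

```python
import math

def limb_paf(width, height, x1, x2, limb_width):
    """Ground-truth PAF of one limb: unit vector v on the limb, zero elsewhere.
    x1 and x2 are the (x, y) positions of parts j1 and j2 (assumed distinct)."""
    dx, dy = x2[0] - x1[0], x2[1] - x1[1]
    length = math.hypot(dx, dy)                # l_{c,k}
    vx, vy = dx / length, dy / length          # unit vector from j1 to j2
    field = [[(0.0, 0.0)] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            px, py = x - x1[0], y - x1[1]
            along = vx * px + vy * py           # v . (p - x_j1)
            across = abs(-vy * px + vx * py)    # |v_perp . (p - x_j1)|
            if 0 <= along <= length and across <= limb_width:
                field[y][x] = (vx, vy)
    return field

# Hypothetical horizontal limb from (1, 2) to (5, 2), one pixel wide on each side.
field = limb_paf(7, 5, (1, 2), (5, 2), limb_width=1)
```

The per-limb fields of several people would then be averaged where they overlap, as the next slide describes.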
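Method > Part Affinity Fields for Part Association
• To match the network output, the fields of all people are combined into one field per limb (average): $L^*_c(p) = \frac{1}{n_c(p)} \sum_k L^*_{c,k}(p)$.
  - $n_c(p)$: the number of non-zero vectors at point $p$ over all $k$ people (the average is taken where limbs overlap).
• At test time, the association between candidate parts is measured by computing the line integral of the corresponding PAF along the line segment connecting the candidate part locations. (= the alignment between the predicted PAF and the candidate limb formed by connecting the detected parts.)
  - Concretely, for two candidate parts $d_{j_1}$ and $d_{j_2}$, the predicted PAF $L_c$ is sampled along the segment between them to compute the confidence of their association: $E = \int_{0}^{1} L_c(p(u)) \cdot \frac{d_{j_2} - d_{j_1}}{\|d_{j_2} - d_{j_1}\|_2} \, du$.
    - $p(u) = (1 - u)\, d_{j_1} + u\, d_{j_2}$ interpolates the positions of the two parts.
  - In practice, the integral is approximated by sampling and summing the values at uniformly spaced $u$.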
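A minimal sketch of the sampled line integral, assuming nearest-pixel lookup and ten uniformly spaced samples. Both are illustrative choices, as are the function name and the toy field.

```python
import math

def association_score(paf, d1, d2, num_samples=10):
    """Approximate the line integral of a PAF along the segment d1 -> d2.
    paf[y][x] is a (vx, vy) tuple; d1 and d2 are (x, y) candidate positions."""
    dx, dy = d2[0] - d1[0], d2[1] - d1[1]
    norm = math.hypot(dx, dy)
    if norm == 0 or num_samples < 2:
        return 0.0
    ux, uy = dx / norm, dy / norm               # unit vector of the candidate limb
    total = 0.0
    for i in range(num_samples):
        u = i / (num_samples - 1)               # uniformly spaced in [0, 1]
        px = round((1 - u) * d1[0] + u * d2[0])
        py = round((1 - u) * d1[1] + u * d2[1])
        fx, fy = paf[py][px]
        total += fx * ux + fy * uy              # dot product with the limb direction
    return total / num_samples

# Hypothetical field that points to the right everywhere.
paf = [[(1.0, 0.0)] * 7 for _ in range(5)]
aligned = association_score(paf, (1, 2), (5, 2))
perpendicular = association_score(paf, (3, 0), (3, 4))
reversed_limb = association_score(paf, (5, 2), (1, 2))
```

A candidate limb aligned with the field scores high, a perpendicular one scores zero, and a reversed one scores negative, which is how orientation helps reject wrong connections.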
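Method > Multi-Person Parsing using PAFs
• Each part has several detection candidates (several people, or false positives), so there are many possible limb connections.
• Each candidate limb is scored with the line integral over the PAF. Finding the optimal combination corresponds to a K-dimensional matching problem, which is NP-hard.
  - It can be handled with a greedy relaxation.
  - The reason is thought to be that the PAF network has a large receptive field, so the pair-wise association scores implicitly encode global context.
• First, the set of candidate part locations for multiple people is obtained.
  - $D_J = \{ d_j^m : j \in \{1 \ldots J\},\ m \in \{1 \ldots N_j\} \}$
    - $N_j$: the number of candidates for part $j$.
    - $d_j^m \in \mathbb{R}^2$: the location of the $m$-th detection candidate of part $j$.
• Define $z_{j_1 j_2}^{mn} \in \{0, 1\}$ to indicate whether the part candidates $d_{j_1}^m$ and $d_{j_2}^n$ are connected.
  - Goal: find the optimal assignment over all possible connections.
  - $Z$ is the set of $z$ over all combinations.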
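Method > Multi-Person Parsing using PAFs
• Consider the pair of parts $j_1$ and $j_2$ connected by the $c$-th limb.
  - Bipartite matching is performed so that the total weight, with the weights obtained from Eq. (10), is maximal: $\max_{Z_c} E_c = \max_{Z_c} \sum_{m \in D_{j_1}} \sum_{n \in D_{j_2}} E_{mn} \cdot z_{j_1 j_2}^{mn}$, subject to $\forall m \in D_{j_1}: \sum_{n \in D_{j_2}} z_{j_1 j_2}^{mn} \le 1$ and $\forall n \in D_{j_2}: \sum_{m \in D_{j_1}} z_{j_1 j_2}^{mn} \le 1$.
  - $E_c$ is the total weight of the matching for limb type $c$, $Z_c$ is the subset of $Z$ for limb type $c$, and $E_{mn}$ is the part affinity between parts $d_{j_1}^m$ and $d_{j_2}^n$ (defined in Eq. (10)).
  - Eqs. (13) and (14) ensure that no two edges share a node (two limbs of the same type never share a part). The optimal matching is found with the Hungarian algorithm.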
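A minimal sketch of the one-to-one constraint in the matching above. Note that this is a simple greedy stand-in (take the highest-scoring pairs first), not the Hungarian algorithm named on the slide, so it may miss the optimum in general. The function name and the `min_score` parameter are illustrative assumptions.

```python
def greedy_limb_matching(scores, min_score=0.0):
    """scores[m][n] is the affinity E_mn between candidate m of part j1 and
    candidate n of part j2. Returns (m, n, score) triples in which every
    m and every n appears at most once."""
    candidates = sorted(
        ((s, m, n)
         for m, row in enumerate(scores)
         for n, s in enumerate(row)
         if s > min_score),
        reverse=True)
    used_m, used_n, matches = set(), set(), []
    for s, m, n in candidates:
        if m not in used_m and n not in used_n:
            used_m.add(m)
            used_n.add(n)
            matches.append((m, n, s))
    return matches

# Hypothetical affinities between two candidates of each part.
matches = greedy_limb_matching([[0.9, 0.2],
                                [0.3, 0.8]])
```

In this example the greedy result coincides with the optimal assignment. For the exact optimum, a Hungarian-algorithm solver would replace the loop.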
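Method > Multi-Person Parsing using PAFs
• Now consider finding the full-body poses of multiple people.
• Determining $Z$ is a K-dimensional matching problem. It is NP-hard, and many relaxations exist.
• This paper adds two relaxations specialized to this domain.
  - (1) Instead of the complete graph, use a spanning tree skeleton with the minimal number of edges (c).
  - (2) Decompose the matching problem into a set of bipartite matching subproblems, and determine the matching between adjacent tree nodes independently (d).
• The comparison in Section 3.1 shows that minimal greedy inference approximates the global solution well at a small computational cost.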
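Method > Multi-Person Parsing using PAFs
• With the two relaxations, the optimization decomposes simply as $\max_Z E = \sum_{c=1}^{C} \max_{Z_c} E_c$.
• Hence, for each limb type, Eqs. (12)-(14) (finding the joints for limb $c$) are used independently to obtain the limb connection candidates.
• Once all limb connection candidates are available, the connections that share the same part candidates are assembled into the full-body poses of multiple people.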
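A minimal sketch of the final assembly step, assuming the matched connections are visited in tree order so that the first part of each connection has already been placed. The data layout and part names are illustrative assumptions.

```python
def assemble_people(connections):
    """connections: (part_a, cand_a, part_b, cand_b) tuples in tree order.
    Returns one dict per person, mapping part name -> candidate index."""
    people = []
    for part_a, cand_a, part_b, cand_b in connections:
        for person in people:
            if person.get(part_a) == cand_a:
                person[part_b] = cand_b        # extend the person that owns part_a
                break
        else:
            people.append({part_a: cand_a, part_b: cand_b})  # start a new person
    return people

# Hypothetical matched limbs for two people.
connections = [
    ("neck", 0, "r_shoulder", 0),
    ("neck", 1, "r_shoulder", 1),
    ("r_shoulder", 0, "r_elbow", 1),
    ("r_shoulder", 1, "r_elbow", 0),
]
people = assemble_people(connections)
```

Limbs that share a part candidate end up in the same person, which is the grouping rule stated in the last bullet above.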
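Results
• Two benchmarks for multi-person pose estimation
  - (1) MPII human multi-person dataset: 25k images, 40k people, 410 human activities
  - (2) the COCO 2016 keypoints challenge dataset
• Both datasets contain images from diverse real-world situations.
• SotA on each.
• A discussion of computational efficiency is also given (Fig. 10).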
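Results > Results on the MPII Multi-Person Dataset
• Comparison using the mean Average Precision (mAP) over all body parts, based on the PCKh threshold.
• Comparison of the inference/optimization time per image.
• Results on the test set
  - mAP: exceeds the previous SotA by 8.5%.
    - Better than previous methods even without scale search. On the full MPII test set the improvement is 13%. Scale search improves the result further.
    - Compared with previous methods, this shows that PAFs are effective at representing the relationships between parts.
  - inference time: 6 orders of magnitude faster (details in Section 3.3).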
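Results > Results on the MPII Multi-Person Dataset
• Supplement
  - mean Average Precision (mAP)
    - Precision: the fraction of the items the system judged correct that actually were correct (accuracy).
      - Here: the fraction of the things judged to be parts that really were parts.
    - Recall: the fraction of all correct items in the dataset that the system judged correct (coverage).
      - Here: the fraction of the parts annotated in the dataset that were detected.
    - Average Precision (AP): an average taken over Precision and Recall.
      - It is often approximated with an indicator function $I$ that is 1 for a correct result and 0 for an incorrect one.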
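The approximation formula on this slide did not survive extraction. The sketch below shows one common form, assuming detections are ranked by confidence and precision is summed at the ranks where the indicator is 1, then divided by the number of ground-truth items. This particular form and the function name are assumptions, not necessarily the slide's formula.

```python
def average_precision(is_correct, num_ground_truth):
    """is_correct: detections sorted by descending confidence, True where the
    indicator I is 1. Sums precision at each correct rank and normalises by
    the number of ground-truth items."""
    hits, total = 0, 0.0
    for rank, correct in enumerate(is_correct, start=1):
        if correct:
            hits += 1
            total += hits / rank      # precision at this rank
    return total / num_ground_truth if num_ground_truth else 0.0

# Hypothetical ranking: correct, incorrect, correct, with 2 ground-truth parts.
ap = average_precision([True, False, True], num_ground_truth=2)
```

Here the precision is 1/1 at rank 1 and 2/3 at rank 3, so the result is their mean.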
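Results > Results on the MPII Multi-Person Dataset
• Supplement
  - mean Average Precision (mAP) in this setting
    - mAP: the precision averaged over the parts of all people.
    - First, pose estimation is run on an image that contains multiple people.
    - Based on the PCKh threshold, each estimated pose is assigned to the ground truth (GT) it fits best.
    - Predicted poses that were not assigned to any GT are counted as false positives.
    - The Average Precision (AP) is computed for each part.
    - The APs of all parts are averaged to obtain the mAP.
  - PCKh threshold
    - PCP: a part counts as detected when both of its endpoints are within half the part length of the ground-truth positions.
    - PCK: the threshold is defined from the person bounding box.
    - PCKh: the threshold is defined as 50% of the head segment length.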
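A minimal sketch of the PCKh test for a single keypoint, assuming the head segment length is already known. The function name and the example numbers are illustrative assumptions.

```python
import math

def pckh_correct(pred, gt, head_size, alpha=0.5):
    """True when the predicted keypoint lies within alpha * head_size
    of the ground-truth keypoint (alpha = 0.5 gives PCKh-0.5)."""
    distance = math.hypot(pred[0] - gt[0], pred[1] - gt[1])
    return distance <= alpha * head_size

# Hypothetical head segment of 20 pixels, so the threshold is 10 pixels.
hit = pckh_correct((10, 10), (16, 18), head_size=20)    # distance is 10
miss = pckh_correct((10, 10), (17, 18), head_size=20)   # distance is about 10.6
```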
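Results > Results on the MPII Multi-Person Dataset
• Comparison of different skeleton structures
  - (6b) graph connecting all parts, (6c) minimal tree-structured graph solved with integer linear programming, (6d) minimal tree-structured graph solved with the greedy algorithm.
  - The results show that the minimal graph is sufficient in terms of performance.
  - The graph in (6d) gives the better result. This is thought to be because training converges much more easily with fewer edges (13 edges vs 91 edges).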
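Results > Results on the MPII Multi-Person Dataset
• Under PCKh-0.5, using PAFs gives better results than using midpoints (2.9% higher than one-midpoint, 2.3% higher than two-midpoint).
• Training with masks for unlabeled people avoids penalizing true positive predictions and improves the result by 2.3%.
• With ground-truth part detections and PAF-based parsing, the mAP is 88.3%. In this case detection introduces no localization error, so the result does not depend on the PCKh threshold.
• With ground-truth connections and detected parts, the mAP is 81.6%.
  - Connection based on PAFs reaches almost the same accuracy as using the ground-truth connections (79.4% vs 81.6%). This shows that PAFs associate the detected parts with high confidence.
• (b) Comparing the mAP across stages shows that accuracy improves with each stage.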
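Results > Results on the COCO Keypoints Challenge
• More than 100K person instances and more than 1 million keypoints.
• The COCO evaluation defines the object keypoint similarity (OKS) and uses the mean average precision (AP) over 10 OKS thresholds.
  - OKS plays the same role as IoU in object detection.
  - It is computed from the scale of the person and the distance between the GT positions and the predicted positions.
• Table 3: comparison with the top teams.
• For people at smaller scales (AP^M), accuracy is lower than with the top-down approaches.
  - Reason: this method has to handle all people in the image in one shot, over a much larger range of scales. Top-down approaches crop each detected person and rescale the patch to a larger size, so small scales affect them less.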
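A minimal sketch of the OKS computation described above, using the form published with the COCO keypoint evaluation: a Gaussian of the keypoint distance, normalised by the object scale and a per-keypoint constant, averaged over the labeled keypoints. The function name and the constants in the example are placeholders, not the official values.

```python
import math

def oks(pred, gt, visibility, area, kappas):
    """Object keypoint similarity between predicted and GT keypoints.
    pred, gt: lists of (x, y); visibility: > 0 where the GT keypoint is
    labeled; area: object segment area (scale squared);
    kappas: per-keypoint constants."""
    total, labeled = 0.0, 0
    for (px, py), (gx, gy), v, k in zip(pred, gt, visibility, kappas):
        if v > 0:
            d2 = (px - gx) ** 2 + (py - gy) ** 2
            total += math.exp(-d2 / (2 * area * k ** 2))
            labeled += 1
    return total / labeled if labeled else 0.0

# Hypothetical person with three keypoints, the last one unlabeled.
gt = [(10, 10), (20, 20), (50, 50)]
pred = [(10, 10), (20, 20), (0, 0)]
score = oks(pred, gt, visibility=[2, 2, 0], area=100.0, kappas=[0.1, 0.1, 0.1])
```

Unlabeled keypoints are skipped, so the large error on the third keypoint does not lower the score in this example.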
Supplement: IoU
• Reference: http://www.pyimagesearch.com/2016/11/07/intersection-over-union-iou-for-object-detection/
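For reference, a minimal sketch of Intersection over Union for two axis-aligned boxes. The (x_min, y_min, x_max, y_max) box layout and the function name are illustrative assumptions.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0

# Two 2x2 boxes overlapping in a 1x1 square: intersection 1, union 7.
overlap = iou((0, 0, 2, 2), (1, 1, 3, 3))
```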
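Results > Results on the COCO Keypoints Challenge
• Comparison on a randomly selected subset of the validation set.
• Using the GT bounding boxes with a single-person CPM reaches the upper bound (62.7%) for a top-down approach with CPM.
• When SSD (Single Shot MultiBox Detector) is used for person detection, performance drops by 10%.
• The bottom-up method of this paper achieves 58.4% AP.
• Applying a single-person CPM to each rescaled region of the people parsed by this method improves the AP by 2.6%. (The predictions are updated only where both methods agree well, which improves precision and recall.)
  - A search over larger scales is therefore expected to further improve the performance of the bottom-up method.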
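Results > Runtime Analysis
• With the top-down approach the runtime grows roughly in proportion to the number of people, whereas with the bottom-up approach it hardly changes as the number of people increases.
  - CNN processing is O(1) in the number of people. Multi-person parsing is O(n^2), but its time is orders of magnitude smaller than the CNN time, so the total runtime of the bottom-up approach changes very little with the number of people.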
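Discussion
• Estimating the 2D poses of multiple people in an image is an important task.
• Contributions
  - 1. A representation that encodes the position and orientation of human limbs.
  - 2. An architecture that jointly learns part detection and part association.
  - 3. A greedy algorithm that is sufficient for good parses and stays efficient as the number of people grows.
• Some failure cases.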
Implementations
• C++: https://github.com/CMU-Perceptual-Computing-Lab/openpose
• Caffe: https://github.com/ZheC/Realtime_Multi-Person_Pose_Estimation
• PyTorch: https://github.com/tensorboy/pytorch_Realtime_Multi-Person_Pose_Estimation (training not implemented)

More Related Content

What's hot (20)

ǥٻѡ
ǥٻѡǥٻѡ
ǥٻѡ
Yusuke Uchida
?
semantic segmentation `٥
semantic segmentation `٥semantic segmentation `٥
semantic segmentation `٥
yohei okawa
?
SSII2021 [SS1] Transformer x Computer Vision gÿԤչ ? TransformerCompute...
SSII2021 [SS1] Transformer x Computer Vision gÿԤչ ? TransformerCompute...SSII2021 [SS1] Transformer x Computer Vision gÿԤչ ? TransformerCompute...
SSII2021 [SS1] Transformer x Computer Vision gÿԤչ ? TransformerCompute...
SSII
?
Active Convolution, Deformable Convolution D״?`ѧܤConvolutionD
Active Convolution, Deformable Convolution D״?`ѧܤConvolutionDActive Convolution, Deformable Convolution D״?`ѧܤConvolutionD
Active Convolution, Deformable Convolution D״?`ѧܤConvolutionD
Yosuke Shinya
?
gװ٥ѧֳղϳմ
gװ٥ѧֳղϳմgװ٥ѧֳղϳմ
gװ٥ѧֳղϳմ
Ѥ󤤤 ߤ
?
᥿`٥ʽɥ֥ʦѧϰ
᥿`٥ʽɥ֥ʦѧϰ᥿`٥ʽɥ֥ʦѧϰ
᥿`٥ʽɥ֥ʦѧϰ
cvpaper. challenge
?
DL݆i᡿The Forward-Forward Algorithm: Some Preliminary
DL݆i᡿The Forward-Forward Algorithm: Some PreliminaryDL݆i᡿The Forward-Forward Algorithm: Some Preliminary
DL݆i᡿The Forward-Forward Algorithm: Some Preliminary
Deep Learning JP
?
[DL݆i]Learning Transferable Visual Models From Natural Language Supervision
[DL݆i]Learning Transferable Visual Models From Natural Language Supervision[DL݆i]Learning Transferable Visual Models From Natural Language Supervision
[DL݆i]Learning Transferable Visual Models From Natural Language Supervision
Deep Learning JP
?
zߥ˥`ͥåȥ`о
zߥ˥`ͥåȥ`оzߥ˥`ͥåȥ`о
zߥ˥`ͥåȥ`о
Yusuke Uchida
?
Swin Transformer (ICCV'21 Best Paper) 赤⤹Y
Swin Transformer (ICCV'21 Best Paper) 赤⤹YSwin Transformer (ICCV'21 Best Paper) 赤⤹Y
Swin Transformer (ICCV'21 Best Paper) 赤⤹Y
Yusuke Uchida
?
[h饤] NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis
[h饤] NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis[h饤] NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis
[h饤] NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis
Kento Doi
?
?륢`ƥQ㤫Deep Neural Networkٻ
?륢`ƥQ㤫Deep Neural Networkٻ?륢`ƥQ㤫Deep Neural Networkٻ
?륢`ƥQ㤫Deep Neural Networkٻ
Yusuke Uchida
?
DL݆i᡿ZǤTransformerΤޤȤ (ViT, Perceiver, Frozen Pretrained Transformer etc)
DL݆i᡿ZǤTransformerΤޤȤ (ViT, Perceiver, Frozen Pretrained Transformer etc)DL݆i᡿ZǤTransformerΤޤȤ (ViT, Perceiver, Frozen Pretrained Transformer etc)
DL݆i᡿ZǤTransformerΤޤȤ (ViT, Perceiver, Frozen Pretrained Transformer etc)
Deep Learning JP
?
You Only Look One-level FeatureνhҊʳΤԒ
You Only Look One-level FeatureνhҊʳΤԒYou Only Look One-level FeatureνhҊʳΤԒ
You Only Look One-level FeatureνhҊʳΤԒ
Yusuke Uchida
?
SSII2022 [SS1] ˥`3DF„? ˥`ͥåȤǤʤǤ룿 ??
SSII2022 [SS1] ˥`3DF„? ˥`ͥåȤǤʤǤ룿 ??SSII2022 [SS1] ˥`3DF„? ˥`ͥåȤǤʤǤ룿 ??
SSII2022 [SS1] ˥`3DF„? ˥`ͥåȤǤʤǤ룿 ??
SSII
?
[DL݆i]MetaFormer is Actually What You Need for Vision
[DL݆i]MetaFormer is Actually What You Need for Vision[DL݆i]MetaFormer is Actually What You Need for Vision
[DL݆i]MetaFormer is Actually What You Need for Vision
Deep Learning JP
?
?ɥǥ ᥿`٥
?ɥǥ ᥿`٥?ɥǥ ᥿`٥
?ɥǥ ᥿`٥
cvpaper. challenge
?
zߥ˥`ͥåȥ`θ߾Ȼȸٻ
zߥ˥`ͥåȥ`θ߾Ȼȸٻzߥ˥`ͥåȥ`θ߾Ȼȸٻ
zߥ˥`ͥåȥ`θ߾Ȼȸٻ
Yusuke Uchida
?
ڶٳiեɥᥤܞƤȲ֤v륵`٥
ڶٳiեɥᥤܞƤȲ֤v륵`٥ڶٳiեɥᥤܞƤȲ֤v륵`٥
ڶٳiեɥᥤܞƤȲ֤v륵`٥
Deep Learning JP
?
DL݆i᡿Perceiver io a general architecture for structured inputs &amp; outputs
DL݆i᡿Perceiver io  a general architecture for structured inputs &amp; outputs DL݆i᡿Perceiver io  a general architecture for structured inputs &amp; outputs
DL݆i᡿Perceiver io a general architecture for structured inputs &amp; outputs
Deep Learning JP
?
semantic segmentation `٥
semantic segmentation `٥semantic segmentation `٥
semantic segmentation `٥
yohei okawa
?
SSII2021 [SS1] Transformer x Computer Vision gÿԤչ ? TransformerCompute...
SSII2021 [SS1] Transformer x Computer Vision gÿԤչ ? TransformerCompute...SSII2021 [SS1] Transformer x Computer Vision gÿԤչ ? TransformerCompute...
SSII2021 [SS1] Transformer x Computer Vision gÿԤչ ? TransformerCompute...
SSII
?
Active Convolution, Deformable Convolution D״?`ѧܤConvolutionD
Active Convolution, Deformable Convolution D״?`ѧܤConvolutionDActive Convolution, Deformable Convolution D״?`ѧܤConvolutionD
Active Convolution, Deformable Convolution D״?`ѧܤConvolutionD
Yosuke Shinya
?
gװ٥ѧֳղϳմ
gװ٥ѧֳղϳմgװ٥ѧֳղϳմ
gװ٥ѧֳղϳմ
Ѥ󤤤 ߤ
?
DL݆i᡿The Forward-Forward Algorithm: Some Preliminary
DL݆i᡿The Forward-Forward Algorithm: Some PreliminaryDL݆i᡿The Forward-Forward Algorithm: Some Preliminary
DL݆i᡿The Forward-Forward Algorithm: Some Preliminary
Deep Learning JP
?
[DL݆i]Learning Transferable Visual Models From Natural Language Supervision
[DL݆i]Learning Transferable Visual Models From Natural Language Supervision[DL݆i]Learning Transferable Visual Models From Natural Language Supervision
[DL݆i]Learning Transferable Visual Models From Natural Language Supervision
Deep Learning JP
?
Swin Transformer (ICCV'21 Best Paper) 赤⤹Y
Swin Transformer (ICCV'21 Best Paper) 赤⤹YSwin Transformer (ICCV'21 Best Paper) 赤⤹Y
Swin Transformer (ICCV'21 Best Paper) 赤⤹Y
Yusuke Uchida
?
[h饤] NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis
[h饤] NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis[h饤] NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis
[h饤] NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis
Kento Doi
?
?륢`ƥQ㤫Deep Neural Networkٻ
?륢`ƥQ㤫Deep Neural Networkٻ?륢`ƥQ㤫Deep Neural Networkٻ
?륢`ƥQ㤫Deep Neural Networkٻ
Yusuke Uchida
?
DL݆i᡿ZǤTransformerΤޤȤ (ViT, Perceiver, Frozen Pretrained Transformer etc)
DL݆i᡿ZǤTransformerΤޤȤ (ViT, Perceiver, Frozen Pretrained Transformer etc)DL݆i᡿ZǤTransformerΤޤȤ (ViT, Perceiver, Frozen Pretrained Transformer etc)
DL݆i᡿ZǤTransformerΤޤȤ (ViT, Perceiver, Frozen Pretrained Transformer etc)
Deep Learning JP
?
You Only Look One-level FeatureνhҊʳΤԒ
You Only Look One-level FeatureνhҊʳΤԒYou Only Look One-level FeatureνhҊʳΤԒ
You Only Look One-level FeatureνhҊʳΤԒ
Yusuke Uchida
?
SSII2022 [SS1] ˥`3DF„? ˥`ͥåȤǤʤǤ룿 ??
SSII2022 [SS1] ˥`3DF„? ˥`ͥåȤǤʤǤ룿 ??SSII2022 [SS1] ˥`3DF„? ˥`ͥåȤǤʤǤ룿 ??
SSII2022 [SS1] ˥`3DF„? ˥`ͥåȤǤʤǤ룿 ??
SSII
?
[DL݆i]MetaFormer is Actually What You Need for Vision
[DL݆i]MetaFormer is Actually What You Need for Vision[DL݆i]MetaFormer is Actually What You Need for Vision
[DL݆i]MetaFormer is Actually What You Need for Vision
Deep Learning JP
?
zߥ˥`ͥåȥ`θ߾Ȼȸٻ
zߥ˥`ͥåȥ`θ߾Ȼȸٻzߥ˥`ͥåȥ`θ߾Ȼȸٻ
zߥ˥`ͥåȥ`θ߾Ȼȸٻ
Yusuke Uchida
?
DL݆i᡿Perceiver io a general architecture for structured inputs &amp; outputs
DL݆i᡿Perceiver io  a general architecture for structured inputs &amp; outputs DL݆i᡿Perceiver io  a general architecture for structured inputs &amp; outputs
DL݆i᡿Perceiver io a general architecture for structured inputs &amp; outputs
Deep Learning JP
?

Similar to [DL݆i] Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields (20)

ĤΥȥåץեՓi߻ / Realtime Multi-Person 2D Pose Estimation using Part Affin...
ĤΥȥåץեՓi߻ / Realtime Multi-Person 2D Pose Estimation using Part Affin...ĤΥȥåץեՓi߻ / Realtime Multi-Person 2D Pose Estimation using Part Affin...
ĤΥȥåץեՓi߻ / Realtime Multi-Person 2D Pose Estimation using Part Affin...
Shunsuke Ono
?
Dimensionality reduction with t-SNE(Rtsne) and UMAP(uwot) using R packages.
Dimensionality reduction with t-SNE(Rtsne) and UMAP(uwot) using R packages. Dimensionality reduction with t-SNE(Rtsne) and UMAP(uwot) using R packages.
Dimensionality reduction with t-SNE(Rtsne) and UMAP(uwot) using R packages.
Satoshi Kato
?
Learning Spatial Common Sense with Geometry-Aware Recurrent Networks
Learning Spatial Common Sense with Geometry-Aware Recurrent NetworksLearning Spatial Common Sense with Geometry-Aware Recurrent Networks
Learning Spatial Common Sense with Geometry-Aware Recurrent Networks
Kento Doi
?
28th CV㏊@v| #3
28th CV㏊@v| #328th CV㏊@v| #3
28th CV㏊@v| #3
Hiroki Mizuno
?
줫Υԥ`ӥg - cvpaper.challenge in PRMU Grand Challenge 2016 (PRMUо 2...
줫Υԥ`ӥg - cvpaper.challenge in PRMU Grand Challenge 2016 (PRMUо 2...줫Υԥ`ӥg - cvpaper.challenge in PRMU Grand Challenge 2016 (PRMUо 2...
줫Υԥ`ӥg - cvpaper.challenge in PRMU Grand Challenge 2016 (PRMUо 2...
cvpaper. challenge
?
ܽ飩ѧϰˤ붯дХȤʥζƶ
ܽ飩ѧϰˤ붯дХȤʥζƶܽ飩ѧϰˤ붯дХȤʥζƶ
ܽ飩ѧϰˤ붯дХȤʥζƶ
Morpho, Inc.
?
201101098ذǿᣨߩ`󥷥եȤԭȏã6?7£󾱰⣩
201101098ذǿᣨߩ`󥷥եȤԭȏã6?7£󾱰⣩201101098ذǿᣨߩ`󥷥եȤԭȏã6?7£󾱰⣩
201101098ذǿᣨߩ`󥷥եȤԭȏã6?7£󾱰⣩
Yoichi Shirasawa
?
2018 07 02_dense_pose
2018 07 02_dense_pose2018 07 02_dense_pose
2018 07 02_dense_pose
harmonylab
?
èǤ֤Variational AutoEncoder
èǤ֤Variational AutoEncoderèǤ֤Variational AutoEncoder
èǤ֤Variational AutoEncoder
Sho Tatsuno
?
18إԥ`ӥǿvռ꡹kϣ첹ԱᲹ쾱
18إԥ`ӥǿvռ꡹kϣ첹ԱᲹ쾱18إԥ`ӥǿvռ꡹kϣ첹ԱᲹ쾱
18إԥ`ӥǿvռ꡹kϣ첹ԱᲹ쾱
kanejaki
?
When NAS Meets Robustness: In Search of Robust Architectures against Adversar...
When NAS Meets Robustness:In Search of Robust Architectures againstAdversar...When NAS Meets Robustness:In Search of Robust Architectures againstAdversar...
When NAS Meets Robustness: In Search of Robust Architectures against Adversar...
MasanoriSuganuma
?
[DL݆i]VoxelPose: Towards Multi-Camera 3D Human Pose Estimation in Wild Envir...
[DL݆i]VoxelPose: Towards Multi-Camera 3D Human Pose Estimation in Wild Envir...[DL݆i]VoxelPose: Towards Multi-Camera 3D Human Pose Estimation in Wild Envir...
[DL݆i]VoxelPose: Towards Multi-Camera 3D Human Pose Estimation in Wild Envir...
Deep Learning JP
?
When NAS Meets Robustness: In Search of Robust Architectures against Adversar...
When NAS Meets Robustness:In Search of Robust Architectures againstAdversar...When NAS Meets Robustness:In Search of Robust Architectures againstAdversar...
When NAS Meets Robustness: In Search of Robust Architectures against Adversar...
MasanoriSuganuma
?
[DL݆i]Meta-Learning Probabilistic Inference for Prediction
[DL݆i]Meta-Learning Probabilistic Inference for Prediction[DL݆i]Meta-Learning Probabilistic Inference for Prediction
[DL݆i]Meta-Learning Probabilistic Inference for Prediction
Deep Learning JP
?
Approximate Scalable Bounded Space Sketch for Large Data NLP
Approximate Scalable Bounded Space Sketch for Large Data NLPApproximate Scalable Bounded Space Sketch for Large Data NLP
Approximate Scalable Bounded Space Sketch for Large Data NLP
Koji Matsuda
?
Neural scene representation and rendering νh33D㏊@v|
Neural scene representation and rendering νh33D㏊@v|Neural scene representation and rendering νh33D㏊@v|
Neural scene representation and rendering νh33D㏊@v|
Masaya Kaneko
?
[DL݆i]EfficientDet: Scalable and Efficient Object Detection
[DL݆i]EfficientDet: Scalable and Efficient Object Detection[DL݆i]EfficientDet: Scalable and Efficient Object Detection
[DL݆i]EfficientDet: Scalable and Efficient Object Detection
Deep Learning JP
?
[DL݆i]Dense CaptioningҰΤޤȤ
[DL݆i]Dense CaptioningҰΤޤȤ[DL݆i]Dense CaptioningҰΤޤȤ
[DL݆i]Dense CaptioningҰΤޤȤ
Deep Learning JP
?
MIRU2018 tutorial
MIRU2018 tutorialMIRU2018 tutorial
MIRU2018 tutorial
Takayoshi Yamashita
?
ĤΥȥåץեՓi߻ / Realtime Multi-Person 2D Pose Estimation using Part Affin...
ĤΥȥåץեՓi߻ / Realtime Multi-Person 2D Pose Estimation using Part Affin...ĤΥȥåץեՓi߻ / Realtime Multi-Person 2D Pose Estimation using Part Affin...
ĤΥȥåץեՓi߻ / Realtime Multi-Person 2D Pose Estimation using Part Affin...
Shunsuke Ono
?
Dimensionality reduction with t-SNE(Rtsne) and UMAP(uwot) using R packages.
Dimensionality reduction with t-SNE(Rtsne) and UMAP(uwot) using R packages. Dimensionality reduction with t-SNE(Rtsne) and UMAP(uwot) using R packages.
Dimensionality reduction with t-SNE(Rtsne) and UMAP(uwot) using R packages.
Satoshi Kato
?
Learning Spatial Common Sense with Geometry-Aware Recurrent Networks
Learning Spatial Common Sense with Geometry-Aware Recurrent NetworksLearning Spatial Common Sense with Geometry-Aware Recurrent Networks
Learning Spatial Common Sense with Geometry-Aware Recurrent Networks
Kento Doi
?
줫Υԥ`ӥg - cvpaper.challenge in PRMU Grand Challenge 2016 (PRMUо 2...
줫Υԥ`ӥg - cvpaper.challenge in PRMU Grand Challenge 2016 (PRMUо 2...줫Υԥ`ӥg - cvpaper.challenge in PRMU Grand Challenge 2016 (PRMUо 2...
줫Υԥ`ӥg - cvpaper.challenge in PRMU Grand Challenge 2016 (PRMUо 2...
cvpaper. challenge
?
ܽ飩ѧϰˤ붯дХȤʥζƶ
ܽ飩ѧϰˤ붯дХȤʥζƶܽ飩ѧϰˤ붯дХȤʥζƶ
ܽ飩ѧϰˤ붯дХȤʥζƶ
Morpho, Inc.
?
201101098ذǿᣨߩ`󥷥եȤԭȏã6?7£󾱰⣩
201101098ذǿᣨߩ`󥷥եȤԭȏã6?7£󾱰⣩201101098ذǿᣨߩ`󥷥եȤԭȏã6?7£󾱰⣩
201101098ذǿᣨߩ`󥷥եȤԭȏã6?7£󾱰⣩
Yoichi Shirasawa
?
2018 07 02_dense_pose
2018 07 02_dense_pose2018 07 02_dense_pose
2018 07 02_dense_pose
harmonylab
?
èǤ֤Variational AutoEncoder
èǤ֤Variational AutoEncoderèǤ֤Variational AutoEncoder
èǤ֤Variational AutoEncoder
Sho Tatsuno
?
18إԥ`ӥǿvռ꡹kϣ첹ԱᲹ쾱
18إԥ`ӥǿvռ꡹kϣ첹ԱᲹ쾱18إԥ`ӥǿvռ꡹kϣ첹ԱᲹ쾱
18إԥ`ӥǿvռ꡹kϣ첹ԱᲹ쾱
kanejaki
?
When NAS Meets Robustness: In Search of Robust Architectures against Adversar...
When NAS Meets Robustness:In Search of Robust Architectures againstAdversar...When NAS Meets Robustness:In Search of Robust Architectures againstAdversar...
When NAS Meets Robustness: In Search of Robust Architectures against Adversar...
MasanoriSuganuma
?
[DL݆i]VoxelPose: Towards Multi-Camera 3D Human Pose Estimation in Wild Envir...
[DL݆i]VoxelPose: Towards Multi-Camera 3D Human Pose Estimation in Wild Envir...[DL݆i]VoxelPose: Towards Multi-Camera 3D Human Pose Estimation in Wild Envir...
[DL݆i]VoxelPose: Towards Multi-Camera 3D Human Pose Estimation in Wild Envir...
Deep Learning JP
?
When NAS Meets Robustness: In Search of Robust Architectures against Adversar...
When NAS Meets Robustness:In Search of Robust Architectures againstAdversar...When NAS Meets Robustness:In Search of Robust Architectures againstAdversar...
When NAS Meets Robustness: In Search of Robust Architectures against Adversar...
MasanoriSuganuma
?
[DL݆i]Meta-Learning Probabilistic Inference for Prediction
[DL݆i]Meta-Learning Probabilistic Inference for Prediction[DL݆i]Meta-Learning Probabilistic Inference for Prediction
[DL݆i]Meta-Learning Probabilistic Inference for Prediction
Deep Learning JP
?
Approximate Scalable Bounded Space Sketch for Large Data NLP
Approximate Scalable Bounded Space Sketch for Large Data NLPApproximate Scalable Bounded Space Sketch for Large Data NLP
Approximate Scalable Bounded Space Sketch for Large Data NLP
Koji Matsuda
?
Neural scene representation and rendering νh33D㏊@v|
Neural scene representation and rendering νh33D㏊@v|Neural scene representation and rendering νh33D㏊@v|
Neural scene representation and rendering νh33D㏊@v|
Masaya Kaneko
?
[DL݆i]EfficientDet: Scalable and Efficient Object Detection
[DL݆i]EfficientDet: Scalable and Efficient Object Detection[DL݆i]EfficientDet: Scalable and Efficient Object Detection
[DL݆i]EfficientDet: Scalable and Efficient Object Detection
Deep Learning JP
?
[DL݆i]Dense CaptioningҰΤޤȤ
[DL݆i]Dense CaptioningҰΤޤȤ[DL݆i]Dense CaptioningҰΤޤȤ
[DL݆i]Dense CaptioningҰΤޤȤ
Deep Learning JP
?

More from Deep Learning JP (20)

DL݆i᡿AdaptDiffuser: Diffusion Models as Adaptive Self-evolving Planners
DL݆i᡿AdaptDiffuser: Diffusion Models as Adaptive Self-evolving PlannersDL݆i᡿AdaptDiffuser: Diffusion Models as Adaptive Self-evolving Planners
DL݆i᡿AdaptDiffuser: Diffusion Models as Adaptive Self-evolving Planners
Deep Learning JP
?
ٳi᡿ǰѧϰåǩ`åȤˤĤ
ٳi᡿ǰѧϰåǩ`åȤˤĤٳi᡿ǰѧϰåǩ`åȤˤĤ
ٳi᡿ǰѧϰåǩ`åȤˤĤ
Deep Learning JP
?
DL݆i᡿ "Learning to render novel views from wide-baseline stereo pairs." CVP...
DL݆i᡿ "Learning to render novel views from wide-baseline stereo pairs." CVP...DL݆i᡿ "Learning to render novel views from wide-baseline stereo pairs." CVP...
DL݆i᡿ "Learning to render novel views from wide-baseline stereo pairs." CVP...
Deep Learning JP
?
DL݆i᡿Zero-Shot Dual-Lens Super-Resolution
DL݆i᡿Zero-Shot Dual-Lens Super-ResolutionDL݆i᡿Zero-Shot Dual-Lens Super-Resolution
DL݆i᡿Zero-Shot Dual-Lens Super-Resolution
Deep Learning JP
?
DL݆i᡿BloombergGPT: A Large Language Model for Finance arxiv
DL݆i᡿BloombergGPT: A Large Language Model for Finance arxivDL݆i᡿BloombergGPT: A Large Language Model for Finance arxiv
DL݆i᡿BloombergGPT: A Large Language Model for Finance arxiv
Deep Learning JP
?
DL݆i᡿ޥ` LLM
DL݆i᡿ޥ` LLMDL݆i᡿ޥ` LLM
DL݆i᡿ޥ` LLM
Deep Learning JP
?
DL݆i᡿ToolLLM: Facilitating Large Language Models to Master 16000+ Real-wo...
  DL݆i᡿ToolLLM: Facilitating Large Language Models to Master 16000+ Real-wo...  DL݆i᡿ToolLLM: Facilitating Large Language Models to Master 16000+ Real-wo...
DL݆i᡿ToolLLM: Facilitating Large Language Models to Master 16000+ Real-wo...
Deep Learning JP
?
DL݆i᡿AnyLoc: Towards Universal Visual Place Recognition
DL݆i᡿AnyLoc: Towards Universal Visual Place RecognitionDL݆i᡿AnyLoc: Towards Universal Visual Place Recognition
DL݆i᡿AnyLoc: Towards Universal Visual Place Recognition
Deep Learning JP
?
DL݆i᡿Can Neural Network Memorization Be Localized?
DL݆i᡿Can Neural Network Memorization Be Localized?DL݆i᡿Can Neural Network Memorization Be Localized?
DL݆i᡿Can Neural Network Memorization Be Localized?
Deep Learning JP
?
DL݆i᡿Hopfield networkvBоˤĤ
DL݆i᡿Hopfield networkvBоˤĤDL݆i᡿Hopfield networkvBоˤĤ
DL݆i᡿Hopfield networkvBоˤĤ
Deep Learning JP
?
DL݆i᡿SimPer: Simple self-supervised learning of periodic targets( ICLR 2023 )
DL݆i᡿SimPer: Simple self-supervised learning of periodic targets( ICLR 2023 )DL݆i᡿SimPer: Simple self-supervised learning of periodic targets( ICLR 2023 )
DL݆i᡿SimPer: Simple self-supervised learning of periodic targets( ICLR 2023 )
Deep Learning JP
?
DL݆i᡿RLCD: Reinforcement Learning from Contrast Distillation for Language M...
DL݆i᡿RLCD: Reinforcement Learning from Contrast Distillation for Language M...DL݆i᡿RLCD: Reinforcement Learning from Contrast Distillation for Language M...
DL݆i᡿RLCD: Reinforcement Learning from Contrast Distillation for Language M...
Deep Learning JP
?
DL݆i᡿"Secrets of RLHF in Large Language Models Part I: PPO"
DL݆i᡿"Secrets of RLHF in Large Language Models Part I: PPO"DL݆i᡿"Secrets of RLHF in Large Language Models Part I: PPO"
DL݆i᡿"Secrets of RLHF in Large Language Models Part I: PPO"
Deep Learning JP
?
DL݆i᡿"Language Instructed Reinforcement Learning for Human-AI Coordination "
DL݆i᡿"Language Instructed Reinforcement Learning  for Human-AI Coordination "DL݆i᡿"Language Instructed Reinforcement Learning  for Human-AI Coordination "
DL݆i᡿"Language Instructed Reinforcement Learning for Human-AI Coordination "
Deep Learning JP
?
DL݆i᡿Llama 2: Open Foundation and Fine-Tuned Chat Models
DL݆i᡿Llama 2: Open Foundation and Fine-Tuned Chat ModelsDL݆i᡿Llama 2: Open Foundation and Fine-Tuned Chat Models
DL݆i᡿Llama 2: Open Foundation and Fine-Tuned Chat Models
Deep Learning JP
?
DL݆i᡿"Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware"
DL݆i᡿"Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware"DL݆i᡿"Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware"
DL݆i᡿"Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware"
Deep Learning JP
?
DL݆i᡿Parameter is Not All You Need:Starting from Non-Parametric Networks fo...
DL݆i᡿Parameter is Not All You Need:Starting from Non-Parametric Networks fo...DL݆i᡿Parameter is Not All You Need:Starting from Non-Parametric Networks fo...
DL݆i᡿Parameter is Not All You Need:Starting from Non-Parametric Networks fo...
Deep Learning JP
?
DL݆i᡿Drag Your GAN: Interactive Point-based Manipulation on the Generative ...
DL݆i᡿Drag Your GAN: Interactive Point-based Manipulation on the Generative ...DL݆i᡿Drag Your GAN: Interactive Point-based Manipulation on the Generative ...
DL݆i᡿Drag Your GAN: Interactive Point-based Manipulation on the Generative ...
Deep Learning JP
?
DL݆i᡿Self-Supervised Learning from Images with a Joint-Embedding Predictive...
DL݆i᡿Self-Supervised Learning from Images with a Joint-Embedding Predictive...DL݆i᡿Self-Supervised Learning from Images with a Joint-Embedding Predictive...
DL݆i᡿Self-Supervised Learning from Images with a Joint-Embedding Predictive...
Deep Learning JP
?
DL݆i᡿Towards Understanding Ensemble, Knowledge Distillation and Self-Distil...
DL݆i᡿Towards Understanding Ensemble, Knowledge Distillation and Self-Distil...DL݆i᡿Towards Understanding Ensemble, Knowledge Distillation and Self-Distil...
DL݆i᡿Towards Understanding Ensemble, Knowledge Distillation and Self-Distil...
Deep Learning JP
?
DL݆i᡿AdaptDiffuser: Diffusion Models as Adaptive Self-evolving Planners
DL݆i᡿AdaptDiffuser: Diffusion Models as Adaptive Self-evolving PlannersDL݆i᡿AdaptDiffuser: Diffusion Models as Adaptive Self-evolving Planners
DL݆i᡿AdaptDiffuser: Diffusion Models as Adaptive Self-evolving Planners
Deep Learning JP
?
DL݆i᡿ "Learning to render novel views from wide-baseline stereo pairs." CVP...
DL݆i᡿ "Learning to render novel views from wide-baseline stereo pairs." CVP...DL݆i᡿ "Learning to render novel views from wide-baseline stereo pairs." CVP...
DL݆i᡿ "Learning to render novel views from wide-baseline stereo pairs." CVP...
Deep Learning JP
?
DL݆i᡿Zero-Shot Dual-Lens Super-Resolution
DL݆i᡿Zero-Shot Dual-Lens Super-ResolutionDL݆i᡿Zero-Shot Dual-Lens Super-Resolution
DL݆i᡿Zero-Shot Dual-Lens Super-Resolution
Deep Learning JP
?
DL݆i᡿BloombergGPT: A Large Language Model for Finance arxiv
DL݆i᡿BloombergGPT: A Large Language Model for Finance arxivDL݆i᡿BloombergGPT: A Large Language Model for Finance arxiv
DL݆i᡿BloombergGPT: A Large Language Model for Finance arxiv
Deep Learning JP
?
DL݆i᡿ToolLLM: Facilitating Large Language Models to Master 16000+ Real-wo...
  DL݆i᡿ToolLLM: Facilitating Large Language Models to Master 16000+ Real-wo...  DL݆i᡿ToolLLM: Facilitating Large Language Models to Master 16000+ Real-wo...
DL݆i᡿ToolLLM: Facilitating Large Language Models to Master 16000+ Real-wo...
Deep Learning JP
?
DL݆i᡿AnyLoc: Towards Universal Visual Place Recognition
DL݆i᡿AnyLoc: Towards Universal Visual Place RecognitionDL݆i᡿AnyLoc: Towards Universal Visual Place Recognition
DL݆i᡿AnyLoc: Towards Universal Visual Place Recognition
Deep Learning JP
?
DL݆i᡿Can Neural Network Memorization Be Localized?
DL݆i᡿Can Neural Network Memorization Be Localized?DL݆i᡿Can Neural Network Memorization Be Localized?
DL݆i᡿Can Neural Network Memorization Be Localized?
Deep Learning JP
?
DL݆i᡿Hopfield networkvBоˤĤ
DL݆i᡿Hopfield networkvBоˤĤDL݆i᡿Hopfield networkvBоˤĤ
DL݆i᡿Hopfield networkvBоˤĤ
Deep Learning JP
?
DL݆i᡿SimPer: Simple self-supervised learning of periodic targets( ICLR 2023 )
DL݆i᡿SimPer: Simple self-supervised learning of periodic targets( ICLR 2023 )DL݆i᡿SimPer: Simple self-supervised learning of periodic targets( ICLR 2023 )
DL݆i᡿SimPer: Simple self-supervised learning of periodic targets( ICLR 2023 )
Deep Learning JP
?
DL݆i᡿RLCD: Reinforcement Learning from Contrast Distillation for Language M...
DL݆i᡿RLCD: Reinforcement Learning from Contrast Distillation for Language M...DL݆i᡿RLCD: Reinforcement Learning from Contrast Distillation for Language M...
DL݆i᡿RLCD: Reinforcement Learning from Contrast Distillation for Language M...
Deep Learning JP
?
DL݆i᡿"Secrets of RLHF in Large Language Models Part I: PPO"
DL݆i᡿"Secrets of RLHF in Large Language Models Part I: PPO"DL݆i᡿"Secrets of RLHF in Large Language Models Part I: PPO"
DL݆i᡿"Secrets of RLHF in Large Language Models Part I: PPO"
Deep Learning JP
?
DL݆i᡿"Language Instructed Reinforcement Learning for Human-AI Coordination "

[DL Reading Group] Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields

  • 1. Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields |?ѧ?ѧԺ?ѧϵо gUӑѧ βо ?Ұ
  • 2. Bibliographic information. Paper: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields (https://arxiv.org/abs/1611.08050). Authors: Zhe Cao, Tomas Simon, Shih-En Wei, Yaser Sheikh (The Robotics Institute, Carnegie Mellon University). Submitted: 24 Nov 2016. Venue: CVPR 2017 Oral. Slides and a video are also available. Unless otherwise noted, figures and tables are taken from the paper, slides, and video above.
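  • 4. Abstract. The paper proposes an efficient method for detecting the 2D poses of multiple people in an image. Key features: (1) Part Affinity Fields (PAFs) learn to associate body parts with the individuals in the image. (2) The bottom-up approach encodes global context and achieves high accuracy in real time regardless of the number of people. (3) Sequential Prediction with Learned Spatial Context: the CNN predictions are refined by repeating the network stage by stage. (4) Jointly Learning Parts Detection and Parts Association: part locations and their association are learned together. Results: 1st place in the COCO 2016 keypoints challenge, and on the MPII Multi-Person benchmark it exceeds the previous state of the art in both efficiency and accuracy.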
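  • 5. Introduction. Prior work shows that detecting the body parts of individuals is hard in scenes with multiple people: the number, position, and scale of the people are unknown; people interact with each other (contact, occlusion); and runtime grows with the number of people. Top-down approaches run a person detector and then estimate the pose of each person separately. Drawbacks: if person detection fails, the pose cannot be recovered, and runtime is proportional to the number of people. Bottom-up approaches (the approach taken in this paper) are robust to such early commitment and can decouple runtime from the number of people. However, earlier bottom-up methods do not directly use global contextual cues from other body parts and other people, and the final step of joining the parts requires costly global inference. This paper uses a bottom-up approach and reaches state-of-the-art accuracy for multi-person pose estimation. Part Affinity Fields (PAFs) represent the association between parts as a set of 2D vector fields that encode the location and orientation of limbs. Inferring the bottom-up representations of detection and association simultaneously encodes enough global context that a greedy parse gives high-quality results at low computational cost.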
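  • 6. Method. A feed-forward network predicts a set of confidence maps S for body part locations (b) and a set of affinity fields L that encode the degree of association between parts (c). S = (S_1, S_2, ..., S_J) has J confidence maps, one per part: S_j ∈ R^(w×h), j ∈ {1...J}. L = (L_1, L_2, ..., L_C) has C vector fields, one per limb (pair of parts): L_c ∈ R^(w×h×2), c ∈ {1...C}. The confidence maps and affinity fields are then parsed by bipartite matching to output the result (d).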
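  • 7. Method > Simultaneous Detection and Association. The network simultaneously predicts confidence maps for part detection and affinity fields that encode part association. It has two branches: the top branch predicts confidence maps and the bottom branch predicts affinity fields. The image is first processed by convolutional layers (initialized with the first 10 layers of VGG-19), producing feature maps F that are the input to Stage 1. Stage 1 produces the detection confidence maps S^1 = ρ^1(F) and the part affinity fields L^1 = φ^1(F), where ρ and φ are the CNNs of each stage. In later stages, the two outputs of the previous stage are concatenated with the original feature maps F to form the next input.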
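  • 8. Method > Simultaneous Detection and Association. The confidence maps become more accurate as the stages progress. Part confidence maps are predicted by the 1st branch and PAFs by the 2nd branch, repeatedly. A loss function (L2 loss) is applied at the end of each branch. Loss at stage t: f_S^t = Σ_j Σ_p W(p) · ||S_j^t(p) − S*_j(p)||_2^2 and f_L^t = Σ_c Σ_p W(p) · ||L_c^t(p) − L*_c(p)||_2^2. S*, L*: ground-truth confidence map and affinity field. W(p): binary mask that is 0 where the annotation is missing at location p, which avoids penalizing correct predictions. Overall objective: f = Σ_t (f_S^t + f_L^t).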
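  • 9. Method > Confidence Maps for Part Detection. To evaluate the loss in Eq. 5, the ground-truth maps S* are generated from the annotated 2D keypoints. With multiple people, there should be one confidence-map peak for each visible part j of each person k. First, an individual confidence map S*_{j,k} is generated for each person k. With x_{j,k} ∈ R^2 the ground-truth position of part j for person k, the value at location p ∈ R^2 is defined as S*_{j,k}(p) = exp(−||p − x_{j,k}||_2^2 / σ^2), where σ controls the spread of the peak. To match the output format of the network, the maps of all people are combined into one confidence map per part with a max operator: S*_j(p) = max_k S*_{j,k}(p).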
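  • 10. Method > Confidence Maps for Part Detection. Taking the maximum of the confidence maps rather than the average keeps nearby peaks distinct. At test time, non-maximum suppression is applied to the predicted confidence maps to obtain the candidate locations of each body part. Supplement: non-maximum suppression keeps, within each region whose values exceed a threshold, only the location with the highest confidence.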
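  • 11. Method > Part Affinity Fields for Part Association. The question is how to connect the detected parts to each other when the number of people is unknown (a). One option is to detect the midpoint between each pair of parts, but this gives wrong connections when people crowd together (b). The limitation arises because the midpoint encodes only position, not orientation, and reduces the limb region to a single point. Part affinity fields represent both position and orientation (c). Each pixel in the region of a limb (a connection between two parts) holds a 2D vector, and each limb type has its own affinity field joining its two associated parts.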
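  • 12. Method > Part Affinity Fields for Part Association. How the ground-truth vector values are determined: let x_{j1,k} and x_{j2,k} be the ground-truth positions of parts j1 and j2 of limb c for person k. If a point p lies on limb c, L*_{c,k}(p) is the unit vector pointing from j1 to j2; at every point p not on limb c it is the zero vector. That is, L*_{c,k}(p) = v if p is on limb c of person k, and 0 otherwise, where v = (x_{j2,k} − x_{j1,k}) / ||x_{j2,k} − x_{j1,k}||_2 is the unit vector in the limb direction. A point p is on limb c when 0 ≤ v · (p − x_{j1,k}) ≤ l_{c,k} and |v⊥ · (p − x_{j1,k})| ≤ σ_l. Here σ_l is the limb width in pixels and l_{c,k} is the Euclidean distance between the two parts.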
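  • 13. Method > Part Affinity Fields for Part Association. To match the output of the network, the fields of all people are combined into one field per limb by averaging: L*_c(p) = (1 / n_c(p)) Σ_k L*_{c,k}(p), where n_c(p) is the number of non-zero vectors at point p over all k people (that is, the average is taken where limbs overlap). At test time, the association between candidate parts is measured by computing the line integral of the predicted PAF along the line segment connecting the candidate part locations (this measures how well the predicted PAF aligns with the candidate limb formed by joining the detected parts). Specifically, for two candidate part locations d_j1 and d_j2, the predicted PAF L_c is sampled along the line segment between them to compute the confidence of their association: E = ∫_0^1 L_c(p(u)) · (d_j2 − d_j1) / ||d_j2 − d_j1||_2 du, where p(u) = (1 − u) d_j1 + u d_j2 interpolates between the two parts. In practice, the integral is approximated by sampling uniformly spaced values of u and summing.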
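  • 14. Method > Multi-Person Parsing using PAFs. Each part has several detection candidates (because of multiple people or false positives), so there are many possible limb patterns. Each candidate limb is scored with the line integral on the PAF. Finding the optimal combination corresponds to a K-dimensional matching problem, which is NP-hard. It can be handled with a greedy relaxation. The authors' hypothesis for why this works is that the PAF network has a large receptive field, so the pair-wise association scores implicitly encode global context. First, obtain the set of part candidates for multiple people: D_J = {d_j^m : for j ∈ {1...J}, m ∈ {1...N_j}}, where N_j is the number of candidates for part j and d_j^m ∈ R^2 is the location of the m-th detection candidate of part j. Define z_{j1 j2}^{mn} ∈ {0,1} to indicate whether candidates d_j1^m and d_j2^n are connected. The goal is to find the optimal assignment among all possible connections. Z is the set of all such z.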
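  • 15. Method > Multi-Person Parsing using PAFs. Consider the pair of parts j1 and j2 joined by the c-th limb. Bipartite matching is performed so that the total weight, with the edge weights computed by Eq. 10, is maximized: max_{Z_c} E_c = max_{Z_c} Σ_m Σ_n E_mn · z_{j1 j2}^{mn}. E_c is the overall weight of the matching for limb type c, Z_c is the subset of Z for limb type c, and E_mn is the part affinity between d_j1^m and d_j2^n (defined in Eq. 10). Eqs. 13 and 14 ensure that no two edges share a node (no two limbs of the same type share a part). The optimum can be found with the Hungarian algorithm.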
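  • 16. Method > Multi-Person Parsing using PAFs. Now consider finding the full-body poses of multiple people. Determining Z is a K-dimensional matching problem. This problem is NP-hard and many relaxations exist. The paper adds two relaxations specialized to this domain: (1) instead of the complete graph, use the minimal number of edges that forms a spanning tree skeleton of the human pose (c); (2) decompose the matching problem into a set of bipartite matching subproblems and determine the matching between adjacent tree nodes independently (d). Section 3.1 shows comparison results indicating that minimal greedy inference approximates the global solution well at a fraction of the computational cost.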
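  • 17. Method > Multi-Person Parsing using PAFs. With the two relaxations, the optimization decomposes simply: max_Z E = Σ_{c=1}^{C} max_{Z_c} E_c. Therefore, the limb connection candidates are obtained independently for each limb type using Eqs. 12-14 (finding the joint pairs for limb c). Once all limb connection candidates are available, connections that share the same part candidates are assembled into the full-body poses of multiple people.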
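  • 18. Results. Two benchmarks for multi-person pose estimation: (1) the MPII human multi-person dataset (25k images, 40k people, 410 human activities); (2) the COCO 2016 keypoints challenge dataset. Both datasets contain images of varied real-world situations. The method achieves state-of-the-art results on each. A runtime analysis is also given (Fig. 10).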
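  • 19. Results > Results on the MPII Multi-Person Dataset. The comparison uses mean Average Precision (mAP) over all body parts, based on the PCKh threshold, as well as inference/optimization time per image. Results on the test set: mAP exceeds the previous state of the art by 8.5% (on the subset). Even without scale search, the method is more accurate than previous methods; on the full MPII test set the gain is 13%, and scale search improves accuracy further. The comparison with previous methods indicates that PAFs are effective at representing the association between parts. Inference time is 6 orders of magnitude shorter (details in Section 3.3).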
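  • 20. Results > Results on the MPII Multi-Person Dataset. Supplement: mean Average Precision (mAP). Precision is the fraction of items the system judged positive that are actually correct; here, the fraction of detected parts that are correct. Recall is the fraction of all positives in the dataset that the system found (coverage); here, the fraction of parts annotated in the dataset that were detected. Average Precision (AP) averages precision over the recall levels. It is commonly computed with an approximation that uses an indicator function I, which is 1 for a correct item and 0 for an incorrect one.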
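  • 21. Results > Results on the MPII Multi-Person Dataset. Supplement: mean Average Precision (mAP) in this setting. mAP is the average precision over all parts of all people. First, multi-person pose estimation is run on the image. The estimated poses are assigned to the ground truth (GT) in descending order of their score under the PCKh threshold. Predicted poses that are not assigned to any GT are counted as false positives. Average Precision (AP) is computed for each part, and the APs of all parts are averaged to give mAP. On the PCKh threshold: PCP counts a part as correctly detected when the estimated positions of both its endpoints are within half the part length of the ground truth; PCK defines the threshold relative to the size of the person bounding box; PCKh defines the threshold as 50% of the head segment length.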
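  • 22. Results > Results on the MPII Multi-Person Dataset. Comparison of different skeleton structures: (6b) a graph that connects all parts, (6c) the minimal tree-structured graph solved with integer linear programming, (6d) the minimal tree-structured graph solved with the greedy algorithm. The results show that the minimal graph loses essentially nothing in performance. The model in (6d) is the most accurate, which is thought to be because training converges much more easily with fewer channels (13 edges vs 91 edges).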
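  • 23. Results > Results on the MPII Multi-Person Dataset. At PCKh-0.5, using PAFs gives better results than using intermediate points (2.9% higher than one-midpoint and 2.3% higher than two-midpoint). Training with masks for unlabeled people avoids penalizing true positive predictions and improves accuracy by 2.3%. With ground-truth part detections and PAF-predicted connections, mAP is 88.3%; in this case there is no detection error, so the value does not depend on the PCKh threshold. With ground-truth connections and predicted detections, mAP is 81.6%. Connections decided with PAFs reach nearly the same accuracy as ground-truth connections (79.4% vs 81.6%), which shows that PAFs detect associations with high accuracy. (b) Comparing mAP across stages shows that accuracy improves with each stage.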
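  • 24. Results > Results on the COCO Keypoints Challenge. The dataset has over 100K people and over 1 million keypoints. The COCO evaluation defines the object keypoint similarity (OKS) and uses mean average precision (AP) over 10 OKS thresholds. OKS plays the same role as IoU in object detection. It is computed from the scale of the person and the distance between the predicted points and the GT points. Table 3 compares the method with the top teams. For people at smaller scales (AP^M), the method is less accurate than the top-down approaches. Reason: the method has to handle a much larger scale range, covering all people in the image at once. Top-down approaches crop each detected person and rescale the patch to a larger size, so they are affected relatively little at small scales.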
  • 25. Supplement: IoU. Reference: http://www.pyimagesearch.com/2016/11/07/intersection-over-union-iou-for-object-detection/
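  • 26. Results > Results on the COCO Keypoints Challenge. Comparison on a randomly selected subset of the validation set. Using GT bounding boxes with a single-person CPM reaches the upper bound of a CPM-based top-down approach (62.7%). With SSD (Single Shot MultiBox Detector) as the person detector, performance drops by 10%. The proposed bottom-up method achieves 58.4% AP. Applying a single-person CPM to each rescaled region of the people parsed by the method raises AP by 2.6%. (Only predictions on which both methods agree well enough are updated, which improves both precision and recall.) A search over a larger range of scales is expected to improve the bottom-up method further.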
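  • 27. Results > Runtime Analysis. Unlike the top-down approach, the runtime of the bottom-up approach barely changes as the number of people grows. CNN processing is O(1) in the number of people, and multi-person parsing is O(n^2) but cheap to begin with, so the overall runtime of the bottom-up approach shows almost no change with the number of people.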
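  • 28. Discussion. Multi-person 2D pose estimation is an important step toward machines that understand people in images. Main contributions: 1. a representation that encodes both the position and the orientation of limbs; 2. an architecture that jointly learns part detection and part association; 3. a greedy algorithm that is sufficient for accurate parsing even as the number of people grows. The slide also shows representative failure cases.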
  • 29. Implementations. C++: https://github.com/CMU-Perceptual-Computing-Lab/openpose. Caffe: https://github.com/ZheC/Realtime_Multi-Person_Pose_Estimation. PyTorch: https://github.com/tensorboy/pytorch_Realtime_Multi-Person_Pose_Estimation (training not implemented).
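
The ground-truth confidence maps and the peak picking from slides 9 and 10 can be illustrated with a small sketch. It is not taken from the authors' code: the function names `gaussian_confidence_map` and `find_peaks`, the 4-neighbour peak test, and the threshold value are all assumptions made for illustration. It uses plain Python lists so that it runs without extra libraries.

```python
import math


def gaussian_confidence_map(keypoints, width, height, sigma):
    """Ground-truth map for one part type.

    Each person contributes exp(-||p - x||^2 / sigma^2); the maps of
    all people are merged with max (not average) so close peaks stay sharp.
    """
    heatmap = [[0.0] * width for _ in range(height)]
    for kx, ky in keypoints:  # one (x, y) keypoint per person
        for y in range(height):
            for x in range(width):
                d2 = (x - kx) ** 2 + (y - ky) ** 2
                value = math.exp(-d2 / sigma ** 2)
                if value > heatmap[y][x]:
                    heatmap[y][x] = value
    return heatmap


def find_peaks(heatmap, threshold):
    """Hypothetical minimal NMS: keep pixels above the threshold that are
    not smaller than any of their 4 neighbours."""
    height, width = len(heatmap), len(heatmap[0])
    peaks = []
    for y in range(height):
        for x in range(width):
            value = heatmap[y][x]
            if value <= threshold:
                continue
            neighbours = [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]
            if all(
                value >= heatmap[ny][nx]
                for nx, ny in neighbours
                if 0 <= nx < width and 0 <= ny < height
            ):
                peaks.append((x, y, value))
    return peaks


# Two people whose keypoints for the same part lie at (2, 2) and (7, 2).
heatmap = gaussian_confidence_map([(2, 2), (7, 2)], width=10, height=5, sigma=1.0)
peaks = find_peaks(heatmap, threshold=0.1)
```

In this toy case `peaks` contains one candidate per person. A real implementation would run on the network's predicted maps and would typically use array operations instead of nested loops.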
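
The PAF ground truth of slide 12 and the association score of slide 13 can be sketched in the same spirit. This is a minimal sketch under stated assumptions, not the authors' implementation: the helper names `limb_paf` and `association_score`, nearest-pixel sampling, and the default of 10 samples are illustrative choices, and both candidates are assumed to lie inside the map.

```python
import math


def limb_paf(x1, x2, width, height, limb_width):
    """Ground-truth PAF for one limb of one person.

    Pixels on the limb hold the unit vector v pointing from x1 to x2;
    all other pixels hold the zero vector.
    """
    length = math.hypot(x2[0] - x1[0], x2[1] - x1[1])
    v = ((x2[0] - x1[0]) / length, (x2[1] - x1[1]) / length)
    field = [[(0.0, 0.0)] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            dx, dy = x - x1[0], y - x1[1]
            along = v[0] * dx + v[1] * dy          # projection on the limb axis
            across = abs(-v[1] * dx + v[0] * dy)   # distance from the limb axis
            if 0.0 <= along <= length and across <= limb_width:
                field[y][x] = v
    return field


def association_score(field, d1, d2, num_samples=10):
    """Approximate the line integral of the PAF along the segment d1 -> d2
    by averaging the dot product at uniformly spaced sample points."""
    length = math.hypot(d2[0] - d1[0], d2[1] - d1[1])
    if length == 0.0:
        return 0.0
    direction = ((d2[0] - d1[0]) / length, (d2[1] - d1[1]) / length)
    total = 0.0
    for i in range(num_samples):
        u = i / (num_samples - 1)
        px = (1.0 - u) * d1[0] + u * d2[0]
        py = (1.0 - u) * d1[1] + u * d2[1]
        fx, fy = field[int(round(py))][int(round(px))]  # nearest-pixel lookup
        total += fx * direction[0] + fy * direction[1]
    return total / num_samples


# A horizontal limb from (2, 5) to (12, 5) in a 16 x 10 field.
field = limb_paf((2, 5), (12, 5), width=16, height=10, limb_width=1.0)
aligned = association_score(field, (2, 5), (12, 5))  # candidate pair along the limb
crossing = association_score(field, (7, 1), (7, 9))  # candidate pair across the limb
```

A candidate pair that follows the limb scores close to 1, while a pair that crosses it scores close to 0, which is why the score can separate correct from incorrect connections in crowded scenes.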
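
The per-limb matching and the assembly of full-body poses from slides 15 to 17 can be sketched as follows. Note the substitution: slide 15 states that the optimum for each limb type can be found with the Hungarian algorithm, whereas this sketch uses a simpler greedy selection by descending score, which is not guaranteed to be optimal. The names `match_limb` and `assemble_people`, the `min_score` cut-off, and the part labels are hypothetical, and the assembly assumes limbs are visited in tree order starting from the root part.

```python
def match_limb(scores, min_score=0.0):
    """Greedy one-to-one matching for a single limb type.

    scores[m][n] is the association score E_mn between the m-th candidate
    of part j1 and the n-th candidate of part j2. Each candidate is used
    at most once, so no two limbs of the same type share a part.
    """
    candidates = [
        (score, m, n)
        for m, row in enumerate(scores)
        for n, score in enumerate(row)
        if score > min_score
    ]
    candidates.sort(reverse=True)  # best-scoring connections first
    used_m, used_n = set(), set()
    matches = []
    for score, m, n in candidates:
        if m in used_m or n in used_n:
            continue
        matches.append((m, n, score))
        used_m.add(m)
        used_n.add(n)
    return matches


def assemble_people(limb_matches):
    """Join limb connections that share a part candidate into people.

    limb_matches maps (j1, j2) to the matches of that limb type and must
    list the limbs in tree order. Returns one dict per person that maps a
    part name to the index of the chosen candidate.
    """
    people = []
    for (j1, j2), matches in limb_matches.items():
        for m, n, _ in matches:
            for person in people:
                if person.get(j1) == m:
                    person[j2] = n  # extend an existing person
                    break
            else:
                people.append({j1: m, j2: n})  # start a new person
    return people


neck_shoulder = match_limb([[0.9, 0.2], [0.3, 0.8]])
shoulder_elbow = match_limb([[0.1, 0.7], [0.6, 0.2]])
people = assemble_people({
    ("neck", "shoulder"): neck_shoulder,
    ("shoulder", "elbow"): shoulder_elbow,
})
```

Because each limb type is matched independently, the cost grows with the number of candidate pairs per limb rather than with the number of full-body combinations, which is the point of the two relaxations described on slide 16.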