Deep Image:
Scaling up Image Recognition	

Ren Wu	

Distinguished Scientist, Baidu	

wuren@	

@韧在百度
The Color of the Dress
The Color of the Dress	

Color Constancy	

	

Human vs. Artificial Intelligence
Summary @ GTC14	

Big data + Deep learning + High performance computing =
Intelligence	

	

Big data + Deep learning + Heterogeneous computing =
Success	

	

GTC'14: Deep Learning Meets Heterogeneous Computing
Baidu Stock
Baidu Q2'14
Big Data

- Storage: 2,000 PB
- Processing: 10-100 PB/day
- Webpages: 100b-1,000b
- Index: 100b-1,000b
- Updates: 1b-10b/day
- Logs: 100 TB-1 PB/day
Computer Chess and Moore's Law
Deep Blue

A classic example of application-specific system design: an IBM
supercomputer with 480 custom-made VLSI chess chips, running a massively
parallel search algorithm with a highly optimized implementation.
Heterogeneous Computing

1993 world #1: Thinking Machines CM-5/1024, 131 GFlops
2013: Samsung Note 3 smartphone (Qualcomm Snapdragon 800), 129 GFlops

2000 world #1: ASCI White (IBM RS/6000 SP), 6 MW power, 106 tons, 12.3 TFlops
2013: Two Mac Pro workstations (dual AMD GPUs each), 14 TFlops
Deep Learning Applications

- Speech recognition
- Image recognition
- Optical character recognition (OCR)
- Language translation
- Web search
- Computational ads (CTR)
- ...
ImageNet Large-Scale Visual Recognition Challenge

- ImageNet dataset: more than 15 million images in about 22,000 categories
- ILSVRC (ImageNet Large-Scale Visual Recognition Challenge)
  - Classification task: 1.2 million images in 1,000 categories
  - One of the most challenging computer vision benchmarks
  - Increasing attention from both industry and academia

* Olga Russakovsky et al., ECCV 2014
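All results below are quoted as top-5 error: a prediction counts as correct if the true label appears among the model's five highest-scoring classes. A minimal sketch of the metric, with hypothetical `logits`/`labels` inputs (not the official evaluation code):

```python
import numpy as np

def top5_error(logits: np.ndarray, labels: np.ndarray) -> float:
    """Fraction of examples whose true label is NOT among the five
    highest-scoring classes.

    logits: (N, 1000) class scores; labels: (N,) true class indices.
    """
    # Indices of the 5 largest scores per row (unordered within the 5).
    top5 = np.argpartition(logits, -5, axis=1)[:, -5:]
    hits = (top5 == labels[:, None]).any(axis=1)
    return float(1.0 - hits.mean())

# Toy usage: random scores over 1,000 classes.
rng = np.random.default_rng(0)
print(top5_error(rng.normal(size=(8, 1000)), rng.integers(0, 1000, size=8)))
```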
ImageNet Classification Challenge
ImageNet classification 2012-2014

Team         Year   Place   Error (top-5)   Uses external data
SuperVision  2012   -       16.4%           no
SuperVision  2012   1st     15.3%           ImageNet 22k
Clarifai     2013   -       11.7%           no
Clarifai     2013   1st     11.2%           ImageNet 22k
MSRA         2014   3rd     7.35%           no
VGG          2014   2nd     7.32%           no
GoogLeNet    2014   1st     6.67%           no

Slide credit: Yangqing Jia, Google

Invincible?
Our approach: insights and inspirations

"多算胜，少算不胜" (Sun Tzu 孙子, 544-496 BC, "Laying Plans" 计篇):
more calculations win; fewer calculations lose.

"元元本本，殚见洽闻" (Ban Gu 班固, 32-92 AD, "Rhapsody on the Western
Capital" 西都赋): the more you see, the more you know.

"明足以察秋毫之末" (Mencius 孟子, 372-289 BC, 梁惠王上): sight keen enough
to discern the tip of an autumn hair, i.e. the ability to see very fine
details.
Project Minwa (百度敏娲)

- Minerva + Athena + 女娲 (Nüwa)
- Athena: goddess of wisdom, warfare, divine intelligence, architecture, and crafts
- Minerva: goddess of wisdom, magic, medicine, arts, commerce, and defense
- 女娲 (Nüwa): molded humans from clay, mended the sky with smelted stone;
  also associated with marriage and musical instruments

World's Largest Artificial Neural Networks

- Pushing the state of the art
- ~100x bigger than previous ones
- A new kind of intelligence?
Hardware/Software Co-design

- Stochastic gradient descent (SGD)
  - High compute density
- Scale up, to as many as 100 nodes
  - High bandwidth, low latency
- 36 nodes, 144 GPUs, 6.9 TB host memory, 1.7 TB device memory
  - 0.6 PFLOPS
- Highly optimized software stack
  - RDMA / GPUDirect
  - New data partition and communication strategies (see the sketch below)

GPUs, InfiniBand
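The deck credits much of the scaling to RDMA/GPUDirect plus those new data-partition and communication strategies, detailed in the Deep Image paper. As a minimal sketch of the underlying pattern only, here is synchronous data-parallel SGD that averages gradients with an MPI all-reduce; the model, gradient function, and batch source are hypothetical stand-ins, not Baidu's implementation:

```python
import numpy as np
from mpi4py import MPI  # in a real setup, each MPI rank drives one GPU

comm = MPI.COMM_WORLD
rank, world = comm.Get_rank(), comm.Get_size()

dim, lr = 1000, 0.01
weights = np.zeros(dim)  # every rank holds an identical replica

def local_gradient(w, batch):
    """Hypothetical stand-in for forward/backward on this rank's shard."""
    return w - batch.mean(axis=0)  # placeholder math, not a real loss

for step in range(100):
    # Each rank draws its own mini-batch shard (data parallelism).
    rng = np.random.default_rng(step * world + rank)
    grad = local_gradient(weights, rng.normal(size=(32, dim)))

    # One all-reduce per step averages gradients across all ranks.
    grad_sum = np.empty_like(grad)
    comm.Allreduce(grad, grad_sum, op=MPI.SUM)
    weights -= lr * (grad_sum / world)  # replicas stay in sync
```

Run with, e.g., `mpirun -n 4 python sgd_allreduce.py`; the all-reduce is the step whose cost the InfiniBand fabric and RDMA/GPUDirect are meant to hide.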
Minwa
Speedup (wall time to convergence)

[Chart: validation-set accuracy vs. training time (hours, log scale from
0.25 to 256) for 1, 16, and 32 GPUs. To reach 80% accuracy, 32 GPUs need
8.6 hours vs. 212 hours for 1 GPU: a 24.7x speedup.]
Never have enough training examples!

Key observations
- Invariant to the illuminant of the scene
- Invariant to observers

Augmentation approaches
- Color casting
- Optical distortion
- Rotation, cropping, etc.

Data Augmentation

"见多识广" (the more you see, the more you know)
And the Color Constancy

Key observations
- Invariant to the illuminant of the scene
- Invariant to observers

Augmentation approaches
- Color casting
- Optical distortion
- Rotation, cropping, etc.

The color of the Dress

"Inspired by the color constancy principle. Essentially, this 'forces'
our neural network to develop its own color constancy ability."
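A minimal sketch of what a color-casting augmentation of this kind might look like: shift all pixels of an RGB image by a random per-channel offset, simulating a change of illuminant. The ±20 offset range is an assumption for illustration, not a value stated in the deck:

```python
import numpy as np

def color_cast(image: np.ndarray, max_offset: int = 20, rng=None) -> np.ndarray:
    """Shift an RGB uint8 image (H, W, 3) by a random per-channel offset.

    Casting the whole image toward one color mimics a change of
    illuminant, so the network is pushed to learn its own color
    constancy, as the slide describes. max_offset=20 is assumed.
    """
    if rng is None:
        rng = np.random.default_rng()
    offset = rng.integers(-max_offset, max_offset + 1, size=3)
    shifted = image.astype(np.int16) + offset  # widen to avoid uint8 wraparound
    return np.clip(shifted, 0, 255).astype(np.uint8)
```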
Data Augmentation

Augmentation      Number of possible changes
Color casting     68,920
Vignetting        1,960
Lens distortion   260
Rotation          20
Flipping          2
Cropping          82,944 (crop size 224x224, input image size 512x512)

Possible variations: the Deep Image system learned from ~2 billion
examples, out of 90 billion possible candidates.
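Two of the counts in the table can be sanity-checked. The cropping count matches sliding a 224x224 window one pixel at a time inside a 512x512 image, and the color-casting count is exactly 41^3 - 1, which would correspond to per-channel offsets in [-20, 20] with the all-zero identity cast excluded; the latter reading is a guess, not stated on the slide:

```python
# Cropping: a 224x224 window slid one pixel at a time inside 512x512.
assert (512 - 224) ** 2 == 82944

# Color casting (hypothesis): offsets in [-20, 20] per RGB channel,
# i.e. 41 choices per channel, minus the all-zero identity cast.
assert 41 ** 3 - 1 == 68920
```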
Data augmentation vs. Overfitting
Examples

Bathtub, Isopod
Indian elephant, Ice bear

Some hard cases addressed by adding our data augmentation.
Multi-scale training

- Same crop size, different resolutions
  - Fixed-size 224x224 crops
- Downsized training images
  - Reduce computational cost, but not state-of-the-art
- Different models trained on different image sizes: 256x256, 512x512
- The high-resolution model works
  - 256x256 model: top-5 7.96%
  - 512x512 model: top-5 7.42%
- Multi-scale models are complementary
  - Fused model: 6.97% (a fusion sketch follows below)

"明查秋毫" (discern the finest detail)
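The deck does not spell out how the fused model is built; a common minimal scheme, shown here under that assumption, is to average the class posteriors of the per-scale models. The `preprocess`/`predict` interfaces are hypothetical:

```python
import numpy as np

def fused_predict(image, scale_models):
    """Fuse multi-scale models by averaging their class posteriors.

    scale_models: list of (preprocess, predict) pairs, one pair per
    training resolution (e.g. 256x256 and 512x512). Each predict() is
    assumed to return a (1000,) softmax probability vector.
    """
    probs = [predict(preprocess(image)) for preprocess, predict in scale_models]
    return np.mean(probs, axis=0)

# Top-5 decision from the fused posterior:
# top5 = np.argsort(fused_predict(img, models))[-5:]
```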
Multi-scale training

Tricycle, Washer, Backpack, Little blue heron, Tricycle
Model

- One basic configuration has 16 layers
- The number of weights in our configuration is 212.7M
- About 40% bigger than VGG's

Team        Top-1 val. error   Top-5 val. error
GoogLeNet   -                  7.89%
VGG         25.9%              8.0%
Deep Image  24.88%             7.42%
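For context on the 212.7M figure: in a VGG-style network nearly all weights sit in the convolutional and fully connected layers, and can be counted as below. The layers used here are a toy example, not the actual Deep Image configuration:

```python
def conv_params(k, c_in, c_out):
    # k x k kernels over c_in input channels, c_out filters, plus biases
    return k * k * c_in * c_out + c_out

def fc_params(n_in, n_out):
    return n_in * n_out + n_out

# Toy slice only: two conv layers and one fc layer of a VGG-like net.
total = conv_params(3, 3, 64) + conv_params(3, 64, 64) + fc_params(4096, 4096)
print(f"{total:,} parameters in this slice")  # 16,820,032
```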
Compare to state-of-the-art

Deep Image set a new record of 5.98% top-5 error on the test dataset, a
10.2% relative improvement over the previous best result.

Team           Year   Place   Top-5 test error
SuperVision    2012   1       16.42%
ISI            2012   2       26.17%
VGG            2012   3       26.98%
Clarifai       2013   1       11.74%
NUS            2013   2       12.95%
ZF             2013   3       13.51%
GoogLeNet      2014   1       6.66%
VGG            2014   2       7.32%
MSRA           2014   3       8.06%
Andrew Howard  2014   4       8.11%
DeeperVision   2014   5       9.51%
Deep Image     -      -       5.98%
Latest results

Team        Date         Top-5 test error
GoogLeNet   2014         6.66%
Deep Image  01/12/2015   5.98%
Deep Image  02/05/2015   5.33%
Microsoft   02/05/2015   4.94%
Google      03/02/2015   4.82%
Deep Image  03/17/2015   4.83%
Robustness
Major differentiators

- Custom-built supercomputer dedicated to DL
- Simple, scalable algorithm + fully optimized software stack
- Larger models
- More aggressive data augmentation
- Multi-scale, including high-resolution images

Brute force + insights, and push to the extreme
Thank you!
