Deep Image is Baidu's project to build very large neural networks for image recognition using a customized supercomputer with over 100 nodes and 144 GPUs. Their approach focuses on scaling up models, aggressively augmenting data, and training on multiple image scales including high-resolution images. Their best model achieved a top-5 error rate of 5.33% on ImageNet, beating the previous state-of-the-art. They were able to train their largest model 24.7 times faster than a single GPU through optimized distributed training software and hardware.
1. Deep Image:
Scaling up Image Recognition
Ren Wu
Distinguished Scientist, Baidu
wuren@
@韧在百度 (Weibo: "Ren at Baidu")
3. The Color of the Dress
Color Constancy
Human vs. Artificial Intelligence
5. Summary @ GTC14
Big data + Deep learning + High performance computing =
Intelligence
Big data + Deep learning + Heterogeneous computing =
Success
GTC'14: Deep Learning Meets Heterogeneous Computing
10. Deep Blue
A classic example of application-specific system design: an IBM supercomputer
with 480 custom-made VLSI chess chips, running a massively parallel search
algorithm with a highly optimized implementation.
12. Deep Learning Applications
• Speech recognition
• Image recognition
• Optical character recognition (OCR)
• Language translation
• Web search
• Computational ads (CTR)
• …
13. ImageNet Large-Scale Visual Recognition Challenge
• ImageNet dataset
  • More than 15 million images in about 22,000 categories
• ILSVRC (ImageNet Large-Scale Visual Recognition Challenge)
  • Classification task: 1.2 million images in 1,000 categories
  • One of the most challenging computer vision benchmarks
  • Increasing attention from both industry and academic communities
* Olga Russakovsky et al. ECCV 2014
21. Data Augmentation: Never Have Enough Training Examples!
Key observations
• Invariant to the illuminant of the scene
• Invariant to observers
Augmentation approaches
• Color casting
• Optical distortion
• Rotation and cropping, etc.
"见多识广" ("The more one sees, the more one knows")
22. And Color Constancy
Key observations
• Invariant to the illuminant of the scene
• Invariant to observers
Augmentation approaches
• Color casting
• Optical distortion
• Rotation and cropping, etc.
The color of the Dress
"Inspired by the color constancy principle. Essentially, this 'forces' our
neural network to develop its own color constancy ability."
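The color-casting idea from the slide above can be sketched as a random per-channel offset: shifting the R/G/B channels independently mimics different scene illuminants, so the network has to learn its own color constancy. This is a minimal illustrative sketch, not Baidu's implementation; the function name and the `max_shift` range are assumptions.

```python
import numpy as np

def color_cast(image, max_shift=20, rng=None):
    """Apply a random color cast to an RGB uint8 image.

    Illustrative sketch: one random offset per channel (simulating a
    different illuminant) is added to the whole image. `max_shift` is
    a made-up parameter, not a value from the talk.
    """
    rng = rng or np.random.default_rng()
    # One offset per channel, applied uniformly across all pixels.
    shift = rng.integers(-max_shift, max_shift + 1, size=3)
    # Widen to int16 before adding so the offsets cannot wrap around.
    return np.clip(image.astype(np.int16) + shift, 0, 255).astype(np.uint8)

# Example: cast a mid-gray 4x4 image.
img = np.full((4, 4, 3), 128, dtype=np.uint8)
cast = color_cast(img)
print(cast.shape)  # (4, 4, 3)
```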
23. Data Augmentation
Possible variations:

Augmentation    | Number of possible changes
Color casting   | 68,920
Vignetting      | 1,960
Lens distortion | 260
Rotation        | 20
Flipping        | 2
Cropping        | 82,944 (crop size 224×224, input image size 512×512)

The Deep Image system learned from ~2 billion examples, out of 90 billion
possible candidates.
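The cropping count in the table follows directly from the geometry: a 224×224 window sliding over a 512×512 image. The slide's figure corresponds to 288 offsets per axis (counting every offset from 0 through 511−224 inclusive would give 289; off-by-one conventions vary). A quick check of the arithmetic:

```python
# Distinct 224x224 crops from a 512x512 image, using the slide's
# convention of 512 - 224 = 288 top-left offsets per axis.
offsets = 512 - 224      # 288 offsets along each axis
crops = offsets ** 2     # total crop positions
print(crops)  # 82944, matching the table
```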
26. Multi-scale Training
• Same crop size, different resolution
  • Fixed-size 224×224
• Downsized training images
  • Reduce computational costs
  • But not state-of-the-art accuracy
• Different models trained at different image sizes
  • 256×256
  • 512×512
• High-resolution model works
  • 256×256: top-5 7.96%
  • 512×512: top-5 7.42%
• Multi-scale models are complementary
  • Fused model: 6.97%
"明察秋毫" ("Discerning the finest detail")
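The fusion step above combines models trained at different scales; the slide reports that the 256×256 model (7.96% top-5) and the 512×512 model (7.42%) together reach 6.97%, because the two scales make complementary errors. A minimal sketch of one plausible fusion rule — simple probability averaging — is below; the slide does not specify the exact scheme Deep Image used, so treat this as an assumption.

```python
import numpy as np

def fuse(probs_256, probs_512):
    """Average class probabilities from two models trained at
    different input scales. Illustrative fusion rule only; the
    actual Deep Image scheme is not specified in the talk."""
    return (np.asarray(probs_256) + np.asarray(probs_512)) / 2.0

# Toy example with 4 classes: the models disagree on the top class,
# and the averaged distribution decides.
p256 = [0.5, 0.3, 0.1, 0.1]
p512 = [0.2, 0.6, 0.1, 0.1]
fused = fuse(p256, p512)
print(fused.argmax())  # class 1 wins after fusion
```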
29. Model
• One basic configuration has 16 layers
• The number of weights in our configuration is 212.7M
• About 40% bigger than VGG's

Team       | Top-1 val. error | Top-5 val. error
GoogLeNet  | -                | 7.89%
VGG        | 25.9%            | 8.0%
Deep Image | 24.88%           | 7.42%
30. Compared to the State of the Art
Deep Image set a new record of 5.98% top-5 error on the test dataset, a
10.2% relative improvement over the previous best result.

Team          | Year | Place | Top-5 test error
SuperVision   | 2012 | 1     | 16.42%
ISI           | 2012 | 2     | 26.17%
VGG           | 2012 | 3     | 26.98%
Clarifai      | 2013 | 1     | 11.74%
NUS           | 2013 | 2     | 12.95%
ZF            | 2013 | 3     | 13.51%
GoogLeNet     | 2014 | 1     | 6.66%
VGG           | 2014 | 2     | 7.32%
MSRA          | 2014 | 3     | 8.06%
Andrew Howard | 2014 | 4     | 8.11%
DeeperVision  | 2014 | 5     | 9.51%
Deep Image    | -    | -     | 5.98%
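The "10.2% relative improvement" claim follows from the two top rows: GoogLeNet's previous best of 6.66% versus Deep Image's 5.98%. A quick check of that arithmetic:

```python
# Relative improvement of 5.98% top-5 error over the previous
# best of 6.66%: (old - new) / old.
prev, new = 6.66, 5.98
rel = (prev - new) / prev
print(f"{rel:.1%}")  # 10.2%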
31. Latest Results

Team       | Date       | Top-5 test error
GoogLeNet  | 2014       | 6.66%
Deep Image | 01/12/2015 | 5.98%
Deep Image | 02/05/2015 | 5.33%
Microsoft  | 02/05/2015 | 4.94%
Google     | 03/02/2015 | 4.82%
Deep Image | 03/17/2015 | 4.83%
41. Major Differentiators
• Custom-built supercomputer dedicated to DL
• Simple, scalable algorithm + fully optimized software stack
• Larger models
• More aggressive data augmentation
• Multi-scale training, including high-resolution images
Brute force + insights, pushed to the extreme