The digital era has brought with it an enormous explosion of data.
More robust models and algorithms can be developed by exploiting the images in this data.
How can such data be utilized and organized?
In this paper, we introduce a new image database called ImageNet.
2. Outline
1. Introduction
2. Properties of ImageNet
3. Constructing ImageNet
4. ImageNet Applications
5. Future Work
4. Introduction
ImageNet uses the hierarchical structure of WordNet [9].
Each meaningful concept in WordNet, possibly described by multiple words or word phrases, is called a synset.
In ImageNet, we aim to provide on average 500-1000 images to illustrate each synset.
Images of each concept are quality-controlled.
ImageNet will therefore offer tens of millions of cleanly sorted images.
9. Properties of ImageNet
Accuracy
80 synsets randomly sampled
Average of 99.7% precision
Figure 4: Percent of clean images at different tree depth levels in ImageNet.
10. Properties of ImageNet
Diversity
Figure 5: ImageNet provides diversified images.
(a) Comparison of the lossless JPG file sizes of average images for four different synsets in ImageNet and Caltech101.
(b) Example images from ImageNet and average images for each synset indicated by (a).
(c) Example images from Caltech101 and average images.
11. Properties of ImageNet
Related Datasets
Small image datasets
A number of well-labeled small datasets exist (Caltech101/256, MSRC, PASCAL, etc.).
They are the datasets most commonly used in today's computer vision.
ImageNet offers 20x the number of categories and 100x the total number of images of these datasets.
12. Properties of ImageNet
Related Datasets
TinyImage
- TinyImage is a dataset of 80 million 32 x 32 low resolution images.
- Each synset contains an average of 1000 images.
- 10-25% are possibly clean images.
LabelMe and Lotus Hill datasets
- Provide 30k and 50k labeled and segmented images.
- Both have around 200 categories.
- Outlines and locations of objects are provided.
- Images are largely uploaded or provided by users or researchers of the datasets.
13. Properties of ImageNet
ESP dataset
Acquired through an online game.
Labels largely concentrate at the basic level of the semantic hierarchy.
Most of the ESP dataset is not publicly available.
Only 60k images can be accessed.
Figure 6: Comparison of the distribution of mammal labels.
14. Properties of ImageNet
Table 1: Comparison of some of the properties of ImageNet versus other existing datasets.
15. Constructing ImageNet
Collecting Candidate Images
Candidate images are collected from the Internet by querying several image search engines.
The queries are the set of WordNet synonyms for each synset.
Search engines typically limit the number of images returned per query.
Expand the query set by appending the word from the parent synset to each query.
Translate the queries into other languages, including Chinese, Spanish, Dutch and Italian.
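The query expansion steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `expand_queries`, the example words, and the hand-written translation table (with a placeholder entry) are hypothetical and not taken from the ImageNet pipeline.

```python
def expand_queries(synonyms, parent_words, translations=None):
    """Build a de-duplicated list of search queries for one synset."""
    translations = translations or {}
    queries = []
    for word in synonyms:
        queries.append(word)                    # plain synonym query
        for parent in parent_words:
            queries.append(f"{word} {parent}")  # synonym + parent-synset word
    # Add translated forms for every query that has an entry in the table.
    for query in list(queries):
        queries.extend(translations.get(query, []))
    # Drop duplicates while preserving order.
    return list(dict.fromkeys(queries))


queries = expand_queries(
    synonyms=["whippet"],
    parent_words=["dog"],
    # Placeholder string standing in for a real translation (hypothetical).
    translations={"whippet": ["whippet_translated"]},
)
print(queries)
```

Each resulting string would be sent to the search engines as a separate query, which is one way to get past the per-query image limit.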
16. Constructing ImageNet
Cleaning Candidate Images
Rely on humans to verify each candidate image, using the Amazon Mechanical Turk (AMT) service.
Ask the users to verify whether each image contains objects of the synset.
Have multiple users independently label the same image.
Different categories require different levels of consensus among users.
17. Constructing ImageNet
A simple algorithm determines the number of agreements needed for different categories of images.
For each synset, randomly sample an initial subset of images.
Ask at least 10 users to vote on each of these images.
Obtain a confidence score table.
For each of the remaining candidate images, proceed with the AMT user labeling until a confidence score threshold is reached.
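One possible reading of this procedure is sketched below. The slide does not say how the confidence score table is computed, so the sketch assumes that the majority of the initial votes on an image counts as its true label, and that the table maps each (yes, no) vote count to the fraction of initial images that turned out good. The names `build_confidence_table` and `label_image`, the 0.95 threshold, and the toy votes are all hypothetical.

```python
from collections import defaultdict


def build_confidence_table(initial_votes):
    """Map each (yes, no) vote count to the fraction of initial images that
    pass through that count and end with a majority of 'yes' votes."""
    seen = defaultdict(int)
    good = defaultdict(int)
    for votes in initial_votes:                # one list of bools per image
        is_good = sum(votes) * 2 > len(votes)  # assumption: majority = truth
        yes = no = 0
        for v in votes:
            yes, no = yes + v, no + (not v)
            seen[(yes, no)] += 1
            good[(yes, no)] += is_good
    return {state: good[state] / seen[state] for state in seen}


def label_image(next_vote, table, threshold=0.95, max_votes=10):
    """Request votes until the table is confident enough either way."""
    yes = no = 0
    for _ in range(max_votes):
        v = next_vote()
        yes, no = yes + v, no + (not v)
        p = table.get((yes, no))
        if p is not None and p >= threshold:
            return True    # confidently contains the synset
        if p is not None and p <= 1 - threshold:
            return False   # confidently does not
    return yes > no        # fallback when no confident state was reached


# Toy initial subset: three images with ten votes each.
initial = [[True] * 10, [True] * 9 + [False], [False] * 10]
table = build_confidence_table(initial)
votes = iter([True, True, True])
print(label_image(lambda: next(votes), table))
```

With a separate table per synset, an easy category can stop after few votes while a harder one keeps collecting them, which matches the idea that different categories require different levels of consensus.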
18. ImageNet Applications
Object Recognition
- NN-voting + noisy ImageNet
Use original candidate images.
Downsample to 32 x 32.
- NN-voting + clean ImageNet
Use clean images.
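A rough sketch of nearest-neighbor voting is given below. It assumes a sum-of-squared-differences pixel distance and a flat majority vote among the k closest images; neither detail is stated on the slide. Short 4-pixel lists stand in for flattened 32 x 32 images, and the names `ssd` and `nn_vote` are hypothetical.

```python
from collections import Counter


def ssd(a, b):
    """Sum of squared differences between two equal-length pixel lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nn_vote(query, gallery, k=3):
    """gallery: list of (pixels, label) pairs. Return the majority label
    among the k gallery images closest to the query."""
    nearest = sorted(gallery, key=lambda item: ssd(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy gallery: 4-pixel stand-ins for downsampled images of two synsets.
gallery = [
    ([0.1, 0.2, 0.1, 0.2], "cat"),
    ([0.2, 0.1, 0.2, 0.1], "cat"),
    ([0.8, 0.9, 0.8, 0.9], "dog"),
    ([0.9, 0.8, 0.9, 0.8], "dog"),
    ([0.7, 0.8, 0.9, 0.8], "dog"),
]
print(nn_vote([0.8, 0.8, 0.8, 0.8], gallery))
```

The noisy and clean variants differ only in which images populate the gallery: the original candidate images or the verified ones.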
19. ImageNet Applications
NBNN
- SIFT descriptors are used.
- Compute the query-class distance.
NBNN-100
- Limit the number of images per category to 100.
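The query-class distance can be illustrated with the sketch below. It assumes the usual NBNN formulation, in which each query descriptor is matched to its nearest descriptor pooled from all images of a class and the squared distances are summed. Two-dimensional toy vectors stand in for 128-dimensional SIFT descriptors, and the function names are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def query_class_distance(query_descriptors, class_descriptors):
    """Sum, over the query's descriptors, of the squared distance to the
    nearest descriptor pooled from all images of one class."""
    return sum(
        min(sq_dist(d, c) for c in class_descriptors)
        for d in query_descriptors
    )


def nbnn_classify(query_descriptors, descriptors_by_class):
    """Return the class with the smallest query-class distance."""
    return min(
        descriptors_by_class,
        key=lambda name: query_class_distance(
            query_descriptors, descriptors_by_class[name]
        ),
    )


# Toy descriptor pools for two classes.
descriptors_by_class = {
    "cat": [[0.0, 0.0], [0.0, 1.0]],
    "dog": [[5.0, 5.0], [6.0, 5.0]],
}
query = [[0.1, 0.2], [5.5, 5.0], [0.0, 0.9]]
print(nbnn_classify(query, descriptors_by_class))
```

Under this reading, NBNN-100 would pool descriptors from at most 100 images per category before computing the same distance.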
20. ImageNet Applications
Tree Based Image Classification
A simple object classification method: the tree-max classifier.
Imagine you have a classifier at each synset node of the tree and you want to decide whether an image contains an object of that synset or not.
The maximum of all the classifier responses in this subtree becomes the classification score of the query image.
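The tree-max rule can be sketched as a short recursion. The sketch assumes the subtree includes the node itself; the toy hierarchy, the response values, and the name `tree_max_score` are hypothetical.

```python
def tree_max_score(node, responses, children):
    """Classification score for `node`: the maximum classifier response
    over the node itself and every synset in its subtree."""
    best = responses[node]
    for child in children.get(node, []):
        best = max(best, tree_max_score(child, responses, children))
    return best


# Toy hierarchy and per-synset classifier responses for one query image.
children = {"mammal": ["dog", "cat"], "dog": ["whippet"]}
responses = {"mammal": 0.1, "dog": 0.3, "cat": 0.2, "whippet": 0.9}

print(tree_max_score("dog", responses, children))  # 0.9
print(tree_max_score("cat", responses, children))  # 0.2
```

Here a strong response from the "whippet" classifier raises the score for "dog", even though the "dog" classifier alone responds weakly.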
21. ImageNet Applications
Automatic Object Localization
[14] L.-J. Li, G. Wang, and L. Fei-Fei. OPTIMOL: automatic Online Picture
collection via Incremental Model Learning.
We annotated 100 images in 22 different categories of the mammal
and vehicle subtrees with bounding boxes around the objects of that category.
23. Future Work
Completing ImageNet
- Have roughly 50 million clean, diverse and full resolution images spread over approximately 50K synsets.
- Make it publicly available and readily accessible online.
- Extend ImageNet to include more information.
- Foster an ImageNet community and develop an online platform.