
ImageNet: A Large-Scale Hierarchical Image Database
Research fellows: Reza Pourabbas, Saber Qasemian

Outline
• 1. Introduction
• 2. Properties of ImageNet
• 3. Constructing ImageNet
• 4. ImageNet Applications
• 5. Future Work

1. Introduction
• The digital era has brought with it an enormous explosion of data.
• More robust models and algorithms can be proposed by exploiting the images this data contains.
• How can such data be utilized and organized?
• This paper introduces a new image database called “ImageNet”.

Introduction
• ImageNet uses the hierarchical structure of WordNet [9]. Each meaningful concept in WordNet, possibly described by multiple words or word phrases, is called a “synset”.
• ImageNet aims to provide on average 500-1000 images to illustrate each synset.
• Images of each concept are quality-controlled.
• ImageNet, therefore, will offer tens of millions of cleanly sorted images.

2. Properties of ImageNet
2.1 Scale
2.2 Hierarchy
2.3 Accuracy
2.3.1 Clean dataset
2.4 Diversity
2.4.1 Variable appearances, positions, view points

Properties of ImageNet: Scale
3.2 million images
5247 synsets (categories)
On average, over 600 images for each synset

Properties of ImageNet: Hierarchy
Figure 3: Comparison of the “cat” and “cattle” subtrees between ESP [25] and ImageNet.

Properties of ImageNet: Accuracy
• 80 synsets were randomly sampled for evaluation.
• The average precision is 99.7%.
Figure 4: Percent of clean images at different tree depth levels in ImageNet.

Properties of ImageNet: Diversity
(a) Comparison of the lossless JPG file sizes of average images for four different synsets in ImageNet and Caltech101.
(b) Example images from ImageNet and average images for each synset indicated by (a).
(c) Example images from Caltech101 and average images.
Figure 5: ImageNet provides diversified images.
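The measurement behind Figure 5(a) can be sketched in a few lines. The idea is that a diverse synset yields a blurrier, more uniform average image, which compresses to a smaller file. This is only an illustrative sketch: `average_image` and `lossless_size` are hypothetical helper names, images are plain nested lists of grayscale values, and zlib (DEFLATE) is used as a stand-in for the lossless JPG codec mentioned in the caption.

```python
import zlib
from typing import List

Image = List[List[int]]  # grayscale pixels in 0..255, row-major


def average_image(images: List[Image]) -> Image:
    """Pixel-wise (integer) mean of equally sized grayscale images."""
    n = len(images)
    rows, cols = len(images[0]), len(images[0][0])
    return [[sum(img[r][c] for img in images) // n for c in range(cols)]
            for r in range(rows)]


def lossless_size(img: Image) -> int:
    """Number of bytes after lossless compression.

    Assumption: zlib stands in for lossless JPG; only the relative
    sizes between synsets are meaningful, not the absolute numbers.
    """
    raw = bytes(p for row in img for p in row)
    return len(zlib.compress(raw, 9))


# Toy data: a textured image and its inverse average out to flat gray.
pattern = [[(37 * r + 91 * c) % 256 for c in range(16)] for r in range(16)]
inverse = [[255 - p for p in row] for row in pattern]

low_diversity = [pattern, pattern]    # identical images, sharp average
high_diversity = [pattern, inverse]   # very different images, flat average

print(lossless_size(average_image(low_diversity)))
print(lossless_size(average_image(high_diversity)))
```

In this toy setup the high-diversity set produces the smaller compressed average, which is the direction of the effect the figure uses as evidence of diversity.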

Properties of ImageNet: Related Datasets
• Small image datasets
- A number of well-labeled small datasets exist (Caltech101/256, MSRC, PASCAL, etc.).
- These are the datasets most commonly used in today’s computer vision research.
- ImageNet offers 20x the number of categories and 100x the total number of images of these datasets.

Properties of ImageNet: Related Datasets
• TinyImage
- TinyImage is a dataset of 80 million 32 × 32 low-resolution images.
- Each synset contains an average of 1000 images.
- Possibly only 10-25% of them are clean images.
• LabelMe and Lotus Hill datasets
- They provide 30k and 50k labeled and segmented images, respectively.
- Both have around 200 categories.
- Outlines and locations of objects are provided.
- Images are largely uploaded or provided by users or researchers of the datasets.

• ESP dataset
- Acquired through an online game.
- Labels largely concentrate at the “basic level” of the semantic hierarchy.
- Most of the ESP dataset is not publicly available; only 60k images can be accessed.
Figure 6: Comparison of the distribution of “mammal” labels.

Table 1: Comparison of some of the properties of ImageNet versus other existing datasets.

3. Constructing ImageNet
Collecting Candidate Images
Candidate images are collected from the Internet by querying several image search engines.
The queries for each synset are its set of WordNet synonyms.
Search engines typically limit the number of images that can be retrieved per query.
To obtain more images, the query set is expanded by appending a word from the parent synset to each query.
The queries are also translated into other languages, including Chinese, Spanish, Dutch and Italian.
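The query expansion described on this slide can be illustrated with a small sketch. Everything below is hypothetical: the function name `build_queries`, the example words, and the tiny translation table are made up for illustration, and no real search-engine or WordNet API is called.

```python
from typing import Dict, List


def build_queries(synonyms: List[str],
                  parent_words: List[str],
                  translations: Dict[str, Dict[str, str]]) -> List[str]:
    """Return the de-duplicated search queries for one synset.

    synonyms      -- WordNet synonyms of the synset (the base queries)
    parent_words  -- words taken from the parent synset
    translations  -- hypothetical lookup: language -> {query: translated query}
    """
    queries = list(synonyms)

    # Expansion: append a parent-synset word to every base query.
    for syn in synonyms:
        for parent in parent_words:
            queries.append(f"{syn} {parent}")

    # Translation: add the translated form of each query when one is known.
    translated = []
    for query in queries:
        for table in translations.values():
            if query in table:
                translated.append(table[query])

    # Keep the first occurrence of each query, preserving order.
    seen = set()
    result = []
    for query in queries + translated:
        if query not in seen:
            seen.add(query)
            result.append(query)
    return result


example = build_queries(
    synonyms=["whippet"],
    parent_words=["dog"],
    translations={"es": {"whippet dog": "perro whippet"}},  # made-up entry
)
print(example)
```

Each resulting query would then be sent to the search engines separately, so that the per-query result limit applies to every variant rather than to the synset as a whole.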

Cleaning Candidate Images
• Rely on humans to verify each candidate image.
• Use the Amazon Mechanical Turk (AMT) service for this.
• Ask the users to verify whether each image contains objects of the synset.
• Have multiple users independently label the same image.
• Different categories require different levels of consensus among users.

• A simple algorithm determines the number of agreements needed for different categories of images.
• For each synset, randomly sample an initial subset of images and have at least 10 users vote on each.
• Obtain a confidence score table from these votes.
• For each of the remaining candidate images, proceed with the AMT user labeling until a confidence score threshold is reached.
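The labeling loop above can be sketched as follows. The slide does not say how the confidence score table is computed, so this sketch makes a loud assumption: for each image in the initial subset, the majority of its full vote is treated as the true label, and the table stores, for every (yes, no) vote count seen along the way, the fraction of images that turned out good. The names `build_confidence_table` and `label_image`, the 0.95 threshold, and the vote cap are all hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

VoteCount = Tuple[int, int]  # (yes votes, no votes)


def build_confidence_table(initial_votes: List[List[bool]]) -> Dict[VoteCount, float]:
    """Estimate P(image is good | yes/no counts) from the initial subset.

    Assumption (not from the slides): the majority of an image's full
    vote sequence is taken as its true label.
    """
    counts = defaultdict(lambda: [0, 0])  # (yes, no) -> [good images, all images]
    for votes in initial_votes:
        is_good = sum(votes) * 2 > len(votes)
        yes = no = 0
        for vote in votes:
            if vote:
                yes += 1
            else:
                no += 1
            counts[(yes, no)][0] += int(is_good)
            counts[(yes, no)][1] += 1
    return {key: good / total for key, (good, total) in counts.items()}


def label_image(next_vote: Callable[[], bool],
                table: Dict[VoteCount, float],
                threshold: float = 0.95,
                max_votes: int = 10) -> bool:
    """Request votes one at a time until the confidence threshold is met.

    Returns True if the image is accepted, False if the vote budget
    runs out first.
    """
    yes = no = 0
    while yes + no < max_votes:
        if next_vote():
            yes += 1
        else:
            no += 1
        if table.get((yes, no), 0.0) >= threshold:
            return True
    return False


# Toy initial subset: 10 votes per image, as on the slide.
initial = [[True] * 10, [True] * 10, [False] * 10, [True, False] * 5]
table = build_confidence_table(initial)

votes = iter([True, True])
accepted = label_image(lambda: next(votes), table)
print(accepted)
```

Because the table is built per synset, an easy category can reach the threshold after very few votes while a harder one keeps requesting votes, which is how different categories end up with different consensus levels.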

4. ImageNet Applications
Object Recognition
- NN-voting + noisy ImageNet
• Use original candidate images.
• Downsample to 32 × 32.
- NN-voting + clean ImageNet
• Use clean images.
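A minimal sketch of nearest-neighbor voting on downsampled images is given below. The slides do not detail the distance or the voting rule, so this sketch assumes sum of squared pixel differences (SSD) and a flat majority vote among the k nearest images. The helper names, the toy 4 × 4 image, and the labels are hypothetical; a real run would downsample to 32 × 32.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Image = List[List[float]]  # square grayscale image, row-major


def downsample(img: Image, size: int) -> Image:
    """Block-average a square image whose side is a multiple of `size`."""
    block = len(img) // size
    area = block * block
    return [[sum(img[r * block + i][c * block + j]
                 for i in range(block) for j in range(block)) / area
             for c in range(size)]
            for r in range(size)]


def ssd(a: Image, b: Image) -> float:
    """Sum of squared pixel differences between two equally sized images."""
    return sum((x - y) ** 2 for row_a, row_b in zip(a, b)
               for x, y in zip(row_a, row_b))


def nn_vote(query: Image, gallery: Sequence[Tuple[Image, str]], k: int) -> str:
    """Label the query by majority vote among its k nearest gallery images."""
    nearest = sorted(gallery, key=lambda item: ssd(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy example with a 4x4 image reduced to 2x2.
big = [[0, 0, 4, 4],
       [0, 0, 4, 4],
       [8, 8, 2, 2],
       [8, 8, 2, 2]]
small = downsample(big, 2)

gallery = [([[0, 0], [0, 0]], "cat"),
           ([[1, 0], [0, 1]], "cat"),
           ([[9, 9], [9, 9]], "dog")]
print(nn_vote([[1, 1], [1, 1]], gallery, k=3))
```

The same code serves both settings on the slide: the only difference between "noisy" and "clean" is whether `gallery` is filled with the raw candidate images or with the verified ones.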

ImageNet Applications
• NBNN (Naive Bayesian Nearest Neighbor)
- SIFT descriptors are used.
- Compute the query-class distance.
• NBNN-100
- Limit the number of images per category to 100.
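The query-class distance used by NBNN can be sketched as below, following the usual formulation of the method: every descriptor of the query image is matched to its nearest descriptor in a class, the squared distances are summed, and the class with the smallest sum wins. The toy 2-D vectors stand in for SIFT descriptors, and the function names and the `pool_descriptors` helper for the NBNN-100 variant are hypothetical.

```python
from typing import Dict, List

Vector = List[float]


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def query_class_distance(query: List[Vector], class_descriptors: List[Vector]) -> float:
    """Sum over query descriptors of the squared distance to the nearest class descriptor."""
    return sum(min(squared_distance(d, c) for c in class_descriptors)
               for d in query)


def nbnn_classify(query: List[Vector], classes: Dict[str, List[Vector]]) -> str:
    """Return the class with the smallest query-class distance."""
    return min(classes, key=lambda name: query_class_distance(query, classes[name]))


def pool_descriptors(images: List[List[Vector]], max_images: int = 100) -> List[Vector]:
    """NBNN-100 style pooling: use descriptors from at most `max_images` images."""
    return [d for image in images[:max_images] for d in image]


# Toy descriptors (2-D instead of 128-D SIFT).
classes = {
    "cat": pool_descriptors([[[0.0, 0.0]], [[1.0, 0.0]]]),
    "dog": pool_descriptors([[[5.0, 5.0]], [[6.0, 5.0]]]),
}
print(nbnn_classify([[0.2, 0.0], [0.9, 0.1]], classes))
```

The brute-force inner `min` is quadratic in the number of descriptors; at ImageNet scale an approximate nearest-neighbor index would be needed, which is outside this sketch.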

• Tree-Based Image Classification
• A simple object classification method: the “tree-max classifier”.
• Imagine you have a classifier at each synset node of the tree and you want to decide whether an image contains an object of that synset or not.
• The maximum of all the classifier responses in the subtree rooted at that synset becomes the classification score of the query image.
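The tree-max rule can be written as a short recursion. In this sketch the per-node classifier responses are given as plain numbers, and the tree, synset names, and scores are made-up illustrative values; `tree_max_score` is a hypothetical name.

```python
from typing import Dict, List


def tree_max_score(synset: str,
                   children: Dict[str, List[str]],
                   responses: Dict[str, float]) -> float:
    """Maximum classifier response over `synset` and all of its descendants."""
    best = responses[synset]
    for child in children.get(synset, []):
        best = max(best, tree_max_score(child, children, responses))
    return best


# Hypothetical fragment of the hierarchy and per-node responses for one image.
children = {"mammal": ["dog", "cat"], "dog": ["whippet"]}
responses = {"mammal": 0.1, "dog": 0.3, "cat": 0.2, "whippet": 0.9}

print(tree_max_score("mammal", children, responses))  # 0.9
print(tree_max_score("cat", children, responses))     # 0.2
```

Here a strong response from the specific "whippet" classifier lifts the score of every ancestor, so the image is also scored highly as a "dog" and as a "mammal" even though those classifiers respond weakly on their own.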

Automatic Object Localization
[14] L.-J. Li, G. Wang, and L. Fei-Fei. OPTIMOL: automatic Online Picture
collection via Incremental Model Learning.
We annotated 100 images in 22 different categories of the mammal
and vehicle subtrees with bounding boxes around the objects of that category.

5. Future Work
• Completing ImageNet
- Reach roughly 50 million clean, diverse and full-resolution images spread over approximately 50K synsets.
- Make it publicly available and readily accessible online.
- Extend ImageNet to include more information.
• Foster an ImageNet community and develop an online platform.
