This document summarizes a project to develop an image recognition application that identifies Korean foods and estimates their calorie content. It used Faster R-CNN with a VGG16 backbone, trained on 40 food classes with 300 training and 100 test images per class. Initial mean average precision (mAP) was low for some classes, such as cooked rice. Further analysis found that the model was confusing similar-looking foods. To address this, a post-processing layer was added that merges overlapping predictions and keeps the most probable class, based on intersection over union (IoU) and class probability. Live demos of the web application showed that it could then accurately identify multiple foods in a single image and estimate their total calories.
8. Training Outline
• Training images: 300 per class (12,000 total)
• Test images: 100 per class (4,000 total)
• Classes: 40
• Batch size: 128
• Iterations: 140,000
• Pre-trained CNN model
○ VGG16 on ImageNet
• Hardware:
○ NVIDIA GTX 1080
• Software: Caffe
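The dataset numbers above imply 12,000 training and 4,000 test images in total. Whether 140,000 iterations amounts to many or few passes over the data depends on how many images each iteration consumes, which the slide does not state: in Caffe-based Faster R-CNN code such as py-faster-rcnn, for example, a batch size of 128 counts sampled regions of interest (RoIs) per minibatch rather than images. The Python sketch below records the outline as a config object. The class name `TrainingOutline` and the `images_per_iteration` parameter are hypothetical, and the epoch estimate holds only under that assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingOutline:
    """Training setup as listed on the slide (values copied from it)."""
    num_classes: int = 40
    train_per_class: int = 300
    test_per_class: int = 100
    iterations: int = 140_000
    batch_size: int = 128  # meaning (images vs. RoIs) is not stated on the slide

    @property
    def total_train_images(self) -> int:
        return self.num_classes * self.train_per_class

    @property
    def total_test_images(self) -> int:
        return self.num_classes * self.test_per_class

    def approx_epochs(self, images_per_iteration: int = 1) -> float:
        """Passes over the training set.

        ASSUMPTION: each iteration consumes `images_per_iteration` images
        (hypothetical parameter; the slide does not give this number).
        """
        return self.iterations * images_per_iteration / self.total_train_images


outline = TrainingOutline()
print(outline.total_train_images)         # 12000
print(round(outline.approx_epochs(), 2))  # 11.67
```

With one image per iteration the model would see each training image roughly 12 times; if more images were used per step, the figure scales linearly.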
30. Post-processing layer
2. Confusion problem
If IoU > 0.7:
• Merge the 2 boxes
• Greedily keep the class with the larger p(x)
Intersection over Union: $\mathrm{IoU}(A, B) = \frac{|A \cap B|}{|A \cup B|}$
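The merge rule above can be sketched in Python. The slide does not say how two boxes are merged; this sketch assumes the merged result keeps the coordinates of the higher-probability box, which makes the procedure equivalent to class-agnostic non-maximum suppression at an IoU threshold of 0.7. An averaged or enclosing box would be an equally plausible reading. `Detection`, `merge_confused`, and the food labels are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


@dataclass(frozen=True)
class Detection:
    box: Box
    label: str
    prob: float  # detector score p(x) for `label`


def iou(a: Box, b: Box) -> float:
    """Intersection over union |A ∩ B| / |A ∪ B| of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def merge_confused(dets: List[Detection], threshold: float = 0.7) -> List[Detection]:
    """Greedy, class-agnostic merge.

    Detections are visited from highest to lowest p(x). One that overlaps an
    already-kept detection with IoU > threshold is folded into it, so the class
    with the larger p(x) survives (ASSUMPTION: the kept box's coordinates are used).
    """
    kept: List[Detection] = []
    for det in sorted(dets, key=lambda d: d.prob, reverse=True):
        if all(iou(det.box, k.box) <= threshold for k in kept):
            kept.append(det)
    return kept


dets = [
    Detection((0, 0, 10, 10), "cooked_rice", 0.6),
    Detection((1, 0, 10, 10), "multigrain_rice", 0.9),  # IoU 0.9 with the box above
    Detection((20, 20, 30, 30), "kimchi", 0.8),
]
print([d.label for d in merge_confused(dets)])  # ['multigrain_rice', 'kimchi']
```

Because the comparison ignores class labels, two different foods predicted on the same dish collapse into one detection, which is the confusion case the slide targets.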
46. Discussion
1. Faster R-CNN’s nature:
• Learns multiple classes simultaneously
• Needs a high-quality dataset in which every class present in an image is labeled (unlabeled foods are treated as background)
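Why every food must be labeled follows from how Faster R-CNN builds its training targets: each region proposal takes the class of the ground-truth box it overlaps most, and proposals without a good match become background. The Python sketch below is a simplified version of that rule, assuming a foreground IoU threshold of 0.5 (the value commonly used in Faster R-CNN implementations); real implementations also sample a fixed number of RoIs and use a separate background IoU range, which this sketch omits. `assign_roi_labels` and the example boxes are hypothetical.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def assign_roi_labels(rois: Sequence[Box], gt_boxes: Sequence[Box],
                      gt_labels: Sequence[int], fg_thresh: float = 0.5) -> List[int]:
    """Give each proposal the class of its best-matching ground-truth box,
    or 0 (background) if no ground-truth box overlaps it by at least fg_thresh."""
    labels = []
    for roi in rois:
        best_iou, best_label = 0.0, 0
        for box, label in zip(gt_boxes, gt_labels):
            overlap = iou(roi, box)
            if overlap > best_iou:
                best_iou, best_label = overlap, label
        labels.append(best_label if best_iou >= fg_thresh else 0)
    return labels


gt_boxes = [(0, 0, 10, 10)]                 # only the rice bowl was annotated (class 1)
gt_labels = [1]
rois = [(0, 0, 10, 10), (20, 20, 30, 30)]   # second proposal covers an unlabeled side dish
print(assign_roi_labels(rois, gt_boxes, gt_labels))  # [1, 0]
```

The unlabeled side dish receives label 0, so training actively teaches the model that this food is background, which is why partially labeled images hurt a multi-class detector.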
2. Discovered Faster R-CNN’s downside:
• Even though it uses a CNN, it does not consider the visual context around each detected object
○ Architectural trade-off: each proposed region is classified from its own pooled features
• A vanilla CNN classifying the whole image is better at capturing context
• Even naïve Bayes over which foods are combined in a meal would help
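One way to read the naïve Bayes suggestion is to re-score an ambiguous detection using the other foods found in the same image. The Python sketch below is an assumption about how that could look, not something the project implemented. It estimates a presence-only likelihood P(o | c), the chance that food o appears in an image containing food c, from training-image label sets with Laplace smoothing, and multiplies it into the detector probability. The function names, the food labels, and the smoothing constant `alpha` are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def fit_cooccurrence(meals: Iterable[Set[str]]) -> Tuple[Counter, Dict[str, Counter]]:
    """Count images per food, and how often each other food shares an image with it."""
    class_counts: Counter = Counter()
    pair_counts: Dict[str, Counter] = defaultdict(Counter)
    for meal in meals:
        for c in meal:
            class_counts[c] += 1
            for o in meal - {c}:
                pair_counts[c][o] += 1
    return class_counts, pair_counts


def rescore(candidates: Dict[str, float], context: List[str],
            class_counts: Counter, pair_counts: Dict[str, Counter],
            alpha: float = 1.0) -> Dict[str, float]:
    """Naive Bayes re-scoring: score(c) ∝ p_det(c) * prod_o P(o | c), normalized over candidates."""
    log_scores = {}
    for c, p in candidates.items():
        s = math.log(max(p, 1e-12))  # guard against log(0)
        n_c = class_counts.get(c, 0)
        for o in context:
            # Laplace-smoothed probability that food o is present given food c is present
            co = pair_counts.get(c, Counter()).get(o, 0)
            s += math.log((co + alpha) / (n_c + 2 * alpha))
        log_scores[c] = s
    top = max(log_scores.values())
    weights = {c: math.exp(s - top) for c, s in log_scores.items()}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}


meals = [
    {"cooked_rice", "kimchi", "doenjang_jjigae"},
    {"cooked_rice", "kimchi"},
    {"tteok", "sikhye"},
]
counts, pairs = fit_cooccurrence(meals)
post = rescore({"cooked_rice": 0.5, "tteok": 0.5}, ["kimchi", "doenjang_jjigae"], counts, pairs)
print(round(post["cooked_rice"], 2))  # 0.77
```

A tie between two look-alike classes is broken in favor of the food that usually appears alongside the other detected dishes; the detector's own probability still dominates when it is confident.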