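AI Image Recognition: Case Studies
The Much-Rumored 〇〇 Classifier
Atsushi “Bird” Tomita (富田篤)
Facebook @bird.tomita
Twitter @bird_tomita
Instagram @bird_tomita
IBM Cloud Community
Study session celebrating the launch of the Nagoya Women's Chapter!!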

So, what is image recognition anyway?

Everyday image recognition
QR codes, tagging, AR, photo apps
Image recognition
Image analysis, object detection, face recognition

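Everyday image recognition
Scene classification / tagging
Robot vacuums, receipt apps
OCR, shape recognition, spatial recognition
Image recognition
Scene analysis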

Watson has an image recognition
service, too

https://visual-recognition-demo.mybluemix.net
Watson Visual Recognition

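What it can do
• General image classification
• Face location, plus age and gender
• Food detection
• Text recognition (closed beta, English only)
• Creating custom classifiers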

I tried it out

But of course,
I want to build
my own image recognition!

Before Watson:
what does it take to train a classifier
with the ever-popular TensorFlow?

Handwritten digit recognition sample
# Targets the TensorFlow 1.x graph API (placeholders and sessions).
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

def weight_variable(shape):
    # Small random initial weights
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)

def bias_variable(shape):
    # Slightly positive bias so the ReLU units start out active
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)

def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1,1,1,1], padding='SAME')

def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1,2,2,1], strides=[1,2,2,1], padding='SAME')

sess = tf.InteractiveSession()

# Input: flattened 28x28 grayscale images
x = tf.placeholder("float", shape=[None, 784])
x_image = tf.reshape(x, [-1,28,28,1])

# First convolution + pooling layer (28x28 -> 14x14)
W_conv1 = weight_variable([5,5,1,32])
b_conv1 = bias_variable([32])
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)

# Second convolution + pooling layer (14x14 -> 7x7)
W_conv2 = weight_variable([5,5,32,64])
b_conv2 = bias_variable([64])
h_conv2 = tf.nn.relu(conv2d(h_pool1,W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)

# Fully connected layer
W_fc1 = weight_variable([7*7*64,1024])
b_fc1 = bias_variable([1024])
h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)

# Dropout (keep_prob is fed at run time)
keep_prob = tf.placeholder("float")
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)

# Output layer: probabilities for the 10 digit classes
W_fc2 = weight_variable([1024,10])
b_fc2 = bias_variable([10])
y_conv = tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)

# Loss, optimizer, and accuracy
y_ = tf.placeholder("float", shape=[None, 10])
cross_entropy = -tf.reduce_sum(y_*tf.log(y_conv))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_conv,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
# global_variables_initializer replaces the deprecated initialize_all_variables
sess.run(tf.global_variables_initializer())

mnist = input_data.read_data_sets('MNIST_data', one_hot=True)
for i in range(20000):
    batch = mnist.train.next_batch(50)
    if i % 100 == 0:
        # Report accuracy on the current batch with dropout disabled
        feed_dict = {x:batch[0],y_:batch[1],keep_prob:1.0}
        train_accuracy = accuracy.eval(feed_dict=feed_dict)
        print("step %d, training accuracy %g" % (i, train_accuracy))
    train_step.run(feed_dict={x:batch[0],y_:batch[1],keep_prob:0.5})

# Final evaluation on the test set
feed_dict={x:mnist.test.images, y_: mnist.test.labels,keep_prob:1.0}
print("test accuracy %g" % accuracy.eval(feed_dict=feed_dict))
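As a rough illustration of what the last layers of the sample compute, here is a minimal pure-Python sketch of softmax followed by the summed cross-entropy loss. The helper names `softmax` and `cross_entropy` are introduced here for illustration only; they are not TensorFlow functions.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    # Subtracting the max keeps exp() from overflowing; the result is unchanged.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(one_hot, probs):
    """-sum(y * log(p)), the same form as the loss in the sample above."""
    # Terms with y == 0 contribute nothing, so they are skipped.
    return -sum(y * math.log(p) for y, p in zip(one_hot, probs) if y > 0)

probs = softmax([2.0, 1.0, 0.1])          # scores for three hypothetical classes
loss = cross_entropy([1, 0, 0], probs)    # the true class is the first one
print(probs, loss)
```

The loss shrinks toward 0 as the probability assigned to the correct class approaches 1, which is what the optimizer in the sample is pushing toward.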

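With TensorFlow and similar frameworks
• Building an image recognition network takes skilled engineers
• Preparing the labeled training data is hard work
• A large amount of training data is needed (1,000 or more images per class)
• A fast GPU machine is needed
• Above all, the learning cost is high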

Let's make use of
services like Watson

ps://www.ibm.com/watson/developercloud/visual-recognition/api/v
Watson Visual Recognition API reference

Curl
curl -X POST -F "beagle_positive_examples=@beagle.zip" -F "g

Python
import json
from os.path import join, dirname
from os import environ
from watson_developer_cloud import VisualRecognitionV3
visual_recognition = VisualRecognitionV3('2016-05-20', api_key='{api_key}')
with open(join(dirname(__file__), '../resources/trucks.zip'), 'rb') as trucks, \
        open(join(dirname(__file__), '../resources/cars.zip'), 'rb') as cars:
    print(json.dumps(visual_recognition.create_classifier('CarsvsTrucks', trucks_positive_examples

Node.js
var watson = require('watson-developer-cloud');
var fs = require('fs');
var visual_recognition = watson.visual_recognition({
  api_key: '{api_key}',
  version: 'v3',
  version_date: '2016-05-20'
});
var params = {
  name: 'fruit',
  apple_positive_examples: fs.createReadStream('./apples.zip'),
  banana_positive_examples: fs.createReadStream('./yellow.zip'),
  orange_positive_examples: fs.createReadStream('./pos_ex.zip'),
  negative_examples: fs.createReadStream('./vegetables.zip')
};
visual_recognition.createClassifier(params,
  function(err, response) {
    if (err)
      console.log(err);
    else
      console.log(JSON.stringify(response, null, 2));
  });

Don't want to write code?
Too much trouble?

There's a GUI tool
Tool

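Getting started

Log in to Bluemix and select Catalog

Category -> Watson

The defaults are basically fine

Choose a pricing plan and create the service

A free plan is available
(up to 250 events per day)

Copy the api_key from
Service Credentials

Log in with the api_key you just copied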

How to create a custom classifier

Prepare images
• Visual Recognition can create a classifier from "at least 10 images per class", with "50 or more recommended".

Once the images are sorted into folders, drag the folders onto "Archive Utility"
to get one compressed file per folder
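If you would rather script the zipping step than drag folders around, the following is a minimal sketch using only the Python standard library. The function name `zip_class_folders` and the folder layout (one subfolder per class under a common root) are assumptions made for this example.

```python
import shutil
from pathlib import Path

def zip_class_folders(root, out_dir):
    """Create one .zip per immediate subfolder of `root`; return the zip paths."""
    root, out_dir = Path(root), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    archives = []
    for folder in sorted(p for p in root.iterdir() if p.is_dir()):
        # make_archive appends ".zip" to the base name on its own.
        archive = shutil.make_archive(str(out_dir / folder.name), "zip",
                                      root_dir=str(folder))
        archives.append(Path(archive))
    return archives
```

For example, `zip_class_folders("images", "zips")` would turn `images/akb/` and `images/nogizaka/` into `zips/akb.zip` and `zips/nogizaka.zip`. Keep the output folder outside the image root so it is not zipped as if it were a class.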

Choose a name for the classifier (e.g. dogs)
Choose a name for the class
(e.g. Chiwawa)
Drag in the zip file
Add more classes
Create!

Wait about 1 to 5 minutes (around 30 minutes if there are many images);
once the status changes from training to ready, training is complete
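When driving this from code instead of the GUI, the waiting step can be sketched as a simple polling loop. This is only a sketch under stated assumptions: `get_status` is a hypothetical callable you supply (for instance, one that fetches the classifier details and returns its status string), and the status values "ready" and "failed" are assumed here, not confirmed by the slides.

```python
import time

def wait_until_ready(get_status, interval_sec=30, timeout_sec=3600, sleep=time.sleep):
    """Poll get_status() until it returns "ready"; return the seconds waited."""
    waited = 0
    while True:
        status = get_status()
        if status == "ready":
            return waited
        if status == "failed":
            raise RuntimeError("classifier training failed")
        if waited >= timeout_sec:
            raise TimeoutError("classifier still %r after %d s" % (status, waited))
        sleep(interval_sec)      # injectable so the loop can be tested without waiting
        waited += interval_sec
```

A 30-second interval is an arbitrary choice that fits the 1 to 30 minute training times mentioned above.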

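A: "Don't you think B-ko looks like she could be an idol?"
Me: (...How, exactly?
AKB, Nogizaka, whatever,
I can't tell any of them apart anyway)

So,
I built an AI that judges
which idol group
you resemble, or would fit in with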

m.com/blogs/bluemix/2016/10/watson-visual-recognition-training-
Things to watch out for before deciding on your app's direction
Examples of difficult use cases
While Watson Visual Recognition is highly flexible, there have been a number of recurring use case that we’ve seen the API either struggle on or require significant pre/post-work from the user.
• Face Recognition: Visual Recognition is capable of face detection (detecting the presence of faces) not face recognition (identifying individuals).
• Detecting details: Occasionally, users want to classify an image based on a small section of an image or details scattered within an image. Because Watson analyzes the entire image when training, it may struggle on classifications that depend on small details. Some users have adopted the strategy of breaking the image into pieces or zooming into relevant parts of an image. See this hail classification use case as an example (video).
• Emotion: Emotion classification (whether facial emotion or contextual emotion) is not a feature currently supported by Visual Recognition. Some users have attempted to do this through custom classifiers, but this is an edge case and we cannot estimate the accuracy of this type of training.
On faces: Visual Recognition is
well suited to detecting that a <human face> is present,
but not to <identifying an individual>
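The "breaking the image into pieces" strategy mentioned in the quote can be sketched as computing crop boxes that cover the whole image. The function `tile_boxes` and its parameters are made up for this illustration. The boxes use the (left, top, right, bottom) convention, so they could be handed to a cropping routine such as Pillow's `Image.crop`.

```python
def tile_boxes(width, height, tile, overlap=0):
    """Return (left, top, right, bottom) boxes that cover a width x height image."""
    if not 0 <= overlap < tile:
        raise ValueError("overlap must be in [0, tile)")
    step = tile - overlap
    boxes = []
    for top in range(0, max(height - overlap, 1), step):
        for left in range(0, max(width - overlap, 1), step):
            # Tiles on the right and bottom edges are clipped to the image.
            boxes.append((left, top, min(left + tile, width), min(top + tile, height)))
    return boxes

boxes = tile_boxes(640, 480, tile=320)
print(boxes)
```

Each tile would then be classified separately, at the cost of one API call per tile.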

Hmm, so identifying individuals is hard...
Oh well! Keep it rough.
Rough is fine.
At the group level it should work!
Like "AKB-ish" or "Momoclo-ish"

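Guidelines for
good training
• https://www.ibm.com/watson/developercloud/doc/visual-recognition/customizing.html#guidelines-for-good-training
• At least 50 images per zip file are recommended.
• More images improve accuracy. Around 5,000 is good; beyond that, accuracy does not improve dramatically.
• Uploading a total of 150 to 200 images per .zip file gives the best balance between training time and classifier accuracy. More than 200 images improve accuracy further but lengthen training.
• Include roughly the same number of images in each zip file. Uneven image counts can lower the quality of the trained classifier.
• Classifier quality depends on image quality, so include professional material as well as smartphone photos.
• Keep image width and height to 320 px or less. High resolution is not necessary.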
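Before uploading, the count-related guidelines above can be checked with a few lines of Python. This is a sketch only: `check_training_set` is a name invented for this example, the thresholds 10 and 50 come from the slides, and the `max_ratio=2.0` cutoff for "uneven" is an arbitrary assumption.

```python
def check_training_set(counts, minimum=10, recommended=50, max_ratio=2.0):
    """Return a list of warnings for a {class_name: image_count} mapping."""
    warnings = []
    for name, n in sorted(counts.items()):
        if n < minimum:
            warnings.append("%s: %d images, below the minimum of %d" % (name, n, minimum))
        elif n < recommended:
            warnings.append("%s: %d images, below the recommended %d" % (name, n, recommended))
    # Flag strongly unbalanced classes (the ratio threshold is an assumption).
    if counts and min(counts.values()) > 0:
        if max(counts.values()) / min(counts.values()) > max_ratio:
            warnings.append("class sizes are uneven; aim for roughly equal counts")
    return warnings

print(check_training_set({"akb": 180, "momoclo": 30}))
```

An empty list means the set passes these particular checks; it says nothing about image quality.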

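What to do
• Collect images for each group. (I grabbed them in bulk with a certain company M's search engine API. Head-and-shoulders shots only.)
• Download them into one folder per group and shrink the image size. (I scripted everything up to this point in Python, but doing it by hand works too.)
• Create a classifier and its classes in the Visual Recognition Tool, and add the ZIP of each folder.
• Wait a while.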
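The author's own download-and-shrink script is not shown in the slides. As a stand-in for its resizing step, here is a minimal sketch of the arithmetic for fitting an image within 320 px while keeping its aspect ratio. The function name `fit_within` is hypothetical; the resulting size could then be passed to whatever image library does the actual resizing.

```python
def fit_within(width, height, max_side=320):
    """Scale (width, height) down to fit a max_side square, keeping aspect ratio."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height          # already small enough; never upscale
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))

print(fit_within(640, 480))
```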

While I'm at it, let's turn it into
an app with Node-RED

Sushi classifier AI
(tuna only)

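With Watson Visual Recognition
• No need to build an image recognition network. You don't have to be a machine learning engineer.
• No need to preprocess the labeled training data
• A small amount of training data is enough (from a minimum of 20 images)
• No fast GPU machine is needed
• Above all, you can start right away and the learning cost is almost zero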

Image recognition you can start right now
What would you build?

Thank you very much

Watson visual recognition_tool_bluemix Women's Group Nagoya study session
