What's SVM?
• In machine learning, support vector machines (SVMs, also called support vector networks [1]) are supervised learning models with associated learning algorithms that analyze data and recognize patterns. They are used for classification and regression analysis. SVMs are widely used on many international trade sites for product classification.
• The author of this presentation has worked for taobao.com and mfrbee.com for about 10 years.
SVMLight
• SVMLight is an implementation of the Support Vector Machine (SVM) in C.
• Download the source from: http://svmlight.joachims.org/
The site gives a detailed description of:
– What are the features of SVMLight?
– How to install it?
– How to use it?
– …
Training Step
• svm_learn [-option] train_file model_file
– train_file contains the training data;
– train_file can have any filename;
– the extension of train_file can be chosen arbitrarily by the user;
– model_file contains the model that the SVM builds from the training data.
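The training call can be scripted. The following is only a minimal sketch, assuming the binary is named svm_learn (as in the SVMLight distribution) and is on the PATH; the file names train.dat and model.dat and the helper names are hypothetical, and "-c" (SVMLight's trade-off parameter) is shown just as an example of an option.

```python
import subprocess

def svm_learn_command(train_file, model_file, options=()):
    """Build the svm_learn argument list: options first, then the two files."""
    return ["svm_learn", *options, train_file, model_file]

def train(train_file, model_file, options=()):
    """Run svm_learn; raises CalledProcessError if it exits with an error.
    Assumes the svm_learn binary is installed and on the PATH."""
    subprocess.run(svm_learn_command(train_file, model_file, options), check=True)

# Hypothetical file names; this only prints the command, it does not run it.
print(svm_learn_command("train.dat", "model.dat", ["-c", "1.0"]))
```

Calling train("train.dat", "model.dat") would then run the training step itself, provided the training file exists.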
Format of input file (training data)
• For text classification, the training data is a collection of documents;
• Each line represents a document, in the form: label feature:value feature:value …
• Each feature represents a term (word) in the document;
– The label and each of the feature:value pairs are separated by a space character
– feature:value pairs MUST be ordered by increasing feature number
• Feature value: e.g., tf-idf.
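The formatting rules above can be sketched in a few lines of Python. This is an illustration only: the function name format_example and the sample feature numbers and weights are made up, and the values stand in for tf-idf weights computed elsewhere.

```python
def format_example(label, features):
    """Return one training line: the label, then feature:value pairs
    sorted by increasing feature number and separated by single spaces."""
    pairs = sorted(features.items())
    return " ".join([str(label)] + [f"{fid}:{value:g}" for fid, value in pairs])

# Hypothetical document: keys are feature (term) numbers, values are weights.
doc = {205: 4, 101: 0.2, 304: 0.2, 209: 0.2}
line = format_example(1, doc)
print(line)
```

Sorting inside the function means the caller can collect term weights in any order and still satisfy the ordering requirement.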
Testing Step
• svm_classify test_file model_file predictions
– The format of test_file is exactly the same as that of train_file;
– The feature values in test_file need to be scaled into the same range as those in train_file;
– We use the model built from the training data to classify the test data, and compare the predictions with the original label of each test document.
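One way to keep training and test data in the same range is to compute the scaling from the training data only and reuse it for the test data. The sketch below assumes min-max scaling, which the slides do not specify; the function names and sample values are hypothetical, and features absent from a document are simply skipped rather than treated as zeros.

```python
def fit_ranges(train_docs):
    """Record the (min, max) of every feature seen in the training data."""
    ranges = {}
    for doc in train_docs:
        for fid, value in doc.items():
            lo, hi = ranges.get(fid, (value, value))
            ranges[fid] = (min(lo, value), max(hi, value))
    return ranges

def scale(doc, ranges):
    """Rescale with the training ranges (training min -> 0, training max -> 1).
    Features unseen in training, or constant there, are left unchanged."""
    scaled = {}
    for fid, value in doc.items():
        lo, hi = ranges.get(fid, (0.0, 0.0))
        scaled[fid] = (value - lo) / (hi - lo) if hi > lo else value
    return scaled

# Made-up feature values for two training documents and one test document.
train_docs = [{101: 2.0, 205: 4.0}, {101: 6.0, 205: 8.0}]
ranges = fit_ranges(train_docs)
test_doc = scale({101: 4.0, 205: 5.0}, ranges)
print(test_doc)
```

Test values that lie outside the training range end up outside [0, 1]; whether to clip them is a separate design choice not covered here.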
Example
• In test_file, we have:
1 101:0.2 205:4 209:0.2 304:0.2…
-1 202:0.1 203:0.1 208:0.1 209:0.3…
…
…
• After running svm_classify, the predictions may be:
1.045
-0.987
…
…
which means this classifier classifies these two documents correctly (the sign of each prediction matches the label),
or
1.045
0.987
…
…
which means the first document is classified correctly but the second one is classified incorrectly.
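The comparison in this example comes down to checking the sign of each prediction against the label. A minimal sketch, assuming labels of 1 and -1 and treating a prediction of exactly 0 as negative (an arbitrary choice made here); the function names are hypothetical.

```python
def predicted_label(score):
    """Turn a real-valued prediction into a class label by its sign."""
    return 1 if score > 0 else -1

def compare(labels, scores):
    """Return True for each document whose predicted sign matches its label."""
    return [predicted_label(s) == y for y, s in zip(labels, scores)]

labels = [1, -1]                          # labels of the two test lines
print(compare(labels, [1.045, -0.987]))   # both signs match the labels
print(compare(labels, [1.045, 0.987]))    # second sign does not match
```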
Confusion Matrix
– a is the number of correct predictions that an instance is negative;
– b is the number of incorrect predictions that an instance is positive;
– c is the number of incorrect predictions that an instance is negative;
– d is the number of correct predictions that an instance is positive;

                      Predicted
                      negative   positive
Actual   negative     a          b
         positive     c          d
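The four counts can be tallied directly from the actual and predicted labels. The sketch below assumes both lists use 1 for positive and -1 for negative; the function name and the sample labels are made up for illustration.

```python
def confusion_counts(actual, predicted):
    """Count a, b, c, d as defined above for labels in {-1, 1}."""
    a = b = c = d = 0
    for y, p in zip(actual, predicted):
        if y < 0 and p < 0:
            a += 1      # actual negative, predicted negative
        elif y < 0:
            b += 1      # actual negative, predicted positive
        elif p < 0:
            c += 1      # actual positive, predicted negative
        else:
            d += 1      # actual positive, predicted positive
    return a, b, c, d

# Made-up labels for six test documents.
actual = [-1, -1, -1, 1, 1, 1]
predicted = [-1, -1, 1, 1, 1, -1]
print(confusion_counts(actual, predicted))
```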
Evaluations of Performance
• Accuracy (AC) is the proportion of the total number of predictions that were correct.
AC = (a + d) / (a + b + c + d)
• Recall is the proportion of positive cases that were correctly identified.
R = d / (c + d), where c + d is the number of actual positive cases
• Precision is the proportion of the predicted positive cases that were correct.
P = d / (b + d), where b + d is the number of predicted positive cases
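The three formulas translate directly into code. This is a sketch with a hypothetical function name and made-up counts; it does not guard against empty denominators (for example, a test set with no positive cases).

```python
def evaluate(a, b, c, d):
    """Return (accuracy, recall, precision) from the confusion-matrix counts."""
    accuracy = (a + d) / (a + b + c + d)
    recall = d / (c + d)        # c + d: actual positive cases
    precision = d / (b + d)     # b + d: predicted positive cases
    return accuracy, recall, precision

# Made-up counts for illustration only.
accuracy, recall, precision = evaluate(a=6, b=2, c=1, d=3)
print(accuracy, recall, precision)
```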
Example
Actual test cases: 450 "-" and 550 "+"

                      Predicted
                      negative   positive
Actual   negative     400        50
         positive     20         530

For this classifier:
a = 400
b = 50
c = 20
d = 530
Accuracy = (400 + 530) / 1000 = 93%
Precision = d / (b + d) = 530 / 580 = 91.4%
Recall = d / (c + d) = 530 / 550 = 96.4%

Presentation title: SVMLight at an E-commerce Website

  • 1. What’s svm ? In machine learning, support vector machines (SVMs, also support vector networks[1] ) are supervised learning models with associated learning algorithms that analyze data and recognize patterns, used for classification and regression analysis . It’s widely used at many International Trade Site for product classification . ? The author of this ppt. work for taobao.com and mfrbee.com about 10 years
  • 2. SVMLight ? SVMLight is an implementation of Support Vector Machine (SVM) in C. ? Download source from : http://svmlight.joachims.org/ Detailed description about: ?What are the features of SVMLight? ?How to install it? ?How to use it? ?…
  • 3. Training Step ? svm-learn [-option] train_file model_file ?train_file contains training data; ?The filename of train_file can be any filename; ?The extension of train_file can be defined by user arbitrarily; ?model_file contains the model built based on training data by SVM;
  • 4. Format of input file (training data) ? For text classification, training data is a collection of documents; ? Each line represents a document; ? Each feature represents a term (word) in the document; – The label and each of the feature: value pairs are separated by a space character – Feature: value pairs MUST be ordered by increasing feature number ? Feature value : e.g., tf-idf;
  • 5. Testing Step ? svm-classify test_file model_file predictions ?The format of test_file is exactly the same as train_file; ?Needs to be scaled into same range; ?We use the model built based on training data to classify test data, and compare the predictions with the original label of each test document;
  • 6. Which means the first document is classified correctly but the second one is incorrectly. Example ? In test_file, we have: 1 101:0.2 205:4 209:0.2 304:0.2… -1 202:0.1 203:0.1 208:0.1 209:0.3… … … After running the svm_classify, the Predictions may be: 1.045 -0.987 … … Which means this classifier classify these two documents Correctly. 1.045 0.987 … … or
  • 7. Confusion Matrix ?a is the number of correct predictions that an instance is negative; ?b is the number of incorrect predictions that an instance is positive; ?c is the number of incorrect predictions that an instance if negative; ?d is the number of correct predictions that an instance is positive; Predicted negative positive Actual negative a b positive c d
  • 8. Evaluations of Performance ? Accuracy (AC) is the proportion of the total number of predictions that were correct. AC = (a + d) / (a + b + c + d) ? Recall is the proportion of positive cases that were correctly identified. R = d / (c + d) ? Precision is the proportion of the predicted positive cases that were correct. P = d / (b + d) ? Actual positive cases number predicted positive cases number
  • 9. Example 4 5 0 " - " 5 5 0 " + " A c t u a l T e s t C a s e s : 4 0 0 5 3 0 P r e d i c t e d : 5 0 2 0 For this classifier: a = 400 b = 50 c = 20 d = 530 Accuracy = (400 + 530) / 1000 = 93% Precision = d / (b + d) = 530 / 580 = 91.4% Recall = d / (c + d) = 530 / 550 = 96.4%
  • 10. Example 4 5 0 " - " 5 5 0 " + " A c t u a l T e s t C a s e s : 4 0 0 5 3 0 P r e d i c t e d : 5 0 2 0 For this classifier: a = 400 b = 50 c = 20 d = 530 Accuracy = (400 + 530) / 1000 = 93% Precision = d / (b + d) = 530 / 580 = 91.4% Recall = d / (c + d) = 530 / 550 = 96.4%