
IRE major project group 22 IIITH

Agenda
• What? – Introduction
• Why? – Uses
• How? – Steps
• Experiments
• Results
• Conclusion
• References

What? – Introduction
• Predicts stock market movements and approximate percentage changes without using traditional stock-market algorithms.
• Draws on the emotions, opinions and sentiments expressed in financial news articles and blogs.
• Uses multiple IRE techniques.
• Derives and experiments with statistical models.
• Exciting results.

Why? – Uses
Attempt to find out:
• whether sentiments have a direct relationship with stock market movements
• techniques for disambiguating entities extracted from free text
Real-World Uses
• Help a stock market trader or investor get an approximate sense of the market as expressed by fellow traders, journalists, bloggers, etc.
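One simple way to approach the disambiguation question is to compare the words around a mention with a short context profile for each candidate stock code. This is sketched purely as an illustration, not as the project's method. The alias table, the codes `CODE_A`/`CODE_B` and the profile words are hypothetical placeholders.

```python
# Hypothetical alias table: surface form -> {candidate stock code: context profile}.
# The codes and profile words are placeholders, not real listings.
CANDIDATES = {
    "delta": {
        "CODE_A": {"airline", "flights", "fuel"},
        "CODE_B": {"faucets", "plumbing", "fixtures"},
    },
}


def disambiguate(mention, context_tokens):
    """Return the candidate code whose profile shares the most words with the
    mention's context, or None if no candidate shares any word."""
    candidates = CANDIDATES.get(mention.lower(), {})
    context = {tok.lower() for tok in context_tokens}
    best_code, best_overlap = None, 0
    for code, profile in candidates.items():
        overlap = len(profile & context)
        if overlap > best_overlap:  # ties keep the earlier candidate
            best_code, best_overlap = code, overlap
    return best_code


print(disambiguate("Delta", ["Fuel", "costs", "hit", "airline", "margins"]))
```

A word-overlap score is the crudest option. Weighted profiles or a trained classifier would be natural refinements under the same interface.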

How?
1. Collect stock market news via Google News Alerts.
2. Parse the news articles, blogs, etc.
3. Extract entities using the entity extraction libraries from GATE.
4. Calculate multiple sentiment scores using entity position tags and several keyword sets (stock, English, contextual, etc.).
5. Mine the sentiment of each news article as a whole using the NLP libraries from Stanford.
6. Collect training datasets.
7. Create models using WEKA (10-fold cross-validation).
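The sentiment-calculation step (entity position tags plus keyword lists) can be illustrated with a minimal sketch. It assumes that "entity position" means the token positions of an entity mention and that each keyword list is a word-to-score lexicon. GATE would normally supply the entity annotations. Here a plain single-token match stands in for it, the contextual keywords are omitted, and the lexicon entries and window size are hypothetical.

```python
import re

# Hypothetical lexicons (word -> polarity score). The project's ~2156 stock
# keywords and ~2838 English keywords are not listed in the slides.
STOCK_KEYWORDS = {"upgrade": 1.0, "downgrade": -1.0, "rally": 1.0, "selloff": -1.0}
ENGLISH_KEYWORDS = {"strong": 0.5, "weak": -0.5, "growth": 0.5, "loss": -0.5}


def tokenize(text):
    """Lowercase alphabetic tokens; a stand-in for a real tokenizer."""
    return re.findall(r"[a-z]+", text.lower())


def entity_window_sentiment(text, entity, lexicon, window=5):
    """Sum the lexicon scores of tokens within `window` positions of each
    mention of `entity`. A token near several mentions is counted once per
    mention."""
    tokens = tokenize(text)
    target = entity.lower()
    score = 0.0
    for pos, tok in enumerate(tokens):
        if tok != target:
            continue
        start = max(0, pos - window)
        end = min(len(tokens), pos + window + 1)
        for j in range(start, end):
            if j != pos:
                score += lexicon.get(tokens[j], 0.0)
    return score


news = ("Analysts issued an upgrade for Apple after strong growth, "
        "while Microsoft saw a selloff.")
for company in ("Apple", "Microsoft"):
    print(company,
          entity_window_sentiment(news, company, STOCK_KEYWORDS),
          entity_window_sentiment(news, company, ENGLISH_KEYWORDS))
```

Note that Microsoft also picks up "strong growth" from the Apple clause. This shows why the window size, and correct attribution of entities, matter for per-entity scores.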

How?
Tools: Ubuntu/Linux, Java, Pentaho (ETL), GATE (entity extraction), WEKA (machine learning, predictive analysis), MySQL
Pipeline: News feed → ETL → DB → Machine learning → Predictive analysis → Stock market predictions
Training date range: 1/1/2013 to 31/12/2013
No. of news articles: ~35,000
No. of unique company/stock codes extracted from news/blog articles: ~1,400
No. of stock keywords: ~2,156
No. of English keywords: ~2,838
No. of records used for building the model: ~7,000
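The slides do not say how each training record is labelled. Here is a minimal sketch under one plausible assumption. Each (stock code, date) row of sentiment features is paired with the close-to-close percentage change to the next trading day. That change is the numeric target for regression, and its sign gives an UP/DOWN label for classification. The field names, the next-day rule and the treatment of a zero change are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Record:
    stock_code: str
    date: str
    stock_sentiment: float    # assumed feature: stock-keyword score
    english_sentiment: float  # assumed feature: English-keyword score
    pct_change: float         # regression target (assumed: next-day change, in %)
    label: str                # classification target: "UP" or "DOWN"


def build_records(features, closes):
    """Join sentiment features with next-day price movement.

    features: {(stock_code, date): (stock_sentiment, english_sentiment)}
    closes:   {stock_code: [(date, close), ...]} sorted by date
    """
    records = []
    for code, series in closes.items():
        # Pair each trading day with the following one.
        for (date, close), (_next_date, next_close) in zip(series, series[1:]):
            if (code, date) not in features:
                continue  # no news for this stock on this day
            stock_s, english_s = features[(code, date)]
            change = (next_close - close) * 100.0 / close
            label = "UP" if change > 0 else "DOWN"  # zero change -> DOWN (assumption)
            records.append(Record(code, date, stock_s, english_s, change, label))
    return records
```

The last day in each price series has no following close, so it yields no record.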

Experiments (Classification)

Naive Bayes
=== Stratified cross-validation ===
=== Summary ===
Correctly Classified Instances      4535    59.6397 %
Incorrectly Classified Instances    3069    40.3603 %
Kappa statistic                     0.0271
Mean absolute error                 0.4853
Root mean squared error             0.499
Relative absolute error             103.6642 %
Root relative squared error         103.1445 %
Total Number of Instances           7604
=== Detailed Accuracy By Class ===
               TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
               0.846    0.822    0.633      0.846   0.724      0.531     DOWN
               0.178    0.154    0.408      0.178   0.248      0.531     UP
Weighted Avg.  0.596    0.572    0.549      0.596   0.546      0.531

Random Forest
=== Stratified cross-validation ===
=== Summary ===
Correctly Classified Instances      4623    60.7969 %
Incorrectly Classified Instances    2981    39.2031 %
Kappa statistic                     0.1212
=== Detailed Accuracy By Class ===
               TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
               0.757    0.641    0.664      0.757   0.707      0.578     DOWN
               0.359    0.243    0.468      0.359   0.406      0.578     UP
Weighted Avg.  0.608    0.492    0.591      0.608   0.595      0.578
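The per-class columns in the WEKA output follow standard definitions. Each class is treated in turn as the positive class of a 2x2 confusion matrix. The sketch below recomputes them for a small hypothetical matrix; the counts are illustrative, not the project's. Cohen's kappa compares accuracy with the agreement expected from the actual and predicted class frequencies alone. A kappa close to 0, such as 0.0271, therefore indicates agreement barely above chance even at about 60% accuracy.

```python
def binary_metrics(tp, fn, fp, tn):
    """Per-class metrics as reported by WEKA, treating one class as positive.

    tp: positives predicted positive    fn: positives predicted negative
    fp: negatives predicted positive    tn: negatives predicted negative
    Assumes no denominator is zero.
    """
    n = tp + fn + fp + tn
    tp_rate = tp / (tp + fn)  # also the recall
    fp_rate = fp / (fp + tn)
    precision = tp / (tp + fp)
    f_measure = 2 * precision * tp_rate / (precision + tp_rate)
    accuracy = (tp + tn) / n

    # Cohen's kappa: observed agreement vs. agreement expected by chance
    # from the actual and predicted class frequencies.
    p_actual = (tp + fn) / n
    p_pred = (tp + fp) / n
    p_chance = p_actual * p_pred + (1 - p_actual) * (1 - p_pred)
    kappa = (accuracy - p_chance) / (1 - p_chance)

    return {
        "tp_rate": tp_rate,
        "fp_rate": fp_rate,
        "precision": precision,
        "recall": tp_rate,
        "f_measure": f_measure,
        "accuracy": accuracy,
        "kappa": kappa,
    }


# Hypothetical counts: 10 actual UP (8 caught), 10 actual DOWN (6 caught).
m = binary_metrics(tp=8, fn=2, fp=4, tn=6)
print({k: round(v, 3) for k, v in m.items()})
```

Swapping which class is "positive" changes the per-class rows but not accuracy or kappa. WEKA's "Weighted Avg." row averages the per-class rows, weighting each by the number of instances of that class.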

Experiments (Regression)
SMOreg
=== Cross-validation ===
=== Summary ===
Correlation coefficient   0.0666
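For numeric targets, WEKA reports the correlation coefficient between predicted and actual values. A value of 0.0666 therefore means the SMOreg predictions of percentage change moved almost independently of the actual changes. Below is a minimal sketch of the Pearson coefficient; the sample values are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences.
    Assumes neither sequence is constant."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Hypothetical actual vs. predicted daily % changes.
actual = [1.2, -0.8, 0.5, -1.5, 2.0]
predicted = [0.3, 0.1, -0.2, 0.4, 0.2]
print(round(pearson(actual, predicted), 4))
```

Python 3.10 and later also provide `statistics.correlation`, which computes the same coefficient.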

Results
Classification of Stock Movements (charts: Naive Bayes, Random Forest)

Results
Prediction of Stock Percentage Changes with SMOreg (charts: Apple stock index, Microsoft stock index)

Conclusions
The experiment largely succeeded in classifying stock movements over a period of time with both classification models.
The classification accuracy achieved was about 80%, which is significant.
However, many challenges remain in predicting percentage changes from stock news alone. This may be due to the following factors:
• No stochastic parameters are used in the model
• Stock code entities are not disambiguated
• Stock companies are not classified into groups
• The prediction is not real-time ("as and when" news is published), which may introduce a lot of noise
• Predicting stock indices involves many uncertainties, some of which sentiment alone cannot address

References
GATE: https://gate.ac.uk/
Stanford NLP: http://www-nlp.stanford.edu/
WEKA wiki: http://weka.wikispaces.com/
Pentaho: http://www.pentaho.com/
