IRE major project group 22 IIITH
Agenda
What? - Introduction
Why? - Uses
How? - Steps
Experiments
Results
Conclusion
References
What? - Introduction
Prediction of stock market movements and approximate percentage changes without the use of traditional stock algorithms.
Using the cauldron of emotions, thought processes and sentiments expressed in news articles and blogs related to financial news.
Multiple IRE techniques used.
Statistical models derived and experimented with.
Exciting results
Why? - Uses
Attempt to find out
 whether sentiments have a direct relationship with stock market movements
 techniques for disambiguating entities extracted from free text
Real-world uses
 Guide a stock market trader/investor in gauging the market approximately, as expressed by co-traders, journalists, bloggers etc.
How?
Google News Alerts (for stock market news).
Parse news articles / blogs etc.
Extract entities using entity extraction libraries from GATE.
Calculate multiple sentiments using entity position tags and various keywords such as Stock, English and Contextual etc.
Mine sentiment of the news article as a whole using NLP libraries from Stanford.
Collect training datasets.
Create models using WEKA (10-fold cross-validation).
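The keyword-based sentiment step above could be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the keyword sets, the window size, the company name "Acme" and the function names are all hypothetical, and the entity position is assumed to be available as a token index (in the project, entities come from GATE).

```python
import re

# Hypothetical keyword sets; the project used far larger stock and English keyword lists.
POSITIVE = {"gain", "surge", "beat", "upgrade", "profit"}
NEGATIVE = {"loss", "fall", "miss", "downgrade", "lawsuit"}


def tokenize(text):
    """Split text into lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def entity_sentiment(tokens, entity_index, window=5):
    """Score keywords within `window` tokens of the entity mention.

    Returns (positive - negative) / matched keywords, or 0.0 when no
    keyword falls inside the window.
    """
    lo = max(0, entity_index - window)
    hi = min(len(tokens), entity_index + window + 1)
    pos = sum(1 for t in tokens[lo:hi] if t in POSITIVE)
    neg = sum(1 for t in tokens[lo:hi] if t in NEGATIVE)
    matched = pos + neg
    return (pos - neg) / matched if matched else 0.0


# "Acme" is a made-up company used only for illustration.
tokens = tokenize("Analysts upgrade Acme after profit surge, despite lawsuit")
acme_at = tokens.index("acme")
score = entity_sentiment(tokens, acme_at)
```

Scoring only the tokens near the entity mention is one simple way to use the entity position; a score per keyword family (stock, English, contextual) would give several features per article.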
How?
Stack: Ubuntu/Linux, Pentaho, Java, WEKA, GATE, MySQL
Functions: ETL, machine learning, predictive analysis, entity extraction
Pipeline: news feed -> ETL -> DB -> machine learning -> predictive analysis -> stock market predictions
Training date range: 1/1/2013 to 31/12/2013
No. of news articles: ~35000
No. of unique company/stock codes extracted from news/blog articles: ~1400
No. of stock keywords: ~2156
No. of English keywords: ~2838
No. of records used for building model: ~7000
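The model-building records are evaluated with 10-fold cross-validation, which WEKA reports as stratified, meaning each fold keeps roughly the overall UP/DOWN ratio. A minimal pure-Python sketch of that idea follows; `stratified_folds` is a hypothetical helper written for illustration, not WEKA's implementation, and the labels are toy data.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Assign each record index to one of k folds, class by class.

    Records of each class are shuffled and dealt round-robin, so every
    fold receives a near-equal share of each class.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


# Toy labels with a 60/40 DOWN/UP split (illustrative, not the project's data).
labels = ["DOWN"] * 60 + ["UP"] * 40
folds = stratified_folds(labels, k=10)

# Each fold serves once as the test set; the other nine form the training set.
splits = []
for test_fold in folds:
    train = [i for fold in folds if fold is not test_fold for i in fold]
    splits.append((train, test_fold))
```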
Experiments (Classification)

NAIVE BAYES
=== Stratified cross-validation ===
=== Summary ===
Correctly Classified Instances        4535        59.6397 %
Incorrectly Classified Instances      3069        40.3603 %
Kappa statistic                          0.0271
Mean absolute error                      0.4853
Root mean squared error                  0.499
Relative absolute error                103.6642 %
Root relative squared error            103.1445 %
Total Number of Instances             7604
=== Detailed Accuracy By Class ===
               TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
                 0.846    0.822      0.633   0.846      0.724     0.531  DOWN
                 0.178    0.154      0.408   0.178      0.248     0.531  UP
Weighted Avg.    0.596    0.572      0.549   0.596      0.546     0.531

RANDOM FOREST
=== Stratified cross-validation ===
=== Summary ===
Correctly Classified Instances        4623        60.7969 %
Incorrectly Classified Instances      2981        39.2031 %
Kappa statistic                          0.1212
Mean absolute error                      0.4342
Root mean squared error                  0.5242
Relative absolute error                 92.753  %
Root relative squared error            108.3439 %
Total Number of Instances             7604
=== Detailed Accuracy By Class ===
               TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
                 0.757    0.641      0.664   0.757      0.707     0.578  DOWN
                 0.359    0.243      0.468   0.359      0.406     0.578  UP
Weighted Avg.    0.608    0.492      0.591   0.608      0.595     0.578
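The kappa statistic in the WEKA summaries compares the observed accuracy with the agreement expected by chance given the class distribution. A small sketch of the computation follows. The slides do not include a confusion matrix, so the counts below are a reconstruction chosen to be consistent with the rates reported for the first run; they are an assumption, not the project's actual output.

```python
def kappa(matrix):
    """Cohen's kappa for a square confusion matrix (rows = actual, columns = predicted)."""
    size = len(matrix)
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(size)) / total
    # Chance agreement: product of actual and predicted marginals per class.
    expected = sum(
        sum(matrix[i]) * sum(row[i] for row in matrix)
        for i in range(size)
    ) / (total * total)
    return (observed - expected) / (1 - expected)


# Reconstructed counts (assumption): rows are actual DOWN, UP; columns are predicted DOWN, UP.
first_run = [[4029, 734],
             [2335, 506]]

total = sum(sum(row) for row in first_run)
accuracy = (first_run[0][0] + first_run[1][1]) / total
majority_baseline = max(sum(row) for row in first_run) / total
k = kappa(first_run)
```

With these reconstructed counts, kappa comes out close to the reported 0.0271, and always predicting the larger class would score about 62.6%. That would explain why kappa stays near zero despite an accuracy of roughly 60%.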
Experiments (Regression)
SMOreg
=== Cross-validation ===
=== Summary ===
Correlation coefficient                  0.0666
Mean absolute error                      2.9873
Root mean squared error                  6.4104
Relative absolute error                 99.4892 %
Root relative squared error             99.8253 %
Total Number of Instances             7604
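The two relative errors in the SMOreg summary are measured against a simple baseline that always predicts the mean, so values near 100% suggest the model is barely better than that baseline. A minimal sketch of both measures follows; for simplicity it uses the mean of the evaluated values as the baseline, and the numbers are toy values, not project data.

```python
import math


def relative_errors(actual, predicted):
    """Return (relative absolute error, root relative squared error) in percent."""
    mean = sum(actual) / len(actual)
    abs_err = sum(abs(p - a) for a, p in zip(actual, predicted))
    abs_base = sum(abs(mean - a) for a in actual)
    sq_err = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    sq_base = sum((mean - a) ** 2 for a in actual)
    return 100 * abs_err / abs_base, 100 * math.sqrt(sq_err / sq_base)


# Toy daily percentage changes (illustrative only); their mean is 0.0.
actual = [1.0, -2.0, 3.0, -2.0]
mean_only = [0.0, 0.0, 0.0, 0.0]  # a "model" that always predicts the mean

rae, rrse = relative_errors(actual, mean_only)   # both are 100.0 for the baseline itself
perfect = relative_errors(actual, actual)        # (0.0, 0.0) for exact predictions
```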
Results
Classification of Stock Movements
[Figure panels: Naive Bayes, Random Forest]
Results
Prediction of Stock Percentages (SMOreg)
[Figure panels: Apple stock index, Microsoft stock index]
Conclusions
The experiment was successful to a large extent in classifying stock movements over a period of time with both classification models.
The accuracy achieved for classification was about 80%, which is significant.
However, many challenges remain in predicting percentage changes from stock news alone. This may be due to the following factors:
 No stochastic parameters were used in the model
 Stock code entities were not disambiguated
 Stock companies were not classified into groups
 The prediction is not made in real time as news is published, which may introduce a lot of noise
 Predicting stock indices involves many uncertainties, some of which cannot be addressed by sentiment alone
References
https://gate.ac.uk/
http://www-nlp.stanford.edu/
http://weka.wikispaces.com/
http://www.pentaho.com/
