Over the years there have been many attempts to predict stock market movements using various techniques and hundreds of parameters. Techniques in use include the Exponential Moving Average and the Head & Shoulders pattern, and Artificial Neural Networks and Genetic Algorithms are also used heavily. Many analysts also rely on more traditional measures such as the P/E ratio. All of these techniques draw on stock prices, traded volumes, dividends paid and similar data. No single solution has been perfected, however, and an ensemble of algorithms is generally used. Our aim was instead to show how market and news/blog sentiment can be harnessed to predict stock movements without these traditional techniques.
IRE major project group 22 IIITH
2. Agenda
What? Introduction
Why? Uses
How? Steps
Experiments
Results
Conclusion
References
3. What? - Introduction
Prediction of stock market movements and approximate percentage changes without the use of traditional stock algorithms.
Using the cauldron of emotions, thought processes and sentiments expressed in news articles and blogs related to financial news.
Multiple IRE techniques used.
Statistical models derived and experimented with.
Exciting results.
4. Why? - Uses
Attempt to find out:
whether sentiments have a direct relationship with stock market movements
techniques for disambiguation of entities extracted from free text
Real-world uses:
Guide a stock market trader/investor to gauge the market approximately, as expressed by co-traders, journalists, bloggers etc.
5. How?
Google News Alerts (for stock market news).
Parse news articles / blogs etc.
Extract entities using entity extraction libraries from GATE.
Calculate multiple sentiments using entity position tags and various keywords such as Stock, English and Contextual etc.
Mine sentiment of the news article as a whole using NLP libraries from Stanford.
Collect training datasets.
Create models using WEKA (10-fold cross-validation).
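The keyword and entity-position step above might be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's implementation: the two small keyword sets, the `window` parameter, the inverse-distance weighting and the function names are all hypothetical, and the project itself used GATE and much larger Stock/English keyword lists.

```python
import re

# Hypothetical keyword lists, for illustration only. The project's own
# Stock and English keyword lists are far larger and are not reproduced here.
POSITIVE = {"gain", "surge", "profit", "upgrade", "rally"}
NEGATIVE = {"loss", "fall", "plunge", "downgrade", "slump"}


def tokenize(sentence):
    """Lowercase a sentence and split it into simple word tokens."""
    return re.findall(r"[a-z0-9&]+", sentence.lower())


def entity_sentiment(text, entity, window=5):
    """Score an entity by the sentiment keywords found near its mentions.

    Only keywords in the same sentence and within `window` tokens of a
    mention count, and nearer keywords weigh more (1 / token distance).
    This weighting is an assumption, not the project's formula.
    """
    target = entity.lower()
    score = 0.0
    for sentence in re.split(r"[.!?]+", text):
        tokens = tokenize(sentence)
        for i, token in enumerate(tokens):
            if token != target:
                continue
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j == i:
                    continue
                weight = 1.0 / abs(j - i)
                if tokens[j] in POSITIVE:
                    score += weight
                elif tokens[j] in NEGATIVE:
                    score -= weight
    return score


if __name__ == "__main__":
    article = "Infosys shares surge on profit. Wipro shares plunge on heavy loss."
    for company in ("Infosys", "Wipro"):
        print(company, entity_sentiment(article, company))
```

Restricting the window to one sentence keeps a positive keyword about one company from leaking into the score of another company mentioned in the next sentence.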
6. How?
Stack: Ubuntu/Linux, Pentaho, Java, WEKA, GATE, MySQL
Functions: ETL, machine learning, predictive analysis, entity extraction
Flow: news feed -> ETL -> DB -> machine learning -> predictive analysis -> stock market predictions
Training date range: 1/1/2013 to 31/12/2013
No. of news articles: ~35000
No. of unique company/stock codes extracted from news/blog articles: ~1400
No. of Stock keywords: ~2156
No. of English keywords: ~2838
No. of records used for building model: ~7000
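The models are evaluated with stratified 10-fold cross-validation in WEKA. As a rough illustration of what stratification means, the sketch below splits labelled records into folds that keep the UP/DOWN proportions of the full set. The function names are hypothetical and WEKA's own fold assignment differs in detail.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Return k lists of record indices with class proportions close to
    those of the full data set."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal each class round-robin across the folds, continuing where the
        # previous class stopped so that fold sizes stay balanced.
        for position, index in enumerate(indices):
            folds[(offset + position) % k].append(index)
        offset += len(indices)
    return folds


def train_test_splits(labels, k=10, seed=0):
    """Yield (train_indices, test_indices), using each fold once as the test set."""
    folds = stratified_folds(labels, k, seed)
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]


if __name__ == "__main__":
    toy_labels = ["DOWN"] * 60 + ["UP"] * 40  # illustrative data only
    for train, test in train_test_splits(toy_labels):
        print(len(train), len(test), sum(toy_labels[i] == "UP" for i in test))
```

Each record is used for testing exactly once, and the reported figures are aggregated over all ten test folds.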
7. Experiments (Classification)
Classifiers: Naive Bayes, Random Forest

=== Stratified cross-validation ===
=== Summary ===
Correctly Classified Instances      4535     59.6397 %
Incorrectly Classified Instances    3069     40.3603 %
Kappa statistic                     0.0271
Mean absolute error                 0.4853
Root mean squared error             0.499
Relative absolute error             103.6642 %
Root relative squared error         103.1445 %
Total Number of Instances           7604
=== Detailed Accuracy By Class ===
               TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
               0.846    0.822    0.633      0.846   0.724      0.531     DOWN
               0.178    0.154    0.408      0.178   0.248      0.531     UP
Weighted Avg.  0.596    0.572    0.549      0.596   0.546      0.531

=== Stratified cross-validation ===
=== Summary ===
Correctly Classified Instances      4623     60.7969 %
Incorrectly Classified Instances    2981     39.2031 %
Kappa statistic                     0.1212
Mean absolute error                 0.4342
Root mean squared error             0.5242
Relative absolute error             92.753 %
Root relative squared error         108.3439 %
Total Number of Instances           7604
=== Detailed Accuracy By Class ===
               TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
               0.757    0.641    0.664      0.757   0.707      0.578     DOWN
               0.359    0.243    0.468      0.359   0.406      0.578     UP
Weighted Avg.  0.608    0.492    0.591      0.608   0.595      0.578
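When reading these summaries, it helps to recall that the Kappa statistic measures agreement beyond what chance alone would give, so it can stay close to zero even when accuracy is around 60% on imbalanced classes. The sketch below computes accuracy and Cohen's kappa from a confusion matrix. The counts in the usage example are hypothetical and are not taken from the project's results.

```python
def accuracy_and_kappa(matrix):
    """Return (accuracy, Cohen's kappa) for a square confusion matrix.

    matrix[i][j] is the number of instances whose actual class is i and
    whose predicted class is j.
    """
    size = len(matrix)
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(size)) / total
    # Agreement expected by chance, from the row (actual) and column
    # (predicted) marginal proportions.
    expected = sum(
        (sum(matrix[i]) / total) * (sum(row[i] for row in matrix) / total)
        for i in range(size)
    )
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


if __name__ == "__main__":
    # Hypothetical counts: rows are actual DOWN/UP, columns are predicted DOWN/UP.
    hypothetical = [[500, 100],
                    [300, 100]]
    accuracy, kappa = accuracy_and_kappa(hypothetical)
    print(f"accuracy={accuracy:.3f} kappa={kappa:.3f}")
```

In this made-up case the classifier is right 60% of the time, yet kappa is only about 0.09 because predicting DOWN for most instances already agrees with the majority class by chance.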
8. Experiments (Regression)
SMOreg
=== Cross-validation ===
=== Summary ===
Correlation coefficient             0.0666
Mean absolute error                 2.9873
Root mean squared error             6.4104
Relative absolute error             99.4892 %
Root relative squared error         99.8253 %
Total Number of Instances           7604
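The relative errors in this summary compare the model against a baseline that always predicts the mean, so values near 100% indicate errors about as large as that baseline's. A minimal sketch of these metrics follows. It assumes the baseline is the mean of the actual values, which may differ slightly from how WEKA computes it, and the function name and toy data are hypothetical.

```python
import math


def regression_summary(actual, predicted):
    """Compute correlation, MAE, RMSE and the two relative errors (in %)."""
    n = len(actual)
    mean_actual = sum(actual) / n
    mean_predicted = sum(predicted) / n

    abs_error = sum(abs(p - a) for a, p in zip(actual, predicted))
    sq_error = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    # Errors of the baseline that always predicts the mean of the actual values.
    base_abs = sum(abs(a - mean_actual) for a in actual)
    base_sq = sum((a - mean_actual) ** 2 for a in actual)

    covariance = sum(
        (a - mean_actual) * (p - mean_predicted) for a, p in zip(actual, predicted)
    )
    var_predicted = sum((p - mean_predicted) ** 2 for p in predicted)
    denominator = math.sqrt(base_sq * var_predicted)
    correlation = covariance / denominator if denominator else 0.0

    return {
        "correlation": correlation,
        "mae": abs_error / n,
        "rmse": math.sqrt(sq_error / n),
        "rae_pct": 100.0 * abs_error / base_abs,
        "rrse_pct": 100.0 * math.sqrt(sq_error / base_sq),
    }


if __name__ == "__main__":
    # Toy percentage changes, for illustration only.
    actual = [1.0, 2.0, 3.0, 4.0]
    always_mean = [2.5, 2.5, 2.5, 2.5]
    print(regression_summary(actual, always_mean))
```

Predicting the mean for every record yields 100% for both relative errors and a correlation of zero, which is the reference point for reading the figures above.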
11. Conclusions
The experiment was successful to a large extent in classifying stock movements over a period of time with both classification models.
The accuracy achieved for classification was about 80%, which is significant.
However, many challenges remain in predicting the percentage changes using stock news alone. This may be due to the following factors:
No stochastic parameters were used in the model.
Lack of disambiguation of stock code entities.
Lack of classification of stock companies into groups.
The prediction is not made in real time as and when news is published, which may introduce a lot of noise.
Predicting stock indices involves many uncertainties, some of which cannot be addressed by sentiment alone.