The document describes a cricket match prediction system that uses both historical and social media data. The system aims to develop a consistent statistical method to predict cricket match results by collecting vital attributes from a large dataset containing historical match statistics and real-time social media information. It analyzes this data using machine learning techniques to provide accurate predictions of cricket match outcomes before matches are played.
3. Why Predict Cricket?
A television audience of 2–3 billion follows cricket.
Cricket is a million-dollar game.
4. AIM & Objective
A consistent statistical method to predict results
Develop a dataset containing vital attributes
Predict the outcome before the match is played
#3: Our system will predict the outcome of a cricket match on the basis of historical data and data from social media.
In this project, different approaches to a new time-series prediction problem, i.e. predicting the outcome of a One-Day International (ODI) cricket match, are presented.
By combining historical data with social media data, we are able to predict the result more accurately.
#4: Today's sports professionals include not only the sportsmen actively participating in the game, but also their coaches, trainers, physiotherapists and, in many cases, strategists.
Players and team management (collectively often referred to as the team think-tank) perform as a human expert system, relying on experience, expertise and analytic ability to arrive at the best possible course of action before as well as during a game.
Vast amounts of raw data and statistics are available to aid the decision-making process, but determining what it takes to win a game is extremely challenging.
#5: The primary aim of this project is to establish a consistent statistical approach to predicting the outcome of a match.
To develop a dataset containing the vital attributes that define the match outcome.
To predict the outcome of a match before it is played.
To help teams focus their preparation for the match according to the prediction.
#7: Data Collection:
Previous match data was scraped from the cricket site cricsheet.org
Data Filtration:
The data from Cricsheet contain ball-by-ball records for every single match. We do not need ball-by-ball detail, only summarized data that give a complete picture of each match, so we perform a data filtration step.
Feature Construction
31 features are formed in a clear hierarchy with three levels (a sketch of the hierarchy follows this list):
Basic Features
Net Features
Difference Features
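To make the hierarchy concrete, here is a minimal Python sketch of the three levels. The feature names and formulas are illustrative assumptions only; the actual 31 features are the ones listed in chapter 3.

```python
# Illustrative sketch of the three feature levels; names and formulas are
# hypothetical, not the project's actual 31 features.

def basic_features(team_rows):
    """Basic features: raw per-team aggregates over past match summaries."""
    n = len(team_rows)
    return {
        "avg_runs_scored": sum(r["runs_scored"] for r in team_rows) / n,
        "avg_wickets_lost": sum(r["wickets_lost"] for r in team_rows) / n,
        "win_ratio": sum(1 for r in team_rows if r["result"] == "won") / n,
    }

def net_features(basic):
    """Net features: values derived from a team's basic features."""
    return {"net_run_strength": basic["avg_runs_scored"] * basic["win_ratio"]}

def difference_features(team_basic, opponent_basic):
    """Difference features: team-minus-opponent versions of basic features."""
    return {k + "_diff": team_basic[k] - opponent_basic[k] for k in team_basic}

rows = [{"runs_scored": 280, "wickets_lost": 6, "result": "won"},
        {"runs_scored": 240, "wickets_lost": 9, "result": "lost"}]
print(difference_features(basic_features(rows), basic_features(rows)))
```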
Training
For training and classification, we use 70% of our data. We train multiple models on this data so that we can improve our prediction.
Testing
We use 30% of our data set to check the accuracy of our results.
Prediction
To make a prediction, we simply provide the team name and the opponent's name; the system computes the required features and returns the expected outcome of the match (see the sketch below). This is explained further in the next section.
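As an illustration of that interface, a minimal sketch; the function, the feature store and the team names are placeholders, not the project's actual code.

```python
# Hypothetical prediction interface: given only the two team names, look up
# their historical feature vector and classify it with a trained model.
def predict_match(team, opponent, model, feature_store):
    """feature_store maps a (team, opponent) pair to its feature vector."""
    features = feature_store[(team, opponent)]
    return model.predict([features])[0]       # e.g. "win" or "loss"

# Usage (all objects are placeholders):
# predict_match("India", "Australia", trained_model, feature_store)
```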
#9: Data Collection:
The first challenge was to collect the right data. We applied multiple queries to fetch a large volume of data from Twitter. Once we had the data, we extracted the selected attributes that helped us in the data filtering step.
Data Filtration:
After obtaining the data, we filtered out tweets with spam content or from spam users. We considered several factors that classify a tweet as spam or ham.
Feature Reduction:
Tweets are generally in sentence format, with URLs pointing to images or blog articles. To get the data into a usable format, we remove stop words, i.e. general terms such as "a", "the", etc., as well as emoticons.
Training
For training and classification, we use 70% of our data. We train multiple models on this data so that we can improve our prediction.
Testing
We use 30% of our data set to check the accuracy of our results.
Prediction
In order to make a prediction, we just have to give the team name and the opponent's name; the system gives the prediction of the match after performing the required computations. This is explained further in the next chapter.
#11: Data Collection:
Data was collected from Cricsheet.
The data is provided in YAML format, a human-readable data format; libraries are available to parse it in multiple languages.
To summarize the data, we used an R package called yorkr.
This R package can be used to analyze the performances of cricketers based on match data from Cricsheet.
Using yorkr, we processed the YAML data to create a database of all matches.
Each match summary record contains: Team, Opponent, Venue, Date, Runs Scored, Overs Batted, Wickets Lost, Runs Conceded, Overs Bowled, Wickets Taken, Result.
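The project performed this summarization with yorkr in R. Purely as an illustration of the same step in Python, the sketch below reduces one Cricsheet YAML file to a match-level summary; the field names follow the Cricsheet YAML layout (info, innings, deliveries) and should be verified against the downloaded files.

```python
# Sketch: summarize one Cricsheet YAML match file (the project used yorkr).
import yaml

def summarize_match(path):
    with open(path) as f:
        match = yaml.safe_load(f)

    info = match["info"]
    summary = {
        "teams": info["teams"],
        "venue": info.get("venue"),
        "date": info["dates"][0],
        "winner": info.get("outcome", {}).get("winner"),
    }

    # Collapse ball-by-ball deliveries into per-innings totals.
    for innings in match["innings"]:
        (name, data), = innings.items()       # e.g. "1st innings"
        runs = wickets = balls = 0
        for delivery in data["deliveries"]:
            (_, ball), = delivery.items()     # key is the over.ball number
            runs += ball["runs"]["total"]
            balls += 1
            if "wicket" in ball:
                wickets += 1
        summary[name] = {
            "batting_team": data["team"],
            "runs": runs,
            "wickets": wickets,
            "overs": balls // 6 + (balls % 6) / 10.0,   # overs notation
        }
    return summary
```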
Feature Construction
After applying yorkr to the data, we retrieved the stored data from MongoDB using Apache Spark for feature construction. The 31 features listed in chapter 3 are then created and saved to a CSV file (a sketch of this step follows).
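A minimal sketch of this retrieval step, assuming the MongoDB Spark connector package is on the Spark classpath and the summaries live in a cricket.matches collection (both assumptions); the run_diff column is a single illustrative feature, not one of the actual 31.

```python
from pyspark.sql import SparkSession

# Read the summarized matches from MongoDB via the mongo-spark-connector.
spark = (SparkSession.builder
         .appName("feature-construction")
         .config("spark.mongodb.input.uri",
                 "mongodb://localhost/cricket.matches")
         .getOrCreate())

matches = spark.read.format("mongo").load()

# Feature construction happens here; one illustrative difference feature.
features = matches.withColumn(
    "run_diff", matches["runs_scored"] - matches["runs_conceded"])

# Persist the feature table (Spark writes a directory of CSV part files).
features.write.mode("overwrite").csv("features_csv", header=True)
```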
Training
For training and classification, we use 70% of our data. We train multiple models on this data so that we can improve our prediction; we used five different models to train our data set:
Naive Bayes, Logistic Regression, Random Forests, Decision Trees and SVM.
Testing
We use 30% of our data set to check the accuracy of our results. This 30% of the data was randomly tested with all of the above models. Using Naive Bayes, we got the maximum accuracy, approximately 68%.
Prediction
In order to make predictions, we just have to give the team name and the opponent's name; the system retrieves the historical features for the two teams, analyzes the record and predicts the outcome of the match. A sketch of the training, testing and prediction steps follows.
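Here is a sketch of the 70/30 split with the five models named above, using scikit-learn. The CSV file name and the result label column are assumptions, and the feature columns are assumed to be numeric.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC

data = pd.read_csv("features.csv")    # the 31-feature table (assumed name)
X = data.drop(columns=["result"])     # assumed label column
y = data["result"]

# 70% of the data for training, 30% held out for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

models = {
    "Naive Bayes": GaussianNB(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(),
    "Decision Tree": DecisionTreeClassifier(),
    "SVM": SVC(),
}

# Train every model and report held-out accuracy.
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, "accuracy:", model.score(X_test, y_test))
```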
#14: Data Collection:
We applied multiple queries to fetch a large volume of data from Twitter.
The whole process was completed in five steps (a sketch follows the list):
Queries
Twitter API
Result
Data extraction
Saving in MongoDB
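As a sketch of these five steps in Python, assuming a tweepy v3-style API, with placeholder credentials and query terms:

```python
import tweepy
from pymongo import MongoClient

# Steps 1-2: authenticate and send our queries to the Twitter API.
auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_SECRET")
api = tweepy.API(auth, wait_on_rate_limit=True)

collection = MongoClient()["cricket"]["tweets"]

for query in ["#INDvPAK", "cricket world cup"]:   # placeholder queries
    # Steps 3-4: iterate the results and extract the attributes we keep.
    for status in tweepy.Cursor(api.search, q=query, lang="en").items(500):
        doc = {
            "id": status.id,
            "text": status.text,
            "user": status.user.screen_name,
            "followers": status.user.followers_count,
            "created_at": status.created_at,
            "source": status.source,
        }
        collection.insert_one(doc)                # Step 5: save in MongoDB
```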
Data Filtration:
We retrieved the stored data from MongoDB using Apache Spark and passed it through a series of filtration steps. We considered several factors that classify a tweet as spam or ham (a rule-based sketch follows the list). Those factors are:
Content requesting re-tweets and follows
Short content length
Large numbers of hashtags
Bot-Friendly content source
Users that create little content
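A rule-based sketch of these factors; every threshold and the list of bot-friendly sources are illustrative assumptions, not the values the project used.

```python
import re

def is_spam(tweet, user):
    text = tweet["text"]
    if re.search(r"\b(rt to win|retweet|follow back|follow me)\b", text, re.I):
        return True                       # requests re-tweets and follows
    if len(text) < 20:
        return True                       # short content length
    if text.count("#") > 4:
        return True                       # large number of hashtags
    if tweet.get("source") in {"SpamBot", "AutoPoster"}:
        return True                       # bot-friendly content source
    if user.get("statuses_count", 0) < 10:
        return True                       # user creates little content
    return False

sample = {"text": "RT to win tickets! #cwc #final #cricket #win #free",
          "source": "AutoPoster"}
print(is_spam(sample, {"statuses_count": 3}))   # True
```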
Feature Reduction
We performed the following operations on the tweets in the cleansing and normalization phase (a sketch follows the list):
Remove Retweets
Replace Usernames
Replace URLs
Remove Repeated letters
Remove Short Words
Remove Stop Words
Remove Non-English Words
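A sketch of these operations with regular expressions; the stop-word and English-word sets below are tiny stand-ins for the full lists/dictionary the project would use.

```python
import re

STOP_WORDS = {"a", "an", "the", "is", "to", "of"}                  # sample only
ENGLISH_WORDS = {"india", "will", "win", "match", "team", "great"}  # stand-in

def clean_tweet(text):
    if text.startswith("RT "):                    # remove retweets entirely
        return ""
    text = re.sub(r"@\w+", "USER", text)          # replace usernames
    text = re.sub(r"https?://\S+", "URL", text)   # replace URLs
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)    # squeeze repeated letters
    words = [w.lower() for w in re.findall(r"[A-Za-z]+", text)]
    words = [w for w in words if len(w) > 2]            # remove short words
    words = [w for w in words if w not in STOP_WORDS]   # remove stop words
    keep = ENGLISH_WORDS | {"user", "url"}
    return " ".join(w for w in words if w in keep)      # keep English words

print(clean_tweet("@fan India will wiiiiin the match!!! https://t.co/x"))
```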
Training
For training and classification, we use 70% of our data. We train multiple models on this data so that we can improve our prediction.
Testing
We use 30% of our data set to check the accuracy of our results; this 30% was randomly tested with all of the above models.
Prediction
In order to make predictions, we just have to give the team name and the opponent's name; the system retrieves the features from the historical data for the two teams, analyzes the record and predicts the outcome of the match.
#16: PREDICTION THROUGH STREAMING TWEETS: Twitter open-sourced its Hosebird client (hbc), a robust Java HTTP library for consuming Twitter's Streaming API. We used hbc to create a Kafka Twitter-stream producer, which tracked our query terms in Twitter statuses and produced a Kafka stream from them; that stream was later used to send the data from Kafka to Spark Streaming. A Python analogue of the producer is sketched below.
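The project used the Java hbc client. Purely as a Python analogue, here is a sketch of the same producer idea using tweepy's v3 streaming API and kafka-python, with placeholder credentials, topic name and query terms.

```python
import json
import tweepy
from kafka import KafkaProducer

producer = KafkaProducer(bootstrap_servers="localhost:9092",
                         value_serializer=lambda d: json.dumps(d).encode())

class TweetToKafka(tweepy.StreamListener):
    def on_status(self, status):
        # Push each matching tweet onto the topic consumed by Spark Streaming.
        producer.send("cricket-tweets", {"text": status.text,
                                         "user": status.user.screen_name})

auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_SECRET")
stream = tweepy.Stream(auth, TweetToKafka())
stream.filter(track=["cricket", "#INDvAUS"])      # our query terms
```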
SPARK STREAMING: Spark Streaming is a real-time processing tool that runs on top of the Spark engine.
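On the consuming side, a minimal sketch using Spark Streaming's pre-3.0 Kafka direct-stream API (it needs the spark-streaming-kafka package on the classpath); the topic and broker address match the producer sketch above and are assumptions.

```python
import json
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

sc = SparkContext(appName="tweet-stream")
ssc = StreamingContext(sc, batchDuration=10)      # 10-second micro-batches

stream = KafkaUtils.createDirectStream(
    ssc, ["cricket-tweets"], {"metadata.broker.list": "localhost:9092"})

# Each record is a (key, value) pair; parse the JSON tweet bodies.
tweets = stream.map(lambda kv: json.loads(kv[1]))
tweets.map(lambda t: t["text"]).pprint()          # downstream scoring goes here

ssc.start()
ssc.awaitTermination()
```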
Apache Zookeeper
Apache Zookeeper is an effort to develop and maintain an open-source server which enables highly reliable distributed coordination.
Apache Kafka
Apache Kafka is publish-subscribe messaging rethought as a distributed commit log.