CRICASTRO: CRICKET GAME PREDICTOR
Presented By
Faraz Javed
1822-FBAS/BSSE/F12
Usama Tasneem
1814-FBAS/BSSE/F12
Tell Me About the System
• Predict cricket matches
• Historical & social media data
• Accurate results
Why Predict Cricket?
• A television audience of 2–3 billion people are fans of the game of cricket
• Cricket is a million-dollar game
Aim & Objectives
• A consistent statistical method to predict results
• Develop a dataset containing vital attributes
• Predict the outcome before the match is played
Historical Data: SYSTEM DESIGN
Data Collection → Data Filtration → Feature Construction → Training → Testing → Prediction
Social Media Data: SYSTEM DESIGN
Data Collection → Data Filtration → Feature Reduction → Training → Testing → Prediction
Historical Data: SYSTEM IMPLEMENTATION
Data Collection → Data Filtration → Feature Construction → Training → Testing → Prediction
Historical Data Block Diagram
Social Media Data: SYSTEM IMPLEMENTATION
Data Collection → Data Filtration → Feature Reduction → Training → Testing → Prediction
Social Media Data Block Diagram
Prediction through Streaming Tweets
Any Questions?
THANK
YOU
Editor's Notes
  • #3: Our system predicts the outcome of a cricket match on the basis of historical data and data from social media. In this project, different approaches to a new time-series prediction problem, predicting the outcome of a One-Day International (ODI) cricket match, are presented. By combining historical data with social media data we are able to predict the result more accurately.
  • #4: Today's sports professionals include not only the sportsmen actively participating in the game, but also their coaches, trainers, physiotherapists and, in many cases, strategists. Players and team management (collectively often referred to as the team "think-tank") perform as a human expert system, relying on experience, expertise and analytic ability to arrive at the best possible course of action before as well as during a game. Vast amounts of raw data and statistics are available to aid the decision-making process, but determining what it takes to win a game is extremely challenging.
  • #5: The primary aim of this project is to establish a consistent statistical approach to predicting the outcome of a match; to develop a dataset containing the vital attributes that define the match outcome; to predict the outcome before the match is played; and to help teams focus their preparation on the match according to the prediction.
  • #7:
    Data Collection: Previous match data was scraped from the cricket site cricsheet.org.
    Data Filtration: The Cricsheet data contain ball-by-ball records for every match. We do not need ball-by-ball detail, only summarized data that gives a complete picture of each match, so we perform data filtration.
    Feature Construction: 31 features are formed within a clear hierarchy with 3 levels: basic features, net features, and difference features (see the sketch after this note).
    Training: For training and classification we use 70% of our data, and apply multiple models to improve the prediction.
    Testing: We use the remaining 30% of the dataset to check the accuracy of our results.
    Prediction: To make a prediction we simply give the name of a team and its rival; the system returns the expected outcome of the match after performing the required computations. This is explained further in the next section.
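    A minimal sketch of the three-level feature hierarchy described above (basic, net, and difference features). The column and feature names here are illustrative assumptions, not the project's exact 31-feature schema:

```python
# Sketch of the three-level feature hierarchy (basic -> net -> difference).
# Column names are illustrative assumptions, not the project's exact schema.
import pandas as pd

def build_features(matches: pd.DataFrame) -> pd.DataFrame:
    """matches: one row per team per match, with summarized statistics for the
    team and (assumed here to be pre-joined) its opponent."""
    f = pd.DataFrame(index=matches.index)

    # Level 1: basic features taken directly from the summarized match data
    f["runs_scored"] = matches["runs_scored"]
    f["wickets_lost"] = matches["wickets_lost"]
    f["runs_conceded"] = matches["runs_conceded"]
    f["wickets_taken"] = matches["wickets_taken"]

    # Level 2: net features combining basic features of the same team
    f["net_runs"] = f["runs_scored"] - f["runs_conceded"]
    f["net_wickets"] = f["wickets_taken"] - f["wickets_lost"]

    # Level 3: difference features comparing the team with its opponent
    f["diff_net_runs"] = f["net_runs"] - matches["opponent_net_runs"]

    f["result"] = matches["result"]  # label: 1 = win, 0 = loss
    return f
```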
  • #9:
    Data Collection: The first challenge was to collect the right data. We applied multiple queries to fetch a large volume of data from Twitter, then extracted the selected attributes that helped in the data filtering step.
    Data Filtration: After obtaining the data, we filtered out tweets from spam users or with spam content, considering several factors that classify a tweet as spam or ham (see the rule-based sketch after this note).
    Feature Reduction: Tweets are generally in sentence format, with URLs pointing to images or blog articles. To get the data into a usable format we remove stop words (general terms such as "a" and "the") and emoticons.
    Training: For training and classification we use 70% of our data, and apply multiple models to improve the prediction.
    Testing: We use the remaining 30% of the dataset to check the accuracy of our results.
    Prediction: To make a prediction we just give the name of a team and its opponent; the system returns the predicted outcome after performing the required calculations. This is explained further in the next chapter.
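    A rough, rule-based sketch of the spam/ham filtration idea described above. The phrases, thresholds, and field names are illustrative assumptions only:

```python
# Rule-based spam/ham check for a collected tweet, represented as a plain dict.
# The specific phrases and thresholds below are illustrative assumptions.
def looks_like_spam(tweet: dict) -> bool:
    text = tweet.get("text", "").lower()
    hashtags = tweet.get("hashtags", [])

    if "follow and rt" in text or "follow me" in text:
        return True   # content requesting re-tweets and follows
    if len(text.split()) < 3:
        return True   # very short content
    if len(hashtags) > 5:
        return True   # large number of hashtags
    return False

def filter_tweets(tweets):
    """Keep only the tweets that pass the spam checks (the 'ham' tweets)."""
    return [t for t in tweets if not looks_like_spam(t)]
```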
  • #11:
    Data Collection: Data was collected from Cricsheet, which provides it in YAML, a human-readable data format with parsing libraries available in multiple languages. To summarize the data we used an R package called yorkr, which can analyze the performances of cricketers and teams from Cricsheet match data. With this package we processed the YAML data into a database of all matches with the fields: Team, Opponent, Venue, Date, Runs Scored, Overs Batted, Wickets Lost, Runs Conceded, Overs Bowled, Wickets Taken, Result.
    Feature Construction: After applying yorkr to the data, we retrieved the stored records from MongoDB using Apache Spark for feature construction. The 31 features listed in chapter 3 were then created and saved to a CSV file.
    Training: For training and classification we use 70% of our data and apply multiple models to improve the prediction. We used five different models: Naive Bayes, Logistic Regression, Random Forests, Decision Trees and SVM (see the sketch after this note).
    Testing: The remaining 30% of the data was randomly tested with all of the above models to check accuracy. Naive Bayes gave the maximum accuracy, approximately 68%.
    Prediction: To make a prediction we give the name of a team and its opponent; the system fetches the historical features for the two teams, analyzes the record, and predicts the outcome of the match.
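    A minimal sketch of the 70/30 training and testing procedure with the five models named above, using scikit-learn as a stand-in. The CSV file name, feature layout, and model settings are assumptions, not the project's exact configuration:

```python
# 70/30 train/test split and comparison of the five classifiers named above.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

features = pd.read_csv("historical_features.csv")  # assumed name of the saved CSV
X = features.drop(columns=["result"])
y = features["result"]

# 70% of the data for training, the remaining 30% held back for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

models = {
    "Naive Bayes": GaussianNB(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(),
    "Decision Tree": DecisionTreeClassifier(),
    "SVM": SVC(),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: {acc:.2%}")  # in the project, Naive Bayes scored best (~68%)
```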
  • #14:
    Data Collection: We applied multiple queries to fetch a large volume of data from Twitter. The whole process was completed in 5 steps: queries → Twitter API → results → data extraction → saving in MongoDB.
    Data Filtration: We retrieved the stored data from MongoDB using Apache Spark and passed it through a series of filtration steps. We considered six factors that classify a tweet as spam or ham, including: content requesting re-tweets and follows, short content length, large numbers of hashtags, bot-friendly content sources, and users that create little content.
    Feature Reduction: In the cleansing and normalization phase we performed the following operations on tweets: remove retweets, replace usernames, replace URLs, remove repeated letters, remove short words, remove stop words, and remove non-English words (see the sketch after this note).
    Training: For training and classification we use 70% of our data and apply multiple models to improve the prediction.
    Testing: The remaining 30% of the dataset was randomly tested with all of the above models to check accuracy.
    Prediction: To make a prediction we give the name of a team and its opponent; the system fetches the features for the two teams, analyzes the record, and predicts the outcome of the match.
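    A small sketch of the cleansing and normalization operations listed above, using regular expressions. The stop-word list and the exact replacement rules are assumptions:

```python
# Cleansing/normalization of a single tweet, mirroring the operations above.
import re

STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in"}  # illustrative subset

def normalize_tweet(text: str) -> str:
    text = re.sub(r"^RT\s+", "", text)           # strip the retweet marker
    text = re.sub(r"@\w+", "USER", text)          # replace usernames
    text = re.sub(r"https?://\S+", "URL", text)   # replace URLs
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)    # squeeze repeated letters ("soooo" -> "soo")
    words = [w for w in text.lower().split()
             if len(w) > 2                        # remove short words
             and w not in STOP_WORDS              # remove stop words
             and w.isascii() and w.isalpha()]     # keep only English-looking words
    return " ".join(words)
```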
  • #16: Prediction through streaming tweets: Twitter open-sourced its Hosebird client (hbc), a robust Java HTTP library for consuming Twitter's Streaming API. We used hbc to create a Kafka Twitter-stream producer, which tracked our query terms in Twitter statuses and produced a Kafka stream that was later consumed by Spark Streaming.
    Spark Streaming: a real-time processing tool that runs on top of the Spark engine (see the consumer sketch after this note).
    Apache Zookeeper: an effort to develop and maintain an open-source server that enables highly reliable distributed coordination.
    Apache Kafka: publish-subscribe messaging rethought as a distributed commit log.
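    A minimal PySpark sketch of consuming the Kafka tweet stream described above. It uses Spark Structured Streaming as a stand-in for the original Spark Streaming job; the broker address and topic name are assumptions, and the job additionally requires the spark-sql-kafka connector package on the classpath:

```python
# Consume the Kafka tweet stream and print the raw tweet text per micro-batch.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = (SparkSession.builder
         .appName("CricketTweetStream")
         .getOrCreate())

tweets = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "localhost:9092")  # assumed broker address
          .option("subscribe", "cricket-tweets")                # assumed topic name
          .load()
          .select(col("value").cast("string").alias("tweet")))

# A real job would run the filtration, feature reduction, and prediction steps
# described earlier on each micro-batch instead of printing to the console.
query = (tweets.writeStream
         .outputMode("append")
         .format("console")
         .start())

query.awaitTermination()
```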