Mining Users' Opinions in Hotel
TEY JUN HONG U095074X
Content
 Background

 Formulating the problem

 Data Mining Process

 Techniques

 Analysis
What is Data Mining?

 Extraction of patterns

 Automatic Means

 Little human interaction
The Web



http://www
Users' Opinions in Hotel

 Identify Potential Hotel

 Predict what ASPECTS customers like

 Sales and Margin


Sentiment Analysis
Some Limitations of machines

 Unable to read like a human

 Cannot detect sarcasm

 Expression of sentiment differs across topics and domains

 Polarity analysis

 Facts Vs Opinion
Some machine limitation examples

 "The service is as good as none." The negation is not obvious
  to a machine.

 "The swimming pool is big enough to swim in comfortably." vs.
  "There is a big crowd at the counter complaining." The polarity
  of "big" might change with context.

 "The room is warmer than the lobby." Comparisons are
  hard to classify.
Project
Sentiment Analysis

 Prediction of sentence polarity

 Classification of polarity for sentiment lexicon

 Detection of relations
Data Mining Process
Cleaning The Dirty Reviews
 Frequent problem: data inconsistencies

 Duplicate data

 Spelling Errors != Trim from data

 Foreign accents and characters

 Singular / plural conversion

 Punctuation removal / replacement

 Noise and incomplete data

 Naming conventions misused: same name but different meaning
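
A few of the cleaning steps listed above can be sketched in Python. This is a minimal illustration using only the standard library, not the pipeline used in the project: the function names `strip_accents`, `clean_review` and `clean_reviews` and the sample reviews are assumptions made for this example, and spelling correction and singular/plural conversion are deliberately left out.

```python
import re
import unicodedata


def strip_accents(text):
    """Replace accented characters with their unaccented base letters."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def clean_review(text):
    """Lowercase, strip accents, replace punctuation with spaces, collapse whitespace."""
    text = strip_accents(text).lower()
    text = re.sub(r"[^\w\s]", " ", text)  # punctuation -> single space
    return re.sub(r"\s+", " ", text).strip()


def clean_reviews(reviews):
    """Clean every review; drop empty entries and duplicates, keeping first-seen order."""
    seen = set()
    cleaned = []
    for review in reviews:
        normalised = clean_review(review)
        if normalised and normalised not in seen:
            seen.add(normalised)
            cleaned.append(normalised)
    return cleaned


# Hypothetical sample data: a duplicate (after normalisation) and an empty review.
raw = ["The café was GREAT!!!", "the cafe was great", "   ", "Room: small, but clean."]
print(clean_reviews(raw))
```

Deduplicating after normalisation, rather than before, also catches reviews that differ only in case, accents or punctuation.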
Data Preprocessing

 Part Of Speech Tags
Data Preprocessing

 Polarity tagging using sentiment lexicon

 Example: the word BEST (occurrence: HIGH)
   Sentiment lexicon tag: +VE
   Part of speech tag: ADJ
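
The lexicon lookup on this slide can be sketched as a dictionary lookup over words that already carry a part-of-speech tag. The tiny `LEXICON`, the tag names and the helper functions below are hypothetical and only illustrate the idea; a real sentiment lexicon is far larger, and the POS tags would come from a tagger.

```python
# Hypothetical toy lexicon (assumption for illustration only).
LEXICON = {"best": "+VE", "good": "+VE", "clean": "+VE", "dirty": "-VE", "rude": "-VE"}


def tag_polarity(tagged_words, lexicon=LEXICON):
    """Attach a polarity tag to each (word, pos) pair; None marks a lexicon miss."""
    return [(word, pos, lexicon.get(word.lower())) for word, pos in tagged_words]


def coverage(tagged, pos_of_interest="ADJ"):
    """Fraction of words with the given POS tag that were found in the lexicon."""
    candidates = [t for t in tagged if t[1] == pos_of_interest]
    if not candidates:
        return 0.0
    return sum(1 for t in candidates if t[2] is not None) / len(candidates)


# POS tags are assumed to be supplied by an upstream tagger.
sentence = [("The", "DET"), ("BEST", "ADJ"), ("hotel", "NOUN"),
            ("with", "ADP"), ("spacious", "ADJ"), ("rooms", "NOUN")]
tagged = tag_polarity(sentence)
print(tagged[1])         # ('BEST', 'ADJ', '+VE')
print(coverage(tagged))  # 0.5
```

Here "spacious" is an adjective that the toy lexicon does not contain, so only half of the adjectives receive a polarity tag. A coverage figure of this kind is one way to measure how complete a lexicon is for a given domain.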
Findings

 Part of Speech (POS) tagging using Brill Tagger - NO
  PROBLEM

  - 95% accuracy of POS tagging after data cleaning
Findings

 Polarity tagging using sentiment lexicon - BIG PROBLEM

  - 40% of sentiment words are not found in the sentiment lexicon

  - 10% of sentiment words with a positive or negative polarity
    are found in the neutral section of the sentiment lexicon
Problems

 Sentiment lexicon not comprehensive

 Domain Independent Sentiment Words

 Domain Dependent Sentiment Words
Solutions

 Rule Based Mining

 Relation Based Mining
Rule Based Mining
Relation Based Mining
Analysis - Bayesian
 To determine polarity of sentiments



                  P(X | Y) = P(X) P(Y | X) / P(Y)


 Probability that a sentiment is positive or negative, given
  its contents
 P(sentiment | sentence) = P(sentiment) P(sentence |
  sentiment) / P(sentence)
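
One common way to apply this rule to sentences is a naive Bayes classifier. The sketch below is not taken from the slides and rests on three assumptions: words are treated as independent given the sentiment, add-one smoothing is used for unseen words, and P(sentence) is dropped because it is the same for every sentiment. The training sentences and the names `train` and `classify` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train(labelled_sentences):
    """Count label frequencies and per-label word frequencies."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for sentence, label in labelled_sentences:
        label_counts[label] += 1
        for word in sentence.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab


def classify(sentence, model):
    """Return the label maximising log P(label) + sum of log P(word | label)."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total)  # log prior, log P(sentiment)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in sentence.lower().split():
            # Add-one smoothed likelihood, P(word | sentiment)
            score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training data for illustration only.
training = [
    ("the room is clean and comfortable", "positive"),
    ("friendly staff and great breakfast", "positive"),
    ("the room is dirty and noisy", "negative"),
    ("rude staff and terrible breakfast", "negative"),
]
model = train(training)
print(classify("clean room and friendly staff", model))  # positive
print(classify("dirty room and rude staff", model))      # negative
```

Working in log space avoids numerical underflow when many small probabilities are multiplied together.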
Validation

   Precision = N (agree & found) / N (found)

   High precision means most of the sentiment words found by
    the system are correctly labeled

   Recall = N (agree & found) / N (agree)

   High recall means most of the correct sentiment words are
    found by the system
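
The two formulas can be computed directly from sets. In this sketch, `found` stands for the aspect-sentiment pairs output by the system and `agreed` for the pairs accepted by human annotators. Both sets and the function name `precision_recall` are made up for illustration and are not the project's data.

```python
def precision_recall(found, agreed):
    """Return (precision, recall) for system output `found` against reference `agreed`."""
    found, agreed = set(found), set(agreed)
    hits = len(found & agreed)                        # N(agree & found)
    precision = hits / len(found) if found else 0.0   # hits / N(found)
    recall = hits / len(agreed) if agreed else 0.0    # hits / N(agree)
    return precision, recall


# Hypothetical example pairs (aspect, sentiment word).
found = {("room", "clean"), ("staff", "rude"), ("pool", "big"), ("lobby", "warm")}
agreed = {("room", "clean"), ("staff", "rude"), ("pool", "big"),
          ("food", "bland"), ("bed", "hard")}
p, r = precision_recall(found, agreed)
print(p, r)  # 0.75 0.6
```

Three of the four pairs the system found are in the reference set (precision 0.75), while three of the five reference pairs were found (recall 0.6).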
Validation Results

 Out of the 350 aspect-unlabelled sentiment word pairs,
  294 are found by the methods. Thus, the precision is
  about 84%.

 Recall: 276 words are correctly labelled by the
  system, which is about 78%.
Application

 Reviews Rating

 Aspect Rating

 Summary of reviews

Editor's Notes

  • #4:
     Process of exploration and analysis
     By automatic / semi-automatic means
     With little or no human interaction
     To discover meaningful patterns and rules
     Exponential growth of users' opinions
     Limitations of human analysis
     Accuracy of human analysis
     Machines can be trained to take over human analysis with advanced computer technology, and this is done at LOW COST
  • #6:
     Increase in social media and web users
     Increase in valuable opinion-oriented data on hotels due to web expansion
     Identify potential hotels to stay in by looking at the aspects
     Identify best prospects (ASPECTS), and retain customers
     Predict what ASPECTS customers like and promote accordingly
     Learn parameters influencing trends in sales and margins
     Identification of opinions for customers