Users Opinions in Hotel
TEY JUN HONG
U095074X
National University Of Singapore
Content
  1. Background
  2. Formulating the problem
  3. Data Mining Process
  4. Techniques
  5. Analysis
What is Data Mining?
  • Extraction of meaningful / useful / interesting patterns from a large volume of data sources
  • In this project, the source will be a large volume of WEB HOTEL REVIEWS data
  • Data mining is one of the top ten emerging technologies
MIT's TECHNOLOGY REVIEW 2004
What is Data Mining?
  • Process of exploration and analysis
  • By automatic / semi-automatic means
  • With little or no human interaction
  • To discover meaningful patterns and rules
MASTERING DATA MINING BY BERRY AND LINOFF, 2000
Users Opinions in Hotel
  • Increase in social media and web users
  • Increase in valuable opinion-oriented data on hotels due to web expansion
  • Identify potential hotels to stay at by looking at the aspects
  • Overall sentiments on hotels are greatly sought on the web for
What can Data Mining do?
  • Identify best prospects (ASPECTS), and retain customers
  • Predict what ASPECTS customers like and promote accordingly
  • Learn parameters influencing trends in sales and margins
  • Identification of opinions for customers
What are the problems?
  • Exponential growth of users' opinions
  • Limitations of human analysis
  • Accuracy of human analysis
Machines can be trained to take over human analysis with advanced computer technology and it is done with LOW
Some Limitations of machines
  • Unable to read like a human
  • No emotions
  • Cannot detect sarcasm
  • Expression of sentiments in different topics and domains
  • Polarity analysis
  • Facts vs. opinion
Some machine limitation examples
  • "The service is as good as none."
      • Negation not obvious to machine
  • "Swimming pool is big enough to swim with comfort", "There is a big crowd at the counter complaining."
      • Polarity might change with context
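The two examples above can be reproduced with a deliberately naive word-by-word lookup. This is only a sketch: the tiny lexicon and its scores are illustrative assumptions, not the lexicon used in the project.

```python
# Toy lexicon: entries and scores are illustrative assumptions.
TOY_LEXICON = {"good": 1, "comfort": 1, "complaining": -1}

def naive_polarity(sentence):
    """Sum lexicon scores word by word, ignoring negation and context."""
    words = sentence.lower().replace(",", " ").replace(".", " ").split()
    return sum(TOY_LEXICON.get(w, 0) for w in words)

# "as good as none" is negative to a human, but the lookup only sees "good".
print(naive_polarity("The service is as good as none."))                   # 1
# "big" carries no score of its own; its polarity depends on the context.
print(naive_polarity("Swimming pool is big enough to swim with comfort"))  # 1
print(naive_polarity("There is a big crowd at the counter complaining."))  # -1
```

The first sentence is scored as positive, which is the kind of error the slide describes.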
Sentiment Analysis
Machine Learning
  • A tool for data mining and intelligent decision support
  • Application of computer algorithms that improve automatically through experience
MASTERING DATA MINING BY BERRY AND LINOFF, 2000
Types of Machine Learning
  • Supervised Learning
      • A training set is provided (data with correct answers), which is used to mine for known patterns
  • Unsupervised Learning
      • Data are provided with no prior knowledge of the hidden patterns that they contain
Supervised Learning techniques
  • Rule Mining and Rule learning
  • Bayesian Networks
  • Support Vector Machine
Project Objective
  • Prediction of sentence polarity
  • Classification of polarity for sentiment lexicon
  • Detection of relations
Pre-requisite
  • Large data set
  • Relevant prior knowledge of the domain, in our case the hotel domain
      • E.g. rating
  • Sentiment lexicon for sentiment analysis
  • Data selection for reliability and standards
Data Mining Process
Cleaning the Dirty Data (60% of effort)
  • Frequent problem: data inconsistencies
  • Duplicate data
  • Spelling Errors != Trim from data
      • Foreign accents and characters
      • Singular / plural conversion
      • Punctuation removal / replacement
      • Noise and incomplete data
      • Naming convention misused,
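A few of the cleaning steps listed above (accent stripping, punctuation replacement, singular / plural conversion, duplicate removal) could be sketched as follows. The function names and the plural rule are assumptions made for illustration; the rule is very rough (it would also strip the "s" from a word like "this"), so a real pipeline would more likely use a proper lemmatiser.

```python
import re
import unicodedata

def clean_review(text):
    """Normalise one review: strip accents, drop punctuation, lowercase."""
    # Decompose accented characters and drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    ascii_text = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Replace punctuation with spaces, then collapse runs of whitespace.
    no_punct = re.sub(r"[^\w\s]", " ", ascii_text)
    return re.sub(r"\s+", " ", no_punct).strip().lower()

def singular(word):
    """Very rough plural-to-singular conversion (assumes regular English plurals)."""
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word

def clean_corpus(reviews):
    """Clean every review and drop exact duplicates, keeping first occurrences."""
    seen, cleaned = set(), []
    for review in reviews:
        tokens = [singular(w) for w in clean_review(review).split()]
        normalised = " ".join(tokens)
        if normalised and normalised not in seen:
            seen.add(normalised)
            cleaned.append(normalised)
    return cleaned

print(clean_corpus(["Great rooms, café!", "great room cafe", ""]))
# ['great room cafe']
```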
Data Preprocessing (Laundering)
  • Part of Speech Tagging (POS) using Brill Tagger
  • Polarity tagging using sentiment lexicon
Findings
  • Part of Speech Tagging (POS) using Brill Tagger: NO PROBLEM
      • 95% accuracy in POS tagging of words after data cleaning
Findings
  • Polarity tagging using sentiment lexicon: BIG PROBLEM
      • 40% of sentiment words are not found in the sentiment lexicon
      • 10% of sentiment words with a positive or negative polarity are found in the neutral section of the sentiment lexicon
Problems
  • The sentiment lexicon is not comprehensive enough for the machine learning technique adopted
  • Sentiment words whose polarity is domain dependent are found in the neutral section of the sentiment lexicon
  • Polarity of sentiment words can also change within the domain even though they are domain dependent
Solution
  • Classify the polarity of unlabeled sentiment words using rule-based mining
  • Classify domain-dependent sentiment words
  • Establish word relations between labeled and unlabeled sentiment words
Data Processing
  • Rule-based mining using conjunction and punctuation
  • Polarity Assignment Rules

      Polarity   Pattern
      Same       Adj - AND/OR - Adj
      Opposite   Neg-Adj - AND/OR - Adj  /  Adj - AND/OR - Neg-Adj
      Same       Neg-Adj - AND/OR - Neg-Adj
      Opposite   Adj - BUT/NOR - Adj
      Same       Neg-Adj - BUT/NOR - Adj  /  Adj - BUT/NOR - Neg-Adj
      Opposite   Neg-Adj - BUT/NOR - Neg-Adj
      Same       Adj , Adj
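The rule table above reduces to a simple parity check: AND / OR / comma start from "same", BUT / NOR start from "opposite", and each negated adjective flips the relation once. A minimal sketch, with hypothetical function names and example adjectives that are not taken from the project data:

```python
SAME_CONNECTIVES = {"and", "or", ","}
OPPOSITE_CONNECTIVES = {"but", "nor"}

def relation(connective, left_negated=False, right_negated=False):
    """Return 'same' or 'opposite' for two adjectives joined by a connective."""
    c = connective.lower()
    if c in SAME_CONNECTIVES:
        flip = False
    elif c in OPPOSITE_CONNECTIVES:
        flip = True
    else:
        raise ValueError(f"unsupported connective: {connective!r}")
    # Each negated adjective flips the relation once.
    if left_negated:
        flip = not flip
    if right_negated:
        flip = not flip
    return "opposite" if flip else "same"

def infer_polarity(known_polarity, connective, known_negated=False, unknown_negated=False):
    """Infer the polarity (+1 / -1) of an unlabeled adjective from a labeled one."""
    rel = relation(connective, known_negated, unknown_negated)
    return known_polarity if rel == "same" else -known_polarity

# "clean and spacious": "clean" is +1, so "spacious" is inferred as +1.
print(infer_polarity(+1, "and"))                      # 1
# "spacious but noisy": opposite polarity, so "noisy" is inferred as -1.
print(infer_polarity(+1, "but"))                      # -1
# "not clean and noisy": one negation flips AND to opposite.
print(infer_polarity(+1, "and", known_negated=True))  # -1
```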
Data Processing
  • Relation Network: Aspect - Sentiment word pair
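One possible reading of the relation network is a graph whose nodes are aspect-sentiment word pairs and whose edges carry a "same" or "opposite" relation, with labels spread iteratively from labeled to unlabeled nodes. The sketch below uses a simple majority vote of labeled neighbours; the vote, the function name, and the example pairs are all assumptions, not the project's actual procedure.

```python
def propagate(edges, labels):
    """Spread +1 / -1 labels over a relation graph until nothing changes.

    edges:  list of (node_a, node_b, 'same' | 'opposite')
    labels: dict mapping already labeled nodes to +1 or -1
    """
    labels = dict(labels)
    changed = True
    while changed:
        changed = False
        votes = {}
        for a, b, rel in edges:
            sign = 1 if rel == "same" else -1
            for src, dst in ((a, b), (b, a)):
                if src in labels and dst not in labels:
                    votes[dst] = votes.get(dst, 0) + sign * labels[src]
        for node, vote in votes.items():
            if vote != 0:  # ties stay unlabeled
                labels[node] = 1 if vote > 0 else -1
                changed = True
    return labels

edges = [
    (("room", "clean"), ("room", "spacious"), "same"),
    (("room", "spacious"), ("room", "noisy"), "opposite"),
    (("room", "noisy"), ("room", "cramped"), "same"),
]
result = propagate(edges, {("room", "clean"): 1})
print(result[("room", "cramped")])  # -1
```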
Analysis
  • Using the expanded sentiment lexicon, we analyze sentiment polarity by doing a sentiment lookup using a Bayesian Network
Bayesian
  • To determine polarity of sentiments

      P(X | Y) = P(X) P(Y | X) / P(Y)

  • Probability that a sentiment is positive or negative, given its contents
  • Assumption: there is no link between words
  • P(sentiment | sentence) =
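With the "no link between words" assumption, Bayes' rule is commonly expanded in the naive Bayes form, where P(sentiment | sentence) is proportional to P(sentiment) times the product of P(word | sentiment) over the words of the sentence. The slide's own expansion is cut off, so the sketch below is the standard textbook form rather than necessarily the author's; the training sentences are made up, and add-one smoothing is an added assumption.

```python
import math
from collections import Counter

def train(labelled_sentences):
    """Count classes and per-class words from (sentence, label) pairs."""
    class_counts = Counter()
    word_counts = {}
    vocabulary = set()
    for sentence, label in labelled_sentences:
        class_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for word in sentence.lower().split():
            counts[word] += 1
            vocabulary.add(word)
    return class_counts, word_counts, vocabulary

def classify(sentence, model):
    """Return the label maximising log P(label) + sum of log P(word | label)."""
    class_counts, word_counts, vocabulary = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        score = math.log(n / total)
        denom = sum(word_counts[label].values()) + len(vocabulary)
        for word in sentence.lower().split():
            # Add-one smoothing so an unseen word does not zero the product.
            score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Made-up training data for illustration only.
TOY_DATA = [
    ("clean room friendly staff", "positive"),
    ("great location clean pool", "positive"),
    ("dirty room rude staff", "negative"),
    ("noisy room dirty pool", "negative"),
]
model = train(TOY_DATA)
print(classify("clean pool", model))   # positive
print(classify("dirty staff", model))  # negative
```

P(Y), the probability of the sentence itself, is left out because it is the same for every candidate label and does not change which one scores highest.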
Validation
  • Precision = N (agree & found) / N (found)
  • High precision means most of the sentiment words found by the system are correct
  • Recall = N (agree & found) / N (agree)
  • High recall means most of
Validation Results
  • Out of the 350 aspect-unlabelled sentiment word pairs, only 194 are found by the methods. Thus, the precision is about 57%.
  • The recall is also not very high; only 126 words are correctly labelled by the system, which is about 63%.
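As a rough check, the ratios implied by the three counts quoted above can be computed directly. The slide does not say which counts were used as N (found) and N (agree), so the pairing below is an assumption; the resulting ratios are close to, but not the same as, the quoted 57% and 63%, which suggests the author may have used counts not shown here.

```python
def precision(agree_and_found, found):
    """Precision = N(agree & found) / N(found)."""
    return agree_and_found / found

def recall(agree_and_found, agree):
    """Recall = N(agree & found) / N(agree)."""
    return agree_and_found / agree

# Counts quoted on the slide.
total_pairs = 350
found = 194
correct = 126

print(f"found / total   = {found / total_pairs:.1%}")           # 55.4%
print(f"correct / found = {precision(correct, found):.1%}")     # 64.9%
print(f"correct / total = {correct / total_pairs:.1%}")         # 36.0%
```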
Discussion
  • The results will improve if more rules are applied, such as the inclusion of more adverbs like "excessively" as negation words.
  • There might not be enough data for the system to work on. There are only 350 aspect-unlabelled sentiment word pairs for the application to work with.
  • This, however, requires more
Conclusion
  • A comprehensive sentiment lexicon is a simple yet effective solution to sentiment analysis as it does not require prior training
  • Current sentiment lexicons do not capture the domain and context sensitivities of sentiment expressions
Conclusion
  • This leads to poor coverage
  • Thus, expanding the general sentiment lexicon to capture domain and context sensitivities of sentiment expressions is advocated
Questions?
DEMO

Editor's Notes

  • #6: What can we infer from user opinions of hotels?
  • #7: What can data mining do in the hotel domain? In other words, learn the market.
  • #8: It is impossible for humans to read every single opinion, and humans are biased towards reading certain opinions. Machines allow fast access to vast amounts of data and allow computationally intensive algorithms and statistical methods.
  • #9: It is impossible for humans to read every single opinion, and humans are biased towards reading certain opinions. Machines allow fast access to vast amounts of data and allow computationally intensive algorithms and statistical methods.
  • #11: There are many fields of data mining; in this project we will focus on these 4.
  • #12: Growing data volume, limitations of humans, and low cost to humans
  • #13: The goal of unsupervised learning is to discover these patterns. Semi-supervised: knowledge is known and applied from one data collection in order to mine, classify, analyze, and interpret a related data collection.
  • #15: Some of the problems to be solved by data mining: prediction of sentence polarity, classification of polarity for sentiment lexicon, detection of relations
  • #18: Data inconsistencies: the title says good but the review says bad
  • #19: Assigning a label to every word in the text to allow the machine to do something with it
  • #20: POS tagging goes wrong for some words, like "heart", that have double tagging
  • #22: For example, in the domain of handheld devices, the word "large" can express positivity for screen size but negativity for phone size.
  • #24: Assigning a label to every word in the text to allow the machine to do something with it
  • #25: After establishing relations, we have a graph of nodes (sentiments / aspects). Determine the probability that a node is positive or negative given its surrounding nodes. Start with a high-frequency unlabelled sentiment word-aspect pair; then, based on the aspect and its labelled sentiment pair, determine the polarity of the unlabelled word. This process iterates until all unlabelled words have found their polarity.
  • #26: After establishing relations, we have a graph of nodes (sentiments / aspects). Determine the probability that a node is positive or negative given its surrounding nodes. Start with a high-frequency unlabelled sentiment word-aspect pair; then, based on the aspect and its labelled sentiment pair, determine the polarity of the unlabelled word. This process iterates until all unlabelled words have found their polarity.
  • #30: Assigning a label to every word in the text to allow the machine to do something with it
  • #32: A comprehensive sentiment lexicon can provide a simple yet effective solution to sentiment analysis, because it is general and does not require prior training. Therefore, attention and effort have been paid to the construction of such lexicons. However, a significant challenge to this approach is that the polarity of many words is domain and context dependent. For example, "long" is positive in "long battery life" and negative in "long shutter lag". Current sentiment lexicons do not capture such domain and context sensitivities of sentiment expressions. They either exclude such domain and context dependent sentiment expressions or tag them with an overall polarity tendency based on statistics gathered from a certain corpus, such as the World Wide Web. While excluding such expressions leads to poor coverage, simply tagging them with a polarity tendency leads to poor precision.
  • #33: They either exclude such domain and context dependent sentiment expressions or tag them with an overall polarity tendency based on statistics gathered from a certain corpus, such as the World Wide Web. While excluding such expressions leads to poor coverage, simply tagging them with a polarity tendency leads to poor precision.