
User’s
Opinions
in Hotel
TEY JUN HONG
U095074X
National University Of Singapore

Content
1. Background
2. Formulating the problem
3. Data Mining Process
4. Techniques
5. Analysis


What is Data
Mining?
• Extraction of meaningful /
useful / interesting patterns
from a large volume of data
sources
• In this project, the source will
be a large volume of WEB HOTEL
REVIEW data
• Data mining is one of the top
ten emerging technologies
MIT’s TECHNOLOGY REVIEW, 2004

What is Data
Mining?
• Process of exploration and
analysis
• By automatic / semi-automatic
means
• With little or no human
interaction
• To discover meaningful
patterns and rules
MASTERING DATA MINING BY BERRY AND LINOFF, 2000

User’s Opinions in
Hotel
• Increase in social
media and web users
• Increase in valuable
opinion-oriented data
on hotels due to web
expansion
• Identify potential hotels
to stay at by looking at
the aspects
• Overall sentiments on
hotels are greatly
sought on the web for

What can Data Mining
do?
• Identify best prospects
(ASPECTS), and retain
customers
• Predict what ASPECTS
customers like and
promote accordingly
• Learn parameters
influencing trends in
sales and margins
• Identification of
opinions for customers

What are the
problems?
• Exponential growth of
users’ opinions
• Limitations of human
analysis
• Accuracy of human
analysis

Machines can be trained
to take over human
analysis with advanced
computer technology
and it is done with LOW

Some Limitations of
machines
• Unable to read like a
human
• No emotions
• Cannot detect
sarcasm
• Expression of
sentiments in different
topics and domains
• Polarity analysis
• Facts Vs Opinion

Some machine
limitation examples
• “The service is as
good as none”.
Negation not obvious
to machine

• “Swimming pool is big
enough to swim with
comfort”, “There is a
big crowd at the
counter complaining”.
Polarity might change
with context.

Machine
Learning
• A tool for data mining and
intelligent decision support
• Application of computer
algorithms that improve
automatically through
experience

MASTERING DATA MINING BY BERRY AND LINOFF, 2000

Types of Machine
learning
• Supervised Learning
• A training set is
provided (data with
correct answers)
which is used to mine
for known patterns
• Unsupervised Learning
• Data are provided
with no prior
knowledge of the
hidden patterns that
they contain.

Supervised Learning
techniques
• Rule Mining and Rule
learning
• Bayesian Networks
• Support Vector
Machine

Project
Objective
• Prediction of sentence polarity
• Classification of polarity for
sentiment lexicon
• Detection of relations

Pre-requisite
• Large data set
• Prior knowledge
relevant to the domain,
in our case the hotel
domain
• E.g. rating
• Sentiment lexicon for
sentiment analysis
• Data selection for
reliability and
standards

Cleaning the “Dirty”
Data (60% of effort)
• Frequent problem: Data
inconsistencies
• Duplicate data
• Spelling Errors != Trim from
data
• Foreign accent and characters
• Singular / Plural conversion
• Punctuation removal /
replacement
• Noise and incomplete data
• Naming convention misused,
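
A few of the cleaning steps listed above can be sketched in Python. This is only an illustration under stated assumptions, not the pipeline used in the project: the function names (`strip_accents`, `singularize`, `clean_review`, `clean_reviews`) are hypothetical, the plural handling is deliberately naive, and spelling correction is left out.

```python
import string
import unicodedata


def strip_accents(text: str) -> str:
    """Replace accented characters with their closest unaccented letters."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def singularize(word: str) -> str:
    """Very naive plural-to-singular conversion (illustrative only).

    A real system would use a lemmatizer; this rule mishandles words
    such as "always".
    """
    if len(word) > 4 and word.endswith("ies"):
        return word[:-3] + "y"
    if len(word) > 4 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


# Map every punctuation character to a space so "clean,quiet" stays two words.
PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def clean_review(text: str) -> str:
    """Lower-case, strip accents, replace punctuation, singularize words."""
    text = strip_accents(text).lower().translate(PUNCT_TO_SPACE)
    return " ".join(singularize(word) for word in text.split())


def clean_reviews(reviews):
    """Clean every review and drop empty or duplicate entries, keeping order."""
    seen = set()
    cleaned = []
    for review in reviews:
        text = clean_review(review)
        if text and text not in seen:
            seen.add(text)
            cleaned.append(text)
    return cleaned


if __name__ == "__main__":
    sample = ["Great pool!", "great pool", "", "Rooms were dirty."]
    print(clean_reviews(sample))
```

Duplicates are detected after normalization, so "Great pool!" and "great pool" count as the same review.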

Data Preprocessing
(Laundering)
• Part of Speech Tagging (POS)
using Brill Tagger

• Polarity tagging using

Findings
• Part of Speech Tagging (POS)
using Brill Tagger - NO
PROBLEM
- 95% accuracy in POS tagging
of words after data cleaning

Findings
• Polarity tagging using
sentiment lexicon – BIG
PROBLEM
- 40% of sentiment words are not
found in the sentiment lexicon
- 10% of sentiment words with a
positive or negative polarity
are found in the neutral section
of the sentiment lexicon
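
How such coverage figures can be measured is sketched below. The lexicon, the word list and the function names (`tag_polarity`, `coverage_report`) are hypothetical toy values chosen for illustration; they are not the project's lexicon or data.

```python
def tag_polarity(words, lexicon):
    """Return (word, label) pairs; the label is 'missing' if the lexicon has no entry."""
    return [(word, lexicon.get(word, "missing")) for word in words]


def coverage_report(tagged):
    """Return the fraction of tagged words that fall under each label."""
    total = len(tagged)
    if total == 0:
        return {}
    counts = {}
    for _, label in tagged:
        counts[label] = counts.get(label, 0) + 1
    return {label: count / total for label, count in counts.items()}


# Hypothetical toy lexicon: "big" and "small" sit in the neutral section
# although they can carry polarity in the hotel domain.
TOY_LEXICON = {
    "clean": "positive",
    "rude": "negative",
    "big": "neutral",
    "small": "neutral",
}

if __name__ == "__main__":
    words = ["clean", "rude", "big", "spacious", "cosy"]
    print(coverage_report(tag_polarity(words, TOY_LEXICON)))
```

Words reported as `missing` or `neutral` are the candidates for lexicon expansion.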

Problems
• Sentiment lexicon is not
comprehensive enough for the
machine learning technique
adopted
• Sentiment words whose polarity
is domain dependent are
found in the neutral section of
the sentiment lexicon
• Polarity of sentiment words
can also change within the
domain even though they are
domain dependent

Solution
• Classify the polarity of
unlabeled sentiment words
using rule-based mining
• Classify domain dependent
sentiment words
• Establish word relations
between labeled and unlabeled
sentiment words

Data Processing
• Rule-based mining using
conjunction and punctuation

Polarity Assignment Rules

Polarity    Pattern
Same        Adj – AND/OR – Adj
Opposite    Neg-Adj – AND/OR – Adj / Adj – AND/OR – Neg-Adj
Same        Neg-Adj – AND/OR – Neg-Adj
Opposite    Adj – BUT/NOR – Adj
Same        Neg-Adj – BUT/NOR – Adj / Adj – BUT/NOR – Neg-Adj
Opposite    Neg-Adj – BUT/NOR – Neg-Adj
Same        Adj , Adj
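
The rules above reduce to one observation: AND, OR and a comma keep the polarity, BUT and NOR flip it, and each negation flips it once more. A minimal Python sketch of that reading follows. The function names `relation` and `infer_polarity` are hypothetical, and allowing negation around a comma is an assumption that goes beyond the listed rules.

```python
SAME_CONNECTIVES = {"and", "or", ","}
OPPOSITE_CONNECTIVES = {"but", "nor"}
FLIP = {"positive": "negative", "negative": "positive"}


def relation(connective, first_negated=False, second_negated=False):
    """Return 'same' or 'opposite' for two adjectives joined by a connective."""
    connective = connective.lower()
    if connective in SAME_CONNECTIVES:
        opposite = False
    elif connective in OPPOSITE_CONNECTIVES:
        opposite = True
    else:
        raise ValueError(f"unsupported connective: {connective!r}")
    # Each negated adjective flips the relation once.
    if first_negated:
        opposite = not opposite
    if second_negated:
        opposite = not opposite
    return "opposite" if opposite else "same"


def infer_polarity(known_polarity, connective,
                   known_negated=False, unknown_negated=False):
    """Infer the polarity of an unlabeled adjective from a labeled neighbour."""
    if relation(connective, known_negated, unknown_negated) == "same":
        return known_polarity
    return FLIP[known_polarity]


if __name__ == "__main__":
    # "clean and spacious": "spacious" inherits the polarity of "clean".
    print(infer_polarity("positive", "and"))
    # "not clean but cosy": Neg-Adj BUT Adj, so "cosy" shares the polarity of "clean".
    print(infer_polarity("positive", "but", known_negated=True))
```

Encoding the rules as two flips keeps the function short and makes each row of the rule list checkable against it.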

Data Processing
• Relation Network – Aspect –
Sentiment word pair

Analysis
• Using the expanded sentiment
lexicon, we analyze the sentiment
polarity by doing a sentiment
lookup using a Bayesian Network

Bayesian
• To determine polarity of
sentiments

P(X | Y) = P(X) P(Y | X) / P(Y)

• Probability that a sentiment is
positive or negative, given its
contents
• Assumption: There is no link
between words
• P(sentiment | sentence) =
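
Under the "no link between words" assumption, Bayes' rule leads to a naive Bayes classifier: the score of a class is its prior multiplied by the per-word likelihoods, and P(Y) can be dropped because it is the same for every class. The sketch below is an assumption about how this could be implemented, not the project's code. The class name, the add-one smoothing and the toy training sentences are all hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesSentiment:
    """Tiny multinomial naive Bayes classifier with add-one smoothing."""

    def __init__(self):
        self.class_counts = Counter()            # sentences seen per class
        self.word_counts = defaultdict(Counter)  # word frequencies per class
        self.vocabulary = set()

    def train(self, labelled_sentences):
        for words, label in labelled_sentences:
            self.class_counts[label] += 1
            for word in words:
                self.word_counts[label][word] += 1
                self.vocabulary.add(word)

    def log_scores(self, words):
        """Return log P(class) + sum of log P(word | class) for every class."""
        total_sentences = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        scores = {}
        for label, count in self.class_counts.items():
            score = math.log(count / total_sentences)
            total_words = sum(self.word_counts[label].values())
            for word in words:
                # Add-one smoothing keeps unseen words from zeroing the product.
                likelihood = (self.word_counts[label][word] + 1) / (total_words + vocab_size)
                score += math.log(likelihood)
            scores[label] = score
        return scores

    def predict(self, words):
        scores = self.log_scores(words)
        return max(scores, key=scores.get)


# Hypothetical toy training data, for illustration only.
TRAINING = [
    (["clean", "friendly"], "positive"),
    (["spacious", "clean"], "positive"),
    (["dirty", "rude"], "negative"),
    (["noisy", "dirty"], "negative"),
]

if __name__ == "__main__":
    model = NaiveBayesSentiment()
    model.train(TRAINING)
    print(model.predict(["clean", "spacious"]))
```

Working in log space avoids numerical underflow when a sentence contains many words.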

Validation
• Precision = N (agree & found) /
N (found)
• High precision means most of
the sentiment words found by
the system are correct
• Recall = N (agree & found) / N
(agree)
• High recall means most of

Validation Results
• It is found that out of the 350
aspect-unlabelled sentiment
word pairs,
• Only 194 are found by the
methods. Thus, the precision is
about 57%.
• The recall is also not very high;
only 126 words are correctly
labelled by the system, which is
about 63%.

Discussion
• The results will improve if more
rules are applied, such as the
inclusion of more adverbs such
as “excessively” as negation
words.
• There might not be enough
data for the system to work
on. There are only 350 aspect-
unlabelled sentiment word
pairs for the application to
work with.
• This, however, requires more

Conclusion
• A comprehensive sentiment
lexicon is a simple yet
effective solution to sentiment
analysis as it does not require
prior training
• Current sentiment lexicon does
not capture the domain and
context sensitivities of
sentiment expressions

Conclusion
• This leads to poor coverage
• Thus, expanding the general
sentiment lexicon to capture
domain and context
sensitivities of sentiment
expressions is advocated
