Predicting Star Ratings
based on Annotated Reviews
of Mobile Apps
Talk at the 6th International Workshop on Advances in Semantic Information Retrieval (ASIR 2016)
Prof. Dr. Dagmar Monett, Hermann Stolte
Reviews and star ratings
Gdańsk, Poland, September 11–14, 2016
Example of reviews and star ratings of the Evernote app, Google Play Store (07/2016)
Star ratings matter
15% would consider downloading an app with a 2-star rating
50% would consider downloading an app with a 3-star rating
96% would consider downloading an app with a 4-star rating
Source: Apptentive 2015 Consumer Study,
The Mobile Marketer's Guide to App Store Ratings & Reviews
Star ratings matter
[Figure omitted; © and source: Apptentive 2015 Consumer Study, The Mobile Marketer's Guide to App Store Ratings & Reviews]
Our motivation
Some questions
• Could we (i.e., a program) teach users to rate apps consistently with the review they are writing for a mobile app?
• That is, could we (a program) suggest to users the most adequate star rating for a product, based on the semantic orientation of what they have already written in the review?
• Would this improve users' engagement and satisfaction with the app?
Background
Review rating prediction
• Also called sentiment rating prediction:
  – the task of inferring an author's implied numerical rating, i.e., of predicting a rating score from a given written review
• E.g., recommendation systems often suggest products based on the star ratings of similar products previously rated by other users
Suggested readings
Other related work
• Analysing textual reviews and inferring positive/negative/neutral sentiment polarity (Pang et al., 2002; Liu, 2010)
• Using not only textual semantics but also other information, e.g., about the author and/or the product (Tang et al., 2015; Li et al., 2011)
• Considering phrase-level sentiment polarity (Qu et al., 2010)
• Considering aspect-based opinion mining (Zhang et al., 2006; Ganu et al., 2013; Klinger & Cimiano, 2013; Sänger, 2015)
Our approach
• We deal neither with aspect identification nor with sentiment classification
  – we assume these tasks have already been performed before the star ratings are predicted
• We focus on predicting star ratings based solely on available annotated, fine-granular opinions
  – i.e., a complement to works like (Sänger, 2015), which extends (Klinger & Cimiano, 2013) and uses a German annotated corpus of mobile app reviews
The Data
SCARE Corpus
Mario Sänger, Ulf Leser, Steffen Kemmerer, Peter Adolphs, and Roman Klinger. SCARE - The Sentiment Corpus of App Reviews with Fine-grained Annotations in German. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), Portorož, Slovenia, May 2016. European Language Resources Association (ELRA).
• Fine-grained annotations for mobile application reviews from the Google Play Store
• 1,760 German application reviews with 2,487 aspects and 3,959 subjective phrases
• SCARE corpus v.1.0.0 (annotations only)
• Available at http://www.romanklinger.de/scare/
Analysing the Data
Polarity and star ratings
[Figure omitted: 69.1% vs. 23.1%; thumbs-up/thumbs-down view of sentiment (Liu, 2012)]
Avg. of labelled star ratings vs. avg. of subjective phrases' polarity
[Figure omitted]
Number of star ratings vs. number of subjective phrases
[Figure omitted]
Predicting Star Ratings
Prediction process
[Figure omitted: prediction process diagram]
We played with different models
Computational models
For example, a linear regression model with the features:
x0 = 1 (bias term)
x1 : no. of subjective phrases with positive polarity
x2 : no. of subjective phrases with negative polarity
x3 : no. of subjective phrases with neutral polarity
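A feature vector of this shape can be plugged into an ordinary least-squares model. A minimal sketch with invented toy data (the counts and ratings below are purely illustrative, not taken from the SCARE corpus):

```python
import numpy as np

# Hypothetical toy data: each row is one review's feature vector
# [x0 = bias, x1 = #positive, x2 = #negative, x3 = #neutral phrases]
X = np.array([
    [1, 3, 0, 1],
    [1, 0, 4, 0],
    [1, 2, 1, 1],
    [1, 1, 3, 2],
    [1, 4, 0, 0],
], dtype=float)
y = np.array([5, 1, 4, 2, 5], dtype=float)  # labelled star ratings

# Ordinary least squares: theta = argmin ||X @ theta - y||^2
theta, *_ = np.linalg.lstsq(X, y, rcond=None)

def predict_stars(pos, neg, neu):
    """Predict a star rating and clip it to the 1-5 scale."""
    raw = theta @ np.array([1.0, pos, neg, neu])
    return int(np.clip(np.rint(raw), 1, 5))
```

With these toy labels, reviews dominated by positive phrases score higher than those dominated by negative ones.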
Computational models
RSS: review rating score (Ganu et al., 2009, 2013)
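A sketch of the RSS, assuming the commonly cited formulation from Ganu et al. (the fraction of positive phrases, rescaled to the 1-5 star range); the exact definition is in the cited papers:

```python
def review_rating_score(n_pos, n_neg):
    """Review rating score in the spirit of Ganu et al.:
    fraction of positive phrases, rescaled to the 1-5 star range.
    This is an assumed formulation, not the paper's verbatim definition."""
    if n_pos + n_neg == 0:
        return None  # no sentiment-bearing phrases: score undefined
    return 1.0 + 4.0 * n_pos / (n_pos + n_neg)
```

For example, a review with only positive phrases maps to 5 stars, only negative phrases to 1 star, and an even split to 3 stars.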
Experiments
(1) Assessing the importance of sentiment in the reviews:
  • Include neutral phrases (yes/no)?
  • Include reviews with no sentiment (yes/no)?
(2) Using other predictors
• Each individual experiment is run 10,000 times
• Monte Carlo cross-validation: a 70% training set and a 30% testing set, drawn randomly on each iteration
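The Monte Carlo cross-validation scheme can be sketched as follows; the function and parameter names are illustrative, not taken from the paper:

```python
import random
import statistics

def monte_carlo_cv(data, fit, error, iterations=10_000, train_frac=0.7):
    """Monte Carlo cross-validation: each iteration draws a fresh random
    70/30 train/test split, fits a model on the training part, and
    records the error on the held-out part; the mean error is returned."""
    n_train = int(len(data) * train_frac)
    errors = []
    for _ in range(iterations):
        shuffled = random.sample(data, len(data))  # fresh random split
        train, test = shuffled[:n_train], shuffled[n_train:]
        model = fit(train)
        errors.append(error(model, test))
    return statistics.mean(errors)
```

Unlike k-fold cross-validation, the splits are independent draws, so the same review can land in the test set many times across the 10,000 iterations.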
Some results
Best model, exp. (1)
• It considers only the average value of a review's phrase polarities, as a single feature
• Plus:
  – filtering out both subjective phrases with neutral polarity and reviews with no sentiment orientation at all
  – no normalisation
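That single feature can be sketched as below, assuming polarities are encoded as +1 / -1 / 0 (the encoding is an assumption for illustration):

```python
def average_polarity(phrases):
    """Average polarity of a review's subjective phrases after filtering
    out neutral ones; returns None when no sentiment remains, mirroring
    the filtering of reviews with no sentiment orientation at all.
    Polarities are assumed encoded as +1 (positive), -1 (negative),
    0 (neutral). Illustrative sketch, not the authors' exact code."""
    sentiment = [p for p in phrases if p != 0]  # drop neutral phrases
    if not sentiment:
        return None  # review carries no sentiment: filtered from the data
    return sum(sentiment) / len(sentiment)
```

A review annotated with two positive, one negative, and one neutral phrase would thus contribute the single feature value 1/3.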
Results
[Results table omitted]
Conclusion
• Textually derived rating prediction can perform well even when only phrase-level sentiment polarity is available
• Phrases with neutral sentiment can be filtered out of the corpus
• Computing the overall sentiment of a review using the review rating score (Ganu et al., 2009, 2013) provides the best star rating predictions
Further work
• To consider the aspects' relevance
  – aspect-oriented subjective phrases
• To analyse the strength of the opinions (Wilson et al., 2004)
  – not only positive/negative/neutral sentiment
• To explore model types other than linear, multivariate regression
Sources
Related work:
- See the references list in our paper!
  https://www.researchgate.net/publication/304244445_Predicting_Star_Ratings_based_on_Annotated_Reviews_of_Mobile_Apps
Contact:
dagmar@monettdiaz.com
monettdiaz