Too Many User-Reviews, What Should App Developers Look at First?
1. Does the paper propose a new opinion mining approach?
No
2. Which opinion mining techniques are used (list all of them, clearly stating their name/reference)?
Two approaches are tested:
- SentiStrength: http://sentistrength.wlv.ac.uk/
- Natural Language Toolkit (NLTK): https://www.nltk.org/
After the preliminary sanity check on a small subset (see the answer to question 9 below), the authors used SentiStrength in their full study.
3. Which opinion mining approaches in the paper are publicly available? Write down their name and links. If no approach is publicly available, leave it blank or None.
- SentiStrength: http://sentistrength.wlv.ac.uk/
- Natural Language Toolkit (NLTK): https://www.nltk.org/
4. What is the main goal of the whole study?
The paper presents an approach to help developers find the key topics of user-reviews that are significantly related to the star-ratings of a given category.
5. What do the researchers want to achieve by applying the technique(s) (e.g., calculate the sentiment polarity of app reviews)?
Sentiment analysis is used to preprocess the dataset by filtering out reviews whose sentiment scores are inconsistent with their star-ratings. From the paper: 'we only consider the user-reviews with consistent sentiment scores and star-ratings to reduce the risk of having inconsistent user-reviews skew the findings.'
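The consistency filter quoted above can be illustrated with a minimal sketch. The exact rule the authors used is not given in this summary, so everything below is an assumption made for illustration: the `Review` type, the helper names, the choice to sum SentiStrength's positive (1 to 5) and negative (-1 to -5) scores into one polarity, and the treatment of 3 stars as neutral are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Review:
    """Hypothetical container for one user-review (not from the paper)."""
    text: str
    stars: int  # star-rating, 1..5
    pos: int    # SentiStrength positive strength, 1..5
    neg: int    # SentiStrength negative strength, -1..-5


def sign(value: int) -> int:
    """Return -1, 0, or 1 depending on the sign of value."""
    return (value > 0) - (value < 0)


def sentiment_polarity(review: Review) -> int:
    # Assumption: overall polarity is the sign of pos + neg.
    return sign(review.pos + review.neg)


def rating_polarity(review: Review) -> int:
    # Assumption: 4-5 stars is positive, 1-2 is negative, 3 is neutral.
    return sign(review.stars - 3)


def filter_consistent(reviews: list[Review]) -> list[Review]:
    """Keep only reviews whose sentiment polarity matches their rating polarity."""
    return [r for r in reviews if sentiment_polarity(r) == rating_polarity(r)]


sample = [
    Review("Great app, love it", stars=5, pos=4, neg=-1),    # consistent
    Review("Terrible, keeps crashing", stars=5, pos=1, neg=-4),  # inconsistent
    Review("It is okay", stars=3, pos=1, neg=-1),             # neutral on both
]
kept = filter_consistent(sample)
```

Under these assumed rules, the second review is dropped because its text scores as negative while its rating is positive.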
6. Which dataset(s) is the technique applied to?
4,193,549 user-reviews of 623 Android apps, collected from the Google Play Store across ten different categories.
7. Is/Are the dataset(s) publicly available online? If yes, please indicate their name and links.
No
8. Is the application context (dataset or application domain) different from that for which the technique was originally designed?
Yes, both SentiStrength and NLTK are general-purpose sentiment analysis tools validated on non-technical data.
9. Is the performance (precision, recall, run-time, etc.) of the technique verified? If yes, how did they verify it and what are the results?
The authors manually investigated the output of both SentiStrength and NLTK on a sample of 384 user-reviews (a confidence level of 95% and a confidence interval of 5%). SentiStrength achieved 74% correct sentiment scores, while NLTK achieved 62% correct sentiment scores. Based on this sanity check, the authors decided to adopt SentiStrength in their study.
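The sample size of 384 matches what the standard formula for estimating a proportion yields at a 95% confidence level and a 5% margin. The sketch below is an assumption about how such a figure is typically derived, not a description of the authors' procedure. The function name, the worst-case proportion p = 0.5, the finite population correction, and rounding to the nearest integer are all choices made here for illustration.

```python
def sample_size(population: int, z: float = 1.96,
                margin: float = 0.05, p: float = 0.5) -> int:
    """Sample size for estimating a proportion (Cochran's formula).

    z      : z-score for the confidence level (1.96 for 95%)
    margin : half-width of the confidence interval (0.05 for +/-5%)
    p      : assumed proportion; 0.5 gives the most conservative size
    """
    # Sample size for an effectively infinite population.
    n0 = (z ** 2) * p * (1 - p) / (margin ** 2)
    # Finite population correction; negligible for very large populations.
    n = n0 / (1 + (n0 - 1) / population)
    # Assumption: round to the nearest integer (some tools round up instead).
    return round(n)


# Population taken from the dataset size reported in question 6.
n = sample_size(4_193_549)
print(n)  # 384
```

Rounding up instead of to the nearest integer would give 385, so the exact figure depends on the convention used.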
10. Does the paper replicate the results of previous work? If yes, leave a summary of the findings (confirm/partially confirms/contradicts).
No
11. What success metrics are used?
Sentiment analysis is assessed in terms of accuracy (% of correctly classified cases).
12. Write down any other comments/notes here.
-