Retrieving Diverse Opinions from App Reviews

1. Does the paper propose a new opinion mining approach?

Yes

2. Which opinion mining techniques are used (list all of them, clearly stating their name/reference)?

- DIVERSE: a feature- and sentiment-centric retrieval approach (the paper's contribution)
- NLTK: extracting features from the user reviews
- SentiStrength: calculating the sentiment score of each review
- Greedy algorithm: retrieving a set of reviews mentioning a large number of features with a wide range of associated sentiments
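
The greedy retrieval step can be illustrated as a set-cover-style selection. A minimal sketch (my own illustration, not the paper's implementation), assuming each review is reduced to a set of (feature, sentiment) pairs:

```python
# Greedy diversification sketch: at each step, pick the review that covers
# the most (feature, sentiment) pairs not yet covered by the selection.
# Data layout is an assumption for illustration, not the paper's code.

def greedy_diverse(reviews, k):
    """reviews: list of sets of (feature, sentiment) tuples; returns indices."""
    selected, covered = [], set()
    candidates = list(range(len(reviews)))
    for _ in range(min(k, len(reviews))):
        best = max(candidates, key=lambda i: len(reviews[i] - covered))
        if not (reviews[best] - covered):  # no candidate adds new information
            break
        selected.append(best)
        covered |= reviews[best]
        candidates.remove(best)
    return selected

reviews = [
    {("share files", 1), ("upload", -1)},
    {("share files", 1)},
    {("share files", -1), ("sync", 0)},
    {("upload", -1)},
]
print(greedy_diverse(reviews, 3))  # → [0, 2]
```

Review 1 and review 3 are never selected because every (feature, sentiment) pair they mention is already covered, which is exactly the redundancy the approach tries to avoid.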

3. Which opinion mining approaches in the paper are publicly available? Write down their name and links. If no approach is publicly available, leave it blank or None.

NLTK, SentiStrength

4. What is the main goal of the whole study?

To provide developers with a diverse sample of user reviews that is representative of the different opinions and experiences mentioned in the whole set of reviews

5. What do the researchers want to achieve by applying the technique(s) (e.g., calculate the sentiment polarity of app reviews)?

When a developer queries for reviews mentioning one or more features of interest (e.g., "share files"), DIVERSE returns a set of reviews that mention the queried feature(s) and are representative of the positive, negative, and neutral experiences and opinions users have expressed about them
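
As a toy illustration of that behaviour (names and data layout are my assumptions, not the paper's code), the query step can be thought of as filtering by feature and then making sure each sentiment polarity is represented:

```python
# Hypothetical sketch of the query step: each review is a
# (text, features, sentiment) tuple with sentiment in {-1, 0, 1}.

def query_reviews(reviews, feature):
    """Return one representative review per sentiment polarity that mentions `feature`."""
    hits = [r for r in reviews if feature in r[1]]
    by_polarity = {}
    for review in hits:
        by_polarity.setdefault(review[2], []).append(review)
    # Sorted keys give negative, neutral, positive order.
    return [group[0] for _, group in sorted(by_polarity.items())]

reviews = [
    ("Sharing files is great", {"share files"}, 1),
    ("Cannot share files at all", {"share files"}, -1),
    ("Share files works", {"share files"}, 0),
    ("Sync is slow", {"sync"}, -1),
]
print([r[0] for r in query_reviews(reviews, "share files")])
# → ['Cannot share files at all', 'Share files works', 'Sharing files is great']
```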

6. Which dataset(s) the technique is applied on?

2,800 manually labelled reviews (400 per app), part of the dataset from E. Guzman and W. Maalej, "How Do Users Like This Feature? A Fine Grained Sentiment Analysis of App Reviews," Proc. of the International Conference on Requirements Engineering (RE)

7. Is/Are the dataset(s) publicly available online? If yes, please indicate their name and links.

No. The dataset website redirects to an invalid Dropbox link, so the dataset is effectively unavailable

8. Is the application context (dataset or application domain) different from that for which the technique was originally designed?

DIVERSE is a new approach, so this question is not applicable (N/A). For NLTK and SentiStrength: yes, both were originally designed for general-purpose text rather than app reviews

9. Is the performance (precision, recall, run-time, etc.) of the technique verified? If yes, how did they verify it and what are the results?

Yes. For the diversity of the retrieved reviews, the authors treated each feature and its associated sentiment as an information entity and each review as a document, and computed diversity metrics over the resulting rankings. For impact and usefulness, they conducted a controlled experiment measuring the impact in terms of time spent analyzing reviews, and analyzed the usefulness of the approach for making decisions about the evolution of features and for detecting conflicting user opinions.

10. Does the paper replicate the results of previous work? If yes, leave a summary of the findings (confirm/partially confirms/contradicts).

No

11. What success metrics are used?

Diversity metrics: the α-Normalized Discounted Cumulative Gain measure (α-nDCG) proposed by Clarke et al. in "Novelty and Diversity in Information Retrieval Evaluation". Impact and usefulness metrics: review browsing and answering time (impact); a perceived-difficulty questionnaire for the tasks and a semi-structured interview (usefulness)
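
For reference, α-nDCG rewards rankings that surface novel information entities early: an entity's gain is discounted by a factor of (1 − α) for every earlier-ranked document that already covered it. A minimal sketch (my own, treating each review as a set of (feature, sentiment) "nuggets"; the ideal ranking is approximated greedily, as is standard, since computing it exactly is NP-hard):

```python
import math

def alpha_dcg(ranking, alpha=0.5):
    """ranking: list of sets of nuggets, in ranked order."""
    seen = {}  # nugget -> number of earlier reviews covering it
    score = 0.0
    for rank, nuggets in enumerate(ranking, start=1):
        gain = sum((1 - alpha) ** seen.get(n, 0) for n in nuggets)
        for n in nuggets:
            seen[n] = seen.get(n, 0) + 1
        score += gain / math.log2(rank + 1)
    return score

def alpha_ndcg(ranking, alpha=0.5):
    """Normalize by a greedily-built approximation of the ideal ranking."""
    pool, ideal, seen = list(ranking), [], {}
    while pool:
        best = max(pool, key=lambda ns: sum((1 - alpha) ** seen.get(n, 0) for n in ns))
        ideal.append(best)
        for n in best:
            seen[n] = seen.get(n, 0) + 1
        pool.remove(best)
    return alpha_dcg(ranking, alpha) / alpha_dcg(ideal, alpha)

# A ranking that already matches the greedy ideal scores 1.0.
print(alpha_ndcg([{"a", "b"}, {"c"}, {"a"}]))  # → 1.0
```

A ranking that repeats already-covered nuggets early (e.g. placing {"a"} first) scores strictly below 1.0, which is what makes the metric sensitive to redundancy rather than plain relevance.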

12. Write down any other comments/notes here.

-