Evaluating software quality in use using user reviews mining
1. Does the paper propose a new opinion mining approach?
Yes
2. Which opinion mining techniques are used (list all of them, clearly stating their name/reference)?
A lexicon-based approach built from two lists of sentiment words (positive and negative), combined with a rule-based classification method.
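The combination described above (sentiment-word lists plus rules) can be illustrated with a minimal sketch. The word lists, the negation rule, and all function names below are illustrative assumptions, not the paper's actual lexicon or rule set.

```python
# Illustrative lexicon-plus-rules sentiment classifier.
# These word lists are placeholders, NOT the paper's lists.
POSITIVE = {"good", "great", "fast", "reliable", "easy"}
NEGATIVE = {"bad", "slow", "crash", "buggy", "annoying"}
NEGATORS = {"not", "never", "no"}

def classify_sentence(sentence: str) -> str:
    """Return 'positive', 'negative', or 'neutral' for one sentence."""
    tokens = sentence.lower().split()
    score = 0
    for i, tok in enumerate(tokens):
        polarity = 1 if tok in POSITIVE else -1 if tok in NEGATIVE else 0
        # Example rule: a negator directly before a sentiment word flips it.
        if polarity and i > 0 and tokens[i - 1] in NEGATORS:
            polarity = -polarity
        score += polarity
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(classify_sentence("the player is fast and easy"))  # positive
print(classify_sentence("not reliable and it crash"))    # negative
```

A real system of this kind would use much larger curated lexicons and several hand-written rules; the sketch only shows how lexicon lookups and a rule interact.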
3. Which opinion mining approaches in the paper are publicly available? Write down their name and links. If no approach is publicly available, leave it blank or None.
None
4. What is the main goal of the whole study?
Mining user feedback to evaluate software quality in use. Two lists of sentiment words are combined with a rule-based classifier (used as the machine-learning component). An ontology and the classifier together extract information from users' reviews; this information is then used to generate the quality-in-use scores proposed by the authors.
5. What the researchers want to achieve by applying the technique(s) (e.g., calculate the sentiment polarity of app reviews)?
Sentiment polarity of users' reviews.
6. Which dataset(s) the technique is applied on?
To evaluate their work, the authors prepared a data set of reviews collected from the website www.Cnet.com, covering 3 categories of software (antivirus, video player, and maintenance & optimization). Each category contains 10 software products, and 100 reviews were collected per product, giving 3,000 reviews in total. For each review, five items were gathered: number of rating stars, one-line summary, pros, cons, and summary. In total, the 18,679 sentences in the 3,000 reviews were labeled as positive, neutral, or negative opinions and mapped by experts to the quality-in-use characteristic each sentence concerns.
7. Is/Are the dataset(s) publicly available online? If yes, please indicate their name and links.
No
8. Is the application context (dataset or application domain) different from that for which the technique was originally designed?
No
9. Is the performance (precision, recall, run-time, etc.) of the technique verified? If yes, how did they verify it and what are the results?
Yes: precision, recall, F1, and accuracy. Performance is reported for various settings (e.g., 2 vs. 4 rule sets) at two levels (sentence polarity and review polarity). Best sentence-level results: F1 = 58.51% for negative and F1 = 80.77% for positive. At the review level, only accuracy is reported (best accuracy = 58.51%).
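For reference, the per-class metrics named above follow the standard definitions. The sketch below computes them on made-up labels (the `gold`/`pred` lists and the `prf1` helper are illustrative, not the paper's data or code):

```python
def prf1(gold, pred, label):
    """Per-class precision, recall, and F1 from parallel label lists."""
    tp = sum(1 for g, p in zip(gold, pred) if g == p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Toy example with three polarity classes.
gold = ["pos", "pos", "neg", "neu", "neg"]
pred = ["pos", "neg", "neg", "neu", "pos"]
p, r, f = prf1(gold, pred, "pos")            # 0.5, 0.5, 0.5
accuracy = sum(g == q for g, q in zip(gold, pred)) / len(gold)  # 0.6
```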
10. Does the paper replicate the results of previous work? If yes, leave a summary of the findings (confirm/partially confirms/contradicts).
No
11. What success metrics are used?
Precision, recall, F1, and accuracy (see answer 9).
12. Write down any other comments/notes here.
-