Leveraging small software engineering data sets with pre-trained neural networks

1. Does the paper propose a new opinion mining approach?

Yes

2. Which opinion mining techniques are used (list all of them, clearly stating their name/reference)?

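The authors retrain the AWD-LSTM deep learning model (both pre-trained and trained from scratch). The results for the other techniques (SentiStrength, SentiStrength-SE, NLTK, Stanford CoreNLP and Stanford CoreNLP SO) are taken from Lin et al. (ICSE 2018).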

3. Which opinion mining approaches in the paper are publicly available? Write down their name and links. If no approach is publicly available, leave it blank or None.

None. I could not find either the trained model or the code used.

4. What is the main goal of the whole study?

To use pre-trained neural networks to analyse the sentiment of software engineering (SE) texts when only small SE data sets are available.

5. What do the researchers want to achieve by applying the technique(s) (e.g., calculate the sentiment polarity of app reviews)?

Same as question 4: classify the sentiment polarity of SE texts.

6. Which dataset(s) is the technique applied to?

Stack Overflow sentences (1,500), app reviews (341) and JIRA issues (926), all collected by Lin et al. (ICSE 2018).

7. Is/Are the dataset(s) publicly available online? If yes, please indicate their name and links.

Yes. All data come from Lin et al.'s replication package: https://sentiment-se.github.io/replication.zip

8. Is the application context (dataset or application domain) different from that for which the technique was originally designed?

No

9. Is the performance (precision, recall, run-time, etc.) of the technique verified? If yes, how did they verify it and what are the results?

Not in the sense of independent verification. The authors report the number of correct predictions, accuracy, and the F-scores for the positive, neutral and negative classes. AWD-LSTM outperforms the other techniques compared (SentiStrength, SentiStrength-SE, NLTK, Stanford CoreNLP and Stanford CoreNLP SO).

10. Does the paper replicate the results of previous work? If yes, leave a summary of the findings (confirms/partially confirms/contradicts).

Partially. They use the same data as Lin et al. and compare their results with the previously published ones, but they do not position the study as a replication, since their focus is on benchmarking techniques rather than on reproducing the earlier results.

11. What success metrics are used?

N/A

12. Write down any other comments/notes here.

The code does not appear to be publicly available; it may be worth asking the authors whether they can share it.