This is a review of the bioRxiv preprint "EMT network-based feature selection improves prognosis prediction in lung adenocarcinoma" by Borong Shao, Maria Bjaanæs, Åslaug Helland, Christof Schütte, Tim Conrad, doi:10.1101/410472. This review was compiled from a discussion during the live-streamed Bioinformatics preprint journal club as part of an Open Access Week effort organized by the PREreview team and PLOS. Event details can be found here, and the collaborative Etherpad showing all the journal club notes can be found here.
In addition to those named as authors above, the participants who wished to be acknowledged for their contributions to this review are as follows: Samantha Hindle, Paul Goetsch, and Bradly Alicea.
Summary
The goal of this preprint is to demonstrate the utility of a phenotype-relevant network-based feature selection (PRNFS) framework for improving the prediction of cancer prognosis from multiple sets of high-dimensional omics data. The proposed network describes biological interactions pertaining to epithelial-to-mesenchymal transition (EMT) and is applied to prognosis prediction in lung adenocarcinoma.
All participants found the research very interesting and reported that, for the most part, the results supported the conclusions. However, one-third of the participants reported difficulty understanding the methods, which appeared incomplete or not sufficiently clear for another researcher to replicate the findings. The major problem, reported by two-thirds of the participants, was that the figures and tables were hard to read and interpret.
Major comments
Many journal club participants recognized the importance of the study and the applicability of the results to research questions beyond cancer prognosis. Throughout the preprint, the authors make connections between gene expression networks and phenotypic processes in a novel way using a variety of methods. Many participants suggested including a figure showing the network, along with a diagram to help the reader navigate the comparisons between methods and appreciate the advantages of the proposed approach over alternative ones. Because the lead author was present during the journal club, we learned that more details on the network are available in the author's GitHub repository and that the link is in the manuscript; however, the link currently leads to a 404 page. Since many readers missed the link, it was suggested that the authors make it more prominent in the manuscript. Additionally, for the GitHub repository to be useful and accessible, it would help to have a short description of its content in a README.md page (see GitHub guide).
Furthermore, it would be helpful for the reader to be able to tease apart the contribution of feature selection alone from that of the classification methods. For example, this would help address how much selecting features based on the EMT-based phenotype GRN improves predictions compared to random signatures. One participant suggested using the framework developed by Venet et al. (2011) to rapidly assess comparisons between networks and validate improvements of one over another.
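To make the random-signature comparison concrete, the following is a minimal sketch of the general idea, not the authors' method and not the implementation of the framework mentioned above. It scores a candidate feature set and compares that score against scores from randomly drawn feature sets of the same size, yielding an empirical p-value. The scoring function (absolute correlation between mean signature expression and a continuous outcome), the function names, and the synthetic data are all assumptions made for illustration; a real analysis would substitute the survival model and performance metric used in the preprint.

```python
import numpy as np


def signature_score(X, y, idx):
    """Hypothetical score: |Pearson correlation| between the mean
    expression of the selected features and the outcome."""
    mean_expr = X[:, idx].mean(axis=1)
    return abs(np.corrcoef(mean_expr, y)[0, 1])


def random_signature_p_value(X, y, signature_idx, n_random=1000, seed=0):
    """Compare a signature's score with same-sized random signatures."""
    rng = np.random.default_rng(seed)
    observed = signature_score(X, y, signature_idx)
    k = len(signature_idx)
    null_scores = np.array([
        signature_score(X, y, rng.choice(X.shape[1], size=k, replace=False))
        for _ in range(n_random)
    ])
    # Add-one correction so the empirical p-value is never exactly zero.
    p_value = (1 + np.sum(null_scores >= observed)) / (1 + n_random)
    return observed, p_value


# Synthetic demonstration (assumed data, not from the preprint):
# the outcome is driven by the first 10 of 500 features.
demo_rng = np.random.default_rng(42)
X = demo_rng.normal(size=(100, 500))
signature = np.arange(10)
y = X[:, signature].mean(axis=1) + demo_rng.normal(scale=0.1, size=100)

observed, p_value = random_signature_p_value(X, y, signature, n_random=200)
print(f"observed score = {observed:.3f}, empirical p = {p_value:.4f}")
```

A small empirical p-value here would indicate that the candidate signature outperforms most random signatures of the same size under the chosen score; the same comparison could be repeated per classifier to separate the effect of feature selection from that of the classification method.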
Other comments from the participants suggested ways to improve the readability of the manuscript and help readers more easily understand the main takeaways of the results. For example, it was suggested that the authors combine Figures S5, S6, and S7 into one figure with three panels for a more direct comparison between GE+DM and GE or DM independently. For similar reasons, it was also suggested that the authors select a smaller number of "essential" figures and tables for the main manuscript and move the remaining figures and tables to the supporting information. Additional suggestions are listed below.
Minor comments, suggestions, and typos