Published February 11, 2020 | Version v0.1
Journal article | Open access

How Wikipedia disease information evolve over time? An analysis of disease-based articles changes


Wikipedia, also known as "The Free Encyclopaedia", is one of the largest online repositories of biomedical information in the world, and is nowadays increasingly being used by medical researchers and health professionals alike. In spite of its rising popularity, little attention has been devoted to understanding how such medical information is organised, and especially how it evolves through time. We here present an analysis aimed at characterising this evolution, with a focus on the effects that such dynamics may have on an automated knowledge extraction process. For that, we start from a data set comprising a large number of snapshots of Wikipedia’s disease articles, and the corresponding diagnostic elements as provided by the DISNET project. We then track and analyse how different metrics evolve through time, such as the total article length or the number of medical terms and references. Results highlight some expected facts: for instance, most articles increase their content through time, and hot topics such as Alzheimer’s disease attract the highest number of edits and views. On the other hand, noteworthy behaviours are observed for less well-known diseases, including abrupt changes in the text and the concentration of contributions among a handful of editors. These results stress the importance of using correctly filtered and up-to-date datasets and, more generally, of considering the temporal evolution of the information in Wikipedia.
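To make the kind of tracking described in the abstract more concrete, here is a minimal Python sketch. It is not the DISNET pipeline or the authors' method: the `Snapshot` structure, the helper names, measuring length in characters, the 0.5 relative-change threshold used to flag "abrupt changes", and the top-k share used as a proxy for editor concentration are all hypothetical choices made for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass
class Snapshot:
    """Hypothetical snapshot of one disease article at a given date."""
    taken_on: date
    text: str
    medical_terms: list[str]  # terms assumed to be already extracted from the text
    n_references: int


def metrics(s: Snapshot) -> dict:
    # Per-snapshot metrics of the kind named in the abstract
    # (length is assumed here to be measured in characters).
    return {
        "date": s.taken_on,
        "length": len(s.text),
        "n_terms": len(set(s.medical_terms)),
        "n_refs": s.n_references,
    }


def abrupt_changes(snapshots: list[Snapshot], threshold: float = 0.5) -> list[date]:
    """Dates whose length differs from the previous snapshot by more than
    `threshold` (relative change); the threshold is a hypothetical choice."""
    ordered = sorted(snapshots, key=lambda s: s.taken_on)
    flagged = []
    for prev, cur in zip(ordered, ordered[1:]):
        before, after = len(prev.text), len(cur.text)
        if before and abs(after - before) / before > threshold:
            flagged.append(cur.taken_on)
    return flagged


def top_editor_share(editors: list[str], k: int = 3) -> float:
    """Fraction of all edits made by the k most active editors
    (`editors` holds one username per edit)."""
    if not editors:
        return 0.0
    counts = Counter(editors)
    top = sum(n for _, n in counts.most_common(k))
    return top / len(editors)


# Usage with made-up data.
snaps = [
    Snapshot(date(2017, 1, 1), "a" * 1000, ["fever", "cough"], 5),
    Snapshot(date(2018, 1, 1), "a" * 1100, ["fever", "cough", "rash"], 7),
    Snapshot(date(2019, 1, 1), "a" * 200, ["fever"], 1),
]
print([metrics(s)["length"] for s in snaps])  # [1000, 1100, 200]
print(abrupt_changes(snaps))  # [datetime.date(2019, 1, 1)]
print(top_editor_share(["ann", "ann", "bob", "ann", "cy", "dee"], k=1))  # 0.5
```

Here the 2018 to 2019 drop (1100 to 200 characters, about 82%) is the only change above the threshold, and one editor accounts for half of the six edits; a real analysis would tune these measures against the actual snapshot data.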


The paper is a result of the project "DISNET (Creation and analysis of disease networks for drug repurposing from heterogeneous data sources applied to rare diseases)", which is being developed under grant "RTI2018-094576-A-I00" from the Spanish Ministerio de Ciencia, Innovación y Universidades. Gerardo Lagunes-Garcia's work is supported by the Mexican Consejo Nacional de Ciencia y Tecnología (CONACYT) (CVU: 340523) under the programme "291114 - BECAS CONACYT AL EXTRANJERO". Lucia Prieto Santamaría's work is supported by the "Programa de fomento de la investigación y la innovación (Doctorados Industriales)" from Comunidad de Madrid (grant IND2019/TIC-17159).


Files (8.2 MB)
