Published February 14, 2024 | Version v1
Preprint Open

Dataset Artefacts are the Hidden Drivers of the Declining Disruptiveness in Science

  • 1. Vrije Universiteit Brussel
  • 2. KU Leuven
  • 3. Harvard University

Description

3 pages, 2 figures, 3 extended data figures, and Supplementary Information. In submission to Nature.

Abstract

Park et al. [1] reported a decline in the disruptiveness of scientific and technological knowledge over time. Their main finding is based on the computation of CD indices, a measure of disruption in citation networks [2], across almost 45 million papers and 3.9 million patents. Because of a plotting mistake, database entries with zero references were omitted from the plotted CD index distributions while remaining in the analysis [1], which hid a large number of outliers with the maximum CD index of one. Our reanalysis shows that the reported decline in disruptiveness can be attributed to a relative decline of these database entries with zero references. Notably, this was not caught by the robustness checks included in the manuscript. The regression adjustment fails to control for the hidden outliers because they correspond to a discontinuity in the CD index. Proper evaluation of the Monte Carlo simulations reveals that, because the hidden outliers are preserved, even random citation behaviour replicates the observed decline in disruptiveness. Finally, while these papers and patents with supposedly zero references are the hidden drivers of the reported decline, their source documents predominantly do contain references, exposing them as pure dataset artefacts.
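
To illustrate why entries with zero references sit at the maximum CD index, the following is a minimal Python sketch, not the code used in [1] or in the reanalysis. It assumes the commonly used formulation of the CD index from [2], in which each later paper contributes +1 if it cites only the focal paper, -1 if it cites both the focal paper and at least one of its references, and 0 if it cites only the references. For simplicity it ignores the citation time window. The function name `cd_index` and the toy paper identifiers are hypothetical.

```python
def cd_index(focal, references, citations):
    """Simplified CD index of `focal` (assumption: no time window).

    references: iterable of papers cited by the focal paper.
    citations:  dict mapping each later paper to the set of papers it cites.
    Returns None when no later paper cites the focal paper or its references.
    """
    refs = set(references)
    total = 0
    n = 0
    for citing, cited in citations.items():
        if citing == focal:
            continue
        f = int(focal in cited)        # cites the focal paper
        b = int(bool(refs & cited))    # cites at least one of its references
        if f or b:
            n += 1
            total += -2 * f * b + f    # +1, -1, or 0 as described above
    return total / n if n else None


# Toy citation network (hypothetical identifiers).
later_papers = {
    "A": {"focal"},         # cites only the focal paper
    "B": {"focal", "r1"},   # cites the focal paper and a reference
    "C": {"r1"},            # cites only a reference
    "D": {"focal"},         # cites only the focal paper
}

with_refs = cd_index("focal", ["r1", "r2"], later_papers)
without_refs = cd_index("focal", [], later_papers)
print(with_refs, without_refs)
```

When the reference list is recorded as empty, no later paper can cite a reference of the focal paper, so every citing paper contributes +1 and the index is exactly one regardless of actual citation behaviour. This is the discontinuity the abstract refers to.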

Files

Dataset_Artefacts_are_the_Hidden_Drivers_of_the_Declining_Disruptiveness_in_Science.pdf

Additional details

References

  • [1] Park, M., Leahey, E. & Funk, R. J. Papers and patents are becoming less disruptive over time. Nature 613, 138–144 (2023).
  • [2] Funk, R. J. & Owen-Smith, J. A dynamic network measure of technological change. Management Science 63, 791–817 (2017).
  • [3] Waskom, M. Treat binwidth as approximate to avoid dropping outermost datapoints. (2023). https://github.com/mwaskom/seaborn/pull/3489.
  • [4] Lin, Z., Yin, Y., Liu, L. & Wang, D. SciSciNet: A large-scale open data lake for the science of science research. Scientific Data 10, 315 (2023).
  • [5] Uzzi, B., Mukherjee, S., Stringer, M. & Jones, B. Atypical combinations and scientific impact. Science 342, 468–472 (2013).
  • [6] Hofmann, H., Wickham, H. & Kafadar, K. Letter-value plots: Boxplots for large data. Journal of Computational and Graphical Statistics 26, 469–477 (2017).
  • [7] Wu, L., Wang, D. & Evans, J. A. Large teams develop and small teams disrupt science and technology. Nature 566, 378–382 (2019).
  • [8] Lin, Y., Frey, C. B. & Wu, L. Remote collaboration fuses fewer breakthrough ideas. Nature 623, 987–991 (2023).
  • [9] Tang, J. et al. ArnetMiner: Extraction and mining of academic social networks. KDD '08, 990–998 (Association for Computing Machinery, New York, NY, USA, 2008).
  • [10] Ruan, X., Lyu, D., Gong, K., Cheng, Y. & Li, J. Rethinking the disruption index as a measure of scientific and technological advances. Technological Forecasting and Social Change 172, 121071 (2021).
  • [11] Macher, J. T., Rutzer, C. & Weder, R. The illusive slump of disruptive patents. arXiv preprint arXiv:2306.10774 (2023).