1 Software publishing, licensing, and citation
Matthias Liffers
Australian Research Data Commons
ARDC is enabled by NCRIS.
DOI 10.5281/zenodo.5091717
This presentation is intended to be delivered in approximately thirty minutes, and often stimulates enough discussion to fill an entire hour. Presenters should familiarise themselves with:
DOIs
Code repositories like GitHub and GitLab
Publication platforms like Zenodo and Figshare
Software licences

2 We’re already familiar with citing sources in papers - it’s how we avoid plagiarism and make sure that we give credit to all those on whose work we build. Over the past decade, the world has also increasingly acknowledged data as a research output that should be published and cited.

3 Why do we want to cite research software?

4 Most software developed for research is, unfortunately, funded on a rather temporary basis, or not funded at all! After all, research funding can generally only be spent on research and not on developing the infrastructure on which research depends.
Video: Research Software Sustainability Webinar - Recorded 27 May 2020
Dan Katz is an international expert on the topic of research software sustainability - making sure that the research software developed today remains available in the long term. In mid-2020 he gave an excellent webinar on all the different issues that I encourage you to watch.
https://www.youtube.com/watch?v=RAwO8Q9oKLA

5 An increasing number of publishers require code and/or software used in analyses to be made available alongside data
This is a statement from the AGU:
For the purposes of this policy, data include, but are not limited to, the following:
Data used to generate, or be displayed in, figures, graphs, plots, videos, animations, or tables in a paper.
New protocols or methods used to generate the data in a paper.
New code/computer software used to generate results or analyses reported in the paper.
Derived data products reported or described in a paper.
https://publications.agu.org/author-resource-center/publication-policies/data-policy/
AGU - probably the leading publisher of Earth and space science research.

6 Nature also requires the availability of code for at least the peer review process.
Authors must make available upon request, to editors and reviewers, any previously unreported custom computer code or algorithm used to generate results that are reported in the paper and central to its main claims. Any reason that would preclude the need for code or algorithm sharing will be evaluated by the editors who reserve the right to decline the paper if important code is unavailable.

7 Why is it important for important code to be made available? Take these scripts, developed in Python, that have been used in hundreds of computational chemistry studies and publications. The scripts rely on the underlying operating system to provide them with lists of data files in a particular order. The problem is that different operating systems provide these file lists in different orders, which means that running the scripts on Windows gives different results from running them on Linux. The upshot is that the analytical results would be out by about 10% or so. This difference wasn’t big enough to be immediately noticeable to researchers, but it was big enough to change the outcome of the research. If these scripts were not available for others to inspect, this bug might not have been discovered.

8 Making software citable also means that we are co-opting an existing mechanism for academic recognition - citations! There are very few institutions that do not rely on citation counts and derivative metrics like H-indices for academic recruitment and promotion. The pictured tweet is an example of a researcher using some software that the CSIRO has made available - here it has calculated the sizes of grains in a petri dish.
Wouldn’t it be great if Stephanie could also cite the software in her paper, and give formal credit to the research software engineers? Even better would be if the research software engineers could then use the citations of their software when applying for jobs or academic promotion.

9 Zenodo has started doing some great work by partnering with the Astrophysics Data System
https://blog.zenodo.org/2019/01/10/2019-01-10-asclepias
10.5281/zenodo.598352
Now the problem is that the mechanisms and workflows for counting software citations are still in their infancy. Zenodo has started doing some great work by partnering with ADS - the Astrophysics Data System - to provide citation counts for Zenodo objects cited in papers indexed in ADS. This is by no means a global solution for all disciplines and software repositories, but it’s a good first step.
With the example 10.5281/zenodo.598352 we can have a closer look at a record in Zenodo that describes some software code. Note that there are multiple DOIs for this code - there is an umbrella DOI that refers to the software code in general, and then each release of the code has its own DOI.
DOI versioning is especially important because software changes its behaviour between releases. Bugs are fixed, new features are introduced, and the results you get from a particular analysis can change. When citing software, cite the exact version (or release) that you used in your analysis. This makes it easier to track the provenance of your results and saves you from future confusion.

10 Making your Software Citable
Three prerequisites, two steps
Getting a DOI and making your software code citable is much more straightforward than it used to be - there are automated workflows that let you embed the DOI process in your normal software development practices. You do, however, need to do a little setup before it will work for you.
I will outline three things that you need, and then two things you need to do, in order to make your software easily citable by others.

11 Prerequisite one: A code repository
https://github.com/
Using a code repository in conjunction with a version control system adds a layer of resilience to your software project, and helps to establish good habits around development, especially if you work in a team. You will be able to track changes to your code and, should you need to, backtrack to a previous point in time. Tracking versions of your code is also important for understanding the way in which your analytical methods or models change over time.
Commercial software development platforms include Bitbucket, GitHub, and GitLab. Your institution may also have an in-house code repository; check with your IT department.

12 Prerequisite two: An ORCID iD
https://orcid.org/
An ORCID iD allows you to unambiguously associate yourself with your research outputs (not just publications), and makes it easier for you to upload your academic work to research management systems, such as those used by research funders. Getting an ORCID iD costs nothing, and it will follow you throughout your career. Using your ORCID iD in outputs including publications often leads to these being automatically recorded, saving you time in the long run.
Note that if you have written your software in a team, you should encourage your colleagues to get an ORCID iD too.

13 Prerequisite three: A licence
https://choosealicense.com/
Applying a licence to your software lets third parties know under which circumstances your software may be used and reused, if at all. There are many software licences to choose from, so pick one that will provide you with the features that are appropriate for you and your project.
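Before moving on to the two steps, the file-ordering bug described earlier (scripts that depend on the order in which the operating system lists data files) can be made concrete. The following is a minimal sketch, not the code of the affected scripts: the function name `list_data_files`, the `*.out` pattern, and the file names are all hypothetical. It assumes the fix is simply to sort the file list explicitly instead of trusting the order the operating system returns.

```python
import glob
import os
import tempfile


def list_data_files(directory, pattern="*.out"):
    """Return matching file paths in a deterministic, name-sorted order.

    glob.glob() returns paths in whatever order the operating system
    supplies them, so the explicit sorted() call is what makes the
    order the same on every platform.
    """
    return sorted(glob.glob(os.path.join(directory, pattern)))


def demo():
    # Create a few hypothetical data files in a temporary directory,
    # deliberately written out of alphabetical order.
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("run_c.out", "run_a.out", "run_b.out"):
            with open(os.path.join(tmp, name), "w") as handle:
                handle.write("placeholder\n")
        return [os.path.basename(path) for path in list_data_files(tmp)]


if __name__ == "__main__":
    print(demo())
```

A bug like this is far easier to find, and the fix easier to share, when the code is in a public repository that others can read and when each corrected release can be cited separately.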
14 Step one: Link to Zenodo (or Figshare)
https://guides.github.com/activities/citable-code/
https://help.figshare.com/article/how-to-connect-figshare-with-your-github-account
Take a copy of your software from your code repository and upload it to an open access repository. You can use a public repository such as Zenodo, which enjoys an integration with GitHub that automates the transfer process. Along with other metadata, it is a good idea to give your code a formal version number, even if it is single use and you do not anticipate releasing another version.
Ideally, the open access repository will provide you with a DOI that you can use to unambiguously identify a particular version of your code. If your chosen open access repository does not provide a DOI, consider asking your institution’s library whether they can mint a DOI for you.
Zenodo can also provide a DOI that points to all versions of your snapshotted software. As I previously showed on the Zenodo record - updating software may change the results of analyses and it is

15 Step two: Create a citation statement
And add it to your software if you haven’t already been doing so
Add a “cite as” statement to the documentation for your software, including the DOI. This should be structured similarly to a standard bibliographic entry.
Cite your own software when publishing your papers. Your papers and code can be considered related, but separate, research outputs.

16 For example:
Coon, E., Berndt, M., Jan, A., et al. (2020, March 25). Advanced Terrestrial Simulator (ATS) v0.88 (Version 0.88) [Computer software]. Zenodo. https://doi.org/10.5281/zenodo.3727209
More options for structuring citations are outlined in:
Katz DS, Chue Hong NP, Clark T et al. Recognizing the value of software: a software citation guide [version 2; peer review: 2 approved].
F1000Research 2021, 9:1257 https://doi.org/10.12688/f1000research.26932.2

17 Provide useful documentation for your software, and pre-built packages or container images that others can use
Image: Will it work? by Randall Munroe, CC BY-NC 2.5
Consider this a good time to also add some useful documentation to your software. Better yet, make the code easy to run on other systems by providing pre-built packages or container images that others can use.

18 Thank you
Liffers, Matthias (2021, July 12). Software publishing, licensing, and citation. Zenodo. http://doi.org/10.5281/zenodo.5091717
ARDC is enabled by NCRIS.
CONTACT ardc.edu.au contact@ardc.edu.au
FOLLOW Twitter: @ARDC_AU LinkedIn: Australian-research-data-commons
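As a worked illustration of Step two, the “cite as” statement can be generated from a small set of metadata so that it stays consistent between your documentation and your papers. This is a minimal sketch only: the `SoftwareRecord` class, its field names, and the `cite_as` function are hypothetical, and the layout simply mirrors the ATS example shown in this presentation; it is not an official citation style implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SoftwareRecord:
    """Hypothetical container for the metadata a citation needs."""
    authors: List[str]      # authors to list, already formatted "Surname, I."
    more_authors: bool      # True if further authors are abbreviated as "et al."
    date: str               # publication date of this release
    title: str
    version: str            # the exact release being cited
    publisher: str          # the repository that minted the DOI
    doi: str                # version-specific DOI, without the resolver prefix


def cite_as(record: SoftwareRecord) -> str:
    """Build a bibliographic-style "cite as" statement for one release."""
    names = ", ".join(record.authors)
    if record.more_authors:
        names += ", et al."
    return (
        f"{names} ({record.date}). {record.title} "
        f"(Version {record.version}) [Computer software]. "
        f"{record.publisher}. https://doi.org/{record.doi}"
    )


# Metadata taken from the ATS example in this presentation.
ATS = SoftwareRecord(
    authors=["Coon, E.", "Berndt, M.", "Jan, A."],
    more_authors=True,
    date="2020, March 25",
    title="Advanced Terrestrial Simulator (ATS) v0.88",
    version="0.88",
    publisher="Zenodo",
    doi="10.5281/zenodo.3727209",
)

if __name__ == "__main__":
    print(cite_as(ATS))
```

Keeping the version and the DOI as separate fields makes it harder to forget to update the statement when a new release gets its own DOI.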