# Appendix: Finding Open Data

Openly shared data can only be reused if it can be found in the first place, so findability is a key step in accessing and using data. There are three major ways to find Open Data shared by researchers: repositories, web search, and literature search.

### Repositories

Ideally, Open Data should be available in repositories where datasets are properly indexed and assigned a unique persistent identifier (as discussed in **Lesson 6 – Sharing Open Data**). This ensures that the data are unambiguously identifiable, searchable, and discoverable, along with their associated metadata and documentation.

Therefore, the first step in finding Open Data related to your field is to identify discipline-specific repositories (if there are any) and search for datasets there (see **Lesson 6.4 – Repositories and Other Sharing Methods**).

_Find repositories in your field:_



* _[Re3data.org](http://re3data.org) is a global registry of research data repositories covering a wide range of academic disciplines._
* _[FAIRsharing](https://fairsharing.org/) is a curated, informative, and educational resource on data and metadata standards, inter-related to databases and data policies._
* _Repositories recommended by publishers (e.g., the recommended data repository lists from [Scientific Data](https://www.nature.com/sdata/policies/repositories#envgeo) and [PLOS ONE](https://journals.plos.org/plosone/s/recommended-repositories))._
* _[World Data System](https://www.worlddatasystem.org/) represents a network of repositories._

_Examples of generic repositories:_



* _[Zenodo](https://zenodo.org/)_
* _[Mendeley Data](https://data.mendeley.com/)_
* _[Figshare](https://figshare.com/)_
* _[Dryad](https://datadryad.org/stash)_

The [Generalist Repository Comparison Chart](https://zenodo.org/record/3946720#.YUKQ18RS-Uk) is a tool you can use to decide where to store and share your FAIR data outside of your institutional repository. Dataverse has also published a [comparative review of eight data repositories](https://dataverse.org/blog/comparative-review-various-data-repositories).
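Because datasets in repositories carry persistent identifiers, a citation that gives only a DOI is usually enough to reach the dataset's landing page. The following is a minimal sketch, not part of the original lesson: the helper name `doi_to_url` is hypothetical, and the only external service it assumes is the public `https://doi.org/` resolver. It normalizes the common ways a DOI is written into one resolvable link.

```python
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"  # public DOI resolver

# Prefixes that are commonly written in front of a bare DOI.
_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def doi_to_url(doi: str) -> str:
    """Return a resolvable https://doi.org/ link (hypothetical helper)."""
    cleaned = doi.strip()
    lowered = cleaned.lower()
    for prefix in _PREFIXES:
        if lowered.startswith(prefix):
            # Drop the prefix and any whitespace that followed it.
            cleaned = cleaned[len(prefix):].strip()
            break
    # Every DOI starts with the directory indicator "10.".
    if not cleaned.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    # Keep "/" readable; percent-encode characters unsafe in a URL path.
    return DOI_RESOLVER + quote(cleaned, safe="/")
```

For example, `doi_to_url("doi:10.1093/nar/gkab1195")` returns `https://doi.org/10.1093/nar/gkab1195`, which a browser or HTTP client can follow to the record.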


### Web search

A more general search engine can help you explore a wide variety of datasets across projects or popular topics. Some disciplines and large institutions, such as NASA and the National Institutes of Health's National Center for Biotechnology Information (NCBI), offer their own portals where you can search for their datasets, related publications, and often tools for analysis (e.g., EMBL's European Bioinformatics Institute, [https://www.ebi.ac.uk/](https://www.ebi.ac.uk/)). There is also an increasing number of national and international data portals that support data discovery.

**Generic data search portals:**

* Google Dataset Search [https://datasetsearch.research.google.com/](https://datasetsearch.research.google.com/)
* Kaggle [https://www.kaggle.com/datasets](https://www.kaggle.com/datasets)
* Wikidata [https://www.wikidata.org/wiki/Wikidata:Main_Page](https://www.wikidata.org/wiki/Wikidata:Main_Page)
* Open Data Network [https://www.opendatanetwork.com/](https://www.opendatanetwork.com/)
* Awesome Public Datasets [https://github.com/awesomedata/awesome-public-datasets#readme](https://github.com/awesomedata/awesome-public-datasets#readme)

**Examples of discipline-specific portals:**

* NASA Earthdata [https://www.earthdata.nasa.gov/](https://www.earthdata.nasa.gov/)
* CERN Open Data [https://opendata.cern.ch/](https://opendata.cern.ch/)
* National Center for Biotechnology Information (NCBI) [https://www.ncbi.nlm.nih.gov/](https://www.ncbi.nlm.nih.gov/)
* EMBL's European Bioinformatics Institute [https://www.ebi.ac.uk/](https://www.ebi.ac.uk/)
* ICPSR [https://www.icpsr.umich.edu/web/pages/](https://www.icpsr.umich.edu/web/pages/)
* International Monetary Fund [https://www.imf.org/en/Data](https://www.imf.org/en/Data)
* NOAA Climate Data Online [https://www.ncdc.noaa.gov/cdo-web/datasets](https://www.ncdc.noaa.gov/cdo-web/datasets)
* Federal Reserve Economic Data (FRED) [https://fred.stlouisfed.org/](https://fred.stlouisfed.org/)
* USGS EarthExplorer [https://earthexplorer.usgs.gov/](https://earthexplorer.usgs.gov/)
* Open Science Data Cloud (OSDC) [https://www.opensciencedatacloud.org/](https://www.opensciencedatacloud.org/)
* NASA Planetary Data System [https://pds.nasa.gov/](https://pds.nasa.gov/)


**Examples of national or international data portals:**

* US Federal data [https://data.gov/](https://data.gov/)
* EU Data Portal [https://data.europa.eu/en](https://data.europa.eu/en)
* WHO [https://apps.who.int/gho/data/node.home](https://apps.who.int/gho/data/node.home)
* The World Bank [https://data.worldbank.org/](https://data.worldbank.org/)
* data.gov.uk [https://www.data.gov.uk/](https://www.data.gov.uk/)
* UNICEF [https://data.unicef.org/](https://data.unicef.org/)


### Literature search

While not ideal, datasets are often attached to scholarly publications as supplementary material, or the text states where to find them (e.g., a GitHub repository or a personal or institutional website). In addition, there are emerging journals and special collections or issues focused on describing and publishing data (e.g., Nucleic Acids Research database issues [https://doi.org/10.1093/nar/gkab1195](https://doi.org/10.1093/nar/gkab1195), Scientific Data, Earth System Science Data). In other words, while datasets shared through these media are openly available, they are often not properly indexed and are therefore neither easily findable nor machine-readable.

Finding academic publications can be a challenge in itself, depending on the discipline and field of study. For instance, in the life sciences and biomedical research, a number of repositories and search engines (e.g., PubMed, Europe PMC) index research outputs (e.g., publications, abstracts, references, and communications) from various journals.

However, in other disciplines (e.g., arts and humanities), searches are often carried out with general search engines or research databases such as Google Scholar and JSTOR. In that case, it is advisable to reach out to library personnel and community members for further advice on where to find related literature and data (see Lesson 5.4 Help section).

**Generic:**

* Google Scholar [https://scholar.google.com](https://scholar.google.com)
* Open Knowledge Maps: a visual interface for exploring interconnected topics, with relevant documents and concepts [https://openknowledgemaps.org/](https://openknowledgemaps.org/)
* JSTOR: a wide range of scholarly content [https://www.jstor.org/](https://www.jstor.org/)
* ResearchGate [https://www.researchgate.net/search](https://www.researchgate.net/search)

**Discipline specific:**

* Europe PMC: life sciences literature [https://europepmc.org/](https://europepmc.org/)
* PubMed: biomedical literature [https://pubmed.ncbi.nlm.nih.gov/](https://pubmed.ncbi.nlm.nih.gov/)
* arXiv: a free distribution service and open-access archive for scholarly preprints in physics, mathematics, computer science, quantitative biology, quantitative finance, statistics, electrical engineering and systems science, and economics [https://arxiv.org/](https://arxiv.org/)
* bioRxiv: preprint server for biology [https://www.biorxiv.org/](https://www.biorxiv.org/)
* EarthArXiv ([https://eartharxiv.org](https://eartharxiv.org)) and Earth and Space Science Open Archive ([https://essoar.org](https://essoar.org))
* ASAPbio provides a catalog of preprint servers [https://asapbio.org/preprint-servers](https://asapbio.org/preprint-servers)  
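Some of the literature services listed above can also be searched programmatically. The sketch below is illustrative only and rests on stated assumptions: it builds, but does not send, search URLs for the Europe PMC REST search endpoint and for the NCBI E-utilities `esearch` endpoint used with PubMed. The endpoint paths and parameter names (`query`, `format`, `pageSize`, `db`, `term`, `retmode`, `retmax`) are given as understood from those services' public documentation and should be checked against the current documentation before use. The function names are hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoints; verify against each service's current API documentation.
EUROPE_PMC_SEARCH = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"
PUBMED_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def europe_pmc_search_url(query: str, page_size: int = 25) -> str:
    """Build a Europe PMC search URL that asks for JSON results."""
    params = {"query": query, "format": "json", "pageSize": page_size}
    return f"{EUROPE_PMC_SEARCH}?{urlencode(params)}"


def pubmed_search_url(term: str, max_results: int = 20) -> str:
    """Build a PubMed esearch URL that asks for JSON results."""
    params = {
        "db": "pubmed",
        "term": term,
        "retmode": "json",
        "retmax": max_results,
    }
    return f"{PUBMED_ESEARCH}?{urlencode(params)}"


if __name__ == "__main__":
    # Print the URLs only; no network request is made here.
    print(europe_pmc_search_url("open data sharing"))
    print(pubmed_search_url("open data sharing"))
```

Building the URL separately from fetching it keeps the query easy to inspect and record, which helps when documenting how a literature search was carried out. Each service sets its own usage policies and rate limits, so consult them before sending automated requests.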



