Data and scripts for bibliographic analysis
===========================================

Data files (folder Data)
------------------------

- `rp.bib`: BibTeX file with the bibliographic data (this file should be in the root folder, not in the Data folder)
- `affiliations.json`: json file with affiliations and countries of each author of each paper, labels as in `rp.bib`, created by `get_affiliations.py`
- `affiliationsCountry_network.gephi`: Gephi project file with the affiliation network on the country level, based on `affiliationsCountry.graphml.gz`
- `affiliationsCountry.graphml.gz`: graphML file with the collaboration network on country level, created by `affiliations_network.py`
- `affiliationsInstitute_network.gephi`: Gephi project file with the affiliation network on the institutional level, based on `affiliationsInstitute.graphml.gz`
- `affiliationsInstitute.graphml.gz`: graphML file with the collaboration network on institutional level, created by `affiliations_network.py`
- `authorperpaper.txt`: ascii file with the number of authors for each paper, labels as in `rp.bib`, created by `coauthor_network.py`
- `authorspublications.txt`: ascii file with the number of papers as first author and total number of papers for each author, created by `coauthor_network.py`
- `authorstime.txt`: ascii file with the time an author has spent in the field of recurrence analysis, created by `coauthor_network.py`
- `cache_crossref.json`: cache file consisting of the crossref ID and the corresponding DOI and publication year, created and used by `get_citations.py`
- `citations_citationTime.txt`: ascii file with the list of cited papers and the time between their publication and each citation, created by `get_citations.py`
- `citations_network.gephi`: Gephi project file with the citation network, based on `citations.graphml.gz`
- `citations_years.txt`: ascii file with the list of papers and the number of their citations, sorted in descending order, created by `citation_network.py`
- `citations.graphml.gz`: graphml file with the citation network, created by `citation_network.py`
- `citations.json`: json file with the references for each paper, represented as the BibTeX labels as in `rp.bib`, created by `get_citations.py`
- `cluster_allyears.txt`: ascii file with all clusters and the assigned papers (given by the BibTeX labels), created by `cluster_subjects.py`
- `cluster_DBindex.txt`: ascii file with the Davies Bouldin index to find the optimal number of clusters, created by `cluster_subjects.py`, used by `fig_paperClusters.m`
- `cluster_individualyears.txt`: ascii file with all clusters and the assigned papers for time intervals (given by the BibTeX labels), created by `cluster_subjects.py`
- `cluster_stat.txt`: ascii file with the fraction of each cluster for time intervals, created by `cluster_subjects.py`, used by `fig_paperClusters.m`
- `cluster_topWords.txt`: ascii file with the top feature words, created by `cluster_subjects.py`, used to assign subjects to the clusters
- `coauthor_network_filtered.gephi`: Gephi project file with the filtered co-author network on the country level, based on `coauthor_network.graphml.gz`
- `coauthor_network_full.gephi`: Gephi project file with the unfiltered co-author network on the country level, based on `coauthor_network.graphml.gz`
- `coauthor_network.csv`: csv file with the nodes of the full co-author network, created in Gephi from `coauthor_network_full.gephi`
- `coauthor_network.graphml.gz`: graphml file with the co-author network, created with `coauthor_network.py`
- `country_coordinates.json`: json file with coordinates of countries, created and used by `affiliations_network.py`
- `journals.txt`: ascii file with the journals rank, number of published papers and fraction, created by `fig_journals.m`
- `rp_time.dat`: ascii file with the monthly number of publications in `rp.bib` (last 5 years), provided by the monthly update process
- `subjects.json`: json file with the subjects for each paper, labels as in `rp.bib`, created by `get_subjects.py`; but not used in the paper
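
The network files listed above are gzip-compressed GraphML, so they can be inspected without Gephi. The following is a minimal sketch, using only the Python standard library, that counts the nodes and edges in such a file. The helper name `count_graphml` and the small demo graph are hypothetical and are not part of the repository.

```python
import gzip
import os
import tempfile
import xml.etree.ElementTree as ET

# XML namespace used by GraphML documents.
GRAPHML_NS = "{http://graphml.graphdrawing.org/xmlns}"


def count_graphml(path):
    """Return (number of nodes, number of edges) of a gzip-compressed GraphML file."""
    with gzip.open(path, "rb") as handle:
        root = ET.parse(handle).getroot()
    nodes = root.findall(f".//{GRAPHML_NS}node")
    edges = root.findall(f".//{GRAPHML_NS}edge")
    return len(nodes), len(edges)


# Hypothetical three-node demo graph, written to a temporary file so that
# the sketch runs without any of the repository data.
DEMO = """<?xml version="1.0" encoding="UTF-8"?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <graph edgedefault="undirected">
    <node id="a"/><node id="b"/><node id="c"/>
    <edge source="a" target="b"/><edge source="b" target="c"/>
  </graph>
</graphml>"""

with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.graphml.gz")
    with gzip.open(demo_path, "wt", encoding="utf-8") as handle:
        handle.write(DEMO)
    demo_counts = count_graphml(demo_path)

print(demo_counts)  # (3, 2)
```

For the real data, a call such as `count_graphml("Data/coauthor_network.graphml.gz")` should work, assuming the file sits in the `Data` folder. If NetworkX is available, `networkx.read_graphml` can also read `.gz` files directly and returns a full graph object.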

Scripts for data retrieval
--------------------------

- `get_affiliations.py`: Python script to retrieve the affiliations/institutes for each co-author of each paper using Scopus, results are stored in `affiliations.json`
- `get_citations.py`: Python script to retrieve the references for each paper using crossref and semanticscholar, results are stored in `citations.json`, supporting file `cache_crossref.json`
- `get_subjects.py`: Python script to retrieve the subjects for each paper using crossref, results are stored in `subjects.json`

Scripts for analysis
--------------------

- `affiliations_network.py`: Python script to create the collaboration networks on the country and institutional levels, uses `affiliations.json` and `country_coordinates.json`
- `citation_network.py`: Python script to prepare basic information and data on citations, needs `rp.bib` and `citations.json`
- `cluster_subjects.py`: Python script to find clusters of the papers using simple KMeans clustering, needs `rp.bib`
- `coauthor_network.py`: Python script to create lists of authors and first authors, and to create the co-author network
- `fig_affiliations.m`: MATLAB script to create a figure
- `fig_authors.m`: MATLAB script to create a figure on all and new authors per year, needs `rp.bib` and `coauthor_network.csv`
- `fig_citations.m`: MATLAB script to create figures with the publication year of the most cited papers, distribution of time between publication and citation, distribution of citation numbers, and table on top cited papers; needs `citations_years.txt`, `citations_citationTime.txt`
- `fig_database.m`: MATLAB script to create figure on monthly database entries, needs `rp_time.dat`
- `fig_journals.m`: MATLAB script to create a ranked list of journals and a figure on journal development regarding number of publications
- `fig_paperClusters.m`: MATLAB script to create figures related to clustering, i.e., plots of the Davies Bouldin index and of the fraction of each cluster for time intervals; needs `cluster_DBindex.txt`, `cluster_stat.txt`
- `fig_publicationsPerYear.m`: MATLAB script to create a figure with the number of publications per year, needs `rp.bib`
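
The clustering entries above refer to the Davies Bouldin index as the criterion for choosing the number of clusters. As a rough illustration of what that index measures, here is a minimal pure-Python sketch. It is not taken from `cluster_subjects.py`; the function name `davies_bouldin` and the toy data are hypothetical. Lower values indicate compact, well-separated clusters, so the preferred number of clusters is typically the one that minimises the index.

```python
import math
from collections import defaultdict


def davies_bouldin(points, labels):
    """Davies Bouldin index for points (sequences of floats) and cluster labels."""
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(point)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")

    # Centroid of each cluster: component-wise mean of its members.
    centroids = {
        label: [sum(column) / len(members) for column in zip(*members)]
        for label, members in clusters.items()
    }
    # Scatter of each cluster: mean Euclidean distance of members to the centroid.
    scatter = {
        label: sum(math.dist(p, centroids[label]) for p in members) / len(members)
        for label, members in clusters.items()
    }

    # For each cluster take the worst (largest) similarity ratio to any other
    # cluster, then average over all clusters. This sketch assumes that no
    # two centroids coincide.
    total = 0.0
    for i in clusters:
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in clusters
            if j != i
        )
    return total / len(clusters)


# Toy data: two tight groups, labelled sensibly and then badly.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = davies_bouldin(points, [0, 0, 1, 1])
bad = davies_bouldin(points, [0, 1, 0, 1])
print(good < bad)  # True
```

scikit-learn offers the same measure as `sklearn.metrics.davies_bouldin_score`; whether the repository script uses that function is not stated here.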

Supporting files
----------------

- `readBibTeX.m`: MATLAB function to import BibTeX content as a data structure

