# Indo-German Literature Dataset 

Indo-German Literature Dataset (IGLD) *1990–2022*.

## Description

The IGLD mirrors the data used in the [SEASON project](https://github.com/kalawinka/season), selected from OpenAlex. It contains Indo-German research articles published between 1990 and 2022 and is intended for studying academic collaboration.

## Keywords

OpenAlex, Indo-German collaboration, reproducible bibliometrics dataset

## Paper

Our paper describing our work in the SEASON project will be published soon:

Mir, A. A., Smirnova, N., Ramalingam, J., & Mayr, P. (2024). *The rise of Indo-German collaborative research: 1990–2022*.   

## Usage

### Description of Selection and Cleaning

The search query CU (“GERMANY” AND “INDIA”) was used to retrieve data from Web of Science (WoS). The data cover the period from 1990 to 30 November 2022. The query returned a total of 36,999 records. For the present dataset, we retrieved from OpenAlex only articles identical to those from WoS.

Our original dataset retrieved from WoS consisted of 36,999 entries, of which 33,319 have a valid DOI and 3,680 have none. We therefore developed two approaches for retrieving the desired data from the OpenAlex collection: articles with a DOI were matched by DOI (dataset 1), and articles without a DOI were matched by article title and publication year (dataset 2).
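The two matching strategies can be sketched roughly as follows. This is an illustrative sketch only, not the code used in the SEASON project: the record fields (`doi`, `title`, `year`), the sample records, and the title normalisation are all assumptions made for the example.

```python
import re

def normalise_title(title: str) -> str:
    """Lower-case a title and collapse punctuation/whitespace for comparison.

    Assumption: a simple ASCII-only normalisation; non-ASCII characters are
    treated as separators, which may be too aggressive for real titles.
    """
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()

def split_by_doi(records):
    """Split records into those with a DOI (dataset 1) and those without (dataset 2)."""
    with_doi, without_doi = [], []
    for rec in records:
        (with_doi if rec.get("doi") else without_doi).append(rec)
    return with_doi, without_doi

def match_key(rec):
    """Return the DOI if present, otherwise (normalised title, publication year)."""
    if rec.get("doi"):
        return ("doi", rec["doi"].lower())
    return ("title_year", normalise_title(rec["title"]), rec["year"])

# Hypothetical WoS records, invented for illustration only.
wos_records = [
    {"doi": "10.1000/example.1", "title": "A Study of X", "year": 2001},
    {"doi": "", "title": "Notes on Y: a survey", "year": 1995},
]

dataset_1, dataset_2 = split_by_doi(wos_records)
keys = [match_key(r) for r in wos_records]
```

Records from the two sources that share the same key would then be treated as candidate matches.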

Afterwards, the DOIs in dataset 1 were additionally compared to the DOIs from the original WoS dataset, and all inconsistencies were removed.
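A DOI consistency check of this kind might look like the sketch below. It assumes that OpenAlex DOIs are stored as doi.org URLs (as in the `doi` column of this dataset) and that the WoS DOIs are bare strings; the helper names and sample rows are hypothetical.

```python
def normalise_doi(value: str) -> str:
    """Strip a doi.org URL or 'doi:' prefix and lower-case the DOI."""
    doi = value.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi

def keep_consistent(openalex_rows, wos_dois):
    """Keep only OpenAlex rows whose DOI also occurs in the WoS DOI list."""
    wanted = {normalise_doi(d) for d in wos_dois}
    return [row for row in openalex_rows if normalise_doi(row["doi"]) in wanted]

# Hypothetical inputs, invented for illustration only.
wos_dois = ["10.1000/Example.1"]
openalex_rows = [
    {"article_id": "W1", "doi": "https://doi.org/10.1000/example.1"},
    {"article_id": "W2", "doi": "https://doi.org/10.1000/other.9"},
]

consistent = keep_consistent(openalex_rows, wos_dois)
```

Comparing lower-cased DOIs is a deliberate choice here, since DOIs are case-insensitive.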

For dataset 2, the authors were additionally checked: authors' surnames in dataset 2 were compared with the authors' surnames of the corresponding WoS articles (matched by title and publication year). Only articles with matching publication years, author surname lists, and titles were retained for the OpenAlex dataset. Dataset 1 and dataset 2 were then combined into one final dataset (OpenAlex data).
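The surname comparison could be approximated as in the following sketch. The name formats ("Surname, Given" and "Given Surname"), the order-insensitive comparison, and the sample names are assumptions; the actual procedure may have handled name variants differently.

```python
def surname(name: str) -> str:
    """Return a lower-cased surname from 'Surname, Given' or 'Given Surname'."""
    name = name.strip()
    if not name:
        return ""
    if "," in name:
        return name.split(",")[0].strip().lower()
    return name.split()[-1].lower()

def same_surnames(wos_authors, openalex_authors) -> bool:
    """Compare two author lists by surname, ignoring author order (an assumption)."""
    left = sorted(surname(a) for a in wos_authors)
    right = sorted(surname(a) for a in openalex_authors)
    return left == right

# Hypothetical author lists, invented for illustration only.
wos_authors = ["Mueller, A.", "Sharma, R."]
openalex_authors = ["Ravi Sharma", "Anna Mueller"]

is_match = same_surnames(wos_authors, openalex_authors)
```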

Additionally, all duplicates (by article ID) were removed from the OpenAlex data. In the final step, we checked whether all entries contained both German and Indian affiliations. Some inconsistencies with the WoS data were observed: 5,584 entries that have both Indian and German affiliations in WoS had only one of these affiliations in OpenAlex. These entries were removed from the final dataset. The final dataset contains 22,844 unique entries.
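As a rough illustration of these last two steps, the sketch below removes duplicate article IDs and keeps only articles with both a German (`DE`) and an Indian (`IN`) affiliation. The article-level structure and the `country_codes` field are hypothetical and chosen only to keep the example short.

```python
def dedupe_by_id(articles):
    """Keep the first occurrence of each article_id, preserving order."""
    seen, unique = set(), []
    for art in articles:
        if art["article_id"] not in seen:
            seen.add(art["article_id"])
            unique.append(art)
    return unique

def has_both_countries(article, required=frozenset({"DE", "IN"})):
    """True if the article lists affiliations in every required country."""
    return required <= set(article["country_codes"])

# Hypothetical articles, invented for illustration only.
articles = [
    {"article_id": "W1", "country_codes": ["DE", "IN", "US"]},
    {"article_id": "W1", "country_codes": ["DE", "IN", "US"]},  # duplicate
    {"article_id": "W2", "country_codes": ["DE", "FR"]},        # no Indian affiliation
]

final = [a for a in dedupe_by_id(articles) if has_both_countries(a)]
```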

### Column Descriptions

These descriptions are relevant summaries or extracts from the documentation at   
<https://docs.openalex.org/api-entities/works/work-object>,   
<https://docs.openalex.org/api-entities/authors/author-object> and   
<https://docs.openalex.org/api-entities/institutions/institution-object>. 

- article_id  
  (Work attribute)
  - OpenAlex identifier for the article / work.  
    To retrieve the work, visit `https://openalex.org/works/<article_id>`.
- doi  
  (Work attribute)
  - Digital Object Identifier for the work.   
    Consists of a URL to doi.org
- title  
  (Work attribute)
  - Title of the work.
- article_display_name  
  (Work attribute)
  - Duplicate of "title" column, retained to match other OpenAlex objects' attribute.
- publication_year  
  (Work attribute)
  - The year in which the work was published.  
    Please note that this refers to the version of the work captured by OpenAlex as this particular entry. Other, potentially earlier, published versions may be listed in the work's locations field in OpenAlex.
- publication_date  
  (Work attribute)
  - An ISO 8601 formatted date for the publication of the work.   
    The same caveat as for publication_year applies to publication_date.
- article_type  
  (Work attribute)
  - Type of work.  
    E.g. Article, conference paper, report, dataset, etc.
- article_type_crossref  
  (Work attribute)
  - Legacy type information inherited from Crossref.
- article_cited_by_count  
  (Work attribute)
  - Number of citations to the work.
- article_cited_by_api_url  
  (Work attribute)
  - An OpenAlex URL that allows the user to view the works which cite this work.
- article_grants  
  (Work attribute)
  - A list of details of the grants the work has received.  
    This information is gathered from Crossref and is described by OpenAlex at time of publication as "limited".
- article_referenced_works_count  
  (Work attribute)
  - Number of works within OpenAlex that this work cites.  
    Please note that the total number of references in the work may be higher.
- language  
  (Work attribute)
  - The language of the work as an ISO 639-1 code.  
    This attribute is inferred by a software library ([langdetect](https://pypi.org/project/langdetect/)) used by OpenAlex, based on the abstract, or on the title if the abstract is not available.
- article_counts_by_year  
  (Work attribute)
  - A list of the citation count of this work per year, for up to the last 10 years.
- article_locations_count  
  (Work attribute)
  - Number of locations where this work can be found.  
    In OpenAlex, "locations" refers to the places on the internet where versions of this work are accessible.
- author_id  
  (Author attribute)
  - OpenAlex identifier for an author of the work.  
    To retrieve OpenAlex's bibliography for this author, visit `https://openalex.org/authors/<author_id>`.  
    The following author attributes are associated with the author identifier in each row. Please note that a work with multiple authors may have multiple rows, one for each author in OpenAlex.
- orcid  
  (Author attribute)
  - ORCID identifier for the author.
- author_name  
  (Author attribute)
  - Name of the author.
- author_name_alternatives  
  (Author attribute)
  - Alternative formats for the author's name which OpenAlex has observed.
- author_works_count  
  (Author attribute)
  - Number of works the author has created.
- author_cited_by_count  
  (Author attribute)
  - Number of works which cite a work the author has created.
- author_last_known_institution  
  (Author attribute)
  - Identifier for the institution with which the author is affiliated in the author's most recent publication that contains an institutional identifier.  
    Please note this may differ from the institution associated with the author at time of the work's release, which is listed in this database as "institution_id".
- author_summary_stats  
  (Author attribute)
  - OpenAlex's citation metrics for the author.  
    These include citation count, i10-index, h-index and more.
- institution_id  
  (Institution attribute)
  - OpenAlex identifier for the institution associated with the author when the work was published.
- ror  
  (Institution attribute)
  - Research Organization Registry (ROR) identifier for the institution.
- institution_name  
  (Institution attribute)
  - Name of the institution.
- institution_country_code  
  (Institution attribute)
  - ISO 3166-1 Alpha-2 (two-letter) country code for the country in which the institution is located.
- insitution_type  
  (Institution attribute)
  - ROR-style primary type for the institution.
- institution_homepage_url  
  (Institution attribute)
  - A URL for the institution's primary homepage.
- institution_display_name_acroynyms  
  (Institution attribute)
  - Known acronyms or initialisms for the institution.
- institution_display_name_alternatives  
  (Institution attribute)
  - Alternative names for the institution.
- institution_works_count  
  (Institution attribute)
  - The number of works created by authors affiliated with this institution.
- institution_cited_by_count  
  (Institution attribute)
  - The number of works that cite a work created by authors affiliated with the institution.
- institution_summary_stats  
  (Institution attribute)
  - Citation metrics for the institution.  
    Similar to author_summary_stats.
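
Because a work with several authors may occupy one row per author, per-article analyses usually start by grouping rows on `article_id`. The following is a minimal sketch, assuming the data is available as CSV; the file format, the reduced column set, and the sample rows are assumptions made for illustration and are not taken from the dataset.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample with a subset of the columns described above.
sample_csv = """article_id,title,author_id,institution_country_code
W1,Example work,A1,DE
W1,Example work,A2,IN
W2,Another work,A3,IN
W2,Another work,A4,DE
W2,Another work,A5,DE
"""

def authors_per_article(rows):
    """Count distinct author_id values for each article_id."""
    authors = defaultdict(set)
    for row in rows:
        authors[row["article_id"]].add(row["author_id"])
    return {article: len(ids) for article, ids in authors.items()}

# In practice the rows would be read from the dataset file instead of a string.
rows = list(csv.DictReader(io.StringIO(sample_csv)))
counts = authors_per_article(rows)
```

Counting rows directly would overstate the number of articles, so any article-level statistic should be computed on distinct `article_id` values.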

## Information

### Contact Information

Nina Smirnova - [nina.smirnova@gesis.org](mailto:nina.smirnova@gesis.org)

### Publication Information

Released on Zenodo on 1 February 2024.

### Acknowledgements 

This work was funded by the DAAD–UGC Project-based Personnel Exchange Programme (PPP 2022) via the project “Social Network and Scientometric Analysis in Collaborative Research Publications between India and Germany” (SEASON, <https://github.com/kalawinka/season/>), DAAD project number 57608852 and UGC project number 1-10/2020(IC). This work was also funded by the Federal Ministry of Education and Research via the OpenBib project, funding number 16WIK2301B. We acknowledge support by the Federal Ministry of Education and Research, Germany, under grant number 01PQ17001 (Competence Network for Bibliometrics). Nina Smirnova received funding from the German Research Foundation (DFG) via the project "POLLUX" under grant number MA 3964/7-2.

### Copyright

This dataset is released under CC BY 4.0 (Creative Commons Attribution 4.0 International).