PathOS Impact of Open Access Routes on Topic Persistence Case Study Research Data, Code, and Analysis Results
Description
This repository contains the data, scripts, and results for the Impact of Open Access Routes on Topic Persistence case study, part of the PathOS project.
Overview
Artificial intelligence methods are being rapidly mobilized to tackle the climate crisis, but the knowledge base often burns bright and fades quickly. This case study asks whether two distinct Open Access (OA) routes help AI-for-Climate research topics stay active in the literature:
- Green OA: self-archiving in repositories
- Published OA: journal-mediated open access with a clear licence
Bronze OA and dual-mode publications are excluded so that each treatment group is clearly defined. Closed Access (CA) articles serve as the counterfactual (control) group.
By foregrounding topic persistence as a key dimension of impact, the study goes beyond short-term citation counts and investigates whether openness helps research topics remain visible long enough to demonstrate their potential.
Repository Structure
├── README.md
├── fos_taxonomy_v0.1.2.json
├── persistent_topics_create_collection.py
├── persistent_topics_find_paper_openaireids.py
├── persistent_topics_find_paper_affiliations.py
├── persistent_topics_get_collection_author_gender.py
├── persistent_topics_calculate_indicators.py
├── persistent_topics_calculate_indicators_sdg.py
├── persistent_topics_indicators_create_data_for_vis.py
└── persistent_topics_collection_w_outcomes/
    ├── complete_collection_df.parquet / .xlsx
    ├── topic_attribution_df.parquet / .xlsx
    ├── results/
    │   ├── analysis_conclusions.txt
    │   ├── summary_statistics.xlsx
    │   ├── treatment_effects_green_oa.xlsx
    │   ├── treatment_effects_published_oa.xlsx
    │   ├── descriptive_effects_any_oa.xlsx
    │   ├── tables/
    │   │   ├── 01_executive_summary.xlsx
    │   │   ├── 02_treatment_group_characteristics.xlsx
    │   │   ├── 03_causal_effects_summary.xlsx
    │   │   ├── 04_topic_persistence_analysis.xlsx
    │   │   ├── 05_gender_equity_outcomes.xlsx
    │   │   ├── 06_economic_impact_analysis.xlsx
    │   │   ├── 07_publication_year_analysis.xlsx
    │   │   └── 08_robustness_analysis.xlsx
    │   ├── visualizations/
    │   │   ├── 01_sample_overview.png
    │   │   ├── 02_causal_effects.png
    │   │   ├── 03_outcome_analysis.png
    │   │   └── 04_temporal_and_balance.png
    │   └── final_visualization_data_figures/
    │       ├── data/
    │       └── figures/
    └── results_sdg_only/
        ├── sdg_analysis_conclusions.txt
        ├── green_matched_sdg_papers.xlsx
        ├── published_matched_sdg_papers.xlsx
        ├── closed_matched_a_sdg_papers.xlsx
        ├── closed_matched_b_sdg_papers.xlsx
        ├── tables/
        │   ├── 01_sdg_distribution_matched_samples.xlsx
        │   ├── 02_sdg_treatment_effects.xlsx
        │   ├── 03_sdg_vs_non_sdg_comparison.xlsx
        │   ├── 04_sdg_categories_by_impact.xlsx
        │   ├── 05_sdg_gender_industry_collaboration.xlsx
        │   ├── 06_sdg_analysis_summary.xlsx
        │   ├── 07_sdg_alignment_comparison_matched.xlsx
        │   └── 08_sdg_alignment_effects_summary.xlsx
        └── visualizations/
            ├── 01_sdg_distribution_overview.png
            ├── 02_sdg_treatment_effects.png
            ├── 03_sdg_impact_analysis.png
            └── 04_sdg_alignment_comparison_matched.png
Data Sources
External Data Sources (not included)
- Semantic Scholar Academic Graph: full publication metadata
- OpenAIRE Graph: European research infrastructure data
- PATSTAT: patent database used for patent citation analysis
- ROR: Research Organization Registry
- SciNoBo toolkit: Field of Science (FOS) classification, interdisciplinarity, SDG mapping, and field-weighted citation impact (FWCI) scores
Included Data
- Complete processed collection with outcomes
- Topic attribution dataset (paper-topic mappings, persistence scores)
- Analysis results: matched samples, treatment effects, summary statistics
- SciNoBo Field of Science taxonomy (fos_taxonomy_v0.1.2.json)
Scripts
Data Processing
- persistent_topics_create_collection.py: integrates multiple data sources, outcomes, affiliations, and patent citations into a single collection
- persistent_topics_find_paper_openaireids.py: maps DOIs to OpenAIRE IDs
- persistent_topics_find_paper_affiliations.py: extracts author affiliations and identifies science-industry collaborations
- persistent_topics_get_collection_author_gender.py: classifies the gender of authors
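This record does not document how science-industry collaboration is detected from affiliations. The sketch below shows one plausible rule, not the one used in persistent_topics_find_paper_affiliations.py. It assumes each affiliation has already been resolved to a ROR organization type (such as "education" or "company"). It then flags a paper when it has at least one company affiliation and at least one non-company affiliation. The helper name `is_science_industry_collaboration` and the `INDUSTRY_TYPES` set are hypothetical.

```python
from typing import Iterable

# Hypothetical rule: ROR organization types treated as "industry".
# The case study's actual classification may differ.
INDUSTRY_TYPES = {"company"}


def is_science_industry_collaboration(org_types: Iterable[str]) -> bool:
    """Return True if a paper has at least one industry affiliation
    and at least one non-industry (e.g. academic) affiliation."""
    types = {t.lower() for t in org_types}
    has_industry = bool(types & INDUSTRY_TYPES)
    has_other = bool(types - INDUSTRY_TYPES)
    return has_industry and has_other


print(is_science_industry_collaboration(["education", "company"]))   # True
print(is_science_industry_collaboration(["education", "facility"]))  # False
print(is_science_industry_collaboration(["company"]))                # False
```

With this rule, a company-only paper does not count as a collaboration, because the flag requires both sides to be present.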
Analysis
- persistent_topics_calculate_indicators.py: main causal inference analysis (propensity score matching (PSM) for Green OA vs. CA and Published OA vs. CA)
- persistent_topics_calculate_indicators_sdg.py: SDG-focused treatment effects
- persistent_topics_indicators_create_data_for_vis.py: prepares the final visualization datasets and figures
Key Findings
Sample
- Total: 132,134 papers (2000–2021)
- Green OA: 3,792 papers
- Published OA: 19,045 papers
- Closed Access: 92,998 papers
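As a quick consistency check, the three group sizes add up to fewer papers than the stated total. This record does not say how the remaining papers are classified. They may be the excluded Bronze OA and dual-mode papers, but that is an assumption.

```latex
3{,}792 + 19{,}045 + 92{,}998 = 115{,}835,
\qquad
132{,}134 - 115{,}835 = 16{,}299
```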
Contributions
- A new topic persistence metric for long-term impact
- Clean OA treatment definitions (excluding dual-mode and Bronze OA)
- Separate analyses of the Green and Published OA pathways
Main Results
- Eight statistically significant causal effects across outcomes
- Enhanced topic persistence for OA papers
- Positive gender equity outcomes
- Evidence of economic impact (patents, collaborations)
SDG Findings
- 24,948 SDG-relevant papers (18.9% of the sample)
- 11 statistically significant treatment effects for SDG-related research
- Stronger knowledge sustainability in research supporting the SDGs
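The 18.9% figure is the SDG-relevant share of the full 132,134-paper sample:

```latex
\frac{24{,}948}{132{,}134} \approx 0.1888 \approx 18.9\%
```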
Methodology
Design
- Propensity score matching (PSM) with balanced covariates
- Separate analyses for Green OA vs. CA and Published OA vs. CA
- Robust outcome metrics (including the new persistence measure)
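The record names PSM but does not specify the matching variant. The sketch below shows the matching step under several assumptions:

- propensity scores have already been estimated, for example with a logistic regression on the covariates;
- matching is greedy 1:1 nearest-neighbour without replacement;
- the caliper is 0.05, an arbitrary illustrative value.

The `Paper` record, `nearest_neighbour_match`, and `att` are hypothetical names, not functions from persistent_topics_calculate_indicators.py.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import mean


@dataclass
class Paper:
    paper_id: str
    treated: bool       # True for the OA group, False for Closed Access
    propensity: float   # estimated P(treated | covariates)
    outcome: float      # e.g. a citation or persistence score


def nearest_neighbour_match(papers: list[Paper], caliper: float = 0.05):
    """Greedy 1:1 nearest-neighbour matching on the propensity score,
    without replacement. Treated papers with no unused control within
    the caliper are left unmatched."""
    controls = [p for p in papers if not p.treated]
    used: set[int] = set()
    pairs = []
    for t in (p for p in papers if p.treated):
        best, best_dist = None, caliper
        for i, c in enumerate(controls):
            if i in used:
                continue
            dist = abs(t.propensity - c.propensity)
            if dist <= best_dist:
                best, best_dist = i, dist
        if best is not None:
            used.add(best)
            pairs.append((t, controls[best]))
    return pairs


def att(pairs) -> float:
    """Average treatment effect on the treated: the mean outcome
    difference across matched pairs (requires at least one pair)."""
    return mean(t.outcome - c.outcome for t, c in pairs)


papers = [
    Paper("g1", True, 0.30, 5.0),
    Paper("g2", True, 0.70, 8.0),
    Paper("c1", False, 0.32, 4.0),
    Paper("c2", False, 0.68, 6.0),
    Paper("c3", False, 0.95, 1.0),
    Paper("g3", True, 0.10, 3.0),  # no control within the caliper
]
pairs = nearest_neighbour_match(papers)
print([(t.paper_id, c.paper_id) for t, c in pairs])  # [('g1', 'c1'), ('g2', 'c2')]
print(att(pairs))                                    # 1.5
```

Greedy matching depends on the order in which treated papers are processed. Optimal matching or matching with replacement are common alternatives, and covariate balance should be checked after matching.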
Treatment Definitions
- Green OA: repository-based
- Published OA: journal-based (gold, hybrid, diamond)
- Closed Access: no open-access version available
- Excluded: dual-mode and Bronze OA
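The treatment definitions can be expressed as a small classification rule. This sketch rests on two assumptions:

- each paper's availability is summarised as a set of route labels ("green", "gold", "hybrid", "diamond", "bronze"), with an empty set meaning closed access;
- "dual-mode" means the paper is available through both the Green and the Published route.

The labels, `Group`, and `assign_group` are hypothetical and do not come from the case study's scripts.

```python
from __future__ import annotations

from enum import Enum
from typing import Optional


class Group(Enum):
    GREEN = "Green OA"
    PUBLISHED = "Published OA"
    CLOSED = "Closed Access"


# Journal-based routes counted as Published OA in the definitions above.
PUBLISHED_ROUTES = {"gold", "hybrid", "diamond"}


def assign_group(routes: set[str]) -> Optional[Group]:
    """Map a paper's OA route labels to a treatment group, or return
    None if the paper is excluded or its labels are unrecognised."""
    routes = {r.lower() for r in routes}
    if "bronze" in routes:
        return None  # Bronze OA is excluded
    is_green = "green" in routes
    is_published = bool(routes & PUBLISHED_ROUTES)
    if is_green and is_published:
        return None  # assumed dual-mode: both routes, excluded
    if is_green:
        return Group.GREEN
    if is_published:
        return Group.PUBLISHED
    if not routes:
        return Group.CLOSED  # no open provision at all
    return None  # unrecognised label: leave unclassified
```

For example, `assign_group({"hybrid", "green"})` returns `None`, because such a paper is treated as dual-mode and excluded.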
Outcomes
- Citation impact (traditional)
- Topic persistence (novel metric)
- Gender equity in authorship
- Economic impact (patents, collaboration)
- Field effects (disciplinary and SDG)
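The record does not define how topic persistence is computed. As a purely hypothetical illustration, and not the study's metric, persistence could be measured as the number of consecutive years after a reference year in which a topic still receives new publications. The function name `topic_persistence` and the `min_papers` threshold are assumptions.

```python
from __future__ import annotations


def topic_persistence(yearly_counts: dict[int, int], start_year: int,
                      min_papers: int = 1) -> int:
    """Hypothetical persistence score: the number of consecutive years
    after start_year in which the topic has at least min_papers new
    papers. Years missing from yearly_counts count as zero papers."""
    if min_papers < 1:
        raise ValueError("min_papers must be at least 1")  # avoids an endless loop
    years = 0
    year = start_year + 1
    while yearly_counts.get(year, 0) >= min_papers:
        years += 1
        year += 1
    return years


counts = {2015: 10, 2016: 7, 2017: 4, 2018: 0, 2019: 3}
print(topic_persistence(counts, 2015))                # 2 (2016 and 2017)
print(topic_persistence(counts, 2015, min_papers=5))  # 1 (2016 only)
```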
Files
pathos_persistent_topics_case_study_files.zip (48.8 MB, md5:c2b35dab093b53ccdf1ebfa2fa3b89a5)
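After downloading the archive, you can compare its MD5 checksum with the value listed above. The sketch below uses Python's standard `hashlib` and reads the file in chunks, so large downloads do not need to fit in memory. It assumes the archive keeps its original filename. The helpers `md5_of` and `verify_download` are illustrative names.

```python
import hashlib
from pathlib import Path

# Checksum published for pathos_persistent_topics_case_study_files.zip
EXPECTED_MD5 = "c2b35dab093b53ccdf1ebfa2fa3b89a5"


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading 1 MiB at a time."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path, expected: str = EXPECTED_MD5) -> bool:
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of(path) == expected.lower()


if __name__ == "__main__":
    archive = Path("pathos_persistent_topics_case_study_files.zip")
    if archive.exists():
        print("checksum OK" if verify_download(archive) else "checksum MISMATCH")
```

MD5 only detects accidental corruption here. It is not a safeguard against deliberate tampering.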