CHExNet: A network of collaborators in early modern Jagiellonian University

do Valle Miranda, Luiz; Nalepa, Grzegorz J.

doi:10.5281/zenodo.18715362

Published February 20, 2026 | Version 0.0.2

Dataset Open


1. Jagiellonian University

This repository contains the following files:

CHExNet.pkl network file: a two-layer temporal network stored as a nested Python dictionary with top-level keys layer_1 (AUJ, co-presence) and layer_2 (BJ, collaboration). Each layer maps discrete time-slice IDs (e.g., 93, 94, …) to a record with: time (Python datetime.date), matrix (SciPy CSR sparse weighted adjacency matrix, shape N×N), and ids_pos_mat (dict mapping the authority file's final_id to row/column indices in matrix). The sparse CSR format is used because the graphs are mostly zeros and it stores them efficiently.
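As a minimal sketch of how the nested layout described above can be traversed, the following turns one time-slice record into a list of weighted edges keyed by final_id. The helper name slice_edges and the toy record are illustrative and not part of the dataset; the code also assumes the adjacency matrix is symmetric, which is not stated in the description. Unpickling the real file requires SciPy to be installed, and pickle files should only be loaded from sources you trust.

```python
import datetime
import pickle
from pathlib import Path

from scipy.sparse import csr_matrix


def slice_edges(record):
    """Return (final_id_a, final_id_b, weight) triples for one time-slice record."""
    # Invert ids_pos_mat so matrix positions map back to final_id values.
    pos_to_id = {pos: fid for fid, pos in record["ids_pos_mat"].items()}
    coo = record["matrix"].tocoo()
    edges = []
    for i, j, w in zip(coo.row, coo.col, coo.data):
        # Assumption: the matrix is symmetric, so keep the upper triangle only.
        if i < j:
            edges.append((pos_to_id[int(i)], pos_to_id[int(j)], float(w)))
    return sorted(edges)


# Toy record mirroring the documented fields (all values are made up).
toy_record = {
    "time": datetime.date(1600, 1, 1),
    "matrix": csr_matrix([[0, 2, 0], [2, 0, 1], [0, 1, 0]]),
    "ids_pos_mat": {101: 0, 102: 1, 103: 2},
}
toy_edges = slice_edges(toy_record)

# With the real file in the working directory, the same helper applies.
network_path = Path("CHExNet.pkl")
if network_path.exists():
    with network_path.open("rb") as fh:
        network = pickle.load(fh)
    layer = network["layer_1"]
    first_slice = min(layer)
    print(layer[first_slice]["time"], len(slice_edges(layer[first_slice])))
```

If the matrices turn out to be directed, dropping the i < j filter keeps every stored entry.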

authority_file_cac_alma.parquet: Parquet table of person authority records (ALMA) with crosswalk to CAC. One row per person (final_id); includes alma_name (authority heading), alma_born/alma_died (years, may be missing/NaN), cac_id (linked CAC identifier, optional), and first_polish_pub (earliest Polish publication year, optional).
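The authority table can be read with pandas (pd.read_parquet needs a Parquet engine such as pyarrow). The sketch below uses a small made-up DataFrame with the documented column names to show how missing years and optional CAC links might be handled; the helper with_lifespan and the derived lifespan column are illustrative assumptions, not part of the dataset.

```python
from pathlib import Path

import pandas as pd


def with_lifespan(authority):
    """Add a 'lifespan' column; it stays NaN when either year is missing."""
    out = authority.copy()
    out["lifespan"] = out["alma_died"] - out["alma_born"]
    return out


# Toy table using the documented column names (values are made up).
toy_authority = pd.DataFrame(
    {
        "final_id": [1, 2, 3],
        "alma_name": ["Person A", "Person B", "Person C"],
        "alma_born": [1500.0, float("nan"), 1550.0],
        "alma_died": [1560.0, 1600.0, float("nan")],
        "cac_id": [10.0, float("nan"), 12.0],
    }
)
toy_enriched = with_lifespan(toy_authority)
# Rows that carry a CAC link (cac_id is optional).
toy_linked = toy_enriched[toy_enriched["cac_id"].notna()]

# With the real file in the working directory:
authority_path = Path("authority_file_cac_alma.parquet")
if authority_path.exists():
    authority = pd.read_parquet(authority_path)
    print(with_lifespan(authority).head())
```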

ALMA_matched.pkl is a pickled Python object (typically a list[dict]) containing enriched ALMA bibliographic records with normalized titles, date ranges, places, and reconciled contributor/subject entities.

Each element (row) is one ALMA record with these fields (as observed):

ALMA_id (str): ALMA MMS/bib identifier for the record.
autors_list (list[str] | None): list of main author name strings (may be missing).
title (str | None): main title.
subtitle (str | None): subtitle (may be missing).
publication_city (list[str] | None): place statements as extracted strings (often “a: … | d: …”).
date_start (datetime.date | None): normalized start date for the record’s date span.
date_end (datetime.date | None): normalized end date for the record’s date span.
record_type (str | None): high-level type label (e.g., “Tekst”, “Obraz”).
all_names (numpy array of object | None): all person-name strings associated with the record (authors + contributors + other roles), stored as a NumPy array.
all_names_final_id (tuple[int] | None): tuple of IDs aligned with all_names (same order) and matchable against the authority file's final_id.
contributing_persons (list[tuple[str, list[str] | None]] | None): contributor persons as (name, roles) pairs, where roles is a list of role labels (e.g., ['Oprac.'], ['Redakcja']) or None.
contributing_organizations (list[tuple[str, str | None]] | None): contributor organizations as (org_name, role) pairs (role may be None).
genre (list[str] | None): genre/form terms (can contain duplicates).
subjects_people (list[str] | None): subject people headings (optional).
subjects_topics (list[str] | None): topical subject headings (optional).
subjects_places (list[str] | None): geographic subject headings (optional).
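Because all_names and all_names_final_id are aligned by position, the two can be zipped to connect name strings to authority IDs. The following is a minimal sketch under the assumption that the pickle holds a list of dicts with the fields listed above; the helper names_by_final_id and the toy records are illustrative only.

```python
import pickle
from pathlib import Path


def names_by_final_id(records):
    """Map each final_id to the set of name strings recorded for it."""
    index = {}
    for rec in records:
        names = rec.get("all_names")
        ids = rec.get("all_names_final_id")
        # Either field may be None according to the field list.
        if names is None or ids is None:
            continue
        for name, final_id in zip(names, ids):
            # Defensive assumption: skip names that carry no ID.
            if final_id is None:
                continue
            index.setdefault(final_id, set()).add(str(name))
    return index


# Toy records with the documented field names (values are made up).
toy_alma = [
    {"ALMA_id": "a1", "all_names": ["Kowalski, Jan", "Nowak, Piotr"],
     "all_names_final_id": (7, 8)},
    {"ALMA_id": "a2", "all_names": ["Kowalski, J."],
     "all_names_final_id": (7,)},
    {"ALMA_id": "a3", "all_names": None, "all_names_final_id": None},
]
toy_index = names_by_final_id(toy_alma)

# With the real file in the working directory:
alma_path = Path("ALMA_matched.pkl")
if alma_path.exists():
    with alma_path.open("rb") as fh:
        alma_records = pickle.load(fh)
    print(len(names_by_final_id(alma_records)))
```

The resulting keys can then be looked up in the final_id column of authority_file_cac_alma.parquet.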

CAC_matched.pkl is a pickled Python object (typically a list[dict]) containing normalized person-event records from CAC, with standardized place fields, date ranges, and a compact metadata dictionary describing the event.

Each element (row) represents one event instance with these fields (as observed):

person_id (int): internal person identifier (CAC person/entity ID).
event_place (str | None): event place label (human-readable).
event_place_parent (str | None): parent place/institution label (human-readable) - this is the value used for co-presence.
event_place_id (float | int | None): normalized place identifier for event_place (often float due to missing values/NaNs).
event_place_parent_id (float | int | None): normalized place identifier for event_place_parent.
date_start (pandas Timestamp | None): normalized start date (inclusive) for the event time span.
date_end (pandas Timestamp | None): normalized end date (inclusive) for the event time span.
event_metadata (dict | None): event-specific attributes; keys vary by event type, commonly:
- event_type (str): e.g., “uzyskanie stopnia”, “koniec/ustanie funkcji”
- degree_type (str, optional): e.g., “magister”
- science_field (str, optional): e.g., “sztuki wyzwolone/filozofia”
- position (str, optional): e.g., “prowizor”
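Since event_metadata may be None and its keys vary by event type, lookups are safest when they tolerate missing values. The sketch below filters events by event_type and counts degree types; it assumes the pickle is a list of dicts as described, and the helpers events_of_type and degree_counts are illustrative names rather than anything shipped with the dataset. If the real file stores pandas Timestamps, pandas must be installed for unpickling.

```python
import pickle
from collections import Counter
from pathlib import Path


def events_of_type(events, event_type):
    """Return the events whose metadata carries the given event_type."""
    selected = []
    for ev in events:
        meta = ev.get("event_metadata") or {}  # tolerate None metadata
        if meta.get("event_type") == event_type:
            selected.append(ev)
    return selected


def degree_counts(events):
    """Count degree_type values among degree-award events."""
    degrees = events_of_type(events, "uzyskanie stopnia")
    return Counter(ev["event_metadata"].get("degree_type") for ev in degrees)


# Toy events using the documented fields (values are made up).
toy_events = [
    {"person_id": 1, "event_metadata": {"event_type": "uzyskanie stopnia",
                                        "degree_type": "magister"}},
    {"person_id": 2, "event_metadata": {"event_type": "koniec/ustanie funkcji",
                                        "position": "prowizor"}},
    {"person_id": 3, "event_metadata": None},
    {"person_id": 4, "event_metadata": {"event_type": "uzyskanie stopnia",
                                        "degree_type": "magister"}},
]
toy_degree_counts = degree_counts(toy_events)

# With the real file in the working directory:
cac_path = Path("CAC_matched.pkl")
if cac_path.exists():
    with cac_path.open("rb") as fh:
        cac_events = pickle.load(fh)
    print(degree_counts(cac_events).most_common(5))
```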

time_id.pkl is a pickled Python dictionary that assigns a unique integer ID to each discrete time slice (half-year steps).
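To illustrate what a half-year time index can look like, the sketch below builds a toy mapping from consecutive integer IDs to slice start dates. It is purely hypothetical: the slice boundaries (1 January and 1 July), the starting ID, and the direction of the mapping are assumptions, and the actual keys and values in time_id.pkl should be inspected after loading.

```python
import datetime
import pickle
from pathlib import Path


def half_year_ids(first_year, last_year, start_id=0):
    """Build a toy {id: date} mapping with two slices per year."""
    mapping = {}
    next_id = start_id
    for year in range(first_year, last_year + 1):
        # Assumption: slices begin on 1 January and 1 July.
        for month in (1, 7):
            mapping[next_id] = datetime.date(year, month, 1)
            next_id += 1
    return mapping


toy_time_id = half_year_ids(1600, 1601)

# With the real file in the working directory, inspect a few entries to
# learn the actual key and value types before relying on them.
time_id_path = Path("time_id.pkl")
if time_id_path.exists():
    with time_id_path.open("rb") as fh:
        time_id = pickle.load(fh)
    print(list(time_id.items())[:5])
```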

For more information on how to use the files, see luizdovalle2/CHExNET---Analysis and the article "CHExNet: A 400-years Multilayer Network of Early Modern Collaboration at the Jagiellonian University".

Files

Files (14.6 MB)

Name	MD5	Size
ALMA_matched.pkl	584e58250e723ae8949d89bc84917ec1	4.0 MB
authority_file_cac_alma.parquet	0319c5f04c4a74bbc6cf2b5bf18a1722	126.7 kB
CAC_matched.pkl	89dc922da995db35bf57acef8adf2d34	3.2 MB
CHExNet.pkl	262ebb821b1691ab88c6840ba251a296	7.2 MB
time_id.pkl	d69a1100dd6da463a065037ce8aa1e73	15.8 kB
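The MD5 checksums listed above can be used to confirm that downloads are intact before unpickling anything. The following is a minimal sketch using only the standard library; the helper names md5_of and verify are illustrative, and the files are assumed to sit in a single local directory.

```python
import hashlib
from pathlib import Path

# Checksums copied from the file listing above.
EXPECTED_MD5 = {
    "ALMA_matched.pkl": "584e58250e723ae8949d89bc84917ec1",
    "authority_file_cac_alma.parquet": "0319c5f04c4a74bbc6cf2b5bf18a1722",
    "CAC_matched.pkl": "89dc922da995db35bf57acef8adf2d34",
    "CHExNet.pkl": "262ebb821b1691ab88c6840ba251a296",
    "time_id.pkl": "d69a1100dd6da463a065037ce8aa1e73",
}


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(directory="."):
    """Return {name: True/False/None}; None means the file was not found."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = (md5_of(path) == expected) if path.exists() else None
    return results


if __name__ == "__main__":
    for name, ok in verify().items():
        status = "missing" if ok is None else ("ok" if ok else "MISMATCH")
        print(f"{name}: {status}")
```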

