Published February 20, 2026 | Version 0.0.2
Dataset Open

CHExNet: A network of collaborators in early modern Jagiellonian University

  • 1. Jagiellonian University

Description

The present repository contains the following files:

CHExNet.pkl network file: a two-layer temporal network stored as a nested Python dictionary with top-level keys layer_1 (AUJ, co-presence) and layer_2 (BJ, collaboration). Each layer maps discrete time-slice IDs (e.g., 9394, …) to a record with: time (Python datetime.date), matrix (SciPy CSR sparse weighted adjacency matrix, shape N×N), and ids_pos_mat (dict mapping authority file final_id to row/column indices in matrix). The CSR format stores only the nonzero entries, which keeps the mostly empty adjacency matrices compact.
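As an illustration of the structure described above, the following minimal sketch turns one time-slice record into a list of weighted edges keyed by final_id. It works on a toy record rather than the real file; the final_id values, the weights, and the integer type of the time-slice key are assumptions made for the example, and weighted_edges is a hypothetical helper, not part of the dataset's tooling.

```python
import datetime

import numpy as np
from scipy.sparse import csr_matrix


def weighted_edges(record):
    """Return (final_id_u, final_id_v, weight) for each stored nonzero entry."""
    # ids_pos_mat maps final_id -> row/column index; invert it for lookups.
    pos_to_id = {pos: fid for fid, pos in record["ids_pos_mat"].items()}
    coo = record["matrix"].tocoo()
    return [
        (pos_to_id[int(r)], pos_to_id[int(c)], float(w))
        for r, c, w in zip(coo.row, coo.col, coo.data)
    ]


# Toy stand-in for one time slice; IDs and weights are invented.
toy_record = {
    "time": datetime.date(1600, 1, 1),
    "matrix": csr_matrix(np.array([[0, 2, 0], [2, 0, 1], [0, 1, 0]])),
    "ids_pos_mat": {101: 0, 202: 1, 303: 2},
}
toy_network = {"layer_1": {9394: toy_record}, "layer_2": {}}

edges = weighted_edges(toy_network["layer_1"][9394])

# With the real file, the same helper would be applied after loading, e.g.:
#   import pickle
#   with open("CHExNet.pkl", "rb") as fh:
#       network = pickle.load(fh)
```

If a matrix is symmetric, each undirected tie appears twice in the result, once per direction. Since unpickling can execute arbitrary code, pickle files should only be loaded from a source you trust.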

 

authority_file_cac_alma.parquet: Parquet table of person authority records (ALMA) with crosswalk to CAC. One row per person (final_id); includes alma_name (authority heading), alma_born/alma_died (years, may be missing/NaN), cac_id (linked CAC identifier, optional), and first_polish_pub (earliest Polish publication year, optional).
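A short sketch of how the optional columns might be handled with pandas. The toy rows are invented, the float dtype of the year and cac_id columns is an assumption, and known_lifespans is a hypothetical helper introduced only for this example.

```python
import numpy as np
import pandas as pd


def known_lifespans(authority):
    """Keep rows with both birth and death years and add a lifespan column."""
    known = authority.dropna(subset=["alma_born", "alma_died"]).copy()
    known["lifespan"] = known["alma_died"] - known["alma_born"]
    return known


# Toy rows; all values are invented.
toy_authority = pd.DataFrame(
    {
        "final_id": [101, 202, 303],
        "alma_name": ["Person A", "Person B", "Person C"],
        "alma_born": [1550.0, np.nan, 1601.0],
        "alma_died": [1610.0, 1650.0, np.nan],
        "cac_id": [7.0, np.nan, np.nan],
        "first_polish_pub": [np.nan, 1640.0, np.nan],
    }
)

lifespans = known_lifespans(toy_authority)
# Persons that have a crosswalk to CAC.
linked_to_cac = toy_authority[toy_authority["cac_id"].notna()]

# With the real file (requires a Parquet engine such as pyarrow):
#   authority = pd.read_parquet("authority_file_cac_alma.parquet")
```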

 

ALMA_matched.pkl is a pickled Python object (typically a list[dict]) containing enriched ALMA bibliographic records with normalized titles, date ranges, places, and reconciled contributor/subject entities.

Each element (row) is one ALMA record with these fields (as observed):

  • ALMA_id (str): ALMA MMS/bib identifier for the record.

  • autors_list (list[str] | None): list of main author name strings (may be missing). 

  • title (str | None): main title. 

  • subtitle (str | None): subtitle (may be missing). 

  • publication_city (list[str] | None): place statements as extracted strings (often “a: … | d: …”). 

  • date_start (datetime.date | None): normalized start date for the record’s date span. 

  • date_end (datetime.date | None): normalized end date for the record’s date span. 

  • record_type (str | None): high-level type label (e.g., “Tekst”, “Obraz”). 

  • all_names (numpy array of object | None): all person-name strings associated with the record (authors + contributors + other roles), stored as a NumPy array. 

  • all_names_final_id (tuple[int] | None): tuple of IDs aligned with all_names (same order) and matchable with the authority file's final_id.

  • contributing_persons (list[tuple[str, list[str] | None]] | None): contributor persons as (name, roles) pairs, where roles is a list of role labels (e.g., ['Oprac.'] or ['Redakcja']) or None.

  • contributing_organizations (list[tuple[str, str | None]] | None): contributor organizations as (org_name, role) pairs (role may be None). 

  • genre (list[str] | None): genre/form terms (can contain duplicates). 

  • subjects_people (list[str] | None): subject people headings (optional). 

  • subjects_topics (list[str] | None): topical subject headings (optional). 

  • subjects_places (list[str] | None): geographic subject headings (optional). 
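The field list above can be made concrete with a small sketch that pairs names with their final_id and flattens contributor roles while tolerating missing values. The toy record is invented and uses a plain list where the real data holds a NumPy array; both helper functions are hypothetical names chosen for illustration.

```python
def name_id_pairs(record):
    """Pair each entry of all_names with its final_id; [] if either is missing."""
    names = record.get("all_names")
    ids = record.get("all_names_final_id")
    if names is None or ids is None:
        return []
    if len(names) != len(ids):
        raise ValueError(f"misaligned names/ids in record {record.get('ALMA_id')}")
    return list(zip(names, ids))


def contributor_roles(record):
    """Flatten contributing_persons into (name, role) rows; role is None if absent."""
    rows = []
    for name, roles in record.get("contributing_persons") or []:
        for role in roles or [None]:
            rows.append((name, role))
    return rows


# Toy record; identifiers and names are invented.
toy_alma_record = {
    "ALMA_id": "toy-0001",
    "all_names": ["Person A", "Person B"],
    "all_names_final_id": (101, 202),
    "contributing_persons": [("Person B", ["Oprac."]), ("Person C", None)],
}

pairs = name_id_pairs(toy_alma_record)
roles = contributor_roles(toy_alma_record)
```

The resulting (name, final_id) pairs can then be joined against the authority table on final_id.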

 

CAC_matched.pkl is a pickled Python object (typically a list[dict]) containing normalized person-event records from CAC, with standardized place fields, date ranges, and a compact metadata dictionary describing the event.

Each element (row) represents one event instance with these fields (as observed):

  • person_id (int): internal person identifier (CAC person/entity ID).

  • event_place (str | None): event place label (human-readable).

  • event_place_parent (str | None): parent place/institution label (human-readable); this is the value used for co-presence.

  • event_place_id (float | int | None): normalized place identifier for event_place (often float due to missing values/NaNs).

  • event_place_parent_id (float | int | None): normalized place identifier for event_place_parent.

  • date_start (pandas Timestamp | None): normalized start date (inclusive) for the event time span.

  • date_end (pandas Timestamp | None): normalized end date (inclusive) for the event time span.

  • event_metadata (dict | None): event-specific attributes; keys vary by event type, commonly:

    • event_type (str): e.g., “uzyskanie stopnia”, “koniec/ustanie funkcji”

    • degree_type (str, optional): e.g., “magister”

    • science_field (str, optional): e.g., “sztuki wyzwolone/filozofia”

    • position (str, optional): e.g., “prowizor”
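Because place IDs are often stored as floats and event_metadata may be None, some defensive handling is useful when iterating over the records. The sketch below assumes the list-of-dicts layout described above; the toy events are invented and the two helpers are hypothetical.

```python
import math
from collections import Counter


def clean_place_id(value):
    """Convert a float/int place ID to int, mapping None and NaN to None."""
    if value is None:
        return None
    if isinstance(value, float) and math.isnan(value):
        return None
    return int(value)


def event_type_counts(records):
    """Count events per event_type, skipping records without that key."""
    counts = Counter()
    for rec in records:
        meta = rec.get("event_metadata") or {}
        if "event_type" in meta:
            counts[meta["event_type"]] += 1
    return counts


# Toy events; person and place IDs are invented.
toy_events = [
    {
        "person_id": 1,
        "event_place_parent_id": 12.0,
        "event_metadata": {"event_type": "uzyskanie stopnia", "degree_type": "magister"},
    },
    {"person_id": 2, "event_place_parent_id": float("nan"), "event_metadata": None},
    {
        "person_id": 1,
        "event_place_parent_id": None,
        "event_metadata": {"event_type": "uzyskanie stopnia"},
    },
]

counts = event_type_counts(toy_events)
place_ids = [clean_place_id(e["event_place_parent_id"]) for e in toy_events]
```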

 

time_id.pkl is a pickled Python dictionary that assigns a unique integer ID to each discrete time slice (half-year steps).
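The description does not state the direction of this mapping. The sketch below assumes that keys are slice start dates and values are integer IDs, and builds the reverse lookup while checking that IDs are unique; if the real dictionary is keyed by ID instead, the inversion is simply unnecessary. The dates, the IDs, and the invert_unique helper are invented for illustration.

```python
import datetime


def invert_unique(mapping):
    """Invert a dict, raising if two keys share the same value."""
    inverse = {}
    for key, value in mapping.items():
        if value in inverse:
            raise ValueError(f"value {value!r} is not unique")
        inverse[value] = key
    return inverse


# Toy stand-in. ASSUMPTION: date -> integer ID, with half-year steps
# starting on 1 January and 1 July; the real file may differ.
toy_time_id = {
    datetime.date(1600, 1, 1): 0,
    datetime.date(1600, 7, 1): 1,
    datetime.date(1601, 1, 1): 2,
}

id_to_time = invert_unique(toy_time_id)
```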

 

For more information on how to use the files, see the repository luizdovalle2/CHExNET---Analysis and the article "CHExNet: A 400-years Multilayer Network of Early Modern Collaboration at the Jagiellonian University".

Files (14.6 MB)

  • md5:584e58250e723ae8949d89bc84917ec1 (4.0 MB)

  • md5:0319c5f04c4a74bbc6cf2b5bf18a1722 (126.7 kB)

  • md5:89dc922da995db35bf57acef8adf2d34 (3.2 MB)

  • md5:262ebb821b1691ab88c6840ba251a296 (7.2 MB)

  • md5:d69a1100dd6da463a065037ce8aa1e73 (15.8 kB)
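A downloaded file can be checked against its listed checksum with the standard library. This is a minimal sketch: md5_of is a hypothetical helper, and since the listing here does not pair checksums with file names, the expected value has to be taken from the entry of the file you downloaded.

```python
import hashlib


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Example use (the expected value comes from the listing, without the "md5:" prefix):
#   assert md5_of("CHExNet.pkl") == expected_hex_digest
```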