CHExNet: A network of collaborators in early modern Jagiellonian University
Description
This repository contains the following files:
CHExNet.pkl: the network file, a two-layer temporal network stored as a nested Python dictionary with top-level keys layer_1 (AUJ, co-presence) and layer_2 (BJ, collaboration). Each layer maps discrete time-slice IDs (e.g., 93, 94, …) to a record with three keys: time (a Python datetime.date), matrix (a SciPy CSR sparse weighted adjacency matrix of shape N×N), and ids_pos_mat (a dict mapping the authority file's final_id to row/column indices in matrix). The CSR sparse format is used because the adjacency matrices are mostly zeros, so only the nonzero entries need to be stored.
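As a minimal sketch of reading this structure, the functions below load the pickle and turn one layer and time slice into (final_id, final_id, weight) triples by inverting ids_pos_mat. The helper names load_network and edges_for_slice are hypothetical. The undirected option assumes the adjacency matrices are symmetric, which is not stated above; pass undirected=False to read every stored entry instead.

```python
import pickle

from scipy import sparse


def load_network(path="CHExNet.pkl"):
    """Load the nested layer -> time slice -> record dictionary (SciPy must be installed)."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


def edges_for_slice(network, layer, slice_id, undirected=True):
    """Return (final_id_i, final_id_j, weight) triples for one layer and time slice.

    With undirected=True only the strict upper triangle is read (the diagonal is
    skipped), which assumes a symmetric matrix; this is an assumption to check
    against the data. With undirected=False every stored entry is returned.
    """
    record = network[layer][slice_id]
    # Invert ids_pos_mat (final_id -> matrix index) to get matrix index -> final_id.
    pos_to_id = {pos: fid for fid, pos in record["ids_pos_mat"].items()}
    matrix = record["matrix"]
    coo = sparse.triu(matrix, k=1).tocoo() if undirected else matrix.tocoo()
    return [
        (pos_to_id[int(i)], pos_to_id[int(j)], float(w))
        for i, j, w in zip(coo.row, coo.col, coo.data)
    ]
```

For example, edges_for_slice(load_network(), "layer_2", 93) would list the weighted collaboration edges of slice 93, provided that slice exists in layer_2.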
authority_file_cac_alma.parquet: a Parquet table of person authority records (ALMA) with a crosswalk to CAC. It has one row per person (final_id) and includes alma_name (authority heading), alma_born/alma_died (years; may be missing/NaN), cac_id (linked CAC identifier; optional), and first_polish_pub (year of the earliest Polish publication; optional).
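A sketch, assuming pandas and a Parquet engine (such as pyarrow or fastparquet) are installed, of attaching authority headings to network nodes and keeping only persons with a CAC link. The function names are hypothetical; the column names are the ones listed above.

```python
import pandas as pd


def load_authority(path="authority_file_cac_alma.parquet"):
    """Read the authority table (requires a Parquet engine such as pyarrow)."""
    return pd.read_parquet(path)


def node_labels(authority, final_ids):
    """Map each final_id to its alma_name; IDs absent from the table map to None."""
    names = authority.set_index("final_id")["alma_name"]
    return {fid: names.get(fid) for fid in final_ids}


def linked_to_cac(authority):
    """Keep only persons with a non-missing cac_id, i.e. those crosswalked to CAC."""
    return authority[authority["cac_id"].notna()]
```

To label the nodes of one network time slice, node_labels can be passed the final_id keys of that slice's ids_pos_mat.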
ALMA_matched.pkl: a pickled Python object (typically a list[dict]) containing enriched ALMA bibliographic records with normalized titles, date ranges, places, and reconciled contributor/subject entities.
Each element (row) is one ALMA record with these fields (as observed):
- ALMA_id (str): ALMA MMS/bib identifier for the record.
- autors_list (list[str] | None): list of main author name strings (may be missing).
- title (str | None): main title.
- subtitle (str | None): subtitle (may be missing).
- publication_city (list[str] | None): place statements as extracted strings (often “a: … | d: …”).
- date_start (datetime.date | None): normalized start date of the record's date span.
- date_end (datetime.date | None): normalized end date of the record's date span.
- record_type (str | None): high-level type label (e.g., “Tekst” (text), “Obraz” (image)).
- all_names (NumPy array of object | None): all person-name strings associated with the record (authors, contributors, and other roles).
- all_names_final_id (tuple[int] | None): tuple of IDs aligned with all_names (same order) and matchable with the authority file's final_id.
- contributing_persons (list[tuple[str, list[str] | None]] | None): contributor persons as (name, roles) pairs, where roles is a list of role labels (e.g., ['Oprac.'], ['Redakcja']) or None.
- contributing_organizations (list[tuple[str, str | None]] | None): contributor organizations as (org_name, role) pairs (role may be None).
- genre (list[str] | None): genre/form terms (may contain duplicates).
- subjects_people (list[str] | None): subject headings for people (optional).
- subjects_topics (list[str] | None): topical subject headings (optional).
- subjects_places (list[str] | None): geographic subject headings (optional).
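The aligned all_names and all_names_final_id fields make it possible to go from a bibliographic record to persons in the authority file. The hypothetical sketch below pairs each name with its ID and indexes ALMA_id values by final_id. It assumes the pickle holds a list[dict] as described, and it only illustrates the fields; it is not the procedure used to build layer_2.

```python
import pickle
from collections import defaultdict


def load_alma(path="ALMA_matched.pkl"):
    """Load the list of ALMA record dictionaries."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


def name_id_pairs(record):
    """Pair each string in all_names with its aligned final_id (same order)."""
    names = record.get("all_names")
    ids = record.get("all_names_final_id")
    if names is None or ids is None:
        return []
    # all_names may be a NumPy object array; list() turns it into plain strings.
    return list(zip(list(names), ids))


def records_by_person(records):
    """Index ALMA_id values by final_id, listing each record at most once per person."""
    index = defaultdict(list)
    for rec in records:
        ids = rec.get("all_names_final_id")
        if ids is None:
            continue
        # dict.fromkeys drops repeated IDs within one record while keeping order.
        for fid in dict.fromkeys(ids):
            index[fid].append(rec["ALMA_id"])
    return dict(index)
```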
CAC_matched.pkl: a pickled Python object (typically a list[dict]) containing normalized person-event records from CAC, with standardized place fields, date ranges, and a compact metadata dictionary describing each event.
Each element (row) represents one event instance with these fields (as observed):
- person_id (int): internal person identifier (CAC person/entity ID).
- event_place (str | None): human-readable label of the event place.
- event_place_parent (str | None): human-readable label of the parent place/institution; this is the value used for co-presence.
- event_place_id (float | int | None): normalized place identifier for event_place (often a float, because missing values are stored as NaN).
- event_place_parent_id (float | int | None): normalized place identifier for event_place_parent.
- date_start (pandas Timestamp | None): normalized start date (inclusive) of the event time span.
- date_end (pandas Timestamp | None): normalized end date (inclusive) of the event time span.
- event_metadata (dict | None): event-specific attributes; the keys vary by event type. Common keys:
  - event_type (str): e.g., “uzyskanie stopnia” (obtaining a degree), “koniec/ustanie funkcji” (end/cessation of an office)
  - degree_type (str, optional): e.g., “magister” (master)
  - science_field (str, optional): e.g., “sztuki wyzwolone/filozofia” (liberal arts/philosophy)
  - position (str, optional): e.g., “prowizor” (provisor)
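Because event_place_parent is the value used for co-presence, one simplified way to test whether two CAC events put two people at the same institution at the same time is to compare event_place_parent_id and check that the inclusive date spans overlap. This is a hypothetical rule for illustration only; the exact criterion used to build layer_1 (for example, how spans are mapped to half-year slices) may differ, so see the article for the actual construction. The helper names are likewise hypothetical.

```python
import pandas as pd


def co_present(ev_a, ev_b):
    """Hypothetical rule: same event_place_parent_id and overlapping inclusive spans.

    Events with a missing place ID or date are treated as not co-present.
    """
    place_a = ev_a.get("event_place_parent_id")
    place_b = ev_b.get("event_place_parent_id")
    # pd.isna covers None, NaN and NaT; 5.0 == 5 compares equal.
    if pd.isna(place_a) or pd.isna(place_b) or place_a != place_b:
        return False
    dates = [ev_a.get("date_start"), ev_a.get("date_end"),
             ev_b.get("date_start"), ev_b.get("date_end")]
    if any(pd.isna(d) for d in dates):
        return False
    start_a, end_a, start_b, end_b = dates
    # Two closed intervals overlap when each starts no later than the other ends.
    return start_a <= end_b and start_b <= end_a


def events_of_type(events, event_type):
    """Keep events whose event_metadata dict has the given event_type."""
    return [
        e for e in events
        if isinstance(e.get("event_metadata"), dict)
        and e["event_metadata"].get("event_type") == event_type
    ]
```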
time_id.pkl: a pickled Python dictionary that assigns a unique integer ID to each discrete time slice (half-year steps).
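The direction of the time_id mapping is not specified above. Assuming it maps a time-slice label to its integer ID, the sketch below inverts it so that an integer key can be looked up. Whether these IDs are the same as the time-slice keys in CHExNet.pkl (such as 93) is a further assumption to verify, for instance by inspecting next(iter(time_id.items())). The function names are hypothetical.

```python
import pickle


def load_time_ids(path="time_id.pkl"):
    """Load the time-slice dictionary."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


def invert_time_ids(time_id):
    """Assuming time_id maps slice label -> integer ID, return integer ID -> slice label."""
    inverse = {v: k for k, v in time_id.items()}
    # The IDs are described as unique; fail loudly if that does not hold.
    if len(inverse) != len(time_id):
        raise ValueError("time_id values are not unique")
    return inverse
```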
For more information on how to use the files, see the luizdovalle2/CHExNET---Analysis repository and the article "CHExNet: A 400-years Multilayer Network of Early Modern Collaboration at the Jagiellonian University".