callahantiff/PheKnowLator: v3.0.2
- 1. CU Anschutz Medical Campus
- 2. Università degli Studi di Milano
Description
Release: v3.0.2
Website: https://github.com/callahantiff/PheKnowLator/wiki/v2.0.0
Data Access: Google Cloud Storage -- PheKnowLator Bucket
Docker Container: DockerHub Dedicated Project Container
PyPI: pkt-kg 3.0.2
Updated Jupyter Notebooks:
Updated Scripts:
- builds/data_preprocessing.py
- pkt_kg/metadata.py
- pkt_kg/utils/kg_utils.py
- builds/data_to_download.txt
- pkt_kg/utils/data_utils.py
- tests/test_data_utils_downloading.py
Updates
- Addresses issue #118 (PR: #119) by patching the functionality that obtains labels and definitions from ontologies. Whenever possible, the language encoding for these fields is now English. See the details below for how to address nodes containing foreign characters in builds made prior to this release.
Solution for Builds Prior to v3.0.2
The (bad_node_patch.json) file contains a dictionary where the outer keys are the entity_uri and the outer values are another dictionary whose inner keys are label and description/definition and whose inner values are the updated strings without foreign characters. An example of this dictionary is shown below:

    key = '<http://purl.obolibrary.org/obo/UBERON_0000468>'
    print(bad_node_patch[key])
    >>> {'label': 'multicellular organism', 'description/definition': 'Anatomical structure that is an individual member of a species and consists of more than one cell.'}

The code to identify the nodes with erroneous foreign characters is shown below:
    import re
    import pandas as pd

    # link to downloaded `NodeLabels.txt` file
    input_file = 'NodeLabels.txt'

    # load data as Pandas DataFrame
    nodedf = pd.read_csv(input_file, sep='\t', header=0)

    # identify bad nodes (labels containing characters in the range
    # \u4e00-\u9FFF) and filter DataFrame so it only contains these rows
    nodedf['bad'] = nodedf['label'].apply(
        lambda x: re.search("[\u4e00-\u9FFF]", x) if not pd.isna(x) else None)
    nodedf_bad_nodes = nodedf[~pd.isna(nodedf['bad'])].drop_duplicates()
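Once the bad nodes are identified, the patch dictionary described above can be used to overwrite their labels and definitions. The following is a minimal sketch rather than code from the release: the helper name `apply_node_patch` is hypothetical, and it assumes that `NodeLabels.txt` has an `entity_uri` column whose values match the outer keys of the patch, plus `label` and `description/definition` columns. The small DataFrame stands in for the real file, and its non-English label is invented for illustration.

```python
import pandas as pd


def apply_node_patch(nodedf, patch):
    """Return a copy of nodedf with patched labels and definitions.

    Hypothetical helper. Assumes nodedf has an 'entity_uri' column whose
    values match the outer keys of the patch dictionary.
    """
    patched = nodedf.copy()
    for uri, fields in patch.items():
        rows = patched['entity_uri'] == uri
        for column, value in fields.items():
            # only overwrite columns that actually exist in the data
            if column in patched.columns:
                patched.loc[rows, column] = value
    return patched


# in-memory stand-in for NodeLabels.txt (illustrative values only)
nodedf = pd.DataFrame({
    'entity_uri': ['<http://purl.obolibrary.org/obo/UBERON_0000468>',
                   '<http://example.org/unchanged>'],
    'label': ['多细胞生物', 'unchanged label'],
    'description/definition': ['placeholder', 'unchanged definition'],
})

# stand-in for the contents of bad_node_patch.json
bad_node_patch = {
    '<http://purl.obolibrary.org/obo/UBERON_0000468>': {
        'label': 'multicellular organism',
        'description/definition': 'Anatomical structure that is an individual '
                                  'member of a species and consists of more '
                                  'than one cell.',
    }
}

patched_df = apply_node_patch(nodedf, bad_node_patch)
```

With the real files, the patch would presumably be loaded with `json.load` and the node data with `pd.read_csv(..., sep='\t')`. Rows whose `entity_uri` is not in the patch are left untouched, and the input DataFrame is not modified.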
Files
| Name | Size | MD5 |
|---|---|---|
| callahantiff/PheKnowLator-v3.0.2.zip | 64.1 MB | 46b9b639e9eb6662358e60ebdb33b57f |
Additional details
Related works
- Is supplement to: https://github.com/callahantiff/PheKnowLator/tree/v3.0.2 (URL)