HobnobMancer/cazy_webscraper: v2.3.0

Emma Hobbs; Leighton Pritchard

doi:10.5281/zenodo.7962936

Published May 23, 2023 | Version v2.3.0

Software Open


1. University of St Andrews
2. University of Strathclyde

What's Changed

Issue 111 + 112 uniprot by @HobnobMancer in https://github.com/HobnobMancer/cazy_webscraper/pull/115

Full Changelog: https://github.com/HobnobMancer/cazy_webscraper/compare/v2.2.8...v2.3.0

New in version 2.3.0

Downloading protein data from UniProt is several orders of magnitude faster than before, and should cause fewer issues when using older versions of bioservices
- Uses bioservices mapping to map directly from NCBI protein version accessions to UniProt
- cw_get_uniprot_data no longer calls NCBI and thus no longer requires an email address as a positional argument
Updated database schema: Changed Genbanks 1--* Uniprots to Genbanks *--1 Uniprots. Uniprots.uniprot_id is now listed in the Genbanks table, instead of listing Genbanks.genbank_id in the Uniprots table
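The new direction of the relationship can be illustrated with a minimal sketch using Python's built-in sqlite3 module. Only the table names Genbanks and Uniprots and the columns genbank_id and uniprot_id come from the notes above; the accession columns, the sample rows and the helper function are assumptions made for illustration, and the real tables contain more columns.

```python
import sqlite3

# Illustrative, cut-down versions of the two tables (assumed columns).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Uniprots (
        uniprot_id INTEGER PRIMARY KEY,
        uniprot_accession TEXT UNIQUE
    );
    CREATE TABLE Genbanks (
        genbank_id INTEGER PRIMARY KEY,
        genbank_accession TEXT UNIQUE,
        -- Many Genbanks rows may point at one Uniprots row (*--1)
        uniprot_id INTEGER REFERENCES Uniprots (uniprot_id)
    );
    """
)

# Placeholder accessions, not real records.
conn.execute("INSERT INTO Uniprots VALUES (1, 'UNIPROT_A')")
conn.executemany(
    "INSERT INTO Genbanks VALUES (?, ?, ?)",
    [(1, "GENBANK_A.1", 1), (2, "GENBANK_B.1", 1), (3, "GENBANK_C.1", None)],
)


def genbanks_for_uniprot(conn, uniprot_accession):
    """Return the GenBank accessions linked to one UniProt accession."""
    rows = conn.execute(
        """
        SELECT g.genbank_accession
        FROM Genbanks AS g
        JOIN Uniprots AS u ON g.uniprot_id = u.uniprot_id
        WHERE u.uniprot_accession = ?
        ORDER BY g.genbank_accession
        """,
        (uniprot_accession,),
    ).fetchall()
    return [row[0] for row in rows]


print(genbanks_for_uniprot(conn, "UNIPROT_A"))
```

Queries written against a database built with an earlier version, which joined on a genbank_id column in the Uniprots table, would need to be rewritten along these lines.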
Retrieve taxonomic classifications from UniProt
- Use the --taxonomy/-t flag to retrieve the scientific name (genus and species) for proteins of interest
- Adds downloaded taxonomic information to the UniprotsTaxs table
Clarified how old records are deleted when using cw_get_uniprot_data
- Separate arguments delete Genbanks-EC number and Genbanks-PDB accession relationships that are no longer listed in UniProt, for those proteins in the local CAZyme database whose data is downloaded from UniProt
- New args:
  - --delete_old_ec_relationships = deletes Genbank(protein)-EC number relationships no longer in UniProt
  - --delete_old_ecs = deletes EC numbers in the local db not linked to any proteins
  - --delete_old_pdb_relationships = deletes Genbank(protein)-PDB relationships no longer in UniProt
  - --delete_old_pdbs = deletes PDB accessions in the local db not linked to any proteins
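The difference between deleting a relationship and deleting an unlinked record can be sketched as follows. This is not the code used by cw_get_uniprot_data: the table names Ecs and Genbanks_Ecs, their columns and the sample rows are hypothetical, chosen only to show the kind of clean-up that --delete_old_ecs describes (removing EC numbers that no protein is linked to any more).

```python
import sqlite3

# Hypothetical tables: EC numbers plus a link table to proteins.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Ecs (ec_id INTEGER PRIMARY KEY, ec_number TEXT UNIQUE);
    CREATE TABLE Genbanks_Ecs (genbank_id INTEGER, ec_id INTEGER);
    INSERT INTO Ecs VALUES (1, '3.2.1.4'), (2, '3.2.1.21');
    -- Only EC record 1 is still linked to a protein
    INSERT INTO Genbanks_Ecs VALUES (10, 1);
    """
)


def delete_unlinked_ecs(conn):
    """Delete EC records with no protein link; return how many were removed."""
    cursor = conn.execute(
        """
        DELETE FROM Ecs
        WHERE NOT EXISTS (
            SELECT 1 FROM Genbanks_Ecs AS ge WHERE ge.ec_id = Ecs.ec_id
        )
        """
    )
    conn.commit()
    return cursor.rowcount


removed = delete_unlinked_ecs(conn)
remaining = [row[0] for row in conn.execute("SELECT ec_number FROM Ecs")]
print(removed, remaining)
```

Deleting a stale relationship would correspond to removing rows from the link table, after which a clean-up like the one above removes any record left without links.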
Retrieve the local db schema
- New command cw_get_db_schema added.
- Retrieves the SQLite schema of a local CAZyme database and prints it to the terminal
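For readers who want the same information from within Python, SQLite stores the CREATE statements of every table and index in its sqlite_master table. The sketch below reads them with the standard sqlite3 module; it is an assumption about one way to do this, not the implementation of cw_get_db_schema, and the Demo table and function name are placeholders.

```python
import sqlite3


def read_sqlite_schema(conn):
    """Return the CREATE statements stored in an SQLite database as one string."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master WHERE sql IS NOT NULL ORDER BY name"
    ).fetchall()
    return "\n".join(row[0] + ";" for row in rows)


# Demonstration on an in-memory database with a placeholder table.
# For a local CAZyme database, connect to its file path instead.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Demo (demo_id INTEGER PRIMARY KEY, name TEXT)")
schema = read_sqlite_schema(conn)
print(schema)
```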
Added option to skip retrieving the latest taxonomic classifications from NCBI
- By default, when retrieving data from CAZy, cazy_webscraper retrieves the latest taxonomic classifications from NCBI for proteins listed under multiple taxa
- To reduce scraping time, and to reduce the burden on the NCBI-Entrez server, this step can be skipped with the new --skip_ncbi_tax flag if the data is not needed (e.g. GTDB taxonomies will be used)
- When retrieval of the latest taxonomic classifications from NCBI is skipped, cazy_webscraper adds the first taxon retrieved from CAZy for those proteins listed under multiple taxa
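The fallback described in the last point amounts to keeping the first taxon seen for each protein. A minimal sketch, assuming the scraped data is held as a dict mapping a protein accession to the list of taxa in the order they were retrieved (this data structure, the function name and the sample values are hypothetical):

```python
def pick_first_taxa(protein_taxa):
    """Map each protein accession to the first taxon retrieved for it."""
    return {
        accession: taxa[0]
        for accession, taxa in protein_taxa.items()
        if taxa  # skip proteins with no taxon at all
    }


# Placeholder accessions and taxa, in retrieval order.
scraped = {
    "PROTEIN_A.1": ["Genus one", "Genus two"],
    "PROTEIN_B.1": ["Genus three"],
    "PROTEIN_C.1": [],
}
chosen = pick_first_taxa(scraped)
print(chosen)
```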

Files

HobnobMancer/cazy_webscraper-v2.3.0.zip

Files (1.7 MB)

Name	Size
HobnobMancer/cazy_webscraper-v2.3.0.zip (md5:99b52f0d1f697f537c5425bf516157a1)	1.7 MB

Additional details

Is supplement to: https://github.com/HobnobMancer/cazy_webscraper/tree/v2.3.0 (URL)

