HobnobMancer/cazy_webscraper: v2.3.0
Description
What's Changed
- Issue 111 + 112 uniprot by @HobnobMancer in https://github.com/HobnobMancer/cazy_webscraper/pull/115
Full Changelog: https://github.com/HobnobMancer/cazy_webscraper/compare/v2.2.8...v2.3.0
New in version 2.3.0
Downloading protein data from UniProt is several orders of magnitude faster than before, and should have fewer issues when using older versions of `bioservices`
- Uses `bioservices` mapping to map directly from NCBI protein version accessions to UniProt
- `cw_get_uniprot_data` no longer calls NCBI and thus no longer requires an email address as a positional argument
Updated database schema
- Changed `Genbanks 1--* Uniprots` to `Genbanks *--1 Uniprots`. `Uniprots.uniprot_id` is now listed in the `Genbanks` table, instead of listing `Genbanks.genbank_id` in the `Uniprots` table
Retrieve taxonomic classifications from UniProt
- Use the `--taxonomy`/`-t` flag to retrieve the scientific name (genus and species) for proteins of interest
- Adds downloaded taxonomic information to the `UniprotsTaxs` table
Clarified how old records are deleted when using `cw_get_uniprot_data`
- Separate arguments delete Genbanks-EC number and Genbanks-PDB accession relationships that are no longer listed in UniProt, for those proteins in the local CAZyme database for which data is downloaded from UniProt
- New args:
  - `--delete_old_ec_relationships` = deletes Genbank(protein)-EC number relationships no longer in UniProt
  - `--delete_old_ecs` = deletes EC numbers in the local db not linked to any proteins
  - `--delete_old_pdb_relationships` = deletes Genbank(protein)-PDB relationships no longer in UniProt
  - `--delete_old_pdbs` = deletes PDB accessions in the local db not linked to any proteins
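The difference between the two kinds of deletion flags can be sketched with a toy SQLite database. This is only an illustration of the behaviour described above, not the code `cw_get_uniprot_data` runs: the `Ecs` and `Genbanks_Ecs` tables, their columns, the accession, and the `uniprot_ecs` mapping are all hypothetical. Removing a relationship deletes the link row only; removing old EC numbers then deletes EC records left without any linked protein.

```python
import sqlite3

# Hypothetical, simplified tables: the real CAZyme database schema may differ.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Genbanks (genbank_id INTEGER PRIMARY KEY, accession TEXT);
    CREATE TABLE Ecs (ec_id INTEGER PRIMARY KEY, ec_number TEXT);
    CREATE TABLE Genbanks_Ecs (genbank_id INTEGER, ec_id INTEGER);
    INSERT INTO Genbanks VALUES (1, 'PROT_A.1');
    INSERT INTO Ecs VALUES (1, '3.2.1.4'), (2, '3.2.1.8');
    INSERT INTO Genbanks_Ecs VALUES (1, 1), (1, 2);
    """
)

# EC numbers that UniProt currently lists for each protein (made-up data).
uniprot_ecs = {"PROT_A.1": {"3.2.1.4"}}


def delete_old_ec_relationships(conn, uniprot_ecs):
    """Remove protein-EC links that UniProt no longer lists."""
    rows = conn.execute(
        """
        SELECT g.genbank_id, g.accession, e.ec_id, e.ec_number
        FROM Genbanks_Ecs AS ge
        JOIN Genbanks AS g ON g.genbank_id = ge.genbank_id
        JOIN Ecs AS e ON e.ec_id = ge.ec_id
        """
    ).fetchall()
    for genbank_id, accession, ec_id, ec_number in rows:
        # Only touch proteins for which data was downloaded from UniProt.
        if accession in uniprot_ecs and ec_number not in uniprot_ecs[accession]:
            conn.execute(
                "DELETE FROM Genbanks_Ecs WHERE genbank_id = ? AND ec_id = ?",
                (genbank_id, ec_id),
            )


def delete_old_ecs(conn):
    """Remove EC numbers that are no longer linked to any protein."""
    conn.execute(
        "DELETE FROM Ecs WHERE ec_id NOT IN (SELECT ec_id FROM Genbanks_Ecs)"
    )


delete_old_ec_relationships(conn, uniprot_ecs)
links_after = conn.execute("SELECT genbank_id, ec_id FROM Genbanks_Ecs").fetchall()
ecs_before_cleanup = conn.execute("SELECT ec_number FROM Ecs ORDER BY ec_id").fetchall()
delete_old_ecs(conn)
ecs_after_cleanup = conn.execute("SELECT ec_number FROM Ecs ORDER BY ec_id").fetchall()
```

In this sketch the stale link is removed first, and the orphaned EC number stays in the database until the second step runs, which is why the two operations are exposed as separate arguments.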
Retrieve the local db schema
- New command `cw_get_db_schema` added
- Retrieves the SQLite schema of a local CAZyme database and prints it to the terminal
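As a rough idea of what printing a SQLite schema involves, the sketch below reads the `CREATE` statements from SQLite's built-in `sqlite_master` catalogue using Python's standard `sqlite3` module. It is not the implementation of `cw_get_db_schema`. The toy database is an assumption made for illustration: only `Genbanks.genbank_id` and `Uniprots.uniprot_id` come from these notes, while `get_db_schema` and the accession columns are hypothetical. The foreign key mirrors the `Genbanks *--1 Uniprots` relationship, where many `Genbanks` rows may point at one `Uniprots` row.

```python
import sqlite3


def get_db_schema(conn):
    """Return the CREATE statements stored in SQLite's built-in catalogue."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master WHERE sql IS NOT NULL ORDER BY name"
    ).fetchall()
    return [sql for (sql,) in rows]


# Toy in-memory database; the column layout is an assumption, not the real schema.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Uniprots (uniprot_id INTEGER PRIMARY KEY, uniprot_accession TEXT);
    CREATE TABLE Genbanks (
        genbank_id INTEGER PRIMARY KEY,
        genbank_accession TEXT,
        uniprot_id INTEGER REFERENCES Uniprots (uniprot_id)
    );
    """
)

schema = get_db_schema(conn)
for statement in schema:
    print(statement)
```

To inspect a database on disk, the same function can be pointed at a connection opened with `sqlite3.connect("path/to/database")`, where the path is whatever local file is being examined.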
Added option to skip retrieving the latest taxonomic classifications from NCBI
- By default, when retrieving data from CAZy, `cazy_webscraper` retrieves the latest taxonomic classifications for proteins listed under multiple taxa
- To reduce scraping time and the burden on the NCBI Entrez server, this step can be skipped with the new `--skip_ncbi_tax` flag when the data is not needed (e.g. GTDB taxonomies will be used)
- When skipping retrieval of the latest taxonomic classifications from NCBI, `cazy_webscraper` will add the first taxon retrieved from CAZy for those proteins listed under multiple taxa
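The fallback described above can be pictured with a small function. This is a minimal sketch under stated assumptions, not `cazy_webscraper`'s own code: `assign_taxa`, the `ncbi_lookup` callable, and the sample data are hypothetical. Passing no lookup stands in for running with `--skip_ncbi_tax`.

```python
def assign_taxa(cazy_taxa, ncbi_lookup=None):
    """Pick one taxon per protein.

    cazy_taxa: dict mapping a protein accession to the taxa CAZy lists for it,
        in the order they were retrieved.
    ncbi_lookup: hypothetical callable returning the latest taxon for a
        protein; None mimics skipping the NCBI step.
    """
    assigned = {}
    for accession, taxa in cazy_taxa.items():
        if len(taxa) > 1 and ncbi_lookup is not None:
            # Default behaviour: resolve proteins listed under multiple taxa.
            assigned[accession] = ncbi_lookup(accession)
        else:
            # Skipped lookup (or a single taxon): keep the first taxon from CAZy.
            assigned[accession] = taxa[0]
    return assigned


# Made-up example data.
cazy_taxa = {
    "PROT_A.1": ["Taxon one", "Taxon two"],
    "PROT_B.1": ["Taxon three"],
}

skipped = assign_taxa(cazy_taxa)
resolved = assign_taxa(cazy_taxa, ncbi_lookup=lambda accession: "Taxon two")
```

With the lookup skipped, `PROT_A.1` keeps the first taxon listed for it; with the lookup supplied, only the protein listed under multiple taxa is re-resolved.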
Files
Name | Size | MD5 |
---|---|---|
HobnobMancer/cazy_webscraper-v2.3.0.zip | 1.7 MB | 99b52f0d1f697f537c5425bf516157a1 |
Additional details
Related works
- Is supplement to: https://github.com/HobnobMancer/cazy_webscraper/tree/v2.3.0 (URL)