Published December 4, 2023 | Version v1.0.0
Software
Open
BojarLab/glycowork: v1.0.0
Authors/Creators
Description
Change Log
- Added a Zenodo badge to provide a release-specific DOI for glycowork
glycan_data
- Updated sugarbase database; sugarbase is now pickled, so literal evaluations are no longer necessary
- Harmonized glycan column names across generated dataframes; all use 'glycan' now, 'target' has been deprecated
loader
- Updated `motif_list` to be compatible with new position encoding
- Added Internal_LewisX and Internal_LewisA to `motif_list` (renamed LewisX and LewisA to Terminal_LewisX and Terminal_LewisA, respectively)
- Made `df_species` static again to speed up package import
- Added `find_nth_reverse` helper function that finds the starting index of the nth occurrence of a substring from the end of the string
- Added `remove_unmatched_brackets` helper function to strip unmatched opening or closing brackets from glycan strings
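
To make the two new helpers concrete, here is a minimal sketch of what such functions could look like. This is not the glycowork implementation: the signatures, the non-overlapping counting, and the restriction to square branch brackets are all assumptions made for illustration.

```python
def find_nth_reverse(string: str, substring: str, n: int) -> int:
    """Start index of the nth occurrence of `substring`, counted from the end.

    Illustrative sketch (assumed signature); occurrences are counted without
    overlap and -1 is returned when there are fewer than n occurrences.
    """
    if n < 1:
        raise ValueError("n must be >= 1")
    end = len(string)
    idx = -1
    for _ in range(n):
        idx = string.rfind(substring, 0, end)
        if idx == -1:
            return -1
        end = idx  # continue searching strictly left of this hit
    return idx


def remove_unmatched_brackets(glycan: str) -> str:
    """Drop '[' or ']' characters that have no partner.

    Illustrative sketch; assumes only square (branch) brackets need repair.
    """
    open_positions = []  # indices of '[' still waiting for a ']'
    to_drop = set()
    for i, char in enumerate(glycan):
        if char == "[":
            open_positions.append(i)
        elif char == "]":
            if open_positions:
                open_positions.pop()
            else:
                to_drop.add(i)  # closing bracket without an opener
    to_drop.update(open_positions)  # openers that were never closed
    return "".join(c for i, c in enumerate(glycan) if i not in to_drop)


glycan = "Gal(b1-3)[Fuc(a1-4)]GlcNAc(b1-3)Gal"
print(find_nth_reverse(glycan, "(", 2))
print(remove_unmatched_brackets("Gal(b1-3)[Fuc(a1-4)GlcNAc"))
```

Collecting indices first and filtering once at the end keeps the bracket repair a single linear pass, which matters little for one glycan but adds up over a whole database.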
motif
- Added more masses to mz_to_composition.csv / `mass_dict`: Acetonitrile, Formate, Cl-, HCO3-, and NH4+
processing
- Extended `canonicalize_iupac` to cases like "NeuGcα3Galβ3(NeuAcα6)GalNAcol" and even more modification formulations, e.g., "6S-GlcNAc"
- Added `canonicalize_composition` to convert compositions formatted either in the style of HexNAc2Hex1Fuc3Neu5Ac1 or N2H1F3A1 into dictionaries used by glycowork
- Added GalNAc4S to permitted reducing end monosaccharides for O-linked glycans in `enforce_class`
- `MissForest` now has a maximum number of iterations and will check for convergence each iteration (immediately finishing upon converging), yielding some speed-ups in most cases
- The output of `min_process_glycans` no longer contains empty strings for glycans ending in a linkage
- Updated `choose_correct_isoform` to be compatible with the change in `min_process_glycans`
- Added `get_possible_linkages` to retrieve linkages matching a wildcarded linkage
- Added `get_possible_monosaccharides` to retrieve monosaccharides matching a monosaccharide type (HexNAc, etc.)
- Added decorators, `rescue_glycans` and `rescue_compositions`, that canonicalize their inputs in case a decorated function errors out
- Added `linearcode_to_iupac` to support LinearCode as input format for glycowork (this will be called within `canonicalize_iupac` and the decorators); note that coverage may not be perfect yet
- Added `iupac_extended_to_condensed` to support IUPAC-extended as input format for glycowork (this will be called within `canonicalize_iupac` and the decorators); note that coverage may not be perfect yet
- Added `glycoct_to_iupac` to support GlycoCT as input format for glycowork (this will be called within `canonicalize_iupac` and the decorators); note that coverage may not be perfect yet
- Added `wurcs_to_iupac` to support WURCS as input format for glycowork (this will be called within `canonicalize_iupac` and the decorators); note that coverage may not be perfect yet
- Added `oxford_to_iupac` to support Oxford as input format for glycowork (this will be called within `canonicalize_iupac` and the decorators); note that for now coverage is limited
- `check_nomenclature` (formerly in `motif.tokenization`) now handles outputting warning messages for trying to use non-string, non-graph nomenclatures or SMILES with glycowork functions
- Expanded `find_isomorphs` to generate more isomorphic sequence variants, thereby increasing the chances that `choose_correct_isoform` will have access to the canonical sequence
- Fixed a rare issue with `canonicalize_iupac` where sequences coming from `structure_to_basic` would sometimes be formatted incorrectly if they contained dHex
- Fixed an issue in `find_isomorphs` in which double branches were not always correctly swapped
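
The composition canonicalization and the "rescue" decorators described above follow a pattern that can be sketched in plain Python. Everything below is hypothetical and only mirrors the behavior stated in the notes: the `_sketch` function names, the monosaccharide tables, and the shorthand mapping (for instance `F` to `Fuc`) are assumptions, not glycowork's actual code or naming.

```python
import re
from functools import wraps

# Assumed, deliberately small vocabularies for illustration only.
LONG_NAMES = ["HexNAc", "Neu5Ac", "Neu5Gc", "dHex", "Hex", "Fuc"]
SHORT_TO_LONG = {"H": "Hex", "N": "HexNAc", "F": "Fuc", "A": "Neu5Ac", "G": "Neu5Gc"}

# Longest names first so that "HexNAc" is tried before "Hex".
_LONG_TOKEN = re.compile(
    "(" + "|".join(sorted(LONG_NAMES, key=len, reverse=True)) + r")(\d+)"
)
_SHORT_TOKEN = re.compile("([" + "".join(SHORT_TO_LONG) + r"])(\d+)")


def _parse(text, token_re):
    """Consume `text` as consecutive name+count tokens; None if it does not fit."""
    pos, counts = 0, {}
    while pos < len(text):
        match = token_re.match(text, pos)
        if match is None:
            return None
        name, count = match.group(1), int(match.group(2))
        counts[name] = counts.get(name, 0) + count
        pos = match.end()
    return counts or None


def canonicalize_composition_sketch(text):
    """Turn 'N2H1F3A1' or 'HexNAc2Hex1Fuc3Neu5Ac1' into a dict of counts."""
    short = _parse(text, _SHORT_TOKEN)
    if short is not None:
        return {SHORT_TO_LONG[k]: v for k, v in short.items()}
    long_form = _parse(text, _LONG_TOKEN)
    if long_form is not None:
        return long_form
    raise ValueError(f"Cannot parse composition: {text!r}")


def rescue_compositions_sketch(func):
    """Retry `func` with a canonicalized composition if the first call fails."""
    @wraps(func)
    def wrapper(composition, *args, **kwargs):
        try:
            return func(composition, *args, **kwargs)
        except Exception:
            if not isinstance(composition, str):
                raise  # nothing to canonicalize, surface the original error
            fixed = canonicalize_composition_sketch(composition)
            return func(fixed, *args, **kwargs)
    return wrapper


@rescue_compositions_sketch
def total_units(composition):
    """Toy composition function that expects a dict of counts."""
    return sum(composition.values())


print(canonicalize_composition_sketch("HexNAc2Hex1Fuc3Neu5Ac1"))
print(total_units("H3N2"))
```

Trying the undecorated call first means that well-formed inputs pay no parsing cost; only inputs that actually fail take the canonicalization path.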
analysis
- `get_heatmap` no longer tries to convert data to relative abundances if negative values are detected in the input
- All functions using dataframes as inputs in `analysis` can now also be used by providing full filepaths to the .csv file instead
- Optimized some of the code for readability and speed (everything should be at least a bit faster now)
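
The two input-handling changes above can be illustrated with a small standard-library sketch. It is not how glycowork implements them (the library works on dataframes); the function names, the row-of-dicts representation, and the choice of percentages for relative abundances are assumptions.

```python
import csv
from pathlib import Path


def as_rows(data):
    """Accept either in-memory rows or a path to a .csv file (hypothetical helper)."""
    if isinstance(data, (str, Path)):
        with open(data, newline="") as handle:
            return list(csv.DictReader(handle))
    return list(data)


def to_relative_abundances(values):
    """Scale one sample's values to percentages, unless that would be meaningless.

    Negative values suggest the data were already transformed (for example
    log-ratios or z-scores), so they are returned unchanged.
    """
    values = list(values)
    if any(v < 0 for v in values):
        return values
    total = sum(values)
    if total == 0:
        return values
    return [v / total * 100 for v in values]


print(to_relative_abundances([1.0, 1.0, 2.0]))  # [25.0, 25.0, 50.0]
print(to_relative_abundances([-1.2, 0.4]))      # [-1.2, 0.4]
```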
annotate
- `get_k_saccharides` is now allowed to generate new dynamic motifs with tokens outside of lib (via `expand_lib`)
- `annotate_glycan` and `annotate_dataset` now also support narrow wildcards
- Fixed an issue in `count_unique_subgraphs_of_size_k` in which branched motifs were not always correctly formatted (i.e., opening/closing brackets)
- `get_k_saccharides` now outputs dataframes with counts as default and can yield the old nested lists of motifs by setting the new keyword `just_motifs` to True
- Fixed an edge case in which `get_k_saccharides` sometimes overcounted individual monosaccharides if their strings overlapped
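
The overcounting edge case is easy to reproduce in isolation: the string "Gal" is also a prefix of "GalNAc", so substring counting sees it twice. The sketch below contrasts that with counting whole tokens. It only illustrates this class of bug and is not the library's code; the tokenizer and the linkage pattern are simplifying assumptions.

```python
import re
from collections import Counter

# Assumed shape of a linkage token such as "b1-3", "a2-6" or "?1-?".
_LINKAGE = re.compile(r"[ab?][\d?]-[\d?]")


def tokenize(glycan):
    """Split an IUPAC-condensed string on parentheses and branch brackets."""
    return [tok for tok in re.split(r"[()\[\]]", glycan) if tok]


def count_monosaccharides(glycan):
    """Count whole monosaccharide tokens, ignoring linkage tokens."""
    return Counter(tok for tok in tokenize(glycan) if not _LINKAGE.fullmatch(tok))


glycan = "Gal(b1-3)GalNAc"
print(glycan.count("Gal"))                   # 2: substring match also hits GalNAc
print(count_monosaccharides(glycan)["Gal"])  # 1: whole-token match
```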
graph
- `subgraph_isomorphism` and `compare_glycans` now support using wildcards and position encoding at the same time. The `extra` keyword argument is now deprecated and the functions auto-detect whether anything has been specified in wildcards and/or termini_list
- `subgraph_isomorphism` and `compare_glycans` now support automatically inferred narrow wildcards to allow for (i) matching linkages like a1-? to only specified linkages within that group (e.g., a1-3 but not b1-3 etc.) and (ii) matching monosaccharide types like HexNAc to only specified monosaccharides of that type (e.g., GlcNAc but not Glc, etc.)
- The `wildcard_list` keyword argument in all graph & annotation functions is now deprecated as wildcards are inferred automatically via narrow wildcards and native full wildcards (?1-? and Monosaccharide)
- `subgraph_isomorphism` now behaves as expected for testing motifs ending in linkages on glycans ending in linkages
- `subgraph_isomorphism` can now return the matched subgraphs in the input glycan with the new `return_matches` keyword argument
- `glycan_to_nxGraph` is now decorated with the `rescue_glycans` decorator, which auto-canonicalizes IUPAC strings if they are not in the format preferred by glycowork
- Fixed mismatch of labels and string_labels in `categorical_node_match_wildcard`
- Fixed an issue in `subgraph_isomorphism` in which, when using positional encoding, sometimes the mirror image of a motif was incorrectly captured if the termini aligned
- `termini_list` within `subgraph_isomorphism` now only requires the specification of monosaccharide positions
- Added `expand_termini_list` helper function to facilitate the expansion of monosaccharide-only `termini_list` into full `termini_list` behind the scenes
- Added support for shorthand notation of position encoding, now either 'terminal' or 't' will work
- Improved handling of complex branching in `graph_to_string`; should be fewer unexpected translations now
- Fixed an issue in `graph_to_string` in which induced subgraphs could cause errors due to unexpected or weirdly sorted node indices
- Fixed an edge case in which the reducing end could be sometimes calculated as 'internal' when termini='calc' in `glycan_to_nxGraph`
- Deprecated a duplicate `character_to_label` and `string_to_labels`
- Deprecated `categorical_termini_match`; the functionality is now handled within `categorical_node_match_wildcard`
- Deprecated the `wildcards` keyword argument from `compare_glycans` as this will now be detected internally, if wildcards are provided via `wildcard_list`
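
The narrow-wildcard rules described above (a1-? matches a1-3 but not b1-3; HexNAc matches GlcNAc but not Glc) can be sketched as two small predicates. This is an illustration of the matching rules only, not the node-matching code used inside glycowork; the function names and the tiny type table are assumptions.

```python
# Hypothetical, deliberately incomplete type table for illustration.
MONOSACCHARIDE_TYPES = {
    "HexNAc": {"GlcNAc", "GalNAc", "ManNAc"},
    "Hex": {"Glc", "Gal", "Man"},
    "dHex": {"Fuc", "Rha"},
}


def linkage_matches(pattern, linkage):
    """True if `linkage` is covered by `pattern`.

    '?1-?' is treated as the full wildcard; otherwise each '?' in the
    pattern stands for exactly one character of the linkage.
    """
    if pattern == "?1-?":
        return True
    if len(pattern) != len(linkage):
        return False
    return all(p == "?" or p == c for p, c in zip(pattern, linkage))


def monosaccharide_matches(pattern, monosaccharide):
    """True if `monosaccharide` equals `pattern` or belongs to its type."""
    if pattern == "Monosaccharide":  # full wildcard
        return True
    if pattern in MONOSACCHARIDE_TYPES:
        return monosaccharide in MONOSACCHARIDE_TYPES[pattern]
    return pattern == monosaccharide


print(linkage_matches("a1-?", "a1-3"))            # True
print(linkage_matches("a1-?", "b1-3"))            # False
print(monosaccharide_matches("HexNAc", "GlcNAc"))  # True
print(monosaccharide_matches("HexNAc", "Glc"))     # False
```

Because the wildcard is recognizable from the token itself, a caller never has to declare it up front, which is consistent with the `wildcard_list` argument becoming unnecessary.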
tokenization
- Composition functions (e.g., `composition_to_mass`) are now decorated with `rescue_compositions`, which means that they can be used with compositions like "H3N2" (basically anything that `canonicalize_composition` can handle)
- Deprecated `character_to_label` as it's now handled within `string_to_labels`
- Moved `check_nomenclature` into motif.processing
- Optimized some of the code for readability and speed (most things should be at least a bit faster now)
draw
- Support motif highlighting in `GlycoDraw`: by providing the `highlight_motif` keyword argument, motifs can be highlighted (everything else will be set to low opacity). Works with IUPAC-condensed motifs and named motifs from `known`
- Support wildcards in motif highlighting with the `highlight_wildcard_list` keyword argument, for instance highlighting all `Gal(?1-?)GlcNAc` subunits (for Gal(b1-?)GlcNAc you don't need `highlight_wildcard_list`, as narrow wildcards are handled automatically)
- Support positional encoding in motif highlighting with the `highlight_termini_list` keyword argument, for instance highlighting all terminal, non-reducing end `Gal(b1-?)GlcNAc` subunits (yes, you can use both wildcards and positional encoding at the same time😊)
- Support drawing of repeat structures (indicated by brackets and the number of repeats) via the new `repeat` keyword argument. Internal repeats can also be specified with the additional `repeat_range` keyword argument.
- Optimized some of the code for readability and speed (most things should be at least a bit faster now)
network
biosynthesis
- Optimized some of the code for readability and speed (everything should be up to 2x faster now)
evolution
- Optimized some of the code for readability and speed (everything should be at least a bit faster now)
ml
- Optimized some of the code for readability and speed (most things should be at least a bit faster now)
Files
| Name | Size |
|---|---|
| BojarLab/glycowork-v1.0.0.zip (md5:025d924fa6605c402868335e7d2bd37d) | 106.8 MB |
Additional details
Related works
- Is supplement to Software: https://github.com/BojarLab/glycowork/tree/v1.0.0 (URL)