BojarLab/glycowork: v1.1.0
Published January 31, 2024 | Version v1.1.0 | Software | Open
Description
Change Log
glycan_data
- Updated sugarbase database and all models
stats
- Newly added module to glycowork
- Moved all the statistics functions from motif.processing into this module: cohen_d, mahalanobis_distance, mahalanobis_variance, variance_stabilization, MissForest, impute_and_normalize, and variance_based_filtering
- Added fast_two_sum, two_sum, expansion_sum, hlm, update_cf_for_m_n, jtkdist, jtkinit, jtkstat, and jtkx helper functions for the JTK test
- Added get_BF to calculate Jeffreys' approximate Bayes factor based on sample size and p-value
- Added get_alphaN to calculate sample size-appropriate significance cut-offs informed by Bayesian statistics
- Added pi0_tst and TST_grouped_benjamini_hochberg to perform a Two-Stage adaptive Benjamini-Hochberg procedure based on groups (e.g., https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3175141/ or https://www.biorxiv.org/content/10.1101/2024.01.13.575531v1)
- Added test_inter_vs_intra_group to estimate intra- versus inter-group correlation with a mixed-effects model for groupings of glycans based on domain expertise
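The get_BF and get_alphaN entries rest on the idea that a two-sided p-value can be turned into an approximate Bayes factor once the sample size n is known, and that a significance cut-off can then be chosen as the p-value at which that Bayes factor reaches a target value. The sketch below illustrates this with the BIC (unit-information prior) approximation BF10 ≈ exp(z²/2) / √n, which is in the spirit of Jeffreys' approximation. This particular formula, the default threshold of 1, and the function names approx_bayes_factor and alpha_for_sample_size are assumptions for illustration; get_BF and get_alphaN may use different scaling constants and options.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution (stdlib)


def approx_bayes_factor(p: float, n: int) -> float:
    """Approximate BF10 for a two-sided p-value and sample size n.

    Assumption: BIC / unit-information approximation BF10 ~ exp(z^2 / 2) / sqrt(n).
    """
    z = _STD_NORMAL.inv_cdf(1 - p / 2)  # convert the two-sided p-value back to |z|
    return math.exp(z ** 2 / 2) / math.sqrt(n)


def alpha_for_sample_size(n: int, bf_threshold: float = 1.0) -> float:
    """Largest two-sided p-value whose approximate BF10 still reaches bf_threshold."""
    # Solve exp(z^2 / 2) / sqrt(n) = bf_threshold for z^2
    z_sq = math.log(n) + 2 * math.log(bf_threshold)
    if z_sq <= 0:
        # Even z = 0 already reaches the threshold, so every p-value qualifies
        return 1.0
    return 2 * (1 - _STD_NORMAL.cdf(math.sqrt(z_sq)))
```

With a threshold of 1, the cut-off shrinks as n grows (roughly 0.13 at n = 10 and 0.03 at n = 100), which is the qualitative behaviour a sample size-dependent alpha is meant to capture.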
motif
regex
- Newly added module to glycowork
- Added the get_match function and associated functions to implement a regular expression system for glycans. This allows for powerful queries that detect and extract motifs of arbitrary complexity.
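To give a feel for what a glyco-regular expression expresses, the sketch below translates the example pattern quoted for annotate_dataset, "rHex-HexNAc-([Hex|Fuc]){1,2}-HexNAc", into an ordinary Python regular expression and searches a linear chain of monosaccharide names with it. It is a deliberately simplified, hypothetical stand-in, not get_match: it assumes unbranched, linkage-free chains written as '-'-separated tokens, and it supports only plain names and the "([A|B]){m,n}" form (one of the listed names, repeated m to n times).

```python
import re
from typing import List

# Matches one quantified-alternative element such as "([Hex|Fuc]){1,2}"
_QUANTIFIED = re.compile(r"^\(\[([^\]]+)\]\)\{(\d+),(\d+)\}$")


def glyco_regex_to_re(pattern: str) -> "re.Pattern[str]":
    """Translate a simplified, linear glyco-regular expression into a Python regex."""
    if pattern.startswith("r"):
        pattern = pattern[1:]  # drop the leading 'r' that marks a glyco-regular expression
    parts = []
    for element in pattern.split("-"):
        quantified = _QUANTIFIED.match(element)
        if quantified:
            alternatives = "|".join(re.escape(a) for a in quantified.group(1).split("|"))
            low, high = quantified.group(2), quantified.group(3)
            # Each repetition consumes one whole token plus its trailing separator
            parts.append(f"(?:(?:{alternatives})-){{{low},{high}}}")
        else:
            parts.append(re.escape(element) + "-")
    # Only start matching at a token boundary: string start or right after a '-'
    return re.compile(r"(?:^|(?<=-))" + "".join(parts))


def chain_matches(pattern: str, chain: List[str]) -> bool:
    """Return True if the linear chain contains a contiguous stretch matching the pattern."""
    text = "-".join(chain) + "-"  # trailing '-' so every token ends with a separator
    return glyco_regex_to_re(pattern).search(text) is not None
```

Appending a separator after every token is what keeps "Hex" from matching the first three letters of "HexNAc".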
processing
- Moved cohen_d, mahalanobis_distance, mahalanobis_variance, variance_stabilization, MissForest, impute_and_normalize, and variance_based_filtering into glycan_data.stats to re-focus processing on handling glycan sequences
- Extended canonicalize_composition to cases like '5_4_2_1', '5421', and '(Hex)2 (HexNAc)2 (Deoxyhexose)1 (NeuAc)2 + (Man)3(GlcNAc)2'
- GlycoCT and WURCS handling for universal input now encompasses more monosaccharides and more modifications
- Expanded oxford_to_iupac to handle more complex sequences, including sulfation, LacdiNAc, hybrid structures, extended Neu5Ac, complex fucosylation, and more custom linkage specifications
- enforce_class can now deal with free glycans regardless of whether they end in '-ol' or not
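The canonicalize_composition entry lists three input styles. As a hedged sketch of how such strings can be reduced to counts, the hypothetical parse_composition below sums bracketed "(Name)count" tokens (mapping Man to Hex, GlcNAc to HexNAc, Deoxyhexose to dHex, and so on) and splits numeric codes positionally. The release notes do not say which monosaccharide each digit in '5_4_2_1' or '5421' refers to, so the sketch requires the caller to pass that order explicitly; the output keys and the synonym table are also assumptions, not glycowork's canonical format.

```python
import re
from typing import Dict, Optional, Sequence

# Assumed mapping from specific or alternative names onto generic composition classes
SYNONYMS = {
    "Hex": "Hex", "Man": "Hex", "Gal": "Hex", "Glc": "Hex",
    "HexNAc": "HexNAc", "GlcNAc": "HexNAc", "GalNAc": "HexNAc",
    "Deoxyhexose": "dHex", "dHex": "dHex", "Fuc": "dHex",
    "NeuAc": "Neu5Ac", "Neu5Ac": "Neu5Ac",
}


def parse_composition(text: str, order: Optional[Sequence[str]] = None) -> Dict[str, int]:
    """Reduce a composition string to {class: count} (hypothetical helper)."""
    text = text.strip()
    if re.fullmatch(r"\d+(?:_\d+)*", text):
        # Numeric codes: '5_4_2_1' (underscore-separated) or '5421' (one digit per class)
        if order is None:
            raise ValueError("numeric compositions need an explicit monosaccharide order")
        counts = text.split("_") if "_" in text else list(text)
        if len(counts) != len(order):
            raise ValueError("number of counts does not match the given order")
        return {name: int(c) for name, c in zip(order, counts) if int(c) > 0}
    # Bracketed form: '(Name)count' tokens summed per class; spaces and '+' are ignored
    result: Dict[str, int] = {}
    for name, count in re.findall(r"\((\w+)\)(\d+)", text):
        key = SYNONYMS.get(name, name)
        result[key] = result.get(key, 0) + int(count)
    return result
```

For the bracketed example from the release notes, the "+ (Man)3(GlcNAc)2" core is folded into the totals, giving Hex 5, HexNAc 4, dHex 1 and Neu5Ac 2.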
annotate
- annotate_dataset and downstream functions now accept a new keyword in "feature_set", called "custom". If "custom" is added to "feature_set", a list of custom motifs can and must be added via the "custom_motifs" keyword argument. "custom" can be mixed and matched with all other keywords in "feature_set"
- annotate_dataset now also accepts glyco-regular expressions via the "custom" keyword in "feature_set". These expressions need to be added within the "custom_motifs" keyword argument and have to start with an "r", such as "rHex-HexNAc-([Hex|Fuc]){1,2}-HexNAc". Normal motifs and glyco-regular expressions can be freely mixed within "custom_motifs"
- Added group_glycans_core, group_glycans_sia_fuc, and group_glycans_N_glycan_type to group glycans by core structure (for O-glycans), Sia/Fuc/FucSia/Rest, or complex/hybrid/high-mannose/rest (for N-glycans)
- Fixed a bug in get_k_saccharides, in which redundant columns were not always correctly removed
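group_glycans_sia_fuc assigns glycans to the Sia, Fuc, FucSia, or Rest groups. A minimal string-based sketch of that kind of grouping is shown below; it assumes IUPAC-condensed sequences and treats Neu5Ac, Neu5Gc, and Kdn as sialic acids, which may differ from the rules glycowork actually applies. The function name group_by_sia_fuc is hypothetical.

```python
from typing import Dict, List

SIALIC_ACIDS = ("Neu5Ac", "Neu5Gc", "Kdn")  # assumed set of sialic acid names


def group_by_sia_fuc(glycans: List[str]) -> Dict[str, List[str]]:
    """Split glycans into Sia / Fuc / FucSia / Rest by substring presence (sketch)."""
    groups: Dict[str, List[str]] = {"Sia": [], "Fuc": [], "FucSia": [], "Rest": []}
    for glycan in glycans:
        has_sia = any(sia in glycan for sia in SIALIC_ACIDS)
        has_fuc = "Fuc" in glycan
        if has_sia and has_fuc:
            groups["FucSia"].append(glycan)
        elif has_sia:
            groups["Sia"].append(glycan)
        elif has_fuc:
            groups["Fuc"].append(glycan)
        else:
            groups["Rest"].append(glycan)
    return groups
```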
analysis
- Added get_jtk to analyze circadian expression of glycans in temporal glycomics datasets using the Jonckheere–Terpstra–Kendall (JTK) algorithm, with an interface for motifs, imputation, etc. analogous to that of differential expression
- get_differential_expression, get_glycanova, and get_jtk now use get_alphaN to calculate a sample size-appropriate significance cut-off (see https://journals.sagepub.com/doi/10.1177/14761270231214429) and add a 'significant' column to the output that shows whether the corrected p-values lie below this threshold
- Added the "zscores" keyword argument to get_pvals_motifs: if the data are not yet z-score transformed, setting "zscores" to False makes the function perform the z-score transformation
- For statistical calculations, get_pvals_motifs will now weight the motif occurrences by z-score magnitude, rather than only using a cut-off for enrichment calculations
- Added effect size calculations (Cohen's d) to get_pvals_motifs, which are also reported in the output
- Changed get_pvals_motifs such that both enrichments and depletions are now tested (with depletions resulting in negative effect sizes)
- Added select_grouping to find out which grouping of glycans has the highest intra- versus inter-group correlation, as estimated by glycan_data.stats.test_inter_vs_intra_group
- When "motifs = False" and "grouped_BH = True", get_differential_expression now tries to use the Two-Stage adaptive Benjamini-Hochberg procedure based on groups for multiple testing correction, if meaningful groups can be found in the glycans [note this makes everything at least one order of magnitude slower, though most datasets should still finish in a few seconds]
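The grouped correction builds on the two-stage adaptive Benjamini-Hochberg (TST) procedure of Benjamini, Krieger and Yekutieli: a first BH pass at α/(1+α) estimates how many hypotheses are truly null, and a second pass uses that estimate to relax the threshold. The sketch below shows only this ungrouped two-stage procedure in plain Python. The group-wise extension referenced in the stats notes, and whatever TST_grouped_benjamini_hochberg adds on top, is not reproduced here, and the function names are hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(pvals: Sequence[float], alpha: float) -> List[bool]:
    """Classic BH step-up procedure; returns one rejection flag per p-value."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * alpha / m:
            k_max = rank  # step-up: keep the largest rank that passes its threshold
    rejected = [False] * m
    for i in order[:k_max]:
        rejected[i] = True
    return rejected


def two_stage_bh(pvals: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Two-stage adaptive BH (TST) without groups."""
    m = len(pvals)
    alpha1 = alpha / (1 + alpha)
    stage1 = benjamini_hochberg(pvals, alpha1)
    r1 = sum(stage1)
    if r1 == 0 or r1 == m:
        return stage1  # nothing or everything rejected: no second stage needed
    m0_hat = m - r1  # estimated number of true null hypotheses
    return benjamini_hochberg(pvals, alpha1 * m / m0_hat)
```

For p-values [0.001, 0.002, 0.003, 0.004, 0.045, 0.9], plain BH at 0.05 rejects four hypotheses, while the two-stage version rejects five, because the first stage estimates that only two of the six hypotheses are null.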
draw
- In GlycoDraw, the "highlight_motif" keyword argument can now use glyco-regular expressions in addition to regular motifs (just add a single 'r' before your glyco-regular expression to indicate that it is indeed a regular expression)
- Added plot_glycans_excel to allow for the automated insertion of GlycoDraw SNFG pictures into an Excel file containing glycan sequences
graph
- categorical_node_match_wildcard now uses string IDs for matching instead of integer IDs, which means that even two graphs generated with two different libs can now be successfully compared via compare_glycans or subgraph_isomorphism
- compare_glycans and subgraph_isomorphism (and all functions using them) now support negation, by prepending "!". For instance, "!Fuc(a1-?)Gal(b1-4)GlcNAc" will match subsequences that have a monosaccharide that is NOT Fuc before the Gal. It is highly recommended to generate your own lib via get_lib if you use negation, as monosaccharides such as !Fuc are not within lib and will cause indexing errors.
- Added "?1-?" as another ultimate wildcard (promoting it from a strong narrow wildcard)
- Fixed some cases where "Monosaccharide" was not treated as an ultimate wildcard in graph operations
- Fixed an issue in graph_to_string in which glycans of size 1 (e.g., "GalNAc") sometimes were missing their first character
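The negation and wildcard rules above come down to a node-level comparison of labels. The hypothetical labels_match function below sketches that comparison: a leading '!' inverts the match, "Monosaccharide" and "?1-?" match anything, and a '?' inside a linkage label stands for one unknown character. It assumes the alternating glycan graph already pairs monosaccharide nodes with monosaccharide nodes and linkage nodes with linkage nodes, and it is not glycowork's categorical_node_match_wildcard. In a full implementation, a predicate like this would be wrapped into the node_match callback of a subgraph-isomorphism routine such as networkx's GraphMatcher.

```python
# Labels that match any node of the corresponding kind (from the release notes)
ULTIMATE_WILDCARDS = {"Monosaccharide", "?1-?"}


def labels_match(pattern: str, target: str) -> bool:
    """Compare a pattern node label with a target node label (hypothetical, simplified)."""
    if pattern.startswith("!"):
        # Negation: '!Fuc' matches exactly the labels that 'Fuc' would not match
        return not labels_match(pattern[1:], target)
    if pattern in ULTIMATE_WILDCARDS:
        return True
    if "-" in pattern and len(pattern) == len(target):
        # Linkage labels: each '?' stands for one unknown character, e.g. 'a1-?' vs 'a1-3'
        return all(p == "?" or p == t for p, t in zip(pattern, target))
    return pattern == target
```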
network
- Updated pre-calculated biosynthetic networks for milk oligosaccharides
biosynthesis
- Refactored find_diff to make networks compatible with the automated, dynamic wildcards (i.e., "?" behaves as it should and doesn't necessarily cause over-branching of the network)
- In highlight_network, the "motif" keyword argument can now use glyco-regular expressions in addition to regular motifs (just add a single 'r' before your glyco-regular expression to indicate that it is indeed a regular expression)
ml
model_training
- In training_setup, upgraded the loss functions for all classification problems to PolyLoss with label smoothing (see https://arxiv.org/abs/2204.12511 for details)
- In training_setup, the number of classes (for multiclass or multilabel classification) can now be specified via the new "num_classes" keyword argument
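In its Poly-1 form, PolyLoss adds a single term to cross-entropy, L = CE + ε(1 - p_t), where p_t is the predicted probability of the target. One common way to combine it with label smoothing is to replace the one-hot target by (1 - s)·onehot + s/K (K classes) before computing both terms. The pure-Python sketch below does this for a single sample; ε = 1.0 and s = 0.1 are illustrative defaults, not the values used by glycowork's training_setup, and the function name is hypothetical.

```python
import math
from typing import Sequence


def poly1_loss_with_label_smoothing(logits: Sequence[float], target: int,
                                    epsilon: float = 1.0, smoothing: float = 0.1) -> float:
    """Poly-1 loss, CE + epsilon * (1 - p_t), for one sample with smoothed labels."""
    n = len(logits)
    # Smoothed target: smoothing / n on every class plus (1 - smoothing) on the true class
    labels = [smoothing / n + (1.0 - smoothing if j == target else 0.0) for j in range(n)]
    # Numerically stable log-softmax
    mx = max(logits)
    log_z = mx + math.log(sum(math.exp(x - mx) for x in logits))
    log_probs = [x - log_z for x in logits]
    ce = -sum(y * lp for y, lp in zip(labels, log_probs))
    # p_t: probability mass assigned to the (smoothed) target distribution
    p_t = sum(y * math.exp(lp) for y, lp in zip(labels, log_probs))
    return ce + epsilon * (1.0 - p_t)
```

With ε = 0 and s = 0, the function reduces to ordinary cross-entropy, which is a convenient sanity check.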
Files (104.6 MB)
Name | Size
---|---
BojarLab/glycowork-v1.1.0.zip (md5:b9989949847bb3779fa632c2e85ac867) | 104.6 MB
Additional details
Related works
- Is supplement to software: https://github.com/BojarLab/glycowork/tree/v1.1.0