ahmedmagds/GNUVID: GNUVID 2.2
Ahmed M Moustafa
GNUVID v2.2 now uses minimap2 and Gofasta to align query genomes to the reference before prediction with the random forest classifier.
GNUVID now assigns genomes to five new Variants of Concern:
- CC81085 represents the Brazilian P.1 lineage (a.k.a. 20J/501Y.V3).
- CC70949 represents the Brazilian P.2 lineage.
- CC72860 represents the Californian B.1.429 (CAL.20C) lineage.
- CC71014 represents the South African B.1.351 lineage (a.k.a. 20H/501Y.V2).
- 10 CCs represent the UK B.1.1.7 lineage (a.k.a. 20I/501Y.V1 or Variant of Concern (VOC) 202012/01): 46649, 45062, 49676, 54949, 54452, 58534, 57630, 66559, 62415 and 67441.
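
The CC-to-lineage assignments listed above can be summarized as a lookup table. The following is a minimal sketch for downstream scripts, assuming CC numbers are handled as integers; the names `VARIANT_CCS`, `CC_TO_LINEAGE` and `lineage_for_cc` are hypothetical and are not part of GNUVID itself.

```python
# Hypothetical lookup table transcribed from the release notes;
# this is NOT GNUVID's internal data structure.
VARIANT_CCS = {
    "P.1": [81085],
    "P.2": [70949],
    "B.1.429": [72860],
    "B.1.351": [71014],
    "B.1.1.7": [46649, 45062, 49676, 54949, 54452,
                58534, 57630, 66559, 62415, 67441],
}

# Invert the table so that each CC number points to its lineage.
CC_TO_LINEAGE = {
    cc: lineage
    for lineage, ccs in VARIANT_CCS.items()
    for cc in ccs
}


def lineage_for_cc(cc):
    """Return the lineage for a CC number, or None if it is not listed above."""
    return CC_TO_LINEAGE.get(cc)
```

For example, `lineage_for_cc(71014)` returns `"B.1.351"`, while a CC that is not in the list returns `None`.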
## New Features
- GNUVID now excludes genomes that do not pass the quality check for sequence length (cutoff: 15000) and proportion of ambiguous bases (Ns) (cutoff: 0.5). Users can change these cutoffs.
- Skip exact matching (-e): perform prediction only [Default: do exact matching first].
- Prediction block size (-b): you can now set the number of genomes to be predicted at once [Default: 1000]. This is helpful for machines with limited memory.
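
To illustrate the quality check and the block-wise prediction described above, here is a minimal sketch. It is not GNUVID's actual code: the function names `passes_qc` and `blocks` are hypothetical, and it assumes that 15000 is a minimum length, that 0.5 is a maximum fraction of Ns, and that both cutoffs are inclusive.

```python
from itertools import islice


def passes_qc(seq, min_length=15000, max_n_fraction=0.5):
    """Return True if a sequence passes the (assumed) length and N cutoffs."""
    # Reject empty sequences and sequences shorter than the length cutoff.
    if not seq or len(seq) < min_length:
        return False
    # Fraction of ambiguous bases, counting both 'N' and 'n'.
    n_fraction = seq.upper().count("N") / len(seq)
    return n_fraction <= max_n_fraction


def blocks(items, block_size=1000):
    """Yield successive lists of at most block_size items."""
    iterator = iter(items)
    while True:
        block = list(islice(iterator, block_size))
        if not block:
            return
        yield block
```

Processing genomes in blocks keeps only `block_size` sequences in memory at a time, which is why a smaller block size can help on machines with limited memory.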