Pairwise graph edit distance characterizes the impact of the construction method on pangenome graphs
Creators
- 1. Inria Rennes - Bretagne Atlantique Research Centre
- 2. Institut National de Recherche pour l'Agriculture, l'Alimentation et l'Environnement
- 3. Inria Centre de Recherche Rennes Bretagne Atlantique
- 4. INRA Centre de Toulouse Midi-Pyrénées
- 5. National Research Institute For Agriculture, Food And Environment
Description
Graph editing is a widely studied subject, with many heuristics for comparing topologies and many NP-hard problems. Here, we present a method that relies on the specific nature of a pangenome graph (a collection of subsequences linked by edges, representing the embedding of genomes inside a graph structure) to formulate an O(n) solution for this specific case. It allows us to pinpoint dissimilarities between graphs and to analyse how such graphs differ when built with different tools or parameters.
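To make the linear-time idea concrete, here is a minimal sketch in Python. It assumes that the comparison reduces to looking at how each graph cuts the same genome into consecutive segments, and that edits are counted as breakpoints present in one segmentation but not in the other. The function names and the merge/split counting are illustrative assumptions, not the implementation used by `rs-pancat-compare`.

```python
from itertools import accumulate


def breakpoints(segment_lengths):
    """Return the internal cut positions induced by consecutive segment lengths."""
    cuts = list(accumulate(segment_lengths))
    # The final position is the genome end, shared by every segmentation.
    return set(cuts[:-1])


def segmentation_distance(lengths_a, lengths_b):
    """Count (merges, splits) needed to turn segmentation A into segmentation B.

    Assumption: both lists describe the same genome, as node lengths along
    its path in graph A and in graph B respectively.
    """
    if sum(lengths_a) != sum(lengths_b):
        raise ValueError("both segmentations must spell a genome of the same length")
    bp_a = breakpoints(lengths_a)
    bp_b = breakpoints(lengths_b)
    merges = len(bp_a - bp_b)  # cuts to remove from A
    splits = len(bp_b - bp_a)  # cuts to add to A
    return merges, splits


# Same 10 bp genome, segmented differently by two hypothetical graphs.
merges, splits = segmentation_distance([3, 2, 5], [5, 1, 4])
print(merges, splits)
```

Because each breakpoint set is built in a single pass and set differences take expected linear time, the whole comparison stays linear in the number of segments under these assumptions.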
Warning: all graphs are provided as they came out of the Minigraph-Cactus and PGGB pipelines. Because `rs-pancat-compare` can only compare GFA1.0 files, you must first convert the graphs using the `vg toolkit` (see [commands available on this GitHub](https://github.com/dubssieg/pancat_paper)).
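Before converting, it can help to check whether a given file already fits GFA1.0. The sketch below is a hypothetical helper, not part of `rs-pancat-compare` or `vg`. It reports the version declared in the header and any record types outside the GFA 1.0 set (`H`, `S`, `L`, `C`, `P`), such as the `W` lines introduced in GFA 1.1. Whether a particular tool rejects such lines is an assumption to verify against that tool.

```python
# Record types defined by the GFA 1.0 specification.
GFA1_RECORD_TYPES = {"H", "S", "L", "C", "P"}


def declared_version(gfa_lines):
    """Return the VN:Z: value of the first header line, or None if absent."""
    for line in gfa_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "H":
            for tag in fields[1:]:
                if tag.startswith("VN:Z:"):
                    return tag[len("VN:Z:"):]
    return None


def unsupported_record_types(gfa_lines):
    """Return the set of record types that are not part of GFA 1.0."""
    found = set()
    for line in gfa_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        record_type = line.split("\t", 1)[0]
        if record_type not in GFA1_RECORD_TYPES:
            found.add(record_type)
    return found


# Tiny in-memory example; with a real file, pass the lines of an open file handle.
example = [
    "H\tVN:Z:1.1",
    "S\t1\tACGT",
    "W\tsample\t1\tchr1\t0\t4\t>1",
]
print(declared_version(example), unsupported_record_types(example))
```

A non-empty result suggests the graph needs conversion before comparison.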
Data description:
Archive `yeast_dataset`:
Contains the raw `.fasta` genomes used to build the yeast chromosome 1 graphs described in the publication.
Archive `json_datasets_results`:
Contains the computed distance, variants, and sequence complexity analysis results as `.json` files.
Archive `reference_impact`:
Contains the `.gfa` graphs used for the comparison of the impact of the reference choice against the secondary genome order in Minigraph-Cactus (fig 1A of the article).
Archive `mgc_vs_pggb`:
Contains the `.gfa` graphs used for the comparison of the impact of the reference choice in Minigraph-Cactus against PGGB (fig 1B of the article).
Archives `growth_replicate_XX` (not kept in paper):
These archives are replicates, with varying references, of an experiment in which more and more genomes are added to the graphs. The file names range from 2 to 15, each number being the number of genomes included in the graph (yeast dataset, chromosome 1).
Archive `software_evolution` (not kept in paper):
This archive contains graphs made using the same 15 genomes of yeast (chromosome 1) on three different versions of Minigraph-Cactus and three different versions of PGGB.
Files
Total size: 76.9 MB
Checksum | Size |
---|---|
md5:1b2c42bd36defdeef322f45de4f8729c | 15.4 MB |
md5:a19d8500a4ead04f6aade92d345b5c90 | 14.5 MB |
md5:d61378d634960dd2ceb876f63d574a43 | 14.9 MB |
md5:d83cbd784fccf334b7548877af81b415 | 628.4 kB |
md5:152762e47e460b76de64140712c808bf | 15.8 MB |
md5:1f0516df5c55099b7cd82944fad1c5cc | 9.0 MB |
md5:197ae39dabb9fe360ac363d6f4932b60 | 5.7 MB |
md5:c2ddf0cba18792083bdf670e55a910d7 | 923.3 kB |
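The MD5 checksums listed above can be used to check a downloaded archive. The following is a minimal sketch using only the Python standard library. The helper names are hypothetical, and since this listing does not show which checksum belongs to which archive, the pairing of file and checksum must come from the record itself.

```python
import hashlib


def file_md5(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file against a checksum written as 'md5:<hex>' or '<hex>'."""
    expected_hex = expected.strip().lower()
    if expected_hex.startswith("md5:"):
        expected_hex = expected_hex[len("md5:"):]
    return file_md5(path) == expected_hex
```

For example, `matches_checksum("some_archive.zip", "md5:<hex digest from the table>")` returns `True` only if the download is intact; `some_archive.zip` is a placeholder name.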
Additional details
Software
- Repository URL: https://github.com/Tharos-ux/pancat_paper
- Programming language: Python
- Development Status: WIP