There is a newer version of the record available.

Published April 5, 2024 | Version v1
Dataset Open

Pairwise graph edit distance characterizes the impact of the construction method on pangenome graphs

  • 1. Inria Rennes - Bretagne Atlantique Research Centre
  • 2. Institut National de Recherche pour l'Agriculture, l'Alimentation et l'Environnement
  • 3. Inria Centre de Recherche Rennes Bretagne Atlantique
  • 4. INRA Centre de Toulouse Midi-Pyrénées
  • 5. National Research Institute For Agriculture, Food And Environment

Description

Graph editing is a widely studied subject, with many heuristics for comparing topologies and many NP-hard problems. Here, we present a method that relies on the specific properties of a pangenome graph (a collection of subsequences linked by edges, representing the embedding of genomes in a graph structure) to formulate an O(n) solution for this specific case. It allows us to pinpoint dissimilarities between graphs and to analyse how such graphs differ when built with different tools or parameters.
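To make the idea concrete, here is a minimal Python sketch. It is not the `rs-pancat-compare` implementation. It assumes that each graph spells the same genome as an ordered list of node sequences, and that the two segmentations are compared through their breakpoint positions: a breakpoint found only in the first graph would need a merge to reach the second, and a breakpoint found only in the second would need a split. The function names and the merge/split vocabulary are illustrative assumptions.

```python
from itertools import accumulate


def breakpoints(segments: list[str]) -> list[int]:
    """End offset of every segment along the genome, excluding the final one."""
    return list(accumulate(len(s) for s in segments))[:-1]


def segmentation_distance(path_a: list[str], path_b: list[str]) -> dict[str, int]:
    """Count breakpoints specific to each segmentation in a single linear pass."""
    if "".join(path_a) != "".join(path_b):
        raise ValueError("both paths must spell the same genome")
    bp_a, bp_b = breakpoints(path_a), breakpoints(path_b)
    i = j = merges = splits = 0
    # Both lists are sorted, so a two-pointer walk visits each breakpoint once.
    while i < len(bp_a) and j < len(bp_b):
        if bp_a[i] == bp_b[j]:      # shared breakpoint: nothing to edit
            i += 1
            j += 1
        elif bp_a[i] < bp_b[j]:     # only in A: merge needed to reach B
            merges += 1
            i += 1
        else:                       # only in B: split needed to reach B
            splits += 1
            j += 1
    merges += len(bp_a) - i         # leftover breakpoints only in A
    splits += len(bp_b) - j         # leftover breakpoints only in B
    return {"merges": merges, "splits": splits, "distance": merges + splits}


# Toy example: the same 8-base genome cut in two different ways.
graph_a = ["ACG", "T", "TTGA"]   # breakpoints at 3 and 4
graph_b = ["AC", "GT", "TTGA"]   # breakpoints at 2 and 4
result = segmentation_distance(graph_a, graph_b)
print(result)
```

Because each genome is a linear sequence, the breakpoints are already sorted and the walk is linear in their number, which is one way an O(n) bound can arise in this setting.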

Warning: all graphs are provided exactly as they came out of the Minigraph-Cactus and PGGB pipelines. Because `rs-pancat-compare` can only compare GFA1.0 files, you must first convert the graphs using the `vg toolkit` (see [commands available on this GitHub](https://github.com/dubssieg/pancat_paper)).
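Before running a comparison, it can help to check whether a file still needs conversion. The sketch below is an assumption about what such a check might look like and is not part of `rs-pancat-compare`: it reads the optional `VN:Z:` tag of the GFA header line and looks for `W` (walk) lines, which were introduced in GFA 1.1. The function name `needs_conversion` is hypothetical.

```python
def needs_conversion(gfa_lines) -> bool:
    """Return True if the GFA declares a version other than 1.0 or contains W-lines."""
    for line in gfa_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "H":
            # Header line: look for a version tag that is not 1.0.
            for tag in fields[1:]:
                if tag.startswith("VN:Z:") and tag != "VN:Z:1.0":
                    return True
        elif fields[0] == "W":
            # Walk lines do not exist in GFA 1.0.
            return True
    return False


# Toy inputs; with a real file, pass the open file handle instead of a list.
gfa_10 = ["H\tVN:Z:1.0\n", "S\t1\tACGT\n", "P\tg1\t1+\t*\n"]
gfa_11 = ["H\tVN:Z:1.1\n", "S\t1\tACGT\n", "W\tg1\t0\tchr1\t0\t4\t>1\n"]
print(needs_conversion(gfa_10), needs_conversion(gfa_11))
```

A file flagged by this check would then go through the conversion commands from the repository linked above.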

Data description:

Archive `yeast_dataset`:

Contains the raw `.fasta` genomes used to build the yeast chromosome 1 graphs described in the publication.

Archive `json_datasets_results`:

Contains the computed distance, variants, and sequence complexity analysis results as `.json` files. 

Archive `reference_impact`:

Contains the `.gfa` graphs used to compare the impact of the reference choice with that of the secondary genome order in Minigraph-Cactus (Fig. 1A of the article).

Archive `mgc_vs_pggb`:

Contains the `.gfa` graphs used to compare the impact of the reference choice in Minigraph-Cactus against PGGB (Fig. 1B of the article).

Archives `growth_replicate_XX` (not kept in paper):

These archives are replicates, with varying references, of an experiment in which more and more genomes are added to the graphs. The file names range from 2 to 15, each number being the number of genomes included in the graph (yeast dataset, chromosome 1).

Archive `software_evolution` (not kept in paper):

This archive contains graphs made using the same 15 genomes of yeast (chromosome 1) on three different versions of Minigraph-Cactus and three different versions of PGGB.

Files

growth_replicate_00.zip

Files (76.9 MB)

Checksum, Size
md5:1b2c42bd36defdeef322f45de4f8729c, 15.4 MB
md5:a19d8500a4ead04f6aade92d345b5c90, 14.5 MB
md5:d61378d634960dd2ceb876f63d574a43, 14.9 MB
md5:d83cbd784fccf334b7548877af81b415, 628.4 kB
md5:152762e47e460b76de64140712c808bf, 15.8 MB
md5:1f0516df5c55099b7cd82944fad1c5cc, 9.0 MB
md5:197ae39dabb9fe360ac363d6f4932b60, 5.7 MB
md5:c2ddf0cba18792083bdf670e55a910d7, 923.3 kB
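
The checksums listed above can be used to confirm that a downloaded archive is intact. The following is a minimal sketch using Python's standard `hashlib`. The helper names `md5_of` and `verify` are hypothetical, and the demo runs on a temporary file rather than on an actual archive from this record.

```python
import hashlib
import os
import tempfile


def md5_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    # Prefix with "md5:" to match the notation used in the file listing.
    return "md5:" + digest.hexdigest()


def verify(path: str, expected: str) -> bool:
    """Return True if the file's checksum matches the expected 'md5:...' string."""
    return md5_of(path) == expected.strip().lower()


# Demo on a throwaway file; replace the path and checksum with a real archive.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
demo_checksum = md5_of(tmp.name)
demo_ok = verify(tmp.name, demo_checksum)
os.remove(tmp.name)
print(demo_checksum, demo_ok)
```

To check a real download, call `verify` with the local path of the archive and the matching `md5:` string from the listing.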

Additional details

Software

Repository URL
https://github.com/Tharos-ux/pancat_paper
Programming language
Python
Development Status
WIP