4570616
doi
10.5281/zenodo.4570616
oai:zenodo.org:4570616
Montagne, Rémi
Institut Pasteur
Baudry, Lyam
Institut Pasteur
Cournac, Axel
Institut Pasteur
Chromosight benchmarks and processed data
Matthey-Doret, Cyril
Institut Pasteur
doi:10.1101/2020.03.08.981910
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
Hi-C
benchmark
Loop detection
<p>Input data and scripts required to rerun Chromosight's benchmarks and regenerate the associated figures from the manuscript, as well as output results in text format.</p>
<p>The record contains four tarballs: two benchmarks, a set of processed files and the simulation inputs. Each benchmark tarball contains the scripts, input data and output data of its benchmark:</p>
<p><strong>Performance benchmark:</strong></p>
<p>This benchmark is contained in "20200406_benchmark_chromosight_performance.tar.gz" and compares the running time and RAM use of Chromosight with those of two other programs. It is run on a real high-resolution human Hi-C matrix at different subsampling values. The benchmark scripts are expected to be run on a regular laptop or desktop.</p>
<p><strong>Results benchmark:</strong></p>
<p>This benchmark is contained in "20200406_benchmark_chromosight_results.tar.gz" and assesses Chromosight's ability to detect chromatin loop patterns on Hi-C contact maps. Chromosight is compared to four other programs. For each program, precision, recall (= sensitivity) and F1 scores are measured using 2000 small synthetic Hi-C matrices with known loop coordinates. Each program is run on all matrices with a range of 50 to 200 parameter combinations. The benchmark scripts are written to run as a job array on a SLURM computing cluster to reduce compute time.</p>
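<p>As an illustration of the three scores named above, the following minimal sketch computes precision, recall and F1 from a list of detected loops and a list of known loops. The function name <code>score_loops</code>, the (row, column) bin representation and the matching tolerance <code>tol</code> are assumptions made for this example; they are not taken from the benchmark scripts, which may match loops differently.</p>

```python
def score_loops(detected, truth, tol=1):
    """Return (precision, recall, f1) for detected vs. known loop coordinates.

    Loops are (row, col) bin pairs. A detected loop counts as a true positive
    if it lies within `tol` bins of a not-yet-matched known loop on both axes.
    The tolerance-based greedy matching is an assumption of this sketch.
    """
    remaining = list(truth)  # known loops that have not been matched yet
    true_positives = 0
    for row, col in detected:
        for known in remaining:
            if abs(row - known[0]) <= tol and abs(col - known[1]) <= tol:
                remaining.remove(known)  # each known loop is matched at most once
                true_positives += 1
                break
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

<p>F1 is the harmonic mean of precision and recall, so a program scores well only if it finds most known loops without reporting many spurious ones.</p>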
<p><strong>Processed data files</strong></p>
<p>Intermediate and output files used throughout the manuscript. These include contact matrices in cool format, genomic intervals in BED or BED2D format, and loop calls made on public data by different programs.</p>
<p><strong>Simulation input:</strong></p>
<p>The file "simulation_inputs.tar.gz" contains the inputs used to generate the synthetic Hi-C contact maps of the results benchmark. It consists of a contact map of chromosome 5 from <em>Saccharomyces cerevisiae</em> strain W303, taken from <a href="https://www.ncbi.nlm.nih.gov/sra/SRX5559680[accn]">this project</a>, and a corresponding set of domain border coordinates detected on this contact map as described in the methods of Chromosight's paper. Border coordinates are counted in bins from the start of chromosome 5. Both files can be fed to the "chromo_simul.py" script available on <a href="https://github.com/koszullab/chromosight_analyses_scripts/blob/master/python_codes/chromo_simul.py">GitHub</a> as follows to generate synthetic contact maps:<br>
`python chromo_simul.py matrix.cool chr5 borders.txt out_dir`</p>
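<p>Before unpacking a downloaded tarball, it can be checked against the MD5 checksum published with this record. The sketch below uses only Python's standard <code>hashlib</code> module; the helper names <code>md5_of_file</code> and <code>matches_checksum</code> are hypothetical and are not part of the deposited scripts.</p>

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so multi-gigabyte tarballs do not fill RAM.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file against a checksum given as 'md5:<hex>' or plain hex."""
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex
```

<p>For example, assuming the simulation inputs were saved in the working directory, <code>matches_checksum("simulation_inputs.tar.gz", "md5:621ea03205a3c1e59aa8060899e3f17a")</code> should return <code>True</code> for an intact download.</p>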
Zenodo
2020-04-06
info:eu-repo/semantics/other
3742094
1614644835.902702
317927256
md5:7daf25490681ce9e32f55f343c263d00
https://zenodo.org/records/4570616/files/20200406_benchmark_chromosight_performance.tar.gz
37831
md5:621ea03205a3c1e59aa8060899e3f17a
https://zenodo.org/records/4570616/files/simulation_inputs.tar.gz
2001885024
md5:3d00efc58a14bff3d3a92355a5a0de79
https://zenodo.org/records/4570616/files/20200406_benchmark_chromosight_results.tar.gz
243733328
md5:4c16e8b599d82f4eb362e262caa5f9c7
https://zenodo.org/records/4570616/files/processed_files.tar.gz
public
10.1101/2020.03.08.981910
Is supplement to
doi
10.5281/zenodo.3742094
isVersionOf
doi