Consistency Evaluator - Point Request Container for GAME (K562/homo_sapiens)
Authors/Creators
Description
The goal of this Evaluator is to assess the consistency between point predictions for forward and reverse-complement sequences by calculating the Pearson correlation coefficient (r) between them. It checks that Predictors generating accessibility scores produce similar results for both strands. This Consistency Evaluator specifically requests point accessibility predictions for the K562 cell type in Homo sapiens.
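As an illustration of the metric described above, the following is a minimal sketch of a Pearson r calculation between paired forward and reverse-complement predictions. The function name `pearson_r` and the example scores are hypothetical and are not taken from the Evaluator's own scripts, which may compute the correlation differently (for example with a statistics library).

```python
import math


def pearson_r(forward_scores, reverse_scores):
    """Pearson correlation between paired forward / reverse-complement scores."""
    if len(forward_scores) != len(reverse_scores):
        raise ValueError("both score lists must have the same length")
    n = len(forward_scores)
    if n < 2:
        raise ValueError("at least two paired predictions are required")

    mean_f = sum(forward_scores) / n
    mean_r = sum(reverse_scores) / n

    # Centered sums of products and squares
    cov = sum((f - mean_f) * (r - mean_r)
              for f, r in zip(forward_scores, reverse_scores))
    var_f = sum((f - mean_f) ** 2 for f in forward_scores)
    var_r = sum((r - mean_r) ** 2 for r in reverse_scores)

    denom = math.sqrt(var_f * var_r)
    if denom == 0:
        # Correlation is undefined when either set of scores is constant
        return float("nan")
    return cov / denom


# Hypothetical accessibility scores for four sequences and their reverse complements
forward = [0.10, 0.80, 0.35, 0.60]
reverse = [0.12, 0.79, 0.30, 0.65]
r = pearson_r(forward, reverse)
```

A value of r close to 1 would indicate that a Predictor scores both strands similarly; lower values would suggest strand-dependent predictions.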
The Consistency_evaluator_point_K562.sif contains the following:
- The scripts required to process the data and connect to predictors in the GAME API
The /evaluator_data folder contains:
- all_consistency_data.csv: a sequence file that contains 900 sequences (251 bp each)
  - 150 randomly sampled peak sequences from iPSC ATAC-seq data and their reverse complements (300 sequences total)
  - Mononucleotide-shuffled versions of the original sequences and their reverse complements (300 sequences total)
  - Dinucleotide-shuffled versions of the original sequences and their reverse complements (300 sequences total)
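To make the three sequence groups concrete, here is a small sketch of how a reverse complement and a mononucleotide shuffle can be produced. This is not the code used to build the dataset: the shuffled sequences there were generated with BiasAway, and the helper names below are hypothetical. A dinucleotide shuffle, which preserves dinucleotide counts rather than only base counts, needs a more involved algorithm and is not shown.

```python
import random

# Translation table mapping each base to its complement (N stays N)
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def mononucleotide_shuffle(seq, seed=None):
    """Randomly permute the bases, preserving single-base composition only."""
    rng = random.Random(seed)
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)


# Hypothetical short sequence (the real sequences are 251 bp each)
original = "ACGTTGCA"
rc = reverse_complement(original)
shuffled = mononucleotide_shuffle(original, seed=0)
shuffled_rc = reverse_complement(shuffled)
```

Each original or shuffled sequence is paired with its own reverse complement, which is why every group of 150 sequences contributes 300 entries.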
The folder also contains the data and script to recreate the final .csv file:
- Original bed file pulled from ENCODE (ATAC-seq data for iPSC): ENCFF121CAA.bed
- The sequence_design.py script that:
  - Pulls the center 251 bp from 150 random peaks from the bed file and creates a .fasta file
  - The fasta file is used to run a tool called BiasAway to create the mono- and dinucleotide-shuffled sequences
  - The original and shuffled sequence files are read in and written to one file (all_consistency_data.csv)
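The peak-centering step described above can be sketched as follows. This is an assumption-laden illustration, not the contents of sequence_design.py: it assumes standard 0-based, half-open BED coordinates and centers the window on the interval midpoint, whereas the actual script may center on a different position such as a peak summit. The function names and the example BED line are hypothetical.

```python
def parse_bed_line(line):
    """Read chromosome, start and end from the first three BED columns."""
    fields = line.rstrip("\n").split("\t")
    return fields[0], int(fields[1]), int(fields[2])


def center_window(start, end, width=251):
    """Return (win_start, win_end) of a fixed-width window centered on the interval.

    Coordinates are 0-based and half-open, as in BED files.
    """
    if end <= start:
        raise ValueError("end must be greater than start")
    midpoint = (start + end) // 2
    win_start = midpoint - width // 2
    if win_start < 0:
        raise ValueError("window would extend past the start of the chromosome")
    return win_start, win_start + width


# Hypothetical peak line; real lines come from the ENCODE bed file
chrom, start, end = parse_bed_line("chr1\t1000\t1500\tpeak_1")
win_start, win_end = center_window(start, end)
```

With an odd width such as 251, the window has the same number of bases on either side of the central position.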
How to run:
apptainer run --containall -B /path_to/evaluator_data/:/evaluator_data -B /path_to/prediction_folder/:/predictions Consistency_evaluator_point_K562.sif HOST PORT /predictions
Notes:
- This Evaluator was designed to test the consistency of any Predictor that can return accessibility predictions for Homo sapiens
- The all_consistency_data.csv sequence file is copied into the container and is not recreated every time the container is run; the code is included in case users are curious how it was created
Additional information regarding the API can be found here: https://github.com/de-Boer-Lab/Genomic-API-for-Model-Evaluation
Files
(156.3 MB)
| MD5 checksum | Size |
|---|---|
| b4bc69ff6e1cec28ebceb4cccb4ac520 | 152.4 MB |
| 641eb481b07fd1bbdad5e18ad7ceb3d0 | 3.9 MB |