Published 2025 | Version v2
Software Open

Consistency Evaluator - Point Request Container for GAME (K562/homo_sapiens)

Authors/Creators

Description

The goal of this Evaluator is to assess the consistency between point predictions for forward and reverse-complement sequences by calculating the Pearson correlation coefficient (r) between them. It checks that Predictors generating accessibility scores produce similar results for both strands. This Consistency Evaluator specifically requests point accessibility predictions for the K562 cell type in Homo sapiens.
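As a rough illustration of the metric described above, the following minimal sketch computes Pearson's r between two lists of point predictions. It is not the Evaluator's own code: the function name pearson_r and the example values are hypothetical, and the only assumption is that the forward and reverse-complement predictions are paired in the same order.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of centered products; the 1/n factors cancel in the ratio.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Made-up predictions for four sequences and their reverse complements.
forward_preds = [0.10, 0.40, 0.35, 0.80]
reverse_preds = [0.12, 0.38, 0.30, 0.85]
r = pearson_r(forward_preds, reverse_preds)
```

A value of r close to 1 indicates that the Predictor scores both strands of each sequence similarly.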

The Consistency_evaluator_point_K562.sif contains the following:

  • The scripts required to process the data and connect to predictors in the GAME API

The /evaluator_data folder contains:

  • all_consistency_data.csv sequence file that contains 900 sequences (251 bp each)

    1. 150 randomly sampled peak sequences from iPSC ATAC-seq data and their reverse complements (300 sequences total)
    2. Mononucleotide-shuffled versions of the original sequences and their reverse complements (300 sequences total)
    3. Dinucleotide-shuffled versions of the original sequences and their reverse complements (300 sequences total)
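The sequence categories above can be illustrated with a short sketch of reverse complementation and mononucleotide shuffling. This is an illustration only: the shuffled sequences in this record were produced with BiasAway, not with this code, and the helper names below are hypothetical. Dinucleotide shuffling, which also preserves dinucleotide counts, is more involved and is not shown.

```python
import random

# Translation table mapping each base to its complement (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def mononucleotide_shuffle(seq, rng):
    """Randomly permute the bases, preserving mononucleotide composition."""
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)


rng = random.Random(0)  # fixed seed so the example is reproducible
original = "ACGTTGCA"
shuffled = mononucleotide_shuffle(original, rng)
pair = (shuffled, reverse_complement(shuffled))
```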

The folder also contains the data and script needed to recreate the final .csv file:

  • Original bed file pulled from ENCODE - ATAC-seq data for iPSC: ENCFF121CAA.bed
  • The sequence_design.py script, which:
    1. Pulls the center 251 bp from 150 random peaks in the bed file and writes them to a .fasta file
    2. Runs the BiasAway tool on the .fasta file to create the mono- and dinucleotide-shuffled sequences
    3. Reads in the original and shuffled sequence files and writes them to one file (all_consistency_data.csv)
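The first step above can be sketched as follows. This is not the contents of sequence_design.py: the function names are hypothetical, and the sketch assumes that "center" means the midpoint of each peak's start and end coordinates (0-based, half-open BED intervals) rather than a summit column. Fetching the actual bases would additionally require a reference genome, which is not shown.

```python
import random


def center_window(start, end, width=251):
    """Return a (start, end) window of `width` bp centered on the interval midpoint."""
    mid = (start + end) // 2
    win_start = mid - width // 2
    if win_start < 0:
        raise ValueError("window extends past the start of the chromosome")
    return win_start, win_start + width


def sample_windows(bed_lines, n, width=251, seed=None):
    """Sample n peaks at random and return (chrom, start, end) center windows."""
    peaks = []
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines and BED header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split("\t")[:3]
        peaks.append((chrom, int(start), int(end)))
    chosen = random.Random(seed).sample(peaks, n)
    return [(chrom, *center_window(start, end, width)) for chrom, start, end in chosen]


# Made-up BED records for illustration.
example_bed = [
    "chr1\t1000\t2000\tpeak1",
    "chr2\t500\t900\tpeak2",
    "chr3\t7000\t7600\tpeak3",
]
windows = sample_windows(example_bed, n=2, seed=0)
```

For the dataset described here, n would be 150 and the input would be the lines of ENCFF121CAA.bed.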

How to run:

apptainer run --containall -B /path_to/evaluator_data/:/evaluator_data -B /path_to/prediction_folder/:/predictions Consistency_evaluator_point_K562.sif HOST PORT /predictions

Notes:

  • This Evaluator was designed to test the consistency of any Predictor that can return accessibility predictions for Homo sapiens
  • The all_consistency_data.csv sequence file is copied into the container and is not recreated every time the container is run; we include the code in case users are curious how it was created

Additional information regarding the API can be found here: https://github.com/de-Boer-Lab/Genomic-API-for-Model-Evaluation

Files

evaluator_data.zip

Files (156.3 MB)

Name Size
md5:b4bc69ff6e1cec28ebceb4cccb4ac520
152.4 MB
md5:641eb481b07fd1bbdad5e18ad7ceb3d0
3.9 MB