Published September 21, 2024 | Version v1
Software Open

Supporting materials for "Two subtle problems with over-representation analysis"

  • 1. Burnet Institute
  • 2. Deakin University

Description

These materials support our article "Two subtle problems with over-representation analysis".

The repository contains two archives:

1. background_docker_image.tar.gz is a Docker image that enables complete reproducibility of our work.

2. background_github.tar.gz is the git repository that contains all the scripts.

Study abstract:

Over-representation analysis (ORA) is used widely to assess the enrichment of functional categories in a gene list compared to a background list. ORA is therefore a critical method in the interpretation of ’omics data, relating gene lists to biological functions and themes. Although ORA is hugely popular, we and others have noticed two potentially undesired behaviours of some ORA tools. The first one we call the “background problem,” because it involves the software eliminating large numbers of genes from the background list if they are not annotated as belonging to any category. The second one we call the “false discovery rate problem,” because some tools underestimate the true number of parallel tests conducted. Here we demonstrate the impact of these issues on several real RNA-seq datasets and use simulated RNA-seq data to quantify the impact of these problems. We show that the severity of these problems depends on the gene set library, the number of genes in the list, and the degree of noise in the dataset. These problems can be mitigated by changing packages/websites for ORA or by changing to another approach such as functional class scoring.
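The two problems named in the abstract can be illustrated with a minimal sketch. It is not the code used in the article (that is in background_github.tar.gz). It assumes ORA is done with a one-sided hypergeometric test and that p-values are corrected with the Benjamini-Hochberg procedure. The function names `ora_pvalue` and `bh_adjust` and all gene counts are hypothetical and chosen only for illustration.

```python
from math import comb


def ora_pvalue(overlap, list_size, set_size, background_size):
    """Upper-tail hypergeometric probability P(X >= overlap).

    X is the number of gene-set members found in a gene list of
    `list_size` genes drawn from `background_size` background genes,
    of which `set_size` belong to the gene set.
    """
    max_k = min(list_size, set_size)
    total = comb(background_size, list_size)
    tail = sum(
        comb(set_size, k) * comb(background_size - set_size, list_size - k)
        for k in range(overlap, max_k + 1)
    )
    return tail / total


def bh_adjust(pvalues, n_tests=None):
    """Benjamini-Hochberg adjusted p-values.

    `n_tests` is the number of tests assumed by the correction. It
    defaults to len(pvalues). Passing a larger value models tests whose
    p-values were never reported and are assumed to be larger than all
    reported ones.
    """
    m = len(pvalues) if n_tests is None else n_tests
    order = sorted(range(len(pvalues)), key=lambda i: pvalues[i])
    adjusted = [0.0] * len(pvalues)
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(len(order), 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Background problem (hypothetical counts): the same overlap of 10 genes
# with a 100-gene set, tested against the full background and against a
# background from which unannotated genes were removed.
p_full = ora_pvalue(overlap=10, list_size=500, set_size=100,
                    background_size=20000)
p_trimmed = ora_pvalue(overlap=10, list_size=400, set_size=100,
                       background_size=12000)

# False discovery rate problem (hypothetical p-values): correcting for
# only the 3 reported tests versus the 100 tests actually conducted.
raw = [0.001, 0.01, 0.02]
adj_underestimated = bh_adjust(raw)
adj_all_tests = bh_adjust(raw, n_tests=100)

print(p_full, p_trimmed)
print(adj_underestimated, adj_all_tests)
```

In this toy setting, trimming the background changes the p-value for the same overlap, and undercounting the tests makes every adjusted p-value smaller than it would be under the full count.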

Files

README.md

Files (1.9 GB)

Checksum Size
md5:82df7405e6fe3ada0a59ca4056fb604b 1.8 GB
md5:1a1c6adb63cef8050e2916c0144ed8b9 57.7 MB
md5:d93a1cc23fafd135e97d5d57d33646bf 404 Bytes
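The listed checksums can be used to confirm that a download is intact. The following is a minimal sketch using only the Python standard library. The helper names `md5_of` and `verify` are hypothetical. Because this listing does not pair file names with checksums, the sketch only checks whether a file's MD5 matches any of the three published values.

```python
import hashlib

# MD5 values exactly as listed on this record.
PUBLISHED_MD5 = {
    "82df7405e6fe3ada0a59ca4056fb604b",
    "1a1c6adb63cef8050e2916c0144ed8b9",
    "d93a1cc23fafd135e97d5d57d33646bf",
}


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(paths):
    """Map each path to True if its MD5 matches one of the published values."""
    return {str(path): md5_of(path) in PUBLISHED_MD5 for path in paths}
```

For example, `verify(["background_docker_image.tar.gz", "background_github.tar.gz", "README.md"])` should return `True` for every file once the downloads are complete, assuming they are saved under those names in the current directory.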

Additional details