This series of exercises aims to compare several approaches for building motifs from collections of annotated binding sites.
Data collection: each participant will choose an Escherichia coli transcription factor in the RegulonDB database and collect the lists of its annotated binding sites and regulated genes.
Motif building: we will then use two alternative approaches (consensus and MEME) to build a motif from these binding sites.
String-based pattern matching: we will then use the consensus (as an IUPAC string or a regular expression) to scan promoters for putative binding sites, and evaluate the correspondence between the predicted target genes and the regulated genes annotated in RegulonDB.
Matrix-based pattern matching: we will then use the position-specific scoring matrices to scan all promoters for putative binding sites, and compare the predicted target genes with the list of regulated genes annotated in RegulonDB.
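The motif-building and matrix-based steps above can be illustrated with a minimal Python sketch. It assumes the binding sites have already been collected and aligned to the same length. The IUPAC consensus is derived with a simple frequency-threshold rule, which is not the algorithm used by the consensus program or by MEME (both discover the motif themselves). The threshold `min_fraction`, the pseudocount, the uniform background and the scan threshold are illustrative assumptions, and the four sites are toy sequences, not RegulonDB data.

```python
import math
from collections import Counter

BASES = "ACGT"
# IUPAC code for each set of bases (used to write a degenerate consensus).
IUPAC_CODES = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M", frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V", frozenset("ACGT"): "N",
}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def count_matrix(sites):
    """Per-position base counts for aligned binding sites of equal length."""
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("sites must be aligned and have the same length")
    return [Counter(s[i].upper() for s in sites) for i in range(width)]


def iupac_consensus(counts, min_fraction=0.25):
    """Degenerate consensus: keep each base whose frequency is >= min_fraction."""
    letters = []
    for column in counts:
        total = sum(column.values())
        kept = frozenset(b for b in BASES if column[b] / total >= min_fraction)
        letters.append(IUPAC_CODES.get(kept, "N"))
    return "".join(letters)


def log_odds_pssm(counts, pseudocount=1.0, background=None):
    """Log-odds matrix (natural log): w[i][b] = ln(f(b,i) / p(b)).

    The pseudocount is spread over the bases in proportion to the background
    p(b), so that bases absent from the sites do not score minus infinity.
    """
    background = background or {b: 0.25 for b in BASES}
    pssm = []
    for column in counts:
        n = sum(column[b] for b in BASES)
        pssm.append({
            b: math.log((column[b] + pseudocount * background[b])
                        / (n + pseudocount) / background[b])
            for b in BASES
        })
    return pssm


def scan_pssm(pssm, sequence, threshold):
    """Return (start, strand, score) for every window scoring >= threshold."""
    width, seq = len(pssm), sequence.upper()
    hits = []
    for i in range(len(seq) - width + 1):
        word = seq[i:i + width]
        if any(b not in BASES for b in word):
            continue  # skip windows containing N or other ambiguous letters
        reverse = "".join(COMPLEMENT[b] for b in reversed(word))
        for strand, site in (("+", word), ("-", reverse)):
            score = sum(row[b] for row, b in zip(pssm, site))
            if score >= threshold:
                hits.append((i, strand, score))
    return hits


# Toy sites, invented for illustration.
sites = ["TTGACA", "TTGACT", "TTTACA", "CTGACA"]
counts = count_matrix(sites)
print(iupac_consensus(counts))  # YTKACW
pssm = log_odds_pssm(counts)
print(scan_pssm(pssm, "GGTTGACATT", threshold=3.0))  # a single hit at start 2, "+" strand
```

The score of a window is the sum of its per-position weights, so it is positive when the window resembles the sites more than the background. In a real analysis the background frequencies would rather be estimated from E. coli sequences, and the threshold chosen by inspecting the score distribution.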
| Resource | URL | Description |
|---|---|---|
| RegulonDB | http://regulondb.ccg.unam.mx/ | A database of transcriptional regulation in the bacterium Escherichia coli K-12. |
| Regulatory Sequence Analysis Tools (RSAT) | http://www.rsat.eu/ | A web-based software suite to detect cis-regulatory elements in DNA sequences. |
| MEME | http://meme.nbcr.net/meme/ | A software suite for the analysis of sequence motifs, including the motif discovery tool called "MEME". |
We will now use a string-based pattern matching approach to predict putative binding sites and target genes for the selected factor.
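This string-based step can be sketched in Python, assuming the promoter sequences are available as a dictionary mapping gene names to sequences (a hypothetical input format). The IUPAC consensus is converted into a regular expression and searched on both strands. A gene is predicted as a target when its promoter contains at least one match, and the predictions are compared with the annotated regulated genes using sensitivity and positive predictive value (PPV). The target rule and the choice of metrics are assumptions, and the gene names and sequences are toy data. RSAT offers web tools for this kind of search; the code only illustrates the logic.

```python
import re

# Regular-expression character class for each IUPAC nucleotide code.
IUPAC_REGEX = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]", "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement of each IUPAC code (e.g. R = A/G pairs with Y = C/T).
IUPAC_COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(pattern):
    return pattern.upper().translate(IUPAC_COMPLEMENT)[::-1]


def iupac_to_regex(pattern):
    return "".join(IUPAC_REGEX[c] for c in pattern.upper())


def find_matches(pattern, sequence):
    """(start, strand) of every match on both strands, overlaps included.

    Matching the reverse complement of the pattern on the forward strand
    is equivalent to matching the pattern itself on the reverse strand.
    """
    seq = sequence.upper()
    hits = []
    for strand, pat in (("+", pattern), ("-", reverse_complement(pattern))):
        # A zero-width lookahead lets finditer report overlapping matches.
        regex = re.compile("(?=" + iupac_to_regex(pat) + ")")
        hits.extend((m.start(), strand) for m in regex.finditer(seq))
    return sorted(hits)


def predict_targets(pattern, promoters):
    """Genes whose promoter sequence contains at least one match."""
    return {gene for gene, seq in promoters.items() if find_matches(pattern, seq)}


def evaluate(predicted, annotated):
    """Compare predicted target genes with annotated regulated genes."""
    tp = len(predicted & annotated)
    return {
        "TP": tp,
        "FP": len(predicted - annotated),
        "FN": len(annotated - predicted),
        "sensitivity": tp / len(annotated) if annotated else 0.0,
        "PPV": tp / len(predicted) if predicted else 0.0,
    }


# Toy example: gene names and sequences are invented for illustration.
promoters = {"geneA": "GGTTGACATT", "geneB": "CCCCCCCCCC", "geneC": "AATGTCAAGG"}
predicted = predict_targets("TTGACA", promoters)  # {"geneA", "geneC"}
print(evaluate(predicted, annotated={"geneA", "geneB"}))
```

Searching with the reverse complement of the pattern avoids building the reverse strand of every promoter. For a palindromic pattern, each site is reported once per strand.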