RSAT course - Exercises - Bacterial regulons - from sites to motifs


Introduction

This series of exercises compares several approaches to building motifs from collections of annotated binding sites.

  1. Data collection: each participant will choose a transcription factor of Escherichia coli in the database RegulonDB, and collect the lists of annotated binding sites and regulated genes.

  2. Motif building: we will then use two alternative approaches (consensus and MEME) to build a motif from these binding sites.

  3. String-based pattern matching. We will then use the consensus (IUPAC or regular expression) to scan promoters for putative binding sites, and evaluate the correspondence between the predicted target genes and the regulated genes annotated in RegulonDB.

  4. Matrix-based pattern matching. We will then use the position-specific scoring matrices to scan all promoters for putative binding sites, and compare the predicted target genes with the list of regulated genes annotated in RegulonDB.
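
The evaluation mentioned in steps 3 and 4 amounts to comparing two lists of gene names. The following is a minimal sketch of one way to do it, not a procedure prescribed by the course: the function name compare_gene_sets and the choice of statistics (sensitivity and positive predictive value) are assumptions made for illustration, and the gene names in the usage note are placeholders.

```python
def compare_gene_sets(predicted, annotated):
    """Compare predicted target genes with annotated regulated genes.

    Both arguments are iterables of gene names. Names are compared
    exactly (case-sensitive) after stripping surrounding whitespace.
    """
    predicted = {g.strip() for g in predicted if g.strip()}
    annotated = {g.strip() for g in annotated if g.strip()}

    true_pos = predicted & annotated    # predicted and annotated
    false_pos = predicted - annotated   # predicted but not annotated
    false_neg = annotated - predicted   # annotated but missed

    # Sensitivity: fraction of annotated genes that were recovered.
    sensitivity = len(true_pos) / len(annotated) if annotated else 0.0
    # Positive predictive value: fraction of predictions that are annotated.
    ppv = len(true_pos) / len(predicted) if predicted else 0.0

    return {
        "true_pos": sorted(true_pos),
        "false_pos": sorted(false_pos),
        "false_neg": sorted(false_neg),
        "sensitivity": sensitivity,
        "ppv": ppv,
    }
```

For example, compare_gene_sets(["geneA", "geneB", "geneX"], ["geneA", "geneB", "geneC", "geneD"]) reports two true positives, one false positive and two false negatives. Keep in mind that a predicted gene absent from the annotation is not necessarily a wrong prediction, since the annotation may be incomplete.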

Pedagogic objectives

  1. Understand how a motif can be built from a handful of binding sites.
  2. Understand the strengths and limitations of two alternative approaches (consensus and matrix) to predict transcription factor binding sites.
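
To make the first objective concrete, here is a minimal sketch of how a count matrix and a degenerate (IUPAC) consensus can be derived from binding sites. It assumes the sites are already aligned, of equal length, and on the same strand, which is not guaranteed for raw database exports (motif discovery tools such as MEME perform this alignment themselves). The function names and the min_fraction threshold are assumptions chosen for illustration, and the sites in the usage note are toy data.

```python
from collections import Counter

# IUPAC nucleotide codes, keyed by the set of residues they represent.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def count_matrix(sites):
    """Return one Counter of residue counts per aligned column."""
    sites = [s.upper() for s in sites]
    if len({len(s) for s in sites}) != 1:
        raise ValueError("sites must be non-empty and of equal length")
    return [Counter(column) for column in zip(*sites)]


def iupac_consensus(matrix, min_fraction=0.25):
    """Build a degenerate consensus from a count matrix.

    In each column, residues whose relative frequency reaches
    min_fraction are kept and the resulting set is mapped to its
    IUPAC code. If no residue qualifies, the column becomes N.
    """
    consensus = []
    for column in matrix:
        total = sum(column.values())
        kept = frozenset(
            base for base in "ACGT" if column[base] / total >= min_fraction
        )
        consensus.append(IUPAC.get(kept, "N"))
    return "".join(consensus)
```

With the toy sites ACGT, ACGA, ACTT and AGGT, iupac_consensus(count_matrix(sites)) gives ASKW, whereas a stricter threshold of 0.5 gives ACGT. The consensus discards the counts themselves, which is exactly the information a position-specific scoring matrix retains.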

Resources

RegulonDB http://regulondb.ccg.unam.mx/ A database of transcriptional regulation in the bacterium Escherichia coli K12.
Regulatory Sequence Analysis Tools (RSAT) http://www.rsat.eu/ A web-based software suite to detect cis-regulatory elements in DNA sequences.
MEME http://meme.nbcr.net/meme/ A software suite for the analysis of sequence motifs, including the motif discovery tool called "MEME".

Exercises


Building a motif from annotated binding sites

Protocol

  1. Open a connection to RegulonDB.
  2. In the menu under Search by type of object, select "in Regulon", and click Search without entering anything in the search box. This will return a list of all the regulons supported in RegulonDB.
  3. Browse some records, and select a regulon that contains a reasonable number of binding sites (between 10 and 30).
  4. In the page corresponding to the selected regulon, collect the names of the regulated genes, and store them in a text file on your computer. You will need this list to answer some questions.
  5. Click on the "+" sign on the right side of the title "Transcription factor binding sites (TFBSs) arrangements", and select the option Export > DNA binding sites > fasta.
  6. Open the fasta file with a text editor, and delete all the comment lines (lines starting with a '#'), because some sequence analysis tools would not recognize them.
  7. Open a connection to MEME. Paste the sequences in fasta format, leave all parameters at their default values, and start the analysis. Store the result files (in html + text formats) on your computer.
  8. Come back to the MEME form, and redo the analysis with custom parameters, based on your understanding of this particular task (building a motif from a handful of experimentally proven binding sites).
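
The clean-up in step 6 can also be done programmatically rather than in a text editor. The sketch below is one possible way, assuming only what the protocol states (comment lines start with '#'). The function name strip_fasta_comments is hypothetical.

```python
def strip_fasta_comments(text, comment_char="#"):
    """Return the text without the lines that start with comment_char.

    All other lines (fasta headers starting with '>' and sequence
    lines) are kept unchanged and in their original order.
    """
    kept = [
        line for line in text.splitlines()
        if not line.startswith(comment_char)
    ]
    return "".join(line + "\n" for line in kept)
```

To apply it to a downloaded file, read the file content into a string, pass it to strip_fasta_comments, and write the result to a new file, so that the original export remains available.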

Scanning promoters with a consensus sequence

We will now use a string-based pattern matching approach to predict putative binding sites and target genes for the selected factor.

  1. Open a connection to the Regulatory Sequence Analysis Tools (RSAT) web server.
  2. In the left-side menu, click on the link retrieve sequence.
  3. Select the organism Escherichia coli K 12 substr MG1655 uid 57779; click on the radio button all to select all genes; leave all other parameters unchanged and click GO.
  4. Download the sequence file (fasta) on your computer for further usage.
  5. In the Next step box at the bottom of the result page, click dna-pattern. This will open a new form with your sequence file automatically loaded (Sequence transferred from previous query).
  6. In the box Query pattern(s), paste the regular expression produced by MEME with the default parameters. Uncheck the options Sequence limits and match positions. Check the option match counts, set the min count to 1, and click GO.
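
To illustrate what a string-based scan with match counts computes, here is a minimal sketch that counts the occurrences of a regular expression in each promoter sequence. It is not the RSAT dna-pattern implementation. The function names are hypothetical, and three choices are assumptions made for this example: matches are searched on both strands, overlapping matches are counted, and the sequence identifier is taken as the first whitespace-delimited word of each fasta header.

```python
import re

# Translation table used to complement a DNA sequence.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def parse_fasta(text):
    """Parse fasta text into a dict {identifier: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def count_matches(sequences, pattern, min_count=1):
    """Count pattern matches per sequence, on both strands.

    Returns a dict {identifier: count} restricted to sequences with
    at least min_count matches. The lookahead makes the search report
    overlapping matches as well.
    """
    regex = re.compile("(?=(" + pattern + "))", re.IGNORECASE)
    counts = {}
    for name, seq in sequences.items():
        forward = sum(1 for _ in regex.finditer(seq))
        reverse = sum(1 for _ in regex.finditer(reverse_complement(seq)))
        if forward + reverse >= min_count:
            counts[name] = forward + reverse
    return counts
```

The keys of the returned dictionary are the predicted target genes, which can then be compared with the list of regulated genes collected from RegulonDB. Note that with this sketch a reverse-palindromic pattern is counted once on each strand for the same site.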