Practical
Pattern discovery and pattern matching in non-coding sequences

Web sites

Sample families

Exercises

  1. Click on one of the "Regulatory Sequence Analysis" web sites above. This will open a separate window for the Regulatory Sequence Analysis tools. The left frame contains a menu listing the available programs. You can now switch between the "practical" and the "tools" windows.
  2. Sequence retrieval
    • Click on the link "known regulons" above. This opens a third window where you can select the gene families.
    • Click on the PHO family and select all the genes. Copy them with the command "Edit-Copy".
    • Come back to the RSA-tools window and click on the link "upstream sequence" in the left frame. Paste your gene family in the box, and retrieve the upstream sequences. Notice the list of buttons at the top of the result page. Each button allows you to send the upstream sequences you just retrieved towards another program.
    • Among these buttons, click on "oligo-analysis".
        Beware: do not click on the link "oligo-analysis" in the left frame, as this would open an empty form for oligo-analysis.
  3. oligo-analysis
    • Perform the analysis with the default options.
      • How many patterns were selected?
      • How many would you have expected at random?
      • What is the most significant pattern?
      • How large is the largest pattern contig?
      • Look at the tables in the article on dyad-detection. How do the patterns you discovered compare with the known patterns?
    • Click on the "Back" icon to return to the oligo-analysis form.
      Redo the analysis of the same family with an equiprobable nucleotide model. Answer the same questions as above.
    • Redo the analysis, using input alphabet frequencies for oligonucleotide calibration.
    • Try to explain the differences between the results obtained with the respective probabilistic models.
  4. gibbs sampler
    • In the known regulons, select the MET family.
    • Retrieve upstream sequences and pipe them to the gibbs sampler.
    • Run the program with the default parameters.
    • Compare the result with the known motif. Which motifs did you find?
    • Repeat the analysis a few times. At each trial, take note of the value of "information per parameter". This value indicates the significance of the profile matrix.
      • How does it fluctuate?
      • Are the extracted patterns always the same?
  5. dyad-detector
    • Select a set of genes from the sample families.
    • Retrieve upstream sequences.
    • Perform dyad-analysis with the default options (this will take 1-2 minutes; it is a good opportunity to have a coffee).
  6. clusters from gene expression data
    • In the sample families, open the folder "dna_chip_clusters".
    • Select a family.
    • Retrieve upstream sequences.
    • Try to discover regulatory patterns with the methods of your choice.
    • Compare the results with the known motifs (see the paper on dyad-analysis).
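
As an aid for exercise 3, the effect of the background model on the expected number of occurrences can be sketched in a few lines of Python. This is only an illustration and not the code used by the oligo-analysis program: the function names, the toy sequences and the single-strand, overlapping count are all assumptions made for the example. It contrasts an equiprobable nucleotide model with a model based on the nucleotide frequencies of the input sequences.

```python
from collections import Counter


def count_oligos(seqs, k):
    """Count overlapping oligonucleotides of length k on a single strand."""
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            word = seq[i:i + k]
            # Skip windows containing ambiguous letters such as N.
            if set(word) <= set("ACGT"):
                counts[word] += 1
    return counts


def expected_occurrences(word, seqs, model="equiprobable"):
    """Expected occurrences of `word`, assuming independent nucleotides.

    model="equiprobable": each nucleotide has probability 0.25.
    model="alphabet": nucleotide probabilities are estimated from the
    input sequences themselves (hypothetical option name).
    """
    k = len(word)
    # Number of windows of length k over all sequences.
    positions = sum(max(len(s) - k + 1, 0) for s in seqs)
    if model == "equiprobable":
        prob = 0.25 ** k
    elif model == "alphabet":
        letters = Counter(c for s in seqs for c in s.upper() if c in "ACGT")
        total = sum(letters.values())
        prob = 1.0
        for c in word.upper():
            prob *= letters[c] / total
    else:
        raise ValueError("unknown model: " + model)
    return positions * prob


if __name__ == "__main__":
    # Toy AT-rich sequences, invented for illustration only.
    toy = ["AAATTTACGTGAAATTT", "TTTAAACACGTGTTTAA"]
    observed = count_oligos(toy, 6)
    for w in ("CACGTG", "AAATTT"):
        print(w, observed[w],
              expected_occurrences(w, toy, "equiprobable"),
              expected_occurrences(w, toy, "alphabet"))
```

With AT-rich input, the alphabet-based model expects more occurrences of AT-rich words and fewer of GC-rich words than the equiprobable model, which is one reason the selected patterns can differ between the two analyses.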

Suggested readings