Practical
Pattern discovery and pattern matching in non-coding sequences
Web sites
This tutorial
http://bioicrs1.chem.uva.nl/rsa-tools/doc/practicals/amsterdam_2000/
Regulatory Sequence Analysis tools
- temporary site (for the hands-on): http://bioicrs1.chem.uva.nl/rsa-tools/
- permanent site: http://www.ucmb.ulb.ac.be/bioinformatics/rsa-tools/
Sample families
http://bioicrs1.chem.uva.nl/rsa-tools/doc/practicals/amsterdam_2000/sample_families/
Exercises
- Click on one of the "Regulatory Sequence Analysis" web sites above. This opens a separate window for the Regulatory Sequence Analysis tools. The left frame contains a menu listing the available programs. You can now switch between the "practical" and the "tools" windows.
- Sequence retrieval
- Click on the link "known regulons" above. This opens a third window where you can select the gene families.
- Click on the PHO family and select all the genes. Copy them with the command "Edit-Copy".
- Come back to the RSA-tools window and click on the link "upstream sequence" in the left frame. Paste your gene family into the box and retrieve the upstream sequences. Notice the list of buttons at the top of the result page. Each button sends the upstream sequences you just retrieved to another program.
- Among these buttons, click on "oligo-analysis".
Beware: do not click on the link "oligo-analysis" in the left frame; this would open an empty form for oligo-analysis.
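For readers curious about what the "upstream sequence" retrieval does behind the form, here is a minimal Python sketch. It is not the RSA-tools code: it assumes a chromosome held as a plain string, 0-based half-open gene coordinates on the forward strand, and an illustrative default length of 800 bp; the function names are hypothetical. The point it illustrates is strand handling: for a gene on the reverse strand, the upstream region lies to the right of the gene and has to be reverse-complemented.

```python
# Hypothetical sketch of upstream-sequence extraction (not the RSA-tools implementation).
# Assumptions: 0-based, half-open coordinates on the forward strand; length=800 is illustrative.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def upstream(chrom, start, end, strand, length=800):
    """Return up to `length` bp upstream of a gene, read 5'->3' on the gene's strand.

    chrom      -- chromosome sequence (forward strand)
    start, end -- gene coordinates on the forward strand (0-based, half-open)
    strand     -- '+' or '-'
    """
    if strand == "+":
        # Upstream of a forward-strand gene is to the left of its start.
        return chrom[max(0, start - length):start]
    if strand == "-":
        # Upstream of a reverse-strand gene is to the right of its end,
        # and must be reverse-complemented to read it on the gene's strand.
        return reverse_complement(chrom[end:min(len(chrom), end + length)])
    raise ValueError("strand must be '+' or '-'")
```

For example, with `chrom = "TTTTACGATGAAACCC"`, `upstream(chrom, 7, 13, "+", 4)` returns the four bases just before the ATG at position 7, and the region is silently truncated at the chromosome ends.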
- oligo-analysis
- Perform the analysis with the default options.
- How many patterns were selected?
- How many would you have expected at random?
- What is the most significant pattern?
- How large is the largest pattern contig?
- Look at the tables in the article on dyad-detection. How do the patterns you discovered compare with the known patterns?
- Click on the "Back" icon to come back to the oligo-analysis form.
- Redo the analysis of the same family with an equiprobable nucleotide model. Answer the same questions as above.
- Redo the analysis, using input alphabet frequencies for oligonucleotide calibration.
- Try to justify the differences between the results obtained with the respective probabilistic models.
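The questions about how many patterns were selected and how many were expected at random can be made concrete with a simplified model of word over-representation. The Python sketch below is an illustration under stated assumptions, not oligo-analysis itself: it counts overlapping k-letter words on one strand only, computes each word's probability under an independent-nucleotide model (either equiprobable or with the input alphabet frequencies), takes a binomial P-value for observing at least that many occurrences, multiplies it by the number of words tested (4^k) to obtain an E-value, and reports sig = -log10(E-value). Under this definition, keeping words with sig >= 0 means that about one word is expected to pass by chance. The real program offers more options (for instance both-strand counting and other background models) that this sketch ignores; all names here are hypothetical.

```python
# Simplified over-representation test, in the spirit of oligo-analysis (hypothetical names).
# Assumptions: single strand, independent nucleotides, all four base frequencies > 0.
import math
from collections import Counter
from itertools import product

EQUIPROBABLE = dict.fromkeys("ACGT", 0.25)


def alphabet_freqs(seqs):
    """Nucleotide frequencies estimated from the input sequences themselves."""
    counts = Counter(c for s in seqs for c in s.upper() if c in "ACGT")
    total = sum(counts.values())
    return {b: counts[b] / total for b in "ACGT"}


def count_words(seqs, k):
    """Overlapping occurrences of each k-letter word (single strand only)."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            word = s[i:i + k]
            if all(c in "ACGT" for c in word):
                counts[word] += 1
    return counts


def binom_upper_tail(n, p, x):
    """P(X >= x) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    if x <= 0:
        return 1.0
    if x > n:
        return 0.0
    log_p, log_q = math.log(p), math.log1p(-p)
    log_nf = math.lgamma(n + 1)
    total = 0.0
    for i in range(x, n + 1):
        term = math.exp(log_nf - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                        + i * log_p + (n - i) * log_q)
        total += term
        if i > n * p and term <= total * 1e-12:
            break  # past the mode the terms only shrink, so the rest is negligible
    return min(1.0, total)


def oligo_scan(seqs, k, freqs):
    """(word, observed, expected, E-value, sig) for all 4**k words, most significant first."""
    counts = count_words(seqs, k)
    n_pos = sum(max(0, len(s) - k + 1) for s in seqs)  # positions where a word can start
    n_tests = 4 ** k                                   # number of words tested
    rows = []
    for letters in product("ACGT", repeat=k):
        word = "".join(letters)
        p = math.prod(freqs[c] for c in word)          # independent-nucleotide model
        obs = counts[word]
        evalue = binom_upper_tail(n_pos, p, obs) * n_tests
        sig = -math.log10(evalue) if evalue > 0 else math.inf
        rows.append((word, obs, n_pos * p, evalue, sig))
    rows.sort(key=lambda r: r[4], reverse=True)
    return rows
```

Running `oligo_scan` on the same sequences once with `EQUIPROBABLE` and once with `alphabet_freqs(seqs)` changes the expected counts of AT-rich versus GC-rich words, which is one way to reason about the differences between models asked for above.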
- gibbs sampler
- In the known regulons, select the MET family.
- Retrieve upstream sequences and pipe them to the Gibbs sampler.
- Run the program with the default parameters.
- Compare the result with the known motif. Which motifs did you find?
- Repeat the analysis a few times. At each trial, take note of the value of "information per parameter", which indicates the significance of the profile matrix.
- How does it fluctuate?
- Are the extracted patterns always the same?
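To see why repeated runs can disagree, here is a deliberately minimal Gibbs site sampler in Python, in the spirit of Lawrence et al. (1993): one site of fixed width per sequence, a uniform background, and at each iteration the site of one randomly chosen sequence is resampled in proportion to its likelihood ratio under the matrix built from the other sites. It is not the program behind the form, and its score is the information content of the final matrix in bits, a simpler quantity than the "information per parameter" reported there. Function names, pseudocounts and the iteration count are assumptions.

```python
# Minimal one-site-per-sequence Gibbs sampler (hypothetical sketch, not the tool's program).
# Assumptions: uniform background, pseudocount 0.5, >= 2 sequences each at least `width` long.
import math
import random

BASES = "ACGT"


def profile(sites, pseudo=0.5):
    """Column-wise base frequencies (with pseudocounts) of equal-length sites."""
    matrix = []
    for j in range(len(sites[0])):
        column = [s[j] for s in sites]
        total = len(column) + 4 * pseudo
        matrix.append({b: (column.count(b) + pseudo) / total for b in BASES})
    return matrix


def info_content(matrix, bg=0.25):
    """Relative entropy (bits) of the matrix against a uniform background."""
    return sum(p * math.log2(p / bg) for col in matrix for p in col.values() if p > 0)


def gibbs(seqs, width, iters=500, seed=None):
    """Return (sites, information content) after `iters` resampling steps."""
    rng = random.Random(seed)
    seqs = [s.upper() for s in seqs]
    pos = [rng.randrange(len(s) - width + 1) for s in seqs]  # random initial sites
    for _ in range(iters):
        i = rng.randrange(len(seqs))  # sequence whose site is resampled
        others = [s[p:p + width] for k, (s, p) in enumerate(zip(seqs, pos)) if k != i]
        matrix = profile(others)
        seq = seqs[i]
        weights = []
        for start in range(len(seq) - width + 1):
            ratio = 1.0
            for j, b in enumerate(seq[start:start + width]):
                # motif vs uniform background; non-ACGT letters count as neutral
                ratio *= matrix[j].get(b, 0.25) / 0.25
            weights.append(ratio)
        pos[i] = rng.choices(range(len(weights)), weights=weights)[0]
    sites = [s[p:p + width] for s, p in zip(seqs, pos)]
    return sites, info_content(profile(sites))
```

Calling `gibbs(seqs, 6, seed=s)` for several values of `s` shows that the method is stochastic: different seeds can converge to different sites and scores, which is why the exercise asks you to repeat the run and compare.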
- dyad-detector
- Select a set of genes from the sample families.
- Retrieve upstream sequences.
- Perform dyad-analysis with the default options (this takes 1-2 minutes: a good opportunity for a coffee break).
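A spaced dyad is a pair of short words (monads) separated by a fixed number of unspecified bases. The Python sketch below only enumerates and counts dyads on one strand; their significance could then be assessed in the same way as for oligonucleotides. The monad length of 3 and spacings of 0 to 16 are illustrative assumptions, and the `n{s}` label format is chosen here for readability; the function names are hypothetical. With these settings there are 4^3 x 4^3 x 17 = 69 632 possible dyads, which helps explain why this analysis takes longer than a single oligo-analysis run.

```python
# Hypothetical sketch of spaced-dyad counting (not the dyad-analysis implementation).
# Assumptions: single strand, monad length 3, spacings 0..16 by default.
from collections import Counter


def count_dyads(seqs, monad=3, min_sp=0, max_sp=16):
    """Count spaced dyads w1 - n{sp} - w2; keys are (w1, spacing, w2)."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for sp in range(min_sp, max_sp + 1):
            span = 2 * monad + sp  # total length covered by the dyad
            for i in range(len(s) - span + 1):
                w1 = s[i:i + monad]
                w2 = s[i + monad + sp:i + span]
                if all(c in "ACGT" for c in w1 + w2):
                    counts[(w1, sp, w2)] += 1
    return counts


def dyad_label(key):
    """Readable label such as 'CGGn{11}CCG' (format chosen for this sketch)."""
    w1, sp, w2 = key
    return f"{w1}n{{{sp}}}{w2}"
```

For instance, the sequence `"CGG" + "A" * 11 + "CCG"` contains one occurrence of the dyad labelled `CGGn{11}CCG`, the spacing pattern of the classical Gal4 binding site.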
- clusters from gene expression data
- In the sample families, open the folder "dna_chip_clusters".
- Select a family.
- Retrieve upstream sequences.
- Try to discover regulatory patterns with the methods of your choice.
- Compare the results with the known motifs (see the paper on dyad-analysis).
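Comparing discovered patterns with known motifs often means searching sequences for a degenerate consensus written in the IUPAC code. The Python sketch below, with hypothetical function names, converts an IUPAC pattern into a regular expression, searches both strands (using a lookahead so that overlapping matches are reported), and returns 0-based start positions on the forward sequence together with the strand of the match. It is an illustration of pattern matching, not the RSA-tools pattern-matching program.

```python
# Hypothetical sketch: IUPAC pattern matching on both strands.
import re

IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]",
         "S": "[CG]", "W": "[AT]", "K": "[GT]", "M": "[AC]", "B": "[CGT]",
         "D": "[AGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]"}
# Complement of each IUPAC symbol (e.g. K=[GT] pairs with M=[AC]).
IUPAC_COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def revcomp_iupac(pattern):
    """Reverse complement of an IUPAC pattern."""
    return pattern.upper().translate(IUPAC_COMPLEMENT)[::-1]


def find_motif(seq, pattern):
    """Sorted (position, strand) of matches; positions are 0-based on the forward sequence."""
    seq = seq.upper()
    hits = set()
    for strand, pat in (("+", pattern.upper()), ("-", revcomp_iupac(pattern))):
        # Lookahead so that overlapping occurrences are all found.
        regex = re.compile("(?=(" + "".join(IUPAC[c] for c in pat) + "))")
        for m in regex.finditer(seq):
            hits.add((m.start(), strand))
    return sorted(hits)
```

A palindromic pattern such as CACGTG matches both strands at the same position, so `find_motif("AACACGTGTT", "CACGTG")` reports position 2 twice, once per strand; keep this in mind when counting occurrences on both strands.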
Suggested readings
String-based approaches
- van Helden J, Rios AF, Collado-Vides J. Discovering regulatory elements in non-coding sequences by analysis of spaced dyads. Nucleic Acids Res. 2000 Apr 15;28(8):1808-1818. PMID: 10734201
- van Helden J, André B, Collado-Vides J. Extracting regulatory sites from the upstream region of yeast genes by computational analysis of oligonucleotide frequencies. J Mol Biol. 1998;281:827-842.
- van Helden J, André B, Collado-Vides J. A web site for the computational analysis of yeast regulatory sequences. Yeast (1999, in press).
- van Helden J, del Olmo M, Pérez-Ortín JE. Genomic computational analysis of yeast downstream sequences reveals putative polyadenylation efficiency elements. Submitted.
- van Helden J, Rios A, Collado-Vides J. Extracting cis-acting regulatory elements from yeast non-coding sequences by analysis of spaced dyads. In preparation.
Gibbs sampling
- Hughes JD, Estep PW, Tavazoie S, Church GM. Computational identification of cis-regulatory elements associated with groups of functionally related genes in Saccharomyces cerevisiae. J Mol Biol. 2000 Mar 10;296(5):1205-14. PMID: 10698627; UI: 20198293
- Roth FP, Hughes JD, Estep PW, Church GM. Finding DNA regulatory motifs within unaligned noncoding sequences clustered by whole-genome mRNA quantitation. Nat Biotechnol. 1998 Oct;16(10):939-45. PMID: 9788350; UI: 99002399
- Tavazoie S, Hughes JD, Campbell MJ, Cho RJ, Church GM. Systematic determination of genetic network architecture. Nat Genet. 1999 Jul;22(3):281-5. PMID: 10391217; UI: 99318101
- Lawrence CE, Altschul SF, Boguski MS, Liu JS, Neuwald AF, Wootton JC. Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. Science. 1993 Oct 8;262(5131):208-14. PMID: 8211139; UI: 94023958
Comment: first application of the Gibbs sampler to discover motifs in protein sequences.