Warning
Problems for motif discovery with vertebrate genomes
Several vertebrate genomes are currently supported on RSAT.
Motif discovery programs (e.g. oligo-analysis and dyad-analysis) have proven highly efficient with microbial genomes (Fungi, Bacteria), and relatively efficient with medium-sized genomes like Arabidopsis thaliana and Drosophila melanogaster.
Our first tests with human and mouse genes suggest that, with vertebrate genomes, these programs return many false positives (i.e. if you submit a random set of genes, you always get plenty of highly 'significant' motifs). This is likely to come from the heterogeneity of human sequences (mixtures of GC-rich and GC-poor promoters).
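The following Python sketch illustrates one way such heterogeneity could inflate significance. It is not RSAT code, and every number in it is an assumption chosen for illustration: promoters fall into two equally frequent classes (35% and 65% G+C), letters are independent, and overlaps between neighbouring word positions are ignored. The hypothetical helpers word_prob and count_variance compare the variance of a word count under a homogeneous binomial model with the variance under the two-class mixture. When the mixture variance is larger, a test that assumes the homogeneous model underestimates how much counts fluctuate between random gene sets, so ordinary fluctuations look 'significant'.

```python
def word_prob(word, gc):
    """Probability of `word` at one position under an independent-letter model.

    `gc` is the total G+C content; G and C are assumed equally likely,
    and so are A and T (an illustrative simplification).
    """
    p = 1.0
    for base in word:
        p *= gc / 2 if base in "GC" else (1 - gc) / 2
    return p


def count_variance(word, gc_classes, n_seqs, seq_len):
    """Variance of the total count of `word` over `n_seqs` promoters.

    Each promoter is drawn at random from the equally frequent classes in
    `gc_classes`. Returns (homogeneous_variance, mixture_variance).
    """
    positions = seq_len - len(word) + 1
    probs = [word_prob(word, gc) for gc in gc_classes]
    mean_p = sum(probs) / len(probs)

    # Homogeneous model: every position uses the pooled probability.
    var_homog = n_seqs * positions * mean_p * (1 - mean_p)

    # Mixture model, per promoter, by the law of total variance:
    # average within-class variance + variance of the class means.
    within = sum(positions * p * (1 - p) for p in probs) / len(probs)
    between = sum((positions * (p - mean_p)) ** 2 for p in probs) / len(probs)
    var_mix = n_seqs * (within + between)
    return var_homog, var_mix


# Illustrative values only: a GC-rich hexanucleotide, 20 promoters of 1000 bp.
var_homog, var_mix = count_variance("GGGCGG", [0.35, 0.65], n_seqs=20, seq_len=1000)
print("homogeneous variance:", var_homog)
print("mixture variance:    ", var_mix)
```

With a single class the two variances coincide; with two classes of different G+C content, the mixture variance is larger for compositionally biased words such as the one above.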
Other pattern discovery approaches like consensus and the Gibbs sampler are also likely to return many false positives with vertebrate sequences since, even with microbes, they generally return more false positives than oligo-analysis and dyad-analysis.
We suspect that the same problem will be encountered by most currently existing motif discovery programs: if you submit a random selection of human promoters, they will always return plenty of motifs, although the correct answer would be to return no motif (since these genes are not likely to be co-regulated). If you know of any counterexample, please let me know.
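The negative control described above can be sketched in Python as follows. This is a minimal harness under stated assumptions, not part of RSAT: false_positive_rate is a hypothetical helper, and discover stands for any callable you supply that takes a list of gene identifiers and returns the motifs reported as significant (for example a wrapper around the program you want to evaluate). Since randomly selected genes are not expected to be co-regulated, any motif reported for them is counted as a false positive.

```python
import random


def false_positive_rate(gene_ids, discover, set_size=20, n_trials=100, seed=0):
    """Fraction of random gene sets for which `discover` reports a motif.

    `discover` is a user-supplied (hypothetical) callable: it receives a list
    of gene identifiers and returns a list of significant motifs, which is
    empty when nothing is found.
    """
    rng = random.Random(seed)  # fixed seed so the control is reproducible
    hits = 0
    for _ in range(n_trials):
        gene_set = rng.sample(gene_ids, set_size)  # random, unrelated genes
        if discover(gene_set):
            hits += 1
    return hits / n_trials


# Usage with placeholder identifiers and a stub that never reports a motif.
genes = [f"gene{i}" for i in range(200)]
rate = false_positive_rate(genes, lambda gene_set: [])
print("fraction of random sets with at least one motif:", rate)
```

A well-behaved program should give a rate close to zero on such random sets; a rate close to one corresponds to the behaviour reported above for vertebrate promoters.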
We are currently working on methodological improvements to model heterogeneous sequences. This is ongoing research, and it may take several months. In the meantime, we have already installed the genomes to offer support for some tools that give reasonably good results (pattern matching, for example), but we do not recommend trusting motif discovery results with vertebrate genomes.