RSAT - Change summary

The change history is now available through the RSAT Forum.
New changes will be posted in the Announcement section.


The Regulatory Sequence Analysis Tools are evolving regularly. Here we summarize only the main steps in this evolution.

An exhaustive list of modifications for all the tools (including those made before September 2002) is maintained in our CVS server. It can be obtained on request by sending an email to .

Date Program Description
2008 Course Jacques van Helden's course on the analysis of regulatory sequences (a few hundred slides).
2008/01/07 footprint-discovery First web version of footprint-discovery
2008/01/07 oligo-analysis and dyad-analysis Added taxon-wide background models
2007/12 matrix-scan Many additions to the program matrix-scan, implemented by Morgane Thomas-Chollier and Jean Valéry Tutratsinze
2007/10 web site The look of the RSAT web site has been redesigned by Morgane Thomas-Chollier and Sylvain Brohée.
2007/08/09 convert-matrix Added support for several input/output formats.
2007 Web Services For bioinformaticians, RSAT tools are now also available as Web Services
2007/05/12 All the programs Output tables can now be re-sorted on any criterion by clicking on the column headers (thanks to Matthieu Defrance and Morgane Thomas-Chollier)
2006/12/07 oligo-analysis
dyad-analysis
Added the possibility to send the result (PSSM) to matrix-scan.
2006/12/07 oligo-analysis
dyad-analysis
Included a conversion from pattern assembly to PSSM via convert-matrix.
2006/11 matrix-scan Added a web interface to the program matrix-scan
2006/04/12 dyad-analysis Corrected inaccuracies in the calculation of the expected frequency (with the monad background)
Cross-checked the results by comparing dyads with spacing 0 to oligo-analysis results.
Improved the output (more consistent with oligo-analysis)
2006/02 All the scripts reading sequences Added support for tab-delimited sequence files. Each row corresponds to one sequence.
  1. The first column contains the sequence ID
  2. The second column contains the sequence.
  3. All subsequent columns are ignored.
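
The column layout above can be illustrated with a minimal Python sketch. It is not the RSAT implementation; the function name `read_tab_sequences`, the skipping of blank lines and the example records are assumptions made for illustration only.

```python
import io


def read_tab_sequences(handle):
    """Yield (seq_id, sequence) pairs from a tab-delimited sequence file.

    Hypothetical helper: column 1 is the sequence ID, column 2 the sequence,
    and any further columns are ignored.
    """
    for line in handle:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # assumption: blank lines are skipped
        fields = line.split("\t")
        if len(fields) < 2:
            raise ValueError(f"expected at least 2 columns, got: {line!r}")
        yield fields[0], fields[1]


# Made-up example: the third column of the first row is ignored.
example = io.StringIO("seq1\tACGTTGCA\tsome comment\nseq2\tTTGACA\n")
records = list(read_tab_sequences(example))
```

Here `records` holds one `(ID, sequence)` pair per input row.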
2006/02/08 Pattern matching and motif discovery algorithms Added an option "mask", which masks either lowercase or uppercase letters (converting soft masking into hard masking)
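
As a rough illustration of what converting soft masking into hard masking means, the sketch below replaces every letter of the chosen case with a masking character. The function name `hard_mask`, its parameters and the choice of "n" as masking character are assumptions, not the actual RSAT option syntax.

```python
def hard_mask(sequence, mask="lower", mask_char="n"):
    """Replace all lowercase (or all uppercase) letters by mask_char.

    Hypothetical helper: mask is "lower" or "upper"; mask_char is assumed
    to be "n" here.
    """
    if mask not in ("lower", "upper"):
        raise ValueError("mask must be 'lower' or 'upper'")
    should_mask = str.islower if mask == "lower" else str.isupper
    return "".join(mask_char if should_mask(base) else base for base in sequence)


masked_lower = hard_mask("ACGTacgtACGT")            # lowercase stretch is masked
masked_upper = hard_mask("ACGTacgt", mask="upper")  # uppercase stretch is masked
```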
2006/01/23 retrieve-seq Added an option to retrieve sequences for multiple organisms (coupled with get-orthologs)
2006/01/19 get-orthologs Created a form for get-orthologs
2006/01/01 retrieve-seq Fixed a bug with gene names: genes without names had "NULL" as name instead of the identifier. From now on, the locus_tag is systematically used as identifier and, when a gene has no name, its ID is used as primary name.
2005/12/14 oligo-analysis Added the option to select over-represented, under-represented or both types of oligonucleotides
2005/11/18 oligo-analysis and dyad-analysis Changed the correction for multiple testing. Previously, the E-value was calculated as
  E-value = P-value * nb_possible_patterns
The E-value is now calculated according to the number of patterns actually tested for significance (i.e. after filtering on occurrences, frequencies, ...).
  E-value = P-value * nb_tested_patterns

In most cases this should not make any difference with previous results, but for small sequence sets (< 10,000 bp) or large patterns (> 6 nt), the previous correction was too stringent, resulting in a loss of sensitivity.
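
The effect of the two corrections can be seen with a small worked example. The numbers are hypothetical: octanucleotides counted on a single strand (4^8 possible patterns), of which 2000 are assumed to pass the filters, and a P-value of 1e-4.

```python
def e_value(p_value, nb_patterns):
    """E-value = P-value * number of patterns used for the correction."""
    return p_value * nb_patterns


p_value = 1e-4                 # hypothetical P-value of one pattern
nb_possible_patterns = 4 ** 8  # 65536 possible octanucleotides (single strand)
nb_tested_patterns = 2000      # hypothetical number left after filtering

old_e_value = e_value(p_value, nb_possible_patterns)  # previous correction
new_e_value = e_value(p_value, nb_tested_patterns)    # current correction
```

With these assumed values the previous correction gives an E-value of about 6.55, whereas the current one gives 0.2, so the same pattern is scored as far more significant.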

2005/08/31 dyad-analysis Added an option to select the fields to be returned, and to specify upper and lower thresholds on any of these fields
2005/07/21 genome-scale-dna-pattern Added the option Feature type (which was already present in retrieve sequences)
2005/07/21 genome-scale-patser Added the option Feature type (which was already present in retrieve sequences)
2005/06/12 dyad-analysis Fixed a bug with the calculation of relative frequencies, expected frequencies, and probabilities.
2005/06/12 dyad-analysis
  • Changed the counting mode when counting on both strands. Instead of considering the 2 strands as two independent sequences with T=2*(L-k+1) possible positions for D dyads, I consider pairs of reverse-complementary dyads as a single event, and the number of possible positions for any of the pairs (direct or RC dyad) is T=L-k+1.

    Occurrence counts remain unchanged, except for the reverse-palindromic dyads, which appear half as frequent (they are now "seen" on a single strand).

    Probabilities were adapted accordingly. For non-reverse-palindromic dyads, this can slightly change the calculated P-values. For reverse-palindromic dyads, it does not change anything, since the P-values were already calculated this way.
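
The counting mode described above can be sketched as follows. This is a simplified illustration, not the dyad-analysis code: it uses contiguous k-mers rather than spaced dyads, and the function names are hypothetical. Each position of the direct strand contributes one count to the pair formed by the word and its reverse complement, so T=L-k+1 and a reverse-palindromic word is counted once per position.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(word):
    """Return the reverse complement of a DNA word (A, C, G, T only)."""
    return word.translate(_COMPLEMENT)[::-1]


def count_both_strands(sequence, k):
    """Count k-mers, treating each reverse-complementary pair as one event.

    Returns (counts, T), where T = L - k + 1 is the number of possible
    positions. Each pair is keyed by the alphabetically smaller of the
    word and its reverse complement.
    """
    sequence = sequence.upper()
    positions = max(len(sequence) - k + 1, 0)
    counts = Counter()
    for i in range(positions):
        word = sequence[i:i + k]
        counts[min(word, reverse_complement(word))] += 1
    return counts, positions


counts, total_positions = count_both_strands("ACGTTT", 2)
```

In this toy sequence the pair AC/GT is counted twice (one AC and one GT on the direct strand), while the reverse-palindromic word CG is counted only once.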

    2005/06/12 oligo-analysis Fixed a bug with the Markov models (expected frequencies summed to 4 instead of 1)
    2005/05/11 random-seq Markov models are calibrated on non-coding upstream sequences (upstream-noorf) rather than intergenic.
    2005/04/09 retrieve-seq Fixed a bug with the determination of neighbour ORFs. This bug resulted in incorrect clipping with the option "no overlapping ORF". In particular, it was incompatible with downstream sequences (all sizes=0).
    2004/09/18 retrieve-seq
    • Homonyms are now supported (i.e. several genes with the same name)
    • Fixed a bug by handling the case of overlapping upstream genes (e.g. in bacterial operons): in this case, the sequence has a length of 0
    2004/06/09 convert-matrix Implemented a web interface for a new script, convert-matrix (under construction)
    2004/06/03 oligo-analysis Added the possibility to specify an upper and a lower threshold on each output parameter
    2004/05/12 retrieve-seq
    genome-scale dna-pattern
    genome-scale patser
    Fixed a bug (introduced May 7) which caused each sequence to be duplicated with the option "all genes"
    2004/05/08 random-genes tutorial Wrote a tutorial for random-genes
    2004/05/07 retrieve-seq
    random-genes
    gene-info
    Additional feature types are now supported: tRNA, rRNA, scRNA
    2004/05/07 patser The default parameter for lower threshold estimation has been set to weight score, because the adjusted information is often too stringent (no answer)
    2004/04/02 feature-map The backbone (black line) of each sequence now reflects the sequence length. This is useful to show the different sequence lengths when the upstream ORFs have been clipped with retrieve-seq.
    2004/04/02 dna-pattern dna-pattern now exports the start and end positions of each sequence (this information is used to draw the feature-map)
    2004/03/27 feature-map Feature-maps can now be exported in PostScript (ps) for high-quality printing. The form contains an option to select the image format.
    2004/01/13 random-genes Added an option to return multiple groups of random genes
    2003/12/30 oligo-analysis Added an option to return pattern count distributions. This returns one row per pattern, and each row contains a frequency distribution for the pattern occurrences.
    2003/10/20 oligo-analysis Fixed a bug in the calculation of expected frequencies when oligos were counted on both strands with alphabet, Markov and lexicon: the expected frequency did not sum the pairs of reverse complements, which resulted in an under-estimation of expected frequencies, and thus in an over-estimation of the significance.
    2003/08/29 oligo-analysis Added the option 'One row per sequence', which returns a table with one row per gene, one column per pattern (occurrence counts only)
    2003/07/10 retrieve-seq Added the option "server" for the output type. This avoids displaying the sequence, while still allowing it to be sent to other tools
    2003/07/07 Genomes First installation of Rattus norvegicus
    2003/07/07 dyad-analysis For the pattern assembly, changed the allowed flanking bases from 2 to 1, to avoid assembling too many unrelated patterns
    2003/07/06 gene-info The program orf-info has been renamed gene-info. The reason is that mRNAs are now supported. An option allows choosing the feature type between CDS and mRNA.
    2003/06/06 retrieve-seq Added an option "feature-type" to choose between CDSs and mRNAs.
    2003/06/02 patser
    genome-scale-patser
    Added the option to return a score table (one row per sequence, one column per match)
    2003/05/14 tutorials Added several pages to the tutorials.
    • String-based representations
    • Matrix-based representations
    • patser
    • TRANSFAC
    • Generating random sequences
    • Selecting random genes
    2003/05/06 position-analysis Implemented the web interface for position-analysis.
    2003/04/28 parsers The GenBank parser has been improved to take into account more synonyms (by parsing the field 'note='), and to return more details about the organism (taxonomy) and the contigs (date of the GenBank deposition).
    2003/04/28 supported organisms New script which returns information (taxonomy, parsing date, default upstream boundaries, ...) about supported organisms
    2002/11/16 dyad-analysis Added support for different background models: intergenic (as previously), upstream, upstream-noorf
    2002/10/22 pattern-assembly Added a form for pattern-assembly (previously, it was only available through the result of dyad-analysis or oligo-analysis)
    2002/09/16 tutorial on consensus Wrote a tutorial for consensus, Jerry Hertz's motif discovery program.
    2002/09/16 gene-info Added an option to match queries against gene description (previously, queries were only matched against gene names and identifiers).
    2002/09/15 oligo-analysis tutorial Improved the tutorial in several ways
    1. Changed the example to the MET family
    2. Added an explanation about result interpretation
    3. Used the min feature thickness to highlight the Met3p binding site in the feature map.
    2002/09/15 tutorials Wrote a tutorial for genome-scale patser
    2002/09/13 retrieve-seq The default is now to prevent overlap with upstream ORFs. This is essential for bacteria, to avoid taking large fragments of coding sequences, when the gene is in the middle of an operon. Actually, for bacteria, the distance to the upstream ORF is < 50bp for more than half of the genes.
    2002/09/14 oligo-analysis Different background models are now supported: intergenic, all upstream sequences (with or without overlap with upstream ORFs).
    2002/09/12 random-genes A new program that selects random genes in a given genome. The selected genes can be piped to retrieve-seq. This program is important for testing the rate of false positives returned by the different motif discovery programs.
    2002/09/12 oligo-analysis and dyad-analysis The programs now return an E-value, in addition to the P-value and sig index.
    2002/09/12 oligo-analysis For double-strand counts of occurrences, I changed the way the binomial probability is calculated. See the manual for details.