Published June 30, 2024 | Version TAIR_Data_20240630
Dataset Open

TAIR functional annotation data

  • 1. Phoenix Bioinformatics

Description

Quarterly release of curated gene function data for Arabidopsis thaliana from The Arabidopsis Information Resource (www.arabidopsis.org).

The compressed archive contains the following files, which are described in detail in the included README file.


1. ATH_GO_GOSLIM.txt.gz
This tab-delimited file contains GO annotations for Arabidopsis genes annotated by TAIR and TIGR with terms from the Gene Ontology Consortium controlled vocabularies (see www.geneontology.org). It includes an updated set of literature-based annotations and more than 40,000 electronic annotations based on matches to INTERPRO domains supplied by Nicola Mulder from SWISS PROT/INTERPRO.

Please cite this paper when using TAIR's GO annotations in your research:  Berardini, TZ, Mundodi, S, Reiser, L, Huala, E, Garcia-Hernandez, M, Zhang, P, Mueller, LM, Yoon, J, Doyle, A, Lander, G, Moseyko, N, Yoo, D, Xu, I, Zoeckler, B, Montoya, M, Miller, N, Weems, D, and Rhee, SY (2004) Functional annotation of the Arabidopsis genome using controlled vocabularies. Plant Physiol. 135(2):1-11.  
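
A tab-delimited, gzip-compressed file such as ATH_GO_GOSLIM.txt.gz can be read without unpacking it first. The following is a minimal sketch, not an official TAIR tool: the function name read_tsv_gz is hypothetical, and skipping lines that begin with "!" is an assumption borrowed from the common GO annotation file convention. The column layout is documented in the README and is not assumed here.

```python
import csv
import gzip


def read_tsv_gz(path, comment_prefix="!"):
    """Yield each data row of a gzipped tab-delimited file as a list of strings.

    Assumption: header/comment lines start with `comment_prefix`.
    """
    with gzip.open(path, "rt", encoding="utf-8", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            # Skip blank lines and comment/header lines.
            if not row or row[0].startswith(comment_prefix):
                continue
            yield row
```

Because the function is a generator, large files are streamed row by row instead of being loaded into memory at once.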


2. gene_aliases_yyyymmdd.txt(.gz)
This file lists alternative names for each gene.


3. Locus_Germplasm_Phenotype_yyyymmdd.txt.gz
This file contains links between loci, germplasms, and phenotypes.

4. Locus_Published_yyyymmdd.txt.gz
This file contains links between loci and publications.

5. po_temporal_gene_arabidopsis_tair.assoc.gz
6. po_anatomy_gene_arabidopsis_tair.assoc.gz
These two tab-delimited files each contain the set of literature-based annotations of Arabidopsis genes and loci, annotated at TAIR, to terms from the Plant Ontology developed by the Plant Ontology Consortium (POC, www.plantontology.org).


7. TAIR10 or ARAPORT11_functional_descriptions_yyyymmdd.txt(.gz)
This file contains functional descriptions for gene models included in either the TAIR10 genome release or, as of 20170630, the Araport11 genome release. TAIR10/Araport11 refers to the version of the genome annotation.
 

8. Araport11_GFF3_genes_transposons.MMMYYYY.gff.gz
This tab-delimited file is in GFF format and contains annotations from the Araport11 genome release. Annotations in this file include information curated from recent scientific literature.
Note:  This file is available starting with the 20211231 Data Release.

Column header: explanation
1. Name of the chromosome
2. Source: Name of the data source that generated this feature (Araport11)
3. Annotation type: e.g., gene, mRNA, etc.
4. Start position of annotation.
5. Stop position of annotation. 
6. Score - A floating point value.
7. Strand information. Defined as + (forward) or - (reverse).
8. Frame - One of '0', '1' or '2'. '0' indicates that the first base of the feature is the first base of a codon, '1' that the second base is the first base of a codon, and so on.
9. Detailed annotation information with a semicolon-separated list of tag-value pairs, providing additional information about each feature, including curator summary, computational description, etc.
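
The nine columns described above can be split into a record with a few lines of Python. This is a sketch under stated assumptions rather than a reference parser: it assumes the Araport11 file follows the usual GFF3 conventions (tag=value pairs in column 9, "." for a missing score, percent-encoded special characters). The function name, the dictionary keys, and the sample line are hypothetical and not taken from the release.

```python
from urllib.parse import unquote

# Keys mirror the column descriptions in the list above (names are illustrative).
GFF_COLUMNS = ("chromosome", "source", "type", "start", "end",
               "score", "strand", "frame", "attributes")


def parse_gff3_line(line):
    """Split one GFF3 feature line into a dict keyed by column name."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    record = dict(zip(GFF_COLUMNS, fields))
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    # Assumption: "." marks a missing score, as in standard GFF3.
    record["score"] = None if record["score"] == "." else float(record["score"])
    attributes = {}
    for pair in record["attributes"].split(";"):
        if not pair.strip():
            continue
        key, _, value = pair.partition("=")
        # Values may be percent-encoded (e.g. %20 for a space).
        attributes[key.strip()] = unquote(value.strip())
    record["attributes"] = attributes
    return record


# Made-up feature line for illustration only.
sample = "Chr1\tAraport11\tgene\t1000\t2000\t.\t+\t.\tID=example_gene;Note=an%20example"
record = parse_gff3_line(sample)
```

Lines beginning with "#" are directives or comments in GFF3 and should be skipped before calling the function.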


9. Araport11_GTF_genes_transposons.MMMYYYY.gtf.gz
This tab-delimited file is in GTF format and contains annotations from the Araport11 genome release. Annotations in this file include information curated from recent scientific literature.
Note:  This file is available starting with the 20211231 Data Release.

Column header: explanation
1. Name of the chromosome
2. Source: Name of the data source that generated this feature (Araport11)
3. Annotation type: e.g., gene, mRNA, etc.
4. Start position of annotation.
5. Stop position of annotation. 
6. Score - A floating point value.
7. Strand information. Defined as + (forward) or - (reverse).
8. Frame - One of '0', '1' or '2'. '0' indicates that the first base of the feature is the first base of a codon, '1' that the second base is the first base of a codon, and so on.
9. Detailed annotation information with a semicolon-separated list of tag-value pairs, providing additional information about each feature, including transcript_id, gene_id, Note, etc.
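
Column 9 of a GTF file is usually written differently from GFF3: each pair is a tag followed by a double-quoted value, terminated by a semicolon. The sketch below assumes the Araport11 GTF follows that common convention, which should be checked against the actual file. The function name and the sample string are hypothetical.

```python
import re

# One tag, whitespace, a double-quoted value, then an optional semicolon.
_GTF_ATTRIBUTE = re.compile(r'\s*(\S+)\s+"([^"]*)"\s*;?')


def parse_gtf_attributes(text):
    """Return a dict mapping each tag to the list of values found for it.

    Values are kept in lists because a tag may appear more than once.
    """
    attributes = {}
    for key, value in _GTF_ATTRIBUTE.findall(text):
        attributes.setdefault(key, []).append(value)
    return attributes


# Made-up attribute string for illustration only.
sample_attributes = 'transcript_id "example.1"; gene_id "example"; Note "text; with a semicolon";'
parsed = parse_gtf_attributes(sample_attributes)
```

Matching on the quotes rather than splitting on ";" keeps semicolons inside a quoted Note value intact.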

Files

Files (58.6 MB)

Checksum  Size
md5:b7e7cf51184de80006eb97a40e1fc1e6  2.5 MB
md5:459b5436d325184c6e3fd01f70754e84  16.5 MB
md5:5b27f0dbe1d74e80f8857c3c9d4f06ee  6.6 MB
md5:4d89618f81170f22a80a7c0fb021476d  7.4 MB
md5:8d025490884baf4c5c74ca90e98cbf1f  371.8 kB
md5:9ea2dda2904e119855ea6f2863df2b62  755.7 kB
md5:fffb4c94a31529e871069e6f417f1ea3  2.2 MB
md5:ecea19c4cef159d6a665c8cf006d04a0  14.7 MB
md5:ddd0e9ae2299f662e671d7f300c41ced  7.5 MB
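
The md5 values listed above can be used to confirm that a download completed without corruption. A minimal sketch using Python's standard hashlib module follows. The function names are hypothetical, and which checksum belongs to which file must be taken from the record itself.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file against a checksum given as 'md5:<hex>' or plain hex."""
    expected_hex = expected.strip().lower()
    if expected_hex.startswith("md5:"):
        expected_hex = expected_hex[len("md5:"):]
    return md5_of_file(path) == expected_hex
```

For example, matches_checksum(path, "md5:b7e7cf51184de80006eb97a40e1fc1e6") returns True only if the file at path is byte-for-byte identical to the published one with that checksum.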

Additional details

Dates

Issued
2024-06-30
Data collected as of

References

  • Berardini, TZ, Mundodi, S, Reiser, L, Huala, E, Garcia-Hernandez, M, Zhang, P, Mueller, LM, Yoon, J, Doyle, A, Lander, G, Moseyko, N, Yoo, D, Xu, I, Zoeckler, B, Montoya, M, Miller, N, Weems, D, and Rhee, SY (2004) Functional annotation of the Arabidopsis genome using controlled vocabularies. Plant Physiol. 135(2):1-11. DOI:10.1104/pp.104.040071
  • Huala E, Dickerman AW, Garcia-Hernandez M, Weems D, Reiser L, LaFond F, Hanley D, Kiphart D, Zhuang M, Huang W, Mueller LA, Bhattacharyya D, Bhaya D, Sobral BW, Beavis W, Meinke DW, Town CD, Somerville C, Rhee SY. The Arabidopsis Information Resource (TAIR): a comprehensive database and web-based information retrieval, analysis, and visualization system for a model plant. Nucleic Acids Res. 2001 Jan 1;29(1):102-5. PubMed PMID: 11125061; PubMed Central PMCID: PMC29827.