---
layout: reference
---

## Glossary

{:auto_ids}  
accession
:   a unique identifier assigned to each sequence or set of sequences

categorical variable
:   a variable that takes on a fixed number of values that are names or labels. Variables can be classified as categorical (also called qualitative) or quantitative (also called numerical).

cleaned data  
:   data that has been manipulated post-collection to remove errors or inaccuracies, introduce desired formatting changes, or otherwise prepare the data for analysis

conditional formatting  
:   formatting that is applied to a specific cell or range of cells depending on a set of criteria  

CSV (comma separated values) format  
:   a plain text file format in which values are separated by commas

factor  
:   a variable that takes on a limited number of possible values (i.e. categorical data)

Gb
:   gigabyte of file storage or file size (about one billion bytes); more formally abbreviated GB, since Gb can also denote gigabit

Gbase
:   a gigabase represents one billion nucleic acid bases (Gbp may indicate one billion base pairs of nucleic acid)

headers
:   names at the tops of columns that describe the column contents (sometimes optional)

metadata  
:   data which describes other data  

NGS
:   common acronym for "Next Generation Sequencing", a term currently being replaced by "High Throughput Sequencing"

null value  
:   a value used to record observations missing from a dataset

observation  
:   a single measurement or record of the object being recorded (e.g. the weight of a particular mouse)

plain text
:   unformatted text

quality assurance  
:   any process which checks data for validity during entry  

quality control  
:   any process which removes problematic data from a dataset

raw data  
:   data that has not been manipulated and represents actual recorded values

rich text  
:   formatted text (e.g. text that appears bolded, colored, or italicized)

string  
:   a collection of characters (e.g. "thisisastring")

TSV (tab separated values) format  
:   a plain text file format in which values are separated by tabs

variable  
:   a category of data being collected on the object being recorded (e.g. a mouse's weight)
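
Several of the terms above (CSV, TSV, headers, null value, observation) can be seen together in a short Python sketch. This is only an illustration under stated assumptions: the sample data, the column names `mouse_id` and `weight_g`, and the helper `read_rows` are all hypothetical, and treating an empty field as a null value is one common convention rather than a rule.

```python
import csv
import io

# Hypothetical sample data: the same two observations stored as CSV and as TSV.
# The first line of each holds the headers; the empty field in the last row
# stands in for a null value (an assumed convention, not a requirement).
csv_text = "mouse_id,weight_g\nm01,21.5\nm02,\n"
tsv_text = "mouse_id\tweight_g\nm01\t21.5\nm02\t\n"


def read_rows(text, delimiter):
    """Parse delimited plain text into a list of dicts keyed by the headers."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    # Replace empty strings with None so missing observations are explicit.
    return [
        {key: (value if value != "" else None) for key, value in row.items()}
        for row in reader
    ]


csv_rows = read_rows(csv_text, ",")
tsv_rows = read_rows(tsv_text, "\t")
```

Both calls should yield the same rows, since the two formats differ only in the character that separates values. Note that every value is read as a string, so `weight_g` would still need converting to a number before analysis.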