This function uses techniques from the Bioconductor Workflow to preprocess phyloseq data for downstream analysis.

preprocess_phyloseq(phyloseq_object, process_list = NULL, ...)

Arguments

phyloseq_object

A phyloseq object

process_list

Controls how the phyloseq object is processed. It can be one of three values:

NULL

Uses the default processing strategy or the dot parameters (...).

file

The absolute path to a YAML config file.

list

Equivalent to putting dot parameters in a list (list(...)).
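
To illustrate the stated equivalence between the list form and the dot parameters, here is a minimal sketch. The helper `resolve_keywords()` is hypothetical and is not part of the package; it only shows one way that a list and dot parameters could resolve to the same set of keywords. The defaults shown are a subset of the documented ones.

```r
# Hypothetical helper (not part of the package): merges user-supplied
# keywords into a set of defaults.
resolve_keywords <- function(process_list = NULL, ...) {
  # A subset of the documented defaults, for illustration only.
  defaults <- list(master_thresh = 1e-5, coeff_of_variation = 0.55)
  # Take keywords from the list when one is given, otherwise from the dots.
  user <- if (is.list(process_list)) process_list else list(...)
  modifyList(defaults, user)
}

from_dots <- resolve_keywords(master_thresh = 1e-4)
from_list <- resolve_keywords(process_list = list(master_thresh = 1e-4))

# Both call styles resolve to the same keyword set.
identical(from_dots, from_list)
```

Under this assumption, keywords that are not supplied keep their default values in either call style.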

...

The dot parameters can be any combination of the default processing keywords. See the "Keywords for Pre-processing" section below for more details.

Value

Returns a phyloseq object that has undergone the specified processing strategies.

Details

Keywords for Pre-processing

master_thresh

DEFAULT: 1e-5. Removes any taxa that do not meet the mean abundance threshold.
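
As a rough illustration of a mean threshold, the sketch below works on a plain matrix rather than a phyloseq object. It assumes the threshold is compared against each taxon's mean relative abundance, as in the phyloseq preprocessing tutorial; whether this function does exactly that is an assumption. The toy counts and the names `counts`, `rel_abund`, and `keep` are made up for the example.

```r
# Toy counts: rows are taxa, columns are samples (illustrative values only).
counts <- matrix(
  c(500000, 400000,
    499999, 599999,
         1,      1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("taxon_a", "taxon_b", "taxon_c"), c("s1", "s2"))
)

master_thresh <- 1e-5

# Convert each sample (column) to relative abundance.
rel_abund <- sweep(counts, 2, colSums(counts), "/")

# Assumption: a taxon is kept when its mean relative abundance exceeds the threshold.
keep <- rowMeans(rel_abund) > master_thresh
filtered <- counts[keep, , drop = FALSE]

rownames(filtered)
```

In this toy data, `taxon_c` has a mean relative abundance of 1e-6 and is the only taxon dropped.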

taxon_filter

DEFAULT: list("Phylum"= list("min_a"=5, "r_s_p"=0.5), "Class"=list("min_a"=3, "r_s_p"=0.3)). Filters OTUs that do not appear more than a certain number of times (min_a) in a certain percentage of samples (r_s_p) at the specified agglomerated rank.

prevalence_filter

DEFAULT: list("min_a"=5, "r_s_p"=0.5). Filters OTUs that do not appear more than a certain number of times (min_a) in a certain percentage of samples (r_s_p).
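
The filter described above can be sketched on a plain count matrix. This is only an illustration under stated assumptions: an OTU counts as present in a sample when its count is strictly greater than `min_a`, and it is kept when that holds in at least a fraction `r_s_p` of the samples. The exact comparison operators used by the function are not documented here, and the toy data is made up.

```r
# Toy count matrix: rows are OTUs, columns are samples (made-up numbers).
counts <- matrix(
  c(10, 8, 0, 7,
     1, 0, 2, 0,
     6, 0, 0, 9),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("otu1", "otu2", "otu3"), paste0("s", 1:4))
)

min_a <- 5    # minimum count, taken from the documented default
r_s_p <- 0.5  # required fraction of samples, taken from the documented default

# Fraction of samples in which each OTU exceeds min_a counts.
frac_samples <- rowMeans(counts > min_a)

# Assumption: keep OTUs that pass in at least r_s_p of the samples.
keep <- frac_samples >= r_s_p
filtered <- counts[keep, , drop = FALSE]

rownames(filtered)
```

Here `otu2` never exceeds 5 counts in any sample, so it is the only OTU removed.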

glom_rank

DEFAULT: NULL. Agglomerates the data at the specified rank.
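
Agglomeration at a rank amounts to summing the counts of all OTUs that share the same label at that rank. The sketch below shows the idea on a plain matrix with base R's `rowsum()`; phyloseq provides `tax_glom()` for real phyloseq objects, and whether this function calls it internally is an assumption. The toy data and the `phylum` lookup vector are made up.

```r
# Toy counts: rows are OTUs, columns are samples (made-up numbers).
counts <- matrix(
  c(3, 1,
    2, 4,
    5, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("otu1", "otu2", "otu3"), c("s1", "s2"))
)

# Hypothetical Phylum assignment for each OTU.
phylum <- c(otu1 = "Firmicutes", otu2 = "Firmicutes", otu3 = "Bacteroidetes")

# Sum the counts of OTUs that share a Phylum label, per sample.
glommed <- rowsum(counts, group = phylum[rownames(counts)])

glommed
```

The two Firmicutes OTUs collapse into a single row, leaving one row per phylum.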

ambiguous

DEFAULT: list(amb_ranks = c("Phylum", "Class", "Order", "Family", "Genus"), amb_items = c(NA, "", "uncharacterized", "uncultured", "Unassigned", "Ambiguous", "Ambiguous_taxa")). Removes OTUs that are labeled with the specified ambiguous items. This is done for each specified rank.
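
The removal of ambiguous labels can be sketched on a plain taxonomy matrix. This is an illustration only, assuming an OTU is dropped when its entry at any of the listed ranks matches one of the ambiguous items; the toy taxonomy table is made up and uses just two ranks.

```r
# Toy taxonomy table: rows are OTUs, columns are ranks (made-up labels).
tax <- matrix(
  c("Firmicutes",     "Bacilli",
    "Proteobacteria", "uncultured",
    NA,               "Clostridia",
    "Bacteroidetes",  "Bacteroidia"),
  ncol = 2, byrow = TRUE,
  dimnames = list(paste0("otu", 1:4), c("Phylum", "Class"))
)

amb_ranks <- c("Phylum", "Class")
amb_items <- c(NA, "", "uncharacterized", "uncultured",
               "Unassigned", "Ambiguous", "Ambiguous_taxa")

# Flag OTUs with an ambiguous label at any of the checked ranks.
# %in% matches NA against NA, so missing labels are flagged too.
is_ambiguous <- apply(
  tax[, amb_ranks, drop = FALSE], 1,
  function(labels) any(labels %in% amb_items)
)

clean_tax <- tax[!is_ambiguous, , drop = FALSE]

rownames(clean_tax)
```

In this toy table, `otu2` (labeled "uncultured" at Class) and `otu3` (missing Phylum) are removed.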

coeff_of_variation

DEFAULT: 0.55. Standardizes abundances to the median sequencing depth.
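
The keyword name and its description correspond to two steps that the phyloseq preprocessing tutorial performs back to back: scaling each sample to the median sequencing depth, then filtering taxa by their coefficient of variation (standard deviation divided by mean). The sketch below assumes this function follows the same pattern and keeps taxa whose coefficient of variation exceeds the cutoff; both points are assumptions, and the toy counts are made up.

```r
# Toy counts: rows are taxa, columns are samples with depths 100, 200, 400.
counts <- matrix(
  c(50, 100, 200,
    10,  60,  20,
    40,  40, 180),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("taxon_a", "taxon_b", "taxon_c"), c("s1", "s2", "s3"))
)

coeff_of_variation <- 0.55

# Step 1: rescale every sample to the median sequencing depth.
depth <- colSums(counts)
standardized <- round(sweep(counts, 2, depth, "/") * median(depth))

# Step 2: coefficient of variation of each taxon across samples.
cv <- apply(standardized, 1, function(x) sd(x) / mean(x))

# Assumption: taxa are kept when their variation exceeds the cutoff.
keep <- cv > coeff_of_variation

names(keep)[keep]
```

After standardization every sample sums to the median depth of 200, and only `taxon_b` varies enough across samples to pass the cutoff.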

trans_function

DEFAULT: function(x) x / sum(x). Transforms the abundance values to relative abundance values.

merge_samples

DEFAULT: NULL.

Examples

# NOT RUN {
> phy_obj <- get_phyloseq_object(...)
> phy_obj
phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 1955 taxa and 48 samples ]
sample_data() Sample Data:       [ 48 samples by 8 sample variables ]
tax_table()   Taxonomy Table:    [ 1955 taxa by 7 taxonomic ranks ]
phy_tree()    Phylogenetic Tree: [ 1955 tips and 1954 internal nodes ]
> pp_phy_obj <- preprocess_phyloseq(
      phy_obj,
      master_thresh = 1e-5,
      taxon_filter = list("Phylum" = list("min_a" = 5, "r_s_p" = 0.5),
                          "Class" = list("min_a" = 3, "r_s_p" = 0.3)),
      prevalence_filter = list("min_a" = 5, "r_s_p" = 0.5),
      glom_rank = NULL,
      ambiguous = list(amb_ranks = c("Phylum", "Class", "Order", "Family", "Genus"),
                       amb_items = c(NA, "", "uncharacterized", "uncultured",
                                     "Unassigned", "Ambiguous", "Ambiguous_taxa")),
      coeff_of_variation = 0.55,
      trans_function = function(x) {x / sum(x)},
      merge_samp = NULL)
> pp_phy_obj

# }