Simply point to the final upload directory output by bcbio, and this function will take care of the rest. It automatically imports RNA-seq counts, metadata, and program versions used.

loadRNASeq(uploadDir, interestingGroups = "sampleName",
  sampleMetadataFile = NULL, maxSamples = 50, annotable,
  ensemblVersion = NULL, ...)

Arguments

uploadDir

Path to final upload directory. This path is set when running bcbio_nextgen -w template.

interestingGroups

Character vector of interesting groups. First entry is used for plot colors during quality control (QC) analysis. Entire vector is used for PCA and heatmap QC functions.

sampleMetadataFile

Optional. Custom metadata file containing sample information. Otherwise defaults to sample metadata saved in the YAML file.

maxSamples

Maximum number of samples to calculate DESeq2::rlog() and DESeq2::varianceStabilizingTransformation() matrix. See Details.

annotable

Optional. User-defined gene annotations (a.k.a. "annotable"), which will be slotted into rowData(). Typically this should be left undefined. By default, the function will automatically generate an annotable from the annotations available on Ensembl. If set NULL, then rowData() inside the resulting bcbioRNASeq object will be left empty. This is recommended for projects dealing with genes or transcripts that are poorly annotated.

ensemblVersion

Optional. Ensembl release version. If NULL, defaults to current release, and does not typically need to be user-defined. This parameter can be useful for matching Ensembl annotations against an outdated bcbio annotation build.

...

Additional arguments, slotted into the metadata() accessor.

Value

bcbioRNASeq.

Details

When number of samples is bigger than maxSamples, rlog and vst slot in SummarizedExperiment::SummarizedExperiment will be the output of edgeR normalization method.

Note

When working in RStudio, we recommend connecting to the bcbio-nextgen run directory as a remote connection over sshfs.

Examples

uploadDir <- system.file("extdata/bcbio", package = "bcbioRNASeq") bcb <- loadRNASeq(uploadDir, interestingGroups = "group")
#> 2017-05-23_rnaseq
#> 4 samples detected
#> Reading project-summary.yaml
#> Genome: Mus musculus (mm10)
#> Loading Ensembl annotations from AnnotationHub #> 2017-10-27
#> EnsDB AH57770: Mus musculus Ensembl 90
#> Parsed with column specification: #> cols( #> enstxp = col_character(), #> ensgene = col_character() #> )
#> Reading bcbio run information
#> Parsed with column specification: #> cols( #> genome = col_character(), #> resource = col_character(), #> version = col_datetime(format = "") #> )
#> Warning: bcbio-nextgen.log missing
#> Warning: bcbio-nextgen-commands.log missing
#> Reading salmon counts using tximport
#> 1
#> Parsed with column specification: #> cols( #> Name = col_character(), #> Length = col_integer(), #> EffectiveLength = col_double(), #> TPM = col_double(), #> NumReads = col_double() #> )
#> 2
#> Parsed with column specification: #> cols( #> Name = col_character(), #> Length = col_integer(), #> EffectiveLength = col_double(), #> TPM = col_double(), #> NumReads = col_double() #> )
#> 3
#> Parsed with column specification: #> cols( #> Name = col_character(), #> Length = col_integer(), #> EffectiveLength = col_double(), #> TPM = col_double(), #> NumReads = col_double() #> )
#> 4
#> Parsed with column specification: #> cols( #> Name = col_character(), #> Length = col_integer(), #> EffectiveLength = col_double(), #> TPM = col_double(), #> NumReads = col_double() #> )
#>
#> summarizing abundance
#> summarizing counts
#> summarizing length
#> Generating internal DESeqDataSet for quality control
#> using just counts from tximport
#> estimating size factors
#> estimating dispersions
#> gene-wise dispersion estimates
#> mean-dispersion relationship
#> final dispersion estimates
#> fitting model and testing
#> Performing rlog transformation
#> Performing variance stabilizing transformation
#> Reading STAR featureCounts aligned counts
#> Parsed with column specification: #> cols( #> .default = col_integer(), #> id = col_character() #> )
#> See spec(...) for full column specifications.
#> Warning: Unannotated genes detected in counts matrix (0.594%)
# Load without gene annotations bcb <- loadRNASeq(uploadDir, annotable = NULL)
#> 2017-05-23_rnaseq
#> 4 samples detected
#> Reading project-summary.yaml
#> Genome: Mus musculus (mm10)
#> Parsed with column specification: #> cols( #> enstxp = col_character(), #> ensgene = col_character() #> )
#> Reading bcbio run information
#> Parsed with column specification: #> cols( #> genome = col_character(), #> resource = col_character(), #> version = col_datetime(format = "") #> )
#> Warning: bcbio-nextgen.log missing
#> Warning: bcbio-nextgen-commands.log missing
#> Reading salmon counts using tximport
#> 1
#> Parsed with column specification: #> cols( #> Name = col_character(), #> Length = col_integer(), #> EffectiveLength = col_double(), #> TPM = col_double(), #> NumReads = col_double() #> )
#> 2
#> Parsed with column specification: #> cols( #> Name = col_character(), #> Length = col_integer(), #> EffectiveLength = col_double(), #> TPM = col_double(), #> NumReads = col_double() #> )
#> 3
#> Parsed with column specification: #> cols( #> Name = col_character(), #> Length = col_integer(), #> EffectiveLength = col_double(), #> TPM = col_double(), #> NumReads = col_double() #> )
#> 4
#> Parsed with column specification: #> cols( #> Name = col_character(), #> Length = col_integer(), #> EffectiveLength = col_double(), #> TPM = col_double(), #> NumReads = col_double() #> )
#>
#> summarizing abundance
#> summarizing counts
#> summarizing length
#> Generating internal DESeqDataSet for quality control
#> using just counts from tximport
#> estimating size factors
#> estimating dispersions
#> gene-wise dispersion estimates
#> mean-dispersion relationship
#> final dispersion estimates
#> fitting model and testing
#> Performing rlog transformation
#> Performing variance stabilizing transformation
#> Reading STAR featureCounts aligned counts
#> Parsed with column specification: #> cols( #> .default = col_integer(), #> id = col_character() #> )
#> See spec(...) for full column specifications.