## Command line for QIIME2

To import the fastq files, the recommended format is Casava 1.8 paired-end demultiplexed fastq.
In Casava 1.8 demultiplexed (paired-end) format, there are two fastq.gz files for each sample in the study, containing the forward and the reverse reads for that sample, respectively. The file name includes the sample identifier: the forward and reverse read files for one sample look like Solveig-1_S2_L001_R1_001.fastq.gz and Solveig-1_S2_L001_R2_001.fastq.gz.
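A minimal sketch for checking that the unzipped files follow this naming pattern before import. `parse_casava_name` is a hypothetical helper (not part of QIIME 2); it assumes the pattern `<sample-id>_S<n>_L<3-digit lane>_R<1|2>_001.fastq.gz` seen in the example names and prints the sample ID and read direction, or fails for names that do not match.

```bash
# Hypothetical helper: parse a Casava 1.8 file name such as
#   Solveig-1_S2_L001_R1_001.fastq.gz
# and print "<sample-id> <R1|R2>". Returns 1 if the name does not match.
parse_casava_name() {
  local name re
  name=$(basename "$1")
  re='^(.+)_S[0-9]+_L[0-9]{3}_(R[12])_001\.fastq\.gz$'
  if [[ $name =~ $re ]]; then
    printf '%s %s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
  else
    return 1
  fi
}
```

For example, `for f in some_folder/*.fastq.gz; do parse_casava_name "$f" || echo "unexpected name: $f"; done` lists any file that would not be recognised (the folder name is a placeholder).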

The fastq files are available from DRYAD: https://datadryad.org/handle/10255/dryad.150698

Unzip COPD_BERGEN_Fastq175_part1 and COPD_BERGEN_Fastq175_part2, and move all subfolders into a new folder.

QIIME2 tutorial for importing data: https://docs.qiime2.org/2019.1/tutorials/importing/
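The import command itself is not listed in this workflow. A sketch based on the Casava 1.8 paired-end example in the linked tutorial, wrapped in a hypothetical `import_run` function. It assumes the files of each sequencing run have been placed in their own folder, here given the hypothetical name `fastq_<run>`, and it writes the `COPDQ2_Illuminafiles<run>.qza` artifacts that the DADA2 commands read. Nothing runs until the function is called.

```bash
# Hypothetical wrapper: import one run's Casava 1.8 paired-end folder.
# Assumption: the run's fastq.gz files are in a folder named fastq_<run>.
import_run() {
  local run=$1
  qiime tools import \
    --type 'SampleData[PairedEndSequencesWithQuality]' \
    --input-path "fastq_${run}" \
    --input-format CasavaOneEightSingleLanePerSampleDirFmt \
    --output-path "COPDQ2_Illuminafiles${run}.qza"
}
```

Usage: `for run in R2 R7 R8 R9; do import_run "$run"; done`.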

Metadata file giving the sample ID, sample type and sequencing run, used to import the samples and to batch them by run for DADA2:

COPD_BERGEN_stvsex_Metadata.txt
(Sample type: Induced only = 108 samples. Variable used to batch samples in DADA2: run)
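Before denoising, it can help to check how the 108 samples split across runs. A sketch using awk, assuming the metadata file is tab-separated with the run stored in a column whose header is `run` (the exact header name is an assumption). `count_by_column` is a hypothetical helper; it skips optional `#q2:` type lines.

```bash
# Hypothetical helper: count samples per value of one metadata column.
# Assumes a tab-separated file whose first line is the header.
count_by_column() {
  local file=$1 column=$2
  awk -F '\t' -v col="$column" '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == col) c = i; next }
    /^#q2:/ || !c { next }      # skip type directives, or all rows if the column is missing
    { n[$c]++ }
    END { for (k in n) print k "\t" n[k] }
  ' "$file" | sort
}
```

Usage: `count_by_column COPD_BERGEN_stvsex_Metadata.txt run`; the per-run counts should add up to 108.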


DADA2 denoising per sequencing run, with chimera removal step 1 (--p-chimera-method pooled):

qiime dada2 denoise-paired --i-demultiplexed-seqs COPDQ2_IlluminafilesR7.qza --p-trunc-len-f 300 --p-trunc-len-r 225 --p-trim-left-f 17 --p-trim-left-r 21 --p-chimera-method pooled --output-dir COPD_DenoisepairedR7_pooled --p-n-threads 7 --verbose

qiime dada2 denoise-paired --i-demultiplexed-seqs COPDQ2_IlluminafilesR8.qza --p-trunc-len-f 288 --p-trunc-len-r 222 --p-trim-left-f 17 --p-trim-left-r 21 --p-chimera-method pooled --output-dir COPD_DenoisepairedR8_pooled --p-n-threads 7 --verbose

qiime dada2 denoise-paired --i-demultiplexed-seqs COPDQ2_IlluminafilesR9.qza --p-trunc-len-f 288 --p-trunc-len-r 222 --p-trim-left-f 17 --p-trim-left-r 21 --p-chimera-method pooled --output-dir COPD_DenoisepairedR9_pooled --p-n-threads 7 --verbose

qiime dada2 denoise-paired --i-demultiplexed-seqs COPDQ2_IlluminafilesR2.qza --p-trunc-len-f 288 --p-trunc-len-r 222 --p-trim-left-f 17 --p-trim-left-r 21 --p-chimera-method pooled --output-dir COPD_DenoisepairedR2_pooled --p-n-threads 7 --verbose

Merging the feature tables:
qiime feature-table merge --i-tables COPD_DenoisepairedR2_pooled/table.qza --i-tables COPD_DenoisepairedR7_pooled/table.qza --i-tables COPD_DenoisepairedR8_pooled/table.qza --i-tables COPD_DenoisepairedR9_pooled/table.qza --o-merged-table COPDpooled_ASVtable108.qza
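The four DADA2 commands differ only in the input run and the truncation lengths (300/225 for R7, 288/222 for R8, R9 and R2). A sketch that keeps the shared settings in one place; `denoise_run` and `denoise_all` are hypothetical wrappers around the same qiime calls, and nothing runs until `denoise_all` is called.

```bash
# Hypothetical wrapper: denoise one run; only the truncation lengths vary.
denoise_run() {
  local run=$1 trunc_f=$2 trunc_r=$3
  qiime dada2 denoise-paired \
    --i-demultiplexed-seqs "COPDQ2_Illuminafiles${run}.qza" \
    --p-trunc-len-f "$trunc_f" --p-trunc-len-r "$trunc_r" \
    --p-trim-left-f 17 --p-trim-left-r 21 \
    --p-chimera-method pooled \
    --p-n-threads 7 \
    --output-dir "COPD_Denoisepaired${run}_pooled" \
    --verbose
}

# Denoise all four runs, then merge their feature tables.
denoise_all() {
  denoise_run R7 300 225 || return
  local run tables=()
  for run in R8 R9 R2; do
    denoise_run "$run" 288 222 || return
  done
  for run in R2 R7 R8 R9; do
    tables+=(--i-tables "COPD_Denoisepaired${run}_pooled/table.qza")
  done
  qiime feature-table merge "${tables[@]}" --o-merged-table COPDpooled_ASVtable108.qza
}
```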


Merging representative sequences:
qiime feature-table merge-seqs (output: COPDpooled_RepSeq108.qza)
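A sketch of the merge-seqs call, assuming each DADA2 `--output-dir` holds the representative sequences under the default name `representative_sequences.qza`. `merge_rep_seqs` is a hypothetical wrapper.

```bash
# Hypothetical wrapper: merge the representative sequences of the four runs.
merge_rep_seqs() {
  local run data=()
  for run in R2 R7 R8 R9; do
    data+=(--i-data "COPD_Denoisepaired${run}_pooled/representative_sequences.qza")
  done
  qiime feature-table merge-seqs "${data[@]}" --o-merged-data COPDpooled_RepSeq108.qza
}
```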


Chimera removal step 2:

qiime vsearch uchime-denovo --i-table COPDpooled_ASVtable108.qza --i-sequences COPDpooled_RepSeq108.qza --output-dir COPDpooled_vsearch108

Keep only the non-chimeric ASVs in the table and in the representative sequences:

qiime feature-table filter-features (output: COPDpooled_NochimeraASV108.qza)

qiime feature-table filter-seqs (output: COPDpooled_NochimeraRepSeq108.qza)
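A sketch of the two filtering calls, following the pattern of the QIIME 2 vsearch chimera-filtering tutorial: the `nonchimeras.qza` artifact that `uchime-denovo` writes into `COPDpooled_vsearch108` is used as the list of features to keep. `remove_chimeras` is a hypothetical wrapper; the input names are taken from the earlier steps.

```bash
# Hypothetical wrapper: keep only features listed in nonchimeras.qza.
remove_chimeras() {
  local keep=COPDpooled_vsearch108/nonchimeras.qza
  qiime feature-table filter-features \
    --i-table COPDpooled_ASVtable108.qza \
    --m-metadata-file "$keep" \
    --o-filtered-table COPDpooled_NochimeraASV108.qza || return
  qiime feature-table filter-seqs \
    --i-data COPDpooled_RepSeq108.qza \
    --m-metadata-file "$keep" \
    --o-filtered-data COPDpooled_NochimeraRepSeq108.qza
}
```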

Filter out 4 samples because of low quality and low sequence counts: remove the sample pairs Solveig61+62 and Solveig151+152 (this does not reduce the number of sample pairs in the final selection). Output: COPDpooled_ASV104Nochimera.qza
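A sketch of the sample removal, assuming the four sample IDs are written `Solveig-61`, `Solveig-62`, `Solveig-151` and `Solveig-152` (following the `Solveig-1` pattern of the file names; check them against the metadata file). The ID file name `samples_to_remove.tsv` and the function name are hypothetical.

```bash
# Hypothetical wrapper: write an ID-only metadata file and exclude those samples.
# Assumption: the IDs follow the Solveig-<n> pattern of the fastq file names.
remove_low_quality_samples() {
  local ids=samples_to_remove.tsv
  printf 'sample-id\n' > "$ids"
  printf '%s\n' Solveig-61 Solveig-62 Solveig-151 Solveig-152 >> "$ids"
  qiime feature-table filter-samples \
    --i-table COPDpooled_NochimeraASV108.qza \
    --m-metadata-file "$ids" \
    --p-exclude-ids \
    --o-filtered-table COPDpooled_ASV104Nochimera.qza
}
```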


For taxonomic classification: 
SILVA release 128 https://www.arb-silva.de/documentation/release-128/
Files:
SILVA_128_QIIME_release/rep_set/rep_set_16S_only/99/99_otus_16S.fasta
SILVA_128_QIIME_release/taxonomy/16S_only/99/consensus_taxonomy_7_levels.txt

Import the reference sequences and taxonomy:

qiime tools import --type 'FeatureData[Sequence]' --input-path 99_otus_16S.fasta --output-path Silva_99_otus_16S.qza

qiime tools import --type 'FeatureData[Taxonomy]' --input-format HeaderlessTSVTaxonomyFormat --input-path consensus_taxonomy_7_levels.txt --output-path Silva_ref-taxonomy.qza

Extract the region amplified by the PCR primers:
qiime feature-classifier extract-reads --i-sequences Silva_99_otus_16S.qza --p-f-primer CCTACGGGNGGCWGCAG --p-r-primer GACTACHVGGGTATCTAATCC --o-reads Silva_99_ref-seqs.qza

Train the naive Bayes classifier:
qiime feature-classifier fit-classifier-naive-bayes --i-reference-reads Silva_99_ref-seqs.qza --i-reference-taxonomy Silva_ref-taxonomy.qza --o-classifier Silva_99_classifier.qza

Classify the ASVs:
qiime feature-classifier classify-sklearn (output: TaxonomyPooled.qza)
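A sketch of the classification call. Which representative sequences were classified is not stated; the non-chimeric set `COPDpooled_NochimeraRepSeq108.qza` is assumed as the default and can be overridden with the first argument. `classify_asvs` is a hypothetical wrapper.

```bash
# Hypothetical wrapper: classify ASVs with the trained SILVA 128 classifier.
# Assumption: the non-chimeric representative sequences are classified.
classify_asvs() {
  local reads=${1:-COPDpooled_NochimeraRepSeq108.qza}
  qiime feature-classifier classify-sklearn \
    --i-classifier Silva_99_classifier.qza \
    --i-reads "$reads" \
    --o-classification TaxonomyPooled.qza
}
```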


Decontam was run in R (frequency method, since there were no negative controls).
Export from QIIME 2 for import into R:

qiime tools export COPDpooled_ASV104Nochimera.qza --output-dir Exported_R
output=feature-table.biom
qiime tools export TaxonomyPooled.qza --output-dir Exported_R
output=taxonomy.tsv
Make a copy of taxonomy.tsv named Pooledbiom-taxonomy.tsv and modify it as explained here: https://forum.qiime2.org/t/is-there-any-way-to-summarize-taxa-plot-by-category/446/2?u=jairideout
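A sketch of the copy-and-modify step, assuming the modification described in the linked forum post is to replace the exported header line (tab-separated `Feature ID`, `Taxon`, `Confidence`) with `#OTUID`, `taxonomy`, `confidence` so that `biom add-metadata` can read it. Run it inside `Exported_R`; `make_biom_taxonomy` is a hypothetical helper.

```bash
# Hypothetical helper: copy taxonomy.tsv with a biom-compatible header.
# Assumption: only the header line needs to change.
make_biom_taxonomy() {
  local src=${1:-taxonomy.tsv} dst=${2:-Pooledbiom-taxonomy.tsv}
  {
    printf '#OTUID\ttaxonomy\tconfidence\n'
    tail -n +2 "$src"          # keep every line after the original header
  } > "$dst"
}
```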
biom add-metadata -i feature-table.biom -o COPDpooled_table_wtaxonomy.biom --observation-metadata-fp Pooledbiom-taxonomy.tsv --sc-separated taxonomy

biom add-metadata -i COPDpooled_table_wtaxonomy.biom -o COPDpooled_ASVwTaxandMet.biom --sample-metadata-fp COPD_BERGEN_stvsex_Metadata.txt

Import into R and run Decontam with threshold=0.2 on the PicoGreen measurements (metadata file: decontambatch_1q_2p)=2.

Remove the contaminant ASVs identified by Decontam: qiime feature-table filter-features (output: PooledASV104_nochimnocont.qza)
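A sketch of the contaminant filter, assuming the contaminant ASV IDs reported by Decontam were saved from R as a feature-ID file (a `feature-id` header line followed by one ID per line). That file is not named in this workflow, so it is passed as an argument; the input table is assumed to be `COPDpooled_ASV104Nochimera.qza`. `remove_contaminants` is a hypothetical wrapper.

```bash
# Hypothetical wrapper: exclude the ASVs listed in a Decontam ID file.
remove_contaminants() {
  local ids=${1:?path to a feature-ID file listing the Decontam contaminants}
  qiime feature-table filter-features \
    --i-table COPDpooled_ASV104Nochimera.qza \
    --m-metadata-file "$ids" \
    --p-exclude-ids \
    --o-filtered-table PooledASV104_nochimnocont.qza
}
```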


Filter out ASVs lacking taxonomic assignment (Unassigned, and D_0__Bacteria with no lower rank):
Use Pooledbiom-taxonomy.tsv to obtain the IDs of these ASVs, store them in Pooled_unidentifiedASVid.txt, and exclude them:

qiime feature-table filter-features --i-table PooledASV104_nochimnocont.qza --m-metadata-file Pooled_unidentifiedASVid.txt --p-exclude-ids --o-filtered-table PooledASV104_clean.qza
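One way to build `Pooled_unidentifiedASVid.txt` from `Pooledbiom-taxonomy.tsv`, assuming the taxonomy string is in the second tab-separated column and that unidentified ASVs carry exactly `Unassigned` or `D_0__Bacteria` (trailing semicolons or spaces are ignored). `list_unidentified_asvs` is a hypothetical helper; the `feature-id` header lets QIIME 2 read the output as metadata.

```bash
# Hypothetical helper: list ASV IDs classified only as Unassigned or D_0__Bacteria.
list_unidentified_asvs() {
  local src=${1:-Pooledbiom-taxonomy.tsv} dst=${2:-Pooled_unidentifiedASVid.txt}
  awk -F '\t' '
    BEGIN { print "feature-id" }                   # QIIME 2 metadata ID header
    NR == 1 { next }                               # skip the taxonomy header
    { tax = $2; sub(/[; ]+$/, "", tax) }           # drop trailing separators
    tax == "Unassigned" || tax == "D_0__Bacteria" { print $1 }
  ' "$src" > "$dst"
}
```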


Build the phylogenetic tree with the following commands:
qiime alignment mafft --i-sequences COPDpooled_RepSeq108.qza
qiime alignment mask
qiime phylogeny fasttree
qiime phylogeny midpoint-root

Final output: COPDpooled_RootedTree.qza
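The four tree-building commands chained into one sketch. The intermediate artifact names (`aligned_repseqs.qza`, `masked_aligned_repseqs.qza`, `unrooted_tree.qza`) and the function name are hypothetical; the input and the final output are the ones named above.

```bash
# Hypothetical wrapper: align, mask, build and midpoint-root the tree.
build_rooted_tree() {
  local seqs=${1:-COPDpooled_RepSeq108.qza}
  qiime alignment mafft --i-sequences "$seqs" --o-alignment aligned_repseqs.qza || return
  qiime alignment mask --i-alignment aligned_repseqs.qza --o-masked-alignment masked_aligned_repseqs.qza || return
  qiime phylogeny fasttree --i-alignment masked_aligned_repseqs.qza --o-tree unrooted_tree.qza || return
  qiime phylogeny midpoint-root --i-tree unrooted_tree.qza --o-rooted-tree COPDpooled_RootedTree.qza
}
```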


Select the samples for the study of sputum microbiota at stable state and during exacerbations in a cohort of COPD patients:

Filter out small and rare ASVs (keep ASVs with a total frequency of at least 10 that occur in at least 5 samples):
qiime feature-table filter-features --i-table PooledASV104_clean.qza --p-min-frequency 10 --p-min-samples 5 --o-filtered-table PooledASV104_filter5_10.qza
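To make the two thresholds concrete, a toy awk re-implementation on a plain tab-separated count table (header line first, feature ID in the first column, one sample per remaining column). It only illustrates the rule and is not how QIIME 2 processes `.qza` tables: a feature is kept when its total count is at least the minimum frequency and it is non-zero in at least the minimum number of samples. `filter_rare_features` is a hypothetical helper.

```bash
# Hypothetical toy filter mirroring --p-min-frequency / --p-min-samples.
filter_rare_features() {
  local min_freq=$1 min_samples=$2 table=$3
  awk -F '\t' -v mf="$min_freq" -v ms="$min_samples" '
    NR == 1 { print; next }                        # keep the header
    {
      total = 0; present = 0
      for (i = 2; i <= NF; i++) { total += $i; if ($i > 0) present++ }
      if (total >= mf && present >= ms) print
    }
  ' "$table"
}
```

With thresholds 10 and 5, a feature with 2 reads in each of 5 samples is kept, while a feature with 20 reads in a single sample is dropped.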

qiime feature-table filter-samples --i-table PooledASV104_filter5_10.qza --m-metadata-file COPD_BERGEN_stvsex_Metadata.txt  --p-where "PickPair='1'" --o-filtered-table COPD_BERGEN_stvsex_ASV.qza


For the case study of 13 sputum samples from one individual:

qiime feature-table filter-samples --i-table PooledASV104_filter5_10.qza --m-metadata-file COPD_BERGEN_stvsex_Metadata.txt  --p-where "PickCase='1'" --o-filtered-table COPD_BERGEN_Case_ASV.qza
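To preview which samples a `--p-where` filter such as `PickPair='1'` or `PickCase='1'` will keep, a sketch that lists the matching IDs straight from the tab-separated metadata file. It only handles a single `Column='value'` equality, whereas QIIME 2 evaluates `--p-where` as an SQLite WHERE clause; `ids_where` is a hypothetical helper.

```bash
# Hypothetical helper: print sample IDs whose <column> equals <value>.
ids_where() {
  local file=$1 column=$2 value=$3
  awk -F '\t' -v col="$column" -v val="$value" '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == col) c = i; next }
    /^#q2:/ || !c { next }      # skip type directives, or all rows if the column is missing
    $c == val { print $1 }
  ' "$file"
}
```

Usage: `ids_where COPD_BERGEN_stvsex_Metadata.txt PickCase 1 | wc -l` should report the 13 case-study samples.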

The datasets are now ready for analyses.