seismicrna package

SEISMIC-RNA

Expose the sub-packages demult, align, relate, cluster, and table, plus the __version__ attribute, at the top level so that they can be imported from external modules and scripts:

>>> import seismicrna
>>> seismicrna.__version__
'x.y.z'
>>> from seismicrna import __version__
>>> __version__
'x.y.z'

Subpackages

Submodules

seismicrna.join.join_sections(out_dir: Path, name: str, sample: str, ref: str, sects: Iterable[str], clustered: bool, *, clusts: dict[str, dict[int, dict[int, int]]], force: bool)

Join one or more sections.

Parameters:
  • out_dir (pathlib.Path) – Output directory.

  • name (str) – Name of the joined section.

  • sample (str) – Name of the sample.

  • ref (str) – Name of the reference.

  • sects (Iterable[str]) – Names of the sections being joined.

  • clustered (bool) – Whether the dataset is clustered.

  • clusts (dict[str, dict[int, dict[int, int]]]) – For each section, for each order, the cluster from the original section to use as the cluster in the joined section; ignored if clustered is False.

  • force (bool) – Force the report to be written, even if it exists.

Returns:

Path of the Join report file.

Return type:

pathlib.Path
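The nested type of clusts can be hard to picture, so here is a minimal sketch of one such mapping plus a sanity check. It assumes that the innermost dict maps each cluster number in the joined section to the cluster number to take from the original section, and that clusters of a given order are numbered 1 through that order; neither assumption is confirmed by the text above. The section names and the helper check_clusts are hypothetical.

```python
from typing import Dict, Iterable

# Hypothetical section names; real names come from your own Mask/Cluster runs.
sects = ["sect_a", "sect_b"]

# Assumed meaning: clusts[section][order][joined_cluster] = cluster_in_section.
clusts: Dict[str, Dict[int, Dict[int, int]]] = {
    "sect_a": {1: {1: 1}, 2: {1: 1, 2: 2}},
    "sect_b": {1: {1: 1}, 2: {1: 2, 2: 1}},  # swap the two clusters at order 2
}


def check_clusts(sects: Iterable[str],
                 clusts: Dict[str, Dict[int, Dict[int, int]]]) -> None:
    """Raise ValueError if a section is missing or a mapping is malformed."""
    for sect in sects:
        if sect not in clusts:
            raise ValueError(f"No clusters given for section {sect!r}")
        for order, mapping in clusts[sect].items():
            # Assumption: clusters of a given order are numbered 1..order.
            expected = set(range(1, order + 1))
            if set(mapping) != expected or set(mapping.values()) != expected:
                raise ValueError(
                    f"Section {sect!r}, order {order}: expected a one-to-one "
                    f"mapping over clusters {sorted(expected)}, got {mapping}"
                )


check_clusts(sects, clusts)  # passes silently for the example above
```

A check like this catches a missing section or a duplicated cluster number before the mapping is handed to join_sections().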

seismicrna.join.run(input_path: tuple[str, ...], *, joined: str = '', join_clusts: str | None, max_procs: int = 16, parallel: bool = True, force: bool = False) → list[Path]

Merge sections (horizontally) from the Mask or Cluster step.

Parameters:
  • joined (str) – Joined section name [keyword-only, default: ‘’]

  • join_clusts (str | None) – Join clusters from this CSV file [keyword-only]

  • max_procs (int) – Run up to this many processes simultaneously [keyword-only, default: 16]

  • parallel (bool) – Run tasks in parallel or in series [keyword-only, default: True]

  • force (bool) – Force all tasks to run, overwriting any existing output files [keyword-only, default: False]
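As a rough illustration of how the parameters above fit together, the sketch below forwards them to a run()-like callable. The wrapper join_reports, the stand-in recording_run, the file paths, and the section name are all hypothetical; the stand-in exists only so the snippet runs without the package installed.

```python
from pathlib import Path
from typing import Callable, Iterable, Optional


def join_reports(run: Callable[..., list],
                 reports: Iterable[Path],
                 joined: str,
                 join_clusts: Optional[str] = None,
                 *,
                 max_procs: int = 16,
                 parallel: bool = True,
                 force: bool = False) -> list:
    """Forward the documented arguments to a run()-like callable."""
    # input_path is documented as tuple[str, ...], so convert Path objects.
    input_path = tuple(str(report) for report in reports)
    return run(input_path,
               joined=joined,
               join_clusts=join_clusts,
               max_procs=max_procs,
               parallel=parallel,
               force=force)


def recording_run(input_path, **kwargs):
    """Hypothetical stand-in for seismicrna.join.run; only records its call."""
    recording_run.last_call = (input_path, kwargs)
    return []


# Hypothetical paths and joined section name, for illustration only.
result = join_reports(recording_run,
                      [Path("out/sample1/mask/ref1/sect_a"),
                       Path("out/sample1/mask/ref1/sect_b")],
                      joined="sect_ab",
                      max_procs=4)
```

With the package installed, seismicrna.join.run would presumably take the place of recording_run; the keyword names match the signature documented above.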

SEISMIC-RNA Main Module

This module is the entry point for the command line interface. Running the command

seismic [OPTIONS] command [OPTIONS] [ARGS]

calls the function cli() defined in this module.
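The usage pattern above can be assembled programmatically, for example when launching the tool from a script. This is only a sketch: the helper build_seismic_argv is hypothetical, the command name wf is assumed to match the module name, and refs.fa is a placeholder argument. The snippet builds the argument list without running anything.

```python
import shlex
from typing import List, Sequence


def build_seismic_argv(command: str,
                       global_opts: Sequence[str] = (),
                       command_opts: Sequence[str] = (),
                       args: Sequence[str] = ()) -> List[str]:
    """Arrange arguments as: seismic [OPTIONS] command [OPTIONS] [ARGS]."""
    return ["seismic", *global_opts, command, *command_opts, *args]


# Assumed command name ("wf") and placeholder file name ("refs.fa").
argv = build_seismic_argv("wf",
                          command_opts=["--sep-strands"],
                          args=["refs.fa"])

# Shell-quoted form, suitable for logging or for display to the user.
command_line = shlex.join(argv)
print(command_line)
```

Passing argv as a list to subprocess.run() avoids shell-quoting problems with file names that contain spaces.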

Whole Pipeline Main Module

seismicrna.wf.as_tuple_str(items: Iterable)
seismicrna.wf.run(fasta: str, input_path: tuple[str, ...], *, out_dir: str = './out', tmp_pfx: str = './tmp-', keep_tmp: bool = False, brotli_level: int = 10, force: bool = False, max_procs: int = 16, parallel: bool = True, fastqz: tuple[str, ...] = (), fastqy: tuple[str, ...] = (), fastqx: tuple[str, ...] = (), phred_enc: int = 33, demulti_overwrite: bool = False, demult_on: bool = False, parallel_demultiplexing: bool = False, clipped: int = 0, mismatch_tolerence: int = 0, index_tolerance: int = 0, barcode_start: int = 0, barcode_end: int = 0, dmfastqz: tuple[str, ...] = (), dmfastqy: tuple[str, ...] = (), dmfastqx: tuple[str, ...] = (), fastqc: bool = True, qc_extract: bool = False, cut: bool = True, cut_q1: int = 25, cut_q2: int = 25, cut_g1: tuple[str, ...] = ('GCTCTTCCGATCT',), cut_a1: tuple[str, ...] = ('AGATCGGAAGAGC',), cut_g2: tuple[str, ...] = ('GCTCTTCCGATCT',), cut_a2: tuple[str, ...] = ('AGATCGGAAGAGC',), cut_o: int = 6, cut_e: float = 0.1, cut_indels: bool = True, cut_nextseq: bool = False, cut_discard_trimmed: bool = False, cut_discard_untrimmed: bool = False, cut_m: int = 20, bt2_local: bool = True, bt2_discordant: bool = False, bt2_mixed: bool = False, bt2_dovetail: bool = False, bt2_contain: bool = True, bt2_score_min_e2e: str = 'L,-1,-0.5', bt2_score_min_loc: str = 'L,1,0.5', bt2_i: int = 0, bt2_x: int = 600, bt2_gbar: int = 4, bt2_l: int = 20, bt2_s: str = 'L,1,0.1', bt2_d: int = 4, bt2_r: int = 2, bt2_dpad: int = 2, bt2_orient: str = 'fr', bt2_un: bool = True, min_mapq: int = 25, sep_strands: bool = False, f1r2_plus: bool = False, minus_label: str = '-minus', min_phred: int = 25, min_reads: int = 1000, ambindel: bool = True, overhangs: bool = True, clip_end5: int = 4, clip_end3: int = 6, batch_size: int = 65536, pool: str = '', mask_coords: tuple[tuple[str, int, int], ...] = (), mask_primers: tuple[tuple[str, DNA, DNA], ...] = (), primer_gap: int = 0, mask_sections_file: str | None = None, mask_del: bool = True, mask_ins: bool = True, mask_mut: tuple[str, ...] = (), mask_polya: int = 5, mask_gu: bool = True, mask_pos_file: str | None = None, mask_pos: tuple[tuple[str, int], ...] = (), mask_discontig: bool = True, min_ncov_read: int = 1, min_finfo_read: float = 0.95, max_fmut_read: int = 1.0, min_mut_gap: int = 3, min_ninfo_pos: int = 1000, max_fmut_pos: float = 1.0, quick_unbias: bool = True, quick_unbias_thresh: float = 0.001, max_clusters: int = 0, em_runs: int = 12, min_em_iter: int = 10, max_em_iter: int = 500, em_thresh: float = 0.37, joined: str = '', join_clusts: str = None, table_pos: bool = True, table_read: bool = True, table_clust: bool = True, fold: bool = False, fold_coords: tuple[tuple[str, int, int], ...] = (), fold_primers: tuple[tuple[str, DNA, DNA], ...] = (), fold_sections_file: str | None = None, fold_full: bool = True, quantile: float = 0.0, fold_temp: float = 310.15, fold_constraint: str | None = None, fold_md: int = 0, fold_mfe: bool = False, fold_max: int = 20, fold_percent: float = 20.0, export: bool = False, samples_meta: str = None, refs_meta: str = None, all_pos: bool = True, cgroup: str = 'order', hist_bins: int = 10, hist_margin: float = 0.1, struct_file: tuple[str, ...] = (), window: int = 45, winmin: int = 9, csv: bool = True, html: bool = True, svg: bool = False, pdf: bool = False, png: bool = False, graph_mprof: bool = True, graph_tmprof: bool = True, graph_ncov: bool = True, graph_mhist: bool = True, graph_giniroll: bool = False, graph_roc: bool = True, graph_aucroll: bool = False)

Run the entire workflow.

Parameters:
  • out_dir (str) – Write all output files to this directory [keyword-only, default: ‘./out’]

  • tmp_pfx (str) – Write all temporary files to a directory with this prefix [keyword-only, default: ‘./tmp-’]

  • keep_tmp (bool) – Keep temporary files after finishing [keyword-only, default: False]

  • brotli_level (int) – Compress pickle files with this level of Brotli (0 - 11) [keyword-only, default: 10]

  • force (bool) – Force all tasks to run, overwriting any existing output files [keyword-only, default: False]

  • max_procs (int) – Run up to this many processes simultaneously [keyword-only, default: 16]

  • parallel (bool) – Run tasks in parallel or in series [keyword-only, default: True]

  • fastqz (tuple) – FASTQ file(s) of single-end reads [keyword-only, default: ()]

  • fastqy (tuple) – FASTQ file(s) of paired-end reads with mates 1 and 2 interleaved [keyword-only, default: ()]

  • fastqx (tuple) – FASTQ files of paired-end reads with mates 1 and 2 in separate files [keyword-only, default: ()]

  • phred_enc (int) – Specify the Phred score encoding of FASTQ and SAM/BAM/CRAM files [keyword-only, default: 33]

  • demulti_overwrite (bool) – Whether to overwrite the grepped FASTQ file; use only when changing settings for the same sample [keyword-only, default: False]

  • demult_on (bool) – Enable demultiplexing [keyword-only, default: False]

  • parallel_demultiplexing (bool) – Whether to run demultiplexing at maximum speed by submitting multithreaded grep functions [keyword-only, default: False]

  • clipped (int) – Number of clipped patterns to search for in the sample; higher values increase computation time [keyword-only, default: 0]

  • mismatch_tolerence (int) – Number of mismatches allowed in a string for it to still count as a valid pattern match. Non-parallel computation time grows factorially with this value, so use caution above 2 mismatches. Does not apply to clipped sequences. [keyword-only, default: 0]

  • index_tolerance (int) – Maximum distance from the reference index at which the pattern may be found in a read [keyword-only, default: 0]

  • barcode_start (int) – Index of start of barcode [keyword-only, default: 0]

  • barcode_end (int) – Length of barcode [keyword-only, default: 0]

  • dmfastqz (tuple) – Demultiplexed FASTQ files of single-end reads [keyword-only, default: ()]

  • dmfastqy (tuple) – Demultiplexed FASTQ files of paired-end reads interleaved in one file [keyword-only, default: ()]

  • dmfastqx (tuple) – Demultiplexed FASTQ files of mate 1 and mate 2 reads [keyword-only, default: ()]

  • fastqc (bool) – Run FastQC on the initial and trimmed FASTQ files [keyword-only, default: True]

  • qc_extract (bool) – Unzip FastQC report files [keyword-only, default: False]

  • cut (bool) – Use Cutadapt to trim reads before alignment [keyword-only, default: True]

  • cut_q1 (int) – Trim base calls below this Phred score from read 1 [keyword-only, default: 25]

  • cut_q2 (int) – Trim base calls below this Phred score from read 2 [keyword-only, default: 25]

  • cut_g1 (tuple) – Trim this 5’ adapter from read 1 [keyword-only, default: (‘GCTCTTCCGATCT’,)]

  • cut_a1 (tuple) – Trim this 3’ adapter from read 1 [keyword-only, default: (‘AGATCGGAAGAGC’,)]

  • cut_g2 (tuple) – Trim this 5’ adapter from read 2 [keyword-only, default: (‘GCTCTTCCGATCT’,)]

  • cut_a2 (tuple) – Trim this 3’ adapter from read 2 [keyword-only, default: (‘AGATCGGAAGAGC’,)]

  • cut_o (int) – Require at least this many bases of an adapter to trim it [keyword-only, default: 6]

  • cut_e (float) – Tolerate at most this fraction of errors in adapter sequences [keyword-only, default: 0.1]

  • cut_indels (bool) – Allow errors in adapter sequences to be insertions and deletions [keyword-only, default: True]

  • cut_nextseq (bool) – Trim high-quality Gs from the 3’ end (for Illumina NextSeq and iSeq) [keyword-only, default: False]

  • cut_discard_trimmed (bool) – Discard reads in which an adapter was found [keyword-only, default: False]

  • cut_discard_untrimmed (bool) – Discard reads in which no adapters were found [keyword-only, default: False]

  • cut_m (int) – Discard reads shorter than this length after trimming [keyword-only, default: 20]

  • bt2_local (bool) – Run Bowtie2 in local mode rather than end-to-end mode [keyword-only, default: True]

  • bt2_discordant (bool) – Output paired-end reads whose mates align discordantly [keyword-only, default: False]

  • bt2_mixed (bool) – Attempt to align individual mates of pairs that fail to align [keyword-only, default: False]

  • bt2_dovetail (bool) – Consider dovetailed mate pairs to align concordantly [keyword-only, default: False]

  • bt2_contain (bool) – Consider nested mate pairs to align concordantly [keyword-only, default: True]

  • bt2_score_min_e2e (str) – Discard alignments that score below this threshold in end-to-end mode [keyword-only, default: ‘L,-1,-0.5’]

  • bt2_score_min_loc (str) – Discard alignments that score below this threshold in local mode [keyword-only, default: ‘L,1,0.5’]

  • bt2_i (int) – Discard paired-end alignments shorter than this many bases [keyword-only, default: 0]

  • bt2_x (int) – Discard paired-end alignments longer than this many bases [keyword-only, default: 600]

  • bt2_gbar (int) – Do not place gaps within this many bases from the end of a read [keyword-only, default: 4]

  • bt2_l (int) – Use this seed length for Bowtie2 [keyword-only, default: 20]

  • bt2_s (str) – Seed Bowtie2 alignments at this interval [keyword-only, default: ‘L,1,0.1’]

  • bt2_d (int) – Discard alignments if over this many consecutive seed extensions fail [keyword-only, default: 4]

  • bt2_r (int) – Re-seed reads with repetitive seeds up to this many times [keyword-only, default: 2]

  • bt2_dpad (int) – Pad the alignment matrix with this many bases (to allow gaps) [keyword-only, default: 2]

  • bt2_orient (str) – Require paired mates to have this orientation [keyword-only, default: ‘fr’]

  • bt2_un (bool) – Output unaligned reads to a FASTQ file [keyword-only, default: True]

  • min_mapq (int) – Discard reads with mapping qualities below this threshold [keyword-only, default: 25]

  • sep_strands (bool) – Separate each alignment map into plus- and minus-strand reads [keyword-only, default: False]

  • f1r2_plus (bool) – With --sep-strands, consider forward mate 1s and reverse mate 2s to be plus-stranded [keyword-only, default: False]

  • minus_label (str) – With --sep-strands, append this label to each minus-strand reference [keyword-only, default: ‘-minus’]

  • min_phred (int) – Mark base calls with Phred scores lower than this threshold as ambiguous [keyword-only, default: 25]

  • min_reads (int) – Discard alignment maps with fewer than this many reads [keyword-only, default: 1000]

  • ambindel (bool) – Mark all ambiguous insertions and deletions [keyword-only, default: True]

  • overhangs (bool) – Retain the overhangs of paired-end mates that dovetail [keyword-only, default: True]

  • clip_end5 (int) – Clip this many bases from the 5’ end of each read [keyword-only, default: 4]

  • clip_end3 (int) – Clip this many bases from the 3’ end of each read [keyword-only, default: 6]

  • batch_size (int) – Limit batches to at most this many reads [keyword-only, default: 65536]

  • pool (str) – Pooled sample name [keyword-only, default: ‘’]

  • mask_coords (tuple) – Mask a section of a reference given its 5’ and 3’ end coordinates [keyword-only, default: ()]

  • mask_primers (tuple) – Mask a section of a reference given its forward and reverse primers [keyword-only, default: ()]

  • primer_gap (int) – Leave a gap of this many bases between the primer and the section [keyword-only, default: 0]

  • mask_sections_file (str | None) – Mask sections of references from coordinates/primers in a CSV file [keyword-only, default: None]

  • mask_del (bool) – Mask deletions [keyword-only, default: True]

  • mask_ins (bool) – Mask insertions [keyword-only, default: True]

  • mask_mut (tuple) – Mask this type of mutation [keyword-only, default: ()]

  • mask_polya (int) – Mask stretches of at least this many consecutive A bases (0 disables) [keyword-only, default: 5]

  • mask_gu (bool) – Mask G and U bases [keyword-only, default: True]

  • mask_pos_file (str | None) – Mask positions in references from a file [keyword-only, default: None]

  • mask_pos (tuple) – Mask this position in this reference [keyword-only, default: ()]

  • mask_discontig (bool) – Mask paired-end reads with discontiguous mates [keyword-only, default: True]

  • min_ncov_read (int) – Mask reads with fewer than this many bases covering the section [keyword-only, default: 1]

  • min_finfo_read (float) – Mask reads with less than this fraction of unambiguous base calls [keyword-only, default: 0.95]

  • max_fmut_read (int) – Mask reads with more than this fraction of mutated base calls [keyword-only, default: 1.0]

  • min_mut_gap (int) – Mask reads with two mutations separated by fewer than this many bases [keyword-only, default: 3]

  • min_ninfo_pos (int) – Mask positions with fewer than this many unambiguous base calls [keyword-only, default: 1000]

  • max_fmut_pos (float) – Mask positions with more than this fraction of mutated base calls [keyword-only, default: 1.0]

  • quick_unbias (bool) – Correct observer bias using a quick (typically linear time) heuristic [keyword-only, default: True]

  • quick_unbias_thresh (float) – Treat mutated fractions under this threshold as 0 with --quick-unbias [keyword-only, default: 0.001]

  • max_clusters (int) – Attempt to find at most this many clusters [keyword-only, default: 0]

  • em_runs (int) – Repeat EM this many times for each number of clusters [keyword-only, default: 12]

  • min_em_iter (int) – Run EM for at least this many iterations (times number of clusters) [keyword-only, default: 10]

  • max_em_iter (int) – Run EM for at most this many iterations (times number of clusters) [keyword-only, default: 500]

  • em_thresh (float) – Stop EM when the log likelihood increases by less than this threshold [keyword-only, default: 0.37]

  • joined (str) – Joined section name [keyword-only, default: ‘’]

  • join_clusts (str) – Join clusters from this CSV file [keyword-only, default: None]

  • table_pos (bool) – Make a table counting relationships per position [keyword-only, default: True]

  • table_read (bool) – Make a table counting relationships per read [keyword-only, default: True]

  • table_clust (bool) – Make a table counting reads per cluster (only for clustered data) [keyword-only, default: True]

  • fold (bool) – Predict the secondary structure using the RNAstructure Fold program [keyword-only, default: False]

  • fold_coords (tuple) – Fold a section of a reference given its 5’ and 3’ end coordinates [keyword-only, default: ()]

  • fold_primers (tuple) – Fold a section of a reference given its forward and reverse primers [keyword-only, default: ()]

  • fold_sections_file (str | None) – Fold sections of references from coordinates/primers in a CSV file [keyword-only, default: None]

  • fold_full (bool) – If no sections are specified, whether to default to the full section or to the table’s section [keyword-only, default: True]

  • quantile (float) – Normalize and winsorize ratios to this quantile (0.0 disables) [keyword-only, default: 0.0]

  • fold_temp (float) – Predict structures at this temperature (Kelvin) [keyword-only, default: 310.15]

  • fold_constraint (str | None) – Force bases to be paired/unpaired from a file of constraints [keyword-only, default: None]

  • fold_md (int) – Limit base pair distances to this number of bases (0 for no limit) [keyword-only, default: 0]

  • fold_mfe (bool) – Predict only the minimum free energy (MFE) structure [keyword-only, default: False]

  • fold_max (int) – Output at most this many structures (overridden by --fold-mfe) [keyword-only, default: 20]

  • fold_percent (float) – Stop outputting structures when the % difference in energy exceeds this value (overridden by --fold-mfe) [keyword-only, default: 20.0]

  • export (bool) – Export each sample to SEISMICgraph (https://seismicrna.org) [keyword-only, default: False]

  • samples_meta (str) – Add sample metadata from this CSV file to exported results [keyword-only, default: None]

  • refs_meta (str) – Add reference metadata from this CSV file to exported results [keyword-only, default: None]

  • all_pos (bool) – Export all positions (not just unmasked positions) [keyword-only, default: True]

  • cgroup (str) – Graph each INDIVidual cluster in its own file, each ORDER in its own file, or UNITE all clusters in one file containing all orders [keyword-only, default: ‘order’]

  • hist_bins (int) – Number of bins in each histogram; must be ≥ 1 [keyword-only, default: 10]

  • hist_margin (float) – Autofill margins of at most this width in histograms of ratios [keyword-only, default: 0.1]

  • struct_file (tuple) – Compare mutational profiles to the structure(s) in this CT file [keyword-only, default: ()]

  • window (int) – Use a sliding window of this many bases [keyword-only, default: 45]

  • winmin (int) – Mask sliding windows with fewer than this many data points [keyword-only, default: 9]

  • csv (bool) – Output the data for each graph in a Comma-Separated Values file [keyword-only, default: True]

  • html (bool) – Output each graph in an interactive HyperText Markup Language file [keyword-only, default: True]

  • svg (bool) – Output each graph in a Scalable Vector Graphics file [keyword-only, default: False]

  • pdf (bool) – Output each graph in a Portable Document Format file [keyword-only, default: False]

  • png (bool) – Output each graph in a Portable Network Graphics file [keyword-only, default: False]

  • graph_mprof (bool) – Graph mutational profiles [keyword-only, default: True]

  • graph_tmprof (bool) – Graph typed mutational profiles [keyword-only, default: True]

  • graph_ncov (bool) – Graph coverages per position [keyword-only, default: True]

  • graph_mhist (bool) – Graph histograms of mutations per read [keyword-only, default: True]

  • graph_giniroll (bool) – Graph rolling Gini coefficients [keyword-only, default: False]

  • graph_roc (bool) – Graph receiver operating characteristic curves [keyword-only, default: True]

  • graph_aucroll (bool) – Graph rolling areas under receiver operating characteristic curves [keyword-only, default: False]
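The defaults of bt2_score_min_e2e, bt2_score_min_loc, and bt2_s listed above are written in Bowtie2's function-string notation, in which a letter selects the function type (C constant, L linear, S square root, G natural log) and the numbers give the constant term and the coefficient. The sketch below evaluates such strings for a given read length. It follows the convention in the Bowtie2 manual; whether SEISMIC-RNA passes these strings to Bowtie2 unchanged is an assumption, and eval_bt2_function is a hypothetical helper.

```python
import math


def eval_bt2_function(spec: str, read_length: int) -> float:
    """Evaluate a Bowtie2-style function string such as 'L,1,0.5'."""
    parts = spec.split(",")
    kind = parts[0].strip().upper()
    const = float(parts[1])
    # The coefficient may be absent for a constant function.
    coeff = float(parts[2]) if len(parts) > 2 else 0.0
    if kind == "C":  # constant: f(x) = B
        return const
    if kind == "L":  # linear: f(x) = B + A * x
        return const + coeff * read_length
    if kind == "S":  # square root: f(x) = B + A * sqrt(x)
        return const + coeff * math.sqrt(read_length)
    if kind == "G":  # natural log: f(x) = B + A * ln(x)
        return const + coeff * math.log(read_length)
    raise ValueError(f"Unknown function type {kind!r} in {spec!r}")


read_length = 100  # hypothetical read length, for illustration

# Minimum alignment scores implied by the documented defaults.
min_score_e2e = eval_bt2_function("L,-1,-0.5", read_length)  # -1 + -0.5 * 100 = -51.0
min_score_loc = eval_bt2_function("L,1,0.5", read_length)    # 1 + 0.5 * 100 = 51.0

# Seed interval implied by the default of bt2_s.
seed_interval = eval_bt2_function("L,1,0.1", read_length)
```

Because the thresholds scale with read length, the same default string yields a stricter absolute cutoff for longer reads in local mode and a more permissive (more negative) one in end-to-end mode.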