seismicrna.mask package

Submodules

class seismicrna.mask.batch.MaskMutsBatch(*, read_nums: ndarray, **kwargs)

Bases: MaskReadBatch, SectionMutsBatch, PartialMutsBatch

property read_weights

Weights for each read when computing counts.

class seismicrna.mask.batch.MaskReadBatch(*, read_nums: ndarray, **kwargs)

Bases: PartialReadBatch

property num_reads

Number of reads.

property read_nums

Read numbers.

seismicrna.mask.batch.apply_mask(batch: SectionMutsBatch, read_nums: ndarray | None = None, section: Section | None = None, sanitize: bool = False)

class seismicrna.mask.data.JoinMaskMutsDataset(*args, **kwargs)

Bases: JoinMutsDataset, MergedUnbiasDataset

classmethod get_batch_type()

Type of batch.

classmethod get_dataset_load_func()

Function to load one constituent dataset.

classmethod get_report_type()

Type of report.

classmethod name_batch_attrs()

Name the attributes of each batch.

class seismicrna.mask.data.MaskMutsDataset(data1: MutsDataset, data2: Dataset)

Bases: ArrowDataset, UnbiasDataset

Chain mutation data with masked reads.

MASK_NAME = 'mask'

classmethod get_dataset1_load_func()

Function to load Dataset 1.

classmethod get_dataset2_type()

Type of Dataset 2.

property min_mut_gap

Minimum gap between two mutations.

property pattern

Pattern of mutations to count.

property quick_unbias

Use the quick heuristic for unbiasing.

property quick_unbias_thresh

Consider mutation rates less than or equal to this threshold to be 0 when using the quick heuristic for unbiasing.

property section

Section of the dataset.

class seismicrna.mask.data.MaskReadDataset(report: BatchedReport, top: Path)

Bases: LoadedDataset, UnbiasDataset

Load batches of masked relation vectors.

classmethod get_batch_type()

Type of batch.

classmethod get_report_type()

Type of report.

property min_mut_gap

Minimum gap between two mutations.

property pattern

Pattern of mutations to count.

property pos_kept

Positions kept after masking.

property quick_unbias

Use the quick heuristic for unbiasing.

property quick_unbias_thresh

Consider mutation rates less than or equal to this threshold to be 0 when using the quick heuristic for unbiasing.
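The threshold rule described for quick_unbias_thresh can be illustrated with a minimal sketch. The function name zero_small_rates and the plain-list input are hypothetical and are not part of SEISMIC-RNA; the sketch only shows the "less than or equal to" comparison stated above, not how the library applies it during unbiasing.

```python
def zero_small_rates(rates, thresh=0.001):
    """Return a copy of rates in which every value <= thresh is set to 0.

    Hypothetical helper for illustration only; the default threshold
    mirrors the documented default of quick_unbias_thresh.
    """
    return [0.0 if rate <= thresh else rate for rate in rates]


# A rate exactly equal to the threshold is also treated as 0.
print(zero_small_rates([0.0005, 0.001, 0.02]))
```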

class seismicrna.mask.io.MaskBatchIO(*, sect: str, **kwargs)

Bases: ReadBatchIO, MaskIO, MaskReadBatch

classmethod file_seg_type()

Type of the last segment in the path.

class seismicrna.mask.io.MaskIO(*, sect: str, **kwargs)

Bases: SectIO, ABC

classmethod auto_fields()

Names and automatic values of selected fields.

seismicrna.mask.main.load_sections(input_path: Iterable[str | Path], coords: Iterable[tuple[str, int, int]], primers: Iterable[tuple[str, DNA, DNA]], primer_gap: int, sections_file: Path | None = None)

Open sections of relate reports.

seismicrna.mask.main.run(input_path: tuple[str, ...], *, mask_coords: tuple[tuple[str, int, int], ...] = (), mask_primers: tuple[tuple[str, DNA, DNA], ...] = (), primer_gap: int = 0, mask_sections_file: str | None = None, mask_del: bool = True, mask_ins: bool = True, mask_mut: tuple[str, ...] = (), mask_polya: int = 5, mask_gu: bool = True, mask_pos: tuple[tuple[str, int], ...] = (), mask_pos_file: str | None = None, mask_discontig: bool = True, min_ninfo_pos: int = 1000, max_fmut_pos: float = 1.0, min_ncov_read: int = 1, min_finfo_read: float = 0.95, max_fmut_read: float = 1.0, min_mut_gap: int = 3, quick_unbias: bool = True, quick_unbias_thresh: float = 0.001, brotli_level: int = 10, max_procs: int = 16, parallel: bool = True, force: bool = False, tmp_pfx='./tmp-', keep_tmp=False) -> list[Path]

Define mutations and sections to filter reads and positions.

Parameters:
  • mask_coords (tuple) – Mask a section of a reference given its 5’ and 3’ end coordinates [keyword-only, default: ()]

  • mask_primers (tuple) – Mask a section of a reference given its forward and reverse primers [keyword-only, default: ()]

  • primer_gap (int) – Leave a gap of this many bases between the primer and the section [keyword-only, default: 0]

  • mask_sections_file (str | None) – Mask sections of references from coordinates/primers in a CSV file [keyword-only, default: None]

  • mask_del (bool) – Mask deletions [keyword-only, default: True]

  • mask_ins (bool) – Mask insertions [keyword-only, default: True]

  • mask_mut (tuple) – Mask this type of mutation [keyword-only, default: ()]

  • mask_polya (int) – Mask stretches of at least this many consecutive A bases (0 disables) [keyword-only, default: 5]

  • mask_gu (bool) – Mask G and U bases [keyword-only, default: True]

  • mask_pos (tuple) – Mask this position in this reference [keyword-only, default: ()]

  • mask_pos_file (str | None) – Mask positions in references from a file [keyword-only, default: None]

  • mask_discontig (bool) – Mask paired-end reads with discontiguous mates [keyword-only, default: True]

  • min_ninfo_pos (int) – Mask positions with fewer than this many unambiguous base calls [keyword-only, default: 1000]

  • max_fmut_pos (float) – Mask positions with more than this fraction of mutated base calls [keyword-only, default: 1.0]

  • min_ncov_read (int) – Mask reads with fewer than this many bases covering the section [keyword-only, default: 1]

  • min_finfo_read (float) – Mask reads with less than this fraction of unambiguous base calls [keyword-only, default: 0.95]

  • max_fmut_read (float) – Mask reads with more than this fraction of mutated base calls [keyword-only, default: 1.0]

  • min_mut_gap (int) – Mask reads with two mutations separated by fewer than this many bases [keyword-only, default: 3]

  • quick_unbias (bool) – Correct observer bias using a quick (typically linear time) heuristic [keyword-only, default: True]

  • quick_unbias_thresh (float) – Treat mutated fractions under this threshold as 0 with --quick-unbias [keyword-only, default: 0.001]

  • brotli_level (int) – Compress pickle files with this level of Brotli (0 to 11) [keyword-only, default: 10]

  • max_procs (int) – Run up to this many processes simultaneously [keyword-only, default: 16]

  • parallel (bool) – Run tasks in parallel or in series [keyword-only, default: True]

  • force (bool) – Force all tasks to run, overwriting any existing output files [keyword-only, default: False]

  • tmp_pfx – Write all temporary files to a directory with this prefix [keyword-only, default: './tmp-']

  • keep_tmp – Keep temporary files after finishing [keyword-only, default: False]
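
The read-level filters listed above (min_ncov_read, min_finfo_read, max_fmut_read, min_mut_gap) can be sketched in plain Python. This is an illustration under stated assumptions, not SEISMIC-RNA's implementation: the per-position encoding ("m", "u", "n", None) and the function keep_read are hypothetical, the informative fraction is assumed to be taken over covered positions, the mutated fraction over informative positions, and "separated by" is assumed to mean the number of bases strictly between two mutations.

```python
from typing import Optional, Sequence

# Hypothetical encoding of one read, one entry per position in the section:
# "m" = mutated call, "u" = unmutated call, "n" = ambiguous call,
# None = position not covered by the read.
Call = Optional[str]


def keep_read(calls: Sequence[Call], *,
              min_ncov_read: int = 1,
              min_finfo_read: float = 0.95,
              max_fmut_read: float = 1.0,
              min_mut_gap: int = 3) -> bool:
    """Return True if the read passes all four filters (illustrative)."""
    covered = [call for call in calls if call is not None]
    # Too few bases covering the section.
    if not covered or len(covered) < min_ncov_read:
        return False
    informative = [call for call in covered if call in ("m", "u")]
    # Too small a fraction of unambiguous base calls (assumed denominator:
    # covered positions).
    if len(informative) / len(covered) < min_finfo_read:
        return False
    # Too large a fraction of mutated base calls (assumed denominator:
    # informative positions).
    if informative and informative.count("m") / len(informative) > max_fmut_read:
        return False
    # Two mutations with fewer than min_mut_gap bases between them.
    mutated = [i for i, call in enumerate(calls) if call == "m"]
    for left, right in zip(mutated, mutated[1:]):
        if right - left - 1 < min_mut_gap:
            return False
    return True


close = ["u", "u", "m", "u", "m"] + ["u"] * 15   # 1 base between mutations
spaced = ["u", "u", "m", "u", "u", "u", "m"] + ["u"] * 13  # 3 bases between
print(keep_read(close), keep_read(spaced))
```

With the default min_mut_gap of 3, the first read is rejected and the second is kept under the gap definition assumed here.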

class seismicrna.mask.report.MaskReport(**kwargs: Any | Callable[[Report], Any])

Bases: BatchedReport, MaskIO

classmethod auto_fields()

Names and automatic values of selected fields.

classmethod fields()

All fields of the report.

classmethod file_seg_type()

Type of the last segment in the path.

Mask – Write Module

class seismicrna.mask.write.Masker(dataset: RelateDataset | PoolDataset, section: Section, pattern: RelPattern, *, mask_polya: int = 5, mask_gu: bool = True, mask_pos: list[tuple[str, int]] = (), mask_pos_file: Path | None, mask_discontig: bool = True, min_ncov_read: int = 1, min_finfo_read: float = 0.95, max_fmut_read: float = 1.0, min_mut_gap: int = 3, min_ninfo_pos: int = 1000, max_fmut_pos: float = 1.0, quick_unbias: bool = True, quick_unbias_thresh: float = 0.001, brotli_level: int = 10, top: Path)

Bases: object

Mask batches of relation vectors.

CHECKSUM_KEY = 'mask'
MASK_POS_FMUT = 'pos-fmut'
MASK_POS_NINFO = 'pos-ninfo'
MASK_READ_DISCONTIG = 'read-discontig'
MASK_READ_FINFO = 'read-finfo'
MASK_READ_FMUT = 'read-fmut'
MASK_READ_GAP = 'read-gap'
MASK_READ_INIT = 'read-init'
MASK_READ_KEPT = 'read-kept'
MASK_READ_NCOV = 'read-ncov'
PATTERN_KEY = 'pattern'

create_report(began: datetime, ended: datetime)

mask()

property n_batches

Number of batches of reads.

property n_reads_discontig

property n_reads_kept

Number of reads kept.

property n_reads_max_fmut

property n_reads_min_finfo

property n_reads_min_gap

property n_reads_min_ncov

property n_reads_premask

property pos_gu

Positions masked for having a G or U base.

property pos_kept

Positions kept.

property pos_list

Positions masked arbitrarily from a list.

property pos_max_fmut

Positions masked for having too many mutations.

property pos_min_ninfo

Positions masked for having too few informative reads.
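The two count-based position filters (pos_min_ninfo and pos_max_fmut) can be sketched as follows. The function positions_to_mask and its input, a dict mapping each position to a pair (informative reads, mutated reads), are hypothetical; checking the informative count before the mutated fraction is also an assumption, not something this reference states.

```python
def positions_to_mask(counts, *, min_ninfo_pos=1000, max_fmut_pos=1.0):
    """Split positions into (too few informative reads, too many mutations).

    counts: dict of position -> (n_info, n_mut). Illustrative helper only.
    """
    too_few_info = []
    too_mutated = []
    for pos, (n_info, n_mut) in sorted(counts.items()):
        if n_info < min_ninfo_pos:
            too_few_info.append(pos)
        elif n_info > 0 and n_mut / n_info > max_fmut_pos:
            too_mutated.append(pos)
    return too_few_info, too_mutated


counts = {1: (1500, 30), 2: (800, 10), 3: (2000, 900)}
print(positions_to_mask(counts, max_fmut_pos=0.4))
```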

property pos_polya

Positions masked for lying in a poly(A) sequence.
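The sequence-based position masks (pos_polya and pos_gu) depend only on the reference sequence, so they are easy to sketch. The helpers below are hypothetical and not part of SEISMIC-RNA; they assume 1-indexed positions and accept either T or U for the base masked alongside G.

```python
import re


def polya_positions(seq: str, min_length: int) -> list[int]:
    """1-indexed positions inside runs of at least min_length A bases."""
    if min_length <= 0:
        return []  # a value of 0 disables poly(A) masking
    positions = []
    for match in re.finditer(r"A{%d,}" % min_length, seq.upper()):
        positions.extend(range(match.start() + 1, match.end() + 1))
    return positions


def gu_positions(seq: str) -> list[int]:
    """1-indexed positions whose base is G or U (T in a DNA sequence)."""
    return [i for i, base in enumerate(seq.upper(), start=1) if base in "GUT"]


seq = "GCAAAAACAAU"
print(polya_positions(seq, 5))
print(gu_positions(seq))
```

In this example only the run of five A bases at positions 3 to 7 meets the threshold of 5; the shorter run at positions 9 and 10 does not.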

seismicrna.mask.write.mask_section(dataset: RelateDataset | PoolDataset, section: Section, mask_del: bool, mask_ins: bool, mask_mut: Iterable[str], *, tmp_dir: Path, force: bool, **kwargs)

Filter a section of a set of bit vectors.