seismicrna.mask package
Submodules
- class seismicrna.mask.batch.MaskMutsBatch(*, read_nums: ndarray, **kwargs)
Bases: MaskReadBatch, SectionMutsBatch, PartialMutsBatch
- property read_weights
Weights for each read when computing counts.
- class seismicrna.mask.batch.MaskReadBatch(*, read_nums: ndarray, **kwargs)
Bases: PartialReadBatch
- property num_reads
Number of reads.
- property read_nums
Read numbers.
- seismicrna.mask.batch.apply_mask(batch: SectionMutsBatch, read_nums: ndarray | None = None, section: Section | None = None, sanitize: bool = False)
- class seismicrna.mask.data.JoinMaskMutsDataset(*args, **kwargs)
Bases: JoinMutsDataset, MergedUnbiasDataset
- classmethod get_batch_type()
Type of batch.
- classmethod get_dataset_load_func()
Function to load one constituent dataset.
- classmethod get_report_type()
Type of report.
- classmethod name_batch_attrs()
Name the attributes of each batch.
- class seismicrna.mask.data.MaskMutsDataset(data1: MutsDataset, data2: Dataset)
Bases: ArrowDataset, UnbiasDataset
Chain mutation data with masked reads.
- MASK_NAME = 'mask'
- classmethod get_dataset1_load_func()
Function to load Dataset 1.
- classmethod get_dataset2_type()
Type of Dataset 2.
- property min_mut_gap
Minimum gap between two mutations.
- property pattern
Pattern of mutations to count.
- property quick_unbias
Use the quick heuristic for unbiasing.
- property quick_unbias_thresh
Consider mutation rates less than or equal to this threshold to be 0 when using the quick heuristic for unbiasing.
- property section
Section of the dataset.
- class seismicrna.mask.data.MaskReadDataset(report: BatchedReport, top: Path)
Bases: LoadedDataset, UnbiasDataset
Load batches of masked relation vectors.
- classmethod get_batch_type()
Type of batch.
- classmethod get_report_type()
Type of report.
- property min_mut_gap
Minimum gap between two mutations.
- property pattern
Pattern of mutations to count.
- property pos_kept
Positions kept after masking.
- property quick_unbias
Use the quick heuristic for unbiasing.
- property quick_unbias_thresh
Consider mutation rates less than or equal to this threshold to be 0 when using the quick heuristic for unbiasing.
- class seismicrna.mask.io.MaskBatchIO(*, sect: str, **kwargs)
Bases: ReadBatchIO, MaskIO, MaskReadBatch
- classmethod file_seg_type()
Type of the last segment in the path.
- class seismicrna.mask.io.MaskIO(*, sect: str, **kwargs)
- classmethod auto_fields()
Names and automatic values of selected fields.
- seismicrna.mask.main.load_sections(input_path: Iterable[str | Path], coords: Iterable[tuple[str, int, int]], primers: Iterable[tuple[str, DNA, DNA]], primer_gap: int, sections_file: Path | None = None)
Open sections of relate reports.
- seismicrna.mask.main.run(input_path: tuple[str, ...], *, mask_coords: tuple[tuple[str, int, int], ...] = (), mask_primers: tuple[tuple[str, DNA, DNA], ...] = (), primer_gap: int = 0, mask_sections_file: str | None = None, mask_del: bool = True, mask_ins: bool = True, mask_mut: tuple[str, ...] = (), mask_polya: int = 5, mask_gu: bool = True, mask_pos: tuple[tuple[str, int], ...] = (), mask_pos_file: str | None = None, mask_discontig: bool = True, min_ninfo_pos: int = 1000, max_fmut_pos: float = 1.0, min_ncov_read: int = 1, min_finfo_read: float = 0.95, max_fmut_read: int = 1.0, min_mut_gap: int = 3, quick_unbias: bool = True, quick_unbias_thresh: float = 0.001, brotli_level: int = 10, max_procs: int = 16, parallel: bool = True, force: bool = False, tmp_pfx='./tmp-', keep_tmp=False) -> list[Path]
Define mutations and sections to filter reads and positions.
- Parameters:
mask_coords (tuple) – Mask a section of a reference given its 5’ and 3’ end coordinates [keyword-only, default: ()]
mask_primers (tuple) – Mask a section of a reference given its forward and reverse primers [keyword-only, default: ()]
primer_gap (int) – Leave a gap of this many bases between the primer and the section [keyword-only, default: 0]
mask_sections_file (str | None) – Mask sections of references from coordinates/primers in a CSV file [keyword-only, default: None]
mask_del (bool) – Mask deletions [keyword-only, default: True]
mask_ins (bool) – Mask insertions [keyword-only, default: True]
mask_mut (tuple) – Mask this type of mutation [keyword-only, default: ()]
mask_polya (int) – Mask stretches of at least this many consecutive A bases (0 disables) [keyword-only, default: 5]
mask_gu (bool) – Mask G and U bases [keyword-only, default: True]
mask_pos (tuple) – Mask this position in this reference [keyword-only, default: ()]
mask_pos_file (str | None) – Mask positions in references from a file [keyword-only, default: None]
mask_discontig (bool) – Mask paired-end reads with discontiguous mates [keyword-only, default: True]
min_ninfo_pos (int) – Mask positions with fewer than this many unambiguous base calls [keyword-only, default: 1000]
max_fmut_pos (float) – Mask positions with more than this fraction of mutated base calls [keyword-only, default: 1.0]
min_ncov_read (int) – Mask reads with fewer than this many bases covering the section [keyword-only, default: 1]
min_finfo_read (float) – Mask reads with less than this fraction of unambiguous base calls [keyword-only, default: 0.95]
max_fmut_read (int) – Mask reads with more than this fraction of mutated base calls [keyword-only, default: 1.0]
min_mut_gap (int) – Mask reads with two mutations separated by fewer than this many bases [keyword-only, default: 3]
quick_unbias (bool) – Correct observer bias using a quick (typically linear time) heuristic [keyword-only, default: True]
quick_unbias_thresh (float) – Treat mutated fractions under this threshold as 0 with --quick-unbias [keyword-only, default: 0.001]
brotli_level (int) – Compress pickle files with this level of Brotli (0 - 11) [keyword-only, default: 10]
max_procs (int) – Run up to this many processes simultaneously [keyword-only, default: 16]
parallel (bool) – Run tasks in parallel or in series [keyword-only, default: True]
force (bool) – Force all tasks to run, overwriting any existing output files [keyword-only, default: False]
tmp_pfx – Write all temporary files to a directory with this prefix [keyword-only, default: './tmp-']
keep_tmp – Keep temporary files after finishing [keyword-only, default: False]
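To make the read-level thresholds above concrete, here is a minimal pure-Python sketch of how min_finfo_read, max_fmut_read and min_mut_gap might combine to keep or mask one read. It does not call seismicrna; the function keep_read and the one-character encoding of base calls are hypothetical. It also assumes that the informative fraction is taken over all covered positions, that the mutated fraction is taken over informative positions, and that the gap is the number of bases strictly between two mutations. The library may define these differently.

```python
def keep_read(calls, *, min_finfo_read=0.95, max_fmut_read=1.0, min_mut_gap=3):
    """Return True if a read passes three read-level filters.

    `calls` is a hypothetical encoding of one read, one character per
    covered position: 'm' = mutated, 'u' = unambiguously unmutated,
    '?' = ambiguous base call.
    """
    if not calls:
        return False
    n_mut = calls.count("m")
    n_info = n_mut + calls.count("u")
    # Mask reads with too small a fraction of unambiguous base calls.
    if n_info / len(calls) < min_finfo_read:
        return False
    # Mask reads with too large a fraction of mutated base calls
    # (assumed here to be relative to the informative positions).
    if n_info and n_mut / n_info > max_fmut_read:
        return False
    # Mask reads with two mutations separated by fewer than
    # min_mut_gap bases (assumed: bases strictly between them).
    muts = [i for i, call in enumerate(calls) if call == "m"]
    for first, second in zip(muts, muts[1:]):
        if second - first - 1 < min_mut_gap:
            return False
    return True
```

With the defaults, a read encoded as "mum" followed by unmutated calls would be masked because its two mutations are only one base apart, while "muuum" followed by unmutated calls would be kept.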
- class seismicrna.mask.report.MaskReport(**kwargs: Any | Callable[[Report], Any])
Bases: BatchedReport, MaskIO
- classmethod auto_fields()
Names and automatic values of selected fields.
- classmethod fields()
All fields of the report.
- classmethod file_seg_type()
Type of the last segment in the path.
Mask – Write Module
- class seismicrna.mask.write.Masker(dataset: RelateDataset | PoolDataset, section: Section, pattern: RelPattern, *, mask_polya: int = 5, mask_gu: bool = True, mask_pos: list[tuple[str, int]] = (), mask_pos_file: Path | None, mask_discontig: bool = True, min_ncov_read: int = 1, min_finfo_read: float = 0.95, max_fmut_read: float = 1.0, min_mut_gap: int = 3, min_ninfo_pos: int = 1000, max_fmut_pos: float = 1.0, quick_unbias: bool = True, quick_unbias_thresh: float = 0.001, brotli_level: int = 10, top: Path)
Bases: object
Mask batches of relation vectors.
- CHECKSUM_KEY = 'mask'
- MASK_POS_FMUT = 'pos-fmut'
- MASK_POS_NINFO = 'pos-ninfo'
- MASK_READ_DISCONTIG = 'read-discontig'
- MASK_READ_FINFO = 'read-finfo'
- MASK_READ_FMUT = 'read-fmut'
- MASK_READ_GAP = 'read-gap'
- MASK_READ_INIT = 'read-init'
- MASK_READ_KEPT = 'read-kept'
- MASK_READ_NCOV = 'read-ncov'
- PATTERN_KEY = 'pattern'
- mask()
- property n_batches
Number of batches of reads.
- property n_reads_discontig
- property n_reads_kept
Number of reads kept.
- property n_reads_max_fmut
- property n_reads_min_finfo
- property n_reads_min_gap
- property n_reads_min_ncov
- property n_reads_premask
- property pos_gu
Positions masked for having a G or U base.
- property pos_kept
Positions kept.
- property pos_list
Positions masked arbitrarily from a list.
- property pos_max_fmut
Positions masked for having too many mutations.
- property pos_min_ninfo
Positions masked for having too few informative reads.
- property pos_polya
Positions masked for lying in a poly(A) sequence.
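The sequence-based position masks (pos_polya and pos_gu) can be illustrated with a short standalone sketch. The helpers find_polya and find_gu below are hypothetical and are not the methods that Masker uses internally. They assume 1-indexed positions and treat T in a DNA reference as equivalent to U.

```python
def find_polya(seq, min_length=5):
    """List 1-indexed positions lying in runs of at least min_length
    consecutive A bases; a min_length of 0 disables the mask."""
    masked = []
    if min_length <= 0:
        return masked
    start = 0
    n = len(seq)
    while start < n:
        if seq[start] != "A":
            start += 1
            continue
        # Extend the run of A bases as far as it goes.
        end = start
        while end < n and seq[end] == "A":
            end += 1
        if end - start >= min_length:
            masked.extend(range(start + 1, end + 1))
        start = end
    return masked


def find_gu(seq):
    """List 1-indexed positions whose base is G or U (T in DNA)."""
    return [pos for pos, base in enumerate(seq, start=1) if base in ("G", "T", "U")]


seq = "GAAAAACAAT"
polya = find_polya(seq, 5)
gu = find_gu(seq)
# Positions that survive both sequence-based masks.
kept = [pos for pos in range(1, len(seq) + 1) if pos not in set(polya) | set(gu)]
```

In this toy sequence the run of five A bases and the terminal G and T are masked, leaving only the C and the two trailing A bases, which form a run too short to count as poly(A).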
- seismicrna.mask.write.mask_section(dataset: RelateDataset | PoolDataset, section: Section, mask_del: bool, mask_ins: bool, mask_mut: Iterable[str], *, tmp_dir: Path, force: bool, **kwargs)
Filter a section of a set of bit vectors.
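As a rough illustration of the position-level thresholds (min_ninfo_pos and max_fmut_pos) that this step applies, the sketch below sorts positions into kept, too few informative calls, and too many mutations. The function filter_positions and its dictionary inputs are hypothetical stand-ins for per-position counts. The order of the two checks is an assumption, not the library's documented behavior.

```python
def filter_positions(n_info, n_mut, *, min_ninfo_pos=1000, max_fmut_pos=1.0):
    """Split positions into (kept, too_few_info, too_many_muts).

    n_info maps each position to its count of unambiguous base calls;
    n_mut maps each position to its count of mutated base calls.
    """
    kept, too_few, too_many = [], [], []
    for pos in sorted(n_info):
        info = n_info[pos]
        if info < min_ninfo_pos:
            # Fewer than min_ninfo_pos unambiguous base calls.
            too_few.append(pos)
        elif info and n_mut.get(pos, 0) / info > max_fmut_pos:
            # More than max_fmut_pos of the calls are mutated.
            too_many.append(pos)
        else:
            kept.append(pos)
    return kept, too_few, too_many


kept, too_few, too_many = filter_positions(
    {1: 1500, 2: 800, 3: 2000},
    {1: 30, 3: 1200},
    max_fmut_pos=0.5,
)
```

Here position 2 has too few informative calls, position 3 has a mutated fraction of 0.6 and exceeds the 0.5 limit, and only position 1 is kept.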