seismicrna.relate package
Subpackages
- seismicrna.relate.aux package
- seismicrna.relate.c package
- seismicrna.relate.py package
- seismicrna.relate.tests package
Submodules
- class seismicrna.relate.batch.QnamesBatch(*, names: list[str] | ndarray, **kwargs)
Bases: AllReadBatch
- property num_reads
Number of reads.
- classmethod simulate(batch: int, num_reads: int, formatter: Callable[[int, int], str] = <function format_read_name>, **kwargs)
Simulate a batch.
- class seismicrna.relate.batch.RelateBatch(*, section: Section, **kwargs)
Bases: SectionMutsBatch, AllReadBatch
- property read_weights
Weights for each read when computing counts.
- classmethod simulate(batch: int, ref: str, pmut: DataFrame, uniq_end5s: ndarray, uniq_end3s: ndarray, pends: ndarray, paired: bool, read_length: int, p_rev: float, min_mut_gap: int, num_reads: int, **kwargs)
Simulate a batch.
- Parameters:
- batch (int) – Batch number.
- ref (str) – Name of the reference.
- pmut (pd.DataFrame) – Rate of each type of mutation at each position.
- uniq_end5s (np.ndarray) – Unique read 5’ end coordinates.
- uniq_end3s (np.ndarray) – Unique read 3’ end coordinates.
- pends (np.ndarray) – Probability of each set of unique end coordinates.
- paired (bool) – Whether to simulate paired-end or single-end reads.
- read_length (int) – Length of each read segment (paired-end reads only).
- p_rev (float) – Probability that mate 1 is reversed (paired-end reads only).
- min_mut_gap (int) – Minimum number of positions between two mutations.
- num_reads (int) – Number of reads in the batch.
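The relationship between uniq_end5s, uniq_end3s, and pends can be illustrated with a minimal sketch. It assumes that the three arrays are parallel (entry i of pends is the probability of the pair uniq_end5s[i], uniq_end3s[i]) and draws one pair per read. The function name sample_read_ends and the seed argument are hypothetical; this is not necessarily how RelateBatch.simulate draws coordinates internally.

```python
import numpy as np


def sample_read_ends(uniq_end5s, uniq_end3s, pends, num_reads, seed=0):
    """Draw num_reads (5' end, 3' end) pairs (hypothetical helper).

    Assumes the three inputs are parallel 1D arrays and that pends
    sums to 1.
    """
    uniq_end5s = np.asarray(uniq_end5s)
    uniq_end3s = np.asarray(uniq_end3s)
    pends = np.asarray(pends, dtype=float)
    if not (uniq_end5s.shape == uniq_end3s.shape == pends.shape):
        raise ValueError("End coordinates and probabilities must align")
    if not np.isclose(pends.sum(), 1.0):
        raise ValueError("Probabilities must sum to 1")
    rng = np.random.default_rng(seed)
    # Choose one index per read so that each 5' end stays paired with
    # its own 3' end.
    indexes = rng.choice(pends.size, size=num_reads, p=pends)
    return uniq_end5s[indexes], uniq_end3s[indexes]
```

Sampling indexes, rather than sampling the 5’ and 3’ ends separately, keeps every simulated read on one of the listed coordinate pairs.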
- class seismicrna.relate.data.QnamesDataset(report: BatchedReport, top: Path)
Bases: LoadedDataset
Dataset of read names from the Relate step.
- classmethod get_batch_type()
Type of batch.
- classmethod get_report_type()
Type of report.
- property pattern
Pattern of mutations to count.
- class seismicrna.relate.data.RelateDataset(report: BatchedReport, top: Path)
Bases: LoadedMutsDataset
Dataset of mutations from the Relate step.
- classmethod get_batch_type()
Type of batch.
- classmethod get_report_type()
Type of report.
- property paired
Whether the reads are paired-end.
- property pattern
Pattern of mutations to count.
- class seismicrna.relate.io.QnamesBatchIO(*, sample: str, ref: str, **kwargs)
Bases: ReadBatchIO, RelateIO, QnamesBatch
- classmethod file_seg_type()
Type of the last segment in the path.
- class seismicrna.relate.io.RelateBatchIO(*args, section: Section, **kwargs)
Bases: MutsBatchIO, RelateIO, RelateBatch
- classmethod file_seg_type()
Type of the last segment in the path.
- class seismicrna.relate.io.RelateIO(*, sample: str, ref: str, **kwargs)
- classmethod auto_fields()
Names and automatic values of selected fields.
- seismicrna.relate.io.from_reads(reads: Iterable[tuple[str, tuple[list[int], [list[int]]], dict[int, int]]], sample: str, ref: str, refseq: DNA, batch: int)
Accumulate reads into relation vectors.
Relate – Main Module
Auth: Matty
Define the command line interface for the ‘relate’ command, as well as its main run function that executes the relate step.
- seismicrna.relate.main.run(fasta: str, input_path: tuple[str, ...], *, out_dir: str = './out', min_reads: int = 1000, min_mapq: int = 25, phred_enc: int = 33, min_phred: int = 25, batch_size: int = 65536, ambindel: bool = True, overhangs: bool = True, clip_end5: int = 4, clip_end3: int = 6, max_procs: int = 16, parallel: bool = True, brotli_level: int = 10, force: bool = False, keep_tmp: bool = False, tmp_pfx='./tmp-')
Compute relationships between references and aligned reads.
- Parameters:
- out_dir (str) – Write all output files to this directory [keyword-only, default: ‘./out’]
- min_reads (int) – Discard alignment maps with fewer than this many reads [keyword-only, default: 1000]
- min_mapq (int) – Discard reads with mapping qualities below this threshold [keyword-only, default: 25]
- phred_enc (int) – Specify the Phred score encoding of FASTQ and SAM/BAM/CRAM files [keyword-only, default: 33]
- min_phred (int) – Mark base calls with Phred scores lower than this threshold as ambiguous [keyword-only, default: 25]
- batch_size (int) – Limit batches to at most this many reads [keyword-only, default: 65536]
- ambindel (bool) – Mark all ambiguous insertions and deletions [keyword-only, default: True]
- overhangs (bool) – Retain the overhangs of paired-end mates that dovetail [keyword-only, default: True]
- clip_end5 (int) – Clip this many bases from the 5’ end of each read [keyword-only, default: 4]
- clip_end3 (int) – Clip this many bases from the 3’ end of each read [keyword-only, default: 6]
- max_procs (int) – Run up to this many processes simultaneously [keyword-only, default: 16]
- parallel (bool) – Run tasks in parallel or in series [keyword-only, default: True]
- brotli_level (int) – Compress pickle files with this level of Brotli (0 - 11) [keyword-only, default: 10]
- force (bool) – Force all tasks to run, overwriting any existing output files [keyword-only, default: False]
- keep_tmp (bool) – Keep temporary files after finishing [keyword-only, default: False]
- tmp_pfx – Write all temporary files to a directory with this prefix [keyword-only, default: ‘./tmp-’]
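How phred_enc and min_phred interact can be shown with a small sketch. A quality character is converted to a Phred score by subtracting the encoding offset (33 by default) from its ASCII code, and base calls scoring below min_phred are treated as ambiguous. The function name mask_low_quality and the use of "N" as the ambiguity marker are assumptions made for illustration; the relate step records ambiguity in its relation vectors, and its internal representation may differ.

```python
def mask_low_quality(bases: str,
                     quals: str,
                     min_phred: int = 25,
                     phred_enc: int = 33) -> str:
    """Replace low-quality base calls with "N" (hypothetical helper)."""
    if len(bases) != len(quals):
        raise ValueError("Bases and qualities must have equal lengths")
    return "".join(
        # Phred score = ASCII code of the quality character - offset.
        base if ord(qual) - phred_enc >= min_phred else "N"
        for base, qual in zip(bases, quals)
    )
```

With the defaults, the quality characters "I", "5", ":", and "!" decode to scores of 40, 20, 25, and 0, so mask_low_quality("ACGT", "I5:!") keeps the first and third bases and masks the other two.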
- class seismicrna.relate.report.RelateReport(**kwargs: Any | Callable[[Report], Any])
Bases: BatchedRefseqReport, RelateIO
- classmethod fields()
All fields of the report.
- classmethod file_seg_type()
Type of the last segment in the path.
- seismicrna.relate.report.refseq_file_auto_fields()
- seismicrna.relate.report.refseq_file_seg_types()
- class seismicrna.relate.sam.XamViewer(xam_input: Path, tmp_dir: Path, batch_size: int, n_procs: int = 1)
Bases: object
- create_tmp_sam()
Create the temporary SAM file.
- delete_tmp_sam()
Delete the temporary SAM file.
- property indexes
- property n_reads
- open_tmp_sam()
Open the temporary SAM file as a file object.
- property paired
- property ref
- property sample
- property tmp_sam_path
Get the path to the temporary SAM file.
- seismicrna.relate.sam.read_name(line: str)
Get the name of the read in the current line of a SAM file.
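For orientation, the read name (QNAME) is the first tab-delimited field of a SAM alignment line, and header lines begin with "@". The sketch below extracts it under those two facts from the SAM format; the function name sam_read_name is hypothetical, and seismicrna.relate.sam.read_name may be implemented differently.

```python
def sam_read_name(line: str) -> str:
    """Return the QNAME field of one SAM alignment line
    (hypothetical helper)."""
    if line.startswith("@"):
        raise ValueError("Header lines do not contain read names")
    # Split only once: the fields after QNAME are not needed.
    return line.split("\t", 1)[0]
```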
- seismicrna.relate.sam.tmp_xam_cmd(xam_in: Path, xam_out: Path, n_procs: int = 1)
Collate and create a temporary XAM file.
- seismicrna.relate.sim.simulate_batch(sample: str, ref: str, batch: int, pmut: DataFrame, uniq_end5s: ndarray, uniq_end3s: ndarray, pends: ndarray, paired: bool, read_length: int, p_rev: float, min_mut_gap: int, num_reads: int, formatter: Callable[[int, int], str] = <function format_read_name>)
Simulate a pair of RelateBatchIO and QnamesBatchIO.
- seismicrna.relate.sim.simulate_batches(batch_size: int, pmut: DataFrame, pclust: Series, num_reads: int, **kwargs)
- seismicrna.relate.sim.simulate_cluster(first_batch: int, batch_size: int, num_reads: int, **kwargs)
Simulate all batches for one cluster.
- seismicrna.relate.sim.simulate_relate(*, out_dir: Path, tmp_dir: Path, sample: str, ref: str, refseq: DNA, batch_size: int, num_reads: int, pmut: DataFrame, uniq_end5s: ndarray, uniq_end3s: ndarray, pends: ndarray, pclust: Series, brotli_level: int, force: bool, **kwargs)
Simulate an entire relate step.
Relation Vector Writing Module
Given alignment map (BAM) files, split each file into batches of reads, write the relation vectors for each batch to a compressed file, and write a report summarizing the results.
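Splitting a stream of reads into batches of bounded size can be sketched as follows. This is a generic illustration of the batching idea, with a hypothetical function name (iter_batches); it does not reproduce how this module reads records from the temporary SAM file.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def iter_batches(reads: Iterable[T], batch_size: int) -> Iterator[list[T]]:
    """Yield consecutive lists of at most batch_size reads
    (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(reads)
    # Stop as soon as islice returns no more items.
    while batch := list(islice(iterator, batch_size)):
        yield batch
```

Every batch except possibly the last holds exactly batch_size reads, so memory use per batch is bounded regardless of the size of the input file.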
- class seismicrna.relate.write.RelationWriter(xam_view: XamViewer, seq: DNA)
Bases: object
Compute and write relation vectors for all reads from one sample mapped to one reference sequence.
- property num_reads
- property ref
- property sample
- write(*, out_dir: Path, release_dir: Path, min_mapq: int, min_reads: int, brotli_level: int, force: bool, overhangs: bool, min_phred: int, phred_enc: int, ambindel: bool, clip_end5: int, clip_end3: int, **kwargs)
Compute a relation vector for every record in a BAM file, write the vectors into one or more batch files, compute their checksums, and write a report summarizing the results.
- seismicrna.relate.write.generate_batch(batch: int, *, xam_view: XamViewer, top: Path, refseq: DNA, min_mapq: int, min_qual: str, ambindel: bool, overhangs: bool, clip_end5: int, clip_end3: int, brotli_level: int)
Compute relation vectors for every SAM record in one batch, write the vectors to a batch file, and return its MD5 checksum and the number of vectors.
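An MD5 checksum of a written batch file can be computed with the standard library, as in the sketch below. The function name md5_hex_digest and the chunked reading strategy are assumptions for illustration; generate_batch may compute its checksum in another way, for example from the bytes in memory before they are written.

```python
import hashlib
from pathlib import Path


def md5_hex_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal MD5 digest of a file
    (hypothetical helper)."""
    digest = hashlib.md5()
    with open(path, "rb") as file:
        # Read in fixed-size chunks so large files need not fit in memory.
        for chunk in iter(lambda: file.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Storing such a digest alongside each batch lets a later step detect a batch file that was truncated or modified after it was written.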