seismicrna.relate.py package
Subpackages
- seismicrna.relate.py.tests package
- Submodules
  - TestParseCigar
  - TestEncodeMatch
  - TestEncodeRelate
  - TestFindRelsLine
    - TestFindRelsLine.iter_cases()
    - TestFindRelsLine.relate()
    - TestFindRelsLine.test_aaaa_0ins()
    - TestFindRelsLine.test_aaaaaa_0ins()
    - TestFindRelsLine.test_aacc_1ins()
    - TestFindRelsLine.test_acgt_1ins()
    - TestFindRelsLine.test_all_matches()
    - TestFindRelsLine.test_ambig_delet_low_qual()
    - TestFindRelsLine.test_soft_clips()
  - TestMergeMates
- Submodules
Submodules
- class seismicrna.relate.py.ambindel.Deletion(rel_ins_pos: int, rel_del_pos: int)
Bases: Indel
- property rank
Rank of the indel.
- class seismicrna.relate.py.ambindel.Indel(rel_ins_pos: int, rel_del_pos: int)
Bases: ABC
Base class for an Insertion or Deletion (collectively, “indel”). It is used to find alternative positions for indels by keeping track of an indel’s current coordinates (as it is moved) and determining whether a specific move is valid.
- Parameters:
  - rel_ins_pos (int) – The 0-indexed position of the indel with respect to the sequence (ref or read) with the relative insertion. This position points to one specific base. If the mutation is labeled an insertion, then the read is the sequence with the relative insertion (since it has a base that is not in the reference), and rel_ins_pos is the 0-based index of the inserted base in the coordinates of the read sequence. If the mutation is labeled a deletion, then the reference is the sequence with the relative insertion (since it has a base that is not in the read), and rel_ins_pos is the 0-based index of the deleted base in the coordinates of the reference sequence.
  - rel_del_pos (int) – The opposite of rel_ins_pos: the 0-indexed position of the indel with respect to the sequence with a relative deletion (that is, the read if the mutation is denoted a deletion, and the ref if an insertion). Because the deleted base does not actually exist in the sequence whose coordinates it is based on, rel_del_pos does not refer to a specific position in the sequence, but rather to the two extant positions in the sequence that flank the deleted position. It is most convenient for the algorithm to have this argument refer to the position 3’ of the deleted base and to define the 5’ position as a property.
- property del_pos3
- property del_pos5
- property ins_pos
- reset()
Reset the indel to its initial position, and erase its history of tunneling.
- abstract sweep(muts: dict[int, int], end5_ref: int, end3_ref: int, end5_read: int, end3_read: int, ref: DNA, read: DNA, qual: str, min_qual: str, dels: list[Deletion], inns: list[Insertion], from3to5: bool, tunnel: bool)
Move the indel as far as possible in one direction.
- property tunneled
- class seismicrna.relate.py.ambindel.Insertion(rel_ins_pos: int, rel_del_pos: int)
Bases: Indel
- property rank
Rank of the indel.
- seismicrna.relate.py.ambindel.find_ambindels(muts: dict[int, int], end5_ref: int, end3_ref: int, end5_read: int, end3_read: int, refseq: DNA, read: DNA, qual: str, min_qual: str, dels: list[Deletion], inns: list[Insertion])
Find and label all positions in the vector that are ambiguous due to insertions and deletions.
- Parameters:
  - muts (dict) – Mutations
  - end5_ref (int) – 5’ most position of the read that is not soft-clipped, using reference coordinates.
  - end3_ref (int) – 3’ most position of the read that is not soft-clipped, using reference coordinates.
  - end5_read (int) – 5’ most position of the read that is not soft-clipped, using read coordinates.
  - end3_read (int) – 3’ most position of the read that is not soft-clipped, using read coordinates.
  - refseq (DNA) – Reference sequence
  - read (DNA) – Sequence of the read
  - qual (str) – Phred quality scores of the read, encoded as ASCII characters
  - min_qual (str) – The minimum Phred quality score needed to consider a base call informative: integer value of the ASCII character
  - dels (list[Deletion]) – List of deletions.
  - inns (list[Insertion]) – List of insertions.
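To illustrate why the position of an indel can be ambiguous, here is a minimal sketch. It is not the package's algorithm: the function name `equivalent_deletions` and its plain-string arguments are hypothetical, and it ignores quality scores, insertions, and tunneling. It only shows that deleting any one base within a run of identical reference bases produces the same read, so the aligner's choice of position is arbitrary.

```python
def equivalent_deletions(ref: str, del_pos: int) -> list[int]:
    """Return every 0-based reference position whose single-base
    deletion yields the same read as deleting ref[del_pos].

    Hypothetical helper for illustration only.
    """
    if not 0 <= del_pos < len(ref):
        raise ValueError(f"del_pos {del_pos} is outside the reference")
    # The read that results from deleting the base at del_pos.
    read = ref[:del_pos] + ref[del_pos + 1:]
    # Brute force: try deleting each base and compare the results.
    return [pos for pos in range(len(ref))
            if ref[:pos] + ref[pos + 1:] == read]


# A deletion reported at index 3 of GATTTC could equally lie at 2 or 4.
print(equivalent_deletions("GATTTC", 3))  # [2, 3, 4]
```

A brute-force comparison like this is quadratic in the sequence length; it is meant to convey the idea of ambiguity, not to be efficient.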
- seismicrna.relate.py.ambindel.sweep_indels(muts: dict[int, int], end5_ref: int, end3_ref: int, end5_read: int, end3_read: int, refseq: DNA, read: DNA, qual: str, min_qual: str, dels: list[Deletion], inns: list[Insertion], from3to5: bool, tunnel: bool)
For every insertion and deletion,
- Parameters:
  - muts (dict) – Mutations
  - end5_ref (int) – 5’ most position of the read that is not soft-clipped, using reference coordinates.
  - end3_ref (int) – 3’ most position of the read that is not soft-clipped, using reference coordinates.
  - end5_read (int) – 5’ most position of the read that is not soft-clipped, using read coordinates.
  - end3_read (int) – 3’ most position of the read that is not soft-clipped, using read coordinates.
  - refseq (DNA) – Reference sequence
  - read (DNA) – Sequence of the read
  - qual (str) – Phred quality scores of the read, encoded as ASCII characters
  - min_qual (int) – The minimum Phred quality score needed to consider a base call informative: integer value of the ASCII character
  - dels (list[Deletion]) – List of deletions.
  - inns (list[Insertion]) – List of insertions.
  - from3to5 (bool) – Whether to move indels in the 3’ -> 5’ direction (True) or the 5’ -> 3’ direction (False)
  - tunnel (bool) – Whether to allow tunneling
- seismicrna.relate.py.cigar.op_consumes_read(op: str)
Whether the CIGAR operation consumes the read.
- seismicrna.relate.py.cigar.op_consumes_ref(op: str)
Whether the CIGAR operation consumes the reference.
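As a rough illustration of what "consumes" means, the sketch below follows the table in the SAM specification: M, I, S, = and X consume read bases, while M, D, N, = and X consume reference bases. The names `consumes_read`, `consumes_ref`, and `span_lengths` are hypothetical stand-ins, not the package's functions.

```python
# Operation sets taken from the SAM specification's CIGAR table.
# H (hard clip) and P (padding) consume neither sequence.
READ_OPS = frozenset("MIS=X")
REF_OPS = frozenset("MDN=X")


def consumes_read(op: str) -> bool:
    """Whether a CIGAR operation advances along the read."""
    return op in READ_OPS


def consumes_ref(op: str) -> bool:
    """Whether a CIGAR operation advances along the reference."""
    return op in REF_OPS


def span_lengths(ops: list[tuple[str, int]]) -> tuple[int, int]:
    """Total (read, reference) bases covered by (operation, length) pairs."""
    read_len = sum(n for op, n in ops if consumes_read(op))
    ref_len = sum(n for op, n in ops if consumes_ref(op))
    return read_len, ref_len


# 3S10M1I5M2D4M: 23 read bases (S, M, I) and 21 reference bases (M, D).
print(span_lengths([("S", 3), ("M", 10), ("I", 1),
                    ("M", 5), ("D", 2), ("M", 4)]))  # (23, 21)
```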
- seismicrna.relate.py.cigar.parse_cigar(cigar_string: str)
Yield the fields of a CIGAR string as pairs of (operation, length), where operation is 1 byte indicating the CIGAR operation and length is a positive integer indicating the number of bases from the read that the operation consumes. Note that in the CIGAR string itself, each length precedes its corresponding operation.
- Parameters:
  - cigar_string (bytes) – CIGAR string from a SAM file. For full documentation, refer to https://samtools.github.io/hts-specs/
- Yields:
  - bytes (length = 1) – Current CIGAR operation
  - int (≥ 1) – Length of current CIGAR operation
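The behavior described above can be sketched with a regular expression. This is an illustrative stand-in, not the package's implementation: the name `iter_cigar` is hypothetical, and it operates on `str` (as in the signature above) rather than `bytes`. It yields (operation, length) pairs in that order, even though each length precedes its operation in the CIGAR string.

```python
import re
from typing import Iterator

# One CIGAR field: a run of digits followed by one operation character.
CIGAR_FIELD = re.compile(r"([0-9]+)([MIDNSHP=X])")


def iter_cigar(cigar_string: str) -> Iterator[tuple[str, int]]:
    """Yield (operation, length) pairs from a CIGAR string."""
    pos = 0
    for match in CIGAR_FIELD.finditer(cigar_string):
        # Fields must be contiguous: no unparsed text between them.
        if match.start() != pos:
            raise ValueError(f"Invalid CIGAR string: {cigar_string!r}")
        length = int(match.group(1))
        if length < 1:
            raise ValueError(f"Length must be >= 1, but got {length}")
        yield match.group(2), length
        pos = match.end()
    # The whole string must have been parsed.
    if pos != len(cigar_string):
        raise ValueError(f"Invalid CIGAR string: {cigar_string!r}")


print(list(iter_cigar("3S10M1I5M2D4M")))
# [('S', 3), ('M', 10), ('I', 1), ('M', 5), ('D', 2), ('M', 4)]
```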
Relate Code Module
Convert the relationships between reads and a reference from SAM format (which encodes relationships implicitly as CIGAR strings) to vectorized format (which encodes relationships explicitly as elements of arrays).
- seismicrna.relate.py.encode.encode_match(read_base: str, read_qual: str, min_qual: str)
A more efficient version of encode_compare given prior knowledge from the CIGAR string that the read and reference match at this position. Note that there is no analogous version for a known substitution: substitutions are relatively infrequent, so optimizing their processing would speed up the program only slightly while making the source code more complex and harder to maintain.
- seismicrna.relate.py.encode.encode_relate(ref_base: str, read_base: str, read_qual: str, min_qual: str)
Encode the relation between a base in the read and a base in the reference sequence. If the read quality is sufficient, then return the match encoding if the read and reference bases match, otherwise the encoding of the substitution for the base in the read. If the read quality is insufficient, then return the fully ambiguous base encoding, that is, a match or a substitution to any base except the reference base, since a “substitution to the reference base” would be a match, not a substitution.
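The rule described above can be sketched with bit flags. Everything concrete here is an assumption made for illustration: the bit values of `MATCH` and `SUB`, the name `relate_base`, comparing quality characters directly as ASCII, and treating a read base other than A, C, G, or T as uninformative. The package's actual encodings may differ.

```python
# Hypothetical one-bit-per-relationship encoding (illustrative values).
MATCH = 0b00000001
SUB = {"A": 0b00010000, "C": 0b00100000,
       "G": 0b01000000, "T": 0b10000000}
ANY_SUB = sum(SUB.values())


def relate_base(ref_base: str, read_base: str,
                read_qual: str, min_qual: str) -> int:
    """Encode the relationship of one read base to one reference base."""
    # Phred scores encoded as ASCII characters compare in score order.
    if read_qual >= min_qual and read_base in SUB:
        return MATCH if read_base == ref_base else SUB[read_base]
    # Low quality: a match or a substitution to any base except the
    # reference base (which would be a match, not a substitution).
    return MATCH | (ANY_SUB & ~SUB.get(ref_base, 0))


print(relate_base("A", "A", "I", "5"))  # 1: high-quality match
print(relate_base("A", "G", "I", "5"))  # 64: high-quality A-to-G substitution
print(relate_base("A", "G", "#", "5"))  # 225: match or substitution to C, G, or T
```

Using one bit per possible relationship lets an ambiguous call be stored as the bitwise OR of every relationship that remains possible.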
- exception seismicrna.relate.py.error.RelateError
Bases: Exception
Any error that occurs during relating.
- exception seismicrna.relate.py.error.RelateNotImplementedError
Bases: RelateError, NotImplementedError
Any NotImplementedError that occurs during relating.
- exception seismicrna.relate.py.error.RelateValueError
Bases: RelateError, ValueError
Any ValueError that occurs during relating.
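Because these exceptions inherit from both RelateError and a built-in exception, a handler for either base will catch them. The sketch below redefines local stand-ins with the same bases as listed above, so that it runs without the package installed; `check_length` is a hypothetical function used only to raise the error.

```python
class RelateError(Exception):
    """Local stand-in: any error that occurs during relating."""


class RelateValueError(RelateError, ValueError):
    """Local stand-in: any ValueError that occurs during relating."""


def check_length(length: int) -> int:
    """Hypothetical validator that raises the relate-specific error."""
    if length < 1:
        raise RelateValueError(f"Length must be >= 1, but got {length}")
    return length


# A generic ValueError handler still catches the relate-specific error.
try:
    check_length(0)
except ValueError as error:
    print(type(error).__name__)  # RelateValueError

# So does a handler for the package-wide base class.
try:
    check_length(0)
except RelateError as error:
    print(error)  # Length must be >= 1, but got 0
```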
- class seismicrna.relate.py.relate.SamFlag(flag: int)
Bases: object
Represents the set of 12 boolean flags for a SAM record.
- first
- flag
- paired
- rev
- second
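For orientation, here is a minimal sketch of how such attributes could be decoded from the integer flag. The bit values come from the SAM specification (0x1 paired, 0x10 reverse strand, 0x40 first in pair, 0x80 second in pair); mapping them to the attribute names above is an assumption based on those names, and `FlagSketch` is a hypothetical class, not the package's SamFlag.

```python
class FlagSketch:
    """Decode selected bits of a SAM flag (illustrative only)."""

    __slots__ = "flag", "paired", "rev", "first", "second"

    # 12 boolean flags, so valid values are 0 through 2**12 - 1.
    MAX_FLAG = 2 ** 12 - 1

    def __init__(self, flag: int):
        if not 0 <= flag <= self.MAX_FLAG:
            raise ValueError(f"Invalid SAM flag: {flag}")
        self.flag = flag
        self.paired = bool(flag & 0x1)    # template has multiple segments
        self.rev = bool(flag & 0x10)      # read is reverse-complemented
        self.first = bool(flag & 0x40)    # first segment in the template
        self.second = bool(flag & 0x80)   # last segment in the template


# Flag 99 = 0x1 + 0x2 + 0x20 + 0x40: paired, forward strand, first mate.
mate1 = FlagSketch(99)
print(mate1.paired, mate1.rev, mate1.first, mate1.second)
# True False True False

# Flag 147 = 0x1 + 0x2 + 0x10 + 0x80: paired, reverse strand, second mate.
mate2 = FlagSketch(147)
print(mate2.paired, mate2.rev, mate2.first, mate2.second)
# True True False True
```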