seismicrna.relate.py package

Subpackages

Submodules

class seismicrna.relate.py.ambindel.Deletion(rel_ins_pos: int, rel_del_pos: int)

Bases: Indel

property rank

Rank of the indel.

sweep(muts: dict[int, int], end5_ref: int, end3_ref: int, end5_read: int, end3_read: int, ref: DNA, read: DNA, qual: str, min_qual: str, dels: list[Deletion], inns: list[Insertion], from3to5: bool, tunnel: bool)

Move the indel as far as possible in one direction.

class seismicrna.relate.py.ambindel.Indel(rel_ins_pos: int, rel_del_pos: int)

Bases: ABC

Base class for an Insertion or Deletion (collectively, “indel”). It is used to find alternative positions for indels by keeping track of an indel’s current coordinates (as it is moved) and determining whether a specific move is valid.

Parameters:
  • rel_ins_pos (int) – The 0-indexed position of the indel with respect to the sequence (ref or read) with the relative insertion. This position points to one specific base. If the mutation is labeled an insertion, then the read is the sequence with the relative insertion (since it has a base that is not in the reference), and rel_ins_pos is the 0-based index of the inserted base in the coordinates of the read sequence. If the mutation is labeled a deletion, then the reference is the sequence with the relative insertion (since it has a base that is not in the read), and rel_ins_pos is the 0-based index of the deleted base in the coordinates of the reference sequence.

  • rel_del_pos (int) – The opposite of rel_ins_pos: the 0-indexed position of the indel with respect to the sequence with a relative deletion (that is, the read if the mutation is denoted a deletion, and the ref if an insertion). Because the deleted base does not actually exist in the sequence whose coordinates it is based on, rel_del_pos does not refer to a specific position in the sequence, but rather to the two extant positions in the sequence that flank the deleted position. It is most convenient for the algorithm to have this argument refer to the position 3’ of the deleted base and to define the 5’ position as a property.
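
The coordinate convention above can be illustrated with a small, hypothetical helper (indel_coords is not part of the package). It assumes a single indel whose 5’ flank starts at index 0 in both sequences, so every base 5’ of the indel has the same index in both. It also assumes that the del_pos3 and del_pos5 properties correspond to rel_del_pos and rel_del_pos - 1, as the description of rel_del_pos suggests.

```python
def indel_coords(with_ins: str, with_del: str, ins_index: int) -> tuple[int, int, int]:
    """Hypothetical helper: coordinates of one indel.

    with_ins:  sequence with the relative insertion (ref for a deletion,
               read for an insertion).
    with_del:  sequence with the relative deletion (the other one).
    ins_index: 0-based index of the extra base in with_ins.
    Returns (rel_ins_pos, rel_del_pos, del_pos5).
    """
    # The two sequences must differ by exactly this one base.
    if with_ins[:ins_index] + with_ins[ins_index + 1:] != with_del:
        raise ValueError("sequences do not differ by exactly this one base")
    # rel_ins_pos points at a real base: the extra base in with_ins.
    rel_ins_pos = ins_index
    # Assuming this is the only indel, the base just 3' of the gap in
    # with_del has the same index as the extra base has in with_ins.
    rel_del_pos = ins_index
    # Assumed meaning of the 5' flanking position (del_pos5).
    del_pos5 = rel_del_pos - 1
    return rel_ins_pos, rel_del_pos, del_pos5
```

For a deletion of the C in ref ACGT (read AGT), indel_coords("ACGT", "AGT", 1) gives (1, 1, 0): the deleted base is ref[1], and it lies between read[0] (A) and read[1] (G). For an insertion of T into ref ACGT (read ACTGT), the roles swap and indel_coords("ACTGT", "ACGT", 2) gives (2, 2, 1).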

property del_pos3
property del_pos5
property ins_pos
abstract property rank: int

Rank of the indel.

reset()

Reset the indel to its initial position, and erase its history of tunneling.

step_del_pos(swap_pos: int)
abstract sweep(muts: dict[int, int], end5_ref: int, end3_ref: int, end5_read: int, end3_read: int, ref: DNA, read: DNA, qual: str, min_qual: str, dels: list[Deletion], inns: list[Insertion], from3to5: bool, tunnel: bool)

Move the indel as far as possible in one direction.

property tunneled

class seismicrna.relate.py.ambindel.Insertion(rel_ins_pos: int, rel_del_pos: int)

Bases: Indel

property rank

Rank of the indel.

stamp(muts: dict[int, int], reflen: int)

Stamp the relation vector with a 5’ and a 3’ insertion.

sweep(muts: dict[int, int], end5_ref: int, end3_ref: int, end5_read: int, end3_read: int, ref: DNA, read: DNA, qual: str, min_qual: str, dels: list[Deletion], inns: list[Insertion], from3to5: bool, tunnel: bool)

Move the indel as far as possible in one direction.

seismicrna.relate.py.ambindel.find_ambindels(muts: dict[int, int], end5_ref: int, end3_ref: int, end5_read: int, end3_read: int, refseq: DNA, read: DNA, qual: str, min_qual: str, dels: list[Deletion], inns: list[Insertion])

Find and label all positions in the vector that are ambiguous due to insertions and deletions.

Parameters:
  • muts (dict) – Mutations

  • end5_ref (int) – 5’ most position of the read that is not soft-clipped, using reference coordinates.

  • end3_ref (int) – 3’ most position of the read that is not soft-clipped, using reference coordinates.

  • end5_read (int) – 5’ most position of the read that is not soft-clipped, using read coordinates.

  • end3_read (int) – 3’ most position of the read that is not soft-clipped, using read coordinates.

  • refseq (DNA) – Reference sequence

  • read (DNA) – Sequence of the read

  • qual (str) – Phred quality scores of the read, encoded as ASCII characters

  • min_qual (str) – The minimum Phred quality score needed to consider a base call informative, encoded as an ASCII character in the same way as qual

  • dels (list[Deletion]) – List of deletions.

  • inns (list[Insertion]) – List of insertions.
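
Why an indel can be ambiguous is easiest to see with a brute-force check. The hypothetical helper below (not part of the package) lists every reference position whose deletion turns the reference into the read. It ignores quality scores, read ends, and interactions between multiple indels, all of which find_ambindels does take into account; it only shows the core idea.

```python
def equivalent_deletion_sites(ref: str, read: str) -> list[int]:
    """Hypothetical helper: 0-based reference indices whose single
    deletion transforms ref into read."""
    if len(ref) != len(read) + 1:
        raise ValueError("read must be exactly one base shorter than ref")
    # Try deleting each reference base and keep those that reproduce the read.
    return [i for i in range(len(ref)) if ref[:i] + ref[i + 1:] == read]
```

For ref AACCCGT and read AACCGT, the result is [2, 3, 4]: deleting any of the three Cs yields the same read, so the aligner's choice of which C was deleted is arbitrary, and all three positions are ambiguous.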

seismicrna.relate.py.ambindel.sweep_indels(muts: dict[int, int], end5_ref: int, end3_ref: int, end5_read: int, end3_read: int, refseq: DNA, read: DNA, qual: str, min_qual: str, dels: list[Deletion], inns: list[Insertion], from3to5: bool, tunnel: bool)

For every insertion and deletion, move the indel as far as possible in the direction given by from3to5.

Parameters:
  • muts (dict) – Mutations

  • end5_ref (int) – 5’ most position of the read that is not soft-clipped, using reference coordinates.

  • end3_ref (int) – 3’ most position of the read that is not soft-clipped, using reference coordinates.

  • end5_read (int) – 5’ most position of the read that is not soft-clipped, using read coordinates.

  • end3_read (int) – 3’ most position of the read that is not soft-clipped, using read coordinates.

  • refseq (DNA) – Reference sequence

  • read (DNA) – Sequence of the read

  • qual (str) – Phred quality scores of the read, encoded as ASCII characters

  • min_qual (str) – The minimum Phred quality score needed to consider a base call informative, encoded as an ASCII character in the same way as qual

  • dels (list[Deletion]) – List of deletions.

  • inns (list[Insertion]) – List of insertions.

  • from3to5 (bool) – Whether to move indels in the 3’ -> 5’ direction (True) or the 5’ -> 3’ direction (False)

  • tunnel (bool) – Whether to allow tunneling
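
The effect of from3to5 on a single deletion can be sketched as follows. This is a simplified illustration under stated assumptions, not the package's sweep: it ignores quality scores, the read ends (end5_ref, end3_ref, and so on), other indels, and tunneling. It relies on one fact: deleting ref[p] or ref[p + 1] yields the same read exactly when ref[p] == ref[p + 1].

```python
def sweep_deletion(ref: str, pos: int, from3to5: bool) -> int:
    """Hypothetical helper: slide a deletion of ref[pos] as far as
    possible in one direction and return its final position."""
    # Step toward the 5' end (lower indices) or the 3' end (higher indices).
    step = -1 if from3to5 else 1
    # The deletion can move one base whenever the neighboring base is
    # identical, because deleting either base produces the same read.
    while 0 <= pos + step < len(ref) and ref[pos + step] == ref[pos]:
        pos += step
    return pos
```

Starting from the middle C of ref AACCCGT (index 3), sweeping 3’ -> 5’ stops at index 2 and sweeping 5’ -> 3’ stops at index 4, which are the two ends of the run of equivalent deletion sites.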

seismicrna.relate.py.cigar.op_consumes_read(op: str)

Whether the CIGAR operation consumes the read.

seismicrna.relate.py.cigar.op_consumes_ref(op: str)

Whether the CIGAR operation consumes the reference.
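
Which operations consume the read and which consume the reference is fixed by the SAM specification. A minimal sketch of these two predicates follows. The names with the _sketch suffix and the span helpers are hypothetical, and the package's own implementation may differ.

```python
# Per the SAM specification: M, I, S, =, X consume query (read) bases;
# M, D, N, =, X consume reference bases; H and P consume neither.
CONSUMES_READ = frozenset({"M", "I", "S", "=", "X"})
CONSUMES_REF = frozenset({"M", "D", "N", "=", "X"})


def op_consumes_read_sketch(op: str) -> bool:
    return op in CONSUMES_READ


def op_consumes_ref_sketch(op: str) -> bool:
    return op in CONSUMES_REF


def read_length(ops: list[tuple[str, int]]) -> int:
    """Number of bases in SEQ implied by (operation, length) pairs."""
    return sum(length for op, length in ops if op_consumes_read_sketch(op))


def ref_span(ops: list[tuple[str, int]]) -> int:
    """Number of reference positions covered by the alignment."""
    return sum(length for op, length in ops if op_consumes_ref_sketch(op))
```

For the CIGAR 3S10M2D1I5M, read_length gives 19 (3 + 10 + 1 + 5) and ref_span gives 17 (10 + 2 + 5).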

seismicrna.relate.py.cigar.parse_cigar(cigar_string: str)

Yield the fields of a CIGAR string as pairs of (operation, length), where operation is 1 byte indicating the CIGAR operation and length is a positive integer indicating the number of consecutive bases to which the operation applies (in the read, the reference, or both, depending on the operation). Note that in the CIGAR string itself, each length precedes its corresponding operation.

Parameters:
  • cigar_string (bytes) – CIGAR string from a SAM file. For full documentation, refer to https://samtools.github.io/hts-specs/

Yields:
  • bytes (length = 1) – Current CIGAR operation

  • int (≥ 1) – Length of current CIGAR operation
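
A regular-expression parser along these lines is sketched below. It is a hypothetical stand-in (parse_cigar_sketch), not the package's function: it works on str rather than bytes, and it rejects the placeholder CIGAR "*" that SAM uses for unavailable alignments.

```python
import re
from typing import Iterator

# A whole CIGAR string: one or more (length, operation) fields.
CIGAR_FULL = re.compile(r"(?:[0-9]+[MIDNSHP=X])+")
# One field: the length comes first, then the operation.
CIGAR_FIELD = re.compile(r"([0-9]+)([MIDNSHP=X])")


def parse_cigar_sketch(cigar_string: str) -> Iterator[tuple[str, int]]:
    """Yield (operation, length) pairs, validating the whole string first."""
    if not CIGAR_FULL.fullmatch(cigar_string):
        raise ValueError(f"Invalid CIGAR string: {cigar_string!r}")
    for length_text, op in CIGAR_FIELD.findall(cigar_string):
        length = int(length_text)
        if length < 1:
            raise ValueError(f"CIGAR length must be >= 1, but got {length}")
        # Swap the order: the string stores length before operation.
        yield op, length
```

Because this is a generator, a ValueError is raised only when iteration starts; for example, list(parse_cigar_sketch("3S10M2D5M")) returns [("S", 3), ("M", 10), ("D", 2), ("M", 5)].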

Relate Code Module

Convert the relationships between reads and a reference from SAM format (which encodes relationships implicitly as CIGAR strings) to vectorized format (which encodes relationships explicitly as elements of arrays).

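The conversion from an implicit CIGAR encoding to an explicit per-position one can be sketched as a walk through already-parsed (operation, length) pairs. The function below is a hypothetical illustration: it uses readable string labels ("=" for a match, the read base for a substitution, "-" for a deletion) instead of the module's numeric encoding, ignores quality scores, and does not mark insertions, which the module's Insertion.stamp records on the flanking positions.

```python
def relate_sketch(ref: str, pos: int, read: str,
                  ops: list[tuple[str, int]]) -> list[str]:
    """Hypothetical: label each reference position covered by the read."""
    rels: list[str] = []
    ref_i = pos - 1  # SAM POS is 1-based; convert to a 0-based index
    read_i = 0
    for op, length in ops:
        if op in ("M", "=", "X"):
            # Aligned bases: compare read and reference one base at a time.
            for _ in range(length):
                ref_base, read_base = ref[ref_i], read[read_i]
                rels.append("=" if read_base == ref_base else read_base)
                ref_i += 1
                read_i += 1
        elif op == "D":
            # Deleted bases occupy reference positions but no read bases.
            rels.extend(["-"] * length)
            ref_i += length
        elif op in ("I", "S"):
            # Inserted and soft-clipped bases occupy read positions only.
            read_i += length
        elif op == "H":
            # Hard-clipped bases are absent from SEQ, so nothing advances.
            continue
        else:
            raise ValueError(f"CIGAR operation {op!r} is not handled here")
    return rels
```

For ref ACGTACGT, POS 2, read TCATCG, and CIGAR 1S3M1D2M, the result is ["=", "A", "=", "-", "=", "="]: the soft-clipped T is skipped, the G at reference position 3 reads as A, and the A at reference position 5 is deleted.
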
seismicrna.relate.py.encode.encode_match(read_base: str, read_qual: str, min_qual: str)

A more efficient version of encode_compare given prior knowledge from the CIGAR string that the read and reference match at this position. Note that there is no analogous version when there is a known substitution because substitutions are relatively infrequent, so optimizing their processing would speed the program only slightly while making the source code more complex and harder to maintain.

seismicrna.relate.py.encode.encode_relate(ref_base: str, read_base: str, read_qual: str, min_qual: str)

Encode the relation between a base in the read and a base in the reference sequence. If the read quality is sufficient, return the match encoding if the read and reference bases match; otherwise, return the encoding of a substitution to the read base. If the read quality is insufficient, return the fully ambiguous encoding: a match or a substitution to any base other than the reference base (a “substitution to the reference base” would be a match, not a substitution).

Parameters:
  • ref_base (DNA) – Base in the reference sequence.

  • read_base (DNA) – Base in the read sequence.

  • read_qual (str) – ASCII encoding for the Phred quality score of the read base.

  • min_qual (str) – Minimum value of read_qual at which the relation is not called ambiguous.
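
The logic of encode_relate, and why encode_match is a special case of it, can be sketched as follows. The bit values are assumptions chosen for illustration (one bit for a match and one bit per substituted base, so an ambiguous relation is the bitwise OR of its possibilities); the function names are hypothetical. Quality characters are compared directly, which works because, with a fixed Phred offset, ASCII order equals score order.

```python
# Assumed bit values for illustration only.
MATCH = 0b0000_0001
SUB_A = 0b0001_0000
SUB_C = 0b0010_0000
SUB_G = 0b0100_0000
SUB_T = 0b1000_0000
SUBS = {"A": SUB_A, "C": SUB_C, "G": SUB_G, "T": SUB_T}


def encode_relate_sketch(ref_base: str, read_base: str,
                         read_qual: str, min_qual: str) -> int:
    # Assumption: an N base call is treated like a low-quality call.
    if read_qual >= min_qual and read_base in SUBS:
        return MATCH if read_base == ref_base else SUBS[read_base]
    # Low quality: a match, or a substitution to any base except ref_base.
    return MATCH | sum(bit for base, bit in SUBS.items() if base != ref_base)


def encode_match_sketch(read_base: str, read_qual: str, min_qual: str) -> int:
    # The CIGAR says the bases match, so the reference base equals read_base.
    return encode_relate_sketch(read_base, read_base, read_qual, min_qual)
```

With min_qual "5" (Phred 20 under the +33 offset), a high-quality G over a reference A gives SUB_G, while a low-quality call ("#", Phred 2) over the same A gives MATCH | SUB_C | SUB_G | SUB_T. Routing encode_match_sketch through encode_relate_sketch keeps the two consistent; the package's encode_match is described as a faster specialization instead.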

exception seismicrna.relate.py.error.RelateError

Bases: Exception

Any error that occurs during relating.

exception seismicrna.relate.py.error.RelateNotImplementedError

Bases: RelateError, NotImplementedError

Any NotImplementedError that occurs during relating.

exception seismicrna.relate.py.error.RelateValueError

Bases: RelateError, ValueError

Any ValueError that occurs during relating.
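
Because each specific error inherits from both RelateError and a built-in exception, callers can catch it either way. The sketch below reproduces the hierarchy listed above and adds a hypothetical validator (check_mapq, not part of the package) that relies on the SAM rule that MAPQ lies in [0, 255].

```python
class RelateError(Exception):
    """Any error that occurs during relating."""


class RelateNotImplementedError(RelateError, NotImplementedError):
    """Any NotImplementedError that occurs during relating."""


class RelateValueError(RelateError, ValueError):
    """Any ValueError that occurs during relating."""


def check_mapq(mapq: int) -> int:
    """Hypothetical validator: SAM MAPQ must lie in [0, 255]."""
    if not 0 <= mapq <= 255:
        raise RelateValueError(f"MAPQ must be in [0, 255], but got {mapq}")
    return mapq
```

Generic code can use "except ValueError", while code that wants to handle only relating failures can use "except RelateError"; a RelateValueError is caught by both.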

class seismicrna.relate.py.relate.SamFlag(flag: int)

Bases: object

Represents the set of 12 boolean flags for a SAM record.

first
flag
paired
rev
second
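
The attributes above map onto individual bits of the SAM FLAG field. A minimal sketch of the decoding follows; SamFlagSketch is hypothetical, and the bit assignments come from the SAM specification (0x1 template has multiple segments, 0x10 SEQ is reverse-complemented, 0x40 first segment, 0x80 last segment).

```python
class SamFlagSketch:
    """Hypothetical decoder for the 12-bit SAM FLAG field."""

    def __init__(self, flag: int):
        if not 0 <= flag < 2 ** 12:
            raise ValueError(f"FLAG must be in [0, 4095], but got {flag}")
        self.flag = flag
        self.paired = bool(flag & 0x1)   # template has multiple segments
        self.rev = bool(flag & 0x10)     # SEQ is reverse-complemented
        self.first = bool(flag & 0x40)   # first segment in the template
        self.second = bool(flag & 0x80)  # last segment in the template
```

For example, flag 99 (0x1 + 0x2 + 0x20 + 0x40) is a paired, forward-strand first mate, and flag 147 (0x1 + 0x2 + 0x10 + 0x80) is its reverse-strand second mate.
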
class seismicrna.relate.py.relate.SamRead(line: str)

Bases: object

One read in a SAM file.

MIN_FIELDS = 11
cigar
flag
mapq
pos
qname
qual
rname
seq
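
The MIN_FIELDS value of 11 matches the number of mandatory SAM fields. A sketch of splitting one SAM line into the attributes above follows; SamReadSketch is hypothetical and keeps flag as a plain int, whereas the package's SamRead may store it differently (for example, as a SamFlag).

```python
MIN_FIELDS = 11  # number of mandatory fields in a SAM alignment line


class SamReadSketch:
    """Hypothetical parser for one SAM alignment line."""

    def __init__(self, line: str):
        fields = line.rstrip("\n").split("\t")
        if len(fields) < MIN_FIELDS:
            raise ValueError(f"Expected at least {MIN_FIELDS} fields, "
                             f"but got {len(fields)}")
        self.qname = fields[0]
        self.flag = int(fields[1])
        self.rname = fields[2]
        self.pos = int(fields[3])   # 1-based leftmost mapping position
        self.mapq = int(fields[4])
        self.cigar = fields[5]
        # fields[6:9] are RNEXT, PNEXT, and TLEN, which are not kept here.
        self.seq = fields[9]
        self.qual = fields[10]
        # Any fields after the eleventh are optional TAG:TYPE:VALUE fields.
```

Given the line "read1\t99\tref1\t5\t42\t4M\t=\t20\t19\tACGT\tIIII\tNM:i:0", the sketch yields qname "read1", flag 99, pos 5, mapq 42, cigar "4M", seq "ACGT", and qual "IIII".
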
seismicrna.relate.py.relate.find_rels_line(line1: str, line2: str, ref: str, refseq: DNA, min_mapq: int, qmin: str, ambindel: bool, overhangs: bool, clip_end5: int = 0, clip_end3: int = 0)