seismicrna.core.mu package

Submodules

Pairwise comparisons of mutation rates.

seismicrna.core.mu.compare.calc_coeff_determ(mus1: ndarray | Series | DataFrame, mus2: ndarray | Series | DataFrame)

Calculate the coefficient of determination (a.k.a. R-squared) between two groups of mutation rates, ignoring NaNs.

Parameters:
  • mus1 (np.ndarray | pd.Series | pd.DataFrame) – First group of mutation rates; can contain multiple sets as the columns of a multidimensional array or DataFrame.

  • mus2 (np.ndarray | pd.Series | pd.DataFrame) – Second group of mutation rates; can contain multiple sets as the columns of a multidimensional array or DataFrame.

Returns:

Coefficient of determination.

Return type:

float | np.ndarray | pd.Series

seismicrna.core.mu.compare.calc_nrmsd(mus1: ndarray | Series | DataFrame, mus2: ndarray | Series | DataFrame)

Calculate the normalized root-mean-square deviation (NRMSD) of two groups of mutation rates, ignoring NaNs.

Parameters:
  • mus1 (np.ndarray | pd.Series | pd.DataFrame) – First group of mutation rates; can contain multiple sets as the columns of a multidimensional array or DataFrame.

  • mus2 (np.ndarray | pd.Series | pd.DataFrame) – Second group of mutation rates; can contain multiple sets as the columns of a multidimensional array or DataFrame.

Returns:

Normalized root-mean-square deviation (NRMSD)

Return type:

np.ndarray | pd.Series | pd.DataFrame

seismicrna.core.mu.compare.calc_pearson(mus1: ndarray | Series | DataFrame, mus2: ndarray | Series | DataFrame)

Calculate the Pearson correlation coefficient between two groups of mutation rates, ignoring NaNs.

Parameters:
  • mus1 (np.ndarray | pd.Series | pd.DataFrame) – First group of mutation rates; can contain multiple sets as the columns of a multidimensional array or DataFrame.

  • mus2 (np.ndarray | pd.Series | pd.DataFrame) – Second group of mutation rates; can contain multiple sets as the columns of a multidimensional array or DataFrame.

Returns:

Pearson correlation coefficient.

Return type:

float | np.ndarray | pd.Series
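
To illustrate what "ignoring NaNs" means for a pairwise comparison, the sketch below drops every position where either input is NaN and then computes the Pearson coefficient with NumPy. For a simple linear relationship, the coefficient of determination equals the square of this value. The helper name pearson_ignoring_nan is hypothetical; this is a sketch on 1D inputs, not the package's own implementation.

```python
import numpy as np


def pearson_ignoring_nan(mus1, mus2) -> float:
    """Pearson correlation over positions where neither input is NaN
    (hypothetical helper for 1D inputs)."""
    mus1 = np.asarray(mus1, dtype=float)
    mus2 = np.asarray(mus2, dtype=float)
    # Keep only the positions that are not NaN in both inputs.
    keep = ~(np.isnan(mus1) | np.isnan(mus2))
    x, y = mus1[keep], mus2[keep]
    dx, dy = x - x.mean(), y - y.mean()
    return float(np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2)))


mus1 = np.array([0.01, 0.02, np.nan, 0.04, 0.08])
mus2 = np.array([0.02, 0.04, 0.05, np.nan, 0.16])
r = pearson_ignoring_nan(mus1, mus2)
r_squared = r ** 2
```

Only positions 0, 1, and 4 survive the NaN filter here, and on those positions mus2 is exactly twice mus1.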

seismicrna.core.mu.compare.calc_rmsd(mus1: ndarray | Series | DataFrame, mus2: ndarray | Series | DataFrame)

Calculate the root-mean-square deviation (RMSD) of two groups of mutation rates, ignoring NaNs.

Parameters:
  • mus1 (np.ndarray | pd.Series | pd.DataFrame) – First group of mutation rates; can contain multiple sets as the columns of a multidimensional array or DataFrame.

  • mus2 (np.ndarray | pd.Series | pd.DataFrame) – Second group of mutation rates; can contain multiple sets as the columns of a multidimensional array or DataFrame.

Returns:

Root-mean-square deviation (RMSD)

Return type:

np.ndarray | pd.Series | pd.DataFrame
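
As a point of reference, a textbook RMSD that ignores NaNs can be sketched as follows. The helper name rmsd_ignoring_nan is hypothetical, and the sketch assumes 1D inputs and applies no scaling; the package functions may standardize or normalize the inputs before comparing them.

```python
import numpy as np


def rmsd_ignoring_nan(mus1, mus2) -> float:
    """Root-mean-square deviation over positions where neither input
    is NaN (hypothetical helper for 1D inputs, no scaling)."""
    mus1 = np.asarray(mus1, dtype=float)
    mus2 = np.asarray(mus2, dtype=float)
    keep = ~(np.isnan(mus1) | np.isnan(mus2))
    diff = mus1[keep] - mus2[keep]
    return float(np.sqrt(np.mean(diff ** 2)))


value = rmsd_ignoring_nan([0.0, 0.3, np.nan, 0.1], [0.0, 0.0, 0.2, 0.5])
```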

seismicrna.core.mu.compare.calc_spearman(mus1: ndarray | Series | DataFrame, mus2: ndarray | Series | DataFrame)

Calculate the Spearman rank correlation coefficient between two groups of mutation rates, ignoring NaNs.

Parameters:
  • mus1 (np.ndarray | pd.Series | pd.DataFrame) – First group of mutation rates; can contain multiple sets as the columns of a multidimensional array or DataFrame.

  • mus2 (np.ndarray | pd.Series | pd.DataFrame) – Second group of mutation rates; can contain multiple sets as the columns of a multidimensional array or DataFrame.

Returns:

Spearman rank correlation coefficient.

Return type:

float | np.ndarray | pd.Series
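
The Spearman coefficient is the Pearson coefficient of the ranks. The sketch below shows that relationship with pandas, dropping positions where either Series is NaN before ranking. The helper name spearman_ignoring_nan is hypothetical and the code is an illustration rather than the package's implementation.

```python
import numpy as np
import pandas as pd


def spearman_ignoring_nan(mus1: pd.Series, mus2: pd.Series) -> float:
    """Spearman correlation computed as the Pearson correlation of
    ranks, after dropping positions with NaN (hypothetical helper)."""
    both = pd.concat([mus1, mus2], axis=1).dropna()
    ranks = both.rank()  # ties receive their average rank
    return float(ranks.iloc[:, 0].corr(ranks.iloc[:, 1]))


mus1 = pd.Series([0.01, 0.05, np.nan, 0.03, 0.20])
mus2 = pd.Series([0.10, 0.90, 0.50, 0.40, 16.0])
rho = spearman_ignoring_nan(mus1, mus2)
```

Because the remaining values have the same ordering in both Series, rho is 1 even though the relationship is far from linear.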

seismicrna.core.mu.compare.compare_windows(mus1: Series, mus2: Series, method: str | Callable, size: int, min_count: int = 2)

Compare two Series via sliding windows.
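
The general idea of a sliding-window comparison can be sketched as below. Everything about the windowing here is an assumption made for illustration: windows cover size consecutive positions, each result is labeled by the first position of its window, and a window with fewer than min_count positions that are non-NaN in both Series yields NaN. The helper name sliding_compare is hypothetical, and the package function may define its windows differently.

```python
import numpy as np
import pandas as pd


def sliding_compare(mus1: pd.Series, mus2: pd.Series, func,
                    size: int, min_count: int = 2) -> pd.Series:
    """Apply func to every window of `size` consecutive positions
    (hypothetical helper; window labeling is an assumption)."""
    values1 = mus1.to_numpy(dtype=float)
    values2 = mus2.to_numpy(dtype=float)
    results = {}
    for start in range(len(values1) - size + 1):
        win1 = values1[start: start + size]
        win2 = values2[start: start + size]
        # Use only the positions that are not NaN in both windows.
        keep = ~(np.isnan(win1) | np.isnan(win2))
        if keep.sum() >= min_count:
            results[mus1.index[start]] = func(win1[keep], win2[keep])
        else:
            results[mus1.index[start]] = np.nan
    return pd.Series(results, dtype=float)


positions = [11, 12, 13, 14]
mus1 = pd.Series([0.1, 0.2, np.nan, 0.4], index=positions)
mus2 = pd.Series([0.1, 0.4, 0.3, np.nan], index=positions)
windows = sliding_compare(mus1, mus2,
                          lambda a, b: float(np.max(np.abs(a - b))),
                          size=2)
```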

seismicrna.core.mu.compare.get_comp_func(key: str) Callable

Get the function of a comparison method based on its key.

Parameters:

key (str) – Key with which to retrieve the comparison function.

Returns:

Function to compare mutation rates.

Return type:

Callable

seismicrna.core.mu.compare.get_comp_name(key: str) str

Get the name of a comparison method based on its key.

Parameters:

key (str) – Key with which to retrieve the comparison method name.

Returns:

Name of the comparison method.

Return type:

str

seismicrna.core.mu.dim.count_pos(mus: ndarray | Series | DataFrame)

Count the positions in an array of mutation rates.

Parameters:

mus (numpy.ndarray | pandas.Series | pandas.DataFrame) – Mutation rates. Multiple sets of mutation rates can be given as columns of a multidimensional array or DataFrame.

Returns:

Number of positions in the array of mutation rates.

Return type:

int

seismicrna.core.mu.dim.counts_pos(*mus: ndarray | Series | DataFrame)

Count the positions in each array of mutation rates.

Parameters:

*mus (numpy.ndarray | pandas.Series | pandas.DataFrame) – Groups of mutation rates; each can contain multiple sets as the columns of a multidimensional array or DataFrame.

Returns:

Number of positions in each array of mutation rates.

Return type:

tuple[int, ...]

seismicrna.core.mu.dim.counts_pos_consensus(*mus: ndarray | Series | DataFrame)

Find the number of positions in every array of mutation rates; every array must have the same number of positions.

Parameters:

*mus (numpy.ndarray | pandas.Series | pandas.DataFrame) – Groups of mutation rates; each can contain multiple sets as the columns of a multidimensional array or DataFrame.

Returns:

Number of positions in every array of mutation rates.

Return type:

int

seismicrna.core.mu.frame.auto_reframe(func: Callable)

Decorate a function with one positional argument of data so that it converts the input data to a NumPy array, runs, and then reframes the return value using the original argument as the target.

Note that if @auto_reframe and @auto_remove_nan are used to decorate the same function, then auto_reframe should be the inner decorator. If auto_remove_nan is the inner decorator and removes any NaNs, then auto_reframe will attempt to broadcast the NaN-less axis 0 over the original (longer) axis 0. This operation would raise a ValueError or, worse, if the NaN-less axis 0 happened to have length 1, would still broadcast to the original axis, causing a silent bug.

seismicrna.core.mu.frame.reframe(values: Number | ndarray | Series | DataFrame, axes: None | tuple[int | ndarray | Index, ...] = None)

Place the values in an array object with the given axes.

Parameters:
  • values (Number | numpy.ndarray | pandas.Series | pandas.DataFrame) – Value(s) to place in a new array-like object.

  • axes (None | tuple[int | numpy.ndarray | pandas.Index, ...] = None) –

    Axes of the new array-like object, specified as follows:

    • If None, then return just the values as a NumPy array.

    • If a tuple, then each element creates an axis as follows:

      • If an integer, then force the corresponding axis to be of that length.

      • If an array-like, then assign the axis a Pandas Index from the values in the element.

      Then, the array and index types are determined as follows:

      • If all elements are integers, then return a NumPy array in which the values are broadcast to the shape given by axes.

      • If at least one element is array-like, then return a Pandas object (a Series if axes has one item, a DataFrame if two).

      • If integers and array-like items are mixed, then replace each integer with a Pandas RangeIndex.

Returns:

Value(s) in their new array-like object.

Return type:

numpy.ndarray | pandas.Series | pandas.DataFrame
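
The rules for axes listed above can be sketched with NumPy broadcasting and pandas constructors. The function reframe_sketch is a hypothetical stand-in written only to illustrate those rules; it is not the package's reframe and it handles at most two axes when returning a pandas object.

```python
import numpy as np
import pandas as pd


def reframe_sketch(values, axes=None):
    """Illustrate the axes rules (hypothetical helper)."""
    values = np.asarray(values)
    if axes is None:
        # No axes: return just the values as a NumPy array.
        return values
    if all(isinstance(axis, int) for axis in axes):
        # All integers: broadcast the values to the given shape.
        return np.broadcast_to(values, axes)
    # At least one array-like: integers become RangeIndex objects.
    indexes = [pd.RangeIndex(axis) if isinstance(axis, int)
               else pd.Index(axis)
               for axis in axes]
    shape = tuple(len(index) for index in indexes)
    data = np.broadcast_to(values, shape).copy()
    if len(indexes) == 1:
        return pd.Series(data, index=indexes[0])
    if len(indexes) == 2:
        return pd.DataFrame(data, index=indexes[0], columns=indexes[1])
    raise ValueError("This sketch supports at most two pandas axes")


array = reframe_sketch(1.0, (2, 3))
series = reframe_sketch([1, 2, 3], (["a", "b", "c"],))
frame = reframe_sketch(0.5, (2, ["x", "y"]))
```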

seismicrna.core.mu.frame.reframe_like(values: Number | ndarray | Series | DataFrame, target: ndarray | Series | DataFrame, drop: int = 0)

Place the values in an array object with the same type and axes as target.

Parameters:
  • values (Number | numpy.ndarray | pandas.Series | pandas.DataFrame) – Value(s) to place in a new array-like object.

  • target (numpy.ndarray | pandas.Series | pandas.DataFrame) – Array object whose type and axes are to be used for constructing the returned array.

  • drop (int = 0) – Reduce the dimensionality of the target by dropping this number of axes, starting from axis 0 and continuing upwards.

Returns:

Value(s) in their new array-like object.

Return type:

numpy.ndarray | pandas.Series | pandas.DataFrame

Calculate trends in mutation rates.

seismicrna.core.mu.measure.calc_gini(mus: ndarray | Series | DataFrame)

Calculate the Gini coefficient of mutation rates, ignoring NaNs.

Parameters:

mus (numpy.ndarray | pandas.Series | pandas.DataFrame) – Mutation rates. Multiple sets of mutation rates can be given as columns of a multidimensional array or DataFrame.

Returns:

Value of the Gini coefficient.

Return type:

float | numpy.ndarray | pandas.Series
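
For orientation, one common definition of the Gini coefficient is half the mean absolute difference between all pairs of values, divided by the mean. The sketch below applies that definition to a 1D input after dropping NaNs. The helper name gini_ignoring_nan is hypothetical, and the package may use a different estimator.

```python
import numpy as np


def gini_ignoring_nan(mus) -> float:
    """Gini coefficient from the mean absolute difference of all
    pairs of non-NaN values (hypothetical helper for 1D inputs)."""
    mus = np.asarray(mus, dtype=float)
    mus = mus[~np.isnan(mus)]
    # Absolute difference between every ordered pair of values.
    diffs = np.abs(mus[:, np.newaxis] - mus[np.newaxis, :])
    return float(diffs.sum() / (2 * mus.size ** 2 * mus.mean()))


gini_equal = gini_ignoring_nan([1.0, 1.0, np.nan, 1.0])
gini_skewed = gini_ignoring_nan([0.0, 0.0, 0.0, 1.0])
```

Equal mutation rates give a coefficient of 0, while concentrating all of the signal in one of four positions gives 0.75 under this definition.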

seismicrna.core.mu.measure.calc_signal_noise(mus: ndarray | Series | DataFrame, is_signal: ndarray | Series)

Calculate the signal-to-noise ratio of mutation rates.

Parameters:
  • mus (numpy.ndarray | pandas.Series | pandas.DataFrame) – Mutation rates. Multiple sets of mutation rates can be given as columns of a DataFrame.

  • is_signal (np.ndarray | pd.Series) – Whether to count each position as signal.

Returns:

Signal-to-noise ratio.

Return type:

float | numpy.ndarray | pandas.Series

Comparisons of arbitrary numbers of mutation rates.

seismicrna.core.mu.nan.any_nan(mus: ndarray | Series | DataFrame)

Boolean array of positions where any mutation rate is NaN.

Parameters:

mus (numpy.ndarray | pandas.Series | pandas.DataFrame) – Mutation rates. Multiple sets of mutation rates can be given as columns of a multidimensional array or DataFrame.

Returns:

Boolean array of positions where any mutation rate is NaN.

Return type:

numpy.ndarray | pandas.Series

seismicrna.core.mu.nan.auto_remove_nan(func: Callable)

Decorate a function with one positional argument of mutation rates so that it automatically removes positions with NaNs from the input argument. If the function produces any new NaNs while using the NaN-less input, then those NaNs will be returned.

Note that if @auto_reframe and @auto_remove_nan are used to decorate the same function, then auto_reframe should be the inner decorator. If auto_remove_nan is the inner decorator and removes any NaNs, then auto_reframe will attempt to broadcast the NaN-less axis 0 over the original (longer) axis 0. This operation would raise a ValueError or, worse, if the NaN-less axis 0 happened to have length 1, would still broadcast to the original axis, causing a silent bug.

seismicrna.core.mu.nan.auto_removes_nan(func: Callable)

Decorate a function with positional argument(s) of mutation rates so that it automatically removes positions with NaNs from the input argument(s). If the function produces any new NaNs while using the NaN-less input, then those NaNs will be returned.

seismicrna.core.mu.nan.no_nan(mus: ndarray | Series | DataFrame)

Boolean array of positions where no mutation rate is NaN.

Parameters:

mus (numpy.ndarray | pandas.Series | pandas.DataFrame) – Mutation rates. Multiple sets of mutation rates can be given as columns of a multidimensional array or DataFrame.

Returns:

Boolean array of positions where no mutation rate is NaN.

Return type:

numpy.ndarray | pandas.Series

seismicrna.core.mu.nan.remove_nan(mus: ndarray | Series | DataFrame)

Remove positions at which any mutation rate is NaN.

Parameters:

mus (numpy.ndarray | pandas.Series | pandas.DataFrame) – Mutation rates. Multiple sets of mutation rates can be given as columns of a multidimensional array or DataFrame.

Returns:

Mutation rates without NaN values.

Return type:

tuple[numpy.ndarray | pandas.Series | pandas.DataFrame, ...]

seismicrna.core.mu.nan.removes_nan(*mus: ndarray | Series | DataFrame)

Remove positions at which any mutation rate in any group is NaN.

Parameters:

*mus (numpy.ndarray | pandas.Series | pandas.DataFrame) – Groups of mutation rates; each can contain multiple sets as the columns of a multidimensional array or DataFrame.

Returns:

Mutation rates without NaN values.

Return type:

tuple[numpy.ndarray | pandas.Series | pandas.DataFrame, ...]
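
Removing positions jointly across several groups can be sketched as below: a position (a row along axis 0) is kept only if no group has a NaN there in any column. The helper name remove_nan_positions is hypothetical, and unlike the package functions this sketch returns plain NumPy arrays rather than preserving pandas types.

```python
import numpy as np


def remove_nan_positions(*mus):
    """Drop the positions (axis 0) at which any group has a NaN in
    any column (hypothetical helper returning NumPy arrays)."""
    keep = np.ones(np.shape(mus[0])[0], dtype=bool)
    for group in mus:
        isnan = np.isnan(np.asarray(group, dtype=float))
        # Flatten every axis after axis 0 so that a NaN in any
        # column marks the whole position for removal.
        keep &= ~isnan.reshape(isnan.shape[0], -1).any(axis=1)
    return tuple(np.asarray(group)[keep] for group in mus)


group1 = np.array([0.1, np.nan, 0.3, 0.4])
group2 = np.array([[0.1, 0.2],
                   [0.3, 0.4],
                   [np.nan, 0.6],
                   [0.7, 0.8]])
clean1, clean2 = remove_nan_positions(group1, group2)
```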

Scale mutation rates.

seismicrna.core.mu.scale.calc_quantile(mus: ndarray | Series | DataFrame, quantile: float)

Calculate the mutation rate at a quantile, ignoring NaNs.

Parameters:
  • mus (numpy.ndarray | pandas.Series | pandas.DataFrame) – Mutation rates. Multiple sets of mutation rates can be given as columns of a multidimensional array or DataFrame.

  • quantile (float) – Quantile to return from the mutation rates; must be in [0, 1].

Returns:

Value of the quantile from the mutation rates.

Return type:

float | numpy.ndarray | pandas.Series

seismicrna.core.mu.scale.calc_ranks(mus: ndarray | Series | DataFrame)

Rank the mutation rates.

Parameters:

mus (numpy.ndarray | pandas.Series | pandas.DataFrame) – Mutation rates. Multiple sets of mutation rates can be given as columns of a multidimensional array or DataFrame.

Returns:

Ranks of the mutation rates.

Return type:

numpy.ndarray | pandas.Series | pandas.DataFrame

seismicrna.core.mu.scale.calc_rms(mus: ndarray | Series | DataFrame)

Calculate the root-mean-square mutation rate, ignoring NaNs.

Parameters:

mus (np.ndarray | pd.Series | pd.DataFrame) – Mutation rates. Multiple sets of mutation rates can be given as columns of a multidimensional array or DataFrame.

Returns:

Root-mean-square mutation rate.

Return type:

float | numpy.ndarray | pandas.Series

seismicrna.core.mu.scale.normalize(mus: ndarray | Series | DataFrame, quantile: float)

Normalize the mutation rates to a quantile, so that the value of the quantile is scaled to 1 and all other mutation rates are scaled by the same factor. If quantile is 0, then do not normalize.

Parameters:
  • mus (numpy.ndarray | pandas.Series | pandas.DataFrame) – Mutation rates. Multiple sets of mutation rates can be given as columns of a multidimensional array or DataFrame.

  • quantile (float) – Quantile for normalizing the mutation rates; must be in [0, 1].

Returns:

Normalized mutation rates.

Return type:

numpy.ndarray | pandas.Series | pandas.DataFrame

seismicrna.core.mu.scale.standardize(mus: ndarray | Series | DataFrame)

Standardize mutation rates so that the root-mean-square mutation rate equals 1. Note that the standardized mutation rates are not capped at 1: unless all mutation rates have the same magnitude, some standardized mutation rates will be greater than 1.

Parameters:

mus (numpy.ndarray | pandas.Series | pandas.DataFrame) – Mutation rates. Multiple sets of mutation rates can be given as columns of a multidimensional array or DataFrame.

Returns:

Standardized mutation rates.

Return type:

numpy.ndarray | pandas.Series | pandas.DataFrame
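
Standardizing to a root-mean-square of 1 amounts to dividing by the root-mean-square mutation rate. A minimal NumPy sketch follows, assuming positions lie along axis 0 and NaNs are ignored when computing the root-mean-square; the helper name standardize_sketch is hypothetical.

```python
import numpy as np


def standardize_sketch(mus):
    """Divide mutation rates by their root-mean-square, ignoring
    NaNs (hypothetical helper; positions along axis 0)."""
    mus = np.asarray(mus, dtype=float)
    rms = np.sqrt(np.nanmean(mus ** 2, axis=0))
    return mus / rms


scaled = standardize_sketch(np.array([0.03, np.nan, 0.04]))
rms_after = np.sqrt(np.nanmean(scaled ** 2))
```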

seismicrna.core.mu.scale.winsorize(mus: ndarray | Series | DataFrame, quantile: float)

Normalize and winsorize the mutation rates to a quantile so that all mutation rates greater than or equal to the mutation rate at the quantile are set to 1, and lesser mutation rates are normalized.

Parameters:
  • mus (numpy.ndarray | pandas.Series | pandas.DataFrame) – Mutation rates. Multiple sets of mutation rates can be given as columns of a multidimensional array or DataFrame.

  • quantile (float) – Quantile for normalizing the mutation rates; must be in [0, 1].

Returns:

Normalized and winsorized mutation rates.

Return type:

numpy.ndarray | pandas.Series | pandas.DataFrame

Mutation Rate Core Module


The functions in this module serve two main purposes:
  1. Adjust mutation rates to correct for observer bias.

  2. Normalize and winsorize mutation rates.


Adjust mutation rates to correct for observer bias

Our lab has found that pairs of mutations in DMS-MaPseq data rarely have fewer than three non-mutated bases separating them. We suspect that the reverse transcriptase is prone to falling off or stalling at locations in the RNA where DMS has methylated bases that are too close.

Regardless of the reason, mutations on nearby bases are not independent, which violates a core assumption of the Bernoulli mixture model that we use in the expectation-maximization clustering algorithm: namely, that mutations occur independently of each other, such that the likelihood of observing a bit vector is the product of the mutation rate of each base that is mutated and one minus the mutation rate (“non-mutation rate”) of each base that is not mutated.
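
The independence assumption described above corresponds to the following likelihood for a single bit vector. The function name bit_vector_likelihood is hypothetical and the sketch covers one cluster only.

```python
import numpy as np


def bit_vector_likelihood(mus, bits) -> float:
    """Likelihood of a bit vector if every base mutates
    independently (hypothetical helper for one cluster)."""
    mus = np.asarray(mus, dtype=float)
    bits = np.asarray(bits, dtype=bool)
    # Mutated bases contribute mu; non-mutated bases contribute 1 - mu.
    return float(np.prod(np.where(bits, mus, 1.0 - mus)))


# One mutated base (rate 0.1) and two non-mutated bases (rates 0.2, 0.5):
likelihood = bit_vector_likelihood([0.1, 0.2, 0.5], [1, 0, 0])
```

Here the likelihood is 0.1 × 0.8 × 0.5 = 0.04.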

In order to use the Bernoulli mixture model for expectation-maximization clustering, we modify it such that bases separated by zero, one, or two other bases are no longer assumed to mutate independently. Specifically, since pairs of close mutations are rare, we add the assumption that no mutations separated by fewer than three non-mutated bases may occur, and exclude the few bit vectors that have such close mutations.
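
A bit vector can be screened for close mutations as sketched below. Two mutations at positions i < j are separated by j - i - 1 bases, so they are too close when j - i is at most the minimum gap. The function name has_close_mutations and the default of 3 (taken from the text above) are illustrative assumptions.

```python
import numpy as np


def has_close_mutations(bits, min_gap: int = 3) -> bool:
    """Whether any two mutations are separated by fewer than min_gap
    non-mutated bases (hypothetical helper)."""
    positions = np.flatnonzero(np.asarray(bits, dtype=bool))
    # Only consecutive mutations need checking: every base between
    # them is non-mutated, so the gap between them is diff - 1.
    return bool(np.any(np.diff(positions) <= min_gap))


too_close = has_close_mutations([1, 0, 0, 1])    # 2 bases between
far_enough = has_close_mutations([1, 0, 0, 0, 1])  # 3 bases between
```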

When these bit vectors are assumed to exist in the original population of RNA molecules but do not appear in the observed data, the mutation rates that are observed will differ from the real, underlying mutation rates, which would include the unobserved bit vectors. The real mutation rates must therefore be estimated from the observed mutation rates.

It is relatively easy to estimate the observed mutation rates given the real mutation rates, but there is no closed-form solution that we know of to estimate the real mutation rates from the observed mutation rates. Thus, we use an iterative approach, based on Newton’s method for finding the roots of functions. We initially guess the real mutation rates. In each iteration, we estimate the mutation rates that would have been observed given our current guess of the real mutation rates, and then subtract the mutation rates that were actually observed. This difference would be zero if we had accurately guessed the real mutation rates. Thus, we use Newton’s method to solve for the real mutation rates at which this difference is zero. The details are described in the comments of this module and in Tomezsko et al. (2020) (https://doi.org/10.1038/s41586-020-2253-5).
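
The iterative idea can be illustrated on a deliberately tiny toy problem, which is not the module's algorithm: two adjacent positions, where any read with both positions mutated is dropped. The forward model observed_given_real, the solver solve_real, and the finite-difference Jacobian are all assumptions made for this sketch.

```python
import numpy as np


def observed_given_real(real: np.ndarray) -> np.ndarray:
    """Toy forward model: two adjacent positions, and every read
    with both positions mutated is dropped (hypothetical)."""
    p1, p2 = real
    p_kept = 1.0 - p1 * p2  # probability that a read is kept
    return np.array([p1 * (1.0 - p2), p2 * (1.0 - p1)]) / p_kept


def solve_real(observed, max_iter: int = 128, thresh: float = 1e-12,
               eps: float = 1e-7) -> np.ndarray:
    """Newton's method with a finite-difference Jacobian."""
    observed = np.asarray(observed, dtype=float)
    real = observed.copy()  # initial guess: the observed rates
    for _ in range(max_iter):
        # Difference between predicted and actual observed rates.
        diff = observed_given_real(real) - observed
        jac = np.empty((real.size, real.size))
        for j in range(real.size):
            shifted = real.copy()
            shifted[j] += eps
            jac[:, j] = (observed_given_real(shifted)
                         - observed_given_real(real)) / eps
        step = np.linalg.solve(jac, diff)
        real = real - step
        # Stop when the root-mean-square update is negligible.
        if np.sqrt(np.mean(step ** 2)) < thresh:
            break
    return real


true_rates = np.array([0.1, 0.2])
observed = observed_given_real(true_rates)
estimated = solve_real(observed)
```

In this toy case the observed rates are lower than the real rates, and the solver recovers the real rates from the observed ones.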


Normalize and winsorize mutation rates

The mutation rates of an RNA may vary among different samples because of variations in the chemical probing and mutational profiling procedure. Thus, it is often helpful to compute “normalized” mutation rates that can be compared directly across different samples and used to predict secondary structures.

This module currently provides a simple method for normalizing mutation rates. First, a specific quantile of the dataset is chosen, such as 0.95 (i.e. the 95th percentile). The mutation rate at this quantile is set to 1.0, and all other mutation rates are scaled linearly by the same factor.

If the chosen quantile is less than 1.0, then any mutation rates above the quantile will be scaled to values greater than 1.0. Since these high mutation rates may come from exceptionally reactive bases, it is often helpful to cap the normalized mutation rates to a maximum of 1.0. The winsorize function in this module performs normalization and then sets any value greater than 1.0 to 1.0.
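
The two steps described above can be sketched with NumPy as follows. The function names normalize_sketch and winsorize_sketch are hypothetical; the sketch assumes positions lie along axis 0 and that NaNs are ignored when computing the quantile.

```python
import numpy as np


def normalize_sketch(mus, quantile: float):
    """Scale mutation rates so that the value at the quantile
    becomes 1 (hypothetical helper; quantile 0 means no scaling)."""
    mus = np.asarray(mus, dtype=float)
    if quantile == 0:
        return mus
    return mus / np.nanquantile(mus, quantile, axis=0)


def winsorize_sketch(mus, quantile: float):
    """Normalize, then cap every value greater than 1 at 1."""
    return np.minimum(normalize_sketch(mus, quantile), 1.0)


mus = np.array([0.00, 0.01, 0.02, 0.03, 0.04, np.nan])
normalized = normalize_sketch(mus, 0.5)
winsorized = winsorize_sketch(mus, 0.5)
```

With a quantile of 0.5, the median rate (0.02) maps to 1, so the normalized rates reach 2.0 while the winsorized rates stop at 1.0; the NaN position stays NaN in both.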

seismicrna.core.mu.unbias.calc_p_clust(p_clust_observed: ndarray, p_noclose: ndarray)

Cluster proportion among all reads.

Parameters:
  • p_clust_observed (np.ndarray) – Proportion of each cluster among reads with no two mutations too close. 1D (clusters)

  • p_noclose (np.ndarray) – Probability that a read from each cluster would have no two mutations too close. 1D (clusters)

Returns:

Proportion of each cluster among all reads. 1D (clusters)

Return type:

np.ndarray

seismicrna.core.mu.unbias.calc_p_clust_given_noclose(p_clust: ndarray, p_noclose: ndarray)

Cluster proportions among reads with no two mutations too close.

Parameters:
  • p_clust (np.ndarray) – Proportion of each cluster among all reads. 1D (clusters)

  • p_noclose (np.ndarray) – Probability that a read from each cluster would have no two mutations too close. 1D (clusters)

Returns:

Proportion of each cluster among reads with no two mutations too close. 1D (clusters)

Return type:

np.ndarray
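
The relationship between the two sets of cluster proportions follows from Bayes' rule: the proportion of a cluster among reads with no two mutations too close is proportional to p_clust × p_noclose, and the inverse is proportional to p_clust_observed / p_noclose. The sketch below shows that round trip; the function names are hypothetical and the code is derived from the parameter descriptions, not copied from the package.

```python
import numpy as np


def p_clust_given_noclose_sketch(p_clust, p_noclose):
    """Cluster proportions among reads with no two mutations too
    close, via Bayes' rule (hypothetical helper)."""
    joint = np.asarray(p_clust) * np.asarray(p_noclose)
    return joint / joint.sum()


def p_clust_sketch(p_clust_observed, p_noclose):
    """Invert the relationship above (hypothetical helper)."""
    ratio = np.asarray(p_clust_observed) / np.asarray(p_noclose)
    return ratio / ratio.sum()


p_clust = np.array([0.6, 0.4])
p_noclose = np.array([0.9, 0.5])
p_observed = p_clust_given_noclose_sketch(p_clust, p_noclose)
p_recovered = p_clust_sketch(p_observed, p_noclose)
```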

seismicrna.core.mu.unbias.calc_p_ends_given_noclose(p_ends: ndarray, p_noclose_given_ends: ndarray)

Calculate, among reads with no two mutations too close, the proportion of reads with each pair of 5’ and 3’ coordinates.

Assumptions

  • p_ends has 2 dimensions: (positions x positions)

  • Every value in the upper triangle of p_ends is ≥ 0 and ≤ 1; no values below the main diagonal are used.

  • The upper triangle of p_ends sums to 1.

  • min_gap is a non-negative integer.

  • p_mut_given_span has 2 dimensions: (positions x clusters)

  • Every value in p_mut_given_span is ≥ 0 and ≤ 1.

  • There is at least 1 cluster.

Parameters:
  • p_ends (np.ndarray) – 2D (positions x positions) array of the proportion of reads in each cluster beginning at the row position and ending at the column position.

  • p_noclose_given_ends (np.ndarray) – 3D (positions x positions x clusters) array of the probabilities that a read with 5’ and 3’ coordinates corresponding to the row and column would have no two mutations too close.

Returns:

3D (positions x positions x clusters) array of the proportion of reads without mutations too close, beginning at the row position and ending at the column position, in each cluster.

Return type:

np.ndarray

seismicrna.core.mu.unbias.calc_p_ends_observed(npos: int, end5s: ndarray, end3s: ndarray, weights: ndarray | None = None, check_values: bool = True)

Calculate the proportion of each pair of 5’/3’ end coordinates observed in end5s and end3s, optionally weighted by weights.

Parameters:
  • npos (int) – Number of positions.

  • end5s (np.ndarray) – 5’ end coordinates of the reads: 1D array (reads)

  • end3s (np.ndarray) – 3’ end coordinates of the reads: 1D array (reads)

  • weights (np.ndarray | None = None) – Number of times each read occurs in each cluster: 2D array (reads x clusters)

  • check_values (bool = True) – Check that end5s, end3s, and weights are all valid.

Returns:

Fraction of reads with each 5’ (row) and 3’ (column) coordinate: 3D array (positions x positions x clusters)

Return type:

np.ndarray
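
Tallying end coordinates into a (positions x positions x clusters) array can be sketched with np.add.at, which accumulates correctly when the same pair of coordinates occurs more than once. The helper name p_ends_observed_sketch is hypothetical, input validation is omitted, and treating missing weights as a single cluster with weight 1 per read is an assumption of this sketch.

```python
import numpy as np


def p_ends_observed_sketch(npos: int, end5s, end3s, weights=None):
    """Proportion of reads with each pair of 5'/3' coordinates in
    each cluster (hypothetical helper; no validation)."""
    end5s = np.asarray(end5s)
    end3s = np.asarray(end3s)
    if weights is None:
        # Assumption: one cluster in which every read counts once.
        weights = np.ones((end5s.size, 1))
    p_ends = np.zeros((npos, npos, weights.shape[1]))
    # Add each read's weights at its (5' end, 3' end) coordinates.
    np.add.at(p_ends, (end5s, end3s), weights)
    # Normalize so that each cluster sums to 1.
    return p_ends / p_ends.sum(axis=(0, 1))


p_ends = p_ends_observed_sketch(3, end5s=[0, 0, 1], end3s=[2, 2, 2])
```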

seismicrna.core.mu.unbias.calc_p_noclose(p_ends: ndarray, p_noclose_given_ends: ndarray)

Calculate the proportion of each cluster considering only reads with no two mutations too close.

seismicrna.core.mu.unbias.calc_p_noclose_given_ends(p_mut_given_span: ndarray, min_gap: int)

Given underlying mutation rates (p_mut_given_span), calculate the probability that a read starting at position (a) and ending at position (b) would have no two mutations too close (i.e. separated by fewer than min_gap non-mutated positions), for each combination of (a) and (b) such that 1 ≤ a ≤ b ≤ L (in biological coordinates) or 0 ≤ a ≤ b < L (in Python coordinates).

Parameters:
  • p_mut_given_span (np.ndarray) – A 2D (positions x clusters) array of the underlying mutation rates, i.e. the probability that a read has a mutation at position (j) given that it contains position (j).

  • min_gap (int) – Minimum number of non-mutated bases between two mutations; must be ≥ 0.

Returns:

3D (positions x positions x clusters) array of the probability that a random read starting at position (a) (row) and ending at position (b) (column) would have no two mutations too close.

Return type:

np.ndarray
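
One way to compute such probabilities for a single cluster is a recurrence over the 3’ end of the read: either the last position is not mutated, or it is mutated and the min_gap positions before it (within the read) are not. The sketch below implements that recurrence in plain Python loops; the function name p_noclose_given_ends_sketch is hypothetical, it takes a 1D array for one cluster, and it is not necessarily how the package computes the result.

```python
import numpy as np


def p_noclose_given_ends_sketch(p_mut: np.ndarray, min_gap: int) -> np.ndarray:
    """Probability that a read spanning positions a..b has no two
    mutations too close, for one cluster (hypothetical helper).
    Elements below the main diagonal are NaN."""
    npos = p_mut.size
    p_non = 1.0 - p_mut
    result = np.full((npos, npos), np.nan)
    for a in range(npos):
        # f[k] is the probability for the span of length k that
        # starts at a; the empty span (k = 0) has probability 1.
        f = np.ones(npos - a + 1)
        for b in range(a, npos):
            # Case 1: position b is not mutated.
            prob = p_non[b] * f[b - a]
            # Case 2: position b is mutated, so the (up to) min_gap
            # positions before it within the read must not be.
            start = max(a, b - min_gap)
            prob += p_mut[b] * np.prod(p_non[start:b]) * f[start - a]
            f[b - a + 1] = prob
            result[a, b] = prob
    return result


rates = np.array([0.5, 0.5, 0.5])
gap0 = p_noclose_given_ends_sketch(rates, 0)
gap1 = p_noclose_given_ends_sketch(rates, 1)
gap2 = p_noclose_given_ends_sketch(rates, 2)
```

With all rates equal to 0.5, every bit vector of three positions is equally likely, so the results can be checked by counting: 5 of 8 bit vectors have no adjacent mutations (min_gap of 1) and 4 of 8 have at most one mutation (min_gap of 2).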

seismicrna.core.mu.unbias.calc_params(p_mut_given_span_observed: ndarray, p_ends_observed: ndarray, p_clust_observed: ndarray, min_gap: int, guess_p_mut_given_span: ndarray | None = None, guess_p_ends: ndarray | None = None, guess_p_clust: ndarray | None = None, *, prenormalize: bool = True, max_iter: int = 128, convergence_thresh: float = 0.0001, **kwargs)

Calculate the three sets of parameters based on observed data.

Parameters:
  • p_mut_given_span_observed (np.ndarray) – Observed probability that each position is mutated given that no two mutations are too close: 2D array (positions x clusters)

  • p_ends_observed (np.ndarray) – Observed proportion of reads aligned with each pair of 5’ and 3’ end coordinates given that no two mutations are too close: 3D array (positions x positions x clusters)

  • p_clust_observed (np.ndarray) – Observed proportion of reads in each cluster given that no two mutations are too close: 1D array (clusters)

  • min_gap (int) – Minimum number of non-mutated bases between two mutations. Must be a non-negative integer.

  • guess_p_mut_given_span (np.ndarray | None = None) – Initial guess for the probability that each position is mutated. If given, must be a 2D array (positions x clusters); defaults to p_mut_given_span_observed.

  • guess_p_ends (np.ndarray | None = None) – Initial guess for the proportion of total reads aligned to each pair of 5’ and 3’ end coordinates. If given, must be a 2D array (positions x positions); defaults to p_ends_observed.

  • guess_p_clust (np.ndarray | None = None) – Initial guess for the proportion of total reads in each cluster. If given, must be a 1D array (clusters); defaults to p_clust_observed.

  • prenormalize (bool = True) – Fill missing values in guess_p_mut_given_span, guess_p_ends, and guess_p_clust, and clip every value to be ≥ 0 and ≤ 1. Ensure the proportions in guess_p_clust and the upper triangle of guess_p_ends sum to 1.

  • max_iter (int = 128) – Maximum number of iterations in which to refine the parameters.

  • convergence_thresh (float = 1.e-4) – Convergence threshold based on the root-mean-square difference in mutation rates between consecutive iterations.

  • **kwargs – Additional keyword arguments for _calc_p_mut_given_span.

seismicrna.core.mu.unbias.calc_params_observed(n_pos_total: int, order: int, unmasked_pos: Iterable[int], muts_per_pos: Iterable[ndarray], end5s: ndarray, end3s: ndarray, counts_per_uniq: ndarray, membership: ndarray)

Calculate the observed estimates of the parameters.

Parameters:
  • n_pos_total (int) – Total number of positions in the section.

  • order (int) – Order of clustering.

  • unmasked_pos (Iterable[int]) – Unmasked positions; must be zero-indexed with respect to the 5’ end of the section.

  • muts_per_pos (Iterable[np.ndarray]) – For each unmasked position, numbers of all reads with a mutation at that position.

  • end5s (np.ndarray) – Coordinates of the 5’ ends of all reads; must be 0-indexed with respect to the 5’ end of the section.

  • end3s (np.ndarray) – Coordinates of the 3’ ends of all reads; must be 0-indexed with respect to the 5’ end of the section.

  • counts_per_uniq (np.ndarray) – Number of times each unique read occurs.

  • membership (np.ndarray) – Cluster memberships of each read: 2D array (reads x clusters)

Return type:

tuple[np.ndarray, np.ndarray, np.ndarray]

seismicrna.core.mu.unbias.calc_rectangluar_sum(array: ndarray)

For each element of the main diagonal, calculate the sum over the rectangular array from that element to the upper right corner.

Parameters:

array (np.ndarray) – Array of at least two dimensions for which to calculate the sum of each rectangular array from each element on the main diagonal to the upper right corner. The first two dimensions must have equal lengths.

Returns:

Array with the shape of array minus its first dimension, in which each element along the first axis is the sum of array over the rectangle from the corresponding element on the main diagonal to the upper right corner of array.

Return type:

np.ndarray
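
Under one reading of the description, the rectangle for diagonal element (i, i) spans rows 0 through i and columns i through the last column. The loop below implements that reading; both the interpretation and the helper name rectangular_sum_sketch are assumptions made for illustration.

```python
import numpy as np


def rectangular_sum_sketch(array) -> np.ndarray:
    """For each diagonal element (i, i), sum rows 0..i and columns
    i..end over the first two axes (hypothetical helper)."""
    array = np.asarray(array, dtype=float)
    npos = array.shape[0]
    return np.array([array[: i + 1, i:].sum(axis=(0, 1))
                     for i in range(npos)])


sums = rectangular_sum_sketch(np.ones((3, 3)))
sums_3d = rectangular_sum_sketch(np.ones((3, 3, 2)))
```

For a 3 x 3 array of ones, the rectangles contain 3, 4, and 3 elements.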

seismicrna.core.mu.unbias.triu_log(a: ndarray)

Calculate the logarithm of the upper triangle(s) of array a. In the result, elements below the main diagonal are undefined.

Parameters:

a (np.ndarray) – Array (≥ 2 dimensions) whose upper triangle(s) will be used to compute the logarithm; the first 2 dimensions must have equal lengths.

Returns:

Logarithm of the upper triangle(s) of a.

Return type:

np.ndarray
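
Taking the logarithm of only the upper triangle avoids evaluating the logarithm of unused elements below the main diagonal. A minimal sketch with np.triu_indices follows; the helper name triu_log_sketch is hypothetical, and filling the undefined elements with NaN is a choice made for this sketch only.

```python
import numpy as np


def triu_log_sketch(a) -> np.ndarray:
    """Logarithm of the upper triangle over the first two axes;
    elements below the diagonal are NaN (hypothetical helper)."""
    a = np.asarray(a, dtype=float)
    result = np.full_like(a, np.nan)
    rows, cols = np.triu_indices(a.shape[0])
    result[rows, cols] = np.log(a[rows, cols])
    return result


logs = triu_log_sketch([[1.0, np.e],
                        [0.0, 1.0]])
```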