seismicrna.core package

Subpackages

Submodules

seismicrna.core.array.calc_inverse(target: ndarray, require: int = -1, fill: bool = False, fill_rev: bool = False, fill_default: int | None = None, verify: bool = True, what: str = 'array')

Calculate the inverse of target, such that if element i of target has value x, then element x of the inverse has value i.

>>> list(calc_inverse(np.array([3, 2, 7, 5, 1])))
[-1, 4, 1, 0, -1, 3, -1, 2]
>>> list(calc_inverse(np.arange(5)))
[0, 1, 2, 3, 4]
Parameters:
  • target (np.ndarray) – Target values; must be a 1-dimensional array of non-negative integers with no duplicate values.

  • require (int = -1) – Require the inverse to contain all indexes up to and including require (i.e. that its length is at least require + 1); ignored if require is -1; must be ≥ -1.

  • fill (bool = False) – Fill missing indexes (that do not appear in target).

  • fill_rev (bool = False) – Fill missing indexes in reverse order instead of forward order; only used if fill is True.

  • fill_default (int | None = None) – Value with which to fill before the first non-missing value has been encountered; if fill_rev is True, defaults to the length of target, otherwise to -1.

  • verify (bool = True) – Verify that all target values are unique, non-negative integers. If they are not, then ValueError is raised when verify is True, and the results of this function are incorrect when verify is False. Set verify to False only if you have already confirmed that target contains unique, non-negative integers.

  • what (str = "array") – What to name the array (only used for error messages).

Returns:

Inverse of target.

Return type:

np.ndarray
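The definition above can be illustrated with a few lines of NumPy. This is a minimal sketch of the idea only, not the library's implementation: the name inverse_sketch is hypothetical, the fill, fill_rev, fill_default, and require options are left out, and -1 marks missing indexes as in the doctest above.

```python
import numpy as np


def inverse_sketch(target: np.ndarray) -> np.ndarray:
    """Hypothetical re-implementation of the basic inverse (no fill options)."""
    target = np.asarray(target)
    if target.ndim != 1:
        raise ValueError("target must be 1-dimensional")
    if target.size == 0:
        return np.empty(0, dtype=int)
    if target.min() < 0 or np.unique(target).size != target.size:
        raise ValueError("target must hold unique, non-negative integers")
    # Indexes that never appear as a value in target keep the marker -1.
    inverse = np.full(target.max() + 1, -1, dtype=int)
    # Element i of target has value x, so element x of the inverse is i.
    inverse[target] = np.arange(target.size)
    return inverse


print(inverse_sketch(np.array([3, 2, 7, 5, 1])).tolist())
# [-1, 4, 1, 0, -1, 3, -1, 2]
```

The single fancy-indexed assignment does all the work, which is why the values must be unique: a duplicate value would silently overwrite an earlier position.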

seismicrna.core.array.check_naturals(values: ndarray, what: str = 'values')

Raise ValueError if the values are not monotonically increasing natural numbers.

seismicrna.core.array.ensure_order(array1: ndarray, array2: ndarray, what1: str = 'array1', what2: str = 'array2', gt_eq: bool = False)

Ensure that array1 is ≤ or ≥ array2, element-wise.

Parameters:
  • array1 (np.ndarray) – Array 1 (same length as array2).

  • array2 (np.ndarray) – Array 2 (same length as array1).

  • what1 (str = "array1") – What array1 contains (only used for error messages).

  • what2 (str = "array2") – What array2 contains (only used for error messages).

  • gt_eq (bool = False) – Ensure array1 ≥ array2 if True, otherwise array1 ≤ array2.

Returns:

Shared length of array1 and array2.

Return type:

int

seismicrna.core.array.ensure_same_length(arr1: ndarray, arr2: ndarray, what1: str = 'array1', what2: str = 'array2')
seismicrna.core.array.find_dims(dims: Sequence[Sequence[str | None]], arrays: Sequence[ndarray], names: Sequence[str] | None = None, nonzero: Iterable[str] | bool = False)

Check the dimensions of the arrays.

seismicrna.core.array.find_true_dists(booleans: ndarray)

Find the distance to each True element in a boolean array.

seismicrna.core.array.get_length(array: ndarray, what: str = 'array') → int
seismicrna.core.array.list_naturals(n: int)

List natural numbers up to and including n.

seismicrna.core.array.locate_elements(collection: ndarray, *elements: ndarray, what: str = 'collection', verify: bool = True)

Find the index at which each element of elements occurs in collection.

>>> list(locate_elements(np.array([4, 1, 2, 7, 5, 3]), np.array([5, 2, 5])))
[4, 2, 4]
Parameters:
  • collection (np.ndarray) – Collection in which to find each element in elements; must be a 1-dimensional array of non-negative integers with no duplicate values.

  • *elements (np.ndarray) – Elements to find; must be a 1-dimensional array that is a subset of collection, although duplicate values are permitted.

  • what (str = "collection") – What to name the collection (only used for error messages).

  • verify (bool = True) – Verify that all values in collection are unique, non-negative integers and that all items in elements are in collection.

Returns:

Index of each element of elements in collection.

Return type:

np.ndarray
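One way to picture the lookup is as an inverse table indexed by the elements. The sketch below is an assumption about how such a lookup could work, not the library's code: locate_sketch is a hypothetical name, it accepts a single elements array rather than *elements, and it assumes collection is non-empty.

```python
import numpy as np


def locate_sketch(collection: np.ndarray, elements: np.ndarray) -> np.ndarray:
    """Hypothetical lookup of each element's index in collection."""
    collection = np.asarray(collection)
    elements = np.asarray(elements)
    # Lookup table mapping each value to its position (-1 where absent).
    lookup = np.full(collection.max() + 1, -1, dtype=int)
    lookup[collection] = np.arange(collection.size)
    if elements.size and (elements.min() < 0
                          or elements.max() >= lookup.size):
        raise ValueError("elements contains values outside collection")
    indexes = lookup[elements]
    if np.any(indexes < 0):
        raise ValueError("elements contains values not in collection")
    return indexes


print(locate_sketch(np.array([4, 1, 2, 7, 5, 3]),
                    np.array([5, 2, 5])).tolist())
# [4, 2, 4]
```

Duplicate values in elements are harmless because the table is only read, never written, when the elements are looked up.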

seismicrna.core.array.sanitize_values(values: Iterable[int], lower_limit: int, upper_limit: int, whats: str = 'values')

Validate and sort values, and return them as an array.

seismicrna.core.array.stochastic_round(values: ndarray)

Round values to integers stochastically, so that the probability of rounding up equals the fractional part (mantissa) of each value.
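A minimal sketch of stochastic rounding follows, assuming the usual floor-plus-Bernoulli formulation. The name stochastic_round_sketch and the rng parameter are hypothetical and are not part of the documented signature.

```python
import numpy as np


def stochastic_round_sketch(values, rng=None) -> np.ndarray:
    """Hypothetical stochastic rounding: round up with probability frac."""
    rng = np.random.default_rng() if rng is None else rng
    values = np.asarray(values, dtype=float)
    lower = np.floor(values)
    # Fractional part in [0, 1): the chance that a value is rounded up.
    frac = values - lower
    round_up = rng.random(values.shape) < frac
    return (lower + round_up).astype(int)


rng = np.random.default_rng(0)
rounded = stochastic_round_sketch(np.full(100_000, 2.25), rng)
# Every result is 2 or 3, and the mean stays close to 2.25.
print(sorted(set(rounded.tolist())), round(float(rounded.mean()), 1))
```

Unlike rounding to the nearest integer, this keeps the expected value of each element unchanged, so sums and means of many rounded values are unbiased.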

seismicrna.core.array.triangular(n: int)

The nth triangular number (n ≥ 0): number of items in an equilateral triangle with n items on each side.

Parameters:

n (int) – Index of the triangular number to return; equivalently, the side length of the equilateral triangle.

Returns:

The triangular number with index n; equivalently, the number of items in the equilateral triangle of side length n.

Return type:

int
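For reference, the nth triangular number has the closed form n(n + 1)/2. The sketch below uses that formula; triangular_sketch is a hypothetical name, and raising ValueError for negative n is an assumption.

```python
def triangular_sketch(n: int) -> int:
    """Hypothetical closed-form triangular number: 1 + 2 + ... + n."""
    if n < 0:
        raise ValueError(f"n must be >= 0, but got {n}")
    # Integer division is exact because n * (n + 1) is always even.
    return n * (n + 1) // 2


print([triangular_sketch(n) for n in range(6)])
# [0, 1, 3, 6, 10, 15]
```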

class seismicrna.core.data.ArrowDataset(data1: MutsDataset, data2: Dataset)

Bases: MultistepDataset, NarrowDataset, ABC

Dataset made by integrating two datasets from different steps of the workflow, with one section.

class seismicrna.core.data.Dataset

Bases: ABC

Dataset comprising batches of data.

property batch_nums

Numbers of the batches.

abstract property end3: int

3’ end of the section.

abstract property end5: int

5’ end of the section.

abstract get_batch(batch_num: int) → ReadBatch

Get a specific batch of data.

abstract classmethod get_report_type() → type[Report]

Type of report.

iter_batches()

Yield each batch.

abstract classmethod load(report_file: Path)

Load a dataset from a report file.

classmethod load_report(report_file: Path)

Load the report from a file.

abstract property num_batches: int

Number of batches.

property num_reads

Number of reads in the dataset.

abstract property pattern: RelPattern | None

Pattern of mutations to count.

abstract property ref: str

Name of the reference.

abstract property sample: str

Name of the sample.

abstract property sect: str

Name of the section.

abstract property top: Path

Top-level directory of the dataset.

class seismicrna.core.data.LoadFunction(data_type: type[Dataset], /, *more_types: type[Dataset])

Bases: object

Function to load a dataset.

__call__(report_file: Path)

Load a dataset from the report file.

property dataset_types

Types of datasets that this function can load.

is_dataset_type(dataset: Dataset)

Whether the dataset is one of the loadable types.

property report_path_auto_fields

Automatic field values of the report file path.

property report_path_seg_types

Segment types of the report file path.

class seismicrna.core.data.LoadedDataset(report: BatchedReport, top: Path)

Bases: Dataset, ABC

Dataset created by loading directly from a Report.

property end3

3’ end of the section.

property end5

5’ end of the section.

get_batch(batch_num: int) → ReadBatchIO | MutsBatchIO

Get a specific batch of data.

get_batch_checksum(batch: int)

Get the checksum of a specific batch from the report.

get_batch_path(batch: int)

Get the path to a batch of a specific number.

abstract classmethod get_batch_type() → type[ReadBatchIO | MutsBatchIO]

Type of batch.

classmethod get_btype_name()

Name of the type of batch.

abstract classmethod get_report_type() → type[BatchedReport]

Type of report.

classmethod load(report_file: Path)

Load a dataset from a report file.

property num_batches

Number of batches.

property ref

Name of the reference.

property sample

Name of the sample.

property sect

Name of the section.

property top

Top-level directory of the dataset.

class seismicrna.core.data.LoadedMutsDataset(report: BatchedReport, top: Path)

Bases: LoadedDataset, NarrowDataset, ABC

property end3

3’ end of the section.

property end5

5’ end of the section.

property refseq

Sequence of the reference.

property sect

Name of the section.

class seismicrna.core.data.MergedDataset(datasets: Iterable[Dataset])

Bases: Dataset, ABC

Dataset made by merging one or more constituent datasets.

abstract classmethod get_dataset_load_func() → LoadFunction

Function to load one constituent dataset.

property pattern

Pattern of mutations to count.

property ref

Name of the reference.

property top

Top-level directory of the dataset.

class seismicrna.core.data.MergedMutsDataset(datasets: Iterable[Dataset])

Bases: MergedDataset, MutsDataset, ABC

MergedDataset with explicit mutational data.

property refseq

Sequence of the reference.

class seismicrna.core.data.MergedUnbiasDataset(datasets: Iterable[Dataset])

Bases: MergedDataset, UnbiasDataset, ABC

MergedDataset with attributes for correcting observer bias.

property min_mut_gap

Minimum gap between two mutations.

property quick_unbias

Use the quick heuristic for unbiasing.

property quick_unbias_thresh

Consider mutation rates less than or equal to this threshold to be 0 when using the quick heuristic for unbiasing.

class seismicrna.core.data.MultistepDataset(data1: MutsDataset, data2: Dataset)

Bases: MutsDataset, ABC

Dataset made by integrating two datasets from different steps of the workflow.

property end3

3’ end of the section.

property end5

5’ end of the section.

get_batch(batch_num: int)

Get a specific batch of data.

abstract classmethod get_dataset1_load_func() → LoadFunction

Function to load Dataset 1.

classmethod get_dataset1_report_file(dataset2_report_file: Path)

Given the report file for Dataset 2, determine the report file for Dataset 1.

classmethod get_dataset2_load_func()

Function to load Dataset 2.

abstract classmethod get_dataset2_type() → type[Dataset]

Type of Dataset 2.

classmethod get_report_type()

Type of report.

classmethod load(dataset2_report_file: Path)

Load a dataset from a report file.

classmethod load_dataset1(dataset2_report_file: Path)

Load Dataset 1.

classmethod load_dataset2(dataset2_report_file: Path)

Load Dataset 2.

property num_batches

Number of batches.

property ref

Name of the reference.

property refseq

Sequence of the reference.

property sample

Name of the sample.

property sect

Name of the section.

property top

Top-level directory of the dataset.

class seismicrna.core.data.MutsDataset

Bases: Dataset, ABC

Dataset with explicit mutational data.

property reflen

Length of the reference sequence.

abstract property refseq: DNA

Sequence of the reference.

property section: Section

Section of the dataset.

class seismicrna.core.data.NarrowDataset

Bases: MutsDataset, ABC

MutsDataset with one section, in contrast to a WideDataset that unites one or more sections.

property section

Section of the dataset.

class seismicrna.core.data.TallDataset(sample: str, datasets: Iterable[Dataset])

Bases: MergedDataset, NarrowDataset, ABC

Dataset made by vertically pooling other datasets from one or more samples aligned to the same reference sequence.

property end3

3’ end of the section.

property end5

5’ end of the section.

get_batch(batch_num: int)

Get a specific batch of data.

classmethod load(report_file: Path)

Load a dataset from a report file.

property num_batches

Number of batches.

property nums_batches: list[int]

Number of batches in each dataset in the pool.

property sample

Name of the sample.

property samples: list[str]

Names of all samples in the pool.

property sect

Name of the section.

class seismicrna.core.data.TallMutsDataset(sample: str, datasets: Iterable[Dataset])

Bases: TallDataset, MergedMutsDataset, ABC

TallDataset with mutational data.

class seismicrna.core.data.UnbiasDataset

Bases: Dataset, ABC

Dataset with attributes for correcting observer bias.

abstract property min_mut_gap: int

Minimum gap between two mutations.

abstract property quick_unbias: bool

Use the quick heuristic for unbiasing.

abstract property quick_unbias_thresh: float

Consider mutation rates less than or equal to this threshold to be 0 when using the quick heuristic for unbiasing.

class seismicrna.core.data.WideDataset(sect: str, clusts: dict | None, datasets: Iterable[Dataset])

Bases: MergedMutsDataset, ABC

Dataset made by horizontally joining other datasets from one or more sections of the same reference sequence.

property end3

3’ end of the section.

property end5

5’ end of the section.

get_batch(batch_num: int)

Get a specific batch of data.

classmethod load(report_file: Path)

Load a dataset from a report file.

property num_batches

Number of batches.

property sample

Name of the sample.

property sect

Name of the section.

property section

Section of the dataset.

property sects

Names of all joined sections.

seismicrna.core.data.load_datasets(input_path: Iterable[str | Path], load_func: LoadFunction)

Yield a Dataset from each report file in input_path.

Parameters:
  • input_path (Iterable[str | Path]) – Input paths to be searched recursively for report files.

  • load_func (LoadFunction) – Function to load the dataset from each report file.

class seismicrna.core.header.ClustHeader(*, max_order: int, min_order: int = 1, **kwargs)

Bases: Header

Header of order and cluster numbers.

classmethod clustered()

Whether the header has clusters.

property index

Index of the header.

classmethod levels()

Levels of the index.

property max_order

Maximum number of clusters (≥ 1) if clustered, else 0.

property min_order

Minimum number of clusters (≥ 1) if clustered, else 1.

class seismicrna.core.header.Header

Bases: ABC

Header for a table.

abstract classmethod clustered() → bool

Whether the header has clusters.

property clusts

Order and cluster numbers of the header.

property index: Index

Index of the header.

iter_clust_indexes()

For each cluster in the header, yield an Index or MultiIndex of every column in the header that is part of the cluster.

classmethod level_keys()

Level keys of the index.

classmethod level_names()

Level names of the index.

abstract classmethod levels()

Levels of the index.

abstract property max_order: int

Maximum number of clusters (≥ 1) if clustered, else 0.

abstract property min_order: int

Minimum number of clusters (≥ 1) if clustered, else 1.

modified(**kwargs)

Return a new header with a possibly modified signature.

Parameters:

**kwargs – Keyword arguments for modifying the signature of the header. Each argument given here will be passed to make_header and override the attribute (if any) with the same name in this header’s signature. Attributes of this header’s signature that are not overridden will also be passed to make_header.

Returns:

New header with a possibly modified signature.

Return type:

Header

property names

Formatted name of each cluster.

classmethod num_levels()

Number of levels.

property orders

Index of order numbers.

select(**kwargs) → Index

Select and return items from the header as an Index.

property signature

Signature of the header, which will generate an identical header if passed as keyword arguments to make_header.

property size

Number of items in the Header.

class seismicrna.core.header.RelClustHeader(*, max_order: int, min_order: int = 1, **kwargs)

Bases: ClustHeader, RelHeader

Header of relationships with order and cluster numbers.

property index

Index of the header.

class seismicrna.core.header.RelHeader(*, rels: Iterable[str], **kwargs)

Bases: Header

Header of relationships.

classmethod clustered()

Whether the header has clusters.

property index

Index of the header.

classmethod levels()

Levels of the index.

property max_order

Maximum number of clusters (≥ 1) if clustered, else 0.

property min_order

Minimum number of clusters (≥ 1) if clustered, else 1.

property rels: ndarray

Relationships.

property signature

Signature of the header, which will generate an identical header if passed as keyword arguments to make_header.

seismicrna.core.header.format_clust_name(order: int, clust: int, allow_zero: bool = False)

Format a pair of order and cluster numbers into a name.

Parameters:
  • order (int) – Order number

  • clust (int) – Cluster number

  • allow_zero (bool = False) – Allow order and cluster to be 0.

Returns:

Name specifying the order and cluster numbers, or “average” if the order number is 0.

Return type:

str

seismicrna.core.header.format_clust_names(clusts: Iterable[tuple[int, int]], allow_zero: bool = False, allow_duplicates: bool = False)

Format pairs of order and cluster numbers into a list of names.

Parameters:
  • clusts (Iterable[tuple[int, int]]) – Zero or more pairs of order and cluster numbers.

  • allow_zero (bool = False) – Allow order and cluster to be 0.

  • allow_duplicates (bool = False) – Allow order and cluster pairs to be duplicated.

Returns:

List of names of the pairs of order and cluster numbers.

Return type:

list[str]

Raises:

ValueError – If allow_duplicates is False and an order-cluster pair occurs more than once.

seismicrna.core.header.index_clusts(order: int)

Index of cluster numbers for one order.

Parameters:

order (int) – Number of clusters (≥ 0)

Returns:

Index of cluster numbers

Return type:

pd.Index

seismicrna.core.header.index_order_clusts(order: int)

List order and cluster numbers as a MultiIndex for one order.

Parameters:

order (int) – Number of clusters (≥ 0)

Returns:

Index wherein each item is a tuple of the order of clustering (i.e. number of clusters) and the cluster number.

Return type:

pd.MultiIndex

seismicrna.core.header.index_orders(max_order: int, min_order: int = 1)

Index of order numbers from min_order to max_order.

Parameters:
  • max_order (int) – Maximum number of clusters (≥ 0)

  • min_order (int = 1) – Minimum number of clusters (≥ 1)

Returns:

Index of order numbers

Return type:

pd.Index

seismicrna.core.header.index_orders_clusts(max_order: int, min_order: int = 1)

List order and cluster numbers as a MultiIndex for every order from min_order to max_order.

Parameters:
  • max_order (int) – Maximum number of clusters (≥ 0)

  • min_order (int = 1) – Minimum number of clusters (≥ 1)

Returns:

Index wherein each item is a tuple of the order of clustering (i.e. number of clusters) and the cluster number.

Return type:

pd.MultiIndex

seismicrna.core.header.list_clusts(order: int)

List all cluster numbers for one order.

Parameters:

order (int) – Number of clusters (≥ 0)

Returns:

List of cluster numbers.

Return type:

list[int]

seismicrna.core.header.list_order_clusts(order: int)

List order and cluster numbers as 2-tuples for one order.

Parameters:

order (int) – Number of clusters (≥ 0)

Returns:

List wherein each item is a tuple of the order of clustering (i.e. number of clusters) and the cluster number.

Return type:

list[tuple[int, int]]

seismicrna.core.header.list_orders(max_order: int, min_order: int = 1)

List order numbers from min_order to max_order.

Parameters:
  • max_order (int) – Maximum number of clusters (≥ 0)

  • min_order (int = 1) – Minimum number of clusters (≥ 1)

Returns:

List of numbers of clusters

Return type:

list[int]

seismicrna.core.header.list_orders_clusts(max_order: int, min_order: int = 1)

List order and cluster numbers as 2-tuples for every order from min_order to max_order.

Parameters:
  • max_order (int) – Maximum number of clusters (≥ 0)

  • min_order (int = 1) – Minimum number of clusters (≥ 1)

Returns:

List wherein each item is a tuple of the order of clustering (i.e. number of clusters) and the cluster number.

Return type:

list[tuple[int, int]]
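The listing functions above can be pictured with the following sketch. It assumes that the clusters of an order are numbered from 1 up to that order; this numbering, and the *_sketch names, are assumptions. Only orders ≥ 1 are covered, because how order 0 (no clustering) is listed is not specified here.

```python
def list_order_clusts_sketch(order: int) -> list[tuple[int, int]]:
    """Hypothetical (order, cluster) pairs for one order >= 1."""
    if order < 1:
        raise ValueError(f"this sketch only covers order >= 1, got {order}")
    # Assumption: clusters of an order are numbered 1, 2, ..., order.
    return [(order, clust) for clust in range(1, order + 1)]


def list_orders_clusts_sketch(max_order: int,
                              min_order: int = 1) -> list[tuple[int, int]]:
    """Hypothetical pairs for every order from min_order to max_order."""
    return [pair
            for order in range(min_order, max_order + 1)
            for pair in list_order_clusts_sketch(order)]


print(list_orders_clusts_sketch(3))
# [(1, 1), (2, 1), (2, 2), (3, 1), (3, 2), (3, 3)]
```

Under this assumption, the number of pairs for orders 1 through n is the nth triangular number (6 for n = 3). A list of such tuples could be wrapped with pd.MultiIndex.from_tuples to obtain the MultiIndex variants.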

seismicrna.core.header.make_header(*, rels: Iterable[str] = (), max_order: int = 0, min_order: int = 1)

Make a new Header of an appropriate type.

Parameters:
  • rels (Iterable[str]) – Relationships in the header.

  • max_order (int = 0) – Maximum number of clusters (≥ 1), or 0 if not clustered.

  • min_order (int = 1) – Minimum number of clusters (≥ 1), or 1 if not clustered.

Returns:

Header of the appropriate type.

Return type:

Header

seismicrna.core.header.parse_header(index: Index | MultiIndex)

Parse an Index into a Header of an appropriate type.

Parameters:

index (pd.Index | pd.MultiIndex) – Index to parse.

Returns:

New Header whose index is index.

Return type:

Header

seismicrna.core.header.validate_order_clust(order: int, clust: int, allow_zero: bool = False)

Validate a pair of order and cluster numbers.

Parameters:
  • order (int) – Order number

  • clust (int) – Cluster number

  • allow_zero (bool = False) – Allow order and cluster to be 0.

Returns:

None if the order and cluster numbers form a valid pair; otherwise, an exception is raised.

Return type:

None

Raises:
  • TypeError – If order or cluster is not an integer.

  • ValueError – If the order and cluster numbers do not form a valid pair.
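The checks described above might look like the sketch below. The exact validity rule is an assumption: here a pair is valid if 1 ≤ clust ≤ order, or if both numbers are 0 and allow_zero is True. The name validate_order_clust_sketch is hypothetical.

```python
def validate_order_clust_sketch(order: int,
                                clust: int,
                                allow_zero: bool = False) -> None:
    """Hypothetical validation of an (order, cluster) pair."""
    if not isinstance(order, int):
        raise TypeError(f"order must be an int, got {type(order).__name__}")
    if not isinstance(clust, int):
        raise TypeError(f"clust must be an int, got {type(clust).__name__}")
    # Assumption: (0, 0) stands for "no clustering" when zeros are allowed.
    if allow_zero and order == 0 and clust == 0:
        return
    if order < 1:
        raise ValueError(f"order must be >= 1, got {order}")
    if not 1 <= clust <= order:
        raise ValueError(f"clust must be >= 1 and <= order, got {clust}")


validate_order_clust_sketch(2, 1)                   # valid: returns None
validate_order_clust_sketch(0, 0, allow_zero=True)  # valid only with the flag
```

Returning None on success and raising on failure means that callers do not need to inspect a return value.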

Core – Logging Module

Purpose

Central manager of logging.

class seismicrna.core.logs.AnsiCode

Bases: object

Format text with ANSI codes.

BLUE = 94
BOLD = 1
CODES = (0, 1, 4, 91, 92, 93, 94, 95, 96)
CYAN = 96
END = 0
GREEN = 92
PURPLE = 95
RED = 91
ULINE = 4
YELLOW = 93
classmethod end()

Convenience function to end formatting.

classmethod fmt(code: int)

Format one color code into text.

classmethod wrap(text: str, *codes: int, end: bool = True)

Wrap text with ANSI color code(s).
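The integer constants above are standard ANSI SGR codes, which terminals interpret when they are embedded as ESC [ code m. The sketch below shows how wrapping could work under that convention; fmt_sketch and wrap_sketch are hypothetical names, and emitting one escape sequence per code is an assumption about the format.

```python
def fmt_sketch(code: int) -> str:
    """Format one SGR code as an ANSI escape sequence."""
    return f"\033[{code}m"


def wrap_sketch(text: str, *codes: int, end: bool = True) -> str:
    """Hypothetical wrapper: prefix text with codes, optionally reset after."""
    prefix = "".join(fmt_sketch(code) for code in codes)
    # Code 0 (END) resets all formatting.
    suffix = fmt_sketch(0) if end else ""
    return f"{prefix}{text}{suffix}"


print(repr(wrap_sketch("warning", 93, 1)))
# '\x1b[93m\x1b[1mwarning\x1b[0m'
```

Passing end=False leaves the formatting active, which is useful when several pieces of text should share one style.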

class seismicrna.core.logs.ColorFormatter(fmt=None, datefmt=None, style='%', validate=True, *, defaults=None)

Bases: Formatter

ansi_codes = {10: (94,), 20: (96,), 30: (93,), 40: (91,), 50: (95, 1)}
format(record: LogRecord) → str

Log the message in color by adding an ANSI color escape code to the beginning and a color stopping code to the end.
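The keys of ansi_codes match the numeric levels of the standard logging module (10 = DEBUG through 50 = CRITICAL). A minimal sketch of a formatter built on that mapping follows; ColorFormatterSketch and LEVEL_CODES are hypothetical names, and the real class may differ in detail.

```python
import logging

# Same mapping as ansi_codes above, written with the logging constants.
LEVEL_CODES = {
    logging.DEBUG: (94,),        # blue
    logging.INFO: (96,),         # cyan
    logging.WARNING: (93,),      # yellow
    logging.ERROR: (91,),        # red
    logging.CRITICAL: (95, 1),   # purple, bold
}


class ColorFormatterSketch(logging.Formatter):
    """Hypothetical formatter that colors each message by its level."""

    def format(self, record: logging.LogRecord) -> str:
        message = super().format(record)
        codes = LEVEL_CODES.get(record.levelno, ())
        if not codes:
            # Unknown level: leave the message uncolored.
            return message
        start = "".join(f"\033[{code}m" for code in codes)
        return f"{start}{message}\033[0m"


record = logging.LogRecord("demo", logging.WARNING, "demo.py", 1,
                           "disk almost full", None, None)
print(repr(ColorFormatterSketch().format(record)))
# '\x1b[93mdisk almost full\x1b[0m'
```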

seismicrna.core.logs.exc_info()

Whether to log exception information.

seismicrna.core.logs.get_config()

Get the configuration parameters of a logger.

seismicrna.core.logs.get_top_logger()

Return the top-level logger.

seismicrna.core.logs.get_verbosity(verbose: int = 0, quiet: int = 0)

Get the logging level based on the verbose and quiet arguments.

Parameters:
  • verbose (int [0, 2]) – 0 (): Log only warnings and errors; 1 (-v): Also log status updates; 2 (-vv): Also log detailed information (useful for debugging).

  • quiet (int [0, 2]) – 0 (): Suppress only status updates and detailed information; 1 (-q): Also suppress warnings; 2 (-qq): Also suppress non-critical error messages (discouraged).

Giving both verbose and quiet flags causes the verbosity to default to verbose=0, quiet=0.
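The rules for verbose and quiet can be summarized as a small decision function. The sketch below maps them onto the standard logging levels; that mapping is an assumption (the library may define its own levels), as are the name get_verbosity_sketch and the treatment of counts above 2 as 2.

```python
import logging


def get_verbosity_sketch(verbose: int = 0, quiet: int = 0) -> int:
    """Hypothetical mapping of the verbose/quiet counts to a logging level."""
    if verbose and quiet:
        # Both flags given: fall back to the default (verbose=0, quiet=0).
        return logging.WARNING
    if verbose >= 2:
        return logging.DEBUG      # -vv: detailed information
    if verbose == 1:
        return logging.INFO       # -v: status updates
    if quiet >= 2:
        return logging.CRITICAL   # -qq: only critical errors
    if quiet == 1:
        return logging.ERROR      # -q: suppress warnings
    return logging.WARNING        # default: warnings and errors


print(logging.getLevelName(get_verbosity_sketch(verbose=1)))
# INFO
```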

seismicrna.core.logs.log_exceptions(logging_method: Callable, default: Callable | None)

If any exception occurs, catch it and return an empty list.

seismicrna.core.logs.set_config(verbose: int, quiet: int, log_file: str | None = None, log_color: bool = True)

Configure the main logger with handlers and verbosity.

Path Core Module


Most of the steps in SEISMIC-RNA produce files that other steps use. For example, the ‘align’ step writes alignment map (BAM) files, from which the ‘relate’ step writes relation vector files, which both the ‘mask’ and ‘table’ steps use.

Steps that pass files to each other must agree on

  • the path to the file, so that the second step can find the file

  • the meaning of each part of the path, so that the second step can parse information contained in the path

Although these path conventions could be written separately in each subpackage or module, this strategy is not ideal for several reasons:

  • It would risk inconsistencies among the modules, causing bugs.

  • Changing the conventions would require modifying every module, which would be not only tedious but also risky for the first reason.

  • Defining all the conventions in one place would reduce the size of the code base, improving readability, maintainability, and distribution.

This module defines all file path conventions for all other modules.
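The benefit of a single shared convention can be shown with a toy example. Everything in it is hypothetical: the field names, their order, and the functions build_sketch and parse_sketch only illustrate the idea and are not the conventions or API of this module.

```python
from pathlib import PurePosixPath

# Hypothetical convention, defined once: <top>/<sample>/<step>/<ref>
FIELDS = ("sample", "step", "ref")


def build_sketch(top: PurePosixPath, **fields: str) -> PurePosixPath:
    """Assemble a path from field values, in the agreed order."""
    missing = [name for name in FIELDS if name not in fields]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return top.joinpath(*(str(fields[name]) for name in FIELDS))


def parse_sketch(top: PurePosixPath, path: PurePosixPath) -> dict[str, str]:
    """Recover the field values from a path built by build_sketch."""
    parts = path.relative_to(top).parts
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} segments, got {len(parts)}")
    return dict(zip(FIELDS, parts))


top = PurePosixPath("/out")
path = build_sketch(top, sample="s1", step="align", ref="ref1")
print(path)
# /out/s1/align/ref1
print(parse_sketch(top, path))
# {'sample': 's1', 'step': 'align', 'ref': 'ref1'}
```

Because the writer and the reader both use the same FIELDS definition, changing the convention in one place changes it for both.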

class seismicrna.core.path.Field(dtype: type[str | int | Path], options: Iterable = (), is_ext: bool = False)

Bases: object

build(val: Any)

Validate a value and return it as a string.

parse(text: str) → Any

Parse a value from a string, validate it, and return it.

validate(val: Any)
class seismicrna.core.path.Path(*seg_types: Segment)

Bases: object

build(**fields: Any)

Return a pathlib.Path instance by assembling the given fields into a full path.

parse(path: str | Path)

Return the field names and values from a given path.

exception seismicrna.core.path.PathError

Bases: Exception

Any error involving a path

exception seismicrna.core.path.PathTypeError

Bases: PathError, TypeError

Use of the wrong type of path or segment

exception seismicrna.core.path.PathValueError

Bases: PathError, ValueError

Invalid value of a path segment field

class seismicrna.core.path.Segment(segment_name: str, field_types: dict[str, Field], *, order: int = 0, frmt: str | None = None)

Bases: object

build(**vals: Any)
property ext_type

Type of the segment’s file extension, or None if it has no file extension.

property exts: tuple[str, ...]

Valid file extensions of the segment.

match_longest_ext(text: str)

Find the longest extension of the given text that matches a valid file extension. If none match, return None.

parse(text: str)
seismicrna.core.path.build(*segment_types: Segment, **field_values: Any)

Return a pathlib.Path from the given segment types and field values.

seismicrna.core.path.builddir(*segment_types: Segment, **field_values: Any)

Build the path and create it on the file system as a directory if it does not already exist.

seismicrna.core.path.buildpar(*segment_types: Segment, **field_values: Any)

Build a path and create its parent directory if it does not already exist.

seismicrna.core.path.cast_path(input_path: Path, input_segments: Sequence[Segment], output_segments: Sequence[Segment], **override: Any)

Cast input_path made of input_segments to a new path made of output_segments.

Parameters:
  • input_path (pathlib.Path) – Input path from which to take the path fields.

  • input_segments (Sequence[Segment]) – Path segments to use to determine the fields in input_path.

  • output_segments (Sequence[Segment]) – Path segments to use to determine the fields in output_path.

  • **override (Any) – Override and supplement the fields in input_path.

Returns:

Path comprising output_segments made of fields in input_path (as determined by input_segments).

Return type:

pathlib.Path

seismicrna.core.path.create_path_type(*segment_types: Segment)

Create and cache a Path instance from the segment types.

seismicrna.core.path.deduplicate(paths: Iterable[str | Path])

Yield the non-redundant paths.

seismicrna.core.path.deduplicated(func: Callable)

Decorate a Path generator to yield non-redundant paths.

seismicrna.core.path.fill_whitespace(path: str | Path, fill: str = '_')

Replace all whitespace in path with fill.

seismicrna.core.path.find_files(path: str | Path, segments: Sequence[Segment])

Yield all files that match a sequence of path segments. The behavior depends on what path is:

  • If it is a file, then yield path if it matches the segments; otherwise, yield nothing.

  • If it is a directory, then search it recursively and yield every matching file in the directory and its subdirectories.

Parameters:
  • path (str | pathlib.Path) – Path of a file to check or a directory to search recursively.

  • segments (Sequence[Segment]) – Sequence(s) of Path segments to check if each file matches.

Returns:

Paths of files matching the segments.

Return type:

Generator[Path, Any, None]

seismicrna.core.path.find_files_chain(paths: Iterable[str | Path], segments: Sequence[Segment])

Yield from find_files called on every path in paths.

seismicrna.core.path.get_fields_in_seg_types(*segment_types: Segment) → dict[str, Field]

Get all fields among the given segment types.

seismicrna.core.path.parse(path: str | Path, /, *segment_types: Segment)

Return the fields of a path based on the segment types.

seismicrna.core.path.parse_top_separate(path: str | Path, /, *segment_types: Segment)

Return the fields of a path, and the top field separately.

seismicrna.core.path.path_matches(path: str | Path, segments: Sequence[Segment])

Check if a path matches a sequence of path segments.

Parameters:
  • path (str | pathlib.Path) – Path of the file/directory.

  • segments (Sequence[Segment]) – Sequence of path segments to check if the file matches.

Returns:

Whether the path matches any given sequence of path segments.

Return type:

bool

seismicrna.core.path.randdir(parent: str | Path | None = None, prefix: str = '', suffix: str = '', length: int = 8, max_tries: int = 1000)

Build a path of a new directory that does not exist and create it on the file system.

seismicrna.core.path.randname(length: int)

Generate a random name with valid path characters.

seismicrna.core.path.sanitize(path: str | Path, strict: bool = False)

Sanitize a path-like object by ensuring it is an absolute path, eliminating symbolic links and redundant path separators/references, and returning a Path object.

Parameters:
  • path (str | pathlib.Path) – Path to sanitize.

  • strict (bool = False) – Require the path to exist and contain no symbolic link loops.

Returns:

Absolute, normalized, symlink-free path.

Return type:

pathlib.Path

seismicrna.core.path.transpath(to_dir: str | Path, from_dir: str | Path, path: str | Path, strict: bool = False)

Return the path that would be produced by moving path from from_dir to to_dir (but do not actually move the path on the file system). This function does not require that any of the given paths exist, unless strict is True.

Parameters:
  • to_dir (str | pathlib.Path) – Directory to which to move path.

  • from_dir (str | pathlib.Path) – Directory from which to move path; must contain path but not necessarily be the direct parent directory of path.

  • path (str | pathlib.Path) – Path to move; can be a file or directory.

  • strict (bool = False) – Require that all paths exist and contain no symbolic link loops.

Returns:

Hypothetical path after moving path from from_dir to to_dir.

Return type:

pathlib.Path
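The computation amounts to re-rooting the part of path that lies below from_dir. The sketch below is a simplified illustration: transpath_sketch is a hypothetical name, it works on pure POSIX paths, and it skips the sanitizing and strict checks that the documented function performs.

```python
from pathlib import PurePosixPath


def transpath_sketch(to_dir: str, from_dir: str, path: str) -> PurePosixPath:
    """Hypothetical: re-root path from from_dir onto to_dir (pure paths)."""
    # relative_to raises ValueError if path is not inside from_dir.
    relative = PurePosixPath(path).relative_to(PurePosixPath(from_dir))
    return PurePosixPath(to_dir) / relative


print(transpath_sketch("/out", "/in/run1", "/in/run1/sample/ref.bam"))
# /out/sample/ref.bam
```

Nothing is touched on the file system, so the function is safe to call while planning where output files will go.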

seismicrna.core.path.transpaths(to_dir: str | Path, *paths: str | Path, strict: bool = False)

Return all paths that would be produced by moving all paths in paths from their longest common sub-path to to_dir (but do not actually move the paths on the file system). This function does not require that any of the given paths exist, unless strict is True.

Parameters:
  • to_dir (str | pathlib.Path) – Directory to which to move every path in paths.

  • *paths (str | pathlib.Path) – Paths to move; can be files or directories. A common sub-path must exist among all of these paths.

  • strict (bool = False) – Require that all paths exist and contain no symbolic link loops.

Returns:

Hypothetical paths after moving all paths in paths to to_dir.

Return type:

tuple[pathlib.Path, ...]
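The longest common sub-path can be found with commonpath from the standard library. The sketch below assumes each path is simply re-rooted from that common sub-path onto to_dir; it uses pure POSIX paths, omits sanitizing and strict, and may differ from the library in edge cases such as a single input path. The name transpaths_sketch is hypothetical.

```python
import posixpath
from pathlib import PurePosixPath


def transpaths_sketch(to_dir: str, *paths: str) -> tuple[PurePosixPath, ...]:
    """Hypothetical: re-root paths from their common sub-path onto to_dir."""
    if not paths:
        return ()
    # commonpath raises ValueError if absolute and relative paths are mixed.
    common = PurePosixPath(posixpath.commonpath([str(p) for p in paths]))
    return tuple(PurePosixPath(to_dir) / PurePosixPath(p).relative_to(common)
                 for p in paths)


print([str(p) for p in transpaths_sketch("/out",
                                         "/data/a/x.txt",
                                         "/data/b/y.txt")])
# ['/out/a/x.txt', '/out/b/y.txt']
```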

seismicrna.core.path.validate_int(num: int)
seismicrna.core.path.validate_str(txt: str)
seismicrna.core.path.validate_top(top: Path)
class seismicrna.core.report.BatchedRefseqReport(**kwargs: Any | Callable[[Report], Any])

Bases: BatchedReport, RefseqReport, ABC

Convenience class used as a base for several Report classes.

class seismicrna.core.report.BatchedReport(**kwargs: Any | Callable[[Report], Any])

Bases: Report, ABC

Report with a number of data batches (one file per batch).

classmethod batch_types() → dict[str, type[ReadBatchIO]]

Type(s) of batch(es) for the report, keyed by name.

abstract classmethod fields()

All fields of the report.

classmethod get_batch_type(btype: str | None = None) → type[ReadBatchIO]

Return a valid type of batch based on its name.

class seismicrna.core.report.Field(key: str, title: str, dtype: type, default: Any | None = None, *, iconv: Callable[[Any], Any] | None = None, oconv: Callable[[Any], Any] | None = None)

Bases: object

Field of a report.

default
dtype
iconv
key
oconv
title
class seismicrna.core.report.OptionField(option: Option, **kwargs)

Bases: Field

Field based on a command line option.

default
dtype
iconv
key
oconv
title
class seismicrna.core.report.RefseqReport(**kwargs: Any | Callable[[Report], Any])

Bases: Report, RefIO, ABC

Report associated with a reference sequence file.

abstract classmethod fields()

All fields of the report.

class seismicrna.core.report.Report(**kwargs: Any | Callable[[Report], Any])

Bases: FileIO, ABC

Abstract base class for a report from a step.

__setattr__(key: str, value: Any)

Validate the attribute name and value before setting it.

classmethod field_keys()

Keys of all fields of the report.

abstract classmethod fields()

All fields of the report.

classmethod from_dict(odata: dict[str, Any])

Convert a dict of raw values (keyed by the titles of their fields) into a dict of encoded values (keyed by the keys of their fields), from which a new Report is instantiated.

get_field(field: Field, missing_ok: bool = False)

Return the value of a field of the report using the field instance directly, not its key.

classmethod load(file: Path) Report

Load an object from a file.

save(top: Path, force: bool = False)

Save the report to a JSON file.

to_dict()

Return a dict of raw values of the fields, keyed by the titles of their fields.
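
The relationship between from_dict and to_dict can be illustrated with a small sketch. Everything here is an assumption made for illustration: the two fields, their keys and titles, and the idea that oconv is applied when exporting and iconv when importing are not taken from the package.

```python
from datetime import datetime

# Hypothetical fields, each given as (key, title, iconv, oconv).
SKETCH_FIELDS = [
    ("sample", "Name of Sample", str, str),
    ("began", "Time Began", datetime.fromisoformat, datetime.isoformat),
]


def sketch_to_dict(values):
    """Values keyed by field key -> raw values keyed by field title."""
    return {title: oconv(values[key])
            for key, title, _, oconv in SKETCH_FIELDS}


def sketch_from_dict(odata):
    """Raw values keyed by field title -> values keyed by field key."""
    return {key: iconv(odata[title])
            for key, title, iconv, _ in SKETCH_FIELDS}


values = {"sample": "s1", "began": datetime(2024, 1, 2, 3, 4, 5)}
raw = sketch_to_dict(values)
restored = sketch_from_dict(raw)
```

In this sketch the two functions are inverses of each other, so a report written to disk and loaded again would carry the same values.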

seismicrna.core.report.calc_dt_minutes(began: datetime, ended: datetime)

Calculate the time taken in minutes.
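
A minimal sketch of this calculation, assuming no rounding is applied (sketch_dt_minutes is a hypothetical name):

```python
from datetime import datetime


def sketch_dt_minutes(began: datetime, ended: datetime) -> float:
    """Hypothetical helper: elapsed time between two datetimes,
    expressed in minutes."""
    return (ended - began).total_seconds() / 60.0


elapsed = sketch_dt_minutes(datetime(2024, 1, 1, 12, 0),
                            datetime(2024, 1, 1, 12, 30))
```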

seismicrna.core.report.calc_taken(report: Report)

Calculate the time taken in minutes.

seismicrna.core.report.default_key(key: str)

Get the default value of a field by its key.

seismicrna.core.report.field_keys() dict[str, Field]
seismicrna.core.report.field_titles() dict[str, Field]
seismicrna.core.report.fields()
seismicrna.core.report.get_oconv_array_float(precision: int = 3)
seismicrna.core.report.get_oconv_dict_float(precision: int = 3)
seismicrna.core.report.get_oconv_dict_list_float(precision: int = 3)
seismicrna.core.report.get_oconv_float(precision: int = 3)
seismicrna.core.report.get_oconv_list_float(precision: int = 3)
seismicrna.core.report.iconv_array_int(nums: list[int])
seismicrna.core.report.iconv_datetime(text: str)
seismicrna.core.report.iconv_dict_str_dict_int_dict_int_int(mapping: dict[Any, dict[Any, dict[Any, Any]]]) dict[str, dict[int, dict[int, int]]]
seismicrna.core.report.iconv_dict_str_int(mapping: dict[Any, Any]) dict[str, int]
seismicrna.core.report.iconv_int_keys(mapping: dict[Any, Any])
seismicrna.core.report.key_to_title(key: str)

Map a field’s key to its title.

seismicrna.core.report.lookup_key(key: str)

Get a field by its key.

seismicrna.core.report.lookup_title(title: str)

Get a field by its title.

seismicrna.core.report.oconv_array_int(nums: ndarray)
seismicrna.core.report.oconv_datetime(dtime: datetime)
seismicrna.core.run.run_func(logging_method: ~typing.Callable, default: ~typing.Callable | None = <class 'list'>, with_tmp: bool = False, pass_keep_tmp: bool = False, *args, **kwargs)

Decorator for a run function.

seismicrna.core.stats.calc_beta_mv(alpha: float, beta: float)

Find the mean and variance of a beta distribution from its alpha and beta parameters.

Parameters:
  • alpha (float) – Alpha parameter of the beta distribution.

  • beta (float) – Beta parameter of the beta distribution.

Returns:

Mean and variance of the beta distribution.

Return type:

tuple[float, float]
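
For reference, the standard formulas for a beta distribution are mean = alpha / (alpha + beta) and variance = alpha * beta / ((alpha + beta)^2 * (alpha + beta + 1)). A minimal sketch using them (sketch_beta_mv is a hypothetical name, and the input check is an assumption):

```python
def sketch_beta_mv(alpha: float, beta: float) -> tuple[float, float]:
    """Hypothetical helper: mean and variance of Beta(alpha, beta)."""
    if alpha <= 0.0 or beta <= 0.0:
        raise ValueError("alpha and beta must both be > 0")
    total = alpha + beta
    mean = alpha / total
    variance = alpha * beta / (total ** 2 * (total + 1.0))
    return mean, variance


mean, variance = sketch_beta_mv(2.0, 2.0)
```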

seismicrna.core.stats.calc_beta_params(mean: float, variance: float)

Find the alpha and beta parameters of a beta distribution from its mean and variance.

Parameters:
  • mean (float) – Mean of the beta distribution.

  • variance (float) – Variance of the beta distribution.

Returns:

Alpha and beta parameters of the beta distribution.

Return type:

tuple[float, float]
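
Inverting the standard beta formulas gives alpha + beta = mean * (1 - mean) / variance - 1, from which alpha = mean * (alpha + beta) and beta = (1 - mean) * (alpha + beta). A minimal sketch (sketch_beta_params is a hypothetical name, and the validity checks are assumptions):

```python
def sketch_beta_params(mean: float, variance: float) -> tuple[float, float]:
    """Hypothetical helper: alpha and beta of the beta distribution
    that has the given mean and variance."""
    if not 0.0 < mean < 1.0:
        raise ValueError("mean must be in (0, 1)")
    limit = mean * (1.0 - mean)
    # A beta distribution requires variance < mean * (1 - mean).
    if not 0.0 < variance < limit:
        raise ValueError("variance must be in (0, mean * (1 - mean))")
    total = limit / variance - 1.0  # equals alpha + beta
    return mean * total, (1.0 - mean) * total


alpha, beta = sketch_beta_params(0.5, 0.05)
```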

seismicrna.core.stats.calc_dirichlet_mv(alpha: ndarray)

Find the means and variances of a Dirichlet distribution from its concentration parameters.

Parameters:

alpha (np.ndarray) – Concentration parameters of the Dirichlet distribution.

Returns:

Means and variances of the Dirichlet distribution.

Return type:

tuple[np.ndarray, np.ndarray]
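
With a0 denoting the sum of the concentration parameters, the standard formulas are mean_i = alpha_i / a0 and variance_i = mean_i * (1 - mean_i) / (a0 + 1). The sketch below uses plain Python lists to stay dependency-free, whereas the function above takes and returns NumPy arrays; sketch_dirichlet_mv is a hypothetical name.

```python
def sketch_dirichlet_mv(alpha):
    """Hypothetical helper: means and variances of a Dirichlet
    distribution with concentration parameters alpha."""
    if not alpha or min(alpha) <= 0.0:
        raise ValueError("every concentration parameter must be > 0")
    total = sum(alpha)
    means = [a / total for a in alpha]
    variances = [m * (1.0 - m) / (total + 1.0) for m in means]
    return means, variances


means, variances = sketch_dirichlet_mv([1.0, 2.0, 3.0])
```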

seismicrna.core.stats.calc_dirichlet_params(mean: ndarray, variance: ndarray)

Find the concentration parameters of a Dirichlet distribution from its means and variances.

Parameters:
  • mean (np.ndarray) – Means.

  • variance (np.ndarray) – Variances.

Returns:

Concentration parameters.

Return type:

np.ndarray
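
For a valid Dirichlet distribution, every component gives the same value of mean_i * (1 - mean_i) / variance_i - 1, which is the sum of the concentration parameters. The sketch below recovers the parameters on that basis. It is an assumption about the approach, not the package's code: sketch_dirichlet_params and its rel_tol argument are hypothetical, and plain lists are used instead of NumPy arrays.

```python
import math


def sketch_dirichlet_params(mean, variance, rel_tol=1e-6):
    """Hypothetical helper: concentration parameters of the Dirichlet
    distribution that has the given means and variances."""
    if len(mean) != len(variance):
        raise ValueError("mean and variance must have equal lengths")
    if not math.isclose(sum(mean), 1.0, rel_tol=rel_tol):
        raise ValueError("means must sum to 1")
    # Each component yields an estimate of the sum of the parameters.
    totals = [m * (1.0 - m) / v - 1.0 for m, v in zip(mean, variance)]
    total = sum(totals) / len(totals)
    if any(not math.isclose(t, total, rel_tol=rel_tol) for t in totals):
        raise ValueError("variances are inconsistent with the means")
    return [m * total for m in mean]


# Means and variances of Dirichlet(1, 2, 3), whose parameters sum to 6.
example_mean = [1 / 6, 1 / 3, 1 / 2]
example_var = [m * (1 - m) / 7 for m in example_mean]
params = sketch_dirichlet_params(example_mean, example_var)
```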

class seismicrna.core.task.Task(func: Callable)

Bases: object

Wrap a parallelizable task in a try-except block so that if it fails, it just returns None rather than crashing the other tasks being run in parallel.

__call__(*args, **kwargs)

Call the task’s function in a try-except block, return the result if it succeeds, and return None otherwise.
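
The wrapping behavior can be sketched as below. SketchTask is a hypothetical stand-in; how the real class logs or reports the failure is not documented here, so the logging call is an assumption.

```python
import logging
from typing import Any, Callable


class SketchTask:
    """Hypothetical stand-in: call func, returning None on failure."""

    def __init__(self, func: Callable):
        self.func = func

    def __call__(self, *args: Any, **kwargs: Any):
        try:
            return self.func(*args, **kwargs)
        except Exception as error:
            # Assumption: report the failure, then carry on with None.
            logging.getLogger(__name__).error("Task failed: %s", error)
            return None


def reciprocal(x):
    return 1 / x


task = SketchTask(reciprocal)
good = task(4)  # 0.25
bad = task(0)   # ZeroDivisionError is caught, so this is None
```

Callers that collect results from many such tasks need to filter out None values to see which tasks succeeded.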

seismicrna.core.task.as_list_of_tuples(args: Iterable[Any])

Given an iterable of arguments, return a list of 1-item tuples, each containing one of the given arguments. This function is useful for creating a list of tuples to pass to the args parameter of dispatch.
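
A minimal sketch of the transformation described above (sketch_as_list_of_tuples is a hypothetical name, and the file names are invented):

```python
from typing import Any, Iterable


def sketch_as_list_of_tuples(args: Iterable[Any]) -> list[tuple]:
    """Hypothetical helper: wrap each argument in a 1-item tuple."""
    return [(arg,) for arg in args]


wrapped = sketch_as_list_of_tuples(["a.fq", "b.fq"])
```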

seismicrna.core.task.dispatch(funcs: list[Callable] | Callable, max_procs: int, parallel: bool, *, hybrid: bool = False, pass_n_procs: bool = True, args: list[tuple] | tuple = (), kwargs: dict[str, Any] | None = None)

Run one or more tasks in series or in parallel, depending on the number of tasks, the maximum number of processes, and whether tasks are allowed to be run in parallel.

Parameters:
  • funcs (list[Callable] | Callable) – The function(s) to run. Can be a list of functions or a single function that is not in a list. If a single function, then if args is a tuple, it is called once with that tuple as its positional arguments; and if args is a list of tuples, it is called for each tuple of positional arguments in args.

  • max_procs (int) – See docstring for get_num_parallel.

  • parallel (bool) – See docstring for get_num_parallel.

  • hybrid (bool = False) – See docstring for get_num_parallel.

  • pass_n_procs (bool = True) – Whether to pass the number of processes to the function as the keyword argument n_procs.

  • args (list[tuple] | tuple = ()) – Positional arguments to pass to each function in funcs. Can be a list of tuples of positional arguments or a single tuple that is not in a list. If a single tuple, then each function receives args as positional arguments. If a list, then args must be the same length as funcs; each function funcs[i] receives args[i] as positional arguments.

  • kwargs (dict[str, Any] | None = None) – Keyword arguments to pass to every function call.

Returns:

List of the return value of each run.

Return type:

list
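
The rules for pairing funcs with args can be sketched with a serial-only stand-in. This is an illustration under stated assumptions: sketch_pair and sketch_run_serial are hypothetical names, no processes are spawned, and the n_procs keyword argument is never passed.

```python
def sketch_pair(funcs, args=()):
    """Hypothetical helper: pair every function with the tuple of
    positional arguments it should receive."""
    if callable(funcs):
        if isinstance(args, tuple):
            # One function, one tuple: a single call.
            return [(funcs, args)]
        # One function, a list of tuples: one call per tuple.
        return [(funcs, arg) for arg in args]
    if isinstance(args, tuple):
        # Several functions, one tuple: every function gets it.
        return [(func, args) for func in funcs]
    if len(args) != len(funcs):
        raise ValueError("args and funcs must have the same length")
    # Several functions, a list of tuples: funcs[i] gets args[i].
    return list(zip(funcs, args))


def sketch_run_serial(funcs, args=(), kwargs=None):
    """Run every paired call in series and collect the return values."""
    kwargs = kwargs or {}
    return [func(*arg, **kwargs) for func, arg in sketch_pair(funcs, args)]


one_func_many_args = sketch_run_serial(pow, args=[(2, 3), (3, 2)])
many_funcs_one_arg = sketch_run_serial([min, max], args=(1, 2))
many_funcs_many_args = sketch_run_serial([abs, str], args=[(-1,), (5,)])
```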

seismicrna.core.task.fmt_func_args(func: Callable, *args, **kwargs)

Format the name and arguments of a function as a string.

seismicrna.core.task.get_num_parallel(n_tasks: int, max_procs: int, parallel: bool, hybrid: bool = False) tuple[int, int]

Determine how to parallelize the tasks.

Parameters:
  • n_tasks (int) – Number of tasks to parallelize. Must be ≥ 1.

  • max_procs (int) – Maximum number of processes to run at one time. Must be ≥ 1.

  • parallel (bool) – Whether multiple tasks may be run in parallel. If False, then the number of tasks to run in parallel is set to 1, but the number of processes to run for each task may be > 1.

  • hybrid (bool = False) – Whether to allow both multiple tasks to run in parallel and, at the same time, each task to run multiple processes in parallel.

Returns:

  • Number of tasks to run in parallel. Always ≥ 1.

  • Number of processes to run for each task. Always ≥ 1.

Return type:

tuple[int, int]
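
One allocation rule that is consistent with the parameters and return values described above is sketched here. It is an assumption, not the package's code: sketch_num_parallel is a hypothetical name, and the real function may split processes differently or handle invalid input without raising.

```python
def sketch_num_parallel(n_tasks: int,
                        max_procs: int,
                        parallel: bool,
                        hybrid: bool = False) -> tuple[int, int]:
    """Hypothetical helper: return the number of tasks to run in
    parallel and the number of processes for each task."""
    if n_tasks < 1:
        raise ValueError("n_tasks must be >= 1")
    if max_procs < 1:
        raise ValueError("max_procs must be >= 1")
    if parallel and n_tasks > 1:
        # Run as many tasks at once as the process limit allows.
        n_parallel = min(n_tasks, max_procs)
        # In hybrid mode, share leftover processes among the tasks.
        n_procs_each = max_procs // n_parallel if hybrid else 1
        return n_parallel, n_procs_each
    # Serial tasks: one task at a time may use every process.
    return 1, max_procs
```

Under this rule, 4 tasks with 8 processes give (4, 1) without hybrid mode, (4, 2) with it, and (1, 8) when parallel is False.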

seismicrna.core.tmp.get_release_working_dirs(tmp_dir: Path)
seismicrna.core.tmp.release_to_out(out_dir: Path, release_dir: Path, initial_path: Path)

Move temporary path(s) to the output directory.

seismicrna.core.tmp.with_tmp_dir(pass_keep_tmp: bool)

Make a temporary directory, then delete it after the run.

seismicrna.core.types.fit_uint_size(value: int)

Smallest number of bytes that will fit the value.

seismicrna.core.types.fit_uint_type(value: int)

Smallest unsigned int type that will fit the value.

seismicrna.core.types.get_byte_dtype(nchars: int)

NumPy byte type with the given number of characters.

seismicrna.core.types.get_dtype(code: str, size: int)

NumPy type with the given code and size.

seismicrna.core.types.get_max_uint(uint_type: type)

Maximum value of a NumPy unsigned integer type.

seismicrna.core.types.get_max_value(nbytes: int)

Get the maximum value of an unsigned integer of N bytes.
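
The arithmetic behind the size-fitting and maximum-value helpers can be sketched in pure Python. An unsigned integer of N bytes holds values up to 2^(8N) - 1. The names below are hypothetical, and restricting sizes to 1, 2, 4 and 8 bytes (the sizes of NumPy's uint8 to uint64) is an assumption.

```python
# Assumption: byte sizes of NumPy's uint8, uint16, uint32 and uint64.
SKETCH_UINT_SIZES = (1, 2, 4, 8)


def sketch_max_value(nbytes: int) -> int:
    """Largest value of an unsigned integer with nbytes bytes."""
    return 2 ** (8 * nbytes) - 1


def sketch_fit_uint_size(value: int) -> int:
    """Smallest size in bytes whose unsigned range holds value."""
    if value < 0:
        raise ValueError("value must be >= 0")
    for nbytes in SKETCH_UINT_SIZES:
        if value <= sketch_max_value(nbytes):
            return nbytes
    raise ValueError("value does not fit in 8 bytes")
```

For example, 255 fits in 1 byte, while 256 needs 2 bytes.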

seismicrna.core.types.get_uint_dtype(nbytes: int)

NumPy uint data type with the given number of bytes.

seismicrna.core.types.get_uint_size(uint_type: type)

Size of a NumPy uint type in bytes.

seismicrna.core.types.get_uint_type(nbytes: int)

NumPy uint type with the given number of bytes.

Version information for SEISMIC-RNA

seismicrna.core.version.format_version(major: int = 0, minor: int = 19, patch: int = 0, prtag: str = '')
seismicrna.core.version.parse_version(version: str = '0.19.0')

Major and minor versions, patch, and pre-release tag.
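
A sketch of how formatting and parsing could round-trip. The version grammar is an assumption (three dot-separated integers followed by an optional tag of lowercase letters and digits, such as rc1), and both function names are hypothetical.

```python
import re

# Assumed grammar: MAJOR.MINOR.PATCH plus an optional tag such as "rc1".
SKETCH_VERSION = re.compile(r"(\d+)\.(\d+)\.(\d+)([a-z]*\d*)")


def sketch_format_version(major: int, minor: int, patch: int,
                          prtag: str = "") -> str:
    """Join the version components into one string."""
    return f"{major}.{minor}.{patch}{prtag}"


def sketch_parse_version(version: str) -> tuple[int, int, int, str]:
    """Split a version string into major, minor, patch and tag."""
    match = SKETCH_VERSION.fullmatch(version)
    if match is None:
        raise ValueError(f"Malformed version: {version!r}")
    major, minor, patch, prtag = match.groups()
    return int(major), int(minor), int(patch), prtag
```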

seismicrna.core.write.need_write(file: Path, force: bool = False, warn: bool = True)

Determine whether a file must be written.

Parameters:
  • file (Path) – File for which to check the need for writing.

  • force (bool = False) – Force the file to be written, even if it already exists.

  • warn (bool = True) – If the file does not need to be written, then log a warning.

Returns:

Whether the file must be written.

Return type:

bool
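
The decision described above can be sketched as follows. sketch_need_write is a hypothetical name, and using Path.is_file() as the existence check, as well as the wording of the warning, are assumptions.

```python
import logging
from pathlib import Path


def sketch_need_write(file: Path,
                      force: bool = False,
                      warn: bool = True) -> bool:
    """Hypothetical helper: decide whether file must be written."""
    if force or not file.is_file():
        # Forced, or the file is missing: it must be written.
        return True
    if warn:
        logging.getLogger(__name__).warning("File exists: %s", file)
    return False
```

A typical caller would skip an expensive computation when the function returns False.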

seismicrna.core.write.write_mode(force: bool = False, binary: bool = False)

Get the mode in which to open a file for writing.

Parameters:
  • force (bool = False) – Force the file to be written, truncating the file if it exists. If False and the file exists, a FileExistsError will be raised.

  • binary (bool = False) – Write the file in binary mode instead of text mode.

Returns:

The mode argument for the builtin function open().

Return type:

str
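
The documented behavior matches Python's "w" (truncate) and "x" (exclusive creation, raising FileExistsError if the file exists) modes of open(). A minimal sketch on that assumption, with sketch_write_mode as a hypothetical name:

```python
def sketch_write_mode(force: bool = False, binary: bool = False) -> str:
    """Hypothetical helper: build the mode string for open()."""
    # "w" truncates an existing file; "x" refuses to overwrite one.
    mode = "w" if force else "x"
    if binary:
        mode += "b"
    return mode
```

The result can be passed directly as the mode argument of open(), for example open(file, sketch_write_mode(force=True)).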