seismicrna.core package
Subpackages
- seismicrna.core.arg package
- seismicrna.core.batch package
- seismicrna.core.extern package
- seismicrna.core.io package
- seismicrna.core.mu package
- seismicrna.core.ngs package
- seismicrna.core.rel package
- seismicrna.core.rna package
- seismicrna.core.seq package
- seismicrna.core.tests package
Submodules
- seismicrna.core.array.calc_inverse(target: ndarray, require: int = -1, fill: bool = False, fill_rev: bool = False, fill_default: int | None = None, verify: bool = True, what: str = 'array')
Calculate the inverse of target, such that if element i of target has value x, then element x of the inverse has value i.
>>> list(calc_inverse(np.array([3, 2, 7, 5, 1])))
[-1, 4, 1, 0, -1, 3, -1, 2]
>>> list(calc_inverse(np.arange(5)))
[0, 1, 2, 3, 4]
- Parameters:
target (np.ndarray) – Target values; must be a 1-dimensional array of non-negative integers with no duplicate values.
require (int = -1) – Require the inverse to contain all indexes up to and including require (i.e. that its length is at least require + 1); ignored if require is -1; must be ≥ -1.
fill (bool = False) – Fill missing indexes (that do not appear in target).
fill_rev (bool = False) – Fill missing indexes in reverse order instead of forward order; only used if fill is True.
fill_default (int | None = None) – Value with which to fill before the first non-missing value has been encountered; if fill_rev is True, defaults to the length of target, otherwise to -1.
verify (bool = True) – Verify that all target values are unique, non-negative integers. If they are not, ValueError is raised when verify is True, and the results of this function are incorrect when verify is False. Always set to True unless you have already verified that target contains only unique, non-negative integers.
what (str = "array") – What to name the array (only used for error messages).
- Returns:
Inverse of target.
- Return type:
np.ndarray
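The default behavior described above (no require, no fill) can be illustrated with a short NumPy sketch. The name calc_inverse_sketch and its missing argument are hypothetical and are not part of seismicrna; the sketch omits the require, fill, fill_rev, and fill_default options, and its handling of an empty target is an assumption.

```python
import numpy as np


def calc_inverse_sketch(target: np.ndarray, missing: int = -1) -> np.ndarray:
    """Simplified inverse: if target[i] == x, then inverse[x] == i.

    Hypothetical helper for illustration only; it covers just the
    default case of the documented function.
    """
    target = np.asarray(target)
    if target.ndim != 1:
        raise ValueError("target must be 1-dimensional")
    if target.size == 0:
        # Assumption: an empty target has an empty inverse.
        return np.empty(0, dtype=int)
    if not np.issubdtype(target.dtype, np.integer):
        raise ValueError("target must contain integers")
    if target.min() < 0:
        raise ValueError("target must be non-negative")
    if np.unique(target).size != target.size:
        raise ValueError("target must not contain duplicates")
    # Mark every index as missing, then write each position i
    # into slot target[i].
    inverse = np.full(int(target.max()) + 1, missing, dtype=int)
    inverse[target] = np.arange(target.size)
    return inverse
```

Indexes that never appear in target keep the missing value, which matches the -1 entries in the doctest above.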
- seismicrna.core.array.check_naturals(values: ndarray, what: str = 'values')
Raise ValueError if the values are not monotonically increasing natural numbers.
- seismicrna.core.array.ensure_order(array1: ndarray, array2: ndarray, what1: str = 'array1', what2: str = 'array2', gt_eq: bool = False)
Ensure that array1 is ≤ or ≥ array2, element-wise.
- Parameters:
array1 (np.ndarray) – Array 1 (same length as array2).
array2 (np.ndarray) – Array 2 (same length as array1).
what1 (str = "array1") – What array1 contains (only used for error messages).
what2 (str = "array2") – What array2 contains (only used for error messages).
gt_eq (bool = False) – Ensure array1 ≥ array2 if True, otherwise array1 ≤ array2.
- Returns:
Shared length of array1 and array2.
- Return type:
int
- seismicrna.core.array.ensure_same_length(arr1: ndarray, arr2: ndarray, what1: str = 'array1', what2: str = 'array2')
- seismicrna.core.array.find_dims(dims: Sequence[Sequence[str | None]], arrays: Sequence[ndarray], names: Sequence[str] | None = None, nonzero: Iterable[str] | bool = False)
Check the dimensions of the arrays.
- seismicrna.core.array.find_true_dists(booleans: ndarray)
Find the distance to each True element in a boolean array.
- seismicrna.core.array.locate_elements(collection: ndarray, *elements: ndarray, what: str = 'collection', verify: bool = True)
Find the index at which each element of elements occurs in collection.
>>> list(locate_elements(np.array([4, 1, 2, 7, 5, 3]), np.array([5, 2, 5])))
[4, 2, 4]
- Parameters:
collection (np.ndarray) – Collection in which to find each element in elements; must be a 1-dimensional array of non-negative integers with no duplicate values.
*elements (np.ndarray) – Elements to find; must be a 1-dimensional array that is a subset of collection, although duplicate values are permitted.
what (str = "collection") – What to name the collection (only used for error messages).
verify (bool = True) – Verify that all values in collection are unique, non-negative integers and that all items in elements are in collection.
- Returns:
Index of each element of elements in collection.
- Return type:
np.ndarray
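One way to obtain such indexes is through a lookup table that inverts collection, as sketched below. This is an illustration under stated assumptions, not the library's code: locate_elements_sketch is a hypothetical name, it accepts a single elements array rather than *elements, and it always verifies membership.

```python
import numpy as np


def locate_elements_sketch(collection: np.ndarray,
                           elements: np.ndarray) -> np.ndarray:
    """Index in `collection` of each item of `elements` (hypothetical)."""
    collection = np.asarray(collection, dtype=int)
    elements = np.asarray(elements, dtype=int)
    # Lookup table: lookup[x] is the index of x in collection,
    # or -1 if x does not occur in collection.
    size = int(collection.max()) + 1 if collection.size else 0
    lookup = np.full(size, -1, dtype=int)
    lookup[collection] = np.arange(collection.size)
    if elements.size and (elements.min() < 0 or elements.max() >= size):
        raise ValueError("elements contains values not in collection")
    indexes = lookup[elements]
    if np.any(indexes < 0):
        raise ValueError("elements contains values not in collection")
    return indexes
```

Duplicate values in elements simply look up the same slot more than once, which is why they are permitted.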
- seismicrna.core.array.sanitize_values(values: Iterable[int], lower_limit: int, upper_limit: int, whats: str = 'values')
Validate and sort values, and return them as an array.
- seismicrna.core.array.stochastic_round(values: ndarray)
Round values to integers stochastically, so that the probability of rounding each value up equals its fractional part (mantissa).
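A minimal sketch of this kind of rounding, assuming that "mantissa" means the fractional part of each value. The name stochastic_round_sketch and the rng parameter are hypothetical; the documented function takes only values.

```python
import numpy as np


def stochastic_round_sketch(values, rng=None):
    """Round down, then round up with probability equal to the
    fractional part of each value (hypothetical helper)."""
    rng = np.random.default_rng() if rng is None else rng
    values = np.asarray(values, dtype=float)
    floor = np.floor(values)
    frac = values - floor  # in [0, 1)
    # rng.random() is uniform on [0, 1), so a value with frac == 0
    # is never rounded up.
    round_up = rng.random(values.shape) < frac
    return (floor + round_up).astype(int)
```

Rounding this way keeps the expected value of each rounded number equal to the original, which ordinary rounding does not.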
- seismicrna.core.array.triangular(n: int)
The nth triangular number (n ≥ 0): the number of items in an equilateral triangle with n items on each side.
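The nth triangular number has the closed form n(n + 1) / 2. The sketch below uses that formula; triangular_sketch is a hypothetical name, and raising ValueError for negative n is an assumption about error handling.

```python
def triangular_sketch(n: int) -> int:
    """Return 0 + 1 + ... + n using the closed form n(n + 1) / 2."""
    if n < 0:
        raise ValueError(f"n must be >= 0, but got {n}")
    # n * (n + 1) is always even, so integer division is exact.
    return n * (n + 1) // 2
```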
- class seismicrna.core.data.ArrowDataset(data1: MutsDataset, data2: Dataset)
Bases:
MultistepDataset, NarrowDataset, ABC
Dataset made by integrating two datasets from different steps of the workflow, with one section.
- class seismicrna.core.data.Dataset
Bases:
ABC
Dataset comprising batches of data.
- property batch_nums
Numbers of the batches.
- iter_batches()
Yield each batch.
- property num_reads
Number of reads in the dataset.
- abstract property pattern: RelPattern | None
Pattern of mutations to count.
- class seismicrna.core.data.LoadFunction(data_type: type[Dataset], /, *more_types: type[Dataset])
Bases:
object
Function to load a dataset.
- property dataset_types
Types of datasets that this function can load.
- property report_path_auto_fields
Automatic field values of the report file path.
- property report_path_seg_types
Segment types of the report file path.
- class seismicrna.core.data.LoadedDataset(report: BatchedReport, top: Path)
-
Dataset created by loading directly from a Report.
- property end3
3’ end of the section.
- property end5
5’ end of the section.
- get_batch(batch_num: int) ReadBatchIO | MutsBatchIO
Get a specific batch of data.
- abstract classmethod get_batch_type() type[ReadBatchIO | MutsBatchIO]
Type of batch.
- classmethod get_btype_name()
Name of the type of batch.
- abstract classmethod get_report_type() type[BatchedReport]
Type of report.
- property num_batches
Number of batches.
- property ref
Name of the reference.
- property sample
Name of the sample.
- property sect
Name of the section.
- property top
Top-level directory of the dataset.
- class seismicrna.core.data.LoadedMutsDataset(report: BatchedReport, top: Path)
Bases:
LoadedDataset, NarrowDataset, ABC
- property end3
3’ end of the section.
- property end5
5’ end of the section.
- property refseq
Sequence of the reference.
- property sect
Name of the section.
- class seismicrna.core.data.MergedDataset(datasets: Iterable[Dataset])
-
Dataset made by merging one or more constituent datasets.
- abstract classmethod get_dataset_load_func() LoadFunction
Function to load one constituent dataset.
- property pattern
Pattern of mutations to count.
- property ref
Name of the reference.
- property top
Top-level directory of the dataset.
- class seismicrna.core.data.MergedMutsDataset(datasets: Iterable[Dataset])
Bases:
MergedDataset, MutsDataset, ABC
MergedDataset with explicit mutational data.
- property refseq
Sequence of the reference.
- class seismicrna.core.data.MergedUnbiasDataset(datasets: Iterable[Dataset])
Bases:
MergedDataset, UnbiasDataset, ABC
MergedDataset with attributes for correcting observer bias.
- property min_mut_gap
Minimum gap between two mutations.
- property quick_unbias
Use the quick heuristic for unbiasing.
- property quick_unbias_thresh
Consider mutation rates less than or equal to this threshold to be 0 when using the quick heuristic for unbiasing.
- class seismicrna.core.data.MultistepDataset(data1: MutsDataset, data2: Dataset)
Bases:
MutsDataset, ABC
Dataset made by integrating two datasets from different steps of the workflow.
- property end3
3’ end of the section.
- property end5
5’ end of the section.
- abstract classmethod get_dataset1_load_func() LoadFunction
Function to load Dataset 1.
- classmethod get_dataset1_report_file(dataset2_report_file: Path)
Given the report file for Dataset 2, determine the report file for Dataset 1.
- classmethod get_dataset2_load_func()
Function to load Dataset 2.
- classmethod get_report_type()
Type of report.
- property num_batches
Number of batches.
- property ref
Name of the reference.
- property refseq
Sequence of the reference.
- property sample
Name of the sample.
- property sect
Name of the section.
- property top
Top-level directory of the dataset.
- class seismicrna.core.data.MutsDataset
-
Dataset with explicit mutational data.
- property reflen
Length of the reference sequence.
- class seismicrna.core.data.NarrowDataset
Bases:
MutsDataset, ABC
MutsDataset with one section, in contrast to a WideDataset that unites one or more sections.
- property section
Section of the dataset.
- class seismicrna.core.data.TallDataset(sample: str, datasets: Iterable[Dataset])
Bases:
MergedDataset, NarrowDataset, ABC
Dataset made by vertically pooling other datasets from one or more samples aligned to the same reference sequence.
- property end3
3’ end of the section.
- property end5
5’ end of the section.
- property num_batches
Number of batches.
- property sample
Name of the sample.
- property sect
Name of the section.
- class seismicrna.core.data.TallMutsDataset(sample: str, datasets: Iterable[Dataset])
Bases:
TallDataset, MergedMutsDataset, ABC
TallDataset with mutational data.
- class seismicrna.core.data.UnbiasDataset
-
Dataset with attributes for correcting observer bias.
- class seismicrna.core.data.WideDataset(sect: str, clusts: dict | None, datasets: Iterable[Dataset])
Bases:
MergedMutsDataset, ABC
Dataset made by horizontally joining other datasets from one or more sections of the same reference sequence.
- property end3
3’ end of the section.
- property end5
5’ end of the section.
- property num_batches
Number of batches.
- property sample
Name of the sample.
- property sect
Name of the section.
- property section
Section of the dataset.
- property sects
Names of all joined sections.
- seismicrna.core.data.load_datasets(input_path: Iterable[str | Path], load_func: LoadFunction)
Yield a Dataset from each report file in input_path.
- Parameters:
input_path (Iterable[str | Path]) – Input paths to be searched recursively for report files.
load_func (LoadFunction) – Function to load the dataset from each report file.
- class seismicrna.core.header.ClustHeader(*, max_order: int, min_order: int = 1, **kwargs)
Bases:
Header
Header of order and cluster numbers.
- classmethod clustered()
Whether the header has clusters.
- property index
Index of the header.
- classmethod levels()
Levels of the index.
- property max_order
Maximum number of clusters (≥ 1) if clustered, else 0.
- property min_order
Minimum number of clusters (≥ 1) if clustered, else 1.
- class seismicrna.core.header.Header
Bases:
ABC
Header for a table.
- property clusts
Order and cluster numbers of the header.
- property index: Index
Index of the header.
- iter_clust_indexes()
For each cluster in the header, yield an Index or MultiIndex of every column in the header that is part of the cluster.
- classmethod level_keys()
Level keys of the index.
- classmethod level_names()
Level names of the index.
- abstract classmethod levels()
Levels of the index.
- modified(**kwargs)
Return a new header with a possibly modified signature.
- Parameters:
**kwargs – Keyword arguments for modifying the signature of the header. Each argument given here will be passed to make_header and override the attribute (if any) with the same name in this header’s signature. Attributes of this header’s signature that are not overridden will also be passed to make_header.
- Returns:
New header with a possibly modified signature.
- Return type:
Header
- property names
Formatted name of each cluster.
- classmethod num_levels()
Number of levels.
- property orders
Index of order numbers.
- select(**kwargs) Index
Select and return items from the header as an Index.
- property signature
Signature of the header, which will generate an identical header if passed as keyword arguments to make_header.
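The relationship between signature, modified, and make_header can be sketched with a small stand-in class. SketchHeader and make_sketch_header are hypothetical names used only for illustration; the real classes carry indexes and validation that are not modeled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchHeader:
    """Hypothetical stand-in for a header; not the library's class."""
    rels: tuple = ()
    max_order: int = 0
    min_order: int = 1

    @property
    def signature(self) -> dict:
        # Keyword arguments that would rebuild an identical header.
        return {"rels": self.rels,
                "max_order": self.max_order,
                "min_order": self.min_order}

    def modified(self, **kwargs) -> "SketchHeader":
        # Attributes that are not overridden are passed through unchanged.
        return make_sketch_header(**{**self.signature, **kwargs})


def make_sketch_header(*, rels=(), max_order=0, min_order=1):
    """Hypothetical factory playing the role of make_header."""
    return SketchHeader(tuple(rels), max_order, min_order)
```

Because modified goes through the factory, the original header is left untouched and a new one is returned.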
- property size
Number of items in the Header.
- class seismicrna.core.header.RelClustHeader(*, max_order: int, min_order: int = 1, **kwargs)
Bases:
ClustHeader, RelHeader
Header of relationships with order and cluster numbers.
- property index
Index of the header.
- class seismicrna.core.header.RelHeader(*, rels: Iterable[str], **kwargs)
Bases:
Header
Header of relationships.
- classmethod clustered()
Whether the header has clusters.
- property index
Index of the header.
- classmethod levels()
Levels of the index.
- property max_order
Maximum number of clusters (≥ 1) if clustered, else 0.
- property min_order
Minimum number of clusters (≥ 1) if clustered, else 1.
- property rels: ndarray
Relationships.
- property signature
Signature of the header, which will generate an identical header if passed as keyword arguments to make_header.
- seismicrna.core.header.format_clust_name(order: int, clust: int, allow_zero: bool = False)
Format a pair of order and cluster numbers into a name.
- seismicrna.core.header.format_clust_names(clusts: Iterable[tuple[int, int]], allow_zero: bool = False, allow_duplicates: bool = False)
Format pairs of order and cluster numbers into a list of names.
- Parameters:
clusts (Iterable[tuple[int, int]]) – Zero or more pairs of order and cluster numbers.
allow_zero (bool = False) – Allow order and cluster to be 0.
allow_duplicates (bool = False) – Allow order and cluster pairs to be duplicated.
- Returns:
List of names of the pairs of order and cluster numbers.
- Return type:
list[str]
- Raises:
ValueError – If allow_duplicates is False and an order-cluster pair occurs more than once.
- seismicrna.core.header.index_clusts(order: int)
Index of cluster numbers for one order.
- Parameters:
order (int) – Number of clusters (≥ 0)
- Returns:
Index of cluster numbers
- Return type:
pd.Index
- seismicrna.core.header.index_order_clusts(order: int)
List order and cluster numbers as a MultiIndex for one order.
- Parameters:
order (int) – Number of clusters (≥ 0)
- Returns:
Index wherein each item is a tuple of the order of clustering (i.e. number of clusters) and the cluster number.
- Return type:
pd.MultiIndex
- seismicrna.core.header.index_orders(max_order: int, min_order: int = 1)
Index of order numbers from min_order to max_order.
- Parameters:
max_order (int) – Maximum number of clusters (≥ 0)
min_order (int = 1) – Minimum number of clusters (≥ 1)
- Returns:
Index of order numbers
- Return type:
pd.Index
- seismicrna.core.header.index_orders_clusts(max_order: int, min_order: int = 1)
List order and cluster numbers as a MultiIndex for every order from min_order to max_order.
- Parameters:
max_order (int) – Maximum number of clusters (≥ 0)
min_order (int = 1) – Minimum number of clusters (≥ 1)
- Returns:
Index wherein each item is a tuple of the order of clustering (i.e. number of clusters) and the cluster number.
- Return type:
pd.MultiIndex
- seismicrna.core.header.list_clusts(order: int)
List all cluster numbers for one order.
- Parameters:
order (int) – Number of clusters (≥ 0)
- Returns:
List of cluster numbers.
- Return type:
list[int]
- seismicrna.core.header.list_order_clusts(order: int)
List order and cluster numbers as 2-tuples for one order.
- Parameters:
order (int) – Number of clusters (≥ 0)
- Returns:
List wherein each item is a tuple of the order of clustering (i.e. number of clusters) and the cluster number.
- Return type:
list[tuple[int,int]]
- seismicrna.core.header.list_orders(max_order: int, min_order: int = 1)
List order numbers from min_order to max_order.
- Parameters:
max_order (int) – Maximum number of clusters (≥ 0)
min_order (int = 1) – Minimum number of clusters (≥ 1)
- Returns:
List of numbers of clusters
- Return type:
list[int]
- seismicrna.core.header.list_orders_clusts(max_order: int, min_order: int = 1)
List order and cluster numbers as 2-tuples for every order from min_order to max_order.
- Parameters:
max_order (int) – Maximum number of clusters (≥ 0)
min_order (int = 1) – Minimum number of clusters (≥ 1)
- Returns:
List wherein each item is a tuple of the order of clustering (i.e. number of clusters) and the cluster number.
- Return type:
list[tuple[int,int]]
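The listing functions above build on one another, which the following sketch tries to show. It assumes that cluster numbers run from 1 to order, and it covers only positive orders: how the documented functions treat an order of 0 is not modeled. All names ending in _sketch are hypothetical.

```python
def list_clusts_sketch(order: int) -> list:
    """Cluster numbers for one order (assumed to be 1..order)."""
    if order < 1:
        raise ValueError("this sketch handles only order >= 1")
    return list(range(1, order + 1))


def list_order_clusts_sketch(order: int) -> list:
    """(order, cluster) pairs for one order."""
    return [(order, clust) for clust in list_clusts_sketch(order)]


def list_orders_sketch(max_order: int, min_order: int = 1) -> list:
    """Order numbers from min_order to max_order, inclusive."""
    if not 1 <= min_order <= max_order:
        raise ValueError("this sketch needs 1 <= min_order <= max_order")
    return list(range(min_order, max_order + 1))


def list_orders_clusts_sketch(max_order: int, min_order: int = 1) -> list:
    """(order, cluster) pairs for every order in the range."""
    return [pair
            for order in list_orders_sketch(max_order, min_order)
            for pair in list_order_clusts_sketch(order)]
```

For example, with max_order=3 and min_order=2 the sketch yields the pairs (2, 1), (2, 2), (3, 1), (3, 2), (3, 3).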
- seismicrna.core.header.make_header(*, rels: Iterable[str] = (), max_order: int = 0, min_order: int = 1)
Make a new Header of an appropriate type.
- Parameters:
rels (Iterable[str]) – Relationships in the header.
max_order (int = 0) – Maximum number of clusters (≥ 1), or 0 if not clustered.
min_order (int = 1) – Minimum number of clusters (≥ 1), or 1 if not clustered.
- Returns:
Header of the appropriate type.
- Return type:
Header
- seismicrna.core.header.parse_header(index: Index | MultiIndex)
Parse an Index into a Header of an appropriate type.
- Parameters:
index (pd.Index | pd.MultiIndex) – Index to parse.
- Returns:
New Header whose index is index.
- Return type:
Header
- seismicrna.core.header.validate_order_clust(order: int, clust: int, allow_zero: bool = False)
Validate a pair of order and cluster numbers.
- Parameters:
- Returns:
If the order and cluster numbers form a valid pair.
- Return type:
bool
- Raises:
TypeError – If order or cluster is not an integer.
ValueError – If the order and cluster numbers do not form a valid pair.
Core – Logging Module
Purpose
Central manager of logging.
- class seismicrna.core.logs.AnsiCode
Bases:
object
Format text with ANSI codes.
- BLUE = 94
- BOLD = 1
- CODES = (0, 1, 4, 91, 92, 93, 94, 95, 96)
- CYAN = 96
- END = 0
- GREEN = 92
- PURPLE = 95
- RED = 91
- ULINE = 4
- YELLOW = 93
- classmethod end()
Convenience function to end formatting.
- class seismicrna.core.logs.ColorFormatter(fmt=None, datefmt=None, style='%', validate=True, *, defaults=None)
Bases:
Formatter
- ansi_codes = {10: (94,), 20: (96,), 30: (93,), 40: (91,), 50: (95, 1)}
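The ansi_codes mapping pairs each standard logging level number (10 = DEBUG through 50 = CRITICAL) with the AnsiCode values listed above. The sketch below shows one plausible way a formatter could apply such a mapping; SketchColorFormatter is a hypothetical class, and the way escape sequences are assembled is an assumption rather than the library's implementation.

```python
import logging


class SketchColorFormatter(logging.Formatter):
    """Hypothetical formatter that wraps messages in ANSI codes."""

    # Level number -> ANSI codes, copied from the documented mapping.
    ansi_codes = {10: (94,), 20: (96,), 30: (93,), 40: (91,), 50: (95, 1)}

    def format(self, record: logging.LogRecord) -> str:
        text = super().format(record)
        codes = self.ansi_codes.get(record.levelno, ())
        if not codes:
            # Levels without a mapping are left uncolored.
            return text
        start = "".join(f"\033[{code}m" for code in codes)
        # Code 0 (AnsiCode.END) resets all formatting.
        return f"{start}{text}\033[0m"
```

Under this sketch an ERROR record is wrapped in code 91 (red), and a CRITICAL record in codes 95 and 1 (purple and bold).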
- seismicrna.core.logs.exc_info()
Whether to log exception information.
- seismicrna.core.logs.get_config()
Get the configuration parameters of a logger.
- seismicrna.core.logs.get_top_logger()
Return the top-level logger.
- seismicrna.core.logs.get_verbosity(verbose: int = 0, quiet: int = 0)
Get the logging level based on the verbose and quiet arguments.
- Parameters:
verbose (int [0, 2]) – 0 (): Log only warnings and errors; 1 (-v): Also log status updates; 2 (-vv): Also log detailed information (useful for debugging)
quiet (int [0, 2]) – 0 (): Suppress only status updates and detailed information; 1 (-q): Also suppress warnings; 2 (-qq): Also suppress non-critical error messages (discouraged)
Giving both verbose and quiet flags causes the verbosity to default to verbose=0, quiet=0.
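The rules above can be sketched as a mapping onto the standard library's logging levels. That mapping is an assumption (the documentation does not state which level constants are returned), and get_verbosity_sketch is a hypothetical name.

```python
import logging


def get_verbosity_sketch(verbose: int = 0, quiet: int = 0) -> int:
    """Map verbose/quiet counts to a logging level (assumed mapping)."""
    if verbose and quiet:
        # Both flags given: fall back to the default verbosity.
        verbose = quiet = 0
    if verbose >= 2:
        return logging.DEBUG      # detailed information
    if verbose == 1:
        return logging.INFO       # status updates
    if quiet >= 2:
        return logging.CRITICAL   # only critical errors
    if quiet == 1:
        return logging.ERROR      # suppress warnings
    return logging.WARNING        # default: warnings and errors
```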
- seismicrna.core.logs.log_exceptions(logging_method: Callable, default: Callable | None)
If any exception occurs, catch it and return an empty list.
- seismicrna.core.logs.set_config(verbose: int, quiet: int, log_file: str | None = None, log_color: bool = True)
Configure the main logger with handlers and verbosity.
Path Core Module
Most of the steps in SEISMIC-RNA produce files that other steps use. For example, the ‘align’ step writes alignment map (BAM) files, from which the ‘relate’ step writes relation vector files, which both the ‘mask’ and ‘table’ steps use.
Steps that pass files to each other must agree on:
- the path to the file, so that the second step can find the file
- the meaning of each part of the path, so that the second step can parse information contained in the path
Although these path conventions could be written separately in each subpackage or module, this strategy is not ideal for several reasons:
- It would risk inconsistencies among the modules, causing bugs.
- Changing the conventions would require modifying every module, which would be not only tedious but also risky for the first reason.
- Defining all the conventions in one place would reduce the size of the code base, improving readability, maintainability, and distribution.
This module defines all file path conventions for all other modules.
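To illustrate why shared conventions help, the sketch below defines a few path segments once and uses them both to build a path and to parse it back into fields. Everything here is hypothetical: the segment templates, the file name batch-{batch}.txt, and the functions build_sketch and parse_sketch are illustrative stand-ins and do not reflect the module's actual Segment and Field classes.

```python
import re
import string
from pathlib import Path

# Hypothetical conventions, defined once and shared by every step.
SAMPLE_DIR = "{sample}"
REF_DIR = "{ref}"
BATCH_FILE = "batch-{batch}.txt"


def _segment_regex(template: str):
    """Turn a template such as 'batch-{batch}.txt' into a regex."""
    pattern = ""
    for literal, field, _, _ in string.Formatter().parse(template):
        pattern += re.escape(literal)
        if field:
            pattern += f"(?P<{field}>[^/]+)"
    return re.compile(pattern)


def build_sketch(top, *segments: str, **fields) -> Path:
    """Writer side: fill each segment template with field values."""
    return Path(top).joinpath(*(seg.format(**fields) for seg in segments))


def parse_sketch(path, *segments: str) -> dict:
    """Reader side: recover the field values from the last segments."""
    parts = Path(path).parts[-len(segments):]
    if len(parts) != len(segments):
        raise ValueError(f"{path} has too few segments")
    fields = {}
    for template, part in zip(segments, parts):
        match = _segment_regex(template).fullmatch(part)
        if match is None:
            raise ValueError(f"{part!r} does not match {template!r}")
        fields.update(match.groupdict())
    return fields
```

Because the writing step and the reading step import the same templates, a change to a template automatically applies to both. Note that parsed values come back as strings in this sketch.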
- class seismicrna.core.path.Field(dtype: type[str | int | Path], options: Iterable = (), is_ext: bool = False)
Bases:
object
- class seismicrna.core.path.Path(*seg_types: Segment)
Bases:
object
- exception seismicrna.core.path.PathTypeError
-
Use of the wrong type of path or segment
- exception seismicrna.core.path.PathValueError
Bases:
PathError, ValueError
Invalid value of a path segment field
- class seismicrna.core.path.Segment(segment_name: str, field_types: dict[str, Field], *, order: int = 0, frmt: str | None = None)
Bases:
object
- property ext_type
Type of the segment’s file extension, or None if it has no file extension.
- seismicrna.core.path.build(*segment_types: Segment, **field_values: Any)
Return a pathlib.Path from the given segment types and field values.
- seismicrna.core.path.builddir(*segment_types: Segment, **field_values: Any)
Build the path and create it on the file system as a directory if it does not already exist.
- seismicrna.core.path.buildpar(*segment_types: Segment, **field_values: Any)
Build a path and create its parent directory if it does not already exist.
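The difference between builddir and buildpar is which directory gets created. The sketch below shows that distinction with pathlib alone; builddir_sketch and buildpar_sketch are hypothetical helpers that take an already built path instead of segment types and field values, and report.json is a placeholder file name.

```python
import tempfile
from pathlib import Path


def builddir_sketch(path) -> Path:
    """Create `path` itself as a directory (with parents) if needed."""
    path = Path(path)
    path.mkdir(parents=True, exist_ok=True)
    return path


def buildpar_sketch(path) -> Path:
    """Create only the parent directory of `path`; return `path`."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    return path


with tempfile.TemporaryDirectory() as top:
    out_dir = builddir_sketch(Path(top, "sample", "ref"))
    out_file = buildpar_sketch(Path(top, "sample", "other", "report.json"))
    dir_exists = out_dir.is_dir()             # the directory was created
    parent_exists = out_file.parent.is_dir()  # so was the file's parent
    file_exists = out_file.exists()           # the file itself was not
```

The second form suits output files: the caller receives a path it can write to immediately, without the file being created in advance.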
- seismicrna.core.path.cast_path(input_path: Path, input_segments: Sequence[Segment], output_segments: Sequence[Segment], **override: Any)
Cast input_path made of input_segments to a new path made of output_segments.
- Parameters:
input_path (

pathlib.Path) – Input path from which to take the path fields.
input_segments (Sequence[Segment]) – Path segments to use to determine the fields in input_path.
output_segments (Sequence[Segment]) – Path segments to use to determine the fields in output_path.
**override (Any) – Override and supplement the fields in input_path.
- Returns:
Path comprising output_segments made of fields in input_path (as determined by input_segments).
- Return type:
- seismicrna.core.path.create_path_type(*segment_types: Segment)
Create and cache a Path instance from the segment types.
- seismicrna.core.path.deduplicated(func: Callable)
Decorate a Path generator to yield non-redundant paths.
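The idea behind such a decorator can be sketched as follows. This is a minimal illustration, assuming that paths are hashable and that the first occurrence of each path should be kept in order; the names deduplicated_sketch and example_paths are hypothetical, and the library's own implementation may differ.

```python
from functools import wraps
from pathlib import Path
from typing import Callable, Iterable, Iterator


def deduplicated_sketch(func: Callable[..., Iterable[Path]]):
    """Wrap a path generator so that each distinct path is yielded once."""

    @wraps(func)
    def wrapper(*args, **kwargs) -> Iterator[Path]:
        seen = set()  # paths that have already been yielded
        for path in func(*args, **kwargs):
            if path not in seen:
                seen.add(path)
                yield path

    return wrapper


@deduplicated_sketch
def example_paths():
    # Hypothetical generator that produces one redundant path.
    yield Path("out/a.txt")
    yield Path("out/b.txt")
    yield Path("out/a.txt")  # duplicate, skipped by the wrapper
```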
- seismicrna.core.path.fill_whitespace(path: str | Path, fill: str = '_')
Replace all whitespace in path with fill.
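A minimal sketch of this behavior, assuming that each whitespace character is replaced individually and that the result has the same type as the input (neither detail is confirmed by the description above); fill_whitespace_sketch is a hypothetical name.

```python
import re
from pathlib import Path
from typing import Union


def fill_whitespace_sketch(path: Union[str, Path], fill: str = "_"):
    """Replace every whitespace character in `path` with `fill`."""
    filled = re.sub(r"\s", fill, str(path))
    # Return the same type that was given (str or a Path subclass).
    return type(path)(filled)
```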
- seismicrna.core.path.find_files(path: str | Path, segments: Sequence[Segment])
Yield all files that match a sequence of path segments. The behavior depends on what path is:
If it is a file, then yield path if it matches the segments; otherwise, yield nothing.
If it is a directory, then search it recursively and yield every matching file in the directory and its subdirectories.
- Parameters:
path (str | pathlib.Path) – Path of a file to check or a directory to search recursively.
segments (Sequence[Segment]) – Sequence(s) of Path segments to check if each file matches.
- Returns:
Paths of files matching the segments.
- Return type:
Generator[Path,Any,None]
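The file-versus-directory behavior described above can be sketched with pathlib alone. In this illustration the segment matching is replaced by a hypothetical predicate argument named matches, since how segments are matched is not shown here; find_files_sketch is not the library function.

```python
from pathlib import Path
from typing import Callable, Iterator


def find_files_sketch(path: Path,
                      matches: Callable[[Path], bool]) -> Iterator[Path]:
    """Yield `path` if it is a matching file, or every matching file
    below `path` if it is a directory."""
    if path.is_dir():
        # Search the directory and all of its subdirectories.
        for child in sorted(path.rglob("*")):
            if child.is_file() and matches(child):
                yield child
    elif path.is_file() and matches(path):
        yield path
```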
- seismicrna.core.path.find_files_chain(paths: Iterable[str | Path], segments: Sequence[Segment])
Yield from find_files called on every path in paths.
- seismicrna.core.path.get_fields_in_seg_types(*segment_types: Segment) dict[str, Field]
Get all fields among the given segment types.
- seismicrna.core.path.parse(path: str | Path, /, *segment_types: Segment)
Return the fields of a path based on the segment types.
- seismicrna.core.path.parse_top_separate(path: str | Path, /, *segment_types: Segment)
Return the fields of a path, and the top field separately.
- seismicrna.core.path.path_matches(path: str | Path, segments: Sequence[Segment])
Check if a path matches a sequence of path segments.
- Parameters:
path (str | pathlib.Path) – Path of the file/directory.
segments (Sequence[Segment]) – Sequence of path segments to check if the file matches.
- Returns:
Whether the path matches any given sequence of path segments.
- Return type:
- seismicrna.core.path.randdir(parent: str | Path | None = None, prefix: str = '', suffix: str = '', length: int = 8, max_tries: int = 1000)
Build a path of a new directory that does not exist and create it on the file system.
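One way to guarantee that the directory is new is to attempt the creation itself and retry on a name collision. The sketch below follows that approach; the choice of alphabet, the default parent (the system temporary directory), and the error raised after max_tries attempts are all assumptions, and randdir_sketch is a hypothetical name.

```python
import secrets
import string
import tempfile
from pathlib import Path
from typing import Optional


def randdir_sketch(parent: Optional[Path] = None, prefix: str = "",
                   suffix: str = "", length: int = 8,
                   max_tries: int = 1000) -> Path:
    """Create and return a new directory with a random name."""
    parent = Path(parent) if parent is not None else Path(tempfile.gettempdir())
    alphabet = string.ascii_lowercase + string.digits
    for _ in range(max_tries):
        name = "".join(secrets.choice(alphabet) for _ in range(length))
        path = parent / f"{prefix}{name}{suffix}"
        try:
            # exist_ok=False makes creation fail if the name is taken,
            # so an existing directory is never reused.
            path.mkdir(parents=True, exist_ok=False)
        except FileExistsError:
            continue
        return path
    raise OSError(f"No new directory created in {parent} "
                  f"after {max_tries} tries")
```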
- seismicrna.core.path.sanitize(path: str | Path, strict: bool = False)
Sanitize a path-like object by ensuring it is an absolute path, eliminating symbolic links and redundant path separators/references, and returning a Path object.
- Parameters:
path (str | pathlib.Path) – Path to sanitize.
strict (bool = False) – Require the path to exist and contain no symbolic link loops.
- Returns:
Absolute, normalized, symlink-free path.
- Return type:
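The standard library already provides the operations described above, so a minimal sketch can delegate to pathlib.Path.resolve. Whether the library does exactly this is an assumption; sanitize_sketch is a hypothetical name.

```python
from pathlib import Path
from typing import Union


def sanitize_sketch(path: Union[str, Path], strict: bool = False) -> Path:
    """Return an absolute, normalized, symlink-free version of `path`."""
    # Path.resolve makes the path absolute, follows symbolic links, and
    # collapses "." and ".." components and repeated separators; with
    # strict=True it raises an OSError if the path does not exist.
    return Path(path).resolve(strict=strict)
```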
- seismicrna.core.path.transpath(to_dir: str | Path, from_dir: str | Path, path: str | Path, strict: bool = False)
Return the path that would be produced by moving path from from_dir to to_dir (but do not actually move the path on the file system). This function does not require that any of the given paths exist, unless strict is True.
- Parameters:
to_dir (str | pathlib.Path) – Directory to which to move path.
from_dir (str | pathlib.Path) – Directory from which to move path; must contain path but not necessarily be the direct parent directory of path.
path (str | pathlib.Path) – Path to move; can be a file or directory.
strict (bool = False) – Require that all paths exist and contain no symbolic link loops.
- Returns:
Hypothetical path after moving path from from_dir to to_dir.
- Return type:
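The computation amounts to taking the part of path below from_dir and appending it to to_dir. A minimal sketch, which leaves out the sanitizing and the strict option and uses the hypothetical name transpath_sketch:

```python
from pathlib import Path
from typing import Union


def transpath_sketch(to_dir: Union[str, Path],
                     from_dir: Union[str, Path],
                     path: Union[str, Path]) -> Path:
    """Return where `path` would be after moving it from `from_dir`
    to `to_dir`, without touching the file system."""
    # relative_to raises ValueError if `path` is not inside `from_dir`.
    relative = Path(path).relative_to(from_dir)
    return Path(to_dir) / relative
```

For example, moving /tmp/work/sample/ref/file.bam from /tmp/work to /out gives /out/sample/ref/file.bam.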
- seismicrna.core.path.transpaths(to_dir: str | Path, *paths: str | Path, strict: bool = False)
Return all paths that would be produced by moving all paths in paths from their longest common sub-path to to_dir (but do not actually move the paths on the file system). This function does not require that any of the given paths exist, unless strict is True.
- Parameters:
to_dir (str | pathlib.Path) – Directory to which to move every path in paths.
*paths (str | pathlib.Path) – Paths to move; can be files or directories. A common sub-path must exist among all of these paths.
strict (bool = False) – Require that all paths exist and contain no symbolic link loops.
- Returns:
Hypothetical paths after moving all paths in paths to to_dir.
- Return type:
tuple[pathlib.Path,]
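The longest common sub-path can be found with os.path.commonpath. The sketch below combines it with pathlib; it omits sanitizing and the strict option, and transpaths_sketch is a hypothetical name rather than the library function.

```python
import os
from pathlib import Path


def transpaths_sketch(to_dir, *paths):
    """Return where each path would be after moving all paths from
    their longest common sub-path to `to_dir`."""
    if not paths:
        return tuple()
    # commonpath raises ValueError if the paths share no sub-path,
    # e.g. when absolute and relative paths are mixed.
    common = Path(os.path.commonpath([str(p) for p in paths]))
    return tuple(Path(to_dir) / Path(p).relative_to(common) for p in paths)
```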
- class seismicrna.core.report.BatchedRefseqReport(**kwargs: Any | Callable[[Report], Any])
Bases:
BatchedReport, RefseqReport, ABC
Convenience class used as a base for several Report classes.
- class seismicrna.core.report.BatchedReport(**kwargs: Any | Callable[[Report], Any])
Report with a number of data batches (one file per batch).
- classmethod batch_types() dict[str, type[ReadBatchIO]]
Type(s) of batch(es) for the report, keyed by name.
- abstract classmethod fields()
All fields of the report.
- classmethod get_batch_type(btype: str | None = None) type[ReadBatchIO]
Return a valid type of batch based on its name.
- class seismicrna.core.report.Field(key: str, title: str, dtype: type, default: Any | None = None, *, iconv: Callable[[Any], Any] | None = None, oconv: Callable[[Any], Any] | None = None)
Bases:
object
Field of a report.
- default
- dtype
- iconv
- key
- oconv
- title
- class seismicrna.core.report.OptionField(option: Option, **kwargs)
Bases:
Field
Field based on a command line option.
- default
- dtype
- iconv
- key
- oconv
- title
- class seismicrna.core.report.RefseqReport(**kwargs: Any | Callable[[Report], Any])
Report associated with a reference sequence file.
- abstract classmethod fields()
All fields of the report.
- class seismicrna.core.report.Report(**kwargs: Any | Callable[[Report], Any])
Abstract base class for a report from a step.
- classmethod field_keys()
Keys of all fields of the report.
- abstract classmethod fields()
All fields of the report.
- classmethod from_dict(odata: dict[str, Any])
Convert a dict of raw values (keyed by the titles of their fields) into a dict of encoded values (keyed by the keys of their fields), from which a new Report is instantiated.
- get_field(field: Field, missing_ok: bool = False)
Return the value of a field of the report using the field instance directly, not its key.
- to_dict()
Return a dict of raw values of the fields, keyed by the titles of their fields.
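The round trip between keys and titles can be illustrated with a stripped-down field type. Everything in this sketch is hypothetical: the FieldSketch class, the two example fields and their titles, and the assumption that oconv converts a value for output while iconv converts it back on input.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class FieldSketch:
    key: str    # name used inside the program
    title: str  # name used in the report file
    iconv: Optional[Callable[[Any], Any]] = None  # file value -> program value
    oconv: Optional[Callable[[Any], Any]] = None  # program value -> file value


# Hypothetical fields, for illustration only.
FIELDS = [
    FieldSketch("sample", "Name of Sample"),
    FieldSketch("n_reads", "Number of Reads", iconv=int, oconv=str),
]


def to_dict_sketch(values: dict) -> dict:
    """Map {key: value} to {title: output value}."""
    return {f.title: (f.oconv(values[f.key]) if f.oconv else values[f.key])
            for f in FIELDS}


def from_dict_sketch(odata: dict) -> dict:
    """Map {title: output value} back to {key: value}."""
    return {f.key: (f.iconv(odata[f.title]) if f.iconv else odata[f.title])
            for f in FIELDS}
```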
- seismicrna.core.report.calc_dt_minutes(began: datetime, ended: datetime)
Calculate the time taken in minutes.
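A minimal sketch of the calculation, assuming no rounding is applied (the library may round the result); calc_dt_minutes_sketch is a hypothetical name.

```python
from datetime import datetime


def calc_dt_minutes_sketch(began: datetime, ended: datetime) -> float:
    """Elapsed time from `began` to `ended`, in minutes."""
    return (ended - began).total_seconds() / 60.0
```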
- seismicrna.core.report.fields()
- seismicrna.core.report.iconv_dict_str_dict_int_dict_int_int(mapping: dict[Any, dict[Any, dict[Any, Any]]]) dict[str, dict[int, dict[int, int]]]
- seismicrna.core.report.oconv_array_int(nums: ndarray)
- seismicrna.core.run.run_func(logging_method: Callable, default: Callable | None = <class 'list'>, with_tmp: bool = False, pass_keep_tmp: bool = False, *args, **kwargs)
Decorator for a run function.
- seismicrna.core.stats.calc_beta_mv(alpha: float, beta: float)
Find the mean and variance of a beta distribution from its alpha and beta parameters.
- seismicrna.core.stats.calc_beta_params(mean: float, variance: float)
Find the alpha and beta parameters of a beta distribution from its mean and variance.
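Both conversions follow from the standard moments of the beta distribution: the mean is α / (α + β) and the variance is αβ / ((α + β)² (α + β + 1)). The sketch below implements these textbook formulas and their inverse; the function names are hypothetical, and input validation is left out.

```python
def beta_mv_sketch(alpha: float, beta: float):
    """Mean and variance of a beta distribution."""
    total = alpha + beta
    mean = alpha / total
    variance = alpha * beta / (total ** 2 * (total + 1.0))
    return mean, variance


def beta_params_sketch(mean: float, variance: float):
    """Alpha and beta of a beta distribution (method of moments).

    Requires 0 < mean < 1 and 0 < variance < mean * (1 - mean).
    """
    # total = alpha + beta, recovered from the variance formula.
    total = mean * (1.0 - mean) / variance - 1.0
    return mean * total, (1.0 - mean) * total
```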
- seismicrna.core.stats.calc_dirichlet_mv(alpha: ndarray)
Find the means and variances of a Dirichlet distribution from its concentration parameters.
- Parameters:
alpha (np.ndarray) – Concentration parameters of the Dirichlet distribution.
- Returns:
Means and variances of the Dirichlet distribution.
- Return type:
tuple[np.ndarray,np.ndarray]
- seismicrna.core.stats.calc_dirichlet_params(mean: ndarray, variance: ndarray)
Find the concentration parameters of a Dirichlet distribution from its mean and variance.
- Parameters:
mean (np.ndarray) – Means.
variance (np.ndarray) – Variances.
- Returns:
Concentration parameters.
- Return type:
np.ndarray
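For a Dirichlet distribution with concentration parameters α and total α₀ = Σα, the standard moments are mean_i = α_i / α₀ and variance_i = mean_i (1 − mean_i) / (α₀ + 1). The sketch below uses plain Python lists rather than NumPy arrays to stay dependency-free; the function names are hypothetical, and averaging the per-component estimates of α₀ is an assumption made for this illustration.

```python
def dirichlet_mv_sketch(alpha):
    """Means and variances of a Dirichlet distribution."""
    total = sum(alpha)
    means = [a / total for a in alpha]
    variances = [m * (1.0 - m) / (total + 1.0) for m in means]
    return means, variances


def dirichlet_params_sketch(means, variances):
    """Concentration parameters from means and variances."""
    # Each component gives its own estimate of the total concentration;
    # for exact Dirichlet moments they all agree, so average them.
    totals = [m * (1.0 - m) / v - 1.0 for m, v in zip(means, variances)]
    total = sum(totals) / len(totals)
    return [m * total for m in means]
```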
- class seismicrna.core.task.Task(func: Callable)
Bases:
object
Wrap a parallelizable task in a try-except block so that if it fails, it just returns None rather than crashing the other tasks being run in parallel.
- __call__(*args, **kwargs)
Call the task’s function in a try-except block, return the result if it succeeds, and return None otherwise.
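A minimal sketch of such a wrapper. Logging the error before returning None is an assumption, and TaskSketch is a hypothetical name, not the library class.

```python
import logging
from typing import Any, Callable

logger = logging.getLogger(__name__)


class TaskSketch:
    """Wrap a function so that a failure returns None instead of raising."""

    def __init__(self, func: Callable):
        self._func = func

    def __call__(self, *args, **kwargs) -> Any:
        try:
            return self._func(*args, **kwargs)
        except Exception as error:
            # Report the failure but let the other tasks keep running.
            name = getattr(self._func, "__name__", repr(self._func))
            logger.error("Task %s failed: %s", name, error)
            return None
```

Note that a caller cannot tell a failed task from one that legitimately returned None, so this pattern suits functions whose successful results are never None.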
- seismicrna.core.task.as_list_of_tuples(args: Iterable[Any])
Given an iterable of arguments, return a list of 1-item tuples, each containing one of the given arguments. This function is useful for creating a list of tuples to pass to the args parameter of dispatch.
- seismicrna.core.task.dispatch(funcs: list[Callable] | Callable, max_procs: int, parallel: bool, *, hybrid: bool = False, pass_n_procs: bool = True, args: list[tuple] | tuple = (), kwargs: dict[str, Any] | None = None)
Run one or more tasks in series or in parallel, depending on the number of tasks, the maximum number of processes, and whether tasks are allowed to be run in parallel.
- Parameters:
funcs (list[Callable] | Callable) – The function(s) to run. Can be a list of functions or a single function that is not in a list. If a single function, then if args is a tuple, it is called once with that tuple as its positional arguments; and if args is a list of tuples, it is called for each tuple of positional arguments in args.
max_procs (int) – See docstring for get_num_parallel.
parallel (bool) – See docstring for get_num_parallel.
hybrid (bool = False) – See docstring for get_num_parallel.
pass_n_procs (bool = True) – Whether to pass the number of processes to the function as the keyword argument n_procs.
args (list[tuple] | tuple = ()) – Positional arguments to pass to each function in funcs. Can be a list of tuples of positional arguments or a single tuple that is not in a list. If a single tuple, then each function receives args as positional arguments. If a list, then args must be the same length as funcs; each function funcs[i] receives args[i] as positional arguments.
kwargs (dict[str, Any] | None = None) – Keyword arguments to pass to every function call.
- Returns:
List of the return value of each run.
- Return type:
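The rules for pairing funcs with args can be sketched in a serial-only form. This illustration deliberately leaves out parallel execution and the n_procs keyword argument; dispatch_serial_sketch and as_list_of_tuples_sketch are hypothetical names, and the ValueError on a length mismatch is an assumption.

```python
def as_list_of_tuples_sketch(args):
    """Wrap each argument in a 1-item tuple."""
    return [(arg,) for arg in args]


def dispatch_serial_sketch(funcs, args=(), kwargs=None):
    """Run the calls one after another, pairing functions with
    positional arguments as described for `funcs` and `args`."""
    kwargs = dict(kwargs or {})
    if callable(funcs):
        # One function: call it once per tuple in a list of tuples,
        # or exactly once if args is a single tuple.
        arg_list = args if isinstance(args, list) else [args]
        func_list = [funcs] * len(arg_list)
    else:
        func_list = list(funcs)
        if isinstance(args, list):
            if len(args) != len(func_list):
                raise ValueError("args and funcs must have the same length")
            arg_list = args
        else:
            # One tuple: every function receives the same arguments.
            arg_list = [args] * len(func_list)
    return [func(*a, **kwargs) for func, a in zip(func_list, arg_list)]
```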
- seismicrna.core.task.fmt_func_args(func: Callable, *args, **kwargs)
Format the name and arguments of a function as a string.
- seismicrna.core.task.get_num_parallel(n_tasks: int, max_procs: int, parallel: bool, hybrid: bool = False) tuple[int, int]
Determine how to parallelize the tasks.
- Parameters:
n_tasks (int) – Number of tasks to parallelize. Must be ≥ 1.
max_procs (int) – Maximum number of processes to run at one time. Must be ≥ 1.
parallel (bool) – Whether multiple tasks may be run in parallel. If False, then the number of tasks to run in parallel is set to 1, but the number of processes to run for each task may be > 1.
hybrid (bool = False) – Whether to allow both multiple tasks to run in parallel and, at the same time, each task to run multiple processes in parallel.
- Returns:
Number of tasks to run in parallel. Always ≥ 1.
Number of processes to run for each task. Always ≥ 1.
- Return type:
tuple[int,int]
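One plausible policy that satisfies the description above is sketched below. The exact rules the library applies are not stated here, so the branch logic (in particular, giving each task one process when hybrid is False) is an assumption, and get_num_parallel_sketch is a hypothetical name.

```python
def get_num_parallel_sketch(n_tasks: int, max_procs: int,
                            parallel: bool, hybrid: bool = False):
    """Return (tasks to run in parallel, processes per task)."""
    if n_tasks < 1 or max_procs < 1:
        raise ValueError("n_tasks and max_procs must both be >= 1")
    if parallel and n_tasks > 1:
        # Never run more tasks at once than there are processes.
        n_tasks_parallel = min(n_tasks, max_procs)
        # In hybrid mode, divide the processes evenly among the
        # parallel tasks; otherwise give each task one process.
        n_procs_per_task = max_procs // n_tasks_parallel if hybrid else 1
    else:
        # Run tasks one at a time and give each all the processes.
        n_tasks_parallel = 1
        n_procs_per_task = max_procs
    return n_tasks_parallel, n_procs_per_task
```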
- seismicrna.core.tmp.release_to_out(out_dir: Path, release_dir: Path, initial_path: Path)
Move temporary path(s) to the output directory.
- seismicrna.core.tmp.with_tmp_dir(pass_keep_tmp: bool)
Make a temporary directory, then delete it after the run.
- seismicrna.core.types.fit_uint_type(value: int)
Smallest unsigned int type that will fit the value.
- seismicrna.core.types.get_byte_dtype(nchars: int)
NumPy byte type with the given number of characters.
- seismicrna.core.types.get_max_uint(uint_type: type)
Maximum value of a NumPy unsigned integer type.
- seismicrna.core.types.get_max_value(nbytes: int)
Get the maximum value of an unsigned integer of N bytes.
- seismicrna.core.types.get_uint_dtype(nbytes: int)
NumPy uint data type with the given number of bytes.
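The arithmetic behind these helpers is simple: an unsigned integer of N bytes can hold values up to 2^(8N) − 1. The sketch below works with byte widths instead of NumPy types to stay dependency-free, and it assumes the candidate widths are 1, 2, 4, and 8 bytes; both function names are hypothetical.

```python
def get_max_value_sketch(nbytes: int) -> int:
    """Largest unsigned integer that fits in `nbytes` bytes."""
    return 2 ** (8 * nbytes) - 1


def fit_uint_nbytes_sketch(value: int) -> int:
    """Smallest standard unsigned integer width, in bytes, that can
    hold `value`."""
    if value < 0:
        raise ValueError("value must be non-negative")
    for nbytes in (1, 2, 4, 8):
        if value <= get_max_value_sketch(nbytes):
            return nbytes
    raise ValueError(f"{value} does not fit in an 8-byte unsigned integer")
```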
Version information for SEISMIC-RNA
- seismicrna.core.version.format_version(major: int = 0, minor: int = 19, patch: int = 0, prtag: str = '')
- seismicrna.core.version.parse_version(version: str = '0.19.0')
Major and minor versions, patch, and pre-release tag.
- seismicrna.core.write.need_write(file: Path, force: bool = False, warn: bool = True)
Determine whether a file must be written.
- Parameters:
file (Path) – File for which to check the need for writing.
force (bool = False) – Force the file to be written, even if it already exists.
warn (bool = True) – If the file does not need to be written, then log a warning.
- Returns:
Whether the file must be written.
- Return type:
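The decision described by the parameters above can be sketched as follows. The wording of the warning is an assumption, and need_write_sketch is a hypothetical name.

```python
import logging
from pathlib import Path

logger = logging.getLogger(__name__)


def need_write_sketch(file: Path, force: bool = False,
                      warn: bool = True) -> bool:
    """Return whether `file` must be written."""
    if force or not file.exists():
        return True
    if warn:
        # The file exists and will not be overwritten.
        logger.warning("File exists: %s", file)
    return False
```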
- seismicrna.core.write.write_mode(force: bool = False, binary: bool = False)
Get the mode in which to open a file for writing.
- Parameters:
force (bool = False) – Force the file to be written, truncating the file if it exists. If False and the file exists, a FileExistsError will be raised.
binary (bool = False) – Write the file in binary mode instead of text mode.
- Returns:
The mode argument for the builtin function open().
- Return type:
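The behavior described for force matches the difference between the "w" and "x" modes of the builtin open(): "w" truncates an existing file, while "x" raises FileExistsError if the file exists. A minimal sketch under that assumption, using the hypothetical name write_mode_sketch:

```python
def write_mode_sketch(force: bool = False, binary: bool = False) -> str:
    """Return a mode string for the builtin open()."""
    # "w" truncates an existing file; "x" creates a new file and
    # raises FileExistsError if the file already exists.
    mode = "w" if force else "x"
    # Appending "b" selects binary mode instead of text mode.
    return mode + ("b" if binary else "")
```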