PyTUPS User API
This page describes the main classes that make up PyTUPS. Most users will interact primarily with the functionality these classes expose.
Core Components
Dataset (pytups.core.Dataset.Dataset)
- class pytups.core.Dataset.Dataset[source]
A wrapper class for interacting with the NOAA Studies API.
Manages the retrieval, parsing, and aggregation of NOAA study data, and provides methods to access summaries, publications, sites, and external data files.
- BASE_URL
The NOAA API endpoint URL.
- Type:
str
- studies
A mapping from NOAADatasetId to NOAADataset instances.
- Type:
dict
- data_table_index
A mapping from dataTableID to associated study, site, and paleo data.
- Type:
dict
- search_studies(...)[source]
Searches for studies using provided parameters and parses the response.
- get_data(dataTableIDs, file_urls)[source]
Fetches and returns external data based on data table IDs or file URLs.
- BASE_URL = 'https://www.ncei.noaa.gov/access/paleo-search/study/search.json'
- get_data(dataTableIDs=None, file_urls=None)[source]
Fetch external data for given dataTableIDs or file URLs, perform validations, and attach study and site metadata.
- Parameters:
dataTableIDs (list or str, optional) – One or more NOAA data table IDs.
file_urls (list or str, optional) – One or more file URLs.
- Returns:
A list of DataFrames corresponding to the fetched data.
- Return type:
list of pandas.DataFrame
- Raises:
ValueError – For missing parent study mapping, missing file URL, or proprietary/unsupported file types.
Exception – Propagates any exceptions raised by the parser.
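Because get_data is documented to raise ValueError for several per-table problems, it can be convenient to request tables one at a time and collect the failures. The following is a minimal sketch, not part of PyTUPS: fetch_tables is a hypothetical helper, and it assumes only the documented signature get_data(dataTableIDs=None, file_urls=None) and the documented return type (a list of DataFrames).

```python
from typing import Any, Dict, Iterable, List, Tuple, Union


def fetch_tables(
    dataset: Any, table_ids: Union[str, Iterable[str]]
) -> Tuple[Dict[str, List[Any]], Dict[str, str]]:
    """Fetch each table separately so one bad ID does not abort the batch.

    `dataset` is any object exposing get_data(dataTableIDs=..., file_urls=...)
    as documented above. Returns (frames_by_id, errors_by_id).
    """
    # Accept a single ID or an iterable of IDs, mirroring "list or str".
    ids = [table_ids] if isinstance(table_ids, str) else list(table_ids)

    frames: Dict[str, List[Any]] = {}
    errors: Dict[str, str] = {}
    for table_id in ids:
        try:
            # Documented to return a list of DataFrames.
            frames[table_id] = dataset.get_data(dataTableIDs=table_id)
        except ValueError as exc:
            # Documented cases: missing parent study mapping, missing file
            # URL, or a proprietary/unsupported file type.
            errors[table_id] = str(exc)
    return frames, errors
```

Assuming ds is a Dataset on which search_studies has already been called, fetch_tables(ds, ["<dataTableID>"]) would return the fetched tables alongside a message for every ID that failed. Other exceptions, such as those propagated from the parser, are deliberately left uncaught.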
- get_data_deprecated(dataTableIDs=None, file_urls=None)[source]
Fetch external data for given dataTableIDs or file URLs and attach study/site metadata.
- Parameters:
dataTableIDs (list or str, optional) – One or more NOAA data table IDs.
file_urls (list or str, optional) – One or more file URLs.
- Returns:
A list of DataFrames, each corresponding to fetched data.
- Return type:
list of pandas.DataFrame
- get_publications_dataframe()[source]
Get a DataFrame of all publications aggregated from the studies.
- Returns:
A DataFrame containing publication details with study context.
- Return type:
pandas.DataFrame
- get_sites_dataframe()[source]
Get a DataFrame of all sites aggregated from the studies, including paleo data.
- Returns:
A DataFrame containing site details with study context and paleo data.
- Return type:
pandas.DataFrame
- get_summary_dataframe()[source]
Get a DataFrame summarizing all loaded studies.
- Returns:
A DataFrame with a summary of study metadata and components.
- Return type:
pandas.DataFrame
- search_studies(xml_id=None, noaa_id=None, data_publisher='NOAA', data_type_id=None, keywords=None, investigators=None, max_lat=None, min_lat=None, max_lon=None, min_lon=None, location=None, publication=None, search_text=None, earliest_year=None, latest_year=None, cv_whats=None, recent=False)[source]
Search for NOAA studies using the provided parameters.
At least one parameter must be specified for a search to be initiated.
- Parameters:
xml_id (str, optional) – XML identifier for a study.
noaa_id (str, optional) – NOAA study identifier.
data_publisher (str, optional) – Publisher of the data. Defaults to 'NOAA'.
data_type_id (str, optional) – Data type identifier.
keywords (str, optional) – Keywords for the search.
investigators (str, optional) – Investigator names.
max_lat (float, optional) – Maximum latitude.
min_lat (float, optional) – Minimum latitude.
max_lon (float, optional) – Maximum longitude.
min_lon (float, optional) – Minimum longitude.
location (str, optional) – Location description.
publication (str, optional) – Publication details.
search_text (str, optional) – Additional text to search within the study.
earliest_year (int, optional) – Earliest year of study.
latest_year (int, optional) – Latest year of study.
cv_whats (str, optional) – Controlled vocabulary term.
recent (bool, optional) – Flag to filter recent studies.
- Returns:
Nothing is returned; the method populates internal attributes with the retrieved data. Parameter validation is planned but not yet implemented.
- Return type:
None
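Since the documentation states that parameter validation is not yet implemented, callers may want to check their arguments before searching. The sketch below is one possible client-side check, not PyTUPS behaviour: clean_search_params is a hypothetical helper, and the specific rules (latitude within ±90, longitude within ±180, min_lat not above max_lat) are assumptions about what a sensible query looks like.

```python
from typing import Any, Dict


def clean_search_params(**params: Any) -> Dict[str, Any]:
    """Drop unset parameters and run basic sanity checks before searching."""
    # Keep only the arguments the caller actually set.
    cleaned = {name: value for name, value in params.items() if value is not None}
    if not cleaned:
        raise ValueError("At least one search parameter must be specified.")

    # Each bound must fall inside its valid coordinate range.
    limits = (("min_lat", 90), ("max_lat", 90), ("min_lon", 180), ("max_lon", 180))
    for name, limit in limits:
        if name in cleaned and not -limit <= cleaned[name] <= limit:
            raise ValueError(f"{name} must be between {-limit} and {limit}.")

    # Only latitude ordering is checked: a longitude box may legitimately
    # cross the antimeridian, and year ordering depends on the time scale.
    if "min_lat" in cleaned and "max_lat" in cleaned:
        if cleaned["min_lat"] > cleaned["max_lat"]:
            raise ValueError("min_lat must not exceed max_lat.")
    return cleaned
```

With a Dataset instance ds, the cleaned arguments could then be passed on as ds.search_studies(**clean_search_params(keywords="coral", min_lat=-10, max_lat=10)); the keyword names match the documented signature, while the values here are purely illustrative.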
NOAADataset (pytups.core.NOAADataset.NOAADataset)
- class pytups.core.NOAADataset.NOAADataset(study_data)[source]
This class encapsulates study metadata and its related components (e.g. publications, sites) retrieved from the NOAA API.
- study_id
The unique NOAA study identifier.
- Type:
str
- xml_id
The XML identifier of the study.
- Type:
str
- metadata
A dictionary containing basic metadata such as studyName, dataType, earliestYearBP, etc.
- Type:
dict
- investigators
A comma-separated string of investigator names.
- Type:
str
- publications
A list of Publication objects associated with the study.
- Type:
list of Publication
Utility Classes
PaleoData (pytups.utils.PaleoData.PaleoData)
- class pytups.utils.PaleoData.PaleoData(paleo_data, study_id, site_id)[source]
Represents paleo data associated with a site.
- datatable_id
The NOAA data table identifier.
- Type:
str
- dataTableName
The name of the data table.
- Type:
str
- timeUnit
The unit of time for the data.
- Type:
str
- file_url
The URL from which the data file can be fetched.
- Type:
str
- variables
A list of variable names or identifiers.
- Type:
list
- study_id
The NOAA study ID this data belongs to.
- Type:
str
- site_id
The site identifier this data belongs to.
- Type:
str
Publication (pytups.utils.Publication.Publication)
- class pytups.utils.Publication.Publication(pub_data)[source]
Represents a publication within a study.
- author
The name(s) of the publication's author(s).
- Type:
str
- title
The title of the publication.
- Type:
str
- journal
The journal where the publication appeared.
- Type:
str
- year
The publication year.
- Type:
str
- volume
The volume number (if applicable).
- Type:
str or None
- number
The issue number (if applicable).
- Type:
str or None
- pages
The page numbers (if applicable).
- Type:
str or None
- pub_type
The type of publication.
- Type:
str or None
- doi
The Digital Object Identifier.
- Type:
str or None
- url
URL for the publication.
- Type:
str or None
- study_id
The NOAA study ID to which this publication belongs.
- Type:
str or None
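Several Publication attributes may be None, so any code that formats them needs to handle the missing pieces. The following is a rough sketch of one way to build a one-line citation from the attributes listed above. format_citation is a hypothetical helper, the citation layout is an arbitrary choice, and the example object is a stand-in with made-up values rather than a real Publication.

```python
from types import SimpleNamespace


def format_citation(pub) -> str:
    """Build a one-line citation from a Publication-like object."""
    parts = [f"{pub.author} ({pub.year}). {pub.title}. {pub.journal}"]
    if pub.volume:
        # Append the issue number in parentheses only when it is present.
        issue = f"({pub.number})" if pub.number else ""
        parts.append(f", {pub.volume}{issue}")
    if pub.pages:
        parts.append(f", {pub.pages}")
    parts.append(".")
    if pub.doi:
        parts.append(f" doi:{pub.doi}")
    return "".join(parts)


# Stand-in object with made-up values; a real Publication comes from a study.
example = SimpleNamespace(
    author="Doe, J.", year="2020", title="An example title",
    journal="Example Journal", volume="12", number=None,
    pages="1-10", doi=None,
)
print(format_citation(example))
# prints: Doe, J. (2020). An example title. Example Journal, 12, 1-10.
```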
Site (pytups.utils.Site.Site)
- class pytups.utils.Site.Site(site_data, study_id)[source]
Represents a site within a study.
- site_id
The unique identifier for the site.
- Type:
str
- site_name
The name of the site.
- Type:
str
- location_name
A descriptive location name.
- Type:
str
- lat
The latitude coordinate.
- Type:
float or str
- lon
The longitude coordinate.
- Type:
float or str
- min_elevation
The minimum elevation in meters.
- Type:
float or None
- max_elevation
The maximum elevation in meters.
- Type:
float or None
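Because lat and lon are documented as "float or str", numeric work such as distance calculations or plotting may need a conversion step first. A minimal sketch of such a conversion follows; to_float and site_coordinates are hypothetical helpers, and treating unparseable values as None is an assumption about what is convenient for the caller.

```python
from typing import Any, Optional, Tuple, Union


def to_float(value: Union[float, int, str, None]) -> Optional[float]:
    """Convert a coordinate that may be a number or a string; None if unusable."""
    if value is None:
        return None
    try:
        # float() also accepts numeric strings with surrounding whitespace.
        return float(value)
    except (TypeError, ValueError):
        return None


def site_coordinates(site: Any) -> Tuple[Optional[float], Optional[float]]:
    """Return (lat, lon) as floats for any object with lat and lon attributes."""
    return to_float(site.lat), to_float(site.lon)
```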
Parsers
StandardParser (pytups.utils.Parser.StandardParser.StandardParser)
- class pytups.utils.Parser.StandardParser.StandardParser(url=None)[source]
StandardParser encapsulates the complete workflow for downloading and parsing a NOAA text file.
The class maintains attributes such as the URL, file lines, metadata boundaries, extracted variable names, header skip count, parsed data, and the final DataFrame.
- url
The URL of the file to parse.
- Type:
str
- lines
The content of the file split into lines.
- Type:
list of str
- meta_start
The index of the first metadata line.
- Type:
int
- meta_end
The index of the last metadata line.
- Type:
int
- variables
The extracted variable names.
- Type:
list of str
- skip_lines
The number of header lines to skip in the data block.
- Type:
int
- data
The parsed data rows.
- Type:
list of list of str
- df
The constructed DataFrame.
- Type:
pandas.DataFrame
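To illustrate how the attributes above relate to one another, here is a simplified sketch of the kind of workflow the class description outlines. It is not StandardParser's implementation: parse_noaa_text is a hypothetical function, and it assumes that metadata lines start with "#", that the first non-empty line after the metadata block is a delimited header row, and that every following non-empty line is a data row. The header skip count (skip_lines) is not modelled.

```python
from typing import Any, Dict


def parse_noaa_text(text: str, delimiter: str = "\t") -> Dict[str, Any]:
    """Split a NOAA-style text file into metadata bounds, variables, and rows."""
    lines = text.splitlines()

    # Comment lines are assumed to form the metadata block.
    meta_idx = [i for i, line in enumerate(lines) if line.startswith("#")]
    meta_start = meta_idx[0] if meta_idx else None
    meta_end = meta_idx[-1] if meta_idx else None

    # Non-empty lines after the metadata: a header row, then data rows.
    body_start = meta_end + 1 if meta_end is not None else 0
    body = [line for line in lines[body_start:] if line.strip()]
    variables = body[0].split(delimiter) if body else []
    data = [row.split(delimiter) for row in body[1:]]

    return {
        "lines": lines,
        "meta_start": meta_start,
        "meta_end": meta_end,
        "variables": variables,
        "data": data,
    }
```

The resulting rows and variable names could then be handed to pandas, for example pandas.DataFrame(result["data"], columns=result["variables"]), which corresponds to the df attribute in spirit. All values remain strings at this stage, matching the documented "list of list of str" type of data.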