pytups.core package

Submodules

pytups.core.NOAADataset module

class pytups.core.NOAADataset.NOAADataset(study_data)[source]

Bases: object

This class encapsulates study metadata and its related components (e.g. publications, sites) retrieved from the NOAA API.

study_id

The unique NOAA study identifier.

Type:

str

xml_id

The XML identifier of the study.

Type:

str

metadata

A dictionary containing basic metadata such as studyName, dataType, earliestYearBP, etc.

Type:

dict

investigators

A comma-separated string of investigator names.

Type:

str

publications

A list of Publication objects associated with the study.

Type:

list of Publication

sites

A list of Site objects associated with the study.

Type:

list of Site

__init__(study_data)[source]

Initializes the NOAADataset instance using a dictionary of study data.

_load_metadata(study_data)[source]

Extracts metadata from the study data.

_load_investigators(study_data)[source]

Extracts investigator details from the study data.

to_dict()[source]

Returns a dictionary summary of the study and its components.

to_dict()[source]

Convert the study data and its components to a dictionary.

Returns:

A dictionary representing the study including metadata, investigators, publications, and sites.

Return type:

dict
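
The shape of such a summary can be illustrated with a small stand-in class. This is a minimal sketch under stated assumptions, not the library's implementation: the class name `StudySketch`, the input keys `NOAAStudyId`, `xmlId`, and `investigators`, and the output key names are all hypothetical. Only the metadata field names (studyName, dataType, earliestYearBP) come from the attribute description above.

```python
class StudySketch:
    """Hypothetical stand-in for NOAADataset; not the pytups implementation."""

    # Metadata fields named in the attribute description above.
    METADATA_KEYS = ("studyName", "dataType", "earliestYearBP")

    def __init__(self, study_data):
        # Input key names are assumed for illustration only.
        self.study_id = str(study_data.get("NOAAStudyId", ""))
        self.xml_id = str(study_data.get("xmlId", ""))
        self.metadata = {key: study_data.get(key) for key in self.METADATA_KEYS}
        # Join investigator names into one comma-separated string.
        self.investigators = ", ".join(study_data.get("investigators", []))
        self.publications = []  # would hold Publication objects
        self.sites = []  # would hold Site objects

    def to_dict(self):
        """Return a plain-dict summary of the study and its components."""
        return {
            "study_id": self.study_id,
            "xml_id": self.xml_id,
            "metadata": dict(self.metadata),
            "investigators": self.investigators,
            "publications": list(self.publications),
            "sites": list(self.sites),
        }


# Made-up input record, used only to exercise the sketch.
example = StudySketch(
    {
        "NOAAStudyId": 12345,
        "xmlId": "67890",
        "studyName": "Example study",
        "dataType": "EXAMPLE",
        "earliestYearBP": 1000,
        "investigators": ["A. Author", "B. Author"],
    }
)
summary = example.to_dict()
```

Copying the metadata dictionary and the lists in `to_dict()` keeps later changes to the returned summary from altering the object's own state.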

pytups.core.Dataset module

class pytups.core.Dataset.Dataset[source]

Bases: object

A wrapper class for interacting with the NOAA Studies API.

Manages the retrieval, parsing, and aggregation of NOAA study data, and provides methods to access summaries, publications, sites, and external data files.

BASE_URL

The NOAA API endpoint URL.

Type:

str

studies

A mapping from NOAADatasetId to NOAADataset instances.

Type:

dict

data_table_index

A mapping from dataTableID to associated study, site, and paleo data.

Type:

dict

__init__()[source]

Initializes the Dataset.

search_studies(...)[source]

Searches for studies using provided parameters and parses the response.

_fetch_api(params)[source]

Internal method to make an HTTP GET request to the NOAA API.

_parse_response(data)[source]

Internal method to parse the JSON response and populate studies.

get_summary_dataframe()[source]

Returns a DataFrame summarizing all loaded studies.

get_publications_dataframe()[source]

Returns a DataFrame of publications aggregated from studies.

get_sites_dataframe()[source]

Returns a DataFrame of sites aggregated from studies.

get_data(dataTableIDs, file_urls)[source]

Fetches and returns external data based on data table IDs or file URLs.

BASE_URL = 'https://www.ncei.noaa.gov/access/paleo-search/study/search.json'

get_data(dataTableIDs=None, file_urls=None)[source]

Fetch external data for given dataTableIDs or file URLs, perform validations, and attach study and site metadata.

Parameters:
  • dataTableIDs (list or str, optional) – One or more NOAA data table IDs.

  • file_urls (list or str, optional) – One or more file URLs.

Returns:

A list of DataFrames corresponding to the fetched data.

Return type:

list of pandas.DataFrame

Raises:
  • ValueError – Raised when the parent study mapping is missing, a file URL is missing, or the file type is proprietary or unsupported.

  • Exception – Propagates any exceptions raised by the parser.
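
Both parameters accept either a single string or a list. One way such "list or str" arguments can be normalized before processing is sketched below. The helper name `as_list` and the sample ID value are hypothetical, and this is an illustration of the general pattern rather than the code `get_data` actually uses.

```python
def as_list(value):
    """Normalize an optional 'list or str' argument to a list.

    None becomes an empty list, a single string becomes a one-item list,
    and any other iterable is copied into a new list.
    """
    if value is None:
        return []
    if isinstance(value, str):
        # A bare string must not be iterated character by character.
        return [value]
    return list(value)


# Made-up identifier, used only to exercise the helper.
single = as_list("45859")
several = as_list(["45859", "45860"])
nothing = as_list(None)
```

Checking for `str` before calling `list()` matters because a string is itself iterable and would otherwise be split into individual characters.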

get_data_deprecated(dataTableIDs=None, file_urls=None)[source]

Fetch external data for given dataTableIDs or file URLs and attach study/site metadata.

Parameters:
  • dataTableIDs (list or str, optional) – One or more NOAA data table IDs.

  • file_urls (list or str, optional) – One or more file URLs.

Returns:

A list of DataFrames, each corresponding to fetched data.

Return type:

list of pandas.DataFrame

get_publications_dataframe()[source]

Get a DataFrame of all publications aggregated from the studies.

Returns:

A DataFrame containing publication details with study context.

Return type:

pandas.DataFrame

get_sites_dataframe()[source]

Get a DataFrame of all sites aggregated from the studies, including paleo data.

Returns:

A DataFrame containing site details with study context and paleo data.

Return type:

pandas.DataFrame

get_summary_dataframe()[source]

Get a DataFrame summarizing all loaded studies.

Returns:

A DataFrame with a summary of study metadata and components.

Return type:

pandas.DataFrame

search_studies(xml_id=None, noaa_id=None, data_publisher='NOAA', data_type_id=None, keywords=None, investigators=None, max_lat=None, min_lat=None, max_lon=None, min_lon=None, location=None, publication=None, search_text=None, earliest_year=None, latest_year=None, cv_whats=None, recent=False)[source]

Search for NOAA studies using the provided parameters.

At least one parameter must be specified for a search to be initiated.

Parameters:
  • xml_id (str, optional) – XML identifier for a study.

  • noaa_id (str, optional) – NOAA study identifier.

  • data_publisher (str, optional) – Publisher of the data. Defaults to 'NOAA'.

  • data_type_id (str, optional) – Data type identifier.

  • keywords (str, optional) – Keywords for the search.

  • investigators (str, optional) – Investigator names.

  • max_lat (float, optional) – Maximum latitude.

  • min_lat (float, optional) – Minimum latitude.

  • max_lon (float, optional) – Maximum longitude.

  • min_lon (float, optional) – Minimum longitude.

  • location (str, optional) – Location description.

  • publication (str, optional) – Publication details.

  • search_text (str, optional) – Additional text to search within the study.

  • earliest_year (int, optional) – Earliest year of study.

  • latest_year (int, optional) – Latest year of study.

  • cv_whats (str, optional) – Controlled vocabulary term.

  • recent (bool, optional) – Flag to filter recent studies.

Returns:

The method returns nothing; it populates the instance's internal attributes with the retrieved data. At least one parameter is required. Parameter validation is not yet implemented.

Return type:

None
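
The "at least one parameter" rule and the step from search arguments to a request URL can be sketched as follows. This is a hedged illustration, not the library's code: the helper `build_query` and the camelCase query keys (`dataPublisher`, `investigators`, `minLat`) are assumptions about how arguments might map onto the API, and only `BASE_URL` is taken from the class documentation above.

```python
from urllib.parse import urlencode

# Endpoint as documented for Dataset.BASE_URL.
BASE_URL = "https://www.ncei.noaa.gov/access/paleo-search/study/search.json"


def build_query(**search_args):
    """Drop unset arguments and require at least one remaining parameter."""
    params = {key: value for key, value in search_args.items() if value is not None}
    if not params:
        raise ValueError("At least one search parameter must be specified.")
    return params


# Hypothetical query keys; minLat is unset and therefore omitted.
params = build_query(dataPublisher="NOAA", investigators="Smith", minLat=None)
url = BASE_URL + "?" + urlencode(params)
```

Filtering out `None` values before encoding keeps unset options out of the query string, so only the criteria the caller supplied reach the request.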

Module contents