PyTUPS User API

The following describes the main classes that make up PyTUPS. Most users will interact primarily with the functionality these classes expose.

Core Components

Dataset (pytups.core.Dataset.Dataset)

class pytups.core.Dataset.Dataset

A wrapper class for interacting with the NOAA Studies API.

Manages the retrieval, parsing, and aggregation of NOAA study data, and provides methods to access summaries, publications, sites, and external data files.

BASE_URL

The NOAA API endpoint URL.

Type:

str

studies

A mapping from NOAADatasetId to NOAADataset instances.

Type:

dict

data_table_index

A mapping from dataTableID to associated study, site, and paleo data.

Type:

dict

__init__()

Initializes the Dataset.

search_studies(...)

Searches for studies using provided parameters and parses the response.

_fetch_api(params)

Internal method to make an HTTP GET request to the NOAA API.

_parse_response(data)

Internal method to parse the JSON response and populate studies.

get_summary_dataframe()

Returns a DataFrame summarizing all loaded studies.

get_publications_dataframe()

Returns a DataFrame of publications aggregated from studies.

get_sites_dataframe()

Returns a DataFrame of sites aggregated from studies.

get_data(dataTableIDs, file_urls)

Fetches and returns external data based on data table IDs or file URLs.

BASE_URL = 'https://www.ncei.noaa.gov/access/paleo-search/study/search.json'

get_data(dataTableIDs=None, file_urls=None)

Fetch external data for given dataTableIDs or file URLs, perform validations, and attach study and site metadata.

Parameters:
  • dataTableIDs (list or str, optional) – One or more NOAA data table IDs.

  • file_urls (list or str, optional) – One or more file URLs.

Returns:

A list of DataFrames corresponding to the fetched data.

Return type:

list of pandas.DataFrame

Raises:
  • ValueError – For missing parent study mapping, missing file URL, or proprietary/unsupported file types.

  • Exception – Propagates any exceptions raised by the parser.
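The argument handling and validation described above can be pictured with a small, self-contained sketch. It is not PyTUPS code: the helper names as_list and resolve_file_urls, the shape of the index entries ('study_id' and 'file_url' keys), and the list of supported extensions are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Union

StrOrList = Optional[Union[str, List[str]]]

# Assumption: only plain-text files are treated as parseable in this sketch.
SUPPORTED_EXTENSIONS = (".txt",)


def as_list(value: StrOrList) -> List[str]:
    """Normalize a 'list or str' argument into a list of strings."""
    if value is None:
        return []
    if isinstance(value, str):
        return [value]
    return list(value)


def resolve_file_urls(index: Dict[str, dict],
                      data_table_ids: StrOrList = None,
                      file_urls: StrOrList = None) -> List[str]:
    """Turn data table IDs and/or file URLs into a validated list of URLs.

    `index` stands in for a data_table_index-like mapping; each entry is
    assumed (hypothetically) to carry 'study_id' and 'file_url' keys.
    """
    urls = as_list(file_urls)
    for table_id in as_list(data_table_ids):
        entry = index.get(table_id)
        if entry is None or not entry.get("study_id"):
            raise ValueError(f"No parent study mapping for data table {table_id!r}")
        if not entry.get("file_url"):
            raise ValueError(f"No file URL recorded for data table {table_id!r}")
        urls.append(entry["file_url"])
    for url in urls:
        if not url.lower().endswith(SUPPORTED_EXTENSIONS):
            raise ValueError(f"Unsupported file type: {url}")
    return urls
```

Normalizing both arguments up front lets a caller pass either a single string or a list, which matches the "list or str" types documented for dataTableIDs and file_urls.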

get_data_deprecated(dataTableIDs=None, file_urls=None)

Fetch external data for given dataTableIDs or file URLs and attach study/site metadata.

Parameters:
  • dataTableIDs (list or str, optional) – One or more NOAA data table IDs.

  • file_urls (list or str, optional) – One or more file URLs.

Returns:

A list of DataFrames, each corresponding to fetched data.

Return type:

list of pandas.DataFrame

get_publications_dataframe()

Get a DataFrame of all publications aggregated from the studies.

Returns:

A DataFrame containing publication details with study context.

Return type:

pandas.DataFrame

get_sites_dataframe()

Get a DataFrame of all sites aggregated from the studies, including paleo data.

Returns:

A DataFrame containing site details with study context and paleo data.

Return type:

pandas.DataFrame
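To make "site details with study context and paleo data" concrete, the sketch below flattens a nested study structure into one row per site and data table. The dictionary layout and the column names are illustrative assumptions, not the columns PyTUPS produces. Rows are plain dicts so the example runs without pandas; they could be passed to pandas.DataFrame(rows).

```python
from typing import Dict, List


def flatten_sites(studies: Dict[str, dict]) -> List[dict]:
    """Return one row per (study, site, data table) combination.

    Assumed layout (hypothetical):
    {study_id: {"study_name": ..., "sites": [
        {"site_id": ..., "site_name": ..., "paleo_data": [{"datatable_id": ...}]}
    ]}}
    """
    rows = []
    for study_id, study in studies.items():
        for site in study.get("sites", []):
            # A site without paleo data still yields one row.
            tables = site.get("paleo_data") or [{}]
            for table in tables:
                rows.append({
                    "study_id": study_id,
                    "study_name": study.get("study_name"),
                    "site_id": site.get("site_id"),
                    "site_name": site.get("site_name"),
                    "datatable_id": table.get("datatable_id"),
                })
    return rows
```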

get_summary_dataframe()

Get a DataFrame summarizing all loaded studies.

Returns:

A DataFrame with a summary of study metadata and components.

Return type:

pandas.DataFrame

search_studies(xml_id=None, noaa_id=None, data_publisher='NOAA', data_type_id=None, keywords=None, investigators=None, max_lat=None, min_lat=None, max_lon=None, min_lon=None, location=None, publication=None, search_text=None, earliest_year=None, latest_year=None, cv_whats=None, recent=False)

Search for NOAA studies using the provided parameters.

At least one parameter must be specified for a search to be initiated.

Parameters:
  • xml_id (str, optional) – XML identifier for a study.

  • noaa_id (str, optional) – NOAA study identifier.

  • data_publisher (str, optional) – Publisher of the data, default is “NOAA”.

  • data_type_id (str, optional) – Data type identifier.

  • keywords (str, optional) – Keywords for the search.

  • investigators (str, optional) – Investigator names.

  • max_lat (float, optional) – Maximum latitude.

  • min_lat (float, optional) – Minimum latitude.

  • max_lon (float, optional) – Maximum longitude.

  • min_lon (float, optional) – Minimum longitude.

  • location (str, optional) – Location description.

  • publication (str, optional) – Publication details.

  • search_text (str, optional) – Additional text to search within the study.

  • earliest_year (int, optional) – Earliest year of study.

  • latest_year (int, optional) – Latest year of study.

  • cv_whats (str, optional) – Controlled vocabulary term.

  • recent (bool, optional) – Flag to filter recent studies.

Returns:

Nothing is returned; the method populates the instance's internal attributes with the retrieved data. At least one search parameter is required. Parameter validation is planned but not yet implemented.

Return type:

None
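One way to picture the "at least one parameter" rule is a helper that drops unset arguments before a request is made. This is a hypothetical sketch, not the library's implementation: build_search_params is an invented name, the keys are the Python argument names rather than whatever query keys the NOAA API expects, and treating data_publisher alone as insufficient (because it has a default) is an assumption.

```python
from typing import Any, Dict


def build_search_params(data_publisher: str = "NOAA", recent: bool = False,
                        **criteria: Any) -> Dict[str, Any]:
    """Collect the search criteria that were actually supplied."""
    # Drop arguments left at None so they are not sent as filters.
    supplied = {name: value for name, value in criteria.items() if value is not None}
    # Assumption: the defaulted data_publisher does not count as a criterion.
    if not supplied and not recent:
        raise ValueError("At least one search parameter must be specified.")
    params = {"data_publisher": data_publisher, **supplied}
    if recent:
        params["recent"] = True
    return params
```

Since the documentation notes that parameter validation is not yet implemented, a check like this would currently be the caller's responsibility.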

NOAADataset (pytups.core.NOAADataset.NOAADataset)

class pytups.core.NOAADataset.NOAADataset(study_data)

This class encapsulates study metadata and its related components (e.g. publications, sites) retrieved from the NOAA API.

study_id

The unique NOAA study identifier.

Type:

str

xml_id

The XML identifier of the study.

Type:

str

metadata

A dictionary of basic metadata, such as studyName, dataType, and earliestYearBP.

Type:

dict

investigators

A comma-separated string of investigator names.

Type:

str

publications

A list of Publication objects associated with the study.

Type:

list of Publication

sites

A list of Site objects associated with the study.

Type:

list of Site

__init__(study_data)

Initializes the NOAADataset instance using a dictionary of study data.

_load_metadata(study_data)

Extracts metadata from the study data.

_load_investigators(study_data)

Extracts investigator details from the study data.

to_dict()

Returns a dictionary summary of the study and its components.

to_dict()

Convert the study data and its components to a dictionary.

Returns:

A dictionary representing the study including metadata, investigators, publications, and sites.

Return type:

dict

Utility Classes

PaleoData (pytups.utils.PaleoData.PaleoData)

class pytups.utils.PaleoData.PaleoData(paleo_data, study_id, site_id)

Represents paleo data associated with a site.

datatable_id

The NOAA data table identifier.

Type:

str

dataTableName

The name of the data table.

Type:

str

timeUnit

The unit of time for the data.

Type:

str

file_url

The URL from which the data file can be fetched.

Type:

str

variables

A list of variable names or identifiers.

Type:

list

study_id

The NOAA study ID this data belongs to.

Type:

str

site_id

The site identifier this data belongs to.

Type:

str

to_dict()

Return a dictionary representation of the paleo data.

to_dict()

Convert the paleo data into a dictionary.

Returns:

A dictionary representation of the paleo data.

Return type:

dict

Publication (pytups.utils.Publication.Publication)

class pytups.utils.Publication.Publication(pub_data)

Represents a publication within a study.

author

The name of the author(s) of the publication.

Type:

str

title

The title of the publication.

Type:

str

journal

The journal where the publication appeared.

Type:

str

year

The publication year.

Type:

str

volume

The volume number (if applicable).

Type:

str or None

number

The issue number (if applicable).

Type:

str or None

pages

The page numbers (if applicable).

Type:

str or None

pub_type

The type of publication.

Type:

str or None

doi

The Digital Object Identifier.

Type:

str or None

url

URL for the publication.

Type:

str or None

study_id

The NOAA study ID to which this publication belongs.

Type:

str or None

get_citation_key()

Generate and return a unique citation key.

to_dict()

Return a dictionary representation of the publication.

get_citation_key()

Generate a unique citation key for the publication.

Returns:

A citation key in the format: “<LastName>_<FirstSignificantWord>_<Year>_<StudyID>”.

Return type:

str
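The key format above can be illustrated with a short standalone function. How PyTUPS actually picks the last name and the "first significant word" is not documented here, so the rules below (first listed author, a small stop-word list, the fallbacks "unknown" and "untitled") are assumptions, and citation_key is a hypothetical name.

```python
import re

# Assumption: these short words are skipped when picking the first significant title word.
STOPWORDS = {"a", "an", "the", "of", "on", "in", "and", "for", "to"}


def citation_key(author: str, title: str, year, study_id) -> str:
    """Build '<LastName>_<FirstSignificantWord>_<Year>_<StudyID>'."""
    # Assumption: authors are separated by ';' or ' and ', first author first.
    first_author = re.split(r";| and ", author)[0].strip()
    if "," in first_author:
        # 'Last, F.' style: the last name precedes the comma.
        last_name = first_author.split(",")[0].strip()
    else:
        # 'First Last' style: the last name is the final token.
        parts = first_author.split()
        last_name = parts[-1] if parts else "unknown"
    words = re.findall(r"[A-Za-z0-9]+", title)
    significant = next((w for w in words if w.lower() not in STOPWORDS), "untitled")
    return f"{last_name}_{significant}_{year}_{study_id}"
```

Including the study ID in the key is what keeps two publications by the same first author in the same year from colliding across studies.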

to_dict()

Convert the publication data into a dictionary.

Returns:

A dictionary representation of the publication.

Return type:

dict

Site (pytups.utils.Site.Site)

class pytups.utils.Site.Site(site_data, study_id)

Represents a site within a study.

site_id

The unique identifier for the site.

Type:

str

site_name

The name of the site.

Type:

str

location_name

A descriptive location name.

Type:

str

lat

The latitude coordinate.

Type:

float or str

lon

The longitude coordinate.

Type:

float or str

min_elevation

The minimum elevation in meters.

Type:

float or None

max_elevation

The maximum elevation in meters.

Type:

float or None

paleo_data

A list of PaleoData objects associated with this site.

Type:

list of PaleoData

to_dict()

Return a dictionary representation of the site.

to_dict()

Convert the site data into a dictionary.

Returns:

A dictionary representation of the site, including its paleo data.

Return type:

dict

Parsers

StandardParser (pytups.utils.Parser.StandardParser.StandardParser)

class pytups.utils.Parser.StandardParser.StandardParser(url=None)

StandardParser encapsulates the complete workflow for downloading and parsing a NOAA text file.

The class maintains attributes such as the URL, file lines, metadata boundaries, extracted variable names, header skip count, parsed data, and the final DataFrame.

url

The URL of the file to parse.

Type:

str

lines

The content of the file split into lines.

Type:

list of str

meta_start

The index of the first metadata line.

Type:

int

meta_end

The index of the last metadata line.

Type:

int

variables

The extracted variable names.

Type:

list of str

skip_lines

The number of header lines to skip in the data block.

Type:

int

data

The parsed data rows.

Type:

list of list of str

df

The constructed DataFrame.

Type:

pandas.DataFrame

parse(url=None)

Execute the full parsing workflow and return the constructed DataFrame.

_fetch_file()

Fetch the file and set the ‘lines’ attribute.

_identify_metadata()

Identify metadata boundaries and set ‘meta_start’ and ‘meta_end’.

_extract_variables()

Extract variable names and header skip count, setting ‘variables’ and ‘skip_lines’.

_parse_data()

Parse the data block from the file and set the ‘data’ attribute.

_construct_dataframe()

Construct the final DataFrame from parsed data and variables.

parse(url=None)

Orchestrate the full parsing process.

Parameters:

url (str, optional) – The URL to parse. If provided, it overrides the existing URL attribute.

Returns:

The constructed DataFrame.

Return type:

pandas.DataFrame

Raises:

ParsingError – If any step of the parsing process fails.
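The staged workflow (identify metadata, extract variables, parse data, build the result) can be sketched on in-memory text. This is a toy stand-in rather than StandardParser itself: it skips the download step, assumes metadata lines start with '#', assumes a single tab-separated header line followed by tab-separated rows, and returns a list of dicts instead of a DataFrame so that it needs only the standard library. SketchParser and its local ParsingError class are hypothetical names.

```python
from typing import Dict, List, Optional


class ParsingError(Exception):
    """Raised when a stage of the sketch fails (stand-in for the documented error)."""


class SketchParser:
    """Toy parser that mirrors the documented stages on in-memory text."""

    def __init__(self, text: str) -> None:
        self.lines: List[str] = text.splitlines()
        self.meta_start: Optional[int] = None
        self.meta_end: Optional[int] = None
        self.variables: List[str] = []
        self.skip_lines = 0
        self.data: List[List[str]] = []

    def _identify_metadata(self) -> None:
        # Assumption: metadata lines are the ones starting with '#'.
        meta = [i for i, line in enumerate(self.lines) if line.startswith("#")]
        if not meta:
            raise ParsingError("no metadata block found")
        self.meta_start, self.meta_end = meta[0], meta[-1]

    def _extract_variables(self) -> None:
        # Assumption: the first line after the metadata is a tab-separated header.
        header_index = self.meta_end + 1
        if header_index >= len(self.lines):
            raise ParsingError("no header line after metadata")
        self.variables = self.lines[header_index].split("\t")
        self.skip_lines = 1

    def _parse_data(self) -> None:
        start = self.meta_end + 1 + self.skip_lines
        rows = [line.split("\t") for line in self.lines[start:] if line.strip()]
        if any(len(row) != len(self.variables) for row in rows):
            raise ParsingError("row width does not match the header")
        self.data = rows

    def parse(self) -> List[Dict[str, str]]:
        self._identify_metadata()
        self._extract_variables()
        self._parse_data()
        # The documented parser builds a DataFrame here; dicts keep the sketch stdlib-only.
        return [dict(zip(self.variables, row)) for row in self.data]
```

Keeping each stage's result on the instance, as the documented attributes suggest, makes it possible to inspect intermediate state (for example meta_end or variables) when a file fails to parse.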