Published October 1, 2024 | Version 1.0
Dataset Open

Data and Processing from "Carbon-centric dynamics of Earth's marine phytoplankton"

  • 1. Dalhousie University

Contributors

Researcher:

Supervisor:

  • 1. Dalhousie University

Description

Brief Summary:
This documentation describes the data and code associated with:
A. Stoer, K. Fennel, Carbon-centric dynamics of Earth's marine phytoplankton. Proceedings of the National Academy of Sciences (2024).
 
To cite this software and data, please use:
A. Stoer, K. Fennel, Data and processing from "Carbon-centric dynamics of Earth's marine phytoplankton". Zenodo. https://doi.org/10.5281/zenodo.10949682. Deposited 1 October 2024.
 
List of folders and subfolders and what they contain:
  1. raw data: Contains raw data used in the analysis. This folder does not contain the satellite imagery, which will need to be downloaded from the NASA Ocean Color website (https://oceancolor.gsfc.nasa.gov/).
    1. bgc-argo float data (subfolder): Includes Argo data from its original source, or data converted into a similar Argo format.
    2. global region data (subfolder): Includes data used to subset the Argo profiles into each 10-degree latitude region and basin.
    3. graff et al 2015 data (subfolder): Includes the data digitized from Graff et al.'s Fig. 2.
  2. processed data: Contains data processed by this study (Stoer and Fennel, 2024).
    1. processed bgc-argo data (subfolder): A binned processed file is present for each Argo float used in the analysis. Note these files include those described in Table S1 (these are later processed in "3_stock_bloom_calc.py").
    2. processed satellite data (subfolder): Includes a 10-degree latitude average for each satellite image processed (the file is called "chl_sat_df_merged.csv"). This is later used to calculate a satellite chlorophyll-a climatology in "3_stock_bloom_calc.py".
    3. processed chla-irrad data (subfolder): Includes the quality-controlled diffuse light attenuation data coupled with the chlorophyll-a fluorescence data, used to calculate slope factor corrections (the file is called "processed chla-irrad data.csv").
    4. processed topography data (subfolder): includes smoothed topography data (file named "ETOPO_2022_v1_60s_N90W180_surface_mod.tiff").
  3. software:
    1. 0_ftp_argo_data_download.py: This program downloads the Argo data from the Global Data Assembly Center's FTP. Running it retrieves the latest Argo float profiles, including floats and profiles added since this analysis was performed. The download will therefore not match the historical record of Argo floats used in this analysis, but it could be useful for replicating the analysis when more data become available. The historical record of BGC-Argo floats is present in the "/raw data/bgc-argo float data/" path. If you wish to download other float data, see Gordon et al. (2020), Hamilton and Leidos (2017) and the data from the misclab website (https://misclab.umeoce.maine.edu/floats/).
    2. 1_argo_data_processing.py: This program quality-controls and bins the biogeochemical data into a consistent format. This includes corrections and checks, like the spike/noise test or the non-photochemical quenching correction.
    3. 2_sat_data_processing.py: This program processes the satellite data downloaded from the NASA Ocean Color website.
    4. 3_stock_bloom_calc.py: This is the main program used to describe the results of the study. The program takes the processed Argo data, groups it into regions, and calculates slope factors, phytoplankton carbon & chlorophyll-a, global stocks, and bloom metrics.
    5. 4_stock_calc_longhurst_province.py: This program repeats the global stocks calculations performed in "3_stock_bloom_calc.py" but bases the grouping on Longhurst Biogeochemical Provinces.
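
The 10-degree latitude grouping mentioned above for the region data and the satellite averages can be illustrated with a minimal sketch. This is not the code used in "2_sat_data_processing.py" or "3_stock_bloom_calc.py". The function names, the sample values, and the choice to label each band by its southern edge are assumptions made for illustration only.

```python
import math
from collections import defaultdict

def lat_band(lat, width=10):
    """Return the southern edge of the latitude band containing `lat`.

    Assumption: bands are half-open [edge, edge + width), except that
    lat == 90 is folded into the northernmost band.
    """
    if not -90 <= lat <= 90:
        raise ValueError("latitude must be within [-90, 90]")
    edge = math.floor(lat / width) * width
    return min(edge, 90 - width)

def band_means(records, width=10):
    """Average values per latitude band; `records` holds (lat, value) pairs."""
    groups = defaultdict(list)
    for lat, value in records:
        groups[lat_band(lat, width)].append(value)
    return {edge: sum(vals) / len(vals) for edge, vals in sorted(groups.items())}

# Hypothetical chlorophyll-a values [mg/m^3] at a few latitudes
samples = [(-45.2, 0.30), (-41.0, 0.50), (3.7, 0.10), (9.9, 0.20), (62.5, 1.00)]
means = band_means(samples)
for edge, value in means.items():
    print(f"{edge:+d} to {edge + 10:+d} deg: {value:.2f}")
```

In this sketch the two southern samples fall in the same band (-50 to -40 degrees) and are averaged together, as are the two samples between 0 and 10 degrees.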
How to Replicate this Analysis:
Each program should be run in the order listed above. Path names where the data files have been downloaded will need to be updated in the code.
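
One way to keep the path updates in one place is sketched below. The DATA_ROOT location, the REPO_DIRS dictionary, and the missing_dirs helper are hypothetical and are not part of the deposited programs. The folder names are taken from the list above, and nesting the subfolders directly under "raw data" and "processed data" is an assumption about the archive layout.

```python
from pathlib import Path

# Hypothetical location where the Zenodo archive was unzipped; edit as needed.
DATA_ROOT = Path.home() / "Stoer.and.Fennel.2024.PNAS.Data"

# Folder names follow the list above; the nesting under DATA_ROOT is assumed.
REPO_DIRS = {
    "raw_argo": DATA_ROOT / "raw data" / "bgc-argo float data",
    "regions": DATA_ROOT / "raw data" / "global region data",
    "processed_argo": DATA_ROOT / "processed data" / "processed bgc-argo data",
    "processed_sat": DATA_ROOT / "processed data" / "processed satellite data",
}

def missing_dirs(dirs):
    """Return the keys whose folders do not exist on disk."""
    return [name for name, path in dirs.items() if not path.is_dir()]

if __name__ == "__main__":
    # Report any folder that still needs its path corrected.
    for name in missing_dirs(REPO_DIRS):
        print(f"Missing folder for '{name}': {REPO_DIRS[name]}")
```

Running a check like this before starting the programs can flag a wrong path early, before a long processing step is launched.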
 
To use the exact same Sprof files as used in the paper, skip running "0_ftp_argo_data_download.py" and start with "1_argo_data_processing.py" instead. Use the float data from the folder "bgc-argo float data". The program "0_ftp_argo_data_download.py" downloads the latest data from the Argo database, so it is useful for updating the analysis. The program "1_argo_data_processing.py" may also be skipped to save time, in which case the processed BGC-Argo float data may be used instead (see folder named "processed bgc-argo data").
 
Similarly, the program "2_sat_data_processing.py" may be skipped, as it can otherwise take multiple hours to run. The raw data is available from the NASA Ocean Color website (https://oceancolor.gsfc.nasa.gov/), and the processed output of "2_sat_data_processing.py" is included in this repository (see folder named "processed satellite data").
 
The program "3_stock_bloom_calc.py" will require running "ocean_toolbox.py" (see below) in another tab. The portion of the program that involves QC for the irradiance profiles has been commented out to save processing time, and the pre-processed data used in the study has been linked instead (see folder "processed light data"). Similarly, pre-processed topography data is present in this repository. The original Earth Topography data can be accessed at https://www.ncei.noaa.gov/products/etopo-global-relief-model.

 

A version of "3_stock_bloom_calc.py" using Longhurst provinces is available for exploring alternative groupings and their effects on stock calculations. See the program named "4_stock_calc_longhurst_province.py". You will need to download the Longhurst biogeochemical provinces from https://www.marineregions.org/.

To explore the effects of different slope factors, averaging methods, bbp spectral slopes, etc., the user will likely want to make changes to "3_stock_bloom_calc.py". Please do not hesitate to contact the corresponding author (Adam Stoer) for guidance or questions.
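
As a rough illustration of why the slope factor matters, the sketch below scales a made-up phytoplankton backscattering profile by several slope factors and integrates each result over depth. The conversion mirrors the multiplication performed by bbp_to_cphy in "ocean_toolbox.py". The profile values, the slope factors, and the helper names are hypothetical and are not taken from the study.

```python
def cphy_profile(bbpphy, sf):
    """Convert phytoplankton backscattering [/m] to carbon [mg/m^3]."""
    return [b * sf for b in bbpphy]

def integrate_depth(depth, values):
    """Trapezoidal integral of `values` over `depth` [m]; result in mg/m^2."""
    return sum(
        0.5 * (values[i] + values[i + 1]) * (depth[i + 1] - depth[i])
        for i in range(len(depth) - 1)
    )

# Hypothetical profile: depth [m] and phytoplankton backscattering [/m]
depth = [0.0, 10.0, 20.0, 50.0, 100.0]
bbpphy = [0.0010, 0.0010, 0.0012, 0.0004, 0.0]

# Hypothetical slope factors [mg/m^2], chosen only to show the scaling
slope_factors = [8000.0, 12000.0, 16000.0]

stocks = {
    sf: integrate_depth(depth, cphy_profile(bbpphy, sf)) for sf in slope_factors
}
for sf, stock in stocks.items():
    print(f"sf = {sf:.0f}: depth-integrated carbon = {stock:.1f} mg/m^2")
```

Because the conversion is a simple multiplication, the depth-integrated carbon in this sketch scales in direct proportion to the slope factor.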

ocean_toolbox.py:

import statsmodels.formula.api as smf
import os
import matplotlib.pyplot as plt
import numpy as np
from uncertainties import unumpy as unp
from scipy import stats

def file_grab(root,find,start): #grabs files by file extensions and location
    filelst = []
    for subdir, dirs, files in os.walk(root):
        for file in files:
            filepath = subdir + os.sep + file
            if filepath.endswith(find):
                if filepath.startswith(start):
                    filelst.append(filepath)
    return filelst

def sep_bbp(data, name_z, name_chla, name_bbp):
    
    '''
    data: Pandas Dataframe containing the profile data
    name_z: name of the depth variable in data
    name_chla: name of the chlorophyll-a variable in data
    name_bbp: name of the particle backscattering variable in data    
    
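    returns: the phytoplankton component of particle backscattering (bbpphy)
    and z_lim, the shallowest depth at which the modeled non-algal particle
    background reaches the measured bbp. The columns 'bbp_back' (non-algal
    background) and 'bbpphy' are also added to data in place.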
    '''
    #name_chla = 'chla'
    #name_z = 'depth'
    #name_bbp = 'bbp470'
    dcm = data[data.loc[:,name_chla]==data.loc[:,name_chla].max()][name_z].values[0] # Find depth of deep chla maximum
    part_prof = data[(data.loc[:,name_bbp]<np.median(data.loc[:,name_bbp]))].copy() # Keep observations below the profile's median bbp (copy avoids assigning to a view)
        
    mod = smf.quantreg(str(name_bbp) + ' ~ ' + str(name_z), 
                       part_prof).fit(q=0.01) # Fit a quantile regression at the 1st percentile
    y_pred = mod.predict(part_prof.loc[:,name_z]) # Create predicted bbp_nap
    
    part_prof.loc[:,'bbp_back'] = y_pred.values # Predicted bbp NAP from linear trend
    z_lim = part_prof.loc[(part_prof.loc[:,'bbp_back'].div(part_prof.loc[:,name_bbp])>=1), name_z].min()                                         
    
    # Find depth where bbp NAP and bbp intersect
    data.loc[data[name_z]>=z_lim, 'bbp_back'] = data.loc[data[name_z]>=z_lim, name_bbp].tolist()
    data.loc[data[name_z]<z_lim,'bbp_back'] = data.loc[data[name_z]==z_lim, name_bbp].values[0] #data.loc[data[name_z]<z_lim, name_z].mul(lr.slope).add(lr.intercept)
    
    
    data.loc[:,'bbpphy'] = data.loc[:, name_bbp].sub(data.loc[:,'bbp_back']) # Subtract bbp NAP from bbp for bbp from phytoplankton
    data.loc[(data['bbpphy']<0)|(data[name_z]>z_lim),'bbpphy'] = 0 # Set negative values and values deeper than z_lim to zero

    return data['bbpphy'], z_lim

def bbp_to_cphy(bbp_data, sf):
    
    '''
    bbp_data: Pandas Series (or DataFrame) of particle backscattering [/m]
    sf: slope factor [mg/m^2] used to scale backscattering to carbon
    
    returns: particle backscattering [/m] converted into
    phytoplankton carbon [mg/m^3].
    '''
    
    cphy_data = bbp_data.mul(sf)    

    return cphy_data
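
A minimal usage sketch for file_grab follows. It repeats the function so that the example runs on its own, and it builds a small temporary folder with hypothetical file names. The point it illustrates is that "start" is compared against the full file path, so a path prefix such as the root folder should be passed rather than the beginning of a file name.

```python
import os
import tempfile

def file_grab(root, find, start):
    """Same logic as the listing above: match on path suffix and path prefix."""
    filelst = []
    for subdir, dirs, files in os.walk(root):
        for file in files:
            filepath = subdir + os.sep + file
            if filepath.endswith(find) and filepath.startswith(start):
                filelst.append(filepath)
    return filelst

with tempfile.TemporaryDirectory() as root:
    # Hypothetical file names, used only to build a small example tree
    os.makedirs(os.path.join(root, "floats"))
    for name in ("1234567_Sprof.nc", "7654321_Sprof.nc", "notes.txt"):
        open(os.path.join(root, "floats", name), "w").close()

    # `start` is compared against the full path, so pass a path prefix
    found = sorted(file_grab(root, "Sprof.nc", root))
    names = [os.path.basename(p) for p in found]

    # A bare file-name prefix finds nothing, because every path begins with `root`
    none_found = file_grab(root, "Sprof.nc", "1234567")

print(names)
print(none_found)
```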

Files

Files (5.3 GB)

Stoer.and.Fennel.2024.PNAS.Data.zip (5.3 GB)
md5:8edcb6a29e7f26de722858b0148790cc
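
Because the archive is large, it may be worth checking the download against the MD5 checksum listed above before unzipping. The sketch below uses Python's standard hashlib module. The helper names and the assumption that the zip file sits in the current working directory are illustrative only.

```python
import hashlib

# Checksum as listed for Stoer.and.Fennel.2024.PNAS.Data.zip
EXPECTED_MD5 = "8edcb6a29e7f26de722858b0148790cc"

def md5sum(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file at `path` matches the expected checksum."""
    return md5sum(path) == expected.lower()

if __name__ == "__main__":
    # Assumed local file name; adjust to wherever the archive was saved.
    archive = "Stoer.and.Fennel.2024.PNAS.Data.zip"
    try:
        print("checksum OK" if verify_download(archive) else "checksum MISMATCH")
    except FileNotFoundError:
        print(f"{archive} not found in the current directory")
```

Reading the file in chunks keeps memory use small even for a multi-gigabyte archive.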