Data and Processing from "Carbon-centric dynamics of Earth's marine phytoplankton"
- raw data: Contains raw data used in the analysis. This folder does not contain the satellite imagery, which will need to be downloaded from the NASA Ocean Color website (
  - bgc-argo float data (subfolder): Includes Argo data from its original source or converted into a similar Argo format.
  - global region data (subfolder): Includes data used to subset the Argo profiles into 10-degree latitude regions and basins.
  - graff et al 2015 data (subfolder): Includes the data digitized from Fig. 2 of Graff et al. (2015).
- processed data: Data processed by this study (Stoer and Fennel, 2024).
  - processed bgc-argo data (subfolder): Contains one binned, processed file for each Argo float used in the analysis. Note that these files include those described in Table S1 (these are later processed in "").
  - processed satellite data (subfolder): Includes 10-degree latitude averages for each satellite image processed (in the file "chl_sat_df_merged.csv"). These are later used to calculate a satellite chlorophyll-a climatology in "".
  - processed chla-irrad data (subfolder): Includes the quality-controlled diffuse light attenuation data paired with the chlorophyll-a fluorescence data, used to calculate slope factor corrections (in the file "processed chla-irrad data.csv").
  - processed topography data (subfolder): Includes smoothed topography data (in the file "ETOPO_2022_v1_60s_N90W180_surface_mod.tiff").
- software:
  - This program downloads the Argo data from the Global Data Assembly Center's FTP server. Running it retrieves the current Argo float profiles, which will include floats and profiles added since this analysis was performed. The download will therefore not match the historical record of Argo floats used here, but it could be useful for repeating the analysis as more data become available. The historical record of BGC-Argo floats is in the "/raw data/bgc-argo float data/" path. If you wish to download other float data, see Gordon et al. (2020), Hamilton and Leidos (2017) and the data from the misclab website (
  - This program quality-controls and bins the biogeochemical data into a consistent format. This includes corrections and checks, such as the spike/noise test and the non-photochemical quenching correction.
  - This program processes the satellite data downloaded from the NASA Ocean Color website.
  - This is the main program used to describe the results of the study. It takes the processed Argo data, groups them into regions, and calculates slope factors, phytoplankton carbon and chlorophyll-a, global stocks, and bloom metrics.
  - This program repeats the global stocks calculations performed in "" but bases the grouping on Longhurst Biogeochemical Provinces.
A version of "" using Longhurst provinces is available for exploring alternative groupings and their effects on stock calculations. See the program named "". You will need to download the Longhurst biogeochemical provinces from
To explore the effects of different slope factors, averaging methods, bbp spectral slopes, etc., the user will likely want to make changes to "". Please do not hesitate to contact the corresponding author (Adam Stoer) for guidance or questions.
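As a rough illustration of the 10-degree latitude grouping and the climatology step described above, the sketch below bins records into latitude bands and averages them by calendar month. It is not the code used in the study: the function name `lat_band_climatology` and the column names `lat`, `date`, and `chla` are hypothetical, and the actual columns of "chl_sat_df_merged.csv" may differ.

```python
import numpy as np
import pandas as pd

def lat_band_climatology(df, lat_col="lat", time_col="date", value_col="chla"):
    """Average value_col by 10-degree latitude band and calendar month.

    All column names are assumptions made for this sketch.
    """
    out = df.copy()
    # Lower edge of the 10-degree band, e.g. 12.5 -> 10 and -5.0 -> -10.
    # A latitude of exactly 90 is folded into the 80..90 band.
    out["lat_band"] = (np.floor(out[lat_col] / 10) * 10).clip(upper=80).astype(int)
    out["month"] = pd.to_datetime(out[time_col]).dt.month
    # Mean over all years for each (band, month) pair
    clim = out.groupby(["lat_band", "month"])[value_col].mean()
    return clim.reset_index()

# Made-up records, for illustration only
example = pd.DataFrame({
    "lat": [12.5, 17.0, 12.0, -5.0],
    "date": ["2020-01-15", "2021-01-20", "2020-02-10", "2020-01-05"],
    "chla": [0.2, 0.4, 0.1, 0.3],
})
clim = lat_band_climatology(example)
```

Here the two January records in the 10 to 20 degree band, taken from different years, are averaged into a single climatological value.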
import statsmodels.formula.api as smf
import os
import matplotlib.pyplot as plt
import numpy as np
from uncertainties import unumpy as unp
from scipy import stats
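def file_grab(root, find, start):
    """Collect files under root whose full path starts with start and ends with find."""
    filelst = []
    for subdir, dirs, files in os.walk(root):
        for file in files:
            filepath = subdir + os.sep + file
            # Keep only paths with the requested ending (e.g. a file extension) and location prefix
            if filepath.endswith(find):
                if filepath.startswith(start):
                    filelst.append(filepath)
    return filelst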
def sep_bbp(data, name_z, name_chla, name_bbp):
    """
    data: Pandas Dataframe containing the profile data
    name_z: name of the depth variable in data
    name_chla: name of the chlorophyll-a variable in data
    name_bbp: name of the particle backscattering variable in data
    returns: the phytoplankton component of particle backscattering (bbpphy)
             and the depth z_lim where the non-algal particle (NAP) background
             meets the measured bbp. The columns 'bbp_back' (NAP background)
             and 'bbpphy' are added to data in place.
    """
    #name_chla = 'chla'
    #name_z = 'depth'
    #name_bbp = 'bbp470'
    dcm = data[data.loc[:,name_chla]==data.loc[:,name_chla].max()][name_z].values[0] # Find depth of deep chla maximum
    # Keep samples below the profile's median bbp; copy to avoid writing to a view of data
    part_prof = data[(data.loc[:,name_bbp]<np.median(data.loc[:,name_bbp]))].copy()
    # Fit a linear model of bbp against depth to the 1st percentile
    mod = smf.quantreg(str(name_bbp) + ' ~ ' + str(name_z),
                       part_prof).fit(q=0.01)
    y_pred = mod.predict(part_prof[[name_z]]) # Create predicted bbp_nap
    part_prof.loc[:,'bbp_back'] = y_pred.values # Predicted bbp NAP from linear trend
    # Find the shallowest depth where bbp NAP and bbp intersect
    z_lim = part_prof.loc[(part_prof.loc[:,'bbp_back'].div(part_prof.loc[:,name_bbp])>=1), name_z].min()
    # At and below z_lim, all bbp is treated as background
    data.loc[data[name_z]>=z_lim, 'bbp_back'] = data.loc[data[name_z]>=z_lim, name_bbp].tolist()
    # Above z_lim, hold the background constant at the bbp value found at z_lim
    data.loc[data[name_z]<z_lim,'bbp_back'] = data.loc[data[name_z]==z_lim, name_bbp].values[0]
    data.loc[:,'bbpphy'] = data.loc[:, name_bbp].sub(data.loc[:,'bbp_back']) # Subtract bbp NAP from bbp for bbp from phytoplankton
    data.loc[(data['bbpphy']<0)|(data[name_z]>z_lim),'bbpphy'] = 0 # Set negative values and values below z_lim to zero
    return data['bbpphy'], z_lim
def bbp_to_cphy(bbp_data, sf):
    """
    bbp_data: particle backscattering data [/m] (e.g., a Pandas Series)
    sf: slope factor used to scale backscattering to carbon [mg/m^2]
    returns: bbp_data converted from particle backscattering [/m] into
             phytoplankton carbon [mg/m^3].
    """
    cphy_data = bbp_data.mul(sf)
    return cphy_data
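
A minimal usage sketch for the conversion above, applied to a made-up four-point profile. The slope factor value is an illustrative placeholder and not a value taken from this repository, the column names are hypothetical, and the trapezoidal depth integration is one simple assumption about how a depth-integrated stock could be formed, not necessarily the approach used in the study.

```python
import numpy as np
import pandas as pd

def bbp_to_cphy(bbp_data, sf):
    # Same one-line scaling as the function above, repeated so this sketch runs on its own
    return bbp_data.mul(sf)

# Hypothetical profile: depth [m] and phytoplankton backscattering [/m]
profile = pd.DataFrame({
    "depth": [0.0, 10.0, 20.0, 30.0],
    "bbpphy": [0.001, 0.001, 0.0005, 0.0],
})

# Illustrative placeholder slope factor [mg/m^2]; an assumption for this sketch
SLOPE_FACTOR = 12128.0

# Phytoplankton carbon concentration [mg/m^3] at each depth
profile["cphy"] = bbp_to_cphy(profile["bbpphy"], SLOPE_FACTOR)

# Trapezoidal integration over depth gives a column stock [mg/m^2]
z = profile["depth"].to_numpy()
c = profile["cphy"].to_numpy()
cphy_stock = float(np.sum(0.5 * (c[1:] + c[:-1]) * np.diff(z)))
```

Because the conversion is a single multiplication, changing the slope factor rescales both the concentrations and the integrated stock by the same ratio.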