Annual maps of cropland abandonment, land cover, and other derived data for time-series analysis of cropland abandonment

Crawford, Christopher L.; Yin, He; Radeloff, Volker C.; Wilcove, David S.

doi:10.5281/zenodo.5348287

Published March 26, 2022 | Version 1.0.0

Dataset Open

1. Princeton University
2. Kent State University
3. University of Wisconsin - Madison

This archive contains raw annual land cover maps, cropland abandonment maps, and accompanying derived data products to support:

Crawford, C.L., Yin, H., Radeloff, V.C., and Wilcove, D.S. 2022. Rural land abandonment is too ephemeral to provide major benefits for biodiversity and climate. Science Advances. https://doi.org/10.1126/sciadv.abm8999.

An archive of the analysis scripts developed for this project can be found at: https://github.com/chriscra/abandonment_trajectories (https://doi.org/10.5281/zenodo.6383127).

Note that the label “_2022_02_07” in many file names refers to the date of the primary analysis. “dts” or “dt” refers to “data.tables,” large .csv files that were manipulated using the data.table package in R (Dowle and Srinivasan 2021; http://r-datatable.com/). “Rasters” refers to “.tif” files that were processed using the raster and terra packages in R (Hijmans 2022; https://rspatial.org/terra/; https://rspatial.org/raster).

Data files fall into one of four categories of data derived during our analysis of abandonment: observed, potential, maximum, or recultivation. Derived datasets also follow the same naming convention, though are aggregated across sites. These four categories are as follows (using “age_dts” for our site in Shaanxi Province, China as an example):

observed abandonment identified through our primary analysis, with a threshold of five years. These files do not have a specific label beyond the description of the file and the date of analysis (e.g., shaanxi_age_2022_02_07.csv);
potential abandonment for a scenario without any recultivation, in which abandoned croplands are left abandoned from the year of initial abandonment through the end of the time series, with the label “_potential” (e.g., shaanxi_potential_age_2022_02_07.csv);
maximum age of abandonment over the course of the time series, with the label “_max” (e.g., shaanxi_max_age_2022_02_07.csv);
recultivation periods, corresponding to the lengths of recultivation periods following abandonment, given the label “_recult” (e.g., shaanxi_recult_age_2022_02_07.csv).
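The naming convention above can be sketched as a small helper. This is only an illustration in Python, not part of the project's R scripts; the function name "age_file_name" and its arguments are hypothetical, and the pattern is inferred solely from the four Shaanxi examples listed above.

```python
# Category labels taken from the description above; observed files carry no label.
CATEGORY_LABELS = {
    "observed": "",
    "potential": "_potential",
    "max": "_max",
    "recult": "_recult",
}


def age_file_name(site: str, category: str = "observed",
                  run_date: str = "2022_02_07") -> str:
    """Build an "age_dts" file name, e.g. shaanxi_potential_age_2022_02_07.csv.

    Hypothetical helper: the pattern <site><label>_age_<date>.csv is inferred
    from the example file names in this description.
    """
    if category not in CATEGORY_LABELS:
        raise ValueError(f"unknown category: {category!r}")
    return f"{site}{CATEGORY_LABELS[category]}_age_{run_date}.csv"


for cat in CATEGORY_LABELS:
    print(age_file_name("shaanxi", cat))
```

Other file types (e.g., "diff" or "length" files) may use a different descriptive stem, so the helper should be checked against the actual archive contents before use.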

This archive includes multiple .zip files, the contents of which are described below:

age_dts.zip - Maps of abandonment age (i.e., how long each pixel has been abandoned as of a given year; also referred to as length, duration, etc.) for each year between 1987 and 2017 for all 11 sites. These maps are stored as .csv files, where each row is a pixel, the first two columns contain the x and y coordinates (longitude and latitude), and subsequent columns contain the abandonment age values for an individual year (columns are labeled with “y” followed by the year, e.g., “y1987”). Maps are given with a latitude and longitude coordinate reference system. Folder contains observed age, potential age (“_potential”), maximum age (“_max”), and recultivation lengths (“_recult”) for all sites. Maximum age .csv files include only three columns: x, y, and the maximum length (i.e., “max age,” in years) for each pixel throughout the entire time series (1987-2017). Files were produced using the custom functions “cc_filter_abn_dt(),” “cc_calc_max_age(),” “cc_calc_potential_age(),” and “cc_calc_recult_age()”; see “_util/_util_functions.R.”
age_rasters.zip - Maps of abandonment age (i.e., how long each pixel has been abandoned for), for each year between 1987-2017 for all 11 sites. Maps are stored as .tif files, where each band corresponds to one of the 31 years in our analysis (1987-2017), in ascending order (i.e., the first layer is 1987 and the 31st layer is 2017). Folder contains observed age, potential age (“_potential”), and maximum age (“_max”) rasters for all sites. Maximum age rasters include just one band (“layer”). These rasters match the corresponding .csv files contained in "age_dts.zip.”
derived_data.zip - summary datasets created throughout this analysis, listed below.
diff.zip - .csv files for each of our eleven sites containing the year-to-year lagged differences in abandonment age (i.e., length of time abandoned) for each pixel. Each row corresponds to a single pixel of land, and each column refers to the year the difference is in reference to. These rows do not have longitude or latitude values associated with them; however, rows correspond to the same rows in the .csv files in "input_dts.zip" and "age_dts.zip." These files were produced using the custom function "cc_diff_dt()" (much like the base R function "diff()"), contained within the custom function "cc_filter_abn_dt()" (see "_util/_util_functions.R"). Folder contains diff files for observed abandonment, potential abandonment (“_potential”), and recultivation lengths (“_recult”) for all sites.
input_dts.zip - annual land cover maps for eleven sites with four land cover classes (see below), adapted from Yin et al. 2020 Remote Sensing of Environment (https://doi.org/10.1016/j.rse.2020.111873). Like “age_dts,” these maps are stored as .csv files, where each row is a pixel and the first two columns refer to x and y coordinates (in terms of longitude and latitude). Subsequent columns contain the land cover class for an individual year (e.g., "y1987"). Note that these maps were recoded from Yin et al. 2020 so that land cover classification was consistent across sites (see below). This contains two files for each site: the raw land cover maps from Yin et al. 2020 (after recoding), and a “clean” version produced by applying 5- and 8-year temporal filters to the raw input (see custom function “cc_temporal_filter_lc(),” in “_util/_util_functions.R” and “1_prep_r_to_dt.R”). These files correspond to those in "input_rasters.zip," and serve as the primary inputs for the analysis.
input_rasters.zip - annual land cover maps for eleven sites with four land cover classes (see below), adapted from Yin et al. 2020 Remote Sensing of Environment. Maps are stored as ".tif" files, where each band corresponds to one of the 31 years in our analysis (1987-2017), in ascending order (i.e., the first layer is 1987 and the 31st layer is 2017). Maps are given with a latitude and longitude coordinate reference system. Note that these maps were recoded so that land cover classes matched across sites (see below). Contains two files for each site: the raw land cover maps (after recoding), and a “clean” version that has been processed with 5- and 8-year temporal filters (see above). These files match those in "input_dts.zip."
length.zip - .csv files containing the length (i.e., age or duration, in years) of each distinct individual period of abandonment at each site. This folder contains length files for observed and potential abandonment, as well as recultivation lengths. Produced using the custom functions “cc_filter_abn_dt()” and “cc_extract_length()”; see “_util/_util_functions.R.”
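To make the relationship between the "age," "diff," and "length" files concrete, the following is a minimal sketch in Python (standard library only) run on a tiny invented sample. It is not the project's R code: the function names are hypothetical, the sample values are made up, and it assumes that years in which a pixel is not abandoned are stored as 0, which should be verified against the actual files. It also ignores the five-year threshold applied to observed abandonment.

```python
import csv
import io

# Tiny synthetic stand-in for a wide-format "age_dts" file (values are invented).
SAMPLE = """x,y,y1987,y1988,y1989,y1990,y1991,y1992
10.0,50.0,0,1,2,3,0,1
10.1,50.0,0,0,0,1,2,3
"""


def read_age_rows(handle):
    """Yield (x, y, {year: age}) for each pixel row of a wide-format file."""
    for row in csv.DictReader(handle):
        # Year columns are labeled "y" followed by the year, e.g. "y1987";
        # the plain "y" coordinate column is skipped by the isdigit() check.
        ages = {int(name[1:]): int(value)
                for name, value in row.items()
                if name.startswith("y") and name[1:].isdigit()}
        yield float(row["x"]), float(row["y"]), ages


def lagged_diff(values):
    """Year-to-year differences, analogous to base R diff()."""
    return [later - earlier for earlier, later in zip(values, values[1:])]


def period_lengths(values):
    """Length of each run of consecutive non-zero ages (assumes 0 = not abandoned)."""
    lengths, run = [], 0
    for value in values:
        if value > 0:
            run += 1
        elif run:
            lengths.append(run)
            run = 0
    if run:
        lengths.append(run)
    return lengths


pixels = list(read_age_rows(io.StringIO(SAMPLE)))
for x, y, ages in pixels:
    series = [ages[year] for year in sorted(ages)]
    print((x, y), lagged_diff(series), period_lengths(series))
```

The real files hold millions of rows per site, so a streaming reader like the one above (or data.table in R, as used by the authors) is preferable to loading everything into memory at once.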

derived_data.zip contains the following files:

"site_df.csv" - a simple .csv containing descriptive information for each of our eleven sites, along with the original land cover codes used by Yin et al. 2020 (updated so that all eleven sites are consistent in how land cover classes were coded; see below).
Primary derived datasets for both observed abandonment (“area_dat”) and potential abandonment (“potential_area_dat”).
- area_dat - Shows the area (in ha) in each land cover class at each site in each year (1987-2017), along with the area of cropland abandoned in each year following a five-year abandonment threshold (abandoned for >=5 years) or no threshold (abandoned for >=1 years). Produced using custom functions "cc_calc_area_per_lc_abn()" via "cc_summarize_abn_dts()". See scripts "cluster/2_analyze_abn.R" and "_util/_util_functions.R."
- persistence_dat - A .csv containing the area of cropland abandoned (ha) for a given "cohort" of abandoned cropland (i.e., a group of cropland abandoned in the same year, also called "year_abn") in a specific year. This area is also given as a proportion of the initial area abandoned in each cohort, or the area of each cohort when it was first classified as abandoned at year 5 ("initial_area_abn"). The "age" is given as the number of years since a given cohort of abandoned cropland was last actively cultivated, and "time" is marked relative to the 5th year, when our five-year definition first classifies that land as abandoned (and where the proportion of abandoned land remaining abandoned is 1). Produced using custom functions "cc_calc_persistence()" via "cc_summarize_abn_dts()". See scripts "cluster/2_analyze_abn.R" and "_util/_util_functions.R." This serves as the main input for our linear models of recultivation (“decay”) trajectories.
- turnover_dat - A .csv showing the annual gross gain, annual gross loss, and annual net change in the area (in ha) of abandoned cropland at each site in each year of the time series. Produced using custom functions "cc_calc_abn_diff()" via "cc_summarize_abn_dts()" (see "_util/_util_functions.R"), implemented in "cluster/2_analyze_abn.R." This file is only produced for observed abandonment.
Area summary files (for observed abandonment only)
- area_summary_df - Contains a range of summary values relating to the area of cropland abandonment for each of our eleven sites. All area values are given in hectares (ha) unless stated otherwise. See script "1_summary_stats.Rmd." It contains 16 variables as columns:
  1) "site"
  2) "total_site_area_ha_2017" - the total site area (ha) in 2017
  3) "cropland_area_1987" - the area in cropland in 1987 (ha)
  4) "area_abn_ha_2017" - the area of cropland abandoned as of 2017 (ha)
  5) "area_ever_abn_ha" - the total area of those pixels that were abandoned at least once during the time series (corresponding to the area of potential abandonment, as of 2017)
  6) "total_crop_extent_ha" - the total area of those pixels that were classified as cropland at least once during the time series
  7) "total_area_abn_remaining_2017" - duplicate of "area_abn_ha_2017," the area abandoned as of 2017 (ha), taken from "area_recult_threshold"
  8) "total_initial_area_abn" - the sum of the initial area of each cohort of abandonment when it is first classified as "abandoned," i.e., at the 5 year mark (note that this is cumulative, and because it counts those pixels that were abandoned more than once, it is therefore larger than "area_ever_abn_ha"), taken from "area_recult_threshold"
  9) "total_area_abn_recultivated_2017" - the area of abandoned land that was recultivated as of 2017 (cumulatively, i.e., "total_initial_area_abn" - "area_abn_ha_2017"), taken from "area_recult_threshold"
  10) "proportion_recultivated" - the proportion of all abandoned cropland (including multiple periods per pixel) that was recultivated by 2017, taken from "area_recult_threshold"
  11) "area_2017_as_prop_site" - area abandoned as of 2017 as a proportion of the total site area
  12) "area_2017_as_prop_total_crop" - area abandoned as of 2017 as a proportion of the total crop extent
  13) "area_2017_as_prop_crop87" - area abandoned as of 2017 as a proportion of cropland area in 1987
  14) "area_ever_abn_as_prop_site" - area ever abandoned as a proportion of the total site area
  15) "area_ever_abn_as_prop_total_crop" - area ever abandoned as a proportion of the total crop extent
  16) "area_ever_abn_as_prop_crop87" - area ever abandoned as a proportion of cropland area in 1987
- area_recult_threshold - Contains data on the proportion of observed abandoned cropland area that is recultivated by the end of our time series. This includes the area of abandoned cropland as of 2017 ("total_area_abn_remaining_2017") and the sum of the initial area of each cohort of abandonment when it is first classified as abandoned (at year 5; "total_initial_area_abn"). This "total_initial_area_abn" is cumulative, and allows for pixels that were abandoned multiple times during the time series to be counted multiple times. The difference between these two columns yields the "total_area_abn_recultivated_2017," which in turn is used to calculate the "proportion_recultivated," and the (ascending) "order" of sites based on this proportion. This file includes recultivation stats for each site for three abandonment definitions: 5, 7, and 10 years. See script "1_summary_stats.Rmd."
- abn_lc_area_2017 - Contains the number of pixels and corresponding area (in ha) of abandoned cropland in the year 2017 at each site, according to the land cover class (either woody vegetation [2], or herbaceous vegetation [4]) and the age in 2017 (5 to 30 years). See script "cluster/6_lc_of_abn.R."
- abn_prop_lc_2017 - Contains the number of pixels and corresponding area (ha) of cropland abandoned in the year 2017 in each land cover type (woody vegetation [2], or herbaceous vegetation [4]). It also shows this area as a proportion of the total area abandoned at each site (i.e., in either land cover class: 2 or 4). See script "cluster/6_lc_of_abn.R."
Carbon
- carbon_df – contains the observed and potential carbon accumulation in abandoned croplands at each site in each year (in Mg C), for two abandonment thresholds: 5 years (our default abandonment definition) and 1 year (i.e., no threshold). Each data point corresponds to one of two scenarios (“type” column), either “observed” or “potential.” Carbon accumulation figures represent the sum of forest and soil carbon at each site in a given year. Carbon accumulation is listed in three columns: 1) “C_up_to_20” contains the total carbon accumulated in those abandoned croplands with abandonment durations between 5 and 20 years; 2) “C_21_30” contains the total carbon accumulated in croplands with durations between 21 and 30 years, which are differentiated in order to account for non-linear carbon accumulation rates in soils over time; and 3) “total_C_Mg” contains the sum of the previous two columns, representing the total carbon accumulated across all abandoned croplands in each year.
- soc_mean – contains mean soil organic carbon accumulation rates for years 1-20 and years 21-80, derived from Sanderman et al. 2020 (in Mg C; https://doi.org/10.7910/DVN/HA17D3). These values correspond to accumulation rates in croplands upon abandonment and regeneration to natural vegetation (Sanderman et al. 2020’s “rewilding” scenario). These mean values are calculated across those pixels identified as cropland by Sanderman et al. 2020 at each site. Mean values in year 20 and 80 are contained in columns “mean_soc_20” and “mean_soc_80” respectively, and the annualized rate over the first 20 years and the subsequent years 21 through 80 are contained in columns “mean_annual_soc_1_20” and “mean_annual_soc_21_80” respectively.
Decay model data – two R data files containing data products for our linear models of abandonment recultivation trajectories.
- decay_endpoints_files – an R data file (.rds) containing seven data products produced as part of our common endpoint analysis, which calculated mean trajectories for each site across a range of common endpoints, ensuring that means were based on coefficient estimates derived from a consistent number of observations for each cohort. These files are:
  - common_endpoint_dat – a .csv containing subsets of “persistence_dat” for each “endpoint” (7 through 29).
  - endpoint_n – a .csv describing, for each endpoint, the corresponding number of observations per cohort (“n_obs”), the number of cohorts (“n_cohorts”), the total number of observations across cohorts included (“total_obs”), and the cohorts that meet the endpoint threshold (“cohorts”).
  - coef_l3_endpoints – corresponding model coefficients for our primary model (“l3”) parameterized by the range of subsets across endpoints.
  - augment_endpoints – fitted values (i.e., model predictions) for linear models produced across the full range of endpoint subsets.
  - fitted_endpoints – a simplified .csv containing the mean linear and log coefficients for each site at each endpoint, and the corresponding predicted proportion remaining abandoned through time (based on the “age,” or duration, of abandonment).
  - time_to_endpoints – a .csv containing, for mean trajectories for each endpoint at each site, the estimated time required for a given amount of abandoned cropland in a cohort to be recultivated (deciles, 10% through 100%).
  - endpoint_half_lives – a .csv containing the half-lives calculated for the mean trajectories for each endpoint at each site.
- decay_mod_archive - an R data file (.rds) containing eleven data products derived from linear models of abandonment recultivation ("decay"):
  - lm_mega_lin_log_lin_l – the primary linear model produced in our analysis. This model is referred to as “lin_log_lin” (or “l3”) because the model predicts linear persistence (“lin”) as a function of a log term of time (“log”) and a linear term of time (“lin”). “mega” refers to the fact that this model is run for the full dataset, pooled across all 11 sites.
  - coef_l3_mega – a .csv containing model coefficients for our primary linear model of recultivation (“lin_log_lin”, or “l3”), with a single row each for the linear term of time and the log term of time, for 26 cohorts at 11 sites.
  - mean_coef_l3_mega – a data frame containing the mean coefficient values for the log and linear terms of time across cohorts at each site. This also contains the mean of the low and high coefficient estimates, based on the 95% confidence interval.
  - half_lives_all_cohorts_l3 – half-lives calculated for each cohort at each site, for our primary model.
  - half_life_mean_coefs_l3 – half-lives calculated based on the mean trajectory for each site (based on the mean log coefficients and mean linear coefficients across all cohorts), for our primary model.
  - mod_AIC_mega – Akaike Information Criterion (AIC) values for all tested model specifications.
  - fitted_combo – fitted values (i.e., model predictions) for our primary model (“l3”) and a series of alternative model specifications (“l3_trim” – excluding cohorts with fewer than 5 observations; “lin_log” – a model including only one log time term; “log2_lin” – in which the log of persistence is predicted by log and linear time terms; and “l3_no_cohort” – our primary model, predicting linear persistence as a function of log time and linear time, but without cohort-level fixed effects).
  - time_to_combo – contains the estimated time required for a certain amount of abandoned cropland in a cohort to be recultivated (deciles, 10% through 100%). See script "2_decay_models.Rmd." These values are calculated for a range of alternative model specifications ("l3_trim", “lin_log”, "log2_lin", and "l3_no_cohort"; see above).
Length data – includes “_distill_df” files and “mean_length_df” files for observed, potential, and recultivation.
- length_distill_df - .csvs containing the number ("freq") of abandonment periods of a specific "length" of time (i.e., age) at each site over the course of the entire time series. Derived from the "length" files in "length.zip." See script "cluster/5_distill_lengths.R."
- mean_length_df - .csvs with the mean, median, and standard deviation of abandonment length for each site, for either "all" lengths or just the "max" length per pixel, and for a range of abandonment definitions (1, 3, 5, 7, and 10 years). Derived from "length_distill_df." See script "1_summary_stats.Rmd."
Duration summary files – includes “summary_stats_all_sites” and “summary_stats_all_sites_pooled,” for observed and potential abandonment, and recultivation periods following abandonment.
- “summary_stats_all_sites” - A simple .csv derived from "mean_length_df" files containing summary stats across the 11 sites. This includes the mean of the mean abandonment duration ("length", in years) for each of our 11 sites ("mean_of_means"), the standard deviation of these site mean abandonment lengths ("sd_of_means"), the mean of the standard deviation at each site ("mean_of_sds"), the mean median ("mean_of_medians"), and the mean number of abandonment periods ("mean_n_abn_periods"). Note that length "all" indicates that these stats account for all periods (including multiple per pixel), rather than just the max duration per pixel. See script "1_summary_stats.Rmd."
- “summary_stats_all_sites_pooled” - A summary .csv similar to "summary_stats_all_sites," but calculated by pooling all distinct periods of abandonment across all eleven sites, and then calculating the mean, median, and standard deviation of abandonment duration. See script "1_summary_stats.Rmd."
Comparing annual approach to identifying abandonment to a two-timepoint (“2yr”) approach:
- abn_2yr_ages_df - Contains the age of former croplands identified as "abandoned" using a two-timepoint method (i.e., 2017 - 1987), where age values (as of 2017) are derived from our map of abandonment identified using the full annual time series. This includes the area in hectares (ha) in each age class (along with the number of pixels) at each of our 11 sites. This dataset is used to calculate the percent of cropland "abandonment" identified using the two-timepoint method that is actually too "young," i.e., less than 5 years old, and therefore not truly abandonment according to our five-year abandonment definition.
- abn_2yr_overestimation - Compares the area (in hectares) of cropland abandonment at each site identified with our full annual time series (and a five-year abandonment definition) and the "abandonment" identified using a two-timepoint method (2017-1987). This also includes the percent difference in area between the two methods, the Jaccard similarity of the areas identified as abandonment, and the percent of "young" (i.e., <5-year-old) "abandonment" identified by the two-timepoint method.
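The quantities in the two-timepoint comparison can be illustrated with a short Python sketch on invented pixel sets. This is not the authors' code: the function names are hypothetical, the pixel IDs and ages are made up, and the direction of the percent difference (two-timepoint relative to annual) is an assumption. Only the Jaccard similarity (intersection over union) follows a standard definition.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def percent_difference(n_two_point: int, n_annual: int) -> float:
    """Percent difference of the two-timepoint estimate relative to the annual one.

    Assumption: the annual time-series estimate is used as the reference.
    """
    return 100.0 * (n_two_point - n_annual) / n_annual


def percent_young(ages_2017: dict, threshold: int = 5) -> float:
    """Percent of two-timepoint "abandoned" pixels younger than the threshold."""
    young = sum(1 for age in ages_2017.values() if age < threshold)
    return 100.0 * young / len(ages_2017)


# Invented pixel IDs flagged as abandoned in 2017 by each method.
annual_abn = {1, 2, 3, 4}
two_point_abn = {3, 4, 5, 6, 7, 8}
# Invented ages (as of 2017, from the annual series) of the two-timepoint pixels.
two_point_ages = {3: 12, 4: 7, 5: 2, 6: 0, 7: 4, 8: 1}

print(jaccard(annual_abn, two_point_abn))
print(percent_difference(len(two_point_abn), len(annual_abn)))
print(round(percent_young(two_point_ages), 1))
```

The sketch works with pixel counts; converting to hectares would additionally require a per-pixel area, which is not constant in a latitude and longitude coordinate reference system.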

Input land cover maps:

As noted, the file "input_rasters.zip" contains the raw annual land cover maps for eleven sites generated by:

Yin, H., A. Brandão, J. Buchner, D. Helmers, B. G. Iuliano, N. E. Kimambo, K. E. Lewińska, E. Razenkova, A. Rizayeva, N. Rogova, S. A. Spawn, Y. Xie, and V. C. Radeloff. 2020. Monitoring cropland abandonment with Landsat time series. Remote Sensing of Environment 246:111873. https://doi.org/10.1016/j.rse.2020.111873

These land cover maps served as raw inputs for this project and form the basis of the analysis.

All land cover maps have a resolution of 30 m and exist for each year from 1987 through 2017. The exceptions are Nebraska / Wyoming (1986-2018) and Wisconsin (1987-2018); these additional years were excluded from our analysis of abandonment duration.

Land cover categories in these maps are coded as follows:

1. Non-vegetated area (e.g., water, urban, barren land)
2. Woody vegetation (e.g., forests)
3. Cropland
4. Herbaceous vegetation (e.g., grassland)

Site file names correspond to the following geographic locations:

belarus = Vitebsk, Belarus / Smolensk, Russia
bosnia_herzegovina = Bosnia & Herzegovina
chongqing = Chongqing, China
goias = Goiás, Brazil
iraq = Iraq
mato_grosso = Mato Grosso, Brazil
nebraska = Nebraska / Wyoming, USA
orenburg = Orenburg, Russia / Uralsk, Kazakhstan
shaanxi = Shaanxi/Shanxi, China
volgograd = Volgograd, Russia
wisconsin = Wisconsin, USA

This dataset is minimally altered from Yin et al. 2020. However, land cover codes were updated for five sites (Iraq, Nebraska/Wyoming, Orenburg/Uralsk, Volgograd, and Wisconsin) in order to maintain consistency in how land cover was coded across all sites. The original land cover codes (matching Yin et al. 2020) are described in the file "site_df.csv" and are as follows:

Iraq: 1 Non-vegetated; 2 Cropland; 3 Woody; 4 Herbaceous
Nebraska / Wyoming (USA): 1 Cropland; 2 Woody; 3 Non-vegetated; 4 Herbaceous
Orenburg, Russia / Uralsk, Kazakhstan: 1 Non-vegetated; 2 Cropland; 3 Herbaceous; 4 Woody
Volgograd (Russia): 1 Non-vegetated; 2 Cropland; 3 Herbaceous; 4 Woody
Wisconsin (USA): 1 Cropland; 2 Herbaceous; 3 Woody; 4 Non-vegetated
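The recoding described above amounts to a per-site lookup table. The Python sketch below is illustrative only and is not the authors' code: the names "HARMONIZED," "ORIGINAL," "recode_table," and "recode" are hypothetical, and it assumes the harmonized classes are numbered 1 to 4 in the order listed under "Land cover categories" (consistent with woody vegetation = 2 and herbaceous vegetation = 4 elsewhere in this description).

```python
# Assumed harmonized scheme: classes numbered in the order listed in this description.
HARMONIZED = {"non_vegetated": 1, "woody": 2, "cropland": 3, "herbaceous": 4}

# Original codes for the five recoded sites, as listed above (Yin et al. 2020).
ORIGINAL = {
    "iraq":      {1: "non_vegetated", 2: "cropland", 3: "woody", 4: "herbaceous"},
    "nebraska":  {1: "cropland", 2: "woody", 3: "non_vegetated", 4: "herbaceous"},
    "orenburg":  {1: "non_vegetated", 2: "cropland", 3: "herbaceous", 4: "woody"},
    "volgograd": {1: "non_vegetated", 2: "cropland", 3: "herbaceous", 4: "woody"},
    "wisconsin": {1: "cropland", 2: "herbaceous", 3: "woody", 4: "non_vegetated"},
}


def recode_table(site: str) -> dict:
    """Map each original code at a site to its harmonized code."""
    return {code: HARMONIZED[name] for code, name in ORIGINAL[site].items()}


def recode(values, site: str) -> list:
    """Recode a sequence of original land cover codes for one site."""
    table = recode_table(site)
    return [table[value] for value in values]


for site in ORIGINAL:
    print(site, recode_table(site))
```

The files in this archive are already recoded, so a table like this is only needed when comparing against the original maps from Yin et al. 2020.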

Notes

This work was supported by the High Meadows Foundation and the NASA Land Cover and Land Use Change Program (Grant no. 80NSSC18K0343) and analyses were performed using Princeton Research Computing resources at Princeton University.

Files

Files (38.6 GB)

Name	MD5	Size
age_dts.zip	16c42e6de0b35e4a6f53c430aac67957	7.7 GB
age_rasters.zip	29ba03f5c725d62e63dfd1793b944f87	5.3 GB
derived_data.zip	28e0f7a7692b0590765fe34580dbe612	8.4 MB
diff.zip	ab7db84cf313c488975e720d427d25ef	6.1 GB
input_dts.zip	8e6a48799652a05fbb00ba201513c845	14.1 GB
input_rasters.zip	b75693d56bdfd063c5ed8fba3e3678c3	5.2 GB
length.zip	31c568cd8a160ddf0c9af340ce7a838f	245.5 MB

Additional details

Yin, H., A. Brandão, J. Buchner, D. Helmers, B. G. Iuliano, N. E. Kimambo, K. E. Lewińska, E. Razenkova, A. Rizayeva, N. Rogova, S. A. Spawn, Y. Xie, and V. C. Radeloff. 2020. Monitoring cropland abandonment with Landsat time series. Remote Sensing of Environment 246:111873. https://doi.org/10.1016/j.rse.2020.111873
Sanderman, J., Woolf, D., Lehmann, J., Rivard, C., Poggio, L., Heuvelink, G., Bossio, D. 2020. Soils Revealed soil carbon futures. https://doi.org/10.7910/DVN/HA17D3
Hijmans, R. J. 2022. Terra: Spatial data analysis. https://rspatial.org/terra/
Dowle, M., Srinivasan, A. 2021. Data.table: Extension of 'data.frame.' http://r-datatable.com/
