TITLE: A groundwater well database for Brazil (GWDBrazil)

ABSTRACT: Sufficient spatiotemporal in-situ groundwater-level measurements are essential for sustainable water management. Despite their importance, the lack of harmonized, quality-controlled datasets has hindered large-scale groundwater studies in Brazil. In collaboration with the Geological Survey of Brazil (SGB), we present the Groundwater Well Database for Brazil (GWDBrazil), which consolidates and standardizes information from over 351,000 wells, with records dating from 1900 to 2024, including about 450 wells with continuous daily monitoring from 2010 to 2024. Cross-verification steps were applied to ensure data accuracy. GWDBrazil is available both as tabular data and as vector points and includes information such as location, well depth, and well purpose. The dataset also provides data to support integrated surface water and groundwater management, such as the distance from each well to the nearest river and aquifer information. It is intended to serve as a valuable resource for researchers, decision-makers, and stakeholders, providing essential information to support comprehensive water management strategies in Brazil.

FOLDER STRUCTURE: The dataset is organized into four main folders:

1. data - Processed products derived from the study "A Groundwater Well Database for Brazil (GWDBrazil)".
   1.1 csv - Tabular data.
       1.1.1 SIAGAS_data.csv - Final SIAGAS dataset.
       1.1.2 SIAGAS_data_flagged.csv - Final SIAGAS dataset with data-quality flags.
       1.1.3 Additional_data.csv - Supplementary data for studies of surface water and groundwater interaction.
       1.1.4 RIMAS_data_flagged - Final RIMAS dataset with error and outlier flags.
             1.1.4.1 Rimas_IdWell.csv - Overview of the amount of data available in the final RIMAS dataset.
             1.1.4.2 Rimas_IdWell.csv - Final RIMAS dataset, where each CSV file represents a single well.
       Note: Some RIMAS wells may contain data from before 2010 because they were used in previous SGB projects.
   1.2 netCDF - Data from the continuous groundwater-level monitoring wells (2010-2024) in netCDF format.
       1.2.1 rimas_groundwater_levels.nc - NetCDF equivalent of the RIMAS_data_flagged folder, excluding data with potential errors. The file is not on a regular grid.
       1.2.2 rimas_groundwater_levels.csv - CSV file with all data from the RIMAS_data_flagged folder, excluding data with potential errors.
       1.2.3 rimas_groundwater_atts.csv - Locations (latitude and longitude) of the wells in the RIMAS_data_flagged folder.
   1.3 shapefile - Shapefile data.
       1.3.1 SIAGAS_data.shp - Shapefile equivalent of SIAGAS_data.csv.
       1.3.2 SIAGAS_data_flagged.shp - Shapefile equivalent of SIAGAS_data_flagged.csv.
       1.3.3 Additional_data.shp - Shapefile equivalent of Additional_data.csv.
2. raw_data - Original datasets extracted from Geological Survey of Brazil projects.
   2.1 RIMAS - Data from the Integrated Groundwater Monitoring Network Project (RIMAS, in Portuguese: Rede Integrada de Monitoramento das Águas Subterrâneas; SGB, 2024a).
       2.1.1 groundwater_level_monitoring - Groundwater-level time series.
             2.1.1.1 RimasWeb_Exportacao_Dados_Nivel_Dagua_IdWell.csv - Each CSV file represents a unique well.
       2.1.2 hydrochemical_monitoring - Hydrochemical data.
             2.1.2.1 RimasWeb_Exportacao_Dados_Analise_Quimica_IdWell.csv - Each CSV file represents a unique well.
   2.2 SIAGAS - Data from the Groundwater Information System (SIAGAS, in Portuguese: Sistema de Informações de Águas Subterrâneas; SGB, 2024b).
       2.2.1 PT_amostra-fisico-quimica_EN_water_quality_data_physicochemical_analysis.csv - Water quality data (physicochemical analyses).
       2.2.2 PT_aquifero_EN_aquifer_data.csv - Aquifer data.
       2.2.3 PT_dados_construtivos_EN_drilling_data.csv - Well construction data.
       2.2.4 PT_dados_gerais_EN_general_data.csv - General well information.
       2.2.5 PT_dados_hidraulicos_EN_pumping_data.csv - Pumping test data.
       2.2.6 PT_litologia_EN_lithological_data.csv - Lithological well data.
3. supplementary_tables - Supplementary tables.
   3.1 Table_S1-Definitions_of_attributes_in_SIAGAS_dataset - Definitions of the attributes used in the SIAGAS project of the Geological Survey of Brazil.
   3.2 Table_S2-Translation_of_terms_used_by_SGB(in Portuguese)_into_internationally_used_terms.xlsx - Translation of SIAGAS terms from Portuguese into internationally recognized terms.
   3.3 Table_S3-Overlaid_Aquifers_and_Aquifer_Confinement.xlsx - Summary of the aquifer layers per record and their confinement status, based on raw SIAGAS data.
   3.4 Table_S4-No_well_records_step_quality_control.xlsx - Summary of the 9,655 records classified as non-wells and removed during quality control.
   3.5 Table_S5-Duplicate_records_step_quality_control.xlsx - Summary of the 4,711 records classified as duplicates and removed during quality control.
   3.6 Table_S6-Records_whitout_any_data_step_quality_control.xlsx - Summary of the 5,814 records removed during quality control because they lacked any information on when they were drilled.
   3.7 Table_S8-RIMAS_wells_step_quality_control.xlsx - Overview of the data from the 453 RIMAS wells.
4. codes - Main scripts used in this study. Note: Some steps of the workflow were performed manually with support from members of the Geological Survey of Brazil. Nonetheless, the methodology can be reproduced by following the procedures detailed in the accompanying paper.
   4.1 merge_and_standardization.R - Merges and standardizes the data.
   4.2 check_aquifer_data.R - Verifies the aquifer data in the database.
   4.3 check_lithological_and_water_quality_data.R - Verifies the lithological and water quality data in the database.
   4.4 check_gw_data_RIMAS - Verifies the groundwater-level data in the RIMAS dataset.
   4.5 check_hydrochemical_data_RIMAS - Verifies the hydrochemical data in the RIMAS dataset.
   4.6 figures_and_analysis - Generates the main analyses and figures of the paper.
   4.7 csv_to_NetCDF.ipynb - Jupyter Notebook (Python) that converts the RIMAS CSV data to NetCDF format. The output NetCDF file is not on a regular grid.
   4.8 test_NetCDFfile.R - Tests the NetCDF files generated in this study.

USAGE NOTE: The GWDBrazil dataset has a wide range of applications. Users are strongly encouraged to read the accompanying paper, "A Groundwater Well Database for Brazil (GWDBrazil)", before using the data; it explains the criteria used for data refinement and the limitations of the dataset. Users should critically evaluate the level of detail and accuracy required for their specific applications. Although extensive quality control was applied in collaboration with the Geological Survey of Brazil, additional regional and local validation may be necessary for specific studies.

CONTACT: For questions or recommendations, please contact:
Lead author: J.G.S.M.U (gescilam@usp.br)
Corresponding author: P.T.S.O (paulotarsoms@gmail.com)

CITATION: If you use this dataset, please cite it as follows:
Uchôa, J.G.S.M., Oliveira, P.T.S., Ballarin, A.S., Gastmans, A., Anache, J.A.A., Scanlon, B.R.S., Camanho, C.R.C., Filho, V.J.F. & Wendland, E.C. A groundwater well database for Brazil (GWDBrazil) (Version 2) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.15098047

LICENSE: This dataset is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) License. You are free to share and adapt the material as long as appropriate credit is given.

We recommend checking this README regularly for updates.
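Item 1.1.4 keeps every RIMAS reading and marks problems in the Flag_error and Flag_outlier columns (both documented for Rimas_IdWell.csv), whereas the products in item 1.2 already exclude data with potential errors. The standard-library Python sketch below shows one way to build a filtered view from the per-well files. It rests on assumptions not stated in this README: the per-well files are comma-separated, UTF-8 encoded, and named Rimas_<well ID>.csv, and the flags are stored as the integers 0 and -1. The function name load_clean_rimas is hypothetical. Files that lack the flag columns, such as the overview table, are skipped.

```python
import csv
from pathlib import Path

# Flag columns as documented for Rimas_IdWell.csv: 0 = no problem found, -1 = flagged.
ERROR_COL = "Flag_error"
OUTLIER_COL = "Flag_outlier"


def load_clean_rimas(folder, drop_outliers=True):
    """Return {well_id: [row, ...]} with flagged records removed.

    Hypothetical assumptions: per-well files are named Rimas_<well ID>.csv,
    are comma-separated and UTF-8 encoded. Files lacking the flag columns
    (e.g. an overview table) are skipped.
    """
    flag_cols = [ERROR_COL, OUTLIER_COL] if drop_outliers else [ERROR_COL]
    wells = {}
    for path in sorted(Path(folder).glob("Rimas_*.csv")):
        with path.open(newline="", encoding="utf-8") as f:
            reader = csv.DictReader(f)
            header = reader.fieldnames or []
            if ERROR_COL not in header or OUTLIER_COL not in header:
                continue  # not a per-well time series
            # The well ID is taken from the file name (assumed naming scheme).
            well_id = path.stem[len("Rimas_"):]
            # Keep a row only if every selected flag is 0.
            wells[well_id] = [
                row for row in reader
                if all(int(row[c]) == 0 for c in flag_cols)
            ]
    return wells
```

For example, `wells = load_clean_rimas("data/csv/RIMAS_data_flagged")` (path assumed relative to the dataset root) returns the unflagged records of each well. Passing `drop_outliers=False` removes only the records flagged as errors, which follows the wording used for the products in item 1.2.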
###### Overview of the variables in SIAGAS_data.csv ######

Type                Header                 Definition                                                 NA values  Data type
General data        ID_well                Unique well ID                                             0.00%      Integer
                    Latitude               Well latitude                                              0.00%      Float
                    Longitude              Well longitude                                             0.00%      Float
                    ID_city_IBGE           IBGE city ID                                               0.00%      Integer
                    City                   Brazilian city where the well is located                   0.00%      String
                    State                  Brazilian state where the well is located                  0.00%      String
                    Types_of_wells         Type of well                                               3.53%      String
                    Well_status            Well status                                                38.07%     String
                    Well_water_use         Primary use of the groundwater                             41.56%     String
                    Surface_elevation      Surface elevation of the well [m a.s.l.]                   83.81%     Float
                    Year_reported          Year in which the well was entered into the SIAGAS system  2.58%      String
                    Last_updated_reported  Last update reported in the SIAGAS system                  81.66%     String
Drilling data       Drilling_reported      Drilling date                                              23.44%     String
                    Well_depth             Well depth [m]                                             17.58%     Float
Pumping data        Pumping_test_date      Pumping test date                                          55.78%     String
                    Static_water_level     Static water level [m]                                     55.48%     Float
                    Dynamic_water_level    Dynamic water level [m]                                    63.85%     Float
                    Well_capacity          Well capacity [m³/h]                                       60.28%     Float
Aquifer data        Aquifer_broad          General aquifer or aquifer system that the well overlies   68.56%     String
                    Aquifer_confinement    Aquifer confinement                                        74.52%     String
Lithological data   Lithological_data      Lithological records (Yes, NA)                             63.58%*    String
Water quality data  Water_quality_data     Water quality data records (Yes, NA)                       65.26%*    String

*For Lithological_data and Water_quality_data, NA was assigned when no data were provided.
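The NA values column gives the share of records with no entry for each variable. As an illustration of how such percentages could be recomputed, for instance after filtering the data to one state, the standard-library Python sketch below counts missing cells per column. It assumes, hypothetically, that missing values appear as empty cells or as the literal string NA; the function name na_percentages is illustrative and not part of the dataset's code.

```python
import csv


def na_percentages(csv_file, na_tokens=("", "NA")):
    """Return {column: % of rows with a missing value}, rounded to 2 decimals.

    `csv_file` is an open text file (or any iterable of CSV lines). The set
    of tokens treated as missing is an assumption about the file's encoding.
    """
    reader = csv.DictReader(csv_file)
    missing = {name: 0 for name in (reader.fieldnames or [])}
    n_rows = 0
    for row in reader:
        n_rows += 1
        for name in missing:
            value = row.get(name)
            # Short rows yield None; treat those cells as missing too.
            if value is None or value.strip() in na_tokens:
                missing[name] += 1
    if n_rows == 0:
        return {name: 0.0 for name in missing}
    return {name: round(100 * count / n_rows, 2) for name, count in missing.items()}
```

A typical call opens the file with `open("data/csv/SIAGAS_data.csv", newline="", encoding="utf-8")` (path and encoding assumed) and passes the file object to `na_percentages`. Comparing the output with the table above is a quick way to check whether the NA-encoding assumption holds for the file at hand.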
##########################################################

###### Overview of the variables in SIAGAS_data_flagged.csv ######

SIAGAS_data_flagged.csv contains the same variables as SIAGAS_data.csv, plus a flagging system that indicates data quality for 14 variables: -1 for inconsistent data and 0 for raw data (i.e., data in which no inconsistencies were identified, or whose quality could not be assessed). The flagged variables are ‘Latitude’, ‘Longitude’, ‘ID_city_IBGE’, ‘City’, ‘State’, ‘Surface_elevation’, ‘Year_reported’, ‘Last_updated_reported’, ‘Drilling_reported’, ‘Well_depth’, ‘Pumping_test_date’, ‘Static_water_level’, ‘Dynamic_water_level’, and ‘Well_capacity’.

##################################################################

###### Overview of the variables in Rimas_IdWell.csv ######

Header in the study  Data type
Month                String
Day                  String
Hour                 String
Level (m)            Float
Flag_error           Integer (0 or -1)
Flag_outlier         Integer (0 or -1)

The error flag (Flag_error) was set to -1 if a record met any of the following criteria:
(i) duplicate values;
(ii) an absolute daily water-level change greater than 10 m;
(iii) a constant head persisting for more than 30 days; or
(iv) a physically implausible reading (e.g., a value greater than the well depth, or a negative value).
The outlier flag (Flag_outlier) was set to -1 for data identified as outliers by the double exponential smoothing time-series model of the HydroSight toolbox.
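The four error criteria above can be expressed compactly in code. The Python function below is a minimal sketch, not the procedure implemented in check_gw_data_RIMAS, and it rests on interpretations of our own: records are (date, level) pairs sorted by date; Level (m) is depth to water, so readings deeper than the well are implausible; a "duplicate" is a second reading for a day already seen; criterion (ii) compares consecutive days only and flags the later reading; and a constant head "persists for more than 30 days" when the first and last identical readings are more than 30 days apart. The outlier criterion is not reproduced because it relies on the HydroSight model. The name error_flags is hypothetical.

```python
from datetime import date


def error_flags(records, well_depth, max_jump_m=10.0, max_constant_days=30):
    """Return one flag per record: 0 = no problem found, -1 = potential error.

    `records` is a list of (datetime.date, level) pairs sorted by date, with
    level assumed to be depth to water in metres. Illustrative sketch only.
    """
    flags = [0] * len(records)

    # (i) Duplicates: keep the first reading of each day, flag the rest.
    unique, seen = [], set()
    for i, (day, _) in enumerate(records):
        if day in seen:
            flags[i] = -1
        else:
            seen.add(day)
            unique.append(i)

    for k, i in enumerate(unique):
        day, level = records[i]
        # (iv) Physically implausible: negative, or deeper than the well itself.
        if level < 0 or level > well_depth:
            flags[i] = -1
        # (ii) Absolute change above max_jump_m between consecutive days;
        # only the later reading is flagged.
        if k > 0:
            prev_day, prev_level = records[unique[k - 1]]
            if (day - prev_day).days == 1 and abs(level - prev_level) > max_jump_m:
                flags[i] = -1

    # (iii) Constant head: a run of identical readings whose first and last
    # dates are more than max_constant_days apart. Exact equality is used
    # because readings are assumed to be stored at a fixed precision.
    start = 0
    for k in range(1, len(unique) + 1):
        run_ends = k == len(unique) or records[unique[k]][1] != records[unique[start]][1]
        if run_ends:
            first_day = records[unique[start]][0]
            last_day = records[unique[k - 1]][0]
            if (last_day - first_day).days > max_constant_days:
                for j in unique[start:k]:
                    flags[j] = -1
            start = k
    return flags


if __name__ == "__main__":
    example = [(date(2015, 1, 1), 5.0), (date(2015, 1, 2), 18.0)]
    print(error_flags(example, well_depth=100.0))  # jump of 13 m -> [0, -1]
```

Keeping each criterion in its own pass makes it easy to change one interpretation (for example, flagging both readings of a large jump) without touching the others.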
##################################################################

###### Overview of the variables in Additional_data.csv ######

Category         Header            Definition                                                 Data type  Source
Climate          Mean_pr_an        Mean annual precipitation [mm]                             Float      (Xavier et al., 2022)
                 Mean_eto_an       Mean annual potential evapotranspiration [mm]              Float      (Xavier et al., 2022)
                 Aridity_index     Aridity index [-]                                          Float      (Xavier et al., 2022)
Topography       Sur_ele_ANADEM    Surface elevation from ANADEM [m a.s.l.]                   Float      (Laipelt et al., 2024)
                 Sur_ele_MERIT     Surface elevation from MERIT [m a.s.l.]                    Float      (Yamazaki et al., 2017)
LULC             Main_landuse      Main land use around the well                              String     (MapBiomas, 2023)
Aquifer          Aq_permeability   Log aquifer permeability [m²]                              Float      (Huscroft et al., 2018)
                 Aq_porosity       Aquifer porosity [%]                                       Float      (Gleeson et al., 2014)
                 Overlaying_aq     Aquifer outcrop zone overlaid by the well                  String     (ANA, 2016)
Surface waters   Dis_stream        Distance to the closest stream of 2nd order or higher [m]  Float      (Linke et al., 2019)

#############################################################

REFERENCES:

ANA - National Water and Sanitation Agency. (2016). Sistemas Aquíferos [Aquifer Systems]. Retrieved from https://metadados.snirh.gov.br/geonetwork/srv/api/records/3ec60e4f-85ea-4ba7-a90c-734b57594f90 Last access: 10/30/2024.

Gleeson, T., Moosdorf, N., Hartmann, J., & van Beek, L. P. (2014). A glimpse beneath Earth’s surface: Global Hydrogeology Maps (GLHYMPS) of permeability and porosity. Geophysical Research Letters, 41(11), 3891–3898. https://doi.org/10.1002/2014gl059856

Huscroft, J., Gleeson, T., Hartmann, J., & Börker, J. (2018). Compiling and mapping global permeability of the unconsolidated and consolidated Earth: Global Hydrogeology Maps 2.0 (GLHYMPS 2.0). Geophysical Research Letters, 45(4), 1897–1904. https://doi.org/10.1002/2017gl075860

Laipelt, L., Comini de Andrade, B., Collischonn, W., Amorim, A., Paiva, R. C., & Ruhoff, A. (2024). ANADEM: A digital terrain model for South America. Remote Sensing, 16(13), 2321. https://doi.org/10.3390/rs16132321

Linke, S., Lehner, B., Ouellet Dallaire, C., Ariwi, J., Grill, G., Anand, M., Beames, P., Burchard-Levine, V., Maxwell, S., Moidu, H., Tan, F., & Thieme, M. (2019). Global hydro-environmental sub-basin and river reach characteristics at high spatial resolution. Scientific Data, 6(1). https://doi.org/10.1038/s41597-019-0300-6

MapBiomas. (2023). Coleção 8 da Série Anual de Mapas de Cobertura e Uso da Terra do Brasil [Collection 8 of the Annual Series of Land Cover and Land Use Maps of Brazil]. Retrieved from https://brasil.mapbiomas.org/colecoes-mapbiomas/ Last access: 08/30/2024.

SGB - Geological Survey of Brazil [Serviço Geológico do Brasil]. (2024a). Projeto Rede Integrada de Monitoramento das Águas Subterrâneas [Integrated Groundwater Monitoring Network Project]. Retrieved from https://rimasweb.sgb.gov.br/layout/apresentacao.php Last access: 07/30/2024.

SGB - Geological Survey of Brazil [Serviço Geológico do Brasil]. (2024b). Sistema de Informações de Águas Subterrâneas [Groundwater Information System]. Retrieved from https://siagasweb.sgb.gov.br/layout/apresentacao.php Last access: 07/30/2024.

Xavier, A. C., Scanlon, B. R., King, C. W., & Alves, A. I. (2022). New improved Brazilian daily weather gridded data (1961–2020). International Journal of Climatology, 42, 8390–8404. https://doi.org/10.1002/joc.7731

Yamazaki, D., Ikeshima, D., Tawatari, R., Yamaguchi, T., O’Loughlin, F., Neal, J. C., Sampson, C. C., Kanae, S., & Bates, P. D. (2017). A high-accuracy map of global terrain elevations. Geophysical Research Letters, 44(11), 5844–5853. https://doi.org/10.1002/2017gl072874