Probability distribution grids of dissolved oxygen and dissolved manganese concentrations at selected thresholds in drinking water depth zones, Central Valley, California

Published by U.S. Geological Survey | Department of the Interior | Catalog Last Checked: May 05, 2026 at 07:33 PM | Dataset Last Updated: August 26, 2020 at 12:00 AM

The ASCII grids represent regional probabilities that groundwater at a given location will have dissolved oxygen (DO) concentrations less than selected threshold values representing anoxic groundwater conditions, or dissolved manganese (Mn) concentrations greater than selected threshold values representing the secondary maximum contaminant level (SMCL) and health-based screening level (HBSL) for drinking water. The probability models were constrained laterally by the alluvial boundary of the Central Valley and vertically to a depth of approximately 300 meters (m). We used boosted regression trees (BRT) with a Bernoulli error distribution, fitted within a statistical learning framework in the R computing environment (http://www.r-project.org/), to produce two-dimensional probability grids at selected depths throughout the modeling domain. The statistical learning framework seeks to maximize the predictive performance of machine learning methods by tuning models through cross-validation. Models were constructed from measured DO and Mn concentrations in samples from 2,767 wells within the alluvial boundary of the Central Valley and more than 60 predictor variables from 7 sources (see metadata) representing regional-scale soil properties, soil chemistry, land use, aquifer textures, and aquifer hydrology. Outputs of previously developed Central Valley models were used to represent aquifer textures (Central Valley Textural Model, CVTM; Faunt and others, 2010) and groundwater hydraulics (MODFLOW-simulated vertical water fluxes and predicted depth to water table from the Central Valley Hydrologic Model, CVHM; Faunt, 2009). Predictor variable values were attributed to each well used in the BRT models in ArcGIS by using a 500-m buffer around the well.
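The BRT approach described above can be illustrated with a minimal pure-Python sketch of gradient boosting under a Bernoulli (logistic) loss. This is not the authors' R code, and the text does not name the R package or its settings. The sketch uses depth-1 trees (stumps) instead of deeper trees for brevity, and the names `fit_bernoulli_boosting`, `predict_proba`, `n_trees`, and `learning_rate` are hypothetical. Each row of `X` stands for the predictor values attributed to one well, and `y` is the 0/1 threshold indicator (for example, DO < 0.5 mg/L).

```python
import math


def _sigmoid(f):
    return 1.0 / (1.0 + math.exp(-f))


def _sse(values, idx):
    """Sum of squared deviations from the mean for the values at positions idx."""
    if not idx:
        return 0.0
    mean = sum(values[i] for i in idx) / len(idx)
    return sum((values[i] - mean) ** 2 for i in idx)


def _best_split(X, r):
    """Fit a regression stump: the split that minimizes squared error of residuals r."""
    all_idx = list(range(len(X)))
    best = (_sse(r, all_idx), 0, math.inf, all_idx, [])  # fallback: no split
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0
            left = [i for i in all_idx if X[i][j] <= thr]
            right = [i for i in all_idx if X[i][j] > thr]
            sse = _sse(r, left) + _sse(r, right)
            if sse < best[0]:
                best = (sse, j, thr, left, right)
    return best[1:]  # (feature index, threshold, left indices, right indices)


def _newton_leaf(r, p, idx):
    """Leaf value for Bernoulli deviance: sum(y - p) / sum(p * (1 - p))."""
    num = sum(r[i] for i in idx)
    den = sum(p[i] * (1.0 - p[i]) for i in idx)
    return num / den if den > 1e-12 else 0.0


def fit_bernoulli_boosting(X, y, n_trees=100, learning_rate=0.1):
    """Boost stumps on the log-odds scale; y must contain both 0s and 1s."""
    n = len(y)
    p0 = sum(y) / n
    if not 0.0 < p0 < 1.0:
        raise ValueError("y must contain both classes (0 and 1)")
    f0 = math.log(p0 / (1.0 - p0))  # initial log-odds of an event
    F = [f0] * n
    stumps = []
    for _ in range(n_trees):
        p = [_sigmoid(f) for f in F]
        r = [yi - pi for yi, pi in zip(y, p)]  # negative gradient of Bernoulli deviance
        j, thr, left, right = _best_split(X, r)
        g_left, g_right = _newton_leaf(r, p, left), _newton_leaf(r, p, right)
        stumps.append((j, thr, g_left, g_right))
        for i in left:
            F[i] += learning_rate * g_left
        for i in right:
            F[i] += learning_rate * g_right
    return {"f0": f0, "lr": learning_rate, "stumps": stumps}


def predict_proba(model, row):
    """Probability of the event (e.g., DO below a threshold) for one predictor row."""
    f = model["f0"]
    for j, thr, g_left, g_right in model["stumps"]:
        f += model["lr"] * (g_left if row[j] <= thr else g_right)
    return _sigmoid(f)


# Toy data: one hypothetical predictor; events occur at larger values.
model = fit_bernoulli_boosting([[1.0], [2.0], [3.0], [4.0]], [0, 0, 1, 1])
print(predict_proba(model, [1.0]) < 0.5, predict_proba(model, [4.0]) > 0.5)  # True True
```

In the framework described above, settings such as the number of trees, the learning rate, and tree depth are the kind of tuning parameters selected by cross-validation; the sketch fixes them for simplicity.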
The response variable data consisted of measured DO and Mn concentrations from 2,767 wells within the alluvial boundary of the Central Valley, compiled from two sources: the U.S. Geological Survey (USGS) National Water Information System (NWIS) database (publicly available from the USGS at http://waterdata.usgs.gov/ca/nwis/nwis) and the California State Water Resources Control Board Division of Drinking Water (SWRCB-DDW) database (water-quality data are publicly available from the SWRCB at http://geotracker.waterboards.ca.gov/gama/). Only wells with well-depth data were selected, and for wells with multiple records, only the most recent sample collected during 1993–2014 that had the required water-quality data was used. Data were available for 932 wells in the NWIS dataset and 1,835 wells in the SWRCB-DDW dataset. Models were trained on the 932-well NWIS dataset and evaluated on the 1,835-well SWRCB-DDW dataset as an independent hold-out dataset. We used cross-validation to assess the predictive performance of models of varying complexity as a basis for selecting the final models used to create the prediction grids. Predictive performance on the cross-validation testing data and the hold-out dataset was evaluated with three fit metrics: Kappa, accuracy, and the area under the receiver operating characteristic (ROC) curve. The final trained models were used to map predictions at discrete depths to approximately 300 m. On the training data, the DO and Mn models had accuracies of 86–100 percent, Kappa values of 0.69–0.99, and ROC values of 0.92–1.0. On the cross-validation testing datasets, accuracies were 82–95 percent and ROC values were 0.87–0.91, indicating good predictive performance; Kappa values were 0.30–0.69, indicating fair to substantial agreement between testing observations and model predictions.
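The three fit metrics named above can be computed from binary observations, 0/1 predictions, and predicted probabilities as in the following plain-Python sketch (function names are hypothetical). ROC area is computed with the pairwise (Mann-Whitney) formulation, which equals the area under the empirical ROC curve. Turning probabilities into 0/1 predictions requires a cutoff; the 0.5 used in the example is an assumption, not a value from the text.

```python
def accuracy(y_true, y_pred):
    """Fraction of wells where the 0/1 prediction matches the observation."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Agreement beyond chance: (observed - expected) / (1 - expected)."""
    n = len(y_true)
    po = accuracy(y_true, y_pred)
    obs1, pred1 = sum(y_true) / n, sum(y_pred) / n
    pe = obs1 * pred1 + (1 - obs1) * (1 - pred1)  # chance agreement
    if pe == 1.0:
        return float("nan")  # undefined when both use a single class
    return (po - pe) / (1 - pe)


def roc_auc(y_true, scores):
    """Probability that a random event well scores higher than a random non-event well."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5  # ties count half
    return wins / (len(pos) * len(neg))


y_obs = [0, 0, 1, 1]
p_hat = [0.1, 0.4, 0.35, 0.8]
y_hat = [1 if p >= 0.5 else 0 for p in p_hat]  # 0.5 cutoff is an assumption
print(accuracy(y_obs, y_hat), cohen_kappa(y_obs, y_hat), roc_auc(y_obs, p_hat))  # 0.75 0.5 0.75
```

Kappa helps explain why high accuracy can coexist with low Kappa when events are rare: with 1 exceedance among 10 wells, predicting "no exceedance" everywhere gives 90 percent accuracy but a Kappa of 0.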
Hold-out data were available for the Mn models only and indicated accuracies of 89–97 percent, ROC values of 0.73–0.75, and Kappa values of 0.06–0.30. Considering all three fit metrics and the low percentages of low-DO and high-Mn events in the data, the predictive performance of both the DO and Mn models was reasonable. See the associated journal article (Rosecrans and others, 2017) for a complete summary of BRT modeling methods, model fit metrics, and the relative influence of predictor variables for each DO and Mn BRT model. The response variables for the DO BRT models were based on measured DO values at the following thresholds: <0.5 milligrams per liter (mg/L), <1.0 mg/L, and <2.0 mg/L; on the basis of a literature review, these threshold values were considered to represent anoxic conditions. The response variables for the Mn BRT models were based on measured Mn values at the following exceedance thresholds: >50 micrograms per liter (µg/L), >150 µg/L, and >300 µg/L. (The 50 µg/L threshold is the SMCL for manganese, the 300 µg/L threshold is the USGS HBSL, and the 150 µg/L threshold is one-half the USGS HBSL.) Below land surface, prediction grids were discretized at 15-m intervals to a depth of 122 m, followed by 30-m intervals to a depth of 300 m, resulting in 14 two-dimensional probability grids for each constituent (DO and Mn) and threshold. Probability grids were also created for the shallow and deep aquifers, represented by the median depths of domestic and public-supply wells, respectively. A depth of 46 m, derived from depth percentiles associated with domestic and public-supply wells in previous work by Burow and others (2013), was used to stratify wells in the training dataset into the shallow and deep aquifers. In this work, the median depth of wells categorized as domestic was 30 m below land surface (bls), and the median depth of public-supply wells was 100 m bls.
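The threshold definitions, depth discretization, and shallow/deep split described above can be encoded as in the sketch below. The depths are nominal (eight 15-m steps, then six 30-m steps); the depths in the text appear rounded (8 × 15 m = 120 m, not 122 m), so actual grid depths may differ by a few meters. Placing the first grid one interval below land surface and treating a well exactly at 46 m as deep are assumptions not stated in the text, and all function names are hypothetical.

```python
# Thresholds from the text; an "event" (response = 1) is low DO or high Mn.
DO_THRESHOLDS_MG_L = (0.5, 1.0, 2.0)       # event if DO < threshold
MN_THRESHOLDS_UG_L = (50.0, 150.0, 300.0)  # event if Mn > threshold
SHALLOW_DEEP_BOUNDARY_M = 46.0             # depth used to split shallow/deep aquifer


def do_event(do_mg_l: float, threshold: float) -> int:
    """1 if the measured DO is below the threshold (anoxic), else 0."""
    return 1 if do_mg_l < threshold else 0


def mn_event(mn_ug_l: float, threshold: float) -> int:
    """1 if the measured Mn exceeds the threshold, else 0."""
    return 1 if mn_ug_l > threshold else 0


def nominal_grid_depths_m() -> list:
    """Nominal prediction depths: 15-m steps to ~120 m, then 30-m steps to 300 m."""
    upper = [15 * k for k in range(1, 9)]        # 15, 30, ..., 120
    lower = [120 + 30 * k for k in range(1, 7)]  # 150, 180, ..., 300
    return upper + lower


def aquifer_zone(well_depth_m: float) -> str:
    """Assign a well to the shallow or deep aquifer (boundary handling assumed)."""
    return "shallow" if well_depth_m < SHALLOW_DEEP_BOUNDARY_M else "deep"


depths = nominal_grid_depths_m()
print(len(depths))                             # 14
print(len(depths) * len(DO_THRESHOLDS_MG_L))   # 42
print([mn_event(v, 300.0) for v in (120.0, 300.0, 450.0)])  # [0, 0, 1]
print(aquifer_zone(30.0), aquifer_zone(100.0))  # shallow deep
```

The counts reproduce the archive contents: 14 depths × 3 thresholds = 42 grids per constituent, and 6 models × 2 drinking-water depth zones = 12 grids.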
Accordingly, the zip archives named "DO BRT prediction grids.zip" and "Mn BRT prediction grids.zip" each contain 42 probability grids (14 depths for each of the three thresholds of the DO or Mn BRT models described above). The archive named "PublicSupply&DomesticGrids.zip" contains probability grids at the domestic and public-supply drinking-water depths for each of the six BRT models described above (12 grids total).
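The text describes the products as ASCII grids. Assuming they follow the Esri ASCII raster layout (a short header with `ncols`, `nrows`, lower-left coordinates, `cellsize`, and an optional `NODATA_value`, followed by rows of values ordered from north to south), a single grid could be read with the standard library as sketched below. The function names and the sample grid are hypothetical; the sketch assumes nothing about file names inside the zip archives or whether probabilities are stored as fractions or percentages.

```python
HEADER_KEYS = {"ncols", "nrows", "xllcorner", "yllcorner",
               "xllcenter", "yllcenter", "cellsize", "nodata_value"}


def parse_ascii_grid(text):
    """Parse Esri ASCII raster text into (header, rows); NODATA cells become None."""
    header, values = {}, []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        key = parts[0].lower()
        if not values and key in HEADER_KEYS:
            header[key] = float(parts[1])
        else:
            values.extend(float(v) for v in parts)  # data may wrap across lines
    ncols, nrows = int(header["ncols"]), int(header["nrows"])
    if len(values) != ncols * nrows:
        raise ValueError(f"expected {ncols * nrows} values, found {len(values)}")
    nodata = header.get("nodata_value")
    rows = []
    for r in range(nrows):
        row = values[r * ncols:(r + 1) * ncols]
        rows.append([None if nodata is not None and v == nodata else v for v in row])
    return header, rows


def cell_center(header, row, col):
    """Map coordinates of a cell center; row 0 is the northernmost row."""
    size, nrows = header["cellsize"], int(header["nrows"])
    if "xllcorner" in header:
        x0, y0 = header["xllcorner"] + size / 2, header["yllcorner"] + size / 2
    else:
        x0, y0 = header["xllcenter"], header["yllcenter"]
    return x0 + col * size, y0 + (nrows - 1 - row) * size


def mean_of_valid_cells(rows):
    """Mean value over cells that are not NODATA (None if every cell is NODATA)."""
    valid = [v for row in rows for v in row if v is not None]
    return sum(valid) / len(valid) if valid else None


SAMPLE = """ncols 3
nrows 2
xllcorner 0
yllcorner 0
cellsize 500
NODATA_value -9999
0.1 0.2 -9999
0.9 0.5 0.3
"""
header, rows = parse_ascii_grid(SAMPLE)
print(rows[0])                    # [0.1, 0.2, None]
print(cell_center(header, 0, 0))  # (250.0, 750.0)
```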

Resources

Digital Data

XML

Original Metadata

XML

Complete Metadata

@type	dcat:Dataset
accessLevel	public
bureauCode	[ "010:12" ]
contactPoint	{ "fn": "Celia Z. Rosecrans", "@type": "vcard:Contact", "hasEmail": "mailto:crosecrans@usgs.gov" }
description	See the dataset summary above.
distribution	[ { "@type": "dcat:Distribution", "title": "Digital Data", "format": "XML", "accessURL": "https://doi.org/10.5066/F7T151S1", "mediaType": "application/http", "description": "Landing page for access to the data" }, { "@type": "dcat:Distribution", "title": "Original Metadata", "format": "XML", "mediaType": "text/xml", "description": "The metadata original format", "downloadURL": "https://data.usgs.gov/datacatalog/metadata/USGS.57f433c3e4b0bc0bec033fc9.xml" } ]
identifier	http://datainventory.doi.gov/id/dataset/USGS_57f433c3e4b0bc0bec033fc9
keyword	[ "Boosted Regression Trees", "California", "Central Valley, California", "Domestic Well Water Use", "Drinking Water Use", "Groundwater", "Hydrogeology", "Predictive Modeling", "Public Supply Water Use", "Statistical Analysis", "USGS:57f433c3e4b0bc0bec033fc9", "United States", "Visualization Methods", "Water Quality" ]
modified	2020-08-26T00:00:00Z
publisher	{ "name": "U.S. Geological Survey", "@type": "org:Organization" }
spatial	-122.904, 34.957, -118.53, 40.63
theme	[ "geospatial" ]
title	Probability distribution grids of dissolved oxygen and dissolved manganese concentrations at selected thresholds in drinking water depth zones, Central Valley, California
