Histogram-based gradient boosted regression tree model of mean ages of shallow well samples in the Great Lakes Basin, USA

Metadata Updated: July 20, 2024

Green and others (2021) developed a gradient boosted regression tree model to predict the mean groundwater age, or travel time, for shallow wells across a portion of the Great Lakes basin in the United States. Their study applied machine learning methods to predict ages in wells using well construction, well chemistry, and landscape characteristics. For a dataset of age tracers in 961 water samples, the mean travel time from the land surface to the sample location (center of the saturated open interval) was estimated for each sample using parametric functions. The mean travel times were then modeled using a gradient boosting machine algorithm, with model hyperparameters tuned by cross validation. The model contained in this data release was converted from the original R-language model developed by Green and others (2021) to a Python-based histogram-based gradient boosting regression tree (HGBRT) model (Pedregosa and others, 2011). Conversion to Python facilitates the model's use as a support model for a groundwater nitrate decision support tool by Juckem and others (2024). The hyperparameters of the HGBRT model were adjusted using a Bayesian optimization algorithm (Head and others, 2021), with the goal of producing results similar to those of the original model by Green and others (2021). A total of 72 predictor variables were used for model development, including basic well characteristics, soil properties, aquifer properties, hydrologic position on the landscape, recharge and evapotranspiration rates, water quality constituents, and land use. Model results indicate that the mean of the natural logarithm of mean groundwater age for the wells used to train and test the model is 3.39 ln(years), with a root mean square error (RMSE) of 0.76 ln(years) for the HGBRT model's holdout data. This RMSE (0.76) is similar to the holdout RMSE of the original model (0.84) reported by Green and others (2021).
When the simulated values from the HGBRT model are back-transformed from log space, the mean groundwater age is 55.9 years with an RMSE of 35.4 years for the testing data. Green and others (2021) do not report matching results; their simulated ages are reported for 14,335 non-sampled wells. The larger relative RMSE for back-transformed ages reflects the fact that errors grow as ages increase in untransformed space. In addition to a Python script containing the overall HGBRT methods, this data release includes a self-contained model directory for recreating the HGBRT model published here. Three directories within this data release define: 1. Python attributes and input predictor variables, 2. model input, and 3. the output model. The output directory also includes a model object (age_ml_model.joblib) for the HGBRT model used to predict the natural logarithm of the mean groundwater age. This model object is used directly by the groundwater nitrate decision support tool by Juckem and others (2024).
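The effect of back-transformation on error can be illustrated with a small worked example. The ages and the constant log-space offset below are hypothetical values chosen for illustration, not values from the data release. The sketch shows that an error of fixed size in ln(years) becomes a larger error in years as age increases.

```python
import math


def rmse(observed, simulated):
    """Root mean square error between two equal-length sequences."""
    squared = [(o - s) ** 2 for o, s in zip(observed, simulated)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical observed mean ages in years (illustrative only).
observed_years = [5.0, 20.0, 60.0, 150.0]
ln_observed = [math.log(age) for age in observed_years]

# Assume the model overpredicts every sample by 0.5 ln(years).
ln_simulated = [value + 0.5 for value in ln_observed]

# Back-transform simulated values from log space to years.
simulated_years = [math.exp(value) for value in ln_simulated]

rmse_ln = rmse(ln_observed, ln_simulated)
rmse_years = rmse(observed_years, simulated_years)

# Per-sample error in years grows in proportion to the observed age.
errors_years = [s - o for o, s in zip(observed_years, simulated_years)]

print(f"RMSE in log space: {rmse_ln:.2f} ln(years)")
print(f"RMSE after back-transformation: {rmse_years:.1f} years")
print("Errors in years:", [round(e, 1) for e in errors_years])
```

Because exp() turns an additive offset in log space into a multiplicative factor, the oldest samples dominate the RMSE computed in years.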

Access & Use Information

Public: This dataset is intended for public access and use. License: No license information was provided. If this work was prepared by an officer or employee of the United States government as part of that person's official duties it is considered a U.S. Government Work.

Dates

Metadata Created Date July 20, 2024
Metadata Updated Date July 20, 2024

Metadata Source

Harvested from DOI EDI

Additional Metadata

Resource Type Dataset
Publisher U.S. Geological Survey
Maintainer
@Id http://datainventory.doi.gov/id/dataset/b55be493501efd331e7a15a287e44fc5
Identifier USGS:6500b48dd34ed30c2057f9b5
Data Last Modified 20240320
Category geospatial
Public Access Level public
Bureau Code 010:12
Metadata Context https://project-open-data.cio.gov/v1.1/schema/catalog.jsonld
Metadata Catalog ID https://datainventory.doi.gov/data.json
Schema Version https://project-open-data.cio.gov/v1.1/schema
Catalog Describedby https://project-open-data.cio.gov/v1.1/schema/catalog.json
Harvest Object Id 3647b138-987a-4cd1-8068-be1ccba6d72d
Harvest Source Id 52bfcc16-6e15-478f-809a-b1bc76f1aeda
Harvest Source Title DOI EDI
Metadata Type geospatial
Old Spatial -93.6255,40.9301,-73.3447,48.5457
Publisher Hierarchy White House > U.S. Department of the Interior > U.S. Geological Survey
Source Datajson Identifier True
Source Hash 9bcf219cf4c6f1c202a743c6f6247f1c23b8c9780d4768c2cb80fff2e7eaf899
Source Schema Version 1.1
Spatial {"type": "Polygon", "coordinates": [[[-93.6255, 40.9301], [-93.6255, 48.5457], [-73.3447, 48.5457], [-73.3447, 40.9301], [-93.6255, 40.9301]]]}
