
Data and Model Archive for Preliminary Machine Learning Models of Manganese and 1,4-Dioxane in Groundwater on Long Island, New York

Metadata Updated: December 11, 2025

Data and preliminary machine-learning models used to predict manganese and 1,4-dioxane in groundwater on Long Island are documented in this data release. Concentration data used to develop the models were from 910 wells for manganese and 553 wells for 1,4-dioxane, primarily public supply wells, from U.S. Geological Survey, U.S. Environmental Protection Agency (USEPA), and Suffolk County Water Authority sources. Thirty-two explanatory variables describe depth, groundwater flow, land use, soil properties, and other features of the aquifer system. The models use XGBoost, an ensemble tree machine learning method. Four models are documented for manganese, predicting the probability of concentrations relative to four thresholds: 10 micrograms per liter (detection), 50 micrograms per liter (the USEPA Secondary Maximum Contaminant Level), 150 micrograms per liter, and 300 micrograms per liter (the USEPA lifetime health advisory). One model is documented for 1,4-dioxane, predicting the probability of concentrations relative to 0.07 micrograms per liter (detection). The models were used to predict concentrations in two layers of the upper glacial aquifer and three layers of the Magothy aquifer. Predictions were made at a 500-square-foot resolution across the entire island for manganese and across Suffolk County, which occupies the eastern two-thirds of Long Island, for 1,4-dioxane.
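As an illustration of how the five thresholds listed above relate to a measured concentration, the following minimal Python sketch converts one concentration into a binary indicator per threshold. The threshold values come from the description; the function name, the dictionary name, and the choice of "greater than or equal to" as the comparison are assumptions made for illustration and are not taken from the data release or its scripts.

```python
# Thresholds in micrograms per liter, as listed in the dataset description.
THRESHOLDS_UG_PER_L = {
    "manganese": [10.0, 50.0, 150.0, 300.0],
    "1,4-dioxane": [0.07],
}


def threshold_labels(constituent, concentration_ug_per_l):
    """Return {threshold: 0 or 1} for one measured concentration.

    ASSUMPTION: 1 means the concentration is at or above the threshold.
    The release itself only says the models predict probabilities
    "relative to" each threshold, so the direction and the inclusive
    comparison here are hypothetical.
    """
    return {
        threshold: int(concentration_ug_per_l >= threshold)
        for threshold in THRESHOLDS_UG_PER_L[constituent]
    }


# Hypothetical sample value, not taken from the well data table.
labels = threshold_labels("manganese", 75.0)
```

Under these assumptions, each manganese threshold yields its own binary target, which is consistent with the description of four separate manganese models and a single 1,4-dioxane model.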
The data are provided in data tables, raster files, and model files. One data table describes the 32 explanatory variables (LI_mn_14dx_exp_vars.txt). One data table describes the well data and includes the manganese and 1,4-dioxane concentrations, explanatory variables, and predictions for the wells (LI_mn_14dx_well_data.txt). There is a compressed group (zip file) of five files providing the explanatory variable data used to make predictions for the five aquifer layers (LI_mn_14dx_predinput_griddata.zip) and a zip file of 25 files providing model predictions for each model and aquifer layer (LI_mn_14dx_predoutput_rasters.zip). The data release also contains a tif-format raster file of the prediction grid (LI_mn_14dx_prediction_grid.tif). The models are documented in a zip file (LI_mn_14dx_models.zip) that contains the model object files (R data format) and scripts that can be used to run the models to produce the predictions provided in this data release. Filenames for prediction input and for model output are distinguished by names and numbers as follows: 1_upper_glacial, top layer of the upper glacial aquifer; 3_upper_glacial, bottom layer of the upper glacial aquifer; 5_Magothy, top layer of the Magothy aquifer; 14_Magothy, middle layer of the Magothy aquifer; and 23_Magothy, bottom layer of the Magothy aquifer.
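The layer naming scheme above can be captured in a small lookup, sketched here in Python. The five layer keys and their descriptions are taken from the description; the helper name `layer_description`, the `MODELS` list, and the sample filename are hypothetical, since the exact filenames inside the zip files are not given here. The sketch also checks that five models across five layers account for the 25 prediction files mentioned.

```python
import re
from itertools import product

# Layer keys and descriptions as given in the dataset description.
LAYERS = {
    "1_upper_glacial": "top layer of the upper glacial aquifer",
    "3_upper_glacial": "bottom layer of the upper glacial aquifer",
    "5_Magothy": "top layer of the Magothy aquifer",
    "14_Magothy": "middle layer of the Magothy aquifer",
    "23_Magothy": "bottom layer of the Magothy aquifer",
}

# (constituent, threshold in micrograms per liter) for the five documented models.
MODELS = [
    ("manganese", 10.0),
    ("manganese", 50.0),
    ("manganese", 150.0),
    ("manganese", 300.0),
    ("1,4-dioxane", 0.07),
]


def layer_description(filename):
    """Return the layer description for a filename, or None if no key matches.

    The lookbehind keeps a key from matching when it is preceded by
    another digit. The filename pattern itself is an assumption.
    """
    for key, description in LAYERS.items():
        if re.search(r"(?<!\d)" + re.escape(key), filename):
            return description
    return None


# Every model paired with every layer: 5 models x 5 layers.
combinations = list(product(MODELS, LAYERS))
```

For a hypothetical file named `example_14_Magothy.tif`, `layer_description` returns the middle Magothy layer, and `len(combinations)` is 25, matching the stated number of prediction rasters.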

Access & Use Information

Public: This dataset is intended for public access and use. License: No license information was provided. If this work was prepared by an officer or employee of the United States government as part of that person's official duties, it is considered a U.S. Government Work.

Dates

Metadata Created Date September 14, 2025
Metadata Updated Date December 11, 2025

Metadata Source

Harvested from DOI USGS DCAT-US

Additional Metadata

Resource Type Dataset
Publisher U.S. Geological Survey
Maintainer
Identifier http://datainventory.doi.gov/id/dataset/usgs-614d0486d34e0df5fb986a43
Data Last Modified 2023-03-22T00:00:00Z
Category geospatial
Public Access Level public
Bureau Code 010:12
Metadata Context https://project-open-data.cio.gov/v1.1/schema/catalog.jsonld
Metadata Catalog ID https://ddi.doi.gov/usgs-data.json
Schema Version https://project-open-data.cio.gov/v1.1/schema
Catalog Describedby https://project-open-data.cio.gov/v1.1/schema/catalog.json
Harvest Object Id c4e19efd-9cbd-4301-8db3-14c1a8a89e54
Harvest Source Id 2b80d118-ab3a-48ba-bd93-996bbacefac2
Harvest Source Title DOI USGS DCAT-US
Metadata Type geospatial
Source Datajson Identifier True
Source Hash ebfab9750a30f2f150df85abe5b5c5d7b6d3043fd8450493d3a13c4b3d1c3370
Source Schema Version 1.1
