Model outputs and model code for machine learning models forecasting streamflow drought across the Conterminous United States

Published by U.S. Geological Survey | Department of the Interior | Catalog Last Checked: May 05, 2026 at 09:21 PM | Dataset Last Updated: September 19, 2025 at 12:00 AM

We applied machine learning (ML) models to forecast streamflow drought from 1 to 13 weeks ahead at more than 3,000 streamgage locations across the Conterminous United States. We used two ML methods (a long short-term memory [LSTM] neural network and a light gradient boosting machine [LightGBM]) and two benchmark approaches (persistence and autoregressive integrated moving average [ARIMA]) to predict weekly streamflow percentiles, with an independent model for each forecast horizon. Each ML model was trained in two variants: one using all percentiles (LSTM-all, LightGBM-all) and one using only percentiles below 30% (LSTM<30, LightGBM<30). We evaluated model performance regionally and nationally for drought occurrence (classification performance for a future date) and for drought onset/termination (performance in identifying when droughts start and end).

This data release contains two zipped archives: model_outputs.zip (model outputs) and model_code.zip (model code).

The model_outputs.zip archive contains feature importance data (importance_data.feather); model performance data (performance_data.feather); observed and predicted streamflow percentile time series data (timeseries_data.feather); additional explanatory data (explanatory_data.feather) that were used in the related primary publication to interpret the model outputs in this data release; and static watershed attribute data (conus_static_inputs_gages.csv) that were used as model inputs.

The model_code.zip archive contains three folders, one for each modeling approach (LSTM, LightGBM, benchmark). The benchmark folder contains two Jupyter notebooks, one for running the persistence models and one for running the ARIMA models. The LSTM and LightGBM folders contain Jupyter notebooks, with R and Python scripts, for running the LSTM and LightGBM models.
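As a rough illustration of the persistence benchmark and the two evaluation targets described above (drought occurrence and drought onset/termination), the following is a minimal sketch in plain Python. It is not the code shipped in model_code.zip. The function names, the sample percentile series, and the drought threshold of 20 are all assumptions made for illustration; the description does not state the threshold used.

```python
from typing import List, Optional, Tuple

# Assumed percentile threshold below which a week counts as "drought".
# Illustrative only; the dataset description does not state the value used.
DROUGHT_THRESHOLD = 20.0


def persistence_forecast(percentiles: List[float], horizon: int) -> List[Optional[float]]:
    """Persistence: the forecast for week t is the observation from week t - horizon."""
    return [percentiles[t - horizon] if t >= horizon else None
            for t in range(len(percentiles))]


def is_drought(value: float, threshold: float = DROUGHT_THRESHOLD) -> bool:
    return value < threshold


def occurrence_counts(observed: List[float],
                      predicted: List[Optional[float]],
                      threshold: float = DROUGHT_THRESHOLD) -> Tuple[int, int, int, int]:
    """Return (hits, misses, false alarms, correct negatives) for drought occurrence."""
    hits = misses = false_alarms = correct_negatives = 0
    for obs, pred in zip(observed, predicted):
        if pred is None:  # no forecast available yet for this week
            continue
        obs_d, pred_d = is_drought(obs, threshold), is_drought(pred, threshold)
        if obs_d and pred_d:
            hits += 1
        elif obs_d:
            misses += 1
        elif pred_d:
            false_alarms += 1
        else:
            correct_negatives += 1
    return hits, misses, false_alarms, correct_negatives


def drought_events(percentiles: List[float],
                   threshold: float = DROUGHT_THRESHOLD) -> List[Tuple[int, Optional[int]]]:
    """Return (onset week, termination week) index pairs.

    Termination is the first non-drought week after onset, or None if the
    drought is still ongoing at the end of the series.
    """
    events: List[Tuple[int, Optional[int]]] = []
    start: Optional[int] = None
    for t, value in enumerate(percentiles):
        if is_drought(value, threshold) and start is None:
            start = t
        elif not is_drought(value, threshold) and start is not None:
            events.append((start, t))
            start = None
    if start is not None:
        events.append((start, None))
    return events


# Made-up weekly streamflow percentiles for a single hypothetical streamgage.
observed = [55.0, 32.0, 18.0, 12.0, 9.0, 25.0, 40.0, 15.0]
forecast_2wk = persistence_forecast(observed, horizon=2)

print(occurrence_counts(observed, forecast_2wk))  # (1, 3, 2, 0)
print(drought_events(observed))                   # [(2, 5), (7, None)]
```

In this toy series the 2-week persistence forecast misses the first two drought weeks and raises false alarms after the drought has ended, which is the kind of onset/termination lag a persistence benchmark is expected to show.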

Complete Metadata

@type	dcat:Dataset
accessLevel	public
bureauCode	[ "010:12" ]
contactPoint	{ "fn": "Ryan R McShane", "@type": "vcard:Contact", "hasEmail": "mailto:rmcshane@usgs.gov" }
distribution	[ { "@type": "dcat:Distribution", "title": "Digital Data", "format": "XML", "accessURL": "https://doi.org/10.5066/P132NSWY", "mediaType": "application/http", "description": "Landing page for access to the data" }, { "@type": "dcat:Distribution", "title": "Original Metadata", "format": "XML", "mediaType": "text/xml", "description": "The metadata original format", "downloadURL": "https://data.usgs.gov/datacatalog/metadata/USGS.687685d8d4be020e2014bdb5.xml" } ]
identifier	http://datainventory.doi.gov/id/dataset/USGS_687685d8d4be020e2014bdb5
keyword	[ "USGS:687685d8d4be020e2014bdb5", "autoregressive integrated moving average", "deep learning", "drought", "gradient boosting machine", "hydrology", "inlandWaters", "long short-term memory", "machine learning", "modeling", "neural network", "streamflow", "time series", "water resources" ]
modified	2025-09-19T00:00:00Z
publisher	{ "name": "U.S. Geological Survey", "@type": "org:Organization" }
spatial	-124.7564516, 24.5232175, -66.9490904, 49.384487
theme	[ "geospatial" ]
title	Model outputs and model code for machine learning models forecasting streamflow drought across the Conterminous United States
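
The `spatial` value in the metadata above appears to be a bounding box in west, south, east, north order (minimum longitude, minimum latitude, maximum longitude, maximum latitude). Assuming that ordering, the sketch below parses the string and checks whether a point falls inside the box. The `BoundingBox` and `parse_spatial` names are hypothetical helpers, and the two test points are rough, illustrative coordinates.

```python
from typing import NamedTuple


class BoundingBox(NamedTuple):
    west: float
    south: float
    east: float
    north: float

    def contains(self, lon: float, lat: float) -> bool:
        """True if the point lies inside the box (edges included)."""
        return self.west <= lon <= self.east and self.south <= lat <= self.north


def parse_spatial(value: str) -> BoundingBox:
    """Parse a 'west, south, east, north' string (assumed ordering)."""
    west, south, east, north = (float(part) for part in value.split(","))
    return BoundingBox(west, south, east, north)


bbox = parse_spatial("-124.7564516, 24.5232175, -66.9490904, 49.384487")

print(bbox.contains(-105.0, 39.7))  # roughly Denver, Colorado: True
print(bbox.contains(-149.9, 61.2))  # roughly Anchorage, Alaska: False
```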
