Supplementary material for Lee et al. in review: Harmonization and Revision of a National Diatom Dataset for Use in the Development of Water Quality Indicators

Metadata Updated: December 13, 2025

ABSTRACT

Diatom data have been collected in large-scale biological assessments in the United States, such as the U.S. Environmental Protection Agency’s National Rivers and Streams Assessment (NRSA). However, the effectiveness of diatoms as indicators may suffer if inconsistent taxon identifications across different analysts obscure the relationships between assemblage composition and environmental variables. To reduce these inconsistencies, we harmonized the 2008-2009 NRSA data from nine analysts by updating names to current synonyms and by statistically identifying taxa with high analyst signal (taxa with more variation in relative abundance explained by the analyst factor, relative to environmental variables). We then screened a subset of samples with QA/QC data and combined taxa with mismatching identifications by the primary and secondary analysts. When these combined “slash groups” did not reduce analyst signal, we elevated taxa to the genus level or omitted taxa in difficult species complexes. We examined the variability explained by analyst in the original and revised datasets. Further, we examined how revising the datasets to reduce analyst signal can reduce inconsistency, thereby uncovering the variation in assemblage composition explained by total phosphorus (TP), an environmental variable of high priority for water managers. To produce a revised dataset with the greatest taxonomic consistency, we ultimately made 124 slash groups, omitted 7 taxa in the small naviculoid (e.g., Sellaphora atomoides) species complex, and elevated Nitzschia, Diploneis, and Tryblionella taxa to the genus level. Relative to the original dataset, the revised dataset had more overlap among samples grouped by analyst in ordination space, less variation explained by the analyst factor, and more than double the variation in assemblage composition explained by TP. Elevating all taxa to the genus level did not eliminate analyst signal completely, and analyst remained the most important predictor for the genera Sellaphora, Mayamaea, and Psammodictyon, indicating that these taxa present the greatest obstacle to consistent identification in this dataset. Although our process did not completely remove the analyst signal, this work clarifies the extent of the problem and provides a method to minimize analyst signal. Resolution of these taxonomic issues makes large datasets such as the NRSA more suitable for the development of diatom-based water quality indicators.

This dataset is associated with the following publication: Lee, S., I. Bishop, S. Spaulding, R. Mitchell, and L. Yuan. Taxonomic harmonization may reveal a stronger association between diatom assemblages and total phosphorus in large datasets. ECOLOGICAL INDICATORS. Elsevier Science Ltd, New York, NY, USA, 102: 166-174, (2019).

NOTE: This dataset has been removed from public access due to revocation. Please refer inquiries regarding this dataset to the listed contact person.
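The harmonization steps described in the abstract (omitting taxa, elevating taxa to genus, merging taxa into slash groups, and checking how much variation the analyst factor explains) can be illustrated with a minimal Python sketch. It is not the authors' code. The genus list and the omitted taxon come from the abstract; the slash-group names are hypothetical placeholders, recomputing relative abundance after omission is an assumption, and one-way eta-squared is a simple stand-in for whatever statistic the authors used to measure analyst signal.

```python
from collections import defaultdict

# Genera the abstract says were elevated to genus level.
GENUS_LEVEL = {"Nitzschia", "Diploneis", "Tryblionella"}
# The one omitted taxon named in the abstract; the other six are not listed there.
OMITTED = {"Sellaphora atomoides"}
# Hypothetical placeholder slash group (the real 124 groups are not in this record).
SLASH_GROUPS = {
    "Examplea alpha": "Examplea alpha/beta",
    "Examplea beta": "Examplea alpha/beta",
}


def harmonize_name(name):
    """Map an original taxon name to its harmonized name, or None if omitted."""
    if name in OMITTED:
        return None
    genus = name.split()[0]
    if genus in GENUS_LEVEL:
        return genus
    return SLASH_GROUPS.get(name, name)


def harmonize_sample(counts):
    """Merge counts under harmonized names and return relative abundances.

    Assumption: abundances are renormalized after omitted taxa are dropped.
    """
    merged = defaultdict(float)
    for name, n in counts.items():
        new_name = harmonize_name(name)
        if new_name is not None:
            merged[new_name] += n
    total = sum(merged.values())
    return {k: v / total for k, v in merged.items()} if total else {}


def analyst_eta_squared(abundances, analysts):
    """Share of variance in one taxon's abundance that lies between analysts.

    One-way eta-squared: between-analyst sum of squares / total sum of squares.
    """
    grand_mean = sum(abundances) / len(abundances)
    ss_total = sum((x - grand_mean) ** 2 for x in abundances)
    if ss_total == 0:
        return 0.0
    groups = defaultdict(list)
    for x, analyst in zip(abundances, analysts):
        groups[analyst].append(x)
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
    )
    return ss_between / ss_total
```

In this sketch, a taxon whose abundance differs only between analysts scores close to 1, and a taxon whose analyst groups share the same mean scores close to 0. Comparing the score before and after merging is one rough way to see whether a slash group reduced analyst signal.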

Access & Use Information

Public: This dataset is intended for public access and use. License: See this page for license information.

Downloads & Resources

No file downloads have been provided. The publisher may provide downloads in the future or they may be available from their other links.

Dates

Metadata Created Date November 12, 2020
Metadata Updated Date December 13, 2025

Metadata Source

Harvested from EPA ScienceHub

Additional Metadata

Resource Type Dataset
Publisher U.S. Environmental Protection Agency
Maintainer
Identifier https://doi.org/10.23719/1502631
Data Last Modified 2018-10-09
Public Access Level public
Bureau Code 020:00
Schema Version https://project-open-data.cio.gov/v1.1/schema
Harvest Object Id 9e4fa4ad-e9ac-4bfa-b914-fc50d2104459
Harvest Source Id 04b59eaf-ae53-4066-93db-80f2ed0df446
Harvest Source Title EPA ScienceHub
License https://pasteur.epa.gov/license/sciencehub-license.html
Program Code 020:096
Publisher Hierarchy U.S. Government > U.S. Environmental Protection Agency
Source Datajson Identifier True
Source Hash c8be317ab0deb21c5707a5a98be55d7e6bdfd44ee2288e15b89bfbf935195d7e
Source Schema Version 1.1
