Skip to main content
U.S. flag

An official website of the United States government

Official websites use .gov
A .gov website belongs to an official government organization in the United States.

Secure .gov websites use HTTPS
A lock ( ) or https:// means you’ve safely connected to the .gov website. Share sensitive information only on official, secure websites.

Skip to content

Try the next-generation Data Catalog at catalog-beta.data.gov and help shape it with your feedback.

Training dataset for NABat Machine Learning V1.0

Metadata Updated: January 21, 2026

Bats play crucial ecological roles and provide valuable ecosystem services, yet many populations face serious threats from various ecological disturbances. The North American Bat Monitoring Program (NABat) aims to assess status and trends of bat populations while developing innovative and community-driven conservation solutions using its unique data and technology infrastructure. To support scalability and transparency in the NABat acoustic data pipeline, we developed a fully-automated machine-learning algorithm. This dataset includes audio files of bat echolocation calls that were considered to develop V1.0 of the NABat machine-learning algorithm, however the test set (i.e., holdout dataset) has been excluded from this release. These recordings were collected by various bat monitoring partners across North America using ultrasonic acoustic recorders for stationary acoustic and mobile acoustic surveys. For more information on how these surveys may be conducted, see Chapters 4 and 5 of “A Plan for the North American Bat Monitoring Program” (https://doi.org/10.2737/SRS-GTR-208). These data were then post-processed by bat monitoring partners to remove noise files (or those that do not contain recognizable bat calls) and apply a species label to each file. There is undoubtedly variation in the steps that monitoring partners take to apply a species label, but the steps documented in “A Guide to Processing Bat Acoustic Data for the North American Bat Monitoring Program” (https://doi.org/10.3133/ofr20181068) include first processing with an automated classifier and then manually reviewing to confirm or downgrade the suggested species label. Once a manual ID label was applied, audio files of bat acoustic recordings were submitted to the NABat database in Waveform Audio File format. From these available files in the NABat database, we considered files from 35 classes (34 species and a noise class). Files for 4 species were excluded due to low sample size (Corynorhinus rafinesquii, N=3; Eumops floridanus, N =3; Lasiurus xanthinus, N = 4; Nyctinomops femorosaccus, N =11). From this pool, files were randomly selected until files for each species/grid cell combination were exhausted or the number of recordings reach 1250. The dataset was then randomly split into training, validation, and test sets (i.e., holdout dataset). This data release includes all files considered for training and validation, including files that had been excluded from model development and testing due to low sample size for a given species or because the threshold for species/grid cell combinations had been met. The test set (i.e., holdout dataset) is not included. Audio files are grouped by species, as indicated by the four-letter species code in the name of each folder. Definitions for each four-letter code, including Family, Genus, Species, and Common name, are also included as a dataset in this release.

Access & Use Information

Public: This dataset is intended for public access and use. License: No license information was provided. If this work was prepared by an officer or employee of the United States government as part of that person's official duties it is considered a U.S. Government Work.

Downloads & Resources

Dates

Metadata Created Date January 12, 2026
Metadata Updated Date January 21, 2026

Metadata Source

Harvested from DOI USGS DCAT-US

Additional Metadata

Resource Type Dataset
Metadata Created Date January 12, 2026
Metadata Updated Date January 21, 2026
Publisher U.S. Geological Survey
Maintainer
Identifier http://datainventory.doi.gov/id/dataset/USGS_627ed4b2d34e3bef0c9a2f30
Data Last Modified 2022-07-07T00:00:00Z
Category geospatial
Public Access Level public
Bureau Code 010:12
Metadata Context https://project-open-data.cio.gov/v1.1/schema/catalog.jsonld
Metadata Catalog ID https://ddi.doi.gov/usgs-data.json
Schema Version https://project-open-data.cio.gov/v1.1/schema
Catalog Describedby https://project-open-data.cio.gov/v1.1/schema/catalog.json
Datagov Dedupe Retained 20260121062710
Harvest Object Id 53a29a9d-c25e-49d0-a4a1-627b29767a46
Harvest Source Id 2b80d118-ab3a-48ba-bd93-996bbacefac2
Harvest Source Title DOI USGS DCAT-US
Metadata Type geospatial
Old Spatial {"type": "Polygon", "coordinates": -142.3828, 23.8858, -142.3828, 61.6064, -59.4141, 61.6064, -59.4141, 23.8858, -142.3828, 23.8858}
Source Datajson Identifier True
Source Hash cb628a84eb4ae26d4f4e0ec20b42cb40427e085cf15d528375870c38dfc0e5d6
Source Schema Version 1.1
Spatial {"type": "Polygon", "coordinates": -142.3828, 23.8858, -142.3828, 61.6064, -59.4141, 61.6064, -59.4141, 23.8858, -142.3828, 23.8858}

Didn't find what you're looking for? Suggest a dataset here.