This is a Non-Federal dataset covered by different Terms of Use than Data.gov.

California Black Rail Species Habitat Model (Categorical) - RADMAP [ds3251]

Metadata Updated: February 23, 2026

The Range and Distribution Mapping and Analysis Project (RADMAP) in the California Department of Fish and Wildlife’s (CDFW) Biogeographic Data Branch (BDB) develops and maintains spatial models for use in conservation decision making, including species range maps and species habitat models. RADMAP is building a library of vetted species range maps and habitat models within California for use by CDFW staff and partners.

The categorical species habitat model (SHM) is derived from a continuous SHM by splitting the continuous model output into predicted habitat and non-habitat. Habitat (values higher than the minimum training presence threshold) is then divided into low, medium, and high categories of relative probability of supporting the species, based on expert-chosen thresholds. The focal taxon may or may not actually occur in areas predicted with a high relative probability of habitat use; habitat may be suitable but unoccupied, particularly for taxa with small and/or declining populations or limited mobility. Models may not accurately reflect all habitats used per taxon due to a dearth of presence data, which limits the scope of environmental space represented by the SHM. Users should refer to the validation metrics and consider the level of uncertainty associated with the model when interpreting model outputs. Areas with high relative predicted values indicate where the habitat is most likely to support the species and may be prioritized for locating survey or monitoring sites for scientific studies aiming to conserve and protect focal taxa.
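
As an illustration of the thresholding described above, the following minimal sketch maps a continuous SHM cell value to a category. The three cut points are hypothetical placeholders, not RADMAP's values (the actual thresholds are chosen per taxon), and the choice to place a value equal to a break in the lower class is an assumption.

```python
from bisect import bisect_left

# Hypothetical cut points for illustration only; RADMAP's thresholds differ per taxon.
MIN_TRAINING_PRESENCE = 0.10  # values at or below this are non-habitat
LOW_MEDIUM_BREAK = 0.40       # assumed expert-chosen break
MEDIUM_HIGH_BREAK = 0.70      # assumed expert-chosen break

BREAKS = [MIN_TRAINING_PRESENCE, LOW_MEDIUM_BREAK, MEDIUM_HIGH_BREAK]
LABELS = ["non-habitat", "low", "medium", "high"]


def categorize(value: float) -> str:
    """Map one continuous SHM value to a habitat category.

    bisect_left counts the breaks strictly below `value`, so a cell must be
    higher than a break to move into the next category.
    """
    return LABELS[bisect_left(BREAKS, value)]


if __name__ == "__main__":
    for cell in (0.05, 0.10, 0.25, 0.55, 0.90):
        print(cell, categorize(cell))
```

Applied to a raster, the same lookup would be run per cell; a vectorized equivalent is likely preferable for 30 m statewide grids.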

Occurrence records were obtained from various sources, as indicated below. To reduce sampling bias and avoid model overfitting, which can reduce model applicability to unsampled areas, we excluded spatially autocorrelated presence records. Owing to the breadth of our study region, we derived a topographical heterogeneity raster from a digital elevation model and filtered occurrence records based on this raster at three distinct distances. Occurrences in areas of high topographical heterogeneity were filtered at shorter Euclidean distances. For potentially ecologically relevant environmental covariates, we computed a Pearson correlation coefficient matrix to assess the strength of association among variables; among variables that were highly correlated (coefficient greater than or equal to 0.7), the most ecologically relevant were kept and the others were removed from further analyses. All covariates were continuous and formatted at a resolution of 30 m.
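
The covariate screening step can be sketched as follows. This is a simplified illustration, not RADMAP's code: the covariate names and values are invented, the priority list stands in for the expert judgment of ecological relevance, and comparing the absolute value of the coefficient against 0.7 is an assumption.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def drop_correlated(covariates, priority, threshold=0.7):
    """Walk covariates in priority order (most ecologically relevant first)
    and keep each one only if |r| < threshold against every covariate kept so far."""
    kept = []
    for name in priority:
        if all(abs(pearson(covariates[name], covariates[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical covariate samples, for illustration only.
samples = {
    "elevation": [10, 20, 30, 40, 50],
    "temperature": [30, 25, 21, 14, 10],  # strongly (negatively) tied to elevation
    "salinity": [5, 1, 4, 2, 3],
}
kept = drop_correlated(samples, priority=["elevation", "temperature", "salinity"])
print(kept)
```

Here temperature is dropped because it is nearly collinear with the higher-priority elevation covariate, while salinity is retained.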

Potential habitat use was estimated using a maximum entropy approach (Maxent), implemented in R, that relied on presence data and comparisons between environmental covariate values at presence localities and those at randomly selected background sites (Phillips et al. 2006). To demarcate the specific geographical area used for model calibration, background locations were selected via local adaptive convex-hull polygons based on known species’ dispersal and movement limitations. This allowed RADMAP to exclude suitable habitat that remains uncolonized, potentially due to dispersal barriers or inhibitory biotic interactions; it also precluded overfitting models to environmental conditions immediately adjacent to occurrence records, thus improving predictive performance. For each model, we randomly sampled more than 10,000 background locations. To improve model fit, we tuned five feature-class combinations and five regularization-multiplier settings (integers from 1 to 5) rather than using default Maxent settings, for a total of 25 candidate models (5 × 5). Data were divided into training and test groups using geographically structured k-fold cross validation (k = 4) to reduce overfitting to environmental conditions among spatial partitions. Additionally, 25 full models were run using all available presence data.
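
The tuning grid and spatial partitioning can be sketched as below. The feature-class labels are hypothetical (the source does not list the five combinations used), and the median-split quadrant scheme is only one simple way to build four geographically structured folds; the partitioning method RADMAP actually used is not stated.

```python
from itertools import product
from statistics import median

# Hypothetical feature-class combinations; the source does not name the five used.
FEATURE_CLASSES = ["L", "LQ", "H", "LQH", "LQHP"]
REG_MULTIPLIERS = [1, 2, 3, 4, 5]  # integer increments from 1 to 5


def tuning_grid():
    """Every feature-class / regularization-multiplier pair (5 x 5 = 25)."""
    return list(product(FEATURE_CLASSES, REG_MULTIPLIERS))


def spatial_folds(points):
    """Assign each (x, y) point to one of four quadrant folds (0-3),
    splitting at the median x and median y of the points."""
    x_mid = median(x for x, _ in points)
    y_mid = median(y for _, y in points)
    return [int(x >= x_mid) + 2 * int(y >= y_mid) for x, y in points]


if __name__ == "__main__":
    grid = tuning_grid()
    print(len(grid), "candidate settings")
    # Each fold is held out once as test data while the other three train the model.
    print(spatial_folds([(0, 0), (10, 0), (0, 10), (10, 10)]))
```

Holding out whole geographic blocks, rather than random records, is what makes the test folds a check on transferability to unsampled areas.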

Models were evaluated with multiple statistics. First, we calculated the test omission error rate (OER) and true skill statistic (TSS), which are threshold-dependent metrics. Second, we generated receiver-operating-characteristic curves and assessed model performance using the area under the curve (AUC) for test data, a threshold-independent metric. AUC was averaged across the k folds, and the difference between training and test AUC (AUCdiff) was used to quantify overfitting (i.e., lower values indicate less overfitting). The model with the lowest OER and the highest AUC and TSS values for both test and training data was considered the top model. We calculated the percent contribution of each predictor variable to the top model to identify the role of each in influencing the distribution of the species. Models were extrapolated to the taxon’s range to predict across all potentially occupied areas within California.
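
For readers unfamiliar with these metrics, the sketch below shows one common way to compute them from predicted scores. It is an illustration under stated assumptions, not RADMAP's implementation: the scores are invented, background points stand in for absences when computing specificity, and AUCdiff is taken as training AUC minus test AUC.

```python
def omission_rate(presence_scores, threshold):
    """Fraction of test presences scored below the threshold (OER)."""
    return sum(s < threshold for s in presence_scores) / len(presence_scores)


def true_skill_statistic(presence_scores, background_scores, threshold):
    """TSS = sensitivity + specificity - 1, treating background as pseudo-absence."""
    sensitivity = 1 - omission_rate(presence_scores, threshold)
    specificity = sum(s < threshold for s in background_scores) / len(background_scores)
    return sensitivity + specificity - 1


def auc(presence_scores, background_scores):
    """Probability that a random presence outscores a random background point
    (ties count as half), which equals the area under the ROC curve."""
    wins = sum(
        (p > b) + 0.5 * (p == b)
        for p in presence_scores
        for b in background_scores
    )
    return wins / (len(presence_scores) * len(background_scores))


def auc_diff(train_auc, test_auc):
    """Training AUC minus test AUC; larger values suggest more overfitting."""
    return train_auc - test_auc


# Invented scores for illustration only.
presence = [0.9, 0.8, 0.6, 0.3]
background = [0.7, 0.4, 0.2, 0.1]
print(omission_rate(presence, 0.5))
print(true_skill_statistic(presence, background, 0.5))
print(auc(presence, background))
```

In a k-fold setting, these functions would be applied to each held-out fold and the results averaged across the k folds.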

Results of the Maxent model are included in the pdf attachment, including top model review score, validation metrics, model output details, covariate response curves, percent contribution of covariates to the top model, and a full list of covariates included in the model.

The top model results are linked here: https://nrm.dfg.ca.gov/FileHandler.ashx?DocumentID=238501.

Each focal taxon’s location data was extracted (when applicable) and collated from the following list of data sources. BIOS datasets are bracketed with their “ds” numbers and can be located on CDFW’s BIOS viewer: https://wildlife.ca.gov/Data/BIOS.

  • California Natural Diversity Database,

  • Terrestrial Species Monitoring [ds2826],

  • North American Bat Monitoring Data Portal,

  • VertNet,

  • Breeding Bird Survey,

  • Wildlife Insights,

  • eBird,

  • iNaturalist,

  • other available CDFW or partner data.

Please refer to the Range Map and Species Habitat Model Use Case Guidance document on how best to interpret RADMAP outputs, including range maps, continuous SHMs, and categorical SHMs. Specifically, users should follow these guidelines to determine which products to utilize for conservation, management, and policy decision making use cases: https://nrm.dfg.ca.gov/FileHandler.ashx?DocumentID=222269.

Access & Use Information

Public: This dataset is intended for public access and use.
Non-Federal: This dataset is covered by different Terms of Use than Data.gov.
License: See this page for license information.

Dates

Metadata Created Date: February 23, 2026
Metadata Updated Date: February 23, 2026

Metadata Source

Harvested from State of California

Additional Metadata

Resource Type: Dataset
Metadata Created Date: February 23, 2026
Metadata Updated Date: February 23, 2026
Publisher: California Department of Fish and Wildlife
Maintainer:
Identifier: b1e5d901-df7b-4503-85cd-9639aa08919d
Data First Published: 2026-01-23T18:25:19.000Z
Data Last Modified: 2026-01-23T18:29:18.000Z
Category: Natural Resources
Public Access Level: public
Metadata Context: https://project-open-data.cio.gov/v1.1/schema/catalog.jsonld
Schema Version: https://project-open-data.cio.gov/v1.1/schema
Catalog Describedby: https://project-open-data.cio.gov/v1.1/schema/catalog.json
Harvest Object Id: d43e724c-ea93-4c4f-adfa-065419569e20
Harvest Source Id: 3ba8a0c1-5dc2-4897-940f-81922d3cf8bc
Harvest Source Title: State of California
License: http://www.opendefinition.org/licenses/cc-by
Source Datajson Identifier: True
Source Hash: 0ac9fe1867a8a7fdbfce7085a66fb4f73d34e3d2135d27996e41bcef7d9a3e45
Source Schema Version: 1.1