Skip to main content
U.S. flag

An official website of the United States government

Official websites use .gov
A .gov website belongs to an official government organization in the United States.

Secure .gov websites use HTTPS
A lock ( ) or https:// means you’ve safely connected to the .gov website. Share sensitive information only on official, secure websites.

Skip to content

Evaluation of Existing QSAR Models and Structural Alerts and Development of New Ensemble Models for Genotoxicity Using a Newly Compiled Experimental Dataset

Metadata Updated: July 22, 2021

In this study, a major effort was undertaken to compile a large genotoxicity dataset (54,805 records for 9299 substances) from several public sources (e.g., TOXNET, COSMOS, eChemPortal). The names and outcomes of the different assays were harmonized, and assays were annotated by type: gene mutation in Salmonella bacteria (Ames assay) and chromosome mutation (clastogenicity) in vitro or in vivo (chromosome aberration, micronucleus, and mouse lymphoma Tk+/- assays). This dataset was then evaluated to assess genotoxic potential using a categorization scheme, whereby a substance was considered genotoxic if it was positive in at least one Ames or clastogen study. The categorization dataset comprised 8442 chemicals, of which 2728 chemicals were genotoxic, 5585 were not and 129 were inconclusive. QSAR models (TEST and VEGA) and the OECD Toolbox structural alerts/profilers (e.g., OASIS DNA alerts for Ames and chromosomal aberrations) were used to make in silico predictions of genotoxicity potential. The performance of the individual QSAR tools and structural alerts resulted in balanced accuracies of 57-73%. A Naïve Bayes consensus model was developed using combinations of QSAR models and structural alert predictions. The ‘best’ consensus model selected had a balanced accuracy of 81.2%, a sensitivity of 87.24% and a specificity of 75.20%. This in silico scheme offers promise as a first step in ranking thousands of substances as part of a prioritization approach for genotoxicity.

This dataset is associated with the following publication: Pradeep, P., R. Judson, D. DeMarini, N. Keshava, T. Martin, J. Dean, C. Gibbons, A. Simha, S. Warren, M. Gwinn, and G. Patlewicz. An Evaluation of Existing QSAR Models and Structural Alerts and Development of New Ensemble Models for Genotoxicity Using a Newly Compiled Experimental Dataset. Computational Toxicology. Elsevier B.V., Amsterdam, NETHERLANDS, 18: 100167, (2021).

Access & Use Information

Public: This dataset is intended for public access and use. License: See this page for license information.

Downloads & Resources

References

https://doi.org/10.1016/j.comtox.2021.100167

Dates

Metadata Created Date July 1, 2021
Metadata Updated Date July 22, 2021

Metadata Source

Harvested from EPA ScienceHub

Additional Metadata

Resource Type Dataset
Metadata Created Date July 1, 2021
Metadata Updated Date July 22, 2021
Publisher U.S. EPA Office of Research and Development (ORD)
Maintainer
Identifier https://doi.org/10.23719/1520958
Data Last Modified 2021-03-25
Public Access Level public
Bureau Code 020:00
Schema Version https://project-open-data.cio.gov/v1.1/schema
Harvest Object Id 6919d136-c71a-4dfd-93c1-cab5a304aa83
Harvest Source Id 04b59eaf-ae53-4066-93db-80f2ed0df446
Harvest Source Title EPA ScienceHub
License https://pasteur.epa.gov/license/sciencehub-license.html
Program Code 020:000
Publisher Hierarchy U.S. Government > U.S. Environmental Protection Agency > U.S. EPA Office of Research and Development (ORD)
Related Documents https://doi.org/10.1016/j.comtox.2021.100167
Source Datajson Identifier True
Source Hash 93dd578851fec4a15db4f077c3e14165bdc40091
Source Schema Version 1.1

Didn't find what you're looking for? Suggest a dataset here.