Metadata Updated: May 2, 2021

A predictive model was developed to address a major challenge in chemical safety assessment: the need for alternative approaches for characterizing systemic effect levels. Systemic effect levels were curated from ToxRefDB, HESS-DB, and COSMOS-DB across numerous study types, totaling 4382 in vivo studies for 1201 chemicals. Observed systemic effects in mammalian models are a complex function of chemical dynamics, kinetics, and inter- and intra-individual variability. To address this complexity, systemic effect levels were modeled at the study level by leveraging study covariates (e.g., study type, strain, administration route) in addition to multiple descriptor sets, including chemical (ToxPrint, PaDEL, and Physchem), biological (ToxCast), and kinetic descriptors. Using Random Forest modeling with cross-validation and external validation procedures, study-level covariates alone accounted for approximately 20% of the variance, reducing the root mean squared error (RMSE) from 0.96 log10 mg/kg/day to 0.85 log10 mg/kg/day and providing a baseline performance metric (lower expectation of model performance). A consensus model developed using a combination of study-level covariates and chemical, biological, and kinetic descriptors explained a total of 38% of the variance, with an RMSE of 0.76 log10 mg/kg/day. A benchmark model (upper expectation of model performance), developed by incorporating study-level covariates and the mean effect level per chemical, achieved an RMSE of 0.5 log10 mg/kg/day. To achieve a representative chemical-level prediction, the minimum study-level predicted and observed effect levels per chemical were compared, reducing the RMSE from 1.1 to 0.8 log10 mg/kg/day.
Although biological descriptors did not improve model performance, the final model was enriched for biological descriptors that indicated xenobiotic metabolism gene expression, oxidative stress, and cytotoxicity, demonstrating the importance of accounting for kinetics and non-specific bioactivity in predicting systemic effect levels. Herein, we have generated an externally predictive model of systemic effect levels for use as a safety assessment tool and have generated forward predictions for thousands of chemicals.
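
The RMSE values and variance percentages quoted above can be related with a short calculation. The following is a minimal sketch, not the authors' code: it assumes that the 0.96 log10 mg/kg/day figure is the RMSE of a null model that always predicts the overall mean, so that the fraction of variance explained is 1 - (RMSE_model / RMSE_null)^2. The function names `variance_explained` and `fold_error` are hypothetical. Under that assumption, the rounded RMSEs give roughly 0.22 for study covariates alone and 0.37 for the consensus model, close to the reported 20% and 38%.

```python
def variance_explained(rmse_model: float, rmse_null: float) -> float:
    """Fraction of variance explained relative to a null model.

    Assumption (not stated in the source): rmse_null is the RMSE of a
    model that always predicts the overall mean effect level.
    """
    if rmse_null <= 0:
        raise ValueError("rmse_null must be positive")
    return 1.0 - (rmse_model / rmse_null) ** 2


def fold_error(rmse_log10: float) -> float:
    """Convert an RMSE in log10 units into a multiplicative (fold) error."""
    return 10.0 ** rmse_log10


# RMSE values (log10 mg/kg/day) as quoted in the abstract.
RMSE_NULL = 0.96
REPORTED_RMSE = {
    "study covariates only": 0.85,
    "consensus model": 0.76,
    "benchmark model": 0.5,
}

if __name__ == "__main__":
    for name, rmse in REPORTED_RMSE.items():
        r2 = variance_explained(rmse, RMSE_NULL)
        print(f"{name}: variance explained = {r2:.2f}, "
              f"typical error = {fold_error(rmse):.1f}-fold")
```

Because effect levels are modeled on a log10 scale, an RMSE of 0.76 corresponds to a typical error of a little under 6-fold in mg/kg/day, which may be an easier way to read the model's accuracy than the log-scale value.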

This dataset is associated with the following publication: Truong, L., G. Ouedraogo, L. Pham, J. Clouzeau, S. Loisel-Joubert, D. Blanchet, H. Noçairi, W. Setzer, R. Judson, C. Grulke, K. Mansouri, and M. Martin. Predicting In Vivo Effect Levels for Repeat Dose Systemic Toxicity using Chemical, Biological, Kinetic and Study Covariates. Archives of Toxicology. Springer, New York, NY, USA, 92(2): 587-600, (2018).

Access & Use Information

Public: This dataset is intended for public access and use. License: See this page for license information.


Metadata Source

Harvested from EPA ScienceHub

Additional Metadata

Resource Type: Dataset
Metadata Created Date: November 12, 2020
Metadata Updated Date: May 2, 2021
Publisher: U.S. EPA Office of Research and Development (ORD)
Data Last Modified: 2017-03-08
Public Access Level: public
Bureau Code: 020:00
Schema Version:
Harvest Object Id: 28df544b-23dd-4eee-a87e-870153924cef
Harvest Source Id: 04b59eaf-ae53-4066-93db-80f2ed0df446
Harvest Source Title: EPA ScienceHub
Program Code: 020:095
Publisher Hierarchy: U.S. Government > U.S. Environmental Protection Agency > U.S. EPA Office of Research and Development (ORD)
Related Documents:
Source Datajson Identifier: True
Source Hash: f12409e95652aa703bb31b842b7bc0001d17c785
Source Schema Version: 1.1
