Systematic Approaches for the Encoding of Chemical Groups: A Case Study

Metadata Updated: May 20, 2024

The Supporting Information contains the following material:

Data set with ARN groups downloaded from https://echa.europa.eu/assessment-regulatory-needs in Feb 2023 (S1_2023_02_03_assessment-of-regulatory-needs--arn─export.xlsx)

Curated data set of ARN groups with molecular structures and their quality scores, used for building the models (S2_ARN_groups.xlsx)

Descriptive statistics for the 86 substance groups. For each group, we provide the number of substances in the ARN group, the number of substances matched in DSSTox, the DSSTox substance type, and the number of substances with structural information and its quality (S3_ARN_stats.xlsx)

Document with additional figures and explanations referred to in the manuscript (S4_SystematicGroupingSI.docx)

Predicted groups, probabilities, and domain assessment for all nonconfidential substances registered under REACH (S5_rf_application_1_results_redacted.xlsx)

Cross-validation scoring results obtained in every iteration of the outer and inner grid search for the random forest (RF) model (S6_outer_inner_grid_details_rf.xlsx)

Cross-validation scoring results obtained in every iteration of the outer and inner grid search for the nearest neighbor (kNN) model (S7_outer_inner_grid_details_kn.xlsx)

Cross-validation scoring results obtained for the gradient boosting (GB) model. This data set is provided only for completeness, because the GB model was evaluated but not used further. Because of the computational cost, we performed only the inner grid search for GB, using the optimal fingerprint parameters identified by the outer grid search with kNN and RF (radius 2, length 2,560) (S8_outer_inner_grid_details_gb.xlsx) (ZIP)
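The outer and inner grid search mentioned for files S6 to S8 can be pictured with a minimal sketch. It assumes, as the GB note suggests, that the outer grid iterates over fingerprint parameters (radius, length) and the inner grid over model hyperparameters. Everything else is hypothetical and not taken from the data set: `toy_fingerprint` is a hashed-substring stand-in for a real circular fingerprint, the classifier is a bare-bones kNN with Tanimoto similarity, scoring is leave-one-out accuracy, and the six SMILES strings and their labels are toy values.

```python
import hashlib
from itertools import product


def toy_fingerprint(smiles, radius, length):
    """Hypothetical stand-in for a circular fingerprint: hash every substring
    of 1 to 2*radius+1 characters into a bit position below `length`."""
    bits = set()
    for size in range(1, 2 * radius + 2):
        for start in range(len(smiles) - size + 1):
            digest = hashlib.sha256(smiles[start:start + size].encode()).hexdigest()
            bits.add(int(digest, 16) % length)
    return bits


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity of two sets of bit positions."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def knn_predict(query, train, k):
    """Majority vote among the k training fingerprints most similar to query."""
    ranked = sorted(train, key=lambda item: tanimoto(query, item[0]), reverse=True)
    votes = {}
    for _, label in ranked[:k]:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)


def loo_accuracy(fps, labels, k):
    """Leave-one-out accuracy (an assumed scoring scheme, chosen for brevity)."""
    hits = 0
    for i, (fp, label) in enumerate(zip(fps, labels)):
        train = [(f, l) for j, (f, l) in enumerate(zip(fps, labels)) if j != i]
        hits += knn_predict(fp, train, k) == label
    return hits / len(fps)


def nested_grid(smiles, labels, fp_grid, model_grid):
    """Outer loop: fingerprint parameters. Inner loop: model hyperparameters.
    Returns one row per combination, similar in spirit to a results sheet."""
    rows = []
    for radius, length in product(fp_grid["radius"], fp_grid["length"]):
        fps = [toy_fingerprint(s, radius, length) for s in smiles]  # outer step
        for k in model_grid["k"]:                                   # inner step
            rows.append({"radius": radius, "length": length, "k": k,
                         "accuracy": loo_accuracy(fps, labels, k)})
    return rows


# Toy inputs for illustration only; not drawn from the ARN data.
SMILES = ["CCO", "CCCO", "CCCCO", "CC(=O)O", "CCC(=O)O", "CCCC(=O)O"]
LABELS = ["alcohol", "alcohol", "alcohol", "acid", "acid", "acid"]

rows = nested_grid(SMILES, LABELS,
                   fp_grid={"radius": [1, 2], "length": [256, 2560]},
                   model_grid={"k": [1, 3]})
best = max(rows, key=lambda row: row["accuracy"])
print(best)
```

Restricting `fp_grid` to a single radius and length, as described for the GB model, reduces the search to the inner loop alone.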

Access & Use Information

Public: This dataset is intended for public access and use. License: See this page for license information.

Dates

Metadata Created Date May 20, 2024
Metadata Updated Date May 20, 2024

Metadata Source

Harvested from EPA ScienceHub

Additional Metadata

Resource Type Dataset
Publisher U.S. EPA Office of Research and Development (ORD)
Maintainer
Identifier https://doi.org/10.23719/1530990
Data Last Modified 2024-03-01
Public Access Level public
Bureau Code 020:00
Schema Version https://project-open-data.cio.gov/v1.1/schema
Harvest Object Id 16ee5b78-ab69-4708-9fe3-fa0933fcf8ea
Harvest Source Id 04b59eaf-ae53-4066-93db-80f2ed0df446
Harvest Source Title EPA ScienceHub
License https://pasteur.epa.gov/license/sciencehub-license.html
Program Code 020:000
Publisher Hierarchy U.S. Government > U.S. Environmental Protection Agency > U.S. EPA Office of Research and Development (ORD)
Source Datajson Identifier True
Source Hash b656795605b295941edc9743973c1053c08c0673d8c9d4582d09d693ade235be
Source Schema Version 1.1
