This supplemental material describes two sets of methods. First, it briefly describes the process used to create the EPA’s LitDB database; second, it describes how a subset of records was extracted from LitDB for inclusion in the reference chemical database RefChemDB. LitDB is a database of data elements extracted from the XML download of all MEDLINE PubMed records. Perl scripts extract the identifying information from each citation record, such as title, abstract, authors, and PubMed ID. The MeSH (Medical Subject Headings) terms are also extracted, together with their subheadings (also known as qualifiers).
The Perl scripts write their output to text files, which are then loaded into a MySQL database.
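As a rough illustration of this extraction step, the sketch below parses a PubMed-style XML record and writes tab-delimited rows for citations and for MeSH heading/qualifier pairs. It is written in Python rather than the Perl used for LitDB, and it is not the LitDB code: the function names (text_of, extract_records, to_tsv), the output column layout, and the sample record are all assumptions made for illustration. Only the element names (PubmedArticle, MedlineCitation, MeshHeading, DescriptorName, QualifierName, and so on) follow the PubMed XML format.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Made-up minimal record that uses PubMed XML element names.
# It is NOT a real citation; PMID 0 and all text are placeholders.
SAMPLE_XML = """<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>0</PMID>
      <Article>
        <ArticleTitle>Placeholder title</ArticleTitle>
        <Abstract><AbstractText>Placeholder abstract.</AbstractText></Abstract>
        <AuthorList>
          <Author><LastName>Doe</LastName><Initials>J</Initials></Author>
        </AuthorList>
      </Article>
      <MeshHeadingList>
        <MeshHeading>
          <DescriptorName>Liver</DescriptorName>
          <QualifierName>drug effects</QualifierName>
          <QualifierName>metabolism</QualifierName>
        </MeshHeading>
        <MeshHeading>
          <DescriptorName>Animals</DescriptorName>
        </MeshHeading>
      </MeshHeadingList>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>"""


def text_of(node):
    """Return all text inside a node (including inline markup), or ''."""
    return "".join(node.itertext()).strip() if node is not None else ""


def extract_records(xml_text):
    """Return (citation rows, MeSH rows) for every PubmedArticle element."""
    root = ET.fromstring(xml_text)
    citations, mesh_rows = [], []
    for article in root.iter("PubmedArticle"):
        cit = article.find("MedlineCitation")
        pmid = text_of(cit.find("PMID"))
        title = text_of(cit.find("Article/ArticleTitle"))
        abstract = " ".join(
            text_of(n) for n in cit.iterfind("Article/Abstract/AbstractText")
        )
        authors = "; ".join(
            (text_of(a.find("LastName")) + " " + text_of(a.find("Initials"))).strip()
            for a in cit.iterfind("Article/AuthorList/Author")
        )
        citations.append((pmid, title, abstract, authors))
        for heading in cit.iterfind("MeshHeadingList/MeshHeading"):
            descriptor = text_of(heading.find("DescriptorName"))
            qualifiers = [text_of(q) for q in heading.findall("QualifierName")]
            # One row per heading/qualifier pair; an empty qualifier column
            # marks a heading that carries no subheading.
            for qualifier in qualifiers or [""]:
                mesh_rows.append((pmid, descriptor, qualifier))
    return citations, mesh_rows


def to_tsv(rows):
    """Serialize rows as tab-delimited text suitable for a bulk database load."""
    buffer = io.StringIO()
    csv.writer(buffer, delimiter="\t", lineterminator="\n").writerows(rows)
    return buffer.getvalue()


citations, mesh_rows = extract_records(SAMPLE_XML)
print(to_tsv(citations), end="")
print(to_tsv(mesh_rows), end="")
```

Keeping the MeSH pairs in their own file, keyed by PubMed ID, is one plausible layout for a relational load; the actual LitDB table design is not described here.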
The MeSH heading and descriptor tree files are also downloaded and loaded into MySQL tables. They are available at https://www.nlm.nih.gov/mesh/filelist.html.
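A minimal sketch of loading tree data into a relational table follows. It assumes the semicolon-delimited "term;tree number" layout of the ASCII MeSH tree file, and it uses Python's built-in sqlite3 module purely as a stand-in for MySQL so that the example runs without a server. The table name mesh_tree, its columns, the derived parent column, and the helper names are hypothetical and are not taken from LitDB; the three sample lines are illustrative.

```python
import sqlite3

# Illustrative lines in the assumed "term;tree number" layout.
SAMPLE_TREE_LINES = [
    "Body Regions;A01",
    "Anatomic Landmarks;A01.111",
    "Breast;A01.236",
]


def parse_tree_line(line):
    """Split one line into (term, tree_number, parent_tree_number)."""
    # rsplit guards against a semicolon appearing inside the term itself.
    term, tree_number = line.strip().rsplit(";", 1)
    # The parent is the tree number with its last dotted segment removed;
    # top-level nodes such as "A01" have no parent.
    parent = tree_number.rsplit(".", 1)[0] if "." in tree_number else None
    return term, tree_number, parent


def load_tree(conn, lines):
    """Create the (hypothetical) mesh_tree table and insert the parsed lines."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS mesh_tree ("
        "term TEXT NOT NULL, "
        "tree_number TEXT PRIMARY KEY, "
        "parent_tree_number TEXT)"
    )
    conn.executemany(
        "INSERT INTO mesh_tree VALUES (?, ?, ?)",
        (parse_tree_line(line) for line in lines if line.strip()),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # sqlite3 stands in for MySQL here
load_tree(conn, SAMPLE_TREE_LINES)

# Look up the direct children of tree node A01.
children = conn.execute(
    "SELECT term FROM mesh_tree WHERE parent_tree_number = ? ORDER BY tree_number",
    ("A01",),
).fetchall()
print(children)
```

Storing the parent tree number alongside each node is a design choice made for this sketch; it makes parent/child lookups a simple equality query instead of a string-prefix match.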
To make the data more useful for research on chemicals, the data are passed stepwise through a series of algorithms.
This dataset is associated with the following publication:
Judson, R., R. Thomas, N. Baker, A. Simha, X.M. Howey, C. Marable, N. Kleinstreuer, and K. Houck. Workflow for Defining Reference Chemicals for Assessing Performance of In Vitro Assays (Altex). ALTEX. Society ALTEX Edition, Kuesnacht, SWITZERLAND, 36(2): 261-276, (2019).