
Dataset: An Open Combinatorial Diffraction Dataset Including Consensus Human and Machine Learning Labels with Quantified Uncertainty for Training New Machine Learning Models

Metadata Updated: July 29, 2022

This record contains the open dataset, software, and other files accompanying the manuscript "An Open Combinatorial Diffraction Dataset Including Consensus Human and Machine Learning Labels with Quantified Uncertainty for Training New Machine Learning Models," submitted for publication to Integrated Materials and Manufacturing Innovations.

Machine learning and autonomy are increasingly prevalent in materials science, but existing models are often trained or tuned on idealized data treated as absolute ground truth. In practice, "ground truth" in materials science is often a matter of interpretation and is more readily established by consensus. Here we present the data, software, and other files for a study that uses as-obtained diffraction data as a test case for evaluating machine learning models in the presence of differing expert opinions. We demonstrate that experts with similar backgrounds can disagree greatly, even on a task as intuitive as using diffraction to identify the start and end of a phase transformation. We then use a log-likelihood method to evaluate the performance of machine learning models relative to the consensus expert labels and their variance, and we illustrate the method's efficacy by ranking several state-of-the-art phase mapping algorithms. We propose a materials data challenge centered on evaluating models by consensus with uncertainty. The data, labels, and code used in this study are all available online at data.gov, and interested readers are encouraged to replicate and improve the existing models or to propose alternative methods for evaluating algorithmic performance.
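The description above scores machine learning models with a log-likelihood measured against the consensus of expert labels and their variance. The sketch below shows one way such a score could work, assuming that each sample's expert labels (for example, the temperature at which a phase transformation starts) are summarized by a Gaussian with the panel's mean and sample variance. The Gaussian assumption, the function names, the variance floor, and the example numbers are all hypothetical; this is not the method or code distributed with the dataset.

```python
import math
import statistics


def consensus(labels):
    """Summarize one sample's expert labels as (mean, sample variance).

    Requires at least two labels, since statistics.variance divides by n - 1.
    """
    return statistics.mean(labels), statistics.variance(labels)


def gaussian_log_likelihood(prediction, mu, var, var_floor=1e-6):
    """Log-density of `prediction` under a Gaussian N(mu, var).

    Assumption: the variance is floored so that a unanimous panel (var == 0)
    yields a large but finite penalty instead of a division by zero.
    """
    v = max(var, var_floor)
    return -0.5 * (math.log(2.0 * math.pi * v) + (prediction - mu) ** 2 / v)


def score_model(predictions, expert_labels):
    """Sum the per-sample log-likelihoods of one model's predictions."""
    if len(predictions) != len(expert_labels):
        raise ValueError("need one prediction per labeled sample")
    total = 0.0
    for pred, labels in zip(predictions, expert_labels):
        mu, var = consensus(labels)
        total += gaussian_log_likelihood(pred, mu, var)
    return total


def rank_models(model_predictions, expert_labels):
    """Return (name, score) pairs sorted from best (highest) to worst."""
    scores = {name: score_model(preds, expert_labels)
              for name, preds in model_predictions.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical example: three experts each mark the temperature (deg C) at
# which a phase transformation starts, for two samples.
expert_labels = [
    [300.0, 320.0, 310.0],
    [450.0, 470.0, 500.0],
]
models = {
    "model_a": [312.0, 475.0],
    "model_b": [350.0, 400.0],
}

for name, score in rank_models(models, expert_labels):
    print(f"{name}: {score:.2f}")
```

A higher (less negative) total means the predictions sit closer to the consensus, scaled by how much the experts themselves disagreed: the same miss costs less on a sample where the panel's labels are widely spread than on one where they agree closely.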

Access & Use Information

Public: This dataset is intended for public access and use. License: see https://www.nist.gov/open/license for license information.


Dates

Metadata Created Date March 11, 2021
Metadata Updated Date July 29, 2022
Data Update Frequency R/P1Y (annual)

Metadata Source

Harvested from NIST

Additional Metadata

Resource Type Dataset
Publisher National Institute of Standards and Technology
Maintainer
Identifier ark:/88434/mds2-2301
Data First Published 2020-10-23
Language en
Data Last Modified 2020-09-22 00:00:00
Category Information Technology:Data and informatics, Materials:Modeling and computational material science, Materials:Materials characterization
Public Access Level public
Bureau Code 006:55
Metadata Context https://project-open-data.cio.gov/v1.1/schema/data.json
Schema Version https://project-open-data.cio.gov/v1.1/schema
Catalog Describedby https://project-open-data.cio.gov/v1.1/schema/catalog.json
Harvest Object Id b1a17f6c-a990-4046-980a-afa360fa9b3d
Harvest Source Id 74e175d9-66b3-4323-ac98-e2a90eeb93c0
Harvest Source Title NIST
Homepage URL https://data.nist.gov/od/id/mds2-2301
License https://www.nist.gov/open/license
Program Code 006:045
Source Datajson Identifier True
Source Hash 155076061f1998404d8f7010d738554fbfcf8ce8
Source Schema Version 1.1
