Skip to main content
U.S. flag

An official website of the United States government

Official websites use .gov
A .gov website belongs to an official government organization in the United States.

Secure .gov websites use HTTPS
A lock ( ) or https:// means you’ve safely connected to the .gov website. Share sensitive information only on official, secure websites.

Skip to content

Challenging Medically-Relevant Genes Benchmark Set

Metadata Updated: September 30, 2025

CMRG v1.00 of a small variant benchmark and structural variant benchmark focused on 273 challenging medically relevant genes for the Genome in a Bottle (GIAB) sample HG002 (aka Ashkenazi son). These benchmarks were generated from a trio-based hifiasm v0.11 (https://doi.org/10.1038/s41592-020-01056-5) diploid assembly of HG002 using PacBio HiFi reads for HG002 for assembly and partitioning into phased haplotypes using Illumina reads for the parents, HG003 and HG004. This benchmark contains vcfs for small and structural variants along with corresponding benchmark bed files indicating regions that are homozygous reference if they do not have a variant in the vcf. We extensively curated the variant calls, excluding any found to be questionable or errors. This benchmark helps measure performance in important challenging regions, including challenging segmental duplications, regions with complex variants, regions with structural variants, and regions affected by false duplications in GRCh37 or GRCh38. This benchmark is described in https://doi.org/10.1101/2021.06.07.444885.

Access & Use Information

Public: This dataset is intended for public access and use. License: See this page for license information.

Downloads & Resources

References

https://doi.org/10.1101/2021.06.07.444885
https://doi.org/10.1038/s41592-020-01056-5

Dates

Metadata Created Date September 30, 2025
Metadata Updated Date September 30, 2025

Metadata Source

Harvested from Commerce Non Spatial Data.json Harvest Source

Additional Metadata

Resource Type Dataset
Metadata Created Date September 30, 2025
Metadata Updated Date September 30, 2025
Publisher National Institute of Standards and Technology
Maintainer
Identifier ark:/88434/mds2-2475
Language en
Data Last Modified 2021-09-29 00:00:00
Category Bioscience:Genomics
Public Access Level public
Bureau Code 006:55
Metadata Context https://project-open-data.cio.gov/v1.1/schema/catalog.jsonld
Schema Version https://project-open-data.cio.gov/v1.1/schema
Catalog Describedby https://project-open-data.cio.gov/v1.1/schema/catalog.json
Harvest Object Id 2466b711-a459-4c56-9aa0-0a666b7ac596
Harvest Source Id bce99b55-29c1-47be-b214-b8e71e9180b1
Harvest Source Title Commerce Non Spatial Data.json Harvest Source
Homepage URL https://data.nist.gov/od/id/mds2-2475
License https://www.nist.gov/open/license
Program Code 006:045
Related Documents https://doi.org/10.1101/2021.06.07.444885, https://doi.org/10.1038/s41592-020-01056-5
Source Datajson Identifier True
Source Hash 9c5dc6f845a0701b66e2aed149296b938e13e12d184ffbd0660d81a0801aa3e7
Source Schema Version 1.1

Didn't find what you're looking for? Suggest a dataset here.