
Challenging Medically-Relevant Genes Benchmark Set

Metadata Updated: July 29, 2022

CMRG v1.00 is a small variant benchmark and a structural variant benchmark focused on 273 challenging, medically relevant genes for the Genome in a Bottle (GIAB) sample HG002 (also known as the Ashkenazi son). These benchmarks were generated from a trio-based hifiasm v0.11 (https://doi.org/10.1038/s41592-020-01056-5) diploid assembly of HG002. PacBio HiFi reads from HG002 were used for the assembly, and Illumina reads from the parents, HG003 and HG004, were used to partition it into phased haplotypes. The benchmark contains VCFs for small and structural variants, along with corresponding benchmark BED files: within the BED regions, any position without a variant in the VCF is homozygous reference. We extensively curated the variant calls and excluded any found to be questionable or erroneous. This benchmark helps measure performance in important challenging regions, including challenging segmental duplications, regions with complex variants, regions with structural variants, and regions affected by false duplications in GRCh37 or GRCh38. This benchmark is described in https://doi.org/10.1101/2021.06.07.444885.
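The relationship between the benchmark VCF and BED files can be illustrated with a minimal Python sketch. It assumes a simplified view: a site is assessed only if its 1-based VCF POS falls inside a BED interval (BED coordinates are 0-based and half-open), and an assessed site with no truth record is treated as homozygous reference. The function names (`load_bed`, `load_vcf_sites`, `classify_site`) and the toy records are hypothetical and are not files or tools from this dataset. The sketch matches on position only, ignoring alleles, genotypes, and variant length, so it is not a substitute for a haplotype-aware benchmarking tool.

```python
from bisect import bisect_right

def load_bed(lines):
    """Parse BED lines into per-chromosome merged intervals.

    Returns {chrom: (starts, ends)}, with 0-based half-open intervals
    sorted and merged so that each position is covered at most once.
    """
    raw = {}
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank, comment, and header lines
        chrom, start, end = line.split()[:3]
        raw.setdefault(chrom, []).append((int(start), int(end)))
    merged = {}
    for chrom, ivs in raw.items():
        ivs.sort()
        out = []
        for s, e in ivs:
            if out and s <= out[-1][1]:
                # Overlapping or adjacent interval: extend the previous one.
                out[-1] = (out[-1][0], max(out[-1][1], e))
            else:
                out.append((s, e))
        merged[chrom] = ([s for s, _ in out], [e for _, e in out])
    return merged

def load_vcf_sites(lines):
    """Collect (CHROM, POS) pairs from VCF data lines (POS is 1-based)."""
    sites = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.split("\t")
        sites.add((fields[0], int(fields[1])))
    return sites

def in_regions(regions, chrom, pos1):
    """True if the 1-based position pos1 lies inside a BED interval."""
    starts, ends = regions.get(chrom, ([], []))
    p0 = pos1 - 1  # convert to 0-based to compare with BED coordinates
    i = bisect_right(starts, p0) - 1  # last interval starting at or before p0
    return i >= 0 and p0 < ends[i]

def classify_site(regions, truth_sites, chrom, pos1):
    """Label a position relative to the (hypothetical) truth set."""
    if not in_regions(regions, chrom, pos1):
        return "outside"        # not covered by the benchmark regions
    if (chrom, pos1) in truth_sites:
        return "truth_variant"  # a benchmark variant starts here
    return "hom_ref"            # inside regions with no variant: hom-ref

# Toy, made-up records for illustration only.
BED_LINES = ["chr1\t100\t200", "chr1\t150\t300", "chr2\t0\t50"]
VCF_LINES = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t120\t.\tA\tG\t.\tPASS\t.",
]

if __name__ == "__main__":
    regions = load_bed(BED_LINES)
    truth = load_vcf_sites(VCF_LINES)
    for chrom, pos in [("chr1", 120), ("chr1", 250), ("chr1", 100)]:
        print(chrom, pos, classify_site(regions, truth, chrom, pos))
```

Merging intervals up front lets each lookup use a single binary search over interval starts. Chromosome names must match between the benchmark files and the calls being evaluated.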

Access & Use Information

Public: This dataset is intended for public access and use. License: See https://www.nist.gov/open/license for license information.


References

https://doi.org/10.1101/2021.06.07.444885
https://doi.org/10.1038/s41592-020-01056-5

Dates

Metadata Created Date November 29, 2021
Metadata Updated Date July 29, 2022

Metadata Source

Harvested from NIST

Additional Metadata

Resource Type Dataset
Publisher National Institute of Standards and Technology
Maintainer
Identifier ark:/88434/mds2-2475
Language en
Data Last Modified 2021-09-29 00:00:00
Category Bioscience:Genomics
Public Access Level public
Bureau Code 006:55
Metadata Context https://project-open-data.cio.gov/v1.1/schema/data.json
Schema Version https://project-open-data.cio.gov/v1.1/schema
Catalog Describedby https://project-open-data.cio.gov/v1.1/schema/catalog.json
Harvest Object Id b2ab1849-b31c-4c63-8bae-0f16f09abb45
Harvest Source Id 74e175d9-66b3-4323-ac98-e2a90eeb93c0
Harvest Source Title NIST
Homepage URL https://data.nist.gov/od/id/mds2-2475
License https://www.nist.gov/open/license
Program Code 006:045
Related Documents https://doi.org/10.1101/2021.06.07.444885, https://doi.org/10.1038/s41592-020-01056-5
Source Datajson Identifier True
Source Hash fccb0ba3473527651ea6bee234fda01d467c1c32
Source Schema Version 1.1
