Data from: Genetic variation among 481 diverse soybean accessions

Metadata Updated: April 21, 2025

This data is from the manuscript titled "Genetic variation among 481 diverse soybean accessions, inferred from genomic re-sequencing". SNP calls were obtained by resequencing 481 diverse soybean lines comprising 52 wild (Glycine soja) and 429 cultivated (Glycine max) accessions. This dataset contains 6 gzipped VCF (Variant Call Format) files with variant calls for all 481 USB accessions, all G. max accessions, G. soja accessions, accessions sequenced at 15x coverage, accessions sequenced at 40x coverage, and 106 accessions re-sequenced from a previous study (Valliyodan et al. 2016). SNPs were called using the HaplotypeCaller algorithm from the Genome Analysis Toolkit (GATK) version gatk-2.5-2-gf57256b. A total of 7.8 million SNPs were identified among the 481 re-sequenced accessions. SNPs were assigned IDs using the script "assign_name.awk" available at https://github.com/soybase/SoySNP-Names. SNP effects were predicted using SnpEff 3.0.

Dataset also available at https://soybase.org/data/v2/Glycine/max/diversity/Wm82.gnm2.div.Valliyodan_Brown_2021/

Funding support was provided by the United Soybean Board for the large-scale sequencing of soybean genomes (project #1320-532-5615), Bayer (previously Monsanto and Bayer), and Corteva (previously Dow AgroSciences), with in-kind support for analysis from USDA Agricultural Research Service project 5030-21000-069-00-D.

Resources in this dataset:

Resource Title: Data Dictionary
File Name: Data_Dictionary_USB481.csv
Resource Description: Provides the name of each data file with details of data type, description of data content, correspondence to SoyBase Data Store file, and size of file.

Resource Title: List_of_Accessions.txt.gz
File Name: glyma.Wm82.gnm2_.div_.Valliyodan_Brown_2021.USB481_list.txt.gz
Resource Description: Table containing the list of all the accessions that were re-sequenced and the metadata associated with each accession.

Resource Title: Alignment_used_for_Phylogenetic_trees.fna.gz
File Name: glyma.Wm82.gnm2_.div_.Valliyodan_Brown_2021.sampled_25Kpos.fna_.gz
Resource Description: Aligned SNP data for USB481 accessions, based on SNPs sampled at one SNP per 25 kb.

Resource Title: Phylogenetic_tree.nh.txt.gz
File Name: glyma.Wm82.gnm2_.div_.Valliyodan_Brown_2021.sampled_25Kpos.nh_.txt.gz
Resource Description: Phylogenetic tree (Newick format) of SNP data for USB481 data, based on SNPs sampled at one SNP per 25 kb.

Resource Title: Phylogenetic_tree.pxml.txt.gz
File Name: glyma.Wm82.gnm2_.div_.Valliyodan_Brown_2021.sampled_25Kpos.pxml_.txt.gz
Resource Description: Phylogenetic tree (phyloXML format; colored) of SNP data for USB481 data, based on SNPs sampled at one SNP per 25 kb.

Resource Title: SNP_Effect_predictions.gff3.gz
File Name: glyma.Wm82.gnm2_.div_.Valliyodan_Brown_2021.snpEff.gff3_.gz
Resource Description: Output from the SnpEff program using the SNPs from the full USB481.vcf file as input.

Resource Title: Soja_SNP_calls.vcf.gz
File Name: glyma.Wm82.gnm2_.div_.Valliyodan_Brown_2021.Soja_.vcf.gz
Resource Description: Genotype information in VCF format for 45 Soja lines from USB-funded project.

Resource Title: Soy106.vcf.gz
File Name: glyma.Wm82.gnm2_.div_.Valliyodan_Brown_2021.Soy106.vcf.gz
Resource Description: Genotype information in VCF format for 106 accessions from USB-funded project; from Valliyodan et al. Sci Rep 2016.

Resource Title: USB481_index.vcf.gz.tbi
File Name: glyma.Wm82.gnm2_.div_.Valliyodan_Brown_2021.USB481_list.txt.gz
Resource Description: Binary index for USB481.vcf.gz, produced using tabix.

Resource Title: USB-40x.vcf.gz
File Name: glyma.Wm82.gnm2_.div_.Valliyodan_Brown_2021.USB-40x.vcf.gz
Resource Description: Genotype information in VCF format for 46 accessions sequenced at 40x coverage from USB-funded project.

Resource Title: SnpEff_predictions_Gmax_Accessions.gff.gz
File Name: glyma.Wm82.gnm2_.div_.Valliyodan_Brown_2021.snpEff_Gmax.gff_.gz
Resource Description: SnpEff results in GFF format using the USB481_nosoja.vcf file as input.

Resource Title: SnpEff_predictions_Gsoja_Accessions.gff.gz
File Name: glyma.Wm82.gnm2_.div_.Valliyodan_Brown_2021.snpEff_Gsoja.gff_.gz
Resource Description: SnpEff output in GFF format using Soja_SNP_Calls.vcf.gz as input.

Resource Title: USB-15x.vcf.gz
File Name: glyma.Wm82.gnm2_.div_.Valliyodan_Brown_2021.USB-15x.vcf.gz
Resource Description: Genotype information in VCF format for 284 accessions sequenced at 15x coverage from USB-funded project.

Resource Title: USB481.vcf.gz
File Name: glyma.Wm82.gnm2_.div_.Valliyodan_Brown_2021.USB481.vcf.gz
Resource Description: Genotype information in VCF format for all 481 accessions from USB-funded project. https://soybase.org/data/public/Glycine_max/Wm82.gnm2.div.G787/glyma.Wm82.gnm2.div.G787.USB481.vcf.gz

Resource Title: USB481_nosoja.vcf.gz
File Name: glyma.Wm82.gnm2_.div_.Valliyodan_Brown_2021.USB481_nosoja.vcf.gz
Resource Description: Combined genotype information, in VCF format, for all USB lines excluding the Soja lines from USB-funded project. https://soybase.org/data/public/Glycine_max/Wm82.gnm2.div.G787/glyma.Wm82.gnm2.div.G787.USB481_nosoja.vcf.gz
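
The VCF files in this dataset are gzip-compressed text, so a quick check of their contents needs only a standard library. The following is a minimal Python sketch, not part of the original dataset documentation, that streams a VCF and reports the sample names from the #CHROM header line together with the number of variant records. The function names (summarize_vcf, summarize_vcf_gz), the sample names S1 and S2, and the two demo records are hypothetical and used for illustration only; the only thing taken from the VCF format itself is that nine fixed columns precede the per-sample genotype columns.

```python
import gzip
import io


def summarize_vcf(handle):
    """Return (sample_names, n_records) for a VCF text stream.

    Reads line by line, so memory use stays small even for large files.
    """
    samples = []
    n_records = 0
    for line in handle:
        if line.startswith("##"):
            continue  # meta-information lines
        if line.startswith("#CHROM"):
            # Fixed columns: CHROM POS ID REF ALT QUAL FILTER INFO FORMAT;
            # everything after the ninth column is a sample name.
            samples = line.rstrip("\n").split("\t")[9:]
            continue
        if line.strip():
            n_records += 1  # one non-empty data line per variant record
    return samples, n_records


def summarize_vcf_gz(path):
    """Open a gzipped VCF in text mode and summarize it."""
    with gzip.open(path, "rt") as handle:
        return summarize_vcf(handle)


# Hypothetical in-memory VCF with two samples and two records.
DEMO_VCF = (
    "##fileformat=VCFv4.1\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n"
    "Gm01\t100\t.\tA\tG\t50\tPASS\t.\tGT\t0/0\t1/1\n"
    "Gm01\t250\t.\tC\tT\t60\tPASS\t.\tGT\t0/1\t0/0\n"
)

samples, n_records = summarize_vcf(io.StringIO(DEMO_VCF))
print(len(samples), "samples;", n_records, "records")
```

For a downloaded file, summarize_vcf_gz would be called with the local path of one of the .vcf.gz resources listed above. A full pass over the largest file may take a while, since every record is read once.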

Access & Use Information

Public: This dataset is intended for public access and use. License: us-pd

Dates

Metadata Created Date March 30, 2024
Metadata Updated Date April 21, 2025

Metadata Source

Harvested from USDA JSON

Additional Metadata

Resource Type Dataset
Publisher Agricultural Research Service
Identifier 10.15482/USDA.ADC/1518301
Data Last Modified 2024-02-21
Public Access Level public
Bureau Code 005:18
Metadata Context https://project-open-data.cio.gov/v1.1/schema/catalog.jsonld
Schema Version https://project-open-data.cio.gov/v1.1/schema
Catalog Describedby https://project-open-data.cio.gov/v1.1/schema/catalog.json
Harvest Object Id 3902ed6e-6876-47cb-8ec5-69cf356a6e13
Harvest Source Id d3fafa34-0cb9-48f1-ab1d-5b5fdc783806
Harvest Source Title USDA JSON
License https://www.usa.gov/publicdomain/label/1.0/
Program Code 005:040
Source Datajson Identifier True
Source Hash e4b56c57d1ef3de5dd088ac698245a0c6b765ce23901500dad9d16a074af94b6
Source Schema Version 1.1
