Data and code from: The Impacts of Parental Choice and Intrapopulation Selection for Seed Size on the Uprightness of Progeny Derived from Interspecific Hybridization between Glycine max and Glycine soja

Metadata Updated: December 2, 2025

This dataset contains all data and code necessary to reproduce the analysis described under the heading "Experiment 3" in the manuscript:

Taliercio, E., Eickholt, D., Read, Q. D., Carter, T., Waldeck, N., & Fallen, B. (2023). Parental choice and seed size impact the uprightness of progeny from interspecific Glycine hybridizations. Crop Science. https://doi.org/10.1002/csc2.21015

The attached files are:

G_max_G_soja_seedweight_seedcolor_analysis.Rmd: RMarkdown notebook containing all analysis code. The CSV data files should be placed in a subdirectory named "data" within the working directory from which the notebook is rendered.
G_max_G_soja_seedweight_seedcolor_analysis.html: Rendered HTML output of the RMarkdown notebook, including figures, tables, and explanatory text.
counts_seedwt.csv: CSV file containing the number of progeny selected and average 100-seed weight data for each combination of cross, size class, and replicate. Columns are:
- F3_location: text identifier of F3 nursery location, either "CLA" or "FF"
- plot: numeric ID of plot
- pop: numeric ID of population
- max: name of G. max parent
- soja: name of G. soja parent
- F2_location: text identifier of F2 nursery location, either "Caswell" or "Hugo"
- n_planted: number of seeds planted (raw value, which may be non-integer)
- n_selected: number of progeny selected
- size_ordered: seed size class, to be converted to an ordered factor
- size_combined: seed size class aggregated to fewer unique levels
- ave_100sw: average 100-seed weight for the given size class
- n_planted_trials: number of seeds planted rounded to nearest integer
seedcolor.csv: CSV file with additional data on number of seeds of each color by population. Columns are:
- cross: text identifier of cross
- line: text identifier of line
- light: number of light seeds
- mid: number of mid-green seeds
- brown: number of brown seeds
- dark: number of dark or black seeds
- population: identifier of population type (F2-derived or selected)
- max: name of G. max parent
- n_total: sum of the light, mid, brown, and dark columns
- soja: name of G. soja parent
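Both CSV files are plain tables, so they can be inspected outside R. Below is a minimal Python sketch using only the standard library. It is not the authors' R code. It assumes the files sit in the "data" subdirectory described above and that any missing values are written as NA. It also assumes that a cross in counts_seedwt.csv is identified by its (max, soja) parent pair. All function names are hypothetical. The sketch pools counts into a per-cross proportion selected, checks that n_total equals the four color columns, and computes pooled color shares for each population type.

```python
import csv
from collections import defaultdict
from pathlib import Path

# Assumption: as the notebook expects, both CSVs live in a "data" subdirectory
# of the working directory. All helpers below are hypothetical illustrations.
DATA_DIR = Path("data")
COLOR_COLUMNS = ("light", "mid", "brown", "dark")


def read_rows(path):
    """Read a CSV file into a list of dicts keyed by column name."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle))


def to_int(value):
    """Parse a count such as '12' or '12.0'; return None for blank or NA cells."""
    if value is None or value.strip() in ("", "NA"):
        return None
    return int(round(float(value)))


def proportion_selected_by_cross(rows):
    """Pool n_selected and n_planted_trials over all counts_seedwt.csv rows
    sharing a (max, soja) parent pair; return the pooled proportion selected."""
    selected = defaultdict(int)
    planted = defaultdict(int)
    for row in rows:
        n_sel = to_int(row["n_selected"])
        n_trials = to_int(row["n_planted_trials"])
        if n_sel is None or n_trials is None:
            continue  # skip rows with missing counts
        key = (row["max"], row["soja"])
        selected[key] += n_sel
        planted[key] += n_trials
    return {key: selected[key] / planted[key] for key in planted if planted[key] > 0}


def rows_with_bad_totals(rows):
    """Return seedcolor.csv rows whose n_total != light + mid + brown + dark."""
    bad = []
    for row in rows:
        parts = [to_int(row[col]) for col in COLOR_COLUMNS]
        total = to_int(row["n_total"])
        if None in parts or total is None:
            continue  # rows with missing counts cannot be checked
        if sum(parts) != total:
            bad.append(row)
    return bad


def color_proportions_by_population(rows):
    """Pool seed-color counts over lines within each population type and return
    each color's share of the pooled total (missing counts treated as 0)."""
    counts = defaultdict(lambda: dict.fromkeys(COLOR_COLUMNS, 0))
    for row in rows:
        for col in COLOR_COLUMNS:
            counts[row["population"]][col] += to_int(row[col]) or 0
    shares = {}
    for pop, by_color in counts.items():
        total = sum(by_color.values())
        if total > 0:
            shares[pop] = {col: n / total for col, n in by_color.items()}
    return shares
```

For example, proportion_selected_by_cross(read_rows(DATA_DIR / "counts_seedwt.csv")) maps each (max, soja) pair to its pooled proportion selected. Pooling ignores plot, location, size-class, and replicate structure. It is therefore only a rough descriptive check and is no substitute for the model fitted in the notebook.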

The data processing and analysis pipeline in the RMarkdown notebook includes:

- Importing the data (a slightly cleaned version is provided)
- Creating boxplots of the proportion selected by cross, nursery location, and size class
- Fitting a logistic generalized linear mixed model (GLMM) to estimate the probability of selection as a function of parent, 100-seed weight, and their interactions
- Extracting and plotting random effect estimates from the model
- Calculating and plotting estimated marginal means from the model
- Taking contrasts between pairs of estimated marginal means and trends
- Calculating Bayes factors associated with the contrasts
- Generating figures and tables for all of the above results
- Additional seed color analysis: importing the data (a slightly cleaned version is provided)
- Additional seed color analysis: drawing an exploratory bar plot
- Additional seed color analysis: fitting a multinomial GLM that models the proportion of seeds of each color as a function of population
- Additional seed color analysis: generating expected-value predictions from the GLM and taking contrasts
- Additional seed color analysis: creating figures and tables for the model results
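For readers unfamiliar with this model class, the general form of a logistic GLMM for these counts is shown below. This is a generic sketch, not the notebook's exact specification. It assumes a binomial response with n_selected successes out of n_planted_trials trials. The specific fixed effects (parent, ave_100sw, and their interactions) and the random-effect grouping terms are defined in the Rmd notebook and are not reproduced here.

```latex
\begin{aligned}
y_i &\sim \operatorname{Binomial}(n_i,\, p_i), \\
\operatorname{logit}(p_i) = \log\frac{p_i}{1 - p_i} &= \mathbf{x}_i^{\top}\boldsymbol{\beta} + \mathbf{z}_i^{\top}\mathbf{b},
\qquad \mathbf{b} \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma})
\end{aligned}
```

Here y_i is n_selected and n_i is n_planted_trials for row i. The vector x_i holds the fixed-effect covariates, with coefficients β, and z_i maps row i to the random effects b. The binomial distribution requires an integer number of trials, which is presumably why a rounded n_planted_trials column is supplied alongside the raw n_planted. If contrasts between estimated marginal means are taken on the logit scale, a difference η1 − η2 corresponds to an odds ratio of exp(η1 − η2).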

This research was funded by CRIS 6070-21220-069-00D and United Soybean Board Project #2333-203-0101, and falls under National Program NP301.

Resources in this dataset:

Resource Title: RMarkdown document with all analysis code.

File Name: G_max_G_soja_seedweight_seedcolor_analysis.Rmd

Resource Title: Rendered HTML version of notebook.

File Name: G_max_G_soja_seedweight_seedcolor_analysis.html

Resource Title: Progeny counts and seed weight data.

File Name: counts_seedwt.csv

Resource Title: Seed color counts data.

File Name: seedcolor.csv

Access & Use Information

Public: This dataset is intended for public access and use. License: us-pd

Dates

Metadata Created Date	April 10, 2024
Metadata Updated Date	December 2, 2025

Additional Metadata

Resource Type	Dataset
Publisher	Agricultural Research Service
Maintainer	Read, Quentin
Identifier	10.15482/USDA.ADC/1528604
Data Last Modified	2025-11-21
Public Access Level	public
Bureau Code	005:18
Metadata Context	https://project-open-data.cio.gov/v1.1/schema/catalog.jsonld
Schema Version	https://project-open-data.cio.gov/v1.1/schema
Catalog Describedby	https://project-open-data.cio.gov/v1.1/schema/catalog.json
Harvest Object Id	37bb9895-ab9a-47ab-aa37-5f8ac9a871cf
Harvest Source Id	d3fafa34-0cb9-48f1-ab1d-5b5fdc783806
Harvest Source Title	USDA JSON
License	https://www.usa.gov/publicdomain/label/1.0/
Old Spatial	{"type": "MultiPoint", "coordinates": [[-77.58, 35.26], [-77.79, 35.95], [-78.46, 35.65], [-67, 18.45]]}
Program Code	005:040
Source Datajson Identifier	True
Source Hash	7427787c704c593ab2c4919329c50601ca4faaa581eb160b5c894dd330f1de81
Source Schema Version	1.1
Spatial	{"type": "MultiPoint", "coordinates": [[-77.58, 35.26], [-77.79, 35.95], [-78.46, 35.65], [-67, 18.45]]}
Temporal	2013-01-01/2021-12-31
