On the species of origin: diagnosing the source of symbiotic transcripts

Metadata Updated: September 6, 2025

Background
Most organisms have developed ways to recognize and interact with other species. Symbiotic interactions range from pathogenic to mutualistic. Some molecular mechanisms of interspecific interaction are well understood, but many remain to be discovered. Expressed sequence tags (ESTs) from cultures of interacting symbionts can help identify transcripts that regulate symbiosis, but present a unique challenge for functional analysis. Given a sequence expressed in an interaction between two symbionts, the challenge is to determine from which organism the transcript originated. For high-throughput sequencing from interaction cultures, a reliable computational approach is needed. Previous investigations into GC nucleotide content and comparative similarity searching provide provisional solutions, but a comparative lexical analysis, which uses a likelihood-ratio test of hexamer counts, is more powerful.
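To illustrate the idea of a likelihood-ratio test on hexamer counts, here is a minimal Python sketch. It is not the authors' implementation: the function names, the add-one pseudocount, the toy training sequences, and the simplification of treating overlapping hexamers as independent are all assumptions made for illustration. It scores a query by summing, over its hexamers, the log of the ratio of each hexamer's estimated frequency in organism A to its frequency in organism B.

```python
from collections import Counter
from math import log

K = 6  # hexamer length, as named in the abstract
BASES = set("ACGT")


def kmer_counts(seqs, k=K):
    """Count overlapping k-mers in a list of training sequences."""
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= BASES:  # skip k-mers with ambiguous bases
                counts[kmer] += 1
    return counts


def log_likelihood_ratio(query, counts_a, counts_b, k=K, pseudo=1.0):
    """Sum of log(p_A / p_B) over the query's k-mers.

    The pseudocount (an assumption of this sketch) keeps unseen
    k-mers from producing a zero probability.
    """
    n_kmers = 4 ** k
    total_a = sum(counts_a.values()) + pseudo * n_kmers
    total_b = sum(counts_b.values()) + pseudo * n_kmers
    query = query.upper()
    score = 0.0
    for i in range(len(query) - k + 1):
        kmer = query[i:i + k]
        if not set(kmer) <= BASES:
            continue
        p_a = (counts_a[kmer] + pseudo) / total_a
        p_b = (counts_b[kmer] + pseudo) / total_b
        score += log(p_a / p_b)
    return score


def classify(query, counts_a, counts_b):
    """Assign the query to organism 'A' if the ratio favors A, else 'B'."""
    return "A" if log_likelihood_ratio(query, counts_a, counts_b) > 0 else "B"


# Toy training data (hypothetical, not from the dataset):
# organism A is AT-rich, organism B is GC-rich.
train_a = ["ATATATATATATATATAT", "TTATAATTATATAATTAA"]
train_b = ["GCGCGCGCGCGCGCGCGC", "GGCCGCGGCCGCGCGGCC"]
counts_a = kmer_counts(train_a)
counts_b = kmer_counts(train_b)

print(classify("ATATATATAT", counts_a, counts_b))
print(classify("GCGCGCGCGC", counts_a, counts_b))
```

In practice the training counts would come from large sets of sequences of known origin for each symbiont, and a decision threshold other than zero could be chosen from validation data.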

Results
Validation with genes whose origin and function are known yielded 94% accuracy. Microbial (non-plant) transcripts comprised 75% of a Phytophthora sojae-infected soybean (Glycine max cv Harasoy) library, contrasted with 15% or less in root tissue libraries of Medicago truncatula from axenic, Phytophthora medicaginis-infected, mycorrhizal, and rhizobacterial treatments. Mycorrhizal libraries contained about 23% microbial transcripts; an axenic plant library contained a similar proportion of putative microbial transcripts.

Conclusions
Comparative lexical analysis offers numerous advantages over alternative approaches. Many of the transcripts isolated from mixed cultures were of unknown function, suggesting specificity to symbiotic metabolism and therefore candidates likely to be interesting for further functional investigation. Future investigations will determine whether the abundance of non-plant transcripts in a pure plant library indicates procedural artifacts, horizontally transferred genes, or other phenomena.

Access & Use Information

Public: This dataset is intended for public access and use.
License: No license information was provided. If this work was prepared by an officer or employee of the United States government as part of that person's official duties, it is considered a U.S. Government Work.

Dates

Metadata Created Date July 24, 2025
Metadata Updated Date September 6, 2025

Metadata Source

Harvested from Healthdata.gov

Additional Metadata

Resource Type Dataset
Publisher National Institutes of Health
Maintainer NIH
Identifier https://healthdata.gov/api/views/6x5r-pvds
Data First Published 2025-07-14
Data Last Modified 2025-09-06
Category NIH
Public Access Level public
Bureau Code 009:25
Metadata Context https://project-open-data.cio.gov/v1.1/schema/catalog.jsonld
Metadata Catalog ID https://healthdata.gov/data.json
Schema Version https://project-open-data.cio.gov/v1.1/schema
Catalog Describedby https://project-open-data.cio.gov/v1.1/schema/catalog.json
Harvest Object Id 855b1f6e-66b4-4ee8-bc06-183e4f2265ba
Harvest Source Id 651e43b2-321c-4e4c-b86a-835cfc342cb0
Harvest Source Title Healthdata.gov
Homepage URL https://healthdata.gov/d/6x5r-pvds
Program Code 009:033
Source Datajson Identifier True
Source Hash a52ca77a2f4e97f5969d1335598e2d8003fc38798519a5280b282c8f85b4cb48
Source Schema Version 1.1