New feature subset selection procedures for classification of expression profiles

Metadata Updated: September 7, 2025

Background
Methods for extracting useful information from the datasets produced by microarray experiments are currently of great interest. Here we present new methods for finding gene sets that are well suited for distinguishing experiment classes, such as healthy versus diseased tissues. Our methods evaluate genes in pairs, scoring how well each pair in combination distinguishes two experiment classes. We tested the ability of our pair-based methods to select gene sets that generalize the differences between experiment classes and compared their performance with that of two standard methods. To assess the ability to generalize class differences, we studied how well the selected gene sets are suited for learning a classifier.
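The idea of scoring genes in pairs can be illustrated with a small sketch. The abstract does not specify the pair score, so everything below is an assumption made for illustration: each pair is projected onto a two-gene Fisher discriminant axis and the projections are scored with a two-sample t-statistic. The names `t_score`, `pair_score`, `rank_pairs`, and the toy data `GENES` and `LABELS` are hypothetical and do not come from the dataset.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, variance


def t_score(a, b):
    """Absolute two-sample (Welch) t-statistic for two lists of numbers."""
    diff = abs(mean(a) - mean(b))
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        return float("inf") if diff > 0 else 0.0
    return diff / se


def split_by_class(values, labels):
    """Return (values with label 0, values with label 1)."""
    c0 = [v for v, y in zip(values, labels) if y == 0]
    c1 = [v for v, y in zip(values, labels) if y == 1]
    return c0, c1


def pair_score(g1, g2, labels):
    """Assumed pair score: t-score of samples projected onto a 2-gene Fisher axis."""
    a0, a1 = split_by_class(g1, labels)
    b0, b1 = split_by_class(g2, labels)

    def cross(u, v):
        # Sum of cross-products of deviations from the mean.
        mu, mv = mean(u), mean(v)
        return sum((p - mu) * (q - mv) for p, q in zip(u, v))

    dof = len(labels) - 2
    # Pooled within-class covariance matrix [[s11, s12], [s12, s22]].
    s11 = (cross(a0, a0) + cross(a1, a1)) / dof
    s22 = (cross(b0, b0) + cross(b1, b1)) / dof
    s12 = (cross(a0, b0) + cross(a1, b1)) / dof
    det = s11 * s22 - s12 * s12
    d1, d2 = mean(a1) - mean(a0), mean(b1) - mean(b0)
    if det <= 1e-12:
        w1, w2 = d1, d2  # degenerate covariance: fall back to the mean difference
    else:
        w1 = (s22 * d1 - s12 * d2) / det
        w2 = (s11 * d2 - s12 * d1) / det
    projected = [w1 * x + w2 * y for x, y in zip(g1, g2)]
    return t_score(*split_by_class(projected, labels))


def rank_pairs(genes, labels):
    """Return [(score, name1, name2), ...] sorted with the best pair first."""
    scored = [(pair_score(genes[a], genes[b], labels), a, b)
              for a, b in combinations(sorted(genes), 2)]
    return sorted(scored, reverse=True)


# Toy data (hypothetical): A and B share large noise, so neither separates the
# classes well alone, but their difference does. C is a moderate single marker.
LABELS = [0, 0, 0, 0, 1, 1, 1, 1]
GENES = {
    "A": [1.1, 4.9, 3.0, 7.1, 3.0, 7.1, 4.9, 9.0],
    "B": [1.0, 5.0, 3.0, 7.0, 2.0, 6.0, 4.0, 8.0],
    "C": [2.0, 3.5, 3.0, 2.5, 3.5, 3.0, 4.5, 4.0],
}

single = {name: t_score(*split_by_class(vals, LABELS)) for name, vals in GENES.items()}
ranked = rank_pairs(GENES, LABELS)
```

On this toy data the pair (A, B) ranks first even though C is the strongest gene on its own, which is the kind of information a gene-by-gene ranking would miss. A real analysis would need a scoring rule and a search strategy suited to thousands of genes, since scoring all pairs grows quadratically.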

Results
We show that the gene sets selected by our methods outperform those selected by the standard methods, in some cases by a large margin, as measured by the cross-validation prediction accuracy of the learned classifier. We show that on two public datasets, accurate diagnoses can be made using only 15-30 genes. Our results have implications for how to select marker genes and how many gene measurements are needed for diagnostic purposes.
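Cross-validation accuracy for a selected gene set can be estimated along the following lines. This is a minimal sketch under stated assumptions, not the procedure used in the study: it uses leave-one-out cross-validation, a nearest-centroid classifier, and a placeholder selector that ranks genes by the gap between class means. The names `loocv_accuracy`, `nearest_centroid_predict`, `top_k_by_mean_gap`, and the toy data `ROWS` and `LABELS` are hypothetical. Gene selection is repeated inside every fold so that the held-out sample never influences which genes are chosen.

```python
from statistics import mean


def top_k_by_mean_gap(rows, labels, k):
    """Placeholder selector (assumption): indices of the k genes with the largest class-mean gap."""
    columns = list(zip(*rows))

    def gap(col):
        c0 = [v for v, y in zip(col, labels) if y == 0]
        c1 = [v for v, y in zip(col, labels) if y == 1]
        return abs(mean(c1) - mean(c0))

    order = sorted(range(len(columns)), key=lambda j: gap(columns[j]), reverse=True)
    return order[:k]


def nearest_centroid_predict(train_rows, train_labels, test_row):
    """Predict the label whose class centroid is closest to test_row."""
    centroids = {}
    for label in set(train_labels):
        members = [r for r, y in zip(train_rows, train_labels) if y == label]
        centroids[label] = [mean(col) for col in zip(*members)]

    def dist2(centroid):
        return sum((x - m) ** 2 for x, m in zip(test_row, centroid))

    return min(centroids, key=lambda label: dist2(centroids[label]))


def loocv_accuracy(rows, labels, select_genes, k):
    """Leave-one-out accuracy, with gene selection redone inside each fold."""
    correct = 0
    for i in range(len(rows)):
        train_rows = rows[:i] + rows[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        keep = select_genes(train_rows, train_labels, k)
        reduced_train = [[r[j] for j in keep] for r in train_rows]
        reduced_test = [rows[i][j] for j in keep]
        if nearest_centroid_predict(reduced_train, train_labels, reduced_test) == labels[i]:
            correct += 1
    return correct / len(rows)


# Toy data (hypothetical): rows are samples, columns are genes.
# Gene 0 separates the classes; genes 1 and 2 are noise.
ROWS = [
    [0.1, 3.0, 2.0],
    [0.2, 2.6, 3.0],
    [0.0, 2.8, 2.5],
    [1.0, 2.9, 2.2],
    [0.9, 2.7, 2.8],
    [1.1, 3.1, 2.4],
]
LABELS = [0, 0, 0, 1, 1, 1]

accuracy = loocv_accuracy(ROWS, LABELS, top_k_by_mean_gap, k=1)
```

Any selector with the same signature, including a pair-based one, can be passed as `select_genes`, which makes it straightforward to compare selection methods at a fixed gene count.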


Conclusion
When looking for differential expression between experiment classes, it may not be sufficient to consider each gene in isolation. Evaluating combinations of genes reveals information that would otherwise go undiscovered. Our results show that class prediction can be improved by taking advantage of this extra information.

Access & Use Information

Public: This dataset is intended for public access and use.
License: No license information was provided. If this work was prepared by an officer or employee of the United States government as part of that person's official duties, it is considered a U.S. Government Work.

Dates

Metadata Created Date July 24, 2025
Metadata Updated Date September 7, 2025

Metadata Source

Harvested from Healthdata.gov

Additional Metadata

Resource Type Dataset
Publisher National Institutes of Health
Maintainer NIH
Identifier https://healthdata.gov/api/views/udb4-f4rp
Data First Published 2025-07-14
Data Last Modified 2025-09-06
Category NIH
Public Access Level public
Bureau Code 009:25
Metadata Context https://project-open-data.cio.gov/v1.1/schema/catalog.jsonld
Metadata Catalog ID https://healthdata.gov/data.json
Schema Version https://project-open-data.cio.gov/v1.1/schema
Catalog Describedby https://project-open-data.cio.gov/v1.1/schema/catalog.json
Harvest Object Id 8fde52d5-6b87-44cd-b330-710a0ed21618
Harvest Source Id 651e43b2-321c-4e4c-b86a-835cfc342cb0
Harvest Source Title Healthdata.gov
Homepage URL https://healthdata.gov/d/udb4-f4rp
Program Code 009:033
Source Datajson Identifier True
Source Hash 472422682add7ffbdea0cc79bbc2f7eb9cc5d381642a262e7d9f0ecfbc919ed7
Source Schema Version 1.1
