Model-based cluster analysis of microarray gene-expression data
Background
Microarray technologies are emerging as a promising tool for genomic studies. The challenge now is how to analyze the resulting large amounts of data. Clustering techniques have been widely applied in analyzing microarray gene-expression data. However, normal mixture model-based cluster analysis has not been widely used for such data, although it has a solid probabilistic foundation. Here, we introduce and illustrate its use in detecting differentially expressed genes. In particular, we do not cluster gene-expression patterns but a summary statistic, the t-statistic.
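As a minimal sketch of the summary statistic mentioned above, the following computes one two-sample t-statistic per gene. The abstract does not say which form of the t-statistic was used, so the unequal-variance (Welch) form here is an assumption, and the gene names and expression values are hypothetical.

```python
import math
from statistics import mean, variance

def t_statistic(control, treated):
    """Two-sample t-statistic, unequal-variance (Welch) form; an assumed choice."""
    n1, n2 = len(control), len(treated)
    # Standard error of the difference in means, using sample variances.
    se = math.sqrt(variance(control) / n1 + variance(treated) / n2)
    return (mean(treated) - mean(control)) / se

# Hypothetical log-expression values: gene -> (control replicates, infected replicates).
expression = {
    "gene_a": ([1.0, 1.2, 0.9], [1.1, 1.0, 1.05]),
    "gene_b": ([0.5, 0.6, 0.4], [2.0, 2.2, 1.9]),
}

# One t-statistic per gene; these values, not the raw profiles, are what get clustered.
t_stats = {gene: t_statistic(c, t) for gene, (c, t) in expression.items()}
```

Each gene is reduced to a single number, so the clustering step works on a one-dimensional data set regardless of how many arrays were run.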
Results
The method is applied to a data set containing expression levels of 1,176 genes of rats with and without pneumococcal middle-ear infection. Three clusters were found, two of which together contain more than 95% of the genes and show almost no change in gene-expression levels, whereas the third contains 30 genes whose expression levels are differentially altered to varying degrees.
Conclusions
Our results indicate that model-based clustering of t-statistics (and possibly other summary statistics) can be a useful statistical tool to exploit differential gene expression for microarray data.
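To illustrate what model-based clustering of t-statistics can look like in practice, here is a bare-bones sketch that fits one-dimensional normal mixtures by EM and compares them by BIC. It is not the authors' implementation: the initialisation, the fixed iteration count, the use of BIC to choose the number of clusters, and all function names are assumptions, and the input values are synthetic rather than the rat data.

```python
import math
import random

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def component_densities(x, weights, means, sigmas):
    """Weighted density of x under each mixture component."""
    return [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sigmas)]

def log_likelihood(data, weights, means, sigmas):
    # A tiny floor guards against log(0) for points far from every component.
    return sum(
        math.log(max(sum(component_densities(x, weights, means, sigmas)), 1e-300))
        for x in data
    )

def fit_mixture(data, k, iters=200, min_sigma=1e-3):
    """Fit a k-component 1-D normal mixture by EM; returns (weights, means, sigmas)."""
    n = len(data)
    lo, hi = min(data), max(data)
    center = sum(data) / n
    spread = max(math.sqrt(sum((x - center) ** 2 for x in data) / n), min_sigma)
    # Start with equal weights, means spaced across the data range, common spread.
    weights = [1.0 / k] * k
    means = [lo + (j + 0.5) * (hi - lo) / k for j in range(k)]
    sigmas = [spread] * k
    for _ in range(iters):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in data:
            dens = component_densities(x, weights, means, sigmas)
            total = max(sum(dens), 1e-300)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means, and standard deviations.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-300)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            sigmas[j] = max(math.sqrt(var), min_sigma)
    return weights, means, sigmas

def bic(data, k, params):
    # Free parameters: k - 1 weights, k means, k standard deviations.
    n_params = 3 * k - 1
    return -2.0 * log_likelihood(data, *params) + n_params * math.log(len(data))

def assign(x, params):
    """Index of the component with the highest posterior probability for x."""
    dens = component_densities(x, *params)
    return max(range(len(dens)), key=dens.__getitem__)

# Synthetic t-statistics, made up for illustration: a large group near zero
# (unchanged genes) and a small group shifted away from zero (altered genes).
rng = random.Random(0)
t_values = [rng.gauss(0.0, 1.0) for _ in range(200)]
t_values += [rng.gauss(8.0, 1.0) for _ in range(20)]

fits = {k: fit_mixture(t_values, k) for k in (1, 2, 3)}
scores = {k: bic(t_values, k, fits[k]) for k in fits}
best_k = min(scores, key=scores.get)  # lower BIC is preferred
labels = [assign(t, fits[best_k]) for t in t_values]
```

Genes assigned to a component whose mean lies far from zero would be the candidates for differential expression. A production analysis would normally add a convergence check and several random restarts, which this sketch omits for brevity.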
Complete Metadata
| @type | dcat:Dataset |
|---|---|
| accessLevel | public |
| bureauCode | [ "009:25" ] |
| contactPoint | { "fn": "NIH", "@type": "vcard:Contact", "hasEmail": "mailto:info@nih.gov" } |
| distribution | [ { "@type": "dcat:Distribution", "title": "Official Government Data Source", "mediaType": "text/html", "description": "Visit the original government dataset for complete information, documentation, and data access.", "downloadURL": "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC65687/" } ] |
| identifier | https://healthdata.gov/api/views/yh42-xkaf |
| issued | 2025-07-14 |
| keyword | [ "cluster-analysis", "gene-expression", "microarray-data", "nih", "pneumococcal-infection" ] |
| landingPage | https://healthdata.gov/d/yh42-xkaf |
| modified | 2025-09-06 |
| programCode | [ "009:033" ] |
| publisher | { "name": "National Institutes of Health", "@type": "org:Organization" } |
| theme | [ "NIH" ] |
| title | Model-based cluster analysis of microarray gene-expression data |