Generalization of DNA microarray dispersion properties: microarray equivalent of t-distribution

Author(s)
Novak, Jaroslav
Kim, Seon-Young
Xu, Jun
Modlich, Olga
Volsky, David
Honys, David
Slonczewski, Joan
Bell, Douglas
Blattner, Fred
Blumwald, Eduardo
Boerma, Marjan
Cosio, Manuel
Gatalica, Zoran
Hajduch, Marian
Hidalgo, Juan
McInnes, Roderick
Miller III, Merrill
Penkowa, Milena
Rolph, Michael
Sottosanto, Jordan
St-Arnaud, Rene
Szego, Michael
Twell, David
Wang, Charles
Year published
2006
Abstract
Background: DNA microarrays are a powerful technology that can provide a wealth of gene expression data for disease studies, drug development, and a wide scope of other investigations. Because of the large volume and inherent variability of DNA microarray data, many new statistical methods have been developed for evaluating the significance of the observed differences in gene expression. However, until now little attention has been given to the characterization of the dispersion of DNA microarray data.
Results: Here we examine the expression data obtained from 682 Affymetrix GeneChips of 22 different types and demonstrate that the Gaussian (normal) frequency distribution is characteristic of the variability of gene expression values. However, typically 5 to 15% of the samples deviate from normality. Furthermore, we show that the frequency distributions of the difference of expression in subsets of ordered, consecutive pairs of genes (consecutive samples) in pair-wise comparisons of replicate experiments are also normal. We describe a consecutive sampling method, which is employed to calculate the characteristic function approximating the standard deviation, and show that the standard deviation derived from the consecutive samples is equivalent to the standard deviation obtained from individual genes. Finally, we determine the boundaries of probability intervals and demonstrate that the coefficients defining the intervals are independent of sample characteristics, variability of data, laboratory conditions and type of chips. These coefficients are very closely correlated with Student's t-distribution.
Conclusion: In this study we ascertained that the non-systematic variations follow a Gaussian distribution, determined the probability intervals, and demonstrated that the Ka coefficients defining these intervals are invariant; these coefficients offer a convenient universal measure of the dispersion of data. The fact that the Ka distributions are so close to the t-distribution and independent of conditions and type of arrays suggests that the quantitative data provided by Affymetrix technology give a "true" representation of the physical processes involved in the measurement of RNA abundance.
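The consecutive sampling idea summarized in the abstract can be illustrated with a small simulation. The sketch below is not the authors' implementation: the function names, the window size of 50 genes, the simulated log-scale expression levels, and the noise level of 0.2 are all assumptions chosen for illustration. It orders genes by mean expression across two simulated replicates, splits the replicate differences into consecutive windows, estimates a standard deviation per window, and measures what fraction of differences falls within k standard deviations of the window mean.

```python
import random
import statistics


def consecutive_windows(rep_a, rep_b, window=50):
    """Order genes by mean expression and return the replicate
    differences grouped into consecutive, non-overlapping windows."""
    pairs = sorted(zip(rep_a, rep_b), key=lambda p: (p[0] + p[1]) / 2)
    diffs = [a - b for a, b in pairs]
    return [diffs[i:i + window]
            for i in range(0, len(diffs) - window + 1, window)]


def window_sds(windows):
    """Sample standard deviation of the differences in each window."""
    return [statistics.stdev(w) for w in windows]


def coverage(windows, k):
    """Fraction of differences within k window-SDs of the window mean."""
    inside = 0
    total = 0
    for w in windows:
        mean = statistics.fmean(w)
        sd = statistics.stdev(w)
        inside += sum(abs(d - mean) <= k * sd for d in w)
        total += len(w)
    return inside / total


# Simulated data (an assumption, not real GeneChip measurements):
# 5000 genes with Gaussian measurement noise in two replicates.
rng = random.Random(0)
true_level = [rng.uniform(4.0, 14.0) for _ in range(5000)]
rep_a = [x + rng.gauss(0.0, 0.2) for x in true_level]
rep_b = [x + rng.gauss(0.0, 0.2) for x in true_level]

windows = consecutive_windows(rep_a, rep_b)
sds = window_sds(windows)
cov = coverage(windows, 1.96)
print(len(windows), round(statistics.fmean(sds), 3), round(cov, 3))
```

With Gaussian noise, the coverage at k = 1.96 should come out close to 95%, which is the kind of distribution-based interval the abstract refers to. The paper's Ka coefficients and its comparison with Student's t-distribution are defined in the full text and are not reproduced here.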
Journal Title
Biology Direct
Volume
1
Copyright Statement
© 2006 Novak et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Subject
Gene Expression (incl. Microarray and other genome-wide approaches)
Biological Sciences
Medical and Health Sciences