Pitfalls in the use of DNA microarray data for diagnostic and prognostic classification

R Simon, MD Radmacher, K Dobbin… - Journal of the National Cancer Institute, 2003 - academic.oup.com
DNA microarrays have made it possible to estimate the level of expression of thousands of genes for a sample of cells. Although biomedical investigators have been quick to adopt this powerful new research tool, accurate analysis and interpretation of the data have posed unique challenges. Indeed, many investigators are not experienced in the analytical steps needed to convert tens of thousands of noisy data points into reliable and interpretable biologic information. Although some investigators recognize the importance of collaborating with experienced biostatisticians to analyze microarray data, the number and availability of experienced biostatisticians are inadequate. Consequently, investigators are using available software to analyze their data, many seemingly without knowledge of potential pitfalls. Because of serious problems associated with the analysis and reporting of some DNA microarray studies, there is great interest in guidance on valid and effective methods for analysis of DNA microarray data.
The design and analysis strategy for a DNA microarray experiment should be determined in light of the overall objectives of the study. Because DNA microarrays are used for a wide variety of objectives, it is not feasible to address the entire range of design and analysis issues in this commentary. Here, we address statistical issues that arise from the use of DNA microarrays for an important group of objectives that has been called “class prediction” (1). Class prediction includes the derivation of predictors of prognosis, response to therapy, or any phenotype or genotype defined independently of the gene expression profile.
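In class prediction the class labels (for example, responder versus non-responder) are defined outside the expression data, and a predictor is built from the expression profiles. The following is a minimal, hypothetical Python sketch of that setting, not the procedure used by the authors. It assumes two classes coded 0/1, selects the k genes with the largest absolute Welch t statistic, and classifies a new sample by its nearest class centroid; all function names are illustrative. It also shows one widely discussed pitfall in estimating a predictor's accuracy: under leave-one-out cross-validation, gene selection has to be repeated within each fold, because selecting genes once on all samples lets the held-out sample influence the predictor.

```python
import math
import random
from typing import Dict, List

Matrix = List[List[float]]  # rows = samples (arrays), columns = genes


def mean_var(values: List[float]):
    """Sample mean and unbiased variance (needs at least two values)."""
    m = sum(values) / len(values)
    return m, sum((v - m) ** 2 for v in values) / (len(values) - 1)


def t_statistic(a: List[float], b: List[float]) -> float:
    """Welch two-sample t statistic; 0.0 when both groups are constant."""
    (ma, va), (mb, vb) = mean_var(a), mean_var(b)
    se = math.sqrt(va / len(a) + vb / len(b))
    return (ma - mb) / se if se > 0 else 0.0


def select_genes(X: Matrix, y: List[int], k: int) -> List[int]:
    """Indices of the k genes whose |t| between class 0 and class 1 is largest."""
    scores = []
    for g in range(len(X[0])):
        a = [row[g] for row, lab in zip(X, y) if lab == 0]
        b = [row[g] for row, lab in zip(X, y) if lab == 1]
        scores.append((abs(t_statistic(a, b)), g))
    scores.sort(reverse=True)
    return [g for _, g in scores[:k]]


def fit_centroids(X: Matrix, y: List[int], genes: List[int]) -> Dict[int, List[float]]:
    """Mean expression of each class over the selected genes."""
    centroids = {}
    for label in (0, 1):
        rows = [row for row, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(r[g] for r in rows) / len(rows) for g in genes]
    return centroids


def predict(x: List[float], centroids: Dict[int, List[float]], genes: List[int]) -> int:
    """Assign x to the class with the nearest centroid (Euclidean distance)."""
    point = [x[g] for g in genes]
    return min(centroids, key=lambda lab: math.dist(point, centroids[lab]))


def loocv_accuracy(X: Matrix, y: List[int], k: int, select_inside: bool = True) -> float:
    """Leave-one-out accuracy of the whole prediction procedure.

    select_inside=True re-runs gene selection on each training set, so the
    held-out sample never influences which genes are used. select_inside=False
    selects genes once on all samples first, which leaks information.
    Each class must keep at least two training samples in every fold.
    """
    fixed_genes = None if select_inside else select_genes(X, y, k)
    correct = 0
    for i in range(len(X)):
        X_train, y_train = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        genes = select_genes(X_train, y_train, k) if select_inside else fixed_genes
        centroids = fit_centroids(X_train, y_train, genes)
        correct += predict(X[i], centroids, genes) == y[i]
    return correct / len(X)


if __name__ == "__main__":
    rng = random.Random(0)
    n_per_class, n_genes = 10, 500
    # Pure noise: the class labels carry no information about expression.
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_genes)] for _ in range(2 * n_per_class)]
    y = [0] * n_per_class + [1] * n_per_class
    print("selection inside each fold:", loocv_accuracy(X, y, k=10))
    print("selection before LOOCV:   ", loocv_accuracy(X, y, k=10, select_inside=False))
```

On the pure-noise data in the `__main__` block, the estimate with selection repeated inside each fold should typically fall near chance (about 0.5), whereas the variant that selects genes before cross-validation typically reports an optimistic accuracy even though no real signal exists. The nearest-centroid rule is only a stand-in; the same fold structure applies to any predictor.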
Oxford University Press