TY - UNPB
T1 - Comparative Data Mining for Microarrays: A Case Study Based on Multiple Myeloma
AU - Page, David
AU - Zhan, Fenghuang
AU - Cussens, James
AU - Waddell, Michael
AU - Hardin, Johanna
AU - Barlogie, Bart
AU - Shaughnessy, Jr., John
PY - 2002/11/1
Y1 - 2002/11/1
N2 - Supervised machine learning and data mining tools have become popular for the analysis of gene expression microarray data. They have the potential to uncover new therapeutic targets for diseases, to predict how patients will respond to specific treatments, and to uncover regulatory relationships among genes in normal and disease situations. Comparative experiments are needed to identify the advantages of the leading supervised learning algorithms for microarray data, as well as to give direction in methodological decisions. This paper compares support vector machines, Bayesian networks, decision trees, boosted decision trees, and voting (ensembles of decision stumps) on a new microarray data set for cancer with over 100 samples. The paper provides evidence for several important lessons for mining microarray data, including: (1) Bayes nets and ensembles perform at least as well as other approaches but arguably provide more direct insight; (2) the common practice of throwing out low or negative average differences, or those accompanied by an absent call, is a mistake; (3) looking for consistent differences in expression may be more important than large differences.
M3 - Working paper
BT - Comparative Data Mining for Microarrays: A Case Study Based on Multiple Myeloma
ER -