Figure 2 Comparison of classification performance for different datasets. The y-axis shows the average error and the x-axis indicates the gene selection methods: PAM, SDDA, SLDA and SCRDA. Error bars (± 1.96 SE) are provided for the classification methods.

Discussion

Microarrays are capable of determining the expression levels of thousands of genes simultaneously and hold
great promise for facilitating the discovery of new biological knowledge [20]. A defining feature of microarray data is that the number of variables p (genes) far exceeds the number of samples N; in statistical terms, this is known as the ‘large p, small N’ problem. In this setting, standard statistical classification methods do
not work well, or at all, so existing statistical methods must be improved or modified to prevent over-fitting and produce more reliable estimates. Several ad hoc methods based on the idea of shrinkage have been proposed and have proved useful in empirical studies [21–23]. Distinguishing normal samples from tumor samples is essential for successful diagnosis or treatment of cancer. Another important problem is characterizing multiple types of tumors, and multiclass classification has recently received more attention in the context of DNA microarrays. In the present study, we first evaluated the performance of LDA and its modifications for classification on 6 public microarray datasets. The gene selection method [6, 24, 25], the number of selected genes and the classification method are three critical factors in the performance of sample classification. Feature selection techniques can be organized into three categories: filter methods, wrapper methods and embedded methods. LDA and its modification methods
belong to the wrapper methods, which embed the model hypothesis search within the feature subset search. In the present study, different numbers of genes were selected by the different LDA modification methods. There is no theoretical estimate of the optimal number of selected genes, and the optimal gene set can vary from dataset to dataset [26]. We therefore did not seek the optimal gene set obtained by pairing one particular gene selection method with one particular classification algorithm; instead, this paper describes the performance of LDA and its modifications under the same selection method across different microarray datasets. Various statistical and machine learning methods have been used to analyze high-dimensional data for cancer classification, and these methods have been shown to have statistical and clinical relevance in cancer detection for a variety of tumor types. In this study, the LDA modification methods performed better than traditional LDA under the same gene selection criterion.
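Why classical LDA breaks down in the ‘large p, small N’ setting, and how shrinkage helps, can be illustrated with a small synthetic example. The sketch below assumes NumPy and uses simulated data, not any of the six datasets analyzed here. It shows that the pooled within-class covariance matrix used by LDA has rank at most N − K (samples minus classes) when p > N, so it cannot be inverted. Shrinking it toward the identity, αΣ̂ + (1 − α)I, restores invertibility. This form of regularization resembles the covariance step of SCRDA, but the code is not the PAM, SDDA, SLDA or SCRDA implementation evaluated in this study; the helper names, the simulated signal and the value α = 0.5 are hypothetical choices for illustration.

```python
import numpy as np


def pooled_within_class_cov(X, y):
    """Pooled within-class covariance estimate used by classical LDA (p x p)."""
    classes = np.unique(y)
    n, p = X.shape
    scatter = np.zeros((p, p))
    for k in classes:
        centered = X[y == k] - X[y == k].mean(axis=0)
        scatter += centered.T @ centered
    return scatter / (n - len(classes))


def shrink_toward_identity(S, alpha):
    """Hypothetical helper: alpha * S + (1 - alpha) * I.

    S is positive semidefinite, so every eigenvalue of the result is at least
    1 - alpha; the matrix is therefore invertible for 0 <= alpha < 1.
    """
    return alpha * S + (1.0 - alpha) * np.eye(S.shape[0])


def fit_shrunken_lda(X, y, alpha=0.5):
    """Fit LDA with a shrunken covariance matrix (illustrative sketch only)."""
    classes = np.unique(y)
    means = np.array([X[y == k].mean(axis=0) for k in classes])
    priors = np.array([np.mean(y == k) for k in classes])
    S_inv = np.linalg.inv(shrink_toward_identity(pooled_within_class_cov(X, y), alpha))
    return classes, means, priors, S_inv


def predict(model, X):
    """Assign each row of X to the class with the largest linear discriminant score."""
    classes, means, priors, S_inv = model
    # delta_k(x) = x' S^-1 mu_k - 0.5 * mu_k' S^-1 mu_k + log(pi_k)
    scores = (X @ S_inv @ means.T
              - 0.5 * np.sum((means @ S_inv) * means, axis=1)
              + np.log(priors))
    return classes[np.argmax(scores, axis=1)]


# Simulated "large p, small N" data (hypothetical): 200 genes, 10 training
# samples per class, with the first 20 genes shifted upward in class 1.
rng = np.random.default_rng(0)
p, n_per_class, shift = 200, 10, 3.0


def simulate(n):
    X0 = rng.normal(size=(n, p))
    X1 = rng.normal(size=(n, p))
    X1[:, :20] += shift
    return np.vstack([X0, X1]), np.repeat([0, 1], n)


X_train, y_train = simulate(n_per_class)
X_test, y_test = simulate(50)

S = pooled_within_class_cov(X_train, y_train)
rank_S = np.linalg.matrix_rank(S)  # at most N - K = 18, far below p = 200
S_shrunk = shrink_toward_identity(S, 0.5)

model = fit_shrunken_lda(X_train, y_train, alpha=0.5)
accuracy = np.mean(predict(model, X_test) == y_test)
print(f"rank of pooled covariance: {rank_S} of {p}")
print(f"rank after shrinkage: {np.linalg.matrix_rank(S_shrunk)} of {p}")
print(f"test accuracy: {accuracy:.2f}")
```

Moving α toward 1 returns the estimate to the singular sample covariance, while α = 0 reduces the rule to a nearest-centroid classifier with Euclidean distance (for equal priors). In practice α would typically be tuned, for example by cross-validation, rather than fixed as it is here.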