Feature subset selection using Nomogram in Type II Diabetes databases
Balakrishnan, Sarojini; Narayanasamy, Ramaraj and Savarimuthu, Nickolas (2009) Feature subset selection using Nomogram in Type II Diabetes databases. Indian Journal of Medical Informatics, 4 (1). pp. 5-5. ISSN 0973-9254
Advances in data mining and machine learning have promoted computer-based approaches in medical applications, such as computer-aided diagnosis, expert systems and prognostic studies. Medical data are processed and analyzed using data mining techniques to derive useful knowledge. These data are multidimensional and are represented by a large number of features, some of which are irrelevant or redundant and may degrade the performance of data mining algorithms. Feature selection identifies the features that improve the predictive accuracy of classifiers. The proposed work focuses on identifying the significant features that influence the predictive accuracy of the Naïve Bayes classifier using a visualization tool, the nomogram. The effect of each feature on the classifier's performance is analyzed using the nomogram, and an optimal feature subset that enhances predictive accuracy is derived. The proposed method, Nomogram-RFE, is evaluated on the Pima Indians Diabetes Dataset, and the performance of the classifier is assessed on five criteria: classification accuracy, sensitivity, specificity, the area under the receiver operating characteristic (ROC) curve and the Brier score. The experimental results show that the derived optimal feature subset enhances the predictive power of the classifier and reduces false positive and false negative rates, as measured by the sensitivity and specificity of the classifier. A low Brier score for the optimal feature subset indicates a smaller deviation between the predicted probability and the actual outcome.
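The nomogram-guided elimination described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes the features have already been discretized into categories and uses a categorical naive Bayes model with Laplace smoothing. Each feature value's nomogram "points" are taken to be its log odds ratio, log P(v|1) − log P(v|0), and a feature's importance is assumed to be the length of its nomogram axis (the spread of those points). Each round removes the feature with the shortest axis while tracking held-out accuracy, and the five evaluation criteria are computed in plain Python. All function names (`train_nb`, `nomogram_span`, `nomogram_rfe`, and so on) are hypothetical.

```python
import math
from collections import Counter


def train_nb(rows, labels, features, alpha=1.0):
    """Fit a categorical naive Bayes model with Laplace smoothing.

    rows: list of dicts mapping feature name -> discrete (binned) value.
    labels: list of 0/1 class labels; both classes must be present.
    """
    prior = Counter(labels)
    counts = {f: {0: Counter(), 1: Counter()} for f in features}
    values = {f: set() for f in features}
    for row, y in zip(rows, labels):
        for f in features:
            counts[f][y][row[f]] += 1
            values[f].add(row[f])
    return {"prior": prior, "counts": counts, "values": values,
            "features": list(features), "alpha": alpha}


def log_odds_ratio(model, f, v):
    """Nomogram points for value v of feature f: log P(v|1) - log P(v|0)."""
    a, k = model["alpha"], len(model["values"][f])
    p1 = (model["counts"][f][1][v] + a) / (model["prior"][1] + a * k)
    p0 = (model["counts"][f][0][v] + a) / (model["prior"][0] + a * k)
    return math.log(p1 / p0)


def predict_proba(model, row):
    """P(class 1 | row): prior log odds plus the sum of per-feature points."""
    score = math.log(model["prior"][1] / model["prior"][0])
    score += sum(log_odds_ratio(model, f, row[f]) for f in model["features"])
    score = max(-50.0, min(50.0, score))  # avoid overflow in exp()
    return 1.0 / (1.0 + math.exp(-score))


def nomogram_span(model, f):
    """Length of feature f's nomogram axis (max minus min log odds ratio)."""
    points = [log_odds_ratio(model, f, v) for v in model["values"][f]]
    return max(points) - min(points)


def evaluate(model, rows, labels, threshold=0.5):
    """Accuracy, sensitivity, specificity, ROC AUC and Brier score."""
    probs = [predict_proba(model, r) for r in rows]
    preds = [1 if p >= threshold else 0 for p in probs]
    pairs = list(zip(preds, labels))
    tp, tn = pairs.count((1, 1)), pairs.count((0, 0))
    fp, fn = pairs.count((1, 0)), pairs.count((0, 1))
    pos = [p for p, y in zip(probs, labels) if y == 1]
    neg = [p for p, y in zip(probs, labels) if y == 0]
    # Rank-based AUC: fraction of (positive, negative) pairs ranked correctly.
    auc = (sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
           / (len(pos) * len(neg))) if pos and neg else float("nan")
    return {
        "accuracy": (tp + tn) / len(labels),
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "auc": auc,
        "brier": sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels),
    }


def nomogram_rfe(train_rows, train_y, test_rows, test_y, features):
    """Backward elimination guided by nomogram axis length.

    Each round refits the model, scores the current subset on held-out data,
    then drops the feature with the shortest nomogram axis. Ties in accuracy
    favour the smaller subset because later (smaller) subsets win on >=.
    """
    current, best_subset, best_acc = list(features), None, -1.0
    while current:
        model = train_nb(train_rows, train_y, current)
        acc = evaluate(model, test_rows, test_y)["accuracy"]
        if acc >= best_acc:
            best_subset, best_acc = list(current), acc
        current.remove(min(current, key=lambda f: nomogram_span(model, f)))
    return best_subset, best_acc
```

The sketch scores each subset on a single held-out split for brevity. A more robust variant would average the metrics over cross-validation folds. The abstract does not state which evaluation protocol the authors used.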