Feature subset selection using Nomogram in Type II Diabetes databases

Balakrishnan, Sarojini; Narayanasamy, Ramaraj and Savarimuthu, Nickolas (2009) Feature subset selection using Nomogram in Type II Diabetes databases. Indian Journal of Medical Informatics, 4 (1). pp. 5-5. ISSN 0973-9254

Abstract

Advances in data mining and machine learning have promoted computer-based approaches such as computer-aided diagnosis, expert systems and prognostic studies in medical applications. Medical data are processed and analyzed using data mining techniques to derive useful knowledge. These data are multidimensional and represented by a large number of features. Irrelevant and redundant features among them may degrade the performance of data mining algorithms. Feature selection identifies the features that improve the predictive accuracy of classifiers. The proposed work focuses on identifying the significant features that influence the predictive accuracy of the Naïve Bayes Classifier using a visualization tool, the nomogram. The effect of each feature on the performance of the classifier is analyzed using the nomogram, and an optimal feature subset that enhances the predictive accuracy is derived. The proposed method, Nomogram-RFE, is evaluated on the Pima Indians Diabetes Dataset, and the performance of the classifier is assessed on five criteria: classification accuracy, sensitivity, specificity, area under the receiver operating characteristic curve (AUC) and Brier score. The experimental results show that the derived optimal feature subset enhances the predictive power of the classifier and reduces false positive and false negative rates, as reflected in the classifier's specificity and sensitivity. A low Brier score for the optimal feature subset indicates a smaller deviation between the predicted probabilities and the actual outcomes.
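
The five evaluation criteria named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function name `evaluate`, the 0.5 decision threshold and the toy labels and probabilities are all assumptions made for illustration, and no value shown is a result from the paper.

```python
def evaluate(y_true, y_prob, threshold=0.5):
    """Compute the five criteria for binary labels (1 = positive class).

    y_true: list of 0/1 outcomes; y_prob: predicted probability of class 1.
    The threshold of 0.5 is an assumption, not taken from the paper.
    """
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    # AUC as the chance that a random positive case is scored above a
    # random negative case, counting ties as one half.
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in pos for b in neg)

    n = len(y_true)
    return {
        "accuracy": (tp + tn) / n,
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "auc": wins / (len(pos) * len(neg)),
        # Brier score: mean squared gap between probability and outcome.
        "brier": sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / n,
    }


# Hypothetical toy data, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_prob = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7]
metrics = evaluate(y_true, y_prob)
```

Higher accuracy, sensitivity, specificity and AUC are better, while a lower Brier score is better, which is why the abstract reads a low Brier score as a sign of well-calibrated probabilities.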

EPrint Type: Article
Uncontrolled Keywords: Feature selection; nomogram; Naïve Bayes Classifier; accuracy; sensitivity; specificity; AUC; Brier score
Subjects: Investigative Techniques > Epidemiologic Methods > Statistics as Topic > Probability > Bayes Theorem
-Journal Repositories > Indian Journal of Medical Informatics
Information Science > Medical Informatics
Information Science > Classification
Information Science > Pattern Recognition
Information Science > Computing Methodologies > Artificial Intelligence > Knowledge Bases
ID Code: 3447
Deposited By: Dr. S N Sarbadhikari
Deposited On: 10 November 2009
