In bioinformatics, advances in the classification of biomedical documents are of interest because the classification task can help extract useful biomedical knowledge. In this paper we introduce a feature weighting method for improving biomedical text classification. The method induces weighted features from text data for classification. The weight of a feature is induced from the difference in class probability with versus without the feature. Specifically, in a simple inductive learning setting, the difference in class probability in the presence versus absence of feature fj can be a good metric for the contribution of fj to predicting the class. The technique assigns a weight to each term feature based on the class distribution in the presence versus absence of the term, without considering the term's frequency in each document. It is particularly suitable for biomedical text mining because it gives prominence to terms with low per-document frequency, and such terms play an important role in predicting the class in biomedical texts. We evaluate the method on six biomedical text datasets and compare it with the tf-idf technique and a baseline, with encouraging results. We further examine the predictiveness of terms with low average frequency and their effect on classification accuracy.
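As an illustration of the weighting idea, the following is a minimal sketch, assuming the weight of a term fj for a class c is the signed difference P(c | fj present) - P(c | fj absent), estimated from document counts. The abstract does not give the exact formula, so the function name `class_probability_difference`, the signed (rather than absolute) difference, the fallback to the class prior for terms that occur in every document, and the toy documents are all assumptions made for illustration.

```python
from collections import defaultdict

def class_probability_difference(docs, labels, target_class):
    """Weight each term by P(class | term present) - P(class | term absent).

    docs is a list of token lists; labels holds one class label per document.
    This is an illustrative estimate, not necessarily the paper's formula.
    """
    n_docs = len(docs)
    n_pos = sum(1 for y in labels if y == target_class)
    present = defaultdict(int)      # number of documents containing the term
    present_pos = defaultdict(int)  # ... that also belong to target_class

    for tokens, y in zip(docs, labels):
        for term in set(tokens):    # presence only; term frequency is ignored
            present[term] += 1
            if y == target_class:
                present_pos[term] += 1

    weights = {}
    for term, n_with in present.items():
        n_without = n_docs - n_with
        p_with = present_pos[term] / n_with
        if n_without:
            p_without = (n_pos - present_pos[term]) / n_without
        else:
            # Assumption: a term found in every document has no "absent"
            # estimate, so fall back to the class prior.
            p_without = n_pos / n_docs
        weights[term] = p_with - p_without
    return weights

# Toy corpus (hypothetical data).
docs = [
    ["brca1", "mutation", "patient"],
    ["brca1", "tumor", "patient"],
    ["insulin", "glucose", "patient"],
    ["insulin", "patient", "patient", "patient"],
]
labels = ["cancer", "cancer", "diabetes", "diabetes"]
weights = class_probability_difference(docs, labels, "cancer")
```

In this toy corpus, "brca1" appears once per document yet separates the classes completely, so it receives the largest weight, while "patient" is frequent but appears in every document and receives a weight of zero.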