18th IEEE Signal Processing and Communications Applications Conference, SIU 2010, Diyarbakır, Turkey, 22 - 24 April 2010, pp.929-932
Although protein classification for drug design has been one of the most widely studied areas in the past few years, high accuracy remains difficult to obtain. We used a feature weighting algorithm to represent the full set of required features. Because labeled data are scarce and supervised learning methods have high computational complexity, we developed a new semi-supervised learning algorithm that extends the Gaussian Random Field methodology and combines it with active query learning. The proposed approach is applied to data newly extracted from the DrugBank database, which contains nearly 4800 drug entries, including FDA-approved drugs and synthetic drugs, and 2640 non-drug proteins. We found that our new approach has better accuracy than the other traditional semi-supervised methods and lower computational complexity than the supervised methods. ©2010 IEEE.
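
For readers unfamiliar with the Gaussian Random Field framework that the abstract builds on, the following is a minimal sketch of the standard harmonic-function solution on a similarity graph, followed by a simple uncertainty-based query step. It is not the authors' extended algorithm: the function names `harmonic_labels` and `most_uncertain`, the binary 0/1 label encoding, and the use of plain uncertainty sampling as the active query rule are all assumptions made for illustration.

```python
import numpy as np

def harmonic_labels(W, labeled_idx, y_labeled):
    """Harmonic-function scores for unlabeled nodes (hypothetical helper).

    W           : symmetric (n, n) non-negative similarity matrix
    labeled_idx : indices of labeled nodes
    y_labeled   : their labels in {0, 1}, in the same order as labeled_idx
    """
    n = W.shape[0]
    labeled_idx = np.asarray(labeled_idx)
    unlabeled_idx = np.setdiff1d(np.arange(n), labeled_idx)

    # Graph Laplacian L = D - W, where D holds the node degrees.
    L = np.diag(W.sum(axis=1)) - W

    # Solve L_uu f_u = W_ul y_l for the unlabeled scores f_u.
    L_uu = L[np.ix_(unlabeled_idx, unlabeled_idx)]
    W_ul = W[np.ix_(unlabeled_idx, labeled_idx)]
    f_u = np.linalg.solve(L_uu, W_ul @ np.asarray(y_labeled, dtype=float))
    return unlabeled_idx, f_u

def most_uncertain(unlabeled_idx, f_u):
    """Assumed query rule: pick the unlabeled node whose score is closest to 0.5."""
    return int(unlabeled_idx[np.argmin(np.abs(f_u - 0.5))])

# Toy example: a 5-node chain graph with the two end nodes labeled.
W = np.zeros((5, 5))
for i in range(4):
    W[i, i + 1] = W[i + 1, i] = 1.0

unlabeled_idx, f_u = harmonic_labels(W, labeled_idx=[0, 4], y_labeled=[1, 0])
query = most_uncertain(unlabeled_idx, f_u)
```

In this toy graph the scores fall off linearly between the two labeled end nodes, so the middle node is the one selected for querying. In practice `W` would be built from the weighted protein features, for example with a Gaussian kernel, and the loop of solving and querying would repeat as new labels arrive.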