Tuesday, August 20, 2013

"Mismatch String Kernels" in action

If you are just lucky enough and the input points to your classification / discrimination problem admit a natural encoding as strings over some finite alphabet A (this is usually the case in protein discrimination, certain problems in NLP, etc.), then Mismatch String Kernels are good for your business. They were first introduced by C. Leslie and colleagues in their pioneering article.

I've working implementations both in C++ and Python. Checkout the code on my github and have fun. The code fits (learns) such a kernel. In a subsequent update (time is scarce), I'll implement code for prediction using the fitted kernel.