Monday, June 11, 2012

Practical Hidden Markov Models

Clone the code from here.

Load the hmm.R script in an R interpreter and run the martian() bot. It trains a hidden Markov model on English text from the Brown Corpus (corpus.R) using the Baum-Welch algorithm, then uses the Viterbi algorithm to classify each letter as a vowel or a consonant. Cool :)
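For readers who want to see what the Viterbi step does, here is a minimal sketch in Python. It is not the code in hmm.R. The function name, the three-letter toy alphabet, and all the probabilities are made up for illustration, assuming a two-state model where state 0 plays the consonant role and state 1 the vowel role.

```python
import math

def _log(x):
    # log(0) is treated as -infinity so impossible paths never win
    return math.log(x) if x > 0 else float("-inf")

def viterbi(obs, pi, A, B):
    """Most likely hidden-state path for a list of observation indices.

    pi[s]   : probability of starting in state s
    A[r][s] : probability of moving from state r to state s
    B[s][o] : probability that state s emits symbol o
    """
    n_states = len(pi)
    # delta[s] = best log-probability of any path that ends in state s
    delta = [_log(pi[s]) + _log(B[s][obs[0]]) for s in range(n_states)]
    back = []  # back[t][s] = best predecessor of state s at step t + 1
    for o in obs[1:]:
        prev, delta, pointers = delta, [], []
        for s in range(n_states):
            best = max(range(n_states), key=lambda r: prev[r] + _log(A[r][s]))
            pointers.append(best)
            delta.append(prev[best] + _log(A[best][s]) + _log(B[s][o]))
        back.append(pointers)
    # Start from the best final state and follow the back-pointers
    state = max(range(n_states), key=lambda s: delta[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path

# Hypothetical toy model over the alphabet "abc" (values are illustrative only)
SYMBOLS = "abc"
PI = [0.8, 0.2]
A = [[0.3, 0.7],
     [0.8, 0.2]]
B = [[0.02, 0.49, 0.49],   # state 0 mostly emits "b" and "c"
     [0.90, 0.05, 0.05]]   # state 1 mostly emits "a"

def classify(word):
    # Map letters to indices, decode, and label each letter by its state
    states = viterbi([SYMBOLS.index(ch) for ch in word], PI, A, B)
    return ["vowel" if s == 1 else "consonant" for s in states]
```

Calling classify("bab") decodes the whole word at once, while classify("a") decodes a single letter on its own, in which case only the initial distribution and the emission probabilities matter.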

[snip]



$loglikelihood
[1] -17836.16
$relativegain
[1] 9.147438e-10
[1] "CONVERGED."
[1] "< done."
[1] "Hi, I'm the Martian. This is what I've learnt about your language:"
[1] "First of all, there are two kinds of letters: I'll call them vowels and consonants."
[1] "a is a vowel"
[1] "b is a consonant"
[1] "c is a consonant"
[1] "d is a consonant"
[1] "e is a vowel"
[1] "f is a consonant"
[1] "g is a consonant"
[1] "h is a consonant"
[1] "i is a vowel"
[1] "j is a consonant"
[1] "k is a consonant"
[1] "l is a consonant"
[1] "m is a consonant"
[1] "n is a consonant"
[1] "o is a vowel"
[1] "p is a consonant"
[1] "q is a consonant"
[1] "r is a consonant"
[1] "s is a consonant"
[1] "t is a consonant"
[1] "u is a vowel"
[1] "v is a consonant"
[1] "w is a consonant"
[1] "x is a consonant"
[1] "y is a vowel"
[1] "z is a consonant"
>

[snip]

Gimme feedback, please.
-d0p

2 comments:

  1. Check out the quick-and-dirty C++ implementation (still in development) at https://github.com/half-jiffie/HMM

  2. likelihood = -17437.1
    relative gain = 9.80367e-10
    CONVERGED.

    Final HMM:
    transition = [2,2]((0.29027,0.70973),(0.841361,0.158639))

    emission = [2,26]((1.6928e-15,0.0301392,0.068183,0.0725573,3.31442e-31,0.0290229,0.0192472,0.0485363,2.41306e-16,0.00390693,0.00900282,0.0870687,0.0471622,0.125859,4.27611e-10,0.0431378,0.00279066,0.121952,0.121542,0.100047,2.27059e-09,0.0226044,0.0301392,0.00362786,0.00956665,0.00390693),(0.180649,1.17482e-36,0.000628889,4.1488e-11,0.298202,8.95009e-22,0.0240477,0.00265756,0.17464,4.29656e-110,0.00741355,1.32341e-11,1.16758e-43,1.82914e-30,0.15511,0.00616722,6.58218e-95,4.68384e-37,0.0174519,0.0294804,0.0713582,1.07888e-90,5.28468e-41,4.38001e-49,0.0321935,2.48544e-13))

    pi = [2](0.786165,0.213835)

    Viterbi classification of the 26 symbols (i.e. the letters of the English alphabet):
    A is a vowel
    B is a consonant
    C is a consonant
    D is a consonant
    E is a vowel
    F is a consonant
    G is a consonant
    H is a consonant
    I is a vowel
    J is a consonant
    K is a consonant
    L is a consonant
    M is a consonant
    N is a consonant
    O is a vowel
    P is a consonant
    Q is a consonant
    R is a consonant
    S is a consonant
    T is a consonant
    U is a vowel
    V is a consonant
    W is a consonant
    X is a consonant
    Y is a consonant
    Z is a consonant
