Load the hmm.R script in an R interpreter and run the martian() bot. It learns some basic English from the Brown Corpus (corpus.R) using the Baum-Welch algorithm, then uses the Viterbi algorithm to classify each letter as a vowel or a consonant. Cool :)
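For readers who want to see the training loop without opening hmm.R, here is a minimal pure-Python sketch of Baum-Welch for a two-state HMM over the 26 letters. It is not the code from hmm.R: the function names (`forward_backward`, `baum_welch_step`, `train`), the random initialisation, the toy sentence standing in for the Brown Corpus, and the stopping rule (relative change in log-likelihood below `tol`) are all assumptions made for illustration.

```python
import math
import random

def forward_backward(obs, pi, a, b):
    """Scaled forward-backward pass. Returns (alpha, beta, scales, loglik)."""
    n, T = len(pi), len(obs)
    alpha, scales = [], []
    for t in range(T):
        if t == 0:
            row = [pi[s] * b[s][obs[0]] for s in range(n)]
        else:
            row = [sum(alpha[t - 1][r] * a[r][s] for r in range(n)) * b[s][obs[t]]
                   for s in range(n)]
        c = sum(row)                      # scaling factor for step t
        alpha.append([x / c for x in row])
        scales.append(c)
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(a[s][r] * b[r][obs[t + 1]] * beta[t + 1][r] for r in range(n))
                   / scales[t + 1] for s in range(n)]
    return alpha, beta, scales, sum(math.log(c) for c in scales)

def baum_welch_step(obs, pi, a, b):
    """One EM update. Returns new (pi, a, b) and the log-likelihood of the old model."""
    n, m, T = len(pi), len(b[0]), len(obs)
    alpha, beta, scales, loglik = forward_backward(obs, pi, a, b)
    # gamma[t][s]: posterior probability of being in state s at time t
    gamma = [[alpha[t][s] * beta[t][s] for s in range(n)] for t in range(T)]
    new_a = [[0.0] * n for _ in range(n)]
    for t in range(T - 1):
        for r in range(n):
            for s in range(n):
                # expected number of r -> s transitions at time t
                new_a[r][s] += (alpha[t][r] * a[r][s] * b[s][obs[t + 1]]
                                * beta[t + 1][s] / scales[t + 1])
    new_a = [[x / sum(row) for x in row] for row in new_a]
    new_b = [[0.0] * m for _ in range(n)]
    for t in range(T):
        for s in range(n):
            new_b[s][obs[t]] += gamma[t][s]
    new_b = [[x / sum(row) for x in row] for row in new_b]
    return gamma[0], new_a, new_b, loglik

def train(obs, n_states, n_symbols, tol=1e-9, max_iter=200, seed=0):
    """Iterate EM until the relative log-likelihood gain drops below tol (assumed rule)."""
    rng = random.Random(seed)

    def rand_row(k):
        row = [rng.random() + 0.5 for _ in range(k)]
        return [x / sum(row) for x in row]

    pi = rand_row(n_states)
    a = [rand_row(n_states) for _ in range(n_states)]
    b = [rand_row(n_symbols) for _ in range(n_states)]
    history = []
    for _ in range(max_iter):
        pi, a, b, loglik = baum_welch_step(obs, pi, a, b)
        converged = bool(history) and abs((loglik - history[-1]) / history[-1]) < tol
        history.append(loglik)
        if converged:
            break
    return pi, a, b, history

# Toy stand-in for the corpus: keep lowercase letters only, map a..z to 0..25.
text = "the martian learns some basic english from a very small sample of text"
obs = [ord(ch) - ord("a") for ch in text if "a" <= ch <= "z"]
pi, a, b, history = train(obs, n_states=2, n_symbols=26)
print("iterations:", len(history), "final log-likelihood:", history[-1])
```

On a sample this small the two hidden states need not line up with vowels and consonants; the point is only the shape of the loop, with a log-likelihood that never decreases from one iteration to the next.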
[snip]
[snip]
$loglikelihood
[1] -17836.16
$relativegain
[1] 9.147438e-10
[1] "CONVERGED."
[1] "< done."
[1] "Hi, I'm the Martian. This is what I've learnt about your language:"
[1] "First of all, there are two kinds of letters: I'll call them vowels and consonants."
[1] "a is a vowel"
[1] "b is a consonant"
[1] "c is a consonant"
[1] "d is a consonant"
[1] "e is a vowel"
[1] "f is a consonant"
[1] "g is a consonant"
[1] "h is a consonant"
[1] "i is a vowel"
[1] "j is a consonant"
[1] "k is a consonant"
[1] "l is a consonant"
[1] "m is a consonant"
[1] "n is a consonant"
[1] "o is a vowel"
[1] "p is a consonant"
[1] "q is a consonant"
[1] "r is a consonant"
[1] "s is a consonant"
[1] "t is a consonant"
[1] "u is a vowel"
[1] "v is a consonant"
[1] "w is a consonant"
[1] "x is a consonant"
[1] "y is a vowel"
[1] "z is a consonant"
>
[snip]
Gimme feedback, please.
-d0p
Check out the quick-and-dirty C++ implementation (still in dev.) at https://github.com/half-jiffie/HMM
likelihood = -17437.1
relative gain = 9.80367e-10
CONVERGED.
Final HMM:
transition = [2,2]((0.29027,0.70973),(0.841361,0.158639))
emission = [2,26]((1.6928e-15,0.0301392,0.068183,0.0725573,3.31442e-31,0.0290229,0.0192472,0.0485363,2.41306e-16,0.00390693,0.00900282,0.0870687,0.0471622,0.125859,4.27611e-10,0.0431378,0.00279066,0.121952,0.121542,0.100047,2.27059e-09,0.0226044,0.0301392,0.00362786,0.00956665,0.00390693),(0.180649,1.17482e-36,0.000628889,4.1488e-11,0.298202,8.95009e-22,0.0240477,0.00265756,0.17464,4.29656e-110,0.00741355,1.32341e-11,1.16758e-43,1.82914e-30,0.15511,0.00616722,6.58218e-95,4.68384e-37,0.0174519,0.0294804,0.0713582,1.07888e-90,5.28468e-41,4.38001e-49,0.0321935,2.48544e-13))
pi = [2](0.786165,0.213835)
Viterbi classification of the 26 symbols (cf. letters of the English alphabet):
A is a vowel
B is a consonant
C is a consonant
D is a consonant
E is a vowel
F is a consonant
G is a consonant
H is a consonant
I is a vowel
J is a consonant
K is a consonant
L is a consonant
M is a consonant
N is a consonant
O is a vowel
P is a consonant
Q is a consonant
R is a consonant
S is a consonant
T is a consonant
U is a vowel
V is a consonant
W is a consonant
X is a consonant
Y is a consonant
Z is a consonant
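The classification above can be re-derived from the "Final HMM" numbers. The Python sketch below is not the C++ code from the repository. It assumes that each letter is decoded as a one-symbol sequence, that the 26 emission columns are ordered a to z, and that the second state is the "vowel" state because it puts the most weight on E. The `viterbi` function and the `LABELS` list are names made up for this example.

```python
import math
import string

# Parameters copied from the "Final HMM" dump above.
PI = [0.786165, 0.213835]
A = [[0.29027, 0.70973], [0.841361, 0.158639]]
B = [
    [1.6928e-15, 0.0301392, 0.068183, 0.0725573, 3.31442e-31, 0.0290229,
     0.0192472, 0.0485363, 2.41306e-16, 0.00390693, 0.00900282, 0.0870687,
     0.0471622, 0.125859, 4.27611e-10, 0.0431378, 0.00279066, 0.121952,
     0.121542, 0.100047, 2.27059e-09, 0.0226044, 0.0301392, 0.00362786,
     0.00956665, 0.00390693],
    [0.180649, 1.17482e-36, 0.000628889, 4.1488e-11, 0.298202, 8.95009e-22,
     0.0240477, 0.00265756, 0.17464, 4.29656e-110, 0.00741355, 1.32341e-11,
     1.16758e-43, 1.82914e-30, 0.15511, 0.00616722, 6.58218e-95, 4.68384e-37,
     0.0174519, 0.0294804, 0.0713582, 1.07888e-90, 5.28468e-41, 4.38001e-49,
     0.0321935, 2.48544e-13],
]
# Assumption: state 0 is "consonant", state 1 is "vowel" (it favours E).
LABELS = ["consonant", "vowel"]

def viterbi(obs, pi, a, b):
    """Most likely hidden-state path for a list of symbol indices (log-space)."""
    n = len(pi)
    score = [math.log(pi[s]) + math.log(b[s][obs[0]]) for s in range(n)]
    back = []
    for o in obs[1:]:
        prev = score
        # best predecessor of each state s
        ptr = [max(range(n), key=lambda r: prev[r] + math.log(a[r][s]))
               for s in range(n)]
        score = [prev[ptr[s]] + math.log(a[ptr[s]][s]) + math.log(b[s][o])
                 for s in range(n)]
        back.append(ptr)
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):            # follow back-pointers to the start
        state = ptr[state]
        path.append(state)
    return path[::-1]

# Decode every letter as a sequence of length one.
classes = {
    letter: LABELS[viterbi([k], PI, A, B)[0]]
    for k, letter in enumerate(string.ascii_uppercase)
}
vowels = [letter for letter, label in classes.items() if label == "vowel"]
print(vowels)  # ['A', 'E', 'I', 'O', 'U']
```

For a single symbol, Viterbi reduces to comparing pi[s] * emission[s][k] across the two states. That explains Y: the second state emits it with higher probability (0.0321935 vs 0.00956665), but the prior of 0.786165 on the first state tips it to "consonant", by a narrow margin. The R run above put y with the vowels instead.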