1. Create an Indonesian common word list from http://indodic.com/IndMostComList.html and save it to "indocommonlist.txt".
output:
ada adalah agar air akan akibat aku anak anda antara apa atas bagi bagian bagus bahkan bahwa baik banyak barang baru beberapa begitu belum benar bersama besar biasa bila bisa boleh buah bukan cara cepat cukup dalam dan dapat dari datang dengan depan di dia dilakukan diri dulu hal hanya harga hari harus hati hidup hingga ia ingin ini jadi jalan jangan jauh jelas jika juga jumlah juta kalau kali kami kamu karena kata ke kecil kecuali kembali kemudian kepada kepala kerja ketika khusus kini kita ku kurang lagi lain lalu lama langsung luar maka makan makanan makin malam mampu mana masalah masih masuk mata mau maupun melakukan melalui memang memberi memberikan membuat memiliki mencari mengatakan menjadi menurut merasa mereka merupakan mudah mulai mungkin nama namun nanti oleh orang pada paling perlu pernah pertama pulang punya pusat saat saja salah sama sambil sampai sangat saya sebab sebagai sebelum sebuah secara sedang sedikit segera sehingga sejak sekali sekalipun sekarang sekitar selalu selama seluruh sementara semua sendiri seperti sering serta sesuai sesuatu setelah setiap siap sini suatu sudah tahu tak tanpa tapi telah tempat tengah tentang terhadap terjadi terlalu termasuk tersebut terus tetapi tiba tidak tinggi uang untuk waktu yaitu yang

2. Separate each word into its letters, one dictionary entry per line (the word followed by its space-separated letters), and save the result to newcommonword_lex.dic. A small conversion sketch follows the example below.
ada a d a
adalah a d a l a h
agar a g a r
...
yang y a ng
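A minimal Python sketch of this conversion. Note it splits strictly letter by letter, so multi-letter phonemes such as the ng in the yang entry above would still need a manual pass afterwards.

# Turn each word into a "word l e t t e r s" dictionary line.
# Assumes indocommonlist.txt holds whitespace-separated words;
# digraphs like "ng" are NOT merged here.
with open("indocommonlist.txt") as src, open("newcommonword_lex.dic", "w") as dst:
    for word in src.read().split():
        dst.write(word + " " + " ".join(word) + "\n")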
3. Install g2p-seq2seq: https://github.com/cmusphinx/g2p-seq2seq
$ git clone https://github.com/cmusphinx/g2p-seq2seq.git
$ cd g2p-seq2seq
$ sudo pip install --upgrade https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.0.0-cp27-none-linux_x86_64.whl
$ sudo python setup.py install
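Optionally, confirm the pinned TensorFlow build imports cleanly before training:

$ python -c "import tensorflow as tf; print(tf.__version__)"

This should print 1.0.0, matching the wheel installed above.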
4. Train the new model:
$ g2p-seq2seq --train newcommonword_lex.dic --model commonwordmodeldic
Wait until training finishes.
output:
Preparing G2P data
Creating vocabulary commonwordmodeldic/vocab.phoneme
Creating vocabulary commonwordmodeldic/vocab.grapheme
Reading development and training data.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
Creating model with parameters:
Learning rate: 0.5
LR decay factor: 0.99
Max gradient norm: 5.0
Batch size: 64
Size of layer: 64
Number of layers: 2
Steps per checkpoint: 200
Max steps: 0
Optimizer: sgd
Created model with fresh parameters.
global step 200 learning rate 0.5000 step-time 0.08 perplexity 6.41
eval: perplexity 2.20
global step 400 learning rate 0.5000 step-time 0.08 perplexity 2.10
eval: perplexity 1.66
global step 600 learning rate 0.5000 step-time 0.08 perplexity 1.18
eval: perplexity 1.55
global step 800 learning rate 0.5000 step-time 0.05 perplexity 1.02
eval: perplexity 1.60
No improvement over last 1 times. Training will stop after -1 iterations if no improvement was seen.
Training done.
Loading vocabularies from commonwordmodeldic
Creating 2 layers of 64 units.
Reading model parameters from commonwordmodeldic
Beginning calculation word error rate (WER) on test sample.
Words: 20
Errors: 16
WER: 0.800
Accuracy: 0.200
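The reported WER is word-level: of the 20 held-out words, 16 predicted pronunciations differed from the reference, giving WER 16/20 = 0.800 and accuracy 0.200. With a training dictionary this small, a poor score is expected. The same metric in a short Python sketch (the data here is hypothetical):

# Word-level error rate as reported by g2p-seq2seq: a word counts as an
# error when its predicted phoneme string differs from the reference.
reference = {"aku": "a k u", "makan": "m a k a n"}   # hypothetical test set
predicted = {"aku": "a k u", "makan": "m a k a m"}   # hypothetical model output
errors = sum(predicted[w] != p for w, p in reference.items())
print("WER: %.3f" % (errors / float(len(reference))))  # 1 error / 2 words -> 0.500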
5. Run the new model interactively:
$ g2p-seq2seq --interactive --model commonwordmodeldic
input >> aku
output >> a k u
input >> memakan
output >> m e m a k a n
input >> makanan
output >> m a k a n a n
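Interactive mode is good for spot checks; for whole word lists, the g2p-seq2seq README also documents a --decode mode. A small Python wrapper as a sketch (the input and output file names are hypothetical):

import subprocess

# Batch counterpart of step 5: decode every word in a file in one call.
# The --decode and --output flags are taken from the g2p-seq2seq README.
subprocess.call(["g2p-seq2seq",
                 "--decode", "testwords.txt",        # hypothetical input: one word per line
                 "--model", "commonwordmodeldic",
                 "--output", "testwords_pron.txt"])  # hypothetical output path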
Repository: https://github.com/ardhimaarik/lexiconBIndo