============================Setup Environment==============================
1. download the source archives listed at http://cmusphinx.sourceforge.net/wiki/tutorialoverview
a. https://sourceforge.net/projects/cmusphinx/files/sphinxbase/5prealpha/
b. https://sourceforge.net/projects/cmusphinx/files/pocketsphinx/5prealpha/
c. https://sourceforge.net/projects/cmusphinx/files/sphinx4/5prealpha/
d. https://sourceforge.net/projects/cmusphinx/files/sphinxtrain/5prealpha/
2. put all of the files in one folder (root), and extract them all
3. create a project folder inside the same root folder
4. install all required packages and libraries
$ sudo apt-get install autoconf
$ sudo apt-get install libtool-bin
$ sudo apt-get autoremove automake
$ sudo apt-get install automake
$ sudo apt-get install bison
$ sudo apt-get install swig
5. go to the sphinxbase folder
$ ./autogen.sh
$ ./configure
$ make
$ sudo make install
6. go to the pocketsphinx folder
$ ./configure
$ make
$ sudo make install
7. go to the sphinxtrain folder
$ ./configure
$ make
$ sudo make install
8. set the library paths
$ export LD_LIBRARY_PATH=/usr/local/lib
$ export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig
9. test your sphinx installation to make sure the app runs well
$ pocketsphinx_continuous -inmic yes
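The build steps above only work when the tools from step 4 are on the PATH. A minimal sketch of a pre-flight check follows; the function name missing_tools is hypothetical, and the tool list is an assumption based on the packages named in step 4.

```shell
#!/bin/sh
# Print every command from the argument list that is not found on PATH.
# missing_tools is a hypothetical helper name, not part of CMU Sphinx.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
        fi
    done
}

# Tools assumed to come from the packages installed in step 4.
missing=$(missing_tools autoconf libtool automake bison swig)
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line.
    echo "missing build tools:" $missing
else
    echo "all build tools found"
fi
```

After step 7 the same function can be called with pocketsphinx_continuous and sphinxtrain as arguments to confirm that the install steps put them on the PATH.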
_________________________________________________________________________________
note:
if you get a problem with the audio device, such as
failed to open audio device
you can install the audio packages:
$ sudo apt-get install pulseaudio
$ sudo apt-get install libpulse-dev
$ sudo apt-get install osspd
_________________________________________________________________________________
if the app runs well, the terminal will show output like this:
INFO: feat.c(715): Initializing feature stream to type: '1s_c_d_dd', ceplen=13, CMN='batch', VARNORM='no', AGC='none'
INFO: acmod.c(162): Using subvector specification 0-12/13-25/26-38
INFO: mdef.c(518): Reading model definition: /usr/local/share/pocketsphinx/model/en-us/en-us/mdef
INFO: mdef.c(531): Found byte-order mark BMDF, assuming this is a binary mdef file
INFO: bin_mdef.c(336): Reading binary model definition: /usr/local/share/pocketsphinx/model/en-us/en-us/mdef
INFO: bin_mdef.c(516): 42 CI-phone, 137053 CD-phone, 3 emitstate/phone, 126 CI-sen, 5126 Sen, 29324 Sen-Seq
INFO: tmat.c(149): Reading HMM transition probability matrices: /usr/local/share/pocketsphinx/model/en-us/en-us/transition_matrices
INFO: acmod.c(113): Attempting to use PTM computation module
INFO: ms_gauden.c(127): Reading mixture gaussian parameter: /usr/local/share/pocketsphinx/model/en-us/en-us/means
INFO: ms_gauden.c(242): 42 codebook, 3 feature, size:
INFO: ms_gauden.c(244): 128x13
INFO: ms_gauden.c(244): 128x13
INFO: ms_gauden.c(244): 128x13
INFO: ms_gauden.c(127): Reading mixture gaussian parameter: /usr/local/share/pocketsphinx/model/en-us/en-us/variances
INFO: ms_gauden.c(242): 42 codebook, 3 feature, size:
INFO: ms_gauden.c(244): 128x13
INFO: ms_gauden.c(244): 128x13
INFO: ms_gauden.c(244): 128x13
INFO: ms_gauden.c(304): 222 variance values floored
INFO: ptm_mgau.c(476): Loading senones from dump file /usr/local/share/pocketsphinx/model/en-us/en-us/sendump
INFO: ptm_mgau.c(500): BEGIN FILE FORMAT DESCRIPTION
INFO: ptm_mgau.c(563): Rows: 128, Columns: 5126
INFO: ptm_mgau.c(595): Using memory-mapped I/O for senones
INFO: ptm_mgau.c(838): Maximum top-N: 4
INFO: phone_loop_search.c(114): State beam -225 Phone exit beam -225 Insertion penalty 0
INFO: dict.c(320): Allocating 138824 * 32 bytes (4338 KiB) for word entries
INFO: dict.c(333): Reading main dictionary: /usr/local/share/pocketsphinx/model/en-us/cmudict-en-us.dict
INFO: dict.c(213): Dictionary size 134723, allocated 1016 KiB for strings, 1679 KiB for phones
INFO: dict.c(336): 134723 words read
INFO: dict.c(358): Reading filler dictionary: /usr/local/share/pocketsphinx/model/en-us/en-us/noisedict
INFO: dict.c(213): Dictionary size 134728, allocated 0 KiB for strings, 0 KiB for phones
INFO: dict.c(361): 5 words read
INFO: dict2pid.c(396): Building PID tables for dictionary
INFO: dict2pid.c(406): Allocating 42^3 * 2 bytes (144 KiB) for word-initial triphones
INFO: dict2pid.c(132): Allocated 42672 bytes (41 KiB) for word-final triphones
INFO: dict2pid.c(196): Allocated 42672 bytes (41 KiB) for single-phone word triphones
INFO: ngram_model_trie.c(354): Trying to read LM in trie binary format
INFO: ngram_search_fwdtree.c(74): Initializing search tree
INFO: ngram_search_fwdtree.c(101): 791 unique initial diphones
INFO: ngram_search_fwdtree.c(186): Creating search channels
INFO: ngram_search_fwdtree.c(323): Max nonroot chan increased to 152609
INFO: ngram_search_fwdtree.c(333): Created 723 root, 152481 non-root channels, 53 single-phone words
INFO: ngram_search_fwdflat.c(157): fwdflat: min_ef_width = 4, max_sf_win = 25
INFO: continuous.c(307): pocketsphinx_continuous COMPILED ON: Dec 6 2016, AT: 11:17:29
INFO: continuous.c(252): Ready....
INFO: continuous.c(261): Listening...
========================Installation Complete: Setup Environment===================
============================Building dictionary with Phonetisaurus================
https://sourceforge.net/p/kaldi/mailman/message/29344614/
http://www.openfst.org/twiki/bin/view/FST/FstDownload
extract; ./configure; make; sudo make install
https://github.com/AdolfVonKleist/Phonetisaurus
1. download openfst from:
http://www.openfst.org/twiki/bin/view/FST/FstDownload
2. extract openfst
$ tar -xvzf openfst-1.5.4.tar.gz
3. install openfst
$ cd openfst-1.5.4
$ ./configure --enable-static --enable-shared --enable-far --enable-lookahead-fsts --enable-const-fsts --enable-pdt --enable-ngram-fsts --enable-linear-fsts
$ make
$ sudo make install
note: the files will be installed in /usr/local/include and /usr/local/lib
4. if you want to install into a local directory instead, configure with a prefix
$ ./configure --prefix=/home/you/usr
$ make
$ make install
note: the files will be installed under /home/you/usr
5. install Phonetisaurus
$ git clone -b openfst-1.5.3 https://github.com/AdolfVonKleist/Phonetisaurus.git Phonetisaurus-openfst-1.5.3
$ cd Phonetisaurus-openfst-1.5.3
$ cd src
$ ./configure
$ cd .autoconf
$ autoconf -o ./configure
$ cd ../
$ make -j2 all
$ sudo make install
$ export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/local/lib
note: replace "/usr/local/lib" with your own library path if you installed elsewhere
6. test run Phonetisaurus
$ bin/phonetisaurus-align --help
if it is installed correctly, you will get output like this:
GitRevision: phonetisaurus-align --input=dictionary --ofile=corpus.
Usage:
--delim: type = string, default = " "
  Delimiter separating entry one and entry two in the input file.
--eps: type = string, default = "<eps>"
  Epsilon symbol.
--fb: type = bool, default = false
  Use forward-backward pruning for the alignment lattices.
--input: type = string, default = ""
  Two-column input file to align.
--iter: type = int32, default = 11
  Maximum number of EM iterations to perform.
--lattice: type = bool, default = false
  Write out the alignment lattices as an fst archive (.far).
--load_model: type = bool, default = false
  Load a pre-trained model for use.
--mbr: type = bool, default = false
  Use the LMBR decoder (not yet implemented).
--model_file: type = string, default = ""
  FST-format alignment model to load.
--nbest: type = int32, default = 1
  Output the N-best alignments given the model.
--ofile: type = string, default = ""
  Output file to write the aligned dictionary to.
--penalize: type = bool, default = true
  Penalize scores.
--penalize_em: type = bool, default = false
  Penalize links during EM training.
--pthresh: type = double, default = -99
  Pruning threshold. Use to prune unlikely N-best candidates when using multiple alignments.
--restrict: type = bool, default = true
  Restrict links to M-1, 1-N during initialization.
--s1_char_delim: type = string, default = ""
  Sequence one input delimeter.
--s1s2_sep: type = string, default = "}"
  Token used to separate input-output subsequences in the g2p model.
--s2_char_delim: type = string, default = " "
  Sequence two input delimeter.
--seq1_del: type = bool, default = true
  Allow deletions in sequence one.
--seq1_max: type = int32, default = 2
  Maximum subsequence length for sequence one.
--seq1_sep: type = string, default = "|"
  Multi-token separator for input tokens.
--seq2_del: type = bool, default = true
  Allow deletions in sequence two.
--seq2_max: type = int32, default = 2
  Maximum subsequence length for sequence two.
--seq2_sep: type = string, default = "|"
  Multi-token separator for output tokens.
--skip: type = string, default = "_"
  Skip token used to represent null transitions. Distinct from epsilon.
--thresh: type = double, default = 1e-10
  Delta threshold for EM training termination.
--write_model: type = string, default = ""
  Write out the alignment model in OpenFst format to filename.
--help: type = bool, default = false
  show usage information
--helpshort: type = bool, default = false
  show brief usage information
--tmpdir: type = string, default = "/tmp"
  temporary directory
--v: type = int32, default = 0
  verbose level
--help: type = bool, default = false
  show usage information
=================Building Dictionary with g2p-seq2seq===========================
1. clone this repo
$ git clone https://github.com/cmusphinx/g2p-seq2seq.git
$ cd g2p-seq2seq
2. install TensorFlow
$ sudo pip install --upgrade https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-0.9.0-cp27-none-linux_x86_64.whl
3. install g2p-seq2seq
$ sudo python setup.py install
4. run g2p-seq2seq
$ wget -O g2p-seq2seq-cmudict.tar.gz https://sourceforge.net/projects/cmusphinx/files/G2P%20Models/g2p-seq2seq-cmudict.tar.gz/download
$ tar xf g2p-seq2seq-cmudict.tar.gz
$ g2p-seq2seq --interactive --model g2p-seq2seq-cmudict
or
$ g2p-seq2seq --decode your_wordlist.txt --model g2p-seq2seq-cmudict
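The --decode mode above reads its words from your_wordlist.txt. A minimal sketch of building that file from a sentence corpus follows. It assumes the word list holds one word per line and that the corpus wraps sentences in <s> ... </s>; the function name corpus_to_wordlist and the demo sentences are hypothetical.

```shell
#!/bin/sh
# corpus_to_wordlist is a hypothetical helper: it reads a corpus on stdin
# (sentences wrapped in <s> ... </s>) and prints each distinct word once.
corpus_to_wordlist() {
    # Put one token per line, drop sentence markers and blank lines,
    # then sort and remove duplicates.
    tr -s '[:space:]' '\n' |
        grep -v -e '^<s>$' -e '^</s>$' -e '^$' |
        sort -u
}

# Small demo corpus; replace the printf with your own text file.
printf '%s\n' '<s> kalimat bahasa indonesia </s>' '<s> bahasa kalimat </s>' |
    corpus_to_wordlist > your_wordlist.txt
cat your_wordlist.txt
```

For a real corpus the call would be `corpus_to_wordlist < coba.txt > your_wordlist.txt`.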
=========================Building language model=============================
1. complete the environment setup above first
2. download the cmuclmtk source from:
https://sourceforge.net/projects/cmusphinx/files/cmuclmtk/0.7/cmuclmtk-0.7.tar.gz/download
3. extract, build and install cmuclmtk
$ tar -xvzf cmuclmtk-0.7.tar.gz
$ cd cmuclmtk-0.7/
$ ./configure
$ make
$ sudo make install
4. go to the root folder and initialize the project; the project name here is "coba"
$ cd coba
$ sphinxtrain -t coba setup
5. in the directory coba/etc/, create a file named "coba.txt" and fill it with text like this:
<s> kalimat bahasa indonesia 1 </s>
<s> kalimat bahasa indonesia 2 </s>
<s> kalimat bahasa indonesia 3 </s>
<s> kalimat bahasa indonesia 4 </s>
6. convert to vocab
$ text2wfreq < coba.txt | wfreq2vocab > coba.vocab
7. generate the ARPA format language model
$ text2idngram -vocab coba.vocab -idngram coba.idngram < coba.txt
$ idngram2lm -vocab_type 0 -idngram coba.idngram -vocab coba.vocab -arpa coba.lm
8. generate the CMU binary form (BIN)
$ sphinx_lm_convert -i coba.lm -o coba.lm.bin
$ sphinx_lm_convert -i coba.lm -o coba.lm.DMP
$ sphinx_lm_convert -i coba.lm.bin -ifmt bin -o coba.lm -ofmt arpa
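The corpus-to-model steps above can be collected into one script. The sketch below first checks that every corpus line is wrapped in <s> ... </s>, then runs the same commands as in the steps, but only when the cmuclmtk tools are found. The function name check_corpus and the scratch directory are hypothetical; the commands and file names are the ones used above.

```shell
#!/bin/sh
# check_corpus is a hypothetical helper: it prints the number of lines in the
# given file that are NOT wrapped as "<s> ... </s>".
check_corpus() {
    # grep exits non-zero when the count is 0, so ignore its exit status.
    grep -c -v '^<s> .* </s>$' "$1" || true
}

workdir=$(mktemp -d)
cd "$workdir" || exit 1
name=coba

# Demo corpus; in the steps above this is coba/etc/coba.txt.
printf '%s\n' '<s> kalimat bahasa indonesia 1 </s>' \
              '<s> kalimat bahasa indonesia 2 </s>' > "$name.txt"

bad=$(check_corpus "$name.txt")
echo "lines without sentence markers: $bad"

# Run the commands from the steps above only when the corpus is clean
# and the cmuclmtk tools are actually installed.
if [ "$bad" -eq 0 ] && command -v text2wfreq >/dev/null 2>&1; then
    text2wfreq < "$name.txt" | wfreq2vocab > "$name.vocab"
    text2idngram -vocab "$name.vocab" -idngram "$name.idngram" < "$name.txt"
    idngram2lm -vocab_type 0 -idngram "$name.idngram" -vocab "$name.vocab" -arpa "$name.lm"
    sphinx_lm_convert -i "$name.lm" -o "$name.lm.bin"
fi
```

Checking the markers first is a design choice of this sketch: it stops a malformed corpus before any model file is written.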
=========================Create Acoustic Model==============================
1. go to the coba/ directory
2. create a "wav" folder
3. move all audio files into the wav folder (16 bit; 16000 Hz; mono)
4. verify the files you want to use: play each one, then resample it to 16000 Hz
$ sudo apt install sox
$ for i in *.wav; do play "$i"; done
$ for f in *.wav; do
sox "$f" -r 16000 "$f.new.wav"; mv "$f.new.wav" "$f";
done
5. edit your configuration file (etc/sphinx_train.cfg)
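The audio format required in step 3 can be checked without listening to each file. The sketch below reads the channel count, sample rate and bit depth straight from the WAV header with od. It assumes a little-endian machine and a canonical 44-byte PCM header (the "fmt " chunk directly after the RIFF header); the function name wav_format and the header-only demo file are hypothetical.

```shell
#!/bin/sh
# wav_format is a hypothetical helper: it prints "<channels> <rate> <bits>"
# read from a canonical 44-byte PCM WAV header.
# Assumptions: little-endian host, "fmt " chunk directly after the RIFF header.
wav_format() {
    channels=$(od -An -tu2 -j22 -N2 "$1" | tr -d ' \n')
    rate=$(od -An -tu4 -j24 -N4 "$1" | tr -d ' \n')
    bits=$(od -An -tu2 -j34 -N2 "$1" | tr -d ' \n')
    echo "$channels $rate $bits"
}

# Build a header-only demo file: mono, 16000 Hz, 16 bit, no samples.
demo=$(mktemp)
printf 'RIFF\044\000\000\000WAVEfmt \020\000\000\000' > "$demo"
printf '\001\000\001\000\200\076\000\000\000\175\000\000\002\000\020\000' >> "$demo"
printf 'data\000\000\000\000' >> "$demo"

fmt=$(wav_format "$demo")
if [ "$fmt" = "1 16000 16" ]; then
    echo "$demo: ok ($fmt)"
else
    echo "$demo: needs conversion ($fmt)"
fi
```

Inside the wav folder the same function can be used in a loop such as `for f in *.wav; do echo "$f: $(wav_format "$f")"; done`.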
========================================================================================================================================================================================================================
path="/home/kirra/Documents/projek/sphinx/suara"
path_wav=${path}/wav
path_etc=${path}/etc
name_project="suara"
train_trans=${name_project}"_train.transcription"
test_trans=${name_project}"_test.transcription"
train_fileids=${name_project}"_train.fileids"
test_fileids=${name_project}"_test.fileids"
DATA=(001 002 003 004 005 006 007 008 009 010 090 091 092 093 094 095 096 097 098 099 100)
TRAIN=(001 002 003 004 005 006 007 008 009 010 090 091 092 093 094 095 096 097 098 099)
TEST=(100)
# #convert wav files to 16000 Hz
# ##remove the old list if it exists
# if [ -f listofsound123.txt ]; then
# echo "File listofsound123 found! remove it"
# rm listofsound123.txt
# fi
# ## convert audio
# for n in ${DATA[@]}; do
# for i in $(ls ../wav/$n | grep wav); do
# #rm ${path}/wav/$n/$i
# echo "${i%.wav}" >> listofsound123.txt;
# sox -S ${path}/wav/$n/$i -r 16000 ${path}/wav/$n/$i.new.wav;
# mv ${path}/wav/$n/$i.new.wav ${path}/wav/$n/$i;
# done
# done
# # make corpus
# ##make file exist
# # echo awal > $name_project".txt"
# ##deletefile
# # rm $name_project".txt"
# ##check file if exist
# if [ -f $name_project".txt" ]; then
# echo "File" $name_project".txt found! remove it"
# rm $name_project".txt"
# fi
# # convert text to corpus
# for n in ${DATA[@]}; do
# for i in $(ls ../wav/$n | grep txt); do
# value=$(<${path_wav}/$n/$i)
# echo "<s> "$value" </s>" >> $name_project".txt"
# done
# done
# ##check file if exist
# if [ -f $name_project".vocab" ]; then
# echo "File" $name_project".vocab found! remove it"
# rm $name_project".vocab"
# fi
# text2wfreq < $name_project".txt" | wfreq2vocab > $name_project".vocab"
# ##check file if exist
# if [ -f $name_project".idngram" ]; then
# echo "File" $name_project".idngram found! remove it"
# rm $name_project".idngram"
# fi
# text2idngram -vocab $name_project".vocab" -idngram $name_project".idngram" < $name_project".txt"
# ##check file if exist
# if [ -f $name_project".lm" ]; then
# echo "File" $name_project".lm found! remove it"
# rm $name_project".lm"
# fi
# idngram2lm -vocab_type 0 -idngram $name_project".idngram" -vocab $name_project".vocab" -arpa $name_project".lm"
# ##check file if exist
# if [ -f $name_project".lm.bin" ]; then
# echo "File" $name_project".lm.bin found! remove it"
# rm $name_project".lm.bin"
# fi
# sphinx_lm_convert -i $name_project".lm" -o $name_project".lm.bin"
# ##check file if exist
# if [ -f $name_project".lm.DMP" ]; then
# echo "File" $name_project".lm.DMP found! remove it"
# rm $name_project".lm.DMP"
# fi
# sphinx_lm_convert -i $name_project".lm" -o $name_project".lm.DMP"
# sphinx_lm_convert -i $name_project".lm.bin" -ifmt bin -o $name_project".lm" -ofmt arpa
=============================Building Acoustic model=============================
# #build the training transcription file
# ##remove the old file if it exists
# if [ -f $train_trans ]; then
# echo "File" $train_trans" found! remove it"
# rm $train_trans
# fi
# for n in ${TRAIN[@]}; do
# for i in $(ls ../wav/$n | grep txt); do
# value=$(<${path_wav}/$n/$i)
# echo "<s> $value </s> ($n/${i%.txt})" >> $train_trans;
# echo $i "done"
# done
# done
# #build the test transcription file
# ##remove the old file if it exists
# if [ -f $test_trans ]; then
# echo "File" $test_trans" found! remove it"
# rm $test_trans
# fi
# for n in ${TEST[@]}; do
# for i in $(ls ../wav/$n | grep txt); do
# value=$(<${path_wav}/$n/$i)
# echo "<s> $value </s> ($n/${i%.txt})" >> $test_trans
# echo $i "done"
# done
# done
# #build the training fileids list (one "<folder>/<file>" id per line, without extension)
# ##remove the old list if it exists
if [ -f "$train_fileids" ]; then
echo "File $train_fileids found! remove it"
rm "$train_fileids"
fi
for n in "${TRAIN[@]}"; do
for i in $(ls ../wav/$n | grep '\.wav$'); do
# strip only the .wav suffix; tr --delete would drop every '.', 'w', 'a' and 'v' character
echo "$n/${i%.wav}" >> "$train_fileids"
echo "$i done"
done
done
# #build the test fileids list
# ##remove the old list if it exists
if [ -f "$test_fileids" ]; then
echo "File $test_fileids found! remove it"
rm "$test_fileids"
fi
for n in "${TEST[@]}"; do
for i in $(ls ../wav/$n | grep '\.wav$'); do
echo "$n/${i%.wav}" >> "$test_fileids"
echo "$i done"
done
done
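Training goes wrong when the fileids list and the transcription file do not describe the same utterances in the same order. The sketch below counts the lines where the two disagree. It assumes every transcription line ends with "(<id>)", as the script above writes it; the function name count_mismatches and the demo files are hypothetical.

```shell
#!/bin/sh
# count_mismatches is a hypothetical helper: it prints how many lines differ
# between a .fileids file and the ids found in a .transcription file.
# Assumption: every transcription line ends with "(<id>)".
count_mismatches() {
    fileids=$1
    trans=$2
    # Keep only the text inside the final parentheses of each line,
    # pair it with the fileids line at the same position, and compare.
    sed 's/.*(\(.*\))$/\1/' "$trans" |
        paste -d '|' "$fileids" - |
        awk -F '|' '$1 != $2 { n++ } END { print n + 0 }'
}

# Demo files in a scratch directory; the names follow the script above.
dir=$(mktemp -d)
printf '%s\n' '001/a0001' '001/a0002' > "$dir/suara_train.fileids"
printf '%s\n' '<s> satu dua </s> (001/a0001)' \
              '<s> tiga empat </s> (001/a0003)' > "$dir/suara_train.transcription"

count_mismatches "$dir/suara_train.fileids" "$dir/suara_train.transcription"
```

A result of 0 means the two files line up; any other number is the count of positions to inspect by hand.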
========================================================================================================NOTE===========================================================================================================
HTK
JULIUS
SPHINX
http://cmusphinx.sourceforge.net/wiki/
http://cmusphinx.sourceforge.net/wiki/tutoriallm#keyword_lists
-> install cmusphinx (text2wfreq, etc.)
https://github.com/jasperproject/jasper-client/issues/231
creating a dictionary
http://www.speech.cs.cmu.edu/tools/lmtool-new.html
audio.online-convert.com/convert-to-wav
running pocketsphinx with your own dictionary file
pocketsphinx_continuous -inmic yes -lm 4171.lm -dict 4171.dic
recognizing audio from an external file with your own dictionary file
pocketsphinx_continuous -infile pproject.wav -keyphrase "udah selesai tadi malem" -kws_threshold 1e-20 -time yes -lm 4171.lm -dict 4171.dic
pocketsphinx_continuous -hmm model_parameters/coba.cd_cont_200/ -lm etc/coba.lm.bin -dict etc/coba.dic -infile wav/project19.wav
create setup
sphinxtrain -t project setup
sphinx_fe -argfile suara.cd_ptm_4000/feat.params -samprate 16000 -c arctic20.fileids -di . -do . -ei wav -eo mfc -mswav yes
./mllr_solve -meanfn suara.cd_ptm_4000/means -varfn suara.cd_ptm_4000/variances -outmllrfn mllr_matrix -accumdir .
./map_adapt -moddeffn suara.cd_ptm_4000/mdef.txt -ts2cbfn .ptm. -meanfn suara.cd_ptm_4000/means -varfn suara.cd_ptm_4000/variances -mixwfn suara.cd_ptm_4000/mixture_weights -tmatfn suara.cd_ptm_4000/transition_matrices -accumdir . -mapmeanfn suara.cd_ptm_4000_adapt/means -mapvarfn suara.cd_ptm_4000_adapt/variances -mapmixwfn suara.cd_ptm_4000_adapt/mixture_weights -maptmatfn suara.cd_ptm_4000_adapt/transition_matrices
./bw -hmmdir suara.cd_ptm_4000 -moddeffn suara.cd_ptm_4000/mdef.txt -ts2cbfn .ptm. -feat 1s_c_d_dd -svspec 0-12/13-25/26-38 -cmn current -agc none -dictfn suara.dic -ctlfn arctic20.fileids -lsnfn arctic20.transcription -accumdir .
pocketsphinx_batch -adcin yes -cepdir wav -cepext .wav -ctl test.fileids -lm suara.lm.DMP -dict suara.dic -hmm suara.cd_ptm_200 -hyp arctic20.hyp
python2.7-config --cflags
python2.7-config --ldflags
gcc pocketsphinx_wrap.c -o pocketsphinx_wrap -I/home/kirra/anaconda3/include/python3.5m -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/usr/local/include/pocketsphinx -I/usr/local/include/sphinxbase
ref:
https://ubuntuforums.org/showthread.php?t=2151421
https://lists.gnu.org/archive/html/automake/2007-05/msg00018.html
http://www.openfst.org/twiki/bin/view/FST/FstExtensions
https://github.com/flyfei/phonetisaurus/issues/38