Thursday, December 22, 2016

install sphinx

============================Setup Environment==============================
1. download file in

2. put all the files in one folder (root) and extract them all
3. create a folder named "project" inside the same root folder
4. install all the required packages and libraries
$ sudo apt-get install autoconf
$ sudo apt-get install libtool-bin
$ sudo apt-get autoremove automake
$ sudo apt-get install automake
$ sudo apt-get install bison
$ sudo apt-get install swig

5. go to the folder sphinxbase
$ ./
$ ./configure
$ make
$ sudo make install
6. go to the folder pocketsphinx
$ ./configure
$ make
$ sudo make install
7. go to the folder sphinxtrain
$ ./configure
$ make

$ sudo make install
8. set the library paths
$ export LD_LIBRARY_PATH=/usr/local/lib
$ export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig
9. test your Sphinx installation and make sure the app runs
$ pocketsphinx_continuous -inmic yes
if you get an audio device error such as
failed to open audio device
install the audio packages:
$ sudo apt-get install pulseaudio

$ sudo apt-get install libpulse-dev

$ sudo apt-get install osspd

if the app is running well, the terminal shows:
INFO: feat.c(715): Initializing feature stream to type: '1s_c_d_dd', ceplen=13, CMN='batch', VARNORM='no', AGC='none'
INFO: acmod.c(162): Using subvector specification 0-12/13-25/26-38
INFO: mdef.c(518): Reading model definition: /usr/local/share/pocketsphinx/model/en-us/en-us/mdef
INFO: mdef.c(531): Found byte-order mark BMDF, assuming this is a binary mdef file
INFO: bin_mdef.c(336): Reading binary model definition: /usr/local/share/pocketsphinx/model/en-us/en-us/mdef
INFO: bin_mdef.c(516): 42 CI-phone, 137053 CD-phone, 3 emitstate/phone, 126 CI-sen, 5126 Sen, 29324 Sen-Seq
INFO: tmat.c(149): Reading HMM transition probability matrices: /usr/local/share/pocketsphinx/model/en-us/en-us/transition_matrices
INFO: acmod.c(113): Attempting to use PTM computation module
INFO: ms_gauden.c(127): Reading mixture gaussian parameter: /usr/local/share/pocketsphinx/model/en-us/en-us/means
INFO: ms_gauden.c(242): 42 codebook, 3 feature, size: 
INFO: ms_gauden.c(244):  128x13
INFO: ms_gauden.c(244):  128x13
INFO: ms_gauden.c(244):  128x13
INFO: ms_gauden.c(127): Reading mixture gaussian parameter: /usr/local/share/pocketsphinx/model/en-us/en-us/variances
INFO: ms_gauden.c(242): 42 codebook, 3 feature, size: 
INFO: ms_gauden.c(244):  128x13
INFO: ms_gauden.c(244):  128x13
INFO: ms_gauden.c(244):  128x13
INFO: ms_gauden.c(304): 222 variance values floored
INFO: ptm_mgau.c(476): Loading senones from dump file /usr/local/share/pocketsphinx/model/en-us/en-us/sendump
INFO: ptm_mgau.c(563): Rows: 128, Columns: 5126
INFO: ptm_mgau.c(595): Using memory-mapped I/O for senones
INFO: ptm_mgau.c(838): Maximum top-N: 4
INFO: phone_loop_search.c(114): State beam -225 Phone exit beam -225 Insertion penalty 0
INFO: dict.c(320): Allocating 138824 * 32 bytes (4338 KiB) for word entries
INFO: dict.c(333): Reading main dictionary: /usr/local/share/pocketsphinx/model/en-us/cmudict-en-us.dict
INFO: dict.c(213): Dictionary size 134723, allocated 1016 KiB for strings, 1679 KiB for phones
INFO: dict.c(336): 134723 words read
INFO: dict.c(358): Reading filler dictionary: /usr/local/share/pocketsphinx/model/en-us/en-us/noisedict
INFO: dict.c(213): Dictionary size 134728, allocated 0 KiB for strings, 0 KiB for phones
INFO: dict.c(361): 5 words read
INFO: dict2pid.c(396): Building PID tables for dictionary
INFO: dict2pid.c(406): Allocating 42^3 * 2 bytes (144 KiB) for word-initial triphones
INFO: dict2pid.c(132): Allocated 42672 bytes (41 KiB) for word-final triphones
INFO: dict2pid.c(196): Allocated 42672 bytes (41 KiB) for single-phone word triphones
INFO: ngram_model_trie.c(354): Trying to read LM in trie binary format
INFO: ngram_search_fwdtree.c(74): Initializing search tree
INFO: ngram_search_fwdtree.c(101): 791 unique initial diphones
INFO: ngram_search_fwdtree.c(186): Creating search channels
INFO: ngram_search_fwdtree.c(323): Max nonroot chan increased to 152609
INFO: ngram_search_fwdtree.c(333): Created 723 root, 152481 non-root channels, 53 single-phone words
INFO: ngram_search_fwdflat.c(157): fwdflat: min_ef_width = 4, max_sf_win = 25
INFO: continuous.c(307): pocketsphinx_continuous COMPILED ON: Dec  6 2016, AT: 11:17:29

INFO: continuous.c(252): Ready....
INFO: continuous.c(261): Listening...
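
The exports in step 8 replace any existing value and only last for the current shell. A minimal sketch, assuming a POSIX shell, that appends the two directories only when they are missing. The helper name append_path is hypothetical and not part of Sphinx:

```shell
# append_path VALUE DIR: print VALUE with DIR appended, unless DIR is already listed
append_path() {
    case ":$1:" in
        *":$2:"*) printf '%s\n' "$1" ;;      # already present: leave unchanged
        "::")     printf '%s\n' "$2" ;;      # empty value: result is just DIR
        *)        printf '%s\n' "$1:$2" ;;   # otherwise append after a colon
    esac
}

LD_LIBRARY_PATH=$(append_path "${LD_LIBRARY_PATH:-}" /usr/local/lib)
PKG_CONFIG_PATH=$(append_path "${PKG_CONFIG_PATH:-}" /usr/local/lib/pkgconfig)
export LD_LIBRARY_PATH PKG_CONFIG_PATH
```

Putting these lines in a shell startup file such as ~/.bashrc would keep the setting across sessions.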

========================Installation Complete: Setup Environment===================

============================Building dictionary with Phonetisaurus================
extract; ./configure; make; sudo make install

1. download openfst
$ download:
2. extract openfst
$ tar -xvzf openfst-1.5.4.tar.gz
3. install openfst
$ cd openfst-1.5.4
$ ./configure --enable-static --enable-shared --enable-far --enable-lookahead-fsts --enable-const-fsts --enable-pdt --enable-ngram-fsts --enable-linear-fsts
note: the files will be installed in /usr/local/include and /usr/local/lib

$ make
$ sudo make install
4. if you want to install into a local directory instead, follow this step
$ ./configure --prefix=/home/you/usr
note: your files will be installed in /home/you/usr

$ make
$ make install
5. install Phonetisaurus

$ git clone
$ cd Phonetisaurus-openfst-1.5.3
$ cd src
$ ./configure
$ cd .autoconf
$ autoconf -o ./configure
$ cd ../
$ make -j2 all
$ sudo make install
$ export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/local/lib
note: replace "/usr/local/lib" with your own path if you installed elsewhere

6. test run Phonetisaurus
$ bin/phonetisaurus-align --help
if it is installed correctly, you will get this output
phonetisaurus-align --input=dictionary --ofile=corpus.

  --delim: type = string, default = " "
  Delimiter separating entry one and entry two in the input file.
  --eps: type = string, default = "<eps>"
  Epsilon symbol.
  --fb: type = bool, default = false
  Use forward-backward pruning for the alignment lattices.
  --input: type = string, default = ""
  Two-column input file to align.
  --iter: type = int32, default = 11
  Maximum number of EM iterations to perform.
  --lattice: type = bool, default = false
  Write out the alignment lattices as an fst archive (.far).
  --load_model: type = bool, default = false
  Load a pre-trained model for use.
  --mbr: type = bool, default = false
  Use the LMBR decoder (not yet implemented).
  --model_file: type = string, default = ""
  FST-format alignment model to load.
  --nbest: type = int32, default = 1
  Output the N-best alignments given the model.
  --ofile: type = string, default = ""
  Output file to write the aligned dictionary to.
  --penalize: type = bool, default = true
  Penalize scores.
  --penalize_em: type = bool, default = false
  Penalize links during EM training.
  --pthresh: type = double, default = -99
  Pruning threshold.  Use to prune unlikely N-best candidates when using multiple alignments.
  --restrict: type = bool, default = true
  Restrict links to M-1, 1-N during initialization.
  --s1_char_delim: type = string, default = ""
  Sequence one input delimeter.
  --s1s2_sep: type = string, default = "}"
  Token used to separate input-output subsequences in the g2p model.
  --s2_char_delim: type = string, default = " "
  Sequence two input delimeter.
  --seq1_del: type = bool, default = true
  Allow deletions in sequence one.
  --seq1_max: type = int32, default = 2
  Maximum subsequence length for sequence one.
  --seq1_sep: type = string, default = "|"
  Multi-token separator for input tokens.
  --seq2_del: type = bool, default = true
  Allow deletions in sequence two.
  --seq2_max: type = int32, default = 2
  Maximum subsequence length for sequence two.
  --seq2_sep: type = string, default = "|"
  Multi-token separator for output tokens.
  --skip: type = string, default = "_"
  Skip token used to represent null transitions.  Distinct from epsilon.
  --thresh: type = double, default = 1e-10
  Delta threshold for EM training termination.
  --write_model: type = string, default = ""
  Write out the alignment model in OpenFst format to filename.
  --help: type = bool, default = false
  show usage information
  --helpshort: type = bool, default = false
  show brief usage information
  --tmpdir: type = string, default = "/tmp"
  temporary directory
  --v: type = int32, default = 0
  verbose level
  --help: type = bool, default = false
  show usage information
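
The usage line above takes its options in --name=value form. A small sketch, assuming a POSIX shell, that assembles such a command line from the flags listed in the help text so it can be reviewed before running. The helper name align_cmd and the file names are hypothetical:

```shell
# align_cmd INPUT OUTPUT [SEQ1_MAX] [SEQ2_MAX]
# Print a phonetisaurus-align command line; the two maxima default to 2,
# the defaults shown in the help output.
align_cmd() {
    printf 'phonetisaurus-align --input=%s --ofile=%s --seq1_max=%s --seq2_max=%s\n' \
        "$1" "$2" "${3:-2}" "${4:-2}"
}

# Example: show the command for a dictionary file and an output corpus file
align_cmd dictionary corpus
```

The function only prints the command, so it works even before Phonetisaurus is on the PATH.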
================= Building Dictionary with g2p-seq2seq===========================
1. Clone this repo
$ git clone
$ cd g2p-seq2seq 
2. install TensorFlow
$ sudo pip install --upgrade
3. install g2p-seq2seq
$ sudo python install
4. run g2p-seq2seq
$ wget -O g2p-seq2seq-cmudict.tar.gz
$ tar xf g2p-seq2seq-cmudict.tar.gz
$ g2p-seq2seq --interactive --model g2p-seq2seq-cmudict
$ g2p-seq2seq --decode your_wordlist.txt --model g2p-seq2seq-cmudict
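
The --decode option reads a word list file. A minimal sketch, assuming the list has one word per line and is built from a corpus whose sentences are wrapped in <s> ... </s> markers. The helper name corpus_words is hypothetical:

```shell
# corpus_words FILE: print the unique words of FILE, one per line,
# without the <s> and </s> sentence markers
corpus_words() {
    tr -s ' \t' '\n\n' < "$1" \
        | grep -v -e '^<s>$' -e '^</s>$' -e '^$' \
        | LC_ALL=C sort -u
}

# Example with a small throwaway corpus
tmp_corpus=$(mktemp)
printf '<s> kalimat bahasa indonesia </s>\n<s> bahasa indonesia </s>\n' > "$tmp_corpus"
corpus_words "$tmp_corpus" > your_wordlist.txt
rm -f "$tmp_corpus"
```

Digits and punctuation in a real corpus would also end up in the list, so it may need a manual pass before decoding.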

=========================Building language model=============================
1. first complete the environment setup installation above
2. install cmuclmtk, download source from:

$ tar -xvzf cmuclmtk-0.7.tar.gz 
$ cd cmuclmtk-0.7/
$ ./configure
$ make
$ sudo make install
4. go to the root folder and initialize the project; the project name here is "coba"
$ cd coba
$ sphinxtrain -t coba setup
5. in the directory coba/etc/, create a file named "coba.txt" and fill it with text like this
<s> kalimat bahasa indonesia 1 </s>
<s> kalimat bahasa indonesia 2 </s>
<s> kalimat bahasa indonesia 3 </s>
<s> kalimat bahasa indonesia 4 </s>
6. convert to vocab
$ text2wfreq < coba.txt | wfreq2vocab > coba.vocab
7. Generate the arpa format language model
$ text2idngram -vocab coba.vocab -idngram coba.idngram < coba.txt
$ idngram2lm -vocab_type 0 -idngram coba.idngram -vocab coba.vocab -arpa coba.lm
8. Generate the CMU binary form (BIN)
$ sphinx_lm_convert -i coba.lm -o coba.lm.bin
$ sphinx_lm_convert -i coba.lm -o coba.lm.DMP
$ sphinx_lm_convert -i coba.lm.bin -ifmt bin -o coba.lm -ofmt arpa
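
The corpus file in step 5 wraps every sentence in <s> ... </s>. A minimal sketch, assuming plain sentences arrive one per line on standard input. The helper name wrap_sentences is hypothetical:

```shell
# wrap_sentences: trim each input line, drop empty lines,
# and wrap what is left in <s> ... </s>
wrap_sentences() {
    sed -e 's/^[[:space:]]*//' \
        -e 's/[[:space:]]*$//' \
        -e '/^$/d' \
        -e 's/.*/<s> & <\/s>/'
}

# Example: two sentences separated by an empty line
printf 'kalimat bahasa indonesia 1\n\nkalimat bahasa indonesia 2\n' | wrap_sentences
```

Redirecting the output to coba.txt would give a file in the format that the later steps read.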

=========================Create Acoustic Model==============================
1. go to coba/ directory
2. create "wav" folder
3. move all audio files to the wav folder (16 bit; 16000 Hz; mono)
4. verify the files you want to use
$ sudo apt install sox
$ for i in *.wav; do play "$i"; done
resample every file to 16000 Hz:
$ for f in *.wav; do
     sox "$f" -r 16000 "tmp_$f" && mv "tmp_$f" "$f"   # "tmp_" is an arbitrary temporary name
  done

5. edit your configuration file (etc/sphinx_train.cfg)
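
Before training, the audio format required in step 3 (16 bit, 16000 Hz, mono) can be checked without playing every file. A rough sketch that reads the fields straight from the WAV header with od. It assumes a canonical 44-byte PCM header and a little-endian machine, and the helper names wav_format and is_sphinx_wav are hypothetical:

```shell
# wav_format FILE: print "<sample rate> <channels> <bits per sample>"
# Header offsets: channels at byte 22, sample rate at byte 24, bits at byte 34.
wav_format() {
    rate=$(od -An -t u4 -j 24 -N 4 "$1" | tr -d ' \n')
    chans=$(od -An -t u2 -j 22 -N 2 "$1" | tr -d ' \n')
    bits=$(od -An -t u2 -j 34 -N 2 "$1" | tr -d ' \n')
    printf '%s %s %s\n' "$rate" "$chans" "$bits"
}

# is_sphinx_wav FILE: succeed only for 16000 Hz, mono, 16-bit audio
is_sphinx_wav() {
    [ "$(wav_format "$1")" = "16000 1 16" ]
}

# Example: report every file in the current folder that needs converting
for f in *.wav; do
    [ -f "$f" ] || continue
    is_sphinx_wav "$f" || echo "needs conversion: $f ($(wav_format "$f"))"
done
```

Files with extra header chunks would be misread by this sketch, so a tool that parses the header properly is the safer choice for anything unusual.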



DATA=(001 002 003 004 005 006 007 008 009 010 090 091 092 093 094 095 096 097 098 099 100)
TRAIN=(001 002 003 004 005 006 007 008 009 010 090 091 092 093 094 095 096 097 098 099)
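
The loops further down also read a TEST list, which is not defined in these notes. A minimal sketch, assuming TEST is meant to be every DATA entry that is not in TRAIN. It uses plain space-separated strings so it runs in any POSIX shell, and the helper name set_difference is hypothetical:

```shell
# set_difference "A items" "B items": print each item of A that is not in B
set_difference() {
    for item in $1; do
        case " $2 " in
            *" $item "*) ;;                  # present in B: skip
            *) printf '%s\n' "$item" ;;      # only in A: keep
        esac
    done
}

DATA_LIST="001 002 003 004 005 006 007 008 009 010 090 091 092 093 094 095 096 097 098 099 100"
TRAIN_LIST="001 002 003 004 005 006 007 008 009 010 090 091 092 093 094 095 096 097 098 099"
TEST_LIST=$(set_difference "$DATA_LIST" "$TRAIN_LIST")
echo "$TEST_LIST"
# In bash the result could be turned into an array with: TEST=($TEST_LIST)
```

With the two lists above, only speaker 100 is left for testing.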

# #convert wav files to 16000 Hz
# ##check file if exist
# if [ -f listofsound123.txt ]; then
#     echo "File listofsound123 found! remove it"
#     rm listofsound123.txt
# fi

# ## convert audio
# for n in ${DATA[@]}; do
# for i in $(ls ../wav/$n | grep wav); do
# #rm ${path}/wav/$n/$i
# echo "${i%.wav}" >> listofsound123.txt;   # strip only the .wav suffix
# sox -S ${path}/wav/$n/$i -r 16000 ${path}/wav/$n/tmp_$i;   # tmp_ is an arbitrary temporary name
# mv ${path}/wav/$n/tmp_$i ${path}/wav/$n/$i;
# done
# done

# # build the corpus
# ##make file exist
# # echo awal > $name_project".txt"
# ##deletefile
# # rm $name_project".txt"

# ##check file if exist
# if [ -f $name_project".txt" ]; then
#     echo "File" $name_project".txt found! remove it"
#     rm $name_project".txt"
# fi
# # convert text to corpus
# for n in ${DATA[@]}; do
# for i in $(ls ../wav/$n | grep txt); do
# value=$(<${path_wav}/$n/$i)
# echo "<s> "$value" </s>" >> $name_project".txt"
# done
# done

# ##check file if exist
# if [ -f $name_project".vocab" ]; then
#     echo "File" $name_project".vocab found! remove it"
#     rm $name_project".vocab"
# fi
# text2wfreq < $name_project".txt" | wfreq2vocab > $name_project".vocab"

# ##check file if exist
# if [ -f $name_project".idngram" ]; then
#     echo "File" $name_project".idngram found! remove it"
#     rm $name_project".idngram"
# fi
# text2idngram -vocab $name_project".vocab" -idngram $name_project".idngram" < $name_project".txt"

# ##check file if exist
# if [ -f $name_project".lm" ]; then
#     echo "File" $name_project".lm found! remove it"
#     rm $name_project".lm"
# fi
# idngram2lm -vocab_type 0 -idngram $name_project".idngram" -vocab $name_project".vocab" -arpa $name_project".lm"

# ##check file if exist
# if [ -f $name_project".lm.bin" ]; then
#     echo "File" $name_project".lm.bin found! remove it"
#     rm $name_project".lm.bin"
# fi
# sphinx_lm_convert -i $name_project".lm" -o $name_project".lm.bin"

# ##check file if exist
# if [ -f $name_project".lm.DMP" ]; then
#     echo "File" $name_project".lm.DMP found! remove it"
#     rm $name_project".lm.DMP"
# fi
# sphinx_lm_convert -i $name_project".lm" -o $name_project".lm.DMP"

# sphinx_lm_convert -i $name_project".lm.bin" -ifmt bin -o $name_project".lm" -ofmt arpa
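
The script above repeats the same "check whether the file exists, then remove it" block before every step. A small sketch of a helper that folds the pattern into one call, assuming a POSIX shell. The name remove_stale is hypothetical:

```shell
# remove_stale FILE...: delete each FILE that exists and report it,
# using the same message as the script above
remove_stale() {
    for f in "$@"; do
        if [ -f "$f" ]; then
            echo "File $f found! remove it"
            rm -- "$f"
        fi
    done
}

# Example: clear old language model outputs for a project named "coba"
remove_stale coba.vocab coba.idngram coba.lm coba.lm.bin coba.lm.DMP
```

When the message is not needed, a plain `rm -f` on the same file names has the same effect.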

=============================== Building Acoustic model ===============================

#list name of sound train
#check file if exist
# if [ -f $train_trans ]; then
#     echo "File" $train_trans" found! remove it"
#     rm $train_trans
# fi
# for n in ${TRAIN[@]}; do
# for i in $(ls ../wav/$n | grep txt); do
# value=$(<${path_wav}/$n/$i)
# echo "<s> $value </s> ($n/${i%.txt})" >> $train_trans;   # strip only the .txt suffix; tr --delete .txt would also remove every ".", "t" and "x" from the sentence
# echo $i "done"
# done
# done

# #list name of sound test
# ##check file if exist
# if [ -f $test_trans ]; then
#     echo "File" $test_trans" found! remove it"
#     rm $test_trans
# fi
# for n in ${TEST[@]}; do
# for i in $(ls ../wav/$n | grep txt); do
# value=$(<${path_wav}/$n/$i)
# echo "<s> $value </s> ($n/${i%.txt})" >> $test_trans   # strip only the .txt suffix; tr --delete .txt would also remove every ".", "t" and "x" from the sentence
# echo $i "done"
# done
# done

# list the file IDs of the training audio
# check whether the file exists
if [ -f "$train_fileids" ]; then
    echo "File $train_fileids found! remove it"
    rm "$train_fileids"
fi
for n in "${TRAIN[@]}"; do
    for i in $(ls ../wav/$n | grep wav); do
        # strip only the .wav suffix (tr --delete .wav would drop every ".", "w", "a", "v")
        echo "$n/${i%.wav}" >> "$train_fileids"
        echo "$i done"
    done
done

# list the file IDs of the test audio
# check whether the file exists
if [ -f "$test_fileids" ]; then
    echo "File $test_fileids found! remove it"
    rm "$test_fileids"
fi
for n in "${TEST[@]}"; do
    for i in $(ls ../wav/$n | grep wav); do
        # strip only the .wav suffix (tr --delete .wav would drop every ".", "w", "a", "v")
        echo "$n/${i%.wav}" >> "$test_fileids"
        echo "$i done"
    done
done
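
The fileids list and the transcription list are written by separate loops, so it is worth checking that they still line up. A rough sketch, assuming line k of the transcription ends with "(ID)" where ID is line k of the fileids file or its basename, and that neither file contains tab characters. The helper name check_pairs is hypothetical:

```shell
# check_pairs FILEIDS TRANSCRIPTION
# Succeeds only if both files have the same number of lines and every
# transcription line ends with the matching file ID in parentheses.
check_pairs() {
    paste "$1" "$2" | awk -F '\t' '
        {
            id = $1
            base = id
            sub(/.*\//, "", base)            # basename of the file ID
            tail = $2
            sub(/^.*\(/, "", tail)           # keep what follows the last "("
            sub(/\)[ \t]*$/, "", tail)       # drop the closing ")"
            if ($1 == "" || $2 == "" || (tail != id && tail != base)) {
                print "mismatch at line " NR
                bad = 1
            }
        }
        END { exit (bad ? 1 : 0) }'
}
```

A call such as `check_pairs "$train_fileids" "$train_trans"` before training would print the first lines that disagree and return a non-zero status.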

-> install the CMU Sphinx tools (text2wfreq, etc.)

create the dictionary

run pocketsphinx with your own dictionary file
pocketsphinx_continuous -inmic yes -lm 4171.lm -dict 4171.dic

voice training from an external audio file with your own dictionary file
pocketsphinx_continuous -infile pproject.wav -keyphrase "udah selesai tadi malem" -kws_threshold 1e-20 -time yes -lm 4171.lm -dict 4171.dic

 pocketsphinx_continuous -hmm model_parameters/coba.cd_cont_200/ -lm etc/coba.lm.bin -dict etc/coba.dic -infile wav/project19.wav

create setup
sphinxtrain -t project setup
sphinx_fe -argfile suara.cd_ptm_4000/feat.params -samprate 16000 -c arctic20.fileids -di . -do . -ei wav -eo mfc -mswav yes

run bw first: mllr_solve and map_adapt read the statistics that bw writes to -accumdir
./bw -hmmdir suara.cd_ptm_4000 -moddeffn suara.cd_ptm_4000/mdef.txt -ts2cbfn .ptm. -feat 1s_c_d_dd -svspec 0-12/13-25/26-38 -cmn current -agc none -dictfn suara.dic -ctlfn arctic20.fileids -lsnfn arctic20.transcription -accumdir .

./mllr_solve -meanfn suara.cd_ptm_4000/means -varfn suara.cd_ptm_4000/variances -outmllrfn mllr_matrix -accumdir .

./map_adapt -moddeffn suara.cd_ptm_4000/mdef.txt -ts2cbfn .ptm. -meanfn suara.cd_ptm_4000/means -varfn suara.cd_ptm_4000/variances -mixwfn suara.cd_ptm_4000/mixture_weights -tmatfn suara.cd_ptm_4000/transition_matrices -accumdir . -mapmeanfn suara.cd_ptm_4000_adapt/means -mapvarfn suara.cd_ptm_4000_adapt/variances -mapmixwfn suara.cd_ptm_4000_adapt/mixture_weights -maptmatfn suara.cd_ptm_4000_adapt/transition_matrices

pocketsphinx_batch -adcin yes -cepdir wav -cepext .wav -ctl test.fileids -lm suara.lm.DMP -dict suara.dic -hmm suara.cd_ptm_200 -hyp arctic20.hyp

python2.7-config --cflags
python2.7-config --ldflags
gcc pocketsphinx_wrap.c -o pocketsphinx_wrap -I/home/kirra/anaconda3/include/python3.5m  -Wsign-compare  -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/usr/local/include/pocketsphinx -I/usr/local/include/sphinxbase


