Install Sphinx

============================Setup Environment==============================
1. Download the following files (see http://cmusphinx.sourceforge.net/wiki/tutorialoverview):
a. https://sourceforge.net/projects/cmusphinx/files/sphinxbase/5prealpha/
b. https://sourceforge.net/projects/cmusphinx/files/pocketsphinx/5prealpha/
c. https://sourceforge.net/projects/cmusphinx/files/sphinx4/5prealpha/
d. https://sourceforge.net/projects/cmusphinx/files/sphinxtrain/5prealpha/

2. Put all the files in one folder (root) and extract them all.
3. Create a project folder inside the same root folder.
4. Install all the required packages and libraries:

$ sudo apt-get install autoconf

$ sudo apt-get install libtool-bin

$ sudo apt-get autoremove automake

$ sudo apt-get install automake

$ sudo apt-get install bison

$ sudo apt-get install swig

5. go to the folder sphinxbase

$ ./autogen.sh

$ ./configure

$ make

$ sudo make install

6. go to the folder pocketsphinx

$ ./configure

$ make

$ sudo make install

7. go to the folder sphinxtrain

$ ./configure

$ make

$ sudo make install

8. Set the library paths

$ export LD_LIBRARY_PATH=/usr/local/lib

$ export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig

9. Test your Sphinx installation and make sure the application runs properly

$ pocketsphinx_continuous -inmic yes

_________________________________________________________________________________
Note:
If you get a problem with the audio device, such as

failed to open audio device

install the following packages:

$ sudo apt-get install pulseaudio

$ sudo apt-get install libpulse-dev

$ sudo apt-get install osspd

_________________________________________________________________________________

If the application runs properly, the terminal will show:

INFO: feat.c(715): Initializing feature stream to type: '1s_c_d_dd', ceplen=13, CMN='batch', VARNORM='no', AGC='none'
INFO: acmod.c(162): Using subvector specification 0-12/13-25/26-38
INFO: mdef.c(518): Reading model definition: /usr/local/share/pocketsphinx/model/en-us/en-us/mdef
INFO: mdef.c(531): Found byte-order mark BMDF, assuming this is a binary mdef file
INFO: bin_mdef.c(336): Reading binary model definition: /usr/local/share/pocketsphinx/model/en-us/en-us/mdef
INFO: bin_mdef.c(516): 42 CI-phone, 137053 CD-phone, 3 emitstate/phone, 126 CI-sen, 5126 Sen, 29324 Sen-Seq
INFO: tmat.c(149): Reading HMM transition probability matrices: /usr/local/share/pocketsphinx/model/en-us/en-us/transition_matrices
INFO: acmod.c(113): Attempting to use PTM computation module
INFO: ms_gauden.c(127): Reading mixture gaussian parameter: /usr/local/share/pocketsphinx/model/en-us/en-us/means
INFO: ms_gauden.c(242): 42 codebook, 3 feature, size: 
INFO: ms_gauden.c(244):  128x13
INFO: ms_gauden.c(244):  128x13
INFO: ms_gauden.c(244):  128x13
INFO: ms_gauden.c(127): Reading mixture gaussian parameter: /usr/local/share/pocketsphinx/model/en-us/en-us/variances
INFO: ms_gauden.c(242): 42 codebook, 3 feature, size: 
INFO: ms_gauden.c(244):  128x13
INFO: ms_gauden.c(244):  128x13
INFO: ms_gauden.c(244):  128x13
INFO: ms_gauden.c(304): 222 variance values floored
INFO: ptm_mgau.c(476): Loading senones from dump file /usr/local/share/pocketsphinx/model/en-us/en-us/sendump
INFO: ptm_mgau.c(500): BEGIN FILE FORMAT DESCRIPTION
INFO: ptm_mgau.c(563): Rows: 128, Columns: 5126
INFO: ptm_mgau.c(595): Using memory-mapped I/O for senones
INFO: ptm_mgau.c(838): Maximum top-N: 4
INFO: phone_loop_search.c(114): State beam -225 Phone exit beam -225 Insertion penalty 0
INFO: dict.c(320): Allocating 138824 * 32 bytes (4338 KiB) for word entries
INFO: dict.c(333): Reading main dictionary: /usr/local/share/pocketsphinx/model/en-us/cmudict-en-us.dict
INFO: dict.c(213): Dictionary size 134723, allocated 1016 KiB for strings, 1679 KiB for phones
INFO: dict.c(336): 134723 words read
INFO: dict.c(358): Reading filler dictionary: /usr/local/share/pocketsphinx/model/en-us/en-us/noisedict
INFO: dict.c(213): Dictionary size 134728, allocated 0 KiB for strings, 0 KiB for phones
INFO: dict.c(361): 5 words read
INFO: dict2pid.c(396): Building PID tables for dictionary
INFO: dict2pid.c(406): Allocating 42^3 * 2 bytes (144 KiB) for word-initial triphones
INFO: dict2pid.c(132): Allocated 42672 bytes (41 KiB) for word-final triphones
INFO: dict2pid.c(196): Allocated 42672 bytes (41 KiB) for single-phone word triphones
INFO: ngram_model_trie.c(354): Trying to read LM in trie binary format
INFO: ngram_search_fwdtree.c(74): Initializing search tree
INFO: ngram_search_fwdtree.c(101): 791 unique initial diphones
INFO: ngram_search_fwdtree.c(186): Creating search channels
INFO: ngram_search_fwdtree.c(323): Max nonroot chan increased to 152609
INFO: ngram_search_fwdtree.c(333): Created 723 root, 152481 non-root channels, 53 single-phone words
INFO: ngram_search_fwdflat.c(157): fwdflat: min_ef_width = 4, max_sf_win = 25
INFO: continuous.c(307): pocketsphinx_continuous COMPILED ON: Dec  6 2016, AT: 11:17:29

INFO: continuous.c(252): Ready....
INFO: continuous.c(261): Listening...
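
The exports in step 8 overwrite whatever was already stored in LD_LIBRARY_PATH and PKG_CONFIG_PATH. A minimal sketch of a variant that appends instead is shown here. The helper name append_path is made up for this example, and /usr/local is assumed to be the install prefix, as in the steps above.

```shell
# append_path VALUE DIR: print VALUE with DIR appended, unless DIR is already listed.
# (append_path is a hypothetical helper, not part of Sphinx.)
append_path() {
    case ":$1:" in
        *":$2:"*) printf '%s\n' "$1" ;;      # DIR already present, keep VALUE as is
        "::")     printf '%s\n' "$2" ;;      # VALUE was empty, result is just DIR
        *)        printf '%s\n' "$1:$2" ;;   # append DIR after a colon
    esac
}

LD_LIBRARY_PATH=$(append_path "${LD_LIBRARY_PATH:-}" /usr/local/lib)
PKG_CONFIG_PATH=$(append_path "${PKG_CONFIG_PATH:-}" /usr/local/lib/pkgconfig)
export LD_LIBRARY_PATH PKG_CONFIG_PATH
```

The exports only last for the current shell session, so these lines can be added to ~/.bashrc if the paths should persist.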

========================Installation Complete: Setup Environment===================

============================Building dictionary with Phonetisaurus================

https://sourceforge.net/p/kaldi/mailman/message/29344614/
http://www.openfst.org/twiki/bin/view/FST/FstDownload
extract; ./configure; make; sudo make install
https://github.com/AdolfVonKleist/Phonetisaurus

1. Download OpenFst from:

http://www.openfst.org/twiki/bin/view/FST/FstDownload

2. extract openfst

$ tar -xvzf openfst-1.5.4.tar.gz

3. install openfst

$ cd openfst-1.5.4

$ ./configure --enable-static --enable-shared --enable-far --enable-lookahead-fsts --enable-const-fsts --enable-pdt --enable-ngram-fsts --enable-linear-fsts

Note: the files will be installed in /usr/local/include and /usr/local/lib

$ make

$ sudo make install

4. If you want to install the files in a local directory, follow these steps instead:

$ ./configure --prefix=/home/you/usr

Note: your files will be installed in /home/you/usr

$ make

$ make install

5. install Phonetisaurus

$ git clone -b openfst-1.5.3 https://github.com/AdolfVonKleist/Phonetisaurus.git Phonetisaurus-openfst-1.5.3

$ cd Phonetisaurus-openfst-1.5.3

$ cd src

$ ./configure

$ cd .autoconf

$ autoconf -o ./configure

$ cd ../

$ make -j2 all

$ sudo make install

$ export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/local/lib

Note: replace "/usr/local/lib" with your own path if you installed the libraries elsewhere

6. Test Phonetisaurus

$ bin/phonetisaurus-align --help

If the installation is correct, you will get this output:

GitRevision: 
phonetisaurus-align --input=dictionary --ofile=corpus.

 Usage: 
  --delim: type = string, default = " "
  Delimiter separating entry one and entry two in the input file.
  --eps: type = string, default = "<eps>"
  Epsilon symbol.
  --fb: type = bool, default = false
  Use forward-backward pruning for the alignment lattices.
  --input: type = string, default = ""
  Two-column input file to align.
  --iter: type = int32, default = 11
  Maximum number of EM iterations to perform.
  --lattice: type = bool, default = false
  Write out the alignment lattices as an fst archive (.far).
  --load_model: type = bool, default = false
  Load a pre-trained model for use.
  --mbr: type = bool, default = false
  Use the LMBR decoder (not yet implemented).
  --model_file: type = string, default = ""
  FST-format alignment model to load.
  --nbest: type = int32, default = 1
  Output the N-best alignments given the model.
  --ofile: type = string, default = ""
  Output file to write the aligned dictionary to.
  --penalize: type = bool, default = true
  Penalize scores.
  --penalize_em: type = bool, default = false
  Penalize links during EM training.
  --pthresh: type = double, default = -99
  Pruning threshold.  Use to prune unlikely N-best candidates when using multiple alignments.
  --restrict: type = bool, default = true
  Restrict links to M-1, 1-N during initialization.
  --s1_char_delim: type = string, default = ""
  Sequence one input delimeter.
  --s1s2_sep: type = string, default = "}"
  Token used to separate input-output subsequences in the g2p model.
  --s2_char_delim: type = string, default = " "
  Sequence two input delimeter.
  --seq1_del: type = bool, default = true
  Allow deletions in sequence one.
  --seq1_max: type = int32, default = 2
  Maximum subsequence length for sequence one.
  --seq1_sep: type = string, default = "|"
  Multi-token separator for input tokens.
  --seq2_del: type = bool, default = true
  Allow deletions in sequence two.
  --seq2_max: type = int32, default = 2
  Maximum subsequence length for sequence two.
  --seq2_sep: type = string, default = "|"
  Multi-token separator for output tokens.
  --skip: type = string, default = "_"
  Skip token used to represent null transitions.  Distinct from epsilon.
  --thresh: type = double, default = 1e-10
  Delta threshold for EM training termination.
  --write_model: type = string, default = ""
  Write out the alignment model in OpenFst format to filename.
  --help: type = bool, default = false
  show usage information
  --helpshort: type = bool, default = false
  show brief usage information
  --tmpdir: type = string, default = "/tmp"
  temporary directory
  --v: type = int32, default = 0
  verbose level
  --help: type = bool, default = false
  show usage information
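
The help text describes --input as a two-column file (the word, then its pronunciation). Before running the aligner it can help to confirm that every entry really has a second column. This is a rough sketch only: check_lexicon is a hypothetical helper name, and it assumes the columns are separated by spaces or tabs.

```shell
# check_lexicon FILE: report entries that lack a pronunciation column.
# Assumes whitespace (space or tab) between the word and its phones.
# Blank lines are ignored. Exit status is 1 if any bad entry was found.
check_lexicon() {
    awk 'NF == 1 { printf "line %d: no pronunciation for %s\n", NR, $1; bad = 1 }
         END { exit bad }' "$1"
}
```

For example, `check_lexicon coba.dic` prints nothing and returns 0 when every line has at least a word and one phone.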

================= Building Dictionary with g2p-seq2seq===========================
1. Clone this repo

$ git clone https://github.com/cmusphinx/g2p-seq2seq.git

$ cd g2p-seq2seq

2. Install TensorFlow

$ sudo pip install --upgrade https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-0.9.0-cp27-none-linux_x86_64.whl

3. Install g2p-seq2seq

$ sudo python setup.py install

4. Run g2p-seq2seq

$ wget -O g2p-seq2seq-cmudict.tar.gz https://sourceforge.net/projects/cmusphinx/files/G2P%20Models/g2p-seq2seq-cmudict.tar.gz/download

$ tar xf g2p-seq2seq-cmudict.tar.gz

$ g2p-seq2seq --interactive --model g2p-seq2seq-cmudict

$ g2p-seq2seq --decode your_wordlist.txt --model g2p-seq2seq-cmudict
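
The --decode call needs a word list. A rough sketch for deriving your_wordlist.txt from a sentence corpus follows. It assumes the corpus uses the "<s> ... </s>" format shown in the language model section and that g2p-seq2seq wants one word per line. make_wordlist is a hypothetical helper name.

```shell
# make_wordlist CORPUS: print the unique words of a "<s> ... </s>" corpus, one per line.
make_wordlist() {
    tr -s ' \t' '\n\n' < "$1" |                      # put every token on its own line
        grep -v -e '^<s>$' -e '^</s>$' -e '^$' |     # drop sentence markers and blank lines
        LC_ALL=C sort -u                             # unique words in a fixed sort order
}
```

Usage: `make_wordlist coba/etc/coba.txt > your_wordlist.txt`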

=========================Building language model=============================
1. Complete the environment setup above first.
2. Install cmuclmtk; download the source from:
https://sourceforge.net/projects/cmusphinx/files/cmuclmtk/0.7/cmuclmtk-0.7.tar.gz/download

$ tar -xvzf cmuclmtk-0.7.tar.gz

$ cd cmuclmtk-0.7/

$ ./configure

$ make

$ sudo make install

4. Go to the root folder and initialize the project. The project name here is "coba".

$ cd coba

$ sphinxtrain -t coba setup

5. Create a file named "coba.txt" in the directory coba/etc/ and fill it with text like this:

<s> kalimat bahasa indonesia 1 </s>
<s> kalimat bahasa indonesia 2 </s>
<s> kalimat bahasa indonesia 3 </s>
<s> kalimat bahasa indonesia 4 </s>

6. convert to vocab

$ text2wfreq < coba.txt | wfreq2vocab > coba.vocab

7. Generate the arpa format language model

$ text2idngram -vocab coba.vocab -idngram coba.idngram < coba.txt

$ idngram2lm -vocab_type 0 -idngram coba.idngram -vocab coba.vocab -arpa coba.lm

8. Generate the CMU binary form (BIN)

$ sphinx_lm_convert -i coba.lm -o coba.lm.bin

$ sphinx_lm_convert -i coba.lm -o coba.lm.DMP

$ sphinx_lm_convert -i coba.lm.bin -ifmt bin -o coba.lm -ofmt arpa
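
Steps 6 to 8 can be chained so that a failure in one tool stops the rest. A minimal sketch, assuming the corpus is NAME.txt in the current directory; build_lm is a hypothetical wrapper name, and the commands inside it are the ones from the steps above.

```shell
# build_lm NAME: build NAME.vocab, NAME.idngram, NAME.lm and NAME.lm.bin from NAME.txt.
# Stops at the first failing step and returns a non-zero status.
build_lm() {
    name=$1
    # Check that every required tool is installed before touching any file.
    for tool in text2wfreq wfreq2vocab text2idngram idngram2lm sphinx_lm_convert; do
        command -v "$tool" >/dev/null 2>&1 || { echo "missing: $tool" >&2; return 1; }
    done
    text2wfreq < "$name.txt" | wfreq2vocab > "$name.vocab" &&
    text2idngram -vocab "$name.vocab" -idngram "$name.idngram" < "$name.txt" &&
    idngram2lm -vocab_type 0 -idngram "$name.idngram" -vocab "$name.vocab" -arpa "$name.lm" &&
    sphinx_lm_convert -i "$name.lm" -o "$name.lm.bin"
}
```

Usage, from inside coba/etc/: `build_lm coba`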

=========================Create Acoustic Model==============================
1. Go to the coba/ directory
2. Create a "wav" folder
3. Move all audio files to the wav folder (16-bit, 16000 Hz, mono)
4. Verify the files you want to use

$ sudo apt install sox

$ for i in *.wav; do play "$i"; done

$ for f in *.wav; do
     sox "$f" -r 16000 -c 1 -b 16 "$f.new.wav" && mv "$f.new.wav" "$f"
 done
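
Listening to the files does not show whether they are really 16-bit, 16000 Hz, mono. A rough sketch of a header check follows. wav_format is a hypothetical helper name. It assumes a canonical PCM WAV header (the fmt chunk directly after "RIFF....WAVE") and a little-endian machine, because od reads multi-byte integers in host byte order.

```shell
# wav_format FILE: print "<channels> <sample rate> <bits per sample>" from a WAV header.
# Offsets follow the canonical 44-byte PCM header; files with extra chunks
# before "fmt " will give wrong numbers.
wav_format() {
    channels=$(od -An -t u2 -j 22 -N 2 "$1" | tr -d ' ')
    rate=$(od -An -t u4 -j 24 -N 4 "$1" | tr -d ' ')
    bits=$(od -An -t u2 -j 34 -N 2 "$1" | tr -d ' ')
    echo "$channels $rate $bits"
}

# Report every file in the current folder that is not mono, 16000 Hz, 16-bit.
for f in *.wav; do
    [ -e "$f" ] || continue
    [ "$(wav_format "$f")" = "1 16000 16" ] || echo "needs conversion: $f"
done
```

Files reported here are the ones that still need the sox conversion above.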

5. Edit your configuration file (etc/sphinx_train.cfg)

========================================================================================================================================================================================================================

path="/home/kirra/Documents/projek/sphinx/suara"
path_wav=${path}/wav
path_etc=${path}/etc
name_project="suara"
train_trans=${name_project}"_train.transcription"
test_trans=${name_project}"_test.transcription"
train_fileids=${name_project}"_train.fileids"
test_fileids=${name_project}"_test.fileids"

DATA=(001 002 003 004 005 006 007 008 009 010 090 091 092 093 094 095 096 097 098 099 100)
TRAIN=(001 002 003 004 005 006 007 008 009 010 090 091 092 093 094 095 096 097 098 099)
TEST=(100)

# #convert wav files to 16000 Hz
# ##check file if exist
# if [ -f listofsound123.txt ]; then
# echo "File listofsound123 found! remove it"
# rm listofsound123.txt
# fi

# ## convert audio
# for n in ${DATA[@]}; do
# for i in $(ls ../wav/$n | grep wav); do
# #rm ${path}/wav/$n/$i
# echo "${i%.wav}" >> listofsound123.txt;
# sox -S ${path}/wav/$n/$i -r 16000 ${path}/wav/$n/$i.new.wav;
# mv ${path}/wav/$n/$i.new.wav ${path}/wav/$n/$i;
# done
# done

# # make korpus
# ##make file exist
# # echo awal > $name_project".txt"
# ##deletefile
# # rm $name_project".txt"

# ##check file if exist
# if [ -f $name_project".txt" ]; then
# echo "File" $name_project".txt found! remove it"
# rm $name_project".txt"
# fi
# # convert text to corpus
# for n in ${DATA[@]}; do
# for i in $(ls ../wav/$n | grep txt); do
# value=$(<${path_wav}/$n/$i)
# echo "<s> "$value" </s>" >> $name_project".txt"
# done
# done

# ##check file if exist
# if [ -f $name_project".vocab" ]; then
# echo "File" $name_project".vocab found! remove it"
# rm $name_project".vocab"
# fi
# text2wfreq < $name_project".txt" | wfreq2vocab > $name_project".vocab"

# ##check file if exist
# if [ -f $name_project".idngram" ]; then
# echo "File" $name_project".idngram found! remove it"
# rm $name_project".idngram"
# fi
# text2idngram -vocab $name_project".vocab" -idngram $name_project".idngram" < $name_project".txt"

# ##check file if exist
# if [ -f $name_project".lm" ]; then
# echo "File" $name_project".lm found! remove it"
# rm $name_project".lm"
# fi
# idngram2lm -vocab_type 0 -idngram $name_project".idngram" -vocab $name_project".vocab" -arpa $name_project".lm"

# ##check file if exist
# if [ -f $name_project".lm.bin" ]; then
# echo "File" $name_project".lm.bin found! remove it"
# rm $name_project".lm.bin"
# fi
# sphinx_lm_convert -i $name_project".lm" -o $name_project".lm.bin"

# ##check file if exist
# if [ -f $name_project".lm.DMP" ]; then
# echo "File" $name_project".lm.DMP found! remove it"
# rm $name_project".lm.DMP"
# fi
# sphinx_lm_convert -i $name_project".lm" -o $name_project".lm.DMP"

# sphinx_lm_convert -i $name_project".lm.bin" -ifmt bin -o $name_project".lm" -ofmt arpa

=========================Building Acoustic model=============================

# build the training transcription file
# remove the old file if it exists
# if [ -f $train_trans ]; then
# echo "File" $train_trans" found! remove it"
# rm $train_trans
# fi
# for n in ${TRAIN[@]}; do
# for i in $(ls ../wav/$n | grep txt); do
# value=$(<${path_wav}/$n/$i)
# echo "<s> $value </s> ($n/${i%.txt})" >> $train_trans;
# echo $i "done"
# done
# done

# #list name of sound test
# ##check file if exist
# if [ -f $test_trans ]; then
# echo "File" $test_trans" found! remove it"
# rm $test_trans
# fi
# for n in ${TEST[@]}; do
# for i in $(ls ../wav/$n | grep txt); do
# value=$(<${path_wav}/$n/$i)
# echo "<s> $value </s> ($n/${i%.txt})" >> $test_trans
# echo $i "done"
# done
# done

# list the file IDs of the training audio (path relative to wav/, without the .wav extension)
# remove the old list if it exists
if [ -f "$train_fileids" ]; then
    echo "File $train_fileids found! remove it"
    rm "$train_fileids"
fi
for n in "${TRAIN[@]}"; do
    for i in $(ls "../wav/$n" | grep '\.wav$'); do
        # strip only the .wav suffix; tr --delete would remove every ".", "w", "a" and "v"
        echo "$n/${i%.wav}" >> "$train_fileids"
        echo "$i done"
    done
done

# list the file IDs of the test audio (path relative to wav/, without the .wav extension)
# remove the old list if it exists
if [ -f "$test_fileids" ]; then
    echo "File $test_fileids found! remove it"
    rm "$test_fileids"
fi
for n in "${TEST[@]}"; do
    for i in $(ls "../wav/$n" | grep '\.wav$'); do
        # strip only the .wav suffix; tr --delete would remove every ".", "w", "a" and "v"
        echo "$n/${i%.wav}" >> "$test_fileids"
        echo "$i done"
    done
done
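
Once the fileids lists exist, it is worth confirming that every listed ID has a matching audio file before training starts. A minimal sketch follows. check_fileids is a hypothetical helper name, and it assumes each line of the fileids file is a path relative to the wav folder, without the extension.

```shell
# check_fileids FILEIDS WAVDIR: report ids whose WAVDIR/<id>.wav is missing.
# Returns 1 if at least one audio file is missing, 0 otherwise.
check_fileids() {
    missing=0
    while IFS= read -r id; do
        [ -n "$id" ] || continue              # skip blank lines
        if [ ! -f "$2/$id.wav" ]; then
            echo "missing audio: $2/$id.wav"
            missing=1
        fi
    done < "$1"
    return "$missing"
}
```

Usage, from the project folder: `check_fileids etc/suara_train.fileids wav`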

=================================NOTE====================================
HTK
JULIUS
SPHINX

http://cmusphinx.sourceforge.net/wiki/

http://cmusphinx.sourceforge.net/wiki/tutoriallm#keyword_lists
-> install the CMU Sphinx tools (text2wfreq, etc.)
https://github.com/jasperproject/jasper-client/issues/231

create a dictionary
http://www.speech.cs.cmu.edu/tools/lmtool-new.html

audio.online-convert.com/convert-to-wav

run pocketsphinx with your own dictionary file
pocketsphinx_continuous -inmic yes -lm 4171.lm -dict 4171.dic

recognize speech from an external audio file using your own dictionary file
pocketsphinx_continuous -infile pproject.wav -keyphrase "udah selesai tadi malem" -kws_threshold 1e-20 -time yes -lm 4171.lm -dict 4171.dic

pocketsphinx_continuous -hmm model_parameters/coba.cd_cont_200/ -lm etc/coba.lm.bin -dict etc/coba.dic -infile wav/project19.wav

create the project setup

sphinxtrain -t project setup

sphinx_fe -argfile suara.cd_ptm_4000/feat.params -samprate 16000 -c arctic20.fileids -di . -do . -ei wav -eo mfc -mswav yes

./mllr_solve -meanfn suara.cd_ptm_4000/means -varfn suara.cd_ptm_4000/variances -outmllrfn mllr_matrix -accumdir .

./map_adapt -moddeffn suara.cd_ptm_4000/mdef.txt -ts2cbfn .ptm. -meanfn suara.cd_ptm_4000/means -varfn suara.cd_ptm_4000/variances -mixwfn suara.cd_ptm_4000/mixture_weights -tmatfn suara.cd_ptm_4000/transition_matrices -accumdir . -mapmeanfn suara.cd_ptm_4000_adapt/means -mapvarfn suara.cd_ptm_4000_adapt/variances -mapmixwfn suara.cd_ptm_4000_adapt/mixture_weights -maptmatfn suara.cd_ptm_4000_adapt/transition_matrices

./bw -hmmdir suara.cd_ptm_4000 -moddeffn suara.cd_ptm_4000/mdef.txt -ts2cbfn .ptm. -feat 1s_c_d_dd -svspec 0-12/13-25/26-38 -cmn current -agc none -dictfn suara.dic -ctlfn arctic20.fileids -lsnfn arctic20.transcription -accumdir .

pocketsphinx_batch -adcin yes -cepdir wav -cepext .wav -ctl test.fileids -lm suara.lm.DMP -dict suara.dic -hmm suara.cd_ptm_200 -hyp arctic20.hyp

python2.7-config --cflags

python2.7-config --ldflags

gcc pocketsphinx_wrap.c -o pocketsphinx_wrap -I/home/kirra/anaconda3/include/python3.5m  -Wsign-compare  -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/usr/local/include/pocketsphinx -I/usr/local/include/sphinxbase

ref:
https://ubuntuforums.org/showthread.php?t=2151421
https://lists.gnu.org/archive/html/automake/2007-05/msg00018.html
http://www.openfst.org/twiki/bin/view/FST/FstExtensions
https://github.com/flyfei/phonetisaurus/issues/38

Thursday, December 22, 2016