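Kaldi is a speech recognition toolkit written in C++ and released under the Apache License v2.0, and it is one of the most popular ASR tools today. This article walks through installing Kaldi on Ubuntu 18.04 LTS.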

First, following the instructions on the official site (kaldi-asr.org/doc/tutor), clone the Kaldi project to your machine:

~$ git clone https://github.com/kaldi-asr/kaldi.git kaldi-trunk --origin golden

Enter kaldi-trunk:

~$ cd kaldi-trunk
~/kaldi-trunk$

Read INSTALL:

~/kaldi-trunk$ cat INSTALL
This is the official Kaldi INSTALL. Look also at INSTALL.md for the git mirror installation.
[for native Windows install, see windows/INSTALL]

(1)
go to tools/ and follow INSTALL instructions there.

(2)
go to src/ and follow INSTALL instructions there.

So the order is: install under the tools directory first, then under the src directory, following the instructions in each.

Enter the tools directory and read its INSTALL:

~/kaldi-trunk$ cd tools
~/kaldi-trunk/tools$ cat INSTALL
To check the prerequisites for Kaldi, first run

extras/check_dependencies.sh

and see if there are any system-level installations you need to do. Check the
output carefully. There are some things that will make your life a lot easier
if you fix them at this stage. If your system default C++ compiler is not
supported, you can do the check with another compiler by setting the CXX
environment variable, e.g.

CXX=g++-4.8 extras/check_dependencies.sh

Then run

make

which by default will install ATLAS headers, OpenFst, SCTK and sph2pipe.
OpenFst requires a relatively recent C++ compiler with C++11 support, e.g.
g++ >= 4.7, Apple clang >= 5.0 or LLVM clang >= 3.3. If your system default
compiler does not have adequate support for C++11, you can specify a C++11
compliant compiler as a command argument, e.g.

make CXX=g++-4.8

If you have multiple CPUs and want to speed things up, you can do a parallel
build by supplying the "-j" option to make, e.g. to use 4 CPUs

make -j 4

In extras/, there are also various scripts to install extra bits and pieces that
are used by individual example scripts. If an example script needs you to run
one of those scripts, it will tell you what to do.

So the first step is to go into the extras directory and run the check_dependencies.sh script to check whether all dependencies are installed.

Enter extras and run check_dependencies.sh:

~/kaldi-trunk/tools$ cd extras/
~/kaldi-trunk/tools/extras$ ./check_dependencies.sh
./check_dependencies.sh: all OK.

If check_dependencies.sh reports that any library is missing, follow its hints to fix the problem, and rerun it until it prints "./check_dependencies.sh: all OK." as shown above.
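The "rerun until all OK" check can be scripted. The following is a minimal sketch, not part of Kaldi: the helper name deps_ok and the log file deps.log are made up for illustration, and the only thing it relies on is the "all OK." line shown in the transcript above.

```shell
# deps_ok (hypothetical helper): read check_dependencies.sh output on stdin
# and succeed only if it contains the "all OK." line shown above.
deps_ok() {
  grep -q 'all OK\.'
}

# Run the real check only when the script is present (i.e. from inside tools/).
if [ -x extras/check_dependencies.sh ]; then
  extras/check_dependencies.sh > deps.log 2>&1   # deps.log is an assumed name
  if deps_ok < deps.log; then
    echo "dependencies satisfied"
  else
    echo "dependencies missing, follow the hints below:"
    cat deps.log
  fi
fi
```

Run from the tools directory, this prints the script's own hints whenever the "all OK." line is absent, so they can be fixed before starting the long build.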

Then go back up one level and build:

~/kaldi-trunk/tools/extras$ cd ..
~/kaldi-trunk/tools$ make

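On a virtual machine, use plain make rather than make -j 4; otherwise the build can easily run out of memory and fail. The same applies to the later build under src.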
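One way to choose between plain make and a parallel build is to cap the job count by available memory. The sketch below is only a rule of thumb: the helper pick_jobs is hypothetical, and the 2048 MB per job budget is an assumption, not a figure from the Kaldi documentation.

```shell
# pick_jobs CORES MEM_MB (hypothetical helper): print a conservative value
# for make's -j option, assuming roughly 2048 MB of RAM per compile job.
pick_jobs() {
  by_mem=$(( $2 / 2048 ))
  if [ "$by_mem" -lt 1 ]; then
    by_mem=1
  fi
  if [ "$by_mem" -lt "$1" ]; then
    echo "$by_mem"
  else
    echo "$1"
  fi
}

# On Linux, nproc reports the core count and /proc/meminfo the total RAM in kB.
if command -v nproc >/dev/null 2>&1 && [ -r /proc/meminfo ]; then
  cores=$(nproc)
  mem_mb=$(awk '/^MemTotal:/ { print int($2 / 1024) }' /proc/meminfo)
  echo "suggested: make -j $(pick_jobs "$cores" "$mem_mb")"
fi
```

On a small virtual machine this falls back to a single job, which matches the advice above.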

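After make finishes, it may report that irstlm is not installed. You can run extras/install_irstlm.sh to install it, but if that does not work you can leave it for now and finish the rest of the Kaldi installation first.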

Enter the src directory and read its INSTALL:

~/kaldi-trunk/tools$ cd ../src
~/kaldi-trunk/src$ cat INSTALL

These instructions are valid for UNIX-like systems (these steps have
been run on various Linux distributions; Darwin; Cygwin). For native Windows
compilation, see ../windows/INSTALL.

You must first have completed the installation steps in ../tools/INSTALL
(compiling OpenFst; getting ATLAS and CLAPACK headers).

The installation instructions are

./configure --shared
make depend -j 8
make -j 8

Note that we added the "-j 8" to run in parallel because "make" takes a long
time. 8 jobs might be too many for a laptop or small desktop machine with not
many cores.

For more information, see documentation at http://kaldi-asr.org/doc/
and click on "The build process (how Kaldi is compiled)".

Run ./configure --shared:

~/kaldi-trunk/src$ ./configure --shared
Configuring ...
Backing up kaldi.mk to kaldi.mk.bak ...
Checking compiler g++ ...
Checking OpenFst library in /home/zillyrex/kaldi-trunk/tools/openfst ...
Doing OS specific configurations ...
On Linux: Checking for linear algebra header files ...
Using ATLAS as the linear algebra library.
Atlas found in /usr/lib/x86_64-linux-gnu
Validating presence of ATLAS libs in /usr/lib/x86_64-linux-gnu
Using library /usr/lib/x86_64-linux-gnu/liblapack.so as ATLASs CLAPACK library.
CUDA will not be used! If you have already installed cuda drivers
and cuda toolkit, try using --cudatk-dir=... option. Note: this is
only relevant for neural net experiments
Info: configuring Kaldi not to link with Speex (dont worry, its only needed if you
intend to use compress-uncompress-speex, which is very unlikely)
Successfully configured for Linux [dynamic libraries] with ATLASLIBS =/usr/lib/x86_64-linux-gnu/liblapack.so /usr/lib/x86_64-linux-gnu/libcblas.so /usr/lib/x86_64-linux-gnu/libatlas.so /usr/lib/x86_64-linux-gnu/libf77blas.so
SUCCESS
To compile: make clean -j; make depend -j; make -j
... or e.g. -j 10, instead of -j, to use a specified number of CPUs

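Read the output of configure carefully. It may differ from what is shown above: it tells you what is not installed correctly and gives guidance. Follow that guidance to install the missing dependencies until configure ends with "SUCCESS To compile: ..." as shown above. Only then should you continue; otherwise make will fail after running for a long time.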
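To avoid starting a long build after a failed configure, the SUCCESS check can be automated. This is a minimal sketch under stated assumptions: configure_succeeded and configure.log are illustrative names, and the test relies only on the standalone SUCCESS line visible in the transcript above.

```shell
# configure_succeeded LOG (hypothetical helper): succeed only if the saved
# configure output contains a line that is exactly "SUCCESS".
configure_succeeded() {
  grep -qx 'SUCCESS' "$1"
}

# Only attempt the real run from inside Kaldi's src/ directory.
if [ -x ./configure ] && [ -d ../tools/openfst ]; then
  ./configure --shared > configure.log 2>&1   # configure.log is an assumed name
  if configure_succeeded configure.log; then
    echo "configure OK, safe to run make"
  else
    echo "configure did not report SUCCESS; read configure.log first"
  fi
fi
```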

Run the final step and compile the Kaldi source:

~/kaldi-trunk/src$ make depend
...
...
~/kaldi-trunk/src$ make
...
...
...
Done

make takes a while, roughly half an hour to an hour. If no red error appears during the build and the output ends with "Done", the build succeeded.
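Since the build output is long, it can be easier to save it to a file and check it afterwards instead of watching for red text. The sketch below assumes GNU make's usual failure format ("*** ... Error N") and the final "Done" line shown above; build_ok, depend.log and make.log are names chosen for illustration.

```shell
# build_ok LOG (hypothetical helper): fail if the log is missing or contains
# a GNU make failure line, otherwise require a line that is exactly "Done".
build_ok() {
  [ -f "$1" ] || return 1
  if grep -q '\*\*\*.*Error' "$1"; then
    return 1
  fi
  grep -qx 'Done' "$1"
}

# Only attempt the real build from a configured src/ directory.
if [ -f kaldi.mk ] && [ -f Makefile ]; then
  make depend > depend.log 2>&1 && make > make.log 2>&1
  if build_ok make.log; then
    echo "build finished cleanly"
  else
    echo "build failed or incomplete; inspect depend.log and make.log"
  fi
fi
```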

Finally, run an example to verify the installation, using run.sh in the egs/yesno/s5 directory:

~/kaldi-trunk/src$ cd ../egs/yesno/s5/
~/kaldi-trunk/egs/yesno/s5$ ./run.sh
Preparing train and test data
Dictionary preparation succeeded
utils/prepare_lang.sh --position-dependent-phones false data/local/dict <SIL> data/local/lang data/lang
Checking data/local/dict/silence_phones.txt ...
--> reading data/local/dict/silence_phones.txt
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> data/local/dict/silence_phones.txt is OK
...
...
...
local/score.sh: scoring with word insertion penalty=0.0,0.5,1.0
%WER 0.00 [ 0 / 232, 0 ins, 0 del, 0 sub ] exp/mono0a/decode_test_yesno/wer_10_0.0

Output like the above indicates that Kaldi was installed successfully.
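If you save the output of run.sh, the word error rate can be pulled out of the %WER line for a quick check. This is a small sketch: wer_value and run.log are hypothetical names, and it assumes the %WER line has the layout shown in the transcript above.

```shell
# wer_value (hypothetical helper): read scoring output on stdin and print
# the number that follows "%WER" on the first such line.
wer_value() {
  awk '$1 == "%WER" { print $2; exit }'
}

# Assumes the run was captured beforehand, e.g. with: ./run.sh > run.log 2>&1
if [ -f run.log ]; then
  echo "WER: $(wer_value < run.log)"
fi
```

The value reflects whatever the recipe printed; the yesno run shown above reported 0.00.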

