Research

A Language Modeling Method based on Subspace Gaussian Mixture Models in Speech Recognition

2017.10.5.

The great leader Comrade Kim Il Sung said as follows.

"Long-term research should be conducted with a view to opening up new scientific fields and introducing the latest developments in science and technology widely in the national economy." ("KIM IL SUNG WORKS" Vol.35, 313p)

To extend the application fields and tasks of speech recognition, research on language modeling in continuous space has recently been carried out widely. Language models based on subspace Gaussian mixture models (SGMLMs) are a kind of continuous-space language model, like language models based on Gaussian mixture models (GMLMs) and on recurrent neural networks (RNNLMs). With efficient training from a small amount of training text, SGMLMs have significantly improved the recognition accuracy of the open-vocabulary Korean speech recognizer 《Ryongnamsan》, and with various effective adaptation methods they have been applied successfully to specific small-vocabulary tasks such as filling in forms (for example, a cahier). This approach is therefore promising not only for language modeling in speech recognition but also for tasks involving natural language processing.

As shown above, the way a model of a word is built is similar to the way a speaker model is built in GMM-based speaker recognition. For example, the distributions of morphemes such as "school", "Asia" and "at" can be spanned by simple linear transforms over a universal background model. We built n-gram, GMLM and SGMLM models on a vocabulary of about 60,000 words and obtained perplexities of 106, 90 and 60, respectively, on an evaluation text of 4,500 sentences. To compare the corresponding recognition rates, we performed recognition experiments on a test set of 400 utterances; the syllable recognition rates obtained were 97.78%, 97.89% and 98.34%, respectively. As in the case of the GMLM, the SGMLM was used in a second pass during decoding. We also performed adaptation experiments for the n-gram, GMLM and SGMLM with adaptation texts of various sizes. As a result, the GMLM (FMLLR) and SGMLM (FMLLR) improved performance by 22.5% and 65.39% relative to the n-gram (MAP), respectively.
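The following is a minimal illustrative sketch (in Python, not taken from the paper) of how a subspace Gaussian mixture language model of this kind might be organized: each word w has a low-dimensional vector v_w, a shared universal background model (UBM) supplies mixture weights and variances, and the word-specific mixture means are obtained as mu_{w,k} = M_k v_w, so that word distributions are "spanned by simple linear transforms over a universal background model" as described above. The continuous history representation, the uniform word prior, the dimensions and all symbol names are assumptions made for illustration only; the second-pass rescoring and the FMLLR/MAP adaptation mentioned above are not shown.

    import numpy as np

    rng = np.random.default_rng(0)

    V = 1000   # vocabulary size (illustrative; the experiments above use ~60,000 words)
    D = 32     # dimension of the continuous history representation
    K = 8      # number of shared Gaussian components (the UBM)
    S = 10     # subspace (word-vector) dimension

    # Shared UBM parameters: component weights and diagonal variances.
    ubm_weights = np.full(K, 1.0 / K)
    ubm_vars = np.ones((K, D))

    # Shared projection matrices M_k and per-word vectors v_w.  The word-specific
    # component means are mu_{w,k} = M_k v_w, i.e. each word model is spanned by
    # simple linear transforms over the UBM.
    M = rng.normal(size=(K, D, S))
    word_vectors = rng.normal(size=(V, S))

    def log_gauss_diag(x, mean, var):
        # Log density of a diagonal-covariance Gaussian.
        return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)

    def word_log_likelihood(w, history_vec):
        # log p(history_vec | w) under word w's subspace-derived mixture.
        means = M @ word_vectors[w]          # (K, D) word-specific means
        comps = [np.log(ubm_weights[k]) + log_gauss_diag(history_vec, means[k], ubm_vars[k])
                 for k in range(K)]
        return np.logaddexp.reduce(comps)

    def next_word_log_probs(history_vec):
        # log P(w | history) by Bayes' rule, assuming a uniform word prior.
        ll = np.array([word_log_likelihood(w, history_vec) for w in range(V)])
        return ll - np.logaddexp.reduce(ll)  # normalize over the vocabulary

    def perplexity(history_vecs, next_words):
        # Perplexity of the model over (history, next-word) pairs from an evaluation text.
        total = sum(next_word_log_probs(h)[w] for h, w in zip(history_vecs, next_words))
        return float(np.exp(-total / len(next_words)))

    # Toy usage with random data standing in for a featurized evaluation text.
    histories = rng.normal(size=(20, D))
    targets = rng.integers(0, V, size=20)
    print("toy perplexity:", perplexity(histories, targets))

In a real system the UBM, the projection matrices M_k and the word vectors v_w would be estimated from training text, and the resulting word probabilities would be used to rescore hypotheses in the second decoding pass, as described above.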