System for discrete speech recognition

First version: 1994
Last version: 1997

Application:  
  Recognition of dictation-style speech with a limited vocabulary in noisy conditions. The prototype can be used to develop voice control systems, dictation systems, etc.  
Description:  
  The system accepts spoken input, either live or recorded in a RIFF (WAV) file. It recognizes "discrete" speech, i.e. speech consisting of separate words with pauses of at least 200 ms between them. Speech must be relatively slow, and every word must be pronounced in full (without truncation). The system recognizes about 500 words in all their grammatical forms, using a built-in dictionary of about 15 000 word forms. To improve recognition, the system takes into account the contextual characteristics of sound realisation and uses a built-in dictionary of sound segments. A grammatical (syntactic) analyser is applied at the post-processing stage.  
  The system is speaker-independent, but it can also be trained to recognize an individual voice. Training uses a text created specially for this purpose.  
  Original methods and algorithms: division of vocal segments into "frames", i.e. nearly periodic segments; segmentation into fragments consisting of a whole number of phonemes (which reduces the number of analysis variants); flexible spectral representation of speech signals (which improves the precision of formant trajectory detection).  
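  The 200 ms pause requirement can be illustrated with a small sketch. This is not the system's own segmentation method, which the entry does not describe in detail; it is a generic energy-based pause detector written under stated assumptions. The function name `split_words`, the 10 ms analysis frame, and the RMS threshold of 0.02 are all hypothetical choices, and samples are assumed to be mono floats scaled to the range -1.0 to 1.0.

```python
import math


def split_words(samples, rate, frame_ms=10, min_pause_ms=200, threshold=0.02):
    """Return (start, end) sample indices of word candidates.

    Illustrative only: a frame counts as silent when its RMS amplitude is
    below `threshold` (an assumed value). A run of silent frames lasting at
    least `min_pause_ms` closes the current word.
    """
    frame_len = max(1, rate * frame_ms // 1000)
    pause_frames = math.ceil(min_pause_ms / frame_ms)

    words = []
    start = None        # first sample of the word being collected, if any
    voiced_end = None   # end of the most recent non-silent frame
    silent_run = 0      # consecutive silent frames seen inside a word

    for pos in range(0, len(samples), frame_len):
        frame = samples[pos:pos + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        if rms >= threshold:
            if start is None:
                start = pos
            voiced_end = pos + len(frame)
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= pause_frames:
                # The pause is long enough: the word ended at the last voiced frame.
                words.append((start, voiced_end))
                start = None
                silent_run = 0

    if start is not None:
        # The input ended before a full pause was observed.
        words.append((start, voiced_end))
    return words
```

  With these defaults, a 100 ms gap inside an utterance is kept within one word, while a gap of 200 ms or more starts a new one. Samples could be obtained from a WAV file with Python's standard `wave` module and scaled to floats before calling the function.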
Languages processed:  
  The system is currently used to process Russian and English speech, but it may also be applied to other languages.
Interface – in Russian and in English
 
Type of processing:  
  Neural nets; original methods of speech signal segmentation and formant selection; speech recognition based on triphones; method of formant estimation; syntactic analyser.  
Hardware/software requirements:
  IBM PC (x86, Pentium), 32 MB RAM, Windows 3.11/95/98/NT, Sound Blaster sound card. The microphone must be mounted on the speaker's head and connected directly to the digitising unit.  
Distribution: Prototype  

Developer team: Cognitive Technologies  
Contact person: Vladimir L. Arlazarov
Phone: (+7 095) 135 50 88, 135 55 10
Fax: (+7 095) 135 50 88
E-mail: arl@cgntv.dol.ru