System for discrete speech recognition
| First version: | 1994 | Last version: | 1997 |
| Application: | ||||
| Recognition of dictation-like speech with a limited dictionary in noisy conditions. The prototype can be used to develop voice control systems, dictation systems, etc. | ||||
| Description: | ||||
| The system accepts spoken input, either live or recorded in a RIFF (WAV) file. It recognizes "discrete" speech, i.e. speech consisting of separate words with pauses of at least 200 ms between them. Speech must be relatively slow, and every word must be pronounced in full, without truncation. The system recognizes about 500 words in all their grammatical forms, using a built-in dictionary of about 15,000 word forms. To improve recognition, the system takes into account the contextual characteristics of sound realisation and uses a built-in dictionary of sound segments. A grammatical (syntactic) analyser is applied at the post-processing stage. The system is speaker-independent, but it can also be trained to recognize an individual voice; training uses a text created specially for this purpose. Original methods and algorithms: division of vocal segments into "frames", i.e. almost periodic segments; segmentation into fragments consisting of a whole number of phonemes (this reduces the number of analysis variants); a flexible spectral representation of speech signals (this increases the precision of formant trajectory detection). | ||||
| Languages processed: | ||||
| Currently the system is used to process Russian and English speech, but it may also be applied to other languages. The interface is available in Russian and English. | ||||
| Type of processing: | ||||
| Neural nets; original methods of speech signal segmentation and formant selection; speech recognition based on triphones; method of formant estimation; syntactic analyser. | ||||
| Hardware/software requirements: | ||||
| IBM PC x86, Pentium; 32 MB RAM; Windows 3.11, 95/98/NT; Sound Blaster. The microphone must be fixed on the speaker's head and connected directly to the digitising unit. | ||||
| Distribution: | Prototype | |||
| Developer team: | Contact person: | |||
| Cognitive Technologies | Vladimir L. Arlazarov. Phone: (+7 095) 135 50 88, 135 55 10. Fax: (+7 095) 135 50 88. E-mail: arl@cgntv.dol.ru |
|||
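
The description above requires pauses of at least 200 ms between words. The following is a minimal sketch of how such pauses could be used to cut a recording into word-sized stretches. It is not the system's own segmentation method, which is not documented here: the function name `split_words`, the 10 ms analysis window, and the RMS energy threshold are all illustrative assumptions, and only the 200 ms minimum pause comes from the description. The 10 ms energy windows in the code are unrelated to the quasi-periodic "frames" mentioned in the description.

```python
import math


def split_words(samples, rate, min_pause_ms=200, frame_ms=10, threshold=0.02):
    """Return (start, end) sample-index pairs for stretches of speech
    separated by pauses of at least min_pause_ms.

    samples   -- sequence of floats in the range [-1.0, 1.0]
    rate      -- sampling rate in Hz
    threshold -- assumed RMS level below which a window counts as silence
    """
    frame_len = max(1, rate * frame_ms // 1000)
    min_pause_frames = math.ceil(min_pause_ms / frame_ms)

    # Mark each fixed-length window as speech (True) or silence (False).
    active = []
    for i in range(0, len(samples), frame_len):
        window = samples[i:i + frame_len]
        rms = math.sqrt(sum(x * x for x in window) / len(window))
        active.append(rms >= threshold)

    words = []
    start = None   # index of the first window of the current word
    silence = 0    # number of consecutive silent windows seen so far
    for idx, is_speech in enumerate(active):
        if is_speech:
            if start is None:
                start = idx
            silence = 0
        elif start is not None:
            silence += 1
            if silence >= min_pause_frames:
                # The pause is long enough: close the word at the
                # first silent window of this pause.
                end = idx - silence + 1
                words.append((start * frame_len,
                              min(end * frame_len, len(samples))))
                start = None
                silence = 0

    # Close a word that is still open at the end of the recording.
    if start is not None:
        end = len(active) - silence
        words.append((start * frame_len, min(end * frame_len, len(samples))))
    return words
```

A pause shorter than `min_pause_ms` is treated as part of the current word, so short silences inside a word do not split it. A fixed energy threshold is unlikely to hold up in noisy conditions; a real system would need an adaptive or more robust detector.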