Another solution is to make the parameters used to represent speech patterns more robust. This solution does not solve problems such as the inclusion of an intruding word pronounced by another person, or by the user himself, during the learning phase. European patent application 0,, describes a learning process in which a recognition test is run on the user's first pronunciation of the new word.

If another word from the reference dictionary is recognized during this test, the user is notified that the word he has just said is too similar to a word already in the dictionary. If the test does not lead to the recognition of another word in the dictionary, the user is asked to repeat the new word. The processing carried out on the repetitions does not involve any recognition test.

A rejection model (a "garbage model") is simply used to "explain" the portions of speech that do not belong to the previously formed model of the new word. In other words, the model under construction and the rejection model are used together to obtain a suitable segmentation, which filters out sounds possibly produced by a hesitant or clumsy user.

Using this segmentation, the model being learned is updated, and a check is made that the update took place under good conditions. Unlike the test run on the first pronunciation of the word, this verification of a "good" update does not involve any recognition test over the whole dictionary, including the previously learned words.

An object of the present invention is to achieve good-quality learning from a relatively small number of pronunciations of the words to be memorized.

The invention thus provides a learning method for a speech recognition system that performs recognition tests, each recognition test consisting in matching a speech segment supplied to the system with at least one set of parameters associated with a reference and stored in a reference dictionary. For a set of parameters to be stored in the dictionary in association with a reference, the learning method comprises obtaining several speech segments successively uttered by a speaker and processing these speech segments to estimate said parameter set.

Using recognition tests during the learning phase makes it possible to discriminate between clean utterances and those affected by noise. The latter can be excluded from the calculation of the parameters to be stored in the dictionary, without the user having to restart learning of the word or speech segment from scratch. The number of pronunciations required can remain limited, since the existing structure of the dictionary is taken into account so as to accept some variability in the user's speech.

The stored draft parameter set is thus updated progressively, as the speaker utters additional speech segments.

The speech recognition system shown in Figure 1 has a user interface consisting of a microphone 4, a keypad 5, a screen 6 and a loudspeaker 7. The acoustic signals delivered by the microphone 4 are supplied to a signal processing unit 8, which provides the relevant parameters to the recognition unit 9 and to the learning unit 10. The keypad 5, the screen 6 and the loudspeaker 7 are used in particular for the dialogue between the user and the learning unit 10. The recognition system further comprises a memory 11 constituting a reference dictionary.
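The progressive update of the stored draft can be pictured as accumulating sufficient statistics (frame count, sum, sum of squares) for each state's Gaussian, so that the mean and variance can be recomputed after every accepted segment without keeping earlier segments in memory. The sketch below is an illustration under stated assumptions, not the patent's procedure: the class name `DraftGaussian`, the diagonal-covariance choice, the variance floor, and the premise that frames have already been assigned to the state (for example by a Viterbi alignment) are all hypothetical.

```python
from typing import List, Sequence


class DraftGaussian:
    """Running statistics for one diagonal Gaussian (one HMM state).

    Hypothetical helper: assumes the frames passed to add_frames() have
    already been aligned to this state.
    """

    def __init__(self, dim: int, var_floor: float = 1e-3) -> None:
        self.count = 0
        self.sum = [0.0] * dim
        self.sum_sq = [0.0] * dim
        self.var_floor = var_floor  # assumed floor to avoid zero variances

    def add_frames(self, frames: Sequence[Sequence[float]]) -> None:
        # Fold the frames of a newly accepted segment into the statistics.
        for x in frames:
            self.count += 1
            for i, xi in enumerate(x):
                self.sum[i] += xi
                self.sum_sq[i] += xi * xi

    def mean(self) -> List[float]:
        return [s / self.count for s in self.sum]

    def variance(self) -> List[float]:
        # E[x^2] - E[x]^2 per dimension, clipped at the variance floor.
        m = self.mean()
        return [max(sq / self.count - mi * mi, self.var_floor)
                for sq, mi in zip(self.sum_sq, m)]
```

Calling `add_frames` once per accepted repetition updates the draft in place; the current `mean()` and `variance()` can be read back at any time.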

In this dictionary, the learning unit 10 stores models in association with references. In the recognition phase, the unit 9 analyzes the parameters provided by the unit 8, matches them with a model, that is to say a set of parameters stored in the dictionary 11, and delivers the associated reference as the output of the recognition system. This reference can then be used by the equipment of which the recognition system forms part. When this equipment is a telephone terminal, the microphone 4, the keypad 5, the screen 6 and the loudspeaker 7 may be those with which the terminal is already equipped.

Various types of parameters representative of the speech patterns, calculated by the unit 8, can be used in the context of the present invention. For example, they may be root-cepstral coefficients.
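Root-cepstral coefficients are obtained like ordinary cepstral coefficients, except that the logarithm applied to the magnitude spectrum is replaced by a power law |X(k)|^γ with 0 < γ < 1. The sketch below computes them for a single frame; it is only an illustration, since the exact processing steps of unit 8 are not given here. The exponent `gamma = 0.2`, the number of retained coefficients, the naive DFT and the absence of windowing or pre-emphasis are all assumptions.

```python
import cmath
import math
from typing import List, Sequence


def dft(x: Sequence[float]) -> List[complex]:
    """Naive O(N^2) discrete Fourier transform (fine for a short illustrative frame)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def root_cepstrum(frame: Sequence[float], gamma: float = 0.2,
                  n_coeffs: int = 8) -> List[float]:
    """Root-cepstral coefficients of one frame.

    The log compression of the usual cepstrum is replaced by |X(k)| ** gamma.
    gamma = 0.2 and n_coeffs = 8 are assumed values, not taken from the text.
    """
    n = len(frame)
    magnitude = [abs(c) ** gamma for c in dft(frame)]
    # For a real frame |X(k)| is even-symmetric, so the inverse DFT is real;
    # the real part is kept and round-off in the imaginary part is discarded.
    coeffs = [
        (sum(magnitude[k] * cmath.exp(2j * math.pi * k * t / n)
             for k in range(n)) / n).real
        for t in range(n)
    ]
    return coeffs[:n_coeffs]
```

A practical front end would use an FFT library and a windowed frame; the pure-Python DFT simply keeps the example dependency-free.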

The signal processing unit 8 can then perform the following operations:

Each sequence of consecutive frames detected as carrying voice activity normally corresponds to a speech segment uttered by a speaker. For each segment, the unit 8 delivers a sequence of cepstral vectors that units 9 and 10 can process.

In the exemplary embodiment described herein, each reference in the dictionary 11 is associated with a hidden Markov model characterized by a number of states and, for each state, by a probability density function for the observation of the cepstral vectors.
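Grouping consecutive voice-active frames into speech segments is a simple run-length operation over per-frame decisions. The sketch below assumes that a voice activity detector has already produced one boolean per frame (how that decision is made is not described here) and adds an optional, hypothetical `min_frames` parameter to drop very short bursts.

```python
from typing import List, Optional, Sequence, Tuple


def voiced_segments(is_voiced: Sequence[bool],
                    min_frames: int = 1) -> List[Tuple[int, int]]:
    """Group runs of voiced frames into (start, end) segments, end exclusive.

    is_voiced holds one voice-activity decision per frame (assumed given).
    Runs shorter than min_frames (a hypothetical parameter) are discarded.
    """
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, voiced in enumerate(is_voiced):
        if voiced and start is None:
            start = i  # a new run of voiced frames begins
        elif not voiced and start is not None:
            if i - start >= min_frames:
                segments.append((start, i))
            start = None
    # Close a run that lasts until the final frame.
    if start is not None and len(is_voiced) - start >= min_frames:
        segments.append((start, len(is_voiced)))
    return segments
```

Each returned index range would then select the frames whose cepstral vectors form one segment passed on to units 9 and 10.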

These distributions are, for example, p-dimensional Gaussian distributions, each then defined by a mean vector and a covariance matrix. A recognition test performed by the unit 9 on a sequence of cepstral vectors obtained from a speech segment processed by the unit 8 consists in identifying the model of the reference dictionary 11 that maximizes the likelihood of observing that sequence of cepstral vectors.
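In practice the likelihood is handled in the log domain. For a p-dimensional Gaussian with mean μ and a diagonal covariance matrix (a simplifying assumption here; the text allows a general covariance matrix), the log-density of a cepstral vector x is −½ Σᵢ [log(2πσᵢ²) + (xᵢ − μᵢ)²/σᵢ²]. A minimal sketch:

```python
import math
from typing import Sequence


def diag_gaussian_logpdf(x: Sequence[float], mean: Sequence[float],
                         var: Sequence[float]) -> float:
    """Log-density of x under a Gaussian with diagonal covariance.

    var holds the diagonal of the covariance matrix (assumed diagonal).
    """
    if not (len(x) == len(mean) == len(var)):
        raise ValueError("x, mean and var must have the same dimension p")
    total = 0.0
    for xi, mi, vi in zip(x, mean, var):
        total += math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi
    return -0.5 * total
```

Working with log-densities turns the product of per-frame likelihoods into a sum, which avoids numerical underflow over long segments.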

A classic way to perform this identification is to use the Viterbi algorithm.

For each model in the dictionary, a Viterbi trellis is used to determine the sequence of states that maximizes the probability of observing the sequence of cepstral vectors. The model for which this maximized probability is the largest is finally selected, and the associated reference is output by the recognition unit 9.
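The recognition test described above can be sketched as a log-domain Viterbi recursion run once per dictionary model, followed by an arg-max over the resulting scores. The model layout (log initial probabilities, log transition matrix, one diagonal Gaussian per state) and the function names `viterbi_score` and `recognize` are assumptions made for this illustration; forbidden transitions are encoded as negative infinity.

```python
import math
from typing import Dict, Sequence, Tuple

NEG_INF = float("-inf")
# A state is (mean vector, diagonal variance vector); assumed layout.
State = Tuple[Sequence[float], Sequence[float]]
# A model is (log initial probs, log transition matrix, states); assumed layout.
Model = Tuple[Sequence[float], Sequence[Sequence[float]], Sequence[State]]


def log_gauss(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Diagonal-covariance Gaussian log-density."""
    return -0.5 * sum(math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def viterbi_score(frames: Sequence[Sequence[float]], model: Model) -> float:
    """Log-probability of the single best state sequence for the frames."""
    log_init, log_trans, states = model
    n = len(states)
    delta = [log_init[j] + log_gauss(frames[0], *states[j]) for j in range(n)]
    for x in frames[1:]:
        # Best predecessor for each state j, plus the emission of frame x.
        delta = [
            max(delta[i] + log_trans[i][j] for i in range(n)) + log_gauss(x, *states[j])
            for j in range(n)
        ]
    return max(delta)


def recognize(frames: Sequence[Sequence[float]],
              models: Dict[str, Model]) -> Tuple[str, Dict[str, float]]:
    """Return the reference whose model gives the highest Viterbi score."""
    scores = {ref: viterbi_score(frames, m) for ref, m in models.items()}
    best = max(scores, key=scores.get)
    return best, scores
```

Rejection models can simply be added to `models` under their own references; if one of them wins, the segment is treated as matching no dictionary word.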

Some of the models contained in the dictionary 11 are rejection models, artificially constructed so as to be preferentially selected by the unit 9 when the speech segment submitted to the recognition test does not match any of the words listed in the dictionary.

The learning phase consists in calculating the probability distributions associated with the hidden Markov models. For each reference to be memorized, the user is invited to utter the associated word several times, so as to provide the unit 10 with sufficient observation statistics to reliably estimate the parameters of the probability distributions for the various states of the model.

The methods used to make these estimates are conventional. Among the different pronunciations of the word, some may be disturbed by noise. These should be ignored when estimating the model parameters; otherwise the estimates are unreliable. Requiring a large number of repetitions of the word should be avoided, since it would make the system very tedious to use. To filter out the repetitions affected by interference, the invention therefore uses recognition tests similar to those performed in the recognition phase.
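One way to picture this filtering is to submit each repetition to a recognition test over the dictionary (existing words and rejection models) extended with the draft model of the new word, and to keep only the repetitions won by the draft. This section does not spell out the exact acceptance criterion, so the rule below is an assumption for illustration; the function name and the reference names in the usage note are hypothetical, and the scores are taken to be per-reference Viterbi log-likelihoods computed beforehand.

```python
from typing import Dict, List, Sequence


def filter_repetitions(score_table: Sequence[Dict[str, float]],
                       draft_ref: str) -> List[int]:
    """Indices of repetitions whose recognition test selects the draft model.

    score_table[i] maps each reference (existing words, rejection models and
    the draft reference) to the log-likelihood of repetition i. Assumed rule:
    a repetition won by any other reference is treated as disturbed and dropped.
    """
    kept: List[int] = []
    for i, scores in enumerate(score_table):
        best = max(scores, key=scores.get)
        if best == draft_ref:
            kept.append(i)
    return kept
```

For example, with a hypothetical draft reference `"new"`, a rejection model `"garbage"` and an existing word `"call"`, a repetition scored `{"new": -30.0, "garbage": -12.0, "call": -25.0}` would be discarded because the rejection model wins, while the remaining accepted repetitions feed the parameter estimates.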

