Viseme Classification Using High-Frame-Rate Vision


Automated lip reading, that is, speech recognition based on visual information from the speaker, is a challenging problem. Recent research suggests that a classification framework based on the AdaBoost algorithm is an efficient way to improve the recognition rate.

We applied this learning algorithm to the classification of Japanese consonants. As weak classifiers for AdaBoost, we introduced features based on high-speed vision, which captures the subject at a high frame rate such as 300 frames per second. Experiments showed the effectiveness of both the framework and the features.
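
To illustrate the general idea of boosting weak classifiers, the following is a minimal sketch of binary AdaBoost with single-feature threshold classifiers (decision stumps). It is not the implementation used in this work: the function names, the toy data, and the two-class labels (+1/-1) are assumptions made for illustration, and the single number per sample merely stands in for a feature that might be measured from a high-frame-rate image sequence.

```python
import math


def stump_predict(x, feature, threshold, polarity):
    """Weak classifier: return `polarity` if x[feature] >= threshold, else -polarity."""
    return polarity if x[feature] >= threshold else -polarity


def train_stump(X, y, w):
    """Search all (feature, threshold, polarity) and keep the lowest weighted error."""
    best = None
    for feature in range(len(X[0])):
        for threshold in sorted({x[feature] for x in X}):
            for polarity in (1, -1):
                err = sum(
                    wi
                    for xi, yi, wi in zip(X, y, w)
                    if stump_predict(xi, feature, threshold, polarity) != yi
                )
                if best is None or err < best[0]:
                    best = (err, feature, threshold, polarity)
    return best


def adaboost(X, y, rounds=3):
    """Train an ensemble of weighted stumps; labels in y must be +1 or -1."""
    n = len(X)
    w = [1.0 / n] * n  # start with uniform sample weights
    ensemble = []
    for _ in range(rounds):
        err, feature, threshold, polarity = train_stump(X, y, w)
        err = min(max(err, 1e-10), 1.0 - 1e-10)  # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1.0 - err) / err)  # vote weight of this stump
        ensemble.append((alpha, feature, threshold, polarity))
        # Increase the weight of misclassified samples, decrease the rest.
        w = [
            wi * math.exp(-alpha * yi * stump_predict(xi, feature, threshold, polarity))
            for xi, yi, wi in zip(X, y, w)
        ]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(a * stump_predict(x, f, t, p) for a, f, t, p in ensemble)
    return 1 if score >= 0 else -1


# Toy data (made up): one scalar feature per sample, two classes.
X = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
y = [1, 1, -1, -1, 1, 1]

model = adaboost(X, y, rounds=3)
print([predict(model, x) for x in X])
```

No single threshold separates the two classes in this toy set, but the weighted vote of three stumps does, which is the property that makes boosting attractive when each individual feature is only weakly informative. A multi-class problem such as consonant classification would need an extension of this scheme, for example one binary ensemble per class.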

Ishikawa Group Laboratory, Data Science Research Division, Information Technology Center, University of Tokyo / Tokyo University of Science
