Active Perception: Human Interface

ChAff

Summary

Imagine a robot clumsily interrupting a meeting. The disturbance causes the speaker to angrily chastise the robot for its behavior. Imagine further that the robot could react by apologizing and changing its behavior. To realize this scenario, the goal of the ChAff project is to build an FPGA design that classifies speech in real time according to prosodic information.

The approach builds on existing prosody-related features. Simulations of real-time speech analysis are used to identify algorithms that are practical to implement. Following simulation, register-transfer-level (RTL) representations of the prosody classification algorithms are synthesized and run on an FPGA.

Currently, the system computes real-time estimates of speaking rate (syllables per second), pitch (fundamental frequency), and loudness (in dB). Future work centers on classifying the resulting trajectories in rate-pitch-loudness space.
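As a rough illustration of two of these per-frame estimates, the sketch below computes loudness as an RMS level in dB and pitch from the peak of the autocorrelation function. It is a software sketch under stated assumptions, not the project's FPGA implementation: the function names, the reference level of 1.0, and the 60-400 Hz search range are hypothetical choices made here, and autocorrelation is a common textbook pitch estimator swapped in because the page does not say which method is used.

```python
import math


def loudness_db(frame, ref=1.0):
    """RMS level of one frame in dB relative to `ref` (assumed full scale)."""
    rms = math.sqrt(sum(x * x for x in frame) / len(frame))
    # Clamp to avoid log10(0) on silent frames.
    return 20.0 * math.log10(max(rms, 1e-12) / ref)


def pitch_hz(frame, sample_rate, fmin=60.0, fmax=400.0):
    """Estimate fundamental frequency from the autocorrelation peak.

    The search range fmin..fmax is an assumption, not taken from the source.
    Returns 0.0 when no positive autocorrelation peak is found.
    """
    mean = sum(frame) / len(frame)
    x = [v - mean for v in frame]  # remove any DC offset
    lag_min = max(1, int(sample_rate / fmax))
    lag_max = min(int(sample_rate / fmin), len(x) - 1)
    best_lag, best_val = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        r = sum(x[i] * x[i + lag] for i in range(len(x) - lag))
        if r > best_val:
            best_lag, best_val = lag, r
    return sample_rate / best_lag if best_lag else 0.0


if __name__ == "__main__":
    sr = 16000
    # Synthetic test frame: a 200 Hz tone at half of full scale.
    tone = [0.5 * math.sin(2 * math.pi * 200.0 * n / sr) for n in range(1024)]
    print(loudness_db(tone), pitch_hz(tone, sr))
```

Calling both functions on successive frames yields two of the three trajectories; a hardware version would presumably replace the floating-point arithmetic with fixed-point operations.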

Implementation of the enrate speaking-rate measure in Simulink.
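The page does not describe the internals of the Simulink model. In the literature, enrate is usually described as the first spectral moment of the low-frequency energy envelope of speech, and the sketch below follows that description. Everything specific in it is an assumption made for illustration: the 10 ms envelope frames, the Hann window, the 1-16 Hz band, the power weighting, and the function names.

```python
import cmath
import math


def energy_envelope(samples, sample_rate, frame_s=0.01):
    """RMS energy per non-overlapping frame (frame length is an assumption)."""
    n = max(1, round(sample_rate * frame_s))
    env = [
        math.sqrt(sum(x * x for x in samples[i:i + n]) / n)
        for i in range(0, len(samples) - n + 1, n)
    ]
    return env, sample_rate / n  # envelope and its sampling rate in Hz


def enrate(samples, sample_rate, f_lo=1.0, f_hi=16.0):
    """First spectral moment (Hz) of the energy envelope between f_lo and f_hi."""
    env, env_rate = energy_envelope(samples, sample_rate)
    m = len(env)
    if m < 2:
        return 0.0
    mean = sum(env) / m
    # Remove the mean and apply a periodic Hann window before the DFT.
    w = [(e - mean) * (0.5 - 0.5 * math.cos(2 * math.pi * k / m))
         for k, e in enumerate(env)]
    num = den = 0.0
    for b in range(1, m // 2 + 1):
        f = b * env_rate / m
        if f < f_lo or f > f_hi:
            continue
        # Direct DFT of one bin; weight its frequency by spectral power.
        power = abs(sum(v * cmath.exp(-2j * math.pi * b * k / m)
                        for k, v in enumerate(w))) ** 2
        num += f * power
        den += power
    return num / den if den > 0 else 0.0


if __name__ == "__main__":
    sr = 8000
    # Synthetic input: a 200 Hz carrier whose amplitude is modulated at 4 Hz.
    sig = [(1.0 + 0.8 * math.cos(2 * math.pi * 4.0 * n / sr))
           * math.sin(2 * math.pi * 200.0 * n / sr) for n in range(2 * sr)]
    print(enrate(sig, sr))
```

The result is a frequency in Hz that tracks how quickly the energy envelope fluctuates; relating it to syllables per second would need a calibration step that is not shown here.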

Reference

Reynolds, C., Ishikawa, M. and Tsujino, H. (2006). Realizing Affect in Speech Classification in Real-Time. Aurally Informed Performance: Integrating Machine Listening and Auditory Presentation in Robotic Systems, in conjunction with AAAI Fall Symposia, October 13-15, 2006, Washington, D.C., USA.

Ishikawa Group Laboratory, Research Institute for Science & Technology, Tokyo University of Science. Copyright © 2008 Ishikawa Group Laboratory. All rights reserved.