TR2010-016

Synthesizing Speech from Doppler Signals


Abstract:

It has long been a desirable goal to construct an intelligible speech signal merely by observing the talker in the act of speaking. Past approaches have relied on camera-based observations of the talker's face, combined with statistical methods that infer the speech signal from the facial motion captured by the camera. Other methods have synthesized speech from measurements taken by electromyographs (EMGs) and other devices that must be tethered to the talker - an undesirable setup. In this paper we present a new device for synthesizing speech from characterizations of the facial motion associated with speech - a Doppler sonar. Facial movement is characterized through the Doppler frequency shifts it induces in a tone incident on the talker's face, and these frequency shifts are used to infer the underlying speech signal. The setup is far-field and untethered, with the sonar operating at the distance of a regular desktop microphone. Preliminary experimental evaluations show that the mechanism is very promising: we are able to synthesize reasonable speech signals, comparable to those obtained from tethered devices such as EMGs.
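The abstract's core idea is that facial motion modulates the frequency of a reflected tone. As a minimal sketch of the underlying physics (not the paper's actual processing pipeline), the two-way Doppler relation below shows how a surface velocity maps to a frequency shift and back; the 40 kHz carrier and the velocity value are illustrative assumptions, not figures from the paper:

```python
# Illustrative two-way Doppler relation for a tone reflected off a moving surface.
# Assumed values: 40 kHz carrier, speed of sound 343 m/s, 5 cm/s surface velocity.
C = 343.0  # speed of sound in air, m/s

def reflected_frequency(f0, v):
    """Frequency of a tone f0 after reflection off a surface moving at
    velocity v (positive v = toward the sensor): f_r = f0 * (C + v) / (C - v)."""
    return f0 * (C + v) / (C - v)

def surface_velocity(f0, f_r):
    """Invert the relation to recover the surface velocity from the
    measured reflected frequency: v = C * (f_r - f0) / (f_r + f0)."""
    return C * (f_r - f0) / (f_r + f0)

f0 = 40_000.0   # hypothetical ultrasonic carrier, Hz
v = 0.05        # hypothetical facial-surface velocity toward the sensor, m/s
f_r = reflected_frequency(f0, v)   # slightly above 40 kHz
v_est = surface_velocity(f0, f_r)  # recovers v
```

A speech synthesizer built on this principle would observe a time-varying pattern of such shifts (one per moving facial region) and learn a statistical mapping from that pattern to the speech signal.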

 

  • Related News & Events

    •  NEWS    ICASSP 2010: 9 publications by Anthony Vetro, Shantanu D. Rane and Petros T. Boufounos
      Date: March 14, 2010
      Where: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
      MERL Contacts: Anthony Vetro; Petros T. Boufounos
      Brief
      • The following papers were presented at the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP):
        • "Privacy and Security of Features Extracted from Minutiae Aggregates" by Nagar, A., Rane, S.D. and Vetro, A.
        • "Hiding Information Inside Structured Shapes" by Das, S., Rane, S.D. and Vetro, A.
        • "Ultrasonic Sensing for Robust Speech Recognition" by Srinivasan, S., Raj, B. and Ezzat, T.
        • "Reconstruction of Sparse Signals from Distorted Randomized Measurements" by Boufounos, P.T.
        • "Disparity Search Range Estimation: Enforcing Temporal Consistency" by Min, D., Yea, S., Arican, Z. and Vetro, A.
        • "Synthesizing Speech from Doppler Signals" by Toth, A.R., Raj, B., Kalgaonkar, K. and Ezzat, T.
        • "Spectrogram Dimensionality Reduction with Independence Constraints" by Wilson, K.W. and Raj, B.
        • "Robust Regression using Sparse Learning for High Dimensional Parameter Estimation Problems" by Mitra, K., Veeraraghavan, A.N. and Chellappa, R.
        • "Subword Unit Approaches for Retrieval by Voice" by Gouvea, E., Ezzat, T. and Raj, B.