News & Events

TALK Bayesian Group Sparse Learning
Date & Time: Monday, January 28, 2013; 11:00 AM
Speaker: Prof. Jen-Tzung Chien, National Chiao Tung University, Taiwan
Research Area: Speech & Audio
Abstract
- Bayesian learning provides attractive tools to model, analyze, search, recognize and understand real-world data. In this talk, I will introduce a new Bayesian group sparse learning approach and its application to speech recognition and signal separation. First, I present the group sparse hidden Markov models (GS-HMMs), where a sequence of acoustic features is driven by a Markov chain and each feature vector is represented by two groups of basis vectors. The features across states and within states are represented accordingly. The sparse prior is imposed by introducing the Laplacian scale mixture (LSM) distribution. The robustness of speech recognition is illustrated. The LSM distribution is also incorporated into Bayesian group sparse learning based on nonnegative matrix factorization (NMF). This approach is developed to estimate the reconstructed rhythmic and harmonic music signals from a single-channel source signal. A Monte Carlo procedure is presented to infer the two groups of parameters. Future work on Bayesian learning will be discussed.
TALK Estimation of time-varying parameters in nonlinear systems: Application to building systems and Extremum seeking control
Date & Time: Tuesday, December 18, 2012; 12:00 PM
Speaker: Prof. Martin Guay, Queen's University
Abstract
- In this presentation, an adaptive technique for estimating time-varying parameters for a class of continuous-time nonlinear systems is proposed. In the first part of the talk, we present an application of the estimation routine to the estimation of unknown heat loads and heat sinks in building systems. The technique proposed is a set-based adaptive estimation that can be used to estimate the time-varying parameters along with an uncertainty set. The proposed method is such that the uncertainty set update is guaranteed to contain the true value of the parameters. Unlike existing techniques that rely on the use of polynomial approximations of the time-varying behaviour of the parameters, the proposed technique does not require a functional representation of the time-varying behaviour of the parameter estimates.
  
  In the second part of the talk, we consider the application of the estimation technique to the solution of a class of real-time optimization problems. It is assumed that the equations describing the dynamics of the nonlinear system and the cost function to be minimized are unknown and that the objective function is measured. The main contribution is to formulate the extremum-seeking problem as a time-varying estimation problem. The proposed approach is shown to avoid the need for averaging results, which minimizes the impact of the choice of dither signal on the performance of the extremum-seeking control system.
TALK Electromagnetic Remote Sensing for the Detection of Concealed Objects
Date & Time: Thursday, December 13, 2012; 12:00 PM
Speaker: Dr. Tomasz M. Grzegorczyk, Delpsi LLC
MERL Host: Anthony Vetro
Abstract
- Electromagnetic (EM) remote sensing is a well-established modality for the detection, tracking, and identification of concealed targets. The degree of freedom offered by the operating frequency (and the associated propagation or induction regimes) makes EM waves sufficiently versatile to interrogate both large and small structures, metallic as well as dielectric objects, in close proximity or further away. This wide flexibility has made EM remote sensing a modality of choice in many applications. This presentation will focus on two implementations of non-destructive and non-contact EM sensing. The first is based on a tomographic approach, whereby EM waves are used to infer material properties within the volume of accessible structures. The two examples to be discussed are breast cancer detection, i.e. locating areas of high vascularity in otherwise healthy biological tissues, and inspection of concrete structures, i.e. identifying volumetric material property variations to locate rebars and cracks. The second area we will discuss is that of subsurface target detection, with again two very different applications. The first pertains to ground penetrating radars with frequencies in the GHz range aimed at the detection of buried weak dielectric scatterers, whereas the second focuses on the detection of metallic targets in the magnetic induction regime, for which much lower frequencies are used. In all these applications, the data collected by the appropriate hardware are processed by combining fundamental EM concepts with inverse methods for parameter estimation. We will discuss both a deterministic method -- Gauss-Newton -- and a stochastic method -- Kalman filters for real-time target detection.
TALK Speech recognition for closed-captioning
Date & Time: Tuesday, December 11, 2012; 12:00 PM
Speaker: Takahiro Oku, NHK Science & Technology Research Laboratories
Research Area: Speech & Audio
Abstract
- In this talk, I will present human-friendly broadcasting research conducted at NHK and research on speech recognition for real-time closed-captioning. The goal of human-friendly broadcasting research is to make broadcasting more accessible and enjoyable for everyone, including children, the elderly, and physically challenged persons. The automatic speech recognition technology that NHK has developed makes it possible to create captions for the hearing impaired automatically in real time. For sports programs such as professional sumo wrestling, a closed-captioning system has already been implemented in which captions are created by using speech recognition on a captioning re-speaker. In 2011, NHK General Television started broadcasting closed captions for the information program "Morning Market". After introducing the implemented closed-captioning system, I will talk about our recent improvement obtained by an adaptation method that creates a more effective acoustic model using error correction results. The method reflects recognition error tendencies more effectively.
TALK Sensitive Manipulation
Date & Time: Thursday, November 15, 2012; 12:00 PM
Speaker: Dr. Eduardo Torres-Jara, Worcester Polytechnic Institute
Research Area: Computer Vision
Abstract
- This talk presents an alternative approach to robotic manipulation. In this approach, manipulation is mainly guided by tactile feedback as opposed to vision. The motivation behind this approach stems from the fact that manipulating an object necessarily implies coming into contact with it. As a result, directly sensing physical contact seems more important than vision to control the interaction of the object and the robot. In this work, the traditional approach of a highly precise arm guided by a vision system is replaced by one that uses a low mechanical impedance arm with dense tactile sensing and exploration capabilities.
  
  The robots OBRERO and GoBot have been built to implement this approach. We have developed a novel tactile sensing technology and mounted our sensors on the robots' hands. These sensors are biologically inspired and present adequate features for manipulation. The success of this approach is shown by picking up objects in a poorly modeled environment. This task, simple for humans, has been a challenge for robots. The robot can deal with new, unmodeled objects. Specifically, OBRERO can gently contact, explore, lift, and place an object in a different location. It can also detect basic slippage and external forces acting on an object while it is held. These tasks can be performed successfully with very light objects, without fixtures, and on slippery surfaces. Similarly, GoBot is capable of manipulating small objects such as the stones in the game GO. Both OBRERO and GoBot perform all of their manipulations using tactile feedback.
TALK Robust Preconditioners for a boundary control elliptic problem
Date & Time: Wednesday, November 7, 2012; 12:00 PM
Speaker: Prof. Marcus Sarkis, Worcester Polytechnic Institute
Abstract
- We discuss the following problem: Given a target function on a domain, what is the Neumann data on the boundary so that its harmonic extension into the domain is the closest function to the target function in the L2 norm? For convex polygonal domains, we show that regularization is not needed in case the space for the Neumann data is chosen properly. In the second part of the talk we discuss solvers for the associated discrete Hessian which are robust with respect to regularization parameters and mesh sizes.
TALK Understanding Audition via Sound Analysis and Synthesis
Date & Time: Wednesday, October 24, 2012; 11:45 AM
Speaker: Josh McDermott, MIT, BCS
MERL Host: Jonathan Le Roux
Research Area: Speech & Audio
TALK Recognizing and Classifying Environmental Sounds
Date & Time: Wednesday, October 24, 2012; 11:00 AM
Speaker: Prof. Dan Ellis, Columbia University
MERL Host: Jonathan Le Roux
Research Area: Speech & Audio
TALK Self-Organizing Units (SOUs): Training Speech Recognizers Without Any Transcribed Audio
Date & Time: Wednesday, October 24, 2012; 2:15 PM
Speaker: Dr. Herb Gish, BBN - Raytheon
MERL Host: Jonathan Le Roux
Research Area: Speech & Audio
TALK Latent Topic Modeling of Conversational Speech
Date & Time: Wednesday, October 24, 2012; 1:30 PM
Speaker: Dr. Timothy J. Hazen and David Harwath, MIT Lincoln Labs / MIT CSAIL
MERL Host: Jonathan Le Roux
Research Area: Speech & Audio
TALK Zero-Resource Speech Pattern and Sub-Word Unit Discovery
Date & Time: Wednesday, October 24, 2012; 9:10 AM
Speaker: Prof. Jim Glass and Chia-ying Lee, MIT CSAIL
MERL Host: Jonathan Le Roux
Research Area: Speech & Audio
TALK A new class of dynamical system models for speech and audio
Date & Time: Wednesday, October 24, 2012; 4:05 PM
Speaker: Dr. John R. Hershey, MERL
MERL Host: Jonathan Le Roux
Research Area: Speech & Audio
TALK Factorial Hidden Restricted Boltzmann Machines for Noise Robust Speech Recognition
Date & Time: Wednesday, October 24, 2012; 3:20 PM
Speaker: Dr. Steven J. Rennie, IBM Research
MERL Host: Jonathan Le Roux
Research Area: Speech & Audio
TALK Advances in Acoustic Modeling at IBM Research: Deep Belief Networks, Sparse Representations
Date & Time: Wednesday, October 24, 2012; 9:55 AM
Speaker: Dr. Tara Sainath, IBM Research
MERL Host: Jonathan Le Roux
Research Area: Speech & Audio
TALK Interactive Visual Analysis for Engineering Applications
Date & Time: Thursday, October 11, 2012; 12:00 PM
Speaker: Kresimir Matkovic, VRVis Research Center, Vienna
Abstract
- Increasing complexity and a large number of control parameters make the design and understanding of modern engineering systems impossible without simulation today. Advances in simulation technology and the ability to run multiple simulations with different sets of parameters pose new challenges for analysis techniques. In this talk we will present our experiences in exploration and analysis of simulation ensembles realized in several projects with experts from the automotive, meteorology, and medical domains. We tightly integrate simulation, numerical optimization, and interactive visual analysis in a unified framework. Our new data model supports families of curves and families of surfaces. Accompanying interactive visual analysis techniques offer new possibilities for data exploration and analysis. It is possible to start with a simple analysis, to continue with identifying hidden features, and finally to explore very complex dependencies using advanced interaction and on-the-fly data derivation and aggregation. All proposed techniques will be illustrated using a coordinated multiple views system and real-life data from various projects with scientists and engineers, including the optimization of an automotive rail injection system.
TALK Non-negative Hidden Markov Modeling of Audio
Date & Time: Thursday, October 11, 2012; 2:30 PM
Speaker: Dr. Gautham J. Mysore, Adobe
Research Area: Speech & Audio
Abstract
- Non-negative spectrogram factorization techniques have become quite popular in the last decade as they are effective in modeling the spectral structure of audio. They have been extensively used for applications such as source separation and denoising. These techniques, however, fail to account for non-stationarity and temporal dynamics, which are two important properties of audio. In this talk, I will introduce the non-negative hidden Markov model (N-HMM) and the non-negative factorial hidden Markov model (N-FHMM) to model single sound sources and sound mixtures respectively. They jointly model the spectral structure and temporal dynamics of sound sources, while accounting for non-stationarity. I will also discuss the application of these models to tasks such as source separation, denoising, and content-based audio processing, showing why they yield improved performance when compared to non-negative spectrogram factorization techniques.
TALK Tensor representation of speaker space for arbitrary speaker conversion
Date & Time: Thursday, September 6, 2012; 12:00 PM
Speaker: Dr. Daisuke Saito, The University of Tokyo
Research Area: Speech & Audio
Abstract
- In voice conversion studies, realizing conversion from/to an arbitrary speaker's voice is one of the important objectives. For this purpose, eigenvoice conversion (EVC) based on an eigenvoice Gaussian mixture model (EV-GMM) was proposed. In EVC, similarly to speaker recognition approaches, a speaker space is constructed based on GMM supervectors, which are high-dimensional vectors derived by concatenating the mean vectors of each of the speaker GMMs. In the speaker space, each speaker is represented by a small number of weight parameters of eigen-supervectors. In this talk, we revisit construction of the speaker space by introducing tensor analysis of the training data set. In our approach, each speaker is represented as a matrix whose rows and columns correspond, respectively, to the Gaussian components and the dimensions of the mean vector, and the speaker space is derived by tensor analysis of the set of matrices. Our approach can solve an inherent problem of supervector representation, and it improves the performance of voice conversion. Experimental results of one-to-many voice conversion demonstrate the effectiveness of the proposed approach.
TALK Challenges on shape acquisition of moving object
Date & Time: Friday, August 17, 2012; 12:00 PM
Speaker: Prof. Hiroshi Kawasaki, Kagoshima University
Research Area: Computer Vision
Abstract
- In this talk, I will give an overview of my research projects on 3D shape acquisition of moving objects. The talk focuses mainly on two parts: the first is our 3D shape acquisition technique using a projector and camera system, and the second is entire shape acquisition using a multi-view pro-cam system. I will also briefly cover the following topics:
  
  -- Theory of the shape-from-coplanarity technique
  -- Texture recovery method for the pro-cam system
  -- Future plans for medical applications of our scanner
  
  This research is conducted jointly with Prof. Katsushi Ikeuchi (Univ. of Tokyo), Prof. Ryo Furukawa (Hiroshima City Univ.) and Prof. Ryusuke Sagawa (AIST).
TALK Communication Systems for Oilfield Applications
Date & Time: Tuesday, August 7, 2012; 12:00 PM
Speaker: Dr. Julius Kusuma, Schlumberger-Doll Research
MERL Host: Petros T. Boufounos
Abstract
- The oilfield is a rich area for research and engineering in communication and signal processing. Communication over non-standard channels, using constrained sources, noisy environments, and limited computational and energy resources, are some of the key challenges in this domain. In this talk I will first give an introduction to the role of science and technology, in particular communication and signal processing, in the oilfield. Due to its unique role in the industry, Schlumberger has a rich variety of communication systems over EM wireless, wired, acoustic, and even fluid pressure channels.
  
  We give a brief tour of some of the state of the art and showcase how technology has revolutionized the practice of the industry, enabling innovations such as horizontal drilling, logging-while-drilling, and well-placement. At the same time, we give a tutorial on how the lifecycle of a reservoir is managed, including imaging, drilling, logging, sampling, testing, and completing. Throughout, we will show how communication has revolutionized the practice in the industry.
TALK Feedback Particle Filter and its Applications
Date & Time: Wednesday, August 1, 2012; 12:00 PM
Speaker: Prof. Prashant Mehta, University of Illinois at Urbana-Champaign
MERL Host: Scott A. Bortoff
Abstract
- In my talk, I will present a self-contained introduction to nonlinear filtering, and describe some recent developments. Specifically, I will introduce the feedback particle filter and show how it admits an innovations error-based feedback control structure. The control is chosen so that the posterior distribution of any particle matches the posterior distribution of the true state given the observations. The subject of my talk is a new formulation of the nonlinear filter (for Bayesian inference) that is based on concepts from optimal control and mean-field game theory. Nonlinear filtering is important to many applications in engineering, biology, economics, atmospheric sciences and neuroscience. Several applications will be described to illustrate the theoretical concepts.
  
  This is joint work with Tao Yang and Sean Meyn at the University of Illinois.
TALK Nonparametric Bayesian Latent Variable Models
Date & Time: Friday, July 27, 2012; 12:00 PM
Speaker: Mingyuan Zhou, Duke University
MERL Host: Dehong Liu
Abstract
- Bayesian nonparametrics, using stochastic processes as prior distributions, is a relatively young and rapidly growing research area in statistics and machine learning. In this talk, we first briefly review completely random measures, a family of pure-jump non-negative stochastic processes that are simple to construct and amenable to posterior computation. We then present nonparametric Bayesian latent variable models based on the beta process, Bernoulli process, gamma process, Poisson process, and in particular, the negative binomial process. Specifically, for continuous data, we discuss dictionary learning with the beta-Bernoulli process and dependent hierarchical beta process, and for count data, we present the beta-negative binomial process and Poisson factor analysis. Furthermore, we discuss how the seemingly disjoint count and mixture modeling can be united under the negative binomial process framework, providing new opportunities to build mixture and hierarchical mixture models with better data fitting, more efficient inference and more flexible model constructions. We show successful applications of our nonparametric Bayesian latent variable models to image processing, topic modeling and count data analysis.
TALK A Pole-Placement Approach to the Design of Robust Linear Multivariable Control Systems
Date: Thursday, July 19, 2012
Speaker: Rick Vaccaro, University of Rhode Island
MERL Host: Scott A. Bortoff
Abstract
- The ability to directly specify the closed-loop poles of a multivariable control system is a major benefit of pole-placement algorithms for calculating state-feedback and observer gains. The drawback of these algorithms is the lack of any guarantee on the stability robustness of the resulting control system. The optimal control approach for calculating state-feedback gains (LQR) has a certain guaranteed robustness, but adding an observer (i.e. Kalman filter, LQG) can result in arbitrarily poor robustness. In this talk, a new pole-placement approach is introduced for calculating state-feedback and observer gains. The new approach optimizes robustness and gives impressive results, particularly for output feedback, observer-based control systems.
TALK Threat Assessment and Semi-Autonomous Control of Manned and Unmanned Vehicles
Date & Time: Monday, July 16, 2012; 2:00 PM
Speaker: Dr. Karl Iagnemma, Director, MIT Robotic Mobility Group
MERL Host: Stefano Di Cairano
Abstract
- Operator error is a significant factor in a majority of manned and unmanned vehicle accidents. In this talk, a framework for semi-autonomous vehicle accident avoidance will be presented that has been shown to effectively mitigate collisions caused by operator error. The framework analyzes sensor data (from vision and/or LIDAR) to identify "no go" regions in the environment, and automatically synthesizes constraints on vehicle position. An optimal trajectory and associated control inputs are then found via linear or nonlinear model predictive control. The "threat" to the vehicle is quantified from various metrics computed over the optimal trajectory. A number of approaches for arbitrating between operator and control system authority, based on the predicted threat, will be discussed. Extensive simulation and experimental testing will be described for both manned and unmanned scenarios. Future directions in threat assessment and semi-autonomous control, based on the integration of vision-based sensing and active steering control, will also be discussed.
TALK Applications of Mobile Augmented Reality and Pervasive Computing in Architecture, Engineering, and Construction
Date & Time: Tuesday, July 10, 2012; 11:00 AM
Speaker: Prof. Vineet Kamat, University of Michigan
Research Area: Computer Vision
Abstract
- This talk will present ongoing research at the University of Michigan Laboratory for Interactive Visualization in Engineering (LIVE) that is exploring applications of mobile pervasive computing and visualization in design, engineering, and construction. Findings from three specific research projects will be presented: Interactive Visualization of Construction Operations in Mobile Outdoor Augmented Reality; Rapid Building Damage Evaluation using Augmented Reality and Structural Simulation; and Location-Aware Contextual Information Access and Retrieval for Rapid On-Site Decision Making. In each case, the development of fundamental algorithms, their implementation as reusable and modular software, and their implementation in the engineering applications will be described.
TALK Quadratic Gaussian Multiterminal Source Coding
Date & Time: Friday, July 6, 2012; 12:00 PM
Speaker: Zixiang Xiong, Texas A&M University
MERL Host: Anthony Vetro
Abstract
- Driven by a host of emerging applications, distributed source coding has attracted renewed interest in the past decade. Although the Slepian-Wolf theorem has been known for almost 40 years and progress has been made recently on the rate region of quadratic Gaussian two-terminal source coding, finding the sum-rate bound of quadratic Gaussian multiterminal source coding with more than two terminals is still an open problem. In this talk, I'll briefly go over existing results on distributed source coding problems before describing a set of new results we obtained recently.