-   Date & Time: Thursday, May 7, 2020; 11:00 AM
 Speaker: Prof. Petar Popovski, Aalborg University, Denmark
 MERL Host: Toshiaki Koike-Akino
 Research Areas: Artificial Intelligence, Communications, Machine Learning, Signal Processing, Information Security
  Abstract    The wireless landscape is evolving towards supporting a large population of connections for humans and machines with very diverse features and requirements. Perhaps the main motivation of 5G wireless systems is their flexibility to support heterogeneous connectivity requirements: enhanced mobile broadband (eMBB), massive machine-type communications (mMTC), and ultra-reliable low-latency communications (URLLC). However, this classification is rather limited and is currently being revised within the research community. The first part of this talk will discuss how this heterogeneity can be revisited and which opportunities it opens with respect to spectrum usage. The second part of the talk will deal with performance guarantees of wireless services, specifically ultra-reliable communication, and outline the importance of machine learning in that context. The final part of the talk will provide a broader view of the evolution of wireless connectivity, including aspects implied by the resistance to the deployment of 5G, as well as the new opportunities that can transform the way we build and use connected systems.
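Here is a back-of-the-envelope illustration of why statistical guarantees for ultra-reliable communication are demanding. It is not taken from the talk. Suppose n independent transmissions are observed with zero errors. An error probability at or above a target p can then be rejected with confidence 1 - delta only once (1 - p)^n <= delta. The function name, the example targets, and the 95% confidence level are illustrative assumptions.

```python
import math


def samples_for_zero_error_certificate(p_target: float, confidence: float) -> int:
    """Smallest n such that n error-free, independent trials reject p >= p_target
    at the given confidence, i.e. (1 - p_target)**n <= 1 - confidence."""
    delta = 1.0 - confidence
    # log1p keeps precision when p_target is tiny.
    return math.ceil(math.log(delta) / math.log1p(-p_target))


for p in (1e-3, 1e-5, 1e-7):
    n = samples_for_zero_error_certificate(p, 0.95)
    print(f"p_target={p:.0e}: need {n} error-free samples (rule of three: ~{3 / p:.0f})")
```

This is the familiar "rule of three" (n is roughly 3/p at 95% confidence). It suggests why purely data-driven reliability estimates need either very large datasets or additional modeling assumptions.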
 
 
-  
-   Date & Time: Thursday, May 7, 2020; 12:00 PM
 Speaker: Christopher Rackauckas, MIT
 MERL Host: Christopher R. Laughman
 Research Areas: Machine Learning, Multi-Physical Modeling, Optimization
  Abstract    In the context of science, the well-known adage "a picture is worth a thousand words" might well be "a model is worth a thousand datasets." Scientific models, such as Newtonian physics or biological gene regulatory networks, are human-driven simplifications of complex phenomena that serve as surrogates for the countless experiments that validated the models. Recently, machine learning has been able to overcome the inaccuracies of approximate modeling by directly learning the entire set of nonlinear interactions from data. However, without any predetermined structure from the scientific basis behind the problem, machine learning approaches are flexible but data-expensive, requiring large databases of homogeneous labeled training data. A central challenge is reconciling data that is at odds with simplified models without requiring "big data". In this talk we discuss a new methodology, universal differential equations (UDEs), which augment scientific models with machine-learnable structures for scientifically based learning. We show how UDEs can be used to discover previously unknown governing equations, accurately extrapolate beyond the original data, and accelerate model simulation, all in a time- and data-efficient manner. This advance is coupled with open-source software that allows for training UDEs that incorporate physical constraints, delayed interactions, implicitly defined events, and intrinsic stochasticity in the model. Our examples show how a diverse set of computationally difficult modeling issues across scientific disciplines, from automatically discovering biological mechanisms to accelerating climate simulations by 15,000x, can be handled by training UDEs.
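A UDE keeps the known mechanistic terms of a model and adds a learnable term for the part that is missing. Below is a simplified, hypothetical instance on a Lotka-Volterra system. The prey growth and predator decay terms are treated as known, and the missing interaction term is learned from a simulated trajectory. To stay short and dependency-free, the sketch replaces the neural network with a linear combination of candidate basis functions fitted by least squares. All parameter values are assumptions.

```python
import numpy as np

# Known ("scientific") part of the model: prey growth and predator decay rates.
ALPHA, GAMMA = 1.3, 1.8
# Interaction coefficients, used only to generate synthetic data; the fit never sees them.
BETA_TRUE, DELTA_TRUE = 0.9, 0.8


def true_rhs(u):
    x, y = u
    return np.array([ALPHA * x - BETA_TRUE * x * y, -GAMMA * y + DELTA_TRUE * x * y])


def rk4(rhs, u0, dt, steps):
    """Classic fourth-order Runge-Kutta integration, returning the whole trajectory."""
    traj = np.empty((steps + 1, len(u0)))
    traj[0] = u0
    for k in range(steps):
        u = traj[k]
        k1 = rhs(u)
        k2 = rhs(u + 0.5 * dt * k1)
        k3 = rhs(u + 0.5 * dt * k2)
        k4 = rhs(u + dt * k3)
        traj[k + 1] = u + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
    return traj


def fit_missing_term(traj, dt):
    """Learn du/dt - known_part(u) as a linear combination of candidate terms [x, y, x*y]."""
    x, y = traj[:, 0], traj[:, 1]
    dudt = np.gradient(traj, dt, axis=0)                 # derivatives estimated from data
    known = np.column_stack([ALPHA * x, -GAMMA * y])     # mechanistic part of the model
    features = np.column_stack([x, y, x * y])            # learnable structure (basis functions)
    theta, *_ = np.linalg.lstsq(features, dudt - known, rcond=None)
    return theta                                         # rows: x, y, x*y; columns: dx/dt, dy/dt


dt = 0.01
traj = rk4(true_rhs, np.array([1.0, 1.0]), dt, 1000)
theta = fit_missing_term(traj, dt)
print(np.round(theta, 3))   # the x*y row should be close to [-BETA_TRUE, DELTA_TRUE]
```

Fitting only the residual after subtracting the known physics is what keeps the learning problem small. This is the data-efficiency argument the abstract makes.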
 
 
-  
-   Date & Time: Tuesday, July 16, 2019; 12:00 PM
 Speaker: Prof. Jeff Linderoth, University of Wisconsin-Madison
 MERL Host: Arvind Raghunathan
 Research Areas: Machine Learning, Optimization
  Abstract    Algorithms to solve mixed integer linear programs have made incredible progress in the past 20 years. Key to these advances has been a mathematical analysis of the structure of the set of feasible solutions. We argue that a similar analysis is required in the case of mixed integer quadratic programs, like those that arise in sparse optimization in machine learning. One such analysis leads to the so-called perspective relaxation, which significantly improves solution performance on separable instances. Extensions of the perspective reformulation can lead to algorithms that are equivalent to some of the most popular modern, sparsity-inducing non-convex regularizations in variable selection. Based on joint work with Hongbo Dong (Washington State Univ.), Oktay Gunluk (IBM), and Kun Chen (Univ. Connecticut).
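Consider a term x_i^2 whose variable is switched off by a binary indicator z_i, so that x_i = 0 when z_i = 0. The perspective reformulation replaces x_i^2 by x_i^2 / z_i. This equals x_i^2 at z_i = 1 but is larger for fractional z_i, so the continuous relaxation becomes tighter. The one-variable toy problem below is a hypothetical example, not one from the talk. It compares the plain and perspective relaxations by brute-force grid search.

```python
import numpy as np


def true_optimum(a, lam, M):
    """Exact optimum of min (x - a)^2 + lam*z  s.t. |x| <= M*z, z in {0, 1}."""
    off = a ** 2                                   # z = 0 forces x = 0
    on = (np.clip(a, -M, M) - a) ** 2 + lam        # z = 1 allows any |x| <= M
    return min(off, on)


def relaxation_bounds(a, lam, M, nx=4001, nz=1000):
    """Grid-search approximations of the optimal values of two relaxations (z in [0, 1])."""
    x = np.linspace(-M, M, nx)[:, None]
    z = np.linspace(1e-3, 1.0, nz)[None, :]
    feasible = np.abs(x) <= M * z                  # big-M link between x and z
    natural = (x - a) ** 2 + lam * z               # plain continuous relaxation
    persp = x ** 2 / z - 2 * a * x + a ** 2 + lam * z   # x^2 replaced by its perspective x^2/z
    # The point z = 0, x = 0 (cost a^2) is handled separately to avoid dividing by zero.
    nat = min(np.min(np.where(feasible, natural, np.inf)), a ** 2)
    per = min(np.min(np.where(feasible, persp, np.inf)), a ** 2)
    return nat, per


a, lam, M = 1.0, 0.5, 2.0
nat, per = relaxation_bounds(a, lam, M)
print("natural relaxation:    ", nat)
print("perspective relaxation:", per)
print("MIQP optimum:          ", true_optimum(a, lam, M))
```

In this separable one-variable case, the perspective bound matches the MIQP optimum (0.5). The plain relaxation gives only about 0.23.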
 
 
-  
-   Date & Time: Thursday, February 14, 2019; 1:30 -3:00 PM
 Speaker: Avishai Weiss, MERL
 MERL Hosts: Stefano Di Cairano; Avishai Weiss
 Research Area: Control
  Abstract    Avishai Weiss from MERL's Control and Dynamical Systems group will give a talk at Stanford's Aeronautics and Astronautics department titled "Low-Thrust GEO Satellite Station Keeping, Attitude Control, and Momentum Management via Model Predictive Control". Electric propulsion for satellites is much more fuel efficient than conventional methods. The talk will describe MERL's solution to the satellite control problems arising from the low thrust provided by electric propulsion.
 
 
-  
-   Date & Time: Tuesday, March 6, 2018; 12:00 PM
 Speaker: Scott Wisdom, Affectiva
 MERL Host: Jonathan Le Roux
 Research Area: Speech & Audio
  Abstract    Recurrent neural networks (RNNs) are effective, data-driven models for sequential data, such as audio and speech signals. However, like many deep networks, RNNs are essentially black boxes; though they are effective, their weights and architecture are not directly interpretable by practitioners. A major component of my dissertation research is explaining the success of RNNs and constructing new RNN architectures through the process of "deep unfolding," which can construct and explain deep network architectures using an equivalence to inference in statistical models. Deep unfolding yields principled initializations for training deep networks, provides insight into their effectiveness, and assists with interpretation of what these networks learn.
 
 In particular, I will show how RNNs with rectified linear units and residual connections are a particular deep unfolding of a sequential version of the iterative shrinkage-thresholding algorithm (ISTA), a simple and classic algorithm for solving L1-regularized least-squares. This equivalence allows interpretation of state-of-the-art unitary RNNs (uRNNs) as an unfolded sparse coding algorithm. I will also describe a new type of RNN architecture called deep recurrent nonnegative matrix factorization (DR-NMF). DR-NMF is an unfolding of a sparse NMF model of nonnegative spectrograms for audio source separation. Both of these networks outperform conventional LSTM networks while also providing interpretability for practitioners.
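Each ISTA iteration already has the shape of a neural network layer. It applies a linear map to the previous estimate and a linear map to the input, then an elementwise shrinkage nonlinearity. The sketch below writes plain ISTA in that "unfolded" form. In a deep-unfolded network, the matrices W and S and the threshold would become trainable parameters, which this sketch does not do. The talk's sequential variant, which maps onto an RNN over time steps, is also not reproduced. Problem sizes and lam are arbitrary assumptions.

```python
import numpy as np


def soft_threshold(v, thresh):
    """Proximal operator of thresh*||.||_1; for v >= 0 it reduces to ReLU(v - thresh)."""
    return np.sign(v) * np.maximum(np.abs(v) - thresh, 0.0)


def ista_unfolded(A, y, lam, n_layers=200):
    """ISTA for min_x 0.5*||y - A x||^2 + lam*||x||_1, written as a stack of identical
    'layers' x <- soft_threshold(S @ x + W @ y, theta), the form used in deep unfolding."""
    L = np.linalg.norm(A, 2) ** 2              # Lipschitz constant of the data-term gradient
    W = A.T / L                                # input-to-state weight
    S = np.eye(A.shape[1]) - A.T @ A / L       # state-to-state (recurrent) weight
    theta = lam / L                            # shrinkage threshold, acting like a bias
    x = np.zeros(A.shape[1])
    for _ in range(n_layers):                  # each iteration = one unfolded layer
        x = soft_threshold(S @ x + W @ y, theta)
    return x


rng = np.random.default_rng(0)
A_demo = rng.standard_normal((20, 10))
y_demo = rng.standard_normal(20)
print(np.round(ista_unfolded(A_demo, y_demo, lam=0.5, n_layers=500), 3))
```

For nonnegative codes, the shrinkage step is simply a ReLU with a bias. This is the link the abstract draws between ReLU RNNs and unfolded sparse coding.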
 
 
-  
-   Date & Time: Friday, February 2, 2018; 12:00
 Speaker: Dr. David Kaeli, Northeastern University
 MERL Host: Abraham Goldsmith
 Research Areas: Control, Optimization, Machine Learning, Speech & Audio
  Abstract    GPU computing is alive and well! The GPU has allowed researchers to overcome a number of computational barriers in important problem domains. Still, challenges remain in using GPUs for more general-purpose applications. GPUs achieve impressive speedups compared to CPUs because they have a large number of compute cores and high memory bandwidth. Recent GPUs are approaching 10 teraflops of single-precision performance on a single device. In this talk we will discuss current trends in GPUs, including some advanced features that allow them to exploit multi-context grains of parallelism. Further, we consider how GPUs can be treated as cloud-based resources, enabling a GPU-enabled server to deliver HPC cloud services by leveraging virtualization and collaborative filtering. Finally, we argue for new heterogeneous workloads and discuss the role of the Heterogeneous System Architecture (HSA), a standard that further supports integration of the CPU and GPU into a common framework. We present a new class of benchmarks specifically tailored to evaluate the benefits of features supported in the new HSA programming model.
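The "approaching 10 teraflops" figure is a theoretical peak. It is the product of core count, clock rate, and floating-point operations per core per cycle. A minimal sketch follows. The device numbers are hypothetical, not a specific product, and it assumes one fused multiply-add (2 FLOPs) per core per cycle.

```python
def peak_gflops(cores: int, clock_ghz: float, flops_per_core_per_cycle: int = 2) -> float:
    """Theoretical peak throughput in GFLOP/s.

    flops_per_core_per_cycle = 2 assumes one fused multiply-add (FMA) per core per cycle.
    """
    return cores * clock_ghz * flops_per_core_per_cycle


# Hypothetical device: 3,584 single-precision cores at a 1.5 GHz clock.
print(f"{peak_gflops(3584, 1.5) / 1000:.2f} TFLOP/s")   # 10.75 TFLOP/s
```

Throughput achieved on real workloads is usually lower than this peak, since it also depends on memory bandwidth and how well the cores are kept busy.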
 
 
-  
-   Date & Time: Wednesday, February 1, 2017; 12:00-13:00
 Speaker: Dr. Heiga ZEN, Google
 MERL Host: Chiori Hori
 Research Area: Speech & Audio
  Abstract    Recent progress in generative modeling has improved the naturalness of synthesized speech significantly. In this talk I will summarize these generative model-based approaches for speech synthesis such as WaveNet, a deep generative model of raw audio waveforms. We show that WaveNets are able to generate speech which mimics any human voice and which sounds more natural than the best existing Text-to-Speech systems.
 See https://deepmind.com/blog/wavenet-generative-model-raw-audio/ for further details.
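WaveNet (see the DeepMind post above) models raw audio with stacks of dilated causal convolutions. The dilation doubles from layer to layer, so the receptive field grows exponentially with depth. Below is a minimal NumPy sketch of one such layer and of the receptive-field arithmetic. The kernel size 2 and the 1, 2, ..., 512 dilation pattern follow the WaveNet paper. Gated activations, residual/skip connections, and the output distribution are omitted.

```python
import numpy as np


def causal_dilated_conv(x, w, dilation):
    """y[t] = sum_j w[j] * x[t - j*dilation], with zeros before the start (causal)."""
    pad = (len(w) - 1) * dilation
    xp = np.concatenate([np.zeros(pad), x])
    return np.array([
        sum(w[j] * xp[pad + t - j * dilation] for j in range(len(w)))
        for t in range(len(x))
    ])


def receptive_field(dilations, kernel_size=2):
    """Number of input samples (current one included) that one output depends on."""
    return 1 + (kernel_size - 1) * sum(dilations)


block = [2 ** i for i in range(10)]      # dilations 1, 2, 4, ..., 512
print(receptive_field(block))            # 1024
print(receptive_field(block * 3))        # 3070 for three stacked blocks
```

Because each output depends only on current and past samples, audio can be generated one sample at a time by feeding predictions back in.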
 
 
-  
-   Date & Time: Tuesday, December 13, 2016; Noon
 Speaker: Yue M. Lu, John A. Paulson School of Engineering and Applied Sciences, Harvard University
 MERL Host: Petros T. Boufounos
 Research Areas: Computational Sensing, Machine Learning
  Abstract    In this talk, we will present a framework for analyzing, in the high-dimensional limit, the exact dynamics of several stochastic optimization algorithms that arise in signal and information processing. For concreteness, we consider two prototypical problems: sparse principal component analysis and regularized linear regression (e.g. LASSO). For each case, we show that the time-varying estimates given by the algorithms will converge weakly to a deterministic "limiting process" in the high-dimensional limit. Moreover, this limiting process can be characterized as the unique solution of a nonlinear PDE, and it provides exact information regarding the asymptotic performance of the algorithms. For example, performance metrics such as the MSE, the cosine similarity and the misclassification rate in sparse support recovery can all be obtained by examining the deterministic limiting process. A steady-state analysis of the nonlinear PDE also reveals interesting phase transition phenomena related to the performance of the algorithms. Although our analysis is asymptotic in nature, numerical simulations show that the theoretical predictions are accurate for moderate signal dimensions.
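The quantities the analysis predicts can be explored numerically. Run a stochastic algorithm in dimension n and record a performance metric, here the cosine similarity, against rescaled time t = k/n. As n grows, the abstract's result says such curves approach a deterministic limit. The sketch uses plain Oja's rule on a hypothetical spiked covariance model, a simpler relative of the sparse PCA algorithms in the talk. The values of n, omega and tau, and the step-size scaling tau/n, are illustrative assumptions.

```python
import numpy as np


def oja_cosine_trajectory(n=100, omega=5.0, tau=0.1, t_max=30, seed=0):
    """Streaming PCA (Oja's rule) on a spiked model y_k = sqrt(omega)*s_k*c + noise_k,
    step size tau/n. Returns |cos(x_k, c)| sampled every n steps (rescaled time t = k/n)."""
    rng = np.random.default_rng(seed)
    c = rng.standard_normal(n)
    c /= np.linalg.norm(c)                 # planted principal direction
    x = rng.standard_normal(n)
    x /= np.linalg.norm(x)                 # random initial estimate
    eta = tau / n
    cosines = [abs(x @ c)]
    for k in range(1, t_max * n + 1):
        y = np.sqrt(omega) * rng.standard_normal() * c + rng.standard_normal(n)
        x += eta * y * (y @ x)             # stochastic power-iteration step
        x /= np.linalg.norm(x)             # project back onto the unit sphere
        if k % n == 0:
            cosines.append(abs(x @ c))
    return np.array(cosines)


traj_oja = oja_cosine_trajectory()
print(np.round(traj_oja[::5], 3))          # cosine similarity at t = 0, 5, 10, ..., 30
```

Rerunning with larger n and different seeds should give trajectories that lie increasingly close to one another. That is the empirical counterpart of the limiting process.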
 
 
-  
-   Date & Time: Monday, December 12, 2016; 12:00 PM
 Speaker: Yanlai Chen, Department of Mathematics at the University of Massachusetts Dartmouth
 Research Areas: Control, Dynamical Systems
  Abstract    Models of reduced computational complexity are indispensable in scenarios where a large number of numerical solutions to a parametrized problem are needed quickly or in real time. These include simulation-based design, parameter optimization, optimal control, multi-model/multiscale analysis, and uncertainty quantification. Thanks to an offline-online procedure and the recognition that the parameter-induced solution manifolds can be well approximated by finite-dimensional spaces, the reduced basis method (RBM) and the reduced collocation method (RCM) can improve efficiency by several orders of magnitude. The accuracy of the RBM solution is maintained through a rigorous a posteriori error estimator, whose efficient development is critical and involves fast eigensolves.
 
  In this talk, I will give a brief introduction to RBM/RCM and explain how they can be used for data compression, face recognition, and significantly delaying the curse of dimensionality in uncertainty quantification.
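The offline-online idea can be sketched on a hypothetical, affinely parametrized linear problem A(mu) u = f with A(mu) = K + mu*I. Here K is a 1-D finite-difference Laplacian. The offline stage solves the full model at a few parameters and compresses the snapshots. The online stage solves only a small projected system. One simplification: the basis is built by POD (an SVD of the snapshots) rather than by the greedy procedure driven by an a posteriori error estimator that the abstract mentions. Sizes and parameter ranges are assumptions.

```python
import numpy as np


def laplacian(n):
    """Dense 1-D finite-difference Laplacian (tridiagonal 2, -1, -1)."""
    return 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)


def offline(K, f, train_mus, tol=1e-10):
    """Offline stage: full solves at training parameters, POD compression, and
    projection of the parameter-independent pieces of A(mu) = K + mu*I."""
    n = K.shape[0]
    snaps = np.column_stack([np.linalg.solve(K + mu * np.eye(n), f) for mu in train_mus])
    U, s, _ = np.linalg.svd(snaps, full_matrices=False)
    V = U[:, : int(np.sum(s > tol * s[0]))]    # orthonormal reduced basis
    return V, V.T @ K @ V, V.T @ f             # V^T I V is the identity


def online(V, K_r, f_r, mu):
    """Online stage: an r x r Galerkin solve whose cost does not depend on n."""
    r = K_r.shape[0]
    return V @ np.linalg.solve(K_r + mu * np.eye(r), f_r)


n = 400
K, f = laplacian(n), np.ones(n)
V, K_r, f_r = offline(K, f, np.linspace(1.0, 10.0, 20))
mu = 3.3                                        # a parameter not in the training set
u_rb = online(V, K_r, f_r, mu)
u_full = np.linalg.solve(K + mu * np.eye(n), f)
rel_err = np.linalg.norm(u_rb - u_full) / np.linalg.norm(u_full)
print(V.shape[1], rel_err)
```

The affine dependence on mu is what allows every n-dependent product to be precomputed offline.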
 
 
-  
-   Date & Time: Friday, December 2, 2016; 11:00 AM
 Speaker: Prof. Waheed Bajwa, Rutgers University
 MERL Host: Petros T. Boufounos
 Research Area: Computational Sensing
  Abstract    While distributed information processing has a rich history, relatively little attention has been paid to the problem of collaboratively learning nonlinear geometric structures underlying data distributed across sites that are connected to each other in an arbitrary topology. In this talk, we discuss this problem in the context of collaborative dictionary learning from big, distributed data. It is assumed that a number of geographically distributed, interconnected sites have massive local data and are interested in collaboratively learning a low-dimensional geometric structure underlying these data. In contrast to some previous works on subspace-based data representations, we focus on the geometric structure of a union of subspaces (UoS). To this end, we propose a distributed algorithm, termed cloud K-SVD, for collaborative learning of a UoS structure underlying distributed data of interest. The goal of cloud K-SVD is to learn an overcomplete dictionary at each individual site such that every sample in the distributed data can be represented through a small number of atoms of the learned dictionary. Cloud K-SVD accomplishes this goal without requiring communication of individual data samples between different sites. In this talk, we also theoretically characterize deviations of the dictionaries learned at individual sites by cloud K-SVD from a centralized solution. Finally, we numerically illustrate the efficacy of cloud K-SVD in the context of supervised training of nonlinear classifiers from distributed, labeled training data.
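Cloud K-SVD updates dictionary atoms without moving data samples between sites. As we understand the published algorithm, a key ingredient is a distributed power method in which a sum over sites is computed by consensus averaging among neighbors. The sketch isolates only that step, on hypothetical data. Five sites on a ring each hold a local residual matrix E_i. Together they estimate the dominant eigenvector of sum_i E_i E_i^T, the direction a centralized K-SVD atom update would obtain from an SVD. Network size, mixing weights, and iteration counts are assumptions; sparse coding and the rest of K-SVD are omitted.

```python
import numpy as np


def ring_mixing_matrix(n_sites):
    """Doubly stochastic weights for a ring: average with yourself and two neighbours."""
    W = np.zeros((n_sites, n_sites))
    for i in range(n_sites):
        for j in (i - 1, i, i + 1):
            W[i, j % n_sites] = 1.0 / 3.0
    return W


def distributed_top_eigvec(local_data, W, power_iters=30, consensus_rounds=50, seed=0):
    """Power method on sum_i E_i E_i^T where site i only ever touches its own E_i.
    The sum over sites is replaced by consensus averaging (the scale is irrelevant
    because every site normalises its own estimate)."""
    rng = np.random.default_rng(seed)
    d = local_data[0].shape[0]
    V = np.tile(rng.standard_normal(d), (len(local_data), 1))      # same start everywhere
    for _ in range(power_iters):
        Z = np.stack([E @ (E.T @ v) for E, v in zip(local_data, V)])  # local products only
        for _ in range(consensus_rounds):
            Z = W @ Z                                                  # neighbour exchange
        V = Z / np.linalg.norm(Z, axis=1, keepdims=True)
    return V


rng = np.random.default_rng(1)
d, n_sites, n_local = 20, 5, 50
u = rng.standard_normal(d)
u /= np.linalg.norm(u)
# Hypothetical local data: a shared dominant direction plus noise at every site.
sites = [3.0 * np.outer(u, rng.standard_normal(n_local)) + rng.standard_normal((d, n_local))
         for _ in range(n_sites)]
V_sites = distributed_top_eigvec(sites, ring_mixing_matrix(n_sites))
E_all = np.hstack(sites)                       # pooled here only for comparison
_, Q = np.linalg.eigh(E_all @ E_all.T)
v_central = Q[:, -1]                           # centralized top eigenvector
print(np.abs(V_sites @ v_central))             # alignment of each site's estimate
```

Only d-dimensional vectors cross the network, never the columns of E_i. This is the property the abstract emphasizes.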
 
 
-  
-   Date & Time: Friday, September 23, 2016; 12:00 PM - 1:00 PM
 Speaker: Dr. Earl McCune, Eridan Communications
 Research Areas: Communications, Signal Processing
  Abstract    Maximizing the operating energy efficiency of any wireless communication link requires a global optimization not only across the entire block diagram, but also over the selected signal modulation and aspects of the link operating protocol. This global optimization is first examined for the transmitter, receiver, and baseband circuitry. Then the aspects of signal modulation needed to access these circuit optimizations are presented with examples, followed by the corresponding protocol aspects. A metric called modulation-available energy efficiency (MAEE) compares proposed signals for compatibility with high energy efficiency objectives.
 
 
-  
-   Date & Time: Wednesday, August 17, 2016; 1 PM
 Speaker: Gilles Zerah, Centre Francais en Calcul Atomique et Moleculaire-Ile-de-France (CFCAM-IdF)
 Research Areas: Applied Physics, Electronic and Photonic Devices
  Abstract    The first part of the talk is a high-level review of modern technologies for atomic-level modelling of materials. The second part discusses band gap calculations and MERL results for semiconductors.
 
 
-  
-   Date & Time: Wednesday, July 13, 2016; 2:30 PM - 3:30 PM
 Speaker: Richard Lehoucq, Sandia National Laboratories
 Research Areas: Computer Vision, Digital Video, Machine Learning
  Abstract    My presentation considers the research question of whether existing algorithms and software for the large-scale sparse eigenvalue problem can be applied to problems in spectral graph theory. I first provide an introduction to several problems involving spectral graph theory. I then provide a review of several different algorithms for the large-scale eigenvalue problem and briefly introduce the Anasazi package of eigensolvers.
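A classic spectral graph task is to compute the Fiedler vector, the eigenvector for the second-smallest eigenvalue of the graph Laplacian. Its sign pattern gives a spectral bisection of the graph. Anasazi is a C++ package in Trilinos. As an easy-to-run stand-in, the sketch below uses SciPy's `eigsh`, a wrapper around ARPACK's implicitly restarted Lanczos method, in shift-invert mode. The test graph is hypothetical: two 20-node cliques joined by a single edge.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh


def two_cliques_laplacian(m):
    """Sparse graph Laplacian L = D - A of two m-cliques joined by one edge."""
    rows, cols = [], []
    for offset in (0, m):
        for i in range(m):
            for j in range(m):
                if i != j:
                    rows.append(offset + i)
                    cols.append(offset + j)
    rows += [m - 1, m]            # bridge edge between the two cliques
    cols += [m, m - 1]
    n = 2 * m
    A = sp.csr_matrix((np.ones(len(rows)), (rows, cols)), shape=(n, n))
    deg = np.asarray(A.sum(axis=1)).ravel()
    return (sp.diags(deg) - A).tocsc()


L = two_cliques_laplacian(20)
# Shift-invert around a small negative shift: the eigenvalues closest to sigma are the
# two smallest eigenvalues of the positive semidefinite Laplacian.
vals, vecs = eigsh(L, k=2, sigma=-1e-2, which="LM")
order = np.argsort(vals)
fiedler = vecs[:, order[1]]
print(np.round(vals[order], 4))
print(np.sign(fiedler).astype(int))
```

Shift-invert requires a sparse factorization, but it converges quickly to the low end of the spectrum. That end is where the interesting graph eigenvectors live.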
 
 
-  
-   Date & Time: Thursday, July 7, 2016; 2:00 PM
 Speaker: Dr. Sonja Glavaski, Program Director, ARPA-E
 MERL Host: Arvind Raghunathan
 Research Area: Electric Systems
  Abstract    The evolution of the grid faces significant challenges if it is to integrate and accept more energy from renewable generation and other Distributed Energy Resources (DERs). To maintain the grid's reliability and turn intermittent power sources into major contributors to the U.S. energy mix, we have to think about the grid differently and design it to be smarter and more flexible.
 
 ARPA-E is interested in disruptive technologies that enable increased integration of DERs by real-time adaptation while maintaining grid reliability and reducing cost for customers with smart technologies. The potential impact is significant, with projected annual energy savings of more than 3 quadrillion BTU and annual CO2 emissions reductions of more than 250 million metric tons.
 
  This talk will identify opportunities in developing next-generation control technologies and grid operation paradigms that address these challenges and enable secure, stable, and reliable transmission and distribution of electrical power. A summary of the newly announced ARPA-E NODES (Network Optimized Distributed Energy Systems) program, which funds the development of these technologies, will be presented.
 
 
-  
-   Date & Time: Friday, June 3, 2016; 1:30PM - 3:00PM
 Speaker: Nobuaki Minematsu and Daisuke Saito, The University of Tokyo
 Research Area: Speech & Audio
  Abstract    Speech signals convey various kinds of information, which can be grouped into two kinds: linguistic and extra-linguistic information. Many speech applications, however, focus on only a single aspect of speech. For example, speech recognizers try to extract only word identity from speech, and speaker recognizers extract only speaker identity. Here, irrelevant features are often treated as hidden or latent by applying probability theory to a large number of samples, or they are normalized to have quasi-standard values. In speech analysis, however, phases are usually removed, not hidden or normalized, and pitch harmonics are also removed, not hidden or normalized. The resulting speech spectrum still contains both linguistic and extra-linguistic information. Is there a good method to remove extra-linguistic information from the spectrum? In this talk, we introduce our answer to that question, called speech structure. Extra-linguistic variation can be modeled as a feature-space transformation, and our speech structure is based on the transform invariance of f-divergence. This proposal was inspired by findings in classical studies of structural phonology and recent studies of developmental psychology. Speech structure has been applied to accent clustering, speech recognition, and language identification; these applications are also explained in the talk.
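Speech structure represents an utterance by the distances among its acoustic events rather than by the events themselves. In the sketch below, each event is modeled as a Gaussian. Pairwise distances use the Bhattacharyya distance, chosen here for illustration; it is a monotone function of the Bhattacharyya coefficient, which equals one minus the squared Hellinger distance, an f-divergence. Assume extra-linguistic variation acts as an invertible affine map x -> A x + b on every event. The distance matrix is then unchanged. Dimensions, the number of events, and the random transform are hypothetical.

```python
import numpy as np


def bhattacharyya(m1, S1, m2, S2):
    """Bhattacharyya distance between N(m1, S1) and N(m2, S2)."""
    S = 0.5 * (S1 + S2)
    dm = m1 - m2
    term_mean = 0.125 * dm @ np.linalg.solve(S, dm)
    _, logdet_S = np.linalg.slogdet(S)
    _, logdet_1 = np.linalg.slogdet(S1)
    _, logdet_2 = np.linalg.slogdet(S2)
    return term_mean + 0.5 * (logdet_S - 0.5 * (logdet_1 + logdet_2))


def structure_matrix(means, covs):
    """Pairwise distances between all events: the 'structure' of the utterance."""
    k = len(means)
    D = np.zeros((k, k))
    for i in range(k):
        for j in range(i + 1, k):
            D[i, j] = D[j, i] = bhattacharyya(means[i], covs[i], means[j], covs[j])
    return D


def affine_transform(means, covs, A, b):
    """Apply x -> A x + b to every Gaussian (a stand-in for speaker/channel change)."""
    return [A @ m + b for m in means], [A @ S @ A.T for S in covs]


rng = np.random.default_rng(0)
dim, k = 4, 5
means = [rng.standard_normal(dim) for _ in range(k)]
covs = []
for _ in range(k):
    B = rng.standard_normal((dim, dim))
    covs.append(B @ B.T + dim * np.eye(dim))   # symmetric positive definite
A = rng.standard_normal((dim, dim)) + dim * np.eye(dim)
b = rng.standard_normal(dim)
D1 = structure_matrix(means, covs)
D2 = structure_matrix(*affine_transform(means, covs, A, b))
print(np.max(np.abs(D1 - D2)))                 # agrees up to rounding error
```

The individual means and covariances change under the transform, but their mutual distances do not. This is what lets the structure discard extra-linguistic variation of this form.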
 
 
-  
-   Date & Time: Friday, May 13, 2016; 12:00 PM
 Speaker: Oleg Iliev, Fraunhofer Institute for Industrial Mathematics, ITWM
 Research Area: Dynamical Systems
  Abstract    Li-ion batteries are widely used in the automotive industry, in electronic devices, and elsewhere. In this talk we will discuss challenges related to the multiscale nature of batteries, mainly the understanding of processes in porous electrodes at the pore scale and at the macroscale. A software tool for simulating isothermal and non-isothermal electrochemical processes in porous electrodes will be presented. The pore-scale simulations are performed on 3D images of porous electrodes, or on computer-generated 3D microstructures with the same characteristics as real porous electrodes. Finite volume and finite element algorithms for the highly nonlinear problems describing processes at the pore level will be briefly presented. Model order reduction (MOR) and empirical interpolation method (EIM-MOR) algorithms for accelerating the computations will be discussed, as well as the reduced basis method for studying parameter-dependent problems. Next, homogenization of the equations describing the electrochemical processes at the pore scale will be presented, and the results will be compared to the engineering approach based on Newman's 1D+1D model. Simulations at the battery cell level will also be addressed. Finally, the challenges in modeling and simulating degradation processes in the battery will be discussed, and our first simulation results in this area will be presented.
 
  This is joint work with A. Latz (DLR), M. Taralov, V. Taralova, J. Zausch, and S. Zhang from Fraunhofer ITWM, Y. Maday from LJLL, Paris 6, and Y. Efendiev from Texas A&M.
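In Newman-type 1D+1D models, the second dimension is lithium diffusion inside spherical electrode particles. The sketch below is a minimal finite-volume discretization of that sub-problem only, not the pore-scale solver or software described in the talk. The nondimensional parameters, explicit time stepping, and constant extraction flux at the surface are illustrative assumptions. Writing the update as differences of face fluxes makes the scheme conservative: total lithium changes exactly by what leaves through the surface.

```python
import numpy as np


def simulate_particle(n_cells=50, radius=1.0, D=1.0, c0=1.0, flux_out=0.1, t_end=0.1):
    """Finite-volume solution of dc/dt = (1/r^2) d/dr(D r^2 dc/dr) in a sphere,
    with zero flux at r = 0 and a prescribed outward flux density at r = radius."""
    r_faces = np.linspace(0.0, radius, n_cells + 1)
    r_cent = 0.5 * (r_faces[:-1] + r_faces[1:])
    vol = 4.0 / 3.0 * np.pi * (r_faces[1:] ** 3 - r_faces[:-1] ** 3)   # shell volumes
    area = 4.0 * np.pi * r_faces ** 2                                   # face areas
    dr = radius / n_cells
    dt = 0.1 * dr ** 2 / D                   # conservative explicit time step
    n_steps = int(round(t_end / dt))
    c = np.full(n_cells, c0)
    for _ in range(n_steps):
        F = np.zeros(n_cells + 1)            # outward amount per unit time through each face
        F[1:-1] = -D * area[1:-1] * np.diff(c) / np.diff(r_cent)
        F[-1] = area[-1] * flux_out          # boundary condition at the particle surface
        c += dt * (F[:-1] - F[1:]) / vol     # inflow through inner face minus outflow
    return c, vol, n_steps * dt


c, vol, t = simulate_particle()
print(c[[0, len(c) // 2, -1]])               # centre, middle and surface concentrations
print(c @ vol, vol.sum() - 4.0 * np.pi * 0.1 * t)   # total vs. initial minus extracted
```

Interior fluxes cancel in pairs when summed over cells. Discrete mass balance therefore holds to rounding error regardless of the grid resolution.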
 
 
-  
-   Date & Time: Friday, April 29, 2016; 12:00 PM - 1:00 PM
 Speaker: Yu Zhang, MIT
 Research Area: Speech & Audio
   Abstract    A recurrent neural network (RNN) is a class of neural network models in which connections between neurons form a directed cycle. This creates an internal state that allows the network to exhibit dynamic temporal behavior. Recently, RNN-based acoustic models, such as those built on an advanced RNN structure called long short-term memory (LSTM), have greatly improved automatic speech recognition (ASR) accuracy on many tasks. However, ASR performance with distant microphones, with low resources, in noisy and reverberant conditions, and on multi-talker speech is still far from satisfactory compared to humans. To address these issues, we develop new RNN structures inspired by two principles: (1) the structure follows the intuition of human speech recognition; (2) the structure is easy to optimize. The talk will go beyond basic RNNs and introduce prediction-adaptation-correction RNNs (PAC-RNNs) and highway LSTMs (HLSTMs). It studies both unidirectional and bidirectional RNNs, as well as discriminative training applied on top of the RNNs. For efficient training of such RNNs, the talk will describe two algorithms for learning their parameters in some detail: (1) latency-controlled bidirectional model training; and (2) two-pass forward computation for sequence training. Finally, this talk will analyze the advantages and disadvantages of the different variants and propose future directions.
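The "directed cycle" that gives an RNN its internal state can be sketched with a scalar Elman-style cell. This is an illustrative assumption only. The talk's PAC-RNN and HLSTM architectures are considerably richer and are not shown here. The hidden state h is fed back at every step, so each output depends on the whole input history and not only on the current input.

```python
import math


def rnn_forward(xs, w_in, w_rec, bias, h0=0.0):
    """Scalar Elman-style RNN: h_t = tanh(w_in * x_t + w_rec * h_{t-1} + bias).

    Illustrative sketch only; real acoustic models use vector states and
    gated cells such as LSTMs.
    """
    h = h0
    states = []
    for x in xs:
        # The recurrent term w_rec * h is the feedback loop (directed cycle).
        h = math.tanh(w_in * x + w_rec * h + bias)
        states.append(h)
    return states


# An impulse followed by silence leaves a decaying but nonzero state.
print(rnn_forward([1.0, 0.0, 0.0], w_in=1.0, w_rec=0.9, bias=0.0))
# An all-zero input keeps the state at exactly zero.
print(rnn_forward([0.0, 0.0, 0.0], w_in=1.0, w_rec=0.9, bias=0.0))
```

The two calls differ only in their first input. The last two inputs are identical, yet the states still differ. That lingering memory of earlier inputs is the internal state the abstract refers to.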
 
 
-  
-   Date & Time: Tuesday, March 15, 2016; 12:00 PM - 12:45 PM
 Speaker: Prof. Kazuya Takeda, Nagoya University
 Research Area: Speech & Audio
   Abstract    Thanks to advanced "internet of things" (IoT) technologies, situation-specific human behavior has become an area of development for practical applications involving signal processing. One important area of such practical applications is driving behavior research. Since 1999, I have been collecting driving behavior data in a wide range of signal modalities, including speech/sound, video, physical/physiological sensors, CAN bus, LIDAR and GNSS. The objective of this data collection is to evaluate how well signal models can represent human behavior while driving. In this talk, I would like to summarize our 10 years of study of driving behavior signal processing, which has been based on these signal corpora. In particular, statistical signal models of interactions between traffic contexts and driving behavior, i.e., stochastic driver modeling, will be discussed in the context of risky lane change detection. I look forward to discussing the scalability of such corpus-based approaches, which could be applied to almost any traffic situation.
 
 
-  
-   Date & Time: Tuesday, March 15, 2016; 12:45 PM - 1:30 PM
 Speaker: Prof. Hirofumi Aoki, Nagoya University
 Research Area: Speech & Audio
   Abstract    Driving is a complex skill that involves the vehicle itself (e.g., speed control and instrument operation), other road users (e.g., other vehicles, pedestrians), the surrounding environment, and so on. During driving, visual cues are the main source of information supplied to the brain. To stabilize visual information while the head is moving, the eyes move in the opposite direction based on input to the vestibular system. This involuntary eye movement is called the vestibulo-ocular reflex (VOR), and its physiological models have long been studied. Obinata et al. found that the VOR can be used to estimate mental workload. Since then, our research group has been developing methods to quantitatively estimate mental workload during driving by means of reflex eye movement. In this talk, I will explain the basic mechanism of reflex eye movement and how to apply it to mental workload estimation. I will also introduce our latest work combining the VOR and OKR (optokinetic reflex) models for naturalistic driving environments.
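The compensatory mechanism described above is often summarized by a single VOR gain: the eye velocity is roughly the negated head velocity scaled by a gain g, so a gain near 1 means the gaze stays still. The sketch below estimates g by least squares from sampled head and eye angular velocities. It is a generic illustration and not the speaker's workload-estimation method. The function names and the sample data are hypothetical.

```python
def vor_gain(head_velocity, eye_velocity):
    """Least-squares VOR gain g such that eye_velocity ~= -g * head_velocity.

    Generic textbook-style estimate; hypothetical helper for illustration.
    """
    num = sum(h * e for h, e in zip(head_velocity, eye_velocity))
    den = sum(h * h for h in head_velocity)
    if den == 0:
        raise ValueError("head velocity must not be identically zero")
    return -num / den


def retinal_slip(head_velocity, eye_velocity):
    """Residual gaze velocity (head + eye); zero for a perfectly compensating VOR."""
    return [h + e for h, e in zip(head_velocity, eye_velocity)]


# Hypothetical angular velocities in deg/s.
head = [10.0, -5.0, 20.0, 0.0]
eye = [-9.0, 4.5, -18.0, 0.0]
print(vor_gain(head, eye))
print(retinal_slip(head, eye))
```

Deviations of such a gain, or of the residual slip, from a reference model are the kind of quantity one could relate to workload. How the talk actually does this is not specified in the abstract.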
 
 
-  
-   Date & Time: Tuesday, February 16, 2016; 12:00 PM - 1:00 PM
 Speaker: Dr. Najim Dehak, MIT
 Research Area: Speech & Audio
   Abstract    Recently, there has been a great increase in interest in emotion recognition based on different human modalities, such as speech and heart rate. Emotion recognition systems can be very useful in several areas, such as medicine and telecommunications. In the medical field, identifying emotions can be an important tool for detecting and monitoring patients with mental health disorders. In addition, identifying the emotional state from voice opens opportunities to develop automated dialogue systems capable of producing reports for the physician based on frequent phone communication between the system and the patients. In this talk, we will describe a health-related application that uses an emotion recognition system based on human voice to detect and monitor people's emotional state.
 
 
-  
-   Date & Time: Monday, November 23, 2015; 12:00 PM
 Speaker: Manuchehr Aminian, University of North Carolina, Chapel Hill
   Abstract    The classic work by G.I. Taylor describes the enhanced longitudinal diffusivity of a passive tracer subjected to laminar pipe flow. Much work since then has gone into extending this result, particularly in calculating the evolution of the scalar variance. However, less work has been done to describe the evolution of asymmetry in the distribution. We present the results of a modeling effort to understand how the higher moments of the tracer distribution depend on geometry, building on explicit results for the circular pipe. We do this via analysis of "channel-limiting" geometries (rectangular ducts and elliptical pipes parameterized by their aspect ratio), using both new analytical tools and Monte Carlo simulation, which have revealed a wealth of nontrivial behavior of the distributions at short and intermediate times.
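For reference, the classic long-time result for pressure-driven laminar flow in a circular pipe is given below. This is Taylor's analysis as extended by Aris. The symbols are chosen here and are not taken from the talk: $a$ is the pipe radius, $\bar{U}$ the mean flow speed, $\kappa$ the molecular diffusivity of the tracer, and $\mathrm{Pe}$ the Péclet number.

```latex
D_{\text{eff}} \;=\; \kappa + \frac{a^{2}\,\bar{U}^{2}}{48\,\kappa}
\;=\; \kappa\left(1 + \frac{\mathrm{Pe}^{2}}{48}\right),
\qquad \mathrm{Pe} = \frac{\bar{U}\,a}{\kappa},
\qquad t \gg \frac{a^{2}}{\kappa}.
```

This formula governs only the growth of the variance at long times. The skewness and other higher-moment behavior at short and intermediate times, which the talk focuses on, is not captured by it.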
 
 
-  
-   Date & Time: Friday, October 18, 2013; 12:00 PM
 Speaker: Dr. Shreyas Sundaram, University of Waterloo
   Abstract    This talk will describe a method to stabilize a plant with a network of resource-constrained wireless nodes. As opposed to traditional networked control schemes, where the nodes simply route information to and from a dedicated controller, our approach treats the network itself as the controller. Specifically, we formulate a strategy where each node repeatedly updates its state to be a linear combination of the states of neighboring nodes. We show that this causes the entire network to behave as a linear dynamical system, with sparsity constraints imposed by the network topology. We provide a numerical design procedure to determine the appropriate linear combinations for each node so that the transmissions of the nodes closest to the actuators are stabilizing. We also make connections to decentralized control theory and the concept of fixed modes to provide topological conditions under which stabilization is possible. We show that this "Wireless Control Network" requires low computational and communication overhead, simplifies transmission scheduling, and enables compositional design. We also consider the issue of security in this control scheme. Using structured system theory, we show that a certain number of malicious or misbehaving nodes can be detected and identified provided that the connectivity of the network is sufficiently high.
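The idea of "the network itself as the controller" can be sketched with a toy simulation. Assume a hypothetical unstable scalar plant x[k+1] = 1.2 x[k] + u[k] and two nodes in a line. Node 0 hears the sensor, and node 1 hears node 0 and drives the actuator. Each node's next state is a linear combination of what it hears, and zeros in the weight matrix W encode the topology. The weights here are hand-picked, not produced by the talk's numerical design procedure. With these weights the closed loop reduces to x[k+1] = 1.2 x[k] - 0.3 x[k-2]. All roots of λ³ - 1.2λ² + 0.3 lie inside the unit circle, with the largest modulus about 0.84.

```python
def simulate_wcn(a, b, W, h, steps, x0=1.0):
    """Toy wireless control network around a scalar plant (hypothetical example).

    Plant:    x[k+1] = a*x[k] + b*u[k], measured output y[k] = x[k]
    Node i:   z_i[k+1] = sum_j W[i][j]*z_j[k] + h[i]*y[k]
    Actuator: u[k] = z_last[k] (the last node sits next to the actuator)
    Zeros in W encode which nodes cannot hear each other.
    """
    x = x0
    z = [0.0] * len(W)
    for _ in range(steps):
        u = z[-1]
        y = x
        # Every node forms a linear combination of what it hears at time k.
        z = [sum(W[i][j] * z[j] for j in range(len(z))) + h[i] * y
             for i in range(len(z))]
        x = a * x + b * u
    return x


# Two nodes in a line: node 0 listens to the sensor, node 1 listens to node 0.
W_line = [[0.0, 0.0],
          [-0.3, 0.0]]
h_line = [1.0, 0.0]
print(simulate_wcn(1.2, 1.0, W_line, h_line, steps=200))  # decays toward 0
print(simulate_wcn(1.2, 1.0, [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0], steps=50))  # open loop grows
```

No node runs a dedicated control law here. Stabilization emerges from the aggregate linear dynamics of the network, as the abstract describes.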
 
 
-  
-   Date & Time: Thursday, October 17, 2013; 12:00 PM
 Speaker: Prof. Laurent Daudet, Paris Diderot University, France
 MERL Host: Jonathan Le Roux
 Research Area: Speech & Audio
   Abstract    In acoustics, one may wish to acquire a wavefield over a whole spatial domain, while only point measurements (i.e., with microphones) are possible. Even with few sources, this remains a difficult problem because of reverberation, which can be hard to characterize. This can be seen as a sampling/interpolation problem, and it raises a number of interesting questions: how many sample points are needed, where to place them, etc. In this presentation, we will review some case studies in 2D (vibrating plates) and 3D (room acoustics), with numerical and experimental data, where we have developed sparse models, possibly with additional 'structures', based on a physical modeling of the acoustic field. These types of models are well suited to reconstruction techniques known as compressed sensing. These principles can also be used for sub-Nyquist optical imaging: we will show preliminary experimental results of a new compressive imager, remarkably simple in its principle, using a multiply scattering medium.
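To make "sparse model plus compressed-sensing reconstruction" concrete, the sketch below uses orthogonal matching pursuit (OMP), a standard greedy sparse-recovery algorithm. It is swapped in for illustration and is not necessarily the reconstruction method used in the talk. The 4x6 dictionary and the 2-sparse signal are hypothetical. From only four measurements, OMP recovers the six unknown coefficients because only two are nonzero.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def solve(M, rhs):
    """Solve the square system M c = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(M[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    sol = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = aug[i][n] - sum(aug[i][c] * sol[c] for c in range(i + 1, n))
        sol[i] = s / aug[i][i]
    return sol


def omp(A, y, k, tol=1e-10):
    """Orthogonal matching pursuit: greedily recover a k-sparse x with A x ~= y."""
    n = len(A[0])
    cols = [[row[j] for row in A] for j in range(n)]
    support, coef = [], []
    residual = list(y)
    for _ in range(k):
        if dot(residual, residual) <= tol:
            break
        # Pick the unused atom most correlated with the current residual.
        best = max((idx for idx in range(n) if idx not in support),
                   key=lambda idx: abs(dot(cols[idx], residual)))
        support.append(best)
        # Least-squares refit on the current support (normal equations).
        G = [[dot(cols[p], cols[q]) for q in support] for p in support]
        coef = solve(G, [dot(cols[p], y) for p in support])
        residual = [yi - sum(c * cols[s][i] for c, s in zip(coef, support))
                    for i, yi in enumerate(y)]
    x = [0.0] * n
    for c, s in zip(coef, support):
        x[s] = c
    return x


# Hypothetical dictionary: 4 measurements, 6 atoms.
A_demo = [[1, 0, 0, 0, 0.5, 0.5],
          [0, 1, 0, 0, 0.5, -0.5],
          [0, 0, 1, 0, 0.5, 0.5],
          [0, 0, 0, 1, 0.5, -0.5]]
y_demo = [2.5, -0.5, 0.5, -0.5]  # = 2 * atom 0 + 1 * atom 5
print(omp(A_demo, y_demo, k=2))
```

In the acoustic setting of the talk, the dictionary atoms would come from a physical model of the field, for example plate or room modes, rather than from a toy matrix.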
 
 
-  
-   Date & Time: Friday, October 4, 2013; 12:00 PM
 Speaker: Dr. Goksel Dedeoglu, Texas Instruments
 Research Area: Computer Vision
   Abstract    There are growing needs to accelerate computer vision algorithms on embedded processors for a wide range of equipment, including mobile phones, network cameras, robots, and automotive safety systems. In our Vision R&D group, we conduct various projects to understand how vision requirements can best be addressed on digital signal processors (DSPs), where the compute bottlenecks are, and how we should evolve our hardware and software architectures to meet our customers' future needs. Toward this end, we build prototypes in which we design and optimize embedded software for real-world application performance and robustness. In this talk, I will provide examples of vision problems that we have recently tackled.
 
 
-  
-   Date & Time: Friday, September 6, 2013; 12:00 PM
 Speaker: Dr. Davide M. Raimondo, University of Pavia, Italy
 MERL Host: Stefano Di Cairano
   Abstract    Although many fault diagnosis algorithms are available, there has been very little work on designing or modifying control inputs to increase the detectability and isolability of faults. The use of such inputs has clear potential for overcoming a central difficulty in fault detection, which is to distinguish the effects of faults from those of disturbances, process uncertainties, etc. Accordingly, active inputs could be a transformative technology in industry, provided that such inputs can be computed reliably and efficiently.
 This presentation discusses new methods for computing active inputs that guarantee that the input-output data of a process will be sufficient to correctly identify a fault from a given library of possible faults. This problem is inherently nonconvex and has a combinatorial dependence on the number of faults considered. To address this, we consider a new formulation, along with related approximations, that is amenable to efficient solution using standard optimization packages (e.g., CPLEX). The theoretical contributions combine ideas from reachability analysis, set-based computations, and optimization theory to exploit detailed problem structure and thereby manage the problem complexity. Comparisons with an existing method show that the proposed formulation provides a dramatic reduction in the required computational effort.
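The set-based idea behind "guaranteed" fault isolation can be illustrated with a deliberately tiny case. Assume hypothetical static fault models y = θ_i·u + w, where the disturbance is bounded by |w| ≤ ε. An input u isolates every fault if the output intervals θ_i·u ± ε are pairwise disjoint. The pairwise check over all fault pairs hints at the combinatorial growth mentioned above. The talk's method handles dynamic systems through reachability analysis and optimization, which this sketch does not attempt.

```python
from itertools import combinations


def min_separating_input(thetas, eps):
    """Bound on |u| for scalar fault models y = theta_i*u + w, |w| <= eps.

    Intervals i and j are disjoint iff |theta_i - theta_j| * |u| > 2*eps,
    so any |u| strictly larger than the returned value separates all pairs.
    Hypothetical toy model for illustration.
    """
    gap = min(abs(a - b) for a, b in combinations(thetas, 2))
    if gap == 0:
        raise ValueError("two fault models are identical and cannot be separated")
    return 2 * eps / gap


def isolate(y, u, thetas, eps):
    """Indices of fault models consistent with output y under input u."""
    return [i for i, th in enumerate(thetas) if abs(y - th * u) <= eps]


faults = [1.0, 1.5, 3.0]  # hypothetical fault library
print(min_separating_input(faults, eps=0.1))
print(isolate(0.8, 0.5, faults, eps=0.1))   # input above the bound: one candidate
print(isolate(0.25, 0.2, faults, eps=0.1))  # input below the bound: ambiguous
```

With the larger input, the observed output is consistent with exactly one model. With the smaller input, two models remain possible. This is the ambiguity that active input design is meant to remove.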
 
 
-