Software Downloads
MERL software freely available for noncommercial use.
MERL is making some software available to the research community.

CFS — Cocktail Fork Separation
PyTorch implementation of the Multi-Resolution CrossNet (MRX) model proposed in our ICASSP 2022 paper, "The Cocktail Fork Problem: Three-Stem Audio Separation for Real-World Soundtracks." We include the weights for a model pretrained on the Divide and Remaster (DnR) dataset, which can separate the audio from a soundtrack (e.g., movie or commercial) into individual speech, music, and sound effects stems. A pytorch_lightning script for model training using the DnR dataset is also included.

PartialGCNN — Partial Group Convolutional Neural Networks
This software package provides the PyTorch implementation of Partial Group Convolutional Neural Networks described in the NeurIPS 2022 paper "Learning Partial Equivariances from Data". Partial GCNNs are able to learn layerwise levels of partial and full equivariance to discrete groups, continuous groups, and combinations thereof, directly from data. Partial GCNNs retain full equivariance when beneficial, but adjust it whenever it becomes harmful. The software package also provides scripts to reproduce the results in the paper.

kscore — Nonparametric Score Estimators
PyTorch reimplementation of code from "Nonparametric Score Estimators" (Yuhao Zhou, Jiaxin Shi, Jun Zhu. https://arxiv.org/abs/2005.10099). See the original TensorFlow implementation at https://github.com/miskcoo/kscore (MIT license).

SOCKET — SOurce-free Cross-modal KnowledgE Transfer
SOCKET allows transferring knowledge from neural networks trained on a source sensor modality (such as RGB) for one or more domains where large amounts of annotated data may be available to an unannotated target dataset from a different sensor modality (such as infrared or depth). It makes use of task-irrelevant paired source-target images in order to promote feature alignment between the two modalities as well as distribution matching between the source batch norm features (mean and variance) and the target features.

CISOR — Convergent Inverse Scattering using Optimization and Regularization
This software package implements the CISOR reconstruction algorithm along with other benchmark algorithms that attempt to recover the distribution of refractive indices of an object in a multiple scattering regime. The problem of reconstructing an object from the measurements of the light it scatters is common in numerous imaging applications. While the most popular formulations of the problem are based on linearizing the object-light relationship, there is an increased interest in considering nonlinear formulations that can account for multiple light scattering. Our proposed algorithm for nonlinear diffractive imaging, called Convergent Inverse Scattering using Optimization and Regularization (CISOR), is based on our new variant of fast . . .

InSeGAN-ICCV2021 — Instance Segmentation GAN
This package implements InSeGAN, an unsupervised 3D generative adversarial network (GAN) for segmenting (nearly) identical instances of rigid objects in depth images. For this task, we design a novel GAN architecture to synthesize a multiple-instance depth image with independent control over each instance. InSeGAN takes in a set of code vectors (e.g., random noise vectors), each encoding the 3D pose of an object that is represented by a learned implicit object template. The generator has two distinct modules. The first module, the instance feature generator, uses each encoded pose to transform the implicit template into a feature map representation of each object instance. The second module, the depth image renderer, aggregates all of the . . .

HMIS — Hierarchical Musical Instrument Separation
Many sounds that humans encounter are hierarchical in nature; a piano note is one of many played during a performance, which is one of many instruments in a band, which might be playing in a bar with other noises occurring. Inspired by this, we reframe the musical source separation problem as hierarchical, combining similar instruments together at certain levels and separating them at other levels. This allows us to deconstruct the same mixture in multiple ways, depending on the appropriate level of the hierarchy for a given application. In this software package, we present PyTorch implementations of various methods for hierarchical musical instrument separation, with some methods focusing on separating specific instruments (like guitars) . . .

AVSGS — Audio Visual Scene-Graph Segmentor
State-of-the-art approaches for visually-guided audio source separation typically assume sources that have characteristic sounds, such as musical instruments. These approaches often ignore the visual context of these sound sources or avoid modeling object interactions that may be useful to characterize the sources better, especially when the same object class may produce varied sounds from distinct interactions. To address this challenging problem, we propose Audio Visual Scene Graph Segmenter (AVSGS), a novel deep learning model that embeds the visual structure of the scene as a graph and segments this graph into subgraphs, each subgraph being associated with a unique sound obtained via co-segmenting the audio spectrogram. At its core, . . .

PyRoboCOP — Python-based Robotic Control & Optimization Package
PyRoboCOP is a lightweight Python-based package for control and optimization of robotic systems described by nonlinear Differential Algebraic Equations (DAEs). In particular, the package can handle systems with contacts that are described by complementarity constraints and provides a general framework for specifying obstacle avoidance constraints. The package performs direct transcription of the DAEs into a set of nonlinear equations by performing orthogonal collocation on finite elements. The resulting optimization problem belongs to the class of Mathematical Programs with Complementarity Constraints (MPCCs). MPCCs fail to satisfy commonly assumed constraint qualifications and require special handling of the complementarity constraints in . . .

MC-PILCO — Monte Carlo Probabilistic Inference for Learning COntrol
This package implements a Model-based Reinforcement Learning algorithm called Monte Carlo Probabilistic Inference for Learning and COntrol (MC-PILCO), for modeling and control of dynamical systems. The algorithm relies on Gaussian Processes (GPs) to model the system dynamics and on a Monte Carlo approach to estimate the policy gradient during optimization. The Monte Carlo approach is shown to be effective for policy optimization thanks to a proper cost function shaping and use of dropout. The possibility of using a Monte Carlo approach allows a more flexible framework for Gaussian Process Regression that leads to more structured and more data-efficient kernels. The algorithm is also extended to work for Partially Measurable Systems and . . .

SafetyRL — Goal-directed RL with Safety Constraints
In this paper, we consider the problem of building learning agents that can efficiently learn to navigate in constrained environments. The main goal is to design agents that can efficiently learn to understand and generalize to different environments using high-dimensional inputs (a 2D map), while following feasible paths that avoid obstacles in obstacle-cluttered environments. We test our proposed method in the recently proposed Safety Gym suite, which allows testing of safety constraints during training of learning agents. The provided Python code base allows users to reproduce the results from the IROS 2020 paper.

Sound2Sight — Generating Visual Dynamics from Sound and Context
Learning associations across modalities is critical for robust multimodal reasoning, especially when a modality may be missing during inference. In this paper, we study this problem in the context of audio-conditioned visual synthesis, a task that is important, for example, in occlusion reasoning. Specifically, our goal is to generate video frames and their motion dynamics conditioned on audio and a few past frames. To tackle this problem, we present Sound2Sight, a deep variational framework that is trained to learn a per-frame stochastic prior conditioned on a joint embedding of audio and past frames. This embedding is learned via a multi-head attention-based audio-visual transformer encoder. The learned prior is then sampled to . . .

TEAQC — Template Embeddings for Adiabatic Quantum Computation
Quantum Annealing (QA) can be used to quickly obtain near-optimal solutions for Quadratic Unconstrained Binary Optimization (QUBO) problems. In QA hardware, each decision variable of a QUBO should be mapped to one or more adjacent qubits in such a way that pairs of variables defining a quadratic term in the objective function are mapped to some pair of adjacent qubits. However, qubits have limited connectivity in existing QA hardware. This software provides Python code implementing integer linear programs to search for an embedding of the problem graph into certain classes of minors of the QA hardware, which we call template embeddings. In particular, we consider the template embeddings that are minors of the Chimera graph used in D-Wave . . .
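
To make the QUBO objective concrete, the following is a minimal sketch in plain Python. It is not taken from the TEAQC package: it only evaluates x^T Q x for a binary vector and finds the minimizer of a tiny instance by exhaustive search. The matrix `Q` and the helper names `qubo_energy` and `brute_force_qubo` are hypothetical, chosen for illustration, and the embedding search itself is not shown.

```python
from itertools import product


def qubo_energy(Q, x):
    """Return x^T Q x for a binary vector x and a square matrix Q."""
    n = len(x)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def brute_force_qubo(Q):
    """Exhaustively search all 2^n binary assignments (only viable for tiny n)."""
    n = len(Q)
    best_x = min(product((0, 1), repeat=n), key=lambda x: qubo_energy(Q, x))
    return list(best_x), qubo_energy(Q, best_x)


# Hypothetical 3-variable instance: diagonal entries act as linear terms,
# off-diagonal entries are the quadratic couplings between variable pairs.
Q = [
    [-1.0, 2.0, 0.0],
    [0.0, -1.0, 2.0],
    [0.0, 0.0, -1.0],
]
solution, energy = brute_force_qubo(Q)
```

Each nonzero off-diagonal entry of `Q` corresponds to a pair of variables that must end up on adjacent qubits, which is the requirement an embedding has to satisfy.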

ACOT — Adversarially-Contrastive Optimal Transport
In this software release, we provide a PyTorch implementation of the adversarially-contrastive optimal transport (ACOT) algorithm. Through ACOT, we study the problem of learning compact representations for sequential data that capture its implicit spatiotemporal cues. To separate such informative cues from the data, we propose a novel contrastive learning objective via optimal transport. Specifically, our formulation seeks a low-dimensional subspace representation of the data that jointly (i) maximizes the distance of the data (embedded in this subspace) from an adversarial data distribution under a Wasserstein distance, (ii) captures the temporal order, and (iii) minimizes the data distortion. To generate the adversarial distribution, . . .

CME — Circular Maze Environment
In this package, we provide Python code for a circular maze environment (CME), which is a challenging environment for learning manipulation and control. The goal in this system is to tip and tilt the CME so as to drive one (or more) marble(s) from the outermost to the innermost ring. While this system is very intuitive and easy for humans to solve, it can be very difficult and inefficient for standard reinforcement learning algorithms to learn meaningful policies. Consequently, we provide code for this environment so that it can be used as a benchmark for different algorithms that can learn meaningful policies in this environment. We also provide code for iLQR, which can be used to control the motion of marbles in the proposed environment.

LUVLi — Landmarks’ Location, Uncertainty, and Visibility Likelihood
Modern face alignment methods have become quite accurate at predicting the locations of facial landmarks, but they do not typically estimate the uncertainty of their predicted locations nor predict whether landmarks are visible. In this paper, we present a novel framework for jointly predicting landmark locations, associated uncertainties of these predicted locations, and landmark visibilities. We model these as mixed random variables and estimate them using a deep network trained using our proposed Location, Uncertainty, and Visibility Likelihood (LUVLi) loss. In addition, we release an entirely new labeling of a large face alignment dataset with over 19,000 face images in a full range of head poses. Each face is manually labeled with the . . .

CAZSL — Context-Aware Zero Shot Learning
Learning accurate models of the physical world is required for many robotic manipulation tasks. However, during manipulation, robots are expected to interact with unknown workpieces, so building predictive models that can generalize over a number of these objects is highly desirable. We provide code for context-aware zero shot learning (CAZSL) models, an approach utilizing a Siamese network architecture, embedding space masking, and regularization based on context variables, which allows us to learn a model that can generalize to different parameters or features of the interacting objects. The proposed learning algorithm is tested on the recently released Omnipush data set, which allows testing of meta-learning capabilities using . . .

CITO — Contact-Implicit Trajectory Optimization
This package provides a generalized solution for planning dynamic contact-interaction trajectories. The software package leverages existing open-source code for Contact Implicit Trajectory Optimization (https://github.com/aykutonol/cito) based on a variable smooth contact model and a successive convexification algorithm for the trajectory optimization. This software package adds a penalty loop that adjusts the penalty on the virtual forces automatically and a post-process stage that improves solutions through a forward pass by exploiting the contact information implied by the utilization of the virtual forces.
Underactuated dynamics with frictional rigid-body contacts is modeled using MuJoCo (http://mujoco.org/). The convex . . .

OFENet — Online Feature Extractor Network
This Python code implements an online feature extractor network (OFENet) that uses neural nets to produce good representations to be used as inputs to deep RL algorithms. Even though the high dimensionality of input is usually supposed to make learning of RL agents more difficult, by using this network, we show that the RL agents in fact learn more efficiently with the high-dimensional representation than with the lower-dimensional state observations. We believe that stronger feature propagation together with larger networks (and thus larger search space) allows RL agents to learn more complex functions of states and thus improves the sample efficiency. The code also contains several test problems. Through numerical experiments on these . . .

MotionNet
The ability to reliably perceive the environmental states, particularly the existence of objects and their motion behavior, is crucial for autonomous driving. In this work, we propose an efficient deep model, called MotionNet, to jointly perform perception and motion prediction from 3D point clouds. MotionNet takes a sequence of LiDAR sweeps as input and outputs a bird's eye view (BEV) map, which encodes the object category and motion information in each grid cell. The backbone of MotionNet is a novel spatiotemporal pyramid network, which extracts deep spatial and temporal features in a hierarchical fashion. To enforce the smoothness of predictions over both space and time, the training of MotionNet is further regularized with novel . . .

FMIEstimationDemo — Functional Mockup Interface Estimation Demo
This software demonstrates the use of the Functional Mockup Interface (FMI) to construct extended Kalman filters (EKF) and ensemble Kalman filters (EnKF) for state estimation in Modelica using the Dymola compiler.
One of the key advantages of Modelica is that it enables users to create large-scale physical system models via the interconnection of simpler subsystem or component models, and thereby manage the complexity inherent in describing these large systems. While one candidate use for such models is in using data to estimate unmeasured variables of a complex system, the equation compilation process can make it difficult to work with the state vector directly when implementing state estimation methods. Functional mockup . . .

FoldingNet++
This software is the PyTorch implementation of FoldingNet++, which is a novel end-to-end graph-based deep autoencoder to achieve compact representations of unorganized 3D point clouds in an unsupervised manner.
The encoder of the proposed networks adopts similar architectures as in PointNet, which is a well-acknowledged method for supervised learning of 3D point clouds, such as recognition and segmentation. The decoder of the proposed networks involves three novel modules: folding module, graph-topology-inference module, and graph-filtering module. The folding module folds a canonical 2D lattice to the underlying surface of a 3D point cloud, achieving coarse reconstruction; the graph-topology-inference module learns a graph . . .

QNTRPO — Quasi-Newton Trust Region Policy Optimization
We propose a trust region method for policy optimization that employs a Quasi-Newton approximation for the Hessian, called Quasi-Newton Trust Region Policy Optimization (QNTRPO). Gradient descent has become the de facto algorithm for reinforcement learning tasks with continuous controls. The algorithm has achieved state-of-the-art performance on a wide variety of tasks and resulted in several improvements in performance of reinforcement learning algorithms across a wide range of systems. However, the algorithm suffers from a number of drawbacks including: lack of a step-size selection criterion, slow convergence, and dependence on problem scaling. We investigate the use of a dogleg method with a Quasi-Newton approximation for the Hessian to . . .
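
For readers unfamiliar with the dogleg method, the following is a minimal, generic sketch of a single dogleg step for a two-variable quadratic model, written in plain Python. It is the textbook construction, not the QNTRPO code. The function name `dogleg_step`, the 2x2 matrix `B` (standing in for a positive-definite Quasi-Newton Hessian approximation), and the example numbers are all assumptions made for illustration.

```python
import math


def dogleg_step(g, B, radius):
    """One dogleg step for the model m(p) = g.p + 0.5 * p.B.p with ||p|| <= radius.

    g is a length-2 gradient, B a 2x2 symmetric positive-definite matrix.
    """
    # Full (quasi-)Newton step: p_newton = -B^{-1} g, via the 2x2 inverse.
    det = B[0][0] * B[1][1] - B[0][1] * B[1][0]
    p_newton = [
        -(B[1][1] * g[0] - B[0][1] * g[1]) / det,
        -(-B[1][0] * g[0] + B[0][0] * g[1]) / det,
    ]
    if math.hypot(*p_newton) <= radius:
        return p_newton

    # Cauchy point: minimizer of the model along the steepest-descent direction.
    Bg = [B[0][0] * g[0] + B[0][1] * g[1], B[1][0] * g[0] + B[1][1] * g[1]]
    gg = g[0] * g[0] + g[1] * g[1]
    gBg = g[0] * Bg[0] + g[1] * Bg[1]
    p_cauchy = [-(gg / gBg) * g[0], -(gg / gBg) * g[1]]
    norm_cauchy = math.hypot(*p_cauchy)
    if norm_cauchy >= radius:
        # Even the Cauchy point is outside: scale steepest descent to the boundary.
        return [radius * p_cauchy[0] / norm_cauchy, radius * p_cauchy[1] / norm_cauchy]

    # Otherwise walk from the Cauchy point toward the Newton step until the
    # boundary is hit: solve ||p_cauchy + t * d||^2 = radius^2 for t in [0, 1].
    d = [p_newton[0] - p_cauchy[0], p_newton[1] - p_cauchy[1]]
    a = d[0] * d[0] + d[1] * d[1]
    b = 2.0 * (p_cauchy[0] * d[0] + p_cauchy[1] * d[1])
    c = norm_cauchy * norm_cauchy - radius * radius
    t = (-b + math.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)
    return [p_cauchy[0] + t * d[0], p_cauchy[1] + t * d[1]]


# Hypothetical example values.
B = [[1.0, 0.0], [0.0, 4.0]]
g = [1.0, 2.0]
inside = dogleg_step(g, B, 2.0)   # the Newton step fits inside the region
on_path = dogleg_step(g, B, 1.0)  # lands on the segment between the two points
clipped = dogleg_step(g, B, 0.5)  # scaled steepest-descent step
```

The trust-region radius doubles as a step-size control: a small radius yields a cautious gradient-like step, and a large one yields the full Newton-like step.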

RIDE — Robust Iterative Data Estimation
Recent studies have demonstrated that as classifiers, deep neural networks (e.g., CNNs) are quite vulnerable to adversarial attacks that only add quasi-imperceptible perturbations to the input data but completely change the predictions of the classifiers. To defend classifiers against such adversarial attacks, here we focus on the white-box adversarial defense where the attackers are granted full access to not only the classifiers but also the defenders to produce as strong an attack as possible. We argue that a successful white-box defender should prevent the attacker from not only direct gradient calculation but also a gradient approximation. Therefore we propose viewing the defense from the perspective of a functional, a high-order function . . .

GNI — Gradient-based Nikaido-Isoda
Computing Nash equilibrium (NE) of multiplayer games has witnessed renewed interest due to recent advances in generative adversarial networks (GAN). However, computing equilibrium efficiently is challenging. To this end, we introduce the Gradient-based Nikaido-Isoda (GNI) function which serves as a merit function, vanishing only at the first-order stationary points of each player’s optimization problem. Gradient descent is shown to converge sublinearly to a first-order stationary point of the GNI function. For the particular case of bilinear min-max games and multiplayer quadratic games, the GNI function is convex. Hence, the application of gradient descent in this case yields linear convergence to an NE (when one exists).
. . . 
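
As an illustration of the merit-function idea, the following sketch runs gradient descent on a GNI-style value for a toy bilinear min-max game. It assumes the GNI function has the form V(x) = sum over players i of f_i(x) - f_i(x with player i's variable moved one local gradient step of size eta); this form, the helper names, and the step sizes are assumptions made here, so check the paper for the exact definition. Gradients of V are approximated by finite differences to keep the example dependency-free.

```python
def gni_value(x, losses, grads, eta):
    """Assumed GNI form: sum_i [ f_i(x) - f_i(x with x_i moved one gradient step) ]."""
    total = 0.0
    for i, (f, grad_i) in enumerate(zip(losses, grads)):
        y = list(x)
        y[i] = x[i] - eta * grad_i(x)  # player i's local gradient step
        total += f(x) - f(y)
    return total


def descend_on_gni(x, losses, grads, eta, rho, steps, h=1e-6):
    """Plain gradient descent on the GNI value, using central finite differences."""
    x = list(x)
    for _ in range(steps):
        grad_v = []
        for k in range(len(x)):
            up, down = list(x), list(x)
            up[k] += h
            down[k] -= h
            grad_v.append((gni_value(up, losses, grads, eta)
                           - gni_value(down, losses, grads, eta)) / (2.0 * h))
        x = [xk - rho * gk for xk, gk in zip(x, grad_v)]
    return x


# Hypothetical bilinear min-max game with one scalar variable per player:
# player 0 minimizes x0 * x1, player 1 minimizes -x0 * x1.
losses = [lambda x: x[0] * x[1], lambda x: -x[0] * x[1]]
grads = [lambda x: x[1], lambda x: -x[0]]  # d f_0 / d x0 and d f_1 / d x1

start = [1.0, -2.0]
v_start = gni_value(start, losses, grads, eta=0.5)
x_final = descend_on_gni(start, losses, grads, eta=0.5, rho=0.5, steps=50)
```

In this toy game the assumed GNI value reduces to eta * (x0^2 + x1^2), which is convex and vanishes only at the equilibrium (0, 0), consistent with the bilinear case described above.
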

DSP — Discriminative Subspace Pooling
Human action recognition from video sequences is one of the fundamental problems in computer vision. In this research, we investigate and propose representation learning approaches towards solving this problem, which we call discriminative subspace pooling. Specifically, we combine recent deep learning approaches with techniques for generating adversarial perturbations into learning novel representations that can summarize long video sequences into compact descriptors – these descriptors capture essential properties of the input videos that are sufficient to achieve good recognition rates. We make two contributions. First, we propose a subspace-based discriminative classifier, similar to a nonlinear SVM, but having piecewise-linear . . .

SSTL — Semi-Supervised Transfer Learning
Successful state-of-the-art machine learning techniques rely on the existence of large, well-sampled, and labeled datasets. Today it is easy to obtain a finely sampled dataset because of the decreasing cost of connected low-energy devices. However, it is often difficult to obtain a large number of labels. The reason for this is twofold. First, labels are often provided by people whose attention span is limited. Second, even if a person were able to label perpetually, this person would need to be shown data in a large variety of conditions. One approach to addressing these problems is to combine labeled data collected in different sessions through transfer learning. Still, even this approach suffers from dataset limitations.
This . . .

1bCRB — One-Bit CRB
Massive multiple-input multiple-output (MIMO) systems can significantly increase the spectral efficiency, mitigate propagation loss by exploiting large array gain, and reduce inter-user interference with high-resolution spatial beamforming. To reduce complexity and power consumption, several transceiver architectures have been proposed for mmWave massive MIMO systems: 1) an analog architecture, 2) a hybrid analog/digital architecture, and 3) a fully digital architecture with low-resolution ADCs.
To this end, we derive the Cramer-Rao bound (CRB) on estimating angular-domain channel parameters including angles-of-departure (AoDs), angles-of-arrival (AoAs), and associated channel path gains. Our analysis provides a simple tool . . .

Kernel Correlation Network
Unlike on images, semantic learning on 3D point clouds using a deep network is challenging due to the naturally unordered data structure. Among existing works, PointNet has achieved promising results by directly learning on point sets. However, it does not take full advantage of a point's local neighborhood that contains fine-grained structural information which turns out to be helpful towards better semantic learning. In this regard, we present two new operations to improve PointNet with a more efficient exploitation of local structures. The first one focuses on local 3D geometric structures. In analogy to a convolution kernel for images, we define a point-set kernel as a set of learnable 3D points that jointly respond to a set of . . .

FoldingNet
Recent deep networks that directly handle points in a point set, e.g., PointNet, have been state-of-the-art for supervised learning tasks on point clouds such as classification and segmentation. In this work, a novel end-to-end deep autoencoder is proposed to address unsupervised learning challenges on point clouds. On the encoder side, a graph-based enhancement is enforced to promote local structures on top of PointNet. Then, a novel folding-based decoder deforms a canonical 2D grid onto the underlying 3D object surface of a point cloud, achieving low reconstruction errors even for objects with delicate structures. The proposed decoder only uses about 7% of the parameters of a decoder with fully-connected neural networks, yet leads to a more . . .

FRPC — Fast Resampling on Point Clouds via Graphs
We propose a randomized resampling strategy that selects a representative subset of points while preserving application-dependent features, reducing the cost of storing, processing, and visualizing a large-scale point cloud. The strategy is based on graphs, which can represent underlying surfaces and lend themselves well to efficient computation. We use a general feature-extraction operator to represent application-dependent features and propose a general reconstruction error to evaluate the quality of resampling; by minimizing the error, we obtain a general form of optimal resampling distribution. The proposed resampling distribution is guaranteed to be shift-, rotation-, and scale-invariant in the 3D space.

PCQM — Point Cloud Quality Metric
It is challenging to measure the geometric distortion of a point cloud introduced by point cloud compression. Conventionally, the errors between point clouds are measured in terms of point-to-point or point-to-surface distances, which either ignore the surface structures or rely heavily on specific surface reconstructions. To overcome these drawbacks, we propose using point-to-plane distances as a measure of geometric distortions in point cloud compression. The intrinsic resolution of the point clouds is proposed as a normalizer to convert the mean square errors to PSNR numbers. In addition, the perceived local planes are investigated at different scales of the point cloud. Finally, the proposed metric is independent of the size of . . .
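
The point-to-plane idea can be sketched in a few lines of plain Python. This is an illustrative sketch rather than the PCQM code: it assumes point correspondences and unit normals are already given, and it uses the conventional form PSNR = 10 log10(peak^2 / MSE), where `peak` stands in for the normalizer. The exact normalization used by the package may differ, and all names and data below are hypothetical.

```python
import math


def point_to_plane_mse(reference, normals, degraded):
    """Mean squared error measured along the reference normals.

    reference, degraded: lists of corresponding (x, y, z) points.
    normals: unit normals estimated at the reference points.
    """
    total = 0.0
    for p, n, q in zip(reference, normals, degraded):
        # Project the error vector q - p onto the local plane normal.
        along_normal = sum((qk - pk) * nk for pk, nk, qk in zip(p, n, q))
        total += along_normal ** 2
    return total / len(reference)


def psnr_db(mse, peak):
    """Conventional PSNR in decibels; `peak` plays the role of the normalizer."""
    return 10.0 * math.log10(peak ** 2 / mse)


# Hypothetical data: three points on the plane z = 0, all with normal (0, 0, 1).
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
normals = [(0.0, 0.0, 1.0)] * 3
# The first point slides within the plane; the other two move off it by 0.1.
degraded = [(0.5, 0.0, 0.0), (1.0, 0.0, 0.1), (0.0, 1.0, -0.1)]

mse = point_to_plane_mse(reference, normals, degraded)
quality = psnr_db(mse, peak=1.0)
```

The in-plane displacement of the first point contributes nothing to the error, which is the behavior that distinguishes a point-to-plane measure from a point-to-point one.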

ROSETA — Robust Online Subspace Estimation and Tracking Algorithm
This script implements a revised version of the robust online subspace estimation and tracking algorithm (ROSETA) that is capable of identifying and tracking a time-varying low-dimensional subspace from incomplete measurements and in the presence of sparse outliers. The algorithm minimizes a robust l1-norm cost function between the observed measurements and their projection onto the estimated subspace. The projection coefficients and sparse outliers are computed using a LASSO solver and the subspace estimate is updated using a proximal point iteration with adaptive parameter selection.

CASENet — Deep Category-Aware Semantic Edge Detection
Boundary and edge cues are highly beneficial in improving a wide variety of vision tasks such as semantic segmentation, object recognition, stereo, and object proposal generation. Recently, the problem of edge detection has been revisited and significant progress has been made with deep learning. While classical edge detection is a challenging binary problem in itself, the category-aware semantic edge detection by nature is an even more challenging multi-label problem. We model the problem such that each edge pixel can be associated with more than one class, as edge pixels appear in contours or junctions belonging to two or more semantic classes. To this end, we propose a novel end-to-end deep semantic edge learning architecture based on ResNet . . .

NDS — Nonnegative Dynamical System model
Nonnegative data arise in a variety of important signal processing domains, such as power spectra of signals, pixels in images, and count data. We introduce a novel nonnegative dynamical system model for sequences of such data. The model we propose is called nonnegative dynamical system (NDS), and bridges two active fields, dynamical systems and nonnegative matrix factorization (NMF). Its formulation follows that of linear dynamical systems, but the observation and the latent variables are assumed nonnegative, the linear transforms are assumed to involve nonnegative coefficients, and the additive random innovations both for the observation and the latent variables are replaced by multiplicative random innovations. The software . . .

JGU — Joint Geodesic Upsampling
We develop an algorithm utilizing geodesic distances to upsample a low-resolution depth image using a registered high-resolution color image. Specifically, it computes depth for each pixel in the high-resolution image using geodesic paths to the pixels whose depths are known from the low-resolution one. Though this is closely related to the all-pairs-shortest-path problem, which has O(n^2 log n) complexity, we develop a novel approximation algorithm whose complexity grows linearly with the image size and achieves real-time performance. We compare our algorithm with the state of the art on the benchmark dataset and show that our approach provides more accurate depth upsampling with fewer artifacts. In addition, we show that the proposed . . .

EBAD — Exemplar-Based Anomaly Detection
Anomaly detection in real-valued time series has important applications in many diverse areas. We have developed a general algorithm for detecting anomalies in real-valued time series that is computationally very efficient. Our algorithm is exemplar-based, which means that a set of exemplars is first learned from a normal time series (i.e., one not containing any anomalies); this set effectively summarizes all normal windows in the training time series. Anomalous windows of a testing time series can then be efficiently detected using the exemplar-based model.
The provided code implements our hierarchical exemplar learning algorithm, our exemplar-based anomaly detection algorithm, and a baseline brute-force Euclidean distance anomaly . . .
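
The brute-force Euclidean distance idea mentioned above can be sketched as follows. This is a minimal illustration in plain Python, not the EBAD code: each test window is scored by its distance to the nearest window of the normal training series. The function names, window width, and data are hypothetical. An exemplar-based model would instead compare against a much smaller learned set of exemplars.

```python
import math


def sliding_windows(series, width):
    """All contiguous windows of the given width."""
    return [series[i:i + width] for i in range(len(series) - width + 1)]


def anomaly_scores(train, test, width):
    """Score each test window by its Euclidean distance to the nearest training window."""
    normal_windows = sliding_windows(train, width)
    scores = []
    for window in sliding_windows(test, width):
        scores.append(min(math.dist(window, ref) for ref in normal_windows))
    return scores


# Hypothetical data: a repeating normal pattern, and a test series with one spike.
train = [0.0, 1.0, 0.0, -1.0] * 5
test = [0.0, 1.0, 0.0, -1.0, 0.0, 5.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
scores = anomaly_scores(train, test, width=4)
most_anomalous = max(range(len(scores)), key=scores.__getitem__)
```

Windows that overlap the spike receive a large score, while windows that match the normal pattern score zero. The cost of this baseline grows with the number of training windows, which is the motivation for summarizing them with exemplars.
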

PEAC — Plane Extraction using Agglomerative Clustering
Real-time plane extraction in 3D point clouds is crucial to many robotics applications. We present a novel algorithm for reliably detecting multiple planes in real time in organized point clouds obtained from devices such as Kinect sensors. By uniformly dividing such a point cloud into non-overlapping groups of points in the image space, we first construct a graph whose nodes and edges represent groups of points and their neighborhoods, respectively. We then perform an agglomerative hierarchical clustering on this graph to systematically merge nodes belonging to the same plane until the plane fitting mean squared error exceeds a threshold. Finally we refine the extracted planes using pixel-wise region growing. Our experiments demonstrate that . . .

PQP — Parallel Quadratic Programming
An iterative multiplicative algorithm is proposed for the fast solution of quadratic programming (QP) problems that arise in the real-time implementation of Model Predictive Control (MPC). The proposed algorithm, Parallel Quadratic Programming (PQP), is amenable to fine-grained parallelization. Conditions on the convergence of the PQP algorithm are given and proved. Due to its extreme simplicity, even serial implementations offer considerable speed advantages. To demonstrate, PQP is applied to several simulation examples, including a standalone QP problem and two MPC examples. When implemented in MATLAB using single-thread computations, numerical simulations of PQP demonstrate a 5-10x speedup compared to the MATLAB active-set . . .
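
To give a feel for what a multiplicative QP iteration looks like, here is a minimal sketch in plain Python for the nonnegatively constrained problem min 0.5 x^T Q x + h^T x subject to x >= 0. It is written under our own assumptions and is not the PQP package: Q and h are split into elementwise nonnegative parts, and each coordinate is rescaled by a ratio of the two parts. The function name, the splitting, and the test problems are hypothetical, and the sketch omits any safeguards needed to guarantee convergence in general.

```python
def pqp_style_solve(Q, h, iterations=200):
    """Multiplicative fixed-point iteration for min 0.5*x.Q.x + h.x subject to x >= 0.

    Assumes Q is symmetric positive definite with a strictly positive diagonal.
    Illustrative only: no convergence safeguard is included.
    """
    n = len(h)
    # Split Q and h into elementwise nonnegative parts: Q = Q_pos - Q_neg, h = h_pos - h_neg.
    Q_pos = [[max(q, 0.0) for q in row] for row in Q]
    Q_neg = [[max(-q, 0.0) for q in row] for row in Q]
    h_pos = [max(v, 0.0) for v in h]
    h_neg = [max(-v, 0.0) for v in h]

    x = [1.0] * n  # any strictly positive starting point
    for _ in range(iterations):
        # Every coordinate is rescaled independently from the previous iterate,
        # so the coordinates could be updated in parallel.
        x = [
            x[i] * (h_neg[i] + sum(Q_neg[i][j] * x[j] for j in range(n)))
            / (h_pos[i] + sum(Q_pos[i][j] * x[j] for j in range(n)))
            for i in range(n)
        ]
    return x


# Hypothetical instances: the first has an interior solution, the second an active bound.
interior = pqp_style_solve([[2.0, -1.0], [-1.0, 2.0]], [-2.0, -1.0])
bounded = pqp_style_solve([[2.0, 0.0], [0.0, 2.0]], [1.0, -1.0])
```

At a fixed point, each coordinate is either zero or satisfies (Q x + h)_i = 0, which matches the optimality conditions of the nonnegatively constrained QP. Because the update only multiplies by nonnegative ratios, the iterate never leaves the feasible region.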