# MLSP 2020

IEEE International Workshop on
MACHINE LEARNING FOR SIGNAL PROCESSING

September 21–24, 2020, Aalto University, Espoo, Finland (virtual conference)

## Thursday, September 24, 2020

#### Lecture Session 6: Robust and High-Dimensional Statistical Learning for Signal Processing Applications

Chair: Esa Ollila, Aalto University

09:15

Robust reconstruction with nonconvex subset constraints: A polynomial optimization approach
Arthur Marmin, Marc Castella, Jean-Christophe Pesquet
Abstract

In this paper, we are interested in the recovery of an unknown signal corrupted by a linear operator, a nonlinear function, and additive Gaussian noise. In addition, some of the observations contain outliers. Many robust data-fit functions that alleviate sensitivity to outliers can be expressed as piecewise rational functions. Based on this fact, we reformulate the robust inverse problem as a rational optimization problem. The considered framework allows us to incorporate nonconvex constraints such as unions of subsets. The rational problem is then solved using recent optimization techniques which offer guarantees of global optimality. Finally, experimental results illustrate the validity of the recovered global solutions and the good quality of the reconstructed signals despite the presence of outliers.

09:35

Performance-complexity trade-off in large dimensional statistics
Tayeb Zarrouk, Romain Couillet, Florent Chatelain, Nicolas Le Bihan
Abstract

This article introduces a random matrix framework for the analysis of the performance-complexity trade-off in a class of machine learning algorithms, under a large dimensional data regime. Specifically, we analyze the spectral properties of a kernel random matrix K upon which a sparsity mask B is applied; this reduces the number of entries to evaluate, thereby reducing complexity, while weakening the power of statistical inference on K, thereby impeding performance. We exhibit a phase transition phenomenon below which spectral methods must fail and which is a function of the sparsity structure of B. This finds immediate applications to the fundamental limits of complexity-reduced spectral clustering as well as principal component analysis.

09:55

Robust semiparametric joint estimators of location and scatter in elliptical distributions
Stefano Fortunati, Alexandre Renaux, Frédéric Pascal
Abstract

This paper focuses on the joint estimation of the location vector and the shape matrix of a set of Complex Elliptically Symmetric (CES) distributed observations. This well-known estimation problem is framed in the original context of semiparametric models, allowing us to handle the (generally unknown) density generator as an infinite-dimensional nuisance parameter. A joint estimator, relying on Tyler's $M$-estimator of location and on a new $R$-estimator of shape matrix, is proposed, and its Mean Squared Error (MSE) performance is compared with the Constrained Semiparametric Cramér-Rao Bound (CSCRB).

10:15

Parameter estimation in non-linear state-space models by automatic differentiation of non-linear Kalman filters
Ajinkya Gorad, Zheng Zhao, Simo Särkkä
Abstract

In this article, we propose automatic-differentiation-based methods for parameter estimation in non-linear state-space models. We use the extended Kalman filter and the cubature Kalman filter to approximate the negative log-likelihood (i.e., the energy function) of the parameter posterior distribution, and compute the gradients and Hessians of this function by automatic differentiation of the filter recursions. The proposed approach enables computing MAP estimates and forming Laplace approximations for the parameter posterior without the need to implement complicated derivative recursions or compute Jacobians by hand. The methods are demonstrated on parameter estimation problems for a pendulum model and a coordinated turn model.

10:35

Block-wise minimization-majorization algorithm for Huber's criterion: Sparse learning and applications
Esa Ollila, Ammar Mian
Abstract

Huber's criterion can be used for robust joint estimation of the regression and scale parameters in the linear model. Huber's (Huber, 1981) motivation for introducing the criterion stemmed from the non-convexity of the joint maximum likelihood objective function as well as the non-robustness (unbounded influence function) of the associated ML-estimate of scale. In this paper, we illustrate how the original algorithm proposed by Huber can be set within the block-wise minimization-majorization framework. In addition, we propose novel data-adaptive step sizes for both the location and scale updates, which further improve convergence. We then illustrate how Huber's criterion can be used for sparse learning of an underdetermined linear model using the iterative hard thresholding approach. We illustrate the usefulness of the algorithms in an image denoising application and in simulation studies.
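
As a rough sketch of the kind of joint location-scale iteration Huber's criterion leads to (not the authors' block-wise MM algorithm), the snippet below alternates Huber's "Proposal 2" scale step with a clipped-residual location step; the clipping constant `c = 1.345` and the consistency factor `beta` are standard choices for Gaussian-nominal data.

```python
import numpy as np

def huber_location_scale(y, c=1.345, n_iter=50):
    """Joint location/scale estimation with Huber's psi function and
    Huber's 'Proposal 2' scale step (a simple fixed-point scheme)."""
    beta = 0.7102                                # E[psi_c(Z)^2], Z ~ N(0,1)
    mu = np.median(y)
    sigma = np.median(np.abs(y - mu)) / 0.6745   # MAD as initial scale
    for _ in range(n_iter):
        psi = np.clip((y - mu) / sigma, -c, c)
        sigma *= np.sqrt(np.mean(psi**2) / beta)  # scale step
        psi = np.clip((y - mu) / sigma, -c, c)
        mu += sigma * np.mean(psi)                # location step
    return mu, sigma

# 10% of the samples are gross outliers
rng = np.random.default_rng(7)
y = np.concatenate([rng.normal(5.0, 2.0, 450), rng.normal(50.0, 1.0, 50)])
mu, sigma = huber_location_scale(y)
```

On this toy data the plain sample mean is pulled to roughly 9.5 by the outliers, while the joint Huber estimates stay close to the nominal location 5 and scale 2.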

10:55

Globally optimal robust matrix completion based on M-estimation
Felicia Ruppel, Michael Muma, Abdelhak M Zoubir
Abstract

Robust matrix completion allows for estimating a low-rank matrix from a subset of its entries, even in the presence of impulsive noise and outliers. We build on recent progress in the theoretical analysis of non-convex low-rank factorization problems to develop a robust approach based on a fast factorization method. We propose an algorithm that uses joint regression and scale estimation to compute the estimates. We prove that our algorithm converges to a global minimum with random initialization. An example function for which the guarantees hold is the pseudo-Huber function. In simulations, the proposed approach is compared to state-of-the-art robust and non-robust methods. In addition, its applicability to image inpainting and occlusion removal is demonstrated.

#### Lecture Session 7: Applications in Audio Processing and Bioinformatics

Chair: Roland Hostettler, Uppsala University

12:15

Sudo rm -rf: Efficient networks for universal audio source separation
Efthymios Tzinis, Zhepei Wang, Paris Smaragdis
Abstract

In this paper, we present an efficient neural network for end-to-end general-purpose audio source separation. Specifically, the backbone structure of this convolutional network is the SUccessive DOwnsampling and Resampling of Multi-Resolution Features (SuDoRM-RF), aggregated through simple one-dimensional convolutions. In this way, we are able to obtain high-quality audio source separation with a limited number of floating-point operations, low memory requirements, few parameters, and low latency. Our experiments on both speech and environmental sound separation datasets show that SuDoRM-RF performs comparably to, and even surpasses, various state-of-the-art approaches that have significantly higher computational resource requirements.

12:35

Online supervised acoustic system identification exploiting prelearned local affine subspace models
Thomas Haubner, Andreas Brendel, Walter Kellermann
Abstract

We present a novel algorithm for improved block-online supervised acoustic system identification in adverse noise scenarios that exploits prior knowledge about the space of Room Impulse Responses (RIRs). The method is based on the assumption that the variability of the unknown RIRs is controlled by only a few physical parameters, describing, e.g., source position movements, and is thus confined to a low-dimensional manifold, which we model by a union of affine subspaces. The offsets and bases of the affine subspaces are learned in advance from training data by unsupervised clustering followed by Principal Component Analysis. We suggest denoising the parameter update of any supervised adaptive filter by projecting it onto an optimal affine subspace, selected via a computationally efficient approximation of the associated evidence. The proposed method significantly improves the system identification performance of state-of-the-art algorithms in adverse noise scenarios.

12:55

Unsupervised acoustic condition monitoring with Riemannian geometry
Pavel Lifshits, Ronen Talmon
Abstract

In this paper, we present an unsupervised method for acoustic condition monitoring. Our method relies on the Riemannian geometry of symmetric positive-definite (SPD) matrices. Specifically, SPD matrices enable us to build features for multi-channel data which naturally encode the mutual relationships between the channels. By exploiting the Riemannian geometry of SPD matrices, we show that these features support informative comparisons between recordings. The proposed anomaly score is then based on a one-class SVM applied to the proposed features and their induced Riemannian distance. We test the proposed method on two benchmarks and show that it achieves state-of-the-art results. In addition, we demonstrate the robustness of the proposed method to noise and to low sampling rates.
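
The geometric ingredient is easy to sketch. Below, multi-channel segments are summarized by sample covariance (SPD) matrices and compared with the affine-invariant Riemannian metric; a simple median-distance-to-training score stands in for the one-class SVM used in the paper, and the "anomaly" (an abnormal channel coupling) is invented for illustration.

```python
import numpy as np

def spd_inv_sqrt(A):
    # A^{-1/2} for a symmetric positive-definite matrix via eigendecomposition
    w, V = np.linalg.eigh(A)
    return (V / np.sqrt(w)) @ V.T

def airm_dist(A, B):
    """Affine-invariant Riemannian distance between SPD matrices:
    || log(A^{-1/2} B A^{-1/2}) ||_F."""
    S = spd_inv_sqrt(A)
    w = np.linalg.eigvalsh(S @ B @ S)
    return np.sqrt(np.sum(np.log(w) ** 2))

rng = np.random.default_rng(2)

def segment_cov(n_ch=4, T=400, anomalous=False):
    # sample covariance of one multi-channel segment
    X = rng.normal(size=(n_ch, T))
    if anomalous:
        X[0] += 3.0 * X[1]          # invented fault: abnormal channel coupling
    return (X @ X.T) / T

train = [segment_cov() for _ in range(20)]   # covariances of 'healthy' segments

def score(C):
    # median Riemannian distance to the training set
    # (a simple stand-in for the one-class SVM of the paper)
    return np.median([airm_dist(C, Ct) for Ct in train])

s_normal = score(segment_cov())
s_anomal = score(segment_cov(anomalous=True))
```

The anomalous segment's covariance encodes the spurious channel coupling, so its Riemannian distance to the healthy training set is much larger than that of a fresh healthy segment.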

13:15

A neural network-aided Viterbi receiver for joint equalization and decoding
Han-Mo Ou, Chieh-Fang Teng, Wen-Chiao Tsai
Abstract

In recent years, many works have focused on applying machine learning techniques to communication system design. Instead of replacing entire functional blocks of communication systems with neural networks, a hybrid ViterbiNet symbol detector was proposed to combine the advantages of the Viterbi algorithm and neural networks, achieving guaranteed performance with reasonable complexity. However, this block-based design not only degrades system performance but also increases hardware complexity. In this work, we propose a ViterbiNet receiver for joint equalization and channel decoding, which simultaneously accounts for both the code structure and channel effects, thus achieving a globally optimal design with a 3 dB gain. Furthermore, a dedicated neural network model is proposed to avoid the need for perfect channel state information (CSI); it is shown to be more robust under CSI uncertainty, with a 1.7 dB gain.

13:35

Bandit-based relay selection in cooperative networks over unknown stationary channels
Abstract

In this paper, we propose low-complexity relay scheduling mechanisms based on a multi-armed bandit (MAB) framework. More specifically, this MAB framework is used for relay scheduling based only on observing the acknowledgements/negative-acknowledgements (ACK/NACK) of packet transmissions. A bandit-based opportunistic relay selection (BB-ORS) mechanism is developed that eventually recovers the performance of classical opportunistic relay selection (ORS) with channel state information (CSI), without requiring any CSI. In addition, a distributed implementation of BB-ORS is presented, herein called d-BB-ORS, where distributed timers at the relays perform the relay selection, thus reducing the signaling overhead significantly. BB-ORS is compared to optimal scheduling with full CSI, and the negligible performance gap is compensated for by the low-complexity, low-overhead implementation, while BB-ORS surpasses the performance of ORS with outdated CSI.
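
A minimal sketch of the bandit view (the paper's BB-ORS mechanism is more elaborate): UCB1 selects a relay in each slot, observes only a binary ACK/NACK, and eventually concentrates its choices on the relay with the highest success probability. The per-relay ACK probabilities below are invented.

```python
import numpy as np

rng = np.random.default_rng(3)
p_ack = np.array([0.5, 0.7, 0.9])   # unknown stationary per-relay ACK probabilities
K, T = len(p_ack), 20000
counts, acks = np.zeros(K), np.zeros(K)

for t in range(T):
    if t < K:
        k = t                        # initialization: try each relay once
    else:
        # UCB1 index: empirical ACK rate plus an exploration bonus
        k = int(np.argmax(acks / counts + np.sqrt(2 * np.log(t) / counts)))
    counts[k] += 1
    acks[k] += rng.random() < p_ack[k]   # only ACK/NACK feedback is observed

best_share = counts[np.argmax(p_ack)] / T
```

After the exploration cost is paid, the overwhelming majority of transmissions go through the best relay, mirroring how BB-ORS approaches ORS performance without CSI.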

13:55

Inference of protein-protein interaction networks from liquid-chromatography mass-spectrometry data by approximate Bayesian computation-sequential Monte Carlo sampling
Yukun Tan, Fernando Buarque de Lima Neto, Ulisses M. Braga-Neto
Abstract

We propose a new algorithm for the inference of protein-protein interaction (PPI) networks from noisy time series of Liquid-Chromatography Mass-Spectrometry (LC-MS) proteomic expression data, based on Approximate Bayesian Computation-Sequential Monte Carlo (ABC-SMC) sampling. The algorithm is an extension of our previous framework PALLAS and can be easily modified to handle other complex models of expression data, such as LC-MS data, for which the likelihood function is intractable. Results based on synthetic time series of cytokine LC-MS measurements corresponding to a prototype immunomic network demonstrate that our algorithm is capable of inferring the network topology accurately.
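
The likelihood-free idea can be sketched on a toy problem (a scalar Gaussian mean rather than a PPI network): ABC rejection with a shrinking tolerance schedule plus a resample-and-perturb move, as a crude stand-in for the ABC-SMC sampler used in the paper.

```python
import numpy as np

rng = np.random.default_rng(6)
y_obs = rng.normal(2.0, 1.0, size=50)       # observed data; unknown mean (here 2.0)
s_obs = y_obs.mean()                        # summary statistic

particles = rng.uniform(-5, 5, size=5000)   # draws from a flat prior on the mean
for eps in [1.0, 0.5, 0.2]:                 # shrinking tolerance schedule
    kept = []
    for theta in particles:
        y_sim = rng.normal(theta, 1.0, size=50)   # forward simulation only:
        if abs(y_sim.mean() - s_obs) < eps:       # no likelihood is ever evaluated
            kept.append(theta)
    # resample the survivors and perturb them (an SMC-style move step)
    particles = rng.choice(np.array(kept), size=5000) + rng.normal(0, 0.1, size=5000)

post_mean = particles.mean()
```

Only the ability to simulate data given a parameter is needed, which is exactly what makes the approach attractive when the LC-MS likelihood is intractable.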

#### Poster Session 4: Learning: Theory, Modelling and Applications

Chair: Simo Särkkä, Aalto University

14:15

Graph topology inference benchmarks for machine learning
Carlos Eduardo Rosar Kos Lassance, Vincent Gripon, Gonzalo Mateos
Abstract

Graphs are nowadays ubiquitous in the fields of signal processing and machine learning. As a tool used to express relationships between objects, graphs can be deployed to various ends: (i) clustering of vertices, (ii) semi-supervised classification of vertices, (iii) supervised classification of graph signals, and (iv) denoising of graph signals. However, in many practical cases graphs are not explicitly available and must therefore be inferred from data. Validation is a challenging endeavor that naturally depends on the downstream task for which the graph is learnt. Accordingly, it has often been difficult to compare the efficacy of different algorithms. In this work, we introduce several easy-to-use and publicly released benchmarks specifically designed to reveal the relative merits and limitations of graph inference methods. We also contrast some of the most prominent techniques in the literature.

14:15

Visible light-based robust positioning under detector orientation uncertainty: A Gabor convolutional network-based approach extracting stable texture features
Bingpeng Zhou, Risto Wichman
Abstract

In this paper, we are interested in visible light-based positioning (VLP) of detectors with unknown orientations. Conventional VLP methods depend on a well-defined signal propagation model with perfectly known parameters; uncertainty in the detector orientation therefore degrades their VLP performance. To address this challenge, we propose a machine learning (ML)-based VLP solution, which comprises a Gabor convolutional neural network (GCNN) and a fully-connected neural network (FCNN). We observe spatial texture structures in received visible light signals, which depend on the detector location and hence can be exploited to enhance VLP performance. The GCNN extracts rotation-invariant features of visible light samples under uncertain detector orientations using diverse Gabor kernels, while the FCNN captures informative clustering structures of the obtained texture features. It is shown that the proposed ML-based VLP method outperforms conventional VLP baselines.

14:15

Graph learning from multi-attribute smooth signals
Jitendra K Tugnait
Abstract

We consider the problem of estimating the structure of an undirected weighted graph underlying a set of smooth multi-attribute signals. Most existing methods for graph estimation are based on single-attribute models, where one associates a scalar data variable with each node of the graph, and the problem is to infer the graph topology that captures the relationships between these variables. An example is image graphs for grayscale texture images, modeling the dependence of a pixel on neighboring pixels. In multi-attribute graphical models, each node represents a vector, as, for example, in color image graphs where one has three variables (RGB color components) per pixel node. In this paper, we extend the single-attribute approach of Kalofolias (2016) to multi-attribute data. An alternating direction method of multipliers (ADMM) algorithm is presented to optimize the objective function and infer the graph topology. Numerical results based on synthetic as well as real data are presented.
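
A small sketch of smoothness-based graph learning in the spirit of Kalofolias (2016), in the single-attribute setting and with plain projected gradient descent instead of the paper's multi-attribute ADMM; all hyperparameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 10
A = np.zeros((n, n))                          # ground truth: a cycle graph
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
L = np.diag(A.sum(1)) - A                     # combinatorial Laplacian

# signals made smooth on the graph by low-pass filtering white noise
X = np.linalg.solve(np.eye(n) + 2.0 * L, rng.normal(size=(n, 100)))

# pairwise squared distances between node signal profiles, normalized
Z = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
Z /= Z[Z > 0].mean()

# projected gradient descent on a Kalofolias-style objective:
#   min_{W >= 0}  sum(W * Z) - alpha * sum(log(degrees)) + beta * ||W||_F^2
alpha, beta, lr = 1.0, 0.1, 0.01
W = np.ones((n, n)) - np.eye(n)
for _ in range(3000):
    d = W.sum(1)
    G = Z - alpha * (1.0 / d[:, None] + 1.0 / d[None, :]) + 2.0 * beta * W
    W = np.maximum(W - lr * G, 0.0)           # projection onto W >= 0
    np.fill_diagonal(W, 0.0)

on_edges = W[A == 1].mean()
off_edges = W[(A == 0) & ~np.eye(n, dtype=bool)].mean()
```

Because smooth signals vary little across true edges, the learned weights concentrate on the cycle's edges (large `on_edges`) while most non-edges are driven to zero, and the log-degree barrier keeps every node connected.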

14:15

Distributed blind deconvolution of seismic signals under sparsity constraints in sensor networks
Ban-Sok Shin, Dmitriy Shutin
Abstract

This work proposes an iterative algorithm for distributed deconvolution of seismic signals for a reflectivity survey by a network of sensors. Distributed deconvolution is particularly relevant to subsurface exploration by sensor networks or swarms of mobile robots. We envision such an exploration methodology by multiple mobile agents for future explorations of a planet's subsurface. The proposed scheme consists of two steps: distributed estimation of the seismic wavelet, followed by a local estimation of the reflectivity. Both steps are realized using alternating direction method of multipliers (ADMM) algorithms, where we exploit sparsity in the reflectivity. The performance of the scheme is compared to state-of-the-art sparse multichannel blind deconvolution of seismic data and is found to be comparable or even superior.

14:15

Topology-aware joint graph filter and edge weight identification for network processes
Alberto Natali, Mario Coutino, Geert Leus
Abstract

Data defined over a network have been successfully modelled by means of graph filters. However, although in many scenarios the connectivity of the network is known, e.g., smart grids, social networks, etc., the lack of well-defined interaction weights hinders the ability to model the observed networked data using graph filters. Therefore, in this paper, we focus on the joint identification of the coefficients and graph weights defining the graph filter that best models the observed input/output network data. While these two problems have mostly been addressed separately, we here propose an iterative method that exploits the knowledge of the support of the graph for the joint identification of graph filter coefficients and edge weights. We further show that our iterative scheme guarantees a non-increasing cost at every iteration, ensuring globally convergent behavior. Numerical experiments confirm the applicability of our proposed approach.

14:15

Secure federated averaging algorithm with differential privacy
Yiwei Li, Tsung-Hui Chang, Chong-Yung Chi
Abstract

Federated learning (FL), as a recent advance in distributed machine learning, is capable of learning a model over the network without directly accessing the clients' raw data. Nevertheless, the clients' sensitive information can still be exposed to adversaries via differential attacks on the messages exchanged between the parameter server and the clients. In this paper, we consider the widely used federated averaging (FedAvg) algorithm and propose to enhance data privacy via the differential privacy (DP) technique, which obfuscates the exchanged messages by adding proper Gaussian noise. We analytically show that the proposed secure FedAvg algorithm maintains an O(1/T) convergence rate, where T is the total number of stochastic gradient descent (SGD) steps per client. Moreover, we demonstrate how various algorithm parameters impact the communication efficiency. Experimental results are presented to examine the practical performance of the proposed algorithm.
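
The privacy mechanism itself is simple to sketch. Below is a generic clip-then-add-Gaussian-noise aggregation step in the spirit of DP-FedAvg (not the paper's exact algorithm or noise calibration), exercised on toy quadratic client objectives; the clipping bound and noise scale are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)

def dp_fedavg_round(w, client_updates, clip=1.0, sigma=0.1):
    """One aggregation round: each client clips its update to bound the
    sensitivity, adds Gaussian noise (the Gaussian mechanism), and the
    server averages the noisy updates."""
    noisy = []
    for g in client_updates:
        g = g * min(1.0, clip / np.linalg.norm(g))               # clip
        noisy.append(g + rng.normal(0, sigma * clip, g.shape))   # perturb
    return w - np.mean(noisy, axis=0)

# toy quadratic clients: f_i(w) = ||w - t_i||^2 / 2, so grad_i = w - t_i
targets = rng.normal(size=(10, 5))
w = np.zeros(5)
for _ in range(300):
    updates = [0.1 * (w - t) for t in targets]   # one local SGD step per client
    w = dp_fedavg_round(w, updates)

err = np.linalg.norm(w - targets.mean(0))
```

The iterates converge to a neighborhood of the average of the clients' optima, with the residual error set by the injected noise; averaging over more clients shrinks that noise floor.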

14:15

Adversarial attacks on hierarchical composite classifiers via convex programming
Ismail Alkhouri, George Atia
Abstract

Adversarial perturbation attacks have been shown to inflict severe damage on simple one-stage classifiers. In this paper, we examine the vulnerability of Hierarchical Composite Classifiers to such attacks. We formulate a maximin program to generate perturbations attacking these models, and obtain an approximate solution based on a convex relaxation of the proposed program. With the proposed approach, the relative loss in classification accuracy for the super-labels decreases drastically in comparison to perturbations generated for One-Stage Composite Classifiers. Additionally, we show that fooling a classifier about the 'big picture' is generally more perceptible.

14:15

Learning general transformations of data for out-of-sample extensions
Matthew Amodio, David van Dijk, Guy Wolf, Smita Krishnaswamy
Abstract

While generative models such as GANs have been successful at mapping from noise to specific distributions of data, or more generally from one distribution of data to another, they cannot isolate the transformation that is occurring and apply it to a new distribution not seen in training. Thus, they memorize the domain of the transformation and cannot generalize the transformation out of sample. To address this, we propose an NTNet that isolates the signal representing the transformation itself from the other signals representing internal distribution variation. This signal can then be removed from a new dataset distributed differently from the original one trained on. We demonstrate the effectiveness of our NTNet on more than a dozen synthetic and biomedical single-cell RNA sequencing datasets, where the NTNet is able to learn the data transformation performed by a treatment on one sample of cells and successfully apply it to another sample of cells to predict treatment outcome.

14:15

Online tensor low-rank representation for streaming data
Tong Wu
Abstract

This paper proposes a new streaming algorithm to learn low-rank structures of tensor data using the recently proposed tensor-tensor product (t-product) and tensor singular value decomposition (t-SVD) algebraic framework. It reformulates the tensor low-rank representation (TLRR) problem using the equivalent bifactor model of the tensor nuclear norm, where the tensor dictionary is updated based on the newly collected data and representations. Compared to TLRR, the proposed method processes tensor data in an online fashion and makes the memory cost independent of the data size. Experimental results on three benchmark datasets demonstrate the superior performance, efficiency and robustness of the proposed algorithm over state-of-the-art methods.