This paper considers a general low-rank signal-plus-noise model in high-dimensional settings. Specifically, we allow the noise to have a general covariance structure and the signal to be of the same magnitude as the noise. Our study focuses on the asymptotic properties of the spiked eigenvalues and eigenvectors. As applications, we propose a new criterion to estimate the number of clusters and investigate the properties of spectral clustering.
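As an illustrative sketch (not the paper's estimator), the snippet below generates data from a hypothetical rank-$r$ signal-plus-noise model with a non-identity noise covariance and inspects the leading (spiked) eigenvalues of the sample covariance matrix; all dimensions and scales are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n, r = 200, 400, 3          # dimension, sample size, signal rank (illustrative)

# Low-rank signal: n samples with a rank-r factor structure.
B = rng.normal(size=(p, r))                     # factor loadings
F = rng.normal(size=(r, n))                     # factor scores
signal = B @ F

# Noise with a general (non-identity) covariance: AR(1)-type correlation.
rho = 0.5
Sigma = rho ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
noise = np.linalg.cholesky(Sigma) @ rng.normal(size=(p, n))

X = signal + noise                              # observed data matrix
S = X @ X.T / n                                 # sample covariance
eigvals = np.linalg.eigvalsh(S)[::-1]           # eigenvalues, descending

print("leading (spiked) eigenvalues:", eigvals[:r + 2])
```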
In this paper, we introduce a causal, low-latency, low-complexity approach for binaural multichannel blind speaker separation in noisy reverberant conditions. The model, referred to as Group Communication Binaural Filter and Sum Network (GCBFSnet), predicts complex filters for filter-and-sum beamforming in the time-frequency domain. We apply Group Communication (GC), i.e., latent model variables are split into groups and processed with a shared sequence model, with the aim of reducing the complexity of a simple model containing only one convolutional and one recurrent module. With GC we are able to reduce the size of the model by up to 83 % and the complexity by up to 73 % compared to the model without GC, while mostly retaining performance. Even in the smallest configuration, GCBFSnet matches the performance of a low-complexity TasNet baseline in most metrics, despite the baseline's larger size and higher number of required operations.
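A minimal sketch of the Group Communication idea described above: the latent feature dimension is split into groups, and every group is processed by one shared (weight-tied) sequence model, so the per-step cost scales with the group size rather than the full width. The layer below is a generic PyTorch illustration, not the GCBFSnet implementation; all sizes are assumptions.

```python
import torch
import torch.nn as nn

class GroupCommunication(nn.Module):
    """Split the feature dimension into groups and run one shared GRU over each group."""
    def __init__(self, feature_dim: int, num_groups: int):
        super().__init__()
        assert feature_dim % num_groups == 0
        self.num_groups = num_groups
        self.group_dim = feature_dim // num_groups
        # One GRU shared across all groups (weight tying reduces parameters).
        self.shared_rnn = nn.GRU(self.group_dim, self.group_dim, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, feature_dim)
        b, t, f = x.shape
        g = x.view(b, t, self.num_groups, self.group_dim)                 # split into groups
        g = g.permute(0, 2, 1, 3).reshape(b * self.num_groups, t, self.group_dim)
        g, _ = self.shared_rnn(g)                                          # same weights for every group
        g = g.reshape(b, self.num_groups, t, self.group_dim).permute(0, 2, 1, 3)
        return g.reshape(b, t, f)

# Illustrative usage with assumed sizes.
layer = GroupCommunication(feature_dim=256, num_groups=8)
out = layer(torch.randn(2, 100, 256))   # (batch=2, frames=100, features=256)
```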
In this paper, we explore the concept of sequential inductive prediction intervals using theory from sequential testing. We furthermore introduce a 3-parameter PAC definition of prediction intervals that allows us, via simulation, to achieve almost sharp bounds with high probability.
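The abstract leaves the exact definition to the paper; purely as an illustrative guess at the shape a three-parameter PAC guarantee could take, with miscoverage level $\alpha$, slack $\epsilon$, and failure probability $\delta$, one might write
\[
\mathbb{P}_{S}\Bigl( \mathbb{P}_{(X,Y)}\bigl( Y \in \Gamma_S(X) \bigr) \ge 1 - \alpha - \epsilon \Bigr) \ge 1 - \delta,
\]
where $\Gamma_S$ is the prediction interval fitted on the training sample $S$; the paper's actual definition may differ.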
In this paper, we first introduce and define several new information divergences on the space of transition matrices of finite Markov chains, which measure the discrepancy between two Markov chains. These divergences offer natural generalizations of classical information-theoretic divergences, such as the $f$-divergences and the R\'enyi divergence between probability measures, to the context of finite Markov chains. We begin by detailing and deriving fundamental properties of these divergences, and notably give a Markov chain version of Pinsker's inequality and of the Chernoff information. We then utilize these notions in a few applications. First, we investigate the binary hypothesis testing problem for Markov chains, where the newly defined R\'enyi divergence between Markov chains and its geometric interpretation play an important role in the analysis. Second, we propose and analyze information-theoretic (Ces\`aro) mixing times and ergodicity coefficients, along with spectral bounds on these notions in the reversible setting. Examples of the random walk on the hypercube, as well as connections between the critical height of the low-temperature Metropolis-Hastings chain and the proposed ergodicity coefficients, are highlighted.
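The definitions themselves are given in the paper; as a rough indication of the flavour of such a generalization, an $f$-divergence between two transition matrices $P$ and $Q$ on a finite state space $\mathcal{X}$ can be formed by weighting a row-wise $f$-divergence with a reference distribution $\pi$ (an assumed, illustrative form):
\[
D_f(P \,\|\, Q) \;=\; \sum_{x \in \mathcal{X}} \pi(x) \sum_{y \in \mathcal{X}} Q(x,y)\, f\!\left(\frac{P(x,y)}{Q(x,y)}\right),
\]
so that $f(t)=t\log t$ yields a Kullback--Leibler-type divergence between the chains, and the classical $f$-divergence between probability measures is recovered when $P$ and $Q$ have identical rows.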
As a surrogate for computationally intensive meso-scale simulation of woven composites, this article presents Recurrent Neural Network (RNN) models. Transfer learning is leveraged to address the initialization challenges and sparse-data issues inherent in cyclic shear strain loads. A mean-field model generates a comprehensive data set representing elasto-plastic behavior. In simulations, arbitrary six-dimensional strain histories are used to predict stresses, with random-walk loading as the source task and cyclic loading conditions as the target task. Incorporating sub-scale properties enhances the versatility of the RNN. To achieve accurate predictions, a grid search is used to tune the network architecture and hyper-parameter configuration. The results of this study demonstrate that transfer learning can effectively adapt the RNN to varying strain conditions, establishing its potential as a useful tool for modeling path-dependent responses in woven composites.
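A minimal sketch of the transfer-learning recipe described above, assuming a simple GRU regressor mapping 6-D strain histories to 6-D stress histories; the model, data shapes, and training schedule are illustrative assumptions, not the article's actual architecture.

```python
import torch
import torch.nn as nn

class StressRNN(nn.Module):
    """GRU regressor: 6-D strain history -> 6-D stress history (illustrative)."""
    def __init__(self, hidden: int = 64):
        super().__init__()
        self.rnn = nn.GRU(6, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 6)

    def forward(self, strain):                 # strain: (batch, time, 6)
        h, _ = self.rnn(strain)
        return self.head(h)                    # stress: (batch, time, 6)

def train(model, strains, stresses, epochs, lr):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(strains), stresses)
        loss.backward()
        opt.step()

model = StressRNN()
# Source task: plentiful random-walk strain histories (synthetic placeholders here).
train(model, torch.randn(64, 50, 6), torch.randn(64, 50, 6), epochs=50, lr=1e-3)
# Target task: fine-tune on scarce cyclic-loading data with a smaller learning rate.
train(model, torch.randn(8, 50, 6), torch.randn(8, 50, 6), epochs=20, lr=1e-4)
```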
The field of 'explainable' artificial intelligence (XAI) has produced highly cited methods that seek to make the decisions of complex machine learning (ML) methods 'understandable' to humans, for example by attributing 'importance' scores to input features. Yet, a lack of formal underpinning leaves it unclear what conclusions can safely be drawn from the results of a given XAI method, and has so far hindered the theoretical verification and empirical validation of XAI methods. This means that challenging non-linear problems, typically solved by deep neural networks, presently lack appropriate remedies. Here, we craft benchmark datasets for three different non-linear classification scenarios, in which the important class-conditional features are known by design, serving as ground-truth explanations. Using novel quantitative metrics, we benchmark the explanation performance of a wide set of XAI methods across three deep-learning model architectures. We show that popular XAI methods are often unable to significantly outperform random performance baselines and edge-detection methods. Moreover, we demonstrate that explanations derived from different model architectures can be vastly different and are thus prone to misinterpretation even under controlled conditions.
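To make the benchmarking idea concrete, here is one simple way to score an attribution map against ground-truth important features and compare it with a random baseline; this top-$k$ precision metric and the synthetic data are illustrative assumptions, not the metrics used in the paper.

```python
import numpy as np

def topk_precision(attribution, truth_mask, k):
    """Fraction of the k highest-attributed features that are truly class-relevant."""
    top_idx = np.argsort(np.abs(attribution))[::-1][:k]
    return truth_mask[top_idx].mean()

rng = np.random.default_rng(0)
d = 64 * 64                                      # e.g. a flattened image
truth_mask = np.zeros(d)
truth_mask[:200] = 1                             # ground-truth important features (by design)

xai_scores = rng.normal(size=d) + 2.0 * truth_mask   # hypothetical XAI attribution
random_scores = rng.normal(size=d)                   # random performance baseline

k = int(truth_mask.sum())
print("XAI    top-k precision:", topk_precision(xai_scores, truth_mask, k))
print("random top-k precision:", topk_precision(random_scores, truth_mask, k))
```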
To understand high-precision observations of exoplanets and brown dwarfs, we need detailed and complex general circulation models (GCMs) that incorporate hydrodynamics, chemistry, and radiation. For this study, we specifically examined the coupling between chemistry and radiation in GCMs and compared different methods for mixing the opacities of different chemical species under the correlated-k assumption, when equilibrium chemistry cannot be assumed. We propose a fast machine learning method based on DeepSets (DS), which effectively combines individual correlated-k opacities (k-tables). We evaluated the DS method alongside other published methods such as adaptive equivalent extinction (AEE) and random overlap with rebinning and resorting (RORR). We integrated these mixing methods into our GCM (expeRT/MITgcm) and assessed their accuracy and performance for the example of the hot Jupiter HD~209458 b. Our findings indicate that the DS method is both accurate and efficient for GCM usage, whereas RORR is too slow. Additionally, we observed that the accuracy of AEE depends on its specific implementation and may introduce numerical issues in achieving convergence of the radiative transfer solution. We then applied the DS mixing method in a simplified chemical disequilibrium situation, where we modeled the rainout of TiO and VO, and confirmed that their rainout would hinder the formation of a stratosphere. To further expedite the development of consistent disequilibrium chemistry calculations in GCMs, we provide documentation and code for coupling the DS mixing method with correlated-k radiative transfer solvers. The DS method has been extensively tested and is accurate enough for GCMs; however, other methods might be needed for accelerating atmospheric retrievals.
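The DeepSets structure referred to above is a permutation-invariant network of the form $\rho(\sum_i \phi(k_i))$: each species' k-table is encoded independently, the encodings are summed, and the pooled vector is decoded into a mixed opacity. The PyTorch sketch below illustrates this structure only; it is not the expeRT/MITgcm code, and the layer sizes and the use of 16 g-points per bin are assumptions.

```python
import torch
import torch.nn as nn

class DeepSetsOpacityMixer(nn.Module):
    """Permutation-invariant mixing of per-species k-tables: rho( sum_i phi(k_i) )."""
    def __init__(self, n_gauss: int = 16, hidden: int = 64):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(n_gauss, hidden), nn.ReLU(),
                                 nn.Linear(hidden, hidden))
        self.rho = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                 nn.Linear(hidden, n_gauss))

    def forward(self, k_tables):
        # k_tables: (batch, n_species, n_gauss) -- per-species (log-)opacities.
        encoded = self.phi(k_tables)            # encode each species independently
        pooled = encoded.sum(dim=1)             # sum pooling -> order-independent
        return self.rho(pooled)                 # mixed k-table for the grid cell

mixer = DeepSetsOpacityMixer()
mixed = mixer(torch.randn(8, 5, 16))            # 8 cells, 5 species, 16 g-points
```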
Sample selection models represent a common methodology for correcting bias induced by data that are missing not at random. It is well known that these models are not empirically identifiable without exclusion restrictions, i.e., without variables that are predictive of missingness but do not affect the outcome model of interest. The drive to establish this requirement often leads to the inclusion of irrelevant variables in the model. A recent proposal uses the adaptive LASSO to circumvent this problem, but its performance depends on the so-called covariance assumption, which can be violated in small to moderate samples. Additionally, there are as yet no tools for post-selection inference in this model. To address these challenges, we propose two families of spike-and-slab priors to conduct Bayesian variable selection in sample selection models. These prior structures allow for constructing a Gibbs sampler with tractable conditionals, which is scalable to dimensions of practical interest. We illustrate the performance of the proposed methodology through a simulation study and present a comparison against the adaptive LASSO and stepwise selection. We also provide two applications using publicly available real data. An implementation and code to reproduce the results in this paper can be found at //github.com/adam-iqbal/selection-spike-slab
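For readers unfamiliar with spike-and-slab priors, a standard continuous spike-and-slab structure (shown here as general background, not as the paper's exact prior families) places on each regression coefficient $\beta_j$
\[
\beta_j \mid \gamma_j \;\sim\; (1-\gamma_j)\,\mathcal{N}(0, v_0) \;+\; \gamma_j\,\mathcal{N}(0, v_1),
\qquad \gamma_j \sim \mathrm{Bernoulli}(\theta), \qquad v_0 \ll v_1,
\]
where $\gamma_j$ is a binary inclusion indicator; keeping both spike and slab components normal preserves conjugacy of the full conditionals, which is what makes a Gibbs sampler with tractable conditionals possible.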
We discuss probabilistic neural networks with a fixed internal representation as models for machine understanding. Here, understanding is intended as mapping data to an already existing representation which encodes an {\em a priori} organisation of the feature space. We derive the internal representation by requiring that it satisfies the principles of maximal relevance and of maximal ignorance about how different features are combined. We show that, when hidden units are binary variables, these two principles identify a unique model -- the Hierarchical Feature Model (HFM) -- which is fully solvable and admits a natural interpretation in terms of features. We argue that learning machines with this architecture enjoy a number of interesting properties, such as the continuity of the representation with respect to changes in parameters and data, the possibility of controlling the level of compression, and the ability to support functions that go beyond generalisation. We explore the behaviour of the model with extensive numerical experiments and argue that models where the internal representation is fixed reproduce a learning modality which is qualitatively different from that of traditional models such as Restricted Boltzmann Machines.
Among commonly used non-destructive techniques, Ground Penetrating Radar (GPR) is one of the most widely adopted today for assessing pavement conditions in France. However, conventional radar systems and their forward processing methods have shown their limitations for the physical and geometrical characterization of very thin layers such as tack coats. By contrast, Machine Learning methods applied to GPR with an inverse approach have shown that it is numerically possible to identify the tack coat characteristics despite masking effects due to the low time-frequency resolution observed in the raw B-scans. In this paper, we therefore apply the inverse approach based on Machine Learning, already validated in previous works on numerical data, to two experimental cases with different pavement structures. The first case is a validation on known pavement structures at the Gustave Eiffel University (Nantes, France), using its pavement fatigue carousel, and the second case focuses on a new real road in the Vend{\'e}e department (France). In both case studies, the performance of the SVM/SVR methods demonstrates the efficiency of supervised learning for classifying and estimating the emulsion proportioning in the tack coats.
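As a rough indication of how SVM classification and SVR regression are typically paired for this kind of task, the scikit-learn sketch below classifies a dosage level and estimates the emulsion rate from hypothetical GPR-derived features; the feature set, dosage threshold, and units are illustrative assumptions, not the study's actual data or settings.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC, SVR

rng = np.random.default_rng(0)
# Hypothetical features extracted from GPR B-scans (e.g. echo amplitudes, time gates)
# and hypothetical tack-coat emulsion rates; real inputs would come from measurements.
X = rng.normal(size=(120, 10))
emulsion_rate = rng.uniform(0.1, 0.6, size=120)          # kg/m^2, illustrative
emulsion_class = (emulsion_rate > 0.35).astype(int)       # e.g. low vs. high dosage

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10.0))
clf.fit(X, emulsion_class)                 # SVM: classify the dosage level

reg = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.01))
reg.fit(X, emulsion_rate)                  # SVR: estimate the proportioning itself

print(clf.predict(X[:3]), reg.predict(X[:3]))
```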
In large-scale systems there are fundamental challenges when centralised techniques are used for task allocation. The number of interactions is limited by resource constraints on computation, storage, and network communication. We can increase scalability by implementing the system as a distributed task-allocation system, sharing tasks across many agents. However, this also increases the resource cost of communication and synchronisation, and is difficult to scale. In this paper we present four algorithms to address these problems. In combination, these algorithms enable each agent to improve its task-allocation strategy through reinforcement learning, while adjusting how much it explores the system in response to how optimal it believes its current strategy to be, given its past experience. We focus on distributed agent systems where the agents' behaviours are constrained by resource-usage limits, restricting agents to local rather than system-wide knowledge. We evaluate these algorithms in a simulated environment in which agents are given a task composed of multiple subtasks that must be allocated to other agents with differing capabilities, which then carry out those tasks. We also simulate real-life system effects such as networking instability. Our solution is shown to solve the task allocation problem to within 6.7% of the theoretical optimum for the system configurations considered. It provides 5x better performance recovery than approaches without knowledge retention when system connectivity is impacted, and is tested on systems of up to 100 agents with less than a 9% impact on the algorithms' performance.
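A minimal sketch of the core idea of coupling reinforcement learning with confidence-driven exploration: an epsilon-greedy agent that allocates subtasks to peers and explores more while its value estimates are still changing, then settles as they stabilise. This is an illustrative toy, not the paper's four algorithms, and the peer rewards are synthetic assumptions.

```python
import random

class AdaptiveExplorationAgent:
    """Epsilon-greedy allocation agent whose exploration rate shrinks as it
    becomes more confident that its current strategy is close to optimal."""
    def __init__(self, n_peers: int, eps_min: float = 0.05, eps_max: float = 0.5):
        self.q = [0.0] * n_peers          # estimated value of allocating to each peer
        self.counts = [0] * n_peers
        self.eps_min, self.eps_max = eps_min, eps_max
        self.recent_gain = 1.0            # how much the estimates changed recently

    def epsilon(self) -> float:
        # Explore more while estimates are still moving, less once they settle.
        return self.eps_min + (self.eps_max - self.eps_min) * min(1.0, self.recent_gain)

    def choose_peer(self) -> int:
        if random.random() < self.epsilon():
            return random.randrange(len(self.q))                    # explore
        return max(range(len(self.q)), key=self.q.__getitem__)      # exploit

    def update(self, peer: int, reward: float):
        self.counts[peer] += 1
        delta = reward - self.q[peer]
        self.q[peer] += delta / self.counts[peer]
        self.recent_gain = 0.9 * self.recent_gain + 0.1 * abs(delta)

# Toy run: peers with different (unknown) capabilities yield different rewards.
agent = AdaptiveExplorationAgent(n_peers=5)
for _ in range(200):
    p = agent.choose_peer()
    agent.update(p, reward=random.gauss(0.5 + 0.1 * p, 0.1))
```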