Recently, there has been a growing interest for mixed-categorical meta-models based on Gaussian process (GP) surrogates. In this setting, several existing approaches use different strategies either by using continuous kernels (e.g., continuous relaxation and Gower distance based GP) or by using a direct estimation of the correlation matrix. In this paper, we present a kernel-based approach that extends continuous exponential kernels to handle mixed-categorical variables. The proposed kernel leads to a new GP surrogate that generalizes both the continuous relaxation and the Gower distance based GP models. We demonstrate, on both analytical and engineering problems, that our proposed GP model gives a higher likelihood and a smaller residual error than the other kernel-based state-of-the-art models. Our method is available in the open-source software SMT.
We explore the power of the unbounded Fan-Out gate and the Global Tunable gates generated by Ising-type Hamiltonians in constructing constant-depth quantum circuits, with particular attention to quantum memory devices. We propose two types of constant-depth constructions for implementing Uniformly Controlled Gates. These gates include the Fan-In gates defined by $|x\rangle|b\rangle\mapsto |x\rangle|b\oplus f(x)\rangle$ for $x\in\{0,1\}^n$ and $b\in\{0,1\}$, where $f$ is a Boolean function. The first of our constructions is based on computing the one-hot encoding of the control register $|x\rangle$, while the second is based on Boolean analysis and exploits different representations of $f$ such as its Fourier expansion. Via these constructions, we obtain constant-depth circuits for the quantum counterparts of read-only and read-write memory devices -- Quantum Random Access Memory (QRAM) and Quantum Random Access Gate (QRAG) -- of memory size $n$. The implementation based on one-hot encoding requires either $O(n\log{n}\log\log{n})$ ancillae and $O(n\log{n})$ Fan-Out gates or $O(n\log{n})$ ancillae and $6$ Global Tunable gates. On the other hand, the implementation based on Boolean analysis requires only $2$ Global Tunable gates at the expense of $O(n^2)$ ancillae.
We propose a dynamic sensor selection approach for deep neural networks (DNNs), which is able to derive an optimal sensor subset selection for each specific input sample instead of a fixed selection for the entire dataset. This dynamic selection is jointly learned with the task model in an end-to-end way, using the Gumbel-Softmax trick to allow the discrete decisions to be learned through standard backpropagation. We then show how we can use this dynamic selection to increase the lifetime of a wireless sensor network (WSN) by imposing constraints on how often each node is allowed to transmit. We further improve performance by including a dynamic spatial filter that makes the task-DNN more robust against the fact that it now needs to be able to handle a multitude of possible node subsets. Finally, we explain how the selection of the optimal channels can be distributed across the different nodes in a WSN. We validate this method on a use case in the context of body-sensor networks, where we use real electroencephalography (EEG) sensor data to emulate an EEG sensor network. We analyze the resulting trade-offs between transmission load and task accuracy.
We propose an end-to-end Automatic Speech Recognition (ASR) system that can be trained on transcribed speech data, text-only data, or a mixture of both. The proposed model uses an integrated auxiliary block for text-based training. This block combines a non-autoregressive multi-speaker text-to-mel-spectrogram generator with a GAN-based enhancer to improve the spectrogram quality. The proposed system can generate a mel-spectrogram dynamically during training. It can be used to adapt the ASR model to a new domain by using text-only data from this domain. We demonstrate that the proposed training method significantly improves ASR accuracy compared to the system trained on transcribed speech only. It also surpasses cascade TTS systems with the vocoder in the adaptation quality and training speed.
Latitude on the choice of initialisation is a shared feature between one-step extended state-space and multi-step methods. The paper focuses on lattice Boltzmann schemes, which can be interpreted as examples of both previous categories of numerical schemes. We propose a modified equation analysis of the initialisation schemes for lattice Boltzmann methods, determined by the choice of initial data. These modified equations provide guidelines to devise and analyze the initialisation in terms of order of consistency with respect to the target Cauchy problem and time smoothness of the numerical solution. In detail, the larger the number of matched terms between modified equations for initialisation and bulk methods, the smoother the obtained numerical solution. This is particularly manifest for numerical dissipation. Starting from the constraints to achieve time smoothness, which can quickly become prohibitive for they have to take the parasitic modes into consideration, we explain how the distinct lack of observability for certain lattice Boltzmann schemes -- seen as dynamical systems on a commutative ring -- can yield rather simple conditions and be easily studied as far as their initialisation is concerned. This comes from the reduced number of initialisation schemes at the fully discrete level. These theoretical results are successfully assessed on several lattice Boltzmann methods.
Discrete latent space models have recently achieved performance on par with their continuous counterparts in deep variational inference. While they still face various implementation challenges, these models offer the opportunity for a better interpretation of latent spaces, as well as a more direct representation of naturally discrete phenomena. Most recent approaches propose to train separately very high-dimensional prior models on the discrete latent data which is a challenging task on its own. In this paper, we introduce a latent data model where the discrete state is a Markov chain, which allows fast end-to-end training. The performance of our generative model is assessed on a building management dataset and on the publicly available Electricity Transformer Dataset.
Confidence intervals based on the central limit theorem (CLT) are a cornerstone of classical statistics. Despite being only asymptotically valid, they are ubiquitous because they permit statistical inference under very weak assumptions, and can often be applied to problems even when nonasymptotic inference is impossible. This paper introduces time-uniform analogues of such asymptotic confidence intervals. To elaborate, our methods take the form of confidence sequences (CS) -- sequences of confidence intervals that are uniformly valid over time. CSs provide valid inference at arbitrary stopping times, incurring no penalties for "peeking" at the data, unlike classical confidence intervals which require the sample size to be fixed in advance. Existing CSs in the literature are nonasymptotic, and hence do not enjoy the aforementioned broad applicability of asymptotic confidence intervals. Our work bridges the gap by giving a definition for "asymptotic CSs", and deriving a universal asymptotic CS that requires only weak CLT-like assumptions. While the CLT approximates the distribution of a sample average by that of a Gaussian at a fixed sample size, we use strong invariance principles (stemming from the seminal 1960s work of Strassen and improvements by Koml\'os, Major, and Tusn\'ady) to uniformly approximate the entire sample average process by an implicit Gaussian process. As an illustration of our theory, we derive asymptotic CSs for the average treatment effect using efficient estimators in observational studies (for which no nonasymptotic bounds can exist even in the fixed-time regime) as well as randomized experiments, enabling causal inference that can be continuously monitored and adaptively stopped.
Bayesian additive regression trees (BART) is a non-parametric method to approximate functions. It is a black-box method based on the sum of many trees where priors are used to regularize inference, mainly by restricting trees' learning capacity so that no individual tree is able to explain the data, but rather the sum of trees. We discuss BART in the context of probabilistic programming languages (PPL), i.e., we present BART as a primitive that can be used as a component of a probabilistic model rather than as a standalone model. Specifically, we introduce the Python library PyMC-BART, which works by extending PyMC, a library for probabilistic programming. We showcase a few examples of models that can be built using PyMC-BART, discuss recommendations for the selection of hyperparameters, and finally, we close with limitations of our implementation and future directions for improvement.
Matrices are built and designed by applying procedures from lower order matrices. Matrix tensor products, direct sums or multiplication of matrices are such procedures and a matrix built from these is said to be a {\em separable} matrix. A {\em non-separable} matrix is a matrix which is not separable and is often referred to as {\em an entangled matrix}. The matrices built may retain properties of the lower order matrices or may also acquire new desired properties not inherent in the constituents. Here design methods for non-separable matrices of required types are derived. These can retain properties of lower order matrices or have new desirable properties. Infinite series of required non-separable matrices are constructible by the general methods. Non-separable matrices are required for applications and other uses; they can capture the structure in a unique way and thus perform much better than separable matrices. General new methods are developed with which to construct {\em multidimensional entangled paraunitary matrices}; these have applications for wavelet and filter bank design. The constructions are in addition used to design new systems of non-separable unitary matrices; these have applications in quantum information theory. Some consequences include the design of full diversity constellations of unitary matrices, which are used in MIMO systems, and methods to design infinite series of special types of Hadamard matrices.
We describe a new dependent-rounding algorithmic framework for bipartite graphs. Given a fractional assignment $y$ of values to edges of graph $G = (U \cup V, E)$, the algorithms return an integral solution $Y$ such that each right-node $v \in V$ has at most one neighboring edge $f$ with $Y_f = 1$, and where the variables $Y_e$ also satisfy broad nonpositive-correlation properties. In particular, for any edges $e_1, e_2$ sharing a left-node $u \in U$, the variables $Y_{e_1}, Y_{e_2}$ have strong negative-correlation properties, i.e. the expectation of $Y_{e_1} Y_{e_2}$ is significantly below $y_{e_1} y_{e_2}$. This algorithm is a refinement of a dependent-rounding algorithm of Im \& Shadloo (2020) based on simulation of Poisson processes. Our algorithm allows greater flexibility, in particular, it allows ``irregular'' fractional assignments, and it gives more refined bounds on the negative correlation. Dependent rounding schemes with negative correlation properties have been used for approximation algorithms for job-scheduling on unrelated machines to minimize weighted completion times (Bansal, Srinivasan, & Svensson (2021), Im & Shadloo (2020), Im & Li (2023)). Using our new dependent-rounding algorithm, among other improvements, we obtain a $1.407$-approximation for this problem. This significantly improves over the prior $1.45$-approximation ratio of Im & Li (2023).
In recent years, object detection has experienced impressive progress. Despite these improvements, there is still a significant gap in the performance between the detection of small and large objects. We analyze the current state-of-the-art model, Mask-RCNN, on a challenging dataset, MS COCO. We show that the overlap between small ground-truth objects and the predicted anchors is much lower than the expected IoU threshold. We conjecture this is due to two factors; (1) only a few images are containing small objects, and (2) small objects do not appear enough even within each image containing them. We thus propose to oversample those images with small objects and augment each of those images by copy-pasting small objects many times. It allows us to trade off the quality of the detector on large objects with that on small objects. We evaluate different pasting augmentation strategies, and ultimately, we achieve 9.7\% relative improvement on the instance segmentation and 7.1\% on the object detection of small objects, compared to the current state of the art method on MS COCO.