The guesswork of a classical-quantum channel quantifies the cost incurred in guessing the state transmitted by the channel when only one state can be queried at a time, maximized over any classical pre-processing and minimized over any quantum post-processing. For arbitrary-dimensional covariant classical-quantum channels, we prove the invariance of the optimal pre-processing and the covariance of the optimal post-processing. In the qubit case, we compute the optimal guesswork for the class of so-called highly symmetric informationally complete classical-quantum channels.
Bayesian linear mixed-effects models and Bayesian ANOVA are increasingly being used in the cognitive sciences to perform null hypothesis tests, where a null hypothesis that an effect is zero is compared with an alternative hypothesis that the effect exists and is different from zero. While software tools for Bayes factor null hypothesis tests are easily accessible, how to specify the data and the model correctly is often not clear. In Bayesian approaches, many authors use data aggregation at the by-subject level and estimate Bayes factors on aggregated data. Here, we use simulation-based calibration for model inference applied to several example experimental designs to demonstrate that, as with frequentist analysis, such null hypothesis tests on aggregated data can be problematic in Bayesian analysis. Specifically, when random slope variances differ (i.e., violated sphericity assumption), Bayes factors are too conservative for contrasts where the variance is small and they are too liberal for contrasts where the variance is large. Running Bayesian ANOVA on aggregated data can - if the sphericity assumption is violated - likewise lead to biased Bayes factor results. Moreover, Bayes factors for by-subject aggregated data are biased (too liberal) when random item slope variance is present but ignored in the analysis. These problems can be circumvented or reduced by running Bayesian linear mixed-effects models on non-aggregated data such as on individual trials, and by explicitly modeling the full random effects structure. Reproducible code is available from \url{//osf.io/mjf47/}.
We consider discrete best approximation problems formulated and solved in the framework of tropical algebra that deals with semirings and semifields with idempotent addition. Given a set of samples each consisting of input and output of an unknown function defined on an idempotent semifield, the problems are to find a best approximation of the function by tropical Puiseux polynomial and rational functions. A new solution approach is proposed which involves the reduction of the problem of polynomial approximation to best approximate solution of a tropical linear vector equation with an unknown vector on one side (a one-sided equation). We derive a best approximate solution to the one-sided equation end evaluate the inherent approximation error in a direct analytical form. Furthermore, we reduce the rational approximation problem to the best approximate solution of an equation with unknown vectors on both sides (a two-sided equation). A best approximate solution to the two-sided equation is obtained in numerical form by using an iterative alternating algorithm. To illustrate the technique developed, we solve example approximation problems in terms of a real semifield where addition is defined as maximum and multiplication as arithmetic addition (max-plus algebra), which correspond to the best Chebyshev approximation by piecewise linear functions.
The nonnegative rank of nonnegative matrices is an important quantity that appears in many fields, such as combinatorial optimization, communication complexity, and information theory. In this paper, we study the asymptotic growth of the nonnegative rank of a fixed nonnegative matrix under Kronecker product. This quantity is called the asymptotic nonnegative rank, which is already studied in information theory. By applying the theory of asymptotic spectra of V. Strassen (J. Reine Angew. Math. 1988), we introduce the asymptotic spectrum of nonnegative matrices and give a dual characterization of the asymptotic nonnegative rank. As the opposite of nonnegative rank, we introduce the notion of the subrank of a nonnegative matrix and show that it is exactly equal to the size of the maximum induced matching of the bipartite graph defined on the support of the matrix (therefore, independent of the value of entries). Finally, we show that two matrix parameters, namely rank and fractional cover number, belong to the asymptotic spectrum of nonnegative matrices.
The work of Kalman and Bucy has established a duality between filtering and optimal estimation in the context of time-continuous linear systems. This duality has recently been extended to time-continuous nonlinear systems in terms of an optimization problem constrained by a backward stochastic partial differential equation. Here we revisit this problem from the perspective of appropriate forward-backward stochastic differential equations. This approach sheds new light on the estimation problem and provides a unifying perspective. It is also demonstrated that certain formulations of the estimation problem lead to deterministic formulations similar to the linear Gaussian case as originally investigated by Kalman and Bucy. Finally, optimal control of partially observed diffusion processes is discussed as an application of the proposed estimators.
Predictive maintenance plays a critical role in ensuring the uninterrupted operation of industrial systems and mitigating the potential risks associated with system failures. This study focuses on sensor-based condition monitoring and explores the application of deep learning techniques using a hydraulic system testbed dataset. Our investigation involves comparing the performance of three models: a baseline model employing conventional methods, a single CNN model with early sensor fusion, and a two-lane CNN model (2L-CNN) with late sensor fusion. The baseline model achieves an impressive test error rate of 1% by employing late sensor fusion, where feature extraction is performed individually for each sensor. However, the CNN model encounters challenges due to the diverse sensor characteristics, resulting in an error rate of 20.5%. To further investigate this issue, we conduct separate training for each sensor and observe variations in accuracy. Additionally, we evaluate the performance of the 2L-CNN model, which demonstrates significant improvement by reducing the error rate by 33% when considering the combination of the least and most optimal sensors. This study underscores the importance of effectively addressing the complexities posed by multi-sensor systems in sensor-based condition monitoring.
We study the online graph exploration problem proposed by Kalyanasundaram and Pruhs (1994) and prove a constant competitive ratio on minor-free graphs. This result encompasses and significantly extends the graph classes that were previously known to admit a constant competitive ratio. The main ingredient of our proof is that we find a connection between the performance of the particular exploration algorithm Blocking and the existence of light spanners. Conversely, we exploit this connection to construct light spanners of bounded genus graphs. In particular, we achieve a lightness that improves on the best known upper bound for genus g>0 and recovers the known tight bound for the planar case (g=0).
We investigate the properties of random feature ridge regression (RFRR) given by a two-layer neural network with random Gaussian initialization. We study the non-asymptotic behaviors of the RFRR with nearly orthogonal deterministic unit-length input data vectors in the overparameterized regime, where the width of the first layer is much larger than the sample size. Our analysis shows high-probability non-asymptotic concentration results for the training errors, cross-validations, and generalization errors of RFRR centered around their respective values for a kernel ridge regression (KRR). This KRR is derived from an expected kernel generated by a nonlinear random feature map. We then approximate the performance of the KRR by a polynomial kernel matrix obtained from the Hermite polynomial expansion of the activation function, whose degree only depends on the orthogonality among different data points. This polynomial kernel determines the asymptotic behavior of the RFRR and the KRR. Our results hold for a wide variety of activation functions and input data sets that exhibit nearly orthogonal properties. Based on these approximations, we obtain a lower bound for the generalization error of the RFRR for a nonlinear student-teacher model.
Sliced inverse regression (SIR), which includes linear discriminant analysis (LDA) as a special case, is a popular and powerful dimension reduction tool. In this article, we extend SIR to address the challenges of decentralized data, prioritizing privacy and communication efficiency. Our approach, named as federated sliced inverse regression (FSIR), facilitates collaborative estimation of the sufficient dimension reduction subspace among multiple clients, solely sharing local estimates to protect sensitive datasets from exposure. To guard against potential adversary attacks, FSIR further employs diverse perturbation strategies, including a novel vectorized Gaussian mechanism that guarantees differential privacy at a low cost of statistical accuracy. Additionally, FSIR naturally incorporates a collaborative variable screening step, enabling effective handling of high-dimensional client data. Theoretical properties of FSIR are established for both low-dimensional and high-dimensional settings, supported by extensive numerical experiments and real data analysis.
Robustness to adversarial attacks is typically evaluated with adversarial accuracy. While essential, this metric does not capture all aspects of robustness and in particular leaves out the question of how many perturbations can be found for each point. In this work, we introduce an alternative approach, adversarial sparsity, which quantifies how difficult it is to find a successful perturbation given both an input point and a constraint on the direction of the perturbation. We show that sparsity provides valuable insight into neural networks in multiple ways: for instance, it illustrates important differences between current state-of-the-art robust models them that accuracy analysis does not, and suggests approaches for improving their robustness. When applying broken defenses effective against weak attacks but not strong ones, sparsity can discriminate between the totally ineffective and the partially effective defenses. Finally, with sparsity we can measure increases in robustness that do not affect accuracy: we show for example that data augmentation can by itself increase adversarial robustness, without using adversarial training.
Neural network approaches to approximate the ground state of quantum hamiltonians require the numerical solution of a highly nonlinear optimization problem. We introduce a statistical learning approach that makes the optimization trivial by using kernel methods. Our scheme is an approximate realization of the power method, where supervised learning is used to learn the next step of the power iteration. We show that the ground state properties of arbitrary gapped quantum hamiltonians can be reached with polynomial resources under the assumption that the supervised learning is efficient. Using kernel ridge regression, we provide numerical evidence that the learning assumption is verified by applying our scheme to find the ground states of several prototypical interacting many-body quantum systems, both in one and two dimensions, showing the flexibility of our approach.