In the study of sparse stochastic block models (SBMs) one often needs to analyze a distributional recursion, known as the belief propagation (BP) recursion. Uniqueness of the fixed point of this recursion implies several results about the SBM, including optimal recovery algorithms for SBM (Mossel et al. (2016)) and SBM with side information (Mossel and Xu (2016)), and a formula for SBM mutual information (Abbe et al. (2021)). The 2-community case corresponds to an Ising model, for which Yu and Polyanskiy (2022) established uniqueness for all cases. In this paper we analyze the $q$-ary Potts model, i.e., broadcasting of $q$-ary spins on a Galton-Watson tree with expected offspring degree $d$ through Potts channels with second-largest eigenvalue $\lambda$. We allow the intermediate vertices to be observed through noisy channels (side information). We prove that BP uniqueness holds with and without side information when $d\lambda^2 \ge 1 + C \max\{\lambda, q^{-1}\}\log q$ for some absolute constant $C>0$ independent of $q,\lambda,d$. For large $q$ and $\lambda = o(1/\log q)$, this is asymptotically achieving the Kesten-Stigum threshold $d\lambda^2=1$. These results imply mutual information formulas and optimal recovery algorithms for the $q$-community SBM in the corresponding ranges. For $q\ge 4$, Sly (2011); Mossel et al. (2022) showed that there exist choices of $q,\lambda,d$ below Kesten-Stigum (i.e. $d\lambda^2 < 1$) but reconstruction is possible. Somewhat surprisingly, we show that in such regimes BP uniqueness does not hold at least in the presence of weak side information. Our technical tool is a theory of $q$-ary symmetric channels, that we initiate here, generalizing the classical and widely-utilized information-theoretic characterization of BMS (binary memoryless symmetric) channels.
In this paper, we introduce several geometric characterizations for strong minima of optimization problems. Applying these results to nuclear norm minimization problems allows us to obtain new necessary and sufficient quantitative conditions for this important property. Our characterizations for strong minima are weaker than the Restricted Injectivity and Nondegenerate Source Condition, which are usually used to identify solution uniqueness of nuclear norm minimization problems. Consequently, we obtain the minimum (tight) bound on the number of measurements for (strong) exact recovery of low-rank matrices.
We propose a dynamic sensor selection approach for deep neural networks (DNNs), which is able to derive an optimal sensor subset selection for each specific input sample instead of a fixed selection for the entire dataset. This dynamic selection is jointly learned with the task model in an end-to-end way, using the Gumbel-Softmax trick to allow the discrete decisions to be learned through standard backpropagation. We then show how we can use this dynamic selection to increase the lifetime of a wireless sensor network (WSN) by imposing constraints on how often each node is allowed to transmit. We further improve performance by including a dynamic spatial filter that makes the task-DNN more robust against the fact that it now needs to be able to handle a multitude of possible node subsets. Finally, we explain how the selection of the optimal channels can be distributed across the different nodes in a WSN. We validate this method on a use case in the context of body-sensor networks, where we use real electroencephalography (EEG) sensor data to emulate an EEG sensor network. We analyze the resulting trade-offs between transmission load and task accuracy.
Living organisms need to acquire both cognitive maps for learning the structure of the world and planning mechanisms able to deal with the challenges of navigating ambiguous environments. Although significant progress has been made in each of these areas independently, the best way to integrate them is an open research question. In this paper, we propose the integration of a statistical model of cognitive map formation within an active inference agent that supports planning under uncertainty. Specifically, we examine the clone-structured cognitive graph (CSCG) model of cognitive map formation and compare a naive clone graph agent with an active inference-driven clone graph agent, in three spatial navigation scenarios. Our findings demonstrate that while both agents are effective in simple scenarios, the active inference agent is more effective when planning in challenging scenarios, in which sensory observations provide ambiguous information about location.
This research aims to develop kernel GNG, a kernelized version of the growing neural gas (GNG) algorithm, and to investigate the features of the networks generated by the kernel GNG. The GNG is an unsupervised artificial neural network that can transform a dataset into an undirected graph, thereby extracting the features of the dataset as a graph. The GNG is widely used in vector quantization, clustering, and 3D graphics. Kernel methods are often used to map a dataset to feature space, with support vector machines being the most prominent application. This paper introduces the kernel GNG approach and explores the characteristics of the networks generated by kernel GNG. Five kernels, including Gaussian, Laplacian, Cauchy, inverse multiquadric, and log kernels, are used in this study.
Uniform error estimates of a bi-fidelity method for a kinetic-fluid coupled model with random initial inputs in the fine particle regime are proved in this paper. Such a model is a system coupling the incompressible Navier-Stokes equations to the Vlasov-Fokker-Planck equations for a mixture of the flows with distinct particle sizes. The main analytic tool is the hypocoercivity analysis for the multi-phase Navier-Stokes-Vlasov-Fokker-Planck system with uncertainties, considering solutions in a perturbative setting near the global equilibrium. This allows us to obtain the error estimates in both kinetic and hydrodynamic regimes.
In this paper, we introduce a novel numerical approach for approximating the SIR model in epidemiology. Our method enhances the existing linearization procedure by incorporating a suitable relaxation term to tackle the transcendental equation of nonlinear type. Developed within the continuous framework, our relaxation method is explicit and easy to implement, relying on a sequence of linear differential equations. This approach yields accurate approximations in both discrete and analytical forms. Through rigorous analysis, we prove that, with an appropriate choice of the relaxation parameter, our numerical scheme is non-negativity-preserving and globally strongly convergent towards the true solution. These theoretical findings have not received sufficient attention in various existing SIR solvers. We also extend the applicability of our relaxation method to handle some variations of the traditional SIR model. Finally, we present numerical examples using simulated data to demonstrate the effectiveness of our proposed method.
Long-span bridges are subjected to a multitude of dynamic excitations during their lifespan. To account for their effects on the structural system, several load models are used during design to simulate the conditions the structure is likely to experience. These models are based on different simplifying assumptions and are generally guided by parameters that are stochastically identified from measurement data, making their outputs inherently uncertain. This paper presents a probabilistic physics-informed machine-learning framework based on Gaussian process regression for reconstructing dynamic forces based on measured deflections, velocities, or accelerations. The model can work with incomplete and contaminated data and offers a natural regularization approach to account for noise in the measurement system. An application of the developed framework is given by an aerodynamic analysis of the Great Belt East Bridge. The aerodynamic response is calculated numerically based on the quasi-steady model, and the underlying forces are reconstructed using sparse and noisy measurements. Results indicate a good agreement between the applied and the predicted dynamic load and can be extended to calculate global responses and the resulting internal forces. Uses of the developed framework include validation of design models and assumptions, as well as prognosis of responses to assist in damage detection and structural health monitoring.
Confidence intervals based on the central limit theorem (CLT) are a cornerstone of classical statistics. Despite being only asymptotically valid, they are ubiquitous because they permit statistical inference under very weak assumptions, and can often be applied to problems even when nonasymptotic inference is impossible. This paper introduces time-uniform analogues of such asymptotic confidence intervals. To elaborate, our methods take the form of confidence sequences (CS) -- sequences of confidence intervals that are uniformly valid over time. CSs provide valid inference at arbitrary stopping times, incurring no penalties for "peeking" at the data, unlike classical confidence intervals which require the sample size to be fixed in advance. Existing CSs in the literature are nonasymptotic, and hence do not enjoy the aforementioned broad applicability of asymptotic confidence intervals. Our work bridges the gap by giving a definition for "asymptotic CSs", and deriving a universal asymptotic CS that requires only weak CLT-like assumptions. While the CLT approximates the distribution of a sample average by that of a Gaussian at a fixed sample size, we use strong invariance principles (stemming from the seminal 1960s work of Strassen and improvements by Koml\'os, Major, and Tusn\'ady) to uniformly approximate the entire sample average process by an implicit Gaussian process. As an illustration of our theory, we derive asymptotic CSs for the average treatment effect using efficient estimators in observational studies (for which no nonasymptotic bounds can exist even in the fixed-time regime) as well as randomized experiments, enabling causal inference that can be continuously monitored and adaptively stopped.
The research in this article aims to find conditions of an algorithmic nature that are necessary and sufficient to transform any Boolean function in conjunctive normal form into a specific form that guarantees the satisfiability of this function. To find such conditions, we use the concept of a special covering of a set introduced in [13], and investigate the connection between this concept and the notion of satisfiability of Boolean functions. As shown, the problem of existence of a special covering for a set is equivalent to the Boolean satisfiability problem. Thus, an important result is the proof of the existence of necessary and sufficient conditions that make it possible to find out if there is a special covering for the set under the special decomposition. This result allows us to formulate the necessary and sufficient algorithmic conditions for Boolean satisfiability, considering the function in conjunctive normal form as a set of clauses. In parallel, as a result of the aforementioned algorithmic procedure, we obtain the values of the variables that ensure the satisfiability of this function. The terminology used related to graph theory, set theory, Boolean functions and complexity theory is consistent with the terminology in [1], [2], [3], [4]. The newly introduced terms are not found in use by other authors and do not contradict to other terms.
We hypothesize that due to the greedy nature of learning in multi-modal deep neural networks, these models tend to rely on just one modality while under-fitting the other modalities. Such behavior is counter-intuitive and hurts the models' generalization, as we observe empirically. To estimate the model's dependence on each modality, we compute the gain on the accuracy when the model has access to it in addition to another modality. We refer to this gain as the conditional utilization rate. In the experiments, we consistently observe an imbalance in conditional utilization rates between modalities, across multiple tasks and architectures. Since conditional utilization rate cannot be computed efficiently during training, we introduce a proxy for it based on the pace at which the model learns from each modality, which we refer to as the conditional learning speed. We propose an algorithm to balance the conditional learning speeds between modalities during training and demonstrate that it indeed addresses the issue of greedy learning. The proposed algorithm improves the model's generalization on three datasets: Colored MNIST, Princeton ModelNet40, and NVIDIA Dynamic Hand Gesture.