Combining information both within and between sample realizations, we propose a simple estimator for the local regularity of surfaces in the functional data framework. The independently generated surfaces are measured with error at discrete, possibly random, observation points. Non-asymptotic exponential bounds for the concentration of the regularity estimators are derived. An indicator of anisotropy is proposed, and an exponential bound on its risk is derived. Two applications are then presented. First, we consider the class of multifractional, bidimensional Brownian sheets with domain deformation, and study the nonparametric estimation of the deformation. Second, we build minimax-optimal bivariate kernel estimators for the reconstruction of the surfaces.
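As a rough illustration of the scale-ratio idea behind such regularity estimators, here is a minimal one-dimensional sketch (the function name, the pooling across realizations, and the omission of any measurement-error correction are our assumptions, not the authors' construction):

```python
import numpy as np

def local_regularity(samples, i, delta):
    """Estimate a local Hurst-type exponent H near grid index i.

    samples: (n_realizations, n_points) noisy observations on a common grid.
    Squared increments are pooled across the independent realizations, and
    two scales are compared, exploiting theta(d) ~ d**(2 * H).
    """
    theta = lambda d: np.mean((samples[:, i + d] - samples[:, i - d]) ** 2)
    return np.log(theta(2 * delta) / theta(delta)) / (2 * np.log(2))
```

For surfaces, running such an estimator separately along each coordinate direction and comparing the resulting exponents is one natural way to flag anisotropy.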
In the context of inferring a Bayesian network structure (a directed acyclic graph, DAG for short), we devise a non-reversible continuous-time Markov chain, the "Causal Zig-Zag sampler", that targets a probability distribution over classes of observationally equivalent (Markov equivalent) DAGs. The classes are represented as completed partially directed acyclic graphs (CPDAGs). The non-reversible Markov chain relies on the operators used in Chickering's Greedy Equivalence Search (GES) and is endowed with a momentum variable, which, as we show empirically, significantly improves mixing. The possible target distributions include posterior distributions based on a prior over DAGs and a Markov-equivalent likelihood. We offer an efficient implementation in which we develop new algorithms for listing, counting, uniformly sampling, and applying the possible moves of the GES operators, all of which significantly improve upon the state of the art.
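To convey what a momentum variable buys in a discrete-state non-reversible chain, here is a minimal lifted Metropolis sketch on a toy path graph (purely illustrative; the Causal Zig-Zag itself runs in continuous time on CPDAGs using GES operators):

```python
import numpy as np

def lifted_walk(log_pi, n_states, n_steps, rng=None):
    """Non-reversible 'lifted' sampler on {0, ..., n_states - 1}: a momentum
    v in {+1, -1} keeps the chain moving in one direction while proposals
    are accepted, and is flipped on rejection, suppressing the diffusive
    backtracking of reversible chains."""
    rng = rng or np.random.default_rng()
    x, v, out = 0, 1, []
    for _ in range(n_steps):
        y = x + v
        if 0 <= y < n_states and np.log(rng.random()) < log_pi(y) - log_pi(x):
            x = y      # accepted: persist in the current direction
        else:
            v = -v     # rejected or hit a boundary: reverse the momentum
        out.append(x)
    return out
```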
Results on the rational approximation of functions containing singularities are presented. We build further on the "lightning method", recently proposed by Trefethen and collaborators, which is based on exponentially clustering poles close to the singularities. Our results are obtained by augmenting the lightning approximation set with either a low-degree polynomial basis or poles clustering towards infinity, in order to obtain a robust approximation of the smooth behaviour of the function. This leads to a significant increase in both the achievable accuracy and the convergence rate of the numerical scheme. For the approximation of $x^\alpha$ on $[0,1]$, the optimal convergence rate, established by Stahl in 1993, is now achieved simply by least-squares fitting.
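The augmented least-squares fit is simple to reproduce in spirit. A minimal sketch for $x^\alpha$ on $[0,1]$ follows (the pole count, clustering rate, and polynomial degree below are illustrative choices, not the paper's tuned parameters):

```python
import numpy as np

alpha, n_poles, deg = 0.5, 30, 10
j = np.arange(n_poles)
poles = -np.exp(-2.0 * j / np.sqrt(n_poles))   # exponential clustering at x = 0
x = np.linspace(0.0, 1.0, 2000)
# Lightning basis (clustered simple poles) augmented with low-degree polynomials
A = np.hstack([1.0 / (x[:, None] - poles[None, :]),
               x[:, None] ** np.arange(deg + 1)[None, :]])
c, *_ = np.linalg.lstsq(A, x ** alpha, rcond=None)
print("max grid error:", np.abs(A @ c - x ** alpha).max())
```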
In this paper we propose polarized consensus-based dynamics in order to make consensus-based optimization (CBO) and sampling (CBS) applicable to objective functions with several global minima or distributions with many modes, respectively. To this end, we "polarize" the dynamics with a localizing kernel; the resulting model can be viewed as a bounded confidence model for opinion formation in the presence of a common objective. Whereas in the original consensus-based methods every particle is attracted to one common weighted mean, which prevents the detection of more than one minimum or mode, in our method every particle is attracted to its own weighted mean, which gives more weight to nearby particles. We prove that in the mean-field regime the polarized CBS dynamics are unbiased for Gaussian targets. We also prove that, in the zero-temperature limit and for sufficiently well-behaved strongly convex objectives, the solution of the Fokker--Planck equation converges in the Wasserstein-2 distance to a Dirac measure at the minimizer. Finally, we propose a computationally more efficient generalization which works with a predefined number of clusters and improves upon our polarized baseline method for high-dimensional optimization.
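A minimal sketch of one polarized update step is given below (the parameter names, the Gaussian localizing kernel, and the specific noise model are illustrative assumptions rather than the paper's scheme):

```python
import numpy as np

def polarized_cbo_step(X, f, beta=30.0, kappa=0.5, dt=0.1, sigma=0.3, rng=None):
    """One explicit Euler step of a polarized consensus-based scheme.

    X: (n, d) particle positions; f: callable mapping (n, d) -> (n,)
    objective values. Each particle is pulled toward its own mean, built
    from Gibbs weights exp(-beta * f) localized by a kernel of width kappa,
    so distinct particle groups can settle near distinct minima."""
    rng = rng or np.random.default_rng()
    w = np.exp(-beta * f(X))                         # Gibbs weights, (n,)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * kappa**2)) * w[None, :]    # localized weights, (n, n)
    m = (W @ X) / W.sum(axis=1, keepdims=True)       # particle-wise means, (n, d)
    drift = m - X
    return X + dt * drift + sigma * np.sqrt(dt) * drift * rng.standard_normal(X.shape)
```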
Large language models based on self-attention mechanisms have achieved astonishing performance not only on natural language itself but also on a variety of tasks of a different nature. However, the human brain may not process language according to the same principles, which has sparked a debate on the connection between brain computation and the artificial self-supervision adopted in large language models. One of the most influential hypotheses in brain computation is the predictive coding framework, which proposes to minimize prediction error by local learning. However, the role of predictive coding and the associated credit assignment in language processing remains unknown. Here, we propose a mean-field learning model within the predictive coding framework, assuming that the synaptic weight of each connection follows a spike-and-slab distribution, and that only the distribution, rather than specific weights, is trained. This meta predictive learning is successfully validated on classifying handwritten digits, where pixels are input to the network in sequence, and moreover on toy and real language corpora. Our model reveals that most of the connections become deterministic after learning, while the output connections retain a higher level of variability. The performance of the resulting network ensemble changes continuously with data load, further improving with more training data, in analogy with the emergent behavior of large language models. Our model therefore provides a starting point for investigating the connection between brain computation, next-token prediction, and general intelligence.
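For concreteness, drawing one network from a trained spike-and-slab weight distribution might look as follows (a sketch under our reading of the model; the parameterization and names are ours):

```python
import numpy as np

def sample_network(pi, m, s2, rng):
    """Draw one realization of the weights from the learned distribution:
    each connection is present with probability pi, in which case its
    weight is Gaussian N(m, s2) (the slab), and is zero otherwise (the
    spike). Training adjusts (pi, m, s2) per connection, never the
    individual weights."""
    on = rng.random(m.shape) < pi
    return on * (m + np.sqrt(s2) * rng.standard_normal(m.shape))
```

Connections where training drives pi toward 1 and s2 toward 0 behave deterministically, matching the behavior reported above.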
Evaluating environmental variables that vary stochastically is a principal task in designing better environmental management and restoration schemes. Both upper and lower estimates of these variables, such as water quality indices and flood and drought water levels, are important and should be evaluated consistently within a unified mathematical framework. We propose a novel pair of Orlicz regrets with which to consistently bound the statistics of random variables both from below and from above. Here, consistency means that the upper and lower bounds are evaluated with common coefficients and parameter values, in contrast to some of the risk measures proposed thus far. Orlicz regrets can flexibly evaluate the statistics of random variables based on their tail behavior. We exploit the explicit linkage between Orlicz regrets and divergence risk measures to better understand both. We obtain sufficient conditions for the Orlicz regrets and the divergence risk measures to be well posed, and further provide gradient-descent-type numerical algorithms to compute them. Finally, we apply the proposed mathematical framework to the statistical evaluation of 31 years of water quality data, a key environmental indicator, in a Japanese river environment.
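A related quantity that is easy to compute numerically is the Luxemburg (Orlicz) norm of a sample; the bisection below is an illustrative scheme only (the paper's regrets are computed with gradient-descent-type algorithms, and this is not the authors' formulation):

```python
import numpy as np

def orlicz_norm(x, phi, lo=1e-8, hi=1e8, tol=1e-10):
    """Luxemburg (Orlicz) norm of a sample x: the smallest k > 0 such that
    mean(phi(|x| / k)) <= 1, located by bisection. phi is a Young function,
    e.g. phi = lambda t: t**2, which recovers the sample L2 norm scale."""
    g = lambda k: np.mean(phi(np.abs(x) / k)) - 1.0   # decreasing in k
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if g(mid) > 0 else (lo, mid)
    return hi
```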
In this paper, we propose a multiphysics finite element method for a quasi-static thermo-poroelasticity model with a nonlinear convective transport term. To design stable numerical methods and reveal the multiphysical processes of deformation, diffusion, and heat transfer, we introduce three new variables and reformulate the original model as a fluid-coupled problem. We then introduce a Newton-type iterative algorithm that replaces the convective transport term with $\nabla T^{i}\cdot(\bm{K}\nabla p^{i-1})$, $\nabla T^{i-1}\cdot(\bm{K}\nabla p^{i})$ and $\nabla T^{i-1}\cdot(\bm{K}\nabla p^{i-1})$, and apply the Banach fixed point theorem to prove the convergence of the proposed method. Based on this, we propose a multiphysics finite element method with the Newton iteration, which is equivalent to a stabilized method and can effectively overcome the numerical oscillation caused by the nonlinear thermal convection term. We also prove that the fully discrete multiphysics finite element method attains optimal-order convergence. Finally, we summarize the main results of the paper.
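For orientation, the three listed terms are what the standard Newton (first-order) linearization of the bilinear convective term produces about the previous iterate (assuming the usual sign combination, which the abstract does not spell out):
\[
\nabla T^{i}\cdot(\bm{K}\nabla p^{i})
  \approx \nabla T^{i}\cdot(\bm{K}\nabla p^{i-1})
        + \nabla T^{i-1}\cdot(\bm{K}\nabla p^{i})
        - \nabla T^{i-1}\cdot(\bm{K}\nabla p^{i-1}),
\]
which is linear in the current unknowns $(T^{i}, p^{i})$.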
Acceleration of gradient-based optimization methods is an issue of significant practical and theoretical interest, particularly in machine learning applications. Most research has focused on optimization over Euclidean spaces, but given the need to optimize over spaces of probability measures in many machine learning problems, it is of interest to investigate accelerated gradient methods in this context too. To this end, we introduce a Hamiltonian-flow approach that is analogous to momentum-based approaches in Euclidean space. We demonstrate that algorithms based on this approach can achieve convergence rates of arbitrarily high order. Numerical examples illustrate our claims.
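In particle form, momentum-augmented dynamics of this flavor can be sketched as follows (an illustrative damped Hamiltonian discretization, not the paper's high-order scheme; grad_V stands for an assumed gradient of the objective functional evaluated along particles):

```python
import numpy as np

def accelerated_particle_flow(grad_V, X, n_steps=1000, dt=1e-2, gamma=1.0):
    """Heavy-ball-style particle dynamics: positions X and momenta P evolve
    under damped Hamiltonian dynamics, a measure-space analogue of
    accelerated gradient descent in Euclidean space."""
    P = np.zeros_like(X)
    for _ in range(n_steps):
        P -= dt * (grad_V(X) + gamma * P)   # kick: force plus friction
        X = X + dt * P                      # drift: move along the momentum
    return X
```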
The goal of explainable Artificial Intelligence (XAI) is to generate human-interpretable explanations, but there are no computationally precise theories of how humans interpret AI-generated explanations. This lack of theory means that validation of XAI must be done empirically, on a case-by-case basis, which prevents systematic theory-building in XAI. We propose a psychological theory of how humans draw conclusions from saliency maps, the most common form of XAI explanation, which for the first time allows for precise prediction of an explainee's inferences conditioned on the explanation. Our theory posits that, absent an explanation, humans expect the AI to make decisions similar to their own, and that they interpret an explanation by comparing it to the explanations they themselves would give. Comparison is formalized via Shepard's universal law of generalization in a similarity space, a classic theory from cognitive science. A pre-registered user study on AI image classifications with saliency map explanations demonstrates that our theory quantitatively matches participants' predictions of the AI.
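Shepard's law itself is a one-line formula; a minimal sketch (the embedding into a similarity space and the scale parameter are modeling choices, not specified here):

```python
import numpy as np

def shepard_similarity(x, y, scale=1.0):
    """Shepard's universal law of generalization: the probability that a
    response to stimulus y generalizes to stimulus x decays exponentially
    with their distance in psychological similarity space."""
    return np.exp(-np.linalg.norm(np.asarray(x) - np.asarray(y)) / scale)
```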
We derive information-theoretic generalization bounds for supervised learning algorithms based on the information contained in predictions rather than in the output of the training algorithm. These bounds improve over the existing information-theoretic bounds, are applicable to a wider range of algorithms, and overcome two key challenges: (a) they give meaningful results for deterministic algorithms and (b) they are significantly easier to estimate. We show experimentally that the proposed bounds closely follow the generalization gap in practical scenarios for deep learning.
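One reason prediction-based bounds are easier to estimate is that they only involve low-dimensional discrete quantities, such as the mutual information between predictions and a binary selection variable; a generic plug-in estimator of such a quantity is sketched below (illustrative, not the paper's exact estimator or bound):

```python
import numpy as np
from collections import Counter

def plugin_mi(preds, u):
    """Plug-in (histogram) estimate, in nats, of the mutual information
    between a sequence of discrete predictions and a binary selection
    variable u; both inputs are equal-length sequences of hashable values."""
    n = len(preds)
    pa, pb = Counter(preds), Counter(u)
    pab = Counter(zip(preds, u))
    return sum((c / n) * np.log((c / n) / ((pa[x] / n) * (pb[y] / n)))
               for (x, y), c in pab.items())
```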
Knowledge graphs (KGs) of real-world facts about entities and their relationships are useful resources for a variety of natural language processing tasks. However, because knowledge graphs are typically incomplete, it is useful to perform knowledge graph completion, or link prediction, i.e., to predict whether a relationship absent from the knowledge graph is likely to be true. This paper serves as a comprehensive survey of embedding models of entities and relationships for knowledge graph completion, summarizing up-to-date experimental results on standard benchmark datasets and pointing out potential future research directions.
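As an example of the kind of model such a survey covers, the classic TransE scoring function ranks candidate links by treating relations as translations in embedding space (a standard illustration, not this survey's contribution):

```python
import numpy as np

def transe_score(h, r, t):
    """TransE score for a triple (head, relation, tail): relations act as
    translations, so plausible triples satisfy h + r ~ t and receive
    scores near zero; implausible ones score strongly negative."""
    return -np.linalg.norm(h + r - t)
```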