Stein's method for Gaussian process approximation can be used to bound the differences between the expectations of smooth functionals $h$ of a c\`adl\`ag random process $X$ of interest and the expectations of the same functionals of a well understood target random process $Z$ with continuous paths. Unfortunately, the class of smooth functionals for which this is easily possible is very restricted. Here, we prove an infinite dimensional Gaussian smoothing inequality, which enables the class of functionals to be greatly expanded -- examples are Lipschitz functionals with respect to the uniform metric, and indicators of arbitrary events -- in exchange for a loss of precision in the bounds. Our inequalities are expressed in terms of the smooth test function bound, an expectation of a functional of $X$ that is closely related to classical tightness criteria, a similar expectation for $Z$, and, for the indicator of a set $K$, the probability $\mathbb{P}(Z \in K^\theta \setminus K^{-\theta})$ that the target process is close to the boundary of $K$.
Recently, efforts have been made by social media platforms as well as researchers to detect hateful or toxic language using large language models. However, none of these works aims to use explanations, additional context, or victim community information in the detection process. We utilise different prompt variations and input information, and evaluate large language models in a zero-shot setting (without adding any in-context examples). We select three large language models (GPT-3.5, text-davinci and Flan-T5) and three datasets: HateXplain, implicit hate and ToxicSpans. We find that, on average, including the target information in the pipeline improves model performance substantially (~20-30%) over the baseline across the datasets. Adding the rationales/explanations into the pipeline also has a considerable effect (~10-20%) over the baseline across the datasets. In addition, we provide a typology of the error cases where these large language models fail to (i) classify and (ii) explain the reason for the decisions they take. Such vulnerable points automatically constitute 'jailbreak' prompts for these models, and industry-scale safeguard techniques need to be developed to make the models robust against such prompts.
Rational function approximations provide a simple but flexible alternative to polynomial approximation, allowing one to capture complex non-linearities without oscillatory artifacts. However, there have been few attempts to use rational functions on noisy data due to the likelihood of creating spurious singularities. To avoid the creation of singularities, we use Bernstein polynomials and appropriate conditions on their coefficients to force the denominator to be strictly positive. While this reduces the range of rational functions that can be expressed, it keeps all the benefits of rational functions while maintaining the robustness of polynomial approximation in noisy data scenarios. Our numerical experiments on noisy data show that existing rational approximation methods continually produce spurious poles inside the approximation domain. This contrasts with our method, which cannot create poles in the approximation domain and provides better fits than a polynomial approximation and even penalized splines on functions with multiple variables. Moreover, guaranteeing a pole-free approximation on an interval is critical for estimating non-constant coefficients when numerically solving differential equations using spectral methods. This provides a compact representation of the original differential equation, allowing numerical solvers to achieve high accuracy quickly, as seen in our experiments.
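The positivity argument can be illustrated with a minimal sketch. It assumes the approximation domain is $[0,1]$ and uses strictly positive denominator coefficients as one sufficient condition; the abstract does not state the exact conditions the authors impose, and the function names below are hypothetical. Because the Bernstein basis functions are non-negative on $[0,1]$ and sum to one, the denominator is a convex combination of its coefficients and therefore cannot fall below the smallest of them.

```python
from math import comb


def bernstein_basis(n, i, x):
    """i-th Bernstein basis polynomial of degree n, evaluated at x in [0, 1]."""
    return comb(n, i) * x**i * (1 - x) ** (n - i)


def bernstein_poly(coeffs, x):
    """Evaluate a polynomial given by its Bernstein coefficients at x."""
    n = len(coeffs) - 1
    return sum(c * bernstein_basis(n, i, x) for i, c in enumerate(coeffs))


def rational_bernstein(num_coeffs, den_coeffs, x):
    """Ratio of two Bernstein-form polynomials with a positive denominator.

    Assumption (not taken from the abstract): every denominator coefficient
    is strictly positive, which is sufficient for a pole-free ratio on [0, 1].
    """
    if min(den_coeffs) <= 0:
        raise ValueError("denominator coefficients must be strictly positive")
    return bernstein_poly(num_coeffs, x) / bernstein_poly(den_coeffs, x)


# Illustrative coefficients only; the denominator stays >= 0.2 on [0, 1].
value = rational_bernstein([0.0, 2.0, 1.0], [1.0, 0.2, 3.0], 0.5)
```

Restricting the coefficients this way excludes some rational functions whose denominators are positive on the interval but have a non-positive Bernstein coefficient, which is the loss of expressiveness the abstract mentions.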
The proximal Galerkin finite element method is a high-order, low iteration complexity, nonlinear numerical method that preserves the geometric and algebraic structure of pointwise bound constraints in infinite-dimensional function spaces. This paper introduces the proximal Galerkin method and applies it to solve free boundary problems, enforce discrete maximum principles, and develop a scalable, mesh-independent algorithm for optimal design problems with pointwise bound constraints. This paper also provides a derivation of the latent variable proximal point (LVPP) algorithm, an unconditionally stable alternative to the interior point method. LVPP is an infinite-dimensional optimization algorithm that may be viewed as having an adaptive barrier function that is updated with a new informative prior at each (outer loop) optimization iteration. One of its main benefits is witnessed when analyzing the classical obstacle problem. Therein, we find that the original variational inequality can be replaced by a sequence of partial differential equations (PDEs) that are readily discretized and solved with, e.g., high-order finite elements. Throughout this work, we arrive at several unexpected contributions that may be of independent interest. These include (1) a semilinear PDE we refer to as the entropic Poisson equation; (2) an algebraic/geometric connection between high-order positivity-preserving discretizations and certain infinite-dimensional Lie groups; and (3) a gradient-based, bound-preserving algorithm for two-field density-based topology optimization. The complete latent variable proximal Galerkin methodology combines ideas from nonlinear programming, functional analysis, tropical algebra, and differential geometry and can potentially lead to new synergies among these areas as well as within variational and numerical analysis.
A major interest in longitudinal neuroimaging studies involves investigating voxel-level neuroplasticity due to treatment and other factors across visits. However, traditional voxel-wise methods are beset with several pitfalls, which can compromise the accuracy of these approaches. We propose a novel Bayesian tensor response regression approach for longitudinal imaging data, which pools information across spatially-distributed voxels to infer significant changes while adjusting for covariates. The proposed method, which is implemented using Markov chain Monte Carlo (MCMC) sampling, utilizes low-rank decomposition to reduce dimensionality and preserve spatial configurations of voxels when estimating coefficients. It also enables feature selection via joint credible regions which respect the shape of the posterior distributions for more accurate inference. In addition to group level inferences, the method is able to infer individual-level neuroplasticity, allowing for examination of personalized disease or recovery trajectories. The advantages of the proposed approach in terms of prediction and feature selection over voxel-wise regression are highlighted via extensive simulation studies. Subsequently, we apply the approach to a longitudinal Aphasia dataset consisting of task functional MRI images from a group of subjects who were administered either a control intervention or intention treatment at baseline and were followed up over subsequent visits. Our analysis revealed that while the control therapy showed long-term increases in brain activity, the intention treatment produced predominantly short-term changes, both of which were concentrated in distinct localized regions. In contrast, the voxel-wise regression failed to detect any significant neuroplasticity after multiplicity adjustments, which is biologically implausible and implies lack of power.
The distribution-free chain ladder of Mack justified the use of the chain ladder predictor and enabled Mack to derive an estimator of conditional mean squared error of prediction for the chain ladder predictor. Classical insurance loss models, i.e. of compound Poisson type, are not consistent with Mack's distribution-free chain ladder. However, for a sequence of compound Poisson loss models indexed by exposure (e.g. number of contracts), we show that the chain ladder predictor and Mack's estimator of conditional mean squared error of prediction can be derived by considering large exposure asymptotics. Hence, quantifying chain ladder prediction uncertainty can be done with Mack's estimator without relying on the validity of the model assumptions of the distribution-free chain ladder.
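For readers unfamiliar with the chain ladder predictor, the following is a minimal sketch of the standard point prediction on a cumulative run-off triangle. The triangle values and function names are made up for illustration, and the sketch covers neither Mack's estimator of conditional mean squared error nor the large exposure asymptotics discussed above.

```python
def development_factors(triangle):
    """Chain ladder development factors from a cumulative run-off triangle.

    triangle[i][j] is the cumulative claims amount of accident period i at
    development period j; later accident periods have fewer observed entries.
    """
    n_dev = len(triangle[0])
    factors = []
    for j in range(n_dev - 1):
        # Use only accident periods observed at both j and j + 1.
        rows = [row for row in triangle if len(row) > j + 1]
        factors.append(sum(r[j + 1] for r in rows) / sum(r[j] for r in rows))
    return factors


def chain_ladder_ultimates(triangle):
    """Predict the ultimate cumulative amount for every accident period."""
    factors = development_factors(triangle)
    ultimates = []
    for row in triangle:
        value = row[-1]  # latest observed cumulative amount
        for f in factors[len(row) - 1:]:
            value *= f  # roll forward through the remaining development periods
        ultimates.append(value)
    return ultimates


# Hypothetical 3x3 triangle of cumulative amounts.
example_triangle = [
    [100.0, 150.0, 165.0],
    [110.0, 160.0],
    [120.0],
]
example_ultimates = chain_ladder_ultimates(example_triangle)
```

Each development factor is a ratio of column sums, so the predictor needs no distributional assumption to compute; the assumptions matter only when quantifying its uncertainty.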
Optimum distance flag codes (ODFCs), as special flag codes, have received a lot of attention due to their application in random network coding. In 2021, Alonso-Gonz\'{a}lez et al. constructed optimal $(n,\mathcal{A})_q$-ODFCs for $\mathcal {A}\subseteq \{1,2,\ldots,k,n-k,\ldots,n-1\}$ with $k\in \mathcal A$ and $k|n$. In this paper, we introduce a new construction of $(n,\mathcal A)_q$-ODFCs using maximum rank-metric codes. It is proved that there is an $(n,\mathcal{A})_q$-ODFC of size $\frac{q^n-q^{k+r}}{q^k-1}+1$ for any $\mathcal{A}\subseteq\{1,2,\ldots,k,n-k,\ldots,n-1\}$ with $\mathcal A\cap \{k,n-k\}\neq\emptyset$, where $r\equiv n\pmod k$ and $0\leq r<k$. Furthermore, when $k>\frac{q^r-1}{q-1}$, this $(n,\mathcal A)_q$-ODFC is optimal. In particular, when $r=0$, the result of Alonso-Gonz\'{a}lez et al. is recovered.
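As a quick arithmetic check of the stated size formula, the sketch below evaluates $\frac{q^n-q^{k+r}}{q^k-1}+1$ and the optimality condition $k>\frac{q^r-1}{q-1}$ for given parameters. The function names are hypothetical, and the code only evaluates the expressions quoted above; it does not construct any code, nor does it check that the parameters are admissible for the construction.

```python
def odfc_size(q, n, k):
    """Size (q^n - q^(k+r)) / (q^k - 1) + 1, where r = n mod k."""
    r = n % k
    numerator = q**n - q ** (k + r)
    # q^k - 1 divides q^(k+r) * (q^(n-k-r) - 1) because k divides n - k - r.
    assert numerator % (q**k - 1) == 0
    return numerator // (q**k - 1) + 1


def satisfies_optimality_condition(q, n, k):
    """Check the stated sufficient condition k > (q^r - 1) / (q - 1)."""
    r = n % k
    return k > (q**r - 1) // (q - 1)


# With r = 0 the formula reduces to (q^n - 1) / (q^k - 1).
size_divisible = odfc_size(2, 6, 3)
# A case with r = 1.
size_remainder = odfc_size(2, 7, 3)
```

For $q=2$, $n=6$, $k=3$ this gives $(64-8)/7+1=9=(2^6-1)/(2^3-1)$, matching the $r=0$ case, while $q=2$, $n=7$, $k=3$ gives $(128-16)/7+1=17$.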
Biological organisms have acquired sophisticated body shapes for walking or climbing through evolutionary processes spanning millions of years. In contrast, the components of locomoting soft robots, such as legs and arms, are designed in trial-and-error loops guided by a priori knowledge and experience, which leaves considerable room for improvement. Here, we present optimized soft robots that perform a specific locomotion task without any a priori assumptions or knowledge of the layout and shapes of the limbs, by fully exploiting the computational capabilities of topology optimization. The only requirements introduced were a design domain and a periodically acting pneumatic actuator. The freeform shape of a soft body was derived from iterative updates in a gradient-based topology optimization that incorporated complex physical phenomena, such as large deformations, contacts, material viscosity, and fluid-structure interactions, in transient problems. The locomotion tasks included a horizontal movement on flat ground (walking) and a vertical movement between two walls (climbing). Without any human intervention, the optimized soft robots have limbs and exhibit locomotion similar to those of biological organisms. Linkage-like structures were formed for the climbing task to assign different movements to multiple legs with limited degrees of freedom in the actuator. We fabricated the optimized designs using 3D printing and confirmed the performance of these robots. This study presents a new and efficient strategy for designing soft robots and other bioinspired systems, suggesting that a purely mathematical process can produce shapes reminiscent of nature's long-term evolution.
Over the last decade, approximating functions in infinite dimensions from samples has gained increasing attention in computational science and engineering, especially in computational uncertainty quantification. This is primarily due to the relevance of functions that are solutions to parametric differential equations in various fields, e.g. chemistry, economics, engineering, and physics. While acquiring accurate and reliable approximations of such functions is inherently difficult, current benchmark methods exploit the fact that such functions often belong to certain classes of holomorphic functions to obtain algebraic convergence rates in infinite dimensions with respect to the number of (potentially adaptive) samples $m$. Our work focuses on providing theoretical approximation guarantees for the class of $(\boldsymbol{b},\varepsilon)$-holomorphic functions, demonstrating that these algebraic rates are the best possible for Banach-valued functions in infinite dimensions. We establish lower bounds using a reduction to a discrete problem in combination with the theory of $m$-widths, Gelfand widths and Kolmogorov widths. We study two cases, known and unknown anisotropy, in which the relative importance of the variables is known and unknown, respectively. A key conclusion of our paper is that in the latter setting, approximation from finite samples is impossible without some inherent ordering of the variables, even if the samples are chosen adaptively. Finally, in both cases, we demonstrate near-optimal, non-adaptive (random) sampling and recovery strategies which achieve rates close to the lower bounds.
Finding the optimal design of experiments in the Bayesian setting typically requires estimation and optimization of the expected information gain functional. This functional consists of one outer and one inner integral, separated by the logarithm function applied to the inner integral. When the mathematical model of the experiment contains both uncertainty about the parameters of interest and nuisance uncertainty (i.e., uncertainty about parameters that affect the model but are not themselves of interest to the experimenter), two inner integrals must be estimated. Thus, the already considerable computational effort required to determine good approximations of the expected information gain is increased further. The Laplace approximation has been applied successfully in the context of experimental design in various ways, and we propose two novel estimators featuring the Laplace approximation to alleviate the computational burden of both inner integrals considerably. The first estimator applies Laplace's method followed by a Laplace approximation, introducing a bias. The second estimator uses two Laplace approximations as importance sampling measures for Monte Carlo approximations of the inner integrals. Both estimators use Monte Carlo approximation for the remaining outer integral estimation. We provide three numerical examples demonstrating the applicability and effectiveness of our proposed estimators.
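To make the outer/inner integral structure concrete, here is a minimal sketch of the plain nested (double-loop) Monte Carlo estimator of the expected information gain. This is the standard baseline rather than either of the Laplace-based estimators proposed above, and it has no nuisance parameters. The toy model is an assumption chosen because its expected information gain is known in closed form: $y = d\,\theta + \varepsilon$ with $\theta \sim N(0,1)$ and $\varepsilon \sim N(0,\sigma^2)$, for which the exact value is $\tfrac12\log(1 + d^2/\sigma^2)$. All function names are hypothetical.

```python
import math
import random


def log_gaussian_pdf(x, mean, std):
    """Log density of a normal distribution with the given mean and std."""
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std * math.sqrt(2 * math.pi))


def nested_mc_eig(design, noise_std, n_outer, n_inner, rng):
    """Nested Monte Carlo estimate of the expected information gain.

    Outer loop: average of log p(y | theta) - log p(y) over (theta, y) pairs.
    Inner loop: Monte Carlo estimate of the evidence p(y) with fresh prior draws.
    """
    total = 0.0
    for _ in range(n_outer):
        theta = rng.gauss(0.0, 1.0)  # prior draw
        y = design * theta + rng.gauss(0.0, noise_std)  # simulated observation
        log_lik = log_gaussian_pdf(y, design * theta, noise_std)
        evidence = sum(
            math.exp(log_gaussian_pdf(y, design * rng.gauss(0.0, 1.0), noise_std))
            for _ in range(n_inner)
        ) / n_inner
        total += log_lik - math.log(evidence)
    return total / n_outer


def exact_eig(design, noise_std):
    """Closed-form expected information gain for the toy linear Gaussian model."""
    return 0.5 * math.log(1.0 + (design / noise_std) ** 2)


estimate = nested_mc_eig(1.0, 1.0, n_outer=1000, n_inner=500, rng=random.Random(0))
reference = exact_eig(1.0, 1.0)
```

The cost is `n_outer * n_inner` likelihood evaluations, and the logarithm of the noisy inner average makes the estimator biased for finite `n_inner`. That cost is the burden the abstract refers to, and it grows further when a second inner integral over nuisance parameters is required.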
Artificial neural networks thrive in solving the classification problem for a particular rigid task, acquiring knowledge through generalized learning behaviour from a distinct training phase. The resulting network resembles a static entity of knowledge, and endeavours to extend this knowledge without targeting the original task result in catastrophic forgetting. Continual learning shifts this paradigm towards networks that can continually accumulate knowledge over different tasks without the need to retrain from scratch. We focus on task incremental classification, where tasks arrive sequentially and are delineated by clear boundaries. Our main contributions concern 1) a taxonomy and extensive overview of the state-of-the-art, 2) a novel framework to continually determine the stability-plasticity trade-off of the continual learner, and 3) a comprehensive experimental comparison of 11 state-of-the-art continual learning methods and 4 baselines. We empirically scrutinize method strengths and weaknesses on three benchmarks: Tiny Imagenet, the large-scale unbalanced iNaturalist, and a sequence of recognition datasets. We study the influence of model capacity, weight decay and dropout regularization, and the order in which the tasks are presented, and qualitatively compare methods in terms of required memory, computation time, and storage.