Can graded meshes yield more accurate numerical solutions than uniform meshes? We consider a time-dependent nonlocal diffusion problem with a weakly singular kernel, discretized by a collocation method. For its steady-state counterpart, assuming a sufficiently smooth solution, we first show that standard graded meshes perform worse than uniform meshes and may even lead to divergence; instead, the optimal convergence rate arises on so-called anomalous graded meshes. Furthermore, for solutions of low regularity, the method may suffer from a severe order reduction (Chen, Qi, Shi and Wu, IMA J. Numer. Anal., 41 (2021) 3145--3174). In this case, conversely, a sharp error estimate holds on standard graded meshes, but it offers far less than first-order accuracy. For the time-dependent case, however, second-order convergence can be achieved on graded meshes. The analysis extends readily to certain multidimensional problems. Numerical results are provided that confirm the sharpness of the error estimates.
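To make the mesh terminology concrete, a graded mesh on an interval clusters points near one endpoint through a power grading. A minimal sketch (the grading exponent and interval are illustrative choices, not the specific meshes analysed in the paper):

```python
import numpy as np

def graded_mesh(n, r, a=0.0, b=1.0):
    """Graded mesh on [a, b] with n subintervals: x_i = a + (b-a)(i/n)^r.
    For r > 1 the points cluster near a; r = 1 recovers the uniform mesh."""
    i = np.arange(n + 1)
    return a + (b - a) * (i / n) ** r

uniform = graded_mesh(8, 1.0)  # uniform spacing 1/8
graded = graded_mesh(8, 2.0)   # points concentrated near the left endpoint
# the first subinterval of the graded mesh is much smaller than the uniform one
print(graded[1] - graded[0], uniform[1] - uniform[0])
```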
The paper presents a numerical method for simulating flow and mechanics in fractured rock. The governing equations that couple the effects in the rock mass and in the fractures are obtained using the discrete fracture-matrix approach. The fracture flow is driven by the cubic law, and the contact conditions prevent fractures from self-penetration. A stable finite element discretization is proposed for the displacement-pressure-flux formulation. The resulting nonlinear algebraic system of equations and inequalities is decoupled using a robust iterative splitting into the linearized flow subproblem and the quadratic programming problem for the mechanical part. The non-penetration conditions are solved by means of dualization and an optimal quadratic programming algorithm. The capability of the numerical scheme is demonstrated on a benchmark problem for tunnel excavation with hundreds of fractures in 3D. The paper's novelty consists in the combination of three crucial ingredients: (i) application of the discrete fracture-matrix approach to poroelasticity, (ii) robust iterative splitting of the resulting nonlinear algebraic system that works for real-world 3D problems, and (iii) efficient solution of its mechanical quadratic programming part, with a large number of fractures in mutual contact, by means of our own solvers implemented in an in-house software library.
We propose a game-based formulation for learning dimensionality-reducing representations of feature vectors, when only prior knowledge on future prediction tasks is available. In this game, the first player chooses a representation, and then the second player adversarially chooses a prediction task from a given class, representing the prior knowledge. The first player aims to minimize, and the second player to maximize, the regret: the minimal prediction loss using the representation, compared to the same loss using the original features. For the canonical setting in which the representation, the response to predict, and the predictors are all linear functions, and under the mean squared error loss function, we derive the theoretically optimal representation in pure strategies, which shows the effectiveness of the prior knowledge, and the optimal regret in mixed strategies, which shows the usefulness of randomizing the representation. For general representations and loss functions, we propose an efficient algorithm to optimize a randomized representation. The algorithm only requires the gradients of the loss function, and is based on incrementally adding a representation rule to a mixture of such rules.
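The min-max structure underlying the formulation can be illustrated on a finite zero-sum game, where an approximately optimal mixed strategy is computable by multiplicative weights against best responses. This is an illustrative stand-in, not the paper's algorithm; the payoff matrix (matching pennies) and step size are arbitrary choices:

```python
import numpy as np

def mixed_strategy_mw(payoff, n_iters=2000, eta=0.05):
    """Approximate the row player's optimal mixed strategy in a zero-sum
    game (row player minimizes the payoff) via multiplicative weights,
    with the column player best-responding at each round."""
    n_rows, _ = payoff.shape
    w = np.ones(n_rows)
    avg = np.zeros(n_rows)
    for _ in range(n_iters):
        p = w / w.sum()
        # column player best-responds: picks the column maximizing expected loss
        losses = payoff[:, np.argmax(p @ payoff)]
        w *= np.exp(-eta * losses)
        avg += p
    return avg / n_iters  # the time-averaged strategy approximates the optimum

# matching pennies: the optimal mixed strategy is uniform, game value 1/2
A = np.array([[1.0, 0.0], [0.0, 1.0]])
p_star = mixed_strategy_mw(A)
```

The time-averaged mixture of strategies mirrors, in spirit, the abstract's idea of building a randomized representation by incrementally adding rules to a mixture.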
Can a mere next-token predictor faithfully model human intelligence? We crystallize this intuitive concern, which is fragmented in the literature. As a starting point, we argue that the two often-conflated phases of next-token prediction -- autoregressive inference and teacher-forced training -- must be treated distinctly. The popular criticism that errors can compound during autoregressive inference crucially assumes that teacher forcing has learned an accurate next-token predictor. This assumption sidesteps a deeper-rooted problem we expose: in certain classes of tasks, teacher forcing can simply fail to learn an accurate next-token predictor in the first place. We describe a general mechanism of how teacher forcing can fail, and design a minimal planning task where both the Transformer and the Mamba architecture empirically fail in that manner -- remarkably, despite the task being straightforward to learn. We provide preliminary evidence that this failure can be resolved when training to predict multiple tokens in advance. We hope this finding can ground future debates and inspire explorations beyond the next-token prediction paradigm. We make our code available at //github.com/gregorbachmann/Next-Token-Failures
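The distinction between the two phases can be sketched independently of any architecture; the stub model below is a hypothetical stand-in for a Transformer or Mamba predictor:

```python
def teacher_forced_inputs(sequence):
    """Teacher forcing: the model predicts token t+1 from the *ground-truth*
    prefix, so every position is conditioned on correct history."""
    prefixes = [sequence[: t + 1] for t in range(len(sequence) - 1)]
    targets = sequence[1:]
    return prefixes, targets

def autoregressive_rollout(model, prompt, n_steps):
    """Autoregressive inference: each predicted token is fed back in,
    so an early error propagates into all later predictions."""
    seq = list(prompt)
    for _ in range(n_steps):
        seq.append(model(seq))
    return seq

# stub "model": continues with the last token plus one (illustrative only)
model = lambda seq: seq[-1] + 1

prefixes, targets = teacher_forced_inputs([0, 1, 2, 3])
rollout = autoregressive_rollout(model, [0], 3)
```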
This paper is devoted to the analysis of a numerical scheme based on the Finite Element Method for approximating the solution of Koiter's model for a linearly elastic elliptic membrane shell subject to the constraint of remaining confined in a prescribed half-space. First, we show that the solution of the obstacle problem under consideration is uniquely determined and satisfies a set of variational inequalities which are governed by a fourth-order elliptic operator, and which are posed over a non-empty, closed, and convex subset of a suitable space. Second, we show that this solution can be approximated by means of the penalty method. Third, we show that the solution of the corresponding penalised problem is more regular up to the boundary. Fourth, we write down the mixed variational formulation corresponding to the penalised problem, and we show that the solution of the mixed variational formulation is more regular up to the boundary as well. In view of this result concerning the augmentation of the regularity of the solution of the mixed penalised problem, we are able to approximate the solution of one such problem by means of a Finite Element scheme. Finally, we present numerical experiments corroborating the validity of the mathematical results we obtained.
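The penalty method replaces the confinement constraint $u \geq \psi$ by a term that penalises violations. A minimal one-dimensional finite-difference sketch for a membrane obstacle problem (not Koiter's shell model; the load, obstacle, and penalty parameter are illustrative):

```python
import numpy as np

def penalized_obstacle(n=200, eps=1e-6, load=-50.0, psi=-0.05):
    """Solve -u'' = f on (0,1), u(0) = u(1) = 0, with the constraint
    u >= psi enforced by the penalty term (1/eps)(psi - u)_+, via an
    active-set (semismooth Newton) iteration."""
    h = 1.0 / n
    # interior finite-difference Laplacian (dense, for clarity)
    main = 2.0 / h**2 * np.ones(n - 1)
    off = -1.0 / h**2 * np.ones(n - 2)
    A = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)
    f = load * np.ones(n - 1)
    u = np.zeros(n - 1)
    for _ in range(50):
        active = (psi - u) > 0            # nodes where the penalty is active
        P = np.diag(active / eps)
        u_new = np.linalg.solve(A + P, f + P @ (psi * np.ones(n - 1)))
        if np.allclose(u_new, u):
            break
        u = u_new
    return u

u = penalized_obstacle()
# the membrane is pushed onto the obstacle; penetration is O(eps)
```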
Dimensionality reduction on quadratic manifolds augments linear approximations with quadratic correction terms. Previous works rely on linear approximations given by projections onto the first few leading principal components of the training data; however, linear approximations in subspaces spanned by the leading principal components alone can miss information that is necessary for the quadratic correction terms to be efficient. In this work, we propose a greedy method that constructs subspaces from leading as well as later principal components so that the corresponding linear approximations can be corrected most efficiently with quadratic terms. Properties of the greedily constructed manifolds allow linear algebra reformulations with which the greedy method scales to data points with millions of dimensions. Numerical experiments demonstrate that the greedily constructed quadratic manifolds achieve accuracy that is orders of magnitude higher than that of manifolds based on the leading principal components alone.
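Given a subspace basis $V$, the quadratic correction operator can be fit by least squares on the residual of the linear approximation. A sketch assuming $V$ is already chosen (it does not implement the paper's greedy subspace selection):

```python
import numpy as np

def fit_quadratic_correction(X, V):
    """Fit W minimizing ||X - V Z - W Q||_F, where Z = V^T X are the
    reduced coordinates and Q stacks the products z_i z_j per sample
    (columns of X are data points, columns of V are orthonormal)."""
    Z = V.T @ X
    Q = np.einsum("ik,jk->ijk", Z, Z).reshape(Z.shape[0] ** 2, -1)
    R = X - V @ Z                     # residual of the linear approximation
    return R @ np.linalg.pinv(Q)      # least-squares quadratic correction

def reconstruct(x, V, W):
    z = V.T @ x
    return V @ z + W @ np.kron(z, z)  # linear part plus quadratic correction

rng = np.random.default_rng(0)
# synthetic data lying exactly on a quadratic manifold: x = V z + W_true q(z)
z = rng.standard_normal((2, 100))
V = np.linalg.qr(rng.standard_normal((5, 2)))[0]
# choose the true correction orthogonal to range(V), so V^T x recovers z
W_true = (np.eye(5) - V @ V.T) @ rng.standard_normal((5, 4))
X = V @ z + W_true @ np.einsum("ik,jk->ijk", z, z).reshape(4, -1)

W = fit_quadratic_correction(X, V)
err = np.linalg.norm(X[:, 0] - reconstruct(X[:, 0], V, W))
```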
First-order methods are often analyzed via their continuous-time models, where their worst-case convergence properties are usually established via Lyapunov functions. In this work, we provide a systematic and principled approach to finding and verifying Lyapunov functions for classes of ordinary and stochastic differential equations. More precisely, we extend the performance estimation framework, originally proposed by Drori and Teboulle [10], to continuous-time models. We retrieve convergence results comparable to those of discrete methods using fewer assumptions and convexity inequalities, and provide new results for stochastic accelerated gradient flows.
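As a concrete instance, for the gradient flow $\dot{x} = -\nabla f(x)$ with convex $f$, the classical Lyapunov function $V(t) = t\,(f(x(t)) - f_\star) + \tfrac{1}{2}\|x(t) - x_\star\|^2$ is nonincreasing, which yields the rate $f(x(t)) - f_\star \leq \|x_0 - x_\star\|^2/(2t)$. A numerical check on an illustrative quadratic (not one of the paper's certified functions):

```python
import numpy as np

# convex test function f(x) = 0.5 x^T A x with minimizer x* = 0, f* = 0
A = np.array([[2.0, 0.0], [0.0, 0.5]])
f = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x

# integrate the gradient flow x' = -grad f(x) with small explicit Euler steps
dt, T = 1e-4, 5.0
x0 = np.array([1.0, -2.0])
x, t = x0.copy(), 0.0
V_prev, monotone = np.inf, True
while t < T:
    V = t * f(x) + 0.5 * x @ x   # Lyapunov function along the trajectory
    monotone &= V <= V_prev + 1e-8
    V_prev = V
    x = x - dt * grad(x)
    t += dt
# monotonicity of V implies f(x(T)) - f* <= ||x0 - x*||^2 / (2T)
```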
In decision-making, maxitive functions are used for worst-case and best-case evaluations. Maxitivity gives rise to a rich structure that is well-studied in the context of the pointwise order. In this article, we investigate maxitivity with respect to general preorders and provide a representation theorem for such functionals. The results are illustrated for different stochastic orders in the literature, including the usual stochastic order, the increasing convex/concave order, and the dispersive order.
Optimal estimation and inference for both the minimizer and minimum of a convex regression function under the white noise and nonparametric regression models are studied in a nonasymptotic local minimax framework, where the performance of a procedure is evaluated at individual functions. Fully adaptive and computationally efficient algorithms are proposed and sharp minimax lower bounds are given for both the estimation accuracy and expected length of confidence intervals for the minimizer and minimum. The nonasymptotic local minimax framework brings out new phenomena in simultaneous estimation and inference for the minimizer and minimum. We establish a novel uncertainty principle that provides a fundamental limit on how well the minimizer and minimum can be estimated simultaneously for any convex regression function. A similar result holds for the expected length of the confidence intervals for the minimizer and minimum.
This article is concerned with multilevel Monte Carlo (MLMC) methods for approximating expectations of functions of the solution to the Heston 3/2-model from mathematical finance, which takes values in $(0, \infty)$ and possesses superlinearly growing drift and diffusion coefficients. To discretize the SDE model, a new Milstein-type scheme is proposed to produce independent sample paths. The proposed scheme can be explicitly solved and is positivity-preserving unconditionally, i.e., for any time step-size $h>0$. This positivity-preserving property for large discretization time steps is particularly desirable in the MLMC setting. Furthermore, a mean-square convergence rate of order one is proved in the non-globally Lipschitz regime, which is not trivial, as the diffusion coefficient grows super-linearly. The obtained order-one convergence in turn promises the desired relevant variance of the multilevel estimator and justifies the optimal complexity $\mathcal{O}(\epsilon^{-2})$ for the MLMC approach, where $\epsilon > 0$ is the required target accuracy. Finally, numerical experiments are reported to confirm the theoretical findings.
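The MLMC mechanism itself, a telescoping sum over discretization levels with coupled fine and coarse paths, can be sketched for geometric Brownian motion with an Euler-Maruyama scheme. This is illustrative only; it is not the paper's positivity-preserving Milstein-type scheme for the 3/2-model:

```python
import numpy as np

def mlmc_estimate(levels, n_samples, s0=1.0, mu=0.05, sigma=0.2, T=1.0, rng=None):
    """Multilevel Monte Carlo estimate of E[S_T] for GBM dS = mu S dt + sigma S dW,
    with Euler-Maruyama using 2**l steps on level l. Fine and coarse paths on
    each level share the same Brownian increments, so their difference has
    small variance and few samples are needed on fine (expensive) levels."""
    rng = rng or np.random.default_rng(0)
    total = 0.0
    for l in range(levels + 1):
        nf = 2 ** l
        dt = T / nf
        M = n_samples[l]
        dW = rng.normal(0.0, np.sqrt(dt), size=(M, nf))
        fine = np.full(M, s0)
        for k in range(nf):
            fine = fine * (1 + mu * dt + sigma * dW[:, k])
        if l == 0:
            total += fine.mean()                # base level: plain Monte Carlo
        else:
            coarse = np.full(M, s0)
            dWc = dW[:, 0::2] + dW[:, 1::2]     # aggregated increments, coarse path
            for k in range(nf // 2):
                coarse = coarse * (1 + mu * 2 * dt + sigma * dWc[:, k])
            total += (fine - coarse).mean()     # level-l correction term
    # telescoping sum: E[P_L] = E[P_0] + sum_l E[P_l - P_{l-1}]
    return total

est = mlmc_estimate(levels=5, n_samples=[200000, 100000, 50000, 20000, 10000, 5000])
# reference value: E[S_T] = s0 * exp(mu * T)
```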
In many application settings, the data have missing entries, which makes analysis challenging. An abundant literature addresses missing values in an inferential framework: estimating parameters and their variance from incomplete tables. Here, we consider supervised-learning settings: predicting a target when missing values appear in both training and testing data. We show the consistency of two approaches in prediction. A striking result is that the widely used method of imputing with a constant, such as the mean, prior to learning is consistent when missing values are not informative. This contrasts with inferential settings, where mean imputation is criticized for distorting the distribution of the data. That such a simple approach can be consistent is important in practice. We also show that a predictor suited for complete observations can predict optimally on incomplete data, through multiple imputation. Finally, to compare imputation with directly learning a model that accounts for missing values, we further analyze decision trees. These can naturally tackle empirical risk minimization with missing values, due to their ability to handle the half-discrete nature of incomplete variables. After comparing theoretically and empirically different missing-values strategies in trees, we recommend using the "missing incorporated in attribute" method, as it can handle both non-informative and informative missing values.
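A minimal illustration of mean imputation prior to learning under non-informative (completely-at-random) missingness, on synthetic data rather than the paper's experiments:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
X = rng.standard_normal((n, 2))
y = 3.0 * X[:, 0] + 2.0 * X[:, 1]   # noiseless linear target, for clarity

# missing completely at random (non-informative) in the first feature
mask = rng.random(n) < 0.3
X_obs = X.copy()
X_obs[mask, 0] = np.nan

# impute with the mean of the observed entries, then fit by least squares
col_mean = np.nanmean(X_obs[:, 0])
X_imp = X_obs.copy()
X_imp[mask, 0] = col_mean
Z = np.column_stack([X_imp, np.ones(n)])
coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
pred = Z @ coef
# under MCAR, the fit on imputed data recovers the underlying coefficients
```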