The asymptotically precise estimation of the generalization of kernel methods has recently received attention due to the parallels between neural networks and their associated kernels. However, prior works derive such estimates for training by kernel ridge regression (KRR), whereas neural networks are typically trained with gradient descent (GD). In the present work, we consider the training of kernels with a family of $\textit{spectral algorithms}$ specified by profile $h(\lambda)$, and including KRR and GD as special cases. Then, we derive the generalization error as a functional of learning profile $h(\lambda)$ for two data models: high-dimensional Gaussian and low-dimensional translation-invariant model. Under power-law assumptions on the spectrum of the kernel and target, we use our framework to (i) give full loss asymptotics for both noisy and noiseless observations (ii) show that the loss localizes on certain spectral scales, giving a new perspective on the KRR saturation phenomenon (iii) conjecture, and demonstrate for the considered data models, the universality of the loss w.r.t. non-spectral details of the problem, but only in case of noisy observation.
In this paper, to address the optimization problem on a compact matrix manifold, we introduce a novel algorithmic framework called the Transformed Gradient Projection (TGP) algorithm, using the projection onto this compact matrix manifold. Compared with the existing algorithms, the key innovation in our approach lies in the utilization of a new class of search directions and various stepsizes, including the Armijo, nonmonotone Armijo, and fixed stepsizes, to guide the selection of the next iterate. Our framework offers flexibility by encompassing the classical gradient projection algorithms as special cases, and intersecting the retraction-based line-search algorithms. Notably, our focus is on the Stiefel or Grassmann manifold, revealing that many existing algorithms in the literature can be seen as specific instances within our proposed framework, and this algorithmic framework also induces several new special cases. Then, we conduct a thorough exploration of the convergence properties of these algorithms, considering various search directions and stepsizes. To achieve this, we extensively analyze the geometric properties of the projection onto compact matrix manifolds, allowing us to extend classical inequalities related to retractions from the literature. Building upon these insights, we establish the weak convergence, convergence rate, and global convergence of TGP algorithms under three distinct stepsizes. In cases where the compact matrix manifold is the Stiefel or Grassmann manifold, our convergence results either encompass or surpass those found in the literature. Finally, through a series of numerical experiments, we observe that the TGP algorithms, owing to their increased flexibility in choosing search directions, outperform classical gradient projection and retraction-based line-search algorithms in several scenarios.
This contribution introduces the idea of refinement patterns for the generation of optimal meshes in the context of the Finite Element Method. The main idea is to generate a library of possible patterns on which elements can be refined and use this library to inform an h adaptive code on how to handle complex refinements in regions of interest. There are no restrictions on the type of elements that can be refined, and the patterns can be generated for any element type. The main advantage of this approach is that it allows for the generation of optimal meshes in a systematic way where, even if a certain pattern is not available, it can easily be included through a simple text file with nodes and sub-elements. The contribution presents a detailed methodology for incorporating refinement patterns into h adaptive Finite Element Method codes and demonstrates the effectiveness of the approach through mesh refinement of problems with complex geometries.
The fine-grained complexity of computing the Fr\'echet distance has been a topic of much recent work, starting with the quadratic SETH-based conditional lower bound by Bringmann from 2014. Subsequent work established largely the same complexity lower bounds for the Fr\'echet distance in 1D. However, the imbalanced case, which was shown by Bringmann to be tight in dimensions $d\geq 2$, was still left open. Filling in this gap, we show that a faster algorithm for the Fr\'echet distance in the imbalanced case is possible: Given two 1-dimensional curves of complexity $n$ and $n^{\alpha}$ for some $\alpha \in (0,1)$, we can compute their Fr\'echet distance in $O(n^{2\alpha} \log^2 n + n \log n)$ time. This rules out a conditional lower bound of the form $O((nm)^{1-\epsilon})$ that Bringmann showed for $d \geq 2$ and any $\varepsilon>0$ in turn showing a strict separation with the setting $d=1$. At the heart of our approach lies a data structure that stores a 1-dimensional curve $P$ of complexity $n$, and supports queries with a curve $Q$ of complexity~$m$ for the continuous Fr\'echet distance between $P$ and $Q$. The data structure has size in $\mathcal{O}(n\log n)$ and uses query time in $\mathcal{O}(m^2 \log^2 n)$. Our proof uses a key lemma that is based on the concept of visiting orders and may be of independent interest. We demonstrate this by substantially simplifying the correctness proof of a clustering algorithm by Driemel, Krivo\v{s}ija and Sohler from 2015.
We analyze a bilinear optimal control problem for the Stokes--Brinkman equations: the control variable enters the state equations as a coefficient. In two- and three-dimensional Lipschitz domains, we perform a complete continuous analysis that includes the existence of solutions and first- and second-order optimality conditions. We also develop two finite element methods that differ fundamentally in whether the admissible control set is discretized or not. For each of the proposed methods, we perform a convergence analysis and derive a priori error estimates; the latter under the assumption that the domain is convex. Finally, assuming that the domain is Lipschitz, we develop an a posteriori error estimator for each discretization scheme and obtain a global reliability bound.
In cluster-randomized experiments, there is emerging interest in exploring the causal mechanism in which a cluster-level treatment affects the outcome through an intermediate outcome. Despite an extensive development of causal mediation methods in the past decade, only a few exceptions have been considered in assessing causal mediation in cluster-randomized studies, all of which depend on parametric model-based estimators. In this article, we develop the formal semiparametric efficiency theory to motivate several doubly-robust methods for addressing several mediation effect estimands corresponding to both the cluster-average and the individual-level treatment effects in cluster-randomized experiments--the natural indirect effect, natural direct effect, and spillover mediation effect. We derive the efficient influence function for each mediation effect, and carefully parameterize each efficient influence function to motivate practical strategies for operationalizing each estimator. We consider both parametric working models and data-adaptive machine learners to estimate the nuisance functions, and obtain semiparametric efficient causal mediation estimators in the latter case. Our methods are illustrated via extensive simulations and two completed cluster-randomized experiments.
Two sequential estimators are proposed for the odds p/(1-p) and log odds log(p/(1-p)) respectively, using independent Bernoulli random variables with parameter p as inputs. The estimators are unbiased, and guarantee that the variance of the estimation error divided by the true value of the odds, or the variance of the estimation error of the log odds, are less than a target value for any p in (0,1). The estimators are close to optimal in the sense of Wolfowitz's bound.
Scopus and the Web of Science have been the foundation for research in the science of science even though these traditional databases systematically underrepresent certain disciplines and world regions. In response, new inclusive databases, notably OpenAlex, have emerged. While many studies have begun using OpenAlex as a data source, few critically assess its limitations. This study, conducted in collaboration with the OpenAlex team, addresses this gap by comparing OpenAlex to Scopus across a number of dimensions. The analysis concludes that OpenAlex is a superset of Scopus and can be a reliable alternative for some analyses, particularly at the country level. Despite this, issues of metadata accuracy and completeness show that additional research is needed to fully comprehend and address OpenAlex's limitations. Doing so will be necessary to confidently use OpenAlex across a wider set of analyses, including those that are not at all possible with more constrained databases.
The starting point of this paper is a collection of properties of an algorithm that have been distilled from the informal descriptions of what an algorithm is that are given in standard works from the mathematical and computer science literature. Based on that, the notion of a proto-algorithm is introduced. The thought is that algorithms are equivalence classes of proto-algorithms under some equivalence relation. Three equivalence relations are defined. Two of them give bounds between which an appropriate equivalence relation must lie. The third lies in between these two and is likely an appropriate equivalence relation. A sound method is presented to prove, using an imperative process algebra based on ACP, that this equivalence relation holds between two proto-algorithms.
We consider a finite element method for elliptic equation with heterogeneous and possibly high-contrast coefficients based on primal hybrid formulation. A space decomposition as in FETI and BDCC allows a sequential computations of the unknowns through elliptic problems and satisfies equilibrium constraints. One of the resulting problems is non-local but with exponentially decaying solutions, enabling a practical scheme where the basis functions have an extended, but still local, support. We obtain quasi-optimal a priori error estimates for low-contrast problems assuming minimal regularity of the solutions. To also consider the high-contrast case, we propose a variant of our method, enriching the space solution via local eigenvalue problems and obtaining optimal a priori error estimate that mitigates the effect of having coefficients with different magnitudes and again assuming no regularity of the solution. The technique developed is dimensional independent and easy to extend to other problems such as elasticity.
Deep learning is usually described as an experiment-driven field under continuous criticizes of lacking theoretical foundations. This problem has been partially fixed by a large volume of literature which has so far not been well organized. This paper reviews and organizes the recent advances in deep learning theory. The literature is categorized in six groups: (1) complexity and capacity-based approaches for analyzing the generalizability of deep learning; (2) stochastic differential equations and their dynamic systems for modelling stochastic gradient descent and its variants, which characterize the optimization and generalization of deep learning, partially inspired by Bayesian inference; (3) the geometrical structures of the loss landscape that drives the trajectories of the dynamic systems; (4) the roles of over-parameterization of deep neural networks from both positive and negative perspectives; (5) theoretical foundations of several special structures in network architectures; and (6) the increasingly intensive concerns in ethics and security and their relationships with generalizability.