We propose a simple methodology to approximate functions with given asymptotic behavior by specifically constructed terms and an unconstrained deep neural network (DNN). The methodology we describe extends to various asymptotic behaviors and multiple dimensions and is easy to implement. In this work we demonstrate it for linear asymptotic behavior in one-dimensional examples. We apply it to function approximation and regression problems where we measure approximation of only function values (``Vanilla Machine Learning''-VML) or also approximation of function and derivative values (``Differential Machine Learning''-DML) on several examples. We see that enforcing given asymptotic behavior leads to better approximation and faster convergence.
We investigate the proof complexity of systems based on positive branching programs, i.e. non-deterministic branching programs (NBPs) where, for any 0-transition between two nodes, there is also a 1-transition. Positive NBPs compute monotone Boolean functions, just like negation-free circuits or formulas, but constitute a positive version of (non-uniform) NL, rather than P or NC1, respectively. The proof complexity of NBPs was investigated in previous work by Buss, Das and Knop, using extension variables to represent the dag-structure, over a language of (non-deterministic) decision trees, yielding the system eLNDT. Our system eLNDT+ is obtained by restricting their systems to a positive syntax, similarly to how the 'monotone sequent calculus' MLK is obtained from the usual sequent calculus LK by restricting to negation-free formulas. Our main result is that eLNDT+ polynomially simulates eLNDT over positive sequents. Our proof method is inspired by a similar result for MLK by Atserias, Galesi and Pudl\'ak, that was recently improved to a bona fide polynomial simulation via works of Je\v{r}\'abek and Buss, Kabanets, Kolokolova and Kouck\'y. Along the way we formalise several properties of counting functions within eLNDT+ by polynomial-size proofs and, as a case study, give explicit polynomial-size poofs of the propositional pigeonhole principle.
Gradient boosting for decision tree algorithms are increasingly used in actuarial applications as they show superior predictive performance over traditional generalized linear models. Many improvements and sophistications to the first gradient boosting machine algorithm exist. We present in a unified notation, and contrast, all the existing point and probabilistic gradient boosting for decision tree algorithms: GBM, XGBoost, DART, LightGBM, CatBoost, EGBM, PGBM, XGBoostLSS, cyclic GBM, and NGBoost. In this comprehensive numerical study, we compare their performance on five publicly available datasets for claim frequency and severity, of various size and comprising different number of (high cardinality) categorical variables. We explain how varying exposure-to-risk can be handled with boosting in frequency models. We compare the algorithms on the basis of computational efficiency, predictive performance, and model adequacy. LightGBM and XGBoostLSS win in terms of computational efficiency. The fully interpretable EGBM achieves competitive predictive performance compared to the black box algorithms considered. We find that there is no trade-off between model adequacy and predictive accuracy: both are achievable simultaneously.
Many inverse problems are ill-posed and need to be complemented by prior information that restricts the class of admissible models. Bayesian approaches encode this information as prior distributions that impose generic properties on the model such as sparsity, non-negativity or smoothness. However, in case of complex structured models such as images, graphs or three-dimensional (3D) objects,generic prior distributions tend to favor models that differ largely from those observed in the real world. Here we explore the use of diffusion models as priors that are combined with experimental data within a Bayesian framework. We use 3D point clouds to represent 3D objects such as household items or biomolecular complexes formed from proteins and nucleic acids. We train diffusion models that generate coarse-grained 3D structures at a medium resolution and integrate these with incomplete and noisy experimental data. To demonstrate the power of our approach, we focus on the reconstruction of biomolecular assemblies from cryo-electron microscopy (cryo-EM) images, which is an important inverse problem in structural biology. We find that posterior sampling with diffusion model priors allows for 3D reconstruction from very sparse, low-resolution and partial observations.
We establish a general convergence theory of the Rayleigh--Ritz method and the refined Rayleigh--Ritz method for computing some simple eigenpair $(\lambda_{*},x_{*})$ of a given analytic regular nonlinear eigenvalue problem (NEP). In terms of the deviation $\varepsilon$ of $x_{*}$ from a given subspace $\mathcal{W}$, we establish a priori convergence results on the Ritz value, the Ritz vector and the refined Ritz vector. The results show that, as $\varepsilon\rightarrow 0$, there exists a Ritz value that unconditionally converges to $\lambda_*$ and the corresponding refined Ritz vector does so too but the Ritz vector converges conditionally and it may fail to converge and even may not be unique. We also present an error bound for the approximate eigenvector in terms of the computable residual norm of a given approximate eigenpair, and give lower and upper bounds for the error of the refined Ritz vector and the Ritz vector as well as for that of the corresponding residual norms. These results nontrivially extend some convergence results on these two methods for the linear eigenvalue problem to the NEP. Examples are constructed to illustrate the main results.
Preconditioned eigenvalue solvers offer the possibility to incorporate preconditioners for the solution of large-scale eigenvalue problems, as they arise from the discretization of partial differential equations. The convergence analysis of such methods is intricate. Even for the relatively simple preconditioned inverse iteration (PINVIT), which targets the smallest eigenvalue of a symmetric positive definite matrix, the celebrated analysis by Neymeyr is highly nontrivial and only yields convergence if the starting vector is fairly close to the desired eigenvector. In this work, we prove a new non-asymptotic convergence result for a variant of PINVIT. Our proof proceeds by analyzing an equivalent Riemannian steepest descent method and leveraging convexity-like properties. We show a convergence rate that nearly matches the one of PINVIT. As a major benefit, we require a condition on the starting vector that tends to be less stringent. This improved global convergence property is demonstrated for two classes of preconditioners with theoretical bounds and a range of numerical experiments.
We present a novel class of projected gradient (PG) methods for minimizing a smooth but not necessarily convex function over a convex compact set. We first provide a novel analysis of the "vanilla" PG method, achieving the best-known iteration complexity for finding an approximate stationary point of the problem. We then develop an "auto-conditioned" projected gradient (AC-PG) variant that achieves the same iteration complexity without requiring the input of the Lipschitz constant of the gradient or any line search procedure. The key idea is to estimate the Lipschitz constant using first-order information gathered from the previous iterations, and to show that the error caused by underestimating the Lipschitz constant can be properly controlled. We then generalize the PG methods to the stochastic setting, by proposing a stochastic projected gradient (SPG) method and a variance-reduced stochastic gradient (VR-SPG) method, achieving new complexity bounds in different oracle settings. We also present auto-conditioned stepsize policies for both stochastic PG methods and establish comparable convergence guarantees.
The notion of a non-deterministic logical matrix (where connectives are interpreted as multi-functions) extends the traditional semantics for propositional logics based on logical matrices (where connectives are interpreted as functions). This extension allows for finitely characterizing a much wider class of logics, and has proven decisive in a myriad of recent compositionality results. In this paper we show that the added expressivity brought by non-determinism also has its drawbacks, and in particular that the problem of determining whether two given finite non-deterministic matrices are equivalent, in the sense that they induce the same logic, becomes undecidable. We also discuss some workable sufficient conditions and particular cases, namely regarding rexpansion homomorphisms and bridges to calculi.
Not accounting for competing events in survival analysis can lead to biased estimates, as individuals who die from other causes do not have the opportunity to develop the event of interest. Formal definitions and considerations for causal effects in the presence of competing risks have been published, but not for the mediation analysis setting. We propose, for the first time, an approach based on the path-specific effects framework to account for competing risks in longitudinal mediation analysis with time-to-event outcomes. We do so by considering the pathway through the competing event as another mediator, which is nested within our longitudinal mediator of interest. We provide a theoretical formulation and related definitions of the effects of interest based on the mediational g-formula, as well as a detailed description of the algorithm. We also present an application of our algorithm to data from the Strong Heart Study, a prospective cohort of American Indian adults. In this application, we evaluated the mediating role of the blood pressure trajectory (measured during three visits) on the association between arsenic and cadmium, in separate models, with time to cardiovascular disease, accounting for competing risks by death. Identifying the effects through different paths enables us to evaluate the impact of metals on the outcome of interest, as well as through competing risks, more transparently.
This paper addresses the problem of adaptively controlling the bias parameter in nonlinear opinion dynamics (NOD) to allocate agents into groups of arbitrary sizes for the purpose of maximizing collective rewards. In previous work, an algorithm based on the coupling of NOD with an multi-objective behavior optimization was successfully deployed as part of a multi-robot system in an autonomous task allocation field experiment. Motivated by the field results, in this paper we propose and analyze a new task allocation model that synthesizes NOD with an evolutionary game framework. We prove sufficient conditions under which it is possible to control the opinion state in the group to a desired allocation of agents between two tasks through an adaptive bias using decentralized feedback. We then verify the theoretical results with a simulation study of a collaborative evolutionary division of labor game.
We prove, for stably computably enumerable formal systems, direct analogues of the first and second incompleteness theorems of G\"odel. A typical stably computably enumerable set is the set of Diophantine equations with no integer solutions, and in particular such sets are generally not computably enumerable. And so this gives the first extension of the second incompleteness theorem to non classically computable formal systems. Let's motivate this with a somewhat physical application. Let $\mathcal{H} $ be the suitable infinite time limit (stabilization in the sense of the paper) of the mathematical output of humanity, specializing to first order sentences in the language of arithmetic (for simplicity), and understood as a formal system. Suppose that all the relevant physical processes in the formation of $\mathcal{H} $ are Turing computable. Then as defined $\mathcal{H} $ may \emph{not} be computably enumerable, but it is stably computably enumerable. Thus, the classical G\"odel disjunction applied to $\mathcal{H} $ is meaningless, but applying our incompleteness theorems to $\mathcal{H} $ we then get a sharper version of G\"odel's disjunction: assume $\mathcal{H} \vdash PA$ then either $\mathcal{H} $ is not stably computably enumerable or $\mathcal{H} $ is not 1-consistent (in particular is not sound) or $\mathcal{H} $ cannot prove a certain true statement of arithmetic (and cannot disprove it if in addition $\mathcal{H} $ is 2-consistent).