We develop a globalized Proximal Newton method for composite, possibly non-convex minimization problems in Hilbert spaces. We impose less restrictive differentiability and convexity assumptions on the composite objective functional than the existing theory. Concerning the differentiability of the smooth part of the objective function, we introduce the notion of second-order semi-smoothness and discuss why it constitutes an adequate framework for our Proximal Newton method. Nevertheless, both global convergence and local acceleration continue to hold in this setting. Finally, the convergence properties of our algorithm are illustrated by solving a model problem in function space.
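To make the proximal Newton idea concrete, here is a minimal finite-dimensional sketch of a single step for an $\ell_1$-regularized objective with a diagonal Hessian approximation; this is an illustrative toy only, not the paper's Hilbert-space method with globalization, and all names in it are hypothetical:

```python
import numpy as np

def prox_newton_step_l1(x, grad, hdiag, lam):
    """One proximal Newton step for f(x) + lam*||x||_1 with a diagonal
    Hessian approximation H = diag(hdiag) (toy sketch only).

    The subproblem  min_d  grad'd + 0.5 d'Hd + lam*||x + d||_1
    then has the closed-form solution below: a scaled soft threshold."""
    z = x - grad / hdiag
    return np.sign(z) * np.maximum(np.abs(z) - lam / hdiag, 0.0)

# toy problem: f(x) = 0.5*||x - c||^2, so grad = x - c and Hessian = I
c = np.array([3.0, -0.2, 0.05])
lam = 0.1
x = np.zeros(3)
x = prox_newton_step_l1(x, x - c, np.ones(3), lam)
print(x)  # one exact step: soft threshold of c -> [2.9, -0.1, 0.0]
```

With an exact (here identity) Hessian the step solves this toy problem in one iteration, which is the local acceleration one hopes to retain in the general setting.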
We prove complex contraction to obtain zero-free regions for the counting weighted set cover problem, in which an element can appear in an unbounded number of sets, thus obtaining fully polynomial-time approximation schemes (FPTAS) via Barvinok's algorithmic paradigm \cite{barvinok2016combinatorics}. Our approach relies on the computation tree expansion and does not require a proof of correlation decay on the real axis. Instead, we look directly in the complex plane for a region that contracts into its interior as the tree recursion proceeds from the leaves to the root. For the class of problems in the weighted set cover framework, we give a general approach for describing the contraction regions and draw a unified algorithmic conclusion. Several previous results, including counting (weighted) edge covers, counting bipartite independent sets, and counting monotone CNFs, are completely or partially covered by our main theorem. In contrast to the correlation decay method, which also depends on tree expansions but needs different potential functions for different problems, our approach is more generic in the sense that the contraction regions for different problems share a common shape in the complex plane.
In this paper, discrete linear quadratic regulator (DLQR) and iterative linear quadratic regulator (ILQR) methods based on high-order Runge-Kutta (RK) discretization are proposed for solving linear and nonlinear quadratic optimal control problems, respectively. As discovered in [W. Hager, Runge-Kutta methods in optimal control and the transformed adjoint system, Numer. Math., 87 (2000), pp. 247-282], the direct approach with RK discretization is equivalent to the indirect approach based on symplectic partitioned Runge-Kutta (SPRK) integration. We reconstruct this equivalence through the analogy between continuous and discrete dynamic programming. Based on the equivalence, we then discuss the issue that the internal-stage controls produced by the direct approach may have lower-order accuracy than the RK method used. We propose order conditions for the internal-stage controls and demonstrate that third- and fourth-order explicit RK discretizations cannot avoid this order-reduction phenomenon. To overcome the obstacle, we compute node controls instead of internal-stage controls in the DLQR and ILQR methods, and numerical examples illustrate the validity of this choice. Another advantage of our methods is their high computational efficiency, which comes from the use of the feedback technique. We also demonstrate that ILQR is essentially a quasi-Newton method with a linear convergence rate.
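The feedback technique mentioned above can be illustrated by the standard finite-horizon discrete LQR backward Riccati recursion; the sketch below is the textbook Euler-level recursion, not the paper's high-order RK-based scheme, and all variable names are assumptions:

```python
import numpy as np

def dlqr_gains(A, B, Q, R, Qf, N):
    """Backward Riccati recursion for finite-horizon discrete LQR.

    Returns feedback gains K_k so that u_k = -K_k x_k minimizes
    sum_k (x_k'Q x_k + u_k'R u_k) plus terminal cost x_N' Qf x_N."""
    P = Qf
    gains = []
    for _ in range(N):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ A - A.T @ P @ B @ K
        gains.append(K)
    return gains[::-1]  # reorder so gains[k] applies at step k

# toy double integrator with step size 0.1
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q, R, Qf = np.eye(2), np.array([[1.0]]), 10 * np.eye(2)
Ks = dlqr_gains(A, B, Q, R, Qf, N=50)

x = np.array([[1.0], [0.0]])
for K in Ks:           # closed-loop simulation via state feedback
    x = (A - B @ K) @ x
print(float(np.linalg.norm(x)))  # state is driven toward the origin
```

The efficiency advantage of feedback is visible here: once the gains are computed, each control is a single matrix-vector product rather than the solution of a fresh optimization problem.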
A high-order finite element method is proposed for solving the nonlinear convection-diffusion equation on a time-varying domain whose boundary is implicitly driven by the solution of the equation. The method is semi-implicit in the sense that the boundary is traced explicitly with a high-order surface-tracking algorithm, while the convection-diffusion equation is solved implicitly with high-order backward differentiation formulas and fictitious-domain finite element methods. Two numerical experiments on severely deforming domains show that the third- and fourth-order methods attain optimal convergence orders in the energy norm.
The asymptotic decision theory of Le Cam and Hajek was given a lucid perspective by the Ibragimov-Hasminskii theory on the convergence of the likelihood random field. Their scheme was applied to stochastic processes by Kutoyants, and today this program is called the IHK program. The scheme ensures that the asymptotic properties of an estimator follow directly from the convergence of the random field, provided that a large deviation estimate is available. The quasi-likelihood analysis (QLA) proved a polynomial-type large deviation (PLD) inequality in order to pass through this bottleneck of the program. A conclusion of the QLA is that if the quasi-likelihood random field is asymptotically quadratic and a key index reflecting the identifiability of the random field is non-degenerate, then the PLD inequality always holds and, as a result, the IHK program can run. Many studies have already taken advantage of the QLA theory, yet not a few of them still use it inefficiently. The aim of this paper is to provide a reformed and simplified version of the QLA and to improve the accessibility of the theory. As an example of the effects of the PLD-based theory, the user can obtain the asymptotic properties of the quasi-Bayesian estimator merely by verifying the non-degeneracy of the key index.
In this paper, we study the problem of learning dynamical properties of ensemble systems from their collective behaviors using statistical approaches in a reproducing kernel Hilbert space (RKHS). Specifically, we provide a framework to identify and cluster multiple ensemble systems by computing the maximum mean discrepancy (MMD) between their aggregated measurements in an RKHS, without any prior knowledge of the ensemble system dynamics. Then, leveraging a gradient flow of the newly proposed notion of aggregated Markov parameters, we present a systematic framework to recognize and identify ensemble systems through their linear approximations. Finally, we demonstrate that the proposed approaches extend to clustering multiple unknown ensembles in the RKHS using their aggregated measurements. Numerical experiments show that our approach is reliable and robust for ensembles with different types of system dynamics.
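The MMD statistic at the heart of the clustering step is a standard kernel quantity. The following is a minimal sketch of the (biased) squared-MMD estimate with a Gaussian RBF kernel between two sample sets, standing in for aggregated measurements of two ensembles; it illustrates the general statistic only, not the paper's specific pipeline:

```python
import numpy as np

def mmd2(X, Y, sigma=1.0):
    """Biased squared-MMD estimate with a Gaussian RBF kernel.

    X, Y: (n, d) and (m, d) arrays of samples from two distributions."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma ** 2))
    return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()

rng = np.random.default_rng(0)
# two "ensembles": one pair drawn from the same law, one pair shifted
same = mmd2(rng.normal(0, 1, (200, 2)), rng.normal(0, 1, (200, 2)))
diff = mmd2(rng.normal(0, 1, (200, 2)), rng.normal(3, 1, (200, 2)))
print(same, diff)  # the shifted pair yields a much larger MMD
```

Thresholding (or hierarchically clustering) such pairwise MMD values is the natural way to group measurement sets without any model of the underlying dynamics.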
The Ensemble Kalman Filter (EnKF) belongs to the class of iterative particle filtering methods and can be used for solving control-to-observable inverse problems. In this context, the EnKF is known as Ensemble Kalman Inversion (EKI). In recent years, several continuous limits in the number of iterations and particles have been performed in order to study properties of the method. In particular, a one-dimensional linear stability analysis reveals possible drawbacks in the phase space of moments provided by the continuous limits of the EKI, which are also observed in the multi-dimensional setting. In this work we address this issue by introducing a stabilization of the dynamics that leads to a method with globally asymptotically stable solutions. We illustrate the performance of the stabilized version on test inverse problems from the literature and compare it with the classical continuous-limit formulation of the method.
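For orientation, here is a minimal sketch of the basic (discrete, unstabilized) EKI update for a linear forward map; it shows the generic particle update only, not the continuous-limit dynamics or the stabilization studied in the paper, and the problem data are made up for illustration:

```python
import numpy as np

def eki_step(U, G, y, Gamma):
    """One Ensemble Kalman Inversion step for a linear forward map G.

    U: (d, J) ensemble of parameter vectors, one particle per column."""
    P = G @ U                                   # predicted observations
    um, pm = U.mean(1, keepdims=True), P.mean(1, keepdims=True)
    Cup = (U - um) @ (P - pm).T / U.shape[1]    # cross-covariance
    Cpp = (P - pm) @ (P - pm).T / U.shape[1]    # prediction covariance
    K = Cup @ np.linalg.inv(Cpp + Gamma)        # Kalman-type gain
    return U + K @ (y[:, None] - P)             # update every particle

rng = np.random.default_rng(1)
G = np.array([[1.0, 2.0], [0.5, -1.0]])         # toy linear forward map
u_true = np.array([1.0, -2.0])
y = G @ u_true                                  # noise-free data
Gamma = 1e-4 * np.eye(2)                        # small noise covariance
U = rng.normal(0, 2, (2, 50))                   # initial ensemble
for _ in range(30):
    U = eki_step(U, G, y, Gamma)
print(U.mean(axis=1))  # ensemble mean approaches u_true
```

In this well-posed toy setting the ensemble collapses onto the solution; the stability issues the paper addresses arise in the moment dynamics of the continuous limits, which this sketch does not reproduce.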
This paper is devoted to a new first-order Taylor-like formula whose remainder is strongly reduced in comparison with the usual one appearing in the classical Taylor formula. To derive the new formula, we introduce a linear combination of the first derivatives of the function, computed at $n+1$ equally spaced points between the two points where the function is to be evaluated. We then show that an optimal choice of the weights in this linear combination minimizes the corresponding remainder. Finally, we analyze the Lagrange $P_1$-interpolation error estimate as well as the trapezoidal quadrature error to assess the gain in accuracy obtained from the new Taylor-like formula.
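A quick numerical experiment illustrates the effect. Below, the single derivative $f'(a)$ in the classical first-order formula is replaced by a weighted combination of $f'$ at $n+1$ equally spaced points; trapezoidal-type weights are used as one natural illustrative choice, not necessarily the paper's optimal weights:

```python
import numpy as np

f, df = np.exp, np.exp          # test function and its derivative
a, b, n = 0.0, 0.5, 4
h = b - a

# classical first-order Taylor: f(b) ~ f(a) + h f'(a)
classical_err = abs(f(b) - (f(a) + h * df(a)))

# Taylor-like variant: f(b) ~ f(a) + h * sum_k w_k f'(x_k) with
# x_k equally spaced on [a, b] and trapezoidal-type weights
xs = np.linspace(a, b, n + 1)
w = np.full(n + 1, 1.0 / n)
w[0] = w[-1] = 1.0 / (2 * n)
improved_err = abs(f(b) - (f(a) + h * np.dot(w, df(xs))))

print(classical_err, improved_err)  # the weighted version is far smaller
```

Even this non-optimized choice of weights reduces the remainder by orders of magnitude on this example, which is the phenomenon the paper quantifies and optimizes.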
Aiming at recovering data from several concurrent node failures, linear $r$-LRC codes with locality $r$ were extended to $(r, \delta)$-LRC codes with locality $(r, \delta)$, which enable the local recovery of a failed node when more than one node fails. Optimal LRC codes are those whose parameters achieve the generalized Singleton bound with equality. In the present paper, we are interested in optimal LRC codes over small fields and, more precisely, over $\mathbb{F}_4$. We adopt an approach that investigates optimal quaternary $(r,\delta)$-LRC codes through their parity-check matrices. Our study includes determining the structural properties of optimal $(r,\delta)$-LRC codes, their constructions, and their complete classification over $\mathbb{F}_4$ by browsing all possible parameters. We emphasize that the precise structure of optimal quaternary $(r,\delta)$-LRC codes and their classification, obtained via the parity-check matrix approach, use proof techniques different from those employed recently for optimal binary and ternary $(r,\delta)$-LRC codes by Hao et al. in [IEEE Trans. Inf. Theory, 2020, 66(12): 7465-7474].
Counterfactual explanations are usually generated through heuristics that are sensitive to the search's initial conditions, and the absence of guarantees on performance and robustness hinders trustworthiness. In this paper, we take a disciplined approach to counterfactual explanations for tree ensembles: we advocate a model-based search aiming at "optimal" explanations and propose efficient mixed-integer programming approaches. We show that isolation forests can be modeled within our framework to focus the search on plausible explanations with a low outlier score. We provide comprehensive coverage of additional constraints that model important objectives, heterogeneous data types, and structural constraints on the feature space, along with resource and actionability restrictions. Our experimental analyses demonstrate that the proposed search approach requires a computational effort orders of magnitude smaller than that of previous mathematical programming algorithms. It scales to large data sets and tree ensembles, where it provides, within seconds, systematic explanations grounded on well-defined models solved to optimality.
Transformer-based models are popular for natural language processing (NLP) tasks due to their powerful capacity. As the core component, the self-attention module has aroused widespread interest. Attention-map visualization of a pre-trained model is one direct method for understanding the self-attention mechanism, and some common patterns have been observed in such visualizations. Based on these patterns, a series of efficient Transformers with corresponding sparse attention masks has been proposed. Beyond these empirical results, the universal approximability of Transformer-based models has also been established from a theoretical perspective. However, the above understanding and analysis of self-attention are based on a pre-trained model. To rethink importance analysis in self-attention, we study the dynamics of attention-matrix importance during pre-training. One surprising result is that the diagonal elements of the attention map are the least important compared with other attention positions, and we provide a proof that these elements can be removed without damaging model performance. Furthermore, we propose a Differentiable Attention Mask (DAM) algorithm, which can further guide the design of SparseBERT. Extensive experiments verify our interesting findings and illustrate the effect of the proposed algorithm.
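To make the diagonal-removal finding concrete, here is a minimal NumPy sketch of scaled dot-product self-attention with an additive mask that zeroes out the diagonal positions; this illustrates attention masking in general, not the DAM algorithm itself, and all names are illustrative:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V, mask=None):
    """Scaled dot-product self-attention with an optional additive mask.

    Returns the output and the attention weight matrix W."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    if mask is not None:
        scores = scores + mask        # -inf entries are zeroed by softmax
    W = softmax(scores)
    return W @ V, W

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))           # 5 tokens, hidden dimension 8
mask = np.where(np.eye(5, dtype=bool), -np.inf, 0.0)  # drop the diagonal
out, W = attention(X, X, X, mask)
# with the diagonal masked, no token attends to itself;
# each row of W still sums to 1 over the remaining positions
```

A differentiable variant would replace the hard $-\infty$ mask with a learnable, sigmoid-gated additive term so the mask itself can be trained, which is the general idea behind learning sparse attention patterns.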