In this paper, a local convergence analysis of a multi-step seventh-order method for solving nonlinear equations is presented, assuming that the first-order Fr\'echet derivative belongs to the Lipschitz class. The significance of our work is that it avoids the standard practice of Taylor expansion and thereby extends the applicability of the scheme by using a technique based on the first-order derivative only. This study also provides radii of the balls of convergence and error bounds in terms of distances, in addition to the uniqueness of the solution. Furthermore, a generalization of this analysis under the H\"{o}lder continuity condition is provided, since it is more relaxed than the Lipschitz continuity condition. We consider some numerical examples and compute the radii of the convergence balls.
The present paper continues our investigation of an implementation of a least-squares collocation method for higher-index differential-algebraic equations. In earlier papers, we were able to substantiate the choice of basis functions and collocation points for a robust implementation as well as algorithms for the solution of the discrete system. The present paper is devoted to an analytic estimation of condition numbers for different components of an implementation. We present error estimations, which show the sources for the different errors.
Physical systems are usually modeled by differential equations, but solving these differential equations analytically is often intractable. Instead, the differential equations can be solved numerically by discretization in a finite computational domain. The discretized equation is reduced to a large linear system, whose solution is typically found using an iterative solver. We start with an initial guess, $x_0$, and iterate the algorithm to obtain a sequence of solution vectors, $x_m$. The iterative algorithm is said to converge to a solution $x$ if and only if $x_m$ converges to $x$. Accuracy of the numerical solutions is important, especially in the design of safety-critical systems such as airplanes, cars, or nuclear power plants. It is therefore important to formally guarantee that the iterative solvers converge to the "true" solution of the original differential equation. In this paper, we first formalize the necessary and sufficient conditions for iterative convergence in the Coq proof assistant. We then extend this result to two classical iterative methods: Gauss-Seidel iteration and Jacobi iteration. We formalize conditions for the convergence of the Gauss-Seidel classical iterative method, based on positive definiteness of the iterative matrix. We then formally state conditions for convergence of Jacobi iteration and instantiate them with an example to demonstrate convergence of iterative solutions to the direct solution of the linear system. We leverage recent developments of the Coq linear algebra and mathcomp library for our formalization.
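To make the notion of a sequence $x_m$ converging to $x$ concrete, the following is a minimal sketch of Jacobi iteration in plain Python. It is not the Coq formalization described above. The function name `jacobi`, the stopping rule, and the small test matrix are assumptions chosen for illustration; the matrix is strictly diagonally dominant, a standard sufficient condition for Jacobi convergence.

```python
def jacobi(A, b, x0=None, tol=1e-10, max_iter=500):
    """Jacobi iteration for A x = b: each component is updated from the
    previous iterate only, x_new[i] = (b[i] - sum_{j != i} A[i][j] x[j]) / A[i][i]."""
    n = len(b)
    x = list(x0) if x0 is not None else [0.0] * n
    for m in range(1, max_iter + 1):
        x_new = [
            (b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
            for i in range(n)
        ]
        # Stop when successive iterates agree to within tol in the max norm.
        if max(abs(x_new[i] - x[i]) for i in range(n)) < tol:
            return x_new, m
        x = x_new
    return x, max_iter


# Hypothetical example: a strictly diagonally dominant 3x3 system
# whose exact solution x_true is known by construction.
A = [[4.0, 1.0, 0.0],
     [1.0, 5.0, 2.0],
     [0.0, 2.0, 6.0]]
x_true = [1.0, 2.0, 3.0]
b = [sum(A[i][j] * x_true[j] for j in range(3)) for i in range(3)]

x, iterations = jacobi(A, b)
print(x, iterations)
```

Here the iterates approach `x_true`, the direct solution of the linear system. A numerical run like this only illustrates convergence on one instance; it does not replace a formal proof.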
We consider a batch active learning scenario where the learner adaptively issues batches of points to a labeling oracle. Sampling labels in batches is highly desirable in practice due to the smaller number of interactive rounds with the labeling oracle (often human beings). However, batch active learning typically pays the price of reduced adaptivity, leading to suboptimal results. In this paper we propose a solution which requires a careful trade-off between the informativeness of the queried points and their diversity. We theoretically investigate batch active learning in the practically relevant scenario where the unlabeled pool of data is available beforehand (pool-based active learning). We analyze a novel stage-wise greedy algorithm and show that, as a function of the label complexity, the excess risk of this algorithm operating in the realizable setting matches the known minimax rates in standard statistical learning settings. Our results also exhibit a mild dependence on the batch size. These are the first theoretical results that employ careful trade-offs between informativeness and diversity to rigorously quantify the statistical performance of batch active learning in the pool-based scenario.
In this paper, we analyze the convergence properties of projected non-stationary block iterative methods (P-BIM) aiming to find a constrained solution to large linear, usually both noisy and ill-conditioned, systems of equations. We split the error of the $k$th iterate into noise error and iteration error, and consider each error separately. The iteration error is treated for a more general algorithm, also suited for solving split feasibility problems in Hilbert space. The results for P-BIM come out as a special case. The algorithmic step involves projecting onto closed convex sets. When these sets are polyhedral, and of finite dimension, it is shown that the algorithm converges linearly. We further derive an upper bound for the noise error of P-BIM. Based on this bound, we suggest a new strategy for choosing relaxation parameters, which assists in speeding up the reconstruction process and improving the quality of the obtained images. The relaxation parameters may depend on the noise. The performance of the suggested strategy is shown by examples taken from the field of image reconstruction from projections.
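The basic pattern of a projected iterative method (a relaxed gradient-type update followed by projection onto a closed convex set) can be sketched as follows. This is a deliberately simplified stand-in, not P-BIM itself: it uses a single block, a constant relaxation parameter, noise-free data, and the nonnegative orthant as the constraint set. All names (`projected_landweber`, `project_nonneg`, `relax`) and the tiny test system are assumptions made for illustration.

```python
def matvec(A, x):
    """Return A x for a dense matrix stored as a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def rmatvec(A, r):
    """Return A^T r."""
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(A[0]))]


def project_nonneg(x):
    """Projection onto the nonnegative orthant, a polyhedral closed convex set."""
    return [max(0.0, xi) for xi in x]


def projected_landweber(A, b, relax, n_iter, project):
    """Iterate x <- P(x + relax * A^T (b - A x)) starting from x = 0."""
    x = [0.0] * len(A[0])
    for _ in range(n_iter):
        residual = [bi - axi for bi, axi in zip(b, matvec(A, x))]
        step = rmatvec(A, residual)
        x = project([xi + relax * si for xi, si in zip(x, step)])
    return x


# Hypothetical consistent system with a nonnegative solution.
A = [[1.0, 0.5],
     [0.5, 1.0],
     [1.0, 1.0]]
x_true = [1.0, 2.0]
b = matvec(A, x_true)

# relax must lie below 2 / lambda_max(A^T A); here lambda_max = 4.25.
x = projected_landweber(A, b, relax=0.3, n_iter=2000, project=project_nonneg)
print(x)
```

In this sketch the relaxation parameter is fixed by hand. The strategy proposed in the paper instead chooses relaxation parameters from a bound on the noise error, which this example does not attempt to reproduce.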
While the class of Polynomial Nets demonstrates comparable performance to neural networks (NN), it currently has neither theoretical generalization characterization nor robustness guarantees. To this end, we derive new complexity bounds for the set of Coupled CP-Decomposition (CCP) and Nested Coupled CP-Decomposition (NCP) models of Polynomial Nets in terms of the $\ell_\infty$-operator norm and the $\ell_2$-operator norm. In addition, we derive bounds on the Lipschitz constant for both models to establish a theoretical certificate for their robustness. The theoretical results enable us to propose a principled regularization scheme that we also evaluate experimentally on six datasets and show that it improves the accuracy as well as the robustness of the models to adversarial perturbations. We showcase how this regularization can be combined with adversarial training, resulting in further improvements.
Unevenly spaced samples from a periodic function are common in signal processing and can often be viewed as a perturbed equally spaced grid. In this paper, we analyze how the uneven distribution of the samples impacts the quality of interpolation and quadrature. Starting with equally spaced nodes on $[-\pi,\pi)$ with grid spacing $h$, suppose the unevenly spaced nodes are obtained by perturbing each uniform node by an arbitrary amount $\leq \alpha h$, where $0 \leq \alpha < 1/2$ is a fixed constant. We prove a discrete version of the Kadec-1/4 theorem, which states that the nonuniform discrete Fourier transform associated with perturbed nodes has a bounded condition number independent of $h$, for any $\alpha < 1/4$. We go on to show that unevenly spaced quadrature rules converge for all continuous functions and interpolants converge uniformly for all differentiable functions whose derivative has bounded variation when $0 \leq \alpha < 1/4$. Although quadrature rules at perturbed nodes can have negative weights for any $\alpha > 0$, we provide a bound on the absolute sum of the quadrature weights. Therefore, we show that perturbed equally spaced grids with small $\alpha$ can be used without numerical woes. While our proof techniques work primarily when $0 \leq \alpha < 1/4$, we show that a small amount of oversampling extends our results to the case when $1/4 \leq \alpha < 1/2$.
We consider a homogeneous ideal $I$ in the polynomial ring $S=K[x_1,\dots,x_m]$ over a finite field $K=\mathbb{F}_q$ and the finite set of projective rational points $\mathbb{X}$ that it defines in the projective space $\mathbb{P}^{m-1}$. We concern ourselves with the problem of computing the vanishing ideal $I(\mathbb{X})$. This is usually done by adding the equations of the projective space $I(\mathbb{P}^{m-1})$ to $I$ and computing the radical. We give an alternative and more efficient way using the saturation with respect to the homogeneous maximal ideal.
We propose a projection-free conditional gradient-type algorithm for smooth stochastic multi-level composition optimization, where the objective function is a nested composition of $T$ functions and the constraint set is a closed convex set. Our algorithm assumes access to noisy evaluations of the functions and their gradients, through a stochastic first-order oracle satisfying certain standard unbiasedness and second moment assumptions. We show that the numbers of calls to the stochastic first-order oracle and the linear-minimization oracle required by the proposed algorithm to obtain an $\epsilon$-stationary solution are of order $\mathcal{O}_T(\epsilon^{-2})$ and $\mathcal{O}_T(\epsilon^{-3})$ respectively, where $\mathcal{O}_T$ hides constants in $T$. Notably, the dependences of these complexity bounds on $\epsilon$ and $T$ are separate, in the sense that changing one does not impact the dependence of the bounds on the other. Moreover, our algorithm is parameter-free and does not require any (increasing) order of mini-batches to converge, unlike the common practice in the analysis of stochastic conditional gradient-type algorithms.
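For readers unfamiliar with the term, "projection-free" means that each step calls a linear-minimization oracle over the constraint set instead of a projection. The sketch below shows this in the simplest setting: the classical deterministic conditional gradient (Frank-Wolfe) method with exact gradients on a single-level objective. It is not the stochastic multi-level algorithm proposed above. The probability simplex as constraint set, the quadratic objective, the step size $2/(k+2)$, and all function names are assumptions chosen for illustration.

```python
def lmo_simplex(grad):
    """Linear-minimization oracle over the probability simplex:
    the minimizer of <grad, s> is the vertex at the smallest gradient entry."""
    i_min = min(range(len(grad)), key=lambda i: grad[i])
    return [1.0 if i == i_min else 0.0 for i in range(len(grad))]


def frank_wolfe(grad_f, x0, n_iter):
    """Classical Frank-Wolfe: move toward the LMO vertex with step 2/(k+2).
    Every iterate is a convex combination of feasible points, so no projection is needed."""
    x = list(x0)
    for k in range(n_iter):
        s = lmo_simplex(grad_f(x))
        gamma = 2.0 / (k + 2.0)
        x = [(1.0 - gamma) * xi + gamma * si for xi, si in zip(x, s)]
    return x


# Hypothetical objective f(x) = 0.5 * ||x - c||^2 with c inside the simplex,
# so the constrained minimizer is c itself.
c = [0.2, 0.5, 0.3]


def grad_f(x):
    return [xi - ci for xi, ci in zip(x, c)]


x = frank_wolfe(grad_f, [1.0, 0.0, 0.0], 2000)
print(x)
```

The iterates stay feasible by construction and approach `c` at the usual sublinear rate of the deterministic method. The oracle complexities quoted above concern the stochastic, nested setting and are not reflected by this toy run.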
We examine global non-asymptotic convergence properties of policy gradient methods for multi-agent reinforcement learning (RL) problems in Markov potential games (MPG). To learn a Nash equilibrium of an MPG in which the size of the state space and/or the number of players can be very large, we propose new independent policy gradient algorithms that are run by all players in tandem. When there is no uncertainty in the gradient evaluation, we show that our algorithm finds an $\epsilon$-Nash equilibrium with $O(1/\epsilon^2)$ iteration complexity, which does not explicitly depend on the state space size. When the exact gradient is not available, we establish an $O(1/\epsilon^5)$ sample complexity bound in a potentially infinitely large state space for a sample-based algorithm that utilizes function approximation. Moreover, we identify a class of independent policy gradient algorithms that enjoys convergence for both zero-sum Markov games and Markov cooperative games with players that are oblivious to the types of games being played. Finally, we provide computational experiments to corroborate the merits and the effectiveness of our theoretical developments.
Methods that align distributions by minimizing an adversarial distance between them have recently achieved impressive results. However, these approaches are difficult to optimize with gradient descent and they often do not converge well without careful hyperparameter tuning and proper initialization. We investigate whether turning the adversarial min-max problem into an optimization problem by replacing the maximization part with its dual improves the quality of the resulting alignment and explore its connections to Maximum Mean Discrepancy. Our empirical results suggest that using the dual formulation for the restricted family of linear discriminators results in a more stable convergence to a desirable solution when compared with the performance of a primal min-max GAN-like objective and an MMD objective under the same restrictions. We test our hypothesis on the problem of aligning two synthetic point clouds on a plane and on a real-image domain adaptation problem on digits. In both cases, the dual formulation yields an iterative procedure that gives more stable and monotonic improvement over time.