This work is devoted to the construction of a new type of interval: functional intervals. These intervals are built on the idea of generalizing interval boundaries from numbers to functions. Functional intervals are promising for further study and use, since they have richer algebraic properties than classical intervals. In this work, a linear functional arithmetic in one variable is constructed. This arithmetic is applied to problems of interval analysis such as minimizing a function on an interval and finding the zeros of a function on an interval. Numerical experiments with linear functional arithmetic showed a high order of convergence and faster-running algorithms when intervals of the new type were used, even though the computations made no use of derivative information. The work also presents a modification of minimization algorithms for functions of several variables, based on functional intervals of several variables with rational bounds. This improved the speedup of the algorithms, but only up to a certain number of unknowns.
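To make the idea concrete, here is a minimal sketch of an interval whose bounds are affine functions of a parameter on a reference segment; the class, its operations, and the example are illustrative guesses, since the paper's exact definition of linear functional arithmetic is not reproduced here. The example also hints at the richer algebra: tracking the functional dependence avoids the overestimation of classical interval arithmetic.

```python
from dataclasses import dataclass

# Sketch of a "functional interval" whose bounds are affine functions of a
# parameter t on [0, 1]; names and operations are illustrative guesses, not
# the paper's actual definitions.
@dataclass
class AffineInterval:
    a0: float; a1: float   # lower bound  a0 + a1*t
    b0: float; b1: float   # upper bound  b0 + b1*t

    def __add__(self, other):
        # addition is bound-wise, exactly as for classical intervals
        return AffineInterval(self.a0 + other.a0, self.a1 + other.a1,
                              self.b0 + other.b0, self.b1 + other.b1)

    def scale(self, c):
        # scalar multiplication swaps the bounds when c < 0
        if c >= 0:
            return AffineInterval(c*self.a0, c*self.a1, c*self.b0, c*self.b1)
        return AffineInterval(c*self.b0, c*self.b1, c*self.a0, c*self.a1)

    def enclosure(self):
        # collapse to a classical interval: extreme bound values on [0, 1]
        return (min(self.a0, self.a0 + self.a1), max(self.b0, self.b0 + self.b1))

x = AffineInterval(0.0, 1.0, 0.0, 1.0)   # the function t, viewed as an interval
print((x + x.scale(-0.5)).enclosure())   # (0.0, 0.5): the dependence on t is
                                         # kept; classical arithmetic on
                                         # [0,1] - 0.5*[0,1] gives (-0.5, 1.0)
```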
In this paper, we consider a fully discrete approximation of an abstract evolution equation, employing a non-conforming spatial approximation and finite differences in time (Rothe-Galerkin method). The main result is the convergence of the discrete solutions to a weak solution of the continuous problem. The result can therefore be interpreted either as a justification of the numerical method or as an alternative way of constructing weak solutions. We formulate the problem in the very general and abstract setting of so-called non-conforming Bochner pseudo-monotone operators, which allows for a unified treatment of several evolution problems. Our abstract results for non-conforming Bochner pseudo-monotone operators allow us to establish (weak) convergence just by verifying a few natural assumptions on the operators at each time step and on the discretization spaces. Hence, applications and extensions to several other evolution problems can be performed easily. We exemplify the applicability of our approach with several DG schemes for the unsteady $p$-Navier-Stokes problem. The results of some numerical experiments are reported in the final section.
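The Rothe-Galerkin idea is easiest to see on a model problem. The following is a minimal sketch, assuming the heat equation $u_t = u_{xx}$ on $(0,1)$ with homogeneous Dirichlet data, a conforming P1 Galerkin space, and backward Euler in time; the paper's setting (non-conforming spaces, pseudo-monotone operators) is far more general, and all parameter choices here are illustrative.

```python
import numpy as np

# Rothe--Galerkin sketch for u_t = u_xx on (0,1), u(0)=u(1)=0:
# P1 Galerkin in space, backward Euler (finite differences) in time.
n, steps, T = 50, 100, 0.1
h, dt = 1.0 / (n + 1), T / 100
x = np.linspace(h, 1 - h, n)            # interior nodes of a uniform mesh

# P1 mass and stiffness matrices (interior degrees of freedom only)
M = (h / 6) * (4 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1))
A = (1 / h) * (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1))

u = np.sin(np.pi * x)                   # initial datum
for _ in range(steps):
    # one implicit (backward Euler) step: (M + dt*A) u^{n+1} = M u^n
    u = np.linalg.solve(M + dt * A, M @ u)

# exact solution for this datum is e^{-pi^2 T} sin(pi x); report the error
print(np.max(np.abs(u - np.exp(-np.pi**2 * T) * np.sin(np.pi * x))))
```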
For solving a broad class of nonconvex programming problems on an unbounded constraint set, we provide a self-adaptive step-size strategy that does not rely on line-search techniques, and we establish the convergence of a generic approach under mild assumptions. In particular, the objective function need not be convex. Unlike descent line-search algorithms, the method does not require a known Lipschitz constant to determine the initial step size. The crucial feature of the process is the steady reduction of the step size until a certain condition is fulfilled. In particular, it yields a new gradient projection approach for optimization problems with an unbounded constraint set. The correctness of the proposed method is verified by preliminary results on some computational examples. To demonstrate the effectiveness of the proposed technique for large-scale problems, we apply it to experiments in machine learning, such as supervised feature selection, multivariable logistic regression, and neural networks for classification.
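As a rough illustration of such a strategy, the sketch below runs projected gradient with a step size that is only ever reduced, by a fixed factor, until a local Lipschitz-type test passes; the test, the constants, and the toy problem are our own illustrative choices, not the paper's exact rule.

```python
import numpy as np

def projected_gradient_adaptive(grad, proj, x0, s0=1.0, rho=0.5, mu=0.4,
                                iters=200):
    """Projected gradient with a self-adaptive step size: no line search and
    no global Lipschitz constant. The acceptance test below (a local
    Lipschitz-type condition) is one common choice; the paper's exact rule
    is not reproduced here."""
    x, s = x0, s0
    for _ in range(iters):
        g = grad(x)
        y = proj(x - s * g)
        # shrink the step until s*||grad(x)-grad(y)|| <= mu*||x-y||
        while s * np.linalg.norm(grad(y) - g) > mu * np.linalg.norm(y - x) \
                and np.linalg.norm(y - x) > 1e-12:
            s *= rho
            y = proj(x - s * g)
        x = y
    return x

# toy use: minimize ||Ax - b||^2 over the (unbounded) nonnegative orthant
rng = np.random.default_rng(0)
A, b = rng.standard_normal((30, 10)), rng.standard_normal(30)
x = projected_gradient_adaptive(lambda x: 2 * A.T @ (A @ x - b),
                                lambda z: np.maximum(z, 0.0),
                                np.zeros(10))
print(x)
```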
Independent component analysis (ICA) is a blind source separation method that recovers source signals of interest from their mixtures. Most existing ICA procedures assume independent sampling. Second-order-statistics-based source separation methods have been developed based on parametric time series models for mixtures of autocorrelated sources. However, second-order-statistics-based methods cannot separate the sources accurately when the sources have temporal autocorrelations with mixed spectra. To address this issue, we propose a new ICA method that estimates the spectral density functions and line spectra of the source signals using cubic splines and indicator functions, respectively. The mixed spectra and the mixing matrix are estimated by maximizing the Whittle likelihood function. We illustrate the performance of the proposed method through simulation experiments and an EEG data application. The numerical results indicate that our approach outperforms existing ICA methods, including SOBI algorithms. In addition, we investigate the asymptotic behavior of the proposed method.
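The Whittle likelihood at the heart of the estimation step can be sketched for a single series as follows; the paper maximizes it jointly over the mixing matrix and the spline/line-spectrum parameters of all sources, which is not reproduced here, and the AR(1) check is only a toy example.

```python
import numpy as np

def whittle_loglik(x, spec_fn):
    """Whittle log-likelihood of a zero-mean series x under a candidate
    spectral density spec_fn(omega), evaluated at the Fourier frequencies."""
    n = len(x)
    k = np.arange(1, (n - 1) // 2 + 1)                  # positive frequencies
    omega = 2 * np.pi * k / n
    I = np.abs(np.fft.fft(x)[k]) ** 2 / (2 * np.pi * n)  # periodogram
    f = spec_fn(omega)
    return -np.sum(np.log(f) + I / f)

# toy check: an AR(1) series scored against its true spectral density
rng = np.random.default_rng(1)
phi, n = 0.7, 4096
e = rng.standard_normal(n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + e[t]
ar1_spec = lambda w: 1.0 / (2 * np.pi * np.abs(1 - phi * np.exp(-1j * w)) ** 2)
print(whittle_loglik(x, ar1_spec))
```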
In this paper, we revisit the class of iterative shrinkage-thresholding algorithms (ISTA) for solving the linear inverse problem with sparse representation, which arises in signal and image processing. In a numerical experiment on image deblurring, we observe that the convergence behavior on a logarithmic-scale ordinate tends to be linear rather than nearly flat, as a logarithmic rate would suggest. On closer inspection, we find that the usual assumption that the smooth part is merely convex understates the structure of the least-squares model. Specifically, it is more reasonable to assume the smooth part to be strongly convex for the least-squares model, even though the image matrix is probably ill-conditioned. Furthermore, we tighten the pivotal inequality for composite optimization, first found in [Li et al., 2022], under strong convexity of the smooth part instead of mere convexity. Based on this pivotal inequality, we generalize the linear convergence to composite optimization, in both the objective value and the squared proximal subgradient norm. Meanwhile, we replace the original blur matrix by a simple ill-conditioned matrix whose singular values are easy to compute. The new numerical experiment shows that the proximal generalization of Nesterov's accelerated gradient descent (NAG) for strongly convex functions has a faster linear convergence rate than ISTA. Based on the tighter pivotal inequality, we also generalize this faster linear convergence rate to composite optimization, in both the objective value and the squared proximal subgradient norm, by taking advantage of a well-constructed Lyapunov function with a slight modification and the phase-space representation of the high-resolution differential equation framework from the implicit-velocity scheme.
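For reference, a minimal ISTA sketch for the sparse least-squares model $\min_x \frac{1}{2}\|Ax-b\|^2 + \lambda\|x\|_1$, together with a synthetic ill-conditioned matrix with known singular values in the spirit of the experiment described above; all parameter choices are illustrative.

```python
import numpy as np

def soft_threshold(z, tau):
    # proximal operator of tau * ||.||_1
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def ista(A, b, lam, iters=500):
    """ISTA for min_x 0.5*||Ax - b||^2 + lam*||x||_1. The smooth part is
    strongly convex exactly when A has full column rank, the regime where
    the linear rate discussed above applies."""
    L = np.linalg.norm(A, 2) ** 2        # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        x = soft_threshold(x - (A.T @ (A @ x - b)) / L, lam / L)
    return x

# ill-conditioned toy matrix with prescribed singular values, standing in
# for the blur matrix as the abstract suggests
U, s, Vt = np.linalg.svd(np.random.default_rng(2).standard_normal((40, 40)))
A = U @ np.diag(np.logspace(0, -3, 40)) @ Vt     # condition number 1e3
b = A @ (np.arange(40) % 5 == 0).astype(float)   # sparse ground truth
print(np.round(ista(A, b, lam=1e-4), 3))
```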
A graph $G$ is called self-ordered (a.k.a. asymmetric) if the identity permutation is its only automorphism. Equivalently, there is a unique isomorphism from $G$ to any graph that is isomorphic to $G$. We say that $G=(V,E)$ is robustly self-ordered if the size of the symmetric difference between $E$ and the edge-set of the graph obtained by permuting $V$ using any permutation $\pi:V\to V$ is proportional to the number of non-fixed-points of $\pi$. In this work, we initiate the study of the structure, construction and utility of robustly self-ordered graphs. We show that robustly self-ordered bounded-degree graphs exist (in abundance), and that they can be constructed efficiently, in a strong sense. Specifically, given the index of a vertex in such a graph, it is possible to find all its neighbors in time that is polynomial in the length of the index (i.e., poly-logarithmic in the size of the graph). We also consider graphs of unbounded degree, seeking correspondingly unbounded robustness parameters. We again demonstrate that such graphs (of linear degree) exist (in abundance), and that they can be constructed efficiently, in a strong sense. This turns out to require very different tools. Specifically, we show that the construction of such graphs reduces to the construction of non-malleable two-source extractors (with very weak parameters but with some additional natural features). We demonstrate that robustly self-ordered bounded-degree graphs are useful towards obtaining lower bounds on the query complexity of testing graph properties both in the bounded-degree and the dense graph models. One of the results that we obtain, via such a reduction, is a subexponential separation between the query complexities of testing and tolerant testing of graph properties in the bounded-degree graph model.
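The definition of robust self-ordering can be checked by brute force on very small graphs, which may help fix intuition; the sketch below enumerates all permutations (so it is exponential in the number of vertices and useful only for illustration), and the example graph is an arbitrary small asymmetric graph.

```python
from itertools import permutations

def robustness(edges, n):
    """Smallest ratio |E symmetric-difference pi(E)| / #nonfixed(pi) over all
    non-identity permutations pi of {0,...,n-1}. A graph is robustly
    self-ordered when this ratio is bounded below by a constant."""
    E = {frozenset(e) for e in edges}
    best = float("inf")
    for pi in permutations(range(n)):
        moved = sum(pi[v] != v for v in range(n))
        if moved == 0:
            continue                      # skip the identity permutation
        piE = {frozenset((pi[u], pi[v])) for u, v in edges}
        best = min(best, len(E ^ piE) / moved)
    return best

# a small asymmetric graph on 6 vertices
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (1, 3), (1, 4)]
print(robustness(edges, 6))   # > 0, so the graph is (at least) self-ordered
```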
Nesterov's accelerated gradient descent (NAG) is one of the milestones in the history of first-order algorithms. The mechanism behind the acceleration phenomenon was not successfully uncovered until the high-resolution differential equation framework was proposed in [Shi et al., 2022], which attributed it to the gradient correction term. To deepen our understanding of how the high-resolution differential equation framework bears on the convergence rate, we continue in this paper to investigate NAG for $\mu$-strongly convex functions, using the techniques of Lyapunov analysis and phase-space representation. First, we revisit the proof from the gradient-correction scheme. Similar to [Chen et al., 2022], a straightforward calculation simplifies the proof considerably and enlarges the step size to $s=1/L$ with only a minor modification; moreover, the construction of the Lyapunov function becomes principled. We then investigate NAG from the implicit-velocity scheme. Owing to the difference in the velocity iterates, the Lyapunov function constructed from the implicit-velocity scheme requires no additional term, and the calculation of the iterative difference becomes simpler. Together with the optimal step size obtained, the high-resolution differential equation framework from the implicit-velocity scheme of NAG is complete and outperforms the gradient-correction scheme.
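For concreteness, here is a minimal sketch of the NAG iteration for $\mu$-strongly convex, $L$-smooth functions with the step size $s = 1/L$ discussed above and the classical momentum coefficient $(\sqrt{\kappa}-1)/(\sqrt{\kappa}+1)$, $\kappa = L/\mu$; the quadratic test problem is an arbitrary choice.

```python
import numpy as np

def nag_strongly_convex(grad, x0, L, mu, iters=200):
    """NAG for a mu-strongly convex, L-smooth function, with step size
    s = 1/L and the classical momentum coefficient for this setting."""
    s = 1.0 / L
    kappa = L / mu
    beta = (np.sqrt(kappa) - 1) / (np.sqrt(kappa) + 1)
    x, x_prev = x0.copy(), x0.copy()
    for _ in range(iters):
        y = x + beta * (x - x_prev)      # extrapolation (implicit velocity)
        x, x_prev = y - s * grad(y), x
    return x

# quadratic toy problem with known L and mu (Hessian eigenvalues d)
d = np.array([100.0, 1.0, 10.0])
grad = lambda x: d * x
print(nag_strongly_convex(grad, np.ones(3), L=d.max(), mu=d.min()))
```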
This paper develops a two-stage stochastic model to investigate the evolution of random fields on the unit sphere $\mathbb{S}^2$ in $\mathbb{R}^3$. The model is defined by a time-fractional stochastic diffusion equation on $\mathbb{S}^2$ governed by a diffusion operator, with the time-fractional derivative defined in the Riemann-Liouville sense. In the first stage, the model is characterized by a homogeneous problem with an isotropic Gaussian random field on $\mathbb{S}^2$ as the initial condition. In the second stage, the model becomes an inhomogeneous problem driven by a time-delayed Brownian motion on $\mathbb{S}^2$. The solution to the model is given in the form of an expansion in complex spherical harmonics. An approximation to the solution is obtained by truncating the expansion at degree $L\geq1$. The rate of convergence of the truncation error as a function of $L$ and of the mean square error as a function of time are also derived. It is shown that the convergence rates depend not only on the decay of the angular power spectra of the driving noise and the initial condition, but also on the order of the fractional derivative. We study sample properties of the stochastic solution and show that the solution is an isotropic H\"{o}lder continuous random field. Numerical examples and simulations inspired by the cosmic microwave background (CMB) are given to illustrate the theoretical findings.
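The degree-$L$ truncation can be sketched by sampling an isotropic Gaussian field directly from a truncated spherical harmonic expansion; the power spectrum $C_\ell = (1+\ell)^{-\alpha}$ below is a placeholder rather than the paper's model, and the grid and parameters are illustrative.

```python
import numpy as np
from scipy.special import sph_harm

def sample_field(L, theta, phi, alpha=3.0, rng=None):
    """One sample of an isotropic Gaussian random field on the sphere,
    truncated at degree L, using a real spherical harmonic expansion with
    placeholder angular power spectrum C_l = (1 + l)**-alpha.
    theta: azimuth in [0, 2*pi), phi: colatitude in [0, pi]."""
    rng = rng or np.random.default_rng()
    f = np.zeros_like(theta)
    for l in range(L + 1):
        C = (1.0 + l) ** (-alpha)
        f += rng.normal(0, np.sqrt(C)) * sph_harm(0, l, theta, phi).real
        for m in range(1, l + 1):
            Y = sph_harm(m, l, theta, phi)
            f += np.sqrt(2 * C) * (rng.normal() * Y.real + rng.normal() * Y.imag)
    return f

# evaluate one realization on a coarse grid, truncated at degree L = 20
th, ph = np.meshgrid(np.linspace(0, 2 * np.pi, 64), np.linspace(0, np.pi, 32))
print(sample_field(20, th, ph).std())
```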
Patankar schemes have attracted increasing interest in recent years because they preserve the positivity of the analytical solution of a production-destruction system (PDS) irrespective of the chosen time step size. Although they are now of great interest, for a long time it was unclear what stability properties such schemes have. Recently, a new stability approach based on Lyapunov stability, with an extension of the center manifold theorem, was proposed to study the stability properties of positivity-preserving time integrators. In this work, we study the stability properties of the classical modified Patankar--Runge--Kutta (MPRK) schemes and of the modified Patankar Deferred Correction (MPDeC) approaches. We prove that most of the considered MPRK schemes are stable for any time step size, and we compute the stability function of MPDeC. We investigate its properties numerically, revealing that most MPDeC schemes are likewise stable irrespective of the chosen time step size. Finally, we verify our theoretical results with numerical simulations.
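The mechanism behind modified Patankar schemes is visible already in the first-order modified Patankar-Euler method: weighting destruction terms by $y_i^{n+1}/y_i^n$ and production terms by $y_j^{n+1}/y_j^n$ turns each step into a linear solve with an M-matrix, which guarantees positivity for every time step size. Below is a minimal sketch on a linear exchange PDS; the rates and step size are arbitrary choices.

```python
import numpy as np

def mpe_step(y, dt, P, D):
    """One modified Patankar-Euler step for a production-destruction system
    with production matrix P(y) (P[i,j]: production of i from j) and
    destruction matrix D(y) (D[i,j]: destruction of i toward j; the PDS is
    conservative when D = P.T). Positivity holds for every dt because the
    system matrix is an M-matrix."""
    n = len(y)
    Pm, Dm = P(y), D(y)
    A = np.eye(n)
    for i in range(n):
        A[i, i] += dt * Dm[i, :].sum() / y[i]
        for j in range(n):
            if j != i:
                A[i, j] -= dt * Pm[i, j] / y[j]
    return np.linalg.solve(A, y)

# linear exchange test problem: two species swap mass at rates a and b
a, b = 5.0, 1.0
P = lambda y: np.array([[0, b * y[1]], [a * y[0], 0]])
D = lambda y: P(y).T                      # conservative PDS
y = np.array([0.9, 0.1])
for _ in range(10):
    y = mpe_step(y, dt=1.0, P=P, D=D)     # huge step, solution stays positive
print(y, y.sum())                         # total mass is conserved
```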
Graph neural networks (GNNs) are widely used for modeling complex interactions between entities represented as vertices of a graph. Despite recent efforts to theoretically analyze the expressive power of GNNs, a formal characterization of their ability to model interactions is lacking. The current paper aims to address this gap. Formalizing the strength of interactions through an established measure known as separation rank, we quantify the ability of certain GNNs to model interaction between a given subset of vertices and its complement, i.e., between the sides of a given partition of input vertices. Our results reveal that the ability to model interaction is primarily determined by the partition's walk index -- a graph-theoretical characteristic that we define as the number of walks originating from the boundary of the partition. Experiments with common GNN architectures corroborate this finding. As a practical application of our theory, we design an edge sparsification algorithm named Walk Index Sparsification (WIS), which preserves the ability of a GNN to model interactions when input edges are removed. WIS is simple, computationally efficient, and markedly outperforms alternative methods in terms of induced prediction accuracy. More broadly, it showcases the potential of improving GNNs by theoretically analyzing the interactions they can model.
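Below is a rough sketch of the central quantity and of a naive greedy pruner in the spirit of WIS; the paper's actual algorithm and scoring rule are more refined, so everything here, including the toy partition, should be read as illustrative only.

```python
import numpy as np

def walk_index(adj, boundary, k=3):
    """Number of length-k walks originating from the boundary vertices of a
    partition, computed via powers of the adjacency matrix. This is the
    graph-theoretic quantity tied above to modeled interactions."""
    counts = np.linalg.matrix_power(adj, k).sum(axis=1)  # walks per vertex
    return counts[list(boundary)].sum()

def greedy_sparsify(adj, boundary, n_remove, k=3):
    """Naive pruner in the spirit of WIS: repeatedly delete the edge whose
    removal keeps the walk index largest. Illustrative only; the paper's
    WIS uses a more careful per-partition criterion."""
    adj = adj.copy()
    for _ in range(n_remove):
        edges = [(i, j) for i, j in zip(*np.nonzero(np.triu(adj)))]
        def score(e):
            adj[e[0], e[1]] = adj[e[1], e[0]] = 0   # tentatively remove
            w = walk_index(adj, boundary, k)
            adj[e[0], e[1]] = adj[e[1], e[0]] = 1   # restore
            return w
        i, j = max(edges, key=score)
        adj[i, j] = adj[j, i] = 0
    return adj

# toy graph: 6-cycle plus a chord; partition {0,1,2} | {3,4,5} has
# boundary {0, 2, 3, 5} (endpoints of the crossing edges)
A = np.zeros((6, 6), dtype=int)
for u, v in [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 2)]:
    A[u, v] = A[v, u] = 1
print(greedy_sparsify(A, boundary={0, 2, 3, 5}, n_remove=2))
```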
This book develops an effective theory approach to understanding deep neural networks of practical relevance. Beginning from a first-principles component-level picture of networks, we explain how to determine an accurate description of the output of trained networks by solving layer-to-layer iteration equations and nonlinear learning dynamics. A main result is that the predictions of networks are described by nearly-Gaussian distributions, with the depth-to-width aspect ratio of the network controlling the deviations from the infinite-width Gaussian description. We explain how these effectively-deep networks learn nontrivial representations from training and more broadly analyze the mechanism of representation learning for nonlinear models. From a nearly-kernel-methods perspective, we find that the dependence of such models' predictions on the underlying learning algorithm can be expressed in a simple and universal way. To obtain these results, we develop the notion of representation group flow (RG flow) to characterize the propagation of signals through the network. By tuning networks to criticality, we give a practical solution to the exploding and vanishing gradient problem. We further explain how RG flow leads to near-universal behavior and lets us categorize networks built from different activation functions into universality classes. Altogether, we show that the depth-to-width ratio governs the effective model complexity of the ensemble of trained networks. By using information-theoretic techniques, we estimate the optimal aspect ratio at which we expect the network to be practically most useful and show how residual connections can be used to push this scale to arbitrary depths. With these tools, we can learn in detail about the inductive bias of architectures, hyperparameters, and optimizers.
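The criticality story can be illustrated with a few lines of simulation: in a deep tanh network with weights of variance $C_W/\text{width}$, activations decay exponentially with depth for sub-critical $C_W$ and saturate at a nonzero fixed point for super-critical $C_W$, while the critical value $C_W = 1$ keeps them stable; the depth, width, and probe values below are arbitrary choices.

```python
import numpy as np

def signal_norm_through_depth(C_W, depth=100, width=500, seed=0):
    """Average squared activation after each layer of a deep tanh MLP with
    weights W ~ N(0, C_W / width). For tanh, C_W = 1 is the critical value:
    activations decay to zero for C_W < 1 and saturate for C_W > 1."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(width)
    norms = []
    for _ in range(depth):
        W = rng.normal(0.0, np.sqrt(C_W / width), size=(width, width))
        x = np.tanh(W @ x)
        norms.append(np.mean(x ** 2))
    return norms

for C_W in (0.5, 1.0, 2.0):   # sub-critical, critical, super-critical
    print(C_W, signal_norm_through_depth(C_W)[-1])
```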