We propose a sampling algorithm that achieves superior complexity bounds in all the classical settings (strongly log-concave, log-concave, Logarithmic-Sobolev inequality (LSI), Poincar\'e inequality) as well as more general settings with semi-smooth or composite potentials. Our algorithm is based on the proximal sampler introduced in~\citet{lee2021structured}. The performance of this proximal sampler is determined by that of the restricted Gaussian oracle (RGO), a key step in the proximal sampler. The main contribution of this work is an inexact realization of RGO based on approximate rejection sampling. To bound the inexactness of RGO, we establish a new concentration inequality for semi-smooth functions over Gaussian distributions, extending the well-known concentration inequality for Lipschitz functions. Applying our RGO implementation to the proximal sampler, we achieve state-of-the-art complexity bounds in almost all settings. For instance, for strongly log-concave distributions, our method has complexity bound $\tilde\mathcal{O}(\kappa d^{1/2})$ without warm start, better than the minimax bound for MALA. For distributions satisfying the LSI, our bound is $\tilde \mathcal{O}(\hat \kappa d^{1/2})$ where $\hat \kappa$ is the ratio between smoothness and the LSI constant, better than all existing bounds.
To minimize the average of a set of log-convex functions, the stochastic Newton method iteratively updates its estimate using subsampled versions of the full objective's gradient and Hessian. We contextualize this optimization problem as sequential Bayesian inference on a latent state-space model with a discriminatively-specified observation process. Applying Bayesian filtering then yields a novel optimization algorithm that considers the entire history of gradients and Hessians when forming an update. We establish matrix-based conditions under which the effect of older observations diminishes over time, in a manner analogous to Polyak's heavy ball momentum. We illustrate various aspects of our approach with an example and review other relevant innovations for the stochastic Newton method.
Ghost, or fictitious points allow to capture boundary conditions that are not located on the finite difference grid discretization. We explore in this paper the impact of ghost points on the stability of the explicit Euler finite difference scheme in the context of the diffusion equation. In particular, we consider the case of a one-touch option under the Black-Scholes model. The observations and results are however valid for a much wider range of financial contracts and models.
We consider a general linear parabolic problem with extended time boundary conditions (including initial value problems and periodic ones), and approximate it by the implicit Euler scheme in time and the Gradient Discretisation method in space; the latter is in fact a class of methods that includes conforming and nonconforming finite elements, discontinuous Galerkin methods and several others. The main result is an error estimate which holds without supplementary regularity hypothesis on the solution. This result states that the approximation error has the same order as the sum of the interpolation error and the conformity error. The proof of this result relies on an inf-sup inequality in Hilbert spaces which can be used both in the continuous and the discrete frameworks. The error estimate result is illustrated by numerical examples with low regularity of the solution.
The distributed task allocation problem, as one of the most interesting distributed optimization challenges, has received considerable research attention recently. Previous works mainly focused on the task allocation problem in a population of individuals, where there are no constraints for affording task amounts. The latter condition, however, cannot always be hold. In this paper, we study the task allocation problem with constraints of task allocation in a game-theoretical framework. We assume that each individual can afford different amounts of task and the cost function is convex. To investigate the problem in the framework of population games, we construct a potential game and calculate the fitness function for each individual. We prove that when the Nash equilibrium point in the potential game is in the feasible solutions for the limited task allocation problem, the Nash equilibrium point is the unique globally optimal solution. Otherwise, we also derive analytically the unique globally optimal solution. In addition, in order to confirm our theoretical results, we consider the exponential and quadratic forms of cost function for each agent. Two algorithms with the mentioned representative cost functions are proposed to numerically seek the optimal solution to the limited task problems. We further perform Monte Carlo simulations which provide agreeing results with our analytical calculations.
Graph algorithms play an important role in many computer science areas. In order to solve problems that can be modeled using graphs, it is necessary to use a data structure that can represent those graphs in an efficient manner. On top of this, an infrastructure should be build that will assist in implementing common algorithms or developing specialized ones. Here, a new Java library is introduced, called Graph4J, that uses a different approach when compared to existing, well-known Java libraries such as JGraphT, JUNG and Guava Graph. Instead of using object-oriented data structures for graph representation, a lower-level model based on arrays of primitive values is utilized, that drastically reduces the required memory and the running times of the algorithm implementations. The design of the library, the space complexity of the graph structures and the time complexity of the most common graph operations are presented in detail, along with an experimental study that evaluates its performance, when compared to the other libraries. Emphasis is given to infrastructure related aspects, that is graph creation, inspection, alteration and traversal. The improvements obtained for other implemented algorithms are also analyzed and it is shown that the proposed library significantly outperforms the existing ones.
A standard approach to solve ordinary differential equations, when they describe dynamical systems, is to adopt a Runge-Kutta or related scheme. Such schemes, however, are not applicable to the large class of equations which do not constitute dynamical systems. In several physical systems, we encounter integro-differential equations with memory terms where the time derivative of a state variable at a given time depends on all past states of the system. Secondly, there are equations whose solutions do not have well-defined Taylor series expansion. The Maxey-Riley-Gatignol equation, which describes the dynamics of an inertial particle in nonuniform and unsteady flow, displays both challenges. We use it as a test bed to address the questions we raise, but our method may be applied to all equations of this class. We show that the Maxey-Riley-Gatignol equation can be embedded into an extended Markovian system which is constructed by introducing a new dynamical co-evolving state variable that encodes memory of past states. We develop a Runge-Kutta algorithm for the resultant Markovian system. The form of the kernels involved in deriving the Runge-Kutta scheme necessitates the use of an expansion in powers of $t^{1/2}$. Our approach naturally inherits the benefits of standard time-integrators, namely a constant memory storage cost, a linear growth of operational effort with simulation time, and the ability to restart a simulation with the final state as the new initial condition.
In the context of finite sums minimization, variance reduction techniques are widely used to improve the performance of state-of-the-art stochastic gradient methods. Their practical impact is clear, as well as their theoretical properties. Stochastic proximal point algorithms have been studied as an alternative to stochastic gradient algorithms since they are more stable with respect to the choice of the stepsize but a proper variance reduced version is missing. In this work, we propose the first study of variance reduction techniques for stochastic proximal point algorithms. We introduce a stochastic proximal version of SVRG, SAGA, and some of their variants for smooth and convex functions. We provide several convergence results for the iterates and the objective function values. In addition, under the Polyak-{\L}ojasiewicz (PL) condition, we obtain linear convergence rates for the iterates and the function values. Our numerical experiments demonstrate the advantages of the proximal variance reduction methods over their gradient counterparts, especially about the stability with respect to the choice of the step size.
This study presents a novel high-order numerical method designed for solving the two-dimensional time-fractional convection-diffusion (TFCD) equation. The Caputo definition is employed to characterize the time-fractional derivative. A weak singularity at the initial time ($t=0$) is encountered in the considered problem, which is effectively managed by adopting a discretization approach for the time-fractional derivative, where Alikhanov's high-order L2-1$_\sigma$ formula is applied on a non-uniform fitted mesh, resulting in successful tackling of the singularity. A high-order two-dimensional compact operator is implemented to approximate the spatial variables. The alternating direction implicit (ADI) approach is then employed to solve the resulting system of equations by decomposing the two-dimensional problem into two separate one-dimensional problems. The theoretical analysis, encompassing both stability and convergence aspects, has been conducted comprehensively, and it has shown that method is convergent with an order $\mathcal O\left(N_t^{-\min\{3-\alpha,\theta\alpha,1+2\alpha,2+\alpha\}}+h_x^4+h_y^4\right)$, where $\alpha\in(0,1)$ represents the order of the fractional derivative, $N_t$ is the temporal discretization parameter and $h_x$ and $h_y$ represent spatial mesh widths. Moreover, the parameter $\theta$ is utilized in the construction of the fitted mesh.
Interpreting a seemingly-simple function word like "or", "behind", or "more" can require logical, numerical, and relational reasoning. How are such words learned by children? Prior acquisition theories have often relied on positing a foundation of innate knowledge. Yet recent neural-network based visual question answering models apparently can learn to use function words as part of answering questions about complex visual scenes. In this paper, we study what these models learn about function words, in the hope of better understanding how the meanings of these words can be learnt by both models and children. We show that recurrent models trained on visually grounded language learn gradient semantics for function words requiring spacial and numerical reasoning. Furthermore, we find that these models can learn the meanings of logical connectives "and" and "or" without any prior knowledge of logical reasoning, as well as early evidence that they can develop the ability to reason about alternative expressions when interpreting language. Finally, we show that word learning difficulty is dependent on frequency in models' input. Our findings offer evidence that it is possible to learn the meanings of function words in visually grounded context by using non-symbolic general statistical learning algorithms, without any prior knowledge of linguistic meaning.
We derive information-theoretic generalization bounds for supervised learning algorithms based on the information contained in predictions rather than in the output of the training algorithm. These bounds improve over the existing information-theoretic bounds, are applicable to a wider range of algorithms, and solve two key challenges: (a) they give meaningful results for deterministic algorithms and (b) they are significantly easier to estimate. We show experimentally that the proposed bounds closely follow the generalization gap in practical scenarios for deep learning.