In the analysis of stochastic dynamical systems described by stochastic differential equations (SDEs), it is often of interest to analyse the sensitivity of the expected value of a functional of the solution of the SDE with respect to perturbations in the SDE parameters. In this paper, we consider path functionals that depend on the solution of the SDE up to a stopping time. We derive formulas for Fr\'{e}chet derivatives of the expected values of these functionals with respect to bounded perturbations of the drift, using the Cameron-Martin-Girsanov theorem for the change of measure. Using these derivatives, we construct an example to show that the map that sends the change of drift to the corresponding relative entropy is not in general convex. We then analyse the existence and uniqueness of solutions to stochastic optimal control problems defined on possibly random time intervals, as well as gradient-based numerical methods for solving such problems.
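For orientation, the change of measure in question takes the following standard form (a sketch assuming, for illustration only, a bounded drift perturbation $u$ of an SDE driven by a Brownian motion $W$, with $\tau$ the stopping time; the precise setting of the paper may differ):
\[
  \frac{\mathrm{d}\mathbb{Q}}{\mathrm{d}\mathbb{P}}\bigg|_{\mathcal{F}_{\tau}}
  = \exp\!\left( \int_0^{\tau} u_s \,\mathrm{d}W_s
      - \frac{1}{2} \int_0^{\tau} |u_s|^2 \,\mathrm{d}s \right),
  \qquad
  D_{\mathrm{KL}}\!\left(\mathbb{Q}\,\middle\|\,\mathbb{P}\right)
  = \frac{1}{2}\,\mathbb{E}_{\mathbb{Q}}\!\left[ \int_0^{\tau} |u_s|^2 \,\mathrm{d}s \right].
\]
In this standard form, the map $u \mapsto D_{\mathrm{KL}}(\mathbb{Q}\,\|\,\mathbb{P})$ depends on $u$ both through the integrand and through the measure $\mathbb{Q}$ itself, which is the map whose non-convexity is at issue.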
We continue the investigation of the spectrum of operators arising from the discretization of partial differential equations. In this paper we consider a three-field formulation recently introduced for the finite element least-squares approximation of linear elasticity. We discuss in particular the distribution of the discrete eigenvalues in the complex plane and how they approximate the positive real eigenvalues of the continuous problem. The dependence of the spectrum on the Lam\'e parameters is considered as well, together with its behavior when approaching the incompressible limit.
Manifold-valued functional data analysis (FDA) has recently become an active area of research, motivated by the rising availability of trajectories or longitudinal data observed on non-linear manifolds. The challenges of analyzing such data come from many aspects, including infinite dimensionality and nonlinearity, as well as time-domain or phase variability. In this paper, we study the amplitude part of manifold-valued functions on $\mathbb{S}^2$, which is invariant to random time warping or re-parameterization. Utilizing the geometry of $\mathbb{S}^2$, we develop a set of efficient and accurate tools for temporal alignment of functions, geodesic computation, and sample mean calculation. At their core, these tools rely on gradient descent algorithms with carefully derived gradients. We show the advantages of these newly developed tools over their competitors with extensive simulations and real data, and demonstrate the importance of considering the amplitude part of functions rather than mixing it with phase variability in manifold-valued FDA.
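As an illustration of the kind of gradient-descent computation involved, the following is a minimal Python sketch of the intrinsic sample mean on $\mathbb{S}^2$, obtained by iterating exponential and logarithm maps; the function names, step size, and toy data are our own, and the paper's actual tools operate on full functions and their alignments rather than on point clouds.
\begin{verbatim}
import numpy as np

def log_map(p, q):
    """Log map on S^2: tangent vector at p pointing toward q with norm d(p, q)."""
    cos_t = np.clip(np.dot(p, q), -1.0, 1.0)
    theta = np.arccos(cos_t)
    if theta < 1e-12:
        return np.zeros(3)
    v = q - cos_t * p                      # component of q orthogonal to p
    return theta * v / np.linalg.norm(v)

def exp_map(p, v):
    """Exponential map on S^2: follow the geodesic from p in direction v."""
    norm_v = np.linalg.norm(v)
    if norm_v < 1e-12:
        return p
    return np.cos(norm_v) * p + np.sin(norm_v) * v / norm_v

def frechet_mean(points, step=1.0, n_iter=100):
    """Gradient descent for the intrinsic (Frechet) sample mean on S^2."""
    mu = points[0] / np.linalg.norm(points[0])
    for _ in range(n_iter):
        grad = np.mean([log_map(mu, q) for q in points], axis=0)
        mu = exp_map(mu, step * grad)
    return mu

rng = np.random.default_rng(0)
pts = rng.normal(size=(20, 3)) + np.array([0.0, 0.0, 5.0])
pts /= np.linalg.norm(pts, axis=1, keepdims=True)   # project onto S^2
print(frechet_mean(pts))
\end{verbatim}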
In distributed differential privacy, the parties perform analysis over their joint data while preserving the privacy of both datasets. Interestingly, for a few fundamental two-party functions such as inner product and Hamming distance, the accuracy of the distributed solution lags far behind what is achievable in the client-server setting. McGregor, Mironov, Pitassi, Reingold, Talwar, and Vadhan [FOCS '10] proved that this gap is inherent, showing upper bounds on the accuracy of (any) distributed solution for these functions. These limitations can be bypassed when settling for computational differential privacy, where the data is differentially private only in the eyes of a computationally bounded observer, using public-key cryptography primitives. We prove that the use of public-key cryptography is necessary for bypassing the limitation of McGregor et al., showing that a non-trivial solution for the inner product or the Hamming distance implies the existence of a key-agreement protocol. Our bound implies a combinatorial proof of the fact that the non-Boolean inner product of independent (strong) Santha-Vazirani sources is a good condenser. We obtain our main result by showing that the inner product of a (single, strong) SV source with a uniformly random seed is a good condenser, even when the seed and source are dependent.
In this work, we adapt the {\em micro-macro} methodology to stochastic differential equations for the purpose of numerically solving oscillatory evolution equations. The models we consider cover a wide spectrum of regimes in which oscillations may be slow or fast. We show that, through an ad hoc transformation (the micro-macro decomposition), it is possible to retain the usual orders of convergence of the Euler-Maruyama method, that is, uniform weak order one and uniform strong order one half.
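For reference, the following is a minimal Python sketch of the baseline Euler-Maruyama scheme whose weak order one and strong order one half the micro-macro decomposition is designed to retain; the oscillatory models and the decomposition itself are not shown, and the example equation and parameters are illustrative only.
\begin{verbatim}
import numpy as np

def euler_maruyama(drift, diffusion, x0, t_final, n_steps, rng):
    """Euler-Maruyama scheme for dX = drift(X, t) dt + diffusion(X, t) dW."""
    dt = t_final / n_steps
    x = np.array(x0, dtype=float)
    t = 0.0
    for _ in range(n_steps):
        dw = rng.normal(scale=np.sqrt(dt), size=x.shape)
        x = x + drift(x, t) * dt + diffusion(x, t) * dw
        t += dt
    return x

# Example: Ornstein-Uhlenbeck process dX = -X dt + 0.5 dW, started at X_0 = 1.
rng = np.random.default_rng(1)
samples = [euler_maruyama(lambda x, t: -x,
                          lambda x, t: 0.5,
                          x0=[1.0], t_final=1.0, n_steps=200, rng=rng)
           for _ in range(1000)]
print(np.mean(samples))   # compare with exp(-1), the exact mean at t = 1
\end{verbatim}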
We study the generalization properties of the popular stochastic optimization method known as stochastic gradient descent (SGD) for optimizing general non-convex loss functions. Our main contribution is providing upper bounds on the generalization error that depend on local statistics of the stochastic gradients evaluated along the path of iterates calculated by SGD. The key factors our bounds depend on are the variance of the gradients (with respect to the data distribution), the local smoothness of the objective function along the SGD path, and the sensitivity of the loss function to perturbations of the final output. Our key technical tool is a combination of the information-theoretic generalization bounds previously used for analyzing randomized variants of SGD with a perturbation analysis of the iterates.
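For context, the prototypical information-theoretic bound of this kind (due to Xu and Raginsky) controls the expected generalization gap of a randomized algorithm with output $W$ trained on a sample $S$ of size $n$, assuming the loss is $\sigma$-subgaussian under the data distribution:
\[
  \left| \mathbb{E}\left[ L_{\mu}(W) - L_{S}(W) \right] \right|
  \;\le\; \sqrt{\frac{2\sigma^{2}}{n}\, I(S; W)},
\]
where $L_S$ and $L_\mu$ denote the empirical and population risks; the bounds in this paper refine quantities of this type using local gradient statistics along the SGD path.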
We provide numerical procedures for evaluating, as accurately as possible, the sum of a positive series. Our procedures are based on the application of a generalized version of Kummer's test.
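Kummer's test, which underlies these procedures, examines the quantity $K_n = p_n\,(a_n/a_{n+1}) - p_{n+1}$ for a positive auxiliary sequence $(p_n)$: if $\liminf_n K_n > 0$ the series $\sum a_n$ converges, while $K_n \le 0$ for all large $n$ together with $\sum 1/p_n = \infty$ implies divergence. The following is a minimal Python sketch of this quantity for an illustrative choice of $a_n$ and $p_n$, which is ours and not the paper's generalized version.
\begin{verbatim}
from fractions import Fraction

def kummer_ratio(a, p, n):
    """Kummer's quantity K_n = p_n * (a_n / a_{n+1}) - p_{n+1} for a positive series."""
    return p(n) * a(n) / a(n + 1) - p(n + 1)

# Example: a_n = 1/n^2 with the auxiliary sequence p_n = n (a Raabe-type choice).
a = lambda n: Fraction(1, n * n)
p = lambda n: Fraction(n)

for n in (10, 100, 1000):
    print(n, float(kummer_ratio(a, p, n)))   # tends to 1 > 0, so the series converges
\end{verbatim}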
We show posterior convergence for the community structure in the planted bi-section model, for several interesting priors. Examples include priors in which the label on each vertex is i.i.d. Bernoulli distributed with some parameter $r\in(0,1)$; the parameter $r$ may be fixed or equipped with a beta distribution. We impose no constraints on the class sizes, which may range from zero vertices to all vertices and anything in between. This enables us to test between a uniform (Erd\"os-R\'enyi) random graph with no distinguishable community and the planted bi-section model. The exact bounds for posterior convergence enable us to convert credible sets into confidence sets. Symmetric testing with posterior odds is shown to be consistent.
We study the ability of neural networks to steer or control trajectories of continuous-time non-linear dynamical systems on graphs, which we represent with neural ordinary differential equations (neural ODEs). To do so, we introduce a neural-ODE control (NODEC) framework and find that it can learn control signals that drive graph dynamical systems into desired target states. While we use loss functions that do not constrain the control energy, our results show that NODEC produces low energy control signals. Finally, we showcase the performance and versatility of NODEC by using it to control a system of more than one thousand coupled, non-linear ODEs.
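The following is a minimal Python sketch of the underlying idea, not the authors' implementation: a small network parameterizes the control signal, the controlled dynamics are integrated with a simple explicit Euler scheme, and the terminal-state loss is backpropagated through the whole rollout. The toy linear dynamics, network sizes, and hyper-parameters below are illustrative assumptions.
\begin{verbatim}
import torch

# Toy controlled dynamics dx/dt = A x + u(t, x), with u given by a small network.
n = 4
A = -torch.eye(n) + 0.1 * torch.randn(n, n)
x0 = torch.zeros(n)
x_target = torch.ones(n)

control = torch.nn.Sequential(
    torch.nn.Linear(n + 1, 32), torch.nn.Tanh(), torch.nn.Linear(32, n)
)
opt = torch.optim.Adam(control.parameters(), lr=1e-2)

def integrate(x, t_final=1.0, n_steps=50):
    """Explicit Euler rollout of the controlled ODE; autograd tracks every step."""
    dt = t_final / n_steps
    for k in range(n_steps):
        t = torch.tensor([k * dt])
        u = control(torch.cat([x, t]))     # control depends on time and state
        x = x + dt * (A @ x + u)
    return x

for epoch in range(500):
    opt.zero_grad()
    loss = torch.sum((integrate(x0) - x_target) ** 2)  # terminal-state loss only
    loss.backward()
    opt.step()
\end{verbatim}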
We investigate how the final parameters found by stochastic gradient descent are influenced by over-parameterization. We generate families of models by increasing the number of channels in a base network, and then perform a large hyper-parameter search to study how the test error depends on learning rate, batch size, and network width. We find that the optimal SGD hyper-parameters are determined by a "normalized noise scale," which is a function of the batch size, learning rate, and initialization conditions. In the absence of batch normalization, the optimal normalized noise scale is directly proportional to width. Wider networks, with their higher optimal noise scale, also achieve higher test accuracy. These observations hold for MLPs, ConvNets, and ResNets, and for two different parameterization schemes ("Standard" and "NTK"). We observe a similar trend with batch normalization for ResNets. Surprisingly, since the largest stable learning rate is bounded, the largest batch size consistent with the optimal normalized noise scale decreases as the width increases.
We introduce a new family of deep neural network models. Instead of specifying a discrete sequence of hidden layers, we parameterize the derivative of the hidden state using a neural network. The output of the network is computed using a black-box differential equation solver. These continuous-depth models have constant memory cost, adapt their evaluation strategy to each input, and can explicitly trade numerical precision for speed. We demonstrate these properties in continuous-depth residual networks and continuous-time latent variable models. We also construct continuous normalizing flows, a generative model that can train by maximum likelihood, without partitioning or ordering the data dimensions. For training, we show how to scalably backpropagate through any ODE solver, without access to its internal operations. This allows end-to-end training of ODEs within larger models.
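A minimal Python sketch of the continuous-depth idea follows, using a fixed-step Runge-Kutta solver and ordinary autograd rather than the memory-efficient adjoint-based backpropagation described in the paper; all names, dimensions, and hyper-parameters here are illustrative.
\begin{verbatim}
import torch

class ODEFunc(torch.nn.Module):
    """Parameterizes the derivative of the hidden state, dh/dt = f(h, t; theta)."""
    def __init__(self, dim, width=64):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(dim + 1, width), torch.nn.Tanh(),
            torch.nn.Linear(width, dim)
        )

    def forward(self, h, t):
        t_col = torch.full_like(h[:, :1], t)          # append time as an extra input
        return self.net(torch.cat([h, t_col], dim=1))

def odeint_rk4(func, h0, t0=0.0, t1=1.0, n_steps=20):
    """Fixed-step fourth-order Runge-Kutta solver; autograd differentiates through it."""
    dt = (t1 - t0) / n_steps
    h, t = h0, t0
    for _ in range(n_steps):
        k1 = func(h, t)
        k2 = func(h + 0.5 * dt * k1, t + 0.5 * dt)
        k3 = func(h + 0.5 * dt * k2, t + 0.5 * dt)
        k4 = func(h + dt * k3, t + dt)
        h = h + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += dt
    return h

func = ODEFunc(dim=2)
h0 = torch.randn(8, 2, requires_grad=True)   # batch of 8 hidden states
h1 = odeint_rk4(func, h0)                    # continuous-depth feature transform
h1.sum().backward()                          # gradients flow to func's parameters and h0
\end{verbatim}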