In this paper, a class of statistics based on high frequency observations of oscillating and skew Brownian motions is considered. Their convergence rate towards the local time of the underlying process is obtained in form of a Central Limit Theorem. Oscillating and skew Brownian motions are solutions to stochastic differential equations with singular coefficients: piecewise constant diffusion coefficient or drift involving the local time. The result is applied to provide estimators of the parameter of skew Brownian motion and study their asymptotic behavior. Moreover, in the case of the classical statistic given by the normalized number of crossings, the result is proved to hold for a larger class of It\^o-processes with singular coefficients.
We consider pessimistic bilevel stochastic programs in which the follower maximizes over a fixed compact convex set a strictly convex quadratic function, whose Hessian depends on the leader's decision. The resulting random variable is evaluated by a convex risk measure. Under assumptions including real analyticity of the lower-level goal function, we prove existence of optimal solutions. We discuss an alternate model where the leader hedges against optimal lower-level solutions, and show that in this case solvability can be guaranteed under weaker conditions both in a deterministic and in a stochastic setting. The approach is applied to a mechanical shape optimization problem in which the leader decides on an optimal material distribution to minimize a tracking-type cost functional, whereas the follower chooses forces from an admissible set to maximize a compliance objective. The material distribution is considered to be stochastically perturbed in the actual construction phase. Computational results illustrate the bilevel optimization concept and demonstrate the interplay of follower and leader in shape design and testing.
We show that the nonstandard limiting distribution of HAR test statistics under fixed-b asymptotics is not pivotal (even after studentization) when the data are nonstationarity. It takes the form of a complicated function of Gaussian processes and depends on the integrated local long-run variance and on on the second moments of the relevant series (e.g., of the regressors and errors for the case of the linear regression model). Hence, existing fixed-b inference methods based on stationarity are not theoretically valid in general. The nuisance parameters entering the fixed-b limiting distribution can be consistently estimated under small-b asymptotics but only with nonparametric rate of convergence. Hence, We show that the error in rejection probability (ERP) is an order of magnitude larger than that under stationarity and is also larger than that of HAR tests based on HAC estimators under conventional asymptotics. These theoretical results reconcile with recent finite-sample evidence in Casini (2021) and Casini, Deng and Perron (2021) who showing that fixed-b HAR tests can perform poorly when the data are nonstationary. They can be conservative under the null hypothesis and have non-monotonic power under the alternative hypothesis irrespective of how large the sample size is.
We investigate a clustering problem with data from a mixture of Gaussians that share a common but unknown, and potentially ill-conditioned, covariance matrix. We start by considering Gaussian mixtures with two equally-sized components and derive a Max-Cut integer program based on maximum likelihood estimation. We prove its solutions achieve the optimal misclassification rate when the number of samples grows linearly in the dimension, up to a logarithmic factor. However, solving the Max-cut problem appears to be computationally intractable. To overcome this, we develop an efficient spectral algorithm that attains the optimal rate but requires a quadratic sample size. Although this sample complexity is worse than that of the Max-cut problem, we conjecture that no polynomial-time method can perform better. Furthermore, we gather numerical and theoretical evidence that supports the existence of a statistical-computational gap. Finally, we generalize the Max-Cut program to a $k$-means program that handles multi-component mixtures with possibly unequal weights. It enjoys similar optimality guarantees for mixtures of distributions that satisfy a transportation-cost inequality, encompassing Gaussian and strongly log-concave distributions.
Finding the optimal configuration of parameters in ResNet is a nonconvex minimization problem, but first-order methods nevertheless find the global optimum in the overparameterized regime. We study this phenomenon with mean-field analysis, by translating the training process of ResNet to a gradient-flow partial differential equation (PDE) and examining the convergence properties of this limiting process. The activation function is assumed to be $2$-homogeneous or partially $1$-homogeneous; the regularized ReLU satisfies the latter condition. We show that if the ResNet is sufficiently large, with depth and width depending algebraically on the accuracy and confidence levels, first-order optimization methods can find global minimizers that fit the training data.
In this work, we study the Neural Tangent Kernel (NTK) of Matrix Product States (MPS) and the convergence of its NTK in the infinite bond dimensional limit. We prove that the NTK of MPS asymptotically converges to a constant matrix during the gradient descent (training) process (and also the initialization phase) as the bond dimensions of MPS go to infinity by the observation that the variation of the tensors in MPS asymptotically goes to zero during training in the infinite limit. By showing the positive-definiteness of the NTK of MPS, the convergence of MPS during the training in the function space (space of functions represented by MPS) is guaranteed without any extra assumptions of the data set. We then consider the settings of (supervised) Regression with Mean Square Error (RMSE) and (unsupervised) Born Machines (BM) and analyze their dynamics in the infinite bond dimensional limit. The ordinary differential equations (ODEs) which describe the dynamics of the responses of MPS in the RMSE and BM are derived and solved in the closed-form. For the Regression, we consider Mercer Kernels (Gaussian Kernels) and find that the evolution of the mean of the responses of MPS follows the largest eigenvalue of the NTK. Due to the orthogonality of the kernel functions in BM, the evolution of different modes (samples) decouples and the "characteristic time" of convergence in training is obtained.
We consider a biochemical model that consists of a system of partial differential equations based on reaction terms and subject to non--homogeneous Dirichlet boundary conditions. The model is discretised using the gradient discretisation method (GDM) which is a framework covering a large class of conforming and non conforming schemes. Under classical regularity assumptions on the exact solutions, the GDM enables us to establish the existence of the model solutions in a weak sense, and strong convergence for the approximate solution and its approximate gradient. Numerical test employing a finite volume method is presented to demonstrate the behaviour of the solutions to the model.
Overfitting data is a well-known phenomenon related with the generation of a model that mimics too closely (or exactly) a particular instance of data, and may therefore fail to predict future observations reliably. In practice, this behaviour is controlled by various--sometimes heuristics--regularization techniques, which are motivated by developing upper bounds to the generalization error. In this work, we study the generalization error of classifiers relying on stochastic encodings trained on the cross-entropy loss, which is often used in deep learning for classification problems. We derive bounds to the generalization error showing that there exists a regime where the generalization error is bounded by the mutual information between input features and the corresponding representations in the latent space, which are randomly generated according to the encoding distribution. Our bounds provide an information-theoretic understanding of generalization in the so-called class of variational classifiers, which are regularized by a Kullback-Leibler (KL) divergence term. These results give theoretical grounds for the highly popular KL term in variational inference methods that was already recognized to act effectively as a regularization penalty. We further observe connections with well studied notions such as Variational Autoencoders, Information Dropout, Information Bottleneck and Boltzmann Machines. Finally, we perform numerical experiments on MNIST and CIFAR datasets and show that mutual information is indeed highly representative of the behaviour of the generalization error.
In this paper, we consider a boundary value problem (BVP) for a fourth order nonlinear functional integro-differential equation. We establish the existence and uniqueness of solution and construct a numerical method for solving it. We prove that the method is of second order accuracy and obtain an estimate for the total error. Some examples demonstrate the validity of the obtained theoretical results and the efficiency of the numerical method.
The estimation of information measures of continuous distributions based on samples is a fundamental problem in statistics and machine learning. In this paper, we analyze estimates of differential entropy in $K$-dimensional Euclidean space, computed from a finite number of samples, when the probability density function belongs to a predetermined convex family $\mathcal{P}$. First, estimating differential entropy to any accuracy is shown to be infeasible if the differential entropy of densities in $\mathcal{P}$ is unbounded, clearly showing the necessity of additional assumptions. Subsequently, we investigate sufficient conditions that enable confidence bounds for the estimation of differential entropy. In particular, we provide confidence bounds for simple histogram based estimation of differential entropy from a fixed number of samples, assuming that the probability density function is Lipschitz continuous with known Lipschitz constant and known, bounded support. Our focus is on differential entropy, but we provide examples that show that similar results hold for mutual information and relative entropy as well.
This work focuses on mitigating two limitations in the joint learning of local feature detectors and descriptors. First, the ability to estimate the local shape (scale, orientation, etc.) of feature points is often neglected during dense feature extraction, while the shape-awareness is crucial to acquire stronger geometric invariance. Second, the localization accuracy of detected keypoints is not sufficient to reliably recover camera geometry, which has become the bottleneck in tasks such as 3D reconstruction. In this paper, we present ASLFeat, with three light-weight yet effective modifications to mitigate above issues. First, we resort to deformable convolutional networks to densely estimate and apply local transformation. Second, we take advantage of the inherent feature hierarchy to restore spatial resolution and low-level details for accurate keypoint localization. Finally, we use a peakiness measurement to relate feature responses and derive more indicative detection scores. The effect of each modification is thoroughly studied, and the evaluation is extensively conducted across a variety of practical scenarios. State-of-the-art results are reported that demonstrate the superiority of our methods.