This work considers Gaussian process interpolation with a periodized version of the Mat{\'e}rn covariance function (Stein, 1999, Section 6.7) with Fourier coefficients $\phi(\alpha^2 + j^2)^{-\nu - 1/2}$. Convergence rates are studied for the joint maximum likelihood estimation of $\nu$ and $\phi$ when the data is sampled according to the model. The mean integrated squared error is also analyzed with fixed and estimated parameters, showing that maximum likelihood estimation yields asymptotically the same error as if the ground truth were known. Finally, the case where the observed function is a ``deterministic'' element of a continuous Sobolev space is also considered, suggesting that bounding assumptions on some parameters can lead to different estimates.
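For intuition, a covariance with these Fourier coefficients can be evaluated numerically by truncating its Fourier series. Below is a minimal sketch (our own illustration, assuming period 1 and a truncation point `n_terms`, neither of which is specified by the abstract):

```python
import numpy as np

def periodized_matern_cov(x, y, phi, alpha, nu, n_terms=500):
    """Covariance k(x, y) of a stationary periodic GP on [0, 1) whose
    Fourier coefficients are phi * (alpha^2 + j^2)^(-nu - 1/2).

    A minimal sketch: truncating at |j| <= n_terms is accurate because
    the coefficients decay polynomially in j, at rate 2*nu + 1.
    """
    j = np.arange(1, n_terms + 1)
    coeff = phi * (alpha**2 + j**2) ** (-nu - 0.5)
    # j = 0 term plus the symmetric +/-j pairs (a cosine series).
    k0 = phi * alpha ** (-2 * nu - 1)
    return k0 + 2.0 * np.sum(coeff * np.cos(2 * np.pi * j * (x - y)))

# Example: covariance at lag 0.1 for nu = 3/2.
print(periodized_matern_cov(0.3, 0.2, phi=1.0, alpha=1.0, nu=1.5))
```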
In this paper, we exploit a known result in point process theory: the expected value of the $K$-function weighted by the true first-order intensity function. This theoretical result can serve as an estimation method for obtaining the parameter estimates of a model assumed for the data. The motivation is to avoid dealing with the intractable likelihoods of some complex point process models and their maximization. The advantage is most evident when considering local second-order characteristics, since the proposed method can estimate the vector of local parameters, one for each point of the analysed point pattern. We illustrate the method through simulation studies for both purely spatial and spatio-temporal point processes.
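As a point of reference, the intensity-weighted $K$-function at the heart of this approach can be estimated as follows (a minimal sketch of the standard estimator without edge correction; the function names and the simplifications are ours):

```python
import numpy as np

def weighted_k_function(points, intensity, r, area):
    """Intensity-weighted (inhomogeneous) K-function estimator.

    A minimal sketch without edge correction: points is an (n, 2) array,
    intensity maps a point to its first-order intensity lambda(x), r is a
    vector of distances, and area is |W|, the observation window's area.
    Weighting pairs by 1 / (lambda(x_i) * lambda(x_j)) is what makes the
    estimator's expectation known under the true intensity.
    """
    lam = np.array([intensity(p) for p in points])
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    w = 1.0 / np.outer(lam, lam)
    np.fill_diagonal(w, 0.0)  # exclude i == j pairs
    return np.array([(w * (d <= ri)).sum() for ri in r]) / area
```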
This paper introduces a novel algorithm, the Perturbed Proximal Preconditioned SPIDER algorithm (3P-SPIDER), designed to solve finite-sum non-convex composite optimization problems. It is a stochastic Variable Metric Forward-Backward algorithm: it allows an approximate preconditioned forward operator, uses a variable metric proximity operator as the backward operator, and employs a mini-batch strategy with variance reduction to address the finite-sum setting. We show that 3P-SPIDER extends some stochastic preconditioned gradient descent algorithms and some Incremental Expectation Maximization algorithms to composite optimization and to the case where the forward operator cannot be computed in closed form. We also provide an explicit control of the convergence in expectation of 3P-SPIDER, and study its complexity to reach an $\epsilon$-approximate stationary point. Our results are the first to combine the composite non-convex optimization setting, a mini-batch variance reduction technique for the finite-sum structure, and deterministic or random approximations of the preconditioned forward operator. Finally, through an application to inference in a logistic regression model with random effects, we numerically compare 3P-SPIDER to other stochastic forward-backward algorithms and discuss the role of some of its design parameters.
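To illustrate the two main ingredients, here is a schematic of a SPIDER-style variance-reduced estimator combined with a preconditioned forward-backward step (our own sketch, not the paper's exact recursion; `grads`, `precond`, and `prox` are placeholders):

```python
import numpy as np

def spider_estimator(grads, theta, theta_prev, v_prev, batch):
    """SPIDER-style variance-reduced estimate of a finite-sum forward map.

    grads(i, theta) returns the i-th component map at theta. The recursion
        v = v_prev + mean_{i in batch}[grads(i, theta) - grads(i, theta_prev)]
    keeps the variance of v proportional to ||theta - theta_prev||^2.
    """
    diff = np.mean([grads(i, theta) - grads(i, theta_prev) for i in batch], axis=0)
    return v_prev + diff

def preconditioned_fb_step(theta, v, precond, prox, step):
    """One variable-metric forward-backward step: a preconditioned forward
    move followed by a backward (proximal) step. In the actual algorithm the
    proximity operator is taken in the metric induced by the preconditioner;
    here prox is an abstract callable."""
    return prox(theta - step * precond @ v, step)
```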
Aligning a robot's trajectory or map to the inertial frame is a critical capability that is often difficult to achieve accurately, even though inertial measurement units (IMUs) can observe absolute roll and pitch with respect to gravity. Accelerometer biases and scale factor errors from the IMU's initial calibration are often the major source of inaccuracies when aligning the robot's odometry frame with the inertial frame, especially for low-grade IMUs. Practically, one would simultaneously estimate the true gravity vector, accelerometer biases, and scale factor to improve measurement quality, but these quantities are not observable unless the IMU is sufficiently excited. While several methods estimate accelerometer bias and gravity, they do not explicitly address the observability issue, nor do they estimate scale factor. We present a fixed-lag factor-graph-based estimator to address both of these issues. In addition to estimating accelerometer scale factor, our method mitigates limited observability by optimizing over a time window an order of magnitude larger than existing methods, with significantly lower computational burden. The proposed method, which estimates accelerometer intrinsics and gravity separately from the other states, is enabled by a novel, velocity-agnostic measurement model for intrinsics and gravity, as well as a new method for gravity vector optimization on $S^2$. Accurate IMU state prediction, gravity alignment, and roll/pitch drift correction are experimentally demonstrated on public and self-collected datasets in diverse environments.
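The general idea of optimizing a direction on $S^2$ can be sketched with a 2-DoF tangent-plane perturbation followed by a retraction (our own illustration of on-manifold updates, not the paper's specific parameterization):

```python
import numpy as np

def perturb_on_sphere(g, delta):
    """Update a unit gravity vector g in S^2 with a 2-DoF tangent-plane
    perturbation delta (a length-2 vector), then re-normalize.

    A minimal sketch: the basis of the tangent plane at g is one arbitrary
    but valid choice among many.
    """
    # Build an orthonormal basis (b1, b2) of the tangent plane at g.
    a = np.array([1.0, 0.0, 0.0]) if abs(g[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    b1 = np.cross(g, a); b1 /= np.linalg.norm(b1)
    b2 = np.cross(g, b1)
    g_new = g + delta[0] * b1 + delta[1] * b2  # move in the tangent plane
    return g_new / np.linalg.norm(g_new)       # retract back onto S^2
```

Parameterizing the update with only two degrees of freedom avoids the rank deficiency that arises when a unit vector is optimized directly in $\mathbb{R}^3$.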
This paper considers a multiple-environments linear regression model in which data from multiple experimental settings are collected. The joint distribution of the response variable and covariates may vary across environments, yet the conditional expectation of $y$ given the unknown set of important variables is invariant across environments. Such a statistical model is related to the problems of endogeneity, causal inference, and transfer learning. The motivation behind it is the twin goals of prediction and attribution, which are inherent in estimating the true parameter and the set of important variables. We construct a novel {\it environment invariant linear least squares (EILLS)} objective function, a multiple-environment version of linear least squares that leverages the above conditional expectation invariance structure and the heterogeneity among different environments to determine the true parameter. Our proposed method is applicable without any additional structural knowledge and can identify the true parameter under a near-minimal identification condition. We establish non-asymptotic $\ell_2$ error bounds on the estimation error for the EILLS estimator in the presence of spurious variables. Moreover, we further show that the EILLS estimator is able to eliminate all endogenous variables and that the $\ell_0$-penalized EILLS estimator can achieve variable selection consistency in high-dimensional regimes. These non-asymptotic results demonstrate the sample efficiency of the EILLS estimator and its capability to circumvent the curse of endogeneity in an algorithmic manner, without any prior structural knowledge.
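A hedged sketch, in the spirit of a multi-environment invariance-regularized least squares (this is our own schematic to convey the idea, not the paper's exact EILLS functional):

```python
import numpy as np

def multi_env_invariance_objective(beta, envs, gamma):
    """Pooled squared error plus a penalty, weighted by gamma, on the
    covariance between each environment's residuals and the covariates;
    this covariance vanishes when the conditional-expectation invariance
    holds, so heterogeneous environments penalize spurious coefficients.

    envs is a list of (X, y) pairs, one per environment.
    """
    fit, invariance = 0.0, 0.0
    for X, y in envs:
        r = y - X @ beta                             # residuals in this environment
        fit += np.mean(r**2)
        invariance += np.sum((X.T @ r / len(y))**2)  # squared residual-covariate covariances
    return fit + gamma * invariance
```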
This paper studies model checking for general parametric regression models with no dimension reduction structure on the high-dimensional vector of predictors. Using an existing test as an initial test, this paper combines the sample-splitting technique and a conditional studentization approach to construct a COnditionally Studentized Test (COST). Unlike existing tests, the proposed test always has a normal weak limit under the null hypothesis, whether the initial test is global or local smoothing-based, and whether the dimension of the predictor vector and the number of parameters are fixed or diverge at a certain rate as the sample size goes to infinity. Further, the test can detect local alternatives distinct from the null hypothesis at the fastest possible rate of convergence in hypothesis testing. We also discuss the optimal sample splitting in terms of power performance. The numerical studies offer information on its merits and limitations in finite-sample cases. As a generic methodology, it could be applied to other testing problems.
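The sample-splitting plus studentization idea can be conveyed with a generic schematic (our own sketch under strong simplifications, not the paper's construction; `stat_fn` is a placeholder for the per-observation contribution of an initial test, computed conditionally on the other half of the data):

```python
import numpy as np
from scipy import stats

def split_studentized_test(stat_fn, data, alpha=0.05, split_frac=0.5, seed=0):
    """Split the data, evaluate per-observation contributions on one half
    given parameters fit on the other, then studentize so the statistic is
    asymptotically standard normal under the null. data is a numpy array of
    observations; stat_fn(train, x) returns the contribution of x.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(data))
    cut = int(split_frac * len(data))
    train, test = data[idx[:cut]], data[idx[cut:]]
    contrib = np.array([stat_fn(train, x) for x in test])
    t = np.sqrt(len(contrib)) * contrib.mean() / contrib.std(ddof=1)
    return t, t > stats.norm.ppf(1 - alpha)  # one-sided normal critical value
```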
We consider box-constrained integer programs with objective $g(Wx) + c^T x$, where $g$ is a ``complicated'' function with an $m$-dimensional domain. Here we assume we have $n \gg m$ variables and that $W \in \mathbb Z^{m \times n}$ is an integer matrix with coefficients of absolute value at most $\Delta$. We design an algorithm for this problem using only the mild assumption that the objective can be optimized efficiently when all but $m$ variables are fixed, yielding a running time of $n^m(m \Delta)^{O(m^2)}$. Moreover, we can avoid the term $n^m$ in several special cases, in particular when $c = 0$. Our approach can be applied in a variety of settings, generalizing several recent results. An important application is convex objectives of low domain dimension, where our result implies a recent result by Hunkenschr\"oder et al. [SIOPT'22] for the 0-1-hypercube and sharp or separable convex $g$, assuming $W$ is given explicitly. By avoiding the direct use of proximity results, which only hold when $g$ is separable or sharp, we match their running time and generalize it to arbitrary convex functions. In the case where the objective is only accessible through an oracle and $W$ is unknown, we further show that their proximity framework can be implemented in $n (m \Delta)^{O(m^2)}$ time instead of $n (m \Delta)^{O(m^3)}$. Lastly, we extend the result by Eisenbrand and Weismantel [SODA'17, TALG'20] for integer programs with few constraints to a mixed-integer linear programming setting where integer variables appear in only a small number of different constraints.
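For concreteness, the problem class can be written as the following box-constrained integer program (the box bounds $l, u \in \mathbb Z^n$ are implicit in the abstract):
\[
\min_{x \in \mathbb{Z}^n} \; g(Wx) + c^{\top} x
\quad \text{s.t.} \quad l \le x \le u,
\qquad W \in \mathbb{Z}^{m \times n}, \; \|W\|_\infty \le \Delta,
\]
where $g \colon \mathbb{R}^m \to \mathbb{R}$ is the ``complicated'' function of low domain dimension $m \ll n$.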
Hyper-parameter optimization is one of the most tedious yet crucial steps in training machine learning models. There are numerous methods for this vital model-building stage, ranging from domain-specific manual tuning guidelines suggested by experts to general-purpose black-box optimization techniques. This paper proposes an agent-based collaborative technique for finding near-optimal values for any arbitrary set of hyper-parameters (or decision variables) in a machine learning model (or, more generally, a function optimization problem). The method forms a hierarchical agent-based architecture that distributes the search operations across dimensions and employs a cooperative search procedure based on an adaptive width-based random sampling technique to locate the optima, as sketched below. The behavior of the model, in particular its sensitivity to its design parameters, is investigated in both machine learning and global function optimization applications, and its performance is compared with that of two randomized tuning strategies commonly used in practice. According to the empirical results, the proposed model outperforms the compared methods on the tested classification, regression, and multi-dimensional function optimization tasks, notably in higher dimensions and in the presence of limited on-device computational resources.
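A minimal single-agent sketch of adaptive width-based random sampling (the shrink schedule and acceptance rule below are our assumptions, not the paper's; the actual method coordinates many such agents hierarchically across dimensions):

```python
import numpy as np

def adaptive_width_search(f, lo, hi, n_iters=200, shrink=0.9, seed=0):
    """Each dimension keeps a sampling width around the incumbent point;
    widths shrink after failed proposals, mimicking the cooperative
    narrowing of the search performed per coordinate by the agents."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(lo, hi)               # incumbent point
    width = (hi - lo) / 2.0               # one width per dimension
    best = f(x)
    for _ in range(n_iters):
        cand = np.clip(x + rng.uniform(-width, width), lo, hi)
        y = f(cand)
        if y < best:
            x, best = cand, y             # accept; keep current widths
        else:
            width *= shrink               # narrow the search on failure
    return x, best

# Example: minimize a 5-dimensional sphere function.
x, v = adaptive_width_search(lambda z: np.sum(z**2),
                             lo=np.full(5, -5.0), hi=np.full(5, 5.0))
```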
Motivated by a variety of applications, high-dimensional time series have become an active topic of research. In particular, several methods and finite-sample theories for individual stable autoregressive processes with known lag have become available very recently. We, instead, consider multiple stable autoregressive processes that share an unknown lag. We use information across the different processes to simultaneously select the lag and estimate the parameters. We prove that the estimated process is stable, and we establish rates for the forecasting error that can outmatch the known rates in our setting. Our insights on lag selection and stability are also of interest for the case of individual autoregressive processes.
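The pooling idea can be illustrated with a simple joint lag selector (a hedged sketch; the information criterion below is our choice, not the paper's procedure):

```python
import numpy as np

def select_shared_lag(series_list, max_lag, penalty=2.0):
    """For each candidate lag, fit every series by least squares and pool
    the residual sums of squares across processes; the penalty term
    discourages overly large lags. Pooling sharpens lag selection relative
    to treating each series individually."""
    best_lag, best_score = None, np.inf
    n_total = sum(len(y) for y in series_list)
    for lag in range(1, max_lag + 1):
        rss = 0.0
        for y in series_list:
            # Regressors y_{t-1}, ..., y_{t-lag} for t = lag, ..., len(y)-1.
            X = np.column_stack([y[lag - k - 1 : len(y) - k - 1] for k in range(lag)])
            target = y[lag:]
            coef, *_ = np.linalg.lstsq(X, target, rcond=None)
            rss += np.sum((target - X @ coef) ** 2)
        score = n_total * np.log(rss / n_total) + penalty * lag * len(series_list)
        if score < best_score:
            best_lag, best_score = lag, score
    return best_lag
```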
We present the first calibration of quantum decision theory (QDT) to a dataset of binary risky choices. We quantitatively account for the fraction of choice reversals between two repetitions of the experiment, using a probabilistic choice formulation in its simplest form, without model assumptions or adjustable parameters. The prediction of choice reversal is then refined by introducing heterogeneity between decision makers through their differentiation into two groups: ``majoritarian'' and ``contrarian'' (in proportion 3:1). This supports the first fundamental tenet of QDT, which models choice as an inherently probabilistic process, where the probability of a prospect can be expressed as the sum of its utility and attraction factors. We propose to parameterise the utility factor with a stochastic version of cumulative prospect theory (logit-CPT), and the attraction factor with a constant absolute risk aversion (CARA) function. For this dataset, and penalising the larger number of QDT parameters via the Wilks test of nested hypotheses, the QDT model is found to perform significantly better than logit-CPT at both the aggregate and individual levels, and for all considered fit criteria, both for the first experiment iteration and for predictions (second ``out-of-sample'' iteration). The distinctive QDT effect captured by the attraction factor is most appreciable (i.e., most relevant and strongest in amplitude) for prospects with large losses. Our quantitative analysis of the experimental results supports the existence of an intrinsic limit of predictability, which is associated with the inherently probabilistic nature of choice. The results of the paper can find applications both in predicting the choices of human decision makers and in organizing the operation of artificial intelligence.
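Schematically, the tenet amounts to the decomposition (our rendering with generic symbols $f$ and $q$, which the abstract does not fix; the normalizations are the standard QDT conventions):
\[
p(\pi_j) = f(\pi_j) + q(\pi_j), \qquad
\sum_j f(\pi_j) = 1, \qquad \sum_j q(\pi_j) = 0,
\]
where $f$ is the utility factor (here parameterised by logit-CPT) and $q$ the attraction factor (here a CARA function).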
Given univariate random variables $Y_1, \ldots, Y_n$ with the $\text{Uniform}(\theta_0 - 1, \theta_0 + 1)$ distribution, the sample midrange $\frac{Y_{(n)}+Y_{(1)}}{2}$ is the MLE for $\theta_0$ and estimates $\theta_0$ with error of order $1/n$, which is much smaller than the $1/\sqrt{n}$ error rate of the usual sample mean estimator. However, the sample midrange performs poorly when the data instead have, say, the Gaussian $N(\theta_0, 1)$ distribution, attaining an error rate of only $1/\sqrt{\log n}$. In this paper, we propose an estimator of the location $\theta_0$ with a rate of convergence that can, in many settings, adapt to the underlying distribution, which we assume to be symmetric around $\theta_0$ but otherwise unknown. When the underlying distribution is compactly supported, we show that our estimator attains a rate of convergence of $n^{-\frac{1}{\alpha}}$ up to polylog factors, where the rate parameter $\alpha$ can take on any value in $(0, 2]$ and depends on the moments of the underlying distribution. Our estimator is formed by the $\ell^\gamma$-center of the data, for a $\gamma \geq 2$ chosen in a data-driven way -- by minimizing a criterion motivated by the asymptotic variance. Our approach can be directly applied to the regression setting where $\theta_0$ is a function of observed features, and it motivates the use of the $\ell^\gamma$ loss function for $\gamma > 2$ in certain settings.
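A minimal sketch of the $\ell^\gamma$-center with a data-driven $\gamma$ (the candidate grid and the plug-in sandwich variance below are our simplifications, not the paper's exact criterion):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def lgamma_center(y, gamma):
    """The l^gamma-center: the theta minimizing sum_i |y_i - theta|^gamma."""
    res = minimize_scalar(lambda t: np.sum(np.abs(y - t) ** gamma),
                          bounds=(y.min(), y.max()), method='bounded')
    return res.x

def adaptive_center(y, gammas=(2.0, 4.0, 8.0, 16.0)):
    """Pick the gamma whose l^gamma-center has the smallest estimated
    asymptotic variance, estimated by the M-estimator sandwich formula
    E[psi^2] / E[psi']^2 for the loss |y - t|^gamma."""
    best_gamma, best_var = None, np.inf
    for g in gammas:
        t = lgamma_center(y, g)
        r = y - t
        psi = g * np.abs(r) ** (g - 1) * np.sign(r)
        dpsi = g * (g - 1) * np.abs(r) ** (g - 2)
        var = np.mean(psi**2) / np.mean(dpsi) ** 2  # sandwich variance proxy
        if var < best_var:
            best_gamma, best_var = g, var
    return lgamma_center(y, best_gamma), best_gamma
```

For the uniform example above, larger $\gamma$ pushes the center toward the midrange, while for Gaussian data $\gamma = 2$ (the sample mean) is selected, which is the adaptivity the paper formalizes.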