Change point estimation is often formulated as a search for the maximum of a gain function describing improved fits when segmenting the data. Searching through all candidates requires $O(n)$ evaluations of the gain function for an interval with $n$ observations. If each evaluation is computationally demanding (e.g. in high-dimensional models), this can become infeasible. Instead, we propose optimistic search methods with $O(\log n)$ evaluations exploiting specific structure of the gain function. Towards solid understanding of our strategy, we investigate in detail the $p$-dimensional Gaussian changing means setup, including high-dimensional scenarios. For some of our proposals, we prove asymptotic minimax optimality for detecting change points and derive their asymptotic localization rate. These rates (up to a possible log factor) are optimal for the univariate and multivariate scenarios, and are by far the fastest in the literature under the weakest possible detection condition on the signal-to-noise ratio in the high-dimensional scenario. Computationally, our proposed methodology has the worst case complexity of $O(np)$, which can be improved to be sublinear in $n$ if some a-priori knowledge on the length of the shortest segment is available. Our search strategies generalize far beyond the theoretically analyzed setup. We illustrate, as an example, massive computational speedup in change point detection for high-dimensional Gaussian graphical models.
This letter concerns optimal power transmission line inspection formulated as a proposed generalization of the traveling salesman problem for a multi-route one-depot scenario. The problem is formulated for an inspection vehicle with a limited travel budget. Therefore, the solution can be composed of multiple runs to provide full coverage of the given power lines. Besides, the solution indicates how many vehicles can perform the inspection in a single run. The optimal solution of the problem is solved by the proposed Integer Linear Programming (ILP) formulation, which is, however, very computationally demanding. Therefore, the computational requirements are addressed by the combinatorial metaheuristic. The employed greedy randomized adaptive search procedure is significantly less demanding while providing competitive solutions and scales better with the problem size than the ILP-based approach. The proposed formulation and algorithms are demonstrated in a real-world scenario to inspect power line segments at the electrical substation.
Maximum likelihood estimation in logistic regression with mixed effects is known to often result in estimates on the boundary of the parameter space. Such estimates, which include infinite values for fixed effects and singular or infinite variance components, can cause havoc to numerical estimation procedures and inference. We introduce an appropriately scaled additive penalty to the log-likelihood function, or an approximation thereof, which penalizes the fixed effects by the Jeffreys' invariant prior for the model with no random effects and the variance components by a composition of negative Huber loss functions. The resulting maximum penalized likelihood estimates are shown to lie in the interior of the parameter space. Appropriate scaling of the penalty guarantees that the penalization is soft enough to preserve the optimal asymptotic properties expected by the maximum likelihood estimator, namely consistency, asymptotic normality, and Cram\'er-Rao efficiency. Our choice of penalties and scaling factor preserves equivariance of the fixed effects estimates under linear transformation of the model parameters, such as contrasts. Maximum softly-penalized likelihood is compared to competing approaches on two real-data examples, and through comprehensive simulation studies that illustrate its superior finite sample performance.
Probabilistic sensitivity analysis identifies the influential uncertain input to guide decision-making. We propose a general sensitivity framework with respect to the input distribution parameters that unifies a wide range of sensitivity measures, including information theoretical metrics such as the Fisher information. The framework is derived analytically via a constrained maximization and the sensitivity analysis is reformulated into an eigenvalue problem. There are only two main steps to implement the sensitivity framework utilising the likelihood ratio/score function method, a Monte Carlo type sampling followed by solving an eigenvalue equation. The resulting eigenvectors then provide the directions for simultaneous variations of the input parameters and guide the focus to perturb uncertainty the most. Not only is it conceptually simple, but numerical examples demonstrate that the proposed framework also provides new sensitivity insights, such as the combined sensitivity of multiple correlated uncertainty metrics, robust sensitivity analysis with an entropic constraint, and approximation of deterministic sensitivities. Three different examples, ranging from a simple cantilever beam to an offshore marine riser, are used to demonstrate the potential applications of the proposed sensitivity framework to applied mechanics problems.
In this paper we consider change-points in multiple sequences with the objective of minimizing the estimation error of a sequence by making use of information from other sequences. This is in contrast to recent interest on change-points in multiple sequences where the focus is on detection of common change-points. We start with the canonical case of a single sequence with constant change-point intensities. We consider two measures of a change-point algorithm. The first is the probability of estimating the change-point with no error. The second is the expected distance between the true and estimated change-points. We provide a theoretical upper bound for the no error probability, and a lower bound for the expected distance, that must be satisfied by all algorithms. We propose a scan-CUSUM algorithm that achieves the no error upper bound and come close to the distance lower bound. We next consider the case of non-constant intensities and establish sharp conditions under which estimation error can go to zero. We propose an extension of the scan-CUSUM algorithm for a non-constant intensity function, and show that it achieves asymptotically zero error at the boundary of the zero-error regime. We illustrate an application of the scan-CUSUM algorithm on multiple sequences sharing an unknown, non-constant intensity function. We estimate the intensity function from the change-point profile likelihoods of all sequences and apply scan-CUSUM on the estimated intensity function.
The problem of generalization and transportation of treatment effect estimates from a study sample to a target population is central to empirical research and statistical methodology. In both randomized experiments and observational studies, weighting methods are often used with this objective. Traditional methods construct the weights by separately modeling the treatment assignment and study selection probabilities and then multiplying functions (e.g., inverses) of their estimates. In this work, we provide a justification and an implementation for weighting in a single step. We show a formal connection between this one-step method and inverse probability and inverse odds weighting. We demonstrate that the resulting estimator for the target average treatment effect is consistent, asymptotically Normal, multiply robust, and semiparametrically efficient. We evaluate the performance of the one-step estimator in a simulation study. We illustrate its use in a case study on the effects of physician racial diversity on preventive healthcare utilization among Black men in California. We provide R code implementing the methodology.
Sparseness and robustness are two important properties for many machine learning scenarios. In the present study, regarding the maximum correntropy criterion (MCC) based robust regression algorithm, we investigate to integrate the MCC method with the automatic relevance determination (ARD) technique in a Bayesian framework, so that MCC-based robust regression could be implemented with adaptive sparseness. To be specific, we use an inherent noise assumption from the MCC to derive an explicit likelihood function, and realize the maximum a posteriori (MAP) estimation with the ARD prior by variational Bayesian inference. Compared to the existing robust and sparse L1-regularized MCC regression, the proposed MCC-ARD regression can eradicate the troublesome tuning for the regularization hyper-parameter which controls the regularization strength. Further, MCC-ARD achieves superior prediction performance and feature selection capability than L1-regularized MCC, as demonstrated by a noisy and high-dimensional simulation study.
We propose a new stochastic primal-dual optimization algorithm for planning in a large discounted Markov decision process with a generative model and linear function approximation. Assuming that the feature map approximately satisfies standard realizability and Bellman-closedness conditions and also that the feature vectors of all state-action pairs are representable as convex combinations of a small core set of state-action pairs, we show that our method outputs a near-optimal policy after a polynomial number of queries to the generative model. Our method is computationally efficient and comes with the major advantage that it outputs a single softmax policy that is compactly represented by a low-dimensional parameter vector, and does not need to execute computationally expensive local planning subroutines in runtime.
Gradient-based methods for value estimation in reinforcement learning have favorable stability properties, but they are typically much slower than Temporal Difference (TD) learning methods. We study the root causes of this slowness and show that Mean Square Bellman Error (MSBE) is an ill-conditioned loss function in the sense that its Hessian has large condition-number. To resolve the adverse effect of poor conditioning of MSBE on gradient based methods, we propose a low complexity batch-free proximal method that approximately follows the Gauss-Newton direction and is asymptotically robust to parameterization. Our main algorithm, called RANS, is efficient in the sense that it is significantly faster than the residual gradient methods while having almost the same computational complexity, and is competitive with TD on the classic problems that we tested.
Computational models have become a powerful tool in the quantitative sciences to understand the behaviour of complex systems that evolve in time. However, they often contain a potentially large number of free parameters whose values cannot be obtained from theory but need to be inferred from data. This is especially the case for models in the social sciences, economics, or computational epidemiology. Yet many current parameter estimation methods are mathematically involved and computationally slow to run. In this paper we present a computationally simple and fast method to retrieve accurate probability densities for model parameters using neural differential equations. We present a pipeline comprising multi-agent models acting as forward solvers for systems of ordinary or stochastic differential equations, and a neural network to then extract parameters from the data generated by the model. The two combined create a powerful tool that can quickly estimate densities on model parameters, even for very large systems. We demonstrate the method on synthetic time series data of the SIR model of the spread of infection, and perform an in-depth analysis of the Harris-Wilson model of economic activity on a network, representing a non-convex problem. For the latter, we apply our method both to synthetic data and to data of economic activity across Greater London. We find that our method calibrates the model orders of magnitude more accurately than a previous study of the same dataset using classical techniques, while running between 195 and 390 times faster.
Concept shift is a prevailing problem in natural tasks like medical image segmentation where samples usually come from different subpopulations with variant correlations between features and labels. One common type of concept shift in medical image segmentation is the "information imbalance" between label-sparse samples with few (if any) segmentation labels and label-dense samples with plentiful labeled pixels. Existing distributionally robust algorithms have focused on adaptively truncating/down-weighting the "less informative" (i.e., label-sparse in our context) samples. To exploit data features of label-sparse samples more efficiently, we propose an adaptively weighted online optimization algorithm -- AdaWAC -- to incorporate data augmentation consistency regularization in sample reweighting. Our method introduces a set of trainable weights to balance the supervised loss and unsupervised consistency regularization of each sample separately. At the saddle point of the underlying objective, the weights assign label-dense samples to the supervised loss and label-sparse samples to the unsupervised consistency regularization. We provide a convergence guarantee by recasting the optimization as online mirror descent on a saddle point problem. Our empirical results demonstrate that AdaWAC not only enhances the segmentation performance and sample efficiency but also improves the robustness to concept shift on various medical image segmentation tasks with different UNet-style backbones.