The logistic regression model is one of the most popular data generation models for noisy binary classification problems. In this work, we study the sample complexity of estimating the parameters of the logistic regression model up to a given $\ell_2$ error, in terms of the dimension and the inverse temperature, with standard normal covariates. The inverse temperature controls the signal-to-noise ratio of the data generation process. While both generalization bounds and the asymptotic performance of the maximum-likelihood estimator for logistic regression are well studied, a non-asymptotic sample complexity that captures the dependence on the error and the inverse temperature for parameter estimation is absent from previous analyses. We show that the sample complexity curve has two change points in terms of the inverse temperature, clearly separating the low, moderate, and high temperature regimes.
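For concreteness, the data generation process described above can be sketched in a few lines. This is a minimal illustration under our reading of the abstract (the names `theta`, `beta`, and `generate_logistic_data` are ours, not the paper's): labels follow $\mathbb{P}(y=1\mid x) = \sigma(\beta\langle\theta, x\rangle)$ with standard normal covariates, and a larger inverse temperature $\beta$ means a higher signal-to-noise ratio.

```python
import numpy as np

def generate_logistic_data(theta, beta, n, rng=None):
    """Sample n covariate/label pairs from a logistic regression model.

    Covariates are standard normal; labels are Bernoulli with
    P(y = 1 | x) = sigmoid(beta * <theta, x>), where beta is the
    inverse temperature controlling the signal-to-noise ratio.
    """
    rng = np.random.default_rng(rng)
    theta = np.asarray(theta, dtype=float)
    X = rng.standard_normal((n, theta.shape[0]))   # standard normal covariates
    p = 1.0 / (1.0 + np.exp(-beta * (X @ theta)))  # sigmoid link, scaled by beta
    y = rng.binomial(1, p)                         # noisy binary labels
    return X, y
```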
Parameters of differential equations are essential to characterize intrinsic behaviors of dynamic systems. Numerous methods for estimating parameters in dynamic systems are computationally and/or statistically inadequate, especially for complex systems with general-order differential operators, such as motion dynamics. This article presents Green's matching, a computationally tractable and statistically efficient two-step method that only needs to approximate the trajectories of dynamic systems, not their derivatives, by inverting the differential operators via Green's functions. This yields a statistically optimal guarantee for parameter estimation in general-order equations, a feature not shared by existing methods, and provides an efficient framework for broad statistical inference in complex dynamic systems.
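For intuition, here is the standard identity behind the inversion step, in notation we introduce ourselves (a sketch, not the paper's exact formulation): if $\mathcal{L}$ is a linear differential operator of order $m$ with suitable boundary conditions and Green's function $G$, then

$$\mathcal{L}x = f \quad\Longrightarrow\quad x(t) = \int G(t,s)\, f(s)\, \mathrm{d}s,$$

so an equation involving derivatives of $x$ up to order $m$ can be matched using only the trajectory $x$ itself, which is all that needs to be approximated from noisy data.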
The value function plays a crucial role as a measure for the cumulative future reward an agent receives in both reinforcement learning and optimal control. It is therefore of interest to study how similar the values of neighboring states are, i.e., to investigate the continuity of the value function. We do so by providing and verifying upper bounds on the value function's modulus of continuity. Additionally, we show that the value function is always H\"older continuous under relatively weak assumptions on the underlying system and that non-differentiable value functions can be made differentiable by slightly "disturbing" the system.
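For reference, the regularity notions in question (stated here in standard form, not the paper's exact definitions): a value function $V$ is H\"older continuous with exponent $\alpha \in (0,1]$ and constant $L$ if

$$|V(x) - V(y)| \le L\, \|x - y\|^{\alpha} \quad \text{for all states } x, y,$$

with $\alpha = 1$ recovering Lipschitz continuity, and its modulus of continuity is $\omega(\delta) = \sup_{\|x-y\| \le \delta} |V(x) - V(y)|$, the quantity bounded from above in the work.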
The aim of this paper is to study the complexity of the model checking problem MC for inquisitive propositional logic InqB and for inquisitive modal logic InqM, that is, the problem of deciding whether a given finite structure for the logic satisfies a given formula. In recent years, this problem has been thoroughly investigated for several variations of dependence and team logics, systems closely related to inquisitive logic. Building upon some ideas presented by Yang, we prove that the model checking problems for InqB and InqM are both AP-complete.
In many application settings, the data have missing entries, which makes analysis challenging. An abundant literature addresses missing values in an inferential framework: estimating parameters and their variance from incomplete tables. Here, we consider supervised-learning settings: predicting a target when missing values appear in both training and testing data. We show the consistency of two approaches to prediction. A striking result is that the widely used method of imputing with a constant, such as the mean, prior to learning is consistent when missing values are not informative. This contrasts with inferential settings, where mean imputation is criticized for distorting the distribution of the data. That such a simple approach can be consistent is important in practice. We also show that a predictor suited for complete observations can predict optimally on incomplete data through multiple imputation. Finally, to compare imputation with learning directly with a model that accounts for missing values, we further analyze decision trees. These can naturally tackle empirical risk minimization with missing values, thanks to their ability to handle the half-discrete nature of incomplete variables. After comparing different missing-values strategies in trees theoretically and empirically, we recommend the "missing incorporated in attribute" method, as it can handle both non-informative and informative missing values.
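As a concrete illustration of the constant-imputation approach discussed above (a minimal sketch; the function name and interface are ours):

```python
import numpy as np

def impute_mean(X_train, X_test):
    """Constant (mean) imputation: replace NaNs with per-column training means.

    Under non-informative missingness, as discussed in the text, learning a
    predictor on the imputed table is consistent for prediction, even though
    the imputation distorts the marginal distribution of each feature.
    """
    col_mean = np.nanmean(X_train, axis=0)               # means over observed entries
    fill = lambda X: np.where(np.isnan(X), col_mean, X)  # fill NaNs column-wise
    return fill(X_train), fill(X_test)
```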
Although the asymptotic properties of the parameter estimator have been derived in the $p_{0}$ model for directed graphs with a differentially private bi-degree sequence, asymptotic theory for general models is still lacking. In this paper, we release the bi-degree sequence of directed graphs via the discrete Laplace mechanism, which satisfies differential privacy. We use the method of moments to estimate the unknown model parameter. We establish a unified asymptotic result in which consistency and asymptotic normality of the differentially private estimator hold. We apply the unified theoretical result to the Probit model. Simulations and a real-data example demonstrate our theoretical findings.
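To make the release mechanism concrete, here is a sketch of the discrete Laplace (two-sided geometric) mechanism under the usual $\varepsilon$-differential-privacy calibration; the per-entry sensitivity of 1 is an illustrative assumption, not necessarily the paper's exact setting:

```python
import numpy as np

def discrete_laplace_release(degrees, epsilon, rng=None):
    """Add discrete Laplace noise to a (bi-)degree sequence.

    The noise Z satisfies P(Z = k) proportional to exp(-epsilon * |k|); it is
    obtained as the difference of two i.i.d. geometric variables. Adding it
    to each degree gives an epsilon-differentially private release when the
    per-entry sensitivity is 1 (our illustrative assumption).
    """
    rng = np.random.default_rng(rng)
    p = 1.0 - np.exp(-epsilon)                    # geometric success probability
    g1 = rng.geometric(p, size=len(degrees)) - 1  # shift support to {0, 1, ...}
    g2 = rng.geometric(p, size=len(degrees)) - 1
    return np.asarray(degrees) + (g1 - g2)        # difference is two-sided geometric
```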
We introduce a statistical method for modeling and forecasting functional panel data, where each element is a density. Density functions are nonnegative and have a constrained integral, and thus do not constitute a linear vector space. We implement a center log-ratio transformation to map densities into unconstrained functions. These functions exhibit cross-sectional correlation and temporal dependence. Via a functional analysis of variance decomposition, we decompose the unconstrained functional panel data into a deterministic trend component and a time-varying residual component. To produce forecasts of the time-varying component, we implement a functional time series forecasting method based on estimating the long-range covariance. By combining the forecasts of the time-varying residual component with the deterministic trend component, we obtain $h$-step-ahead forecast curves for multiple populations. Illustrated by age- and sex-specific life-table death counts in the United States, we apply our proposed method to generate forecasts of the life-table death counts for 51 states.
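The center log-ratio step can be sketched as follows for densities discretized on a common grid (function names ours; a minimal illustration that ignores continuous-case subtleties):

```python
import numpy as np

def clr(density, eps=1e-12):
    """Center log-ratio transform of a discretized density (entries sum to 1).

    Maps the constrained density to an unconstrained vector with zero mean,
    on which standard functional time series methods can operate.
    """
    logd = np.log(np.asarray(density) + eps)       # eps guards against log(0)
    return logd - logd.mean(axis=-1, keepdims=True)

def clr_inverse(z):
    """Map a clr-transformed vector back to a density."""
    w = np.exp(z - z.max(axis=-1, keepdims=True))  # subtract max for stability
    return w / w.sum(axis=-1, keepdims=True)
```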
We consider a geometric programming problem consisting in minimizing a function given by the supremum of finitely many log-Laplace transforms of discrete nonnegative measures on a Euclidean space. Under a coerciveness assumption, we show that an $\varepsilon$-minimizer can be computed in time polynomial in the input size and in $|\log\varepsilon|$. This is obtained by establishing bit-size estimates on approximate minimizers and by applying the ellipsoid method. We also derive polynomial iteration complexity bounds for the interior point method applied to the same class of problems. We deduce that the spectral radius of a partially symmetric, weakly irreducible nonnegative tensor can be approximated within $\varepsilon$ error in poly-time. For strongly irreducible tensors, we also show that the logarithm of the positive eigenvector is poly-time computable. Our results also yield that the maximum of a nonnegative homogeneous $d$-form on the unit ball with respect to the $d$-H\"older norm can be approximated in poly-time. In particular, the spectral radius of uniform weighted hypergraphs and some known upper bounds for the clique number of uniform hypergraphs are poly-time computable.
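Spelling out the objective (in notation we introduce here for illustration): for discrete nonnegative measures $\mu_1, \dots, \mu_p$ with atoms $a_{ij}$ and weights $c_{ij} > 0$, the problem is to minimize the convex function

$$f(x) \;=\; \max_{1 \le i \le p} \; \log \sum_{j} c_{ij}\, e^{\langle a_{ij},\, x \rangle},$$

i.e., the supremum of the log-Laplace transforms of the $\mu_i$; the result states that an $\varepsilon$-minimizer of $f$ is computable in time polynomial in the input size and in $|\log\varepsilon|$.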
Conformal inference is a fundamental and versatile tool that provides distribution-free guarantees for many machine learning tasks. We consider the transductive setting, where decisions are made on a test sample of $m$ new points, giving rise to $m$ conformal $p$-values. While classical results only concern their marginal distribution, we show that their joint distribution follows a P\'olya urn model, and we establish a concentration inequality for their empirical distribution function. The results hold for arbitrary exchangeable scores, including adaptive ones that can use the covariates of the test and calibration samples at the training stage for increased accuracy. We demonstrate the usefulness of these theoretical results through uniform, in-probability guarantees for two machine learning tasks of current interest: interval prediction for transductive transfer learning and novelty detection based on two-class classification.
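For concreteness, the $m$ conformal $p$-values in this setting can be computed from exchangeable nonconformity scores as follows (a minimal sketch with our own function name; smoothing and tie-breaking variants are omitted):

```python
import numpy as np

def conformal_p_values(cal_scores, test_scores):
    """Transductive conformal p-values from nonconformity scores.

    For each of the m test points,
        p_j = (1 + #{i : cal_scores[i] >= test_scores[j]}) / (n + 1),
    which is marginally super-uniform under exchangeability; the abstract
    concerns the *joint* law of (p_1, ..., p_m), shown to follow a Polya urn.
    """
    cal = np.sort(np.asarray(cal_scores))
    n = cal.size
    # number of calibration scores >= each test score, via binary search
    ge = n - np.searchsorted(cal, np.asarray(test_scores), side="left")
    return (1.0 + ge) / (n + 1.0)
```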
We consider the problem of estimating a temperature-dependent thermal conductivity model (curve) from temperature measurements. We apply a Bayesian estimation approach that accounts for measurement errors and limited prior information about system properties. The approach intertwines system simulation and Markov chain Monte Carlo (MCMC) sampling. We investigate the impact of assuming different model classes -- cubic polynomials and piecewise linear functions -- their parametrization, and different types of prior information -- ranging from uninformative to informative. Piecewise linear functions require more parameters (conductivity values) to be estimated than the four parameters (coefficients or conductivity values) needed for cubic polynomials. The former model class is more flexible, but the latter requires fewer MCMC samples. While parametrizing polynomials with coefficients may feel more natural, parametrizing them with conductivity values turns out to be far better suited to specifying prior information. Robust estimation is possible for all model classes and parametrizations, as long as the prior information is accurate or not too informative. Gaussian Markov random field priors are especially well suited to piecewise linear functions.
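The MCMC step can be illustrated with a generic random-walk Metropolis sampler (a sketch under our own assumptions: `log_post` wraps the forward simulation, a Gaussian measurement-error likelihood, and the prior; all names are ours, not the paper's):

```python
import numpy as np

def metropolis(log_post, theta0, n_iter=10_000, step=0.1, rng=None):
    """Random-walk Metropolis over conductivity-model parameters.

    log_post(theta) must return the unnormalized log-posterior, i.e. the
    log-likelihood of the measured temperatures given the simulated system
    plus the log-prior (e.g. a Gaussian Markov random field prior).
    """
    rng = np.random.default_rng(rng)
    theta = np.asarray(theta0, dtype=float)
    lp = log_post(theta)
    samples = np.empty((n_iter, theta.size))
    for t in range(n_iter):
        prop = theta + step * rng.standard_normal(theta.size)  # Gaussian proposal
        lp_prop = log_post(prop)
        if np.log(rng.uniform()) < lp_prop - lp:               # accept/reject step
            theta, lp = prop, lp_prop
        samples[t] = theta                                     # record current state
    return samples
```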
Adaptiveness is a key principle in information processing, including statistics and machine learning. We investigate the usefulness of adaptive methods in the framework of asymptotic binary hypothesis testing, when each hypothesis represents asymptotically many independent instances of a quantum channel and the tests are based on applying the unknown channel and observing its outputs. Unlike the familiar setting of quantum states as hypotheses, there is a fundamental distinction between adaptive and non-adaptive strategies with respect to the channel uses, and we introduce a number of further variants of the discrimination tasks by imposing different restrictions on the test strategies. The following results are obtained: (1) We prove that for classical-quantum channels, adaptive and non-adaptive strategies lead to the same error exponents both in the symmetric (Chernoff) and asymmetric (Hoeffding, Stein) settings. (2) We exhibit the first separation between adaptive and non-adaptive symmetric hypothesis testing exponents for quantum channels, which we derive from a general lower bound on the error probability for non-adaptive strategies; the concrete example we analyze is a pair of entanglement-breaking channels. (3) We prove, in some sense generalizing the previous statement, that for general channels, adaptive strategies restricted to classical feed-forward and product state channel inputs are not superior in the asymptotic limit to non-adaptive product state strategies. (4) As an application of our findings, we address the discrimination power of an arbitrary quantum channel and show that adaptive strategies with classical feedback and no quantum memory at the input do not increase the discrimination power of the channel beyond non-adaptive tensor product input strategies.