We derive optimal statistical decision rules for discrete choice problems when payoffs depend on a partially-identified parameter $\theta$ and the decision maker can use a point-identified parameter $P$ to deduce restrictions on $\theta$. Leading examples include optimal treatment choice under partial identification and optimal pricing with rich unobserved heterogeneity. Our optimal decision rules minimize the maximum risk or regret over the identified set of payoffs conditional on $P$ and use the data efficiently to learn about $P$. We discuss implementation of optimal decision rules via the bootstrap and Bayesian methods, in both parametric and semiparametric models. We provide detailed applications to treatment choice and optimal pricing. Using a limits of experiments framework, we show that our optimal decision rules can dominate seemingly natural alternatives. Our asymptotic approach is well suited for realistic empirical settings in which the derivation of finite-sample optimal rules is intractable.
Local search is an effective method for solving large-scale combinatorial optimization problems, and it has made remarkable progress in recent years through several subtle mechanisms. In this paper, we identify two ways to improve local search algorithms for Pseudo-Boolean Optimization (PBO). First, some of these mechanisms, such as unit propagation, have so far been used only for MaxSAT, yet they can be generalized to PBO as well. Second, existing local search algorithms rely mainly on a heuristic on variables, the so-called score, to guide the search; we instead attempt to gain more insight from the clauses, which act as a bridge between the variables and the given formula. Accordingly, we first extend the unit propagation-based decimation algorithm to PBO, giving a generalized definition of a unit clause for PBO, and apply it in the existing solver LS-PBO to construct an initial assignment. We then introduce a new heuristic on clauses, dubbed care, which gives higher priority to clauses that are less satisfied in current iterations. Experiments on benchmarks from the most recent PB Competition, as well as on three real-world application benchmarks (minimum-width confidence band, wireless sensor network optimization, and seating arrangement problems), show that our algorithm DeciLS-PBO achieves promising performance compared with state-of-the-art algorithms.
The multivariate Hawkes process is a past-dependent point process used to model the relationship of event occurrences between different phenomena. Although the Hawkes process was originally introduced to describe excitation effects, in which one event increases the chances of another occurring, there has been growing interest in modelling the opposite effect, known as inhibition. In this paper, we focus on how to infer the parameters of a multidimensional exponential Hawkes process with both excitation and inhibition effects. Our first result is a proof of the identifiability of this model under a few sufficient assumptions. We then propose a maximum likelihood approach to estimate the interaction functions, which is, to the best of our knowledge, the first exact inference procedure in the frequentist framework. Our method includes a variable selection step to recover the support of the interactions and therefore to infer the connectivity graph. A benefit of our method is that it provides an explicit computation of the log-likelihood, which additionally makes it possible to perform a goodness-of-fit test for assessing the quality of the estimates. We compare our method to standard approaches, which were developed in the linear framework and are not specifically designed to handle inhibiting effects. We show that the proposed estimator performs better on synthetic data than alternative approaches. We also illustrate the application of our procedure to a neuronal activity dataset, which highlights the presence of both exciting and inhibiting effects between neurons.
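To make the excitation and inhibition mechanism concrete, the following is a minimal sketch of the conditional intensity of a multivariate exponential Hawkes process. It assumes the commonly used form in which a baseline rate plus exponentially decaying contributions from past events is clipped at zero, so that negative (inhibiting) interactions cannot produce a negative rate. The function name `intensity`, the convention that `alpha[i][j]` is the effect of component `j` on component `i`, and the per-component decay `beta[i]` are illustrative assumptions, not details taken from the abstract.

```python
import math
from typing import Sequence


def intensity(
    t: float,
    i: int,
    events: Sequence[Sequence[float]],
    baseline: Sequence[float],
    alpha: Sequence[Sequence[float]],
    beta: Sequence[float],
) -> float:
    """Conditional intensity of component i at time t (assumed model form).

    events[j] holds the past event times of component j.
    alpha[i][j] > 0 means j excites i; alpha[i][j] < 0 means j inhibits i.
    """
    total = baseline[i]
    for j, times in enumerate(events):
        for s in times:
            if s < t:  # only strictly earlier events contribute
                total += alpha[i][j] * math.exp(-beta[i] * (t - s))
    # Clip at zero: an intensity is a rate and cannot be negative.
    return max(total, 0.0)


# Hypothetical two-component example: component 1 strongly inhibits component 0.
baseline = [0.5, 0.2]
alpha = [[0.3, -2.0], [0.1, 0.0]]
beta = [1.0, 1.0]

no_events = intensity(1.0, 0, [[], []], baseline, alpha, beta)
after_excitation = intensity(1.0, 0, [[0.0], []], baseline, alpha, beta)
after_inhibition = intensity(1.0, 0, [[], [0.9]], baseline, alpha, beta)
```

In this sketch, a recent event on the inhibiting component drives the unclipped sum below zero, and the clipping step returns a rate of exactly zero. That clipping is what makes the model nonlinear in its parameters.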
Uncertainty quantification for prediction is an intriguing problem with significant applications in various fields, such as biomedical science, economic studies, and weather forecasting. Numerous methods are available for constructing prediction intervals, such as quantile regression and conformal prediction, among others. Nevertheless, model misspecification (especially in high dimensions) or sub-optimal constructions can frequently result in biased or unnecessarily wide prediction intervals. In this paper, we propose a novel and widely applicable technique, called Universally Trainable Optimal Predictive Intervals Aggregation (UTOPIA), for aggregating multiple prediction intervals to minimize the average width of the prediction band while maintaining a coverage guarantee. The method also allows us to directly construct predictive bands based on elementary basis functions. Our approach is based on linear or convex programming, which is easy to implement. All of our proposed methodologies are supported by theoretical guarantees on the coverage probability and optimal average length, which are detailed in this paper. The effectiveness of our approach is demonstrated by applying it to synthetic data and two real datasets on finance and macroeconomics.
This article shows how to develop an efficient solver for a stabilized numerical space-time formulation of the transient advection-dominated diffusion equation. At the discrete space-time level, we approximate the solution by using higher-order continuous B-spline basis functions in its spatial and temporal dimensions. This problem is very difficult to solve numerically using the standard Galerkin finite element method because of the artificial oscillations that appear when the advection term dominates the diffusion term. However, a first-order constraint least-squares formulation allows us to obtain numerical solutions that avoid these oscillations. The advantages of space-time formulations are the use of high-order methods and the feasibility of developing space-time mesh adaptive techniques on well-defined discrete problems. We develop a solver for a least-squares formulation to obtain a stabilized and symmetric problem on finite element meshes. The computational cost of our solver is bounded by the cost of the inversion of the space-time mass and stiffness (with one value fixed at a point) matrices and the cost of the GMRES solver applied to the symmetric and positive definite problem. We illustrate our findings on an advection-dominated diffusion space-time model problem and present two numerical examples: one with isogeometric analysis discretizations and the second with an adaptive space-time finite element method.
Whenever a binary classifier is used to provide decision support, it typically provides both a label prediction and a confidence value. The decision maker is then supposed to use the confidence value to calibrate how much to trust the prediction. In this context, it has often been argued that the confidence value should correspond to a well-calibrated estimate of the probability that the predicted label matches the ground-truth label. However, multiple lines of empirical evidence suggest that decision makers have difficulty developing a good sense of when to trust a prediction using these confidence values. In this paper, our goal is first to understand why and then to investigate how to construct more useful confidence values. We first argue that, for a broad class of utility functions, there exist data distributions for which a rational decision maker is, in general, unlikely to discover the optimal decision policy using the above confidence values -- an optimal decision maker would sometimes need to place more (less) trust in predictions with lower (higher) confidence values. However, we then show that, if the confidence values satisfy a natural alignment property with respect to the decision maker's confidence in her own predictions, there always exists an optimal decision policy under which the level of trust the decision maker needs to place in predictions is monotone in the confidence values, which makes the policy easier to discover. Further, we show that multicalibration with respect to the decision maker's confidence in her own predictions is a sufficient condition for alignment. Experiments on four different AI-assisted decision-making tasks, in which a classifier provides decision support to real human experts, validate our theoretical results and suggest that alignment may lead to better decisions.
Partial orders are a natural model for the social hierarchies that may constrain "queue-like" rank-order data. However, the computational cost of counting the linear extensions of a general partial order on a ground set with more than a few tens of elements is prohibitive. Vertex-series-parallel partial orders (VSPs) are a subclass of partial orders which admit rapid counting and represent the sorts of relations we expect to see in a social hierarchy. However, no Bayesian analysis of VSPs has been given to date. We construct a marginally consistent family of priors over VSPs with a parameter controlling the prior distribution over VSP depth. The prior for VSPs is given in closed form. We extend an existing observation model for queue-like rank-order data to represent noise in our data and carry out Bayesian inference on "Royal Acta" data and Formula 1 race data. Model comparison shows our model is a better fit to the data than Plackett-Luce mixtures, Mallows mixtures, and "bucket order" models and competitive with more complex models fitting general partial orders.
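The rapid counting mentioned above rests on a standard fact about series-parallel structure: under series composition the numbers of linear extensions multiply, and under parallel composition they multiply together with a binomial coefficient that counts the ways to interleave the two parts. The sketch below illustrates this recursion on a decomposition tree. The class names `Leaf`, `Series`, and `Parallel` and the function `count_extensions` are hypothetical and chosen for illustration only; they are not the representation used in the paper.

```python
from dataclasses import dataclass
from math import comb
from typing import Tuple, Union


@dataclass(frozen=True)
class Leaf:
    label: str


@dataclass(frozen=True)
class Series:
    upper: "Node"  # every element here precedes every element of `lower`
    lower: "Node"


@dataclass(frozen=True)
class Parallel:
    left: "Node"  # no order relations between `left` and `right`
    right: "Node"


Node = Union[Leaf, Series, Parallel]


def count_extensions(node: Node) -> Tuple[int, int]:
    """Return (number of elements, number of linear extensions)."""
    if isinstance(node, Leaf):
        return 1, 1
    if isinstance(node, Series):
        n1, e1 = count_extensions(node.upper)
        n2, e2 = count_extensions(node.lower)
        # The two blocks are ordered, so extensions combine independently.
        return n1 + n2, e1 * e2
    if isinstance(node, Parallel):
        n1, e1 = count_extensions(node.left)
        n2, e2 = count_extensions(node.right)
        # Choose which of the n1 + n2 positions hold elements of `left`.
        return n1 + n2, comb(n1 + n2, n1) * e1 * e2
    raise TypeError(f"unexpected node type: {type(node).__name__}")


# a ranks above both b and c (which are unordered), and both rank above d.
diamond = Series(Leaf("a"), Series(Parallel(Leaf("b"), Leaf("c")), Leaf("d")))
# Three mutually unordered elements.
antichain = Parallel(Leaf("a"), Parallel(Leaf("b"), Leaf("c")))
```

Each node of the tree is visited once, so for a given decomposition the count takes a number of arithmetic operations linear in the number of elements. This contrasts with the prohibitive cost for general partial orders.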
Optimal values and solutions of empirical approximations of stochastic optimization problems can be viewed as statistical estimators of their true values. From this perspective, it is important to understand the asymptotic behavior of these estimators as the sample size goes to infinity. This area of study has a long tradition in stochastic programming. However, the literature is lacking consistency analysis for problems in which the decision variables are taken from an infinite dimensional space, which arise in optimal control, scientific machine learning, and statistical estimation. By exploiting the typical problem structures found in these applications that give rise to hidden norm compactness properties for solution sets, we prove consistency results for nonconvex risk-averse stochastic optimization problems formulated in infinite dimensional space. The proof is based on several crucial results from the theory of variational convergence. The theoretical results are demonstrated for several important problem classes arising in the literature.
We consider the problem of identifying the signal shared between two one-dimensional target variables, in the presence of additional multivariate observations. Canonical Correlation Analysis (CCA)-based methods have traditionally been used to identify shared variables; however, they were designed for multivariate targets and offer only trivial solutions in the univariate case. In the context of Multi-Task Learning (MTL), various models have been proposed to learn features that are sparse and shared across multiple tasks. However, these methods have typically been evaluated by their predictive performance. To the best of our knowledge, no prior study has systematically evaluated models in terms of correctly recovering the shared signal. Here, we formalize the setting of univariate shared information retrieval and propose ICM, an evaluation metric that can be used in the presence of ground-truth labels and that quantifies three aspects of the learned shared features. We further propose Deep Canonical Information Decomposition (DCID), a simple yet effective approach for learning the shared variables. We benchmark the models on a range of scenarios on synthetic data with known ground truth and observe that DCID outperforms the baselines in a wide range of settings. Finally, we demonstrate a real-life application of DCID on brain Magnetic Resonance Imaging (MRI) data, where we are able to extract more accurate predictors of changes in brain regions and obesity. The code for our experiments as well as the supplementary materials are available at //github.com/alexrakowski/dcid
Suppose we are given access to $n$ independent samples from a distribution $\mu$ and we wish to output one of them with the goal of making the output distributed as close as possible to a target distribution $\nu$. In this work we show that the optimal total variation distance as a function of $n$ is given by $\tilde\Theta(\frac{D}{f'(n)})$ over the class of all pairs $\nu,\mu$ with a bounded $f$-divergence $D_f(\nu\|\mu)\leq D$. Previously, this question was studied only for the case in which the Radon-Nikodym derivative of $\nu$ with respect to $\mu$ is uniformly bounded. We then consider an application in the seemingly very different field of smoothed online learning, where we show that recent results on the minimax regret and the regret of oracle-efficient algorithms still hold even under relaxed constraints on the adversary (to have bounded $f$-divergence, as opposed to bounded Radon-Nikodym derivative). Finally, we also study the efficacy of importance sampling for mean estimates uniform over a function class and compare importance sampling with rejection sampling.
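As a point of reference for the selection problem described above, the following is a minimal sketch of classical rejection sampling restricted to a finite pool of samples, in the previously studied setting where the Radon-Nikodym derivative is uniformly bounded. It is a textbook baseline under stated assumptions, not the procedure analysed in the paper. The names `select_by_rejection`, `ratio` (standing for $d\nu/d\mu$), and `bound` (a uniform upper bound on that ratio) are hypothetical, as is the choice to fall back to the last sample when every candidate is rejected.

```python
import random
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def select_by_rejection(
    samples: Sequence[T],
    ratio: Callable[[T], float],
    bound: float,
    rng: Optional[random.Random] = None,
) -> T:
    """Pick one of `samples` (drawn from mu) to approximate a draw from nu.

    Each sample x is accepted with probability ratio(x) / bound, where
    ratio is assumed to satisfy 0 <= ratio(x) <= bound for all x.
    """
    if not samples:
        raise ValueError("need at least one sample")
    rng = rng or random.Random()
    for x in samples:
        # rng.random() is uniform on [0, 1), so this accepts with
        # probability ratio(x) / bound.
        if rng.random() * bound < ratio(x):
            return x
    # Assumed fallback: with only n samples, all may be rejected, and this
    # event is what keeps the output from matching nu exactly.
    return samples[-1]


# Hypothetical example: mu is uniform on {0, 1} and nu puts all mass on 1,
# so the density ratio is 2 at x = 1 and 0 at x = 0.
def point_mass_ratio(x: int) -> float:
    return 2.0 if x == 1 else 0.0


picked = select_by_rejection([0, 0, 1, 0], point_mass_ratio, bound=2.0)
```

In this sketch the probability that all $n$ candidates are rejected shrinks geometrically in $n$ when the ratio is bounded. When only an $f$-divergence bound is available, no finite `bound` need exist, and this baseline cannot be applied as written.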
Social disruption occurs when a policy creates or destroys many network connections between agents. It is a costly side effect of many interventions and so a growing empirical literature recommends measuring and accounting for social disruption when evaluating the welfare impact of a policy. However, there is currently little work characterizing what can actually be learned about social disruption from data in practice. In this paper, we consider the problem of identifying social disruption in a research design that is popular in the literature. We provide two sets of identification results. First, we show that social disruption is not generally point identified, but informative bounds can be constructed using the eigenvalues of the network adjacency matrices observed by the researcher. Second, we show that point identification follows from a theoretically motivated monotonicity condition, and we derive a closed form representation. We apply our methods in two empirical illustrations and find large policy effects that otherwise might be missed by alternatives in the literature.