We study a pricing setting where each customer is offered a contextualized price based on customer and/or product features that are predictive of the customer's valuation for that product. Often only historical sales records are available, where we observe whether each customer purchased a product at the price prescribed rather than the customer's true valuation. As such, the data is influenced by the historical sales policy which introduces difficulties in a) estimating future loss/regret for pricing policies without the possibility of conducting real experiments and b) optimizing new policies for downstream tasks such as revenue management. We study how to formulate loss functions which can be used for optimizing pricing policies directly, rather than going through an intermediate demand estimation stage, which can be biased in practice due to model misspecification, regularization or poor calibration. While existing approaches have been proposed when valuation data is available, we propose loss functions for the observational data setting. To achieve this, we adapt ideas from machine learning with corrupted labels, where we can consider each observed customer's outcome (purchased or not for a prescribed price), as a (known) probabilistic transformation of the customer's valuation. From this transformation we derive a class of suitable unbiased loss functions. Within this class we identify minimum variance estimators, those which are robust to poor demand function estimation, and provide guidance on when the estimated demand function is useful. Furthermore, we also show that when applied to our contextual pricing setting, estimators popular in the off-policy evaluation literature fall within this class of loss functions, and also offer managerial insights on when each estimator is likely to perform well in practice.
Shape constraints such as positive semi-definiteness (PSD) for matrices or convexity for functions play a central role in many applications in machine learning and sciences, including metric learning, optimal transport, and economics. Yet, very few function models exist that enforce PSD-ness or convexity with good empirical performance and theoretical guarantees. In this paper, we introduce a kernel sum-of-squares model for functions that take values in the PSD cone, which extends kernel sums-of-squares models that were recently proposed to encode non-negative scalar functions. We provide a representer theorem for this class of PSD functions, show that it constitutes a universal approximator of PSD functions, and derive eigenvalue bounds in the case of subsampled equality constraints. We then apply our results to modeling convex functions, by enforcing a kernel sum-of-squares representation of their Hessian, and show that any smooth and strongly convex function may be thus represented. Finally, we illustrate our methods on a PSD matrix-valued regression task, and on scalar-valued convex regression.
Consider estimation of average treatment effects with multi-valued treatments using augmented inverse probability weighted (IPW) estimators, depending on outcome regression and propensity score models in high-dimensional settings. These regression models are often fitted by regularized likelihood-based estimation, while ignoring how the fitted functions are used in the subsequent inference about the treatment parameters. Such separate estimation can be associated with known difficulties in existing methods. We develop regularized calibrated estimation for fitting propensity score and outcome regression models, where sparsity-including penalties are employed to facilitate variable selection but the loss functions are carefully chosen such that valid confidence intervals can be obtained under possible model misspecification. Unlike in the case of binary treatments, the usual augmented IPW estimator is generalized by allowing different copies of coefficient estimators in outcome regression to ensure just-identification. For propensity score estimation, the new loss function and estimating functions are directly tied to achieving covariate balance between weighted treatment groups. We develop practical numerical algorithms for computing the regularized calibrated estimators with group Lasso by innovatively exploiting Fisher scoring, and provide rigorous high-dimensional analysis for the resulting augmented IPW estimators under suitable sparsity conditions, while tackling technical issues absent or overlooked in previous analyses. We present simulation studies and an empirical application to estimate the effects of maternal smoking on birth weights. The proposed methods are implemented in the R package mRCAL.
Addressing the problem of fairness is crucial to safely use machine learning algorithms to support decisions with a critical impact on people's lives such as job hiring, child maltreatment, disease diagnosis, loan granting, etc. Several notions of fairness have been defined and examined in the past decade, such as statistical parity and equalized odds. The most recent fairness notions, however, are causal-based and reflect the now widely accepted idea that using causality is necessary to appropriately address the problem of fairness. This paper examines an exhaustive list of causal-based fairness notions and study their applicability in real-world scenarios. As the majority of causal-based fairness notions are defined in terms of non-observable quantities (e.g., interventions and counterfactuals), their deployment in practice requires to compute or estimate those quantities using observational data. This paper offers a comprehensive report of the different approaches to infer causal quantities from observational data including identifiability (Pearl's SCM framework) and estimation (potential outcome framework). The main contributions of this survey paper are (1) a guideline to help selecting a suitable fairness notion given a specific real-world scenario, and (2) a ranking of the fairness notions according to Pearl's causation ladder indicating how difficult it is to deploy each notion in practice.
Optimizing ranking systems based on user interactions is a well-studied problem. State-of-the-art methods for optimizing ranking systems based on user interactions are divided into online approaches - that learn by directly interacting with users - and counterfactual approaches - that learn from historical interactions. Existing online methods are hindered without online interventions and thus should not be applied counterfactually. Conversely, counterfactual methods cannot directly benefit from online interventions. We propose a novel intervention-aware estimator for both counterfactual and online Learning to Rank (LTR). With the introduction of the intervention-aware estimator, we aim to bridge the online/counterfactual LTR division as it is shown to be highly effective in both online and counterfactual scenarios. The estimator corrects for the effect of position bias, trust bias, and item-selection bias by using corrections based on the behavior of the logging policy and on online interventions: changes to the logging policy made during the gathering of click data. Our experimental results, conducted in a semi-synthetic experimental setup, show that, unlike existing counterfactual LTR methods, the intervention-aware estimator can greatly benefit from online interventions.
Leveraging biased click data for optimizing learning to rank systems has been a popular approach in information retrieval. Because click data is often noisy and biased, a variety of methods have been proposed to construct unbiased learning to rank (ULTR) algorithms for the learning of unbiased ranking models. Among them, automatic unbiased learning to rank (AutoULTR) algorithms that jointly learn user bias models (i.e., propensity models) with unbiased rankers have received a lot of attention due to their superior performance and low deployment cost in practice. Despite their differences in theories and algorithm design, existing studies on ULTR usually use uni-variate ranking functions to score each document or result independently. On the other hand, recent advances in context-aware learning-to-rank models have shown that multivariate scoring functions, which read multiple documents together and predict their ranking scores jointly, are more powerful than uni-variate ranking functions in ranking tasks with human-annotated relevance labels. Whether such superior performance would hold in ULTR with noisy data, however, is mostly unknown. In this paper, we investigate existing multivariate scoring functions and AutoULTR algorithms in theory and prove that permutation invariance is a crucial factor that determines whether a context-aware learning-to-rank model could be applied to existing AutoULTR framework. Our experiments with synthetic clicks on two large-scale benchmark datasets show that AutoULTR models with permutation-invariant multivariate scoring functions significantly outperform those with uni-variate scoring functions and permutation-variant multivariate scoring functions.
Causal inference is a critical research topic across many domains, such as statistics, computer science, education, public policy and economics, for decades. Nowadays, estimating causal effect from observational data has become an appealing research direction owing to the large amount of available data and low budget requirement, compared with randomized controlled trials. Embraced with the rapidly developed machine learning area, various causal effect estimation methods for observational data have sprung up. In this survey, we provide a comprehensive review of causal inference methods under the potential outcome framework, one of the well known causal inference framework. The methods are divided into two categories depending on whether they require all three assumptions of the potential outcome framework or not. For each category, both the traditional statistical methods and the recent machine learning enhanced methods are discussed and compared. The plausible applications of these methods are also presented, including the applications in advertising, recommendation, medicine and so on. Moreover, the commonly used benchmark datasets as well as the open-source codes are also summarized, which facilitate researchers and practitioners to explore, evaluate and apply the causal inference methods.
Most of previous machine learning algorithms are proposed based on the i.i.d. hypothesis. However, this ideal assumption is often violated in real applications, where selection bias may arise between training and testing process. Moreover, in many scenarios, the testing data is not even available during the training process, which makes the traditional methods like transfer learning infeasible due to their need on prior of test distribution. Therefore, how to address the agnostic selection bias for robust model learning is of paramount importance for both academic research and real applications. In this paper, under the assumption that causal relationships among variables are robust across domains, we incorporate causal technique into predictive modeling and propose a novel Causally Regularized Logistic Regression (CRLR) algorithm by jointly optimize global confounder balancing and weighted logistic regression. Global confounder balancing helps to identify causal features, whose causal effect on outcome are stable across domains, then performing logistic regression on those causal features constructs a robust predictive model against the agnostic bias. To validate the effectiveness of our CRLR algorithm, we conduct comprehensive experiments on both synthetic and real world datasets. Experimental results clearly demonstrate that our CRLR algorithm outperforms the state-of-the-art methods, and the interpretability of our method can be fully depicted by the feature visualization.
We consider the exploration-exploitation trade-off in reinforcement learning and we show that an agent imbued with a risk-seeking utility function is able to explore efficiently, as measured by regret. The parameter that controls how risk-seeking the agent is can be optimized exactly, or annealed according to a schedule. We call the resulting algorithm K-learning and show that the corresponding K-values are optimistic for the expected Q-values at each state-action pair. The K-values induce a natural Boltzmann exploration policy for which the `temperature' parameter is equal to the risk-seeking parameter. This policy achieves an expected regret bound of $\tilde O(L^{3/2} \sqrt{S A T})$, where $L$ is the time horizon, $S$ is the number of states, $A$ is the number of actions, and $T$ is the total number of elapsed time-steps. This bound is only a factor of $L$ larger than the established lower bound. K-learning can be interpreted as mirror descent in the policy space, and it is similar to other well-known methods in the literature, including Q-learning, soft-Q-learning, and maximum entropy policy gradient, and is closely related to optimism and count based exploration methods. K-learning is simple to implement, as it only requires adding a bonus to the reward at each state-action and then solving a Bellman equation. We conclude with a numerical example demonstrating that K-learning is competitive with other state-of-the-art algorithms in practice.
We propose accelerated randomized coordinate descent algorithms for stochastic optimization and online learning. Our algorithms have significantly less per-iteration complexity than the known accelerated gradient algorithms. The proposed algorithms for online learning have better regret performance than the known randomized online coordinate descent algorithms. Furthermore, the proposed algorithms for stochastic optimization exhibit as good convergence rates as the best known randomized coordinate descent algorithms. We also show simulation results to demonstrate performance of the proposed algorithms.
From only positive (P) and unlabeled (U) data, a binary classifier could be trained with PU learning, in which the state of the art is unbiased PU learning. However, if its model is very flexible, empirical risks on training data will go negative, and we will suffer from serious overfitting. In this paper, we propose a non-negative risk estimator for PU learning: when getting minimized, it is more robust against overfitting, and thus we are able to use very flexible models (such as deep neural networks) given limited P data. Moreover, we analyze the bias, consistency, and mean-squared-error reduction of the proposed risk estimator, and bound the estimation error of the resulting empirical risk minimizer. Experiments demonstrate that our risk estimator fixes the overfitting problem of its unbiased counterparts.