
We study the problem of off-policy evaluation (OPE) for episodic Partially Observable Markov Decision Processes (POMDPs) with continuous states. Motivated by the recently proposed proximal causal inference framework, we develop a non-parametric identification result for estimating the policy value via a sequence of so-called V-bridge functions with the help of time-dependent proxy variables. We then develop a fitted-Q-evaluation-type algorithm to estimate V-bridge functions recursively, where a non-parametric instrumental variable (NPIV) problem is solved at each step. By analyzing this challenging sequential NPIV problem, we establish the finite-sample error bounds for estimating the V-bridge functions and accordingly that for evaluating the policy value, in terms of the sample size, length of horizon and so-called (local) measure of ill-posedness at each step. To the best of our knowledge, this is the first finite-sample error bound for OPE in POMDPs under non-parametric models.

Related content

A good automatic evaluation metric for language generation ideally correlates highly with human judgements of text quality. Yet, there is a dearth of such metrics, which inhibits the rapid and efficient progress of language generators. One exception is the recently proposed Mauve. In theory, Mauve measures an information-theoretic divergence between two probability distributions over strings: one representing the language generator under evaluation; the other representing the true natural language distribution. Mauve's authors argue that its success comes from the qualitative properties of their proposed divergence. Yet in practice, as this divergence is uncomputable, Mauve approximates it by measuring the divergence between multinomial distributions over clusters instead, where cluster assignments are obtained by grouping strings based on a pre-trained language model's embeddings. As we show, however, this is not a tight approximation -- in either theory or practice. This raises the question: why does Mauve work so well? In this work, we show that Mauve was right for the wrong reasons, and that its newly proposed divergence is not necessary for its high performance. In fact, classical divergences paired with its proposed cluster-based approximation may actually serve as better evaluation metrics. We finish the paper with a probing analysis; this analysis leads us to conclude that -- by encoding syntactic- and coherence-level features of text, while ignoring surface-level features -- such cluster-based substitutes for string distributions may simply be better for evaluating state-of-the-art language generators.
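As a rough illustration of the cluster-based approximation described in this abstract, the sketch below turns cluster assignments into multinomial distributions and compares them with a classical divergence. It assumes the assignments have already been computed (for example by clustering language-model embeddings, which is not shown), and it uses the Jensen-Shannon divergence as a stand-in classical divergence. The function names are hypothetical, and this is not the Mauve implementation.

```python
import numpy as np


def cluster_histogram(assignments, num_clusters):
    """Empirical multinomial distribution over cluster ids (assumed precomputed)."""
    counts = np.bincount(np.asarray(assignments), minlength=num_clusters).astype(float)
    return counts / counts.sum()


def kl_divergence(p, q):
    """KL(p || q) with the convention 0 * log 0 = 0; q must be positive where p is."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))


def js_divergence(p, q):
    """Jensen-Shannon divergence, a classical divergence (not Mauve's own)."""
    m = 0.5 * (p + q)
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical cluster ids for human-written and model-generated texts.
human = cluster_histogram([0, 0, 1, 2, 2, 2], num_clusters=3)
model = cluster_histogram([0, 1, 1, 1, 2, 2], num_clusters=3)
score = js_divergence(human, model)
```

The Jensen-Shannon divergence is bounded by log 2 and needs no smoothing, since the mixture is positive wherever either histogram is.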

Conformal prediction is a widely used method to quantify uncertainty in settings where the data is independent and identically distributed (IID), or more generally, exchangeable. Conformal prediction takes in a pre-trained classifier, a calibration dataset and a confidence level as inputs, and returns a function which maps feature vectors to subsets of classes. The output of the returned function for a new feature vector (i.e., a test data point) is guaranteed to contain the true class with the pre-specified confidence. Despite its success and usefulness in IID settings, extending conformal prediction to non-exchangeable (e.g., Markovian) data in a manner that provably preserves all desirable theoretical properties has largely remained an open problem. As a solution, we extend conformal prediction to the setting of a Hidden Markov Model (HMM) with unknown parameters. The key idea behind the proposed method is to partition the non-exchangeable Markovian data from the HMM into exchangeable blocks by exploiting de Finetti's theorem for Markov chains, discovered by Diaconis and Freedman (1980). The permutations of the exchangeable blocks are then viewed as randomizations of the observed Markovian data from the HMM. The proposed method provably retains all desirable theoretical guarantees offered by the classical conformal prediction framework. Detailed numerical results that verify and complement the theoretical conclusions are provided to illustrate the performance of the proposed method.
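The classical procedure summarized at the start of this abstract (classifier, calibration set and confidence level in; set-valued predictor out) can be sketched as follows. This is a minimal split-conformal example for exchangeable data only, not the HMM extension the abstract proposes. The score (one minus the predicted probability of the true class) and the function names are assumptions chosen for illustration.

```python
import numpy as np


def conformal_threshold(cal_probs, cal_labels, alpha):
    """Calibrate a score threshold for target coverage 1 - alpha.

    cal_probs: (n, num_classes) class probabilities from a pre-trained classifier.
    cal_labels: (n,) integer true labels of the calibration points.
    """
    n = len(cal_labels)
    # Nonconformity score: one minus the probability given to the true class.
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        return np.inf  # too few calibration points: every class is included
    return float(np.sort(scores)[k - 1])


def prediction_set(probs, threshold):
    """All classes whose nonconformity score does not exceed the threshold."""
    return [int(c) for c in np.flatnonzero(1.0 - probs <= threshold)]


# Hypothetical calibration data for a 3-class problem.
cal_probs = np.array([[0.9, 0.05, 0.05], [0.25, 0.5, 0.25], [0.4, 0.4, 0.2]])
cal_labels = np.array([0, 1, 2])
threshold = conformal_threshold(cal_probs, cal_labels, alpha=0.5)
classes = prediction_set(np.array([0.5, 0.5, 0.0]), threshold)
```

The coverage guarantee of this sketch relies on exchangeability of calibration and test points, which is exactly the assumption that fails for Markovian data.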

We consider discrete-time parametric population-size-dependent branching processes (PSDBPs) with almost sure extinction and propose a new class of weighted least-squares estimators based on a single trajectory of population size counts. We prove that, conditional on non-extinction up to a finite time $n$, our estimators are consistent and asymptotically normal as $n\to\infty$. We pay particular attention to estimating the carrying capacity of a population. Our estimators are the first conditionally consistent estimators for PSDBPs, and more generally, for Markov models for populations with a carrying capacity. Through simulated examples, we demonstrate that our estimators outperform other least squares estimators for PSDBPs in a variety of settings. Finally, we apply our methods to estimate the carrying capacity of the endangered Chatham Island black robin population.

There are many high dimensional function classes that have fast agnostic learning algorithms when assumptions on the distribution of examples can be made, such as Gaussianity or uniformity over the domain. But how can one be confident that data indeed satisfies such an assumption, so that one can trust the output quality of the agnostic learning algorithm? We propose a model by which to systematically study the design of tester-learner pairs $(\mathcal{A},\mathcal{T})$, such that if the distribution on examples in the data passes the tester $\mathcal{T}$ then one can safely trust the output of the agnostic learner $\mathcal{A}$ on the data. To demonstrate the power of the model, we apply it to the classical problem of agnostically learning halfspaces under the standard Gaussian distribution and present a tester-learner pair with combined run-time of $n^{\tilde{O}(1/\epsilon^4)}$. This qualitatively matches that of the best known ordinary agnostic learning algorithms for this task. In contrast, finite sample Gaussianity testers do not exist for the $L_1$ and EMD distance measures. A key step is to show that halfspaces are well-approximated with low-degree polynomials relative to distributions with low-degree moments close to those of a Gaussian. We also go beyond spherically-symmetric distributions, and give a tester-learner pair for halfspaces under the uniform distribution on $\{0,1\}^n$ with combined run-time of $n^{\tilde{O}(1/\epsilon^4)}$. This is achieved using polynomial approximation theory and critical index machinery. We also show there exist some well-studied settings where $2^{\tilde{O}(\sqrt{n})}$ run-time agnostic learning algorithms are available, yet the combined run-times of tester-learner pairs must be as high as $2^{\Omega(n)}$. On that account, the design of tester-learner pairs is a research direction in its own right independent of standard agnostic learning.

In modern machine learning applications, frequent encounters of covariate shift and label scarcity have posed challenges to robust model training and evaluation. Numerous transfer learning methods have been developed to robustly adapt the model itself to some unlabeled target populations using existing labeled data in a source population. However, there is a paucity of literature on transferring performance metrics of a trained model. In this paper, we aim to evaluate the performance of a trained binary classifier on an unlabeled target population based on receiver operating characteristic (ROC) analysis. We propose $\bf S$emi-supervised $\bf T$ransfer l$\bf E$arning of $\bf A$ccuracy $\bf M$easures (STEAM), an efficient three-step estimation procedure that employs 1) double-index modeling to construct calibrated density ratio weights and 2) robust imputation to leverage the large amount of unlabeled data to improve estimation efficiency. We establish the consistency and asymptotic normality of the proposed estimators under correct specification of either the density ratio model or the outcome model. We also correct for potential overfitting bias in the estimators in finite samples with cross-validation. We compare our proposed estimators to existing methods and show reductions in bias and gains in efficiency through simulations. We illustrate the practical utility of the proposed method by evaluating the prediction performance of a phenotyping model for Rheumatoid Arthritis (RA) on a temporally evolving EHR cohort.

The theory of bias-variance used to serve as a guide for model selection when applying Machine Learning algorithms. However, modern practice has shown success with over-parameterized models that were expected to overfit but did not. This led to the proposal of the double descent curve of performance by Belkin et al. Although it seems to describe a real, representative phenomenon, the field is lacking a fundamental theoretical understanding of what is happening, what the consequences are for model selection, and when double descent is expected to occur. In this paper we develop a principled understanding of the phenomenon, and sketch answers to these important questions. Furthermore, we report real experimental results that are correctly predicted by our proposed hypothesis.

There is a long-standing debate in the statistical, epidemiological and econometric fields as to whether nonparametric estimation that uses data-adaptive methods, like machine learning algorithms in model fitting, confers any meaningful advantage over simpler, parametric approaches in real-world, finite sample estimation of causal effects. We address the question: when trying to estimate the effect of a treatment on an outcome, across a universe of reasonable data distributions, how much does the choice of nonparametric vs. parametric estimation matter? Instead of answering this question with simulations that reflect a few chosen data scenarios, we propose a novel approach evaluating performance across thousands of data-generating mechanisms drawn from non-parametric models with semi-informative priors. We call this approach a Universal Monte-Carlo Simulation. We compare performance of estimating the average treatment effect across two parametric estimators (a g-computation estimator that uses a parametric outcome model and an inverse probability of treatment weighted estimator) and two nonparametric estimators (a tree-based estimator and a targeted minimum loss-based estimator that uses an ensemble of machine learning algorithms in model fitting). We summarize estimator performance in terms of bias, confidence interval coverage, and mean squared error. We find that the nonparametric estimators nearly always outperform the parametric estimators with the exception of having similar performance in terms of bias and slightly worse performance in terms of coverage under the smallest sample size of N=100.
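To make the two parametric estimators named in this abstract concrete, here is a minimal sketch of g-computation and inverse probability of treatment weighting (IPTW) for the average treatment effect. It makes several simplifying assumptions that are not taken from the paper: a single discrete confounder, a linear outcome model without interactions, and a propensity score estimated by stratum frequencies rather than a fitted regression. The function names and toy data are hypothetical.

```python
import numpy as np


def g_computation_ate(y, a, x):
    """G-computation with an assumed linear outcome model E[Y | A, X]."""
    ones = np.ones(len(y))
    design = np.column_stack([ones, a, x])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    # Predict every unit's outcome under treatment and under control.
    treated = np.column_stack([ones, ones, x]) @ beta
    control = np.column_stack([ones, np.zeros(len(y)), x]) @ beta
    return float(np.mean(treated - control))


def iptw_ate(y, a, x):
    """IPTW with the propensity score estimated per stratum of a discrete X."""
    propensity = np.empty(len(y), dtype=float)
    for value in np.unique(x):
        stratum = x == value
        propensity[stratum] = a[stratum].mean()
    weighted_treated = np.mean(a * y / propensity)
    weighted_control = np.mean((1 - a) * y / (1 - propensity))
    return float(weighted_treated - weighted_control)


# Toy noise-free data in which the true treatment effect is 2 by construction.
x = np.array([0, 0, 0, 1, 1, 1, 1, 0])
a = np.array([1, 0, 0, 1, 1, 0, 1, 1])
y = 2.0 * a + x
ate_gcomp = g_computation_ate(y, a, x)
ate_iptw = iptw_ate(y, a, x)
```

Both estimators recover the same effect here because the outcome model is correctly specified and every stratum contains treated and control units; they can diverge when either condition fails.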

The concept of causality plays an important role in human cognition. In the past few decades, causal inference has been well developed in many fields, such as computer science, medicine, economics, and education. With the advancement of deep learning techniques, these techniques have been increasingly used in causal inference against counterfactual data. Typically, deep causal models map the characteristics of covariates to a representation space and then design various objective optimization functions to estimate counterfactual data unbiasedly based on the different optimization methods. This paper surveys deep causal models, and its core contributions are as follows: 1) we provide relevant metrics under multiple treatments and continuous-dose treatment; 2) we incorporate a comprehensive overview of deep causal models from both temporal development and method classification perspectives; 3) we provide a detailed and comprehensive classification and analysis of relevant datasets and source code.

Exploration-exploitation is a powerful and practical tool in multi-agent learning (MAL); however, its effects are far from understood. To make progress in this direction, we study a smooth analogue of Q-learning. We start by showing that our learning model has strong theoretical justification as an optimal model for studying exploration-exploitation. Specifically, we prove that smooth Q-learning has bounded regret in arbitrary games for a cost model that explicitly captures the balance between game and exploration costs and that it always converges to the set of quantal-response equilibria (QRE), the standard solution concept for games under bounded rationality, in weighted potential games with heterogeneous learning agents. In our main task, we then turn to measure the effect of exploration on collective system performance. We characterize the geometry of the QRE surface in low-dimensional MAL systems and link our findings with catastrophe (bifurcation) theory. In particular, as the exploration hyperparameter evolves over time, the system undergoes phase transitions where the number and stability of equilibria can change radically given an infinitesimal change to the exploration parameter. Based on this, we provide a formal theoretical treatment of how tuning the exploration parameter can provably lead to equilibrium selection with both positive as well as negative (and potentially unbounded) effects on system performance.

Causal inference has been a critical research topic across many domains, such as statistics, computer science, education, public policy and economics, for decades. Nowadays, estimating causal effects from observational data has become an appealing research direction owing to the large amount of available data and the low budget requirement compared with randomized controlled trials. With the rapid development of machine learning, various causal effect estimation methods for observational data have sprung up. In this survey, we provide a comprehensive review of causal inference methods under the potential outcome framework, one of the well-known causal inference frameworks. The methods are divided into two categories depending on whether they require all three assumptions of the potential outcome framework or not. For each category, both the traditional statistical methods and the recent machine learning enhanced methods are discussed and compared. The plausible applications of these methods are also presented, including the applications in advertising, recommendation, medicine and so on. Moreover, the commonly used benchmark datasets as well as the open-source codes are also summarized, which helps researchers and practitioners explore, evaluate and apply the causal inference methods.
