
In recent years, gradient boosted decision trees have become popular for building robust machine learning models on big data. The primary technique enabling the success of these algorithms has been distributing the computation involved in building the decision trees. Distributed decision tree building, in turn, is enabled by constructing quantile summaries of the large datasets and choosing the candidate split points from these quantile sets. In XGBoost, for instance, a sophisticated quantile building algorithm is employed to identify the candidate split points for the decision trees. This method is often projected to yield better results when the computation is distributed. In this paper, we dispel the notion that these methods provide more accurate and scalable ways of building decision trees in a distributed manner. In a significant contribution, we show theoretically and empirically that choosing the split points uniformly at random provides the same or even better performance in terms of accuracy and computational efficiency. Hence, a simple random selection of split points suffices for decision tree building, compared to more sophisticated methods.
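
As a minimal sketch of the comparison being made (not the paper's implementation), the snippet below contrasts quantile-based candidate split points with candidates drawn uniformly at random for a single numeric feature, scoring each threshold by the weighted variance of the two children; the function names and the split criterion are illustrative.

```python
# Sketch: quantile-based vs. uniformly random candidate split points (illustrative only).
import numpy as np

def quantile_candidates(x, k):
    """k candidate thresholds at evenly spaced quantiles of the feature."""
    qs = np.linspace(0, 1, k + 2)[1:-1]          # drop the 0% and 100% quantiles
    return np.unique(np.quantile(x, qs))

def random_candidates(x, k, rng):
    """k candidate thresholds drawn uniformly at random from observed values."""
    return np.unique(rng.choice(x, size=min(k, len(x)), replace=False))

def best_split(x, y, candidates):
    """Pick the threshold that minimises the weighted variance of the two children."""
    best_t, best_score = None, np.inf
    for t in candidates:
        left, right = y[x <= t], y[x > t]
        if len(left) == 0 or len(right) == 0:
            continue
        score = len(left) * left.var() + len(right) * right.var()
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score

rng = np.random.default_rng(0)
x = rng.normal(size=10_000)
y = (x > 0.3).astype(float) + 0.1 * rng.normal(size=10_000)
print(best_split(x, y, quantile_candidates(x, 32)))
print(best_split(x, y, random_candidates(x, 32, rng)))
```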

Related Content

In many real-world applications, we are interested in approximating black-box, costly functions as accurately as possible with the smallest number of function evaluations. A complex computer code is an example of such a function. In this work, a Gaussian process (GP) emulator is used to approximate the output of complex computer code. We consider the problem of extending an initial experiment (set of model runs) sequentially to improve the emulator. A sequential sampling approach based on leave-one-out (LOO) cross-validation is proposed that can be easily extended to a batch mode. This is a desirable property since it saves the user time when parallel computing is available. After fitting a GP to the training data points, the expected squared LOO (ES-LOO) error is calculated at each design point. ES-LOO is used as a measure to identify important data points. More precisely, when this quantity is large at a point, the quality of prediction depends a great deal on that point, and adding more samples nearby could improve the accuracy of the GP. As a result, it is reasonable to select the next sample where ES-LOO is maximised. However, ES-LOO is only known at the experimental design points and needs to be estimated at unobserved points. To do this, a second GP is fitted to the ES-LOO errors, and the location where a modified expected improvement (EI) criterion attains its maximum is chosen as the next sample. EI is a popular acquisition function in Bayesian optimisation and is used to trade off between local and global search. However, it has a tendency towards exploitation, meaning that its maximum is close to the (current) "best" sample. To avoid clustering, a modified version of EI, called pseudo expected improvement, is employed; it is more explorative than EI and allows us to discover unexplored regions. Our results show that the proposed sampling method is promising.
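
The following rough sketch, built on scikit-learn's GaussianProcessRegressor, illustrates the ES-LOO idea under simplifying assumptions: LOO errors are obtained by refitting the GP rather than via a closed form, and plain expected improvement stands in for the pseudo-EI variant described above; all names and settings are illustrative.

```python
# Sketch of ES-LOO-driven sequential sampling (simplified; not the paper's exact method).
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def es_loo(X, y):
    """Squared leave-one-out prediction error at each design point (by refitting)."""
    errs = np.empty(len(X))
    for i in range(len(X)):
        mask = np.arange(len(X)) != i
        gp = GaussianProcessRegressor(kernel=RBF(), normalize_y=True).fit(X[mask], y[mask])
        errs[i] = (gp.predict(X[i:i + 1])[0] - y[i]) ** 2
    return errs

def next_sample(X, y, candidates):
    """Fit a second GP to the ES-LOO errors and pick the candidate maximising EI."""
    e = es_loo(X, y)
    gp_e = GaussianProcessRegressor(kernel=RBF(), normalize_y=True).fit(X, e)
    mu, sd = gp_e.predict(candidates, return_std=True)
    best = e.max()                                   # improvement over the largest observed error
    z = (mu - best) / np.maximum(sd, 1e-12)
    ei = (mu - best) * norm.cdf(z) + sd * norm.pdf(z)
    return candidates[np.argmax(ei)]

rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(12, 1))
y = np.sin(6 * X[:, 0]) + 0.05 * rng.normal(size=12)
print(next_sample(X, y, rng.uniform(0, 1, size=(200, 1))))
```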

The goal of eXtreme Multi-label Learning (XML) is to automatically annotate a given data point with the most relevant subset of labels from an extremely large vocabulary of labels (e.g., a million labels). Lately, many attempts have been made to address this problem that achieve reasonable performance on benchmark datasets. In this paper, rather than coming up with an altogether new method, our objective is to present and validate a simple baseline for this task. Precisely, we investigate an on-the-fly global and structure-preserving feature embedding technique using random projections whose learning phase is independent of training samples and label vocabulary. Further, we show how an ensemble of multiple such learners can be used to achieve a further boost in prediction accuracy with only a linear increase in training and prediction time. Experiments on three public XML benchmarks show that the proposed approach obtains competitive accuracy compared with many existing methods. Additionally, it also provides around a 6572x speed-up in training time and around a 14.7x reduction in model size compared to the closest competitors on the largest publicly available dataset.
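
A minimal illustration of a data-independent random-projection embedding is sketched below: the projection matrix depends only on the input and output dimensions and a seed, so it requires no training samples or label vocabulary, and an ensemble is formed simply by varying the seed. This is an assumption-laden stand-in, not the paper's exact construction.

```python
# Sketch: data-independent Gaussian random projections and a simple ensemble of them.
import numpy as np

def random_projection(d_in, d_out, seed=0):
    """Gaussian random projection matrix, scaled to roughly preserve norms."""
    rng = np.random.default_rng(seed)
    return rng.normal(scale=1.0 / np.sqrt(d_out), size=(d_in, d_out))

def embed(X, P):
    return X @ P

# An ensemble of independent projections: each member uses its own seed, so
# training and prediction cost grow only linearly with the ensemble size.
X = np.random.default_rng(2).normal(size=(5, 10_000))   # 5 points, 10k features
ensemble = [random_projection(10_000, 256, seed=s) for s in range(3)]
embeddings = [embed(X, P) for P in ensemble]
print([E.shape for E in embeddings])
```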

Nonlinear metrics, such as the F1-score, Matthews correlation coefficient, and Fowlkes-Mallows index, are often used to evaluate the performance of machine learning models, in particular, when facing imbalanced datasets that contain more samples of one class than the other. Recent optimal decision tree algorithms have shown remarkable progress in producing trees that are optimal with respect to linear criteria, such as accuracy, but unfortunately nonlinear metrics remain a challenge. To address this gap, we propose a novel algorithm based on bi-objective optimisation, which treats misclassifications of each binary class as a separate objective. We show that, for a large class of metrics, the optimal tree lies on the Pareto frontier. Consequently, we obtain the optimal tree by using our method to generate the set of all nondominated trees. To the best of our knowledge, this is the first method to compute provably optimal decision trees for nonlinear metrics. Our approach leads to a trade-off when compared to optimising linear metrics: the resulting trees may be more desirable according to the given nonlinear metric at the expense of higher runtimes. Nevertheless, the experiments illustrate that runtimes are reasonable for the majority of the tested datasets.
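
The small worked example below (with hypothetical numbers) illustrates the key observation: treating false positives and false negatives as two objectives, the tree optimising a metric such as the F1-score lies on the Pareto frontier, so it can be recovered by scoring only the nondominated (FP, FN) pairs.

```python
# Worked example: picking the F1-optimal tree from a (hypothetical) Pareto frontier.
def f1(fp, fn, n_pos):
    tp = n_pos - fn
    return 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) > 0 else 0.0

# Hypothetical nondominated trees for a dataset with 100 positive instances:
# moving along the frontier trades false positives for false negatives.
pareto = [(0, 40), (5, 25), (12, 15), (30, 8), (60, 2)]   # (FP, FN) per tree
best = max(pareto, key=lambda t: f1(*t, n_pos=100))
print(best, f1(*best, n_pos=100))
```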

Decision tree learning is a widely used approach in machine learning, favoured in applications that require concise and interpretable models. Heuristic methods are traditionally used to quickly produce models with reasonably high accuracy. A commonly criticised point, however, is that the resulting trees may not necessarily be the best representation of the data in terms of accuracy and size. In recent years, this motivated the development of optimal classification tree algorithms that globally optimise the decision tree, in contrast to heuristic methods that perform a sequence of locally optimal decisions. We follow this line of work and provide a novel algorithm for learning optimal classification trees based on dynamic programming and search. Our algorithm supports constraints on the depth of the tree and the number of nodes. The success of our approach is attributed to a series of specialised techniques that exploit properties unique to classification trees. Whereas algorithms for optimal classification trees have traditionally been plagued by high runtimes and limited scalability, we show in a detailed experimental study that our approach uses only a fraction of the time required by the state of the art and can handle datasets with tens of thousands of instances, providing improvements of several orders of magnitude and notably contributing towards the practical realisation of optimal decision trees.
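
A compact sketch of the dynamic-programming view underlying this line of work is given below; it is not the paper's optimised algorithm (which adds specialised bounds and caching), but it shows the basic recurrence: the best tree of depth d for a set of instances is either a leaf or the best feature split whose two children are solved optimally with depth d - 1. The toy dataset is purely illustrative.

```python
# Sketch: memoised DP recurrence for depth-bounded optimal classification trees.
from functools import lru_cache

# Toy dataset with binary features; rows are (feature tuple, label).
DATA = [((0, 0, 1), 0), ((0, 1, 1), 0), ((1, 0, 0), 1),
        ((1, 1, 0), 1), ((0, 1, 0), 1), ((1, 0, 1), 0)]
N_FEATURES = 3

def leaf_cost(rows):
    """Misclassifications when predicting the majority label."""
    labels = [y for _, y in rows]
    return len(labels) - max(labels.count(0), labels.count(1))

@lru_cache(maxsize=None)
def best(rows, depth):
    rows = list(rows)
    cost = leaf_cost(rows)
    if depth == 0 or cost == 0:
        return cost
    for f in range(N_FEATURES):
        left = tuple(r for r in rows if r[0][f] == 0)
        right = tuple(r for r in rows if r[0][f] == 1)
        if left and right:
            cost = min(cost, best(left, depth - 1) + best(right, depth - 1))
    return cost

print(best(tuple(DATA), depth=2))   # minimum misclassifications for a tree of depth <= 2
```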

This article develops new closed-form variance expressions for power analyses for commonly used difference-in-differences (DID) and comparative interrupted time series (CITS) panel data estimators. The main contribution is to incorporate variation in treatment timing into the analysis. The power formulas also account for other key design features that arise in practice: autocorrelated errors, unequal measurement intervals, and clustering due to the unit of treatment assignment. We consider power formulas for both cross-sectional and longitudinal models and allow for covariates. An illustrative power analysis provides guidance on appropriate sample sizes. The key finding is that accounting for treatment timing increases required sample sizes. Further, DID estimators have considerably more power than standard CITS and ITS estimators. A Shiny R dashboard is available that performs the sample size calculations for the considered estimators.
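
As a generic sketch of only the final step of such a power analysis, the snippet below computes normal-approximation power for a two-sided test given a minimum detectable effect and a standard error; in the article, that standard error would come from the new closed-form DID/CITS variance expressions, which are not reproduced here, and the numbers used are purely illustrative.

```python
# Sketch: normal-approximation power of a two-sided test, given an effect and its SE.
from scipy.stats import norm

def power(effect, se, alpha=0.05):
    z = norm.ppf(1 - alpha / 2)
    t = abs(effect) / se
    return norm.cdf(t - z) + norm.cdf(-t - z)

# Illustrative numbers only: an effect of 0.20 with a standard error of 0.07.
print(round(power(0.20, 0.07), 3))
```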

The major finding of this article is an ensemble method: a novel ranked voting system (and variations of it) that aims to solve the problem of finding the best candidate to represent the voters. The source code is available on GitHub for running realistic, AI-based election simulations that compare different variations of the algorithm with other known algorithms. We present convincing evidence that our algorithm outperforms Instant-Runoff Voting, Preferential Block Voting, Single Transferable Vote, and First Past The Post (provided certain natural conditions supporting the wisdom of the crowds are met). By also comparing with the best individual voter, we demonstrate the wisdom of the crowds, suggesting that democracy (a distributed system) is a better option than dictatorship (a centralized system) when those conditions are met. Voting systems are not restricted to politics; they are also ensemble methods for artificial intelligence, although the context of this article is natural intelligence. It is important to find a system that is fair (e.g., one in which freedom of expression exists on the ballot), especially when the outcome of the voting system has social impact: some voting systems inevitably and unfairly trend, over time, towards the same two major candidates (Duverger's law).
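
For concreteness, the short sketch below implements Instant-Runoff Voting, one of the baseline systems the article compares against; the article's own ranked voting system is not reproduced here, and the ballots are illustrative.

```python
# Sketch: Instant-Runoff Voting (one of the compared baselines, not the article's method).
from collections import Counter

def instant_runoff(ballots):
    candidates = {c for b in ballots for c in b}
    while True:
        counts = Counter(b[0] for b in ballots if b)          # first active choice on each ballot
        leader, votes = counts.most_common(1)[0]
        if votes * 2 > sum(counts.values()):
            return leader                                     # majority reached
        loser = min(candidates, key=lambda c: counts.get(c, 0))
        candidates.discard(loser)                             # eliminate and transfer votes
        ballots = [[c for c in b if c in candidates] for b in ballots]

ballots = [["A", "B", "C"], ["A", "C", "B"], ["B", "C", "A"],
           ["C", "B", "A"], ["C", "B", "A"]]
print(instant_runoff(ballots))
```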

The generalized extreme value (GEV) distribution is a popular model for analyzing and forecasting extreme weather data. To increase prediction accuracy, spatial information is often pooled via a latent Gaussian process on the GEV parameters. Inference for such hierarchical GEV models is typically carried out using Markov chain Monte Carlo (MCMC) methods. However, MCMC can be prohibitively slow and computationally intensive when the number of latent variables is moderate to large. In this paper, we develop a fast Bayesian inference method for spatial GEV models based on the Laplace approximation. Through simulation studies, we compare the speed and accuracy of our method to both MCMC and a more sophisticated but less flexible Bayesian approximation. A case study in forecasting extreme wind speeds is presented.
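
The toy snippet below illustrates the Laplace approximation for the GEV parameters at a single site, assuming scipy's genextreme parameterisation and a simple, illustrative Gaussian prior; the paper's spatial model, which places a latent Gaussian process over these parameters across sites, is omitted, and BFGS's inverse-Hessian estimate is used as a rough stand-in for the exact Hessian at the mode.

```python
# Sketch: Laplace (Gaussian) approximation to the posterior of single-site GEV parameters.
import numpy as np
from scipy import optimize
from scipy.stats import genextreme, norm

rng = np.random.default_rng(3)
data = genextreme.rvs(c=-0.1, loc=20.0, scale=5.0, size=200, random_state=rng)

def neg_log_post(theta):
    c, loc, log_scale = theta
    ll = genextreme.logpdf(data, c, loc=loc, scale=np.exp(log_scale)).sum()
    if not np.isfinite(ll):
        return 1e10                              # keep the objective finite off-support
    prior = norm.logpdf(theta, loc=[0.0, 20.0, 1.0], scale=[0.5, 10.0, 2.0]).sum()
    return -(ll + prior)

res = optimize.minimize(neg_log_post, x0=[0.0, 18.0, 1.5], method="BFGS")
mode, cov = res.x, res.hess_inv                  # mode and approximate covariance at the mode
print(mode)                                      # (shape, location, log-scale) at the mode
print(np.sqrt(np.diag(cov)))                     # approximate posterior standard deviations
```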

We consider off-policy evaluation (OPE) in continuous treatment settings, such as personalized dose-finding. In OPE, one aims to estimate the mean outcome under a new treatment decision rule using historical data generated by a different decision rule. Most existing works on OPE focus on discrete treatment settings. To handle continuous treatments, we develop a novel estimation method for OPE using deep jump learning. The key ingredient of our method lies in adaptively discretizing the treatment space using deep discretization, by leveraging deep learning and multi-scale change point detection. This allows us to apply existing OPE methods in discrete treatments to handle continuous treatments. Our method is further justified by theoretical results, simulations, and a real application to Warfarin Dosing.
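
The schematic sketch below conveys only the discretise-then-evaluate idea: fixed-width dose bins stand in for the adaptive, data-driven discretisation, and a per-bin outcome regression yields a simple direct-method estimate of the value of a new dosing rule. The data, models, and names are illustrative and are not the deep jump learning procedure itself.

```python
# Sketch: discretise a continuous treatment, then evaluate a new rule with a per-bin outcome model.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(4)
n = 2000
X = rng.normal(size=(n, 3))                         # patient covariates
dose = rng.uniform(0, 1, size=n)                    # historical (behaviour) doses
Y = -(dose - 0.3 - 0.2 * X[:, 0]) ** 2 + 0.1 * rng.normal(size=n)

bins = np.linspace(0, 1, 6)                         # 5 fixed-width dose intervals

def bin_of(d):
    return np.clip(np.digitize(d, bins) - 1, 0, len(bins) - 2)

# Outcome model per dose bin: an estimate of E[Y | X, dose in bin j].
models = {j: Ridge().fit(X[bin_of(dose) == j], Y[bin_of(dose) == j])
          for j in range(len(bins) - 1)}

# Direct-method value of a new rule, e.g. dose(x) = 0.3 + 0.2 * x_0.
new_dose = np.clip(0.3 + 0.2 * X[:, 0], 0, 1)
value = np.mean([models[bin_of(d)].predict(x[None])[0]
                 for x, d in zip(X, new_dose)])
print(round(value, 3))
```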

We study the application of iterative first-order methods to the problem of computing equilibria of large-scale two-player extensive-form games. First-order methods must typically be instantiated with a regularizer that serves as a distance-generating function for the decision sets of the players. For the case of two-player zero-sum games, the state-of-the-art theoretical convergence rate for Nash equilibrium is achieved by using the dilated entropy function. In this paper, we introduce a new entropy-based distance-generating function for two-player zero-sum games, and show that this function achieves significantly better strong convexity properties than the dilated entropy, while maintaining the same easily implemented closed-form proximal mapping. Extensive numerical simulations show that these superior theoretical properties translate into better numerical performance as well. We then generalize our new entropy distance function, as well as general dilated distance functions, to the scaled extension operator. The scaled extension operator is a way to recursively construct convex sets, which generalizes the decision polytope of extensive-form games, as well as the convex polytopes corresponding to correlated and team equilibria. By instantiating first-order methods with our regularizers, we develop the first accelerated first-order methods for computing correlated equilibria and ex-ante coordinated team equilibria. Our methods have a guaranteed $1/T$ rate of convergence, along with linear-time proximal updates.
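
As a small illustration of the building block involved, the snippet below gives the closed-form proximal step for the entropy distance-generating function on a single simplex (the familiar multiplicative-weights update); the dilated and scaled-extension constructions described above apply such steps recursively over the players' decision sets, and that recursion is not reproduced here.

```python
# Sketch: closed-form entropy proximal step on a single simplex (mirror-descent update).
import numpy as np

def entropy_prox(x, g, eta):
    """argmin_{z in simplex}  <g, z> + (1/eta) * KL(z || x), in closed form."""
    w = x * np.exp(-eta * g)
    return w / w.sum()

# One step on a tiny 3-action decision point, with gradient g and step size eta.
x = np.array([0.5, 0.3, 0.2])
g = np.array([1.0, -0.5, 0.0])
print(entropy_prox(x, g, eta=0.1))
```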

Robust estimation is much more challenging in high dimensions than it is in one dimension: most techniques either lead to intractable optimization problems or estimators that can tolerate only a tiny fraction of errors. Recent work in theoretical computer science has shown that, in appropriate distributional models, it is possible to robustly estimate the mean and covariance with polynomial-time algorithms that can tolerate a constant fraction of corruptions, independent of the dimension. However, the sample and time complexity of these algorithms is prohibitively large for high-dimensional applications. In this work, we address both of these issues by establishing sample complexity bounds that are optimal, up to logarithmic factors, as well as giving various refinements that allow the algorithms to tolerate a much larger fraction of corruptions. Finally, we show on both synthetic and real data that our algorithms have state-of-the-art performance and make high-dimensional robust estimation a realistic possibility in practice.
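
A deliberately simplified sketch of the filtering approach to robust mean estimation is shown below, assuming (near-)identity covariance for the inliers: while the empirical covariance has an unusually large top eigenvalue, the points that deviate most along the top eigenvector are removed. The thresholds and data are illustrative, and the refinements and sample-complexity guarantees discussed above are not reflected here.

```python
# Sketch: a basic "filtering" robust mean estimator under a near-identity-covariance assumption.
import numpy as np

def filtered_mean(X, threshold=1.5, drop_frac=0.01, max_iter=200):
    X = X.copy()
    for _ in range(max_iter):                        # safety cap for this sketch
        mu = X.mean(axis=0)
        cov = np.cov(X, rowvar=False)
        eigvals, eigvecs = np.linalg.eigh(cov)
        if eigvals[-1] <= threshold:                 # covariance looks "clean"; stop filtering
            break
        v = eigvecs[:, -1]                           # top eigenvector
        scores = ((X - mu) @ v) ** 2
        X = X[scores < np.quantile(scores, 1 - drop_frac)]
    return X.mean(axis=0)

rng = np.random.default_rng(5)
clean = rng.normal(size=(2000, 50))                  # N(0, I) inliers
outliers = rng.normal(loc=3.0, size=(100, 50))       # ~5% corrupted points
X = np.vstack([clean, outliers])
print(np.linalg.norm(X.mean(axis=0)))                # corrupted empirical mean
print(np.linalg.norm(filtered_mean(X)))              # closer to the true mean 0
```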
