国产一本二本三本的区别视频_亚洲国产原创精品国语一区_人人射人人都有操_日韩中文字幕在线视频一区_欧美一区二区激情_亚洲人成网站日韩在线观看_91精品国自产拍在线观看不卡

Predictor screening rules, which discard predictors from the design matrix before fitting a model, have had considerable impact on the speed with which l1-regularized regression problems, such as the lasso, can be solved. Current state-of-the-art screening rules, however, have difficulties in dealing with highly-correlated predictors, often becoming too conservative. In this paper, we present a new screening rule to deal with this issue: the Hessian Screening Rule. The rule uses second-order information from the model to provide more accurate screening as well as higher-quality warm starts. The proposed rule outperforms all studied alternatives on data sets with high correlation for both l1-regularized least-squares (the lasso) and logistic regression. It also performs best overall on the real data sets that we examine.

相關內容

預測器/決策(ce)函數

關注 1

樣例 · 標注 · 噪聲 · 樣本 · 情景 ·

2022 年 4 月 20 日

Quantity vs Quality: Investigating the Trade-Off between Sample Size and Label Reliability

Timo Bertram,Johannes Fürnkranz,Martin Müller

from arxiv, Preliminary work under review for ICML 2022

In this paper, we study learning in probabilistic domains where the learner may receive incorrect labels but can improve the reliability of labels by repeatedly sampling them. In such a setting, one faces the problem of whether the fixed budget for obtaining training examples should rather be used for obtaining all different examples or for improving the label quality of a smaller number of examples by re-sampling their labels. We motivate this problem in an application to compare the strength of poker hands where the training signal depends on the hidden community cards, and then study it in depth in an artificial setting where we insert controlled noise levels into the MNIST database. Our results show that with increasing levels of noise, resampling previous examples becomes increasingly more important than obtaining new examples, as classifier performance deteriorates when the number of incorrect labels is too high. In addition, we propose two different validation strategies; switching from lower to higher validations over the course of training and using chi-square statistics to approximate the confidence in obtained labels.

情景 · 近似 · 成對型 · 線性分類 · 泛化理論 ·

2022 年 4 月 20 日

Conditional Gradients for the Approximately Vanishing Ideal

E. Wirth,S. Pokutta

from arxiv, 22 pages

The vanishing ideal of a set of points $X\subseteq \mathbb{R}^n$ is the set of polynomials that evaluate to $0$ over all points $\mathbf{x} \in X$ and admits an efficient representation by a finite set of polynomials called generators. To accommodate the noise in the data set, we introduce the Conditional Gradients Approximately Vanishing Ideal algorithm (CGAVI) for the construction of the set of generators of the approximately vanishing ideal. The constructed set of generators captures polynomial structures in data and gives rise to a feature map that can, for example, be used in combination with a linear classifier for supervised learning. In CGAVI, we construct the set of generators by solving specific instances of (constrained) convex optimization problems with the Pairwise Frank-Wolfe algorithm (PFW). Among other things, the constructed generators inherit the LASSO generalization bound and not only vanish on the training but also on out-sample data. Moreover, CGAVI admits a compact representation of the approximately vanishing ideal by constructing few generators with sparse coefficient vectors.

學習器 · 泛函 · 學成 · Performer · 估計/估計量 ·

2022 年 4 月 19 日

Practical considerations for specifying a super learner

Rachael V. Phillips,Mark J. van der Laan,Hana Lee,Susan Gruber

from arxiv, This article has been submitted for publication as an Education Corner article in the International Journal of Epidemiology published by Oxford University Press

Common tasks encountered in epidemiology, including disease incidence estimation and causal inference, rely on predictive modeling. Constructing a predictive model can be thought of as learning a prediction function, i.e., a function that takes as input covariate data and outputs a predicted value. Many strategies for learning these functions from data are available, from parametric regressions to machine learning algorithms. It can be challenging to choose an approach, as it is impossible to know in advance which one is the most suitable for a particular dataset and prediction task at hand. The super learner (SL) is an algorithm that alleviates concerns over selecting the one "right" strategy while providing the freedom to consider many of them, such as those recommended by collaborators, used in related research, or specified by subject-matter experts. It is an entirely pre-specified and data-adaptive strategy for predictive modeling. To ensure the SL is well-specified for learning the prediction function, the analyst does need to make a few important choices. In this Education Corner article, we provide step-by-step guidelines for making these choices, walking the reader through each of them and providing intuition along the way. In doing so, we aim to empower the analyst to tailor the SL specification to their prediction task, thereby ensuring their SL performs as well as possible. A flowchart provides a concise, easy-to-follow summary of key suggestions and heuristics, based on our accumulated experience, and guided by theory.

方差 · Machine Translation · 同分布的 · 數據集 · Performer ·

2022 年 4 月 19 日

Investigating Data Variance in Evaluations of Automatic Machine Translation Metrics

Jiannan Xiang,Huayang Li,Yahui Liu,Lemao Liu,Guoping Huang,Defu Lian,Shuming Shi

from arxiv, Findings of ACL 2022

Current practices in metric evaluation focus on one single dataset, e.g., Newstest dataset in each year's WMT Metrics Shared Task. However, in this paper, we qualitatively and quantitatively show that the performances of metrics are sensitive to data. The ranking of metrics varies when the evaluation is conducted on different datasets. Then this paper further investigates two potential hypotheses, i.e., insignificant data points and the deviation of Independent and Identically Distributed (i.i.d) assumption, which may take responsibility for the issue of data variance. In conclusion, our findings suggest that when evaluating automatic translation metrics, researchers should take data variance into account and be cautious to claim the result on a single dataset, because it may leads to inconsistent results with most of other datasets.

MoDELS · 學成 · 分離的 · 相互獨立的 · 黑盒 ·

2022 年 4 月 18 日

Separating Rule Discovery and Global Solution Composition in a Learning Classifier System

Michael Heider,Helena Stegherr,Jonathan Wurth,Roman Sraj,J?rg H?hner

from arxiv, Genetic and Evolutionary Computation Conference Companion (GECCO '22 Companion), July 9--13, 2022, Boston, MA, USA

While utilization of digital agents to support crucial decision making is increasing, trust in suggestions made by these agents is hard to achieve. However, it is essential to profit from their application, resulting in a need for explanations for both the decision making process and the model. For many systems, such as common black-box models, achieving at least some explainability requires complex post-processing, while other systems profit from being, to a reasonable extent, inherently interpretable. We propose a rule-based learning system specifically conceptualised and, thus, especially suited for these scenarios. Its models are inherently transparent and easily interpretable by design. One key innovation of our system is that the rules' conditions and which rules compose a problem's solution are evolved separately. We utilise independent rule fitnesses which allows users to specifically tailor their model structure to fit the given requirements for explainability.

線性的 · 子集搜索 · MoDELS · 損失函數（機器學習） · INFORMS ·

2022 年 4 月 18 日

Subset selection for linear mixed models

Daniel R. Kowal

Linear mixed models (LMMs) are instrumental for regression analysis with structured dependence, such as grouped, clustered, or multilevel data. However, selection among the covariates--while accounting for this structured dependence--remains a challenge. We introduce a Bayesian decision analysis for subset selection with LMMs. Using a Mahalanobis loss function that incorporates the structured dependence, we derive optimal linear coefficients for (i) any given subset of variables and (ii) all subsets of variables that satisfy a cardinality constraint. Crucially, these estimates inherit shrinkage or regularization and uncertainty quantification from the underlying Bayesian model, and apply for any well-specified Bayesian LMM. More broadly, our decision analysis strategy deemphasizes the role of a single "best" subset, which is often unstable and limited in its information content, and instead favors a collection of near-optimal subsets. This collection is summarized by key member subsets and variable-specific importance metrics. Customized subset search and out-of-sample approximation algorithms are provided for more scalable computing. These tools are applied to simulated data and a longitudinal physical activity dataset, and demonstrate excellent prediction, estimation, and selection ability.

可辨認的 · 查準率/準確率 · 向量化 · 情景 · 論文 ·

2022 年 4 月 18 日

Safe rules for the identification of zeros in the solutions of the SLOPE problem

Clément Elvira,Cédric Herzet

from arxiv, 26 pages, 3 figures

In this paper we propose a methodology to accelerate the resolution of the so-called "Sorted L-One Penalized Estimation" (SLOPE) problem. Our method leverages the concept of "safe screening", well-studied in the literature for \textit{group-separable} sparsity-inducing norms, and aims at identifying the zeros in the solution of SLOPE. More specifically, we derive a set of $\tfrac{n(n+1)}{2}$ inequalities for each element of the $n$-dimensional primal vector and prove that the latter can be safely screened if some subsets of these inequalities are verified. We propose moreover an efficient algorithm to jointly apply the proposed procedure to all the primal variables. Our procedure has a complexity $\mathcal{O}(n\log n + LT)$ where $T\leq n$ is a problem-dependent constant and $L$ is the number of zeros identified by the tests. Numerical experiments confirm that, for a prescribed computational budget, the proposed methodology leads to significant improvements of the solving precision.

學成 · 全 · 可約的 · 原點 · 樣本 ·

2022 年 4 月 15 日

Evaluating the Effectiveness of Corrective Demonstrations and a Low-Cost Sensor for Dexterous Manipulation

Abhineet Jain,Jack Kolb,J. M. Abbess IV,Harish Ravichandar

from arxiv, Accepted in the Machine Learning in Human-Robot Collaboration Workshop at the 17th Annual ACM/IEEE International Conference on Human-Robot Interaction (HRI 2022). The first two authors contributed equally. 6 pages, 7 figures

Imitation learning is a promising approach to help robots acquire dexterous manipulation capabilities without the need for a carefully-designed reward or a significant computational effort. However, existing imitation learning approaches require sophisticated data collection infrastructure and struggle to generalize beyond the training distribution. One way to address this limitation is to gather additional data that better represents the full operating conditions. In this work, we investigate characteristics of such additional demonstrations and their impact on performance. Specifically, we study the effects of corrective and randomly-sampled additional demonstrations on learning a policy that guides a five-fingered robot hand through a pick-and-place task. Our results suggest that corrective demonstrations considerably outperform randomly-sampled demonstrations, when the proportion of additional demonstrations sampled from the full task distribution is larger than the number of original demonstrations sampled from a restrictive training distribution. Conversely, when the number of original demonstrations are higher than that of additional demonstrations, we find no significant differences between corrective and randomly-sampled additional demonstrations. These results provide insights into the inherent trade-off between the effort required to collect corrective demonstrations and their relative benefits over randomly-sampled demonstrations. Additionally, we show that inexpensive vision-based sensors, such as LeapMotion, can be used to dramatically reduce the cost of providing demonstrations for dexterous manipulation tasks. Our code is available at //github.com/GT-STAR-Lab/corrective-demos-dexterous-manipulation.

自編碼器 · 收縮自編碼器 · 描述符 · 可約的 · 卷積 ·

2022 年 4 月 15 日

Condition-Invariant and Compact Visual Place Description by Convolutional Autoencoder

Hanjing Ye,Weinan Chen,Jingwen Yu,Li He,Yisheng Guan,Hong Zhang

from arxiv, under review in Journal Intelligent and Robotic Systems (JINT), 2022

Visual place recognition (VPR) in condition-varying environments is still an open problem. Popular solutions are CNN-based image descriptors, which have been shown to outperform traditional image descriptors based on hand-crafted visual features. However, there are two drawbacks of current CNN-based descriptors: a) their high dimension and b) lack of generalization, leading to low efficiency and poor performance in applications. In this paper, we propose to use a convolutional autoencoder (CAE) to tackle this problem. We employ a high-level layer of a pre-trained CNN to generate features, and train a CAE to map the features to a low-dimensional space to improve the condition invariance property of the descriptor and reduce its dimension at the same time. We verify our method in three challenging datasets involving significant illumination changes, and our method is shown to be superior to the state-of-the-art. For the benefit of the community, we make public the source code.

INFORMS · 表示定理 · 可交換的 · 相對熵 · 查全率/召回率 ·

2022 年 4 月 14 日

Information in probability: Another information-theoretic proof of a finite de Finetti theorem

Lampros Gavalakis,Ioannis Kontoyiannis

from arxiv, Small changes from the previous version, including a few more references and clarifications in the Introduction

We recall some of the history of the information-theoretic approach to deriving core results in probability theory and indicate parts of the recent resurgence of interest in this area with current progress along several interesting directions. Then we give a new information-theoretic proof of a finite version of de Finetti's classical representation theorem for finite-valued random variables. We derive an upper bound on the relative entropy between the distribution of the first $k$ in a sequence of $n$ exchangeable random variables, and an appropriate mixture over product distributions. The mixing measure is characterised as the law of the empirical measure of the original sequence, and de Finetti's result is recovered as a corollary. The proof is nicely motivated by the Gibbs conditioning principle in connection with statistical mechanics, and it follows along an appealing sequence of steps. The technical estimates required for these steps are obtained via the use of a collection of combinatorial tools known within information theory as `the method of types.'