
This paper provides a comprehensive error analysis of learning with vector-valued random features (RF). The theory is developed for RF ridge regression in a fully general infinite-dimensional input-output setting, but nonetheless applies to and improves existing finite-dimensional analyses. In contrast to comparable work in the literature, the approach proposed here relies on a direct analysis of the underlying risk functional and completely avoids the explicit RF ridge regression solution formula in terms of random matrices. This removes the need for concentration results in random matrix theory or their generalizations to random operators. The main results established in this paper include strong consistency of vector-valued RF estimators under model misspecification and minimax optimal convergence rates in the well-specified setting. The parameter complexity (number of random features) and sample complexity (number of labeled data) required to achieve such rates are comparable with Monte Carlo intuition and free from logarithmic factors.
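The abstract does not state the estimator's formula. As a rough illustration only, the sketch below assumes vector-valued features phi(x; theta_m) = cos(W_m x + b_m) in R^p that are combined with scalar coefficients alpha_m, and a ridge objective (1/N) sum_n ||y_n - (1/M) sum_m alpha_m phi(x_n; theta_m)||^2 + (lambda/M) ||alpha||^2. The feature map, the scaling, the toy data, and all function names are assumptions made for this example. Note that it solves the normal equations directly; that explicit formula is what the paper's analysis avoids, and it is used here only to compute the estimator.

```python
import numpy as np

def sample_features(num_features, in_dim, out_dim, rng):
    # Hypothetical feature parameters theta_m = (W_m, b_m), drawn i.i.d.
    W = rng.normal(size=(num_features, out_dim, in_dim))
    b = rng.uniform(0.0, 2.0 * np.pi, size=(num_features, out_dim))
    return W, b

def feature_maps(X, W, b):
    # X has shape (N, d). Entry [n, m] of the result is the vector
    # cos(W_m x_n + b_m) in R^p, so the result has shape (N, M, p).
    return np.cos(np.einsum("mpd,nd->nmp", W, X) + b[None, :, :])

def fit_rf_ridge(X, Y, W, b, lam):
    N, M = X.shape[0], W.shape[0]
    Phi = feature_maps(X, W, b) / M             # (N, M, p), includes the 1/M factor
    A = Phi.transpose(0, 2, 1).reshape(-1, M)   # stack outputs: (N*p, M)
    y = Y.reshape(-1)                           # matching stacking: (N*p,)
    # First-order condition of the assumed objective:
    # (A^T A + (N * lam / M) I) alpha = A^T y
    G = A.T @ A + (N * lam / M) * np.eye(M)
    return np.linalg.solve(G, A.T @ y)

def predict(X, W, b, alpha):
    # (1/M) * sum_m alpha_m * phi(x; theta_m)
    return np.einsum("nmp,m->np", feature_maps(X, W, b), alpha) / W.shape[0]

# Toy data: 2-dimensional inputs, 2-dimensional outputs.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 2))
Y = np.stack([np.sin(X[:, 0]), X[:, 0] * X[:, 1]], axis=1)
W, b = sample_features(50, 2, 2, rng)
alpha = fit_rf_ridge(X, Y, W, b, lam=1e-3)
train_mse = np.mean((predict(X, W, b, alpha) - Y) ** 2)
print("training MSE:", train_mse)
```

Here M = 50 plays the role of the parameter complexity and N = 200 the sample complexity mentioned in the abstract; the specific values are arbitrary.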

Related content

Ridge regression


Minimax · Sparsity · Estimator · Estimation error · Specialization

July 18, 2023

Minimax Rates for High-dimensional Double Sparse Structure over $\ell_u(\ell_q)$-balls

Zhifan Li,Yanhang Zhang,Jianxin Yin

from arxiv, 49 pages, 6 figures

In this paper, we focus on the high-dimensional double sparse structure, where the parameter of interest simultaneously encourages group-wise sparsity and element-wise sparsity in each group. By combining the Gilbert-Varshamov bound and its variants, we develop a novel lower bound technique for the metric entropy of the parameter space, specifically tailored for the double sparse structure over $\ell_u(\ell_q)$-balls with $u,q \in [0,1]$. We prove lower bounds on the estimation error using an information-theoretic approach, leveraging our proposed lower bound technique and Fano's inequality. To complement the lower bounds, we establish matching upper bounds through a direct analysis of constrained least-squares estimators and utilize results from empirical processes. A significant finding of our study is the discovery of a phase transition phenomenon in the minimax rates for $u,q \in (0, 1]$. Furthermore, we extend the theoretical results to the double sparse regression model and determine its minimax rate for estimation error. To tackle double sparse linear regression, we develop the DSIHT (Double Sparse Iterative Hard Thresholding) algorithm, demonstrating its optimality in the minimax sense. Finally, we demonstrate the superiority of our method through numerical experiments.

Rank · Learning · MoDELS · Processing (programming language) · Optimizer

July 18, 2023

Unified Off-Policy Learning to Rank: a Reinforcement Learning Perspective

Zeyu Zhang,Yi Su,Hui Yuan,Yiran Wu,Rishab Balasubramanian,Qingyun Wu,Huazheng Wang,Mengdi Wang

Off-policy Learning to Rank (LTR) aims to optimize a ranker from data collected by a deployed logging policy. However, existing off-policy learning to rank methods often make strong assumptions about how users generate the click data, i.e., the click model, and hence need to tailor their methods specifically under different click models. In this paper, we unify the ranking process under general stochastic click models as a Markov Decision Process (MDP), so that the optimal ranking can be learned directly with offline reinforcement learning (RL). Building upon this, we leverage offline RL techniques for off-policy LTR and propose the Click Model-Agnostic Unified Off-policy Learning to Rank (CUOLR) method, which can be easily applied to a wide range of click models. Through a dedicated formulation of the MDP, we show that offline RL algorithms can adapt to various click models without complex debiasing techniques and prior knowledge of the model. Results on various large-scale datasets demonstrate that CUOLR consistently outperforms the state-of-the-art off-policy learning to rank algorithms while maintaining consistency and robustness under different click models.

Boosting (a model training acceleration technique) · Greedy · Approximation · Seven · TransE

July 17, 2023

Grammar Boosting: A New Technique for Proving Lower Bounds for Computation over Compressed Data

Rajat De,Dominik Kempa

Grammar compression is a general compression framework in which a string $T$ of length $N$ is represented as a context-free grammar of size $n$ whose language contains only $T$. In this paper, we focus on studying the limitations of algorithms and data structures operating on strings in grammar-compressed form. Previous work focused on proving lower bounds for grammars constructed using algorithms that achieve the approximation ratio $\rho=\mathcal{O}(\text{polylog }N)$. Unfortunately, for the majority of grammar compressors, $\rho$ is either unknown or satisfies $\rho=\omega(\text{polylog }N)$. In their seminal paper, Charikar et al. [IEEE Trans. Inf. Theory 2005] studied seven popular grammar compression algorithms: RePair, Greedy, LongestMatch, Sequential, Bisection, LZ78, and $\alpha$-Balanced. Only one of them ($\alpha$-Balanced) is known to achieve $\rho=\mathcal{O}(\text{polylog }N)$. We develop the first technique for proving lower bounds for data structures and algorithms on grammars that is fully general and does not depend on the approximation ratio $\rho$ of the used grammar compressor. Using this technique, we first prove that $\Omega(\log N/\log \log N)$ time is required for random access on RePair, Greedy, LongestMatch, Sequential, and Bisection, while $\Omega(\log\log N)$ time is required for random access to LZ78. All these lower bounds hold within space $\mathcal{O}(n\text{ polylog }N)$ and match the existing upper bounds. We also generalize this technique to prove several conditional lower bounds for compressed computation. For example, we prove that unless the Combinatorial $k$-Clique Conjecture fails, there is no combinatorial algorithm for CFG parsing on Bisection (for which it holds $\rho=\tilde{\Theta}(N^{1/2})$) that runs in $\mathcal{O}(n^c\cdot N^{3-\epsilon})$ time for all constants $c>0$ and $\epsilon>0$. Previously, this was known only for $c<2\epsilon$.
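To make the random-access problem concrete, here is a minimal sketch of a grammar in which every rule is either a single character or a pair of nonterminals, together with a naive access routine that walks from the start symbol down to one leaf. It runs in time proportional to the height of the parse tree and is not one of the data structures behind the upper bounds mentioned in the abstract. The rule encoding and the names `expansion_lengths` and `access` are assumptions made for this example.

```python
def expansion_lengths(rules, start):
    # Length of the string each nonterminal expands to, computed once.
    lengths = {}

    def length(sym):
        if sym not in lengths:
            rhs = rules[sym]
            if isinstance(rhs, str):      # terminal rule: X -> 'c'
                lengths[sym] = 1
            else:                         # binary rule: X -> Y Z
                lengths[sym] = length(rhs[0]) + length(rhs[1])
        return lengths[sym]

    length(start)
    return lengths

def access(rules, lengths, start, i):
    # Return character i of the (never materialized) string T.
    if not 0 <= i < lengths[start]:
        raise IndexError(i)
    sym = start
    while not isinstance(rules[sym], str):
        left, right = rules[sym]
        if i < lengths[left]:
            sym = left                    # position lies in the left child
        else:
            i -= lengths[left]            # shift into the right child
            sym = right
    return rules[sym]

# A grammar of size n = 5 whose language contains only T = "abababab" (N = 8).
rules = {
    "A": "a",
    "B": "b",
    "C": ("A", "B"),
    "D": ("C", "C"),
    "E": ("D", "D"),
}
lengths = expansion_lengths(rules, "E")
text = "".join(access(rules, lengths, "E", i) for i in range(lengths["E"]))
print(text)
```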

Active learning · Learning · Bayes optimal classifier · SimPLe · Functional

July 17, 2023

A Two-Stage Active Learning Algorithm for $k$-Nearest Neighbors

Nick Rittler,Kamalika Chaudhuri

$k$-nearest neighbor classification is a popular non-parametric method because of desirable properties like automatic adaption to distributional scale changes. Unfortunately, it has thus far proved difficult to design active learning strategies for the training of local voting-based classifiers that naturally retain these desirable properties, and hence active learning strategies for $k$-nearest neighbor classification have been conspicuously missing from the literature. In this work, we introduce a simple and intuitive active learning algorithm for the training of $k$-nearest neighbor classifiers, the first in the literature which retains the concept of the $k$-nearest neighbor vote at prediction time. We provide consistency guarantees for a modified $k$-nearest neighbors classifier trained on samples acquired via our scheme, and show that when the conditional probability function $\mathbb{P}(Y=y|X=x)$ is sufficiently smooth and the Tsybakov noise condition holds, our actively trained classifiers converge to the Bayes optimal classifier at a faster asymptotic rate than passively trained $k$-nearest neighbor classifiers.
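For reference, the $k$-nearest neighbor vote at prediction time can be sketched as below. This is the plain, passively trained classifier with a Euclidean metric and a majority vote; it is not the two-stage active learning algorithm or the modified classifier from the paper. The function name `knn_predict` and the toy data are assumptions made for this example.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k):
    # Rank training points by Euclidean distance to the query.
    order = sorted(
        range(len(train_points)),
        key=lambda i: math.dist(train_points[i], query),
    )
    # Majority vote among the k closest labels.
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Two well-separated clusters with labels 0 and 1.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = [0, 0, 0, 1, 1, 1]
print(knn_predict(points, labels, (0.2, 0.2), k=3))
print(knn_predict(points, labels, (5.5, 5.5), k=3))
```

An active learning scheme would choose which points receive labels before this vote is taken; the vote itself stays unchanged.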

Estimator · Trace · Analysis · MoDELS · COVID-19

July 17, 2023

Parameter estimation for contact tracing in graph-based models

Augustine Okolie,Johannes Müller,Mirjam Kretzschmar

from arxiv, 19 pages, 8 figures, 3 tables

We adopt a maximum-likelihood framework to estimate parameters of a stochastic susceptible-infected-recovered (SIR) model with contact tracing on a rooted random tree. Given the number of detectees per index case, our estimator allows us to determine the degree distribution of the random tree as well as the tracing probability. Since we do not discover all infectees via contact tracing, this estimation is non-trivial. To keep things simple and stable, we develop an approximation suited for realistic situations (contact tracing probability small, or the probability for the detection of index cases small). In this approximation, the only epidemiological parameter entering the estimator is $R_0$. The estimator is tested in a simulation study and is furthermore applied to COVID-19 contact tracing data from India. The simulation study underlines the efficiency of the method. For the empirical COVID-19 data, we compare different degree distributions and perform a sensitivity analysis. We find that particularly a power-law and a negative binomial degree distribution fit the data well and that the tracing probability is rather large. The sensitivity analysis shows no strong dependency of the estimates on the reproduction number. Finally, we discuss the relevance of our findings.

Quadratic programming · Support vector machine · Support vector · Vectorization · Factored

July 15, 2023

A Nearly-Linear Time Algorithm for Structured Support Vector Machines

Yuzhou Gu,Zhao Song,Lichen Zhang

from arxiv, arXiv admin note: text overlap with arXiv:2211.06033

Quadratic programming is a fundamental problem in the field of convex optimization. Many practical tasks can be formulated as quadratic programming, for example, the support vector machine (SVM). Linear SVM was one of the most popular tools in machine learning over the three decades before deep learning methods came to dominate. In general, a quadratic program has input size $\Theta(n^2)$ (where $n$ is the number of variables), and thus takes $\Omega(n^2)$ time to solve. Nevertheless, quadratic programs coming from SVMs have input size $O(n)$, allowing the possibility of designing nearly-linear time algorithms. Two important classes of SVMs are programs admitting low-rank kernel factorizations and low-treewidth programs. Low-treewidth convex optimization has gained increasing interest in the past few years (e.g., linear programming [Dong, Lee and Ye 2021] and semidefinite programming [Gu and Song 2022]). Therefore, an important open question is whether there exist nearly-linear time algorithms for quadratic programs with these nice structures. In this work, we provide the first nearly-linear time algorithm for solving quadratic programming with low-rank factorization or low-treewidth, and a small number of linear constraints. Our results imply nearly-linear time algorithms for low-treewidth or low-rank SVMs.

Sample complexity · Learning · Noise · Sample · Excess error

July 13, 2023

Near-Optimal Bounds for Learning Gaussian Halfspaces with Random Classification Noise

Ilias Diakonikolas,Jelena Diakonikolas,Daniel M. Kane,Puqian Wang,Nikos Zarifis

We study the problem of learning general (i.e., not necessarily homogeneous) halfspaces with Random Classification Noise under the Gaussian distribution. We establish nearly-matching algorithmic and Statistical Query (SQ) lower bound results revealing a surprising information-computation gap for this basic problem. Specifically, the sample complexity of this learning problem is $\widetilde{\Theta}(d/\epsilon)$, where $d$ is the dimension and $\epsilon$ is the excess error. Our positive result is a computationally efficient learning algorithm with sample complexity $\tilde{O}(d/\epsilon + d/(\max\{p, \epsilon\})^2)$, where $p$ quantifies the bias of the target halfspace. On the lower bound side, we show that any efficient SQ algorithm (or low-degree test) for the problem requires sample complexity at least $\Omega(d^{1/2}/(\max\{p, \epsilon\})^2)$. Our lower bound suggests that this quadratic dependence on $1/\epsilon$ is inherent for efficient algorithms.

Estimator · Factored · Biased · Variance · Mutually independent

July 13, 2023

Leveraging Factored Action Spaces for Off-Policy Evaluation

Aaman Rebello,Shengpu Tang,Jenna Wiens,Sonali Parbhoo

from arxiv, Main paper: 8 pages, 7 figures. Appendix: 30 pages, 17 figures. Accepted at ICML 2023 Workshop on Counterfactuals in Minds and Machines, Honolulu, Hawaii, USA. Camera ready version

Off-policy evaluation (OPE) aims to estimate the benefit of following a counterfactual sequence of actions, given data collected from executed sequences. However, existing OPE estimators often exhibit high bias and high variance in problems involving large, combinatorial action spaces. We investigate how to mitigate this issue using factored action spaces, i.e., expressing each action as a combination of independent sub-actions from smaller action spaces. This approach facilitates a finer-grained analysis of how actions differ in their effects. In this work, we propose a new family of "decomposed" importance sampling (IS) estimators based on factored action spaces. Given certain assumptions on the underlying problem structure, we prove that the decomposed IS estimators have less variance than their original non-decomposed versions, while preserving the property of zero bias. Through simulations, we empirically verify our theoretical results, probing the validity of various assumptions. Provided with a technique that can derive the action space factorisation for a given problem, our work shows that OPE can be improved "for free" by utilising this inherent problem structure.
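The abstract does not give the estimators' formulas, so the following is only a toy illustration of the idea in a one-step setting. It assumes an action made of two binary sub-actions, behaviour and evaluation policies that factorize over the sub-actions, and a reward that is a sum of per-sub-action rewards, each observed separately. Under those assumptions the usual IS weight (a product of ratios) can be replaced by one ratio per reward component. All names and numbers are hypothetical.

```python
import random

def is_estimate(samples, pi_e, pi_b):
    # Standard IS: weight the total reward by the full action's ratio.
    total = 0.0
    for (a1, a2), (r1, r2) in samples:
        rho = (pi_e[0][a1] * pi_e[1][a2]) / (pi_b[0][a1] * pi_b[1][a2])
        total += rho * (r1 + r2)
    return total / len(samples)

def decomposed_is_estimate(samples, pi_e, pi_b):
    # Decomposed IS (assumed form): weight each reward component by the
    # ratio of its own sub-action only.
    total = 0.0
    for (a1, a2), (r1, r2) in samples:
        total += pi_e[0][a1] / pi_b[0][a1] * r1
        total += pi_e[1][a2] / pi_b[1][a2] * r2
    return total / len(samples)

# Hypothetical factorized policies: pi[d][a] is the probability that
# sub-action d takes value a.
pi_b = ([0.5, 0.5], [0.8, 0.2])   # behaviour (logging) policy
pi_e = ([0.9, 0.1], [0.3, 0.7])   # evaluation (target) policy
reward_1 = [1.0, 0.0]             # reward component driven by sub-action 1
reward_2 = [0.0, 2.0]             # reward component driven by sub-action 2

def sample_log(n, seed=0):
    # Draw n logged interactions from the behaviour policy.
    rng = random.Random(seed)
    log = []
    for _ in range(n):
        a1 = rng.choices([0, 1], weights=pi_b[0])[0]
        a2 = rng.choices([0, 1], weights=pi_b[1])[0]
        log.append(((a1, a2), (reward_1[a1], reward_2[a2])))
    return log

log = sample_log(5000)
print("standard IS:  ", is_estimate(log, pi_e, pi_b))
print("decomposed IS:", decomposed_is_estimate(log, pi_e, pi_b))
```

In this toy setting both estimators have the same expectation, namely the value of the evaluation policy; how their variances compare in general depends on the structural assumptions studied in the paper.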

Graph · Learned · MoDELS · Extensibility · Deep learning

February 24, 2022

Bayesian Deep Learning for Graphs

Federico Errica

from arxiv, PhD Thesis

The adaptive processing of structured data is a long-standing research topic in machine learning that investigates how to automatically learn a mapping from a structured input to outputs of various nature. Recently, there has been an increasing interest in the adaptive processing of graphs, which led to the development of different neural network-based methodologies. In this thesis, we take a different route and develop a Bayesian Deep Learning framework for graph learning. The dissertation begins with a review of the principles over which most of the methods in the field are built, followed by a study on graph classification reproducibility issues. We then proceed to bridge the basic ideas of deep learning for graphs with the Bayesian world, by building our deep architectures in an incremental fashion. This framework allows us to consider graphs with discrete and continuous edge features, producing unsupervised embeddings rich enough to reach the state of the art on several classification tasks. Our approach is also amenable to a Bayesian nonparametric extension that automates the choice of almost all of the model's hyper-parameters. Two real-world applications demonstrate the efficacy of deep learning for graphs. The first concerns the prediction of information-theoretic quantities for molecular simulations with supervised neural models. After that, we exploit our Bayesian models to solve a malware-classification task while being robust to intra-procedural code obfuscation techniques. We conclude the dissertation with an attempt to blend the best of the neural and Bayesian worlds together. The resulting hybrid model is able to predict multimodal distributions conditioned on input graphs, with the consequent ability to model stochasticity and uncertainty better than most works. Overall, we aim to provide a Bayesian perspective into the articulated research field of deep learning for graphs.

Discretization · Graph · Graphics processing unit · Neural Networks · Networking

March 28, 2019

Learning Discrete Structures for Graph Neural Networks

Luca Franceschi,Mathias Niepert,Massimiliano Pontil,Xiao He

from arxiv, 18 pages

Graph neural networks (GNNs) are a popular class of machine learning models whose major advantage is their ability to incorporate a sparse and discrete dependency structure between data points. Unfortunately, GNNs can only be used when such a graph-structure is available. In practice, however, real-world graphs are often noisy and incomplete or might not be available at all. With this work, we propose to jointly learn the graph structure and the parameters of graph convolutional networks (GCNs) by approximately solving a bilevel program that learns a discrete probability distribution on the edges of the graph. This allows one to apply GCNs not only in scenarios where the given graph is incomplete or corrupted but also in those where a graph is not available. We conduct a series of experiments that analyze the behavior of the proposed method and demonstrate that it outperforms related methods by a significant margin.