There are many high-dimensional function classes that have fast agnostic learning algorithms when assumptions on the distribution of examples can be made, such as Gaussianity or uniformity over the domain. But how can one be confident that data indeed satisfies such an assumption, so that one can trust the output of the agnostic learning algorithm? We propose a model for systematically studying the design of tester-learner pairs $(\mathcal{A},\mathcal{T})$, such that if the distribution on examples in the data passes the tester $\mathcal{T}$ then one can safely trust the output of the agnostic learner $\mathcal{A}$ on the data. To demonstrate the power of the model, we apply it to the classical problem of agnostically learning halfspaces under the standard Gaussian distribution and present a tester-learner pair with combined run-time of $n^{\tilde{O}(1/\epsilon^4)}$. This qualitatively matches the best known ordinary agnostic learning algorithms for this task. In contrast, finite-sample Gaussianity testers do not exist for the $L_1$ and EMD distance measures. A key step is to show that halfspaces are well-approximated by low-degree polynomials relative to distributions whose low-degree moments are close to those of a Gaussian. We also go beyond spherically symmetric distributions and give a tester-learner pair for halfspaces under the uniform distribution on $\{0,1\}^n$ with combined run-time of $n^{\tilde{O}(1/\epsilon^4)}$; this is achieved using polynomial approximation theory and critical-index machinery. We also show that there exist well-studied settings where $2^{\tilde{O}(\sqrt{n})}$-time agnostic learning algorithms are available, yet the combined run-time of any tester-learner pair must be as high as $2^{\Omega(n)}$. On that account, the design of tester-learner pairs is a research direction in its own right, independent of standard agnostic learning.
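Concretely, the learner side of such a pair can be instantiated by the classical $L_1$ polynomial regression approach: fit a low-degree polynomial to the labels in $L_1$ norm, then threshold it. Below is a minimal Python sketch under illustrative assumptions (degree-2 features, a Powell-based $L_1$ fit, a median threshold); it is a toy instance of the technique, not the paper's exact algorithm.

```python
# A minimal sketch of L1 polynomial regression for learning halfspaces.
# The degree-2 feature map, optimizer, and threshold rule are illustrative.
import numpy as np
from scipy.optimize import minimize

def poly_features(X):
    """Degree-2 monomial features: 1, x_i, x_i * x_j (i <= j)."""
    n, d = X.shape
    cols = [np.ones(n)] + [X[:, i] for i in range(d)]
    cols += [X[:, i] * X[:, j] for i in range(d) for j in range(i, d)]
    return np.column_stack(cols)

def l1_poly_regression(X, y):
    """Fit p minimizing the empirical L1 error E|p(x) - y|, then threshold."""
    Phi = poly_features(X)
    w0 = np.zeros(Phi.shape[1])
    res = minimize(lambda w: np.abs(Phi @ w - y).mean(), w0, method="Powell")
    t = np.median(Phi @ res.x)  # threshold chosen on the training sample
    return lambda Xnew: np.sign(poly_features(Xnew) @ res.x - t)

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 3))            # Gaussian examples, as the tester checks
y = np.sign(X @ np.array([1.0, -2.0, 0.5]))  # a halfspace label
predict = l1_poly_regression(X, y)
print("train error:", np.mean(predict(X) != y))
```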
We study and develop multilevel methods for the numerical approximation of a log-concave probability measure $\pi$ on $\mathbb{R}^d$, based on the (over-damped) Langevin diffusion. Continuing \cite{art:egeapanloup2021multilevel}, which concentrated on the uniformly log-concave setting, we here study the procedure in the absence of the uniformity assumption. More precisely, we first adapt an idea of \cite{art:DalalyanRiouKaragulyan} by adding a penalization term to the potential to recover the uniformly convex setting. This approach leads to an \textit{$\varepsilon$-complexity} of order $\varepsilon^{-5} \pi(|.|^2)^{3} d$ (up to logarithmic terms). Then, in the spirit of \cite{art:gadat2020cost}, we explore the robustness of the method in a weakly convex parametric setting where the lowest eigenvalue of the Hessian of the potential $U$ is controlled by the function $U(x)^{-r}$ for $r \in (0,1)$. In this intermediary framework between the strongly convex setting ($r=0$) and the ``Laplace case'' ($r=1$), we show that, with the help of a control of the exponential moments of the Euler scheme, we can adapt some fundamental properties needed for the efficiency of the method. In the ``best'' setting, where $U$ is ${\mathcal{C}}^3$ and $U(x)^{-r}$ controls the largest eigenvalue of the Hessian, we obtain an $\varepsilon$-complexity of order $c_{\rho,\delta}\varepsilon^{-2-\rho} d^{1+\frac{\rho}{2}+(4-\rho+\delta) r}$ for any $\rho>0$ (but with a constant $c_{\rho,\delta}$ that increases as $\rho$ and $\delta$ go to $0$).
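For intuition, the penalization idea amounts to running the Euler (ULA) scheme on the modified potential $U(x) + \frac{\alpha}{2}|x|^2$, which is uniformly convex for any $\alpha > 0$. A minimal sketch, assuming a toy weakly convex potential and an illustrative penalization strength:

```python
# A minimal sketch of the penalized Euler/ULA scheme:
# dX = -grad(U(X) + alpha/2 |X|^2) dt + sqrt(2) dW, discretized with step `step`.
# The toy potential and the value of `alpha` are illustrative assumptions.
import numpy as np

def penalized_ula(grad_U, x0, step, n_steps, alpha, rng):
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        drift = -(grad_U(x) + alpha * x)  # penalized drift
        x = x + step * drift + np.sqrt(2 * step) * rng.standard_normal(x.shape)
    return x

rng = np.random.default_rng(0)
grad_U = lambda x: np.sign(x) * x**2  # gradient of |x|^3/3: convex, not uniformly convex
samples = np.array([penalized_ula(grad_U, np.zeros(2), 1e-2, 5000, 1e-3, rng)
                    for _ in range(100)])
print("empirical second moment:", (samples**2).sum(axis=1).mean())
```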
We propose new classes of tests for the Pareto type I distribution using the empirical characteristic function. These tests are $U$- and $V$-statistics based on a characterisation of the Pareto distribution involving the distribution of the sample minimum. In addition to deriving simple computational forms for the proposed test statistics, we prove consistency against a wide range of fixed alternatives. A Monte Carlo study is included in which the newly proposed tests are shown to produce high powers; these include results for fixed alternatives as well as local powers against mixture distributions. The use of the proposed tests is illustrated on an observed data set.
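To make the ingredients concrete, here is a hedged sketch of an ecf-based $V$-type statistic for Pareto-ness: under $H_0$, $Y_j = \hat{\beta} \log(X_j / \min_i X_i)$ is approximately standard exponential, so one can compare the empirical characteristic function of the $Y_j$ with the exponential cf $1/(1 - it)$ under an integrable weight. The transform and weight below are illustrative choices, not the paper's exact characterisation.

```python
# A hedged sketch of an ecf-based goodness-of-fit statistic for Pareto type I.
# The log transform, weight function, and numerical integration are illustrative.
import numpy as np

def pareto_ecf_stat(x, a=1.0):
    grid = np.linspace(-20, 20, 4001)
    xmin = x.min()
    beta_hat = len(x) / np.log(x / xmin).sum()    # MLE of the shape parameter
    y = beta_hat * np.log(x / xmin)               # ~ Exp(1) under H0
    ecf = np.exp(1j * np.outer(grid, y)).mean(axis=1)
    cf0 = 1.0 / (1.0 - 1j * grid)                 # standard exponential cf
    w = np.exp(-a * np.abs(grid))                 # integrable weight
    return len(x) * np.trapz(np.abs(ecf - cf0) ** 2 * w, grid)

rng = np.random.default_rng(0)
x0 = rng.pareto(2.5, 200) + 1.0          # Pareto type I sample (sigma = 1)
x1 = rng.lognormal(0.0, 1.0, 200) + 1.0  # an alternative
print(pareto_ecf_stat(x0), pareto_ecf_stat(x1))  # larger value under the alternative
```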
This paper considers the problem of system identification (ID) of linear and nonlinear non-autonomous systems from noisy and sparse data. We propose and analyze an objective function derived from a Bayesian formulation for learning a hidden Markov model with stochastic dynamics. We then analyze this objective function in the context of several state-of-the-art approaches for both linear and nonlinear system ID. In the former, we analyze least-squares approaches for Markov parameter estimation, and in the latter, we analyze the multiple shooting approach. We demonstrate the limitations of the optimization problems posed by these existing methods by showing that they can be seen as special cases of the proposed optimization objective under certain simplifying assumptions: conditional independence of data and zero model error. Furthermore, we observe that our proposed approach has improved smoothness and inherent regularization that make it well suited for system ID, and we provide mathematical explanations for the origins of these characteristics. Finally, numerical simulations demonstrate a mean squared error more than 8.7 times lower than that of multiple shooting when data are noisy and/or sparse. Moreover, the proposed approach can identify accurate and generalizable models even when there are more parameters than data or when the underlying system exhibits chaotic behavior.
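For context, a minimal sketch of the multiple-shooting baseline that the proposed objective generalizes: the trajectory is split into segments with free initial states, and continuity defects between consecutive segments are penalized alongside the data-fit term. The cubic model class, segment length, and defect weight are illustrative assumptions.

```python
# A minimal multiple-shooting sketch for nonlinear system ID (the comparison
# baseline, not the paper's proposed Bayesian objective).
import numpy as np
from scipy.optimize import minimize

def simulate(theta, x0, steps, dt=0.1):
    x, traj = float(x0), []
    for _ in range(steps):
        x = x + dt * (theta[0] * x + theta[1] * x**3)  # assumed scalar model class
        traj.append(x)
    return np.array(traj)

def ms_objective(z, y, seg, lam=10.0):
    theta, x0s = z[:2], z[2:]
    loss = 0.0
    for k, x0 in enumerate(x0s):
        pred = simulate(theta, x0, seg)
        loss += np.sum((pred - y[k * seg:(k + 1) * seg]) ** 2)  # data fit
        if k + 1 < len(x0s):
            loss += lam * (pred[-1] - x0s[k + 1]) ** 2          # continuity defect
    return loss

rng = np.random.default_rng(0)
y = simulate([-0.5, -0.1], 1.0, 40) + 0.05 * rng.standard_normal(40)
seg = 10
z0 = np.concatenate([[0.0, 0.0], y[::seg]])  # free segment states, warm-started from data
res = minimize(ms_objective, z0, args=(y, seg))
print("estimated theta:", res.x[:2])
```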
Amplitude filtering is concerned with identifying basis states in a superposition whose amplitudes are greater than a specified threshold; probability filtering is defined analogously for probabilities. Given the scarcity of qubits, the focus of this work is to design log-space algorithms for these problems. Both algorithms follow a similar pattern: estimate the amplitude (or, for the latter problem, the probability) of each state in superposition, compare each estimate against the threshold to set a flag qubit upon success, and finally apply amplitude amplification to states in which the flag is set. We show how to implement each step using very few qubits by designing three subroutines. Our first algorithm performs amplitude amplification even when the ``good state'' operator has a small probability of being incorrect; here we improve upon the space complexity of previously known algorithms. Our second algorithm performs ``true amplitude estimation'' in roughly the same complexity as ``amplitude estimation'', which actually estimates a probability rather than an amplitude. Our third algorithm performs amplitude estimation in parallel (in superposition), which is difficult when each estimation branch involves a different oracle. As an immediate reward, the above algorithms for the filtering problems directly improve the upper bounds on the space-bounded query complexity of problems such as non-linearity estimation of Boolean functions and $k$-distinctness.
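To fix ideas, here is a purely classical simulation of the *semantics* of probability filtering (sampling-based estimates compared against a threshold); it illustrates what the quantum algorithm computes, not its log-space implementation.

```python
# A classical simulation of probability filtering: estimate each basis state's
# probability by sampling, then flag states whose estimate clears the threshold.
# The shot count and state are illustrative.
import numpy as np

def probability_filter(state, tau, shots=20000, rng=np.random.default_rng(0)):
    probs = np.abs(state) ** 2
    counts = rng.multinomial(shots, probs)   # measurement simulation
    estimates = counts / shots               # analogue of amplitude/probability estimation
    return np.flatnonzero(estimates >= tau)  # analogue of setting the flag qubit

state = np.array([0.1, 0.7, 0.1, 0.7])  # unnormalized for readability
state = state / np.linalg.norm(state)
print(probability_filter(state, tau=0.3))  # expect the two heavy basis states
```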
We review common situations in Bayesian latent variable models where the prior distribution that a researcher specifies does not match the prior distribution that the estimation method uses. These situations can arise from the positive definite requirement on correlation matrices, from the sign indeterminacy of factor loadings, and from order constraints on threshold parameters. The issue is especially problematic for reproducibility and for model checks that involve prior distributions, including prior predictive assessment and Bayes factors. In these cases, one might be assessing the wrong model, casting doubt on the relevance of the results. The most straightforward solution to these issues sometimes involves the use of informative prior distributions. We explore other solutions and make recommendations for practice.
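As a small illustration of the mismatch, placing independent Uniform$(-1,1)$ priors on the entries of a $3 \times 3$ correlation matrix and letting the estimation method reject non-positive-definite draws yields an effective marginal prior that is visibly more concentrated than the one specified:

```python
# A small sketch of the prior-mismatch phenomenon for correlation matrices:
# the PD requirement changes the prior the estimation method actually uses.
import numpy as np

rng = np.random.default_rng(0)
kept = []
for _ in range(20000):
    r12, r13, r23 = rng.uniform(-1, 1, 3)
    R = np.array([[1, r12, r13], [r12, 1, r23], [r13, r23, 1]])
    if np.all(np.linalg.eigvalsh(R) > 0):  # the positive definite requirement
        kept.append(r12)

print("specified prior sd:", np.sqrt(1 / 3))  # Uniform(-1,1) has sd ~ 0.577
print("effective prior sd:", np.std(kept))    # noticeably smaller after rejection
```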
Popular iterative algorithms such as boosting methods and coordinate descent on linear models converge to the maximum-$\ell_1$-margin classifier, a.k.a. the sparse hard-margin SVM, in high-dimensional regimes where the data is linearly separable. Previous works consistently show that many estimators relying on the $\ell_1$-norm achieve improved statistical rates for hard-sparse ground truths. We show that, surprisingly, this adaptivity does not extend to the maximum-$\ell_1$-margin classifier in a standard discriminative setting. In particular, for the noiseless setting, we prove tight upper and lower bounds for the prediction error that match the existing rate of order $\frac{\|w^*\|_1^{2/3}}{n^{1/3}}$ for general ground truths. To complete the picture, we show that when interpolating noisy observations, the error vanishes at a rate of order $\frac{1}{\sqrt{\log(d/n)}}$. We are therefore the first to show benign overfitting for the maximum-$\ell_1$-margin classifier.
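The maximum-$\ell_1$-margin classifier itself can be computed exactly by linear programming: maximize the margin $\gamma$ subject to $y_i \langle x_i, w \rangle \ge \gamma$ and $\|w\|_1 \le 1$. A minimal sketch (splitting $w = u - v$ with $u, v \ge 0$ to linearize the $\ell_1$ constraint; the data-generating setup is illustrative):

```python
# Computing the maximum-l1-margin classifier by linear programming.
import numpy as np
from scipy.optimize import linprog

def max_l1_margin(X, y):
    n, d = X.shape
    # variables: [u (d), v (d), gamma]; maximize gamma <=> minimize -gamma
    c = np.concatenate([np.zeros(2 * d), [-1.0]])
    # margin constraints: -y_i x_i^T (u - v) + gamma <= 0
    A_margin = np.hstack([-y[:, None] * X, y[:, None] * X, np.ones((n, 1))])
    # l1 constraint: sum(u) + sum(v) <= 1
    A_l1 = np.concatenate([np.ones(2 * d), [0.0]])[None, :]
    res = linprog(c, A_ub=np.vstack([A_margin, A_l1]),
                  b_ub=np.concatenate([np.zeros(n), [1.0]]),
                  bounds=[(0, None)] * (2 * d) + [(None, None)])
    u, v, gamma = res.x[:d], res.x[d:2 * d], res.x[-1]
    return u - v, gamma

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 200))                     # d >> n: separable regime
w_star = np.zeros(200); w_star[:3] = [1.0, -1.0, 0.5]  # hard-sparse ground truth
y = np.sign(X @ w_star)
w_hat, margin = max_l1_margin(X, y)
print("margin:", margin, "support size:", np.sum(np.abs(w_hat) > 1e-8))
```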
Neural networks often make predictions by relying on spurious correlations in their datasets rather than on the intrinsic properties of the task of interest, suffering sharp degradation on out-of-distribution (OOD) test data. Existing de-bias learning frameworks try to capture specific dataset biases using annotations, but they fail to handle complicated OOD scenarios. Others implicitly identify dataset biases through specially designed low-capability biased models or losses, but they degrade when the training and testing data come from the same distribution. In this paper, we propose a General Greedy De-bias learning framework (GGD), which greedily trains the biased models and the base model. The base model is encouraged to focus on examples that are hard to solve with the biased models, thus remaining robust against spurious correlations at test time. GGD largely improves models' OOD generalization ability on various tasks, but it sometimes over-estimates the bias level and degrades on in-distribution tests. We therefore re-analyze the ensemble process of GGD and introduce Curriculum Regularization, inspired by curriculum learning, which achieves a good trade-off between in-distribution and out-of-distribution performance. Extensive experiments on image classification, adversarial question answering, and visual question answering demonstrate the effectiveness of our method. GGD can learn a more robust base model both with task-specific biased models built from prior knowledge and with a self-ensemble biased model requiring no prior knowledge.
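A hedged sketch of the greedy idea in PyTorch: a trained biased model scores each example, and the base model's per-example losses are up-weighted where the biased model is wrong. The specific weighting rule is an illustrative stand-in for GGD's ensemble formulation, not the paper's exact loss.

```python
# A minimal de-bias training step: the base model focuses on examples the
# biased model finds hard. Models and the weighting rule are illustrative.
import torch
import torch.nn.functional as F

def debias_step(base, biased, x, y, optimizer):
    with torch.no_grad():
        p_bias = F.softmax(biased(x), dim=1).gather(1, y[:, None]).squeeze(1)
    weights = 1.0 - p_bias  # hard-for-biased examples count more
    loss = (weights * F.cross_entropy(base(x), y, reduction="none")).mean()
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()

# usage sketch with toy linear models
base = torch.nn.Linear(16, 3)
biased = torch.nn.Linear(16, 3)  # assume already trained to capture the bias
opt = torch.optim.SGD(base.parameters(), lr=0.1)
x, y = torch.randn(32, 16), torch.randint(0, 3, (32,))
print(debias_step(base, biased, x, y, opt))
```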
We present prompt distribution learning for effectively adapting a pre-trained vision-language model to downstream recognition tasks. Our method not only learns low-bias prompts from a few samples but also captures the distribution of diverse prompts to handle varying visual representations. In this way, we provide high-quality task-related content to facilitate recognition. This prompt distribution learning is realized by an efficient approach that learns the output embeddings of prompts instead of the input embeddings. We can thus model them effectively with a Gaussian distribution and derive a surrogate loss for efficient training. Extensive experiments on 12 datasets demonstrate that our method consistently and significantly outperforms existing methods. For example, with 1 sample per category, it improves the average result by 9.1% relative to human-crafted prompts.
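A hedged sketch of the core modeling step: keep a learnable Gaussian (mean and diagonal standard deviation) over each class's prompt *output* embedding and draw reparameterized samples during training. The shapes, the frozen-feature setup, and the cross-entropy surrogate are illustrative assumptions, not the paper's exact loss.

```python
# Modeling prompt output embeddings with a per-class Gaussian.
import torch
import torch.nn.functional as F

class GaussianPrompts(torch.nn.Module):
    def __init__(self, n_classes, dim):
        super().__init__()
        self.mu = torch.nn.Parameter(torch.randn(n_classes, dim) * 0.02)
        self.log_sigma = torch.nn.Parameter(torch.full((n_classes, dim), -3.0))

    def forward(self, image_feats, temperature=0.07):
        eps = torch.randn_like(self.mu)
        prompts = self.mu + eps * self.log_sigma.exp()  # reparameterized sample
        prompts = F.normalize(prompts, dim=-1)
        image_feats = F.normalize(image_feats, dim=-1)
        return image_feats @ prompts.t() / temperature  # class logits

model = GaussianPrompts(n_classes=10, dim=512)
feats = torch.randn(8, 512)  # stand-in for frozen-encoder image features
logits = model(feats)
loss = F.cross_entropy(logits, torch.randint(0, 10, (8,)))
loss.backward()
```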
The dominant NLP paradigm of training a strong neural predictor to perform one task on a specific dataset has led to state-of-the-art performance in a variety of applications (e.g., sentiment classification, span-prediction-based question answering, or machine translation). However, it builds upon the assumption that the data distribution is stationary, i.e., that the data is sampled from a fixed distribution both at training and test time. This way of training is inconsistent with how we as humans are able to learn from and operate within a constantly changing stream of information. Moreover, it is ill-adapted to real-world use cases where the data distribution is expected to shift over the course of a model's lifetime. The first goal of this thesis is to characterize the different forms this shift can take in the context of natural language processing and to propose benchmarks and evaluation metrics to measure its effect on current deep learning architectures. We then proceed to take steps to mitigate the effect of distributional shift on NLP models. To this end, we develop methods based on parametric reformulations of the distributionally robust optimization framework. Empirically, we demonstrate that these approaches yield more robust models on a selection of realistic problems. In the third and final part of this thesis, we explore ways of efficiently adapting existing models to new domains or tasks. Our contribution here takes inspiration from information geometry to derive a new gradient update rule that alleviates catastrophic forgetting during adaptation.
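A minimal sketch of one parametric DRO reformulation (an illustrative instance, not the thesis's exact method): an adversary reweights examples through a small parametric scorer, and the predictor minimizes the worst-case reweighted loss.

```python
# Parametric distributionally robust training via adversarial reweighting.
# The linear models, data, and alternating updates are illustrative.
import torch
import torch.nn.functional as F

model = torch.nn.Linear(16, 2)
adversary = torch.nn.Linear(16, 1)  # parametric weight model over examples
opt_m = torch.optim.SGD(model.parameters(), lr=0.1)
opt_a = torch.optim.SGD(adversary.parameters(), lr=0.1)

x, y = torch.randn(64, 16), torch.randint(0, 2, (64,))
for _ in range(10):
    # adversary ascent step: up-weight high-loss examples
    losses = F.cross_entropy(model(x), y, reduction="none").detach()
    w = F.softmax(adversary(x).squeeze(1), dim=0)
    opt_a.zero_grad(); (-(w * losses).sum()).backward(); opt_a.step()
    # model descent step under the (fixed) adversarial weights
    w = F.softmax(adversary(x).squeeze(1), dim=0).detach()
    losses = F.cross_entropy(model(x), y, reduction="none")
    opt_m.zero_grad(); (w * losses).sum().backward(); opt_m.step()
```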
This paper aims to mitigate straggler effects in synchronous distributed learning for multi-agent reinforcement learning (MARL) problems. Stragglers arise frequently in distributed learning systems due to various system disturbances, such as slow-downs or failures of compute nodes and communication bottlenecks. To resolve this issue, we propose a coded distributed learning framework, which speeds up the training of MARL algorithms in the presence of stragglers while maintaining the same accuracy as the centralized approach. As an illustration, a coded distributed version of the multi-agent deep deterministic policy gradient (MADDPG) algorithm is developed and evaluated. Different coding schemes, including maximum distance separable (MDS) codes, random sparse codes, replication-based codes, and regular low-density parity-check (LDPC) codes, are investigated. Simulations on several multi-robot problems demonstrate the promising performance of the proposed framework.
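A hedged sketch of the simplest scheme mentioned, replication-based coding: each data partition is stored on two workers, so the learner can reconstruct the full gradient from any subset of workers that covers all partitions, even if one straggles. The assignment, model, and toy gradient are illustrative assumptions.

```python
# Replication-based coding against stragglers, on a toy least-squares problem.
import numpy as np

def grad_on(data, w):
    X, y = data
    return X.T @ (X @ w - y) / len(y)  # per-partition mean gradient

def worker(partitions, assigned, w):
    # a worker computes gradients for every partition it holds
    return {i: grad_on(partitions[i], w) for i in assigned}

rng = np.random.default_rng(0)
X, y = rng.standard_normal((120, 5)), rng.standard_normal(120)
partitions = [(X[i::3], y[i::3]) for i in range(3)]
assignment = [(0, 1), (1, 2), (2, 0)]  # each partition replicated on two workers
w = np.zeros(5)

# pretend worker 1 straggles: workers 0 and 2 still cover all partitions
replies = {}
for k in (0, 2):
    replies.update(worker(partitions, assignment[k], w))
decoded = sum(replies[i] for i in range(3)) / 3
exact = sum(grad_on(p, w) for p in partitions) / 3
print(np.allclose(decoded, exact))  # full gradient recovered despite the straggler
```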