A sharp, distribution-free, non-asymptotic result is proved for the concentration of a random function around the mean function, when the randomization is generated by a finite sequence of independent data and the random functions satisfy uniform bounded variation assumptions. The specific motivation for the work comes from the need for inference on the distributional impacts of social policy intervention. However, the family of randomized functions that we study is broad enough to cover wide-ranging applications. For example, we provide a Kolmogorov-Smirnov-like test for randomized functions that are almost surely Lipschitz continuous, and novel tools for inference with heterogeneous treatment effects. A Dvoretzky-Kiefer-Wolfowitz-like inequality is also provided for the sum of almost surely monotone random functions, extending the famous non-asymptotic work of Massart for empirical cumulative distribution functions generated by i.i.d. data, to settings without micro-clusters proposed by Canay, Santos, and Shaikh. We illustrate the relevance of our theoretical results for applied work via empirical applications. Notably, the proof of our main concentration result relies on a novel stochastic rendition of the fundamental result of Debreu, generally dubbed the "gap lemma," that transforms discontinuous utility representations of preorders into continuous utility representations, and on an envelope theorem of an infinite-dimensional optimisation problem that we carefully construct.
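For orientation, the classical i.i.d. result that the abstract says it extends can be sketched numerically. The Dvoretzky-Kiefer-Wolfowitz inequality with Massart's constant states that P(sup_x |F_n(x) - F(x)| > eps) <= 2 exp(-2 n eps^2), which yields a uniform confidence band around the empirical CDF. The function names below are illustrative assumptions, and the sketch covers only the classical i.i.d. case, not the paper's generalization.

```python
import math
from bisect import bisect_right


def dkw_half_width(n, alpha):
    """Half-width eps with 2 * exp(-2 * n * eps**2) == alpha (classical i.i.d. DKW)."""
    if n <= 0 or not 0.0 < alpha < 1.0:
        raise ValueError("need n > 0 and 0 < alpha < 1")
    return math.sqrt(math.log(2.0 / alpha) / (2.0 * n))


def ecdf_band(sample, alpha):
    """Return (x, lower, upper) for each distinct sample value.

    The band is the empirical CDF plus/minus the DKW half-width,
    clipped to [0, 1].
    """
    xs = sorted(sample)
    n = len(xs)
    eps = dkw_half_width(n, alpha)
    band = []
    for x in sorted(set(xs)):
        f_n = bisect_right(xs, x) / n  # fraction of observations <= x
        band.append((x, max(0.0, f_n - eps), min(1.0, f_n + eps)))
    return band
```

Calling `ecdf_band(data, 0.05)` gives a band that covers the true CDF everywhere with probability at least 95% when the data are i.i.d.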

Related content

Functional

Text classification · Large language models · Base learner · Base · Learner

February 12, 2024

Pushing The Limit of LLM Capacity for Text Classification

Yazhou Zhang,Mengyao Wang,Chenyu Ren,Qiuchi Li,Prayag Tiwari,Benyou Wang,Jing Qin

The value of future research on text classification has become uncertain, given the extraordinary efficacy demonstrated by large language models (LLMs) across numerous downstream NLP tasks. In this era of open-ended language modeling, where task boundaries are gradually fading, an urgent question emerges: have we made significant advances in text classification under the full benefit of LLMs? To answer this question, we propose RGPT, an adaptive boosting framework tailored to produce a specialized text classification LLM by recurrently ensembling a pool of strong base learners. The base learners are constructed by adaptively adjusting the distribution of training samples and iteratively fine-tuning LLMs with them. Such base learners are then ensembled to be a specialized text classification LLM, by recurrently incorporating the historical predictions from the previous learners. Through a comprehensive empirical comparison, we show that RGPT significantly outperforms 8 SOTA PLMs and 7 SOTA LLMs on four benchmarks by 1.36% on average. Further evaluation experiments show that RGPT clearly surpasses human classification.
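The phrase "adaptively adjusting the distribution of training samples" can be illustrated with the classical binary AdaBoost reweighting step. This is a minimal sketch of that textbook rule only; the abstract does not state RGPT's exact update, so the function name and the choice of rule are assumptions.

```python
import math


def adaboost_reweight(weights, correct):
    """One round of classical binary AdaBoost sample reweighting.

    weights: current sample weights, assumed to sum to 1.
    correct: booleans, True where the current base learner was right.
    Returns (alpha, new_weights), where alpha is the learner's vote weight.
    """
    err = sum(w for w, ok in zip(weights, correct) if not ok)
    if not 0.0 < err < 0.5:
        raise ValueError("weighted error must lie strictly between 0 and 0.5")
    alpha = 0.5 * math.log((1.0 - err) / err)
    # Shrink weights of correctly classified samples, grow the rest.
    scaled = [w * math.exp(-alpha if ok else alpha)
              for w, ok in zip(weights, correct)]
    total = sum(scaled)
    return alpha, [w / total for w in scaled]
```

After each round the misclassified samples carry half of the total weight, so the next base learner is pushed toward the examples the previous one got wrong.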

Robustness · Approximation · Scenario · Numerical analysis

February 10, 2024

Numerical Solution of Nonclassical Boundary Value Problems

Paola Boito,Yuli Eidelman,Luca Gemignani

We provide a new approach to obtain solutions of certain evolution equations set in a Banach space and equipped with nonlocal boundary conditions. From this approach we derive a family of numerical schemes for the approximation of the solutions. We show by numerical tests that these schemes are numerically robust and computationally efficient.

Dynamical systems · Learning · Functional · Optimizer · Scenario

February 9, 2024

The Complexity of Sequential Prediction in Dynamical Systems

Vinod Raman,Unique Subedi,Ambuj Tewari

from arxiv, 35 pages

We study the problem of learning to predict the next state of a dynamical system when the underlying evolution function is unknown. Unlike previous work, we place no parametric assumptions on the dynamical system, and study the problem from a learning theory perspective. We define new combinatorial measures and dimensions and show that they quantify the optimal mistake and regret bounds in the realizable and agnostic setting respectively.

Online · Minimax · Supervised · Class · Loss

February 9, 2024

A Combinatorial Characterization of Supervised Online Learnability

Vinod Raman,Unique Subedi,Ambuj Tewari

from arxiv, 20 pages. arXiv admin note: text overlap with arXiv:2306.06247

We study the online learnability of hypothesis classes with respect to arbitrary, but bounded loss functions. No characterization of online learnability is known at this level of generality. We give a new scale-sensitive combinatorial dimension, named the sequential minimax dimension, and show that it gives a tight quantitative characterization of online learnability. In addition, we show that the sequential minimax dimension subsumes most existing combinatorial dimensions in online learning theory.

MoDELS · Inversion · Missing values · Channel · Extensibility

February 9, 2024

Probabilistic Forecasting of Irregular Time Series via Conditional Flows

Vijaya Krishna Yalavarthi,Randolf Scholz,Stefan Born,Lars Schmidt-Thieme

Probabilistic forecasting of irregularly sampled multivariate time series with missing values is an important problem in many fields, including health care, astronomy, and climate. State-of-the-art methods for the task estimate only marginal distributions of observations in single channels and at single timepoints, assuming a fixed-shape parametric distribution. In this work, we propose a novel model, ProFITi, for probabilistic forecasting of irregularly sampled time series with missing values using conditional normalizing flows. The model learns joint distributions over the future values of the time series conditioned on past observations and queried channels and times, without assuming any fixed shape of the underlying distribution. As model components, we introduce a novel invertible triangular attention layer and an invertible non-linear activation function on and onto the whole real line. We conduct extensive experiments on four datasets and demonstrate that the proposed model provides $4$ times higher likelihood over the previously best model.
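As background for the conditional normalizing flow mentioned above, the standard change-of-variables identity can be written as follows. The symbols are assumptions introduced here for illustration: $y$ is the vector of future values, $c$ the conditioning information (past observations and queried channels and times), $f(\cdot\,; c)$ an invertible map to a latent variable, and $p_Z$ a fixed base density.

$$\log p(y \mid c) = \log p_Z\big(f(y; c)\big) + \log \left| \det \frac{\partial f(y; c)}{\partial y} \right|$$

When the Jacobian of $f$ is triangular, the determinant reduces to the product of its diagonal entries, which keeps the likelihood cheap to evaluate.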

Functional · Machine Learning · Performer · Analysis · Learning

February 9, 2024

Depth Functions for Partial Orders with a Descriptive Analysis of Machine Learning Algorithms

Hannah Blocher,Georg Schollmeyer,Christoph Jansen,Malte Nalenz

from arxiv, Accepted to ISIPTA 2023; Forthcoming in: Proceedings of Machine Learning Research

We propose a framework for descriptively analyzing sets of partial orders based on the concept of depth functions. Despite intensive studies of depth functions in linear and metric spaces, there is very little discussion on depth functions for non-standard data types such as partial orders. We introduce an adaptation of the well-known simplicial depth to the set of all partial orders, the union-free generic (ufg) depth. Moreover, we utilize our ufg depth for a comparison of machine learning algorithms based on multidimensional performance measures. Concretely, we analyze the distribution of different classifier performances over a sample of standard benchmark data sets. Our results promisingly demonstrate that our approach differs substantially from existing benchmarking approaches and, therefore, adds a new perspective to the vivid debate on the comparison of classifiers.

Sample · Denoising · Energy function · Functional · Mutually independent

February 9, 2024

Iterated Denoising Energy Matching for Sampling from Boltzmann Densities

Tara Akhound-Sadegh,Jarrid Rector-Brooks,Avishek Joey Bose,Sarthak Mittal,Pablo Lemos,Cheng-Hao Liu,Marcin Sendera,Siamak Ravanbakhsh,Gauthier Gidel,Yoshua Bengio,Nikolay Malkin,Alexander Tong

from arxiv, Code for iDEM is available at //github.com/jarridrb/dem

Efficiently generating statistically independent samples from an unnormalized probability distribution, such as equilibrium samples of many-body systems, is a foundational problem in science. In this paper, we propose Iterated Denoising Energy Matching (iDEM), an iterative algorithm that uses a novel stochastic score matching objective leveraging solely the energy function and its gradient -- and no data samples -- to train a diffusion-based sampler. Specifically, iDEM alternates between (I) sampling regions of high model density from a diffusion-based sampler and (II) using these samples in our stochastic matching objective to further improve the sampler. iDEM is scalable to high dimensions because the inner matching objective is simulation-free and requires no MCMC samples. Moreover, by leveraging the fast mode mixing behavior of diffusion, iDEM smooths out the energy landscape, enabling efficient exploration and learning of an amortized sampler. We evaluate iDEM on a suite of tasks ranging from standard synthetic energy functions to invariant $n$-body particle systems. We show that the proposed approach achieves state-of-the-art performance on all metrics and trains $2-5\times$ faster, which allows it to be the first method to train using energy on the challenging $55$-particle Lennard-Jones system.

Mixture of experts · Buffer (company) · Overflow · MoDELS · GROUP

February 8, 2024

Buffer Overflow in Mixture of Experts

Jamie Hayes,Ilia Shumailov,Itay Yona

Mixture of Experts (MoE) has become a key ingredient for scaling large foundation models while keeping inference costs steady. We show that expert routing strategies that have cross-batch dependencies are vulnerable to attacks. Malicious queries can be sent to a model and can affect a model's output on other benign queries if they are grouped in the same batch. We demonstrate this via a proof-of-concept attack in a toy experimental setting.
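A cross-batch dependency of the kind described above can be illustrated with a toy router. The sketch assumes a hypothetical capacity-limited scheme in which each expert accepts a fixed number of tokens per batch and later tokens fall back to their next choice. It is not the authors' attack or any real library's routing code.

```python
def route_batch(preferences, capacity):
    """Assign each token to its most preferred expert that still has room.

    preferences: per-token expert rankings, most preferred first.
    capacity: maximum number of tokens each expert accepts per batch.
    Returns the assigned expert per token (None if every choice is full).
    """
    load = {}
    assignment = []
    for ranking in preferences:
        chosen = None
        for expert in ranking:
            if load.get(expert, 0) < capacity:
                chosen = expert
                load[expert] = load.get(expert, 0) + 1
                break
        assignment.append(chosen)
    return assignment


benign = [0, 1]  # the benign token prefers expert 0, then expert 1

# Alone in its batch, the benign token gets its first choice.
alone = route_batch([benign], capacity=1)

# An earlier token in the same batch fills expert 0 first,
# so the benign token is pushed to expert 1.
crowded = route_batch([[0, 1], benign], capacity=1)
```

The benign token's input is identical in both calls, yet its expert (and therefore its output) depends on what else shares the batch.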

Gaussian mixture (model) · Estimation/estimator · Functional · Loss function (machine learning) · Mixture of experts

February 7, 2024

On Parameter Estimation in Deviated Gaussian Mixture of Experts

Huy Nguyen,Khai Nguyen,Nhat Ho

from arxiv, 34 pages, 3 figures

We consider the parameter estimation problem in the deviated Gaussian mixture of experts in which the data are generated from $(1 - \lambda^{\ast}) g_0(Y| X)+ \lambda^{\ast} \sum_{i = 1}^{k_{\ast}} p_{i}^{\ast} f(Y|(a_{i}^{\ast})^{\top}X+b_i^{\ast},\sigma_{i}^{\ast})$, where $X, Y$ are respectively a covariate vector and a response variable, $g_{0}(Y|X)$ is a known function, $\lambda^{\ast} \in [0, 1]$ is the true but unknown mixing proportion, and $(p_{i}^{\ast}, a_{i}^{\ast}, b_{i}^{\ast}, \sigma_{i}^{\ast})$ for $1 \leq i \leq k_{\ast}$ are unknown parameters of the Gaussian mixture of experts. This problem arises from the goodness-of-fit test when we would like to test whether the data are generated from $g_{0}(Y|X)$ (null hypothesis) or they are generated from the whole mixture (alternative hypothesis). Based on the algebraic structure of the expert functions and the distinguishability between $g_0$ and the mixture part, we construct novel Voronoi-based loss functions to capture the convergence rates of maximum likelihood estimation (MLE) for our models. We further demonstrate that our proposed loss functions characterize the local convergence rates of parameter estimation more accurately than the generalized Wasserstein, a loss function commonly used for estimating parameters in the Gaussian mixture of experts.
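The data-generating density in this abstract can be evaluated directly. The sketch below is a plain transcription of the formula under two assumptions that the abstract does not settle: $f$ is taken to be the univariate Gaussian density, and $\sigma_i$ is treated as a standard deviation (the paper may parameterize it as a variance). Function and argument names are illustrative.

```python
import math


def gaussian_pdf(y, mean, std):
    """Univariate Gaussian density; std is a standard deviation (assumption)."""
    z = (y - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def deviated_moe_density(y, x, g0, lam, experts):
    """Evaluate (1 - lam) * g0(y | x) + lam * sum_i p_i * f(y | a_i^T x + b_i, sigma_i).

    x: covariate vector (list of floats).
    g0: known density, called as g0(y, x).
    lam: mixing proportion in [0, 1].
    experts: list of (p_i, a_i, b_i, sigma_i), with the p_i summing to 1.
    """
    mixture = 0.0
    for p, a, b, sigma in experts:
        mean = sum(a_j * x_j for a_j, x_j in zip(a, x)) + b
        mixture += p * gaussian_pdf(y, mean, sigma)
    return (1.0 - lam) * g0(y, x) + lam * mixture
```

Setting `lam=0.0` recovers `g0` (the null hypothesis of the goodness-of-fit test), while `lam=1.0` gives the pure Gaussian mixture of experts.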

Networking · Residual networks · Scaling · Weight · Smoothing

May 25, 2021

Scaling Properties of Deep Residual Networks

Alain-Sam Cohen,Rama Cont,Alain Rossier,Renyuan Xu

from arxiv, Published at ICML 2021

Residual networks (ResNets) have displayed impressive results in pattern recognition and, recently, have garnered considerable theoretical interest due to a perceived link with neural ordinary differential equations (neural ODEs). This link relies on the convergence of network weights to a smooth function as the number of layers increases. We investigate the properties of weights trained by stochastic gradient descent and their scaling with network depth through detailed numerical experiments. We observe the existence of scaling regimes markedly different from those assumed in neural ODE literature. Depending on certain features of the network architecture, such as the smoothness of the activation function, one may obtain an alternative ODE limit, a stochastic differential equation or neither of these. These findings cast doubts on the validity of the neural ODE model as an adequate asymptotic description of deep ResNets and point to an alternative class of differential equations as a better description of the deep network limit.
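The link to neural ODEs referred to above is usually motivated by the following heuristic, written here with assumed notation ($h_k$ for the hidden state at layer $k$, $\theta_k$ for that layer's weights, $L$ for the depth). If the residual update is scaled by $1/L$ and the weights sample a smooth function, $\theta_k = \theta(k/L)$, then

$$h_{k+1} = h_k + \frac{1}{L}\, f(h_k, \theta_k), \qquad k = 0, \dots, L-1,$$

is an explicit Euler discretization of

$$\frac{dh}{dt} = f\big(h(t), \theta(t)\big), \qquad t \in [0, 1].$$

The abstract reports that trained weights need not follow this scaling, which is why other limits can arise.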