
We study the problem of maximum likelihood (ML) estimation for statistical models defined by reflexive polytopes. Our focus is on the maximum likelihood degree of these models as an algebraic measure of complexity of the corresponding optimization problem. We compute the ML degrees of all 4319 classes of three-dimensional reflexive polytopes, and observe some surprising behavior in terms of the presence of gaps between ML degrees and degrees of the associated toric varieties. We interpret these drops in the context of discriminants and prove formulas for the ML degree for families of reflexive polytopes, including the hypercube and its dual, the cross polytope, in arbitrary dimension. In particular, we determine a family of embeddings for the $d$-cube that implies ML degree one. Finally, we discuss generalized constructions of families of reflexive polytopes in terms of their ML degrees.

Related content


Decomposable · Decision trees · Learning · PAC learning · PAC learning theory

July 1, 2024

Superconstant Inapproximability of Decision Tree Learning

Caleb Koch,Carmen Strassle,Li-Yang Tan

from arxiv, 29 pages, 5 figures, COLT 2024

We consider the task of properly PAC learning decision trees with queries. Recent work of Koch, Strassle, and Tan showed that the strictest version of this task, where the hypothesis tree $T$ is required to be optimally small, is NP-hard. Their work leaves open the question of whether the task remains intractable if $T$ is only required to be close to optimal, say within a factor of 2, rather than exactly optimal. We answer this affirmatively and show that the task indeed remains NP-hard even if $T$ is allowed to be within any constant factor of optimal. More generally, our result allows for a smooth tradeoff between the hardness assumption and the inapproximability factor. As Koch et al.'s techniques do not appear to be amenable to such a strengthening, we first recover their result with a new and simpler proof, which we couple with a new XOR lemma for decision trees. While there is a large body of work on XOR lemmas for decision trees, our setting necessitates parameters that are extremely sharp, and are not known to be attainable by existing XOR lemmas. Our work also carries new implications for the related problem of Decision Tree Minimization.

Functional · Networking · Scenario · MoDELS · Lecture notes

July 1, 2024

Immediate Neighbours of Monotone Boolean Functions

José E. R. Cury,Patrícia Tenera Roxo,Vasco Manquinho,Claudine Chaouiya,Pedro T. Monteiro

from arxiv, arXiv admin note: text overlap with arXiv:1901.07623

Boolean networks constitute relevant mathematical models to study the behaviours of genetic and signalling networks. These networks define regulatory influences between molecular nodes, each associated with a Boolean variable and a regulatory (local) function specifying its dynamical behaviour depending on its regulators. However, existing data is mostly insufficient to adequately parametrise a model, that is, to uniquely define a regulatory function for each node. With the intent to support model parametrisation, this paper presents results on the set of Boolean functions compatible with a given regulatory structure, i.e. the partially ordered set of monotone non-degenerate Boolean functions. More precisely, we present original rules to obtain the direct neighbours of any function of this set. Beyond their theoretical interest, the presented results will enable the development of more efficient methods for Boolean network synthesis and revision, benefiting from the progressive exploration of the vicinity of regulatory functions.
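As a rough illustration of the objects this abstract refers to, the sketch below checks by brute force whether a Boolean function is monotone and non-degenerate. It assumes "monotone" means non-decreasing in every input; the paper's setting with signed regulators may use a broader notion. The helper names `is_monotone` and `is_non_degenerate` are hypothetical, and the paper's rules for computing direct neighbours are not reproduced here.

```python
from itertools import product


def is_monotone(f, n):
    """Return True if f: {0,1}^n -> {0,1} never decreases when one input rises.

    Checking single-bit flips is enough: any two comparable inputs are
    linked by a chain of such flips.
    """
    for x in product((0, 1), repeat=n):
        for i in range(n):
            if x[i] == 0:
                y = x[:i] + (1,) + x[i + 1:]  # x with bit i raised to 1
                if f(x) > f(y):
                    return False
    return True


def is_non_degenerate(f, n):
    """Return True if every one of the n variables influences f."""
    for i in range(n):
        depends_on_i = any(
            f(x[:i] + (0,) + x[i + 1:]) != f(x[:i] + (1,) + x[i + 1:])
            for x in product((0, 1), repeat=n)
        )
        if not depends_on_i:
            return False
    return True


# Illustrative functions (assumptions, not taken from the paper).
def or_and(x):
    return x[0] | (x[1] & x[2])


def xor(x):
    return x[0] ^ x[1]


def first_input(x):
    return x[0]


if __name__ == "__main__":
    print(is_monotone(or_and, 3), is_non_degenerate(or_and, 3))
    print(is_monotone(xor, 2), is_non_degenerate(xor, 2))
    print(is_monotone(first_input, 2), is_non_degenerate(first_input, 2))
```

The exhaustive check costs on the order of n·2^n evaluations, so it is only practical for nodes with a handful of regulators.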

Pruning · MoDELS · HTTPS · Diversity · Online

June 28, 2024

PruningBench: A Comprehensive Benchmark of Structural Pruning

Haoling Li,Changhao Li,Mengqi Xue,Gongfan Fang,Sheng Zhou,Zunlei Feng,Huiqiong Wang,Yong Wang,Lechao Cheng,Mingli Song,Jie Song

from arxiv, Submitted to NeurIPS 2024 Datasets and Benchmarks Track

Structural pruning has emerged as a promising approach for producing more efficient models. Nevertheless, the community lacks standardized benchmarks and metrics, so progress in this area is not fully understood. To fill this gap, we present the first comprehensive benchmark, termed PruningBench, for structural pruning. PruningBench has the following three characteristics: 1) PruningBench employs a unified and consistent framework for evaluating the effectiveness of diverse structural pruning techniques; 2) PruningBench systematically evaluates 16 existing pruning methods, encompassing a wide array of models (e.g., CNNs and ViTs) and tasks (e.g., classification and detection); 3) PruningBench provides easily implementable interfaces to facilitate the implementation of future pruning methods, and enables subsequent researchers to incorporate their work into our leaderboards. We provide an online pruning platform at //pruning.vipazoo.cn for customizing pruning tasks and reproducing all results in this paper. Code will be made publicly available at //github.com/HollyLee2000/PruningBench.

Inference · Language modeling · Large language models · FAST · MoDELS

June 28, 2024

Distributed Speculative Inference of Large Language Models

Nadav Timor,Jonathan Mamou,Daniel Korat,Moshe Berchansky,Oren Pereg,Moshe Wasserblat,Tomer Galanti,Michal Gordon,David Harel

Accelerating the inference of large language models (LLMs) is an important challenge in artificial intelligence. This paper introduces distributed speculative inference (DSI), a novel distributed inference algorithm that is provably faster than speculative inference (SI) [leviathan2023fast, chen2023accelerating, miao2023specinfer] and traditional autoregressive inference (non-SI). Like other SI algorithms, DSI works on frozen LLMs, requiring no training or architectural modifications, and it preserves the target distribution. Prior studies on SI have demonstrated empirical speedups (compared to non-SI) but require a fast and accurate drafter LLM. In practice, off-the-shelf LLMs often do not have matching drafters that are sufficiently fast and accurate. We show a gap: SI gets slower than non-SI when using slower or less accurate drafters. We close this gap by proving that DSI is faster than both SI and non-SI given any drafters. By orchestrating multiple instances of the target and drafters, DSI is not only faster than SI but also supports LLMs that cannot be accelerated with SI. Our simulations show speedups of off-the-shelf LLMs in realistic settings: DSI is 1.29-1.92x faster than SI.
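To make the claim that SI "preserves the target distribution" concrete, here is a minimal sketch of the single-token accept/reject rule commonly used in speculative sampling. It is not the DSI algorithm from this paper. The toy distributions `p` (target) and `q` (drafter) and the function names are assumptions for illustration only.

```python
import random
from fractions import Fraction


def speculative_step(p, q, rng):
    """Draw one token: propose from drafter q, accept or reject against target p."""
    tokens = list(p)
    x = rng.choices(tokens, weights=[float(q[t]) for t in tokens])[0]
    # Accept the drafted token with probability min(1, p(x) / q(x)).
    if rng.random() < min(1.0, float(p[x] / q[x])):
        return x
    # On rejection, resample from the residual max(p - q, 0), renormalised
    # implicitly by random.choices.
    residual = [float(max(p[t] - q[t], 0)) for t in tokens]
    return rng.choices(tokens, weights=residual)[0]


def exact_output_distribution(p, q):
    """Compute the exact distribution of speculative_step's output."""
    tokens = list(p)
    # Probability of drafting t and accepting it: q(t) * min(1, p(t)/q(t)).
    accepted = {t: min(p[t], q[t]) for t in tokens}
    reject_mass = 1 - sum(accepted.values())
    residual = {t: max(p[t] - q[t], 0) for t in tokens}
    total = sum(residual.values())
    return {
        t: accepted[t] + (reject_mass * residual[t] / total if total else 0)
        for t in tokens
    }


if __name__ == "__main__":
    # Toy target and drafter distributions over three tokens (assumed values).
    p = {"a": Fraction(1, 2), "b": Fraction(3, 10), "c": Fraction(1, 5)}
    q = {"a": Fraction(1, 5), "b": Fraction(1, 5), "c": Fraction(3, 5)}
    print(exact_output_distribution(p, q) == p)
    print(speculative_step(p, q, random.Random(0)))
```

Using exact fractions lets the distribution check be an equality rather than a statistical test. A poorly matched drafter only raises the rejection rate, which costs speed but leaves the output distribution unchanged.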

Cluster · MoDELS · Performer · Statistic · Processing (programming language)

June 28, 2024

Hierarchical Mixture of Finite Mixtures

Alessandro Colombi,Raffaele Argiento,Federico Camerlenghi,Lucia Paci

Statistical modelling in the presence of data organized in groups is a crucial task in Bayesian statistics. The present paper proposes a mixture model based on a novel family of Bayesian priors designed for multilevel data and obtained by normalizing a finite point process. In particular, the work extends the popular Mixture of Finite Mixture model to the hierarchical framework to capture heterogeneity within and between groups. A full distribution theory for this new family and the induced clustering is developed, including the marginal, posterior, and predictive distributions. Efficient marginal and conditional Gibbs samplers are designed to provide posterior inference. The proposed mixture model outperforms the Hierarchical Dirichlet Process, the foremost tool for handling multilevel data, in terms of analytical feasibility, clustering discovery, and computational time. The motivating application comes from the analysis of shot put data, which contain performance measurements of athletes across different seasons. In this setting, the proposed model is used to induce clustering of the observations across seasons and athletes. By linking clusters across seasons, similarities and differences in athletes' performances are identified.

Dropout · Autoencoder · Data point · MoDELS · Dataset

June 28, 2024

Generative Autoencoding of Dropout Patterns

Shunta Maeda

We propose a generative model termed Deciphering Autoencoders. In this model, we assign a unique random dropout pattern to each data point in the training dataset and then train an autoencoder to reconstruct the corresponding data point using this pattern as information to be encoded. Even if a completely random dropout pattern is assigned to each data point regardless of their similarities, a sufficiently large encoder can smoothly map them to a low-dimensional latent space to reconstruct individual training data points. During inference, using a dropout pattern different from those used during training allows the model to function as a generator. Since the training of Deciphering Autoencoders relies solely on reconstruction error, it offers more stable training compared to other generative models. Despite their simplicity, Deciphering Autoencoders show sampling quality comparable to DCGAN on the CIFAR-10 dataset.

SSL · Learning · Understandability · Scenario · Path

April 24, 2023

A Cookbook of Self-Supervised Learning

Randall Balestriero,Mark Ibrahim,Vlad Sobal,Ari Morcos,Shashank Shekhar,Tom Goldstein,Florian Bordes,Adrien Bardes,Gregoire Mialon,Yuandong Tian,Avi Schwarzschild,Andrew Gordon Wilson,Jonas Geiping,Quentin Garrido,Pierre Fernandez,Amir Bar,Hamed Pirsiavash,Yann LeCun,Micah Goldblum

Self-supervised learning, dubbed the dark matter of intelligence, is a promising path to advance machine learning. Yet, much like cooking, training SSL methods is a delicate art with a high barrier to entry. While many components are familiar, successfully training an SSL method involves a dizzying set of choices, from the pretext tasks to the training hyper-parameters. Our goal is to lower the barrier to entry into SSL research by laying the foundations and latest SSL recipes in the style of a cookbook. We hope to empower the curious researcher to navigate the terrain of methods, understand the role of the various knobs, and gain the know-how required to explore how delicious SSL can be.

Learning · Generalization theory · Probably approximately correct · Generalization error · Few-shot learning

July 29, 2022

A Survey of Learning on Small Data

Xiaofeng Cao,Weixin Bu,Shengjun Huang,Yingpeng Tang,Yaming Guo,Yi Chang,Ivor W. Tsang

Learning on big data has driven the success of artificial intelligence (AI), but its annotation and training costs are high. Looking ahead, learning on small data is one of the ultimate goals of AI: machines should recognize objectives and scenarios from small data, as humans do. A series of machine learning approaches pursue this goal, such as active learning, few-shot learning, and deep clustering. However, there are few theoretical guarantees for their generalization performance. Moreover, most of their settings are passive, that is, the label distribution is explicitly controlled by one specified sampling scenario. This survey follows agnostic active sampling under a PAC (Probably Approximately Correct) framework to analyze the generalization error and label complexity of learning on small data in both supervised and unsupervised fashions. With these theoretical analyses, we categorize small data learning models from two geometric perspectives: the Euclidean and non-Euclidean (hyperbolic) mean representation, and we present and discuss their optimization solutions. We then summarize and analyze potential learning scenarios that may benefit from small data learning. Finally, we survey challenging applications, such as computer vision and natural language processing, that may benefit from learning on small data.

Networking · Learned · Principle · MoDELS · Networks

June 18, 2021

The Principles of Deep Learning Theory

Daniel A. Roberts,Sho Yaida,Boris Hanin

from arxiv, 451 pages, to be published by Cambridge University Press

This book develops an effective theory approach to understanding deep neural networks of practical relevance. Beginning from a first-principles component-level picture of networks, we explain how to determine an accurate description of the output of trained networks by solving layer-to-layer iteration equations and nonlinear learning dynamics. A main result is that the predictions of networks are described by nearly-Gaussian distributions, with the depth-to-width aspect ratio of the network controlling the deviations from the infinite-width Gaussian description. We explain how these effectively-deep networks learn nontrivial representations from training and more broadly analyze the mechanism of representation learning for nonlinear models. From a nearly-kernel-methods perspective, we find that the dependence of such models' predictions on the underlying learning algorithm can be expressed in a simple and universal way. To obtain these results, we develop the notion of representation group flow (RG flow) to characterize the propagation of signals through the network. By tuning networks to criticality, we give a practical solution to the exploding and vanishing gradient problem. We further explain how RG flow leads to near-universal behavior and lets us categorize networks built from different activation functions into universality classes. Altogether, we show that the depth-to-width ratio governs the effective model complexity of the ensemble of trained networks. By using information-theoretic techniques, we estimate the optimal aspect ratio at which we expect the network to be practically most useful and show how residual connections can be used to push this scale to arbitrary depths. With these tools, we can learn in detail about the inductive bias of architectures, hyperparameters, and optimizers.

Networking · Residual networks · Scaling · Weight · Smoothing

May 25, 2021

Scaling Properties of Deep Residual Networks

Alain-Sam Cohen,Rama Cont,Alain Rossier,Renyuan Xu

from arxiv, Published at ICML 2021

Residual networks (ResNets) have displayed impressive results in pattern recognition and, recently, have garnered considerable theoretical interest due to a perceived link with neural ordinary differential equations (neural ODEs). This link relies on the convergence of network weights to a smooth function as the number of layers increases. We investigate the properties of weights trained by stochastic gradient descent and their scaling with network depth through detailed numerical experiments. We observe the existence of scaling regimes markedly different from those assumed in neural ODE literature. Depending on certain features of the network architecture, such as the smoothness of the activation function, one may obtain an alternative ODE limit, a stochastic differential equation or neither of these. These findings cast doubts on the validity of the neural ODE model as an adequate asymptotic description of deep ResNets and point to an alternative class of differential equations as a better description of the deep network limit.