99视频在线播放喷射_欧美日韩一区不卡在线看片_精品国产亚洲一区人成在线观看_黄色免费小视频网站_亚洲欧美日韩一区二区视_在线视频免费视频_四五月婷婷色播播

Carlo Metta,Marco Fantozzi,Andrea Papini,Gianluca Amato,Matteo Bergamaschi,Silvia Giulia Galfrè,Alessandro Marchetti,Michelangelo Vegliò,Maurizio Parton,Francesco Morandin

from arxiv, Major rewriting. Supersedes v1 and v2. Focusing on the fact that not all parameters are born equal: biases can be more important than weights. Accordingly, new title and new abstract, and many more experiments on fully connected architectures. This is the extended version of the paper published at WACV 2024

We introduce a novel computational unit for neural networks that features multiple biases, challenging the traditional perceptron structure. This unit emphasizes the importance of preserving uncorrupted information as it is passed from one unit to the next, applying activation functions later in the process with specialized biases for each unit. Through both empirical and theoretical analyses, we show that by focusing on increasing biases rather than weights, there is potential for significant enhancement in a neural network model's performance. This approach offers an alternative perspective on optimizing information flow within neural networks. See source code at //github.com/CuriosAI/dac-dev.

相關內容

有偏

關注 0

Networking · 殘差網絡 · 正則化項 · 離散化 · MoDELS ·

2024 年 3 月 1 日

Implicit regularization of deep residual networks towards neural ODEs

Pierre Marion,Yu-Han Wu,Michael E. Sander,Gérard Biau

from arxiv, ICLR 2024 (spotlight). 40 pages, 3 figures

Residual neural networks are state-of-the-art deep learning models. Their continuous-depth analog, neural ordinary differential equations (ODEs), are also widely used. Despite their success, the link between the discrete and continuous models still lacks a solid mathematical foundation. In this article, we take a step in this direction by establishing an implicit regularization of deep residual networks towards neural ODEs, for nonlinear networks trained with gradient flow. We prove that if the network is initialized as a discretization of a neural ODE, then such a discretization holds throughout training. Our results are valid for a finite training time, and also as the training time tends to infinity provided that the network satisfies a Polyak-Lojasiewicz condition. Importantly, this condition holds for a family of residual networks where the residuals are two-layer perceptrons with an overparameterization in width that is only linear, and implies the convergence of gradient flow to a global minimum. Numerical experiments illustrate our results.

穩健性 · 正則化項 · 稀疏 · Performer · 特征選擇 ·

2024 年 3 月 1 日

CR-Lasso: Robust cellwise regularized sparse regression

Peng Su,Garth Tarr,Samuel Muller,Suojin Wang

Cellwise contamination remains a challenging problem for data scientists, particularly in research fields that require the selection of sparse features. Traditional robust methods may not be feasible nor efficient in dealing with such contaminated datasets. We propose CR-Lasso, a robust Lasso-type cellwise regularization procedure that performs feature selection in the presence of cellwise outliers by minimising a regression loss and cell deviation measure simultaneously. To evaluate the approach, we conduct empirical studies comparing its selection and prediction performance with several sparse regression methods. We show that CR-Lasso is competitive under the settings considered. We illustrate the effectiveness of the proposed method on real data through an analysis of a bone mineral density dataset.

生成方法 · MoDELS · 論文 · binary · 受試者工作特征 ·

2024 年 2 月 29 日

Resolving power: A general approach to compare the distinguishing ability of threshold-free evaluation metrics

Colin S. Beam

from arxiv, 20 pages, 9 figures, 2 tables

Selecting an evaluation metric is fundamental to model development, but uncertainty remains about when certain metrics are preferable and why. This paper introduces the concept of resolving power to describe the ability of an evaluation metric to distinguish between binary classifiers of similar quality. This ability depends on two attributes: 1. The metric's response to improvements in classifier quality (its signal), and 2. The metric's sampling variability (its noise). The paper defines resolving power generically as a metric's sampling uncertainty scaled by its signal. The primary application of resolving power is to assess threshold-free evaluation metrics, such as the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC). A simulation study compares the AUROC and the AUPRC in a variety of contexts. It finds that the AUROC generally has greater resolving power, but that the AUPRC is better when searching among high-quality classifiers applied to low prevalence outcomes. The paper concludes by proposing an empirical method to estimate resolving power that can be applied to any dataset and any initial classification model.

Networking · 優化器 · 向量化 · Weight · 邊 ·

2024 年 2 月 29 日

More algorithmic results for problems of spread of influence in edge-weighted graphs with and without incentives

Siavash Askari,Manouchehr Zaker

Many phenomena in real world social networks are interpreted as spread of influence between activated and non-activated network elements. These phenomena are formulated by combinatorial graphs, where vertices represent the elements and edges represent social ties between elements. A main problem is to study important subsets of elements (target sets or dynamic monopolies) such that their activation spreads to the entire network. In edge-weighted networks the influence between two adjacent vertices depends on the weight of their edge. In models with incentives, the main problem is to minimize total amount of incentives (called optimal target vectors) which can be offered to vertices such that some vertices are activated and their activation spreads to the whole network. Algorithmic study of target sets and vectors is a hot research field. We prove an inapproximability result for optimal target sets in edge weighted networks even for complete graphs. Some other hardness and polynomial time results are presented for optimal target vectors and degenerate threshold assignments in edge-weighted networks.

線性的 · 優化器 · 穩健性 · 約束 · 噪聲 ·

2024 年 2 月 28 日

Linear shrinkage for optimization in high dimensions

Naqi Huang,Nestor Parolya,Thereisa van Essen

In large-scale, data-driven applications, parameters are often only known approximately due to noise and limited data samples. In this paper, we focus on high-dimensional optimization problems with linear constraints under uncertain conditions. To find high quality solutions for which the violation of the true constraints is limited, we develop a linear shrinkage method that blends random matrix theory and robust optimization principles. It aims to minimize the Frobenius distance between the estimated and the true parameter matrix, especially when dealing with a large and comparable number of constraints and variables. This data-driven method excels in simulations, showing superior noise resilience and more stable performance in both obtaining high quality solutions and adhering to the true constraints compared to traditional robust optimization. Our findings highlight the effectiveness of our method in improving the robustness and reliability of optimization in high-dimensional, data-driven scenarios.

規范化的 · 估計/估計量 · CASE · 損失函數（機器學習） · 重參數化技巧 ·

2024 年 2 月 28 日

Training normalizing flows with computationally intensive target probability distributions

Piotr Bialas,Piotr Korcyl,Tomasz Stebel

from arxiv, 16 pages, 5 figures, 6 tables, 3 listings. Revised version as published in CPC. Added results for a other values of hoping parameter

Machine learning techniques, in particular the so-called normalizing flows, are becoming increasingly popular in the context of Monte Carlo simulations as they can effectively approximate target probability distributions. In the case of lattice field theories (LFT) the target distribution is given by the exponential of the action. The common loss function's gradient estimator based on the "reparametrization trick" requires the calculation of the derivative of the action with respect to the fields. This can present a significant computational cost for complicated, non-local actions like e.g. fermionic action in QCD. In this contribution, we propose an estimator for normalizing flows based on the REINFORCE algorithm that avoids this issue. We apply it to two dimensional Schwinger model with Wilson fermions at criticality and show that it is up to ten times faster in terms of the wall-clock time as well as requiring up to $30\%$ less memory than the reparameterization trick estimator. It is also more numerically stable allowing for single precision calculations and the use of half-float tensor cores. We present an in-depth analysis of the origins of those improvements. We believe that these benefits will appear also outside the realm of the LFT, in each case where the target probability distribution is computationally intensive.

多樣性 · 查全率/召回率 · 查準率/準確率 · MoDELS · Nuance ·

2024 年 2 月 28 日

Exploring Precision and Recall to assess the quality and diversity of LLMs

Florian Le Bronnec,Alexandre Verine,Benjamin Negrevergne,Yann Chevaleyre,Alexandre Allauzen

from arxiv, 21 pages, 15 figures, Under Review

This paper introduces a novel evaluation framework for Large Language Models (LLMs) such as Llama-2 and Mistral, focusing on the adaptation of Precision and Recall metrics from image generation to text generation. This approach allows for a nuanced assessment of the quality and diversity of generated text without the need for aligned corpora. By conducting a comprehensive evaluation of state-of-the-art language models, the study reveals significant insights into their performance on open-ended generation tasks, which are not adequately captured by traditional benchmarks. The findings highlight a trade-off between the quality and diversity of generated samples, particularly when models are fine-tuned with human feedback. This work extends the toolkit for distribution-based NLP evaluation, offering insights into the practical capabilities and challenges faced by current LLMs in generating diverse and high-quality text.

線性的 · Weight · 相互獨立的 · TOOLS · CASE ·

2024 年 2 月 28 日

Log-concave poset inequalities

Swee Hong Chan,Igor Pak

from arxiv, 70 pages, 4 figures. In v3 additional references are added and typos found by referees are fixed

We study combinatorial inequalities for various classes of set systems: matroids, polymatroids, poset antimatroids, and interval greedoids. We prove log-concavity inequalities for counting certain weighted feasible words, which generalize and extend several previous results establishing Mason conjectures for the numbers of independent sets of matroids. Notably, we prove matching equality conditions for both earlier inequalities and our extensions. In contrast with much of the previous work, our proofs are combinatorial and employ nothing but linear algebra. We use the language formulation of greedoids which allows a linear algebraic setup, which in turn can be analyzed recursively. The underlying non-commutative nature of matrices associated with greedoids allows us to proceed beyond polymatroids and prove the equality conditions. As further application of our tools, we rederive both Stanley's inequality on the number of certain linear extensions, and its equality conditions, which we then also extend to the weighted case.

DAG · MoDELS · Processing（編程語言） · 有向非循環圖 · Extensibility ·

2024 年 2 月 27 日

A graphical multi-fidelity Gaussian process model, with application to emulation of heavy-ion collisions

Yi Ji,Simon Mak,Derek Soeder,J-F Paquet,Steffen A. Bass

With advances in scientific computing and mathematical modeling, complex scientific phenomena such as galaxy formations and rocket propulsion can now be reliably simulated. Such simulations can however be very time-intensive, requiring millions of CPU hours to perform. One solution is multi-fidelity emulation, which uses data of different fidelities to train an efficient predictive model which emulates the expensive simulator. For complex scientific problems and with careful elicitation from scientists, such multi-fidelity data may often be linked by a directed acyclic graph (DAG) representing its scientific model dependencies. We thus propose a new Graphical Multi-fidelity Gaussian Process (GMGP) model, which embeds this DAG structure (capturing scientific dependencies) within a Gaussian process framework. We show that the GMGP has desirable modeling traits via two Markov properties, and admits a scalable algorithm for recursive computation of the posterior mean and variance along at each depth level of the DAG. We also present a novel experimental design methodology over the DAG given an experimental budget, and propose a nonlinear extension of the GMGP via deep Gaussian processes. The advantages of the GMGP are then demonstrated via a suite of numerical experiments and an application to emulation of heavy-ion collisions, which can be used to study the conditions of matter in the Universe shortly after the Big Bang. The proposed model has broader uses in data fusion applications with graphical structure, which we further discuss.

Continuity · state-of-the-art · 學成 · Extensibility · Networking ·

2021 年 4 月 16 日

A continual learning survey: Defying forgetting in classification tasks

Matthias De Lange,Rahaf Aljundi,Marc Masana,Sarah Parisot,Xu Jia,Ales Leonardis,Gregory Slabaugh,Tinne Tuytelaars

from arxiv, Accepted TPAMI paper, including Appendix, code publicly available

Artificial neural networks thrive in solving the classification problem for a particular rigid task, acquiring knowledge through generalized learning behaviour from a distinct training phase. The resulting network resembles a static entity of knowledge, with endeavours to extend this knowledge without targeting the original task resulting in a catastrophic forgetting. Continual learning shifts this paradigm towards networks that can continually accumulate knowledge over different tasks without the need to retrain from scratch. We focus on task incremental classification, where tasks arrive sequentially and are delineated by clear boundaries. Our main contributions concern 1) a taxonomy and extensive overview of the state-of-the-art, 2) a novel framework to continually determine the stability-plasticity trade-off of the continual learner, 3) a comprehensive experimental comparison of 11 state-of-the-art continual learning methods and 4 baselines. We empirically scrutinize method strengths and weaknesses on three benchmarks, considering Tiny Imagenet and large-scale unbalanced iNaturalist and a sequence of recognition datasets. We study the influence of model capacity, weight decay and dropout regularization, and the order in which the tasks are presented, and qualitatively compare methods in terms of required memory, computation time, and storage.