
The stochastic mirror descent (SMD) algorithm is a general class of training algorithms that includes the celebrated stochastic gradient descent (SGD) as a special case. It uses a mirror potential to influence the implicit bias of the training algorithm. In this paper we explore the performance of the SMD iterates on mean-field ensemble models. Our results generalize earlier ones obtained for SGD on such models. The evolution of the distribution of parameters is mapped to a continuous-time process in the space of probability distributions. Our main result gives a nonlinear partial differential equation to which the continuous-time process converges in the asymptotic regime of large networks. The impact of the mirror potential appears through a multiplicative term that is equal to the inverse of its Hessian and that can be interpreted as defining a gradient flow over an appropriately defined Riemannian manifold. We provide numerical simulations that allow us to study and characterize the effect of the mirror potential on the performance of networks trained with SMD for some binary classification problems.
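To make the role of the mirror potential concrete, here is a minimal sketch of a single SMD update in its standard textbook form, $\nabla\psi(w_{t+1}) = \nabla\psi(w_t) - \eta\, g_t$. It is not the paper's mean-field construction; the function names and the two example potentials are assumptions chosen for illustration.

```python
import numpy as np

def smd_step(w, grad, lr, grad_psi, grad_psi_inv):
    """One mirror descent step: grad_psi(w_next) = grad_psi(w) - lr * grad."""
    return grad_psi_inv(grad_psi(w) - lr * grad)

# Potential psi(w) = 0.5 * ||w||^2: the mirror map is the identity,
# so the update reduces to a plain SGD step.
def identity(z):
    return z

# Potential psi(w) = sum_i w_i * log(w_i) (negative entropy, w > 0):
# grad psi(w) = 1 + log(w), with inverse z -> exp(z - 1).
def neg_entropy_grad(w):
    return 1.0 + np.log(w)

def neg_entropy_grad_inv(z):
    return np.exp(z - 1.0)

w = np.array([0.5, 1.5])   # current parameters (hypothetical values)
g = np.array([0.2, -0.1])  # stochastic gradient at w (hypothetical values)

sgd_like = smd_step(w, g, 0.1, identity, identity)
eg_like = smd_step(w, g, 0.1, neg_entropy_grad, neg_entropy_grad_inv)
```

With the negative-entropy potential the update is multiplicative, `w * exp(-lr * g)`. Its Hessian is `diag(1 / w)`, so for small steps the update follows `-w * g`, which is the inverse Hessian applied to the gradient, matching the multiplicative term described in the abstract.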

Related content


Directed · Functional · Almost sure convergence · Finite difference · Almost surely

December 15, 2022

Stochastic Zeroth order Descent with Structured Directions

Marco Rando,Cesare Molinari,Silvia Villa,Lorenzo Rosasco

We introduce and analyze Structured Stochastic Zeroth order Descent (S-SZD), a finite difference approach which approximates a stochastic gradient on a set of $l\leq d$ orthogonal directions, where $d$ is the dimension of the ambient space. These directions are randomly chosen, and may change at each step. For smooth convex functions we prove almost sure convergence of the iterates and a convergence rate on the function values of the form $O((d/l)\,k^{-c})$ for every $c<1/2$, which is arbitrarily close to that of Stochastic Gradient Descent (SGD) in terms of number of iterations. Our bound also shows the benefits of using $l$ directions instead of one. For non-convex functions satisfying the Polyak-Łojasiewicz condition, we establish the first convergence rates for stochastic zeroth order algorithms under such an assumption. We corroborate our theoretical findings in numerical simulations where assumptions are satisfied and on the real-world problem of hyper-parameter optimization, observing that S-SZD performs very well in practice.
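The finite difference idea can be sketched as follows. This is an illustrative surrogate under stated assumptions, not the authors' implementation: the directions are drawn by QR factorization of a Gaussian matrix, forward differences are used, and the `d / l` rescaling and the name `structured_fd_gradient` are choices made here. To model the stochastic setting, `f` can close over a freshly sampled data point.

```python
import numpy as np

def structured_fd_gradient(f, x, num_directions, step, rng):
    """Forward-difference gradient surrogate along random orthonormal directions."""
    d = x.shape[0]
    # Reduced QR of a d x l Gaussian matrix yields l orthonormal columns in R^d.
    q, _ = np.linalg.qr(rng.standard_normal((d, num_directions)))
    fx = f(x)
    estimate = np.zeros(d)
    for i in range(num_directions):
        p = q[:, i]
        # Directional finite difference, placed back along the direction p.
        estimate += (f(x + step * p) - fx) / step * p
    # Rescale so that, averaged over random directions, the surrogate
    # approximates the full gradient.
    return (d / num_directions) * estimate

a = np.array([1.0, -2.0, 3.0])

def linear(x):
    return a @ x

rng = np.random.default_rng(0)
g_full = structured_fd_gradient(linear, np.zeros(3), 3, 1e-3, rng)
```

For a linear function and `num_directions == d`, the surrogate recovers the gradient `a` up to rounding, since the directions then form a full orthonormal basis. With fewer directions each estimate costs fewer function evaluations but is noisier.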

Multimodal · MoDELS · Networking · Loss · Output space

December 14, 2022

Modeling Multimodal Aleatoric Uncertainty in Segmentation with Mixture of Stochastic Expert

Zhitong Gao,Yucong Chen,Chuyu Zhang,Xuming He

from arxiv, In submission

Equipping predicted segmentation with calibrated uncertainty is essential for safety-critical applications. In this work, we focus on capturing the data-inherent uncertainty (aka aleatoric uncertainty) in segmentation, typically when ambiguities exist in input images. Due to the high-dimensional output space and potential multiple modes in segmenting ambiguous images, it remains challenging to predict well-calibrated uncertainty for segmentation. To tackle this problem, we propose a novel mixture of stochastic experts (MoSE) model, where each expert network estimates a distinct mode of the aleatoric uncertainty and a gating network predicts the probabilities of an input image being segmented in those modes. This yields an efficient two-level uncertainty representation. To learn the model, we develop a Wasserstein-like loss that directly minimizes the distribution distance between the MoSE and ground truth annotations. The loss can easily integrate traditional segmentation quality measures and be efficiently optimized via constraint relaxation. We validate our method on the LIDC-IDRI dataset and a modified multimodal Cityscapes dataset. Results demonstrate that our method achieves the state-of-the-art or competitive performance on all metrics.

Optimizer · Graph · Functional · Cost · Gated recurrent unit

December 14, 2022

RAGO: Recurrent Graph Optimizer For Multiple Rotation Averaging

Heng Li,Zhaopeng Cui,Shuaicheng Liu,Ping Tan

from arxiv, Accepted by CVPR 2022

This paper proposes a deep recurrent Rotation Averaging Graph Optimizer (RAGO) for Multiple Rotation Averaging (MRA). Conventional optimization-based methods usually fail to produce accurate results due to corrupted and noisy relative measurements. Recent learning-based approaches regard MRA as a regression problem, but these methods are sensitive to initialization due to the gauge freedom problem. To handle these problems, we propose a learnable iterative graph optimizer minimizing a gauge-invariant cost function with an edge rectification strategy to mitigate the effect of inaccurate measurements. Our graph optimizer iteratively refines the global camera rotations by minimizing each node's single rotation objective function. In addition, our approach iteratively rectifies relative rotations to make them more consistent with the current camera orientations and observed relative rotations. Furthermore, we employ a gated recurrent unit to improve the result by tracing the temporal information of the cost graph. Our framework is a real-time learning-to-optimize rotation averaging graph optimizer with a tiny size deployed for real-world applications. RAGO outperforms previous traditional and deep methods on real-world and synthetic datasets. The code is available at //github.com/sfu-gruvi-3dv/RAGO

Scaling · MoDELS · contrastive · Performer · Learning

December 14, 2022

Reproducible scaling laws for contrastive language-image learning

Mehdi Cherti,Romain Beaumont,Ross Wightman,Mitchell Wortsman,Gabriel Ilharco,Cade Gordon,Christoph Schuhmann,Ludwig Schmidt,Jenia Jitsev

from arxiv, Preprint. Under review

Scaling up neural networks has led to remarkable performance across a wide range of tasks. Moreover, performance often follows reliable scaling laws as a function of training set size, model size, and compute, which offers valuable guidance as large-scale experiments are becoming increasingly expensive. However, previous work on scaling laws has primarily used private data & models or focused on uni-modal language or vision learning. To address these limitations, we investigate scaling laws for contrastive language-image pre-training (CLIP) with the public LAION dataset and the open-source OpenCLIP repository. Our large-scale experiments involve models trained on up to two billion image-text pairs and identify power law scaling for multiple downstream tasks including zero-shot classification, retrieval, linear probing, and end-to-end fine-tuning. We find that the training distribution plays a key role in scaling laws as the OpenAI and OpenCLIP models exhibit different scaling behavior despite identical model architectures and similar training recipes. We open-source our evaluation workflow and all models, including the largest public CLIP models, to ensure reproducibility and make scaling laws research more accessible. Source code and instructions to reproduce this study will be available at //github.com/LAION-AI/scaling-laws-openclip

Graph · binary · Unlabeled · Class · Clique

December 14, 2022

Efficient Non-isomorphic Graph Enumeration Algorithms for Subclasses of Perfect Graphs

Jun Kawahara,Toshiki Saitoh,Hirokazu Takeda,Ryo Yoshinaka,Yui Yoshioka

from arxiv, Accepted to the 17th International Conference and Workshops on Algorithms and Computation (WALCOM 2023)

Intersection graphs are well-studied in the area of graph algorithms. Some intersection graph classes are known to have algorithms enumerating all unlabeled graphs by reverse search. Since these algorithms output graphs one by one and the numbers of graphs in these classes are vast, they work only for a small number of vertices. Binary decision diagrams (BDDs) are compact data structures for various types of data and useful for solving optimization and enumeration problems. This study proposes enumeration algorithms for five intersection graph classes, which admit $\mathrm{O}(n)$-bit string representations for their member graphs. Our algorithm for each class enumerates all unlabeled graphs with $n$ vertices over BDDs representing the binary strings in time polynomial in $n$. Moreover, our algorithms are extended to enumerate those with constraints on the maximum (bi)clique size and/or the number of edges.

Branch · Scenario · Constraint · Feature vector · Vectorization

December 14, 2022

Simplification of Forest Classifiers and Regressors

Atsuyoshi Nakamura,Kento Sakurada

We study the problem of sharing as many branching conditions of a given forest classifier or regressor as possible while preserving classification performance. As a constraint for preventing accuracy degradation, we first require that the decision paths of all the given feature vectors must not change. For a branching condition stating that the value of a certain feature is at most a given threshold, the set of threshold values satisfying this constraint can be represented as an interval. Thus, the problem is reduced to finding the minimum set intersecting all the constraint-satisfying intervals for each set of branching conditions on the same feature. We propose an algorithm for the original problem that builds on an algorithm solving this reduced problem efficiently. The constraint is later relaxed to promote further sharing of branching conditions, by allowing the decision paths of a certain ratio of the given feature vectors to change or by allowing a certain number of non-intersected constraint-satisfying intervals. We also extend our algorithm to both relaxations. The effectiveness of our method is demonstrated through comprehensive experiments using 21 datasets (13 classification and 8 regression datasets in the UCI machine learning repository) and 4 classifiers/regressors (random forest, extremely randomized trees, AdaBoost and gradient boosting).
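The reduced problem, finding a minimum set of points that intersects every interval, has a classic greedy solution, sketched below. The sketch assumes closed intervals given as `(low, high)` pairs and is not necessarily the algorithm used in the paper; the function name `min_stabbing_points` is hypothetical.

```python
def min_stabbing_points(intervals):
    """Return a minimum set of points hitting every closed interval (low, high)."""
    points = []
    # Process intervals by increasing right endpoint. Whenever an interval
    # is not hit by the most recent point, its right endpoint is the choice
    # that can still hit the most later intervals.
    for low, high in sorted(intervals, key=lambda iv: iv[1]):
        if not points or low > points[-1]:
            points.append(high)
    return points

# Hypothetical threshold intervals for branching conditions on one feature.
shared = min_stabbing_points([(1, 3), (2, 5), (4, 6), (7, 8)])
# shared == [3, 6, 8]: the first two conditions can share the threshold 3.
```

In the setting of the abstract, each returned point would serve as one shared threshold, so four branching conditions are replaced here by three distinct ones.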

Estimation/Estimator · Reproducing kernel Hilbert space · Dynamical system · Learning · Operation

December 13, 2022

Learning Dynamical Systems via Koopman Operator Regression in Reproducing Kernel Hilbert Spaces

Vladimir Kostic,Pietro Novelli,Andreas Maurer,Carlo Ciliberto,Lorenzo Rosasco,Massimiliano Pontil

from arxiv, Main text: 10 pages, 2 figures, 1 table. Supplementary information: 18 pages, 5 figures, 2 tables

We study a class of dynamical systems modelled as Markov chains that admit an invariant distribution via the corresponding transfer, or Koopman, operator. While data-driven algorithms to reconstruct such operators are well known, their relationship with statistical learning is largely unexplored. We formalize a framework to learn the Koopman operator from finite data trajectories of the dynamical system. We consider the restriction of this operator to a reproducing kernel Hilbert space and introduce a notion of risk, from which different estimators naturally arise. We link the risk with the estimation of the spectral decomposition of the Koopman operator. These observations motivate a reduced-rank operator regression (RRR) estimator. We derive learning bounds for the proposed estimator, holding both in i.i.d. and non i.i.d. settings, the latter in terms of mixing coefficients. Our results suggest RRR might be beneficial over other widely used estimators as confirmed in numerical experiments both for forecasting and mode decomposition.

Weight · Integration · Analysis · Smoothing · Online

December 13, 2022

The leaky integrator that could: Or recursive polynomial regression for online signal analysis

Hugh L Kennedy

from arxiv, Added approximate calculation of bandwidth (i.e. frequency dispersion) for an Erlang window from its variance in the time domain (see Table 6)

Fitting a local polynomial model to a noisy sequence of uniformly sampled observations or measurements (i.e. regressing) by minimizing the sum of weighted squared errors (i.e. residuals) may be used to design digital filters for a diverse range of signal-analysis problems, such as detection, classification and tracking, in biomedical, financial, and aerospace applications, for instance. Furthermore, the recursive realization of such filters, using a network of so-called leaky integrators, yields simple digital components with a low computational complexity and an infinite impulse response (IIR) that are ideal in embedded online sensing systems with high data rates. Target tracking, pulse-edge detection, peak detection and anomaly/change detection are considered in this tutorial as illustrative examples. Erlang-weighted polynomial regression provides a design framework within which the various design trade-offs of state estimators (e.g. bias errors vs. random errors) and IIR smoothers (e.g. frequency isolation vs. time localization) may be intuitively balanced. Erlang weights are configured using a smoothing parameter which determines the decay rate of the exponential tail; and a shape parameter which may be used to discount more recent data, so that a greater relative emphasis is placed on a past time interval. In Morrison's 1969 treatise on sequential smoothing and prediction, the exponential weight (i.e. the zero shape-parameter case) and the Laguerre polynomials that are orthogonal with respect to this weight, are described in detail; however, more general Erlang weights and the resulting associated Laguerre polynomials are not considered there, nor have they been covered in detail elsewhere since. Thus, one of the purposes of this tutorial is to explain how Erlang weights may be used to shape and improve the response of recursive regression filters.
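As a rough illustration of the building block named in the title, the sketch below implements a first-order leaky integrator with unity DC gain and a cascade of identical stages. The parameter name `pole` and the cascade construction are assumptions made here; the Erlang-weighted regression filters of the tutorial are not reproduced.

```python
import numpy as np

def leaky_integrator(x, pole):
    """First-order IIR smoother: y[n] = pole * y[n-1] + (1 - pole) * x[n]."""
    y = np.empty(len(x))
    state = 0.0
    for n, sample in enumerate(x):
        state = pole * state + (1.0 - pole) * sample
        y[n] = state
    return y

def cascade(x, pole, stages):
    """Pass the signal through `stages` identical leaky integrators in series."""
    y = np.asarray(x, dtype=float)
    for _ in range(stages):
        y = leaky_integrator(y, pole)
    return y

step_response = leaky_integrator(np.ones(3), 0.5)  # [0.5, 0.75, 0.875]
smoothed = cascade(np.ones(200), 0.5, 3)
```

A single stage weights past samples by a decaying exponential. Cascading identical stages gives an impulse response that is a polynomial in `n` times `pole**n`, so the peak weight moves away from the most recent sample, loosely analogous to the role of the shape parameter described above.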

Estimation/Estimator · Performer · Image segmentation · Range · Benchmark

December 12, 2022

Efficient Bayesian Uncertainty Estimation for nnU-Net

Yidong Zhao,Changchun Yang,Artur Schweidtmann,Qian Tao

The self-configuring nnU-Net has achieved leading performance in a large range of medical image segmentation challenges. It is widely considered as the model of choice and a strong baseline for medical image segmentation. However, despite its extraordinary performance, nnU-Net does not supply a measure of uncertainty to indicate its possible failure. This can be problematic for large-scale image segmentation applications, where data are heterogeneous and nnU-Net may fail without notice. In this work, we introduce a novel method to estimate nnU-Net uncertainty for medical image segmentation. We propose a highly effective scheme for posterior sampling of weight space for Bayesian uncertainty estimation. Different from previous baseline methods such as Monte Carlo Dropout and mean-field Bayesian Neural Networks, our proposed method does not require a variational architecture and keeps the original nnU-Net architecture intact, thereby preserving its excellent performance and ease of use. Additionally, we boost the segmentation performance over the original nnU-Net via marginalizing multi-modal posterior models. We applied our method on the public ACDC and M&M datasets of cardiac MRI and demonstrated improved uncertainty estimation over a range of baseline methods. The proposed method further strengthens nnU-Net for medical image segmentation in terms of both segmentation accuracy and quality control.

MoDELS · Distillation · Generative method · Markov process · Markov

December 12, 2022

A Generic Approach for Reproducible Model Distillation

Yunzhe Zhou,Peiru Xu,Giles Hooker

from arxiv, 31 pages, 8 figures

Model distillation has been a popular method for producing interpretable machine learning. It uses an interpretable "student" model to mimic the predictions made by the black box "teacher" model. However, when the student model is sensitive to the variability of the data sets used for training, the corresponding interpretation is not reliable. Existing strategies stabilize model distillation by checking whether a large enough corpus of pseudo-data is generated to reliably reproduce student models, but methods to do so have so far been developed for a specific student model. In this paper, we develop a generic approach for stable model distillation based on the central limit theorem for the average loss. We start with a collection of candidate student models and search for candidates that reasonably agree with the teacher. Then we construct a multiple testing framework to select a corpus size such that a consistent student model would be selected under different pseudo samples. We demonstrate the application of our proposed approach on three commonly used intelligible models: decision trees, falling rule lists and symbolic regression. Finally, we conduct simulation experiments on Mammographic Mass and Breast Cancer datasets and illustrate the testing procedure through a theoretical analysis with a Markov process.