We present ISAAC (Input-baSed ApproximAte Curvature), a novel method that conditions the gradient using selected second-order information and has an asymptotically vanishing computational overhead, assuming a batch size smaller than the number of neurons. We show that it is possible to compute a good conditioner based solely on the input to the respective layer, without substantial computational overhead. The proposed method allows effective training even in small-batch stochastic regimes, which makes it competitive with both first-order and second-order methods.
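The abstract does not spell out the conditioner; as a minimal sketch, one natural input-based choice for a dense layer is the regularized input second-moment matrix (as in the input factor of K-FAC). The update rule and regularization constant below are our assumptions, not the paper's exact method; the Woodbury step shows why the overhead vanishes when the batch size is smaller than the layer width.

```python
import numpy as np

def conditioned_weight_grad(X, G_out, lam=1e-2):
    """Condition the weight gradient of a dense layer using only the
    layer input X (shape: batch x d_in). G_out is the backpropagated
    gradient at the layer output (batch x d_out); the plain weight
    gradient would be X.T @ G_out. Hypothetical sketch, not the
    paper's exact update."""
    b = X.shape[0]
    # Regularized Gram matrix in batch space: only b x b, so the extra
    # cost is O(b^2 d_in + b^3) rather than O(d_in^3).
    K = X @ X.T / b + lam * np.eye(b)
    # Woodbury identity: (lam I + X.T X / b)^{-1} X.T == X.T K^{-1}.
    return X.T @ np.linalg.solve(K, G_out)
```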
Gaussian process (GP) based Bayesian optimization (BO) is a powerful method for optimizing black-box functions efficiently. The practical performance and theoretical guarantees of this approach depend on having the correct GP hyperparameter values, which are usually unknown in advance and need to be estimated from the observed data. In practice, however, these estimates can be incorrect due to the biased data sampling strategies commonly used in BO, which can degrade performance and break the sub-linear global convergence guarantee of BO. To address this issue, we propose a new BO method that converges sub-linearly to the global optimum of the objective function even when the true GP hyperparameters are unknown in advance and must be estimated from the observed data. Our method uses a multi-armed bandit technique (EXP3) to add random data points to the BO process, and employs a novel training loss function for the GP hyperparameter estimation process that ensures unbiased estimation from the observed data. We further provide a theoretical analysis of our proposed method. Finally, we demonstrate empirically that our method outperforms existing approaches on various synthetic and real-world problems.
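EXP3 itself is a standard adversarial-bandit algorithm; below is a minimal sketch of how it could arbitrate between querying the BO acquisition point and a random point. The two-arm setup, the reward function, and the hyperparameter `gamma` are illustrative assumptions, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(0)

def exp3_step(weights, gamma, reward_fn):
    """One EXP3 step over K arms, e.g. arm 0 = 'query the acquisition
    maximizer' and arm 1 = 'query a uniformly random point'."""
    K = len(weights)
    probs = (1 - gamma) * weights / weights.sum() + gamma / K
    arm = rng.choice(K, p=probs)
    reward = reward_fn(arm)          # assumed rescaled to [0, 1]
    # Importance weighting keeps the reward estimate unbiased.
    weights[arm] *= np.exp(gamma * (reward / probs[arm]) / K)
    return arm, weights
```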
The randomized sparse Kaczmarz method, designed for seeking the sparse solutions of linear systems $Ax=b$, selects the $i$-th projection hyperplane with probability proportional to $\|a_{i}\|_2^2$, where $a_{i}^T$ is the $i$-th row of $A$. In this work, we propose a weighted randomized sparse Kaczmarz method, which selects the $i$-th projection hyperplane with probability proportional to $\lvert\langle a_{i},x_{k}\rangle-b_{i}\rvert^p$, where $0<p<\infty$, for possible acceleration. The parameter $p$ bridges the randomized and greedy Kaczmarz methods: selection approaches the uniform rule as $p \to 0$ and the greedy maximum-residual rule as $p \to \infty$. Theoretically, we show a linear convergence rate in expectation with respect to the Bregman distance in both the noiseless and noisy cases, which is at least as fast as that of the randomized sparse Kaczmarz method. The superiority of the proposed method is demonstrated in a series of numerical experiments.
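For concreteness, a single iteration of the weighted rule might look as follows, reusing the standard sparse Kaczmarz template (a Kaczmarz step in an auxiliary variable followed by soft thresholding); the step details are our rendering of that template, not necessarily the paper's exact scheme.

```python
import numpy as np

def soft_threshold(z, lam):
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def weighted_rsk_step(A, b, x, z, p=2.0, lam=1.0,
                      rng=np.random.default_rng()):
    """Pick row i with probability proportional to |<a_i, x> - b_i|^p,
    take a Kaczmarz step in the auxiliary variable z, and recover the
    sparse iterate x by soft thresholding."""
    r = A @ x - b                                # residuals
    w = np.abs(r) ** p
    if w.sum() == 0.0:                           # system already solved
        return x, z
    i = rng.choice(len(b), p=w / w.sum())
    z = z - (r[i] / np.dot(A[i], A[i])) * A[i]   # projection step
    x = soft_threshold(z, lam)                   # shrinkage step
    return x, z
```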
In this paper, we study the well-known Heavy Ball (HB) method for convex and nonconvex optimization, introduced by Polyak in 1964, and establish its convergence in a variety of situations. Traditionally, most algorithms use "full-coordinate update," that is, at each step every component of the argument is updated. However, when the dimension of the argument is very high, it is more efficient to update some but not all components of the argument at each iteration. We refer to this as "batch updating" in this paper. When gradient-based algorithms are used together with batch updating, in principle it suffices to compute only those components of the gradient for which the argument is to be updated. However, if a method such as backpropagation is used to compute these components, computing only some components of the gradient does not offer much savings over computing the entire gradient. Therefore, to achieve a noticeable reduction in CPU usage at each step, one can use first-order differences to approximate the gradient. The resulting estimates are biased and also have unbounded variance. Thus, some delicate analysis is required to ensure that the HB algorithm converges when batch updating is used instead of full-coordinate updating, and/or approximate gradients are used instead of true gradients. In this paper, we establish the almost sure convergence of the iterations to the stationary point(s) of the objective function under suitable conditions; in addition, we derive upper bounds on the rate of convergence. To the best of our knowledge, no other paper combines all of these features. This paper is dedicated to the memory of Boris Teodorovich Polyak.
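As an illustration of the setting (not the paper's precise algorithm, which also covers step-size schedules and noisy evaluations), one HB step with batch updating and first-order difference gradient approximations could look like this; the step size, momentum, and difference increment are assumed values.

```python
import numpy as np

def hb_batch_fd_step(f, x, x_prev, step=0.1, momentum=0.9,
                     batch_size=5, h=1e-4, rng=np.random.default_rng()):
    """One Heavy Ball step in which only a random batch of coordinates
    is updated, with those gradient components approximated by forward
    differences (biased estimates, as noted in the abstract)."""
    d = len(x)
    coords = rng.choice(d, size=min(batch_size, d), replace=False)
    fx = f(x)
    g = np.zeros(d)
    for j in coords:
        e = np.zeros(d)
        e[j] = h
        g[j] = (f(x + e) - fx) / h      # first-order difference
    x_new = x.copy()
    # Momentum term uses the previous iterate, as in Polyak's method.
    x_new[coords] = (x[coords] - step * g[coords]
                     + momentum * (x[coords] - x_prev[coords]))
    return x_new, x                      # new iterate and new 'previous'
```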
Our goal is to develop an efficient contact detection algorithm for large-scale GPU-based simulation of non-convex objects. Current GPU-based simulators such as IsaacGym and Brax must trade off speed against fidelity, generality, or both when simulating non-convex objects. Their main issue lies in contact detection (CD): existing CD algorithms, such as Gilbert-Johnson-Keerthi (GJK), must trade off computational speed against accuracy, and become expensive as the number of collisions among non-convex objects increases. We propose a data-driven approach to CD whose accuracy depends only on the quality and quantity of the offline dataset, rather than on online computation time. Unlike GJK, our method inherently has a uniform computational flow, which facilitates efficient GPU usage with advanced compilers such as XLA (Accelerated Linear Algebra). Further, we offer a data-efficient solution by learning the patterns of colliding local shape crops, rather than global object shapes, which are harder to learn. We demonstrate that our approach improves the efficiency of existing CD methods by a factor of 5-10 for non-convex objects with comparable accuracy. Building on previous work on contact resolution for neural-network-based contact detectors, we integrate our CD algorithm into the open-source GPU-based simulator Brax, and show that we can improve efficiency over IsaacGym and generality over standard Brax. We highly recommend the videos of our simulator included in the supplementary materials.
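To illustrate what "uniform computational flow" buys, the sketch below scores every candidate pair of local shape crops with one fixed-size MLP forward pass, so all pairs follow an identical, branch-free computation that batches well on GPUs. The crop featurization and the two-layer MLP are our illustrative assumptions, not the paper's architecture.

```python
import numpy as np

def batched_contact_probs(crop_pairs, W1, b1, W2, b2):
    """crop_pairs: (n_pairs, feat) array, each row the concatenated
    features (e.g. flattened occupancy grids) of two local shape
    crops. Returns a collision probability per pair. Every pair takes
    the same computational path: no per-pair branching or iteration,
    unlike GJK-style algorithms."""
    h = np.maximum(crop_pairs @ W1 + b1, 0.0)  # hidden ReLU layer
    logits = h @ W2 + b2
    return 1.0 / (1.0 + np.exp(-logits))       # sigmoid scores
```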
The purpose of this work is to employ the Optimal Auxiliary Function Method (OAFM) to obtain numerical approximations of time-dependent nonlinear partial differential equations (PDEs) that arise in many disciplines of science and engineering. The initial and first approximations of parabolic nonlinear PDEs with given initial conditions are generated using this method. The Galerkin method is then applied to estimate the coefficients that remain unknown, and the resulting coefficient values are substituted into the first approximation. For each example, all numerical computations and the corresponding absolute errors are provided in graphical and tabular form, and the rate of convergence attained by the proposed method is reported in tabular form.
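The Galerkin step for the unknown coefficients amounts to making the PDE residual orthogonal to a set of test functions. A minimal sketch is below, under the simplifying assumption (ours, for illustration) that the residual is linear in the coefficients, R(x; c) = r_0(x) + sum_k c_k r_k(x); in the nonlinear case the same conditions would be solved iteratively.

```python
import numpy as np

def galerkin_coefficients(residual, basis, x_grid):
    """Solve the Galerkin conditions  int R(x; c) psi_j(x) dx = 0
    for the unknown coefficients c, assuming R(x; c) = r0(x) +
    sum_k c_k r_k(x). `residual(x_grid)` returns (r0, [r_1, ..., r_m])
    evaluated on a uniform grid; `basis` is a list of test functions.
    """
    r0, rk = residual(x_grid)
    rk = np.asarray(rk)                            # shape (m, n)
    psi = np.stack([f(x_grid) for f in basis])     # shape (q, n)
    dx = x_grid[1] - x_grid[0]                     # uniform-grid weight
    G = psi @ rk.T * dx                            # <r_k, psi_j>
    rhs = -(psi @ r0) * dx                         # -<r0, psi_j>
    c, *_ = np.linalg.lstsq(G, rhs, rcond=None)
    return c
```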
We study causal effect estimation from a mixture of observational and interventional data in a confounded linear regression model with multivariate treatments. We show that the statistical efficiency, in terms of expected squared error, can be improved by combining estimators arising from the observational and the interventional setting. To this end, we derive methods based on matrix-weighted linear estimators and prove that our methods are unbiased in the infinite-sample limit. This is an important improvement over the pooled estimator using the union of interventional and observational data, whose bias vanishes only if the ratio of observational to interventional data tends to zero. Studies on synthetic data confirm our theoretical findings. In settings where confounding is substantial and the ratio of observational to interventional data is large, our estimators outperform a Stein-type estimator and various other baselines.
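A matrix-weighted combination has the generic form beta = W beta_int + (I - W) beta_obs. The sketch below fills in one plausible precision-based choice of W; this particular weighting, and the assumption that covariance and bias estimates are available, are ours for illustration and not the paper's estimator.

```python
import numpy as np

def combine_estimators(beta_obs, beta_int, V_obs, V_int, bias_sq):
    """Combine observational and interventional estimates with a
    matrix weight W. V_obs, V_int: covariance estimates of the two
    estimators; bias_sq: outer product of an estimated confounding
    bias of the observational estimator (all assumed given)."""
    d = len(beta_obs)
    M_obs = V_obs + bias_sq          # MSE matrix of observational est.
    # W -> I as the observational bias/variance grows, shifting trust
    # toward the (unbiased but noisier) interventional estimate.
    W = np.linalg.solve(M_obs + V_int, M_obs)
    return W @ beta_int + (np.eye(d) - W) @ beta_obs
```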
Biological vision systems make adaptive use of context: they recognize objects in new settings with novel contexts, as well as occluded or blurry objects in familiar settings. In this paper, we investigate how vision models adaptively use context for out-of-distribution (OOD) generalization, and we leverage the results of our analysis to improve model OOD generalization. First, we formulate two distinct OOD settings in which the context is either irrelevant (Background-Invariance) or beneficial (Object-Disambiguation), reflecting the diverse contextual challenges faced in biological vision. We then analyze model performance in these two settings and demonstrate that models that excel in one tend to struggle in the other. Notably, prior works on learning causal features improve performance in one setting but hurt it in the other. This underscores the importance of generalizing across both OOD settings, an ability that is crucial for both human cognition and robust AI systems. Next, to better understand the model properties contributing to OOD generalization, we use representational geometry analysis and our own probing methods to examine a population of models, and we find that those with more factorized representations and appropriate feature weighting are more successful in handling the Background-Invariance and Object-Disambiguation tests. We further validate these findings through causal interventions on representation factorization and feature weighting, demonstrating their causal effect on performance. Lastly, we propose new augmentation methods to enhance model generalization. These methods outperform strong baselines, yielding improvements on both in-distribution and OOD tests. In conclusion, to replicate the generalization abilities of biological vision, computer vision models must learn factorized object vs. background representations and appropriately weight both kinds of features.
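As a concrete reading of the two settings (our simplified rendering, not the paper's benchmark), the sketch below builds both kinds of test images from segmented objects: intact objects on uninformative backgrounds for Background-Invariance, and degraded objects on their congruent backgrounds for Object-Disambiguation.

```python
import numpy as np

rng = np.random.default_rng(0)

def compose(obj, bg, mask):
    """Paste object pixels (mask == 1) onto a background image."""
    return mask * obj + (1 - mask) * bg

def degrade(obj, keep=0.3):
    """Occlude the object by randomly zeroing most of its pixels."""
    return obj * (rng.random(obj.shape) < keep)

def two_ood_accuracies(model, objs, masks, labels,
                       congruent_bgs, random_bgs):
    """model maps an image array to a predicted label."""
    bg_inv = [compose(o, rb, m)            # context irrelevant
              for o, m, rb in zip(objs, masks, random_bgs)]
    obj_dis = [compose(degrade(o), cb, m)  # context needed
               for o, m, cb in zip(objs, masks, congruent_bgs)]
    acc = lambda xs: float(np.mean([model(x) == y
                                    for x, y in zip(xs, labels)]))
    return {"background_invariance": acc(bg_inv),
            "object_disambiguation": acc(obj_dis)}
```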
The computation of the ground states of certain multi-component Bose-Einstein condensates (BECs) can be formulated as an energy functional minimization problem with spherical constraints, which leads, after suitable discretization, to a nonconvex quartic-quadratic optimization problem. First, we generalize Newton-based methods for single-component BECs to an alternating minimization scheme for multi-component BECs. Second, a globally convergent alternating Newton-Noda iteration (ANNI) is proposed. In particular, we prove the positivity-preserving property of ANNI under mild conditions. Finally, our analysis is applied to a class of more general "multi-block" optimization problems with spherical constraints. Numerical experiments are performed to evaluate the performance of the proposed methods for different multi-component BECs, including pseudo-spin-1/2, antiferromagnetic spin-1, and spin-2 BECs. These results support our theory and demonstrate the efficiency of our algorithms.
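The abstract does not state the discretized problem explicitly; schematically, a quartic-quadratic minimization with spherical constraints over $m$ components takes the following generic form (our rendering, with symmetric matrices $A_j$ from the discretized single-particle Hamiltonians, interaction coefficients $\beta_{jk}$, and $\circ$ denoting the elementwise product):

$$\min_{x_1,\dots,x_m}\ \sum_{j=1}^{m} x_j^{\top} A_j x_j \;+\; \sum_{j,k=1}^{m} \beta_{jk}\,(x_j \circ x_j)^{\top}(x_k \circ x_k) \quad \text{subject to } \|x_j\|_2 = 1,\ j=1,\dots,m.$$

An alternating scheme such as ANNI then updates one block $x_j$ at a time while keeping the other components fixed.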
We hypothesize that, due to the greedy nature of learning in multi-modal deep neural networks, these models tend to rely on just one modality while under-fitting the others. Such behavior is counter-intuitive and hurts the models' generalization, as we observe empirically. To estimate a model's dependence on each modality, we compute the gain in accuracy obtained when the model has access to that modality in addition to another one. We refer to this gain as the conditional utilization rate. In our experiments, we consistently observe an imbalance in conditional utilization rates between modalities, across multiple tasks and architectures. Since the conditional utilization rate cannot be computed efficiently during training, we introduce a proxy based on the pace at which the model learns from each modality, which we refer to as the conditional learning speed. We propose an algorithm that balances the conditional learning speeds between modalities during training and demonstrate that it indeed addresses the issue of greedy learning. The proposed algorithm improves the model's generalization on three datasets: Colored MNIST, Princeton ModelNet40, and NVIDIA Dynamic Hand Gesture.
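The conditional utilization rate reads directly off the definition above; a minimal sketch (with hypothetical accuracy values) is:

```python
def conditional_utilization_rate(acc_both, acc_single):
    """u(m1 | m2) = acc(m1, m2) - acc(m2): the accuracy gained by
    giving the model modality m1 on top of modality m2. Accuracies
    are assumed measured on a held-out set, with the missing modality
    dropped or masked at evaluation time."""
    return acc_both - acc_single

# Hypothetical example: 92% with audio + video, 90% with video alone
# implies u(audio | video) = 0.02; a large gap between u(audio | video)
# and u(video | audio) signals the imbalance described above.
print(conditional_utilization_rate(0.92, 0.90))
```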
With the rapid increase of large-scale, real-world datasets, it becomes critical to address the problem of long-tailed data distribution (i.e., a few classes account for most of the data, while most classes are under-represented). Existing solutions typically adopt class re-balancing strategies such as re-sampling and re-weighting based on the number of observations for each class. In this work, we argue that as the number of samples increases, the additional benefit of a newly added data point will diminish. We introduce a novel theoretical framework to measure data overlap by associating with each sample a small neighboring region rather than a single point. The effective number of samples is defined as the volume of samples and can be calculated by a simple formula $(1-\beta^{n})/(1-\beta)$, where $n$ is the number of samples and $\beta \in [0,1)$ is a hyperparameter. We design a re-weighting scheme that uses the effective number of samples for each class to re-balance the loss, thereby yielding a class-balanced loss. Comprehensive experiments are conducted on artificially induced long-tailed CIFAR datasets and large-scale datasets including ImageNet and iNaturalist. Our results show that when trained with the proposed class-balanced loss, the network is able to achieve significant performance gains on long-tailed datasets.
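The effective-number formula translates directly into per-class loss weights; a minimal sketch is below (the normalization convention is our assumption).

```python
import numpy as np

def class_balanced_weights(samples_per_class, beta=0.999):
    """Weight each class by the inverse of its effective number of
    samples E_n = (1 - beta**n) / (1 - beta), then normalize so the
    weights sum to the number of classes."""
    n = np.asarray(samples_per_class, dtype=float)
    effective_num = (1.0 - np.power(beta, n)) / (1.0 - beta)
    w = 1.0 / effective_num
    return w * len(n) / w.sum()

# Example: the tail class (10 samples) receives a far larger weight
# than the head class (10,000 samples).
print(class_balanced_weights([10000, 1000, 10], beta=0.999))
```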