
We consider minimizing a smooth and strongly convex objective function using a stochastic Newton method. At each iteration, the algorithm has oracle access to a stochastic estimate of the Hessian matrix. This oracle model covers popular algorithms such as Subsampled Newton and Newton Sketch. Despite using second-order information, these existing methods do not exhibit superlinear convergence unless the stochastic noise is gradually reduced to zero during the iteration, which would cause the per-iteration cost to blow up. We propose to address this limitation with Hessian averaging: instead of using the most recent Hessian estimate, our algorithm maintains an average of all past estimates. This reduces the stochastic noise while avoiding the computational blow-up. We show that this scheme exhibits local $Q$-superlinear convergence with a non-asymptotic rate of $(\Upsilon\sqrt{\log (t)/t}\,)^{t}$, where $\Upsilon$ is proportional to the level of stochastic noise in the Hessian oracle. A potential drawback of this (uniform averaging) approach is that the averaged estimates contain Hessian information from the global phase of the method, i.e., before the iterates converge to a local neighborhood. This distortion may substantially delay superlinear convergence until long after the local neighborhood is reached. To address this drawback, we study a number of weighted averaging schemes that assign larger weights to recent Hessians, so that superlinear convergence arises sooner, albeit at a slightly slower rate. Remarkably, we show that there exists a universal weighted averaging scheme that transitions to local convergence at an optimal stage and still exhibits a superlinear convergence rate nearly matching (up to a logarithmic factor) that of uniform Hessian averaging.
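The following is a minimal NumPy sketch of the averaging idea. It rests on three assumptions:

- The objective is a hypothetical two-dimensional toy function.
- The "oracle" adds symmetric Gaussian noise to the exact Hessian. It is not Subsampled Newton or Newton Sketch.
- Gradients are exact.

The `weight` argument is a hypothetical hook. Its default of 1/t gives the uniform average of all past estimates. A schedule that puts more weight on recent estimates (with weight(1) = 1) gives a weighted variant. The sketch does not implement the paper's universal weighting scheme.

```python
import numpy as np


def make_toy_problem():
    """Hypothetical strongly convex test function f(x) = 0.5 x^T A x + sum_i exp(x_i)."""
    a = np.array([[2.0, 0.5], [0.5, 1.0]])

    def grad(x):
        return a @ x + np.exp(x)

    def hessian(x):
        return a + np.diag(np.exp(x))

    return grad, hessian


def newton_hessian_averaging(grad, hessian, x0, noise=0.2, iters=50, weight=None, seed=0):
    """Stochastic Newton iterations that use an average of noisy Hessian estimates.

    weight(t) is the weight given to the newest estimate; the default 1/t yields the
    uniform average of all past estimates.
    """
    rng = np.random.default_rng(seed)
    weight = weight or (lambda t: 1.0 / t)
    x = np.asarray(x0, dtype=float)
    h_avg = np.zeros((x.size, x.size))
    for t in range(1, iters + 1):
        # Hypothetical oracle: exact Hessian plus symmetric Gaussian noise.
        e = rng.normal(scale=noise, size=h_avg.shape)
        h_hat = hessian(x) + 0.5 * (e + e.T)
        # Running average: H_t = (1 - w_t) H_{t-1} + w_t * H_hat_t.
        w = weight(t)
        h_avg = (1.0 - w) * h_avg + w * h_hat
        # Newton step with the exact gradient and the averaged Hessian.
        x = x - np.linalg.solve(h_avg, grad(x))
    return x


if __name__ == "__main__":
    g, h = make_toy_problem()
    x_final = newton_hessian_averaging(g, h, [0.0, 0.0])
    print(x_final, np.linalg.norm(g(x_final)))
```

The averaged Hessian gathers estimates taken at earlier iterates. This is the "distortion" described above: in this toy, the estimate from the starting point x = 0 keeps a 1/t share of the uniform average.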

Related content

Reducible

Estimation/Estimator · Square matrix · MoDELS · Loss function (machine learning) · Normalized

January 30, 2023

Profile least squares estimators in the monotone single index model

Fadoua Balabdaoui, Piet Groeneboom

From arXiv; 21 pages, 6 figures

We consider least squares estimators of the finite-dimensional regression parameter $\alpha$ in the single index regression model $Y=\psi(\alpha^T X)+\epsilon$, where $X$ is a $d$-dimensional random vector, $\mathbb{E}(Y\mid X)=\psi(\alpha^T X)$, and $\psi$ is monotone. It has been suggested to estimate $\alpha$ by a profile least squares estimator, minimizing $\sum_{i=1}^n(Y_i-\psi(\alpha^T X_i))^2$ over monotone $\psi$ and over $\alpha$ on the boundary $S_{d-1}$ of the unit ball. Although this suggestion has been around for a long time, it is still unknown whether the resulting estimate is $\sqrt{n}$-convergent. We show that a profile least squares estimator that uses the same pointwise least squares estimator for fixed $\alpha$, but a different global sum of squares, is $\sqrt{n}$-convergent and asymptotically normal. We study the difference between the corresponding loss functions and also compare the estimator with other methods.
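The sketch below makes the profile idea concrete. It computes the classical profile least squares criterion that, according to the abstract, has been suggested. For each fixed $\alpha$, it fits a nondecreasing $\psi$ by isotonic regression (pool-adjacent-violators) and records the residual sum of squares. It then minimizes that sum over $\alpha$.

It assumes $d = 2$, a nondecreasing $\psi$, and a brute-force grid over the unit circle. It does not reproduce the authors' modified global sum of squares.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def pava(y: Sequence[float]) -> List[float]:
    """Nondecreasing least squares fit (pool-adjacent-violators algorithm)."""
    sums: List[float] = []
    counts: List[int] = []
    for value in y:
        sums.append(value)
        counts.append(1)
        # Merge blocks from the right while their means violate monotonicity.
        while len(sums) > 1 and sums[-2] / counts[-2] > sums[-1] / counts[-1]:
            s, c = sums.pop(), counts.pop()
            sums[-1] += s
            counts[-1] += c
    fit: List[float] = []
    for s, c in zip(sums, counts):
        fit.extend([s / c] * c)
    return fit


def direction(theta: float) -> Point:
    """Unit vector alpha = (cos theta, sin theta), i.e. a point on S_1 when d = 2."""
    return (math.cos(theta), math.sin(theta))


def profile_sse(alpha: Point, xs: Sequence[Point], ys: Sequence[float]) -> float:
    """Residual sum of squares after fitting a nondecreasing psi to Y against alpha^T X."""
    index = [alpha[0] * x[0] + alpha[1] * x[1] for x in xs]
    order = sorted(range(len(ys)), key=lambda i: index[i])
    y_sorted = [ys[i] for i in order]
    return sum((a - b) ** 2 for a, b in zip(y_sorted, pava(y_sorted)))


def profile_lse_2d(xs: Sequence[Point], ys: Sequence[float], n_grid: int = 360) -> float:
    """Angle theta minimizing the profile sum of squares over a uniform grid on the circle."""
    thetas = [2 * math.pi * k / n_grid for k in range(n_grid)]
    return min(thetas, key=lambda th: profile_sse(direction(th), xs, ys))


if __name__ == "__main__":
    rng = random.Random(0)
    pts = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(200)]
    a_true = direction(math.radians(30))
    y_obs = [(a_true[0] * p[0] + a_true[1] * p[1]) ** 3 + rng.gauss(0.0, 0.1) for p in pts]
    print(math.degrees(profile_lse_2d(pts, y_obs)))
```

The full circle is searched rather than a half circle. Because $\psi$ is restricted to be nondecreasing, $\alpha$ and $-\alpha$ are not interchangeable.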

Minimax · Normalized · Saddle point · Functional · Extensibility

January 30, 2023

Nonmonotone local minimax methods for finding multiple saddle points

Wei Liu, Ziqing Xie, Wenfan Yi

From arXiv; 32 pages, 7 figures; accepted by Journal of Computational Mathematics on January 3, 2023

In this paper, by designing a normalized nonmonotone search strategy with a Barzilai--Borwein-type step-size, we propose and analyze a novel local minimax method (LMM), a globally convergent iterative method for finding multiple (unstable) saddle points of nonconvex functionals in Hilbert spaces. Compared with traditional LMMs that use monotone search strategies, this approach does not require a strict decrease of the objective functional value at each iterative step and is observed to converge faster with less computation. First, we build on a normalized iterative scheme coupled with a local peak selection that pulls the iterative point back onto the solution submanifold. We generalize the Zhang--Hager (ZH) search strategy from optimization theory to the LMM framework, introduce a normalized ZH-type nonmonotone step-size search strategy, and construct a novel nonmonotone LMM. Its feasibility and global convergence are rigorously established under a relaxed monotonicity requirement on the functional along the iterative sequence. Second, to speed up the convergence of the nonmonotone LMM, we present a globally convergent Barzilai--Borwein-type LMM (GBBLMM). In each iteration, it explicitly uses the Barzilai--Borwein-type step-size as the trial step-size of the normalized ZH-type nonmonotone step-size search strategy. Finally, the GBBLMM algorithm is implemented to find multiple unstable solutions of two classes of semilinear elliptic boundary value problems with variational structures: semilinear elliptic equations with homogeneous Dirichlet boundary conditions, and linear elliptic equations with semilinear Neumann boundary conditions. Extensive numerical results indicate that our approach is very effective and speeds up LMMs significantly.
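The Zhang-Hager nonmonotone rule with a Barzilai-Borwein trial step can be illustrated on plain unconstrained minimization. The sketch below is not an LMM: it has no peak selection and no normalization onto a solution submanifold. It only shows two things:

- how the BB step $s^\top s / s^\top y$ serves as the trial step;
- how the ZH reference value $C_k$ replaces $f(x_k)$ in the Armijo test.

The parameter values (`eta`, `delta`, `rho`) and the step safeguards are illustrative assumptions.

```python
import math
from typing import Callable, List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def zh_bb_descent(f: Callable[[Vector], float], grad: Callable[[Vector], Vector], x0: Vector,
                  eta: float = 0.85, delta: float = 1e-4, rho: float = 0.5,
                  tol: float = 1e-6, max_iter: int = 1000) -> Vector:
    """Gradient descent with a BB trial step and a Zhang-Hager nonmonotone line search."""
    x = list(x0)
    g = grad(x)
    c_ref, q = f(x), 1.0  # Zhang-Hager reference value C_k and weight Q_k
    trial = 1.0           # first trial step; later iterations use the BB step
    for _ in range(max_iter):
        g2 = dot(g, g)
        if math.sqrt(g2) < tol:
            break
        step = trial
        # Backtrack until f(x_new) <= C_k - delta * step * |g|^2 (nonmonotone Armijo test).
        while True:
            x_new = [xi - step * gi for xi, gi in zip(x, g)]
            f_new = f(x_new)
            if f_new <= c_ref - delta * step * g2 or step < 1e-16:
                break
            step *= rho
        g_new = grad(x_new)
        s = [a - b for a, b in zip(x_new, x)]
        y = [a - b for a, b in zip(g_new, g)]
        sy = dot(s, y)
        # BB step s^T s / s^T y, safeguarded to a bounded positive interval.
        trial = min(max(dot(s, s) / sy, 1e-10), 1e10) if sy > 0 else 1.0
        # Zhang-Hager update: C_{k+1} is a weighted average of C_k and f(x_{k+1}).
        q_next = eta * q + 1.0
        c_ref = (eta * q * c_ref + f_new) / q_next
        q = q_next
        x, g = x_new, g_new
    return x
```

Because $C_{k+1}$ is a convex combination of $C_k$ and $f(x_{k+1})$, it never falls below the current function value. This slack is what lets an occasional BB step increase $f$ without being rejected.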

Estimation/Estimator · Marginalization · Trial · Identifiable · Extensibility

January 28, 2023

Covariate-assisted bounds on causal effects with instrumental variables

Alexander W. Levis, Matteo Bonvini, Zhenghao Zeng, Luke Keele, Edward H. Kennedy

From arXiv; 30 pages, 1 figure

When an exposure of interest is confounded by unmeasured factors, an instrumental variable (IV) can be used to identify and estimate certain causal contrasts. Identification of the marginal average treatment effect (ATE) from IVs typically relies on strong untestable structural assumptions. When one is unwilling to assert such structural assumptions, IVs can nonetheless be used to construct bounds on the ATE. Famously, Balke and Pearl (1997) employed linear programming techniques to prove tight bounds on the ATE for a binary outcome, in a randomized trial with noncompliance and no covariate information. We demonstrate how these bounds remain useful in observational settings with baseline confounders of the IV, as well as randomized trials with measured baseline covariates. The resulting lower and upper bounds on the ATE are non-smooth functionals, and thus standard nonparametric efficiency theory is not immediately applicable. To remedy this, we propose (1) estimators of smooth approximations of these bounds, and (2) under a novel margin condition, influence function-based estimators of the ATE bounds that can attain parametric convergence rates when the nuisance functions are modeled flexibly. We propose extensions to continuous outcomes, and finally, illustrate the proposed estimators in a randomized experiment studying the effects of influenza vaccination encouragement on flu-related hospital visits.
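For reference, the Balke-Pearl bounds for binary $Z$, $X$, $Y$ can be computed as two linear programs over the distribution of latent types. Each type pairs a compliance type with a potential-outcome type. Randomization of $Z$ and the exclusion restriction are built into the equality constraints.

This sketch covers only the classical no-covariate setting. It assumes SciPy's `linprog` with the HiGHS solver. It does not implement the covariate-assisted estimators or the smooth approximations proposed in the paper.

```python
from itertools import product

import numpy as np
from scipy.optimize import linprog


def balke_pearl_bounds(p):
    """Sharp ATE bounds for binary Z, X, Y via the Balke-Pearl linear program.

    p[z][x][y] is the observed probability P(X = x, Y = y | Z = z).
    """
    # Latent types: compliance (x(0), x(1)) times response (y(0), y(1)), 16 in total.
    types = [(c, r) for c in product((0, 1), repeat=2) for r in product((0, 1), repeat=2)]
    a_eq, b_eq = [], []
    for z, x, y in product((0, 1), repeat=3):
        # A type is consistent with (z, x, y) if it takes X = x under Z = z and then Y = y.
        a_eq.append([1.0 if c[z] == x and r[x] == y else 0.0 for c, r in types])
        b_eq.append(p[z][x][y])
    effect = np.array([r[1] - r[0] for _, r in types], dtype=float)  # Y(1) - Y(0)
    low = linprog(effect, A_eq=a_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    high = linprog(-effect, A_eq=a_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    if not (low.success and high.success):
        raise ValueError("observed distribution is incompatible with the IV model")
    return low.fun, -high.fun
```

Under full compliance (X = Z), only the complier type has positive mass. The two programs then return the same value, P(Y = 1 | Z = 1) - P(Y = 1 | Z = 0). With noncompliance, the interval widens.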

Example · Weight · Scenario · Sample · MoDELS

January 28, 2023

Leveraging Importance Weights in Subset Selection

Gui Citovsky, Giulia DeSalvo, Sanjiv Kumar, Srikumar Ramalingam, Afshin Rostamizadeh, Yunjuan Wang

From arXiv; ICLR 2023

We present a subset selection algorithm designed to work with arbitrary model families in a practical batch setting. In such a setting, an algorithm can sample examples one at a time but, to limit overhead costs, can only update its state (i.e., further train model weights) once a large enough batch of examples has been selected. Our algorithm, IWeS, selects examples by importance sampling, where the sampling probability assigned to each example is based on the entropy of models trained on previously selected batches. IWeS achieves significant performance improvements over other subset selection algorithms on seven publicly available datasets. Additionally, it is competitive in an active learning setting, where label information is not available at selection time. We also provide an initial theoretical analysis supporting our importance weighting approach, proving generalization and sampling rate bounds.
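Below is a sketch of entropy-based importance sampling with inverse-probability weights. The abstract does not give IWeS's exact probability, so the rule used here is hypothetical: normalized predictive entropy, clipped to `[p_min, 1]`. The names `sampling_probability` and `select_batch` are also hypothetical. `pool_probs` stands in for class probabilities from a model trained on previously selected batches.

```python
import math
import random
from typing import List, Sequence, Tuple


def predictive_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def sampling_probability(probs: Sequence[float], p_min: float = 0.05) -> float:
    """Hypothetical rule: entropy divided by its maximum log(K), clipped to [p_min, 1]."""
    h_max = math.log(len(probs))
    return min(1.0, max(p_min, predictive_entropy(probs) / h_max))


def select_batch(pool_probs: Sequence[Sequence[float]], rng: random.Random,
                 p_min: float = 0.05) -> List[Tuple[int, float]]:
    """Bernoulli-sample each pool example; return (index, importance weight 1/p) pairs."""
    selected = []
    for i, probs in enumerate(pool_probs):
        p = sampling_probability(probs, p_min)
        if rng.random() < p:
            selected.append((i, 1.0 / p))
    return selected
```

Weighting each selected example's loss by 1/p keeps the weighted loss an unbiased estimate of the total loss over the pool. The floor `p_min` bounds the largest possible weight at 1/p_min.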

Weight · Diversity · Generalization theory · Networking · motivation

January 27, 2023

Diverse Weight Averaging for Out-of-Distribution Generalization

Alexandre Ramé, Matthieu Kirchmeyer, Thibaud Rahier, Alain Rakotomamonjy, Patrick Gallinari, Matthieu Cord

From arXiv; 36 pages, 16 figures, 15 tables

Standard neural networks struggle to generalize under distribution shifts in computer vision. Fortunately, combining multiple networks can consistently improve out-of-distribution generalization. In particular, weight averaging (WA) strategies were shown to perform best on the competitive DomainBed benchmark; they directly average the weights of multiple networks despite their nonlinearities. In this paper, we propose Diverse Weight Averaging (DiWA), a new WA strategy whose main motivation is to increase the functional diversity across averaged models. To this end, DiWA averages weights obtained from several independent training runs: indeed, models obtained from different runs are more diverse than those collected along a single run thanks to differences in hyperparameters and training procedures. We motivate the need for diversity by a new bias-variance-covariance-locality decomposition of the expected error, exploiting similarities between WA and standard functional ensembling. Moreover, this decomposition highlights that WA succeeds when the variance term dominates, which we show occurs when the marginal distribution changes at test time. Experimentally, DiWA consistently improves the state of the art on DomainBed without inference overhead.
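The averaging step itself is simple: parameters with the same name are averaged element-wise across runs. The sketch below uses plain Python lists as a stand-in for framework tensors, which is an assumption. It checks only that parameter names and lengths match.

In practice, weight averaging of this kind is applied to runs that share an architecture and, typically, a common pretrained initialization. This code does not verify that.

```python
from typing import Dict, List, Sequence

Params = Dict[str, List[float]]


def average_weights(models: Sequence[Params]) -> Params:
    """Uniformly average same-named parameters of models that share one architecture."""
    if not models:
        raise ValueError("need at least one model")
    keys = models[0].keys()
    for m in models:
        if m.keys() != keys:
            raise ValueError("models must have identical parameter names")
        if any(len(m[k]) != len(models[0][k]) for k in keys):
            raise ValueError("models must have identical parameter shapes")
    averaged: Params = {}
    for k in keys:
        # zip(*...) groups the i-th entry of parameter k across all models.
        averaged[k] = [sum(vals) / len(models) for vals in zip(*(m[k] for m in models))]
    return averaged
```

The result is a single network, so inference costs the same as for one model. Ensembling the outputs of the same models would cost as much as running all of them.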

Learning · Networking · Principle · Inference · Neural Networks

January 27, 2023

Constrained Parameter Inference as a Principle for Learning

Nasir Ahmad, Ellen Schrader, Marcel van Gerven

From arXiv; 18 pages, 5 figures

Learning in neural networks is often framed as a problem in which targeted error signals are directly propagated to parameters and used to produce updates that improve network behaviour. Backpropagation of error (BP) is an example of such an approach and has proven to be a highly successful application of stochastic gradient descent to deep neural networks. We propose constrained parameter inference (COPI) as a new principle for learning. The COPI approach assumes that learning can be set up so that parameters infer their own values from observations of their local neuron activities. We find that this estimation of network parameters is possible under the constraints of decorrelated neural inputs and top-down perturbations of neural states for credit assignment. We show that the decorrelation required for COPI allows learning at extremely high learning rates, competitive with those of the adaptive optimizers used with BP. We further demonstrate that COPI affords a new approach to feature analysis and network compression. Finally, we argue that COPI may shed new light on learning in biological networks, given the evidence for decorrelation in the brain.
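Decorrelated inputs are one of COPI's constraints. As an illustration only, the sketch below learns a linear transform $R$ so that $z = Rx$ has a diagonal covariance. It uses an anti-Hebbian-style update $R \leftarrow R - \eta\,\mathrm{offdiag}(C_z) R$ with $C_z = R C_x R^\top$.

Several choices here are assumptions:

- the form of the update;
- using the full input covariance instead of per-sample activities;
- the step size.

This is not the paper's full COPI learning rule.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def decorrelate(cov_x: Matrix, eta: float = 0.1, steps: int = 200) -> Matrix:
    """Learn R such that z = R x has (approximately) diagonal covariance.

    Assumed update: R <- R - eta * offdiag(C_z) @ R, where C_z = R C_x R^T.
    """
    n = len(cov_x)
    r = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(steps):
        cov_z = matmul(matmul(r, cov_x), transpose(r))
        # Keep only the cross-correlations; the diagonal is not penalized.
        off = [[cov_z[i][j] if i != j else 0.0 for j in range(n)] for i in range(n)]
        update = matmul(off, r)
        r = [[r[i][j] - eta * update[i][j] for j in range(n)] for i in range(n)]
    return r
```

For a $2\times 2$ covariance with diagonal entries $a, b$ and off-diagonal entry $c$, one step multiplies $c$ by $1 - \eta(a + b) + \eta^2 c^2$. The cross-correlation therefore decays geometrically for small $\eta$.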

Learning · ENJOY · Statistic · Estimation/Estimator · Analysis

January 27, 2023

The Stochastic Proximal Distance Algorithm

Haoyu Jiang, Jason Xu

Stochastic versions of proximal methods have gained much attention in statistics and machine learning. These algorithms tend to admit simple, scalable forms and enjoy numerical stability via implicit updates. In this work, we propose and analyze a stochastic version of the recently proposed proximal distance algorithm, a class of iterative optimization methods that recover a desired constrained estimation problem as a penalty parameter $\rho \rightarrow \infty$. By uncovering connections to related stochastic proximal methods and interpreting the penalty parameter as the learning rate, we justify heuristics used in practical manifestations of the proximal distance method, establishing their convergence guarantees for the first time. Moreover, we extend recent theoretical devices to establish finite error bounds and a complete characterization of convergence rate regimes. We validate our analysis via a thorough empirical study, which also shows that, unsurprisingly, the proposed method outpaces batch versions on popular learning tasks.
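The basic proximal distance update is $x_{k+1} = \mathrm{prox}_{f/\rho_k}(P_C(x_k))$. It projects onto the constraint set, then takes an implicit (proximal) step of size $1/\rho_k$. A stochastic variant replaces $f$ with a single sampled loss $f_i$.

The sketch below applies this to nonnegative least squares. There, both the projection and the prox of $f_i(x) = \tfrac12(a_i^\top x - b_i)^2$ have closed forms. The linear schedule $\rho_k = \rho_0 + \gamma k$ and all parameter values are assumptions, not the schedules analyzed in the paper.

```python
import random
from typing import List, Sequence

Vector = List[float]


def project_nonnegative(x: Sequence[float]) -> Vector:
    """Euclidean projection onto C = {x : x >= 0}."""
    return [max(0.0, xi) for xi in x]


def stochastic_proximal_distance(rows: Sequence[Sequence[float]], targets: Sequence[float],
                                 n_iter: int = 5000, rho0: float = 1.0, growth: float = 0.01,
                                 seed: int = 0) -> Vector:
    """Nonnegative least squares via stochastic proximal distance updates (a sketch).

    Step k: z = P_C(x), then x = prox_{f_i / rho_k}(z) for f_i(x) = 0.5 * (a_i . x - b_i)^2,
    with rho_k = rho0 + growth * k, so the step size 1 / rho_k shrinks over time.
    """
    rng = random.Random(seed)
    x = [0.0] * len(rows[0])
    for k in range(n_iter):
        rho = rho0 + growth * k
        i = rng.randrange(len(rows))
        a, b = rows[i], targets[i]
        z = project_nonnegative(x)  # distance majorization: anchor at the projection
        resid = sum(ai * zi for ai, zi in zip(a, z)) - b
        # Closed-form prox: argmin_x 0.5 * (a.x - b)^2 + (rho / 2) * ||x - z||^2.
        scale = resid / (rho + sum(ai * ai for ai in a))
        x = [zi - scale * ai for zi, ai in zip(z, a)]
    return project_nonnegative(x)
```

The prox step is an implicit update, which is the source of the numerical stability mentioned above. Each iteration moves only a fraction $\|a_i\|^2/(\rho_k + \|a_i\|^2)$ of the way toward the hyperplane $a_i^\top x = b_i$, however large $\|a_i\|$ is.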

Automator · MoDELS · Model selection · Robustness · Knowledge

January 26, 2023

The Automated Discovery of Kinetic Rate Models -- Methodological Frameworks

Miguel Ángel de Carvalho Servia, Ilya Orson Sandoval, Klaus Hellgardt, King Kuok Hii, Dongda Zhang, Ehecatl Antonio del Rio Chanona

The industrialization of catalytic processes is more important today than ever before, and kinetic models are essential tools for it. Kinetic models affect the design, optimization, and control of catalytic processes, but they are not easy to obtain. Classical paradigms, such as mechanistic modeling, require substantial domain knowledge, while data-driven and hybrid modeling lack interpretability. Consequently, a different approach called automated knowledge discovery has recently gained popularity. Many methods have been developed under this paradigm, notable examples being ALAMO, SINDy, and genetic programming. However, these methods suffer from important drawbacks: they require assumptions about model structures, scale poorly, lack robust and well-founded model selection routines, and are sensitive to noise. To overcome these challenges, the present work constructs two methodological frameworks, ADoK-S and ADoK-W (Automated Discovery of Kinetics using a Strong/Weak formulation of symbolic regression), for the automated generation of catalytic kinetic models. We leverage genetic programming for model generation, a sequential optimization routine for model refinement, and a robust criterion for model selection. Both frameworks are tested on three computational case studies of increasing complexity. We show that they can retrieve the underlying kinetic rate model from a limited amount of noisy data from the catalytic system, indicating strong potential for chemical reaction engineering applications.
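The abstract does not spell out the weak formulation. The sketch below therefore shows, as an assumption, one common integral-form idea that goes by that name. Instead of differentiating noisy concentration data, it compares $C(t) - C(0)$ with $-k\int_0^t C^n\,ds$ (computed by the trapezoidal rule) for each candidate model and keeps the best fit.

Two further simplifications apply:

- Candidate models are fixed power laws $r = kC^n$, not expressions generated by genetic programming.
- The lowest sum of squares stands in for the paper's model-selection criterion.

```python
import math
from typing import List, Sequence, Tuple


def cumulative_trapezoid(values: Sequence[float], times: Sequence[float]) -> List[float]:
    """Running integral of sampled values by the trapezoidal rule, starting at 0."""
    out = [0.0]
    for i in range(1, len(times)):
        out.append(out[-1] + 0.5 * (values[i] + values[i - 1]) * (times[i] - times[i - 1]))
    return out


def fit_power_law_weak(times: Sequence[float], conc: Sequence[float],
                       orders: Sequence[int] = (0, 1, 2)) -> Tuple[int, float, float]:
    """Choose the order n in dC/dt = -k C^n using an integral (weak-form) residual.

    For each n: C(t) - C(0) ~ -k * int_0^t C^n ds; fit k by least squares and keep the
    (n, k, sse) triple with the smallest sum of squared residuals.
    """
    delta = [c - conc[0] for c in conc]
    best = None
    for n in orders:
        integral = cumulative_trapezoid([c ** n for c in conc], times)
        k = -sum(d * v for d, v in zip(delta, integral)) / sum(v * v for v in integral)
        sse = sum((d + k * v) ** 2 for d, v in zip(delta, integral))
        if best is None or sse < best[2]:
            best = (n, k, sse)
    return best


if __name__ == "__main__":
    # Synthetic first-order data C(t) = exp(-0.5 t), sampled every 0.1 time units.
    ts = [0.1 * i for i in range(51)]
    cs = [math.exp(-0.5 * t) for t in ts]
    print(fit_power_law_weak(ts, cs))
```

Integrating the data tends to average out measurement noise rather than amplify it. Finite-difference derivative estimates do the opposite, which is one motivation for integral formulations.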

MoDELS · Integration · Periodic · Computational cost · Cost

January 26, 2023

The Method of Harmonic Balance for the Giesekus Model under Oscillatory Shear

Shivangi Mittal, Yogesh M. Joshi, Sachin Shanbhag

From arXiv; submitted to JNNFM

The method of harmonic balance (HB) is a spectrally accurate method for obtaining periodic steady-state solutions of dynamical systems subjected to periodic perturbations. We adapt HB to solve for the stress response of the Giesekus model under large amplitude oscillatory shear (LAOS) deformation. HB transforms the system of differential equations into a set of nonlinear algebraic equations in the Fourier coefficients. Convergence studies find that the difference between the HB and true solutions decays exponentially, as $e^{-m H}$, with the number of harmonics $H$ included in the ansatz. The decay coefficient $m$ decreases with increasing strain amplitude and exhibits a U-shaped dependence on the applied frequency. The computational cost of HB increases slightly faster than linearly with $H$. Together, rapid convergence and a modest increase in computational cost with increasing $H$ imply that HB outperforms the conventional approach of numerically integrating differential constitutive equations under oscillatory shear. Numerical experiments find that HB is simultaneously about three orders of magnitude cheaper and several orders of magnitude more accurate than numerical integration. It thus offers a compelling value proposition for parameter estimation or model selection.
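HB can be seen compactly in its time-domain (collocation) form. One period is represented by $2H+1$ equispaced samples, differentiated spectrally with the FFT, and the resulting algebraic system is solved with Newton's method.

The sketch below handles only a scalar ODE $y' = F(y, t)$. It is a simplification under stated assumptions, not the Galerkin formulation for the tensor-valued Giesekus model described in the abstract. The test case is the single-mode Maxwell model $\dot\sigma = G\dot\gamma - \sigma/\lambda$ under $\gamma = \gamma_0 \sin\omega t$, whose periodic solution is known in closed form.

```python
import numpy as np


def fourier_diff_matrix(n: int, period: float) -> np.ndarray:
    """Spectral differentiation matrix for n (odd) equispaced samples of a periodic signal."""
    k = np.fft.fftfreq(n, d=1.0 / n)  # integer wavenumbers 0, 1, ..., -1
    omega = 2.0 * np.pi / period
    spectrum = np.fft.fft(np.eye(n), axis=0) * (1j * omega * k)[:, None]
    return np.real(np.fft.ifft(spectrum, axis=0))


def harmonic_balance(rhs, drhs, period, n_harmonics, tol=1e-12, max_iter=50):
    """Periodic solution of y' = rhs(y, t) by time-domain harmonic balance (collocation).

    drhs(y, t) returns the pointwise derivative d rhs / dy, used for the Newton Jacobian.
    """
    n = 2 * n_harmonics + 1
    t = np.arange(n) * period / n
    d = fourier_diff_matrix(n, period)
    y = np.zeros(n)
    for _ in range(max_iter):
        residual = d @ y - rhs(y, t)
        if np.max(np.abs(residual)) < tol:
            break
        jacobian = d - np.diag(drhs(y, t))
        y = y - np.linalg.solve(jacobian, residual)  # Newton step on the algebraic system
    return t, y


if __name__ == "__main__":
    g_mod, lam, gamma0, w = 1.0, 1.0, 1.0, 1.0
    rhs = lambda s, t: g_mod * gamma0 * w * np.cos(w * t) - s / lam
    drhs = lambda s, t: -np.ones_like(s) / lam
    t, s = harmonic_balance(rhs, drhs, 2 * np.pi / w, n_harmonics=3)
    print(np.max(np.abs(s - (0.5 * np.cos(t) + 0.5 * np.sin(t)))))
```

With $G = \lambda = \gamma_0 = \omega = 1$, the exact periodic response is $\sigma(t) = \tfrac12\cos t + \tfrac12\sin t$. The collocation solution reproduces it to rounding error for any $H \ge 1$. For a nonlinear model, the same Newton loop applies, but more harmonics are needed as the response departs from a single sinusoid.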

Optimizer · Performer · MNIST (dataset) · Approximation · Oracle

January 26, 2023

A Fully First-Order Method for Stochastic Bilevel Optimization

Jeongyeol Kwon, Dohyun Kwon, Stephen Wright, Robert Nowak

We consider stochastic unconstrained bilevel optimization problems when only first-order gradient oracles are available. While numerous optimization methods have been proposed for tackling bilevel problems, existing methods either tend to require possibly expensive calculations involving Hessians of lower-level objectives or lack rigorous finite-time performance guarantees. In this work, we propose a Fully First-order Stochastic Approximation (F2SA) method and study its non-asymptotic convergence properties. Specifically, we show that F2SA converges to an $\epsilon$-stationary solution of the bilevel problem after $\epsilon^{-7/2}$, $\epsilon^{-5/2}$, and $\epsilon^{-3/2}$ iterations (each iteration using $O(1)$ samples) when stochastic noise is present in both level objectives, only in the upper-level objective, and in neither (the deterministic setting), respectively. We further show that if we employ momentum-assisted gradient estimators, the iteration complexities can be improved to $\epsilon^{-5/2}$, $\epsilon^{-4/2}$, and $\epsilon^{-3/2}$, respectively. We demonstrate superior practical performance of the proposed method over existing second-order-based approaches on MNIST data hyper-cleaning experiments.
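The fully first-order idea can be sketched with a penalty (value-function) reformulation:

- $z$ tracks $\arg\min_y g(x, y)$;
- $y$ tracks $\arg\min_y f(x, y) + \lambda g(x, y)$;
- $x$ moves along $\nabla_x f(x, y) + \lambda(\nabla_x g(x, y) - \nabla_x g(x, z))$.

No Hessian of $g$ is ever formed. This deterministic version is a simplification assumed for illustration: it uses a fixed $\lambda$, fixed step sizes and a nested inner loop. It is not the F2SA algorithm with its stochastic gradients and step-size schedules.

```python
def penalty_bilevel(grad_f, grad_g, x0, y0, z0, lam=100.0, outer=300, inner=20,
                    beta=0.2, alpha_y=None, alpha_z=0.5):
    """First-order penalty scheme for min_x f(x, y*(x)) with y*(x) = argmin_y g(x, y).

    grad_f(x, y) and grad_g(x, y) return (d/dx, d/dy) as a tuple; only gradients are used.
    """
    if alpha_y is None:
        alpha_y = 0.5 / (1.0 + lam)  # assumes f and g are 1-smooth in y (true for the toy)
    x, y, z = x0, y0, z0
    for _ in range(outer):
        for _ in range(inner):
            # y approximately minimizes f + lam * g; z approximately minimizes g alone.
            y = y - alpha_y * (grad_f(x, y)[1] + lam * grad_g(x, y)[1])
            z = z - alpha_z * grad_g(x, z)[1]
        # Gradient of the penalized value function in x, from first-order information only.
        hyper = grad_f(x, y)[0] + lam * (grad_g(x, y)[0] - grad_g(x, z)[0])
        x = x - beta * hyper
    return x, y, z


if __name__ == "__main__":
    # Toy problem: f(x, y) = 0.5 (y - 1)^2 + 0.5 x^2,  g(x, y) = 0.5 (y - x)^2.
    gf = lambda x, y: (x, y - 1.0)
    gg = lambda x, y: (x - y, y - x)
    print(penalty_bilevel(gf, gg, 0.0, 0.0, 0.0))
```

On the toy problem in the demo, the bilevel solution is $x^\star = 1/2$. The penalized fixed point is $x = \lambda/(1+2\lambda)$, so a finite $\lambda$ introduces a bias of $1/(2(1+2\lambda))$, which is $O(1/\lambda)$.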