This paper presents an accelerated proximal gradient method for multiobjective optimization, in which each objective function is the sum of a continuously differentiable, convex function and a closed, proper, convex function. Extending first-order methods to multiobjective problems without scalarization has been widely studied, but providing accelerated methods with correct proofs of their convergence rates remains an open problem. Our proposed method is a multiobjective generalization of the accelerated proximal gradient method for scalar optimization, also known as the Fast Iterative Shrinkage-Thresholding Algorithm (FISTA). The key to this extension is solving a subproblem with terms that appear only in the multiobjective case. This approach allows us to prove a global convergence rate of $O(1/k^2)$ for the proposed method, using a merit function to measure complexity. Furthermore, we present an efficient way to solve the subproblem via its dual representation, and we confirm the validity of the proposed method through numerical experiments.
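
Since the method generalizes scalar FISTA, the scalar iteration is a useful reference point. The sketch below is only that single-objective baseline, not the multiobjective method (whose subproblem couples all objectives); it assumes a known Lipschitz constant L of the gradient of f and a prox operator for g, and the names `fista` and `soft_threshold` are hypothetical.

```python
import math
from typing import Callable, List

Vector = List[float]


def soft_threshold(v: Vector, tau: float) -> Vector:
    """Prox of tau * ||.||_1: shrink every entry toward zero by tau."""
    return [math.copysign(max(abs(vi) - tau, 0.0), vi) for vi in v]


def fista(grad_f: Callable[[Vector], Vector],
          prox_g: Callable[[Vector, float], Vector],
          x0: Vector, lipschitz: float, iters: int = 500) -> Vector:
    """Scalar FISTA for min f(x) + g(x) with constant step 1 / L."""
    step = 1.0 / lipschitz
    x_prev, y, t = list(x0), list(x0), 1.0
    for _ in range(iters):
        # Gradient step on the smooth part f, then prox step on g.
        gy = grad_f(y)
        x = prox_g([yi - step * gi for yi, gi in zip(y, gy)], step)
        # Momentum parameter: t_{k+1} = (1 + sqrt(1 + 4 t_k^2)) / 2.
        t_next = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        beta = (t - 1.0) / t_next
        y = [xi + beta * (xi - xpi) for xi, xpi in zip(x, x_prev)]
        x_prev, t = x, t_next
    return x_prev


# Example: f(x) = 0.5 * sum(d_i x_i^2) - b.x and g(x) = lam * ||x||_1.
d, b, lam = [1.0, 4.0], [3.0, 2.0], 1.0


def grad(x: Vector) -> Vector:
    return [di * xi - bi for di, xi, bi in zip(d, x, b)]


x_star = fista(grad, lambda v, s: soft_threshold(v, lam * s),
               [0.0, 0.0], lipschitz=max(d), iters=2000)
# The coordinatewise minimizer is soft_threshold(b_i, lam) / d_i = [2.0, 0.25].
```

The O(1/k^2) objective-gap guarantee of this scalar scheme is the rate that the multiobjective method extends, measured there through a merit function instead.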

Related content

Functional

Finite difference · Model evaluation · MATLAB · Exact · Directed ·

June 12, 2023

An Upwind Finite Difference Method to Singularly Perturbed Convection Diffusion Problems on a Shishkin Mesh

Daniel T. Gregory

from arXiv, 19 pages, 4 figures

This paper introduces a numerical approach to solve singularly perturbed convection diffusion boundary value problems for second-order ordinary differential equations that feature a small positive parameter $\epsilon$ multiplying the highest derivative. We specifically examine Dirichlet boundary conditions. To solve this differential equation, we propose an upwind finite difference method and incorporate the Shishkin mesh scheme to capture the solution near boundary layers. Our solver is both direct and of high accuracy, with computation time that scales linearly with the number of grid points. MATLAB code of the numerical recipe is made publicly available. We present numerical results to validate the theoretical results and assess the accuracy of our method. The tables and graphs included in this paper demonstrate the numerical outcomes, which indicate that our proposed method offers a highly accurate approximation of the exact solution.
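
A minimal sketch of the two ingredients, assuming the model problem $-\epsilon u'' + a u' = f$ on $(0,1)$ with constant $a > 0$ (so the layer sits at $x = 1$), a Shishkin transition parameter $\sigma = 2$, and hypothetical function names; the paper's problem class, mesh constants, and MATLAB code may differ. The tridiagonal system is solved with the Thomas algorithm, one way to get a direct solver whose cost is linear in the number of grid points.

```python
import math
from typing import Callable, List


def shishkin_mesh(n: int, eps: float, a: float, sigma: float = 2.0) -> List[float]:
    """Piecewise-uniform mesh on [0, 1] with n (even) intervals, fine near x = 1."""
    tau = min(0.5, sigma * eps * math.log(n) / a)  # width of the fine layer region
    half = n // 2
    coarse = [(1.0 - tau) * i / half for i in range(half)]
    fine = [1.0 - tau + tau * i / half for i in range(half + 1)]
    return coarse + fine


def solve_upwind(eps: float, a: float, f: Callable[[float], float],
                 ua: float, ub: float, x: List[float]) -> List[float]:
    """Upwind scheme for -eps u'' + a u' = f, u(0) = ua, u(1) = ub, constant a > 0."""
    n = len(x) - 1
    sub, diag, sup, rhs = [], [], [], []
    for i in range(1, n):
        hl, hr = x[i] - x[i - 1], x[i + 1] - x[i]
        hbar = 0.5 * (hl + hr)
        # Second difference on the nonuniform mesh; backward (upwind) difference for u'.
        sub.append(-eps / (hbar * hl) - a / hl)
        diag.append(eps / (hbar * hl) + eps / (hbar * hr) + a / hl)
        sup.append(-eps / (hbar * hr))
        rhs.append(f(x[i]))
    rhs[0] -= sub[0] * ua   # move the known boundary values to the right-hand side
    rhs[-1] -= sup[-1] * ub
    # Thomas algorithm: O(n) forward elimination and back substitution.
    m = len(diag)
    cp, dp = [0.0] * m, [0.0] * m
    cp[0], dp[0] = sup[0] / diag[0], rhs[0] / diag[0]
    for i in range(1, m):
        denom = diag[i] - sub[i] * cp[i - 1]
        cp[i] = sup[i] / denom
        dp[i] = (rhs[i] - sub[i] * dp[i - 1]) / denom
    u = [0.0] * m
    u[-1] = dp[-1]
    for i in range(m - 2, -1, -1):
        u[i] = dp[i] - cp[i] * u[i + 1]
    return [ua] + u + [ub]


eps, a = 1e-3, 1.0


def exact(t: float) -> float:
    # Solution of -eps u'' + u' = 0, u(0) = 0, u(1) = 1 (boundary layer at x = 1).
    return (math.exp((t - 1.0) / eps) - math.exp(-1.0 / eps)) / (1.0 - math.exp(-1.0 / eps))


def max_nodal_error(n: int) -> float:
    mesh = shishkin_mesh(n, eps, a)
    u = solve_upwind(eps, a, lambda t: 0.0, 0.0, 1.0, mesh)
    return max(abs(ui - exact(ti)) for ui, ti in zip(u, mesh))
```

Comparing `max_nodal_error(32)` with `max_nodal_error(256)` should show the nodal error shrinking under refinement, in line with the almost first-order, $\epsilon$-uniform behaviour usually reported for upwinding on Shishkin meshes.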

Variance reduction · Networking · Learning · Variance · Neural Networks ·

June 9, 2023

On the effectiveness of partial variance reduction in federated learning with heterogeneous data

Bo Li, Mikkel N. Schmidt, Tommy S. Alstrøm, Sebastian U. Stich

from arXiv, Accepted to CVPR 2023

Data heterogeneity across clients is a key challenge in federated learning. Prior works address this by either aligning client and server models or using control variates to correct client model drift. Although these methods achieve fast convergence in convex or simple non-convex problems, their performance in over-parameterized models such as deep neural networks is lacking. In this paper, we first revisit the widely used FedAvg algorithm in a deep neural network to understand how data heterogeneity influences the gradient updates across the neural network layers. We observe that while the feature extraction layers are learned efficiently by FedAvg, the substantial diversity of the final classification layers across clients impedes performance. Motivated by this, we propose to correct model drift by applying variance reduction only to the final layers. We demonstrate that this significantly outperforms existing benchmarks at a similar or lower communication cost. We also provide a proof of the convergence rate of our algorithm.
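
A toy sketch of "variance reduction only on the final layers", assuming SCAFFOLD-style control variates (option II control update) as the correction, deterministic local gradient steps, and separable quadratic client losses with different curvatures; it is not the paper's algorithm, and every name and step size here is hypothetical. Coordinate 0 stands in for the feature-extraction layers (plain FedAvg) and coordinate 1 for the final layer (corrected).

```python
from typing import List, Tuple

# Client i has f_i(w) = 0.5 * sum_j a_i[j] * (w[j] - c_i[j])**2.
Client = Tuple[List[float], List[float]]  # (curvatures a_i, centers c_i)


def run(clients: List[Client], mask: List[bool], rounds: int = 200,
        eta: float = 0.02, local_steps: int = 5) -> List[float]:
    dim = len(mask)
    x = [0.0] * dim                         # server model
    c_glob = [0.0] * dim                    # server control variate
    c_loc = [[0.0] * dim for _ in clients]  # client control variates
    for _ in range(rounds):
        ys, deltas = [], []
        for i, (a, c) in enumerate(clients):
            y = list(x)
            for _ in range(local_steps):
                for j in range(dim):
                    g = a[j] * (y[j] - c[j])          # local gradient
                    if mask[j]:
                        g += c_glob[j] - c_loc[i][j]  # drift correction, masked coords only
                    y[j] -= eta * g
            new_c = list(c_loc[i])
            for j in range(dim):
                if mask[j]:
                    # SCAFFOLD option II: c_i+ = c_i - c + (x - y) / (K * eta).
                    new_c[j] = c_loc[i][j] - c_glob[j] + (x[j] - y[j]) / (local_steps * eta)
            deltas.append([nc - oc for nc, oc in zip(new_c, c_loc[i])])
            c_loc[i] = new_c
            ys.append(y)
        k = len(clients)
        x = [sum(y[j] for y in ys) / k for j in range(dim)]
        c_glob = [c_glob[j] + sum(dl[j] for dl in deltas) / k for j in range(dim)]
    return x


# Two heterogeneous clients; the global minimizer in each coordinate is
# (1 * 0 + 4 * 1) / (1 + 4) = 0.8.
clients = [([1.0, 1.0], [0.0, 0.0]), ([4.0, 4.0], [1.0, 1.0])]
x_final = run(clients, mask=[False, True])
```

In this toy, the corrected coordinate converges to 0.8, while the uncorrected one settles near 0.78, because several local steps on clients with different curvatures bias the FedAvg fixed point.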

Optimizer · Learning · Functional · Scenario · Surrogate function ·

June 8, 2023

Target-based Surrogates for Stochastic Optimization

Jonathan Wilder Lavington, Sharan Vaswani, Reza Babanezhad, Mark Schmidt, Nicolas Le Roux

We consider minimizing functions for which it is expensive to compute the (possibly stochastic) gradient. Such functions are prevalent in reinforcement learning, imitation learning and adversarial training. Our target optimization framework uses the (expensive) gradient computation to construct surrogate functions in a \emph{target space} (e.g. the logits output by a linear model for classification) that can be minimized efficiently. This allows for multiple parameter updates to the model, amortizing the cost of gradient computation. In the full-batch setting, we prove that our surrogate is a global upper-bound on the loss, and can be (locally) minimized using a black-box optimization algorithm. We prove that the resulting majorization-minimization algorithm ensures convergence to a stationary point of the loss. Next, we instantiate our framework in the stochastic setting and propose the $SSO$ algorithm, which can be viewed as projected stochastic gradient descent in the target space. This connection enables us to prove theoretical guarantees for $SSO$ when minimizing convex functions. Our framework allows the use of standard stochastic optimization algorithms to construct surrogates which can be minimized by any deterministic optimization method. To evaluate our framework, we consider a suite of supervised learning and imitation learning problems. Our experiments indicate the benefits of target optimization and the effectiveness of $SSO$.
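
A full-batch sketch of the majorization-minimization idea, assuming a linear model $z = X\theta$, logistic loss (its curvature in $z$ is at most 1/4, so $\eta \le 4$ makes the surrogate a global upper bound), and plain gradient descent as the inner "black-box" minimizer. It is not the $SSO$ algorithm, and all names are hypothetical. Here the target-space gradient is cheap, so the sketch only shows the structure: one gradient per outer step, several parameter updates per surrogate.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def sigmoid(u: float) -> float:
    # Numerically stable logistic function.
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)


def logistic_loss(z: List[float], y: List[int]) -> float:
    # sum_i log(1 + exp(-y_i z_i)), computed stably.
    total = 0.0
    for zi, yi in zip(z, y):
        m = -yi * zi
        total += max(m, 0.0) + math.log1p(math.exp(-abs(m)))
    return total


def matvec(X: Matrix, theta: List[float]) -> List[float]:
    return [sum(xij * tj for xij, tj in zip(row, theta)) for row in X]


def target_mm(X: Matrix, y: List[int], theta: List[float], eta: float = 4.0,
              outer: int = 20, inner: int = 10) -> Tuple[List[float], List[float]]:
    """Majorize-minimize in target space z = X theta (logistic loss, eta <= 4)."""
    step = eta / sum(v * v for row in X for v in row)  # <= 1 / smoothness of inner problem
    history = [logistic_loss(matvec(X, theta), y)]
    for _ in range(outer):
        z_t = matvec(X, theta)
        # The single gradient per outer step, taken with respect to the targets z.
        g_t = [-yi * sigmoid(-yi * zi) for zi, yi in zip(z_t, y)]
        for _ in range(inner):
            z = matvec(X, theta)
            # Surrogate l(z_t) + g_t.(z - z_t) + ||z - z_t||^2 / (2 eta); its z-gradient:
            r = [gi + (zi - zti) / eta for gi, zi, zti in zip(g_t, z, z_t)]
            grad = [sum(X[i][j] * r[i] for i in range(len(X))) for j in range(len(theta))]
            theta = [tj - step * gj for tj, gj in zip(theta, grad)]
        history.append(logistic_loss(matvec(X, theta), y))
    return theta, history


X = [[1.0, 0.5], [1.0, -1.0], [1.0, 2.0], [1.0, -0.3]]
y = [1, -1, 1, -1]
theta, hist = target_mm(X, y, [0.0, 0.0])
```

Because each surrogate upper-bounds the loss and touches it at the current targets, the recorded loss in `hist` should never increase from one outer step to the next.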

Approximation · Learning · Random variables · Monte Carlo · Stochastic gradient descent ·

June 8, 2023

Learning the random variables in Monte Carlo simulations with stochastic gradient descent: Machine learning for parametric PDEs and financial derivative pricing

Sebastian Becker, Arnulf Jentzen, Marvin S. Müller, Philippe von Wurstemberger

from arXiv, 71 pages, 4 Figures, 14 Tables; to appear in Math. Finance

In financial engineering, prices of financial products are computed approximately many times each trading day, with (slightly) different parameters in each calculation. In many financial models such prices can be approximated by means of Monte Carlo (MC) simulations. To obtain a good approximation, the MC sample size usually needs to be considerably large, resulting in a long computing time for a single approximation. In this paper we introduce a new approximation strategy for parametric approximation problems, including the parametric financial pricing problems described above. A central aspect of the proposed strategy is to combine MC algorithms with machine learning techniques to, roughly speaking, learn the random variables (LRV) in MC simulations. In other words, we employ stochastic gradient descent (SGD) optimization methods not to train parameters of standard artificial neural networks (ANNs) but to learn random variables appearing in MC approximations. We numerically test the LRV strategy on various parametric problems with convincing results when compared with standard MC simulations, Quasi-Monte Carlo simulations, SGD-trained shallow ANNs, and SGD-trained deep ANNs. Our numerical simulations strongly indicate that the LRV strategy might be capable of overcoming the curse of dimensionality in the $L^\infty$-norm in several cases where the standard deep learning approach has been proven unable to do so. This does not contradict the lower bounds established in the scientific literature, because the LRV strategy lies outside the class of algorithms for which those lower bounds have been established. The proposed LRV strategy is general: it is not restricted to the parametric financial pricing problems described above but applies to a large class of approximation problems.

Estimation/Estimator · Models · Linear regression · Sample · Linear ·

June 8, 2023

Estimation of Poverty Measures for Small Areas Under a Two-Fold Nested Error Linear Regression Model: Comparison of Two Methods

Maryam Sohrabi, J. N. K. Rao

Demand for reliable statistics at a local area (small area) level has greatly increased in recent years. Traditional area-specific estimators based on probability samples are not adequate because of the small, or even zero, sample size in a local area. As a result, methods based on models linking the areas are widely used. The World Bank has focused on estimating poverty measures, in particular poverty incidence and poverty gap (the so-called FGT measures), using a simulated census method, called ELL, based on a one-fold nested error model for a suitable transformation of the welfare variable. Modified ELL methods leading to significant gains in efficiency over ELL have also been proposed under the one-fold model. An advantage of ELL and modified ELL methods is that distributional assumptions on the random effects in the model are not needed. In this paper, we extend ELL and modified ELL to two-fold nested error models to estimate poverty indicators for areas (say, a state) and subareas (say, counties within a state). Our simulation results indicate that the modified ELL estimators lead to large efficiency gains over ELL at both the area and subarea levels. Further, the modified ELL method retaining both the area and subarea estimated effects in the model (called MELL2) performs significantly better in terms of mean squared error (MSE) for sampled subareas than the modified ELL method retaining only the estimated area effect (called MELL1).
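
The FGT measures have a standard closed form: for welfare values $y_1, \dots, y_N$ and poverty line $z$, $F_\alpha = \frac{1}{N}\sum_{i:\, y_i < z} \left(\frac{z - y_i}{z}\right)^\alpha$, with $\alpha = 0$ giving poverty incidence and $\alpha = 1$ the poverty gap. The sketch below computes them for one population; the ELL and modified ELL estimators would instead average such values over censuses simulated from the fitted nested error model, which is not shown. The function name `fgt` is hypothetical.

```python
from typing import Sequence


def fgt(welfare: Sequence[float], z: float, alpha: int) -> float:
    """Foster-Greer-Thorbecke index: mean of ((z - y) / z) ** alpha over units with y < z."""
    total = 0.0
    for y in welfare:
        if y < z:
            total += ((z - y) / z) ** alpha  # alpha = 0 just counts the poor
    return total / len(welfare)


incomes = [50.0, 80.0, 120.0, 200.0]
incidence = fgt(incomes, z=100.0, alpha=0)  # 2 of 4 units are poor
gap = fgt(incomes, z=100.0, alpha=1)        # (0.5 + 0.2) / 4
```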

Differentiable functions · Functional · Approximation · Optimizer · Analysis ·

June 8, 2023

New error bounds for Legendre approximations of differentiable functions

Haiyong Wang

from arXiv, Some typos in the first version have been corrected

In this paper we present a new perspective on error analysis of Legendre approximations for differentiable functions. We start by introducing a sequence of Legendre-Gauss-Lobatto polynomials and proving their theoretical properties, such as an explicit and optimal upper bound. We then apply these properties to derive a new and explicit bound for the Legendre coefficients of differentiable functions and establish some explicit and optimal error bounds for Legendre projections in the $L^2$ and $L^{\infty}$ norms. Illustrative examples are provided to demonstrate the sharpness of our new results.
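
To make "Legendre coefficients" and "Legendre projection" concrete, here is a minimal sketch that computes $a_k = \frac{2k+1}{2}\int_{-1}^{1} f(x)P_k(x)\,dx$ by composite Simpson quadrature and measures the maximum projection error on a grid. The test function $f(x) = |x|$ (limited regularity at 0), the quadrature, and the function names are assumptions for illustration, not the paper's examples.

```python
from typing import Callable, List


def legendre_all(n: int, x: float) -> List[float]:
    """[P_0(x), ..., P_n(x)] via (k + 1) P_{k+1} = (2k + 1) x P_k - k P_{k-1}."""
    p = [1.0, x]
    for k in range(1, n):
        p.append(((2 * k + 1) * x * p[k] - k * p[k - 1]) / (k + 1))
    return p[: n + 1]


def legendre_coeffs(f: Callable[[float], float], n: int, m: int = 20000) -> List[float]:
    """a_k = (2k + 1) / 2 * integral of f * P_k over [-1, 1], composite Simpson, m even."""
    h = 2.0 / m
    acc = [0.0] * (n + 1)
    for i in range(m + 1):
        x = -1.0 + i * h
        w = 1.0 if i in (0, m) else (4.0 if i % 2 else 2.0)
        fx = f(x)
        for k, pk in enumerate(legendre_all(n, x)):
            acc[k] += w * fx * pk
    return [(2 * k + 1) / 2.0 * s * h / 3.0 for k, s in enumerate(acc)]


def max_projection_error(f: Callable[[float], float], coeffs: List[float],
                         grid: int = 2001) -> float:
    """Max of |f - sum_k a_k P_k| on a uniform grid, approximating the L-infinity error."""
    err = 0.0
    for i in range(grid):
        x = -1.0 + 2.0 * i / (grid - 1)
        approx = sum(a * p for a, p in zip(coeffs, legendre_all(len(coeffs) - 1, x)))
        err = max(err, abs(f(x) - approx))
    return err


coeffs = legendre_coeffs(abs, 32)             # a_0 = 1/2, a_2 = 5/8, a_4 = -3/16, ...
err_4 = max_projection_error(abs, coeffs[:5])
err_32 = max_projection_error(abs, coeffs)
```

Because the mesh places a node at the kink $x = 0$, Simpson's rule is exact for the low-degree coefficients listed in the comment.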

Estimation/Estimator · Controller · Performer · Scenario · Statistic ·

June 7, 2023

$K$-Nearest-Neighbor Resampling for Off-Policy Evaluation in Stochastic Control

Michael Giegrich, Roel Oomen, Christoph Reisinger

We propose a novel $K$-nearest neighbor resampling procedure for estimating the performance of a policy from historical data containing realized episodes of a decision process generated under a different policy. We focus on feedback policies that depend deterministically on the current state in environments with continuous state-action spaces and system-inherent stochasticity effected by chosen actions. Such settings are common in a wide range of high-stakes applications and are actively investigated in the context of stochastic control. Our procedure exploits the fact that similar state/action pairs (in a metric sense) are associated with similar rewards and state transitions. This enables our resampling procedure to tackle the counterfactual estimation problem underlying off-policy evaluation (OPE) by simulating trajectories similarly to Monte Carlo methods. Compared to other OPE methods, our algorithm does not require optimization, can be efficiently implemented via tree-based nearest neighbor search and parallelization, and does not explicitly assume a parametric model for the environment's dynamics. These properties make the proposed resampling algorithm particularly useful for stochastic control environments. We prove that our method is statistically consistent in estimating the performance of a policy in the OPE setting under weak assumptions and for data sets containing entire episodes rather than independent transitions. To establish consistency, we generalize Stone's Theorem, a well-known result in nonparametric statistics on local averaging, to include episodic data and the counterfactual estimation underlying OPE. Numerical experiments demonstrate the effectiveness of the algorithm in a variety of stochastic control settings, including a linear quadratic regulator, trade execution in limit order books, and online stochastic bin packing.
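
A minimal sketch of the resampling idea under simplifying assumptions of our own: scalar states and actions, Euclidean distance on (state, action), brute-force neighbour search instead of a tree, and reuse of the sampled neighbour's reward and next state as-is. The function `knn_ope` and its parameters are hypothetical, not the paper's procedure.

```python
import math
import random
from typing import Callable, List, Tuple

Transition = Tuple[float, float, float, float]  # (state, action, reward, next_state)


def knn_ope(data: List[Transition], policy: Callable[[float], float],
            s0: float, horizon: int, k: int = 5, n_traj: int = 200,
            seed: int = 0) -> float:
    """Average simulated return of `policy` from s0, built by K-NN resampling of data."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_traj):
        s, ret = s0, 0.0
        for _ in range(horizon):
            a = policy(s)
            # The K logged transitions whose (state, action) is closest to (s, a).
            nearest = sorted(data, key=lambda tr: math.hypot(tr[0] - s, tr[1] - a))[:k]
            _, _, r, s_next = rng.choice(nearest)  # resample one neighbour uniformly
            ret += r
            s = s_next
        total += ret
    return total / n_traj


# Logged data from a deterministic toy system: s' = s + a, reward -|s|.
data = [(float(s), float(a), -abs(float(s)), float(s + a))
        for s in range(-10, 11) for a in (-1, 0, 1)]


def policy(s: float) -> float:
    return -1.0 if s > 0 else (1.0 if s < 0 else 0.0)  # move toward 0


value = knn_ope(data, policy, s0=3.0, horizon=5, k=1, n_traj=3)
```

With $K = 1$ and exact matches in the log, each simulated episode reproduces the true trajectory 3, 2, 1, 0, 0, so `value` equals the true return of -6; larger $K$ trades this exactness for smoothing over neighbouring transitions.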

Analysis · Functional · Estimation/Estimator · Mean · Sparse ·

June 7, 2023

From dense to sparse design: Optimal rates under the supremum norm for estimating the mean function in functional data analysis

Max Berger, Philipp Hermann, Hajo Holzmann

In the setting of functional data analysis, we derive optimal rates of convergence in the supremum norm for estimating the Hölder-smooth mean function of a stochastic process which is repeatedly and discretely observed at fixed, multivariate, synchronous design points and with additional errors. Similarly to the rates in $L_2$ obtained in Cai and Yuan (2011), for sparse design a discretization term dominates, while in the dense case the $\sqrt n$ rate can be achieved as if the $n$ processes were continuously observed without errors. However, our analysis differs in several respects from Cai and Yuan (2011). First, we do not assume that the paths of the processes are as smooth as the mean, but still obtain the $\sqrt n$ rate of convergence without additional logarithmic factors in the dense setting. Second, we show that in the supremum norm, there is an intermediate regime between the sparse and dense cases dominated by the contribution of the observation errors. Third, and in contrast to the analysis in $L_2$, interpolation estimators turn out to be sub-optimal in $L_\infty$ in the dense setting, which explains their poor empirical performance. We also obtain a central limit theorem in the supremum norm and discuss the selection of the bandwidth. Simulations and real data applications illustrate the results.
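
For readers new to the setting, a common estimator of the mean function pools all observations from all curves and applies a local linear smoother with bandwidth $h$. The sketch below assumes a univariate design, the Epanechnikov kernel, and hypothetical names; it is not necessarily the estimator analyzed in the paper.

```python
from typing import List, Sequence


def epanechnikov(u: float) -> float:
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def local_linear_mean(t: Sequence[float], y: Sequence[float], x0: float, h: float) -> float:
    """Local linear estimate of the mean function at x0 from pooled pairs (t_k, y_k)."""
    s0 = s1 = s2 = m0 = m1 = 0.0
    for tk, yk in zip(t, y):
        d = tk - x0
        w = epanechnikov(d / h)
        s0 += w
        s1 += w * d
        s2 += w * d * d
        m0 += w * yk
        m1 += w * d * yk
    # Intercept of the weighted least-squares line y ~ beta0 + beta1 * (t - x0).
    return (s2 * m0 - s1 * m1) / (s0 * s2 - s1 * s1)


# Synchronous design: every curve is observed at the same 21 points.
design = [j / 20 for j in range(21)]
shifts = [-1.0, 0.0, 1.0]  # hypothetical curve-specific deviations from the mean
t_pooled: List[float] = [tj for _ in shifts for tj in design]
y_pooled: List[float] = [2.0 + 3.0 * tj + s for s in shifts for tj in design]
estimate = local_linear_mean(t_pooled, y_pooled, x0=0.37, h=0.2)
```

With a linear mean $2 + 3t$ and curve deviations that cancel across curves, the pooled local linear fit returns $2 + 3 \cdot 0.37 = 3.11$ exactly, since local linear smoothers reproduce straight lines.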

End-to-end · Learning · Optimization · Optimizer · Empirical risk minimization ·

June 7, 2023

End-to-End Learning for Stochastic Optimization: A Bayesian Perspective

Yves Rychener, Daniel Kuhn, Tobias Sutter

from arXiv, Accepted at ICML 2023

We develop a principled approach to end-to-end learning in stochastic optimization. First, we show that the standard end-to-end learning algorithm admits a Bayesian interpretation and trains a posterior Bayes action map. Building on the insights of this analysis, we then propose new end-to-end learning algorithms for training decision maps that output solutions of empirical risk minimization and distributionally robust optimization problems, two dominant modeling paradigms in optimization under uncertainty. Numerical results for a synthetic newsvendor problem illustrate the key differences between alternative training schemes. We also investigate an economic dispatch problem based on real data to showcase the impact of the neural network architecture of the decision maps on their test performance.
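
As a concrete reference for the newsvendor experiments, the sketch below shows the empirical risk minimization subproblem whose solution such a decision map would output: given demand samples, the ERM order quantity is an empirical quantile at the critical ratio $(p - c)/p$. It assumes a unit selling price $p$, unit cost $c$, and no salvage value; the function names are hypothetical, and the end-to-end training schemes compared in the paper are not shown.

```python
import math
from typing import Sequence


def newsvendor_cost(q: float, demands: Sequence[float], price: float, cost: float) -> float:
    """Empirical cost: pay `cost` per unit ordered, earn `price` per unit sold."""
    return sum(cost * q - price * min(q, d) for d in demands) / len(demands)


def erm_order(demands: Sequence[float], price: float, cost: float) -> float:
    """The piecewise-linear empirical cost attains its minimum at a sample point."""
    return min(sorted(demands), key=lambda q: newsvendor_cost(q, demands, price, cost))


def quantile_order(demands: Sequence[float], price: float, cost: float) -> float:
    """Closed form: smallest sample whose empirical CDF is >= (price - cost) / price."""
    ratio = (price - cost) / price
    s = sorted(demands)
    return s[math.ceil(ratio * len(s)) - 1]


demands = [float(d) for d in range(1, 11)]
q_erm = erm_order(demands, price=3.0, cost=1.0)
q_closed = quantile_order(demands, price=3.0, cost=1.0)
```

Both routes give an order of 7 units here, because the critical ratio 2/3 first falls below the empirical CDF at that sample.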

Class · Optimizer · Functional · Performer · Steepest descent ·

June 6, 2023

Complexity of a Class of First-Order Objective-Function-Free Optimization Algorithms

S. Gratton, S. Jerad, Ph. L. Toint

A parametric class of trust-region algorithms for unconstrained nonconvex optimization is considered where the value of the objective function is never computed. The class contains a deterministic version of the first-order Adagrad method typically used for minimization of noisy functions, but also allows the use of (possibly approximate) second-order information when available. The rate of convergence of methods in the class is analyzed and is shown to be identical to that known for first-order optimization methods using both function and gradient values, recovering existing results for purely first-order variants and improving the explicit dependence on problem dimension. This rate is shown to be essentially sharp. A new class of methods is also presented, for which a slightly worse and essentially sharp complexity result holds. Limited numerical experiments show that the new methods' performance may be comparable to that of standard steepest descent, despite using significantly less information, and that this performance is relatively insensitive to noise.
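
A minimal sketch of one member of such a class: deterministic coordinatewise AdaGrad, which evaluates only gradients and never the objective value. The parameters `alpha` and `varsigma` are our own names with hypothetical defaults; the trust-region interpretation and the second-order variants discussed in the paper are not shown.

```python
import math
from typing import Callable, List


def adagrad_offo(grad: Callable[[List[float]], List[float]], x0: List[float],
                 alpha: float = 1.0, varsigma: float = 1.0, iters: int = 100) -> List[float]:
    """Deterministic coordinatewise AdaGrad: uses gradients only, never f(x)."""
    x = list(x0)
    acc = [varsigma] * len(x)  # varsigma + running sum of squared gradients (incl. current)
    for _ in range(iters):
        g = grad(x)
        for j, gj in enumerate(g):
            acc[j] += gj * gj
            x[j] -= alpha * gj / math.sqrt(acc[j])
    return x


# Example: f(x) = 0.5 * (x_0^2 + 10 x_1^2), supplied only through its gradient.
d = [1.0, 10.0]


def grad(x: List[float]) -> List[float]:
    return [di * xi for di, xi in zip(d, x)]


x_end = adagrad_offo(grad, [1.0, 1.0])
```

Since the accumulated squared gradients only grow, each coordinate's effective step never exceeds its first one, which keeps this ill-conditioned quadratic example stable without any function-value test.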