
This paper delves into stochastic optimization problems that involve Markovian noise. We present a unified approach for the theoretical analysis of first-order gradient methods for stochastic optimization and variational inequalities. Our approach covers scenarios for both non-convex and strongly convex minimization problems. To achieve an optimal (linear) dependence on the mixing time of the underlying noise sequence, we use the randomized batching scheme, which is based on the multilevel Monte Carlo method. Moreover, our technique allows us to eliminate the limiting assumptions of previous research on Markov noise, such as the need for a bounded domain and uniformly bounded stochastic gradients. Our extension to variational inequalities under Markovian noise is original. Additionally, we provide lower bounds that match the oracle complexity of our method in the case of strongly convex optimization problems.
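The randomized batching idea mentioned above can be illustrated with a generic multilevel Monte Carlo (MLMC) gradient estimator. The following is a minimal sketch under stated assumptions, not the paper's exact scheme: the names `mlmc_gradient`, `grad`, `next_sample`, and `max_level` are hypothetical, gradients are scalars for simplicity, and the level is drawn from a geometric distribution with parameter 1/2.

```python
import random


def mlmc_gradient(grad, x, next_sample, state, rng, max_level=6):
    """Randomized-batch (MLMC-style) gradient estimate at point x.

    grad(x, z)          -> stochastic gradient (a float here) for noise sample z
    next_sample(s, rng) -> next state of the noise Markov chain (hypothetical helper)
    Returns (estimate, final_state, samples_used).
    """
    # Draw the level J with P(J = j) = 2^{-j} for j = 1, 2, ...
    level = 1
    while rng.random() < 0.5:
        level += 1

    # Use 2^J consecutive chain samples, or a single one if J is too large.
    n = 2 ** level if level <= max_level else 1
    grads = []
    for _ in range(n):
        state = next_sample(state, rng)
        grads.append(grad(x, state))

    base = grads[0]  # level-0 estimate: a single sample
    if level > max_level:
        return base, state, n

    fine = sum(grads) / n                     # average over 2^J samples
    coarse = sum(grads[: n // 2]) / (n // 2)  # average over the first 2^{J-1}
    # Telescoping correction, reweighted by 1 / P(J = level) = 2^J.
    return base + (2 ** level) * (fine - coarse), state, n
```

In expectation the corrections telescope, so the estimator has the bias of a batch of size `2 ** max_level` while the expected number of samples per call grows only linearly in `max_level`.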

Related content

In uncertainty quantification, variance-based global sensitivity analysis quantitatively determines the effect of each input random variable on the output by partitioning the total output variance into contributions from each input. However, computing conditional expectations can be prohibitively costly when working with expensive-to-evaluate models. Surrogate models can accelerate this, yet their accuracy depends on the quality and quantity of training data, which is expensive to generate (experimentally or computationally) for complex engineering systems. Thus, methods that work with limited data are desirable. We propose a diffeomorphic modulation under observable response preserving homotopy (D-MORPH) regression to train a polynomial dimensional decomposition surrogate of the output that minimizes the number of training data. The new method first computes a sparse Lasso solution and uses it to define the cost function. A subsequent D-MORPH regression minimizes the difference between the D-MORPH and Lasso solution. The resulting D-MORPH surrogate is more robust to input variations and more accurate with limited training data. We illustrate the accuracy and computational efficiency of the new surrogate for global sensitivity analysis using mathematical functions and an expensive-to-simulate model of char combustion. The new method is highly efficient, requiring only 15% of the training data compared to conventional regression.

Despite widespread adoption in practice, guarantees for the LASSO and Group LASSO are strikingly lacking in settings beyond statistical problems, and these algorithms are usually considered to be a heuristic in the context of sparse convex optimization on deterministic inputs. We give the first recovery guarantees for the Group LASSO for sparse convex optimization with vector-valued features. We show that if a sufficiently large Group LASSO regularization is applied when minimizing a strictly convex function $l$, then the minimizer is a sparse vector supported on vector-valued features with the largest $\ell_2$ norm of the gradient. Thus, repeating this procedure selects the same set of features as the Orthogonal Matching Pursuit algorithm, which admits recovery guarantees for any function $l$ with restricted strong convexity and smoothness via weak submodularity arguments. This answers open questions of Tibshirani et al. and Yasuda et al. Our result is the first to theoretically explain the empirical success of the Group LASSO for convex functions under general input instances assuming only restricted strong convexity and smoothness. Our result also generalizes provable guarantees for the Sequential Attention algorithm, which is a feature selection algorithm inspired by the attention mechanism proposed by Yasuda et al. As an application of our result, we give new results for the column subset selection problem, which is well-studied when the loss is the Frobenius norm or other entrywise matrix losses. We give the first result for general loss functions for this problem that requires only restricted strong convexity and smoothness.
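The selection rule described in this abstract (pick the vector-valued feature whose gradient block has the largest $\ell_2$ norm, as in one step of Orthogonal Matching Pursuit) can be sketched as follows. This is only an illustration of that rule, not the authors' algorithm; the function name `select_group` and the representation of groups as lists of coordinate indices are assumptions.

```python
import math


def select_group(gradient, groups):
    """Return the index of the group whose gradient block has the largest l2 norm.

    gradient: list of floats, the gradient of the loss at the current point
    groups:   list of index lists, one per vector-valued feature (hypothetical layout)
    """
    def block_norm(indices):
        return math.sqrt(sum(gradient[i] ** 2 for i in indices))

    return max(range(len(groups)), key=lambda g: block_norm(groups[g]))


# Block norms are 5, 1, and 6, so the third group is selected.
gradient = [3.0, 4.0, 1.0, 0.0, -6.0, 0.0]
groups = [[0, 1], [2, 3], [4, 5]]
chosen = select_group(gradient, groups)
```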

Forecasting water content dynamics in heterogeneous porous media is of significant interest in hydrological applications; in particular, infiltration in the presence of cracks and fractures can be treated by resorting to peridynamic theory, which allows proper modeling of nonlocalities in space. In this framework, we apply a Chebyshev transform to the diffusive component of the equation and then integrate forward in time using an explicit method. We prove that the proposed spectral numerical scheme provides a solution converging to the unique solution in some appropriate Sobolev space. We finally illustrate the scheme on several different soils, also considering a sink term representing the root water uptake.

Topological data analysis (TDA) has emerged as a powerful tool for extracting meaningful insights from complex data. TDA enhances the analysis of objects by embedding them into a simplicial complex and extracting useful global properties such as the Betti numbers, i.e., the number of multidimensional holes, which can be used to define kernel methods that are easily integrated with existing machine-learning algorithms. These kernel methods have found broad applications, as they rely on powerful mathematical frameworks which provide theoretical guarantees on their performance. However, the computation of higher-dimensional Betti numbers can be prohibitively expensive on classical hardware, while quantum algorithms can approximate them in polynomial time in the instance size. In this work, we propose a quantum approach to defining topological kernels, which is based on constructing Betti curves, i.e., topological fingerprints of filtrations with increasing order. We exhibit a working prototype of our approach implemented on a noiseless simulator and show its robustness by means of some empirical results suggesting that topological approaches may offer an advantage in quantum machine learning.

In this paper, we consider the decentralized, stochastic nonconvex strongly-concave (NCSC) minimax problem with nonsmooth regularization terms on both primal and dual variables, wherein a network of $m$ computing agents collaborate via peer-to-peer communications. We consider when the coupling function is in expectation or finite-sum form and the double regularizers are convex functions, applied separately to the primal and dual variables. Our algorithmic framework introduces a Lagrangian multiplier to eliminate the consensus constraint on the dual variable. Coupling this with variance-reduction (VR) techniques, our proposed method, entitled VRLM, by a single neighbor communication per iteration, is able to achieve an $\mathcal{O}(\kappa^3\varepsilon^{-3})$ sample complexity under the general stochastic setting, with either a big-batch or small-batch VR option, where $\kappa$ is the condition number of the problem and $\varepsilon$ is the desired solution accuracy. With a big-batch VR, we can additionally achieve $\mathcal{O}(\kappa^2\varepsilon^{-2})$ communication complexity. Under the special finite-sum setting, our method with a big-batch VR can achieve an $\mathcal{O}(n + \sqrt{n} \kappa^2\varepsilon^{-2})$ sample complexity and $\mathcal{O}(\kappa^2\varepsilon^{-2})$ communication complexity, where $n$ is the number of components in the finite sum. All complexity results match the best-known results achieved by a few existing methods for solving special cases of the problem we consider. To the best of our knowledge, this is the first work which provides convergence guarantees for NCSC minimax problems with general convex nonsmooth regularizers applied to both the primal and dual variables in the decentralized stochastic setting. Numerical experiments are conducted on two machine learning problems. Our code is downloadable from //github.com/RPI-OPT/VRLM.

The absence of timing information about the microphones' recording start times and the sources' emission times presents a challenge in several applications, including joint microphone and source localization. Compared with traditional optimization methods that try to estimate the unknown timing information directly, the low rank property (LRP) provides an additional low-rank structure that yields a linear constraint on the unknown timing information, enabling globally optimal solutions for the unknown timing information given a suitable initialization. However, the initialization of the unknown timing information is random, which can lead to local minima in its estimation. In this paper, we propose a combined low rank approximation method to alleviate the effect of random initialization on the estimation of unknown timing information. We define three new variants of LRP, supported by proofs, that allow the estimation of unknown timing information to benefit from more low-rank structure information. Then, by utilizing the low-rank structure information from both LRP and the proposed variants of LRP, four linear constraints on the unknown timing information are presented. Finally, we use the proposed combined low rank approximation algorithm to obtain globally optimal solutions of the unknown timing information through the four available linear constraints. Experimental results demonstrate superior performance of our method compared to state-of-the-art approaches in terms of recovery rate (the number of successful initializations for any configuration), convergence rate (the number of successfully recovered configurations), and estimation errors of the unknown timing information.

We consider stochastic optimization problems where data is drawn from a Markov chain. Existing methods for this setting crucially rely on knowing the mixing time of the chain, which in real-world applications is usually unknown. We propose the first optimization method that does not require the knowledge of the mixing time, yet obtains the optimal asymptotic convergence rate when applied to convex problems. We further show that our approach can be extended to: (i) finding stationary points in non-convex optimization with Markovian data, and (ii) obtaining better dependence on the mixing time in temporal difference (TD) learning; in both cases, our method is completely oblivious to the mixing time. Our method relies on a novel combination of multi-level Monte Carlo (MLMC) gradient estimation together with an adaptive learning method.

We consider a class of stochastic smooth convex optimization problems under rather general assumptions on the noise in the stochastic gradient observation. As opposed to the classical problem setting in which the variance of noise is assumed to be uniformly bounded, herein we assume that the variance of stochastic gradients is related to the "sub-optimality" of the approximate solutions delivered by the algorithm. Such problems naturally arise in a variety of applications, in particular, in the well-known generalized linear regression problem in statistics. However, to the best of our knowledge, none of the existing stochastic approximation algorithms for solving this class of problems attain optimality in terms of the dependence on accuracy, problem parameters, and mini-batch size. We discuss two non-Euclidean accelerated stochastic approximation routines--stochastic accelerated gradient descent (SAGD) and stochastic gradient extrapolation (SGE)--which carry a particular duality relationship. We show that both SAGD and SGE, under appropriate conditions, achieve the optimal convergence rate, attaining the optimal iteration and sample complexities simultaneously. However, corresponding assumptions for the SGE algorithm are more general; they allow, for instance, for efficient application of the SGE to statistical estimation problems under heavy tail noises and discontinuous score functions. We also discuss the application of the SGE to problems satisfying quadratic growth conditions, and show how it can be used to recover sparse solutions. Finally, we report on some simulation experiments to illustrate numerical performance of our proposed algorithms in high-dimensional settings.

Several distributions and families of distributions are proposed to model skewed data, think, e.g., of skew-normal and related distributions. Lambert W random variables offer an alternative approach where, instead of constructing a new distribution, a certain transform is proposed (Goerg, 2011). Such an approach allows the construction of a Lambert W skewed version from any distribution. We choose Lambert W normal distribution as a natural starting point and also include Lambert W exponential distribution due to the simplicity and shape of the exponential distribution, which, after skewing, may produce a reasonably heavy tail for loss models. In the theoretical part, we focus on the mathematical properties of obtained distributions, including the range of skewness. In the practical part, the suitability of corresponding Lambert W transformed distributions is evaluated on real insurance data. The results are compared with those obtained using common loss distributions.
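The skewing transform can be made concrete with a small numerical sketch. It assumes the location-scale form of the Lambert W transform from Goerg (2011), $Y = U \exp(\gamma U)\,\sigma + \mu$ with $U = (X - \mu)/\sigma$; the function names `lambert_w_skew` and `sample_skewness`, the choice $\gamma = 0.2$, and the sample size are illustrative assumptions, not taken from the paper.

```python
import math
import random


def lambert_w_skew(x, gamma, mu=0.0, sigma=1.0):
    """Apply the skewing transform Y = U * exp(gamma * U) * sigma + mu."""
    u = (x - mu) / sigma
    return u * math.exp(gamma * u) * sigma + mu


def sample_skewness(values):
    """Moment-based sample skewness m3 / m2^(3/2)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


rng = random.Random(0)
normal_draws = [rng.gauss(0.0, 1.0) for _ in range(20000)]
skewed_draws = [lambert_w_skew(x, gamma=0.2) for x in normal_draws]
```

With `gamma = 0` the transform is the identity, and a positive `gamma` stretches the right tail, so `sample_skewness(skewed_draws)` comes out larger than `sample_skewness(normal_draws)`.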

In this work we propose a two-step alternative clearing method for day-ahead electricity markets. In the first step, using the aggregation of bids, an approximate clearing is performed, and based on the outcome of this problem, estimates for the clearing prices of individual periods are derived. These assumptions regarding the range of clearing prices explicitly determine the acceptance indicators for a subset of the original bids. In the second step, another round of clearing is performed to determine the acceptance indicators of the remaining bids and the market clearing prices. We show that the bid-aggregation based method may result in a suboptimal solution or in an infeasible problem in the second step, but we also point out that these pitfalls of the algorithm may be avoided if a different aggregation pattern is used. We propose to define multiple different aggregation patterns, and to use parallel computing to enhance the performance of the algorithm. We test the proposed approach on setups of various problem sizes, and conclude that in the case of parallel computing with 4 threads a significant gain in computational speed may be achieved, with a high success rate.
