亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

Two-level stochastic optimization formulations have become instrumental in a number of machine learning contexts such as continual learning, neural architecture search, adversarial learning, and hyperparameter tuning. Practical stochastic bilevel optimization problems become challenging in optimization or learning scenarios where the number of variables is high or there are constraints. In this paper, we introduce a bilevel stochastic gradient method for bilevel problems with lower-level constraints. We also present a comprehensive convergence theory that covers all inexact calculations of the adjoint gradient (also called hypergradient) and addresses both the lower-level unconstrained and constrained cases. To promote the use of bilevel optimization in large-scale learning, we introduce a practical bilevel stochastic gradient method (BSG-1) that does not require second-order derivatives and, in the lower-level unconstrained case, dismisses any system solves and matrix-vector products.

相關內容

In this paper, we study fast first-order algorithms that approximately solve linear programs (LPs). More specifically, we apply algorithms from online linear programming to offline LPs and derive algorithms that are free of any matrix multiplication. To further improve the applicability of the proposed methods, we propose a variable-duplication technique that achieves $\mathcal{O}(\sqrt{mn/K})$ optimality gap by copying each variable $K$ times. Moreover, we identify that online algorithms can be efficiently incorporated into a column generation framework for large-scale LPs. Finally, numerical experiments show that our proposed methods can be applied either as an approximate direct solver or as an initialization subroutine in frameworks of exact LP solving.

In this work we consider stochastic gradient descent (SGD) for solving linear inverse problems in Banach spaces. SGD and its variants have been established as one of the most successful optimisation methods in machine learning, imaging and signal processing, etc. At each iteration SGD uses a single datum, or a small subset of data, resulting in highly scalable methods that are very attractive for large-scale inverse problems. Nonetheless, the theoretical analysis of SGD-based approaches for inverse problems has thus far been largely limited to Euclidean and Hilbert spaces. In this work we present a novel convergence analysis of SGD for linear inverse problems in general Banach spaces: we show the almost sure convergence of the iterates to the minimum norm solution and establish the regularising property for suitable a priori stopping criteria. Numerical results are also presented to illustrate features of the approach.

Bilevel optimization enjoys a wide range of applications in hyper-parameter optimization, meta-learning and reinforcement learning. However, bilevel optimization problems are difficult to solve. Recent progress on scalable bilevel algorithms mainly focuses on bilevel optimization problems where the lower-level objective is either strongly convex or unconstrained. In this work, we tackle the bilevel problem through the lens of the penalty method. We show that under certain conditions, the penalty reformulation recovers the solutions of the original bilevel problem. Further, we propose the penalty-based bilevel gradient descent (PBGD) algorithm and establish its finite-time convergence for the constrained bilevel problem without lower-level strong convexity. Experiments showcase the efficiency of the proposed PBGD algorithm.

The complexity and dynamism of microservices pose significant challenges to system reliability, and thereby, automated troubleshooting is crucial. Effective root cause localization after anomaly detection is crucial for ensuring the reliability of microservice systems. However, two significant issues rest in existing approaches: (1) Microservices generate traces, system logs, and key performance indicators (KPIs), but existing approaches usually consider traces only, failing to understand the system fully as traces cannot depict all anomalies; (2) Troubleshooting microservices generally contains two main phases, i.e., anomaly detection and root cause localization. Existing studies regard these two phases as independent, ignoring their close correlation. Even worse, inaccurate detection results can deeply affect localization effectiveness. To overcome these limitations, we propose Eadro, the first end-to-end framework to integrate anomaly detection and root cause localization based on multi-source data for troubleshooting large-scale microservices. The key insights of Eadro are the anomaly manifestations on different data sources and the close connection between detection and localization. Thus, Eadro models intra-service behaviors and inter-service dependencies from traces, logs, and KPIs, all the while leveraging the shared knowledge of the two phases via multi-task learning. Experiments on two widely-used benchmark microservices demonstrate that Eadro outperforms state-of-the-art approaches by a large margin. The results also show the usefulness of integrating multi-source data. We also release our code and data to facilitate future research.

We consider the problem of learning the dynamics of a linear system when one has access to data generated by an auxiliary system that shares similar (but not identical) dynamics, in addition to data from the true system. We use a weighted least squares approach, and provide a finite sample error bound of the learned model as a function of the number of samples and various system parameters from the two systems as well as the weight assigned to the auxiliary data. We show that the auxiliary data can help to reduce the intrinsic system identification error due to noise, at the price of adding a portion of error that is due to the differences between the two system models. We further provide a data-dependent bound that is computable when some prior knowledge about the systems is available. This bound can also be used to determine the weight that should be assigned to the auxiliary data during the model training stage.

Offline reinforcement learning (RL) aims to find an optimal policy for sequential decision-making using a pre-collected dataset, without further interaction with the environment. Recent theoretical progress has focused on developing sample-efficient offline RL algorithms with various relaxed assumptions on data coverage and function approximators, especially to handle the case with excessively large state-action spaces. Among them, the framework based on the linear-programming (LP) reformulation of Markov decision processes has shown promise: it enables sample-efficient offline RL with function approximation, under only partial data coverage and realizability assumptions on the function classes, with favorable computational tractability. In this work, we revisit the LP framework for offline RL, and provide a new reformulation that advances the existing results in several aspects, relaxing certain assumptions and achieving optimal statistical rates in terms of sample size. Our key enabler is to introduce proper constraints in the reformulation, instead of using any regularization as in the literature, also with careful choices of the function classes and initial state distributions. We hope our insights bring into light the use of LP formulations and the induced primal-dual minimax optimization, in offline RL.

The nonconvex formulation of matrix completion problem has received significant attention in recent years due to its affordable complexity compared to the convex formulation. Gradient descent (GD) is the simplest yet efficient baseline algorithm for solving nonconvex optimization problems. The success of GD has been witnessed in many different problems in both theory and practice when it is combined with random initialization. However, previous works on matrix completion require either careful initialization or regularizers to prove the convergence of GD. In this work, we study the rank-1 symmetric matrix completion and prove that GD converges to the ground truth when small random initialization is used. We show that in logarithmic amount of iterations, the trajectory enters the region where local convergence occurs. We provide an upper bound on the initialization size that is sufficient to guarantee the convergence and show that a larger initialization can be used as more samples are available. We observe that implicit regularization effect of GD plays a critical role in the analysis, and for the entire trajectory, it prevents each entry from becoming much larger than the others.

Neural Networks (NNs) have been successfully employed to represent the state evolution of complex dynamical systems. Such models, referred to as NN dynamic models (NNDMs), use iterative noisy predictions of NN to estimate a distribution of system trajectories over time. Despite their accuracy, safety analysis of NNDMs is known to be a challenging problem and remains largely unexplored. To address this issue, in this paper, we introduce a method of providing safety guarantees for NNDMs. Our approach is based on stochastic barrier functions, whose relation with safety are analogous to that of Lyapunov functions with stability. We first show a method of synthesizing stochastic barrier functions for NNDMs via a convex optimization problem, which in turn provides a lower bound on the system's safety probability. A key step in our method is the employment of the recent convex approximation results for NNs to find piece-wise linear bounds, which allow the formulation of the barrier function synthesis problem as a sum-of-squares optimization program. If the obtained safety probability is above the desired threshold, the system is certified. Otherwise, we introduce a method of generating controls for the system that robustly maximizes the safety probability in a minimally-invasive manner. We exploit the convexity property of the barrier function to formulate the optimal control synthesis problem as a linear program. Experimental results illustrate the efficacy of the method. Namely, they show that the method can scale to multi-dimensional NNDMs with multiple layers and hundreds of neurons per layer, and that the controller can significantly improve the safety probability.

This manuscript portrays optimization as a process. In many practical applications the environment is so complex that it is infeasible to lay out a comprehensive theoretical model and use classical algorithmic theory and mathematical optimization. It is necessary as well as beneficial to take a robust approach, by applying an optimization method that learns as one goes along, learning from experience as more aspects of the problem are observed. This view of optimization as a process has become prominent in varied fields and has led to some spectacular success in modeling and systems that are now part of our daily lives.

Non-convex optimization is ubiquitous in modern machine learning. Researchers devise non-convex objective functions and optimize them using off-the-shelf optimizers such as stochastic gradient descent and its variants, which leverage the local geometry and update iteratively. Even though solving non-convex functions is NP-hard in the worst case, the optimization quality in practice is often not an issue -- optimizers are largely believed to find approximate global minima. Researchers hypothesize a unified explanation for this intriguing phenomenon: most of the local minima of the practically-used objectives are approximately global minima. We rigorously formalize it for concrete instances of machine learning problems.

北京阿比特科技有限公司