
Stochastic gradient descent and its variants are the core optimization algorithms that achieve good convergence rates on machine learning problems. These rates are obtained especially when the algorithms are fine-tuned for the application at hand. Although this tuning can be computationally expensive, recent work has shown that the cost can be reduced by line search methods that iteratively adjust the stepsize. We propose an alternative approach to stochastic line search: a new algorithm, SMB, based on forward step model building. The model building step incorporates second-order information that allows adjusting not only the stepsize but also the search direction. Noting that deep learning model parameters come in groups (layers of tensors), our method builds its model and calculates a new step for each parameter group. This novel diagonalization approach makes the selected step lengths adaptive. We provide a convergence rate analysis and show experimentally that the proposed algorithm achieves faster convergence and better generalization on well-known test problems. More precisely, SMB requires less tuning and shows comparable performance to other adaptive methods.

Related content

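Stochastic gradient descent draws m samples from the data-generating distribution and updates the parameters using the average of the gradients computed on those samples.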
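The update described above can be sketched in a few lines of NumPy. This is only a minimal illustration under stated assumptions: the function names `minibatch_sgd` and `least_squares_grad`, the least-squares objective, the synthetic data, and the values chosen for the batch size `m`, the learning rate `lr`, and the number of steps are all hypothetical and do not come from any of the papers listed on this page.

```python
import numpy as np


def least_squares_grad(w, X, y):
    """Average gradient of 0.5 * (x @ w - y)**2 over the rows of X."""
    return X.T @ (X @ w - y) / len(y)


def minibatch_sgd(grad_fn, w0, X, y, m=32, lr=0.1, steps=500, seed=0):
    """Plain minibatch SGD: sample m examples, step along the mean gradient."""
    rng = np.random.default_rng(seed)
    w = np.asarray(w0, dtype=float).copy()
    n = len(y)
    for _ in range(steps):
        # Draw m sample indices uniformly at random (with replacement).
        idx = rng.integers(0, n, size=m)
        # grad_fn returns the gradient averaged over the drawn samples.
        g = grad_fn(w, X[idx], y[idx])
        # Move the parameters against the averaged gradient.
        w -= lr * g
    return w


# Hypothetical noise-free linear regression problem for illustration.
data_rng = np.random.default_rng(1)
X = data_rng.normal(size=(1000, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true

w_hat = minibatch_sgd(least_squares_grad, np.zeros(3), X, y)
print(w_hat)
```

Because the synthetic targets contain no noise, the estimate should approach `w_true`. With noisy data a fixed learning rate would typically leave the iterates fluctuating around the minimizer, which is one reason stepsize tuning matters in practice.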

Penalized logistic regression is extremely useful for binary classification with a large number of covariates (higher than the sample size), and has several real-life applications, including genomic disease classification. However, the existing methods based on the likelihood loss function are sensitive to data contamination and other noise; hence, robust methods are needed for stable and more accurate inference. In this paper, we propose a family of robust estimators for sparse logistic models utilizing the popular density power divergence based loss function and the general adaptively weighted LASSO penalties. We study the local robustness of the proposed estimators through their influence function and also derive their oracle properties and asymptotic distribution. With extensive empirical illustrations, we demonstrate the significantly improved performance of our proposed estimators over the existing ones, with particular gain in robustness. Finally, our proposal is applied to analyse four different real datasets for cancer classification, obtaining robust and accurate models that simultaneously perform gene selection and patient classification.

We study entropy-regularized constrained Markov decision processes (CMDPs) under the soft-max parameterization, in which an agent aims to maximize the entropy-regularized value function while satisfying constraints on the expected total utility. By leveraging the entropy regularization, our theoretical analysis shows that its Lagrangian dual function is smooth and the Lagrangian duality gap can be decomposed into the primal optimality gap and the constraint violation. Furthermore, we propose an accelerated dual-descent method for entropy-regularized CMDPs. We prove that our method achieves the global convergence rate $\widetilde{\mathcal{O}}(1/T)$ for both the optimality gap and the constraint violation for entropy-regularized CMDPs. A discussion about a linear convergence rate for CMDPs with a single constraint is also provided.

Minimax problems have recently attracted considerable research interest. A few efforts have been made to solve decentralized nonconvex strongly-concave (NCSC) minimax-structured optimization; however, all of them focus on smooth problems with at most a constraint on the maximization variable. In this paper, we make the first attempt at solving composite NCSC minimax problems that can have convex nonsmooth terms on both the minimization and maximization variables. Our algorithm is based on a novel reformulation of the decentralized minimax problem that introduces a multiplier to absorb the dual consensus constraint. Removing the dual consensus constraint enables the most aggressive dual update (i.e., local maximization instead of a gradient ascent step), which allows a larger primal stepsize and yields better complexity results. In addition, decoupling the nonsmoothness and the consensus on the dual variable eases the analysis of a decentralized algorithm; thus our reformulation creates a new way for interested researchers to design new (and possibly more efficient) decentralized methods for solving NCSC minimax problems. We show a global convergence result for the proposed algorithm and an iteration complexity result for producing a (near) stationary point of the reformulation. Moreover, a relation is established between the (near) stationarities of the reformulation and the original formulation. With this relation, we show that when the dual regularizer is smooth, our algorithm can have lower complexity results (with reduced dependence on a condition number) than existing ones for producing a near-stationary point of the original formulation. Numerical experiments are conducted on a distributionally robust logistic regression to demonstrate the performance of the proposed algorithm.

Recent advances in learning-based control leverage deep function approximators, such as neural networks, to model the evolution of controlled dynamical systems over time. However, the problem of learning a dynamics model and a stabilizing controller persists, since the synthesis of a stabilizing feedback law for known nonlinear systems is a difficult task, let alone for complex parametric representations that must be fit to data. To this end, we propose Control with Inherent Lyapunov Stability (CoILS), a method for jointly learning parametric representations of a nonlinear dynamics model and a stabilizing controller from data. To do this, our approach simultaneously learns a parametric Lyapunov function which intrinsically constrains the dynamics model to be stabilizable by the learned controller. In addition to the stabilizability of the learned dynamics guaranteed by our novel construction, we show that the learned controller stabilizes the true dynamics under certain assumptions on the fidelity of the learned dynamics. Finally, we demonstrate the efficacy of CoILS on a variety of simulated nonlinear dynamical systems.

During the first part of life, the brain develops while it learns through a process called synaptogenesis. The neurons, growing and interacting with each other, create synapses. However, eventually the brain prunes those synapses. While previous work focused on learning and pruning independently, in this work we propose a biologically plausible model that, thanks to a combination of Hebbian learning and pruning, aims to simulate the synaptogenesis process. In this way, while learning how to solve the task, the agent translates its experience into a particular network structure. Namely, the network structure builds itself during the execution of the task. We call this approach Self-building Neural Network (SBNN). We compare our proposed SBNN with traditional neural networks (NNs) over three classical control tasks from OpenAI. The results show that our model performs generally better than traditional NNs. Moreover, we observe that the performance decay while increasing the pruning rate is smaller in our model than with NNs. Finally, we perform a validation test, testing the models over tasks unseen during the learning phase. In this case, the results show that SBNNs can adapt to new tasks better than the traditional NNs, especially when over $80\%$ of the weights are pruned.

The stochastic approximation (SA) algorithm is a widely used probabilistic method for finding a zero or a fixed point of a vector-valued function, when only noisy measurements of the function are available. In the literature to date, one makes a distinction between "synchronous" updating, whereby every component of the current guess is updated at each time, and "asynchronous" updating, whereby only one component is updated. In this paper, we study an intermediate situation that we call "batch asynchronous stochastic approximation" (BASA), in which, at each time instant, some but not all components of the current estimated solution are updated. BASA allows the user to trade off memory requirements against time complexity. We develop a general methodology for proving that such algorithms converge to the fixed point of the map under study. These convergence proofs make use of weaker hypotheses than existing results. Specifically, existing convergence proofs require that the measurement noise is a zero-mean i.i.d. sequence or a martingale difference sequence. In the present paper, we permit biased measurements, that is, measurement noises that have nonzero conditional mean. Also, all convergence results to date assume that the stochastic step sizes satisfy a probabilistic analog of the well-known Robbins-Monro conditions. We replace this assumption by a purely deterministic condition on the irreducibility of the underlying Markov processes. As specific applications to Reinforcement Learning, we analyze the temporal difference algorithm $TD(\lambda)$ for value iteration, and the $Q$-learning algorithm for finding the optimal action-value function. In both cases, we establish the convergence of these algorithms, under milder conditions than in the existing literature.

Persistent monitoring of a spatiotemporal fluid process requires data sampling and predictive modeling of the process being monitored. In this paper we present the PASST algorithm: Predictive-model based Adaptive Sampling of a Spatio-Temporal process. PASST is an adaptive robotic sampling algorithm that leverages predictive models to efficiently and persistently monitor a fluid process in a given region of interest. Our algorithm uses the predictions from a learned prediction model to plan a path for an autonomous vehicle to adaptively and efficiently survey the region of interest. In turn, the sampled data is used to obtain better predictions by giving an updated initial state to the predictive model. For the predictive model, we use Knowledge-based Neural Ordinary Differential Equations to train models of fluid processes. These models are orders of magnitude smaller in size and run much faster than fluid data obtained from direct numerical simulations of the partial differential equations that describe the fluid processes or other comparable computational fluid models. For path planning, we use reinforcement learning based planning algorithms that use the field predictions as reward functions. We evaluate our adaptive sampling path planning algorithm on both numerically simulated fluid data and real-world nowcast ocean flow data to show that we can sample the spatiotemporal field in the given region of interest for long time horizons. We also evaluate the PASST algorithm's ability to generalize to fluid processes that are not in the training repertoire of the learned models.

Variational inequalities are a formalism that includes games, minimization, saddle point, and equilibrium problems as special cases. Methods for variational inequalities are therefore universal approaches for many applied tasks, including machine learning problems. This work concentrates on the decentralized setting, which is increasingly important but not well understood. In particular, we consider decentralized stochastic (sum-type) variational inequalities over fixed and time-varying networks. We present lower complexity bounds for both communication and local iterations and construct optimal algorithms that match these lower bounds. Our algorithms are the best among the available literature not only in the decentralized stochastic case, but also in the decentralized deterministic and non-distributed stochastic cases. Experimental results confirm the effectiveness of the presented algorithms.

We consider distributed stochastic variational inequalities (VIs) on unbounded domains with the problem data that is heterogeneous (non-IID) and distributed across many devices. We make a very general assumption on the computational network that, in particular, covers the settings of fully decentralized calculations with time-varying networks and centralized topologies commonly used in Federated Learning. Moreover, multiple local updates on the workers can be made for reducing the communication frequency between the workers. We extend the stochastic extragradient method to this very general setting and theoretically analyze its convergence rate in the strongly-monotone, monotone, and non-monotone (when a Minty solution exists) settings. The provided rates explicitly exhibit the dependence on network characteristics (e.g., mixing time), iteration counter, data heterogeneity, variance, number of devices, and other standard parameters. As a special case, our method and analysis apply to distributed stochastic saddle-point problems (SPP), e.g., to the training of Deep Generative Adversarial Networks (GANs) for which decentralized training has been reported to be extremely challenging. In experiments for the decentralized training of GANs we demonstrate the effectiveness of our proposed approach.

Federated Learning (FL) is a novel machine learning framework that enables multiple distributed devices to cooperatively train a shared model, scheduled by a central server, while keeping private data local. However, non-independent-and-identically-distributed (Non-IID) data samples and frequent communication across participants may significantly slow down the convergence rate and increase communication costs. To achieve fast convergence, we improve the conventional local updating rule by introducing the aggregated gradients at each local update epoch, and propose an adaptive learning rate algorithm that further takes the deviation between the local parameter and the global parameter into consideration. This adaptive learning rate design requires all clients' local information, including the local parameters and gradients, which is challenging as there is no communication during the local update epochs. To obtain a decentralized adaptive learning rate for each client, we use the mean field approach, introducing two mean field terms to estimate the average local parameters and gradients respectively, which does not require the clients to exchange their local information with each other at each local epoch. Numerical results show that our proposed framework is superior to the state-of-the-art FL schemes in both model accuracy and convergence rate for IID and Non-IID datasets.
