
We introduce a natural online allocation problem that connects several of the most fundamental problems in online optimization. Let $M$ be an $n$-point metric space. Consider a resource that can be allocated in arbitrary fractions to the points of $M$. At each time $t$, a convex monotone cost function $c_t: [0,1]\to\mathbb{R}_+$ appears at some point $r_t\in M$. In response, an algorithm may change the allocation of the resource, paying movement cost as determined by the metric and service cost $c_t(x_{r_t})$, where $x_{r_t}$ is the fraction of the resource at $r_t$ at the end of time $t$. For example, when the cost functions are $c_t(x)=\alpha x$, this is equivalent to randomized MTS, and when the cost functions are $c_t(x)=\infty\cdot 1_{x<1/k}$, this is equivalent to fractional $k$-server. We give an $O(\log n)$-competitive algorithm for weighted star metrics. Due to the generality of allowed cost functions, classical multiplicative update algorithms do not work for the metric allocation problem. A key idea of our algorithm is to decouple the rate at which a variable is updated from its value, resulting in interesting new dynamics. This can be viewed as running mirror descent with a time-varying regularizer, and we use this perspective to further refine the guarantees of our algorithm. The standard analysis techniques run into multiple complications when the regularizer is time-varying, and we show how to overcome these issues by making various modifications to the default potential function. We also consider the problem when cost functions are allowed to be non-convex. In this case, we give tight bounds of $\Theta(n)$ on tree metrics, which imply deterministic and randomized competitive ratios of $O(n^2)$ and $O(n\log n)$ respectively on arbitrary metrics. Our algorithm is based on an $\ell_2^2$-regularizer.
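The cost accounting described above can be sketched for a single time step on a weighted star. This is a minimal illustration, not the paper's algorithm: it assumes each leaf $i$ hangs off the center by an edge of weight `weights[i]`, so that moving mass between two leaves costs the sum of their edge weights per unit, and the function name `step_cost` and its parameters are hypothetical.

```python
from typing import Callable, Sequence, Tuple


def step_cost(
    weights: Sequence[float],
    old: Sequence[float],
    new: Sequence[float],
    r: int,
    cost: Callable[[float], float],
) -> Tuple[float, float]:
    """Return (movement_cost, service_cost) for one time step.

    Assumption: a weighted star where leaf i is at distance weights[i]
    from the center, so every unit of mass leaving or entering leaf i
    pays weights[i].
    """
    if abs(sum(old) - sum(new)) > 1e-9:
        raise ValueError("the total amount of resource must be conserved")
    # Each leaf pays its edge weight times the mass that crossed its edge.
    movement = sum(w * abs(b - a) for w, a, b in zip(weights, old, new))
    # Service cost is charged on the allocation at r at the END of the step.
    service = cost(new[r])
    return movement, service


# Example: a linear cost c_t(x) = 2x arrives at point 0, and the
# algorithm shifts 0.3 units of the resource from point 0 to point 2.
movement, service = step_cost(
    weights=[1.0, 2.0, 3.0],
    old=[0.5, 0.5, 0.0],
    new=[0.2, 0.5, 0.3],
    r=0,
    cost=lambda x: 2.0 * x,
)
```

An online algorithm trades these two terms off against each other: moving more mass away from the requested point lowers the service cost here but raises the movement cost.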

Related content

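In mathematical optimization, statistics, econometrics, decision theory, machine learning, and computational neuroscience, a cost function, also called a loss function, is a function that maps an event or the values of one or more variables onto a real number that intuitively represents some cost associated with that event. An optimization problem seeks to minimize a loss function. An objective function is either a loss function or its negative, in which case it is to be maximized.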
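As a small illustration of a loss function in this sense, the sketch below maps a set of predictions and targets to a single non-negative real number. The mean squared error is chosen here only as a familiar example; the function name `squared_loss` is hypothetical.

```python
from typing import Sequence


def squared_loss(predictions: Sequence[float], targets: Sequence[float]) -> float:
    """Mean squared error: maps predictions and targets to one real number."""
    if len(predictions) != len(targets) or len(predictions) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(predictions)


# Two predictions, one exact and one off by 2: the loss is (0 + 4) / 2.
loss = squared_loss([1.0, 2.0], [1.0, 4.0])  # 2.0
```

Minimizing this quantity over the predictions is an optimization problem with the loss as its objective function.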

This paper considers the distributed convex optimization problem over a multi-agent system in which each agent has its own cost function and communicates with its neighbours over a sequence of time-varying directed graphs. Agents receive information from other agents with communication delays, and the goal in this setting is to find the optimal value of the sum of the agents' loss functions. We address this problem with the push-sum distributed dual averaging (PS-DDA) algorithm. We prove that the algorithm converges and that, with a proper step size, the error decays at a rate $\mathcal{O}\left(T^{-0.5}\right)$, where $T$ is the number of iterations. The main result also shows that the convergence of the algorithm depends on the maximum communication delay on a single edge. Finally, numerical simulations illustrate the performance of the PS-DDA algorithm.
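The push-sum primitive that PS-DDA builds on can be sketched in isolation. The code below is only the standard push-sum averaging protocol on a fixed directed graph, with no delays and no dual averaging step, so it is not the algorithm analysed in the paper; the function name `push_sum` and the example graph are assumptions made for illustration.

```python
from typing import List, Sequence


def push_sum(
    values: Sequence[float],
    out_neighbors: Sequence[Sequence[int]],
    rounds: int,
) -> List[float]:
    """Estimate the average of `values` at every node of a directed graph.

    Each node j splits its pair (x_j, w_j) equally among itself and its
    out-neighbours. The ratio x_i / w_i corrects for the imbalance that
    a directed graph introduces.
    """
    n = len(values)
    x = [float(v) for v in values]
    w = [1.0] * n
    for _ in range(rounds):
        new_x = [0.0] * n
        new_w = [0.0] * n
        for j in range(n):
            targets = [j] + list(out_neighbors[j])  # self-loop plus out-edges
            share = 1.0 / len(targets)
            for i in targets:
                new_x[i] += share * x[j]
                new_w[i] += share * w[j]
        x, w = new_x, new_w
    return [xi / wi for xi, wi in zip(x, w)]


# Strongly connected directed graph: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0.
estimates = push_sum([1.0, 2.0, 6.0], [[1, 2], [2], [0]], rounds=100)
```

On this strongly connected graph every entry of `estimates` approaches the true average of the initial values, even though the nodes have different numbers of out-neighbours.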

This paper considers the distributed online convex optimization problem with time-varying constraints over a network of agents. This is a sequential decision-making problem with two sequences of arbitrarily varying convex loss and constraint functions. At each round, each agent selects a decision from the decision set, and then only a portion of the loss function and a coordinate block of the constraint function at this round are privately revealed to this agent. The goal of the network is to minimize the network-wide loss accumulated over time. Two distributed online algorithms, one with full-information feedback and one with bandit feedback, are proposed. Both dynamic and static network regret bounds are analyzed for the proposed algorithms, and network cumulative constraint violation is used to measure constraint violation, which rules out the situation in which strictly feasible constraints compensate for the effects of violated constraints. In particular, we show that the proposed algorithms achieve $\mathcal{O}(T^{\max\{\kappa,1-\kappa\}})$ static network regret and $\mathcal{O}(T^{1-\kappa/2})$ network cumulative constraint violation, where $T$ is the time horizon and $\kappa\in(0,1)$ is a user-defined trade-off parameter. Moreover, if the loss functions are strongly convex, then the static network regret bound can be reduced to $\mathcal{O}(T^{\kappa})$. Finally, numerical simulations are provided to illustrate the effectiveness of the theoretical results.
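As a worked instance of the trade-off controlled by $\kappa$ (plain arithmetic on the exponents quoted above, not an additional result from the paper), the static-regret exponent $\max\{\kappa,1-\kappa\}$ is smallest at $\kappa=1/2$:

$$
\kappa=\tfrac12:\qquad \max\{\kappa,\,1-\kappa\}=\tfrac12 \;\Rightarrow\; \mathcal{O}(T^{1/2}) \text{ static network regret},\qquad 1-\tfrac{\kappa}{2}=\tfrac34 \;\Rightarrow\; \mathcal{O}(T^{3/4}) \text{ cumulative constraint violation}.
$$

Moving $\kappa$ toward $1$ lowers the violation exponent toward $1/2$ but raises the regret exponent toward $1$, which is why $\kappa$ is left to the user.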

Bilevel optimization, the problem of minimizing a value function which involves the arg-minimum of another function, appears in many areas of machine learning. In a large scale setting where the number of samples is huge, it is crucial to develop stochastic methods, which only use a few samples at a time to progress. However, computing the gradient of the value function involves solving a linear system, which makes it difficult to derive unbiased stochastic estimates. To overcome this problem we introduce a novel framework, in which the solution of the inner problem, the solution of the linear system, and the main variable evolve at the same time. These directions are written as a sum, making it straightforward to derive unbiased estimates. The simplicity of our approach allows us to develop global variance reduction algorithms, where the dynamics of all variables are subject to variance reduction. We demonstrate that SABA, an adaptation of the celebrated SAGA algorithm in our framework, has an $O(\frac1T)$ convergence rate, and that it achieves linear convergence under the Polyak-Łojasiewicz assumption. This is the first stochastic algorithm for bilevel optimization that satisfies either of these properties. Numerical experiments validate the usefulness of our method.

This paper considers the problem of measure estimation under the barycentric coding model (BCM), in which an unknown measure is assumed to belong to the set of Wasserstein-2 barycenters of a finite set of known measures. Estimating a measure under this model is equivalent to estimating the unknown barycentric coordinates. We provide novel geometrical, statistical, and computational insights for measure estimation under the BCM, consisting of three main results. Our first main result leverages the Riemannian geometry of Wasserstein-2 space to provide a procedure for recovering the barycentric coordinates as the solution to a quadratic optimization problem assuming access to the true reference measures. The essential geometric insight is that the parameters of this quadratic problem are determined by inner products between the optimal displacement maps from the given measure to the reference measures defining the BCM. Our second main result then establishes an algorithm for solving for the coordinates in the BCM when all the measures are observed empirically via i.i.d. samples. We prove precise rates of convergence for this algorithm -- determined by the smoothness of the underlying measures and their dimensionality -- thereby guaranteeing its statistical consistency. Finally, we demonstrate the utility of the BCM and associated estimation procedures in three application areas: (i) covariance estimation for Gaussian measures; (ii) image processing; and (iii) natural language processing.

Given a bichromatic point set $P=\textbf{R} \cup \textbf{B}$ of red and blue points, a separator is an object of a certain type that separates $\textbf{R}$ and $\textbf{B}$. We study the geometric separability problem when the separator is (a) a rectangular annulus of fixed orientation, (b) a rectangular annulus of arbitrary orientation, (c) a square annulus of fixed orientation, or (d) an orthogonal convex polygon. In this paper, we give polynomial-time algorithms to construct separators of each of the above types that also optimize a given parameter.

We analyse the privacy leakage of noisy stochastic gradient descent by modeling R\'enyi divergence dynamics with Langevin diffusions. Inspired by recent work on non-stochastic algorithms, we derive similar desirable properties in the stochastic setting. In particular, we prove that the privacy loss converges exponentially fast for smooth and strongly convex objectives under constant step size, which is a significant improvement over previous DP-SGD analyses. We also extend our analysis to arbitrary sequences of varying step sizes and derive new utility bounds. Finally, we propose an implementation, and our experiments show the practical utility of our approach compared to classical DP-SGD libraries.

In this work, we consider the distributed optimization of non-smooth convex functions using a network of computing units. We investigate this problem under two regularity assumptions: (1) the Lipschitz continuity of the global objective function, and (2) the Lipschitz continuity of local individual functions. Under the local regularity assumption, we provide the first optimal first-order decentralized algorithm called multi-step primal-dual (MSPD) and its corresponding optimal convergence rate. A notable aspect of this result is that, for non-smooth functions, while the dominant term of the error is in $O(1/\sqrt{t})$, the structure of the communication network only impacts a second-order term in $O(1/t)$, where $t$ is time. In other words, the error due to limits in communication resources decreases at a fast rate even in the case of non-strongly-convex objective functions. Under the global regularity assumption, we provide a simple yet efficient algorithm called distributed randomized smoothing (DRS) based on a local smoothing of the objective function, and show that DRS is within a $d^{1/4}$ multiplicative factor of the optimal convergence rate, where $d$ is the underlying dimension.

Metric learning learns a metric function from training data to calculate the similarity or distance between samples. From the perspective of feature learning, metric learning essentially learns a new feature space by feature transformation (e.g., Mahalanobis distance metric). However, traditional metric learning algorithms are shallow: they learn only one metric space (feature transformation). Can we further learn a better metric space from the learnt metric space? In other words, can we learn a metric progressively and nonlinearly, as in deep learning, using only the existing metric learning algorithms? To this end, we present a hierarchical metric learning scheme and implement an online deep metric learning framework, namely ODML. Specifically, we take one online metric learning algorithm as a metric layer, followed by a nonlinear layer (i.e., ReLU), and then stack these layers in the manner of deep learning. The proposed ODML enjoys some nice properties: it can indeed learn a metric progressively, and it achieves superior performance on some datasets. Various experiments with different settings have been conducted to verify these properties of the proposed ODML.

Latent Dirichlet Allocation (LDA) is a topic model widely used in natural language processing and machine learning. Most approaches to training the model rely on iterative algorithms, which makes it difficult to run LDA on big corpora that are best analyzed in parallel and distributed computational environments. Indeed, current approaches to parallel inference either don't converge to the correct posterior or require storage of large dense matrices in memory. We present a novel sampler that overcomes both problems, and we show that this sampler is faster, both empirically and theoretically, than previous Gibbs samplers for LDA. We do so by employing a novel P\'olya-urn-based approximation in the sparse partially collapsed sampler for LDA. We prove that the approximation error vanishes with data size, making our algorithm asymptotically exact, a property of importance for large-scale topic models. In addition, we show, via an explicit example, that -- contrary to popular belief in the topic modeling literature -- partially collapsed samplers can be more efficient than fully collapsed samplers. We conclude by comparing the performance of our algorithm with that of other approaches on well-known corpora.

In this paper, we study the optimal convergence rate for distributed convex optimization problems in networks. We model the communication restrictions imposed by the network as a set of affine constraints and provide optimal complexity bounds for four different setups, namely when the function $F(\mathbf{x}) \triangleq \sum_{i=1}^{m}f_i(\mathbf{x})$ is (i) strongly convex and smooth, (ii) strongly convex, (iii) smooth, or (iv) just convex. Our results show that Nesterov's accelerated gradient descent on the dual problem can be executed in a distributed manner and obtains the same optimal rates as in the centralized version of the problem (up to constant or logarithmic factors) with an additional cost related to the spectral gap of the interaction matrix. Finally, we discuss some extensions to the proposed setup, such as proximal friendly functions, time-varying graphs, and improvement of the condition numbers.
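For readers unfamiliar with the accelerated method named above, the following is a minimal centralized sketch of Nesterov's accelerated gradient descent in the strongly convex and smooth case. It is not the distributed dual scheme of the paper: it runs on the primal of a toy diagonal quadratic, and the function name `nesterov_quadratic`, the objective, and the constant-momentum variant are all assumptions chosen for illustration.

```python
import math
from typing import List, Sequence


def nesterov_quadratic(diag: Sequence[float], x0: Sequence[float], iters: int) -> List[float]:
    """Minimize f(x) = 0.5 * sum(d_i * x_i**2) with all d_i > 0.

    Uses the constant-momentum form of Nesterov's method, with
    mu = min(diag) (strong convexity) and L = max(diag) (smoothness).
    """
    mu, L = min(diag), max(diag)
    sqrt_kappa = math.sqrt(L / mu)
    beta = (sqrt_kappa - 1.0) / (sqrt_kappa + 1.0)  # momentum coefficient
    x_prev = list(x0)
    x = list(x0)
    for _ in range(iters):
        # Extrapolate along the previous displacement.
        y = [xi + beta * (xi - xpi) for xi, xpi in zip(x, x_prev)]
        # Gradient of the quadratic at the extrapolated point.
        grad = [d * yi for d, yi in zip(diag, y)]
        x_prev = x
        # Gradient step with step size 1/L.
        x = [yi - gi / L for yi, gi in zip(y, grad)]
    return x


# Condition number 10; the minimizer is the origin.
solution = nesterov_quadratic(diag=[1.0, 10.0], x0=[5.0, -3.0], iters=200)
```

The momentum coefficient depends on the square root of the condition number, which is the source of the accelerated rate in the centralized setting that the abstract compares against.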
