
We study solution methods for (strongly-)convex-(strongly-)concave Saddle-Point Problems (SPPs) over two types of networks: master/workers (thus centralized) architectures and meshed (thus decentralized) networks. The local functions at each node are assumed to be similar, due to statistical data similarity or otherwise. We establish lower complexity bounds for a fairly general class of algorithms solving the SPP. We show that a given suboptimality $\varepsilon>0$ is achieved over master/workers networks in $\Omega\big(\Delta\cdot \delta/\mu\cdot \log (1/\varepsilon)\big)$ rounds of communication, where $\delta>0$ measures the degree of similarity of the local functions, $\mu$ is their strong convexity constant, and $\Delta$ is the diameter of the network. The lower communication complexity bound over meshed networks reads $\Omega\big(1/{\sqrt{\rho}} \cdot {\delta}/{\mu}\cdot\log (1/\varepsilon)\big)$, where $\rho$ is the (normalized) eigengap of the gossip matrix used for communication between neighbouring nodes. We then propose algorithms matching the lower bounds over either type of network (up to log factors). We assess the effectiveness of the proposed algorithms on a robust logistic regression problem.
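
As a minimal illustration of the saddle-point setup (not the proposed optimal algorithms), the sketch below runs plain gradient descent-ascent on a strongly-convex-strongly-concave quadratic SPP; the coupling matrix, strong-convexity constant, and step size are illustrative choices, not from the paper.

```python
# Sketch only: plain gradient descent-ascent on the strongly-convex-strongly-concave
# quadratic SPP  min_x max_y  (mu/2)||x||^2 + y^T A x - (mu/2)||y||^2.
import numpy as np

rng = np.random.default_rng(0)
d, mu, eta = 5, 1.0, 0.05
A = rng.standard_normal((d, d))

x, y = rng.standard_normal(d), rng.standard_normal(d)
for _ in range(2000):
    gx = mu * x + A.T @ y               # gradient of the objective in x
    gy = A @ x - mu * y                 # gradient of the objective in y
    x, y = x - eta * gx, y + eta * gy   # descent in x, ascent in y

print(np.linalg.norm(x), np.linalg.norm(y))  # both approach 0, the unique saddle point
```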

Related Content

We consider a mobile edge computing scenario where a number of devices want to perform a linear inference $\boldsymbol{W}\boldsymbol{x}$ on some local data $\boldsymbol{x}$ given a network-side matrix $\boldsymbol{W}$. The computation is performed at the network edge over a number of edge servers. We propose a coding scheme that provides information-theoretic privacy against $z$ colluding (honest-but-curious) edge servers, while minimizing the overall latency\textemdash comprising upload, computation, download, and decoding latency\textemdash in the presence of straggling servers. The proposed scheme exploits Shamir's secret sharing to yield data privacy and straggler mitigation, combined with replication to provide spatial diversity for the download. We also propose two variants of the scheme that further reduce latency. For a considered scenario with $9$ edge servers, the proposed scheme reduces the latency by $8\%$ compared to the nonprivate scheme recently introduced by Zhang and Simeone, while providing privacy against an honest-but-curious edge server.
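
As a rough sketch of the secret-sharing primitive the scheme builds on (the full scheme additionally combines it with coding for straggler mitigation and replication for the download), the snippet below shares a scalar with Shamir's scheme so that any $z$ servers learn nothing while any $z+1$ shares reconstruct it; the field size and parameter values are illustrative.

```python
# Shamir secret sharing over a prime field: the secret is the constant term of a
# random degree-z polynomial; z shares reveal nothing, z+1 shares reconstruct it.
import random

P = 2**61 - 1  # prime field size, chosen here only for illustration

def share(secret, z, n):
    """Split `secret` into n shares, private against any z colluding servers."""
    coeffs = [secret] + [random.randrange(P) for _ in range(z)]
    # share i is the polynomial evaluated at the point i (i = 1..n)
    return [(i, sum(c * pow(i, k, P) for k, c in enumerate(coeffs)) % P)
            for i in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at 0 from any z+1 shares."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, P - 2, P)) % P
    return secret

s = 123456789
shares = share(s, z=2, n=9)
assert reconstruct(shares[:3]) == s  # any z+1 = 3 shares recover the secret
```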

We derive bounds on the eigenvalues of a generic form of double saddle-point matrices. The bounds are expressed in terms of extremal eigenvalues and singular values of the associated block matrices. Inertia and algebraic multiplicity of eigenvalues are considered as well. The analysis includes bounds for preconditioned matrices based on block diagonal preconditioners using Schur complements, and it is shown that in this case the eigenvalues are clustered within a few intervals bounded away from zero. Analysis for approximations of Schur complements is included. Some numerical experiments validate our analytical findings.
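
For concreteness, one commonly studied instance of the double saddle-point structure (notation ours; the paper treats a generic form) is
\[
\mathcal{K} \;=\;
\begin{bmatrix}
A & B^{\top} & 0 \\
B & 0 & C^{\top} \\
0 & C & 0
\end{bmatrix},
\qquad
A \in \mathbb{R}^{n \times n}\ \text{symmetric positive definite},\;
B \in \mathbb{R}^{m \times n},\;
C \in \mathbb{R}^{p \times m},
\]
for which eigenvalue bounds are naturally expressed through the extremal eigenvalues of $A$ and the extremal singular values of $B$ and $C$.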

This paper presents algorithms for local inversion of maps and shows how several important computational problems, such as cryptanalysis of symmetric encryption algorithms, the RSA algorithm, and the elliptic curve discrete log problem (ECDLP), can be addressed as local inversion problems. The methodology is termed the \emph{Local Inversion Attack}. It utilizes the concept of \emph{Linear Complexity} (LC) of a recurrence sequence generated by the map defined by the cryptanalysis problem and the given data. It is shown that when the LC of the recurrence is bounded by a bound of polynomial order in the bit length of the input to the map, the local inversion can be accomplished in polynomial time. Hence an incomplete local inversion algorithm, which searches for a solution within a specified bound on computation, can estimate the density of weak cases of cryptanalysis, i.e., instances whose data lead to low LC. Such cases can occur accidentally, cannot be avoided in practice, and constitute fatal security flaws of cryptographic primitives that are wrongly assumed to be secure on the basis of exponential average-case complexity. An incomplete algorithm is proposed for solving, as local inversion problems, key recovery of symmetric encryption algorithms, decryption of RSA ciphertext without factoring the modulus, decryption of any RSA ciphertext given one plaintext-ciphertext pair created with the same private key in a chosen-ciphertext attack, and the discrete logarithm problem on elliptic curves over finite fields (ECDLP). It is shown that when the LCs of the respective recurrences for the given data are small, these problems can be solved within practically feasible time and memory resources.
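
Since the attack's feasibility hinges on the LC of the observed recurrence being small, a natural building block is an LC computation; the sketch below uses the standard Berlekamp-Massey algorithm over GF(2) (illustration only, not the paper's full local-inversion procedure).

```python
# Linear complexity of a 0/1 sequence via Berlekamp-Massey over GF(2).
def linear_complexity(bits):
    n = len(bits)
    C, B = [1] + [0] * n, [1] + [0] * n  # current / previous connection polynomials
    L, m = 0, 1
    for i in range(n):
        # discrepancy between the sequence and the current recurrence
        d = bits[i]
        for j in range(1, L + 1):
            d ^= C[j] & bits[i - j]
        if d == 0:
            m += 1
        elif 2 * L <= i:
            T = C[:]
            for j in range(n - m + 1):
                C[j + m] ^= B[j]
            L, B, m = i + 1 - L, T, 1
        else:
            for j in range(n - m + 1):
                C[j + m] ^= B[j]
            m += 1
    return L

# A sequence obeying s[k] = s[k-1] XOR s[k-4]; its linear complexity is 4.
s = [1, 0, 0, 1]
for k in range(4, 40):
    s.append(s[k - 1] ^ s[k - 4])
print(linear_complexity(s))  # expected: 4
```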

We consider the problem of designing secure and private codes for distributed matrix-matrix multiplication. A master server owns two private matrices and hires worker nodes to help compute their product. The matrices should remain information-theoretically private from the workers. Some of the workers are malicious and return corrupted results to the master. We design a framework for security against malicious workers in private matrix-matrix multiplication. The main idea is a careful use of Freivalds' algorithm to detect erroneous matrix multiplications. Our main goal is to apply this security framework to schemes with adaptive rates. Adaptive schemes divide the workers into clusters and thus provide flexibility in trading decoding complexity for efficiency. Our new scheme, SRPM3, provides a computationally efficient security check per cluster that detects the presence of one or more malicious workers with high probability. An additional per-worker check is used to identify the malicious nodes. SRPM3 can tolerate the presence of an arbitrary number of malicious workers. We provide theoretical guarantees on the complexity of the security checks, as well as simulation results on both the missed-detection rate and the time needed for the integrity check.
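
As a sketch of the verification idea (over the integers for simplicity; the actual scheme operates over finite fields), Freivalds' check validates a claimed product $C=AB$ with only matrix-vector products: each trial with a random $0/1$ vector misses a corrupted $C$ with probability at most $1/2$, so $k$ trials miss with probability at most $2^{-k}$.

```python
# Freivalds' check: O(n^2) work per trial instead of recomputing A @ B.
import numpy as np

def freivalds_check(A, B, C, trials=20, rng=None):
    rng = rng or np.random.default_rng()
    for _ in range(trials):
        r = rng.integers(0, 2, size=(B.shape[1], 1))
        if not np.array_equal(A @ (B @ r), C @ r):
            return False  # caught a corrupted result
    return True  # accept; wrong with probability at most 2**-trials

rng = np.random.default_rng(1)
A, B = rng.integers(0, 10, (50, 40)), rng.integers(0, 10, (40, 30))
C = A @ B
assert freivalds_check(A, B, C)
C_bad = C.copy(); C_bad[3, 7] += 1
assert not freivalds_check(A, B, C_bad)
```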

We prove a Central Limit Theorem for the empirical optimal transport cost, $\sqrt{\frac{nm}{n+m}}\{\mathcal{T}_c(P_n,Q_m)-\mathcal{T}_c(P,Q)\}$, in the semi-discrete case, i.e., when the distribution $P$ is supported on $N$ points, but without assumptions on $Q$. We show that the asymptotic distribution is the supremum of a centered Gaussian process, which is Gaussian under some additional conditions on the probability $Q$ and on the cost. Such results imply the central limit theorem for the $p$-Wasserstein distance, for $p\geq 1$. This means that, for fixed $N$, the curse of dimensionality is avoided. To better understand the influence of $N$, we provide bounds on $E|\mathcal{W}_1(P,Q_m)-\mathcal{W}_1(P,Q)|$ depending on $m$ and $N$. Finally, the semi-discrete framework provides a control on the second derivative of the dual formulation, which yields the first central limit theorem for the optimal transport potentials. The results are supported by simulations that help to visualize the given limits and bounds. We also analyse the cases in which the classical bootstrap works.
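
A small one-dimensional simulation sketch of the semi-discrete setting (setup and parameter choices ours, not from the paper) can help visualize the $\sqrt{m}$-scaled fluctuations of the empirical cost; here $P$ has $N$ atoms, $Q$ is standard normal, and $\mathcal{W}_1(P,Q)$ is approximated with one very large sample.

```python
# Fluctuations of sqrt(m) * ( W1(P, Q_m) - W1(P, Q) ) in a 1-D semi-discrete example.
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
N, m, reps = 5, 2000, 300
atoms = np.linspace(-2.0, 2.0, N)      # support of the discrete P
w = np.full(N, 1.0 / N)                # uniform weights of P

# population cost W1(P, Q), approximated with one very large sample of Q = N(0, 1)
w1_pop = wasserstein_distance(atoms, rng.standard_normal(500_000), u_weights=w)

fluct = []
for _ in range(reps):
    w1_emp = wasserstein_distance(atoms, rng.standard_normal(m), u_weights=w)
    fluct.append(np.sqrt(m) * (w1_emp - w1_pop))
print(np.mean(fluct), np.std(fluct))   # roughly centred, with finite spread
```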

In recent centralized nonconvex distributed learning and federated learning, local methods are among the promising approaches for reducing communication time. However, existing work has mainly focused on first-order optimality guarantees. On the other hand, algorithms with second-order optimality guarantees have been extensively studied in the non-distributed optimization literature. In this paper, we study a new local algorithm called Bias-Variance Reduced Local Perturbed SGD (BVR-L-PSGD), which combines the existing bias-variance reduced gradient estimator with parameter perturbation to find second-order optimal points in centralized nonconvex distributed optimization. BVR-L-PSGD attains second-order optimality with nearly the same communication complexity as the best known complexity of BVR-L-SGD for finding first-order optimal points. In particular, the communication complexity is better than that of non-local methods when the heterogeneity of the local datasets is smaller than the smoothness of the local loss. In the extreme case, the communication complexity approaches $\widetilde \Theta(1)$ as the heterogeneity of the local datasets goes to zero.
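
To illustrate only the parameter-perturbation ingredient (not BVR-L-PSGD itself, whose bias-variance reduced estimator and local updates are omitted), the sketch below injects a small random perturbation whenever the gradient is small, which is the standard mechanism for escaping strict saddle points; the function names and thresholds are ours.

```python
# Perturbed gradient step: perturb when the gradient is small (near a saddle).
import numpy as np

def perturbed_sgd_step(x, grad, lr=0.1, grad_tol=1e-3, radius=1e-2, rng=None):
    rng = rng or np.random.default_rng()
    if np.linalg.norm(grad) <= grad_tol:
        x = x + radius * rng.standard_normal(x.shape)  # escape strict saddles
    return x - lr * grad

# toy nonconvex objective f(x) = x0^2 - x1^2 with a strict saddle at the origin
x = np.zeros(2)
for _ in range(50):
    grad = np.array([2.0 * x[0], -2.0 * x[1]])
    x = perturbed_sgd_step(x, grad)
print(x)  # x[1] drifts away from 0 once the perturbation pushes it off the saddle
```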

In this paper, we propose GT-GDA, a distributed optimization method to solve saddle point problems of the form: $\min_{\mathbf{x}} \max_{\mathbf{y}} \{F(\mathbf{x},\mathbf{y}) :=G(\mathbf{x}) + \langle \mathbf{y}, \overline{P} \mathbf{x} \rangle - H(\mathbf{y})\}$, where the functions $G(\cdot)$, $H(\cdot)$, and the coupling matrix $\overline{P}$ are distributed over a strongly connected network of nodes. GT-GDA is a first-order method that uses gradient tracking to eliminate the dissimilarity caused by heterogeneous data distribution among the nodes. In its most general form, GT-GDA includes a consensus over the local coupling matrices to achieve the optimal (unique) saddle point, albeit at the expense of increased communication. To avoid this, we propose a more efficient variant, GT-GDA-Lite, that does not incur the additional communication, and analyze its convergence in various scenarios. We show that GT-GDA converges linearly to the unique saddle point solution when $G(\cdot)$ is smooth and convex, $H(\cdot)$ is smooth and strongly convex, and the global coupling matrix $\overline{P}$ has full column rank. We further characterize the regime under which GT-GDA exhibits a network topology-independent convergence behavior. We next show linear convergence of GT-GDA to an error around the unique saddle point, which goes to zero when the coupling cost ${\langle \mathbf y, \overline{P} \mathbf x \rangle}$ is common to all nodes, or when $G(\cdot)$ and $H(\cdot)$ are quadratic. Numerical experiments illustrate the convergence properties and importance of GT-GDA and GT-GDA-Lite for several applications.
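
As a centralized illustration of the problem structure only (not the distributed GT-GDA algorithm), the sketch below runs gradient descent-ascent on $F(\mathbf{x},\mathbf{y})=G(\mathbf{x})+\langle\mathbf{y},\overline{P}\mathbf{x}\rangle-H(\mathbf{y})$ with quadratic $G$ and $H$, and compares the iterates against the saddle point obtained from the linear optimality system; the dimensions and step size are illustrative.

```python
# Gradient descent-ascent on F(x, y) = G(x) + <y, P x> - H(y) with
# G(x) = 0.5||x||^2 + a^T x and H(y) = 0.5||y||^2 + b^T y (both strongly convex),
# checked against the optimality system  grad G(x) + P^T y = 0,  P x - grad H(y) = 0.
import numpy as np

rng = np.random.default_rng(0)
n, p, eta = 6, 4, 0.05
P = rng.standard_normal((p, n))
a, b = rng.standard_normal(n), rng.standard_normal(p)

# closed-form saddle point of the linear optimality system
M = np.block([[np.eye(n), P.T], [P, -np.eye(p)]])
sol = np.linalg.solve(M, np.concatenate([-a, b]))
x_star, y_star = sol[:n], sol[n:]

x, y = np.zeros(n), np.zeros(p)
for _ in range(5000):
    gx = x + a + P.T @ y        # grad_x F
    gy = P @ x - (y + b)        # grad_y F
    x, y = x - eta * gx, y + eta * gy
print(np.linalg.norm(x - x_star), np.linalg.norm(y - y_star))  # both near 0
```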

In this paper, we consider learning scenarios where the learned model is evaluated under an unknown test distribution which potentially differs from the training distribution (i.e., distribution shift). The learner has access to a family of weight functions such that the test distribution is a reweighting of the training distribution under one of these functions, a setting typically studied under the name of Distributionally Robust Optimization (DRO). We consider the problem of deriving regret bounds in the classical learning theory setting, and require that the resulting regret bounds hold uniformly for all potential test distributions. We show that the DRO formulation does not guarantee uniformly small regret under distribution shift. We instead propose an alternative method called Minimax Regret Optimization (MRO), and show that under suitable conditions this method achieves uniformly low regret across all test distributions. We also adapt our technique to have stronger guarantees when the test distributions are heterogeneous in their similarity to the training data. Given the widespread optimization of worst-case risks in current approaches to robust machine learning, we believe that MRO can be a strong alternative for addressing distribution shift scenarios.
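
Schematically (notation ours), with $\mathcal{W}$ the family of admissible weight functions, $\mathcal{H}$ the hypothesis class, and $R_w(h)$ the risk of $h$ under the reweighted distribution, the two objectives can be contrasted as
\[
\text{DRO:}\ \min_{h\in\mathcal{H}}\ \max_{w\in\mathcal{W}}\ R_w(h),
\qquad
\text{MRO:}\ \min_{h\in\mathcal{H}}\ \max_{w\in\mathcal{W}}\ \Bigl(R_w(h)-\inf_{h'\in\mathcal{H}}R_w(h')\Bigr),
\]
so that MRO controls the excess risk (regret) uniformly over the candidate test distributions rather than the raw worst-case risk.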

In this work, we consider the distributed optimization of non-smooth convex functions using a network of computing units. We investigate this problem under two regularity assumptions: (1) the Lipschitz continuity of the global objective function, and (2) the Lipschitz continuity of local individual functions. Under the local regularity assumption, we provide the first optimal first-order decentralized algorithm called multi-step primal-dual (MSPD) and its corresponding optimal convergence rate. A notable aspect of this result is that, for non-smooth functions, while the dominant term of the error is in $O(1/\sqrt{t})$, the structure of the communication network only impacts a second-order term in $O(1/t)$, where $t$ is time. In other words, the error due to limits in communication resources decreases at a fast rate even in the case of non-strongly-convex objective functions. Under the global regularity assumption, we provide a simple yet efficient algorithm called distributed randomized smoothing (DRS) based on a local smoothing of the objective function, and show that DRS is within a $d^{1/4}$ multiplicative factor of the optimal convergence rate, where $d$ is the underlying dimension.
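
As a sketch of the local smoothing idea behind DRS (not the full distributed algorithm), a non-smooth $f$ can be replaced by $f_\gamma(x)=\mathbb{E}[f(x+\gamma Z)]$ with $Z\sim\mathcal{N}(0,I)$, whose gradient admits the simple Monte Carlo estimator below; the function names and parameters are ours.

```python
# Two-point Gaussian-smoothing gradient estimator for a non-smooth f.
import numpy as np

def smoothed_grad(f, x, gamma=0.1, num_samples=500, rng=None):
    rng = rng or np.random.default_rng()
    g = np.zeros_like(x)
    for _ in range(num_samples):
        z = rng.standard_normal(x.shape)
        g += (f(x + gamma * z) - f(x)) / gamma * z   # unbiased for grad f_gamma
    return g / num_samples

f = lambda v: np.abs(v).sum()            # non-smooth L1 objective
x = np.array([0.5, -2.0, 0.0])
print(smoothed_grad(f, x))               # roughly [ 1, -1, 0 ] up to Monte Carlo noise
```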

In this paper, we study the optimal convergence rates for distributed convex optimization problems over networks. We model the communication restrictions imposed by the network as a set of affine constraints and provide optimal complexity bounds for four different setups, namely: the function $F(\mathbf{x}) \triangleq \sum_{i=1}^{m}f_i(\mathbf{x})$ is (i) strongly convex and smooth, (ii) strongly convex, (iii) smooth, or (iv) merely convex. Our results show that Nesterov's accelerated gradient descent on the dual problem can be executed in a distributed manner and obtains the same optimal rates as in the centralized version of the problem (up to constant or logarithmic factors), with an additional cost related to the spectral gap of the interaction matrix. Finally, we discuss some extensions of the proposed setup, such as proximal-friendly functions, time-varying graphs, and improvements in the condition numbers.
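
Schematically (notation ours, following the affine-constraint viewpoint described above), each node $i$ keeps a local copy $\mathbf{x}_i$ and agreement is encoded through an interaction matrix $W$ whose kernel is the consensus subspace:
\[
\min_{\mathbf{x}_1,\dots,\mathbf{x}_m}\ \sum_{i=1}^{m} f_i(\mathbf{x}_i)
\quad\text{s.t.}\quad (\sqrt{W}\otimes I_d)\,\mathbf{x}=0
\;\Longleftrightarrow\; \mathbf{x}_1=\dots=\mathbf{x}_m ,
\]
and accelerated gradient ascent is then applied to the dual
\[
\max_{\boldsymbol{\lambda}}\ -\sum_{i=1}^{m} f_i^{*}\!\bigl(-\bigl[(\sqrt{W}\otimes I_d)^{\top}\boldsymbol{\lambda}\bigr]_i\bigr),
\]
where $f_i^{*}$ is the Fenchel conjugate of $f_i$ and the spectral gap of $W$ enters through the conditioning of this dual problem.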
