from arxiv, This version (v4) added a new corollary on logistic regression, as well as more discussions on sparse Gaussian mean estimation, compared to v3

We consider parameter estimation in distributed networks, where each sensor in the network observes an independent sample from an underlying distribution and has $k$ bits to communicate its sample to a centralized processor, which computes an estimate of a desired parameter. We develop lower bounds for the minimax risk of estimating the underlying parameter for a large class of losses and distributions. Our results show that under mild regularity conditions, the communication constraint reduces the effective sample size by a factor of $d$ when $k$ is small, where $d$ is the dimension of the estimated parameter. Furthermore, this penalty reduces at most exponentially with increasing $k$, which is the case for some models, e.g., estimating high-dimensional distributions. For other models, however, we show that the sample size reduction is remediated only linearly with increasing $k$, e.g., when some sub-Gaussian structure is available. We apply our results to the distributed setting with the product Bernoulli model, the multinomial model, Gaussian location models, and logistic regression, recovering or strengthening existing results. Our approach significantly deviates from existing approaches for developing information-theoretic lower bounds for communication-efficient estimation. We circumvent the need for strong data processing inequalities used in prior work and develop a geometric approach which builds on a new representation of the communication constraint. This approach allows us to strengthen and generalize existing results with simpler and more transparent proofs.
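To make the setting concrete, the sketch below simulates a deliberately naive scheme for the product Bernoulli model: each sensor forwards $k$ raw coordinates of its own $d$-dimensional sample, chosen round-robin. This is an illustrative assumption, not the construction or the lower-bound argument of the paper; the function name `simulate_estimate` and its parameters are hypothetical. It only shows why each coordinate ends up with roughly $nk/d$ observations instead of $n$.

```python
import random


def simulate_estimate(theta, n, k, seed=0):
    """Estimate a product-Bernoulli mean vector when each of n sensors sends k bits.

    Hypothetical scheme (an assumption for illustration): sensor i reports
    the raw bits of coordinates (i*k + j) mod d for j = 0..k-1 of its sample.
    Requires k <= d so that a sensor never reports the same coordinate twice.
    """
    rng = random.Random(seed)
    d = len(theta)
    sums = [0] * d      # number of ones received per coordinate
    counts = [0] * d    # number of bits received per coordinate
    for i in range(n):
        # Each sensor draws one independent d-dimensional sample.
        sample = [1 if rng.random() < p else 0 for p in theta]
        for j in range(k):
            c = (i * k + j) % d
            sums[c] += sample[c]
            counts[c] += 1
    # Per-coordinate empirical mean; fall back to 0.5 if nothing was received.
    estimate = [s / c if c else 0.5 for s, c in zip(sums, counts)]
    return estimate, counts


if __name__ == "__main__":
    theta = [0.3] * 10
    est, counts = simulate_estimate(theta, n=1000, k=1)
    print(counts)  # observations received per coordinate
    print(est)
```

With $d = 10$ and $k = 1$, the processor receives $n/d$ bits per coordinate, so the per-coordinate variance is that of a sample $d$ times smaller. Smarter encodings can do better for larger $k$, which is the regime the abstract's results address.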

Related content

Estimation / Estimator

Color · Graph · Clique · Scenario · Mutually independent

September 24, 2021

Distributed coloring and the local structure of unit-disk graphs

Louis Esperet,Sébastien Julliot,Arnaud de Mesmay

from arxiv, 24 pages, revised according to the referee reports of the conference version. A preliminary version of this work appeared in the proceedings of the 17th International Symposium on Algorithms and Experiments for Wireless Sensor Networks (ALGOSENSORS 2021)

Coloring unit-disk graphs efficiently is an important problem in the global and distributed setting, with applications in radio channel assignment problems when the communication relies on omni-directional antennas of the same power. In this context it is important to bound not only the complexity of the coloring algorithms, but also the number of colors used. In this paper, we consider two natural distributed settings. In the location-aware setting (when nodes know their coordinates in the plane), we give a constant-time distributed algorithm coloring any unit-disk graph $G$ with at most $(3+\epsilon)\omega(G)+6$ colors, for any constant $\epsilon>0$, where $\omega(G)$ is the clique number of $G$. This improves upon a classical 3-approximation algorithm for this problem, for all unit-disk graphs whose chromatic number significantly exceeds their clique number. When nodes do not know their coordinates in the plane, we give a distributed algorithm in the LOCAL model that colors every unit-disk graph $G$ with at most $5.68\omega(G)$ colors in $O(\log^3 \log n)$ rounds. Moreover, when $\omega(G)=O(1)$, the algorithm runs in $O(\log^* n)$ rounds. This algorithm is based on a study of the local structure of unit-disk graphs, which is of independent interest. We conjecture that every unit-disk graph $G$ has average degree at most $4\omega(G)$, which would imply the existence of an $O(\log n)$-round algorithm coloring any unit-disk graph $G$ with (approximately) $4\omega(G)$ colors.

Weight · Low-rank matrix approximation · Approximation · Extensibility · Rank

September 22, 2021

Weighted Low Rank Matrix Approximation and Acceleration

Elena Tuzhilina,Trevor Hastie

Low-rank matrix approximation is one of the central concepts in machine learning, with applications in dimension reduction, de-noising, multivariate statistical methodology, and many more. A recent extension to LRMA is called low-rank matrix completion (LRMC). It solves the LRMA problem when some observations are missing and is especially useful for recommender systems. In this paper, we consider an element-wise weighted generalization of LRMA. The resulting weighted low-rank matrix approximation technique therefore covers LRMC as a special case with binary weights. WLRMA has many applications. For example, it is an essential component of GLM optimization algorithms, where an exponential family is used to model the entries of a matrix, and the matrix of natural parameters admits a low-rank structure. We propose an algorithm for solving the weighted problem, as well as two acceleration techniques. Further, we develop a non-SVD modification of the proposed algorithm that is able to handle extremely high-dimensional data. We compare the performance of all the methods on a small simulation example as well as a real-data application.
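As a small illustration of the weighted objective, the sketch below fits a rank-1 approximation by alternating weighted least squares, minimizing $\sum_{ij} W_{ij}(M_{ij} - u_i v_j)^2$. It is a textbook-style baseline chosen for brevity and is not the algorithm or the acceleration techniques proposed in the paper; the name `weighted_rank1` is hypothetical. With binary weights it reduces to rank-1 matrix completion, matching the special case mentioned in the abstract.

```python
def weighted_rank1(M, W, iters=200):
    """Alternating weighted least squares for a rank-1 fit.

    Minimizes sum_ij W[i][j] * (M[i][j] - u[i] * v[j])**2 over u and v.
    Entries with weight 0 are ignored, so binary weights give matrix completion.
    """
    n, m = len(M), len(M[0])
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(iters):
        # Update each u[i] with v fixed: a one-dimensional weighted least squares.
        for i in range(n):
            num = sum(W[i][j] * M[i][j] * v[j] for j in range(m))
            den = sum(W[i][j] * v[j] ** 2 for j in range(m))
            if den > 0:
                u[i] = num / den
        # Update each v[j] with u fixed.
        for j in range(m):
            num = sum(W[i][j] * M[i][j] * u[i] for i in range(n))
            den = sum(W[i][j] * u[i] ** 2 for i in range(n))
            if den > 0:
                v[j] = num / den
    return u, v


if __name__ == "__main__":
    a, b = [1.0, 2.0, 3.0], [2.0, 1.0, 4.0, 3.0]
    M = [[x * y for y in b] for x in a]
    W = [[1, 1, 0, 1], [1, 1, 1, 1], [0, 1, 1, 1]]  # 0 marks a missing entry
    M[0][2] = 0.0  # the value at a missing entry is never used
    M[2][0] = 0.0
    u, v = weighted_rank1(M, W)
    print(u[0] * v[2], u[2] * v[0])  # reconstructed missing entries
```

Each coordinate update has a closed form because the objective is quadratic in `u` for fixed `v` and vice versa. Higher ranks need a small weighted regression per row and column instead of a scalar division.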

Estimation / Estimator · MoDELS · Subsampling · Observed variables · BEGAN

September 22, 2021

A Wavelet Method for Panel Models with Jump Discontinuities in the Parameters

Oualid Bada,Alois Kneip,Dominik Liebl,Tim Mensinger,James Gualtieri,Robin C. Sickles

While a substantial literature on structural break change point analysis exists for univariate time series, research on large panel data models has not been as extensive. In this paper, a novel method for estimating panel models with multiple structural changes is proposed. The breaks are allowed to occur at unknown points in time and may affect the multivariate slope parameters individually. Our method adapts Haar wavelets to the structure of the observed variables in order to detect the change points of the parameters consistently. We also develop methods to address endogenous regressors within our modeling framework. The asymptotic property of our estimator is established. In our application, we examine the impact of algorithmic trading on standard measures of market quality such as liquidity and volatility over a time period that covers the financial meltdown that began in 2007. We are able to detect jumps in regression slope parameters automatically without using ad-hoc subsample selection criteria.
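The idea of using Haar-type contrasts to locate a jump can be sketched on a single univariate series. The code below is a much simplified stand-in, not the panel estimator of the paper: it scans all split points and picks the one maximizing the unbalanced Haar contrast $\sqrt{\tau(n-\tau)/n}\,|\bar{y}_{\text{left}} - \bar{y}_{\text{right}}|$. The function name `haar_change_point` is hypothetical, and the single-jump, piecewise-constant setting is an assumption.

```python
import math


def haar_change_point(y):
    """Return (tau, contrast) for the best single split of the series y.

    tau is the number of observations in the left segment. The contrast is the
    absolute unbalanced Haar coefficient of y for the split at tau.
    """
    n = len(y)
    total = sum(y)
    best_tau, best_contrast = None, -1.0
    left_sum = 0.0
    for tau in range(1, n):
        left_sum += y[tau - 1]
        left_mean = left_sum / tau
        right_mean = (total - left_sum) / (n - tau)
        contrast = math.sqrt(tau * (n - tau) / n) * abs(left_mean - right_mean)
        if contrast > best_contrast:
            best_tau, best_contrast = tau, contrast
    return best_tau, best_contrast


if __name__ == "__main__":
    series = [0.0] * 30 + [2.0] * 20  # one jump after 30 observations
    print(haar_change_point(series))
```

For multiple breaks, a common extension is to apply the same scan recursively to the left and right segments and stop when the contrast falls below a threshold. That threshold choice is outside this sketch.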

Scenario · Graph · Networking · Comprehensibility · Mutually independent

September 21, 2021

Near-Optimal Distributed Implementations of Dynamic Algorithms for Symmetry-Breaking Problems

Shiri Antaki,Quanquan C. Liu,Shay Solomon

from arxiv, Abstract truncated to fit arXiv limits

The field of dynamic graph algorithms aims at achieving a thorough understanding of real-world networks whose topology evolves with time. Traditionally, the focus has been on the classic sequential, centralized setting where the main quality measure of an algorithm is its update time, i.e. the time needed to restore the solution after each update. While real-life networks are very often distributed across multiple machines, the fundamental question of finding efficient dynamic, distributed graph algorithms received little attention to date. The goal in this setting is to optimize both the round and message complexities incurred per update step, ideally achieving a message complexity that matches the centralized update time in $O(1)$ (perhaps amortized) rounds. Toward initiating a systematic study of dynamic, distributed algorithms, we study some of the most central symmetry-breaking problems: maximal independent set (MIS), maximal matching/(approx-) maximum cardinality matching (MM/MCM), and $(\Delta + 1)$-vertex coloring. This paper focuses on dynamic, distributed algorithms that are deterministic, and in particular -- robust against an adaptive adversary. Most of our focus is on our MIS algorithm, which achieves $O\left(m^{2/3}\log^2 n\right)$ amortized messages in $O\left(\log^2 n\right)$ amortized rounds in the Congest model. Notably, the amortized message complexity of our algorithm matches the amortized update time of the best-known deterministic centralized MIS algorithm by Gupta and Khan [SOSA'21] up to a polylog $n$ factor. The previous best deterministic distributed MIS algorithm, by Assadi et al. [STOC'18], uses $O(m^{3/4})$ amortized messages in $O(1)$ amortized rounds, i.e., we achieve a polynomial improvement in the message complexity by a polylog $n$ increase to the round complexity; moreover, the algorithm of Assadi et al. makes an implicit assumption that the [...]

Mutually independent · Approximation · INFORMS · Minimum point · Tractable

September 21, 2021

On the Exponential Approximation of Type II Error Probability of Distributed Test of Independence

Sebastian Espinosa,Jorge F. Silva,Pablo Piantanida

This paper studies the distributed binary test of statistical independence under communication (information bits) constraints. While testing independence is very relevant in various applications, the distributed independence test is particularly useful for event detection in sensor networks, where data correlation often occurs among observations of devices in the presence of a signal of interest. By focusing on the case of two devices because of its tractability, we begin by investigating conditions on Type I error probability restrictions under which the minimum Type II error admits an exponential behavior with the sample size. Then, we study the finite sample-size regime of this problem. We derive new upper and lower bounds for the gap between the minimum Type II error and its exponential approximation under different setups, including restrictions imposed on the vanishing Type I error probability. Our theoretical results shed light on the sample-size regimes at which approximations of the Type II error probability via error exponents become informative enough in the sense of predicting well the actual error probability. We finally discuss an application of our results where the gap is evaluated numerically, and we show that exponential approximations are not only tractable but also a valuable proxy for the Type II probability of error in the finite-length regime.

Optimizer · Approximation · Category · Minimum point · Scenario

September 21, 2021

An Approximation Algorithm for a General Class of Multi-Parametric Optimization Problems

Stephan Helfrich,Arne Herzel,Stefan Ruzika,Clemens Thielen

In a widely studied class of multi-parametric optimization problems, the objective value of each solution is an affine function of real-valued parameters. For many important multi-parametric optimization problems, an optimal solution set with minimum cardinality can contain super-polynomially many solutions. Consequently, any exact algorithm for such problems must output a super-polynomial number of solutions. We propose an approximation algorithm that is applicable to a general class of multi-parametric optimization problems and outputs a number of solutions that is bounded polynomially in the instance size and the inverse of the approximation guarantee. This method lifts approximation algorithms for non-parametric optimization problems to their parametric formulations, providing an approximation guarantee that is arbitrarily close to the approximation guarantee for the non-parametric problem. If the non-parametric problem can be solved exactly in polynomial time or if an FPTAS is available, the method yields an FPTAS. We discuss implications for important multi-parametric combinatorial optimization problems. Remarkably, we obtain a $(\frac{3}{2} + \varepsilon)$-approximation algorithm for the multi-parametric metric travelling salesman problem, whereas the non-parametric version is known to be APX-complete. Furthermore, we show that the cardinality of a minimal size approximation set is in general not $\ell$-approximable for any natural number $\ell$.

Estimation / Estimator · Sampling method · Kernel density estimation · Cumulative distribution function · Kernelization

September 21, 2021

Non-parametric Kernel-Based Estimation of Probability Distributions for Precipitation Modeling

Andrew Pavlides,Vasiliki Agou,Dionissios T. Hristopulos

from arxiv, 49 pages, 21 figures

The probability distribution of precipitation amount strongly depends on geography, climate zone, and time scale considered. Closed-form parametric probability distributions are not sufficiently flexible to provide accurate and universal models for precipitation amount over different time scales. In this paper we derive non-parametric estimates of the cumulative distribution function (CDF) of precipitation amount for wet time intervals. The CDF estimates are obtained by integrating the kernel density estimator leading to semi-explicit CDF expressions for different kernel functions. We investigate kernel-based CDF estimation with an adaptive plug-in bandwidth (KCDE), using both synthetic data sets and reanalysis precipitation data from the island of Crete (Greece). We show that KCDE provides better estimates of the probability distribution than the standard empirical (staircase) estimate and kernel-based estimates that use the normal reference bandwidth. We also demonstrate that KCDE enables the simulation of non-parametric precipitation amount distributions by means of the inverse transform sampling method.
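The two ingredients named in the abstract, a CDF obtained by integrating a kernel density estimate and inverse transform sampling, can be sketched as follows. The sketch assumes a Gaussian kernel, for which the integrated estimate is $\hat{F}(x) = \frac{1}{n}\sum_i \Phi\left(\frac{x - x_i}{h}\right)$, and a fixed bandwidth `h` supplied by the caller. The adaptive plug-in bandwidth studied in the paper is not reproduced here, and all function names are hypothetical.

```python
import math
import random


def kernel_cdf(x, data, h):
    """Gaussian-kernel CDF estimate: the average of normal CDFs centered at the data."""
    return sum(
        0.5 * (1.0 + math.erf((x - xi) / (h * math.sqrt(2.0)))) for xi in data
    ) / len(data)


def inverse_cdf(p, data, h, tol=1e-9):
    """Invert kernel_cdf by bisection; the estimate is continuous and increasing."""
    lo = min(data) - 10.0 * h
    hi = max(data) + 10.0 * h
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if kernel_cdf(mid, data, h) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def sample(data, h, size, seed=0):
    """Inverse transform sampling: push uniform draws through the inverse CDF."""
    rng = random.Random(seed)
    return [inverse_cdf(rng.random(), data, h) for _ in range(size)]


if __name__ == "__main__":
    observations = [1.0, 2.0, 3.0]  # made-up values for illustration only
    print(kernel_cdf(2.0, observations, h=0.5))
    print(sample(observations, h=0.5, size=5))
```

A Gaussian kernel places some probability mass below zero, which is unphysical for precipitation amounts; a kernel with bounded or positive support would avoid this, at the cost of a different closed-form CDF.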

MoDELS · Reducible · Gaussian mixture (model) · Neural Networks · Category

January 15, 2021

Fundamental Tradeoffs in Distributionally Adversarial Training

Mohammad Mehrabi,Adel Javanmard,Ryan A. Rossi,Anup Rao,Tung Mai

from arxiv, 23 pages, 3 figures

Adversarial training is among the most effective techniques to improve the robustness of models against adversarial perturbations. However, the full effect of this approach on models is not well understood. For example, while adversarial training can reduce the adversarial risk (prediction error against an adversary), it sometimes increases the standard risk (generalization error when there is no adversary). Moreover, such behavior is impacted by various elements of the learning problem, including the size and quality of training data, specific forms of adversarial perturbations in the input, model overparameterization, and the adversary's power, among others. In this paper, we focus on the \emph{distribution perturbing} adversary framework, wherein the adversary can change the test distribution within a neighborhood of the training data distribution. The neighborhood is defined via the Wasserstein distance between distributions, and the radius of the neighborhood is a measure of the adversary's manipulative power. We study the tradeoff between standard risk and adversarial risk and derive the Pareto-optimal tradeoff, achievable over specific classes of models, in the infinite data limit with the feature dimension kept fixed. We consider three learning settings: 1) Regression with the class of linear models; 2) Binary classification under the Gaussian mixtures data model, with the class of linear classifiers; 3) Regression with the class of random features models (which can be equivalently represented as a two-layer neural network with random first-layer weights). We show that a tradeoff between standard and adversarial risk is manifested in all three settings. We further characterize the Pareto-optimal tradeoff curves and discuss how a variety of factors, such as feature correlation, the adversary's power, or the width of the two-layer neural network, would affect this tradeoff.

Sampling method · Variance · Graphics processing unit · INFORMS · Generalization theory

June 24, 2020

Minimal Variance Sampling with Provable Guarantees for Fast Training of Graph Neural Networks

Weilin Cong,Rana Forsati,Mahmut Kandemir,Mehrdad Mahdavi

Sampling methods (e.g., node-wise, layer-wise, or subgraph) have become an indispensable strategy to speed up training large-scale Graph Neural Networks (GNNs). However, existing sampling methods are mostly based on the graph structural information and ignore the dynamicity of optimization, which leads to high variance in estimating the stochastic gradients. The high variance issue can be very pronounced in extremely large graphs, where it results in slow convergence and poor generalization. In this paper, we theoretically analyze the variance of sampling methods and show that, due to the composite structure of empirical risk, the variance of any sampling method can be decomposed into \textit{embedding approximation variance} in the forward stage and \textit{stochastic gradient variance} in the backward stage, which necessitates mitigating both types of variance to obtain a faster convergence rate. We propose a decoupled variance reduction strategy that employs (approximate) gradient information to adaptively sample nodes with minimal variance, and explicitly reduces the variance introduced by embedding approximation. We show theoretically and empirically that the proposed method, even with smaller mini-batch sizes, enjoys a faster convergence rate and entails better generalization compared to the existing methods.
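The role of gradient information in reducing sampling variance can be illustrated with a standard importance-sampling fact, independent of any graph structure. If index $i$ is drawn with probability $p_i$ and the mean gradient $\frac{1}{n}\sum_i g_i$ is estimated by $g_i/(n p_i)$, the estimator is unbiased and its second moment $\sum_i \|g_i\|^2/(n^2 p_i)$ is minimized by $p_i \propto \|g_i\|$ (by the Cauchy-Schwarz inequality). The sketch below computes this quantity exactly for uniform and norm-proportional sampling. It is not the decoupled strategy of the paper, and the function names are hypothetical.

```python
import math


def second_moment(grads, probs):
    """Exact E||g_i / (n p_i)||^2 when index i is drawn with probability probs[i].

    Terms with probs[i] == 0 are skipped; this is only valid when the
    corresponding gradient is zero, otherwise the estimator would be biased.
    """
    n = len(grads)
    return sum(
        sum(x * x for x in g) / (n * n * p)
        for g, p in zip(grads, probs)
        if p > 0
    )


def norm_proportional_probs(grads):
    """Sampling probabilities proportional to the Euclidean norm of each gradient."""
    norms = [math.sqrt(sum(x * x for x in g)) for g in grads]
    total = sum(norms)
    return [r / total for r in norms]


if __name__ == "__main__":
    # Made-up per-example gradients with norms 1, 2, 3 and 10.
    grads = [[1.0, 0.0], [0.0, 2.0], [3.0, 0.0], [6.0, 8.0]]
    uniform = [1.0 / len(grads)] * len(grads)
    print(second_moment(grads, uniform))
    print(second_moment(grads, norm_proportional_probs(grads)))
```

Both sampling schemes give the same mean, so comparing second moments is equivalent to comparing variances. In practice the exact norms are unknown before computing the gradients, which is why approximate gradient information is typically used.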

Optimizer · Lipschitz continuity · Regularization term · Continuity · Lipschitz

June 1, 2018

Optimal Algorithms for Non-Smooth Distributed Optimization in Networks

Kevin Scaman,Francis Bach,Sébastien Bubeck,Yin Tat Lee,Laurent Massoulié

from arxiv, 17 pages

In this work, we consider the distributed optimization of non-smooth convex functions using a network of computing units. We investigate this problem under two regularity assumptions: (1) the Lipschitz continuity of the global objective function, and (2) the Lipschitz continuity of local individual functions. Under the local regularity assumption, we provide the first optimal first-order decentralized algorithm called multi-step primal-dual (MSPD) and its corresponding optimal convergence rate. A notable aspect of this result is that, for non-smooth functions, while the dominant term of the error is in $O(1/\sqrt{t})$, the structure of the communication network only impacts a second-order term in $O(1/t)$, where $t$ is time. In other words, the error due to limits in communication resources decreases at a fast rate even in the case of non-strongly-convex objective functions. Under the global regularity assumption, we provide a simple yet efficient algorithm called distributed randomized smoothing (DRS) based on a local smoothing of the objective function, and show that DRS is within a $d^{1/4}$ multiplicative factor of the optimal convergence rate, where $d$ is the underlying dimension.