
Core decomposition is a classic technique for discovering densely connected regions in a graph, with a wide range of applications. Formally, a $k$-core is a maximal subgraph where each vertex has at least $k$ neighbors. A natural extension of a $k$-core is a $(k, h)$-core, where each node must have at least $k$ nodes that can be reached with a path of length $h$. The downside of using $(k, h)$-core decomposition is the significant increase in computational complexity: whereas the standard core decomposition can be done in $O(m)$ time, the generalization can require $O(n^2m)$ time, where $n$ and $m$ are the number of nodes and edges in the given graph. In this paper we propose a randomized algorithm that produces an $\epsilon$-approximation of the $(k, h)$-core decomposition with a probability of $1 - \delta$ in $O(\epsilon^{-2} hm (\log^2 n - \log \delta))$ time. The approximation is based on sampling the neighborhoods of nodes, and we use the Chernoff bound to prove the approximation guarantee. We demonstrate empirically that approximating the decomposition complements the exact computation: computing the approximation is significantly faster than computing the exact solution for the networks where computing the exact solution is slow.
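To make the object being approximated concrete, the following is a minimal sketch of a naive exact $(k, h)$-core decomposition by peeling. It is not the randomized algorithm of the paper. It assumes that "reachable with a path of length $h$" means reachable within at most $h$ hops, and the function names `h_degree` and `kh_core_numbers` are hypothetical.

```python
from collections import deque


def h_degree(adj, alive, v, h):
    """Count alive nodes other than v that are within h hops of v (assumed definition)."""
    seen = {v}
    queue = deque([(v, 0)])
    while queue:
        u, d = queue.popleft()
        if d == h:
            continue  # do not expand beyond h hops
        for w in adj[u]:
            if w in alive and w not in seen:
                seen.add(w)
                queue.append((w, d + 1))
    return len(seen) - 1


def kh_core_numbers(adj, h):
    """Naive peeling: repeatedly remove a node of minimum h-degree.

    adj maps each node to an iterable of its neighbors (undirected graph).
    Every h-degree is recomputed after each removal, so this is slow and
    only meant to illustrate the definition.
    """
    alive = set(adj)
    core = {}
    k = 0
    while alive:
        degs = {v: h_degree(adj, alive, v, h) for v in alive}
        v = min(degs, key=degs.get)
        k = max(k, degs[v])  # core numbers never decrease during peeling
        core[v] = k
        alive.remove(v)
    return core


# Triangle 0-1-2 with a pendant node 3 attached to node 2.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
print(kh_core_numbers(graph, 1))  # h = 1 gives the standard core numbers
print(kh_core_numbers(graph, 2))
```

With $h = 1$ the sketch reduces to the standard core decomposition. The repeated breadth-first searches are what make the exact computation expensive for larger $h$.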

Related content

Center-based clustering is a pivotal primitive for unsupervised learning and data analysis. A popular variant is the k-means problem, which, given a set $P$ of points from a metric space and a parameter $k<|P|$, requires determining a subset $S$ of $k$ centers minimizing the sum of all squared distances of points in $P$ from their closest center. A more general formulation, known as k-means with $z$ outliers, introduced to deal with noisy datasets, features a further parameter $z$ and allows up to $z$ points of $P$ (outliers) to be disregarded when computing the aforementioned sum. We present a distributed coreset-based 3-round approximation algorithm for k-means with $z$ outliers for general metric spaces, using MapReduce as a computational model. Our distributed algorithm requires sublinear local memory per reducer, and yields a solution whose approximation ratio is an additive term $O(\gamma)$ away from the one achievable by the best known sequential (possibly bicriteria) algorithm, where $\gamma$ can be made arbitrarily small. An important feature of our algorithm is that it obliviously adapts to the intrinsic complexity of the dataset, captured by the doubling dimension $D$ of the metric space. To the best of our knowledge, no previous distributed approaches were able to attain similar quality-performance tradeoffs for general metrics.
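The objective of k-means with $z$ outliers can be written down in a few lines. The sketch below only evaluates the cost of a given set of centers; it is not the distributed coreset algorithm. The function name `kmeans_outlier_cost` and the default Euclidean metric are assumptions made for illustration.

```python
import math


def kmeans_outlier_cost(points, centers, z, dist=math.dist):
    """Sum of squared distances to the closest center, ignoring the z farthest points.

    dist can be any metric on the points; Euclidean distance is only a default.
    """
    if not 0 <= z <= len(points):
        raise ValueError("z must be between 0 and the number of points")
    squared = sorted(min(dist(p, c) ** 2 for c in centers) for p in points)
    # The optimal choice of outliers for fixed centers is the z largest terms.
    return sum(squared[: len(squared) - z])


points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]
centers = [(0.0, 0.0)]
print(kmeans_outlier_cost(points, centers, z=0))  # plain k-means cost
print(kmeans_outlier_cost(points, centers, z=1))  # the far point is discarded
```

For fixed centers, discarding the $z$ points with the largest squared distances is always the best choice of outliers, which is why a sort suffices here.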

We describe how to approximate the intractable marginal likelihood that arises when fitting generalized linear mixed models. We prove that non-adaptive quadrature approximations yield high error asymptotically in every statistical model satisfying weak regularity conditions. We derive the rate of error incurred when using adaptive quadrature to approximate the marginal likelihood in a broad class of generalized linear mixed models, which includes non-exponential family response and non-Gaussian random effects distributions. We provide an explicit recommendation for how many quadrature points to use, and show that this recommendation recovers and explains many empirical results from published simulation studies and data analyses. Particular attention is paid to models for dependent binary and survival/time-to-event observations. Code to reproduce results in the manuscript is available at https://github.com/awstringer1/glmm-aq-paper-code.

We are interested in computing the treewidth $\tw(G)$ of a given graph $G$. Our approach is to design heuristic algorithms for computing a sequence of improving upper bounds and a sequence of improving lower bounds, which would hopefully converge to $\tw(G)$ from both sides. The upper bound algorithm extends and simplifies Tamaki's unpublished work on a heuristic use of the dynamic programming algorithm for deciding treewidth due to Bouchitt\'{e} and Todinca. The lower bound algorithm is based on the well-known fact that, for every minor $H$ of $G$, we have $\tw(H) \leq \tw(G)$. Starting from a greedily computed minor $H_0$ of $G$, the algorithm tries to construct a sequence of minors $H_0$, $H_1$, \ldots, $H_k$ with $\tw(H_i) < \tw(H_{i + 1})$ for $0 \leq i < k$ and hopefully $\tw(H_k) = \tw(G)$. We have implemented a treewidth solver based on this approach and have evaluated it on the bonus instances from the exact treewidth track of the PACE 2017 algorithm implementation challenge. The results show that our approach is extremely effective in tackling instances that are hard for conventional solvers. Our solver has an additional advantage over conventional ones in that it attaches a compact certificate to the lower bound it computes.

We develop a general framework for statistical inference with the 1-Wasserstein distance. Recently, the Wasserstein distance has attracted considerable attention and has been widely applied to various machine learning tasks because of its excellent properties. However, hypothesis tests and a confidence analysis for the Wasserstein distance have not been established in a general multivariate setting. This is because the limit distribution of the empirical distribution with the Wasserstein distance is unavailable without strong restrictions. To address this problem, in this study, we develop a novel non-asymptotic Gaussian approximation for the empirical 1-Wasserstein distance. Using the approximation method, we develop a hypothesis test and confidence analysis for the empirical 1-Wasserstein distance. Additionally, we provide a theoretical guarantee and an efficient algorithm for the proposed approximation. Our experiments validate its performance numerically.
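As a point of reference for the quantity under study, here is a small sketch of the empirical 1-Wasserstein distance in the simplest special case: two one-dimensional samples of equal size, where the optimal coupling matches sorted values. The multivariate setting of the abstract requires solving an optimal transport problem and is not covered; the function name `wasserstein1_1d` is hypothetical.

```python
def wasserstein1_1d(xs, ys):
    """Empirical 1-Wasserstein distance between two equal-size 1-D samples.

    In one dimension with uniform weights, the optimal transport plan pairs
    the i-th smallest value of xs with the i-th smallest value of ys.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("samples must be non-empty and of equal size")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


print(wasserstein1_1d([0.0, 1.0, 2.0], [3.0, 1.0, 2.0]))  # a shift by one unit
```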

Approximating distributions from their samples is a canonical statistical-learning problem. One of its most powerful and successful modalities approximates every distribution to an $\ell_1$ distance essentially at most a constant times larger than its closest $t$-piece degree-$d$ polynomial, where $t\ge1$ and $d\ge0$. Letting $c_{t,d}$ denote the smallest such factor, clearly $c_{1,0}=1$, and it can be shown that $c_{t,d}\ge 2$ for all other $t$ and $d$. Yet current computationally efficient algorithms show only $c_{t,1}\le 2.25$, and the bound rises quickly to $c_{t,d}\le 3$ for $d\ge 9$. We derive a near-linear-time and essentially sample-optimal estimator that establishes $c_{t,d}=2$ for all $(t,d)\ne(1,0)$. Additionally, for many practical distributions, the lowest approximation distance is achieved by polynomials with a vastly varying number of pieces. We provide a method that estimates this number near-optimally, and hence helps approach the best possible approximation. Experiments combining the two techniques confirm improved performance over existing methodologies.

We study the generalized load-balancing (GLB) problem, where we are given $n$ jobs, each of which needs to be assigned to one of $m$ unrelated machines with processing times $\{p_{ij}\}$. Under a job assignment $\sigma$, the load of each machine $i$ is $\psi_i(\mathbf{p}_{i}[\sigma])$ where $\psi_i:\mathbb{R}^n\rightarrow\mathbb{R}_{\geq0}$ is a symmetric monotone norm and $\mathbf{p}_{i}[\sigma]$ is the $n$-dimensional vector $\{p_{ij}\cdot \mathbf{1}[\sigma(j)=i]\}_{j\in [n]}$. Our goal is to minimize the generalized makespan $\phi(\mathsf{load}(\sigma))$, where $\phi:\mathbb{R}^m\rightarrow\mathbb{R}_{\geq0}$ is another symmetric monotone norm and $\mathsf{load}(\sigma)$ is the $m$-dimensional machine load vector. This problem significantly generalizes many classic optimization problems, e.g., makespan minimization, set cover, and minimum-norm load-balancing. We obtain a polynomial-time randomized algorithm that achieves an approximation factor of $O(\log n)$, matching the lower bound of set cover up to a constant factor. We achieve this by rounding a novel configuration LP relaxation with an exponential number of variables. To approximately solve the configuration LP, we design an approximate separation oracle for its dual program. In particular, the separation oracle can be reduced to the norm minimization with a linear constraint (NormLin) problem, and we devise a polynomial-time approximation scheme (PTAS) for it, which may be of independent interest.
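The GLB objective can be evaluated directly from its definition. The sketch below computes $\phi(\mathsf{load}(\sigma))$ for a given assignment; it does not implement the LP rounding. The names `generalized_makespan`, `l1`, and `linf` are hypothetical, and the choice of norms in the usage lines is only one example (it recovers classic makespan).

```python
def l1(vec):
    """Sum norm, a symmetric monotone norm."""
    return sum(abs(x) for x in vec)


def linf(vec):
    """Max norm, a symmetric monotone norm."""
    return max(abs(x) for x in vec)


def generalized_makespan(p, sigma, inner_norms, outer_norm):
    """Evaluate phi(load(sigma)).

    p[i][j] is the processing time of job j on machine i, sigma[j] is the
    machine assigned to job j, inner_norms[i] plays the role of psi_i, and
    outer_norm plays the role of phi.
    """
    n = len(sigma)
    loads = []
    for i, row in enumerate(p):
        # p_i[sigma]: keep p_ij for jobs assigned to machine i, zero elsewhere.
        vec = [row[j] if sigma[j] == i else 0.0 for j in range(n)]
        loads.append(inner_norms[i](vec))
    return outer_norm(loads)


p = [[2.0, 3.0, 4.0], [1.0, 5.0, 2.0]]
sigma = [0, 1, 0]  # jobs 0 and 2 on machine 0, job 1 on machine 1
print(generalized_makespan(p, sigma, [l1, l1], linf))  # classic makespan
print(generalized_makespan(p, sigma, [l1, l1], l1))    # total processing time
```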

Given a directed graph $G$ and integers $k$ and $l$, a D-core is the maximal subgraph $H \subseteq G$ such that for every vertex of $H$, its in-degree and out-degree are no smaller than $k$ and $l$, respectively. For a directed graph $G$, the problem of D-core decomposition aims to compute the non-empty D-cores for all possible values of $k$ and $l$. In the literature, several \emph{peeling-based} algorithms have been proposed to handle D-core decomposition. However, the peeling-based algorithms that work in a sequential fashion and require global graph information during processing are mainly designed for \emph{centralized} settings, which cannot handle large-scale graphs efficiently in distributed settings. Motivated by this, we study the \emph{distributed} D-core decomposition problem in this paper. We start by defining a concept called \emph{anchored coreness}, based on which we propose a new H-index-based algorithm for distributed D-core decomposition. Furthermore, we devise a novel concept, namely \emph{skyline coreness}, and show that the D-core decomposition problem is equivalent to the computation of skyline corenesses for all vertices. We design an efficient D-index to compute the skyline corenesses distributedly. We implement the proposed algorithms under both vertex-centric and block-centric distributed graph processing frameworks. Moreover, we theoretically analyze the algorithm and message complexities. Extensive experiments on large real-world graphs with billions of edges demonstrate the efficiency of the proposed algorithms in terms of both the running time and communication overhead.
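For orientation, a single D-core for fixed $k$ and $l$ can be computed with the centralized peeling idea mentioned above. The sketch below is that baseline only, not the distributed H-index-based or D-index algorithms. It assumes a simple directed graph given as a list of distinct `(u, v)` edges, and the function name `d_core` is hypothetical.

```python
def d_core(edges, k, l):
    """Return the vertex set of the (k, l)-D-core by repeated peeling.

    edges is a list of distinct directed edges (u, v) meaning u -> v.
    Isolated vertices are not represented in this input format.
    """
    nodes = {u for edge in edges for u in edge}
    while True:
        indeg = {v: 0 for v in nodes}
        outdeg = {v: 0 for v in nodes}
        for u, v in edges:
            if u in nodes and v in nodes:
                outdeg[u] += 1
                indeg[v] += 1
        # Vertices violating either degree constraint cannot be in the D-core.
        drop = {v for v in nodes if indeg[v] < k or outdeg[v] < l}
        if not drop:
            return nodes
        nodes -= drop


# Directed cycle 0 -> 1 -> 2 -> 0 plus an extra edge 2 -> 3.
edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
print(d_core(edges, 1, 1))
print(d_core(edges, 2, 1))
```

Each round needs the degrees of every remaining vertex, which illustrates the dependence on global graph information that the abstract identifies as an obstacle in distributed settings.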

We give an efficient perfect sampling algorithm for weighted, connected induced subgraphs (or graphlets) of rooted, bounded degree graphs under a vertex-percolation subcriticality condition. We show that this subcriticality condition is optimal in the sense that the problem of (approximately) sampling weighted rooted graphlets becomes impossible for infinite graphs and intractable for finite graphs if the condition does not hold. We apply our rooted graphlet sampling algorithm as a subroutine to give a fast perfect sampling algorithm for polymer models and a fast perfect sampling algorithm for weighted non-rooted graphlets in finite graphs, two widely studied yet very different problems. We apply this polymer model algorithm to give improved sampling algorithms for spin systems at low temperatures on expander graphs and other structured families of graphs: under the least restrictive conditions known we give near linear-time algorithms, while previous algorithms in these regimes required large polynomial running times.

In this paper, we propose GT-GDA, a distributed optimization method to solve saddle point problems of the form: $\min_{\mathbf{x}} \max_{\mathbf{y}} \{F(\mathbf{x},\mathbf{y}) :=G(\mathbf{x}) + \langle \mathbf{y}, \overline{P} \mathbf{x} \rangle - H(\mathbf{y})\}$, where the functions $G(\cdot)$, $H(\cdot)$, and the coupling matrix $\overline{P}$ are distributed over a strongly connected network of nodes. GT-GDA is a first-order method that uses gradient tracking to eliminate the dissimilarity caused by heterogeneous data distribution among the nodes. In the most general form, GT-GDA includes a consensus over the local coupling matrices to achieve the optimal (unique) saddle point, however, at the expense of increased communication. To avoid this, we propose a more efficient variant GT-GDA-Lite that does not incur the additional communication and analyze its convergence in various scenarios. We show that GT-GDA converges linearly to the unique saddle point solution when $G(\cdot)$ is smooth and convex, $H(\cdot)$ is smooth and strongly convex, and the global coupling matrix $\overline{P}$ has full column rank. We further characterize the regime under which GT-GDA exhibits a network topology-independent convergence behavior. We next show the linear convergence of GT-GDA to an error around the unique saddle point, which goes to zero when the coupling cost ${\langle \mathbf y, \overline{P} \mathbf x \rangle}$ is common to all nodes, or when $G(\cdot)$ and $H(\cdot)$ are quadratic. Numerical experiments illustrate the convergence properties and importance of GT-GDA and GT-GDA-Lite for several applications.
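The structure of the saddle point problem can be illustrated with plain, single-node gradient descent ascent on a scalar quadratic instance. This is not GT-GDA: there is no network, no gradient tracking, and no consensus step. The instance $G(x) = \tfrac{a}{2}x^2$, $H(y) = \tfrac{b}{2}y^2$, $\overline{P} = p$ and the function name `gda` are assumptions chosen so that the unique saddle point is $(0, 0)$.

```python
def gda(a, b, p, x0, y0, step, iters):
    """Gradient descent ascent on F(x, y) = (a/2) x^2 + p x y - (b/2) y^2.

    Descends in x and ascends in y using simultaneous updates.
    """
    x, y = x0, y0
    for _ in range(iters):
        grad_x = a * x + p * y  # dF/dx
        grad_y = p * x - b * y  # dF/dy
        x, y = x - step * grad_x, y + step * grad_y
    return x, y


# G strongly convex (a = 1), H strongly convex (b = 1), nonzero coupling.
print(gda(1.0, 1.0, 1.0, x0=1.0, y0=-2.0, step=0.1, iters=500))
# G merely convex (a = 0): the coupling term still drives x to zero.
print(gda(0.0, 1.0, 1.0, x0=1.0, y0=-2.0, step=0.1, iters=1000))
```

The second call mirrors the setting in the abstract where $G$ is convex but not strongly convex and the coupling has full column rank; in this scalar case that simply means $p \neq 0$.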

The Benjamini-Hochberg (BH) procedure is a celebrated method for multiple testing with false discovery rate (FDR) control. In this paper, we consider large-scale distributed networks where each node possesses a large number of p-values and the goal is to achieve the global BH performance in a communication-efficient manner. We propose that every node performs a local test with an adjusted test size according to the (estimated) global proportion of true null hypotheses. With suitable assumptions, our method is asymptotically equivalent to the global BH procedure. Motivated by this, we develop an algorithm for star networks where each node only needs to transmit an estimate of the (local) proportion of nulls and the (local) number of p-values to the center node; the center node then broadcasts a parameter (computed based on the global estimate and test size) to the local nodes. In the experiment section, we utilize existing estimators of the proportion of true nulls and consider various settings to evaluate the performance and robustness of our method.
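For reference, the global BH procedure that the distributed method aims to match can be sketched as follows. This is the standard centralized step-up rule, not the communication-efficient algorithm of the paper; the function name `benjamini_hochberg` is hypothetical.

```python
def benjamini_hochberg(pvalues, alpha):
    """Return the sorted indices of hypotheses rejected by the BH step-up rule.

    With m p-values sorted increasingly as p_(1) <= ... <= p_(m), find the
    largest rank r with p_(r) <= alpha * r / m and reject the r smallest.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= alpha * rank / m:
            cutoff = rank  # keep the largest rank that passes
    return sorted(order[:cutoff])


pvals = [0.035, 0.001, 0.9, 0.028, 0.025]
print(benjamini_hochberg(pvals, alpha=0.05))
```

In this example the p-value 0.025 exceeds its own threshold at rank 2 but is still rejected because a larger rank passes, which is the step-up behavior that distinguishes BH from a simple per-test cutoff.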
