
We initiate the study of coresets for clustering in graph metrics, i.e., the shortest-path metric of edge-weighted graphs. Such clustering problems are essential to data analysis and are used, for example, in road networks and data visualization. A coreset is a compact summary of the data that approximately preserves the clustering objective for every possible center set, and it offers significant efficiency improvements in running time, storage, and communication, including in streaming and distributed settings. Our main result is a near-linear time construction of a coreset for k-Median in a general graph $G$, with size $O_{\epsilon, k}(\mathrm{tw}(G))$, where $\mathrm{tw}(G)$ is the treewidth of $G$; we complement the construction with a nearly tight size lower bound. The construction is based on the framework of Feldman and Langberg [STOC 2011], and our main technical contribution, as required by this framework, is a uniform bound of $O(\mathrm{tw}(G))$ on the shattering dimension under any point weights. We validate our coreset on real-world road networks; our scalable algorithm constructs tiny coresets with high accuracy, which translates into a massive speedup of existing approximation algorithms such as local search for graph k-Median.
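To make the coreset guarantee concrete, the sketch below builds a weighted subset by sensitivity-style importance sampling for k-Median on an explicit metric and compares its cost to the full cost for one candidate center set. It is not the paper's treewidth-based construction; the rough seeding, the sensitivity upper bound, and the sample size m are illustrative assumptions.

```python
import numpy as np

def kmedian_cost(dist, points, weights, centers):
    """Weighted k-Median cost: sum_p w(p) * min_{c in centers} dist(p, c)."""
    return float(np.sum(weights * dist[np.ix_(points, centers)].min(axis=1)))

def sensitivity_coreset(dist, k, m, rng=np.random.default_rng(0)):
    """Toy sensitivity-sampling coreset for k-Median on an explicit metric given
    as an n x n distance matrix `dist` (NOT the paper's treewidth-based
    construction; seeding and constants are illustrative only)."""
    n = dist.shape[0]
    centers = rng.choice(n, size=k, replace=False)       # rough initial solution
    d_near = dist[:, centers].min(axis=1)
    assign = dist[:, centers].argmin(axis=1)
    cluster_size = np.bincount(assign, minlength=k)
    # Classic sensitivity upper bound: cost share plus 1/|cluster|.
    sens = d_near / max(d_near.sum(), 1e-12) + 1.0 / cluster_size[assign]
    prob = sens / sens.sum()
    sample = rng.choice(n, size=m, replace=True, p=prob)
    weights = 1.0 / (m * prob[sample])                    # unbiased reweighting
    return sample, weights

# Tiny demo on a random metric (shortest-path distances of a graph work the same way).
pts = np.random.default_rng(1).random((200, 2))
dist = np.linalg.norm(pts[:, None] - pts[None, :], axis=2)
S, w = sensitivity_coreset(dist, k=3, m=40)
C = [5, 50, 150]  # an arbitrary candidate center set
full = kmedian_cost(dist, np.arange(200), np.ones(200), C)
core = kmedian_cost(dist, S, w, C)
print(full, core)  # the coreset cost should approximate the full cost
```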

Related Content

Recent progress in information retrieval finds that embedding the query and document representations into multiple vectors yields a robust bi-encoder retriever on out-of-distribution datasets. In this paper, we explore whether late interaction, the simplest form of multi-vector interaction, is also helpful to neural rerankers that only use the [CLS] vector to compute the similarity score. Although, intuitively, the attention mechanism in the rerankers' earlier layers already gathers token-level information, we find that adding late interaction still brings an extra 5% improvement on average on out-of-distribution datasets, with little increase in latency and no degradation in in-domain effectiveness. Through extensive experiments and analysis, we show that the finding is consistent across different model sizes and first-stage retrievers of diverse natures, and that the improvement is more prominent on longer queries.
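As a minimal illustration of what late interaction adds on top of a [CLS] score, the sketch below computes a ColBERT-style MaxSim term over token embeddings and mixes it with the [CLS] similarity; the mixing weight alpha and the tensor shapes are assumptions for the example, not the paper's exact scoring head.

```python
import numpy as np

def late_interaction_score(q_tok, d_tok, cls_score, alpha=0.5):
    """Combine a [CLS]-based reranker score with a ColBERT-style MaxSim term.

    q_tok:  (Lq, h) query token embeddings from the reranker's last layer
    d_tok:  (Ld, h) document token embeddings
    cls_score: scalar similarity produced from the [CLS] vector
    alpha:  mixing weight (an illustrative hyperparameter, not from the paper)
    """
    # Normalize token embeddings so MaxSim uses cosine similarity.
    q = q_tok / np.linalg.norm(q_tok, axis=1, keepdims=True)
    d = d_tok / np.linalg.norm(d_tok, axis=1, keepdims=True)
    sim = q @ d.T                      # (Lq, Ld) token-level similarities
    maxsim = sim.max(axis=1).sum()     # best-matching document token per query token
    return cls_score + alpha * maxsim

rng = np.random.default_rng(0)
print(late_interaction_score(rng.normal(size=(8, 64)),
                             rng.normal(size=(120, 64)),
                             cls_score=1.3))
```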

We study the space complexity of the two related fields of differential privacy and adaptive data analysis. Specifically, (1) Under standard cryptographic assumptions, we show that there exists a problem P that requires exponentially more space to be solved efficiently with differential privacy, compared to the space needed without privacy. To the best of our knowledge, this is the first separation between the space complexity of private and non-private algorithms. (2) The line of work on adaptive data analysis focuses on understanding the number of samples needed for answering a sequence of adaptive queries. We revisit previous lower bounds at a foundational level, and show that they are a consequence of a space bottleneck rather than a sampling bottleneck. To obtain our results, we define and construct an encryption scheme with multiple keys that is built to withstand a limited amount of key leakage in a very particular way.

For a set system $(V,{\mathcal C}\subseteq 2^V)$, we call a subset $C\in{\mathcal C}$ a component. A nonempty subset $Y\subseteq C$ is a minimal removable set (MRS) of $C$ if $C\setminus Y\in{\mathcal C}$ and no proper nonempty subset $Z\subsetneq Y$ satisfies $C\setminus Z\in{\mathcal C}$. In this paper, we consider the problem of enumerating all components in a set system such that, for every two components $C,C'\in{\mathcal C}$ with $C'\subsetneq C$, every MRS $X$ of $C$ satisfies either $X\subseteq C'$ or $X\cap C'=\emptyset$. We provide a partition-based algorithm for this problem, which yields the first linear-delay algorithms for enumerating all 2-edge-connected induced subgraphs and all 2-vertex-connected induced subgraphs.
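A small brute-force checker can make the MRS definition concrete for a tiny, explicitly listed set system; it only illustrates the definition and is unrelated to the paper's partition-based, linear-delay enumeration algorithm.

```python
from itertools import combinations

def minimal_removable_sets(C, components):
    """Brute-force MRS computation for a tiny, explicitly listed set system.
    `components` is a set of frozensets; C should be one of them.
    (Illustrates the definition only; not the paper's partition-based algorithm.)"""
    C = frozenset(C)
    removable = [frozenset(Y)
                 for r in range(1, len(C) + 1)
                 for Y in combinations(C, r)
                 if frozenset(C - set(Y)) in components]
    # Keep only Y with no proper nonempty removable subset Z.
    return [Y for Y in removable
            if not any(Z < Y for Z in removable if Z)]

# Example: components are the subsets of {1, 2, 3} of size 0 or 2, plus {1, 2, 3} itself.
comps = {frozenset(s) for r in (0, 2) for s in combinations({1, 2, 3}, r)}
comps.add(frozenset({1, 2, 3}))
print(minimal_removable_sets({1, 2, 3}, comps))  # the singletons {1}, {2}, {3}
```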

High-order fully implicit Runge-Kutta methods are of significant importance for the numerical solution of transient partial differential equations, in particular for large-scale problems with fine spatial resolution (many millions of spatial degrees of freedom) and long time intervals. In this study we consider strongly A-stable implicit Runge-Kutta methods of arbitrary order of accuracy, based on Radau quadratures, for which efficient preconditioners have been introduced. A refined spectral analysis of the corresponding matrices and matrix sequences is presented, both in terms of localization and asymptotic global distribution of the eigenvalues. Specific expressions for the eigenvectors are also obtained. The analysis fully agrees with the numerically observed spectral behavior and substantially improves on the theoretical studies carried out in this direction so far. Concluding remarks and open problems close the work, with specific attention to potential generalizations of the suggested approach.
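For readers who want to experiment with the method class itself, a stiff linear test problem can be integrated with an off-the-shelf Radau-based implicit solver, as in the sketch below; the spectral analysis of the preconditioned stage systems studied in the paper is not exposed by such a call.

```python
import numpy as np
from scipy.integrate import solve_ivp

# A stiff linear test problem y' = A y, where implicit Radau-type methods shine.
# (Illustrates the method class only; the paper studies preconditioners and the
# spectra of the resulting stage systems, which scipy does not expose.)
A = np.array([[-1000.0, 1.0],
              [0.0, -0.5]])
sol = solve_ivp(lambda t, y: A @ y, (0.0, 10.0), [1.0, 1.0],
                method="Radau", rtol=1e-8, atol=1e-10)
print(sol.t[-1], sol.y[:, -1])  # compare with scipy.linalg.expm(10 * A) @ [1, 1]
```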

This work introduces a highly scalable spectral graph densification framework (SGL) for learning resistor networks from linear measurements, such as node voltages and currents. We show that the proposed graph learning approach is equivalent to solving the classical graphical Lasso problem with Laplacian-like precision matrices. We prove that, given $O(\log N)$ pairs of voltage and current measurements, it is possible to recover sparse $N$-node resistor networks that well preserve the effective resistance distances of the original graph. In addition, the learned graphs preserve the structural (spectral) properties of the original graph, which can potentially be leveraged in many circuit design and optimization tasks. To achieve more scalable performance, we also introduce a solver-free method (SF-SGL) that exploits a multilevel spectral approximation of the graphs and allows for a scalable and flexible decomposition of the entire graph spectrum (to be learned) into multiple eigenvalue clusters (frequency bands). This solver-free approach allows us to more efficiently identify the most spectrally critical edges for reducing various ranges of spectral embedding distortions. Through extensive experiments on a variety of real-world test cases, we show that the proposed approach is highly scalable for learning sparse resistor networks without sacrificing solution quality. We also introduce a data-driven EDA algorithm for vectorless power/thermal integrity verification that estimates worst-case voltage/temperature (gradient) distributions across the entire chip from a few voltage/temperature measurements.
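As background for the resistance-preservation guarantee, the sketch below computes effective resistance distances directly from a Laplacian pseudoinverse on a toy resistor network; the scalable, solver-free learning procedure of SGL/SF-SGL is not reproduced here.

```python
import numpy as np

def effective_resistances(L):
    """Pairwise effective resistances of a resistor network with Laplacian L.
    R(u, v) = (e_u - e_v)^T L^+ (e_u - e_v); a dense pseudoinverse is used here
    for clarity (the paper targets scalable, solver-free spectral approximations)."""
    Lp = np.linalg.pinv(L)
    d = np.diag(Lp)
    return d[:, None] + d[None, :] - 2 * Lp

# Toy resistor network: a 4-cycle with unit conductances.
n = 4
Adj = np.zeros((n, n))
for u, v in [(0, 1), (1, 2), (2, 3), (3, 0)]:
    Adj[u, v] = Adj[v, u] = 1.0
L = np.diag(Adj.sum(axis=1)) - Adj
R = effective_resistances(L)
print(R[0, 2])  # two parallel 2-ohm paths -> 1.0
```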

The $N$th power of a polynomial matrix of fixed size and degree can be computed by binary powering as fast as multiplying two polynomials of degree linear in $N$. When the Fast Fourier Transform (FFT) is available, the resulting arithmetic complexity is \emph{softly linear} in $N$, i.e., linear in $N$ up to logarithmic factors. We show that it is possible to beat binary powering with an algorithm whose complexity is \emph{purely linear} in $N$, even in the absence of FFT. The key result making this improvement possible is that the entries of the $N$th power of a polynomial matrix satisfy linear differential equations with polynomial coefficients whose orders and degrees are independent of $N$. Similar algorithms are proposed for two related problems: computing the $N$th term of a C-recursive sequence of polynomials, and modular exponentiation to the power $N$ for bivariate polynomials.
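The baseline being improved upon, binary powering of a polynomial matrix, is easy to state in code; the sketch below uses plain coefficient lists and performs $O(\log N)$ polynomial-matrix products. The purely linear-time algorithm based on the differential equations satisfied by the entries is not reproduced here.

```python
from functools import reduce

def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists (lowest degree first)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def poly_add(a, b):
    return [(a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0)
            for i in range(max(len(a), len(b)))]

def mat_mul(A, B):
    """Product of two matrices whose entries are polynomials."""
    return [[reduce(poly_add, (poly_mul(A[i][k], B[k][j]) for k in range(len(B))))
             for j in range(len(B[0]))] for i in range(len(A))]

def mat_pow(A, N):
    """Binary powering: O(log N) polynomial-matrix products, with entry degrees O(N)."""
    result = [[[1] if i == j else [0] for j in range(len(A))] for i in range(len(A))]
    while N:
        if N & 1:
            result = mat_mul(result, A)
        A = mat_mul(A, A)
        N >>= 1
    return result

# A = [[x, 1], [0, 1]]: the (0, 1) entry of A^N is 1 + x + ... + x^(N-1).
A = [[[0, 1], [1]], [[0], [1]]]
print(mat_pow(A, 5)[0][1])  # -> [1, 1, 1, 1, 1]
```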

Data in Knowledge Graphs often represents part of the current state of the real world. Thus, to stay up to date, the graph data needs to be updated frequently. To utilize information from Knowledge Graphs, many state-of-the-art machine learning approaches use embedding techniques. These techniques typically compute an embedding, i.e., vector representations of the nodes, which serve as input for the main machine learning algorithm. If a graph update occurs later on -- specifically when nodes are added or removed -- the training has to be done all over again. This is undesirable because of the time it takes, and also because downstream models that were trained with these embeddings have to be retrained if the embeddings change significantly. In this paper, we investigate embedding updates that do not require full retraining and evaluate them, in combination with various embedding models, on real dynamic Knowledge Graphs covering multiple use cases. We study approaches that place newly appearing nodes optimally according to local information, but find that this does not work well. However, we find that if we continue the training of the old embedding, interleaved with epochs during which we only optimize for the added and removed parts, we obtain good results in terms of the typical metrics used in link prediction. This performance is obtained much faster than with a complete retraining and hence makes it possible to maintain embeddings for dynamic Knowledge Graphs.
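A heavily simplified sketch of the interleaved schedule is given below: a TransE-style embedding is continued on the full triple set while extra epochs update only the newly added entity. The scoring function, learning rate, and epoch counts are illustrative assumptions, not the paper's exact training setup.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, LR = 32, 0.05

def transe_step(ent, rel, triples, train_entities=None):
    """One SGD epoch of a margin-less TransE-style objective 0.5 * ||h + r - t||^2.
    If `train_entities` is given, only those entity rows are updated and relations
    are frozen (used for the delta epochs; an illustrative simplification)."""
    for h, r, t in triples:
        diff = ent[h] + rel[r] - ent[t]       # gradient of 0.5 * ||h + r - t||^2
        if train_entities is None or h in train_entities:
            ent[h] -= LR * diff
        if train_entities is None or t in train_entities:
            ent[t] += LR * diff
        if train_entities is None:
            rel[r] -= LR * diff

# Old graph: entities 0..3; the update adds entity 4 with a few new triples.
ent = rng.normal(size=(5, DIM)); rel = rng.normal(size=(2, DIM))
old_triples = [(0, 0, 1), (1, 1, 2), (2, 0, 3)]
new_triples = [(3, 1, 4), (4, 0, 0)]

# Interleaved schedule: delta epochs on the changed part, then epochs on everything.
for _ in range(20):
    transe_step(ent, rel, new_triples, train_entities={4})   # place the new node
    transe_step(ent, rel, old_triples + new_triples)         # continue old training
```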

Spectral clustering (SC) is a popular clustering technique for finding strongly connected communities in a graph. SC can be used in Graph Neural Networks (GNNs) to implement pooling operations that aggregate nodes belonging to the same cluster. However, the eigendecomposition of the Laplacian is expensive and, since clustering results are graph-specific, pooling methods based on SC must perform a new optimization for each new sample. In this paper, we propose a graph clustering approach that addresses these limitations of SC. We formulate a continuous relaxation of the normalized minCUT problem and train a GNN to compute cluster assignments that minimize this objective. Our GNN-based implementation is differentiable, does not require computing the spectral decomposition, and learns a clustering function that can be quickly evaluated on out-of-sample graphs. From the proposed clustering method, we design a graph pooling operator that overcomes some important limitations of state-of-the-art graph pooling techniques and achieves the best performance in several supervised and unsupervised tasks.
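The continuous relaxation can be written down compactly: a soft cluster assignment matrix is scored by a normalized-cut term plus an orthogonality regularizer, as in the sketch below (a numpy version of the loss only; in the paper the assignments are produced by a GNN and the loss is minimized by gradient descent).

```python
import numpy as np

def mincut_losses(A, S):
    """Continuous relaxation of normalized minCUT used to train cluster assignments.
    A: (n, n) adjacency, S: (n, k) soft assignment (rows sum to 1, e.g. a softmax).
    Returns (cut_loss, orthogonality_loss); sketch of the objective only."""
    D = np.diag(A.sum(axis=1))
    cut = -np.trace(S.T @ A @ S) / np.trace(S.T @ D @ S)
    StS = S.T @ S
    k = S.shape[1]
    ortho = np.linalg.norm(StS / np.linalg.norm(StS) - np.eye(k) / np.sqrt(k))
    return cut, ortho

# Two 3-node cliques joined by one edge; a clean 2-way split gives a low cut loss.
A = np.zeros((6, 6))
for u, v in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[u, v] = A[v, u] = 1.0
S = np.array([[1, 0]] * 3 + [[0, 1]] * 3, dtype=float)
print(mincut_losses(A, S))  # cut loss close to -1, orthogonality loss 0
```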

Clustering is one of the most fundamental and widespread techniques in exploratory data analysis. Yet the basic approach to clustering has not really changed: a practitioner hand-picks a task-specific clustering loss to optimize and fits the given data to reveal the underlying cluster structure. Some types of losses---such as k-means, its non-linear version kernelized k-means (centroid based), and DBSCAN (density based)---are popular choices due to their good empirical performance on a range of applications. However, every so often the clustering output obtained with these standard losses fails to reveal the underlying structure, and the practitioner has to custom-design their own variation. In this work we take an intrinsically different approach to clustering: rather than fitting a dataset to a specific clustering loss, we train a recurrent model that learns how to cluster. The model uses as training pairs examples of datasets (as input) and their corresponding cluster identities (as output). By providing multiple types of training datasets as inputs, our model gains the ability to generalize well to unseen datasets (new clustering tasks). Our experiments reveal that by training on simple synthetically generated datasets, or on existing real datasets, we can achieve better clustering performance on unseen real-world datasets than standard benchmark clustering techniques. Our meta-clustering model works well even for small datasets, where the usual deep learning models tend to perform poorly.
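For intuition on the training data, the sketch below generates one synthetic (dataset, cluster-identity) pair of the kind such a meta-clustering model could be trained on; the recurrent model itself is not sketched, and the Gaussian-mixture choice is an assumption rather than the paper's exact data-generation protocol.

```python
import numpy as np

def synthetic_training_pair(n_points=100, n_clusters=3, dim=2,
                            rng=np.random.default_rng(0)):
    """One (dataset, cluster-identity) training pair: a synthetic Gaussian mixture
    serves as input and its ground-truth labels as the target output."""
    labels = rng.integers(n_clusters, size=n_points)
    centers = rng.normal(scale=5.0, size=(n_clusters, dim))
    points = centers[labels] + rng.normal(size=(n_points, dim))
    return points, labels

X, y = synthetic_training_pair()
print(X.shape, y[:10])
```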

Graph Convolutional Networks (GCNs) and their variants have experienced significant attention and have become the de facto methods for learning graph representations. GCNs derive inspiration primarily from recent deep learning approaches, and as a result, may inherit unnecessary complexity and redundant computation. In this paper, we reduce this excess complexity through successively removing nonlinearities and collapsing weight matrices between consecutive layers. We theoretically analyze the resulting linear model and show that it corresponds to a fixed low-pass filter followed by a linear classifier. Notably, our experimental evaluation demonstrates that these simplifications do not negatively impact accuracy in many downstream applications. Moreover, the resulting model scales to larger datasets, is naturally interpretable, and yields up to two orders of magnitude speedup over FastGCN.
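A minimal sketch of the resulting simplification: with nonlinearities removed and weights collapsed, the model reduces to applying a fixed propagation matrix $S^K$ to the features and training a linear classifier on the result (the normalization below is the standard add-self-loops, symmetric-degree-normalization recipe).

```python
import numpy as np

def sgc_features(A, X, K=2):
    """Simplified graph convolution: remove nonlinearities and collapse weights,
    leaving a fixed low-pass filter S^K applied to the features, followed by a
    plain linear classifier trained separately."""
    A_hat = A + np.eye(A.shape[0])                 # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    S = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]   # D^-1/2 (A+I) D^-1/2
    for _ in range(K):
        X = S @ X
    return X  # feed into e.g. sklearn.linear_model.LogisticRegression

A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], float)  # a 3-node path graph
X = np.eye(3)
print(sgc_features(A, X, K=2))
```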
