Grouping together similar elements in datasets is a common task in data mining and machine learning. In this paper, we study parallel algorithms for correlation clustering, where each pair of items is labeled either similar or dissimilar. The task is to partition the elements so as to minimize the number of disagreements, that is, the number of dissimilar pairs grouped together plus the number of similar pairs grouped separately. Our main contribution is a parallel algorithm that achieves a $(3 + \varepsilon)$-approximation to the minimum number of disagreements. Our algorithm builds on the analysis of the PIVOT algorithm by Ailon, Charikar, and Newman [JACM'08], which obtains a $3$-approximation in the centralized setting. Our design allows us to sparsify the input graph, ignoring a large portion of the nodes and edges, at only a small extra cost compared to the analysis of PIVOT. This sparsification makes our technique applicable to several models of massive graph processing, such as Massively Parallel Computing (MPC) and graph streaming, where sparse graphs can typically be handled much more efficiently. For linear-memory models, such as linear-memory MPC and streaming, our approach yields $O(1)$-time algorithms whose runtime is independent of $\varepsilon$, which appears only in the memory requirement.
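As background, the centralized PIVOT routine that the parallel algorithm builds on can be sketched in a few lines: repeatedly pick a uniformly random unclustered element as pivot and cluster it together with all still-unclustered elements labeled similar to it. The data layout below (`similar` as adjacency sets of the "+" edges) is illustrative, not from the paper.

```python
import random

def pivot(nodes, similar):
    """PIVOT of Ailon, Charikar, and Newman: a random pivot claims all
    unclustered nodes labeled similar to it; repeat until none remain."""
    unclustered = set(nodes)
    clusters = []
    while unclustered:
        u = random.choice(sorted(unclustered))       # uniformly random pivot
        cluster = {u} | (similar[u] & unclustered)   # pivot + similar neighbors
        clusters.append(cluster)
        unclustered -= cluster
    return clusters

# Toy instance: 1~2 and 3~4 are similar, every other pair is dissimilar.
sim = {1: {2}, 2: {1}, 3: {4}, 4: {3}}
print(pivot([1, 2, 3, 4], sim))  # e.g. [{3, 4}, {1, 2}]
```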
We develop a distributed Block Chebyshev-Davidson algorithm to solve large-scale leading-eigenvalue problems for spectral analysis in spectral clustering. First, the efficiency of the Chebyshev-Davidson algorithm relies on prior knowledge of the eigenvalue spectrum, which can be expensive to estimate. This issue is alleviated in spectral clustering by the analytic spectrum estimates available for the Laplacian or normalized Laplacian matrices, making the proposed algorithm very efficient in that setting. Second, to make the proposed algorithm capable of analyzing big data, a distributed and parallel version has been developed with attractive scalability: the speedup from parallel computing is approximately $\sqrt{p}$, where $p$ denotes the number of processes. Numerical results demonstrate its efficiency and its advantage over existing algorithms in both sequential and parallel computing.
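The filtering step at the heart of any Chebyshev-Davidson method can be sketched briefly. Below, $[a, b]$ bounds the unwanted part of the spectrum; for the normalized Laplacian all eigenvalues lie in $[0, 2]$, which is the kind of analytic estimate the abstract refers to. This is the textbook unscaled three-term recurrence, not the paper's distributed block variant.

```python
import numpy as np

def chebyshev_filter(A, x, degree, a, b):
    """Apply the degree-m Chebyshev filter p_m((A - cI)/e) to x.
    Eigenvalues inside [a, b] are damped and those outside amplified,
    steering the Davidson iteration toward the wanted eigenpairs.
    (Production codes rescale each step to avoid overflow.)"""
    e = (b - a) / 2.0           # half-width of the filtered interval
    c = (b + a) / 2.0           # center of the filtered interval
    y = (A @ x - c * x) / e     # T_1 of the shifted/scaled operator
    x_prev = x
    for _ in range(2, degree + 1):
        y_new = 2.0 * (A @ y - c * y) / e - x_prev   # T_{k+1} = 2t T_k - T_{k-1}
        x_prev, y = y, y_new
    return y
```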
The NSGA-II is one of the most prominent algorithms for solving multi-objective optimization problems. Despite numerous successful applications, several studies have shown that the NSGA-II is less effective for larger numbers of objectives. In this work, we use mathematical runtime analyses to rigorously demonstrate and quantify this phenomenon. We show that even on the simple OneMinMax benchmark, where every solution is Pareto optimal, the NSGA-II, even with large population sizes, cannot compute the full Pareto front (the objective vectors of all Pareto optima) in sub-exponential time when the number of objectives is at least three. Our proofs suggest that the reason for this unexpected behavior is that the crowding distance treats the different objectives independently (see the sketch below). This is not a problem for two objectives, where any sorting of a pairwise incomparable set of solutions by one objective is also a sorting by the other objective (in inverse order).
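For concreteness, here is the standard crowding-distance computation; the outer loop over objectives makes the per-objective independence explicit. This is the textbook NSGA-II formula, not code from the paper.

```python
import math

def crowding_distance(pop):
    """NSGA-II crowding distance of a list of objective vectors.
    Each objective is sorted and accumulated independently, which is
    exactly the per-objective treatment the runtime analysis identifies
    as problematic for three or more objectives."""
    n, m = len(pop), len(pop[0])
    dist = [0.0] * n
    for obj in range(m):                              # objectives handled independently
        order = sorted(range(n), key=lambda i: pop[i][obj])
        dist[order[0]] = dist[order[-1]] = math.inf   # boundary points always kept
        span = pop[order[-1]][obj] - pop[order[0]][obj] or 1.0
        for k in range(1, n - 1):
            dist[order[k]] += (pop[order[k + 1]][obj]
                               - pop[order[k - 1]][obj]) / span
    return dist

print(crowding_distance([(0, 4), (1, 3), (2, 2), (3, 1), (4, 0)]))
# [inf, 1.0, 1.0, 1.0, inf]
```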
This paper develops a clustering method that takes advantage of the robustness of model-based clustering while attempting to mitigate some of its pitfalls. First, we note that standard model-based clustering typically leads to the same number of clusters in every margin, which seems a rather artificial assumption for a variety of datasets. We tackle this issue by specifying a finite mixture model per margin that allows each margin to have a different number of clusters, and then cluster the multivariate data using a strategy-game-inspired algorithm which we call Reign-and-Conquer. Second, since the proposed clustering approach only specifies a model for the margins -- but leaves the joint unspecified -- it has the advantage of being partially parallelizable; hence, the proposed approach is computationally appealing as well as more tractable for moderate to high dimensions than a `full' (joint) model-based clustering approach. A battery of numerical experiments on artificial data indicates an overall good performance of the proposed methods in a variety of scenarios, and real datasets are used to showcase their application in practice.
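The per-margin stage can be illustrated with off-the-shelf tools: fit a univariate mixture to each margin and let BIC pick that margin's own number of clusters. This is only a sketch of the marginal modeling step, using sklearn's `GaussianMixture` as a stand-in; the Reign-and-Conquer combination of the margins is not shown.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_margin(x, max_k=5):
    """Fit a univariate Gaussian mixture to one margin, selecting that
    margin's own number of clusters by BIC."""
    x = x.reshape(-1, 1)
    fits = [GaussianMixture(k, random_state=0).fit(x) for k in range(1, max_k + 1)]
    return min(fits, key=lambda g: g.bic(x))

rng = np.random.default_rng(0)
# Margin 1 has two clusters, margin 2 has three.
data = np.column_stack([
    np.concatenate([rng.normal(0, 1, 150), rng.normal(6, 1, 150)]),
    np.concatenate([rng.normal(0, 1, 100), rng.normal(5, 1, 100),
                    rng.normal(10, 1, 100)]),
])
# Margins are modeled independently, so this loop is trivially parallelizable.
for j in range(data.shape[1]):
    print(f"margin {j}: {fit_margin(data[:, j]).n_components} clusters")
```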
Presenting dynamic scenes without motion artifacts visible to observers requires sustained effort from the display industry. A tool that predicts motion artifacts and simulates their elimination by optimizing the display configuration is highly desirable for guiding the design and manufacture of modern displays; despite this demand, no such tool is available on the market. In this study, we deliver an interactive toolkit, the Binocular Perceived Motion Artifact Predictor (BiPMAP), as an executable file with GPU acceleration. BiPMAP accounts for an extensive collection of user-defined parameters and directly visualizes a variety of motion artifacts by presenting the perceived continuous and sampled moving stimuli side by side. For accurate artifact prediction, BiPMAP utilizes a novel model of the human contrast sensitivity function to imitate the frequency modulation of the human visual system. In addition, BiPMAP can derive various in-plane motion artifacts for 2D displays and depth distortion in 3D stereoscopic displays.
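BiPMAP's own CSF model is novel and not specified in the abstract; as a rough stand-in, the classic Mannos-Sakrison CSF shows what "frequency modulation by the visual system" means in practice: weight the image spectrum by contrast sensitivity and transform back.

```python
import numpy as np

def csf_mannos_sakrison(f):
    """Classic Mannos-Sakrison contrast sensitivity function, with f in
    cycles per degree. A stand-in for BiPMAP's own (unspecified) model."""
    return 2.6 * (0.0192 + 0.114 * f) * np.exp(-(0.114 * f) ** 1.1)

def csf_filter(image, pixels_per_degree):
    """Attenuate spatial frequencies the way the visual system does:
    weight the image spectrum by the CSF, then transform back."""
    fy = np.fft.fftfreq(image.shape[0], d=1.0 / pixels_per_degree)
    fx = np.fft.fftfreq(image.shape[1], d=1.0 / pixels_per_degree)
    f = np.hypot(*np.meshgrid(fy, fx, indexing="ij"))  # radial frequency (cpd)
    return np.real(np.fft.ifft2(np.fft.fft2(image) * csf_mannos_sakrison(f)))
```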
We present a new FPTAS for the Subset Sum Ratio problem, which, given a set of integers, asks for two disjoint subsets such that the ratio of their sums is as close to $1$ as possible. Our scheme makes use of exact and approximate algorithms for the closely related Partition problem, hence any progress over those -- such as the recent improvement due to Bringmann and Nakos [SODA 2021] -- carries over to our FPTAS. Depending on the relationship between the size of the input set $n$ and the error margin $\varepsilon$, we improve upon the best currently known algorithm of Melissinos and Pagourtzis [COCOON 2018], which has complexity $O(n^4 / \varepsilon)$. In particular, the exponent of $n$ in our proposed scheme may decrease to $2$, depending on the Partition algorithm used. Furthermore, while the aforementioned state-of-the-art complexity, expressed in the form $O((n + 1 / \varepsilon)^c)$, has constant $c = 5$, our results establish that $c < 5$.
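To pin down the objective the FPTAS approximates, here is an exhaustive baseline: assign each element to the first subset, the second subset, or neither. It is exponential and serves only to define the problem, not to sketch the scheme itself.

```python
from itertools import product

def subset_sum_ratio(nums):
    """Exhaustive baseline for Subset Sum Ratio: put each element in
    subset A, subset B, or neither, and minimize max/min of the two sums
    over all pairs of disjoint nonempty subsets. Exponential time; this
    only defines the objective that the FPTAS approximates."""
    best = float("inf")
    for labels in product((0, 1, 2), repeat=len(nums)):
        a = sum(x for x, l in zip(nums, labels) if l == 1)
        b = sum(x for x, l in zip(nums, labels) if l == 2)
        if a and b:
            best = min(best, max(a, b) / min(a, b))
    return best

print(subset_sum_ratio([1, 3, 5, 9]))  # 1.0, from {1, 3, 5} vs {9}
```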
Processing-in-memory (PIM) architectures allow software to explicitly initiate computation in the memory. This effectively makes PIM operations a new class of memory operations, alongside standard memory operations (e.g., load, store). For software correctness, it is crucial to have ordering rules between a PIM operation and other PIM operations and memory operations, i.e., a consistency model that takes PIM operations into account is vital. To the best of our knowledge, existing works have paid little attention to the consistency of PIM operations. In this paper, we focus on a specific PIM approach, named bulk-bitwise PIM, in which large bitwise operations are performed directly in the memory array, where their results are also stored. We show that previous solutions for the related problem of maintaining coherence for bulk-bitwise PIM break the host's native consistency model and prevent any correctness guarantees. As a solution, we propose and evaluate four consistency models for bulk-bitwise PIM, ranging from strict to relaxed. Our designs also preserve coherence between PIM and the host processor. Evaluating the proposed designs' performance with gem5 simulations, using the YCSB short-range-scan benchmark and TPC-H queries, shows that the runtime overhead of guaranteeing correctness is at most $6\%$, and in many cases the runtime even improves. The hardware overhead of our designs is less than $0.22\%$.
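A toy model illustrates the underlying ordering problem: host stores drain through a write buffer into the memory array, while a bulk-bitwise PIM operation reads the array directly, so without an explicit ordering rule it can observe stale data. This is purely illustrative and not one of the paper's four proposed models.

```python
# Toy model of why bulk-bitwise PIM needs explicit ordering rules.
memory = bytearray([0b1111, 0b0000])   # the memory array PIM operates on
write_buffer = []                      # host stores drain through here

def host_store(addr, value):
    write_buffer.append((addr, value))   # not yet visible in the array

def drain():                             # the "fence" a strict model inserts
    while write_buffer:
        addr, value = write_buffer.pop(0)
        memory[addr] = value

def pim_bulk_and(dst, a, b):
    memory[dst] = memory[a] & memory[b]  # reads the array, not the buffer

host_store(1, 0b1010)
# Without drain(), the PIM op would see the stale 0b0000 and write 0b0000.
drain()
pim_bulk_and(0, 0, 1)
print(bin(memory[0]))   # 0b1010
```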
A fundamental procedure in the analysis of massive datasets is the construction of similarity graphs. Such graphs play a key role in many downstream tasks, including clustering, classification, graph learning, and nearest neighbor search. For these tasks, it is critical to build graphs which are sparse yet still representative of the underlying data. The benefits of sparsity are twofold: firstly, constructing dense graphs is infeasible in practice for large datasets, and secondly, the runtime of downstream tasks is directly influenced by the sparsity of the similarity graph. In this work, we present $\textit{Stars}$: a highly scalable method for building extremely sparse graphs via two-hop spanners, which are graphs where similar points are connected by a path of length at most two. Stars can construct two-hop spanners with significantly fewer similarity comparisons, which are a major bottleneck for learning-based models where comparisons are expensive to evaluate. Theoretically, we demonstrate that Stars builds a graph in nearly-linear time in which approximate nearest neighbors are contained within two-hop neighborhoods. In practice, we have deployed Stars on multiple datasets, allowing graph building at the $\textit{Tera-Scale}$, i.e., for graphs with tens of trillions of edges. We evaluate the performance of Stars for clustering and graph learning, and demonstrate 10- to 1000-fold reductions in pairwise similarity comparisons compared to different baselines, and 2- to 10-fold improvements in running time without loss of quality.
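The star idea behind two-hop spanners can be sketched as follows: sample a few leaders and attach every point to its most similar leader, so that points sharing a leader are joined by a two-hop path through it. The sketch below is illustrative only (random leaders, Euclidean distance); the Stars construction itself is more refined.

```python
import numpy as np

def star_two_hop_graph(X, n_leaders, seed=0):
    """Attach each point to its nearest of a few sampled leaders. Points
    sharing a leader are connected by a path of length two through it,
    using only n * n_leaders similarity comparisons instead of n^2.
    (Illustrative sketch, not the paper's construction.)"""
    rng = np.random.default_rng(seed)
    leaders = rng.choice(len(X), size=n_leaders, replace=False)
    # One comparison per (point, leader) pair -- the only similarities computed.
    d = np.linalg.norm(X[:, None, :] - X[leaders][None, :, :], axis=2)
    return sorted({(i, int(leaders[j])) for i, j in enumerate(d.argmin(axis=1))
                   if i != leaders[j]})

X = np.random.default_rng(1).normal(size=(10, 3))
print(star_two_hop_graph(X, n_leaders=2))   # a sparse star-shaped edge set
```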
Iterative steady-state solvers are widely used in computational fluid dynamics. Unfortunately, it is difficult to obtain steady-state solutions for unstable problems caused by physical or numerical instability. Optimization is a better choice for solving unstable problems, because the steady-state solution is always an extreme point of the optimization regardless of whether the problem is unstable or ill-conditioned; however, solving partial differential equations (PDEs) purely by optimization is difficult due to the large number of optimization variables. In this study, we propose an Online Dimension Reduction Optimization (ODRO) method to enhance the convergence of a traditional iterative method toward the steady-state solution of unstable problems. The method performs proper orthogonal decomposition (POD) on snapshots collected over a few iteration steps, minimizes the PDE residual in the POD subspace to obtain a solution with a lower residual, and then continues iterating with the optimized solution as the initial value, repeating these three steps until the residual converges. Several typical cases show that the proposed method can efficiently compute the steady-state solution of unstable problems, combining the high efficiency and robustness of the iterative method with the good convergence of the optimization method. In addition, the method is easy to implement in almost any iterative solver with minimal code modification.
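The iterate-snapshot-optimize loop can be sketched on a linear stand-in problem $Ax = b$, where minimizing the residual in the POD subspace reduces to a small least-squares solve. All names and the damped inner iteration here are illustrative; the paper applies the idea to nonlinear CFD solvers.

```python
import numpy as np

def odro_step(A, b, x, n_snap=5, n_modes=3, omega=0.1):
    """One ODRO cycle on a linear stand-in problem A x = b:
    (1) collect snapshots from a few plain iterations,
    (2) build a POD basis from the snapshots via SVD,
    (3) minimize the residual ||A V c - b|| in the POD subspace,
    then return the optimized iterate as the new initial value."""
    snaps = []
    for _ in range(n_snap):                        # (1) plain damped iterations
        x = x + omega * (b - A @ x)
        snaps.append(x.copy())
    U, _, _ = np.linalg.svd(np.array(snaps).T, full_matrices=False)
    V = U[:, :n_modes]                             # (2) POD basis
    c, *_ = np.linalg.lstsq(A @ V, b, rcond=None)  # (3) small least squares
    return V @ c

rng = np.random.default_rng(0)
A = np.diag(np.linspace(1.0, 10.0, 50)) + 0.01 * rng.normal(size=(50, 50))
b = rng.normal(size=50)
x = np.zeros(50)
for _ in range(10):
    x = odro_step(A, b, x)
    print(np.linalg.norm(b - A @ x))   # residual drops each cycle
```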
The adaptive processing of structured data is a long-standing research topic in machine learning that investigates how to automatically learn a mapping from a structured input to outputs of various kinds. Recently, there has been an increasing interest in the adaptive processing of graphs, which has led to the development of different neural network-based methodologies. In this thesis, we take a different route and develop a Bayesian Deep Learning framework for graph learning. The dissertation begins with a review of the principles on which most of the methods in the field are built, followed by a study of reproducibility issues in graph classification. We then bridge the basic ideas of deep learning for graphs with the Bayesian world by building our deep architectures in an incremental fashion. This framework allows us to consider graphs with discrete and continuous edge features, producing unsupervised embeddings rich enough to reach the state of the art on several classification tasks. Our approach also lends itself to a Bayesian nonparametric extension that automates the choice of almost all of the model's hyper-parameters. Two real-world applications demonstrate the efficacy of deep learning for graphs. The first concerns the prediction of information-theoretic quantities for molecular simulations with supervised neural models. After that, we exploit our Bayesian models to solve a malware-classification task while remaining robust to intra-procedural code obfuscation techniques. We conclude the dissertation with an attempt to blend the best of the neural and Bayesian worlds. The resulting hybrid model is able to predict multimodal distributions conditioned on input graphs, and can consequently model stochasticity and uncertainty better than most works. Overall, we aim to provide a Bayesian perspective on the multifaceted research field of deep learning for graphs.
In this paper, we propose a one-stage online clustering method called Contrastive Clustering (CC), which explicitly performs instance- and cluster-level contrastive learning. Specifically, for a given dataset, positive and negative instance pairs are constructed through data augmentations and then projected into a feature space. Instance- and cluster-level contrastive learning are then conducted in the row and column spaces, respectively, by maximizing the similarities of positive pairs while minimizing those of negative ones. Our key observation is that the rows of the feature matrix can be regarded as soft labels of instances, and accordingly the columns can be regarded as cluster representations. By simultaneously optimizing the instance- and cluster-level contrastive losses, the model jointly learns representations and cluster assignments in an end-to-end manner. Extensive experimental results show that CC remarkably outperforms 17 competitive clustering methods on six challenging image benchmarks. In particular, CC achieves an NMI of 0.705 (0.431) on the CIFAR-10 (CIFAR-100) dataset, an improvement of up to 19\% (39\%) over the best baseline.
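The row/column duality can be sketched with a single symmetric InfoNCE helper applied twice: once to the rows of the two views' cluster-probability matrices (instances) and once to their columns (clusters). This is a simplified sketch; the paper's full method uses dedicated projection heads and additional regularization not shown here.

```python
import torch
import torch.nn.functional as F

def info_nce(z1, z2, tau=0.5):
    """Symmetric InfoNCE between two sets of vectors (one per row):
    each row's positive is the corresponding row of the other view."""
    z = F.normalize(torch.cat([z1, z2]), dim=1)
    sim = z @ z.t() / tau
    sim.fill_diagonal_(float("-inf"))               # exclude self-pairs
    n = z1.shape[0]
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)

# Cluster-probability matrices for two augmented views of a batch:
# rows act as soft labels (instances), columns as cluster representations.
p1 = torch.rand(128, 10).softmax(dim=1)
p2 = torch.rand(128, 10).softmax(dim=1)
instance_loss = info_nce(p1, p2)         # contrast rows across views
cluster_loss = info_nce(p1.t(), p2.t())  # contrast columns across views
loss = instance_loss + cluster_loss
print(loss.item())
```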