The fuzzy or soft $k$-means objective is a popular generalization of the well-known $k$-means problem, extending the clustering capability of the $k$-means to datasets that are uncertain, vague, and otherwise hard to cluster. In this paper, we propose a semi-supervised active clustering framework, where the learner is allowed to interact with an oracle (domain expert), asking for the similarity between a certain set of chosen items. We study the query and computational complexities of clustering in this framework. We prove that having a few of such similarity queries enables one to get a polynomial-time approximation algorithm to an otherwise conjecturally NP-hard problem. In particular, we provide algorithms for fuzzy clustering in this setting that asks $O(\mathsf{poly}(k)\log n)$ similarity queries and run with polynomial-time-complexity, where $n$ is the number of items. The fuzzy $k$-means objective is nonconvex, with $k$-means as a special case, and is equivalent to some other generic nonconvex problem such as non-negative matrix factorization. The ubiquitous Lloyd-type algorithms (or alternating minimization algorithms) can get stuck at a local minimum. Our results show that by making a few similarity queries, the problem becomes easier to solve. Finally, we test our algorithms over real-world datasets, showing their effectiveness in real-world applications.
Metric $k$-center clustering is a fundamental unsupervised learning primitive. Although widely used, this primitive is heavily affected by noise in the data, so that a more sensible variant seeks for the best solution that disregards a given number $z$ of points of the dataset, called outliers. We provide efficient algorithms for this important variant in the streaming model under the sliding window setting, where, at each time step, the dataset to be clustered is the window $W$ of the most recent data items. Our algorithms achieve $O(1)$ approximation and, remarkably, require a working memory linear in $k+z$ and only logarithmic in $|W|$. As a by-product, we show how to estimate the effective diameter of the window $W$, which is a measure of the spread of the window points, disregarding a given fraction of noisy distances. We also provide experimental evidence of the practical viability of our theoretical results.
Among various soft computing approaches for time series forecasting, Fuzzy Cognitive Maps (FCM) have shown remarkable results as a tool to model and analyze the dynamics of complex systems. FCM have similarities to recurrent neural networks and can be classified as a neuro-fuzzy method. In other words, FCMs are a mixture of fuzzy logic, neural network, and expert system aspects, which act as a powerful tool for simulating and studying the dynamic behavior of complex systems. The most interesting features are knowledge interpretability, dynamic characteristics and learning capability. The goal of this survey paper is mainly to present an overview on the most relevant and recent FCM-based time series forecasting models proposed in the literature. In addition, this article considers an introduction on the fundamentals of FCM model and learning methodologies. Also, this survey provides some ideas for future research to enhance the capabilities of FCM in order to cover some challenges in the real-world experiments such as handling non-stationary data and scalability issues. Moreover, equipping FCMs with fast learning algorithms is one of the major concerns in this area.
We study time-series classification (TSC), a fundamental task of time-series data mining. Prior work has approached TSC from two major directions: (1) similarity-based methods that classify time-series based on the nearest neighbors, and (2) deep learning models that directly learn the representations for classification in a data-driven manner. Motivated by the different working mechanisms within these two research lines, we aim to connect them in such a way as to jointly model time-series similarities and learn the representations. This is a challenging task because it is unclear how we should efficiently leverage similarity information. To tackle the challenge, we propose Similarity-Aware Time-Series Classification (SimTSC), a conceptually simple and general framework that models similarity information with graph neural networks (GNNs). Specifically, we formulate TSC as a node classification problem in graphs, where the nodes correspond to time-series, and the links correspond to pair-wise similarities. We further design a graph construction strategy and a batch training algorithm with negative sampling to improve training efficiency. We instantiate SimTSC with ResNet as the backbone and Dynamic Time Warping (DTW) as the similarity measure. Extensive experiments on the full UCR datasets and several multivariate datasets demonstrate the effectiveness of incorporating similarity information into deep learning models in both supervised and semi-supervised settings. Our code is available at //github.com/daochenzha/SimTSC
We study the problem of query evaluation on probabilistic graphs, namely, tuple-independent probabilistic databases over signatures of arity two. We focus on the class of queries closed under homomorphisms, or, equivalently, the infinite unions of conjunctive queries. Our main result states that the probabilistic query evaluation problem is #P-hard for all unbounded queries from this class. As bounded queries from this class are equivalent to a union of conjunctive queries, they are already classified by the dichotomy of Dalvi and Suciu (2012). Hence, our result and theirs imply a complete data complexity dichotomy, between polynomial time and #P-hardness, on evaluating homomorphism-closed queries over probabilistic graphs. This dichotomy covers in particular all fragments of infinite unions of conjunctive queries over arity-two signatures, such as negation-free (disjunctive) Datalog, regular path queries, and a large class of ontology-mediated queries. The dichotomy also applies to a restricted case of probabilistic query evaluation called generalized model counting, where fact probabilities must be 0, 0.5, or 1. We show the main result by reducing from the problem of counting the valuations of positive partitioned 2-DNF formulae, or from the source-to-target reliability problem in an undirected graph, depending on properties of minimal models for the query.
Let $\kappa(s,t)$ denote the maximum number of internally disjoint paths in an undirected graph $G$. We consider designing a data structure that includes a list of cuts, and answers the following query: given $s,t \in V$, determine whether $\kappa(s,t) \leq k$, and if so, return a pointer to an $st$-cut of size $\leq k$ (or to a minimum $st$-cut) in the list. A trivial data structure that includes a list of $n(n-1)/2$ cuts and requires $\Theta(kn^2)$ space can answer each query in $O(1)$ time. We obtain the following results. In the case when $G$ is $k$-connected, we show that $n$ cuts suffice, and that these cuts can be partitioned into $(2k+1)$ laminar families. Thus using space $O(kn)$ we can answers each min-cut query in $O(1)$ time, slightly improving and substantially simplifying a recent result of Pettie and Yin. We then extend this data structure to subset $k$-connectivity. In the general case we show that $(2k+1)n$ cuts suffice to return an $st$-cut of size $\leq k$,and a list of size $k(k+2)n$ contains a minimum $st$-cut for every $s,t \in V$. Combining our subset $k$-connectivity data structure with the data structure of Hsu and Lu for checking $k$-connectivity, we give an $O(k^2 n)$ space data structure that returns an $st$-cut of size $\leq k$ in $O(\log k)$ time, while $O(k^3 n)$ space enables to return a minimum $st$-cut.
Modelling and forecasting homogeneous age-specific mortality rates of multiple countries could lead to improvements in long-term forecasting. Data fed into joint models are often grouped according to nominal attributes, such as geographic regions, ethnic groups, and socioeconomic status, which may still contain heterogeneity and deteriorate the forecast results. Our paper proposes a novel clustering technique to pursue homogeneity among multiple functional time series based on functional panel data modelling to address this issue. Using a functional panel data model with fixed effects, we can extract common functional time series features. These common features could be decomposed into two components: the functional time trend and the mode of variations of functions (functional pattern). The functional time trend reflects the dynamics across time, while the functional pattern captures the fluctuations within curves. The proposed clustering method searches for homogeneous age-specific mortality rates of multiple countries by accounting for both the modes of variations and the temporal dynamics among curves. We demonstrate that the proposed clustering technique outperforms other existing methods through a Monte Carlo simulation and could handle complicated cases with slow decaying eigenvalues. In empirical data analysis, we find that the clustering results of age-specific mortality rates can be explained by the combination of geographic region, ethnic groups, and socioeconomic status. We further show that our model produces more accurate forecasts than several benchmark methods in forecasting age-specific mortality rates.
Distance metric learning based on triplet loss has been applied with success in a wide range of applications such as face recognition, image retrieval, speaker change detection and recently recommendation with the CML model. However, as we show in this article, CML requires large batches to work reasonably well because of a too simplistic uniform negative sampling strategy for selecting triplets. Due to memory limitations, this makes it difficult to scale in high-dimensional scenarios. To alleviate this problem, we propose here a 2-stage negative sampling strategy which finds triplets that are highly informative for learning. Our strategy allows CML to work effectively in terms of accuracy and popularity bias, even when the batch size is an order of magnitude smaller than what would be needed with the default uniform sampling. We demonstrate the suitability of the proposed strategy for recommendation and exhibit consistent positive results across various datasets.
Learning low-dimensional embeddings of knowledge graphs is a powerful approach used to predict unobserved or missing edges between entities. However, an open challenge in this area is developing techniques that can go beyond simple edge prediction and handle more complex logical queries, which might involve multiple unobserved edges, entities, and variables. For instance, given an incomplete biological knowledge graph, we might want to predict "em what drugs are likely to target proteins involved with both diseases X and Y?" -- a query that requires reasoning about all possible proteins that {\em might} interact with diseases X and Y. Here we introduce a framework to efficiently make predictions about conjunctive logical queries -- a flexible but tractable subset of first-order logic -- on incomplete knowledge graphs. In our approach, we embed graph nodes in a low-dimensional space and represent logical operators as learned geometric operations (e.g., translation, rotation) in this embedding space. By performing logical operations within a low-dimensional embedding space, our approach achieves a time complexity that is linear in the number of query variables, compared to the exponential complexity required by a naive enumeration-based approach. We demonstrate the utility of this framework in two application studies on real-world datasets with millions of relations: predicting logical relationships in a network of drug-gene-disease interactions and in a graph-based representation of social interactions derived from a popular web forum.
We present collaborative similarity embedding (CSE), a unified framework that exploits comprehensive collaborative relations available in a user-item bipartite graph for representation learning and recommendation. In the proposed framework, we differentiate two types of proximity relations: direct proximity and k-th order neighborhood proximity. While learning from the former exploits direct user-item associations observable from the graph, learning from the latter makes use of implicit associations such as user-user similarities and item-item similarities, which can provide valuable information especially when the graph is sparse. Moreover, for improving scalability and flexibility, we propose a sampling technique that is specifically designed to capture the two types of proximity relations. Extensive experiments on eight benchmark datasets show that CSE yields significantly better performance than state-of-the-art recommendation methods.
Deep reinforcement learning (RL) methods generally engage in exploratory behavior through noise injection in the action space. An alternative is to add noise directly to the agent's parameters, which can lead to more consistent exploration and a richer set of behaviors. Methods such as evolutionary strategies use parameter perturbations, but discard all temporal structure in the process and require significantly more samples. Combining parameter noise with traditional RL methods allows to combine the best of both worlds. We demonstrate that both off- and on-policy methods benefit from this approach through experimental comparison of DQN, DDPG, and TRPO on high-dimensional discrete action environments as well as continuous control tasks. Our results show that RL with parameter noise learns more efficiently than traditional RL with action space noise and evolutionary strategies individually.