We study the matrix completion problem that leverages hierarchical similarity graphs as side information in the context of recommender systems. Under a hierarchical stochastic block model that reflects practically relevant social graphs well, and a low-rank rating matrix model, we characterize the exact information-theoretic limit on the number of observed matrix entries (i.e., the optimal sample complexity) by proving sharp upper and lower bounds on the sample complexity. In the achievability proof, we demonstrate that the probability of error of the maximum likelihood estimator vanishes for sufficiently large numbers of users and items whenever all sufficient conditions are satisfied. The converse (impossibility) proof is based on a genie-aided maximum likelihood estimator: for each necessary condition, we present examples of genie-aided estimators whose probability of error does not vanish, even for sufficiently large numbers of users and items, when that condition is violated. One important consequence of this result is that exploiting the hierarchical structure of social graphs yields a substantial gain in sample complexity relative to an approach that simply identifies different groups without exploiting the relational structure across them. More specifically, we analyze the optimal sample complexity and identify different regimes whose characteristics depend on quality metrics of the side information provided by the hierarchical similarity graph. Finally, we present simulation results to corroborate our theoretical findings and show that the characterized information-theoretic limit can be achieved asymptotically.
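As an illustrative sketch (not the paper's estimator), the following Python snippet generates data from the assumed model: a two-level hierarchical stochastic block model for the social graph, group-wise shared rating vectors for the low-rank rating matrix, and a random subset of observed entries. The two-level hierarchy and all parameter values are assumptions for illustration only.

```python
# Illustrative data generation only (assumed model parameters, not the paper's estimator).
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items = 200, 100
n_clusters, groups_per_cluster = 2, 2      # two-level hierarchy: clusters -> groups
p_in, p_cluster, p_out = 0.5, 0.2, 0.05    # edge probs: same group > same cluster > across
obs_prob = 0.3                             # fraction of observed ratings

# Assign each user to a (cluster, group) pair.
cluster = rng.integers(n_clusters, size=n_users)
group = rng.integers(groups_per_cluster, size=n_users)

# Hierarchical SBM adjacency matrix of the social graph.
A = np.zeros((n_users, n_users), dtype=int)
for i in range(n_users):
    for j in range(i + 1, n_users):
        if cluster[i] == cluster[j] and group[i] == group[j]:
            p = p_in
        elif cluster[i] == cluster[j]:
            p = p_cluster
        else:
            p = p_out
        A[i, j] = A[j, i] = rng.random() < p

# Low-rank rating matrix: users in the same (cluster, group) share one rating vector.
group_ratings = rng.integers(0, 2, size=(n_clusters, groups_per_cluster, n_items))
ratings = group_ratings[cluster, group]    # shape (n_users, n_items)

# Reveal a random subset of entries; unobserved entries are marked with -1.
mask = rng.random((n_users, n_items)) < obs_prob
observed = np.where(mask, ratings, -1)
print("graph edge density:", A.mean(), "| observed fraction:", mask.mean())
```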
We propose a nonparametric factorization approach for sparsely observed tensors. Here, sparsity does not mean that zero-valued entries are massive or dominant; rather, it means that the observed entries are very few, and become relatively even fewer as the tensor grows, which is ubiquitous in practice. Compared with existing works, our model not only leverages the structural information underlying the observed entry indices, but also provides extra interpretability and flexibility: it can simultaneously estimate a set of location factors describing the intrinsic properties of the tensor nodes and another set of sociability factors reflecting their extroverted activity in interacting with others, and users are free to choose a trade-off between the two types of factors. Specifically, we use hierarchical Gamma processes and Poisson random measures to construct a tensor-valued process, which can freely sample the two types of factors to generate tensors and always guarantees asymptotic sparsity. We then normalize the tensor process to obtain hierarchical Dirichlet processes to sample each observed entry index, and use a Gaussian process to sample the entry value as a nonlinear function of the factors, so as to capture both the sparse structural properties and the complex node relationships. For efficient inference, we use Dirichlet process properties over finite sample partitions, density transformations, and random features to develop a stochastic variational estimation algorithm. We demonstrate the advantage of our method on several benchmark datasets.
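The following is a minimal finite-truncation sketch of the generative idea, under simplifying assumptions that are not the paper's full hierarchical Gamma/Dirichlet process construction: Gamma-distributed sociability weights are normalized to sample the observed entry indices, while location factors fed through an RBF kernel generate the entry values via a Gaussian process draw.

```python
# Finite-truncation sketch of the generative idea (assumed simplification).
import numpy as np

rng = np.random.default_rng(1)
modes = [30, 20, 25]              # tensor dimensions (truncated node sets per mode)
rank, n_obs = 3, 200              # latent dimension and number of observed entries

# Sociability factors: Gamma weights per node, normalized per mode (Dirichlet-like).
soc = [rng.gamma(shape=0.5, scale=1.0, size=d) for d in modes]
probs = [w / w.sum() for w in soc]

# Location factors: latent coordinates of every node in each mode.
loc = [rng.normal(size=(d, rank)) for d in modes]

# Sample the observed entry indices from the per-mode sociability distributions.
idx = np.stack([rng.choice(d, size=n_obs, p=p) for d, p in zip(modes, probs)], axis=1)

# Entry values: a Gaussian process draw over the concatenated location factors (RBF kernel).
X = np.concatenate([loc[m][idx[:, m]] for m in range(len(modes))], axis=1)
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-0.5 * sq_dists) + 1e-6 * np.eye(n_obs)
values = rng.multivariate_normal(np.zeros(n_obs), K)
print(idx[:5], values[:5])
```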
We analyse the recovery of different roles in a network modelled by a directed graph, based on the so-called Neighbourhood Pattern Similarity approach. Our analysis uses results from random matrix theory to show that, when the graph is generated from a particular Stochastic Block Model with Bernoulli probability distributions for the different blocks, the recovery is asymptotically correct once the graph has a sufficiently large dimension. Under these assumptions there is a sufficient gap between the dominant and subdominant eigenvalues of the similarity matrix, which guarantees the asymptotically correct identification of the number of different roles. We also comment on the connections with the literature on Stochastic Block Models, including the case of probabilities of order log(n)/n, where n is the graph size. We provide numerical experiments to assess the effectiveness of the method when applied to practical networks of finite size.
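As a rough illustration, the sketch below uses a simplified first-order similarity matrix S = A A^T + A^T A as a stand-in for the Neighbourhood Pattern Similarity matrix (the exact definition differs), generates a directed stochastic block model with three roles, and estimates the number of roles from the largest eigenvalue gap. All parameter choices are assumptions for illustration.

```python
# Simplified first-order similarity (assumed stand-in for the NPS matrix).
import numpy as np

rng = np.random.default_rng(2)
sizes = [80, 80, 80]                       # three roles
P = np.array([[0.30, 0.05, 0.02],          # Bernoulli probability per block pair
              [0.02, 0.25, 0.05],
              [0.05, 0.02, 0.20]])
labels = np.repeat(np.arange(len(sizes)), sizes)
n = labels.size

# Directed SBM adjacency matrix.
A = (rng.random((n, n)) < P[labels][:, labels]).astype(float)
np.fill_diagonal(A, 0)

# First-order neighbourhood-overlap similarity and its eigenvalue gap.
S = A @ A.T + A.T @ A                      # out- and in-neighbourhood overlaps
eig = np.sort(np.linalg.eigvalsh(S))[::-1]
gaps = eig[:-1] - eig[1:]
print("estimated number of roles:", int(np.argmax(gaps[:10]) + 1))
```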
This paper considers the problem of estimating high-dimensional Laplacian-constrained precision matrices by minimizing Stein's loss. We obtain a necessary and sufficient condition for the existence of this estimator, which boils down to checking whether a certain data-dependent graph is connected. We also prove consistency in the high-dimensional setting under the symmetrized Stein loss. We show that the error rate does not depend on the graph sparsity, or on any other type of structure, and that Laplacian constraints are sufficient for high-dimensional consistency. Our proofs exploit properties of graph Laplacians and a characterization of the proposed estimator based on effective graph resistances. We validate our theoretical claims with numerical experiments.
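Since the existence condition reduces to connectivity of a certain data-dependent graph, the sketch below illustrates only the connectivity check itself with a simple union-find; how the edge list is derived from the data is specific to the paper and assumed given here.

```python
# Connectivity check only; the data-dependent edge list is assumed given.
from typing import List, Tuple

def is_connected(n_nodes: int, edges: List[Tuple[int, int]]) -> bool:
    """Return True if the undirected graph spanned by `edges` on n_nodes vertices is connected."""
    parent = list(range(n_nodes))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for u, v in edges:
        parent[find(u)] = find(v)
    root = find(0)
    return all(find(i) == root for i in range(n_nodes))

# A path on 4 nodes is connected; removing the middle edge disconnects it.
print(is_connected(4, [(0, 1), (1, 2), (2, 3)]))   # True -> the estimator would exist
print(is_connected(4, [(0, 1), (2, 3)]))           # False -> the estimator would not exist
```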
Matrix completion refers to completing a low-rank matrix from a few of its observed entries, and has become one of the most significant and widely studied problems of recent years. The number of observations required for exact completion is directly proportional to the rank and the coherence parameter of the matrix. In many applications, additional information about the low-rank matrix of interest may be available. For example, in collaborative filtering, the Netflix problem, and dynamic channel estimation in communications, extra subspace information is available. More precisely, in these applications there are prior subspaces forming multiple angles with the ground-truth subspaces. In this paper, we propose a novel strategy to incorporate this information into the completion task. To this end, we design a multi-weight nuclear norm minimization in which the weights are chosen so as to penalize each angle within the matrix subspace independently. We propose a new scheme for optimally choosing the weights. Specifically, we first derive an upper-bound expression describing the coherence of the matrix of interest. Then, we obtain the optimal weights by minimizing this expression. Simulation results confirm the advantages of allowing multiple weights in the completion procedure. Explicitly, they indicate that our proposed multi-weight problem needs fewer observations than state-of-the-art methods.
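A hedged sketch of one common way to encode prior subspace information in weighted nuclear norm completion is given below, using cvxpy: the prior row and column subspaces are shrunk by scalar weights w_r and w_c, whereas the multi-weight formulation of the paper assigns a separate weight to each principal angle. The problem sizes, weights, and exact priors are illustrative assumptions.

```python
# Hedged single-weight-per-subspace sketch (cvxpy); not the paper's exact formulation.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(3)
n, r = 20, 2
U = np.linalg.qr(rng.normal(size=(n, r)))[0]          # ground-truth row subspace
V = np.linalg.qr(rng.normal(size=(n, r)))[0]          # ground-truth column subspace
M = U @ rng.normal(size=(r, r)) @ V.T                 # low-rank matrix to complete
mask = (rng.random((n, n)) < 0.4).astype(float)       # observed-entry pattern

# Prior subspaces (taken exact here for simplicity) shrunk by scalar weights; the
# multi-weight scheme would instead assign one weight per principal angle.
w_r, w_c = 0.3, 0.3
Q_r = w_r * U @ U.T + (np.eye(n) - U @ U.T)
Q_c = w_c * V @ V.T + (np.eye(n) - V @ V.T)

X = cp.Variable((n, n))
objective = cp.Minimize(cp.normNuc(Q_r @ X @ Q_c))
constraints = [cp.multiply(mask, X - M) == 0]         # agree on the observed entries
cp.Problem(objective, constraints).solve()
print("relative error:", np.linalg.norm(X.value - M) / np.linalg.norm(M))
```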
In this work we advance the understanding of the fundamental limits of computation for Binary Polynomial Optimization (BPO), the problem of maximizing a given polynomial function over all binary points. In our main result we identify a novel class of BPO instances that can be solved efficiently, both from a theoretical and from a computational perspective. In fact, we give a strongly polynomial-time algorithm for instances whose corresponding hypergraph is beta-acyclic. We note that the beta-acyclicity assumption is natural in several applications, including relational database schemes and the lifted multicut problem on trees. Thanks to the novelty of our proof technique, we obtain an algorithm that is also interesting from a practical viewpoint: it is very simple to implement, and its running time is a polynomial of very low degree in the number of nodes and edges of the hypergraph. Our result completely settles the computational complexity of BPO over acyclic hypergraphs, since the problem is NP-hard on alpha-acyclic instances. Our algorithm can also be applied to general BPO problems that contain beta-cycles. For these problems, the algorithm returns a smaller instance together with a rule for extending any optimal solution of the smaller instance to an optimal solution of the original instance.
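For very small instances, a BPO problem can be stated as coefficient-weighted monomials (hyperedges over the variables) and solved by enumeration; the brute-force baseline below only illustrates the problem format and is not the strongly polynomial-time algorithm of the paper.

```python
# Brute-force baseline for tiny instances; not the strongly polynomial algorithm.
from itertools import product

def bpo_brute_force(n_vars, monomials):
    """monomials: list of (coefficient, tuple of variable indices), one per hyperedge."""
    best_val, best_x = float("-inf"), None
    for x in product((0, 1), repeat=n_vars):
        val = sum(c for c, vs in monomials if all(x[i] for i in vs))
        if val > best_val:
            best_val, best_x = val, x
    return best_val, best_x

# Maximize 2*x0*x1 - 3*x1*x2 + x2 over {0,1}^3: optimum is 2 at (1, 1, 0).
print(bpo_brute_force(3, [(2, (0, 1)), (-3, (1, 2)), (1, (2,))]))
```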
Modeling univariate block maxima by the generalized extreme value distribution constitutes one of the most widely applied approaches in extreme value statistics. It has recently been found that, for an underlying stationary time series, the respective estimators may be improved by calculating block maxima in an overlapping way. We provide a proof of concept that the latter finding also holds in situations involving certain piecewise stationarities. A weak convergence result for an empirical process of central interest is provided and, as a case in point, further details are worked out explicitly for the probability weighted moment estimator. Irrespective of the serial dependence, the estimation variance is shown to be smaller for the new estimator, while the bias was found to be the same or to vary comparably little in extensive simulation experiments. The results are illustrated by Monte Carlo simulation experiments and are applied to a common situation involving temperature extremes in a changing climate.
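The toy experiment below contrasts disjoint and overlapping (sliding) block maxima on a simulated i.i.d. series and computes the first probability weighted moments on which the PWM estimator of the GEV parameters is based; it is an illustration only and does not reproduce the paper's piecewise-stationary setting.

```python
# Toy comparison of disjoint vs. sliding block maxima; i.i.d. data, not the paper's setting.
import numpy as np

def pwm(sample):
    """First three probability weighted moments b0, b1, b2 (standard unbiased form)."""
    x = np.sort(sample)
    n = x.size
    j = np.arange(1, n + 1)
    b0 = x.mean()
    b1 = np.sum((j - 1) / (n - 1) * x) / n
    b2 = np.sum((j - 1) * (j - 2) / ((n - 1) * (n - 2)) * x) / n
    return b0, b1, b2

rng = np.random.default_rng(4)
series = rng.standard_exponential(10_000)
block = 100

disjoint_max = series.reshape(-1, block).max(axis=1)
sliding_max = np.array([series[i:i + block].max() for i in range(series.size - block + 1)])

print("disjoint block maxima PWMs:", pwm(disjoint_max))
print("sliding  block maxima PWMs:", pwm(sliding_max))
```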
Graph neural networks (GNNs) have emerged as a powerful tool for graph classification and representation learning. However, GNNs tend to suffer from over-smoothing problems and are vulnerable to graph perturbations. To address these challenges, we propose a novel topological neural framework, Topological Relational Inference (TRI), which allows for integrating higher-order graph information into GNNs and for systematically learning a local graph structure. The key idea is to rewire the original graph by using the persistent homology of small neighborhoods of nodes and then to incorporate the extracted topological summaries as side information into the local learning algorithm. As a result, the new framework enables us to harness both conventional information on the graph structure and information on the graph's higher-order topological properties. We derive theoretical stability guarantees for the new local topological representation and discuss their implications for graph algebraic connectivity. Experimental results on node classification tasks demonstrate that the new TRI-GNN outperforms all 14 state-of-the-art baselines on 6 out of 7 graphs and exhibits higher robustness to perturbations, yielding up to 10\% better performance under noisy scenarios.
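As a crude stand-in for the persistent-homology summaries (and not the TRI construction itself), the sketch below computes simple local topological features per node, namely the number of connected components and the first Betti number of the induced neighbourhood subgraph, which could then be appended as side information for a GNN. This is an assumed simplification for illustration.

```python
# Crude local topological summaries (assumed stand-in for persistent homology).
import networkx as nx
import numpy as np

def local_topology_features(G: nx.Graph) -> np.ndarray:
    feats = []
    for v in G.nodes():
        nbrs = list(G.neighbors(v))
        H = G.subgraph(nbrs)                 # neighbourhood subgraph without the centre
        n_comp = nx.number_connected_components(H) if nbrs else 0
        n_cycles = H.number_of_edges() - H.number_of_nodes() + n_comp   # first Betti number
        feats.append([len(nbrs), n_comp, n_cycles])
    return np.asarray(feats, dtype=float)

# These per-node summaries could be concatenated to the input features of a GNN.
G = nx.karate_club_graph()
print(local_topology_features(G)[:5])
```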
Rankings, especially those in search and recommendation systems, often determine how people access information and how information is exposed to people. Therefore, how to balance the relevance and fairness of information exposure is considered one of the key problems for modern IR systems. Because conventional ranking frameworks that myopically sort documents by their relevance inevitably introduce unfair result exposure, recent studies on ranking fairness mostly focus on dynamic ranking paradigms in which result rankings can be adapted in real time to support group fairness (e.g., with respect to race or gender). Existing studies on fairness in dynamic learning to rank, however, often achieve overall fairness of document exposure in ranked lists by significantly sacrificing result relevance and fairness on the top results. To address this problem, we propose a fair and unbiased ranking method named Maximal Marginal Fairness (MMF). The algorithm integrates unbiased estimators for both relevance and merit-based fairness, while providing an explicit controller that balances the selection of documents to maximize marginal relevance and fairness in top-k results. Theoretical and empirical analysis shows that, with small compromises on long-list fairness, our method achieves superior efficiency and effectiveness compared to state-of-the-art algorithms in both relevance and fairness for top-k rankings.
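A hedged reading of the maximal-marginal selection rule is sketched below: at each rank, the document maximizing a convex combination of its relevance and the marginal fairness gain of exposing its group is selected, with lambda_ acting as the explicit controller. The paper additionally uses unbiased estimators of both quantities; here the relevance scores, group labels, and position-based exposure model are assumed given.

```python
# Greedy maximal-marginal selection sketch; relevance, groups, and exposure model assumed given.
def mmf_rank(docs, relevance, group, k, lambda_=0.5):
    """docs: list of ids; relevance: id -> score; group: id -> group id; k: cutoff."""
    exposure = {g: 0.0 for g in set(group.values())}   # exposure accumulated per group
    remaining, ranking = list(docs), []
    for rank in range(min(k, len(docs))):
        pos_weight = 1.0 / (1 + rank)                  # position-based exposure model

        def marginal_score(d):
            # Fairness gain favours groups that are under-exposed so far.
            deficit = max(exposure.values()) - exposure[group[d]]
            return lambda_ * relevance[d] + (1 - lambda_) * deficit

        best = max(remaining, key=marginal_score)
        ranking.append(best)
        exposure[group[best]] += pos_weight
        remaining.remove(best)
    return ranking

docs = ["a", "b", "c", "d"]
rel = {"a": 0.9, "b": 0.8, "c": 0.4, "d": 0.3}
grp = {"a": 0, "b": 0, "c": 1, "d": 1}
print(mmf_rank(docs, rel, grp, k=4, lambda_=0.5))      # mixes the two groups near the top
```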
Knowledge Graph (KG) embedding is a fundamental problem in data mining research with many real-world applications. It aims to encode the entities and relations in the graph into a low-dimensional vector space, which can then be used by subsequent algorithms. Negative sampling, which samples negative triplets from the non-observed ones in the training data, is an important step in KG embedding. Recently, generative adversarial networks (GANs) have been introduced into negative sampling. By sampling negative triplets with large scores, these methods avoid the problem of vanishing gradients and thus obtain better performance. However, using a GAN makes the original model more complex and harder to train, since reinforcement learning must be used. In this paper, motivated by the observation that negative triplets with large scores are important but rare, we propose to keep track of them directly with a cache. How to sample from and how to update the cache are then the two key questions. We carefully design solutions that are not only efficient but also achieve a good balance between exploration and exploitation. In this way, our method acts as a "distilled" version of previous GAN-based methods, which does not waste training time on additional parameters to fit the full distribution of negative triplets. Extensive experiments show that our method yields significant improvements for various KG embedding models and outperforms state-of-the-art negative sampling methods based on GANs.
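The sketch below illustrates one hedged reading of the cache-based negative sampling loop: each positive triplet keeps a small cache of high-scoring corrupted tails, the training negative is sampled from the cache (exploitation), and the cache is refreshed with freshly scored random corruptions (exploration). The class, parameter names, and scoring function are hypothetical placeholders, not the paper's implementation.

```python
# Hypothetical cache-based negative sampler; `score_fn` is whatever KG model is being trained.
import random

class NegativeCache:
    def __init__(self, n_entities, cache_size=30, n_refresh=50):
        self.n_entities = n_entities
        self.cache_size = cache_size
        self.n_refresh = n_refresh
        self.cache = {}                      # positive triplet -> cached candidate tails

    def sample(self, triplet, score_fn):
        h, r, t = triplet
        cached = self.cache.get(triplet, [])
        # Exploration: score fresh random corruptions and merge them into the cache.
        fresh = [random.randrange(self.n_entities) for _ in range(self.n_refresh)]
        pool = list(set(cached + fresh) - {t})
        pool.sort(key=lambda e: score_fn(h, r, e), reverse=True)
        self.cache[triplet] = pool[: self.cache_size]
        # Exploitation: draw the training negative uniformly from the large-score cache.
        return (h, r, random.choice(self.cache[triplet]))

# Toy usage with a fake scoring function that pretends entities near 500 are "hard".
cache = NegativeCache(n_entities=1000)
print(cache.sample((1, 0, 2), lambda h, r, e: -abs(e - 500)))
```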
Although recommender systems have been comprehensively studied in the past decade, both in industry and academia, most current recommender systems suffer from the following issues: 1) The data sparsity of the user-item matrix seriously affects recommendation quality. As a result, most traditional recommender system approaches are not able to deal with users who have rated few items, which is known as the cold start problem. 2) Traditional recommender systems assume that users are independently and identically distributed and ignore the social relations between users. However, in real-life scenarios, due to the exponential growth of social networking services such as Facebook and Twitter, social connections between different users play a significant role in the recommendation task. In this work, aiming to provide better recommendations by incorporating user social network information, we propose a matrix factorization framework with user social connection constraints. Experimental results on a real-life dataset show that the proposed method performs significantly better than state-of-the-art approaches in terms of MAE and RMSE, especially for cold start users.
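A minimal sketch of matrix factorization with a social-connection constraint is given below, assuming a regularizer that pulls each user's latent factors towards the average factors of that user's social neighbours; the exact constraint form and all hyperparameters are illustrative and may differ from the paper's.

```python
# Illustrative SGD for MF with a social regularizer; hyperparameters are assumptions.
import numpy as np

def social_mf(ratings, friends, n_users, n_items, k=8, lr=0.01,
              reg=0.05, social_reg=0.1, epochs=30, seed=0):
    rng = np.random.default_rng(seed)
    U = 0.1 * rng.normal(size=(n_users, k))
    V = 0.1 * rng.normal(size=(n_items, k))
    for _ in range(epochs):
        for u, i, r in ratings:                        # observed (user, item, rating) triples
            err = r - U[u] @ V[i]
            # Social pull: deviation of U[u] from the mean factors of the user's friends.
            social = U[u] - U[friends[u]].mean(axis=0) if friends.get(u) else 0.0
            U[u] += lr * (err * V[i] - reg * U[u] - social_reg * social)
            V[i] += lr * (err * U[u] - reg * V[i])
    return U, V

ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (2, 1, 1.0)]
friends = {0: [1], 1: [0], 2: []}                      # user 2 is a cold-start user
U, V = social_mf(ratings, friends, n_users=3, n_items=2)
print(np.round(U @ V.T, 2))                            # predicted rating matrix
```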