Initially used to rank web pages, PageRank has since been applied in many fields. In the general case, a graph contains many special vertices, such as dangling vertices and unreferenced vertices. Existing PageRank algorithms usually treat them as `bad' vertices because they cause trouble. In this paper, however, we propose a parallel PageRank algorithm that takes advantage of these special vertices. To this end, we first interpret PageRank from the information-transmitting perspective and give a constructive definition of PageRank. Then, based on this interpretation, we propose a parallel PageRank algorithm that we call the Information Transmitting Algorithm (ITA). We prove that dangling vertices increase ITA's convergence rate, while unreferenced vertices and weak unreferenced vertices reduce ITA's computational cost. Compared with the Monte Carlo method, ITA has a lower bandwidth requirement; compared with the power method, ITA converges faster and requires fewer computations. Finally, experimental results on four data sets demonstrate that ITA is 1.5-4 times faster than the power method and converges more uniformly.
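For context, a minimal sketch of the baseline power method that the abstract compares against, including the standard handling of dangling vertices (rank from vertices with no out-links is redistributed uniformly). The abstract does not spell out ITA itself, so this is only the reference algorithm, with hypothetical argument names.

import numpy as np

def pagerank_power(adj, d=0.85, tol=1e-10, max_iter=1000):
    """Baseline power method for PageRank on a 0/1 adjacency matrix.

    adj[i, j] = 1 means page i links to page j.  Dangling vertices
    (rows with no out-links) have their rank spread uniformly."""
    n = adj.shape[0]
    adj = adj.astype(float)
    out_deg = adj.sum(axis=1)
    dangling = out_deg == 0
    # Row-stochastic transition matrix; dangling rows are left as zeros.
    P = np.zeros_like(adj)
    P[~dangling] = adj[~dangling] / out_deg[~dangling, None]
    r = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        dangling_mass = r[dangling].sum()
        r_new = d * (r @ P + dangling_mass / n) + (1 - d) / n
        if np.abs(r_new - r).sum() < tol:
            return r_new
        r = r_new
    return r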
Natural policy gradient (NPG) methods with entropy regularization achieve impressive empirical success in reinforcement learning problems with large state-action spaces. However, their convergence properties and the impact of entropy regularization remain elusive in the function approximation regime. In this paper, we provide finite-time convergence analyses of entropy-regularized NPG with linear function approximation under softmax parameterization. In particular, we prove that entropy-regularized NPG with averaging satisfies the \emph{persistence of excitation} condition, and achieves a fast convergence rate of $\tilde{O}(1/T)$ up to a function approximation error in regularized Markov decision processes. This convergence result does not require any a priori assumptions on the policies. Furthermore, under mild regularity conditions on the concentrability coefficient and basis vectors, we prove that entropy-regularized NPG exhibits \emph{linear convergence} up to a function approximation error.
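To make the setting concrete, the entropy-regularized objective and the generic natural-gradient update can be written as follows; the notation here is illustrative and may differ from the paper's.
\[
V_\lambda^{\pi}(\rho) \;=\; \mathbb{E}\Big[\sum_{t=0}^{\infty}\gamma^{t}\big(r(s_t,a_t)-\lambda\log\pi(a_t\mid s_t)\big)\,\Big|\,s_0\sim\rho,\ a_t\sim\pi(\cdot\mid s_t)\Big],
\qquad
\theta_{t+1} \;=\; \theta_t + \eta\,F_\rho(\theta_t)^{\dagger}\,\nabla_\theta V_\lambda^{\pi_{\theta_t}}(\rho),
\]
where $\pi_\theta(a\mid s)\propto\exp(\theta^\top\phi(s,a))$ is the softmax policy with feature map $\phi$, $F_\rho(\theta)$ is its Fisher information matrix, and $\dagger$ denotes the pseudoinverse; the function approximation error in the convergence bounds reflects how well the regularized action-value function is represented in the span of $\phi$.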
Diffusion probabilistic models (DPMs) represent a class of powerful generative models. Despite their success, inference in DPMs is expensive, since it generally requires iterating over thousands of timesteps. A key problem in inference is to estimate the variance at each timestep of the reverse process. In this work, we present a surprising result: both the optimal reverse variance and the corresponding optimal KL divergence of a DPM have analytic forms w.r.t. its score function. Building upon this result, we propose Analytic-DPM, a training-free inference framework that estimates these analytic forms using the Monte Carlo method and a pretrained score-based model. Further, to correct the potential bias caused by the score-based model, we derive both lower and upper bounds on the optimal variance and clip the estimate for a better result. Empirically, Analytic-DPM improves the log-likelihood of various DPMs, produces high-quality samples, and meanwhile enjoys a 20x to 80x speedup.
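A minimal sketch of the two computational ingredients the abstract describes: a Monte Carlo estimate of the average squared score norm at one timestep, and clipping the resulting variance estimate to derived bounds. The closed form that combines this quantity with the noise schedule is derived in the paper and is not reproduced here; score_fn and sample_xn are hypothetical interfaces to a pretrained score model and the forward process.

import numpy as np

def estimate_gamma(score_fn, sample_xn, d, n_mc=100):
    """Monte Carlo estimate of E_q ||s_theta(x_n)||^2 / d at one timestep.

    score_fn(x) returns the pretrained score estimate; sample_xn() draws
    x_n ~ q(x_n).  Both are assumed, hypothetical callables."""
    total = 0.0
    for _ in range(n_mc):
        x_n = sample_xn()
        s = score_fn(x_n)
        total += float(np.sum(s ** 2)) / d
    return total / n_mc

def clipped_variance(sigma2_analytic, lower, upper):
    """Clip the analytic variance estimate to the derived bounds, correcting
    potential bias from the imperfect score model."""
    return float(np.clip(sigma2_analytic, lower, upper))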
Motivated by the problem of exploring discrete but very complex state spaces in Bayesian models, we propose a novel Markov Chain Monte Carlo search algorithm: the taxicab sampler. We describe the construction of this sampler and discuss how its interpretation and usage differ from those of standard Metropolis-Hastings and the related Hamming ball sampler. We then show that the proposed algorithm substantially improves computation time without any loss of efficiency relative to a na\"ive Metropolis-Hastings search in a motivating Bayesian regression tree count model. In this model, we leverage the discrete state space assumption to construct a novel likelihood function that flexibly describes different mean-variance relationships while preserving parameter interpretability, compared to existing likelihood functions for count data.
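The taxicab sampler's moves are not specified in the abstract, so the sketch below only shows the na\"ive Metropolis-Hastings baseline on a discrete state space that the abstract compares against; log_post and propose are user-supplied, hypothetical callables.

import math
import random

def metropolis_hastings(log_post, propose, x0, n_iter=10_000):
    """Naive Metropolis-Hastings over a discrete state space.

    log_post(x) -- unnormalized log posterior of a state
    propose(x)  -- returns (x_new, log_q_forward, log_q_backward)
    The proposal here is generic, unlike the structured moves of the
    taxicab or Hamming ball samplers."""
    x, lp = x0, log_post(x0)
    samples = []
    for _ in range(n_iter):
        x_new, log_q_fwd, log_q_bwd = propose(x)
        lp_new = log_post(x_new)
        log_alpha = lp_new - lp + log_q_bwd - log_q_fwd
        if random.random() < math.exp(min(0.0, log_alpha)):
            x, lp = x_new, lp_new
        samples.append(x)
    return samples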
Like most multiobjective combinatorial optimization problems, biobjective optimization problems on matroids are intractable in general, and their corresponding decision problems are NP-hard. In this paper, we consider biobjective optimization problems on matroids where one of the objective functions is restricted to binary cost coefficients. We show that in this case the problem has a connected efficient set with respect to a natural definition of a neighborhood structure and, hence, can be solved efficiently using a neighborhood search approach. This is, to the best of our knowledge, the first non-trivial problem on matroids for which connectedness of the efficient set can be established. The theoretical results are validated by numerical experiments with biobjective minimum spanning tree problems (graphic matroids) and with biobjective knapsack problems with a cardinality constraint (uniform matroids). In the context of the minimum spanning tree problem, coloring all edges with cost 0 green and all edges with cost 1 red leads to an equivalent problem in which we simultaneously minimize, in a Pareto sense, one general objective and the number of red edges (which defines the second objective).
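To illustrate the graphic-matroid case, the sketch below generates the single edge-exchange neighborhood of a spanning tree and evaluates the two objectives (total cost, number of red edges). It is only an illustration of the neighborhood structure over which the connectedness result applies, not the paper's algorithm; cost and red are hypothetical dictionaries keyed by frozenset edges.

import networkx as nx

def edge_swap_neighbors(tree_edges, G):
    """Yield all spanning trees adjacent to the given tree: add one non-tree
    edge of G and remove one edge of the unique cycle this creates.  Edges
    are handled as frozensets so (u, v) and (v, u) compare equal."""
    T = nx.Graph(list(tree_edges))
    tree = set(frozenset(e) for e in tree_edges)
    for u, v in G.edges():
        if frozenset((u, v)) in tree:
            continue
        path = nx.shortest_path(T, u, v)      # tree path closing the cycle
        for a, b in zip(path, path[1:]):      # each edge on that path
            neighbor = set(tree)
            neighbor.discard(frozenset((a, b)))
            neighbor.add(frozenset((u, v)))
            yield frozenset(neighbor)

def objectives(tree, cost, red):
    """Biobjective value of a tree: (total cost, number of red edges)."""
    return (sum(cost[e] for e in tree), sum(red[e] for e in tree))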
Orthogonal polynomials of several variables satisfy a vector-valued three-term recurrence relation, much like the corresponding one-dimensional relation. This relation requires only knowledge of certain recurrence matrices and allows simple and stable evaluation of multivariate orthogonal polynomials. In the univariate case, various algorithms can compute the recurrence coefficients given the ability to compute polynomial moments, but such a procedure has been absent in multiple dimensions. We present a new Multivariate Stieltjes (MS) algorithm that fills this gap, allowing computation of the recurrence matrices when moments are available. The algorithm is essentially explicit in two and three dimensions, but requires the numerical solution of a non-convex problem in more than three dimensions. We demonstrate on several examples in up to three dimensions that the MS algorithm is far more stable than direct Gram-Schmidt-type orthogonalization and allows accurate computation of orthogonal bases in the multivariate setting.
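As a point of reference, here is the univariate analogue of the evaluation the abstract describes: a three-term recurrence driven by recurrence coefficients, under one common orthonormal normalization. The multivariate case replaces the scalar coefficients with the recurrence matrices the MS algorithm computes; the function below is only the scalar sketch.

import numpy as np

def eval_orthopoly(x, a, b):
    """Evaluate orthonormal polynomials p_0, ..., p_N at x via the recurrence
    b[n+1] p_{n+1}(x) = (x - a[n]) p_n(x) - b[n] p_{n-1}(x),
    with p_{-1} = 0 and p_0 = 1 / b[0] (b[0] = sqrt of the zeroth moment).
    a has length N, b has length N + 1."""
    x = np.asarray(x, dtype=float)
    N = len(a)
    P = np.zeros((N + 1,) + x.shape)
    P[0] = 1.0 / b[0]
    if N > 0:
        P[1] = (x - a[0]) * P[0] / b[1]
    for n in range(1, N):
        P[n + 1] = ((x - a[n]) * P[n] - b[n] * P[n - 1]) / b[n + 1]
    return P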
Network Function Virtualization (NFV) carries the potential for on-demand deployment of network algorithms in virtual machines (VMs). In large clouds, however, VM resource allocation incurs delays that hinder the dynamic scaling of such NFV deployment. Parallel resource management is a promising direction for boosting performance, but it may significantly increase the communication overhead and the decline ratio of deployment attempts. Our work analyzes the performance of various placement algorithms and provides empirical evidence that state-of-the-art parallel resource management dramatically increases the decline ratio of deterministic algorithms but hardly affects randomized algorithms. We therefore introduce APSR -- an efficient parallel random resource management algorithm that requires information only from a small number of hosts and dynamically adjusts the degree of parallelism to provide provable decline ratio guarantees. We formally analyze APSR, evaluate it on real workloads, and integrate it into the popular OpenStack cloud management platform. Our evaluation shows that APSR matches the throughput provided by other parallel schedulers, while achieving up to 13x lower decline ratio and a reduction of over 85% in communication overheads.
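The abstract does not detail APSR's internals, so the sketch below only conveys the general idea of a randomized placement that queries a small random sample of hosts; the fixed sample_size knob stands in for APSR's dynamically adjusted degree of parallelism, and all names are hypothetical.

import random

def place_vm(hosts, demand, sample_size=8, rng=random):
    """Randomized placement sketch: query only a small random sample of hosts
    and pick one that can fit the requested resources.

    hosts: dict host_id -> free capacity; demand: required capacity.
    Returns the chosen host, or None for a declined placement attempt."""
    sample = rng.sample(list(hosts), k=min(sample_size, len(hosts)))
    feasible = [h for h in sample if hosts[h] >= demand]
    if not feasible:
        return None                 # attempt declined
    chosen = rng.choice(feasible)
    hosts[chosen] -= demand         # commit the allocation
    return chosen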
In this paper, we study ideals spanned by polynomials or overconvergent series in a Tate algebra. With state-of-the-art algorithms for computing Tate Gr{\"o}bner bases, even when the input consists of polynomials, the size of the output grows with the required precision, both in terms of the size of the coefficients and the size of the support of the series. We prove that ideals spanned by polynomials admit a Tate Gr{\"o}bner basis made of polynomials, and we propose an algorithm, leveraging Mora's weak normal form algorithm, for computing it. As a result, the size of the output of this algorithm grows linearly with the precision. Following the same ideas, we propose an algorithm that computes an overconvergent basis for an ideal spanned by overconvergent series. Finally, we prove the existence of a universal analytic Gr{\"o}bner basis for polynomial ideals in Tate algebras, compatible with all convergence radii.
Huang and Wong [1984] proposed a polynomial-time dynamic-programming algorithm for computing optimal generalized binary split trees. We show that their algorithm is incorrect. Thus, it remains open whether such trees can be computed in polynomial time. Spuler [1994] proposed modifying Huang and Wong's algorithm to obtain an algorithm for a different problem: computing optimal two-way-comparison search trees. We show that the dynamic program underlying Spuler's algorithm is not valid, in that it does not satisfy the necessary optimal-substructure property and its proposed recurrence relation is incorrect. It remains unknown whether the algorithm is guaranteed to compute a correct overall solution.
Despite impressive progress in high-resource settings, Neural Machine Translation (NMT) still struggles in low-resource and out-of-domain scenarios, often failing to match the quality of phrase-based translation. We propose a novel technique that combines back-translation and multilingual NMT to improve performance in these difficult cases. Our technique trains a single model for both directions of a language pair, allowing us to back-translate source or target monolingual data without requiring an auxiliary model. We then continue training on the augmented parallel data, enabling a cycle of improvement for a single model that can incorporate any source, target, or parallel data to improve both translation directions. As a byproduct, these models can reduce training and deployment costs significantly compared to uni-directional models. Extensive experiments show that our technique outperforms standard back-translation in low-resource scenarios, improves quality on cross-domain tasks, and effectively reduces costs across the board.
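A minimal sketch of the back-translation data flow the abstract describes: target-side monolingual sentences are back-translated by the same bidirectional model and paired with their originals as synthetic training data. translate() is a hypothetical stand-in for the bidirectional NMT model, and the <2xx> target-language tag follows the common multilingual-NMT convention rather than anything specified in the abstract.

def make_synthetic_parallel(mono_target, translate, src_lang, tgt_lang):
    """Turn target-side monolingual sentences into synthetic (source, target)
    pairs for continued training of a single bidirectional model."""
    pairs = []
    for tgt_sentence in mono_target:
        # Back-translate the monolingual target sentence into the source language.
        synthetic_src = translate(tgt_sentence, to_lang=src_lang)
        # Tag the synthetic source with the desired output language and keep
        # the original sentence as the reference target.
        pairs.append((f"<2{tgt_lang}> {synthetic_src}", tgt_sentence))
    return pairs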
State-of-the-art methods for learning cross-lingual word embeddings have relied on bilingual dictionaries or parallel corpora. Recent studies showed that the need for parallel data supervision can be alleviated with character-level information. While these methods show encouraging results, they are not on par with their supervised counterparts and are limited to pairs of languages sharing a common alphabet. In this work, we show that we can build a bilingual dictionary between two languages without using any parallel corpora, by aligning monolingual word embedding spaces in an unsupervised way. Without using any character information, our model even outperforms existing supervised methods on cross-lingual tasks for some language pairs. Our experiments demonstrate that our method works well even for distant language pairs, such as English-Russian and English-Chinese. Finally, we describe experiments on the low-resource English-Esperanto language pair, for which only a limited amount of parallel data exists, to show the potential impact of our method in fully unsupervised machine translation. Our code, embeddings and dictionaries are publicly available.
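A minimal sketch of the orthogonal (Procrustes) alignment step commonly used in this line of work once anchor word pairs are available; in the unsupervised setting of the abstract, those anchors would come from an induced dictionary rather than human supervision, and the function below is not the paper's full pipeline.

import numpy as np

def procrustes_align(X, Y):
    """Fit an orthogonal map W minimizing ||W X - Y||_F.

    X, Y: (d, n) matrices whose columns are embeddings of paired source and
    target words.  The closed-form solution is W = U V^T, where
    U S V^T = SVD(Y X^T)."""
    U, _, Vt = np.linalg.svd(Y @ X.T)
    return U @ Vt

# Usage sketch: W = procrustes_align(X_anchor_src, Y_anchor_tgt)
#               mapped_src = W @ X_all_src   # source space mapped into target space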