We formulate a class of angular Gaussian distributions that allows different degrees of isotropy for directional random variables of arbitrary dimension. Through a series of novel reparameterizations, this distribution family is indexed by parameters with meaningful statistical interpretations that can range over the entire real space of an adequate dimension. The new parameterization greatly simplifies maximum likelihood estimation of all model parameters, which in turn leads to theoretically sound and numerically stable procedures for inferring key features of the distribution. Byproducts of the likelihood-based inference are used to develop graphical and numerical diagnostic tools for assessing the goodness of fit of this distribution in a data application. A simulation study and an application to data from a hydrogeology study demonstrate the implementation and performance of the inference procedures and diagnostic methods.
We develop a novel full-Bayesian approach for the estimation of multiple correlated precision matrices, called multiple Graphical Horseshoe (mGHS). The proposed approach relies on a novel multivariate shrinkage prior, based on the Horseshoe prior, that borrows strength and shares sparsity patterns across groups, improving posterior edge selection when the precision matrices are similar; when the groups are independent, there is no loss of performance. Moreover, mGHS provides an estimate of the similarity matrix, useful for understanding network similarities across groups. We implement an efficient Metropolis-within-Gibbs sampler for posterior inference; in particular, the local variance parameters are updated via a novel and efficient modified rejection sampling algorithm that samples from a three-parameter Gamma distribution. The method scales well with the number of variables and provides one of the fastest full-Bayesian approaches for the estimation of multiple precision matrices. Finally, edge selection is performed with a novel approach based on model cuts. We empirically demonstrate that mGHS outperforms competing approaches through both simulation studies and an application to a bike-sharing dataset.
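The modified rejection sampler for the three-parameter Gamma is specific to this paper; as a hedged illustration of the underlying idea only, the sketch below shows textbook rejection sampling from an unnormalized log-density, with all names hypothetical. It is not the authors' algorithm.

```python
import math
import random

def rejection_sample(log_target, log_proposal_pdf, draw_proposal, log_bound):
    """Textbook rejection sampling from an unnormalized density.

    log_target:       log of the unnormalized target density
    log_proposal_pdf: log-density of the proposal distribution
    draw_proposal:    function returning one draw from the proposal
    log_bound:        log M, where target(x) <= M * proposal(x) for all x
    """
    while True:
        x = draw_proposal()
        log_accept = log_target(x) - log_proposal_pdf(x) - log_bound
        if math.log(random.random()) < log_accept:  # accept w.p. target/(M*proposal)
            return x
```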
Recent approaches build on implicit neural representations (INRs) to propose generative models over function spaces. However, they are computationally intensive when dealing with inference tasks, such as missing data imputation, or cannot tackle them at all. In this work, we propose a novel deep generative model, named VAMoH, which combines the capability of modeling continuous functions with INRs and the inference capabilities of Variational Autoencoders (VAEs). In addition, VAMoH relies on a normalizing flow to define the prior and on a mixture of hypernetworks to parametrize the data log-likelihood. This gives VAMoH high expressive capability and interpretability. Through experiments on a diverse range of data types, such as images, voxels, and climate data, we show that VAMoH can effectively learn rich distributions over continuous functions. Furthermore, it can perform inference-related tasks, such as conditional super-resolution generation and in-painting, as well as or better than previous approaches, while being less computationally demanding.
We study the single-site Glauber dynamics for the Hard-core model with fugacity $\lambda$ on the random graph $G(n, d/n)$. We show that for typical instances of $G(n,d/n)$ and for fugacity $\lambda < \frac{d^d}{(d-1)^{d+1}}$, the mixing time of the Glauber dynamics is $n^{1 + O(1/\log \log n)}$. Our result improves on the recent elegant algorithm of [Bezakova, Galanis, Goldberg, Stefankovic; ICALP'22]. The algorithm there is an MCMC-based sampling algorithm, but it is not the Glauber dynamics; our algorithm is simpler, as we use the classic Glauber dynamics. Furthermore, the bounds on the mixing time we prove are smaller than those in the Bezakova et al. paper, hence our algorithm is also faster. The main challenge in our proof is handling vertices of unbounded degree. We provide stronger results with regard to spectral independence via branching values and show that our Gibbs distributions satisfy approximate tensorisation of entropy. We conjecture that the bounds we obtain here are optimal for $G(n,d/n)$. As a corollary of our analysis of the Hard-core model, we also obtain bounds on the mixing time of the Glauber dynamics for the Monomer-dimer model on $G(n,d/n)$. The bounds we get for this model are slightly better than those for the Hard-core model.
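For readers unfamiliar with the chain being analyzed, here is a minimal sketch of the classic single-site Glauber dynamics for the hard-core model (the object of study, not the paper's proof machinery); the graph representation and function names are our own.

```python
import random

def glauber_step(adj, occupied, lam):
    """One single-site update of Glauber dynamics for the hard-core
    model with fugacity lam: pick a uniform vertex and resample its
    spin from the conditional distribution given its neighbors."""
    v = random.randrange(len(adj))
    if any(occupied[u] for u in adj[v]):
        occupied[v] = False                   # a neighbor is occupied: v must be empty
    else:
        occupied[v] = random.random() < lam / (1.0 + lam)

def run_glauber(adj, lam, steps, seed=0):
    """Run the chain from the empty independent set; adj is an
    adjacency list, e.g. of a sparse random graph G(n, d/n)."""
    random.seed(seed)
    occupied = [False] * len(adj)
    for _ in range(steps):
        glauber_step(adj, occupied, lam)
    return occupied
```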
We propose a multivariate probability distribution that models linear correlation between binary and continuous variables. The proposed distribution is a natural extension of the previously developed multivariate binary distribution. As an application, we develop a factor analysis for a mixture of continuous and binary variables, and we discuss the improper solutions associated with factor analysis. As a prescription for avoiding improper solutions, we propose a constraint under which each row vector of the factor loading matrix has the same norm. We numerically validate the proposed factor analysis and the norm-constraint prescription by analyzing real datasets.
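To make the norm constraint concrete, here is a minimal numpy sketch (our own illustration, not the paper's estimation algorithm) that rescales the rows of a loading matrix to a common norm; in the paper the constraint is imposed during estimation rather than post hoc.

```python
import numpy as np

def equalize_row_norms(L):
    """Rescale each row of the factor loading matrix L so that all
    rows share a common norm (here, the root mean square of the
    original row norms).  Assumes no row of L is identically zero.
    Post-hoc illustration of the constraint only."""
    norms = np.linalg.norm(L, axis=1, keepdims=True)
    target = np.sqrt(np.mean(norms ** 2))
    return L * (target / norms)
```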
In this paper, we start with a variation of the star cover problem called the Two-Squirrel problem: given a set $P$ of $2n$ points in the plane and two sites $c_1$ and $c_2$, compute two $n$-stars $S_1$ and $S_2$ centered at $c_1$ and $c_2$, respectively, such that the larger of the weights of $S_1$ and $S_2$ is minimized. This problem is strongly NP-hard, by a reduction from Equal-size Set-Partition with Rationals. We then consider two variations of the Two-Squirrel problem, namely the Two-MST and Two-TSP problems, which are both NP-hard; the NP-hardness of the latter is obvious, while the former requires a non-trivial reduction from Equal-size Set-Partition with Rationals. In terms of approximation algorithms, for Two-MST and Two-TSP we give factor-$3.6402$ and factor-$(4+\varepsilon)$ approximations, respectively. Finally, we also show some interesting polynomial-time solvable cases for Two-MST.
The multivariate adaptive regression spline (MARS) is one of the most popular estimation methods for nonparametric multivariate regression. However, because MARS is based on marginal splines, incorporating interactions of covariates requires products of marginal splines, which leads to an unmanageable number of basis functions when the order of interaction is high and results in low estimation efficiency. In this paper, we improve the performance of MARS by using linear combinations of the covariates that achieve sufficient dimension reduction. The special basis functions of MARS facilitate calculation of gradients of the regression function, and estimation of the linear combinations is obtained via eigen-analysis of the outer product of the gradients. Under some technical conditions, we establish the asymptotic theory for the proposed estimation method. Numerical studies, including both simulations and empirical applications, show its effectiveness in dimension reduction and its improvement over MARS and other commonly used nonparametric methods in regression estimation and prediction.
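The eigen-analysis step can be sketched as follows: a generic outer-product-of-gradients estimator in numpy. In the paper the gradients come from the fitted MARS basis functions, whereas here they are taken as given, and the function name is our own.

```python
import numpy as np

def opg_directions(grads, d):
    """Estimate sufficient-dimension-reduction directions from an
    (n, p) array of gradient estimates of the regression function.

    Averages the outer products g_i g_i^T and returns the top-d
    eigenvectors as a (p, d) basis for the reduced subspace."""
    M = grads.T @ grads / grads.shape[0]    # average outer product of gradients
    eigvals, eigvecs = np.linalg.eigh(M)    # eigenvalues in ascending order
    return eigvecs[:, ::-1][:, :d]          # leading d eigenvectors
```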
We revisit the following problem: given a set of indices $S = \{1, \dots, n\}$ and weights $w_1, \dots, w_n \in \mathbb{R}_{> 0}$, provide samples from $S$ with distribution $p(i) = w_i / W$, where $W = \sum_j w_j$ is the normalizing constant. In the static setting, a simple data structure due to Walker, the Alias Table, allows samples to be drawn in constant time. A more challenging task is to maintain the distribution in a dynamic setting, where elements may be added or removed and weights may change over time; existing solutions restrict the permissible weights, require rebuilding the associated data structure after a number of updates, or are rather complex. In this paper, we describe, analyze, and engineer a simple data structure for maintaining a discrete probability distribution in the dynamic setting. Construction of the data structure for an arbitrary distribution takes time $O(n)$, sampling takes expected time $O(1)$, and an update of size $\Delta = O(W / n)$ can be processed in time $O(1)$. To evaluate the efficiency of the data structure, we conduct an experimental study. The results suggest that the dynamic sampling performance is comparable to that of the static Alias Table, with only a minor slowdown.
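For reference, here is a compact Python sketch of the static Alias Table (Walker's structure in Vose's O(n) formulation) that serves as the baseline; this is the classic construction, not the paper's dynamic data structure.

```python
import random

def build_alias_table(weights):
    """Vose's O(n) construction of Walker's alias table."""
    n = len(weights)
    W = float(sum(weights))
    prob = [w * n / W for w in weights]            # scaled so the average is 1
    alias = [0] * n
    small = [i for i, p in enumerate(prob) if p < 1.0]
    large = [i for i, p in enumerate(prob) if p >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        alias[s] = l                               # s's deficit is covered by l
        prob[l] -= 1.0 - prob[s]
        (small if prob[l] < 1.0 else large).append(l)
    for i in small + large:                        # numerical leftovers
        prob[i] = 1.0
    return prob, alias

def alias_sample(prob, alias):
    """Draw one index in O(1): a fair die plus one biased coin flip."""
    i = random.randrange(len(prob))
    return i if random.random() < prob[i] else alias[i]
```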
We study the interplay between the data distribution and Q-learning-based algorithms with function approximation, providing a unified theoretical and empirical analysis of how different properties of the data distribution influence the performance of these algorithms. We connect different lines of research, as well as validate and extend previous results. We start by reviewing theoretical bounds on the performance of approximate dynamic programming algorithms. We then introduce a novel four-state MDP specifically tailored to highlight the impact of the data distribution on the performance of Q-learning-based algorithms with function approximation, both online and offline. Finally, we experimentally assess the impact of data distribution properties on the performance of two offline Q-learning-based algorithms in different environments. According to our results: (i) high-entropy data distributions are well-suited for learning in an offline manner; and (ii) a certain degree of data diversity (data coverage) and data quality (closeness to the optimal policy) are jointly desirable for offline learning.
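As a concrete example of one of the data-distribution properties examined, the snippet below computes the Shannon entropy of an empirical state-action visitation distribution; this is our own illustrative implementation of the metric, not code from the paper.

```python
import numpy as np

def visitation_entropy(counts):
    """Shannon entropy (in nats) of the empirical state-action
    visitation distribution of a dataset; counts is an array of
    visit counts per (state, action) pair."""
    p = counts / counts.sum()
    p = p[p > 0]                      # ignore unvisited pairs (0 log 0 = 0)
    return float(-(p * np.log(p)).sum())
```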
We present an efficient semiparametric variational method to approximate the Gibbs posterior distribution of Bayesian regression models that predict the data through a linear combination of the available covariates. Notable special cases include generalized linear mixed models, support vector machines, and quantile and expectile regression. The variational optimization algorithm we propose involves only the calculation of univariate numerical integrals when no analytic solutions are available; neither differentiability, nor conjugacy, nor elaborate data-augmentation strategies are required. We discuss several generalizations of the proposed approach that account for additive models, shrinkage priors, and dynamic and spatial models, providing a unifying framework for statistical learning that covers a wide range of applications. The properties of our semiparametric variational approximation are then assessed through a theoretical analysis and an extensive simulation study, in which we compare our proposal with Markov chain Monte Carlo, conjugate mean-field variational Bayes, and the Laplace approximation in terms of signal reconstruction, posterior approximation accuracy, and execution time. A real-data example is then presented through a probabilistic load forecasting application on US power load consumption data.
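To illustrate the computational kernel, the sketch below evaluates a univariate Gaussian expectation by numerical quadrature, the kind of one-dimensional integral the variational updates reduce to when no closed form exists; the function names and the choice of quadrature routine are our own.

```python
import numpy as np
from scipy.integrate import quad

def gaussian_expectation(psi, mu, sigma):
    """E[psi(X)] for X ~ N(mu, sigma^2) by univariate quadrature."""
    def integrand(x):
        z = (x - mu) / sigma
        return psi(x) * np.exp(-0.5 * z * z) / (sigma * np.sqrt(2.0 * np.pi))
    value, _ = quad(integrand, mu - 10 * sigma, mu + 10 * sigma)
    return value

# Example: expected hinge loss under a Gaussian variational factor.
# gaussian_expectation(lambda x: max(0.0, 1.0 - x), mu=0.5, sigma=1.0)
```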
Structure learning is a core problem in AI, central to the fields of neuro-symbolic AI and statistical relational learning; it consists of automatically learning a logical theory from data. The basis for structure learning is mining repeating patterns in the data, known as structural motifs. Finding these patterns reduces the exponential search space and thereby guides the learning of formulas. Despite the importance of motif learning, it is still not well understood. We present the first principled approach for mining structural motifs in lifted graphical models (languages that blend first-order logic with probabilistic models), using a stochastic process to measure the similarity of entities in the data. Our first contribution is an algorithm that depends on two intuitive hyperparameters: one controlling the uncertainty in the entity similarity measure, and one controlling the softness of the resulting rules. Our second contribution is a preprocessing step in which we perform hierarchical clustering on the data to reduce the search space to the most relevant data. Our third contribution is an O(n ln n) algorithm (in the number of entities in the data) for clustering structurally related data. We evaluate our approach on standard benchmarks and show that we outperform state-of-the-art structure learning approaches by up to 6% in accuracy and up to 80% in runtime.