We prove that the $f$-divergences between univariate Cauchy distributions are all symmetric, and can be expressed as strictly increasing scalar functions of the symmetric chi-squared divergence. We report the corresponding scalar functions for the total variation distance, the Kullback-Leibler divergence, the squared Hellinger divergence, and the Jensen-Shannon divergence, among others. Next, we give conditions under which the $f$-divergences expand as convergent infinite series of higher-order power chi divergences, and illustrate the criterion on convergent Taylor series expressing the $f$-divergences between Cauchy distributions. We then show that this symmetry property of $f$-divergences holds for multivariate location-scale families with prescribed scale matrices provided that the standard density is even, a condition which covers the multivariate normal and Cauchy families. However, the $f$-divergences between multivariate Cauchy densities with different scale matrices are shown to be asymmetric. Finally, we present several metrizations of $f$-divergences between univariate Cauchy distributions and further report geometric embedding properties of the Kullback-Leibler divergence.
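As a concrete illustration of the symmetry, the closed forms commonly reported in the literature for Cauchy densities $p_{l,s}$ are $\chi^2(p_{l_1,s_1},p_{l_2,s_2}) = ((l_1-l_2)^2+(s_1-s_2)^2)/(2 s_1 s_2)$ and $\mathrm{KL} = \log(1+\chi^2/2)$; the following numerical sketch (an illustration, not the paper's code) checks both the symmetry and the scalar-function relation:

```python
# Numerical sketch (not the paper's code): check that the KL divergence
# between two univariate Cauchy densities is symmetric and matches the
# closed form KL = log(1 + chi2/2), where chi2 is the chi-squared divergence.
import numpy as np
from scipy.integrate import quad

def cauchy_pdf(x, l, s):
    return s / (np.pi * (s**2 + (x - l)**2))

def kl_numeric(l1, s1, l2, s2):
    integrand = lambda x: cauchy_pdf(x, l1, s1) * np.log(
        cauchy_pdf(x, l1, s1) / cauchy_pdf(x, l2, s2))
    return quad(integrand, -np.inf, np.inf)[0]

def chi2_closed(l1, s1, l2, s2):
    # Symmetric chi-squared divergence between Cauchy(l1,s1) and Cauchy(l2,s2).
    return ((l1 - l2)**2 + (s1 - s2)**2) / (2 * s1 * s2)

l1, s1, l2, s2 = 0.0, 1.0, 2.0, 3.0
print(kl_numeric(l1, s1, l2, s2))                    # forward KL
print(kl_numeric(l2, s2, l1, s1))                    # reverse KL: same value
print(np.log(1 + chi2_closed(l1, s1, l2, s2) / 2))   # closed form, same value
```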
In this paper, we introduce reduced-bias estimators for the tail index of a Pareto-type distribution, obtained via regularised weighted least squares within an exponential regression model for the log-spacings of the top order statistics. The asymptotic properties of the proposed estimators are investigated analytically: they are shown to be asymptotically unbiased, consistent, and normally distributed. The finite-sample behaviour of the estimators is studied through a simulation study, in which the proposed estimators are found to yield low bias and mean squared error (MSE). In addition, the proposed estimators are illustrated by estimating the tail index of the underlying distribution of claims from the insurance industry.
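The paper's exact regularised estimator is not reproduced here; as a minimal sketch of the ingredients, the following fits a Beirlant-type exponential regression model $E[Z_j] \approx \gamma + b\,(j/(k+1))^{-\rho}$ to the scaled log-spacings $Z_j$ via least squares, with the second-order parameter $\rho$ fixed for illustration (the Hill estimator is the unweighted mean of the $Z_j$):

```python
# Minimal sketch (not the authors' exact estimator): reduced-bias tail index
# estimation via least squares on an exponential regression model for the
# log-spacings of top order statistics, with rho fixed at -1 for illustration.
import numpy as np

def log_spacings(x, k):
    """Scaled log-spacings Z_j = j*(log X_{(n-j+1)} - log X_{(n-j)}), j=1..k."""
    xs = np.sort(x)
    n = len(xs)
    j = np.arange(1, k + 1)
    return j * (np.log(xs[n - j]) - np.log(xs[n - j - 1]))

def hill(x, k):
    return log_spacings(x, k).mean()

def reduced_bias_ls(x, k, rho=-1.0):
    """Fit E[Z_j] ~ gamma + b*(j/(k+1))^(-rho); intercept gamma is the estimate."""
    z = log_spacings(x, k)
    u = (np.arange(1, k + 1) / (k + 1.0)) ** (-rho)
    A = np.column_stack([np.ones(k), u])
    gamma, b = np.linalg.lstsq(A, z, rcond=None)[0]
    return gamma

rng = np.random.default_rng(0)
x = rng.pareto(2.0, 5000) + 1.0   # Pareto-type sample, true tail index 0.5
k = 500
print(hill(x, k), reduced_bias_ls(x, k))
```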
We introduce a prior for the parameters of univariate continuous distributions, based on the Wasserstein information matrix, which is invariant under reparameterisations. We briefly discuss the links between the proposed prior and information geometry. We present several examples where we can either obtain this prior in closed form, or propose a numerically tractable approximation for cases where it is not available in closed form. Since this prior is improper in some cases, we present sufficient conditions for the propriety of the posterior distribution for general classes of models.
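For univariate models the Wasserstein information reduces to a one-dimensional integral, $G_W(\theta) = \int (\partial_\theta F(x;\theta))^2 / p(x;\theta)\, dx$; assuming, by analogy with Jeffreys, a prior proportional to $\sqrt{G_W}$ (an assumption for illustration, not necessarily the paper's exact construction), the following sketch evaluates it numerically for the exponential model, where $G_W = 2/\lambda^4$ in closed form:

```python
# Numerical sketch under stated assumptions (not the paper's code): for a
# univariate model, take the Wasserstein information
#   G_W(theta) = int (d/dtheta F(x; theta))^2 / p(x; theta) dx
# and, by analogy with Jeffreys, a prior proportional to sqrt(G_W).
# For p(x; lam) = lam*exp(-lam*x): G_W = 2/lam^4, hence prior ~ 1/lam^2.
import numpy as np
from scipy.integrate import quad

def wasserstein_info_exponential(lam):
    dF = lambda x: x * np.exp(-lam * x)     # d/dlam of F(x) = 1 - exp(-lam*x)
    p = lambda x: lam * np.exp(-lam * x)
    return quad(lambda x: dF(x)**2 / p(x), 0, np.inf)[0]

for lam in (0.5, 1.0, 2.0):
    print(lam, wasserstein_info_exponential(lam), 2 / lam**4)  # numeric vs closed form
```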
A triangle in a hypergraph $\mathcal{H}$ is a set of three distinct edges $e, f, g\in\mathcal{H}$ and three distinct vertices $u, v, w\in V(\mathcal{H})$ such that $\{u, v\}\subseteq e$, $\{v, w\}\subseteq f$, $\{w, u\}\subseteq g$ and $\{u, v, w\}\cap e\cap f\cap g=\emptyset$. Johansson proved in 1996 that $\chi(G)=\mathcal{O}(\Delta/\log\Delta)$ for any triangle-free graph $G$ with maximum degree $\Delta$. Cooper and Mubayi later generalized Johansson's theorem to all rank $3$ hypergraphs. In this paper we provide a common generalization of both these results to all hypergraphs, showing that if $\mathcal{H}$ is a rank $k$, triangle-free hypergraph, then the list chromatic number satisfies \[ \chi_{\ell}(\mathcal{H})\leq \mathcal{O}\left(\max_{2\leq \ell \leq k} \left\{\left( \frac{\Delta_{\ell}}{\log \Delta_{\ell}} \right)^{\frac{1}{\ell-1}} \right\}\right), \] where $\Delta_{\ell}$ is the maximum $\ell$-degree of $\mathcal{H}$. The result is sharp apart from the constant. Moreover, our result implies, generalizes, and improves several earlier results on the chromatic number and the independence number of hypergraphs, while its proof is based on a different approach from those of prior works on hypergraphs (and therefore provides alternative proofs of these results). In particular, as an application, we establish a bound on the chromatic number of sparse hypergraphs in which each vertex is contained in few triangles, and thus extend results of Alon, Krivelevich and Sudakov, and of Cooper and Mubayi, from hypergraphs of rank 2 and 3, respectively, to all hypergraphs.
Let $G$ be a connected graph. The eccentricity of a path $P$, denoted by ecc$_G(P)$, is the maximum distance from $P$ to any vertex in $G$. In the \textsc{Central path} (CP) problem, the aim is to find a path of minimum eccentricity. This problem was introduced by Cockayne et al. in 1981 in the study of different centrality measures on graphs. They showed that CP can be solved in linear time on trees, but it is known to be NP-hard in many classes of graphs, such as chordal bipartite graphs, planar 3-connected graphs, and split graphs. We investigate the path eccentricity of a connected graph~$G$ as a parameter. Let pe$(G)$ denote the value of ecc$_G(P)$ for a central path $P$ of $G$. We obtain tight upper bounds for pe$(G)$ in some graph classes: we show that pe$(G) \leq 1$ on biconvex graphs and that pe$(G) \leq 2$ on convex bipartite graphs. Moreover, we design algorithms that find such a path in linear time. On the other hand, by investigating the longest paths of a graph, we obtain tight upper bounds for pe$(G)$ on general graphs and $k$-connected graphs. Finally, we study the relation between a central path and a longest path in a graph. We show that on trees and bipartite permutation graphs, a longest path is also a central path. Furthermore, for superclasses of these graphs, we exhibit counterexamples to this property.
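Evaluating ecc$_G(P)$ for a given path $P$ is itself straightforward via a multi-source BFS from the vertices of $P$, in $O(|V|+|E|)$ time; a minimal sketch:

```python
# Minimal sketch: compute the eccentricity ecc_G(P) of a path P in a graph G
# with a multi-source BFS from the vertices of P, in O(|V| + |E|) time.
from collections import deque

def path_eccentricity(adj, path):
    """adj: dict mapping vertex -> iterable of neighbours; path: list of vertices."""
    dist = {v: 0 for v in path}
    q = deque(path)
    while q:
        u = q.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                q.append(w)
    return max(dist.values())  # assumes G is connected

# Example: in a star with centre c, the single-vertex path [c] has eccentricity 1.
adj = {'c': ['a', 'b', 'd'], 'a': ['c'], 'b': ['c'], 'd': ['c']}
print(path_eccentricity(adj, ['c']))  # 1
```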
This paper uses techniques from random matrix theory to find the ideal training/testing data split for a simple linear regression with $m$ data points, each an independent $n$-dimensional multivariate Gaussian. It defines "ideal" as satisfying the integrity metric, i.e., the empirical model error equals the actual measurement noise and thus fairly reflects the value (or lack thereof) of the model. This paper is the first to solve for the training and test set sizes for any model in a way that is truly optimal. The number of data points in the training set is the root of a quartic polynomial derived in Theorem 1, which depends only on $m$ and $n$; the covariance matrix of the multivariate Gaussian, the true model parameters, and the true measurement noise all drop out of the calculations. The critical mathematical difficulties were realizing that the problems herein fit within the framework of the Jacobi ensemble, a probability distribution describing the eigenvalues of a known random matrix model, and evaluating a new integral in the style of Selberg and Aomoto. The mathematical results are supported by thorough computational evidence. This paper is a step towards automatic choices of training/test set sizes in machine learning.
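The quartic of Theorem 1 is not reproduced here; as an illustrative simulation of the integrity criterion only (all sizes below are hypothetical), one can compare the empirical test error of a least-squares fit with the true noise variance across candidate training sizes:

```python
# Illustrative simulation only (the paper's quartic is not reproduced here):
# for y = X beta + eps with Gaussian X and noise, compare the empirical test
# error of a least-squares fit with the true noise variance as the training
# size varies; the "integrity" criterion asks for test MSE ~ sigma^2.
import numpy as np

rng = np.random.default_rng(0)
m, n, sigma = 200, 10, 0.5
X = rng.standard_normal((m, n))
beta = rng.standard_normal(n)
y = X @ beta + sigma * rng.standard_normal(m)

for m_train in (20, 50, 100, 150):
    Xtr, ytr = X[:m_train], y[:m_train]
    Xte, yte = X[m_train:], y[m_train:]
    bhat = np.linalg.lstsq(Xtr, ytr, rcond=None)[0]
    test_mse = np.mean((yte - Xte @ bhat) ** 2)
    print(m_train, test_mse, sigma**2)
```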
A new approach to $L_2$-consistent estimation of a general density functional using $k$-nearest neighbor distances is proposed, where the functional under consideration is the expectation of some function $f$ of the densities at each point. The estimator is designed to be asymptotically unbiased, using the convergence of the normalized volume of a $k$-nearest neighbor ball to a Gamma distribution in the large-sample limit, and naturally involves the inverse Laplace transform of a scaled version of the function $f$. Some instantiations of the proposed estimator recover existing $k$-nearest neighbor based estimators of the Shannon and R\'enyi entropies and the Kullback--Leibler and R\'enyi divergences, and yield new consistent estimators for many other functionals, such as logarithmic entropies and divergences. The $L_2$-consistency of the proposed estimator is established for a broad class of densities and general functionals, and the convergence rate in mean squared error is established as a function of the sample size for smooth, bounded densities.
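One instantiation mentioned above is Shannon entropy, recovered as the classical Kozachenko--Leonenko $k$-nearest neighbor estimator; a sketch of that special case (the general inverse-Laplace construction is not reproduced here):

```python
# Sketch of the classical Kozachenko-Leonenko k-NN Shannon entropy estimator,
# one of the existing instantiations the abstract refers to.
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma, gammaln

def kl_entropy(x, k=3):
    """x: (n, d) array of samples; returns an estimate of H(p) in nats."""
    n, d = x.shape
    eps = cKDTree(x).query(x, k + 1)[0][:, -1]             # distance to k-th neighbour
    log_vd = (d / 2) * np.log(np.pi) - gammaln(d / 2 + 1)  # log volume of the unit ball
    return digamma(n) - digamma(k) + log_vd + d * np.mean(np.log(eps))

rng = np.random.default_rng(0)
x = rng.standard_normal((5000, 2))
print(kl_entropy(x))          # estimate for a 2D standard normal
print(1 + np.log(2 * np.pi))  # true entropy, approx. 2.8379 nats
```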
How can we approximate sparse graphs and sequences of sparse graphs (with unbounded average degree)? We consider convergence of the first $k$ moments of the graph spectrum (equivalent to the numbers of closed $k$-walks), appropriately normalized. We introduce a simple, easy-to-sample random graph model that captures the limiting spectra of many sequences of interest, including the sequence of hypercube graphs. The Random Overlapping Communities (ROC) model is specified by a distribution on pairs $(s,q)$, $s \in \mathbb{Z}_+, q \in (0,1]$. A graph on $n$ vertices with average degree $d$ is generated by repeatedly picking pairs $(s,q)$ from the distribution and adding an Erd\H{o}s-R\'{e}nyi random graph of edge density $q$ on a subset of vertices chosen by including each vertex independently with probability $s/n$, until the expected degree is $d$. Our proof of convergence to a ROC random graph is based on the Stieltjes moment condition. We also show that the model is an effective approximation for individual graphs: for almost all possible triangle-to-edge and four-cycle-to-edge ratios, there exists a pair $(s,q)$ such that the ROC model with this single community type produces graphs with both desired ratios, a property that cannot be achieved by stochastic block models of bounded description size. Moreover, ROC graphs exhibit an inverse relationship between degree and clustering coefficient, a characteristic of many real-world networks.
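The generative process is simple enough to sample directly; below is a minimal sketch for a single community type $(s,q)$, with the round count calibrated so that the expected degree is roughly $d$ (an illustrative calibration, not the paper's):

```python
# Minimal sketch of sampling a Random Overlapping Communities (ROC) graph for
# a single community type (s, q). Each round picks a community by including
# each vertex independently with probability s/n, then adds an Erdos-Renyi
# graph of density q on it. A vertex gains about (s/n)*(s-1)*q expected degree
# per round, so roughly d*n/(s*(s-1)*q) rounds give expected degree ~ d.
import itertools
import numpy as np

def sample_roc(n, d, s, q, seed=0):
    rng = np.random.default_rng(seed)
    edges = set()
    rounds = int(round(d * n / (s * (s - 1) * q)))
    for _ in range(rounds):
        community = np.flatnonzero(rng.random(n) < s / n)
        for u, v in itertools.combinations(community, 2):
            if rng.random() < q:
                edges.add((min(u, v), max(u, v)))
    return edges

edges = sample_roc(n=2000, d=10, s=40, q=0.5)
print(2 * len(edges) / 2000)   # average degree, roughly 10
```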
It is argued that there is a need for fat-tailed distributions that become thin in the extreme tail. A 3-parameter distribution is introduced that visually resembles the t-distribution and interpolates between the normal distribution and the Cauchy distribution. It is fat-tailed, but all of its moments are finite and its moment-generating function exists. It would be useful as an alternative to the t-distribution: for sensitivity analyses that check the robustness of results, or for computations where finite moments are needed, such as option pricing. It can be motivated probabilistically in at least two ways: as the random thinning of a long-tailed distribution, or as random variation of the variance of a normal distribution. Its properties are described, algorithms for random-number generation are provided, and examples of its use in data fitting are given. Some related distributions are also discussed, including asymmetric and multivariate ones.
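The specific 3-parameter density is not reproduced here, but the second motivation, random variation of the variance of a normal, is the familiar scale-mixture mechanism; the generic sketch below (with a hypothetical bounded mixing law standing in for the paper's construction) contrasts it with the inverse-gamma mixing that produces the t-distribution:

```python
# Generic illustration of the scale-mixture mechanism the abstract mentions
# (NOT the paper's specific 3-parameter distribution): X = sigma * Z with Z
# standard normal. Inverse-gamma sigma^2 gives the t-distribution (fat tails,
# infinite higher moments); a bounded mixing law, a hypothetical stand-in
# here, gives fat-looking tails with all moments finite.
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
z = rng.standard_normal(n)

nu = 3.0
sigma2_t = nu / rng.chisquare(nu, n)        # inverse-gamma mixing: Student-t
sigma2_bounded = rng.uniform(0.5, 4.0, n)   # bounded mixing: all moments finite

for name, x in [("t-like", z * np.sqrt(sigma2_t)),
                ("bounded mixture", z * np.sqrt(sigma2_bounded))]:
    print(name, np.quantile(np.abs(x), 0.999), np.mean(x**4))
```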
Instrumental variable models allow us to identify a causal function between covariates $X$ and a response $Y$, even in the presence of unobserved confounding. Most existing estimators assume that the error term in the response $Y$ and the hidden confounders are uncorrelated with the instruments $Z$. This is often motivated by a graphical separation argument that in fact justifies full independence. Imposing an independence condition, however, leads to strictly stronger identifiability results. We connect this observation to existing literature in econometrics and provide a practical method for exploiting independence that can be combined with any gradient-based learning procedure. We see that even in identifiable settings, taking higher moments into account may yield better finite-sample results. Furthermore, we exploit independence for distribution generalization: we prove that the proposed estimator is invariant to distributional shifts on the instruments and worst-case optimal whenever these shifts are sufficiently strong. These results hold even in the under-identified case where the instruments are not sufficiently rich to identify the causal function.
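One common way to operationalize independence, used here as an illustrative stand-in for the proposed method, is to fit $f$ by minimizing a kernel dependence measure such as HSIC between the residuals $Y - f(X)$ and the instruments $Z$:

```python
# Illustrative sketch (a stand-in, not the paper's estimator): fit a linear
# causal function f(x) = theta * x by minimizing an HSIC dependence measure
# between the residuals Y - f(X) and the instrument Z, rather than only the
# moment condition E[(Y - f(X)) Z] = 0.
import numpy as np
from scipy.optimize import minimize_scalar

def hsic(a, b):
    """Biased HSIC estimator with Gaussian kernels (median-heuristic bandwidth)."""
    def gram(x):
        d2 = (x[:, None] - x[None, :]) ** 2
        return np.exp(-d2 / np.median(d2[d2 > 0]))
    n = len(a)
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(gram(a) @ H @ gram(b) @ H) / n**2

rng = np.random.default_rng(0)
n = 400
z = rng.standard_normal(n)                      # instrument
h = rng.standard_normal(n)                      # hidden confounder
x = z + h + 0.3 * rng.standard_normal(n)        # covariate
y = 2.0 * x + h + 0.3 * rng.standard_normal(n)  # true causal effect: 2.0

res = minimize_scalar(lambda t: hsic(y - t * x, z), bounds=(0, 4), method="bounded")
print(res.x)   # close to 2.0, while OLS of y on x is biased upward
```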
Network embedding aims to learn latent, low-dimensional vector representations of network nodes that are effective in supporting various network analytic tasks. While prior art on network embedding focuses primarily on preserving network topology to learn node representations, recently proposed attributed network embedding algorithms integrate rich node content information with network topological structure to enhance the quality of the embedding. In reality, networks often have sparse content, incomplete node attributes, and a discrepancy between the node attribute feature space and the network structure space, which severely deteriorates the performance of existing methods. In this paper, we propose a unified framework for attributed network embedding, attri2vec, that learns node embeddings by discovering a latent node attribute subspace via a network-structure-guided transformation of the original attribute space. The resulting latent subspace respects network structure in a more consistent way, towards learning high-quality node representations. We formulate an optimization problem, which is solved by an efficient stochastic gradient descent algorithm with time complexity linear in the number of nodes. We investigate a series of linear and non-linear transformations of node attributes and empirically validate their effectiveness on various types of networks. Another advantage of attri2vec is its ability to solve out-of-sample problems: embeddings of newly arriving nodes can be inferred from their attributes through the learned mapping function. Experiments on various types of networks confirm that attri2vec is superior to state-of-the-art baselines for node classification, node clustering, and out-of-sample link prediction tasks. The source code of this paper is available at //github.com/daokunzhang/attri2vec.
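A minimal sketch of the core idea (an attribute-to-embedding mapping trained against structural context with a skip-gram-style loss; the hyperparameters and the sampling of context pairs are illustrative, not the released implementation):

```python
# Minimal sketch of the attri2vec idea (illustrative, not the released code):
# node embeddings are a learned transformation h_i = tanh(X_i @ W) of the
# attribute matrix X, trained so that nodes co-occurring in random walks get
# similar embeddings (skip-gram with negative sampling). New nodes are then
# embedded from attributes alone via the learned W.
import numpy as np

def train_attri2vec_sketch(X, context_pairs, dim=16, epochs=5, lr=0.05, neg=2, seed=0):
    rng = np.random.default_rng(seed)
    n, f = X.shape
    W = 0.1 * rng.standard_normal((f, dim))
    for _ in range(epochs):
        for u, v in context_pairs:
            targets = [(v, 1.0)] + [(rng.integers(n), 0.0) for _ in range(neg)]
            for t, label in targets:
                hu, ht = np.tanh(X[u] @ W), np.tanh(X[t] @ W)
                p = 1.0 / (1.0 + np.exp(-hu @ ht))   # sigmoid score
                g = p - label                        # d loss / d (hu . ht)
                # Backpropagate through tanh at both endpoints.
                W -= lr * g * (np.outer(X[u], (1 - hu**2) * ht)
                               + np.outer(X[t], (1 - ht**2) * hu))
    return W

X = np.eye(6)                               # toy one-hot node attributes
pairs = [(0, 1), (1, 2), (3, 4), (4, 5)]    # co-occurrence pairs from random walks
W = train_attri2vec_sketch(X, pairs)
# Out-of-sample: a new node with attributes x_new is embedded as tanh(x_new @ W).
```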