The circular uniform distribution on the unit circle is closed under summation, that is, the sum of independent circular uniformly distributed random variables is also circular uniformly distributed. In this study, it is shown that a family of circular distributions based on nonnegative trigonometric sums (NNTS) is also closed under summation. Given the flexibility of NNTS circular distributions to model multimodality and skewness, these are good candidates for use as alternative models to test for circular uniformity to detect different deviations from the null hypothesis of circular uniformity. The circular uniform distribution is a member of the NNTS family, but in the NNTS parameter space, it corresponds to a point on the boundary of the parameter space, implying that the regularity conditions are not satisfied when the parameters are estimated by using the maximum likelihood method. Two NNTS tests for circular uniformity were developed by considering the standardised maximum likelihood estimator and the generalised likelihood ratio. Given the nonregularity condition, the critical values of the proposed NNTS circular uniformity tests were obtained via simulation and interpolated for any sample size by the fitting of regression models. The validity of the proposed NNTS circular uniformity tests was evaluated by generating NNTS models close to the circular uniformity null hypothesis.
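The closure property can be checked numerically through trigonometric moments. The sketch below is not the authors' code. It assumes the common NNTS convention f(theta) = |sum_k c_k exp(i k theta)|^2 with sum_k |c_k|^2 = 1/(2 pi), and all function names are hypothetical. The moments of a sum of independent circular variables are the products of the individual moments, so the sum has no harmonics above order M. Its density is a nonnegative trigonometric polynomial, and the Fejér-Riesz theorem guarantees that such a polynomial can again be written as a squared modulus.

```python
import numpy as np

def nnts_moments(c):
    """Moments E[exp(i*m*theta)], m = -M..M, of the density |sum_k c_k exp(i*k*theta)|^2."""
    c = np.asarray(c, dtype=complex)
    # Assumed normalisation: sum_k |c_k|^2 = 1/(2*pi), so the density integrates to 1.
    c = c / np.sqrt(2 * np.pi * np.sum(np.abs(c) ** 2))
    M = len(c) - 1
    phi = np.zeros(2 * M + 1, dtype=complex)
    for k in range(M + 1):
        for l in range(M + 1):
            # The term c_k conj(c_l) exp(i(k-l)theta) contributes to the moment m = l - k.
            phi[(l - k) + M] += 2 * np.pi * c[k] * np.conj(c[l])
    return phi

def sum_moments(phi_a, phi_b):
    """Moments of the sum (mod 2*pi) of two independent circular variables."""
    Ma, Mb = (len(phi_a) - 1) // 2, (len(phi_b) - 1) // 2
    M = max(Ma, Mb)
    # Zero-pad the shorter moment vector symmetrically, then multiply elementwise.
    return np.pad(phi_a, (M - Ma, M - Ma)) * np.pad(phi_b, (M - Mb, M - Mb))

def density_from_moments(phi, theta):
    """Fourier inversion: f(theta) = (1/(2*pi)) * sum_m phi_m exp(-i*m*theta)."""
    M = (len(phi) - 1) // 2
    m = np.arange(-M, M + 1)
    return np.real(np.exp(-1j * np.outer(theta, m)) @ phi) / (2 * np.pi)
```

Adding a uniform variable (the case c = [1], M = 0) to any NNTS variable leaves only the zeroth moment, so the result is the circular uniform density 1/(2 pi).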
We propose a new algorithm for efficiently solving linear systems involving the damped Fisher matrix in large-scale scenarios where the number of parameters significantly exceeds the number of available samples. This problem is fundamental for natural gradient descent and stochastic reconfiguration. Our algorithm is based on Cholesky decomposition and is generally applicable. Benchmark results show that the algorithm is significantly faster than existing methods.
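To illustrate why a Cholesky-based approach pays off when parameters far outnumber samples, here is a minimal sketch. It uses the standard Woodbury identity and is not necessarily the authors' algorithm. It assumes the Fisher matrix has the form F = O^T O / n, where O is an n-by-p matrix of per-sample gradients. The names `O`, `g`, and `damping` are hypothetical. Only an n-by-n matrix is factorized, never the p-by-p Fisher matrix.

```python
import numpy as np

def damped_fisher_solve(O, g, damping):
    """Solve (O^T O / n + damping * I) x = g for O of shape (n, p) with n << p.

    Woodbury identity:
        x = (g - O^T (O O^T + n * damping * I)^{-1} O g) / damping
    """
    n, _ = O.shape
    K = O @ O.T + n * damping * np.eye(n)   # small n-by-n symmetric positive definite matrix
    L = np.linalg.cholesky(K)               # K = L L^T
    # Two solves with the Cholesky factors. A dedicated triangular solver
    # would be faster; the generic solver keeps the sketch NumPy-only.
    y = np.linalg.solve(L, O @ g)
    z = np.linalg.solve(L.T, y)
    return (g - O.T @ z) / damping
```

The dominant costs are forming O O^T and the matrix-vector products with O, both linear in p, plus a Cholesky factorization that is cubic only in n.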
Generalized variational inference (GVI) provides an optimization-theoretic framework for statistical estimation that encapsulates many traditional estimation procedures. The typical GVI problem is to compute a distribution of parameters that maximizes the expected payoff minus the divergence of the distribution from a specified prior. In this way, GVI enables likelihood-free estimation with the ability to control the influence of the prior by tuning the so-called learning rate. Recently, GVI was shown to outperform traditional Bayesian inference when the model and prior distribution are misspecified. In this paper, we introduce and analyze a new GVI formulation based on utility theory and risk management. Our formulation is to maximize the expected payoff while enforcing constraints on the maximizing distribution. We recover the original GVI distribution by choosing the feasible set to include a constraint on the divergence of the distribution from the prior. In doing so, we automatically determine the learning rate as the Lagrange multiplier for the constraint. In this setting, we are able to transform the infinite-dimensional estimation problem into a two-dimensional convex program. This reformulation further provides an analytic expression for the optimal density of parameters. In addition, we prove asymptotic consistency results for empirical approximations of our optimal distributions. Throughout, we draw connections between our estimation procedure and risk management. In fact, we demonstrate that our estimation procedure is equivalent to evaluating a risk measure. We test our procedure on an estimation problem with a misspecified model and prior distribution, and conclude with some extensions of our approach.
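The way a divergence constraint determines the learning rate can be sketched in the simplest case. The derivation below assumes the divergence is the Kullback-Leibler divergence and writes the payoff as $u(\theta)$, the prior as $\pi$, and the constraint level as $\varepsilon$. These symbols are illustrative and not taken from the paper, whose feasible set may be more general.

$$
\max_{q}\ \mathbb{E}_{\theta\sim q}[u(\theta)] \quad \text{subject to} \quad \mathrm{KL}(q\,\|\,\pi)\le\varepsilon
$$

$$
\mathcal{L}(q,\lambda)=\mathbb{E}_{\theta\sim q}[u(\theta)]-\lambda\bigl(\mathrm{KL}(q\,\|\,\pi)-\varepsilon\bigr),\qquad \lambda\ge 0
$$

$$
q^{\star}(\theta)=\frac{\pi(\theta)\,\exp\bigl(u(\theta)/\lambda\bigr)}{\int \pi(\vartheta)\,\exp\bigl(u(\vartheta)/\lambda\bigr)\,d\vartheta}
$$

For fixed $\lambda>0$, maximizing the Lagrangian over $q$ is the penalized problem of expected payoff minus $\lambda$ times the divergence, so under these assumptions $1/\lambda$ plays the role of the learning rate.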
Solving partial differential equations with machine learning methods has drawn increasing attention in scientific computing and engineering applications. In this work, we propose a coupled Extreme Learning Machine (CELM) method that incorporates the physical laws to solve a class of fourth-order biharmonic equations by reformulating each equation as two well-posed Poisson problems. In addition, several activation functions, including tangent, Gaussian, sine, and trigonometric (sin+cos) functions, are introduced to assess our CELM method. Notably, the sine and trigonometric functions prove remarkably effective at reducing the approximation error of the CELM model. Finally, several numerical experiments are performed to study initialization approaches for the weights and biases of the hidden units in our CELM model and to explore the required number of hidden units. Numerical results show that the proposed CELM algorithm solves the biharmonic equation accurately and efficiently in both regular and irregular domains.
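The splitting idea can be sketched in one dimension. This is an illustrative toy under stated assumptions, not the authors' CELM implementation. It solves u'''' = f on (0, 1) with u = u'' = 0 at both ends by introducing w = u'', so that w'' = f and u'' = w. Both unknowns share one random sine hidden layer, and the output weights are found in a single least-squares solve. The function name, the sampling ranges, and the default sizes are all hypothetical.

```python
import numpy as np

def celm_biharmonic_1d(f, n_hidden=60, n_colloc=100, scale=10.0, seed=0):
    """Approximate u with u'''' = f on (0, 1) and u = u'' = 0 at x = 0 and x = 1."""
    rng = np.random.default_rng(seed)
    a = rng.uniform(-scale, scale, n_hidden)    # fixed random hidden weights
    b = rng.uniform(-np.pi, np.pi, n_hidden)    # fixed random hidden biases

    def phi(x):                                 # hidden features sin(a*x + b)
        return np.sin(np.outer(x, a) + b)

    def phi_xx(x):                              # their exact second derivatives
        return -(a ** 2) * np.sin(np.outer(x, a) + b)

    x = np.linspace(0.0, 1.0, n_colloc)         # collocation points
    xb = np.array([0.0, 1.0])                   # boundary points
    H = n_hidden
    # Unknowns are stacked as [beta_w, beta_u]; each block row is one set of equations.
    A = np.block([
        [phi_xx(x), np.zeros((n_colloc, H))],   # w'' = f
        [-phi(x), phi_xx(x)],                   # u'' - w = 0
        [phi(xb), np.zeros((2, H))],            # w = 0 on the boundary
        [np.zeros((2, H)), phi(xb)],            # u = 0 on the boundary
    ])
    rhs = np.concatenate([f(x), np.zeros(n_colloc + 4)])
    beta, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    beta_u = beta[H:]
    return lambda xq: phi(np.atleast_1d(xq)) @ beta_u
```

With f(x) = pi^4 sin(pi x), the exact solution is u(x) = sin(pi x), which gives a simple reference for checking the returned approximation.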
When perceiving the world from multiple viewpoints, humans have the ability to reason about complete objects in a compositional manner even when an object is completely occluded from certain viewpoints. Meanwhile, humans are able to imagine novel views after observing multiple viewpoints. Recent remarkable advances in multi-view object-centric learning still leave some problems unresolved: 1) The shapes of partially or completely occluded objects cannot be well reconstructed. 2) Novel viewpoint prediction depends on expensive viewpoint annotations rather than implicit rules in view representations. In this paper, we introduce a time-conditioned generative model for videos. To reconstruct the complete shape of an object accurately, we enhance the disentanglement between the latent representations of objects and views: the latent representations of time-conditioned views are jointly inferred with a Transformer and then fed to a sequential extension of Slot Attention to learn object-centric representations. In addition, Gaussian processes are employed as priors of view latent variables for video generation and novel-view prediction without viewpoint annotations. Experiments on multiple datasets demonstrate that the proposed model can perform object-centric video decomposition, reconstruct the complete shapes of occluded objects, and make novel-view predictions.
The persistent homology transform (PHT) represents a shape with a multiset of persistence diagrams parameterized by the sphere of directions in the ambient space. In this work, we describe a finite set of diagrams that discretize the PHT such that it faithfully represents the underlying shape. We provide a discretization that is exponential in the dimension of the shape. Moreover, we show that this discretization is stable with respect to various perturbations. Furthermore, we provide an algorithm for computing the discretization. Our approach relies only on knowing the heights and dimensions of topological events, which means that it can be adapted to provide discretizations of other dimension-returning topological transforms, including the Betti curve transform. With mild alterations, we also adapt our methods to faithfully discretize the Euler Characteristic curve transform.
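To make the directional construction concrete, the sketch below computes an Euler characteristic curve for a single direction. It is a minimal illustration and not the discretization algorithm of the paper. It assumes a finite simplicial complex given as vertex coordinates plus tuples of vertex indices, with each simplex entering the filtration at the height of its highest vertex. The function and argument names are hypothetical.

```python
import numpy as np

def euler_characteristic_curve(vertices, simplices, direction, thresholds):
    """Euler characteristic of the sublevel sets of the height function in one direction."""
    direction = np.asarray(direction, dtype=float)
    direction = direction / np.linalg.norm(direction)
    heights = np.asarray(vertices, dtype=float) @ direction
    # Each simplex appears at the height of its highest vertex and
    # contributes (-1)^dim, where dim = number of vertices - 1.
    events = [(max(heights[v] for v in s), (-1) ** (len(s) - 1)) for s in simplices]
    return [sum(sign for h, sign in events if h <= t) for t in thresholds]

# Hollow triangle: three vertices and three edges, no face.
verts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
hollow = [(0,), (1,), (2,), (0, 1), (0, 2), (1, 2)]
print(euler_characteristic_curve(verts, hollow, (0.0, 1.0), [-0.5, 0.0, 0.5, 1.0]))
# prints [0, 1, 1, 0]
```

The curve changes only at vertex heights, which is why knowing the heights and dimensions of topological events is enough to describe it.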
Global optimization of decision trees has been shown to be promising in terms of accuracy, size, and consequently human comprehensibility. However, many of the methods used rely on general-purpose solvers for which scalability remains an issue. Dynamic programming methods have been shown to scale much better because they exploit the tree structure by solving subtrees as independent subproblems. However, this only works when an objective can be optimized separately for subtrees. We explore this relationship in detail, establish necessary and sufficient conditions for such separability, and generalize previous dynamic programming approaches into a framework that can optimize any combination of separable objectives and constraints. Experiments on five application domains show the general applicability of this framework, which also scales better than general-purpose solvers by a large margin.
Successful detection of Out-of-Distribution (OoD) data is becoming increasingly important to ensure safe deployment of neural networks. One of the main challenges in OoD detection is that neural networks output overconfident predictions on OoD data, making it difficult to determine the OoD-ness of data solely based on their predictions. Outlier exposure addresses this issue by introducing an additional loss that encourages low-confidence predictions on OoD data during training. While outlier exposure has shown promising potential in improving OoD detection performance, all previous studies on outlier exposure have been limited to utilizing visual outliers. Drawing inspiration from recent advancements in vision-language pre-training, this paper ventures into the uncharted territory of textual outlier exposure. First, we uncover the benefits of using textual outliers by replacing real or virtual outliers in the image domain with textual equivalents. Then, we propose various ways of generating preferable textual outliers. Our extensive experiments demonstrate that generated textual outliers achieve competitive performance on large-scale OoD and hard OoD benchmarks. Furthermore, we conduct empirical analyses of textual outliers to provide primary criteria for designing advantageous textual outliers: near-distribution, descriptiveness, and inclusion of visual semantics.
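The additional loss used by outlier exposure can be sketched as follows. This is the commonly used formulation, a cross-entropy term on in-distribution data plus a term that pulls predictions on outliers toward the uniform distribution, and it is not necessarily the exact loss of this paper. The function names and the `weight` parameter are hypothetical. How the outlier logits are obtained, for example from text embeddings in a shared vision-language space, is outside the sketch.

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax over the class axis."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def outlier_exposure_loss(logits_in, labels, logits_out, weight=0.5):
    """Cross-entropy on in-distribution data plus a uniformity term on outliers."""
    logp_in = log_softmax(logits_in)
    ce_in = -logp_in[np.arange(len(labels)), labels].mean()
    # Cross-entropy between the uniform distribution and the outlier predictions:
    # -(1/K) * sum_k log p_k, averaged over outliers. It is smallest (log K)
    # when the prediction on every outlier is exactly uniform.
    ce_out = -log_softmax(logits_out).mean(axis=1).mean()
    return ce_in + weight * ce_out
```

Confident predictions on outliers increase the second term, so minimizing the total loss discourages overconfidence on outlier inputs.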
We provide several new results on the sample complexity of vector-valued linear predictors (parameterized by a matrix), and more generally neural networks. Focusing on size-independent bounds, where only the Frobenius norm distance of the parameters from some fixed reference matrix $W_0$ is controlled, we show that the sample complexity behavior can be surprisingly different from what we might expect based on the well-studied setting of scalar-valued linear predictors. This also leads to new sample complexity bounds for feed-forward neural networks, tackling some open questions in the literature, and establishing a new convex linear prediction problem that is provably learnable without uniform convergence.
Graph neural networks (GNNs) have been demonstrated to be a powerful algorithmic model in broad application fields for their effectiveness in learning over graphs. To scale GNN training up for large-scale and ever-growing graphs, the most promising solution is distributed training, which distributes the workload of training across multiple computing nodes. However, the workflows, computational patterns, communication patterns, and optimization techniques of distributed GNN training remain only preliminarily understood. In this paper, we provide a comprehensive survey of distributed GNN training by investigating various optimization techniques used in distributed GNN training. First, distributed GNN training is classified into several categories according to its workflows. In addition, the computational patterns and communication patterns of these categories, as well as the optimization techniques proposed by recent work, are introduced. Second, the software frameworks and hardware platforms of distributed GNN training are also introduced for a deeper understanding. Third, distributed GNN training is compared with distributed training of deep neural networks, emphasizing the uniqueness of distributed GNN training. Finally, interesting issues and opportunities in this field are discussed.
While it is nearly effortless for humans to quickly assess the perceptual similarity between two images, the underlying processes are thought to be quite complex. Despite this, the most widely used perceptual metrics today, such as PSNR and SSIM, are simple, shallow functions, and fail to account for many nuances of human perception. Recently, the deep learning community has found that features of the VGG network trained on the ImageNet classification task have been remarkably useful as a training loss for image synthesis. But how perceptual are these so-called "perceptual losses"? What elements are critical for their success? To answer these questions, we introduce a new Full Reference Image Quality Assessment (FR-IQA) dataset of perceptual human judgments, orders of magnitude larger than previous datasets. We systematically evaluate deep features across different architectures and tasks and compare them with classic metrics. We find that deep features outperform all previous metrics by huge margins. More surprisingly, this result is not restricted to ImageNet-trained VGG features, but holds across different deep architectures and levels of supervision (supervised, self-supervised, or even unsupervised). Our results suggest that perceptual similarity is an emergent property shared across deep visual representations.