General log-linear models are widely used to express the association in multivariate frequency data on contingency tables. The paper focuses on the power analysis for testing the goodness-of-fit hypothesis for these models. Conventionally, for power-related sample size calculations, a deviation from the null hypothesis, also known as the effect size, is specified by means of the chi-square goodness-of-fit index. It is argued that the odds ratio is a more natural measure of effect size, with the advantage of having a data-relevant interpretation. Therefore, a class of log-affine models that are specified by odds ratios whose values deviate from those of the null by a small amount can be chosen as an alternative. Being expressed as sets of constraints on odds ratios, both hypotheses are represented by smooth surfaces in the probability simplex, and thus the power analysis can be given a geometric interpretation as well. A concept of geometric power is introduced and a Monte Carlo algorithm for its estimation is proposed. The framework is applied to the power analysis of goodness-of-fit in the context of multinomial sampling. An iterative scaling procedure for generating distributions from a log-affine model is described and its convergence is proved. To illustrate, the geometric power analysis is carried out for data from a clinical study.
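To make the role of the odds ratio concrete, the following is a minimal sketch of classical iterative proportional fitting on a 2x2 table. It is not the iterative scaling procedure of the paper, whose details the abstract does not give. The function names, the seed table, and the target margins are all assumptions chosen for illustration. The sketch relies on the standard fact that rescaling rows and columns leaves the odds ratio unchanged, so the fitted table has the prescribed margins and the odds ratio of the seed.

```python
def odds_ratio(table):
    """Odds ratio of a 2x2 table given as nested lists."""
    return (table[0][0] * table[1][1]) / (table[0][1] * table[1][0])


def ipf_2x2(seed, row_margins, col_margins, tol=1e-12, max_iter=1000):
    """Classical iterative proportional fitting (illustrative sketch).

    Alternately rescales rows and columns of a strictly positive seed
    table until the row and column sums match the target margins.
    """
    table = [row[:] for row in seed]
    for _ in range(max_iter):
        # Row step: match the target row sums.
        for i in range(2):
            s = sum(table[i])
            table[i] = [v * row_margins[i] / s for v in table[i]]
        # Column step: match the target column sums.
        for j in range(2):
            s = table[0][j] + table[1][j]
            for i in range(2):
                table[i][j] *= col_margins[j] / s
        # The column sums are now exact, so only the rows need checking.
        err = max(abs(sum(table[i]) - row_margins[i]) for i in range(2))
        if err < tol:
            break
    return table


if __name__ == "__main__":
    # Hypothetical seed with odds ratio 2 and hypothetical target margins.
    fitted = ipf_2x2([[2.0, 1.0], [1.0, 1.0]], [0.4, 0.6], [0.3, 0.7])
    print(fitted, odds_ratio(fitted))
```

A seed with a slightly different odds ratio yields a distribution with the same margins that deviates from the first one only through the odds ratio, which is the kind of alternative the abstract describes.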
We propose a new randomized method for solving systems of nonlinear equations, which can find sparse solutions or solutions under certain simple constraints. The scheme only takes gradients of component functions and uses Bregman projections onto the solution space of a Newton equation. In the special case of Euclidean projections, the method is known as the nonlinear Kaczmarz method. Furthermore, if the component functions are nonnegative, we are in the setting of optimization under the interpolation assumption and the method reduces to SGD with the recently proposed stochastic Polyak step size. For general Bregman projections, our method is a stochastic mirror descent with a novel adaptive step size. We prove that in the convex setting each iteration of our method results in a smaller Bregman distance to exact solutions as compared to the standard Polyak step. Our generalization to Bregman projections comes at the price that a convex one-dimensional optimization problem needs to be solved in each iteration. This can typically be done with globalized Newton iterations. Convergence is proved in two classical settings of nonlinearity: for convex nonnegative functions and locally for functions which fulfill the tangential cone condition. Finally, we show examples in which the proposed method outperforms similar methods with the same memory requirements.
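The Euclidean special case mentioned above can be sketched in a few lines. The code below illustrates only the nonlinear Kaczmarz update, $x_{k+1} = x_k - \frac{f_i(x_k)}{\|\nabla f_i(x_k)\|^2}\nabla f_i(x_k)$, which is the Euclidean projection of $x_k$ onto the solution set of the Newton equation $f_i(x_k) + \langle \nabla f_i(x_k), x - x_k\rangle = 0$. It does not implement the Bregman variant proposed in the paper. The toy system, the starting point, and all function names are assumptions made for illustration.

```python
import random


def residuals(x):
    """Toy system (an assumption): circle of radius 2 and the line x0 = x1."""
    return [x[0] ** 2 + x[1] ** 2 - 4.0, x[0] - x[1]]


def gradients(x):
    """Gradients of the two component functions of the toy system."""
    return [[2.0 * x[0], 2.0 * x[1]], [1.0, -1.0]]


def nonlinear_kaczmarz(x0, n_iter=200, seed=0):
    """Randomized nonlinear Kaczmarz iteration (Euclidean projections only)."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(n_iter):
        i = rng.randrange(2)  # sample one equation uniformly at random
        f_i = residuals(x)[i]
        g_i = gradients(x)[i]
        g_norm_sq = sum(g * g for g in g_i)
        if g_norm_sq == 0.0:
            continue  # the Newton equation gives no direction; skip this draw
        step = f_i / g_norm_sq
        # Project onto {y : f_i(x) + <grad f_i(x), y - x> = 0}.
        x = [xj - step * gj for xj, gj in zip(x, g_i)]
    return x


if __name__ == "__main__":
    print(nonlinear_kaczmarz([1.0, 2.0]))
```

From the starting point $(1, 2)$ the iterates approach the solution $(\sqrt{2}, \sqrt{2})$ of the toy system. Only one component function and its gradient are evaluated per iteration.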
We develop a novel discontinuous Galerkin method for solving the rotating thermal shallow water equations (TRSW) on a curvilinear mesh. Our method is provably entropy stable and conserves mass, buoyancy and vorticity, while also semi-discretely conserving energy. This is achieved by using novel numerical fluxes and splitting the pressure and convection operators. We implement our method on a cubed sphere mesh and numerically verify our theoretical results. Our experiments demonstrate the robustness of the method for a regime of well-developed turbulence, where it can be run stably without any dissipation. The entropy-stable fluxes are sufficient to control the grid-scale noise generated by geostrophic turbulence, eliminating the need for artificial stabilization.
Many approaches have been proposed to use diffusion models to augment training datasets for downstream tasks, such as classification. However, diffusion models are themselves trained on large datasets, often with noisy annotations, and it remains an open question to what extent these models contribute to downstream classification performance. In particular, it remains unclear whether they generalize enough to improve over directly using the additional data of their pre-training process for augmentation. We systematically evaluate a range of existing methods to generate images from diffusion models and study new extensions to assess their benefit for data augmentation. Personalizing diffusion models towards the target data outperforms simpler prompting strategies. However, using the pre-training data of the diffusion model alone, via a simple nearest-neighbor retrieval procedure, leads to even stronger downstream performance. Our study explores the potential of diffusion models in generating new training data, and surprisingly finds that these sophisticated models are not yet able to beat a simple and strong image retrieval baseline on simple downstream vision tasks.
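A nearest-neighbor retrieval baseline of the kind referred to above can be sketched as follows. The abstract does not specify the features or the similarity measure, so cosine similarity over precomputed embedding vectors is an assumption here, as are the function names and the tiny two-dimensional "embeddings".

```python
import math


def cosine(u, v):
    """Cosine similarity between two nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def retrieve(query, pool, k):
    """Indices of the k pool items most similar to the query."""
    ranked = sorted(range(len(pool)),
                    key=lambda i: cosine(query, pool[i]),
                    reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Hypothetical embeddings standing in for pre-training images.
    pool = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
    # Hypothetical embedding of one target-dataset image.
    query = [1.0, 0.0]
    print(retrieve(query, pool, k=2))
```

In an augmentation pipeline the retrieved items would be added to the training set of the downstream classifier. At realistic scale an approximate nearest-neighbor index would replace the exhaustive sort used in this sketch.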
Parameter identification problems in partial differential equations (PDEs) consist in determining one or more unknown functional parameters in a PDE. Here, the Bayesian nonparametric approach to such problems is considered. Focusing on the representative example of inferring the diffusivity function in an elliptic PDE from noisy observations of the PDE solution, the performance of Bayesian procedures based on Gaussian process priors is investigated. Recent asymptotic theoretical guarantees establishing posterior consistency and convergence rates are reviewed and expanded upon. An implementation of the associated posterior-based inference is provided, and illustrated via a numerical simulation study where two different discretisation strategies are devised. The reproducible code is available at: //github.com/MattGiord.
High-dimensional, higher-order tensor data are gaining prominence in a variety of fields, including but not limited to computer vision and network analysis. Tensor factor models, induced from noisy versions of tensor decomposition or factorization, are natural and potent instruments for studying a collection of tensor-variate objects that may be dependent or independent. However, the development of statistical inferential theory for estimating the various low-rank structures, which customarily play the role of signals in tensor factor models, is still at an early stage. In this paper, starting from tensor matricization, we aim to ``decode'' estimation of a higher-order tensor factor model in the sense that we recast it into mode-wise traditional high-dimensional vector/fiber factor models so as to deploy conventional estimation by principal component analysis (PCA). Using the Tucker tensor factor model (TuTFaM), which is induced from the popular Tucker decomposition, as a demonstration, we conclude that estimation of the signal components essentially amounts to mode-wise PCA techniques, and that the involvement of projection and iteration enhances the signal-to-noise ratio to various extents. We establish the inferential theory of the proposed estimation procedures and conduct extensive simulation experiments under TuTFaM, and illustrate how the proposed procedures work in tensor reconstruction and in clustering for video and economic datasets.
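The idea of recasting a tensor problem as mode-wise PCA can be illustrated on a noise-free toy example. The sketch below unfolds a 3-way tensor along one mode and extracts the leading principal direction of the unfolding by power iteration on its Gram matrix. It is not the TuTFaM estimator of the paper: the projection and iteration steps are omitted, and the function names and the rank-one test tensor are assumptions. The column ordering of the unfolding differs from some textbook conventions by a permutation, which does not affect the Gram matrix.

```python
import math


def unfold(tensor, mode):
    """Mode-`mode` matricization of a 3-way tensor given as nested lists."""
    dims = (len(tensor), len(tensor[0]), len(tensor[0][0]))
    rows = [[] for _ in range(dims[mode])]
    for i in range(dims[0]):
        for j in range(dims[1]):
            for k in range(dims[2]):
                # The entry goes to the row indexed by the chosen mode.
                rows[(i, j, k)[mode]].append(tensor[i][j][k])
    return rows


def leading_direction(matrix, iters=100):
    """Leading eigenvector of matrix * matrix^T via power iteration."""
    d = len(matrix)
    gram = [[sum(x * y for x, y in zip(matrix[r], matrix[s]))
             for s in range(d)] for r in range(d)]
    v = [1.0] * d  # assumes the start vector is not orthogonal to the target
    for _ in range(iters):
        w = [sum(gram[r][s] * v[s] for s in range(d)) for r in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


if __name__ == "__main__":
    # Hypothetical rank-one tensor a (outer) b (outer) c.
    a, b, c = [1.0, 2.0, 2.0], [1.0, 1.0], [3.0, 1.0, 2.0, 1.0]
    tensor = [[[ai * bj * ck for ck in c] for bj in b] for ai in a]
    print(leading_direction(unfold(tensor, 0)))
```

For the rank-one tensor above, the mode-0 direction recovers `a` up to normalization. Repeating the call with `mode=1` and `mode=2` gives the directions of `b` and `c`.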
We study the statistical capacity of the classical binary perceptrons with general thresholds $\kappa$. After recognizing the connection between the capacity and the bilinearly indexed (bli) random processes, we utilize recent progress in studying such processes to characterize the capacity. In particular, we rely on \emph{fully lifted} random duality theory (fl RDT) established in \cite{Stojnicflrdt23} to create a general framework for studying the perceptrons' capacities. Successful underlying numerical evaluations are required for the framework (and ultimately the entire fl RDT machinery) to become fully practically operational. We present results obtained in that direction and uncover that the capacity characterizations are achieved on the second (first non-trivial) level of \emph{stationarized} full lifting. The obtained results \emph{exactly} match the replica symmetry breaking predictions obtained through statistical physics replica methods in \cite{KraMez89}. Most notably, for the famous zero-threshold scenario, $\kappa=0$, we uncover the well-known $\alpha\approx0.8330786$ scaled capacity.
We prove closed-form equations for the exact high-dimensional asymptotics of a family of first-order gradient-based methods, learning an estimator (e.g. M-estimator, shallow neural network, ...) from observations on Gaussian data with empirical risk minimization. This includes widely used algorithms such as stochastic gradient descent (SGD) or Nesterov acceleration. The obtained equations match those resulting from the discretization of dynamical mean-field theory (DMFT) equations from statistical physics when applied to gradient flow. Our proof method allows us to give an explicit description of how memory kernels build up in the effective dynamics, and to include non-separable update functions, allowing datasets with non-identity covariance matrices. Finally, we provide numerical implementations of the equations for SGD with generic extensive batch size and with constant learning rates.
Mean-field molecular dynamics based on path integrals is used to approximate canonical quantum observables for particle systems consisting of nuclei and electrons. A computational bottleneck is the sampling from the Gibbs density of the electron operator, which due to the fermion sign problem has a computational complexity that scales exponentially with the number of electrons. In this work we construct an algorithm that approximates the mean-field Hamiltonian by path integrals for fermions. The algorithm is based on the determinant of a matrix with components based on Brownian bridges connecting permuted electron coordinates. The computational work for $n$ electrons is $\mathcal O(n^3)$, which reduces the computational complexity associated with the fermion sign problem. We analyze a bias resulting from this approximation and provide a computational error indicator. It remains to rigorously explain the surprisingly high accuracy.
Deep generative models are a key enabling technology for computer vision, text generation and large language models. Denoising diffusion probabilistic models (DDPMs) have recently gained much attention due to their ability to generate diverse and high-quality samples in many computer vision tasks, as well as to incorporate flexible model architectures and a relatively simple training scheme. Quantum generative models, empowered by entanglement and superposition, have brought new insight to learning classical and quantum data. Inspired by the classical counterpart, we propose the quantum denoising diffusion probabilistic model (QuDDPM) to enable efficiently trainable generative learning of quantum data. QuDDPM adopts sufficient layers of circuits to guarantee expressivity, while introducing multiple intermediate training tasks as interpolation between the target distribution and noise to avoid barren plateaus and guarantee efficient training. We provide bounds on the learning error and demonstrate QuDDPM's capability in learning a correlated quantum noise model, quantum many-body phases and the topological structure of quantum data. The results provide a paradigm for versatile and efficient quantum generative learning.
We study the optimal sample complexity of neighbourhood selection in linear structural equation models, and compare this to best subset selection (BSS) for linear models under general design. We show by example that -- even when the structure is \emph{unknown} -- the existence of underlying structure can reduce the sample complexity of neighbourhood selection. This result is complicated by the possibility of path cancellation, which we study in detail; we show that improvements are still possible in the presence of path cancellation. Finally, we support these theoretical observations with experiments. The proof introduces a modified BSS estimator, called klBSS, and compares its performance to BSS. The analysis of klBSS may also be of independent interest since it applies to arbitrary structured models, not necessarily those induced by a structural equation model. Our results have implications for structure learning in graphical models, which often relies on neighbourhood selection as a subroutine.