In this paper we investigate computational properties of the Diophantine problem for spherical equations in some classes of finite groups. We classify the complexity of different variations of the problem, e.g., when $G$ is fixed and when $G$ is a part of the input. When the group $G$ is constant or given as multiplication table, we show that the problem always can be solved in polynomial time. On the other hand, for the permutation groups $S_n$ (with $n$ part of the input), the problem is NP-complete. The situation for matrix groups is quite involved: while we exhibit sequences of 2-by-2 matrices where the problem is NP-complete, in the full group $GL(2,p)$ ($p$ prime and part of the input) it can be solved in polynomial time. We also find a similar behaviour with subgroups of matrices of arbitrary dimension over a constant ring.
Diffusion models have demonstrated compelling generation quality by optimizing the variational lower bound through a simple denoising score matching loss. In this paper, we provide theoretical evidence that the prevailing practice of using a constant loss weight strategy in diffusion models leads to biased estimation during the training phase. Simply optimizing the denoising network to predict Gaussian noise with constant weighting may hinder precise estimations of original images. To address the issue, we propose an elegant and effective weighting strategy grounded in the theoretically unbiased principle. Moreover, we conduct a comprehensive and systematic exploration to dissect the inherent bias problem deriving from constant weighting loss from the perspectives of its existence, impact and reasons. These analyses are expected to advance our understanding and demystify the inner workings of diffusion models. Through empirical evaluation, we demonstrate that our proposed debiased estimation method significantly enhances sample quality without the reliance on complex techniques, and exhibits improved efficiency compared to the baseline method both in training and sampling processes.
This paper makes two theoretical contributions. First, we establish a novel class of Hamiltonian systems, called arc Hamiltonian systems, for saddle Hamiltonian functions over infinite-dimensional metric spaces. Arc Hamiltonian systems generate a flow that satisfies the law of conservation of energy everywhere in a metric space. They are governed by an extension of Hamilton's equation formulated based on (i) the framework of arc fields and (ii) an infinite-dimensional gradient, termed the arc gradient, of a Hamiltonian function. We derive conditions for the existence of a flow generated by an arc Hamiltonian system, showing that they reduce to local Lipschitz continuity of the arc gradient under sufficient regularity. Second, we present two Hamiltonian functions, called the cumulant generating functional and the centred cumulant generating functional, over a metric space of log-likelihoods and measures. The former characterises the posterior of Bayesian inference as a part of the arc gradient that induces a flow of log-likelihoods and non-negative measures. The latter characterises the difference of the posterior and the prior as a part of the arc gradient that induces a flow of log-likelihoods and probability measures. Our results reveal an implication of the belief updating mechanism from the prior to the posterior as an infinitesimal change of a measure in the infinite-dimensional Hamiltonian flows.
Understanding the power of parameterized quantum circuits (PQCs) in accomplishing machine learning tasks is one of the most important questions in quantum machine learning. In this paper, we analyze the expressivity of PQCs through the lens of function approximation. Previously established universal approximation theorems for PQCs are mainly nonconstructive, leading us to the following question: How large do the PQCs need to be to approximate the target function up to a given error? We exhibit explicit constructions of data re-uploading PQCs for approximating continuous and smooth functions and establish quantitative approximation error bounds in terms of the width, the depth and the number of trainable parameters of the PQCs. To achieve this, we utilize techniques from quantum signal processing and linear combinations of unitaries to construct PQCs that implement multivariate polynomials. We implement global and local approximation techniques using Bernstein polynomials and local Taylor expansion and analyze their performances in the quantum setting. We also compare our proposed PQCs to nearly optimal deep neural networks in approximating high-dimensional smooth functions, showing that the ratio between model sizes of PQC and deep neural networks is exponentially small with respect to the input dimension. This suggests a potentially novel avenue for showcasing quantum advantages in quantum machine learning.
In this paper, we revisit the problem of sparse linear regression in the local differential privacy (LDP) model. Existing research in the non-interactive and sequentially local models has focused on obtaining the lower bounds for the case where the underlying parameter is $1$-sparse, and extending such bounds to the more general $k$-sparse case has proven to be challenging. Moreover, it is unclear whether efficient non-interactive LDP (NLDP) algorithms exist. To address these issues, we first consider the problem in the $\epsilon$ non-interactive LDP model and provide a lower bound of $\Omega(\frac{\sqrt{dk\log d}}{\sqrt{n}\epsilon})$ on the $\ell_2$-norm estimation error for sub-Gaussian data, where $n$ is the sample size and $d$ is the dimension of the space. We propose an innovative NLDP algorithm, the very first of its kind for the problem. As a remarkable outcome, this algorithm also yields a novel and highly efficient estimator as a valuable by-product. Our algorithm achieves an upper bound of $\tilde{O}({\frac{d\sqrt{k}}{\sqrt{n}\epsilon}})$ for the estimation error when the data is sub-Gaussian, which can be further improved by a factor of $O(\sqrt{d})$ if the server has additional public but unlabeled data. For the sequentially interactive LDP model, we show a similar lower bound of $\Omega({\frac{\sqrt{dk}}{\sqrt{n}\epsilon}})$. As for the upper bound, we rectify a previous method and show that it is possible to achieve a bound of $\tilde{O}(\frac{k\sqrt{d}}{\sqrt{n}\epsilon})$. Our findings reveal fundamental differences between the non-private case, central DP model, and local DP model in the sparse linear regression problem.
This paper adopts a tool from computational topology, the Euler characteristic curve (ECC) of a sample, to perform one- and two-sample goodness of fit tests. We call our procedure TopoTests. The presented tests work for samples of arbitrary dimension, having comparable power to the state-of-the-art tests in the one-dimensional case. It is demonstrated that the type I error of TopoTests can be controlled and their type II error vanishes exponentially with increasing sample size. Extensive numerical simulations of TopoTests are conducted to demonstrate their power for samples of various sizes.
In this paper we consider Bayesian parameter inference associated to a class of partially observed stochastic differential equations (SDE) driven by jump processes. Such type of models can be routinely found in applications, of which we focus upon the case of neuroscience. The data are assumed to be observed regularly in time and driven by the SDE model with unknown parameters. In practice the SDE may not have an analytically tractable solution and this leads naturally to a time-discretization. We adapt the multilevel Markov chain Monte Carlo method of [11], which works with a hierarchy of time discretizations and show empirically and theoretically that this is preferable to using one single time discretization. The improvement is in terms of the computational cost needed to obtain a pre-specified numerical error. Our approach is illustrated on models that are found in neuroscience.
Inspired by recent findings that generative diffusion models learn semantically meaningful representations, we use them to discover the intrinsic hierarchical structure in biomedical 3D images using unsupervised segmentation. We show that features of diffusion models from different stages of a U-Net-based ladder-like architecture capture different hierarchy levels in 3D biomedical images. We design three losses to train a predictive unsupervised segmentation network that encourages the decomposition of 3D volumes into meaningful nested subvolumes that represent a hierarchy. First, we pretrain 3D diffusion models and use the consistency of their features across subvolumes. Second, we use the visual consistency between subvolumes. Third, we use the invariance to photometric augmentations as a regularizer. Our models achieve better performance than prior unsupervised structure discovery approaches on challenging biologically-inspired synthetic datasets and on a real-world brain tumor MRI dataset.
Motivated by the growing interest in correlation-robust stochastic optimization, we investigate stochastic selection problems beyond independence. Specifically, we consider the instructive case of pairwise-independent priors and matroid constraints. We obtain essentially-optimal bounds for offline contention resolution and prophet inequalities against the almighty online adversary. The impetus for our work comes from the recent work of \cite{pi-uniform-prophet}, who derived a constant-approximation for the single-choice prophet inequality with pairwise-independent priors. For general matroids, our results are tight and largely negative. For both contention resolution and prophet inequalities, our impossibility results hold for the full linear matroid over a finite field. We explicitly construct pairwise-independent distributions which rule out an $\omega\left(\frac{1}{\rank}\right)$-balanced offline CRS and an $\omega\left(\frac{1}{\log \rank}\right)$-competitive prophet inequality. For both results, we employ a generic approach for constructing pairwise-independent random vectors -- one which unifies and generalizes existing pairwise-independence constructions from the literature on universal hash functions and pseudorandomness. Specifically, our approach is based on our observation that random linear maps turn linear independence into stochastic independence. We then examine the class of matroids which satisfy the so-called partition property -- these include most common matroids encountered in optimization. We obtain positive results for both contention resolution and prophet inequalities with pairwise-independent priors on such matroids, approximately matching the corresponding guarantees for fully independent priors.
As soon as abstract mathematical computations were adapted to computation on digital computers, the problem of efficient representation, manipulation, and communication of the numerical values in those computations arose. Strongly related to the problem of numerical representation is the problem of quantization: in what manner should a set of continuous real-valued numbers be distributed over a fixed discrete set of numbers to minimize the number of bits required and also to maximize the accuracy of the attendant computations? This perennial problem of quantization is particularly relevant whenever memory and/or computational resources are severely restricted, and it has come to the forefront in recent years due to the remarkable performance of Neural Network models in computer vision, natural language processing, and related areas. Moving from floating-point representations to low-precision fixed integer values represented in four bits or less holds the potential to reduce the memory footprint and latency by a factor of 16x; and, in fact, reductions of 4x to 8x are often realized in practice in these applications. Thus, it is not surprising that quantization has emerged recently as an important and very active sub-area of research in the efficient implementation of computations associated with Neural Networks. In this article, we survey approaches to the problem of quantizing the numerical values in deep Neural Network computations, covering the advantages/disadvantages of current methods. With this survey and its organization, we hope to have presented a useful snapshot of the current research in quantization for Neural Networks and to have given an intelligent organization to ease the evaluation of future research in this area.
Recent developments in image classification and natural language processing, coupled with the rapid growth in social media usage, have enabled fundamental advances in detecting breaking events around the world in real-time. Emergency response is one such area that stands to gain from these advances. By processing billions of texts and images a minute, events can be automatically detected to enable emergency response workers to better assess rapidly evolving situations and deploy resources accordingly. To date, most event detection techniques in this area have focused on image-only or text-only approaches, limiting detection performance and impacting the quality of information delivered to crisis response teams. In this paper, we present a new multimodal fusion method that leverages both images and texts as input. In particular, we introduce a cross-attention module that can filter uninformative and misleading components from weak modalities on a sample by sample basis. In addition, we employ a multimodal graph-based approach to stochastically transition between embeddings of different multimodal pairs during training to better regularize the learning process as well as dealing with limited training data by constructing new matched pairs from different samples. We show that our method outperforms the unimodal approaches and strong multimodal baselines by a large margin on three crisis-related tasks.