Our aim is to estimate the largest community (i.e., the mode) in a population composed of multiple disjoint communities. This estimation is performed in a fixed-confidence setting via sequential sampling of individuals with replacement. We consider two sampling models: (i) an identityless model, wherein only the community of each sampled individual is revealed, and (ii) an identity-based model, wherein the learner can also discern whether or not each sampled individual has been sampled before, in addition to the community of that individual. The former model corresponds to the classical problem of identifying the mode of a discrete distribution, whereas the latter seeks to capture the utility of identity information in mode estimation. For each of these models, we establish information-theoretic lower bounds on the expected number of samples needed to meet the prescribed confidence level, and propose sound algorithms whose sample complexity is provably asymptotically optimal. Our analysis highlights that identity information can indeed be exploited to improve the efficiency of community mode estimation.
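To make the identityless setting concrete, the following minimal sketch (our illustration, not the paper's algorithm) samples community labels with replacement and stops once a simple Hoeffding-style confidence rule separates the empirically largest community from the runner-up; the `delta` parameter and the stopping radius are illustrative choices.

```python
import numpy as np

def estimate_mode_identityless(p, delta=0.05, rng=None, max_samples=1_000_000):
    """Sequentially sample community labels with replacement and stop once
    the empirically largest community is separated from the runner-up by a
    Hoeffding-style confidence margin (illustrative rule, not the paper's)."""
    rng = np.random.default_rng(rng)
    counts = np.zeros(len(p), dtype=int)
    for t in range(1, max_samples + 1):
        counts[rng.choice(len(p), p=p)] += 1
        top2 = np.sort(counts)[-2:]  # runner-up and leader counts
        # stop when the gap exceeds a union-bounded, anytime-valid radius
        radius = np.sqrt(2 * t * np.log(2 * len(p) * t * (t + 1) / delta))
        if top2[1] - top2[0] > radius:
            return int(np.argmax(counts)), t
    return int(np.argmax(counts)), max_samples

# example: community 0 is the mode of the underlying distribution
label, n_samples = estimate_mode_identityless([0.4, 0.3, 0.3], delta=0.05, rng=0)
```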
Block majorization-minimization (BMM) is a simple iterative algorithm for nonconvex constrained optimization that sequentially minimizes majorizing surrogates of the objective function in each block coordinate while the other coordinates are held fixed. BMM subsumes a large class of optimization algorithms, such as block coordinate descent and its proximal-point variant, expectation-maximization, and block projected gradient descent. We establish that for general constrained nonconvex optimization, BMM with strongly convex surrogates can produce an $\epsilon$-stationary point within $O(\epsilon^{-2}(\log \epsilon^{-1})^{2})$ iterations and asymptotically converges to the set of stationary points. Furthermore, we propose a trust-region variant of BMM that can handle surrogates that are only convex while retaining the same iteration complexity and asymptotic stationarity. These results hold robustly even when the convex subproblems are solved inexactly, as long as the optimality gaps are summable. As an application, we show that a regularized version of the celebrated multiplicative update algorithm for nonnegative matrix factorization by Lee and Seung has an iteration complexity of $O(\epsilon^{-2}(\log \epsilon^{-1})^{2})$. The same result holds for a wide class of regularized nonnegative tensor decomposition algorithms, as well as for the classical block projected gradient descent algorithm. These theoretical results are validated through various numerical experiments.
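For concreteness, here is a minimal sketch of the classical (unregularized) Lee-Seung multiplicative updates, each of which exactly minimizes a separable majorizing surrogate in one block while the other is held fixed, making the scheme an instance of BMM; the regularized variant analyzed above adds terms not reproduced here.

```python
import numpy as np

def nmf_multiplicative(X, r, n_iters=200, eps=1e-12, rng=None):
    """Classical Lee-Seung multiplicative updates for X ~= W @ H with
    W, H >= 0. Each update minimizes a majorizing surrogate of the
    Frobenius loss in one block (W or H), i.e., one BMM sweep per iteration."""
    rng = np.random.default_rng(rng)
    m, n = X.shape
    W = rng.random((m, r)) + eps
    H = rng.random((r, n)) + eps
    for _ in range(n_iters):
        H *= (W.T @ X) / (W.T @ W @ H + eps)  # block update for H
        W *= (X @ H.T) / (W @ H @ H.T + eps)  # block update for W
    return W, H
```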
The recently proposed Generalized Time-domain Velocity Vector (GTVV) is a generalization of the relative room impulse response in the spherical harmonic (a.k.a. Ambisonic) domain that allows for blind estimation of early-echo parameters: the directions and relative delays of individual reflections. However, the derived closed-form expression for the GTVV requires a few assumptions to hold, the most important being that the impulse response of the reference signal must be a minimum-phase filter. In practice, the reference is obtained by spatial filtering towards the Direction-of-Arrival of the source, and the aforementioned condition is limited by the performance of the applied beamformer (and thus by the Ambisonic array order). In the present work, we propose to circumvent this problem by directly modeling the impulse responses constituting the GTVV time series, which not only relaxes the initial assumptions, but also extracts the information therein in a more consistent and efficient manner, entering the realm of blind system identification. Experiments using measured room impulse responses confirm the effectiveness of the proposed approach.
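As a toy illustration of the kind of early-echo information involved (a heavily simplified sketch under our own assumptions, not the GTVV model itself): relative delays of reflections can be read off as dominant peaks of an impulse response, whereas the approach above additionally recovers reflection directions from the Ambisonic structure.

```python
import numpy as np
from scipy.signal import find_peaks

# Toy impulse response: direct path plus two early reflections, with noise.
rng = np.random.default_rng(0)
fs = 16000                                   # sample rate in Hz (assumed)
h = np.zeros(2048)
for delay, gain in [(40, 1.0), (120, 0.6), (310, 0.4)]:
    h[delay] = gain
h += 0.01 * rng.standard_normal(h.size)

# Early-reflection delays as dominant peaks of |h| (illustrative only).
peaks, _ = find_peaks(np.abs(h), height=0.1)
print((peaks - peaks[0]) / fs * 1000)        # delays relative to direct path, ms
```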
We import the algebro-geometric notion of a complete collineation into the study of maximum likelihood estimation in directed Gaussian graphical models. A complete collineation produces a perturbation of sample data, which we call a stabilisation of the sample. While a maximum likelihood estimate (MLE) may not exist or may not be unique given sample data, it is always unique given a stabilisation. We relate the MLE given a stabilisation to the MLE given the original sample data, when one exists, providing necessary and sufficient conditions for the MLE given a stabilisation to be an MLE given the original sample. For linear regression models, we show that the MLE given any stabilisation is the minimal-norm choice among the MLEs given the original sample. We show that the MLE has a well-defined limit as the stabilisation of a sample tends to the original sample, and that the limit is an MLE given the original sample, when one exists. Finally, we study which MLEs given a sample can arise as such limits. We reduce this to a question regarding the non-emptiness of certain algebraic varieties.
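For the linear regression case, the minimal-norm statement can be checked numerically; the sketch below uses a ridge-type perturbation as one concrete stand-in for a stabilisation (the paper's stabilisations arise from complete collineations) and shows its vanishing limit agreeing with the minimum-norm least-squares solution.

```python
import numpy as np

# Rank-deficient design: column 2 duplicates column 1, so the OLS MLE is not unique.
rng = np.random.default_rng(0)
Z = rng.standard_normal((20, 2))
X = np.column_stack([Z[:, 0], Z[:, 0], Z[:, 1]])
y = X @ np.array([1.0, 1.0, -2.0]) + 0.1 * rng.standard_normal(20)

# Ridge-type perturbation (an illustrative stabilisation) and its vanishing limit.
for lam in [1e-1, 1e-4, 1e-8]:
    beta_lam = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)
    print(lam, beta_lam)

# The limit agrees with the minimum-norm least-squares solution.
print(np.linalg.pinv(X) @ y)
```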
This is the second in a series of articles aimed at exploring the relationship between the complexity classes P and NP. The research in this article aims to find conditions of an algorithmic nature that are necessary and sufficient to transform any Boolean function in conjunctive normal form into a specific form that guarantees the satisfiability of this function. To find such conditions, we use the concept of a special covering of a set introduced in [13], and investigate the connection between this concept and the satisfiability of Boolean functions. As shown, the problem of the existence of a special covering for a set is equivalent to the Boolean satisfiability problem. An important result is therefore the proof of necessary and sufficient conditions for determining whether a special covering for the set exists under a given special decomposition. This result allows us to formulate necessary and sufficient algorithmic conditions for Boolean satisfiability, considering the function in conjunctive normal form as a set of clauses. In parallel, the aforementioned algorithmic procedure yields the values of the variables that ensure the satisfiability of the function. The terminology used, relating to graph theory, set theory, Boolean functions, and complexity theory, is consistent with the terminology in [1], [2], [3], [4]. The newly introduced terms are not in use by other authors and do not contradict other terms.
Work in AI ethics and fairness has made much progress in regulating LLMs to reflect certain values, such as fairness, truth, and diversity. However, it has taken for granted the problem of how LLMs might 'mean' anything at all. Without addressing this, it is not clear what imbuing LLMs with such values even means. In response, we provide a general theory of meaning that extends beyond humans. We use this theory to explicate the precise nature of LLMs as meaning-agents. We suggest that the LLM, by virtue of its position as a meaning-agent, already grasps the constructions of human society (e.g. morality, gender, and race) in concept. Consequently, under certain ethical frameworks, currently popular methods for model alignment are limited at best and counterproductive at worst. Moreover, unaligned models may help us better develop our moral and social philosophy.
Quantum computing has recently emerged as a transformative technology. Yet, its promised advantages rely on efficiently translating quantum operations into viable physical realizations. In this work, we use generative machine learning models, specifically denoising diffusion models (DMs), to facilitate this transformation. Leveraging text conditioning, we steer the model to produce desired quantum operations within gate-based quantum circuits. Notably, DMs allow one to sidestep, during training, the exponential overhead inherent in the classical simulation of quantum dynamics -- a consistent bottleneck in preceding ML techniques. We demonstrate the model's capabilities across two tasks: entanglement generation and unitary compilation. The model excels at generating new circuits and supports typical DM extensions such as masking and editing to, for instance, align the circuit generation with the constraints of the targeted quantum device. Given their flexibility and generalization abilities, we envision DMs as pivotal in quantum circuit synthesis, enhancing both practical applications and insights into theoretical quantum computation.
If part of a population is hidden but two or more sources are available that each cover parts of this population, dual- or multiple-system(s) estimation can be applied to estimate the size of this population. For this it is common to use the log-linear model, estimated with maximum likelihood. These maximum likelihood estimates are based on a non-linear model and therefore suffer from finite-sample bias, which can be substantial in the case of small samples or a small population size. This problem was recognised by Chapman, who derived an estimator with good small-sample properties in the case of two available sources. However, he did not derive an estimator for more than two sources. We propose an estimator that extends Chapman's estimator to three or more sources and compare it with other bias-reduced estimators in a simulation study. The proposed estimator performs well, and much better than the other estimators. A real-data example on homelessness in the Netherlands shows that our proposed model can make a substantial difference.
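For reference, Chapman's classical two-source estimator has a simple closed form; a minimal sketch follows (the two-source case only; the paper's extension to three or more sources is not reproduced here).

```python
def chapman(n1, n2, m):
    """Chapman's bias-reduced dual-system (two-source) population size
    estimator: n1 and n2 are the source counts, m the overlap."""
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1

# example: 200 and 150 observed individuals, 30 captured by both sources
print(chapman(200, 150, 30))  # approximately 978.1
```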
Using fault-tolerant constructions, computations performed with unreliable components can simulate their noiseless counterparts through the introduction of a modest amount of redundancy. Given this modest overhead, and the fact that increasing the reliability of basic components often comes at a cost, are there situations where fault-tolerance is the more economical option? We present a general framework that accounts for this overhead cost so as to effectively compare fault-tolerant to non-fault-tolerant approaches to computation, in the limit of small logical error rates. Using this detailed accounting, we determine explicit boundaries at which fault-tolerant designs become more efficient than designs that achieve comparable reliability through direct consumption of resources. We find that the fault-tolerant construction is always preferred in the limit of high reliability whenever the resources required to construct a basic unit grow faster than $\log(1/\epsilon)$ asymptotically for small $\epsilon$.
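The growth criterion can be illustrated with toy cost models (the functional forms below are our assumptions for illustration, not the paper's): when the per-unit cost of reaching error $\epsilon$ directly grows polynomially in $1/\epsilon$, it eventually exceeds a polylogarithmic fault-tolerance overhead.

```python
import numpy as np

# Illustrative cost models (assumed forms, not taken from the paper):
# - direct: unit cost grows polynomially in 1/eps, i.e., faster than log(1/eps)
# - fault-tolerant: cheap fixed-quality units times polylogarithmic redundancy
eps = np.logspace(-2, -12, 6)
direct_cost = eps ** -0.5              # grows faster than log(1/eps)
ft_cost = 50 * np.log(1 / eps) ** 2    # constant-factor overhead * polylog

for e, c_direct, c_ft in zip(eps, direct_cost, ft_cost):
    print(f"eps={e:.0e}  direct={c_direct:.1f}  fault-tolerant={c_ft:.1f}")
# For small enough eps the fault-tolerant column wins, matching the
# log(1/eps) growth criterion stated above.
```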
We study stability properties of the expected utility function in Bayesian optimal experimental design. We provide a framework for this problem in a non-parametric setting and prove a convergence rate for the expected utility with respect to a likelihood perturbation. This rate is uniform over the design space, and its sharpness in the general setting is demonstrated by proving a lower bound in a special case. To make the problem more concrete, we then consider non-linear Bayesian inverse problems with Gaussian likelihood, prove that the assumptions set out for the general case are satisfied, and recover the stability of the expected utility with respect to perturbations of the observation map. The theoretical convergence rates are demonstrated numerically in three different examples.
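As a concrete instance of the expected utility in question, the sketch below estimates the expected information gain by nested Monte Carlo for a toy linear-Gaussian model (our illustrative example; the paper treats general non-linear inverse problems), where a closed form is available for comparison.

```python
import numpy as np
from scipy.special import logsumexp

def eig_nested_mc(d, sigma=0.5, n_outer=400, n_inner=400, rng=None):
    """Nested Monte Carlo estimate of the expected information gain for the
    toy model theta ~ N(0, 1), y = d * theta + N(0, sigma^2)."""
    rng = np.random.default_rng(rng)
    def loglik(y, theta):
        return -0.5 * ((y - d * theta) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))
    total = 0.0
    for _ in range(n_outer):
        theta = rng.standard_normal()
        y = d * theta + sigma * rng.standard_normal()
        inner = rng.standard_normal(n_inner)
        # log marginal likelihood via an inner Monte Carlo average
        log_marg = logsumexp(loglik(y, inner)) - np.log(n_inner)
        total += loglik(y, theta) - log_marg
    return total / n_outer

# closed form for comparison: 0.5 * log(1 + d^2 / sigma^2)
print(eig_nested_mc(1.0, rng=0), 0.5 * np.log(1 + 1.0 / 0.25))
```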
Large-sample Bayesian analogs exist for many frequentist methods, but are less well known for the widely used 'sandwich' or 'robust' variance estimates. We review existing approaches to Bayesian analogs of sandwich variance estimates and propose a new analog, as the Bayes rule under a form of balanced loss function that combines elements of standard parametric inference with fidelity of the data to the model. Our development is general, applying to essentially any regression setting with independent outcomes. We show by simulation that, being the large-sample equivalent of its frequentist counterpart, the Bayesian robust standard error estimate can faithfully quantify the variability of parameter estimates even under model misspecification -- thus retaining the major attraction of the original frequentist version. We demonstrate our Bayesian analog of standard error estimates in a study of the association between age and systolic blood pressure in NHANES.
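For orientation, the frequentist sandwich estimator being mirrored has, in its simplest heteroskedasticity-robust (HC0) form for least squares, the closed form $(X^{\top}X)^{-1} X^{\top}\mathrm{diag}(r^{2})\, X\, (X^{\top}X)^{-1}$ with $r$ the residuals; a minimal sketch of this frequentist version (not of the Bayesian analog proposed above) follows.

```python
import numpy as np

def hc0_sandwich_se(X, y):
    """OLS point estimate with HC0 heteroskedasticity-robust (sandwich)
    standard errors: (X'X)^{-1} X' diag(r^2) X (X'X)^{-1}."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    r = y - X @ beta
    bread = np.linalg.inv(X.T @ X)
    meat = X.T @ (X * (r ** 2)[:, None])
    V = bread @ meat @ bread
    return beta, np.sqrt(np.diag(V))

# toy example with heteroskedastic noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 500)
X = np.column_stack([np.ones(500), x])
y = 1 + 2 * x + rng.standard_normal(500) * (0.2 + x)
beta, se = hc0_sandwich_se(X, y)
```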