The self-random number generation (SRNG) problem is considered in a general setting. In the literature, the optimum SRNG rate with respect to the variational distance has been discussed. In this paper, we first characterize the optimum SRNG rate with respect to a subclass of $f$-divergences. The subclass of $f$-divergences considered in this paper includes typical distance measures such as the variational distance, the KL divergence, and the Hellinger distance. Hence, our result can be considered a generalization of the previous result with respect to the variational distance. Next, we examine the obtained optimum SRNG rate from several viewpoints. The $\varepsilon$-fixed-length source coding problem is one of the problems related to the SRNG problem. Our results reveal how the SRNG problem with $f$-divergences relates to the $\varepsilon$-fixed-length source coding problem. We also apply our results to the rate-distortion-perception (RDP) function. As a result, we establish a lower bound for the RDP function with respect to $f$-divergences using our findings. Finally, we discuss the representation of the optimum SRNG rate using the smooth R\'enyi entropy.
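For reference, the following display recalls the textbook definition of an $f$-divergence together with generator functions $f$ that recover (up to normalisation conventions) the distances named above; this is standard material, not the paper's own notation.
\[
D_f(P\|Q) = \sum_{x} Q(x)\, f\!\left(\frac{P(x)}{Q(x)}\right), \qquad f \text{ convex},\ f(1)=0;
\]
\[
f(t) = |t-1| \;\Rightarrow\; \sum_x |P(x)-Q(x)| \ \text{(variational distance)}, \qquad
f(t) = t\log t \;\Rightarrow\; \text{KL divergence}, \qquad
f(t) = \bigl(\sqrt{t}-1\bigr)^{2} \;\Rightarrow\; \text{squared Hellinger distance}.
\]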
The Vector Error Correction Model (VECM) is a classic method for analysing cointegration relationships amongst multivariate non-stationary time series. In this paper, we focus on the high-dimensional setting and seek a sample-size-efficient methodology for determining the level of cointegration. Our investigation centres on a Bayesian approach to analysing the cointegration matrix, and thereby determining the cointegration rank. We design two algorithms and implement them on simulated examples, yielding promising results, particularly when dealing with a high number of variables and a relatively low number of observations. Furthermore, we extend this methodology to empirically investigate the constituents of the S&P 500 index, where low-volatility portfolios can be found during both in-sample training and out-of-sample testing periods.
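As background, and in standard (non-Bayesian) notation rather than the paper's, the cointegration rank is the rank of the long-run matrix $\Pi$ in the VECM (deterministic terms omitted):
\[
\Delta y_t = \Pi y_{t-1} + \sum_{i=1}^{p-1} \Gamma_i \Delta y_{t-i} + \varepsilon_t,
\qquad \Pi = \alpha\beta^{\top},\ \ \alpha,\beta \in \mathbb{R}^{d\times r},
\]
where $y_t \in \mathbb{R}^d$ and $r = \operatorname{rank}(\Pi)$ counts the cointegrating relations $\beta^{\top} y_t$.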
The Yang and Prentice (YP) regression models have garnered interest from the scientific community due to their ability to analyze data whose survival curves exhibit intersection. These models include the proportional hazards (PH) and proportional odds (PO) models as special cases. However, they encounter limitations when dealing with multivariate survival data due to potential dependencies between the times-to-event. A solution is to introduce a frailty term into the hazard functions, making the times-to-event conditionally independent given the frailty term. In this study, we propose a new class of YP models that incorporate frailty. We use the exponential distribution, the piecewise exponential (PE) distribution, and Bernstein polynomials (BP) as baseline functions. Our approach adopts a Bayesian methodology. The proposed models are evaluated through a simulation study, which shows that the YP frailty models with BP and PE baselines perform similarly to the parametric model that generated the data. We apply the models to two real data sets.
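To illustrate the general frailty mechanism invoked above (the specific YP baseline forms are as defined in the paper), a shared frailty $z_i$ multiplies the conditional hazard of subject $j$ in cluster $i$:
\[
h(t \mid x_{ij}, z_i) = z_i\, h(t \mid x_{ij}), \qquad z_i \sim \text{e.g. a unit-mean Gamma distribution},
\]
so that the times-to-event within a cluster are conditionally independent given $z_i$, while remaining marginally dependent.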
Foundation models, such as Large Language Models (LLMs), have attracted a significant amount of interest due to their large number of applications. Existing works show that appropriate prompt design, such as Chain-of-Thought, can unlock LLMs' powerful capabilities in diverse areas. However, when handling tasks involving repetitive sub-tasks and/or deceptive content, such as arithmetic calculation and article-level fake news detection, existing prompting strategies either suffer from insufficient expressive power or from intermediate errors triggered by hallucination. To make LLMs more discerning of such intermediate errors, we propose to guide the LLM with a Divide-and-Conquer program that simultaneously ensures superior expressive power and disentangles the task decomposition, sub-task resolution, and resolution assembly processes. Theoretical analysis reveals that our strategy can guide LLMs to extend the expressive power of fixed-depth Transformers. Experiments indicate that our proposed method achieves better performance than typical prompting strategies on tasks plagued by intermediate errors and deceptive content, such as large integer multiplication, hallucination detection, and misinformation detection.
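The divide-and-conquer prompting pattern described above can be sketched as follows; the helper call_llm, the prompts, and the three-stage split are illustrative assumptions for exposition, not the paper's implementation.

def call_llm(prompt: str) -> str:
    # Placeholder for any chat-completion API call (assumed, not specified by the paper).
    raise NotImplementedError

def divide_and_conquer(task: str) -> str:
    # 1. Task decomposition: ask the model to split the task into independent sub-tasks.
    sub_tasks = call_llm(
        f"Decompose the following task into independent sub-tasks, one per line:\n{task}"
    ).splitlines()
    # 2. Sub-task resolution: solve each sub-task in isolation to limit error propagation.
    sub_answers = [
        call_llm(f"Solve this sub-task and return only the answer:\n{s}") for s in sub_tasks
    ]
    # 3. Resolution assembly: merge the sub-answers into a final answer.
    merged = "\n".join(f"- {s}: {a}" for s, a in zip(sub_tasks, sub_answers))
    return call_llm(f"Task: {task}\nSub-task results:\n{merged}\nCombine these into a final answer.")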
We present Bayesian Diffusion Models (BDM), a prediction algorithm that performs effective Bayesian inference by tightly coupling top-down (prior) information with the bottom-up (data-driven) procedure via joint diffusion processes. We show the effectiveness of BDM on the 3D shape reconstruction task. Compared to prototypical deep-learning, data-driven approaches trained on paired (supervised) data-label datasets (e.g., image-point cloud pairs), our BDM brings in rich prior information from standalone labels (e.g., point clouds) to improve bottom-up 3D reconstruction. As opposed to standard Bayesian frameworks, where an explicit prior and likelihood are required for inference, BDM performs seamless information fusion via coupled diffusion processes with learned gradient computation networks. The distinctive feature of our BDM lies in its capability to engage in active and effective information exchange and fusion between the top-down and bottom-up processes, each of which is itself a diffusion process. We demonstrate state-of-the-art results on both synthetic and real-world benchmarks for 3D shape reconstruction.
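The coupling of the two processes can be caricatured as follows; this is a deliberately simplified toy (scalar state, hand-written update directions, hypothetical weighting lam) meant only to convey the idea of fusing a prior-driven and a data-driven reverse-diffusion step, not the paper's architecture.

import numpy as np

def prior_step(x, t):
    # Toy stand-in for the top-down (prior) denoiser's update direction.
    return -0.5 * x

def data_step(x, t, observation):
    # Toy stand-in for the bottom-up (data-driven) denoiser's update direction.
    return observation - x

def coupled_reverse_diffusion(observation, steps=50, lam=0.5, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.standard_normal()  # start from noise
    dt = 1.0 / steps
    for t in range(steps, 0, -1):
        # Fuse the two update directions at every reverse step.
        drift = (1 - lam) * prior_step(x, t) + lam * data_step(x, t, observation)
        x = x + drift * dt + 0.1 * np.sqrt(dt) * rng.standard_normal()
    return x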
Argument Structure Constructions (ASCs) are one of the most well-studied construction groups, providing a unique opportunity to demonstrate the usefulness of Construction Grammar (CxG). For example, the caused-motion construction (CMC, ``She sneezed the foam off her cappuccino'') demonstrates that constructions must carry meaning; otherwise, the fact that ``sneeze'' in this context causes movement cannot be explained. We hypothesize that this remains challenging even for state-of-the-art Large Language Models (LLMs), for which we devise a test based on substituting the verb with a prototypical motion verb. To be able to perform this test at a statistically significant scale, in the absence of adequate CxG corpora, we develop a novel pipeline for NLP-assisted collection of linguistically annotated text. We show how dependency parsing and GPT-3.5 can be used to significantly reduce the annotation cost and thus enable the annotation of rare phenomena at scale. We then evaluate GPT, Gemini, Llama2 and Mistral models on their understanding of the CMC using the newly collected corpus. We find that all models struggle with understanding the motion component that the CMC adds to a sentence.
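As an illustration of how dependency parsing can pre-filter caused-motion candidates before LLM-assisted annotation, the snippet below is a hypothetical heuristic built on spaCy; the preposition list and attachment rules are assumptions, not the paper's exact pipeline.

import spacy

nlp = spacy.load("en_core_web_sm")  # small English pipeline with a dependency parser

DIRECTIONAL_PREPS = {"off", "into", "onto", "out", "across", "over", "through"}

def is_cmc_candidate(sentence: str) -> bool:
    # Heuristically flag sentences that may contain a caused-motion construction:
    # a verb governing both a direct object and a directional prepositional phrase.
    doc = nlp(sentence)
    for token in doc:
        if token.pos_ == "VERB":
            has_object = any(child.dep_ == "dobj" for child in token.children)
            has_directional_pp = any(
                child.dep_ == "prep" and child.text.lower() in DIRECTIONAL_PREPS
                for child in token.children
            )
            if has_object and has_directional_pp:
                return True
    return False

print(is_cmc_candidate("She sneezed the foam off her cappuccino."))

Such a filter only narrows the candidate pool; the construction label would still be confirmed by the GPT-3.5-assisted annotation step.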
We prove that the expected disturbance caused to a quantum system by a sequence of randomly ordered two-outcome projective measurements is upper bounded by the square root of the probability that at least one measurement in the sequence accepts. We call this bound the Gentle Random Measurement Lemma. We then consider problems in which we are given sample access to an unknown state $\rho$ and asked to estimate properties of the accepting probabilities $\text{Tr}[M_i \rho]$ of a set of measurements $\{M_1, M_2, \ldots , M_m\}$. We call these types of problems Quantum Event Learning Problems. Using the Gentle Random Measurement Lemma, we show that randomly ordering projective measurements solves the Quantum OR problem, answering an open question of Aaronson. We also give a Quantum OR protocol which works on non-projective measurements but which requires a more complicated type of measurement, which we call a Blended Measurement. Given additional guarantees on the set of measurements $\{M_1, \ldots, M_m\}$, we show the Quantum OR protocols developed in this paper can also be used to find a measurement $M_i$ such that $\text{Tr}[M_i \rho]$ is large. We also give a blended-measurement-based protocol for estimating the average accepting probability of a set of measurements on an unknown state. Finally, we consider the Threshold Search Problem described by O'Donnell and B\u{a}descu. By building on our Quantum Event Finding result, we show that randomly ordered (or blended) measurements can be used to solve this problem using $O(\log^2(m) / \epsilon^2)$ copies of $\rho$. Consequently, we obtain an algorithm for Shadow Tomography which requires $\tilde{O}(\log^2(m)\log(d)/\epsilon^4)$ samples, matching the current best known sample complexity. This algorithm does not require injected noise in the quantum measurements, but does require measurements to be made in a random order and so is no longer online.
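Schematically, and only paraphrasing the statement above in symbols (the precise disturbance measure and normalisation are as in the paper), the lemma reads
\[
\mathbb{E}_{\text{ordering}}\bigl[\, d\bigl(\rho, \rho_{\mathrm{post}}\bigr) \bigr]
\;\le\;
\sqrt{\Pr[\text{at least one measurement in the sequence accepts}]},
\]
where $d(\cdot,\cdot)$ denotes the disturbance (e.g. trace distance) and $\rho_{\mathrm{post}}$ the state after the randomly ordered projective measurements have been applied.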
We propose a nonlinear difference-in-differences method to estimate multivariate counterfactual distributions in classical treatment-and-control study designs with observational data. Our approach sheds new light on existing approaches, such as the changes-in-changes and the classical semiparametric difference-in-differences estimators, and generalizes them to settings with multivariate heterogeneity in the outcomes. The main benefit of this extension is that it allows for arbitrary dependence and heterogeneity in the joint outcomes. We demonstrate its utility on both synthetic and real data. In particular, we revisit the classical Card \& Krueger dataset, examining the effect of a minimum wage increase on employment in fast food restaurants; a reanalysis with our method reveals that, after a minimum wage increase, restaurants tend to substitute full-time with part-time labor at a faster pace. A previous version of this work was entitled "An optimal transport approach to causal inference".
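For orientation, the two univariate benchmarks that the proposed method generalises are, in generic notation (not the paper's), the classical difference-in-differences and changes-in-changes counterfactuals:
\[
\text{DiD:}\ \ \mathbb{E}\bigl[Y^{(0)}_{1,\mathrm{post}}\bigr]
= \mathbb{E}\bigl[Y_{1,\mathrm{pre}}\bigr]
+ \mathbb{E}\bigl[Y_{0,\mathrm{post}}\bigr] - \mathbb{E}\bigl[Y_{0,\mathrm{pre}}\bigr],
\qquad
\text{CiC:}\ \ Y^{(0)}_{1,\mathrm{post}}
\overset{d}{=}
F_{0,\mathrm{post}}^{-1}\bigl(F_{0,\mathrm{pre}}(Y_{1,\mathrm{pre}})\bigr),
\]
where subscripts index treatment group and period; both constructions are inherently scalar, which is the limitation the multivariate extension addresses.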
Bayesian optimization (BO) is a powerful approach for optimizing complex and expensive-to-evaluate black-box functions. Its importance is underscored in many applications, notably including hyperparameter tuning, but its efficacy depends on efficiently balancing exploration and exploitation. While there has been substantial progress in BO methods, striking this balance remains a delicate process. In this light, we present LLAMBO, a novel approach that integrates the capabilities of Large Language Models (LLMs) within BO. At a high level, we frame the BO problem in natural language, enabling LLMs to iteratively propose and evaluate promising solutions conditioned on historical evaluations. More specifically, we explore how combining the contextual understanding, few-shot learning proficiency, and domain knowledge of LLMs can improve model-based BO. Our findings illustrate that LLAMBO is effective at zero-shot warmstarting and enhances surrogate modeling and candidate sampling, especially in the early stages of search when observations are sparse. Our approach operates entirely in context and does not require LLM finetuning. Additionally, it is modular by design, allowing individual components to be integrated into existing BO frameworks or to function cohesively as an end-to-end method. We empirically validate LLAMBO's efficacy on the problem of hyperparameter tuning, highlighting strong empirical performance across a range of diverse benchmarks, proprietary tasks, and synthetic tasks.
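A minimal sketch of the loop described above is given below; the helpers llm_propose, llm_score and evaluate are hypothetical stand-ins for the LLM-backed candidate sampler, the LLM-backed surrogate, and the black-box objective, and do not reflect LLAMBO's actual interface.

def llm_propose(history, k=5):
    # Hypothetical: prompt an LLM with the serialized history and ask for k candidate configurations.
    raise NotImplementedError

def llm_score(history, candidate):
    # Hypothetical: ask an LLM to predict the candidate's objective value from past (config, score) pairs.
    raise NotImplementedError

def evaluate(candidate):
    # The expensive black-box objective, e.g. train a model with these hyperparameters.
    raise NotImplementedError

def llm_assisted_bo(n_iterations=20):
    history = []  # list of (configuration, observed score) pairs
    for _ in range(n_iterations):
        candidates = llm_propose(history)
        best = max(candidates, key=lambda c: llm_score(history, c))  # greedy acquisition over proposals
        history.append((best, evaluate(best)))
    return max(history, key=lambda pair: pair[1])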
We revisit the problem of spurious modes that are sometimes encountered in discretizations of partial differential equations. It is generally suspected that one cause of spurious modes is how boundary conditions are treated, and we use this as the starting point of our investigation. By regarding boundary conditions as algebraic constraints on a differential equation, we point out that any differential equation with homogeneous boundary conditions also admits a typically infinite number of hidden, or implicit, boundary conditions. In most discretization schemes, these additional implicit boundary conditions are violated, and we argue that this is what leads to the emergence of spurious modes. These observations motivate two definitions of the quality of computed eigenvalues, based on violations of derivatives of boundary conditions on the one hand, and on the Grassmann distance between subspaces associated with computed eigenspaces on the other. Both of these tests rely on a standardized treatment of boundary conditions and do not require a priori knowledge of eigenvalue locations. The effectiveness of these tests is demonstrated on several examples known to have spurious modes. In addition, these quality tests show that in most problems, about half the computed spectrum of a differential operator is of low quality. The tests also specifically identify the low-accuracy modes, which can then be projected out as a type of model reduction scheme.
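A simple example of the implicit boundary conditions alluded to above (an elementary observation included only for concreteness): for the eigenvalue problem
\[
u'' = \lambda u \quad \text{on } (0,1), \qquad u(0) = u(1) = 0,
\]
evaluating the equation at the endpoints forces $u''(0) = \lambda u(0) = 0$ and $u''(1) = 0$, and differentiating the equation further gives $u^{(4)}(0) = u^{(4)}(1) = 0$, and so on; a discretization that enforces only $u(0) = u(1) = 0$ will generically violate these higher-order conditions.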
We consider the differentially private (DP) facility location problem in the so-called super-set output setting proposed by Gupta et al. [SODA 2010]. The current best known expected approximation ratio for an $\epsilon$-DP algorithm is $O\left(\frac{\log n}{\sqrt{\epsilon}}\right)$, due to Cohen-Addad et al. [AISTATS 2022], where $n$ denotes the size of the metric space, while the best known lower bound is $\Omega(1/\sqrt{\epsilon})$ [NeurIPS 2019]. In this short note, we give a lower bound of $\tilde{\Omega}\left(\min\left\{\log n, \sqrt{\frac{\log n}{\epsilon}}\right\}\right)$ on the expected approximation ratio of any $\epsilon$-DP algorithm, which is the first evidence that the approximation ratio must grow with the size of the metric space.
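For intuition, and as a routine unpacking of the stated bound rather than an additional result, the two branches of the minimum correspond to two privacy regimes:
\[
\tilde{\Omega}\left(\min\left\{\log n,\ \sqrt{\tfrac{\log n}{\epsilon}}\right\}\right)
=
\begin{cases}
\tilde{\Omega}\left(\sqrt{\tfrac{\log n}{\epsilon}}\right), & \epsilon \ge \tfrac{1}{\log n},\\
\tilde{\Omega}(\log n), & \epsilon \le \tfrac{1}{\log n},
\end{cases}
\]
so the lower bound grows with $n$ in both regimes, in contrast to the $n$-independent $\Omega(1/\sqrt{\epsilon})$ bound, while a gap to the $O\left(\frac{\log n}{\sqrt{\epsilon}}\right)$ upper bound remains.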