Semi-definite programs represent a frontier of efficient computation. While there has been much progress on semi-definite optimization, with moderate-sized instances currently solvable in practice by the interior-point method, the basic problem of sampling semi-definite solutions remains a formidable challenge. The direct application of known polynomial-time algorithms for sampling general convex bodies to semi-definite sampling leads to a prohibitively high running time. In addition, known general methods require an expensive rounding phase as pre-processing. Here we analyze the Dikin walk, by first adapting it to general metrics, then devising suitable metrics for the PSD cone with affine constraints. The resulting mixing time and per-step complexity are considerably smaller, and by an appropriate choice of the metric, the dependence on the number of constraints can be made polylogarithmic. We introduce a refined notion of self-concordant matrix functions and give rules for combining different metrics. Along the way, we further develop the theory of interior-point methods for sampling.
Despite the impressive generalization capabilities of deep neural networks, they have been repeatedly shown to be overconfident when they are wrong. Fixing this issue is known as model calibration, and has consequently received much attention in the form of modified training schemes and post-training calibration procedures such as temperature scaling. While temperature scaling is frequently used because of its simplicity, it is often outperformed by modified training schemes. In this work, we identify a specific bottleneck for the performance of temperature scaling. We show that for empirical risk minimizers for a general set of distributions in which the supports of classes have overlaps, the performance of temperature scaling degrades with the amount of overlap between classes, and asymptotically becomes no better than random when there are a large number of classes. On the other hand, we prove that optimizing a modified form of the empirical risk induced by the Mixup data augmentation technique can in fact lead to reasonably good calibration performance, showing that training-time calibration may be necessary in some situations. We also verify that our theoretical results reflect practice by showing that Mixup significantly outperforms empirical risk minimization (with respect to multiple calibration metrics) on image classification benchmarks with class overlaps introduced in the form of label noise.
High-fidelity 3D scene reconstruction has been substantially advanced by recent progress in neural fields. However, most existing methods train a separate network from scratch for each individual scene. This is not scalable, inefficient, and unable to yield good results given limited views. While learning-based multi-view stereo methods alleviate this issue to some extent, their multi-view setting makes it less flexible to scale up and to broad applications. Instead, we introduce training generalizable Neural Fields incorporating scene Priors (NFPs). The NFP network maps any single-view RGB-D image into signed distance and radiance values. A complete scene can be reconstructed by merging individual frames in the volumetric space WITHOUT a fusion module, which provides better flexibility. The scene priors can be trained on large-scale datasets, allowing for fast adaptation to the reconstruction of a new scene with fewer views. NFP not only demonstrates SOTA scene reconstruction performance and efficiency, but it also supports single-image novel-view synthesis, which is underexplored in neural fields. More qualitative results are available at: //oasisyang.github.io/neural-prior
We present an efficient algorithm to solve semirandom planted instances of any Boolean constraint satisfaction problem (CSP). The semirandom model is a hybrid between worst-case and average-case input models, where the input is generated by (1) choosing an arbitrary planted assignment $x^*$, (2) choosing an arbitrary clause structure, and (3) choosing literal negations for each clause from an arbitrary distribution "shifted by $x^*$" so that $x^*$ satisfies each constraint. For an $n$ variable semirandom planted instance of a $k$-arity CSP, our algorithm runs in polynomial time and outputs an assignment that satisfies all but a $o(1)$-fraction of constraints, provided that the instance has at least $\tilde{O}(n^{k/2})$ constraints. This matches, up to $polylog(n)$ factors, the clause threshold for algorithms that solve fully random planted CSPs [FPV15], as well as algorithms that refute random and semirandom CSPs [AOW15, AGK21]. Our result shows that despite having worst-case clause structure, the randomness in the literal patterns makes semirandom planted CSPs significantly easier than worst-case, where analogous results require $O(n^k)$ constraints [AKK95, FLP16]. Perhaps surprisingly, our algorithm follows a significantly different conceptual framework when compared to the recent resolution of semirandom CSP refutation. This turns out to be inherent and, at a technical level, can be attributed to the need for relative spectral approximation of certain random matrices - reminiscent of the classical spectral sparsification - which ensures that an SDP can certify the uniqueness of the planted assignment. In contrast, in the refutation setting, it suffices to obtain a weaker guarantee of absolute upper bounds on the spectral norm of related matrices.
The quantum rate-distortion function plays a fundamental role in quantum information theory, however there is currently no practical algorithm which can efficiently compute this function to high accuracy for moderate channel dimensions. In this paper, we show how symmetry reduction can significantly simplify common instances of the entanglement-assisted quantum rate-distortion problems, allowing for more efficient computation regardless of the numerical algorithm being used. For some of these problem instances, symmetry reduction allows us to derive closed-form expressions for the quantum rate-distortion function. Additionally, we propose an inexact variant of the mirror descent algorithm to compute the quantum rate-distortion function with provable sublinear convergence rates. We show how this mirror descent algorithm is related to Blahut-Arimoto and expectation-maximization methods previously used to solve similar problems in information theory. Using these techniques, we present the first numerical experiments to compute a multi-qubit quantum rate-distortion function, and show that our proposed algorithm solves faster and to higher accuracy when compared to existing methods.
In this paper, we show that utilizing multiple protocols offers a unique opportunity to improve IP alias resolution and dual-stack inference substantially. Our key observation is that prevalent protocols, e.g., SSH and BGP, reply to unsolicited requests with a set of values that can be combined to form a unique device identifier. More importantly, this is possible by just completing the TCP hand-shake. Our empirical study shows that utilizing readily available scans and our active measurements can double the discovered IPv4 alias sets and more than 30x the dual-stack sets compared to the state-of-the-art techniques. We provide insights into our method's accuracy and performance compared to popular techniques.
This work studies the pure-exploration setting for the convex hull membership (CHM) problem where one aims to efficiently and accurately determine if a given point lies in the convex hull of means of a finite set of distributions. We give a complete characterization of the sample complexity of the CHM problem in the one-dimensional setting. We present the first asymptotically optimal algorithm called Thompson-CHM, whose modular design consists of a stopping rule and a sampling rule. In addition, we extend the algorithm to settings that generalize several important problems in the multi-armed bandit literature. Furthermore, we discuss the extension of Thompson-CHM to higher dimensions. Finally, we provide numerical experiments to demonstrate the empirical behavior of the algorithm matches our theoretical results for realistic time horizons.
Existing approaches to estimating politicians' latent positions along specific dimensions often fail when relevant data is limited. We leverage the embedded knowledge in generative large language models (LLMs) to address this challenge and measure lawmakers' positions along specific political or policy dimensions. We prompt an instruction/dialogue-tuned LLM to pairwise compare lawmakers and then scale the resulting graph using the Bradley-Terry model. We estimate novel measures of U.S. senators' positions on liberal-conservative ideology, gun control, and abortion. Our liberal-conservative scale, used to validate LLM-driven scaling, strongly correlates with existing measures and offsets interpretive gaps, suggesting LLMs synthesize relevant data from internet and digitized media rather than memorizing existing measures. Our gun control and abortion measures -- the first of their kind -- differ from the liberal-conservative scale in face-valid ways and predict interest group ratings and legislator votes better than ideology alone. Our findings suggest LLMs hold promise for solving complex social science measurement problems.
We propose a general framework for obtaining probabilistic solutions to PDE-based inverse problems. Bayesian methods are attractive for uncertainty quantification but assume knowledge of the likelihood model or data generation process. This assumption is difficult to justify in many inverse problems, where the specification of the data generation process is not obvious. We adopt a Gibbs posterior framework that directly posits a regularized variational problem on the space of probability distributions of the parameter. We propose a novel model comparison framework that evaluates the optimality of a given loss based on its "predictive performance". We provide cross-validation procedures to calibrate the regularization parameter of the variational objective and compare multiple loss functions. Some novel theoretical properties of Gibbs posteriors are also presented. We illustrate the utility of our framework via a simulated example, motivated by dispersion-based wave models used to characterize arterial vessels in ultrasound vibrometry.
The generalization mystery in deep learning is the following: Why do over-parameterized neural networks trained with gradient descent (GD) generalize well on real datasets even though they are capable of fitting random datasets of comparable size? Furthermore, from among all solutions that fit the training data, how does GD find one that generalizes well (when such a well-generalizing solution exists)? We argue that the answer to both questions lies in the interaction of the gradients of different examples during training. Intuitively, if the per-example gradients are well-aligned, that is, if they are coherent, then one may expect GD to be (algorithmically) stable, and hence generalize well. We formalize this argument with an easy to compute and interpretable metric for coherence, and show that the metric takes on very different values on real and random datasets for several common vision networks. The theory also explains a number of other phenomena in deep learning, such as why some examples are reliably learned earlier than others, why early stopping works, and why it is possible to learn from noisy labels. Moreover, since the theory provides a causal explanation of how GD finds a well-generalizing solution when one exists, it motivates a class of simple modifications to GD that attenuate memorization and improve generalization. Generalization in deep learning is an extremely broad phenomenon, and therefore, it requires an equally general explanation. We conclude with a survey of alternative lines of attack on this problem, and argue that the proposed approach is the most viable one on this basis.
A community reveals the features and connections of its members that are different from those in other communities in a network. Detecting communities is of great significance in network analysis. Despite the classical spectral clustering and statistical inference methods, we notice a significant development of deep learning techniques for community detection in recent years with their advantages in handling high dimensional network data. Hence, a comprehensive overview of community detection's latest progress through deep learning is timely to both academics and practitioners. This survey devises and proposes a new taxonomy covering different categories of the state-of-the-art methods, including deep learning-based models upon deep neural networks, deep nonnegative matrix factorization and deep sparse filtering. The main category, i.e., deep neural networks, is further divided into convolutional networks, graph attention networks, generative adversarial networks and autoencoders. The survey also summarizes the popular benchmark data sets, model evaluation metrics, and open-source implementations to address experimentation settings. We then discuss the practical applications of community detection in various domains and point to implementation scenarios. Finally, we outline future directions by suggesting challenging topics in this fast-growing deep learning field.