Horn-satisfiability, or Horn-SAT, is the problem of deciding whether a satisfying assignment exists for a Horn formula, a conjunction of clauses (known as Horn clauses) each containing at most one positive literal. It is a well-known P-complete problem, which implies that unless P = NC, it is hard to parallelize. In this paper, we empirically show that, under a known simple random model for generating Horn formulas, the fraction of hard-to-parallelize instances (those closest to the worst-case behavior) is vanishingly small. We show that the depth of a parallel algorithm for Horn-SAT is polylogarithmic on average, for almost all instances, while keeping the work linear. This challenges theoreticians and programmers to look beyond worst-case analysis and come up with practical algorithms coupled with corresponding performance guarantees.
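For concreteness, the underlying decision procedure can be sketched as plain unit propagation on the Horn clauses. The Python snippet below is an illustrative sequential sketch only (quadratic as written), not the linear-work parallel algorithm analyzed above; clause and variable names are placeholders.

```python
def horn_sat(clauses):
    """Decide satisfiability of a Horn formula given as a list of
    (positives, negatives) pairs of variable-name sets, at most one
    positive literal per clause."""
    true_vars = set()
    changed = True
    while changed:
        changed = False
        for pos, neg in clauses:
            # A clause forces action only once all its negated variables are
            # already true and its (at most one) positive literal is not.
            if neg <= true_vars and not (pos & true_vars):
                if not pos:            # all-negative clause with satisfied body
                    return False       # contradiction: formula is unsatisfiable
                true_vars |= pos       # force the single positive literal true
                changed = True
    return True  # satisfied by setting true_vars to true and the rest to false


# Example: (x) AND (NOT x OR y) AND (NOT y) is unsatisfiable.
print(horn_sat([({"x"}, set()), ({"y"}, {"x"}), (set(), {"y"})]))  # False
```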
We present an extended validation of semi-analytical, semi-empirical covariance matrices for the two-point correlation function (2PCF) on simulated catalogs representative of the Luminous Red Galaxy (LRG) data collected during the initial two months of operations of the Stage-IV ground-based Dark Energy Spectroscopic Instrument (DESI). We run the pipeline on multiple extended Zel'dovich (EZ) mock galaxy catalogs with the corresponding cuts applied and compare the results with the mock sample covariance to assess the accuracy and its fluctuations. We propose an extension of the previously developed formalism for catalogs processed with standard reconstruction algorithms. We consider methods for comparing covariance matrices in detail, highlighting their interpretation and the statistical properties caused by sample variance, in particular the nontrivial expectation values of certain metrics even when the external covariance estimate is perfect. With improved mocks and validation techniques, we confirm good agreement between our predictions and the sample covariance. This allows one to generate covariance matrices for comparable datasets without the need to create numerous mock galaxy catalogs with matching clustering, requiring only 2PCF measurements from the data itself. The code used in this paper is publicly available at https://github.com/oliverphilcox/RascalC.
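As an illustration of the kind of metric whose sample-variance-induced expectation value matters in such comparisons, the sketch below computes a Gaussian KL-divergence figure of merit between a model covariance and a mock sample covariance; the exact metrics and conventions used in the paper may differ.

```python
import numpy as np

def kl_gaussian(cov_model, cov_sample):
    """KL divergence KL[N(0, cov_sample) || N(0, cov_model)], a common figure
    of merit for covariance comparison. Even for a perfect model covariance,
    noise in the sample covariance gives this metric a nonzero expectation."""
    n = cov_model.shape[0]
    prec = np.linalg.inv(cov_model)
    _, logdet_model = np.linalg.slogdet(cov_model)
    _, logdet_sample = np.linalg.slogdet(cov_sample)
    return 0.5 * (np.trace(prec @ cov_sample) - n + logdet_model - logdet_sample)
```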
Approximating significance scans of searches for new particles in high-energy physics experiments as Gaussian fields is a well-established way to estimate the trials factors required to quantify global significances. We propose a novel, highly efficient method to estimate the covariance matrix of such a Gaussian field. The method is based on a linear approximation of the statistical fluctuations of the signal amplitude. For one-dimensional searches, an upper bound on the trials factor can then be calculated directly from the covariance matrix. For higher dimensions, the Gaussian process described by this covariance matrix may be sampled to calculate the trials factor directly. This method also serves as the theoretical basis for a recent study of the trials factor with an empirically constructed set of Asimov-like background datasets. We illustrate the method with studies of an $H \rightarrow \gamma \gamma$ inspired model that was used in the empirical paper.
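The sampling step for higher-dimensional scans can be illustrated with a short Monte Carlo sketch; function and variable names here are illustrative, and the paper's exact conventions (e.g. one-sided vs. two-sided significances) may differ.

```python
import numpy as np
from scipy.stats import norm

def trials_factor(cov, z_local, n_samples=100_000, seed=0):
    """Estimate the trials factor by sampling the Gaussian significance field.

    cov     : covariance matrix of the significance scan (unit diagonal),
              e.g. obtained from the linear-approximation construction.
    z_local : observed local significance.
    """
    rng = np.random.default_rng(seed)
    field = rng.multivariate_normal(np.zeros(cov.shape[0]), cov, size=n_samples)
    p_global = np.mean(field.max(axis=1) >= z_local)  # global p-value estimate
    p_local = norm.sf(z_local)                        # local (one-sided) p-value
    return p_global / p_local
```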
We seek to extract a small number of representative scenarios from large, high-dimensional panel data that are consistent with sample moments. We propose two novel algorithms: the first identifies scenarios that have not been observed before and comes with a scenario-based representation of covariance matrices; the second picks important data points from states of the world that have already been realized, consistent with higher-order sample moment information. Both algorithms are efficient to compute and lend themselves to consistent scenario-based modeling and high-dimensional numerical integration. Extensive numerical benchmarking studies and an application in portfolio optimization favor the proposed algorithms.
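For intuition, a much simpler sigma-point-style construction (not the paper's algorithms) already shows how a handful of synthetic scenarios can reproduce the sample mean and covariance exactly:

```python
import numpy as np

def covariance_scenarios(data):
    """Build 2n equally weighted scenarios whose weighted mean and covariance
    match the sample mean and covariance of `data` (shape: observations x n).
    Scenarios are placed symmetrically along scaled principal axes."""
    mu = data.mean(axis=0)
    cov = np.cov(data, rowvar=False)
    eigval, eigvec = np.linalg.eigh(cov)
    n = len(mu)
    spread = eigvec * np.sqrt(np.maximum(eigval, 0.0) * n)  # scale each column
    scenarios = np.vstack([mu + spread.T, mu - spread.T])   # 2n scenarios
    weights = np.full(2 * n, 1.0 / (2 * n))                 # equal weights
    return scenarios, weights
```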
The Benaloh challenge allows the voter to audit the encryption of her vote and, in particular, to check whether the vote has been represented correctly. An interesting analysis of the mechanism has been presented by Culnane and Teague. The authors propose a natural game-theoretic model of the interaction between the voter and a corrupt, malicious encryption device. They then claim that there is no "natural" rational strategy for the voter to play the game. In consequence, the authorities cannot provide the voter with a sensible auditing strategy, which undermines the whole idea. Here, we claim the contrary, i.e., that there exist simple rational strategies that justify the usefulness of the Benaloh challenge.
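To make the claim concrete, a one-shot toy version of the audit game (with purely hypothetical payoff values, not the Culnane-Teague model) can be written down directly; it shows how a simple randomized auditing strategy already constrains a cheating device.

```python
def voter_expected_utility(p, q, u_ok=1.0, u_corrupted=-1.0, u_caught=0.5):
    """Toy one-shot audit game with hypothetical payoffs: the voter audits with
    probability p, the device cheats with probability q; auditing a cheat
    exposes the device, while an unaudited cheat corrupts the cast vote."""
    return (p * q * u_caught              # audit reveals the cheat
            + (1 - p) * q * u_corrupted   # cheat goes undetected and is cast
            + (1 - q) * u_ok)             # honest encryption, vote cast correctly


# With these toy payoffs, auditing with moderate probability already makes
# constant cheating unattractive once the device is penalized when caught.
print(voter_expected_utility(p=0.5, q=1.0))  # -0.25
```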
A computational framework is presented to numerically simulate the effects of antihypertensive drugs, in particular calcium channel blockers, on the mechanical response of arterial walls. A stretch-dependent smooth muscle model by Uhlmann and Balzani is modified to describe the interaction of pharmacological drugs and the inhibition of smooth muscle activation. The coupled deformation-diffusion problem is then solved using the finite element software FEDDLib and overlapping Schwarz preconditioners from the Trilinos package FROSch. These preconditioners include highly scalable parallel GDSW (generalized Dryja-Smith-Widlund) and RDSW (reduced GDSW) preconditioners. Simulation results show the expected increase in the lumen diameter of an idealized artery due to the drug-induced reduction of smooth muscle contraction, as well as a decrease in the rate of arterial contraction in the presence of calcium channel blockers. Strong and weak parallel scalability of the resulting computational implementation are also analyzed.
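As a purely conceptual illustration (not the Uhlmann-Balzani model, and with entirely hypothetical parameter values), the drug effect can be thought of as a Hill-type inhibition factor scaling a stretch-dependent active stress:

```python
import numpy as np

def inhibited_active_stress(stretch, drug_conc, s_max=50.0, ic50=1.0, hill=2.0):
    """Toy sketch: stretch-dependent active stress of smooth muscle, scaled by
    a Hill-type inhibition factor standing in for a calcium channel blocker.
    All functional forms and parameter values are hypothetical."""
    # Parabolic stretch dependence with a maximum at an optimal stretch of 1.2
    stretch_factor = np.clip(1.0 - ((stretch - 1.2) / 0.4) ** 2, 0.0, 1.0)
    inhibition = 1.0 / (1.0 + (drug_conc / ic50) ** hill)  # higher dose, less activation
    return s_max * stretch_factor * inhibition
```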
Elementary trapping sets (ETSs) are the main culprits behind the performance of LDPC codes in the error floor region. Because ETSs are numerous, structurally complex, and computationally difficult to enumerate, eliminating dominant ETSs when designing LDPC codes becomes a pivotal issue for improving error floor behavior. In practice, researchers commonly address this problem by avoiding certain graph structures so as to free the Tanner graph of specific ETSs. In this paper, we derive the exact Tur\'an number of $\theta(1,2,2)$ and prove that all $(a,b)$-ETSs in a variable-regular Tanner graph with variable degree $d_L(v)=\gamma$ must satisfy the bound $b\geq a\gamma-\frac{1}{2}a^2$, which improves the lower bound obtained by Amirzade when the girth is 6. For the case of girth 8, by restricting the relation between any two 8-cycles in the Tanner graph, we prove a similar inequality $b\geq a\gamma-\frac{a(\sqrt{8a-7}-1)}{2}$. Simulation results show that the designed codes perform well, with a lower error floor, over additive white Gaussian noise channels.
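The two bounds stated above are straightforward to evaluate; the helper below simply computes them for a given ETS size $a$, variable degree $\gamma$, and girth.

```python
import math

def ets_b_lower_bound(a, gamma, girth):
    """Lower bound on b for an (a, b)-ETS in a variable-regular Tanner graph
    with variable degree gamma, following the inequalities stated above."""
    if girth == 6:
        return a * gamma - 0.5 * a * a
    if girth == 8:
        return a * gamma - a * (math.sqrt(8 * a - 7) - 1) / 2
    raise ValueError("bounds stated for girth 6 and 8 only")


print(ets_b_lower_bound(6, 4, girth=8))  # roughly 7.79 for a=6, gamma=4
```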
Gaussian variational inference and the Laplace approximation are popular alternatives to Markov chain Monte Carlo that formulate Bayesian posterior inference as an optimization problem, enabling the use of simple and scalable stochastic optimization algorithms. However, a key limitation of both methods is that the solution to the optimization problem is typically not tractable to compute; even in simple settings the problem is nonconvex. Thus, recently developed statistical guarantees -- which all involve the (data) asymptotic properties of the global optimum -- are not reliably obtained in practice. In this work, we provide two major contributions: a theoretical analysis of the asymptotic convexity properties of variational inference with a Gaussian family and the maximum a posteriori (MAP) problem required by the Laplace approximation; and two algorithms -- consistent Laplace approximation (CLA) and consistent stochastic variational inference (CSVI) -- that exploit these properties to find the optimal approximation in the asymptotic regime. Both CLA and CSVI involve a tractable initialization procedure that finds the local basin of the optimum, and CSVI further includes a scaled gradient descent algorithm that provably stays locally confined to that basin. Experiments on nonconvex synthetic and real-data examples show that compared with standard variational and Laplace approximations, both CSVI and CLA improve the likelihood of obtaining the global optimum of their respective optimization problems.
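For orientation, the basic Gaussian variational objective that both methods build on can be optimized with a plain reparameterization-gradient loop; the sketch below is a generic mean-field baseline, not the CSVI or CLA algorithms (which add the tractable initialization and scaled, locally confined updates described above).

```python
import numpy as np

def gaussian_vi(log_post_grad, dim, steps=5000, lr=1e-3, mc=8, seed=0):
    """Mean-field Gaussian VI by stochastic gradient ascent on the ELBO,
    using the reparameterization z = mu + exp(rho) * eps."""
    rng = np.random.default_rng(seed)
    mu, rho = np.zeros(dim), np.zeros(dim)
    for _ in range(steps):
        eps = rng.standard_normal((mc, dim))
        z = mu + np.exp(rho) * eps                  # reparameterized samples
        g = log_post_grad(z)                        # (mc, dim) log-posterior gradients
        mu += lr * g.mean(axis=0)
        rho += lr * ((g * eps * np.exp(rho)).mean(axis=0) + 1.0)  # +1 from entropy
    return mu, np.exp(rho)                          # mean and std of the Gaussian fit


# Example: standard normal target, grad log p(z) = -z; the optimum is mu=0, sigma=1.
mu, sigma = gaussian_vi(lambda z: -z, dim=2)
```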
Few-shot learning (FSL) has emerged as an effective learning paradigm and shows great potential. Despite recent creative work on FSL tasks, rapidly learning valid information from just a few or even zero samples remains a serious challenge. In this context, we extensively survey more than 200 recent papers on FSL published in the past three years, aiming to present a timely and comprehensive overview of the most recent advances in FSL along with impartial comparisons of the strengths and weaknesses of existing works. To avoid conceptual confusion, we first elaborate on and compare a set of similar concepts, including few-shot learning, transfer learning, and meta-learning. Furthermore, we propose a novel taxonomy that classifies existing work according to the level of knowledge abstraction, in line with the challenges of FSL. To enrich this survey, in each subsection we provide in-depth analysis and insightful discussion of recent advances on these topics. Moreover, taking computer vision as an example, we highlight important applications of FSL, covering various research hotspots. Finally, we conclude the survey with unique insights into technology evolution trends together with potential future research opportunities, in the hope of providing guidance for follow-up research.
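As a minimal illustration of one canonical FSL approach covered by such surveys (prototype-based classification in the spirit of Prototypical Networks, with raw features standing in for learned embeddings):

```python
import numpy as np

def prototype_classify(support_x, support_y, query_x):
    """Average the support examples of each class into a prototype and assign
    each query to the nearest prototype (squared Euclidean distance)."""
    classes = np.unique(support_y)
    protos = np.stack([support_x[support_y == c].mean(axis=0) for c in classes])
    dist = ((query_x[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    return classes[dist.argmin(axis=1)]
```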
This paper focuses on the expected difference in a borrower's repayment when there is a change in the lender's credit decisions. Classical estimators overlook confounding effects, and hence their estimation error can be substantial. We therefore propose an alternative approach to constructing the estimators that greatly reduces this error. The proposed estimators are shown to be unbiased, consistent, and robust through a combination of theoretical analysis and numerical testing. Moreover, we compare the classical and proposed estimators in terms of their power for estimating the causal quantities. The comparison is tested across a wide range of models, including linear regression, tree-based, and neural network-based models, on simulated datasets that exhibit different levels of causality, degrees of nonlinearity, and distributional properties. Most importantly, we apply our approach to a large observational dataset provided by a global technology firm that operates in both the e-commerce and the lending business. We find that the relative reduction in estimation error is strikingly large when the causal effects are correctly accounted for.
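To illustrate why adjusting for confounding matters here (a generic contrast, not the paper's exact estimators), one can compare a naive difference in mean repayment with an inverse-propensity-weighted estimate that conditions on observed borrower features:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def naive_and_ipw_effect(X, decision, repay):
    """Naive mean difference in repayment between credit decisions vs. an
    inverse-propensity-weighted (IPW) estimate adjusting for confounders X.
    `decision` is a 0/1 array of the lender's credit decisions."""
    naive = repay[decision == 1].mean() - repay[decision == 0].mean()
    prop = LogisticRegression(max_iter=1000).fit(X, decision).predict_proba(X)[:, 1]
    w1, w0 = decision / prop, (1 - decision) / (1 - prop)
    ipw = (w1 * repay).sum() / w1.sum() - (w0 * repay).sum() / w0.sum()
    return naive, ipw
```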
Small data challenges have emerged in many learning problems, since the success of deep neural networks often relies on the availability of a huge amount of labeled data that is expensive to collect. To address this, many efforts have been made to train complex models with small data in unsupervised and semi-supervised fashion. In this paper, we review the recent progress in these two major categories of methods. A wide spectrum of small data models is categorized into a big picture, in which we show how they interplay with each other to motivate the exploration of new ideas. We review the criteria for learning transformation-equivariant, disentangled, self-supervised, and semi-supervised representations, which underpin the foundations of recent developments. Many instantiations of unsupervised and semi-supervised generative models have been developed on the basis of these criteria, greatly expanding the territory of existing autoencoders, generative adversarial nets (GANs), and other deep networks by exploring the distribution of unlabeled data for more powerful representations. While we focus on unsupervised and semi-supervised methods, we also provide a broader review of other emerging topics, from unsupervised and semi-supervised domain adaptation to the fundamental roles of transformation equivariance and invariance in training a wide spectrum of deep networks. It is impossible for us to write an exhaustive encyclopedia covering all related works. Instead, we aim to explore the main ideas, principles, and methods in this area to reveal where we are heading on the journey towards addressing small data challenges in this big data era.
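As a toy illustration of the semi-supervised criteria surveyed here (a generic consistency-regularization objective, not tied to any specific paper), the loss below combines supervised cross-entropy on labeled data with a term that makes predictions on an unlabeled example and its augmentation agree:

```python
import numpy as np

def semi_supervised_loss(logits_lab, labels, logits_unlab, logits_unlab_aug, lam=1.0):
    """Supervised cross-entropy on labeled data plus an unlabeled consistency
    penalty between predictions on an example and its augmented view."""
    def softmax(x):
        e = np.exp(x - x.max(axis=1, keepdims=True))
        return e / e.sum(axis=1, keepdims=True)
    p_lab = softmax(logits_lab)
    ce = -np.mean(np.log(p_lab[np.arange(len(labels)), labels] + 1e-12))
    consistency = np.mean((softmax(logits_unlab) - softmax(logits_unlab_aug)) ** 2)
    return ce + lam * consistency
```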