Ruzsa's equivalence theorem provided a framework for converting certain families of inequalities in additive combinatorics to entropic inequalities (which sometimes did not possess stand-alone entropic proofs). In this work, we first establish formal equivalences between some families (different from Ruzsa) of inequalities in additive combinatorics and entropic ones. Secondly, we provide stand-alone entropic proofs for some previously known entropic inequalities established via Ruzsa's equivalence theorem. As a first step to further these equivalences, we establish an information-theoretic characterization of the magnification ratio that could also be of independent interest.
We introduce a novel approach to learn geometries such as depth and surface normal from images while incorporating geometric context. The difficulty of reliably capturing geometric context in existing methods impedes their ability to accurately enforce the consistency between the different geometric properties, thereby leading to a bottleneck of geometric estimation quality. We therefore propose the Adaptive Surface Normal (ASN) constraint, a simple yet efficient method. Our approach extracts geometric context that encodes the geometric variations present in the input image and correlates depth estimation with geometric constraints. By dynamically determining reliable local geometry from randomly sampled candidates, we establish a surface normal constraint, where the validity of these candidates is evaluated using the geometric context. Furthermore, our normal estimation leverages the geometric context to prioritize regions that exhibit significant geometric variations, which makes the predicted normals accurately capture intricate and detailed geometric information. Through the integration of geometric context, our method unifies depth and surface normal estimations within a cohesive framework, which enables the generation of high-quality 3D geometry from images. We validate the superiority of our approach over state-of-the-art methods through extensive evaluations and comparisons on diverse indoor and outdoor datasets, showcasing its efficiency and robustness.
This paper studies the principal components (PC) estimator for high dimensional approximate factor models with weak factors in that the factor loading ($\boldsymbol{\Lambda}^0$) scales sublinearly in the number $N$ of cross-section units, i.e., $\boldsymbol{\Lambda}^{0\top} \boldsymbol{\Lambda}^0 / N^\alpha$ is positive definite in the limit for some $\alpha \in (0,1)$. While the consistency and asymptotic normality of these estimates are by now well known when the factors are strong, i.e., $\alpha=1$, the statistical properties for weak factors remain less explored. Here, we show that the PC estimator maintains consistency and asymptotical normality for any $\alpha\in(0,1)$, provided suitable conditions regarding the dependence structure in the noise are met. This complements earlier result by Onatski (2012) that the PC estimator is inconsistent when $\alpha=0$, and the more recent work by Bai and Ng (2023) who established the asymptotic normality of the PC estimator when $\alpha \in (1/2,1)$. Our proof strategy integrates the traditional eigendecomposition-based approach for factor models with leave-one-out analysis similar in spirit to those used in matrix completion and other settings. This combination allows us to deal with factors weaker than the former and at the same time relax the incoherence and independence assumptions often associated with the later.
The synthesis problem asks to automatically generate, if it exists, an algorithm from a specification of correct input-output pairs. In this paper, we consider the synthesis of computable functions of infinite words, for a classical Turing computability notion over infinite inputs. We consider specifications which are rational relations of infinite words, i.e., specifications defined non-deterministic parity transducers. We prove that the synthesis problem of computable functions from rational specifications is undecidable. We provide an incomplete but sound reduction to some parity game, such that if Eve wins the game, then the rational specification is realizable by a computable function. We prove that this function is even computable by a deterministic two-way transducer. We provide a sufficient condition under which the latter game reduction is complete. This entails the decidability of the synthesis problem of computable functions, which we prove to be ExpTime-complete, for a large subclass of rational specifications, namely deterministic rational specifications. This subclass contains the class of automatic relations over infinite words, a yardstick in reactive synthesis.
A pivotal aspect in the design of neural networks lies in selecting activation functions, crucial for introducing nonlinear structures that capture intricate input-output patterns. While the effectiveness of adaptive or trainable activation functions has been studied in domains with ample data, like image classification problems, significant gaps persist in understanding their influence on classification accuracy and predictive uncertainty in settings characterized by limited data availability. This research aims to address these gaps by investigating the use of two types of adaptive activation functions. These functions incorporate shared and individual trainable parameters per hidden layer and are examined in three testbeds derived from additive manufacturing problems containing fewer than one hundred training instances. Our investigation reveals that adaptive activation functions, such as Exponential Linear Unit (ELU) and Softplus, with individual trainable parameters, result in accurate and confident prediction models that outperform fixed-shape activation functions and the less flexible method of using identical trainable activation functions in a hidden layer. Therefore, this work presents an elegant way of facilitating the design of adaptive neural networks in scientific and engineering problems.
We propose a functional accelerated failure time model to characterize effects of both functional and scalar covariates on the time to event of interest, and provide regularity conditions to guarantee model identifiability. For efficient estimation of model parameters, we develop a sieve maximum likelihood approach where parametric and nonparametric coefficients are bundled with an unknown baseline hazard function in the likelihood function. Not only do the bundled parameters cause immense numerical difficulties, but they also result in new challenges in theoretical development. By developing a general theoretical framework, we overcome the challenges arising from the bundled parameters and derive the convergence rate of the proposed estimator. Furthermore, we prove that the finite-dimensional estimator is $\sqrt{n}$-consistent, asymptotically normal and achieves the semiparametric information bound. The proposed inference procedures are evaluated by extensive simulation studies and illustrated with an application to the sequential organ failure assessment data from the Improving Care of Acute Lung Injury Patients study.
Discovering causal relationships from time series data is significant in fields such as finance, climate science, and neuroscience. However, contemporary techniques rely on the simplifying assumption that data originates from the same causal model, while in practice, data is heterogeneous and can stem from different causal models. In this work, we relax this assumption and perform causal discovery from time series data originating from a mixture of causal models. We propose a general variational inference-based framework called MCD to infer the underlying causal models as well as the mixing probability of each sample. Our approach employs an end-to-end training process that maximizes an evidence-lower bound for the data likelihood. We present two variants: MCD-Linear for linear relationships and independent noise, and MCD-Nonlinear for nonlinear causal relationships and history-dependent noise. We demonstrate that our method surpasses state-of-the-art benchmarks in causal discovery tasks through extensive experimentation on synthetic and real-world datasets, particularly when the data emanates from diverse underlying causal graphs. Theoretically, we prove the identifiability of such a model under some mild assumptions.
This study investigates the asymptotic dynamics of alternating minimization applied to optimize a bilinear non-convex function with normally distributed covariates. We employ the replica method from statistical physics in a multi-step approach to precisely trace the algorithm's evolution. Our findings indicate that the dynamics can be described effectively by a two--dimensional discrete stochastic process, where each step depends on all previous time steps, revealing a memory dependency in the procedure. The theoretical framework developed in this work is broadly applicable for the analysis of various iterative algorithms, extending beyond the scope of alternating minimization.
It is important to detect anomalous inputs when deploying machine learning systems. The use of larger and more complex inputs in deep learning magnifies the difficulty of distinguishing between anomalous and in-distribution examples. At the same time, diverse image and text data are available in enormous quantities. We propose leveraging these data to improve deep anomaly detection by training anomaly detectors against an auxiliary dataset of outliers, an approach we call Outlier Exposure (OE). This enables anomaly detectors to generalize and detect unseen anomalies. In extensive experiments on natural language processing and small- and large-scale vision tasks, we find that Outlier Exposure significantly improves detection performance. We also observe that cutting-edge generative models trained on CIFAR-10 may assign higher likelihoods to SVHN images than to CIFAR-10 images; we use OE to mitigate this issue. We also analyze the flexibility and robustness of Outlier Exposure, and identify characteristics of the auxiliary dataset that improve performance.
We propose a new method for event extraction (EE) task based on an imitation learning framework, specifically, inverse reinforcement learning (IRL) via generative adversarial network (GAN). The GAN estimates proper rewards according to the difference between the actions committed by the expert (or ground truth) and the agent among complicated states in the environment. EE task benefits from these dynamic rewards because instances and labels yield to various extents of difficulty and the gains are expected to be diverse -- e.g., an ambiguous but correctly detected trigger or argument should receive high gains -- while the traditional RL models usually neglect such differences and pay equal attention on all instances. Moreover, our experiments also demonstrate that the proposed framework outperforms state-of-the-art methods, without explicit feature engineering.
Deep neural networks (DNNs) have been found to be vulnerable to adversarial examples resulting from adding small-magnitude perturbations to inputs. Such adversarial examples can mislead DNNs to produce adversary-selected results. Different attack strategies have been proposed to generate adversarial examples, but how to produce them with high perceptual quality and more efficiently requires more research efforts. In this paper, we propose AdvGAN to generate adversarial examples with generative adversarial networks (GANs), which can learn and approximate the distribution of original instances. For AdvGAN, once the generator is trained, it can generate adversarial perturbations efficiently for any instance, so as to potentially accelerate adversarial training as defenses. We apply AdvGAN in both semi-whitebox and black-box attack settings. In semi-whitebox attacks, there is no need to access the original target model after the generator is trained, in contrast to traditional white-box attacks. In black-box attacks, we dynamically train a distilled model for the black-box model and optimize the generator accordingly. Adversarial examples generated by AdvGAN on different target models have high attack success rate under state-of-the-art defenses compared to other attacks. Our attack has placed the first with 92.76% accuracy on a public MNIST black-box attack challenge.