An old problem in multivariate statistics is that linear Gaussian models are often unidentifiable, i.e. some parameters cannot be uniquely estimated. In factor analysis, an orthogonal rotation of the factors is unidentifiable, while in linear regression, the direction of effect cannot be identified. For such linear models, non-Gaussianity of the (latent) variables has been shown to provide identifiability. In the case of factor analysis, this leads to independent component analysis, while in the case of the direction of effect, non-Gaussian versions of structural equation modelling solve the problem. More recently, we have shown how even general nonparametric nonlinear versions of such models can be estimated. Non-Gaussianity is not enough in this case, but assuming we have time series, or that the distributions are suitably modulated by some observed auxiliary variables, the models are identifiable. This paper reviews the identifiability theory for the linear and nonlinear cases, considering both factor analytic models and structural equation models.
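The identifiability argument can be illustrated numerically. This is a minimal sketch, not the paper's estimation method. It assumes two independent unit-variance sources and a 45-degree rotation as the unidentifiable transformation. The helper `excess_kurtosis` is ours.

Rotating Gaussian sources changes no statistic of the data: the covariance stays the identity and the excess kurtosis stays near 0. Rotating uniform sources leaves the covariance equally unchanged, but it moves the excess kurtosis of each observed coordinate from -1.2 to about -0.6. This follows because, for standardized independent sources, the excess kurtosis of cos(θ)s1 + sin(θ)s2 is (cos⁴θ + sin⁴θ) times that of a source. Higher-order signal of this kind is what makes the non-Gaussian case identifiable.

```python
import numpy as np

def excess_kurtosis(x):
    """Sample excess kurtosis of a 1-D array (about 0 for Gaussian data, -1.2 for uniform)."""
    z = (x - x.mean()) / x.std()
    return float(np.mean(z ** 4) - 3.0)

rng = np.random.default_rng(0)
n = 200_000
theta = np.pi / 4
# Orthogonal rotation of the factors: the transformation that is unidentifiable in the Gaussian case.
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

sources = {
    "gaussian": rng.standard_normal((2, n)),
    "uniform": rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=(2, n)),  # unit variance
}

results = {}
for name, s in sources.items():
    x = R @ s  # observed, rotated factors (rows are variables)
    results[name] = {
        "cov": np.cov(x),
        "kurtosis_before": excess_kurtosis(s[0]),
        "kurtosis_after": excess_kurtosis(x[0]),
    }
    print(name, np.round(results[name]["cov"], 2).tolist(),
          round(results[name]["kurtosis_before"], 2),
          round(results[name]["kurtosis_after"], 2))
```

Both covariances are the identity, so second-order statistics cannot recover `R`. Only in the non-Gaussian case does a fourth-order statistic change under the rotation.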
We consider two-step estimation of latent variable models, in which only the measurement model is estimated in the first step, and the measurement parameters are then fixed at their estimated values in the second step, where the structural model is estimated. We show how this approach can be implemented for latent trait models (item response theory models), where the latent variables are continuous and their measurement indicators are categorical variables. The properties of two-step estimators are examined using simulation studies and applied examples. They perform well and have attractive practical and conceptual properties compared to the alternative one-step and three-step approaches. These results are in line with previous findings for other families of latent variable models. This provides strong evidence that two-step estimation is a flexible and useful general method of estimation for different types of latent variable models.
We introduce a general-purpose univariate signal deconvolution method based on the principles of an approach to Artificial General Intelligence. This approach rests on a generative model that combines information theory and algorithmic probability. Building it required a large-scale computation to estimate a "universal distribution", which yields a general-purpose model of models that is independent of probability distributions. This model was used to investigate how non-random data may encode information about physical properties, such as the dimensions and length scales in which a signal or message may originally have been encoded, embedded, or generated. The resulting multidimensional space reconstruction method is based on information theory and algorithmic probability. It is agnostic, but not independent, with respect to the chosen computable or semi-computable approximation method or encoding-decoding scheme. The results presented in this paper are useful for applications in coding theory, particularly in zero-knowledge one-way communication channels, such as deciphering messages sent by generating sources of unknown nature for which no prior knowledge is available. We argue that this could have strong potential for cryptography, signal processing, causal deconvolution, and life and technosignature detection.
We investigate statistical properties of a likelihood approach to nonparametric estimation of a singular distribution using deep generative models. More specifically, a deep generative model is used to model high-dimensional data that are assumed to concentrate around some low-dimensional structure. Estimating the distribution supported on this low-dimensional structure, such as a low-dimensional manifold, is challenging because the distribution is singular with respect to the Lebesgue measure in the ambient space. In the considered model, the usual likelihood approach can fail to estimate the target distribution consistently because of this singularity. We prove that a novel and effective solution exists: perturbing the data with instance noise leads to consistent estimation of the underlying distribution with desirable convergence rates. We also characterize the class of distributions that can be efficiently estimated via deep generative models. This class is sufficiently general to contain various structured distributions such as product distributions, classically smooth distributions, and distributions supported on a low-dimensional manifold. Our analysis provides some insight into how deep generative models can avoid the curse of dimensionality in nonparametric distribution estimation. We conduct a thorough simulation study and real data analysis to demonstrate empirically that the proposed data perturbation technique significantly improves estimation performance.
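The singularity problem, and why instance noise helps, can be seen in the simplest possible case. This is a minimal sketch, not the paper's deep generative model. It uses data lying exactly on a line in R^2, fitted by a Gaussian maximum-likelihood estimate. The noise level `sigma` and the helper `gaussian_mle_avg_loglik` are illustrative choices of ours.

On the clean data the fitted covariance is singular, so the Gaussian likelihood degenerates. After isotropic Gaussian noise is added, the covariance is full rank and the average log-likelihood is finite and moderate.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000
t = rng.uniform(-1.0, 1.0, n)
# Data concentrated on the line y = 2x: singular w.r.t. Lebesgue measure on R^2.
data = np.column_stack([t, 2.0 * t])

def gaussian_mle_avg_loglik(x):
    """Average log-likelihood of x under its own Gaussian MLE fit."""
    d = x.shape[1]
    cov = np.cov(x, rowvar=False, bias=True)
    eig = np.linalg.eigvalsh(cov)
    # Clip at a tiny floor so an exactly singular fit gives a huge value instead of log(0).
    logdet = np.sum(np.log(np.clip(eig, 1e-300, None)))
    return -0.5 * (d * np.log(2 * np.pi) + logdet + d)

sigma = 0.05  # instance-noise level (arbitrary illustrative choice)
noisy = data + sigma * rng.standard_normal(data.shape)

for name, x in [("clean", data), ("perturbed", noisy)]:
    min_eig = np.linalg.eigvalsh(np.cov(x, rowvar=False, bias=True)).min()
    print(name, "smallest eigenvalue:", min_eig,
          "avg log-lik:", gaussian_mle_avg_loglik(x))
```

In the clean case the smallest eigenvalue is zero up to rounding, and the likelihood blows up as the fitted density collapses onto the line. The perturbed data have a genuine density on R^2.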
We study the problem of online generalized linear regression in the stochastic setting, where the label is generated from a generalized linear model with possibly unbounded additive noise. We provide a sharp analysis of the classical follow-the-regularized-leader (FTRL) algorithm to cope with the label noise. More specifically, for $\sigma$-sub-Gaussian label noise, our analysis provides a regret upper bound of $O(\sigma^2 d \log T) + o(\log T)$, where $d$ is the dimension of the input vector and $T$ is the total number of rounds. We also prove an $\Omega(\sigma^2 d \log(T/d))$ lower bound for stochastic online linear regression, which indicates that our upper bound is nearly optimal. In addition, we extend our analysis to a more refined Bernstein noise condition. As an application, we study generalized linear bandits with heteroscedastic noise and propose an FTRL-based algorithm that achieves the first variance-aware regret bound.
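For squared loss with a quadratic regularizer, FTRL has a closed form. At round t it plays the minimizer of λ‖w‖² plus the cumulative squared loss of rounds 1, ..., t-1, which is online ridge regression.

This is a minimal sketch for the plain linear case with Gaussian label noise only; it does not cover the generalized linear models the paper treats. `lam`, `sigma`, and the data-generating choices are illustrative. Regret is measured here against the true parameter, which is one common definition in the stochastic setting and may differ in detail from the paper's.

```python
import numpy as np

def ftrl_online_ridge(X, y, lam=1.0):
    """FTRL with squared loss and regularizer lam*||w||^2 (online ridge regression).

    Before seeing y_t, predict with w_t = argmin_w lam*||w||^2 + sum_{s<t} (y_s - x_s.w)^2.
    Returns the per-round predictions and the final weight vector.
    """
    T, d = X.shape
    A = lam * np.eye(d)  # lam*I + sum_{s<t} x_s x_s^T
    b = np.zeros(d)      # sum_{s<t} y_s x_s
    preds = np.empty(T)
    for t in range(T):
        w = np.linalg.solve(A, b)  # FTRL iterate uses past rounds only
        preds[t] = X[t] @ w
        A += np.outer(X[t], X[t])  # label revealed: update sufficient statistics
        b += y[t] * X[t]
    return preds, np.linalg.solve(A, b)

rng = np.random.default_rng(2)
T, d, sigma = 5_000, 5, 0.5
theta = rng.standard_normal(d)
X = rng.standard_normal((T, d))
y = X @ theta + sigma * rng.standard_normal(T)

preds, w_final = ftrl_online_ridge(X, y, lam=1.0)
# Regret vs. the true parameter: excess cumulative squared loss of the online predictions.
regret = np.sum((y - preds) ** 2) - np.sum((y - X @ theta) ** 2)
print("regret:", regret, "sigma^2 * d * log T:", sigma ** 2 * d * np.log(T))
print("parameter error:", np.linalg.norm(w_final - theta))
```

Keeping the running matrix `A` and vector `b` makes each round cost one d-by-d solve, independent of how many rounds have passed.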
Causal discovery from observational data is a very challenging, often impossible, task. However, estimating the causal structure is possible under certain assumptions on the data-generating process. Many commonly used methods rely on the additivity of the noise in the structural equation models. Additivity implies that the variance or the tail of the effect, given the causes, is invariant; the cause only affects the mean. In many applications, it is desirable to model the tail or other characteristics of the random variable, since they can provide different information about the causal structure. However, models for causal inference in such cases have received very little attention. It has been shown that the causal graph is identifiable under different models, such as linear non-Gaussian, post-nonlinear, or quadratic variance functional models. We introduce a new class of models, called Conditional Parametric Causal Models (CPCM), in which the cause affects the effect through some of the characteristics of interest. We use the concept of sufficient statistics to show the identifiability of CPCMs, focusing mostly on the exponential family of conditional distributions. We also propose an algorithm for estimating the causal structure from a random sample under CPCM. Its empirical properties are studied on various data sets, including an application to the expenditure behavior of residents of the Philippines.
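The gap between mean-only models and models that let the cause act on other characteristics can be sketched numerically. This only illustrates the kind of dependence CPCM is meant to capture; it is not the paper's identifiability argument or estimation algorithm.

The sketch assumes a Gaussian conditional family in which X changes only the standard deviation of Y. The choice sigma(X) = exp(X) is arbitrary and ours. A least-squares fit of the conditional mean finds essentially no effect, while a scale-sensitive statistic detects the dependence clearly.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20_000
x = rng.uniform(-1.0, 1.0, n)
# Hypothetical mechanism: Y | X ~ N(0, sigma(X)^2) with sigma(X) = exp(X).
# The cause changes only the scale of the effect, not its mean.
y = np.exp(x) * rng.standard_normal(n)

# Mean-only (additive-noise) view: least-squares slope of Y on X.
mean_slope = np.polyfit(x, y, 1)[0]
# Scale-sensitive view: log|Y| = X + log|Z| with Z ~ N(0, 1), so it tracks X.
scale_corr = np.corrcoef(x, np.log(np.abs(y)))[0, 1]
print("slope of E[Y|X]:", round(mean_slope, 3))
print("corr(X, log|Y|):", round(scale_corr, 3))
```

Since Var(log|Z|) = π²/8 for a standard normal Z, the population correlation between X and log|Y| is about 0.46, while the population slope of the conditional mean is exactly 0.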
Generating synthetic images of handwritten text in a writer-specific style is a challenging task, especially for unseen styles and new words, and even more so when the latter contain characters that are rarely encountered during training. While emulating a writer's style has recently been addressed by generative models, generalization to rare characters has been disregarded. In this work, we devise a Transformer-based model for few-shot styled handwritten text generation and focus on obtaining a robust and informative representation of both the text and the style. In particular, we propose a novel representation of the textual content as a sequence of dense vectors obtained from images of symbols rendered as standard GNU Unifont glyphs, which can be considered their visual archetypes. This strategy is better suited to generating characters that, despite having been seen rarely during training, may share visual details with frequently observed ones. As for the style, we obtain a robust representation of unseen writers' calligraphy by exploiting specific pre-training on a large synthetic dataset. Quantitative and qualitative results demonstrate that our proposal generates words in unseen styles and with rare characters more faithfully than existing approaches relying on independent one-hot encodings of the characters.
This paper proposes novel inferential procedures for network Granger causality in high-dimensional vector autoregressive models. In particular, we offer two multiple testing procedures designed to control the false discovery rate (FDR) of the discovered networks. The first procedure is based on the limiting normal distribution of the $t$-statistics constructed from the debiased lasso estimator. The second procedure is based on the bootstrap distributions of the $t$-statistics generated by imposing the null hypotheses. Their theoretical properties, including FDR control and power guarantees, are investigated. The finite-sample evidence suggests that both procedures can successfully control the FDR while maintaining high power. Finally, the proposed methods are applied to discovering network Granger causality in a large number of macroeconomic variables and in regional house prices in the UK.
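Given t-statistics that are asymptotically standard normal under the null, a standard way to turn them into a discovery set with FDR control is the Benjamini-Hochberg step-up rule applied to two-sided normal p-values.

This is a minimal sketch under two assumptions. First, it uses Benjamini-Hochberg as the multiple-testing step; the abstract does not say which step-up rule the paper uses. Second, it takes the t-statistics as given rather than computing them from a debiased lasso fit. The edge names and statistics below are made up for illustration.

```python
import math

def two_sided_normal_pvalue(t):
    """P(|Z| >= |t|) for Z ~ N(0, 1)."""
    return math.erfc(abs(t) / math.sqrt(2.0))

def benjamini_hochberg(pvalues, q=0.05):
    """Indices rejected by the Benjamini-Hochberg step-up procedure at level q."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0  # largest rank whose sorted p-value is below its BH threshold
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * q / m:
            k = rank
    return sorted(order[:k])

# Hypothetical t-statistics for candidate Granger-causal edges "i->j".
edges = ["x1->x2", "x2->x3", "x3->x1", "x1->x3", "x2->x1"]
t_stats = [4.1, 0.3, -3.5, 1.2, -2.6]
pvals = [two_sided_normal_pvalue(t) for t in t_stats]
discovered = [edges[i] for i in benjamini_hochberg(pvals, q=0.05)]
print(discovered)  # ['x1->x2', 'x3->x1', 'x2->x1']
```

The step-up form rejects every hypothesis ranked at or below the largest rank that passes its threshold. That is why the loop records `k` rather than stopping at the first failure.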
This paper proposes a computational framework for the design optimization of stable structures under large deformations by incorporating nonlinear buckling constraints. A novel strategy for suppressing spurious buckling modes related to low-density elements is proposed. The strategy relies on constructing a pseudo-mass matrix that assigns small pseudo masses to DOFs surrounded only by low-density elements and degenerates to an identity matrix in the solid region. A novel optimization procedure is developed that can handle both simple and multiple eigenvalues. In it, consistent sensitivities of simple eigenvalues and directional derivatives of multiple eigenvalues are derived and used in a gradient-based optimization algorithm, the method of moving asymptotes. An adaptive linear energy interpolation method is also incorporated into the nonlinear analyses to handle the distortion of low-density elements under large deformations. The numerical results demonstrate that, for systems with either low or high symmetry, the nonlinear stability constraints can ensure structural stability at the target load under large deformations. Post-analysis of the B-spline-fitted designs shows that the safety margin of the optimized structures, i.e., the gap between the target load and the first critical load, can be well controlled by selecting different stability constraint values. Interesting structural behaviors such as mode switching and multiple bifurcations are also demonstrated.
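Why small pseudo masses suppress spurious modes follows from the Rayleigh quotient of the generalized eigenproblem K φ = λ M φ. A mode living on DOFs with small mass has a large quotient φᵀKφ / φᵀMφ, so low-density modes are pushed out of the low end of the spectrum.

This is a toy sketch on a 1-D spring chain, and every modeling choice in it is an assumption of ours rather than the paper's formulation: SIMP stiffness k_e = ρ_e^p, and the hypothetical mass rule m_i = max over the elements touching DOF i of ρ_e^q with q > p. The rule gives unit mass next to any solid element and a small mass on DOFs surrounded only by low-density elements, matching the description above. Because M is diagonal, the problem is solved by symmetric scaling with M^(-1/2).

```python
import numpy as np

def chain_stiffness(rho, p=3.0, E=1.0):
    """Stiffness of a 1-D spring chain with DOF 0 clamped (SIMP: k_e = E * rho_e**p)."""
    n_el = len(rho)
    K = np.zeros((n_el + 1, n_el + 1))
    for e, r in enumerate(rho):
        k = E * r ** p
        i, j = e, e + 1
        K[i, i] += k
        K[j, j] += k
        K[i, j] -= k
        K[j, i] -= k
    return K[1:, 1:]  # drop the clamped DOF 0

def pseudo_masses(rho, q=6.0):
    """Hypothetical rule: DOF mass = max density**q over the elements touching it."""
    m = np.zeros(len(rho) + 1)
    for e, r in enumerate(rho):
        m[e] = max(m[e], r ** q)
        m[e + 1] = max(m[e + 1], r ** q)
    return m[1:]

def generalized_eigh_diag(K, m):
    """Solve K phi = lam * diag(m) phi via the scaling diag(m)^(-1/2) K diag(m)^(-1/2)."""
    s = 1.0 / np.sqrt(m)
    lam, psi = np.linalg.eigh(s[:, None] * K * s[None, :])
    return lam, s[:, None] * psi  # map back: phi = M^(-1/2) psi

rho = np.array([1.0] * 5 + [1e-2] * 5)  # solid half, then a low-density half
K = chain_stiffness(rho)

lam_id, phi_id = np.linalg.eigh(K)  # identity mass matrix
lam_pm, phi_pm = generalized_eigh_diag(K, pseudo_masses(rho))
# Share of the lowest identity-mass mode carried by the 5 DOFs touching only low-density elements.
void_share = float(np.sum(phi_id[5:, 0] ** 2))
print("lowest eigenvalue, identity mass:", lam_id[0])
print("lowest eigenvalue, pseudo-mass:  ", lam_pm[0])
print("low-density share of lowest identity mode:", void_share)
```

With the identity matrix, the lowest mode is a near-zero, spurious mode confined to the low-density half. With the pseudo-mass matrix, the lowest eigenvalue belongs to the solid half (about 0.08 here).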
We consider identification and inference for the average treatment effect and heterogeneous treatment effect conditional on observable covariates in the presence of unmeasured confounding. Since point identification of these treatment effects is not achievable without strong assumptions, we obtain bounds on these treatment effects by leveraging differential effects, a tool that allows for using a second treatment to learn the effect of the first treatment. The differential effect is the effect of using one treatment in lieu of the other. We provide conditions under which differential treatment effects can be used to point-identify or partially identify treatment effects. Under these conditions, we develop a flexible and easy-to-implement semi-parametric framework to estimate bounds and establish asymptotic properties over the support for conducting statistical inference. The proposed method is examined through a simulation study and two case studies. The first uses the National Health and Nutrition Examination Survey to investigate the effect of smoking on blood levels of lead and cadmium. The second uses the Youth Risk Behavior Surveillance System to investigate the effect of soft drink consumption on the occurrence of physical fights in teenagers.
Disentangled representation learning aims to find a low-dimensional representation that consists of multiple explanatory and generative factors of the observational data. The variational autoencoder (VAE) framework is commonly used to disentangle independent factors from observations. However, in real scenarios, factors with semantics are not necessarily independent. Instead, there might be an underlying causal structure that renders these factors dependent. We thus propose a new VAE-based framework named CausalVAE, which includes a Causal Layer to transform independent exogenous factors into causal endogenous ones that correspond to causally related concepts in the data. We further analyze model identifiability, showing that the proposed model learned from observations recovers the true one up to a certain degree. Experiments are conducted on various datasets, including synthetic datasets and the real-world benchmark CelebA. Results show that the causal representations learned by CausalVAE are semantically interpretable, and that their causal relationship, as a Directed Acyclic Graph (DAG), is identified with good accuracy. Furthermore, we demonstrate that the proposed CausalVAE model is able to generate counterfactual data through "do-operations" on the causal factors.
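One way to picture a Causal Layer that turns independent exogenous factors into causally related endogenous ones is a linear structural causal model on the latent factors; the abstract does not state the exact form. Let A be a DAG adjacency matrix, where A[i, j] != 0 means factor i causes factor j. Then z = Aᵀz + ε, hence z = (I - Aᵀ)⁻¹ε.

This is a minimal sketch of that map and of a do-operation that clamps one factor and propagates the change to its descendants. The linear form, the example graph, and the function names are illustrative assumptions, not the paper's full model, which also learns A from data.

```python
import numpy as np

def causal_layer(eps, A):
    """Map exogenous eps to endogenous z solving z = A^T z + eps, i.e. z = (I - A^T)^{-1} eps."""
    d = A.shape[0]
    return np.linalg.solve(np.eye(d) - A.T, eps)

def do_intervention(eps, A, index, value):
    """z under do(z[index] = value): cut the incoming edges of `index`, then propagate."""
    A_do = A.copy()
    A_do[:, index] = 0.0   # remove all parents of the intervened factor
    eps_do = eps.copy()
    eps_do[index] = value  # with no parents, z[index] equals its exogenous term
    return causal_layer(eps_do, A_do)

# Hypothetical 3-factor DAG: z0 -> z1 -> z2 with edge weights 2.0 and 0.5.
A = np.array([[0.0, 2.0, 0.0],
              [0.0, 0.0, 0.5],
              [0.0, 0.0, 0.0]])
eps = np.array([1.0, 0.0, 0.0])
z = causal_layer(eps, A)
z_do = do_intervention(eps, A, index=1, value=-4.0)
print(z, z_do)
```

For ε = (1, 0, 0) the layer gives z = (1, 2, 1). Clamping z1 to -4 cuts the edge from z0 and yields z = (1, -4, -2). The ancestor z0 is unchanged and only the descendant z2 responds, which is the behavior counterfactual generation via "do-operations" relies on.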