It is well-known that the approximate factor models have the rotation indeterminacy. It has been considered that the principal component (PC) estimators estimate some rotations of the true factors and factor loadings, but the rotation matrix commonly used in the literature depends on the PC estimator itself. This raises a question: what does the PC estimator consistently estimate? This paper aims to explore the answer. We first show that, assuming a quite general weak factor model with the $r$ signal eigenvalues diverging possibly at different rates, there always exists a unique rotation matrix composed only of the true factors and loadings, such that it rotates the true model to the identifiable model satisfying the standard $r^2$ restrictions. We call the rotated factors and loadings the pseudo-true parameters. We next establish the consistency and asymptotic normality of the PC estimator for this pseudo-true parameter. The results give an answer for the question: the PC estimator consistently estimates the pseudo-true parameter. We also investigate similar problems in the factor augmented regression. Finite sample experiments confirm the excellent approximation of the theoretical results.
Many functions characterising physical systems are additively separable. This is the case, for instance, of mechanical Hamiltonian functions in physics, population growth equations in biology, and consumer preference and utility functions in economics. We consider the scenario in which a surrogate of a function is to be tested for additive separability. The detection that the surrogate is additively separable can be leveraged to improve further learning. Hence, it is beneficial to have the ability to test for such separability in surrogates. The mathematical approach is to test if the mixed partial derivative of the surrogate is zero; or empirically, lower than a threshold. We present and comparatively and empirically evaluate the eight methods to compute the mixed partial derivative of a surrogate function.
Mediation analysis is an important statistical tool in many research fields. Its aim is to investigate the mechanism along the causal pathway between an exposure and an outcome. The joint significance test is widely utilized as a prominent statistical approach for examining mediation effects in practical applications. Nevertheless, the limitation of this mediation testing method stems from its conservative Type I error, which reduces its statistical power and imposes certain constraints on its popularity and utility. The proposed solution to address this gap is the adaptive joint significance test for one mediator, a novel data-adaptive test for mediation effect that exhibits significant advancements compared to traditional joint significance test. The proposed method is designed to be user-friendly, eliminating the need for complicated procedures. We have derived explicit expressions for size and power, ensuring the theoretical validity of our approach. Furthermore, we extend the proposed adaptive joint significance tests for small-scale mediation hypotheses with family-wise error rate (FWER) control. Additionally, a novel adaptive Sobel-type approach is proposed for the estimation of confidence intervals for the mediation effects, demonstrating significant advancements over conventional Sobel's confidence intervals in terms of achieving desirable coverage probabilities. Our mediation testing and confidence intervals procedure is evaluated through comprehensive simulations, and compared with numerous existing approaches. Finally, we illustrate the usefulness of our method by analysing three real-world datasets with continuous, binary and time-to-event outcomes, respectively.
Cognitive diagnosis models have been popularly used in fields such as education, psychology, and social sciences. While parametric likelihood estimation is a prevailing method for fitting cognitive diagnosis models, nonparametric methodologies are attracting increasing attention due to their ease of implementation and robustness, particularly when sample sizes are relatively small. However, existing clustering consistency results of the nonparametric estimation methods often rely on certain restrictive conditions, which may not be easily satisfied in practice. In this article, the clustering consistency of the general nonparametric classification method is reestablished under weaker and more practical conditions.
Local fields, and fields complete with respect to a discrete valuation, are essential objects in commutative algebra, with applications to number theory and algebraic geometry. We formalize in Lean the basic theory of discretely valued fields. In particular, we prove that the unit ball with respect to a discrete valuation on a field is a discrete valuation ring and, conversely, that the adic valuation on the field of fractions of a discrete valuation ring is discrete. We define finite extensions of valuations and of discrete valuation rings, and prove some global-to-local results. Building on this general theory, we formalize the abstract definition and some fundamental properties of local fields. As an application, we show that finite extensions of the field $\mathbb{Q}_p$ of $p$-adic numbers and of the field $\mathbb{F}_p(\!(X)\!)$ of Laurent series over $\mathbb{F}_p$ are local fields.
Quantum low-density parity-check (QLDPC) codes have emerged as a promising technique for quantum error correction. A variety of decoders have been proposed for QLDPC codes and many of them utilize belief propagation (BP) decoding in some fashion. However, the use of BP decoding for degenerate QLDPC codes is known to face issues with convergence. These issues are commonly attributed to short cycles in the Tanner graph and multiple syndrome-matching error patterns due to code degeneracy. Although various methods have been proposed to mitigate the non-convergence issue, such as BP with ordered statistics decoding (BP-OSD) and BP with stabilizer inactivation (BP-SI), achieving better performance with lower complexity remains an active area of research. In this work, we propose to decode QLDPC codes with BP guided decimation (BPGD), which has been previously studied for constraint satisfaction and lossy compression problems. The decimation process is applicable to both binary BP and quaternary BP and involves sequentially freezing the value of the most reliable qubits to encourage BP convergence. Despite its simplicity, we find that BPGD significantly reduces BP failures due to non-convergence while maintaining a low probability of error given convergence, achieving performance on par with BP-OSD and BP-SI. To better understand how and why BPGD improves performance, we discuss several interpretations of BPGD and their connection to BP syndrome decoding.
Many applications, such as optimization, uncertainty quantification and inverse problems, require repeatedly performing simulations of large-dimensional physical systems for different choices of parameters. This can be prohibitively expensive. In order to save computational cost, one can construct surrogate models by expressing the system in a low-dimensional basis, obtained from training data. This is referred to as model reduction. Past investigations have shown that, when performing model reduction of Hamiltonian systems, it is crucial to preserve the symplectic structure associated with the system in order to ensure long-term numerical stability. Up to this point structure-preserving reductions have largely been limited to linear transformations. We propose a new neural network architecture in the spirit of autoencoders, which are established tools for dimension reduction and feature extraction in data science, to obtain more general mappings. In order to train the network, a non-standard gradient descent approach is applied that leverages the differential-geometric structure emerging from the network design. The new architecture is shown to significantly outperform existing designs in accuracy.
Distinguishing the cause and effect from bivariate observational data is the foundational problem that finds applications in many scientific disciplines. One solution to this problem is assuming that cause and effect are generated from a structural causal model, enabling identification of the causal direction after estimating the model in each direction. The heteroscedastic noise model is a type of structural causal model where the cause can contribute to both the mean and variance of the noise. Current methods for estimating heteroscedastic noise models choose the Gaussian likelihood as the optimization objective which can be suboptimal and unstable when the data has a non-Gaussian distribution. To address this limitation, we propose a novel approach to estimating this model with Student's $t$-distribution, which is known for its robustness in accounting for sampling variability with smaller sample sizes and extreme values without significantly altering the overall distribution shape. This adaptability is beneficial for capturing the parameters of the noise distribution in heteroscedastic noise models. Our empirical evaluations demonstrate that our estimators are more robust and achieve better overall performance across synthetic and real benchmarks.
As artificial intelligence (AI) models continue to scale up, they are becoming more capable and integrated into various forms of decision-making systems. For models involved in moral decision-making, also known as artificial moral agents (AMA), interpretability provides a way to trust and understand the agent's internal reasoning mechanisms for effective use and error correction. In this paper, we provide an overview of this rapidly-evolving sub-field of AI interpretability, introduce the concept of the Minimum Level of Interpretability (MLI) and recommend an MLI for various types of agents, to aid their safe deployment in real-world settings.
Graph neural networks (GNNs) have been demonstrated to be a powerful algorithmic model in broad application fields for their effectiveness in learning over graphs. To scale GNN training up for large-scale and ever-growing graphs, the most promising solution is distributed training which distributes the workload of training across multiple computing nodes. However, the workflows, computational patterns, communication patterns, and optimization techniques of distributed GNN training remain preliminarily understood. In this paper, we provide a comprehensive survey of distributed GNN training by investigating various optimization techniques used in distributed GNN training. First, distributed GNN training is classified into several categories according to their workflows. In addition, their computational patterns and communication patterns, as well as the optimization techniques proposed by recent work are introduced. Second, the software frameworks and hardware platforms of distributed GNN training are also introduced for a deeper understanding. Third, distributed GNN training is compared with distributed training of deep neural networks, emphasizing the uniqueness of distributed GNN training. Finally, interesting issues and opportunities in this field are discussed.
Spectral clustering (SC) is a popular clustering technique to find strongly connected communities on a graph. SC can be used in Graph Neural Networks (GNNs) to implement pooling operations that aggregate nodes belonging to the same cluster. However, the eigendecomposition of the Laplacian is expensive and, since clustering results are graph-specific, pooling methods based on SC must perform a new optimization for each new sample. In this paper, we propose a graph clustering approach that addresses these limitations of SC. We formulate a continuous relaxation of the normalized minCUT problem and train a GNN to compute cluster assignments that minimize this objective. Our GNN-based implementation is differentiable, does not require to compute the spectral decomposition, and learns a clustering function that can be quickly evaluated on out-of-sample graphs. From the proposed clustering method, we design a graph pooling operator that overcomes some important limitations of state-of-the-art graph pooling techniques and achieves the best performance in several supervised and unsupervised tasks.