We study a principal component analysis problem under the spiked Wishart model in which the structure in the signal is captured by a class of union-of-subspace models. This general class includes vanilla sparse PCA as well as its variants with graph sparsity. With the goal of studying these problems under a unified statistical and computational lens, we establish fundamental limits that depend on the geometry of the problem instance, and show that a natural projected power method exhibits local convergence to the statistically near-optimal neighborhood of the solution. We complement these results with end-to-end analyses of two important special cases given by path and tree sparsity in a general basis, showing initialization methods and matching evidence of computational hardness. Overall, our results indicate that several of the phenomena observed for vanilla sparse PCA extend in a natural fashion to its structured counterparts.
A general a posteriori error analysis applies to five lowest-order finite element methods for two fourth-order semi-linear problems with trilinear non-linearity and a general source. A quasi-optimal smoother extends the source term to the discrete trial space, and more importantly, modifies the trilinear term in the stream-function vorticity formulation of the incompressible 2D Navier-Stokes and the von K\'{a}rm\'{a}n equations. This enables the first efficient and reliable a posteriori error estimates for the 2D Navier-Stokes equations in the stream-function vorticity formulation for Morley, two discontinuous Galerkin, $C^0$ interior penalty, and WOPSIP discretizations with piecewise quadratic polynomials.
Mediation analysis is widely used in health science research to evaluate the extent to which an intermediate variable explains an observed exposure-outcome relationship. However, the validity of analysis can be compromised when the exposure is measured with error. Motivated by the Health Professionals Follow-up Study (HPFS), we investigate the impact of exposure measurement error on assessing mediation with a survival outcome, based on the Cox proportional hazards outcome model. When the outcome is rare and there is no exposure-mediator interaction, we show that the uncorrected estimators of the natural indirect and direct effects can be biased into either direction, but the uncorrected estimator of the mediation proportion is approximately unbiased as long as the measurement error is not large or the mediator-exposure association is not strong. We develop ordinary regression calibration and risk set regression calibration approaches to correct the exposure measurement error-induced bias when estimating mediation effects and allowing for an exposure-mediator interaction in the Cox outcome model. The proposed approaches require a validation study to characterize the measurement error process. We apply the proposed approaches to the HPFS (1986-2016) to evaluate extent to which reduced body mass index mediates the protective effect of vigorous physical activity on the risk of cardiovascular diseases, and compare the finite-sample properties of the proposed estimators via simulations.
The notion that algorithmic systems should be "transparent" and "explainable" is common in the many statements of consensus principles developed by governments, companies, and advocacy organizations. But what exactly do policy and legal actors want from these technical concepts, and how do their desiderata compare with the explainability techniques developed in the machine learning literature? In hopes of better connecting the policy and technical communities, we provide case studies illustrating five ways in which algorithmic transparency and explainability have been used in policy settings: specific requirements for explanations; in nonbinding guidelines for internal governance of algorithms; in regulations applicable to highly regulated settings; in guidelines meant to increase the utility of legal liability for algorithms; and broad requirements for model and data transparency. The case studies span a spectrum from precise requirements for specific types of explanations to nonspecific requirements focused on broader notions of transparency, illustrating the diverse needs, constraints, and capacities of various policy actors and contexts. Drawing on these case studies, we discuss promising ways in which transparency and explanation could be used in policy, as well as common factors limiting policymakers' use of algorithmic explainability. We conclude with recommendations for researchers and policymakers.
Transition amplitudes and transition probabilities are relevant to many areas of physics simulation, including the calculation of response properties and correlation functions. These quantities can also be related to solving linear systems of equations. Here we present three related algorithms for calculating transition probabilities. First, we extend a previously published short-depth algorithm, allowing for the two input states to be non-orthogonal. Building on this first procedure, we then derive a higher-depth algorithm based on Trotterization and Richardson extrapolation that requires fewer circuit evaluations. Third, we introduce a tunable algorithm that allows for trading off circuit depth and measurement complexity, yielding an algorithm that can be tailored to specific hardware characteristics. Finally, we implement proof-of-principle numerics for models in physics and chemistry and for a subroutine in variational quantum linear solving (VQLS). The primary benefits of our approaches are that (a) arbitrary non-orthogonal states may now be used with small increases in quantum resources, (b) we (like another recently proposed method) entirely avoid subroutines such as the Hadamard test that may require three-qubit gates to be decomposed, and (c) in some cases fewer quantum circuit evaluations are required as compared to the previous state-of-the-art in NISQ algorithms for transition probabilities.
Iterative refinement (IR) is a popular scheme for solving a linear system of equations based on gradually improving the accuracy of an initial approximation. Originally developed to improve upon the accuracy of Gaussian elimination, interest in IR has been revived because of its suitability for execution on fast low-precision hardware such as analog devices and graphics processing units. IR generally converges when the error associated with the solution method is small, but is known to diverge when this error is large. We propose and analyze a novel enhancement to the IR algorithm by adding a line search optimization step that guarantees the algorithm will not diverge. Numerical experiments verify our theoretical results and illustrate the effectiveness of our proposed scheme.
Auditory spatial attention detection (ASAD) aims to decode the attended spatial location with EEG in a multiple-speaker setting. ASAD methods are inspired by the brain lateralization of cortical neural responses during the processing of auditory spatial attention, and show promising performance for the task of auditory attention decoding (AAD) with neural recordings. In the previous ASAD methods, the spatial distribution of EEG electrodes is not fully exploited, which may limit the performance of these methods. In the present work, by transforming the original EEG channels into a two-dimensional (2D) spatial topological map, the EEG data is transformed into a three-dimensional (3D) arrangement containing spatial-temporal information. And then a 3D deep convolutional neural network (DenseNet-3D) is used to extract temporal and spatial features of the neural representation for the attended locations. The results show that the proposed method achieves higher decoding accuracy than the state-of-the-art (SOTA) method (94.4% compared to XANet's 90.6%) with 1-second decision window for the widely used KULeuven (KUL) dataset, and the code to implement our work is available on Github: //github.com/xuxiran/ASAD_DenseNet
This study presents a comparative analysis of three predictive models with an increasing degree of flexibility: hidden dynamic geostatistical models (HDGM), generalised additive mixed models (GAMM), and the random forest spatiotemporal kriging models (RFSTK). These models are evaluated for their effectiveness in predicting PM$_{2.5}$ concentrations in Lombardy (North Italy) from 2016 to 2020. Despite differing methodologies, all models demonstrate proficient capture of spatiotemporal patterns within air pollution data with similar out-of-sample performance. Furthermore, the study delves into station-specific analyses, revealing variable model performance contingent on localised conditions. Model interpretation, facilitated by parametric coefficient analysis and partial dependence plots, unveils consistent associations between predictor variables and PM$_{2.5}$ concentrations. Despite nuanced variations in modelling spatiotemporal correlations, all models effectively accounted for the underlying dependence. In summary, this study underscores the efficacy of conventional techniques in modelling correlated spatiotemporal data, concurrently highlighting the complementary potential of Machine Learning and classical statistical approaches.
This paper explores the generalization characteristics of iterative learning algorithms with bounded updates for non-convex loss functions, employing information-theoretic techniques. Our key contribution is a novel bound for the generalization error of these algorithms with bounded updates, extending beyond the scope of previous works that only focused on Stochastic Gradient Descent (SGD). Our approach introduces two main novelties: 1) we reformulate the mutual information as the uncertainty of updates, providing a new perspective, and 2) instead of using the chaining rule of mutual information, we employ a variance decomposition technique to decompose information across iterations, allowing for a simpler surrogate process. We analyze our generalization bound under various settings and demonstrate improved bounds when the model dimension increases at the same rate as the number of training data samples. To bridge the gap between theory and practice, we also examine the previously observed scaling behavior in large language models. Ultimately, our work takes a further step for developing practical generalization theories.
We propose reinforcement learning to control the dynamical self-assembly of the dodecagonal quasicrystal (DDQC) from patchy particles. The patchy particles have anisotropic interactions with other particles and form DDQC. However, their structures at steady states are significantly influenced by the kinetic pathways of their structural formation. We estimate the best policy of temperature control trained by the Q-learning method and demonstrate that we can generate DDQC with few defects using the estimated policy. The temperature schedule obtained by reinforcement learning can reproduce the desired structure more efficiently than the conventional pre-fixed temperature schedule, such as annealing. To clarify the success of the learning, we also analyse a simple model describing the kinetics of structural changes through the motion in a triple-well potential. We have found that reinforcement learning autonomously discovers the critical temperature at which structural fluctuations enhance the chance of forming a globally stable state. The estimated policy guides the system toward the critical temperature to assist the formation of DDQC.
Deep learning is usually described as an experiment-driven field under continuous criticizes of lacking theoretical foundations. This problem has been partially fixed by a large volume of literature which has so far not been well organized. This paper reviews and organizes the recent advances in deep learning theory. The literature is categorized in six groups: (1) complexity and capacity-based approaches for analyzing the generalizability of deep learning; (2) stochastic differential equations and their dynamic systems for modelling stochastic gradient descent and its variants, which characterize the optimization and generalization of deep learning, partially inspired by Bayesian inference; (3) the geometrical structures of the loss landscape that drives the trajectories of the dynamic systems; (4) the roles of over-parameterization of deep neural networks from both positive and negative perspectives; (5) theoretical foundations of several special structures in network architectures; and (6) the increasingly intensive concerns in ethics and security and their relationships with generalizability.