Sliced inverse regression (SIR), which includes linear discriminant analysis (LDA) as a special case, is a popular and powerful dimension reduction tool. In this article, we extend SIR to address the challenges of decentralized data, prioritizing privacy and communication efficiency. Our approach, named federated sliced inverse regression (FSIR), facilitates collaborative estimation of the sufficient dimension reduction subspace among multiple clients, sharing only local estimates to protect sensitive datasets from exposure. To guard against potential adversarial attacks, FSIR further employs diverse perturbation strategies, including a novel vectorized Gaussian mechanism that guarantees differential privacy at a low cost in statistical accuracy. Additionally, FSIR naturally incorporates a collaborative variable screening step, enabling effective handling of high-dimensional client data. Theoretical properties of FSIR are established for both low-dimensional and high-dimensional settings, supported by extensive numerical experiments and real data analysis.
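As a rough, hypothetical illustration of the two ingredients named above, the sketch below computes a client-side SIR kernel matrix and perturbs it with a textbook (matrix-valued) Gaussian mechanism; the slicing scheme, the sensitivity bound, and the noise calibration are illustrative assumptions, not the paper's vectorized mechanism or aggregation protocol.

```python
import numpy as np

def local_sir_matrix(X, y, n_slices=10):
    """Client-side SIR kernel: covariance of slice means of X, slicing on y.

    Assumes X has been standardized (zero mean); the slicing scheme and
    standardization used by FSIR may differ.
    """
    order = np.argsort(y)
    slice_means = np.array([
        X[idx].mean(axis=0) for idx in np.array_split(order, n_slices)
    ])
    return slice_means.T @ slice_means / n_slices  # p x p SIR kernel

def gaussian_mechanism(M, sensitivity, epsilon, delta, rng):
    """Textbook (eps, delta)-DP Gaussian mechanism applied entrywise.

    FSIR's vectorized Gaussian mechanism refines this idea; the noise
    scale below is the classical calibration.
    """
    sigma = sensitivity * np.sqrt(2 * np.log(1.25 / delta)) / epsilon
    return M + rng.normal(scale=sigma, size=M.shape)
```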
Sequential models, such as Recurrent Neural Networks and Neural Ordinary Differential Equations, have long suffered from slow training due to their inherent sequential nature. This bottleneck has persisted for many years, as sequential models were widely believed to be impossible to parallelize. We challenge this long-held belief with our parallel algorithm that accelerates GPU evaluation of sequential models by up to 3 orders of magnitude without compromising output accuracy. The algorithm requires no special structure in the sequential model's architecture, making it applicable to a wide range of architectures. Using our method, training sequential models can be more than 10 times faster than with the common sequential method, without any meaningful difference in the training results. Leveraging this accelerated training, we discovered the efficacy of the Gated Recurrent Unit in a long time series classification problem with 17k time samples. By overcoming the training bottleneck, our work serves as the first step toward unlocking the potential of non-linear sequential models for long sequence problems.
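The abstract does not spell out the algorithm, but the underlying idea of recasting sequential evaluation as a parallelizable fixed-point problem can be conveyed with a deliberately naive sketch: each sweep updates every time step at once, and the sweeps converge to the exact sequential trajectory (here after at most T sweeps; the paper's method is assumed to converge far faster than this Jacobi-style iteration).

```python
import numpy as np

def f(h_prev, x):
    # toy recurrent cell; any vectorizable nonlinearity works here
    return np.tanh(0.5 * h_prev + x)

def parallel_fixed_point(x, h0, n_sweeps):
    """Evaluate the recurrence h_t = f(h_{t-1}, x_t) by fixed-point sweeps.

    Each sweep updates every time step simultaneously (parallel over t
    on a GPU), and the fixed point of the sweep is the exact sequential
    trajectory: plain Jacobi sweeps like these need at most T of them.
    """
    T = len(x)
    h = np.zeros(T)                       # initial guess for the whole trajectory
    for _ in range(n_sweeps):
        h_prev = np.concatenate(([h0], h[:-1]))
        h = f(h_prev, x)                  # all time steps updated at once
    return h
```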
Post's Correspondence Problem (the PCP) is a classical decision problem in theoretical computer science that asks, given a pair of free monoid morphisms $g, h\colon\Sigma^*\to\Delta^*$, whether there exists a non-trivial $x\in\Sigma^*$ such that $g(x)=h(x)$. Post's Correspondence Problem for a group $\Gamma$ takes pairs of group homomorphisms $g, h\colon F(\Sigma)\to \Gamma$ instead, and similarly asks whether there exists an $x$ such that $g(x)=h(x)$ holds for non-elementary reasons. The restrictions imposed on $x$ in order to obtain non-elementary solutions lead to several interpretations of the problem; we consider the natural restriction that $x \notin \ker(g) \cap \ker(h)$ and prove that the resulting interpretation of the PCP is undecidable for arbitrary hyperbolic $\Gamma$, but decidable when $\Gamma$ is virtually nilpotent. We also study this problem under group constructions such as subgroups, direct products and finite extensions. When one of the maps is injective, this problem is equivalent to an interpretation due to Myasnikov, Nikolaev and Ushakov.
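Stated schematically in the interpretation adopted here (the label $\mathrm{PCP}(\Gamma)$ is ours, for exposition only):
\[
\mathrm{PCP}(\Gamma)\colon\ \text{given homomorphisms } g, h\colon F(\Sigma)\to\Gamma,\ \text{decide whether there exists } x\in F(\Sigma) \text{ with } g(x)=h(x) \text{ and } x\notin\ker(g)\cap\ker(h).
\]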
We consider the general class of time-homogeneous stochastic dynamical systems, both discrete and continuous, and study the problem of learning a representation of the state that faithfully captures its dynamics. This is instrumental in learning the transfer operator of the system, which in turn can be used for numerous tasks, such as forecasting and interpreting the system dynamics. We show that the search for a good representation can be cast as an optimization problem over neural networks. Our approach is supported by recent results in statistical learning theory, highlighting the role of approximation error and metric distortion in the context of transfer operator regression. The objective function we propose is associated with projection operators from the representation space to the data space, overcomes metric distortion, and can be empirically estimated from data. In the discrete-time setting, we further derive a relaxed objective function that is differentiable and numerically well-conditioned. We compare our method against state-of-the-art approaches on different datasets, showing better performance across the board.
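For context, once the representation is held fixed, transfer operator regression reduces to a least-squares problem over snapshot pairs (an EDMD-style estimate). The sketch below assumes a given feature map and does not reproduce the paper's learned representation or projection-based objective.

```python
import numpy as np

def transfer_operator_regression(phi_x, phi_y):
    """Least-squares estimate of the transfer operator in a fixed
    representation space: find K minimizing ||phi(x_{t+1}) - K phi(x_t)||^2
    over snapshot pairs.

    phi_x, phi_y: (n_samples, d) feature arrays at times t and t+1.
    The paper learns the feature map itself with a neural network;
    here it is assumed given.
    """
    B, *_ = np.linalg.lstsq(phi_x, phi_y, rcond=None)
    return B.T  # acts on feature vectors: phi(x_{t+1}) ~ K @ phi(x_t)
```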
This paper concerns the limiting distributions of change point estimators in a high-dimensional linear regression time series context, where a regression pair $(y_t, X_t) \in \mathbb{R} \times \mathbb{R}^p$ is observed at every time point $t \in \{1, \ldots, n\}$. At unknown time points, called change points, the regression coefficients change, with the jump sizes measured in $\ell_2$-norm. We provide limiting distributions of the change point estimators in the regimes where the minimal jump size vanishes and where it remains constant. We allow both the covariate and noise sequences to be temporally dependent, within the functional dependence framework, which, to the best of our knowledge, is new to the change point inference literature. We show that a block-type long-run variance estimator is consistent under functional dependence, which facilitates the practical implementation of our derived limiting distributions. We also present a few important byproducts of our analysis, which are of independent interest. These include a novel variant of the dynamic programming algorithm that boosts computational efficiency, consistent change point localisation rates under temporal dependence, and a new Bernstein inequality for data possessing functional dependence. Extensive numerical results are provided to support our theoretical results. The proposed methods are implemented in the R package \texttt{changepoints} \citep{changepoints_R}.
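One of the byproducts mentioned above, the block-type long-run variance estimator, has a standard batch-means form; a minimal sketch, assuming a centered residual sequence and an externally chosen block size (the paper's tuning and regression context are not reproduced):

```python
import numpy as np

def block_long_run_variance(eps, block_size):
    """Block-type (batch-means) long-run variance estimator.

    For a centered, temporally dependent sequence {eps_t}, average the
    squared block sums, normalized by the block size. This is the
    standard construction; only a sketch of the estimator analyzed in
    the paper.
    """
    n_blocks = len(eps) // block_size
    trimmed = eps[: n_blocks * block_size]
    block_sums = trimmed.reshape(n_blocks, block_size).sum(axis=1)
    return np.mean(block_sums ** 2) / block_size
```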
What is the best paradigm to recognize objects -- discriminative inference (fast but potentially prone to shortcut learning) or generative modeling (slow but potentially more robust)? We build on recent advances in generative modeling that turn text-to-image models into classifiers. This allows us to study their behavior and to compare them against discriminative models and human psychophysical data. We report four intriguing emergent properties of generative classifiers: they show a record-breaking human-like shape bias (99% for Imagen), near human-level out-of-distribution accuracy, state-of-the-art alignment with human classification errors, and they understand certain perceptual illusions. Our results indicate that while the current dominant paradigm for modeling human object recognition is discriminative inference, zero-shot generative models approximate human object recognition data surprisingly well.
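A schematic of how a text-to-image model becomes a classifier: score each candidate label by how well the conditional generative model explains the image (e.g., a diffusion denoising error under a class-conditional prompt) and return the best-explaining label. The function `cond_loss` is a hypothetical stand-in; the exact scoring rule for each model in the paper may differ.

```python
import numpy as np

def generative_classify(image, labels, cond_loss, n_samples=32, rng=None):
    """Zero-shot classification with a conditional generative model.

    Scores each candidate label by a Monte Carlo estimate of the model's
    conditional reconstruction loss (e.g., a diffusion denoising error
    under a class-conditional text prompt) and returns the label that
    best explains the image. `cond_loss` stands in for the
    model-specific loss.
    """
    rng = rng or np.random.default_rng(0)
    scores = [
        np.mean([cond_loss(image, label, rng) for _ in range(n_samples)])
        for label in labels
    ]
    return labels[int(np.argmin(scores))]
```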
We propose a linear algebraic method, rooted in the spectral properties of graphs, that can be used to prove lower bounds in communication complexity. Our proof technique effectively marries spectral bounds with information-theoretic inequalities. The key insight is that, in specific settings, even when data sets $X$ and $Y$ are closely correlated and have high mutual information, the owner of $X$ cannot convey a reasonably short message that maintains substantial mutual information with $Y$. In essence, from the perspective of the owner of $Y$, any sufficiently brief message $m=m(X)$ would appear nearly indistinguishable from a random bit sequence. We apply this argument to several problems in communication complexity. Our main result concerns cryptographic protocols. We establish a lower bound on the communication complexity of multi-party secret key agreement with unconditional, i.e., information-theoretic, security. Specifically, for one-round protocols (the simultaneous messages model) of secret key agreement with three participants, we obtain an asymptotically tight lower bound. This bound implies the optimality of the previously known omniscience communication protocol (this result applies to non-interactive secret key agreement with three parties and input data sets with an arbitrary symmetric information profile). We consider communication problems in one-shot scenarios, in which the parties' inputs are not produced by i.i.d. sources and there are no ergodicity assumptions on the input data. In this setting, we find it natural to present our results using the framework of Kolmogorov complexity.
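Schematically, the key insight can be paraphrased as follows (a loose rendering of the claim above, not the paper's precise theorem; $n$ denotes the input length and $I(\cdot:\cdot)$ mutual information):
\[
\text{even when } I(X:Y) \text{ is large, every message } m=m(X) \text{ with } |m|\ll n \text{ satisfies } I(m:Y)=o(|m|).
\]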
Convergence of the first-order Euler scheme, and of an approximative variant thereof, is established, along with convergence rates, for rough differential equations driven by c\`adl\`ag paths satisfying a suitable criterion, namely the so-called Property (RIE), along time discretizations with vanishing mesh size. This property is then verified for almost all sample paths of Brownian motion, It\^o processes, L\'evy processes and general c\`adl\`ag semimartingales, as well as for the driving signals of both mixed and rough stochastic differential equations, relative to various time discretizations. Consequently, we obtain pathwise convergence in $p$-variation of the Euler--Maruyama scheme for stochastic differential equations driven by these processes.
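For reference, the classical first-order scheme whose pathwise convergence is asserted above, in its simplest scalar It\^o form on a uniform grid (the paper treats general c\`adl\`ag rough drivers and more general discretizations):

```python
import numpy as np

def euler_maruyama(mu, sigma, x0, T, n_steps, rng):
    """First-order Euler--Maruyama scheme for dX_t = mu(X_t) dt + sigma(X_t) dW_t.

    A minimal scalar sketch along a uniform time discretization.
    """
    dt = T / n_steps
    x = np.empty(n_steps + 1)
    x[0] = x0
    dW = rng.normal(scale=np.sqrt(dt), size=n_steps)
    for k in range(n_steps):
        x[k + 1] = x[k] + mu(x[k]) * dt + sigma(x[k]) * dW[k]
    return x
```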
We investigate the frequentist guarantees of the variational sparse Gaussian process regression model. In the theoretical analysis, we focus on the variational approach with spectral features as inducing variables. We derive guarantees and limitations for the frequentist coverage of the resulting variational credible sets. We also derive sufficient and necessary conditions on the number of inducing variables required to achieve minimax posterior contraction rates. The implications of these results are demonstrated for different choices of priors. In a numerical analysis, we consider a wider range of inducing variable methods and observe similar phenomena beyond the scope of our theoretical findings.
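For orientation, the variational sparse GP posterior mean with $m$ inducing variables has the standard (Titsias-style) closed form sketched below, in its generic kernel version; for the spectral-feature inducing variables analyzed in the paper, the matrix `K_zz` is diagonal, but that structure is not exploited here.

```python
import numpy as np

def sparse_gp_posterior_mean(K_zz, K_zx, K_sz, y, noise_var):
    """Posterior mean of the variational sparse GP.

    K_zz: (m, m) kernel among inducing inputs; K_zx: (m, n) inducing vs.
    training inputs; K_sz: (n_*, m) test vs. inducing inputs; y: (n,)
    training targets. Generic form, for illustration only.
    """
    A = K_zz + K_zx @ K_zx.T / noise_var        # m x m system matrix
    return K_sz @ np.linalg.solve(A, K_zx @ y) / noise_var
```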
We present ReCAT, a recursive composition augmented Transformer that is able to explicitly model hierarchical syntactic structures of raw texts without relying on gold trees during either learning or inference. Existing research along this line restricts models to a strict hierarchical tree structure and thus lacks inter-span communication. To overcome this problem, we propose a novel contextual inside-outside (CIO) layer that learns contextualized representations of spans through bottom-up and top-down passes, where a bottom-up pass forms representations of high-level spans by composing low-level spans, while a top-down pass combines information inside and outside a span. By stacking several CIO layers between the embedding layer and the attention layers of a Transformer, the ReCAT model can perform both deep intra-span and deep inter-span interactions, and thus generate multi-grained representations fully contextualized with other spans. Moreover, the CIO layers can be jointly pre-trained with Transformers, allowing ReCAT to enjoy scalability, strong performance, and interpretability at the same time. We conduct experiments on various sentence-level and span-level tasks. Evaluation results indicate that ReCAT significantly outperforms vanilla Transformer models on all span-level tasks, as well as baselines that combine recursive networks with Transformers on natural language inference tasks. More interestingly, the hierarchical structures induced by ReCAT exhibit strong consistency with human-annotated syntactic trees, indicating good interpretability brought by the CIO layers.
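To fix ideas, the bottom-up half of a CIO layer composes low-level spans into high-level ones over all split points; a deliberately simplified sketch, in which uniform averaging over splits stands in for ReCAT's learned composition and scoring, and the top-down (outside) pass is omitted:

```python
import numpy as np

def inside_pass(leaf, compose):
    """Bottom-up (inside) pass over all spans of a sentence.

    leaf: (n, d) array of token embeddings; compose: maps two (d,) span
    vectors to one (d,) vector. Averaging over split points is a
    placeholder for ReCAT's learned scoring.
    """
    n = leaf.shape[0]
    span = {(i, i): leaf[i] for i in range(n)}
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            # candidate compositions over all split points of span (i, j)
            cands = [compose(span[(i, k)], span[(k + 1, j)]) for k in range(i, j)]
            span[(i, j)] = np.mean(cands, axis=0)
    return span
```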
Run-times of quantum algorithms are often studied via an asymptotic, worst-case analysis. Whilst useful, such an analysis can often fall short: it is not uncommon for algorithms with a large worst-case run-time to end up performing well on instances of practical interest. To remedy this, it is necessary to resort to run-time analyses of a more empirical nature, which for sufficiently small input sizes can be performed on a quantum device or a simulation thereof. For larger input sizes, alternative approaches are required. In this paper we consider an approach that combines classical emulation with detailed complexity bounds that include all constants. We simulate quantum algorithms by running classical versions of the sub-routines, whilst simultaneously collecting information about what the run-time of the quantum routine would have been had it been run instead. To do this accurately and efficiently for very large input sizes, we describe an estimation procedure and prove that it obtains upper bounds on the true expected complexity of the quantum algorithms. We apply our method to some simple quantum speedups of classical heuristic algorithms for solving the well-studied MAX-$k$-SAT optimization problem. This requires rigorous bounds (including all constants) on the expected- and worst-case complexities of two important quantum sub-routines: Grover search with an unknown number of marked items, and quantum maximum-finding. These improve upon existing results and might be of broader interest. Amongst other results, we found that the classical heuristic algorithms we studied did not offer significant quantum speedups despite the existence of a theoretical per-step speedup. This suggests that an empirical analysis such as the one we implement in this paper already yields insights beyond those that can be seen by an asymptotic analysis alone.
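The flavor of the emulation technique: run the classical sub-routine, but charge a rigorous bound on what the quantum routine would have cost. A toy sketch for Grover search with an unknown number of marked items, using the classical $\tfrac{9}{4}\sqrt{N/t}$ expected-iteration bound of Boyer et al. rather than the sharper constants derived in the paper:

```python
import math
import random

def emulated_grover_search(marked, N, cost_tracker, rng=random):
    """Classically emulate Grover search with an unknown number of marked
    items, while recording what the quantum run-time would have been.

    Returns a uniformly random marked item (the quantum outcome) and
    charges the tracker an upper bound on the expected number of Grover
    iterations.
    """
    t = len(marked)
    if t == 0:
        raise ValueError("no marked items")
    cost_tracker["grover_iterations"] += 2.25 * math.sqrt(N / t)
    return rng.choice(marked)
```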