Given its status as a classic problem and its importance to both theoreticians and practitioners, edit distance provides an excellent lens through which to understand how the theoretical analysis of algorithms impacts practical implementations. From an applied perspective, the goals of theoretical analysis are to predict the empirical performance of an algorithm and to serve as a yardstick for designing novel algorithms that perform well in practice. In this paper, we systematically survey the types of theoretical analysis techniques that have been applied to edit distance and evaluate the extent to which each one has achieved these two goals. These techniques include traditional worst-case analysis, worst-case analysis parametrized by edit distance, entropy, or compressibility, average-case analysis, semi-random models, and advice-based models. We find that the track record is mixed. On one hand, two algorithms widely used in practice were born out of theoretical analysis, and their empirical performance is captured well by theoretical predictions. On the other hand, none of the algorithms developed since then with theoretical analysis as a yardstick has had practical relevance. We conclude by discussing the remaining open problems and how they can be tackled.
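As a concrete reference point for the discussion above, here is a minimal sketch of the classic Wagner-Fischer dynamic program that traditional worst-case analysis targets; the interface and example are illustrative, not taken from the paper.

```python
def edit_distance(a: str, b: str) -> int:
    """Classic Wagner-Fischer dynamic program: O(len(a) * len(b)) time, O(len(b)) space."""
    prev = list(range(len(b) + 1))          # distances between a[:0] and every prefix of b
    for i, ca in enumerate(a, 1):
        curr = [i]                          # distance between a[:i] and the empty prefix of b
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution (free on match)
        prev = curr
    return prev[-1]

assert edit_distance("kitten", "sitting") == 3
```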
In this article, we propose a new class of consistent tests for $p$-variate normality. These tests are based on the characterization of the standard multivariate normal distribution that the Hessian of its cumulant generating function is the $p\times p$ identity matrix, combined with the idea of decomposing the information in the joint distribution into the dependence copula and the marginal distributions. Under the null hypothesis of multivariate normality, our proposed test statistic is independent of the unknown mean vector and covariance matrix, so the distribution-free critical value of the test can be obtained by Monte Carlo simulation. We also derive the asymptotic null distribution of the proposed test statistic and establish the consistency of the test against different fixed alternatives. Last but not least, a comprehensive Monte Carlo study illustrates that our test is a powerful yet computationally convenient competitor to many well-known existing test statistics.
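To make the characterization concrete, the following display (a sketch in standard notation, not taken verbatim from the article) records the cumulant-generating-function property behind the test: for $Z\sim N_p(0, I_p)$,
\[
K_Z(t) \;=\; \log \mathbb{E}\, e^{t^{\top} Z} \;=\; \tfrac{1}{2}\, t^{\top} t,
\qquad
\nabla^2 K_Z(t) \;=\; I_p \quad \text{for all } t \in \mathbb{R}^p .
\]
Conversely, if the Hessian of a cumulant generating function is identically $I_p$, then $K(t) = \mu^{\top} t + \tfrac{1}{2}\, t^{\top} t$ for some $\mu$, which is the cumulant generating function of $N_p(\mu, I_p)$.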
In this short note, I derive the Bell-CHSH inequalities as an elementary result in the present-day theory of statistical causality based on graphical models or Bayes' nets, defined in terms of DAGs (Directed Acyclic Graphs) representing direct statistical causal influences between a number of observed and unobserved random variables. I show how spatio-temporal constraints in loophole-free Bell experiments, and natural classical statistical causality considerations, lead to Bell's notion of local hidden variables, and thence to the CHSH inequalities. The word "local" applies to the way that the chosen settings influence the observed outcomes. The case of contextual setting-dependent hidden variables (thought of as being located in the measurement devices and dependent on the measurement settings) is automatically covered, despite recent claims that Bell's conclusions can be circumvented in this way.
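For reference, the local-hidden-variable structure and the resulting CHSH inequality derived in the note can be written in their standard form (with $A, B$ the $\pm 1$-valued outcome functions, $a, a', b, b'$ the settings, and $\lambda$ the hidden variable with density $\rho$):
\[
E(a,b) \;=\; \int A(a,\lambda)\, B(b,\lambda)\, \rho(\lambda)\, d\lambda ,
\qquad
\bigl| E(a,b) + E(a,b') + E(a',b) - E(a',b') \bigr| \;\le\; 2 .
\]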
In computational biology, $k$-mers and edit distance are fundamental concepts. However, little is known about the metric space of all $k$-mers equipped with the edit distance. In this work, we explore the structure of the $k$-mer space by studying its maximal independent sets (MISs). An MIS is a sparse sketch of all $k$-mers with nice theoretical properties, and therefore admits critical applications in clustering, indexing, hashing, and sketching large-scale sequencing data, particularly data with high error rates. Finding an MIS is a challenging problem, as the size of a $k$-mer space grows exponentially with $k$. We propose three algorithms for this problem. The first and most intuitive one uses a greedy strategy. The second method implements two techniques to avoid redundant comparisons, taking advantage of the locality property of the $k$-mer space and of estimated bounds on the edit distance. The last algorithm avoids expensive edit-distance calculations by translating the edit distance into a shortest path in a specifically designed graph. We implement these algorithms and report and analyze the computed MISs of $k$-mer spaces and their statistical properties for $k$ up to 15. Source code is freely available at //github.com/Shao-Group/kmerspace .
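As an illustration of the first (greedy) strategy, here is a minimal sketch; it assumes two $k$-mers are adjacent when their edit distance is at most a threshold $d$, and it enumerates the whole space, so it is only feasible for small $k$. The interface is illustrative and not the repository's optimized implementation.

```python
from itertools import product

def edit_distance(a: str, b: str) -> int:
    """Standard dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]

def greedy_mis(k: int, d: int, alphabet: str = "ACGT") -> list[str]:
    """Greedily build a maximal independent set of the k-mer space, where two
    k-mers are adjacent when their edit distance is at most d (d is an
    illustrative parameter)."""
    chosen: list[str] = []
    for kmer in ("".join(p) for p in product(alphabet, repeat=k)):
        if all(edit_distance(kmer, c) > d for c in chosen):
            chosen.append(kmer)
    return chosen
```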
Vision Transformers (ViTs) with self-attention modules have recently achieved great empirical success in many vision tasks. However, due to non-convex interactions across layers, theoretical analysis of their learning and generalization remains largely elusive. Based on a data model characterizing both label-relevant and label-irrelevant tokens, this paper provides the first theoretical analysis of training a shallow ViT, i.e., one self-attention layer followed by a two-layer perceptron, for a classification task. We characterize the sample complexity required to achieve zero generalization error. Our sample complexity bound is positively correlated with the inverse of the fraction of label-relevant tokens, the token noise level, and the initial model error. We also prove that training with stochastic gradient descent (SGD) leads to a sparse attention map, which formally verifies the common intuition behind the success of attention. Moreover, this paper shows that proper token sparsification can improve the test performance by removing label-irrelevant and/or noisy tokens, including spurious correlations. Empirical experiments on synthetic data and the CIFAR-10 dataset support our theoretical results and show that they generalize to deeper ViTs.
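To fix ideas, here is a minimal NumPy sketch of the shallow architecture analyzed (one self-attention layer followed by a two-layer perceptron); the specific dimensions, the ReLU activation, and the averaged scalar readout are illustrative assumptions rather than the paper's exact parametrization.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def shallow_vit(X, Wq, Wk, Wv, W1, w2):
    """X: (L, d) matrix of L token embeddings (e.g., flattened image patches)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv                 # queries, keys, values
    A = softmax(Q @ K.T / np.sqrt(K.shape[1]))       # (L, L) attention map
    H = A @ V                                        # attended token features
    Z = np.maximum(H @ W1, 0.0)                      # two-layer perceptron, ReLU hidden layer
    return float(np.mean(Z @ w2))                    # scalar logit; its sign gives the class

L, d, m = 16, 32, 64                                 # tokens, embedding dim, hidden width (illustrative)
rng = np.random.default_rng(0)
logit = shallow_vit(rng.normal(size=(L, d)),
                    rng.normal(size=(d, d)), rng.normal(size=(d, d)), rng.normal(size=(d, d)),
                    rng.normal(size=(d, m)), rng.normal(size=(m,)))
```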
Two genomes over the same set of gene families form a canonical pair when each of them has exactly one gene from each family. Different distances of canonical genomes can be derived from a structure called the breakpoint graph, which represents the relation between the two given genomes as a collection of cycles of even length and paths. The breakpoint distance is then equal to $n - \left(c_2 + \frac{p_0}{2}\right)$, where $n$ is the number of genes, $c_2$ is the number of cycles of length 2, and $p_0$ is the number of paths of length 0. Similarly, when the considered rearrangements are those modeled by the double-cut-and-join (DCJ) operation, the rearrangement distance is $n - \left(c + \frac{p_e}{2}\right)$, where $c$ is the total number of cycles and $p_e$ is the total number of even paths. The distance formulation is a basic unit for several other combinatorial problems related to genome evolution and ancestral reconstruction, such as the median or the double distance. Interestingly, both the median and the double distance problems can be solved in polynomial time for the breakpoint distance, while they are NP-hard for the rearrangement distance. One way of exploring the complexity space between these two extremes is to consider the $\sigma_k$ distance, defined to be $n - \left[c_2 + c_4 + \dots + c_k + \frac{p_0 + p_2 + \dots + p_k}{2}\right]$, and to investigate the complexities of the median and double distance for the $\sigma_4$ distance, then the $\sigma_6$ distance, and so on. While for the median much effort in our and other research groups has yielded no progress even for the $\sigma_4$ distance, for the double distance under the $\sigma_4$ and $\sigma_6$ distances we could devise linear-time algorithms, which we present here.
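As a small illustration of how these formulas relate, here is a sketch that computes the breakpoint, DCJ, and $\sigma_k$ distances from the cycle- and path-length counts of a breakpoint graph; the count dictionaries are an illustrative input format, not the paper's data structures.

```python
def sigma_k_distance(n: int, cycle_counts: dict, path_counts: dict, k: int) -> float:
    """sigma_k distance: n - [c_2 + c_4 + ... + c_k + (p_0 + p_2 + ... + p_k)/2].
    cycle_counts[l] / path_counts[l] hold the number of cycles / paths of length l."""
    c = sum(cycle_counts.get(l, 0) for l in range(2, k + 1, 2))
    p = sum(path_counts.get(l, 0) for l in range(0, k + 1, 2))
    return n - (c + p / 2)

def breakpoint_distance(n: int, cycle_counts: dict, path_counts: dict) -> float:
    """Breakpoint distance n - (c_2 + p_0/2), i.e. sigma_k with k = 2."""
    return sigma_k_distance(n, cycle_counts, path_counts, 2)

def dcj_distance(n: int, cycle_counts: dict, path_counts: dict) -> float:
    """DCJ distance n - (c + p_e/2): all cycles and all even paths are counted."""
    c = sum(cycle_counts.values())
    p_e = sum(v for l, v in path_counts.items() if l % 2 == 0)
    return n - (c + p_e / 2)
```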
Quadratization of polynomial and nonpolynomial systems of ordinary differential equations is advantageous in a variety of disciplines, such as systems theory, fluid mechanics, chemical reaction modeling, and mathematical analysis. A quadratization reveals new variables and structures of a model, which may be easier to analyze, simulate, and control, and which provide a convenient parametrization for learning. This paper presents novel theory, algorithms, and software capabilities for the quadratization of non-autonomous ODEs. We provide existence results, depending on the regularity of the input function, for cases in which a quadratic-bilinear system can be obtained through quadratization. We further develop existence results and an algorithm that generalize quadratization to systems of arbitrary dimension that retain their nonlinear structure as the dimension grows; for such systems, we provide dimension-agnostic quadratization. An example is semi-discretized PDEs, where the nonlinear terms remain symbolically identical as the discretization size increases. As an important step toward practical adoption of this research, we extend the capabilities of the QBee software to both non-autonomous systems of ODEs and ODEs of arbitrary dimension. We present several examples of ODEs previously reported in the literature for which our new algorithms find quadratized ODE systems of lower dimension than the previously reported lifting transformations. We further highlight an important application of quadratization: reduced-order model learning. This area can benefit significantly from working in the optimal lifting variables, where quadratic models provide a direct parametrization of the model and also avoid additional hyperreduction for the nonlinear terms. A solar wind example highlights these advantages.
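As a simple illustration of the idea (a textbook example, not one from the paper): the scalar ODE $\dot{x} = x^3$ is not quadratic, but introducing the auxiliary variable $y = x^2$ yields the quadratic system
\[
\dot{x} = x\,y, \qquad \dot{y} = 2x\,\dot{x} = 2x^{2}\,y = 2y^{2},
\]
so the lifted system in the variables $(x, y)$ has only quadratic right-hand sides.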
The Wasserstein distance between mixing measures has come to occupy a central place in the statistical analysis of mixture models. This work proposes a new canonical interpretation of this distance and provides tools to perform inference on the Wasserstein distance between mixing measures in topic models. We consider the general setting of an identifiable mixture model consisting of mixtures of distributions from a set $\mathcal{A}$ equipped with an arbitrary metric $d$, and show that the Wasserstein distance between mixing measures is uniquely characterized as the most discriminative convex extension of the metric $d$ to the set of mixtures of elements of $\mathcal{A}$. The Wasserstein distance between mixing measures has been widely used in the study of such models, but without axiomatic justification; our results establish this metric as a canonical choice. Specializing our results to topic models, we consider estimation and inference of this distance. Though upper bounds for its estimation have recently been established elsewhere, we prove the first minimax lower bounds for the estimation of the Wasserstein distance in topic models. We also establish fully data-driven inferential tools for the Wasserstein distance in the topic model context. Our results apply to potentially sparse mixtures of high-dimensional discrete probability distributions, and allow us to obtain the first asymptotically valid confidence intervals for the Wasserstein distance in topic models.
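For concreteness, the object of study can be written in standard notation (a sketch; the paper's notation may differ): for mixing measures $\pi = \sum_i a_i \delta_{A_i}$ and $\pi' = \sum_j b_j \delta_{B_j}$ supported on $\mathcal{A}$,
\[
W_d(\pi, \pi') \;=\; \min_{\gamma \in \Gamma(a,b)} \; \sum_{i,j} \gamma_{ij}\, d(A_i, B_j),
\]
where $\Gamma(a,b)$ is the set of couplings whose marginals are the weight vectors $a$ and $b$; the canonical-extension result characterizes $W_d$ as the most discriminative convex extension of $d$ to mixtures of elements of $\mathcal{A}$.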
Blind source separation (BSS) aims to recover an unobserved signal $S$ from its mixture $X=f(S)$ under the condition that the mixing transformation $f$ is invertible but unknown. As this is a basic problem with many practical applications, a fundamental issue is to understand how solutions to this problem behave when the statistical prior assumptions supporting them are violated. In the classical context of linear mixtures, we present a general framework for analysing such violations and quantifying their impact on the blind recovery of $S$ from $X$. Modelling $S$ as a multidimensional stochastic process, we introduce an informative topology on the space of possible causes underlying a mixture $X$, and show that the behaviour of a generic BSS solution in response to general deviations from its defining structural assumptions can be profitably analysed in the form of explicit continuity guarantees with respect to this topology. This allows for a flexible and convenient quantification of general model uncertainty scenarios and amounts to the first comprehensive robustness framework for BSS. Our approach is entirely constructive, and we demonstrate its utility with novel theoretical guarantees for a number of statistical applications.
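As a concrete instance of the classical linear setting $X = AS$, here is a minimal sketch that recovers synthetic sources with FastICA from scikit-learn; this illustrates a generic BSS solution, not the paper's framework, and the signals and mixing matrix are made up for the example.

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
t = np.linspace(0, 8, 2000)

# Two independent, non-Gaussian source signals (columns of S), plus small noise.
S = np.column_stack([np.sign(np.sin(3 * t)), np.sin(5 * t) ** 3])
S += 0.05 * rng.normal(size=S.shape)

A = np.array([[1.0, 0.6],          # unknown invertible mixing matrix
              [0.4, 1.0]])
X = S @ A.T                        # observed mixtures, x = A s sample by sample

ica = FastICA(n_components=2, random_state=0)
S_hat = ica.fit_transform(X)       # recovered sources, up to permutation and scaling
```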
Since deep neural networks were developed, they have made huge contributions to everyday life. In almost every aspect of daily life, machine learning can provide more rational advice than humans are capable of. However, despite this achievement, the design and training of neural networks are still challenging and unpredictable procedures. To lower the technical threshold for non-expert users, automated hyper-parameter optimization (HPO) has become a popular topic in both academia and industry. This paper provides a review of the most essential topics in HPO. The first section introduces the key hyper-parameters related to model training and structure, and discusses their importance and methods for defining their value ranges. The review then focuses on major optimization algorithms and their applicability, covering their efficiency and accuracy, especially for deep learning networks. Next, it surveys major services and toolkits for HPO, comparing their support for state-of-the-art search algorithms, their compatibility with major deep learning frameworks, and their extensibility for new modules designed by users. The paper concludes with the problems that arise when HPO is applied to deep learning, a comparison between optimization algorithms, and prominent approaches for model evaluation with limited computational resources.
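As a minimal illustration of one baseline strategy covered by such reviews, here is a plain random-search sketch; the search space and the objective function are placeholders, not taken from any toolkit discussed.

```python
import random

# Illustrative hyper-parameter search space (placeholder ranges).
space = {
    "learning_rate": lambda: 10 ** random.uniform(-5, -1),      # log-uniform
    "batch_size":    lambda: random.choice([16, 32, 64, 128]),
    "num_layers":    lambda: random.randint(2, 8),
    "dropout":       lambda: random.uniform(0.0, 0.5),
}

def evaluate(config: dict) -> float:
    """Placeholder objective: in practice, train a model with `config` and
    return its validation score; here a synthetic function stands in."""
    return -(config["learning_rate"] - 1e-3) ** 2 - 0.01 * config["num_layers"]

def random_search(space: dict, evaluate, n_trials: int = 50, seed: int = 0):
    random.seed(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(n_trials):
        config = {name: sample() for name, sample in space.items()}
        score = evaluate(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score

best_config, best_score = random_search(space, evaluate)
```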
Deep learning models on graphs have achieved remarkable performance in various graph analysis tasks, e.g., node classification, link prediction, and graph clustering. However, they are vulnerable to carefully designed inputs, i.e., adversarial examples, which expose their uncertainty and unreliability. Accordingly, various studies on both attack and defense have emerged across different graph analysis tasks, leading to an arms race in graph adversarial learning. For instance, attackers use poisoning and evasion attacks, while defenders correspondingly rely on preprocessing-based and adversarial-based methods. Despite this growing body of work, a unified problem definition and a comprehensive review are still lacking. To bridge this gap, we systematically investigate and summarize the existing work on graph adversarial learning tasks. Specifically, we survey and unify the existing work on attack and defense in graph analysis tasks, and provide corresponding definitions and taxonomies. In addition, we emphasize the importance of related evaluation metrics and summarize them comprehensively. We hope this work can serve as a reference for researchers in this area and assist their studies. More details of our work are available at //github.com/gitgiter/Graph-Adversarial-Learning.
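To make the attack side of this taxonomy concrete, here is a deliberately simplified evasion-attack sketch against a toy neighborhood-voting classifier; real attacks in the surveyed literature instead optimize the loss of a trained graph neural network, so every function and parameter below is an illustrative stand-in.

```python
import numpy as np

def neighbor_agreement(adj: np.ndarray, labels: np.ndarray, target: int) -> float:
    """Fraction of the target node's neighbors sharing its label: a toy stand-in
    for the confidence of a neighborhood-voting classifier."""
    nbrs = np.flatnonzero(adj[target])
    return float(np.mean(labels[nbrs] == labels[target])) if nbrs.size else 0.0

def greedy_evasion_attack(adj: np.ndarray, labels: np.ndarray, target: int, budget: int) -> np.ndarray:
    """Flip up to `budget` edges incident to `target` (add or remove), greedily
    choosing the flip that most reduces the toy classifier's confidence.
    `adj` is a symmetric 0/1 integer adjacency matrix."""
    adj = adj.copy()
    for _ in range(budget):
        best_score, best_v = neighbor_agreement(adj, labels, target), None
        for v in range(adj.shape[0]):
            if v == target:
                continue
            adj[target, v] ^= 1; adj[v, target] ^= 1          # try flipping edge (target, v)
            score = neighbor_agreement(adj, labels, target)
            if score < best_score:
                best_score, best_v = score, v
            adj[target, v] ^= 1; adj[v, target] ^= 1          # undo
        if best_v is None:                                    # no flip reduces confidence further
            break
        adj[target, best_v] ^= 1; adj[best_v, target] ^= 1    # commit the best flip
    return adj
```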