A Krylov subspace recycling method for the efficient evaluation of a sequence of matrix functions acting on a set of vectors is developed. The method improves over the recycling methods presented in [Burke et al., arXiv:2209.14163, 2022] in that it uses a closed-form expression for the augmented FOM approximants and hence circumvents the use of numerical quadrature. We further extend our method to use randomized sketching in order to avoid the arithmetic cost of orthogonalizing a full Krylov basis, offering an attractive solution to the fact that recycling algorithms built from shifted augmented FOM cannot easily be restarted. The efficacy of the proposed algorithms is demonstrated with numerical experiments.
Class incremental learning (CIL) is a challenging setting of continual learning, which learns a series of tasks sequentially. Each task consists of a set of unique classes. The key feature of CIL is that no task identifier (or task-id) is provided at test time for each test sample. Predicting the task-id for each test sample is a challenging problem. An emerging theoretically justified and effective approach is to train a task-specific model for each task in a shared network for all tasks based on a task-incremental learning (TIL) method to deal with forgetting. The model for each task in this approach is an out-of-distribution (OOD) detector rather than a conventional classifier. The OOD detector can perform both within-task (in-distribution (IND)) class prediction and OOD detection. The OOD detection capability is the key for task-id prediction during inference for each test sample. However, this paper argues that using a traditional OOD detector for task-id prediction is sub-optimal because additional information (e.g., the replay data and the learned tasks) available in CIL can be exploited to design a better and principled method for task-id prediction. We call the new method TPLR (Task-id Prediction based on Likelihood Ratio}). TPLR markedly outperforms strong CIL baselines.
While the design of blind image quality assessment (IQA) algorithms has improved significantly, the distribution shift between the training and testing scenarios often leads to a poor performance of these methods at inference time. This motivates the study of test time adaptation (TTA) techniques to improve their performance at inference time. Existing auxiliary tasks and loss functions used for TTA may not be relevant for quality-aware adaptation of the pre-trained model. In this work, we introduce two novel quality-relevant auxiliary tasks at the batch and sample levels to enable TTA for blind IQA. In particular, we introduce a group contrastive loss at the batch level and a relative rank loss at the sample level to make the model quality aware and adapt to the target data. Our experiments reveal that even using a small batch of images from the test distribution helps achieve significant improvement in performance by updating the batch normalization statistics of the source model.
The factor graph decentralized data fusion (FG-DDF) framework was developed for the analysis and exploitation of conditional independence in {heterogeneous Bayesian decentralized fusion problems, in which robots update and fuse pdfs over different, but overlapping subsets of random states. This allows robots to efficiently use smaller probabilistic models and sparse message passing to accurately and scalably fuse relevant local parts of a larger global joint state pdf while accounting for data dependencies between robots. Whereas prior work required limiting assumptions about network connectivity and model linearity, this paper relaxes these to explore the applicability and robustness of FG-DDF in more general settings. We develop a new heterogeneous fusion rule which generalizes the homogeneous covariance intersection algorithm for such cases and test it in multi-robot tracking and localization scenarios with non-linear motion/observation models under communication dropouts. Simulation and hardware experiments show that, in practice, the FG-DDF continues to provide consistent filtered estimates under these more practical operating conditions, while reducing computation and communication costs by more than 99\%, thus enabling the design of scalable real-world multi-robot systems.
Federated Learning (FL) is a decentralized machine learning framework that enables collaborative model training while respecting data privacy. In various applications, non-uniform availability or participation of users is unavoidable due to an adverse or stochastic environment, the latter often being uncontrollable during learning. Here, we posit a generic user selection mechanism implementing a possibly randomized, stationary selection policy, suggestively termed as a Random Access Model (RAM). We propose a new formulation of the FL problem which effectively captures and mitigates limited participation of data originating from infrequent, or restricted users, at the presence of a RAM. By employing the Conditional Value-at-Risk (CVaR) over the (unknown) RAM distribution, we extend the expected loss FL objective to a risk-aware objective, enabling the design of an efficient training algorithm that is completely oblivious to the RAM, and with essentially identical complexity as FedAvg. Our experiments on synthetic and benchmark datasets show that the proposed approach achieves significantly improved performance as compared with standard FL, under a variety of setups.
The all pairs shortest path problem (APSP) is one of the foundational problems in computer science. For weighted dense graphs on $n$ vertices, no truly sub-cubic algorithms exist to compute APSP exactly even for undirected graphs. This is popularly known as the APSP conjecture and has played a prominent role in developing the field of fine-grained complexity. The seminal result of Seidel uses fast matrix multiplication (FMM) to compute APSP on unweighted undirected graphs exactly in $\tilde{O}(n^{\omega})$ time, where $\omega=2.372$. Even for unweighted undirected graphs, it is not possible to obtain a $(2-\epsilon)$-approximation of APSP in $o(n^\omega)$ time. In this paper, we provide a multitude of new results for multiplicative and additive approximations of APSP in undirected graphs for both unweighted and weighted cases. We provide new algorithms for multiplicative 2-approximation of unweighted graphs: a deterministic one that runs in $\tilde{O}(n^{2.072})$ time and a randomized one that runs in $\tilde{O}(n^{2.032})$ on expectation improving upon the best known bound of $\tilde{O}(n^{2.25})$ by Roditty (STOC, 2023). For $2$-approximating paths of length $\geq k$, $k \geq 4$, we provide the first improvement after Dor, Halperin, Zwick (2000) for dense graphs even just using combinatorial methods, and then improve it further using FMM. We next consider additive approximations, and provide improved bounds for all additive $\beta$-approximations, $\beta \geq 4$. For weighted graphs, we show that by allowing small additive errors along with an $(1+\epsilon)$-multiplicative approximation, it is possible to improve upon Zwick's $\tilde{O}(n^\omega)$ algorithm. Our results point out the crucial role that FMM can play even on approximating APSP on unweighted undirected graphs, and reveal new bottlenecks towards achieving a quadratic running time to approximate APSP.
Emerging from the monolithic pairwise attention mechanism in conventional Transformer models, there is a growing interest in leveraging sparse interactions that align more closely with biological principles. Approaches including the Set Transformer and the Perceiver employ cross-attention consolidated with a latent space that forms an attention bottleneck with limited capacity. Building upon recent neuroscience studies of Global Workspace Theory and associative memory, we propose the Associative Transformer (AiT). AiT induces low-rank explicit memory that serves as both priors to guide bottleneck attention in the shared workspace and attractors within associative memory of a Hopfield network. Through joint end-to-end training, these priors naturally develop module specialization, each contributing a distinct inductive bias to form attention bottlenecks. A bottleneck can foster competition among inputs for writing information into the memory. We show that AiT is a sparse representation learner, learning distinct priors through the bottlenecks that are complexity-invariant to input quantities and dimensions. AiT demonstrates its superiority over methods such as the Set Transformer, Vision Transformer, and Coordination in various vision tasks.
In random sample consensus (RANSAC), the problem of ellipsoid fitting can be formulated as a problem of minimization of point-to-model distance, which is realized by maximizing model score. Hence, the performance of ellipsoid fitting is affected by distance metric. In this paper, we proposed a novel distance metric called the axial distance, which is converted from the algebraic distance by introducing a scaling factor to solve nongeometric problems of the algebraic distance. There is complementarity between the axial distance and Sampson distance because their combination is a stricter metric when calculating the model score of sample consensus and the weight of the weighted least squares (WLS) fitting. Subsequently, a novel sample-consensus-based ellipsoid fitting method is proposed by using the combination between the axial distance and Sampson distance (CAS). We compare the proposed method with several representative fitting methods through experiments on synthetic and real datasets. The results show that the proposed method has a higher robustness against outliers, consistently high accuracy, and a speed close to that of the method based on sample consensus.
Virtual element methods (VEMs) without extrinsic stabilization in arbitrary degree of polynomial are developed for second order elliptic problems, including a nonconforming VEM and a conforming VEM in arbitrary dimension. The key is to construct local $H(\textrm{div})$-conforming macro finite element spaces such that the associated $L^2$ projection of the gradient of virtual element functions is computable, and the $L^2$ projector has a uniform lower bound on the gradient of virtual element function spaces in $L^2$ norm. Optimal error estimates are derived for these VEMs. Numerical experiments are provided to test the VEMs without extrinsic stabilization.
Recently, a considerable literature has grown up around the theme of Graph Convolutional Network (GCN). How to effectively leverage the rich structural information in complex graphs, such as knowledge graphs with heterogeneous types of entities and relations, is a primary open challenge in the field. Most GCN methods are either restricted to graphs with a homogeneous type of edges (e.g., citation links only), or focusing on representation learning for nodes only instead of jointly propagating and updating the embeddings of both nodes and edges for target-driven objectives. This paper addresses these limitations by proposing a novel framework, namely the Knowledge Embedding based Graph Convolutional Network (KE-GCN), which combines the power of GCNs in graph-based belief propagation and the strengths of advanced knowledge embedding (a.k.a. knowledge graph embedding) methods, and goes beyond. Our theoretical analysis shows that KE-GCN offers an elegant unification of several well-known GCN methods as specific cases, with a new perspective of graph convolution. Experimental results on benchmark datasets show the advantageous performance of KE-GCN over strong baseline methods in the tasks of knowledge graph alignment and entity classification.
We investigate a lattice-structured LSTM model for Chinese NER, which encodes a sequence of input characters as well as all potential words that match a lexicon. Compared with character-based methods, our model explicitly leverages word and word sequence information. Compared with word-based methods, lattice LSTM does not suffer from segmentation errors. Gated recurrent cells allow our model to choose the most relevant characters and words from a sentence for better NER results. Experiments on various datasets show that lattice LSTM outperforms both word-based and character-based LSTM baselines, achieving the best results.