We revisit the sample and computational complexity of completing a rank-1 tensor in $\otimes_{i=1}^{N} \mathbb{R}^{d}$, given a uniformly sampled subset of its entries. We present a characterization of the problem (for tensors with nonzero entries) which admits an algorithm amounting to Gauss-Jordan elimination on a pair of random linear systems. For example, when $N = \Theta(1)$, we prove that the algorithm uses no more than $m = O(d^2 \log d)$ samples and runs in $O(md^2)$ time. Moreover, we show that any algorithm requires $\Omega(d\log d)$ samples. By contrast, existing upper bounds on the sample complexity are at least as large as $d^{1.5} \mu^{\Omega(1)} \log^{\Omega(1)} d$, where $\mu$ can be $\Theta(d)$ in the worst case. Prior work obtained these looser guarantees for higher-rank versions of our problem and tends to involve more complicated algorithms.
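As a hedged illustration of the log-linearization idea such an algorithm can rest on (our sketch, not the paper's procedure verbatim; signs would need a second, analogous system over $\mathbb{F}_2$, hence presumably the pair of systems): for a rank-1 tensor with positive entries, each observed entry yields one linear equation in the logs of the factor entries, so completion reduces to solving a random linear system. A minimal NumPy sketch for $N=3$:

```python
import numpy as np

# Sketch: rank-1 completion via log-linearization (assumes positive entries).
# T[i,j,k] = u[i] * v[j] * w[k]  implies  log T[i,j,k] = lu[i] + lv[j] + lw[k],
# so each observed entry is one linear equation in the 3d unknown factor logs.
rng = np.random.default_rng(0)
d = 8
u, v, w = (rng.uniform(0.5, 2.0, d) for _ in range(3))
T = np.einsum("i,j,k->ijk", u, v, w)

m = 4 * d * int(np.log(d) + 1)             # uniformly sampled entries
idx = rng.integers(0, d, size=(m, 3))

A = np.zeros((m, 3 * d))                   # one indicator row per sample
for r, (i, j, k) in enumerate(idx):
    A[r, i] = A[r, d + j] = A[r, 2 * d + k] = 1.0
b = np.log(T[idx[:, 0], idx[:, 1], idx[:, 2]])

# Min-norm solve (a stand-in for Gauss-Jordan); the scaling ambiguity between
# the factors lies in the null space and cancels in every reconstructed entry.
x, *_ = np.linalg.lstsq(A, b, rcond=None)
lu, lv, lw = x[:d], x[d:2 * d], x[2 * d:]
T_hat = np.exp(lu[:, None, None] + lv[None, :, None] + lw[None, None, :])
print("max relative error:", np.max(np.abs(T_hat - T) / T))
```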
The computation of off-diagonal blocks of matrix functions $f(T)$, where $T$ is block triangular, poses a challenging problem in scientific computing. We present a novel algorithm that exploits the structure of block triangular matrices, generalizing the algorithm of Al-Mohy and Higham for computing the Fr\'echet derivative of the matrix exponential. This problem has significant applications, including exponential integrators for solving systems of first-order differential equations, Hamiltonian linear systems in control theory, and option pricing in finance. Our approach introduces a linear operator that maps off-diagonal blocks of $T$ into their counterparts in $f(T)$. By studying the algebraic properties of this operator, we establish a comprehensive computational framework, paving the way to extend existing Fr\'echet derivative algorithms of matrix functions to more general settings. For the matrix exponential in particular, the algorithm employs the scaling and squaring method with diagonal Pad\'e approximants to $\exp(x)$, with parameters chosen based on a rigorous backward error analysis that, notably, does not depend on the norm of the off-diagonal blocks. Numerical experiments demonstrate that our algorithm surpasses existing algorithms in both accuracy and efficiency, making it highly valuable for a wide range of applications.
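The structure being exploited can be made concrete with a small SciPy check (a hedged example of the known identity the abstract builds on, not the new algorithm itself): for $f = \exp$ and equal diagonal blocks, the off-diagonal block of $f(T)$ equals the Fr\'echet derivative computed by the Al-Mohy-Higham method.

```python
import numpy as np
from scipy.linalg import expm, expm_frechet

# For T = [[A, C], [0, B]], the top-right block of f(T) depends linearly on C;
# when B = A and f = exp it is exactly the Frechet derivative L_exp(A, C),
# which is the Al-Mohy--Higham connection the abstract refers to.
rng = np.random.default_rng(1)
n = 5
A = rng.standard_normal((n, n))
C = rng.standard_normal((n, n))

T = np.block([[A, C], [np.zeros((n, n)), A]])
top_right = expm(T)[:n, n:]

_, L = expm_frechet(A, C)          # Frechet derivative of expm at A in direction C
print(np.allclose(top_right, L))   # True (up to rounding)
```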
Solving time-dependent parametric partial differential equations (PDEs) is challenging, as models must adapt to variations in parameters such as coefficients, forcing terms, and boundary conditions. Data-driven neural solvers either train on data sampled from the distribution of PDE parameters in the hope that the model generalizes to new instances, or rely on gradient-based adaptation and meta-learning to implicitly encode the dynamics from observations; the latter often comes with increased inference complexity. Inspired by the in-context learning capabilities of large language models (LLMs), we introduce Zebra, a novel generative auto-regressive transformer designed to solve parametric PDEs without requiring gradient adaptation at inference. By leveraging in-context information during both pre-training and inference, Zebra dynamically adapts to new tasks by conditioning on input sequences that incorporate context trajectories or preceding states. This approach enables Zebra to flexibly handle context inputs of arbitrary size and supports uncertainty quantification through the sampling of multiple solution trajectories. We evaluate Zebra across a variety of challenging PDE scenarios, demonstrating its adaptability, robustness, and superior performance compared to existing approaches.
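A schematic sketch of the in-context conditioning described above (all names here are hypothetical; Zebra's actual tokenizer and transformer details are in the paper): context trajectories and the preceding states of the query are flattened into a single token sequence, and several rollouts are sampled for uncertainty quantification.

```python
# Hypothetical names throughout; schematic only, not the authors' code.
def build_prompt(context_trajectories, query_states, encode):
    """Flatten context demonstrations and the query prefix into one token sequence."""
    tokens = []
    for trajectory in context_trajectories:   # solved trajectories of the same PDE instance
        for state in trajectory:
            tokens += encode(state)           # e.g. a learned quantizer per spatial state
    for state in query_states:                # preceding states of the trajectory to solve
        tokens += encode(state)
    return tokens

def rollout(model, prompt, n_steps, tokens_per_state, decode, n_samples=1):
    """Sample several autoregressive continuations to quantify uncertainty."""
    trajectories = []
    for _ in range(n_samples):
        seq, states = list(prompt), []
        for _ in range(n_steps):
            new = model.sample(seq, tokens_per_state)   # autoregressive decoding
            seq += new
            states.append(decode(new))
        trajectories.append(states)
    return trajectories
```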
We present \textbf{H}ybrid-\textbf{A}utoregressive \textbf{IN}ference Tr\textbf{AN}sducers (HAINAN), a novel architecture for speech recognition that extends the Token-and-Duration Transducer (TDT) model. Trained with randomly masked predictor network outputs, HAINAN supports both autoregressive inference with all network components and non-autoregressive inference without the predictor. Additionally, we propose a novel semi-autoregressive inference paradigm that first generates an initial hypothesis using non-autoregressive inference, followed by refinement steps where each token prediction is regenerated using parallelized autoregression on the initial hypothesis. Experiments on multiple datasets across different languages demonstrate that HAINAN achieves efficiency parity with CTC in non-autoregressive mode and with TDT in autoregressive mode. In terms of accuracy, autoregressive HAINAN outperforms TDT and RNN-T, while non-autoregressive HAINAN significantly outperforms CTC. Semi-autoregressive inference further enhances the model's accuracy with minimal computational overhead, and even outperforms TDT results in some cases. These results highlight HAINAN's flexibility in balancing accuracy and speed, positioning it as a strong candidate for real-world speech recognition applications.
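A toy, runnable sketch of the semi-autoregressive idea (stand-in arrays and a linear ``predictor'', not HAINAN's actual transducer networks): the first pass decodes without predictor context, then refinement steps re-predict all positions in parallel, each conditioned on the previous hypothesis.

```python
import numpy as np

# Toy sketch of semi-autoregressive decoding: a non-autoregressive first pass
# ignores the predictor, then refinement steps re-predict every token in
# parallel, conditioned on the previous hypothesis (Jacobi-style iteration).
rng = np.random.default_rng(2)
V, T = 6, 10                            # vocabulary size, number of frames
enc = rng.standard_normal((T, V))       # stand-in for encoder output logits
W = 0.5 * rng.standard_normal((V, V))   # stand-in predictor: prev token -> logit bias

def joint(enc, prev_tokens):
    """Combine encoder logits with an optional predictor bias per position."""
    logits = enc.copy()
    if prev_tokens is not None:         # autoregressive context, applied in batch
        logits += W[prev_tokens]
    return logits.argmax(axis=1)

hyp = joint(enc, None)                  # pass 1: predictor output masked out
for _ in range(3):                      # refinement: all positions in parallel
    prev = np.concatenate(([0], hyp[:-1]))   # 0 plays the role of start-of-sequence
    new = joint(enc, prev)
    if np.array_equal(new, hyp):
        break
    hyp = new
print(hyp)
```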
We introduce $5/2$- and $7/2$-order $L^2$-accurate randomized Runge-Kutta-Nystr\"{o}m methods, tailored for approximating Hamiltonian flows within non-reversible Markov chain Monte Carlo samplers such as unadjusted Hamiltonian Monte Carlo and unadjusted kinetic Langevin Monte Carlo. We establish quantitative $5/2$-order $L^2$-accuracy upper bounds under gradient and Hessian Lipschitz assumptions on the potential energy function. Numerical experiments demonstrate the superior efficiency of the proposed unadjusted samplers on a variety of well-behaved, high-dimensional target distributions.
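The mechanism behind such integrators can be sketched as follows (a hedged illustration of the randomized-evaluation-point idea, not the authors' $5/2$- or $7/2$-order schemes): for $q'' = f(q) = -\nabla U(q)$, a force evaluation at a uniformly random intermediate time yields unbiased one-point estimates of the integrals in the variation-of-constants formula.

```python
import numpy as np

def rrkn_step(q, p, h, f, rng):
    """One randomized step for q'' = f(q), unbiased over the random node u."""
    u = rng.uniform()                                  # random quadrature node in (0, 1)
    q_u = q + u * h * p + 0.5 * (u * h) ** 2 * f(q)    # cheap predictor of q(u*h)
    F = f(q_u)
    # Exact flow: q(h) = q + h p + int_0^h (h-s) f(q(s)) ds,
    #             p(h) = p + int_0^h f(q(s)) ds; estimate both at the node s = u*h.
    return q + h * p + h * (h - u * h) * F, p + h * F

# Usage: the harmonic oscillator q'' = -q, integrated to time 5.
rng = np.random.default_rng(4)
q, p = np.array([1.0]), np.array([0.0])
for _ in range(100):
    q, p = rrkn_step(q, p, 0.05, lambda x: -x, rng)
print(q, p)   # approximately (cos 5, -sin 5) = (0.284, 0.959)
```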
A major reason behind the recent success of large language models (LLMs) is their \textit{in-context learning} capability, which makes it possible to rapidly adapt them to downstream text-based tasks by prompting them with a small number of relevant demonstrations. While large vision-language models (VLMs) have recently been developed for tasks requiring both text and images, they largely lack in-context learning over visual information, especially in understanding and generating text about videos. In this work, we implement \textbf{E}mergent \textbf{I}n-context \textbf{Le}arning on \textbf{V}ideos (\eilev{}), a novel training paradigm that induces in-context learning over video and text by capturing key properties of pre-training data found by prior work to be essential for in-context learning in transformers. In our experiments, we show that \eilev-trained models outperform other off-the-shelf VLMs in few-shot video narration for novel, rare actions. Furthermore, we demonstrate that these key properties of bursty distributions, skewed marginal distributions, and dynamic meaning each contribute to varying degrees to VLMs' in-context learning capability in narrating procedural videos. Our results, analysis, and \eilev{}-trained models yield numerous insights about the emergence of in-context learning over video and text, creating a foundation for future work to optimize and scale VLMs for open-domain video understanding and reasoning. Our code and demo are available at \url{https://github.com/yukw777/EILEV}.
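A hedged sketch of what such data curation could look like (hypothetical names and parameters; the actual \eilev{} pipeline is described in the paper): training sequences are built so that clips of the query's action appear in a burst among distractors, with actions drawn from a skewed (Zipfian) marginal.

```python
import random

# Hypothetical sketch of "bursty" sampling with a skewed marginal, in the
# spirit of the data properties the abstract cites; not the authors' code.
def sample_bursty_sequence(clips_by_action, actions, zipf_weights, k=4, rng=random):
    query_action = rng.choices(actions, weights=zipf_weights, k=1)[0]
    query = rng.choice(clips_by_action[query_action])
    same = rng.choices(clips_by_action[query_action], k=k // 2)     # the "burst"
    others = [                                                      # distractor clips
        rng.choice(clips_by_action[rng.choices(actions, weights=zipf_weights, k=1)[0]])
        for _ in range(k - k // 2)
    ]
    context = same + others
    rng.shuffle(context)
    return context + [query]    # in-context demonstrations followed by the query
```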
Nutrient load simulators are large, deterministic models that simulate the hydrodynamics and biogeochemical processes in aquatic ecosystems. They are central tools for planning cost-efficient actions to fight eutrophication, since they allow scenario predictions of the impacts of nutrient load reductions on, e.g., harmful algal biomass growth. However, because these simulators are computationally heavy, the uncertainties related to their predictions are typically not rigorously assessed. In this work, we developed a novel Bayesian computational approach for estimating the uncertainties in predictions of the Finnish coastal nutrient load model FICOS. First, we constructed a likelihood function for the multivariate spatiotemporal outputs of the FICOS model. Then, we used Bayesian optimization to locate the posterior mode for the model parameters conditional on long-term monitoring data. After that, we constructed a space-filling design for FICOS model runs around the posterior mode and used it to train a Gaussian process emulator for the (log) posterior density of the model parameters. We then integrated over this (approximate) parameter posterior to produce probabilistic predictions for algal biomass and chlorophyll a concentration under alternative nutrient load reduction scenarios. Our computational algorithm allowed for fast posterior inference, and the Gaussian process emulator had good predictive accuracy within the highest posterior probability mass region. The posterior predictive scenarios showed that the probability of reaching the EU's Water Framework Directive objectives in the Finnish Archipelago Sea is generally low, even under large load reductions.
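A toy sketch of the emulation-and-integration step (FICOS is replaced by a cheap stand-in log-posterior; the scikit-learn classes are real, everything else is illustrative):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

# Fit a GP emulator to log-posterior evaluations on a design around the mode,
# then integrate over the emulated posterior for a scenario prediction.
rng = np.random.default_rng(3)
log_post = lambda th: -0.5 * np.sum((th - 1.0) ** 2, axis=-1)  # stand-in density

mode = np.ones(2)                                  # located via Bayesian optimization
design = mode + rng.uniform(-2, 2, size=(40, 2))   # space-filling design (here: random)
y = log_post(design)                               # expensive simulator runs in reality

gp = GaussianProcessRegressor(ConstantKernel() * RBF(), normalize_y=True).fit(design, y)

# Self-normalized importance sampling through the emulator: draw from a wide
# proposal, weight by the emulated posterior, and average any scenario output.
prop = mode + rng.uniform(-2, 2, size=(5000, 2))
w = np.exp(gp.predict(prop))
w /= w.sum()
scenario_output = prop[:, 0] ** 2                  # stand-in for an algal-biomass prediction
print("posterior predictive mean:", np.sum(w * scenario_output))
```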
We present the first polynomial-time algorithm to exactly compute the number of labeled chordal graphs on $n$ vertices. Our algorithm solves a more general problem: given $n$ and $\omega$ as input, it computes the number of $\omega$-colorable labeled chordal graphs on $n$ vertices, using $O(n^7)$ arithmetic operations. A standard sampling-to-counting reduction then yields a polynomial-time exact sampler that generates an $\omega$-colorable labeled chordal graph on $n$ vertices uniformly at random. Our counting algorithm improves upon the previous best result by Wormald (1985), which computes the number of labeled chordal graphs on $n$ vertices in time exponential in $n$. An implementation of the polynomial-time counting algorithm gives the number of labeled chordal graphs on up to $30$ vertices in less than three minutes on a standard desktop computer. Previously, the number of labeled chordal graphs was only known for graphs on up to $15$ vertices. In addition, we design two approximation algorithms: (1) an approximate counting algorithm that computes a $(1\pm\varepsilon)$-approximation of the number of $n$-vertex labeled chordal graphs, and (2) an approximate sampling algorithm that generates a random labeled chordal graph according to a distribution whose total variation distance from the uniform distribution is at most $\varepsilon$. The approximate counting algorithm runs in $O(n^3\log{n}\log^7(1/\varepsilon))$ time, and the approximate sampling algorithm runs in $O(n^3\log{n}\log^7(1/\varepsilon))$ expected time.
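The small-$n$ values are easy to cross-check by brute force (plain enumeration with a chordality test, not the paper's $O(n^7)$ dynamic program); for $n \le 3$ every graph is chordal, so those counts are simply $2^{\binom{n}{2}}$:

```python
from itertools import combinations, product
import networkx as nx

def count_labeled_chordal(n):
    """Count labeled chordal graphs on n vertices by enumerating all 2^(n choose 2) graphs."""
    edges = list(combinations(range(n), 2))
    count = 0
    for mask in product([0, 1], repeat=len(edges)):
        G = nx.Graph()
        G.add_nodes_from(range(n))
        G.add_edges_from(e for e, bit in zip(edges, mask) if bit)
        if nx.is_chordal(G):
            count += 1
    return count

# On 4 vertices only the 3 chordless 4-cycles fail, giving 2^6 - 3 = 61.
print({n: count_labeled_chordal(n) for n in (4, 5)})   # expected {4: 61, 5: 822}
```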
We present a constructive universal approximation theorem for learning machines equipped with joint-group-equivariant feature maps, based on group representation theory. ``Constructive'' here indicates that the distribution of parameters is given in a closed-form expression known as the ridgelet transform. Joint-group-equivariance encompasses a broad class of feature maps that generalize classical group-equivariance. Notably, this class includes fully-connected networks, which are not group-equivariant but are joint-group-equivariant. Moreover, our main theorem also unifies the universal approximation theorems for both shallow and deep networks. While the universality of shallow networks has been investigated in a unified manner via the ridgelet transform, the universality of deep networks has so far been investigated only case by case.
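For concreteness, the classical shallow fully-connected instance of such a closed-form parameter distribution (our paraphrase of the standard ridgelet pair, not the paper's general joint-group-equivariant statement) reads
\[
S[\gamma](x) = \int_{\mathbb{R}^m \times \mathbb{R}} \gamma(a,b)\,\sigma(a \cdot x - b)\,\mathrm{d}a\,\mathrm{d}b,
\qquad
R[f](a,b) = \int_{\mathbb{R}^m} f(x)\,\overline{\rho(a \cdot x - b)}\,\mathrm{d}x,
\]
with the reconstruction $S[R[f]] = c_{\sigma,\rho}\, f$ holding whenever $(\sigma,\rho)$ is an admissible pair; setting $\gamma = R[f]$ thus prescribes the network's parameter distribution in closed form.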
Given a finite set of sample points, meta-learning algorithms aim to learn an optimal adaptation strategy for new, unseen tasks. Often, this data can be ambiguous, as it might belong to several tasks concurrently; this is particularly the case in meta-regression. In such cases, the estimated adaptation strategy is subject to high variance due to the limited amount of support data for each task, which often leads to sub-optimal generalization performance. In this work, we address the problem of variance reduction in gradient-based meta-learning and formalize the class of problems prone to it, a condition we refer to as \emph{task overlap}. Specifically, we propose a novel approach that reduces the variance of the gradient estimate by weighting each support point individually by the variance of its posterior over the parameters. To estimate the posterior, we utilize the Laplace approximation, which allows us to express the variance in terms of the curvature of the loss landscape of our meta-learner. Experimental results demonstrate the effectiveness of the proposed method and highlight the importance of variance reduction in meta-learning.
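One plausible reading of this weighting rule, as a hedged NumPy sketch (hypothetical details; a linear model is used so the Laplace approximation is exact): each support point's gradient is scaled by the inverse of its predictive variance under the approximate parameter posterior.

```python
import numpy as np

# Hedged sketch, not the authors' exact rule: down-weight support points whose
# prediction is uncertain under a Laplace approximation of the parameter
# posterior, then take the inner adaptation step with the weighted gradient.
def weighted_inner_step(theta, X, y, lr=0.1, prior_prec=1.0):
    resid = X @ theta - y
    grads = resid[:, None] * X                      # per-point gradients of 1/2 r^2
    H = X.T @ X + prior_prec * np.eye(X.shape[1])   # Laplace posterior precision
    Sigma = np.linalg.inv(H)
    var = np.einsum("ni,ij,nj->n", X, Sigma, X)     # predictive variance per point
    w = 1.0 / var
    w /= w.sum()                                    # normalized precision weights
    return theta - lr * (w[:, None] * grads).sum(axis=0)
```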
Significant work has been done on computing the ``average'' optimal solution value for various $\mathsf{NP}$-complete problems using the Erd\H{o}s-R\'{e}nyi model to establish \emph{critical thresholds}. Critical thresholds define narrow bounds for the optimal solution of a problem instance such that the probability that the solution value lies outside these bounds vanishes as the instance size approaches infinity. In this paper, we extend the Erd\H{o}s-R\'{e}nyi model to general hypergraphs on $n$ vertices and $M$ hyperedges. We consider the problem of determining critical thresholds for the largest cardinality matching, and we show that for $M=o(1.155^n)$ the size of the maximum cardinality matching is almost surely 1. On the other hand, if $M=\Theta(2^n)$, then the size of the maximum cardinality matching is $\Omega(n^{\frac12-\gamma})$ for arbitrary $\gamma >0$. Lastly, we address the gap where $M=\Omega(1.155^n)$ and $M=o(2^n)$ empirically through computer simulations.
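The empirical regime can be probed with a hedged simulation sketch (the paper's exact random model and experiments may differ; here each hyperedge contains each vertex independently with probability $1/2$, under which two random hyperedges are disjoint with probability $(3/4)^n$, consistent with the $1.155 \approx (4/3)^{1/2}$ threshold):

```python
import random

def greedy_matching_size(n, M, rng=random):
    """Greedy (maximal) matching size, a lower bound on the maximum matching."""
    edges = [frozenset(v for v in range(n) if rng.random() < 0.5) for _ in range(M)]
    used, size = set(), 0
    for e in edges:
        if e and used.isdisjoint(e):   # take any edge disjoint from those chosen
            used |= e
            size += 1
    return size

# M = 200 is below the ~1.155^40 = 315 threshold, so this typically prints 1.
print(greedy_matching_size(n=40, M=200))
```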