We revisit the sample and computational complexity of completing a rank-1 tensor in $\otimes_{i=1}^{N} \mathbb{R}^{d}$, given a uniformly sampled subset of its entries. We present a characterization of the problem (for tensors with nonzero entries) which admits an algorithm amounting to Gauss-Jordan elimination on a pair of random linear systems. For example, when $N = \Theta(1)$, we prove that the algorithm uses no more than $m = O(d^2 \log d)$ samples and runs in $O(md^2)$ time. Moreover, we show that any algorithm requires $\Omega(d\log d)$ samples. By contrast, existing upper bounds on the sample complexity are at least as large as $d^{1.5} \mu^{\Omega(1)} \log^{\Omega(1)} d$, where $\mu$ can be $\Theta(d)$ in the worst case. Prior work obtained these looser guarantees for higher-rank versions of our problem and tends to involve more complicated algorithms.
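As a hedged illustration of the log-linearization idea such an algorithm can rest on (our sketch, not the paper's procedure verbatim; signs would need a second, analogous system over $\mathbb{F}_2$, hence presumably the pair of systems): for a rank-1 tensor with positive entries, each observed entry yields one linear equation in the logs of the factor entries, so completion reduces to solving a random linear system. A minimal NumPy sketch for $N=3$:

```python
import numpy as np

# Sketch: rank-1 completion via log-linearization (assumes positive entries).
# T[i,j,k] = u[i] * v[j] * w[k]  implies  log T[i,j,k] = lu[i] + lv[j] + lw[k],
# so each observed entry is one linear equation in the 3d unknown factor logs.
rng = np.random.default_rng(0)
d = 8
u, v, w = (rng.uniform(0.5, 2.0, d) for _ in range(3))
T = np.einsum("i,j,k->ijk", u, v, w)

m = 4 * d * int(np.log(d) + 1)             # uniformly sampled entries
idx = rng.integers(0, d, size=(m, 3))

A = np.zeros((m, 3 * d))                   # one indicator row per sample
for r, (i, j, k) in enumerate(idx):
    A[r, i] = A[r, d + j] = A[r, 2 * d + k] = 1.0
b = np.log(T[idx[:, 0], idx[:, 1], idx[:, 2]])

# Min-norm solve (a stand-in for Gauss-Jordan); the scaling ambiguity between
# the factors lies in the null space and cancels in every reconstructed entry.
x, *_ = np.linalg.lstsq(A, b, rcond=None)
lu, lv, lw = x[:d], x[d:2 * d], x[2 * d:]
T_hat = np.exp(lu[:, None, None] + lv[None, :, None] + lw[None, None, :])
print("max relative error:", np.max(np.abs(T_hat - T) / T))
```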
The computation of off-diagonal blocks of matrix functions $f(T)$, where $T$ is block triangular, poses a challenging problem in scientific computing. We present a novel algorithm that exploits the structure of block triangular matrices, generalizing the algorithm of Al-Mohy and Higham for computing the Fr\'echet derivative of the matrix exponential. This problem has significant applications, including exponential integrators for solving systems of first-order differential equations, Hamiltonian linear systems in control theory, and option pricing in finance. Our approach introduces a linear operator that maps off-diagonal blocks of $T$ into their counterparts in $f(T)$. By studying the algebraic properties of this operator, we establish a comprehensive computational framework, paving the way to extend existing Fr\'echet derivative algorithms of matrix functions to more general settings. For the matrix exponential in particular, the algorithm employs the scaling and squaring method with diagonal Pad\'e approximants to $\exp(x)$, with parameters chosen based on a rigorous backward error analysis that, notably, does not depend on the norm of the off-diagonal blocks. Numerical experiments demonstrate that our algorithm surpasses existing algorithms in both accuracy and efficiency, making it highly valuable for a wide range of applications.
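The structure being exploited can be made concrete with a small SciPy check (a hedged example of the known identity the abstract builds on, not the new algorithm itself): for $f = \exp$ and equal diagonal blocks, the off-diagonal block of $f(T)$ equals the Fr\'echet derivative computed by the Al-Mohy-Higham method.

```python
import numpy as np
from scipy.linalg import expm, expm_frechet

# For T = [[A, C], [0, B]], the top-right block of f(T) depends linearly on C;
# when B = A and f = exp it is exactly the Frechet derivative L_exp(A, C),
# which is the Al-Mohy--Higham connection the abstract refers to.
rng = np.random.default_rng(1)
n = 5
A = rng.standard_normal((n, n))
C = rng.standard_normal((n, n))

T = np.block([[A, C], [np.zeros((n, n)), A]])
top_right = expm(T)[:n, n:]

_, L = expm_frechet(A, C)          # Frechet derivative of expm at A in direction C
print(np.allclose(top_right, L))   # True (up to rounding)
```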
Solving time-dependent parametric partial differential equations (PDEs) is challenging, as models must adapt to variations in parameters such as coefficients, forcing terms, and boundary conditions. Data-driven neural solvers either train on data sampled from the distribution of PDE parameters in the hope that the model generalizes to new instances, or rely on gradient-based adaptation and meta-learning to implicitly encode the dynamics from observations; the latter often comes with increased inference complexity. Inspired by the in-context learning capabilities of large language models (LLMs), we introduce Zebra, a novel generative auto-regressive transformer designed to solve parametric PDEs without requiring gradient adaptation at inference. By leveraging in-context information during both pre-training and inference, Zebra dynamically adapts to new tasks by conditioning on input sequences that incorporate context trajectories or preceding states. This approach enables Zebra to flexibly handle context inputs of arbitrary size and supports uncertainty quantification through the sampling of multiple solution trajectories. We evaluate Zebra across a variety of challenging PDE scenarios, demonstrating its adaptability, robustness, and superior performance compared to existing approaches.
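A schematic sketch of the in-context conditioning described above (all names here are hypothetical; Zebra's actual tokenizer and transformer details are in the paper): context trajectories and the preceding states of the query are flattened into a single token sequence, and several rollouts are sampled for uncertainty quantification.

```python
# Hypothetical names throughout; schematic only, not the authors' code.
def build_prompt(context_trajectories, query_states, encode):
    """Flatten context demonstrations and the query prefix into one token sequence."""
    tokens = []
    for trajectory in context_trajectories:   # solved trajectories of the same PDE instance
        for state in trajectory:
            tokens += encode(state)           # e.g. a learned quantizer per spatial state
    for state in query_states:                # preceding states of the trajectory to solve
        tokens += encode(state)
    return tokens

def rollout(model, prompt, n_steps, tokens_per_state, decode, n_samples=1):
    """Sample several autoregressive continuations to quantify uncertainty."""
    trajectories = []
    for _ in range(n_samples):
        seq, states = list(prompt), []
        for _ in range(n_steps):
            new = model.sample(seq, tokens_per_state)   # autoregressive decoding
            seq += new
            states.append(decode(new))
        trajectories.append(states)
    return trajectories
```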
We present \textbf{H}ybrid-\textbf{A}utoregressive \textbf{IN}ference Tr\textbf{AN}sducers (HAINAN), a novel architecture for speech recognition that extends the Token-and-Duration Transducer (TDT) model. Trained with randomly masked predictor network outputs, HAINAN supports both autoregressive inference with all network components and non-autoregressive inference without the predictor. Additionally, we propose a novel semi-autoregressive inference paradigm that first generates an initial hypothesis using non-autoregressive inference, followed by refinement steps where each token prediction is regenerated using parallelized autoregression on the initial hypothesis. Experiments on multiple datasets across different languages demonstrate that HAINAN achieves efficiency parity with CTC in non-autoregressive mode and with TDT in autoregressive mode. In terms of accuracy, autoregressive HAINAN outperforms TDT and RNN-T, while non-autoregressive HAINAN significantly outperforms CTC. Semi-autoregressive inference further enhances the model's accuracy with minimal computational overhead, and even outperforms TDT results in some cases. These results highlight HAINAN's flexibility in balancing accuracy and speed, positioning it as a strong candidate for real-world speech recognition applications.
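A toy, runnable sketch of the semi-autoregressive idea (stand-in arrays and a linear ``predictor'', not HAINAN's actual transducer networks): the first pass decodes without predictor context, then refinement steps re-predict all positions in parallel, each conditioned on the previous hypothesis.

```python
import numpy as np

# Toy sketch of semi-autoregressive decoding: a non-autoregressive first pass
# ignores the predictor, then refinement steps re-predict every token in
# parallel, conditioned on the previous hypothesis (Jacobi-style iteration).
rng = np.random.default_rng(2)
V, T = 6, 10                            # vocabulary size, number of frames
enc = rng.standard_normal((T, V))       # stand-in for encoder output logits
W = 0.5 * rng.standard_normal((V, V))   # stand-in predictor: prev token -> logit bias

def joint(enc, prev_tokens):
    """Combine encoder logits with an optional predictor bias per position."""
    logits = enc.copy()
    if prev_tokens is not None:         # autoregressive context, applied in batch
        logits += W[prev_tokens]
    return logits.argmax(axis=1)

hyp = joint(enc, None)                  # pass 1: predictor output masked out
for _ in range(3):                      # refinement: all positions in parallel
    prev = np.concatenate(([0], hyp[:-1]))   # 0 plays the role of start-of-sequence
    new = joint(enc, prev)
    if np.array_equal(new, hyp):
        break
    hyp = new
print(hyp)
```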
We introduce $5/2$- and $7/2$-order $L^2$-accurate randomized Runge-Kutta-Nystr\"{o}m methods, tailored for approximating Hamiltonian flows within non-reversible Markov chain Monte Carlo samplers such as unadjusted Hamiltonian Monte Carlo and unadjusted kinetic Langevin Monte Carlo. We establish quantitative $5/2$-order $L^2$-accuracy upper bounds under gradient and Hessian Lipschitz assumptions on the potential energy function. Numerical experiments demonstrate the superior efficiency of the proposed unadjusted samplers on a variety of well-behaved, high-dimensional target distributions.
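The mechanism behind such integrators can be sketched as follows (a hedged illustration of the randomized-evaluation-point idea, not the authors' $5/2$- or $7/2$-order schemes): for $q'' = f(q) = -\nabla U(q)$, a force evaluation at a uniformly random intermediate time yields unbiased one-point estimates of the integrals in the variation-of-constants formula.

```python
import numpy as np

def rrkn_step(q, p, h, f, rng):
    """One randomized step for q'' = f(q), unbiased over the random node u."""
    u = rng.uniform()                                  # random quadrature node in (0, 1)
    q_u = q + u * h * p + 0.5 * (u * h) ** 2 * f(q)    # cheap predictor of q(u*h)
    F = f(q_u)
    # Exact flow: q(h) = q + h p + int_0^h (h-s) f(q(s)) ds,
    #             p(h) = p + int_0^h f(q(s)) ds; estimate both at the node s = u*h.
    return q + h * p + h * (h - u * h) * F, p + h * F

# Usage: the harmonic oscillator q'' = -q, integrated to time 5.
rng = np.random.default_rng(4)
q, p = np.array([1.0]), np.array([0.0])
for _ in range(100):
    q, p = rrkn_step(q, p, 0.05, lambda x: -x, rng)
print(q, p)   # approximately (cos 5, -sin 5) = (0.284, 0.959)
```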
A major reason behind the recent success of large language models (LLMs) is their \textit{in-context learning} capability, which makes it possible to rapidly adapt them to downstream text-based tasks by prompting them with a small number of relevant demonstrations. While large vision-language models (VLMs) have recently been developed for tasks requiring both text and images, they largely lack in-context learning over visual information, especially in understanding and generating text about videos. In this work, we implement \textbf{E}mergent \textbf{I}n-context \textbf{Le}arning on \textbf{V}ideos (\eilev{}), a novel training paradigm that induces in-context learning over video and text by capturing key properties of pre-training data found by prior work to be essential for in-context learning in transformers. In our experiments, we show that \eilev-trained models outperform other off-the-shelf VLMs in few-shot video narration for novel, rare actions. Furthermore, we demonstrate that these key properties of bursty distributions, skewed marginal distributions, and dynamic meaning each contribute to varying degrees to VLMs' in-context learning capability in narrating procedural videos. Our results, analysis, and \eilev{}-trained models yield numerous insights about the emergence of in-context learning over video and text, creating a foundation for future work to optimize and scale VLMs for open-domain video understanding and reasoning. Our code and demo are available at \url{https://github.com/yukw777/EILEV}.
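A hedged sketch of what such data curation could look like (hypothetical names and parameters; the actual \eilev{} pipeline is described in the paper): training sequences are built so that clips of the query's action appear in a burst among distractors, with actions drawn from a skewed (Zipfian) marginal.

```python
import random

# Hypothetical sketch of "bursty" sampling with a skewed marginal, in the
# spirit of the data properties the abstract cites; not the authors' code.
def sample_bursty_sequence(clips_by_action, actions, zipf_weights, k=4, rng=random):
    query_action = rng.choices(actions, weights=zipf_weights, k=1)[0]
    query = rng.choice(clips_by_action[query_action])
    same = rng.choices(clips_by_action[query_action], k=k // 2)     # the "burst"
    others = [                                                      # distractor clips
        rng.choice(clips_by_action[rng.choices(actions, weights=zipf_weights, k=1)[0]])
        for _ in range(k - k // 2)
    ]
    context = same + others
    rng.shuffle(context)
    return context + [query]    # in-context demonstrations followed by the query
```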
Nutrient load simulators are large, deterministic models that simulate the hydrodynamics and biogeochemical processes in aquatic ecosystems. They are central tools for planning cost-efficient actions to fight eutrophication, since they allow scenario predictions of the impacts of nutrient load reductions on, e.g., harmful algal biomass growth. However, because these simulators are computationally heavy, the uncertainties related to their predictions are typically not rigorously assessed. In this work, we developed a novel Bayesian computational approach for estimating the uncertainties in predictions of the Finnish coastal nutrient load model FICOS. First, we constructed a likelihood function for the multivariate spatiotemporal outputs of the FICOS model. Then, we used Bayesian optimization to locate the posterior mode for the model parameters conditional on long-term monitoring data. After that, we constructed a space-filling design for FICOS model runs around the posterior mode and used it to train a Gaussian process emulator for the (log) posterior density of the model parameters. We then integrated over this (approximate) parameter posterior to produce probabilistic predictions for algal biomass and chlorophyll a concentration under alternative nutrient load reduction scenarios. Our computational algorithm allowed for fast posterior inference, and the Gaussian process emulator had good predictive accuracy within the highest posterior probability mass region. The posterior predictive scenarios showed that the probability of reaching the EU's Water Framework Directive objectives in the Finnish Archipelago Sea is generally low, even under large load reductions.
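A toy sketch of the emulation-and-integration step (FICOS is replaced by a cheap stand-in log-posterior; the scikit-learn classes are real, everything else is illustrative):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

# Fit a GP emulator to log-posterior evaluations on a design around the mode,
# then integrate over the emulated posterior for a scenario prediction.
rng = np.random.default_rng(3)
log_post = lambda th: -0.5 * np.sum((th - 1.0) ** 2, axis=-1)  # stand-in density

mode = np.ones(2)                                  # located via Bayesian optimization
design = mode + rng.uniform(-2, 2, size=(40, 2))   # space-filling design (here: random)
y = log_post(design)                               # expensive simulator runs in reality

gp = GaussianProcessRegressor(ConstantKernel() * RBF(), normalize_y=True).fit(design, y)

# Self-normalized importance sampling through the emulator: draw from a wide
# proposal, weight by the emulated posterior, and average any scenario output.
prop = mode + rng.uniform(-2, 2, size=(5000, 2))
w = np.exp(gp.predict(prop))
w /= w.sum()
scenario_output = prop[:, 0] ** 2                  # stand-in for an algal-biomass prediction
print("posterior predictive mean:", np.sum(w * scenario_output))
```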
We present the first polynomial-time algorithm to exactly compute the number of labeled chordal graphs on $n$ vertices. Our algorithm solves a more general problem: given $n$ and $\omega$ as input, it computes the number of $\omega$-colorable labeled chordal graphs on $n$ vertices, using $O(n^7)$ arithmetic operations. A standard sampling-to-counting reduction then yields a polynomial-time exact sampler that generates an $\omega$-colorable labeled chordal graph on $n$ vertices uniformly at random. Our counting algorithm improves upon the previous best result by Wormald (1985), which computes the number of labeled chordal graphs on $n$ vertices in time exponential in $n$. An implementation of the polynomial-time counting algorithm gives the number of labeled chordal graphs on up to $30$ vertices in less than three minutes on a standard desktop computer. Previously, the number of labeled chordal graphs was only known for graphs on up to $15$ vertices. In addition, we design two approximation algorithms: (1) an approximate counting algorithm that computes a $(1\pm\varepsilon)$-approximation of the number of $n$-vertex labeled chordal graphs, and (2) an approximate sampling algorithm that generates a random labeled chordal graph according to a distribution whose total variation distance from the uniform distribution is at most $\varepsilon$. The approximate counting algorithm runs in $O(n^3\log{n}\log^7(1/\varepsilon))$ time, and the approximate sampling algorithm runs in $O(n^3\log{n}\log^7(1/\varepsilon))$ expected time.
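The small-$n$ values are easy to cross-check by brute force (plain enumeration with a chordality test, not the paper's $O(n^7)$ dynamic program); for $n \le 3$ every graph is chordal, so those counts are simply $2^{\binom{n}{2}}$:

```python
from itertools import combinations, product
import networkx as nx

def count_labeled_chordal(n):
    """Count labeled chordal graphs on n vertices by enumerating all 2^(n choose 2) graphs."""
    edges = list(combinations(range(n), 2))
    count = 0
    for mask in product([0, 1], repeat=len(edges)):
        G = nx.Graph()
        G.add_nodes_from(range(n))
        G.add_edges_from(e for e, bit in zip(edges, mask) if bit)
        if nx.is_chordal(G):
            count += 1
    return count

# On 4 vertices only the 3 chordless 4-cycles fail, giving 2^6 - 3 = 61.
print({n: count_labeled_chordal(n) for n in (4, 5)})   # expected {4: 61, 5: 822}
```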
We present a constructive universal approximation theorem for learning machines equipped with joint-group-equivariant feature maps, based on group representation theory. ``Constructive'' here indicates that the distribution of parameters is given in a closed-form expression known as the ridgelet transform. Joint-group-equivariance encompasses a broad class of feature maps that generalize classical group-equivariance. Notably, this class includes fully-connected networks, which are not group-equivariant but are joint-group-equivariant. Moreover, our main theorem also unifies the universal approximation theorems for both shallow and deep networks. While the universality of shallow networks has been investigated in a unified manner via the ridgelet transform, the universality of deep networks has so far been investigated only case by case.
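For concreteness, the classical shallow fully-connected instance of such a closed-form parameter distribution (our paraphrase of the standard ridgelet pair, not the paper's general joint-group-equivariant statement) reads
\[
S[\gamma](x) = \int_{\mathbb{R}^m \times \mathbb{R}} \gamma(a,b)\,\sigma(a \cdot x - b)\,\mathrm{d}a\,\mathrm{d}b,
\qquad
R[f](a,b) = \int_{\mathbb{R}^m} f(x)\,\overline{\rho(a \cdot x - b)}\,\mathrm{d}x,
\]
with the reconstruction $S[R[f]] = c_{\sigma,\rho}\, f$ holding whenever $(\sigma,\rho)$ is an admissible pair; setting $\gamma = R[f]$ thus prescribes the network's parameter distribution in closed form.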
Given a finite set of sample points, meta-learning algorithms aim to learn an optimal adaptation strategy for new, unseen tasks. Often, this data can be ambiguous, as it might belong to several tasks concurrently; this is particularly the case in meta-regression. In such cases, the estimated adaptation strategy is subject to high variance due to the limited amount of support data for each task, which often leads to sub-optimal generalization performance. In this work, we address the problem of variance reduction in gradient-based meta-learning and formalize the class of problems prone to it, a condition we refer to as \emph{task overlap}. Specifically, we propose a novel approach that reduces the variance of the gradient estimate by weighting each support point individually by the variance of its posterior over the parameters. To estimate the posterior, we utilize the Laplace approximation, which allows us to express the variance in terms of the curvature of the loss landscape of our meta-learner. Experimental results demonstrate the effectiveness of the proposed method and highlight the importance of variance reduction in meta-learning.
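One plausible reading of this weighting rule, as a hedged NumPy sketch (hypothetical details; a linear model is used so the Laplace approximation is exact): each support point's gradient is scaled by the inverse of its predictive variance under the approximate parameter posterior.

```python
import numpy as np

# Hedged sketch, not the authors' exact rule: down-weight support points whose
# prediction is uncertain under a Laplace approximation of the parameter
# posterior, then take the inner adaptation step with the weighted gradient.
def weighted_inner_step(theta, X, y, lr=0.1, prior_prec=1.0):
    resid = X @ theta - y
    grads = resid[:, None] * X                      # per-point gradients of 1/2 r^2
    H = X.T @ X + prior_prec * np.eye(X.shape[1])   # Laplace posterior precision
    Sigma = np.linalg.inv(H)
    var = np.einsum("ni,ij,nj->n", X, Sigma, X)     # predictive variance per point
    w = 1.0 / var
    w /= w.sum()                                    # normalized precision weights
    return theta - lr * (w[:, None] * grads).sum(axis=0)
```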
Significant work has been done on computing the ``average'' optimal solution value for various $\mathsf{NP}$-complete problems using the Erd\H{o}s-R\'{e}nyi model to establish \emph{critical thresholds}. Critical thresholds define narrow bounds for the optimal solution of a problem instance such that the probability that the solution value lies outside these bounds vanishes as the instance size approaches infinity. In this paper, we extend the Erd\H{o}s-R\'{e}nyi model to general hypergraphs on $n$ vertices and $M$ hyperedges. We consider the problem of determining critical thresholds for the largest cardinality matching, and we show that for $M=o(1.155^n)$ the size of the maximum cardinality matching is almost surely 1. On the other hand, if $M=\Theta(2^n)$, then the size of the maximum cardinality matching is $\Omega(n^{\frac12-\gamma})$ for arbitrary $\gamma >0$. Lastly, we address the gap where $M=\Omega(1.155^n)$ and $M=o(2^n)$ empirically through computer simulations.
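The empirical regime can be probed with a hedged simulation sketch (the paper's exact random model and experiments may differ; here each hyperedge contains each vertex independently with probability $1/2$, under which two random hyperedges are disjoint with probability $(3/4)^n$, consistent with the $1.155 \approx (4/3)^{1/2}$ threshold):

```python
import random

def greedy_matching_size(n, M, rng=random):
    """Greedy (maximal) matching size, a lower bound on the maximum matching."""
    edges = [frozenset(v for v in range(n) if rng.random() < 0.5) for _ in range(M)]
    used, size = set(), 0
    for e in edges:
        if e and used.isdisjoint(e):   # take any edge disjoint from those chosen
            used |= e
            size += 1
    return size

# M = 200 is below the ~1.155^40 = 315 threshold, so this typically prints 1.
print(greedy_matching_size(n=40, M=200))
```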