
This note is concerned with deterministic constructions of $m \times N$ matrices satisfying a restricted isometry property from $\ell_2$ to $\ell_1$ on $s$-sparse vectors. Similarly to the standard ($\ell_2$ to $\ell_2$) restricted isometry property, such constructions can be found in the regime $m \asymp s^2$, at least in theory. With effectiveness of implementation in mind, two simple constructions are presented in the less pleasing but still relevant regime $m \asymp s^4$. The first one, executing a Las Vegas strategy, is quasideterministic and applies in the real setting. The second one, exploiting Golomb rulers, is explicit and applies to the complex setting. As a stepping stone, an explicit isometric embedding from $\ell_2^n(\mathbb{C})$ to $\ell_4^{cn^2}(\mathbb{C})$ is presented. Finally, the extension of the problem from sparse vectors to low-rank matrices is raised as an open question.
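The second construction relies on Golomb rulers: sets of integer marks whose pairwise differences are all distinct. The sketch below only illustrates that definition. It is not the paper's matrix construction, and the helper names are hypothetical. The greedy builder always returns a valid ruler, but generally not a shortest one.

```python
from itertools import combinations


def is_golomb_ruler(marks):
    """Return True if the marks are distinct and all pairwise differences are distinct."""
    marks = sorted(marks)
    if len(set(marks)) != len(marks):
        return False
    seen = set()
    for a, b in combinations(marks, 2):
        d = b - a
        if d in seen:
            return False
        seen.add(d)
    return True


def greedy_golomb_ruler(n):
    """Build an n-mark Golomb ruler greedily (valid, but generally not optimal)."""
    marks = [0]
    diffs = set()
    candidate = 1
    while len(marks) < n:
        # Differences the candidate would create with every existing mark.
        new_diffs = {candidate - m for m in marks}
        if not (new_diffs & diffs):
            marks.append(candidate)
            diffs |= new_diffs
        candidate += 1
    return marks
```

For example, `greedy_golomb_ruler(4)` returns `[0, 1, 3, 7]`. Its six pairwise differences (1, 2, 3, 4, 6, 7) are all distinct.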

Related content

Outliers · Classes · Learning · Samples · Extensibility

December 17, 2023

Out-of-Distribution Detection in Long-Tailed Recognition with Calibrated Outlier Class Learning

Wenjun Miao, Guansong Pang, Tianqi Li, Xiao Bai, Jin Zheng

Existing out-of-distribution (OOD) detection methods have shown great success on balanced datasets but become ineffective in long-tailed recognition (LTR) scenarios, where 1) OOD samples are often wrongly classified into head classes and/or 2) tail-class samples are treated as OOD samples. To address these issues, current studies fit a prior distribution of auxiliary/pseudo OOD data to the long-tailed in-distribution (ID) data. However, such an accurate prior distribution is difficult to obtain, because real OOD samples are unknown and LTR involves heavy class imbalance. A straightforward way to avoid requiring this prior is to learn an outlier class that encapsulates the OOD samples. The main challenge is then to resolve the aforementioned confusion between OOD samples and head/tail-class samples when learning the outlier class. To this end, we introduce a novel calibrated outlier class learning (COCL) approach, in which 1) a debiased large-margin learning method is introduced into outlier class learning to distinguish OOD samples from both head and tail classes in the representation space, and 2) an outlier-class-aware logit calibration method is defined to enhance long-tailed classification confidence. Extensive empirical results on three popular benchmarks (CIFAR10-LT, CIFAR100-LT, and ImageNet-LT) demonstrate that COCL substantially outperforms state-of-the-art OOD detection methods in LTR while also improving classification accuracy on ID data. Code is available at https://github.com/mala-lab/COCL.
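The sketch below is a generic illustration of the outlier-class idea, not COCL itself; the debiased large-margin loss and the outlier-class-aware logit calibration are not reproduced. The classifier emits $K$ in-distribution logits plus one extra outlier logit. A sample is flagged as OOD when the softmax mass on the outlier class is high. The threshold value and function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_with_outlier_class(logits, threshold=0.5):
    """logits holds K in-distribution entries followed by one outlier-class entry.

    Returns ("OOD", score) when the outlier-class probability reaches the
    (hypothetical) threshold, otherwise (ID class index, score).
    """
    probs = softmax(logits)
    ood_score = probs[-1]
    if ood_score >= threshold:
        return "OOD", ood_score
    id_probs = probs[:-1]
    label = max(range(len(id_probs)), key=id_probs.__getitem__)
    return label, ood_score
```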

Reparameterization · Inference · Latent · Approximation · Vector space

December 16, 2023

Amortized Reparametrization: Efficient and Scalable Variational Inference for Latent SDEs

Kevin Course, Prasanth B. Nair

From arXiv; in Advances in Neural Information Processing Systems, 2023.

We consider the problem of inferring latent stochastic differential equations (SDEs) with a time and memory cost that scales independently of the amount of data, the total length of the time series, and the stiffness of the approximate differential equations. This is in stark contrast to typical methods for inferring latent differential equations. Despite their constant memory cost, those methods have a time complexity that depends heavily on the stiffness of the approximate differential equation. We achieve this computational advancement by removing the need to solve differential equations when approximating gradients, using a novel amortization strategy coupled with a recently derived reparametrization of expectations under linear SDEs. We show that, in practice, this allows us to achieve performance similar to methods based on adjoint sensitivities with more than an order of magnitude fewer evaluations of the model in training.
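The linear-SDE reparametrization mentioned above rests on a standard fact: linear SDEs have Gaussian marginals in closed form. An expectation at time $t$ can therefore be rewritten as an expectation over standard normal noise, without stepping a solver. The sketch below uses a scalar Ornstein–Uhlenbeck process $dX_t = -\theta X_t\,dt + \sigma\,dW_t$ and compares against an Euler–Maruyama baseline. The example process is our choice, not the paper's model. The amortization strategy is not reproduced, and the function names are hypothetical.

```python
import math
import random


def ou_marginal(x0, theta, sigma, t):
    """Closed-form mean and variance of X_t for dX = -theta*X dt + sigma dW, X_0 = x0."""
    mean = x0 * math.exp(-theta * t)
    var = sigma**2 / (2 * theta) * (1 - math.exp(-2 * theta * t))
    return mean, var


def reparam_expectation(f, x0, theta, sigma, t, n_samples=10_000, rng=None):
    """Estimate E[f(X_t)] as E[f(mean + std * eps)], eps ~ N(0, 1): no SDE solve needed."""
    rng = rng or random.Random(0)
    mean, var = ou_marginal(x0, theta, sigma, t)
    std = math.sqrt(var)
    return sum(f(mean + std * rng.gauss(0.0, 1.0)) for _ in range(n_samples)) / n_samples


def euler_maruyama_expectation(f, x0, theta, sigma, t, n_steps=200, n_paths=2_000, rng=None):
    """Baseline: simulate each path step by step, costing n_steps updates per sample."""
    rng = rng or random.Random(1)
    dt = t / n_steps
    sqrt_dt = math.sqrt(dt)
    total = 0.0
    for _ in range(n_paths):
        x = x0
        for _ in range(n_steps):
            x += -theta * x * dt + sigma * sqrt_dt * rng.gauss(0.0, 1.0)
        total += f(x)
    return total / n_paths
```

Both estimators target the same quantity. The reparametrized one needs one Gaussian draw per sample, whereas the simulation cost of the baseline grows with `n_steps`.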

Marginalization · Processing (programming language) · MoDELS · Variance · Markov chain Monte Carlo

December 16, 2023

The Dynamic Triple Gamma Prior as a Shrinkage Process Prior for Time-Varying Parameter Models

Peter Knaus, Sylvia Frühwirth-Schnatter

Many current approaches to shrinkage within the time-varying parameter framework assume that each state is equipped with only one innovation variance for all time points. Sparsity is then induced by shrinking this variance towards zero. We argue that this is not sufficient if the states display large jumps or structural changes, which is often the case in time series analysis. To remedy this, we propose the dynamic triple gamma prior, a stochastic process that has a well-known triple gamma marginal form while still allowing for autocorrelation. Crucially, the triple gamma has many interesting limiting and special cases (including the horseshoe shrinkage prior), which can also be chosen as the marginal distribution. Not only is the marginal form well understood, but we also derive many interesting properties of the dynamic triple gamma that showcase its dynamic shrinkage characteristics. We develop an efficient Markov chain Monte Carlo algorithm to sample from the posterior. We demonstrate its performance through sparse covariance modeling and forecasting of the returns of the components of the EURO STOXX 50 index.

TD · Markov · Approximation · Functional · Markov chain

December 16, 2023

A Concentration Bound for TD(0) with Function Approximation

Siddharth Chandak, Vivek S. Borkar

From arXiv; submitted to Stochastic Systems.

We derive a concentration bound of the type "for all $n \geq n_0$ for some $n_0$" for TD(0) with linear function approximation. We work with online TD learning with samples from a single sample path of the underlying Markov chain. This makes our analysis significantly different from offline TD learning or TD learning with access to independent samples from the stationary distribution of the Markov chain. We treat TD(0) as a contractive stochastic approximation algorithm with both martingale and Markov noise. Markov noise is handled using the Poisson equation. The lack of almost-sure guarantees on the boundedness of iterates is handled using the concept of relaxed concentration inequalities.
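The setting analyzed above can be illustrated with the textbook TD(0) update with linear function approximation. The update is $\theta \leftarrow \theta + \alpha_n\,(r(s) + \gamma\,\phi(s')^\top\theta - \phi(s)^\top\theta)\,\phi(s)$, driven by one sample path of a Markov chain without resets. This is a minimal sketch, not code from the paper, which proves a bound rather than proposing an implementation. The step-size schedule, the reward convention (reward attached to the current state), the two-state example chain and the function name are all assumptions.

```python
import random


def td0_linear(P, rewards, features, gamma, n_steps, step_size, start=0, seed=0):
    """Online TD(0) with a linear value estimate V(s) = features[s] . theta.

    The chain with transition matrix P is followed along a single trajectory;
    step_size(n) gives the (assumed) step size at iteration n.
    """
    rng = random.Random(seed)
    theta = [0.0] * len(features[0])
    s = start
    for n in range(n_steps):
        # Sample the next state from row P[s] of the transition matrix.
        s_next = rng.choices(range(len(P)), weights=P[s])[0]
        phi, phi_next = features[s], features[s_next]
        v = sum(f * w for f, w in zip(phi, theta))
        v_next = sum(f * w for f, w in zip(phi_next, theta))
        delta = rewards[s] + gamma * v_next - v  # temporal-difference error
        a = step_size(n)
        theta = [w + a * delta * f for w, f in zip(theta, phi)]
        s = s_next  # continue along the same sample path
    return theta
```

Consider a two-state chain that moves to either state with probability 1/2, with rewards `[1.0, 0.0]`, $\gamma = 0.5$ and one-hot features. The true values are $V = (1.5, 0.5)$. With the step size `lambda n: 100.0 / (100.0 + n)` and 20,000 steps, the iterates should land close to that.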

Approximation · Clusters · FAST · state-of-the-art · Decomposed

December 15, 2023

Fast Approximations and Coresets for (k, l)-Median under Dynamic Time Warping

Jacobus Conradi, Benedikt Kolbe, Ioannis Psarros, Dennis Rohde

We present algorithms for the computation of $\varepsilon$-coresets for $k$-median clustering of point sequences in $\mathbb{R}^d$ under the $p$-dynamic time warping (DTW) distance. Coresets under DTW have not been investigated before, and the analysis is not directly accessible to existing methods because DTW is not a metric. Our construction of coresets rests on three main ingredients:
- an adaptation of the $\varepsilon$-coreset framework of sensitivity sampling;
- bounds on the VC dimension of approximations to the range spaces of balls under DTW;
- new approximation algorithms for the $k$-median problem under DTW.

We achieve our results by investigating approximations of DTW that provide a trade-off between accuracy and amenability to known techniques. In particular, we observe that given $n$ curves under DTW, one can directly construct a metric that approximates DTW on this set. This permits the use of the wealth of results on metric spaces for clustering purposes. The resulting approximations are the first with polynomial running time and achieve an approximation factor very similar to that of state-of-the-art techniques. We apply our results to produce a practical algorithm approximating $(k,\ell)$-median clustering under DTW.
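The remark that DTW is not a metric can be checked directly with the standard dynamic-programming recurrence. The sketch assumes the common definition of $p$-DTW: the minimum, over monotone warping paths, of $(\sum \lVert x_i - y_j \rVert^p)^{1/p}$ with Euclidean point distances. The paper's exact normalization may differ. The function name is hypothetical.

```python
import math


def dtw(x, y, p=1.0):
    """p-DTW between point sequences x and y (lists of equal-dimension tuples).

    Assumed definition: minimum over monotone warping paths of
    (sum of ||x_i - y_j||^p) ** (1/p).
    """
    n, m = len(x), len(y)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(x[i - 1], y[j - 1]) ** p
            # Extend the cheapest of the three admissible predecessor alignments.
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m] ** (1.0 / p)


# Triangle inequality fails: dtw(a, c) = 3 > dtw(a, b) + dtw(b, c) = 2.
a, b, c = [(0.0,)], [(0.0,), (1.0,)], [(1.0,)] * 3
```

Here the single point in `a` must be matched to all three points of `c`, while `b` absorbs the repetition at zero cost. The triangle inequality therefore breaks.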

Context window · Large language models · Language modeling · Reducible · INFORMS

December 15, 2023

Extending Context Window of Large Language Models via Semantic Compression

Weizhi Fei, Xueyan Niu, Pingyi Zhou, Lu Hou, Bo Bai, Lei Deng, Wei Han

Transformer-based Large Language Models (LLMs) often impose limitations on the length of the text input to ensure the generation of fluent and relevant responses. This constraint restricts their applicability in scenarios involving long texts. We propose a novel semantic compression method that enables generalization to texts that are 6-8 times longer, without incurring significant computational costs or requiring fine-tuning. Our proposed framework draws inspiration from source coding in information theory and employs a pre-trained model to reduce the semantic redundancy of long inputs before passing them to the LLMs for downstream tasks. Experimental results demonstrate that our method effectively extends the context window of LLMs across a range of tasks including question answering, summarization, few-shot learning, and information retrieval. Furthermore, the proposed semantic compression method exhibits consistent fluency in text generation while reducing the associated computational overhead.

Optimizer · Discretization · Samples · Scenarios · Noise

December 15, 2023

Distributionally-Robust Optimization with Noisy Data for Discrete Uncertainties Using Total Variation Distance

Farhad Farokhi

From arXiv; fixed a typo in the statement of Corollary 4.1.

Stochastic programs in which the uncertainty distribution must be inferred from noisy data samples are considered. The stochastic programs are approximated with distributionally-robust optimizations that minimize the worst-case expected cost over ambiguity sets, i.e., sets of distributions that are sufficiently compatible with the observed data. In this paper, the ambiguity sets capture the set of probability distributions whose convolution with the noise distribution remains within a ball centered at the empirical noisy distribution of the data samples. The ball is parameterized by the total variation distance. With the prescribed ambiguity set, the solutions of the distributionally-robust optimizations converge to the solutions of the original stochastic programs as the number of data samples grows to infinity. Therefore, the proposed distributionally-robust optimization problems are asymptotically consistent. This is proved under the assumption that the distribution of the noise is uniformly diagonally dominant. More importantly, the distributionally-robust optimization problems can be cast as tractable convex optimization problems and are therefore amenable to large-scale stochastic problems.
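To make the total-variation ambiguity set concrete, the sketch below solves only the simplified noise-free case on a finite support. There, the worst case over $\{q : \tfrac12\sum_i |q_i - \hat p_i| \le \varepsilon\}$ moves up to $\varepsilon$ probability mass from the cheapest outcomes onto the most expensive one. The sketch omits the paper's convolution with the noise distribution and does not reproduce its convex reformulation. The function name and example numbers are hypothetical.

```python
def worst_case_expectation_tv(p_hat, costs, eps):
    """Worst-case E_q[cost] over {q : 0.5 * sum|q - p_hat| <= eps} on a finite support.

    Noise-free simplification: moving delta mass from one outcome to another
    changes the TV distance by delta, so the adversary shifts up to eps mass
    from the cheapest outcomes onto the costliest outcome.
    """
    q = list(p_hat)
    worst = max(range(len(costs)), key=costs.__getitem__)
    budget = min(eps, 1.0 - q[worst])  # mass that can still be moved onto the worst outcome
    for i in sorted(range(len(costs)), key=costs.__getitem__):  # cheapest first
        if budget <= 0:
            break
        if i == worst:
            continue
        moved = min(q[i], budget)
        q[i] -= moved
        q[worst] += moved
        budget -= moved
    return sum(qi * ci for qi, ci in zip(q, costs)), q
```

With `p_hat = [0.5, 0.3, 0.2]`, `costs = [1, 2, 10]` and `eps = 0.1`, the nominal expectation is 3.1. The worst case shifts 0.1 of mass from the first outcome to the third, giving 4.0.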

Distillation · Dataset · Origin · INFORMS · state-of-the-art

December 14, 2023

Dataset Distillation via Adversarial Prediction Matching

Mingyang Chen, Bo Huang, Junda Lu, Bing Li, Yi Wang, Minhao Cheng, Wei Wang

Dataset distillation is the technique of synthesizing smaller condensed datasets from large original datasets while retaining the information necessary to preserve their effect. In this paper, we approach the dataset distillation problem from a novel perspective. We consider two models, one trained on the large original dataset and one trained on the small distilled dataset. We regard minimizing the discrepancy between their predictions on the real data distribution as a conduit for condensing information from the raw data into the distilled version. An adversarial framework is proposed to solve the problem efficiently. In contrast to existing distillation methods involving nested optimization or long-range gradient unrolling, our approach hinges on single-level optimization. This ensures the memory efficiency of our method and provides a flexible tradeoff between time and memory budgets, allowing us to distill ImageNet-1K using as little as 6.5GB of GPU memory. Under the optimal tradeoff strategy, it requires 2.5$\times$ less memory and 5$\times$ less runtime than the state of the art. Empirically, our method can produce synthetic datasets only 10% of the size of the original yet achieve, on average, 94% of the test accuracy of models trained on the full original datasets, including ImageNet-1K, significantly surpassing the state of the art. Additionally, extensive tests reveal that our distilled datasets excel in cross-architecture generalization.

MoDELS · Distillation · Performer · Language modeling · Less

May 3, 2023

Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes

Cheng-Yu Hsieh, Chun-Liang Li, Chih-Kuan Yeh, Hootan Nakhost, Yasuhisa Fujii, Alexander Ratner, Ranjay Krishna, Chen-Yu Lee, Tomas Pfister

From arXiv; accepted to Findings of ACL 2023.

Deploying large language models (LLMs) is challenging because they are memory-inefficient and compute-intensive for practical applications. In response, researchers train smaller task-specific models by either finetuning with human labels or distilling using LLM-generated labels. However, finetuning and distillation require large amounts of training data to achieve performance comparable to LLMs. We introduce Distilling step-by-step, a new mechanism that (a) trains smaller models that outperform LLMs, and (b) does so using less training data than finetuning or distillation requires. Our method extracts LLM rationales as additional supervision for small models within a multi-task training framework. We present three findings across 4 NLP benchmarks. First, compared to both finetuning and distillation, our mechanism achieves better performance with far fewer labeled/unlabeled training examples. Second, compared to LLMs, we achieve better performance using substantially smaller model sizes. Third, we reduce both the model size and the amount of data required to outperform LLMs; our 770M T5 model outperforms the 540B PaLM model using only 80% of the available data on a benchmark task.

GPU · Graph · Neural Networks · Networking · entity

November 6, 2019

Multi-Paragraph Reasoning with Knowledge-enhanced Graph Neural Network

Deming Ye, Yankai Lin, Zhenghao Liu, Zhiyuan Liu, Maosong Sun

Multi-paragraph reasoning is indispensable for open-domain question answering (OpenQA), yet it receives little attention in current OpenQA systems. In this work, we propose a knowledge-enhanced graph neural network (KGNN), which performs reasoning over multiple paragraphs with entities. To explicitly capture the entities' relatedness, KGNN utilizes relational facts from a knowledge graph to build the entity graph. Experimental results show that KGNN outperforms baseline methods in both the distractor and full wiki settings of the HotpotQA dataset. Further analysis shows that KGNN remains effective and robust as more paragraphs are retrieved.
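The entity-graph step described above can be pictured with a generic sketch. It keeps only knowledge-graph triples whose two entities both appear in the retrieved paragraphs, then runs one round of mean-aggregation message passing. This is not KGNN's actual architecture, which the abstract does not detail. The function names, the undirected treatment of relations, the aggregation rule and the example triples are all assumptions.

```python
from collections import defaultdict


def build_entity_graph(triples, mentioned):
    """Keep relational facts (head, relation, tail) whose endpoints both appear
    in the retrieved paragraphs; return an undirected adjacency mapping."""
    adj = defaultdict(set)
    for head, _relation, tail in triples:
        if head in mentioned and tail in mentioned:
            adj[head].add(tail)
            adj[tail].add(head)
    return adj


def propagate(features, adj, rounds=1):
    """Mean aggregation: each entity averages its own vector with its neighbours' vectors."""
    for _ in range(rounds):
        updated = {}
        for node, vec in features.items():
            group = [vec] + [features[nb] for nb in adj.get(node, ()) if nb in features]
            updated[node] = [sum(vals) / len(group) for vals in zip(*group)]
        features = updated
    return features


triples = [("Paris", "capital_of", "France"), ("France", "member_of", "EU"),
           ("Berlin", "capital_of", "Germany")]
graph = build_entity_graph(triples, {"Paris", "France", "EU"})
out = propagate({"Paris": [1.0, 0.0], "France": [0.0, 1.0], "EU": [0.0, 0.0]}, graph)
```

After one round, `out["Paris"]` is `[0.5, 0.5]` (Paris averaged with its neighbour France). The Berlin fact is dropped because neither of its entities is mentioned.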