A sum-rank-metric code attaining the Singleton bound is called maximum sum-rank distance (MSRD). MSRD codes have applications in space-time coding and in the construction of partial-MDS codes for repair in distributed storage, and they have been constructed in some parameter cases. In this paper we construct an ${\bf F}_q$-linear MSRD code over some field ${\bf F}_q$ with different matrix sizes $n_1>n_2>\cdots>n_t$ satisfying $n_i \geq n_{i+1}^2+\cdots+n_t^2$ for $i=1, 2, \ldots, t-1$, for any given minimum sum-rank distance. Many good linear sum-rank-metric codes over small fields with such different matrix sizes are given. A lower bound on the dimensions of the constructed ${\bf F}_{q^2}$-linear sum-rank-metric codes with such different matrix sizes and given minimum sum-rank distances is also presented.
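For reference, a brief reminder of the metric in question (standard definitions, not specific to this paper's construction): writing a codeword as a tuple of matrix blocks $x=(x_1, \ldots, x_t)$ whose sizes are governed by $n_1, \ldots, n_t$, the sum-rank weight and the sum-rank distance are
\[
\mathrm{wt}_{\mathrm{SR}}(x) \;=\; \sum_{i=1}^{t} \mathrm{rank}(x_i), \qquad d_{\mathrm{SR}}(x,y) \;=\; \mathrm{wt}_{\mathrm{SR}}(x-y),
\]
and a code is MSRD precisely when its minimum sum-rank distance attains the corresponding Singleton-type bound with equality.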
This paper addresses the growing need to process non-Euclidean data by introducing a geometric deep learning (GDL) framework for building universal feedforward-type models compatible with differentiable manifold geometries. We show that our GDL models can approximate any continuous target function uniformly on compact sets of a controlled maximum diameter. We obtain curvature-dependent lower bounds on this maximum diameter and upper bounds on the depth of our approximating GDL models. Conversely, we find that there is always a continuous function between any two non-degenerate compact manifolds that no "locally-defined" GDL model can uniformly approximate. Our last main result identifies data-dependent conditions guaranteeing that the GDL model implementing our approximation breaks "the curse of dimensionality." We find that any "real-world" (i.e. finite) dataset always satisfies our condition and, conversely, any dataset satisfies our requirement if the target function is smooth. As applications, we confirm the universal approximation capabilities of the following GDL models: Ganea et al. (2018)'s hyperbolic feedforward networks, the architecture implementing Krishnan et al. (2015)'s deep Kalman filter, and deep softmax classifiers. We build universal extensions/variants of the SPD-matrix regressor of Meyer et al. (2011) and Fletcher (2003)'s Procrustean regressor. In the Euclidean setting, our results imply a quantitative version of Kidger and Lyons (2020)'s approximation theorem and a data-dependent version of Yarotsky and Zhevnerchuk (2019)'s uncursed approximation rates.
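As a concrete point of reference for the first of these applications, the following is a minimal numpy sketch of a Poincaré-ball feedforward layer in the spirit of Ganea et al. (2018): Euclidean operations are conjugated by the exponential and logarithmic maps at the origin and combined with Möbius addition. The layer shapes, curvature value, and toy inputs are illustrative assumptions, not taken from this paper.

    import numpy as np

    def mobius_add(x, y, c=1.0):
        # Mobius addition on the Poincare ball of curvature -c
        xy, x2, y2 = np.dot(x, y), np.dot(x, x), np.dot(y, y)
        num = (1 + 2 * c * xy + c * y2) * x + (1 - c * x2) * y
        return num / (1 + 2 * c * xy + c ** 2 * x2 * y2)

    def exp0(v, c=1.0):
        # exponential map at the origin of the ball
        n = np.linalg.norm(v) + 1e-15
        return np.tanh(np.sqrt(c) * n) * v / (np.sqrt(c) * n)

    def log0(y, c=1.0):
        # logarithmic map at the origin of the ball
        n = np.linalg.norm(y) + 1e-15
        return np.arctanh(np.sqrt(c) * n) * y / (np.sqrt(c) * n)

    def hyperbolic_layer(x, W, b, c=1.0):
        # Mobius matrix-vector product, Mobius bias addition, then a
        # pointwise nonlinearity applied through the tangent space at 0
        h = mobius_add(exp0(W @ log0(x, c), c), b, c)
        return exp0(np.tanh(log0(h, c)), c)

    # toy usage: a 2-D input mapped to a 3-D point of the Poincare ball
    x = np.array([0.1, -0.2])
    W = np.random.default_rng(0).normal(scale=0.1, size=(3, 2))
    b = np.array([0.05, 0.0, -0.05])
    print(hyperbolic_layer(x, W, b))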
We consider stochastic gradient descent and its averaging variant for binary classification problems in a reproducing kernel Hilbert space. In the traditional analysis using a consistency property of loss functions, it is known that the expected classification error converges more slowly than the expected risk, even under a low-noise condition on the conditional label probabilities; consequently, the resulting rate is sublinear. It is therefore important to ask whether much faster convergence of the expected classification error can be achieved. Recent work showed an exponential convergence rate for stochastic gradient descent under a strong low-noise condition, but the theoretical analysis was limited to the squared loss function, which is somewhat inadequate for binary classification tasks. In this paper, we show exponential convergence of the expected classification error in the final phase of stochastic gradient descent for a wide class of differentiable convex loss functions under similar assumptions. For averaged stochastic gradient descent, we show that the same convergence rate holds from the early phase of training. In experiments, we verify our analyses on $L_2$-regularized logistic regression.
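As an illustration of the experimental setting, here is a minimal numpy sketch of plain versus averaged SGD on $L_2$-regularized logistic regression, with a linear model standing in for the kernel setting; the step-size schedule, regularization strength, and toy data are illustrative assumptions rather than the constants analyzed in the paper.

    import numpy as np

    def sgd_logistic(X, y, lam=1e-3, eta0=1.0, epochs=5, average=True, seed=0):
        # Plain / averaged SGD for L2-regularized logistic regression.
        # Labels are assumed to be in {-1, +1}.
        rng = np.random.default_rng(seed)
        n, d = X.shape
        w, w_bar, t = np.zeros(d), np.zeros(d), 0
        for _ in range(epochs):
            for i in rng.permutation(n):
                t += 1
                eta = eta0 / (1.0 + lam * eta0 * t)     # illustrative schedule
                margin = y[i] * X[i].dot(w)
                grad = -y[i] * X[i] / (1.0 + np.exp(margin)) + lam * w
                w = w - eta * grad
                w_bar += (w - w_bar) / t                # running (Polyak) average
        return w_bar if average else w

    # toy data: two Gaussian blobs
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(-1, 1, (200, 2)), rng.normal(1, 1, (200, 2))])
    y = np.hstack([-np.ones(200), np.ones(200)])
    w = sgd_logistic(X, y)
    print("training accuracy:", np.mean(np.sign(X @ w) == y))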
This paper uses techniques from Random Matrix Theory to find the ideal training-testing data split for a simple linear regression with $m$ data points, each an independent $n$-dimensional multivariate Gaussian. It defines "ideal" as satisfying the integrity metric, i.e. the empirical model error equals the actual measurement noise and thus fairly reflects the value, or lack thereof, of the model. This paper is the first to solve for the training and test set sizes for any model in a way that is truly optimal. The number of data points in the training set is the root of a quartic polynomial derived in Theorem 1 that depends only on $m$ and $n$; the covariance matrix of the multivariate Gaussian, the true model parameters, and the true measurement noise drop out of the calculations. The critical mathematical difficulties were recognizing that the problems herein fall within the framework of the Jacobi Ensemble, a probability distribution describing the eigenvalues of a known random matrix model, and evaluating a new integral in the style of Selberg and Aomoto. Mathematical results are supported with thorough computational evidence. This paper is a step towards automatic choices of training/test set sizes in machine learning.
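The quartic of Theorem 1 is not reproduced in this abstract, so the sketch below only illustrates the final numerical step: given some coefficient function of $m$ and $n$, compute the roots and keep the real ones that could serve as a training-set size. The `demo_coeffs` placeholder is purely hypothetical and does not reflect the paper's actual coefficients.

    import numpy as np

    def candidate_training_sizes(m, n, quartic_coeffs):
        # Solve the quartic numerically; `quartic_coeffs(m, n)` must return
        # its five coefficients (highest degree first).  The coefficients
        # derived in Theorem 1 are NOT reproduced here.
        roots = np.roots(quartic_coeffs(m, n))
        real = roots[np.abs(roots.imag) < 1e-9].real
        # keep only roots that could plausibly be a training-set size
        return sorted(r for r in real if n < r < m)

    # purely hypothetical placeholder coefficients, for illustration only
    demo_coeffs = lambda m, n: (1.0, -float(m), float(m * n), -float(n ** 2), 1.0)
    print(candidate_training_sizes(1000, 10, demo_coeffs))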
It is a well-known fact that there is no complete and discrete invariant on the collection of all multiparameter persistence modules. Nonetheless, many invariants have been proposed in the literature to study multiparameter persistence modules, though each loses some amount of information. One such invariant is the generalized rank invariant, which is known to be complete on the class of interval-decomposable persistence modules under mild assumptions on the indexing poset $P$. There is often a trade-off: the stronger an invariant is, the more expensive it is to compute in practice. The generalized rank invariant on its own is difficult to compute, whereas the standard rank invariant is readily computable through software implementations such as RIVET. We can interpolate between these two by restricting the domain of the generalized rank invariant, inducing a family of new invariants that exhibits the aforementioned trade-off. This work studies the tension between computational efficiency and the strength of the invariant that arises when restricting the domain of the generalized rank invariant. We provide a characterization of when such restrictions are complete invariants in the setting where $P$ is finite, and furthermore show that these restricted generalized rank invariants are stable.
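For context, here are the invariants being compared, stated with the conventions most common in the literature (the paper's notation may differ slightly). For a persistence module $M : P \to \mathrm{vec}$ and a comparable pair $p \le q$ in $P$, the standard rank invariant records
\[
\mathrm{rk}_M(p \le q) \;=\; \operatorname{rank}\big(M(p \le q) : M_p \to M_q\big),
\]
while the generalized rank invariant evaluates, on a suitable connected subposet $I \subseteq P$ (e.g. an interval), the rank of the canonical map from the limit to the colimit of the restriction $M|_I$:
\[
\mathrm{rk}_M(I) \;=\; \operatorname{rank}\big(\varprojlim M|_I \longrightarrow \varinjlim M|_I\big).
\]
Choosing which subsets $I$ the invariant is evaluated on is exactly the restriction of the domain referred to above.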
Causal effect estimation from observational data is a challenging problem, especially with high-dimensional data and in the presence of unobserved variables. The available data-driven methods for tackling the problem either provide only bounds on a causal effect (i.e. non-unique estimation) or have low efficiency. The major hurdle to achieving high efficiency while obtaining a unique and unbiased causal effect estimate is finding a proper adjustment set for confounding control quickly, given the huge covariate space and the presence of unobserved variables. In this paper, we approach the problem as a local search task for finding valid adjustment sets in data. We establish theorems to support the local search for adjustment sets, and we show that unique and unbiased estimation can be achieved from observational data even when there exist unobserved variables. We then propose a data-driven algorithm that is fast and consistent under mild assumptions. We also make use of a frequent pattern mining method to further speed up the search for minimal adjustment sets for causal effect estimation. Experiments conducted on extensive synthetic and real-world datasets demonstrate that the proposed algorithm outperforms state-of-the-art criteria/estimators in both accuracy and time efficiency.
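The local search algorithm itself is not sketched here, but the following minimal numpy example illustrates what a valid adjustment set buys once it has been found: adjusting for a confounder $Z$ recovers the true effect where the naive difference in means is biased. The linear outcome model and the toy data-generating process are illustrative assumptions.

    import numpy as np

    def adjusted_effect(X, Y, Z):
        # Covariate adjustment with a linear outcome model: regress Y on the
        # treatment X and the adjustment set Z, and read off the coefficient
        # of the treatment column.
        D = np.column_stack([np.ones(len(X)), X, Z])
        coef, *_ = np.linalg.lstsq(D, Y, rcond=None)
        return coef[1]

    # toy example with a single confounder Z affecting both X and Y
    rng = np.random.default_rng(0)
    n = 5000
    Z = rng.normal(size=(n, 1))
    X = (Z[:, 0] + rng.normal(size=n) > 0).astype(float)
    Y = 2.0 * X + 3.0 * Z[:, 0] + rng.normal(size=n)
    print("naive difference in means:", Y[X == 1].mean() - Y[X == 0].mean())
    print("adjusted estimate        :", adjusted_effect(X, Y, Z))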
The stability of solutions to optimal transport problems under variation of the measures is fundamental from a mathematical viewpoint: it is closely related to the convergence of numerical approaches for solving optimal transport problems and justifies many of the applications of optimal transport. In this article, we introduce the notion of strong c-concavity and show that it plays an important role in proving stability results in optimal transport for general cost functions c. We then introduce a differential criterion for proving that a function is strongly c-concave, under a hypothesis on the cost originally introduced by Ma, Trudinger and Wang for establishing the regularity of optimal transport maps. Finally, we provide two examples where this stability result can be applied, for cost functions taking the value $+\infty$ on the sphere: the reflector problem and the Gaussian curvature measure prescription problem.
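For background, recall the classical notions (stated with one common sign convention): the c-transform of a function $\psi$ is
\[
\psi^{c}(x) \;=\; \inf_{y}\,\big( c(x,y) - \psi(y) \big),
\]
and a function $\varphi$ is c-concave if $\varphi = \psi^{c}$ for some $\psi$, or equivalently if $\varphi^{cc} = \varphi$. Strong c-concavity, the notion introduced in the article, strengthens this property; its precise definition is not reproduced here.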
Weight sharing is a popular approach to reduce the cost of neural architecture search (NAS) by reusing the weights of shared operators from previously trained child models. However, the rank correlation between the estimated accuracy and the ground-truth accuracy of those child models is low due to the interference among different child models caused by weight sharing. In this paper, we investigate the interference issue by sampling different child models and calculating the gradient similarity of shared operators, and observe that: 1) the interference on a shared operator between two child models is positively correlated with the number of operators in which they differ; 2) the interference is smaller when the inputs and outputs of the shared operator are more similar. Inspired by these two observations, we propose two approaches to mitigate the interference: 1) MAGIC-T: rather than randomly sampling child models for optimization, we propose a gradual modification scheme that modifies one operator between adjacent optimization steps to minimize the interference on the shared operators; 2) MAGIC-A: we force the inputs and outputs of the shared operator to be similar across all child models to reduce the interference. Experiments on a BERT search space verify that mitigating interference via each of our proposed methods improves the rank correlation of the supernet, and combining both methods achieves better results. Our discovered architecture outperforms RoBERTa$_{\rm base}$ by 1.1 and 0.6 points and ELECTRA$_{\rm base}$ by 1.6 and 1.1 points on the dev and test sets of the GLUE benchmark. Extensive results on BERT compression, reading comprehension and ImageNet tasks demonstrate the effectiveness and generality of our proposed methods.
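The diagnostic behind these observations can be illustrated with a deliberately tiny numpy sketch: a weight matrix W is shared by two child models that differ only in the operator applied after it, and we compare the gradients each child induces on W. The two-operator search space and the squared loss are illustrative assumptions; the paper's measurements are made on a full BERT supernet.

    import numpy as np

    def grad_shared_W(W, x, y, child="identity"):
        # Gradient of 0.5 * ||f(W x) - y||^2 with respect to the shared
        # operator W, where the child model applies either the identity or a
        # ReLU right after W.
        z = W @ x
        h = np.maximum(z, 0.0) if child == "relu" else z
        delta = h - y
        if child == "relu":
            delta = delta * (z > 0)
        return np.outer(delta, x)

    def cosine(a, b):
        a, b = a.ravel(), b.ravel()
        return a.dot(b) / (np.linalg.norm(a) * np.linalg.norm(b))

    rng = np.random.default_rng(0)
    W = rng.normal(size=(4, 8))
    x, y = rng.normal(size=8), rng.normal(size=4)
    g_relu = grad_shared_W(W, x, y, child="relu")
    g_id = grad_shared_W(W, x, y, child="identity")
    print("gradient cosine similarity on the shared operator:", cosine(g_relu, g_id))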
This paper studies \emph{linear} and \emph{affine} error-correcting codes for correcting synchronization errors such as insertions and deletions; we call such codes linear/affine insdel codes. Linear codes that can correct even a single deletion are limited to an information rate of at most $1/2$ (achieved by the trivial 2-fold repetition code). Previously, it was (erroneously) reported that, more generally, no non-trivial linear codes correcting $k$ deletions exist, i.e., that the $(k+1)$-fold repetition code and its rate of $1/(k+1)$ are essentially optimal for any $k$. We disprove this and show the existence of binary linear codes of length $n$ and rate just below $1/2$ capable of correcting $\Omega(n)$ insertions and deletions. This identifies rate $1/2$ as a sharp threshold for recovery from deletions for linear codes, and reopens the quest for a better understanding of the capabilities of linear codes for correcting insertions/deletions. We prove novel outer bounds and existential inner bounds for the rate vs. (edit) distance trade-off of linear insdel codes. We complement our existential results with an efficient synchronization-string-based transformation that converts any asymptotically good linear code for Hamming errors into an asymptotically good linear code for insdel errors. Lastly, we show that the rate-$\frac{1}{2}$ limitation does not hold for affine codes by giving an explicit affine code of rate $1-\epsilon$ which can efficiently correct a constant fraction of insdel errors.
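As a toy illustration of the rate-$1/2$ baseline mentioned above (not of the paper's constructions), the following sketch encodes with the 2-fold repetition code and verifies by brute force that every single deletion is uniquely decodable; the message length k and the exhaustive decoder are only for illustration.

    from itertools import product

    def encode(msg):
        # 2-fold repetition: each bit is written twice (rate 1/2)
        return [b for b in msg for _ in range(2)]

    def decode_one_deletion(received, k):
        # Brute force: re-insert one bit at every position and keep the
        # candidates that are valid repetition codewords of length 2k.
        decoded = set()
        for pos in range(len(received) + 1):
            for bit in (0, 1):
                cand = received[:pos] + [bit] + received[pos:]
                if all(cand[2 * i] == cand[2 * i + 1] for i in range(k)):
                    decoded.add(tuple(cand[::2]))
        return decoded

    # exhaustively check unique decoding for every message and deletion position
    k = 6
    for msg in product((0, 1), repeat=k):
        cw = encode(msg)
        for d in range(len(cw)):
            assert decode_one_deletion(cw[:d] + cw[d + 1:], k) == {msg}
    print("every single deletion is uniquely decodable for k =", k)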
Since hardware resources are limited, the objective of training deep learning models is typically to maximize accuracy subject to the time and memory constraints of training and inference. We study the impact of model size in this setting, focusing on Transformer models for NLP tasks that are limited by compute: self-supervised pretraining and high-resource machine translation. We first show that even though smaller Transformer models execute faster per iteration, wider and deeper models converge in significantly fewer steps. Moreover, this acceleration in convergence typically outpaces the additional computational overhead of using larger models. Therefore, counterintuitively, the most compute-efficient training strategy is to train extremely large models but stop after a small number of iterations. This leads to an apparent trade-off between the training efficiency of large Transformer models and the inference efficiency of small Transformer models. However, we show that large models are more robust to compression techniques such as quantization and pruning than small models. Consequently, one can get the best of both worlds: heavily compressed, large models achieve higher accuracy than lightly compressed, small models.
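For concreteness, here is a minimal numpy sketch of the two compression techniques named above, magnitude pruning and uniform quantization, applied to a single weight matrix; the matrix size, sparsity level, and bit-width are illustrative assumptions, and real pipelines operate on full Transformer checkpoints.

    import numpy as np

    def magnitude_prune(W, sparsity=0.9):
        # Zero out the smallest-magnitude weights so that a `sparsity`
        # fraction of them becomes zero.
        thresh = np.quantile(np.abs(W), sparsity)
        return np.where(np.abs(W) >= thresh, W, 0.0)

    def uniform_quantize(W, bits=8):
        # Symmetric uniform quantization (bits <= 8 assumed here);
        # dequantize with q * scale.
        scale = np.abs(W).max() / (2 ** (bits - 1) - 1)
        q = np.round(W / scale).astype(np.int8)
        return q, scale

    rng = np.random.default_rng(0)
    W = rng.normal(scale=0.02, size=(768, 768))   # one BERT-base sized weight matrix
    W_pruned = magnitude_prune(W, sparsity=0.9)
    q, scale = uniform_quantize(W, bits=8)
    print("nonzero fraction after pruning:", np.count_nonzero(W_pruned) / W.size)
    print("max dequantization error      :", np.abs(q * scale - W).max())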
Dynamic programming (DP) solves a variety of structured combinatorial problems by iteratively breaking them down into smaller subproblems. In spite of their versatility, DP algorithms are usually non-differentiable, which hampers their use as a layer in neural networks trained by backpropagation. To address this issue, we propose to smooth the max operator in the dynamic programming recursion, using a strongly convex regularizer. This allows us to relax both the optimal value and the solution of the original combinatorial problem, and turns a broad class of DP algorithms into differentiable operators. Theoretically, we provide a new probabilistic perspective on backpropagating through these DP operators, and relate them to inference in graphical models. We derive two particular instantiations of our framework: a smoothed Viterbi algorithm for sequence prediction and a smoothed DTW algorithm for time-series alignment. We showcase these instantiations on two structured prediction tasks and on structured and sparse attention for neural machine translation.
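A minimal numpy sketch of the core idea: with the negative-entropy regularizer, the smoothed max becomes a log-sum-exp (whose gradient is a softmax), and substituting the corresponding soft-min into the DTW recursion yields a differentiable alignment cost. The toy series and the value of gamma are illustrative; the paper's operators and their backward passes are more general than this forward-pass sketch.

    import numpy as np

    def smoothed_max(x, gamma=1.0):
        # Negative-entropy-smoothed max: gamma * logsumexp(x / gamma).
        # Its gradient is the softmax of x / gamma, i.e. a differentiable argmax.
        x = np.asarray(x, dtype=float)
        m = np.max(x)
        return m + gamma * np.log(np.sum(np.exp((x - m) / gamma)))

    def smoothed_min(x, gamma=1.0):
        return -smoothed_max(-np.asarray(x, dtype=float), gamma)

    def soft_dtw(D, gamma=1.0):
        # DTW with the hard min of the recursion replaced by the soft-min;
        # D[i, j] is the pairwise cost between time steps i and j.
        n, m = D.shape
        R = np.full((n + 1, m + 1), np.inf)
        R[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                R[i, j] = D[i - 1, j - 1] + smoothed_min(
                    [R[i - 1, j], R[i, j - 1], R[i - 1, j - 1]], gamma)
        return R[n, m]

    # toy alignment of two 1-D time series
    a = np.sin(np.linspace(0.0, 3.0, 20))
    b = np.sin(np.linspace(0.3, 3.3, 25))
    D = (a[:, None] - b[None, :]) ** 2
    print("soft-DTW value:", soft_dtw(D, gamma=0.1))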