
Wheeler DFAs (WDFAs) are a subclass of finite-state automata that plays an important role in the emerging field of compressed data structures: as opposed to general automata, WDFAs can be stored in just $\log\sigma + O(1)$ bits per edge, $\sigma$ being the alphabet size, and they support optimal-time pattern matching queries on the substring closure of the language they recognize. An important step towards further compression is minimization. When the input $\mathcal A$ is a general deterministic finite-state automaton (DFA), the state of the art is Hopcroft's classic algorithm, which runs in $O(|\mathcal A|\log |\mathcal A|)$ time. This algorithm stands at the core of the only existing minimization algorithm for Wheeler DFAs, which inherits its complexity. In this work, we show that the minimum WDFA equivalent to a given input WDFA can be computed in linear $O(|\mathcal A|)$ time. When run on de Bruijn WDFAs built from real DNA datasets, an implementation of our algorithm reduces the number of nodes by 14% to 51% at a speed of more than 1 million nodes per second.
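For reference, here is a minimal Python sketch of the partition-refinement scheme behind Hopcroft's algorithm mentioned above, on which the existing WDFA minimizer builds. It is a generic illustration with hypothetical function and variable names, not the linear-time algorithm introduced in this work, and the naive set operations used here do not attain the $O(|\mathcal A|\log |\mathcal A|)$ bound.

from collections import defaultdict

def hopcroft_minimize(states, alphabet, delta, accepting):
    # Coarsest-partition refinement for a complete DFA.
    # states: iterable of states; alphabet: iterable of symbols;
    # delta: dict (state, symbol) -> state; accepting: set of accepting states.
    states = set(states)
    final = set(accepting) & states
    partition = {frozenset(final), frozenset(states - final)} - {frozenset()}
    work = set(partition)

    # Inverse transitions: inv[(q, a)] = set of states p with delta[(p, a)] == q.
    inv = defaultdict(set)
    for (p, a), q in delta.items():
        inv[(q, a)].add(p)

    while work:
        splitter = work.pop()
        for a in alphabet:
            # Predecessors of the splitter block under symbol a.
            preds = set()
            for q in splitter:
                preds |= inv[(q, a)]
            refined = set()
            for block in partition:
                inside, outside = block & preds, block - preds
                if inside and outside:
                    inside, outside = frozenset(inside), frozenset(outside)
                    refined |= {inside, outside}
                    if block in work:
                        work.discard(block)
                        work |= {inside, outside}
                    else:
                        # Hopcroft's trick: enqueue only the smaller half.
                        work.add(min((inside, outside), key=len))
                else:
                    refined.add(block)
            partition = refined
    return partition  # blocks of pairwise-equivalent states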

Related content

Gaussian process (GP) regression is a fundamental tool in Bayesian statistics. It is also known as kriging and is the Bayesian counterpart of frequentist kernel ridge regression. Most of the theoretical work on GP regression has focused on large-$n$ asymptotics, characterising the behaviour of GP regression as the amount of data increases. Fixed-sample analysis is much more difficult outside of simple cases, such as locations on a regular grid. In this work we perform a fixed-sample analysis in a regime first studied in the context of approximation theory by Driscoll & Fornberg (2002), called the "flat limit". In flat-limit asymptotics, the goal is to characterise kernel methods as the length-scale of the kernel function tends to infinity, so that kernels appear flat over the range of the data. Surprisingly, this limit is well-defined and displays interesting behaviour: Driscoll & Fornberg showed that radial basis interpolation converges in the flat limit to polynomial interpolation if the kernel is Gaussian. Leveraging recent results on the spectral behaviour of kernel matrices in the flat limit, we study the flat limit of Gaussian process regression. Our results show that Gaussian process regression tends in the flat limit to (multivariate) polynomial regression or (polyharmonic) spline regression, depending on the kernel. Importantly, this holds for both the predictive mean and the predictive variance, so that the posterior predictive distributions become equivalent. Our results have practical consequences: for instance, they show that optimal GP predictions in the sense of leave-one-out loss may occur at very large length-scales, which would be invisible to current implementations because of numerical difficulties.
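As a small numerical illustration of the flat limit (a toy example, not an experiment from the paper): with a Gaussian kernel and near-zero noise, the GP posterior mean should approach the full-degree polynomial interpolant as the length-scale grows, until the conditioning problems mentioned above take over.

import numpy as np
from numpy.polynomial import Polynomial

def gp_posterior_mean(X, y, Xstar, lengthscale, jitter=1e-12):
    # Posterior mean of (essentially noiseless) GP regression with a Gaussian (RBF) kernel.
    k = lambda A, B: np.exp(-(A[:, None] - B[None, :]) ** 2 / (2 * lengthscale ** 2))
    K = k(X, X) + jitter * np.eye(len(X))
    return k(Xstar, X) @ np.linalg.solve(K, y)

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(-1, 1, 5))
y = np.sin(3 * X)
Xstar = np.linspace(-0.9, 0.9, 7)

# Flat-limit prediction for the Gaussian kernel: the degree-(n-1) polynomial interpolant.
poly = Polynomial.fit(X, y, deg=len(X) - 1)

for ell in (0.5, 1.0, 2.0, 4.0):
    gap = np.max(np.abs(gp_posterior_mean(X, y, Xstar, ell) - poly(Xstar)))
    print(f"length-scale {ell:3.1f}: max |GP mean - polynomial interpolant| = {gap:.2e}")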

We study the classical expander codes, introduced by Sipser and Spielman \cite{SS96}. Given any constants $0< \alpha, \varepsilon < 1/2$, and an arbitrary bipartite graph with $N$ vertices on the left, $M < N$ vertices on the right, and left degree $D$ such that any left subset $S$ of size at most $\alpha N$ has at least $(1-\varepsilon)|S|D$ neighbors, we show that the corresponding linear code given by parity checks on the right has distance at least roughly $\frac{\alpha N}{2 \varepsilon }$. This is strictly better than the best previously known bound of $2(1-\varepsilon ) \alpha N$ \cite{Sudan2000note, Viderman13b} whenever $\varepsilon < 1/2$, and improves on it significantly when $\varepsilon$ is small. Furthermore, we show that this distance is tight in general, thus providing a complete characterization of the distance of general expander codes. Next, we provide several efficient decoding algorithms, which vastly improve previous results in terms of the fraction of errors corrected, whenever $\varepsilon < \frac{1}{4}$. Finally, we also give a bound on the list-decoding radius of general expander codes, which beats the classical Johnson bound in certain situations (e.g., when the graph is almost regular and the code has a high rate). Our techniques exploit novel combinatorial properties of bipartite expander graphs. In particular, we establish a new size-expansion tradeoff, which may be of independent interest.
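To make the comparison concrete, here is a quick back-of-the-envelope check with made-up parameters (purely illustrative) showing how the new bound $\frac{\alpha N}{2\varepsilon}$ overtakes the previous $2(1-\varepsilon)\alpha N$ as $\varepsilon$ shrinks.

# Illustrative parameters only; any alpha and N can be substituted.
N, alpha = 1_000_000, 0.01
for eps in (0.4, 0.25, 0.1, 0.05):
    new_bound = alpha * N / (2 * eps)       # distance bound from this work
    old_bound = 2 * (1 - eps) * alpha * N   # previously best known bound
    print(f"eps = {eps:4.2f}:  new ~ {new_bound:>10,.0f}   previous ~ {old_bound:>10,.0f}")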

Asymmetric Numeral Systems (ANS) is a class of entropy encoders introduced by Duda that has had an immense impact on data compression, supplanting arithmetic and Huffman coding. The optimality of ANS was studied by Duda et al., but the precise asymptotic behaviour of its redundancy (in comparison to the entropy) was not completely understood. In this paper we establish an optimal bound on the redundancy for the tabled ANS (tANS), the most popular ANS variant. Given a sequence $a_1,\ldots,a_n$ of letters from an alphabet $\{0,\ldots,\sigma-1\}$ such that each letter $a$ occurs in it $f_a$ times and $n=2^r$, the tANS encoder using Duda's ``precise initialization'' to fill tANS tables transforms this sequence into a bit string of length (frequencies are not included in the encoding size): $$ \sum\limits_{a\in [0..\sigma)}f_a\cdot\log\frac{n}{f_a}+O(\sigma+r), $$ where $O(\sigma + r)$ can be bounded by $\sigma\log e+r$. The $r$-bit term is an encoder artifact indispensable to ANS; the rest incurs a redundancy of $O(\frac{\sigma}{n})$ bits per letter. We complement this bound with a series of examples showing that an $\Omega(\sigma+r)$ redundancy is necessary when $\sigma > n/3$, where $\Omega(\sigma + r)$ is at least $\frac{\sigma-1}{4}+r-2$. We argue that similar examples exist for any method that distributes letters in tANS tables using only the knowledge of frequencies. Thus, we refute Duda's conjecture that the redundancy is $O(\frac{\sigma}{n^2})$ bits per letter. We also propose a new variant of range ANS (rANS), called rANS with fixed accuracy, parameterized by $k \ge 1$. In this variant the integer division, which is unavoidable in rANS, is performed only when its result belongs to $[2^k..2^{k+1})$. Hence, the division can be computed by faster methods provided $k$ is small. We bound the redundancy of rANS with fixed accuracy $k$ by $\frac{n}{2^k-1}\log e+r$.
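For intuition about the magnitudes involved, the bound above can be evaluated directly. The following toy computation (with illustrative frequencies, not data from the paper) uses $\sigma\log e + r$ as the stated cap on the $O(\sigma+r)$ term.

import math

freqs = [512, 256, 128, 64, 64]   # hypothetical f_a for each letter a; n = sum = 1024 = 2^10
n = sum(freqs)
r = int(math.log2(n))             # n = 2^r
sigma = len(freqs)

empirical_entropy_bits = sum(f * math.log2(n / f) for f in freqs)
redundancy_cap_bits = sigma * math.log2(math.e) + r

print(f"sum_a f_a log(n/f_a)    = {empirical_entropy_bits:.2f} bits")
print(f"O(sigma + r) term       <= {redundancy_cap_bits:.2f} bits")
print(f"total encoding length   <= {empirical_entropy_bits + redundancy_cap_bits:.2f} bits")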

There are many applications of max flow with capacities that depend on one or more parameters. Many of these applications fall into the "Source-Sink Monotone" framework, a special case of Topkis's monotonic optimization framework, which implies that the parametric min cuts are nested. When there is a single parameter, this property implies that the number of distinct min cuts is linear in the number of nodes, which is quite useful for constructing algorithms to identify all possible min cuts. When there are multiple Source-Sink Monotone parameters and the parameter vectors are ordered in the usual vector sense, the resulting min cuts are still nested. However, the number of distinct min cuts was an open question. We show that even with only two parameters, the number of distinct min cuts can be exponential in the number of nodes.

We study the selection of adjustment sets for estimating the interventional mean under an individualized treatment rule. We assume a non-parametric causal graphical model, possibly with hidden variables, and at least one adjustment set composed of observable variables. Moreover, we assume that observable variables have positive costs associated with them. We define the cost of an observable adjustment set as the sum of the costs of the variables that comprise it. We show that in this setting there exist adjustment sets that are minimum cost optimal, in the sense that they yield non-parametric estimators of the interventional mean with the smallest asymptotic variance among those that control for observable adjustment sets of minimum cost. Our results are based on the construction of a special flow network associated with the original causal graph. We show that a minimum cost optimal adjustment set can be found by computing a maximum flow on the network, and then finding the set of vertices that are reachable from the source via augmenting paths. The optimaladj Python package implements the algorithms introduced in this paper.
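The flow step itself is standard. Below is a minimal sketch using networkx (a toy network with made-up capacities, not the construction from the causal graph and not the optimaladj implementation) showing how one obtains the source-reachable side of a minimum cut after computing a maximum flow.

import networkx as nx

G = nx.DiGraph()
# Hypothetical network; in the paper this would be the special flow network
# built from the causal graph, with capacities derived from variable costs.
G.add_edge("s", "a", capacity=3.0)
G.add_edge("s", "b", capacity=1.0)
G.add_edge("a", "b", capacity=1.0)
G.add_edge("a", "t", capacity=2.0)
G.add_edge("b", "t", capacity=4.0)

# minimum_cut runs a max-flow computation and returns the partition whose first
# part is the set of vertices reachable from the source in the residual network.
cut_value, (source_side, sink_side) = nx.minimum_cut(G, "s", "t")
print("min cut value:", cut_value)
print("vertices reachable from the source:", source_side)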

Stochastic majorization-minimization (SMM) is an online extension of the classical principle of majorization-minimization, which consists of sampling i.i.d. data points from a fixed data distribution and minimizing a recursively defined majorizing surrogate of an objective function. In this paper, we introduce stochastic block majorization-minimization, where the surrogates need only be block multi-convex and a single block is optimized at a time within a diminishing radius. Relaxing the standard strong convexity requirements for surrogates in SMM, our framework has wider applicability, including online CANDECOMP/PARAFAC (CP) dictionary learning, and yields greater computational efficiency, especially when the problem dimension is large. We provide an extensive convergence analysis of the proposed algorithm under possibly dependent data streams, relaxing the standard i.i.d. assumption on data samples. We show that the proposed algorithm converges almost surely to the set of stationary points of a nonconvex objective under constraints at a rate $O((\log n)^{1+\epsilon}/n^{1/2})$ for the empirical loss function and $O((\log n)^{1+\epsilon}/n^{1/4})$ for the expected loss function, where $n$ denotes the number of data samples processed. Under some additional assumptions, the latter convergence rate can be improved to $O((\log n)^{1+\epsilon}/n^{1/2})$. Our results provide the first convergence rate bounds for various online matrix and tensor decomposition algorithms under a general Markovian data setting.
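To fix ideas, here is a toy sketch in the spirit of the scheme described above (an entirely illustrative setup: a rank-$k$ online dictionary-learning loss with an averaged quadratic surrogate, where the dictionary block is refreshed once per sample within a diminishing radius). It is not the authors' algorithm or analysis.

import numpy as np

rng = np.random.default_rng(1)
d, k, T = 20, 3, 2000
W_true = rng.normal(size=(d, k))
W = rng.normal(size=(d, k))      # dictionary block, updated once per sample
A = np.zeros((k, k))             # running sufficient statistics of the
B = np.zeros((d, k))             # averaged quadratic surrogate

for t in range(1, T + 1):
    x = W_true @ rng.normal(size=k) + 0.01 * rng.normal(size=d)  # streamed sample
    h = np.linalg.lstsq(W, x, rcond=None)[0]   # code for the current dictionary
    A += np.outer(h, h)                        # surrogate: (1/t) * sum_i ||x_i - W h_i||^2
    B += np.outer(x, h)
    step = B @ np.linalg.pinv(A) - W           # move toward the surrogate minimizer...
    radius = 1.0 / np.sqrt(t)                  # ...but only within a diminishing radius
    if np.linalg.norm(step) > radius:
        step *= radius / np.linalg.norm(step)
    W += step

# Norm of the averaged-surrogate gradient at the final W (up to a constant factor).
print("scaled surrogate gradient norm:", np.linalg.norm(W @ A - B) / T)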

Given a polyline on $n$ vertices, the polyline simplification problem asks for a minimum-size subsequence of these vertices defining a new polyline whose distance to the original polyline is at most a given threshold under some distance measure. In this paper, we improve the long-standing running time bound for the simplification of polylines under the local Fr\'echet distance. The best algorithm known so far, by Imai and Iri, has a cubic running time in $n$. We present an algorithm with a running time of $O(n^2)$ under the $L_1$ and $L_\infty$ norms, and $O(n^2 \log n)$ under $L_{p \in (1,\infty)}$ (including the Euclidean norm $L_2$). Our approach is based on the ideas of Chan and Chin, who showed that under the local Hausdorff distance, the Imai-Iri algorithm can be improved to run in quadratic time for $L_1$, $L_2$, and $L_\infty$. However, the Hausdorff distance does not take the order of the points along the polyline into account. The Fr\'echet distance, which is sensitive to the course of the polylines, is hence often deemed the superior distance measure for polyline similarity, but it is also more intricate to compute. So far, the significantly faster simplification algorithms for the Hausdorff distance have made that distance measure preferable in practical applications. Our new algorithm for simplification under the Fr\'echet distance matches the running time bounds for the Hausdorff distance up to logarithmic factors and thus enables the use of this more suitable distance measure.
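For context, the classic Imai-Iri framework that both lines of work build on constructs a "shortcut graph" and then extracts a shortest path from it. The sketch below (a toy illustration, not the paper's algorithm) implements the straightforward cubic-time version under the local Hausdorff criterion, which is the baseline that the Chan-Chin and Fr\'echet-based improvements accelerate.

import math

def point_segment_dist(p, a, b):
    # Euclidean distance from point p to segment ab.
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def imai_iri_simplify(poly, eps):
    # Minimum-vertex simplification via the Imai-Iri shortcut graph
    # (local Hausdorff criterion, O(n^3) as written).
    n = len(poly)
    # valid[i][j]: the shortcut poly[i]..poly[j] stays within eps of the skipped points.
    valid = [[all(point_segment_dist(poly[k], poly[i], poly[j]) <= eps
                  for k in range(i + 1, j))
              for j in range(n)] for i in range(n)]
    # Fewest-edges path from vertex 0 to vertex n-1 in the shortcut DAG.
    best = [0] + [math.inf] * (n - 1)
    parent = [None] * n
    for j in range(1, n):
        for i in range(j):
            if valid[i][j] and best[i] + 1 < best[j]:
                best[j], parent[j] = best[i] + 1, i
    out, j = [], n - 1
    while j is not None:
        out.append(poly[j])
        j = parent[j]
    return out[::-1]

print(imai_iri_simplify([(0, 0), (1, 0.05), (2, -0.05), (3, 0), (4, 2)], eps=0.1))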

In this paper, we study arbitrary infinite binary information systems, each of which consists of an infinite set called the universe and an infinite set of two-valued functions (attributes) defined on the universe. We consider the notion of a problem over an information system, described by a finite number of attributes and a mapping that assigns a decision to each tuple of attribute values. As algorithms for problem solving, we use deterministic and nondeterministic decision trees. As measures of time and space complexity, we study the depth and the number of nodes of the decision trees. In the worst case, with the growth of the number of attributes in the problem description, (i) the minimum depth of deterministic decision trees grows either almost logarithmically or linearly, (ii) the minimum depth of nondeterministic decision trees is either bounded from above by a constant or grows linearly, (iii) the minimum number of nodes in deterministic decision trees has either polynomial or exponential growth, and (iv) the minimum number of nodes in nondeterministic decision trees has either polynomial or exponential growth. Based on these results, we divide the set of all infinite binary information systems into five complexity classes, and study, for each class, issues related to the time-space trade-off for decision trees.

Escaping saddle points is a central research topic in nonconvex optimization. In this paper, we propose a simple gradient-based algorithm such that for a smooth function $f\colon\mathbb{R}^n\to\mathbb{R}$, it outputs an $\epsilon$-approximate second-order stationary point in $\tilde{O}(\log n/\epsilon^{1.75})$ iterations. Compared to the previous state-of-the-art algorithms by Jin et al. with $\tilde{O}((\log n)^{4}/\epsilon^{2})$ or $\tilde{O}((\log n)^{6}/\epsilon^{1.75})$ iterations, our algorithm is polynomially better in terms of $\log n$ and matches their complexities in terms of $1/\epsilon$. For the stochastic setting, our algorithm outputs an $\epsilon$-approximate second-order stationary point in $\tilde{O}((\log n)^{2}/\epsilon^{4})$ iterations. Technically, our main contribution is the idea of implementing a robust Hessian power method using only gradients, which can find negative curvature near saddle points and achieves the polynomial speedup in $\log n$ compared to perturbed gradient descent methods. Finally, we also perform numerical experiments that support our results.
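The following is a minimal toy sketch (with hypothetical parameters, not the authors' algorithm) of the gradient-only idea mentioned above: Hessian-vector products are approximated by finite differences of gradients, and a power iteration on $L\cdot I - \nabla^2 f$, where $L$ is an assumed bound on the largest Hessian eigenvalue, extracts a direction of negative curvature near a saddle point.

import numpy as np

def hvp(grad, x, v, h=1e-5):
    # Approximate Hessian-vector product via a finite difference of gradients.
    return (grad(x + h * v) - grad(x - h * v)) / (2 * h)

def negative_curvature_direction(grad, x, dim, L=1.0, iters=100, seed=0):
    # Power method on (L*I - Hessian), using only gradient evaluations.
    # L is an assumed upper bound on the Hessian's largest eigenvalue.
    rng = np.random.default_rng(seed)
    v = rng.normal(size=dim)
    v /= np.linalg.norm(v)
    for _ in range(iters):
        w = L * v - hvp(grad, x, v)       # (L*I - H) v
        v = w / np.linalg.norm(w)
    curvature = v @ hvp(grad, x, v)       # Rayleigh quotient v^T H v
    return v, curvature

# Toy saddle: f(x, y) = x^2 - y^2 has a saddle point at the origin.
grad = lambda z: np.array([2 * z[0], -2 * z[1]])
x0 = np.array([1e-3, 1e-3])
v, curv = negative_curvature_direction(grad, x0, dim=2, L=4.0)
print("estimated curvature along v:", curv)   # should be close to -2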

The interpretation of Deep Neural Network (DNN) training as an optimal control problem with nonlinear dynamical systems has received considerable attention recently, yet the algorithmic development remains relatively limited. In this work, we make an attempt along this line by reformulating the training procedure from the trajectory optimization perspective. We first show that most widely used algorithms for training DNNs can be linked to Differential Dynamic Programming (DDP), a celebrated second-order trajectory optimization algorithm rooted in approximate dynamic programming. In this vein, we propose a new variant of DDP that can accept batch optimization for training feedforward networks, while integrating naturally with recent progress in curvature approximation. The resulting algorithm features layer-wise feedback policies which improve the convergence rate and reduce sensitivity to hyper-parameters compared to existing methods. We show that the algorithm is competitive against state-of-the-art first- and second-order methods. Our work opens up new avenues for principled algorithmic design built upon optimal control theory.
