
We study the internal dictionary matching (IDM) problem, where a dictionary $\mathcal{D}$ containing $d$ substrings of a text $T$ is given, and each query concerns the occurrences of patterns in $\mathcal{D}$ in another substring of $T$. We propose a novel $O(n)$-sized data structure named the Basic Substring Structure (BASS), where $n$ is the length of the text $T$. With BASS, we are able to handle all types of queries in the IDM problem in nearly optimal query and preprocessing time. Specifically, our results include:

- The first algorithm that answers the *CountDistinct* query in $\tilde{O}(1)$ time with $\tilde{O}(n+d)$ preprocessing, where we need to compute the number of distinct patterns that exist in $T[i..j]$. Previously, the best result was $\tilde{O}(m)$ time per query after $\tilde{O}(n^2/m+d)$ or $\tilde{O}(nd/m+d)$ preprocessing, where $m$ is a chosen parameter.
- Faster algorithms for two other types of internal queries. We improve the runtime for (1) pattern counting (Count) queries to $O(\log n/\log\log n)$ time per query with $O(n+d\sqrt{\log n})$ preprocessing, from $O(\log^2 n/\log\log n)$ time per query with $O(n\log n/\log \log n+d\log^{3/2} n)$ preprocessing; and (2) distinct pattern reporting (ReportDistinct) queries to $O(1+|\text{output}|)$ time per query, from $O(\log n+|\text{output}|)$ per query.

In addition, we match the optimal runtime in the remaining two types of queries, pattern existence (Exist) and pattern reporting (Report). We also show that BASS is more generally applicable to other internal query problems.
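To make the query semantics concrete, here is a naive, brute-force reference implementation of the five query types in Python. It illustrates only what each query returns over a substring $T[i..j]$; the BASS data structure and the time bounds stated above are not reproduced, and the example text and dictionary are illustrative.

```python
# Naive reference semantics for internal dictionary matching queries.
# 0-based indices, T[i..j] inclusive. Not the BASS data structure.

def occurrences(pattern, text, i, j):
    """All starting positions of `pattern` fully inside text[i..j]."""
    window = text[i:j + 1]
    return [i + k for k in range(len(window) - len(pattern) + 1)
            if window[k:k + len(pattern)] == pattern]

def exist(D, T, i, j):
    return any(occurrences(p, T, i, j) for p in D)

def report(D, T, i, j):
    return [(p, pos) for p in D for pos in occurrences(p, T, i, j)]

def count(D, T, i, j):
    return sum(len(occurrences(p, T, i, j)) for p in D)

def report_distinct(D, T, i, j):
    return [p for p in D if occurrences(p, T, i, j)]

def count_distinct(D, T, i, j):
    return len(report_distinct(D, T, i, j))

T = "abaababa"
D = ["ab", "aba", "ba"]
print(count(D, T, 2, 7), count_distinct(D, T, 2, 7))   # 6 3
```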

Related content

Given a Boolean function $f:\{0,1\}^n\to\{0,1\}$, the goal in the usual query model is to compute $f$ on an unknown input $x \in \{0,1\}^n$ while minimizing the number of queries to $x$. One can also consider a "distinguishing" problem denoted by $f_{\mathsf{sab}}$: given an input $x \in f^{-1}(0)$ and an input $y \in f^{-1}(1)$, either all differing locations are replaced by a $*$, or all differing locations are replaced by $\dagger$, and an algorithm's goal is to identify which of these is the case while minimizing the number of queries. Ben-David and Kothari [ToC'18] introduced the notion of randomized sabotage complexity of a Boolean function to be the zero-error randomized query complexity of $f_{\mathsf{sab}}$. A natural follow-up question is to understand $\mathsf{Q}(f_{\mathsf{sab}})$, the quantum query complexity of $f_{\mathsf{sab}}$. In this paper, we initiate a systematic study of this. The following are our main results:

$\bullet$ If we have additional query access to $x$ and $y$, then $\mathsf{Q}(f_{\mathsf{sab}})=O(\min\{\mathsf{Q}(f),\sqrt{n}\})$.
$\bullet$ If an algorithm is also required to output a differing index of a 0-input and a 1-input, then $\mathsf{Q}(f_{\mathsf{sab}})=O(\min\{\mathsf{Q}(f)^{1.5},\sqrt{n}\})$.
$\bullet$ $\mathsf{Q}(f_{\mathsf{sab}}) = \Omega(\sqrt{\mathsf{fbs}(f)})$, where $\mathsf{fbs}(f)$ denotes the fractional block sensitivity of $f$. By known results, along with the results in the previous bullets, this implies that $\mathsf{Q}(f_{\mathsf{sab}})$ is polynomially related to $\mathsf{Q}(f)$.
$\bullet$ The bound above is easily seen to be tight for standard functions such as And, Or, Majority and Parity. We show that when $f$ is the Indexing function, $\mathsf{Q}(f_{\mathsf{sab}})=\Theta(\mathsf{fbs}(f))$, ruling out the possibility that $\mathsf{Q}(f_{\mathsf{sab}})=\Theta(\sqrt{\mathsf{fbs}(f)})$ for all $f$.
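As a concrete instance of the tightness claim in the last bullet, the following standard calculation (not taken from the paper) computes the fractional block sensitivity of Or and the resulting lower bound:

```latex
% At the all-zeros input, the n singleton blocks of Or_n are disjoint and each
% is sensitive, so giving each of them weight 1 is feasible. No assignment can
% exceed n, since the total weight is at most the sum over coordinates of the
% weight of the blocks covering that coordinate, which is at most 1 per coordinate.
\[
  \mathsf{fbs}(\mathrm{Or}_n) = n
  \quad\Longrightarrow\quad
  \mathsf{Q}\big((\mathrm{Or}_n)_{\mathsf{sab}}\big)
  = \Omega\big(\sqrt{\mathsf{fbs}(\mathrm{Or}_n)}\big)
  = \Omega(\sqrt{n}),
\]
% which, by the tightness claim above, is matched by an O(sqrt(n)) upper bound,
% giving Q((Or_n)_sab) = Theta(sqrt(n)).
```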

Annealed Sequential Monte Carlo (SMC) samplers are special cases of SMC samplers where the sequence of distributions can be embedded in a smooth path of distributions. Using this underlying path of distributions and a performance model based on the variance of the normalisation constant estimator, we systematically study dense-schedule and large-particle limits. From our theory and adaptive methods emerges a notion of a global barrier capturing the inherent complexity of normalisation constant approximation under our performance model. We then turn the resulting approximations into surrogate objective functions of algorithm performance, and use them for methodology development. We obtain novel adaptive methodologies, Sequential SMC (SSMC) and Sequential AIS (SAIS) samplers, which address practical difficulties inherent in previous adaptive SMC methods. First, our SSMC algorithms are predictable: they produce a sequence of increasingly precise estimates at deterministic and known times. Second, SAIS, a special case of SSMC, enables schedule adaptation at a memory cost constant in the number of particles and requires much less communication. Finally, these characteristics make SAIS highly efficient on GPUs. We develop an open-source, high-performance GPU implementation based on our methodology and demonstrate up to a hundred-fold speed improvement compared to state-of-the-art adaptive AIS methods.
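For background, the sketch below is a minimal annealed importance sampling loop in Python/NumPy with a fixed (non-adaptive) temperature schedule, showing the normalisation-constant estimator that the performance model above is built around. The adaptive SSMC/SAIS schedules and the GPU implementation are not reproduced; the one-dimensional target is purely illustrative.

```python
# Minimal AIS with a fixed annealing schedule; estimates log Z = log ∫ prior * exp(log_lik).
import numpy as np

rng = np.random.default_rng(0)

def log_prior(x):                      # reference distribution: standard normal
    return -0.5 * x**2 - 0.5 * np.log(2.0 * np.pi)

def log_lik(x):                        # tempered factor; target ∝ prior * exp(log_lik)
    return -0.5 * ((x - 3.0) / 0.5) ** 2

betas = np.linspace(0.0, 1.0, 50)      # fixed (non-adaptive) schedule
N = 2000                               # number of particles

x = rng.standard_normal(N)             # particles drawn from the prior (beta = 0)
log_w = np.zeros(N)                    # incremental importance weights

for b_prev, b in zip(betas[:-1], betas[1:]):
    log_w += (b - b_prev) * log_lik(x)                    # weight update along the path
    # One random-walk Metropolis move per particle, targeting pi_beta
    prop = x + 0.5 * rng.standard_normal(N)
    log_acc = (log_prior(prop) + b * log_lik(prop)) - (log_prior(x) + b * log_lik(x))
    accept = np.log(rng.uniform(size=N)) < log_acc
    x = np.where(accept, prop, x)

log_Z_hat = np.logaddexp.reduce(log_w) - np.log(N)        # AIS normalisation-constant estimate
print("estimated log Z:", log_Z_hat)
```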

We give a simple algorithm for the dynamic approximate All-Pairs Shortest Paths (APSP) problem. Given a graph $G = (V, E, l)$ with polynomially bounded edge lengths, our data structure processes $|E|$ edge insertions and deletions in total time $|E|^{1 + o(1)}$ and provides query access to $|E|^{o(1)}$-approximate distances in time $\tilde{O}(1)$ per query. We produce a data structure that mimics Thorup-Zwick distance oracles [TZ'05], but is dynamic and deterministic. Our algorithm selects a small number of pivot vertices. Then, for every other vertex, it reduces distance computation to maintaining distances to a small neighborhood around that vertex and to the nearest pivot. We maintain distances between pivots efficiently by representing them in a smaller graph and recursing. We construct these smaller graphs by (a) reducing vertex count using the dynamic distance-preserving core graphs of Kyng-Meierhans-Probst Gutenberg [KMPG'24] in a black-box manner and (b) reducing edge count using a dynamic spanner akin to Chen-Kyng-Liu-Meierhans-Probst Gutenberg [CKL+'24]. Our dynamic spanner internally uses an APSP data structure. Choosing a large enough size reduction factor in the first step allows us to simultaneously bootstrap our spanner and a dynamic APSP data structure. Notably, our approach does not need expander graphs, an otherwise ubiquitous tool in derandomization.
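The sketch below is a static, heavily simplified rendering of the pivot idea in Python: pick a few pivot vertices, precompute distances from each, and answer a query by routing through the endpoints' nearest pivots. The dynamic maintenance, core graphs, spanners, and the stated approximation and runtime guarantees are not reproduced; all names here are illustrative.

```python
# Static pivot-based distance estimates on an unweighted graph (adjacency dict).
import random
from collections import deque

def bfs_dist(adj, src):
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def build_oracle(adj, num_pivots, seed=0):
    rng = random.Random(seed)
    pivots = rng.sample(list(adj), num_pivots)
    from_pivot = {p: bfs_dist(adj, p) for p in pivots}            # pivot -> all distances
    nearest = {v: min(pivots, key=lambda p: from_pivot[p].get(v, float("inf")))
               for v in adj}                                       # each vertex's closest pivot
    return pivots, from_pivot, nearest

def query(oracle, u, v):
    pivots, from_pivot, nearest = oracle
    pu, pv = nearest[u], nearest[v]
    # d(u,pu) + d(pu,pv) + d(pv,v) upper-bounds the true distance d(u,v)
    return (from_pivot[pu].get(u, float("inf"))
            + from_pivot[pu].get(pv, float("inf"))
            + from_pivot[pv].get(v, float("inf")))

adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
oracle = build_oracle(adj, num_pivots=2)
print(query(oracle, 0, 4))   # an upper bound on the true distance 4
```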

We introduce the novel class $(E_\alpha)_{\alpha \in [-\infty,1)}$ of reverse map projection embeddings, each one defining a unique new method of encoding classical data into quantum states. Inspired by well-known map projections from the unit sphere onto its tangent planes, used in practice in cartography, these embeddings address the common drawback of the amplitude embedding method, wherein scalar multiples of data points are identified and information about the norm of data is lost. We show how reverse map projections can be utilised as equivariant embeddings for quantum machine learning. Using these methods, we can leverage symmetries in classical datasets to significantly strengthen performance on quantum machine learning tasks. Finally, we select four values of $\alpha$ with which to perform a simple classification task, taking $E_\alpha$ as the embedding and experimenting with both equivariant and non-equivariant setups. We compare their results alongside those of standard amplitude embedding.
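The snippet below illustrates the drawback of amplitude embedding mentioned above: after normalisation, a data point and its scalar multiples map to the same state, so norm information is lost. The reverse map projection embeddings $E_\alpha$ themselves are the paper's construction and are not reproduced here.

```python
# Amplitude embedding collapses scalar multiples of a data point onto one state.
import numpy as np

def amplitude_embedding(x):
    """Map a real vector to the amplitude vector of a quantum state."""
    x = np.asarray(x, dtype=float)
    return x / np.linalg.norm(x)

x = np.array([3.0, 4.0])
print(amplitude_embedding(x))          # [0.6 0.8]
print(amplitude_embedding(2.5 * x))    # identical: the norm information is gone
```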

The orthogonality dimension of a graph over $\mathbb{R}$ is the smallest integer $d$ for which one can assign to every vertex a nonzero vector in $\mathbb{R}^d$ such that every two adjacent vertices receive orthogonal vectors. For an integer $d$, the $d$-Ortho-Dim$_\mathbb{R}$ problem asks to decide whether the orthogonality dimension of a given graph over $\mathbb{R}$ is at most $d$. We prove that for every integer $d \geq 3$, the $d$-Ortho-Dim$_\mathbb{R}$ problem parameterized by the vertex cover number $k$ admits a kernel with $O(k^{d-1})$ vertices and bit-size $O(k^{d-1} \cdot \log k)$. We complement this result by a nearly matching lower bound, showing that for any $\varepsilon > 0$, the problem admits no kernel of bit-size $O(k^{d-1-\varepsilon})$ unless $\mathsf{NP} \subseteq \mathsf{coNP/poly}$. We further study the kernelizability of orthogonality dimension problems in additional settings, including over general fields and under various structural parameterizations.
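To make the opening definition concrete, the small check below verifies that an assignment of nonzero vectors is a valid orthogonal representation; for the 4-cycle, two dimensions suffice, so its orthogonality dimension over $\mathbb{R}$ is 2. This is a standard illustration, not part of the paper's results.

```python
# Check that adjacent vertices receive orthogonal nonzero vectors.
import numpy as np

def is_orthogonal_representation(edges, vectors):
    return (all(np.linalg.norm(v) > 0 for v in vectors.values())
            and all(abs(np.dot(vectors[u], vectors[v])) < 1e-12 for u, v in edges))

c4_edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
assignment = {0: np.array([1.0, 0.0]), 1: np.array([0.0, 1.0]),
              2: np.array([1.0, 0.0]), 3: np.array([0.0, 1.0])}
print(is_orthogonal_representation(c4_edges, assignment))   # True
```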

Currently, it is hard to reap the benefits of deep learning for Bayesian methods, which allow the explicit specification of prior knowledge and accurately capture model uncertainty. We present Prior-Data Fitted Networks (PFNs). PFNs leverage in-context learning in large-scale machine learning techniques to approximate a large set of posteriors. The only requirement for PFNs to work is the ability to sample from a prior distribution over supervised learning tasks (or functions). Our method restates the objective of posterior approximation as a supervised classification problem with a set-valued input: it repeatedly draws a task (or function) from the prior, draws a set of data points and their labels from it, masks one of the labels and learns to make probabilistic predictions for it based on the set-valued input of the rest of the data points. Presented with a set of samples from a new supervised learning task as input, PFNs make probabilistic predictions for arbitrary other data points in a single forward propagation, having learned to approximate Bayesian inference. We demonstrate that PFNs can near-perfectly mimic Gaussian processes and also enable efficient Bayesian inference for intractable problems, with over 200-fold speedups in multiple setups compared to current methods. We obtain strong results in very diverse areas such as Gaussian process regression, Bayesian neural networks, classification for small tabular data sets, and few-shot image classification, demonstrating the generality of PFNs. Code and trained PFNs are released at https://github.com/automl/TransformersCanDoBayesianInference.
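The sketch below is a toy, runnable rendering of the training objective described above: draw a task from a prior (here, noisy 1-D linear functions), draw a small dataset, hold out one label, and train a set-conditioned network to predict it. The model is a simple DeepSets-style regressor with a squared-error loss, not the Transformer making probabilistic predictions used in the paper; all hyperparameters are illustrative.

```python
# Toy prior-data fitting loop: sample a task, mask one label, predict it from the rest.
import torch
import torch.nn as nn

torch.manual_seed(0)

class ToyPFN(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        self.encode = nn.Sequential(nn.Linear(2, hidden), nn.ReLU(), nn.Linear(hidden, hidden))
        self.head = nn.Sequential(nn.Linear(hidden + 1, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, ctx_x, ctx_y, qry_x):
        # Permutation-invariant summary of the context set, then predict the query label.
        summary = self.encode(torch.stack([ctx_x, ctx_y], dim=-1)).mean(dim=1)
        return self.head(torch.cat([summary, qry_x.unsqueeze(-1)], dim=-1)).squeeze(-1)

def sample_task(batch, n_points):
    """Prior over tasks: y = w*x + b + noise with random (w, b)."""
    w, b = torch.randn(batch, 1), torch.randn(batch, 1)
    x = torch.rand(batch, n_points) * 4 - 2
    y = w * x + b + 0.1 * torch.randn_like(x)
    return x, y

model = ToyPFN()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(1000):
    x, y = sample_task(batch=64, n_points=11)
    ctx_x, ctx_y, qry_x, qry_y = x[:, :-1], y[:, :-1], x[:, -1], y[:, -1]   # mask the last label
    loss = ((model(ctx_x, ctx_y, qry_x) - qry_y) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

print("final training loss:", loss.item())
```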

The problem of column subset selection asks for a subset of columns from an input matrix such that the matrix can be reconstructed as accurately as possible within the span of the selected columns. A natural extension is to consider a setting where the matrix rows are partitioned into two groups, and the goal is to choose a subset of columns that minimizes the maximum reconstruction error of both groups, relative to their respective best rank-$k$ approximation. Extending the known results of column subset selection to this fair setting is not straightforward: in certain scenarios it is unavoidable to choose columns separately for each group, resulting in double the expected column count. We propose a deterministic leverage-score sampling strategy for the fair setting and show that sampling a column subset of minimum size becomes NP-hard in the presence of two groups. Despite these negative results, we give an approximation algorithm that guarantees a solution within 1.5 times the optimal solution size. We also present practical heuristic algorithms based on rank-revealing QR factorization. Finally, we validate our methods through an extensive set of experiments using real-world data.
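For context, the snippet below performs standard (single-group) rank-$k$ leverage-score column selection in NumPy, to make the leverage-score machinery referenced above concrete. The fair two-group objective, the NP-hardness result, and the 1.5-approximation are the paper's contributions and are not reproduced here.

```python
# Rank-k leverage-score column selection and the resulting reconstruction error.
import numpy as np

def leverage_score_columns(A, k, c):
    """Pick c columns of A with the largest rank-k leverage scores."""
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    scores = np.sum(Vt[:k, :] ** 2, axis=0)          # leverage score of each column
    return np.argsort(scores)[::-1][:c]

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 20)) @ rng.standard_normal((20, 50))   # approximately low rank
cols = leverage_score_columns(A, k=5, c=8)
S = A[:, cols]
proj = S @ np.linalg.pinv(S) @ A                     # project A onto the span of the chosen columns
print("relative error:", np.linalg.norm(A - proj) / np.linalg.norm(A))
```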

Khosravi, Drnov\v{s}ek and Moslehian [\textit{Filomat, 2012}] derived the Buzano inequality for Hilbert C*-modules. Using this inequality, we derive the Deutsch entropic uncertainty principle for Hilbert C*-modules over commutative unital C*-algebras.
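For reference, the classical Hilbert-space form of Buzano's inequality (the statement being extended to Hilbert C*-modules above) reads:

```latex
% For all x, y in a Hilbert space H and every unit vector e in H,
\[
  \lvert \langle x, e \rangle \, \langle e, y \rangle \rvert
  \;\le\; \tfrac{1}{2} \bigl( \lVert x \rVert \, \lVert y \rVert + \lvert \langle x, y \rangle \rvert \bigr).
\]
% Taking e = x / ||x|| recovers the Cauchy-Schwarz inequality.
```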

The implementation of a $\texttt{RestrictedFunctionSpace}$ class in Firedrake, a Python library that numerically solves partial differential equations using the finite element method, is documented. This includes an introduction to the current $\texttt{FunctionSpace}$ class in Firedrake and its key features. The limitations of Firedrake's solvers when imposing Dirichlet boundary conditions with the current $\texttt{FunctionSpace}$ class are explored, as well as what the $\texttt{RestrictedFunctionSpace}$ class does differently to remove these issues. These points are considered both mathematically and in the code, as an abstraction of the mathematical ideas presented. Finally, the benefits of the $\texttt{RestrictedFunctionSpace}$ class to the user are considered and demonstrated through tests and comparisons. This leads to the conclusion that, in particular, the eigensolver in Firedrake is improved by the use of the $\texttt{RestrictedFunctionSpace}$ class, which removes the eigenvalues associated with the Dirichlet boundary conditions of a system.
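The sketch below sets up the situation the report starts from: an ordinary Firedrake $\texttt{FunctionSpace}$ with a Dirichlet boundary condition, where the boundary degrees of freedom remain in the assembled system. It uses only standard Firedrake calls; the exact $\texttt{RestrictedFunctionSpace}$ constructor signature is not shown here and should be taken from the Firedrake documentation.

```python
# A Poisson problem on an ordinary (unrestricted) function space with a
# Dirichlet boundary condition. The boundary rows/columns of the system are
# modified by the bc rather than removed; a RestrictedFunctionSpace (see the
# Firedrake docs for its constructor) would exclude those degrees of freedom.
from firedrake import *

mesh = UnitSquareMesh(8, 8)
V = FunctionSpace(mesh, "CG", 1)

u, v = TrialFunction(V), TestFunction(V)
a = inner(grad(u), grad(v)) * dx
L = Constant(1.0) * v * dx
bc = DirichletBC(V, 0, "on_boundary")

uh = Function(V)
solve(a == L, uh, bcs=[bc])
print(norm(uh))
```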

We investigate a lattice-structured LSTM model for Chinese NER, which encodes a sequence of input characters as well as all potential words that match a lexicon. Compared with character-based methods, our model explicitly leverages word and word sequence information. Compared with word-based methods, lattice LSTM does not suffer from segmentation errors. Gated recurrent cells allow our model to choose the most relevant characters and words from a sentence for better NER results. Experiments on various datasets show that lattice LSTM outperforms both word-based and character-based LSTM baselines, achieving the best results.
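The snippet below sketches the lattice construction step described above: enumerating every lexicon word that matches a span of the input character sequence (using the well-known 南京市长江大桥 example). The gated LSTM cells that fuse these word paths with the character path are not reproduced; the toy lexicon is illustrative.

```python
# Enumerate all lexicon words matching spans of a character sequence.
def lattice_words(chars, lexicon, max_len=4):
    """Return (start, end, word) for every lexicon word spanning chars[start:end]."""
    matches = []
    for i in range(len(chars)):
        for j in range(i + 1, min(i + max_len, len(chars)) + 1):
            word = "".join(chars[i:j])
            if word in lexicon:
                matches.append((i, j, word))
    return matches

sentence = list("南京市长江大桥")                      # "Nanjing Yangtze River Bridge"
lexicon = {"南京", "南京市", "市长", "长江", "长江大桥", "大桥"}
for start, end, word in lattice_words(sentence, lexicon):
    print(start, end, word)
```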
