
We find a succinct expression for computing the sequence $x_t = a_t x_{t-1} + b_t$ in parallel with two prefix sums, given $t = (1, 2, \dots, n)$, $a_t \in \mathbb{R}^n$, $b_t \in \mathbb{R}^n$, and initial value $x_0 \in \mathbb{R}$. On $n$ parallel processors, the computation of $n$ elements incurs $\mathcal{O}(\log n)$ time and $\mathcal{O}(n)$ space. Sequences of this form are ubiquitous in science and engineering, making efficient parallelization useful for a vast number of applications. We implement our expression in software, test it on parallel hardware, and verify that it executes faster than sequential computation by a factor of $\frac{n}{\log n}$.
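One way to realize such an expression, shown below as a minimal NumPy sketch (ours, not necessarily the paper's exact formulation or implementation): writing $A_t = \prod_{i \le t} a_i$, the recurrence unrolls to $x_t = A_t\,\big(x_0 + \sum_{j \le t} b_j / A_j\big)$, so one cumulative product and one cumulative sum suffice; both are prefix scans that run in $\mathcal{O}(\log n)$ depth on parallel hardware (NumPy's `cumprod`/`cumsum` below are sequential stand-ins, and the division by $A_j$ can be numerically delicate for long sequences).

```python
import numpy as np

def linear_recurrence(a, b, x0):
    """Compute x_t = a_t * x_{t-1} + b_t for t = 1..n via two prefix scans,
    using x_t = A_t * (x0 + sum_{j<=t} b_j / A_j) with A_t = prod_{i<=t} a_i."""
    A = np.cumprod(a)                   # prefix products A_1, ..., A_n
    return A * (x0 + np.cumsum(b / A))  # prefix sums of b_j / A_j

# Sanity check against the naive sequential recurrence.
rng = np.random.default_rng(0)
n = 256
a, b, x0 = rng.uniform(0.9, 1.1, n), rng.normal(size=n), 1.0
x_seq, prev = np.empty(n), x0
for t in range(n):
    prev = a[t] * prev + b[t]
    x_seq[t] = prev
assert np.allclose(linear_recurrence(a, b, x0), x_seq)
```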


Let $X$ be a $d$-dimensional simplicial complex. A function $F\colon X(k)\to \{0,1\}^k$ is said to be a direct product function if there exists a function $f\colon X(1)\to \{0,1\}$ such that $F(\sigma) = (f(\sigma_1), \ldots, f(\sigma_k))$ for each $k$-face $\sigma$. In an effort to simplify components of the PCP theorem, Goldreich and Safra introduced the problem of direct product testing, which asks whether one can test if $F\colon X(k)\to \{0,1\}^k$ is correlated with a direct product function by querying $F$ on only $2$ inputs. Dinur and Kaufman conjectured that there exist bounded-degree complexes with a direct product test in the small-soundness regime. We resolve their conjecture by showing that for all $\delta>0$, there exists a family of high-dimensional expanders with degree $O_{\delta}(1)$ and a $2$-query direct product tester with soundness $\delta$. We use the characterization given by a subset of the authors, and independently by Dikstein and Dinur, who showed that some form of non-Abelian coboundary expansion (which they called "Unique-Games coboundary expansion") is a necessary and sufficient condition for a complex to admit such direct product testers. Our main technical contribution is a general technique for showing coboundary expansion of complexes with coefficients in a non-Abelian group. This allows us to prove that the high-dimensional expanders constructed by Chapman and Lubotzky satisfy the necessary conditions, and thus admit a 2-query direct product tester with small soundness.

Recently it was shown that the so-called guided local Hamiltonian problem -- estimating the smallest eigenvalue of a $k$-local Hamiltonian when provided with a description of a quantum state (a 'guiding state') that is guaranteed to have substantial overlap with the true ground state -- is BQP-complete for $k \geq 6$ when the required precision is inverse polynomial in the system size $n$, and remains hard even when the overlap of the guiding state with the ground state is close to a constant $\left(\frac12 - \Omega\left(\frac{1}{\mathrm{poly}(n)}\right)\right)$. We improve upon this result in three ways: by showing that the problem remains BQP-complete when i) the Hamiltonian is 2-local, ii) the overlap between the guiding state and the target eigenstate is as large as $1 - \Omega\left(\frac{1}{\mathrm{poly}(n)}\right)$, and iii) one is interested in estimating energies of excited states, rather than just the ground state. Interestingly, iii) is only made possible by first showing that ii) holds.

Motivated by applications in magnetic resonance relaxometry, we consider the following problem: given samples of a function $t\mapsto \sum_{k=1}^K A_k\exp(-t\lambda_k)$, where $K\ge 2$ is an integer, $A_k\in\mathbb{R}$, and $\lambda_k>0$ for $k=1,\cdots, K$, determine $K$, the $A_k$'s, and the $\lambda_k$'s. Our approach is to transform this function into another function of the same form in which the $\lambda_k$'s are replaced by $i\lambda_k$. For this purpose, we study least-squares approximation by polynomials weighted by the Gaussian weight, and use the fact that Hermite functions are eigenfunctions of the Fourier transform. We provide a detailed analysis of the effect of noise in the data.
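As a quick numerical illustration of the eigenfunction property this approach leans on (a standalone demo, not the paper's method), the sketch below checks that the orthonormal Hermite functions satisfy $\widehat{\psi}_n = (-i)^n \psi_n$ under the unitary Fourier transform:

```python
import numpy as np
from numpy.polynomial.hermite import hermval
from math import factorial, pi, sqrt

def hermite_function(n, x):
    """Orthonormal Hermite function psi_n(x) = H_n(x) exp(-x^2/2) / sqrt(2^n n! sqrt(pi))."""
    c = np.zeros(n + 1); c[n] = 1.0
    return hermval(x, c) * np.exp(-x**2 / 2) / sqrt(2.0**n * factorial(n) * sqrt(pi))

# Check F[psi_n](xi) = (-i)^n psi_n(xi) for the unitary Fourier transform
# F[f](xi) = (2 pi)^(-1/2) * integral of f(x) exp(-i x xi) dx, via simple quadrature.
x = np.linspace(-20, 20, 4001)
dx = x[1] - x[0]
for n in range(5):
    psi = hermite_function(n, x)
    for xi in (0.3, 1.7):
        ft = np.sum(psi * np.exp(-1j * x * xi)) * dx / sqrt(2 * pi)
        assert np.isclose(ft, (-1j) ** n * hermite_function(n, xi))
```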

Recently, Ainsworth et al. showed that using weight matching (WM) to minimize the $L_2$ distance in a permutation search over model parameters effectively identifies permutations that satisfy linear mode connectivity (LMC), in which the loss along a linear path between two independently trained models with different seeds remains nearly constant. This paper provides a theoretical analysis of LMC using WM, which is crucial for understanding the effectiveness of stochastic gradient descent and for applications such as model merging. We first show, both experimentally and theoretically, that permutations found by WM do not significantly reduce the $L_2$ distance between two models, and that the occurrence of LMC is not merely due to distance reduction by WM itself. We then provide theoretical insights showing that permutations can change the directions of the singular vectors, but not the singular values, of the weight matrices in each layer. Consequently, the permutations found by WM mainly align the directions of singular vectors associated with large singular values across models. This alignment brings the singular vectors with large singular values, which determine the model's functionality, closer between the pre-merged and post-merged models, so that the post-merged model retains functionality similar to the pre-merged models, making it easy to satisfy LMC. Finally, we analyze the difference between WM and the straight-through estimator (STE), a dataset-dependent permutation search method, and show that WM outperforms STE, especially when merging three or more models.
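A small NumPy check (ours, not the paper's code) of the linear-algebra fact underlying this analysis: permuting the rows and columns of a layer's weight matrix leaves its singular values unchanged, while the singular vectors are permuted accordingly.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(6, 4))           # weight matrix of one layer
P = np.eye(6)[rng.permutation(6)]     # permutation acting on output units
Q = np.eye(4)[rng.permutation(4)]     # permutation acting on input units

# Permutation matrices are orthogonal, so P @ W @ Q.T has the same singular
# values as W; only the singular vectors change (they get permuted).
sv_original = np.linalg.svd(W, compute_uv=False)
sv_permuted = np.linalg.svd(P @ W @ Q.T, compute_uv=False)
assert np.allclose(sv_original, sv_permuted)
```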

Attention computation requires both $O(n^2)$ time and $O(n^2)$ space, which makes deploying Large Language Models (LLMs) in streaming applications that involve long contexts computationally demanding. At the recent OpenAI DevDay (Nov 6, 2023), OpenAI released a new model that supports a 128K-token document; in this paper, we focus on the memory-efficiency issue when the context length $n$ is much greater than 128K ($n \gg 2^d$). Considering a single-layer self-attention with Query, Key, and Value matrices $Q, K, V \in \mathbb{R}^{n \times d}$, the polynomial method approximates the attention output $T \in \mathbb{R}^{n \times d}$. It accomplishes this by constructing $U_1, U_2 \in \mathbb{R}^{n \times t}$ to expedite the computation of the attention ${\sf Attn}(Q, K, V)$ in $n^{1+o(1)}$ time. Despite this, computing the approximated attention matrix $U_1U_2^\top \in \mathbb{R}^{n \times n}$ still requires $O(n^2)$ space, leading to significant memory usage. In response to these challenges, we introduce a new algorithm that reads the data in a single streaming pass. This method employs sublinear space $o(n)$ to store three sketch matrices, alleviating the need to store $K$ and $V$ exactly. Notably, our algorithm exhibits excellent memory efficiency for super-long token sequences: as the token length $n$ increases, our error guarantee diminishes while the memory usage remains nearly constant. This unique attribute underscores the potential of our technique for efficiently handling LLMs in streaming applications.
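For intuition, here is a heavily simplified NumPy sketch (ours, not the paper's algorithm) of how a low-rank factorization $U_1 U_2^\top$ of the unnormalized attention matrix can be consumed in a single streaming pass so that the $n \times n$ matrix is never materialized; unlike the paper's sketch-based method, this toy version still holds $U_1$ and the output in full and uses no sublinear-space sketches.

```python
import numpy as np

def streaming_lowrank_attention(U1, U2, V):
    """Compute D^{-1} (U1 @ U2.T) V in one pass over the rows of U2 and V,
    where D is the diagonal of row sums, without forming the n x n matrix.
    The running state is only S (t x d) and z (t,)."""
    t, d = U1.shape[1], V.shape[1]
    S, z = np.zeros((t, d)), np.zeros(t)
    for j in range(len(V)):            # single streaming pass over keys/values
        S += np.outer(U2[j], V[j])
        z += U2[j]
    return (U1 @ S) / (U1 @ z)[:, None]

# Agrees with the dense computation up to floating-point error.
rng = np.random.default_rng(1)
n, t, d = 512, 8, 16
U1, U2, V = rng.random((n, t)), rng.random((n, t)), rng.normal(size=(n, d))
A = U1 @ U2.T
dense = (A @ V) / A.sum(axis=1, keepdims=True)
assert np.allclose(streaming_lowrank_attention(U1, U2, V), dense)
```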

We consider approximating so-called tame functions, a class of nonsmooth, nonconvex functions, with piecewise polynomial functions. Tame functions appear in a wide range of applications: functions encountered in the training of deep neural networks with all common activations, value functions of mixed-integer programs, or wave functions of small molecules. We bound the quality of approximation of a tame function by a piecewise polynomial function with a given number of segments on any full-dimensional cube. We also present the first ever mixed-integer programming formulation of piecewise polynomial regression. Together, these can be used to estimate tame functions. We demonstrate promising computational results.

The class XNLP consists of (parameterized) problems that can be solved nondeterministically in $f(k)n^{O(1)}$ time and $f(k)\log n$ space, where $n$ is the size of the input instance and $k$ the parameter. The class XALP consists of problems that can be solved in the above time and space with access to an additional stack. These two classes are a "natural home" for many standard graph problems and their generalizations. In this paper, we show the hardness of several problems on planar graphs, parameterized by outerplanarity, treewidth, and pathwidth, thus strengthening several existing results. In particular, we show the XNLP-hardness of the following problems parameterized by outerplanarity: All-or-Nothing Flow, Target Outdegree Orientation, Capacitated (Red-Blue) Dominating Set, and Target Set Selection, among others. We also show the XNLP-completeness of Scattered Set parameterized by pathwidth, and its XALP-completeness parameterized by treewidth and outerplanarity.

A popular and flexible time series model for counts is the generalized integer autoregressive process of order $p$, GINAR($p$). These Markov processes are defined using thinning operators evaluated on past values of the process, together with a discretely-valued innovation process. This class includes the commonly used INAR($p$) process, defined with binomial thinning and Poisson innovations. GINAR processes can be used in a variety of settings, including modeling time series with low counts, and allow for more general mean-variance relationships, capturing both over- and under-dispersion. While many thinning operators and innovation processes appear in the literature, less attention has been paid to comparing statistical inference and forecasting procedures across different choices of GINAR process. We provide an extensive study of exact and approximate inference and forecasting methods that can be applied to a wide class of GINAR($p$) processes with general thinning and innovation parameters. We discuss the challenges of exact estimation when $p$ is large. We summarize and extend asymptotic results for estimators of process parameters, and present simulations comparing small-sample performance across methods. We illustrate this methodology by fitting GINAR processes to a disease surveillance series.
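As a concrete instance of this model class, the INAR(1) special case (binomial thinning with Poisson innovations) can be simulated in a few lines; the sketch below is purely illustrative, with hypothetical function names and parameter values.

```python
import numpy as np

def simulate_inar1(n, alpha, lam, x0=0, seed=0):
    """Simulate X_t = alpha ∘ X_{t-1} + eps_t, where '∘' is binomial thinning
    (each of the X_{t-1} counts survives independently with probability alpha)
    and eps_t ~ Poisson(lam)."""
    rng = np.random.default_rng(seed)
    x, prev = np.empty(n, dtype=int), x0
    for t in range(n):
        prev = rng.binomial(prev, alpha) + rng.poisson(lam)
        x[t] = prev
    return x

# The stationary mean of this INAR(1) process is lam / (1 - alpha) = 4.
x = simulate_inar1(100_000, alpha=0.5, lam=2.0)
print(x.mean())  # roughly 4.0
```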

A subset $\mathcal{C}\subseteq\{0,1,2\}^n$ is said to be a $\textit{trifferent}$ code (of block length $n$) if for every three distinct codewords $x, y, z \in \mathcal{C}$, there is a coordinate $i\in \{1,2,\ldots,n\}$ where they all differ, that is, $\{x(i),y(i),z(i)\} = \{0,1,2\}$. Let $T(n)$ denote the size of the largest trifferent code of block length $n$. Understanding the asymptotic behavior of $T(n)$ is closely related to determining the zero-error capacity of the $(3/2)$-channel defined by Elias '88, and is a long-standing open problem in the area. Elias showed that $T(n)\leq 2\times (3/2)^n$, and prior to our work the best upper bound was $T(n)\leq 0.6937 \times (3/2)^n$ due to Kurz '23. We improve this bound to $T(n)\leq c \times n^{-2/5}\times (3/2)^n$, where $c$ is an absolute constant.
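For concreteness, the trifferent property can be checked by brute force on small codes; the helper below is purely illustrative and is applied to a size-4 code of block length 2.

```python
from itertools import combinations

def is_trifferent(code):
    """Return True if every 3 distinct codewords have a coordinate
    on which they take all three values 0, 1, and 2."""
    return all(
        any({x[i], y[i], z[i]} == {0, 1, 2} for i in range(len(x)))
        for x, y, z in combinations(code, 3)
    )

# A trifferent code of block length 2 with 4 codewords.
print(is_trifferent([(0, 0), (0, 1), (1, 2), (2, 2)]))  # True
```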

We obtain the first positive results for bounded sample compression in the agnostic regression setting with the $\ell_p$ loss, where $p\in [1,\infty]$. We construct a generic approximate sample compression scheme for real-valued function classes exhibiting exponential size in the fat-shattering dimension but independent of the sample size. Notably, for linear regression, an approximate compression of size linear in the dimension is constructed. Moreover, for the $\ell_1$ and $\ell_\infty$ losses, we even exhibit an efficient exact sample compression scheme of size linear in the dimension. We further show that for every other $\ell_p$ loss, $p\in (1,\infty)$, there does not exist an exact agnostic compression scheme of bounded size. This refines and generalizes a negative result of David, Moran, and Yehudayoff for the $\ell_2$ loss. We close by posing general open questions: for agnostic regression with the $\ell_1$ loss, does every function class admit an exact compression scheme of size equal to its pseudo-dimension? For the $\ell_2$ loss, does every function class admit an approximate compression scheme of polynomial size in the fat-shattering dimension? These questions generalize Warmuth's classic sample compression conjecture for realizable-case classification.
