
If $A$ and $B$ are sets such that $A \subset B$, generalisation may be understood as the inference from $A$ of a hypothesis sufficient to construct $B$. One might infer any number of hypotheses from $A$, yet only some of those may generalise to $B$. How can one know which are likely to generalise? One strategy is to choose the shortest, equating the ability to compress information with the ability to generalise (a proxy for intelligence). We examine this in the context of a mathematical formalism of enactive cognition. We show that compression is neither necessary nor sufficient to maximise performance (measured in terms of the probability of a hypothesis generalising). We formulate a proxy unrelated to length or simplicity, called weakness. We show that if tasks are uniformly distributed, then there is no choice of proxy that performs at least as well as weakness maximisation in all tasks while performing strictly better in at least one. In experiments comparing maximum weakness and minimum description length in the context of binary arithmetic, the former generalised at between $1.1$ and $5$ times the rate of the latter. We argue this demonstrates that weakness is a far better proxy, and explains why DeepMind's Apperception Engine is able to generalise effectively.
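A toy Monte Carlo can make the counting argument behind weakness maximisation concrete. The sketch below is not the paper's enactive formalism or its binary-arithmetic experiment: the universe, the hypothesis pool, and the description lengths are all invented for illustration. A hypothesis is identified with its extension (the subset of situations it deems correct), weakness with the extension's cardinality, and the hidden completion $B$ is drawn uniformly among supersets of the observed set $A$.

```python
# Toy Monte Carlo for the counting argument behind weakness maximisation
# (hedged sketch; not the paper's formalism or experiment).
import random

random.seed(0)
U = list(range(12))          # finite universe of situations
A = {0, 1, 2}                # observed part of the task
TRIALS = 10_000

# Candidate hypotheses: random subsets of U consistent with A (they
# contain A). Each gets an arbitrary "description length", uncorrelated
# with its extension, standing in for an MDL-style proxy.
def random_hypothesis():
    ext = set(A) | {u for u in U if u not in A and random.random() < 0.5}
    length = random.randint(1, 20)  # hypothetical description length
    return frozenset(ext), length

pool = [random_hypothesis() for _ in range(50)]
weakest = max(pool, key=lambda h: len(h[0]))   # weakness = |extension|
shortest = min(pool, key=lambda h: h[1])       # MDL-style choice

def generalisation_rate(h):
    ext, _ = h
    hits = 0
    for _ in range(TRIALS):
        # hidden completion B: uniform superset of A of fixed size
        B = set(A) | set(random.sample([u for u in U if u not in A], 3))
        hits += B <= ext                       # did the hypothesis cover B?
    return hits / TRIALS

print("weakest  :", generalisation_rate(weakest))
print("shortest :", generalisation_rate(shortest))
```

Because $B$ is uniform over supersets of $A$, the probability that $B \subseteq \mathrm{ext}(h)$ grows with $|\mathrm{ext}(h)|$, so the weakest consistent hypothesis dominates any length-based choice on average.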

Related Content

We conduct a systematic study of the approximation properties of the Transformer for sequence modeling with long, sparse, and complicated memory. We investigate the mechanisms through which different components of the Transformer, such as dot-product self-attention, positional encoding, and the feed-forward layer, affect its expressive power, and we study their combined effects by establishing explicit approximation rates. Our study reveals the roles of critical parameters in the Transformer, such as the number of layers and the number of attention heads. These theoretical insights are validated experimentally and offer natural suggestions for alternative architectures.
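For readers who want the three components named above side by side, here is a minimal single-head sketch in numpy (no layer norm, residuals, or multiple heads; all shapes and scales are illustrative, not the configurations analysed in the paper).

```python
# Minimal numpy sketch of dot-product self-attention, sinusoidal
# positional encoding, and a feed-forward layer (illustrative only).
import numpy as np

def positional_encoding(seq_len, d):
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d // 2)[None, :]
    angles = pos / np.power(10000.0, 2 * i / d)
    pe = np.zeros((seq_len, d))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

def attention(X, Wq, Wk, Wv):
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])      # scaled dot-product scores
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)    # row-wise softmax
    return weights @ V

def feed_forward(X, W1, b1, W2, b2):
    return np.maximum(X @ W1 + b1, 0) @ W2 + b2  # two-layer ReLU MLP

rng = np.random.default_rng(0)
T, d, d_ff = 8, 16, 32
X = rng.normal(size=(T, d)) + positional_encoding(T, d)
params = [rng.normal(size=(d, d)) * 0.1 for _ in range(3)]
H = attention(X, *params)
Y = feed_forward(H, rng.normal(size=(d, d_ff)) * 0.1, np.zeros(d_ff),
                 rng.normal(size=(d_ff, d)) * 0.1, np.zeros(d))
print(Y.shape)  # (8, 16)
```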

Finding a good approximation of the top eigenvector of a given $d\times d$ matrix $A$ is a basic and important computational problem, with many applications. We give two different quantum algorithms that, given query access to the entries of a Hermitian matrix $A$ and assuming a constant eigenvalue gap, output a classical description of a good approximation of the top eigenvector: one algorithm with time complexity $\tilde{O}(d^{1.75})$ and one with time complexity $d^{1.5+o(1)}$ (the first algorithm has a slightly better dependence on the $\ell_2$-error of the approximating vector than the second, and uses different techniques of independent interest). Both of our quantum algorithms provide a polynomial speed-up over the best-possible classical algorithm, which needs $\Omega(d^2)$ queries to entries of $A$, and hence $\Omega(d^2)$ time. We extend this to a quantum algorithm that outputs a classical description of the subspace spanned by the top-$q$ eigenvectors in time $qd^{1.5+o(1)}$. We also prove a nearly-optimal lower bound of $\tilde{\Omega}(d^{1.5})$ on the quantum query complexity of approximating the top eigenvector. Our quantum algorithms run a version of the classical power method that is robust to certain benign kinds of errors, where we implement each matrix-vector multiplication with small and well-behaved error on a quantum computer, in different ways for the two algorithms. Our first algorithm estimates the matrix-vector product one entry at a time, using a new ``Gaussian phase estimation'' procedure. Our second algorithm uses block-encoding techniques to compute the matrix-vector product as a quantum state, from which we obtain a classical description by a new time-efficient unbiased pure-state tomography procedure.
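The classical skeleton both algorithms build on, power iteration that tolerates a small error in each matrix-vector product, is easy to simulate. The sketch below plants a top eigenvector with a constant gap and injects Gaussian error in place of the quantum estimation error; the noise model and all constants are illustrative only.

```python
# Noisy power method: a classical stand-in for the robust iteration the
# quantum algorithms emulate (Gaussian error replaces estimation error).
import numpy as np

rng = np.random.default_rng(1)
d = 200
A = rng.normal(size=(d, d))
A = (A + A.T) / (2 * np.sqrt(d))        # bulk spectrum roughly in [-1.5, 1.5]
u = rng.normal(size=d); u /= np.linalg.norm(u)
A += 3.0 * np.outer(u, u)               # plant a top eigenvector, constant gap

v = rng.normal(size=d); v /= np.linalg.norm(v)
for _ in range(100):
    w = A @ v + 1e-3 * rng.normal(size=d)   # benign per-step error
    v = w / np.linalg.norm(w)

top = np.linalg.eigh(A)[1][:, -1]           # exact top eigenvector
print("overlap with true top eigenvector:", abs(top @ v))
```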

We study stochastic approximation algorithms with Markovian noise and constant step-size $\alpha$. We develop a method based on infinitesimal generator comparisons to study the bias of the algorithm, which is the expected difference between $\theta_n$, the value at iteration $n$, and $\theta^*$, the unique equilibrium of the corresponding ODE. We show that, under some smoothness conditions, this bias is of order $O(\alpha)$. Furthermore, we show that the time-averaged bias is equal to $\alpha V + O(\alpha^2)$, where $V$ is a constant characterized by a Lyapunov equation, so that $\mathbb{E}[\bar{\theta}_n] \approx \theta^* + V\alpha + O(\alpha^2)$, where $\bar{\theta}_n=(1/n)\sum_{k=1}^n\theta_k$ is the Polyak-Ruppert average. We also show that $\bar{\theta}_n$ concentrates with high probability around $\theta^*+\alpha V$. We illustrate how to combine this with Richardson-Romberg extrapolation to derive an iterative scheme with a bias of order $O(\alpha^2)$.
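A scalar example makes the $\alpha V$ bias and its cancellation visible. In the hedged sketch below, the noise is a two-state Markov chain with a state-dependent gain (all constants are invented for illustration), and the Richardson-Romberg combination $2\bar{\theta}(\alpha) - \bar{\theta}(2\alpha)$ removes the first-order term.

```python
# Constant step-size stochastic approximation with Markovian noise, plus
# Richardson-Romberg extrapolation of the Polyak-Ruppert average
# (illustrative constants; expect residual Monte Carlo noise).
import numpy as np

rng = np.random.default_rng(2)
P = np.array([[0.9, 0.1], [0.2, 0.8]])   # Markov transition matrix
a = np.array([1.0, 3.0])                 # state-dependent gain
c = np.array([1.0, 2.0])
pi = np.array([2/3, 1/3])                # stationary distribution of P
theta_star = pi @ c / (pi @ a)           # root of the mean ODE (= 0.8)

def polyak_average(alpha, n=500_000):
    x, theta, total = 0, theta_star, 0.0
    for _ in range(n):
        x = rng.choice(2, p=P[x])                 # Markovian noise
        theta += alpha * (c[x] - a[x] * theta)    # SA update
        total += theta
    return total / n                              # Polyak-Ruppert average

b1, b2 = polyak_average(0.05), polyak_average(0.10)
print("bias at alpha      :", b1 - theta_star)
print("bias at 2*alpha    :", b2 - theta_star)
print("Richardson-Romberg :", 2 * b1 - b2 - theta_star)  # O(alpha^2)
```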

Given a graph $M$, its path eigenvalues are the eigenvalues of its path matrix. The path energy of a simple graph $M$ is the sum of the absolute values of the path eigenvalues of $M$ (Shikare et al., 2018). In this study, we derive new upper bounds on the path energy, expressed in terms of the graph's maximum degree. Additionally, we give a relationship between a graph's energy and its path energy.
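As a concrete illustration, the sketch below computes a path matrix and both energies for a small cycle graph. It takes the $(u,v)$ entry to be the maximum number of internally vertex-disjoint $u$-$v$ paths, computed via networkx's local node connectivity; whether this matches Shikare et al.'s exact convention (e.g., for adjacent vertices) should be checked against the original definition.

```python
# Hedged sketch: path matrix via local node connectivity, then energy as
# the sum of absolute eigenvalues (assumed convention, see lead-in).
import numpy as np
import networkx as nx
from networkx.algorithms.connectivity import local_node_connectivity

def path_matrix(G):
    nodes = list(G)
    n = len(nodes)
    P = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            # max number of internally vertex-disjoint paths (Menger)
            P[i, j] = P[j, i] = local_node_connectivity(G, nodes[i], nodes[j])
    return P

def energy(M):
    return np.abs(np.linalg.eigvalsh(M)).sum()   # sum of |eigenvalues|

G = nx.cycle_graph(6)
print("path energy      :", energy(path_matrix(G)))
print("adjacency energy :", energy(nx.to_numpy_array(G)))
```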

Minimum Weight Cycle (MWC) is the problem of finding a simple cycle of minimum weight in a graph $G=(V,E)$. This is a fundamental graph problem with classical sequential algorithms that run in $\tilde{O}(n^3)$ and $\tilde{O}(mn)$ time, where $n=|V|$ and $m=|E|$. In recent years this problem has received significant attention in the context of fine-grained sequential complexity as well as in the design of faster sequential approximation algorithms, though not much is known in the distributed CONGEST model. We present sublinear-round approximation algorithms for computing MWC in directed graphs and in weighted graphs. Our algorithms use a variety of techniques in non-trivial ways: our approximate directed unweighted MWC algorithm efficiently computes BFS from all vertices restricted to certain implicitly computed neighborhoods in sublinear rounds, and our weighted approximation algorithms run unweighted MWC algorithms on scaled graphs, combined with a fast and streamlined method for computing multiple-source approximate SSSP. We complement these with $\tilde{\Omega}(\sqrt{n})$ lower bounds for arbitrary constant-factor approximation of MWC in directed graphs and undirected weighted graphs.
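The scaling ingredient can be demonstrated on a toy instance: rounding edge weights up to a grid of $\varepsilon W / n$ changes the weight of any simple cycle (at most $n$ hops) by less than a $(1+\varepsilon)$ factor. The sketch below verifies this by brute force; it is illustrative only, not the distributed algorithm, and it uses the true MWC weight as a stand-in for the algorithm's weight guess.

```python
# Toy demo of weight scaling for approximate MWC (brute force, hedged).
import math, random
import networkx as nx

random.seed(3)
G = nx.DiGraph()
nx.add_cycle(G, range(9))                 # guarantee at least one cycle
while G.number_of_edges() < 25:
    u, v = random.sample(range(9), 2)
    G.add_edge(u, v)
for u, v in G.edges:
    G[u][v]["w"] = random.randint(1, 100)

def mwc(graph, weight_of):
    best = math.inf
    for cyc in nx.simple_cycles(graph):   # brute-force cycle enumeration
        w = sum(weight_of(cyc[i], cyc[(i + 1) % len(cyc)])
                for i in range(len(cyc)))
        best = min(best, w)
    return best

eps = 0.25
n = G.number_of_nodes()
W = mwc(G, lambda u, v: G[u][v]["w"])     # stand-in for a weight guess
scale = eps * W / n                       # grid: n hops add < eps * W
approx = mwc(G, lambda u, v: math.ceil(G[u][v]["w"] / scale) * scale)
print(W, "<=", approx, "<", (1 + eps) * W)
```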

In prior work, Gupta et al. (SPAA 2022) presented a distributed algorithm for multiplying sparse $n \times n$ matrices, using $n$ computers. They assumed that the input matrices are uniformly sparse (at most $d$ non-zeros in each row and column) and that the task is to compute a uniformly sparse part of the product matrix. The sparsity structure is globally known in advance (this is the supported setting). As input, each computer receives one row of each input matrix, and each computer needs to output one row of the product matrix. In each communication round each computer can send and receive one $O(\log n)$-bit message. Their algorithm solves this task in $O(d^{1.907})$ rounds, while the trivial bound is $O(d^2)$. We improve on the prior work in two dimensions. First, we show that we can solve the same task faster, in only $O(d^{1.832})$ rounds. Second, we explore what happens when matrices are not uniformly sparse, considering the following alternative notions of sparsity: row-sparse matrices (at most $d$ non-zeros per row), column-sparse matrices, matrices with bounded degeneracy (we can recursively delete a row or column with at most $d$ non-zeros), average-sparse matrices (at most $dn$ non-zeros in total), and general matrices.
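Of these notions, bounded degeneracy is the least standard, and a small greedy check makes it concrete. The sketch below tests whether a 0/1 sparsity pattern is $d$-degenerate by repeatedly deleting any row or column with at most $d$ remaining non-zeros (names and parameters are illustrative; this is a definition check, not the paper's algorithm).

```python
# Greedy check of d-degeneracy for a sparsity pattern (hedged sketch).
import numpy as np

def is_d_degenerate(pattern, d):
    pat = pattern.copy().astype(bool)
    rows = set(range(pat.shape[0]))
    cols = set(range(pat.shape[1]))
    while rows or cols:
        # find a deletable row or column (<= d remaining non-zeros)
        r = next((i for i in rows if pat[i, list(cols)].sum() <= d), None)
        c = next((j for j in cols if pat[list(rows), j].sum() <= d), None)
        if r is not None:
            pat[r, :] = False; rows.remove(r)
        elif c is not None:
            pat[:, c] = False; cols.remove(c)
        else:
            return False    # every remaining line has > d non-zeros
    return True

rng = np.random.default_rng(4)
A = rng.random((20, 20)) < 0.1     # random sparsity pattern
print(is_d_degenerate(A, d=4))
```

Greedy deletion is safe here: removing a row or column never increases any other line's non-zero count, so if any deletion order empties the pattern, the greedy order does too.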

Given an undirected connected graph $G = (V(G), E(G))$ on $n$ vertices, the minimum Monitoring Edge-Geodetic Set (MEG-set) problem asks to find a subset $M \subseteq V(G)$ of minimum cardinality such that, for every edge $e \in E(G)$, there exist $x,y \in M$ for which all shortest paths between $x$ and $y$ in $G$ traverse $e$. We show that, for any constant $c < \frac{1}{2}$, no polynomial-time $(c \log n)$-approximation algorithm for the minimum MEG-set problem exists, unless $\mathsf{P} = \mathsf{NP}$.

We provide a self-contained proof of a result of Dudley [Dud64], which shows that a bounded convex body in $\mathbb{R}^d$ can be $\varepsilon$-approximated by the intersection of $O_d\bigl(\varepsilon^{-(d-1)/2}\bigr)$ halfspaces, where $O_d$ hides constants that depend on $d$.

We provide a framework to analyze the convergence of discretized kinetic Langevin dynamics for $M$-$\nabla$Lipschitz, $m$-convex potentials. Our approach gives convergence rates of $\mathcal{O}(m/M)$, with explicit stepsize restrictions, which are of the same order as the stability threshold for Gaussian targets and are valid for a large interval of the friction parameter. We apply this methodology to various integration schemes which are popular in the molecular dynamics and machine learning communities. Further, we introduce the property ``$\gamma$-limit convergent'' (GLC) to characterize underdamped Langevin schemes that converge to overdamped dynamics in the high-friction limit and which have stepsize restrictions that are independent of the friction parameter; we show that this property is not generic by exhibiting methods from both the class and its complement. Finally, we provide asymptotic bias estimates for the BAOAB scheme, which remain accurate in the high-friction limit by comparison to a modified stochastic dynamics which preserves the invariant measure.
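For reference, the BAOAB splitting itself is short. The sketch below applies it to a one-dimensional quadratic potential at unit mass and temperature (the step size and friction are illustrative, not the paper's stepsize thresholds); the O step solves the Ornstein-Uhlenbeck part of the dynamics exactly.

```python
# Minimal BAOAB (Leimkuhler-Matthews splitting) for kinetic Langevin
# dynamics with unit mass and kT = 1, sampling U(q) = q^2 / 2.
import numpy as np

def baoab(grad_U, q, p, h, gamma, n_steps, rng):
    c1 = np.exp(-gamma * h)              # OU decay over one step
    c2 = np.sqrt(1.0 - c1 * c1)          # OU fluctuation coefficient
    samples = np.empty(n_steps)
    for k in range(n_steps):
        p -= 0.5 * h * grad_U(q)         # B: half kick
        q += 0.5 * h * p                 # A: half drift
        p = c1 * p + c2 * rng.normal()   # O: exact Ornstein-Uhlenbeck step
        q += 0.5 * h * p                 # A: half drift
        p -= 0.5 * h * grad_U(q)         # B: half kick
        samples[k] = q
    return samples

rng = np.random.default_rng(5)
qs = baoab(lambda q: q, q=0.0, p=0.0, h=0.2, gamma=10.0,
           n_steps=200_000, rng=rng)
print("sampled Var[q]:", qs.var(), "(target 1.0 at kT = 1)")
```

Even at this fairly large friction the configurational bias stays small, which is the qualitative behavior the abstract's high-friction analysis addresses.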

The article presents a systematic study of the problem of conditioning a Gaussian random variable $\xi$ on nonlinear observations of the form $F \circ \phi(\xi)$, where $\phi: \mathcal{X} \to \mathbb{R}^N$ is a bounded linear operator and $F$ is nonlinear. Such problems arise in the context of Bayesian inference and recent machine learning-inspired PDE solvers. We give a representer theorem for the conditioned random variable $\xi \mid F\circ \phi(\xi)$, stating that it decomposes as the sum of an infinite-dimensional Gaussian (which is identified analytically) and a finite-dimensional non-Gaussian measure. We also introduce a novel notion of the mode of a conditional measure by taking the limit of a natural relaxation of the problem, to which we can apply the existing notion of maximum a posteriori estimators of posterior measures. Finally, we introduce a variant of the Laplace approximation for the efficient simulation of the aforementioned conditioned Gaussian random variables, with a view towards uncertainty quantification.
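A finite-dimensional caricature of the relaxed conditioning is easy to write down: observe $y = F(\phi\xi)$, relax with a small noise level $\varepsilon$, compute the MAP point of the relaxed posterior, and take an inverse Hessian as a crude Laplace covariance. In the sketch below, $F$, $\phi$, and $\varepsilon$ mirror the abstract's notation, but all concrete choices are invented, and the BFGS inverse-Hessian merely stands in for a proper Laplace Hessian.

```python
# Finite-dimensional, hedged sketch of relaxed conditioning with a crude
# Laplace approximation (standard Gaussian prior on R^n).
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(6)
n, N, eps = 10, 3, 0.1
phi = rng.normal(size=(N, n))            # bounded linear observation map
F = lambda z: np.tanh(z)                 # nonlinearity, applied pointwise
xi_true = rng.normal(size=n)
y = F(phi @ xi_true)                     # noiseless nonlinear observation

def neg_log_post(xi):
    # relaxed posterior: prior term + soft observation constraint
    r = F(phi @ xi) - y
    return 0.5 * xi @ xi + 0.5 * (r @ r) / eps**2

res = minimize(neg_log_post, np.zeros(n), method="BFGS")
map_point, laplace_cov = res.x, res.hess_inv
print("MAP residual     :", np.linalg.norm(F(phi @ map_point) - y))
print("Laplace cov trace:", np.trace(laplace_cov))
```

As $\varepsilon \to 0$ the relaxed MAP point approaches the conditional mode in the sense described in the abstract; here $\varepsilon$ is kept moderate so the unpreconditioned optimizer remains well behaved.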
