
Given $\mathbf A \in \mathbb{R}^{n \times n}$ with entries bounded in magnitude by $1$, it is well known that if $S \subset [n] \times [n]$ is a uniformly random subset of $s = \tilde{O}(n/\epsilon^2)$ entries, and if ${\mathbf A}_S$ equals $\mathbf A$ on the entries in $S$ and is zero elsewhere, then $\|\mathbf A - \frac{n^2}{s} \cdot {\mathbf A}_S\|_2 \le \epsilon n$ with high probability, where $\|\cdot\|_2$ is the spectral norm. We show that for positive semidefinite (PSD) matrices, no randomness is needed at all in this statement. Namely, there exists a fixed subset $S$ of $\tilde{O}(n/\epsilon^2)$ entries that acts as a universal sparsifier: the above error bound holds simultaneously for every bounded-entry PSD matrix $\mathbf A \in \mathbb{R}^{n \times n}$. One can view this result as a significant extension of Ramanujan expander graphs: such a set $S$ sparsifies every bounded-entry PSD matrix, not just the all-ones matrix. We leverage the existence of such universal sparsifiers to give the first deterministic algorithms for several central problems related to singular value computation that run in faster than matrix multiplication time. We also prove universal sparsification bounds for non-PSD matrices, showing that $\tilde{O}(n/\epsilon^4)$ entries suffice to achieve error $\epsilon \cdot \max(n,\|\mathbf A\|_1)$, where $\|\mathbf A\|_1$ is the trace norm. We prove that this is optimal up to an $\tilde{O}(1/\epsilon^2)$ factor. Finally, we give an improved deterministic spectral approximation algorithm for PSD $\mathbf A$ with entries lying in $\{-1,0,1\}$, which we show is nearly information-theoretically optimal.
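The randomized statement in the first sentence is easy to try out. A minimal sketch, assuming NumPy, sampling $S$ uniformly without replacement, and using the rank-one matrix $vv^\top$ with $v \in \{-1,1\}^n$ as a bounded-entry PSD test matrix (all illustrative choices; the helper names `sparsify` and `spectral_error` are hypothetical): it draws a random $S$ of size $s$, rescales by $n^2/s$, and measures the spectral-norm error. It does not construct the fixed universal set $S$ whose existence the paper proves.

```python
import numpy as np

def sparsify(A, s, rng):
    """Keep a uniformly random set S of s entries of A (sampled without
    replacement, an assumption here) and rescale them by n^2 / s."""
    n = A.shape[0]
    idx = rng.choice(n * n, size=s, replace=False)  # flat indices of S
    A_S = np.zeros_like(A)
    A_S.flat[idx] = A.flat[idx]
    return (n * n / s) * A_S

def spectral_error(A, s, seed=0):
    """Spectral norm ||A - (n^2/s) A_S||_2 for one random draw of S."""
    rng = np.random.default_rng(seed)
    return np.linalg.norm(A - sparsify(A, s, rng), ord=2)

# Bounded-entry PSD test matrix: v v^T with v in {-1, 1}^n.
n = 100
v = np.random.default_rng(1).choice([-1.0, 1.0], size=n)
A = np.outer(v, v)
err = spectral_error(A, s=n * n // 4)
print(err / n)  # relative error, to be compared with epsilon
```

For this rank-one test matrix the error behaves like the spectral norm of a mean-zero random matrix, so it grows roughly in proportion to $n\sqrt{n/s}$ as $s$ shrinks; with $s = n^2/4$ it is a modest fraction of $n$.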

Related content

Ferrers diagram rank-metric codes were introduced by Etzion and Silberstein in 2009. In their work, they proposed a conjecture on the largest dimension of a space of matrices over a finite field whose nonzero elements are supported on a given Ferrers diagram and all have rank bounded below by a fixed positive integer $d$. Since it was stated, the Etzion-Silberstein conjecture has been verified in a number of cases, often requiring additional constraints on the field size or on the minimum rank $d$ depending on the corresponding Ferrers diagram. To date, the conjecture remains largely open. Using modular methods, we give a constructive proof of the Etzion-Silberstein conjecture for the class of strictly monotone Ferrers diagrams, which does not depend on the minimum rank $d$ and holds over every finite field. In addition, we leverage this result to also prove the conjecture for the class of MDS-constructible Ferrers diagrams, without any restriction on the field size.
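The conjecture asserts that a known upper bound on this dimension is attained. As that bound is usually stated (the abstract does not spell it out, so treat this as an assumption), it equals $\min_{0 \le i \le d-1} \nu_i$, where $\nu_i$ counts the dots of the diagram that lie neither in the first $i$ rows nor in the rightmost $d-1-i$ columns. A minimal sketch computing it; the encoding of the diagram by its column lengths (top-aligned, nondecreasing from left to right) and the name `es_bound` are illustrative choices:

```python
def es_bound(cols, d):
    """Etzion-Silberstein upper bound min_i nu_i for a Ferrers diagram.

    cols: column lengths c_1 <= ... <= c_m, columns top-aligned and read
          left to right (so the rightmost column is the longest).
    d:    minimum rank of the nonzero matrices in the code.
    """
    m = len(cols)
    best = None
    for i in range(d):
        keep = max(m - (d - 1 - i), 0)                 # drop the rightmost d-1-i columns
        nu = sum(max(c - i, 0) for c in cols[:keep])   # drop the top i rows
        best = nu if best is None else min(best, nu)
    return best

print(es_bound([3, 3, 3], 2))  # full 3x3 diagram, d = 2
print(es_bound([1, 2, 3], 2))  # staircase diagram, d = 2
```

For a full $n \times m$ rectangle with $n \le m$ the minimum reduces to the Singleton-type value $m(n-d+1)$, e.g. $6$ for a $3 \times 3$ rectangle with $d = 2$.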

It is well known that the Euler method for approximating the solutions of a random ordinary differential equation $\mathrm{d}X_t/\mathrm{d}t = f(t, X_t, Y_t)$ driven by a stochastic process $\{Y_t\}_t$ with $\theta$-H\"older sample paths is estimated to be of strong order $\theta$ with respect to the time step, provided $f=f(t, x, y)$ is sufficiently regular and satisfies suitable bounds. Here, it is proved that, in many typical cases, further conditions on the noise can be exploited so that the strong convergence is actually of order 1, regardless of the H\"older regularity of the sample paths. This applies, for instance, to additive or multiplicative It\^o process noises (such as Wiener, Ornstein-Uhlenbeck, and geometric Brownian motion processes); to point-process noises (such as Poisson point processes and Hawkes self-exciting processes, which even have jump-type discontinuities); and to transport-type processes with sample paths of bounded variation. The result is based on a novel approach: the global error is estimated as an iterated integral over both large and small mesh scales, and the order of integration is switched to move the critical regularity to the large scale. The work is complemented with numerical simulations illustrating the strong order 1 convergence in those cases, and with an example with fractional Brownian motion noise with Hurst parameter $0 < H < 1/2$, for which the order of convergence is $H + 1/2$: lower than the order 1 attained in the examples above, but still higher than the order $H$ expected from previous works.
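The order-1 claim for additive Wiener noise can be probed numerically. A minimal sketch, assuming the illustrative equation $\mathrm{d}X_t/\mathrm{d}t = -X_t + W_t$ (not taken from the paper) and using a fine-grid Euler solution on the same noise paths as a proxy for the exact solution; the fitted log-log slope of the mean absolute error at time $T$ estimates the strong order. The function names and grid sizes are hypothetical choices.

```python
import numpy as np

def euler_rode(f, x0, W, T):
    """Euler scheme X_{k+1} = X_k + h * f(t_k, X_k, W_{t_k}) for dX/dt = f(t, X, W_t).

    W holds the noise paths on a uniform grid, shape (paths, N + 1)."""
    paths, n_plus_1 = W.shape
    N = n_plus_1 - 1
    h = T / N
    x = np.full(paths, x0, dtype=float)
    for k in range(N):
        x = x + h * f(k * h, x, W[:, k])
    return x

def estimate_strong_order(f, x0=1.0, T=1.0, fine_exp=12,
                          coarse_exps=(4, 5, 6, 7, 8), paths=200, seed=0):
    rng = np.random.default_rng(seed)
    n_fine = 2 ** fine_exp
    dW = rng.normal(0.0, np.sqrt(T / n_fine), size=(paths, n_fine))
    W = np.hstack([np.zeros((paths, 1)), np.cumsum(dW, axis=1)])  # Wiener paths
    x_ref = euler_rode(f, x0, W, T)  # fine-grid proxy for the exact solution
    hs, errs = [], []
    for e in coarse_exps:
        stride = n_fine // 2 ** e
        x_h = euler_rode(f, x0, W[:, ::stride], T)  # same paths, coarser grid
        hs.append(T / 2 ** e)
        errs.append(np.mean(np.abs(x_h - x_ref)))  # strong (L^1) error at time T
    slope, _ = np.polyfit(np.log(hs), np.log(errs), 1)
    return slope

# Additive Wiener noise: dX/dt = -X + W_t.
order = estimate_strong_order(lambda t, x, w: -x + w)
print(order)
```

Wiener paths are Hölder only of order below 1/2, so the classical estimate would suggest roughly order 1/2; in line with the abstract, the fitted slope is expected to come out close to 1.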

The goal of this paper is to understand how exponential-time approximation algorithms can be obtained from existing polynomial-time approximation algorithms, existing parameterized exact algorithms, and existing parameterized approximation algorithms. More formally, we consider a monotone subset minimization problem over a universe of size $n$ (e.g., Vertex Cover or Feedback Vertex Set). We have access to an algorithm that finds an $\alpha$-approximate solution in time $c^k \cdot n^{O(1)}$ if a solution of size $k$ exists (and more generally, an extension algorithm that can approximate in a similar way if a set can be extended to a solution with $k$ further elements). Our goal is to obtain a $d^n \cdot n^{O(1)}$ time $\beta$-approximation algorithm for the problem with $d$ as small as possible. That is, for every fixed $\alpha,c,\beta \geq 1$, we would like to determine the smallest possible $d$ that can be achieved in a model where our problem-specific knowledge is limited to checking the feasibility of a solution and invoking the $\alpha$-approximate extension algorithm. Our results completely resolve this question: (1) For every fixed $\alpha,c,\beta \geq 1$, a simple algorithm (``approximate monotone local search'') achieves the optimum value of $d$. (2) Given $\alpha,c,\beta \geq 1$, we can efficiently compute the optimum $d$ up to any precision $\varepsilon > 0$. Earlier work presented algorithms (but no lower bounds) for the special case $\alpha = \beta = 1$ [Fomin et al., J. ACM 2019] and for the special case $\alpha = \beta > 1$ [Esmer et al., ESA 2022]. Our work generalizes these results and in particular confirms that the earlier algorithms are optimal in these special cases.
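For the special case $\alpha = \beta = 1$ cited above, the base $d$ can be recovered by a short calculation. Assuming the usual monotone local search scheme (sample a random $t$-subset of the universe, then run the $c^{k-t}$-time extension algorithm), the cost for a solution of size $k$ is $\binom{n}{t} c^{k-t} / \binom{k}{t}$ up to polynomial factors. Writing $k = \kappa n$, $t = \tau n$ and using $\binom{n}{\lambda n} \approx e^{n H(\lambda)}$ with $H$ the binary entropy in nats gives $d = \exp\big(\max_{\kappa} \min_{0 \le \tau \le \kappa} [H(\tau) - \kappa H(\tau/\kappa) + (\kappa - \tau)\ln c]\big)$. A grid-search sketch (the function names and grid resolution are arbitrary choices):

```python
import numpy as np

def entropy(x):
    """Binary entropy in nats, with H(0) = H(1) = 0; entries outside (0, 1) map to 0."""
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    m = (x > 0) & (x < 1)
    out[m] = -(x[m] * np.log(x[m]) + (1 - x[m]) * np.log(1 - x[m]))
    return out

def mls_base(c, grid=1001):
    """Base d of the d^n running time: worst solution size k, best sample size t."""
    kappa = np.linspace(0.0, 1.0, grid)[:, None]  # k / n (rows)
    tau = np.linspace(0.0, 1.0, grid)[None, :]    # t / n (columns)
    ratio = np.divide(tau, kappa, out=np.zeros((grid, grid)), where=kappa > 0)
    expo = entropy(tau) - kappa * entropy(ratio) + (kappa - tau) * np.log(c)
    expo = np.where(tau <= kappa, expo, np.inf)   # only t <= k is allowed
    return float(np.exp(expo.min(axis=1).max()))

print(mls_base(2.0), mls_base(3.0))
```

The computed values match the known base $2 - 1/c$ of Fomin et al., e.g. $1.5$ for $c = 2$ and $5/3$ for $c = 3$.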

Separating signals from an additive mixture may be an unnecessarily hard problem when one is only interested in specific properties of a given signal. In this work, we tackle simpler "statistical component separation" problems that focus on recovering a predefined set of statistical descriptors of a target signal from a noisy mixture. Assuming access to samples of the noise process, we investigate a method devised to match the statistics of the solution candidate corrupted by noise samples with those of the observed mixture. We first analyze the behavior of this method using simple examples with analytically tractable calculations. Then, we apply it in an image denoising context, employing either 1) wavelet-based descriptors or 2) ConvNet-based descriptors, on astrophysics and ImageNet data. In the case of 1), we show that our method recovers the descriptors of the target data better than a standard denoising method in most situations. Additionally, despite not being constructed for this purpose, it performs surprisingly well in terms of peak signal-to-noise ratio on full signal reconstruction. In comparison, representation 2) appears less suitable for image denoising. Finally, we extend this method by introducing a diffusive stepwise algorithm, which gives a new perspective on the initial method and leads to promising results for image denoising under specific circumstances.
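The statistic-matching idea can be illustrated on a toy problem in the spirit of the analytically tractable examples mentioned above. A minimal sketch, assuming a hypothetical descriptor $\phi(x) = (\text{mean}, \text{second moment})$ of a 1-D signal, a mixture $y = x + n$, and $M$ independent noise samples $n_i$: starting from $u = y$, gradient descent on $L(u) = \frac{1}{M}\sum_i \|\phi(u + n_i) - \phi(y)\|^2$ pulls the descriptors of $u$ from those of the mixture toward those of the target. The signal, noise level, step size and function names are illustrative assumptions, not the paper's setup.

```python
import numpy as np

def descriptors(x):
    """Toy descriptor phi(x) = (mean, second moment) along the last axis."""
    return np.stack([x.mean(axis=-1), (x ** 2).mean(axis=-1)], axis=-1)

def separate(y, noise, steps=500, lr=0.05):
    """Gradient descent on L(u) = mean_i ||phi(u + n_i) - phi(y)||^2.

    y: observed mixture, shape (N,); noise: noise samples n_i, shape (M, N)."""
    target = descriptors(y)
    u = y.copy()
    for _ in range(steps):
        z = u + noise                  # candidate corrupted by each noise sample
        r = descriptors(z) - target    # residuals, shape (M, 2)
        # N * dL/du_j = (2 / M) * sum_i [r_i0 + 2 * r_i1 * z_ij]  (step rescaled by N)
        u = u - lr * 2.0 * (r[:, :1] + 2.0 * r[:, 1:] * z).mean(axis=0)
    return u

rng = np.random.default_rng(0)
N, M = 4000, 20
x = 1.0 + np.sin(2 * np.pi * 5 * np.arange(N) / N)  # target: mean 1, second moment 1.5
y = x + rng.normal(size=N)                           # mixture with unit-variance noise
u = separate(y, rng.normal(size=(M, N)))
print(descriptors(y), descriptors(u))
```

The mixture's second moment is inflated by roughly the noise variance (to about 2.5), while that of the recovered $u$ lands approximately at the target's 1.5.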

Existing theories of deep nonparametric regression have shown that when the input data lie on a low-dimensional manifold, deep neural networks can adapt to the intrinsic data structures. In real-world applications, the assumption that data lie exactly on a low-dimensional manifold is stringent. This paper introduces a relaxed assumption: the input data are concentrated around a subset of $\mathbb{R}^d$ denoted by $\mathcal{S}$, and the intrinsic dimension of $\mathcal{S}$ can be characterized by a new complexity notion, the effective Minkowski dimension. We prove that the sample complexity of deep nonparametric regression depends only on the effective Minkowski dimension of $\mathcal{S}$, denoted by $p$. We further illustrate our theoretical findings by considering nonparametric regression with an anisotropic Gaussian random design $N(0,\Sigma)$, where $\Sigma$ is full rank. When the eigenvalues of $\Sigma$ have an exponential or polynomial decay, the effective Minkowski dimension of such a Gaussian random design is $p=\mathcal{O}(\sqrt{\log n})$ or $p=\mathcal{O}(n^\gamma)$, respectively, where $n$ is the sample size and $\gamma\in(0,1)$ is a small constant depending on the polynomial decay rate. Our theory shows that, when the manifold assumption does not hold, deep neural networks can still adapt to the effective Minkowski dimension of the data and circumvent the curse of ambient dimensionality for moderate sample sizes.

Recent approaches in self-supervised learning of image representations can be categorized into different families of methods and, in particular, can be divided into contrastive and non-contrastive approaches. While differences between the two families have been thoroughly discussed to motivate new approaches, we focus instead on the theoretical similarities between them. By designing contrastive and covariance-based non-contrastive criteria that can be related algebraically and shown to be equivalent under limited assumptions, we show how close those families can be. We further study popular methods and introduce variations of them, allowing us to relate this theoretical result to current practices and show the influence (or lack thereof) of design choices on downstream performance. Motivated by our equivalence result, we investigate the low performance of SimCLR and show how it can match VICReg's performance with careful hyperparameter tuning, improving significantly over known baselines. We also challenge the popular assumption that non-contrastive methods need large output dimensions. Our theoretical and quantitative results suggest that the performance gaps between contrastive and non-contrastive methods in certain regimes can be closed given better network design choices and hyperparameter tuning. The evidence shows that unifying different SOTA methods is an important direction to build a better understanding of self-supervised learning.
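One elementary algebraic link between the two families can be checked directly. Assuming, for illustration, that both criteria are built from an embedding matrix $K$ of shape $d \times N$ (embedding dimension by batch size), a sample-contrastive criterion penalizes the off-diagonal entries of the Gram matrix $K^\top K$, while a covariance-based criterion penalizes those of $K K^\top$. Because $\|K^\top K\|_F = \|K K^\top\|_F$, the two penalties differ only by diagonal terms. The function names below are hypothetical and the criteria are simplified stand-ins, not the exact losses of any named method.

```python
import numpy as np

def offdiag_sq(M):
    """Squared Frobenius norm of the off-diagonal part of a square matrix."""
    return float(np.sum(M ** 2) - np.sum(np.diag(M) ** 2))

def duality_terms(K):
    """K: embedding matrix of shape (d, N), one column per sample."""
    l_contrastive = offdiag_sq(K.T @ K)  # N x N similarities between samples
    l_covariance = offdiag_sq(K @ K.T)   # d x d (uncentered) covariance of dimensions
    diag_samples = float(np.sum(np.sum(K ** 2, axis=0) ** 2))  # sum_j ||K[:, j]||^4
    diag_dims = float(np.sum(np.sum(K ** 2, axis=1) ** 2))     # sum_i ||K[i, :]||^4
    return l_contrastive, l_covariance, diag_samples, diag_dims

rng = np.random.default_rng(0)
lc, lnc, ds, dd = duality_terms(rng.normal(size=(8, 32)))
# ||K^T K||_F^2 = ||K K^T||_F^2, so lc + ds equals lnc + dd up to round-off.
print(lc + ds, lnc + dd)
```

When normalization fixes both the column norms and the row norms of $K$, the diagonal terms are constant, so the two criteria differ by a constant.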

Motivated by questions in theoretical computer science and quantum information theory, we study the classical problem of determining linear spaces of matrices of bounded rank. Spaces of bounded rank three were classified in 1983, and it has been a longstanding problem to classify spaces of bounded rank four. Before our study, no non-classical example of such a space was known. We exhibit two non-classical examples of such spaces and give the full classification of basic spaces of bounded rank four. There are exactly four such spaces up to isomorphism. We also take steps to bring together the methods of the linear algebra community and the algebraic geometry community used to study spaces of bounded rank.
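To make the notion concrete: a linear space of matrices has bounded rank $r$ if every element has rank at most $r$. A standard example is the 3-dimensional space of $3 \times 3$ skew-symmetric matrices, which has bounded rank 2 because skew-symmetric matrices of odd size are singular; another is the space of $3 \times 3$ matrices supported on the first row and first column. A minimal sketch that estimates the maximal rank of a space by evaluating random elements (a generic element attains the maximum almost surely); the helper names and the rank tolerance are illustrative choices:

```python
import numpy as np

def skew3(a, b, c):
    """General element of the space of 3x3 skew-symmetric matrices."""
    return np.array([[0.0, a, b],
                     [-a, 0.0, c],
                     [-b, -c, 0.0]])

def first_row_col(a, b, c, d, e):
    """General element of the 3x3 matrices supported on the first row and column."""
    return np.array([[a, b, c],
                     [d, 0.0, 0.0],
                     [e, 0.0, 0.0]])

def max_sampled_rank(element, dim, trials=200, seed=0):
    """Largest rank seen among random elements of a dim-dimensional matrix space."""
    rng = np.random.default_rng(seed)
    return max(int(np.linalg.matrix_rank(element(*rng.normal(size=dim)), tol=1e-8))
               for _ in range(trials))

print(max_sampled_rank(skew3, 3), max_sampled_rank(first_row_col, 5))
```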

Tensor train decomposition is widely used in machine learning and quantum physics due to its concise representation of high-dimensional tensors, which overcomes the curse of dimensionality. Cross approximation, originally developed for representing a matrix from a set of selected rows and columns, is an efficient method for constructing a tensor train decomposition of a tensor from a small number of its entries. While tensor train cross approximation has achieved remarkable performance in practical applications, its theoretical analysis, in particular regarding the error of the approximation, has so far been lacking. To our knowledge, existing results only provide element-wise approximation accuracy guarantees, which lead to a very loose bound when extended to the entire tensor. In this paper, we bridge this gap by providing accuracy guarantees in terms of the entire tensor for both exact and noisy measurements. Our results illustrate how the choice of selected subtensors affects the quality of the cross approximation and show that the approximation error caused by model error and/or measurement error may not grow exponentially with the order of the tensor. These results are verified by numerical experiments and may have important implications for the usefulness of cross approximations for high-order tensors, such as those encountered in the description of quantum many-body states.
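The matrix case can be sketched in a few lines. Given row indices $I$ and column indices $J$, cross (skeleton) approximation forms $A \approx A[:, J]\, A[I, J]^{-1}\, A[I, :]$, which is exact when $A$ has rank $r = |I| = |J|$ and $A[I, J]$ is nonsingular; tensor train cross approximation applies this idea to successive unfoldings of the tensor. In the sketch below the indices are chosen at random, an assumption made for simplicity (practical schemes choose them more carefully, e.g. by maximum-volume heuristics), and `cross_approx` is a hypothetical helper name.

```python
import numpy as np

def cross_approx(A, I, J):
    """Skeleton approximation A ~= C @ inv(U) @ R from rows I and columns J."""
    C = A[:, J]          # selected columns
    R = A[I, :]          # selected rows
    U = A[np.ix_(I, J)]  # intersection submatrix
    return C @ np.linalg.solve(U, R)

rng = np.random.default_rng(0)
A = rng.normal(size=(50, 3)) @ rng.normal(size=(3, 40))  # exactly rank 3
I = rng.choice(50, size=3, replace=False)
J = rng.choice(40, size=3, replace=False)
print(np.max(np.abs(cross_approx(A, I, J) - A)))  # round-off level if U is well conditioned
```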

In this work, we explore a framework for contextual decision-making to study how the relevance and quantity of past data affect the performance of a data-driven policy. We analyze a contextual Newsvendor problem in which a decision-maker needs to trade off an underage cost against an overage cost in the face of uncertain demand. We consider a setting in which past demands observed under ``close by'' contexts come from close by distributions, and we analyze the performance of data-driven algorithms through a notion of context-dependent worst-case expected regret. We analyze the broad class of Weighted Empirical Risk Minimization (WERM) policies, which weigh past data according to their similarity in the contextual space. This class includes classical policies such as ERM, k-Nearest Neighbors and kernel-based policies. Our main methodological contribution is to characterize exactly the worst-case regret of any WERM policy on any given configuration of contexts. To the best of our knowledge, this provides the first understanding of tight performance guarantees in any contextual decision-making problem, with past literature focusing on upper bounds via concentration inequalities. We instead take an optimization approach and isolate a structure in the Newsvendor loss function that allows us to reduce the infinite-dimensional optimization problem over worst-case distributions to a simple line search. This in turn allows us to unveil fundamental insights that were obscured by previous general-purpose bounds. We characterize the guaranteed performance as a function of the contexts and provide granular insights into the learning curves of algorithms.
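For the Newsvendor loss, a WERM policy has a closed form: minimizing $\sum_i w_i\,[b\,(d_i - q)^+ + h\,(q - d_i)^+]$ over the order quantity $q$, with underage cost $b$ and overage cost $h$, gives a weighted quantile of the past demands $d_i$ at the critical ratio $b/(b+h)$. A minimal sketch with 1-D contexts; the helper names and the three weight functions (standing in for the ERM, $k$-nearest-neighbor, and Gaussian-kernel members of the class) are illustrative choices:

```python
import numpy as np

def werm_order(demands, weights, underage, overage):
    """Order quantity minimizing sum_i w_i*(underage*(d_i-q)^+ + overage*(q-d_i)^+):
    the weighted quantile of past demands at the critical ratio."""
    order = np.argsort(demands)
    d, w = demands[order], weights[order]
    cum = np.cumsum(w) / np.sum(w)
    ratio = underage / (underage + overage)
    idx = min(int(np.searchsorted(cum, ratio)), len(d) - 1)  # first cum >= ratio
    return d[idx]

def erm_weights(contexts, x0):
    return np.ones(len(contexts))  # ignore contexts entirely

def knn_weights(contexts, x0, k):
    w = np.zeros(len(contexts))
    w[np.argsort(np.abs(contexts - x0))[:k]] = 1.0  # k closest past contexts
    return w

def kernel_weights(contexts, x0, bandwidth):
    return np.exp(-0.5 * ((contexts - x0) / bandwidth) ** 2)  # Gaussian kernel

contexts = np.array([0.0, 0.1, 0.2, 5.0, 6.0])
demands = np.array([1.0, 2.0, 3.0, 10.0, 11.0])
q_erm = werm_order(demands, erm_weights(contexts, 0.05), 1.0, 1.0)
q_knn = werm_order(demands, knn_weights(contexts, 0.05, 3), 1.0, 1.0)
print(q_erm, q_knn)
```

With equal costs the critical ratio is 1/2, so ERM returns the median of all past demands, while the nearest-neighbor policy only looks at demands seen under contexts close to the query $x_0 = 0.05$.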

We study the complexity of randomized computation of integrals depending on a parameter, with integrands from Sobolev spaces. That is, for $r,d_1,d_2\in{\mathbb N}$, $1\le p,q\le \infty$, $D_1= [0,1]^{d_1}$, and $D_2= [0,1]^{d_2}$, we are given $f\in W_p^r(D_1\times D_2)$ and we seek to approximate $$ Sf=\int_{D_2}f(s,t)\,dt\quad (s\in D_1), $$ with error measured in the $L_q(D_1)$-norm. Our results extend previous work of Heinrich and Sindambiwe (J.\ Complexity, 15 (1999), 317--341) for $p=q=\infty$ and Wiegand (Shaker Verlag, 2006) for $1\le p=q<\infty$. Wiegand's analysis was carried out under the assumption that $W_p^r(D_1\times D_2)$ is continuously embedded in $C(D_1\times D_2)$ (embedding condition). We also study the case in which the embedding condition does not hold. For this purpose, a new ingredient is developed: a stochastic discretization technique. The paper builds on Part I, where vector-valued mean computation, the finite-dimensional counterpart of parametric integration, was studied. In Part I, a basic problem of Information-Based Complexity on the power of adaption for linear problems in the randomized setting was solved. Here, a further aspect of this problem is settled.
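The operator $S$ can be illustrated with the simplest randomized method: plain Monte Carlo in $t$, with the same random nodes reused for every parameter value $s$. This is only a baseline sketch, not the multilevel or stochastic-discretization algorithms analyzed in this line of work; it assumes $d_1 = 1$, approximates the $L_q(D_1)$ error on a uniform grid of $D_1 = [0,1]$, and uses the illustrative integrand $f(s,t) = s\,t$ with $d_2 = 1$, for which $Sf(s) = s/2$. The helper names are hypothetical.

```python
import numpy as np

def parametric_mc(f, s_grid, n, d2, rng):
    """Estimate Sf(s) = int_{[0,1]^{d2}} f(s, t) dt for every s in s_grid,
    reusing the same n uniform random nodes t_j for all parameter values."""
    t = rng.random((n, d2))
    return np.array([np.mean(f(s, t)) for s in s_grid])

def lq_error(approx, exact, q):
    """Discrete L_q error over a uniform grid of D_1 = [0, 1]."""
    diff = np.abs(approx - exact)
    return float(np.max(diff)) if np.isinf(q) else float(np.mean(diff ** q) ** (1.0 / q))

rng = np.random.default_rng(0)
s_grid = np.linspace(0.0, 1.0, 101)
approx = parametric_mc(lambda s, t: s * t[:, 0], s_grid, n=10_000, d2=1, rng=rng)
exact = s_grid / 2  # Sf(s) = s / 2 for f(s, t) = s * t
print(lq_error(approx, exact, 2), lq_error(approx, exact, np.inf))
```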
