
Sketch-and-precondition techniques are efficient and popular for solving large least squares (LS) problems of the form $Ax=b$ with $A\in\mathbb{R}^{m\times n}$ and $m\gg n$. In this approach, $A$ is ``sketched'' to a smaller matrix $SA$ with $S\in\mathbb{R}^{\lceil cn\rceil\times m}$ for some constant $c>1$, and an iterative LS solver then computes the solution to $Ax=b$ with a right preconditioner $P$ constructed from $SA$. Prominent sketch-and-precondition LS solvers are Blendenpik and LSRN. We show that the sketch-and-precondition technique in its most commonly used form is not numerically stable for ill-conditioned LS problems. For provable and practical backward stability and optimal residuals, we suggest using an unpreconditioned iterative LS solver on $(AP)z=b$ with $x=Pz$. Provided the condition number of $A$ is smaller than the reciprocal of the unit round-off, we show that this modification ensures that the computed solution has a backward error comparable to that of the iterative LS solver applied to a well-conditioned matrix. Using smoothed analysis, we model floating-point rounding errors to argue that our modification is expected to compute a backward stable solution even for arbitrarily ill-conditioned LS problems. Additionally, we provide experimental evidence that using the sketch-and-solve solution as a starting vector in sketch-and-precondition algorithms (as suggested by Rokhlin and Tygert in 2008) should be highly preferred over the zero vector. This initialization often results in much more accurate solutions -- albeit not always backward stable ones.
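A minimal NumPy/SciPy sketch of the modification advocated above, assuming a Gaussian sketching matrix and SciPy's LSQR as the unpreconditioned solver (the function name and tolerances are mine; Blendenpik and LSRN use more elaborate sketches and stopping criteria):

```python
import numpy as np
from scipy.linalg import qr, solve_triangular
from scipy.sparse.linalg import LinearOperator, lsqr

def sketch_precondition_ls(A, b, c=2.0, rng=None):
    """Solve min ||Ax - b|| via the modified scheme described above:
    run an *unpreconditioned* solver on (AP)z = b with P built from
    a sketch of A, then set x = Pz."""
    rng = np.random.default_rng(rng)
    m, n = A.shape
    s = int(np.ceil(c * n))
    S = rng.standard_normal((s, m)) / np.sqrt(s)  # Gaussian sketch
    _, R = qr(S @ A, mode='economic')             # SA = QR, so P = R^{-1}
    # Apply AP and (AP)^T via triangular solves; AP is never formed.
    AP = LinearOperator(
        (m, n),
        matvec=lambda z: A @ solve_triangular(R, z, lower=False),
        rmatvec=lambda y: solve_triangular(R, A.T @ y, lower=False, trans='T'),
    )
    z = lsqr(AP, b, atol=1e-14, btol=1e-14)[0]
    return solve_triangular(R, z, lower=False)    # x = Pz
```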

Related content

A simple and effective method for the alignment of generative models is the best-of-$n$ policy, where $n$ samples are drawn from a base policy, ranked according to a reward function, and the highest-ranking one is selected. A commonly used analytical expression in the literature claims that the KL divergence between the best-of-$n$ policy and the base policy is equal to $\log (n) - (n-1)/n.$ We disprove the validity of this claim, and show that it is an upper bound on the actual KL divergence. We also explore the tightness of this upper bound in different regimes. Finally, we propose a new estimator for the KL divergence and empirically show that it provides a tight approximation through a few examples.
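For a finite outcome space with tie-free rewards, both the best-of-$n$ distribution and its KL divergence from the base policy have closed forms, which makes the upper-bound claim easy to probe numerically. A small sanity check along these lines (my own construction; function and variable names are illustrative):

```python
import numpy as np

def best_of_n_kl(p, n):
    """Exact KL(best-of-n || base) for a discrete base distribution p
    whose items are indexed in increasing order of reward (no ties)."""
    F = np.cumsum(p)                        # CDF in reward order
    F_prev = np.concatenate(([0.0], F[:-1]))
    p_bon = F**n - F_prev**n                # P(best of n draws is item i)
    mask = p_bon > 0
    return float(np.sum(p_bon[mask] * np.log(p_bon[mask] / p[mask])))

p = np.array([0.5, 0.3, 0.2])               # toy base policy, reward-sorted
for n in (2, 4, 8, 16):
    bound = np.log(n) - (n - 1) / n         # the expression from the abstract
    print(n, best_of_n_kl(p, n), bound)     # KL < bound in every case
```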

The so-called independent low-rank matrix analysis (ILRMA) has demonstrated a great potential for dealing with the problem of determined blind source separation (BSS) for audio and speech signals. This method assumes that the spectra from different frequency bands are independent and the spectral coefficients in any frequency band are Gaussian distributed. The Itakura-Saito divergence is then employed to estimate the source model related parameters. In reality, however, the spectral coefficients from different frequency bands may be dependent, which is not considered in the existing ILRMA algorithm. This paper presents an improved version of ILRMA, which considers the dependency between the spectral coefficients from different frequency bands. The Sinkhorn divergence is then exploited to optimize the source model parameters. As a result of using the cross-band information, the BSS performance is improved, but the number of parameters to be estimated increases significantly, and so does the computational complexity. To reduce the algorithm complexity, we apply the Kronecker product to decompose the modeling matrix into the product of a number of matrices of much smaller dimensionality. An efficient algorithm is then developed to implement the Sinkhorn divergence based BSS algorithm, and the complexity is reduced by an order of magnitude.
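The parameter saving behind a Kronecker decomposition is easy to see in isolation: a full cross-band model matrix over $F$ bands and $K$ components has $(FK)^2$ entries, while a Kronecker factorization needs only $F^2 + K^2$, and products with the factored form never require assembling the full matrix. A toy illustration (dimensions and variable names are mine, not the paper's):

```python
import numpy as np

F, K = 64, 8
full_params = (F * K) ** 2          # unstructured cross-band model matrix
kron_params = F**2 + K**2           # Kronecker-factored model
print(full_params, kron_params)     # 262144 vs 4160: ~63x fewer parameters

B = np.random.randn(F, F)
C = np.random.randn(K, K)
M = np.kron(B, C)                   # full matrix, built here only to verify
x = np.random.randn(F * K)
# (B kron C) x applied via a reshape trick, in O(F^2 K + F K^2) time
y_fast = (B @ x.reshape(F, K) @ C.T).ravel()
assert np.allclose(M @ x, y_fast)
```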

We compute explicitly the MTW tensor (or cross curvature) for the optimal transport problem on $\mathbb{R}^n$ with a cost function of the form $\mathsf{c}(x, y) = \mathsf{u}(x^{\mathfrak{t}}y)$, where $\mathsf{u}$ is a scalar function with inverse $\mathsf{s}$, and $x^{\mathfrak{t}}y$ is a nondegenerate bilinear pairing of vectors $x, y$ belonging to an open subset of $\mathbb{R}^n$. The condition that the MTW tensor vanishes on null vectors under the Kim-McCann metric is a fourth-order nonlinear ODE, which can be reduced to a linear ODE of the form $\mathsf{s}^{(2)} - S\mathsf{s}^{(1)} + P\mathsf{s} = 0$ with constant coefficients $P$ and $S$. The resulting inverse functions include {\it Lambert} and {\it generalized inverse hyperbolic\slash trigonometric} functions. The square Euclidean metric and $\log$-type costs are equivalent to instances of these solutions. The optimal map for the family is also explicit. For cost functions of a similar form on a hyperboloid model of the hyperbolic space and unit sphere, we also express this tensor in terms of algebraic expressions in derivatives of $\mathsf{s}$ using the Gauss-Codazzi equation, obtaining new families of strictly regular costs for these manifolds, including new families of {\it power function costs}. We analyze the $\sinh$-type hyperbolic cost, providing examples of $\mathsf{c}$-convex functions and divergence.
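As a sanity check on the simplest instance (my own computation, assuming the standard dot product as the pairing and equivalence of costs up to additive functions of $x$ and $y$ alone), the squared Euclidean cost fits the linear ODE with $P = S = 0$:
\[
\tfrac{1}{2}\lVert x-y\rVert^2 \;=\; \tfrac{1}{2}\lVert x\rVert^2 + \tfrac{1}{2}\lVert y\rVert^2 - x^{\mathfrak{t}}y \;\sim\; \mathsf{u}(x^{\mathfrak{t}}y) \quad\text{with}\quad \mathsf{u}(t)=-t,
\]
so $\mathsf{s}(t)=\mathsf{u}^{-1}(t)=-t$, and $\mathsf{s}^{(2)} - S\,\mathsf{s}^{(1)} + P\,\mathsf{s} = S - Pt$ vanishes identically exactly when $P=S=0$, i.e. when $\mathsf{s}$ is linear.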

The generalized inverse Gaussian, denoted $\mathrm{GIG}(p, a, b)$, is a flexible family of distributions that includes the gamma, inverse gamma, and inverse Gaussian distributions as special cases. In this article, we derive two novel mixture representations for the $\mathrm{GIG}(p, a, b)$: one that expresses the distribution as a continuous mixture of inverse Gaussians and another one that expresses it as a continuous mixture of truncated exponentials. Beyond their conceptual interest, these representations are useful for random number generation. We use the first representation to derive a geometrically ergodic Gibbs sampler whose stationary distribution is $\mathrm{GIG}(p, a, b)$, and the second one to define a recursive algorithm to generate exact independent draws from the distribution for half-integer $p$. Additionally, the second representation gives rise to a recursive algorithm for evaluating the cumulative distribution function of the $\mathrm{GIG}(p, a, b)$ for half-integer $p$. The algorithms are simple and can be easily implemented in standard programming languages.
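For reference, the density in question is $f(x)\propto x^{p-1}\exp\{-(ax+b/x)/2\}$ on $x>0$, normalized by a modified Bessel function $K_p$. The snippet below checks this parameterization against SciPy's built-in generator (assuming the article uses this standard $\mathrm{GIG}(p,a,b)$ convention; the mixture-based samplers themselves are not reproduced here):

```python
import numpy as np
from scipy.special import kv
from scipy.stats import geninvgauss

def gig_pdf(x, p, a, b):
    """Density of GIG(p, a, b), proportional to
    x^(p-1) * exp(-(a*x + b/x)/2) on x > 0, normalized via the
    modified Bessel function of the second kind K_p."""
    norm = (a / b) ** (p / 2) / (2 * kv(p, np.sqrt(a * b)))
    return norm * x ** (p - 1) * np.exp(-(a * x + b / x) / 2)

# SciPy's geninvgauss uses a two-parameter form; GIG(p, a, b)
# corresponds to shape (p, sqrt(a*b)) with scale sqrt(b/a).
p, a, b = 1.5, 2.0, 3.0
x = np.linspace(0.05, 6, 5)
ref = geninvgauss.pdf(x, p, np.sqrt(a * b), scale=np.sqrt(b / a))
assert np.allclose(gig_pdf(x, p, a, b), ref)
```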

In this paper, we address the problem of testing exchangeability of a sequence of random variables, $X_1, X_2, \ldots$. This problem has been studied under the recently popular framework of testing by betting. But the mapping of testing problems to games is not one-to-one: many games can be designed for the same test. Past work established that it is futile to play a single game betting on every observation: test martingales in the data filtration are powerless. Two avenues have been explored to circumvent this impossibility: betting in a reduced filtration (the wealth is a test martingale in a coarsened filtration), or playing many games in parallel (the wealth is an e-process in the data filtration). The former has proved difficult to analyze theoretically, while the latter only works for binary or discrete observation spaces. Here, we introduce a different approach that circumvents both drawbacks. We design a new (yet simple) game in which we observe the data sequence in pairs. Despite the fact that betting on individual observations is futile, we show that betting on pairs of observations is not. To elaborate, we prove that our game leads to a nontrivial test martingale, which is interesting because it has been obtained by shrinking the filtration only very slightly. We show that our test controls type-1 error despite continuous monitoring, and achieves power one for both binary and continuous observations, under a broad class of alternatives. Due to the shrunk filtration, optional stopping is only allowed at even stopping times, not at odd ones: a relatively minor price. We provide a wide array of simulations that align with our theoretical findings.
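To see why pairs can help, note that under exchangeability the two orderings within any disjoint pair are equally likely given everything else, so a predictable bet on the within-pair ordering is fair. The toy game below (my own illustration, not necessarily the game constructed in the paper) multiplies the wealth by $1 + \lambda\,\mathrm{sign}(X_{2i} - X_{2i-1})$ for each pair:

```python
import numpy as np

def pairs_betting_wealth(x, lam=0.5):
    """Toy test martingale betting on the ordering within each
    disjoint pair (illustrative only).  Under exchangeability,
    sign(x[2i+1] - x[2i]) is +1 or -1 with equal probability given
    the past pairs, so each bet is fair and the wealth process is
    a nonnegative martingale in the pair filtration."""
    signs = np.sign(x[1::2] - x[0::2])
    return np.cumprod(1 + lam * signs)

rng = np.random.default_rng(0)
iid = rng.standard_normal(1000)           # exchangeable null
trend = iid + np.arange(1000.0)           # strong trend: not exchangeable
print(pairs_betting_wealth(iid)[-1])      # wealth stays tiny (no evidence)
print(pairs_betting_wealth(trend)[-1])    # wealth grows exponentially
```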

For a matrix $A$ which satisfies Crouzeix's conjecture, we construct several classes of matrices from $A$ for which the conjecture also holds. We discover a new link between cyclicity and Crouzeix's conjecture, which shows that Crouzeix's conjecture holds in full generality if and only if it holds for the differentiation operator on a class of analytic functions. We pose several open questions which, if answered affirmatively, would establish Crouzeix's conjecture. We also begin an investigation into Crouzeix's conjecture for symmetric matrices, and in the case of $3 \times 3$ matrices, we show that Crouzeix's conjecture holds for symmetric matrices if and only if it holds for analytic truncated Toeplitz operators.
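Recall that Crouzeix's conjecture asserts $\lVert p(A)\rVert \le 2 \max_{z \in W(A)} |p(z)|$ for every polynomial $p$, where $W(A)$ is the numerical range of $A$. A quick numerical probe of this ratio (my own snippet, not from the paper; crouzeix_ratio is an illustrative name):

```python
import numpy as np

def crouzeix_ratio(A, coeffs, n_theta=360):
    """Estimate ||p(A)||_2 / max_{z in W(A)} |p(z)| for a polynomial
    p given by 'coeffs' (highest degree first).  Boundary points of
    the numerical range W(A): for each angle theta, a unit top
    eigenvector v of the Hermitian part of e^{i*theta} A yields the
    boundary point v* A v."""
    boundary = []
    for t in np.linspace(0, 2 * np.pi, n_theta, endpoint=False):
        H = (np.exp(1j * t) * A + np.exp(-1j * t) * A.conj().T) / 2
        _, V = np.linalg.eigh(H)            # eigenvalues in ascending order
        v = V[:, -1]
        boundary.append(v.conj() @ A @ v)
    # |p| is maximized on the boundary of W(A) (maximum principle).
    max_p = max(abs(np.polyval(coeffs, z)) for z in boundary)
    pA = np.zeros_like(A, dtype=complex)    # Horner's rule, matrix powers
    for c in coeffs:
        pA = pA @ A + c * np.eye(A.shape[0])
    return np.linalg.norm(pA, 2) / max_p    # conjecture: ratio <= 2

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 6)) + 1j * rng.standard_normal((6, 6))
print(crouzeix_ratio(A, coeffs=[1, 0, -2, 1]))   # p(z) = z^3 - 2z + 1
```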

We propose a new density estimation algorithm. Given $n$ i.i.d. samples from a distribution belonging to a class of densities on $\mathbb{R}^d$, our estimator outputs any density in the class whose ``perceptron discrepancy'' with the empirical distribution is at most $O(\sqrt{d/n})$. The perceptron discrepancy between two distributions is defined as the largest difference in mass that they place on any halfspace of $\mathbb{R}^d$. It is shown that this estimator achieves an expected total variation distance to the truth that is almost minimax optimal over the class of densities with bounded Sobolev norm and over Gaussian mixtures. This suggests that regularity of the prior distribution could be an explanation for the efficiency of the ubiquitous step in machine learning that replaces optimization over large function spaces with simpler parametric classes (e.g. in the discriminators of GANs). We generalize the above to show that replacing the ``perceptron discrepancy'' with the generalized energy distance of Sz\'ekely-Rizzo further improves the total variation loss. The generalized energy distance between empirical distributions is easily computable and differentiable, which makes it especially useful for fitting generative models. To the best of our knowledge, it is the first example of a distance with such properties for which there are minimax statistical guarantees.
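The energy distance mentioned above has a simple empirical form. The snippet below computes the V-statistic $2\,\mathbb{E}\lVert X-Y\rVert - \mathbb{E}\lVert X-X'\rVert - \mathbb{E}\lVert Y-Y'\rVert$ between two samples (the paper's generalized version may use a different ground metric; the standard Euclidean case is shown):

```python
import numpy as np
from scipy.spatial.distance import cdist

def energy_distance(X, Y):
    """Squared energy distance of Szekely-Rizzo between empirical
    samples X (n x d) and Y (m x d):
        2 E||X - Y|| - E||X - X'|| - E||Y - Y'||.
    Computed as a V-statistic (diagonal terms included)."""
    dxy = cdist(X, Y).mean()
    dxx = cdist(X, X).mean()
    dyy = cdist(Y, Y).mean()
    return 2 * dxy - dxx - dyy

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 3))
Y = rng.standard_normal((500, 3)) + 0.5     # shifted sample
print(energy_distance(X, X), energy_distance(X, Y))   # 0 vs clearly > 0
```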

We propose a method for computing the Lyapunov exponents of renewal equations (delay equations of Volterra type) and of coupled systems of renewal and delay differential equations. The method consists of reformulating the delay equation as an abstract differential equation, reducing the latter to a system of ordinary differential equations via pseudospectral collocation, and applying the standard discrete QR method. The effectiveness of the method is shown experimentally and a MATLAB implementation is provided.
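The final ingredient, the discrete QR method, is standard: propagate an orthonormal frame with the linearized dynamics, re-orthonormalize it via QR at regular intervals, and average the logarithms of the diagonal entries of $R$. A self-contained illustration on a plain ODE (the Lorenz system) rather than a renewal equation, and in Python rather than the paper's MATLAB:

```python
import numpy as np
from scipy.integrate import solve_ivp

def lorenz_ext(t, u, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Lorenz flow extended with its variational equations."""
    x, y, z = u[:3]
    Q = u[3:].reshape(3, 3)               # propagated orthonormal frame
    J = np.array([[-sigma, sigma, 0.0],
                  [rho - z, -1.0, -x],
                  [y, x, -beta]])         # Jacobian of the vector field
    f = [sigma * (y - x), x * (rho - z) - y, x * y - beta * z]
    return np.concatenate((f, (J @ Q).ravel()))

u = np.concatenate(([1.0, 1.0, 1.0], np.eye(3).ravel()))
dt, n_steps = 0.5, 400
lyap_sums = np.zeros(3)
for _ in range(n_steps):
    sol = solve_ivp(lorenz_ext, (0, dt), u, rtol=1e-9, atol=1e-9)
    u = sol.y[:, -1]
    Q, R = np.linalg.qr(u[3:].reshape(3, 3))
    lyap_sums += np.log(np.abs(np.diag(R)))   # accumulate log |R_ii|
    u[3:] = Q.ravel()                          # re-orthonormalize the frame
print(lyap_sums / (dt * n_steps))              # approx (0.9, 0.0, -14.6)
```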

The Davis-Kahan-Wedin $\sin \Theta$ theorem describes how the singular subspaces of a matrix change when the matrix is subjected to a small perturbation. This classic result is sharp in the worst-case scenario. In this paper, we prove a stochastic version of the Davis-Kahan-Wedin $\sin \Theta$ theorem when the perturbation is a Gaussian random matrix. Under certain structural assumptions, we obtain an optimal bound that significantly improves upon the classic Davis-Kahan-Wedin $\sin \Theta$ theorem. One of our key tools is a new perturbation bound for the singular values, which may be of independent interest.
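A small experiment contrasting the observed rotation of a singular subspace under a Gaussian perturbation with a classic deterministic Wedin-type ratio $\lVert E\rVert/\delta$ (my own illustration; the paper's structural assumptions and sharp constants are not reproduced):

```python
import numpy as np
from scipy.linalg import subspace_angles

rng = np.random.default_rng(0)
m, n, r = 200, 100, 5
U0 = np.linalg.qr(rng.standard_normal((m, r)))[0]
V0 = np.linalg.qr(rng.standard_normal((n, r)))[0]
A = U0 @ np.diag([10, 9, 8, 7, 6]) @ V0.T       # rank-5 signal, gap delta = 6
E = 0.05 * rng.standard_normal((m, n))           # Gaussian perturbation
U1 = np.linalg.svd(A + E)[0]
theta = subspace_angles(U0, U1[:, :r])           # principal angles
print(np.sin(theta).max())                       # observed sin(Theta)
print(np.linalg.norm(E, 2) / 6)                  # Wedin-type ratio ||E||/delta
```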

We present and analyze an algorithm designed for addressing vector-valued regression problems involving possibly infinite-dimensional input and output spaces. The algorithm is a randomized adaptation of reduced rank regression, a technique for optimally learning a low-rank vector-valued function (i.e. an operator) between sampled data via regularized empirical risk minimization with rank constraints. We propose Gaussian sketching techniques for both the primal and dual optimization objectives, yielding Randomized Reduced Rank Regression (R4) estimators that are efficient and accurate. For each of our R4 algorithms we prove that the resulting regularized empirical risk is, in expectation w.r.t. the randomness of the sketch, arbitrarily close to the optimal value when the hyper-parameters are properly tuned. Numerical experiments illustrate the tightness of our bounds and show advantages in two distinct scenarios: (i) solving a vector-valued regression problem using synthetic and large-scale neuroscience datasets, and (ii) regressing the Koopman operator of a nonlinear stochastic dynamical system.
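A rough finite-dimensional sketch of the ingredients, combining the classical reduced rank regression recipe (ridge solution projected onto the top singular directions of the fitted values) with a Gaussian range sketch for the truncation step. This is my own illustration: the paper's R4 estimators sketch the primal and dual objectives directly, which is not reproduced here.

```python
import numpy as np

def sketched_rrr(X, Y, rank, reg=1e-3, oversample=5, rng=None):
    """Reduced rank regression with a randomized rank truncation:
    solve the ridge problem, then project onto the top singular
    directions of the fitted values, found via a Gaussian range
    sketch (randomized SVD) instead of a full SVD."""
    rng = np.random.default_rng(rng)
    d = X.shape[1]
    B_ridge = np.linalg.solve(X.T @ X + reg * np.eye(d), X.T @ Y)
    Yhat = X @ B_ridge
    # Randomized range finder for the top right singular vectors of Yhat.
    Omega = rng.standard_normal((Yhat.shape[1], rank + oversample))
    Q = np.linalg.qr(Yhat @ Omega)[0]
    _, _, Vt = np.linalg.svd(Q.T @ Yhat, full_matrices=False)
    V = Vt[:rank].T
    return B_ridge @ V @ V.T               # rank-constrained coefficient

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 20))
B_true = rng.standard_normal((20, 3)) @ rng.standard_normal((3, 15))  # rank 3
Y = X @ B_true + 0.1 * rng.standard_normal((1000, 15))
B = sketched_rrr(X, Y, rank=3)
print(np.linalg.matrix_rank(B),
      np.linalg.norm(B - B_true) / np.linalg.norm(B_true))
```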
