We consider the problem of solving linear least squares problems in a framework where only evaluations of the linear map are possible. We derive randomized methods that require no matrix operations other than forward evaluations; in particular, no evaluation of the adjoint map is needed. Our method is motivated by the simple observation that one can obtain an unbiased estimate of the application of the adjoint. We show convergence of the method and then derive a more efficient method that uses an exact linesearch. This method, called random descent, resembles known methods in other contexts and has the randomized coordinate descent method as a special case. We provide a convergence analysis of the random descent method, emphasizing the dependence on the underlying distribution of the random vectors. Furthermore, we investigate the applicability of the method in the context of ill-posed inverse problems and show that the method can have beneficial properties when the unknown solution is rough. We illustrate the theoretical findings in numerical examples. One particular result is that the random descent method actually outperforms established transpose-free methods (TFQMR and CGS) in examples.
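
Since the abstract gives no pseudocode, the following is a minimal NumPy sketch of the exact-linesearch variant as described: only forward applications of $A$ are used, the step along a random direction $v$ is $t = -\langle Av, r\rangle/\|Av\|^2$, and the sampling distribution (standard Gaussian, so that $\mathbb{E}[vv^T] = I$) and all names are illustrative assumptions.

```python
import numpy as np

def random_descent(apply_A, b, x0, n_iters=5000, seed=None):
    """Minimize ||Ax - b||^2 using only forward evaluations of A.

    Each iteration draws a random direction v and takes the exact-linesearch
    step t = -<Av, r> / ||Av||^2 along it.  Since E[v v^T] = I, the vector
    v <Av, r> is an unbiased estimate of A^T r, i.e. of the gradient -- this
    is the observation the abstract refers to; no adjoint is ever applied.
    """
    rng = np.random.default_rng(seed)
    x = x0.astype(float).copy()
    r = apply_A(x) - b                      # residual, updated incrementally
    for _ in range(n_iters):
        v = rng.standard_normal(x.shape)    # E[v v^T] = I
        Av = apply_A(v)                     # the only access to A we need
        denom = Av @ Av
        if denom == 0.0:
            continue
        t = -(Av @ r) / denom               # exact linesearch along v
        x += t * v
        r += t * Av                         # keep r = A x - b consistent
    return x

# usage: A available only as a black-box forward map
rng = np.random.default_rng(0)
A = rng.standard_normal((30, 10))
b = rng.standard_normal(30)
x = random_descent(lambda z: A @ z, b, np.zeros(10), n_iters=20000)
print(np.linalg.norm(A.T @ (A @ x - b)))    # normal-equations residual
```

Choosing $v$ uniformly from the coordinate vectors instead of a Gaussian recovers randomized coordinate descent as the special case mentioned above.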

Related content

We extend our study of the swarm-based gradient descent method for non-convex optimization, [Lu, Tadmor & Zenginoglu, arXiv:2211.17157], to allow random descent directions. We recall that the swarm-based approach consists of a swarm of agents, each identified with a position, ${\mathbf x}$, and mass, $m$. The key is the transfer of mass from high ground to low(-est) ground. The mass of an agent dictates its step size: lighter agents take larger steps. In this paper, the essential new feature is the choice of direction: rather than restricting the swarm to march in the direction of steepest gradient descent, we let agents proceed in randomly chosen directions centered around -- but otherwise different from -- the gradient direction. The random search secures the descent property while at the same time enabling greater exploration of the ambient space. Convergence analysis and benchmark optimizations demonstrate the effectiveness of the swarm-based random descent method as a multi-dimensional global optimizer.
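
As a hedged illustration of the ingredients named above (positions, masses, mass transfer toward low ground, mass-dependent step sizes, and random directions centered around the gradient), here is a toy sketch; the mass-transfer and step-size rules are simplified stand-ins, not the paper's protocol, and the descent property is secured crudely by rejecting ascending moves.

```python
import numpy as np

def swarm_random_descent(f, grad, X, steps=200, h=0.5, noise=0.3, seed=None):
    """Toy swarm optimizer: rows of X are agent positions, masses start equal.

    Per step: shift mass toward the currently lowest agent, let lighter
    agents take larger steps, and move each agent along the negative
    gradient perturbed by a random direction, keeping only descent moves.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    m = np.full(n, 1.0 / n)                   # agent masses
    for _ in range(steps):
        vals = np.array([f(x) for x in X])
        best = vals.argmin()
        m *= 0.9                              # transfer mass ...
        m[best] += 1.0 - m.sum()              # ... to the lowest agent
        for i in range(n):
            g = grad(X[i])
            v = -g + noise * np.linalg.norm(g) * rng.standard_normal(d)
            step = h * (1.0 - m[i] / m.max()) # lighter agents step farther
            trial = X[i] + step * v
            if f(trial) < vals[i]:            # secure the descent property
                X[i] = trial
    return X[np.array([f(x) for x in X]).argmin()]

# usage on a small non-convex objective
f = lambda x: (x[0]**2 - 1)**2 + x[1]**2 + 0.1 * np.sin(5 * x[0])
g = lambda x: np.array([4*x[0]*(x[0]**2 - 1) + 0.5*np.cos(5*x[0]), 2*x[1]])
X0 = np.random.default_rng(0).uniform(-2, 2, (20, 2))
print(swarm_random_descent(f, g, X0))
```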

The crossed random-effects model is widely used in applied statistics, finding applications in fields such as longitudinal studies, e-commerce, and recommender systems, among others. However, these models encounter scalability challenges: the computational time of standard methods grows superlinearly in the number of data points $N$, typically like $N^{3/2}$ or worse. Our inspiration for addressing this issue comes from observing the recommender system employed by an online clothing retailer. Our dataset comprises over 700,000 clients, 5,000 items, and 5,000,000 measurements. When applying the maximum likelihood approach to fit crossed random effects, computational inefficiency becomes a significant concern, limiting the applicability of this approach in large-scale settings. To tackle the scalability issues, previous research by Ghosh et al. (2022a) and Ghosh et al. (2022b) has explored linear and logistic regression models utilizing fixed-effect features based on client and item variables, while incorporating random intercept terms for clients and items. In this study, we present a more generalized version of the problem, allowing random effect sizes/slopes. This extension enables us to capture the variability in effect size among both clients and items. Importantly, we have developed a scalable solution to this problem and have empirically demonstrated the consistency of our estimates: as the number of data points increases, our estimates converge towards the true parameters. To validate our approach, we implement the proposed algorithm using Stitch Fix data.
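
The abstract does not spell out an estimator; as a hedged illustration of why crossed variance components are identifiable from simple moments, the toy below fits the random-intercept special case on a balanced client-by-item grid by the method of moments. The paper's setting (unbalanced data, random slopes, scalable algorithms) is substantially more general.

```python
import numpy as np

rng = np.random.default_rng(0)
R, C = 500, 400                        # clients x items, balanced toy grid
sa, sb, se = 1.5, 0.8, 1.0             # true variances: client, item, noise
Y = (rng.normal(0, np.sqrt(sa), (R, 1))     # client intercepts a_i
     + rng.normal(0, np.sqrt(sb), (1, C))   # item intercepts b_j
     + rng.normal(0, np.sqrt(se), (R, C)))  # noise e_ij

# Moment equations for the balanced grid:
#   Var_i(row mean of Y)  ~=  sa + se/C   (the common item average drops out)
#   Var_j(col mean of Y)  ~=  sb + se/R
#   Var(Y_ij)             ~=  sa + sb + se
v_row, v_col, v_tot = Y.mean(axis=1).var(), Y.mean(axis=0).var(), Y.var()
M = np.array([[1.0, 0.0, 1.0 / C],
              [0.0, 1.0, 1.0 / R],
              [1.0, 1.0, 1.0    ]])
print(np.linalg.solve(M, [v_row, v_col, v_tot]))   # ~ [1.5, 0.8, 1.0]
```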

Recently Chen and Poor initiated the study of learning mixtures of linear dynamical systems. While linear dynamical systems already have wide-ranging applications in modeling time-series data, using mixture models can lead to a better fit or even a richer understanding of underlying subpopulations represented in the data. In this work we give a new approach to learning mixtures of linear dynamical systems that is based on tensor decompositions. As a result, our algorithm succeeds without strong separation conditions on the components, and can be used to compete with the Bayes optimal clustering of the trajectories. Moreover our algorithm works in the challenging partially-observed setting. Our starting point is the simple but powerful observation that the classic Ho-Kalman algorithm is a close relative of modern tensor decomposition methods for learning latent variable models. This gives us a playbook for how to extend it to work with more complicated generative models.
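
For readers unfamiliar with the Ho-Kalman algorithm mentioned above, here is a compact sketch that recovers $(A, B, C)$ up to similarity from the Markov parameters $h_k = CA^kB$; the SVD-based factorization below is one standard variant, not necessarily the paper's.

```python
import numpy as np

def ho_kalman(h, n):
    """Recover (A, B, C) up to similarity from Markov parameters h[k] = C A^k B.

    Build the Hankel matrix H[i, j] = h[i + j], factor it by a rank-n SVD
    into observability x controllability factors, and read A off the
    time-shifted Hankel matrix Hs[i, j] = h[i + j + 1].
    """
    T = (len(h) - 1) // 2
    p, m = h[0].shape
    H  = np.block([[h[i + j]     for j in range(T)] for i in range(T)])
    Hs = np.block([[h[i + j + 1] for j in range(T)] for i in range(T)])
    U, s, Vt = np.linalg.svd(H)
    O = U[:, :n] * np.sqrt(s[:n])               # observability factor
    Q = np.sqrt(s[:n])[:, None] * Vt[:n]        # controllability factor
    A = np.linalg.pinv(O) @ Hs @ np.linalg.pinv(Q)
    return A, Q[:, :m], O[:p, :]                # A, B, C

# usage: the recovered realization reproduces the Markov parameters
rng = np.random.default_rng(0)
n, p, m = 3, 2, 2
A0 = 0.5 * rng.standard_normal((n, n))
B0, C0 = rng.standard_normal((n, m)), rng.standard_normal((p, n))
h = [C0 @ np.linalg.matrix_power(A0, k) @ B0 for k in range(9)]
A, B, C = ho_kalman(h, n)
print(np.allclose(C @ A @ B, h[1], atol=1e-6))  # True
```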

The Independent Cutset problem asks whether there is a set of vertices in a given graph that is both independent and a cutset. Such a problem is $\textsf{NP}$-complete even when the input graph is planar and has maximum degree five. In this paper, we first present a $\mathcal{O}^*(1.4423^{n})$-time algorithm for the problem. We also show how to compute a minimum independent cutset (if any) in the same running time. Since the property of having an independent cutset is MSO$_1$-expressible, our main results are concerned with structural parameterizations for the problem considering parameters that are not bounded by a function of the clique-width of the input. We present $\textsf{FPT}$-time algorithms for the problem considering the following parameters: the dual of the maximum degree, the dual of the solution size, the size of a dominating set (where a dominating set is given as an additional input), the size of an odd cycle transversal, the distance to chordal graphs, and the distance to $P_5$-free graphs. We close by introducing the notion of $\alpha$-domination, which allows us to identify more fixed-parameter tractable and polynomial-time solvable cases.
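
To make the object concrete, here is a brute-force reference implementation using networkx; it runs in exponential time and is only a specification of the problem, not the $\mathcal{O}^*(1.4423^{n})$ algorithm of the paper.

```python
from itertools import combinations
import networkx as nx

def is_independent_cutset(G, S):
    """True iff S is independent in G and G - S is disconnected."""
    S = set(S)
    if any(G.has_edge(u, v) for u, v in combinations(S, 2)):
        return False                    # S is not independent
    H = G.subgraph(set(G) - S)
    return H.number_of_nodes() > 0 and not nx.is_connected(H)

def minimum_independent_cutset(G):
    """Try subsets by increasing size -- exponential-time reference only."""
    nodes = list(G)
    for k in range(1, len(nodes) - 1):
        for S in combinations(nodes, k):
            if is_independent_cutset(G, S):
                return set(S)
    return None                         # e.g., complete graphs have none

print(minimum_independent_cutset(nx.path_graph(3)))      # {1}
print(minimum_independent_cutset(nx.complete_graph(4)))  # None
```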

This paper considers the Cauchy problem for the nonlinear dynamic string equation of Kirchhoff type with time-varying coefficients. The objective of this work is to develop a temporal discretization algorithm capable of approximating a solution to this initial-boundary value problem. To this end, a symmetric three-layer semi-discrete scheme is employed with respect to the temporal variable, wherein the value of the nonlinear term is evaluated at the middle node point. This approach enables the numerical solution at each temporal step to be obtained by inverting linear operators, yielding a system of second-order linear ordinary differential equations. Local convergence of the proposed scheme is established: it converges quadratically in the time step on the local temporal interval. We have conducted several numerical experiments using the proposed algorithm for various test problems to validate its performance. The obtained numerical results are in accordance with the theoretical findings.
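
The abstract does not state the scheme itself; one plausible form of a symmetric three-layer scheme with the stated properties, written for the model equation $u_{tt} - \varphi(t, \|u_x\|^2)\,u_{xx} = f$ with time step $\tau$ and $u^k \approx u(\cdot, t_k)$, is the following (an illustrative reconstruction, not necessarily the paper's exact scheme):

```latex
\[
  \frac{u^{k+1} - 2u^{k} + u^{k-1}}{\tau^{2}}
  \;-\;
  \varphi\!\left(t_{k}, \bigl\lVert u^{k}_{x} \bigr\rVert^{2}\right)
  \, \frac{\partial^{2}}{\partial x^{2}}
  \!\left( \frac{u^{k+1} + u^{k-1}}{2} \right)
  \;=\; f(x, t_{k}).
\]
```

Because the nonlinear coefficient $\varphi$ is frozen at the known middle layer $u^k$, advancing one step only requires solving a linear second-order equation in $u^{k+1}$, which matches the description above of inverting linear operators at each temporal step.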

In this paper, we study the weighted $k$-server problem on the uniform metric in both the offline and online settings. We start with the offline setting. In contrast to the (unweighted) $k$-server problem which has a polynomial-time solution using min-cost flows, there are strong computational lower bounds for the weighted $k$-server problem, even on the uniform metric. Specifically, we show that assuming the unique games conjecture, there are no polynomial-time algorithms with a sub-polynomial approximation factor, even if we use $c$-resource augmentation for $c < 2$. Furthermore, if we consider the natural LP relaxation of the problem, then obtaining a bounded integrality gap requires us to use at least $\ell$ resource augmentation, where $\ell$ is the number of distinct server weights. We complement these results by obtaining a constant-approximation algorithm via LP rounding, with a resource augmentation of $(2+\epsilon)\ell$ for any constant $\epsilon > 0$. In the online setting, an $\exp(k)$ lower bound is known for the competitive ratio of any randomized algorithm for the weighted $k$-server problem on the uniform metric. In contrast, we show that $2\ell$-resource augmentation can bring the competitive ratio down by an exponential factor to only $O(\ell^2 \log \ell)$. Our online algorithm uses the two-stage approach of first obtaining a fractional solution using the online primal-dual framework, and then rounding it online.

Attention layers -- which map a sequence of inputs to a sequence of outputs -- are core building blocks of the Transformer architecture which has achieved significant breakthroughs in modern artificial intelligence. This paper presents a rigorous theoretical study on the learning and generalization of a single multi-head attention layer, with a sequence of key vectors and a separate query vector as input. We consider the random feature setting where the attention layer has a large number of heads, with randomly sampled frozen query and key matrices, and trainable value matrices. We show that such a random-feature attention layer can express a broad class of target functions that are permutation invariant to the key vectors. We further provide quantitative excess risk bounds for learning these target functions from finite samples, using random feature attention with finitely many heads. Our results feature several implications unique to the attention structure compared with existing random features theory for neural networks, such as (1) Advantages in the sample complexity over standard two-layer random-feature networks; (2) Concrete and natural classes of functions that can be learned efficiently by a random-feature attention layer; and (3) The effect of the sampling distribution of the query-key weight matrix (the product of the query and key matrix), where Gaussian random weights with a non-zero mean result in better sample complexities over the zero-mean counterpart for learning certain natural target functions. Experiments on simulated data corroborate our theoretical findings and further illustrate the interplay between the sample size and the complexity of the target function.
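
To ground the random-feature setting, here is a hedged NumPy sketch: the per-head query-key product matrices $W_m$ are frozen Gaussian draws with a non-zero mean (echoing implication (3)), the features are attention-weighted key averages, and only a linear value readout is trained, here by ridge regression. The target function and all names are illustrative assumptions.

```python
import numpy as np

def rfa_features(K_seq, q, Ws):
    """Random-feature attention features for one (keys, query) input.

    For each head m with frozen query-key product matrix W_m, attend over
    keys k_1..k_N with weights softmax_i(<W_m k_i, q>) and return the
    attention-weighted key average; only the value readout is trained.
    """
    feats = []
    for W in Ws:
        logits = (K_seq @ W.T) @ q          # <W_m k_i, q> for every key i
        w = np.exp(logits - logits.max())
        w /= w.sum()                        # softmax over the keys
        feats.append(w @ K_seq)             # weighted average of the keys
    return np.concatenate(feats)            # feature vector of length M * d

rng = np.random.default_rng(0)
d, N, M, n_train = 4, 6, 200, 500
Ws = [np.eye(d) + rng.standard_normal((d, d)) for _ in range(M)]  # non-zero mean

# permutation-invariant toy target: f*(K, q) = max_i <k_i, q>
data = [(rng.standard_normal((N, d)), rng.standard_normal(d)) for _ in range(n_train)]
X = np.array([rfa_features(K, q, Ws) for K, q in data])
y = np.array([np.max(K @ q) for K, q in data])

# trainable value readout = ridge regression on the frozen random features
v = np.linalg.solve(X.T @ X + 1e-3 * np.eye(M * d), X.T @ y)
K, q = rng.standard_normal((N, d)), rng.standard_normal(d)
print(v @ rfa_features(K, q, Ws), np.max(K @ q))  # prediction vs. target
```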

In inverse problems, one attempts to infer spatially variable functions from indirect measurements of a system. To practitioners of inverse problems, the concept of "information" is familiar when discussing key questions such as which parts of the function can be inferred accurately and which cannot. For example, it is generally understood that we can identify system parameters accurately only close to detectors, or along ray paths between sources and detectors, because we have "the most information" for these places. Although referenced in many publications, the "information" that is invoked in such contexts is not a well understood and clearly defined quantity. Herein, we present a definition of information density that is based on the variance of coefficients as derived from a Bayesian reformulation of the inverse problem. We then discuss three areas in which this information density can be useful in practical algorithms for the solution of inverse problems, and illustrate the usefulness in one of these areas -- how to choose the discretization mesh for the function to be reconstructed -- using numerical experiments.
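
As one concrete way to instantiate such a definition (an illustrative assumption, not necessarily the paper's exact formula), consider a linear Gaussian inverse problem and take the reciprocal posterior variance of each coefficient as its information density:

```python
import numpy as np

# 1-D toy: measurements m_j = sum_i g(x_j - s_i) f(s_i) at detector points x_j
n, n_det = 100, 12
s = np.linspace(0.0, 1.0, n)                  # grid for the unknown f
x = np.linspace(0.10, 0.45, n_det)            # detectors clustered on the left
A = np.exp(-0.5 * ((x[:, None] - s[None, :]) / 0.03) ** 2)   # forward map

sigma, gamma = 1e-2, 1.0                      # noise std, Gaussian prior std
# Bayesian reformulation: posterior covariance of the coefficients of f
C_post = np.linalg.inv(A.T @ A / sigma**2 + np.eye(n) / gamma**2)
info_density = 1.0 / np.diag(C_post)          # large where the variance is small

# far more information near the detectors than away from them
print(info_density[: n // 2].mean() / info_density[n // 2 :].mean())
```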

Classic machine learning methods are built on the $i.i.d.$ assumption that training and testing data are independent and identically distributed. However, in real scenarios this assumption can hardly be satisfied, causing a sharp drop in the performance of classic machine learning algorithms under distribution shift, which underscores the importance of investigating the Out-of-Distribution (OOD) generalization problem. The OOD generalization problem addresses the challenging setting where the testing distribution is unknown and different from the training distribution. This paper serves as the first effort to systematically and comprehensively discuss the OOD generalization problem, from definition, methodology, and evaluation to implications and future directions. Firstly, we provide a formal definition of the OOD generalization problem. Secondly, existing methods are categorized into three parts based on their positions in the whole learning pipeline, namely unsupervised representation learning, supervised model learning, and optimization, and typical methods for each category are discussed in detail. We then demonstrate the theoretical connections between the different categories, and introduce the commonly used datasets and evaluation metrics. Finally, we summarize the whole literature and raise some future directions for the OOD generalization problem. A summary of the OOD generalization methods reviewed in this survey can be found at //out-of-distribution-generalization.com.

Non-convex optimization is ubiquitous in modern machine learning. Researchers devise non-convex objective functions and optimize them using off-the-shelf optimizers such as stochastic gradient descent and its variants, which leverage the local geometry and update iteratively. Even though solving non-convex functions is NP-hard in the worst case, the optimization quality in practice is often not an issue -- optimizers are largely believed to find approximate global minima. Researchers hypothesize a unified explanation for this intriguing phenomenon: most of the local minima of the practically-used objectives are approximately global minima. We rigorously formalize this hypothesis for concrete instances of machine learning problems.
