
We propose a non-intrusive, reduced-basis, data-driven method for approximating both eigenvalues and eigenvectors in parametric eigenvalue problems. We generate the basis of the reduced space by applying proper orthogonal decomposition (POD) to a collection of pre-computed, full-order snapshots at a chosen set of parameters. Then, we use Bayesian linear regression (a.k.a. Gaussian process regression, GPR) in the online phase to predict both eigenvalues and eigenvectors at new parameters. In the numerical experiments, the data generated in the offline phase is split into training and test sets, following standard practice in supervised machine learning. Furthermore, we discuss the connection between Gaussian process regression and spline methods, and compare the performance of GPR against linear and cubic spline methods. We show that GPR outperforms the other methods for functions with a certain regularity. To this end, we discuss various covariance functions that influence the performance of GPR. The proposed method is shown to be accurate and efficient for the approximation of multiple 1D and 2D affine and non-affine parameter-dependent eigenvalue problems that exhibit crossing of eigenvalues.
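A minimal sketch of the online GPR prediction step, not the authors' implementation: the squared-exponential covariance, its length scale, the noise jitter, and the toy "eigenvalue curve" below are illustrative assumptions. Given training pairs of parameters and eigenvalues, the GP posterior mean and variance at new parameters follow from standard formulas:

```python
import numpy as np

def rbf_kernel(A, B, length_scale=0.3):
    # Squared-exponential covariance between two sets of scalar parameters.
    d2 = (A[:, None] - B[None, :]) ** 2
    return np.exp(-0.5 * d2 / length_scale**2)

def gpr_predict(x_train, y_train, x_test, noise=1e-6):
    # Standard GP posterior mean/variance via a Cholesky factorization.
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    K_star = rbf_kernel(x_test, x_train)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_star @ alpha
    v = np.linalg.solve(L, K_star.T)
    var = rbf_kernel(x_test, x_test).diagonal() - np.sum(v**2, axis=0)
    return mean, var

# Toy "eigenvalue" as a smooth function of a scalar parameter mu
# (stands in for offline full-order snapshots).
mu_train = np.linspace(0.0, 1.0, 15)
lam_train = np.sin(2 * np.pi * mu_train) + mu_train**2
mu_test = np.array([0.25, 0.5, 0.75])
mean, var = gpr_predict(mu_train, lam_train, mu_test)
```

The posterior variance gives a built-in error indicator at unseen parameters, which is one practical appeal of GPR over plain spline interpolation.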

Related Content

A Gaussian process (GP) is a type of stochastic process studied in probability theory and mathematical statistics: a family of normally distributed random variables indexed by an index set. Any finite linear combination of the random variables in a Gaussian process is normally distributed, and every finite-dimensional distribution is jointly Gaussian; over a continuous index set, the process defines a Gaussian measure on the collection of random variables, so a GP can be viewed as the infinite-dimensional generalization of the multivariate normal distribution. A Gaussian process is completely determined by its mean and covariance functions, and it inherits many of the properties of the normal distribution.
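The definition above can be made concrete by restricting the process to a finite index set, where it reduces to a multivariate normal. The index grid, squared-exponential covariance, and length scale below are arbitrary choices for the sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)                  # a finite subset of the index set
# Squared-exponential covariance: nearby indices are strongly correlated.
K = np.exp(-0.5 * (t[:, None] - t[None, :]) ** 2 / 0.1**2)
K += 1e-8 * np.eye(len(t))                     # jitter so Cholesky succeeds
L = np.linalg.cholesky(K)
paths = L @ rng.standard_normal((len(t), 5))   # 5 zero-mean sample paths
```

Each column of `paths` is one draw from the finite-dimensional (jointly Gaussian) distribution; the smoothness of the draws is controlled entirely by the covariance function, as the definition says.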

Simulation-based inference (SBI) methods such as approximate Bayesian computation (ABC), synthetic likelihood, and neural posterior estimation (NPE) rely on simulating statistics to infer parameters of intractable likelihood models. However, such methods are known to yield untrustworthy and misleading inference outcomes under model misspecification, thus hindering their widespread applicability. In this work, we propose the first general approach to handle model misspecification that works across different classes of SBI methods. Leveraging the fact that the choice of statistics determines the degree of misspecification in SBI, we introduce a regularized loss function that penalises those statistics that increase the mismatch between the data and the model. Taking NPE and ABC as use cases, we demonstrate the superior performance of our method on high-dimensional time-series models that are artificially misspecified. We also apply our method to real data from the field of radio propagation where the model is known to be misspecified. We show empirically that the method yields robust inference in misspecified scenarios, whilst still being accurate when the model is well-specified.
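For context on the SBI family, a bare-bones rejection-ABC sampler can be sketched as follows; this is not the paper's regularized method, and the Gaussian toy model, uniform prior, tolerance, and sample-mean statistic are illustrative assumptions:

```python
import numpy as np

def abc_rejection(y_obs, simulate, prior_sample, stat, eps, n_draws, rng):
    # Basic rejection ABC: keep a proposed theta whenever its simulated
    # summary statistic lands within eps of the observed statistic.
    s_obs = stat(y_obs)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)
        if abs(stat(simulate(theta, rng)) - s_obs) < eps:
            accepted.append(theta)
    return np.array(accepted)

rng = np.random.default_rng(0)
y_obs = rng.normal(1.5, 1.0, size=200)          # "observed" data, true mean 1.5
post = abc_rejection(
    y_obs,
    simulate=lambda th, rng: rng.normal(th, 1.0, size=200),
    prior_sample=lambda rng: rng.uniform(-5.0, 5.0),
    stat=lambda y: y.mean(),
    eps=0.1,
    n_draws=5000,
    rng=rng,
)
```

The choice of `stat` is exactly the degree of freedom the abstract highlights: under misspecification, a poorly chosen statistic can make the accepted draws systematically misleading, which motivates penalizing statistics that inflate the data-model mismatch.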

A recent development in Bayesian optimization is the use of local optimization strategies, which can deliver strong empirical performance on high-dimensional problems compared to traditional global strategies. The "folk wisdom" in the literature is that the focus on local optimization sidesteps the curse of dimensionality; however, little is known concretely about the expected behavior or convergence of Bayesian local optimization routines. We first study the behavior of the local approach, and find that the statistics of individual local solutions of Gaussian process sample paths are surprisingly good compared to what we would expect to recover from global methods. We then present the first rigorous analysis of such a Bayesian local optimization algorithm recently proposed by Müller et al. (2021), and derive convergence rates in both the noisy and noiseless settings.

Consider the Toeplitz matrix $T_n(f)$ generated by the symbol $f(\theta)=\hat{f}_r e^{\mathbf{i}r\theta}+\hat{f}_0+\hat{f}_{-s} e^{-\mathbf{i}s\theta}$, where $\hat{f}_r, \hat{f}_0, \hat{f}_{-s} \in \mathbb{C}$ and $0<r<n,~0<s<n$. For $r=s=1$ we have the classical tridiagonal Toeplitz matrices, for which the eigenvalues and eigenvectors are known. Similarly, the eigendecompositions are known for $1<r=s$, when the generated matrices are ``symmetrically sparse tridiagonal''. In the current paper we study the eigenvalues of $T_n(f)$ for $1\leq r<s$, which are ``non-symmetrically sparse tridiagonal''. We propose an algorithm which constructs one or two ad hoc matrices smaller than $T_n(f)$, whose eigenvalues are sufficient for determining the full spectrum of $T_n(f)$. The algorithm is explained through use of a conjecture; examples and numerical experiments are reported to support the conjecture and to clarify the presentation. Open problems are briefly discussed.
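These matrices are easy to build and probe numerically. The sketch below uses the convention $(T_n(f))_{jk}=\hat f_{j-k}$, with arbitrary size and coefficients, and checks the classical $r=s=1$ case against the known tridiagonal spectrum $\hat f_0 + 2\sqrt{\hat f_1 \hat f_{-1}}\cos\frac{k\pi}{n+1}$:

```python
import numpy as np

def toeplitz_three_term(n, r, s, f_r, f0, f_ms):
    # (T_n(f))_{jk} = fhat_{j-k}: f_r on the r-th subdiagonal,
    # f_{-s} on the s-th superdiagonal, f0 on the main diagonal.
    return (f0 * np.eye(n)
            + f_r * np.eye(n, k=-r)
            + f_ms * np.eye(n, k=s))

n = 12
T = toeplitz_three_term(n, r=1, s=1, f_r=1.0, f0=2.0, f_ms=1.0)
eig = np.sort(np.linalg.eigvals(T).real)

# Known spectrum of the tridiagonal Toeplitz matrix with these coefficients.
k = np.arange(1, n + 1)
exact = np.sort(2.0 + 2.0 * np.cos(k * np.pi / (n + 1)))
```

For $1 \leq r < s$ the same constructor produces the "non-symmetrically sparse tridiagonal" matrices the paper studies, whose spectra lack such a closed form.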

We develop a class of data-driven generative models that approximate the solution operator for parameter-dependent partial differential equations (PDE). We propose a novel probabilistic formulation of the operator learning problem based on recently developed generative denoising diffusion probabilistic models (DDPM) in order to learn the input-to-output mapping between problem parameters and solutions of the PDE. To achieve this goal we adapt DDPM to the supervised learning setting, in which the solution operator for the PDE is represented by a class of conditional distributions. The probabilistic formulation combined with DDPM allows for an automatic quantification of confidence intervals for the learned solutions. Furthermore, the framework is directly applicable for learning from a noisy data set. We compare the computational performance of the developed method with the Fourier Neural Operator (FNO). Our results show that our method achieves comparable accuracy and recovers the noise magnitude when applied to data sets with outputs corrupted by additive noise.

Econometric models of strategic interactions among people or firms have received a great deal of attention in the literature. Less attention has been paid to the role of the underlying assumptions about the way agents form beliefs about other agents. We focus on a single large Bayesian game with idiosyncratic strategic neighborhoods and develop an approach of empirical modeling that relaxes the assumption of rational expectations and allows the players to form beliefs differently. By drawing on the main intuition of Kalai (2004), we introduce the notion of hindsight regret, which measures each player's ex-post value of other players' type information, and obtain the belief-free bound for the hindsight regret. Using this bound, we derive testable implications and develop a bootstrap inference procedure for the structural parameters. Our inference method is uniformly valid regardless of the size of strategic neighborhoods and tends to exhibit high power when the neighborhoods are large. We demonstrate the finite sample performance of the method through Monte Carlo simulations.

The relationship between the number of training data points, the number of parameters in a statistical model, and the generalization capabilities of the model has been widely studied. Previous work has shown that double descent can occur in the over-parameterized regime, and it is commonly believed that the standard bias-variance trade-off holds in the under-parameterized regime. In this paper, we present a simple example that provably exhibits double descent in the under-parameterized regime. For simplicity, we look at the ridge regularized least squares denoising problem with data on a line embedded in high-dimensional space. By deriving an asymptotically accurate formula for the generalization error, we observe sample-wise and parameter-wise double descent with the peak in the under-parameterized regime rather than at the interpolation point or in the over-parameterized regime. Further, the peak of the sample-wise double descent curve corresponds to a peak in the curve for the norm of the estimator, and adjusting $\mu$, the strength of the ridge regularization, shifts the location of the peak. We observe that parameter-wise double descent occurs for this model for small $\mu$. For larger values of $\mu$, we observe that the curve for the norm of the estimator has a peak but that this no longer translates to a peak in the generalization error. Moreover, we study the training error for this problem. The considered problem setup allows for studying the interaction between two regularizers. We provide empirical evidence that the model implicitly favors using the ridge regularizer over the input data noise regularizer. Thus, we show that even though both regularizers regularize the same quantity, i.e., the norm of the estimator, they are not equivalent.

We study the change point detection problem for high-dimensional linear regression models. The existing literature has mainly focused on change point estimation under stringent sub-Gaussian assumptions on the errors. In practice, however, there is no prior knowledge about the existence of a change point or the tail structures of errors. To address these issues, in this paper, we propose a novel tail-adaptive approach for simultaneous change point testing and estimation. The method is built on a new loss function which is a weighted combination of the composite quantile and least-squares losses, allowing us to borrow information about the possible change points from both the conditional mean and quantiles. For the change point testing, based on the adjusted $L_2$-norm aggregation of a weighted score CUSUM process, we propose a family of individual testing statistics with different weights to account for the unknown tail structures. Combining the individual tests, a tail-adaptive test is further constructed that is powerful for sparse alternatives of regression coefficients' changes under various tail structures. For the change point estimation, a family of argmax-based individual estimators is proposed once a change point is detected. In theory, for both individual and tail-adaptive tests, bootstrap procedures are proposed to approximate their limiting null distributions. Under some mild conditions, we justify the validity of the new tests in terms of size and power under the high-dimensional setup. The corresponding change point estimators are shown to be rate optimal up to a logarithmic factor. Moreover, combined with the wild binary segmentation technique, a new algorithm is proposed to detect multiple change points in a tail-adaptive manner. Extensive numerical experiments illustrate the appealing performance of the proposed method.
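As a point of reference, the classical (unweighted) mean-shift CUSUM statistic underlying such procedures can be sketched as follows; the simulated data and shift size are illustrative, and the paper's weighted score CUSUM for regression coefficients is considerably more involved:

```python
import numpy as np

def cusum_change_point(x):
    # Classical CUSUM for a single mean shift: at each split t, compare the
    # sample means before and after t, scaled so the statistic has unit
    # variance under the no-change null (for unit-variance errors).
    n = len(x)
    stats = np.empty(n - 1)
    for t in range(1, n):
        w = np.sqrt(t * (n - t) / n)
        stats[t - 1] = w * abs(x[:t].mean() - x[t:].mean())
    t_hat = int(np.argmax(stats)) + 1
    return t_hat, float(stats.max())

rng = np.random.default_rng(1)
# 200 observations with a mean shift of 2 at index 100.
x = np.concatenate([rng.normal(0.0, 1.0, 100), rng.normal(2.0, 1.0, 100)])
t_hat, stat = cusum_change_point(x)
```

The argmax of the CUSUM process estimates the change location, and the maximum is compared to a critical value for testing; the abstract's tail-adaptive construction replaces this single statistic with a weighted family calibrated by bootstrap.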

The mechanisms by which certain training interventions, such as increasing learning rates and applying batch normalization, improve the generalization of deep networks remain a mystery. Prior works have speculated that "flatter" solutions generalize better than "sharper" solutions to unseen data, motivating several metrics for measuring flatness (particularly $\lambda_{max}$, the largest eigenvalue of the Hessian of the loss); and algorithms, such as Sharpness-Aware Minimization (SAM) [1], that directly optimize for flatness. Other works question the link between $\lambda_{max}$ and generalization. In this paper, we present findings that call $\lambda_{max}$'s influence on generalization further into question. We show that: (1) while larger learning rates reduce $\lambda_{max}$ for all batch sizes, generalization benefits sometimes vanish at larger batch sizes; (2) by scaling batch size and learning rate simultaneously, we can change $\lambda_{max}$ without affecting generalization; (3) while SAM produces smaller $\lambda_{max}$ for all batch sizes, generalization benefits (also) vanish with larger batch sizes; (4) for dropout, excessively high dropout probabilities can degrade generalization, even as they promote smaller $\lambda_{max}$; and (5) while batch-normalization does not consistently produce smaller $\lambda_{max}$, it nevertheless confers generalization benefits. While our experiments affirm the generalization benefits of large learning rates and SAM for minibatch SGD, the GD-SGD discrepancy demonstrates limits to $\lambda_{max}$'s ability to explain generalization in neural networks.
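In practice, $\lambda_{max}$ is estimated with power iteration on Hessian-vector products from autodiff rather than by forming the Hessian. A minimal sketch, using an exact HVP for a quadratic "loss" with a known spectrum as a stand-in for an autodiff HVP:

```python
import numpy as np

def power_iteration(hvp, dim, iters=200, seed=0):
    # Estimate lambda_max of the Hessian using only Hessian-vector
    # products, then return the Rayleigh quotient of the final iterate.
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(dim)
    v /= np.linalg.norm(v)
    for _ in range(iters):
        Hv = hvp(v)
        v = Hv / np.linalg.norm(Hv)
    return v @ hvp(v)

# Quadratic "loss" 0.5 w^T A w: its Hessian is A, with spectrum {1,...,10},
# so the exact HVP is simply v -> A v.
A = np.diag(np.arange(1.0, 11.0))
lam_max = power_iteration(lambda v: A @ v, dim=10)
```

For a real network, `hvp` would be a forward-over-reverse autodiff product evaluated at the trained parameters; the iteration itself is unchanged.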

Dataset Distillation is the task of synthesizing small datasets from large ones while still retaining comparable predictive accuracy to the original uncompressed dataset. Despite significant empirical progress in recent years, there is little understanding of the theoretical limitations/guarantees of dataset distillation; specifically: what excess risk is achieved by distillation compared to the original dataset, and how large do distilled datasets need to be? In this work, we take a theoretical view on kernel ridge regression (KRR) based methods of dataset distillation such as Kernel Inducing Points. By transforming ridge regression into random Fourier features (RFF) space, we provide the first proof of the existence of small (size) distilled datasets and their corresponding excess risk for shift-invariant kernels. We prove that a small set of instances exists in the original input space such that its solution in the RFF space coincides with the solution of the original data. We further show that a KRR solution can be generated using this distilled set of instances which gives an approximation towards the KRR solution optimized on the full input data. The size of this set is linear in the dimension of the RFF space of the input set or alternatively near linear in the number of effective degrees of freedom, which is a function of the kernel, number of datapoints, and the regularization parameter $\lambda$. The error bound of this distilled set is also a function of $\lambda$. We verify our bounds analytically and empirically.
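The RFF construction at the heart of this analysis (Rahimi-Recht features for a shift-invariant kernel) can be sketched in a few lines; the dimensions, bandwidth, and feature count below are illustrative assumptions:

```python
import numpy as np

def rff_features(X, n_features, gamma, seed=0):
    # Random Fourier features for the RBF kernel
    # k(x, y) = exp(-gamma * ||x - y||^2):
    # z(x) = sqrt(2/D) * cos(W x + b), with w ~ N(0, 2*gamma*I).
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(0.0, np.sqrt(2.0 * gamma), size=(d, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

rng = np.random.default_rng(1)
X = rng.standard_normal((50, 3))
Z = rff_features(X, n_features=5000, gamma=0.5)

# The linear kernel in feature space approximates the exact RBF Gram matrix.
K_exact = np.exp(-0.5 * ((X[:, None] - X[None, :]) ** 2).sum(-1))
K_approx = Z @ Z.T
```

In this feature space, KRR becomes an ordinary ridge regression on `Z`, which is what makes the distilled-set size arguments in terms of the RFF dimension tractable.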

The Inverse-Wishart (IW) distribution is a standard and popular choice of priors for covariance matrices and has attractive properties such as conditional conjugacy. However, the IW family of priors has crucial drawbacks, including the lack of effective choices for non-informative priors. Several classes of priors for covariance matrices that alleviate these drawbacks, while preserving computational tractability, have been proposed in the literature. These priors can be obtained through appropriate scale mixtures of IW priors. However, the high-dimensional posterior consistency of models which incorporate such priors has not been investigated. We address this issue for the multi-response regression setting ($q$ responses, $n$ samples) under a wide variety of IW scale mixture priors for the error covariance matrix. Posterior consistency and contraction rates for both the regression coefficient matrix and the error covariance matrix are established in the ``large $q$, large $n$" setting under mild assumptions on the true data-generating covariance matrix and relevant hyperparameters. In particular, the number of responses $q_n$ is allowed to grow with $n$, but with $q_n = o(n)$. Also, some results related to the inconsistency of the posterior mean for $q_n/n \to \gamma$, where $\gamma \in (0,\infty)$, are provided.
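For concreteness, a draw $\Sigma \sim \mathrm{IW}(\nu, \Psi)$ can be generated in plain NumPy via the Bartlett decomposition of the corresponding Wishart (a sketch; the degrees of freedom, scale matrix, and Monte Carlo size are illustrative assumptions), and the Monte Carlo mean can be checked against the closed form $\mathbb{E}[\Sigma] = \Psi/(\nu - q - 1)$:

```python
import numpy as np

def sample_inverse_wishart(nu, Psi, rng):
    # Sigma ~ IW(nu, Psi): draw W ~ Wishart(nu, Psi^{-1}) using the Bartlett
    # decomposition, then return W^{-1}.
    q = Psi.shape[0]
    L = np.linalg.cholesky(np.linalg.inv(Psi))
    A = np.zeros((q, q))
    for i in range(q):
        A[i, i] = np.sqrt(rng.chisquare(nu - i))   # chi on the diagonal
        A[i, :i] = rng.standard_normal(i)          # N(0,1) below it
    W = L @ A @ A.T @ L.T
    return np.linalg.inv(W)

rng = np.random.default_rng(0)
q, nu = 3, 10.0
Psi = np.eye(q)
draws = np.array([sample_inverse_wishart(nu, Psi, rng) for _ in range(4000)])
mc_mean = draws.mean(axis=0)       # should approach Psi / (nu - q - 1)
```

Scale mixtures of IW priors of the kind the abstract refers to amount to additionally randomizing `Psi` (or a diagonal scale) before this draw, which preserves the conditional conjugacy that makes IW-based computation tractable.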
