
We consider stochastic approximations of sampling algorithms, such as Stochastic Gradient Langevin Dynamics (SGLD) and the Random Batch Method (RBM) for Interacting Particle Dynamics (IPD). We observe that the noise introduced by the stochastic approximation is nearly Gaussian due to the Central Limit Theorem (CLT) while the driving Brownian motion is exactly Gaussian. We harness this structure to absorb the stochastic approximation error inside the diffusion process, and obtain improved convergence guarantees for these algorithms. For SGLD, we prove the first stable convergence rate in KL divergence without requiring a uniform warm start, assuming the target density satisfies a Log-Sobolev Inequality. Our result implies superior first-order oracle complexity compared to prior works, under significantly milder assumptions. We also prove the first guarantees for SGLD under even weaker conditions such as H\"{o}lder smoothness and Poincar\'{e} Inequality, thus bridging the gap between the state-of-the-art guarantees for LMC and SGLD. Our analysis motivates a new algorithm called covariance correction, which corrects for the additional noise introduced by the stochastic approximation by rescaling the strength of the diffusion. Finally, we apply our techniques to analyze RBM, and significantly improve upon the guarantees in prior works (such as removing exponential dependence on horizon), under minimal assumptions.
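For readers unfamiliar with SGLD, the following is a minimal sketch of the standard update, $x_{k+1} = x_k - \eta\,\hat{g}(x_k) + \sqrt{2\eta}\,\xi_k$, where $\hat{g}$ is a mini-batch gradient estimate and $\xi_k$ is standard Gaussian noise. It is plain SGLD only, not the covariance correction algorithm mentioned in the abstract, whose details are not given here. The function name `sgld`, its parameters, and the Gaussian toy target are assumptions made for illustration.

```python
import numpy as np

def sgld(grad_batch, data, x0, step, n_steps, batch_size, seed=0):
    """Plain SGLD targeting a density proportional to exp(-f(x)).

    grad_batch(x, batch) is assumed to return an unbiased mini-batch
    estimate of grad f(x); all names here are illustrative.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    samples = np.empty((n_steps,) + x.shape)
    for k in range(n_steps):
        # Mini-batch drawn uniformly with replacement: this is the
        # stochastic approximation whose noise is nearly Gaussian by the CLT.
        idx = rng.integers(0, len(data), size=batch_size)
        g = grad_batch(x, data[idx])
        # Injected noise: the exactly Gaussian Brownian increment.
        noise = rng.standard_normal(x.shape)
        x = x - step * g + np.sqrt(2.0 * step) * noise
        samples[k] = x
    return samples

# Toy usage (assumed example): f(x) = mean_i 0.5 * (x - a_i)^2, so the
# target is a unit-variance Gaussian centred at the mean of the a_i.
data = np.random.default_rng(1).normal(2.0, 1.0, size=(100, 1))
samples = sgld(lambda x, b: x - b.mean(axis=0), data, np.zeros(1),
               step=0.05, n_steps=20000, batch_size=10)
```

In this toy setting the empirical mean of the later samples should sit near `data.mean()`; the mini-batch noise adds a small amount of extra variance on top of the injected noise, which is the effect the abstract proposes to correct for.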

Related content

Variational inference with Gaussian mixture models (GMMs) enables learning of highly tractable yet multi-modal approximations of intractable target distributions with up to a few hundred dimensions. The two currently most effective methods for GMM-based variational inference, VIPS and iBayes-GMM, both employ independent natural gradient updates for the individual components and their weights. We show, for the first time, that their derived updates are equivalent, although their practical implementations and theoretical guarantees differ. We identify several design choices that distinguish the two approaches, namely with respect to sample selection, natural gradient estimation, stepsize adaptation, and whether trust regions are enforced or the number of components adapted. We argue that for both approaches, the quality of the learned approximations can heavily suffer from the respective design choices: By updating the individual components using samples from the mixture model, iBayes-GMM often fails to produce meaningful updates to low-weight components, and by using a zero-order method for estimating the natural gradient, VIPS scales badly to higher-dimensional problems. Furthermore, we show that information-geometric trust-regions (used by VIPS) are effective even when using first-order natural gradient estimates, and often outperform the improved Bayesian learning rule (iBLR) update used by iBayes-GMM. We systematically evaluate the effects of design choices and show that a hybrid approach significantly outperforms both prior works. Along with this work, we publish our highly modular and efficient implementation for natural gradient variational inference with Gaussian mixture models, which supports 432 different combinations of design choices, facilitates the reproduction of all our experiments, and may prove valuable for the practitioner.

Diffusion models are a powerful class of generative models that can produce high-quality images, but they may suffer from data bias. Data bias occurs when the training data does not reflect the true distribution of the data domain, but rather exhibits some skewed or imbalanced patterns. For example, the CelebA dataset contains more female images than male images, which can lead to biased generation results and affect downstream applications. In this paper, we propose a novel method to mitigate data bias in diffusion models by applying manifold guidance. Our key idea is to estimate the manifold of the training data using a learnable information-theoretic approach, and then use it to guide the sampling process of diffusion models. In this way, we can encourage the generated images to be uniformly distributed on the data manifold, without changing the model architecture or requiring labels or retraining. We provide theoretical analysis and empirical evidence to show that our method can improve the quality and unbiasedness of image generation compared to standard diffusion models.

The identification of choice models is crucial for understanding consumer behavior, designing marketing policies, and developing new products. The identification of parametric choice-based demand models, such as the multinomial choice model (MNL), is typically straightforward. However, nonparametric models, which are highly effective and flexible in explaining customer choices, may encounter the curse of dimensionality and lose their identifiability. For example, the ranking-based model, which is a nonparametric model designed to mirror the random utility maximization (RUM) principle, is known to be nonidentifiable from the collection of choice probabilities alone. In this paper, we develop a new class of nonparametric models that is not subject to the problem of nonidentifiability. Our model assumes bounded rationality of consumers, which results in symmetric demand cannibalization and intriguingly enables full identification. That is to say, we can uniquely construct the model based on its observed choice probabilities over assortments. We further propose an efficient estimation framework using a combination of column generation and expectation-maximization algorithms. Using real-world data, we show that our choice model demonstrates competitive prediction accuracy compared to the state-of-the-art benchmarks, despite incorporating the assumption of bounded rationality which could, in theory, limit the representation power of our model.

In many stochastic service systems, decision-makers find themselves making a sequence of decisions, with the number of decisions being unpredictable. To enhance these decisions, it is crucial to uncover the causal impact these decisions have through careful analysis of observational data from the system. However, these decisions are not made independently, as they are shaped by previous decisions and outcomes. This phenomenon is called sequential bias and violates a key assumption in causal inference that one person's decision does not interfere with the potential outcomes of another. To address this issue, we establish a connection between sequential bias and the subfield of causal inference known as dynamic treatment regimes. We expand these frameworks to account for the random number of decisions by modeling the decision-making process as a marked point process. Consequently, we can define and identify causal effects to quantify sequential bias. Moreover, we propose estimators and explore their properties, including double robustness and semiparametric efficiency. In a case study of 27,831 encounters with a large academic emergency department, we use our approach to demonstrate that the decision to route a patient to an area for low acuity patients has a significant impact on the care of future patients.

This paper develops a method to detect model structural changes by applying a Corrected Kernel Principal Component Analysis (CKPCA) to construct the so-called central distribution deviation subspaces. This approach can efficiently identify the mean and distribution changes in these dimension reduction subspaces. We show that the locations and number of changes in the dimension reduction data subspaces are identical to those in the original data spaces. We also explain why CKPCA is necessary: the classical KPCA fails to identify the central distribution deviation subspaces in these problems. Additionally, we extend this approach to clustering by embedding the original data in nonlinear lower-dimensional spaces, providing enhanced capabilities for clustering analysis. The numerical studies on synthetic and real data sets suggest that the dimension reduction versions of existing methods for change point detection and clustering significantly improve the performance of existing approaches in finite sample scenarios.

In this paper we derive tight lower bounds resolving the hardness status of several fundamental weighted matroid problems. One notable example is budgeted matroid independent set, for which we show there is no fully polynomial-time approximation scheme (FPTAS), indicating the Efficient PTAS of [Doron-Arad, Kulik and Shachnai, SOSA 2023] is the best possible. Furthermore, we show that there is no pseudo-polynomial time algorithm for exact weight matroid independent set, implying the algorithm of [Camerini, Galbiati and Maffioli, J. Algorithms 1992] for representable matroids cannot be generalized to arbitrary matroids. Similarly, we show there is no Fully PTAS for constrained minimum basis of a matroid and knapsack cover with a matroid, implying the existing Efficient PTAS for the former is optimal. For all of the above problems, we obtain unconditional lower bounds in the oracle model, where the independent sets of the matroid can be accessed only via a membership oracle. We complement these results by showing that the same lower bounds hold under standard complexity assumptions, even if the matroid is encoded as part of the instance. All of our bounds are based on a specifically structured family of paving matroids.

The widespread use of maximum Jeffreys'-prior penalized likelihood in binomial-response generalized linear models, and in logistic regression in particular, is supported by the results of Kosmidis and Firth (2021, Biometrika), who show that the resulting estimates are also always finite-valued, even in cases where the maximum likelihood estimates are not, which is a practical issue regardless of the size of the data set. In logistic regression, the implied adjusted score equations are formally bias-reducing in asymptotic frameworks with a fixed number of parameters and appear to deliver a substantial reduction in the persistent bias of the maximum likelihood estimator in high-dimensional settings where the number of parameters grows asymptotically linearly and slower than the number of observations. In this work, we develop and present two new variants of iteratively reweighted least squares (IWLS) for estimating generalized linear models with adjusted score equations for mean bias reduction and maximization of the likelihood penalized by a positive power of the Jeffreys-prior penalty, which eliminate the requirement of storing $O(n)$ quantities in memory, and can operate with data sets that exceed computer memory or even hard drive capacity. We achieve that through incremental QR decompositions, which enable IWLS iterations to have access only to data chunks of predetermined size. We assess the procedures through a real-data application with millions of observations, and in high-dimensional logistic regression, where a large-scale simulation experiment produces concrete evidence for the existence of a simple adjustment to the maximum Jeffreys'-penalized likelihood estimates that delivers high accuracy in terms of signal recovery even in cases where estimates from ML and other recently-proposed corrective methods do not exist.

We introduce a physics-driven deep latent variable model (PDDLVM) to learn simultaneously parameter-to-solution (forward) and solution-to-parameter (inverse) maps of parametric partial differential equations (PDEs). Our formulation leverages conventional PDE discretization techniques, deep neural networks, probabilistic modelling, and variational inference to assemble a fully probabilistic coherent framework. In the posited probabilistic model, both the forward and inverse maps are approximated as Gaussian distributions with a mean and covariance parameterized by deep neural networks. The PDE residual is assumed to be an observed random vector of value zero, hence we model it as a random vector with a zero mean and a user-prescribed covariance. The model is trained by maximizing the probability, that is the evidence or marginal likelihood, of observing a residual of zero by maximizing the evidence lower bound (ELBO). Consequently, the proposed methodology does not require any independent PDE solves and is physics-informed at training time, allowing the real-time solution of PDE forward and inverse problems after training. The proposed framework can be easily extended to seamlessly integrate observed data to solve inverse problems and to build generative models. We demonstrate the efficiency and robustness of our method on finite element discretized parametric PDE problems such as linear and nonlinear Poisson problems, elastic shells with complex 3D geometries, and time-dependent nonlinear and inhomogeneous PDEs using a physics-informed neural network (PINN) discretization. We achieve up to three orders of magnitude speed-up after training compared to traditional finite element method (FEM), while outputting coherent uncertainty estimates.

Sampling methods (e.g., node-wise, layer-wise, or subgraph) have become an indispensable strategy to speed up training large-scale Graph Neural Networks (GNNs). However, existing sampling methods are mostly based on the graph structural information and ignore the dynamicity of optimization, which leads to high variance in estimating the stochastic gradients. The high variance issue can be very pronounced in extremely large graphs, where it results in slow convergence and poor generalization. In this paper, we theoretically analyze the variance of sampling methods and show that, due to the composite structure of empirical risk, the variance of any sampling method can be decomposed into \textit{embedding approximation variance} in the forward stage and \textit{stochastic gradient variance} in the backward stage, which necessitates mitigating both types of variance to obtain a faster convergence rate. We propose a decoupled variance reduction strategy that employs (approximate) gradient information to adaptively sample nodes with minimal variance, and explicitly reduces the variance introduced by embedding approximation. We show theoretically and empirically that the proposed method, even with smaller mini-batch sizes, enjoys a faster convergence rate and entails a better generalization compared to the existing methods.

With the rapid increase of large-scale, real-world datasets, it becomes critical to address the problem of long-tailed data distribution (i.e., a few classes account for most of the data, while most classes are under-represented). Existing solutions typically adopt class re-balancing strategies such as re-sampling and re-weighting based on the number of observations for each class. In this work, we argue that as the number of samples increases, the additional benefit of a newly added data point will diminish. We introduce a novel theoretical framework to measure data overlap by associating with each sample a small neighboring region rather than a single point. The effective number of samples is defined as the volume of samples and can be calculated by a simple formula $(1-\beta^{n})/(1-\beta)$, where $n$ is the number of samples and $\beta \in [0,1)$ is a hyperparameter. We design a re-weighting scheme that uses the effective number of samples for each class to re-balance the loss, thereby yielding a class-balanced loss. Comprehensive experiments are conducted on artificially induced long-tailed CIFAR datasets and large-scale datasets including ImageNet and iNaturalist. Our results show that when trained with the proposed class-balanced loss, the network is able to achieve significant performance gains on long-tailed datasets.
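The effective-number formula in the abstract above is simple enough to sketch directly. The snippet below computes $(1-\beta^{n})/(1-\beta)$ per class and turns it into loss weights. Weighting each class by the inverse of its effective number and normalizing the weights to sum to the number of classes are assumptions made for this illustration, as are the function names and the example counts.

```python
def effective_number(n, beta):
    """Effective number of samples, (1 - beta**n) / (1 - beta), for beta in [0, 1)."""
    if not 0.0 <= beta < 1.0:
        raise ValueError("beta must lie in [0, 1)")
    return (1.0 - beta ** n) / (1.0 - beta)

def class_balanced_weights(counts, beta):
    """Per-class loss weights inversely proportional to the effective number.

    Assumption for this sketch: weights are rescaled so that they sum to
    the number of classes.
    """
    raw = [1.0 / effective_number(n, beta) for n in counts]
    scale = len(counts) / sum(raw)
    return [w * scale for w in raw]

# Hypothetical long-tailed class counts: one head class, two tail classes.
counts = [5000, 200, 10]
weights = class_balanced_weights(counts, beta=0.999)
```

With $\beta = 0$ every class has effective number 1, so all weights are equal (no re-balancing); as $\beta$ approaches 1 the effective number approaches $n$, so the weights approach inverse class frequency.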
