亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

A contextual bandit is a popular framework for online learning to act under uncertainty. In practice, the number of actions is huge and their expected rewards are correlated. In this work, we introduce a general framework for capturing such correlations through a mixed-effect model where actions are related through multiple shared effect parameters. We propose Mixed-Effect Thompson Sampling (meTS) that uses this structure to explore efficiently and bound its Bayes regret. The regret bound has two terms, one for learning the action parameters and the other for learning the shared effect parameters. The terms reflect the structure of our model and the quality of priors. Our theoretical findings are validated empirically using both synthetic and real-world problems. We also propose numerous extensions of practical interest. While they do not come with guarantees, they perform extremely well empirically and show the generality of the proposed framework.

相關內容

The geographically weighted regression (GWR) is an essential tool for estimating the spatial variation of relationships between dependent and independent variables in geographical contexts. However, GWR suffers from the problem that classical linear regressions, which compose the GWR model, are more prone to be underfitting, especially for significant volume and complex nonlinear data, causing inferior comparative performance. Nevertheless, some advanced models, such as the decision tree and the support vector machine, can learn features from complex data more effectively while they cannot provide explainable quantification for the spatial variation of localized relationships. To address the above issues, we propose a geographically gradient boosting weighted regression model, GWRBoost, that applies the localized additive model and gradient boosting optimization method to alleviate underfitting problems and retains explainable quantification capability for spatially-varying relationships between geographically located variables. Furthermore, we formulate the computation method of the Akaike information score for the proposed model to conduct the comparative analysis with the classic GWR algorithm. Simulation experiments and the empirical case study are applied to prove the efficient performance and practical value of GWRBoost. The results show that our proposed model can reduce the RMSE by 18.3\% in parameter estimation accuracy and AICc by 67.3\% in the goodness of fit.

We propose to use L\'evy {\alpha}-stable distributions for constructing priors for Bayesian inverse problems. The construction is based on Markov fields with stable-distributed increments. Special cases include the Cauchy and Gaussian distributions, with stability indices {\alpha} = 1, and {\alpha} = 2, respectively. Our target is to show that these priors provide a rich class of priors for modelling rough features. The main technical issue is that the {\alpha}-stable probability density functions do not have closed-form expressions in general, and this limits their applicability. For practical purposes, we need to approximate probability density functions through numerical integration or series expansions. Current available approximation methods are either too time-consuming or do not function within the range of stability and radius arguments needed in Bayesian inversion. To address the issue, we propose a new hybrid approximation method for symmetric univariate and bivariate {\alpha}-stable distributions, which is both fast to evaluate, and accurate enough from a practical viewpoint. Then we use approximation method in the numerical implementation of {\alpha}-stable random field priors. We demonstrate the applicability of the constructed priors on selected Bayesian inverse problems which include the deconvolution problem, and the inversion of a function governed by an elliptic partial differential equation. We also demonstrate hierarchical {\alpha}-stable priors in the one-dimensional deconvolution problem. We employ maximum-a-posterior-based estimation at all the numerical examples. To that end, we exploit the limited-memory BFGS and its bounded variant for the estimator.

In this paper we deal with the problem of sequential testing of multiple hypotheses. We are interested in minimising a weighted average sample number under restrictions on the error probabilities. A computer-oriented method of construction of optimal sequential tests is proposed. For the particular case of sampling from a Bernoulli population we develop a whole set of computer algorithms for optimal design and performance evaluation of sequential tests and implement them in the form of computer code written in R programming language. The tests we obtain are exact (neither asymptotic nor approximate). Extensions to other distribution families are discussed. A numerical comparison with other known tests (of MSPRT type) is carried out.

A recently developed measure-theoretic framework solves a stochastic inverse problem (SIP) for models where uncertainties in model output data are predominantly due to aleatoric (i.e., irreducible) uncertainties in model inputs (i.e., parameters). The subsequent inferential target is a distribution on parameters. Another type of inverse problem is to quantify uncertainties in estimates of "true" parameter values under the assumption that such uncertainties should be reduced as more data are incorporated into the problem, i.e., the uncertainty is considered epistemic. A major contribution of this work is the formulation and solution of such a parameter identification problem (PIP) within the measure-theoretic framework developed for the SIP. The approach is novel in that it utilizes a solution to a stochastic forward problem (SFP) to update an initial density only in the parameter directions informed by the model output data. In other words, this method performs "selective regularization" only in the parameter directions not informed by data. The solution is defined by a maximal updated density (MUD) point where the updated density defines the measure-theoretic solution to the PIP. Another significant contribution of this work is the full theory of existence and uniqueness of MUD points for linear maps with Gaussian distributions. Data-constructed Quantity of Interest (QoI) maps are also presented and analyzed for solving the PIP within this measure-theoretic framework as a means of reducing uncertainties in the MUD estimate. We conclude with a demonstration of the general applicability of the method on two problems involving either spatial or temporal data for estimating uncertain model parameters.

We introduce a novel approach to inference on parameters that take values in a Riemannian manifold embedded in a Euclidean space. Parameter spaces of this form are ubiquitous across many fields, including chemistry, physics, computer graphics, and geology. This new approach uses generalized fiducial inference to obtain a posterior-like distribution on the manifold, without needing to know a parameterization that maps the constrained space to an unconstrained Euclidean space. The proposed methodology, called the constrained generalized fiducial distribution (CGFD), is obtained by using mathematical tools from Riemannian geometry. A Bernstein-von Mises-type result for the CGFD, which provides intuition for how the desirable asymptotic qualities of the unconstrained generalized fiducial distribution are inherited by the CGFD, is provided. To demonstrate the practical use of the CGFD, we provide three proof-of-concept examples: inference for data from a multivariate normal density with the mean parameters on a sphere, a linear logspline density estimation problem, and a reimagined approach to the AR(1) model, all of which exhibit desirable coverages via simulation. We discuss two Markov chain Monte Carlo algorithms for the exploration of these constrained parameter spaces and adapt them for the CGFD.

Many machine learning problems encode their data as a matrix with a possibly very large number of rows and columns. In several applications like neuroscience, image compression or deep reinforcement learning, the principal subspace of such a matrix provides a useful, low-dimensional representation of individual data. Here, we are interested in determining the $d$-dimensional principal subspace of a given matrix from sample entries, i.e. from small random submatrices. Although a number of sample-based methods exist for this problem (e.g. Oja's rule \citep{oja1982simplified}), these assume access to full columns of the matrix or particular matrix structure such as symmetry and cannot be combined as-is with neural networks \citep{baldi1989neural}. In this paper, we derive an algorithm that learns a principal subspace from sample entries, can be applied when the approximate subspace is represented by a neural network, and hence can be scaled to datasets with an effectively infinite number of rows and columns. Our method consists in defining a loss function whose minimizer is the desired principal subspace, and constructing a gradient estimate of this loss whose bias can be controlled. We complement our theoretical analysis with a series of experiments on synthetic matrices, the MNIST dataset \citep{lecun2010mnist} and the reinforcement learning domain PuddleWorld \citep{sutton1995generalization} demonstrating the usefulness of our approach.

It is common to address the curse of dimensionality in Markov decision processes (MDPs) by exploiting low-rank representations. This motivates much of the recent theoretical study on linear MDPs. However, most approaches require a given representation under unrealistic assumptions about the normalization of the decomposition or introduce unresolved computational challenges in practice. Instead, we consider an alternative definition of linear MDPs that automatically ensures normalization while allowing efficient representation learning via contrastive estimation. The framework also admits confidence-adjusted index algorithms, enabling an efficient and principled approach to incorporating optimism or pessimism in the face of uncertainty. To the best of our knowledge, this provides the first practical representation learning method for linear MDPs that achieves both strong theoretical guarantees and empirical performance. Theoretically, we prove that the proposed algorithm is sample efficient in both the online and offline settings. Empirically, we demonstrate superior performance over existing state-of-the-art model-based and model-free algorithms on several benchmarks.

Recommender systems play a fundamental role in web applications in filtering massive information and matching user interests. While many efforts have been devoted to developing more effective models in various scenarios, the exploration on the explainability of recommender systems is running behind. Explanations could help improve user experience and discover system defects. In this paper, after formally introducing the elements that are related to model explainability, we propose a novel explainable recommendation model through improving the transparency of the representation learning process. Specifically, to overcome the representation entangling problem in traditional models, we revise traditional graph convolution to discriminate information from different layers. Also, each representation vector is factorized into several segments, where each segment relates to one semantic aspect in data. Different from previous work, in our model, factor discovery and representation learning are simultaneously conducted, and we are able to handle extra attribute information and knowledge. In this way, the proposed model can learn interpretable and meaningful representations for users and items. Unlike traditional methods that need to make a trade-off between explainability and effectiveness, the performance of our proposed explainable model is not negatively affected after considering explainability. Finally, comprehensive experiments are conducted to validate the performance of our model as well as explanation faithfulness.

Properly handling missing data is a fundamental challenge in recommendation. Most present works perform negative sampling from unobserved data to supply the training of recommender models with negative signals. Nevertheless, existing negative sampling strategies, either static or adaptive ones, are insufficient to yield high-quality negative samples --- both informative to model training and reflective of user real needs. In this work, we hypothesize that item knowledge graph (KG), which provides rich relations among items and KG entities, could be useful to infer informative and factual negative samples. Towards this end, we develop a new negative sampling model, Knowledge Graph Policy Network (KGPolicy), which works as a reinforcement learning agent to explore high-quality negatives. Specifically, by conducting our designed exploration operations, it navigates from the target positive interaction, adaptively receives knowledge-aware negative signals, and ultimately yields a potential negative item to train the recommender. We tested on a matrix factorization (MF) model equipped with KGPolicy, and it achieves significant improvements over both state-of-the-art sampling methods like DNS and IRGAN, and KG-enhanced recommender models like KGAT. Further analyses from different angles provide insights of knowledge-aware sampling. We release the codes and datasets at //github.com/xiangwang1223/kgpolicy.

Recommender systems play a crucial role in mitigating the problem of information overload by suggesting users' personalized items or services. The vast majority of traditional recommender systems consider the recommendation procedure as a static process and make recommendations following a fixed strategy. In this paper, we propose a novel recommender system with the capability of continuously improving its strategies during the interactions with users. We model the sequential interactions between users and a recommender system as a Markov Decision Process (MDP) and leverage Reinforcement Learning (RL) to automatically learn the optimal strategies via recommending trial-and-error items and receiving reinforcements of these items from users' feedbacks. In particular, we introduce an online user-agent interacting environment simulator, which can pre-train and evaluate model parameters offline before applying the model online. Moreover, we validate the importance of list-wise recommendations during the interactions between users and agent, and develop a novel approach to incorporate them into the proposed framework LIRD for list-wide recommendations. The experimental results based on a real-world e-commerce dataset demonstrate the effectiveness of the proposed framework.

北京阿比特科技有限公司