
We study the finite sample behavior of Lasso-based inference methods such as post double Lasso and debiased Lasso. We show that these methods can exhibit substantial omitted variable biases (OVBs) due to Lasso not selecting relevant controls. This phenomenon can occur even when the coefficients are sparse and the sample size is large, indeed larger than the number of controls. Therefore, relying on the existing asymptotic inference theory can be problematic in empirical applications. We compare the Lasso-based inference methods to modern high-dimensional OLS-based methods and provide practical guidance.
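
As a rough illustration of the procedure under discussion, the following sketch implements post double Lasso on synthetic data. The data-generating process, the use of scikit-learn's LassoCV, and the cross-validated penalty choice are our own assumptions, not the paper's setup:

```python
# Minimal post double Lasso sketch on synthetic data. The penalty is chosen
# by cross-validation here; theory-driven penalty rules are also common.
import numpy as np
from sklearn.linear_model import LassoCV, LinearRegression

rng = np.random.default_rng(0)
n, p = 500, 200
X = rng.normal(size=(n, p))                        # controls
d = 0.5 * X[:, 0] + rng.normal(size=n)             # treatment, confounded by X[:, 0]
y = 1.0 * d + 0.5 * X[:, 0] + rng.normal(size=n)   # true effect = 1.0

# Step 1: Lasso of y on the controls; record the selected set.
sel_y = np.flatnonzero(LassoCV(cv=5).fit(X, y).coef_)
# Step 2: Lasso of d on the controls; record the selected set.
sel_d = np.flatnonzero(LassoCV(cv=5).fit(X, d).coef_)
# Step 3: OLS of y on d plus the union of the two selected sets.
union = np.union1d(sel_y, sel_d)
Z = np.column_stack([d, X[:, union]])
print(f"estimate: {LinearRegression().fit(Z, y).coef_[0]:.3f}")
```

The OVB mechanism the abstract describes arises exactly when both Lasso steps fail to select a relevant confounder (here X[:, 0]), so that the final OLS regression omits it.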

Related Content

We formulate natural gradient variational inference (VI), expectation propagation (EP), and posterior linearisation (PL) as extensions of Newton's method for optimising the parameters of a Bayesian posterior distribution. This viewpoint explicitly casts inference algorithms within the framework of numerical optimisation. We show that common approximations to Newton's method from the optimisation literature, namely Gauss-Newton and quasi-Newton methods (e.g., the BFGS algorithm), are still valid under this 'Bayes-Newton' framework. This leads to a suite of novel algorithms that are guaranteed to result in positive semi-definite covariance matrices, unlike standard VI and EP. Our unifying viewpoint provides new insights into the connections between various inference schemes. All the presented methods apply to any model with a Gaussian prior and non-conjugate likelihood, which we demonstrate with (sparse) Gaussian processes and state space models.
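
A one-dimensional sketch of the positive semi-definiteness issue: with a Gaussian prior and a non-log-concave Student-t likelihood, the exact Newton precision can go negative, while the Gauss-Newton approximation stays positive. All numbers below (prior variance, degrees of freedom, the single observation) are illustrative; the paper's algorithms operate on full (sparse) GPs and state space models.

```python
# Exact Newton vs. Gauss-Newton curvature for one Gaussian-prior latent
# variable with a Student-t observation (non-log-concave likelihood).
import numpy as np

nu, sigma, y = 3.0, 1.0, 6.0          # dof, scale, one outlying observation
prior_mean, prior_var = 0.0, 25.0     # weak Gaussian prior

def dloglik(f):                       # d/df log Student-t(y - f)
    r = y - f
    return (nu + 1) * r / (nu * sigma**2 + r**2)

def d2loglik(f):                      # exact Hessian; positive when |y - f| is large
    r = y - f
    return (nu + 1) * (r**2 - nu * sigma**2) / (nu * sigma**2 + r**2) ** 2

m = prior_mean
lam_newton = 1.0 / prior_var - d2loglik(m)   # exact Newton precision: negative here
V = nu * sigma**2 / (nu - 2)                 # Student-t variance (nu > 2)
lam_gn = 1.0 / prior_var + 1.0 / V           # Gauss-Newton precision: always > 0
grad = -(m - prior_mean) / prior_var + dloglik(m)
m_new = m + grad / lam_gn                    # well-defined Gauss-Newton step
print(f"newton: {lam_newton:.3f}  gauss-newton: {lam_gn:.3f}  new mean: {m_new:.3f}")
```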

We consider the problem of online linear regression in the stochastic setting. We derive high probability regret bounds for online ridge regression and the forward algorithm. This enables us to compare online regression algorithms more accurately and eliminate assumptions of bounded observations and predictions. Our study advocates for the use of the forward algorithm in lieu of ridge due to its sharper bounds and robustness to the regularization parameter. Moreover, we explain how to integrate it in algorithms involving linear function approximation to remove a boundedness assumption without deteriorating theoretical bounds. We showcase this modification in linear bandit settings, where it yields improved regret bounds. Last, we provide numerical experiments to illustrate our results and support our intuitions.
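
The contrast between the two algorithms is a one-line difference, shown in the sketch below (the toy Gaussian data and the regularization value are our own choices): the forward algorithm folds the current covariate into the Gram matrix before predicting.

```python
# Online ridge vs. the forward (Vovk-Azoury-Warmuth) algorithm on toy data.
import numpy as np

rng = np.random.default_rng(1)
d, T, lam = 5, 1000, 1.0
theta_star = rng.normal(size=d)

A = lam * np.eye(d)                  # regularized Gram matrix
b = np.zeros(d)                      # running sum of y_s * x_s
ridge_loss = fwd_loss = 0.0
for t in range(T):
    x = rng.normal(size=d)
    y = x @ theta_star + rng.normal()
    ridge_loss += (y - x @ np.linalg.solve(A, b)) ** 2                 # ignores x_t
    fwd_loss += (y - x @ np.linalg.solve(A + np.outer(x, x), b)) ** 2  # includes x_t
    A += np.outer(x, x)
    b += y * x
print(f"ridge: {ridge_loss:.1f}  forward: {fwd_loss:.1f}")
```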

Momentum methods have been shown, in both practice and theory, to accelerate the convergence of the standard gradient descent algorithm. In particular, minibatch-based gradient descent methods with momentum (MGDM) are widely used to solve large-scale optimization problems with massive datasets. Despite the success of the MGDM methods in practice, their theoretical properties are still underexplored. To this end, we investigate the theoretical properties of MGDM methods in the context of linear regression models. We first study the numerical convergence properties of the MGDM algorithm and derive the theoretically optimal tuning-parameter specification for achieving a faster convergence rate. In addition, we explore the relationship between the statistical properties of the resulting MGDM estimator and the tuning parameters. Based on these theoretical findings, we give conditions under which the resulting estimator achieves optimal statistical efficiency. Finally, extensive numerical experiments are conducted to verify our theoretical results.
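
For concreteness, a minimal MGDM loop on a linear regression model follows; the step size, momentum weight, and batch size are illustrative defaults, not the theoretically optimal specification the paper derives.

```python
# Minibatch gradient descent with heavy-ball momentum for least squares.
import numpy as np

rng = np.random.default_rng(2)
n, p = 10_000, 20
X = rng.normal(size=(n, p))
beta_star = rng.normal(size=p)
y = X @ beta_star + rng.normal(size=n)

beta, v = np.zeros(p), np.zeros(p)      # iterate and momentum buffer
lr, gamma, batch = 0.05, 0.9, 256       # illustrative tuning parameters
for it in range(500):
    idx = rng.choice(n, size=batch, replace=False)
    grad = X[idx].T @ (X[idx] @ beta - y[idx]) / batch
    v = gamma * v - lr * grad           # heavy-ball momentum update
    beta = beta + v
print(f"error: {np.linalg.norm(beta - beta_star):.4f}")
```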

We propose a new, more general definition of extended probability measures. We study their properties and provide a behavioral interpretation. We use them in an inference procedure, whose environment is canonically represented by the probability space $(\Omega,\mathcal{F},P)$, when both $P$ and the composition of $\Omega$ are unknown. We develop an ex ante analysis -- taking place before the statistical analysis requiring knowledge of $\Omega$ -- in which we progressively learn the true composition of $\Omega$. We describe how to update extended probabilities in this setting, and introduce the concept of lower extended probabilities. We provide two examples in the fields of ecology and opinion dynamics.

A word is called closed if it has a prefix that is also its suffix and there are no internal occurrences of this prefix in the word. In this paper we study words that are rich in closed factors, i.e., that contain the maximal possible number of distinct closed factors. As the main result, we show that for finite words the maximal number of distinct closed factors in a word of length $n$ is asymptotically $\frac{n^2}{6}$. For infinite words, we show that there exist words such that each of their factors of length $n$ contains a quadratic number of distinct closed factors, with a uniformly bounded constant; we call such words infinite closed-rich. We provide several necessary and some sufficient conditions for a word to be infinite closed-rich. For example, we show that all linearly recurrent words are closed-rich. We provide a characterization of closed-rich words among Sturmian words. Some of the examples we provide rely on non-constructive methods.
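
The definitions translate directly into a brute-force check, sketched below; treating single letters as closed is the standard convention we adopt here.

```python
# Brute-force count of distinct closed factors of a finite word.
def is_closed(w: str) -> bool:
    n = len(w)
    if n <= 1:
        return True                       # convention: single letters are closed
    for k in range(1, n):                 # candidate border length
        # border of length k occurring only as prefix and as suffix
        if w[:k] == w[n - k:] and w.find(w[:k], 1) == n - k:
            return True
    return False

def count_closed_factors(w: str) -> int:
    factors = {w[i:j] for i in range(len(w)) for j in range(i + 1, len(w) + 1)}
    return sum(is_closed(f) for f in factors)

print(count_closed_factors("abaababaabaab"))   # prefix of the Fibonacci word
```

Since the Fibonacci word is linearly recurrent (hence closed-rich by the result above), its prefixes make convenient test inputs.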

Quantile regression is a powerful data analysis tool that accommodates heterogeneous covariate-response relationships. We find that by coupling the asymmetric Laplace working likelihood with appropriate shrinkage priors, we can deliver posterior inference that automatically adapts to possible sparsity in quantile regression analysis. After a suitable adjustment of the posterior variance, the posterior inference provides asymptotically valid inference under heterogeneity. Furthermore, the proposed approach leads to oracle asymptotic efficiency for the active (nonzero) quantile regression coefficients and super-efficiency for the non-active ones. By avoiding the need to pursue dichotomous variable selection, the Bayesian computational framework demonstrates desirable inference stability with respect to tuning parameter selection. Our work helps to reveal the value of Bayesian computational methods in frequentist inference for quantile regression.
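
The working likelihood at the heart of the approach is the asymmetric Laplace density built from the quantile check loss; a minimal sketch (with the scale sigma fixed for illustration) is:

```python
# Asymmetric Laplace working log-likelihood for quantile level tau.
import numpy as np

def check_loss(u, tau):
    """Check loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (u < 0))

def al_loglik(beta, X, y, tau, sigma=1.0):
    u = y - X @ beta
    return (np.log(tau * (1 - tau) / sigma) - check_loss(u, tau) / sigma).sum()
```

Adding a shrinkage log-prior to al_loglik gives the log-posterior that the sampler targets; the adjustment the abstract mentions then corrects the resulting posterior variance.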

In real-world decision making tasks, it is critical for data-driven reinforcement learning methods to be both stable and sample efficient. On-policy methods typically generate reliable policy improvement throughout training, while off-policy methods make more efficient use of data through sample reuse. In this work, we combine the theoretically supported stability benefits of on-policy algorithms with the sample efficiency of off-policy algorithms. We develop policy improvement guarantees that are suitable for the off-policy setting, and connect these bounds to the clipping mechanism used in Proximal Policy Optimization. This motivates an off-policy version of the popular algorithm that we call Generalized Proximal Policy Optimization with Sample Reuse. We demonstrate both theoretically and empirically that our algorithm delivers improved performance by effectively balancing the competing goals of stability and sample efficiency.
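
The clipping mechanism the bounds connect to is easy to state in code; the sketch below gives the standard clipped surrogate, and the generalized (sample-reuse) variant applies the same clip to data drawn from several recent behavior policies rather than only the latest one.

```python
# PPO-style clipped surrogate objective (to be maximized), plain NumPy.
import numpy as np

def clipped_surrogate(logp_new, logp_behavior, advantages, eps=0.2):
    ratio = np.exp(logp_new - logp_behavior)        # importance weights
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)  # limit policy movement
    return np.minimum(ratio * advantages, clipped * advantages).mean()
```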

The ratio of two Gaussians is useful in many contexts of statistical inference. We discuss statistically valid inference for the ratio estimator under Differential Privacy (DP). We use the delta method to derive the asymptotic distribution of the ratio estimator and use the Gaussian mechanism to provide $(\epsilon, \delta)$ privacy guarantees. As with many statistics, the quantities needed here can be rewritten as functions of sums, which are convenient to work with; in particular, the sensitivity of a sum is easy to obtain in the DP case. We focus on the coverage of 95\% confidence intervals (CIs). Our simulations show that the no-correction method, which ignores the noise mechanism, gives CIs that are too narrow to provide proper coverage for small samples. We propose two methods to mitigate the under-coverage issue, one based on Monte Carlo simulations and the other based on analytical correction. We show that the CIs of our methods have the right coverage with a proper privacy budget. In addition, our methods can handle weighted data, where the weights are fixed and bounded.
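
A sketch of the sum-based construction follows: privatize the two sums with the Gaussian mechanism, form the ratio, and widen the delta-method CI by the noise variance. The data bounds, the privacy-budget split, and the correction shown are illustrative; in a full implementation the variance terms would themselves need to be privatized.

```python
# DP ratio of means: noisy sums via the Gaussian mechanism + delta-method CI.
import numpy as np
from scipy import stats

def gaussian_mech(value, sensitivity, eps, delta, rng):
    sigma = sensitivity * np.sqrt(2 * np.log(1.25 / delta)) / eps
    return value + rng.normal(scale=sigma), sigma

rng = np.random.default_rng(3)
x = rng.uniform(1, 2, size=2000)   # bounded data => bounded sum sensitivity
y = rng.uniform(2, 3, size=2000)
n, eps, delta = len(x), 1.0, 1e-5

sx, sig_x = gaussian_mech(x.sum(), 2.0, eps / 2, delta / 2, rng)
sy, sig_y = gaussian_mech(y.sum(), 3.0, eps / 2, delta / 2, rng)
r_hat = sx / sy                    # private ratio-of-means estimator

# Delta-method variance: sampling part plus the DP-noise part.
# (Non-private moments are used here for brevity; they too would need noise.)
var_stat = (x.var() / y.mean()**2
            - 2 * x.mean() * np.cov(x, y)[0, 1] / y.mean()**3
            + x.mean()**2 * y.var() / y.mean()**4) / n
var_dp = (sig_x**2 + r_hat**2 * sig_y**2) / sy**2
half = stats.norm.ppf(0.975) * np.sqrt(var_stat + var_dp)
print(f"95% CI: [{r_hat - half:.4f}, {r_hat + half:.4f}]")
```

Dropping var_dp reproduces the under-coverage the abstract reports: the interval is then too narrow for the noise actually injected.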

We consider the exploration-exploitation trade-off in reinforcement learning and we show that an agent imbued with a risk-seeking utility function is able to explore efficiently, as measured by regret. The parameter that controls how risk-seeking the agent is can be optimized exactly, or annealed according to a schedule. We call the resulting algorithm K-learning and show that the corresponding K-values are optimistic for the expected Q-values at each state-action pair. The K-values induce a natural Boltzmann exploration policy for which the 'temperature' parameter is equal to the risk-seeking parameter. This policy achieves an expected regret bound of $\tilde O(L^{3/2} \sqrt{S A T})$, where $L$ is the time horizon, $S$ is the number of states, $A$ is the number of actions, and $T$ is the total number of elapsed time-steps. This bound is only a factor of $L$ larger than the established lower bound. K-learning can be interpreted as mirror descent in the policy space, and it is similar to other well-known methods in the literature, including Q-learning, soft-Q-learning, and maximum entropy policy gradient, and is closely related to optimism and count-based exploration methods. K-learning is simple to implement, as it only requires adding a bonus to the reward at each state-action pair and then solving a Bellman equation. We conclude with a numerical example demonstrating that K-learning is competitive with other state-of-the-art algorithms in practice.
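
A tabular sketch of the recipe the abstract describes (add a bonus to the reward, solve a soft Bellman equation, act via Boltzmann over the K-values); the bonus form and the temperature value are placeholders, not the paper's exact schedule:

```python
# Finite-horizon K-learning-style recursion and Boltzmann policy (tabular).
import numpy as np
from scipy.special import logsumexp

def k_values(P, R, bonus, tau, H):
    """P: (S, A, S) transitions, R and bonus: (S, A) arrays, horizon H."""
    S, A = R.shape
    K = np.zeros((H + 1, S, A))
    for h in range(H - 1, -1, -1):
        V = tau * logsumexp(K[h + 1] / tau, axis=1)   # soft value per state
        K[h] = R + bonus + P @ V                      # Bellman step with bonus
    return K

def boltzmann_policy(K_sa, tau):
    """Action distribution at one state; temperature = risk-seeking parameter."""
    w = np.exp((K_sa - K_sa.max()) / tau)
    return w / w.sum()
```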

We consider the task of learning the parameters of a single component of a mixture model when we are given side information about that component; we call this the "search problem" in mixture models. We would like to solve this with computational and sample complexity lower than those of solving the overall original problem, where one learns the parameters of all components. Our main contributions are the development of a simple but general model for the notion of side information, and a corresponding simple matrix-based algorithm for solving the search problem in this general setting. We then specialize this model and algorithm to four common scenarios: Gaussian mixture models, LDA topic models, subspace clustering, and mixed linear regression. For each of these we show that if (and only if) the side information is informative, we obtain parameter estimates with greater accuracy and improved computational complexity than existing moment-based mixture model algorithms (e.g., tensor methods). We also illustrate several natural ways one can obtain such side information, for specific problem instances. Our experiments on real data sets (NY Times, Yelp, BSDS500) further demonstrate the practicality of our algorithms, showing significant improvement in runtime and accuracy.
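
The paper's matrix-based algorithm is not reproduced here; as a loose illustration of the general idea that informative side information singles out one component, the sketch below weights a second-moment matrix by agreement with a side-information vector (all modeling choices are ours):

```python
# Toy Gaussian mixture: side info v is a noisy copy of component 1's mean;
# the top eigenvector of a v-weighted second-moment matrix recovers it.
import numpy as np

rng = np.random.default_rng(4)
d = 10
mu1, mu2 = rng.normal(size=d), rng.normal(size=d)
X = np.vstack([mu1 + 0.3 * rng.normal(size=(500, d)),
               mu2 + 0.3 * rng.normal(size=(500, d))])

v = mu1 + 0.5 * rng.normal(size=d)      # noisy side information
w = np.maximum(X @ v, 0)                # up-weight samples aligned with v
M = (X * w[:, None]).T @ X / w.sum()    # weighted second-moment matrix
top = np.linalg.eigh(M)[1][:, -1]       # top eigenvector
top *= np.sign(top @ v)                 # fix sign using the side info
cos = top @ mu1 / (np.linalg.norm(mu1) * np.linalg.norm(top))
print(f"cosine with target mean: {cos:.2f}")
```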
