
We consider a bilevel continuous knapsack problem where the leader controls the capacity of the knapsack, while the follower chooses a feasible packing maximizing his own profit. The leader's aim is to optimize a linear objective function in the capacity and in the follower's solution, but with respect to different item values. We address a stochastic version of this problem where the follower's profits are uncertain from the leader's perspective, and only a probability distribution is known. Assuming that the leader aims at optimizing the expected value of her objective function, we first observe that the stochastic problem is tractable as long as the possible scenarios are given explicitly as part of the input; this also makes it possible to deal with general distributions using a sample average approximation. For the case of independently and uniformly distributed item values, we show that the problem is #P-hard in general, and the same is true even for evaluating the leader's objective function. Nevertheless, we present pseudo-polynomial time algorithms for this case, running in time linear in the total size of the items. Based on this, we derive an additive approximation scheme for the general case of independently distributed item values, which runs in pseudo-polynomial time.
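
To make the setting concrete, the following minimal sketch (with illustrative names and an assumed linear form of the leader's objective, not taken from the paper) implements the follower's continuous-knapsack response by greedy fractional packing and evaluates the leader's expected objective by a sample average approximation over sampled follower profits.

```python
import numpy as np

def follower_packing(sizes, follower_profits, capacity):
    """Continuous-knapsack follower: pack items greedily by profit density
    (fractional packing is optimal for the follower's continuous problem,
    assuming nonnegative profits)."""
    x = np.zeros(len(sizes))
    remaining = capacity
    for i in np.argsort(-follower_profits / sizes):   # best profit per unit size first
        if remaining <= 0 or follower_profits[i] <= 0:
            break
        x[i] = min(1.0, remaining / sizes[i])
        remaining -= x[i] * sizes[i]
    return x

def expected_leader_value(sizes, leader_values, capacity_cost, capacity,
                          profit_sampler, n_samples=1000, seed=0):
    """Sample average approximation of the leader's expected objective, modeled
    here (as an assumption for illustration) as the leader's value of the
    follower's packing minus a linear cost of the installed capacity."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_samples):
        x = follower_packing(sizes, profit_sampler(rng), capacity)
        total += leader_values @ x - capacity_cost * capacity
    return total / n_samples

# Toy instance with independently and uniformly distributed follower profits.
sizes = np.array([3.0, 2.0, 5.0])
leader_values = np.array([4.0, -1.0, 2.0])
sampler = lambda rng: rng.uniform([1.0, 0.5, 2.0], [2.0, 3.0, 4.0])
print(expected_leader_value(sizes, leader_values, 0.1, capacity=6.0,
                            profit_sampler=sampler))
```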

Related Content

A new feature that lets iOS 8 and OS X Yosemite hand off to each other seamlessly. > Apple products have always been designed to work together beautifully. But now they may really surprise you. With iOS 8 and OS X Yosemite, you’ll be able to do more wonderful things than ever before.


We propose an algorithm that uses linear function approximation (LFA) for stochastic shortest path (SSP). Under minimal assumptions, it obtains sublinear regret, is computationally efficient, and uses stationary policies. To our knowledge, this is the first such algorithm in the LFA literature (for SSP or other formulations). Our algorithm is a special case of a more general one, which achieves regret scaling with the square root of the number of episodes, given access to a certain computation oracle.

In real-life situations, paired comparisons are often presented in which the alternatives are either full profiles or partial profiles, the latter used to mitigate cognitive burden. For this situation, the problem of finding optimal designs is considered in the presence of second-order interactions when all attributes have a common general number of levels.

We introduce a new algorithm for expected log-likelihood maximization in situations where the objective function is multi-modal and/or has saddle points, which we term G-PFSO. The key idea underpinning G-PFSO is to define a sequence of probability distributions which (a) is shown to concentrate on the target parameter value and (b) can be efficiently estimated by means of a standard particle filter algorithm. These distributions depend on a learning rate: the faster the learning rate, the faster the distributions concentrate on the desired parameter value, but the weaker the ability of G-PFSO to escape from a local optimum of the objective function. To reconcile the ability to escape from a local optimum with a fast convergence rate, the proposed estimator exploits the acceleration property of averaging, well known in the stochastic gradient literature. Based on challenging estimation problems, our numerical experiments suggest that the estimator introduced in this paper converges at the optimal rate, and they illustrate the practical usefulness of G-PFSO for parameter inference in large datasets. While the focus of this work is expected log-likelihood maximization, the proposed approach and its theory apply more generally to optimizing a function defined through an expectation.
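
The sketch below is a rough, generic illustration of particle-filter-based stochastic optimization with a decaying learning rate and averaging; it is not G-PFSO's exact sequence of distributions or update rules, and all names and constants are illustrative.

```python
import numpy as np

def pf_loglik_maximizer(data, loglik, theta0, n_particles=200,
                        alpha=0.7, jitter0=0.5, seed=0):
    """Generic particle-filter stochastic optimizer (illustrative, not G-PFSO's
    exact updates): particles over the parameter are reweighted by the
    log-likelihood of each new observation, perturbed with a jitter that decays
    like t**(-alpha) (the learning rate), and the running average of the filter
    mean is returned (Polyak-style averaging)."""
    rng = np.random.default_rng(seed)
    theta = theta0 + jitter0 * rng.standard_normal((n_particles, len(theta0)))
    running_mean = np.zeros(len(theta0))
    for t, y in enumerate(data, start=1):
        logw = np.array([loglik(y, th) for th in theta])
        w = np.exp(logw - logw.max())
        w /= w.sum()
        idx = rng.choice(n_particles, size=n_particles, p=w)     # resample
        step = jitter0 * t ** (-alpha)                           # decaying learning rate
        theta = theta[idx] + step * rng.standard_normal(theta.shape)
        running_mean += (theta.mean(axis=0) - running_mean) / t  # averaging
    return running_mean

# Toy usage: recover the mean of a Gaussian from 500 observations.
rng = np.random.default_rng(1)
data = rng.normal(2.0, 1.0, size=500)
print(pf_loglik_maximizer(data, lambda y, th: -0.5 * (y - th[0]) ** 2,
                          theta0=np.array([0.0])))
```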

We study Tikhonov regularization for possibly nonlinear inverse problems with weighted $\ell^1$-penalization. The forward operator, mapping from a sequence space to an arbitrary Banach space, typically an $L^2$-space, is assumed to satisfy a two-sided Lipschitz condition with respect to a weighted $\ell^2$-norm and the norm of the image space. We show that in this setting approximation rates of arbitrarily high H\"older-type order in the regularization parameter can be achieved, and we characterize maximal subspaces of sequences on which these rates are attained. On these subspaces the method also converges with optimal rates in terms of the noise level with the discrepancy principle as parameter choice rule. Our analysis includes the case that the penalty term is not finite at the exact solution ('oversmoothing'). As a standard example we discuss wavelet regularization in Besov spaces $B^r_{1,1}$. In this setting we demonstrate in numerical simulations for a parameter identification problem in a differential equation that our theoretical results correctly predict improved rates of convergence for piecewise smooth unknown coefficients.
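
As a concrete (and much simplified) illustration of weighted $\ell^1$-penalized Tikhonov regularization, the sketch below solves the problem for a *linear* forward operator by proximal gradient descent with weighted soft-thresholding; the nonlinear, oversmoothing and Besov-space aspects of the abstract are not reproduced, and all parameter choices are illustrative.

```python
import numpy as np

def weighted_soft_threshold(x, tau):
    """Proximal operator of the weighted l1 penalty sum_i tau_i * |x_i|."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def weighted_l1_tikhonov(A, y, alpha, weights, n_iter=500):
    """Minimal sketch: min_x 0.5*||A x - y||^2 + alpha * sum_i weights_i * |x_i|
    solved by proximal gradient descent (ISTA) for a linear forward operator A."""
    L = np.linalg.norm(A, 2) ** 2               # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)
        x = weighted_soft_threshold(x - grad / L, alpha * weights / L)
    return x

# Toy usage: recover a sparse sequence from noisy underdetermined measurements.
rng = np.random.default_rng(0)
A = rng.standard_normal((40, 100))
x_true = np.zeros(100)
x_true[[3, 17, 42]] = [1.0, -2.0, 0.5]
y = A @ x_true + 0.01 * rng.standard_normal(40)
x_hat = weighted_l1_tikhonov(A, y, alpha=0.05, weights=np.ones(100))
print(np.round(x_hat[[3, 17, 42]], 2))
```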

Policy gradient methods have become popular in multi-agent reinforcement learning, but they suffer from high variance due to the presence of environmental stochasticity and exploring agents (i.e., non-stationarity), which is potentially worsened by the difficulty in credit assignment. As a result, there is a need for a method that is not only capable of efficiently solving these two problems but is also robust enough to solve a variety of tasks. To this end, we propose a new multi-agent policy gradient method, called Robust Local Advantage (ROLA) Actor-Critic. ROLA allows each agent to learn an individual action-value function as a local critic, while ameliorating environment non-stationarity via a novel centralized training approach based on a centralized critic. Using this local critic, each agent computes a baseline to reduce the variance of its policy gradient estimate, yielding an advantage value that is an expectation over the other agents' choices and implicitly improves credit assignment. We evaluate ROLA across diverse benchmarks and show its robustness and effectiveness over a number of state-of-the-art multi-agent policy gradient algorithms.
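
The fragment below sketches only the variance-reduction mechanism described above, i.e., a local critic providing a policy-expectation baseline for an advantage-weighted policy gradient; the softmax-over-linear-features policy, the shapes, and all names are hypothetical and do not reproduce ROLA's actual architecture or centralized training.

```python
import numpy as np

def local_advantage_grad(q_values, policy_probs, action, state_feats):
    """Hedged sketch: an agent's local critic supplies Q_i(s, a) for every
    action; the baseline is the expectation of these values under the agent's
    own policy, and the policy gradient is weighted by the resulting advantage."""
    baseline = policy_probs @ q_values                # b(s) = E_{a~pi_i}[Q_i(s,a)]
    advantage = q_values[action] - baseline           # A_i(s,a) = Q_i(s,a) - b(s)
    # gradient of log pi(a|s) for a softmax policy with linear features phi(s)
    grad_log_pi = np.outer(state_feats,
                           np.eye(len(policy_probs))[action] - policy_probs)
    return advantage * grad_log_pi

# Toy usage with a hypothetical 3-action local critic and 4-dim state features.
g = local_advantage_grad(q_values=np.array([1.0, 0.5, -0.2]),
                         policy_probs=np.array([0.5, 0.3, 0.2]),
                         action=0, state_feats=np.ones(4))
print(g.shape)  # (4, 3): one gradient column per action logit
```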

Quasi-Newton methods generally provide curvature information by approximating the Hessian using the secant equation. However, the secant equation becomes less effective in approximating the Newton step because it relies only on first-order derivatives. In this study, we propose an approximate Newton step-based stochastic optimization algorithm for large-scale empirical risk minimization of convex functions with linear convergence rates. Specifically, we compute a partial column Hessian of size ($d\times k$) with $k\ll d$ randomly selected variables, and then use the \textit{Nystr\"om method} to better approximate the full Hessian matrix. To further reduce the computational complexity per iteration, we directly compute the update step ($\Delta\boldsymbol{w}$) without computing and storing the full Hessian or its inverse. Furthermore, to address large-scale scenarios in which even computing a partial Hessian may require significant time, we use distribution-preserving (DP) sub-sampling to compute the partial Hessian. DP sub-sampling generates $p$ sub-samples with similar first- and second-order distribution statistics and selects a single sub-sample at each epoch in a round-robin manner to compute the partial Hessian. We integrate our approximated Hessian with stochastic gradient descent and stochastic variance-reduced gradients to solve the logistic regression problem. The numerical experiments show that the proposed approach obtains a better approximation of Newton's method, with performance competitive with state-of-the-art first-order and stochastic quasi-Newton methods.
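
A hedged sketch of the core idea for logistic regression: build a Nyström approximation of the Hessian from $k$ randomly selected Hessian columns and compute the update step via the Woodbury identity, without forming the full Hessian. The damping parameter, the plain Newton-style loop, and the omission of SGD/SVRG integration and DP sub-sampling are simplifications for illustration.

```python
import numpy as np

def nystrom_newton_step(X, y, w, k=20, rho=1.0, seed=0):
    """Illustrative approximate Newton step for logistic regression using a
    Nystrom approximation built from k randomly chosen Hessian columns; the
    full d x d Hessian is never formed."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    p = 1.0 / (1.0 + np.exp(-X @ w))
    g = X.T @ (p - y) / n                        # gradient of the logistic loss
    s = p * (1.0 - p) / n                        # per-sample curvature weights
    idx = rng.choice(d, size=k, replace=False)   # k randomly selected variables
    C = (X * s[:, None]).T @ X[:, idx]           # partial column Hessian, d x k
    W = C[idx, :]                                # k x k core block
    # Step for H ~ C W^{-1} C^T + rho*I, computed via the Woodbury identity.
    M = rho * W + C.T @ C
    return -(g - C @ np.linalg.solve(M, C.T @ g)) / rho

# Toy usage on synthetic data.
rng = np.random.default_rng(1)
X = rng.standard_normal((500, 50))
y = (X @ rng.standard_normal(50) > 0).astype(float)
w = np.zeros(50)
for _ in range(20):
    w += nystrom_newton_step(X, y, w)
print(round(np.mean((X @ w > 0) == y), 3))       # training accuracy
```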

Reward-Weighted Regression (RWR) belongs to a family of widely known iterative Reinforcement Learning algorithms based on the Expectation-Maximization framework. In this family, learning at each iteration consists of sampling a batch of trajectories using the current policy and fitting a new policy to maximize a return-weighted log-likelihood of actions. Although RWR is known to yield monotonic improvement of the policy under certain circumstances, whether and under which conditions RWR converges to the optimal policy have remained open questions. In this paper, we provide for the first time a proof that RWR converges to a global optimum when no function approximation is used, in a general compact setting. Furthermore, for the simpler case with finite state and action spaces we prove R-linear convergence of the state-value function to the optimum.
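
For intuition, here is a minimal sketch of the tabular, no-function-approximation case: each iteration samples a batch of episodes with the current policy and sets the new policy proportional to return-weighted action frequencies, which is the closed-form maximizer of the return-weighted log-likelihood (the toy environment and all names are illustrative, assuming nonnegative returns).

```python
import numpy as np

def rwr_tabular(n_states, n_actions, sample_episode, n_iters=50,
                batch_size=100, seed=0):
    """Tabular RWR sketch: the new policy is proportional to the
    return-weighted visit counts of each (state, action) pair."""
    rng = np.random.default_rng(seed)
    policy = np.full((n_states, n_actions), 1.0 / n_actions)
    for _ in range(n_iters):
        counts = np.zeros((n_states, n_actions))
        for _ in range(batch_size):
            states, actions, ret = sample_episode(policy, rng)
            for s, a in zip(states, actions):
                counts[s, a] += ret                   # weight visits by the return
        totals = counts.sum(axis=1, keepdims=True)
        visited = totals[:, 0] > 0
        policy[visited] = counts[visited] / totals[visited]
    return policy

# Hypothetical two-state chain where action 1 always yields reward 1.
def sample_episode(policy, rng):
    states, actions, ret, s = [], [], 0.0, 0
    for _ in range(5):
        a = rng.choice(2, p=policy[s])
        states.append(s)
        actions.append(a)
        ret += float(a == 1)
        s = (s + 1) % 2
    return states, actions, ret

print(np.round(rwr_tabular(2, 2, sample_episode), 2))
```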

In this paper, we consider the problem of black-box optimization using Gaussian Process (GP) bandit optimization with a small number of batches. Assuming the unknown function has a low norm in the Reproducing Kernel Hilbert Space (RKHS), we introduce a batch algorithm inspired by batched finite-arm bandit algorithms, and show that it achieves the cumulative regret upper bound $O^\ast(\sqrt{T\gamma_T})$ using $O(\log\log T)$ batches within time horizon $T$, where the $O^\ast(\cdot)$ notation hides dimension-independent logarithmic factors and $\gamma_T$ is the maximum information gain associated with the kernel. This bound is near-optimal for several kernels of interest and improves on the typical $O^\ast(\sqrt{T}\gamma_T)$ bound, and our approach is arguably the simplest among algorithms attaining this improvement. In addition, in the case of a constant number of batches (not depending on $T$), we propose a modified version of our algorithm, and characterize how the regret is impacted by the number of batches, focusing on the squared exponential and Mat\'ern kernels. The algorithmic upper bounds are shown to be nearly minimax optimal via analogous algorithm-independent lower bounds.
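
A hedged sketch in the spirit of the batched, elimination-style approach described above (constants, kernel choice, and schedules are illustrative and not the paper's): within each batch, points are drawn from the surviving candidate set; between batches the GP posterior is refit and candidates whose upper confidence bound falls below the best lower confidence bound are discarded.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def batched_gp_elimination(f, candidates, n_batches=4, batch_size=50,
                           beta=2.0, noise=0.1, seed=0):
    """Batch GP optimization sketch: uniform queries within each batch over the
    surviving candidates, UCB/LCB elimination between batches."""
    rng = np.random.default_rng(seed)
    X_obs, y_obs = [], []
    alive = np.ones(len(candidates), dtype=bool)
    for _ in range(n_batches):
        for i in rng.choice(np.flatnonzero(alive), size=batch_size):
            X_obs.append(candidates[i])
            y_obs.append(f(candidates[i]) + noise * rng.standard_normal())
        gp = GaussianProcessRegressor(kernel=RBF(0.2), alpha=noise ** 2)
        gp.fit(np.array(X_obs), np.array(y_obs))
        mu, sd = gp.predict(candidates, return_std=True)
        ucb, lcb = mu + beta * sd, mu - beta * sd
        alive &= ucb >= lcb[alive].max()             # discard dominated candidates
    best = np.flatnonzero(alive)[np.argmax(mu[alive])]
    return candidates[best]

# Toy 1-D usage on a grid of candidate points.
grid = np.linspace(0.0, 1.0, 200).reshape(-1, 1)
print(batched_gp_elimination(lambda x: np.sin(6.0 * x[0]), grid))
```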

In the simplest formulation of the knapsack problem, one seeks to maximize the total value of a collection of objects such that the total weight remains below a certain limit. In this work, we move from computer science to physics and formulate the knapsack problem as a statistical physics system and compute the corresponding partition function. We approximate the result in the large number limit and from this approximation develop a new algorithm for the problem. We compare the performance of this algorithm to that of other approximation algorithms, finding that the new algorithm is faster than most of these approaches while still retaining high accuracy. From its speed and accuracy relationship, we argue that the algorithm is a manifestation of a greedy algorithm. We conclude by discussing ways to extend the formalism to make its underlying heuristics more rigorous or to apply the approach to other combinatorial optimization problems. In all, this work exists at the intersection between computer science and statistical physics and represents a new analytical approach to solving the problems in the former using methods of the latter.
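
As a toy illustration of the statistical-physics formulation (not the paper's large-$N$ approximation or the derived algorithm), the following brute-force sketch computes the knapsack partition function and the thermal occupation numbers for a small instance; items with occupation close to one suggest a good packing.

```python
import numpy as np
from itertools import product

def knapsack_partition_function(values, weights, limit, beta=1.0):
    """Brute-force partition function Z = sum over feasible packings of
    exp(beta * total value), plus the thermal average occupation <x_i> of each
    item.  Exponential in the number of items, so usable only on toy instances."""
    Z, occupation = 0.0, np.zeros(len(values))
    for x in product((0, 1), repeat=len(values)):
        x = np.array(x)
        if weights @ x <= limit:                 # feasibility (hard constraint)
            w = np.exp(beta * (values @ x))      # Boltzmann weight of the packing
            Z += w
            occupation += w * x
    return Z, occupation / Z

values = np.array([6.0, 5.0, 4.0, 1.0])
weights = np.array([3.0, 2.0, 2.0, 1.0])
Z, occ = knapsack_partition_function(values, weights, limit=5.0, beta=3.0)
print(Z, np.round(occ, 2))  # high-occupation items indicate a promising packing
```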

Stochastic gradient Markov chain Monte Carlo (SGMCMC) has become a popular method for scalable Bayesian inference. These methods are based on sampling a discrete-time approximation to a continuous-time process, such as the Langevin diffusion. When applied to distributions defined on a constrained space, such as the simplex, the time-discretisation error can dominate when we are near the boundary of the space. We demonstrate that while current SGMCMC methods for the simplex perform well in certain cases, they struggle with sparse simplex spaces, i.e., when many of the components are close to zero. However, most popular large-scale applications of Bayesian inference on simplex spaces, such as network or topic models, are sparse. We argue that this poor performance is due to the biases of SGMCMC caused by the discretization error. To get around this, we propose the stochastic CIR process, which removes all discretization error, and we prove that samples from the stochastic CIR process are asymptotically unbiased. Using the stochastic CIR process within an SGMCMC algorithm is shown to give substantially better performance for a topic model and a Dirichlet process mixture model than existing SGMCMC approaches.
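
The snippet below illustrates the discretisation-free ingredient: the exact transition of the CIR process is a scaled noncentral chi-squared, so the process can be simulated without Euler error, and normalising independent chains gives simplex samples. This is a hedged illustration only, not the paper's full SGMCMC scheme (which replaces the Gamma shape parameter with stochastic minibatch estimates).

```python
import numpy as np

def cir_exact_transition(x, a, h, rng):
    """Exact (discretisation-free) transition over a step of length h of the CIR
    process dX = (a - X) dt + sqrt(2 X) dW, whose stationary distribution is
    Gamma(a, 1).  The transition is a scaled noncentral chi-squared."""
    c = 0.5 * (1.0 - np.exp(-h))
    df = 2.0 * a
    nonc = x * np.exp(-h) / c
    return c * rng.noncentral_chisquare(df, nonc)

# Illustration: run independent CIR chains targeting Gamma(alpha_k, 1) and
# normalise to obtain Dirichlet(alpha) samples on the simplex, including sparse
# components with alpha_k close to zero.
rng = np.random.default_rng(0)
alpha = np.array([0.05, 0.05, 5.0])          # sparse Dirichlet parameters
x = np.ones_like(alpha)
samples = []
for _ in range(5000):
    x = cir_exact_transition(x, alpha, h=0.1, rng=rng)
    samples.append(x / x.sum())
print(np.mean(samples, axis=0), alpha / alpha.sum())  # empirical vs. exact mean
```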
