
In the prophet secretary problem, $n$ values are drawn independently from known distributions and presented in a uniformly random order. A decision-maker must accept or reject each value when it is presented, and may accept at most $k$ values in total. The objective is to maximize the expected sum of accepted values. We analyze the performance of static threshold policies, which accept the first $k$ values exceeding a fixed threshold (or all such values, if fewer than $k$ exist). We show that an appropriate threshold guarantees $\gamma_k = 1 - e^{-k}k^k/k!$ times the value of the offline optimal solution. Note that $\gamma_1 = 1-1/e$, and by Stirling's approximation $\gamma_k \approx 1-1/\sqrt{2 \pi k}$. This is the best-known guarantee for the prophet secretary problem for all $k>1$, and it is tight for all $k$ within the class of static threshold policies. We provide two simple methods for setting the threshold. Our first method sets the threshold so that $k \cdot \gamma_k$ values are accepted in expectation, and achieves the optimal guarantee for all $k$. Our second method sets the threshold so that the expected number of values exceeding it equals $k$. This approach gives the optimal guarantee for $k > 4$, but sub-optimal guarantees for $k \le 4$. Our proofs use a new result on optimizing sums of independent Bernoulli random variables, which extends a classical result of Hoeffding (1956) and is likely to be of independent interest. Finally, we note that our methods for setting thresholds can be implemented under limited information about agents' values.
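
As a rough illustration of the second method, the sketch below (all instance parameters are hypothetical, not from the paper) simulates a static threshold policy on i.i.d. Exp(1) values, sets the threshold so that the expected number of values exceeding it equals $k$, and compares the policy's expected value to the offline optimum of taking the $k$ largest.

```python
import numpy as np

rng = np.random.default_rng(0)

def static_threshold_policy(values, tau, k):
    """Accept the first k values exceeding the fixed threshold tau."""
    accepted = []
    for v in values:  # values arrive in uniformly random order
        if v > tau and len(accepted) < k:
            accepted.append(v)
    return sum(accepted)

# Toy instance: n i.i.d. Exp(1) values, so E[#{i : X_i > tau}] = n * exp(-tau).
# The paper's second method sets tau so that this expectation equals k.
n, k = 50, 5
tau = np.log(n / k)  # solves n * exp(-tau) = k

alg_total, opt_total = 0.0, 0.0
for _ in range(10_000):
    x = rng.exponential(1.0, size=n)
    rng.shuffle(x)  # uniformly random arrival order
    alg_total += static_threshold_policy(x, tau, k)
    opt_total += np.sort(x)[-k:].sum()  # the prophet takes the k largest

print(f"empirical ratio: {alg_total / opt_total:.3f}")  # compare with gamma_5 ~ 0.825
```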

Related content

We show that 11-channel sorting networks have at least 35 comparators and that 12-channel sorting networks have at least 39 comparators. This positively settles the optimality of the corresponding sorting networks given in The Art of Computer Programming vol. 3 and closes the two smallest open instances of the Bose-Nelson sorting problem. We obtain these bounds by generalizing a result of Van Voorhis from sorting networks to a more general class of comparator networks. From this we derive a dynamic programming algorithm that computes the optimal size of a sorting network for a given number of channels. From an execution of this algorithm we construct a certificate containing a derivation of the corresponding lower size bound, which we check with a program formally verified in the Isabelle/HOL proof assistant.
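
For readers unfamiliar with comparator networks, here is a minimal sketch (not part of the paper's certificate machinery) that checks a candidate network via the classical zero-one principle: a comparator network sorts every input if and only if it sorts every 0-1 input. The 4-channel network below uses 5 comparators, the known optimal size for 4 channels.

```python
from itertools import product

def apply_network(network, xs):
    """Apply comparators (i, j) with i < j: min goes to channel i, max to channel j."""
    xs = list(xs)
    for i, j in network:
        if xs[i] > xs[j]:
            xs[i], xs[j] = xs[j], xs[i]
    return xs

def sorts_all(network, n):
    """Zero-one principle: check the network on all 2^n binary inputs."""
    return all(apply_network(network, bits) == sorted(bits)
               for bits in product((0, 1), repeat=n))

net4 = [(0, 1), (2, 3), (0, 2), (1, 3), (1, 2)]  # optimal 5-comparator network
print(sorts_all(net4, 4))  # True
```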

The combination of learning methods with Model Predictive Control (MPC) has attracted significant attention in the recent literature. The hope is to reduce the reliance of MPC schemes on accurate models, and to tap into fast-developing machine learning and reinforcement learning tools to exploit the growing amount of data available for many systems. In particular, the combination of reinforcement learning and MPC has been proposed as a viable and theoretically justified approach to introduce explainable, safe, and stable policies into reinforcement learning. However, a formal theory detailing how the safety and stability of an MPC-based policy can be maintained through the parameter updates delivered by the learning tools is still lacking. This paper addresses that gap. The theory is developed for the generic robust MPC case and applied in simulation to the robust tube-based linear MPC case, where it is fairly easy to deploy in practice. The paper focuses on reinforcement learning as the learning tool, but the theory applies to any learning method that updates the MPC parameters online.

Support vector machine (SVM) is a powerful classification method that has achieved great success in many fields. Since its performance can be seriously impaired by redundant covariates, model selection techniques are widely used for SVM with high-dimensional covariates. As an alternative to model selection, significant progress has been made in the area of model averaging in the past decades, yet no frequentist model averaging method has been considered for SVM. This work aims to fill the gap by proposing a frequentist model averaging procedure for SVM that selects the optimal weights by cross validation. Even when the number of covariates diverges at an exponential rate with the sample size, we show asymptotic optimality of the proposed method in the sense that the ratio of its hinge loss to the lowest possible loss converges to one. We also derive the convergence rate, which provides more insight into model averaging. Compared to model selection methods for SVM, which require the tedious but critical task of tuning parameter selection, the model averaging method avoids this task and shows promising performance in empirical studies.
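
A minimal sketch of the cross-validation weighting idea, under assumed ingredients (linear SVM candidates on nested covariate subsets, hinge loss evaluated on cross-validated decision scores, weights constrained to the probability simplex); the paper's actual procedure and theory may differ in details.

```python
import numpy as np
from scipy.optimize import minimize
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_predict
from sklearn.svm import LinearSVC

# Toy data; candidate models use nested covariate subsets (a hypothetical choice).
X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           random_state=0)
y_pm = 2 * y - 1  # hinge loss uses labels in {-1, +1}
subsets = [range(5), range(10), range(20)]

# Cross-validated decision scores for each candidate SVM.
scores = np.column_stack([
    cross_val_predict(LinearSVC(), X[:, list(s)], y, cv=5,
                      method="decision_function")
    for s in subsets
])

def cv_hinge(w):
    f = scores @ w  # weighted decision function
    return np.mean(np.maximum(0.0, 1.0 - y_pm * f))

# Minimize the CV hinge loss over the weight simplex.
m = scores.shape[1]
res = minimize(cv_hinge, np.full(m, 1 / m),
               bounds=[(0, 1)] * m,
               constraints={"type": "eq", "fun": lambda w: w.sum() - 1})
print("model-averaging weights:", np.round(res.x, 3))
```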

We present regret minimization algorithms for stochastic contextual MDPs under a minimum reachability assumption, using access to an offline least-squares regression oracle. We analyze three settings: one where the dynamics are known, one where the dynamics are unknown but independent of the context, and the most challenging setting, where the dynamics are unknown and context-dependent. For the latter, our algorithm obtains a regret bound of $ \tilde{O}\left( \max\{H,{1}/{p_{min}}\}H|S|^{3/2}\sqrt{|A|T\log(\max\{|\mathcal{F}|,|\mathcal{P}|\}/\delta)} \right)$ with probability $1-\delta$, where $\mathcal{P}$ and $\mathcal{F}$ are finite and realizable function classes used to approximate the dynamics and rewards respectively, $p_{min}$ is the minimum reachability parameter, $S$ is the set of states, $A$ the set of actions, $H$ the horizon, and $T$ the number of episodes. To our knowledge, ours is the first optimistic approach applied to contextual MDPs with general function approximation (i.e., without additional structural knowledge of the function class, such as linearity). In addition, we present a lower bound of $\Omega(\sqrt{T H |S| |A| \ln(|\mathcal{F}|/|S|)/\ln(|A|)})$ on the expected regret, which holds even when the dynamics are known.
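
To get a feel for how the terms trade off, the snippet below simply evaluates the leading expression of the upper bound at hypothetical instance sizes (ignoring constants and the logarithmic factors hidden by the $\tilde{O}$).

```python
import math

def regret_bound(H, p_min, S, A, T, F, P, delta):
    """Leading expression of the regret bound; S, A are the set sizes |S|, |A|,
    F, P the function-class sizes. Constants and log factors are omitted."""
    return (max(H, 1 / p_min) * H * S ** 1.5
            * math.sqrt(A * T * math.log(max(F, P) / delta)))

# Hypothetical instance sizes, just to see how the expression scales.
print(f"{regret_bound(H=10, p_min=0.05, S=20, A=5, T=10_000,
                      F=1e6, P=1e6, delta=0.01):.3e}")
```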

This work presents a new procedure for obtaining predictive distributions in the context of Gaussian process (GP) modeling, with a relaxation of the interpolation constraints outside some ranges of interest: the mean of the predictive distribution no longer necessarily interpolates observed values that lie outside the ranges of interest, but is simply constrained to remain outside them. This method, called relaxed Gaussian process (reGP) interpolation, provides better predictive distributions in the ranges of interest, especially when a stationarity assumption for the GP model is not appropriate. It can be viewed as a goal-oriented method and becomes particularly interesting in Bayesian optimization, for example for the minimization of an objective function, where good predictive distributions for low function values are important. When the expected improvement criterion and reGP are used to sequentially choose evaluation points, the convergence of the resulting optimization algorithm is theoretically guaranteed (provided that the function to be optimized lies in the reproducing kernel Hilbert space attached to the known covariance of the underlying Gaussian process). Experiments indicate that using reGP instead of stationary GP models in Bayesian optimization is beneficial.
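
For context, the expected improvement criterion that the paper pairs with reGP has a familiar closed form under a Gaussian predictive distribution; a minimal sketch follows (the means and standard deviations below are placeholders standing in for a reGP posterior).

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, f_best):
    """EI for minimization under a Gaussian predictive distribution N(mu, sigma^2):
    EI = (f_best - mu) * Phi(z) + sigma * phi(z), with z = (f_best - mu) / sigma."""
    sigma = np.maximum(sigma, 1e-12)  # guard against zero predictive variance
    z = (f_best - mu) / sigma
    return (f_best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

# Candidate points with hypothetical predictive means/stds; the next evaluation
# point would be the argmax of EI.
mu = np.array([0.2, -0.1, 0.4])
sigma = np.array([0.5, 0.1, 0.8])
print(expected_improvement(mu, sigma, f_best=0.0))
```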

In this paper, we study a sequential decision-making problem faced by e-commerce carriers: when to send out a vehicle from the central depot to serve customer requests, and in which order to provide the service, under the assumption that the times at which parcels arrive at the depot are stochastic and dynamic. The objective is to maximize the number of parcels that can be delivered during the service hours. We propose two reinforcement learning approaches for solving this problem, one based on policy function approximation (PFA) and the other on value function approximation (VFA). Both methods are combined with a look-ahead strategy in which future release dates are sampled in a Monte-Carlo fashion and a tailored batch approach is used to approximate the value of future states. Our PFA and VFA make good use of branch-and-cut-based exact methods to improve the quality of decisions. We also establish sufficient conditions for a partial characterization of the optimal policy and integrate them into PFA/VFA. In an empirical study based on 720 benchmark instances, we conduct a competitive analysis using upper bounds with perfect information, and we show that PFA and VFA greatly outperform two alternative myopic approaches. Overall, PFA provides the best solutions, while VFA (which benefits from a two-stage stochastic optimization model) achieves a better tradeoff between solution quality and computing time.
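
A highly simplified sketch of the Monte-Carlo look-ahead ingredient (every number and the service-capacity proxy below are hypothetical; the paper uses branch-and-cut-based exact routing rather than this crude proxy).

```python
import numpy as np

rng = np.random.default_rng(1)

def lookahead_value(now, dispatch_time, known_releases, horizon=480,
                    service_time=10, n_scenarios=500, rate=0.02):
    """Monte-Carlo look-ahead: sample future parcel release dates over
    (now, horizon) and average a crude proxy for the number of parcels
    served when the vehicle leaves the depot at dispatch_time."""
    capacity = int((horizon - dispatch_time) // service_time)  # routing proxy
    vals = []
    for _ in range(n_scenarios):
        n_future = rng.poisson(rate * (horizon - now))
        future = rng.uniform(now, horizon, size=n_future)
        ready = sum(r <= dispatch_time for r in known_releases)
        ready += int(np.sum(future <= dispatch_time))
        vals.append(min(ready, capacity))
    return float(np.mean(vals))

# Compare dispatching immediately vs waiting one hour, at minute 120.
known = [30, 45, 50, 90]
print("dispatch now :", lookahead_value(120, 120, known))
print("wait 60 min  :", lookahead_value(120, 180, known))
```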

We study the problem of learning nonparametric distributions in a finite mixture, and establish tight bounds on the sample complexity of learning the component distributions in such models. Namely, we are given i.i.d. samples from a pdf $f$ where $$ f=\sum_{i=1}^k w_i f_i, \quad\sum_{i=1}^k w_i=1, \quad w_i>0 $$ and we are interested in learning each component $f_i$. Without any assumptions on the $f_i$, this problem is ill-posed. In order to identify the components $f_i$, we assume that each $f_i$ can be written as a convolution of a Gaussian and a compactly supported density $\nu_i$ with $\text{supp}(\nu_i)\cap \text{supp}(\nu_j)=\emptyset$ for $i\neq j$. Our main result shows that $(\frac{1}{\varepsilon})^{\Omega(\log\log \frac{1}{\varepsilon})}$ samples are required for estimating each $f_i$. Unlike parametric mixtures, the difficulty does not arise from the order $k$ or from small weights $w_i$, and unlike nonparametric density estimation it does not arise from the curse of dimensionality, irregularity, or inhomogeneity. The proof relies on a fast rate of approximation by Gaussians, which may be of independent interest. To show this is tight, we also propose an algorithm that uses $(\frac{1}{\varepsilon})^{O(\log\log \frac{1}{\varepsilon})}$ samples to estimate each $f_i$. Unlike existing approaches to learning latent variable models based on moment-matching and tensor methods, our proof instead involves a delicate analysis of an ill-conditioned linear system via orthogonal functions. Combining these bounds, we conclude that the optimal sample complexity of this problem lies strictly between polynomial and exponential, which is not common in learning theory.
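
To see why this rate sits strictly between polynomial and exponential, note that $$ \left(\frac{1}{\varepsilon}\right)^{\log\log\frac{1}{\varepsilon}} = \exp\left(\log\frac{1}{\varepsilon}\cdot\log\log\frac{1}{\varepsilon}\right), $$ which grows faster than any fixed power of $1/\varepsilon$ (since $\log\log\frac{1}{\varepsilon}\to\infty$ as $\varepsilon\to 0$), yet slower than $\exp((1/\varepsilon)^{c})$ for every $c>0$.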

Suppose we are given an integer $k \leq n$ and $n$ boxes labeled $1,\ldots, n$ by an adversary, each containing a number chosen from an unknown distribution. We have to choose an order in which to sequentially open these boxes, and each time we open the next box in this order, we learn its number. If we reject the number in a box, the box cannot be recalled. Our goal is to accept the $k$ largest of these numbers, without necessarily opening all boxes. This is the free order multiple-choice secretary problem. Free order variants have been studied extensively for the secretary and prophet problems. Kesselheim, Kleinberg, and Niazadeh (KKN, STOC'15) initiated the study of randomness-efficient algorithms (with the cheapest order in terms of random bits used) for free order secretary problems. We present an algorithm for free order multiple-choice secretary that is simultaneously optimal in its competitive ratio and in the amount of randomness used: we construct a distribution on orders with optimal entropy $\Theta(\log\log n)$ such that a deterministic multiple-threshold algorithm is $1-O(\sqrt{\log k/k})$-competitive. This improves the previous best construction of KKN, whose competitive ratio is $1 - O(1/k^{1/3}) - o(1)$, in three ways: our competitive ratio is (near-)optimal for the multiple-choice secretary problem; it holds for exponentially larger values of $k$; and our algorithm is a simple deterministic multiple-threshold algorithm, while that of KKN is randomized. We also prove a corresponding lower bound on the entropy of optimal solutions for the multiple-choice secretary problem, matching the entropy of our algorithm; no such lower bound was previously known. We obtain our algorithmic results with a host of new techniques, with which we also significantly improve the previous results of KKN on constructing entropy-optimal distributions for the classic free order secretary problem.
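
To illustrate the flavor of threshold-based multiple-choice secretary algorithms, here is a generic two-phase rule for intuition only, not the entropy-optimal construction of the paper: observe the first half of the arrivals, set a threshold from the sample, then accept values above it.

```python
import numpy as np

rng = np.random.default_rng(2)

def two_phase_threshold(values, k):
    """Observe the first half, set the threshold at the k-th largest sample,
    then accept values above it, up to k acceptances (assumes k <= n // 2)."""
    n = len(values)
    sample, rest = values[: n // 2], values[n // 2 :]
    thr = np.sort(sample)[-k]
    picked = []
    for v in rest:
        if v > thr and len(picked) < k:
            picked.append(v)
    return picked

n, k = 1000, 50
ratios = []
for _ in range(200):
    vals = rng.random(n)  # unknown numbers, observed in random order
    alg = sum(two_phase_threshold(vals, k))
    opt = np.sort(vals)[-k:].sum()  # sum of the k largest overall
    ratios.append(alg / opt)
print(f"mean value ratio: {np.mean(ratios):.3f}")
```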

Stochastic rounding (SR) offers an alternative to the deterministic IEEE-754 floating-point rounding modes. In some applications, such as PDEs, ODEs, and neural networks, SR empirically improves numerical behavior and convergence to accurate solutions, although a sound theoretical explanation has been lacking. Recent works by Ipsen, Zhou, Higham, and Mary have computed SR probabilistic error bounds for basic linear algebra kernels. For example, the SR probabilistic bound on the forward error of the inner product is proportional to $\sqrt{n}u$ instead of $nu$ for the default rounding mode, where $n$ is the vector length and $u$ the unit roundoff. To compute the bounds, these works show that the errors accumulated during the computation form a martingale. This paper proposes an alternative framework for characterizing SR errors based on computing the variance. We pinpoint common error patterns in numerical algorithms and propose a lemma that bounds their variance. For a given probability, through the Bienaymé-Chebyshev inequality, this bound leads to better probabilistic error bounds in several situations. Our method has the advantage of providing a tight probabilistic bound for all algorithms fitting our model. We show how the method can be applied to derive SR error bounds for the inner product and for Horner polynomial evaluation.
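
A minimal sketch of stochastic rounding and its effect on a sequential inner product, simulating SR to float32 from float64 (the $\sqrt{n}u$ versus $nu$ contrast typically shows up as a markedly smaller relative error for SR than for round-to-nearest).

```python
import numpy as np

rng = np.random.default_rng(3)

def sr32(x):
    """Unbiased stochastic rounding of a float64 scalar to float32."""
    f = np.float32(x)
    if float(f) == x:
        return f
    lo = f if float(f) < x else np.nextafter(f, np.float32(-np.inf))
    hi = np.nextafter(lo, np.float32(np.inf))
    p = (x - float(lo)) / (float(hi) - float(lo))  # P(round up) = relative position
    return hi if rng.random() < p else lo

def dot32(a, b, rnd):
    """Sequential inner product whose accumulator is rounded by `rnd` at each
    step (products are formed in float64 to isolate the accumulation error)."""
    s = np.float32(0.0)
    for ai, bi in zip(a, b):
        s = rnd(float(s) + ai * bi)
    return float(s)

n = 20_000
a, b = rng.random(n), rng.random(n)
exact = float(a @ b)  # float64 reference
err_rn = abs(dot32(a, b, np.float32) - exact) / exact  # round-to-nearest
err_sr = abs(dot32(a, b, sr32) - exact) / exact        # stochastic rounding
print(f"round-to-nearest: {err_rn:.2e}   stochastic rounding: {err_sr:.2e}")
```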

Predictive coding (PC) is an influential theory in computational neuroscience which holds that the cortex forms unsupervised world models by implementing a hierarchical process of prediction error minimization. PC networks (PCNs) are trained in two phases. First, neural activities are updated to optimize the network's response to external stimuli. Second, synaptic weights are updated to consolidate this change in activity -- an algorithm called \emph{prospective configuration}. While previous work has shown that, in various limits, PCNs approximate backpropagation (BP), recent work has demonstrated that PCNs operating in this standard regime, which does not approximate BP, nevertheless attain training and generalization performance competitive with BP-trained networks, while outperforming them on tasks such as online, few-shot, and continual learning, where brains are known to excel. Despite this promising empirical performance, little is understood theoretically about the properties and dynamics of PCNs in this regime. In this paper, we provide a comprehensive theoretical analysis of the properties of PCNs trained with prospective configuration. We first derive analytical results concerning the inference equilibrium of PCNs and a previously unknown close connection to target propagation (TP). Secondly, we analyze learning in PCNs as a variant of generalized expectation-maximization and use this to prove the convergence of PCNs to critical points of the BP loss function, thus showing that deep PCNs can, in theory, achieve the same generalization performance as BP while maintaining their unique advantages.
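
A minimal sketch of the two training phases on a linear PCN with one hidden layer (dimensions, learning rates, and the single training pair are all hypothetical; real PCNs use nonlinear activations and many layers).

```python
import numpy as np

rng = np.random.default_rng(4)

# Tiny linear PCN: prediction errors e1 = x1 - W1 x0 and e2 = y - W2 x1.
d_in, d_hid, d_out = 4, 8, 2
W1 = rng.normal(0, 0.3, (d_hid, d_in))
W2 = rng.normal(0, 0.3, (d_out, d_hid))

def train_step(W1, W2, x0, y, n_inf=100, lr_x=0.1, lr_w=0.05):
    # Phase 1 (inference): with input x0 and output clamped to y, relax the
    # hidden activity x1 by gradient descent on the energy
    #   E = 0.5*||x1 - W1 x0||^2 + 0.5*||y - W2 x1||^2.
    x1 = W1 @ x0  # feedforward initialization
    for _ in range(n_inf):
        e1 = x1 - W1 @ x0
        e2 = y - W2 @ x1
        x1 -= lr_x * (e1 - W2.T @ e2)  # = dE/dx1
    # Phase 2 (prospective configuration): update weights to consolidate the
    # relaxed activities, i.e. gradient descent on E in W1 and W2.
    e1 = x1 - W1 @ x0
    e2 = y - W2 @ x1
    W1 += lr_w * np.outer(e1, x0)
    W2 += lr_w * np.outer(e2, x1)
    return W1, W2

x0 = rng.normal(size=d_in)
y = np.array([1.0, -1.0])
for _ in range(200):
    W1, W2 = train_step(W1, W2, x0, y)
print("output after training:", W2 @ (W1 @ x0))  # should approach y
```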
