
We study the stochastic contextual bandit with knapsacks (CBwK) problem, where each action, taken upon observing a context, not only yields a random reward but also incurs a random, vector-valued resource consumption. The challenge is to maximize the total reward without exceeding the budget of any resource. We study this problem under a general realizability setting, where the expected reward and the expected cost are functions of contexts and actions belonging to given general function classes $\mathcal{F}$ and $\mathcal{G}$, respectively. Existing work on CBwK is restricted to linear function classes because it relies on UCB-type algorithms, which depend heavily on the linear form and are therefore difficult to extend to general function classes. Motivated by online regression oracles, which have been applied successfully to contextual bandits, we propose the first universal and optimal algorithmic framework for CBwK by reducing it to online regression. We also establish a regret lower bound showing that our algorithm is optimal for a variety of function classes.
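The reduction to online regression can be illustrated with a hypothetical sketch. Assuming regression oracles have already produced reward predictions `r_hat[a]` and cost-vector predictions `c_hat[a]` for the current context, the snippet scores each action by a Lagrangian `r_hat[a] - lam · c_hat[a]`, samples an action by inverse gap weighting (the action-selection rule used by SquareCB-style reductions for ordinary contextual bandits), and updates the multipliers `lam` by projected gradient ascent against a per-round budget `rho = B / T`. The function names, the Lagrangian scoring, and the dual step size `eta` are illustrative assumptions, not the algorithm proposed in the abstract.

```python
import random

def lagrangian_scores(r_hat, c_hat, lam):
    """Score each action by predicted reward minus multiplier-weighted predicted cost."""
    return [r - sum(l * c for l, c in zip(lam, costs)) for r, costs in zip(r_hat, c_hat)]

def igw_distribution(scores, gamma):
    """Inverse gap weighting over K actions; a higher score is better.

    Each non-greedy action a gets 1 / (K + gamma * gap(a)); the greedy action
    receives the remaining mass, which is always at least 1 / K.
    """
    K = len(scores)
    best = max(range(K), key=lambda a: scores[a])
    probs = [0.0] * K
    for a in range(K):
        if a != best:
            probs[a] = 1.0 / (K + gamma * (scores[best] - scores[a]))
    probs[best] = 1.0 - sum(probs)
    return probs

def update_dual(lam, observed_cost, rho, eta):
    """Projected gradient step: raise a multiplier when its resource is over-consumed."""
    return [max(0.0, l + eta * (c - r)) for l, c, r in zip(lam, observed_cost, rho)]

# One round with three actions and a single resource (hypothetical numbers).
r_hat = [0.9, 0.5, 0.1]
c_hat = [[0.8], [0.2], [0.0]]
lam = [1.0]
probs = igw_distribution(lagrangian_scores(r_hat, c_hat, lam), gamma=10.0)
action = random.Random(0).choices(range(3), weights=probs)[0]
```

Here the multiplier makes the cheap middle action the greedy choice even though action 0 has the highest predicted reward. In a full loop, the observed reward and cost would be fed back to the regression oracles and to `update_dual`, and the process would stop once any budget is exhausted.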

Related content

We study a fundamental model of online preference aggregation, where an algorithm maintains an ordered list of $n$ elements. The input is a stream of preferred sets $R_1, R_2, \dots, R_t, \dots$. Upon seeing $R_t$, and without knowledge of any future sets, the algorithm has to rerank elements (change the list ordering) so that at least one element of $R_t$ is found near the front of the list. The incurred cost is the sum of the list update costs (the number of swaps of neighboring list elements) and the access costs (the position of the first element of $R_t$ on the list). This scenario occurs naturally in applications such as ordering items in an online shop using the aggregated preferences of its customers. The theoretical underpinning of this problem is known as Min-Sum Set Cover. Unlike previous work (Fotakis et al., ICALP 2020, NIPS 2020), which mostly studied the performance of an online algorithm ALG against the static optimal solution (a single optimal list ordering), in this paper we study an arguably harder variant in which the benchmark is the provably stronger optimal dynamic solution OPT (which may also modify the list ordering). For an online shop, this means that the aggregated preferences of its user base evolve over time. We construct a computationally efficient randomized algorithm whose competitive ratio (ALG-to-OPT cost ratio) is $O(r^2)$ and prove the existence of a deterministic $O(r^4)$-competitive algorithm. Here, $r$ is the maximum cardinality of the sets $R_t$. This is the first algorithm whose ratio does not depend on $n$: the previously best algorithm for this problem was $O(r^{3/2} \cdot \sqrt{n})$-competitive, and $\Omega(r)$ is a lower bound on the performance of any deterministic online algorithm.
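The cost model can be made concrete with a small sketch. It assumes the access cost is the 1-indexed position of the first element of $R_t$ in the list after reranking, and it counts the update cost as the number of inversions between consecutive orderings (the minimum number of neighboring swaps). The move-to-front policy used here is a simple hypothetical baseline for illustration, not the $O(r^2)$-competitive algorithm from the abstract.

```python
def swap_cost(old, new):
    """Minimum number of neighboring swaps turning `old` into `new` (inversion count)."""
    pos = {x: i for i, x in enumerate(new)}
    seq = [pos[x] for x in old]
    return sum(1 for i in range(len(seq)) for j in range(i + 1, len(seq)) if seq[i] > seq[j])

def access_cost(order, R):
    """1-indexed position of the first element of R in `order`."""
    for i, x in enumerate(order, start=1):
        if x in R:
            return i
    raise ValueError("R contains no list element")

def move_to_front(order, R):
    """Hypothetical baseline: move the element of R closest to the front to position 1."""
    x = next(y for y in order if y in R)
    return [x] + [y for y in order if y != x]

def run(order, requests, policy=move_to_front):
    """Total cost (update + access) of a policy on a request sequence."""
    total = 0
    for R in requests:
        new = policy(order, R)
        total += swap_cost(order, new) + access_cost(new, R)
        order = new
    return total, order

total, final_order = run([1, 2, 3, 4], [{3, 4}, {3}])
```

On this input, the first request costs 2 swaps plus access cost 1, and the second costs only the access cost 1, for a total of 4.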

We study bandit model selection in stochastic environments. Our approach relies on a meta-algorithm that selects between candidate base algorithms. We develop a meta-algorithm/base-algorithm abstraction that can work with general classes of base algorithms and different types of adversarial meta-algorithms. Our methods rely on a novel and generic smoothing transformation for bandit algorithms that permits us to obtain optimal $O(\sqrt{T})$ model selection guarantees for stochastic contextual bandit problems, as long as the optimal base algorithm satisfies a high-probability regret guarantee. We show through a lower bound that, even when one of the base algorithms has $O(\log T)$ regret, it is in general impossible to achieve better than $\Omega(\sqrt{T})$ regret in model selection, even asymptotically. Using our techniques, we address model selection in a variety of problems, such as misspecified linear contextual bandits, linear bandits with unknown dimension, and reinforcement learning with unknown feature maps. Our algorithm requires knowledge of the optimal base regret to adjust the meta-algorithm's learning rate. We show that, without such prior knowledge, any meta-algorithm can suffer a regret larger than the optimal base regret.
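The meta-algorithm/base-algorithm abstraction can be sketched with EXP3, a standard adversarial bandit algorithm, in the role of the meta-algorithm. Each base algorithm is treated as a black-box callable that, when selected, plays one round and returns a reward in $[0, 1]$. This is a hypothetical illustration: it omits the smoothing transformation described in the abstract, the Bernoulli "bases" are stand-ins for real learning algorithms, and the learning rate `eta` is simply passed in (the abstract notes that the right rate depends on the optimal base regret).

```python
import math
import random

def exp3_select(bases, T, eta, seed=0):
    """EXP3 over base algorithms with importance-weighted loss estimates.

    bases: list of callables base(rng) -> reward in [0, 1]; a callable may keep
    internal state, as a learning base algorithm would.
    Returns play counts per base, total reward, and the final sampling probabilities.
    """
    rng = random.Random(seed)
    M = len(bases)
    log_w = [0.0] * M
    counts = [0] * M
    total = 0.0
    for _ in range(T):
        top = max(log_w)  # subtract the max for numerical stability
        w = [math.exp(lw - top) for lw in log_w]
        s = sum(w)
        probs = [x / s for x in w]
        i = rng.choices(range(M), weights=probs)[0]
        reward = bases[i](rng)
        counts[i] += 1
        total += reward
        # Unbiased loss estimate: only the played base is updated, scaled by 1 / prob.
        log_w[i] -= eta * (1.0 - reward) / probs[i]
    top = max(log_w)
    w = [math.exp(lw - top) for lw in log_w]
    s = sum(w)
    return counts, total, [x / s for x in w]

def bernoulli_base(mean):
    """Hypothetical stand-in for a base algorithm with a fixed success rate."""
    return lambda rng: 1.0 if rng.random() < mean else 0.0

T = 2000
counts, total, probs = exp3_select(
    [bernoulli_base(0.7), bernoulli_base(0.4)], T, eta=math.sqrt(2 * math.log(2) / (2 * T)))
```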

Pervasive cross-section dependence is increasingly recognized as a characteristic of economic data, and the approximate factor model provides a useful framework for analysis. Assuming a strong factor structure, where $\Lambda^{0\prime}\Lambda^0/N^\alpha$ is positive definite in the limit with $\alpha=1$, early work established convergence of the principal component estimates of the factors and loadings up to a rotation matrix. This paper shows that the estimates remain consistent and asymptotically normal when $\alpha\in(0,1]$, albeit at slower rates and under additional assumptions on the sample size. The results hold whether $\alpha$ is constant or varies across factors. The framework developed for heterogeneous loadings, and the simplified proofs, which can also be used in the strong factor analysis, are of independent interest.
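The principal component estimator referred to above can be sketched in NumPy. The snippet uses the common normalization $\hat F'\hat F/T = I_r$, with $\hat F$ equal to $\sqrt{T}$ times the top-$r$ eigenvectors of $XX'$ and $\hat\Lambda = X'\hat F/T$, where $X$ is the $T \times N$ panel. The simulation shrinks the loadings by $N^{(\alpha-1)/2}$ so that $\Lambda'\Lambda/N^\alpha$ stays bounded; this data-generating process and the function names are assumptions for illustration. The estimates recover the factors only up to an $r \times r$ rotation.

```python
import numpy as np

def simulate_panel(T, N, r, alpha, rng):
    """Simulate X = F Lambda' + e with loadings shrunk so Lambda'Lambda / N**alpha is O(1)."""
    F = rng.standard_normal((T, r))
    Lam = rng.standard_normal((N, r)) * N ** ((alpha - 1) / 2)
    e = rng.standard_normal((T, N))
    return F @ Lam.T + e, F, Lam

def pc_estimates(X, r):
    """Principal component estimates with the normalization F_hat' F_hat / T = I_r."""
    T = X.shape[0]
    _, eigvecs = np.linalg.eigh(X @ X.T)  # eigenvalues come back in ascending order
    V = eigvecs[:, ::-1][:, :r]           # eigenvectors of the r largest eigenvalues
    F_hat = np.sqrt(T) * V
    Lam_hat = X.T @ F_hat / T
    return F_hat, Lam_hat

rng = np.random.default_rng(0)
X, F, Lam = simulate_panel(T=200, N=300, r=2, alpha=0.7, rng=rng)
F_hat, Lam_hat = pc_estimates(X, r=2)
```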

Estimating statistical properties is fundamental in statistics and computer science. In this paper, we propose a unified quantum algorithm framework for estimating properties of discrete probability distributions, with the estimation of Rényi entropies as a specific example. In particular, given a quantum oracle that prepares an $n$-dimensional quantum state $\sum_{i=1}^{n}\sqrt{p_{i}}|i\rangle$, for $\alpha>1$ and $0<\alpha<1$ our framework estimates the $\alpha$-Rényi entropy $H_{\alpha}(p)$ to within additive error $\epsilon$ with probability at least $2/3$ using $\widetilde{\mathcal{O}}(n^{1-\frac{1}{2\alpha}}/\epsilon + \sqrt{n}/\epsilon^{1+\frac{1}{2\alpha}})$ and $\widetilde{\mathcal{O}}(n^{\frac{1}{2\alpha}}/\epsilon^{1+\frac{1}{2\alpha}})$ queries, respectively. This improves the best known dependence on $\epsilon$ as well as the joint dependence on $n$ and $1/\epsilon$. Technically, our quantum algorithms combine quantum singular value transformation, quantum annealing, and variable-time amplitude estimation. We believe that our framework is of general interest and has wide applications.
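For reference, the quantity being estimated is $H_\alpha(p) = \frac{1}{1-\alpha}\log\sum_{i=1}^n p_i^\alpha$ for $\alpha>0$, $\alpha\neq 1$. The sketch below evaluates it exactly when $p$ is given explicitly (natural logarithm assumed). The quantum algorithms instead estimate it from oracle access to $\sum_i\sqrt{p_i}|i\rangle$, which this classical snippet does not model.

```python
import math

def renyi_entropy(p, alpha):
    """Exact alpha-Renyi entropy (natural log) of an explicit distribution p."""
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must be positive and different from 1")
    if any(x < 0 for x in p) or abs(sum(p) - 1.0) > 1e-9:
        raise ValueError("p must be a probability vector")
    # Zero-probability entries contribute nothing to the sum for alpha > 0.
    return math.log(sum(x ** alpha for x in p if x > 0)) / (1.0 - alpha)

uniform = [1 / 8] * 8
h2 = renyi_entropy(uniform, 2.0)     # equals log(8) up to rounding
h_half = renyi_entropy(uniform, 0.5)  # also log(8): uniform p gives log n for every alpha
```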

Weitzman introduced Pandora's box problem as a mathematical model of sequential search with inspection costs, in which a searcher is allowed to select a prize from one of $n$ alternatives. Several decades later, Doval introduced a closely related variant in which the searcher need not incur the inspection cost of an alternative and can select it uninspected. Unlike in the original problem, the optimal solution to this nonobligatory inspection variant has been shown to require adaptivity, and by recent work [FLL22], finding the optimal solution is NP-hard. Our first main result is a structural characterization of the optimal policy: we show that there exists an optimal policy that follows only two different predetermined orders of inspection and transitions from one to the other at most once. Our second main result is a polynomial-time approximation scheme (PTAS). Our proof involves a novel reduction to a framework developed by [FLX18], utilizing our optimal two-phase structure. Furthermore, we show that Pandora's problem with nonobligatory inspection belongs to the class NP, which, combined with the hardness result of [FLL22], settles the computational complexity of the problem. Finally, we provide a tight 0.8-approximation and a novel proof for committing policies [BK19] (informally, the set of nonadaptive policies) for general classes of distributions, which was previously shown only for discrete and finite distributions [GMS08].
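To make the nonobligatory-inspection variant concrete, here is a brute-force sketch for tiny discrete instances. It assumes a risk-neutral searcher, independent boxes with finitely supported prize distributions, and an outside option worth 0. In each state (set of unopened boxes, best inspected prize) the searcher may stop, take an unopened box uninspected (receiving its expected prize), or pay to inspect a box. Its running time is exponential in the number of boxes, so it only serves as a reference on toy instances; it is not the PTAS from the abstract.

```python
from functools import lru_cache

def optimal_value(boxes):
    """Exact optimal expected payoff with nonobligatory inspection.

    boxes: list of (values, probs, cost) with finitely supported prizes.
    """
    means = [sum(v * p for v, p in zip(vals, probs)) for vals, probs, _ in boxes]

    @lru_cache(maxsize=None)
    def V(unopened, best):
        value = best  # stop now and keep the best inspected prize (0 if none)
        for i in unopened:
            rest = unopened - {i}
            # Option 1: take box i without inspecting it; the search ends.
            value = max(value, means[i])
            # Option 2: pay to inspect box i, then continue optimally.
            vals, probs, cost = boxes[i]
            inspect = -cost + sum(p * V(rest, max(best, v)) for v, p in zip(vals, probs))
            value = max(value, inspect)
        return value

    return V(frozenset(range(len(boxes))), 0.0)

one_box = [([0.0, 10.0], [0.5, 0.5], 1.0)]
two_boxes = one_box + [([4.0], [1.0], 0.0)]
print(optimal_value(one_box), optimal_value(two_boxes))  # 5.0 6.0
```

With one box, taking it uninspected (value 5) beats inspecting it (value 4); with the extra free box, inspection becomes worthwhile.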

Weitzman (1979) introduced Pandora's box problem as a model of sequential search with inspection costs and gave an elegant index-based policy that attains a provably optimal expected payoff. In various scenarios, the searching agent may select an option without making a costly inspection. This variant of Pandora's box problem with nonobligatory inspection has attracted interest from both economics and algorithms researchers. Various simple algorithms have been shown to be suboptimal, the best known being a 0.8-approximation algorithm due to Guha et al. (2008). No hardness result for the problem was previously known. In this work, we show that it is NP-hard to compute an optimal policy for Pandora's problem with nonobligatory inspection. We also give a polynomial-time approximation scheme (PTAS) that computes policies with an expected payoff of at least a $(1 - \epsilon)$ fraction of the optimum, for arbitrarily small $\epsilon > 0$. In addition, we show that the decision version of the problem is in NP.
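Weitzman's index policy for the original (obligatory inspection) problem can be sketched as follows, assuming finitely supported prize distributions and an outside option worth 0. Each box receives a reservation value $\sigma_i$ solving $\mathbb{E}[(X_i-\sigma_i)^+] = c_i$; boxes are inspected in decreasing order of $\sigma_i$, and the search stops once the best prize seen is at least the next reservation value. The function names and the bisection solver are illustrative choices.

```python
def reservation_value(values, probs, cost, iters=200):
    """Solve E[(X - sigma)^+] = cost for sigma by bisection (X finitely supported)."""
    def excess(sigma):
        return sum(p * max(v - sigma, 0.0) for v, p in zip(values, probs))
    mean = sum(v * p for v, p in zip(values, probs))
    lo = min(mean - cost, min(values))  # excess(lo) >= cost
    hi = max(values)                    # excess(hi) == 0 <= cost
    for _ in range(iters):
        mid = (lo + hi) / 2
        if excess(mid) > cost:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def weitzman_payoff(boxes, realized):
    """Net payoff of Weitzman's rule on one realization.

    boxes: list of (values, probs, cost); realized[i]: the prize hidden in box i.
    """
    sigmas = [reservation_value(*box) for box in boxes]
    best, paid = 0.0, 0.0
    for i in sorted(range(len(boxes)), key=lambda j: -sigmas[j]):
        if best >= sigmas[i]:
            break  # no remaining box is worth opening
        paid += boxes[i][2]
        best = max(best, realized[i])
    return best - paid
```

For a box holding 0 or 10 with equal probability and inspection cost 1, $\mathbb{E}[(X-\sigma)^+] = (10-\sigma)/2 = 1$ gives $\sigma = 8$.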

Hypothesis testing procedures are developed to assess linear operator constraints in function-on-scalar regression when incomplete functional responses are observed. The approach enables statistical inference about the shape and other aspects of the functional regression coefficients within a unified framework encompassing three incomplete sampling scenarios: (i) partially observed response functions, given as curve segments over random sub-intervals of the domain; (ii) discretely observed functional responses with additive measurement errors; and (iii) a combination of the former two scenarios, where partially observed response segments are observed discretely with measurement error. The latter scenario has been little explored to date, although such structured data are increasingly common in applications. For statistical inference, deviations from the constraint space are measured via the integrated $L^2$-distance between the model estimates from the constrained and unconstrained model spaces. Large sample properties of the proposed test procedure are established, including the consistency, asymptotic distribution, and local power of the test statistic. The finite sample power and level of the proposed test are investigated in a simulation study covering a variety of scenarios. The proposed methodologies are illustrated by applications to U.S. obesity prevalence data, analyzing the functional shape of its trends over time, and to motion analysis in a study of automotive ergonomics.
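The basic ingredient of the test statistic, the integrated $L^2$-distance between constrained and unconstrained estimates of a coefficient function, can be sketched on a grid. For illustration only, the constraint is taken to be "the coefficient function is linear in $t$", the constrained estimate is the least-squares projection of the unconstrained one onto that space, and the integral uses the trapezoidal rule. These choices are assumptions; the procedure in the abstract handles general linear operator constraints and incomplete responses and calibrates the statistic via its asymptotic distribution, none of which is reproduced here.

```python
def trapezoid(y, t):
    """Trapezoidal approximation of the integral of y over the grid t."""
    return sum((t[k + 1] - t[k]) * (y[k] + y[k + 1]) / 2 for k in range(len(t) - 1))

def project_linear(beta, t):
    """Least-squares fit of a + b*t to the grid values beta (the constrained estimate)."""
    n = len(t)
    t_bar, b_bar = sum(t) / n, sum(beta) / n
    slope = (sum((x - t_bar) * (y - b_bar) for x, y in zip(t, beta))
             / sum((x - t_bar) ** 2 for x in t))
    intercept = b_bar - slope * t_bar
    return [intercept + slope * x for x in t]

def l2_deviation(beta_unconstrained, beta_constrained, t):
    """Integrated squared L2 distance between two curves evaluated on the grid t."""
    diff_sq = [(u - c) ** 2 for u, c in zip(beta_unconstrained, beta_constrained)]
    return trapezoid(diff_sq, t)

grid = [k / 1000 for k in range(1001)]
beta_hat = [x ** 2 for x in grid]  # hypothetical unconstrained estimate (curved)
stat = l2_deviation(beta_hat, project_linear(beta_hat, grid), grid)
```

For $\hat\beta(t) = t^2$ on $[0,1]$, the $L^2$ projection onto linear functions is $t - 1/6$, so the statistic should be close to $\int_0^1 (t^2 - t + 1/6)^2\,dt = 1/180$; a coefficient function that already satisfies the constraint gives a value near zero.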

Information is often stored in a distributed and proprietary form, and agents who own information are often self-interested and require incentives to reveal it. Suitable mechanisms are required to elicit and aggregate such distributed information for decision making. In this paper, we use simulations to investigate the use of decision markets as mechanisms in a multi-agent learning system that aggregates distributed information for decision making in a contextual bandit problem. The system utilises strictly proper decision scoring rules to assess the accuracy of probabilistic reports from agents, which allows agents to learn to solve the contextual bandit problem jointly. Our simulations show that our multi-agent system with distributed information can be trained as efficiently as a centralised counterpart with a single agent that receives all information. Moreover, we use our system to investigate scenarios with deterministic decision scoring rules, which are not incentive compatible. We observe the emergence of more complex dynamics with manipulative behaviour, which agrees with existing theoretical analyses.
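One standard way to make a decision scoring rule strictly proper can be sketched as follows: score the report for the action actually taken with a strictly proper scoring rule (here the quadratic rule), and divide by the probability that the decision rule chose that action, which must be positive for every action. The division cancels the sampling probability in expectation, so the expected score is the sum of per-action proper scores and truthful reporting maximizes it. The quadratic rule and the function names are assumptions for illustration; the abstract does not specify which rules its system uses.

```python
def quadratic_score(q, y):
    """Quadratic (Brier-type) scoring rule, strictly proper: 2*q[y] - sum_k q[k]**2."""
    return 2.0 * q[y] - sum(x * x for x in q)

def decision_score(reports, action, outcome, pi):
    """Score after `action` was taken and `outcome` observed.

    reports[a]: predicted outcome distribution conditional on action a.
    pi[a] > 0: probability that the decision rule takes action a.
    """
    return quadratic_score(reports[action], outcome) / pi[action]

def expected_decision_score(reports, truths, pi):
    """Expectation over the action drawn from pi and the outcome drawn from truths[a]."""
    return sum(pi[a] * truths[a][y] * decision_score(reports, a, y, pi)
               for a in range(len(pi)) for y in range(len(truths[a])))

truths = [[0.7, 0.3], [0.2, 0.8]]  # hypothetical outcome distributions per action
pi = [0.9, 0.1]                    # decision rule that mostly takes action 0
honest = expected_decision_score(truths, truths, pi)
shaded = expected_decision_score([[0.6, 0.4], [0.2, 0.8]], truths, pi)
```

Here the honest report earns $0.58 + 0.68 = 1.26$ in expectation, while shading the forecast for action 0 lowers its term from 0.58 to 0.56.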

Approximating functions by a linear span of truncated basis sets is a standard procedure for the numerical solution of differential and integral equations. Commonly used approximation methods are well-posed and convergent, with provable approximation orders. On the downside, however, these methods often suffer from the curse of dimensionality, which limits their approximation behavior, especially for highly oscillatory target functions. Nonlinear approximation methods, such as neural networks, have been shown to be very efficient in approximating high-dimensional functions. We investigate nonlinear approximation methods constructed by composing standard basis sets with normalizing flows. As we show, such models yield richer approximation spaces while maintaining the density properties of the initial basis set. Simulations approximating eigenfunctions of a perturbed quantum harmonic oscillator indicate convergence with respect to the size of the basis set.
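One way to compose a basis with a flow while keeping its structure is the change of variables $\psi_k(x) = \phi_k(T(x))\sqrt{T'(x)}$ for an increasing bijection $T:\mathbb{R}\to\mathbb{R}$. The map $f \mapsto (f\circ T)\sqrt{T'}$ is unitary on $L^2(\mathbb{R})$, so orthonormality and density of $\{\phi_k\}$ carry over. The sketch below assumes Hermite functions (eigenfunctions of the unperturbed harmonic oscillator) as $\phi_k$ and a hypothetical one-parameter flow $T(x) = x + a\tanh(x)$, and checks orthonormality numerically. Whether the paper composes flows in exactly this way is an assumption.

```python
import math

def hermite_functions(n_max, x):
    """Orthonormal Hermite functions psi_0..psi_{n_max} at x (three-term recurrence)."""
    psi = [math.pi ** -0.25 * math.exp(-x * x / 2)]
    if n_max >= 1:
        psi.append(math.sqrt(2) * x * psi[0])
    for n in range(1, n_max):
        psi.append(math.sqrt(2 / (n + 1)) * x * psi[n] - math.sqrt(n / (n + 1)) * psi[n - 1])
    return psi

def flow(x, a=0.5):
    """Hypothetical flow T(x) = x + a*tanh(x) and its derivative; increasing for a > -1."""
    th = math.tanh(x)
    return x + a * th, 1 + a * (1 - th * th)

def composed_basis(n_max, x, a=0.5):
    """Basis functions phi_k(T(x)) * sqrt(T'(x))."""
    y, dy = flow(x, a)
    return [p * math.sqrt(dy) for p in hermite_functions(n_max, y)]

def gram_matrix(n_max, a=0.5, L=12.0, m=4001):
    """Inner products of the composed basis on [-L, L] by the trapezoidal rule."""
    h = 2 * L / (m - 1)
    G = [[0.0] * (n_max + 1) for _ in range(n_max + 1)]
    for k in range(m):
        x = -L + k * h
        w = h / 2 if k in (0, m - 1) else h
        phi = composed_basis(n_max, x, a)
        for i in range(n_max + 1):
            for j in range(n_max + 1):
                G[i][j] += w * phi[i] * phi[j]
    return G

G = gram_matrix(5)  # should be close to the 6 x 6 identity matrix
```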

We obtain two combined numerical methods for solving time-varying semilinear differential-algebraic equations (DAEs). These equations are also called degenerate differential equations (DEs), descriptor systems, operator-differential equations, and DEs on manifolds. The convergence and correctness of the methods are proved. In constructing the methods we use, in particular, time-varying spectral projectors, which can be computed numerically. This makes it possible to solve and analyze the considered DAE numerically in its original form, without additional analytical transformations. To improve the accuracy of the second method, recalculation (a "predictor-corrector" scheme) is used. Note that the developed methods are applicable to DAEs whose nonlinear part is continuous but may not be continuously differentiable in $t$, and that restrictions such as the global Lipschitz condition, including the global contractivity condition, are not used in the theorems on the global solvability of the DAEs or on the convergence of the numerical methods. This makes the developed methods applicable to the numerical solution of more general classes of mathematical models. For example, the current and voltage functions in electric circuits may not be differentiable or may be approximated by nondifferentiable functions. The presented conditions for the global solvability of the DAEs ensure the existence of a unique exact global solution to the corresponding initial value problem, which makes it possible to compute approximate solutions on any given time interval (provided that the conditions of the theorems or remarks on the convergence of the methods are fulfilled). Finally, we carry out a numerical analysis of the mathematical model of a particular electrical circuit, demonstrating the application of the presented theorems and numerical methods.
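The predictor-corrector ("recalculation") idea can be illustrated on a semi-explicit DAE $x' = f(t,x,y)$, $0 = g(t,x,y)$ with scalar $x$ and $y$. The sketch below is a generic Heun-type scheme swapped in for illustration, not the spectral-projector methods of the paper: an explicit Euler predictor, a trapezoidal corrector, and the algebraic equation solved for $y$ by bisection at each stage. Bisection needs no derivatives of $g$, which loosely mirrors the point that the nonlinear part need not be differentiable; it assumes $g(t,x,\cdot)$ is monotone with a root in the bracket $[-10^3, 10^3]$, an illustrative choice.

```python
import math

def solve_algebraic(g, t, x, lo=-1e3, hi=1e3, iters=200):
    """Find y with g(t, x, y) = 0 by bisection; no derivatives of g are needed."""
    glo = g(t, x, lo)
    for _ in range(iters):
        mid = (lo + hi) / 2
        gm = g(t, x, mid)
        if (gm > 0) == (glo > 0):
            lo, glo = mid, gm  # root lies in [mid, hi]
        else:
            hi = mid           # root lies in [lo, mid]
    return (lo + hi) / 2

def heun_semi_explicit(f, g, x0, t0, t1, n):
    """Predictor-corrector steps for x' = f(t, x, y), 0 = g(t, x, y)."""
    h = (t1 - t0) / n
    t, x = t0, x0
    for _ in range(n):
        y = solve_algebraic(g, t, x)
        k1 = f(t, x, y)
        x_pred = x + h * k1                       # predictor: explicit Euler
        y_pred = solve_algebraic(g, t + h, x_pred)
        k2 = f(t + h, x_pred, y_pred)
        x = x + h * (k1 + k2) / 2                 # corrector: trapezoidal recalculation
        t += h
    return x, solve_algebraic(g, t, x)

# Test problem x' = -y, 0 = y - x, x(0) = 1, with exact solution x(t) = y(t) = exp(-t).
x1, y1 = heun_semi_explicit(lambda t, x, y: -y, lambda t, x, y: y - x, 1.0, 0.0, 1.0, 100)
```

Replacing the constraint with a nondifferentiable one such as `y - abs(x)` leaves the code unchanged, since only sign evaluations of `g` are used.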
