亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

In this paper we consider the problem of best-arm identification in multi-armed bandits in the fixed confidence setting, where the goal is to identify, with probability $1-\delta$ for some $\delta>0$, the arm with the highest mean reward in minimum possible samples from the set of arms $\mathcal{K}$. Most existing best-arm identification algorithms and analyses operate under the assumption that the rewards corresponding to different arms are independent of each other. We propose a novel correlated bandit framework that captures domain knowledge about correlation between arms in the form of upper bounds on expected conditional reward of an arm, given a reward realization from another arm. Our proposed algorithm C-LUCB, which generalizes the LUCB algorithm utilizes this partial knowledge of correlations to sharply reduce the sample complexity of best-arm identification. More interestingly, we show that the total samples obtained by C-LUCB are of the form $\mathcal{O}\left(\sum_{k \in \mathcal{C}} \log\left(\frac{1}{\delta}\right)\right)$ as opposed to the typical $\mathcal{O}\left(\sum_{k \in \mathcal{K}} \log\left(\frac{1}{\delta}\right)\right)$ samples required in the independent reward setting. The improvement comes, as the $\mathcal{O}(\log(1/\delta))$ term is summed only for the set of competitive arms $\mathcal{C}$, which is a subset of the original set of arms $\mathcal{K}$. The size of the set $\mathcal{C}$, depending on the problem setting, can be as small as $2$, and hence using C-LUCB in the correlated bandits setting can lead to significant performance improvements. Our theoretical findings are supported by experiments on the Movielens and Goodreads recommendation datasets.

相關內容

The bandit problem with graph feedback, proposed in [Mannor and Shamir, NeurIPS 2011], is modeled by a directed graph $G=(V,E)$ where $V$ is the collection of bandit arms, and once an arm is triggered, all its incident arms are observed. A fundamental question is how the structure of the graph affects the min-max regret. We propose the notions of the fractional weak domination number $\delta^*$ and the $k$-packing independence number capturing upper bound and lower bound for the regret respectively. We show that the two notions are inherently connected via aligning them with the linear program of the weakly dominating set and its dual -- the fractional vertex packing set respectively. Based on this connection, we utilize the strong duality theorem to prove a general regret upper bound $O\left(\left( \delta^*\log |V|\right)^{\frac{1}{3}}T^{\frac{2}{3}}\right)$ and a lower bound $\Omega\left(\left(\delta^*/\alpha\right)^{\frac{1}{3}}T^{\frac{2}{3}}\right)$ where $\alpha$ is the integrality gap of the dual linear program. Therefore, our bounds are tight up to a $\left(\log |V|\right)^{\frac{1}{3}}$ factor on graphs with bounded integrality gap for the vertex packing problem including trees and graphs with bounded degree. Moreover, we show that for several special families of graphs, we can get rid of the $\left(\log |V|\right)^{\frac{1}{3}}$ factor and establish optimal regret.

In recent years, correntropy has been seccessfully applied to robust adaptive filtering to eliminate adverse effects of impulsive noises or outliers. Correntropy is generally defined as the expectation of a Gaussian kernel between two random variables. This definition is reasonable when the error between the two random variables is symmetrically distributed around zero. For the case of asymmetric error distribution, the symmetric Gaussian kernel is however inappropriate and cannot adapt to the error distribution well. To address this problem, in this brief we propose a new variant of correntropy, named asymmetric correntropy, which uses an asymmetric Gaussian model as the kernel function. In addition, a robust adaptive filtering algorithm based on asymmetric correntropy is developed and its steady-state convergence performance is analyzed. Simulations are provided to confirm the theoretical results and good performance of the proposed algorithm.

We consider a stochastic multi-armed bandit (MAB) problem with delayed impact of actions. In our setting, actions taken in the past impact the arm rewards in the subsequent future. This delayed impact of actions is prevalent in the real world. For example, the capability to pay back a loan for people in a certain social group might depend on historically how frequently that group has been approved loan applications. If banks keep rejecting loan applications to people in a disadvantaged group, it could create a feedback loop and further damage the chance of getting loans for people in that group. In this paper, we formulate this delayed and long-term impact of actions within the context of multi-armed bandits. We generalize the bandit setting to encode the dependency of this "bias" due to the action history during learning. The goal is to maximize the collected utilities over time while taking into account the dynamics created by the delayed impacts of historical actions. We propose an algorithm that achieves a regret of $\tilde{\mathcal{O}}(KT^{2/3})$ and show a matching regret lower bound of $\Omega(KT^{2/3})$, where $K$ is the number of arms and $T$ is the learning horizon. Our results complement the bandit literature by adding techniques to deal with actions with long-term impacts and have implications in designing fair algorithms.

Despite the significant interests and many progresses in decentralized multi-player multi-armed bandits (MP-MAB) problems in recent years, the regret gap to the natural centralized lower bound in the heterogeneous MP-MAB setting remains open. In this paper, we propose BEACON -- Batched Exploration with Adaptive COmmunicatioN -- that closes this gap. BEACON accomplishes this goal with novel contributions in implicit communication and efficient exploration. For the former, we propose a novel adaptive differential communication (ADC) design that significantly improves the implicit communication efficiency. For the latter, a carefully crafted batched exploration scheme is developed to enable incorporation of the combinatorial upper confidence bound (CUCB) principle. We then generalize the existing linear-reward MP-MAB problems, where the system reward is always the sum of individually collected rewards, to a new MP-MAB problem where the system reward is a general (nonlinear) function of individual rewards. We extend BEACON to solve this problem and prove a logarithmic regret. BEACON bridges the algorithm design and regret analysis of combinatorial MAB (CMAB) and MP-MAB, two largely disjointed areas in MAB, and the results in this paper suggest that this previously ignored connection is worth further investigation.

We consider the problem of approximating the arboricity of a graph $G= (V,E)$, which we denote by $\mathsf{arb}(G)$, in sublinear time, where the arboricity of a graph is the minimal number of forests required to cover its edges. An algorithm for this problem may perform degree and neighbor queries, and is allowed a small error probability. We design an algorithm that outputs an estimate $\hat{\alpha}$, such that with probability $1-1/\textrm{poly}(n)$, $\mathsf{arb}(G)/c\log^2 n \leq \hat{\alpha} \leq \mathsf{arb}(G)$, where $n=|V|$ and $c$ is a constant. The expected query complexity and running time of the algorithm are $O(n/\mathsf{arb}(G))\cdot \textrm{poly}(\log n)$, and this upper bound also holds with high probability. %($\widetilde{O}(\cdot)$ is used to suppress $\textrm{poly}(\log n)$ dependencies). This bound is optimal for such an approximation up to a $\textrm{poly}(\log n)$ factor.

We give an $(\varepsilon,\delta)$-differentially private algorithm for the multi-armed bandit (MAB) problem in the shuffle model with a distribution-dependent regret of $O\left(\left(\sum_{a\in [k]:\Delta_a>0}\frac{\log T}{\Delta_a}\right)+\frac{k\sqrt{\log\frac{1}{\delta}}\log T}{\varepsilon}\right)$, and a distribution-independent regret of $O\left(\sqrt{kT\log T}+\frac{k\sqrt{\log\frac{1}{\delta}}\log T}{\varepsilon}\right)$, where $T$ is the number of rounds, $\Delta_a$ is the suboptimality gap of the arm $a$, and $k$ is the total number of arms. Our upper bound almost matches the regret of the best known algorithms for the centralized model, and significantly outperforms the best known algorithm in the local model.

We revisit the problem of finding optimal strategies for deterministic Markov Decision Processes (DMDPs), and a closely related problem of testing feasibility of systems of $m$ linear inequalities on $n$ real variables with at most two variables per inequality (2VPI). We give a randomized trade-off algorithm solving both problems and running in $\tilde{O}(nmh+(n/h)^3)$ time using $\tilde{O}(n^2/h+m)$ space for any parameter $h\in [1,n]$. In particular, using subquadratic space we get $\tilde{O}(nm+n^{3/2}m^{3/4})$ running time, which improves by a polynomial factor upon all the known upper bounds for non-dense instances with $m=O(n^{2-\epsilon})$. Moreover, using linear space we match the randomized $\tilde{O}(nm+n^3)$ time bound of Cohen and Megiddo [SICOMP'94] that required $\tilde{\Theta}(n^2+m)$ space. Additionally, we show a new algorithm for the Discounted All-Pairs Shortest Paths problem, introduced by Madani et al. [TALG'10], that extends the DMDPs with optional end vertices. For the case of uniform discount factors, we give a deterministic algorithm running in $\tilde{O}(n^{3/2}m^{3/4})$ time, which improves significantly upon the randomized bound $\tilde{O}(n^2\sqrt{m})$ of Madani et al.

Graphs (networks) are an important tool to model data in different domains. Real-world graphs are usually directed, where the edges have a direction and they are not symmetric. Betweenness centrality is an important index widely used to analyze networks. In this paper, first given a directed network $G$ and a vertex $r \in V(G)$, we propose an exact algorithm to compute betweenness score of $r$. Our algorithm pre-computes a set $\mathcal{RV}(r)$, which is used to prune a huge amount of computations that do not contribute to the betweenness score of $r$. Time complexity of our algorithm depends on $|\mathcal{RV}(r)|$ and it is respectively $\Theta(|\mathcal{RV}(r)|\cdot|E(G)|)$ and $\Theta(|\mathcal{RV}(r)|\cdot|E(G)|+|\mathcal{RV}(r)|\cdot|V(G)|\log |V(G)|)$ for unweighted graphs and weighted graphs with positive weights. $|\mathcal{RV}(r)|$ is bounded from above by $|V(G)|-1$ and in most cases, it is a small constant. Then, for the cases where $\mathcal{RV}(r)$ is large, we present a simple randomized algorithm that samples from $\mathcal{RV}(r)$ and performs computations for only the sampled elements. We show that this algorithm provides an $(\epsilon,\delta)$-approximation to the betweenness score of $r$. Finally, we perform extensive experiments over several real-world datasets from different domains for several randomly chosen vertices as well as for the vertices with the highest betweenness scores. Our experiments reveal that for estimating betweenness score of a single vertex, our algorithm significantly outperforms the most efficient existing randomized algorithms, in terms of both running time and accuracy. Our experiments also reveal that our algorithm improves the existing algorithms when someone is interested in computing betweenness values of the vertices in a set whose cardinality is very small.

We show that for the problem of testing if a matrix $A \in F^{n \times n}$ has rank at most $d$, or requires changing an $\epsilon$-fraction of entries to have rank at most $d$, there is a non-adaptive query algorithm making $\widetilde{O}(d^2/\epsilon)$ queries. Our algorithm works for any field $F$. This improves upon the previous $O(d^2/\epsilon^2)$ bound (SODA'03), and bypasses an $\Omega(d^2/\epsilon^2)$ lower bound of (KDD'14) which holds if the algorithm is required to read a submatrix. Our algorithm is the first such algorithm which does not read a submatrix, and instead reads a carefully selected non-adaptive pattern of entries in rows and columns of $A$. We complement our algorithm with a matching query complexity lower bound for non-adaptive testers over any field. We also give tight bounds of $\widetilde{\Theta}(d^2)$ queries in the sensing model for which query access comes in the form of $\langle X_i, A\rangle:=tr(X_i^\top A)$; perhaps surprisingly these bounds do not depend on $\epsilon$. We next develop a novel property testing framework for testing numerical properties of a real-valued matrix $A$ more generally, which includes the stable rank, Schatten-$p$ norms, and SVD entropy. Specifically, we propose a bounded entry model, where $A$ is required to have entries bounded by $1$ in absolute value. We give upper and lower bounds for a wide range of problems in this model, and discuss connections to the sensing model above.

Sufficient training data is normally required to train deeply learned models. However, the number of pedestrian images per ID in person re-identification (re-ID) datasets is usually limited, since manually annotations are required for multiple camera views. To produce more data for training deeply learned models, generative adversarial network (GAN) can be leveraged to generate samples for person re-ID. However, the samples generated by vanilla GAN usually do not have labels. So in this paper, we propose a virtual label called Multi-pseudo Regularized Label (MpRL) and assign it to the generated images. With MpRL, the generated samples will be used as supplementary of real training data to train a deep model in a semi-supervised learning fashion. Considering data bias between generated and real samples, MpRL utilizes different contributions from predefined training classes. The contribution-based virtual labels are automatically assigned to generated samples to reduce ambiguous prediction in training. Meanwhile, MpRL only relies on predefined training classes without using extra classes. Furthermore, to reduce over-fitting, a regularized manner is applied to MpRL to regularize the learning process. To verify the effectiveness of MpRL, two state-of-the-art convolutional neural networks (CNNs) are adopted in our experiments. Experiments demonstrate that by assigning MpRL to generated samples, we can further improve the person re-ID performance on three datasets i.e., Market-1501, DukeMTMCreID, and CUHK03. The proposed method obtains +6.29%, +6.30% and +5.58% improvements in rank-1 accuracy over a strong CNN baseline respectively, and outperforms the state-of-the- art methods.

北京阿比特科技有限公司