亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

In this paper we prove the efficacy of a simple greedy algorithm for a finite horizon online resource allocation/matching problem, when the corresponding static planning linear program (SPP) exhibits a non-degeneracy condition called the general position gap (GPG). The key intuition that we formalize is that the solution of the reward maximizing SPP is the same as a feasibility LP restricted to the optimal basic activities, and under GPG this solution can be tracked with bounded regret by a greedy algorithm, i.e., without the commonly used technique of periodically resolving the SPP. The goal of the decision maker is to combine resources (from a finite set of resource types) into configurations (from a finite set of feasible configurations) where each configuration is specified by the number of resources consumed of each type and a reward. The resources are further subdivided into three types - offline (whose quantity is known and available at time 0), online-queueable (which arrive online and can be stored in a buffer), and online-nonqueueable (which arrive online and must be matched on arrival or lost). Under GRG we prove that, (i) our greedy algorithm gets bounded any-time regret of $\mathcal{O}(1/\epsilon_0)$ for matching reward ($\epsilon_0$ is a measure of the GPG) when no configuration contains both an online-queueable and an online-nonqueueable resource, and (ii) $\mathcal{O}(\log t)$ expected any-time regret otherwise (we also prove a matching lower bound). By considering the three types of resources, our matching framework encompasses several well-studied problems such as dynamic multi-sided matching, network revenue management, online stochastic packing, and multiclass queueing systems.

相關內容

The Matching Augmentation Problem (MAP) has recently received significant attention as an important step towards better approximation algorithms for finding cheap $2$-edge connected subgraphs. This has culminated in a $\frac{5}{3}$-approximation algorithm. However, the algorithm and its analysis are fairly involved and do not compare against the problem's well-known LP relaxation called the cut LP. In this paper, we propose a simple algorithm that, guided by an optimal solution to the cut LP, first selects a DFS tree and then finds a solution to MAP by computing an optimum augmentation of this tree. Using properties of extreme point solutions, we show that our algorithm always returns (in polynomial time) a better than $2$-approximation when compared to the cut LP. We thereby also obtain an improved upper bound on the integrality gap of this natural relaxation.

The Initial Algebra Theorem by Trnkov\'a et al.~states, under mild assumptions, that an endofunctor has an initial algebra provided it has a pre-fixed point. The proof crucially depends on transfinitely iterating the functor and in fact shows that, equivalently, the (transfinite) initial-algebra chain stops. We give a constructive proof of the Initial Algebra Theorem that avoids transfinite iteration of the functor. For a given pre-fixed point $A$ of the functor, it uses Pataraia's theorem to obtain the least fixed point of a monotone function on the partial order formed by all subobjects of $A$. Thanks to properties of recursive coalgebras, this least fixed point yields an initial algebra. We obtain new results on fixed points and initial algebras in categories enriched over directed-complete partial orders, again without iteration. Using transfinite iteration we equivalently obtain convergence of the initial-algebra chain as an equivalent condition, overall yielding a streamlined version of the original proof.

We study the problem of reinforcement learning (RL) with low (policy) switching cost - a problem well-motivated by real-life RL applications in which deployments of new policies are costly and the number of policy updates must be low. In this paper, we propose a new algorithm based on stage-wise exploration and adaptive policy elimination that achieves a regret of $\widetilde{O}(\sqrt{H^4S^2AT})$ while requiring a switching cost of $O(HSA \log\log T)$. This is an exponential improvement over the best-known switching cost $O(H^2SA\log T)$ among existing methods with $\widetilde{O}(\mathrm{poly}(H,S,A)\sqrt{T})$ regret. In the above, $S,A$ denotes the number of states and actions in an $H$-horizon episodic Markov Decision Process model with unknown transitions, and $T$ is the number of steps. We also prove an information-theoretical lower bound which says that a switching cost of $\Omega(HSA)$ is required for any no-regret algorithm. As a byproduct, our new algorithmic techniques allow us to derive a \emph{reward-free} exploration algorithm with an optimal switching cost of $O(HSA)$.

In 2017, Krenn reported that certain problems related to the perfect matchings and colourings of graphs emerge out of studying the constructability of general quantum states using modern photonic technologies. He realized that if we can prove that the \emph{weighted matching index} of a graph, a parameter defined in terms of perfect matchings and colourings of the graph is at most 2, that could lead to exciting insights on the potential of resources of quantum inference. Motivated by this, he conjectured that the {weighted matching index} of any graph is at most 2. The first result on this conjecture was by Bogdanov, who proved that the \emph{(unweighted) matching index} of graphs (non-isomorphic to $K_4$) is at most 2, thus classifying graphs non-isomorphic to $K_4$ into Type 0, Type 1 and Type 2. By definition, the weighted matching index of Type 0 graphs is 0. We give a structural characterization for Type 2 graphs, using which we settle Krenn's conjecture for Type 2 graphs. Using this characterization, we provide a simple $O(|V||E|)$ time algorithm to find the unweighted matching index of any graph. In view of our work, Krenn's conjecture remains to be proved only for Type 1 graphs. We give upper bounds for the weighted matching index in terms of connectivity parameters for such graphs. Using these bounds, for a slightly simplified version, we settle Krenn's conjecture for the class of graphs with vertex connectivity at most 2 and the class of graphs with maximum degree at most 4. Krenn has been publicizing his conjecture in various ways since 2017. He has even declared a reward for a resolution of his conjecture. We hope that this article will popularize the problem among computer scientists.

An intensive line of research on fixed parameter tractability of integer programming is focused on exploiting the relation between the sparsity of a constraint matrix $A$ and the norm of the elements of its Graver basis. In particular, integer programming is fixed parameter tractable when parameterized by the primal tree-depth and the entry complexity of $A$, and when parameterized by the dual tree-depth and the entry complexity of $A$; both these parameterization imply that $A$ is sparse, in particular, the number of its non-zero entries is linear in the number of columns or rows, respectively. We study preconditioners transforming a given matrix to an equivalent sparse matrix if it exists and provide structural results characterizing the existence of a sparse equivalent matrix in terms of the structural properties of the associated column matroid. In particular, our results imply that the $\ell_1$-norm of the Graver basis is bounded by a function of the maximum $\ell_1$-norm of a circuit of $A$. We use our results to design a parameterized algorithm that constructs a matrix equivalent to an input matrix $A$ that has small primal/dual tree-depth and entry complexity if such an equivalent matrix exists. Our results yield parameterized algorithms for integer programming when parameterized by the $\ell_1$-norm of the Graver basis of the constraint matrix, when parameterized by the $\ell_1$-norm of the circuits of the constraint matrix, when parameterized by the smallest primal tree-depth and entry complexity of a matrix equivalent to the constraint matrix, and when parameterized by the smallest dual tree-depth and entry complexity of a matrix equivalent to the constraint matrix.

We propose 3DSmoothNet, a full workflow to match 3D point clouds with a siamese deep learning architecture and fully convolutional layers using a voxelized smoothed density value (SDV) representation. The latter is computed per interest point and aligned to the local reference frame (LRF) to achieve rotation invariance. Our compact, learned, rotation invariant 3D point cloud descriptor achieves 94.9% average recall on the 3DMatch benchmark data set, outperforming the state-of-the-art by more than 20 percent points with only 32 output dimensions. This very low output dimension allows for near realtime correspondence search with 0.1 ms per feature point on a standard PC. Our approach is sensor- and sceneagnostic because of SDV, LRF and learning highly descriptive features with fully convolutional layers. We show that 3DSmoothNet trained only on RGB-D indoor scenes of buildings achieves 79.0% average recall on laser scans of outdoor vegetation, more than double the performance of our closest, learning-based competitors. Code, data and pre-trained models are available online at //github.com/zgojcic/3DSmoothNet.

We show that for the problem of testing if a matrix $A \in F^{n \times n}$ has rank at most $d$, or requires changing an $\epsilon$-fraction of entries to have rank at most $d$, there is a non-adaptive query algorithm making $\widetilde{O}(d^2/\epsilon)$ queries. Our algorithm works for any field $F$. This improves upon the previous $O(d^2/\epsilon^2)$ bound (SODA'03), and bypasses an $\Omega(d^2/\epsilon^2)$ lower bound of (KDD'14) which holds if the algorithm is required to read a submatrix. Our algorithm is the first such algorithm which does not read a submatrix, and instead reads a carefully selected non-adaptive pattern of entries in rows and columns of $A$. We complement our algorithm with a matching query complexity lower bound for non-adaptive testers over any field. We also give tight bounds of $\widetilde{\Theta}(d^2)$ queries in the sensing model for which query access comes in the form of $\langle X_i, A\rangle:=tr(X_i^\top A)$; perhaps surprisingly these bounds do not depend on $\epsilon$. We next develop a novel property testing framework for testing numerical properties of a real-valued matrix $A$ more generally, which includes the stable rank, Schatten-$p$ norms, and SVD entropy. Specifically, we propose a bounded entry model, where $A$ is required to have entries bounded by $1$ in absolute value. We give upper and lower bounds for a wide range of problems in this model, and discuss connections to the sensing model above.

We consider the exploration-exploitation trade-off in reinforcement learning and we show that an agent imbued with a risk-seeking utility function is able to explore efficiently, as measured by regret. The parameter that controls how risk-seeking the agent is can be optimized exactly, or annealed according to a schedule. We call the resulting algorithm K-learning and show that the corresponding K-values are optimistic for the expected Q-values at each state-action pair. The K-values induce a natural Boltzmann exploration policy for which the `temperature' parameter is equal to the risk-seeking parameter. This policy achieves an expected regret bound of $\tilde O(L^{3/2} \sqrt{S A T})$, where $L$ is the time horizon, $S$ is the number of states, $A$ is the number of actions, and $T$ is the total number of elapsed time-steps. This bound is only a factor of $L$ larger than the established lower bound. K-learning can be interpreted as mirror descent in the policy space, and it is similar to other well-known methods in the literature, including Q-learning, soft-Q-learning, and maximum entropy policy gradient, and is closely related to optimism and count based exploration methods. K-learning is simple to implement, as it only requires adding a bonus to the reward at each state-action and then solving a Bellman equation. We conclude with a numerical example demonstrating that K-learning is competitive with other state-of-the-art algorithms in practice.

To see is to sketch -- free-hand sketching naturally builds ties between human and machine vision. In this paper, we present a novel approach for translating an object photo to a sketch, mimicking the human sketching process. This is an extremely challenging task because the photo and sketch domains differ significantly. Furthermore, human sketches exhibit various levels of sophistication and abstraction even when depicting the same object instance in a reference photo. This means that even if photo-sketch pairs are available, they only provide weak supervision signal to learn a translation model. Compared with existing supervised approaches that solve the problem of D(E(photo)) -> sketch, where E($\cdot$) and D($\cdot$) denote encoder and decoder respectively, we take advantage of the inverse problem (e.g., D(E(sketch)) -> photo), and combine with the unsupervised learning tasks of within-domain reconstruction, all within a multi-task learning framework. Compared with existing unsupervised approaches based on cycle consistency (i.e., D(E(D(E(photo)))) -> photo), we introduce a shortcut consistency enforced at the encoder bottleneck (e.g., D(E(photo)) -> photo) to exploit the additional self-supervision. Both qualitative and quantitative results show that the proposed model is superior to a number of state-of-the-art alternatives. We also show that the synthetic sketches can be used to train a better fine-grained sketch-based image retrieval (FG-SBIR) model, effectively alleviating the problem of sketch data scarcity.

This work considers the problem of provably optimal reinforcement learning for episodic finite horizon MDPs, i.e. how an agent learns to maximize his/her long term reward in an uncertain environment. The main contribution is in providing a novel algorithm --- Variance-reduced Upper Confidence Q-learning (vUCQ) --- which enjoys a regret bound of $\widetilde{O}(\sqrt{HSAT} + H^5SA)$, where the $T$ is the number of time steps the agent acts in the MDP, $S$ is the number of states, $A$ is the number of actions, and $H$ is the (episodic) horizon time. This is the first regret bound that is both sub-linear in the model size and asymptotically optimal. The algorithm is sub-linear in that the time to achieve $\epsilon$-average regret for any constant $\epsilon$ is $O(SA)$, which is a number of samples that is far less than that required to learn any non-trivial estimate of the transition model (the transition model is specified by $O(S^2A)$ parameters). The importance of sub-linear algorithms is largely the motivation for algorithms such as $Q$-learning and other "model free" approaches. vUCQ algorithm also enjoys minimax optimal regret in the long run, matching the $\Omega(\sqrt{HSAT})$ lower bound. Variance-reduced Upper Confidence Q-learning (vUCQ) is a successive refinement method in which the algorithm reduces the variance in $Q$-value estimates and couples this estimation scheme with an upper confidence based algorithm. Technically, the coupling of both of these techniques is what leads to the algorithm enjoying both the sub-linear regret property and the asymptotically optimal regret.

北京阿比特科技有限公司