
Follow-the-Regularized-Leader (FTRL) is a powerful framework for various online learning problems. By designing its regularizer and learning rate to be adaptive to past observations, FTRL is known to adapt to various properties of an underlying environment. However, most existing adaptive learning rates are for online learning problems with a minimax regret of $\Theta(\sqrt{T})$ in the number of rounds $T$, and there are only a few studies on adaptive learning rates for problems with a minimax regret of $\Theta(T^{2/3})$, which include several important problems dealing with indirect feedback. To address this limitation, we establish a new adaptive learning rate framework for problems with a minimax regret of $\Theta(T^{2/3})$. Our learning rate is designed by matching the stability, penalty, and bias terms that naturally appear in regret upper bounds for such problems. As applications of this framework, we consider two major problems dealing with indirect feedback: partial monitoring and graph bandits. We show that FTRL with our learning rate and the Tsallis entropy regularizer improves existing Best-of-Both-Worlds (BOBW) regret upper bounds, which guarantee simultaneous optimality in the stochastic and adversarial regimes. The resulting learning rate is surprisingly simple compared to existing learning rates for BOBW algorithms for problems with a minimax regret of $\Theta(T^{2/3})$.
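To make the algorithmic core concrete, here is a minimal Python sketch of the FTRL update with the ($\alpha = 1/2$) Tsallis entropy regularizer on the probability simplex. It is a standard Tsallis-INF-style update, not the paper's method: the learning rate `eta` is left as an input, whereas the paper's contribution is precisely an adaptive schedule for it, which is not reproduced here.

```python
import numpy as np

def tsallis_ftrl_update(L_hat, eta, tol=1e-12):
    """FTRL update with the (alpha = 1/2) Tsallis entropy regularizer:
    p_i = 4 / (eta * (L_hat_i - x))^2, where the normalizer x < min(L_hat)
    is found by bisection so that p sums to 1. Standard Tsallis-INF-style
    update; the paper's adaptive eta schedule is not reproduced here."""
    L_hat = np.asarray(L_hat, dtype=float)
    K = len(L_hat)
    L_min = L_hat.min()
    lo = L_min - 2.0 * np.sqrt(K) / eta    # at lo, sum(p) <= 1
    hi = L_min - 2.0 / (eta * np.sqrt(K))  # at hi, sum(p) >= 1
    while hi - lo > tol:                   # sum(p) is increasing in x
        x = 0.5 * (lo + hi)
        if np.sum(4.0 / (eta * (L_hat - x)) ** 2) > 1.0:
            hi = x                         # too much mass: decrease x
        else:
            lo = x
    p = 4.0 / (eta * (L_hat - 0.5 * (lo + hi))) ** 2
    return p / p.sum()                     # sampling distribution over arms
```

In bandit-type uses, `L_hat` would hold importance-weighted cumulative loss estimates, and the returned distribution is what the learner samples from in the next round.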

Related Content

In the metric distortion problem there is a set of candidates $C$ and a set of voters $V$ in the same metric space. The goal is to select a candidate minimizing the social cost: the sum of distances of the selected candidate from all the voters. The challenge arises from the algorithm receiving only ordinal input (each voter's ranking of the candidates), while the objective function is cardinal, determined by the underlying metric. The distortion of an algorithm is its worst-case approximation factor of the optimal social cost. A key concept here is the $(p,q)$-veto core, with $p\in \Delta(V)$ and $q\in \Delta(C)$ being normalized weight vectors representing voters' veto power and candidates' support, respectively. The $(p,q)$-veto core corresponds to a set of winners from a specific class of deterministic algorithms. Notably, the optimal distortion of $3$ is obtained from this class by selecting veto core candidates using uniform $p$ and $q$ proportional to candidates' plurality scores. Bounding the distortion of other algorithms from this class is an open problem. Our contribution is twofold. First, we establish upper bounds on the distortion of candidates from the $(p,q)$-veto core for arbitrary weight vectors $p$ and $q$. Second, we revisit the metric distortion problem through the \emph{learning-augmented} framework, which equips the algorithm with a (machine-learned) prediction of the optimal candidate. The quality of this prediction is unknown, and the goal is to optimize the algorithm's performance under accurate predictions (consistency), while simultaneously providing worst-case guarantees under arbitrarily inaccurate predictions (robustness). We propose an algorithm that chooses candidates from the $(p,q)$-veto core using a prediction-guided $q$ vector and, leveraging our distortion bounds, we prove that this algorithm achieves the optimal robustness-consistency trade-off.
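For readers new to the objective, the following sketch computes social cost and single-instance distortion when the underlying metric is known; in the actual problem the algorithm sees only the rankings the metric induces. All names and the toy instance are illustrative.

```python
import numpy as np

def social_cost(dist, c):
    """Sum of distances from candidate c to all voters;
    dist[v][c] is the voter-candidate distance in the hidden metric."""
    return dist[:, c].sum()

def distortion(dist, chosen):
    """Single-instance distortion: cost of the chosen candidate
    divided by the optimal social cost."""
    costs = dist.sum(axis=0)      # social cost of every candidate
    return costs[chosen] / costs.min()

# Toy instance: 3 voters and 2 candidates on a line.
voters = np.array([0.0, 1.0, 2.0])
cands = np.array([0.9, 3.0])
dist = np.abs(voters[:, None] - cands[None, :])
print(distortion(dist, chosen=1))  # ~2.86: candidate 1 vs. the optimum
```

An algorithm's distortion is the worst case of this ratio over all metrics consistent with the ordinal input.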

Direct Preference Optimization (DPO) has emerged as a compelling approach for training Large Language Models (LLMs) to adhere to human preferences. However, the performance of DPO is sensitive to the fine-tuning of its trade-off parameter $\beta$, as well as to the quality of the preference data. We analyze the impact of $\beta$ and data quality on DPO, uncovering that optimal $\beta$ values vary with the informativeness of pairwise data. Addressing the limitations of static $\beta$ values, we introduce a novel framework that dynamically calibrates $\beta$ at the batch level, informed by data quality considerations. Additionally, our method incorporates $\beta$-guided data filtering to safeguard against the influence of outliers. Through empirical evaluation, we demonstrate that our dynamic $\beta$ adjustment technique significantly improves DPO's performance across a range of models and datasets, offering a more robust and adaptable training paradigm for aligning LLMs with human feedback. The code is available at \url{//github.com/junkangwu/beta-DPO}.
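As a rough illustration of batch-level $\beta$ calibration, here is a NumPy sketch of the DPO loss with a $\beta$ scaled by the batch's mean implicit-reward margin and a simple outlier filter. The specific calibration rule and filter below are assumptions made for illustration, not necessarily the paper's exact formulas.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dpo_loss_dynamic_beta(logp_w, logp_l, ref_logp_w, ref_logp_l,
                          beta0=0.1, alpha=0.5, running_margin=1.0):
    """DPO loss with a batch-calibrated beta (illustrative rule only).

    margin_i = (log pi(y_w|x) - log pi_ref(y_w|x))
             - (log pi(y_l|x) - log pi_ref(y_l|x))
    Beta is scaled up when the batch margin exceeds a running average
    and down otherwise; pairs far from the batch margin are filtered
    out as outliers.
    """
    margins = (logp_w - ref_logp_w) - (logp_l - ref_logp_l)
    batch_margin = margins.mean()
    beta = max(beta0 * (1.0 + alpha * (batch_margin - running_margin)), 1e-3)
    keep = np.abs(margins - batch_margin) <= 2.0 * margins.std() + 1e-8
    return -np.log(sigmoid(beta * margins[keep])).mean()
```

In a real training loop, `logp_*` would be sequence log-probabilities from the policy and frozen reference model, and `running_margin` would be updated as a moving average across batches.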

Error-bounded lossy compression is a critical technique for significantly reducing scientific data volumes. Compared to CPU-based compressors, GPU-based compressors exhibit substantially higher throughputs, making them a better fit for today's HPC applications. However, the critical limitation of existing GPU-based compressors is their low compression ratios and reconstruction quality, severely restricting their applicability. To overcome this, we introduce a new GPU-based error-bounded scientific lossy compressor named cuSZ-$i$, with the following contributions: (1) A novel GPU-optimized interpolation-based prediction method significantly improves the compression ratio and decompressed data quality. (2) The Huffman encoding module in cuSZ-$i$ is optimized for better efficiency. (3) cuSZ-$i$ is the first to integrate NVIDIA's Bitcomp-lossless as an additional compression-ratio-enhancing module. Evaluations show that cuSZ-$i$ significantly outperforms other recent GPU-based lossy compressors in compression ratio under the same error bound (and hence the same quality target), showcasing a 476% advantage over the second best. This translates into improved performance for cuSZ-$i$ in several real-world use cases.
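The interpolation-based predict-then-quantize idea can be illustrated in one dimension as follows. This is a generic SZ-style sketch, not cuSZ-$i$'s GPU-optimized spline interpolation, and the Huffman and Bitcomp stages are omitted.

```python
import numpy as np

def interp_quantize_1d(data, eb):
    """Predict odd-indexed points by linear interpolation of their
    (already reconstructed) neighbors and quantize the residual into
    integer bins of width 2*eb, so |decoded - data| <= eb everywhere.
    Even-indexed points act as losslessly kept anchors."""
    recon = data.astype(float).copy()
    codes = np.zeros(len(data), dtype=np.int64)
    for i in range(1, len(data) - 1, 2):
        pred = 0.5 * (recon[i - 1] + recon[i + 1])
        q = int(np.round((data[i] - pred) / (2 * eb)))
        codes[i] = q                      # small ints -> entropy-codable
        recon[i] = pred + q * 2 * eb      # decoder reproduces exactly this
    return codes, recon

data = np.cumsum(np.random.default_rng(0).normal(size=64))
codes, recon = interp_quantize_1d(data, eb=0.01)
assert np.max(np.abs(recon - data)) <= 0.01   # error bound holds
```

Real compressors apply this hierarchically (coarse-to-fine, multi-dimensional) so that most quantization codes concentrate near zero and compress well.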

Although end-to-end robot learning has shown some success in robot manipulation, the learned policies are often not sufficiently robust to variations in object pose or geometry. To improve policy generalization, we introduce spatially grounded parameterized motion primitives in our method HACMan++. Specifically, we propose an action representation consisting of three components: what primitive type to execute (such as grasp or push), where the primitive will be grounded (e.g., where the gripper will make contact with the world), and how the primitive motion is executed, such as parameters specifying the push direction or grasp orientation. These three components define a novel discrete-continuous action space for reinforcement learning. Our framework enables robot agents to learn to chain diverse motion primitives together and to select appropriate primitive parameters to complete long-horizon manipulation tasks. By grounding the primitives at a spatial location in the environment, our method generalizes effectively across object shape and pose variations. Our approach significantly outperforms existing methods, particularly in complex scenarios demanding both high-level sequential reasoning and object generalization. With zero-shot sim-to-real transfer, our policy succeeds in challenging real-world manipulation tasks and generalizes to unseen objects. Videos can be found on the project website: //sgmp-rss2024.github.io.
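A minimal sketch of what such a discrete-continuous action might look like as a data structure; all type and field names here are hypothetical, chosen only to mirror the what/where/how decomposition described above.

```python
from dataclasses import dataclass
from enum import Enum

class PrimitiveType(Enum):  # "what": a hypothetical primitive vocabulary
    GRASP = 0
    PUSH = 1
    PLACE = 2

@dataclass
class PrimitiveAction:
    primitive: PrimitiveType               # what primitive to execute
    location: tuple[float, float, float]   # where: grounding point in the scene
    params: tuple[float, ...]              # how: e.g. push direction, grasp yaw

# An episode is a chain of such actions chosen by the RL policy:
plan = [
    PrimitiveAction(PrimitiveType.PUSH, (0.40, 0.10, 0.02), (1.0, 0.0)),
    PrimitiveAction(PrimitiveType.GRASP, (0.45, 0.12, 0.03), (0.7,)),
]
```

The policy's discrete head picks the primitive, a spatial head picks the grounding location (e.g. a point on an observed point cloud), and a continuous head emits the motion parameters.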

We present a divergence-free and $H(\mathrm{div})$-conforming hybridized discontinuous Galerkin (HDG) method and a computationally efficient variant called embedded HDG (E-HDG) for solving the stationary incompressible visco-resistive magnetohydrodynamic (MHD) equations. The proposed E-HDG approach uses continuous facet unknowns for the vector-valued solutions (velocity and magnetic fields), while it uses discontinuous facet unknowns for the scalar variables (pressure and magnetic pressure). This choice of function spaces makes E-HDG computationally far more advantageous than its HDG counterpart, due to the much smaller number of degrees of freedom. The benefit is even more significant in three-dimensional, high-order, or fine-mesh scenarios. On simplicial meshes, the proposed methods with a specific choice of approximation spaces are well-posed for linear(ized) MHD equations. For nonlinear MHD problems, we present a simple approach that exploits the proposed linear discretizations through a Picard iteration. The beauty of this approach is that the divergence-free and $H(\mathrm{div})$-conforming properties of the velocity and magnetic fields automatically carry over to nonlinear MHD equations. We study the accuracy and convergence of our E-HDG method for both linear and nonlinear MHD cases through various numerical experiments, including two- and three-dimensional problems with smooth and singular solutions. The numerical examples show that the proposed methods are pressure-robust and that the divergence of the resulting velocity and magnetic fields is machine zero for both smooth and singular problems.
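For reference, one common form of the stationary incompressible visco-resistive MHD system targeted by such discretizations is the following, with velocity $u$, pressure $p$, magnetic field $b$, magnetic pseudo-pressure $r$, viscosity $\nu$, magnetic resistivity $\nu_m$, and coupling number $\kappa$; the paper's exact formulation and scaling may differ.

$$
\begin{aligned}
-\nu\,\Delta u + (u\cdot\nabla)u + \nabla p - \kappa\,(\nabla\times b)\times b &= f, & \nabla\cdot u &= 0,\\
\kappa\,\nu_m\,\nabla\times(\nabla\times b) + \nabla r - \kappa\,\nabla\times(u\times b) &= g, & \nabla\cdot b &= 0.
\end{aligned}
$$

The two divergence constraints are exactly what the divergence-free, $H(\mathrm{div})$-conforming spaces enforce pointwise, which is why the method reports machine-zero divergence.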

When verifying liveness properties on a transition system, it is often necessary to discard spurious violating paths by making assumptions on which paths represent realistic executions. Capturing that some property holds under such an assumption in a logical formula is challenging and error-prone, particularly in the modal $\mu$-calculus. In this paper, we present template formulae in the modal $\mu$-calculus that can be instantiated to a broad range of liveness properties. We consider the following assumptions: progress, justness, weak fairness, strong fairness, and hyperfairness, each with respect to actions. The correctness of these formulae has been proven.
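As a point of reference, a textbook modal $\mu$-calculus formula for the liveness property "action $a$ inevitably occurs" (with no fairness assumption) is

$$
\mu X.\;\bigl(\langle \mathit{true}\rangle\mathit{true} \;\wedge\; [\neg a]\,X\bigr),
$$

where $[\neg a]$ quantifies over all actions other than $a$ and $\langle \mathit{true}\rangle\mathit{true}$ rules out deadlock before $a$ happens. The templates in the paper can be read as variants of such formulae in which the progress, justness, or fairness assumption is built in, so that only realistic violating paths are considered; the formula above is a standard example, not one of the paper's templates.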

We present a detailed study of cardinality-aware top-$k$ classification, a novel approach that aims to learn an accurate top-$k$ set predictor while maintaining a low cardinality. We introduce a new target loss function tailored to this setting that accounts for both the classification error and the cardinality of the set predicted. To optimize this loss function, we propose two families of surrogate losses: cost-sensitive comp-sum losses and cost-sensitive constrained losses. Minimizing these loss functions leads to new cardinality-aware algorithms that we describe in detail in the case of both top-$k$ and threshold-based classifiers. We establish $H$-consistency bounds for our cardinality-aware surrogate loss functions, thereby providing a strong theoretical foundation for our algorithms. We report the results of extensive experiments on CIFAR-10, CIFAR-100, ImageNet, and SVHN datasets demonstrating the effectiveness and benefits of our cardinality-aware algorithms.
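An illustrative instance of such a target loss for a threshold-based set predictor, with a simple linear cardinality penalty $\lambda\,|h(x)|$, is sketched below; the paper's loss may weight cardinality through a more general cost function, so both the penalty form and $\lambda$ are assumptions here.

```python
import numpy as np

def threshold_set(scores, tau):
    """Predict the set of labels whose (softmax) score is at least tau."""
    return np.flatnonzero(scores >= tau)

def cardinality_aware_loss(scores, y, tau, lam=0.05):
    """Illustrative target loss: top-k-style classification error plus a
    penalty proportional to the predicted set's cardinality."""
    pred = threshold_set(scores, tau)
    miss = 0.0 if y in pred else 1.0
    return miss + lam * len(pred)

scores = np.array([0.55, 0.25, 0.12, 0.08])
print(cardinality_aware_loss(scores, y=0, tau=0.2))  # 0.1: hit, pays for 2 labels
```

Since this loss is discontinuous in the scores, it cannot be optimized directly, which is what motivates the cost-sensitive surrogate losses and their $H$-consistency bounds.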

We develop domain theory in constructive and predicative univalent foundations (also known as homotopy type theory). That we work predicatively means that we do not assume Voevodsky's propositional resizing axioms. Our work is constructive in the sense that we do not rely on excluded middle or the axiom of (countable) choice. Domain theory studies so-called directed complete posets (dcpos) and Scott continuous maps between them and has applications in a variety of fields, such as programming language semantics, higher-type computability and topology. A common approach to deal with size issues in a predicative foundation is to work with information systems, abstract bases or formal topologies rather than dcpos, and approximable relations rather than Scott continuous functions. In our type-theoretic approach, we instead accept that dcpos may be large and work with type universes to account for this. A priori one might expect that iterative constructions of dcpos may result in a need for ever-increasing universes and are predicatively impossible. We show, through a careful tracking of type universe parameters, that such constructions can be carried out in a predicative setting. In particular, we give a predicative reconstruction of Scott's $D_\infty$ model of the untyped $\lambda$-calculus. Our work is formalised in the Agda proof assistant and its ability to infer universe levels has been invaluable for our purposes.

We study the edge-coloring problem in simple $n$-vertex $m$-edge graphs with maximum degree $\Delta$. This is one of the most classical and fundamental graph-algorithmic problems. Vizing's celebrated theorem provides a $(\Delta+1)$-edge-coloring in $O(m\cdot n)$ deterministic time. This running time was improved to $O\left(m\cdot\min\left\{\Delta\cdot\log n,\sqrt{n}\right\}\right)$, and very recently to randomized $\tilde{O}\left(m\cdot n^{1/3}\right)$. A randomized $(1+\varepsilon)\Delta$-edge-coloring can be computed in $O\left(m\cdot\frac{\log^6 n}{\varepsilon^2}\right)$ time, and for large values of $\Delta$, this task can be done in randomized $O\left(\frac{m\cdot\log\varepsilon^{-1}}{\varepsilon^2}\right)$ time. It remained open, however, whether there exists a deterministic near-linear-time algorithm for this basic problem. We devise a simple deterministic $(1+\varepsilon)\Delta$-edge-coloring algorithm with running time $O\left(m\cdot\frac{\log n}{\varepsilon}\right)$. A randomized variant of our algorithm has running time $O(m\cdot(\varepsilon^{-18}+\log(\varepsilon\cdot\Delta)))$. We also study edge-coloring of graphs with arboricity at most $\alpha$. A $(\Delta+1)$-edge-coloring can be computed in randomized $\tilde{O}\left(\min\{m\cdot\sqrt{n},m\cdot\Delta\}\cdot\frac{\alpha}{\Delta}\right)$ time, or deterministically in $O\left(m\cdot\alpha^7\cdot\log n\right)$ time. However, for large values of $\alpha$, these algorithms require super-linear time. We devise a deterministic $(\Delta+\varepsilon\alpha)$-edge-coloring algorithm with running time $O\left(\frac{m\cdot\log n}{\varepsilon^7}\right)$. A randomized version of our algorithm runs in $O\left(\frac{m\cdot\log n}{\varepsilon}\right)$ expected time. Our algorithms are based on a novel two-way degree-splitting technique, which we devise in this paper. We believe that this technique is of independent interest.
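For contrast, the classic greedy baseline below always succeeds with at most $2\Delta-1$ colors in near-linear time; the gap between $2\Delta-1$ and $(1+\varepsilon)\Delta$ or $\Delta+1$ is exactly what the algorithms above close. This is a textbook warm-up, not the paper's algorithm.

```python
def greedy_edge_coloring(edges):
    """Color each edge with the smallest color unused at either endpoint.
    Each endpoint blocks at most Delta - 1 colors, so at most
    2*Delta - 1 colors are ever needed (vs. Vizing's Delta + 1)."""
    used = {}    # vertex -> set of colors already on its incident edges
    color = {}
    for u, v in edges:
        taken = used.setdefault(u, set()) | used.setdefault(v, set())
        c = next(c for c in range(len(taken) + 1) if c not in taken)
        color[(u, v)] = c
        used[u].add(c)
        used[v].add(c)
    return color

# Triangle plus a pendant edge: Delta = 3, greedy uses colors {0, 1, 2}.
print(greedy_edge_coloring([(0, 1), (1, 2), (2, 0), (1, 3)]))
```

The hard part, which greedy sidesteps by over-provisioning colors, is recoloring along alternating paths or fans when no free color remains under a tighter palette.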

The "Harmony Lemma", as formulated by Sangiorgi & Walker, establishes the equivalence between the labelled transition semantics and the reduction semantics in the $\pi$-calculus. Despite being a widely known and accepted result for the standard $\pi$-calculus, this assertion has never been rigorously proven, formally or informally. Hence, its validity may not be immediately apparent when considering extensions of the $\pi$-calculus. Contributing to the second challenge of the Concurrent Calculi Formalization Benchmark -- a set of challenges tackling the main issues related to the mechanization of concurrent systems -- we present a formalization of this result for the fragment of the $\pi$-calculus examined in the Benchmark. Our formalization is implemented in Beluga and draws inspiration from the HOAS formalization of the LTS semantics popularized by Honsell et al. In passing, we introduce a couple of useful encoding techniques for handling telescopes and lexicographic induction.
