
A set of probabilistic forecasts is calibrated if each prediction of the forecaster closely approximates the empirical distribution of outcomes on the subset of time steps where that prediction was made. We study the fundamental problem of online calibrated forecasting of binary sequences, which was initially studied by Foster & Vohra (1998). They derived an algorithm with $O(T^{2/3})$ calibration error after $T$ time steps, and showed a lower bound of $\Omega(T^{1/2})$. These bounds remained stagnant for two decades, until Qiao & Valiant (2021) improved the lower bound to $\Omega(T^{0.528})$ by introducing a combinatorial game called sign preservation and showing that lower bounds for this game imply lower bounds for calibration. In this paper, we give the first improvement to the $O(T^{2/3})$ upper bound on calibration error of Foster & Vohra. We do this by introducing a variant of Qiao & Valiant's game that we call sign preservation with reuse (SPR). We prove that the relationship between SPR and calibrated forecasting is bidirectional: not only do lower bounds for SPR translate into lower bounds for calibration, but algorithms for SPR also translate into new algorithms for calibrated forecasting. We then give an improved \emph{upper bound} for the SPR game, which implies, via our equivalence, a forecasting algorithm with calibration error $O(T^{2/3 - \varepsilon})$ for some $\varepsilon > 0$, improving Foster & Vohra's upper bound for the first time. Using similar ideas, we then prove a slightly stronger lower bound than that of Qiao & Valiant, namely $\Omega(T^{0.54389})$. Our lower bound is obtained by an oblivious adversary, marking the first $\omega(T^{1/2})$ calibration lower bound for oblivious adversaries.
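For concreteness, here is a minimal sketch (not the paper's forecasting algorithm) of one standard way to measure the $\ell_1$ calibration error of a binary-sequence forecaster; the bucketing by predicted probability follows the definition above, while the function and variable names are illustrative:

```python
from collections import defaultdict

def l1_calibration_error(predictions, outcomes):
    """For each distinct predicted probability p, compare p to the
    empirical frequency of 1s on the steps where p was predicted,
    weighted by how often p was predicted."""
    buckets = defaultdict(list)  # predicted probability -> outcomes seen
    for p, y in zip(predictions, outcomes):
        buckets[p].append(y)
    error = 0.0
    for p, ys in buckets.items():
        freq = sum(ys) / len(ys)          # empirical frequency of 1s
        error += len(ys) * abs(p - freq)  # weighted by bucket size
    return error  # an O(T^{2/3}) guarantee bounds this quantity

# Example: forecaster says 0.5 four times; three 1s are observed.
print(l1_calibration_error([0.5] * 4, [1, 1, 1, 0]))  # 4 * |0.5 - 0.75| = 1.0
```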

Related Content

Error-bounded lossy compression is a critical technique for significantly reducing scientific data volumes. Compared to CPU-based compressors, GPU-based compressors offer substantially higher throughput, making them a better fit for today's HPC applications. However, existing GPU-based compressors suffer from low compression ratios and poor reconstruction quality, severely restricting their applicability. To overcome these limitations, we introduce a new GPU-based error-bounded scientific lossy compressor named cuSZ-$i$, with the following contributions: (1) A novel GPU-optimized interpolation-based prediction method significantly improves the compression ratio and the quality of the decompressed data. (2) The Huffman encoding module in cuSZ-$i$ is optimized for better efficiency. (3) cuSZ-$i$ is the first compressor to integrate NVIDIA's lossless Bitcomp as an additional compression-ratio-enhancing module. Evaluations show that cuSZ-$i$ significantly outperforms the latest GPU-based lossy compressors in compression ratio under the same error bound (and hence the same desired quality), with a 476% advantage over the second-best. This translates into optimized performance for cuSZ-$i$ in several real-world use cases.
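As a rough illustration of what error-bounded, interpolation-based prediction means, the following 1D sketch predicts each point from already-reconstructed neighbors level by level and quantizes the residual. It is a toy model under simplifying assumptions (array length $2^k + 1$, linear interpolation, no entropy coding), not cuSZ-$i$'s GPU implementation:

```python
import numpy as np

def interp_compress_1d(data, eb):
    """Level-by-level interpolation prediction with error-bounded
    residual quantization; `codes` would be Huffman-encoded downstream.
    Assumes len(data) == 2**k + 1 for some k."""
    recon = np.zeros_like(data, dtype=float)
    codes = {}
    n = len(data) - 1
    recon[0], recon[n] = data[0], data[n]  # anchor points kept losslessly
    step = n
    while step > 1:
        half = step // 2
        for i in range(half, n, step):
            # predict each midpoint from coarser-level reconstructions,
            # so the decoder can reproduce the exact same predictions
            pred = 0.5 * (recon[i - half] + recon[i + half])
            q = int(np.round((data[i] - pred) / (2 * eb)))
            codes[i] = q
            recon[i] = pred + q * 2 * eb  # |recon[i] - data[i]| <= eb
        step = half
    return codes, recon

data = np.sin(np.linspace(0, 3, 2**6 + 1))
codes, recon = interp_compress_1d(data, eb=1e-3)
assert np.max(np.abs(recon - data)) <= 1e-3  # error bound holds pointwise
```

Smooth data makes the interpolation residuals cluster near zero, which is what lets the subsequent entropy coder achieve high compression ratios.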

The stable and fair division of profits/costs is a central concern in economics. The core, which ensures stability, has long been the gold standard for profit/cost sharing in cooperative games. Shapley and Shubik's classic work on the assignment game ([SS71]) revealed that core imputations can disproportionately favor certain agents. Recent work ([Vaz24]) gave leximin and leximax core imputations for this game, achieving better fairness properties. We explore these fairness notions for the cores of three cooperative games: the max-flow game, the minimum spanning tree (MST) game, and the bipartite $b$-matching game. For all three games we give examples showing that an arbitrary core imputation can be excessively unfair to certain agents. Leximin and leximax core imputations are natural extensions of the widely used max-min and min-max fairness notions. We show that finding such imputations in the core is NP-hard for the max-flow and MST games, and likely so for the $b$-matching game as well. To address this, we introduce the concept of Dual-Consistent Core (DCC) imputations, which are characterized by solutions to the dual linear programs. We give polynomial-time algorithms for computing leximin and leximax DCC imputations for all three games. These games have numerous applications, and for them these imputations provide a fairer way of distributing profit among agents.
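To make the core and its fairness issue concrete, here is a small brute-force sketch (illustrative only, not one of the paper's algorithms) that checks core membership in a toy 3-agent game. The imputation (0.5, 0.5, 0) passes the check even though agent c receives nothing, which is exactly the kind of unfairness that leximin imputations, maximizing the sorted payoff vector lexicographically from the worst-off agent up, rule out:

```python
from itertools import chain, combinations

def in_core(v, imputation, agents, tol=1e-9):
    """Core conditions: efficiency (the grand coalition's value is fully
    distributed) and no coalition can profitably deviate."""
    if abs(sum(imputation[a] for a in agents) - v(agents)) > tol:
        return False
    subsets = chain.from_iterable(combinations(agents, r)
                                  for r in range(1, len(agents)))
    return all(sum(imputation[a] for a in S) >= v(S) - tol for S in subsets)

# Toy game: the grand coalition earns 1, any pair earns 0.5, singletons 0.
def v(S):
    return 1.0 if len(S) == 3 else (0.5 if len(S) == 2 else 0.0)

agents = ("a", "b", "c")
print(in_core(v, {"a": 1.0, "b": 0.0, "c": 0.0}, agents))  # False: {b,c} deviates
print(in_core(v, {"a": 0.5, "b": 0.5, "c": 0.0}, agents))  # True, yet c gets 0
```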

The Check tools automate formal memory consistency model and security verification of processors by analyzing abstract models of microarchitectures, called $\mu$SPEC models. Despite the efficacy of this approach, a verification gap between $\mu$SPEC models, which must be manually written, and RTL limits the Check tools' broad adoption. Our prior work, called RTL2$\mu$SPEC, narrows this gap by automatically synthesizing formally verified $\mu$SPEC models from SystemVerilog implementations of simple processors. However, RTL2$\mu$SPEC assumes input designs in which an instruction (e.g., a load) cannot exhibit more than one microarchitectural execution path ($\mu$PATH, e.g., a cache hit or miss path) -- its single-execution-path assumption. In this paper, we first propose an automated approach and tool, called RTL2M$\mu$PATH, that resolves RTL2$\mu$SPEC's single-execution-path assumption. Given a SystemVerilog processor design, instruction encodings, and modest design metadata, RTL2M$\mu$PATH finds a complete set of formally verified $\mu$PATHs for each instruction. Next, we make an important observation: an instruction that can exhibit more than one $\mu$PATH strongly indicates the presence of a microarchitectural side channel in the input design. Based on this observation, we then propose an automated approach and tool, called SynthLC, that extends RTL2M$\mu$PATH with a symbolic information flow analysis to support synthesizing a variety of formally verified leakage contracts from SystemVerilog processor designs. Leakage contracts are foundational to state-of-the-art defenses against hardware side-channel attacks. SynthLC is the first automated methodology for formally verifying hardware adherence to them.
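The observation that multiple $\mu$PATHs signal a side channel can be illustrated with a toy timing model (not SynthLC's symbolic analysis): a load with distinct hit and miss paths lets an attacker who measures latency infer a secret-dependent address:

```python
# Toy model: an instruction with more than one microarchitectural
# execution path can leak via timing. Here a load has a hit path
# (1 cycle) and a miss path (10 cycles), so its latency reveals whether
# the secret-dependent address was cached. All values are illustrative.
def load_latency(addr, cache):
    return 1 if addr in cache else 10  # two distinct uPATHs

cache = {0x40}                         # attacker primed this line
secret_bit = 1
addr = 0x40 if secret_bit else 0x80    # victim's secret-dependent access
observed = load_latency(addr, cache)
print("attacker infers secret_bit =", 1 if observed == 1 else 0)
```

A leakage contract makes such observable distinctions explicit, so that software defenses know exactly which microarchitectural behavior an attacker may observe.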

Score-based diffusion models, which generate new data by learning to reverse a diffusion process that perturbs data from the target distribution into noise, have achieved remarkable success across various generative tasks. Despite their superior empirical performance, existing theoretical guarantees are often constrained by stringent assumptions or suboptimal convergence rates. In this paper, we establish a fast convergence theory for a popular SDE-based sampler under minimal assumptions. Our analysis shows that, provided $\ell_{2}$-accurate estimates of the score functions, the total variation distance between the target and generated distributions is upper bounded by $O(d/T)$ (ignoring logarithmic factors), where $d$ is the data dimensionality and $T$ is the number of steps. This result holds for any target distribution with finite first-order moment. To our knowledge, this improves upon existing convergence theory for both the SDE-based sampler and another ODE-based sampler, while imposing minimal assumptions on the target data distribution and score estimates. This is achieved through a novel set of analytical tools that provides a fine-grained characterization of how the error propagates at each step of the reverse process.
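For intuition, the following is a minimal numerical sketch of the kind of DDPM-style SDE-based sampler such theory analyzes; the noise schedule, step count, and update scaling are illustrative assumptions, not those of the paper:

```python
import numpy as np

def sde_sampler(score, d, T, rng):
    """Start from pure noise and run the discretized reverse process
    using a score estimate. score(x, ab) approximates grad log p_t(x),
    where ab is alpha-bar (the cumulative signal fraction) at step t."""
    betas = np.linspace(1e-4, 0.02, T)       # illustrative noise schedule
    alpha_bars = np.cumprod(1.0 - betas)
    x = rng.standard_normal(d)               # x_T ~ N(0, I)
    for t in reversed(range(T)):
        b = betas[t]
        z = rng.standard_normal(d) if t > 0 else np.zeros(d)
        x = (x + b * score(x, alpha_bars[t])) / np.sqrt(1.0 - b) + np.sqrt(b) * z
    return x

# Sanity check: if the target is N(mu, I), then p_t = N(sqrt(ab)*mu, I),
# whose exact score is sqrt(ab)*mu - x; sampling should recover mean mu.
mu, rng = np.full(4, 2.0), np.random.default_rng(0)
xs = np.stack([sde_sampler(lambda x, ab: np.sqrt(ab) * mu - x, 4, 500, rng)
               for _ in range(200)])
print(xs.mean(axis=0))  # approximately [2, 2, 2, 2]
```

In the paper's terms, the $O(d/T)$ bound controls the total variation gap introduced by discretizing this reverse process into $T$ steps, on top of the error from replacing the exact score with an $\ell_2$-accurate estimate.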

Hypothesis tests calibrated by (re)sampling methods (such as permutation, rank, and bootstrap tests) are useful tools for statistical analysis, at the computational cost of requiring Monte-Carlo sampling for calibration. It is common, almost universal, practice to execute such tests with a predetermined and large number of Monte-Carlo samples, and to disregard any randomness from this sampling when drawing and reporting inference. At best, this approach leads to computational inefficiency, and at worst to invalid inference. That said, a number of approaches have been proposed in the literature to adaptively guide analysts in choosing the number of Monte-Carlo samples, by sequentially deciding when to stop collecting samples and draw inference. These works introduce varying, competing notions of what constitutes "valid" inference, complicating the landscape for analysts seeking suitable methodology. Furthermore, the majority of these approaches only guarantee a meaningful estimate of the testing outcome, not of the $p$-value itself, which is insufficient for many practical applications. In this paper, we survey the relevant literature and build bridges between the scattered validity notions, highlighting some of their complementary roles. We also introduce a new practical methodology that provides an estimate of the $p$-value of the Monte-Carlo test, endowed with practically relevant validity guarantees. Moreover, our methodology is sequential, updating the $p$-value estimate after each new Monte-Carlo sample is drawn, while retaining important validity guarantees regardless of the selected stopping time. We conclude with a set of recommendations for the practitioner, both on the selection of methodology and on the manner of reporting results.
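For reference, the classic fixed-budget Monte-Carlo p-value looks as follows. This naive version is only valid when the number of samples n is fixed in advance, which is precisely the limitation that the sequential, stopping-time-robust methods surveyed and proposed in the paper address:

```python
import numpy as np

def mc_pvalue(test_stat, resample, observed, n, rng):
    """Classic Monte-Carlo p-value: (1 + #{resamples >= observed}) / (1 + n).
    Valid for a *pre-specified* n; stopping adaptively based on the
    running count breaks this guarantee."""
    exceed = sum(test_stat(resample(rng)) >= observed for _ in range(n))
    return (1 + exceed) / (1 + n)

# Toy permutation test: is the mean of group x larger than that of y?
rng = np.random.default_rng(1)
x, y = rng.normal(0.8, 1, 30), rng.normal(0.0, 1, 30)
pooled = np.concatenate([x, y])
observed = x.mean() - y.mean()

def resample(rng):
    perm = rng.permutation(pooled)   # relabel under the null
    return perm[:30], perm[30:]

print(mc_pvalue(lambda g: g[0].mean() - g[1].mean(),
                resample, observed, n=999, rng=rng))
```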

Recent advances in Multi-modal Large Language Models (MLLMs), such as the LLaVA series, are driven by tuning on massive amounts of machine-generated instruction-following data. Such automatic instruction-collection pipelines, however, inadvertently introduce significant variability in data quality. This paper introduces a novel instruction curation algorithm, derived from two unique perspectives, human and LLM preference alignment, to compress this vast corpus of machine-generated multimodal instructions into a compact and high-quality form: (i) For human preference alignment, we collect a machine-generated multimodal instruction dataset and establish a comprehensive set of both subjective and objective criteria to guide critical data-quality assessment by human experts. A reward model is then trained on the annotated dataset to internalize the nuanced human understanding of instruction alignment. (ii) For LLM preference alignment, given the instructions selected by the reward model, we propose leveraging the inner LLM of the MLLM to align the writing style of visual instructions with that of the inner LLM itself, yielding LLM-aligned instructions. Extensive experiments demonstrate that we can maintain or even improve model performance while compressing the synthetic multimodal instructions by up to 90%. Impressively, by aggressively reducing the total number of training samples from 158k to 14k (9$\times$ smaller), our model consistently outperforms its full-size-dataset counterpart across various MLLM benchmarks. Our project is available at //github.com/DCDmllm/Align2LLaVA.
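A hypothetical sketch of the reward-model filtering stage might look as follows; the `reward_model` callable and the 10% budget are illustrative assumptions, not released artifacts of the project:

```python
def curate(instructions, reward_model, keep_frac=0.1):
    """Score each machine-generated instruction with a trained reward
    model and keep only the highest-scoring fraction of the corpus."""
    scored = [(reward_model(ins), ins) for ins in instructions]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    n_keep = max(1, int(len(scored) * keep_frac))
    return [ins for _, ins in scored[:n_keep]]  # e.g., 158k -> ~14k samples
```

The kept instructions would then pass through the second stage, where the MLLM's inner LLM rewrites them into its own style.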

Quantum computation is a very important branch of modern cryptology. Given the number of working physical qubits available in general-purpose quantum computers and in quantum annealers, it is no coincidence that quantum annealers currently allow larger problems to be solved. In this paper we focus on solving the discrete logarithm problem (DLP) over binary fields using quantum annealing. It is worth noting that, although solving the DLP over prime fields using quantum annealing has been considered before, no author has so far considered the DLP over binary fields using quantum annealing. In this paper, we aim to bridge this gap. We present a polynomial transformation of the discrete logarithm problem over binary fields to the Quadratic Unconstrained Binary Optimization (QUBO) problem, using approximately $3n^2$ logical variables for the binary field $\mathbb{F}_{2^n}$. In our estimates, we assume the existence of an optimal normal basis of type II in the given fields. Such a QUBO instance can then be solved using quantum annealing.
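The paper's reduction needs roughly $3n^2$ logical variables; as a far smaller hedged illustration of the same idea, the toy example below encodes the DLP $g^x = h$ in $\mathbb{F}_4^*$ as an exact QUBO over the two exponent bits. Any function of two bits is exactly representable by a quadratic polynomial, so no penalty approximation is needed at this size, and exhaustive enumeration stands in for the annealer:

```python
from itertools import product

def gf4_mul(a, b):
    """Carry-less multiplication in GF(4), reducing mod x^2 + x + 1."""
    r = 0
    for i in range(2):
        if (b >> i) & 1:
            r ^= a << i
    if (r >> 2) & 1:            # reduce the degree-2 term
        r ^= 0b111
    return r

def gf4_pow(g, e):
    r = 1
    for _ in range(e):
        r = gf4_mul(r, g)
    return r

g, h = 2, 3                     # solve g^x = h; the answer is x = 2
# Cost table: 0 iff the exponent encoded by (b1, b0) solves the DLP.
c = {(b1, b0): 0 if gf4_pow(g, 2 * b1 + b0) == h else 1
     for b1, b0 in product((0, 1), repeat=2)}
# QUBO coefficients: E(b1,b0) = a0 + a1*b1 + a2*b0 + a12*b1*b0.
a0 = c[0, 0]
a1 = c[1, 0] - c[0, 0]
a2 = c[0, 1] - c[0, 0]
a12 = c[1, 1] - c[1, 0] - c[0, 1] + c[0, 0]
best = min(product((0, 1), repeat=2),  # an annealer would minimize this
           key=lambda b: a0 + a1 * b[0] + a2 * b[1] + a12 * b[0] * b[1])
print("x =", 2 * best[0] + best[1])    # -> x = 2
```

At cryptographic sizes the exponentiation constraint is no longer directly quadratic, which is why the paper's transformation introduces auxiliary variables, arriving at the stated $\approx 3n^2$ count.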

Recently, there has been a significant upsurge of interest in leveraging large language models (LLMs) to assist scientific discovery. However, most LLMs focus only on general science and lack domain-specific knowledge, such as knowledge of chemical molecules and amino acid sequences. To bridge these gaps, we introduce SciDFM, a mixture-of-experts LLM that is trained from scratch and is able to conduct college-level scientific reasoning and to understand molecules and amino acid sequences. We collect a large-scale training corpus containing numerous scientific papers and books from different disciplines, as well as data from domain-specific databases. We further fine-tune the pre-trained model on a large amount of instruction data to improve performance on downstream benchmarks. Experimental results show that SciDFM achieves strong performance on general scientific benchmarks such as SciEval and SciQ, and that it reaches SOTA performance on domain-specific benchmarks among models of similar size. We further analyze the expert layers and show that the outcome of expert selection varies with data from different disciplines. To benefit the broader research community, we open-source SciDFM at //huggingface.co/OpenDFM/SciDFM-MoE-A5.6B-v1.0.
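A hypothetical usage sketch, assuming the released checkpoint loads through the standard Hugging Face transformers API (a custom MoE architecture typically requires trust_remote_code=True; consult the model card for the actual instructions):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "OpenDFM/SciDFM-MoE-A5.6B-v1.0"
tok = AutoTokenizer.from_pretrained(name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(name, trust_remote_code=True)

prompt = "What is the molecular weight of caffeine (C8H10N4O2)?"
inputs = tok(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=64)
print(tok.decode(out[0], skip_special_tokens=True))
```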

Click-through rate (CTR) prediction plays a critical role in recommender systems and online advertising. The data used in these applications are multi-field categorical data, where each feature belongs to one field. Field information has proved to be important, and several works incorporate fields into their models. In this paper, we propose a novel approach to model field information effectively and efficiently. The proposed approach is a direct improvement of FwFM and is named Field-matrixed Factorization Machines (FmFM, or $FM^2$). We also propose a new interpretation of FM and FwFM within the FmFM framework, and compare it with FFM. Besides pruning the cross terms, our model supports field-specific variable dimensions of the embedding vectors, which acts as soft pruning. We also propose an efficient way to minimize these dimensions while preserving model performance. The FmFM model can be optimized further by caching the intermediate vectors, after which it takes only thousands of floating-point operations (FLOPs) to make a prediction. Our experimental results show that it can outperform FFM, which is more complex. The FmFM model's performance is also comparable to that of DNN models, which require many more FLOPs at runtime.
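The core FmFM interaction, $\sum_{i<j} (v_i^\top M_{F(i),F(j)}) v_j$, admits a compact sketch. The field names and dimensions below are illustrative; the per-field embedding dimensions are what the abstract calls soft pruning:

```python
import numpy as np

rng = np.random.default_rng(0)
field_dim = {"context": 4, "item": 12, "user": 8}  # field-specific dims
fields = list(field_dim)
# One transformation matrix per unordered field pair (f, g), f < g.
M = {(f, g): rng.normal(0, 0.1, (field_dim[f], field_dim[g]))
     for f in fields for g in fields if f < g}

def fmfm_interaction(active):
    """active: list of (field, embedding) for one sample's features.
    Sums v_i^T M_{F(i),F(j)} v_j over all feature pairs."""
    score = 0.0
    for a in range(len(active)):
        for b in range(a + 1, len(active)):
            (f, vi), (g, vj) = active[a], active[b]
            if f > g:                      # canonicalize the pair order
                (f, vi), (g, vj) = (g, vj), (f, vi)
            score += (vi @ M[f, g]) @ vj
    return score

sample = [(f, rng.normal(0, 0.1, field_dim[f])) for f in fields]
print(fmfm_interaction(sample))
```

Because $v_i^\top M_{F(i),F(j)}$ does not depend on the other feature's value, these intermediate vectors can be precomputed and cached, which is the optimization the abstract refers to.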

Most existing works in visual question answering (VQA) are dedicated to improving the accuracy of predicted answers while disregarding explanations. We argue that the explanation for an answer is of equal or even greater importance than the answer itself, since it makes the question-answering process more understandable and traceable. To this end, we propose a new task of VQA-E (VQA with Explanation), in which the computational models are required to generate an explanation along with the predicted answer. We first construct a new dataset and then frame the VQA-E problem in a multi-task learning architecture. Our VQA-E dataset is automatically derived from the VQA v2 dataset by intelligently exploiting the available captions. We have conducted a user study to validate the quality of the explanations synthesized by our method. We quantitatively show that the additional supervision from explanations can not only produce insightful textual sentences to justify the answers, but also improve the performance of answer prediction. Our model outperforms the state-of-the-art methods by a clear margin on the VQA v2 dataset.
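A hypothetical sketch of such a multi-task setup, where a shared fused question-image representation feeds both an answer classifier and an explanation decoder; the module sizes, decoder choice, and loss weighting are illustrative assumptions, not the paper's exact model:

```python
import torch
import torch.nn as nn

class VQAE(nn.Module):
    def __init__(self, fused_dim=512, n_answers=3129, vocab=10000):
        super().__init__()
        self.answer_head = nn.Linear(fused_dim, n_answers)   # answer task
        self.decoder = nn.LSTM(fused_dim, fused_dim, batch_first=True)
        self.word_head = nn.Linear(fused_dim, vocab)         # explanation task

    def forward(self, fused, expl_tokens):
        """fused: (B, fused_dim) joint image-question features;
        expl_tokens: (B, L) target explanation token ids."""
        ans_logits = self.answer_head(fused)
        steps = fused.unsqueeze(1).expand(-1, expl_tokens.size(1), -1)
        hidden, _ = self.decoder(steps)
        word_logits = self.word_head(hidden)                 # (B, L, vocab)
        return ans_logits, word_logits

def multitask_loss(ans_logits, ans_target, word_logits, word_target, w=1.0):
    """Sum of answer classification loss and explanation generation loss."""
    ce = nn.functional.cross_entropy
    return ce(ans_logits, ans_target) + w * ce(
        word_logits.flatten(0, 1), word_target.flatten())
```

Jointly minimizing both terms is what lets the explanation supervision also improve answer accuracy, as the abstract reports.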
