亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

Underlying data distributions of natural language, programming code, and mathematical symbols vary vastly, presenting a complex challenge for large language models (LLMs) that strive to achieve high performance across all three domains simultaneously. Achieving a very high level of proficiency for an LLM within a specific domain often requires extensive training with relevant corpora, which is typically accompanied by a sacrifice in performance in other domains. In this paper, we propose to fuse models that are already highly-specialized directly. The proposed fusing framework, UltraFuser, consists of three distinct specialists that are already sufficiently trained on language, coding, and mathematics. A token-level gating mechanism is introduced to blend the specialists' outputs. A two-stage training strategy accompanied by balanced sampling is designed to ensure stability. To effectively train the fused model, we further construct a high-quality supervised instruction tuning dataset, UltraChat 2, which includes text, code, and mathematical content. This dataset comprises approximately 300,000 instructions and covers a wide range of topics in each domain. Experiments show that our model could simultaneously achieve mastery of the three crucial domains.

相關內容

ACM/IEEE第23屆模型驅動工程語言和系統國際會議,是模型驅動軟件和系統工程的首要會議系列,由ACM-SIGSOFT和IEEE-TCSE支持組織。自1998年以來,模型涵蓋了建模的各個方面,從語言和方法到工具和應用程序。模特的參加者來自不同的背景,包括研究人員、學者、工程師和工業專業人士。MODELS 2019是一個論壇,參與者可以圍繞建模和模型驅動的軟件和系統交流前沿研究成果和創新實踐經驗。今年的版本將為建模社區提供進一步推進建模基礎的機會,并在網絡物理系統、嵌入式系統、社會技術系統、云計算、大數據、機器學習、安全、開源等新興領域提出建模的創新應用以及可持續性。 官網鏈接: · Networking · 預測器/決策函數 · · 估計/估計量 ·
2024 年 4 月 25 日

Performance prediction has been a key part of the neural architecture search (NAS) process, allowing to speed up NAS algorithms by avoiding resource-consuming network training. Although many performance predictors correlate well with ground truth performance, they require training data in the form of trained networks. Recently, zero-cost proxies have been proposed as an efficient method to estimate network performance without any training. However, they are still poorly understood, exhibit biases with network properties, and their performance is limited. Inspired by the drawbacks of zero-cost proxies, we propose neural graph features (GRAF), simple to compute properties of architectural graphs. GRAF offers fast and interpretable performance prediction while outperforming zero-cost proxies and other common encodings. In combination with other zero-cost proxies, GRAF outperforms most existing performance predictors at a fraction of the cost.

We study the expectation propagation (EP) algorithm for symbol detection in massive multiple-input multiple-output (MIMO) systems. The EP detector shows excellent performance but suffers from a high computational complexity due to the matrix inversion, required in each EP iteration to perform marginal inference on a Gaussian system. We propose an inversion-free variant of the EP algorithm by treating inference on the mean and variance as two separate and simpler subtasks: We study the preconditioned conjugate gradient algorithm for obtaining the mean, which can significantly reduce the complexity and increase stability by relying on the Jacobi preconditioner that proves to fit the EP characteristics very well. For the variance, we use a simple approximation based on linear regression of the Gram channel matrix. Numerical studies on the Rayleigh-fading channel and on a realistic 3GPP channel model reveal the efficiency of the proposed scheme, which offers an attractive performance-complexity tradeoff and even outperforms the original EP detector in high multi-user inference cases where the matrix inversion becomes numerically unstable.

We introduce a constructive method applicable to a large number of description logics (DLs) for establishing the concept-based Beth definability property (CBP) based on sequent systems. Using the highly expressive DL RIQ as a case study, we introduce novel sequent calculi for RIQ-ontologies and show how certain interpolants can be computed from sequent calculus proofs, which permit the extraction of explicit definitions of implicitly definable concepts. To the best of our knowledge, this is the first sequent-based approach to computing interpolants and definitions within the context of DLs, as well as the first proof that RIQ enjoys the CBP. Moreover, due to the modularity of our sequent systems, our results hold for any restriction of RIQ, and are applicable to other DLs by suitable modifications.

In this paper we consider the filtering problem associated to partially observed McKean-Vlasov stochastic differential equations (SDEs). The model consists of data that are observed at regular and discrete times and the objective is to compute the conditional expectation of (functionals) of the solutions of the SDE at the current time. This problem, even the ordinary SDE case is challenging and requires numerical approximations. Based upon the ideas in [3, 12] we develop a new particle filter (PF) and multilevel particle filter (MLPF) to approximate the afore-mentioned expectations. We prove under assumptions that, for $\epsilon>0$, to obtain a mean square error of $\mathcal{O}(\epsilon^2)$ the PF has a cost per-observation time of $\mathcal{O}(\epsilon^{-5})$ and the MLPF costs $\mathcal{O}(\epsilon^{-4})$ (best case) or $\mathcal{O}(\epsilon^{-4}\log(\epsilon)^2)$ (worst case). Our theoretical results are supported by numerical experiments.

Decomposition has been the mainstream approach in classic mathematical programming for multi-objective optimization and multi-criterion decision-making. However, it was not properly studied in the context of evolutionary multi-objective optimization (EMO) until the development of multi-objective evolutionary algorithm based on decomposition (MOEA/D). In this two-part survey series, we use MOEA/D as the representative of decomposition-based EMO to review the up-to-date development in this area, and systematically and comprehensively analyze its research landscape. In the first part, we present a comprehensive survey of the development of MOEA/D from its origin to the current state-of-the-art approaches. In order to be self-contained, we start with a step-by-step tutorial that aims to help a novice quickly get onto the working mechanism of MOEA/D. Then, selected major developments of MOEA/D are reviewed according to its core design components including weight vector settings, subproblem formulations, selection mechanisms and reproduction operators. Besides, we also overview some selected advanced topics for constraint handling, optimization in dynamic and uncertain environments, computationally expensive objective functions, and preference incorporation. In the final part, we shed some light on emerging directions for future developments.

Conventional diffusion models typically relies on a fixed forward process, which implicitly defines complex marginal distributions over latent variables. This can often complicate the reverse process' task in learning generative trajectories, and results in costly inference for diffusion models. To address these limitations, we introduce Neural Flow Diffusion Models (NFDM), a novel framework that enhances diffusion models by supporting a broader range of forward processes beyond the fixed linear Gaussian. We also propose a novel parameterization technique for learning the forward process. Our framework provides an end-to-end, simulation-free optimization objective, effectively minimizing a variational upper bound on the negative log-likelihood. Experimental results demonstrate NFDM's strong performance, evidenced by state-of-the-art likelihood estimation. Furthermore, we investigate NFDM's capacity for learning generative dynamics with specific characteristics, such as deterministic straight lines trajectories. This exploration underscores NFDM's versatility and its potential for a wide range of applications.

We propose a theoretical framework to compute, rapidly and accurately, the signal-to-noise ratio at the output of spatial-division multiplexing (SDM) linear MIMO equalizers with arbitrary numbers of spatial modes and filter taps and demonstrate three orders of magnitude of speed-up compared to Monte Carlo simulations.

The existence of representative datasets is a prerequisite of many successful artificial intelligence and machine learning models. However, the subsequent application of these models often involves scenarios that are inadequately represented in the data used for training. The reasons for this are manifold and range from time and cost constraints to ethical considerations. As a consequence, the reliable use of these models, especially in safety-critical applications, is a huge challenge. Leveraging additional, already existing sources of knowledge is key to overcome the limitations of purely data-driven approaches, and eventually to increase the generalization capability of these models. Furthermore, predictions that conform with knowledge are crucial for making trustworthy and safe decisions even in underrepresented scenarios. This work provides an overview of existing techniques and methods in the literature that combine data-based models with existing knowledge. The identified approaches are structured according to the categories integration, extraction and conformity. Special attention is given to applications in the field of autonomous driving.

Recent contrastive representation learning methods rely on estimating mutual information (MI) between multiple views of an underlying context. E.g., we can derive multiple views of a given image by applying data augmentation, or we can split a sequence into views comprising the past and future of some step in the sequence. Contrastive lower bounds on MI are easy to optimize, but have a strong underestimation bias when estimating large amounts of MI. We propose decomposing the full MI estimation problem into a sum of smaller estimation problems by splitting one of the views into progressively more informed subviews and by applying the chain rule on MI between the decomposed views. This expression contains a sum of unconditional and conditional MI terms, each measuring modest chunks of the total MI, which facilitates approximation via contrastive bounds. To maximize the sum, we formulate a contrastive lower bound on the conditional MI which can be approximated efficiently. We refer to our general approach as Decomposed Estimation of Mutual Information (DEMI). We show that DEMI can capture a larger amount of MI than standard non-decomposed contrastive bounds in a synthetic setting, and learns better representations in a vision domain and for dialogue generation.

We introduce a multi-task setup of identifying and classifying entities, relations, and coreference clusters in scientific articles. We create SciERC, a dataset that includes annotations for all three tasks and develop a unified framework called Scientific Information Extractor (SciIE) for with shared span representations. The multi-task setup reduces cascading errors between tasks and leverages cross-sentence relations through coreference links. Experiments show that our multi-task model outperforms previous models in scientific information extraction without using any domain-specific features. We further show that the framework supports construction of a scientific knowledge graph, which we use to analyze information in scientific literature.

北京阿比特科技有限公司