亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

We have observed a distinctive quantization-related behavior in the LLaMA3/3.1-70B models that is absent in both the LLaMA2-70B and LLaMA3/3.1/3.2-1B/3B/8B/405B models. Quantization is a crucial technique for deploying large language models (LLMs) efficiently. The impact of W8A8 post-training quantization on model accuracy, especially on the recently released LLaMA3/3.1 model series, remains contentious. In this paper, we explore three key questions: What makes the LLaMA3-70B model series uniquely vulnerable to quantization? Why is this the case? And how can the issue be addressed? We empirically investigate multiple LLMs featured on an open LLM leaderboard, discovering that the LLaMA3-70B model series have a unique accuracy degradation behavior with W8A8 per-channel post-training quantization. In contrast, other model series such as LLaMA2, LLaMA3/3.1-8B, LLaMA3.2, Qwen, Mixtral, Mistral, Phi-3, and Falcon demonstrate robust performance with W8A8. Contrary to previous assertions attributing degradation to the large dynamic range of activations, our findings indicate that the weight distribution of the LLaMA3-70B is the primary factor behind the vulnerability. By meticulously analyzing the distinct characteristics of weight distributions across Transformer blocks, we propose two solutions that make different tradeoffs in hardware/software overhead. First, we propose a mixed strategy where less than 3\% of the layers employ finer per-group W8A8 quantization granularity. Second, we introduce a bi-smoothing strategy that balances quantization errors between weights and activations while maintaining per-channel quantization throughout. Experimental results demonstrate that both strategies effectively preserve the accuracy of the entire LLaMA3-70B model series under W8A8 quantization, achieving performance on par with their FP16 counterparts.

相關內容

ACM/IEEE第23屆模型驅動工程語言和系統國際會議,是模型驅動軟件和系統工程的首要會議系列,由ACM-SIGSOFT和IEEE-TCSE支持組織。自1998年以來,模型涵蓋了建模的各個方面,從語言和方法到工具和應用程序。模特的參加者來自不同的背景,包括研究人員、學者、工程師和工業專業人士。MODELS 2019是一個論壇,參與者可以圍繞建模和模型驅動的軟件和系統交流前沿研究成果和創新實踐經驗。今年的版本將為建模社區提供進一步推進建模基礎的機會,并在網絡物理系統、嵌入式系統、社會技術系統、云計算、大數據、機器學習、安全、開源等新興領域提出建模的創新應用以及可持續性。 官網鏈接: · 表示 · MoDELS · 稀疏編碼 · 稀疏 ·
2024 年 11 月 8 日

Representing and exploiting multivariate signals require capturing complex relations between variables. We define a novel Graph-Dictionary signal model, where a finite set of graphs characterizes relationships in data distribution through a weighted sum of their Laplacians. We propose a framework to infer the graph dictionary representation from observed data, along with a bilinear generalization of the primal-dual splitting algorithm to solve the learning problem. Our new formulation allows to include a priori knowledge on signal properties, as well as on underlying graphs and their coefficients. We show the capability of our method to reconstruct graphs from signals in multiple synthetic settings, where our model outperforms previous baselines. Then, we exploit graph-dictionary representations in a motor imagery decoding task on brain activity data, where we classify imagined motion better than standard methods relying on many more features.

Sensing-assisted communication is critical to enhance the system efficiency in integrated sensing and communication (ISAC) systems. However, most existing literature focuses on large-scale channel sensing, without considering the impacts of small-scale channel aging. In this paper, we investigate a dual-scale channel estimation framework for sensing-assisted communication, where both large-scale channel sensing and small-scale channel aging are considered. By modeling the channel aging effect with block fading and incorporating CRB (Cram\'er-Rao bound)-based sensing errors, we optimize both the time duration of large-scale detection and the frequency of small-scale update within each subframe to maximize the achievable rate while satisfying sensing requirements. Since the formulated optimization problem is non-convex, we propose a two-dimensional search-based optimization algorithm to obtain the optimal solution. Simulation results demonstrate the superiority of our proposed optimal design over three counterparts.

Simplicity bias, the propensity of deep models to over-rely on simple features, has been identified as a potential reason for limited out-of-distribution generalization of neural networks (Shah et al., 2020). Despite the important implications, this phenomenon has been theoretically confirmed and characterized only under strong dataset assumptions, such as linear separability (Lyu et al., 2021). In this work, we characterize simplicity bias for general datasets in the context of two-layer neural networks initialized with small weights and trained with gradient flow. Specifically, we prove that in the early training phases, network features cluster around a few directions that do not depend on the size of the hidden layer. Furthermore, for datasets with an XOR-like pattern, we precisely identify the learned features and demonstrate that simplicity bias intensifies during later training stages. These results indicate that features learned in the middle stages of training may be more useful for OOD transfer. We support this hypothesis with experiments on image data.

Evolutionary Multi-Objective Optimization Algorithms (EMOAs) are widely employed to tackle problems with multiple conflicting objectives. Recent research indicates that not all objectives are equally important to the decision-maker (DM). In the context of interactive EMOAs, preference information elicited from the DM during the optimization process can be leveraged to identify and discard irrelevant objectives, a crucial step when objective evaluations are computationally expensive. However, much of the existing literature fails to account for the dynamic nature of DM preferences, which can evolve throughout the decision-making process and affect the relevance of objectives. This study addresses this limitation by simulating dynamic shifts in DM preferences within a ranking-based interactive algorithm. Additionally, we propose methods to discard outdated or conflicting preferences when such shifts occur. Building on prior research, we also introduce a mechanism to safeguard relevant objectives that may become trapped in local or global optima due to the diminished correlation with the DM-provided rankings. Our experimental results demonstrate that the proposed methods effectively manage evolving preferences and significantly enhance the quality and desirability of the solutions produced by the algorithm.

Recent advances in large language models (LLMs) show the potential of using LLMs as evaluators for assessing the quality of text generations from LLMs. However, applying LLM evaluators naively to compare or judge between different systems can lead to unreliable results due to the intrinsic win rate estimation bias of LLM evaluators. In order to mitigate this problem, we propose two calibration methods, Bayesian Win Rate Sampling (BWRS) and Bayesian Dawid-Skene, both of which leverage Bayesian inference to more accurately infer the true win rate of generative language models. We empirically validate our methods on six datasets covering story generation, summarization, and instruction following tasks. We show that both our methods are effective in improving the accuracy of win rate estimation using LLMs as evaluators, offering a promising direction for reliable automatic text quality evaluation.

We present a computational formulation for the approximate version of several variational inequality problems, investigating their computational complexity and establishing PPAD-completeness. Examining applications in computational game theory, we specifically focus on two key concepts: resilient Nash equilibrium, and multi-leader-follower games -- domains traditionally known for the absence of general solutions. In the presence of standard assumptions and relaxation techniques, we formulate problem versions for such games that are expressible in terms of variational inequalities, ultimately leading to proofs of PPAD-completeness.

Implicit Neural Representations (INRs) have emerged as a paradigm in knowledge representation, offering exceptional flexibility and performance across a diverse range of applications. INRs leverage multilayer perceptrons (MLPs) to model data as continuous implicit functions, providing critical advantages such as resolution independence, memory efficiency, and generalisation beyond discretised data structures. Their ability to solve complex inverse problems makes them particularly effective for tasks including audio reconstruction, image representation, 3D object reconstruction, and high-dimensional data synthesis. This survey provides a comprehensive review of state-of-the-art INR methods, introducing a clear taxonomy that categorises them into four key areas: activation functions, position encoding, combined strategies, and network structure optimisation. We rigorously analyse their critical properties, such as full differentiability, smoothness, compactness, and adaptability to varying resolutions while also examining their strengths and limitations in addressing locality biases and capturing fine details. Our experimental comparison offers new insights into the trade-offs between different approaches, showcasing the capabilities and challenges of the latest INR techniques across various tasks. In addition to identifying areas where current methods excel, we highlight key limitations and potential avenues for improvement, such as developing more expressive activation functions, enhancing positional encoding mechanisms, and improving scalability for complex, high-dimensional data. This survey serves as a roadmap for researchers, offering practical guidance for future exploration in the field of INRs. We aim to foster new methodologies by outlining promising research directions for INRs and applications.

Recent artificial intelligence (AI) systems have reached milestones in "grand challenges" ranging from Go to protein-folding. The capability to retrieve medical knowledge, reason over it, and answer medical questions comparably to physicians has long been viewed as one such grand challenge. Large language models (LLMs) have catalyzed significant progress in medical question answering; Med-PaLM was the first model to exceed a "passing" score in US Medical Licensing Examination (USMLE) style questions with a score of 67.2% on the MedQA dataset. However, this and other prior work suggested significant room for improvement, especially when models' answers were compared to clinicians' answers. Here we present Med-PaLM 2, which bridges these gaps by leveraging a combination of base LLM improvements (PaLM 2), medical domain finetuning, and prompting strategies including a novel ensemble refinement approach. Med-PaLM 2 scored up to 86.5% on the MedQA dataset, improving upon Med-PaLM by over 19% and setting a new state-of-the-art. We also observed performance approaching or exceeding state-of-the-art across MedMCQA, PubMedQA, and MMLU clinical topics datasets. We performed detailed human evaluations on long-form questions along multiple axes relevant to clinical applications. In pairwise comparative ranking of 1066 consumer medical questions, physicians preferred Med-PaLM 2 answers to those produced by physicians on eight of nine axes pertaining to clinical utility (p < 0.001). We also observed significant improvements compared to Med-PaLM on every evaluation axis (p < 0.001) on newly introduced datasets of 240 long-form "adversarial" questions to probe LLM limitations. While further studies are necessary to validate the efficacy of these models in real-world settings, these results highlight rapid progress towards physician-level performance in medical question answering.

Pre-trained Language Models (PLMs) which are trained on large text corpus via self-supervised learning method, have yielded promising performance on various tasks in Natural Language Processing (NLP). However, though PLMs with huge parameters can effectively possess rich knowledge learned from massive training text and benefit downstream tasks at the fine-tuning stage, they still have some limitations such as poor reasoning ability due to the lack of external knowledge. Research has been dedicated to incorporating knowledge into PLMs to tackle these issues. In this paper, we present a comprehensive review of Knowledge-Enhanced Pre-trained Language Models (KE-PLMs) to provide a clear insight into this thriving field. We introduce appropriate taxonomies respectively for Natural Language Understanding (NLU) and Natural Language Generation (NLG) to highlight these two main tasks of NLP. For NLU, we divide the types of knowledge into four categories: linguistic knowledge, text knowledge, knowledge graph (KG), and rule knowledge. The KE-PLMs for NLG are categorized into KG-based and retrieval-based methods. Finally, we point out some promising future directions of KE-PLMs.

Visual Question Answering (VQA) models have struggled with counting objects in natural images so far. We identify a fundamental problem due to soft attention in these models as a cause. To circumvent this problem, we propose a neural network component that allows robust counting from object proposals. Experiments on a toy task show the effectiveness of this component and we obtain state-of-the-art accuracy on the number category of the VQA v2 dataset without negatively affecting other categories, even outperforming ensemble models with our single model. On a difficult balanced pair metric, the component gives a substantial improvement in counting over a strong baseline by 6.6%.

北京阿比特科技有限公司