Zhihang Yuan,Yuzhang Shang,Yang Zhou,Zhen Dong,Zhe Zhou,Chenhao Xue,Bingzhe Wu,Zhikai Li,Qingyi Gu,Yong Jae Lee,Yan Yan,Beidi Chen,Guangyu Sun,Kurt Keutzer

The field of efficient Large Language Model (LLM) inference is rapidly evolving, presenting a unique blend of opportunities and challenges. Although the field has expanded and is vibrant, there has not been a concise framework that analyzes the various methods of LLM inference to provide a clear understanding of this domain. Our survey stands out from traditional literature reviews by not only summarizing the current state of research but also introducing a framework based on the roofline model for systematic analysis of LLM inference techniques. This framework identifies the bottlenecks when deploying LLMs on hardware devices and provides a clear understanding of practical problems, such as why LLMs are memory-bound, how much memory and computation they need, and how to choose the right hardware. We systematically collate the latest advancements in efficient LLM inference, covering crucial areas such as model compression (e.g., Knowledge Distillation and Quantization), algorithm improvements (e.g., Early Exit and Mixture-of-Experts), and both hardware and system-level enhancements. We analyze these methods with the roofline model, which helps us understand their impact on memory access and computation. This approach not only showcases the current research landscape but also delivers valuable insights for practical implementation, positioning our work as a resource for researchers new to the field as well as for those seeking to deepen their understanding of efficient LLM deployment. The analysis tool, LLM-Viewer, is open-sourced.
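
The roofline reasoning mentioned in the abstract can be sketched in a few lines. This is a minimal illustration of the general roofline model, not the LLM-Viewer implementation: attainable throughput is the smaller of the compute peak and the memory bandwidth multiplied by arithmetic intensity (FLOPs per byte moved). The function names and the hardware figures (`PEAK_FLOPS`, `MEM_BANDWIDTH`) are hypothetical values chosen for illustration.

```python
def attainable_flops(arithmetic_intensity, peak_flops, mem_bandwidth):
    """Roofline bound: min(compute roof, bandwidth * intensity)."""
    return min(peak_flops, mem_bandwidth * arithmetic_intensity)


def is_memory_bound(arithmetic_intensity, peak_flops, mem_bandwidth):
    """A kernel is memory-bound when its intensity is below the ridge point."""
    ridge_point = peak_flops / mem_bandwidth  # FLOPs per byte
    return arithmetic_intensity < ridge_point


def matvec_intensity(rows, cols, bytes_per_weight=2):
    """Arithmetic intensity of y = W x, counting only the weight traffic.

    Assumption: reading W once dominates memory traffic, which is the
    typical situation when decoding one token at a time.
    """
    flops = 2 * rows * cols                  # one multiply and one add per weight
    bytes_moved = bytes_per_weight * rows * cols
    return flops / bytes_moved


# Hypothetical accelerator: 300 TFLOP/s peak, 2 TB/s memory bandwidth.
PEAK_FLOPS = 300e12
MEM_BANDWIDTH = 2e12

intensity = matvec_intensity(4096, 4096)     # 16-bit weights
print(intensity)
print(is_memory_bound(intensity, PEAK_FLOPS, MEM_BANDWIDTH))
print(attainable_flops(intensity, PEAK_FLOPS, MEM_BANDWIDTH))
```

Under these assumed numbers the matrix-vector product performs about one FLOP per byte, far below the ridge point of 150 FLOPs per byte, so it sits on the bandwidth-limited slope of the roofline.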

Related content

Large Language Models

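Large language models are deep learning models trained on massive amounts of text data. They can not only generate natural-language text but also understand its meaning in depth and handle a range of natural language tasks, such as text summarization, question answering, and translation. In 2023, large language models and their applications in artificial intelligence became a focus of technology research worldwide. Their growth in scale has been especially striking, with parameter counts rising from just over a billion at the outset to a trillion today. Larger parameter counts let a model capture the subtleties of human language more precisely and understand its complexity more deeply. Over the past year, large language models have improved markedly in absorbing new knowledge, decomposing complex tasks, and aligning images with text. As the technology matures, its range of applications will keep expanding, providing more intelligent and personalized services and further improving how people live and work.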

Moments · Linear · CASES · Orthogonal · SimPLe

April 23, 2024

Positive Moments Forever: Undecidable and Decidable Cases

Gemma De les Coves,Joshua Graf,Andreas Klingler,Tim Netzer

from arxiv, 17 pages

Is there an algorithm to determine attributes such as positivity or non-zeroness of linear recurrence sequences? This long-standing question is known as Skolem's problem. In this paper, we study the complexity of an equivalent problem, namely the (generalized) moment membership problem for matrices. We show that this problem is decidable for orthogonal, unitary and real eigenvalue matrices, and undecidable for matrices over certain commutative and non-commutative polynomial rings. Our results imply that the positivity problem for simple unitary linear recurrence sequences is decidable, and is undecidable for linear recurrence sequences over the ring of commutative polynomials. As a byproduct, we prove a free version of Polya's theorem.
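
To make the objects in this abstract concrete, here is a small sketch of a linear recurrence sequence generated through its companion matrix, together with a bounded positivity check. This is an illustration only and not the paper's method: checking finitely many terms can find a counterexample but can never prove that all terms are positive, which is the difficulty the abstract describes. All function names are hypothetical.

```python
def companion_matrix(coeffs):
    """Companion matrix of u_{n+k} = coeffs[0]*u_{n+k-1} + ... + coeffs[k-1]*u_n."""
    k = len(coeffs)
    rows = [list(coeffs)]
    for i in range(k - 1):
        # Shift rows: copy the i-th state entry one position down.
        rows.append([1 if j == i else 0 for j in range(k)])
    return rows


def mat_vec(matrix, vec):
    """Exact integer matrix-vector product."""
    return [sum(a * x for a, x in zip(row, vec)) for row in matrix]


def lrs_terms(coeffs, initial, count):
    """First `count` terms, given initial = [u_0, ..., u_{k-1}]."""
    matrix = companion_matrix(coeffs)
    state = list(reversed(initial))   # state holds [u_{n+k-1}, ..., u_n]
    terms = []
    for _ in range(count):
        terms.append(state[-1])       # the last entry is u_n
        state = mat_vec(matrix, state)
    return terms


def first_nonpositive_index(terms):
    """Index of the first term <= 0, or None if every listed term is positive."""
    for index, value in enumerate(terms):
        if value <= 0:
            return index
    return None


fibonacci = lrs_terms([1, 1], [0, 1], 10)
periodic = lrs_terms([1, -1], [1, 1], 8)   # u_{n+2} = u_{n+1} - u_n
print(fibonacci)
print(periodic, first_nonpositive_index(periodic))
```

A result of `None` from `first_nonpositive_index` only says that no counterexample appeared within the inspected prefix; it says nothing about later terms.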

Round · Diversity · 3D · AI · Prompt

April 22, 2024

Holodeck: Language Guided Generation of 3D Embodied AI Environments

Yue Yang,Fan-Yun Sun,Luca Weihs,Eli VanderBilt,Alvaro Herrasti,Winson Han,Jiajun Wu,Nick Haber,Ranjay Krishna,Lingjie Liu,Chris Callison-Burch,Mark Yatskar,Aniruddha Kembhavi,Christopher Clark

from arxiv, Published in CVPR 2024, 21 pages, 27 figures, 2 tables

3D simulated environments play a critical role in Embodied AI, but their creation requires expertise and extensive manual effort, restricting their diversity and scope. To mitigate this limitation, we present Holodeck, a system that fully automatically generates 3D environments to match a user-supplied prompt. Holodeck can generate diverse scenes, e.g., arcades, spas, and museums, adjust the designs for styles, and capture the semantics of complex queries such as "apartment for a researcher with a cat" and "office of a professor who is a fan of Star Wars". Holodeck leverages a large language model (i.e., GPT-4) for common sense knowledge about what the scene might look like and uses a large collection of 3D assets from Objaverse to populate the scene with diverse objects. To address the challenge of positioning objects correctly, we prompt GPT-4 to generate spatial relational constraints between objects and then optimize the layout to satisfy those constraints. Our large-scale human evaluation shows that annotators prefer Holodeck over manually designed procedural baselines in residential scenes and that Holodeck can produce high-quality outputs for diverse scene types. We also demonstrate an exciting application of Holodeck in Embodied AI, training agents to navigate in novel scenes like music rooms and daycares without human-constructed data, which is a significant step forward in developing general-purpose embodied agents.

Performer · Networking · Optimizer · Performance · Scaling

April 22, 2024

Cooperative ISAC Networks: Performance Analysis, Scaling Laws and Optimization

Kaitao Meng,Christos Masouros,Athina P. Petropulu,Lajos Hanzo

from arxiv, 13 pages, 10 figures, this work has been submitted to IEEE for possible publication. arXiv admin note: text overlap with arXiv:2403.20228

Integrated sensing and communication (ISAC) networks are investigated with the objective of effectively balancing the sensing and communication (S&C) performance at the network level. Through the simultaneous utilization of coordinated multi-point (CoMP) joint transmission and distributed multiple-input multiple-output (MIMO) radar techniques, we propose an innovative networked ISAC scheme, where multiple transceivers are employed for collaboratively enhancing the S&C services. Then, the potent tool of stochastic geometry is exploited for characterizing the S&C performance, which allows us to illuminate the key cooperative dependencies in the ISAC network and optimize salient network-level parameters. Remarkably, the Cramér-Rao lower bound (CRLB) expression derived for the localization accuracy unveils a significant finding: deploying N ISAC transceivers yields an enhanced average cooperative sensing performance across the entire network, in accordance with the ln^2 N scaling law. Crucially, this scaling law is less pronounced than the N^2 performance enhancement achieved when the transceivers are equidistant from the target, primarily because the substantial path loss from the distant base stations (BSs) reduces their contribution to the sensing performance gain. Moreover, we derive a tight expression of the communication rate, and present a low-complexity algorithm to determine the optimal cooperative cluster size. Based on the expressions derived for the S&C performance, we formulate the optimization problem of maximizing the network performance in terms of two joint S&C metrics. To this end, we jointly optimize the cooperative BS cluster sizes and the transmit power to strike a flexible tradeoff between the S&C performance.

Representation · INFORMS · Triangulation · HTTPS · Full

April 20, 2024

DMesh: A Differentiable Representation for General Meshes

Sanghyun Son,Matheus Gadelha,Yang Zhou,Zexiang Xu,Ming C. Lin,Yi Zhou

from arxiv, 17 pages, 9 figures

We present a differentiable representation, DMesh, for general 3D triangular meshes. DMesh considers both the geometry and connectivity information of a mesh. In our design, we first get a set of convex tetrahedra that compactly tessellate the domain based on Weighted Delaunay Triangulation (WDT), and formulate the probability that faces exist on our desired mesh in a differentiable manner based on the WDT. This enables DMesh to represent meshes of various topologies in a differentiable way, and allows us to reconstruct the mesh under various observations, such as point clouds and multi-view images, using gradient-based optimization. The source code and full paper are available at: //sonsang.github.io/dmesh-project.

Noise · FAST · Denoising · Image denoising · Mutually independent

April 18, 2024

Back to Basics: Fast Denoising Iterative Algorithm

Deborah Pereg

We introduce Back to Basics (BTB), a fast iterative algorithm for noise reduction. Our method is computationally efficient, does not require training or ground truth data, and can be applied in the presence of independent noise, as well as correlated (coherent) noise, where the noise level is unknown. We examine three case studies: natural image denoising in the presence of additive white Gaussian noise, Poisson-distributed image denoising, and speckle suppression in optical coherence tomography (OCT). Experimental results demonstrate that the proposed approach can effectively improve image quality in challenging noise settings. Theoretical guarantees are provided for convergence stability.

Learning · Processing (programming language) · Machine Learning · Critic · Vision

June 30, 2022

Causal Machine Learning: A Survey and Open Problems

Jean Kaddour,Aengus Lynch,Qi Liu,Matt J. Kusner,Ricardo Silva

Causal Machine Learning (CausalML) is an umbrella term for machine learning methods that formalize the data-generation process as a structural causal model (SCM). This allows one to reason about the effects of changes to this process (i.e., interventions) and what would have happened in hindsight (i.e., counterfactuals). We categorize work in CausalML into five groups according to the problems they tackle: (1) causal supervised learning, (2) causal generative modeling, (3) causal explanations, (4) causal fairness, (5) causal reinforcement learning. For each category, we systematically compare its methods and point out open problems. Further, we review modality-specific applications in computer vision, natural language processing, and graph representation learning. Finally, we provide an overview of causal benchmarks and a critical discussion of the state of this nascent field, including recommendations for future work.

Graph · Knowledge graph · Knowledge representation · Machine Learning · Processing (programming language)

December 31, 2021

What is Event Knowledge Graph: A Survey

Saiping Guan,Xueqi Cheng,Long Bai,Fujun Zhang,Zixuan Li,Yutao Zeng,Xiaolong Jin,Jiafeng Guo

Besides entity-centric knowledge, usually organized as a Knowledge Graph (KG), events are also an essential kind of knowledge in the world, which has prompted the emergence of event-centric knowledge representation forms such as the Event KG (EKG). EKG plays an increasingly important role in many machine learning and artificial intelligence applications, such as intelligent search, question answering, recommendation, and text generation. This paper provides a comprehensive survey of EKG from history, ontology, instance, and application views. Specifically, to characterize EKG thoroughly, we focus on its history, definitions, schema induction, acquisition, related representative graphs/systems, and applications. The development processes and trends are studied therein. We further summarize prospective directions to facilitate future research on EKG.

Learned · Deep learning · Identifiable · MoDELS · Object tracking

July 31, 2019

Deep Learning in Video Multi-Object Tracking: A Survey

Gioele Ciaparrone,Francisco Luque Sánchez,Siham Tabik,Luigi Troiano,Roberto Tagliaferri,Francisco Herrera

from arxiv, New in v2: corrected typos and various minor mistakes. Submitted to Neurocomputing. Main text: 25 pages, 5 figures, 6 tables. Summary table in appendix at the end of the paper

The problem of Multiple Object Tracking (MOT) consists in following the trajectory of different objects in a sequence, usually a video. In recent years, with the rise of Deep Learning, the algorithms that provide a solution to this problem have benefited from the representational power of deep models. This paper provides a comprehensive survey on works that employ Deep Learning models to solve the task of MOT on single-camera videos. Four main steps in MOT algorithms are identified, and an in-depth review of how Deep Learning was employed in each one of these stages is presented. A complete experimental comparison of the presented works on the three MOTChallenge datasets is also provided, identifying a number of similarities among the top-performing methods and presenting some possible future research directions.

Taxonomy · Machine Learning · IR · Directed · Recommender systems

May 13, 2018

Explainable Recommendation: A Survey and New Perspectives

Yongfeng Zhang,Xu Chen

from arxiv, 88 pages

Explainable Recommendation refers to the personalized recommendation algorithms that address the problem of why -- they not only provide the user with the recommendations, but also make the user aware why such items are recommended by generating recommendation explanations, which help to improve the effectiveness, efficiency, persuasiveness, and user satisfaction of recommender systems. In recent years, a large number of explainable recommendation approaches -- especially model-based explainable recommendation algorithms -- have been proposed and adopted in real-world systems. In this survey, we review the work on explainable recommendation that has been published in or before 2018. We first highlight the position of explainable recommendation in recommender system research by categorizing recommendation problems into the 5W, i.e., what, when, who, where, and why. We then conduct a comprehensive survey of explainable recommendation itself in terms of three aspects: 1) We provide a chronological research line of explanations in recommender systems, including the user study approaches in the early years, as well as the more recent model-based approaches. 2) We provide a taxonomy for explainable recommendation algorithms, including user-based, item-based, model-based, and post-model explanations. 3) We summarize the application of explainable recommendation in different recommendation tasks, including product recommendation, social recommendation, POI recommendation, etc. We devote a chapter to discussing the explanation perspectives in the broader IR and machine learning settings, as well as their relationship with explainable recommendation research. We end the survey by discussing potential future research directions to promote the explainable recommendation research area.

FPGA · Convolutional neural networks · Neural Networks · Convolution · Layer

September 30, 2016

Caffeinated FPGAs: FPGA Framework For Convolutional Neural Networks

Roberto DiCecco,Griffin Lacey,Jasmina Vasiljevic,Paul Chow,Graham Taylor,Shawki Areibi

Convolutional Neural Networks (CNNs) have gained significant traction in the field of machine learning, particularly due to their high accuracy in visual recognition. Recent works have pushed the performance of GPU implementations of CNNs to significantly improve their classification and training times. With these improvements, many frameworks have become available for implementing CNNs on both CPUs and GPUs, with no support for FPGA implementations. In this work we present a modified version of the popular CNN framework Caffe, with FPGA support. This allows for classification using CNN models and specialized FPGA implementations with the flexibility of reprogramming the device when necessary, seamless memory transactions between host and device, simple-to-use test benches, and the ability to create pipelined layer implementations. To validate the framework, we use the Xilinx SDAccel environment to implement an FPGA-based Winograd convolution engine and show that the FPGA layer can be used alongside other layers running on a host processor to run several popular CNNs (AlexNet, GoogleNet, VGG A, Overfeat). The results show that our framework achieves 50 GFLOPS across 3x3 convolutions in the benchmarks. This is achieved within a practical framework, which will aid in future development of FPGA-based CNNs.
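
For readers unfamiliar with Winograd convolution, the idea can be sketched with the standard one-dimensional minimal filtering algorithm F(2,3), which produces two outputs of a 3-tap filter using four multiplications instead of six. This is an illustrative sketch only: it is not the authors' FPGA engine, and the 2D tiling used for 3x3 convolutions is not shown. The function names are hypothetical.

```python
def direct_filter(d, g):
    """Reference: two outputs of a 3-tap filter over 4 inputs (6 multiplies)."""
    return [
        d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
        d[1] * g[0] + d[2] * g[1] + d[3] * g[2],
    ]


def winograd_f23(d, g):
    """Winograd F(2,3): the same two outputs with 4 data multiplies.

    The filter-side terms (g[0] + g[1] + g[2]) / 2 and (g[0] - g[1] + g[2]) / 2
    depend only on the weights, so they can be precomputed once per filter.
    """
    g_plus = (g[0] + g[1] + g[2]) / 2
    g_minus = (g[0] - g[1] + g[2]) / 2

    m1 = (d[0] - d[2]) * g[0]
    m2 = (d[1] + d[2]) * g_plus
    m3 = (d[2] - d[1]) * g_minus
    m4 = (d[1] - d[3]) * g[2]

    return [m1 + m2 + m3, m2 - m3 - m4]


data = [1.0, 2.0, 3.0, 4.0]
weights = [1.0, 2.0, 3.0]
print(direct_filter(data, weights))
print(winograd_f23(data, weights))
```

The saving comes from trading multiplications for additions, which is why the approach suits small fixed filter sizes such as the 3x3 convolutions mentioned in the abstract.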