Running multiple deep neural networks (DNNs) in parallel has become an emerging workload both on edge devices, such as mobile phones where multiple tasks serve a single user's daily activities, and in data centers, where requests arrive from millions of users, as with large language models. To reduce the costly computational and memory requirements of these workloads, various efficient sparsification approaches have been introduced, resulting in widespread sparsity across different types of DNN models. In this context, there is an emerging need for scheduling sparse multi-DNN workloads, a problem that is largely unexplored in previous literature. This paper systematically analyses the use-cases of multiple sparse DNNs and investigates the opportunities for optimization. Based on these findings, we propose Dysta, a novel bi-level dynamic and static scheduler that utilizes both static sparsity patterns and dynamic sparsity information for sparse multi-DNN scheduling. The static and dynamic components of Dysta are jointly designed, at the software and hardware levels respectively, to improve and refine the scheduling decisions. To facilitate future progress in the study of this class of workloads, we construct a public benchmark that contains sparse multi-DNN workloads across different deployment scenarios, spanning from mobile phones and AR/VR wearables to data centers. A comprehensive evaluation on the sparse multi-DNN benchmark demonstrates that our proposed approach outperforms state-of-the-art methods with up to a 10% decrease in latency constraint violation rate and a nearly 4X reduction in average normalized turnaround time. Our artifacts and code are publicly available at: //github.com/SamsungLabs/Sparse-Multi-DNN-Scheduling.
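To make the bi-level idea concrete, here is a minimal Python sketch, assuming a deadline-driven serving queue: a static latency estimate derived from an offline weight-sparsity profile is refined at runtime with an observed activation-sparsity measurement, and queued inferences are dispatched by the resulting slack. The Task class, refine_latency, and the 0.3 sparsity reading are illustrative placeholders, not Dysta's actual interfaces.

```python
# Minimal sketch of a bi-level (static + dynamic) sparsity-aware scheduler.
# All names (Task, refine_latency, the 0.3 sparsity measurement) are hypothetical.
from dataclasses import dataclass, field
import heapq, time

@dataclass(order=True)
class Task:
    priority: float                                   # only field used for ordering
    name: str = field(compare=False)
    deadline: float = field(compare=False)
    static_latency: float = field(compare=False)      # offline estimate from weight-sparsity profile

def refine_latency(static_latency, observed_activation_sparsity):
    # Dynamic refinement: higher runtime activation sparsity -> fewer effective MACs.
    return static_latency * (1.0 - observed_activation_sparsity)

def schedule(tasks, now):
    # Priority = predicted slack (deadline - now - refined latency); smaller slack runs first.
    queue = []
    for t in tasks:
        refined = refine_latency(t.static_latency, observed_activation_sparsity=0.3)  # placeholder measurement
        slack = t.deadline - now - refined
        heapq.heappush(queue, Task(priority=slack, name=t.name,
                                   deadline=t.deadline, static_latency=t.static_latency))
    return [heapq.heappop(queue).name for _ in range(len(queue))]

if __name__ == "__main__":
    now = time.time()
    tasks = [Task(0.0, "resnet50", now + 0.050, 0.030),   # priority placeholder; recomputed in schedule()
             Task(0.0, "bert-base", now + 0.020, 0.015)]
    print(schedule(tasks, now))
```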
Liesel is a new probabilistic programming framework developed with the aim of supporting research on Bayesian inference based on Markov chain Monte Carlo (MCMC) simulations in general and semi-parametric regression specifications in particular. Its three main components are (i) an R interface (RLiesel) for the configuration of an initial semi-parametric regression model, (ii) a graph-based model building library, where the initial model graph can be manipulated to incorporate new research ideas, and (iii) an MCMC library for designing modular inference algorithms combining multiple types of well-tested and possibly customized MCMC kernels. The graph builder as well as the MCMC library are implemented in Python, relying on JAX as a numerical computing library, and can therefore benefit from the latest machine learning technology such as automatic differentiation, just-in-time (JIT) compilation, and the use of high-performance computing devices such as tensor processing units (TPUs). Liesel provides all required tools for efficient and reliable statistical research on complex models and estimation algorithms. Its modular design allows users to expand the model library and inference algorithms, offering the flexibility and customization options to tailor the software to any specific research needs.
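As an illustration of the style of kernel such a JAX-based MCMC library composes, the following is a generic, jit-compiled random-walk Metropolis step. It is a standalone sketch using only JAX primitives and does not reproduce Liesel's actual API.

```python
# Generic JAX sketch of a modular MCMC kernel (random-walk Metropolis).
# NOT Liesel's API; it only illustrates jit-compiled, composable kernel design.
import jax
import jax.numpy as jnp

def log_prob(theta):
    # Example target: standard normal log-density (up to an additive constant).
    return -0.5 * jnp.sum(theta ** 2)

@jax.jit
def rw_metropolis_step(key, theta, step_size=0.5):
    key_prop, key_acc = jax.random.split(key)
    proposal = theta + step_size * jax.random.normal(key_prop, theta.shape)
    log_alpha = log_prob(proposal) - log_prob(theta)
    accept = jnp.log(jax.random.uniform(key_acc)) < log_alpha
    return jnp.where(accept, proposal, theta)

key = jax.random.PRNGKey(0)
theta = jnp.zeros(3)
for _ in range(1000):
    key, subkey = jax.random.split(key)
    theta = rw_metropolis_step(subkey, theta)
print(theta)
```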
The surge in real-time data collection across various industries has underscored the need for advanced anomaly detection in both univariate and multivariate time series data. Traditional methods, while comprehensive, often struggle to capture the complex interdependencies in such data. This paper introduces TransNAS-TSAD, a framework that combines the transformer architecture with neural architecture search (NAS), optimized with the NSGA-II algorithm. This approach tackles the complexities of both univariate and multivariate time series, balancing computational efficiency with detection accuracy. Our evaluation shows that TransNAS-TSAD surpasses conventional anomaly detection models, with marked improvements across diverse data scenarios. We also propose the Efficiency-Accuracy-Complexity Score (EACS) as a new metric for assessing model performance, emphasizing the crucial balance between accuracy and computational resources. TransNAS-TSAD sets a new benchmark in time series anomaly detection, offering a versatile and efficient solution for complex real-world applications. This research paves the way for future developments in the field, highlighting its potential in a wide range of industry applications.
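To illustrate the selection criterion at the heart of NSGA-II, the sketch below computes the Pareto (non-dominated) front over hypothetical candidate architectures scored on two objectives to be minimized, detection error and parameter count. The candidate names and scores are invented, and a full NSGA-II search additionally uses crowding distance and evolutionary operators.

```python
# Minimal sketch of Pareto (non-dominated) selection, the core of NSGA-II,
# over hypothetical candidate architectures. Not the paper's implementation.
def dominates(a, b):
    # a dominates b if a is no worse in every objective and strictly better in at least one.
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(candidates):
    # candidates: dict name -> (detection_error, num_params), both to be minimized.
    front = []
    for name, score in candidates.items():
        if not any(dominates(other, score)
                   for other_name, other in candidates.items() if other_name != name):
            front.append(name)
    return front

candidates = {
    "arch_A": (0.08, 1.2e6),   # low error, small model
    "arch_B": (0.05, 9.0e6),   # lower error, large model
    "arch_C": (0.09, 3.0e6),   # dominated by arch_A
}
print(pareto_front(candidates))  # ['arch_A', 'arch_B']
```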
We present ReLU-QP, a GPU-accelerated solver for quadratic programs (QPs) that is capable of solving high-dimensional control problems at real-time rates. ReLU-QP is derived by exactly reformulating the Alternating Direction Method of Multipliers (ADMM) algorithm for solving QPs as a deep, weight-tied neural network with rectified linear unit (ReLU) activations. This reformulation enables the deployment of ReLU-QP on GPUs using standard machine-learning toolboxes. We evaluate the performance of ReLU-QP across three model-predictive control (MPC) benchmarks: stabilizing random linear dynamical systems with control limits, balancing an Atlas humanoid robot on a single foot, and tracking whole-body reference trajectories on a quadruped equipped with a six-degree-of-freedom arm. These benchmarks indicate that ReLU-QP is competitive with state-of-the-art CPU-based solvers for small-to-medium-scale problems and offers order-of-magnitude speed improvements for larger-scale problems.
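The reformulation can be sketched as follows: a simplified OSQP-style ADMM iteration for a box-constrained QP in which the linear-system solve is prefactored into a single matrix reused at every iteration (the weight-tied affine layer) and the constraint projection is a clamp expressible with ReLUs. This is a hedged illustration of the idea, with the relaxation step omitted; it is not the released ReLU-QP implementation.

```python
# Sketch of ADMM for  min 1/2 x'Px + q'x  s.t.  l <= Ax <= u,
# written as a fixed ("weight-tied") affine map plus a clamp, so every iteration
# is a GPU-friendly layer. Simplified; not the released ReLU-QP code.
import torch

def admm_qp(P, q, A, l, u, rho=1.0, sigma=1e-6, iters=500):
    n = P.shape[0]
    m = A.shape[0]
    Kinv = torch.linalg.inv(P + sigma * torch.eye(n) + rho * A.T @ A)  # shared "weight" of every layer
    x, z, y = torch.zeros(n), torch.zeros(m), torch.zeros(m)
    for _ in range(iters):
        x = Kinv @ (sigma * x - q + A.T @ (rho * z - y))   # affine layer (prefactored solve)
        Ax = A @ x
        z = torch.clamp(Ax + y / rho, l, u)                # ReLU-style projection onto [l, u]
        y = y + rho * (Ax - z)                             # dual update (also affine)
    return x

# Tiny example: minimize (x0 - 1)^2 + (x1 + 2)^2 subject to -1 <= x <= 1 elementwise.
P = 2 * torch.eye(2)
q = torch.tensor([-2.0, 4.0])
A = torch.eye(2)
l, u = -torch.ones(2), torch.ones(2)
print(admm_qp(P, q, A, l, u))  # approximately [1.0, -1.0]
```

Because the iteration is just repeated applications of the same matrices and a clamp, it maps directly onto batched GPU kernels in standard machine-learning toolboxes.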
Contemporary connected vehicles host numerous applications, such as diagnostics and navigation, and new software is continuously being developed. However, the development process typically requires offline batch processing of large data volumes. With an edge computing approach, data analysts and developers can instead process sensor data directly on computational resources inside vehicles. This enables rapid prototyping to shorten development cycles and reduce the time to create new business value or insights. This paper presents the design, implementation, and operation of the AutoSPADA edge computing platform for distributed data analytics. The platform's design follows principles of scalability, reliability, resource efficiency, privacy, and security, realized through mature and industrially proven technologies. In AutoSPADA, computational tasks are general Python scripts, and we provide a library to, for example, read signals from the vehicle and publish results to the cloud. Hence, users only need Python knowledge to use the platform. Moreover, the platform is designed to be extensible to additional programming languages.
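As a rough illustration of what an on-vehicle task might look like, the sketch below reads a windowed vehicle signal, computes a local summary, and publishes only the aggregate. The read_signal and publish functions are hypothetical placeholders standing in for the platform library; the abstract does not specify the actual API.

```python
# Hypothetical sketch of an on-vehicle analytics task.
# `read_signal` and `publish` are illustrative placeholders, not AutoSPADA's documented API.

def task(read_signal, publish):
    # Read a windowed vehicle signal on the edge device, compute a summary locally,
    # and publish only the aggregate to the cloud (raw data stays on the vehicle).
    speeds = read_signal("vehicle.speed", window_seconds=60)
    summary = {
        "mean_speed": sum(speeds) / max(len(speeds), 1),
        "max_speed": max(speeds, default=0.0),
    }
    publish("speed_summary", summary)

if __name__ == "__main__":
    # Mock the platform calls so the sketch runs standalone.
    task(read_signal=lambda name, window_seconds: [50.0, 55.0, 60.0],
         publish=lambda topic, payload: print(topic, payload))
```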
The rapid evolution of network technologies and the growing complexity of network tasks necessitate a paradigm shift in how networks are designed, configured, and managed. With their wealth of knowledge and expertise, large language models (LLMs) are among the most promising candidates for driving this shift. This paper aims to pave the way for constructing domain-adapted LLMs for networking. First, we present potential LLM applications for vertical network fields and showcase the mapping from natural language to network language. Then, several enabling technologies are investigated, including parameter-efficient finetuning and prompt engineering. The insight is that both language understanding and tool usage are required for network LLMs. Driven by the idea of embodied intelligence, we propose ChatNet, a domain-adapted network LLM framework with access to various external network tools. ChatNet can significantly reduce the time required for burdensome network planning tasks, leading to a substantial improvement in efficiency. Finally, key challenges and future research directions are highlighted.
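A minimal sketch of the natural-language-to-network-language pattern follows, assuming the LLM is prompted to emit a structured tool call that is then dispatched to an external network tool. The JSON schema and the capacity_planner tool are invented for illustration and are not ChatNet's actual interfaces.

```python
# Hedged sketch of LLM tool dispatch for networking tasks.
# The tool registry and the JSON "network language" schema are hypothetical.
import json

TOOLS = {
    # Hypothetical external network tool the LLM is allowed to invoke.
    "capacity_planner": lambda args: (
        f"planned {args['bandwidth_gbps']} Gbps between {args['src']} and {args['dst']}"
    ),
}

def dispatch(llm_output: str) -> str:
    # Expect the finetuned/prompted LLM to emit a structured tool call in "network language".
    call = json.loads(llm_output)
    return TOOLS[call["tool"]](call["args"])

llm_output = '{"tool": "capacity_planner", "args": {"src": "PoP-A", "dst": "PoP-B", "bandwidth_gbps": 100}}'
print(dispatch(llm_output))
```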
Zero-touch networks are anticipated to usher in a generation of intelligent and highly flexible resource provisioning strategies in which multiple service providers collaboratively offer computation and storage resources. This transformation presents substantial challenges to network administration and service providers regarding sustainability and scalability. This article combines Distributed Artificial Intelligence (DAI) with Zero-touch Provisioning (ZTP) for edge networks, a combination that helps manage network devices seamlessly and intelligently while minimizing human intervention. In addition, we highlight several advantages of incorporating Distributed AI into ZTP in the context of edge networks. Finally, we outline potential research directions to foster novel studies in this field and overcome the current limitations.
Attack paths are the potential chains of malicious activities an attacker performs to compromise network assets and acquire privileges by exploiting network vulnerabilities. Attack path analysis helps organizations identify new or unknown chains of attack vectors that reach critical assets within the network, as opposed to the individual attack vectors of signature-based attack analysis. Timely identification of attack paths enables proactive mitigation of threats. Nevertheless, manual analysis of complex network configurations, vulnerabilities, and security events to identify attack paths is rarely feasible. This work proposes a novel transferable graph neural network-based model for shortest path identification. The proposed shortest path detection approach, integrated with a novel holistic and comprehensive model for identifying potential interactions among network vulnerabilities, is then utilized to detect network attack paths. Our framework automates the risk assessment of attack paths, indicating the propensity of each path to enable the compromise of highly critical assets (e.g., databases) given the network configuration, the assets' criticality, and the severity of the vulnerabilities on the path to the asset. The proposed framework, named SPGNN-API, incorporates automated threat mitigation through proactive, timely tuning of network firewall rules and zero-trust policies to break critical attack paths and bolster cyber defenses. Our evaluation is twofold: we evaluate the performance of shortest path identification and assess the accuracy of attack path detection. Our results show that SPGNN-API largely outperforms the baseline model for shortest path identification, with an average accuracy of at least 95%, and successfully detects 100% of the potentially compromised assets, outperforming the attack graph baseline by 47%.
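For intuition on GNN-based shortest-path identification, the sketch below expresses shortest-path computation as iterative min-aggregation message passing (the Bellman-Ford update that such models are commonly trained to imitate). It is an illustrative classical baseline, not the SPGNN-API architecture.

```python
# Illustrative sketch (not the SPGNN-API model): shortest paths via iterative
# min-aggregation message passing over the weighted adjacency matrix.
import numpy as np

def min_aggregation_passing(weights, source, iters=None):
    # weights[i, j] = edge weight from i to j, np.inf if there is no edge.
    n = weights.shape[0]
    dist = np.full(n, np.inf)
    dist[source] = 0.0
    for _ in range(iters or n - 1):
        # Each node's "message" is its current distance plus the incoming edge weight;
        # node states are updated with a min aggregation (Bellman-Ford update).
        dist = np.minimum(dist, np.min(dist[:, None] + weights, axis=0))
    return dist

inf = np.inf
weights = np.array([[inf, 1.0, 4.0],
                    [inf, inf, 2.0],
                    [inf, inf, inf]])
print(min_aggregation_passing(weights, source=0))  # [0., 1., 3.]
```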
The rapid development of federated learning (FL) has benefited various tasks in the domains of computer vision and natural language processing, and existing frameworks such as TFF and FATE have made deployment easy in real-world applications. However, federated graph learning (FGL), even though graph data are prevalent, has not been well supported due to its unique characteristics and requirements. The lack of an FGL-related framework increases the effort required for reproducible research and real-world deployment. Motivated by such strong demand, in this paper we first discuss the challenges in creating an easy-to-use FGL package and accordingly present our implemented package FederatedScope-GNN (FS-G), which provides (1) a unified view for modularizing and expressing FGL algorithms; (2) a comprehensive DataZoo and ModelZoo for out-of-the-box FGL capability; (3) an efficient model auto-tuning component; and (4) off-the-shelf privacy attack and defense abilities. We validate the effectiveness of FS-G by conducting extensive experiments, which also yield many valuable insights about FGL for the community. Moreover, we employ FS-G to serve FGL applications in real-world e-commerce scenarios, where the attained improvements indicate great potential business benefits. We publicly release FS-G, as submodules of FederatedScope, at //github.com/alibaba/FederatedScope to promote FGL research and enable broad applications that would otherwise be infeasible due to the lack of a dedicated package.
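For readers unfamiliar with the federated setting, the following is a generic NumPy sketch of the weighted parameter-averaging (FedAvg-style) step that underlies many FGL algorithms. It illustrates the concept only and does not use FS-G's actual DataZoo, ModelZoo, or trainer APIs.

```python
# Generic sketch of federated aggregation (FedAvg) over GNN parameters.
# Conceptual illustration only; not FederatedScope-GNN's API.
import numpy as np

def fedavg(client_weights, client_sizes):
    # client_weights: list of dicts mapping parameter name -> ndarray (one dict per client).
    # client_sizes: number of local graph samples per client, used as aggregation weights.
    total = float(sum(client_sizes))
    keys = client_weights[0].keys()
    return {k: sum(w[k] * (n / total) for w, n in zip(client_weights, client_sizes))
            for k in keys}

clients = [
    {"gcn_layer1.weight": np.ones((4, 4)) * 1.0},
    {"gcn_layer1.weight": np.ones((4, 4)) * 3.0},
]
global_model = fedavg(clients, client_sizes=[100, 300])
print(global_model["gcn_layer1.weight"][0, 0])  # 2.5 = 1*0.25 + 3*0.75
```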
Autonomic computing investigates how systems can achieve (user-)specified control outcomes on their own, without the intervention of a human operator. Autonomic computing fundamentals have been substantially influenced by those of control theory for closed- and open-loop systems. In practice, complex systems may exhibit a number of concurrent and inter-dependent control loops. Despite research into autonomic models for managing computer resources, ranging from individual resources (e.g., web servers) to resource ensembles (e.g., multiple resources within a data center), integrating Artificial Intelligence (AI) and Machine Learning (ML) to improve resource autonomy and performance at scale remains a fundamental challenge. The integration of AI/ML to achieve such autonomic and self-managing systems can occur at different levels of granularity, from full to human-in-the-loop automation. In this article, leading academics, researchers, practitioners, engineers, and scientists in the fields of cloud computing, AI/ML, and quantum computing join to discuss current research and potential future directions for these fields. Further, we discuss challenges and opportunities for leveraging AI and ML in next-generation computing for emerging computing paradigms, including cloud, fog, edge, serverless, and quantum computing environments.
Graph convolutional networks (GCNs) have been successfully applied to many graph-based applications; however, training a large-scale GCN remains challenging. Current SGD-based algorithms suffer from either a high computational cost that grows exponentially with the number of GCN layers, or a large space requirement for keeping the entire graph and the embedding of each node in memory. In this paper, we propose Cluster-GCN, a novel GCN algorithm that is suitable for SGD-based training by exploiting the graph clustering structure. Cluster-GCN works as follows: at each step, it samples a block of nodes associated with a dense subgraph identified by a graph clustering algorithm, and restricts the neighborhood search within this subgraph. This simple but effective strategy leads to significantly improved memory and computational efficiency while achieving test accuracy comparable to previous algorithms. To test the scalability of our algorithm, we create a new Amazon2M dataset with 2 million nodes and 61 million edges, which is more than 5 times larger than the previous largest publicly available dataset (Reddit). For training a 3-layer GCN on this data, Cluster-GCN is faster than the previous state-of-the-art VR-GCN (1523 seconds vs 1961 seconds) and uses much less memory (2.2GB vs 11.2GB). Furthermore, for training a 4-layer GCN on this data, our algorithm can finish in around 36 minutes, while all existing GCN training algorithms fail due to out-of-memory issues. Moreover, Cluster-GCN allows us to train much deeper GCNs without much time and memory overhead, which leads to improved prediction accuracy: using a 5-layer Cluster-GCN, we achieve a state-of-the-art test F1 score of 99.36 on the PPI dataset, while the previous best result was 98.71 by [16]. Our code is publicly available at //github.com/google-research/google-research/tree/master/cluster_gcn.
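The core sampling step can be sketched as follows, assuming a precomputed two-block partition standing in for METIS and random, untrained weights: each training step picks one cluster, restricts the normalized adjacency to it, and propagates features only within that induced subgraph.

```python
# Minimal sketch of the Cluster-GCN sampling idea. A fixed partition stands in for METIS,
# and the two-layer GCN uses random weights purely to illustrate restricted propagation.
import numpy as np

def normalize_adj(A):
    # Symmetric normalization with self-loops: D^{-1/2} (A + I) D^{-1/2}.
    A_hat = A + np.eye(A.shape[0])
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_hat @ D_inv_sqrt

def cluster_gcn_step(A, X, clusters, W1, W2, rng):
    nodes = clusters[rng.integers(len(clusters))]        # sample one dense block
    A_sub = normalize_adj(A[np.ix_(nodes, nodes)])       # adjacency restricted to the block
    H = np.maximum(A_sub @ X[nodes] @ W1, 0.0)           # layer 1 (ReLU), neighbors limited to the block
    return A_sub @ H @ W2                                # layer 2 logits for the sampled nodes

rng = np.random.default_rng(0)
A = (rng.random((6, 6)) < 0.5).astype(float); A = np.triu(A, 1); A = A + A.T
X = rng.normal(size=(6, 8))
clusters = [np.array([0, 1, 2]), np.array([3, 4, 5])]    # stand-in for a METIS partition
W1, W2 = rng.normal(size=(8, 16)), rng.normal(size=(16, 3))
print(cluster_gcn_step(A, X, clusters, W1, W2, rng).shape)  # (3, 3): logits for the sampled cluster
```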