亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

A rising research challenge is running costly machine learning (ML) networks locally on resource-constrained edge devices. ML networks with large convolutional layers can easily exceed available memory, increasing latency due to excessive OS swapping. Previous memory reduction techniques such as pruning and quantization reduce model accuracy and often require retraining. Alternatively, distributed methods partition the convolutions into equivalent smaller sub-computations, but the implementations introduce communication costs and require a network of devices. Distributed partitioning approaches can, however, also be used to run in a reduced memory footprint on a single device by subdividing the network into smaller operations. In this paper, we extend prior work on distributed partitioning into a memory-aware execution on a single device. Our approach extends prior fusing strategies to allow for multiple groups of convolutional layers that are fused and tiled independently. This enables trading off overhead versus data reuse in order to specifically reduces memory footprint. We propose a memory usage predictor coupled with a search algorithm to provide optimized fusing and tiling configurations for an arbitrary set of convolutional layers. When applied to the YOLOv2 object detection network, results show that our approach can run in less than half the memory, and with a speedup of up to 2.78 under severe memory constraints. Additionally, our algorithm will return a configuration with a latency that is within 6% of the best latency measured in a manual search.

相關內容

Networking:IFIP International Conferences on Networking。 Explanation:國際網絡會議。 Publisher:IFIP。 SIT:

Efficient large-scale neural network training and inference on commodity CPU hardware is of immense practical significance in democratizing deep learning (DL) capabilities. Presently, the process of training massive models consisting of hundreds of millions to billions of parameters requires the extensive use of specialized hardware accelerators, such as GPUs, which are only accessible to a limited number of institutions with considerable financial resources. Moreover, there is often an alarming carbon footprint associated with training and deploying these models. In this paper, we take a step towards addressing these challenges by introducing BOLT, a sparse deep learning library for training large-scale search and recommendation models on standard CPU hardware. BOLT provides a flexible, high-level API for constructing models that will be familiar to users of existing popular DL frameworks. By automatically tuning specialized hyperparameters, BOLT also abstracts away the algorithmic details of sparse network training. We evaluate BOLT on a number of information retrieval tasks including product recommendations, text classification, graph neural networks, and personalization. We find that our proposed system achieves competitive performance with state-of-the-art techniques at a fraction of the cost and energy consumption and an order-of-magnitude faster inference time. BOLT has also been successfully deployed by multiple businesses to address critical problems, and we highlight one customer case study in the field of e-commerce.

A large number of current machine learning methods rely upon deep neural networks. Yet, viewing neural networks as nonlinear dynamical systems, it becomes quickly apparent that mathematically rigorously establishing certain patterns generated by the nodes in the network is extremely difficult. Indeed, it is well-understood in the nonlinear dynamics of complex systems that, even in low-dimensional models, analytical techniques rooted in pencil-and-paper approaches frequently reach their limits. In this work, we propose a completely different perspective via the paradigm of validated numerical methods of nonlinear dynamics. The idea is to use computer-assisted proofs to validate mathematically the existence of nonlinear patterns in neural networks. As a case study, we consider a class of recurrent neural networks, where we prove via computer assistance the existence of several hundred Hopf bifurcation points, their non-degeneracy, and hence also the existence of several hundred periodic orbits. Our paradigm has the capability to rigorously verify complex nonlinear behaviour of neural networks, which provides a first step to explain the full abilities, as well as potential sensitivities, of machine learning methods via computer-assisted proofs. We showcase how validated numerical techniques can shed light on the internal working of recurrent neural networks (RNNs). For this, proofs of Hopf bifurcations are a first step towards an integration of dynamical system theory in practical application of RNNs, by proving the existence of periodic orbits in a variety of settings.

The next generation Sunway supercomputer employs the SW26010pro processor, which features a specialized on-chip heterogeneous architecture. Applications with significant hotspots can benefit from the great computation capacity improvement of Sunway many-core architectures by carefully making intensive manual many-core parallelization efforts. However, some legacy projects with large codebases, such as CESM, ROMS and WRF, contain numerous lines of code and do not have significant hotspots. The cost of manually porting such applications to the Sunway architecture is almost unaffordable. To overcome such a challenge, we have developed a toolkit named O2ATH. O2ATH forwards GNU OpenMP runtime library calls to Sunway's Athread library, which greatly simplifies the parallelization work on the Sunway architecture.O2ATH enables users to write both MPE and CPE code in a single file, and parallelization can be achieved by utilizing OpenMP directives and attributes. In practice, O2ATH has helped us to port two large projects, CESM and ROMS, to the CPEs of the next generation Sunway supercomputers via the OpenMP offload method. In the experiments, kernel speedups range from 3 to 15 times, resulting in 3 to 6 times whole application speedups.Furthermore, O2ATH requires significantly fewer code modifications compared to manually crafting CPE functions.This indicates that O2ATH can greatly enhance development efficiency when porting or optimizing large software projects on Sunway supercomputers.

The rapid adoption of artificial intelligence (AI) necessitates careful analysis of its ethical implications. In addressing ethics and fairness implications, it is important to examine the whole range of ethically relevant features rather than looking at individual agents alone. This can be accomplished by shifting perspective to the systems in which agents are embedded, which is encapsulated in the macro ethics of sociotechnical systems (STS). Through the lens of macro ethics, the governance of systems - which is where participants try to promote outcomes and norms which reflect their values - is key. However, multiple-user social dilemmas arise in an STS when stakeholders of the STS have different value preferences or when norms in the STS conflict. To develop equitable governance which meets the needs of different stakeholders, and resolve these dilemmas in satisfactory ways with a higher goal of fairness, we need to integrate a variety of normative ethical principles in reasoning. Normative ethical principles are understood as operationalizable rules inferred from philosophical theories. A taxonomy of ethical principles is thus beneficial to enable practitioners to utilise them in reasoning. This work develops a taxonomy of normative ethical principles which can be operationalized in the governance of STS. We identify an array of ethical principles, with 25 nodes on the taxonomy tree. We describe the ways in which each principle has previously been operationalized, and suggest how the operationalization of principles may be applied to the macro ethics of STS. We further explain potential difficulties that may arise with each principle. We envision this taxonomy will facilitate the development of methodologies to incorporate ethical principles in reasoning capacities for governing equitable STS.

By offloading computation-intensive tasks of vehicles to roadside units (RSUs), mobile edge computing (MEC) in the Internet of Vehicles (IoV) can relieve the onboard computation burden. However, existing model-based task offloading methods suffer from heavy computational complexity with the increase of vehicles and data-driven methods lack interpretability. To address these challenges, in this paper, we propose a knowledge-driven multi-agent reinforcement learning (KMARL) approach to reduce the latency of task offloading in cybertwin-enabled IoV. Specifically, in the considered scenario, the cybertwin serves as a communication agent for each vehicle to exchange information and make offloading decisions in the virtual space. To reduce the latency of task offloading, a KMARL approach is proposed to select the optimal offloading option for each vehicle, where graph neural networks are employed by leveraging domain knowledge concerning graph-structure communication topology and permutation invariance into neural networks. Numerical results show that our proposed KMARL yields higher rewards and demonstrates improved scalability compared with other methods, benefitting from the integration of domain knowledge.

Federated learning (FL) facilitates distributed training across clients, safeguarding the privacy of their data. The inherent distributed structure of FL introduces vulnerabilities, especially from adversarial (Byzantine) clients aiming to skew local updates to their advantage. Despite the plethora of research focusing on Byzantine-resilient FL, the academic community has yet to establish a comprehensive benchmark suite, pivotal for impartial assessment and comparison of different techniques. This paper investigates existing techniques in Byzantine-resilient FL and introduces an open-source benchmark suite for convenient and fair performance comparisons. Our investigation begins with a systematic study of Byzantine attack and defense strategies. Subsequently, we present \ours, a scalable, extensible, and easily configurable benchmark suite that supports researchers and developers in efficiently implementing and validating novel strategies against baseline algorithms in Byzantine-resilient FL. The design of \ours incorporates key characteristics derived from our systematic study, encompassing the attacker's capabilities and knowledge, defense strategy categories, and factors influencing robustness. Blades contains built-in implementations of representative attack and defense strategies and offers user-friendly interfaces for seamlessly integrating new ideas.

Machine learning models can perpetuate unintended biases from unfair and imbalanced datasets. Evaluating and debiasing these datasets and models is especially hard in text datasets where sensitive attributes such as race, gender, and sexual orientation may not be available. When these models are deployed into society, they can lead to unfair outcomes for historically underrepresented groups. In this paper, we present a dataset coupled with an approach to improve text fairness in classifiers and language models. We create a new, more comprehensive identity lexicon, TIDAL, which includes 15,123 identity terms and associated sense context across three demographic categories. We leverage TIDAL to develop an identity annotation and augmentation tool that can be used to improve the availability of identity context and the effectiveness of ML fairness techniques. We evaluate our approaches using human contributors, and additionally run experiments focused on dataset and model debiasing. Results show our assistive annotation technique improves the reliability and velocity of human-in-the-loop processes. Our dataset and methods uncover more disparities during evaluation, and also produce more fair models during remediation. These approaches provide a practical path forward for scaling classifier and generative model fairness in real-world settings.

Graph neural networks (GNNs) have demonstrated a significant boost in prediction performance on graph data. At the same time, the predictions made by these models are often hard to interpret. In that regard, many efforts have been made to explain the prediction mechanisms of these models from perspectives such as GNNExplainer, XGNN and PGExplainer. Although such works present systematic frameworks to interpret GNNs, a holistic review for explainable GNNs is unavailable. In this survey, we present a comprehensive review of explainability techniques developed for GNNs. We focus on explainable graph neural networks and categorize them based on the use of explainable methods. We further provide the common performance metrics for GNNs explanations and point out several future research directions.

The incredible development of federated learning (FL) has benefited various tasks in the domains of computer vision and natural language processing, and the existing frameworks such as TFF and FATE has made the deployment easy in real-world applications. However, federated graph learning (FGL), even though graph data are prevalent, has not been well supported due to its unique characteristics and requirements. The lack of FGL-related framework increases the efforts for accomplishing reproducible research and deploying in real-world applications. Motivated by such strong demand, in this paper, we first discuss the challenges in creating an easy-to-use FGL package and accordingly present our implemented package FederatedScope-GNN (FS-G), which provides (1) a unified view for modularizing and expressing FGL algorithms; (2) comprehensive DataZoo and ModelZoo for out-of-the-box FGL capability; (3) an efficient model auto-tuning component; and (4) off-the-shelf privacy attack and defense abilities. We validate the effectiveness of FS-G by conducting extensive experiments, which simultaneously gains many valuable insights about FGL for the community. Moreover, we employ FS-G to serve the FGL application in real-world E-commerce scenarios, where the attained improvements indicate great potential business benefits. We publicly release FS-G, as submodules of FederatedScope, at //github.com/alibaba/FederatedScope to promote FGL's research and enable broad applications that would otherwise be infeasible due to the lack of a dedicated package.

Deep neural networks (DNNs) are successful in many computer vision tasks. However, the most accurate DNNs require millions of parameters and operations, making them energy, computation and memory intensive. This impedes the deployment of large DNNs in low-power devices with limited compute resources. Recent research improves DNN models by reducing the memory requirement, energy consumption, and number of operations without significantly decreasing the accuracy. This paper surveys the progress of low-power deep learning and computer vision, specifically in regards to inference, and discusses the methods for compacting and accelerating DNN models. The techniques can be divided into four major categories: (1) parameter quantization and pruning, (2) compressed convolutional filters and matrix factorization, (3) network architecture search, and (4) knowledge distillation. We analyze the accuracy, advantages, disadvantages, and potential solutions to the problems with the techniques in each category. We also discuss new evaluation metrics as a guideline for future research.

北京阿比特科技有限公司