DNN workloads can be scheduled onto DNN accelerators in many different ways: from layer-by-layer scheduling to cross-layer depth-first scheduling (a.k.a. layer fusion, or cascaded execution). This results in a very broad scheduling space, with each schedule leading to different hardware (HW) costs in terms of energy and latency. To rapidly explore this vast space for a wide variety of hardware architectures, analytical cost models are crucial for estimating scheduling effects at the HW level. However, state-of-the-art cost models lack support for exploring the complete depth-first scheduling space, for instance focusing only on activations while ignoring weights, or modeling only DRAM accesses while overlooking on-chip data movements. These limitations prevent researchers from systematically and accurately understanding the depth-first scheduling space. After formalizing this design space, this work proposes a unified modeling framework, DeFiNES, for layer-by-layer and depth-first scheduling to fill in the gaps. DeFiNES enables analytically estimating the hardware cost of possible schedules in terms of both energy and latency, while considering data accesses at every memory level. This is done for each schedule and HW architecture under study by optimally choosing the active part of the memory hierarchy per unique combination of operand, layer, and feature map tile. The hardware costs are estimated taking into account both data computation and data copy phases. The analytical cost model is validated against measured data from a taped-out depth-first DNN accelerator, DepFiN, showing good modeling accuracy at the end-to-end neural network level. A comparison with generalized state-of-the-art approaches demonstrates that DeFiNES finds solutions up to 10X better.
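As a toy illustration of why depth-first scheduling can pay off (this is not the DeFiNES cost model itself, and all sizes, capacities, and the spilling rule below are hypothetical), the sketch compares off-chip feature-map traffic when every intermediate feature map round-trips through DRAM versus when tiles small enough to fit on chip never leave it:

```python
# Toy comparison of off-chip feature-map traffic for layer-by-layer vs. depth-first
# scheduling. Sizes, buffer capacity, and the spilling rule are hypothetical.

def layer_by_layer_dram_traffic(fmap_sizes):
    """Every intermediate feature map is written to and read back from DRAM."""
    intermediates = fmap_sizes[1:-1]          # skip the network input and final output
    return 2 * sum(intermediates)

def depth_first_dram_traffic(fmap_sizes, tile_fraction, on_chip_capacity):
    """Intermediate tiles stay on chip when they fit; otherwise they spill to DRAM."""
    traffic = 0
    for size in fmap_sizes[1:-1]:
        tile = size * tile_fraction
        if tile > on_chip_capacity:           # tile does not fit: write + read back
            traffic += 2 * size
    return traffic

if __name__ == "__main__":
    fmaps = [1_000_000, 800_000, 800_000, 400_000, 100_000]   # elements per layer output
    print("layer-by-layer:", layer_by_layer_dram_traffic(fmaps))
    print("depth-first   :", depth_first_dram_traffic(fmaps, tile_fraction=1 / 16,
                                                      on_chip_capacity=64_000))
```

A full model such as DeFiNES additionally accounts for weights, every memory level, and the data copy phases, which is exactly what this simple sketch omits.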
This paper presents a new method for reconstructing regions of interest (ROI) from a limited number of computed tomography (CT) measurements. Classical model-based iterative reconstruction methods lead to images with predictable features, but they often suffer from tedious parameterization and slow convergence. In contrast, deep learning methods are fast and can reach high reconstruction quality by leveraging information from large datasets, yet they lack interpretability. At the crossroads of both approaches, deep unfolding networks have recently been proposed. Their design includes the physics of the imaging system and the steps of an iterative optimization algorithm. Motivated by the success of these networks in various applications, we introduce an unfolding neural network called U-RDBFB designed for ROI CT reconstruction from limited data. Few-view truncated data are effectively handled thanks to a robust non-convex data fidelity term combined with a sparsity-inducing regularization function. We unfold the Dual Block coordinate Forward-Backward (DBFB) algorithm, embedded in an iterative reweighted scheme, allowing key parameters to be learned in a supervised manner. Our experiments show an improvement over several state-of-the-art methods, including a model-based iterative scheme, a multi-scale deep learning architecture, and deep unfolding methods.
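To make the unfolding idea concrete, here is a minimal sketch of an unfolded forward-backward network for a generic sparse linear inverse problem, where the per-layer step sizes and thresholds play the role of learned parameters. It deliberately uses a simple $\ell_1$-regularized least-squares problem rather than the paper's dual block-coordinate formulation with a non-convex data fidelity term.

```python
import numpy as np

# Sketch of an unfolded forward-backward scheme for min_x 0.5*||Ax - y||^2 + lam*||x||_1.
# Each "layer" is one forward-backward iteration; steps/lams would be learned end to end.

def soft_threshold(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def unfolded_fb(y, A, n_layers=10, steps=None, lams=None):
    m, n = A.shape
    steps = steps if steps is not None else [1.0 / np.linalg.norm(A, 2) ** 2] * n_layers
    lams = lams if lams is not None else [0.01] * n_layers
    x = np.zeros(n)
    for gamma, lam in zip(steps, lams):
        grad = A.T @ (A @ x - y)                            # gradient (forward) step
        x = soft_threshold(x - gamma * grad, gamma * lam)   # proximal (backward) step
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((30, 60))
x_true = np.zeros(60)
x_true[:5] = 1.0
x_hat = unfolded_fb(A @ x_true, A, n_layers=50)
```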
The Laplacian-constrained Gaussian Markov Random Field (LGMRF) is a common multivariate statistical model for learning a weighted sparse dependency graph from given data. This graph learning problem is formulated as a maximum likelihood estimation (MLE) of the precision matrix, subject to Laplacian structural constraints, with a sparsity-inducing penalty term. This paper aims to solve this learning problem accurately and efficiently. First, since the commonly-used $\ell_1$-norm penalty is less appropriate in this setting, we employ the nonconvex minimax concave penalty (MCP), which promotes sparse solutions with lower estimation bias. Second, as opposed to most existing first-order methods for this problem, we base our method on the second-order proximal Newton approach to obtain an efficient solver for large-scale networks. This approach is considered the most efficient for the related graphical LASSO problem and allows for several algorithmic features we exploit, such as using Conjugate Gradients, preconditioning, and splitting to active/free sets. Numerical experiments demonstrate the advantages of the proposed method in terms of \emph{both} computational complexity and graph learning accuracy compared to existing methods.
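For reference, one common parameterization of the MCP mentioned above is (the paper's exact parameterization may differ)
\[
\rho_{\lambda,\gamma}(x) \;=\;
\begin{cases}
\lambda |x| - \dfrac{x^2}{2\gamma}, & |x| \le \gamma\lambda,\\[4pt]
\dfrac{\gamma\lambda^2}{2}, & |x| > \gamma\lambda,
\end{cases}
\qquad \lambda > 0,\ \gamma > 1.
\]
For small $|x|$ it behaves like the $\ell_1$ penalty, while for $|x| > \gamma\lambda$ it is constant, so large entries are not shrunk; this is the source of the lower estimation bias.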
We develop a mathematically rigorous framework for multilayer neural networks in the mean field regime. As the network's widths increase, the network's learning trajectory is shown to be well captured by a meaningful and dynamically nonlinear limit (the \textit{mean field} limit), which is characterized by a system of ODEs. Our framework applies to a broad range of network architectures, learning dynamics and network initializations. Central to the framework is the new idea of a \textit{neuronal embedding}, which comprises a non-evolving probability space that allows embedding neural networks of arbitrary widths. Using our framework, we prove several properties of large-width multilayer neural networks. First, we show that independent and identically distributed initializations cause strong degeneracy effects on the network's learning trajectory when the network's depth is at least four. Second, we obtain several global convergence guarantees for feedforward multilayer networks under a number of different setups. These include two-layer and three-layer networks with independent and identically distributed initializations, and multilayer networks of arbitrary depths with a special type of correlated initializations that is motivated by the new concept of \textit{bidirectional diversity}. Unlike previous works that rely on convexity, our results admit non-convex losses and hinge on a certain universal approximation property, which is a distinctive feature of infinite-width neural networks and is shown to hold throughout the training process. Aside from being the first known results for global convergence of multilayer networks in the mean field regime, they demonstrate the flexibility of our framework and incorporate several new ideas and insights that depart from conventional convex optimization wisdom.
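For intuition, in the familiar two-layer case (rather than the multilayer neuronal-embedding construction above), a width-$n$ network $f(x;\hat\rho) = \int a\,\sigma(\langle w, x\rangle)\,\hat\rho(\mathrm{d}a,\mathrm{d}w)$ with empirical measure $\hat\rho_t = \frac{1}{n}\sum_{j=1}^{n}\delta_{\theta_j(t)}$, $\theta_j = (a_j, w_j)$, trained by (suitably time-rescaled) gradient flow evolves as
\[
\frac{\mathrm{d}\theta_i}{\mathrm{d}t}
\;=\; -\,\mathbb{E}_{(x,y)}\Big[\partial_2\mathcal{L}\big(y, f(x;\hat\rho_t)\big)\,
\nabla_{\theta}\big(a\,\sigma(\langle w, x\rangle)\big)\big|_{\theta=\theta_i(t)}\Big],
\]
where $\partial_2\mathcal{L}$ denotes the derivative of the loss in its prediction argument. As $n\to\infty$, $\hat\rho_t$ concentrates on a deterministic trajectory $\rho_t$: the two-layer analogue of the mean field limit characterized above by a system of ODEs.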
Procedural Content Generation (PCG) algorithms provide a technique to generate complex and diverse environments in an automated way. However, while generating content with PCG methods is often straightforward, generating meaningful content that reflects specific intentions and constraints remains challenging. Furthermore, many PCG algorithms lack the ability to generate content in an open-ended manner. Recently, Large Language Models (LLMs) have been shown to be incredibly effective in many diverse domains. These pretrained LLMs can be fine-tuned, re-using information and accelerating training for new tasks. In this work, we introduce MarioGPT, a fine-tuned GPT2 model trained to generate tile-based game levels, in our case Super Mario Bros levels. We show that MarioGPT can not only generate diverse levels, but can also be text-prompted for controllable level generation, addressing one of the key challenges of current PCG techniques. To our knowledge, MarioGPT is the first text-to-level model. We also combine MarioGPT with novelty search, enabling it to generate diverse levels with varying play-style dynamics (i.e. player paths). This combination allows for the open-ended generation of an increasingly diverse range of content.
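The prompting pattern is essentially standard conditional text generation. A hedged sketch with the Hugging Face transformers API is shown below; the checkpoint, prompt wording, and tile decoding are placeholders rather than the actual MarioGPT release.

```python
# Sketch of prompt-conditioned sequence generation with a GPT-2 model from Hugging Face
# transformers. A fine-tuned level-generation checkpoint and a tile vocabulary would be
# substituted for the placeholders below.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")   # placeholder for a fine-tuned level model
model = GPT2LMHeadModel.from_pretrained("gpt2")

prompt = "many pipes, little elevation:"            # hypothetical text condition
inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(**inputs,
                            max_new_tokens=200,
                            do_sample=True,
                            top_k=50,
                            pad_token_id=tokenizer.eos_token_id)
generated = tokenizer.decode(output_ids[0])         # would be decoded into level tiles
```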
Coded distributed computing (CDC) introduced by Li \emph{et al.} can greatly reduce the communication load for MapReduce computing systems. In the general cascaded CDC with $K$ workers, $N$ input files and $Q$ Reduce functions, each input file is mapped by $r$ workers and each Reduce function is computed by $s$ workers such that coding techniques can be applied to achieve the maximum multicast gain. The main drawback of most existing CDC schemes is that they require the original data to be split into a number of input files that grows exponentially with $K$, which can significantly increase the coding complexity and degrade system performance. In this paper, we first use a classic combinatorial structure, the $t$-design for any integer $t\geq 2$, to develop a low-complexity and asymptotically optimal CDC scheme with $r=s$. The main advantages of our scheme via $t$-designs are two-fold: 1) it has much smaller $N$ and $Q$ than the existing schemes under the same parameters $K$, $r$ and $s$; and 2) it achieves smaller communication loads compared with the state-of-the-art schemes. Remarkably, unlike previous schemes that rely on large operation fields, our scheme operates over the minimum (binary) field $\mathbb{F}_2$. Furthermore, we show that our construction method can incorporate other combinatorial structures with properties similar to those of $t$-designs. For instance, we use $t$-GDDs to obtain another asymptotically optimal CDC scheme over $\mathbb{F}_2$ whose parameters differ from those of $t$-designs. Finally, we show that our construction method can also be used to construct CDC schemes with $r\neq s$ that have small file and Reduce function numbers.
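As a small illustration of the combinatorial ingredient only (not the full CDC construction), the sketch below checks the defining property of a $2$-design on the Fano plane, a $2$-$(7,3,1)$ design; in a design-based scheme such blocks could index which subsets of workers share mapped files.

```python
from itertools import combinations

# The Fano plane: a 2-(7,3,1) design, i.e. every pair of points lies in exactly one block.
FANO_BLOCKS = [{1, 2, 3}, {1, 4, 5}, {1, 6, 7}, {2, 4, 6}, {2, 5, 7}, {3, 4, 7}, {3, 5, 6}]

def pair_coverage(blocks, points):
    """Count how many blocks contain each pair of points."""
    cover = {pair: 0 for pair in combinations(sorted(points), 2)}
    for block in blocks:
        for pair in combinations(sorted(block), 2):
            cover[pair] += 1
    return cover

coverage = pair_coverage(FANO_BLOCKS, points=range(1, 8))
assert all(count == 1 for count in coverage.values())   # the 2-design property (lambda = 1)
print(len(FANO_BLOCKS), "blocks; every pair of points lies in exactly one block")
```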
We propose and analyze a first-order finite difference scheme for the functionalized Cahn-Hilliard (FCH) equation with a logarithmic Flory-Huggins potential. The semi-implicit numerical scheme is designed based on a suitable convex-concave decomposition of the FCH free energy. We prove unique solvability of the numerical algorithm and verify its unconditional energy stability without any restriction on the time step size. Thanks to the singular nature of the logarithmic part of the Flory-Huggins potential near the pure states $\pm 1$, we establish the so-called positivity-preserving property for the phase function at a theoretical level. As a consequence, the numerical solutions never reach the singular values $\pm 1$ in the point-wise sense, and the fully discrete scheme is well defined at each time step. Next, we present a detailed optimal rate convergence analysis and derive error estimates in $l^{\infty}(0,T;L_h^2)\cap l^2(0,T;H^3_h)$ under a linear refinement requirement $\Delta t\leq C_1 h$. To achieve this goal, a higher order asymptotic expansion (up to second order accuracy in time and space) based on the Fourier projection is utilized to control the discrete maximum norm of solutions to the numerical scheme. We show that if the exact solution to the continuous problem is strictly separated from the pure states $\pm 1$, then the numerical solutions can be kept away from $\pm 1$ by a positive distance that is uniform with respect to the time step and grid sizes. Finally, a few numerical experiments are presented. A convergence test is performed to demonstrate the accuracy and robustness of the proposed numerical scheme. Pearling bifurcation, meandering instability and spinodal decomposition are observed in the numerical simulations.
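For context, one common form of the logarithmic Flory-Huggins potential appearing in such models is (constants and notation may differ from the paper's)
\[
F(\phi) \;=\; \frac{\theta}{2}\Big[(1+\phi)\ln(1+\phi) + (1-\phi)\ln(1-\phi)\Big] \;-\; \frac{\theta_0}{2}\,\phi^2,
\qquad \phi \in (-1,1),\quad 0 < \theta < \theta_0.
\]
The logarithmic part is convex and its derivative blows up as $\phi \to \pm 1$, while the quadratic part is concave; treating the former implicitly and the latter explicitly is the kind of convex-concave splitting that underlies the unconditional energy stability and the positivity-preserving property discussed above.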
Intention prediction has become a relevant field of research in Human-Machine and Human-Robot Interaction. Indeed, any artificial system (co-)operating with and alongside humans, designed to assist and coordinate its actions with a human partner, would benefit from first inferring the human's current intention. To spare the user the cognitive burden of explicitly uttering their goals, this inference relies mostly on behavioral cues deemed indicative of the current action. It has long been known that eye movements are highly anticipatory of the single steps unfolding during a task; hence, they can serve as a very early and reliable behavioural cue for intention recognition. This review aims to draw a line between insights from the psychological literature on visuomotor control and relevant applications of gaze-based intention recognition in technical domains, with a focus on teleoperated and assistive robotic systems. Starting from the cognitive principles underlying the relationship between intentions, eye movements, and action, the use of eye tracking and gaze-based models for intent recognition in Human-Robot Interaction is considered, along with prevalent methodologies and their diverse applications. Finally, special consideration is given to relevant human factors issues and current limitations to be factored in when designing such systems.
Front-end electronics equipped with high-speed digitizers are being used and proposed for future nuclear detectors. Recent literature reveals that deep learning models, especially one-dimensional convolutional neural networks, are promising when dealing with digital signals from nuclear detectors. Simulations and experiments demonstrate the satisfactory accuracy and additional benefits of neural networks in this area. However, dedicated hardware to accelerate such models for online operation still needs to be studied. In this work, we introduce PulseDL-II, a system-on-chip (SoC) specially designed for event feature (time, energy, etc.) extraction from pulses with deep learning. Based on the previous version, PulseDL-II incorporates a RISC CPU into the system structure for better functional flexibility and integrity. The neural network accelerator in the SoC adopts a three-level (arithmetic unit, processing element, neural network) hierarchical architecture and facilitates parameter optimization of the digital design. Furthermore, we devise a quantization scheme compatible with deep learning frameworks (e.g., TensorFlow) within a selected subset of layer types. We validate the correct operation of PulseDL-II on field programmable gate arrays (FPGA), both standalone and with an experimental setup comprising a direct digital synthesis (DDS) signal generator and analog-to-digital converters (ADC). The proposed system achieved 60 ps time resolution and 0.40% energy resolution at a signal-to-noise ratio (SNR) of 47.4 dB.
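As a generic illustration of the kind of scale/zero-point quantization used by mainstream deep learning frameworks (the exact PulseDL-II quantization scheme is not reproduced here), consider the following 8-bit affine quantization sketch:

```python
import numpy as np

# Generic 8-bit affine quantization: map floats to int8 via a scale and a zero point,
# then map back. The tensor below is random and purely illustrative.
def quantize_int8(x, qmin=-128, qmax=127):
    scale = (x.max() - x.min()) / (qmax - qmin)
    zero_point = int(round(qmin - x.min() / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.int8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return scale * (q.astype(np.float32) - zero_point)

weights = np.random.default_rng(1).standard_normal(1000).astype(np.float32)
q, scale, zero_point = quantize_int8(weights)
max_error = np.abs(dequantize(q, scale, zero_point) - weights).max()   # roughly scale / 2
```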
The existence of representative datasets is a prerequisite of many successful artificial intelligence and machine learning models. However, the subsequent application of these models often involves scenarios that are inadequately represented in the data used for training. The reasons for this are manifold and range from time and cost constraints to ethical considerations. As a consequence, the reliable use of these models, especially in safety-critical applications, is a huge challenge. Leveraging additional, already existing sources of knowledge is key to overcome the limitations of purely data-driven approaches, and eventually to increase the generalization capability of these models. Furthermore, predictions that conform with knowledge are crucial for making trustworthy and safe decisions even in underrepresented scenarios. This work provides an overview of existing techniques and methods in the literature that combine data-based models with existing knowledge. The identified approaches are structured according to the categories integration, extraction and conformity. Special attention is given to applications in the field of autonomous driving.
Since hardware resources are limited, the objective of training deep learning models is typically to maximize accuracy subject to the time and memory constraints of training and inference. We study the impact of model size in this setting, focusing on Transformer models for NLP tasks that are limited by compute: self-supervised pretraining and high-resource machine translation. We first show that even though smaller Transformer models execute faster per iteration, wider and deeper models converge in significantly fewer steps. Moreover, this acceleration in convergence typically outpaces the additional computational overhead of using larger models. Therefore, the most compute-efficient training strategy is, counterintuitively, to train extremely large models but stop after a small number of iterations. This leads to an apparent trade-off between the training efficiency of large Transformer models and the inference efficiency of small Transformer models. However, we show that large models are more robust to compression techniques such as quantization and pruning than small models. Consequently, one can get the best of both worlds: heavily compressed, large models achieve higher accuracy than lightly compressed, small models.
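As a concrete example of one such compression technique, the sketch below applies global magnitude pruning to a hypothetical weight matrix; the sparsity level is illustrative only.

```python
import numpy as np

# Global magnitude pruning: zero out the smallest-magnitude fraction of the weights.
def magnitude_prune(weights, sparsity=0.9):
    threshold = np.quantile(np.abs(weights), sparsity)
    mask = np.abs(weights) > threshold
    return weights * mask, mask

w = np.random.default_rng(0).standard_normal((512, 512))
w_pruned, mask = magnitude_prune(w, sparsity=0.9)
print("kept fraction:", mask.mean())   # roughly 0.10 of the weights survive
```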