
PCM is a popular backing memory for DRAM main memory in tiered memory systems. PCM has asymmetric access energy: writes dominate reads, and in multi-level cell (MLC) PCM the asymmetry can vary by an order of magnitude. Many schemes have been developed to take advantage of the asymmetric patterns of 0s and 1s in the data to reduce write energy. Because the memory is non-volatile, data can be recovered via physical attack or across system reboot cycles. Protecting information stored in PCM against these attacks requires encryption. Unfortunately, most encryption algorithms scramble the 0s and 1s in the data, effectively removing any patterns and negatively impacting schemes that leverage data bias and similarity to reduce write energy. In this paper, we introduce Virtual Coset Coding (VCC), a workload-independent approach that reduces costly symbol transitions when storing encrypted data. VCC is based on two ideas. First, using coset encoding with random coset candidates, it is possible to effectively reduce the frequency of costly bit/symbol transitions when writing encrypted data. Second, a small set of random substrings can achieve the same encoding efficiency as a large number of random coset candidates, but at a much lower encoding/decoding cost. Additionally, we demonstrate how VCC can be leveraged for energy reduction in combination with fault mitigation and fault tolerance to dramatically increase the lifetimes of endurance-limited NVMs, such as PCM. We evaluate the design of VCC and demonstrate that it can be implemented on-chip with only a nominal area overhead. VCC reduces dynamic energy by 22-28% while maintaining the same performance. Our multi-objective optimization approach achieves at least a 36% improvement in lifetime over the state of the art and at least a 50% improvement in lifetime over an unencoded memory, while maintaining its energy savings and system performance.
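
As a minimal illustration of the first idea (not the paper's hardware implementation; all helper names and parameters are hypothetical), the sketch below XORs an encrypted data word with a handful of pseudo-random coset candidates and keeps the candidate that minimizes bit flips against the currently stored word; the chosen candidate's index is stored alongside the codeword so the decoder can undo the XOR.

```python
import random

WORD_BITS = 64

def count_costly_transitions(new_word: int, stored_word: int) -> int:
    """Count bit positions that must flip when overwriting stored_word
    with new_word (a simple proxy for write energy)."""
    return bin(new_word ^ stored_word).count("1")

def make_coset_candidates(n: int, seed: int = 0):
    """Generate n pseudo-random coset masks; index 0 is the identity mask,
    so the encoder can always fall back to writing the data unmodified."""
    rng = random.Random(seed)
    return [0] + [rng.getrandbits(WORD_BITS) for _ in range(n - 1)]

def coset_encode(data: int, stored: int, candidates):
    """Pick the coset candidate whose XOR with the data minimizes flips."""
    best_idx, best_word = min(
        ((i, data ^ mask) for i, mask in enumerate(candidates)),
        key=lambda iw: count_costly_transitions(iw[1], stored),
    )
    return best_idx, best_word

def coset_decode(encoded: int, idx: int, candidates) -> int:
    return encoded ^ candidates[idx]

if __name__ == "__main__":
    candidates = make_coset_candidates(16)
    stored = random.getrandbits(WORD_BITS)   # current (encrypted) contents
    data = random.getrandbits(WORD_BITS)     # new encrypted word to write
    idx, encoded = coset_encode(data, stored, candidates)
    assert coset_decode(encoded, idx, candidates) == data
    print("flips without coding:", count_costly_transitions(data, stored))
    print("flips with coding:   ", count_costly_transitions(encoded, stored))
```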

Related content

Modern media data such as 360 videos and light field (LF) images are typically captured in much higher dimensions than the observers' visual displays. To efficiently browse high-dimensional media over bandwidth-constrained networks, a navigational streaming model is considered: a client navigates the large media space by dictating a navigation path to a server, which in response transmits the corresponding pre-encoded media data units (MDUs) to the client one by one in sequence. Intra-coding an MDU (I-MDU) results in a large bitrate, but an I-MDU can be randomly accessed, while inter-coding an MDU (P-MDU) using another MDU as a predictor incurs a small coding cost but imposes an order in which the predictor must first be transmitted and decoded. From a compression perspective, the technical challenge is: how to achieve coding gain via inter-coding of MDUs, while enabling adequate random access for satisfactory user navigation. To address this problem, we propose landmarks, a selection of key MDUs from the high-dimensional media. Using landmarks as predictors, nearby MDUs in local neighborhoods are inter-coded, resulting in a predictive MDU structure with controlled coding cost. This means that any requested MDU can be decoded by transmitting at most a landmark and one inter-coded MDU, enabling navigational random access. To build a landmarked MDU structure, we employ a tree-structured vector quantizer (TSVQ) to first optimize landmark locations, then iteratively add/remove inter-coded MDUs as refinements using a fast branch-and-bound technique. Taking interactive LF images and viewport-adaptive 360 images as illustrative applications, and using I-frames, P-frames and previously proposed merge frames to intra- and inter-code MDUs, we show experimentally that landmarked MDU structures can noticeably reduce the expected transmission cost compared with MDU structures without landmarks.
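
A minimal sketch of why landmarks bound the random-access cost, under a toy cost model with hypothetical MDU sizes rather than the paper's TSVQ optimization: along a navigation path, a landmark is retransmitted only when the client enters a new local neighborhood, and every other request costs just one inter-coded MDU on top.

```python
import random

def path_cost(path, landmark_of, i_cost, p_cost):
    """Bits transmitted to serve a navigation path under a landmarked
    structure: a landmark (I-coded) is re-sent only when the path enters a
    new neighborhood; each non-landmark MDU adds one inter-coded P-MDU."""
    cost, cached_landmark = 0.0, None
    for m in path:
        lm = landmark_of[m]
        if m == lm:                      # requested MDU is itself a landmark
            cost += i_cost
            cached_landmark = lm
        else:
            if lm != cached_landmark:    # the predictor must be sent first
                cost += i_cost
                cached_landmark = lm
            cost += p_cost               # inter-coded refinement
    return cost

if __name__ == "__main__":
    N, STRIDE = 100, 10
    # one landmark per neighborhood of 10 consecutive MDUs
    landmark_of = {m: (m // STRIDE) * STRIDE for m in range(N)}
    rng = random.Random(0)
    # a random-walk navigation path over neighboring MDUs
    path, pos = [], N // 2
    for _ in range(200):
        pos = min(N - 1, max(0, pos + rng.choice([-1, 1])))
        path.append(pos)
    with_landmarks = path_cost(path, landmark_of, i_cost=100.0, p_cost=10.0)
    all_intra = len(path) * 100.0
    print(f"landmarked: {with_landmarks:.0f}  all I-MDUs: {all_intra:.0f}")
```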

Physical activity patterns can be informative about a patient's health status. Traditionally, activity data have been gathered using patient self-report. However, these subjective data can suffer from bias and are difficult to collect over long time periods. Smartphones offer an opportunity to address these challenges. The smartphone has built-in sensors that can be programmed to collect data objectively, unobtrusively, and continuously. Due to their widespread adoption, smartphones are also accessible to most of the population. A main challenge in smartphone-based activity recognition is extracting information optimally from multiple sensors to identify the unique features of different activities. In our study, we analyze data collected by the accelerometer and gyroscope, which measure the phone's acceleration and angular velocity, respectively. We propose an extension to the "movelet method" that jointly incorporates both sensors. We also apply this joint-sensor method to a data set we collected previously. The findings show that combining data from the two sensors can result in more accurate activity recognition than using each sensor alone. For example, the joint-sensor method reduces errors of the gyroscope-only method in differentiating between standing and sitting. It also reduces errors of the accelerometer-only method in classifying vigorous activities.
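
A minimal sketch of the joint-sensor idea, with toy data and hypothetical window parameters rather than the authors' code: each labeled "movelet" and each test window is a short six-channel segment (three accelerometer plus three gyroscope axes), and a test window takes the label of the nearest movelet under a distance that sums both sensors' contributions.

```python
import numpy as np

def movelet_distance(window, movelet):
    """Sum of squared differences over all six channels
    (3 accelerometer + 3 gyroscope axes)."""
    return float(np.sum((window - movelet) ** 2))

def classify_window(window, movelets, labels):
    """Assign the label of the nearest movelet under the joint distance."""
    dists = [movelet_distance(window, m) for m in movelets]
    return labels[int(np.argmin(dists))]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    WIN = 50  # samples per window (e.g. 0.5 s at 100 Hz)

    # Toy training movelets: rows 0-2 accelerometer, rows 3-5 gyroscope.
    walking = rng.normal(0.0, 1.0, (6, WIN)) + np.sin(np.linspace(0, 6 * np.pi, WIN))
    sitting = rng.normal(0.0, 0.05, (6, WIN))
    movelets = [walking, sitting]
    labels = ["walking", "sitting"]

    # A low-motion test window should match the "sitting" movelet.
    test = rng.normal(0.0, 0.05, (6, WIN))
    print(classify_window(test, movelets, labels))
```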

Proving secure compilation of partial programs typically requires back-translating an attack against the compiled program to an attack against the source program. To prove back-translation, one can syntactically translate the target attacker to a source one -- i.e., syntax-directed back-translation -- or show that the interaction traces of the target attacker can also be emitted by source attackers -- i.e., trace-directed back-translation. Syntax-directed back-translation is not suitable when the target attacker may use unstructured control flow that the source language cannot directly represent. Trace-directed back-translation works with such syntactic dissimilarity because only the external interactions of the target attacker have to be mimicked in the source, not its internal control flow. Revealing only external interactions is, however, inconvenient when sharing memory via unforgeable pointers, since information about shared pointers stashed in private memory is not present on the trace. This made prior proofs unnecessarily complex, since the generated attacker had to instead stash all reachable pointers. In this work, we introduce more informative data-flow traces, combining the best of syntax- and trace-directed back-translation in a simpler technique that handles both syntactic dissimilarity and memory sharing well, and that is proved correct in Coq. Additionally, we develop a novel turn-taking simulation relation and use it to prove a recomposition lemma, which is key to reusing compiler correctness in such secure compilation proofs. We are the first to mechanize such a recomposition lemma in the presence of memory sharing. We use these two innovations in a secure compilation proof for a code generation compiler pass between a source language with structured control flow and a target language with unstructured control flow, both with safe pointers and components.

Renovating the memories in old photos is an intriguing research topic in the computer vision field. These legacy images often suffer from severe and commingled degradations such as cracks, noise, and color fading, while the lack of large-scale paired old-photo datasets makes this restoration task very challenging. In this work, we present a novel reference-based end-to-end learning framework that can jointly repair and colorize degraded legacy pictures. Specifically, the proposed framework consists of three modules: a restoration sub-network for degradation restoration, a similarity sub-network for color histogram matching and transfer, and a colorization sub-network that learns to predict the chroma components of the images conditioned on chromatic reference signals. The whole system takes advantage of the color histogram priors in a given reference image, which vastly reduces the dependency on large-scale training data. Apart from the proposed method, we also create, to our knowledge, the first public real-world old-photo dataset with paired ground truth for evaluating old photo restoration models, wherein each old photo is paired with a pristine image manually restored by Photoshop experts. Our extensive experiments conducted on both synthetic and real-world datasets demonstrate that our method significantly outperforms the state of the art both quantitatively and qualitatively.
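
As a rough illustration of how a reference image's color histogram can guide a degraded photo, the sketch below performs classic per-channel histogram specification in NumPy; it is a stand-in for intuition only, not the paper's learned similarity sub-network, and the array names are hypothetical.

```python
import numpy as np

def match_histogram(channel: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Map the intensities of `channel` so its cumulative histogram matches
    that of `reference` (classic histogram specification)."""
    src_values, src_counts = np.unique(channel.ravel(), return_counts=True)
    ref_values, ref_counts = np.unique(reference.ravel(), return_counts=True)
    src_cdf = np.cumsum(src_counts).astype(np.float64) / channel.size
    ref_cdf = np.cumsum(ref_counts).astype(np.float64) / reference.size
    mapped = np.interp(src_cdf, ref_cdf, ref_values)
    return mapped[np.searchsorted(src_values, channel)]

def transfer_colors(old_photo: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Apply per-channel histogram matching from the reference to the old photo."""
    return np.stack(
        [match_histogram(old_photo[..., c], reference[..., c]) for c in range(3)],
        axis=-1,
    )

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    old_photo = rng.integers(0, 128, (64, 64, 3))   # washed-out, dark toy image
    reference = rng.integers(64, 256, (64, 64, 3))  # vivid toy reference
    out = transfer_colors(old_photo, reference)
    print("old mean:", round(old_photo.mean(), 1), "matched mean:", round(out.mean(), 1))
```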

The emergence of programmable switches allows operators to collect a vast amount of fine-grained telemetry data in real time. However, consolidating the telemetry reports at centralized collectors to gain a network-wide view poses an immense challenge. The received data has to be transported from the switches, parsed, manipulated, and inserted in queryable data structures. As the network scales, this requires excessive CPU processing. RDMA is a transport protocol that bypasses the CPU and allows extremely high data transfer rates. Yet, RDMA is not designed for telemetry collection: it requires a stateful connection, supports only a small number of concurrent writers, and has limited writing primitives, which restricts its data aggregation applicability. We introduce Direct Telemetry Access (DTA), a solution that allows fast and efficient telemetry collection, aggregation, and indexing. Our system establishes RDMA connections only from collectors' ToR switches, called translators, which process DTA reports from all other switches. DTA features novel and expressive reporting primitives such as Key-Write, Append, Sketch-Merge, and Key-Increment that allow integration of telemetry systems such as INT and others. The translators then aggregate, batch, and write the reports to collectors' memory in queryable form.
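
A toy sketch of the flavor of the Key-Write and Key-Increment primitives, assuming a hypothetical slot layout and ignoring RDMA mechanics entirely: the key is hashed to a fixed slot in a collector-side array and updated in place, so the collector's memory stays directly queryable without CPU-side parsing.

```python
import hashlib

class CollectorStore:
    """Toy stand-in for a collector's writable memory region:
    a fixed array of slots indexed by a hash of the telemetry key."""

    def __init__(self, num_slots: int):
        self.slots = [None] * num_slots

    def _slot(self, key: bytes) -> int:
        return int.from_bytes(hashlib.sha256(key).digest()[:8], "little") % len(self.slots)

    def key_write(self, key: bytes, value: bytes) -> None:
        """Key-Write style update: last-writer-wins overwrite of the key's slot."""
        self.slots[self._slot(key)] = (key, value)

    def key_increment(self, key: bytes, delta: int) -> None:
        """Key-Increment style update: add delta to a per-key counter in place."""
        i = self._slot(key)
        _, old = self.slots[i] or (key, 0)
        self.slots[i] = (key, (old if isinstance(old, int) else 0) + delta)

    def query(self, key: bytes):
        entry = self.slots[self._slot(key)]
        return entry[1] if entry and entry[0] == key else None

if __name__ == "__main__":
    store = CollectorStore(1 << 16)
    store.key_write(b"switch7:port3:queue_depth", b"42")
    store.key_increment(b"switch7:port3:drops", 5)
    store.key_increment(b"switch7:port3:drops", 3)
    print(store.query(b"switch7:port3:queue_depth"), store.query(b"switch7:port3:drops"))
```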

Graph analysis involves a large number of random memory accesses. Earlier research has shown that cache miss latency is responsible for more than half of the graph processing time, with CPU execution having the smaller share. There has been significant study on decreasing the CPU computing time, for example by employing better cache prefetching and replacement policies. In this paper, we study methods that do so by attempting to decrease the CPU cache miss ratio. Graph reordering attempts to exploit the power-law degree distribution of graphs -- a few vertices have a very high number of connections -- to keep the frequently accessed vertices together in memory and hence decrease cache misses. However, reordering the graph by packing the hot vertices together may hurt the spatial locality of the graph, and thus add to the total CPU compute time. We also need to control the total reordering time, which is inversely related to the final CPU execution time. To exploit this trade-off between reordering by vertex hotness and preserving spatial locality, we introduce a light-weight community-based reordering. We attempt to maintain the community structure of the graph by storing each community's hot members locally together. The implementation also takes into consideration the impact of graph diameter on the execution time. We compare our implementation (Lorder) with other reordering implementations and find significantly better results on five graph processing algorithms: BFS, CC, CCSV, PR and BC. Lorder achieved a speed-up of up to 7x and an average speed-up of 1.2x compared to other reordering algorithms.
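
A minimal sketch of the reordering idea (not the actual Lorder implementation; the hotness threshold and layout policy are assumptions): vertices are laid out community by community, with each community's hot (high-degree) vertices packed first and its cold vertices kept in their original relative order to preserve existing spatial locality.

```python
from collections import defaultdict

def community_reorder(degrees, community_of, hot_threshold):
    """Return an old_id -> new_id mapping. Vertices are laid out community by
    community; within a community the hot (high-degree) vertices come first,
    while cold vertices keep their original relative order."""
    members = defaultdict(list)
    for v in sorted(community_of):                 # original vertex order
        members[community_of[v]].append(v)

    mapping, next_id = {}, 0
    for c in sorted(members):
        hot = sorted((v for v in members[c] if degrees[v] >= hot_threshold),
                     key=lambda v: degrees[v], reverse=True)
        cold = [v for v in members[c] if degrees[v] < hot_threshold]
        for v in hot + cold:
            mapping[v] = next_id
            next_id += 1
    return mapping

if __name__ == "__main__":
    degrees      = {0: 50, 1: 2, 2: 3, 3: 40, 4: 1, 5: 60, 6: 2}
    community_of = {0: 0,  1: 0, 2: 0, 3: 0,  4: 1, 5: 1,  6: 1}
    print(community_reorder(degrees, community_of, hot_threshold=10))
```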

Even though recent years have seen many attacks exposing severe vulnerabilities in federated learning (FL), a holistic understanding of what enables these attacks and how they can be mitigated effectively is still lacking. In this work we demystify the inner workings of existing targeted attacks. We provide new insights into why these attacks are possible and why a definitive solution to FL robustness is challenging. We show that the need for ML algorithms to memorize tail data has significant implications for FL integrity. This phenomenon has largely been studied in the context of privacy; our analysis sheds light on its implications for ML integrity. In addition, we show how constraints on client updates can effectively improve robustness. To incorporate these constraints into secure FL protocols, we design and develop RoFL, a new secure FL system that enables constraints to be expressed and enforced on high-dimensional encrypted model updates. In essence, RoFL augments existing secure FL aggregation protocols with zero-knowledge proofs. Due to the scale of FL, realizing these checks efficiently presents a paramount challenge. We introduce several optimizations at the ML layer that allow us to reduce the number of cryptographic checks needed while preserving the effectiveness of our defenses. We show that RoFL scales to the sizes of models used in real-world FL deployments.
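
A minimal sketch of the kind of norm constraint on client updates that can improve robustness, with hypothetical bounds and checked in the clear for simplicity; RoFL instead enforces such constraints on encrypted updates via zero-knowledge proofs.

```python
import numpy as np

def within_bounds(update: np.ndarray, l2_bound: float, linf_bound: float) -> bool:
    """Accept a client update only if both its L2 and L-infinity norms are
    within server-specified bounds."""
    return (np.linalg.norm(update, ord=2) <= l2_bound and
            np.max(np.abs(update)) <= linf_bound)

def robust_aggregate(updates, l2_bound=1.0, linf_bound=0.1):
    """Average only the updates that pass the norm checks."""
    accepted = [u for u in updates if within_bounds(u, l2_bound, linf_bound)]
    return np.mean(accepted, axis=0), len(accepted)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim = 1000
    honest = [rng.normal(0, 0.01, dim) for _ in range(9)]
    boosted = rng.normal(0, 0.01, dim) * 100.0   # a scaled-up malicious update
    agg, n_accepted = robust_aggregate(honest + [boosted])
    print("accepted", n_accepted, "of", len(honest) + 1, "updates")
```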

Graph Convolutional Networks (GCNs) have been widely used due to their outstanding performance in processing graph-structured data. However, their restriction to undirected graphs limits their application scope. In this paper, we extend spectral-based graph convolution to directed graphs by using first- and second-order proximity, which not only retains the connection properties of the directed graph but also expands the receptive field of the convolution operation. A new GCN model, called DGCN, is then designed to learn representations on the directed graph, leveraging both first- and second-order proximity information. We empirically show that DGCN encodes more useful information from the graph and helps achieve better performance when generalized to other models. Moreover, extensive experiments on citation networks and co-purchase datasets demonstrate the superiority of our model over state-of-the-art methods.
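
A small NumPy sketch of first- and second-order proximity on a directed graph, using a simple unnormalized definition that may differ from the paper's exact formulation: first-order proximity captures direct edges in either direction, while second-order proximity captures shared in-neighbors or shared out-neighbors.

```python
import numpy as np

def proximity_matrices(A: np.ndarray):
    """Toy first- and second-order proximity for a directed adjacency matrix A.

    - first-order:        an edge exists in either direction
    - second-order (in):  i and j share common in-neighbors  -> A^T A
    - second-order (out): i and j share common out-neighbors -> A A^T
    """
    first = np.maximum(A, A.T)
    second_in = A.T @ A
    second_out = A @ A.T
    np.fill_diagonal(second_in, 0)
    np.fill_diagonal(second_out, 0)
    return first, second_in, second_out

if __name__ == "__main__":
    # 0 -> 2 <- 1 and 2 -> 3: vertices 0 and 1 share the out-neighbor 2,
    # so they are second-order proximal despite having no direct edge.
    A = np.array([[0, 0, 1, 0],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1],
                  [0, 0, 0, 0]], dtype=float)
    first, s_in, s_out = proximity_matrices(A)
    print("first-order[0,1] =", first[0, 1], " second-order-out[0,1] =", s_out[0, 1])
```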

Auto-regressive sequence-to-sequence models with attention mechanism have achieved state-of-the-art performance in many tasks such as machine translation and speech synthesis. These models can be difficult to train. The standard approach, teacher forcing, guides a model with reference output history during training. The problem is that the model is unlikely to recover from its mistakes during inference, where the reference output is replaced by generated output. Several approaches deal with this problem, largely by guiding the model with generated output history. To make training stable, these approaches often require a heuristic schedule or an auxiliary classifier. This paper introduces attention forcing, which guides the model with generated output history and reference attention. This approach can train the model to recover from its mistakes, in a stable fashion, without the need for a schedule or a classifier. In addition, it allows the model to generate output sequences aligned with the references, which can be important for cascaded systems like many speech synthesis systems. Experiments on speech synthesis show that attention forcing yields significant performance gain. Experiments on machine translation show that for tasks where various re-orderings of the output are valid, guiding the model with generated output history is challenging, while guiding the model with reference attention is beneficial.
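
A toy, untrained NumPy sketch of a decoding loop under attention forcing; all shapes, parameters, and names are illustrative, not the paper's model. The decoder is fed its own generated output history, while the context vector at each step is computed from the reference attention weights, and the loss still compares against the reference tokens.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, D, T_ENC, T_DEC = 10, 8, 6, 5

# Toy fixed parameters (an illustration, not a trained model).
embed = rng.normal(size=(VOCAB, D))          # token embeddings
W_out = rng.normal(size=(VOCAB, 2 * D))      # output projection

H = rng.normal(size=(T_ENC, D))              # encoder states
ref_attention = np.eye(T_DEC, T_ENC)         # reference alignment (one row per step)
ref_tokens = rng.integers(0, VOCAB, T_DEC)   # reference output sequence

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def decode_step(prev_token, attn_weights):
    """One decoder step: context from the given attention weights,
    next-token distribution from [previous embedding ; context]."""
    context = attn_weights @ H
    logits = W_out @ np.concatenate([embed[prev_token], context])
    return softmax(logits)

# Attention forcing: feed back the model's OWN generated token (free running
# on the output history), but force the attention to the reference alignment.
prev, nll = 0, 0.0
for t in range(T_DEC):
    probs = decode_step(prev, ref_attention[t])
    nll += -np.log(probs[ref_tokens[t]])     # loss against the reference tokens
    prev = int(np.argmax(probs))             # generated, not reference, history
print("toy attention-forcing NLL:", round(float(nll), 3))
```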

Discrete correlation filter (DCF) based trackers have shown considerable success in visual object tracking. These trackers often make use of low- to mid-level features such as histograms of oriented gradients (HOG) and mid-layer activations from convolutional neural networks (CNNs). We argue that including semantically higher-level information in the tracked features may provide further robustness to challenging cases such as viewpoint changes. Deep salient object detection is one example of such high-level features, as it makes use of semantic information to highlight the important regions in a given scene. In this work, we propose an improvement over DCF-based trackers by combining saliency-based and other feature-based filter responses. This combination uses an adaptive weight on the saliency-based filter responses, which is automatically selected according to the temporal consistency of visual saliency. We show that our method consistently improves a baseline DCF-based tracker, especially in challenging cases, and performs favorably against the state of the art. Our improved tracker operates at 9.3 fps, introducing a small computational burden over the baseline, which operates at 11 fps.
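
A minimal sketch of the response fusion with an adaptive saliency weight, using a simple normalized-correlation consistency measure that is an assumption for illustration rather than the paper's exact criterion: when consecutive saliency maps agree, the saliency-based filter response gets more weight.

```python
import numpy as np

def temporal_consistency(prev_saliency, cur_saliency):
    """Normalized cross-correlation of consecutive saliency maps, clipped to
    [0, 1]; high values mean saliency is stable and can be trusted."""
    a = prev_saliency - prev_saliency.mean()
    b = cur_saliency - cur_saliency.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b) + 1e-12
    return float(np.clip((a * b).sum() / denom, 0.0, 1.0))

def fuse_responses(base_response, saliency_response, consistency, max_weight=0.5):
    """Blend the baseline DCF response with the saliency-based response,
    giving the latter more weight when saliency is temporally consistent."""
    w = max_weight * consistency
    return (1.0 - w) * base_response + w * saliency_response

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    H, W = 32, 32
    base = rng.random((H, W))                       # baseline filter response
    sal_resp = rng.random((H, W))                   # saliency-based filter response
    prev_sal, cur_sal = rng.random((H, W)), rng.random((H, W))

    c = temporal_consistency(prev_sal, cur_sal)
    fused = fuse_responses(base, sal_resp, c)
    peak = np.unravel_index(np.argmax(fused), fused.shape)
    print("consistency:", round(c, 3), "predicted target position:", peak)
```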
