亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

Fault-aware retraining has emerged as a prominent technique for mitigating permanent faults in Deep Neural Network (DNN) hardware accelerators. However, retraining leads to huge overheads, specifically when used for fine-tuning large DNNs designed for solving complex problems. Moreover, as each fabricated chip can have a distinct fault pattern, fault-aware retraining is required to be performed for each chip individually considering its unique fault map, which further aggravates the problem. To reduce the overall retraining cost, in this work, we introduce the concept of resilience-driven retraining amount selection. To realize this concept, we propose a novel framework, Reduce, that, at first, computes the resilience of the given DNN to faults at different fault rates and with different amounts of retraining. Then, based on the resilience, it computes the amount of retraining required for each chip considering its unique fault map. We demonstrate the effectiveness of our methodology for a systolic array-based DNN accelerator experiencing permanent faults in the computational array.

相關內容

This research proposes a machine learning-based attack detection model for power systems, specifically targeting smart grids. By utilizing data and logs collected from Phasor Measuring Devices (PMUs), the model aims to learn system behaviors and effectively identify potential security boundaries. The proposed approach involves crucial stages including dataset pre-processing, feature selection, model creation, and evaluation. To validate our approach, we used a dataset used, consist of 15 separate datasets obtained from different PMUs, relay snort alarms and logs. Three machine learning models: Random Forest, Logistic Regression, and K-Nearest Neighbour were built and evaluated using various performance metrics. The findings indicate that the Random Forest model achieves the highest performance with an accuracy of 90.56% in detecting power system disturbances and has the potential in assisting operators in decision-making processes.

Order execution is a fundamental task in quantitative finance, aiming at finishing acquisition or liquidation for a number of trading orders of the specific assets. Recent advance in model-free reinforcement learning (RL) provides a data-driven solution to the order execution problem. However, the existing works always optimize execution for an individual order, overlooking the practice that multiple orders are specified to execute simultaneously, resulting in suboptimality and bias. In this paper, we first present a multi-agent RL (MARL) method for multi-order execution considering practical constraints. Specifically, we treat every agent as an individual operator to trade one specific order, while keeping communicating with each other and collaborating for maximizing the overall profits. Nevertheless, the existing MARL algorithms often incorporate communication among agents by exchanging only the information of their partial observations, which is inefficient in complicated financial market. To improve collaboration, we then propose a learnable multi-round communication protocol, for the agents communicating the intended actions with each other and refining accordingly. It is optimized through a novel action value attribution method which is provably consistent with the original learning objective yet more efficient. The experiments on the data from two real-world markets have illustrated superior performance with significantly better collaboration effectiveness achieved by our method.

Training deep neural networks (DNNs) in low-dimensional subspaces is a promising direction for achieving efficient training and better generalization performance. Previous works extract the subspaces by using random projection or performing dimensionality reduction method on the training trajectory, but these methods can be inefficient or unstable in terms of dimensionality and numerical operations. In this paper, we connect subspace training to weight averaging and propose Trainable Weight Averaging (TWA), a general approach for subspace training that generalizes the previous efforts. TWA is efficient in terms of dimensionality and also easy to use, making it a promising new method for subspace training. We further design an efficient scheme for subspace training to cope with large-scale problems, which allows parallel training across multiple nodes and evenly distributing the memory and computation burden to each node. We apply TWA to efficient neural network training and improving fine-tuning performance tasks to demonstrate the great efficiency and effectiveness of our approach. We conduct extensive experiments that cover various benchmark computer vision and neural language processing tasks with various architectures. The code of implementation is available at //github.com/nblt/TWA.

The hardware computing landscape is changing. What used to be distributed systems can now be found on a chip with highly configurable, diverse, specialized and general purpose units. Such Systems-on-a-Chip (SoC) are used to control today's cyber-physical systems, being the building blocks of critical infrastructures. They are deployed in harsh environments and are connected to the cyberspace, which makes them exposed to both accidental faults and targeted cyberattacks. This is in addition to the changing fault landscape that continued technology scaling, emerging devices and novel application scenarios will bring. In this paper, we discuss how the very features, distributed, parallelized, reconfigurable, heterogeneous, that cause many of the imminent and emerging security and resilience challenges, also open avenues for their cure though SoC replication, diversity, rejuvenation, adaptation, and hybridization. We show how to leverage these techniques at different levels across the entire SoC hardware/software stack, calling for more research on the topic.

Synthetic time series are often used in practical applications to augment the historical time series dataset for better performance of machine learning algorithms, amplify the occurrence of rare events, and also create counterfactual scenarios described by the time series. Distributional-similarity (which we refer to as realism) as well as the satisfaction of certain numerical constraints are common requirements in counterfactual time series scenario generation requests. For instance, the US Federal Reserve publishes synthetic market stress scenarios given by the constrained time series for financial institutions to assess their performance in hypothetical recessions. Existing approaches for generating constrained time series usually penalize training loss to enforce constraints, and reject non-conforming samples. However, these approaches would require re-training if we change constraints, and rejection sampling can be computationally expensive, or impractical for complex constraints. In this paper, we propose a novel set of methods to tackle the constrained time series generation problem and provide efficient sampling while ensuring the realism of generated time series. In particular, we frame the problem using a constrained optimization framework and then we propose a set of generative methods including ``GuidedDiffTime'', a guided diffusion model to generate realistic time series. Empirically, we evaluate our work on several datasets for financial and energy data, where incorporating constraints is critical. We show that our approaches outperform existing work both qualitatively and quantitatively. Most importantly, we show that our ``GuidedDiffTime'' model is the only solution where re-training is not necessary for new constraints, resulting in a significant carbon footprint reduction.

The Data Science domain has expanded monumentally in both research and industry communities during the past decade, predominantly owing to the Big Data revolution. Artificial Intelligence (AI) and Machine Learning (ML) are bringing more complexities to data engineering applications, which are now integrated into data processing pipelines to process terabytes of data. Typically, a significant amount of time is spent on data preprocessing in these pipelines, and hence improving its e fficiency directly impacts the overall pipeline performance. The community has recently embraced the concept of Dataframes as the de-facto data structure for data representation and manipulation. However, the most widely used serial Dataframes today (R, pandas) experience performance limitations while working on even moderately large data sets. We believe that there is plenty of room for improvement by taking a look at this problem from a high-performance computing point of view. In a prior publication, we presented a set of parallel processing patterns for distributed dataframe operators and the reference runtime implementation, Cylon [1]. In this paper, we are expanding on the initial concept by introducing a cost model for evaluating the said patterns. Furthermore, we evaluate the performance of Cylon on the ORNL Summit supercomputer.

This paper focuses on the expected difference in borrower's repayment when there is a change in the lender's credit decisions. Classical estimators overlook the confounding effects and hence the estimation error can be magnificent. As such, we propose another approach to construct the estimators such that the error can be greatly reduced. The proposed estimators are shown to be unbiased, consistent, and robust through a combination of theoretical analysis and numerical testing. Moreover, we compare the power of estimating the causal quantities between the classical estimators and the proposed estimators. The comparison is tested across a wide range of models, including linear regression models, tree-based models, and neural network-based models, under different simulated datasets that exhibit different levels of causality, different degrees of nonlinearity, and different distributional properties. Most importantly, we apply our approaches to a large observational dataset provided by a global technology firm that operates in both the e-commerce and the lending business. We find that the relative reduction of estimation error is strikingly substantial if the causal effects are accounted for correctly.

The Q-learning algorithm is known to be affected by the maximization bias, i.e. the systematic overestimation of action values, an important issue that has recently received renewed attention. Double Q-learning has been proposed as an efficient algorithm to mitigate this bias. However, this comes at the price of an underestimation of action values, in addition to increased memory requirements and a slower convergence. In this paper, we introduce a new way to address the maximization bias in the form of a "self-correcting algorithm" for approximating the maximum of an expected value. Our method balances the overestimation of the single estimator used in conventional Q-learning and the underestimation of the double estimator used in Double Q-learning. Applying this strategy to Q-learning results in Self-correcting Q-learning. We show theoretically that this new algorithm enjoys the same convergence guarantees as Q-learning while being more accurate. Empirically, it performs better than Double Q-learning in domains with rewards of high variance, and it even attains faster convergence than Q-learning in domains with rewards of zero or low variance. These advantages transfer to a Deep Q Network implementation that we call Self-correcting DQN and which outperforms regular DQN and Double DQN on several tasks in the Atari 2600 domain.

Since hardware resources are limited, the objective of training deep learning models is typically to maximize accuracy subject to the time and memory constraints of training and inference. We study the impact of model size in this setting, focusing on Transformer models for NLP tasks that are limited by compute: self-supervised pretraining and high-resource machine translation. We first show that even though smaller Transformer models execute faster per iteration, wider and deeper models converge in significantly fewer steps. Moreover, this acceleration in convergence typically outpaces the additional computational overhead of using larger models. Therefore, the most compute-efficient training strategy is to counterintuitively train extremely large models but stop after a small number of iterations. This leads to an apparent trade-off between the training efficiency of large Transformer models and the inference efficiency of small Transformer models. However, we show that large models are more robust to compression techniques such as quantization and pruning than small models. Consequently, one can get the best of both worlds: heavily compressed, large models achieve higher accuracy than lightly compressed, small models.

Object detection typically assumes that training and test data are drawn from an identical distribution, which, however, does not always hold in practice. Such a distribution mismatch will lead to a significant performance drop. In this work, we aim to improve the cross-domain robustness of object detection. We tackle the domain shift on two levels: 1) the image-level shift, such as image style, illumination, etc, and 2) the instance-level shift, such as object appearance, size, etc. We build our approach based on the recent state-of-the-art Faster R-CNN model, and design two domain adaptation components, on image level and instance level, to reduce the domain discrepancy. The two domain adaptation components are based on H-divergence theory, and are implemented by learning a domain classifier in adversarial training manner. The domain classifiers on different levels are further reinforced with a consistency regularization to learn a domain-invariant region proposal network (RPN) in the Faster R-CNN model. We evaluate our newly proposed approach using multiple datasets including Cityscapes, KITTI, SIM10K, etc. The results demonstrate the effectiveness of our proposed approach for robust object detection in various domain shift scenarios.

北京阿比特科技有限公司