State-of-the-art techniques for addressing scaling-related main memory errors identify and repair bits that are at risk of error from within the memory controller. Unfortunately, modern main memory chips internally use on-die error correcting codes (on-die ECC) that obfuscate the memory controller's view of errors, complicating the process of identifying at-risk bits (i.e., error profiling). To understand the problems that on-die ECC causes for error profiling, we analytically study how on-die ECC changes the way that memory errors appear outside of the memory chip (e.g., to the memory controller). We show that on-die ECC introduces statistical dependence between errors in different bit positions, raising three key challenges for practical and effective error profiling. To address the three challenges, we introduce Hybrid Active-Reactive Profiling (HARP), a new error profiling algorithm that rapidly achieves full coverage of at-risk bits in memory chips that use on-die ECC. HARP separates error profiling into two phases: (1) using existing profiling techniques with the help of small modifications to the on-die ECC mechanism to quickly identify a subset of at-risk bits; and (2) using a secondary ECC within the memory controller to safely identify the remaining at-risk bits, if and when they fail. Our evaluations show that HARP achieves full coverage of all at-risk bits faster (e.g., 99th-percentile coverage 20.6%/36.4%/52.9%/62.1% faster, on average, given 2/3/4/5 raw bit errors per ECC word) than two state-of-the-art baseline error profiling algorithms, which sometimes fail to achieve full coverage. We perform a case study of how each profiler impacts the system's overall bit error rate (BER) when using a repair mechanism to tolerate DRAM data-retention errors. We show that HARP outperforms the best baseline algorithm (e.g., by 3.7x for a raw per-bit error probability of 0.75).
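To make the two-phase structure concrete, here is a minimal, self-contained sketch of the profiling flow described above; the word size, round count, per-bit error probability, and all function names are illustrative assumptions, not HARP's actual implementation.

```python
# Minimal sketch of HARP's two-phase idea (illustrative simulation,
# not the authors' implementation; all constants are assumptions).
import random

WORD_SIZE = 64      # data bits per on-die ECC word (assumed)
N_ROUNDS = 128      # active-profiling rounds (assumed)

def active_phase(at_risk, p_err=0.5):
    """Phase 1: round-based profiling records every bit seen in error,
    quickly covering bits that fail directly and often."""
    found = set()
    for _ in range(N_ROUNDS):
        found |= {b for b in at_risk if random.random() < p_err}
    return found

def reactive_phase(at_risk, found):
    """Phase 2: a secondary ECC in the memory controller corrects the
    remaining at-risk bits and flags each one on its first failure."""
    return at_risk - found

at_risk = set(random.sample(range(WORD_SIZE), 4))
covered = active_phase(at_risk)
covered |= reactive_phase(at_risk, covered)
assert covered == at_risk   # full coverage of at-risk bits
```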
Recent research in micro-architectural attacks has uncovered a variety of vulnerabilities on shared compute devices like CPUs and GPUs, which pose a substantial threat to cloud service providers and customers alike. Cloud service providers have therefore moved towards flexible systems that prioritize re-arrangeable hardware components that are not shared between users, to minimize attack surfaces while retaining scalability. In this work, we show that for the sake of system security it is necessary to consider not only the security of the processors and peripherals of a system but also the security of the subsystems that connect them. In particular, we investigate the side-channel leakage potential of the I/O translation look-aside buffer (IOTLB) used in I/O memory management units (IOMMUs) to cache address translations. To exploit the IOTLB, we design a hardware module deployed to an FPGA that helps us perform precise timing measurements. For the first time, we prove that the IOTLB is the source of a timing-based side-channel leakage and use it to create two covert channels: from CPU to peripheral and between peripherals. While the first channel exhibits an error rate of 30%, the latter proved to be very reliable, with nearly no errors occurring. We present a close look at web fingerprints collected through this side channel, and we examine the I/O operation of a GPU-accelerated SQL database. We then discuss several methods to remedy the observed side-channel leakage, including application design techniques, peripheral layout within existing systems, and micro-architectural features that could harden future IOMMUs.
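As a rough illustration of how such a timing channel can be exploited, the sketch below shows a prime-and-probe-style encoding over an IOTLB-like translation cache; `time_dma`, `touch`, and the threshold are hypothetical stand-ins (the paper relies on a custom FPGA module for the actual precise timing).

```python
# Illustrative encoding/decoding over an IOTLB-like cache
# (hypothetical stand-ins; not the paper's measurement code).
THRESHOLD_NS = 400   # assumed hit/miss latency boundary

def send_bit(bit, shared_page, touch):
    # Sender encodes '1' by touching the shared translation so it is
    # cached in the IOTLB, and '0' by leaving it uncached.
    if bit:
        touch(shared_page)

def recv_bit(shared_page, time_dma):
    # Receiver probes the same translation: a fast access (IOTLB hit)
    # decodes as '1', a slow one (miss -> page-table walk) as '0'.
    return 1 if time_dma(shared_page) < THRESHOLD_NS else 0
```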
With the advent of the Internet of Things (IoT), establishing a secure channel between smart devices becomes crucial. Recent research proposes zero-interaction pairing (ZIP), which enables pairing without user assistance by utilizing devices' physical context (e.g., ambient audio) to obtain a shared secret key. State-of-the-art ZIP schemes suffer from three limitations: (1) prolonged pairing time (i.e., minutes or hours), (2) vulnerability to brute-force offline attacks on the shared key, and (3) susceptibility to attacks caused by predictable context (e.g., replay attacks), because they rely on the limited entropy of the physical context to protect the shared key. We address these limitations by proposing FastZIP, a novel ZIP scheme that significantly reduces pairing time while preventing offline and predictable-context attacks. In particular, we adapt the recently introduced Fuzzy Password-Authenticated Key Exchange (fPAKE) protocol and utilize sensor fusion, maximizing the advantages of both. We instantiate FastZIP for intra-car device pairing to demonstrate its feasibility and show how the design of FastZIP can be adapted to other ZIP use cases. We implement FastZIP and evaluate it by driving four cars for a total of 800 km. We achieve up to three times shorter pairing times than state-of-the-art ZIP schemes while assuring robust security, with adversarial error rates below 0.5%.
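To illustrate the context-to-key step, the hedged sketch below turns fused sensor signals into a binary fingerprint, the kind of input whose small device-to-device mismatches an fPAKE-style protocol tolerates; the quantizer, sensor set, and thresholds are assumptions, not FastZIP's exact design.

```python
# Hedged sketch: derive a binary fingerprint from fused sensor
# context (illustrative quantizer; not FastZIP's actual scheme).
import numpy as np

def fingerprint(acc, gyro, bar, bits_per_sensor=24):
    """Split each (sufficiently long) sensor signal into chunks and
    quantize each chunk by comparing its mean to the sensor median."""
    out = []
    for sig in (acc, gyro, bar):
        sig = np.asarray(sig, float)
        chunks = np.array_split(sig, bits_per_sensor)
        med = np.median(sig)
        out += [1 if c.mean() > med else 0 for c in chunks]
    return out

def close_enough(fp_a, fp_b, max_hamming=8):
    # An fPAKE-style exchange succeeds iff the two fingerprints are
    # within the tolerated Hamming distance; otherwise pairing aborts.
    return sum(a != b for a, b in zip(fp_a, fp_b)) <= max_hamming
```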
In recent decades, improving satellite attitude accuracy, almost regardless of cost, has been one of the main concerns of researchers, and as a result a large number of control strategies have been applied to this problem. In this study, an adaptive neuro-fuzzy inference system (ANFIS) for satellite attitude estimation and control was developed. The controller is trained with data provided by an optimal controller, and a pulse modulator is used to generate the ON/OFF commands required by the thruster actuator. To evaluate the performance of the ANFIS controller in closed-loop simulation, an ANFIS observer is used to estimate the attitude and angular velocities of the satellite from magnetometer, sun sensor, and gyro data. In addition, a new ANFIS system that jointly controls and estimates the system is proposed and evaluated. The performance of the ANFIS controller is compared to that of an optimal PID controller in a Monte Carlo simulation with different initial conditions, disturbances, and noise. The results show that the ANFIS controller can surpass the optimal PID controller in several respects, including response time and smoothness. In addition, the ANFIS estimator is examined, and the results demonstrate the high capability of the designed observer. Finally, exploiting the high capacity of ANFIS, both the control and estimation tasks are carried out by a single ANFIS subsystem, and the performance of this combined model is demonstrated.
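To illustrate the "train the controller on data from an optimal controller" step, here is a hedged Python sketch; standard Python stacks have no built-in ANFIS, so a generic MLP regressor stands in for the neuro-fuzzy model, and the teacher control law, gains, and state layout are assumptions for illustration only.

```python
# Sketch of imitation training from an optimal teacher controller
# (MLPRegressor stands in for ANFIS; all values are assumptions).
import numpy as np
from sklearn.neural_network import MLPRegressor

def optimal_controller(state):
    # Stand-in for the optimal teacher: maps attitude error and
    # angular velocity to a commanded torque (assumed gains).
    att_err, omega = state
    return -2.0 * att_err - 0.5 * omega

rng = np.random.default_rng(0)
states = rng.uniform(-1, 1, size=(5000, 2))   # (att_err, omega) samples
torques = np.array([optimal_controller(s) for s in states])

surrogate = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=500)
surrogate.fit(states, torques)                # imitate the teacher
u = surrogate.predict(states[:1])             # continuous command
```

In the closed loop described above, a pulse modulator would then convert the surrogate's continuous torque command `u` into ON/OFF thruster pulses, e.g., via pulse-width modulation.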
We propose a proof-of-sequential-work (PoSW) that can be verified with only a single query to the random oracle per random challenge, allowing any verifier, even one with no parallelism, to verify using just a single sequential computation per challenge. All existing PoSWs [6, 3, 1, 4] mandate that a prover compute a sequence of responses from a random oracle over N rounds of queries. The prover then commits to this sequence using a commitment scheme (e.g., a Merkle-root-like commitment) predefined in the PoSW. The verifier asks the prover to provide a set of proofs for t randomly chosen checkpoints, called challenges, in the computed sequence, and recomputes the commitment from each of these proofs, spending O(log N) rounds of queries to the oracle. This can be reduced to a single round of queries only if the verifier has O(log N) parallelism [4]. The verifier in our PoSW demands no parallelism yet uses a single query to the random oracle to verify each of the t challenges. The key observation is that the commitment schemes in the prior works themselves demand O(log N) oracle queries to verify. Our PoSW therefore asks the prover to apply an additional efficient binary operation x to the responses obtained from the random oracle over the N rounds of queries. The cumulative result of x on all such responses, represented as a map f, serves the purpose of the commitment, and the verifier verifies this cumulative result.
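The following toy sketch captures only the key idea above, using XOR as the cheap binary operation x and SHA-256 as the random oracle; it omits the interactive proof structure and is not the paper's protocol.

```python
# Toy illustration of the cumulative-commitment idea (not the
# paper's full protocol): the prover folds all N oracle responses
# with a cheap binary operation (XOR here), and that cumulative
# value plays the role of the commitment.
import hashlib
from functools import reduce

def H(x: bytes) -> bytes:             # the "random oracle"
    return hashlib.sha256(x).digest()

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(p ^ q for p, q in zip(a, b))

def prove(seed: bytes, N: int):
    r, responses = seed, []
    for _ in range(N):                # N inherently sequential rounds
        r = H(r)
        responses.append(r)
    commitment = reduce(xor, responses)   # cumulative result of x
    return responses, commitment

def verify_challenge(i, responses):
    # One oracle query per challenge i (1 <= i < N): response i
    # must equal H(response i-1).
    return responses[i] == H(responses[i - 1])
```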
We identify a family of binary codes whose structure is similar to that of Reed-Muller (RM) codes and which includes RM codes as a strict subclass. The codes in this family are denoted $C_n(r,m)$, and their duals are denoted $B_n(r,m)$. The length of these codes is $n^m$, where $n \geq 2$, and $r$ is their `order'. When $n=2$, $C_n(r,m)$ is the RM code of order $r$ and length $2^m$. The special case of these codes corresponding to $n$ being an odd prime was studied by Berman (1967) and Blackmore and Norton (2001). Following the terminology introduced by Blackmore and Norton, we refer to $B_n(r,m)$ as the Berman code and $C_n(r,m)$ as the dual Berman code. We identify these codes using a recursive Plotkin-like construction, and we show that they have a rich automorphism group, are generated by their minimum-weight codewords, and can be decoded up to half the minimum distance efficiently. Using a result of Kumar et al. (2016), we show that these codes achieve the capacity of the binary erasure channel (BEC) under bit-MAP decoding. Furthermore, except for double transitivity, they satisfy all the code properties used by Reeves and Pfister to show that RM codes achieve the capacity of binary-input memoryless symmetric channels. Finally, when $n$ is odd, we identify a large class of abelian codes that includes $B_n(r,m)$ and $C_n(r,m)$ and achieves BEC capacity.
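For the familiar $n=2$ special case, the recursive Plotkin construction that this family generalizes can be sketched as follows (generator matrices over GF(2); the general-$n$ recursion is omitted here).

```python
# The n = 2 special case: Reed-Muller codes via the classical
# (u | u+v) Plotkin construction, which the family above generalizes
# to n parts (sketch over GF(2)).
import numpy as np

def rm_generator(r, m):
    """Generator matrix of RM(r, m), built recursively via Plotkin."""
    if r == 0:
        return np.ones((1, 2 ** m), dtype=int)   # repetition code
    if r == m:
        return np.eye(2 ** m, dtype=int)         # the whole space
    G1 = rm_generator(r, m - 1)                  # 'u' part
    G2 = rm_generator(r - 1, m - 1)              # 'v' part
    top = np.hstack([G1, G1])                    # (u | u)
    bot = np.hstack([np.zeros_like(G2), G2])     # (0 | v)
    return np.vstack([top, bot]) % 2

G = rm_generator(1, 3)   # RM(1,3): the [8,4,4] extended Hamming code
assert G.shape == (4, 8)
```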
We consider the reinforcement learning problem for partially observed Markov decision processes (POMDPs) with large or even countably infinite state spaces, where the controller has access to only noisy observations of the underlying controlled Markov chain. We consider a natural actor-critic method that employs a finite internal memory for policy parameterization, and a multi-step temporal difference learning algorithm for policy evaluation. We establish, to the best of our knowledge, the first non-asymptotic global convergence of actor-critic methods for partially observed systems under function approximation. In particular, in addition to the function approximation and statistical errors that also arise in MDPs, we explicitly characterize the error due to the use of finite-state controllers. This additional error is stated in terms of the total variation distance between the traditional belief state in POMDPs and the posterior distribution of the hidden state when using a finite-state controller. Further, we show that this error can be made small in the case of sliding-block controllers by using larger block sizes.
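A minimal sketch of the finite-memory (sliding-block) policy parameterization described above is given below; the tabular softmax encoding, warm-up behavior, and all names are illustrative assumptions rather than the paper's construction.

```python
# Minimal sketch of a finite-memory policy: condition on a sliding
# block of the last k noisy observations instead of the full history
# (illustrative parameterization, not the paper's code).
from collections import deque
import numpy as np

class SlidingBlockPolicy:
    def __init__(self, k, n_obs, n_actions, rng=None):
        self.window = deque(maxlen=k)            # finite internal memory
        # softmax policy: one logit vector per encoded memory state
        self.theta = np.zeros((n_obs ** k, n_actions))
        self.n_obs, self.k = n_obs, k
        self.rng = rng or np.random.default_rng()

    def _memory_index(self):
        # encode the k-length observation block as a single integer
        idx = 0
        for o in self.window:
            idx = idx * self.n_obs + o
        return idx

    def act(self, obs):
        self.window.append(obs)
        if len(self.window) < self.k:            # warm-up: act uniformly
            return self.rng.integers(self.theta.shape[1])
        logits = self.theta[self._memory_index()]
        p = np.exp(logits - logits.max())
        p /= p.sum()
        return self.rng.choice(len(p), p=p)
```

Larger block sizes k shrink the finite-memory error term at the cost of an exponentially larger memory-state space, which is the trade-off the sliding-block result above quantifies.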
Ensemble methods based on subsampling, such as random forests, are popular in applications due to their high predictive accuracy. Existing literature views a random forest prediction as an infinite-order incomplete U-statistic in order to quantify its uncertainty. However, these methods focus on a small subsampling size for each tree, which is theoretically valid but practically limiting. This paper develops an unbiased variance estimator based on incomplete U-statistics that allows the tree size to be comparable with the overall sample size, making statistical inference possible in a broader range of real applications. Simulation results demonstrate that our estimator enjoys lower bias and more accurate confidence interval coverage without additional computational cost. We also propose a local smoothing procedure to reduce the variation of our estimator, which shows improved numerical performance when the number of trees is relatively small. Further, we investigate the ratio consistency of our proposed variance estimator under specific scenarios. In particular, we develop a new "double U-statistic" formulation to analyze the Hoeffding decomposition of the estimator's variance.
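For intuition, the sketch below estimates the leading (first-order) variance component of a subsampled ensemble in the style of earlier U-statistic analyses (e.g., Mentch and Hooker); it is not the paper's unbiased estimator, and `fit_predict`, the group sizes, and the shared-anchor trick are illustrative assumptions.

```python
# Illustrative leading-term variance estimate for a subsampled
# ensemble, in the spirit of U-statistic analyses (NOT the paper's
# unbiased estimator; fit_predict and all constants are assumptions).
import numpy as np

def leading_variance(fit_predict, X, y, x0, s, n_anchors=25, per_anchor=40):
    """Estimate zeta_1 by grouping base learners that share one fixed
    ('anchor') subsample point, then scale by s^2 / n."""
    n = len(y)
    rng = np.random.default_rng(0)
    group_means = []
    for _ in range(n_anchors):
        anchor = rng.integers(n)                 # shared first point
        rest_pool = np.delete(np.arange(n), anchor)
        preds = []
        for _ in range(per_anchor):
            idx = np.concatenate(
                ([anchor], rng.choice(rest_pool, size=s - 1, replace=False)))
            # fit_predict: fit one tree on the subsample, predict at x0
            preds.append(fit_predict(X[idx], y[idx], x0))
        group_means.append(np.mean(preds))
    zeta_1 = np.var(group_means, ddof=1)         # between-anchor variability
    return (s ** 2 / n) * zeta_1                 # first-order term
```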
Convolutional Neural Networks (CNNs) are the go-to model for computer vision. Recently, attention-based networks, such as the Vision Transformer, have also become popular. In this paper, we show that while convolutions and attention are both sufficient for good performance, neither of them is necessary. We present MLP-Mixer, an architecture based exclusively on multi-layer perceptrons (MLPs). MLP-Mixer contains two types of layers: one with MLPs applied independently to image patches (i.e., "mixing" the per-location features), and one with MLPs applied across patches (i.e., "mixing" spatial information). When trained on large datasets, or with modern regularization schemes, MLP-Mixer attains competitive scores on image classification benchmarks, with pre-training and inference costs comparable to state-of-the-art models. We hope that these results spark further research beyond the realms of well-established CNNs and Transformers.
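The two layer types are easy to express concretely; below is a minimal numpy sketch of one Mixer layer (layer normalization is omitted, ReLU stands in for GELU, and the sizes are typical assumed values, not the exact published configurations).

```python
# Minimal numpy sketch of one Mixer layer: a token-mixing MLP across
# patches, then a channel-mixing MLP per patch (simplified; not the
# released implementation).
import numpy as np

def mlp(x, w1, w2):
    h = np.maximum(x @ w1, 0)        # ReLU stands in for GELU here
    return h @ w2

def mixer_layer(X, token_w1, token_w2, chan_w1, chan_w2):
    """X: (patches, channels). LayerNorm omitted for brevity."""
    # token mixing: transpose so the MLP acts along the patch axis
    X = X + mlp(X.T, token_w1, token_w2).T
    # channel mixing: the MLP acts along the channel axis, per patch
    X = X + mlp(X, chan_w1, chan_w2)
    return X

P, C, H = 196, 512, 2048                         # typical sizes (assumed)
rng = np.random.default_rng(0)
X = rng.standard_normal((P, C))
out = mixer_layer(X,
                  rng.standard_normal((P, H)) * 0.01,   # token MLP
                  rng.standard_normal((H, P)) * 0.01,
                  rng.standard_normal((C, H)) * 0.01,   # channel MLP
                  rng.standard_normal((H, C)) * 0.01)
```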
An effective and efficient architecture performance evaluation scheme is essential for the success of Neural Architecture Search (NAS). To save computational cost, most existing NAS algorithms train and evaluate intermediate neural architectures on a small proxy dataset with limited training epochs, but such a coarse evaluation can hardly be expected to yield an accurate performance estimate for an architecture. This paper advocates a new neural architecture evaluation scheme that aims to determine which architecture would perform better, instead of accurately predicting absolute architecture performance. Therefore, we propose a \textbf{relativistic} architecture performance predictor for NAS (ReNAS). We encode neural architectures into feature tensors and further refine the representations with the predictor. The proposed relativistic performance predictor can be deployed in discrete searching methods to search for the desired architectures without additional evaluation. Experimental results on the NAS-Bench-101 dataset suggest that sampling 424 ($0.1\%$ of the entire search space) neural architectures and their corresponding validation performance is already enough for learning an accurate architecture performance predictor. The accuracies of our searched neural architectures on the NAS-Bench-101 and NAS-Bench-201 datasets are higher than those of the state-of-the-art methods, demonstrating the superiority of the proposed method.
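A relativistic predictor can be trained with a pairwise ranking objective rather than a regression loss; the hinge formulation below is one common choice and is only illustrative of the idea, not ReNAS's exact loss.

```python
# Sketch of a pairwise ("which is better?") training signal for a
# relativistic predictor (illustrative hinge ranking loss; not
# ReNAS's exact objective).
import numpy as np

def pairwise_hinge_loss(scores, accs, margin=0.1):
    """scores: predictor outputs per architecture; accs: measured
    validation accuracies. Penalize wrongly ordered pairs."""
    loss, n_pairs = 0.0, 0
    for i in range(len(accs)):
        for j in range(len(accs)):
            if accs[i] > accs[j]:          # i should outscore j
                loss += max(0.0, margin - (scores[i] - scores[j]))
                n_pairs += 1
    return loss / max(n_pairs, 1)
```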
We combine two of the most popular approaches to automated Grammatical Error Correction (GEC): GEC based on Statistical Machine Translation (SMT) and GEC based on Neural Machine Translation (NMT). The hybrid system achieves new state-of-the-art results on the CoNLL-2014 and JFLEG benchmarks. This GEC system preserves the accuracy of SMT output and, at the same time, generates more fluent sentences, as is typical for NMT. Our analysis shows that the created systems are closer to reaching human-level performance than any other GEC system reported so far.