This paper presents a design space exploration for SABER, one of the finalists in NIST's quantum-resistant public-key cryptographic standardization effort. Our design space exploration targets a 65nm ASIC platform and has resulted in the evaluation of six different architectures. Our exploration starts from a baseline architecture ported from an FPGA design. To improve the clock frequency, the primary goal of our exploration, we employ three optimizations: (i) use of compiled memories in a 'smart synthesis' fashion, (ii) pipelining, and (iii) logic sharing between SABER building blocks. The most optimized architecture uses four register files and achieves a clock frequency of 1 GHz while requiring an area of only 0.314 mm². Moreover, physical synthesis is carried out for this architecture and a tapeout-ready layout is presented. The estimated dynamic power consumption of the high-frequency architecture is approximately 184 mW for key generation and 187 mW for encapsulation or decapsulation. These results strongly suggest that our optimized accelerator architecture is well suited for high-speed cryptographic applications.
Small-size robots offer access to spaces that are inaccessible to larger ones. This type of access is crucial in applications such as drug delivery, environmental detection, and collection of small samples. However, some tasks cannot be performed by a single robot, including assembly and manufacturing at small scales, manipulation of micro- and nano-objects, and robot-based structuring of small-scale materials. One solution is to use a group of robots as a system. Thus, we focus on tasks that can be achieved using a group of small-scale robots. Because of their size, these robots are typically externally actuated, which raises the challenge of controlling a group of robots using a single global input. We propose a control algorithm to position individual members of a swarm at predefined positions. A single control input applies to the whole system and moves all robots in the same direction. We add another control modality by using robots of different lengths. An electromagnetic coil system applies external forces to steer the millirobots, which can move in various modes of motion such as pivot walking and tumbling. We propose two new designs of these millirobots. In the first design, the magnets are placed at the center of the body to reduce the magnetic attraction force. In the second design, the millirobots are of identical length but have two extra legs acting as pivot points. In this way, we vary the pivot separation by design to take advantage of variable speed in pivot walking mode while keeping the speed constant in tumbling mode. This paper presents a general algorithm for positional control of $n$ millirobots with different lengths that moves them from given initial positions to desired final ones. The method is based on choosing a leader that is fully controllable. Simulations and hardware experiments validate these results.
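The extra control modality can be illustrated with a deliberately simplified 1D model (an assumption, not the paper's robot dynamics): a global pivot-walking command $p$ moves robot $i$ by $L_i p$, proportional to its pivot separation $L_i$, while a global tumbling command $t$ moves every robot by the same distance $t$. Two robots with different $L_i$ can then reach independent targets by solving a $2\times 2$ linear system. The function names are hypothetical, and this two-robot case is much simpler than the paper's $n$-robot, leader-based algorithm.

```python
def plan_two_robots(start, goal, pivot_sep):
    """Hypothetical 1D model: displacement_i = pivot_sep[i] * p + t.

    p is the signed amount of pivot walking and t the signed tumbling
    distance; both commands are global and act on every robot.
    """
    d0 = goal[0] - start[0]
    d1 = goal[1] - start[1]
    if pivot_sep[0] == pivot_sep[1]:
        raise ValueError("robots with equal pivot separation cannot be separated")
    # Subtracting the two equations cancels the common tumbling term t.
    p = (d0 - d1) / (pivot_sep[0] - pivot_sep[1])
    t = d0 - pivot_sep[0] * p
    return p, t


def apply_commands(start, pivot_sep, p, t):
    """Final position of each robot under the same hypothetical model."""
    return [x + sep * p + t for x, sep in zip(start, pivot_sep)]
```

For example, `plan_two_robots((0, 0), (5, 3), (2, 1))` returns `(2.0, 1.0)`: two units of pivot walking separate the robots, and one unit of tumbling shifts both together.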
This paper presents non-binary polar codes for the two-user multiple-access channel (MAC). The bit error rate (BER) performance of non-binary polar codes with different kernel factors is investigated in detail in order to select a suitable parameter from GF(q) for the generator matrix. Furthermore, successive cancellation decoding of non-binary polar codes in the two-user MAC is described. Simulation results show that the choice of kernel factor has a significant impact on block error rate (BLER) performance and that non-binary polar codes achieve better BLER performance than their binary counterparts in the two-user MAC.
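To make the role of the kernel factor concrete, assume (the abstract does not spell out the kernel) a $2\times 2$ kernel $G=\begin{pmatrix}1 & 0\\ \gamma & 1\end{pmatrix}$ with kernel factor $\gamma \in \mathrm{GF}(q)\setminus\{0\}$, and a prime $q$ so that field arithmetic is arithmetic modulo $q$. Encoding $x = u\,G^{\otimes n}$ then has a simple recursive form, because $G^{\otimes n} = \begin{pmatrix}G' & 0\\ \gamma G' & G'\end{pmatrix}$ with $G' = G^{\otimes(n-1)}$. The sketch below omits the bit-reversal permutation and frozen-symbol selection; `polar_encode_gfq` is a hypothetical name.

```python
def polar_encode_gfq(u, gamma, q):
    """Compute x = u * G^{(x)n} over GF(q), q prime, kernel [[1, 0], [gamma, 1]].

    u has length 2**n with entries in {0, ..., q-1}; no bit-reversal is applied.
    """
    n = len(u)
    if n == 1:
        return [u[0] % q]
    half = n // 2
    a, b = u[:half], u[half:]
    # [a b] * [[G', 0], [gamma*G', G']] = [(a + gamma*b) G', b G']
    left = [(ai + gamma * bi) % q for ai, bi in zip(a, b)]
    return polar_encode_gfq(left, gamma, q) + polar_encode_gfq(b, gamma, q)
```

For instance, with $q=5$ and $\gamma=2$, `polar_encode_gfq([1, 2, 3, 4], 2, 5)` gives `[2, 0, 1, 4]`. Changing `gamma` changes the codeword, which is the degree of freedom whose effect on error rate the paper studies.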
We consider the query complexity of finding a local minimum of a function defined on a graph, where at most $k$ rounds of interaction with the oracle are allowed. Rounds model parallel settings, where each query takes resources to complete and is executed on a separate processor. Thus the query complexity in $k$ rounds informs how many processors are needed to achieve a parallel time of $k$. We focus on the $d$-dimensional grid $[n]^d$, where the dimension $d$ is a constant, and consider two regimes for the number of rounds: constant and polynomial in $n$. We give algorithms and lower bounds that characterize the trade-off between the number of rounds of adaptivity and the query complexity of local search. When the number of rounds $k$ is constant, we show that the query complexity of local search in $k$ rounds is $\Theta\bigl(n^{\frac{d^{k+1} - d^k}{d^k - 1}}\bigr)$, for both deterministic and randomized algorithms. When the number of rounds is polynomial, i.e., $k = n^{\alpha}$ for $0 < \alpha < d/2$, the randomized query complexity is $\Theta\left(n^{d-1 - \frac{d-2}{d}\alpha}\right)$ for all $d \geq 5$. For $d=3$ and $d=4$, we show that the same upper bound expression holds and give almost matching lower bounds. The local search analysis also enables us to characterize the query complexity of computing a Brouwer fixed point in rounds. Our proof technique for lower bounding the query complexity in rounds may be of independent interest as an alternative to the classical relational adversary method of Aaronson from the fully adaptive setting.
The blind deconvolution problem amounts to reconstructing both a signal and a filter from their convolution. It is a prominent topic in the mathematical and engineering literature. In this work, we analyze a sparse version of the problem: the filter $h\in \mathbb{R}^\mu$ is assumed to be $s$-sparse, and the signal $b \in \mathbb{R}^n$ is taken to be $\sigma$-sparse, with both supports unknown. We observe a convolution between the filter and a linear transformation of the signal. Motivated by practically important multi-user communication applications, we derive a recovery guarantee for the simultaneous demixing and deconvolution setting. We achieve efficient recovery by relaxing the problem to a hierarchical sparse recovery problem, for which we can build on a flexible framework. The price for this is a guarantee that is sub-optimal relative to the number of free parameters of the problem. The signal model we consider is general enough to capture many applications across engineering fields. Although the bi-sparse and generalized demixing setting is of practical importance, rigorous performance guarantees for efficient and simple algorithms have been lacking; we provide the first such guarantees. We complement our analytical results with numerical simulations. We find evidence that the sub-optimal scaling $s^2\sigma \log(\mu)\log(n)$ of our derived sufficient condition is likely overly pessimistic and that the observed performance is better described by a scaling proportional to $s\sigma$ up to log-factors.
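One concrete ingredient of hierarchical sparse recovery is the projection onto hierarchically sparse vectors. As a sketch, assume the lifted unknown is the $\mu\times n$ matrix $h b^\top$, which has at most $s$ nonzero rows (the support of $h$), each with at most $\sigma$ nonzero entries (the support of $b$). The hypothetical helper below computes the $(s,\sigma)$-hierarchical thresholding used in hard-thresholding algorithms such as HiHTP: keep the $\sigma$ largest-magnitude entries in every row, then keep the $s$ rows with the largest remaining $\ell_2$ norm. The paper's framework may organize this step differently.

```python
import numpy as np


def hierarchical_threshold(X, s, sigma):
    """Project a (mu x n) array onto (s, sigma)-hierarchically sparse arrays:
    at most s nonzero rows, each with at most sigma nonzero entries."""
    X = np.asarray(X, dtype=float)
    Y = np.zeros_like(X)
    # Step 1: within each row (block), keep the sigma largest-magnitude entries.
    idx = np.argsort(-np.abs(X), axis=1)[:, :sigma]
    rows = np.arange(X.shape[0])[:, None]
    Y[rows, idx] = X[rows, idx]
    # Step 2: keep the s rows whose thresholded blocks have the largest l2 norm.
    keep = np.argsort(-np.linalg.norm(Y, axis=1))[:s]
    out = np.zeros_like(X)
    out[keep] = Y[keep]
    return out
```

An exactly $(s,\sigma)$-sparse outer product $h b^\top$ is left unchanged by this projection, while a generic matrix is reduced to at most $s\sigma$ nonzero entries.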
In binary classification, imbalance refers to situations in which one class is heavily under-represented. This issue stems either from the data collection process or from one class being genuinely rare in the population. Imbalanced classification frequently arises in applications such as biology, medicine, engineering, and social sciences. In this manuscript, for the first time, we theoretically study the impact of imbalanced class sizes on linear discriminant analysis (LDA) in high dimensions. We show that, due to data scarcity in one class, referred to as the minority class, and the high dimensionality of the feature space, LDA ignores the minority class, yielding a maximum misclassification rate. We then propose a new construction of a hard-thresholding rule based on a divide-and-conquer technique that reduces the large difference between the misclassification rates of the two classes. We show that the proposed method is asymptotically optimal. We further study two well-known sparse versions of LDA in imbalanced cases. We evaluate the finite-sample performance of different methods using simulations and by analyzing two real data sets. The results show that our method either outperforms its competitors or has comparable performance based on a much smaller subset of selected features, while being computationally more efficient.
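For reference, the textbook two-class LDA rule can be sketched as follows, with a pooled covariance estimate and a plug-in prior term $\log(n_1/n_0)$ in the threshold, which is one place where class sizes enter the rule. This is only an illustration of the rule's form under the assumption $n_0+n_1-2>p$ (so the pooled covariance is invertible); it does not reproduce the paper's high-dimensional analysis or its divide-and-conquer hard-thresholding method. The function names are hypothetical.

```python
import numpy as np


def lda_fit(X0, X1):
    """Two-class LDA with pooled covariance and a plug-in prior term.

    X0, X1: (n0, p) and (n1, p) arrays for class 0 and class 1.
    Assumes n0 + n1 - 2 > p so that the pooled covariance is invertible.
    """
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    n0, n1 = len(X0), len(X1)
    D0, D1 = X0 - mu0, X1 - mu1
    S = (D0.T @ D0 + D1.T @ D1) / (n0 + n1 - 2)  # pooled covariance
    w = np.linalg.solve(S, mu1 - mu0)            # discriminant direction
    b = -w @ (mu0 + mu1) / 2 + np.log(n1 / n0)   # threshold incl. log prior ratio
    return w, b


def lda_predict(X, w, b):
    """Assign class 1 when the discriminant score is positive."""
    return (X @ w + b > 0).astype(int)
```

When $n_1 \ll n_0$, the term $\log(n_1/n_0)$ is strongly negative and shifts the threshold toward the majority class; the abstract attributes LDA's failure to the combination of minority-class scarcity and high dimensionality, which this low-dimensional sketch does not attempt to capture.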
Wheeler DFAs (WDFAs) are a sub-class of finite-state automata that is playing an important role in the emerging field of compressed data structures: unlike general automata, WDFAs can be stored in just $\log\sigma + O(1)$ bits per edge, $\sigma$ being the alphabet's size, and they support optimal-time pattern matching queries on the substring closure of the language they recognize. An important step toward further compression is minimization. When the input $\mathcal A$ is a general deterministic finite-state automaton (DFA), the state of the art is Hopcroft's classic algorithm, which runs in $O(|\mathcal A|\log |\mathcal A|)$ time. This algorithm is at the core of the only existing minimization algorithm for Wheeler DFAs, which inherits its complexity. In this work, we show that the minimum WDFA equivalent to a given input WDFA can be computed in linear $O(|\mathcal A|)$ time. When run on de Bruijn WDFAs built from real DNA datasets, an implementation of our algorithm reduces the number of nodes by 14% to 51% at a speed of more than 1 million nodes per second.
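The property that makes an automaton "Wheeler" can be checked directly from the standard Wheeler axioms: states with no incoming edges come first; for edges $(u,v,a)$ and $(u',v',a')$, $a < a'$ implies $v < v'$; and $a = a'$ with $u < u'$ implies $v \le v'$. The checker below is a hypothetical, brute-force illustration (quadratic in the number of edges) of that definition only; it is unrelated to the paper's linear-time minimization algorithm.

```python
def is_wheeler_order(edges, order):
    """Brute-force check of the Wheeler axioms.

    edges: list of (source, target, label) triples.
    order: dict mapping every state to its rank in the candidate order.
    Runs in O(|edges|^2) time; illustration only.
    """
    has_incoming = {v for _, v, _ in edges}
    sources = [q for q in order if q not in has_incoming]
    others = [q for q in order if q in has_incoming]
    # Axiom 0: states without incoming edges precede all other states.
    if sources and others and max(order[q] for q in sources) > min(order[q] for q in others):
        return False
    for u, v, a in edges:
        for u2, v2, a2 in edges:
            # Axiom 1: a smaller label must lead to a strictly smaller target.
            if a < a2 and not order[v] < order[v2]:
                return False
            # Axiom 2: equal labels must preserve the order of the sources.
            if a == a2 and order[u] < order[u2] and not order[v] <= order[v2]:
                return False
    return True
```

For the edges `s -a-> p`, `s -b-> q`, `p -b-> q`, the order `s < p < q` satisfies the axioms, while swapping the ranks of `p` and `q` violates Axiom 1.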
Unpaired image-to-image translation has been applied successfully to natural images but has received very little attention for manifold-valued data such as diffusion tensor imaging (DTI). The non-Euclidean nature of DTI prevents current generative adversarial networks (GANs) from generating plausible images and has so far restricted their application mainly to diffusion MRI scalar maps, such as fractional anisotropy (FA) or mean diffusivity (MD). Although these scalar maps are clinically useful, they mostly ignore fiber orientations and therefore have limited use for analyzing brain fibers. Here, we propose a manifold-aware CycleGAN that learns to generate high-resolution DTI from unpaired T1w images. We formulate the objective as a Wasserstein distance minimization problem between data distributions on the Riemannian manifold of symmetric positive definite 3×3 matrices, SPD(3), using adversarial and cycle-consistency losses. To ensure that the generated diffusion tensors lie on the SPD(3) manifold, we exploit the theoretical properties of the exponential and logarithm maps of the Log-Euclidean metric. We demonstrate that, unlike standard GANs, our method can generate realistic high-resolution DTI that can be used to compute diffusion-based metrics and potentially to run fiber tractography algorithms. To evaluate our model's performance, we compute the cosine similarity between the principal orientations of the generated tensors and their ground truth, the mean squared error (MSE) of their derived FA values, and the Log-Euclidean distance between the tensors. Our method achieves a 2.5 times lower FA MSE than a standard CycleGAN and up to 30% better cosine similarity than a manifold-aware Wasserstein GAN, while synthesizing sharp high-resolution DTI.
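The Log-Euclidean machinery and the evaluation metrics can be sketched with eigendecompositions. Under the Log-Euclidean metric, an SPD matrix $S$ maps to the symmetric matrix $\log S$, the matrix exponential of any symmetric matrix is SPD (its eigenvalues are $e^{\lambda_i} > 0$), and the distance is $\lVert \log A - \log B\rVert_F$. FA uses the standard formula $\sqrt{3/2}\,\lVert\lambda - \bar\lambda\rVert / \lVert\lambda\rVert$. Taking the absolute value in the cosine similarity, because an eigenvector's sign is arbitrary, is an assumption of this sketch; how these maps are wired into the CycleGAN is not shown, and the function names are hypothetical.

```python
import numpy as np


def spd_log(S):
    """Matrix logarithm of an SPD matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return (V * np.log(w)) @ V.T


def sym_exp(X):
    """Matrix exponential of a symmetric matrix; the result is always SPD."""
    X = (X + X.T) / 2  # symmetrize, e.g., a raw network output
    w, V = np.linalg.eigh(X)
    return (V * np.exp(w)) @ V.T


def log_euclidean_distance(A, B):
    return np.linalg.norm(spd_log(A) - spd_log(B), ord="fro")


def fractional_anisotropy(S):
    lam = np.linalg.eigvalsh(S)
    return np.sqrt(1.5) * np.linalg.norm(lam - lam.mean()) / np.linalg.norm(lam)


def principal_cosine(A, B):
    """|cos| between principal eigenvectors (eigh sorts eigenvalues ascending)."""
    va = np.linalg.eigh(A)[1][:, -1]
    vb = np.linalg.eigh(B)[1][:, -1]
    return abs(va @ vb)
```

For example, the prolate tensor `np.diag([3.0, 1.0, 1.0])` has FA $2/\sqrt{11}$, and an isotropic tensor has FA 0.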
Currently, neural network architecture design is mostly guided by the \emph{indirect} metric of computation complexity, i.e., FLOPs. However, the \emph{direct} metric, e.g., speed, also depends on other factors such as memory access cost and platform characteristics. Thus, this work proposes to evaluate the direct metric on the target platform, rather than only considering FLOPs. Based on a series of controlled experiments, this work derives several practical \emph{guidelines} for efficient network design. Accordingly, a new architecture, called \emph{ShuffleNet V2}, is presented. Comprehensive ablation experiments verify that our model is state-of-the-art in terms of the speed and accuracy tradeoff.
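To see why FLOPs alone can mislead, consider a simple memory-access model (assumed here; it ignores caches and platform details) for a 1×1 convolution on an $h\times w$ feature map with $c_1$ input and $c_2$ output channels: FLOPs $= hwc_1c_2$, while the memory access cost counts the input map, the output map, and the weights, $\text{MAC} = hw(c_1+c_2) + c_1c_2$. The helper name is hypothetical.

```python
def conv1x1_cost(h, w, c1, c2):
    """Return (FLOPs, memory access cost) of a 1x1 conv under a simple model."""
    flops = h * w * c1 * c2
    mac = h * w * (c1 + c2) + c1 * c2  # input map + output map + weights
    return flops, mac
```

On a 56×56 map, `conv1x1_cost(56, 56, 128, 128)` gives `(51380224, 819200)`, while `conv1x1_cost(56, 56, 64, 256)` has the same FLOPs but a MAC of `1019904`, roughly 24% more memory traffic. Two layers that look identical by FLOPs can therefore differ in speed, which is the motivation for measuring the direct metric.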
The Pachinko Allocation Machine (PAM) is a deep topic model that can represent rich correlation structures among topics by a directed acyclic graph over topics. Because of the model's flexibility, however, approximate inference is very difficult. Perhaps for this reason, only a small number of potential PAM architectures have been explored in the literature. In this paper, we present an efficient and flexible amortized variational inference method for PAM, using a deep inference network to parameterize the approximate posterior distribution in a manner similar to the variational autoencoder. Our inference method produces more coherent topics than state-of-the-art inference methods for PAM while being an order of magnitude faster, which allows exploration of a wider range of PAM architectures than have previously been studied.
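The amortization idea, one network mapping every document to the parameters of its approximate posterior instead of optimizing per-document variational parameters, can be sketched with a VAE-style encoder. This is a hypothetical single-level sketch under loudly stated assumptions: a logistic-normal posterior over topic proportions, the reparameterization trick, and untrained random weights. It does not model PAM's DAG of super- and sub-topics or the paper's actual inference network.

```python
import numpy as np


def init_encoder(vocab_size, hidden, n_topics, rng):
    """Random (untrained) weights for a hypothetical one-hidden-layer encoder."""
    return {
        "W1": rng.normal(0.0, 0.1, (vocab_size, hidden)), "b1": np.zeros(hidden),
        "Wmu": rng.normal(0.0, 0.1, (hidden, n_topics)), "bmu": np.zeros(n_topics),
        "Wlv": rng.normal(0.0, 0.1, (hidden, n_topics)), "blv": np.zeros(n_topics),
    }


def encode(params, counts, rng):
    """Map bag-of-words counts (docs x vocab) to sampled topic proportions."""
    h = np.tanh(counts @ params["W1"] + params["b1"])
    mu = h @ params["Wmu"] + params["bmu"]
    logvar = h @ params["Wlv"] + params["blv"]
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(0.5 * logvar) * eps  # reparameterization trick
    theta = np.exp(z - z.max(axis=1, keepdims=True))
    theta /= theta.sum(axis=1, keepdims=True)  # softmax -> topic proportions
    kl = 0.5 * np.sum(np.exp(logvar) + mu**2 - 1 - logvar, axis=1)  # KL(q || N(0, I))
    return theta, kl
```

In training, the KL term and a reconstruction likelihood would be combined into an ELBO and the encoder weights learned jointly with the model; a single forward pass then replaces per-document inference, which is where the speed-up of amortization comes from.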
Robust estimation is much more challenging in high dimensions than in one dimension: most techniques either lead to intractable optimization problems or yield estimators that can tolerate only a tiny fraction of errors. Recent work in theoretical computer science has shown that, in appropriate distributional models, it is possible to robustly estimate the mean and covariance with polynomial-time algorithms that can tolerate a constant fraction of corruptions, independent of the dimension. However, the sample and time complexities of these algorithms are prohibitively large for high-dimensional applications. In this work, we address both of these issues by establishing sample complexity bounds that are optimal up to logarithmic factors, and by giving various refinements that allow the algorithms to tolerate a much larger fraction of corruptions. Finally, we show on both synthetic and real data that our algorithms have state-of-the-art performance and make high-dimensional robust estimation a realistic possibility.
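The corrupted-mean problem can be made concrete with a simplified spectral filter in the spirit of the dimension-independent algorithms mentioned above. This is a hypothetical sketch, not the paper's algorithm: it assumes inliers with roughly identity covariance, and the step size `eps_step` and threshold `tau` are illustrative choices. While the empirical covariance has an eigenvalue well above 1, the points that deviate most along the top eigenvector are discarded.

```python
import numpy as np


def filtered_mean(X, eps_step=0.05, tau=0.5, max_iter=50):
    """Simplified spectral filter for robust mean estimation.

    Assumes inliers have (roughly) identity covariance.
    """
    X = np.asarray(X, dtype=float)
    for _ in range(max_iter):
        mu = X.mean(axis=0)
        cov = np.cov(X, rowvar=False)
        lam, V = np.linalg.eigh(cov)  # eigenvalues in ascending order
        if lam[-1] <= 1 + tau:
            break  # no direction with suspiciously large variance remains
        # Squared deviation of each point along the top eigenvector.
        scores = ((X - mu) @ V[:, -1]) ** 2
        n_remove = max(1, int(eps_step * len(X)))
        keep = np.argsort(scores)[: len(X) - n_remove]
        X = X[keep]
    return X.mean(axis=0)
```

With 1000 standard Gaussian inliers in 20 dimensions and 100 outliers placed at a single far-away point, the naive mean is pulled toward the outliers, whereas the filter removes them within a few iterations and the filtered mean stays close to the true mean at the origin.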