CRYSTALS-Dilithium is a lattice-based signature scheme selected by NIST for standardization as the primary post-quantum signature algorithm. In this work, we present a thorough study of optimizing Dilithium implementations using the Advanced Vector Extensions (AVX), specifically AVX2 and the more recent AVX-512. We first present an improved parallel small-polynomial multiplication with tailored early evaluation (PSPM-TEE) that further speeds up the signing procedure, yielding a speedup of 5\%-6\% over the original PSPM Dilithium implementation. We then present a tailored reduction method that is simpler and faster than Montgomery reduction. Our optimized AVX2 implementation achieves a speedup of 3\%-8\% over the state-of-the-art Dilithium AVX2 software. Finally, we propose, for the first time, a fully and highly vectorized implementation of Dilithium using AVX-512, achieved by carefully vectorizing most Dilithium functions with AVX-512 instructions to improve both time and space efficiency. With all of these optimizations, our AVX-512 implementation improves performance by 37.3\%/50.7\%/39.7\% in key generation, 34.1\%/37.1\%/42.7\% in signing, and 38.1\%/38.7\%/40.7\% in verification for the parameter sets Dilithium2/3/5, respectively. To the best of our knowledge, our AVX-512 implementation achieves the best performance to date for Dilithium on Intel x64 CPUs.
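For context on the reduction step, the sketch below shows the standard signed Montgomery reduction for the Dilithium modulus $q = 8380417$ in plain Python; it is meant only as a baseline illustration of what a tailored reduction would replace (the function name is ours, and real implementations operate on fixed-width machine words rather than Python integers).

```python
# Minimal sketch of signed Montgomery reduction modulo the Dilithium prime
# q = 8380417 with radix R = 2^32 (baseline illustration only).
Q = 8380417              # q = 2^23 - 2^13 + 1
R = 1 << 32              # Montgomery radix
QINV = pow(Q, -1, R)     # q^{-1} mod 2^32

def montgomery_reduce(a: int) -> int:
    """For |a| < q*R/2, return a value congruent to a * R^{-1} (mod q)."""
    t = (a * QINV) % R    # low 32 bits of a * q^{-1}
    if t >= R // 2:       # reinterpret as a signed 32-bit value, as in C code,
        t -= R            # so the result stays small in magnitude
    return (a - t * Q) >> 32   # exactly divisible by R; result not fully reduced
```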
These are self-contained lecture notes on spectral independence. For an $n$-vertex graph, the spectral independence condition is a bound on the maximum eigenvalue of the $n\times n$ influence matrix, whose entries capture the influence between pairs of vertices; this matrix is closely related to the covariance matrix. We will present recent results showing that spectral independence implies that the mixing time of the Glauber dynamics is polynomial (where the degree of the polynomial depends on certain parameters). The proof utilizes local-to-global theorems, which we will detail in these notes. Finally, we will present more recent results showing that spectral independence implies an optimal bound on the relaxation time (inverse spectral gap) and, with some additional conditions, an optimal mixing time bound of $O(n\log{n})$ for the Glauber dynamics. Our focus is on the analysis of the spectral gap of the Glauber dynamics from a functional-analysis perspective of analyzing the associated local and global variance, and we present proofs of the associated local-to-global theorems from this same Markov chain perspective.
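For concreteness, one common formulation of the influence matrix, stated here for a two-spin system such as the hardcore model rather than in full generality, is
$$\mathcal{I}_G(v \to u) \;=\; \Pr\big[\,u \in \sigma \mid v \in \sigma\,\big] \;-\; \Pr\big[\,u \in \sigma \mid v \notin \sigma\,\big], \qquad u \neq v,$$
and the distribution is $\eta$-spectrally independent if $\lambda_{\max}(\mathcal{I}_{G,\tau}) \le \eta$ for every pinning $\tau$ of a subset of the vertices.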
Statistical quality control methods are essential for maintaining standard production in manufacturing processes. In this regard, there are many classical techniques for controlling a process. Many of them rest on a global assumption about the distribution of the process data, namely that it is normal, which is clearly not valid for all processes. Control charts built on such an invalid assumption make wrong decisions that waste resources. Hence, the main question when working with a multivariate data set is how to find a multivariate distribution for the data that preserves the original dependence between the variables. A copula function guarantees that this dependence is retained in the resulting joint distribution. However, a copula alone is not enough when no further information about the underlying population is available and only a data set is at hand; we therefore apply the maximum entropy concept to deal with this situation. In this paper, we first obtain the joint distribution of a data set from a manufacturing process that must remain in control during production. We then derive an elliptical control limit via maximum copula entropy. Average run lengths are calculated for several means and shifts to show the ability of maximum copula entropy. Finally, two practical data examples are presented, and the results of our method are compared with the traditional approach based on the Fisher distribution.
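As a point of reference for the elliptical control region, the sketch below shows the classical construction based on the Mahalanobis distance with a chi-square limit, assuming a normal in-control model; this is the standard baseline rather than the maximum-copula-entropy limit developed in the paper, and the function names and default false-alarm rate are ours.

```python
# Classical elliptical control region: squared Mahalanobis distance of a new
# observation against a chi-square quantile (baseline illustration only).
import numpy as np
from scipy import stats

def elliptical_control_limit(X, alpha=0.0027):
    """X: (n, p) in-control data. Returns (mean, inverse covariance, limit)."""
    mu = X.mean(axis=0)
    S_inv = np.linalg.inv(np.cov(X, rowvar=False))
    limit = stats.chi2.ppf(1.0 - alpha, df=X.shape[1])  # ~3-sigma false-alarm rate
    return mu, S_inv, limit

def out_of_control(x, mu, S_inv, limit):
    d2 = (x - mu) @ S_inv @ (x - mu)   # squared Mahalanobis distance
    return d2 > limit
```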
Monitoring a process over time is important in manufacturing in order to reduce the waste of money and time. Charts such as Shewhart, CUSUM, and EWMA are commonly used to monitor a single quality attribute and are applied to different kinds of processes with various ranges of shifts. In some cases, however, the process quality is characterized by different types of profiles rather than a single attribute. The purpose of this article is to monitor profile coefficients instead of a process mean. Two methods are proposed and compared for simultaneously monitoring the intercept and slope of a simple linear profile: the first is based on linear regression and the second on the maximum entropy principle. The Hotelling $T^2$ statistic is used to reduce the two coefficients to a scalar. A simulation study compares the two methods in terms of type II error and average run length. Finally, two real examples are presented to demonstrate the applicability of the proposed chart: the first concerns semiconductors and the second a pharmaceutical production process. The performance of the two methods is broadly similar; however, maximum entropy correctly identifies the changes in the pharmaceutical example, whereas linear regression does not.
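The shared charting step can be sketched as follows, assuming the in-control coefficient mean and covariance are known or estimated from Phase I data (the function names are ours and the sketch is illustrative, not the paper's exact procedure).

```python
# Reduce each sampled profile to its fitted (intercept, slope) and chart a
# Hotelling-type T^2 statistic on the coefficient vector.
import numpy as np

def profile_coefficients(x, y):
    """Least-squares (intercept, slope) of y against x for one profile."""
    A = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

def t2_statistic(beta, beta0, cov_beta0):
    """T^2 for one profile, given in-control mean beta0 and covariance cov_beta0."""
    d = beta - beta0
    return float(d @ np.linalg.inv(cov_beta0) @ d)
```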
Viewpoint planning is an important task in any application where objects or scenes need to be viewed from different angles to achieve sufficient coverage. The mapping of confined spaces such as shelves is an especially challenging task, since objects occlude each other and the scene can only be observed from the front, limiting the possible viewpoints. In this paper, we propose a deep reinforcement learning framework that generates promising views aimed at reducing the map entropy. Additionally, the pipeline extends standard viewpoint planning by predicting adequate, minimally invasive push actions to uncover occluded objects and increase the visible space. Using a 2.5D occupancy height map as a state representation that can be efficiently updated, our system decides whether to plan a new viewpoint or perform a push. To learn feasible pushes, we use a neural network to sample push candidates on the map, trained on data provided by human experts. As simulated and real-world experiments with a robotic arm show, our system significantly increases the mapped space compared to several baselines, while the executed push actions substantially benefit the viewpoint planner and cause only minor changes to the object configuration.
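As an illustration of the objective the planner reduces, the sketch below computes the summed Shannon entropy of per-cell occupancy probabilities; the exact entropy formulation used by the system may differ, and the function name is ours.

```python
# Shannon entropy of an occupancy map: uncertain cells (p ~ 0.5) contribute
# most, so lowering this value corresponds to better coverage.
import numpy as np

def map_entropy(p_occ, eps=1e-12):
    """p_occ: array of per-cell occupancy probabilities in [0, 1]."""
    p = np.clip(p_occ, eps, 1.0 - eps)
    return float(np.sum(-p * np.log2(p) - (1.0 - p) * np.log2(1.0 - p)))
```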
The Independent Cutset problem asks whether a given graph has a set of vertices that is both independent and a cutset. The problem is $\textsf{NP}$-complete even when the input graph is planar and has maximum degree five. In this paper, we first present an $\mathcal{O}^*(1.4423^{n})$-time algorithm for the problem, and we show how to compute a minimum independent cutset (if any) within the same running time. Since the property of having an independent cutset is MSO$_1$-expressible, our main results concern structural parameterizations of the problem by parameters that are not bounded by any function of the clique-width of the input. We present $\textsf{FPT}$-time algorithms for the problem parameterized by: the dual of the maximum degree, the dual of the solution size, the size of a dominating set (where a dominating set is given as an additional input), the size of an odd cycle transversal, the distance to chordal graphs, and the distance to $P_5$-free graphs. We close by introducing the notion of $\alpha$-domination, which allows us to identify further fixed-parameter tractable and polynomial-time solvable cases.
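To make the decision problem concrete, the sketch below gives a brute-force check of whether a vertex set is an independent cutset; it is purely illustrative and assumes a connected input graph, whereas the algorithms in the paper are far more refined.

```python
# Check whether S is an independent cutset of the graph given as an adjacency
# dict (vertex -> set of neighbours); assumes the graph is connected.
from collections import deque

def is_independent_cutset(adj, S):
    if any(u in adj[v] for v in S for u in S if u != v):
        return False                       # S is not independent
    remaining = set(adj) - S
    if not remaining:
        return False                       # nothing left to disconnect
    start = next(iter(remaining))
    seen, queue = {start}, deque([start])
    while queue:                           # BFS in G - S
        v = queue.popleft()
        for u in adj[v]:
            if u in remaining and u not in seen:
                seen.add(u)
                queue.append(u)
    return len(seen) < len(remaining)      # disconnected => S is a cutset
```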
The homology groups of a simplicial complex reveal fundamental properties of the topology of the data or the system, and the notion of topological stability naturally poses an important yet not fully investigated question. In the current work, we study stability in terms of the smallest perturbation sufficient to change the dimension of the corresponding homology group. Such a definition requires an appropriate weighting and normalization procedure for the boundary operators acting on the homology groups of the Hodge algebra. Using the resulting boundary operators, we then formulate the question of structural stability as a spectral matrix nearness problem for the corresponding higher-order graph Laplacian. We develop a bilevel optimization procedure suitable for the formulated matrix nearness problem and illustrate the method's performance on a variety of synthetic quasi-triangulation datasets and transportation networks.
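The spectral reformulation rests on a standard fact of combinatorial Hodge theory: with $B_k$ denoting the $k$-th boundary operator, the $k$-th Hodge Laplacian is
$$L_k \;=\; B_k^{\top} B_k \;+\; B_{k+1} B_{k+1}^{\top}, \qquad \dim \ker L_k \;=\; \beta_k,$$
where $\beta_k$ is the $k$-th Betti number. The stability question therefore becomes how small a perturbation of the weights can change the multiplicity of the zero eigenvalue of $L_k$; this is stated here in its unweighted textbook form, which the paper's weighted and normalized operators refine.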
We study the Landau-de Gennes Q-tensor model of liquid crystals subjected to an electric field and develop a fully discrete numerical scheme for its solution. The scheme uses a convex splitting of the bulk potential, and we introduce a truncation operator for the Q-tensors to ensure well-posedness of the problem. We prove the stability and well-posedness of the scheme. Under a restriction on the admissible parameters of the scheme, we show that, up to a subsequence, solutions of the fully discrete scheme converge to weak solutions of the Q-tensor model as the time step and mesh are refined. Finally, we present numerical results computed with the scheme; in particular, we show that it is possible to simulate the Fr\'eedericksz transition with this scheme.
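For orientation, one common form of the Landau-de Gennes bulk potential and the generic convex-splitting idea (stated as an illustration; the paper's precise splitting may differ) is
$$f_b(Q) \;=\; \frac{A}{2}\,\mathrm{tr}(Q^2) \;-\; \frac{B}{3}\,\mathrm{tr}(Q^3) \;+\; \frac{C}{4}\,\big(\mathrm{tr}(Q^2)\big)^2, \qquad f_b \;=\; f_c \;-\; f_e,$$
with $f_c$ and $f_e$ both convex, so that a discrete time step treats the derivative of $f_c$ implicitly at $Q^{n+1}$ and that of $f_e$ explicitly at $Q^{n}$, the standard route to an energy-stable update.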
A mainstream line of current self-supervised learning methods pursues a general-purpose representation that can be well transferred to downstream tasks, typically by optimizing on a given pretext task such as instance discrimination. In this work, we argue that existing pretext tasks inevitably introduce biases into the learned representation, which in turn leads to biased transfer performance on various downstream tasks. To cope with this issue, we propose Maximum Entropy Coding (MEC), a more principled objective that explicitly optimizes on the structure of the representation, so that the learned representation is less biased and thus generalizes better to unseen downstream tasks. Inspired by the principle of maximum entropy in information theory, we hypothesize that a generalizable representation should be the one that admits the maximum entropy among all plausible representations. To make the objective end-to-end trainable, we propose to leverage the minimal coding length in lossy data coding as a computationally tractable surrogate for the entropy, and further derive a scalable reformulation of the objective that allows fast computation. Extensive experiments demonstrate that MEC learns a more generalizable representation than previous methods based on specific pretext tasks. It achieves state-of-the-art performance consistently on various downstream tasks, including not only ImageNet linear probe, but also semi-supervised classification, object detection, instance segmentation, and object tracking. Interestingly, we show that existing batch-wise and feature-wise self-supervised objectives can be seen as equivalent to low-order approximations of MEC. Code and pre-trained models are available at //github.com/xinliu20/MEC.
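The sketch below shows a log-determinant coding-length surrogate of the kind the objective is built on; the precise constants, normalization, and the scalable Taylor-expansion reformulation follow the paper, so the form here is an illustrative assumption and the function name is ours.

```python
# Illustrative log-det coding-length surrogate for the entropy of a batch of
# representations Z (rows are L2-normalized feature vectors).
import numpy as np

def coding_length_surrogate(Z, eps=0.5):
    """Z: (m, d) batch of representations; eps: assumed distortion level."""
    m, d = Z.shape
    lam = d / (m * eps ** 2)
    _, logdet = np.linalg.slogdet(np.eye(d) + lam * (Z.T @ Z))
    return 0.5 * (m + d) * logdet   # larger value ~ higher-entropy representation
```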
A core capability of intelligent systems is the ability to quickly learn new tasks by drawing on prior experience. Gradient (or optimization) based meta-learning has recently emerged as an effective approach for few-shot learning. In this formulation, meta-parameters are learned in the outer loop, while task-specific models are learned in the inner loop, using only a small amount of data from the current task. A key challenge in scaling these approaches is the need to differentiate through the inner loop learning process, which can impose considerable computational and memory burdens. By drawing upon implicit differentiation, we develop the implicit MAML algorithm, which depends only on the solution to the inner-level optimization and not the path taken by the inner loop optimizer. This effectively decouples the meta-gradient computation from the choice of inner loop optimizer. As a result, our approach is agnostic to the choice of inner loop optimizer and can gracefully handle many gradient steps without vanishing gradients or memory constraints. Theoretically, we prove that implicit MAML can compute accurate meta-gradients with a memory footprint that is, up to small constant factors, no more than that which is required to compute a single inner loop gradient and with no overall increase in the total computational cost. Experimentally, we show that these benefits of implicit MAML translate into empirical gains on few-shot image recognition benchmarks.
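The key identity can be stated as follows: with a proximally regularized inner problem $\phi_i^{\star}(\theta) = \arg\min_{\phi}\,\hat{\mathcal{L}}_i(\phi) + \frac{\lambda}{2}\lVert \phi - \theta\rVert^2$ (notation ours), the implicit function theorem gives, under suitable regularity,
$$\frac{d\phi_i^{\star}}{d\theta} \;=\; \Big(I \;+\; \tfrac{1}{\lambda}\,\nabla_{\phi}^{2}\hat{\mathcal{L}}_i\big(\phi_i^{\star}\big)\Big)^{-1},$$
so the meta-gradient depends only on the inner solution $\phi_i^{\star}$ and not on the optimization path; in practice the inverse is applied approximately, for example with conjugate gradient.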
High spectral dimensionality and the shortage of annotations make hyperspectral image (HSI) classification a challenging problem. Recent studies suggest that convolutional neural networks can learn discriminative spatial features, which play a paramount role in HSI interpretation. However, most of these methods ignore the distinctive spectral-spatial characteristics of hyperspectral data. In addition, a large amount of unlabeled data remains an unexploited gold mine for efficient data use. Therefore, we propose an integration of generative adversarial networks (GANs) and probabilistic graphical models for HSI classification. Specifically, we use a spectral-spatial generator and a discriminator to identify land cover categories of hyperspectral cubes. Moreover, to take advantage of a large amount of unlabeled data, we adopt a conditional random field to refine the preliminary classification results generated by the GAN. Experimental results obtained on two commonly studied datasets demonstrate that the proposed framework achieves encouraging classification accuracy using a small number of training samples.