The utilization of finite field multipliers is pervasive in contemporary digital systems, with hardware implementation for bit parallel operation often necessitating millions of logic gates. However, various digital design issues, whether natural or stemming from soft errors, can result in gate malfunction, ultimately leading to erroneous multiplier outputs. Thus, to prevent susceptibility to error, it is imperative to employ an effective finite field multiplier implementation that boasts a robust fault detection capability. This study proposes a novel fault detection scheme for a recent bit-parallel polynomial basis multiplier over GF(2m), intended to achieve optimal fault detection performance for finite field multipliers while simultaneously maintaining a low-complexity implementation, a favored attribute in resource-constrained applications like smart cards. The primary concept behind the proposed approach is centered on the implementation of a BCH decoder that utilizes re-encoding technique and FIBM algorithm in its first and second sub-modules, respectively. This approach serves to address hardware complexity concerns while also making use of Berlekamp-Rumsey-Solomon (BRS) algorithm and Chien search method in the third sub-module of the decoder to effectively locate errors with minimal delay. The results of our synthesis indicate that our proposed error detection and correction architecture for a 45-bit multiplier with 5-bit errors achieves a 37% and 49% reduction in critical path delay compared to existing designs. Furthermore, the hardware complexity associated with a 45-bit multiplicand that contains 5 errors is confined to a mere 80%, which is significantly lower than the most exceptional BCH-based fault recognition methodologies, including TMR, Hamming's single error correction, and LDPC-based procedures within the realm of finite field multiplication.
The transformer-based semantic segmentation approaches, which divide the image into different regions by sliding windows and model the relation inside each window, have achieved outstanding success. However, since the relation modeling between windows was not the primary emphasis of previous work, it was not fully utilized. To address this issue, we propose a Graph-Segmenter, including a Graph Transformer and a Boundary-aware Attention module, which is an effective network for simultaneously modeling the more profound relation between windows in a global view and various pixels inside each window as a local one, and for substantial low-cost boundary adjustment. Specifically, we treat every window and pixel inside the window as nodes to construct graphs for both views and devise the Graph Transformer. The introduced boundary-aware attention module optimizes the edge information of the target objects by modeling the relationship between the pixel on the object's edge. Extensive experiments on three widely used semantic segmentation datasets (Cityscapes, ADE-20k and PASCAL Context) demonstrate that our proposed network, a Graph Transformer with Boundary-aware Attention, can achieve state-of-the-art segmentation performance.
Network alignment (NA) is the task of finding the correspondence of nodes between two networks based on the network structure and node attributes. Our study is motivated by the fact that, since most of existing NA methods have attempted to discover all node pairs at once, they do not harness information enriched through interim discovery of node correspondences to more accurately find the next correspondences during the node matching. To tackle this challenge, we propose Grad-Align, a new NA method that gradually discovers node pairs by making full use of node pairs exhibiting strong consistency, which are easy to be discovered in the early stage of gradual matching. Specifically, Grad-Align first generates node embeddings of the two networks based on graph neural networks along with our layer-wise reconstruction loss, a loss built upon capturing the first-order and higher-order neighborhood structures. Then, nodes are gradually aligned by computing dual-perception similarity measures including the multi-layer embedding similarity as well as the Tversky similarity, an asymmetric set similarity using the Tversky index applicable to networks with different scales. Additionally, we incorporate an edge augmentation module into Grad-Align to reinforce the structural consistency. Through comprehensive experiments using real-world and synthetic datasets, we empirically demonstrate that Grad-Align consistently outperforms state-of-the-art NA methods.
Non-volatile memory (NVM) crossbars have been identified as a promising technology, for accelerating important machine learning operations, with matrix-vector multiplication being a key example. Binary neural networks (BNNs) are especially well-suited for use with NVM crossbars due to their use of a low-bitwidth representation for both activations and weights. However, the aggressive quantization of BNNs can result in suboptimal accuracy, and the analog effects of NVM crossbars can further degrade the accuracy during inference. This paper presents a comprehensive study that benchmarks BNNs trained and validated on ImageNet and deployed on NeuroSim, a simulator for NVM-crossbar-based PIM architecture. Our study analyzes the impact of various parameters, such as input precision and ADC resolution, on both the accuracy of the inference and the hardware performance metrics. We have found that an ADC resolution of 8-bit with an input precision of 4-bit achieves near-optimal accuracy compared to the original BNNs. In addition, we have identified bottleneck components in the PIM architecture that affect area, latency, and energy consumption, and we demonstrate the impact that different BNN layers have on hardware performance.
Virtual try-on is a critical image synthesis task that aims to transfer clothes from one image to another while preserving the details of both humans and clothes. While many existing methods rely on Generative Adversarial Networks (GANs) to achieve this, flaws can still occur, particularly at high resolutions. Recently, the diffusion model has emerged as a promising alternative for generating high-quality images in various applications. However, simply using clothes as a condition for guiding the diffusion model to inpaint is insufficient to maintain the details of the clothes. To overcome this challenge, we propose an exemplar-based inpainting approach that leverages a warping module to guide the diffusion model's generation effectively. The warping module performs initial processing on the clothes, which helps to preserve the local details of the clothes. We then combine the warped clothes with clothes-agnostic person image and add noise as the input of diffusion model. Additionally, the warped clothes is used as local conditions for each denoising process to ensure that the resulting output retains as much detail as possible. Our approach, namely Diffusion-based Conditional Inpainting for Virtual Try-ON (DCI-VTON), effectively utilizes the power of the diffusion model, and the incorporation of the warping module helps to produce high-quality and realistic virtual try-on results. Experimental results on VITON-HD demonstrate the effectiveness and superiority of our method.
We present a new hydrodynamic model for incompressible binary fluids that is thermodynamically consistent and non-isothermal. This model follows the generalized Onsager principle and Boussinesq approximation and preserves the volume of each fluid phase and the positive entropy production rate under consistent boundary conditions. To solve the governing partial differential equations in the model numerically, we design a set of second-order, volume and entropy-production-rate preserving numerical algorithms. Using an efficient adaptive time-stepping strategy, we conduct several numerical simulations. These simulations accurately simulate the Rayleigh-B\'{e}nard convection in binary fluids and the interfacial dynamics between two immiscible fluids under the effects of the temperature gradient, gravity, and interfacial forces. Our numerical results show roll cell patterns and thermally induced mixing of binary fluids in a rectangular computational domain with a set of specific boundary conditions: a zero-velocity boundary condition all around, the insulation boundary condition at the lateral boundaries, and an imposed temperature difference vertically. We also perform long-time simulations of interfacial dynamics, demonstrating the robustness of our new structure-preserving schemes and reveal interesting fluid mixing phenomena.
Increasingly large and complex spatial datasets pose massive inferential challenges due to high computational and storage costs. Our study is motivated by the KAUST Competition on Large Spatial Datasets 2023, which tasked participants with estimating spatial covariance-related parameters and predicting values at testing sites, along with uncertainty estimates. We compared various statistical and deep learning approaches through cross-validation and ultimately selected the Vecchia approximation technique for model fitting. To overcome the constraints in the R package GpGp, which lacked support for fitting zero-mean Gaussian processes and direct uncertainty estimation-two things that are necessary for the competition, we developed additional \texttt{R} functions. Besides, we implemented certain subsampling-based approximations and parametric smoothing for skewed sampling distributions of the estimators. Our team DesiBoys secured victory in two out of four sub-competitions, validating the effectiveness of our proposed strategies. Moreover, we extended our evaluation to a large real spatial satellite-derived dataset on total precipitable water, where we compared the predictive performances of different models using multiple diagnostics.
By interacting, synchronizing, and cooperating with its physical counterpart in real time, digital twin is promised to promote an intelligent, predictive, and optimized modern city. Via interconnecting massive physical entities and their virtual twins with inter-twin and intra-twin communications, the Internet of digital twins (IoDT) enables free data exchange, dynamic mission cooperation, and efficient information aggregation for composite insights across vast physical/virtual entities. However, as IoDT incorporates various cutting-edge technologies to spawn the new ecology, severe known/unknown security flaws and privacy invasions of IoDT hinders its wide deployment. Besides, the intrinsic characteristics of IoDT such as \emph{decentralized structure}, \emph{information-centric routing} and \emph{semantic communications} entail critical challenges for security service provisioning in IoDT. To this end, this paper presents an in-depth review of the IoDT with respect to system architecture, enabling technologies, and security/privacy issues. Specifically, we first explore a novel distributed IoDT architecture with cyber-physical interactions and discuss its key characteristics and communication modes. Afterward, we investigate the taxonomy of security and privacy threats in IoDT, discuss the key research challenges, and review the state-of-the-art defense approaches. Finally, we point out the new trends and open research directions related to IoDT.
Deep reinforcement learning algorithms can perform poorly in real-world tasks due to the discrepancy between source and target environments. This discrepancy is commonly viewed as the disturbance in transition dynamics. Many existing algorithms learn robust policies by modeling the disturbance and applying it to source environments during training, which usually requires prior knowledge about the disturbance and control of simulators. However, these algorithms can fail in scenarios where the disturbance from target environments is unknown or is intractable to model in simulators. To tackle this problem, we propose a novel model-free actor-critic algorithm -- namely, state-conservative policy optimization (SCPO) -- to learn robust policies without modeling the disturbance in advance. Specifically, SCPO reduces the disturbance in transition dynamics to that in state space and then approximates it by a simple gradient-based regularizer. The appealing features of SCPO include that it is simple to implement and does not require additional knowledge about the disturbance or specially designed simulators. Experiments in several robot control tasks demonstrate that SCPO learns robust policies against the disturbance in transition dynamics.
With the rapid increase of large-scale, real-world datasets, it becomes critical to address the problem of long-tailed data distribution (i.e., a few classes account for most of the data, while most classes are under-represented). Existing solutions typically adopt class re-balancing strategies such as re-sampling and re-weighting based on the number of observations for each class. In this work, we argue that as the number of samples increases, the additional benefit of a newly added data point will diminish. We introduce a novel theoretical framework to measure data overlap by associating with each sample a small neighboring region rather than a single point. The effective number of samples is defined as the volume of samples and can be calculated by a simple formula $(1-\beta^{n})/(1-\beta)$, where $n$ is the number of samples and $\beta \in [0,1)$ is a hyperparameter. We design a re-weighting scheme that uses the effective number of samples for each class to re-balance the loss, thereby yielding a class-balanced loss. Comprehensive experiments are conducted on artificially induced long-tailed CIFAR datasets and large-scale datasets including ImageNet and iNaturalist. Our results show that when trained with the proposed class-balanced loss, the network is able to achieve significant performance gains on long-tailed datasets.
Object detection typically assumes that training and test data are drawn from an identical distribution, which, however, does not always hold in practice. Such a distribution mismatch will lead to a significant performance drop. In this work, we aim to improve the cross-domain robustness of object detection. We tackle the domain shift on two levels: 1) the image-level shift, such as image style, illumination, etc, and 2) the instance-level shift, such as object appearance, size, etc. We build our approach based on the recent state-of-the-art Faster R-CNN model, and design two domain adaptation components, on image level and instance level, to reduce the domain discrepancy. The two domain adaptation components are based on H-divergence theory, and are implemented by learning a domain classifier in adversarial training manner. The domain classifiers on different levels are further reinforced with a consistency regularization to learn a domain-invariant region proposal network (RPN) in the Faster R-CNN model. We evaluate our newly proposed approach using multiple datasets including Cityscapes, KITTI, SIM10K, etc. The results demonstrate the effectiveness of our proposed approach for robust object detection in various domain shift scenarios.