Face morphing attacks seek to deceive a Face Recognition (FR) system by presenting a morphed image consisting of the biometric qualities from two different identities with the aim of triggering a false acceptance with one of the two identities, thereby presenting a significant threat to biometric systems. The success of a morphing attack depends on the ability of the morphed image to represent the biometric characteristics of both identities that were used to create the image. We present a novel morphing attack that uses a diffusion-based architecture to improve the visual fidelity of the image and the ability of the morphing attack to represent characteristics from both identities. We demonstrate the effectiveness of the proposed attack by evaluating its visual fidelity via the Fréchet Inception Distance (FID). We also conduct extensive experiments to measure the vulnerability of FR systems to the proposed attack. The ability of a morphing attack detector to detect the proposed attack is measured and compared against two state-of-the-art GAN-based morphing attacks along with two landmark-based attacks. Additionally, a novel metric to measure the relative strength between different morphing attacks is introduced and evaluated.
We revisit the fundamental Boolean Matrix Multiplication (BMM) problem. With the invention of algebraic fast matrix multiplication over 50 years ago, it also became known that BMM can be solved in truly subcubic $O(n^\omega)$ time, where $\omega<3$; much work has gone into bringing $\omega$ closer to $2$. Since then, a parallel line of work has sought comparably fast combinatorial algorithms but with limited success. The naive $O(n^3)$-time algorithm was initially improved by a $\log^2{n}$ factor [Arlazarov et al.; RAS'70], then by $\log^{2.25}{n}$ [Bansal and Williams; FOCS'09], then by $\log^3{n}$ [Chan; SODA'15], and finally by $\log^4{n}$ [Yu; ICALP'15]. We design a combinatorial algorithm for BMM running in time $n^3 / 2^{\Omega(\sqrt[7]{\log n})}$ -- a speed-up over cubic time that is stronger than any poly-log factor. This comes tantalizingly close to refuting the conjecture from the 90s that truly subcubic combinatorial algorithms for BMM are impossible. This popular conjecture is the basis for dozens of fine-grained hardness results. Our main technical contribution is a new regularity decomposition theorem for Boolean matrices (or equivalently, bipartite graphs) under a notion of regularity that was recently introduced and analyzed analytically in the context of communication complexity [Kelley, Lovett, Meka; arXiv'23], and is related to a similar notion from the recent work on $3$-term arithmetic progression free sets [Kelley, Meka; FOCS'23].
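For reference, the cubic baseline that the combinatorial line of work improves on can be sketched as follows. This is only the textbook definition of the Boolean product, not the algorithm of the paper; the function name `bmm_naive` is illustrative.

```python
def bmm_naive(A, B):
    """Boolean product C[i][j] = OR over k of (A[i][k] AND B[k][j]), in cubic time."""
    n, m, p = len(A), len(B), len(B[0])
    C = [[False] * p for _ in range(n)]
    for i in range(n):
        for k in range(m):
            if A[i][k]:                 # row k of B only matters if A[i][k] is set
                for j in range(p):
                    if B[k][j]:
                        C[i][j] = True
    return C


# Small usage example: multiplying by a permutation matrix swaps the columns.
A = [[True, False], [False, True]]
B = [[False, True], [True, False]]
C = bmm_naive(A, B)
```

Viewing `A` and `B` as bipartite adjacency matrices, `C[i][j]` records whether some middle vertex `k` connects `i` to `j`.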
This article proposes to integrate two Reeb graphs with the information of their isosurfaces' inclusion relation. As computing power evolves, there arise numerical data that have small-scale physics inside larger ones -- for example, small clouds in a simulation can be contained inside an atmospheric layer, which is further contained in an enormous hurricane. Extracting such inclusions between isosurfaces is a challenge for isosurfacing: the user would have to explore the vast combinations of isosurfaces $(f_1^{-1}(l_1), f_2^{-1}(l_2))$ from scalar fields $f_i: M(n) \to \mathbb{R}$, $i = 1, 2$, where $M$ is an $n$-dimensional domain manifold and $f_i$ are physical quantities, to find inclusion of one isosurface within another. For this, we propose the \textit{Reeb complement}, a topological space that integrates two Reeb graphs with the inclusion relation. The Reeb complement has a natural partition that classifies equivalent containment of isosurfaces. This is a handy characteristic that lets the Reeb complement serve as an overview of the inclusion relationship in the data. We also propose level-of-detail control of the inclusions through simplification of the Reeb complement. We demonstrate that the relationship of two independent scalar fields can be extracted by taking the product of Reeb graphs (which we call the Reeb product) and by then subtracting the projection of the Reeb space, which opens up a new possibility for feature analysis.
This paper presents a novel approach for signal reconstruction using Spiking Neural Networks (SNN) based on the principles of Cognitive Informatics and Cognitive Computing. The proposed SNN leverages the Discrete Fourier Transform (DFT) to represent and reconstruct arbitrary time series signals. By employing N spiking neurons, the SNN captures the frequency components of the input signal, with each neuron assigned a unique frequency. The relationship between the magnitude and phase of the spiking neurons and the DFT coefficients is explored, enabling the reconstruction of the original signal. Additionally, the paper discusses the encoding of impulse delays and the phase differences between adjacent frequency components. This research contributes to the field of signal processing and provides insights into the application of SNN for cognitive signal analysis and reconstruction.
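The DFT identity behind this kind of reconstruction can be sketched as follows, assuming a real-valued signal of length N. Each of the N frequency components contributes one cosine whose amplitude and phase come from the corresponding DFT coefficient. The function names are hypothetical, and NumPy's FFT stands in for the spiking neurons; this is not the paper's SNN.

```python
import numpy as np


def dft_components(x):
    """Return the magnitude and phase of each of the N DFT coefficients of x."""
    X = np.fft.fft(x)
    return np.abs(X), np.angle(X)


def reconstruct(magnitude, phase):
    """Rebuild a real signal as a sum of N cosines, one per frequency index k."""
    N = len(magnitude)
    n = np.arange(N)             # time indices
    k = np.arange(N)[:, None]    # one row per frequency component
    waves = magnitude[:, None] * np.cos(2 * np.pi * k * n / N + phase[:, None])
    return waves.sum(axis=0) / N


rng = np.random.default_rng(0)
signal = rng.normal(size=16)
mag, ph = dft_components(signal)
recovered = reconstruct(mag, ph)   # matches `signal` up to floating-point error
```

The cosine form is exact for real signals because the imaginary parts of the inverse DFT terms cancel.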
Text-to-image diffusion models have demonstrated unprecedented capabilities for flexible and realistic image synthesis. Nevertheless, these models rely on a time-consuming sampling procedure, which has motivated attempts to reduce their latency. When improving efficiency, researchers often use the original diffusion model to train an additional network designed specifically for fast image generation. In contrast, our approach seeks to reduce latency directly, without any retraining, fine-tuning, or knowledge distillation. In particular, we find the repeated calculation of attention maps to be costly yet redundant, and instead suggest reusing them during sampling. Our specific reuse strategies are based on ODE theory, which implies that the later a map is reused, the smaller the distortion in the final image. We empirically compare these reuse strategies with few-step sampling procedures of comparable latency, finding that reuse generates images that are closer to those produced by the original high-latency diffusion model.
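The reuse idea can be illustrated with a toy single-head attention layer that caches its attention map and, when asked, applies the cached map instead of recomputing it. This is a minimal sketch under stated assumptions: the class `ReusableAttention`, the `reuse` flag, and the `reuse_from` schedule are hypothetical names, and the rule of reusing only from a given step onward is one simple reading of "the later a map is reused, the smaller the distortion", not the paper's actual strategies.

```python
import numpy as np


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)   # subtract the row max for stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


class ReusableAttention:
    """Toy single-head attention that can reuse its last attention map."""

    def __init__(self):
        self.cached_map = None

    def __call__(self, Q, K, V, reuse=False):
        if reuse and self.cached_map is not None:
            attn = self.cached_map           # skip the Q K^T product and the softmax
        else:
            attn = softmax(Q @ K.T / np.sqrt(Q.shape[-1]))
            self.cached_map = attn
        return attn @ V


def run_steps(num_steps, reuse_from, make_qkv):
    """Recompute the map for steps before `reuse_from`, then reuse the last one."""
    layer = ReusableAttention()
    outputs = []
    for t in range(num_steps):
        Q, K, V = make_qkv(t)
        outputs.append(layer(Q, K, V, reuse=(t >= reuse_from)))
    return outputs
```

In a real sampler, the saving comes from skipping the quadratic-cost map computation on the reused steps; the values `V` are still taken from the current step.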
In spiking neural networks, neuron dynamics are described by the biologically realistic integrate-and-fire model that captures membrane potential accumulation and above-threshold firing behaviors. Among the hardware implementations of integrate-and-fire neuron devices, one important feature, reset, has been largely ignored. Here, we present the design and fabrication of a magnetic domain wall and magnetic tunnel junction based artificial integrate-and-fire neuron device that achieves reliable reset at the end of the integrate-fire cycle. We demonstrate the domain propagation in the domain wall racetrack (integration), reading using a magnetic tunnel junction (fire), and reset as the domain is ejected from the racetrack, showing the artificial neuron can be operated continuously over 100 integrate-fire-reset cycles. Both pulse amplitude and pulse number encoding are demonstrated. The device data is applied to an image classification task using a spiking neural network and shown to have comparable performance to an ideal leaky integrate-and-fire neural network. These results constitute the first demonstration of reliable integrate-fire-reset in domain wall-magnetic tunnel junction-based neuron devices and show the promise of spintronics for neuromorphic computing.
We present a new angle on the expressive power of graph neural networks (GNNs) by studying how the predictions of a GNN probabilistic classifier evolve as we apply it to larger graphs drawn from some random graph model. We show that the output converges to a constant function, which upper-bounds what these classifiers can uniformly express. This strong convergence phenomenon applies to a very wide class of GNNs, including state-of-the-art models, with aggregates including mean and the attention-based mechanism of graph transformers. Our results apply to a broad class of random graph models, including sparse and dense variants of the Erd\H{o}s-R\'enyi model, the stochastic block model, and the Barab\'asi-Albert model. We empirically validate these findings, observing that the convergence phenomenon appears not only on random graphs but also on some real-world graphs.
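The convergence phenomenon can be observed numerically in a toy setting. The sketch below assumes a single mean-aggregation layer with fixed random weights, i.i.d. Gaussian node features, mean pooling, and dense Erd\H{o}s-R\'enyi graphs; all function names, layer sizes, and parameters are illustrative choices, not the architectures or experiments of the paper. It measures how much the graph-level output probability varies across random graphs of a given size.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def gnn_graph_probability(n, p, rng, W_self, W_nbr, w_out):
    """One mean-aggregation layer plus mean pooling on an Erdos-Renyi graph G(n, p)."""
    upper = np.triu(rng.random((n, n)) < p, k=1)
    A = (upper | upper.T).astype(float)               # symmetric, no self-loops
    X = rng.normal(size=(n, W_self.shape[0]))          # i.i.d. node features
    deg = np.maximum(A.sum(axis=1, keepdims=True), 1.0)
    neighbor_mean = (A @ X) / deg                      # mean over each node's neighbors
    H = np.tanh(X @ W_self + neighbor_mean @ W_nbr)
    return float(sigmoid(H.mean(axis=0) @ w_out))      # graph-level probability


def output_spread(n, trials=20, p=0.5, seed=0):
    """Standard deviation of the output over `trials` random graphs with n nodes."""
    rng = np.random.default_rng(seed)
    W_self = rng.normal(size=(4, 8))                   # weights drawn first, so they
    W_nbr = rng.normal(size=(4, 8))                    # are identical for every n
    w_out = rng.normal(size=8)
    outs = [gnn_graph_probability(n, p, rng, W_self, W_nbr, w_out)
            for _ in range(trials)]
    return float(np.std(outs))
```

Comparing `output_spread(20)` with `output_spread(400)` shows the outputs concentrating as the graphs grow, which is the behavior the convergence result describes.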
Since the development of Large Language Models (LLMs) has achieved remarkable success, understanding and controlling their internal complex mechanisms has become an urgent problem. Recent research has attempted to interpret their behaviors through the lens of inner representation. However, developing practical and efficient methods for applying these representations for general and flexible model editing remains challenging. In this work, we explore how to use representation engineering methods to guide the editing of LLMs by deploying a representation sensor as an oracle. We first identify the importance of a robust and reliable sensor during editing, then propose an Adversarial Representation Engineering (ARE) framework to provide a unified and interpretable approach for conceptual model editing without compromising baseline performance. Experiments on multiple model editing paradigms demonstrate the effectiveness of ARE in various settings. Code and data are available at //github.com/Zhang-Yihao/Adversarial-Representation-Engineering.
Entropic optimal transport (EOT) presents an effective and computationally viable alternative to unregularized optimal transport (OT), offering diverse applications for large-scale data analysis. In this work, we derive novel statistical bounds for empirical plug-in estimators of the EOT cost and show that their statistical performance in the entropy regularization parameter $\epsilon$ and the sample size $n$ only depends on the simpler of the two probability measures. For instance, under sufficiently smooth costs this yields the parametric rate $n^{-1/2}$ with factor $\epsilon^{-d/2}$, where $d$ is the minimum dimension of the two population measures. This confirms that empirical EOT also adheres to the lower complexity adaptation principle, a hallmark feature only recently identified for unregularized OT. As a consequence of our theory, we show that the empirical entropic Gromov-Wasserstein distance and its unregularized version for measures on Euclidean spaces also obey this principle. Additionally, we comment on computational aspects and complement our findings with Monte Carlo simulations. Our techniques employ empirical process theory and rely on a dual formulation of EOT over a single function class. Crucial to our analysis is the observation that the entropic cost-transformation of a function class does not increase its uniform metric entropy by much.
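An empirical plug-in estimate of the EOT cost can be computed with Sinkhorn iterations on the two samples. The sketch below is a minimal illustration under stated assumptions: squared Euclidean cost, uniform weights on the sample points, entropy regularization taken as $\epsilon$ times the KL divergence relative to the product of the empirical measures, and a fixed iteration count instead of a convergence check. The function names are hypothetical.

```python
import numpy as np


def _logsumexp(M, axis):
    m = M.max(axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.exp(M - m).sum(axis=axis))


def empirical_eot_cost(X, Y, eps, n_iter=500):
    """Plug-in EOT cost between the empirical measures of samples X and Y.

    Runs log-domain Sinkhorn updates and returns the dual objective value.
    """
    n, m = len(X), len(Y)
    log_a = np.full(n, -np.log(n))                     # uniform weights on X
    log_b = np.full(m, -np.log(m))                     # uniform weights on Y
    C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
    f = np.zeros(n)
    g = np.zeros(m)
    for _ in range(n_iter):
        f = -eps * _logsumexp((g[None, :] - C) / eps + log_b[None, :], axis=1)
        g = -eps * _logsumexp((f[:, None] - C) / eps + log_a[:, None], axis=0)
    # After the g-update the plan has total mass 1, so the dual objective
    # reduces to the weighted sums of the two potentials.
    return float(f.mean() + g.mean())


rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
near = empirical_eot_cost(X, X, eps=1.0)
far = empirical_eot_cost(X, X + np.array([1.0, 0.0]), eps=1.0)
```

Working in the log domain keeps the updates numerically stable for small $\epsilon$, at the price of more expensive iterations than the plain matrix-scaling form.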
Large Language Models (LLMs) have shown excellent generalization capabilities that have led to the development of numerous models. These models propose new architectures, tweak existing architectures with refined training strategies, increase context length, use high-quality training data, and increase training time to outperform baselines. Analyzing new developments is crucial for identifying changes that enhance training stability and improve generalization in LLMs. This survey paper comprehensively analyses LLM architectures and their categorization, training strategies, training datasets, and performance evaluations, and discusses future research directions. Moreover, the paper discusses the basic building blocks and concepts behind LLMs, followed by a complete overview of LLMs, including their important features and functions. Finally, the paper summarizes significant findings from LLM research and consolidates essential architectural and training strategies for developing advanced LLMs. Given the continuous advancements in LLMs, we intend to regularly update this paper by incorporating new sections and featuring the latest LLMs.
2D-based industrial anomaly detection has been widely discussed; however, multimodal industrial anomaly detection based on 3D point clouds and RGB images still has many unexplored areas. Existing multimodal industrial anomaly detection methods directly concatenate the multimodal features, which leads to a strong disturbance between features and harms the detection performance. In this paper, we propose Multi-3D-Memory (M3DM), a novel multimodal anomaly detection method with a hybrid fusion scheme: first, we design an unsupervised feature fusion with patch-wise contrastive learning to encourage the interaction of different modal features; second, we use a decision layer fusion with multiple memory banks to avoid loss of information and additional novelty classifiers to make the final decision. We further propose a point feature alignment operation to better align the point cloud and RGB features. Extensive experiments show that our multimodal industrial anomaly detection model outperforms the state-of-the-art (SOTA) methods on both detection and segmentation precision on the MVTec-3D AD dataset. Code is available at //github.com/nomewang/M3DM.