Face recognition systems are frequently subjected to a wide variety of physical and digital attacks. Previous methods have achieved satisfactory performance against physical attacks and digital attacks separately, but few works integrate both attack types into a single model, which in practice forces developers to build and maintain multiple models. To jointly detect physical and digital attacks within a single model, we propose an innovative approach that can adapt to any network architecture. Our approach mainly consists of two types of data augmentation, which we call Simulated Physical Spoofing Clues augmentation (SPSC) and Simulated Digital Spoofing Clues augmentation (SDSC). SPSC and SDSC convert live samples into simulated attack samples by mimicking the spoofing clues of physical and digital attacks, respectively, which significantly improves the model's ability to detect "unseen" attack types. Extensive experiments show that SPSC and SDSC achieve state-of-the-art generalization on Protocols 2.1 and 2.2 of the UniAttackData dataset, respectively. Our method won first place in the "Unified Physical-Digital Face Attack Detection" track of the 5th Face Anti-spoofing Challenge@CVPR2024; our final submission obtains 3.75% APCER, 0.93% BPCER, and 2.34% ACER. Our code is available at //github.com/Xianhua-He/cvpr2024-face-anti-spoofing-challenge.
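To make the augmentation idea concrete, here is a minimal sketch of how live samples can be relabeled as simulated attacks. The specific artifact simulations (per-channel color jitter as a stand-in for print artifacts, a moire-like sinusoidal overlay as a stand-in for screen-replay artifacts) and the label convention are illustrative assumptions, not the exact SPSC/SDSC recipe.

\begin{verbatim}
# Illustrative sketch: converting live samples into pseudo-attack samples.
# The concrete artifact simulations below are assumptions, not the paper's recipe.
import numpy as np

def simulate_print_artifacts(img, strength=0.3):
    """Crudely mimic print/replay color degradation on an HxWx3 float image."""
    jitter = 1.0 + strength * (np.random.rand(3) - 0.5)  # per-channel gain
    return np.clip(img * jitter, 0.0, 1.0)

def simulate_moire(img, freq=40.0, amp=0.05):
    """Overlay a sinusoidal moire-like pattern, as seen in screen replays."""
    h, w, _ = img.shape
    yy, xx = np.mgrid[0:h, 0:w]
    pattern = amp * np.sin(2 * np.pi * freq * (xx + yy) / (h + w))
    return np.clip(img + pattern[..., None], 0.0, 1.0)

def augment_live_batch(batch, labels):
    """Keep every sample; relabel augmented copies of live faces as attacks."""
    out_x, out_y = [], []
    for img, y in zip(batch, labels):
        out_x.append(img); out_y.append(y)
        if y == 0:  # live -> simulated attack (label 1 = attack, assumed)
            out_x.append(simulate_moire(simulate_print_artifacts(img)))
            out_y.append(1)
    return out_x, out_y
\end{verbatim}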
Learning to represent and simulate the dynamics of physical systems is a crucial yet challenging task. Existing equivariant Graph Neural Network (GNN) based methods encapsulate the symmetries of physics, \emph{e.g.}, translations and rotations, leading to better generalization. Nevertheless, their frame-to-frame formulation of the task overlooks the non-Markovian property mainly incurred by unobserved dynamics in the environment. In this paper, we reformulate dynamics simulation as a spatio-temporal prediction task that employs the trajectory over a past period to recover non-Markovian interactions. We propose Equivariant Spatio-Temporal Attentive Graph Networks (ESTAG), an equivariant version of spatio-temporal GNNs, to fulfill this purpose. At its core, we design a novel Equivariant Discrete Fourier Transform (EDFT) to extract periodic patterns from the history frames, then construct an Equivariant Spatial Module (ESM) to accomplish spatial message passing and an Equivariant Temporal Module (ETM) with forward attention and equivariant pooling mechanisms to aggregate temporal messages. We evaluate our model on three real datasets at the molecular, protein, and macroscopic levels. Experimental results verify the effectiveness of ESTAG compared with typical spatio-temporal GNNs and equivariant GNNs.
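As a rough illustration of the EDFT idea, the sketch below extracts per-frequency amplitudes from a particle trajectory that are invariant to translations and rotations of the input. The function name and shapes are hypothetical, and the actual EDFT in ESTAG is more elaborate; this only conveys why a DFT over history frames can expose periodic patterns while respecting symmetry.

\begin{verbatim}
# Minimal sketch of symmetry-respecting spectral features from a trajectory.
# Not ESTAG's exact EDFT; names and shapes are illustrative assumptions.
import numpy as np

def edft_features(traj):
    """traj: (T, N, 3) positions over T history frames for N particles.

    Centering each frame removes global translations; taking norms of the
    complex DFT coefficients over the coordinate axis yields
    rotation-invariant per-frequency amplitudes.
    """
    centered = traj - traj.mean(axis=1, keepdims=True)  # translation removed
    coeffs = np.fft.fft(centered, axis=0)               # DFT over time, (T, N, 3)
    return np.linalg.norm(coeffs, axis=-1)              # rotation-invariant, (T, N)

T, N = 16, 5
traj = np.random.randn(T, N, 3).cumsum(axis=0)  # toy random-walk trajectory
print(edft_features(traj).shape)                # (16, 5) amplitudes
\end{verbatim}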
Graph Neural Networks (GNNs) have emerged as potent tools for prediction on graph-structured data. Despite their efficacy, a significant drawback of GNNs lies in their limited ability to provide robust uncertainty estimates, which undermines their reliability in contexts where errors carry significant consequences. Moreover, GNNs typically excel in in-distribution settings, assuming that training and test data follow identical distributions, a condition often unmet in real-world graph data. In this article, we leverage conformal prediction, a widely recognized statistical technique for quantifying uncertainty by transforming model outputs into prediction sets, to address uncertainty quantification in GNN predictions under conditional shift\footnote{The change in the conditional distribution $P(\text{label} \mid \text{input})$ from the source domain to the target domain.} in graph-based semi-supervised learning (SSL). Additionally, we propose a novel loss function that refines model predictions by minimizing conditional shift in the latent representations. Our approach, termed Conditional Shift Robust (CondSR) conformal prediction for GNNs, is model-agnostic and adaptable to various classification models. We validate its effectiveness on standard graph benchmark datasets, integrating it with state-of-the-art GNNs on node classification tasks. The code implementation is publicly available for further exploration and experimentation.
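For context, a minimal split-conformal construction of prediction sets from softmax outputs is sketched below; the CondSR loss itself is not reproduced, only the generic set-building step that such methods build on. Score choice and variable names are standard but assumed here.

\begin{verbatim}
# Split conformal prediction for K-class node classification (generic sketch).
import numpy as np

def conformal_sets(probs_cal, y_cal, probs_test, alpha=0.1):
    """probs_*: (n, K) softmax outputs; y_cal: (n,) true labels.

    Returns a boolean (m, K) matrix of prediction sets with ~(1 - alpha)
    marginal coverage under exchangeability.
    """
    n = len(y_cal)
    scores = 1.0 - probs_cal[np.arange(n), y_cal]   # nonconformity scores
    level = np.ceil((n + 1) * (1 - alpha)) / n
    q = np.quantile(scores, level, method="higher")  # NumPy >= 1.22
    return (1.0 - probs_test) <= q                   # keep low-score labels

probs_cal = np.random.dirichlet(np.ones(4), size=200)
y_cal = np.random.randint(0, 4, size=200)
probs_test = np.random.dirichlet(np.ones(4), size=10)
print(conformal_sets(probs_cal, y_cal, probs_test).sum(axis=1))  # set sizes
\end{verbatim}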
In statistical mechanics, computing the partition function is generally difficult. An approximation method using a variational autoregressive network (VAN) was proposed recently; it offers the advantage of directly calculating generation probabilities while drawing a large number of samples. The present study introduces a novel approximation method that combines a VAN with samples drawn from quantum annealing machines, which are empirically assumed to follow the Gibbs-Boltzmann distribution. When applied to the finite-size Sherrington-Kirkpatrick model, the proposed method is more accurate than the traditional VAN approach and other approximate methods such as the widely used naive mean field.
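For reference, the variational principle underlying VAN can be stated schematically as follows, where $q_\theta$ is the autoregressive distribution, $E$ the energy, $\beta$ the inverse temperature, and $Z$ the partition function (standard in the VAN literature; notation assumed):
\[
F_{q} \;=\; \frac{1}{\beta}\,\mathbb{E}_{s\sim q_\theta}\!\bigl[\ln q_\theta(s) + \beta E(s)\bigr] \;\ge\; F \;=\; -\frac{1}{\beta}\ln Z,
\]
with equality when $q_\theta$ matches the Gibbs-Boltzmann distribution; minimizing $F_q$ over $\theta$ therefore tightens an upper bound on the true free energy.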
Neural Networks (NNs) have been successfully employed to represent the state evolution of complex dynamical systems. Such models, referred to as NN dynamic models (NNDMs), use iterative noisy predictions of the NN to estimate a distribution of system trajectories over time. Despite their accuracy, safety analysis of NNDMs is known to be a challenging problem and remains largely unexplored. To address this issue, we introduce a method of providing safety guarantees for NNDMs. Our approach is based on stochastic barrier functions, whose relation to safety is analogous to that of Lyapunov functions to stability. We first show how to synthesize stochastic barrier functions for NNDMs via a convex optimization problem, which in turn provides a lower bound on the system's safety probability. A key step in our method is to employ recent convex approximation results for NNs to find piecewise-linear bounds, which allow the barrier function synthesis problem to be formulated as a sum-of-squares optimization program. If the obtained safety probability is above the desired threshold, the system is certified. Otherwise, we introduce a method of generating controls for the system that robustly maximize the safety probability in a minimally invasive manner. We exploit the convexity of the barrier function to formulate the optimal control synthesis problem as a linear program. Experimental results illustrate the efficacy of the method: it scales to multi-dimensional NNDMs with multiple layers and hundreds of neurons per layer, and the controller significantly improves the safety probability.
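To indicate the kind of guarantee such certificates provide, one standard supermartingale-style bound reads as follows (notation and constants simplified; the paper's sum-of-squares formulation generalizes this). If $B \ge 0$ satisfies
\[
B(x)\ge 1 \ \ \forall x\in X_u, \qquad \mathbb{E}\bigl[B(x_{k+1})\mid x_k\bigr]\le B(x_k)+c,
\]
where $X_u$ is the unsafe set, then over a horizon of $N$ steps
\[
\Pr\bigl[\exists\, k\le N:\ x_k\in X_u\bigr]\ \le\ B(x_0)+cN,
\]
so the safety probability is lower bounded by $1 - B(x_0) - cN$.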
Frequency multiplexing is an effective method for resource-efficient superconducting qubit readout: by allowing multiple resonators to share a common feedline, it drastically reduces the number of cables and passive components involved in reading out a qubit. However, this improvement in scalability comes at the price of a crucial non-ideality, increased readout crosstalk. Prior works have targeted building better devices and discriminators to reduce its effects, as readout-crosstalk-induced measurement errors are detrimental to the reliability of a quantum computer. In this work, we show that beyond reliability, readout crosstalk can introduce vulnerabilities in a system shared among multiple users. These vulnerabilities stem from the correlated errors that readout crosstalk produces, which nefarious attackers can exploit to predict the state of victim qubits, resulting in information leakage.
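A purely illustrative toy model (not a device simulation; all flip probabilities are made-up numbers) shows why correlated readout errors leak information: if the attacker's error rate depends on the victim's state, the attacker's own readout record predicts that state better than chance.

\begin{verbatim}
# Toy illustration of state leakage through crosstalk-correlated errors.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
victim = rng.integers(0, 2, n)      # hidden victim qubit states
base_flip = 0.02                    # attacker's intrinsic readout error rate
xtalk = 0.05                        # extra flips when the victim is in |1>
flip = rng.random(n) < base_flip + xtalk * victim
attacker_read = flip.astype(int)    # attacker always prepares |0>

guess = attacker_read               # guess "victim = 1" whenever an error is seen
acc = (guess == victim).mean()
print(f"guess accuracy: {acc:.3f} (0.5 = chance)")
# With these made-up rates the edge is small but systematic across shots.
\end{verbatim}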
Self-supervised pretraining (SSP) has been recognized as a method to enhance prediction accuracy in various downstream tasks. However, its efficacy for DNA sequences remains somewhat limited, primarily because most existing SSP approaches in genomics focus on masked language modeling of individual sequences and neglect the crucial aspect of encoding statistics across multiple sequences. To overcome this challenge, we introduce an innovative deep neural network model that incorporates collaborative learning between a `student' and a `teacher' subnetwork. In this model, the student subnetwork employs masked learning on nucleotides and progressively adapts its parameters toward the teacher subnetwork through an exponential moving average, while both subnetworks engage in contrastive learning on two augmented representations of the input sequences. This self-distillation process enables our model to assimilate both contextual information from individual sequences and distributional information across the sequence population. We validate our approach by pretraining on the human reference genome and then applying it to 20 downstream inference tasks. The empirical results demonstrate that our method significantly boosts inference performance on the majority of these tasks. Our code is available at //github.com/wiedersehne/FinDNA.
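A minimal sketch of the student/teacher loop described above follows; the toy encoder, augmentation, and loss weights are placeholders, and the masked-nucleotide objective is omitted for brevity. What it shows is the core mechanics: the teacher is a gradient-free copy updated by exponential moving average, while a contrastive/consistency term couples the two views.

\begin{verbatim}
# Student/teacher self-distillation skeleton (illustrative, not FinDNA itself).
import copy
import torch
import torch.nn.functional as F

student = torch.nn.Sequential(torch.nn.Linear(64, 128), torch.nn.ReLU(),
                              torch.nn.Linear(128, 32))
teacher = copy.deepcopy(student)
for p in teacher.parameters():
    p.requires_grad_(False)          # teacher receives no gradients
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

def ema_update(teacher, student, m=0.999):
    for pt, ps in zip(teacher.parameters(), student.parameters()):
        pt.mul_(m).add_(ps.detach(), alpha=1 - m)

for step in range(10):
    x = torch.randn(16, 64)          # stand-in for encoded DNA windows
    view1 = x + 0.1 * torch.randn_like(x)   # two augmented views
    view2 = x + 0.1 * torch.randn_like(x)
    z1, z2 = student(view1), teacher(view2)
    loss = 1 - F.cosine_similarity(z1, z2).mean()  # pull views together
    opt.zero_grad(); loss.backward(); opt.step()
    with torch.no_grad():
        ema_update(teacher, student)  # teacher tracks the student slowly
\end{verbatim}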
We model a Markov decision process parametrized by an unknown parameter and study the asymptotic behavior of a sampling-based algorithm called Thompson sampling. The standard definition of regret is not always suitable for evaluating a policy, especially when the underlying chain structure is general: we show that the standard (expected) regret can grow (super-)linearly and fails to capture the notion of learning in realistic settings with non-trivial state evolution. By decomposing the standard (expected) regret, we develop a new metric, the expected residual regret, which forgets the immutable consequences of past actions and instead measures regret against the optimal reward moving forward from the current period. We show that the expected residual regret of Thompson sampling is upper bounded by a term that converges exponentially fast to 0. We present conditions under which the posterior sampling error of Thompson sampling converges to 0 almost surely. We then introduce a probabilistic version of the expected residual regret and present conditions under which it converges to 0 almost surely. We thus provide a viable concept of learning for sampling algorithms that should prove useful in broader settings than previously considered.
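Schematically (our notation, not the paper's exact definition), the residual regret at period $t$ compares the best value attainable from the current state with the value the learned policy achieves from that same state:
\[
\mathrm{RR}(t) \;=\; \mathbb{E}\bigl[\,V^{*}(x_t) - V^{\pi_t}(x_t)\,\bigr],
\]
so that the sunk consequences of earlier actions, already baked into $x_t$, do not count against the algorithm; only suboptimality going forward does.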
Generative Adversarial Networks (GANs) should produce synthetic data that fit the underlying distribution of the data being modeled. For real-valued time-series data, this means capturing not only the static distribution of the data but also its full temporal distribution over any potential time horizon. This temporal element makes the problem more complex and can leave current solutions under-constrained, unstable during training, or prone to varying degrees of mode collapse. In FETSGAN, entire sequences are translated directly into the generator's sampling space using a seq2seq-style adversarial autoencoder (AAE), where adversarial training matches the training distribution in both the feature space and the lower-dimensional sampling space. This additional constraint loosely ensures that the temporal distribution of the synthetic samples will not collapse. In addition, the First Above Threshold (FAT) operator is introduced to supplement the reconstruction of encoded sequences, improving training stability and the overall quality of the generated data. These contributions yield a significant improvement over the current state of the art for adversarial learners, both in qualitative measures of temporal similarity and in the quantitative predictive ability of data generated through FETSGAN.
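The sketch below shows the shape of the two adversarial constraints described above: one critic on reconstructed sequences (feature space) and one on the encoder's codes (sampling space). Modules, shapes, and loss signs are illustrative placeholders (the proper alternating critic/generator updates and the FAT operator are omitted); this is not the FETSGAN architecture itself.

\begin{verbatim}
# Skeleton of a seq2seq-style AAE with critics on both spaces (illustrative).
import torch
import torch.nn as nn

T, D, Z = 24, 3, 8                           # seq length, features, code dim
enc = nn.GRU(D, Z, batch_first=True)         # sequence -> summary code
dec = nn.GRU(Z, D, batch_first=True)         # code -> reconstructed sequence
d_feat = nn.Sequential(nn.Flatten(), nn.Linear(T * D, 1))  # critic on sequences
d_code = nn.Linear(Z, 1)                                   # critic on codes

x = torch.randn(32, T, D)
_, h = enc(x)
z = h[-1]                                    # (32, Z) per-sequence code
x_hat, _ = dec(z.unsqueeze(1).repeat(1, T, 1))

recon = (x - x_hat).pow(2).mean()            # reconstruction term
adv_feat = d_feat(x_hat).mean() - d_feat(x).mean()     # match feature space
z_prior = torch.randn_like(z)                          # prior on sampling space
adv_code = d_code(z).mean() - d_code(z_prior).mean()   # match sampling space
loss = recon + adv_feat + adv_code
print(loss.item())
\end{verbatim}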
We investigate the fundamental limits of reliable communication over multi-view channels, in which the channel output comprises a large number of independent noisy views of a transmitted symbol. We first consider multi-view discrete memoryless channels and then extend our results to general multi-view channels (using multi-letter formulas). We argue that the capacity and dispersion of such multi-view channels converge exponentially fast in the number of views to the entropy and varentropy of the input distribution, respectively, and we identify the exact rate of convergence as the smallest Chernoff information between two conditional distributions of the output, conditioned on unequal inputs. For the special case of the deletion channel, we compute upper bounds on this Chernoff information. Finally, we present a new channel model, termed the Poisson approximation channel and of possible independent interest, whose capacity closely approximates that of the multi-view binary symmetric channel for any fixed number of views.
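For reference, the Chernoff information that governs this rate is the standard quantity
\[
C\bigl(P_{Y\mid x},\,P_{Y\mid x'}\bigr) \;=\; -\min_{0\le\lambda\le 1}\,\log \sum_{y} P_{Y\mid x}(y)^{\lambda}\, P_{Y\mid x'}(y)^{1-\lambda},
\]
so that, schematically, the gap to the entropy decays like $e^{-k \min_{x\neq x'} C(P_{Y\mid x},\,P_{Y\mid x'})}$ in the number of views $k$.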
Attribute-based Zero-Shot Learning (ZSL) has revolutionized the ability of models to recognize new classes not seen during training. However, with the advancement of large-scale models, expectations have risen: beyond merely achieving zero-shot generalization, there is a growing demand for universal models that can continually evolve in expert domains using unlabeled data. To address this, we introduce a scaled-down instantiation of this challenge, Evolutionary Generalized Zero-Shot Learning (EGZSL), in which a low-performing zero-shot model adapts to the test data stream and evolves online. We identify three challenges of this task, \ie, catastrophic forgetting, initial prediction bias, and evolutionary data class bias, and propose targeted solutions for each, resulting in a generic method capable of continuous evolution from a given initial IGZSL model. Experiments on three popular GZSL benchmark datasets demonstrate that our model can learn from the test data stream while other baselines fail. Codes are available at \url{//github.com/cdb342/EGZSL}.
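A schematic online-evolution loop for this setting is sketched below; the model, pseudo-labeling rule, and anchoring regularizer are generic stand-ins (the abstract's specific remedies for prediction bias and class bias are not reproduced), intended only to show the shape of unsupervised adaptation to a test stream.

\begin{verbatim}
# Generic online test-stream adaptation skeleton (illustrative, not EGZSL).
import copy
import torch
import torch.nn.functional as F

model = torch.nn.Linear(32, 10)     # stand-in for the initial zero-shot model
anchor = copy.deepcopy(model)       # frozen copy of the initial weights
opt = torch.optim.SGD(model.parameters(), lr=1e-3)

for step in range(100):             # unlabeled test stream arrives in batches
    x = torch.randn(8, 32)
    logits = model(x)
    pseudo = logits.argmax(dim=1)   # self-labeling on the stream
    loss = F.cross_entropy(logits, pseudo)
    # stay close to the initial model to mitigate catastrophic forgetting
    reg = sum((p - q.detach()).pow(2).sum()
              for p, q in zip(model.parameters(), anchor.parameters()))
    (loss + 1e-2 * reg).backward()
    opt.step(); opt.zero_grad()
\end{verbatim}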