Cloth-changing person re-identification aims to retrieve and identify spe-cific pedestrians by using cloth-irrelevant features in person cloth-changing scenarios. However, pedestrian images captured by surveillance probes usually contain occlusions in real-world scenarios. The perfor-mance of existing cloth-changing re-identification methods is significantly degraded due to the reduction of discriminative cloth-irrelevant features caused by occlusion. We define cloth-changing person re-identification in occlusion scenarios as occluded cloth-changing person re-identification (Occ-CC-ReID), and to the best of our knowledge, we are the first to pro-pose occluded cloth-changing person re-identification as a new task. We constructed two occluded cloth-changing person re-identification datasets for different occlusion scenarios: Occluded-PRCC and Occluded-LTCC. The datasets can be obtained from the following link: //github.com/1024AILab/Occluded-Cloth-Changing-Person- Re-Identification.
Sequences with low aperiodic autocorrelation sidelobes have been extensively researched in literatures. With sufficiently low integrated sidelobe level (ISL), their power spectrums are asymptotically flat over the whole frequency domain. However, for the beam sweeping in the massive multi-input multi-output (MIMO) broadcast channels, the flat spectrum should be constrained in a passband with tunable bandwidth to achieve the flexible tradeoffs between the beamforming gain and the beam sweeping time. Motivated by this application, we construct a family of sequences termed the generalized step-chirp (GSC) sequence with a closed-form expression, where some parameters can be tuned to adjust the bandwidth flexibly. In addition to the application in beam sweeping, some GSC sequences are closely connected with Mow's unified construction of sequences with perfect periodic autocorrelations, and may have a coarser phase resolution than the Mow sequence while their ISLs are comparable.
We present a velocity-based Monte Carlo fluid solver that overcomes the limitations of its existing vorticity-based counterpart. Because the velocity-based formulation is more commonly used in graphics, our Monte Carlo solver can be readily extended with various techniques from the fluid simulation literature. We derive our method by solving the Navier-Stokes equations via operator splitting and designing a pointwise Monte Carlo estimator for each substep. We reformulate the projection and diffusion steps as integration problems based on the recently introduced walk-on-boundary technique [Sugimoto et al. 2023]. We transform the volume integral arising from the source term of the pressure Poisson equation into a form more amenable to practical numerical evaluation. Our resulting velocity-based formulation allows for the proper simulation of scenes that the prior vorticity-based Monte Carlo method [Rioux-Lavoie and Sugimoto et al. 2022] either simulates incorrectly or cannot support. We demonstrate that our method can easily incorporate advancements drawn from conventional non-Monte Carlo methods by showing how one can straightforwardly add buoyancy effects, divergence control capabilities, and numerical dissipation reduction methods, such as advection-reflection and PIC/FLIP methods.
Efficient Reduce and AllReduce communication collectives are a critical cornerstone of high-performance computing (HPC) applications. We present the first systematic investigation of Reduce and AllReduce on the Cerebras Wafer-Scale Engine (WSE). This architecture has been shown to achieve unprecedented performance both for machine learning workloads and other computational problems like FFT. We introduce a performance model to estimate the execution time of algorithms on the WSE and validate our predictions experimentally for a wide range of input sizes. In addition to existing implementations, we design and implement several new algorithms specifically tailored to the architecture. Moreover, we establish a lower bound for the runtime of a Reduce operation on the WSE. Based on our model, we automatically generate code that achieves near-optimal performance across the whole range of input sizes. Experiments demonstrate that our new Reduce and AllReduce algorithms outperform the current vendor solution by up to 3.27x. Additionally, our model predicts performance with less than 4% error. The proposed communication collectives increase the range of HPC applications that can benefit from the high throughput of the WSE. Our model-driven methodology demonstrates a disciplined approach that can lead the way to further algorithmic advancements on wafer-scale architectures.
Most existing methods of Out-of-Domain (OOD) intent classification rely on extensive auxiliary OOD corpora or specific training paradigms. However, they are underdeveloped in the underlying principle that the models should have differentiated confidence in In- and Out-of-domain intent. In this work, we shed light on the fundamental cause of model overconfidence on OOD and demonstrate that calibrated subnetworks can be uncovered by pruning the overparameterized model. Calibrated confidence provided by the subnetwork can better distinguish In- and Out-of-domain, which can be a benefit for almost all post hoc methods. In addition to bringing fundamental insights, we also extend the Lottery Ticket Hypothesis to open-world scenarios. We conduct extensive experiments on four real-world datasets to demonstrate our approach can establish consistent improvements compared with a suite of competitive baselines.
Travel mode choice (TMC) prediction, which can be formulated as a classification task, helps in understanding what makes citizens choose different modes of transport for individual trips. This is also a major step towards fostering sustainable transportation. As behaviour may evolve over time, we also face the question of detecting concept drift in the data. This necessitates using appropriate methods to address potential concept drift. In particular, it is necessary to decide whether batch or stream mining methods should be used to develop periodically updated TMC models. To address the challenge of the development of TMC models, we propose the novel Incremental Ensemble of Batch and Stream Models (IEBSM) method aimed at adapting travel mode choice classifiers to concept drift possibly occurring in the data. It relies on the combination of drift detectors with batch learning and stream mining models. We compare it against batch and incremental learners, including methods relying on active drift detection. Experiments with varied travel mode data sets representing both city and country levels show that the IEBSM method both detects drift in travel mode data and successfully adapts the models to evolving travel mode choice data. The method has a higher rank than batch and stream learners.
With recent advances in speech synthesis including text-to-speech (TTS) and voice conversion (VC) systems enabling the generation of ultra-realistic audio deepfakes, there is growing concern about their potential misuse. However, most deepfake (DF) detection methods rely solely on the fuzzy knowledge learned by a single model, resulting in performance bottlenecks and transparency issues. Inspired by retrieval-augmented generation (RAG), we propose a retrieval-augmented detection (RAD) framework that augments test samples with similar retrieved samples for enhanced detection. We also extend the multi-fusion attentive classifier to integrate it with our proposed RAD framework. Extensive experiments show the superior performance of the proposed RAD framework over baseline methods, achieving state-of-the-art results on the ASVspoof 2021 DF set and competitive results on the 2019 and 2021 LA sets. Further sample analysis indicates that the retriever consistently retrieves samples mostly from the same speaker with acoustic characteristics highly consistent with the query audio, thereby improving detection performance.
Convergence of the block iterative method in image reconstruction for positron emission tomography (PET) requires careful control of relaxation parameters, which is a challenging task. The automatic determination of relaxation parameters for list-mode reconstructions also remains challenging. Therefore, a different approach would be desirable. In this study, we propose a list-mode maximum likelihood Dykstra-like splitting PET reconstruction (LM-MLDS). LM-MLDS converges the list-mode block iterative method by adding the distance from an initial image as a penalty term into an objective function. LM-MLDS takes a two-step approach because its performance depends on the quality of the initial image. The first step uses a uniform image as the initial image, and then the second step uses a reconstructed image after one main iteration as the initial image. In a simulation study, LM-MLDS provided a better tradeoff curve between noise and contrast than the other methods. In a clinical study, LM-MLDS removed the false hotspots at the edge of the axial field of view and improved the image quality of slices covering the top of the head to the cerebellum. List-mode proximal splitting reconstruction is useful not only for optimizing nondifferential functions but also for converging block iterative methods without controlling relaxation parameters.
Face Anti-Spoofing (FAS) is crucial to safeguard Face Recognition (FR) Systems. In real-world scenarios, FRs are confronted with both physical and digital attacks. However, existing algorithms often address only one type of attack at a time, which poses significant limitations in real-world scenarios where FR systems face hybrid physical-digital threats. To facilitate the research of Unified Attack Detection (UAD) algorithms, a large-scale UniAttackData dataset has been collected. UniAttackData is the largest public dataset for Unified Attack Detection, with a total of 28,706 videos, where each unique identity encompasses all advanced attack types. Based on this dataset, we organized a Unified Physical-Digital Face Attack Detection Challenge to boost the research in Unified Attack Detections. It attracted 136 teams for the development phase, with 13 qualifying for the final round. The results re-verified by the organizing team were used for the final ranking. This paper comprehensively reviews the challenge, detailing the dataset introduction, protocol definition, evaluation criteria, and a summary of published results. Finally, we focus on the detailed analysis of the highest-performing algorithms and offer potential directions for unified physical-digital attack detection inspired by this competition. Challenge Website: //sites.google.com/view/face-anti-spoofing-challenge/welcome/challengecvpr2024.
Few-shot image classification aims to classify unseen classes with limited labeled samples. Recent works benefit from the meta-learning process with episodic tasks and can fast adapt to class from training to testing. Due to the limited number of samples for each task, the initial embedding network for meta learning becomes an essential component and can largely affects the performance in practice. To this end, many pre-trained methods have been proposed, and most of them are trained in supervised way with limited transfer ability for unseen classes. In this paper, we proposed to train a more generalized embedding network with self-supervised learning (SSL) which can provide slow and robust representation for downstream tasks by learning from the data itself. We evaluate our work by extensive comparisons with previous baseline methods on two few-shot classification datasets ({\em i.e.,} MiniImageNet and CUB). Based on the evaluation results, the proposed method achieves significantly better performance, i.e., improve 1-shot and 5-shot tasks by nearly \textbf{3\%} and \textbf{4\%} on MiniImageNet, by nearly \textbf{9\%} and \textbf{3\%} on CUB. Moreover, the proposed method can gain the improvement of (\textbf{15\%}, \textbf{13\%}) on MiniImageNet and (\textbf{15\%}, \textbf{8\%}) on CUB by pretraining using more unlabeled data. Our code will be available at \hyperref[//github.com/phecy/SSL-FEW-SHOT.]{//github.com/phecy/ssl-few-shot.}
Model-agnostic meta-learners aim to acquire meta-learned parameters from similar tasks to adapt to novel tasks from the same distribution with few gradient updates. With the flexibility in the choice of models, those frameworks demonstrate appealing performance on a variety of domains such as few-shot image classification and reinforcement learning. However, one important limitation of such frameworks is that they seek a common initialization shared across the entire task distribution, substantially limiting the diversity of the task distributions that they are able to learn from. In this paper, we augment MAML with the capability to identify the mode of tasks sampled from a multimodal task distribution and adapt quickly through gradient updates. Specifically, we propose a multimodal MAML (MMAML) framework, which is able to modulate its meta-learned prior parameters according to the identified mode, allowing more efficient fast adaptation. We evaluate the proposed model on a diverse set of few-shot learning tasks, including regression, image classification, and reinforcement learning. The results not only demonstrate the effectiveness of our model in modulating the meta-learned prior in response to the characteristics of tasks but also show that training on a multimodal distribution can produce an improvement over unimodal training.