Deepfake detection automatically recognizes the manipulated medias through the analysis of the difference between manipulated and non-altered videos. It is natural to ask which are the top performers among the existing deepfake detection approaches to identify promising research directions and provide practical guidance. Unfortunately, it's difficult to conduct a sound benchmarking comparison of existing detection approaches using the results in the literature because evaluation conditions are inconsistent across studies. Our objective is to establish a comprehensive and consistent benchmark, to develop a repeatable evaluation procedure, and to measure the performance of a range of detection approaches so that the results can be compared soundly. A challenging dataset consisting of the manipulated samples generated by more than 13 different methods has been collected, and 11 popular detection approaches (9 algorithms) from the existing literature have been implemented and evaluated with 6 fair-minded and practical evaluation metrics. Finally, 92 models have been trained and 644 experiments have been performed for the evaluation. The results along with the shared data and evaluation methodology constitute a benchmark for comparing deepfake detection approaches and measuring progress.
The presence of inequity is a fundamental problem in the outcomes of decision-making systems, especially when human lives are at stake. Yet, estimating notions of unfairness or inequity is difficult, particularly if they rely on hard-to-measure concepts such as risk. Such measurements of risk can be accurately obtained when no unobserved confounders have jointly influenced past decisions and outcomes. However, in the real world, this assumption rarely holds. In this paper, we show a surprising result that one can still give meaningful bounds on treatment rates to high-risk individuals, even when entirely eliminating or relaxing the assumption that all relevant risk factors are observed. We use the fact that in many real-world settings (e.g., the release of a new treatment) we have data from prior to any allocation to derive unbiased estimates of risk. This result is of immediate practical interest: we can audit unfair outcomes of existing decision-making systems in a principled manner. For instance, in a real-world study of Paxlovid allocation, our framework provably identifies that observed racial inequity cannot be explained by unobserved confounders of the same strength as important observed covariates.
Neural reflectance models are capable of reproducing the spatially-varying appearance of many real-world materials at different scales. Unfortunately, existing techniques such as NeuMIP have difficulties handling materials with strong shadowing effects or detailed specular highlights. In this paper, we introduce a neural appearance model that offers a new level of accuracy. Central to our model is an inception-based core network structure that captures material appearances at multiple scales using parallel-operating kernels and ensures multi-stage features through specialized convolution layers. Furthermore, we encode the inputs into frequency space, introduce a gradient-based loss, and employ it adaptive to the progress of the learning phase. We demonstrate the effectiveness of our method using a variety of synthetic and real examples.
Unsupervised clustering of wafer map defect patterns is challenging because the appearance of certain defect patterns varies significantly. This includes changing shape, location, density, and rotation of the defect area on the wafer. We present a harvesting approach, which can cluster even challenging defect patterns of wafer maps well. Our approach makes use of a well-known, three-step procedure: feature extraction, dimension reduction, and clustering. The novelty in our approach lies in repeating dimensionality reduction and clustering iteratively while filtering out one cluster per iteration according to its silhouette score. This method leads to an improvement of clustering performance in general and is especially useful for difficult defect patterns. The low computational effort allows for a quick assessment of large datasets and can be used to support manual labeling efforts. We benchmark against related approaches from the literature and show improved results on a real-world industrial dataset.
This work considers the notion of random tensors and reviews some fundamental concepts in statistics when applied to a tensor based data or signal. In several engineering fields such as Communications, Signal Processing, Machine learning, and Control systems, the concepts of linear algebra combined with random variables have been indispensable tools. With the evolution of these subjects to multi-domain communication systems, multi-way signal processing, high dimensional data analysis, and multi-linear systems theory, there is a need to bring in multi-linear algebra equipped with the notion of random tensors. Also, since several such application areas deal with complex-valued entities, it is imperative to study this subject from a complex random tensor perspective, which is the focus of this paper. Using tools from multi-linear algebra, we characterize statistical properties of complex random tensors, both proper and improper, study various correlation structures, and fundamentals of tensor valued random processes. Furthermore, the asymptotic distribution of various tensor eigenvalue and singular value definitions is also considered, which is used for the study of spiked real tensor models that deals with recovery of low rank tensor signals perturbed by noise. This paper aims to provide an overview of the state of the art in random tensor theory of both complex and real valued tensors, for the purpose of enabling its application in engineering and applied science.
The constraint satisfaction problem (CSP) on a finite relational structure B is to decide, given a set of constraints on variables where the relations come from B, whether or not there is a assignment to the variables satisfying all of the constraints; the surjective CSP is the variant where one decides the existence of a surjective satisfying assignment onto the universe of B. We present an algebraic framework for proving hardness results on surjective CSPs; essentially, this framework computes global gadgetry that permits one to present a reduction from a classical CSP to a surjective CSP. We show how to derive a number of hardness results for surjective CSP in this framework, including the hardness of the disconnected cut problem, of the no-rainbow 3-coloring problem, and of the surjective CSP on all 2-element structures known to be intractable (in this setting). Our framework thus allows us to unify these hardness results, and reveal common structure among them; we believe that our hardness proof for the disconnected cut problem is more succinct than the original. In our view, the framework also makes very transparent a way in which classical CSPs can be reduced to surjective CSPs.
Accurate uncertainty estimates are important in sequential model-based decision-making tasks such as Bayesian optimization. However, these estimates can be imperfect if the data violates assumptions made by the model (e.g., Gaussianity). This paper studies which uncertainties are needed in model-based decision-making and in Bayesian optimization, and argues that uncertainties can benefit from calibration -- i.e., an 80% predictive interval should contain the true outcome 80% of the time. Maintaining calibration, however, can be challenging when the data is non-stationary and depends on our actions. We propose using simple algorithms based on online learning to provably maintain calibration on non-i.i.d. data, and we show how to integrate these algorithms in Bayesian optimization with minimal overhead. Empirically, we find that calibrated Bayesian optimization converges to better optima in fewer steps, and we demonstrate improved performance on standard benchmark functions and hyperparameter optimization tasks.
Information-processing tasks modelled by homomorphisms between relational structures can witness quantum advantage when entanglement is used as a computational resource. We prove that the occurrence of quantum advantage is determined by the same type of algebraic structure (known as a minion) that captures the polymorphism identities of CSPs and, thus, CSP complexity. We investigate the connection between the minion of quantum advantage and other known minions controlling CSP tractability and width. In this way, we make use of complexity results from the algebraic theory of CSPs to characterise the occurrence of quantum advantage in the case of graphs, and to obtain new necessary and sufficient conditions in the case of arbitrary relational structures.
The distribution of efficient individuals in the economy and the efforts that they will put in if they are hired, there are two important concerns for a technologically advanced firm. wants to open a new branch. The firm does not have information about the exact level of efficiency of an individual when she is hired. We call this situation incomplete information. The standard principal agent models assume that employees know their efficiency levels. Hence these models design incentive-compatible mechanisms. An incentive-compatible mechanism ensures that a participant does not have the incentive to misreport her efficiency level. This paper does not assume that employees know how efficient they are. This paper assumes that the production technology of the firm is intelligent, that is, the output of the machine reveals the efficiency levels of employees. Employees marginal contributions to the total output of the intelligent machine, the probability distribution of the levels of efficiency and employees costs of efforts together define a game of incomplete information. A characterization of ex-ante Nash Equilibrium is established. The results of the characterization formalize the relationship between the distribution of efficiency levels and the distribution of output.
Humans perceive the world by concurrently processing and fusing high-dimensional inputs from multiple modalities such as vision and audio. Machine perception models, in stark contrast, are typically modality-specific and optimised for unimodal benchmarks, and hence late-stage fusion of final representations or predictions from each modality (`late-fusion') is still a dominant paradigm for multimodal video classification. Instead, we introduce a novel transformer based architecture that uses `fusion bottlenecks' for modality fusion at multiple layers. Compared to traditional pairwise self-attention, our model forces information between different modalities to pass through a small number of bottleneck latents, requiring the model to collate and condense the most relevant information in each modality and only share what is necessary. We find that such a strategy improves fusion performance, at the same time reducing computational cost. We conduct thorough ablation studies, and achieve state-of-the-art results on multiple audio-visual classification benchmarks including Audioset, Epic-Kitchens and VGGSound. All code and models will be released.
Recent advances in maximizing mutual information (MI) between the source and target have demonstrated its effectiveness in text generation. However, previous works paid little attention to modeling the backward network of MI (i.e., dependency from the target to the source), which is crucial to the tightness of the variational information maximization lower bound. In this paper, we propose Adversarial Mutual Information (AMI): a text generation framework which is formed as a novel saddle point (min-max) optimization aiming to identify joint interactions between the source and target. Within this framework, the forward and backward networks are able to iteratively promote or demote each other's generated instances by comparing the real and synthetic data distributions. We also develop a latent noise sampling strategy that leverages random variations at the high-level semantic space to enhance the long term dependency in the generation process. Extensive experiments based on different text generation tasks demonstrate that the proposed AMI framework can significantly outperform several strong baselines, and we also show that AMI has potential to lead to a tighter lower bound of maximum mutual information for the variational information maximization problem.