In recent years, novel view synthesis has gained popularity in generating high-fidelity images. While demonstrating superior performance in the task of synthesizing novel views, the majority of these methods are still based on the conventional multi-layer perceptron for scene embedding. Furthermore, light field models suffer from geometric blurring during pixel rendering, while radiance field-based volume rendering methods have multiple solutions for a certain target of density distribution integration. To address these issues, we introduce the Convolutional Neural Radiance Fields to model the derivatives of radiance along rays. Based on 1D convolutional operations, our proposed method effectively extracts potential ray representations through a structured neural network architecture. Besides, with the proposed ray modeling, a proposed recurrent module is employed to solve geometric ambiguity in the fully neural rendering process. Extensive experiments demonstrate the promising results of our proposed model compared with existing state-of-the-art methods.
Neural radiance field (NeRF) has shown remarkable performance in generating photo-realistic novel views. Among recent NeRF related research, the approaches that involve the utilization of explicit structures like grids to manage features achieve exceptionally fast training by reducing the complexity of multilayer perceptron (MLP) networks. However, storing features in dense grids demands a substantial amount of memory space, resulting in a notable memory bottleneck within computer system. Consequently, it leads to a significant increase in training times without prior hyper-parameter tuning. To address this issue, in this work, we are the first to propose MF-NeRF, a memory-efficient NeRF framework that employs a Mixed-Feature hash table to improve memory efficiency and reduce training time while maintaining reconstruction quality. Specifically, we first design a mixed-feature hash encoding to adaptively mix part of multi-level feature grids and map it to a single hash table. Following that, in order to obtain the correct index of a grid point, we further develop an index transformation method that transforms indices of an arbitrary level grid to those of a canonical grid. Extensive experiments benchmarking with state-of-the-art Instant-NGP, TensoRF, and DVGO, indicate our MF-NeRF could achieve the fastest training time on the same GPU hardware with similar or even higher reconstruction quality.
As blockchain technology becomes more and more popular, a typical financial scam, the Ponzi scheme, has also emerged in the blockchain platform Ethereum. This Ponzi scheme deployed through smart contracts, also known as the smart Ponzi scheme, has caused a lot of economic losses and negative impacts. Existing methods for detecting smart Ponzi schemes on Ethereum mainly rely on bytecode features, opcode features, account features, and transaction behavior features of smart contracts, and the performance of identifying schemes is insufficient. In this paper, we propose SourceP, a method to detect smart Ponzi schemes on the Ethereum platform using pre-trained models and data flow, which only requires using the source code of smart contracts as features to explore the possibility of detecting smart Ponzi schemes from another direction. SourceP reduces the difficulty of data acquisition and feature extraction of existing detection methods while increasing the interpretability of the model. Specifically, we first convert the source code of a smart contract into a data flow graph and then introduce a pre-trained model based on learning code representations to build a classification model to identify Ponzi schemes in smart contracts. The experimental results show that SourceP achieves 87.2\% recall and 90.7\% F-score for detecting smart Ponzi schemes within Ethereum's smart contract dataset, outperforming state-of-the-art methods in terms of performance and sustainability. We also demonstrate through additional experiments that pre-trained models and data flow play an important contribution to SourceP, as well as proving that SourceP has a good generalization ability.
In the last decade, the Winograd Schema Challenge (WSC) has become a central aspect of the research community as a novel litmus test. Consequently, the WSC has spurred research interest because it can be seen as the means to understand human behavior. In this regard, the development of new techniques has made possible the usage of Winograd schemas in various fields, such as the design of novel forms of CAPTCHAs. Work from the literature that established a baseline for human adult performance on the WSC has shown that not all schemas are the same, meaning that they could potentially be categorized according to their perceived hardness for humans. In this regard, this \textit{hardness-metric} could be used in future challenges or in the WSC CAPTCHA service to differentiate between Winograd schemas. Recent work of ours has shown that this could be achieved via the design of an automated system that is able to output the hardness-indexes of Winograd schemas, albeit with limitations regarding the number of schemas it could be applied on. This paper adds to previous research by presenting a new system that is based on Machine Learning (ML), able to output the hardness of any Winograd schema faster and more accurately than any other previously used method. Our developed system, which works within two different approaches, namely the random forest and deep learning (LSTM-based), is ready to be used as an extension of any other system that aims to differentiate between Winograd schemas, according to their perceived hardness for humans. At the same time, along with our developed system we extend previous work by presenting the results of a large-scale experiment that shows how human performance varies across Winograd schemas.
The rise in popularity of text-to-image generative artificial intelligence (AI) has attracted widespread public interest. We demonstrate that this technology can be attacked to generate content that subtly manipulates its users. We propose a Backdoor Attack on text-to-image Generative Models (BAGM), which upon triggering, infuses the generated images with manipulative details that are naturally blended in the content. Our attack is the first to target three popular text-to-image generative models across three stages of the generative process by modifying the behaviour of the embedded tokenizer, the language model or the image generative model. Based on the penetration level, BAGM takes the form of a suite of attacks that are referred to as surface, shallow and deep attacks in this article. Given the existing gap within this domain, we also contribute a comprehensive set of quantitative metrics designed specifically for assessing the effectiveness of backdoor attacks on text-to-image models. The efficacy of BAGM is established by attacking state-of-the-art generative models, using a marketing scenario as the target domain. To that end, we contribute a dataset of branded product images. Our embedded backdoors increase the bias towards the target outputs by more than five times the usual, without compromising the model robustness or the generated content utility. By exposing generative AI's vulnerabilities, we encourage researchers to tackle these challenges and practitioners to exercise caution when using pre-trained models. Relevant code, input prompts and supplementary material can be found at //github.com/JJ-Vice/BAGM, and the dataset is available at: //ieee-dataport.org/documents/marketable-foods-mf-dataset. Keywords: Generative Artificial Intelligence, Generative Models, Text-to-Image generation, Backdoor Attacks, Trojan, Stable Diffusion.
Synthetic datasets, recognized for their cost effectiveness, play a pivotal role in advancing computer vision tasks and techniques. However, when it comes to remote sensing image processing, the creation of synthetic datasets becomes challenging due to the demand for larger-scale and more diverse 3D models. This complexity is compounded by the difficulties associated with real remote sensing datasets, including limited data acquisition and high annotation costs, which amplifies the need for high-quality synthetic alternatives. To address this, we present SyntheWorld, a synthetic dataset unparalleled in quality, diversity, and scale. It includes 40,000 images with submeter-level pixels and fine-grained land cover annotations of eight categories, and it also provides 40,000 pairs of bitemporal image pairs with building change annotations for building change detection task. We conduct experiments on multiple benchmark remote sensing datasets to verify the effectiveness of SyntheWorld and to investigate the conditions under which our synthetic data yield advantages. We will release SyntheWorld to facilitate remote sensing image processing research.
The growing impact of preprint servers enables the rapid sharing of time-sensitive research. Likewise, it is becoming increasingly difficult to distinguish high-quality, peer-reviewed research from preprints. Although preprints are often later published in peer-reviewed journals, this information is often missing from preprint servers. To overcome this problem, the PreprintResolver was developed, which uses four literature databases (DBLP, SemanticScholar, OpenAlex, and CrossRef / CrossCite) to identify preprint-publication pairs for the arXiv preprint server. The target audience focuses on, but is not limited to inexperienced researchers and students, especially from the field of computer science. The tool is based on a fuzzy matching of author surnames, titles, and DOIs. Experiments were performed on a sample of 1,000 arXiv-preprints from the research field of computer science and without any publication information. With 77.94 %, computer science is highly affected by missing publication information in arXiv. The results show that the PreprintResolver was able to resolve 603 out of 1,000 (60.3 %) arXiv-preprints from the research field of computer science and without any publication information. All four literature databases contributed to the final result. In a manual validation, a random sample of 100 resolved preprints was checked. For all preprints, at least one result is plausible. For nine preprints, more than one result was identified, three of which are partially invalid. In conclusion the PreprintResolver is suitable for individual, manually reviewed requests, but less suitable for bulk requests. The PreprintResolver tool (//preprintresolver.eu, Available from 2023-08-01) and source code (//gitlab.com/ippolis_wp3/preprint-resolver, Accessed: 2023-07-19) is available online.
Malicious use of deepfakes leads to serious public concerns and reduces people's trust in digital media. Although effective deepfake detectors have been proposed, they are substantially vulnerable to adversarial attacks. To evaluate the detector's robustness, recent studies have explored various attacks. However, all existing attacks are limited to 2D image perturbations, which are hard to translate into real-world facial changes. In this paper, we propose adversarial head turn (AdvHeat), the first attempt at 3D adversarial face views against deepfake detectors, based on face view synthesis from a single-view fake image. Extensive experiments validate the vulnerability of various detectors to AdvHeat in realistic, black-box scenarios. For example, AdvHeat based on a simple random search yields a high attack success rate of 96.8% with 360 searching steps. When additional query access is allowed, we can further reduce the step budget to 50. Additional analyses demonstrate that AdvHeat is better than conventional attacks on both the cross-detector transferability and robustness to defenses. The adversarial images generated by AdvHeat are also shown to have natural looks. Our code, including that for generating a multi-view dataset consisting of 360 synthetic views for each of 1000 IDs from FaceForensics++, is available at //github.com/twowwj/AdvHeaT.
During Automated Program Repair (APR), it can be challenging to synthesize correct patches for real-world systems in general-purpose programming languages. Recent Large Language Models (LLMs) have been shown to be helpful "copilots" in assisting developers with various coding tasks, and have also been directly applied for patch synthesis. However, most LLMs treat programs as sequences of tokens, meaning that they are ignorant of the underlying semantics constraints of the target programming language. This results in plenty of statically invalid generated patches, impeding the practicality of the technique. Therefore, we propose Repilot, a framework to further copilot the AI "copilots" (i.e., LLMs) by synthesizing more valid patches during the repair process. Our key insight is that many LLMs produce outputs autoregressively (i.e., token by token), resembling human writing programs, which can be significantly boosted and guided through a Completion Engine. Repilot synergistically synthesizes a candidate patch through the interaction between an LLM and a Completion Engine, which 1) prunes away infeasible tokens suggested by the LLM and 2) proactively completes the token based on the suggestions provided by the Completion Engine. Our evaluation on a subset of the widely-used Defects4j 1.2 and 2.0 datasets shows that Repilot fixes 66 and 50 bugs, respectively, surpassing the best-performing baseline by 14 and 16 bugs fixed. More importantly, Repilot is capable of producing more valid and correct patches than the base LLM when given the same generation budget.
Nowadays, research into personalization has been focusing on explainability and fairness. Several approaches proposed in recent works are able to explain individual recommendations in a post-hoc manner or by explanation paths. However, explainability techniques applied to unfairness in recommendation have been limited to finding user/item features mostly related to biased recommendations. In this paper, we devised a novel algorithm that leverages counterfactuality methods to discover user unfairness explanations in the form of user-item interactions. In our counterfactual framework, interactions are represented as edges in a bipartite graph, with users and items as nodes. Our Bipartite Graph Explainer perturbs the topological structure to find an altered version (counterfactual explanation) that minimizes the disparity in utility between the protected and unprotected demographic groups. Experiments on four real-world graphs coming from various domains showed that our method can systematically explain user unfairness on three state-of-the-art GNN-based recommendation models. Moreover, an empirical evaluation of the perturbed network uncovered relevant patterns that justify the nature of the unfairness discovered by the generated explanations. The source code and the preprocessed data sets are available at //github.com/jackmedda/RS-BGExplainer.
Deep neural networks (DNNs) are successful in many computer vision tasks. However, the most accurate DNNs require millions of parameters and operations, making them energy, computation and memory intensive. This impedes the deployment of large DNNs in low-power devices with limited compute resources. Recent research improves DNN models by reducing the memory requirement, energy consumption, and number of operations without significantly decreasing the accuracy. This paper surveys the progress of low-power deep learning and computer vision, specifically in regards to inference, and discusses the methods for compacting and accelerating DNN models. The techniques can be divided into four major categories: (1) parameter quantization and pruning, (2) compressed convolutional filters and matrix factorization, (3) network architecture search, and (4) knowledge distillation. We analyze the accuracy, advantages, disadvantages, and potential solutions to the problems with the techniques in each category. We also discuss new evaluation metrics as a guideline for future research.