This communication presents an application of image forensics in which camera sensor fingerprints are used to identify the source camera (SCI: Source Camera Identification) of webcam and smartphone videos. Sensor or camera fingerprints are computed from the intrinsic noise that is always present in these sensors due to manufacturing imperfections. This unavoidable characteristic links each sensor to its noise pattern. PRNU (Photo Response Non-Uniformity) has become the default technique for computing a camera fingerprint. Many current applications use PRNU patterns for camera identification from still images. In this work we focus on video, first on webcam video and afterwards on smartphone video, since webcams and smartphones are the most widely used video cameras nowadays. Three possible methods for SCI are implemented and assessed in this work.
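The fingerprint idea can be illustrated with a minimal sketch. It is not one of the three methods assessed in the work, which the abstract does not detail. It assumes a simple 3x3 mean filter as the denoiser (a practical pipeline would typically use a stronger one), grayscale frames stored as lists of lists, and synthetic data; all function names (`box_blur`, `noise_residual`, `estimate_fingerprint`, `correlation`, `synthetic_frame`, `demo`) are hypothetical.

```python
import math
import random


def box_blur(img):
    """3x3 mean filter (window truncated at the borders): a crude stand-in denoiser."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [img[yy][xx]
                    for yy in range(max(0, y - 1), min(h, y + 2))
                    for xx in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = sum(vals) / len(vals)
    return out


def noise_residual(img):
    """Residual = frame minus its denoised version; the sensor noise ends up here."""
    blurred = box_blur(img)
    return [[p - b for p, b in zip(row, brow)] for row, brow in zip(img, blurred)]


def estimate_fingerprint(frames):
    """Average the residuals of several frames so that random noise tends to cancel."""
    h, w = len(frames[0]), len(frames[0][0])
    acc = [[0.0] * w for _ in range(h)]
    for frame in frames:
        res = noise_residual(frame)
        for y in range(h):
            for x in range(w):
                acc[y][x] += res[y][x] / len(frames)
    return acc


def correlation(a, b):
    """Normalised cross-correlation between two equally sized 2D arrays."""
    fa = [v for row in a for v in row]
    fb = [v for row in b for v in row]
    ma, mb = sum(fa) / len(fa), sum(fb) / len(fb)
    num = sum((x - ma) * (y - mb) for x, y in zip(fa, fb))
    den = math.sqrt(sum((x - ma) ** 2 for x in fa) * sum((y - mb) ** 2 for y in fb))
    return num / den if den else 0.0


def synthetic_frame(pattern, rng):
    """Assumed test data: flat scene, multiplicative sensor pattern, additive noise."""
    brightness = rng.uniform(90.0, 110.0)
    return [[brightness * (1.0 + k) + rng.gauss(0.0, 0.5) for k in row] for row in pattern]


def demo(size=16, n_frames=10, seed=0):
    """Build fingerprints for two simulated cameras and score a query frame from camera A."""
    rng = random.Random(seed)
    cam_a = [[rng.uniform(-0.02, 0.02) for _ in range(size)] for _ in range(size)]
    cam_b = [[rng.uniform(-0.02, 0.02) for _ in range(size)] for _ in range(size)]
    fp_a = estimate_fingerprint([synthetic_frame(cam_a, rng) for _ in range(n_frames)])
    fp_b = estimate_fingerprint([synthetic_frame(cam_b, rng) for _ in range(n_frames)])
    query = noise_residual(synthetic_frame(cam_a, rng))
    return correlation(fp_a, query), correlation(fp_b, query)
```

Calling `demo()` returns the correlation of the query frame with the fingerprint of its own simulated camera and with that of the other camera; the source camera is the one with the higher score. Real video adds compression and stabilisation effects that this sketch ignores.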
The human footprint has a unique set of ridges unmatched by any other human being, and it can therefore be used in identity documents such as birth certificates, the Indian biometric identification system (AADHAR card), driving licenses, PAN cards, and passports. At many crime scenes the accused must walk around, leaving footwear impressions as well as barefoot prints, so recovering footprints is crucial for identifying criminals. Footprint-based biometrics is a considerably newer technique for personal identification. Fingerprint, retina, iris and face recognition are the methods most useful for recording a person's attendance. The world is currently facing the problem of global terrorism. It is challenging to identify terrorists because they live like regular citizens. Their soft targets include industries of special interest such as defence, silicon and nanotechnology chip manufacturing units, and the pharmacy sector. They pose as religious persons, so temples and other holy places, and even markets, are among their targets. These are places where their footprints can be obtained quickly. The gait itself is sufficient to predict the behaviour of suspects. The present research aims to assess the usefulness of footprint and gait as an alternative means of personal identification.
Self-supervised learning has proved to be a powerful approach for learning image representations without the need for large labeled datasets. For underwater robotics, it is of great interest to design computer vision algorithms that improve perception capabilities such as sonar image classification. Due to the confidential nature of sonar imaging and the difficulty of interpreting sonar images, it is challenging to create large public labeled sonar datasets for training supervised learning algorithms. In this work, we investigate the potential of three self-supervised learning methods (RotNet, Denoising Autoencoders, and Jigsaw) to learn high-quality sonar image representations without the need for human labels. We present pre-training and transfer learning results on real-life sonar image datasets. Our results indicate that self-supervised pre-training yields classification performance comparable to supervised pre-training in a few-shot transfer learning setup across all three methods. Code and self-supervised pre-trained models are available at //github.com/agrija9/ssl-sonar-images
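As an illustration of how labels can come for free, the following minimal sketch builds the rotation-prediction pretext task associated with RotNet: each image is rotated by 0, 90, 180 and 270 degrees and the rotation index serves as the training label. It is a sketch under stated assumptions, not the code from the repository above; images are plain lists of lists, and the names `rotate90` and `rotation_pretext_batch` are hypothetical.

```python
def rotate90(img):
    """Rotate a 2D list-of-lists image by 90 degrees counter-clockwise."""
    return [list(row) for row in zip(*img)][::-1]


def rotation_pretext_batch(images):
    """Return (rotated_image, label) pairs, where label k means k * 90 degrees.

    No human annotation is needed: the label is known because we applied the rotation.
    """
    samples = []
    for img in images:
        rotated = [list(row) for row in img]  # copy so the input is left untouched
        for k in range(4):
            samples.append((rotated, k))
            rotated = rotate90(rotated)
    return samples
```

A network trained to predict `k` from the rotated image has to pick up on object shape and orientation cues, and its learned features can then be transferred to a classification task with few labels.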
Blockchain-based smart contracts lack privacy since the contract state and instruction code are exposed to the public. Combining smart-contract execution with Trusted Execution Environments (TEEs) provides an efficient solution, called TEE-assisted smart contracts, for protecting the confidentiality of contract states. However, the combination approaches are varied, and a systematic study is absent. Newly released systems may fail to learn from existing protocols, for example by repeating known design mistakes or applying TEE technology in insecure ways. In this paper, we first investigate the existing systems and categorize them into two types: layer-one solutions and layer-two solutions. Then, we establish an analysis framework to capture their common features, covering the desired properties (for contract services), threat models, and security considerations (for underlying systems). Based on our taxonomy, we identify their ideal functionalities and uncover the fundamental flaws and reasons for the challenges in each specification design. We believe that this work will provide a guide for the development of TEE-assisted smart contracts, as well as a framework to evaluate future TEE-assisted confidential contract systems.
The development of autonomous vehicles provides an opportunity to have a complete set of camera sensors capturing the environment around the car. Thus, it is important for object detection and tracking to address new challenges, such as achieving consistent results across camera views. To address these challenges, this work presents a new Global Association Graph Model with Link Prediction approach that predicts the locations of existing tracklets and links detections to tracklets via cross-attention motion modeling and appearance re-identification. This approach aims at solving issues caused by inconsistent 3D object detection. Moreover, our model is exploited to improve the detection accuracy of a standard 3D object detector in the nuScenes detection challenge. The experimental results on the nuScenes dataset demonstrate that the proposed method achieves state-of-the-art (SOTA) performance on the existing vision-based tracking dataset.
Due to the high human cost of annotation, it is non-trivial to curate a large-scale medical dataset that is fully labeled for all classes of interest. Instead, it would be convenient to collect multiple small partially labeled datasets from different matching sources, where the medical images may have only been annotated for a subset of classes of interest. This paper offers an empirical understanding of an under-explored problem, namely partially supervised multi-label classification (PSMLC), where a multi-label classifier is trained with only partially labeled medical images. In contrast to the fully supervised counterpart, the partial supervision caused by medical data scarcity has non-trivial negative impacts on the model performance. A potential remedy could be augmenting the partial labels. Though vicinal risk minimization (VRM) has been a promising solution to improve the generalization ability of the model, its application to PSMLC remains an open question. To bridge the methodological gap, we provide the first VRM-based solution to PSMLC. The empirical results also provide insights into future research directions on partially supervised learning under data scarcity.
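To make the setting concrete, the sketch below combines two ingredients in a hedged way: a binary cross-entropy that skips classes without annotation, and mixup, a well-known instance of vicinal risk minimization, restricted to classes annotated in both mixed samples. This is an illustrative assumption and not the solution proposed in the paper, whose details the abstract does not give; the names `masked_bce` and `mixup_partial` and the mask convention (1 = annotated, 0 = unknown) are hypothetical.

```python
import math


def masked_bce(probs, labels, mask):
    """Binary cross-entropy averaged over annotated classes only (mask == 1)."""
    eps = 1e-7
    total, count = 0.0, 0
    for p, y, m in zip(probs, labels, mask):
        if m:
            p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
            total += -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
            count += 1
    return total / count if count else 0.0


def mixup_partial(x1, y1, m1, x2, y2, m2, lam):
    """Mix two samples; a class stays supervised only if both samples annotate it.

    Unknown labels are assumed to be stored as 0 and are ignored through the mask.
    """
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    m = [a * b for a, b in zip(m1, m2)]
    return x, y, m
```

Taking the intersection of the two masks is the most conservative choice, since it never supervises a class on a guessed label, but it also shrinks the number of supervised classes per mixed sample.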
Videos are accessible media for analyzing sports postures and providing feedback to athletes. Existing video-based coaching systems often present feedback on the correctness of poses by augmenting videos with visual markers either manually by a coach or automatically by computing key parameters from poses. However, previewing and augmenting videos limit the analysis and visualization of human poses due to the fixed viewpoints, which confine the observation of captured human movements and cause ambiguity in the augmented feedback. Besides, existing sport-specific systems with embedded bespoke pose attributes can hardly generalize to new attributes; directly overlaying two poses might not clearly visualize the key differences that viewers would like to pursue. To address these issues, we analyze and visualize human pose data with customizable viewpoints and attributes in the context of common biomechanics of running poses, such as joint angles and step distances. Based on existing literature and a formative study, we have designed and implemented a system, VCoach, to provide feedback on running poses for amateurs. VCoach provides automatic low-level comparisons of the running poses between a novice and an expert, and visualizes the pose differences as part-based 3D animations on a human model. Meanwhile, it retains the users' controllability and customizability in high-level functionalities, such as navigating the viewpoint for previewing feedback and defining their own pose attributes through our interface. We conduct a user study to verify our design components and conduct expert interviews to evaluate the usefulness of the system.
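The two pose attributes named above can be sketched from 3D keypoints as follows. This is a minimal illustration and not VCoach's implementation: the keypoint format (x, y, z tuples), the function names, and the definition of step distance as the Euclidean distance between the two foot keypoints are all assumptions.

```python
import math


def joint_angle(a, b, c):
    """Angle in degrees at joint b formed by the segments b->a and b->c."""
    ba = [p - q for p, q in zip(a, b)]
    bc = [p - q for p, q in zip(c, b)]
    dot = sum(u * v for u, v in zip(ba, bc))
    norm = math.sqrt(sum(u * u for u in ba)) * math.sqrt(sum(v * v for v in bc))
    if norm == 0.0:
        raise ValueError("joint_angle needs three points with non-zero segment lengths")
    cos_theta = max(-1.0, min(1.0, dot / norm))  # clamp against rounding error
    return math.degrees(math.acos(cos_theta))


def step_distance(left_foot, right_foot):
    """Assumed definition: straight-line distance between the two foot keypoints."""
    return math.dist(left_foot, right_foot)
```

For a knee angle, for example, `a`, `b` and `c` would be the hip, knee and ankle keypoints of the same leg; comparing the value between a novice and an expert at matching phases of the stride gives one low-level difference to visualize.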
Recruitment in large organisations often involves interviewing a large number of candidates. The process is resource intensive and complex. Therefore, it is important to carry it out efficiently and effectively. Planning the selection process consists of several problems, each of which maps to one or another well-known computing problem. Research that looks at each of these problems in isolation is rich and mature. However, research that takes an integrated view of the problem is not common. In this paper, we address two of the most important aspects of the application processing problem, namely review/interview panel creation and interview scheduling. We have implemented our approach as a prototype system and have used it to automatically plan the interview process of a real-life data set. Our system provides a distinctly better plan than the existing practice, which is predominantly manual. We have explored various algorithmic options and have customised them to solve the panel creation and interview scheduling problems. We have evaluated these design options experimentally on a real data set and present our observations. Our prototype, experimental process and results may be a very good starting point for a full-fledged development project for automating application processing.
Exponential growth in digital information outlets and the race to publish have made scientific misinformation more prevalent than ever. However, the task of fact-verifying a given scientific claim is not straightforward, even for researchers. Scientific claim verification requires in-depth knowledge and great labor from domain experts to substantiate supporting and refuting evidence from credible scientific sources. The SciFact dataset and corresponding task provide a benchmarking leaderboard for the community to develop automatic scientific claim verification systems that extract and assimilate relevant evidence rationales from source abstracts. In this work, we propose a modular approach that sequentially carries out binary classification for every prediction subtask of the SciFact leaderboard. Our simple classifier-based approach uses reduced abstract representations to retrieve relevant abstracts. These are further used to train the relevant rationale-selection model. Finally, we carry out two-step stance prediction that first filters out non-relevant rationales and then identifies supporting or refuting rationales for a given claim. Experimentally, our system RerrFact, with no fine-tuning, a simple design, and a fraction of the model parameters, fares competitively on the leaderboard against large-scale, modular, and joint modeling approaches. We make our codebase available at //github.com/ashishrana160796/RerrFact.
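The two-step stance prediction can be sketched as a cascade of two binary decisions. The sketch is only a schematic of the control flow and is not taken from the RerrFact codebase: the classifiers are passed in as plain callables returning a probability, and the label strings, the 0.5 threshold and the name `two_step_stance` are hypothetical.

```python
from typing import Callable, List

# Assumed interface: (claim, rationale) -> probability in [0, 1].
Classifier = Callable[[str, str], float]


def two_step_stance(claim: str,
                    rationales: List[str],
                    is_relevant: Classifier,
                    is_support: Classifier,
                    threshold: float = 0.5) -> List[str]:
    """Step 1 filters out non-relevant rationales; step 2 splits the rest by stance."""
    labels = []
    for rationale in rationales:
        if is_relevant(claim, rationale) < threshold:
            labels.append("NOT_RELEVANT")   # the second classifier is never consulted
        elif is_support(claim, rationale) >= threshold:
            labels.append("SUPPORTS")
        else:
            labels.append("REFUTES")
    return labels
```

Splitting the three-way decision into two binary ones keeps each classifier simple, at the cost that an error in the first step cannot be corrected by the second.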
Multi-camera vehicle tracking is one of the most complicated tasks in Computer Vision as it involves distinct tasks including Vehicle Detection, Tracking, and Re-identification. Despite the challenges, multi-camera vehicle tracking has immense potential in transportation applications including speed, volume, origin-destination (O-D), and routing data generation. Several recent works have addressed the multi-camera tracking problem. However, most of the effort has gone towards improving accuracy on high-quality benchmark datasets while disregarding lower camera resolutions, compression artifacts, and the overwhelming amount of computational power and time needed to carry out this task at the edge, which makes it prohibitive for large-scale and real-time deployment. Therefore, in this work we shed light on practical issues that should be addressed in the design of a multi-camera tracking system that provides actionable and timely insights. Moreover, we propose a real-time city-scale multi-camera vehicle tracking system that compares favorably to computationally intensive alternatives and handles real-world, low-resolution CCTV instead of idealized and curated video streams. To show its effectiveness, in addition to integrating it into the Regional Integrated Transportation Information System (RITIS), we participated in the 2021 NVIDIA AI City multi-camera tracking challenge, where our method ranked among the top five performers on the public leaderboard.