An important application scenario of precision agriculture is detecting and measuring crop health threats using sensors and data analysis techniques. However, the textual data are still under-explored among the existing solutions due to the lack of labelled data and fine-grained semantic resources. Recent research suggests that the increasing connectivity of farmers and the emergence of online farming communities make social media like Twitter a participatory platform for detecting unfamiliar plant health events if we can extract essential information from unstructured textual data. ChouBERT is a French pre-trained language model that can identify Tweets concerning observations of plant health issues with generalizability on unseen natural hazards. This paper tackles the lack of labelled data by further studying ChouBERT's know-how on token-level annotation tasks over small labeled sets.
Detecting non-termination is crucial for ensuring program correctness and security, such as preventing denial-of-service attacks. While termination analysis has been studied for many years, existing methods have limited scalability and are only effective on small programs. To address this issue, we propose a practical termination checking technique, called EndWatch, for detecting non-termination caused by infinite loops through testing. Specifically, we introduce two methods to generate non-termination oracles based on checking state revisits, i.e., if the program returns to a previously visited state at the same program location, it does not terminate. The non-termination oracles can be incorporated into testing tools (e.g., AFL used in this paper) to detect non-termination in large programs. For linear loops, we perform symbolic execution on individual loops to infer State Revisit Conditions (SRCs) and instrument SRCs into target loops. For non-linear loops, we instrument target loops for checking concrete state revisits during execution. We evaluated EndWatch on standard benchmarks with small-sized programs and real-world projects with large-sized programs. The evaluation results show that EndWatch is more effective than the state-of-the-art tools on standard benchmarks (detecting 87% of non-terminating programs while the best baseline detects only 67%), and useful in detecting non-termination in real-world projects (detecting 90% of known non-termination CVEs and 4 unknown bugs).
We propose an easy-to-use adjustment estimator for the effect of a treatment based on observational data from a single (social) network of units. The approach allows for interactions among units within the network, called interference, and for observed confounding. We define a simplified causal graph that does not differentiate between units, called generic graph. Using valid adjustment sets determined in the generic graph, we can identify the treatment effect and build a corresponding estimator. We establish the estimator's consistency and its convergence to a Gaussian limiting distribution at the parametric rate under certain regularity conditions that restrict the growth of dependencies among units. We empirically verify the theoretical properties of our estimator through a simulation study and apply it to estimate the effect of a strict facial-mask policy on the spread of COVID-19 in Switzerland.
We consider the problem of transferring a temporal action segmentation system initially designed for exocentric (fixed) cameras to an egocentric scenario, where wearable cameras capture video data. The conventional supervised approach requires the collection and labeling of a new set of egocentric videos to adapt the model, which is costly and time-consuming. Instead, we propose a novel methodology which performs the adaptation leveraging existing labeled exocentric videos and a new set of unlabeled, synchronized exocentric-egocentric video pairs, for which temporal action segmentation annotations do not need to be collected. We implement the proposed methodology with an approach based on knowledge distillation, which we investigate both at the feature and model level. To evaluate our approach, we introduce a new benchmark based on the Assembly101 dataset. Results demonstrate the feasibility and effectiveness of the proposed method against classic unsupervised domain adaptation and temporal sequence alignment approaches. Remarkably, without bells and whistles, our best model performs on par with supervised approaches trained on labeled egocentric data, without ever seeing a single egocentric label, achieving a +15.99% (28.59% vs 12.60%) improvement in the edit score on the Assembly101 dataset compared to a baseline model trained solely on exocentric data.
Robotic interaction in fast-paced environments presents a substantial challenge, particularly in tasks requiring the prediction of dynamic, non-stationary objects for timely and accurate responses. An example of such a task is ping-pong, where the physical limitations of a robot may prevent it from reaching its goal in the time it takes the ball to cross the table. The scene of a ping-pong match contains rich visual information of a player's movement that can allow future game state prediction, with varying degrees of uncertainty. To this aim, we present a visual modeling, prediction, and control system to inform a ping-pong playing robot utilizing visual model uncertainty to allow earlier motion of the robot throughout the game. We present demonstrations and metrics in simulation to show the benefit of incorporating model uncertainty, the limitations of current standard model uncertainty estimators, and the need for more verifiable model uncertainty estimation. Our code is publicly available.
Numerous studies use regression discontinuity design (RDD) for panel data by assuming that the treatment effects are homogeneous across all individuals/groups and pooling the data together. It is unclear how to test for the significance of treatment effects when the treatments vary across individuals/groups and the error terms may exhibit complicated dependence structures. This paper examines the estimation and inference of multiple treatment effects when the errors are not independent and identically distributed, and the treatment effects vary across individuals/groups. We derive a simple analytical expression for approximating the variance-covariance structure of the treatment effect estimators under general dependence conditions and propose two test statistics, one is to test for the overall significance of the treatment effect and the other for the homogeneity of the treatment effects. We find that in the Gaussian approximations to the test statistics, the dependence structures in the data can be safely ignored due to the localized nature of the statistics. This has the important implication that the simulated critical values can be easily obtained. Simulations demonstrate our tests have superb size control and reasonable power performance in finite samples regardless of the presence of strong cross-section dependence or/and weak serial dependence in the data. We apply our tests to two datasets and find significant overall treatment effects in each case.
Training neural networks that require adversarial optimization, such as generative adversarial networks (GANs) and unsupervised domain adaptations (UDAs), suffers from instability. This instability problem comes from the difficulty of the minimax optimization, and there have been various approaches in GANs and UDAs to overcome this problem. In this study, we tackle this problem theoretically through a functional analysis. Specifically, we show the convergence property of the minimax problem by the gradient descent over the infinite-dimensional spaces of continuous functions and probability measures under certain conditions. Using this setting, we can discuss GANs and UDAs comprehensively, which have been studied independently. In addition, we show that the conditions necessary for the convergence property are interpreted as stabilization techniques of adversarial training such as the spectral normalization and the gradient penalty.
Medical image segmentation plays a crucial role in various healthcare applications, enabling accurate diagnosis, treatment planning, and disease monitoring. In recent years, Vision Transformers (ViTs) have emerged as a promising technique for addressing the challenges in medical image segmentation. In medical images, structures are usually highly interconnected and globally distributed. ViTs utilize their multi-scale attention mechanism to model the long-range relationships in the images. However, they do lack image-related inductive bias and translational invariance, potentially impacting their performance. Recently, researchers have come up with various ViT-based approaches that incorporate CNNs in their architectures, known as Hybrid Vision Transformers (HVTs) to capture local correlation in addition to the global information in the images. This survey paper provides a detailed review of the recent advancements in ViTs and HVTs for medical image segmentation. Along with the categorization of ViT and HVT-based medical image segmentation approaches we also present a detailed overview of their real-time applications in several medical image modalities. This survey may serve as a valuable resource for researchers, healthcare practitioners, and students in understanding the state-of-the-art approaches for ViT-based medical image segmentation.
Digital Imaging and Communication System (DICOM) is widely used throughout the public health sector for portability in medical imaging. However, these DICOM files have vulnerabilities present in the preamble section. Successful exploitation of these vulnerabilities can allow attackers to embed executable codes in the 128-Byte preamble of DICOM files. Embedding the malicious executable will not interfere with the readability or functionality of DICOM imagery. However, it will affect the underline system silently upon viewing these files. This paper shows the infiltration of Windows malware executables into DICOM files. On viewing the files, the malicious DICOM will get executed and eventually infect the entire hospital network through the radiologist's workstation. The code injection process of executing malware in DICOM files affects the hospital networks and workstations' memory. Memory forensics for the infected radiologist's workstation is crucial as it can detect which malware disrupts the hospital environment, and future detection methods can be deployed. In this paper, we consider the machine learning (ML) algorithms to conduct memory forensics on three memory dump categories: Trojan, Spyware, and Ransomware, taken from the CIC-MalMem-2022 dataset. We obtain the highest accuracy of 75\% with the Random Forest model. For estimating the feature importance for ML model prediction, we leveraged the concept of Shapley values.
Robotic collectives for military and disaster response applications require coalition formation algorithms to partition robots into appropriate task teams. Collectives' missions will often incorporate tasks that require multiple high-level robot behaviors or services, which coalition formation must accommodate. The highly dynamic and unstructured application domains also necessitate that coalition formation algorithms produce near optimal solutions (i.e., >95% utility) in near real-time (i.e., <5 minutes) with very large collectives (i.e., hundreds of robots). No previous coalition formation algorithm satisfies these requirements. An initial evaluation found that traditional auction-based algorithms' runtimes are too long, even though the centralized simulator incorporated ideal conditions unlikely to occur in real-world deployments (i.e., synchronization across robots and perfect, instantaneous communication). The hedonic game-based GRAPE algorithm can produce solutions in near real-time, but cannot be applied to multiple service collectives. This manuscript integrates GRAPE and a services model, producing GRAPE-S and Pair-GRAPE-S. These algorithms and two auction baselines were evaluated using a centralized simulator with up to 1000 robots, and via the largest distributed coalition formation simulated evaluation to date, with up to 500 robots. The evaluations demonstrate that auctions transfer poorly to distributed collectives, resulting in excessive runtimes and low utility solutions. GRAPE-S satisfies the target domains' coalition formation requirements, producing near optimal solutions in near real-time, and Pair-GRAPE-S more than satisfies the domain requirements, producing optimal solutions in near real-time. GRAPE-S and Pair-GRAPE-S are the first algorithms demonstrated to support near real-time coalition formation for very large, distributed collectives with multiple services.
Detecting carried objects is one of the requirements for developing systems to reason about activities involving people and objects. We present an approach to detect carried objects from a single video frame with a novel method that incorporates features from multiple scales. Initially, a foreground mask in a video frame is segmented into multi-scale superpixels. Then the human-like regions in the segmented area are identified by matching a set of extracted features from superpixels against learned features in a codebook. A carried object probability map is generated using the complement of the matching probabilities of superpixels to human-like regions and background information. A group of superpixels with high carried object probability and strong edge support is then merged to obtain the shape of the carried object. We applied our method to two challenging datasets, and results show that our method is competitive with or better than the state-of-the-art.