Object detectors are vulnerable to backdoor attacks. In contrast to classifiers, detectors possess unique characteristics, both architecturally and in task execution, and they often operate in challenging conditions, for instance, detecting traffic signs in autonomous cars. Yet our knowledge is dominated by attacks against classifiers evaluated in the "digital domain". To address this critical gap, we conducted an extensive empirical study targeting multiple detector architectures and two challenging detection tasks in real-world settings: traffic signs and vehicles. Using diverse, methodically collected videos captured from driving cars and flying drones, with physical object triggers deployed in authentic scenes, we investigated the viability of physical object-triggered backdoor attacks in application settings. Our findings reveal eight key insights. Importantly, the prevalent "digital" data poisoning method for injecting backdoors into models, although proven effective in classification tasks, does not yield effective attacks against detectors in the real world. We construct a new, cost-efficient attack method, dubbed MORPHING, that incorporates the unique nature of detection tasks; it is remarkably successful in injecting physical object-triggered backdoors, even when poisoned samples carry clean-label annotations or invisible triggers, without diminishing the attack's success. We further discovered that existing defenses are ill-equipped to safeguard detectors against such attacks. To underscore the severity of the threat and foster further research, we release, for the first time, an extensive video test set of real-world backdoor attacks. Our study not only establishes the credibility and seriousness of this threat but also serves as a clarion call for the research community to advance backdoor defenses in the context of object detection.
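To make the baseline concrete, here is a minimal sketch of what the prevalent "digital" data poisoning of detector training data can look like: a trigger patch is pasted into annotated images and the box labels are flipped to an attacker-chosen class. The function name, poison rate, patch placement, and scaling are illustrative assumptions, not the MORPHING attack or the exact poisoning procedure evaluated in the study.

```python
import random
from PIL import Image

def poison_detection_sample(image: Image.Image, boxes: list, trigger: Image.Image,
                            target_class: int, poison_rate: float = 0.1):
    """Illustrative 'digital' data poisoning for a detector (not MORPHING).

    `boxes` is assumed to be a list of (x1, y1, x2, y2, class_id) tuples.
    """
    if random.random() > poison_rate:
        return image, boxes  # leave most samples clean

    poisoned = image.copy()
    poisoned_boxes = []
    for (x1, y1, x2, y2, cls) in boxes:
        # Paste the trigger inside the object region, scaled to ~20% of its size.
        tw = max(1, int((x2 - x1) * 0.2))
        th = max(1, int((y2 - y1) * 0.2))
        patch = trigger.resize((tw, th))
        poisoned.paste(patch, (int(x1), int(y1)))
        # Flip the annotation to the attacker-chosen target class.
        poisoned_boxes.append((x1, y1, x2, y2, target_class))
    return poisoned, poisoned_boxes
```

In practice, samples poisoned this way would be mixed back into the training set before the detector is trained or fine-tuned.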
Human-object interaction (HOI) detectors built on the popular query-transformer architecture have achieved promising performance. However, accurately identifying uncommon visual patterns and distinguishing between ambiguous HOIs continue to be difficult for them. We observe that these difficulties may arise from the limited capacity of traditional detector queries to represent diverse intra-category patterns and inter-category dependencies. To address this, we introduce the Interaction Prompt Distribution Learning (InterProDa) approach. InterProDa learns multiple sets of soft prompts and estimates category distributions from the various prompts. It then incorporates HOI queries with the category distributions, making them capable of representing near-infinite intra-category dynamics and universal cross-category relationships. Our InterProDa detector demonstrates competitive performance on the HICO-DET and V-COCO benchmarks. Additionally, our method can be integrated into most transformer-based HOI detectors, significantly enhancing their performance with minimal additional parameters.
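One way to picture the core mechanism, several learnable soft prompts per HOI category summarized as a distribution that enriches the detector queries, is sketched below. The module name, the Gaussian summary, and all shapes are assumptions made for illustration; they are not the InterProDa implementation.

```python
import torch
import torch.nn as nn

class CategoryPromptDistribution(nn.Module):
    """Illustrative module: several learnable soft prompts per HOI category,
    summarized as a per-category Gaussian that detector queries can draw from.
    Shapes and the Gaussian summary are assumptions, not the paper's design."""

    def __init__(self, num_categories: int, prompts_per_cat: int, dim: int):
        super().__init__()
        self.prompts = nn.Parameter(torch.randn(num_categories, prompts_per_cat, dim) * 0.02)

    def forward(self, queries: torch.Tensor, cat_ids: torch.Tensor) -> torch.Tensor:
        # Summarize each category's soft prompts by a mean and standard deviation.
        mean = self.prompts.mean(dim=1)           # (C, dim)
        std = self.prompts.std(dim=1) + 1e-6      # (C, dim)
        # Sample one prompt per query from its category distribution
        # (reparameterization keeps the prompts trainable).
        sampled = mean[cat_ids] + std[cat_ids] * torch.randn_like(mean[cat_ids])
        # Enrich the HOI queries with the sampled category prompt.
        return queries + sampled
```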
We present an overview of the decentralized reconfiguration language Concerto-D through its Maude formalization. Concerto-D extends the previously published Concerto language. It improves on related work in two respects: the decentralized coordination of numerous local reconfiguration plans, which avoids a single point of failure when considering unstable networks such as edge computing or cyber-physical systems (CPS); and a mechanized formal semantics of the language in Maude, which offers guarantees on the executability of the semantics. Throughout the paper, the Concerto-D language and its semantics are exemplified with a reconfiguration extracted from a real case study on a CPS. We rely on the Maude formal specification language, which is based on rewriting logic and is consequently well suited to describing a concurrent model.
Social robot navigation algorithms are often demonstrated in overly simplified scenarios, prohibiting the extraction of practical insights about their relevance to real-world domains. Our key insight is that an understanding of the inherent complexity of a social robot navigation scenario could help characterize the limitations of existing navigation algorithms and provide actionable directions for improvement. Through an exploration of recent literature, we identify a series of factors contributing to the complexity of a scenario, disambiguating between contextual and robot-related ones. We then conduct a simulation study investigating how manipulations of contextual factors impact the performance of a variety of navigation algorithms. We find that dense and narrow environments correlate most strongly with performance drops, while the heterogeneity of agent policies and directionality of interactions have a less pronounced effect. Our findings motivate a shift towards developing and testing algorithms under higher-complexity settings.
Large language models (LLMs) have shown significant multilingual capabilities. However, the mechanisms underlying the development of these capabilities during pre-training are not well understood. In this paper, we use code LLMs as an experimental platform to explore the evolution of multilingual capabilities in LLMs during the pre-training process. Based on our observations, we propose the Babel Tower Hypothesis, which describes the entire process by which LLMs acquire new language capabilities. During the learning process, multiple languages initially share a single knowledge system dominated by the primary language and gradually develop language-specific knowledge systems. We then validate this hypothesis by tracking the internal states of the LLMs, identifying working languages and language-transferring neurons. Experimental results show that the internal state changes of the LLM are consistent with our Babel Tower Hypothesis. Building on these insights, we propose a novel method to construct an optimized pre-training corpus for multilingual code LLMs, which yields models that significantly outperform LLMs trained on the original corpus. The proposed Babel Tower Hypothesis provides new insights into designing pre-training data distributions to achieve optimal multilingual capabilities in LLMs.
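As a rough illustration of how a working language could be tracked across layers, the probe below decodes each layer's hidden states through the output head and checks which language's tokens dominate (a logit-lens-style heuristic). The function and its inputs are hypothetical; the paper's actual procedure for identifying working languages and language-transferring neurons may differ.

```python
import torch

def working_language_per_layer(hidden_states, lm_head, lang_token_ids):
    """Logit-lens-style probe: decode each layer's hidden states through the
    output head and report which language's tokens dominate at that layer.

    This is an assumed illustration, not the paper's exact procedure.
      hidden_states: list of tensors, one per layer, each (seq_len, hidden_dim)
      lm_head: the model's output projection (hidden_dim -> vocab_size)
      lang_token_ids: dict mapping language name -> set of vocabulary ids
    """
    dominant = []
    for h in hidden_states:
        with torch.no_grad():
            logits = lm_head(h)                    # (seq_len, vocab_size)
        top_ids = logits.argmax(dim=-1).tolist()   # most likely token per position
        counts = {lang: sum(t in ids for t in top_ids)
                  for lang, ids in lang_token_ids.items()}
        dominant.append(max(counts, key=counts.get))
    return dominant  # dominant language at each layer
```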
This study establishes the consistency of Bayesian adaptive testing methods under the Rasch model, addressing a gap in the literature on their large-sample guarantees. Although Bayesian approaches are recognized for their finite-sample performance and their ability to circumvent issues such as the cold-start problem, rigorous proofs of their asymptotic properties, particularly in non-i.i.d. structures, remain lacking. We derive conditions under which the posterior distributions of latent traits converge to the true values for a sequence of given items, and demonstrate that Bayesian estimators remain robust under misspecification of the prior. Our analysis then extends to adaptive item selection methods in which items are chosen endogenously during the test. Additionally, we develop a Bayesian decision-theoretic framework for the item selection problem and propose a novel selection criterion that aligns the test process with optimal estimator performance. These theoretical results provide a foundation for Bayesian methods in adaptive testing, complementing prior evidence of their finite-sample advantages.
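For reference, the Rasch model underlying the analysis and the resulting posterior over a latent trait can be written as follows, with ability θ_i, item difficulty b_j, and n administered items; the notation is chosen here for illustration.

```latex
\[
P(X_{ij} = 1 \mid \theta_i, b_j) = \frac{\exp(\theta_i - b_j)}{1 + \exp(\theta_i - b_j)},
\qquad
\pi(\theta_i \mid x_{i1}, \dots, x_{in}) \;\propto\;
  \pi(\theta_i)\,\prod_{j=1}^{n} P(X_{ij} = x_{ij} \mid \theta_i, b_j).
\]
```

In the adaptive setting, the items b_j are themselves chosen based on previous responses, which is precisely the non-i.i.d. structure the consistency results must accommodate.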
AI assistants are being created to help software engineers conduct a variety of coding-related tasks, such as writing, documenting, and testing code. We describe the use of the watsonx Code Assistant (WCA), an LLM-powered coding assistant deployed internally within IBM. Through surveys of two user cohorts (N=669) and unmoderated usability testing (N=15), we examined developers' experiences with WCA and its impact on their productivity. We learned about their motivations for using (or not using) WCA, examined their expectations of its speed and quality, and identified new considerations regarding ownership of and responsibility for generated code. Our case study characterizes the impact of an LLM-powered assistant on developers' perceptions of productivity and shows that, although such tools often provide net productivity increases, these benefits may not be experienced by all users.
Live programming features can be found in a range of programming environments, from individual prototypes to widely used environments. While liveness is generally considered a useful property, there is little empirical evidence on when and how liveness can be beneficial. The few experimental studies that exist report largely inconclusive results. We reviewed existing experiments and related studies to gather a collection of potential effects of liveness and moderating factors. Based on this collection, we concluded that **task complexity** and **prior experience with liveness** are potentially essential factors neglected in previous experiments. To fill this gap, we devised and conducted a controlled experiment (N = 37) testing the hypothesis that task complexity moderates the effects of live introspection tools on participants' debugging efficiency, given participants with prior experience with liveness. Our results do not support the hypothesis that task complexity moderates the effect of live introspection tools. This non-significant moderation effect might result from the low number of participants, as the data nonetheless suggests a moderation effect. The results also show that in our experimental setting, live introspection tools significantly reduced the time participants took to debug the tasks. For researchers interested in the effects of liveness, our findings suggest that studies on liveness should make conscious choices about task complexity and participants' prior experience with liveness. For designers of programming environments, the results of our experiment are a step toward understanding when and how programming tools should be live to become more helpful to programmers.
Metastructures are engineered systems composed of periodic arrays of identical components, called resonators, designed to achieve specific dynamic effects, such as creating a band gap, a frequency range in which waves cannot propagate through the structure. When equipped with patches of piezoelectric material, these metastructures exhibit an additional capability: they can harvest energy effectively even from frequencies much lower than the fundamental frequency of an individual resonator. This energy harvesting capability is particularly valuable for applications where low-frequency vibrations dominate. To support the design of metastructures for dual purposes, such as energy harvesting and vibration suppression (reducing unwanted oscillations in the structure), we develop a multi-patch isogeometric model of a piezoelectric energy harvester. This model is based on a piezoelectric Kirchhoff-Love plate (a thin, flexible structure with embedded piezoelectric patches) and uses Nitsche's method to enforce compatibility conditions in terms of displacement, rotations, shear force, and bending moments across the boundaries of different patches. The model is validated against experimental and numerical data from the literature. We then present a novel, parameterized metastructure plate design and conduct a parametric study to explore how resonator geometries affect key performance metrics, including the location and width of the band gap and the position of the first peak in the voltage frequency response function. This model can be integrated with optimization algorithms to maximize outcomes such as energy harvesting efficiency or vibration reduction, depending on application needs.
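To indicate the general structure of a Nitsche coupling, the interface terms for a model second-order problem are sketched below, with jump [·], average {·}, mesh size h, and penalty parameter γ; the Kirchhoff-Love plate formulation used in the paper additionally couples rotations, shear forces, and bending moments across patch interfaces, so this should be read only as an illustration of the approach.

```latex
\[
a_\Gamma(u, v) =
  -\int_\Gamma \{\partial_n u\}\,[v]\,\mathrm{d}s
  -\int_\Gamma \{\partial_n v\}\,[u]\,\mathrm{d}s
  +\frac{\gamma}{h}\int_\Gamma [u]\,[v]\,\mathrm{d}s .
\]
```

The first term enforces consistency with the interface flux, the second restores symmetry, and the penalty term weakly enforces continuity of the solution across the patch boundary.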
Large language models (LLMs) have demonstrated remarkable capabilities but often struggle to align with human preferences, leading to harmful or undesirable outputs. Preference learning, which trains models to distinguish between preferred and non-preferred responses based on human feedback, has become a crucial component for ensuring that LLMs align with human values. Despite their widespread adoption in real-world systems, a thorough theoretical understanding of the generalization guarantees for these models remains lacking. This paper bridges that gap by introducing a new theoretical framework to analyze the generalization guarantees of models trained with direct preference optimization (DPO). While existing generalization theory often focuses on overparameterized models achieving near-optimal loss or models independent of the training process, our framework rigorously assesses how well models generalize after a finite number of gradient steps, reflecting real-world LLM training practices. By analyzing the reward margin associated with each sample and its trajectory throughout training, we can effectively bound the generalization error. We derive learning guarantees showing that, under specific conditions, models trained with DPO can correctly discern preferred responses on unseen data with high probability. These insights are empirically validated on contemporary LLMs, underscoring the practical relevance of our theoretical findings.
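For concreteness, the standard DPO objective and the per-sample reward margin tracked during training can be written as follows; the notation follows the common DPO formulation and is used here purely for illustration, with preferred response y_w, dispreferred response y_l, reference policy π_ref, and temperature β.

```latex
\[
\mathcal{L}_{\mathrm{DPO}}(\theta) =
  -\,\mathbb{E}_{(x,\, y_w,\, y_l)}\!\left[\log \sigma\big(m_\theta(x, y_w, y_l)\big)\right],
\qquad
m_\theta(x, y_w, y_l) =
  \beta\left(\log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
           - \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right).
\]
```

A positive margin m_θ means the model ranks the preferred response above the dispreferred one, which is the event whose probability on unseen data the generalization bounds concern.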
Object detection typically assumes that training and test data are drawn from an identical distribution, which, however, does not always hold in practice. Such a distribution mismatch will lead to a significant performance drop. In this work, we aim to improve the cross-domain robustness of object detection. We tackle the domain shift on two levels: 1) the image-level shift, such as image style, illumination, etc., and 2) the instance-level shift, such as object appearance, size, etc. We build our approach on the recent state-of-the-art Faster R-CNN model and design two domain adaptation components, on the image level and the instance level, to reduce the domain discrepancy. The two domain adaptation components are based on H-divergence theory and are implemented by learning a domain classifier in an adversarial training manner. The domain classifiers on different levels are further reinforced with a consistency regularization to learn a domain-invariant region proposal network (RPN) in the Faster R-CNN model. We evaluate our newly proposed approach using multiple datasets, including Cityscapes, KITTI, and SIM10K. The results demonstrate the effectiveness of our proposed approach for robust object detection in various domain shift scenarios.
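A common way to realize such an adversarially trained domain classifier is a gradient reversal layer placed on top of backbone features; the PyTorch sketch below is a generic illustration with placeholder channel sizes, not the exact components used in the paper.

```python
import torch
import torch.nn as nn

class GradientReversal(torch.autograd.Function):
    """Identity on the forward pass; flips (and scales) gradients on the
    backward pass, so the feature extractor learns to fool the domain
    classifier while the classifier learns to separate the domains."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

class ImageLevelDomainClassifier(nn.Module):
    """Illustrative image-level domain classifier over backbone feature maps.
    Channel sizes are placeholders, not the paper's exact configuration."""

    def __init__(self, in_channels: int = 1024, lambd: float = 1.0):
        super().__init__()
        self.lambd = lambd
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 256, kernel_size=1), nn.ReLU(inplace=True),
            nn.Conv2d(256, 1, kernel_size=1),  # per-location source/target logit
        )

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        reversed_feats = GradientReversal.apply(features, self.lambd)
        return self.net(reversed_feats)

# Training sketch: a binary cross-entropy domain loss on these logits
# (label 0 = source, 1 = target) is added to the detection losses.
```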