Large Language Models (LLMs) have demonstrated considerable advances, and several claims have been made that they exceed human performance. However, real-world tasks often require domain knowledge. Low-resource learning methods such as Active Learning (AL) have been proposed to reduce the cost of domain-expert annotation, raising the question: can LLMs surpass compact models trained with expert annotations on domain-specific tasks? In this work, we conduct an empirical experiment on four datasets from three different domains, comparing SOTA LLMs with small models trained on expert annotations via AL. We find that small models can outperform GPT-3.5 with a few hundred labeled examples, and that they achieve performance comparable to or better than GPT-4 despite being hundreds of times smaller. Based on these findings, we posit that LLM predictions can serve as a warm-up method in real-world applications, and that human experts remain indispensable in annotation tasks driven by domain-specific knowledge.
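To make the setup concrete, here is a minimal sketch of a pool-based active-learning loop with uncertainty sampling, in the general spirit of the comparison above (not the paper's exact protocol): the "expert" is simulated by held-back gold labels, and the classifier, seed size, and query budget are illustrative assumptions.

```python
# Pool-based active learning with uncertainty (entropy) sampling: a sketch.
# The "expert" is simulated by held-back gold labels; in the paper this
# step is a human domain expert. Classifier and budgets are illustrative.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=2000, n_features=50, random_state=0)
labeled = list(range(20))                          # small seed set
pool = [i for i in range(len(X)) if i not in labeled]
model = LogisticRegression(max_iter=1000)

for _ in range(10):                                # 10 rounds x 20 queries
    model.fit(X[labeled], y[labeled])
    proba = model.predict_proba(X[pool])
    entropy = -(proba * np.log(proba + 1e-12)).sum(axis=1)
    for q in sorted(np.argsort(entropy)[-20:], reverse=True):
        labeled.append(pool.pop(q))                # "expert" labels the query
print(f"{len(labeled)} labels, pool acc = {model.score(X[pool], y[pool]):.3f}")
```

After a few hundred queries the curve of pool accuracy versus label count is the quantity the paper compares against LLM zero-/few-shot performance.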
Autonomous Micro Aerial Vehicles (MAVs) such as quadrotors equipped with manipulation mechanisms have the potential to assist humans in tasks such as construction and package delivery. Cables are a promising option for manipulation mechanisms due to their low weight, low cost, and simple design. However, designing control and planning strategies for cable mechanisms presents challenges due to indirect load actuation, nonlinear configuration space, and highly coupled system dynamics. In this paper, we propose a novel Nonlinear Model Predictive Control (NMPC) method that enables a team of quadrotors to manipulate a rigid-body payload in all six degrees of freedom via suspended cables. Our approach can concurrently exploit, as part of the receding-horizon optimization, the available mechanical system redundancies to perform additional tasks such as inter-robot separation and obstacle avoidance while respecting payload dynamics and actuator constraints. To address real-time computational requirements and scalability, we employ a lightweight state-vector parametrization that includes only the payload states in all six degrees of freedom. This parametrization also enables trajectory planning directly on the $SE(3)$ load-configuration manifold, further reducing planning complexity. We validate the proposed approach through simulation and real-world experiments.
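For concreteness, a generic receding-horizon formulation consistent with this description reads (notation illustrative, not the authors' exact cost):

$$\min_{u_0,\dots,u_{N-1}} \; \sum_{k=0}^{N-1} \Big( \lVert x_k \ominus x_k^{\mathrm{ref}} \rVert_Q^2 + \lVert u_k \rVert_R^2 \Big) + \lVert x_N \ominus x_N^{\mathrm{ref}} \rVert_P^2 \quad \text{s.t.} \quad x_{k+1} = f(x_k, u_k), \;\; u_{\min} \le u_k \le u_{\max}, \;\; c_{\mathrm{obs}}(x_k) \ge 0, \;\; c_{\mathrm{sep}}(x_k, u_k) \ge 0,$$

where $x_k$ collects the payload pose on $SE(3)$ and its twist, $u_k$ the cable tension commands, $\ominus$ a local error on the manifold, and $c_{\mathrm{obs}}$, $c_{\mathrm{sep}}$ encode obstacle avoidance and inter-robot separation. At each control step only the first input $u_0$ is applied before the horizon is re-solved.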
Large Language Models (LLMs) have shown extraordinary capabilities in understanding and generating text that closely mirrors human communication. However, a primary limitation lies in the significant computational demands during training, arising from their extensive parameterization. This challenge is further intensified by the dynamic nature of the world, which necessitates frequent updates to LLMs to correct outdated information or integrate new knowledge, thereby ensuring their continued relevance. Moreover, many applications demand continual model adjustments post-training to address deficiencies or undesirable behaviors, and there is increasing interest in efficient, lightweight methods for on-the-fly model modification. To this end, recent years have seen a surge of techniques for knowledge editing of LLMs, which aim to efficiently modify LLMs' behaviors within specific domains while preserving overall performance across various inputs. In this paper, we first define the knowledge editing problem and then provide a comprehensive review of cutting-edge approaches. Drawing inspiration from educational and cognitive research theories, we propose a unified categorization criterion that classifies knowledge editing methods into three groups: resorting to external knowledge, merging knowledge into the model, and editing intrinsic knowledge. Furthermore, we introduce a new benchmark, KnowEdit, for a comprehensive empirical evaluation of representative knowledge editing approaches. Additionally, we provide an in-depth analysis of knowledge location, which can give a deeper understanding of the knowledge structures inherent within LLMs. Finally, we discuss several potential applications of knowledge editing, outlining its broad and impactful implications.
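To make the first category tangible, here is a toy sketch of editing by "resorting to external knowledge": edited facts live in a memory consulted before the frozen base model. Everything here is a hypothetical stand-in; real systems of this kind use learned scope classifiers and counterfactual models rather than exact string matching.

```python
# Toy "external knowledge" editor: edits are stored outside the frozen
# model and intercepted at query time, so the base model stays untouched.
edit_memory = {
    "who is the ceo of acme?": "Jane Doe",   # hypothetical post-edit fact
}

def base_model(prompt: str) -> str:
    # Stand-in for a frozen LLM; returns its (possibly stale) belief.
    return "John Smith"

def edited_model(prompt: str) -> str:
    # In-scope queries hit the external memory; everything else falls
    # through to the unchanged base model, preserving overall behavior.
    return edit_memory.get(prompt.strip().lower(), base_model(prompt))

print(edited_model("Who is the CEO of Acme?"))  # -> "Jane Doe"
print(edited_model("Unrelated question?"))      # -> base model's answer
```

The other two categories in the taxonomy instead change the model itself, either by merging new parameters (e.g. adapters) or by directly rewriting weights tied to the target knowledge.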
Large Language Models (LLMs) are regularly being used to label data across many domains and for myriad tasks. By simply asking the LLM for an answer, or ``prompting,'' practitioners are able to use LLMs to quickly get a response for an arbitrary task. This prompting is done through a series of decisions by the practitioner, from simple wording of the prompt, to requesting the output in a certain data format, to jailbreaking in the case of prompts that address more sensitive topics. In this work, we ask: do variations in the way a prompt is constructed change the ultimate decision of the LLM? We answer this using a series of prompt variations across a variety of text classification tasks. We find that even the smallest of perturbations, such as adding a space at the end of a prompt, can cause the LLM to change its answer. Further, we find that requesting responses in XML and commonly used jailbreaks can have cataclysmic effects on the data labeled by LLMs.
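A sketch of the kind of perturbation sweep described, with `query_llm` as a hypothetical stand-in for whichever model API is used; the point is simply to measure how often the predicted label flips across prompt variants.

```python
# Sweep small prompt perturbations and count label flips per example.
# query_llm() is a hypothetical stand-in for an actual model call.
def query_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your model API here")

def variants(text: str) -> list[str]:
    base = f"Classify the sentiment of this review as positive or negative: {text}"
    return [
        base,
        base + " ",                                   # trailing space
        base + "\n",                                  # trailing newline
        base.replace("Classify", "Please classify"),  # politeness rewording
        base + " Respond in XML: <label>...</label>", # output-format request
    ]

def flip_rate(texts: list[str]) -> float:
    flips = 0
    for t in texts:
        labels = {query_llm(v).strip().lower() for v in variants(t)}
        flips += len(labels) > 1  # any disagreement across variants
    return flips / len(texts)
```

A nonzero flip rate on perturbations as trivial as a trailing space is exactly the instability the study documents.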
Autism Spectrum Disorder (ASD) is a complex neurological condition that is challenging to diagnose. Numerous studies demonstrate that children diagnosed with autism struggle to maintain attention and have less focused vision. Eye-tracking technology has drawn special attention in the context of ASD, since gaze anomalies have long been acknowledged as a defining feature of autism. Deep Learning (DL) approaches coupled with eye-tracking sensors offer additional capabilities to advance diagnosis and its applications. By learning intricate nonlinear input-output relations, DL can accurately recognize diverse gaze and eye-tracking patterns and adapt to the data. Convolutions alone, however, are insufficient to capture the important spatial information in gaze or eye-tracking patterns. Involution, a dynamic kernel-based operation, can improve the efficiency of classifying such patterns. In this paper, we utilise two different image-processing operations to examine how each learns eye-tracking patterns. Since these patterns are primarily spatial, we combine involution with convolution into a hybrid, which adds location-specific capability to a deep learning model. Our proposed model is implemented in a simple yet effective manner, making it easier to apply in practice. We investigate why our approach works well for classifying eye-tracking patterns. For comparative analysis, we experiment with two separate datasets as well as a combined version of both. The results show that our involution-convolution (IC) hybrid with three involution layers outperforms previous approaches.
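A minimal PyTorch sketch of one involution layer in the spirit of the hybrid described, following the published involution operation (kernels are generated per spatial location from the input and shared across channel groups); layer sizes are illustrative, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class Involution2d(nn.Module):
    """Involution: location-specific, channel-shared kernels (the inverse
    of convolution's location-shared, channel-specific weights)."""
    def __init__(self, channels, kernel_size=3, groups=1, reduction=4):
        super().__init__()
        self.k, self.groups = kernel_size, groups
        self.reduce = nn.Conv2d(channels, channels // reduction, 1)
        self.span = nn.Conv2d(channels // reduction, kernel_size**2 * groups, 1)
        self.unfold = nn.Unfold(kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        b, c, h, w = x.shape
        # 1. Generate a K*K kernel per pixel and per group from the input.
        kernels = self.span(self.reduce(x)).view(b, self.groups, self.k**2, h, w)
        # 2. Unfold K*K neighbourhoods of the input.
        patches = self.unfold(x).view(b, self.groups, c // self.groups, self.k**2, h, w)
        # 3. Location-specific weighted sum over each neighbourhood.
        out = (kernels.unsqueeze(2) * patches).sum(dim=3)
        return out.view(b, c, h, w)

# Hybrid: involution adds location-specific context, convolution follows.
hybrid = nn.Sequential(Involution2d(16), nn.ReLU(), nn.Conv2d(16, 32, 3, padding=1))
print(hybrid(torch.randn(1, 16, 64, 64)).shape)  # torch.Size([1, 32, 64, 64])
```

Because the kernel at each pixel is computed from that pixel's own features, the layer can respond to where a gaze pattern occurs, which is what convolution's spatially shared weights cannot do on their own.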
Technological progress has persistently shaped the dynamics of human-machine interactions in task execution. In response to the advancements in Generative AI, this paper outlines a detailed study plan that investigates various human-AI interaction modalities across a range of tasks, characterized by differing levels of creativity and complexity. This exploration aims to inform and contribute to the development of Graphical User Interfaces (GUIs) that effectively integrate with and enhance the capabilities of Generative AI systems. The study comprises three parts: exploring fixed-scope tasks through news headline generation, delving into atomic creative tasks with analogy generation, and investigating complex tasks via data visualization. Future work aims to extend this exploration to linearize complex data analysis results into narratives understandable to a broader audience, thereby enhancing the interpretability of AI-generated content.
For generative AI to succeed, how engaging a conversationalist must it be? For almost sixty years, some conversational agents have responded to any question or comment to keep a conversation going. In recent years, several utilized machine learning or sophisticated language processing, such as Tay, Xiaoice, Zo, Hugging Face, Kuki, and Replika. Unlike generative AI, they focused on engagement, not expertise. Millions of people were motivated to engage with them. What were the attractions? Will generative AI do better if it is equally engaging, or should it be less engaging? Prior to the emergence of generative AI, we conducted a large-scale quantitative and qualitative analysis to learn what motivated millions of people to engage with one such 'virtual companion,' Microsoft's Zo. We examined the complete chat logs of 2000 anonymized people. We identified over a dozen motivations that people had for interacting with this software. Designers learned different ways to increase engagement. Generative conversational AI does not yet have a clear revenue model to address its high cost. It might benefit from being more engaging, even as it supports productivity and creativity. Our study and analysis point to opportunities and challenges.
Graph Neural Networks (GNNs) have been studied through the lenses of expressive power and generalization. However, their optimization properties are less well understood. We take the first step towards analyzing GNN training by studying the gradient dynamics of GNNs. First, we analyze linearized GNNs and prove that, despite the non-convexity of training, convergence to a global minimum at a linear rate is guaranteed under mild assumptions that we validate on real-world graphs. Second, we study what may affect GNNs' training speed. Our results show that GNN training is implicitly accelerated by skip connections, greater depth, and/or a good label distribution. Empirical results confirm that our theoretical findings for linearized GNNs align with the training behavior of nonlinear GNNs. Our results provide the first theoretical support for the success of GNNs with skip connections in terms of optimization, and suggest that deep GNNs with skip connections would be promising in practice.
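A standard way to make the linearized-model claim concrete (a generic NTK-style argument, not the paper's exact theorem): for a model linearized at initialization, $f_{\mathrm{lin}}(\theta) = f(\theta_0) + J(\theta_0)(\theta - \theta_0)$ with Jacobian $J$, gradient flow on the squared loss $\mathcal{L}(\theta) = \tfrac{1}{2}\lVert f_{\mathrm{lin}}(\theta) - y \rVert^2$ gives

$$\frac{d}{dt}\big(f_{\mathrm{lin}}(\theta_t) - y\big) = -K\,\big(f_{\mathrm{lin}}(\theta_t) - y\big), \qquad K = J(\theta_0)\,J(\theta_0)^{\top},$$

hence $\mathcal{L}(\theta_t) \le e^{-2\lambda_{\min}(K)\,t}\,\mathcal{L}(\theta_0)$, i.e. linear-rate convergence to a global minimum whenever the Gram matrix $K$ is positive definite. The mild assumptions validated on real-world graphs play the role of guaranteeing such positive definiteness for the GNN's Gram matrix.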
Graph Neural Networks (GNNs) have recently become increasingly popular due to their ability to learn complex systems of relations or interactions arising in a broad spectrum of problems ranging from biology and particle physics to social networks and recommendation systems. Despite the plethora of different models for deep learning on graphs, few approaches have been proposed thus far for dealing with graphs that present some sort of dynamic nature (e.g. evolving features or connectivity over time). In this paper, we present Temporal Graph Networks (TGNs), a generic, efficient framework for deep learning on dynamic graphs represented as sequences of timed events. Thanks to a novel combination of memory modules and graph-based operators, TGNs significantly outperform previous approaches while being more computationally efficient. We furthermore show that several previous models for learning on dynamic graphs can be cast as specific instances of our framework. We perform a detailed ablation study of the different components of our framework and devise the best configuration, which achieves state-of-the-art performance on several transductive and inductive prediction tasks for dynamic graphs.
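A minimal PyTorch sketch of the memory module's message-and-update loop for a batch of timed interaction events (simplified: only source memories are updated here, and the full model adds a graph-attention embedding module on top of the memory).

```python
import torch
import torch.nn as nn

class TGNMemory(nn.Module):
    """Per-node memory updated from timed events: a simplified sketch of
    the message -> GRU-update pipeline at the core of TGN-style models."""
    def __init__(self, num_nodes, mem_dim, edge_dim):
        super().__init__()
        self.register_buffer("memory", torch.zeros(num_nodes, mem_dim))
        self.register_buffer("last_update", torch.zeros(num_nodes))
        msg_dim = 2 * mem_dim + edge_dim + 1   # src mem, dst mem, edge, dt
        self.updater = nn.GRUCell(msg_dim, mem_dim)

    def forward(self, src, dst, t, edge_feat):
        dt = (t - self.last_update[src]).unsqueeze(-1)  # time since last event
        msg = torch.cat([self.memory[src], self.memory[dst], edge_feat, dt], dim=-1)
        new_mem = self.updater(msg, self.memory[src])
        self.memory[src] = new_mem.detach()    # persist state across batches
        self.last_update[src] = t
        return new_mem                         # temporal node representations

mem = TGNMemory(num_nodes=100, mem_dim=32, edge_dim=8)
src, dst = torch.tensor([3, 7]), torch.tensor([1, 4])
emb = mem(src, dst, t=torch.tensor([5.0, 6.0]), edge_feat=torch.randn(2, 8))
```

The memory lets a node's representation reflect its entire event history at constant per-event cost, which is where the efficiency over snapshot-based dynamic-graph models comes from.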
Many natural language processing tasks rely solely on sparse dependencies between a few tokens in a sentence. Soft attention mechanisms show promising performance in modeling local/global dependencies via soft probabilities between every pair of tokens, but they are neither effective nor efficient when applied to long sentences. By contrast, hard attention mechanisms directly select a subset of tokens but are difficult and inefficient to train due to their combinatorial nature. In this paper, we integrate both soft and hard attention into one context fusion model, "reinforced self-attention (ReSA)", to their mutual benefit. In ReSA, a hard attention mechanism trims a sequence for a soft self-attention mechanism to process, while the soft attention feeds reward signals back to facilitate the training of the hard one. For this purpose, we develop a novel hard attention mechanism called "reinforced sequence sampling (RSS)", which selects tokens in parallel and is trained via policy gradient. Using two RSS modules, ReSA efficiently extracts the sparse dependencies between each pair of selected tokens. We finally propose an RNN/CNN-free sentence-encoding model, "reinforced self-attention network (ReSAN)", based solely on ReSA. It achieves state-of-the-art performance on both the Stanford Natural Language Inference (SNLI) and Sentences Involving Compositional Knowledge (SICK) datasets.
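A toy sketch of the RSS idea: sample a binary keep/drop decision per token in parallel, restrict soft self-attention to the kept tokens, and train the sampler with REINFORCE. Shapes and the reward are illustrative stand-ins, not the paper's full ReSAN.

```python
import torch
import torch.nn as nn

class RSS(nn.Module):
    """Reinforced sequence sampling (sketch): parallel per-token Bernoulli
    selection, trained with the REINFORCE policy gradient."""
    def __init__(self, dim):
        super().__init__()
        self.scorer = nn.Linear(dim, 1)

    def forward(self, x):                          # x: (batch, seq, dim)
        probs = torch.sigmoid(self.scorer(x)).squeeze(-1)
        mask = torch.bernoulli(probs)              # hard, parallel selection
        log_prob = (mask * probs.clamp_min(1e-8).log()
                    + (1 - mask) * (1 - probs).clamp_min(1e-8).log()).sum(-1)
        return mask, log_prob

def soft_attention(x, mask):
    # Soft self-attention restricted to the RSS-selected tokens (keys).
    scores = x @ x.transpose(-2, -1) / x.shape[-1] ** 0.5
    scores = scores.masked_fill(mask.unsqueeze(1) == 0, float("-inf"))
    return torch.softmax(scores, dim=-1).nan_to_num() @ x

rss = RSS(dim=16)
x = torch.randn(2, 10, 16)
mask, log_prob = rss(x)
out = soft_attention(x, mask)
reward = torch.randn(2)                            # stand-in for task reward
loss = -(reward * log_prob).mean()                 # REINFORCE update for RSS
loss.backward()
```

The Bernoulli sampling itself is non-differentiable, which is exactly why the selector needs the policy-gradient signal that the downstream soft attention (via task reward) feeds back.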
Image segmentation is still an open problem, especially when the intensities of the objects of interest overlap due to intensity inhomogeneity (also known as a bias field). To segment images with intensity inhomogeneities, we propose a bias-correction-embedded level set model in which Inhomogeneities are Estimated by Orthogonal Primary Functions (IEOPF). In the proposed model, the smoothly varying bias is estimated by a linear combination of a given set of orthogonal primary functions. An inhomogeneous intensity clustering energy is then defined, and membership functions of the clusters described by the level set function are introduced to rewrite the energy as the data term of the proposed model. As in popular level set methods, a regularization term and an arc-length term are also included to regularize and smooth the level set function, respectively. The proposed model is then extended to multichannel and multiphase patterns to segment colour images and images with multiple objects, respectively. It has been extensively tested on both synthetic and real images that are widely used in the literature, as well as on the public BrainWeb and IBSR datasets. Experimental results and comparisons with state-of-the-art methods demonstrate the advantages of the proposed model in terms of bias correction and segmentation accuracy.
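A generic form of the quantities described (notation mine, consistent with standard bias-corrected level set models rather than copied from the paper): the bias field is expanded in the orthogonal basis,

$$b(\mathbf{x}) = \sum_{j=1}^{m} w_j\, g_j(\mathbf{x}) = \mathbf{w}^{\top} G(\mathbf{x}),$$

with $\{g_j\}$ the orthogonal primary functions, and enters a clustering data term of the form

$$E(\phi, \mathbf{w}, \mathbf{c}) = \sum_{i=1}^{n} \int_{\Omega} \lvert I(\mathbf{x}) - b(\mathbf{x})\, c_i \rvert^{2}\, u_i(\phi(\mathbf{x}))\, d\mathbf{x},$$

where $u_i$ are the membership functions of the clusters defined via the level set function $\phi$ and $c_i$ the cluster centroids; the regularization and arc-length terms are added to this data term as in standard level set methods, and minimization alternates over $\phi$, $\mathbf{w}$, and $\mathbf{c}$.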