This paper introduces the Definite Finite Automaton augmented large language model (DFA-LLM), a novel framework designed to enhance the capabilities of conversational agents using large language models (LLMs). Traditional LLMs face challenges in generating regulated and compliant responses in special scenarios with predetermined response guidelines, like emotional support and customer service. Our framework addresses these challenges by embedding a Definite Finite Automaton (DFA), learned from training dialogues, within the LLM. This structured approach enables the LLM to adhere to a deterministic response pathway, guided by the DFA. The advantages of DFA-LLM include an interpretable structure through human-readable DFA, context-aware retrieval for responses in conversations, and plug-and-play compatibility with existing LLMs. Extensive benchmarks validate DFA-LLM's effectiveness, indicating its potential as a valuable contribution to the conversational agent.
This paper introduces a novel framework for virtual try-on, termed Wear-Any-Way. Different from previous methods, Wear-Any-Way is a customizable solution. Besides generating high-fidelity results, our method supports users to precisely manipulate the wearing style. To achieve this goal, we first construct a strong pipeline for standard virtual try-on, supporting single/multiple garment try-on and model-to-model settings in complicated scenarios. To make it manipulable, we propose sparse correspondence alignment which involves point-based control to guide the generation for specific locations. With this design, Wear-Any-Way gets state-of-the-art performance for the standard setting and provides a novel interaction form for customizing the wearing style. For instance, it supports users to drag the sleeve to make it rolled up, drag the coat to make it open, and utilize clicks to control the style of tuck, etc. Wear-Any-Way enables more liberated and flexible expressions of the attires, holding profound implications in the fashion industry.
We introduce Neural Parameter Regression (NPR), a novel framework specifically developed for learning solution operators in Partial Differential Equations (PDEs). Tailored for operator learning, this approach surpasses traditional DeepONets (Lu et al., 2021) by employing Physics-Informed Neural Network (PINN, Raissi et al., 2019) techniques to regress Neural Network (NN) parameters. By parametrizing each solution based on specific initial conditions, it effectively approximates a mapping between function spaces. Our method enhances parameter efficiency by incorporating low-rank matrices, thereby boosting computational efficiency and scalability. The framework shows remarkable adaptability to new initial and boundary conditions, allowing for rapid fine-tuning and inference, even in cases of out-of-distribution examples.
This paper presents a novel split learning (SL) framework, referred to as SplitMAC, which reduces the latency of SL by leveraging simultaneous uplink transmission over multiple access channels. The key strategy is to divide devices into multiple groups and allow the devices within the same group to simultaneously transmit their smashed data and device-side models over the multiple access channels. The optimization problem of device grouping to minimize SL latency is formulated, and the benefit of device grouping in reducing the uplink latency of SL is theoretically derived. By examining a two-device grouping case, two asymptotically-optimal algorithms are devised for device grouping in low and high signal-to-noise ratio (SNR) scenarios, respectively, while providing proofs of their optimality. By merging these algorithms, a near-optimal device grouping algorithm is proposed to cover a wide range of SNR. Our SL framework is also extended to consider practical fading channels and to support a general group size. Simulation results demonstrate that our SL framework with the proposed device grouping algorithm is superior to existing SL frameworks in reducing SL latency.
This paper introduces SAMAug, a novel visual point augmentation method for the Segment Anything Model (SAM) that enhances interactive image segmentation performance. SAMAug generates augmented point prompts to provide more information about the user's intention to SAM. Starting with an initial point prompt, SAM produces an initial mask, which is then fed into our proposed SAMAug to generate augmented point prompts. By incorporating these extra points, SAM can generate augmented segmentation masks based on both the augmented point prompts and the initial prompt, resulting in improved segmentation performance. We conducted evaluations using four different point augmentation strategies: random sampling, sampling based on maximum difference entropy, maximum distance, and saliency. Experiment results on the COCO, Fundus, COVID QUEx, and ISIC2018 datasets show that SAMAug can boost SAM's segmentation results, especially using the maximum distance and saliency. SAMAug demonstrates the potential of visual prompt augmentation for computer vision. Codes of SAMAug are available at github.com/yhydhx/SAMAug
This paper introduces a groundbreaking digital neuromorphic architecture that innovatively integrates Brain Code Unit (BCU) and Fundamental Code Unit (FCU) using mixedsignal design methodologies. Leveraging open-source datasets and the latest advances in materials science, our research focuses on enhancing the computational efficiency, accuracy, and adaptability of neuromorphic systems. The core of our approach lies in harmonizing the precision and scalability of digital systems with the robustness and energy efficiency of analog processing. Through experimentation, we demonstrate the effectiveness of our system across various metrics. The BCU achieved an accuracy of 88.0% and a power efficiency of 20.0 GOP/s/W, while the FCU recorded an accuracy of 86.5% and a power efficiency of 18.5 GOP/s/W. Our mixed-signal design approach significantly improved latency and throughput, achieving a latency as low as 0.75 ms and throughput up to 213 TOP/s. These results firmly establish the potential of our architecture in neuromorphic computing, providing a solid foundation for future developments in this domain. Our study underscores the feasibility of mixedsignal neuromorphic systems and their promise in advancing the field, particularly in applications requiring high efficiency and adaptability
This paper introduces LeTO, a method for learning constrained visuomotor policy via differentiable trajectory optimization. Our approach uniquely integrates a differentiable optimization layer into the neural network. By formulating the optimization layer as a trajectory optimization problem, we enable the model to end-to-end generate actions in a safe and controlled fashion without extra modules. Our method allows for the introduction of constraints information during the training process, thereby balancing the training objectives of satisfying constraints, smoothing the trajectories, and minimizing errors with demonstrations. This "gray box" method marries the optimization-based safety and interpretability with the powerful representational abilities of neural networks. We quantitatively evaluate LeTO in simulation and on the real robot. In simulation, LeTO achieves a success rate comparable to state-of-the-art imitation learning methods, but the generated trajectories are of less uncertainty, higher quality, and smoother. In real-world experiments, we deployed LeTO to handle constraints-critical tasks. The results show the effectiveness of LeTO comparing with state-of-the-art imitation learning approaches. We release our code at //github.com/ZhengtongXu/LeTO.
This paper presents a novel system designed for 3D mapping and visual relocalization using 3D Gaussian Splatting. Our proposed method uses LiDAR and camera data to create accurate and visually plausible representations of the environment. By leveraging LiDAR data to initiate the training of the 3D Gaussian Splatting map, our system constructs maps that are both detailed and geometrically accurate. To mitigate excessive GPU memory usage and facilitate rapid spatial queries, we employ a combination of a 2D voxel map and a KD-tree. This preparation makes our method well-suited for visual localization tasks, enabling efficient identification of correspondences between the query image and the rendered image from the Gaussian Splatting map via normalized cross-correlation (NCC). Additionally, we refine the camera pose of the query image using feature-based matching and the Perspective-n-Point (PnP) technique. The effectiveness, adaptability, and precision of our system are demonstrated through extensive evaluation on the KITTI360 dataset.
This paper introduces a novel adversarial attack method targeting text classification models, termed the Modified Word Saliency-based Adversarial At-tack (MWSAA). The technique builds upon the concept of word saliency to strategically perturb input texts, aiming to mislead classification models while preserving semantic coherence. By refining the traditional adversarial attack approach, MWSAA significantly enhances its efficacy in evading detection by classification systems. The methodology involves first identifying salient words in the input text through a saliency estimation process, which prioritizes words most influential to the model's decision-making process. Subsequently, these salient words are subjected to carefully crafted modifications, guided by semantic similarity metrics to ensure that the altered text remains coherent and retains its original meaning. Empirical evaluations conducted on diverse text classification datasets demonstrate the effectiveness of the proposed method in generating adversarial examples capable of successfully deceiving state-of-the-art classification models. Comparative analyses with existing adversarial attack techniques further indicate the superiority of the proposed approach in terms of both attack success rate and preservation of text coherence.
This paper introduces two novel approaches for Online Multi-Task Learning (MTL) Regression Problems. We employ a high performance graph-based MTL formulation and develop two alternative recursive versions based on the Weighted Recursive Least Squares (WRLS) and the Online Sparse Least Squares Support Vector Regression (OSLSSVR) strategies. Adopting task-stacking transformations, we demonstrate the existence of a single matrix incorporating the relationship of multiple tasks and providing structural information to be embodied by the MT-WRLS method in its initialization procedure and by the MT-OSLSSVR in its multi-task kernel function. Contrasting the existing literature, which is mostly based on Online Gradient Descent (OGD) or cubic inexact approaches, we achieve exact and approximate recursions with quadratic per-instance cost on the dimension of the input space (MT-WRLS) or on the size of the dictionary of instances (MT-OSLSSVR). We compare our online MTL methods to other contenders in a real-world wind speed forecasting case study, evidencing the significant gain in performance of both proposed approaches.
Since the launch of ChatGPT, a powerful AI Chatbot developed by OpenAI, large language models (LLMs) have made significant advancements in both academia and industry, bringing about a fundamental engineering paradigm shift in many areas. While LLMs are powerful, it is also crucial to best use their power where "prompt'' plays a core role. However, the booming LLMs themselves, including excellent APIs like ChatGPT, have several inherent limitations: 1) temporal lag of training data, and 2) the lack of physical capabilities to perform external actions. Recently, we have observed the trend of utilizing prompt-based tools to better utilize the power of LLMs for downstream tasks, but a lack of systematic literature and standardized terminology, partly due to the rapid evolution of this field. Therefore, in this work, we survey related prompting tools and promote the concept of the "Prompting Framework" (PF), i.e. the framework for managing, simplifying, and facilitating interaction with large language models. We define the lifecycle of the PF as a hierarchical structure, from bottom to top, namely: Data Level, Base Level, Execute Level, and Service Level. We also systematically depict the overall landscape of the emerging PF field and discuss potential future research and challenges. To continuously track the developments in this area, we maintain a repository at //github.com/lxx0628/Prompting-Framework-Survey, which can be a useful resource sharing platform for both academic and industry in this field.