Neural operators, which can act as implicit solution operators of hidden governing equations, have recently become popular tools for learning the responses of complex real-world physical systems. Nevertheless, most neural operator applications have thus far been data-driven and neglect the intrinsic preservation of fundamental physical laws in data. In this work, we introduce a novel integral neural operator architecture called the Peridynamic Neural Operator (PNO) that learns a nonlocal constitutive law from data. This neural operator provides a forward model in the form of state-based peridynamics, with objectivity and momentum balance laws automatically guaranteed. As applications, we demonstrate the expressivity and efficacy of our model in learning complex material behaviors from both synthetic and experimental data sets. We show that, owing to its ability to capture complex responses, our learned neural operator achieves improved accuracy and efficiency compared to baseline models that use predefined constitutive laws. Moreover, by preserving the essential physical laws within the neural network architecture, the PNO is robust in treating noisy data. The method shows generalizability to different domain configurations, external loadings, and discretizations.
Stacked intelligent metasurfaces (SIM) is a revolutionary technology, which can outperform its single-layer counterparts by performing advanced signal processing relying on wave propagation. In this work, we exploit SIM to enable transmit precoding and receiver combining in holographic multiple-input multiple-output (HMIMO) communications, and we study the achievable rate by formulating a joint optimization problem of the SIM phase shifts at both sides of the transceiver and the covariance matrix of the transmitted signal. Notably, we propose its solution by means of an iterative optimization algorithm that relies on the projected gradient method, and accounts for all optimization parameters simultaneously. We also obtain the step size guaranteeing the convergence of the proposed algorithm. Simulation results provide fundamental insights such the performance improvements compared to the single-RIS counterpart and conventional MIMO system. Remarkably, the proposed algorithm results in the same achievable rate as the alternating optimization (AO) benchmark but with a less number of iterations.
Recent advances in instruction-tuned Large Vision-Language Models (LVLMs) have imbued the models with the ability to generate high-level, image-grounded explanations with ease. While such capability is largely attributed to the rich world knowledge contained within the Large Language Models (LLMs), our work reveals their shortcomings in fine-grained visual categorization (FGVC) across six different benchmark settings. Most recent state-of-the-art LVLMs like LLaVa-1.5, InstructBLIP and GPT-4V not only severely deteriorate in terms of classification performance, e.g., average drop of 65.58 in EM for Stanford Dogs for LLaVA-1.5, but also struggle to generate an accurate explanation with detailed attributes based on the concept that appears within an input image despite their capability to generate holistic image-level descriptions. In-depth analyses show that instruction-tuned LVLMs exhibit modality gap, showing discrepancy when given textual and visual inputs that correspond to the same concept, preventing the image modality from leveraging the rich parametric knowledge within the LLMs. In an effort to further the community's endeavor in this direction, we propose a multiple granularity attribute-centric evaluation benchmark, Finer, which aims to establish a ground to evaluate LVLMs' fine-grained visual comprehension ability and provide significantly improved explainability.
Numerical solution of discrete PDEs corresponding to saddle point problems is highly relevant to physical systems such as Stokes flow. However, scaling up numerical solvers for such systems is often met with challenges in efficiency and convergence. Multigrid is an approach with excellent applicability to elliptic problems such as the Stokes equations, and can be a solution to such challenges of scalability and efficiency. The degree of success of such methods, however, is highly contingent on the design of key components of a multigrid scheme, including the hierarchy of discretizations, and the relaxation scheme used. Additionally, in many practical cases, it may be more effective to use a multigrid scheme as a preconditioner to an iterative Krylov subspace solver, as opposed to striving for maximum efficacy of the relaxation scheme in all foreseeable settings. In this paper, we propose an efficient symmetric multigrid preconditioner for the Stokes Equations on a staggered finite-difference discretization. Our contribution is focused on crafting a preconditioner that (a) is symmetric indefinite, matching the property of the Stokes system itself, (b) is appropriate for preconditioning the SQMR iterative scheme, and (c) has the requisite symmetry properties to be used in this context. In addition, our design is efficient in terms of computational cost and facilitates scaling to large domains.
Vehicular networks use Decentralized Congestion Control (DCC) mechanisms to operate effectively, but this mechanism may introduce queuing delays. Freshness of Cooperative Awareness Messages (CAMs) is critical for their usefulness. In this letter we explore how the presence of other types of traffic additional to CAMs, even with lower priorities, has an impact on the freshness of CAM messages due to DCC queuing. Finally, we propose Generate-on-Time (GoT), which is a simple mechanism that reduces DCC queuing delays for CAM messages without introducing any downside in other performance metrics.
This study explores the realm of knowledge-base question answering (KBQA). KBQA is considered a challenging task, particularly in parsing intricate questions into executable logical forms. Traditional semantic parsing (SP)-based methods require extensive data annotations, which result in significant costs. Recently, the advent of few-shot in-context learning, powered by large language models (LLMs), has showcased promising capabilities. Yet, fully leveraging LLMs to parse questions into logical forms in low-resource scenarios poses a substantial challenge. To tackle these hurdles, we introduce Interactive-KBQA, a framework designed to generate logical forms through direct interaction with knowledge bases (KBs). Within this framework, we have developed three generic APIs for KB interaction. For each category of complex question, we devised exemplars to guide LLMs through the reasoning processes. Our method achieves competitive results on the WebQuestionsSP, ComplexWebQuestions, KQA Pro, and MetaQA datasets with a minimal number of examples (shots). Importantly, our approach supports manual intervention, allowing for the iterative refinement of LLM outputs. By annotating a dataset with step-wise reasoning processes, we showcase our model's adaptability and highlight its potential for contributing significant enhancements to the field.
Chain-of-thought (CoT) has emerged as a groundbreaking tool in NLP, notably for its efficacy in complex reasoning tasks, such as mathematical proofs. However, its application in code generation faces a distinct challenge, i.e., although the code generated with CoT reasoning is logically correct, it faces the problem of syntax error (e.g., invalid syntax error report) during code execution, which causes the CoT result's pass@1 in HumanEval even lower than the zero-shot result. In this paper, we present Code Chain-of-Thought (CodeCoT) that integrates CoT with a self-examination process for code generation. CodeCoT begins with the LLMs using CoT for initial code development to ensure the generated code follows the correct logic flow. Then, CodeCoT will generate test cases to validate whether the code has syntax errors during the execution. CodeCoT then employs a self-examination phase, in which the generated code is executed against these test cases in the local environment. If the local environment raises error information (e.g., invalid syntax error), CodeCoT will iteratively refine the code based on the feedback information. Within this loop, CodeCoT can make sure their generated codes not only follow the logic flow of the code description, but the syntax error will also be addressed with the self-examination process. Our evaluation results reveal that CodeCoT improves the effectiveness of code generation. For example, CodeCoT increases pass@1 from 75.6% to 79.3% for the HumanEval dataset.
Fine-tuning on generalized tasks such as instruction following, code generation, and mathematics has been shown to enhance language models' performance on a range of tasks. Nevertheless, explanations of how such fine-tuning influences the internal computations in these models remain elusive. We study how fine-tuning affects the internal mechanisms implemented in language models. As a case study, we explore the property of entity tracking, a crucial facet of language comprehension, where models fine-tuned on mathematics have substantial performance gains. We identify the mechanism that enables entity tracking and show that (i) in both the original model and its fine-tuned versions primarily the same circuit implements entity tracking. In fact, the entity tracking circuit of the original model on the fine-tuned versions performs better than the full original model. (ii) The circuits of all the models implement roughly the same functionality: Entity tracking is performed by tracking the position of the correct entity in both the original model and its fine-tuned versions. (iii) Performance boost in the fine-tuned models is primarily attributed to its improved ability to handle the augmented positional information. To uncover these findings, we employ: Patch Patching, DCM, which automatically detects model components responsible for specific semantics, and CMAP, a new approach for patching activations across models to reveal improved mechanisms. Our findings suggest that fine-tuning enhances, rather than fundamentally alters, the mechanistic operation of the model.
Generative commonsense reasoning which aims to empower machines to generate sentences with the capacity of reasoning over a set of concepts is a critical bottleneck for text generation. Even the state-of-the-art pre-trained language generation models struggle at this task and often produce implausible and anomalous sentences. One reason is that they rarely consider incorporating the knowledge graph which can provide rich relational information among the commonsense concepts. To promote the ability of commonsense reasoning for text generation, we propose a novel knowledge graph augmented pre-trained language generation model KG-BART, which encompasses the complex relations of concepts through the knowledge graph and produces more logical and natural sentences as output. Moreover, KG-BART can leverage the graph attention to aggregate the rich concept semantics that enhances the model generalization on unseen concept sets. Experiments on benchmark CommonGen dataset verify the effectiveness of our proposed approach by comparing with several strong pre-trained language generation models, particularly KG-BART outperforms BART by 5.80, 4.60, in terms of BLEU-3, 4. Moreover, we also show that the generated context by our model can work as background scenarios to benefit downstream commonsense QA tasks.
Sentiment analysis is a widely studied NLP task where the goal is to determine opinions, emotions, and evaluations of users towards a product, an entity or a service that they are reviewing. One of the biggest challenges for sentiment analysis is that it is highly language dependent. Word embeddings, sentiment lexicons, and even annotated data are language specific. Further, optimizing models for each language is very time consuming and labor intensive especially for recurrent neural network models. From a resource perspective, it is very challenging to collect data for different languages. In this paper, we look for an answer to the following research question: can a sentiment analysis model trained on a language be reused for sentiment analysis in other languages, Russian, Spanish, Turkish, and Dutch, where the data is more limited? Our goal is to build a single model in the language with the largest dataset available for the task, and reuse it for languages that have limited resources. For this purpose, we train a sentiment analysis model using recurrent neural networks with reviews in English. We then translate reviews in other languages and reuse this model to evaluate the sentiments. Experimental results show that our robust approach of single model trained on English reviews statistically significantly outperforms the baselines in several different languages.
Distant supervision can effectively label data for relation extraction, but suffers from the noise labeling problem. Recent works mainly perform soft bag-level noise reduction strategies to find the relatively better samples in a sentence bag, which is suboptimal compared with making a hard decision of false positive samples in sentence level. In this paper, we introduce an adversarial learning framework, which we named DSGAN, to learn a sentence-level true-positive generator. Inspired by Generative Adversarial Networks, we regard the positive samples generated by the generator as the negative samples to train the discriminator. The optimal generator is obtained until the discrimination ability of the discriminator has the greatest decline. We adopt the generator to filter distant supervision training dataset and redistribute the false positive instances into the negative set, in which way to provide a cleaned dataset for relation classification. The experimental results show that the proposed strategy significantly improves the performance of distant supervision relation extraction comparing to state-of-the-art systems.