In this paper, we develop an effective degrees of freedom (EDoF) performance analysis framework specifically tailored for near-field XL-MIMO systems. We explore five representative XL-MIMO hardware designs: uniform planar array (UPA)-based with point antennas, two-dimensional (2D) continuous aperture (CAP) plane-based, UPA-based with patch antennas, uniform linear array (ULA)-based, and one-dimensional (1D) CAP line segment-based XL-MIMO systems. Our analysis encompasses two near-field channel models: the scalar and dyadic Green's function-based channel models. More importantly, for the scalar Green's function-based channel, we derive closed-form EDoF expressions that characterize the impacts of the physical size of the transceiver, the transmitting distance, and the carrier frequency. In our numerical results, we evaluate and compare the EDoF performance across all examined XL-MIMO designs, confirming the accuracy of our proposed closed-form expressions. Furthermore, we observe that, as the number of antennas increases, the EDoF performance of UPA-based and ULA-based systems approaches that of 2D CAP plane-based and 1D CAP line segment-based systems, respectively. Moreover, we show that the EDoF performance of near-field XL-MIMO systems is predominantly determined by the array aperture size rather than the sheer number of antennas.
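As a rough numerical illustration of the quantity being analyzed, the sketch below evaluates one commonly used EDoF approximation, $(\mathrm{tr}\,\mathbf{R})^2 / \|\mathbf{R}\|_F^2$ with $\mathbf{R} = \mathbf{H}^H \mathbf{H}$, for two parallel ULAs of point antennas under a scalar Green's function channel. The choice of formula, the geometry, the sign convention of the Green's function, and all function names are assumptions made for illustration; none of them is taken from the paper's closed-form results.

```python
import cmath
import math


def scalar_green(r, wavelength):
    """Scalar Green's function exp(-j k r) / (4 pi r); sign convention is an assumption."""
    k = 2 * math.pi / wavelength
    return cmath.exp(-1j * k * r) / (4 * math.pi * r)


def ula_positions(n, length):
    """Coordinates of n point antennas evenly spaced on a centred segment."""
    if n == 1:
        return [0.0]
    step = length / (n - 1)
    return [-length / 2 + i * step for i in range(n)]


def channel_matrix(n_tx, n_rx, length_tx, length_rx, distance, wavelength):
    """Channel between two parallel ULAs separated by `distance` (hypothetical setup)."""
    tx = ula_positions(n_tx, length_tx)
    rx = ula_positions(n_rx, length_rx)
    return [
        [scalar_green(math.hypot(distance, yr - yt), wavelength) for yt in tx]
        for yr in rx
    ]


def edof(h):
    """EDoF approximation (tr R)^2 / ||R||_F^2 with R = H^H H."""
    n_rx, n_tx = len(h), len(h[0])
    # Build the n_tx x n_tx correlation matrix R = H^H H entry by entry.
    r = [
        [sum(h[m][i].conjugate() * h[m][j] for m in range(n_rx)) for j in range(n_tx)]
        for i in range(n_tx)
    ]
    trace = sum(r[i][i].real for i in range(n_tx))
    frob_sq = sum(abs(x) ** 2 for row in r for x in row)
    return trace ** 2 / frob_sq
```

Sweeping `n_tx` and `n_rx` at a fixed aperture length is one way to probe how the EDoF depends on aperture size versus antenna count in this simplified setting.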
In this paper, we provide a systematic approach for assessing and comparing the computational complexity of neural network layers in digital signal processing. We provide and link four software-to-hardware complexity measures, defining how the different complexity metrics relate to the layers' hyper-parameters. This paper explains how to compute these four metrics for feed-forward and recurrent layers, and specifies which metric to use depending on whether the application is more software- or hardware-oriented. One of the four metrics, called "the number of additions and bit shifts (NABS)", is newly introduced for heterogeneous quantization. NABS characterizes the impact of not only the bitwidth used in the operation but also the type of quantization used in the arithmetic operations. We intend this work to serve as a baseline for the different levels (purposes) of complexity estimation related to the application of neural networks in real-time digital signal processing, with the aim of unifying computational complexity estimation.
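To make the link between hyper-parameters and operation counts concrete, the following minimal sketch counts real multiplications and additions for a dense layer and a vanilla recurrent layer using the standard textbook counts. It does not reproduce the paper's four metrics, and in particular it does not implement NABS; the function names and the decision to ignore activation-function cost are assumptions made for illustration.

```python
def dense_layer_ops(n_in, n_out, use_bias=True):
    """Operation counts for y = W x + b with W of shape (n_out, n_in)."""
    mults = n_in * n_out
    # Each output sums n_in products (n_in - 1 additions), plus one bias addition.
    adds = n_out * (n_in - 1) + (n_out if use_bias else 0)
    return {"multiplications": mults, "additions": adds}


def vanilla_rnn_ops(n_in, n_hidden, n_steps, use_bias=True):
    """Operation counts for h_t = f(W x_t + U h_{t-1} + b) over n_steps time steps.

    The cost of the activation f is ignored (an assumption of this sketch).
    """
    mults_per_step = n_hidden * (n_in + n_hidden)
    # Each hidden unit sums n_in + n_hidden products, plus one bias addition.
    adds_per_step = n_hidden * (n_in + n_hidden - 1) + (n_hidden if use_bias else 0)
    return {
        "multiplications": n_steps * mults_per_step,
        "additions": n_steps * adds_per_step,
    }
```

Counts like these are hardware-agnostic; accounting for bitwidth or quantization type requires the more hardware-oriented metrics the abstract describes.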
This paper introduces a novel theoretical framework for the analysis of vector-valued neural networks through the development of vector-valued variation spaces, a new class of reproducing kernel Banach spaces. These spaces emerge from studying the regularization effect of weight decay in training networks with activations like the rectified linear unit (ReLU). This framework offers a deeper understanding of multi-output networks and their function-space characteristics. A key contribution of this work is the development of a representer theorem for the vector-valued variation spaces. This representer theorem establishes that shallow vector-valued neural networks are the solutions to data-fitting problems over these infinite-dimensional spaces, where the network widths are bounded by the square of the number of training data. This observation reveals that the norm associated with these vector-valued variation spaces encourages the learning of features that are useful for multiple tasks, shedding new light on multi-task learning with neural networks. Finally, this paper develops a connection between weight-decay regularization and the multi-task lasso problem. This connection leads to novel bounds for layer widths in deep networks that depend on the intrinsic dimensions of the training data representations. This insight not only deepens the understanding of the deep network architectural requirements, but also yields a simple convex optimization method for deep neural network compression. The performance of this compression procedure is evaluated on various architectures.
In this paper, we propose a class of nonlocal models to approximate the Poisson model on manifolds with homogeneous Neumann boundary conditions, where the manifolds are assumed to be embedded in high-dimensional Euclidean spaces. In comparison to the existing nonlocal approximations of Poisson models with Neumann boundary conditions, we optimize the truncation error of the model by adding an augmented term along the $2\delta$ layer of the boundary, with $2\delta$ being the nonlocal interaction horizon. This term is formulated as the integral of the second-order normal derivative of the solution through the boundary, where the second-order normal derivative is expressed as the difference between the interior Laplacian and the boundary Laplacian. Our paper focuses on the construction of the nonlocal model, its well-posedness, and its second-order convergence rate to its local counterpart. The localization rate of our nonlocal model is currently optimal among all related works, even for the case of high-dimensional Euclidean spaces.
While chain-of-thought prompting (CoT) has the potential to improve the explainability of language model reasoning, it can systematically misrepresent the factors influencing models' behavior--for example, rationalizing answers in line with a user's opinion without mentioning this bias. To mitigate this biased reasoning problem, we introduce bias-augmented consistency training (BCT), an unsupervised fine-tuning scheme that trains models to give consistent reasoning across prompts with and without biasing features. We construct a suite testing nine forms of biased reasoning on seven question-answering tasks, and find that applying BCT to GPT-3.5-Turbo with one bias reduces the rate of biased reasoning by 86% on held-out tasks. Moreover, this model generalizes to other forms of bias, reducing biased reasoning on held-out biases by an average of 37%. As BCT generalizes to held-out biases and does not require gold labels, this method may hold promise for reducing biased reasoning from as-of-yet unknown biases and on tasks where supervision for ground truth reasoning is unavailable.
Hyper-redundant Robotic Manipulators (HRMs) offer great dexterity and flexibility of operation, but solving their Inverse Kinematics (IK) is challenging. In this work, we introduce VO-FABRIK, an algorithm that combines Forward and Backward Reaching Inverse Kinematics (FABRIK), for repeatable, deterministic IK computation, with an approach inspired by velocity obstacles to perform path planning under collision and joint-limit constraints. We show preliminary results on an industrial HRM with 19 actuated joints. Our algorithm achieves good performance where a state-of-the-art IK solver fails.
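For readers unfamiliar with the FABRIK building block, here is a minimal sketch of the plain, unconstrained FABRIK iteration for a planar chain. It is not VO-FABRIK: it contains no velocity-obstacle planning, no collision handling, and no joint limits. The function names, the 2D setting, and the tolerance defaults are assumptions made for illustration.

```python
import math


def _place(anchor, toward, length):
    """Return the point at distance `length` from `anchor` in the direction of `toward`."""
    d = math.dist(anchor, toward)
    if d == 0.0:
        # Degenerate case: direction undefined, pick the +x axis arbitrarily.
        return (anchor[0] + length, anchor[1])
    t = length / d
    return (anchor[0] + t * (toward[0] - anchor[0]),
            anchor[1] + t * (toward[1] - anchor[1]))


def fabrik(joints, target, tol=1e-6, max_iter=100):
    """Plain FABRIK for a 2D chain; joints[0] is the fixed base."""
    pts = [(float(x), float(y)) for x, y in joints]
    lengths = [math.dist(pts[i], pts[i + 1]) for i in range(len(pts) - 1)]
    base = pts[0]
    target = (float(target[0]), float(target[1]))

    if math.dist(base, target) > sum(lengths):
        # Target out of reach: stretch the chain straight toward it.
        for i, length in enumerate(lengths):
            pts[i + 1] = _place(pts[i], target, length)
        return pts

    for _ in range(max_iter):
        if math.dist(pts[-1], target) <= tol:
            break
        # Backward reaching: pin the end effector to the target, work toward the base.
        pts[-1] = target
        for i in range(len(pts) - 2, -1, -1):
            pts[i] = _place(pts[i + 1], pts[i], lengths[i])
        # Forward reaching: pin the base back in place, work toward the end effector.
        pts[0] = base
        for i in range(len(pts) - 1):
            pts[i + 1] = _place(pts[i], pts[i + 1], lengths[i])
    return pts
```

Because each pass only rescales segments along existing directions, the iteration is deterministic for a given starting pose, which is consistent with the repeatability the abstract attributes to FABRIK.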
In this paper, we present an empirical study introducing a nuanced evaluation framework for text-to-image (T2I) generative models, applied to human image synthesis. Our framework categorizes evaluations into two distinct groups: the first focuses on image qualities such as aesthetics and realism, and the second examines text conditions through concept coverage and fairness. We introduce an innovative aesthetic score prediction model that assesses the visual appeal of generated images, and we unveil the first dataset marked with low-quality regions in generated human images to facilitate automatic defect detection. Our exploration into concept coverage probes the model's effectiveness in interpreting and rendering text-based concepts accurately, while our analysis of fairness reveals biases in model outputs, with an emphasis on gender, race, and age. While our study is grounded in human imagery, this dual-faceted approach is designed with the flexibility to be applicable to other forms of image generation, enhancing our understanding of generative models and paving the way to the next generation of more sophisticated, contextually aware, and ethically attuned generative models. We will soon release our code, the data used for evaluating generative models, and the dataset annotated with defective areas.
Dynamic graph neural networks (DyGNNs) have demonstrated powerful predictive abilities by exploiting graph structural and temporal dynamics. However, the existing DyGNNs fail to handle distribution shifts, which naturally exist in dynamic graphs, mainly because the patterns exploited by DyGNNs may be variant with respect to labels under distribution shifts. In this paper, we propose Disentangled Intervention-based Dynamic graph Attention networks with Invariance Promotion (I-DIDA) to handle spatio-temporal distribution shifts in dynamic graphs by discovering and utilizing invariant patterns, i.e., structures and features whose predictive abilities are stable across distribution shifts. Specifically, we first propose a disentangled spatio-temporal attention network to capture the variant and invariant patterns. By utilizing the disentangled patterns, we design a spatio-temporal intervention mechanism to create multiple interventional distributions and an environment inference module to infer the latent spatio-temporal environments, and minimize the variance of predictions among these intervened distributions and environments, so that our model can make predictions based on invariant patterns with stable predictive abilities under distribution shifts. Extensive experiments demonstrate the superiority of our method over state-of-the-art baselines under distribution shifts. Our work is the first study of spatio-temporal distribution shifts in dynamic graphs, to the best of our knowledge.
This paper explores the potential of Physics-Informed Neural Networks (PINNs) to serve as Reduced Order Models (ROMs) for simulating the flow field within stirred tank reactors (STRs). We solve the two-dimensional stationary Navier-Stokes equations within a geometrically intricate domain and explore methodologies that allow us to integrate additional physical insights into the model. These approaches include imposing the Dirichlet boundary conditions (BCs) strongly and employing domain decomposition (DD), with both overlapping and non-overlapping subdomains. We adapt the Extended Physics-Informed Neural Network (XPINN) approach to solve different sets of equations in distinct subdomains based on the diverse flow characteristics present in each region. Our exploration results in a hierarchy of models spanning various levels of complexity, where the best models exhibit l1 prediction errors of less than 1% for both pressure and velocity. To illustrate the reproducibility of our approach, we track the errors over repeated independent training runs of the best identified model and show its reliability. Subsequently, by incorporating the stirring rate as a parametric input, we develop a fast-to-evaluate model of the flow capable of interpolating across a wide range of Reynolds numbers. Although we exclusively restrict ourselves to STRs in this work, we conclude that the steps taken to obtain the presented model hierarchy can be transferred to other applications.
In pace with developments in the research field of artificial intelligence, knowledge graphs (KGs) have attracted a surge of interest from both academia and industry. As a representation of semantic relations between entities, KGs have proven to be particularly relevant for natural language processing (NLP), experiencing a rapid spread and wide adoption within recent years. Given the increasing amount of research work in this area, several KG-related approaches have been surveyed in the NLP research community. However, a comprehensive study that categorizes established topics and reviews the maturity of individual research streams remains absent to this day. Contributing to closing this gap, we systematically analyzed 507 papers from the literature on KGs in NLP. Our survey encompasses a multifaceted review of tasks, research types, and contributions. As a result, we present a structured overview of the research landscape, provide a taxonomy of tasks, summarize our findings, and highlight directions for future work.
In this paper, we propose applying a meta-learning approach to low-resource automatic speech recognition (ASR). We formulate ASR for different languages as different tasks and meta-learn the initialization parameters from many pretraining languages to achieve fast adaptation to an unseen target language, via the recently proposed model-agnostic meta-learning (MAML) algorithm. We evaluate the proposed approach using six languages as pretraining tasks and four languages as target tasks. Preliminary results show that the proposed method, MetaASR, significantly outperforms the state-of-the-art multitask pretraining approach on all target languages with different combinations of pretraining languages. In addition, owing to MAML's model-agnostic property, this paper also opens a new research direction of applying meta-learning to more speech-related applications.