This paper provides a specification test for semiparametric models with nonparametrically generated regressors. Such variables are not observed by the researcher but are nonparametrically identified and estimable. Applications of the test include models with endogenous regressors identified by control functions, semiparametric sample selection models, or binary games with incomplete information. The statistic is built from the residuals of the semiparametric model. A novel wild bootstrap procedure is shown to provide valid critical values. We consider nonparametric estimators with an automatic bias correction that makes the test implementable without undersmoothing. In simulations the test exhibits good small-sample performance, and an application to women's labor force participation decisions shows its implementation in a real data context.
Information retrieval (IR) plays a crucial role in locating relevant resources from vast amounts of data, and its applications have evolved from traditional knowledge bases to modern retrieval models (RMs). The emergence of large language models (LLMs) has further revolutionized the IR field by enabling users to interact with search systems in natural language. In this paper, we explore the advantages and disadvantages of LLMs and RMs, highlighting their respective strengths in understanding user-issued queries and retrieving up-to-date information. To leverage the benefits of both paradigms while circumventing their limitations, we propose InteR, a novel framework that facilitates information refinement through synergy between RMs and LLMs. InteR allows RMs to expand knowledge in queries using LLM-generated knowledge collections and enables LLMs to enhance prompt formulation using retrieved documents. This iterative refinement process augments the inputs of RMs and LLMs, leading to more accurate retrieval. Experiments on large-scale retrieval benchmarks involving web search and low-resource retrieval tasks demonstrate that InteR achieves overall superior zero-shot retrieval performance compared to state-of-the-art methods, even those using relevance judgments. Source code is available at //github.com/Cyril-JZ/InteR
This paper introduces TunesFormer, an efficient Transformer-based dual-decoder model specifically designed for the generation of melodies that adhere to user-defined musical forms. Trained on 214,122 Irish tunes, TunesFormer utilizes techniques including bar patching and control codes. Bar patching reduces sequence length and generation time, while control codes guide TunesFormer in producing melodies that conform to desired musical forms. Our evaluation demonstrates TunesFormer's superior efficiency, being 3.22 times faster than GPT-2 and 1.79 times faster than a model with linear complexity of equal scale while offering comparable performance in controllability and other metrics. TunesFormer provides a novel tool for musicians, composers, and music enthusiasts alike to explore the vast landscape of Irish music. Our model and code are available at //github.com/sander-wood/tunesformer.
This paper introduces a new numerical approach that integrates local randomized neural networks (LRNNs) and the hybridized discontinuous Petrov-Galerkin (HDPG) method for solving coupled fluid flow problems. The proposed method partitions the domain of interest into several subdomains and constructs an LRNN on each subdomain. Then, the HDPG scheme is used to couple the LRNNs to approximate the unknown functions. We develop LRNN-HDPG methods based on velocity-stress formulation to solve two types of problems: Stokes-Darcy problems and Brinkman equations, which model the flow in porous media and free flow. We devise a simple and effective way to deal with the interface conditions in the Stokes-Darcy problems without adding extra terms to the numerical scheme. We conduct extensive numerical experiments to demonstrate the stability, efficiency, and robustness of the proposed method. The numerical results show that the LRNN-HDPG method can achieve high accuracy with a small number of degrees of freedom.
This paper proposes a fully scalable multi-agent reinforcement learning (MARL) approach for packet scheduling in conflict graphs, aiming to minimize average packet delays. Each agent autonomously manages the schedule of a single link over one or multiple sub-bands, considering its own state and the states of conflicting links. The problem can be conceptualized as a decentralized partially observable Markov decision process (Dec-POMDP). The proposed solution leverages an on-policy reinforcement learning algorithm, multi-agent proximal policy optimization (MAPPO), within a multi-agent networked system, incorporating advanced recurrent structures in the neural network. The MARL design allows for fully decentralized training and execution, seamlessly scaling to very large networks. Extensive simulations across a diverse range of conflict graphs demonstrate that the proposed solution compares favorably to well-established schedulers in terms of both throughput and delay under various traffic conditions.
Topology identification (TI) is a key task for state estimation (SE) in distribution grids, especially those with high-penetration renewables. The uncertainties, initiated by the time-series behavior of renewables, will almost certainly lead to bad TI results without proper treatment. These uncertainties are analytically intractable under the conventional framework: they are usually jointly spatial-temporal dependent, and hence cannot be simply treated as white noise. For this purpose, a hybrid framework is suggested in this paper to handle these uncertainties in a systematic and theoretical way; in particular, big data analytics are studied to harness the jointly spatial-temporal statistical properties of those uncertainties. With some prior knowledge, a model bank is built first to store the countable typical models of network configurations; the difference between the SE outputs of each bank model and our observation can then be defined as a matrix variate, the so-called random matrix. In order to gain insight into the random matrix, a well-designed metric space is needed. An auto-regression (AR) model, factor analysis (FA), and random matrix theory (RMT) are tied together for the metric space design, followed by a jointly temporal-spatial analysis of those matrices, which is conducted in a high-dimensional (vector) space. Under the proposed framework, some big data analytics and theoretical results are obtained to improve the TI performance. Our framework is validated using an IEEE standard distribution network with some field data in practice.
When interest lies in the progression of a disease rather than in a single outcome, non-homogeneous multi-state Markov models constitute a natural and powerful modelling approach. Constant monitoring of a phenomenon of interest is often unfeasible, hence leading to an intermittent observation scheme. This setting is challenging, and existing models and their implementations do not yet allow for specifications flexible enough to fully exploit the information contained in the data. To significantly widen the scope of multi-state Markov models, we propose a closed-form expression for the local curvature information of a key quantity, the transition probability matrix. Such development allows one to model any type of multi-state Markov process, where the transition intensities are flexibly specified as functions of additive predictors. Parameter estimation is carried out through a carefully structured, stable penalised likelihood approach. The methodology is exemplified via two case studies that aim at modelling the onset of cardiac allograft vasculopathy, and cognitive decline. To support applicability and reproducibility, all developed tools are implemented in the R package flexmsm.
While recognition-level image understanding has achieved remarkable advancements, reliable visual scene understanding requires comprehensive image understanding not only at the recognition level but also at the cognition level, which calls for exploiting multi-source information as well as learning different levels of understanding and extensive commonsense knowledge. In this paper, we propose a novel Cognitive Attention Network (CAN) for visual commonsense reasoning to achieve interpretable visual understanding. Specifically, we first introduce an image-text fusion module to fuse information from images and text collectively. Second, a novel inference module is designed to encode commonsense among image, query and response. Extensive experiments on the large-scale Visual Commonsense Reasoning (VCR) benchmark dataset demonstrate the effectiveness of our approach. The implementation is publicly available at //github.com/tanjatang/CAN
Translational distance-based knowledge graph embedding has shown progressive improvements on the link prediction task, from TransE to the latest state-of-the-art RotatE. However, N-1, 1-N and N-N predictions remain challenging. In this work, we propose a novel translational distance-based approach for knowledge graph link prediction. The proposed method is two-fold. First, we extend RotatE from the 2D complex domain to a high-dimensional space with orthogonal transforms to model relations with greater capacity. Second, the graph context is explicitly modeled via two directed context representations. These context representations are used as part of the distance scoring function to measure the plausibility of the triples during training and inference. The proposed approach effectively improves prediction accuracy on the difficult N-1, 1-N and N-N cases for the knowledge graph link prediction task. The experimental results show that it achieves better performance on two benchmark data sets compared to the baseline RotatE, especially on the data set (FB15k-237) with many high in-degree connection nodes.
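As a point of reference for the baseline named above, the following is a minimal sketch of a RotatE-style distance, in which a relation acts as an element-wise rotation in the complex plane. The function name, the choice of summing element-wise moduli, and the toy embeddings are illustrative assumptions; the orthogonal-transform extension and the context representations proposed in the abstract are not shown.

```python
import numpy as np


def rotate_distance(head, phase, tail):
    """RotatE-style distance between a rotated head and a tail embedding.

    head, tail: complex vectors (entity embeddings).
    phase: real vector of rotation angles (relation embedding).
    Assumption: the distance is the sum of element-wise complex moduli.
    """
    rotation = np.exp(1j * phase)  # unit-modulus complex numbers
    return float(np.sum(np.abs(head * rotation - tail)))


# Toy embeddings (hypothetical values, for illustration only).
rng = np.random.default_rng(0)
dim = 4
head = rng.normal(size=dim) + 1j * rng.normal(size=dim)
phase = rng.uniform(0.0, 2.0 * np.pi, size=dim)

true_tail = head * np.exp(1j * phase)  # tail that exactly matches the rotation
random_tail = rng.normal(size=dim) + 1j * rng.normal(size=dim)

d_true = rotate_distance(head, phase, true_tail)
d_random = rotate_distance(head, phase, random_tail)
print(d_true, d_random)
```

A smaller distance indicates a more plausible triple. Each complex rotation is equivalent to a 2x2 rotation matrix acting on the real and imaginary parts, which is the sense in which RotatE operates in a 2D domain.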
Visual Question Answering (VQA) models have struggled with counting objects in natural images so far. We identify a fundamental problem due to soft attention in these models as a cause. To circumvent this problem, we propose a neural network component that allows robust counting from object proposals. Experiments on a toy task show the effectiveness of this component and we obtain state-of-the-art accuracy on the number category of the VQA v2 dataset without negatively affecting other categories, even outperforming ensemble models with our single model. On a difficult balanced pair metric, the component gives a substantial improvement in counting over a strong baseline by 6.6%.
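One way to see how soft attention can hinder counting is the toy sketch below. It assumes softmax-normalized attention weights and identical feature vectors for every instance of an object; the function and variable names are hypothetical, and this is not the counting component proposed in the abstract.

```python
import numpy as np


def soft_attention_pool(features, logits):
    """Weighted average of object features with softmax-normalized weights."""
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()  # weights sum to 1 regardless of object count
    return weights @ features


# Hypothetical feature vector for one object instance.
cat = np.array([1.0, 0.0, 2.0])

one_cat = np.stack([cat])
three_cats = np.stack([cat, cat, cat])

pooled_one = soft_attention_pool(one_cat, np.array([5.0]))
pooled_three = soft_attention_pool(three_cats, np.array([5.0, 5.0, 5.0]))

# Both pooled vectors are the same, so the count is lost after pooling.
print(np.allclose(pooled_one, pooled_three))
```

Because the weights are normalized to sum to one, the pooled representation is identical for one and for three instances under these assumptions, so any later layer cannot recover the count from it.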
High spectral dimensionality and the shortage of annotations make hyperspectral image (HSI) classification a challenging problem. Recent studies suggest that convolutional neural networks can learn discriminative spatial features, which play a paramount role in HSI interpretation. However, most of these methods ignore the distinctive spectral-spatial characteristic of hyperspectral data. In addition, a large amount of unlabeled data remains an unexploited gold mine for efficient data use. Therefore, we propose an integration of generative adversarial networks (GANs) and probabilistic graphical models for HSI classification. Specifically, we use a spectral-spatial generator and a discriminator to identify land cover categories of hyperspectral cubes. Moreover, to take advantage of a large amount of unlabeled data, we adopt a conditional random field to refine the preliminary classification results generated by GANs. Experimental results obtained using two commonly studied datasets demonstrate that the proposed framework achieves encouraging classification accuracy using a small amount of training data.