This paper addresses the problem of generating questions from a given context and an answer, specifically focusing on questions that require multi-hop reasoning across an extended context. Previous studies have suggested that key phrase selection is essential for question generation (QG), yet it is still challenging to connect such disjointed phrases into meaningful questions, particularly for long context. To mitigate this issue, we propose MultiFactor, a novel QG framework based on multi-level content planning. Specifically, MultiFactor includes two components: FA-model, which simultaneously selects key phrases and generates full answers, and Q-model which takes the generated full answer as an additional input to generate questions. Here, full answer generation is introduced to connect the short answer with the selected key phrases, thus forming an answer-aware summary to facilitate QG. Both FA-model and Q-model are formalized as simple-yet-effective Phrase-Enhanced Transformers, our joint model for phrase selection and text generation. Experimental results show that our method outperforms strong baselines on two popular QG datasets. Our code is available at //github.com/zeaver/MultiFactor.
Uniformly valid inference for cointegrated vector autoregressive processes has so far proven difficult due to certain discontinuities arising in the asymptotic distribution of the least squares estimator. We extend asymptotic results from the univariate case to multiple dimensions and show how inference can be based on these results. Furthermore, we show that lag augmentation and a recent instrumental variable procedure can also yield uniformly valid tests and confidence regions. We verify the theoretical findings and investigate finite sample properties in simulation experiments for two specific examples.
We consider a distributed coding for computing problem with constant decoding locality, i.e. with a vanishing error probability, any single sample of the function can be approximately recovered by probing only constant number of compressed bits. We establish an achievable rate region by designing an efficient coding scheme. The scheme reduces the required rate by introducing auxiliary random variables and supports local decoding at the same time. Then we show the rate region is optimal under mild regularity conditions on source distributions. A coding for computing problem with side information is analogously studied. These results indicate that more rate has to be taken in order to achieve lower coding complexity in distributed computing settings. Moreover, useful graph characterizations are developed to simplify the computation of the achievable rate region.
There has been a lot of work in question generation where different methods to provide target answers as input, have been employed. This experimentation has been mostly carried out for RNN based models. We use three different methods and their combinations for incorporating answer information and explore their effect on several automatic evaluation metrics. The methods that are used are answer prompting, using a custom product method using answer embeddings and encoder outputs, choosing sentences from the input paragraph that have answer related information, and using a separate cross-attention attention block in the decoder which attends to the answer. We observe that answer prompting without any additional modes obtains the best scores across rouge, meteor scores. Additionally, we use a custom metric to calculate how many of the generated questions have the same answer, as the answer which is used to generate them.
This study performs BERT-based analysis, which is a representative contextualized language model, on corporate disclosure data to predict impending bankruptcies. Prior literature on bankruptcy prediction mainly focuses on developing more sophisticated prediction methodologies with financial variables. However, in our study, we focus on improving the quality of input dataset. Specifically, we employ BERT model to perform sentiment analysis on MD&A disclosures. We show that BERT outperforms dictionary-based predictions and Word2Vec-based predictions in terms of adjusted R-square in logistic regression, k-nearest neighbor (kNN-5), and linear kernel support vector machine (SVM). Further, instead of pre-training the BERT model from scratch, we apply self-learning with confidence-based filtering to corporate disclosure data (10-K). We achieve the accuracy rate of 91.56% and demonstrate that the domain adaptation procedure brings a significant improvement in prediction accuracy.
Horizontal gene transfer inference approaches are usually based on gene sequences: parametric methods search for patterns that deviate from a particular genomic signature, while phylogenetic methods use sequences to reconstruct the gene and species trees. However, it is well-known that sequences have difficulty identifying ancient transfers since mutations have enough time to erase all evidence of such events. In this work, we ask whether character-based methods can predict gene transfers. Their advantage over sequences is that homologous genes can have low DNA similarity, but still have retained enough important common motifs that allow them to have common character traits, for instance the same functional or expression profile. A phylogeny that has two separate clades that acquired the same character independently might indicate the presence of a transfer even in the absence of sequence similarity. We introduce perfect transfer networks, which are phylogenetic networks that can explain the character diversity of a set of taxa under the assumption that characters have unique births, and that once a character is gained it is rarely lost. Examples of such traits include transposable elements, biochemical markers and emergence of organelles, just to name a few. We study the differences between our model and two similar models: perfect phylogenetic networks and ancestral recombination networks. Our goals are to initiate a study on the structural and algorithmic properties of perfect transfer networks. We then show that in polynomial time, one can decide whether a given network is a valid explanation for a set of taxa, and show how, for a given tree, one can add transfer edges to it so that it explains a set of taxa. We finally provide lower and upper bounds on the number of transfers required to explain a set of taxa, in the worst case.
We demonstrate how conditional generation from diffusion models can be used to tackle a variety of realistic tasks in the production of music in 44.1kHz stereo audio with sampling-time guidance. The scenarios we consider include continuation, inpainting and regeneration of musical audio, the creation of smooth transitions between two different music tracks, and the transfer of desired stylistic characteristics to existing audio clips. We achieve this by applying guidance at sampling time in a simple framework that supports both reconstruction and classification losses, or any combination of the two. This approach ensures that generated audio can match its surrounding context, or conform to a class distribution or latent representation specified relative to any suitable pre-trained classifier or embedding model. Audio samples are available at //machinelearning.apple.com/research/controllable-music
Triple extraction is an essential task in information extraction for natural language processing and knowledge graph construction. In this paper, we revisit the end-to-end triple extraction task for sequence generation. Since generative triple extraction may struggle to capture long-term dependencies and generate unfaithful triples, we introduce a novel model, contrastive triple extraction with a generative transformer. Specifically, we introduce a single shared transformer module for encoder-decoder-based generation. To generate faithful results, we propose a novel triplet contrastive training object. Moreover, we introduce two mechanisms to further improve model performance (i.e., batch-wise dynamic attention-masking and triple-wise calibration). Experimental results on three datasets (i.e., NYT, WebNLG, and MIE) show that our approach achieves better performance than that of baselines.
This paper explores meta-learning in sequential recommendation to alleviate the item cold-start problem. Sequential recommendation aims to capture user's dynamic preferences based on historical behavior sequences and acts as a key component of most online recommendation scenarios. However, most previous methods have trouble recommending cold-start items, which are prevalent in those scenarios. As there is generally no side information in the setting of sequential recommendation task, previous cold-start methods could not be applied when only user-item interactions are available. Thus, we propose a Meta-learning-based Cold-Start Sequential Recommendation Framework, namely Mecos, to mitigate the item cold-start problem in sequential recommendation. This task is non-trivial as it targets at an important problem in a novel and challenging context. Mecos effectively extracts user preference from limited interactions and learns to match the target cold-start item with the potential user. Besides, our framework can be painlessly integrated with neural network-based models. Extensive experiments conducted on three real-world datasets verify the superiority of Mecos, with the average improvement up to 99%, 91%, and 70% in HR@10 over state-of-the-art baseline methods.
The accurate and interpretable prediction of future events in time-series data often requires the capturing of representative patterns (or referred to as states) underpinning the observed data. To this end, most existing studies focus on the representation and recognition of states, but ignore the changing transitional relations among them. In this paper, we present evolutionary state graph, a dynamic graph structure designed to systematically represent the evolving relations (edges) among states (nodes) along time. We conduct analysis on the dynamic graphs constructed from the time-series data and show that changes on the graph structures (e.g., edges connecting certain state nodes) can inform the occurrences of events (i.e., time-series fluctuation). Inspired by this, we propose a novel graph neural network model, Evolutionary State Graph Network (EvoNet), to encode the evolutionary state graph for accurate and interpretable time-series event prediction. Specifically, Evolutionary State Graph Network models both the node-level (state-to-state) and graph-level (segment-to-segment) propagation, and captures the node-graph (state-to-segment) interactions over time. Experimental results based on five real-world datasets show that our approach not only achieves clear improvements compared with 11 baselines, but also provides more insights towards explaining the results of event predictions.
Translational distance-based knowledge graph embedding has shown progressive improvements on the link prediction task, from TransE to the latest state-of-the-art RotatE. However, N-1, 1-N and N-N predictions still remain challenging. In this work, we propose a novel translational distance-based approach for knowledge graph link prediction. The proposed method includes two-folds, first we extend the RotatE from 2D complex domain to high dimension space with orthogonal transforms to model relations for better modeling capacity. Second, the graph context is explicitly modeled via two directed context representations. These context representations are used as part of the distance scoring function to measure the plausibility of the triples during training and inference. The proposed approach effectively improves prediction accuracy on the difficult N-1, 1-N and N-N cases for knowledge graph link prediction task. The experimental results show that it achieves better performance on two benchmark data sets compared to the baseline RotatE, especially on data set (FB15k-237) with many high in-degree connection nodes.