Leveraging biased click data for optimizing learning to rank systems has been a popular approach in information retrieval. Because click data is often noisy and biased, a variety of methods have been proposed to construct unbiased learning to rank (ULTR) algorithms for the learning of unbiased ranking models. Among them, automatic unbiased learning to rank (AutoULTR) algorithms that jointly learn user bias models (i.e., propensity models) with unbiased rankers have received a lot of attention due to their superior performance and low deployment cost in practice. Despite their differences in theory and algorithm design, existing studies on ULTR usually use uni-variate ranking functions to score each document or result independently. On the other hand, recent advances in context-aware learning-to-rank models have shown that multivariate scoring functions, which read multiple documents together and predict their ranking scores jointly, are more powerful than uni-variate ranking functions in ranking tasks with human-annotated relevance labels. Whether such superior performance would hold in ULTR with noisy data, however, is mostly unknown. In this paper, we investigate existing multivariate scoring functions and AutoULTR algorithms in theory and prove that permutation invariance is a crucial factor that determines whether a context-aware learning-to-rank model can be applied within the existing AutoULTR framework. Our experiments with synthetic clicks on two large-scale benchmark datasets show that AutoULTR models with permutation-invariant multivariate scoring functions significantly outperform those with uni-variate scoring functions and permutation-variant multivariate scoring functions.
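As a heavily simplified illustration of the property at stake, the sketch below scores a list of documents with a DeepSets-style multivariate function whose list context is an order-independent mean pooling; the dimensions, weights, and pooling choice are assumptions made for illustration, not the models evaluated in the paper. Permuting the input documents merely permutes the output scores, which is the kind of permutation invariance the analysis requires.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: d-dimensional document features, h hidden units.
d, h = 8, 16
W_doc = rng.normal(scale=0.1, size=(d, h))   # shared per-document transform
W_ctx = rng.normal(scale=0.1, size=(d, h))   # transform of the pooled list context
w_out = rng.normal(scale=0.1, size=h)        # final scoring vector

def multivariate_scores(X):
    """Jointly score the documents of one query.

    X: (n_docs, d) feature matrix. The list context is a mean over documents,
    so reordering the rows of X only reorders the returned scores; each
    document's own score is unchanged (the permutation invariance discussed above).
    """
    context = X.mean(axis=0)                 # order-independent list summary
    hidden = np.tanh(X @ W_doc + context @ W_ctx)
    return hidden @ w_out                    # one score per document

X = rng.normal(size=(5, d))
perm = rng.permutation(5)
assert np.allclose(multivariate_scores(X)[perm], multivariate_scores(X[perm]))
```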
For a given linear algebra problem, we consider those solution algorithms that are mathematically equivalent to one another and that mostly consist of a sequence of calls to kernels from optimized libraries such as BLAS and LAPACK. Although equivalent (at least in exact arithmetic), those algorithms typically exhibit significant differences in terms of performance, and naturally, we are interested in finding the fastest one(s). In practice, we often observe that multiple algorithms yield comparable performance characteristics. Therefore, we aim to identify the subset of algorithms that are reliably faster than the rest. To this end, instead of quantifying the performance of an algorithm in absolute terms, we present a measurement-based approach that assigns a relative score to the algorithms in comparison to one another. The relative performance is encoded by sorting the algorithms based on pairwise comparisons and ranking them into equivalence classes, where more than one algorithm can obtain the same rank. We show that the relative performance leads to robust identification of the fastest algorithms, that is, reliable identification even under noisy system conditions.
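A minimal sketch of the kind of procedure described above, assuming run-time samples per algorithm, a median-based pairwise comparison with an arbitrary 5% tolerance, and ranks assigned from pairwise win counts; the thresholds and tie-handling rule are illustrative choices, not the paper's exact methodology.

```python
import numpy as np

def pairwise_better(a, b, quantile=0.5, tol=0.05):
    """Decide whether algorithm a is reliably faster than algorithm b.

    a, b: arrays of repeated run times. We call a 'better' only if its median
    run time is at least tol*100% smaller than b's; otherwise the two are
    treated as comparable. Thresholds are illustrative.
    """
    qa, qb = np.quantile(a, quantile), np.quantile(b, quantile)
    return qa < (1.0 - tol) * qb

def relative_ranks(times):
    """times: dict {algorithm_name: array of run times}.

    Returns {algorithm_name: rank}; rank 0 is best, and algorithms that win
    the same number of pairwise comparisons share a rank (an equivalence class).
    """
    names = list(times)
    wins = {n: 0 for n in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if pairwise_better(times[a], times[b]):
                wins[a] += 1
            elif pairwise_better(times[b], times[a]):
                wins[b] += 1
    # Same number of wins -> same equivalence class / rank.
    levels = sorted(set(wins.values()), reverse=True)
    return {n: levels.index(wins[n]) for n in names}

rng = np.random.default_rng(1)
times = {"alg_a": rng.normal(1.0, 0.05, 30),
         "alg_b": rng.normal(1.02, 0.05, 30),   # comparable to alg_a
         "alg_c": rng.normal(1.5, 0.05, 30)}    # clearly slower
print(relative_ranks(times))                     # e.g. {'alg_a': 0, 'alg_b': 0, 'alg_c': 1}
```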
This paper is concerned with the factorization and equivalence problems of multivariate polynomial matrices. We present some new criteria for the existence of matrix factorizations for a class of multivariate polynomial matrices, and obtain a necessary and sufficient condition for the equivalence of a square polynomial matrix and a diagonal matrix. Based on the constructive proof of the new criteria, we give a factorization algorithm and prove the uniqueness of the factorization. We implement the algorithm in Maple, and two illustrative examples are given to show the effectiveness of the algorithm.
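For readers unfamiliar with the objects involved, the toy sympy snippet below simply verifies that a small bivariate polynomial matrix equals a product of two polynomial factors; it only illustrates what a factorization of a multivariate polynomial matrix means, not the criteria or the algorithm of the paper.

```python
import sympy as sp

x, y = sp.symbols("x y")

# A toy multivariate polynomial matrix and one of its factorizations
# (purely illustrative; not an example from the paper).
F = sp.Matrix([[x*y, x**2*y],
               [y**2, x*y**2]])
G = sp.Matrix([[x], [y]])          # left factor
H = sp.Matrix([[y, x*y]])          # right factor

assert sp.simplify(G * H - F) == sp.zeros(2, 2)
```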
This paper is concerned with factor left prime factorization problems for multivariate polynomial matrices without full row rank. We propose a necessary and sufficient condition for the existence of factor left prime factorizations of a class of multivariate polynomial matrices, and then design an algorithm to compute all such factorizations when they exist. We implement the algorithm in the computer algebra system Maple, and two examples are given to illustrate its effectiveness. The results presented in this paper also hold for the existence of factor right prime factorizations of multivariate polynomial matrices without full column rank.
Response selection plays a vital role in building retrieval-based conversation systems. Although response selection is naturally a learning-to-rank problem, most prior works take a point-wise view and train binary classifiers for this task: each response candidate is labeled either relevant (one) or irrelevant (zero). On the one hand, this formalization can be sub-optimal because it ignores the diversity of response quality. On the other hand, annotating grayscale data for learning-to-rank can be prohibitively expensive and challenging. In this work, we show that grayscale data can be automatically constructed without human effort. Our method employs off-the-shelf response retrieval models and response generation models as automatic grayscale data generators. With the constructed grayscale data, we propose multi-level ranking objectives for training, which can (1) teach a matching model to capture more fine-grained context-response relevance differences and (2) reduce the train-test discrepancy in terms of distractor strength. Our method is simple, effective, and universal. Experiments on three benchmark datasets and four state-of-the-art matching models show that the proposed approach brings significant and consistent performance improvements.
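The sketch below shows one simple way such a multi-level objective can be written down: a pairwise hinge loss that asks every candidate to outscore all candidates of strictly lower grayscale level by a margin. The level encoding, margin, and averaging are illustrative assumptions, not the paper's exact objective.

```python
import numpy as np

def multilevel_ranking_loss(scores, levels, margin=0.1):
    """Hinge loss over every pair of candidates with different grayscale levels.

    scores: (n,) matching scores for the candidates of one context.
    levels: (n,) integer quality levels, e.g. 3 = ground truth, 2 = retrieved,
            1 = generated, 0 = random distractor.
    A candidate of a higher level should outscore any lower-level one by at
    least `margin`. Purely illustrative; not the paper's exact formulation.
    """
    loss, pairs = 0.0, 0
    for i in range(len(scores)):
        for j in range(len(scores)):
            if levels[i] > levels[j]:
                loss += max(0.0, margin - (scores[i] - scores[j]))
                pairs += 1
    return loss / max(pairs, 1)

scores = np.array([0.9, 0.7, 0.4, 0.1])
levels = np.array([3, 2, 1, 0])
print(multilevel_ranking_loss(scores, levels))   # 0.0: all margins satisfied
```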
Dimension reduction techniques for multivariate time series decompose the observed series into a few useful independent/orthogonal univariate components. We develop a spectral domain method for multivariate second-order stationary time series that linearly transforms the observed series into several groups of lower-dimensional multivariate subseries. These multivariate subseries have non-zero spectral coherence among components within a group but have zero spectral coherence among components across groups. The observed series is expressed as a sum of frequency components whose variances are proportional to the spectral matrices at the respective frequencies. The demixing matrix is then estimated using an eigendecomposition on the sum of the variance matrices of these frequency components and its asymptotic properties are derived. Finally, a consistent test on the cross-spectrum of pairs of components is used to find the desired segmentation into the lower-dimensional subseries. The numerical performance of the proposed method is illustrated through simulation examples and an application to modeling and forecasting wind data is presented.
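To make the demixing step concrete, the rough numpy sketch below splits a series into FFT-masked frequency bands, sums their sample covariance matrices, and takes the eigenvectors of that sum as candidate demixing directions. It follows the description above only loosely: prewhitening, the asymptotic analysis, and the coherence test used for the final grouping are all omitted, so this is an illustration of the idea rather than the paper's estimator.

```python
import numpy as np

def frequency_components(X, n_bands=4):
    """Split a (T, p) multivariate series into band-pass frequency components
    via FFT masking. Returns an array of shape (n_bands, T, p)."""
    T, p = X.shape
    F = np.fft.rfft(X, axis=0)
    edges = np.linspace(0, F.shape[0], n_bands + 1).astype(int)
    comps = []
    for k in range(n_bands):
        mask = np.zeros_like(F)
        mask[edges[k]:edges[k + 1]] = F[edges[k]:edges[k + 1]]
        comps.append(np.fft.irfft(mask, n=T, axis=0))
    return np.array(comps)

def demixing_matrix(X, n_bands=4):
    """Eigendecomposition of the sum of the frequency components' covariance
    matrices. A rough sketch of the idea only; prewhitening and the coherence
    test used to group the components are omitted."""
    comps = frequency_components(X - X.mean(axis=0), n_bands)
    S = sum(np.cov(C, rowvar=False) for C in comps)
    _, vecs = np.linalg.eigh(S)
    return vecs.T        # rows act as demixing directions

rng = np.random.default_rng(2)
X = rng.normal(size=(512, 3))
Z = X @ demixing_matrix(X).T      # columns are the candidate subseries
```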
Frequentist statistical methods, such as hypothesis testing, are standard practice in papers that provide benchmark comparisons. Unfortunately, these frequentist tools have often been misused: without testing the assumptions of the statistical test, without controlling for family-wise errors in multiple group comparisons, among several other problems. Bayesian Data Analysis (BDA) addresses many of the previously mentioned shortcomings, but its use is not yet widespread in the analysis of empirical data in the evolutionary computing community. This paper provides three main contributions. First, we motivate the need for Bayesian data analysis and provide an overview of the topic. Second, we discuss the practical aspects of BDA needed to ensure that our models are valid and the results transparent. Finally, we provide five statistical models that can be used to answer multiple research questions. The online appendix provides a step-by-step guide on how to perform the analysis of the models discussed in this paper, including the code for the statistical models, the data transformations, and the discussed tables and figures.
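As a flavor of what a Bayesian analysis of benchmark results can look like, the snippet below compares two hypothetical algorithms' success rates with a Beta-Binomial model and reports a posterior probability and a credible interval; it is a generic textbook example, not one of the five models proposed in the paper.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical benchmark outcomes: (successes, runs) per algorithm.
wins_a, runs_a = 41, 50
wins_b, runs_b = 33, 50

# Beta(1, 1) prior on each success probability; the posterior is Beta as well.
post_a = rng.beta(1 + wins_a, 1 + runs_a - wins_a, size=100_000)
post_b = rng.beta(1 + wins_b, 1 + runs_b - wins_b, size=100_000)

# Posterior probability that algorithm A has a higher success rate than B,
# plus a 95% credible interval for the difference.
diff = post_a - post_b
print("P(A > B) =", (diff > 0).mean())
print("95% credible interval for the difference:", np.quantile(diff, [0.025, 0.975]))
```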
In information retrieval (IR) and related tasks, term weighting approaches typically consider the frequency of the term in the document and in the collection in order to compute a score reflecting the importance of the term for the document. In tasks characterized by the presence of training data (such as text classification), it seems logical that the term weighting function should take into account the distribution (as estimated from training data) of the term across the classes of interest. Although `supervised term weighting' approaches that use this intuition have been described before, they have failed to show consistent improvements. In this article we analyse the possible reasons for this failure and call some consolidated assumptions into question. Following this criticism, we propose a novel supervised term weighting approach that, instead of relying on any predefined formula, learns a term weighting function optimised on the training set of interest; we dub this approach \emph{Learning to Weight} (LTW). The experiments that we run on several well-known benchmarks, and using different learning methods, show that our method outperforms previous term weighting approaches in text classification.
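A minimal PyTorch sketch of the general idea of learning a weighting function, assuming a toy setup in which a small network maps two per-term statistics (term frequency and a class-conditional ratio) to a weight and is trained end-to-end with a linear classifier; the features, architecture, and training loop are illustrative assumptions rather than the LTW configuration used in the paper.

```python
import torch

# Toy data: term frequencies, a per-term statistic from training data, labels.
n_docs, n_terms, n_classes = 200, 500, 2
torch.manual_seed(0)
tf = torch.rand(n_docs, n_terms)                  # term frequencies
class_ratio = torch.rand(n_terms)                 # class-conditional statistic
y = torch.randint(0, n_classes, (n_docs,))

# Small network that learns the term weighting function, plus a linear classifier.
weight_net = torch.nn.Sequential(torch.nn.Linear(2, 8), torch.nn.ReLU(),
                                 torch.nn.Linear(8, 1), torch.nn.Softplus())
clf = torch.nn.Linear(n_terms, n_classes)
opt = torch.optim.Adam(list(weight_net.parameters()) + list(clf.parameters()), lr=1e-2)

for _ in range(100):
    # Each (document, term) pair gets the features [tf, class_ratio of the term].
    feats = torch.stack([tf, class_ratio.expand_as(tf)], dim=-1)
    weights = weight_net(feats).squeeze(-1)       # learned term weights
    logits = clf(tf * weights)                    # weighted document representation
    loss = torch.nn.functional.cross_entropy(logits, y)
    opt.zero_grad()
    loss.backward()
    opt.step()
```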
Multivariate time series forecasting has been extensively studied over the years, with ubiquitous applications in areas such as finance, traffic, and the environment. Still, concerns have been raised that traditional methods are incapable of modeling the complex patterns or dependencies present in real-world data. To address such concerns, various deep learning models, mainly Recurrent Neural Network (RNN) based methods, have been proposed. Nevertheless, capturing extremely long-term patterns while effectively incorporating information from other variables remains a challenge for time-series forecasting. Furthermore, the lack of explainability remains a serious drawback of deep neural network models. Inspired by the Memory Network proposed for the question-answering task, we propose a deep learning based model named Memory Time-series network (MTNet) for time series forecasting. MTNet consists of a large memory component, three separate encoders, and an autoregressive component that are trained jointly. Additionally, the designed attention mechanism enables MTNet to be highly interpretable: we can easily tell which part of the historic data is referenced the most.
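The untrained numpy sketch below captures only the attention-over-memory idea plus a linear autoregressive term: past chunks of the series are encoded, an attention distribution over them is computed from the most recent chunk, and the returned attention weights indicate which part of the history contributed most. The encoders, dimensions, and combination rule are placeholders, not MTNet's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(4)

def encode(block, W):
    """Stand-in encoder: a linear map of a flattened block of the series.
    The real model uses learned encoders; this is a placeholder."""
    return np.tanh(block.reshape(-1) @ W)

T_block, p, h, n_blocks = 16, 3, 32, 6
W_mem, W_in, W_out = (rng.normal(scale=0.1, size=(T_block * p, h)) for _ in range(3))
w_ar = rng.normal(scale=0.1, size=T_block)        # autoregressive weights
w_pred = rng.normal(scale=0.1, size=h)

def memory_forecast(memory_blocks, recent_block):
    """memory_blocks: list of past (T_block, p) chunks; recent_block: the most
    recent (T_block, p) chunk. Returns a forecast for the first variable and
    the attention weights over the history. A toy, untrained sketch."""
    q = encode(recent_block, W_in)
    keys = np.stack([encode(b, W_mem) for b in memory_blocks])
    attn = np.exp(keys @ q)
    attn /= attn.sum()                            # attention over the memory
    context = attn @ np.stack([encode(b, W_out) for b in memory_blocks])
    nonlinear = (q + context) @ w_pred
    linear = recent_block[:, 0] @ w_ar            # autoregressive component
    return nonlinear + linear, attn               # attn shows which history was used

blocks = [rng.normal(size=(T_block, p)) for _ in range(n_blocks + 1)]
pred, attention = memory_forecast(blocks[:-1], blocks[-1])
```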
The emerging technique of deep learning has been widely applied in many different areas. However, when adopted in a specific domain, this technique should be combined with domain knowledge to improve efficiency and accuracy. In particular, when analyzing applications of deep learning in sentiment analysis, we found that current approaches suffer from the following drawbacks: (i) existing works have not paid much attention to the importance of different types of sentiment terms, which is an important concept in this area; and (ii) the loss function currently employed does not adequately reflect the degree of error of a sentiment misclassification. To overcome these problems, we propose to combine domain knowledge with deep learning. Our proposal includes using sentiment scores, learnt by regression, to augment training data, and introducing a penalty matrix to enhance the cross-entropy loss function. In our experiments, we achieved a significant improvement in classification results.
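One simple way to let a penalty matrix shape the loss is sketched below: the usual cross entropy is augmented with the expected penalty of the predicted distribution, so that placing probability mass on a severely wrong sentiment class costs more than on a mildly wrong one. The matrix values and the additive combination are assumptions for illustration and may differ from the paper's formulation.

```python
import numpy as np

def penalized_cross_entropy(probs, label, penalty, lam=1.0):
    """Cross entropy plus a cost-sensitive term driven by a penalty matrix.

    probs:   (n_classes,) predicted distribution for one example.
    label:   index of the true sentiment class.
    penalty: (n_classes, n_classes) matrix; penalty[i, j] encodes how severe it
             is to assign class j when the truth is class i (0 on the diagonal).
    This is one possible way to make the loss reflect the degree of
    misclassification; the paper's exact formulation may differ.
    """
    ce = -np.log(probs[label])
    cost = penalty[label] @ probs          # expected penalty of the prediction
    return ce + lam * cost

# Three sentiment classes: negative, neutral, positive.
penalty = np.array([[0.0, 1.0, 2.0],       # confusing negative with positive
                    [1.0, 0.0, 1.0],       # costs more than with neutral
                    [2.0, 1.0, 0.0]])
probs = np.array([0.2, 0.3, 0.5])
print(penalized_cross_entropy(probs, label=0, penalty=penalty))
```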
This paper gives comprehensive analyses of corpora based on Wikipedia for several tasks in question answering. Four recent corpora are collected, WikiQA, SelQA, SQuAD, and InfoQA, and first analyzed intrinsically by contextual similarities, question types, and answer categories. These corpora are then analyzed extrinsically through three question answering tasks: answer retrieval, selection, and triggering. An indexing-based method for creating a silver-standard dataset for answer retrieval over the entire Wikipedia is also presented. Our analysis shows the uniqueness of these corpora and suggests how to better use them for statistical question answering.