Data analysis usually suffers from the Missing Not At Random (MNAR) problem, where the cause of the value missing is not fully observed. Compared to the naive Missing Completely At Random (MCAR) problem, it is more in line with the realistic scenario whereas more complex and challenging. Existing statistical methods model the MNAR mechanism by different decomposition of the joint distribution of the complete data and the missing mask. But we empirically find that directly incorporating these statistical methods into deep generative models is sub-optimal. Specifically, it would neglect the confidence of the reconstructed mask during the MNAR imputation process, which leads to insufficient information extraction and less-guaranteed imputation quality. In this paper, we revisit the MNAR problem from a novel perspective that the complete data and missing mask are two modalities of incomplete data on an equal footing. Along with this line, we put forward a generative-model-specific joint probability decomposition method, conjunction model, to represent the distributions of two modalities in parallel and extract sufficient information from both complete data and missing mask. Taking a step further, we exploit a deep generative imputation model, namely GNR, to process the real-world missing mechanism in the latent space and concurrently impute the incomplete data and reconstruct the missing mask. The experimental results show that our GNR surpasses state-of-the-art MNAR baselines with significant margins (averagely improved from 9.9% to 18.8% in RMSE) and always gives a better mask reconstruction accuracy which makes the imputation more principle.
For Arithmetization-Oriented ciphers and hash functions Gr\"obner basis attacks are generally considered as the most competitive attack vector. Unfortunately, the complexity of Gr\"obner basis algorithms is only understood for special cases, and it is needless to say that these cases do not apply to most cryptographic polynomial systems. Therefore, cryptographers have to resort to experiments, extrapolations and hypotheses to assess the security of their designs. One established measure to quantify the complexity of linear algebra-based Gr\"obner basis algorithms is the so-called solving degree. Caminata \& Gorla revealed that under a certain genericity condition on a polynomial system the solving degree is always upper bounded by the Castelnuovo-Mumford regularity and henceforth by the Macaulay bound, which only takes the degrees and number of variables of the input polynomials into account. In this paper we extend their framework to iterated polynomial systems, the standard polynomial model for symmetric ciphers and hash functions. In particular, we prove solving degree bounds for various attacks on MiMC, Feistel-MiMC, Feistel-MiMC-Hash, Hades and GMiMC. Our bounds fall in line with the hypothesized complexity of Gr\"obner basis attacks on these designs, and to the best of our knowledge this is the first time that a mathematical proof for these complexities is provided. Moreover, by studying polynomials with degree falls we can prove lower bounds on the Castelnuovo-Mumford regularity for attacks on MiMC, Feistel-MiMC and Feistel-MiMC-Hash provided that only a few solutions of the corresponding iterated polynomial system originate from the base field. Hence, regularity-based solving degree estimations can never surpass a certain threshold, a desirable property for cryptographic polynomial systems.
Most popular dimension reduction (DR) methods like t-SNE and UMAP are based on minimizing a cost between input and latent pairwise similarities. Though widely used, these approaches lack clear probabilistic foundations to enable a full understanding of their properties and limitations. To that extent, we introduce a unifying statistical framework based on the coupling of hidden graphs using cross entropy. These graphs induce a Markov random field dependency structure among the observations in both input and latent spaces. We show that existing pairwise similarity DR methods can be retrieved from our framework with particular choices of priors for the graphs. Moreover this reveals that these methods suffer from a statistical deficiency that explains poor performances in conserving coarse-grain dependencies. Our model is leveraged and extended to address this issue while new links are drawn with Laplacian eigenmaps and PCA.
In this article we consider Bayesian parameter inference for a type of partially observed stochastic Volterra equation (SVE). SVEs are found in many areas such as physics and mathematical finance. In the latter field they can be used to represent long memory in unobserved volatility processes. In many cases of practical interest, SVEs must be time-discretized and then parameter inference is based upon the posterior associated to this time-discretized process. Based upon recent studies on time-discretization of SVEs (e.g. Richard et al. 2021), we use Euler-Maruyama methods for the afore-mentioned discretization. We then show how multilevel Markov chain Monte Carlo (MCMC) methods (Jasra et al. 2018) can be applied in this context. In the examples we study, we give a proof that shows that the cost to achieve a mean square error (MSE) of $\mathcal{O}(\epsilon^2)$, $\epsilon>0$, is $\mathcal{O}(\epsilon^{-20/9})$. If one uses a single level MCMC method then the cost is $\mathcal{O}(\epsilon^{-38/9})$ to achieve the same MSE. We illustrate these results in the context of state-space and stochastic volatility models, with the latter applied to real data.
In this contribution, we are concerned with model order reduction in the context of iterative regularization methods for the solution of inverse problems arising from parameter identification in elliptic partial differential equations. Such methods typically require a large number of forward solutions, which makes the use of the reduced basis method attractive to reduce computational complexity. However, the considered inverse problems are typically ill-posed due to their infinite-dimensional parameter space. Moreover, the infinite-dimensional parameter space makes it impossible to build and certify classical reduced-order models efficiently in a so-called "offline phase". We thus propose a new algorithm that adaptively builds a reduced parameter space in the online phase. The enrichment of the reduced parameter space is naturally inherited from the Tikhonov regularization within an iteratively regularized Gau{\ss}-Newton method. Finally, the adaptive parameter space reduction is combined with a certified reduced basis state space reduction within an adaptive error-aware trust region framework. Numerical experiments are presented to show the efficiency of the combined parameter and state space reduction for inverse parameter identification problems with distributed reaction or diffusion coefficients.
In the era of data-driven Music Information Retrieval (MIR), the scarcity of labeled data has been one of the major concerns to the success of an MIR task. In this work, we leverage the semi-supervised teacher-student training approach to improve MIR tasks. For training, we scale up the unlabeled music data to 240k hours, which is much larger than any public MIR datasets. We iteratively create and refine the pseudo-labels in the noisy teacher-student training process. Knowledge expansion is also explored to iteratively scale up the model sizes from as small as less than 3M to almost 100M parameters. We study the performance correlation between data size and model size in the experiments. By scaling up both model size and training data, our models achieve state-of-the-art results on several MIR tasks compared to models that are either trained in a supervised manner or based on a self-supervised pretrained model. To our knowledge, this is the first attempt to study the effects of scaling up both model and training data for a variety of MIR tasks.
Recently, a considerable literature has grown up around the theme of Graph Convolutional Network (GCN). How to effectively leverage the rich structural information in complex graphs, such as knowledge graphs with heterogeneous types of entities and relations, is a primary open challenge in the field. Most GCN methods are either restricted to graphs with a homogeneous type of edges (e.g., citation links only), or focusing on representation learning for nodes only instead of jointly propagating and updating the embeddings of both nodes and edges for target-driven objectives. This paper addresses these limitations by proposing a novel framework, namely the Knowledge Embedding based Graph Convolutional Network (KE-GCN), which combines the power of GCNs in graph-based belief propagation and the strengths of advanced knowledge embedding (a.k.a. knowledge graph embedding) methods, and goes beyond. Our theoretical analysis shows that KE-GCN offers an elegant unification of several well-known GCN methods as specific cases, with a new perspective of graph convolution. Experimental results on benchmark datasets show the advantageous performance of KE-GCN over strong baseline methods in the tasks of knowledge graph alignment and entity classification.
For better user experience and business effectiveness, Click-Through Rate (CTR) prediction has been one of the most important tasks in E-commerce. Although extensive CTR prediction models have been proposed, learning good representation of items from multimodal features is still less investigated, considering an item in E-commerce usually contains multiple heterogeneous modalities. Previous works either concatenate the multiple modality features, that is equivalent to giving a fixed importance weight to each modality; or learn dynamic weights of different modalities for different items through technique like attention mechanism. However, a problem is that there usually exists common redundant information across multiple modalities. The dynamic weights of different modalities computed by using the redundant information may not correctly reflect the different importance of each modality. To address this, we explore the complementarity and redundancy of modalities by considering modality-specific and modality-invariant features differently. We propose a novel Multimodal Adversarial Representation Network (MARN) for the CTR prediction task. A multimodal attention network first calculates the weights of multiple modalities for each item according to its modality-specific features. Then a multimodal adversarial network learns modality-invariant representations where a double-discriminators strategy is introduced. Finally, we achieve the multimodal item representations by combining both modality-specific and modality-invariant representations. We conduct extensive experiments on both public and industrial datasets, and the proposed method consistently achieves remarkable improvements to the state-of-the-art methods. Moreover, the approach has been deployed in an operational E-commerce system and online A/B testing further demonstrates the effectiveness.
Multi-relation Question Answering is a challenging task, due to the requirement of elaborated analysis on questions and reasoning over multiple fact triples in knowledge base. In this paper, we present a novel model called Interpretable Reasoning Network that employs an interpretable, hop-by-hop reasoning process for question answering. The model dynamically decides which part of an input question should be analyzed at each hop; predicts a relation that corresponds to the current parsed results; utilizes the predicted relation to update the question representation and the state of the reasoning process; and then drives the next-hop reasoning. Experiments show that our model yields state-of-the-art results on two datasets. More interestingly, the model can offer traceable and observable intermediate predictions for reasoning analysis and failure diagnosis, thereby allowing manual manipulation in predicting the final answer.
Aspect based sentiment analysis (ABSA) can provide more detailed information than general sentiment analysis, because it aims to predict the sentiment polarities of the given aspects or entities in text. We summarize previous approaches into two subtasks: aspect-category sentiment analysis (ACSA) and aspect-term sentiment analysis (ATSA). Most previous approaches employ long short-term memory and attention mechanisms to predict the sentiment polarity of the concerned targets, which are often complicated and need more training time. We propose a model based on convolutional neural networks and gating mechanisms, which is more accurate and efficient. First, the novel Gated Tanh-ReLU Units can selectively output the sentiment features according to the given aspect or entity. The architecture is much simpler than attention layer used in the existing models. Second, the computations of our model could be easily parallelized during training, because convolutional layers do not have time dependency as in LSTM layers, and gating units also work independently. The experiments on SemEval datasets demonstrate the efficiency and effectiveness of our models.
We investigate a lattice-structured LSTM model for Chinese NER, which encodes a sequence of input characters as well as all potential words that match a lexicon. Compared with character-based methods, our model explicitly leverages word and word sequence information. Compared with word-based methods, lattice LSTM does not suffer from segmentation errors. Gated recurrent cells allow our model to choose the most relevant characters and words from a sentence for better NER results. Experiments on various datasets show that lattice LSTM outperforms both word-based and character-based LSTM baselines, achieving the best results.