Deep Learners (DLs) are the state-of-the-art predictive mechanism, with applications in many fields that require processing complex, high-dimensional data. Although conventional DLs are trained via gradient descent with back-propagation, Kalman Filter (KF)-based techniques that do not require gradient computation have been developed to approximate DLs. We propose a multi-arm extension of a KF-based DL approximator that can mimic a DL when the sample size is too small to train a multi-arm DL. The proposed Matrix Ensemble Kalman Filter-based multi-arm ANN (MEnKF-ANN) also performs explicit model stacking, which becomes relevant when the training sample has an unequal-size feature set. Our proposed technique can approximate Long Short-Term Memory (LSTM) networks and attach uncertainty to the predictions obtained from these LSTMs with desirable coverage. We demonstrate how MEnKF-ANN can "adequately" approximate an LSTM network trained to classify which carbohydrate substrates are digested and utilized by a microbiome sample whose genomic sequences consist of polysaccharide utilization loci (PULs) and their encoded genes.
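To make the gradient-free training idea concrete, here is a minimal sketch of fitting a tiny ANN with an ensemble Kalman update in place of back-propagation. This is not the authors' MEnKF-ANN (which is matrix-variate and multi-arm); all names, shapes, and hyperparameters below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def ann_forward(w, X, hidden=8):
    """Tiny one-hidden-layer ANN; w packs all weights into one flat vector."""
    d = X.shape[1]
    W1 = w[:d * hidden].reshape(d, hidden)
    b1 = w[d * hidden:(d + 1) * hidden]
    W2 = w[(d + 1) * hidden:(d + 2) * hidden]
    b2 = w[-1]
    return np.tanh(X @ W1 + b1) @ W2 + b2

def enkf_update(W_ens, X, y, obs_var=0.05):
    """One ensemble Kalman update of an (M, n_weights) weight ensemble.
    No gradients: the gain is built from ensemble cross-covariances."""
    M, n = W_ens.shape[0], len(y)
    H = np.stack([ann_forward(w, X) for w in W_ens])       # (M, n) forecasts
    Wa, Ha = W_ens - W_ens.mean(0), H - H.mean(0)          # ensemble anomalies
    C_wh = Wa.T @ Ha / (M - 1)                             # weight-output cov
    C_hh = Ha.T @ Ha / (M - 1) + obs_var * np.eye(n)       # output cov + noise
    K = C_wh @ np.linalg.inv(C_hh)                         # Kalman gain
    y_pert = y + rng.normal(0.0, np.sqrt(obs_var), (M, n)) # perturbed obs
    return W_ens + (y_pert - H) @ K.T
```

Iterating `enkf_update` over the training data moves the whole ensemble toward weights that fit it, and the spread of the final ensemble's predictions is what supplies uncertainty quantification of the kind described above.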
In the field of unsupervised feature selection, sparse principal component analysis (SPCA) methods have attracted increasing attention. Compared with spectral-based methods, SPCA methods do not rely on the construction of a similarity matrix and show better feature selection ability on real-world data. The original SPCA formulates a nonconvex optimization problem. Existing convex SPCA methods reformulate SPCA as a convex model by treating the reconstruction matrix as an optimization variable. However, they lack constraints equivalent to the orthogonality restriction in SPCA, which leads to a larger solution space. In this paper, we prove that the optimal solution to a convex SPCA model falls onto the Positive Semidefinite (PSD) cone. We propose a standard convex SPCA-based model with a PSD constraint for unsupervised feature selection, together with a two-step fast optimization algorithm via PSD projection to solve it. Two other existing convex SPCA-based models are also proven to attain their optima on the PSD cone, so we propose PSD versions of these two models to accelerate their convergence as well. We also provide a regularization-parameter setting strategy for our proposed method. Experiments on synthetic and real-world datasets demonstrate the effectiveness and efficiency of the proposed methods.
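The projection at the heart of such a two-step scheme is standard: the Frobenius-nearest PSD matrix is obtained by symmetrizing and zeroing out negative eigenvalues. A minimal sketch (the companion update step and the model's sparsity penalty are omitted):

```python
import numpy as np

def project_psd(M):
    """Project onto the PSD cone: symmetrize, then clip negative
    eigenvalues of the eigendecomposition to zero."""
    S = (M + M.T) / 2.0
    vals, vecs = np.linalg.eigh(S)
    return (vecs * np.clip(vals, 0.0, None)) @ vecs.T

# An indefinite matrix lands exactly on the cone:
P = project_psd(np.random.default_rng(0).standard_normal((5, 5)))
assert np.linalg.eigvalsh(P).min() >= -1e-10
```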
Performing Bayesian inference on large spatio-temporal models requires extracting inverse elements of large sparse precision matrices to obtain marginal variances. Although direct matrix factorizations can be used for the inversion, such methods fail to scale well for distributed problems run on large computing clusters. In contrast, Krylov subspace methods for selected inversion have been gaining traction. We propose a parallel hybrid approach based on domain decomposition, which extends the Rao-Blackwellized Monte Carlo estimator to distributed precision matrices. Our approach exploits the strength of Krylov subspace methods as global solvers and the efficiency of direct factorizations as base-case solvers to compute the marginal variances using a divide-and-conquer strategy. By introducing subdomain overlaps, one can achieve greater accuracy at an increased computational effort with little to no additional communication. We demonstrate the speed improvements on both simulated models and a massive US daily temperature dataset.
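The underlying Rao-Blackwellized estimator is worth spelling out. For $x \sim N(0, Q^{-1})$, the law of total variance gives $\mathrm{Var}(x_i) = Q_{ii}^{-1} + Q_{ii}^{-2}\,\mathrm{Var}(Q_{i,-i}\,x_{-i})$, so only the second term needs Monte Carlo. The dense, single-machine sketch below uses one global Cholesky factorization for clarity; the paper's contribution, not reflected here, is to replace that global solve with overlapping subdomain solves combined with Krylov methods.

```python
import numpy as np

def rbmc_marginal_variances(Q, n_samples=500, seed=0):
    """Rao-Blackwellized MC estimate of diag(Q^{-1}) for an SPD precision Q."""
    rng = np.random.default_rng(seed)
    n = Q.shape[0]
    L = np.linalg.cholesky(Q)                    # Q = L L^T
    Z = rng.standard_normal((n, n_samples))
    X = np.linalg.solve(L.T, Z)                  # columns ~ N(0, Q^{-1})
    d = np.diag(Q)
    R = Q @ X - d[:, None] * X                   # row i holds Q_{i,-i} x_{-i}
    return 1.0 / d + R.var(axis=1) / d**2        # exact term + MC correction
```

Because the deterministic $Q_{ii}^{-1}$ term typically dominates, this estimator needs far fewer samples than naive Monte Carlo on $\mathrm{Var}(x_i)$ itself.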
Until high-fidelity quantum computers with a large number of qubits become widely available, classical simulation remains a vital tool for algorithm design, tuning, and validation. We present a simulator for the Quantum Approximate Optimization Algorithm (QAOA). Our simulator is designed with the goal of reducing the computational cost of QAOA parameter optimization and supports both CPU and GPU execution. Our central observation is that the computational cost of both simulating the QAOA state and computing the QAOA objective can be reduced by precomputing the diagonal Hamiltonian encoding the problem. We reduce the time for a typical QAOA parameter optimization by a factor of eleven for $n = 26$ qubits compared to a state-of-the-art GPU quantum circuit simulator based on cuQuantum. Our simulator is available on GitHub: https://github.com/jpmorganchase/QOKit
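The precomputation trick can be shown in a few lines of NumPy. This is a CPU-only sketch, not the QOKit implementation: once the diagonal cost vector is tabulated, every phase-separator layer is an elementwise multiply and the objective is a weighted average of the precomputed costs, so the problem Hamiltonian is never re-evaluated inside the optimization loop.

```python
import numpy as np

def qaoa_expectation(costs, gammas, betas):
    """Simulate QAOA given the precomputed diagonal cost vector
    `costs` (length 2^n, one objective value per bitstring)."""
    n = int(np.log2(len(costs)))
    state = np.full(2**n, 2**(-n / 2), dtype=complex)    # |+...+> start
    for gamma, beta in zip(gammas, betas):
        state *= np.exp(-1j * gamma * costs)             # phase separator
        c, s = np.cos(beta), -1j * np.sin(beta)
        for q in range(n):                               # mixer exp(-i beta X_q)
            psi = state.reshape(2**q, 2, 2**(n - q - 1))
            a = psi[:, 0, :].copy()
            psi[:, 0, :] = c * a + s * psi[:, 1, :]
            psi[:, 1, :] = s * a + c * psi[:, 1, :]
    return float(np.real(costs @ np.abs(state) ** 2))
```

An outer optimizer then calls `qaoa_expectation` repeatedly with new `(gammas, betas)` while `costs` is filled exactly once, which mirrors the saving described above.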
A numerical search approach is used to design high-order diagonally implicit Runge-Kutta (DIRK) schemes equipped with embedded error estimators, some of which have identical diagonal elements (SDIRK) and an explicit first stage (ESDIRK). In each of these classes, we present new A-stable schemes from order six (the highest order of previously known A-stable DIRK-type schemes) up to order eight. For each order, we include one scheme that is only A-stable as well as schemes that are L-stable, stiffly accurate, and/or have stage order two. The latter types require more stages but give better convergence rates for differential-algebraic equations (DAEs), and those with stage order two give better accuracy for moderately stiff problems. The development of the eighth-order schemes requires, in addition to imposing A-stability, finding highly accurate numerical solutions for a system of 200 equations in over 100 variables, which is accomplished via a combination of global and local optimization strategies. The accuracy, stability, and adaptive step-size control of the schemes are demonstrated on diverse problems.
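The structural advantage of DIRK methods is that the lower-triangular coefficient matrix reduces each time step to a sequence of single-stage nonlinear solves instead of one large coupled system. A minimal sketch using the classical two-stage, second-order, L-stable SDIRK scheme (chosen only for brevity; the schemes in the paper are of orders six through eight and carry embedded error estimators):

```python
import numpy as np
from scipy.optimize import fsolve

GAMMA = 1.0 - np.sqrt(2.0) / 2.0                 # classical SDIRK(2,2) tableau
A = np.array([[GAMMA, 0.0], [1.0 - GAMMA, GAMMA]])
b = np.array([1.0 - GAMMA, GAMMA])
c = A.sum(axis=1)

def dirk_step(f, t, y, h):
    """One DIRK step for y' = f(t, y): stage i solves a nonlinear system
    only in its own stage derivative k_i, because A is lower triangular."""
    k = np.zeros((len(b), len(y)))
    for i in range(len(b)):
        known = y + (h * A[i, :i] @ k[:i] if i else 0.0)
        g = lambda ki: ki - f(t + c[i] * h, known + h * A[i, i] * ki)
        k[i] = fsolve(g, k[i - 1] if i else f(t, y))
    return y + h * (b @ k)
```

An embedded estimator of the kind attached to the new schemes would add a second weight vector $\hat b$ and use $h(b - \hat b)\cdot k$ to drive adaptive step-size control.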
The Function-as-a-Service (FaaS) computing model has recently seen significant growth, especially for highly scalable, event-driven applications. The easy deployment and cost-efficient, fine-grained billing of FaaS are highly attractive for big data applications. However, the stateless nature of serverless platforms poses major challenges for stateful, I/O-intensive workloads, such as the lack of native support for stateful execution, state sharing, and inter-function communication. In this paper, we explore the feasibility of performing stateful big data analytics on serverless platforms and of improving the I/O throughput of functions by using modern storage technologies such as Intel Optane DC Persistent Memory (PMEM). To this end, we propose Marvel, an end-to-end architecture built on top of the popular serverless platform Apache OpenWhisk and Apache Hadoop. Marvel makes two main contributions: (1) it enables stateful function execution on OpenWhisk by maintaining state information in an in-memory caching layer; and (2) it provides access to PMEM-backed HDFS storage for faster I/O performance. Our evaluation shows that Marvel reduces the overall execution time of big data applications by up to 86.6% compared to current MapReduce implementations on AWS Lambda.
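The state-externalization pattern behind contribution (1) can be sketched in a few lines. The code below is illustrative only: Redis stands in for Marvel's in-memory caching layer, a word-count fragment stands in for the analytics job, and the `main(args)` convention mimics an OpenWhisk-style Python action; the real system additionally wires I/O through PMEM-backed HDFS.

```python
import json
import redis  # stand-in for Marvel's in-memory caching layer

cache = redis.Redis(host="localhost", port=6379)

def main(args):
    """Stateful 'serverless' action: the container is stateless, so all
    state that must survive across invocations lives in the cache."""
    job_id = args["job_id"]
    raw = cache.get(f"wordcount:{job_id}")
    counts = json.loads(raw) if raw else {}
    for word in args["chunk"].split():            # this invocation's input slice
        counts[word] = counts.get(word, 0) + 1
    cache.set(f"wordcount:{job_id}", json.dumps(counts))
    return {"job_id": job_id, "distinct_words": len(counts)}
```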
Investigating the similarity between artists and music is crucial in music retrieval and recommendation, and addressing the challenge of the long-tail phenomenon is increasingly important. This paper proposes a Long-Tail Friendly Representation Framework (LTFRF) that utilizes neural networks to model the similarity relationship. Our approach integrates music, user, metadata, and relationship data into a unified metric-learning framework, and employs a meta-consistency relationship as a regularization term to introduce the Multi-Relationship Loss. Compared to the Graph Neural Network (GNN), our proposed framework improves representation performance in long-tail scenarios, which are characterized by sparse relationships between artists and music. We conduct experiments and analysis on the AllMusic dataset, and the results demonstrate that our framework yields favorable generalization of artist and music representations. Specifically, on similar artist/music recommendation tasks, LTFRF outperforms the baseline by 9.69%/19.42% in Hit Ratio@10, and in long-tail cases, the framework scores 11.05%/14.14% higher than the baseline on Consistent@10.
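The abstract does not spell out the Multi-Relationship Loss, so the PyTorch sketch below is an assumed form, shown only to illustrate how several relation types (artist-artist, artist-music, music-music) can share one metric space: a margin term per relation plus a consistency regularizer pulling learned similarities toward relation-derived targets. The function name, batch layout, and exact loss shape are all hypothetical.

```python
import torch
import torch.nn.functional as F

def multi_relationship_loss(emb, batches, margin=0.2, lam=0.1):
    """Assumed form: `batches` maps a relation type to (anchor, positive,
    negative) index tensors plus a target similarity for consistency."""
    total = torch.zeros(())
    for rel, (a, p, n, target) in batches.items():
        pos = F.cosine_similarity(emb[a], emb[p])
        neg = F.cosine_similarity(emb[a], emb[n])
        total = total + F.relu(margin - pos + neg).mean()    # metric term
        total = total + lam * (pos - target).pow(2).mean()   # consistency term
    return total
```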
Variational inference, such as the mean-field (MF) approximation, requires certain conjugacy structures for efficient computation. These can impose unnecessary restrictions on the viable prior distribution family and further constraints on the variational approximation family. In this work, we introduce a general computational framework to implement MF variational inference for Bayesian models, with or without latent variables, using the Wasserstein gradient flow (WGF), a modern mathematical technique for realizing a gradient flow over the space of probability measures. Theoretically, we analyze the algorithmic convergence of the proposed approaches, providing an explicit expression for the contraction factor. We also strengthen existing results on MF variational posterior concentration from a polynomial to an exponential contraction, by utilizing the fixed point equation of the time-discretized WGF. Computationally, we propose a new constraint-free function approximation method using neural networks to numerically realize our algorithm. This method is shown to be more precise and efficient than traditional particle approximation methods based on Langevin dynamics.
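For contrast with the paper's neural, constraint-free approximation, the traditional particle realization of the Wasserstein gradient flow of the KL objective is just unadjusted Langevin dynamics; a minimal sketch:

```python
import numpy as np

def langevin_particles(grad_log_p, x0, step=1e-2, n_steps=1000, seed=0):
    """Particle discretization of the WGF of KL(q || p): each particle
    follows noisy gradient ascent on log p; the empirical measure of the
    particles approximates the target."""
    rng = np.random.default_rng(seed)
    x = x0.astype(float).copy()
    for _ in range(n_steps):
        x += step * grad_log_p(x) + np.sqrt(2.0 * step) * rng.standard_normal(x.shape)
    return x
```

With `grad_log_p = lambda x: -x` the cloud settles around a standard normal; the paper's claim is that a neural parameterization of the same flow is more precise and efficient than such particle clouds.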
Recent contrastive representation learning methods rely on estimating mutual information (MI) between multiple views of an underlying context. For example, we can derive multiple views of a given image by applying data augmentation, or we can split a sequence into views comprising the past and future of some step in the sequence. Contrastive lower bounds on MI are easy to optimize but have a strong underestimation bias when estimating large amounts of MI. We propose decomposing the full MI estimation problem into a sum of smaller estimation problems by splitting one of the views into progressively more informed subviews and applying the chain rule for MI between the decomposed views. This expression contains a sum of unconditional and conditional MI terms, each measuring a modest chunk of the total MI, which facilitates approximation via contrastive bounds. To maximize the sum, we formulate a contrastive lower bound on the conditional MI which can be approximated efficiently. We refer to our general approach as Decomposed Estimation of Mutual Information (DEMI). We show that DEMI can capture a larger amount of MI than standard non-decomposed contrastive bounds in a synthetic setting, and that it learns better representations in a vision domain and for dialogue generation.
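The saturation problem is visible directly in the standard InfoNCE bound, which cannot exceed the log of the batch size; the chain rule $I(x; y_1, y_2) = I(x; y_1) + I(x; y_2 \mid y_1)$ lets each summand stay below that ceiling. A sketch of the unconditional bound (the conditional variant introduced in the paper is not reproduced here):

```python
import math
import torch

def infonce(scores):
    """InfoNCE lower bound on MI from a (K x K) critic score matrix whose
    diagonal holds positive-pair scores: bounded above by log K, which is
    exactly why DEMI splits large MI across several such bounds."""
    K = scores.shape[0]
    return (scores.diag() - torch.logsumexp(scores, dim=1)).mean() + math.log(K)
```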
Due to their inherent capability to semantically align aspects with their context words, attention mechanisms and Convolutional Neural Networks (CNNs) are widely applied to aspect-based sentiment classification. However, these models lack a mechanism to account for relevant syntactic constraints and long-range word dependencies, and hence may mistakenly treat syntactically irrelevant context words as clues for judging aspect sentiment. To tackle this problem, we propose building a Graph Convolutional Network (GCN) over the dependency tree of a sentence to exploit syntactic information and word dependencies. On this basis, a novel aspect-specific sentiment classification framework is proposed. Experiments on three benchmark collections illustrate that our proposed model is comparable in effectiveness to a range of state-of-the-art models, and further demonstrate that both syntactic information and long-range word dependencies are properly captured by the graph convolution structure.
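The core operation is an ordinary graph convolution whose adjacency comes from the dependency parse rather than being learned. A simplified PyTorch layer (the full framework stacks several such layers over BiLSTM states and masks non-aspect words before an attention readout; names and shapes here are illustrative):

```python
import torch
import torch.nn as nn

class DepGCNLayer(nn.Module):
    """One GCN layer over a sentence's dependency graph: A holds the
    (batch, n, n) adjacency from the dependency tree plus self-loops,
    H the (batch, n, dim) word representations."""
    def __init__(self, dim):
        super().__init__()
        self.W = nn.Linear(dim, dim)

    def forward(self, A, H):
        deg = A.sum(dim=-1, keepdim=True).clamp(min=1)  # degree normalization
        return torch.relu(A @ self.W(H) / deg)
```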
The recent proliferation of knowledge graphs (KGs), coupled with incomplete or partial information in the form of missing relations (links) between entities, has fueled much research on knowledge base completion (also known as relation prediction). Several recent works suggest that convolutional neural network (CNN)-based models generate richer and more expressive feature embeddings and hence also perform well on relation prediction. However, we observe that these KG embeddings treat triples independently and thus fail to capture the complex and hidden information that is inherently implicit in the local neighborhood surrounding a triple. To this end, our paper proposes a novel attention-based feature embedding that captures both entity and relation features in any given entity's neighborhood. Additionally, we also encapsulate relation clusters and multi-hop relations in our model. Our empirical study offers insights into the efficacy of our attention-based model, and we show marked performance gains in comparison to state-of-the-art methods on all datasets.
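The key departure from per-triple models is that an entity's update attends over its whole neighborhood of triples, with the relation embedding inside each message. A stripped-down sketch of one attention step (illustrative names and shapes; the model in the paper is multi-head and also folds in multi-hop neighborhoods):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TripleAttention(nn.Module):
    """Attention over one entity's neighborhood triples (h, r, t): each
    message encodes the full triple, and learned attention weights decide
    how much of the neighborhood flows into the updated embedding."""
    def __init__(self, dim):
        super().__init__()
        self.msg = nn.Linear(3 * dim, dim)   # embed the whole triple
        self.att = nn.Linear(dim, 1)

    def forward(self, h, r, t):
        # h, r, t: (n_triples, dim) embeddings of the neighborhood
        m = torch.tanh(self.msg(torch.cat([h, r, t], dim=-1)))
        alpha = F.softmax(F.leaky_relu(self.att(m)).squeeze(-1), dim=0)
        return (alpha.unsqueeze(-1) * m).sum(dim=0)      # updated entity
```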