Many engineering problems involve solving large linear systems of equations. Conjugate gradient (CG) is one of the most popular iterative methods for solving such systems. However, CG typically requires a good preconditioner to speed up convergence. One such preconditioner is the sparse approximate inverse (SPAI). In this paper, we explore the computation of an SPAI on quantum annealing machines by solving a series of quadratic unconstrained binary optimization (QUBO) problems. Numerical experiments are conducted using both well-conditioned and poorly-conditioned linear systems arising from a 2D finite difference formulation of the Poisson problem.
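The SPAI-to-QUBO mapping can be illustrated with a minimal sketch (not the paper's implementation): each entry of a column m_j of the approximate inverse is encoded with a signed fixed-point binary expansion, so that minimizing ||A m_j - e_j||^2 becomes a quadratic form in binary variables. The encoding, the bit precision, and the brute-force stand-in for the annealer below are illustrative assumptions.

```python
import itertools
import numpy as np

def qubo_for_spai_column(A, j, n_bits=3, scale=1.0):
    """Build a QUBO whose minimizer encodes column j of a sparse approximate
    inverse M, i.e. it minimizes ||A m_j - e_j||^2.

    Each entry of m_j is written as a signed fixed-point expansion
    m_i = scale * (-q_{i,0} + sum_k 2^{-k} q_{i,k}) with binary q's
    (an illustrative encoding, not necessarily the one used in the paper)."""
    n = A.shape[0]
    e_j = np.zeros(n); e_j[j] = 1.0
    # Encoding matrix B maps the binary vector q to the real vector m_j.
    weights = scale * np.array([-1.0] + [2.0 ** -k for k in range(1, n_bits)])
    B = np.kron(np.eye(n), weights)          # shape (n, n * n_bits)
    G = A @ B                                # residual is G q - e_j
    Q = G.T @ G - 2.0 * np.diag(G.T @ e_j)   # uses q_i^2 = q_i for binaries
    return Q, B

def brute_force_anneal(Q):
    """Stand-in for a quantum annealer: exhaustively minimize q^T Q q."""
    best_q, best_val = None, np.inf
    for bits in itertools.product([0, 1], repeat=Q.shape[0]):
        q = np.array(bits)
        val = q @ Q @ q
        if val < best_val:
            best_q, best_val = q, val
    return best_q

# Tiny 1D Poisson (tridiagonal) example in place of the 2D problem.
n = 4
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
M = np.zeros((n, n))
for j in range(n):
    Q, B = qubo_for_spai_column(A, j)
    M[:, j] = B @ brute_force_anneal(Q)
print("||AM - I||_F =", np.linalg.norm(A @ M - np.eye(n)))
```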
Anomaly detection techniques are essential in automating the monitoring of IT systems and operations. These techniques typically assume that machine learning models are trained on operational data from a specific period of time and are then continuously evaluated on newly emerging data. Operational data is constantly changing over time, which affects the performance of deployed anomaly detection models. Therefore, continuous model maintenance is required to preserve the performance of anomaly detectors over time. In this work, we analyze two different anomaly detection model maintenance techniques with respect to the model update frequency, namely blind model retraining and informed model retraining. We further investigate the effects of updating the model by retraining it on all available data (full-history approach) versus only the newest data (sliding window approach). Moreover, we investigate whether a data change monitoring tool can determine when the anomaly detection model needs to be updated through retraining.
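The full-history and sliding-window retraining regimes can be sketched as follows; the detector, the synthetic drifting stream, and the window length are illustrative placeholders rather than the setup used in this work.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

def retrain(history, strategy="full", window=3):
    """Refit the anomaly detector on either all past batches
    (full-history approach) or only the most recent ones (sliding window)."""
    batches = history if strategy == "full" else history[-window:]
    X = np.vstack(batches)
    return IsolationForest(random_state=0).fit(X)

rng = np.random.default_rng(0)
# Synthetic operational data whose distribution slowly drifts over time.
stream = [rng.normal(loc=0.1 * t, scale=1.0, size=(200, 5)) for t in range(10)]

history, model = [], None
for t, batch in enumerate(stream):
    if model is not None:
        # "Blind" retraining refits on a fixed schedule (every batch here).
        # An "informed" scheme would instead retrain only when a data change
        # monitoring tool flags drift in the incoming batch.
        scores = model.score_samples(batch)
        print(f"batch {t}: mean anomaly score {scores.mean():.3f}")
    history.append(batch)
    model = retrain(history, strategy="sliding", window=3)
```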
Regularization is one of the most fundamental topics in optimization, statistics and machine learning. To obtain sparsity in estimating a parameter $u\in\mathbb{R}^d$, an $\ell_q$ penalty term, $\Vert u\Vert_q$, is usually added to the objective function. What is the probabilistic distribution corresponding to such an $\ell_q$ penalty? What is the correct stochastic process corresponding to $\Vert u\Vert_q$ when we model functions $u\in L^q$? This is important for statistically modeling high-dimensional objects, e.g. images, with a penalty that preserves certain properties, e.g. edges in the image. In this work, we generalize the $q$-exponential distribution, with density proportional to $\exp{(- \frac{1}{2}|u|^q)}$, to a stochastic process named the $Q$-exponential (Q-EP) process that corresponds to the $L_q$ regularization of functions. The key step is to specify consistent multivariate $q$-exponential distributions by choosing from a large family of elliptic contour distributions. The work is closely related to the Besov process, which is usually defined through series expansions. Q-EP can be regarded as a definition of the Besov process with an explicit probabilistic formulation and direct control on the correlation length. From the Bayesian perspective, Q-EP provides a flexible prior on functions with a sharper penalty ($q<2$) than the commonly used Gaussian process (GP). We compare GP, Besov and Q-EP in modeling functional data, reconstructing images, and solving inverse problems, and demonstrate the advantages of our proposed methodology.
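As a hedged illustration of the starting point (the paper's exact multivariate construction is its contribution and is not reproduced here), the scalar density quoted above and the generic elliptic-contour family it is chosen from can be written as:

```latex
% Scalar q-exponential density quoted in the abstract (normalizing constant omitted);
% q = 2 recovers the Gaussian case:
p(u) \propto \exp\!\Big(-\tfrac{1}{2}\,|u|^{q}\Big), \qquad u \in \mathbb{R}.
% A multivariate version is chosen from the elliptic contour family
p(u) \propto |C|^{-1/2}\, g\!\big(u^{\top} C^{-1} u\big), \qquad u \in \mathbb{R}^{d},
% where C controls the correlation structure and the radial function g must be
% chosen so that the family is consistent under marginalization (the key step
% referred to in the abstract).
```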
We give near-optimal algorithms for computing an ellipsoidal rounding of a convex polytope whose vertices are given in a stream. The approximation factor is linear in the dimension (as in John's theorem) and only loses an excess logarithmic factor in the aspect ratio of the polytope. Our algorithms are nearly optimal in two senses: first, their runtimes nearly match those of the most efficient known algorithms for the offline version of the problem; second, their approximation factors nearly match a lower bound we show against a natural class of geometric streaming algorithms. In contrast to existing works in the streaming setting that compute ellipsoidal roundings only for centrally symmetric convex polytopes, our algorithms apply to general convex polytopes. We also show how to use our algorithms to construct coresets from a stream of points that approximately preserve both the ellipsoidal rounding and the convex hull of the original set of points.
Through the Bayesian lens of data assimilation, uncertainty in model parameters is traditionally quantified through the posterior covariance matrix. However, in modern settings involving high-dimensional and computationally expensive forward models, posterior covariance knowledge must be relaxed to deterministic or stochastic approximations. In the carbon flux inversion literature, Chevallier et al. proposed a stochastic method capable of approximating posterior variances of linear functionals of the model parameters that is particularly well-suited for large-scale Earth-system data assimilation tasks. This note formalizes the algorithm and clarifies its properties. We provide a formal statement of the algorithm, demonstrate why it converges to the desired posterior variance quantity of interest, and provide additional uncertainty quantification allowing incorporation of the Monte Carlo sampling uncertainty into the method's Bayesian credible intervals. The methodology is demonstrated using toy simulations and a realistic carbon flux inversion observing system simulation experiment.
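For a linear-Gaussian toy model, the flavor of such a Monte Carlo variance approximation can be sketched as follows; this illustrates the general idea rather than the authors' exact algorithm, and the forward model, covariances, and linear functional below are arbitrary placeholders. Prior and observation perturbations drawn from their error distributions are pushed through the inversion, and the spread of the resulting estimates of the functional approximates its posterior variance.

```python
import numpy as np

rng = np.random.default_rng(1)
d, m, N = 20, 15, 2000            # state dim, obs dim, Monte Carlo replicates

H = rng.normal(size=(m, d))       # linear forward model
B = np.eye(d)                     # prior covariance
R = 0.5 * np.eye(m)               # observation-error covariance
h = rng.normal(size=d)            # linear functional of interest h^T x

# Gain and exact posterior covariance, for reference.
K = B @ H.T @ np.linalg.inv(H @ B @ H.T + R)
A_post = (np.eye(d) - K @ H) @ B
exact = h @ A_post @ h

# Monte Carlo: perturb prior state and observations consistently with B and R,
# run the (linear) inversion, and take the sample variance of h^T x_a.
x_true = rng.normal(size=d)
samples = np.empty(N)
for i in range(N):
    x_b = x_true + rng.multivariate_normal(np.zeros(d), B)
    y = H @ x_true + rng.multivariate_normal(np.zeros(m), R)
    x_a = x_b + K @ (y - H @ x_b)
    samples[i] = h @ x_a
mc = samples.var(ddof=1)

# Rough normal-approximation interval for the Monte Carlo sampling
# uncertainty of the estimated variance.
se = mc * np.sqrt(2.0 / (N - 1))
print(f"exact {exact:.3f}   monte-carlo {mc:.3f} +/- {1.96 * se:.3f}")
```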
Rational Identity Testing (RIT) is the decision problem of determining whether or not a noncommutative rational formula computes zero in the free skew field. It admits a deterministic polynomial-time white-box algorithm [Garg, Gurvits, Oliveira, and Wigderson (2016); Ivanyos, Qiao, Subrahmanyam (2018); Hamada and Hirai (2021)], and a randomized polynomial-time algorithm [Derksen and Makam (2017)] in the black-box setting, via singularity testing of linear matrices over the free skew field. Indeed, a randomized NC algorithm for RIT in the white-box setting follows from the result of Derksen and Makam (2017). Designing an efficient deterministic black-box algorithm for RIT and understanding the parallel complexity of RIT are major open problems in this area. Despite being open since the work of Garg, Gurvits, Oliveira, and Wigderson (2016), these questions have seen limited progress. In fact, the only known result in this direction is the construction of a quasipolynomial-size hitting set for rational formulas of only inversion height two [Arvind, Chatterjee, Mukhopadhyay (2022)]. In this paper, we significantly improve the black-box complexity of this problem and obtain the first quasipolynomial-size hitting set for all rational formulas of polynomial size. Our construction also yields the first deterministic quasi-NC upper bound for RIT in the white-box setting.
Brain modulation is the process of modifying brain activity through external stimulation. However, it remains unclear which stimulation conditions can induce such activation. Therefore, we aimed to identify brain activation conditions using a 40 Hz monaural beat (MB). Under this stimulation, the auditory perceptibility of the stimulus, which is determined by its frequency and power range, is the condition to consider. Hence, we designed five sessions to compare: no stimulation, audible (AB), inaudible in frequency, inaudible in power, and inaudible in both frequency and power. Ten healthy participants underwent each stimulation session for ten minutes with electroencephalogram (EEG) recording. For analysis, we calculated the power spectral density (PSD) of the EEG for each session and compared the sessions across frequency, time, and five brain regions. As a result, we observed a prominent power peak at 40 Hz only in the AB session. The induced increase in EEG amplitude started at one minute and grew until the end of the session. The AB session differed significantly from the other stimulation sessions in the frontal, central, temporal, parietal, and occipital regions. The statistical analysis also showed that the PSD of the right temporal region was significantly higher than that of the left. We conclude that audibility plays an important role in inducing brain activation. These findings help to understand the neurophysiological principles and effects of auditory stimulation.
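In outline, the PSD comparison can be reproduced with standard tooling; the sampling rate, synthetic signals, and the narrow band around 40 Hz below are illustrative placeholders rather than the study's recording parameters.

```python
import numpy as np
from scipy.signal import welch

fs = 500                                    # assumed sampling rate in Hz
t = np.arange(0, 60, 1 / fs)                # one minute of synthetic "EEG"
rng = np.random.default_rng(0)

# Toy stand-ins for two sessions: background noise alone, and background
# noise plus an entrained 40 Hz component (as in the audible-beat session).
background = rng.normal(scale=1.0, size=t.size)
audible = background + 0.5 * np.sin(2 * np.pi * 40 * t)

def band_power(x, lo=39.0, hi=41.0):
    """Welch PSD, then average power inside a narrow band around 40 Hz."""
    f, pxx = welch(x, fs=fs, nperseg=4 * fs)
    band = (f >= lo) & (f <= hi)
    return pxx[band].mean()

print("no-stimulation 40 Hz power:", band_power(background))
print("audible-beat   40 Hz power:", band_power(audible))
```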
As artificial intelligence (AI) models continue to scale up, they are becoming more capable and integrated into various forms of decision-making systems. For models involved in moral decision-making, also known as artificial moral agents (AMA), interpretability provides a way to trust and understand the agent's internal reasoning mechanisms for effective use and error correction. In this paper, we provide an overview of this rapidly-evolving sub-field of AI interpretability, introduce the concept of the Minimum Level of Interpretability (MLI) and recommend an MLI for various types of agents, to aid their safe deployment in real-world settings.
It is important to detect anomalous inputs when deploying machine learning systems. The use of larger and more complex inputs in deep learning magnifies the difficulty of distinguishing between anomalous and in-distribution examples. At the same time, diverse image and text data are available in enormous quantities. We propose leveraging these data to improve deep anomaly detection by training anomaly detectors against an auxiliary dataset of outliers, an approach we call Outlier Exposure (OE). This enables anomaly detectors to generalize and detect unseen anomalies. In extensive experiments on natural language processing and small- and large-scale vision tasks, we find that Outlier Exposure significantly improves detection performance. We also observe that cutting-edge generative models trained on CIFAR-10 may assign higher likelihoods to SVHN images than to CIFAR-10 images; we use OE to mitigate this issue. We also analyze the flexibility and robustness of Outlier Exposure, and identify characteristics of the auxiliary dataset that improve performance.
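For multi-class classification, the Outlier Exposure objective adds to the usual cross-entropy a term that pushes the model's predictive distribution on auxiliary outliers toward uniform. A minimal PyTorch sketch of such a loss is below; the weight lam, batch shapes, and random tensors standing in for model outputs are illustrative.

```python
import torch
import torch.nn.functional as F

def outlier_exposure_loss(logits_in, targets_in, logits_out, lam=0.5):
    """Cross-entropy on in-distribution samples plus a term that pulls the
    softmax outputs on auxiliary outliers toward the uniform distribution
    (cross-entropy to uniform), weighted by lam."""
    ce_in = F.cross_entropy(logits_in, targets_in)
    # Cross-entropy between the uniform distribution and the model's softmax
    # reduces to the negative mean log-probability over classes.
    ce_uniform = -logits_out.log_softmax(dim=1).mean()
    return ce_in + lam * ce_uniform

num_classes = 10
logits_in = torch.randn(32, num_classes, requires_grad=True)
targets_in = torch.randint(0, num_classes, (32,))
logits_out = torch.randn(32, num_classes, requires_grad=True)
loss = outlier_exposure_loss(logits_in, targets_in, logits_out)
loss.backward()
print(float(loss))
```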
Multi-relation Question Answering is a challenging task because it requires elaborate analysis of questions and reasoning over multiple fact triples in a knowledge base. In this paper, we present a novel model called the Interpretable Reasoning Network that employs an interpretable, hop-by-hop reasoning process for question answering. The model dynamically decides which part of an input question should be analyzed at each hop; predicts a relation that corresponds to the current parsed results; utilizes the predicted relation to update the question representation and the state of the reasoning process; and then drives the next-hop reasoning. Experiments show that our model yields state-of-the-art results on two datasets. More interestingly, the model can offer traceable and observable intermediate predictions for reasoning analysis and failure diagnosis, thereby allowing manual manipulation in predicting the final answer.
Dynamic programming (DP) solves a variety of structured combinatorial problems by iteratively breaking them down into smaller subproblems. Despite their versatility, DP algorithms are usually non-differentiable, which hampers their use as a layer in neural networks trained by backpropagation. To address this issue, we propose to smooth the max operator in the dynamic programming recursion using a strongly convex regularizer. This allows us to relax both the optimal value and the solution of the original combinatorial problem, and turns a broad class of DP algorithms into differentiable operators. Theoretically, we provide a new probabilistic perspective on backpropagating through these DP operators and relate them to inference in graphical models. We derive two particular instantiations of our framework: a smoothed Viterbi algorithm for sequence prediction and a smoothed DTW algorithm for time-series alignment. We showcase these instantiations on two structured prediction tasks and on structured and sparse attention for neural machine translation.
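One concrete instance of this smoothing: with a negative-entropy regularizer, the max operator becomes a scaled log-sum-exp, and substituting the corresponding smoothed min into the DTW recursion gives a soft-DTW-style alignment cost. The sketch below (with an arbitrary cost matrix) illustrates the idea and is not the authors' implementation; in practice the recursion would be wrapped in an autodiff framework to obtain gradients.

```python
import numpy as np
from scipy.special import logsumexp

def smoothed_max(x, gamma=1.0):
    """Entropy-smoothed max: gamma * log sum exp(x / gamma) >= max(x),
    converging to max(x) as gamma -> 0."""
    return gamma * logsumexp(np.asarray(x) / gamma)

def soft_dtw(cost, gamma=1.0):
    """DTW recursion with min replaced by the smoothed min -smoothed_max(-x);
    for gamma > 0 the value is differentiable in the cost matrix."""
    n, m = cost.shape
    R = np.full((n + 1, m + 1), np.inf)
    R[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            prev = np.array([R[i - 1, j], R[i, j - 1], R[i - 1, j - 1]])
            R[i, j] = cost[i - 1, j - 1] - smoothed_max(-prev, gamma)
    return R[n, m]

# Pairwise squared distances between two short time series.
a = np.array([0.0, 1.0, 2.0, 1.0])
b = np.array([0.0, 2.0, 1.0])
cost = (a[:, None] - b[None, :]) ** 2
print("soft-DTW (gamma=1):   ", soft_dtw(cost, gamma=1.0))
print("soft-DTW (gamma=1e-3):", soft_dtw(cost, gamma=1e-3))  # ~ classic DTW
```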