We resurrect the infamous harmonic mean estimator for computing the marginal likelihood (Bayesian evidence) and solve its problematic large variance. The marginal likelihood is a key component of Bayesian model selection to evaluate model posterior probabilities; however, its computation is challenging. The original harmonic mean estimator, first proposed by Newton and Raftery in 1994, involves computing the harmonic mean of the likelihood given samples from the posterior. It was immediately realised that the original estimator can fail catastrophically since its variance can become very large (possibly not finite). A number of variants of the harmonic mean estimator have been proposed to address this issue although none have proven fully satisfactory. We present the \emph{learnt harmonic mean estimator}, a variant of the original estimator that solves its large variance problem. This is achieved by interpreting the harmonic mean estimator as importance sampling and introducing a new target distribution. The new target distribution is learned to approximate the optimal but inaccessible target, while minimising the variance of the resulting estimator. Since the estimator requires samples of the posterior only, it is agnostic to the sampling strategy used. We validate the estimator on a variety of numerical experiments, including a number of pathological examples where the original harmonic mean estimator fails catastrophically. We also consider a cosmological application, where our approach leads to $\sim$ 3 to 6 times more samples than current state-of-the-art techniques in 1/3 of the time. In all cases our learnt harmonic mean estimator is shown to be highly accurate. The estimator is computationally scalable and can be applied to problems of dimension $O(10^3)$ and beyond. Code implementing the learnt harmonic mean estimator is made publicly available
The rise of AI in human contexts places new demands on automated systems to be transparent and explainable. We examine some anthropomorphic ideas and principles relevant to such accountablity in order to develop a theoretical framework for thinking about digital systems in complex human contexts and the problem of explaining their behaviour. Structurally, systems are made of modular and hierachical components, which we abstract in a new system model using notions of modes and mode transitions. A mode is an independent component of the system with its own objectives, monitoring data, and algorithms. The behaviour of a mode, including its transitions to other modes, is determined by functions that interpret each mode's monitoring data in the light of its objectives and algorithms. We show how these belief functions can help explain system behaviour by visualising their evaluation as trajectories in higher-dimensional geometric spaces. These ideas are formalised mathematically by abstract and concrete simplicial complexes. We offer three techniques: a framework for design heuristics, a general system theory based on modes, and a geometric visualisation, and apply them in three types of human-centred systems.
Despite recent availability of large transcribed Kinyarwanda speech data, achieving robust speech recognition for Kinyarwanda is still challenging. In this work, we show that using self-supervised pre-training, following a simple curriculum schedule during fine-tuning and using semi-supervised learning to leverage large unlabelled speech data significantly improve speech recognition performance for Kinyarwanda. Our approach focuses on using public domain data only. A new studio-quality speech dataset is collected from a public website, then used to train a clean baseline model. The clean baseline model is then used to rank examples from a more diverse and noisy public dataset, defining a simple curriculum training schedule. Finally, we apply semi-supervised learning to label and learn from large unlabelled data in four successive generations. Our final model achieves 3.2% word error rate (WER) on the new dataset and 15.9% WER on Mozilla Common Voice benchmark, which is state-of-the-art to the best of our knowledge. Our experiments also indicate that using syllabic rather than character-based tokenization results in better speech recognition performance for Kinyarwanda.
Best possible self (BPS) is a positive psychological intervention shown to enhance well-being which involves writing a description of an ideal future scenario. This paper presents a comparison of psychophysiological effects of a BPS activity that has been adapted for classroom settings and a time-matched control activity (NA). Thirty-three undergraduate students participated in the study that assessed state anxiety (State-Trait Anxiety Inventory, STAI), affect (Affective Slider, AS), and cardiac vagal activity (heart-rate variability, HRV) as an indicator of self-regulatory resource usage, at three time periods (PRE, DURING, POST). Results show that BPS led to a significantly greater increase in positive valence (DURING) and overall higher levels of cardiac vagal activity (HRV) compared to NA. These findings suggest that BPS has promising characteristics as a self-regulatory technique aimed at fostering positive affect and positively impacting self-regulatory resources. As BPS does not require expert knowledge nor specialized technology to administer, it may be a suitable activity for educators to use when teaching and having students practice self-regulation. This study presents evidence collected in a replicable multimodal approach of the self-regulatory effects of a brief BPS activity on undergraduate students.
We study the Euler scheme for scalar non-autonomous stochastic differential equations, whose diffusion coefficient is not globally Lipschitz but a fractional power of a globally Lipschitz function. We analyse the strong error and establish a criterion, which relates the convergence order of the Euler scheme to an inverse moment condition for the diffusion coefficient. Our result in particular applies to Cox-Ingersoll-Ross-, Chan-Karolyi-Longstaff-Sanders- or Wright-Fisher-type stochastic differential equations and thus provides a unifying framework.
It is well-known that a multilinear system with a nonsingular M-tensor and a positive right-hand side has a unique positive solution. Tensor splitting methods generalizing the classical iterative methods for linear systems have been proposed for finding the unique positive solution. The Alternating Anderson-Richardson (AAR) method is an effective method to accelerate the classical iterative methods. In this study, we apply the idea of AAR for finding the unique positive solution quickly. We first present a tensor Richardson method based on tensor regular splittings, then apply Anderson acceleration to the tensor Richardson method and derive a tensor Anderson-Richardson method, finally, we periodically employ the tensor Anderson-Richardson method within the tensor Richardson method and propose a tensor AAR method. Numerical experiments show that the proposed method is effective in accelerating tensor splitting methods.
This paper provides an extensive examination of a sizable dataset of English tweets focusing on nine widely recognized cryptocurrencies, specifically Cardano, Binance, Bitcoin, Dogecoin, Ethereum, Fantom, Matic, Shiba, and Ripple. Our primary objective was to conduct a psycholinguistic and emotion analysis of social media content associated with these cryptocurrencies. To enable investigators to make more informed decisions. The study involved comparing linguistic characteristics across the diverse digital coins, shedding light on the distinctive linguistic patterns that emerge within each coin's community. To achieve this, we utilized advanced text analysis techniques. Additionally, our work unveiled an intriguing Understanding of the interplay between these digital assets within the cryptocurrency community. By examining which coin pairs are mentioned together most frequently in the dataset, we established correlations between different cryptocurrencies. To ensure the reliability of our findings, we initially gathered a total of 832,559 tweets from Twitter. These tweets underwent a rigorous preprocessing stage, resulting in a refined dataset of 115,899 tweets that were used for our analysis. Overall, our research offers valuable Perception into the linguistic nuances of various digital coins' online communities and provides a deeper understanding of their interactions in the cryptocurrency space.
The numerical solution of continuum damage mechanics (CDM) problems suffers from convergence-related challenges during the material softening stage, and consequently existing iterative solvers are subject to a trade-off between computational expense and solution accuracy. In this work, we present a novel unified arc-length (UAL) method, and we derive the formulation of the analytical tangent matrix and governing system of equations for both local and non-local gradient damage problems. Unlike existing versions of arc-length solvers that monolithically scale the external force vector, the proposed method treats the latter as an independent variable and determines the position of the system on the equilibrium path based on all the nodal variations of the external force vector. This approach renders the proposed solver substantially more efficient and robust than existing solvers used in CDM problems. We demonstrate the considerable advantages of the proposed algorithm through several benchmark 1D problems with sharp snap-backs and 2D examples under various boundary conditions and loading scenarios. The proposed UAL approach exhibits a superior ability of overcoming critical increments along the equilibrium path. Moreover, in the presented examples, the proposed UAL method is 1-2 orders of magnitude faster than force-controlled arc-length and monolithic Newton-Raphson solvers.
We consider scalar semilinear elliptic PDEs, where the nonlinearity is strongly monotone, but only locally Lipschitz continuous. To linearize the arising discrete nonlinear problem, we employ a damped Zarantonello iteration, which leads to a linear Poisson-type equation that is symmetric and positive definite. The resulting system is solved by a contractive algebraic solver such as a multigrid method with local smoothing. We formulate a fully adaptive algorithm that equibalances the various error components coming from mesh refinement, iterative linearization, and algebraic solver. We prove that the proposed adaptive iteratively linearized finite element method (AILFEM) guarantees convergence with optimal complexity, where the rates are understood with respect to the overall computational cost (i.e., the computational time). Numerical experiments investigate the involved adaptivity parameters.
The variational quantum algorithms are crucial for the application of NISQ computers. Such algorithms require short quantum circuits, which are more amenable to implementation on near-term hardware, and many such methods have been developed. One of particular interest is the so-called variational quantum state diagonalization method, which constitutes an important algorithmic subroutine and can be used directly to work with data encoded in quantum states. In particular, it can be applied to discern the features of quantum states, such as entanglement properties of a system, or in quantum machine learning algorithms. In this work, we tackle the problem of designing a very shallow quantum circuit, required in the quantum state diagonalization task, by utilizing reinforcement learning (RL). We use a novel encoding method for the RL-state, a dense reward function, and an $\epsilon$-greedy policy to achieve this. We demonstrate that the circuits proposed by the reinforcement learning methods are shallower than the standard variational quantum state diagonalization algorithm and thus can be used in situations where hardware capabilities limit the depth of quantum circuits. The methods we propose in the paper can be readily adapted to address a wide range of variational quantum algorithms.
Artificial neural networks thrive in solving the classification problem for a particular rigid task, acquiring knowledge through generalized learning behaviour from a distinct training phase. The resulting network resembles a static entity of knowledge, with endeavours to extend this knowledge without targeting the original task resulting in a catastrophic forgetting. Continual learning shifts this paradigm towards networks that can continually accumulate knowledge over different tasks without the need to retrain from scratch. We focus on task incremental classification, where tasks arrive sequentially and are delineated by clear boundaries. Our main contributions concern 1) a taxonomy and extensive overview of the state-of-the-art, 2) a novel framework to continually determine the stability-plasticity trade-off of the continual learner, 3) a comprehensive experimental comparison of 11 state-of-the-art continual learning methods and 4 baselines. We empirically scrutinize method strengths and weaknesses on three benchmarks, considering Tiny Imagenet and large-scale unbalanced iNaturalist and a sequence of recognition datasets. We study the influence of model capacity, weight decay and dropout regularization, and the order in which the tasks are presented, and qualitatively compare methods in terms of required memory, computation time, and storage.