We provide the first useful, rigorous analysis of ensemble sampling for the stochastic linear bandit setting. In particular, we show that, under standard assumptions, for a $d$-dimensional stochastic linear bandit with an interaction horizon $T$, ensemble sampling with an ensemble of size $m$ on the order of $d \log T$ incurs regret bounded by order $(d \log T)^{5/2} \sqrt{T}$. Ours is the first result in any structured setting not to require the size of the ensemble to scale linearly with $T$ -- which defeats the purpose of ensemble sampling -- while obtaining near $\sqrt{T}$ order regret. Ours is also the first result that allows infinite action sets.
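As a rough illustration of the kind of algorithm being analysed, the following is a minimal sketch of a generic ensemble sampling loop for a linear bandit with a finite action set. It is not necessarily the exact variant studied here: the Gaussian perturbations, the ridge parameter `lam`, the finite action set, and all function and parameter names are assumptions made for this example.

```python
import numpy as np

def ensemble_sampling(actions, theta_star, T, m, noise_sd=0.1, prior_sd=1.0, seed=0):
    """Generic ensemble sampling sketch (illustrative assumptions throughout).

    actions    : (K, d) array of candidate action vectors
    theta_star : (d,) true parameter, used only to simulate rewards and regret
    m          : ensemble size
    """
    rng = np.random.default_rng(seed)
    K, d = actions.shape
    lam = (noise_sd / prior_sd) ** 2          # ridge regulariser
    V = lam * np.eye(d)                       # regularised design matrix, shared by all members
    # Each member starts from its own prior draw theta_j0, stored as lam * theta_j0.
    b = lam * rng.normal(0.0, prior_sd, size=(m, d))
    best = np.max(actions @ theta_star)
    regret = 0.0
    for _ in range(T):
        j = rng.integers(m)                   # pick one ensemble member uniformly at random
        theta_j = np.linalg.solve(V, b[j])    # its perturbed ridge estimate
        a = actions[np.argmax(actions @ theta_j)]   # act greedily for that member
        reward = a @ theta_star + noise_sd * rng.normal()
        regret += best - a @ theta_star
        V += np.outer(a, a)
        # Every member sees the reward plus its own independent perturbation.
        perturbed = reward + noise_sd * rng.normal(size=m)
        b += perturbed[:, None] * a[None, :]
    thetas = np.linalg.solve(V, b.T).T        # final (m, d) ensemble of estimates
    return regret, thetas
```

In this sketch the ensemble size would be chosen on the order of $d \log T$, for example `m = int(np.ceil(d * np.log(T)))`. The per-step cost depends on `m` only through the perturbed updates, not on the horizon.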
The distribution of the minimum of Brownian motion or the Cauchy process is well known via the reflection principle. Here we consider the problem of finding the sample-by-sample minimum, which we call the online minimum search. We consider the golden section search method, but we show quantitatively that the bisection method is more efficient. In the bisection method there is a hierarchical parameter, which tunes the depth to which each sub-search is conducted, somewhat similarly to how a depth-first search generates a topological ordering on nodes. Finally, we consider the possibility of using harmonic measure, an idea that has so far been unexplored.
Far-field speech recognition is a challenging task that conventionally relies on signal-processing beamforming to address noise and interference. However, its performance is usually limited by a heavy reliance on environmental assumptions. In this paper, we propose a unified multichannel far-field speech recognition system that combines neural beamforming with a transformer-based Listen, Attend and Spell (LAS) speech recognition system, extending the end-to-end speech recognition system to include speech enhancement. The framework is then jointly trained to optimize the final objective of interest. Specifically, factored complex linear projection (fCLP) is adopted to form the neural beamformer. Several pooling strategies for combining look directions are then compared in order to find the optimal approach. Moreover, information about the source direction is also integrated into the beamforming to explore the usefulness of source direction as a prior, which is usually available, especially in multi-modality scenarios. Experiments on different microphone array geometries are conducted to evaluate robustness against variation in microphone spacing. Large in-house databases are used to evaluate the effectiveness of the proposed framework, and the proposed method achieves a 19.26\% improvement when compared with a strong baseline.
We introduce two iterative methods, GPBiLQ and GPQMR, for solving unsymmetric partitioned linear systems. The basic mechanism underlying GPBiLQ and GPQMR is a novel simultaneous tridiagonalization via biorthogonality that allows for short-recurrence iterative schemes. Similar to the biconjugate gradient method, it is possible to develop another method, GPBiCG, whose iterate (if it exists) can be obtained inexpensively from the GPBiLQ iterate. Whereas the iterate of GPBiCG may not exist, the iterates of GPBiLQ and GPQMR are always well defined as long as the biorthogonal tridiagonal reduction process does not break down. We discuss connections between the proposed methods and some existing methods, and give numerical experiments to illustrate the performance of the proposed methods.
The Levin method is a well-known technique for evaluating oscillatory integrals, which operates by solving a certain ordinary differential equation in order to construct an antiderivative of the integrand. It was long believed that this approach suffers from "low-frequency breakdown," meaning that the accuracy of the calculated value of the integral deteriorates when the integrand is only slowly oscillating. Recently presented experimental evidence, however, suggests that if a Chebyshev spectral method is used to discretize the differential equation and the resulting linear system is solved via a truncated singular value decomposition, then no low-frequency breakdown occurs. Here, we provide a proof that this is the case, and our proof applies not only when the integrand is slowly oscillating, but even in the case of stationary points. Our result puts adaptive schemes based on the Levin method on a firm theoretical foundation and accounts for their behavior in the presence of stationary points. We go on to point out that by combining an adaptive Levin scheme with phase function methods for ordinary differential equations, a large class of oscillatory integrals involving special functions, including products of such functions and the compositions of such functions with slowly-varying functions, can be easily evaluated without the need for symbolic computations. Finally, we present the results of numerical experiments which illustrate the consequences of our analysis and demonstrate the properties of the adaptive Levin method.
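The basic (non-adaptive) scheme described above can be sketched as follows, for an integral of the form $\int_a^b f(x)\, e^{i g(x)}\, dx$. The code collocates the Levin equation $p' + i g' p = f$ at Chebyshev points and solves the resulting system with a truncated SVD, here via the `rcond` cutoff of `numpy.linalg.lstsq`. The function names, the number of nodes `n`, and the cutoff value are assumptions chosen for illustration, and no adaptive subdivision is attempted.

```python
import numpy as np

def cheb(n):
    """Chebyshev points x_j = cos(j*pi/n) on [-1, 1] and the spectral differentiation matrix."""
    k = np.arange(n + 1)
    x = np.cos(np.pi * k / n)
    c = np.where((k == 0) | (k == n), 2.0, 1.0) * (-1.0) ** k
    dx = x[:, None] - x[None, :]
    D = np.outer(c, 1.0 / c) / (dx + np.eye(n + 1))
    D -= np.diag(D.sum(axis=1))               # diagonal entries via negative row sums
    return D, x

def levin(f, g, dg, a, b, n=16, rcond=1e-13):
    """Approximate the integral of f(x) * exp(i g(x)) over [a, b] on a single interval.

    f, g, dg are vectorised callables; dg is the derivative of g.
    """
    D, x = cheb(n)
    t = 0.5 * (b + a) + 0.5 * (b - a) * x     # mapped nodes: t[0] = b, t[-1] = a
    A = (2.0 / (b - a)) * D + 1j * np.diag(dg(t))   # discretised operator p' + i g' p
    rhs = np.asarray(f(t), dtype=complex)
    # lstsq discards singular values below rcond * sigma_max, i.e. a truncated SVD solve.
    p, *_ = np.linalg.lstsq(A, rhs, rcond=rcond)
    # p * exp(i g) is an antiderivative of the integrand, so evaluate it at the endpoints.
    return p[0] * np.exp(1j * g(b)) - p[-1] * np.exp(1j * g(a))
```

For $f \equiv 1$ and $g(x) = \omega x$ on $[-1, 1]$ the exact value is $2 \sin(\omega)/\omega$, which gives a convenient check at both small and large $\omega$.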
In the Bayes paradigm and for a given loss function, we propose the construction of a new type of posterior distribution that extends the classical Bayes posterior, for estimating the law of an $n$-sample. The loss functions we have in mind are based on the total variation and Hellinger distances as well as some $\mathbb{L}_{j}$-distances. We prove that, with a probability close to one, this new posterior distribution concentrates its mass in a neighbourhood of the law of the data, for the chosen loss function, provided that this law belongs to the support of the prior or, at least, lies close enough to it. We therefore establish that the new posterior distribution enjoys some robustness properties with respect to a possible misspecification of the prior, or more precisely, its support. For the total variation and squared Hellinger losses, we also show that the posterior distribution keeps its concentration properties when the data are only independent, hence not necessarily i.i.d., provided that most of their marginals or the average of these are close enough to some probability distribution around which the prior puts enough mass. The posterior distribution is therefore also stable with respect to the equidistribution assumption. We illustrate these results by several applications. We consider the problems of estimating a location parameter or both the location and the scale of a density in a nonparametric framework. Finally, we also tackle the problem of estimating a density, with the squared Hellinger loss, in a high-dimensional parametric model under some sparsity conditions. The results established in this paper are non-asymptotic and provide, as much as possible, explicit constants.
In this contribution we deal with Gaussian quadrature rules based on orthogonal polynomials associated with a weight function $w(x)= x^{\alpha} e^{-x}$ supported on an interval $(0,z)$, $z>0.$ The modified Chebyshev algorithm is used in order to test the accuracy in the computation of the coefficients of the three-term recurrence relation, the zeros and weights, as well as the dependence on the parameter $z.$
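To illustrate how zeros and weights follow from the three-term recurrence coefficients, here is a minimal sketch of the standard Golub-Welsch eigenvalue approach. It does not implement the modified Chebyshev algorithm. As an assumption for the example, it uses the limiting case $z \to \infty$ (the classical generalized Laguerre weight), where the monic recurrence coefficients are known in closed form; for finite $z$ they would have to be computed numerically. The function names are hypothetical.

```python
import numpy as np
from math import gamma

def laguerre_recurrence(n, a):
    """Monic recurrence coefficients for w(x) = x**a * exp(-x) on (0, inf).

    This is the z -> infinity limit only; beta[0] holds the total mass Gamma(a + 1).
    """
    k = np.arange(n, dtype=float)
    alpha = 2.0 * k + a + 1.0
    beta = k * (k + a)
    beta[0] = gamma(a + 1.0)
    return alpha, beta

def gauss_from_recurrence(alpha, beta):
    """Golub-Welsch: nodes and weights from the symmetric tridiagonal Jacobi matrix."""
    n = len(alpha)
    off = np.sqrt(beta[1:n])
    J = np.diag(alpha) + np.diag(off, 1) + np.diag(off, -1)
    nodes, vecs = np.linalg.eigh(J)           # eigenvalues are the quadrature nodes
    weights = beta[0] * vecs[0, :] ** 2       # from first components of unit eigenvectors
    return nodes, weights
```

For `a = 0` the result can be compared against `numpy.polynomial.laguerre.laggauss`, which serves as a sanity check on the recurrence-to-rule step before moving to the truncated interval $(0, z)$.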
In many jurisdictions, forensic evidence is presented in the form of categorical statements by forensic experts. Several large-scale performance studies have been performed that report error rates to elucidate the uncertainty associated with such categorical statements. There is growing scientific consensus that the likelihood ratio (LR) framework is the logically correct form of presentation for forensic evidence evaluation. Yet, results from the large-scale performance studies have not been cast in this framework. Here, I show how to straightforwardly calculate an LR for any given categorical statement using data from the performance studies. This number quantifies how much more strongly a particular expert witness statement supports the same-source hypothesis than the different-source hypothesis. LRs are reported for categorical statements resulting from the analysis of latent fingerprints, bloodstain patterns, handwriting, footwear and firearms. The highest LR found for statements of identification was 376 (fingerprints); the lowest found for statements of exclusion was 1/28 (handwriting). The LRs found may be more insightful for those used to this framework than the various error rates reported previously. An additional advantage of using the LR in this way is its relative simplicity; there are no decisions necessary on what error rate to focus on or how to handle inconclusive statements. The values found are closer to 1 than many would have expected. One possible explanation for this mismatch is that we undervalue numerical LRs. Finally, a note of caution: the LR values reported here come from a simple calculation that does not do justice to the nuances of the large-scale studies and their differences from casework, and should be treated as ball-park figures rather than definitive statements on the evidential value of whole forensic scientific fields.
We consider wave scattering from a system of highly contrasting resonators with time-modulated material parameters. In this setting, the wave equation reduces to a system of coupled Helmholtz equations that models the scattering problem. We consider the one-dimensional setting. In order to understand the energy of the system, we prove a novel higher-order discrete capacitance matrix approximation of the subwavelength resonant quasifrequencies. Further, we perform numerical experiments to support and illustrate our analytical results and show how periodically time-dependent material parameters affect the scattered wave field.
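A calculation of this kind can be sketched as the ratio of two relative frequencies: how often a statement is made for same-source comparisons, divided by how often it is made for different-source comparisons. This is an assumed reading of the "simple calculation" mentioned above, and the counts below are invented purely for illustration; they are not taken from any performance study.

```python
def likelihood_ratio(statement, same_source_counts, diff_source_counts):
    """LR = P(statement | same source) / P(statement | different source).

    Both arguments map each categorical statement to the number of times it was
    given in ground-truth same-source or different-source comparisons.
    """
    p_same = same_source_counts.get(statement, 0) / sum(same_source_counts.values())
    p_diff = diff_source_counts.get(statement, 0) / sum(diff_source_counts.values())
    if p_diff == 0:
        return float("inf")   # statement never observed for different sources
    return p_same / p_diff

# Hypothetical counts, for illustration only.
same = {"identification": 900, "inconclusive": 80, "exclusion": 20}
diff = {"identification": 5, "inconclusive": 95, "exclusion": 900}

lr_identification = likelihood_ratio("identification", same, diff)
lr_inconclusive = likelihood_ratio("inconclusive", same, diff)
lr_exclusion = likelihood_ratio("exclusion", same, diff)
```

Inconclusive statements need no special handling in this formulation: they simply receive their own LR, which is typically close to 1.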
A new variant of Newton's method, named Backtracking New Q-Newton's method (BNQN), which has strong theoretical guarantees, is easy to implement, and has good experimental performance, was recently introduced by the third author. Previous experiments showed some remarkable properties of the basins of attraction obtained when finding roots of polynomials and meromorphic functions with BNQN. In general, they look smoother than those of Newton's method. In this paper, we continue to explore this remarkable phenomenon experimentally and in depth, and connect BNQN to Newton's flow and Voronoi's diagram. This link poses a couple of challenging puzzles to be explained. Experiments also indicate that BNQN is more robust against random perturbations than Newton's method and Random Relaxed Newton's method.
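For readers unfamiliar with basins of attraction, the following minimal sketch labels each starting point by the root that the classical Newton iteration converges to. It covers the baseline Newton's method only; BNQN is not implemented here, since its update rule is not stated in this abstract. The function name, iteration count, and tolerances are assumptions for the example.

```python
import numpy as np

def newton_basin_labels(coeffs, starts, steps=50, tol=1e-8):
    """Label each starting point by the index of the root Newton's method reaches.

    coeffs : polynomial coefficients, highest degree first (as in numpy.poly1d)
    starts : array of complex starting points (any shape)
    Returns (labels, roots); a label of -1 means no root was reached within tol.
    """
    p = np.poly1d(coeffs)
    dp = p.deriv()
    roots = p.roots
    z = np.array(starts, dtype=complex)
    for _ in range(steps):
        d = dp(z)
        safe = np.abs(d) > 1e-14              # skip points where the derivative vanishes
        z[safe] = z[safe] - p(z[safe]) / d[safe]
    dist = np.abs(z[..., None] - roots)       # distance from each end point to each root
    labels = np.argmin(dist, axis=-1)
    labels[np.min(dist, axis=-1) > tol] = -1
    return labels, roots
```

Evaluating this on a dense grid of starting points for $z^3 - 1$ and colouring by label produces the familiar fractal basin boundaries of Newton's method, which is the picture the smoother BNQN basins are being compared against.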
Zero-shot Learning (ZSL), which aims to predict classes that have never appeared in the training data, has attracted considerable research interest. The key to implementing ZSL is to leverage prior knowledge of classes, which builds the semantic relationship between classes and enables the transfer of learned models (e.g., features) from training classes (i.e., seen classes) to unseen classes. However, the priors adopted by existing methods are relatively limited and have incomplete semantics. In this paper, we explore richer and more competitive prior knowledge to model the inter-class relationship for ZSL via ontology-based knowledge representation and semantic embedding. Meanwhile, to address the data imbalance between seen classes and unseen classes, we develop a generative ZSL framework with Generative Adversarial Networks (GANs). Our main contributions include: (i) an ontology-enhanced ZSL framework that can be applied to different domains, such as image classification (IMGC) and knowledge graph completion (KGC); (ii) a comprehensive evaluation with multiple zero-shot datasets from different domains, where our method often achieves better performance than the state-of-the-art models. In particular, on four representative ZSL baselines of IMGC, the ontology-based class semantics outperform the previous priors, e.g., the word embeddings of classes, by an average of 12.4 accuracy points in the standard ZSL setting across two example datasets (see Figure 4).