The illness-death model for chronic conditions is combined with a renewal equation for the number of newborns, taking into account possibly different fertility rates in the healthy and diseased parts of the population. The resulting boundary value problem consists of a system of partial differential equations with an integral boundary condition. As an application, the framework is applied to an example concerning type 2 diabetes.
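As an illustrative sketch (not taken verbatim from the paper), an illness-death model of this type, with healthy population $S(t,a)$, diseased population $C(t,a)$, incidence rate $\lambda$, mortality rates $m_0$ and $m_1$, and fertility rates $\varphi_S$ and $\varphi_C$, can be written as
\[
(\partial_t + \partial_a)\,S = -(\lambda + m_0)\,S, \qquad (\partial_t + \partial_a)\,C = \lambda\,S - m_1\,C,
\]
with the renewal equation entering as the integral boundary condition at age $a = 0$,
\[
S(t,0) = \int_0^{\omega} \bigl[\varphi_S(t,a)\,S(t,a) + \varphi_C(t,a)\,C(t,a)\bigr]\,\mathrm{d}a, \qquad C(t,0) = 0,
\]
where $\omega$ denotes the maximum age; the notation here is an assumption made for illustration only.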
Far-field speech recognition is a challenging task that conventionally relies on signal-processing beamforming to combat noise and interference. However, performance is usually limited by a heavy reliance on environmental assumptions. In this paper, we propose a unified multichannel far-field speech recognition system that combines neural beamforming with a transformer-based Listen, Attend and Spell (LAS) recognizer, extending the end-to-end speech recognition system to include speech enhancement. The framework is then jointly trained to optimize the final objective of interest. Specifically, factored complex linear projection (fCLP) is adopted to form the neural beamformer. Several pooling strategies for combining look directions are then compared in order to find the optimal approach. Moreover, information about the source direction is integrated into the beamformer to explore its usefulness as a prior, which is often available, especially in multi-modality scenarios. Experiments on different microphone array geometries are conducted to evaluate robustness against variations in microphone spacing. Large in-house databases are used to evaluate the effectiveness of the proposed framework, and the proposed method achieves a 19.26\% improvement over a strong baseline.
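As a rough illustration of the pooling step over look directions (shapes, names, and the pooling variants are assumptions for this sketch, not the paper's exact design), one might compare strategies as follows:

```python
import numpy as np

# Hypothetical sketch: combining per-look-direction beamformed features before
# feeding them to the recognition network. Shapes and pooling choices are
# illustrative assumptions, not the paper's exact design.

def pool_look_directions(features, strategy="max"):
    """features: array of shape (num_directions, time, channels)."""
    if strategy == "max":        # keep the strongest response per time-channel bin
        return features.max(axis=0)
    if strategy == "mean":       # average all look directions equally
        return features.mean(axis=0)
    if strategy == "attention":  # softmax-weighted sum using per-direction energy
        energy = features.mean(axis=(1, 2))                  # (num_directions,)
        weights = np.exp(energy) / np.exp(energy).sum()
        return np.tensordot(weights, features, axes=(0, 0))  # (time, channels)
    raise ValueError(f"unknown strategy: {strategy}")

# Example: 8 look directions, 100 frames, 128 feature channels
feats = np.random.randn(8, 100, 128)
pooled = pool_look_directions(feats, strategy="attention")
print(pooled.shape)  # (100, 128)
```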
We study least-squares trace regression when the parameter is the sum of an $r$-low-rank matrix and an $s$-sparse matrix and a fraction $\epsilon$ of the labels is corrupted. For subgaussian distributions and feature-dependent noise, we highlight three needed design properties, each one derived from a different process inequality: a "product process inequality", "Chevet's inequality" and a "multiplier process inequality". These properties handle, simultaneously, additive decomposition, label contamination and design-noise interaction. They imply the near-optimality of a tractable estimator with respect to the effective dimensions $d_{eff,r}$ and $d_{eff,s}$ of the low-rank and sparse components, $\epsilon$ and the failure probability $\delta$. The near-optimal rate is $\mathsf{r}(n,d_{eff,r}) + \mathsf{r}(n,d_{eff,s}) + \sqrt{(1+\log(1/\delta))/n} + \epsilon\log(1/\epsilon)$, where $\mathsf{r}(n,d_{eff,r})+\mathsf{r}(n,d_{eff,s})$ is the optimal average rate with no contamination. Our estimator is adaptive to $(s,r,\epsilon,\delta)$ and, for a fixed absolute constant $c>0$, it attains the mentioned rate with probability $1-\delta$ uniformly over all $\delta\ge\exp(-cn)$. Without matrix decomposition, our analysis also entails optimal bounds for a robust estimator adapted to the noise variance. Our estimators are based on "sorted" versions of Huber's loss. We present simulations matching the theory; in particular, they reveal the superiority of "sorted" Huber's losses over the classical Huber's loss.
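For reference, the classical Huber loss on which the "sorted" variants build is, with robustification parameter $\tau > 0$ (notation ours, not the paper's),
\[
\rho_\tau(u) =
\begin{cases}
\tfrac{1}{2}u^2, & |u| \le \tau,\\
\tau|u| - \tfrac{1}{2}\tau^2, & |u| > \tau,
\end{cases}
\]
which penalizes large residuals only linearly and is therefore less sensitive to corrupted labels than the squared loss.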
The present work is devoted to strong approximations of a generalized Ait-Sahalia model arising from mathematical finance. The numerical study of the considered model faces essential difficulties caused by a drift that blows up at the origin, highly nonlinear drift and diffusion coefficients, and a positivity-preserving requirement. In this paper, a novel explicit Euler-type scheme is proposed, which is easily implementable and able to preserve the positivity of the original model unconditionally, i.e., for any step-size $h>0$. A mean-square convergence rate of order 0.5 is also obtained for the proposed scheme in both the non-critical and general critical cases. Our work is motivated by the need to justify multi-level Monte Carlo (MLMC) simulations for the underlying model, where a rate of mean-square convergence is required and the preservation of positivity is desirable, particularly for large discretization time steps. To the best of our knowledge, this is the first paper to propose an unconditionally positivity-preserving explicit scheme with mean-square convergence of order 1/2 for the model. Numerical experiments are finally provided to confirm the theoretical findings.
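For orientation, a common form of the generalized Ait-Sahalia model in the literature (the exact parametrization used in the paper may differ) is
\[
\mathrm{d}X_t = \bigl(a_{-1}X_t^{-1} - a_0 + a_1 X_t - a_2 X_t^{r}\bigr)\,\mathrm{d}t + b\,X_t^{\rho}\,\mathrm{d}W_t, \qquad X_0 > 0,
\]
with positive coefficients and $r, \rho > 1$; the singular term $a_{-1}X_t^{-1}$ is the drift that blows up at the origin, and the interplay between $r$ and $\rho$ distinguishes the non-critical and critical cases.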
We propose a finite difference scheme for the numerical solution of a two-dimensional singularly perturbed convection-diffusion partial differential equation whose solution features interacting boundary and interior layers, the latter due to discontinuities in the source term. The problem is posed on the unit square. The second derivative is multiplied by a singular perturbation parameter, $\epsilon$, while the nature of the first-derivative term is such that the flow is aligned with a boundary. These two facts mean that solutions tend to exhibit layers of both exponential and characteristic type. We solve the problem using a finite difference method, specially adapted to the discontinuities, applied on a piecewise-uniform (Shishkin) mesh. We prove that the computed solution converges to the true one at a rate that is independent of the perturbation parameter and is nearly first-order. We present numerical results verifying that these estimates are sharp.
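To make the mesh construction concrete, here is a minimal sketch of a one-dimensional piecewise-uniform Shishkin mesh; the transition-point constant and the layer location are illustrative assumptions, and the two-dimensional mesh for the problem above would be a tensor product of such meshes, possibly with further transition points at the discontinuities:

```python
import numpy as np

# A minimal sketch of a 1D piecewise-uniform Shishkin mesh on [0, 1] with a
# boundary layer at x = 1. The constant sigma (here 2) and the layer location
# are illustrative assumptions, not the paper's exact construction.

def shishkin_mesh(N, eps, sigma=2.0):
    """N (even) mesh intervals; eps is the singular perturbation parameter."""
    tau = min(0.5, sigma * eps * np.log(N))            # transition point
    coarse = np.linspace(0.0, 1.0 - tau, N // 2 + 1)   # N/2 intervals outside the layer
    fine = np.linspace(1.0 - tau, 1.0, N // 2 + 1)     # N/2 intervals inside the layer
    return np.concatenate([coarse, fine[1:]])          # drop the duplicated point

x = shishkin_mesh(N=16, eps=1e-3)
print(len(x), x[:3], x[-3:])   # 17 points, clustered near x = 1
```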
There is a folkloric belief that a depth-$\Theta(m)$ quantum circuit is needed to estimate the trace of the product of $m$ density matrices (i.e., a multivariate trace), a subroutine crucial to applications in condensed matter and quantum information science. We prove that this belief is overly conservative by constructing a constant quantum-depth circuit for the task, inspired by the method of Shor error correction. Furthermore, our circuit demands only local gates in a two-dimensional layout; we show how to implement it in a highly parallelized way on an architecture similar to that of Google's Sycamore processor. With these features, our algorithm brings the central task of multivariate trace estimation closer to the capabilities of near-term quantum processors. We instantiate the latter application with a theorem on estimating nonlinear functions of quantum states with "well-behaved" polynomial approximations.
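For clarity, the quantity being estimated is simply the trace of a product of density matrices; the sketch below (purely illustrative, unrelated to the quantum circuit itself) computes it classically, which quickly becomes infeasible as the dimension grows:

```python
import numpy as np

# Illustrative only: the classical quantity the quantum circuit estimates,
# Tr(rho_1 rho_2 ... rho_m). Direct computation needs the full density
# matrices, whose size grows exponentially with the number of qubits.

def multivariate_trace(rhos):
    prod = rhos[0]
    for rho in rhos[1:]:
        prod = prod @ rho
    return np.trace(prod)

def random_density_matrix(d, rng):
    a = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
    rho = a @ a.conj().T
    return rho / np.trace(rho)

rng = np.random.default_rng(0)
rhos = [random_density_matrix(4, rng) for _ in range(3)]
print(multivariate_trace(rhos))   # generally complex-valued for m >= 3
```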
A simple and effective method for the alignment of generative models is the best-of-$n$ policy, where $n$ samples are drawn from a base policy, ranked according to a reward function, and the highest-ranking one is selected. A commonly used analytical expression in the literature claims that the KL divergence between the best-of-$n$ policy and the base policy is equal to $\log (n) - (n-1)/n.$ We disprove this claim and show that the expression is instead an upper bound on the actual KL divergence. We also explore the tightness of this upper bound in different regimes. Finally, we propose a new estimator for the KL divergence and show through a few examples that it provides a tight approximation.
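A small numerical illustration of the gap (a toy discrete example of our own, not the paper's estimator): for a base policy over finitely many outcomes with distinct rewards, the exact best-of-$n$ distribution and its KL divergence from the base policy can be computed in closed form and compared with $\log(n) - (n-1)/n$:

```python
import numpy as np

# Toy check: with outcomes sorted by increasing reward and base probabilities p,
# the best-of-n policy assigns outcome i probability F_i^n - F_{i-1}^n, where F
# is the cumulative distribution in reward order. Numbers are illustrative.

def best_of_n_kl(p, n):
    """p: base probabilities of outcomes already sorted by increasing reward."""
    F = np.concatenate([[0.0], np.cumsum(p)])
    q = F[1:] ** n - F[:-1] ** n          # prob. that the best of n samples is outcome i
    mask = q > 0
    return float(np.sum(q[mask] * np.log(q[mask] / p[mask])))

p = np.array([0.5, 0.3, 0.2])             # base policy, outcomes sorted by reward
for n in (2, 4, 16):
    print(n, best_of_n_kl(p, n), np.log(n) - (n - 1) / n)
# In this example the exact KL stays below log(n) - (n-1)/n, consistent with the
# upper-bound claim above.
```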
The so-called independent low-rank matrix analysis (ILRMA) has demonstrated great potential for dealing with the problem of determined blind source separation (BSS) of audio and speech signals. This method assumes that the spectra from different frequency bands are independent and that the spectral coefficients in any frequency band are Gaussian distributed. The Itakura-Saito divergence is then employed to estimate the source-model parameters. In reality, however, the spectral coefficients from different frequency bands may be dependent, which is not considered in the existing ILRMA algorithm. This paper presents an improved version of ILRMA that takes the dependency between spectral coefficients from different frequency bands into account. The Sinkhorn divergence is then exploited to optimize the source-model parameters. As a result of using cross-band information, the BSS performance is improved, but the number of parameters to be estimated also increases significantly, and so does the computational complexity. To reduce the complexity, we apply the Kronecker product to decompose the modeling matrix into the product of a number of matrices of much smaller dimensionality. An efficient algorithm is then developed to implement the Sinkhorn-divergence-based BSS algorithm, and the complexity is reduced by an order of magnitude.
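The parameter saving offered by a Kronecker decomposition can be illustrated with a small sketch (the sizes here are arbitrary, and the factors in the actual algorithm are estimated rather than fixed):

```python
import numpy as np

# Sketch of the parameter-saving idea: representing a large modeling matrix as
# a Kronecker product of small factors. Sizes are illustrative assumptions.

F1, F2 = 16, 32                       # factor dimensions
A = np.random.rand(F1, F1)
B = np.random.rand(F2, F2)
M = np.kron(A, B)                     # (F1*F2) x (F1*F2) = 512 x 512

full_params = M.size                  # 262144 entries if stored directly
factored_params = A.size + B.size     # 16*16 + 32*32 = 1280 entries
print(M.shape, full_params, factored_params)
```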
We consider a time-fractional subdiffusion equation with a Caputo derivative in time, a general second-order elliptic spatial operator, and a right-hand side that is non-smooth in time. The presence of the latter may lead to locking problems in our time-stepping procedure recently introduced in [2,4]. Hence, a generalized version of the residual barrier is proposed to rectify this issue. We also consider related alternatives to this generalized algorithm and, furthermore, show that the new residual barrier may be useful in the case of a negative reaction coefficient.
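For orientation, the problem class described above is typically of the form (notation ours, not the paper's)
\[
D_t^{\alpha}u + \mathcal{L}u = f(x,t), \qquad 0 < \alpha < 1,
\]
where $\mathcal{L}$ is a second-order elliptic operator in space and $D_t^{\alpha}$ is the Caputo fractional derivative
\[
D_t^{\alpha}u(x,t) = \frac{1}{\Gamma(1-\alpha)}\int_0^t (t-s)^{-\alpha}\,\partial_s u(x,s)\,\mathrm{d}s,
\]
with the non-smoothness in time entering through $f$.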
Graph-centric artificial intelligence (graph AI) has achieved remarkable success in modeling interacting systems prevalent in nature, from dynamical systems in biology to particle physics. The increasing heterogeneity of data calls for graph neural architectures that can combine multiple inductive biases. However, combining data from various sources is challenging because the appropriate inductive bias may vary by data modality. Multimodal learning methods fuse multiple data modalities while leveraging cross-modal dependencies to address this challenge. Here, we survey 140 studies in graph-centric AI and find that diverse data types are increasingly brought together using graphs and fed into sophisticated multimodal models. These models stratify into image-, language-, and knowledge-grounded multimodal learning. We put forward an algorithmic blueprint for multimodal graph learning based on this categorization. The blueprint serves as a way to group state-of-the-art architectures that treat multimodal data by appropriately choosing four different components. This effort can pave the way for standardizing the design of sophisticated multimodal architectures for highly complex real-world problems.
The remarkable practical success of deep learning has revealed some major surprises from a theoretical perspective. In particular, simple gradient methods easily find near-optimal solutions to non-convex optimization problems, and despite giving a near-perfect fit to training data without any explicit effort to control model complexity, these methods exhibit excellent predictive accuracy. We conjecture that specific principles underlie these phenomena: that overparametrization allows gradient methods to find interpolating solutions, that these methods implicitly impose regularization, and that overparametrization leads to benign overfitting. We survey recent theoretical progress that provides examples illustrating these principles in simpler settings. We first review classical uniform convergence results and why they fall short of explaining aspects of the behavior of deep learning methods. We give examples of implicit regularization in simple settings, where gradient methods lead to minimal norm functions that perfectly fit the training data. Then we review prediction methods that exhibit benign overfitting, focusing on regression problems with quadratic loss. For these methods, we can decompose the prediction rule into a simple component that is useful for prediction and a spiky component that is useful for overfitting but, in a favorable setting, does not harm prediction accuracy. We focus specifically on the linear regime for neural networks, where the network can be approximated by a linear model. In this regime, we demonstrate the success of gradient flow, and we consider benign overfitting with two-layer networks, giving an exact asymptotic analysis that precisely demonstrates the impact of overparametrization. We conclude by highlighting the key challenges that arise in extending these insights to realistic deep learning settings.
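One of the implicit-regularization facts discussed above can be checked in a few lines: for an overparametrized least-squares problem, gradient descent started from zero converges to the minimum-norm interpolating solution. The following sketch (our own illustration, not code from the survey) compares the gradient-descent iterate with the pseudoinverse solution:

```python
import numpy as np

# Minimal illustration of implicit regularization: gradient descent from zero on
# an overparametrized least-squares problem converges to the minimum-norm
# interpolant, i.e. the pseudoinverse fit. Sizes and seed are arbitrary.

rng = np.random.default_rng(0)
n, d = 20, 100                          # fewer samples than parameters
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

w = np.zeros(d)
lr = 1.0 / np.linalg.norm(X, 2) ** 2    # step size below the stability threshold
for _ in range(20_000):                 # plain gradient descent on 0.5*||Xw - y||^2
    w -= lr * X.T @ (X @ w - y)

w_min_norm = np.linalg.pinv(X) @ y      # minimum-norm interpolating solution
print(np.max(np.abs(X @ w - y)))        # ~0: the training data are fit exactly
print(np.linalg.norm(w - w_min_norm))   # ~0: gradient descent found the min-norm fit
```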