Various noise models have been developed in quantum computing research to describe the propagation and effects of noise caused by imperfect hardware implementations. Identifying parameters of these models, such as gate and readout error rates, is critical to their accuracy. We use a Bayesian inference approach to identify posterior distributions of these parameters, so that they can be characterized in greater detail. By characterizing the device errors in this way, we can further improve the accuracy of quantum error mitigation. Experiments conducted on IBM's quantum computing devices suggest that our approach provides better error mitigation performance than existing techniques used by the vendor. Our approach also outperforms the standard Bayesian inference method in these experiments.
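As a minimal illustration of the idea (not the authors' exact model), a posterior over a single readout error rate can be inferred from repeated measurements of a prepared |0> state; with a Beta prior and a Bernoulli likelihood the posterior is conjugate:

```python
from scipy import stats

# Hypothetical readout-error experiment (values assumed for illustration):
# prepare |0>, measure N times, count how often we incorrectly read out 1.
N, flips = 4096, 73

# With a Beta(1, 1) prior over the readout error rate p and a Bernoulli
# likelihood, the posterior is a conjugate Beta distribution -- a full
# posterior over the parameter rather than a single point estimate.
posterior = stats.beta(1 + flips, 1 + (N - flips))

print("posterior mean of p:", posterior.mean())
print("95% credible interval:", posterior.interval(0.95))
```

The credible interval is what makes the characterization "more detailed" than a point estimate: downstream error mitigation can account for the uncertainty in the rate, not just its mean.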
Wireless systems must be resilient to jamming attacks. Existing mitigation methods require knowledge of the jammer's transmit characteristics. However, this knowledge may be difficult to acquire, especially for smart jammers that attack only specific instants during transmission in order to evade mitigation. We propose a novel method that mitigates attacks by smart jammers on massive multi-user multiple-input multiple-output (MU-MIMO) basestations (BSs). Our approach builds on recent progress in joint channel estimation and data detection (JED) and exploits the fact that a jammer cannot change its subspace within a coherence interval. Our method, called MAED (short for MitigAtion, Estimation, and Detection), uses a novel problem formulation that combines jammer estimation and mitigation, channel estimation, and data detection, instead of separating these tasks. We solve the problem approximately with an efficient iterative algorithm. Our results show that MAED effectively mitigates a wide range of smart jamming attacks without having any a priori knowledge about the attack type.
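The subspace property can be illustrated in isolation. If an estimate of the jammer's spatial signature is available, projecting the received samples onto the orthogonal complement of that subspace nulls the interference for the entire coherence interval. A minimal numpy sketch of this projection step alone (all values assumed; MAED itself estimates the jammer jointly with the channel and the data rather than in a separate step):

```python
import numpy as np

rng = np.random.default_rng(0)
B, T = 64, 100   # assumed BS antennas and samples per coherence interval

# Received samples and a (rank-one) estimate of the jammer's spatial signature.
Y = rng.standard_normal((B, T)) + 1j * rng.standard_normal((B, T))
j_hat = rng.standard_normal((B, 1)) + 1j * rng.standard_normal((B, 1))

# Orthogonal projector onto the complement of the jammer subspace:
# P = I - j (j^H j)^{-1} j^H.  Since the jammer cannot change its subspace
# within the coherence interval, P @ Y suppresses it for all samples at once.
P = np.eye(B) - j_hat @ np.linalg.pinv(j_hat)
Y_clean = P @ Y
```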
In the present paper, non-convex multi-objective parameter optimization problems governed by elliptic parametrized partial differential equations (PDEs) are considered. To solve these problems numerically, the Pascoletti-Serafini scalarization is applied, and the resulting scalar optimization problems are solved by an augmented Lagrangian method. However, due to the PDE constraints, the numerical solution is very expensive, so a model reduction based on the reduced basis (RB) method is utilized. The quality of the RB approximation is ensured by a trust-region strategy that requires no offline procedure: instead, the RB functions are computed by a greedy algorithm during the optimization. Moreover, convergence of the proposed method is guaranteed. Numerical examples illustrate the efficiency of the proposed solution technique.
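For reference (notation here is generic, not necessarily the paper's): given objectives $J_1, \dots, J_k$, an admissible set $U_{\mathrm{ad}}$, a reference point $a \in \mathbb{R}^k$ and a target direction $r \in \mathbb{R}^k$, the Pascoletti-Serafini scalarization reads
\[
\min_{t \in \mathbb{R},\, u \in U_{\mathrm{ad}}} t
\quad \text{s.t.} \quad
J_i(u) \le a_i + t\, r_i, \qquad i = 1, \dots, k,
\]
and varying $(a, r)$ traces out (weakly) Pareto optimal points of the multi-objective problem.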
In this work we propose and unify classes of models for information propagation over graphs. In the first class, propagation is modeled as a wave that emanates from a set of known nodes at an initial time and reaches all other, unknown nodes at later times, with an ordering determined by the time at which the information wave front reaches each node. A second class of models is based on the notion of a travel time along paths between nodes; the time of information propagation from an initial known set of nodes to a node is defined as the minimum of a generalized travel time over all admissible paths. A final class is given by imposing a local equation of eikonal form at each unknown node, with boundary conditions at the known nodes; the solution value of the local equation at a node is coupled to those of the neighbouring nodes with smaller solution values. We provide precise formulations of the model classes in this graph setting and prove equivalences between them. Motivated by the connection between the first arrival time model and the eikonal equation in the continuum setting, we demonstrate that, for graphs taking the particular form of grids in Euclidean space, mean-field limits of certain graph models under grid refinement lead to Hamilton-Jacobi PDEs. For a specific parameter setting, we demonstrate that the solution on the grid approximates the Euclidean distance.
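The travel-time class has a direct algorithmic reading: with each edge carrying a local travel time, first arrival times from the known set can be computed by a Dijkstra-type sweep. A minimal sketch for the simplest case, where the generalized travel time reduces to a sum of edge weights along a path (an assumption of this sketch, not the paper's full generality):

```python
import heapq

def first_arrival_times(graph, known):
    """graph: {node: [(neighbor, travel_time), ...]}; known: set of seed nodes."""
    t = {v: 0.0 for v in known}
    heap = [(0.0, v) for v in known]
    heapq.heapify(heap)
    while heap:
        tv, v = heapq.heappop(heap)
        if tv > t.get(v, float("inf")):
            continue                      # stale queue entry
        for w, tau in graph[v]:
            if tv + tau < t.get(w, float("inf")):
                t[w] = tv + tau           # information front reaches w earlier
                heapq.heappush(heap, (t[w], w))
    return t

g = {0: [(1, 1.0)], 1: [(0, 1.0), (2, 2.0)], 2: [(1, 2.0)]}
print(first_arrival_times(g, {0}))        # {0: 0.0, 1: 1.0, 2: 3.0}
```

On a Euclidean grid with travel times equal to grid spacing, this computes exactly the grid approximation of distance mentioned above.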
Sparse variational Gaussian process (SVGP) methods are a common choice for non-conjugate Gaussian process inference because of their computational benefits. In this paper, we improve their computational efficiency by using a dual parameterization where each data example is assigned dual parameters, similarly to the site parameters used in expectation propagation. Our dual parameterization speeds up inference using natural gradient descent and provides a tighter evidence lower bound for hyperparameter learning. The approach has the same memory cost as current SVGP methods, but it is faster and more accurate.
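Schematically, and in our notation rather than necessarily the paper's exact equations, the dual view attaches a Gaussian site with two parameters to each data example, so the approximate posterior over the inducing variables $\mathbf{u}$ takes the form
\[
q(\mathbf{u}) \;\propto\; p(\mathbf{u}) \prod_{n=1}^{N}
\exp\!\big( \lambda^{(1)}_n\, \mathbf{a}_n^\top \mathbf{u}
- \tfrac{1}{2}\, \lambda^{(2)}_n\, (\mathbf{a}_n^\top \mathbf{u})^2 \big),
\qquad
\mathbf{a}_n = \mathbf{K}_{\mathbf{uu}}^{-1}\, \mathbf{k}_{\mathbf{u},n},
\]
so each example carries two scalar dual (site) parameters $(\lambda^{(1)}_n, \lambda^{(2)}_n)$, in direct analogy with expectation propagation, while $q(\mathbf{u})$ remains Gaussian.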
We show that the solution to the Hermite-Pad\'{e} type I approximation problem leads in a natural way to a subclass of solutions of the Hirota (discrete Kadomtsev-Petviashvili) system and of its adjoint linear problem. Our result explains the appearance of various ingredients of integrable systems theory in applications to multiple orthogonal polynomials, numerical algorithms, random matrices, and other branches of mathematical physics and applied mathematics where the Hermite-Pad\'{e} approximation problem is relevant. We also present a geometric algorithm, based on the notion of Desargues maps, for constructing solutions of the problem in the projective space over the field of rational functions. As a byproduct we obtain the corresponding generalization of the Wynn recurrence. We isolate the boundary data of the Hirota system that provide solutions to the Hermite-Pad\'{e} problem, showing that the corresponding reduction lowers the dimensionality of the system. In particular, we obtain certain equations which, in addition to the known ones given by Paszkowski, can be considered direct analogues of the Frobenius identities. We study the place of the reduced system within integrability theory, which results in a multidimensional (in the sense of the number of variables) extension of the discrete-time Toda chain equations.
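To fix terminology, in a standard formulation (notation generic, not necessarily the paper's): given formal power series $f_1, \dots, f_m$ and a multi-index $\mathbf{n} = (n_1, \dots, n_m)$, the type I Hermite-Pad\'{e} problem asks for polynomials $A_1, \dots, A_m$, not all zero, with $\deg A_j \le n_j - 1$, such that
\[
\sum_{j=1}^{m} A_j(x)\, f_j(x) = O\!\big(x^{|\mathbf{n}|-1}\big),
\qquad |\mathbf{n}| = n_1 + \cdots + n_m .
\]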
Residual networks (ResNets) have displayed impressive results in pattern recognition and, recently, have garnered considerable theoretical interest due to a perceived link with neural ordinary differential equations (neural ODEs). This link relies on the convergence of network weights to a smooth function as the number of layers increases. We investigate the properties of weights trained by stochastic gradient descent and their scaling with network depth through detailed numerical experiments. We observe the existence of scaling regimes markedly different from those assumed in the neural ODE literature. Depending on certain features of the network architecture, such as the smoothness of the activation function, one may obtain an alternative ODE limit, a stochastic differential equation, or neither of these. These findings cast doubt on the validity of the neural ODE model as an adequate asymptotic description of deep ResNets and point to an alternative class of differential equations as a better description of the deep network limit.
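In schematic form (our notation, simplifying the architectures considered): with residual updates
\[
x_{k+1} = x_k + \delta_L\, g(x_k; \theta_k), \qquad k = 0, \dots, L-1,
\]
the neural ODE picture corresponds to $\delta_L = 1/L$ with weights converging to a smooth function, $\theta_k \approx \theta(k/L)$, yielding the limit $\dot{x}(t) = g\big(x(t); \theta(t)\big)$. If instead the trained increments scale like $1/\sqrt{L}$ and behave like uncorrelated noise across layers, the natural depth limit is a stochastic differential equation rather than an ODE, which is the distinction the experiments probe.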
Optimal transport distances have found many applications in machine learning for their capacity to compare non-parametric probability distributions. Yet their algorithmic complexity generally prevents their direct use on large scale datasets. Among the possible strategies to alleviate this issue, practitioners can rely on computing estimates of these distances over subsets of data, {\em i.e.} minibatches. While computationally appealing, we highlight in this paper some limits of this strategy, arguing that it can lead to undesirable smoothing effects. As an alternative, we suggest that the same minibatch strategy coupled with unbalanced optimal transport can yield more robust behavior. We discuss the associated theoretical properties, such as unbiased estimators, existence of gradients and concentration bounds. Our experimental study shows that in challenging problems associated with domain adaptation, the use of unbalanced optimal transport leads to significantly better results, competing with or surpassing recent baselines.
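As a rough illustration of the estimator, here is a self-contained numpy sketch combining minibatching with a standard entropic unbalanced Sinkhorn loop in the style of Chizat et al. (parameter values are assumed; this is not the paper's exact implementation):

```python
import numpy as np

def unbalanced_sinkhorn(a, b, C, eps=0.5, rho=1.0, iters=200):
    """Entropic unbalanced OT with KL marginal penalties of weight rho."""
    K = np.exp(-C / eps)
    u, v = np.ones_like(a), np.ones_like(b)
    tau = rho / (rho + eps)          # damping induced by the KL relaxation
    for _ in range(iters):
        u = (a / (K @ v)) ** tau
        v = (b / (K.T @ u)) ** tau
    P = u[:, None] * K * v[None, :]  # plan; mass may be created or destroyed
    return np.sum(P * C)

rng = np.random.default_rng(0)
X, Y = rng.normal(0, 1, (512, 2)), rng.normal(1, 1, (512, 2))

# Minibatch estimator: average unbalanced OT costs over random sub-batches.
m, costs = 64, []
for _ in range(10):
    xb = X[rng.choice(len(X), m, replace=False)]
    yb = Y[rng.choice(len(Y), m, replace=False)]
    C = ((xb[:, None, :] - yb[None, :, :]) ** 2).sum(-1)
    a = b = np.full(m, 1.0 / m)
    costs.append(unbalanced_sinkhorn(a, b, C))
print("minibatch UOT estimate:", np.mean(costs))
```

The relaxation of the marginal constraints is what counteracts the smoothing effect: a minibatch need not transport all of its mass to atypical samples in the other batch.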
In many scenarios, named entity recognition (NER) models severely suffer from the unlabeled entity problem, where the entities in a sentence may not be fully annotated. Through empirical studies performed on synthetic datasets, we find two causes of the performance degradation: the reduction in annotated entities, and the treatment of unlabeled entities as negative instances. The first cause has less impact than the second and can be mitigated by adopting pretrained language models. The second cause seriously misguides a model during training and greatly degrades its performance. Based on these observations, we propose a general approach that is capable of eliminating the misguidance introduced by unlabeled entities. The core idea is to use negative sampling to keep the probability of training with unlabeled entities at a very low level. Experiments on synthetic and real-world datasets show that our model is robust to the unlabeled entity problem and surpasses prior baselines. On well-annotated datasets, our model is competitive with state-of-the-art methods.
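A toy sketch of span-level negative sampling (the span length cap and uniform sampling here are illustrative assumptions): rather than treating every unannotated span as a negative instance, only a small random sample is used, so any given unlabeled entity is unlikely to be picked as a negative.

```python
import random

def sample_negative_spans(tokens, positive_spans, k, max_len=5):
    """Sample k spans that are not annotated entities to use as negatives.

    Training on a small sample (instead of all non-entity spans) keeps the
    probability of using an unlabeled entity as a negative at a low level.
    """
    n = len(tokens)
    candidates = [(i, j) for i in range(n)
                  for j in range(i + 1, min(i + 1 + max_len, n + 1))
                  if (i, j) not in positive_spans]
    return random.sample(candidates, min(k, len(candidates)))

tokens = "Barack Obama visited Berlin last week".split()
positives = {(0, 2)}                 # "Barack Obama" is the only labeled span
print(sample_negative_spans(tokens, positives, k=3))
```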
The present paper surveys neural approaches to conversational AI that have been developed in the last few years. We group conversational systems into three categories: (1) question answering agents, (2) task-oriented dialogue agents, and (3) chatbots. For each category, we present a review of state-of-the-art neural approaches, draw the connection between them and traditional approaches, and discuss the progress that has been made and challenges still being faced, using specific systems and models as case studies.
In this paper, we study the optimal convergence rates for distributed convex optimization problems over networks. We model the communication restrictions imposed by the network as a set of affine constraints and provide optimal complexity bounds for four different setups, namely when the function $F(\mathbf{x}) \triangleq \sum_{i=1}^{m} f_i(\mathbf{x})$ is (i) strongly convex and smooth, (ii) strongly convex, (iii) smooth, or (iv) just convex. Our results show that Nesterov's accelerated gradient descent on the dual problem can be executed in a distributed manner and obtains the same optimal rates as in the centralized version of the problem (up to constant or logarithmic factors), with an additional cost related to the spectral gap of the interaction matrix. Finally, we discuss some extensions of the proposed setup, such as proximal-friendly functions, time-varying graphs, and improvement of the condition numbers.
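In a standard formulation of this setting (notation ours): each node $i$ keeps a local copy $\mathbf{x}_i$, the copies are stacked into $\mathbf{X} = (\mathbf{x}_1, \dots, \mathbf{x}_m)$, and the network enters through a gossip matrix $W$ whose kernel is the consensus subspace, giving
\[
\min_{\mathbf{X}} \sum_{i=1}^{m} f_i(\mathbf{x}_i)
\quad \text{s.t.} \quad \sqrt{W}\, \mathbf{X} = 0 .
\]
Multiplications by $W$ require only exchanges between neighbouring nodes, so Nesterov's method applied to the dual of this affinely constrained problem runs distributedly, and the conditioning of $W$ (its spectral gap) is precisely where the additional network-dependent cost in the rates comes from.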