The dispersion error is often the dominant error in computed solutions of wave propagation problems with high-frequency components. In this paper, we define and give explicit examples of $\alpha$-dispersion-relation-preserving schemes. These are dual-pair finite-difference schemes for systems of hyperbolic partial differential equations which preserve the dispersion relation of the continuous problem uniformly to within an $\alpha\%$ error tolerance. We give a general framework for designing provably stable finite-difference operators that preserve the dispersion relation for hyperbolic systems such as the elastic wave equation. The operators we derive here can resolve the highest frequency ($\pi$-mode) present on any equidistant grid to within a $5\%$ error tolerance. This significantly improves on the current standard, which has a tolerance of $100\%$ error.
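To make the $100\%$ figure concrete, the following minimal sketch evaluates the relative dispersion error of the classical second-order central difference, assuming that the "current standard" refers to central finite-difference stencils (the choice of the second-order stencil and the function names are ours, for illustration only). Applied to a Fourier mode $e^{ikx}$ on a grid of spacing $h$, this stencil replaces $kh$ by $\sin(kh)$, which vanishes at the $\pi$-mode.

```python
import math

def central_diff_symbol(theta):
    """Normalized numerical wavenumber of the 2nd-order central difference.

    Applying (u[j+1] - u[j-1]) / (2h) to exp(i*k*x_j) yields
    i*sin(k*h)/h * exp(i*k*x_j), so theta = k*h is replaced by sin(theta).
    """
    return math.sin(theta)

def relative_dispersion_error(theta):
    """Relative error |theta_numerical - theta| / theta for theta in (0, pi]."""
    return abs(central_diff_symbol(theta) - theta) / theta

# The error grows with frequency and reaches 100% at the pi-mode (theta = pi).
for theta in (math.pi / 8, math.pi / 2, math.pi):
    err = relative_dispersion_error(theta)
    print(f"k*h = {theta:.3f}: relative dispersion error = {err:.1%}")
```

Higher-order central stencils change the symbol but remain antisymmetric, so their symbol is also a sum of sines that vanishes at $kh = \pi$.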
We propose in this work to employ the Box-LASSO, a variation of the popular LASSO method, as a low-complexity decoder in a massive multiple-input multiple-output (MIMO) wireless communication system. The Box-LASSO is mainly useful for detecting simultaneously structured signals such as signals that are known to be sparse and bounded. One modulation technique that generates essentially sparse and bounded constellation points is the so-called generalized space-shift keying (GSSK) modulation. In this direction, we derive high dimensional sharp characterizations of various performance measures of the Box-LASSO such as the mean square error, probability of support recovery, and the element error rate, under independent and identically distributed (i.i.d.) Gaussian channels that are not perfectly known. In particular, the analytical characterizations can be used to demonstrate performance improvements of the Box-LASSO as compared to the widely used standard LASSO. Then, we can use these measures to optimally tune the involved hyper-parameters of Box-LASSO such as the regularization parameter. In addition, we derive optimum power allocation and training duration schemes in a training-based massive MIMO system. Monte Carlo simulations are used to validate these premises and to show the sharpness of the derived analytical results.
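As a rough illustration of what a Box-LASSO decoder computes, the sketch below solves $\min_x \tfrac{1}{2}\|y - Ax\|^2 + \lambda\|x\|_1$ subject to $\ell \le x_j \le u$ by proximal gradient descent. The solver, the function names, the box $[0, 1]$, and the toy problem sizes are all assumptions made for illustration; the abstract does not specify an algorithm. The update uses the fact that, coordinate-wise, the proximal step for an $\ell_1$ penalty plus a box constraint is soft-thresholding followed by clipping.

```python
import random

def soft_threshold(v, t):
    """Proximal operator of t*|.|: shrink v toward zero by t."""
    return max(v - t, 0.0) - max(-v - t, 0.0)

def box_lasso(A, y, lam, lo=0.0, hi=1.0, n_iter=500):
    """Proximal-gradient sketch (hypothetical helper, not from the paper) for
        min_x 0.5*||y - A x||^2 + lam*||x||_1  subject to lo <= x_j <= hi.
    A is a list of rows, y a list of observations.
    """
    m, n = len(A), len(A[0])
    # Step size 1/L; the squared Frobenius norm safely upper-bounds ||A||_2^2.
    step = 1.0 / sum(a * a for row in A for a in row)
    x = [0.0] * n
    for _ in range(n_iter):
        residual = [sum(A[i][j] * x[j] for j in range(n)) - y[i] for i in range(m)]
        grad = [sum(A[i][j] * residual[i] for i in range(m)) for j in range(n)]
        # Soft-threshold (sparsity), then clip to the box (boundedness).
        x = [min(max(soft_threshold(x[j] - step * grad[j], step * lam), lo), hi)
             for j in range(n)]
    return x

# Toy usage: a sparse 0/1 vector observed through a random Gaussian matrix.
rng = random.Random(0)
n, m = 12, 8
x_true = [1.0 if j in (2, 7) else 0.0 for j in range(n)]
A = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
y = [sum(A[i][j] * x_true[j] for j in range(n)) + 0.05 * rng.gauss(0.0, 1.0)
     for i in range(m)]
x_hat = box_lasso(A, y, lam=0.1)
print([round(v, 2) for v in x_hat])
```

Setting `lo` and `hi` to very large magnitudes removes the clipping and recovers the standard LASSO iteration, which is one way to compare the two decoders on the same data.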
We revise the proof of low-rate upper bounds on the reliability function of discrete memoryless channels for ordinary and list-decoding schemes, in particular Berlekamp and Blinovsky's zero-rate bound, as well as Blahut's bound for low rates. The available proofs of the zero-rate bound devised by Berlekamp and Blinovsky are somewhat complicated in that they contain, in one form or another, some cumbersome "non-standard" procedures or computations. Here we follow Blinovsky's idea of using a Ramsey-theoretic result by Komlos, and we complement it with some missing steps to present a proof which is rigorous and easier to inspect. Furthermore, we show how these techniques can be used to fix an error that invalidated the proof of Blahut's low-rate bound, which is here presented in an extended form for list decoding and for general channels.
This paper considers the distributed convex optimization problem over a multi-agent system in which each agent possesses its own cost function and communicates with its neighbours over a sequence of time-varying directed graphs. Communication delays occur when agents receive information from other agents, and we seek the optimal value of the sum of the agents' loss functions in this setting. We handle this problem with the push-sum distributed dual averaging (PS-DDA) algorithm introduced in \cite{Tsianos2012}. We prove that this algorithm converges and that the error decays at a rate of $\mathcal{O}\left(T^{-0.5}\right)$ with a proper step size, where $T$ is the iteration span. The main result also shows that the convergence of the algorithm depends on the maximum communication delay on a single edge. Finally, we illustrate the theoretical results with numerical simulations of the PS-DDA algorithm's performance.
We present and analyze a momentum-based gradient method for training linear classifiers with an exponentially-tailed loss (e.g., the exponential or logistic loss), which maximizes the classification margin on separable data at a rate of $\widetilde{\mathcal{O}}(1/t^2)$. This contrasts with a rate of $\mathcal{O}(1/\log(t))$ for standard gradient descent, and $\mathcal{O}(1/t)$ for normalized gradient descent. This momentum-based method is derived via the convex dual of the maximum-margin problem, and specifically by applying Nesterov acceleration to this dual, which manages to result in a simple and intuitive method in the primal. This dual view can also be used to derive a stochastic variant, which performs adaptive non-uniform sampling via the dual variables.
Human doctors with well-structured medical knowledge can diagnose a disease after only a few conversations with patients about their symptoms. In contrast, existing knowledge-grounded dialogue systems often require a large number of dialogue instances to learn, as they fail to capture the correlations between different diseases and neglect the diagnostic experience shared among them. To address this issue, we propose a more natural and practical paradigm, i.e., low-resource medical dialogue generation, which can transfer the diagnostic experience from source diseases to target ones with only a handful of data for adaptation. It capitalizes on a commonsense knowledge graph to characterize the prior disease-symptom relations. In addition, we develop a Graph-Evolving Meta-Learning (GEML) framework that learns to evolve the commonsense graph for reasoning about disease-symptom correlations in a new disease, which effectively alleviates the need for a large number of dialogues. More importantly, by dynamically evolving disease-symptom graphs, GEML also addresses the real-world challenge that the disease-symptom correlations of each disease may vary or evolve as more diagnostic cases accumulate. Extensive experimental results on the CMDD dataset and our newly collected Chunyu dataset demonstrate the superiority of our approach over state-of-the-art approaches. Moreover, GEML can generate an enriched dialogue-sensitive knowledge graph in an online manner, which could benefit other tasks grounded on knowledge graphs.
This paper addresses the problem of formally verifying desirable properties of neural networks, i.e., obtaining provable guarantees that neural networks satisfy specifications relating their inputs and outputs (robustness to bounded norm adversarial perturbations, for example). Most previous work on this topic was limited in its applicability by the size of the network, the network architecture, and the complexity of the properties to be verified. In contrast, our framework applies to a general class of activation functions and specifications on neural network inputs and outputs. We formulate verification as an optimization problem (seeking to find the largest violation of the specification) and solve a Lagrangian relaxation of the optimization problem to obtain an upper bound on the worst-case violation of the specification being verified. Our approach is anytime, i.e., it can be stopped at any time and a valid bound on the maximum violation can be obtained. We develop specialized verification algorithms with provable tightness guarantees under special assumptions and demonstrate the practical significance of our general verification approach on a variety of verification tasks.
In this paper we study the frequentist convergence rate for the Latent Dirichlet Allocation (Blei et al., 2003) topic models. We show that the maximum likelihood estimator converges to one of the finitely many equivalent parameters in Wasserstein's distance metric at a rate of $n^{-1/4}$ without assuming separability or non-degeneracy of the underlying topics and/or the existence of more than three words per document, thus generalizing the previous works of Anandkumar et al. (2012, 2014) from an information-theoretical perspective. We also show that the $n^{-1/4}$ convergence rate is optimal in the worst case.
Methods that align distributions by minimizing an adversarial distance between them have recently achieved impressive results. However, these approaches are difficult to optimize with gradient descent and they often do not converge well without careful hyperparameter tuning and proper initialization. We investigate whether turning the adversarial min-max problem into an optimization problem by replacing the maximization part with its dual improves the quality of the resulting alignment and explore its connections to Maximum Mean Discrepancy. Our empirical results suggest that using the dual formulation for the restricted family of linear discriminators results in a more stable convergence to a desirable solution when compared with the performance of a primal min-max GAN-like objective and an MMD objective under the same restrictions. We test our hypothesis on the problem of aligning two synthetic point clouds on a plane and on a real-image domain adaptation problem on digits. In both cases, the dual formulation yields an iterative procedure that gives more stable and monotonic improvement over time.
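For reference, the Maximum Mean Discrepancy mentioned above has a particularly simple form under a linear kernel $k(x, y) = x^\top y$: the (biased) squared MMD estimate reduces to the squared distance between the two sample means. The sketch below computes it for two point clouds on a plane. The function names and the toy data are ours, and this illustrates only the linear-kernel MMD baseline, not the dual objective studied in the paper.

```python
def mean_vector(points):
    """Coordinate-wise mean of a non-empty list of equal-length points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]

def linear_mmd2(xs, ys):
    """Biased squared MMD with the linear kernel k(x, y) = <x, y>.

    Expanding the three kernel averages gives ||mean(xs) - mean(ys)||^2.
    """
    mx, my = mean_vector(xs), mean_vector(ys)
    return sum((a - b) ** 2 for a, b in zip(mx, my))

# Toy usage: the second cloud is the first one shifted by (3, -1).
source = [(0.0, 0.0), (1.0, 0.5), (2.0, -0.5), (1.0, 2.0)]
target = [(x + 3.0, y - 1.0) for x, y in source]
print(linear_mmd2(source, target))

# Undoing the shift aligns the means and drives the discrepancy to zero.
aligned = [(x - 3.0, y + 1.0) for x, y in target]
print(linear_mmd2(source, aligned))
```

Because this quantity only compares means, two clouds with equal means but different shapes are indistinguishable to it, which is one reason richer discriminators or kernels are used in practice.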
In this paper, we study the optimal convergence rate for distributed convex optimization problems in networks. We model the communication restrictions imposed by the network as a set of affine constraints and provide optimal complexity bounds for four different setups, namely when the function $F(\xb) \triangleq \sum_{i=1}^{m}f_i(\xb)$ is strongly convex and smooth, strongly convex only, smooth only, or just convex. Our results show that Nesterov's accelerated gradient descent on the dual problem can be executed in a distributed manner and obtains the same optimal rates as in the centralized version of the problem (up to constant or logarithmic factors), with an additional cost related to the spectral gap of the interaction matrix. Finally, we discuss some extensions to the proposed setup, such as proximal-friendly functions, time-varying graphs, and improvement of the condition numbers.
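One common way to express communication restrictions as affine constraints, assumed here for illustration, is to require $W x = 0$, where $W$ is the Laplacian of the communication graph and $x$ stacks one local variable per agent. For a connected graph, $W x = 0$ holds exactly when all agents agree. The sketch below uses scalar local variables and a path graph; the helper names and the graph are hypothetical.

```python
def graph_laplacian(n, edges):
    """Laplacian W = D - A of an undirected graph on nodes 0..n-1."""
    W = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        W[i][i] += 1.0
        W[j][j] += 1.0
        W[i][j] -= 1.0
        W[j][i] -= 1.0
    return W

def laplacian_times(W, x):
    """Compute W x; row i only touches agent i and its neighbours."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

# Path graph 0 - 1 - 2 - 3: each agent talks only to adjacent agents.
edges = [(0, 1), (1, 2), (2, 3)]
W = graph_laplacian(4, edges)

# All agents agree: the affine constraint W x = 0 is satisfied.
print(laplacian_times(W, [5.0, 5.0, 5.0, 5.0]))

# Agents disagree: the constraint is violated.
print(laplacian_times(W, [1.0, 2.0, 3.0, 4.0]))
```

Since row $i$ of $W$ is nonzero only at agent $i$ and its neighbours, multiplying by $W$ needs only neighbour-to-neighbour communication, which is what allows gradient steps on the dual of such a constrained problem to run in a distributed manner.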
Residual Networks of Residual Networks (RoR) exhibit excellent performance in image classification, but sharply increasing the number of feature-map channels makes the transmission of feature information incoherent, which loses some information relevant to classification and limits classification performance. In this paper, a Pyramidal RoR network model is proposed by analysing the performance characteristics of RoR and combining it with PyramidNet. Firstly, based on RoR, a Pyramidal RoR network model with gradually increasing channels is designed. Secondly, we analyse the effect of different residual block structures on performance and choose the residual block structure that best favours classification performance. Finally, we add an important principle to further optimize Pyramidal RoR networks: drop-path is used to avoid over-fitting and save training time. Image classification experiments were performed on the CIFAR-10/100 and SVHN datasets, and we achieved the current lowest classification error rates of 2.96%, 16.40% and 1.59%, respectively. Experiments show that the Pyramidal RoR network optimization method can improve network performance on different datasets and effectively suppress the vanishing gradient problem in DCNN training.