This paper addresses the gradient coding and coded matrix multiplication problems in distributed optimization and coded computing. We present a numerically stable binary coding method that overcomes the drawbacks of the gradient coding method proposed by Tandon et al. and can also be leveraged by coded computing networks with heterogeneous servers. The proposed binary encoding avoids operations over the real and complex numbers, which are inherently numerically unstable, thereby enabling numerically stable distributed encodings of the partial gradients. We then make connections between gradient coding and coded matrix multiplication. Specifically, we show that any gradient coding scheme can be extended to coded matrix multiplication. Furthermore, we show how the proposed binary gradient coding scheme can be used to construct three different coded matrix multiplication schemes, each achieving a different trade-off.
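As a minimal illustration of binary gradient coding, the sketch below implements a fractional-repetition-style scheme in which every encoding and decoding coefficient is 0 or 1, so the full gradient is recovered by additions alone; this is a standard construction shown for intuition, not the specific code proposed in the paper.

```python
import numpy as np

# Minimal sketch of a binary (0/1-coefficient) gradient code of the
# fractional-repetition type: n workers, tolerance of up to s stragglers,
# and decoding by pure addition, with no inversion over the reals.
# This illustrates the general idea, not the paper's specific construction.
rng = np.random.default_rng(0)
n, s, d = 6, 2, 4                       # workers, stragglers tolerated, gradient dimension
g = rng.standard_normal((n, d))         # partial gradients g_0, ..., g_{n-1}, one per data part

group = s + 1                           # parts per worker = workers per group
assignment = {w: list(range((w // group) * group, (w // group) * group + group))
              for w in range(n)}        # every worker in a group holds the same parts

def message(w):
    """Binary encoding: each worker sends the plain sum of its partial gradients."""
    return g[assignment[w]].sum(axis=0)

# With at most s stragglers, every group still contains a surviving worker.
stragglers = {1, 4}
decoded = np.zeros(d)
for first in range(0, n, group):                       # one group at a time
    survivor = next(w for w in range(first, first + group) if w not in stragglers)
    decoded += message(survivor)                       # 0/1 decoding coefficients

assert np.allclose(decoded, g.sum(axis=0))             # full gradient recovered exactly
```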
This paper studies the expressive power of artificial neural networks (NNs) with rectified linear units. To study them as a model of real-valued computation, we introduce the concept of Max-Affine Arithmetic Programs and show that they are equivalent to NNs with respect to natural complexity measures. We then use this result to show that two fundamental combinatorial optimization problems can be solved with polynomial-size NNs, which is equivalent to the existence of very special strongly polynomial time algorithms. First, we show that for any undirected graph with $n$ nodes, there is an NN of size $\mathcal{O}(n^3)$ that takes the edge weights as input and computes the value of a minimum spanning tree of the graph. Second, we show that for any directed graph with $n$ nodes and $m$ arcs, there is an NN of size $\mathcal{O}(m^2n^2)$ that takes the arc capacities as input and computes a maximum flow. These results imply, in particular, that the solutions of the corresponding parametric optimization problems, where all edge weights or arc capacities are free parameters, can be encoded in polynomial space and evaluated in polynomial time, and that such an encoding is provided by an NN.
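For intuition on NNs as a model of real-valued computation, the standard identity $\max(x,y) = \tfrac{1}{2}(x+y) + \tfrac{1}{2}(\mathrm{ReLU}(x-y) + \mathrm{ReLU}(y-x))$ shows that the maximum of two numbers is computed exactly by a constant-size ReLU network; such gadgets are the typical building blocks for piecewise-linear quantities like MST values and max-flow values. A quick numerical check (ours, not from the paper):

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def max_via_relu(x, y):
    """Constant-size ReLU circuit for max(x, y):
    max(x, y) = 0.5*(x + y) + 0.5*(ReLU(x - y) + ReLU(y - x))."""
    return 0.5 * (x + y) + 0.5 * (relu(x - y) + relu(y - x))

xs = np.random.default_rng(1).standard_normal((1000, 2))
assert np.allclose(max_via_relu(xs[:, 0], xs[:, 1]), np.maximum(xs[:, 0], xs[:, 1]))
```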
Many well-known matrices $Z$ are associated with fast transforms corresponding to factorizations of the form $Z = X^J \ldots X^1$, where each factor $X^\ell$ is sparse and possibly structured. This paper investigates the essential uniqueness of such factorizations. Our first main contribution is to prove that any $N \times N$ matrix having the so-called butterfly structure admits a unique factorization into $J$ butterfly factors (where $N = 2^J$), and that the factors can be recovered by a hierarchical factorization method. This contrasts with existing approaches, which fit the product of the butterfly factors to a given matrix via gradient descent. The proposed method can be applied in particular to retrieve the factorizations of the Hadamard or the Discrete Fourier Transform matrices of size $2^J$. Computing such factorizations costs $\mathcal{O}(N^2)$, which is of the order of a dense matrix-vector multiplication, while the obtained factorizations enable fast $\mathcal{O}(N \log N)$ matrix-vector multiplications. This hierarchical identifiability property relies on a recently established, simple identifiability condition in the two-layer, fixed-support setting. While the butterfly structure corresponds to a fixed prescribed support for each factor, our second contribution is to obtain identifiability results with more general families of allowed sparsity patterns, taking into account permutation ambiguities when they are unavoidable. In particular, we show through the hierarchical paradigm that the butterfly factorization of the Discrete Fourier Transform matrix of size $2^J$ admits a unique sparse factorization into $J$ factors when enforcing only $2$-sparsity per column and a block-diagonal structure on each factor.
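As a concrete instance of the objects involved, the following sketch (ours, using the classical Kronecker identity rather than the paper's hierarchical method) checks that the Hadamard matrix of size $2^J$ is the product of $J$ butterfly factors with exactly two nonzeros per row and per column:

```python
import numpy as np

# Sketch: the 2^J x 2^J Hadamard matrix factors into J sparse "butterfly"
# factors with two nonzeros per row and column. This checks the classical
# Kronecker factorization; the paper's contribution is that such a
# factorization is essentially unique and recoverable hierarchically.
J = 4
N = 2 ** J
H2 = np.array([[1.0, 1.0], [1.0, -1.0]])

factors = [np.kron(np.kron(np.eye(2 ** (j - 1)), H2), np.eye(2 ** (J - j)))
           for j in range(1, J + 1)]

H = np.linalg.multi_dot(factors)                 # product of the J butterfly factors
H_ref = H2
for _ in range(J - 1):
    H_ref = np.kron(H_ref, H2)                   # dense Hadamard matrix H_2^{\otimes J}

assert np.allclose(H, H_ref)
assert all((np.count_nonzero(X, axis=0) == 2).all() and
           (np.count_nonzero(X, axis=1) == 2).all() for X in factors)
```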
Frequency information lies at the heart of discriminating between textures, and therefore between different objects. Classical CNN architectures limit frequency learning through fixed filter sizes and lack a way of controlling it explicitly. Here, we build on structured receptive field filters with a Gaussian derivative basis. Yet, rather than using predetermined derivative orders, which typically result in fixed frequency responses for the basis functions, we learn these orders. We show that by learning the order of the basis we can accurately learn the frequency of the filters, and hence adapt to the optimal frequencies for the underlying learning task. We draw on the well-founded mathematical formulation of fractional derivatives to adapt the filter frequencies during training. Our formulation leads to parameter savings and data efficiency compared to the standard CNNs and the Gaussian derivative CNN filter networks that we build upon.
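The link between derivative order and filter frequency can be made explicit in one dimension, assuming the Fourier-multiplier definition of the fractional derivative (the paper's exact parameterization may differ): the magnitude response $|\omega|^{\alpha} e^{-\sigma^{2}\omega^{2}/2}$ is maximized at $|\omega^{\star}| = \sqrt{\alpha}/\sigma$, so a real-valued order $\alpha$ continuously shifts the peak frequency and can be adapted by gradient descent.

```latex
% One-dimensional sketch; Fourier-multiplier definition of the fractional derivative.
\widehat{G^{(\alpha)}_{\sigma}}(\omega) = (i\omega)^{\alpha}\, e^{-\sigma^{2}\omega^{2}/2},
\qquad
\bigl|\widehat{G^{(\alpha)}_{\sigma}}(\omega)\bigr| = |\omega|^{\alpha}\, e^{-\sigma^{2}\omega^{2}/2}
\ \text{ is maximized at }\ |\omega^{\star}| = \sqrt{\alpha}/\sigma .
```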
Computational fluctuating hydrodynamics aims at understanding the impact of thermal fluctuations on fluid motion at small scales through numerical exploration. These fluctuations are modeled as stochastic flux terms and incorporated into the classical Navier-Stokes equations, which then need to be solved numerically. In this paper, we present a novel projection-based method for solving the incompressible fluctuating hydrodynamics equations. By analyzing the equilibrium structure factor spectrum of the velocity field, we investigate how the inherent splitting errors affect the numerical solution of the stochastic partial differential equations in the presence of non-periodic boundary conditions, and how iterative corrections can reduce these errors. Our computational examples demonstrate both the capability of our approach to correctly reproduce the stochastic properties of fluids at small scales and its potential use in simulations of multi-physics problems.
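As an illustration of the diagnostic, the following sketch estimates a discrete structure factor from field samples; for uncorrelated equilibrium fluctuations the spectrum is flat, and deviations from flatness are the signature of splitting and boundary errors (illustrative only; the normalization convention is ours, not taken from the paper):

```python
import numpy as np

# Sketch: discrete equilibrium structure factor of a 1D fluctuating field,
# S(k) = < |u_hat(k)|^2 > / N, estimated from M samples. For uncorrelated
# equilibrium fluctuations S(k) is flat; deviations of a numerical scheme
# from this flat spectrum expose splitting and boundary errors.
rng = np.random.default_rng(0)
N, M = 64, 20000
samples = rng.standard_normal((M, N))            # stand-in for sampled velocity fields
u_hat = np.fft.rfft(samples, axis=1)
S_k = (np.abs(u_hat) ** 2).mean(axis=0) / N      # structure factor estimate
print(S_k[1:5])                                  # approximately constant (~1) for white noise
```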
We consider the Cahn-Hilliard equation with the standard double-well potential. We employ a prototypical class of first-order-in-time semi-implicit methods with implicit treatment of the linear dissipation term and explicit extrapolation of the nonlinear term. When the dissipation coefficient is small, the conventional wisdom is to add a judiciously chosen stabilization term in order to allow relatively large time steps and speed up the simulation. In practical numerical implementations it has long been observed that the resulting system exhibits remarkable stability in the regime where the stabilization parameter is $\mathcal O(1)$, the dissipation coefficient is vanishingly small, and the time step is moderately large. In this work we develop a new stability theory to address this perplexing phenomenon.
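In standard notation (symbols chosen here for illustration, following common usage), with $\nu$ the small dissipation coefficient, $S$ the $\mathcal O(1)$ stabilization parameter, $\tau$ the time step, and $f(u) = u^3 - u$ the double-well nonlinearity, the stabilized first-order semi-implicit scheme for $\partial_t u = \Delta(-\nu \Delta u + f(u))$ reads:

```latex
\frac{u^{n+1}-u^{n}}{\tau}
 = \Delta\!\left( -\,\nu\, \Delta u^{n+1} + S\,(u^{n+1}-u^{n}) + f(u^{n}) \right),
```

where the linear dissipation term is treated implicitly, the nonlinearity explicitly, and $S(u^{n+1}-u^{n})$ is the added stabilization term.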
Source identification problems have multiple applications in engineering, such as the identification of fissures in materials, the determination of sources in electromagnetic fields or geophysical applications, and the detection of contaminant sources, among others. In this work we are concerned with the determination of a time-dependent source in a transport equation from noisy data measured at a fixed position. By means of Fourier techniques it can be shown that the problem is ill-posed in the sense that the solution exists but does not depend continuously on the data. A number of different techniques have been developed by other authors to approximate the solution. In this work, we consider a family of parametric regularization operators to deal with the ill-posedness of the problem. We propose a way to select the regularization parameter as a function of the noise level in the data in order to obtain a regularized solution that approximates the unknown source. We derive a H\"older-type bound for the error of the approximated source when the unknown function is assumed to be bounded in a given norm. Numerical examples illustrate the convergence and stability of the method.
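Schematically, and with symbols introduced here only for illustration ($R_\alpha$ the regularization family, $u^\delta$ the data with noise level $\delta$, and $f$ the unknown source assumed to satisfy $\|f\|_X \le E$), a H\"older-type bound obtained from an a priori choice $\alpha = \alpha(\delta, E)$ takes the form

```latex
\bigl\| R_{\alpha(\delta)}\, u^{\delta} - f \bigr\|
 \;\le\; C\, E^{\,1-\theta}\, \delta^{\,\theta}, \qquad 0 < \theta < 1,
```

so the reconstruction error vanishes as a power of the noise level, with the exponent $\theta$ determined by the norm in which the source is assumed bounded.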
In this paper, new results in random matrix theory are derived which allow us to construct a shrinkage estimator of the global minimum variance (GMV) portfolio when the shrinkage target is a random object. More specifically, the shrinkage target is the holding portfolio estimated from previous data. The theoretical findings are applied to develop a theory for dynamic estimation of the GMV portfolio, where the new estimator of its weights is shrunk toward the holding portfolio at each time of reconstruction. Both cases with and without overlapping samples are considered in the paper. The non-overlapping case corresponds to the situation where different data on the asset returns are used to construct the traditional estimator of the GMV portfolio weights and to determine the target portfolio, while the overlapping case allows intersections between the samples. The theoretical results are derived under weak assumptions on the data-generating process. No specific distribution is assumed for the asset returns except for the existence of finite moments of order $4+\varepsilon$, $\varepsilon>0$. Moreover, the population covariance matrix is allowed to have an unbounded spectrum. The performance of the new trading strategies is investigated via an extensive simulation study. Finally, the theoretical findings are implemented in an empirical illustration based on the returns of stocks included in the S\&P 500 index.
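The construction can be sketched as follows; the shrinkage intensity `alpha` is a fixed illustrative value here, whereas in the paper it is derived from the random matrix theory results (names and data are placeholders):

```python
import numpy as np

# Sketch of shrinking the sample GMV portfolio toward a holding (target)
# portfolio. The shrinkage intensity `alpha` below is a fixed illustrative
# value; in the paper it is derived from random matrix theory.
rng = np.random.default_rng(0)
T, p = 250, 10                                    # observations, assets
X = rng.standard_normal((T, p)) * 0.01            # stand-in asset returns
ones = np.ones(p)

S = np.cov(X, rowvar=False)                       # sample covariance matrix
S_inv = np.linalg.inv(S)
w_gmv = S_inv @ ones / (ones @ S_inv @ ones)      # traditional GMV weight estimator

w_hold = np.full(p, 1.0 / p)                      # holding portfolio used as the target
alpha = 0.6                                       # illustrative shrinkage intensity
w_new = alpha * w_gmv + (1.0 - alpha) * w_hold    # shrunk weights, still sum to one

assert np.isclose(w_new.sum(), 1.0)
```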
A key advantage of isogeometric discretizations is their accurate and well-behaved eigenfrequencies and eigenmodes. For degree two and higher, however, optical branches of spurious outlier frequencies and modes may appear due to boundaries or reduced continuity at patch interfaces. In this paper, we introduce a variational approach based on perturbed eigenvalue analysis that eliminates outlier frequencies without negatively affecting the accuracy in the remainder of the spectrum and modes. We then propose a pragmatic iterative procedure that estimates the perturbation parameters in such a way that the outlier frequencies are effectively reduced. We demonstrate that our approach allows for a much larger critical time-step size in explicit dynamics calculations. In addition, we show that the critical time-step size obtained with the proposed approach does not depend on the polynomial degree of spline basis functions.
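The mechanism behind the larger critical time step is the standard CFL-type reasoning for explicit dynamics (notation ours, not taken from the paper): for the generalized eigenvalue problem $K\phi = \omega^2 M\phi$, the central difference method is stable for

```latex
\Delta t \;\le\; \Delta t_{\mathrm{crit}} = \frac{2}{\omega_{\max}},
```

so spurious outlier frequencies inflate $\omega_{\max}$ and shrink the admissible step, and eliminating them directly enlarges $\Delta t_{\mathrm{crit}}$.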
Compared with cheap addition, multiplication is of much higher computational complexity. The widely used convolutions in deep neural networks are exactly cross-correlations that measure the similarity between input features and convolution filters, which involves massive multiplications between floating-point values. In this paper, we present adder networks (AdderNets) that trade these massive multiplications in deep neural networks, especially convolutional neural networks (CNNs), for much cheaper additions to reduce computation costs. In AdderNets, we take the $\ell_1$-norm distance between the filters and the input feature as the output response. The influence of this new similarity measure on the optimization of the neural network is thoroughly analyzed. To achieve better performance, we develop a special back-propagation approach for AdderNets by investigating the full-precision gradient. We then propose an adaptive learning rate strategy to enhance the training procedure of AdderNets according to the magnitude of each neuron's gradient. As a result, the proposed AdderNets achieve 74.9% Top-1 accuracy and 91.7% Top-5 accuracy with ResNet-50 on the ImageNet dataset without any multiplication in the convolution layers.
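Concretely, for a single filter and output location, the two similarity measures compare as in the sketch below (shapes and names are illustrative):

```python
import numpy as np

# Sketch of the adder response for one filter at one spatial location:
# a standard convolution uses a multiply-accumulate (cross-correlation),
# while an adder layer uses the negative l1 distance between patch and filter.
rng = np.random.default_rng(0)
patch = rng.standard_normal((3, 3, 16))           # input feature patch (d x d x c_in)
filt  = rng.standard_normal((3, 3, 16))           # one filter of the same shape

conv_response  = np.sum(patch * filt)             # multiplication-based similarity
adder_response = -np.sum(np.abs(patch - filt))    # additions and subtractions only

print(conv_response, adder_response)
```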
We introduce a new family of deep neural network models. Instead of specifying a discrete sequence of hidden layers, we parameterize the derivative of the hidden state using a neural network. The output of the network is computed using a black-box differential equation solver. These continuous-depth models have constant memory cost, adapt their evaluation strategy to each input, and can explicitly trade numerical precision for speed. We demonstrate these properties in continuous-depth residual networks and continuous-time latent variable models. We also construct continuous normalizing flows, a generative model that can be trained by maximum likelihood without partitioning or ordering the data dimensions. For training, we show how to scalably backpropagate through any ODE solver, without access to its internal operations. This allows end-to-end training of ODEs within larger models.
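A minimal sketch of the forward pass of such a model, assuming a small randomly initialized MLP for the dynamics and SciPy's `solve_ivp` as the black-box solver (the paper additionally shows how to backpropagate through the solver, e.g. via the adjoint method, which this sketch does not implement):

```python
import numpy as np
from scipy.integrate import solve_ivp

# Sketch of the forward pass of a continuous-depth model: the hidden state
# h(t) evolves according to dh/dt = f(h, t; theta), where f is a small MLP,
# and the "output layer" is simply h at the final time. Weights are random
# placeholders; no training is performed here.
rng = np.random.default_rng(0)
d = 4
W1, b1 = rng.standard_normal((16, d + 1)) * 0.1, np.zeros(16)
W2, b2 = rng.standard_normal((d, 16)) * 0.1, np.zeros(d)

def f(t, h):
    """Neural network defining the hidden-state dynamics dh/dt."""
    z = np.tanh(W1 @ np.concatenate([h, [t]]) + b1)
    return W2 @ z + b2

h0 = rng.standard_normal(d)                          # input, used as the initial state
sol = solve_ivp(f, t_span=(0.0, 1.0), y0=h0, rtol=1e-6, atol=1e-8)
h1 = sol.y[:, -1]                                    # "output" of the continuous-depth model
print(h1)
```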