Learning classification tasks on $(2^n \times 2^n)$ inputs typically involves $\le n$ $(2 \times 2)$ max-pooling (MP) operators along the entire feedforward deep architecture. Here we show, using the CIFAR-10 database, that pooling decisions adjacent to the last convolutional layer significantly enhance accuracy. In particular, the average accuracies of the advanced-VGG with $m$ layers (A-VGGm) architectures are 0.936, 0.940, 0.954, 0.955, and 0.955 for $m$ = 6, 8, 14, 13, and 16, respectively. The results indicate that the accuracy of A-VGG8 is superior to that of VGG16, and that the accuracies of A-VGG13 and A-VGG16 are equal and comparable to that of Wide-ResNet16. In addition, replacing the three fully connected (FC) layers with a single FC layer (A-VGG6 and A-VGG14), or with several FC layers with linear activation, yields similar accuracies. These significantly enhanced accuracies stem from training the most influential input-output routes, in contrast to the inferior routes selected by multiple MP decisions along the deep architecture. Furthermore, the accuracies are sensitive to the order of the non-commutative MP and average-pooling operators adjacent to the output layer, which varies the number and location of the trained routes. The results call for a reexamination of previously proposed deep architectures and their accuracies, utilizing the proposed pooling strategy adjacent to the output layer.
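A minimal PyTorch sketch (our illustration, not the authors' code; all layer sizes are assumptions for CIFAR-10-shaped inputs) of the pooling strategy described above: the convolutional stack keeps the spatial resolution, and the non-commutative MP and average-pooling (AP) decisions are deferred to just before a single FC output layer.

```python
# Minimal sketch (ours, not the authors' code): convolutions keep the spatial
# resolution, and the pooling decisions are deferred to just before a single
# fully connected output layer.
import torch
import torch.nn as nn

class AVGGSketch(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(  # no pooling inside the conv stack
            nn.Conv2d(3, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.Conv2d(64, 128, 3, padding=1), nn.BatchNorm2d(128), nn.ReLU(),
            nn.Conv2d(128, 256, 3, padding=1), nn.BatchNorm2d(256), nn.ReLU(),
        )
        # Pooling adjacent to the output layer; MP and AP do not commute, so
        # swapping them changes which input-output routes are trained.
        self.pool = nn.Sequential(nn.MaxPool2d(2), nn.AvgPool2d(16))
        self.classifier = nn.Linear(256, num_classes)  # single FC layer

    def forward(self, x):
        x = self.pool(self.features(x))
        return self.classifier(torch.flatten(x, 1))

logits = AVGGSketch()(torch.randn(8, 3, 32, 32))  # CIFAR-10-shaped batch
```

Swapping `nn.MaxPool2d` and `nn.AvgPool2d` inside `self.pool` is the order sensitivity the abstract refers to.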
In this article, we focus on the error committed when computing the matrix logarithm using Gauss--Legendre quadrature rules. These formulas can be interpreted as Pad\'e approximants of a suitable Gauss hypergeometric function. Empirical observation tells us that the convergence of these quadratures becomes slow when the matrix is not close to the identity matrix, suggesting the use of an inverse scaling and squaring approach to obtain a matrix with this property. The novelty of this work is the introduction of error estimates that can be used to select a priori both the number of Legendre points needed to obtain a given accuracy and the number of inverse scaling and squaring steps to be performed. We include some numerical experiments to show the reliability of the introduced estimates.
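A small numerical sketch (ours, with illustrative parameter choices) of the combination described above: repeated square roots bring the matrix close to the identity, Gauss--Legendre quadrature applied to the integral representation $\log(A)=\int_0^1 (A-I)\,[t(A-I)+I]^{-1}\,\mathrm{d}t$ approximates the logarithm, and the result is rescaled by $2^s$. The paper's contribution is the a priori choice of `num_nodes` and `num_sqrts`; here they are fixed by hand.

```python
# Sketch (ours) of Gauss-Legendre quadrature for log(A) combined with inverse
# scaling and squaring; num_nodes and num_sqrts are fixed by hand here, whereas
# the paper's estimates would choose them a priori.
import numpy as np
from scipy.linalg import logm, solve, sqrtm

def logm_gauss_legendre(A, num_nodes=8, num_sqrts=4):
    r"""Approximate log(A) via log(A) = \int_0^1 (A-I) [t(A-I)+I]^{-1} dt."""
    I = np.eye(A.shape[0])
    for _ in range(num_sqrts):        # inverse scaling: drive A toward I
        A = sqrtm(A)
    t, w = np.polynomial.legendre.leggauss(num_nodes)
    t, w = (t + 1) / 2, w / 2         # map nodes and weights from [-1,1] to [0,1]
    E = A - I
    # (tE+I)^{-1} E equals E (tE+I)^{-1}: both factors are polynomials in A.
    L = sum(wi * solve(ti * E + I, E) for ti, wi in zip(t, w))
    return 2 ** num_sqrts * L         # squaring step: log A = 2^s log A^(1/2^s)

A = np.array([[4.0, 1.0], [0.0, 9.0]])
print(np.linalg.norm(logm_gauss_legendre(A) - logm(A)))  # small residual
```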
In this paper we study the orbit closure problem for a reductive group $G\subseteq GL(X)$ acting on a finite dimensional vector space $V$ over $\mathbb{C}$. We assume that the center of $GL(X)$ lies within $G$ and acts on $V$ through a fixed non-trivial character. We study points $y,z\in V$ where (i) $z$ is obtained as the leading term of the action of a 1-parameter subgroup $\lambda(t)\subseteq G$ on $y$, and (ii) $y$ and $z$ have large distinctive stabilizers $K,H \subseteq G$. Let $O(z)$ (resp. $O(y)$) denote the $G$-orbit of $z$ (resp. $y$), and $\overline{O(z)}$ (resp. $\overline{O(y)}$) its closure; then (i) implies that $z\in \overline{O(y)}$. We address the question: under what conditions can (i) and (ii) be simultaneously satisfied, i.e., when does there exist a 1-PS $\lambda \subseteq G$ for which $z$ is obtained as a limit of $y$? Using $\lambda$, we develop a leading term analysis which applies to $V$ as well as to ${\cal G}= Lie(G)$, the Lie algebra of $G$, and its subalgebras ${\cal K}$ and ${\cal H}$, the Lie algebras of $K$ and $H$ respectively. Through this we construct the Lie algebra $\hat{\cal K} \subseteq {\cal H}$ which connects $y$ and $z$ through their Lie algebras. We develop the properties of $\hat{\cal K}$ and relate it to the action of ${\cal H}$ on $\overline{N}=V/T_z O(z)$, the normal slice to the orbit $O(z)$. We examine the case of {\em alignment}, when a semisimple element belongs to both ${\cal H}$ and ${\cal K}$, the conditions under which it occurs, and some of its consequences. Next, we examine the possibility of {\em intermediate $G$-varieties} $W$ which lie between the orbit closures of $z$ and $y$, i.e., $\overline{O(z)} \subsetneq W \subsetneq \overline{O(y)}$. These have a direct bearing on representation theoretic as well as geometric properties which connect $z$ and $y$.
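A toy instance (ours, not from the paper) of condition (i), illustrating how a leading term arises under a 1-PS action on binary quadratics:

```latex
% Toy example: G = GL_2(\mathbb{C}) acting on binary quadratics
% V = \mathbb{C}[x_1,x_2]_2.  Take y = x_1^2 + x_1 x_2 and the 1-PS
% \lambda(t) = \mathrm{diag}(t,1) acting by x_1 \mapsto t x_1, so that
\[
  \lambda(t)\cdot y \;=\; t^2 x_1^2 + t\,x_1 x_2
  \;=\; t\left(x_1 x_2 + t\,x_1^2\right)
  \;\xrightarrow[\;t\to 0\;]{}\; z = x_1 x_2
  \quad\text{(after rescaling by } t\text{)}.
\]
% Hence z lies in \overline{O(y)}, and z has a visibly larger stabilizer
% (containing the diagonal torus, up to the central character) than y.
```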
We describe and analyze a hybrid finite element/neural network method for predicting solutions of partial differential equations. The methodology is designed to obtain fine scale fluctuations from neural networks in a local manner. The network is capable of locally correcting a coarse finite element solution towards a fine solution, taking the source term and the coarse approximation as input. A key observation is the dependency between the quality of predictions and the size of the training set, which consists of different source terms and the corresponding fine and coarse solutions. We provide an a priori error analysis of the method together with a stability analysis of the neural network. The numerical experiments confirm the capability of the network to predict fine finite element solutions. We also illustrate the generalization of the method to problems where the test and training domains differ from each other.
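A schematic sketch (ours; names, patch size, and widths are assumptions, not the paper's architecture) of the local correction idea: a small network maps a patch of the source term $f$ and of the coarse solution $u_H$ to a fine-scale fluctuation, so the corrected solution is $u_H$ plus the predicted correction.

```python
# Schematic sketch (ours; names, patch size, and widths are assumptions): a
# network mapping local patches of the source term f and the coarse solution
# u_H to a fine-scale correction.
import torch
import torch.nn as nn

class LocalCorrector(nn.Module):
    def __init__(self, patch_dofs: int = 25, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * patch_dofs, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, patch_dofs),     # predicted fine-scale fluctuation
        )

    def forward(self, f_patch, uH_patch):
        return self.net(torch.cat([f_patch, uH_patch], dim=-1))

model = LocalCorrector()
f, uH = torch.randn(64, 25), torch.randn(64, 25)   # a batch of local patches
u_fine_pred = uH + model(f, uH)                    # coarse + local correction
# Training would minimize the mismatch to fine solutions over many source terms:
loss = nn.functional.mse_loss(u_fine_pred, torch.randn(64, 25))
```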
We introduce a general class of transport distances ${\rm WB}_{\Lambda}$ over the space of positive semi-definite matrix-valued Radon measures $\mathcal{M}(\Omega,\mathbb{S}_+^n)$, called the weighted Wasserstein-Bures distance. Such a distance is defined via a generalized Benamou-Brenier formulation with a weighted action functional and an abstract matricial continuity equation, which leads to a convex optimization problem. Some recently proposed models, including the Kantorovich-Bures distance and the Wasserstein-Fisher-Rao distance, fit naturally into our framework. We give a complete characterization of the minimizer and explore the topological and geometrical properties of the space $(\mathcal{M}(\Omega,\mathbb{S}_+^n),{\rm WB}_{\Lambda})$. In particular, we show that $(\mathcal{M}(\Omega,\mathbb{S}_+^n),{\rm WB}_{\Lambda})$ is a complete geodesic space and exhibits a conic structure.
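Schematically (our paraphrase; the precise weighted action density and the matricial continuity operator are those defined in the paper), the distance is a dynamic optimization of Benamou-Brenier type:

```latex
% Schematic form of a generalized Benamou--Brenier problem: minimize a
% weighted, jointly convex action over curves of matrix-valued measures
% subject to an abstract matricial continuity equation.
\[
  {\rm WB}_{\Lambda}(\mu_0,\mu_1)^2
  \;=\; \inf_{(\mu_t,\,q_t)} \int_0^1\!\!\int_\Omega
        J_{\Lambda}\!\left(q_t \,\middle|\, \mu_t\right) \mathrm{d}t,
  \qquad \text{s.t.}\quad \partial_t \mu_t + \mathcal{D}^* q_t = 0,
\]
% where q_t collects the momentum-type variables, J_Lambda is the weighted
% action density, and D^* encodes the matricial continuity equation; the
% joint convexity of the action is what makes the problem convex.
```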
Engineers are often faced with the decision of selecting the most appropriate model for simulating the behavior of an engineered system from among a candidate set of models. Experimental monitoring data can generate significant value by supporting engineers in such decisions. Such data can be leveraged within a Bayesian model updating process, enabling the uncertainty-aware calibration of any candidate model. The model selection task can subsequently be cast as a problem of decision-making under uncertainty, where one seeks to select the model that yields an optimal balance between the reward associated with model precision, in terms of recovering target Quantities of Interest (QoI), and the cost of each model, in terms of complexity and compute time. In this work, we examine the model selection task by means of Bayesian decision theory, through the prism of the availability of models of varying refinement, and thus varying levels of fidelity. In doing so, we offer an exemplary application of this framework to the IMAC-MVUQ Round-Robin Challenge. Numerical investigations show that the outcome of model selection depends on the target QoI.
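An illustrative sketch (ours; the utilities, costs, and posteriors are made-up stand-ins, not the challenge data) of the decision rule described above: each candidate model is scored by a posterior expected reward for recovering the target QoI minus its cost, and the maximizer is selected.

```python
# Illustrative sketch (ours) of decision-theoretic model selection: reward for
# QoI precision minus a complexity/compute cost, maximized over candidates.
import numpy as np

rng = np.random.default_rng(0)

def expected_utility(qoi_posterior_samples, qoi_target, cost, reward_scale=1.0):
    """Reward = -posterior expected squared QoI error; net of the model cost."""
    err = np.mean((qoi_posterior_samples - qoi_target) ** 2)
    return -reward_scale * err - cost

models = {   # hypothetical calibrated candidates of increasing fidelity
    "low_fidelity":  dict(samples=rng.normal(1.10, 0.30, 5000), cost=0.01),
    "mid_fidelity":  dict(samples=rng.normal(1.02, 0.10, 5000), cost=0.10),
    "high_fidelity": dict(samples=rng.normal(1.00, 0.05, 5000), cost=1.00),
}
qoi_target = 1.0
best = max(models, key=lambda m: expected_utility(
    models[m]["samples"], qoi_target, models[m]["cost"]))
print(best)   # the optimum shifts with the reward/cost trade-off and the QoI
```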
We investigate the randomized decision tree complexity of a specific class of read-once threshold functions. A read-once threshold formula can be defined by a rooted tree, every internal node of which is labeled by a threshold function $T_k^n$ (which outputs 1 if and only if at least $k$ of its $n$ input bits are 1) and every leaf by a distinct variable. Such a tree defines a Boolean function in a natural way. We focus on the randomized decision tree complexity of such functions when the underlying tree is uniform, with all internal nodes labeled by the same threshold function. We prove lower bounds of the form $c(k,n)^d$, where $d$ is the depth of the tree. We also treat trees with alternating levels of AND and OR gates separately, and show asymptotically optimal bounds, extending the known bounds for the binary case.
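For concreteness, here is a small sketch (ours) of the kind of randomized evaluation procedure whose query count such lower bounds constrain: a uniform tree of $T_k^n$ gates is evaluated by querying subtrees in random order and stopping as soon as the gate's value is determined.

```python
# Sketch (ours) of randomized short-circuit evaluation of a uniform tree of
# T_k^n gates, counting how many leaves are queried.
import random

def eval_tree(bits, n, k, lo, size, queries):
    """Evaluate a uniform threshold tree over bits[lo:lo+size]."""
    if size == 1:                      # leaf: query the input bit
        queries[0] += 1
        return bits[lo]
    child = size // n
    ones = zeros = 0
    for i in random.sample(range(n), n):   # visit subtrees in random order
        if eval_tree(bits, n, k, lo + i * child, child, queries):
            ones += 1
        else:
            zeros += 1
        if ones >= k:                  # threshold reached: value is 1
            return 1
        if zeros > n - k:              # threshold unreachable: value is 0
            return 0
    return int(ones >= k)              # unreachable; kept for safety

bits = [random.randint(0, 1) for _ in range(3 ** 4)]   # n = 3, depth d = 4
queries = [0]
value = eval_tree(bits, n=3, k=2, lo=0, size=len(bits), queries=queries)
print(value, queries[0], "of", len(bits), "leaves queried")
```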
In recent decades, a growing number of discoveries in fields of mathematics have been assisted by computer algorithms, primarily for exploring large parameter spaces that humans would take too long to investigate. As computers and algorithms become more powerful, an intriguing possibility arises: the interplay between human intuition and computer algorithms can lead to discoveries of novel mathematical concepts that would otherwise remain elusive. To realize this perspective, we have developed a massively parallel computer algorithm that discovers an unprecedented number of continued fraction formulas for fundamental mathematical constants. The sheer number of formulas discovered by the algorithm unveils a novel mathematical structure that we call the conservative matrix field. Such matrix fields (1) unify thousands of existing formulas, (2) generate infinitely many new formulas, and most importantly, (3) lead to unexpected relations between different mathematical constants, including multiple integer values of the Riemann zeta function. Conservative matrix fields also enable new mathematical proofs of irrationality. In particular, we can use them to generalize the celebrated proof by Ap\'ery of the irrationality of $\zeta(3)$. Utilizing thousands of personal computers worldwide, our computer-supported research strategy demonstrates the power of experimental mathematics, highlighting the prospects of large-scale computational approaches to tackle longstanding open problems and discover unexpected connections across diverse fields of science.
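A toy version (ours) of the verification step that underlies such a search: evaluate a polynomial continued fraction to high precision and compare it against a target constant. Here we check Brouncker's classical formula $\frac{4}{\pi} = 1 + \cfrac{1^2}{2 + \cfrac{3^2}{2 + \cfrac{5^2}{2 + \dotsb}}}$ rather than one of the newly discovered formulas.

```python
# Toy verification step (ours): evaluate a polynomial continued fraction to
# high precision and compare it with a target constant.  Brouncker's classical
# formula is used here, not one of the newly discovered ones.
from mpmath import mp, mpf, pi

mp.dps = 50                            # working precision in decimal digits

def continued_fraction(a, b, depth):
    """Evaluate b(0) + a(1)/(b(1) + a(2)/(b(2) + ...)) truncated at `depth`."""
    value = mpf(b(depth))
    for n in range(depth, 0, -1):      # fold the fraction from the bottom up
        value = b(n - 1) + a(n) / value
    return value

approx = continued_fraction(a=lambda n: (2 * n - 1) ** 2,   # partial numerators
                            b=lambda n: 1 if n == 0 else 2, # partial denominators
                            depth=10000)
print(approx - 4 / pi)                 # residual shrinks as depth grows
```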
The categorical Gini correlation, $\rho_g$, was proposed by Dang et al. to measure the dependence between a categorical variable, $Y$, and a numerical variable, $X$. It has been shown that $\rho_g$ has more appealing properties than existing dependence measures. In this paper, we develop the jackknife empirical likelihood (JEL) method for $\rho_g$. Confidence intervals for the Gini correlation are constructed without estimating the asymptotic variance. Adjusted and weighted JEL are explored to improve the performance of the standard JEL. Simulation studies show that our methods are competitive with existing methods in terms of coverage accuracy and shortness of confidence intervals. The proposed methods are illustrated in applications to two real datasets.
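A generic sketch (ours) of the JEL recipe the paper applies to $\rho_g$: form jackknife pseudo-values of a point estimator, then run standard empirical likelihood for their mean; calibrating the log-ratio against a $\chi^2_1$ quantile gives confidence intervals without a variance estimate. The Pearson correlation below is a stand-in statistic, not $\rho_g$.

```python
# Generic JEL sketch (ours): jackknife pseudo-values of a point estimator,
# then empirical likelihood for their mean.  The Pearson correlation below is
# a stand-in statistic, not the categorical Gini correlation rho_g.
import numpy as np
from scipy.optimize import brentq

def jackknife_pseudo_values(data, estimator):
    n = len(data)
    loo = np.array([estimator(np.delete(data, i, axis=0)) for i in range(n)])
    return n * estimator(data) - (n - 1) * loo

def el_log_ratio(v, theta):
    """-2 log empirical likelihood ratio for the mean of v at theta."""
    z = v - theta
    if z.min() * z.max() >= 0:         # theta outside the convex hull
        return np.inf
    # Lagrange multiplier: solve sum z_i / (1 + lam * z_i) = 0 on the interval
    # where all empirical likelihood weights stay positive.
    lam = brentq(lambda l: np.sum(z / (1 + l * z)),
                 (-1 + 1e-10) / z.max(), (-1 + 1e-10) / z.min())
    return 2 * np.sum(np.log1p(lam * z))

data = np.random.default_rng(1).normal(size=(40, 2))
v = jackknife_pseudo_values(data, estimator=lambda d: np.corrcoef(d.T)[0, 1])
print(el_log_ratio(v, theta=0.0))      # compare with a chi^2_1 quantile for a CI
```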
A generalized unbalanced optimal transport distance ${\rm WB}_{\Lambda}$ on matrix-valued measures $\mathcal{M}(\Omega,\mathbb{S}_+^n)$ was defined in [arXiv:2011.05845] \`{a} la Benamou-Brenier, extending the Kantorovich-Bures and the Wasserstein-Fisher-Rao distances. In this work, we investigate the convergence properties of the discrete transport problems associated with ${\rm WB}_{\Lambda}$. We first present a convergence framework for abstract discretizations. Then, we propose a specific discretization scheme that fits into this framework, under the assumption that the initial and final distributions are absolutely continuous with respect to the Lebesgue measure. Moreover, thanks to the static formulation, we show that this assumption can be removed for the Wasserstein-Fisher-Rao distance.
We introduce the extremal range, a local statistic for studying the spatial extent of extreme events in random fields on $\mathbb{R}^2$. Conditioned on exceedance of a high threshold at a location $s$, the extremal range at $s$ is the random variable defined as the smallest distance from $s$ to a location where there is a non-exceedance. We leverage tools from excursion-set theory to study distributional properties of the extremal range, propose parametric models, and predict the median extremal range at extreme threshold levels. The extremal range captures the rate at which the spatial extent of conditional extreme events scales at increasingly high thresholds, and we relate its distributional properties to the bivariate tail dependence coefficient and to the extremal index of time series in classical extreme-value theory. Consistent estimation of the distribution function of the extremal range for stationary random fields is proven. For non-stationary random fields, we implement generalized additive median regression to predict extremal-range maps at very high threshold levels. An application to two large daily temperature datasets, namely reanalyses and climate-model simulations for France, highlights decreasing extremal dependence for increasing threshold levels and reveals strong differences in joint tail decay rates between reanalyses and simulations.
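A schematic estimator (ours; the smoothed Gaussian surrogate field and the pixel-unit distances are illustrative assumptions) of the empirical extremal range on a gridded field: at every cell exceeding the threshold, compute the distance to the nearest non-exceeding cell.

```python
# Schematic empirical extremal range (ours) on a gridded field: distance from
# each exceedance cell to the nearest non-exceedance, in pixel units; the
# smoothed Gaussian noise is only a surrogate for a real random field.
import numpy as np
from scipy.ndimage import distance_transform_edt, gaussian_filter

rng = np.random.default_rng(0)
field = gaussian_filter(rng.normal(size=(200, 200)), sigma=5)
u = np.quantile(field, 0.95)           # high threshold

exceed = field > u                     # excursion set above the threshold
dist = distance_transform_edt(exceed)  # distance to the nearest non-exceedance
extremal_range = dist[exceed]          # one value per exceedance location
print(np.median(extremal_range))       # empirical median extremal range at u
```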