In the Milky Way, the distribution of stars in the $[\alpha/\mathrm{Fe}]$ vs. $[\mathrm{Fe/H}]$ and $[\mathrm{Fe/H}]$ vs. age planes holds essential information about the history of star formation, accretion, and dynamical evolution of the Galactic disk. We investigate these planes by applying novel statistical methods, copulas and elicitable maps, to the ages and abundances of red giants in the APOGEE survey. We find that the low- and high-$\alpha$ disk stars separate cleanly in copula space, and we use this to provide an automated, purely statistical separation of the $\alpha$ sequences. This separation reveals that the high-$\alpha$ disk terminates, at high $[\mathrm{Fe/H}]$, at the same $[\alpha/\mathrm{Fe}]$ and age at which the low-$\alpha$ disk begins at low $[\mathrm{Fe/H}]$, thus supporting a sequential formation scenario for the high- and low-$\alpha$ disks. We then combine copulas with elicitable maps to precisely obtain the correlation between stellar age $\tau$ and metallicity $[\mathrm{Fe/H}]$ conditional on Galactocentric radius $R$ and height $z$ in the range $0 < R < 20$ kpc and $|z| < 2$ kpc. The resulting trends in the age-metallicity correlation with radius, height, and $[\alpha/\mathrm{Fe}]$ demonstrate a correlation of $\approx 0$ wherever kinematically cold orbits dominate, while the naively expected negative correlation is present where kinematically hot orbits dominate. This is consistent with the effects of spiral-driven radial migration, which must be strong enough to completely flatten the age-metallicity structure of the low-$\alpha$ disk.
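As a rough illustration of the copula idea (a sketch with synthetic stand-in data, not the authors' APOGEE pipeline): copula-space coordinates are simply rank-transformed variables, which uniformize the marginals while preserving the dependence structure, and a rank correlation then summarizes the monotone dependence analogous to the age-metallicity correlation studied here.

```python
import numpy as np
from scipy.stats import rankdata, spearmanr

# Synthetic stand-ins for stellar age tau (Gyr) and [Fe/H]; the real
# analysis uses APOGEE red giants conditioned on Galactocentric R and z.
rng = np.random.default_rng(0)
tau = rng.uniform(1.0, 12.0, size=5000)
feh = -0.05 * tau + rng.normal(0.0, 0.2, size=5000)

# Empirical copula transform: map each variable to its rank scaled to (0, 1).
# The marginals become uniform; only the dependence structure remains.
u = rankdata(tau) / (len(tau) + 1)
v = rankdata(feh) / (len(feh) + 1)

# A rank correlation such as Spearman's rho summarizes the copula's
# monotone dependence between age and metallicity.
rho, _ = spearmanr(u, v)
print(f"Spearman correlation in copula space: {rho:.3f}")
```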
We introduce TeraHAC, a $(1+\epsilon)$-approximate hierarchical agglomerative clustering (HAC) algorithm that scales to trillion-edge graphs. Our algorithm is based on a new approach to computing $(1+\epsilon)$-approximate HAC, which combines the nearest-neighbor chain algorithm with the relaxed merge rule that defines $(1+\epsilon)$-approximate HAC. This approach allows us to partition the graph among multiple machines and make significant progress in computing the clustering within each partition before any communication with other partitions is needed. We evaluate TeraHAC on a number of real-world and synthetic graphs of up to 8 trillion edges. We show that TeraHAC requires over 100x fewer rounds than previously known approaches for computing HAC. It is up to 8.3x faster than SCC, the state-of-the-art distributed algorithm for hierarchical clustering, while achieving 1.16x higher quality. In fact, TeraHAC essentially retains the quality of the celebrated HAC algorithm while significantly improving the running time.
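For intuition about the classical sequential building block (a minimal in-memory sketch, not TeraHAC's distributed $(1+\epsilon)$-approximate algorithm), here is exact average-linkage HAC via the nearest-neighbor chain, assuming a small dense distance matrix:

```python
import numpy as np

def nn_chain_average_linkage(dist):
    """Exact average-linkage HAC via the nearest-neighbor chain algorithm.

    dist: symmetric (n, n) array of pairwise distances (diagonal ignored).
    Returns merges as (i, j, linkage_distance); the merged cluster keeps
    the smaller of the two indices.
    """
    d = dist.astype(float).copy()
    n = d.shape[0]
    np.fill_diagonal(d, np.inf)
    size = np.ones(n, dtype=int)
    active = set(range(n))
    chain, merges = [], []
    while len(active) > 1:
        if not chain:
            chain.append(min(active))
        while True:
            a = chain[-1]
            # Nearest active neighbor of the chain's tip.
            b = min((x for x in active if x != a), key=lambda x: d[a, x])
            if len(chain) >= 2 and b == chain[-2]:
                break  # a, b are reciprocal nearest neighbors: safe to merge
            chain.append(b)
        del chain[-2:]
        i, j = min(a, b), max(a, b)
        merges.append((i, j, d[i, j]))
        # Lance-Williams update for average linkage: fold cluster j into i.
        for x in active - {i, j}:
            d[i, x] = d[x, i] = (size[i] * d[i, x] + size[j] * d[j, x]) / (size[i] + size[j])
        size[i] += size[j]
        active.remove(j)
        d[j, :] = d[:, j] = np.inf
    return merges

# Tiny usage example on six random points in the plane.
pts = np.random.default_rng(1).normal(size=(6, 2))
D = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
for merge in nn_chain_average_linkage(D):
    print(merge)
```

The chain invariant (each element is the nearest neighbor of its predecessor) is what lets a distributed variant make many safe merges inside a partition before communicating; average linkage is reducible, so merges at reciprocal nearest neighbors never invalidate the rest of the chain.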
A $c$-labeling $\phi: V(G) \rightarrow \{1, 2, \ldots, c \}$ of a graph $G$ is distinguishing if, for every non-trivial automorphism $\pi$ of $G$, there is some vertex $v$ such that $\phi(v) \neq \phi(\pi(v))$. The distinguishing number of $G$, denoted $D(G)$, is the smallest $c$ such that $G$ has a distinguishing $c$-labeling. We consider a compact version of Tyshkevich's graph decomposition theorem in which trivial components are maximally combined to form a complete graph or a graph of isolated vertices. Suppose the compact canonical decomposition of $G$ is $G_{k} \circ G_{k-1} \circ \cdots \circ G_1 \circ G_0$. We prove that $\phi$ is a distinguishing labeling of $G$ if and only if its restriction to $V(G_i)$ is a distinguishing labeling of $G_i$ for each $i = 0, \ldots, k$. Thus, $D(G) = \max \{D(G_i) : i = 0, \ldots, k \}$. We then present an algorithm that computes the distinguishing number of a unigraph in linear time.
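To make the definition concrete, a brute-force check of $D(G)$ on very small graphs (exponential time; entirely different from the paper's linear-time unigraph algorithm), using networkx to enumerate automorphisms:

```python
from itertools import product
import networkx as nx
from networkx.algorithms.isomorphism import GraphMatcher

def distinguishing_number(G):
    """Smallest c admitting a labeling broken by every nontrivial automorphism.

    Brute force over all c-labelings; only sensible for tiny graphs.
    """
    # Automorphisms of G are the isomorphisms from G to itself.
    autos = [a for a in GraphMatcher(G, G).isomorphisms_iter()
             if any(a[v] != v for v in G)]
    nodes = list(G)
    for c in range(1, len(nodes) + 1):
        for labels in product(range(c), repeat=len(nodes)):
            phi = dict(zip(nodes, labels))
            # phi is distinguishing iff every nontrivial automorphism
            # moves some vertex to a differently labeled vertex.
            if all(any(phi[v] != phi[a[v]] for v in nodes) for a in autos):
                return c
    return len(nodes)

# Known small cases: D(C_5) = 3, D(C_6) = 2, D(K_4) = 4.
print(distinguishing_number(nx.cycle_graph(5)))     # 3
print(distinguishing_number(nx.cycle_graph(6)))     # 2
print(distinguishing_number(nx.complete_graph(4)))  # 4
```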
A slowly decaying Kolmogorov $n$-width of the solution manifold of a parametric partial differential equation precludes the realization of efficient linear projection-based reduced-order models, owing to the high dimensionality of the reduced space needed to approximate the solution manifold with sufficient accuracy. To address this problem, neural networks of various architectures have been employed to build accurate nonlinear regressions of solution manifolds. However, the majority of such implementations are non-intrusive black-box surrogate models, and only some of them reduce the dimension from the number of degrees of freedom of the discretized parametric models to a latent dimension. We present a new intrusive and explicable methodology for reduced-order modelling that employs neural networks for solution manifold approximation but does not discard the underlying physical and numerical models in the predictive/online stage. We focus on autoencoders used to further compress the dimensionality of linear approximants of solution manifolds, ultimately achieving a nonlinear dimension reduction. Having obtained an accurate nonlinear approximant, we seek the solutions on the latent manifold with the residual-based nonlinear least-squares Petrov-Galerkin method, suitably hyper-reduced so as to be independent of the number of degrees of freedom. We also develop new adaptive hyper-reduction strategies and employ local nonlinear approximants. We test our methodology on two nonlinear time-dependent parametric benchmarks: a supersonic flow past a NACA airfoil with varying Mach number and an incompressible turbulent flow around the Ahmed body with varying slant angle.
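The online stage can be pictured as follows. This is a toy sketch with placeholder decoder and residual (hypothetical stand-ins, not the paper's trained autoencoder or hyper-reduced residual): the latent coordinates are found by minimizing the full-order residual composed with the decoder, which is the essence of a residual-based nonlinear least-squares Petrov-Galerkin step.

```python
import numpy as np
from scipy.optimize import least_squares

N, k = 50, 2  # full-order and latent dimensions (toy sizes)
W = np.random.default_rng(0).normal(size=(N, k))

def decoder(z):
    """Placeholder nonlinear decoder g: latent z -> full state (N,)."""
    return np.tanh(W @ z)

def residual(u):
    """Placeholder discrete residual r(u) of the full-order model."""
    return u - 0.1 * np.roll(u, 1) - 0.05

# Nonlinear least-squares Petrov-Galerkin in the latent space:
# minimize || r(g(z)) ||_2 over the latent coordinates z.
sol = least_squares(lambda z: residual(decoder(z)), x0=np.zeros(k))
print("latent solution:", sol.x, "residual norm:", np.linalg.norm(sol.fun))
```

In the actual method, hyper-reduction replaces `residual` with a cheap sampled surrogate so the cost of each solve is independent of $N$.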
A function $f : U \to \{0,\ldots,n-1\}$ is a minimal perfect hash function for a set $S \subseteq U$ of size $n$ if $f$ bijectively maps $S$ onto the first $n$ natural numbers. These functions are important for many practical applications in computing, such as search engines, computer networks, and databases. Several algorithms have been proposed to build minimal perfect hash functions that scale well to large sets, retain fast evaluation time, and take very little space, e.g., 2-3 bits/key. PTHash is one such algorithm, achieving very fast evaluation in compressed space, typically several times faster than other techniques. In this work, we propose a new construction algorithm for PTHash enabling: (1) multi-threading, to build functions either more quickly or more space-efficiently, and (2) external-memory processing, to scale to inputs much larger than the available internal memory. Only a few other algorithms in the literature share these features, despite their big practical impact. We conduct an extensive experimental assessment on large real-world string collections and show that, compared to other techniques, PTHash is competitive in construction time and space consumption, while retaining 2-6$\times$ better lookup time.
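To illustrate the object being built (not PTHash's actual construction, which searches short per-bucket "pilot" values and compresses them to a few bits per key), here is a brute-force search for a single seed that makes a hash function bijective on a tiny key set:

```python
import hashlib

def h(key: str, seed: int, n: int) -> int:
    """Seeded hash of key into {0, ..., n-1} (illustrative, not PTHash's)."""
    digest = hashlib.blake2b(key.encode(), salt=seed.to_bytes(8, "little")).digest()
    return int.from_bytes(digest[:8], "little") % n

def find_mphf_seed(S):
    """Smallest seed making key -> h(key, seed, |S|) bijective on S.

    Expected O(e^n) trials for a single global seed, which is why real
    constructions split S into buckets and search per-bucket seeds instead.
    """
    n = len(S)
    seed = 0
    while len({h(key, seed, n) for key in S}) != n:
        seed += 1
    return seed

keys = ["cat", "dog", "emu", "fox", "gnu"]
s = find_mphf_seed(keys)
print(s, sorted(h(k, s, len(keys)) for k in keys))  # a permutation of 0..4
```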
We show that the minimax sample complexity for estimating the pseudo-spectral gap $\gamma_{\mathsf{ps}}$ of an ergodic Markov chain to within constant multiplicative error is of the order of $$\tilde{\Theta}\left( \frac{1}{\gamma_{\mathsf{ps}} \pi_{\star}} \right),$$ where $\pi_\star$ is the minimum stationary probability. This recovers the known bound in the reversible setting for estimating the absolute spectral gap [Hsu et al., 2019] and resolves an open problem of Wolfer and Kontorovich [2019]. Furthermore, we strengthen the known empirical procedure by making it fully adaptive to the data, tightening the confidence intervals, and reducing the computational complexity. Along the way, we derive new properties of the pseudo-spectral gap and introduce the notion of a reversible dilation of a stochastic matrix.
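For intuition, the pseudo-spectral gap can be evaluated directly from its definition when the transition matrix is known (a sketch; the paper's subject is estimating $\gamma_{\mathsf{ps}}$ from a single observed trajectory, which is much harder):

```python
import numpy as np

def stationary(P):
    """Stationary distribution of an ergodic stochastic matrix P."""
    vals, vecs = np.linalg.eig(P.T)
    pi = np.real(vecs[:, np.argmax(np.real(vals))])
    return pi / pi.sum()

def pseudo_spectral_gap(P, k_max=50):
    """gamma_ps = max_k gamma((P*)^k P^k) / k, evaluated directly.

    P* is the time reversal, P*(i, j) = pi(j) P(j, i) / pi(i); the product
    (P*)^k P^k is reversible with respect to pi, so its spectral gap is one
    minus its second-largest eigenvalue.
    """
    pi = stationary(P)
    P_star = (pi[None, :] * P.T) / pi[:, None]
    best = 0.0
    for k in range(1, k_max + 1):
        M = np.linalg.matrix_power(P_star, k) @ np.linalg.matrix_power(P, k)
        second = np.sort(np.real(np.linalg.eigvals(M)))[-2]
        best = max(best, (1.0 - second) / k)
    return best

# A small non-reversible ergodic chain on three states.
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.4, 0.1, 0.5]])
print(pseudo_spectral_gap(P))
```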
We analyze an algorithmic question about immersion theory: for which $m$, $n$, and $\mathrm{CAT}=\mathbf{Diff}$ or $\mathbf{PL}$ is the question of whether an $m$-dimensional $\mathrm{CAT}$-manifold is immersible in $\mathbb{R}^n$ decidable? As a corollary, we show that the smooth embeddability of an $m$-manifold with boundary in $\mathbb{R}^n$ is undecidable when $n-m$ is even and $11m \geq 10n+1$.
Binary codes of length $n$ may be viewed as subsets of vertices of the Boolean hypercube $\{0,1\}^n$. The ability of a linear error-correcting code to recover erasures is connected to the influences of particular monotone Boolean functions. These functions provide insight into the role that particular coordinates play in a code's erasure repair capability. In this paper, we consider the influences of the coordinates of a code directly. We describe a family of codes, called codes with minimum disjoint support, for which all influences may be determined. As a consequence, we find the influences of repetition codes and certain distinct-weight codes. Computing influences is typically circumvented by appealing to the transitivity of the automorphism group of the code. Some of the codes considered here fail to meet the transitivity conditions required for these standard approaches, yet we can compute their influences directly.
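As a concrete instance of coordinate influence (a stand-in computation; the paper's monotone functions and minimum-disjoint-support codes are more general), brute force over the hypercube with the length-$n$ repetition code's erasure repair as the example:

```python
from itertools import product

def influence(f, n, i):
    """I_i(f) = Pr_x[f(x) != f(x with bit i flipped)] under uniform x."""
    count = 0
    for x in product((0, 1), repeat=n):
        y = list(x)
        y[i] ^= 1
        count += f(x) != f(tuple(y))
    return count / 2 ** n

# Stand-in monotone function from erasure repair of the repetition code:
# x_i = 1 means coordinate i is erased, and a transmitted codeword is
# recoverable if and only if at least one position survives.
n = 5
f = lambda x: int(any(b == 0 for b in x))
print([influence(f, n, i) for i in range(n)])  # each equals 2^(1-n) = 0.0625
```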
We investigate a linearised Calder\'on problem in a two-dimensional bounded simply connected $C^{1,\alpha}$ domain $\Omega$. After extending the linearised problem to $L^2(\Omega)$ perturbations, we orthogonally decompose $L^2(\Omega) = \oplus_{k=0}^\infty \mathcal{H}_k$ and prove Lipschitz stability on each of the infinite-dimensional $\mathcal{H}_k$ subspaces. In particular, $\mathcal{H}_0$ is the space of square-integrable harmonic perturbations. This appears to be the first Lipschitz stability result for infinite-dimensional spaces of perturbations in the context of the (linearised) Calder\'on problem. Previous optimal estimates with respect to the operator norm of the data map have been of logarithmic type in infinite-dimensional settings. The remarkable improvement is enabled by using the Hilbert-Schmidt norm for the Neumann-to-Dirichlet boundary map and its Fr\'echet derivative with respect to the conductivity coefficient. We also derive a direct reconstruction method that inductively yields the orthogonal projections of a general $L^2(\Omega)$ perturbation onto the $\mathcal{H}_k$ spaces, hence reconstructing any $L^2(\Omega)$ perturbation.
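Schematically, and in our own notation rather than the paper's, the per-subspace Lipschitz stability results have the shape $$\|\delta\sigma\|_{L^2(\Omega)} \leq C_k \, \big\| D\Lambda(\sigma_0)[\delta\sigma] \big\|_{\mathrm{HS}} \qquad \text{for all } \delta\sigma \in \mathcal{H}_k,$$ where $D\Lambda(\sigma_0)$ denotes the Fr\'echet derivative of the Neumann-to-Dirichlet map at the background conductivity, $\|\cdot\|_{\mathrm{HS}}$ is the Hilbert-Schmidt norm, and the constant $C_k$ may depend on the subspace index $k$.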
Given a graph $G$, the number of its vertices is denoted by $n(G)$ and the number of its edges by $m(G)$. An independent set in a graph is a set of vertices no two of which are adjacent, and the size of a maximum independent set is denoted by $\alpha(G)$. A matching in a graph is a set of edges no two of which share a common vertex, and the maximum matching size is denoted by $\mu(G)$. If $\alpha(G) + \mu(G) = n(G)$, then the graph $G$ is called a K\"{o}nig-Egerv\'{a}ry graph. For a graph $G$ with degree sequence $d_1 \leq d_2 \leq \cdots \leq d_n$, the annihilation number $a(G)$ is defined as the largest integer $k$ such that the sum of the first $k$ degrees in the sequence is less than or equal to $m(G)$ (Pepper, 2004). It is known that $\alpha(G) \leq a(G)$ for any graph $G$. Our goal is to estimate the difference between these two parameters. Specifically, we prove a series of inequalities, including $a(G) - \alpha(G) \leq \frac{\mu(G) - 1}{2}$ for trees, $a(G) - \alpha(G) \leq 2 + \mu(G) - 2\sqrt{1 + \mu(G)}$ for bipartite graphs, and $a(G) - \alpha(G) \leq \mu(G) - 2$ for K\"{o}nig-Egerv\'{a}ry graphs. Furthermore, we demonstrate that these inequalities are tight upper bounds for the difference between the annihilation and independence numbers for every possible value of $\mu(G)$.
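A quick numerical check of the quantities involved (a sketch using networkx; for bipartite graphs such as trees, $\alpha(G) = n(G) - \mu(G)$ by the K\"{o}nig-Egerv\'{a}ry property, which avoids an exact independent-set search):

```python
import networkx as nx

def annihilation_number(G):
    """Largest k such that the k smallest degrees sum to at most m(G)."""
    degrees = sorted(d for _, d in G.degree())
    m = G.number_of_edges()
    total = k = 0
    for d in degrees:
        if total + d > m:
            break
        total, k = total + d, k + 1
    return k

# Trees are bipartite, hence Konig-Egervary: alpha(G) = n(G) - mu(G).
T = nx.path_graph(7)  # the path P_7
mu = len(nx.max_weight_matching(T, maxcardinality=True))
alpha = T.number_of_nodes() - mu
a = annihilation_number(T)
print(a, alpha, mu)   # 4 4 3: a - alpha = 0 <= (mu - 1) / 2 = 1
```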
We consider finite element approximations to the optimal constant for the Hardy inequality with exponent $p=2$ in bounded domains of dimension $n=1$ or $n\geq 3$. For finite element spaces of piecewise linear and continuous functions on a mesh of size $h$, we prove that the approximate Hardy constant, $S_h^n$, converges to the optimal Hardy constant $S^n$ no slower than $O(1/\vert \log h \vert)$. We also show that the convergence is no faster than $O(1/\vert \log h \vert^2)$ if $n=1$, or if $n\geq 3$, provided the domain is the unit ball and the finite element discretization exploits the rotational symmetry of the problem. Our estimates are compared to exact values for $S_h^n$ obtained computationally.
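For orientation, and assuming the classical point-singularity Hardy inequality is the one meant here, the inequality with $p=2$ and its optimal constant read $$\int_{\Omega} |\nabla u|^2 \, dx \;\geq\; S^n \int_{\Omega} \frac{u^2}{|x|^2} \, dx, \qquad S^n = \left(\frac{n-2}{2}\right)^2,$$ which is non-trivial precisely for $n=1$ (where $S^1 = 1/4$) and $n \geq 3$, matching the dimensions treated; the optimal constant is classically not attained, which is consistent with the slow, logarithmic-in-$h$ convergence of its finite element approximations.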