Maximizing a non-negative, monotone, submodular function $f$ over $n$ elements under a cardinality constraint $k$ (SMCC) is a well-studied NP-hard problem. It has important applications in, e.g., machine learning and influence maximization. Though the theoretical problem admits polynomial-time approximation algorithms, solving it in practice often involves frequently querying submodular functions that are expensive to compute. This has motivated significant research into designing parallel approximation algorithms in the adaptive complexity model; adaptive complexity (adaptivity) measures the number of sequential rounds of $\text{poly}(n)$ function queries an algorithm requires. The state-of-the-art algorithms can achieve $(1-\frac{1}{e}-\varepsilon)$-approximate solutions with $O(\frac{1}{\varepsilon^2}\log n)$ adaptivity, which approaches the known adaptivity lower bounds. However, the $O(\frac{1}{\varepsilon^2} \log n)$ adaptivity only applies to maximizing worst-case functions that are unlikely to appear in practice. Thus, in this paper, we consider the special class of $p$-superseparable submodular functions, which places a reasonable constraint on $f$, parameterized by $p$, is more amenable to maximization, and has real-world applicability. Our main contribution is the algorithm LS+GS, a finer-grained version of the existing LS+PGB algorithm, designed for instances of SMCC when $f$ is $p$-superseparable; it achieves an expected $(1-\frac{1}{e}-\varepsilon)$-approximate solution with $O(\frac{1}{\varepsilon^2}\log(p k))$ adaptivity, independent of $n$. Additionally, unrelated to $p$-superseparability, our LS+GS algorithm uses only $O(\frac{n}{\varepsilon} + \frac{\log n}{\varepsilon^2})$ oracle queries, which has an improved dependence on $\varepsilon^{-1}$ over the state-of-the-art LS+PGB; this is achieved through the design of a novel thresholding subroutine.
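As a point of reference for the query/adaptivity trade-off discussed above, the following is a minimal sketch of the classical greedy baseline for SMCC (not the LS+GS or LS+PGB algorithms); the coverage function is a hypothetical stand-in for an expensive monotone submodular oracle. Greedy attains the $(1-\frac{1}{e})$ guarantee but needs $k$ fully sequential rounds of queries, which is exactly the adaptivity cost that parallel algorithms aim to reduce.
\begin{verbatim}
# Classical greedy baseline for SMCC (illustration only; not LS+GS):
# for k rounds, add the element with the largest marginal gain.
# The coverage function below is a hypothetical stand-in for an
# expensive monotone submodular oracle.
def coverage_f(S, sets):
    covered = set()
    for i in S:
        covered |= sets[i]
    return len(covered)

def greedy_smcc(f, ground, k):
    S = set()
    for _ in range(k):                      # k sequential (adaptive) rounds
        gains = {e: f(S | {e}) - f(S) for e in ground - S}
        best = max(gains, key=gains.get)
        if gains[best] <= 0:
            break
        S.add(best)
    return S

sets = {0: {1, 2, 3}, 1: {3, 4}, 2: {5}, 3: {1, 5, 6}}
f = lambda S: coverage_f(S, sets)
print(greedy_smcc(f, set(sets), k=2))       # e.g. {0, 3}
\end{verbatim}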
A new $H(\textrm{divdiv})$-conforming finite element is presented, which avoids the need for super-smoothness by redistributing the degrees of freedom to edges and faces. This leads to a hybridizable mixed method with superconvergence for the biharmonic equation. Moreover, new finite element divdiv complexes are established. Finally, new weak Galerkin and $C^0$ discontinuous Galerkin methods for the biharmonic equation are derived.
An average-case variant of the $k$-SUM conjecture asserts that finding $k$ numbers that sum to 0 in a list of $r$ random numbers, each of the order $r^k$, cannot be done in much less than $r^{\lceil k/2 \rceil}$ time. On the other hand, in the dense regime of parameters, where the list contains more numbers and many solutions exist, the complexity of finding one of them can be significantly improved by Wagner's $k$-tree algorithm. Such algorithms for $k$-SUM in the dense regime have many applications, notably in cryptanalysis. In this paper, assuming the average-case $k$-SUM conjecture, we prove that known algorithms are essentially optimal for $k= 3,4,5$. For $k>5$, we prove the optimality of the $k$-tree algorithm for a limited range of parameters. We also prove similar results for $k$-XOR, where the sum is replaced with exclusive or. Our results are obtained by a self-reduction that, given an instance of $k$-SUM which has a few solutions, produces from it many instances in the dense regime. We solve each of these instances using the dense $k$-SUM oracle, and hope that a solution to a dense instance also solves the original problem. We deal with potentially malicious oracles (that repeatedly output correlated useless solutions) by an obfuscation process that adds noise to the dense instances. Using discrete Fourier analysis, we show that the obfuscation eliminates correlations among the oracle's solutions, even though its inputs are highly correlated.
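To make the problem statement concrete, here is a minimal, standard quadratic-time solver for 3-SUM over random numbers taken modulo $m$; it is illustrative only, and is neither the $k$-tree algorithm nor the self-reduction described above.
\begin{verbatim}
# Minimal 3-SUM illustration: among r random numbers modulo m, find three
# (distinct positions) summing to 0 mod m. O(r^2) time via a hash table.
# Plain baseline, not Wagner's k-tree algorithm.
import random

def three_sum_mod(nums, m):
    positions = {}
    for i, x in enumerate(nums):
        positions.setdefault(x % m, []).append(i)
    for i in range(len(nums)):
        for j in range(i + 1, len(nums)):
            need = (-(nums[i] + nums[j])) % m
            for k in positions.get(need, []):
                if k != i and k != j:
                    return i, j, k
    return None

random.seed(0)
r, m = 200, 10**4          # dense regime: many solutions expected
nums = [random.randrange(m) for _ in range(r)]
print(three_sum_mod(nums, m))
\end{verbatim}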
Inference for Variational Autoencoders (VAEs) consists of learning two models: (1) a generative model, which transforms a simple distribution over a latent space into the distribution over observed data, and (2) an inference model, which approximates the posterior of the latent codes given data. The two components are learned jointly via a lower bound on the generative model's log marginal likelihood. In early phases of joint training, the inference model poorly approximates the latent code posteriors. Recent work showed that this causes optimization to get stuck in local optima, negatively impacting the learned generative model. As such, recent work suggests ensuring a high-quality inference model via iterative training: maximizing the objective function with respect to the inference model before every update to the generative model. Unfortunately, iterative training is inefficient, requiring heuristic criteria for reverting from iterative to joint training for speed. Here, we suggest an inference method that trains the generative and inference models independently. It approximates the posterior of the true model a priori; fixing this posterior approximation, we then maximize the lower bound with respect to only the generative model. By conventional wisdom, approximating the true model's posterior should require its prior and likelihood, which are unknown. However, we show that we can compute a deterministic, model-agnostic posterior approximation (MAPA) of the true model's posterior. We then use MAPA to develop a proof-of-concept inference method. We present preliminary results on low-dimensional synthetic data showing that (1) MAPA captures the trend of the true posterior, and (2) our MAPA-based inference performs better density estimation with less computation than baselines. Lastly, we present a roadmap for scaling the MAPA-based inference method to high-dimensional data.
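For reference, the lower bound referred to above is the standard evidence lower bound; in our notation (not the paper's), with generative model $p_\theta$ and inference model $q_\phi$,
\begin{equation*}
\log p_\theta(x) \;\ge\; \mathbb{E}_{q_\phi(z|x)}\left[\log p_\theta(x|z)\right] - \mathrm{KL}\left(q_\phi(z|x)\,\|\,p(z)\right),
\end{equation*}
with equality exactly when $q_\phi(z|x)$ equals the true posterior $p_\theta(z|x)$. Joint training ascends this bound in $(\theta,\phi)$ simultaneously, iterative training re-optimizes $\phi$ before each update of $\theta$, and the approach described above instead substitutes a fixed, model-agnostic approximation for $q_\phi$.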
We give a procedure for computing group-level $(\epsilon, \delta)$-DP guarantees for DP-SGD, when using Poisson sampling or fixed batch size sampling. Up to discretization errors in the implementation, the DP guarantees computed by this procedure are tight (assuming we release every intermediate iterate).
Given an undirected graph $G$ and a constant $\gamma$ $(0 < \gamma \leq 1)$, a quasi-clique is a subgraph of $G$ whose edge density is at least $\gamma$. Two optimization problems can be defined for quasi-cliques: the Maximum Quasi-Clique (MQC) Problem, which finds a quasi-clique with maximum vertex cardinality, and the Densest $k$-Subgraph (DKS) Problem, which, given a fixed cardinality $k$, finds a densest subgraph on $k$ vertices. Most existing approaches to both problems disregard the requirement of connectedness, which may lead to solutions containing isolated components that are meaningless for many real-life applications. To address this issue, we propose two flow-based connectedness constraints to be integrated into known Mixed-Integer Linear Programming (MILP) formulations of the MQC and DKS problems. We compare the performance of the MILP formulations enhanced with our connectedness constraints, in terms of both running time and number of solved instances, against existing approaches that ensure quasi-clique connectedness. Experimental results demonstrate that our constraints are quite competitive, making them valuable for practical applications requiring connectedness.
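The following small check spells out the definition being optimized (illustration only; it is not the MILP formulation with flow-based constraints): a vertex set $S$ induces a $\gamma$-quasi-clique if its edge density is at least $\gamma$, and connectedness of $G[S]$ is the additional requirement the proposed constraints enforce.
\begin{verbatim}
# Check the gamma-quasi-clique condition and connectedness of G[S].
# Illustrative only; not the proposed MILP with flow-based constraints.
def is_connected_quasi_clique(S, edges, gamma):
    S = set(S)
    E = {frozenset(e) for e in edges if set(e) <= S}
    if len(S) < 2:
        return True
    dense = len(E) >= gamma * len(S) * (len(S) - 1) / 2
    seen, stack = set(), [next(iter(S))]    # DFS for connectedness of G[S]
    while stack:
        v = stack.pop()
        if v in seen:
            continue
        seen.add(v)
        stack += [u for e in E for u in e if v in e and u != v]
    return dense and seen == S

edges = [(1, 2), (2, 3), (1, 3), (3, 4), (5, 6)]
print(is_connected_quasi_clique({1, 2, 3, 4}, edges, gamma=0.6))  # True
print(is_connected_quasi_clique({1, 2, 5, 6}, edges, gamma=0.3))  # False: disconnected
\end{verbatim}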
We introduce a constructive analogue of $\Phi$-dimension, a notion of Hausdorff dimension developed using a restricted class of coverings of a set. A class of coverings $\Phi$ is said to be "faithful" to Hausdorff dimension if the $\Phi$-dimension and Hausdorff dimension coincide for every set. We prove a Point-to-Set Principle for $\Phi$-dimension, through which we get Point-to-Set Principles for Hausdorff dimension, continued-fraction dimension, and dimension of Cantor coverings as special cases. Using the Point-to-Set Principle for Cantor coverings and a new technique for the construction of sequences satisfying a certain Kolmogorov complexity condition, we show that the notions of faithfulness of Cantor coverings at the Hausdorff and constructive levels are equivalent. We adapt a result by Albeverio, Ivanenko, Lebid, and Torbin to derive necessary and sufficient conditions for the constructive dimension faithfulness of the coverings generated by the Cantor series expansion. This condition yields two general classes of representations of reals: one whose constructive dimensions are equal to the constructive Hausdorff dimensions, and another whose effective dimensions differ from the effective Hausdorff dimensions, completely classifying Cantor series expansions of reals.
Recently (Elkin, Filtser, Neiman 2017) introduced the concept of a {\it terminal embedding} from one metric space $(X,d_X)$ to another $(Y,d_Y)$ with a set of designated terminals $T\subset X$. Such an embedding $f$ is said to have distortion $\rho\ge 1$ if $\rho$ is the smallest value such that there exists a constant $C>0$ satisfying \begin{equation*} \forall x\in T\ \forall q\in X,\ C d_X(x, q) \le d_Y(f(x), f(q)) \le C \rho d_X(x, q) . \end{equation*} When $X,Y$ are both Euclidean metrics with $Y$ being $m$-dimensional, recently (Narayanan, Nelson 2019), following work of (Mahabadi, Makarychev, Makarychev, Razenshteyn 2018), showed that distortion $1+\epsilon$ is achievable via such a terminal embedding with $m = O(\epsilon^{-2}\log n)$ for $n := |T|$. This generalizes the Johnson-Lindenstrauss lemma, which only preserves distances within $T$ and not to $T$ from the rest of space. The downside of prior work is that evaluating their embedding on some $q\in \mathbb{R}^d$ required solving a semidefinite program with $\Theta(n)$ constraints in~$m$ variables and thus required some superlinear $\mathrm{poly}(n)$ runtime. Our main contribution in this work is to give a new data structure for computing terminal embeddings. We show how to pre-process $T$ to obtain an almost linear-space data structure that supports computing the terminal embedding image of any $q\in\mathbb{R}^d$ in sublinear time $O^* (n^{1-\Theta(\epsilon^2)} + d)$. To accomplish this, we leverage tools developed in the context of approximate nearest neighbor search.
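To make the definition concrete, the following snippet empirically estimates the distortion $\rho$ of a map $f$ on a terminal set $T$ and a set of queries, using a plain Gaussian random projection (the Johnson-Lindenstrauss-style baseline the abstract contrasts with) purely to exercise the definition; it is not the terminal embedding or the data structure described above.
\begin{verbatim}
# Empirically estimate the distortion rho in the definition above:
# the smallest rho such that, for some C > 0,
#   C*d(x,q) <= d(f(x),f(q)) <= C*rho*d(x,q)  for all x in T, q in X.
# Here f is a plain Gaussian random projection (a JL-style baseline),
# not the terminal embedding of the abstract.
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 50, 100, 40
T = rng.normal(size=(n, d))                 # terminals
Q = rng.normal(size=(20, d))                # query points
A = rng.normal(size=(m, d)) / np.sqrt(m)
f = lambda X: X @ A.T

ratios = [np.linalg.norm(fx - fq) / np.linalg.norm(x - q)
          for x, fx in zip(T, f(T)) for q, fq in zip(Q, f(Q))]
# Taking C = min(ratios), the achievable rho is max(ratios) / min(ratios).
print("empirical distortion:", max(ratios) / min(ratios))
\end{verbatim}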
This paper introduces AL$\ell_0$CORE, a new form of probabilistic non-negative tensor decomposition. AL$\ell_0$CORE is a Tucker decomposition where the number of non-zero elements (i.e., the $\ell_0$-norm) of the core tensor is constrained to a preset value $Q$ much smaller than the size of the core. While the user dictates the total budget $Q$, the locations and values of the non-zero elements are latent variables and allocated across the core tensor during inference. AL$\ell_0$CORE -- i.e., \textit{allo}cated $\ell_0$-\textit{co}nstrained \textit{core} -- thus enjoys both the computational tractability of CP decomposition and the qualitatively appealing latent structure of Tucker. In a suite of real-data experiments, we demonstrate that AL$\ell_0$CORE typically requires only tiny fractions (e.g.,~1\%) of the full core to achieve the same results as full Tucker decomposition at only a correspondingly tiny fraction of the cost.
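As a small illustration of what a $Q$-sparse core means (this is not the AL$\ell_0$CORE inference procedure, only the reconstruction step it shares with Tucker decomposition), the following reconstructs a 3-way tensor from factor matrices and a core with exactly $Q$ non-zero entries; all dimensions below are illustrative.
\begin{verbatim}
# Reconstruct a 3-way tensor from a Tucker core with only Q non-zeros.
# Illustration of the l0-constrained core; not the ALl0CORE inference.
import numpy as np

rng = np.random.default_rng(1)
I, J, K = 10, 8, 6            # data tensor dimensions
R1, R2, R3 = 4, 4, 4          # core dimensions
Q = 5                         # l0 budget on the core

U1, U2, U3 = rng.random((I, R1)), rng.random((J, R2)), rng.random((K, R3))
core = np.zeros((R1, R2, R3))
idx = rng.choice(R1 * R2 * R3, size=Q, replace=False)   # Q non-zero locations
core.flat[idx] = rng.random(Q)

# X[i,j,k] = sum_{a,b,c} core[a,b,c] * U1[i,a] * U2[j,b] * U3[k,c]
X = np.einsum('abc,ia,jb,kc->ijk', core, U1, U2, U3)
print(X.shape, np.count_nonzero(core), "non-zeros in the core")
\end{verbatim}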
For a $P$-indexed persistence module ${\sf M}$, the (generalized) rank of ${\sf M}$ is defined as the rank of the limit-to-colimit map for ${\sf M}$ over the poset $P$. For $2$-parameter persistence modules, a zigzag-persistence-based algorithm has recently been proposed that takes advantage of the fact that the generalized rank of a $2$-parameter module equals the number of full intervals in a zigzag module defined on the boundary of the poset. An analogous definition of boundary for $d$-parameter or general $P$-indexed persistence modules does not seem plausible. To overcome this difficulty, we first unfold a given $P$-indexed module ${\sf M}$ into a zigzag module ${\sf M}_{ZZ}$ and then check how many full interval modules in a decomposition of ${\sf M}_{ZZ}$ can be folded back to remain full in ${\sf M}$. This number determines the generalized rank of ${\sf M}$. For the special case of degree-$d$ homology of $d$-complexes, we obtain a more efficient algorithm, including a linear-time algorithm for degree-$1$ homology in graphs.
Optimal transport (OT) theory has reshaped the field of generative modeling: combined with neural networks, recent \textit{Neural OT} (N-OT) solvers use OT as an inductive bias to focus on ``thrifty'' mappings that minimize average displacement costs. This core principle has fueled the successful application of N-OT solvers to high-stakes scientific challenges, notably single-cell genomics. N-OT solvers are, however, increasingly confronted with practical challenges: while most N-OT solvers can handle squared-Euclidean costs, they must be repurposed to handle more general costs; their reliance on deterministic Monge maps as well as mass conservation constraints can easily go awry in the presence of outliers; and mapping points \textit{across} heterogeneous spaces is out of their reach. While each of these challenges has been explored independently, we propose a new framework that handles all of these needs natively. The \textit{generative entropic neural OT} (GENOT) framework models the conditional distribution $\pi_\varepsilon(\mathbf{y}|\mathbf{x})$ of an optimal \textit{entropic} coupling $\pi_\varepsilon$, using conditional flow matching. GENOT is generative, and can transport points \textit{across} spaces, guided by sample-based, unbalanced solutions to the Gromov-Wasserstein problem, which can use any cost. We showcase our approach on both synthetic and single-cell datasets, using GENOT to model cell development, predict cellular responses, and translate between data modalities.
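For context, the object GENOT models conditionally, the entropic coupling $\pi_\varepsilon$, can be computed between discrete samples with the standard Sinkhorn iterations sketched below; this is the classical solver, not GENOT itself, and the cost, regularization $\varepsilon$, and iteration count are illustrative choices.
\begin{verbatim}
# Standard Sinkhorn iterations for a discrete entropic OT coupling pi_eps
# between two point clouds. Classical baseline, not the GENOT solver.
import numpy as np

def sinkhorn(x, y, eps=0.1, iters=200):
    C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)   # squared-Euclidean cost
    C = C / C.max()                                       # rescale for stability
    K = np.exp(-C / eps)
    a = np.full(len(x), 1.0 / len(x))                     # uniform marginals
    b = np.full(len(y), 1.0 / len(y))
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(iters):
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]                    # coupling pi_eps

rng = np.random.default_rng(0)
pi = sinkhorn(rng.normal(size=(30, 2)), rng.normal(size=(40, 2)) + 2.0)
print(pi.shape, pi.sum())    # entries of pi sum to 1
\end{verbatim}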