The Johnson-Lindenstrauss (JL) Lemma introduced the concept of dimension reduction via a random linear map, which has become a fundamental technique in many computational settings. For a set of $n$ points in $\mathbb{R}^d$ and any fixed $\epsilon>0$, it reduces the dimension $d$ to $O(\log n)$ while preserving, with high probability, all pairwise Euclidean distances within factor $1+\epsilon$. Perhaps surprisingly, the target dimension can be lower if one only wishes to preserve the optimal value of a certain problem on the point set, e.g., Euclidean max-cut or $k$-means. However, for some notorious problems, like diameter (aka furthest pair), dimension reduction via the JL map to below $O(\log n)$ does not preserve the optimal value within factor $1+\epsilon$. We propose to focus on another regime, of \emph{moderate dimension reduction}, where a problem's value is preserved within factor $\alpha>1$ using target dimension $\tfrac{\log n}{\mathrm{poly}(\alpha)}$. We establish the viability of this approach and show that the famous $k$-center problem is $\alpha$-approximated when reducing to dimension $O(\tfrac{\log n}{\alpha^2}+\log k)$. Along the way, we address the diameter problem via the special case $k=1$. Our result extends to several important variants of $k$-center (with outliers, capacities, or fairness constraints), and the bound improves further with the input's doubling dimension. While our $\mathrm{poly}(\alpha)$-factor improvement in the dimension may seem small, it has significant implications for streaming algorithms, and easily yields an algorithm for $k$-center in dynamic geometric streams that achieves $O(\alpha)$-approximation using space $\mathrm{poly}(k d n^{1/\alpha^2})$. This is the first algorithm to beat $O(n)$ space in high dimension $d$, as all previous algorithms require space at least $\exp(d)$. Furthermore, it extends to the $k$-center variants mentioned above.
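To make the dimension-reduction setup concrete, here is a minimal numerical sketch (NumPy) of projecting a point set through a random Gaussian, JL-style map and comparing the diameter before and after; the dimensions, scaling, and helper names are illustrative assumptions, and this is not the paper's construction.

```python
# Minimal sketch (not the paper's construction): project points with a random
# Gaussian JL-style map and compare the diameter before and after.
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 500, 1000, 40                 # n points in R^d, target dimension m
X = rng.normal(size=(n, d))

# JL-style map: i.i.d. Gaussian entries, scaled so squared norms are preserved in expectation.
G = rng.normal(size=(d, m)) / np.sqrt(m)
Y = X @ G

def diameter(P):
    # Largest pairwise Euclidean distance (O(n^2), fine for a small demo).
    sq = np.sum(P ** 2, axis=1)
    D2 = sq[:, None] + sq[None, :] - 2 * P @ P.T
    return np.sqrt(max(D2.max(), 0.0))

print("diameter before:", diameter(X))
print("diameter after :", diameter(Y))  # distortion shrinks as the target dimension m grows
```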
We study the hardness of approximating two reconfiguration problems. One is Maxmin $k$-Cut Reconfiguration, a reconfiguration analogue of Max $k$-Cut; the other is Maxmin E$k$-SAT Reconfiguration, a reconfiguration analogue of Max E$k$-SAT. The Probabilistically Checkable Reconfiguration Proof theorem due to Hirahara and Ohsaka (STOC 2024) and Karthik C. S. and Manurangsi (2023) implies that Maxmin 4-Cut Reconfiguration and Maxmin E3-SAT Reconfiguration are PSPACE-hard to approximate within a constant factor. However, the asymptotic behavior of the approximability of these problems with respect to $k$ is not well understood. In this paper, we present the following hardness-of-approximation results and approximation algorithms for Maxmin $k$-Cut Reconfiguration and Maxmin E$k$-SAT Reconfiguration: $\bullet$ For every $k \geq 2$, Maxmin $k$-Cut Reconfiguration is PSPACE-hard to approximate within a factor of $1 - \Omega\left(\frac{1}{k}\right)$, whereas it can be approximated within a factor of $1-\frac{2}{k}$. Our lower and upper bounds demonstrate that Maxmin $k$-Cut Reconfiguration has asymptotically the same approximability as Max $k$-Cut. $\bullet$ For every $k \geq 3$, Maxmin E$k$-SAT Reconfiguration is PSPACE-hard (resp. NP-hard) to approximate within a factor of $1-\Omega\left(\frac{1}{9^{\sqrt{k}}}\right)$ (resp. $1-\frac{1}{8k}$). On the other hand, it admits a deterministic $\left(1-\frac{2.5}{k}\right)$-factor approximation algorithm, implying that Maxmin E$k$-SAT Reconfiguration has an approximation threshold asymptotically different from that of Max E$k$-SAT.
Palindromes are non-empty strings that read the same forward and backward. The problems of recognizing strings that can be represented as the concatenation of even-length palindromes, as the concatenation of palindromes of length greater than one, or as the concatenation of exactly $k$ palindromes were introduced in the seminal paper of Knuth, Morris, and Pratt [SIAM J. Comput., 1977]. In this work, we study the problem of recognizing so-called $k$-palindromic strings, which can be represented as the concatenation of exactly $k$ palindromes. It was shown that the problem is solvable in linear time and space [Rubinchik and Schur, MFCS'2020]. We aim to develop a sublinear-space solution and show the following results: (1) First, we give a structural characterization of the set of all $k$-palindromic prefixes of a string by representing it as a union of a small number of highly structured string sets, called affine prefix sets. We show that the size of this representation is of the right asymptotic form by constructing an almost matching lower bound. (2) Second, we derive a read-only algorithm that, given a string $T$ of length $n$ and an integer $k$, computes a compact representation of the $i$-palindromic prefixes of $T$ for all $1 \le i \le k$. (3) Finally, we give a read-only algorithm for computing the palindromic length of $T$, which is the smallest $\ell$ such that $T$ is $\ell$-palindromic, provided that $\ell \le k$. The algorithms use $\mathcal O(n \cdot 6^{k^2} \cdot \log^k n)$ time and $\mathcal O(6^{k^2} \cdot \log^k n)$ space. Our work is the first step toward a streaming algorithm for the recognition of $k$-palindromic prefixes.
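For contrast with the sublinear-space goal above, the following is a simple dynamic-programming baseline for palindromic length; the function name and the quadratic-time, quadratic-space approach are ours for illustration, and this is not the paper's read-only algorithm.

```python
# A simple baseline (not the paper's algorithm): compute the palindromic length
# of T by dynamic programming in O(n^2) time and O(n^2) space for the palindrome table.
def palindromic_length(T: str) -> int:
    n = len(T)
    # is_pal[j][i] is True iff T[j..i] is a palindrome.
    is_pal = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, -1, -1):
            is_pal[j][i] = T[j] == T[i] and (i - j < 2 or is_pal[j + 1][i - 1])
    INF = float("inf")
    dp = [0] + [INF] * n          # dp[i] = palindromic length of the prefix T[:i]
    for i in range(1, n + 1):
        for j in range(i):
            if is_pal[j][i - 1] and dp[j] + 1 < dp[i]:
                dp[i] = dp[j] + 1
    return dp[n]

assert palindromic_length("abaab") == 2    # e.g. "a" + "baab"
assert palindromic_length("racecar") == 1  # the whole string is a palindrome
```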
We provide versions of Monge's and Kantorovich's optimal transport problems for lower probabilities. We show that, when the lower probabilities are the lower envelopes of $\epsilon$-contaminated sets, our version of Monge's problem and a restricted version of our Kantorovich problem coincide with their respective classical versions. We also give sufficient conditions for the existence of our version of Kantorovich's optimal plan, and for the two problems to be equivalent. As a byproduct, we show that for $\epsilon$-contaminations the lower probability versions of Monge's and Kantorovich's optimal transport problems need not coincide. We also discuss applications of our results to Machine Learning and Artificial Intelligence.
The Optimal Transport (OT) problem seeks a transport map that connects two distributions while minimizing a given cost function. Finding such a transport map has diverse applications in machine learning, such as generative modeling and image-to-image translation. In this paper, we introduce a scalable and simulation-free approach for solving the Entropic Unbalanced Optimal Transport (EUOT) problem. We derive the dynamical form of the EUOT problem, which is a generalization of the Schr\"odinger bridge (SB) problem. Based on this, we derive the dual formulation and optimality conditions of the EUOT problem from the stochastic optimal control interpretation. By leveraging these properties, we propose a simulation-free algorithm to solve EUOT, called Simulation-free EUOT (SF-EUOT). While existing SB models require expensive simulation costs during training and evaluation, our model achieves simulation-free training and one-step generation by utilizing the reciprocal property. Our model demonstrates significantly improved scalability in generative modeling and image-to-image translation tasks compared to previous SB methods.
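As background for the entropic regularization, here is a standard discrete entropic OT solver using Sinkhorn iterations; the parameter values and names are illustrative, and this generic sketch is not the paper's SF-EUOT algorithm, which works with the dynamical formulation and is simulation-free.

```python
# Standard discrete entropic OT via Sinkhorn iterations (background sketch only).
import numpy as np

def sinkhorn(a, b, C, eps=0.1, iters=500):
    """Entropic OT between histograms a and b with cost matrix C."""
    K = np.exp(-C / eps)                   # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(iters):
        v = b / (K.T @ u)
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]        # transport plan with marginals ~a and ~b
    return P, float(np.sum(P * C))

rng = np.random.default_rng(1)
x = rng.normal(size=(5, 2))
y = rng.normal(size=(6, 2)) + 1.0
C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)   # squared Euclidean costs
a = np.full(5, 1 / 5)
b = np.full(6, 1 / 6)
P, cost = sinkhorn(a, b, C)
print(P.sum(), cost)                       # total mass ~1 and the regularized transport cost
```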
The Euler Characteristic Transform (ECT) is an efficiently computable geometrical-topological invariant that characterizes the global shape of data. In this paper, we introduce the Local Euler Characteristic Transform ($\ell$-ECT), a novel extension of the ECT designed to enhance expressivity and interpretability in graph representation learning. Unlike traditional Graph Neural Networks (GNNs), which may lose critical local details through aggregation, the $\ell$-ECT provides a lossless representation of local neighborhoods. This approach addresses key limitations of GNNs by preserving nuanced local structures while maintaining global interpretability. Moreover, we construct a rotation-invariant metric based on $\ell$-ECTs for spatial alignment of data spaces. Our method outperforms standard GNNs on a variety of node classification tasks, particularly on graphs with high heterophily.
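To give a feel for the quantity an Euler characteristic curve captures on spatial graph data, below is a rough sketch of chi(t) along a single direction of a height filtration; the function, the filtration choice, and the toy graph are our simplifying assumptions and may differ from the paper's $\ell$-ECT construction, which aggregates such information over local neighborhoods and many directions.

```python
# Euler characteristic curve of a spatial graph under a height filtration (illustrative sketch).
import numpy as np

def ecc(coords, edges, direction, thresholds):
    """chi(t) = #vertices - #edges in the sublevel set {<x, direction> <= t}."""
    h_v = coords @ direction                                    # vertex heights
    h_e = np.array([max(h_v[u], h_v[w]) for u, w in edges])     # an edge enters once both endpoints have
    return np.array([(h_v <= t).sum() - (h_e <= t).sum() for t in thresholds])

coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # 4 nodes with 2D positions
edges = [(0, 1), (1, 3), (3, 2), (2, 0)]                             # a 4-cycle
ts = np.linspace(-0.5, 2.5, 7)
print(ecc(coords, edges, np.array([1.0, 1.0]), ts))                  # ends at chi = 4 - 4 = 0
```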
In this paper we revisit the differentially private stochastic convex optimization (DP-SCO) problem. For convex smooth losses, it is well known that the canonical DP-SGD (stochastic gradient descent) achieves the optimal rate of $O\left(\frac{LR}{\sqrt{n}} + \frac{LR \sqrt{p \log(1/\delta)}}{\epsilon n}\right)$ under $(\epsilon, \delta)$-DP, and also well known that variants of DP-SGD can achieve the optimal rate in a single epoch. However, the batch gradient complexity (i.e., the number of adaptive optimization steps), which is important in applications like federated learning, is less well understood. In particular, all prior work on DP-SCO requires $\Omega(n)$ batch gradient steps, multiple epochs, or convexity for privacy. We propose an algorithm, Accelerated-DP-SRGD (stochastic recursive gradient descent), which bypasses the limitations of past work: it achieves the optimal rate for DP-SCO (up to polylog factors) in a single epoch using $\sqrt{n}$ batch gradient steps with batch size $\sqrt{n}$, and it can be made private for arbitrary (non-convex) losses via clipping. If the global minimizer is in the constraint set, we can further improve this to $n^{1/4}$ batch gradient steps with batch size $n^{3/4}$. To achieve this, our algorithm combines three key ingredients: a variant of stochastic recursive gradients (SRG), accelerated gradient descent, and correlated noise generation from DP continual counting.
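For readers unfamiliar with the clipping step mentioned above, here is a generic per-example clipping plus Gaussian-noise gradient step; the function name and noise scale are illustrative assumptions, and this is not the paper's Accelerated-DP-SRGD, which additionally uses recursive gradients, acceleration, and correlated noise from continual counting.

```python
# Generic privatized batch gradient: clip each per-example gradient, average, add Gaussian noise.
import numpy as np

def private_batch_gradient(per_example_grads, clip_norm=1.0, noise_multiplier=1.0, rng=None):
    """Average per-example gradients after clipping to clip_norm, then add Gaussian noise."""
    rng = rng if rng is not None else np.random.default_rng()
    grads = np.asarray(per_example_grads, dtype=float)
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    clipped = grads * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    avg = clipped.mean(axis=0)
    sigma = noise_multiplier * clip_norm / len(grads)   # noise scale for the averaged gradient
    return avg + rng.normal(scale=sigma, size=avg.shape)

# Example: 8 per-example gradients in R^5.
g = np.random.default_rng(0).normal(size=(8, 5))
print(private_batch_gradient(g, clip_norm=1.0, noise_multiplier=1.0))
```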
Density ratio estimation (DRE) is a fundamental machine learning technique for capturing the relationship between two probability distributions. State-of-the-art DRE methods estimate the density ratio using neural networks trained with loss functions derived from variational representations of $f$-divergences. However, existing methods face optimization challenges, such as overfitting due to loss functions that are unbounded below, biased mini-batch gradients, vanishing training loss gradients, and high sample requirements for Kullback-Leibler (KL) divergence loss functions. To address these issues, we focus on $\alpha$-divergence, which provides a suitable variational representation of $f$-divergence. We then derive a novel loss function for DRE, the $\alpha$-divergence loss function ($\alpha$-Div). $\alpha$-Div is concise yet offers stable and effective optimization for DRE. The boundedness of $\alpha$-divergence provides the potential for successful DRE on data exhibiting high KL divergence. Our numerical experiments demonstrate the effectiveness of $\alpha$-Div for optimization. However, the experiments also show that the proposed loss function offers no significant advantage over the KL-divergence loss function in terms of RMSE for DRE. This indicates that the accuracy of DRE is primarily determined by the KL divergence between the two distributions and is less dependent on $\alpha$-divergence.
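As a point of reference for DRE, the sketch below uses the classical probabilistic-classification baseline (the "density-ratio trick") rather than the paper's $\alpha$-Div loss; the toy data and model choices are illustrative assumptions.

```python
# Classifier-based density-ratio baseline: train a classifier to separate samples
# from p and q, then read off r(x) = p(x)/q(x) from its predicted probabilities.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
x_p = rng.normal(0.0, 1.0, size=(2000, 1))     # samples from p
x_q = rng.normal(1.0, 1.0, size=(2000, 1))     # samples from q
X = np.vstack([x_p, x_q])
y = np.concatenate([np.ones(len(x_p)), np.zeros(len(x_q))])

clf = LogisticRegression().fit(X, y)
prob_p = clf.predict_proba(X)[:, 1]            # P(sample came from p | x)
ratio_hat = prob_p / (1.0 - prob_p)            # estimate of p(x)/q(x), valid for equal sample sizes
print(ratio_hat[:5])
```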
We consider minimum time multicasting problems in directed and undirected graphs: given a root node and a subset of $t$ terminal nodes, multicasting seeks the minimum number of rounds within which all terminals can be informed of a message originating at the root. In each round, the telephone model we study allows the information to move via a matching from informed nodes to uninformed nodes. Since minimum time multicasting in digraphs is poorly understood compared to the undirected variant, we study an intermediate problem in undirected graphs that specifies a target $k < t$ and requires that only $k$ of the terminals be informed in the minimum number of rounds. For this problem, we improve on the bounds implied by prior results and obtain an $\tilde{O}(t^{1/3})$ multiplicative approximation. For the directed version, we obtain an {\em additive} $\tilde{O}(k^{1/2})$ approximation algorithm (with a poly-logarithmic multiplicative factor). Our algorithms are based on reductions to the related problem of finding $k$-trees of minimum poise (the sum of maximum degree and diameter), together with a combination of greedy network decomposition techniques and set covering under partition matroid constraints.
We consider the $k$-min-sum-radii ($k$-MSR) clustering problem with fairness constraints. The $k$-min-sum-radii problem is a mixture of the classical $k$-center and $k$-median problems. We are given a set of points $P$ in a metric space and a number $k$, and we aim to partition the points into $k$ clusters, each with one designated center. The objective to minimize is the sum of the radii of the $k$ clusters (whereas in $k$-center we would only consider the maximum radius, and in $k$-median we would consider the sum of the individual points' costs). Various notions of fair clustering have been introduced lately, and we follow the definitions due to Chierichetti, Kumar, Lattanzi and Vassilvitskii [NeurIPS 2017], which demand that cluster compositions follow the proportions of the input point set with respect to some given sensitive attribute. In the easier case where the sensitive attribute has only two possible values, each equally frequent in the input, the aim is to compute a clustering in which every cluster has a 1:1 ratio with respect to this attribute; we call this the 1:1 case. There has been a surge of FPT approximation algorithms for the $k$-MSR problem lately, solving the problem both in the unconstrained case and in several constrained variants. We add to this line of research by designing an FPT $(6+\epsilon)$-approximation for $k$-MSR under the general fairness notion mentioned above. For the special 1:1 case, we improve our algorithm to achieve a $(3+\epsilon)$-approximation.
As a promising paradigm for collaboratively training models with decentralized data, Federated Learning (FL) can be exploited to fine-tune Large Language Models (LLMs). Because LLMs are huge and the scale of the training data increases significantly, fine-tuning incurs tremendous computation and communication costs. The training data is generally non-Independent and Identically Distributed (non-IID), which requires adaptive data processing within each device. Although Low-Rank Adaptation (LoRA) can significantly reduce the number of parameters to update during fine-tuning, it still takes an unaffordable amount of time to transfer the low-rank parameters of all the layers in LLMs. In this paper, we propose a Fisher Information-based Efficient Curriculum Federated Learning framework (FibecFed) with two novel methods, i.e., adaptive federated curriculum learning and efficient sparse parameter update. First, we propose a Fisher information-based method to adaptively sample data within each device to improve the effectiveness of the FL fine-tuning process. Second, we dynamically select the proper layers for global aggregation and sparse parameters for local update with LoRA so as to improve the efficiency of the FL fine-tuning process. Extensive experimental results on 10 datasets demonstrate that FibecFed achieves excellent performance (up to 45.35% better in terms of accuracy) and superb fine-tuning speed (up to 98.61% faster) compared with 17 baseline approaches.
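For concreteness, here is a minimal LoRA-style linear layer showing why only the low-rank factors need to be trained and communicated; the class name, rank, and scaling are illustrative assumptions, and this generic sketch is not FibecFed's layer-selection or sparse-update mechanism.

```python
# Minimal LoRA-style layer: a frozen base weight plus a trainable low-rank update B @ A.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear layer adapted through a low-rank update."""
    def __init__(self, in_features, out_features, rank=8, alpha=16.0):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)               # pretrained weight stays frozen
        self.A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scaling * (x @ self.A.T @ self.B.T)

layer = LoRALinear(768, 768, rank=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)   # only A and B are trainable: 2 * 8 * 768 parameters
```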