A lattice is a partially ordered set in which both the least upper bound and the greatest lower bound of any pair of elements exist within the set and are unique. K\"{o}tter and Kschischang proved that codes in the linear lattice can be used for error- and erasure-correction in random networks. Codes in the linear lattice have previously been shown to be special cases of codes in modular lattices. Two well-known classes of semimodular lattices are geometric and distributive lattices, and most frequently used coding spaces are examples of one or both. We have identified the unique criterion that makes a geometric lattice distributive, thus characterizing all finite geometric distributive lattices. Our characterization helps to prove a conjecture regarding the maximum size of a distributive sublattice of a finite geometric lattice and to identify the maximal case. The Whitney numbers of the class of geometric distributive lattices are also calculated. We present a few other applications of this characterization to derive certain results regarding linearity and complements in the linear lattice.
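For reference (standard definitions not spelled out in the abstract), the identities separating these lattice classes are
\[
\begin{aligned}
\text{distributive:}\quad & a \wedge (b \vee c) = (a \wedge b) \vee (a \wedge c) \text{ for all } a, b, c,\\
\text{modular:}\quad & a \le c \;\Longrightarrow\; a \vee (b \wedge c) = (a \vee b) \wedge c,\\
\text{(upper) semimodular:}\quad & a \wedge b \lessdot a \;\Longrightarrow\; b \lessdot a \vee b,
\end{aligned}
\]
where $\lessdot$ denotes the covering relation; every distributive lattice is modular, and every modular lattice is upper semimodular.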
In this paper, we generalize existing frameworks for $\mathcal{H}_2\otimes\mathcal{L}_2$-optimal model order reduction to a broad class of parametric linear time-invariant systems. To this end, we derive first-order necessary optimality conditions for a class of structured reduced-order models and, building on those, propose a stability-preserving optimization-based method for computing locally $\mathcal{H}_2\otimes\mathcal{L}_2$-optimal reduced-order models. We also make a theoretical comparison to existing approaches in the literature and, in numerical experiments, show how our new method, with reasonable computational effort, produces stable optimized reduced-order models with significantly lower approximation errors.
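As a hedged point of reference (the precise norm used may differ in details such as weighting of the parameter domain), the $\mathcal{H}_2\otimes\mathcal{L}_2$ norm of a parametric transfer function $H(s, p)$ over a parameter domain $\mathcal{P}$ is commonly defined as
\[
\|H\|_{\mathcal{H}_2\otimes\mathcal{L}_2}^2 \;=\; \frac{1}{2\pi}\int_{\mathcal{P}}\int_{-\infty}^{\infty} \big\|H(\mathrm{i}\omega, p)\big\|_F^2 \,\mathrm{d}\omega\,\mathrm{d}p,
\]
and the reduced-order model $\hat{H}$ is sought to (locally) minimize $\|H - \hat{H}\|_{\mathcal{H}_2\otimes\mathcal{L}_2}$.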
We present an algorithm for computing $\epsilon$-coresets for $(k, \ell)$-median clustering of polygonal curves in $\mathbb{R}^d$ under the Fr\'echet distance. This type of clustering is an adaptation of Euclidean $k$-median clustering: we are given a set of $n$ polygonal curves in $\mathbb{R}^d$, each of complexity (number of vertices) at most $m$, and want to compute $k$ median curves such that the sum of distances from the given curves to their closest median curve is minimal. Additionally, we restrict the complexity of the median curves to be at most $\ell$ each, to suppress overfitting, a problem specific to sequential data. Our algorithm has running time linear in $n$, sub-quartic in $m$ and quadratic in $\epsilon^{-1}$. With high probability it returns $\epsilon$-coresets of size quadratic in $\epsilon^{-1}$ and logarithmic in $n$ and $m$. We achieve this result by applying the improved $\epsilon$-coreset framework of Langberg and Feldman to a generalized $k$-median problem over an arbitrary metric space, and by combining it with the recent result of Driemel et al. on the VC dimension of metric balls under the Fr\'echet distance. Furthermore, our framework yields $\epsilon$-coresets for any generalized $k$-median problem where the range space induced by the open metric balls of the underlying space has bounded VC dimension, which is of independent interest. Finally, we show that our $\epsilon$-coresets can be used to improve the running time of an existing approximation algorithm for $(1,\ell)$-median clustering.
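For concreteness, an $\epsilon$-coreset can be read here as a weighted subset $S$ of the input curves $P$ such that, simultaneously for every candidate set $C$ of $k$ center curves of complexity at most $\ell$,
\[
(1-\epsilon)\,\mathrm{cost}(P, C) \;\le\; \mathrm{cost}(S, C) \;\le\; (1+\epsilon)\,\mathrm{cost}(P, C),
\]
where $\mathrm{cost}(\cdot, C)$ denotes the (weighted) sum of Fr\'echet distances to the nearest curve in $C$; the paper's formal definition may differ in minor details.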
List-decodable codes have been an active topic in theoretical computer science. There are general results on list-decodability up to the Johnson radius and the list-decoding capacity theorem. In this paper we show that rates, list-decoding radii and list sizes are closely related to the classical topic of covering codes. We prove new, simple but strong general upper bounds for list-decodable codes in general finite metric spaces based on various covering codes. The general covering-code upper bounds can be applied to the case where the volumes of the balls depend on the centers, not only on the radius. Any good upper bound on the covering radius or on the size of a covering code then implies a good upper bound on the sizes of list-decodable codes. Our results give exponential improvements on the recent generalized Singleton upper bound (STOC 2020) for Hamming-metric list-decodable codes when the code lengths are large. A generalized Singleton upper bound for average-radius list-decodable codes also follows from our general covering-code upper bound. Even in the list-size $L=1$ case, our covering-code upper bounds give highly non-trivial upper bounds on the sizes of codes with a given minimum distance. We also suggest studying combinatorial covering list-decodable codes as a natural generalization of combinatorial list-decodable codes. We apply our general covering-code upper bounds to list-decodable rank-metric codes, list-decodable subspace codes, list-decodable insertion codes and list-decodable deletion codes. New and stronger results on the non-list-decodability of rank-metric codes and subspace codes are obtained.
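For orientation (generic notation, not necessarily the paper's), a code $C$ in a finite metric space $(X, d)$ is $(r, L)$-list-decodable, and a code $C'$ has covering radius $\rho(C')$, when
\[
\max_{y \in X} \big|B(y, r) \cap C\big| \;\le\; L, \qquad \rho(C') \;=\; \max_{y \in X} \min_{c \in C'} d(y, c),
\]
where $B(y, r)$ is the ball of radius $r$ around $y$; the covering-code bounds relate the size of $C$ to that of codes $C'$ whose balls cover the space.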
Algorithmic and statistical approaches to congressional redistricting are becoming increasingly valuable tools in courts and redistricting commissions for quantifying gerrymandering in the United States. While there is existing literature covering how plans sampled by various Markov chain Monte Carlo methods differ in terms of projected electoral outcomes and geometric quantifiers of compactness, there is still work to be done on measuring similarities between different congressional redistricting plans. This paper introduces an intuitive and interpretable measure of similarity, together with a corresponding assignment matrix, that captures the percentage of a state's area or population that stays in the same congressional district between two plans. We then show how to calculate this measure in polynomial time and briefly demonstrate some potential use cases.
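As a hedged illustration (not the paper's code), assuming each plan is given as an integer district label per census unit together with unit weights (area or population), and that districts are matched across plans by an optimal assignment, the measure can be computed in polynomial time roughly as follows; the function and variable names are hypothetical.

\begin{verbatim}
import numpy as np
from scipy.optimize import linear_sum_assignment

def plan_similarity(plan_a, plan_b, weights):
    # Illustrative sketch, not the paper's exact construction:
    # fraction of total weight that stays in the "same" district once the
    # districts of the two plans are matched by an optimal assignment.
    plan_a, plan_b, weights = map(np.asarray, (plan_a, plan_b, weights))
    k = int(max(plan_a.max(), plan_b.max())) + 1
    overlap = np.zeros((k, k))
    # overlap[i, j] = weight in district i of plan A and district j of plan B
    np.add.at(overlap, (plan_a, plan_b), weights)
    # Hungarian algorithm (polynomial time); negate since it minimizes cost.
    rows, cols = linear_sum_assignment(-overlap)
    return overlap[rows, cols].sum() / weights.sum()

# Example: two plans over 8 units with 2 districts each, unit weight 1.
# plan_similarity([0,0,0,0,1,1,1,1], [0,0,1,1,1,1,0,0], np.ones(8))  # -> 0.5
\end{verbatim}

The matching that attains the maximum doubles as an assignment matrix between the districts of the two plans.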
Quantile regression is a field with steadily growing importance in statistical modeling. It is a complementary method to linear regression, since computing a range of conditional quantile functions provides a more accurate modeling of the stochastic relationship among variables, especially in the tails. We introduce a non-restrictive and highly flexible nonparametric quantile regression approach based on C- and D-vine copulas. Vine copulas allow for separate modeling of the marginal distributions and the dependence structure in the data, and can be expressed through a graph-theoretical model given by a sequence of trees. In this way we obtain a quantile regression model that overcomes typical issues of quantile regression, such as quantile crossings, collinearity, and the need for transformations and interactions of variables. Our approach incorporates a two-step-ahead ordering of variables, obtained by maximizing the conditional log-likelihood of the tree sequence while taking into account the next two tree levels. Further, we show that the nonparametric conditional quantile estimator is consistent. The performance of the proposed methods is evaluated in both low- and high-dimensional settings using simulated and real-world data. The results support the superior prediction ability of the proposed models.
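The generic copula-based conditional quantile estimator underlying such approaches (written here for predictors $x_1, \dots, x_p$, with the vine decomposition of the copula omitted) is
\[
\hat{q}_\alpha(x_1, \dots, x_p) \;=\; \hat{F}_Y^{-1}\Big( C^{-1}_{V \mid U_1, \dots, U_p}\big(\alpha \mid \hat{F}_1(x_1), \dots, \hat{F}_p(x_p)\big) \Big),
\]
where $V = F_Y(Y)$, $U_j = F_j(X_j)$ and $C_{V \mid U_1, \dots, U_p}$ is the conditional copula distribution obtained from the fitted C- or D-vine; monotonicity of the outer inverse in $\alpha$ is what rules out quantile crossings.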
There has been a rich development of vector autoregressive (VAR) models for modeling temporally correlated multivariate outcomes. However, the existing VAR literature has largely focused on single-subject parametric analysis, with some recent extensions to multi-subject modeling with known subgroups. Motivated by the need for flexible Bayesian methods that can pool information across heterogeneous samples in an unsupervised manner, we develop a novel class of non-parametric Bayesian VAR models based on heterogeneous multi-subject data. In particular, we propose a product of Dirichlet process mixture priors that enables separate clustering at multiple scales, which results in partially overlapping clusters that provide greater flexibility. We develop several variants of the method to cater to varying levels of heterogeneity. We implement an efficient posterior computation scheme and establish posterior consistency properties under reasonable assumptions on the true density. Extensive numerical studies show distinct advantages over competing methods in terms of estimating model parameters and identifying the true clustering and sparsity structures. Our analysis of resting-state fMRI data from the Human Connectome Project reveals biologically interpretable differences between distinct fluid intelligence groups, along with reproducible parameter estimates. In contrast, single-subject VAR analyses followed by permutation testing yield negligible differences, which is biologically implausible.
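As a point of reference (the subject-specific notation here is illustrative), a VAR model of order $p$ for subject $i$ takes the form
\[
\mathbf{y}_{i,t} \;=\; \sum_{k=1}^{p} A_{i,k}\,\mathbf{y}_{i,t-k} \;+\; \boldsymbol{\epsilon}_{i,t}, \qquad \boldsymbol{\epsilon}_{i,t} \sim N(\mathbf{0}, \Sigma_i),
\]
and the proposed nonparametric priors pool and cluster information across subjects through such model parameters at multiple scales.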
Adversarial training is among the most effective techniques for improving the robustness of models against adversarial perturbations. However, the full effect of this approach on models is not well understood. For example, while adversarial training can reduce the adversarial risk (prediction error against an adversary), it sometimes increases the standard risk (generalization error when there is no adversary). Moreover, such behavior is affected by various elements of the learning problem, including the size and quality of training data, the specific form of adversarial perturbations in the input, model overparameterization, and the adversary's power, among others. In this paper, we focus on a \emph{distribution-perturbing} adversary framework, wherein the adversary can change the test distribution within a neighborhood of the training data distribution. The neighborhood is defined via the Wasserstein distance between distributions, and the radius of the neighborhood is a measure of the adversary's manipulative power. We study the tradeoff between standard risk and adversarial risk and derive the Pareto-optimal tradeoff, achievable over specific classes of models, in the infinite-data limit with the feature dimension kept fixed. We consider three learning settings: 1) regression with the class of linear models; 2) binary classification under the Gaussian mixtures data model with the class of linear classifiers; 3) regression with the class of random features models (which can equivalently be represented as two-layer neural networks with random first-layer weights). We show that a tradeoff between standard and adversarial risk is manifested in all three settings. We further characterize the Pareto-optimal tradeoff curves and discuss how a variety of factors, such as feature correlation, the adversary's power, and the width of the two-layer neural network, affect this tradeoff.
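In this framework the two risks being traded off can be written, for a model $\theta$, loss $\ell$, data distribution $P$ and adversary budget $\varepsilon$ (notation here is generic rather than the paper's), as
\[
\mathrm{SR}(\theta) \;=\; \mathbb{E}_{(x,y)\sim P}\big[\ell(\theta; x, y)\big], \qquad
\mathrm{AR}(\theta, \varepsilon) \;=\; \sup_{Q:\, W(Q, P) \le \varepsilon} \mathbb{E}_{(x,y)\sim Q}\big[\ell(\theta; x, y)\big],
\]
where $W$ denotes the Wasserstein distance; the Pareto-optimal tradeoff curve traces the minimal achievable pairs $(\mathrm{SR}, \mathrm{AR})$ over the model class.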
Recognition of off-line Chinese characters is still a challenging problem, especially in historical documents: not only is the number of classes extremely large in comparison to contemporary image retrieval tasks, but new, unseen classes can also be expected under open learning conditions (even for CNNs). Chinese character recognition with zero or only a few training samples is a difficult problem and has not been studied yet. In this paper, we propose a new Chinese character recognition method based on multi-type attributes, derived from the pronunciation, structure and radicals of Chinese characters, and apply it to character recognition in historical books. This intermediate attribute code has a strong advantage over the common `one-hot' class representation because it allows complex and unseen patterns to be understood symbolically through their attributes. First, each character is represented by four groups of attribute types to cover a wide range of character possibilities: the Pinyin label, the layout structure, the number of strokes, three different input-method codes (Cangjie, Zhengma and Wubi), and a four-corner encoding. A convolutional neural network (CNN) is trained to learn these attributes. Subsequently, characters can be easily recognized from these attributes using a distance metric and a complete lexicon encoded in attribute space. We evaluate the proposed method on two open data sets (printed Chinese characters for zero-shot learning and historical characters for few-shot learning) and one closed set (handwritten Chinese characters). Experimental results show good classification of seen classes as well as a very promising generalization ability to unseen characters.
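As a rough sketch of the recognition step (not the authors' implementation; the function names, array layout and the choice of Euclidean distance are illustrative assumptions):

\begin{verbatim}
import numpy as np

def recognize(pred_attr, lexicon_attr, lexicon_chars):
    # Illustrative sketch: nearest-neighbour decoding in attribute space.
    # pred_attr     : (d,) attribute vector produced by the attribute CNN
    # lexicon_attr  : (n, d) matrix, one attribute code per lexicon character
    # lexicon_chars : length-n list of the corresponding character labels
    dists = np.linalg.norm(np.asarray(lexicon_attr) - np.asarray(pred_attr),
                           axis=1)
    return lexicon_chars[int(np.argmin(dists))]
\end{verbatim}

Because the lexicon can contain attribute codes for characters never seen during training, the same decoding step covers both the zero- and few-shot settings.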
Developing classification algorithms that are fair with respect to sensitive attributes of the data has become an important problem due to the growing deployment of classification algorithms in various social contexts. Several recent works have focused on fairness with respect to a specific metric, modeled the corresponding fair classification problem as a constrained optimization problem, and developed tailored algorithms to solve it. Despite this, there remain important metrics for which we do not have fair classifiers, and many of the aforementioned algorithms do not come with theoretical guarantees, perhaps because the resulting optimization problem is non-convex. The main contribution of this paper is a new meta-algorithm for classification that takes as input a large class of fairness constraints, with respect to multiple non-disjoint sensitive attributes, and comes with provable guarantees. This is achieved by first developing a meta-algorithm for a large family of classification problems with convex constraints, and then showing that classification problems with general types of fairness constraints can be reduced to problems in this family. We present empirical results showing that our algorithm can achieve near-perfect fairness with respect to various fairness metrics and that the loss in accuracy due to the imposed fairness constraints is often small. Overall, this work unifies several prior works on fair classification, presents a practical algorithm with theoretical guarantees, and can handle fairness metrics that previous methods could not.
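A representative instance of the kind of constrained problem covered (statistical parity with tolerance $\tau$ for a sensitive attribute $a$; the meta-algorithm handles a much larger family of constraints and multiple non-disjoint attributes) is
\[
\min_{f \in \mathcal{F}} \;\mathbb{E}\big[\ell(f(x), y)\big] \quad \text{subject to} \quad \big|\Pr(f(x) = 1 \mid a = z) - \Pr(f(x) = 1)\big| \;\le\; \tau \;\;\text{for every group } z.
\]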
Discrete random structures are important tools in Bayesian nonparametrics, and the resulting models have proven effective in density estimation, clustering, topic modeling and prediction, among other tasks. In this paper, we consider nested processes and study the dependence structures they induce. Dependence ranges between homogeneity, corresponding to full exchangeability, and maximum heterogeneity, corresponding to (unconditional) independence across samples. The popular nested Dirichlet process is shown to degenerate to the fully exchangeable case when there are ties across samples at the observed or latent level. To overcome this drawback, which is inherent to nesting general discrete random measures, we introduce a novel class of latent nested processes. These are obtained by adding common and group-specific completely random measures and then normalising to yield dependent random probability measures. We provide results on the partition distributions induced by latent nested processes and develop a Markov chain Monte Carlo sampler for Bayesian inference. A test for distributional homogeneity across groups is obtained as a by-product. The results and their inferential implications are showcased on synthetic and real data.
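Schematically, and with illustrative notation, the construction just described yields for group $\ell$ a random probability measure of the form
\[
\tilde{p}_\ell \;=\; \frac{\mu_S + \mu_\ell}{\mu_S(\mathbb{X}) + \mu_\ell(\mathbb{X})},
\]
where $\mu_S$ is a completely random measure shared across groups and $\mu_\ell$ is group-specific, so the groups share atoms through $\mu_S$ while retaining idiosyncratic components through $\mu_\ell$.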