Given the family $P$ of all nonempty subsets of a set $U$ of alternatives, a choice over $U$ is a function $c \colon \Omega \to P$ such that $\Omega \subseteq P$ and $c(B) \subseteq B$ for all menus $B \in \Omega$. A choice is total if $\Omega = P$, and partial otherwise. In economics, an agent is considered rational whenever her choice behavior satisfies suitable axioms of consistency, which are properties quantified over menus. Here we address the following lifting problem: given a partial choice satisfying one or more axioms of consistency, is it possible to extend it to a total choice satisfying the same axioms? After characterizing the lifting of some choice properties that are well known in the economics literature, we study the decidability of the related satisfiability problem for unquantified formulae of an elementary fragment of set theory, which involves a choice function symbol, the Boolean set operators, the singleton operator, the equality and inclusion predicates, and the propositional connectives. In two cases we prove that the satisfiability problem is NP-complete, whereas in the remaining cases we obtain NP-completeness under the additional assumption that the number of choice terms is constant.
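To make the objects concrete, here is a minimal Python sketch of a partial choice and a consistency check. Sen's axiom $\alpha$ (contraction consistency) is used as a representative axiom; the abstract does not fix which axioms are lifted, so this particular axiom and the three-element example are illustrative only.

```python
# A partial choice over U = {a, b, c}: its domain Omega is a proper
# subset of the family P of nonempty menus.
choice = {
    frozenset("abc"): frozenset("a"),
    frozenset("ab"): frozenset("a"),
    frozenset("bc"): frozenset("b"),
}

def satisfies_alpha(c):
    """Sen's axiom alpha (contraction consistency), checked on the menus
    where the partial choice is defined: if x is chosen from A and x lies
    in a smaller menu B contained in A, then x must be chosen from B."""
    for A, cA in c.items():
        for B, cB in c.items():
            if B < A and (cA & B) - cB:
                return False
    return True

assert all(cB <= B for B, cB in choice.items())  # c(B) is a subset of B
print(satisfies_alpha(choice))  # True: this partial choice is consistent
```

The lifting question then asks whether such a consistent partial choice can always be extended to all of $P$ without violating the axiom.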
Let $X = \{X_{u}\}_{u \in U}$ be a real-valued Gaussian process indexed by a set $U$. It can be thought of as an undirected graphical model with every random variable $X_{u}$ serving as a vertex. We characterize this graph in terms of the covariance of $X$ through its reproducing kernel property. Unlike other characterizations in the literature, our characterization does not restrict the index set $U$ to be finite or countable, and hence can be used to model the intrinsic dependence structure of stochastic processes in continuous time/space. Consequently, this characterization is not (and apparently cannot be) of the inverse-zero type. This poses novel challenges for the problem of recovery of the dependence structure from a sample of independent realizations of $X$, also known as structure estimation. We propose a methodology that circumvents these issues, by targeting the recovery of the underlying graph up to a finite resolution, which can be arbitrarily fine and is limited only by the available sample size. The recovery is shown to be consistent so long as the graph is sufficiently regular in an appropriate sense, and convergence rates are provided. Our methodology is illustrated by simulation and two data analyses.
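For intuition, here is the classical finite-index fact that the above generalizes, and which the abstract notes does not extend beyond countable index sets: for a finite Gaussian vector, missing edges in the graphical model correspond exactly to zeros of the inverse covariance. A minimal numpy sketch with an invented three-variable Gaussian chain:

```python
import numpy as np

# Finite-index baseline: u and v are non-adjacent in the Gaussian
# graphical model iff entry (u, v) of the precision matrix (inverse
# covariance) is zero. Example: a chain X1 - X2 - X3, where X1 and X3
# are conditionally independent given X2.
Sigma = np.array([[1.0, 0.5, 0.25],
                  [0.5, 1.0, 0.5],
                  [0.25, 0.5, 1.0]])
K = np.linalg.inv(Sigma)
print(np.round(K, 6))  # the (1, 3) entry vanishes: no edge between X1 and X3
```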
We introduce a novel framework for implementing error correction in constrained systems. The main idea of our scheme, called Quantized-Constraint Concatenation (QCC), is to treat the embedding of the codewords of an error-correcting code into a constrained system as a (noisy, irreversible) quantization process. This is in contrast to traditional methods, such as concatenation and reverse concatenation, where the encoding into the constrained system is reversible. The number of channel errors QCC is capable of correcting is linear in the block length $n$, improving upon the $O(\sqrt{n})$ errors correctable by state-of-the-art known schemes. For a given constrained system, the performance of QCC depends on a new fundamental parameter of the constrained system: its covering radius. Motivated by QCC, we study the covering radius of constrained systems in both combinatorial and probabilistic settings. We reveal an intriguing characterization of the covering radius of a constrained system using ergodic theory, and we use this equivalent characterization to establish efficiently computable upper bounds on the covering radius.
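As a toy illustration of the covering radius of a constrained system, the brute-force sketch below computes it for the binary no-two-consecutive-ones (run-length-limited) system under Hamming distance. The paper's combinatorial and probabilistic definitions are more general; this example is only meant to make the parameter tangible.

```python
from itertools import product

def constrained(word):
    """Membership oracle for a toy constrained system: binary words
    with no two consecutive ones (a run-length-limited constraint)."""
    return all(not (a == 1 and b == 1) for a, b in zip(word, word[1:]))

def hamming(x, y):
    return sum(a != b for a, b in zip(x, y))

def covering_radius(n):
    """Smallest r such that every length-n binary word is within
    Hamming distance r of some word of the constrained system."""
    system = [w for w in product((0, 1), repeat=n) if constrained(w)]
    return max(min(hamming(x, c) for c in system)
               for x in product((0, 1), repeat=n))

for n in range(1, 9):
    print(n, covering_radius(n))
```

Intuitively, the worst-case word here is the all-ones word, which must have roughly every other one flipped to satisfy the constraint.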
Display Ads and the generalized assignment problem are two well-studied online packing problems with important applications in ad allocation and other areas. In both problems, ad impressions arrive online and have to be allocated immediately to budget-constrained advertisers. Worst-case algorithms that achieve the ideal competitive ratio are known, but they may act overly conservatively given the predictable and usually tame nature of real-world input. Given this discrepancy, we develop an algorithm for both problems that incorporates machine-learned predictions and can thus improve performance beyond the worst case. Our algorithm is based on the work of Feldman et al. (2009) and is similar in spirit to that of Mahdian et al. (2007), who were the first to develop a learning-augmented algorithm for the related, but more structured, AdWords problem. We use a novel analysis to show that our algorithm is able to capitalize on a good prediction while being robust against poor predictions. We experimentally evaluate our algorithm on synthetic and real-world data over a wide range of predictions, and it consistently outperforms the worst-case algorithm without predictions.
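The toy sketch below illustrates the general consistency-versus-robustness idea behind learning-augmented allocation; it is emphatically not the algorithm of this paper or of Feldman et al. (2009). The `trust` blend parameter, the greedy fallback, and the unit-capacity model are hypothetical simplifications for illustration.

```python
import random

def allocate(impressions, capacity, predicted, trust):
    """Toy online allocator: follow the predicted assignment with
    probability `trust`, otherwise assign greedily by value, subject
    to advertiser capacities."""
    load = {a: 0 for a in capacity}
    total = 0.0
    for i, values in enumerate(impressions):
        pick = None
        if random.random() < trust and load[predicted[i]] < capacity[predicted[i]]:
            pick = predicted[i]
        if pick is None:  # fall back to greedy among advertisers with room
            feasible = [a for a in capacity if load[a] < capacity[a]]
            if not feasible:
                continue
            pick = max(feasible, key=lambda a: values[a])
        load[pick] += 1
        total += values[pick]
    return total

random.seed(0)
impressions = [{"a": random.random(), "b": random.random()} for _ in range(100)]
capacity = {"a": 40, "b": 40}
good_pred = [max(v, key=v.get) for v in impressions]  # informed prediction
bad_pred = [min(v, key=v.get) for v in impressions]   # adversarial prediction
for name, pred in [("good", good_pred), ("bad", bad_pred)]:
    print(name, round(allocate(impressions, capacity, pred, trust=0.8), 2))
```

The point of a learning-augmented analysis is precisely to bound both runs: the first should approach the offline optimum, while the second should degrade gracefully toward the worst-case guarantee.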
With continuous outcomes, the average causal effect is typically defined using a contrast of expected potential outcomes. However, in the presence of skewed outcome data, the expectation may no longer be meaningful. In practice the typical approach is to "ignore or transform": ignore the skewness altogether, or transform the outcome to obtain a more symmetric distribution, although neither approach is entirely satisfactory. Alternatively, the causal effect can be redefined as a contrast of median potential outcomes, yet discussion of confounding-adjustment methods to estimate this parameter is limited. In this study we described and compared confounding-adjustment methods to address this gap. The methods considered were multivariable quantile regression, an inverse probability weighted (IPW) estimator, weighted quantile regression, and two little-known implementations of g-computation for this problem. Motivated by a cohort investigation in the Longitudinal Study of Australian Children, we conducted a simulation study that found the IPW estimator, weighted quantile regression, and the g-computation implementations minimised bias when the relevant models were correctly specified, with g-computation additionally minimising the variance. These methods provide appealing alternatives to the common "ignore or transform" approach and to multivariable quantile regression, enhancing our capability to obtain meaningful causal effect estimates from skewed outcome data.
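A minimal sketch of one of the compared approaches, the IPW estimator of a difference in median potential outcomes: weight each unit by the inverse of its estimated propensity of receiving the treatment it actually received, then contrast weighted medians across arms. The data-generating process below is invented for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def weighted_median(y, w):
    order = np.argsort(y)
    y, w = y[order], w[order]
    cum = np.cumsum(w)
    return y[np.searchsorted(cum, 0.5 * cum[-1])]

rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)                         # confounder
a = rng.binomial(1, 1 / (1 + np.exp(-x)))      # treatment depends on x
y = np.exp(x + 0.5 * a + rng.normal(size=n))   # skewed (log-normal) outcome

# IPW: weight each unit by the inverse probability of its observed treatment
e = LogisticRegression().fit(x[:, None], a).predict_proba(x[:, None])[:, 1]
w = np.where(a == 1, 1 / e, 1 / (1 - e))

effect = (weighted_median(y[a == 1], w[a == 1])
          - weighted_median(y[a == 0], w[a == 0]))
print(round(effect, 3))  # true contrast of medians is exp(0.5) - 1, about 0.65
```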
We analyze a practical algorithm for sparse PCA on incomplete and noisy data under a general non-random sampling scheme. The algorithm is based on a semidefinite relaxation of the $\ell_1$-regularized PCA problem. We provide theoretical justification that, under certain conditions, the relaxation admits a unique solution from which the support of the sparse leading eigenvector can be recovered with high probability. The conditions involve the spectral gap between the largest and second-largest eigenvalues of the true data matrix, the magnitude of the noise, and the structural properties of the observed entries. The concepts of algebraic connectivity and irregularity are used to describe the structural properties of the observed entries. We empirically validate our theorem with synthetic and real data analysis. We also show that our algorithm outperforms several other sparse PCA approaches, especially when the observed entries have good structural properties. As a by-product of our analysis, we provide two theorems for handling a deterministic sampling scheme, which can be applied to other matrix-related problems.
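The relaxation in question is in the spirit of the standard $\ell_1$-penalized PCA semidefinite program (the DSPCA formulation of d'Aspremont et al.); the sketch below solves a complete-data version with cvxpy and reads off the support. The incomplete-data variant analyzed in the paper, and its conditions, are not reproduced here.

```python
import cvxpy as cp
import numpy as np

def sparse_pca_sdp(S, rho):
    """Semidefinite relaxation of l1-penalized PCA:
    maximize <S, X> - rho * ||X||_1  subject to  trace(X) = 1, X >= 0 (psd).
    The leading eigenvector of the solution estimates the sparse
    leading eigenvector of S."""
    p = S.shape[0]
    X = cp.Variable((p, p), symmetric=True)
    objective = cp.Maximize(cp.trace(S @ X) - rho * cp.sum(cp.abs(X)))
    cp.Problem(objective, [cp.trace(X) == 1, X >> 0]).solve()
    return X.value

rng = np.random.default_rng(0)
v = np.zeros(10); v[:3] = 1 / np.sqrt(3)     # sparse leading eigenvector
S = 5 * np.outer(v, v) + np.eye(10)          # spiked covariance matrix
X = sparse_pca_sdp(S, rho=0.5)
_, eigvec = np.linalg.eigh(X)
print(np.nonzero(np.abs(eigvec[:, -1]) > 1e-3)[0])  # expected support: [0 1 2]
```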
The problem of covering the ground set of two matroids by a minimum number of common independent sets is notoriously hard, even in very restricted settings, e.g., when the goal is to decide whether two common independent sets suffice. Nevertheless, as the problem generalizes several long-standing open questions, identifying tractable cases is of particular interest. Strongly base orderable matroids form a class that satisfies a basis-exchange condition much stronger than the standard axiom. As a result, several problems that are open for arbitrary matroids can be solved for this class. In particular, Davies and McDiarmid showed that if both matroids are strongly base orderable, then the covering number of their intersection coincides with the maximum of their covering numbers. Motivated by their result, we propose relaxations of strongly base orderability in two directions. First, we weaken the basis-exchange condition, which leads to the definition of a new, complete class of matroids with distinguished algorithmic properties. Second, we introduce the notion of covering the circuits of a matroid by a graph, and consider the cases where the graph is required to be 2-regular or a path. We give an extensive list of results explaining how the proposed relaxations compare to existing conjectures and theorems on coverings by common independent sets.
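A brute-force sketch can make the Davies and McDiarmid result concrete: partition matroids are strongly base orderable, so the covering number of their intersection should equal the maximum of the individual covering numbers. The instance below is invented, and the exhaustive search is only viable for tiny ground sets.

```python
from itertools import product

def partition_matroid(blocks, cap=1):
    """Independence oracle of a partition matroid: a set is independent
    iff it meets each block in at most `cap` elements."""
    return lambda S: all(len(S & B) <= cap for B in blocks)

def covering_number(ground, indep):
    """Minimum number of independent sets covering `ground` (brute force
    over colourings; only viable for tiny ground sets)."""
    for k in range(1, len(ground) + 1):
        for colours in product(range(k), repeat=len(ground)):
            classes = [{e for e, c in zip(ground, colours) if c == i}
                       for i in range(k)]
            if all(indep(S) for S in classes):
                return k

ground = tuple(range(6))
M1 = partition_matroid([{0, 1, 2}, {3, 4, 5}])
M2 = partition_matroid([{0, 3}, {1, 4}, {2, 5}])
common = lambda S: M1(S) and M2(S)
print(covering_number(ground, M1),      # 3
      covering_number(ground, M2),      # 2
      covering_number(ground, common))  # 3 = max(3, 2), as predicted
```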
A $0,1$ matrix is said to be regular if all of its rows and columns have the same number of ones. We prove that for infinitely many integers $k$, there exists a square regular $0,1$ matrix with binary rank $k$, such that the Boolean rank of its complement is $k^{\widetilde{\Omega}(\log k)}$. Equivalently, the ones in the matrix can be partitioned into $k$ combinatorial rectangles, whereas the number of rectangles needed for any cover of its zeros is $k^{\widetilde{\Omega}(\log k)}$. This settles, in a strong form, a question of Pullman (Linear Algebra Appl., 1988) and a conjecture of Hefner, Henson, Lundgren, and Maybee (Congr. Numer., 1990). The result can be viewed as a regular analogue of a recent result of Balodis, Ben-David, G\"{o}\"{o}s, Jain, and Kothari (FOCS, 2021), motivated by the clique vs. independent set problem in communication complexity and by the (disproved) Alon-Saks-Seymour conjecture in graph theory. As an application of the produced regular matrices, we obtain regular counterexamples to the Alon-Saks-Seymour conjecture and prove that for infinitely many integers $k$, there exists a regular graph with biclique partition number $k$ and chromatic number $k^{\widetilde{\Omega}(\log k)}$.
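For very small matrices, the quantities in play can be computed by exhaustive search. The sketch below computes the Boolean rank, i.e., the minimum number of all-ones combinatorial rectangles covering the ones, of the complement of the $4 \times 4$ identity, a regular matrix; it is purely illustrative and does not scale.

```python
from itertools import combinations
import numpy as np

def maximal_rectangles(M):
    """For each nonempty row set R, the rectangle R x C with C the set of
    all columns that are ones in every row of R. Column-maximal rectangles
    suffice for minimum covers, since enlarging a rectangle keeps it valid."""
    n, m = M.shape
    rects = []
    for mask in range(1, 1 << n):
        R = tuple(i for i in range(n) if mask >> i & 1)
        C = tuple(j for j in range(m) if all(M[i, j] for i in R))
        if C:
            rects.append((R, C))
    return rects

def boolean_rank(M):
    """Minimum number of all-ones rectangles covering the ones of M
    (exhaustive search; feasible only for tiny matrices)."""
    ones = set(zip(*np.nonzero(M)))
    rects = maximal_rectangles(M)
    for k in range(1, len(ones) + 1):
        for choice in combinations(rects, k):
            if ones <= {(i, j) for R, C in choice for i in R for j in C}:
                return k

M = 1 - np.eye(4, dtype=int)   # complement of the identity, a regular matrix
print(boolean_rank(M))         # prints 4
```

The result above concerns the same kind of gap, between partitioning the ones of a regular matrix and covering the zeros, driven to quasi-polynomial size.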
The core is a dominant solution concept in economics and game theory. In this context, the following question arises: ``How versatile is this solution concept?'' We note that within game theory, this notion has been used only for sharing profit -- equivalently, cost or utility. In this paper, we show a completely different use for it: in an {\em investment management game}, in which an agent needs to allocate her money among investment firms in such a way that {\em in each of exponentially many future scenarios}, sufficient money is available in the ``right'' firms so that she can buy an ``optimal investment'' for that scenario. We study a restriction of this game to {\em perfect graphs} and characterize its core. Our characterization is analogous to Shapley and Shubik's characterization of the core of the assignment game. The difference is the following: whereas their characterization follows from {\em total unimodularity}, ours follows from {\em total dual integrality}. The latter is another novelty of our work.
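The Shapley and Shubik characterization invoked above can be seen computationally: the core of the assignment game is exactly the set of optimal solutions of the dual of the optimal-assignment LP. A small scipy sketch with invented valuations:

```python
import numpy as np
from scipy.optimize import linprog

# Core payoffs of the assignment game = optimal solutions (u, v) of the
# dual LP:  minimize sum(u) + sum(v)  s.t.  u_i + v_j >= w[i, j], u, v >= 0.
w = np.array([[5.0, 8.0],
              [7.0, 2.0]])   # w[i, j]: value of matching seller i with buyer j
n, m = w.shape
c = np.ones(n + m)           # variables: u_0..u_{n-1}, v_0..v_{m-1}
A_ub, b_ub = [], []
for i in range(n):
    for j in range(m):
        row = np.zeros(n + m)
        row[i], row[n + j] = -1.0, -1.0      # -(u_i + v_j) <= -w[i, j]
        A_ub.append(row); b_ub.append(-w[i, j])
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * (n + m))
print("core payoffs:", np.round(res.x[:n], 3), np.round(res.x[n:], 3))
print("total:", round(res.fun, 3))  # equals the optimal matching value, 15
```

Total unimodularity of the assignment constraints is what makes this LP duality argument work; the paper's characterization rests on the weaker property of total dual integrality.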
Using gradient descent (GD) with a fixed or decaying step size is standard practice in unconstrained optimization problems. However, when the loss function is only locally convex, such a step-size schedule artificially slows GD down, since it cannot adapt to the flat curvature of the loss function. To overcome this issue, we propose to exponentially increase the step size of the GD algorithm. Under homogeneity assumptions on the loss function, we demonstrate that the iterates of the proposed \emph{exponential step size gradient descent} (EGD) algorithm converge linearly to the optimal solution. Leveraging that optimization insight, we then consider using the EGD algorithm for solving parameter estimation under both regular and non-regular statistical models whose loss function becomes locally convex as the sample size goes to infinity. We demonstrate that the EGD iterates reach the final statistical radius around the true parameter after a logarithmic number of iterations, in stark contrast to the \emph{polynomial} number of iterations required by the GD algorithm in non-regular statistical models. Therefore, the total computational complexity of the EGD algorithm is \emph{optimal} and exponentially cheaper than that of GD for solving parameter estimation in non-regular statistical models, while being comparable to that of GD in regular statistical settings. To the best of our knowledge, this resolves a long-standing gap between the statistical and algorithmic computational complexities of parameter estimation in non-regular statistical models. Finally, we provide targeted applications of the general theory to several classes of statistical models, including generalized linear models with polynomial link functions and location Gaussian mixture models.
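A minimal numerical sketch of the phenomenon on the loss $f(x) = x^4$, which is flat at its minimizer in the way non-regular models are: the exponentially growing schedule reaches a far smaller error than fixed-step GD within the same iteration budget. The growth factor and step sizes are illustrative choices, not values from the paper.

```python
def gd(grad, x0, step, iters):
    """Gradient descent with a per-iteration step-size schedule."""
    x = x0
    for t in range(iters):
        x = x - step(t) * grad(x)
    return x

# f(x) = x^4 has vanishing curvature at its minimizer 0, so fixed-step GD
# contracts more and more slowly as the iterates approach the optimum.
grad = lambda x: 4 * x ** 3
x_fixed = gd(grad, 1.0, lambda t: 0.05, 200)            # fixed step size
x_exp = gd(grad, 1.0, lambda t: 0.05 * 1.1 ** t, 200)   # exponentially growing
print(f"fixed: {x_fixed:.2e}  exponential: {x_exp:.2e}")
# The growing step size compensates for the shrinking gradient, restoring
# a (locally) linear convergence rate in this example.
```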
This book develops an effective theory approach to understanding deep neural networks of practical relevance. Beginning from a first-principles component-level picture of networks, we explain how to determine an accurate description of the output of trained networks by solving layer-to-layer iteration equations and nonlinear learning dynamics. A main result is that the predictions of networks are described by nearly-Gaussian distributions, with the depth-to-width aspect ratio of the network controlling the deviations from the infinite-width Gaussian description. We explain how these effectively-deep networks learn nontrivial representations from training and more broadly analyze the mechanism of representation learning for nonlinear models. From a nearly-kernel-methods perspective, we find that the dependence of such models' predictions on the underlying learning algorithm can be expressed in a simple and universal way. To obtain these results, we develop the notion of representation group flow (RG flow) to characterize the propagation of signals through the network. By tuning networks to criticality, we give a practical solution to the exploding and vanishing gradient problem. We further explain how RG flow leads to near-universal behavior and lets us categorize networks built from different activation functions into universality classes. Altogether, we show that the depth-to-width ratio governs the effective model complexity of the ensemble of trained networks. By using information-theoretic techniques, we estimate the optimal aspect ratio at which we expect the network to be practically most useful and show how residual connections can be used to push this scale to arbitrary depths. With these tools, we can learn in detail about the inductive bias of architectures, hyperparameters, and optimizers.
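A small numpy experiment illustrating the criticality tuning described above, using the standard critical initialization $C_W = 2$ for ReLU networks (He-style scaling); the widths, depths, and choice of ReLU here are illustrative, not taken from the book.

```python
import numpy as np

def signal_norm(depth, width, c_w, rng):
    """Average squared preactivation after `depth` random ReLU layers,
    with weight variance c_w / width."""
    z = rng.normal(size=width)
    for _ in range(depth):
        W = rng.normal(scale=np.sqrt(c_w / width), size=(width, width))
        z = W @ np.maximum(z, 0.0)          # ReLU activation
    return np.mean(z ** 2)

rng = np.random.default_rng(0)
for c_w in (1.5, 2.0, 2.5):                 # c_w = 2 is criticality for ReLU
    norms = [signal_norm(depth=50, width=500, c_w=c_w, rng=rng)
             for _ in range(5)]
    print(c_w, f"{np.mean(norms):.3e}")
# Away from c_w = 2 the signal vanishes or explodes exponentially in depth;
# at criticality it stays O(1), the tuning the book develops systematically.
```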