Suppose that we have $n$ agents and $n$ items which lie in a shared metric space. We would like to match the agents to items such that the total distance from agents to their matched items is as small as possible. However, instead of having direct access to distances in the metric, we only have each agent's ranking of the items in order of distance. Given this limited information, what is the minimum possible worst-case approximation ratio (known as the distortion) that a matching mechanism can guarantee? Previous work by Caragiannis et al. proved that the (deterministic) Serial Dictatorship mechanism has distortion at most $2^n - 1$. We improve this by providing a simple deterministic mechanism that has distortion $O(n^2)$. We also provide the first nontrivial lower bound on this problem, showing that any matching mechanism (deterministic or randomized) must have worst-case distortion $\Omega(\log n)$. In addition to these new bounds, we show that a large class of truthful mechanisms derived from Deferred Acceptance all have worst-case distortion at least $2^n - 1$, and we find an intriguing connection between thin matchings (analogous to the well-known thin trees conjecture) and the distortion gap between deterministic and randomized mechanisms.
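A minimal sketch of the baseline mechanism referenced above, the (deterministic) Serial Dictatorship that only sees each agent's ordinal ranking of the items; this is the $2^n - 1$-distortion baseline, not the improved $O(n^2)$-distortion mechanism, and the ordering of agents is an arbitrary illustrative choice.

```python
def serial_dictatorship(rankings):
    """rankings[i] is agent i's list of item indices, most preferred (closest) first.
    Agents are processed in a fixed order; each takes its best remaining item."""
    remaining = set(range(len(rankings)))
    matching = {}
    for agent, pref in enumerate(rankings):
        for item in pref:
            if item in remaining:
                matching[agent] = item
                remaining.remove(item)
                break
    return matching

# Example: 3 agents, 3 items.
print(serial_dictatorship([[0, 1, 2], [0, 2, 1], [1, 0, 2]]))
# -> {0: 0, 1: 2, 2: 1}
```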
Learning with rejection is a prototypical model for studying the interaction between humans and AI on prediction tasks. The model has two components, a predictor and a rejector. Upon the arrival of a sample, the rejector first decides whether to accept it; if accepted, the predictor fulfills the prediction task, and if rejected, the prediction is deferred to humans. The learning problem requires learning a predictor and a rejector simultaneously, which changes the structure of the conventional loss function and often results in non-convexity and inconsistency issues. For the classification with rejection problem, several works develop surrogate losses for this joint learning with provable consistency guarantees; in contrast, there has been less work on the regression counterpart. We study the regression with rejection (RwR) problem and investigate the no-rejection learning strategy, which treats the RwR problem as a standard regression task to learn the predictor. We establish that the suboptimality of the no-rejection learning strategy observed in the literature can be mitigated by enlarging the function class of the predictor. We then introduce the truncated loss to single out the learning of the predictor, and we show that a consistent surrogate property can be established for the predictor alone more easily than for the predictor and the rejector jointly. Our findings advocate a two-step learning procedure that first uses all the data to learn the predictor and then calibrates the prediction loss for the rejector. This procedure is better aligned with the common intuition that more data samples lead to a better predictor, and it calls for more effort on the design of better calibration algorithms for learning the rejector. While our discussion mainly focuses on the regression problem, the theoretical results and insights generalize to the classification problem as well.
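A minimal sketch of the two-step procedure advocated above, under simplifying assumptions (off-the-shelf scikit-learn regressors and a fixed rejection budget); the truncated loss and the calibration rule studied in the paper may differ from the illustrative error-quantile rule used here.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def two_step_rwr(X_train, y_train, X_cal, y_cal, reject_rate=0.2):
    # Step 1: no-rejection learning -- fit the predictor on all training data.
    predictor = GradientBoostingRegressor().fit(X_train, y_train)

    # Step 2: calibrate a rejector on held-out data. Here the rejector is an
    # auxiliary model of the predictor's squared error; samples whose predicted
    # error exceeds a quantile threshold are deferred to humans.
    cal_errors = (predictor.predict(X_cal) - y_cal) ** 2
    error_model = GradientBoostingRegressor().fit(X_cal, cal_errors)
    threshold = np.quantile(error_model.predict(X_cal), 1.0 - reject_rate)

    def rejector(X):
        return error_model.predict(X) > threshold  # True = defer to human

    return predictor, rejector
```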
The triple-differences (TD) design is a popular identification strategy for causal effects in settings where researchers do not believe the parallel trends assumption of conventional difference-in-differences (DiD) is satisfied. TD designs augment the conventional 2x2 DiD with a "placebo" stratum -- observations that are nested in the same units and time periods but are known to be entirely unaffected by the treatment. However, many TD applications go beyond this simple 2x2x2 setting and use observations on many units in many "placebo" strata across multiple time periods. A popular estimator in this setting is the triple-differences regression (TDR) fixed-effects estimator -- an extension of the common "two-way fixed effects" estimator for DiD. This paper decomposes the TDR estimator into its component two-group/two-period/two-strata triple-differences and illustrates how interpreting this parameter causally in settings with arbitrary staggered adoption requires strong effect homogeneity assumptions, as many placebo DiDs incorporate observations under treatment. The decomposition clarifies the identifying variation implied by the triple-differences regression estimator and suggests researchers should be cautious when implementing these estimators in settings more complex than the 2x2x2 case. Alternative approaches that only incorporate "clean placebos", such as direct imputation of the counterfactual, may be more appropriate. The paper concludes by demonstrating the utility of this imputation estimator in an application of the "gravity model" to estimating the effect of the WTO/GATT on international trade.
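For reference, the 2x2x2 building block of the decomposition is the difference of two DiDs: the DiD in the affected stratum minus the DiD in the placebo stratum. The notation below is illustrative (it is not taken from the paper): $\bar{Y}^{s}_{g,t}$ denotes the mean outcome in stratum $s \in \{A\ (\text{affected}),\, B\ (\text{placebo})\}$, group $g \in \{T, C\}$, and period $t \in \{\text{pre}, \text{post}\}$.
$$
\hat{\tau}_{\mathrm{TD}}
  = \Big[ \big(\bar{Y}^{A}_{T,\mathrm{post}} - \bar{Y}^{A}_{T,\mathrm{pre}}\big)
        - \big(\bar{Y}^{A}_{C,\mathrm{post}} - \bar{Y}^{A}_{C,\mathrm{pre}}\big) \Big]
  - \Big[ \big(\bar{Y}^{B}_{T,\mathrm{post}} - \bar{Y}^{B}_{T,\mathrm{pre}}\big)
        - \big(\bar{Y}^{B}_{C,\mathrm{post}} - \bar{Y}^{B}_{C,\mathrm{pre}}\big) \Big]
$$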
We study the measure of order-competitive ratio introduced by Ezra et al. [2023] for online algorithms in Bayesian combinatorial settings. In our setting, a decision-maker observes a sequence of elements associated with stochastic rewards that are drawn from known priors but revealed one by one in an online fashion. Upon the arrival of each element, the decision-maker must decide whether to select it or discard it (subject to some feasibility constraint), and receives the rewards of the selected elements. The order-competitive ratio is defined as the worst-case ratio (over all distribution sequences) between the performance of the best order-unaware and order-aware algorithms, and it quantifies the loss incurred due to the lack of knowledge of the arrival order. Ezra et al. [2023] showed how to design algorithms that achieve better approximations with respect to this new benchmark (the order-competitive ratio) in the single-choice setting, which raises the natural question of whether the same can be achieved in combinatorial settings: is it possible to achieve a constant approximation with respect to the best online algorithm for downward-closed feasibility constraints, is an $\omega(1/n)$-approximation achievable for general (non-downward-closed) feasibility constraints, and is a convergence rate to $1$ of $o(1/\sqrt{k})$ achievable in the multi-unit setting? We show, by devising novel constructions that may be of independent interest, that for all three scenarios the asymptotic lower bounds with respect to the old benchmark also hold with respect to the new benchmark.
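A small illustrative sketch (not from the paper, and restricted to a tiny family of order-unaware rules) of the two sides of the benchmark in the single-choice setting: the order-aware side is the optimal online policy computed by backward induction over the known arrival order, while the order-unaware side must commit to a rule, here a single fixed threshold, before the order is revealed. The two-point distributions below are purely hypothetical.

```python
import itertools

# Each element's reward distribution as a list of (value, probability) pairs.
dists = [[(1.0, 0.5), (0.0, 0.5)], [(0.8, 1.0)], [(3.0, 0.2), (0.0, 0.8)]]

def order_aware_value(order):
    """Expected reward of the optimal online policy when the arrival order is known."""
    cont = 0.0                              # continuation value after the last arrival
    for i in reversed(order):
        cont = sum(p * max(v, cont) for v, p in dists[i])
    return cont

def threshold_value(order, t):
    """Expected reward of 'accept the first value >= t' under the given arrival order."""
    val, prob_reach = 0.0, 1.0
    for i in order:
        p_acc = sum(p for v, p in dists[i] if v >= t)
        e_acc = sum(p * v for v, p in dists[i] if v >= t)
        val += prob_reach * e_acc
        prob_reach *= 1.0 - p_acc
    return val

orders = list(itertools.permutations(range(len(dists))))
# The order-unaware rule must fix its threshold before seeing the order.
best = max(
    min(threshold_value(o, t) / order_aware_value(o) for o in orders)
    for t in (0.0, 0.5, 0.8, 1.0, 3.0)
)
print(best)  # ratio achieved by this simple threshold family on this one instance
```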
Ranking and Balance are arguably the two most important algorithms in the online matching literature. They achieve the same optimal competitive ratio of $1-1/e$ for, respectively, the integral and fractional versions of online bipartite matching introduced by Karp, Vazirani, and Vazirani (STOC 1990). The two algorithms have been generalized to weighted online bipartite matching problems, including vertex-weighted online bipartite matching and AdWords, by utilizing a perturbation function. The canonical choice of the perturbation function is $f(x)=1-e^{x-1}$, as it leads to the optimal competitive ratio of $1-1/e$ in both settings. In this paper we advance the understanding of the weighted generalizations of Ranking and Balance, with a focus on the effect of different perturbation functions. First, we prove that the canonical perturbation function is the \emph{unique} optimal perturbation function for vertex-weighted online bipartite matching, in stark contrast to the unweighted setting, where all perturbation functions achieve the optimal competitive ratio of $1-1/e$. Second, we prove that the generalization of Ranking to AdWords with unknown budgets using the canonical perturbation function is at most $0.624$-competitive, refuting a conjecture of Vazirani (2021). More generally, as an application of the first result, we prove that no perturbation function leads to the prominent competitive ratio of $1-1/e$, by establishing an upper bound of $1-1/e-0.0003$. Finally, we propose the online budget-additive welfare maximization problem, which is intermediate between AdWords and AdWords with unknown budgets, and we design an optimal $1-1/e$-competitive algorithm by generalizing Balance.
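A minimal sketch (following the standard formulation, not code from the paper) of perturbed Ranking for vertex-weighted online bipartite matching: each offline vertex $u$ draws a rank $y_u \sim U[0,1]$ once, and every arriving online vertex is matched to its unmatched neighbor maximizing $w_u \cdot f(y_u)$, with the canonical perturbation $f(x)=1-e^{x-1}$ discussed above.

```python
import math, random

def perturbed_ranking(weights, neighbors):
    """weights[u]: weight of offline vertex u.
    neighbors: list over online vertices of the offline vertices they can match to."""
    f = lambda x: 1 - math.exp(x - 1)          # canonical perturbation function
    y = {u: random.random() for u in weights}  # ranks drawn once, up front
    matched, matching = set(), {}
    for v, nbrs in enumerate(neighbors):
        candidates = [u for u in nbrs if u not in matched]
        if not candidates:
            continue
        u = max(candidates, key=lambda u: weights[u] * f(y[u]))
        matched.add(u)
        matching[v] = u
    return matching

# Example: two offline vertices with weights 1 and 5, three online arrivals
# (the output depends on the random ranks).
print(perturbed_ranking({0: 1.0, 1: 5.0}, [[0, 1], [1], [0]]))
```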
In 1982, Papadimitriou and Yannakakis introduced the Exact Matching problem, in which, given a red and blue edge-colored graph $G$ and an integer $k$, one has to decide whether there exists a perfect matching in $G$ with exactly $k$ red edges. Even though a randomized polynomial-time algorithm for this problem was found only a few years later, it is still unknown today whether a deterministic polynomial-time algorithm exists. This makes the Exact Matching problem an important candidate for testing the RP=P hypothesis. In this paper we focus on approximating Exact Matching. While there exists a simple algorithm that computes, in deterministic polynomial time, an almost perfect matching with exactly $k$ red edges, little work has focused on computing perfect matchings with almost $k$ red edges. In fact, such an algorithm for bipartite graphs running in deterministic polynomial time was published only recently (STACS'23); it outputs a perfect matching with $k'$ red edges with the guarantee that $0.5k \leq k' \leq 1.5k$. In the present paper we aim to approximate the number of red edges without exceeding the limit of $k$ red edges: we construct a deterministic polynomial-time algorithm which, on bipartite graphs, computes a perfect matching with $k'$ red edges such that $k/3 \leq k' \leq k$.
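An illustrative brute-force check of the Exact Matching decision problem on small bipartite graphs; it runs in exponential time and only serves to make the definition concrete, and is unrelated to the deterministic approximation algorithm described above.

```python
from itertools import permutations

def exact_matching_bruteforce(n, red_edges, blue_edges, k):
    """Vertices are 0..n-1 on each side; red_edges / blue_edges are sets of
    (left, right) pairs. Returns a perfect matching with exactly k red edges,
    or None if no such matching exists."""
    edges = red_edges | blue_edges
    for perm in permutations(range(n)):              # candidate perfect matchings
        matching = [(i, perm[i]) for i in range(n)]
        if all(e in edges for e in matching):
            if sum(e in red_edges for e in matching) == k:
                return matching
    return None

# Example: a 4-cycle whose two perfect matchings are all-red and all-blue.
red = {(0, 0), (1, 1)}
blue = {(0, 1), (1, 0)}
print(exact_matching_bruteforce(2, red, blue, 1))  # None: every matching has 0 or 2 red edges
```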
The goal of this work is to study waves interacting with partially immersed objects that are allowed to move freely in the vertical direction, in a regime in which the propagation of the waves is described by the one-dimensional Boussinesq-Abbott system. The problem can be reduced to a transmission problem for this Boussinesq system, in which the transmission conditions between the components of the domain to the left and to the right of the object are determined through the resolution of coupled forced ODEs in time satisfied by the vertical displacement of the object and the average discharge in the portion of the fluid located under the object. We propose a new extended formulation in which these ODEs are complemented by two other forced ODEs satisfied by the trace of the surface elevation at the contact points. The interest of this new extended formulation is that the forcing terms are easy to compute numerically and that the surface elevation at the contact points is obtained for free. Based on this formulation, we propose a second-order scheme that involves a generalization of the MacCormack scheme with a nonlocal flux and a source term, coupled to a second-order Heun scheme for the ODEs. In order to validate this scheme, several explicit solutions of this wave-structure interaction problem are derived; they can serve as benchmarks for future codes. As a byproduct, our method provides a second-order scheme for the generation of waves at the entrance of the numerical domain for the Boussinesq-Abbott system.
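A minimal generic sketch of the second-order Heun time stepper mentioned above; the actual coupled forced ODEs for the object's vertical displacement, the average discharge, and the traces of the surface elevation are more involved, so the forced oscillator used here is only a stand-in.

```python
import numpy as np

def heun_step(f, t, y, dt):
    """One Heun (explicit trapezoidal, second-order) step for y' = f(t, y)."""
    k1 = f(t, y)
    k2 = f(t + dt, y + dt * k1)
    return y + 0.5 * dt * (k1 + k2)

# Example: a forced linear oscillator y'' + y = cos(2t), written as a first-order system.
f = lambda t, y: np.array([y[1], -y[0] + np.cos(2 * t)])
y, dt = np.array([0.0, 0.0]), 1e-3
for n in range(int(1.0 / dt)):
    y = heun_step(f, n * dt, y, dt)
print(y)  # approximate state (position, velocity) at t = 1
```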
A binary code of blocklength $n$ and codebook size $M$ is called an $(n,M)$ code; we study such codes for memoryless binary symmetric channels (BSCs) under maximum likelihood (ML) decoding. For any $n \geq 2$, some optimal codes among the linear $(n,4)$ codes have been explicitly characterized in previous work, but whether these optimal linear codes are better than all nonlinear codes is unknown. In this paper, we first show that for any $n\geq 2$, there exists an optimal code (among all $(n,4)$ codes) that is either linear or belongs to a subset of nonlinear codes, called Class-I codes. We identify all the optimal codes among the linear $(n,4)$ codes for each blocklength $n\geq 2$, including ones not given in the literature. For every $n$ from $2$ to $300$, we identify all the optimal $(n,4)$ codes; except for $n=3$, all of them are equivalent to linear codes, while for $n=3$ there exist optimal codes that are not equivalent to linear codes. Furthermore, we derive a subset of nonlinear codes called Class-II codes and show that for any $n >300$, the set composed of linear, Class-I, and Class-II codes and their equivalent codes contains all the optimal $(n,4)$ codes. Both Class-I and Class-II codes are close to linear codes in the sense that they involve only one type of column that is not included in linear codes. Our results are obtained using a new technique for comparing the ML decoding performance of two codes, based on a partition of the entire range of the channel output.
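An illustrative brute-force computation (exponential in $n$, so small $n$ only) of the ML decoding error probability of an $(n,4)$ code over a BSC with crossover probability $p$; it enumerates the channel output space directly and is not the comparison technique developed in the paper.

```python
from itertools import product

def ml_error_probability(code, p):
    """code: list of 4 binary tuples of equal length n, assumed equiprobable.
    Returns the average ML decoding error probability over a BSC(p)."""
    n = len(code[0])
    error = 0.0
    for y in product((0, 1), repeat=n):                 # every possible channel output
        # The likelihood of y given codeword c depends only on their Hamming distance.
        dists = [sum(a != b for a, b in zip(c, y)) for c in code]
        likelihoods = [p ** d * (1 - p) ** (n - d) for d in dists]
        decoded = likelihoods.index(max(likelihoods))   # ML decision (ties -> first maximizer)
        for i in range(len(code)):
            if i != decoded:
                error += 0.25 * likelihoods[i]          # P(send codeword i, receive y, err)
    return error

# Example: a linear (3, 4) code (the single-parity-check code).
print(ml_error_probability([(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)], p=0.1))
```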
In this paper, we study the problems of detection and recovery of hidden submatrices with elevated means inside a large Gaussian random matrix. We consider two different structures for the planted submatrices. In the first model, the planted matrices are disjoint and their row and column indices can be arbitrary. Inspired by scientific applications, the second model restricts the row and column indices to be consecutive. In the detection problem, under the null hypothesis the observed matrix is a realization of independent and identically distributed standard normal entries; under the alternative, there exists a set of hidden submatrices with elevated means inside the same standard normal matrix. Recovery refers to the task of locating the hidden submatrices. For both problems, and for both models, we characterize the statistical and computational barriers by deriving information-theoretic lower bounds, designing and analyzing algorithms matching those bounds, and proving computational lower bounds based on the low-degree polynomials conjecture. In particular, we show that the space of model parameters (i.e., the number of planted submatrices, their dimensions, and the elevated mean) can be partitioned into three regions: the impossible regime, where all algorithms fail; the hard regime, where detection or recovery is statistically possible but we give evidence that no polynomial-time algorithm exists; and finally the easy regime, where polynomial-time algorithms exist.
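An illustrative detection statistic for the consecutive-indices model: a scan over all contiguous $k \times k$ blocks of the observed matrix, computed with 2-D prefix sums. The thresholds at which such tests succeed, and whether they are optimal, are the subject of the paper and are not reproduced here.

```python
import numpy as np

def scan_statistic(Y, k):
    """Max standardized sum over all contiguous k x k submatrices of a square matrix Y."""
    n = Y.shape[0]
    # 2-D prefix sums make each block sum an O(1) lookup.
    S = np.zeros((n + 1, n + 1))
    S[1:, 1:] = np.cumsum(np.cumsum(Y, axis=0), axis=1)
    best = -np.inf
    for i in range(n - k + 1):
        for j in range(n - k + 1):
            block = S[i + k, j + k] - S[i, j + k] - S[i + k, j] + S[i, j]
            best = max(best, block / k)      # block sum standardized by sqrt(k*k) = k
    return best

# Under the null (pure noise) the statistic is typically of order sqrt(log n);
# a planted k x k block with elevated mean mu shifts it by about k * mu.
rng = np.random.default_rng(0)
Y = rng.standard_normal((100, 100))
Y[10:20, 30:40] += 1.0                        # plant a 10 x 10 block with mean 1
print(scan_statistic(Y, 10))
```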
Entangled states shared among distant nodes are frequently used in quantum network applications. When quantum resources are abundant, entangled states can be continuously distributed across the network, allowing nodes to consume them whenever necessary. This continuous distribution of entanglement enables quantum network applications to operate continuously while being regularly supplied with entangled states. Here, we focus on the steady-state performance analysis of protocols for continuous distribution of entanglement. We propose the virtual neighborhood size and the virtual node degree as performance metrics. We utilize the concept of Pareto optimality to formulate a multi-objective optimization problem to maximize the performance. As an example, we solve the problem for a quantum network with a tree topology. One of the main conclusions from our analysis is that the entanglement consumption rate has a greater impact on the protocol performance than the fidelity requirements. The metrics that we establish in this manuscript can be utilized to assess the feasibility of entanglement distribution protocols for large-scale quantum networks.
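A small sketch of the two proposed metrics as described above, under the simplifying assumption (ours, not the manuscript's) that the network state can be summarized as a multiset of currently shared entangled pairs: a node's virtual neighborhood is the set of distinct nodes with which it currently shares at least one entangled state, and its virtual node degree counts the shared pairs.

```python
from collections import Counter

def virtual_metrics(entangled_pairs, node):
    """entangled_pairs: iterable of (u, v) node-label pairs, one per currently
    shared entangled state. Returns (virtual neighborhood size, virtual node degree)."""
    neighbors = Counter()
    for u, v in entangled_pairs:
        if u == node:
            neighbors[v] += 1
        elif v == node:
            neighbors[u] += 1
    return len(neighbors), sum(neighbors.values())

# Example: node 'a' shares two pairs with 'b' and one with 'c'.
print(virtual_metrics([('a', 'b'), ('a', 'b'), ('a', 'c'), ('b', 'c')], 'a'))  # (2, 3)
```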
In the domain generalization literature, a common objective is to learn representations independent of the domain after conditioning on the class label. We show that this objective is not sufficient: there exist counter-examples where a model fails to generalize to unseen domains even after satisfying class-conditional domain invariance. We formalize this observation through a structural causal model and show the importance of modeling within-class variations for generalization. Specifically, classes contain objects that characterize specific causal features, and domains can be interpreted as interventions on these objects that change non-causal features. We highlight an alternative condition: inputs across domains should have the same representation if they are derived from the same object. Based on this objective, we propose matching-based algorithms when base objects are observed (e.g., through data augmentation) and approximate the objective when objects are not observed (MatchDG). Our simple matching-based algorithms are competitive with prior work on out-of-domain accuracy for rotated MNIST, Fashion-MNIST, PACS, and Chest-Xray datasets. Our method MatchDG also recovers ground-truth object matches: on MNIST and Fashion-MNIST, top-10 matches from MatchDG have over 50% overlap with ground-truth matches.
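A minimal sketch (hypothetical, not the authors' released code) of the matching-based objective described above: a standard classification loss plus a penalty that pulls together the representations of inputs derived from the same base object across domains (e.g., an image and its augmented copy).

```python
import torch
import torch.nn.functional as F

def match_loss(features, logits, labels, match_idx, lam=1.0):
    """features: (B, d) representations; logits: (B, C) class scores;
    labels: (B,) class labels; match_idx: (B,) index of each sample's matched
    counterpart from another domain (same base object)."""
    cls = F.cross_entropy(logits, labels)
    matched = features[match_idx]                        # representation of the paired input
    match_penalty = ((features - matched) ** 2).sum(dim=1).mean()
    return cls + lam * match_penalty
```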