We consider a general difference-in-differences model in which the treatment variable of interest may be non-binary and its value may change in each period. It is generally difficult to estimate treatment parameters defined with the potential outcome given the entire path of treatment adoption, because each treatment path may be experienced by only a small number of observations. We propose an alternative approach using the concept of effective treatment, which summarizes the treatment path into an empirically tractable low-dimensional variable, and develop doubly robust identification, estimation, and inference methods. We also provide a companion R software package.
We consider hypergraph network design problems where the goal is to construct a hypergraph satisfying certain properties. In graph network design problems, the number of edges in an arbitrary solution is at most the square of the number of vertices. In contrast, in hypergraph network design problems, the number of hyperedges in an arbitrary solution could be exponential in the number of vertices and hence, additional care is necessary to design polynomial-time algorithms. The central theme of this work is to show that certain hypergraph network design problems admit solutions with polynomial number of hyperedges and moreover, can be solved in strongly polynomial time. Our work improves on the previous fastest pseudo-polynomial run-time for these problems. In addition, we develop algorithms that return (near-)uniform hypergraphs as solutions. The hypergraph network design problems that we focus upon are splitting-off operation in hypergraphs, connectivity augmentation using hyperedges, and covering skew-supermodular functions using hyperedges. Our definition of the splitting-off operation in hypergraphs and our proof showing the existence of the operation using a strongly polynomial-time algorithm to compute it are likely to be of independent graph-theoretical interest.
This paper establishes the equivalence between Local Differential Privacy (LDP) and a global limit on learning any knowledge about an object. However, an output from an LDP query is not necessarily required to provide exact amount of knowledge equal to the upper bound of the learning limit. Since the amount of knowledge gain should be proportional to the incurred privacy loss, the traditional approach of using DP guarantee to measure privacy loss can occasionally overestimate the actual privacy loss. This is especially problematic in privacy accounting in LDP, where privacy loss is computed by summing the DP guarantees (basic composition). To address this issue, this paper introduces the concept of realized privacy loss, which measures the actual knowledge gained by the analyst after a query, as a more accurate measure of privacy loss. The realized privacy loss is then integrated into the privacy accounting of fully adaptive composition, where an adversary adaptively selects queries based on previous results. The Bayesian Privacy Filter is implemented to ensure that the realized privacy loss of the composed queries eventually reaches the DP guarantee, allowing the full utilization of the privacy budget assigned to a queried object. Furthermore, this paper introduces the Bayesian Privacy Odometer to measure realized privacy loss in fully adaptive composition. Experimental evaluations are conducted to assess the efficiency of the Bayesian Privacy Filter, demonstrating that the corresponding composition can accept arbitrarily more queries than the basic composition when the composed queries have sufficiently small DP guarantees. Conversely, this paper concludes, through experiments, that when estimating the histogram of a group of objects with the same privacy budget, an analyst should prefer using a single randomized response over a composition managed by the Bayesian Privacy Filter.
With the increasing availability of datasets, developing data fusion methods to leverage the strengths of different datasets to draw causal effects is of great practical importance to many scientific fields. In this paper, we consider estimating the quantile treatment effects using small validation data with fully-observed confounders and large auxiliary data with unmeasured confounders. We propose a Fused Quantile Treatment effects Estimator (FQTE) by integrating the information from two datasets based on doubly robust estimating functions. We allow for the misspecification of the models on the dataset with unmeasured confounders. Under mild conditions, we show that the proposed FQTE is asymptotically normal and more efficient than the initial QTE estimator using the validation data solely. By establishing the asymptotic linear forms of related estimators, convenient methods for covariance estimation are provided. Simulation studies demonstrate the empirical validity and improved efficiency of our fused estimators. We illustrate the proposed method with an application.
The criticality problem in nuclear engineering asks for the principal eigen-pair of a Boltzmann operator describing neutron transport in a reactor core. Being able to reliably design, and control such reactors requires assessing these quantities within quantifiable accuracy tolerances. In this paper we propose a paradigm that deviates from the common practice of approximately solving the corresponding spectral problem with a fixed, presumably sufficiently fine discretization. Instead, the present approach is based on first contriving iterative schemes, formulated in function space, that are shown to converge at a quantitative rate without assuming any a priori excess regularity properties, and that exploit only properties of the optical parameters in the underlying radiative transfer model. We develop the analytical and numerical tools for approximately realizing each iteration step withing judiciously chosen accuracy tolerances, verified by a posteriori estimates, so as to still warrant quantifiable convergence to the exact eigen-pair. This is carried out in full first for a Newton scheme. Since this is only locally convergent we analyze in addition the convergence of a power iteration in function space to produce sufficiently accurate initial guesses. Here we have to deal with intrinsic difficulties posed by compact but unsymmetric operators preventing standard arguments used in the finite dimensional case. Our main point is that we can avoid any condition on an initial guess to be already in a small neighborhood of the exact solution. We close with a discussion of remaining intrinsic obstructions to a certifiable numerical implementation, mainly related to not knowing the gap between the principal eigenvalue and the next smaller one in modulus.
Difference-in-differences is without a doubt the most widely used method for evaluating the causal effect of a hypothetical intervention in the possible presence of confounding bias due to hidden factors. The approach is typically used when both pre- and post-exposure outcome measurements are available, and one can reasonably assume that the additive association of the unobserved confounder with the outcome is equal in the two exposure arms, and constant over time; a so-called parallel trends assumption. The parallel trends assumption may not be credible in many practical settings, including if the outcome is binary, a count, or polytomous, and more generally, when the unmeasured confounder exhibits non-additive effects on the distribution of the outcome, even if such effects are constant over time. We introduce an alternative approach that replaces the parallel trends assumption with an odds ratio equi-confounding assumption, which states that confounding bias for the causal effect of interest, encoded by an association between treatment and the potential outcome under no-treatment can be identified with a well-specified generalized linear model relating the pre-exposure outcome and the exposure. As the proposed method identifies any causal effect that is conceivably identified in the absence of confounding bias, including nonlinear effects such as quantile treatment effects, the approach is aptly called Universal Difference-in-differences (UDiD). Both fully parametric and more robust semiparametric UDiD estimators are described and illustrated in a real-world application concerning the causal effects of a Zika virus outbreak on birth rate in Brazil.
We consider a persuasion problem between a sender and a receiver whose utility may be nonlinear in her belief; we call such receivers risk-conscious. Such utility models arise when the receiver exhibits systematic biases away from expected-utility-maximization, such as uncertainty aversion (e.g., from sensitivity to the variance of the waiting time for a service). Due to this nonlinearity, the standard approach to finding the optimal persuasion mechanism using revelation principle fails. To overcome this difficulty, we use the underlying geometry of the problem to develop a convex optimization framework to find the optimal persuasion mechanism. We define the notion of full persuasion and use our framework to characterize conditions under which full persuasion can be achieved. We use our approach to study binary persuasion, where the receiver has two actions and the sender strictly prefers one of them at every state. Under a convexity assumption, we show that the binary persuasion problem reduces to a linear program, and establish a canonical set of signals where each signal either reveals the state or induces in the receiver uncertainty between two states. Finally, we discuss the broader applicability of our methods to more general contexts, and illustrate our methodology by studying information sharing of waiting times in service systems.
Delays and asynchrony are inevitable in large-scale machine-learning problems where communication plays a key role. As such, several works have extensively analyzed stochastic optimization with delayed gradients. However, as far as we are aware, no analogous theory is available for min-max optimization, a topic that has gained recent popularity due to applications in adversarial robustness, game theory, and reinforcement learning. Motivated by this gap, we examine the performance of standard min-max optimization algorithms with delayed gradient updates. First, we show (empirically) that even small delays can cause prominent algorithms like Extra-gradient (\texttt{EG}) to diverge on simple instances for which \texttt{EG} guarantees convergence in the absence of delays. Our empirical study thus suggests the need for a careful analysis of delayed versions of min-max optimization algorithms. Accordingly, under suitable technical assumptions, we prove that Gradient Descent-Ascent (\texttt{GDA}) and \texttt{EG} with delayed updates continue to guarantee convergence to saddle points for convex-concave and strongly convex-strongly concave settings. Our complexity bounds reveal, in a transparent manner, the slow-down in convergence caused by delays.
Superdirective array may achieve an array gain proportional to the square of the number of antennas $M^2$. In the early studies of superdirectivity, little research has been done from wireless communication point of view. To leverage superdirectivity for enhancing the spectral efficiency, this paper investigates multi-user communication systems with superdirective arrays. We first propose a field-coupling-aware (FCA) multi-user channel estimation method, which takes into account the antenna coupling effects. Aiming to maximize the power gain of the target user, we propose multi-user multipath superdirective precoding (SP) as an extension of our prior work on coupling-based superdirective beamforming. Furthermore, to reduce the inter-user interference, we propose interference-nulling superdirective precoding (INSP) as the optimal solution to maximize user power gains while eliminating interference. Then, by taking the ohmic loss into consideration, we further propose a regularized interference-nulling superdirective precoding (RINSP) method. Finally, we discuss the well-known narrow directivity bandwidth issue, and find that it is not a fundamental problem of superdirective arrays in multi-carrier communication systems. Simulation results show our proposed methods outperform the state-of-the-art methods significantly. Interestingly, in the multi-user scenario, an 18-antenna superdirective array can achieve up to a 9-fold increase of spectral efficiency compared to traditional multiple-input multiple-output (MIMO), while simultaneously reducing the array aperture by half.
With the rapid increase of large-scale, real-world datasets, it becomes critical to address the problem of long-tailed data distribution (i.e., a few classes account for most of the data, while most classes are under-represented). Existing solutions typically adopt class re-balancing strategies such as re-sampling and re-weighting based on the number of observations for each class. In this work, we argue that as the number of samples increases, the additional benefit of a newly added data point will diminish. We introduce a novel theoretical framework to measure data overlap by associating with each sample a small neighboring region rather than a single point. The effective number of samples is defined as the volume of samples and can be calculated by a simple formula $(1-\beta^{n})/(1-\beta)$, where $n$ is the number of samples and $\beta \in [0,1)$ is a hyperparameter. We design a re-weighting scheme that uses the effective number of samples for each class to re-balance the loss, thereby yielding a class-balanced loss. Comprehensive experiments are conducted on artificially induced long-tailed CIFAR datasets and large-scale datasets including ImageNet and iNaturalist. Our results show that when trained with the proposed class-balanced loss, the network is able to achieve significant performance gains on long-tailed datasets.
To address the sparsity and cold start problem of collaborative filtering, researchers usually make use of side information, such as social networks or item attributes, to improve recommendation performance. This paper considers the knowledge graph as the source of side information. To address the limitations of existing embedding-based and path-based methods for knowledge-graph-aware recommendation, we propose Ripple Network, an end-to-end framework that naturally incorporates the knowledge graph into recommender systems. Similar to actual ripples propagating on the surface of water, Ripple Network stimulates the propagation of user preferences over the set of knowledge entities by automatically and iteratively extending a user's potential interests along links in the knowledge graph. The multiple "ripples" activated by a user's historically clicked items are thus superposed to form the preference distribution of the user with respect to a candidate item, which could be used for predicting the final clicking probability. Through extensive experiments on real-world datasets, we demonstrate that Ripple Network achieves substantial gains in a variety of scenarios, including movie, book and news recommendation, over several state-of-the-art baselines.