In this paper, we study the non-monotone adaptive submodular maximization problem subject to a knapsack and a $k$-system constraints. The input of our problem is a set of items, where each item has a particular state drawn from a known prior distribution. However, the state of an item is initially unknown, one must select an item in order to reveal the state of that item. There is a utility function which is defined over items and states. Our objective is to sequentially select a group of items to maximize the expected utility. Although the cardinality-constrained non-monotone adaptive submodular maximization has been well studied in the literature, whether there exists a constant approximation solution for the knapsack-constrained or $k$-system constrained adaptive submodular maximization problem remains an open problem. It fact, it has only been settled given the additional assumption of pointwise submodularity. In this paper, we remove the common assumption on pointwise submodularity and propose the first constant approximation solutions for both cases. Inspired by two recent studies on non-monotone adaptive submodular maximization, we develop a sampling-based randomized algorithm that achieves a $\frac{1}{10}$ approximation for the case of a knapsack constraint and that achieves a $\frac{1}{2k+4}$ approximation ratio for the case of a $k$-system constraint.
Optimization problems with set submodular objective functions have many real-world applications. In discrete scenarios, where the same item can be selected more than once, the domain is generalized from a 2-element set to a bounded integer lattice. In this work, we consider the problem of maximizing a monotone submodular function on the bounded integer lattice subject to a cardinality constraint. In particular, we focus on maximizing DR-submodular functions, i.e., functions defined on the integer lattice that exhibit the diminishing returns property. Given any epsilon > 0, we present a randomized algorithm with probabilistic guarantees of O(1 - 1/e - epsilon) approximation, using a framework inspired by a Stochastic Greedy algorithm developed for set submodular functions by Mirzasoleiman et al. We then show that, on synthetic DR-submodular functions, applying our proposed algorithm on the integer lattice is faster than the alternatives, including reducing a target problem to the set domain and then applying the fastest known set submodular maximization algorithm.
A popular assumption for out-of-distribution generalization is that the training data comprises sub-datasets, each drawn from a distinct distribution; the goal is then to "interpolate" these distributions and "extrapolate" beyond them -- this objective is broadly known as domain generalization. A common belief is that ERM can interpolate but not extrapolate and that the latter is considerably more difficult, but these claims are vague and lack formal justification. In this work, we recast generalization over sub-groups as an online game between a player minimizing risk and an adversary presenting new test distributions. Under an existing notion of inter- and extrapolation based on reweighting of sub-group likelihoods, we rigorously demonstrate that extrapolation is computationally much harder than interpolation, though their statistical complexity is not significantly different. Furthermore, we show that ERM -- or a noisy variant -- is provably minimax-optimal for both tasks. Our framework presents a new avenue for the formal analysis of domain generalization algorithms which may be of independent interest.
In this paper, we demonstrate a formulation for optimizing coupled submodular maximization problems with provable sub-optimality bounds. In robotics applications, it is quite common that optimization problems are coupled with one another and therefore cannot be solved independently. Specifically, we consider two problems coupled if the outcome of the first problem affects the solution of a second problem that operates over a longer time scale. For example, in our motivating problem of environmental monitoring, we posit that multi-robot task allocation will potentially impact environmental dynamics and thus influence the quality of future monitoring, here modeled as a multi-robot intermittent deployment problem. The general theoretical approach for solving this type of coupled problem is demonstrated through this motivating example. Specifically, we propose a method for solving coupled problems modeled by submodular set functions with matroid constraints. A greedy algorithm for solving this class of problem is presented, along with sub-optimality guarantees. Finally, practical optimality ratios are shown through Monte Carlo simulations to demonstrate that the proposed algorithm can generate near-optimal solutions with high efficiency.
In the United States and elsewhere, risk assessment algorithms are being used to help inform criminal justice decision-makers. A common intent is to forecast an offender's ``future dangerousness.'' Such algorithms have been correctly criticized for potential unfairness, and there is an active cottage industry trying to make repairs. In this paper, we use counterfactual reasoning to consider the prospects for improved fairness when members of a less privileged group are treated by a risk algorithm as if they are members of a more privileged group. We combine a machine learning classifier trained in a novel manner with an optimal transport adjustment for the relevant joint probability distributions, which together provide a constructive response to claims of bias-in-bias-out. A key distinction is between fairness claims that are empirically testable and fairness claims that are not. We then use confusion tables and conformal prediction sets to evaluate achieved fairness for projected risk. Our data are a random sample of 300,000 offenders at their arraignments for a large metropolitan area in the United States during which decisions to release or detain are made. We show that substantial improvement in fairness can be achieved consistent with a Pareto improvement for protected groups.
Spanners have been shown to be a powerful tool in graph algorithms. Many spanner constructions use a certain type of clustering at their core, where each cluster has small diameter and there are relatively few spanner edges between clusters. In this paper, we provide a clustering algorithm that, given $k\geq 2$, can be used to compute a spanner of stretch $2k-1$ and expected size $O(n^{1+1/k})$ in $k$ rounds in the CONGEST model. This improves upon the state of the art (by Elkin, and Neiman [TALG'19]) by making the bounds on both running time and stretch independent of the random choices of the algorithm, whereas they only hold with high probability in previous results. Spanners are used in certain synchronizers, thus our improvement directly carries over to such synchronizers. Furthermore, for keeping the \emph{total} number of inter-cluster edges small in low diameter decompositions, our clustering algorithm provides the following guarantees. Given $\beta\in (0,1]$, we compute a low diameter decomposition with diameter bound $O\left(\frac{\log n}{\beta}\right)$ such that each edge $e\in E$ is an inter-cluster edge with probability at most $\beta\cdot w(e)$ in $O\left(\frac{\log n}{\beta}\right)$ rounds in the CONGEST model. Again, this improves upon the state of the art (by Miller, Peng, and Xu [SPAA'13]) by making the bounds on both running time and diameter independent of the random choices of the algorithm, whereas they only hold with high probability in previous results.
We present a $380$-approximation algorithm for the Nash Social Welfare problem with submodular valuations. Our algorithm builds on and extends a recent constant-factor approximation for Rado valuations.
In this work, we study the multi-agent decision problem where agents try to coordinate to optimize a given system-level objective. While solving for the global optimal is intractable in many cases, the greedy algorithm is a well-studied and efficient way to provide good approximate solutions - notably for submodular optimization problems. Executing the greedy algorithm requires the agents to be ordered and execute a local optimization based on the solutions of the previous agents. However, in limited information settings, passing the solution from the previous agents may be nontrivial, as some agents may not be able to directly communicate with each other. Thus the communication time required to execute the greedy algorithm is closely tied to the order that the agents are given. In this work, we characterize interplay between the communication complexity and agent orderings by showing that the complexity using the best ordering is O(n) and increases considerably to O(n^2) when using the worst ordering. Motivated by this, we also propose an algorithm that can find an ordering and execute the greedy algorithm quickly, in a distributed fashion. We also show that such an execution of the greedy algorithm is advantageous over current methods for distributed submodular maximization.
Influence maximization is the task of selecting a small number of seed nodes in a social network to maximize the spread of the influence from these seeds, and it has been widely investigated in the past two decades. In the canonical setting, the whole social network as well as its diffusion parameters is given as input. In this paper, we consider the more realistic sampling setting where the network is unknown and we only have a set of passively observed cascades that record the set of activated nodes at each diffusion step. We study the task of influence maximization from these cascade samples (IMS), and present constant approximation algorithms for this task under mild conditions on the seed set distribution. To achieve the optimization goal, we also provide a novel solution to the network inference problem, that is, learning diffusion parameters and the network structure from the cascade data. Comparing with prior solutions, our network inference algorithm requires weaker assumptions and does not rely on maximum-likelihood estimation and convex programming. Our IMS algorithms enhance the learning-and-then-optimization approach by allowing a constant approximation ratio even when the diffusion parameters are hard to learn, and we do not need any assumption related to the network structure or diffusion parameters.
Embedding methods which enforce a partial order or lattice structure over the concept space, such as Order Embeddings (OE) (Vendrov et al., 2016), are a natural way to model transitive relational data (e.g. entailment graphs). However, OE learns a deterministic knowledge base, limiting expressiveness of queries and the ability to use uncertainty for both prediction and learning (e.g. learning from expectations). Probabilistic extensions of OE (Lai and Hockenmaier, 2017) have provided the ability to somewhat calibrate these denotational probabilities while retaining the consistency and inductive bias of ordered models, but lack the ability to model the negative correlations found in real-world knowledge. In this work we show that a broad class of models that assign probability measures to OE can never capture negative correlation, which motivates our construction of a novel box lattice and accompanying probability measure to capture anticorrelation and even disjoint concepts, while still providing the benefits of probabilistic modeling, such as the ability to perform rich joint and conditional queries over arbitrary sets of concepts, and both learning from and predicting calibrated uncertainty. We show improvements over previous approaches in modeling the Flickr and WordNet entailment graphs, and investigate the power of the model.
In this paper, we study the optimal convergence rate for distributed convex optimization problems in networks. We model the communication restrictions imposed by the network as a set of affine constraints and provide optimal complexity bounds for four different setups, namely: the function $F(\xb) \triangleq \sum_{i=1}^{m}f_i(\xb)$ is strongly convex and smooth, either strongly convex or smooth or just convex. Our results show that Nesterov's accelerated gradient descent on the dual problem can be executed in a distributed manner and obtains the same optimal rates as in the centralized version of the problem (up to constant or logarithmic factors) with an additional cost related to the spectral gap of the interaction matrix. Finally, we discuss some extensions to the proposed setup such as proximal friendly functions, time-varying graphs, improvement of the condition numbers.