We study the problem of allocating a set $M$ of $m$ \emph{indivisible} items among $n$ agents in a fair manner. We consider two well-studied notions of fairness: envy-freeness (EF) and envy-freeness up to any good (EFX). While it is known that complete EF allocations do not always exist, it is not known whether complete EFX allocations always exist, except in a few special cases. In this work, we reformulate the problem to allow $M$ to be a multiset. Specifically, we introduce a parameter $t$ for the number of distinct \emph{types} of items, and study allocations of multisets that contain items of these $t$ types. We show the following: 1. For arbitrary $n$ and $t$, a complete EF allocation exists when agents have distinct additive valuations and there are \emph{enough} items of each type. 2. For arbitrary $n$, $m$, and $t$, a complete EFX allocation exists when agents have additive valuations with identical \emph{preferences}. 3. For arbitrary $n$, $m$, and $t \le 2$, a complete EFX allocation exists when agents have additive valuations. For results 2 and 3, our approach is constructive; we give a polynomial-time algorithm to find a complete EFX allocation.
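For reference, the two fairness notions compared above have the following standard definitions (notation chosen here for illustration): for an allocation $(A_1, \dots, A_n)$ and additive valuations $v_i$, the allocation is EF if $v_i(A_i) \ge v_i(A_j)$ for all agents $i, j$, and it is EFX if
\[ v_i(A_i) \;\ge\; v_i(A_j \setminus \{g\}) \qquad \text{for all } i, j \text{ and all } g \in A_j, \]
that is, any envy disappears after removing any single item from the envied bundle.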
The widespread dependency on open-source software makes it a fruitful target for malicious actors, as demonstrated by recurring attacks. The complexity of today's open-source supply chains results in a significant attack surface, giving attackers numerous opportunities to reach the goal of injecting malicious code into open-source artifacts that is then downloaded and executed by victims. This work proposes a general taxonomy for attacks on open-source supply chains, independent of specific programming languages or ecosystems, and covering all supply chain stages from code contributions to package distribution. Taking the form of an attack tree, it covers 107 unique vectors, linked to 94 real-world incidents, and mapped to 33 mitigating safeguards. User surveys conducted with 17 domain experts and 134 software developers positively validated the correctness, comprehensiveness, and comprehensibility of the taxonomy, as well as its suitability for various use cases. Survey participants also assessed the utility and costs of the identified safeguards, and whether they are used.
The naive importance sampling (IS) estimator generally does not work well in examples involving simultaneous inference on several targets, as the importance weights can take arbitrarily large values, making the estimator highly unstable. In such situations, alternative multiple IS estimators involving samples from multiple proposal distributions are preferred. Just like the naive IS, the success of these multiple IS estimators crucially depends on the choice of the proposal distributions. The selection of these proposal distributions is the focus of this article. We propose three methods: (i) a geometric space-filling approach, (ii) a minimax variance approach, and (iii) a maximum entropy approach. The first two methods are applicable to any IS estimator, whereas the third approach is described in the context of Doss's (2010) two-stage IS estimator. For the first method, we propose a suitable measure of 'closeness' based on the symmetric Kullback-Leibler divergence, while the second and third approaches use estimates of asymptotic variances of Doss's (2010) IS estimator and Geyer's (1994) reverse logistic regression estimator, respectively. To this end, when samples from the proposal distributions are obtained by running Markov chains, we provide consistent spectral variance estimators for these asymptotic variances. The proposed methods for selecting proposal densities are illustrated using various detailed examples.
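For concreteness, the naive IS estimator discussed above takes its usual form (a sketch assuming normalized densities; notation chosen here for illustration): to estimate $\mu = \mathrm{E}_{\pi}[h(X)]$ from draws $X_1, \dots, X_N \sim q$,
\[ \hat{\mu}_{\mathrm{IS}} = \frac{1}{N} \sum_{i=1}^{N} h(X_i)\, \frac{\pi(X_i)}{q(X_i)}, \]
and instability arises precisely when the weights $\pi(X_i)/q(X_i)$ become very large for some of the targets; in practice a self-normalized variant is used when $\pi$ is known only up to a normalizing constant.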
A natural way of increasing our understanding of NP-complete graph problems is to restrict the input to a special graph class. Classes of $H$-free graphs, that is, graphs that do not contain some graph $H$ as an induced subgraph, have proven to be an ideal testbed for such a complexity study. However, if the forbidden graph $H$ contains a cycle or claw, then these problems often stay NP-complete. A recent complexity study on the $k$-Colouring problem shows that we may still obtain tractable results if we also bound the diameter of the $H$-free input graph. We continue this line of research by initiating a complexity study on the impact of bounding the diameter for a variety of classical vertex partitioning problems restricted to $H$-free graphs. We prove that bounding the diameter does not help for Independent Set, but leads to new tractable cases for problems closely related to 3-Colouring. That is, we show that Near-Bipartiteness, Independent Feedback Vertex Set, Independent Odd Cycle Transversal, Acyclic 3-Colouring and Star 3-Colouring are all polynomial-time solvable for chair-free graphs of bounded diameter. To obtain these results we exploit a new structural property of 3-colourable chair-free graphs.
In the interdependent values (IDV) model introduced by Milgrom and Weber [1982], agents have private signals that capture their information about different social alternatives, and the valuation of every agent is a function of all agent signals. While interdependence has been studied mainly for auctions, it is extremely relevant for a large variety of social choice settings, including the canonical setting of public projects. The IDV model is very challenging relative to standard independent private values, and welfare guarantees have been achieved through two alternative conditions known as \emph{single-crossing} and \emph{submodularity over signals (SOS)}. In either case, the existing theory falls short of solving the public projects setting. Our contribution is twofold: (i) we give a workable characterization of truthfulness for IDV public projects for the largest class of valuations for which such a characterization exists, and term this class \emph{decomposable valuations}; (ii) we provide possibility and impossibility results for welfare approximation in public projects with SOS valuations. Our main impossibility result is that, in contrast to auctions, no universally truthful mechanism performs better for public projects with SOS valuations than choosing a project at random. Our main positive result applies to \emph{excludable} public projects with SOS valuations, for which we establish a constant-factor approximation similar to the one known for auctions. Our results suggest that exclusion may be a key tool for achieving welfare guarantees in the IDV model.
We study the performance of a phase-noise-impaired double reconfigurable intelligent surface (RIS)-aided multiuser (MU) multiple-input single-output (MISO) system under spatial correlation at both RISs and the base station (BS). The downlink achievable rate is derived in closed form under maximum ratio transmission (MRT) precoding. In addition, we obtain the optimal phase-shift design at both RISs in closed form for the considered channel and phase-noise models. Numerical results validate the analytical expressions and highlight the effects of different system parameters on the achievable rate. Our analysis shows that phase noise can severely degrade the performance when users do not have direct links to both RISs and can only be served via the double-reflection link. Also, we show that high spatial correlation at the RISs is essential for high achievable rates.
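As a point of reference, MRT precoding has the familiar form (a sketch under one common convention, with $\hat{\mathbf{h}}_k$ denoting the estimated aggregated downlink channel of user $k$; the paper's exact double-RIS channel model, conjugation convention, and power normalization may differ):
\[ \mathbf{w}_k \;\propto\; \hat{\mathbf{h}}_k^{*}, \qquad \|\mathbf{w}_k\| = 1, \]
i.e., each user's beamforming vector is matched to (the conjugate of) that user's estimated channel.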
Requirements engineering (RE) activities for Machine Learning (ML) are not well established and researched in the literature. Many issues and challenges exist when specifying, designing, and developing ML-enabled systems. Adding more focus on RE for ML can help to develop more reliable ML-enabled systems. Based on insights collected from previous work and industrial experience, we propose a catalogue of 45 concerns to be considered when specifying ML-enabled systems, covering five perspectives we identified as relevant for such systems: objectives, user experience, infrastructure, model, and data. Examples of such concerns include the execution engine and telemetry for the infrastructure perspective, and explainability and reproducibility for the model perspective. We conducted a focus group session with eight software professionals with experience developing ML-enabled systems to validate the importance, quality, and feasibility of using our catalogue. The feedback allowed us to improve the catalogue and confirmed its practical relevance. The main research contribution of this work is a validated set of concerns, grouped into perspectives, that requirements engineers can use to support the specification of ML-enabled systems.
Background. From information theory, surprisal is a measurement of how unexpected an event is. Statistical language models provide a probabilistic approximation of natural languages, and because surprisal is constructed from the probability of an event occurring, it is possible to determine the surprisal associated with English sentences. The issues and pull requests of software repository issue trackers give insight into the development process and likely contain the surprising events of this process. Objective. Prior works have identified that unusual events in software repositories are of interest to developers, and use simple methods based on code metrics to detect them. In this study we will propose a new method for unusual event detection in software repositories using surprisal. With the ability to find surprising issues and pull requests, we intend to further analyse them to determine whether they actually hold importance in a repository, or whether they pose a significant challenge to address. If it is possible to find bad surprises early, or before they cause additional trouble, it is plausible that effort, cost, and time will be saved as a result. Method. After extracting the issues and pull requests from 5000 of the most popular software repositories on GitHub, we will train a language model to represent these issues. We will measure their perceived importance in the repository, measure their resolution difficulty using several analogues, measure the surprisal of each, and finally generate inferential statistics to describe any correlations.
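As a reminder of the underlying quantity (standard information-theoretic definition): the surprisal of an event $x$ with probability $P(x)$ is
\[ s(x) = -\log_2 P(x), \]
so rare events carry high surprisal; under a language model that assigns probabilities to tokens, the surprisal of a sentence can be taken as the sum of its token-level surprisals (one common convention; the study may aggregate differently).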
The problem of scheduling unrelated machines has been studied since the inception of algorithmic mechanism design~\cite{NR99}. It is a resource allocation problem that entails assigning $m$ tasks to $n$ machines for execution. Machines are regarded as strategic agents who may lie about their execution costs so as to minimize their allocated workload. To address the situation when monetary payment is not an option to compensate the machines' costs, \citeauthor{DBLP:journals/mst/Koutsoupias14} [2014] devised two \textit{truthful} mechanisms, K and P, that achieve approximation ratios of $\frac{n+1}{2}$ and $n$, respectively, for social cost minimization. Moreover, no truthful mechanism can achieve an approximation ratio better than $\frac{n+1}{2}$; hence, mechanism K is optimal. While the approximation ratio provides a strong worst-case guarantee, it offers only a limited picture of mechanism performance across different inputs. This paper investigates these two scheduling mechanisms beyond the worst case. We first show that mechanism K achieves a smaller social cost than mechanism P on every input; that is, mechanism K is pointwise better than mechanism P. Next, for each task $j$, when machines' execution costs $t_i^j$ are independent and identically drawn from a task-specific distribution $F^j(t)$, we show that the average-case approximation ratio of mechanism K converges to a constant, and this bound is tight for mechanism K. For a better understanding of this distribution-dependent constant, on the one hand, we estimate its value by plugging in a few common distributions; on the other, we show that this converging bound improves upon a known bound \cite{DBLP:conf/aaai/Zhang18} that only captures the single-task setting. Finally, we find that the average-case approximation ratio of mechanism P converges to the same constant.
In variable selection, a selection rule prescribes the permissible sets of selected variables; the collection of such permissible sets is called a "selection dictionary". Selection rules are desirable because of the inherent structural constraints among the candidate variables, and methods that can incorporate such restrictions can improve model interpretability and prediction accuracy. Penalized regression can integrate selection rules by assigning the coefficients to different groups and then applying penalties to the groups. However, no general framework has been proposed to formalize selection rules and their applications. In this work, we establish a framework for structured variable selection that can incorporate universal structural constraints. We develop a mathematical language for constructing arbitrary selection rules, in which the selection dictionary is formally defined. We show that all selection rules can be represented as a combination of operations on constructs, which can be used to identify the related selection dictionary. One may then apply some criteria to select the best model. We show that the theoretical framework can help to identify the grouping structure in existing penalized regression methods. In addition, we formulate structured variable selection as mixed-integer optimization problems that can be solved by existing software. Finally, we discuss the significance of the framework in the context of statistics.
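To illustrate how a selection rule can enter such a mixed-integer formulation (a minimal hypothetical example, not the paper's notation): with binary indicators $z_j \in \{0, 1\}$ marking whether variable $x_j$ is selected, the rule "the interaction $x_1 x_2$ may be selected only if both main effects are selected" corresponds to the linear constraints
\[ z_{12} \le z_1, \qquad z_{12} \le z_2, \]
which restrict the selection dictionary to sets satisfying strong heredity.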
This paper proposes an active learning algorithm for solving regression and classification problems based on inverse-distance weighting functions for selecting the feature vectors to query. The algorithm has the following features: (i) it supports both pool-based and population-based sampling; (ii) it is independent of the type of predictor used; (iii) it can handle known and unknown constraints on the queryable feature vectors; and (iv) it can run either sequentially or in batch mode, depending on how often the predictor is retrained. The method's potential is shown in numerical tests on illustrative synthetic problems and on real-world regression and classification datasets from the UCI repository. A Python implementation of the algorithm, which we call IDEAL (Inverse-Distance based Exploration for Active Learning), is available at \url{//cse.lab.imtlucca.it/~bemporad/ideal}.
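To convey the flavour of an inverse-distance weighting criterion (a sketch for intuition only; the exact acquisition function used by IDEAL is defined in the paper): given previously queried feature vectors $x_1, \dots, x_N$, a candidate $x$ can be assigned the exploration score
\[ s(x) = \Bigl( \sum_{i=1}^{N} \frac{1}{\|x - x_i\|^{2}} \Bigr)^{-1}, \]
so that candidates far from all queried points receive larger scores, and the next query maximizes $s$, possibly combined with a term measuring the predictor's uncertainty.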