Beame et al. [ITCS 2018 & TALG 2021] introduced the Bipartite Independent Set (BIS) and Independent Set (IS) oracles for accessing an unknown simple, unweighted, undirected graph, and used them to solve the edge estimation problem. The introduction of these oracles prompted a series of works in a short span of time that either resolved open questions posed by Beame et al. or generalized their work, such as Dell and Lapinskas [STOC 2018], Dell, Lapinskas and Meeks [SODA 2020], Bhattacharya et al. [ISAAC 2019 & Theory Comput. Syst. 2021], and Chen et al. [SODA 2020]. Edge estimation can be done with polylogarithmically many BIS queries, whereas with IS queries it requires a number of queries that is sublinear but more than polylogarithmic. Chen et al. improved the upper bound of Beame et al. for edge estimation using IS queries and also showed an almost matching lower bound. Among the open questions raised by Beame et al. in their introductory work was the estimation of structures of higher order than edges, such as triangles and cliques, using BIS queries. In this work, we completely resolve the query complexity of estimating the number of triangles using the BIS oracle. Along the way, we prove a lower bound for an even stronger query oracle, the Edge Emptiness (EE) oracle, recently introduced by Assadi, Chakrabarty and Khanna [ESA 2021] to test graph connectivity.
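To make the oracle hierarchy concrete, the minimal Python sketch below simulates the three query types on an explicitly stored graph. It assumes the usual definitions: an EE query asks whether a given collection of vertex pairs contains an edge, a BIS query asks whether some edge joins two disjoint vertex sets, and an IS query asks whether a vertex set induces at least one edge. The class and method names are hypothetical. The sketch only shows why EE is the stronger oracle (every BIS or IS query is a single EE query); it is not a triangle estimation algorithm.

```python
from itertools import combinations, product


class EdgeOracles:
    """Simulated BIS / IS / EE queries over an explicit simple undirected graph.

    Each method returns True iff the queried region contains at least one edge,
    i.e. the negation of "is (bipartite) independent / empty".
    """

    def __init__(self, edges):
        # Store undirected edges as frozensets so (u, v) and (v, u) coincide.
        self._edges = {frozenset(e) for e in edges}

    def ee_has_edge(self, pairs):
        # Edge Emptiness: does the given collection of vertex pairs contain an edge?
        return any(frozenset(p) in self._edges for p in pairs)

    def bis_has_edge(self, a, b):
        # Bipartite Independent Set: is there an edge with one endpoint in A and one in B?
        if set(a) & set(b):
            raise ValueError("BIS queries take disjoint vertex sets")
        # Reduction to a single EE query on all pairs A x B.
        return self.ee_has_edge(product(a, b))

    def is_has_edge(self, s):
        # Independent Set: does S induce at least one edge?
        # Reduction to a single EE query on all pairs inside S.
        return self.ee_has_edge(combinations(s, 2))


if __name__ == "__main__":
    g = EdgeOracles([(0, 1), (1, 2), (0, 2)])  # triangle on {0, 1, 2}; vertex 3 isolated
    print(g.bis_has_edge([0], [1, 3]))  # True: edge {0, 1}
    print(g.is_has_edge([0, 3]))  # False: {0, 3} is independent
```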
Bayesian persuasion studies the problem faced by an informed sender who strategically discloses information to influence the behavior of an uninformed receiver. Recently, growing attention has been devoted to settings where the sender and the receiver interact sequentially, in which the receiver's decision-making problem is usually modeled as a Markov decision process (MDP). However, previous works focused on computing optimal information-revelation policies (a.k.a. signaling schemes) under the restrictive assumption that the receiver acts myopically, selecting actions that maximize the one-step utility and disregarding future rewards. This is justified by the fact that, when the receiver is farsighted and thus considers future rewards, finding an optimal Markovian signaling scheme is NP-hard. In this paper, we show that Markovian signaling schemes do not constitute the "right" class of policies. Indeed, unlike in most MDP settings, we prove that Markovian signaling schemes are not optimal, and general history-dependent signaling schemes should be considered. Moreover, we show that history-dependent signaling schemes circumvent the negative complexity results affecting Markovian signaling schemes. Formally, we design an algorithm that computes an optimal and $\epsilon$-persuasive history-dependent signaling scheme in time polynomial in $1/\epsilon$ and in the instance size. The crucial challenge is that general history-dependent signaling schemes cannot be represented in polynomial space. Nevertheless, we introduce a convenient subclass of history-dependent signaling schemes, called promise-form, which are as powerful as general history-dependent ones and efficiently representable. Intuitively, promise-form signaling schemes compactly encode histories in the form of honest promises on the receiver's future rewards.
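As background for the persuasiveness constraint, the sketch below works out the textbook single-shot case with a binary state (the prosecutor-judge example of Kamenica and Gentzkow). The sender commits to a signaling scheme, and a recommendation is persuasive only if the receiver's posterior makes following it optimal. This is a myopic, one-step illustration under an assumed threshold rule for the receiver. It is not the history-dependent or promise-form construction of the paper, and the function name `binary_persuasion` is hypothetical.

```python
from fractions import Fraction


def binary_persuasion(prior, threshold):
    """One-shot Bayesian persuasion with a binary state (good/bad).

    Assumptions: the sender always wants the receiver to take action "act";
    the receiver acts iff P(good | signal) >= threshold. The scheme always
    recommends "act" in the good state and, in the bad state, recommends
    "act" with probability q. Returns (q, probability that the receiver acts).
    """
    if prior >= threshold:
        # The receiver already acts under the prior: recommend "act" always.
        return Fraction(1), Fraction(1)
    # Largest q that keeps the recommendation persuasive:
    # prior / (prior + (1 - prior) * q) == threshold.
    q = prior * (1 - threshold) / (threshold * (1 - prior))
    return q, prior + (1 - prior) * q


if __name__ == "__main__":
    q, value = binary_persuasion(Fraction(3, 10), Fraction(1, 2))
    print(q, value)  # 3/7 3/5
```

With prior $0.3$ and threshold $1/2$, the receiver acts with probability $3/5$, double what full disclosure would yield, because the persuasiveness constraint is made to bind exactly.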
Complexity classes defined by modifying the acceptance condition of NP computations have been extensively studied. For example, the class UP, which contains decision problems solvable by non-deterministic polynomial-time Turing machines (NPTMs) with at most one accepting path (equivalently, NP problems with at most one solution), has played a significant role in cryptography, since P $\neq$ UP is equivalent to the existence of injective one-way functions. In this paper, we define and examine variants of several such classes where the acceptance condition concerns the total number of computation paths of an NPTM, instead of the number of accepting ones. This direction reflects the relationship between the counting classes #P and TotP, which are the classes of functions that count the number of accepting paths and the total number of paths of NPTMs, respectively. The former is the well-studied class of counting versions of NP problems, introduced by Valiant (1979). The latter contains all self-reducible counting problems in #P whose decision version is in P, among them prominent #P-complete problems such as Non-negative Permanent, #PerfMatch, and #Dnf-Sat, and thus plays a significant role in the study of approximable counting problems. We show that almost all classes introduced in this work coincide with their '# accepting paths'-definable counterparts. As a result, we present a novel family of complete problems for the classes $\oplus$P, Mod$_k$P, SPP, WPP, C$_=$P, and PP, defined via TotP-complete problems under parsimonious reductions.
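The difference between counting accepting paths and counting all paths, and why self-reducible problems with an easy decision version fit in TotP, can be illustrated with #DNF-SAT. The Python sketch below assumes that a DNF term is a dict mapping a variable index to its required truth value. It explores the self-reducibility tree but only branches into subtrees whose restricted formula is still satisfiable, which is a polynomial-time check for DNF. Every leaf of the pruned tree is then a satisfying assignment, so the total number of leaves equals the number of models. The helper names are hypothetical, and the formal definition of TotP subtracts one from the path count to handle the unsatisfiable case; that bookkeeping is omitted here.

```python
from itertools import product


def term_consistent(term, partial):
    # A term (dict var -> bool) is still satisfiable iff no fixed variable contradicts it.
    return all(partial.get(var, val) == val for var, val in term.items())


def dnf_satisfiable(dnf, partial):
    # Polynomial-time decision: some term is consistent with the partial assignment.
    return any(term_consistent(term, partial) for term in dnf)


def total_paths(dnf, n, partial=None, i=0):
    """Leaves of the self-reducibility tree, branching only into satisfiable subtrees."""
    partial = {} if partial is None else partial
    if not dnf_satisfiable(dnf, partial):
        return 0  # pruned: this subtree is never entered
    if i == n:
        return 1  # a leaf, i.e. a full satisfying assignment
    return sum(total_paths(dnf, n, {**partial, i: b}, i + 1) for b in (False, True))


def accepting_paths(dnf, n):
    # #P-style count: one path per assignment, accepting iff it satisfies some term.
    return sum(
        any(all(bits[var] == val for var, val in term.items()) for term in dnf)
        for bits in product((False, True), repeat=n)
    )


if __name__ == "__main__":
    # (x0 AND x1) OR (NOT x2) over 3 variables
    phi = [{0: True, 1: True}, {2: False}]
    print(total_paths(phi, 3), accepting_paths(phi, 3))  # 5 5
```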
We present an implicit-explicit (IMEX) finite volume scheme for two-fluid single-temperature flow in all Mach number regimes, based on a symmetric hyperbolic thermodynamically compatible description of the fluid flow. The scheme is stable for large time steps controlled by the interface transport and is computationally efficient due to its linearly implicit character. The latter is achieved by linearizing along constant reference states given by the asymptotic analysis of the single-temperature model. The use of a stiffly accurate IMEX Runge-Kutta time integration and the centered treatment of pressure-based quantities provably guarantee the asymptotic preserving property of the scheme for the weakly compressible Euler equations with variable volume fraction. The properties of the first- and second-order schemes are validated by several numerical test cases.
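The reason a linearly implicit treatment permits large time steps can be seen on a scalar toy problem. The Python sketch below is not the paper's two-fluid scheme. It applies first-order IMEX Euler to $y' = -\lambda y + g(y)$, treating the stiff linear term implicitly (so each step needs only a linear solve, here a division) and the mild nonlinearity $g$ explicitly. The stiffness parameter, step size and choice of $g$ are illustrative assumptions. For comparison, fully explicit Euler is unstable on this problem once $\Delta t\,\lambda > 2$.

```python
import math


def imex_euler(y0, lam, g, dt, steps):
    """First-order IMEX Euler for y' = -lam*y + g(y).

    The stiff linear term is implicit and g is explicit, so each step solves
    (1 + dt*lam) * y_next = y + dt*g(y), a linear equation in y_next.
    """
    y = y0
    for _ in range(steps):
        y = (y + dt * g(y)) / (1.0 + dt * lam)
    return y


def explicit_euler(y0, lam, g, dt, steps):
    # Fully explicit Euler; unstable for this problem when dt*lam > 2.
    y = y0
    for _ in range(steps):
        y = y + dt * (-lam * y + g(y))
    return y


def mild_nonlinearity(y):
    # Hypothetical non-stiff term used only for illustration.
    return 0.1 * math.sin(y)


if __name__ == "__main__":
    lam, dt, steps = 1.0e4, 1.0e-2, 50  # dt*lam = 100, far beyond explicit stability
    print(imex_euler(1.0, lam, mild_nonlinearity, dt, steps))  # decays towards y = 0
    print(explicit_euler(1.0, lam, mild_nonlinearity, dt, steps))  # grows roughly like 99**steps
```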
A fundamental problem in data management is to find the elements in an array that match a query. Recently, learned indexes have been used extensively to solve this problem; they learn a model that predicts the location of the items in the array. They have been empirically shown to outperform non-learned methods (e.g., B-trees or binary search, which answer queries in $O(\log n)$ time) by orders of magnitude. However, the success of learned indexes has not been theoretically justified. The only existing attempt shows the same query time of $O(\log n)$, but with a constant-factor improvement in space complexity over non-learned methods, under some assumptions on the data distribution. In this paper, we significantly strengthen this result, showing that under mild assumptions on the data distribution, and with the same space complexity as non-learned methods, learned indexes can answer queries in $O(\log\log n)$ expected time. We also show that, allowing for a slightly larger but still near-linear space overhead, a learned index can achieve $O(1)$ expected query time. Our results theoretically prove that learned indexes are orders of magnitude faster than non-learned methods, grounding their empirical success.
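The basic mechanism of a learned index can be sketched in a few lines of Python: fit a model mapping keys to positions in the sorted array, record the model's maximum prediction error, and answer a lookup with a binary search restricted to the error window around the prediction. This sketch assumes a single least-squares linear model over the whole array, a deliberately simple choice. It does not reproduce the distribution-dependent constructions behind the $O(\log\log n)$ and $O(1)$ bounds, and the class name is hypothetical.

```python
import bisect


class LinearLearnedIndex:
    """Learned index with one linear model and an error-bounded local search."""

    def __init__(self, keys):
        if not keys:
            raise ValueError("need at least one key")
        self.keys = sorted(keys)
        n = len(self.keys)
        # Least-squares fit of position ~ slope * key + intercept.
        mean_x = sum(self.keys) / n
        mean_y = (n - 1) / 2
        var = sum((x - mean_x) ** 2 for x in self.keys)
        cov = sum((x - mean_x) * (i - mean_y) for i, x in enumerate(self.keys))
        self.slope = cov / var if var else 0.0
        self.intercept = mean_y - self.slope * mean_x
        # Maximum error over the stored keys bounds the search window.
        self.max_err = max(abs(self._predict(x) - i) for i, x in enumerate(self.keys))

    def _predict(self, key):
        return int(round(self.slope * key + self.intercept))

    def lookup(self, key):
        """Return the index of the first occurrence of key, or None if absent."""
        pos = self._predict(key)
        lo = max(0, pos - self.max_err)
        hi = min(len(self.keys), pos + self.max_err + 1)
        if lo >= hi:
            return None  # prediction falls outside the array: key cannot be stored
        # Binary search only inside the error window [lo, hi).
        i = bisect.bisect_left(self.keys, key, lo, hi)
        if i < hi and self.keys[i] == key:
            return i
        return None


if __name__ == "__main__":
    idx = LinearLearnedIndex([i * i for i in range(1000)])
    print(idx.lookup(49), idx.lookup(50))  # 7 None
```

The search cost depends on the window width $2\cdot$`max_err`$+1$ rather than on $n$, which is why the quality of the model on the data distribution drives the query time.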
Most algorithms that construct bases of finite-dimensional vector spaces return basis vectors which, apart from orthogonality, do not exhibit any special properties. While every basis suffices to define the vector space, not all bases are equally suited to revealing properties of the problem to be solved. In this paper, a normal form for bases of finite-dimensional vector spaces is introduced which may prove useful for understanding the structure of problems in which a basis appears as an intermediate step towards the solution. This normal form may be viewed as a new normal form for matrices of full column rank.
The shortest paths problem is a fundamental challenge in graph theory, with a broad range of potential applications. However, traditional serial algorithms often struggle to scale to large graphs. To address this issue, researchers have explored parallel computing. The state-of-the-art parallel shortest paths algorithm is $\Delta$-stepping, which significantly improves the parallelism of Dijkstra's algorithm. We propose a novel shortest paths algorithm with higher parallelism and scalability, which requires $O(m)$ and $O(E_{wcc})$ time for the SSSP problem on connected and disconnected graphs, respectively, where $m$ is the number of edges and $E_{wcc}$ denotes the number of edges in the largest weakly connected component of the graph. To evaluate the effectiveness of the algorithm, we tested it on real graph inputs from the Stanford Network Analysis Platform and the SuiteSparse Matrix Collection. Our algorithm outperformed the BFS (Breadth-First Search) and $\Delta$-stepping implementations from Gunrock, achieving speedups of 1546.994$\times$ and 1432.145$\times$, respectively.
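For readers unfamiliar with the baseline, the following Python sketch shows the bucket structure of the standard $\Delta$-stepping algorithm of Meyer and Sanders, simulated sequentially. Vertices are grouped into buckets of width $\Delta$ by tentative distance, light edges (weight at most $\Delta$) are relaxed repeatedly within the current bucket, and heavy edges are relaxed once after the bucket settles. In a parallel implementation the relaxations of each phase run concurrently; here they are plain loops. This is the baseline technique, not the novel algorithm proposed in the paper, and it assumes non-negative edge weights on a directed graph given as `(u, v, w)` triples.

```python
import math


def delta_stepping(n, edges, source, delta):
    """Sequential simulation of Delta-stepping SSSP on a directed graph.

    edges: iterable of (u, v, w) with non-negative weights w; returns a distance list.
    """
    light = [[] for _ in range(n)]
    heavy = [[] for _ in range(n)]
    for u, v, w in edges:
        (light if w <= delta else heavy)[u].append((v, w))

    dist = [math.inf] * n
    buckets = {}  # bucket index -> set of vertices whose tentative distance falls in it

    def relax(v, d):
        if d < dist[v]:
            if dist[v] != math.inf:
                # Move v out of its old bucket.
                buckets.get(int(dist[v] // delta), set()).discard(v)
            dist[v] = d
            buckets.setdefault(int(d // delta), set()).add(v)

    relax(source, 0)
    while buckets:
        i = min(buckets)
        settled = set()
        # Phase 1: relax light edges until bucket i stops refilling.
        while buckets.get(i):
            frontier = buckets.pop(i)
            settled |= frontier
            requests = [(v, dist[u] + w) for u in frontier for v, w in light[u]]
            for v, d in requests:
                relax(v, d)
        buckets.pop(i, None)
        # Phase 2: relax heavy edges of all vertices settled in bucket i, once.
        requests = [(v, dist[u] + w) for u in settled for v, w in heavy[u]]
        for v, d in requests:
            relax(v, d)
    return dist


if __name__ == "__main__":
    edges = [(0, 1, 4), (0, 2, 1), (2, 1, 2), (1, 3, 1), (2, 3, 5), (3, 4, 3)]
    print(delta_stepping(6, edges, 0, 2))  # [0, 3, 1, 4, 7, inf]
```

The choice of $\Delta$ trades work for parallelism: a very large $\Delta$ puts everything in one bucket (Bellman-Ford-like), while a very small one approaches Dijkstra's one-vertex-at-a-time order.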
A set $D \subseteq V$ of a graph $G=(V, E)$ is a dominating set of $G$ if every vertex $v\in V\setminus D$ is adjacent to at least one vertex in $D$. A set $S \subseteq V$ is a co-secure dominating set (CSDS) of a graph $G$ if $S$ is a dominating set of $G$ and for each vertex $u \in S$ there exists a vertex $v \in V\setminus S$ such that $uv \in E$ and $(S\setminus \{u\}) \cup \{v\}$ is a dominating set of $G$. The minimum cardinality of a co-secure dominating set of $G$ is the co-secure domination number, denoted by $\gamma_{cs}(G)$. Given a graph $G=(V, E)$, the minimum co-secure dominating set problem (Min Co-secure Dom) is to find a co-secure dominating set of minimum cardinality. In this paper, we strengthen the inapproximability result of Min Co-secure Dom for general graphs by showing that this problem cannot be approximated within a factor of $(1- \epsilon)\ln |V|$ for perfect elimination bipartite graphs and star convex bipartite graphs unless P=NP. On the positive side, we show that Min Co-secure Dom can be approximated within a factor of $O(\ln |V|)$ for any graph $G$ with $\delta(G)\geq 2$. For $3$-regular and $4$-regular graphs, we show that Min Co-secure Dom is approximable within a factor of $\dfrac{8}{3}$ and $\dfrac{10}{3}$, respectively. Furthermore, we prove that Min Co-secure Dom is APX-complete for $3$-regular graphs.
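The definition of a co-secure dominating set translates directly into a brute-force checker, which can be handy for verifying small cases. The Python sketch below is exponential-time by design and is only meant to make the definition concrete. It assumes the graph is given as a dict mapping each vertex to its set of neighbours, and the function names are hypothetical.

```python
from itertools import combinations


def is_dominating(adj, d):
    # Every vertex outside d has a neighbour in d.
    return all(v in d or adj[v] & d for v in adj)


def is_co_secure_dominating(adj, s):
    s = set(s)
    if not is_dominating(adj, s):
        return False
    # Each u in S needs a neighbour v outside S such that swapping u for v keeps domination.
    return all(
        any(is_dominating(adj, (s - {u}) | {v}) for v in adj[u] - s)
        for u in s
    )


def co_secure_domination_number(adj):
    """Smallest |S| of a co-secure dominating set, or None if none exists."""
    vertices = list(adj)
    for k in range(1, len(vertices) + 1):
        if any(is_co_secure_dominating(adj, c) for c in combinations(vertices, k)):
            return k
    return None


if __name__ == "__main__":
    cycle4 = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
    path3 = {0: {1}, 1: {0, 2}, 2: {1}}
    print(co_secure_domination_number(cycle4), co_secure_domination_number(path3))  # 2 2
```

On the path $P_3$, the centre vertex alone dominates the graph but is not co-secure: swapping it for either leaf leaves the other leaf undominated, so $\gamma_{cs}(P_3) = 2$.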
Machine learning methods are commonly evaluated and compared by their performance on data sets from public repositories. This allows multiple methods, often several thousand, to be evaluated under identical conditions and across time. The highest-ranked performance on a problem is referred to as state-of-the-art (SOTA) performance and is used, among other things, as a reference point for the publication of new methods. However, the highest-ranked performance is a biased estimator of SOTA, giving overly optimistic results. The mechanisms at play are those of multiplicity, a topic that is well studied in the context of multiple comparisons and multiple testing but has, as far as the authors are aware, been nearly absent from the discussion of SOTA estimates. The optimistic SOTA estimate is used as a standard for evaluating new methods, and methods with substantially inferior results are easily overlooked. In this article, we provide a probability distribution for the case of multiple classifiers so that known analysis methods can be applied and a better SOTA estimate can be provided. We demonstrate the impact of multiplicity through a simulated example with independent classifiers. We show how classifier dependency impacts the variance, but also that the impact is limited when the accuracy is high. Finally, we discuss a real-world example: a Kaggle competition from 2020.
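The optimism of the maximum can be quantified in the simplest setting, sketched below in Python under assumptions that are ours and not necessarily the paper's exact model. Suppose $m$ independent classifiers all have the same true accuracy $p$ (with $0 < p < 1$) on a test set of $n$ examples. Each number of correct predictions is then Binomial$(n, p)$, and the maximum $M$ has CDF $P(M \le k) = F(k)^m$, where $F$ is the binomial CDF. The expected top-ranked accuracy is $\mathbb{E}[M]/n$ with $\mathbb{E}[M] = \sum_{k=0}^{n-1} (1 - F(k)^m)$, which exceeds $p$ whenever $m > 1$. The function names are hypothetical.

```python
from math import exp, lgamma, log, log1p


def binomial_cdf(n, p):
    """[P(X <= k) for k = 0..n], X ~ Binomial(n, p), 0 < p < 1; pmf computed in log space."""
    cdf, total = [], 0.0
    for k in range(n + 1):
        log_pmf = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
                   + k * log(p) + (n - k) * log1p(-p))
        total += exp(log_pmf)
        cdf.append(min(total, 1.0))  # guard against rounding slightly above 1
    return cdf


def expected_max_accuracy(n, p, m):
    """Expected accuracy of the best of m independent Binomial(n, p) classifiers."""
    cdf = binomial_cdf(n, p)
    # E[M] = sum_{k=0}^{n-1} P(M > k), with P(M <= k) = F(k)**m for independent scores.
    expected_max = sum(1.0 - cdf[k] ** m for k in range(n))
    return expected_max / n


if __name__ == "__main__":
    for m in (1, 10, 1000):
        print(m, round(expected_max_accuracy(1000, 0.9, m), 4))
```

For $m = 1$ the result is exactly $p$; as $m$ grows, the expected best score drifts upward even though no classifier is better than any other.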
In recent years, online social networks have been the target of adversaries who seek to introduce discord into societies, undermine democracies and destabilize communities. Often the goal is not to favor a certain side of a conflict but to increase disagreement and polarization. To get a mathematical understanding of such attacks, researchers use opinion-formation models from sociology, such as the Friedkin--Johnsen model, and formally study how much discord the adversary can produce when altering the opinions of only a small set of users. In this line of work, it is commonly assumed that the adversary has full knowledge of the network topology and the opinions of all users. However, the latter assumption is often unrealistic in practice, where user opinions are not available or simply difficult to estimate accurately. To address this concern, we raise the following question: Can an attacker sow discord in a social network, even when only the network topology is known? We answer this question affirmatively. We present approximation algorithms for detecting a small set of users who are highly influential for the disagreement and polarization in the network. We show that if the adversary radicalizes these users and the initial disagreement/polarization in the network is not very high, then our method gives a constant-factor approximation relative to the setting in which the user opinions are known. To find the set of influential users, we provide a novel approximation algorithm for a variant of MaxCut in graphs with positive and negative edge weights. We experimentally evaluate our methods, which have access only to the network topology, and find that they perform similarly to methods that have access to both the network topology and all user opinions. We further present an NP-hardness proof, resolving an open question of Chen and Racz [IEEE Trans. Netw. Sci. Eng., 2021].
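The quantities being attacked are simple functions of the Friedkin-Johnsen equilibrium. Under a common formulation, with graph Laplacian $L$ and innate opinions $s$, the expressed opinions at equilibrium are $z^* = (I + L)^{-1} s$. Disagreement is $\sum_{(i,j) \in E} w_{ij}(z^*_i - z^*_j)^2 = z^{*\top} L z^*$, and polarization is $\lVert z^* - \bar{z}\mathbf{1} \rVert^2$, where $\bar{z}$ is the mean expressed opinion. The NumPy sketch below computes both under these assumptions. The function name is hypothetical, and the exact normalizations used in the paper may differ.

```python
import numpy as np


def fj_discord(weights, innate):
    """Friedkin-Johnsen equilibrium plus disagreement and polarization.

    weights: symmetric (n, n) array of non-negative edge weights; innate: length-n opinions.
    """
    w = np.asarray(weights, dtype=float)
    s = np.asarray(innate, dtype=float)
    laplacian = np.diag(w.sum(axis=1)) - w
    # Equilibrium expressed opinions z* = (I + L)^{-1} s (solve instead of invert).
    z = np.linalg.solve(np.eye(len(s)) + laplacian, s)
    disagreement = float(z @ laplacian @ z)  # sum over edges of w_ij (z_i - z_j)^2
    centered = z - z.mean()
    polarization = float(centered @ centered)
    return z, disagreement, polarization


if __name__ == "__main__":
    z, dis, pol = fj_discord([[0, 1], [1, 0]], [1.0, -1.0])
    print(z, dis, pol)  # z ~ [1/3, -1/3], disagreement ~ 4/9, polarization ~ 2/9
```

Both measures depend on the innate opinions $s$, which is exactly what a topology-only adversary does not observe.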
A novel positive dependence property, called positive measure inducing (PMI for short), is introduced; it is satisfied by numerous copula classes, including the Gaussian, Fréchet, Farlie-Gumbel-Morgenstern and Frank copulas, and it is conjectured that all positive quadrant dependent Archimedean copulas satisfy this property. From a geometric viewpoint, a PMI copula concentrates more mass near the main diagonal than near the opposite diagonal. A striking feature of PMI copulas is that they impose an ordering on a certain class of copula-induced measures of concordance, which originates in Edwards et al. (2004) and includes Spearman's rho $\rho$ and Gini's gamma $\gamma$, leading to numerous new inequalities such as $3 \gamma \geq 2 \rho$. The measures of concordance within this class are estimated using (classical) empirical copulas and the intrinsic construction via empirical checkerboard copulas, and the asymptotic behaviour of the estimators is determined. Building upon the presented inequalities, asymptotic tests are constructed that can potentially be used to detect whether the underlying dependence structure of a given sample is PMI, which in turn can be used to exclude certain copula families from model building. The excellent performance of the tests is demonstrated in a simulation study and by means of a real-data example.
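Since the inequality $3\gamma \geq 2\rho$ is stated for population measures, a natural first step towards a test is to compute rank-based sample versions of both measures. The Python sketch below uses the classical rank formulas and assumes no ties. Spearman's is $\rho_n = 1 - 6\sum_i (p_i - q_i)^2 / (n(n^2-1))$, and Gini's is $\gamma_n = \lfloor n^2/2 \rfloor^{-1} \sum_i (|p_i + q_i - n - 1| - |p_i - q_i|)$, where $p_i, q_i$ are the ranks of the two coordinates. These are standard textbook estimators rather than the empirical-copula or checkerboard estimators studied in the paper, and the function names are hypothetical.

```python
import random


def ranks(values):
    """Ranks 1..n of the values (assumes no ties)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    r = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        r[idx] = rank
    return r


def spearman_rho(x, y):
    # Classical rank formula: 1 - 6 * sum d_i^2 / (n (n^2 - 1)).
    p, q = ranks(x), ranks(y)
    n = len(p)
    d2 = sum((a - b) ** 2 for a, b in zip(p, q))
    return 1 - 6 * d2 / (n * (n * n - 1))


def gini_gamma(x, y):
    # Sample Gini's gamma, normalized by floor(n^2 / 2) so perfect dependence gives +/-1.
    p, q = ranks(x), ranks(y)
    n = len(p)
    s = sum(abs(a + b - n - 1) - abs(a - b) for a, b in zip(p, q))
    return s / (n * n // 2)


if __name__ == "__main__":
    rng = random.Random(0)
    x = [rng.gauss(0, 1) for _ in range(2000)]
    y = [0.5 * xi + 0.75**0.5 * rng.gauss(0, 1) for xi in x]  # Gaussian, correlation 0.5
    rho, gamma = spearman_rho(x, y), gini_gamma(x, y)
    print(rho, gamma, 3 * gamma >= 2 * rho)
```

A single sample comparison like the last line is only a heuristic; a proper test must account for the sampling variability of both estimators, which is what the asymptotic results are needed for.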