We address the problem of optimizing functions defined over node subsets in a graph. Optimizing such functions is often non-trivial given their combinatorial, black-box, and expensive-to-evaluate nature. Although various algorithms have been introduced in the literature, most are either task-specific or computationally inefficient, and they exploit only the graph structure without considering the characteristics of the function being optimized. To address these limitations, we leverage Bayesian Optimization (BO), a sample-efficient black-box solver, and propose a novel framework for combinatorial optimization on graphs. More specifically, we map each $k$-node subset of the original graph to a node in a new combinatorial graph and adopt a local modeling approach that efficiently traverses the latter graph by progressively sampling its subgraphs with a recursive algorithm. Extensive experiments under both synthetic and real-world setups demonstrate the effectiveness of the proposed BO framework on various types of graphs and optimization tasks, and ablation studies analyze its behavior in detail.
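As an illustration of the local modeling idea, the following is a minimal, hedged sketch that treats each $k$-node subset as a node of a combinatorial graph and explores it with a simple local search standing in for the BO acquisition step. The one-node-swap neighbourhood and the greedy move are assumptions made for illustration, not the paper's exact construction or surrogate model.

```python
# Sketch: local traversal of a "combinatorial graph" whose nodes are k-node
# subsets of an original graph.  Two subsets are treated as adjacent when they
# differ by swapping a single node (an assumed neighbourhood structure).
import itertools
import random

def neighbours(subset, all_nodes):
    """Subsets reachable by swapping one member for one non-member."""
    subset = set(subset)
    for out_node, in_node in itertools.product(subset, set(all_nodes) - subset):
        yield frozenset((subset - {out_node}) | {in_node})

def local_search(all_nodes, k, objective, n_steps=100, seed=0):
    """Greedy stand-in for the BO loop: sample a few neighbours of the
    incumbent subset and move to the best one found."""
    rng = random.Random(seed)
    all_nodes = list(all_nodes)
    current = frozenset(rng.sample(all_nodes, k))
    best_val = objective(current)
    for _ in range(n_steps):
        cands = list(neighbours(current, all_nodes))
        cand = max(rng.sample(cands, min(10, len(cands))), key=objective)
        if objective(cand) > best_val:
            current, best_val = cand, objective(cand)
    return current, best_val

# toy usage: pick the k=3 nodes with the largest (arbitrary) node scores
nodes = list(range(20))
score = {v: (v * 7) % 20 for v in nodes}
print(local_search(nodes, k=3, objective=lambda s: sum(score[v] for v in s)))
```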
Arbitrarily varying channels (AVCs) model communication settings in which the channel state may vary arbitrarily over time; their primary purpose is to circumvent statistical assumptions on channel variation. Traditional studies of AVCs optimize the rate subject to the worst-case state sequence. While this approach is resilient to channel variations, it may result in low rates for state sequences associated with relatively good channels. This paper analyzes AVCs through the lens of competitive analysis, where solution quality is measured with respect to the optimal solution had the state sequence been known in advance. Our main result demonstrates that codes constructed from a single input distribution do not achieve optimal competitive performance over AVCs. This stands in contrast to the single-letter capacity formulae for AVCs, and it indicates that, in our setting, even though the encoder cannot predict the subsequent channel states, it benefits from varying its input distribution as time proceeds.
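For context, the classical single-letter formula that this result is contrasted with: under random coding, the capacity of an AVC with state alphabet $\mathcal{S}$ and channel law $W(y \mid x, s)$ is attained by a single (time-invariant) input distribution. The notation below is standard textbook notation and is not taken from this paper:
\[
C_r \;=\; \max_{P_X} \; \min_{q \in \mathcal{P}(\mathcal{S})} I\big(P_X, W_q\big),
\qquad
W_q(y \mid x) \;=\; \sum_{s \in \mathcal{S}} q(s)\, W(y \mid x, s).
\]
The competitive-analysis result above states that, against a benchmark that knows the state sequence in hindsight, fixing one maximizing $P_X$ of this kind is no longer sufficient.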
We conduct a systematic study of the approximation properties of the Transformer for sequence modeling with long, sparse, and complicated memory. We investigate the mechanisms through which different components of the Transformer, such as dot-product self-attention, positional encoding, and the feed-forward layer, affect its expressive power, and we study their combined effects by establishing explicit approximation rates. Our study reveals the roles of critical parameters in the Transformer, such as the number of layers and the number of attention heads. These theoretical insights are validated experimentally and offer natural suggestions for alternative architectures.
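To make the components referenced above concrete, here is a minimal NumPy sketch of single-head dot-product self-attention with an additive sinusoidal positional encoding; shapes and names are illustrative and do not follow the paper's notation or experimental setup.

```python
# Single-head dot-product self-attention plus sinusoidal positional encoding.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """X: (seq_len, d_model); Wq/Wk/Wv: (d_model, d_head)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # pairwise token interactions
    return softmax(scores) @ V                # weighted mixture of values

def sinusoidal_positions(seq_len, d_model):
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d_model)[None, :]
    angles = pos / np.power(10000, (2 * (i // 2)) / d_model)
    return np.where(i % 2 == 0, np.sin(angles), np.cos(angles))

rng = np.random.default_rng(0)
seq_len, d_model, d_head = 8, 16, 4
X = rng.normal(size=(seq_len, d_model)) + sinusoidal_positions(seq_len, d_model)
Wq, Wk, Wv = (rng.normal(size=(d_model, d_head)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)   # (8, 4)
```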
Given a graph $G = (V, E)$ and a model of information flow on that network, a fundamental question is whether all nodes have sufficient access to information generated at other nodes in the graph. If not, we can ask whether a small set of edge additions improves information access. Formally, the broadcast value of a network is defined as the minimum, over pairs $u, v \in V$, of the probability that an information cascade starting at $u$ reaches $v$. Recent work in the algorithmic fairness literature has focused on heuristics for adding a few edges to a graph to improve its broadcast value. Our goal is to formally study the approximability of the Broadcast Improvement problem: given $G$ and a parameter $k$, find the best set of $k$ edges to add to $G$ in order to maximize the broadcast value of the resulting graph. We develop efficient bicriteria approximation algorithms. If the optimal solution adds $k$ edges and achieves a broadcast value of $\beta^*$, we develop algorithms that can (a) add $2k-1$ edges and achieve a broadcast value of roughly $(\beta^*)^4$, or (b) add $O(k\log n)$ edges and achieve a broadcast value of roughly $\beta^*$. We also provide other trade-offs that can be better depending on $k$ and the parameter associated with propagation in the cascade model. We complement our results by proving that unless P = NP, any algorithm that adds $O(k)$ edges must lose significantly in the approximation of $\beta^*$, resolving an open question. Our techniques are inspired by connections between Broadcast Improvement and problems such as Metric $k$-Center and Diameter Reduction. However, since the objective involves information cascades, we need to develop novel probabilistic tools to reason about the existence of paths in edge-sampled graphs. Finally, we show that our techniques extend to a single-source variant, for which we give both bicriteria algorithms and inapproximability results.
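A hedged Monte Carlo sketch of the quantity being optimized: the broadcast value as the minimum, over ordered pairs, of the probability that a cascade from $u$ reaches $v$. The cascade model used here, in which every edge transmits independently with probability p, is an assumption made for illustration; the paper's algorithms and probabilistic tools are not reproduced.

```python
# Estimate the broadcast value by repeatedly sampling edge subgraphs and
# checking reachability with BFS from every source node.
import random
from collections import defaultdict, deque

def estimate_broadcast(nodes, edges, p, trials=2000, seed=0):
    rng = random.Random(seed)
    reach_counts = defaultdict(int)              # (u, v) -> #trials in which u reached v
    for _ in range(trials):
        live = defaultdict(list)                 # edge-sampled subgraph
        for u, v in edges:
            if rng.random() < p:
                live[u].append(v)
                live[v].append(u)
        for src in nodes:                        # BFS from every source
            seen, queue = {src}, deque([src])
            while queue:
                x = queue.popleft()
                for y in live[x]:
                    if y not in seen:
                        seen.add(y)
                        queue.append(y)
            for dst in seen:
                reach_counts[(src, dst)] += 1
    return min(reach_counts[(u, v)] / trials
               for u in nodes for v in nodes if u != v)

# toy usage: a 6-node path; adding a chord would raise the broadcast value
nodes = list(range(6))
edges = [(i, i + 1) for i in range(5)]
print(estimate_broadcast(nodes, edges, p=0.5))
```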
To efficiently find an optimal parameter combination in a large-scale problem, it is key to convert the parameters into variables that actual machines can handle. Specifically, quadratic unconstrained binary optimization problems are solved with the help of machine learning, e.g., factorization machines with annealing, which convert raw parameters into binary variables. This work investigates how the convergence speed and the accuracy depend on the binary labeling method, which can influence the shape of the cost function and thus the probability of being trapped at a local minimum. Using the traveling salesman problem as an example, we propose and evaluate Gray labeling, which correlates the Hamming distance between binary labels with the traveling distance. In numerical simulations of the traveling salesman problem with up to 15 cities and a limited number of iterations, Gray labeling yields a lower percentage of local minima and shorter traveling distances than natural labeling.
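To illustrate the labeling idea, the sketch below compares natural binary labels with Gray-code labels for city indices: with binary-reflected Gray codes, consecutive indices differ in exactly one bit, so a small change in the binary variables corresponds to a small change in the tour. The exact encoding used in the paper may differ; this is only the standard Gray-code construction.

```python
# Natural binary vs. binary-reflected Gray-code labels for integer indices.
def natural_label(i, n_bits):
    return format(i, f"0{n_bits}b")

def gray_label(i, n_bits):
    return format(i ^ (i >> 1), f"0{n_bits}b")   # standard Gray-code conversion

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

n_bits = 4
for i in range(7):
    nat_d = hamming(natural_label(i, n_bits), natural_label(i + 1, n_bits))
    gray_d = hamming(gray_label(i, n_bits), gray_label(i + 1, n_bits))
    print(f"{i}->{i + 1}: natural Hamming {nat_d}, Gray Hamming {gray_d}")
# Gray labels always differ in one bit between consecutive indices,
# while natural labels can flip several bits at once (e.g., 3 -> 4).
```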
Graph diffusion, which iteratively propagates real-valued substances across a graph, is used in numerous graph- and network-based applications. Releasing diffusion vectors, however, may reveal sensitive linking information in the data, such as transaction information in financial network data, and protecting the privacy of graph data is challenging due to its interconnected nature. This work proposes a novel graph diffusion framework with edge-level differential privacy guarantees based on noisy diffusion iterates. The algorithm injects Laplace noise at each diffusion iteration and adopts a degree-based thresholding function to mitigate the high sensitivity induced by low-degree nodes. Our privacy loss analysis is based on Privacy Amplification by Iteration (PABI), which, to the best of our knowledge, is the first effort that analyzes PABI with Laplace noise and provides relevant applications. We also introduce a novel Infinity-Wasserstein distance tracking method, which tightens the analysis of privacy leakage and makes PABI more applicable in practice. We evaluate the framework by applying it to Personalized PageRank computation for ranking tasks. Experiments on real-world network data demonstrate the superiority of our method under stringent privacy conditions.
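A hedged sketch of a noisy Personalized PageRank iteration in the spirit described above: Laplace noise is injected at every diffusion step, and contributions through low-degree nodes are damped by a degree threshold. The noise scale and the thresholding rule here are illustrative placeholders, not the calibrated mechanism or the privacy accounting from the paper.

```python
# Power-iteration Personalized PageRank with per-iteration Laplace noise
# and degree-based damping (illustrative parameters only).
import numpy as np

def noisy_ppr(adj, seed_node, alpha=0.15, noise_scale=1e-3, deg_threshold=2,
              n_iter=30, rng=None):
    rng = rng or np.random.default_rng(0)
    n = adj.shape[0]
    deg = adj.sum(axis=1)
    damp = np.minimum(1.0, deg / deg_threshold)          # degree-based damping
    P = adj / np.maximum(deg, 1)[:, None]                # row-stochastic transition
    s = np.zeros(n); s[seed_node] = 1.0                  # personalization vector
    pi = s.copy()
    for _ in range(n_iter):
        pi = alpha * s + (1 - alpha) * (damp * pi) @ P   # damped diffusion step
        pi += rng.laplace(scale=noise_scale, size=n)     # per-iteration Laplace noise
        pi = np.clip(pi, 0.0, None)
    return pi / pi.sum()

# toy usage on a 5-node cycle
A = np.zeros((5, 5))
for i in range(5):
    A[i, (i + 1) % 5] = A[(i + 1) % 5, i] = 1.0
print(np.round(noisy_ppr(A, seed_node=0), 3))
```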
Computing the vertex and edge connectivity of a graph is a classical problem in algorithmic graph theory. The focus of this paper is on computing these parameters for embedded graphs. A typical example of an embedded graph is a planar graph, which can be drawn with no edge crossings. It has long been known that the vertex and edge connectivity of planar embedded graphs can be computed in linear time. Very recently, Biedl and Murali extended the techniques from planar graphs to 1-plane graphs without $\times$-crossings, i.e., crossings whose endpoints induce a matching. While the tools used were novel, they were highly tailored to 1-plane graphs and do not provide much leeway for further extension. In this paper, we develop alternate techniques that are simpler, have wider applications to near-planar graphs, and can be used to test both vertex and edge connectivity. Our technique works for all embedded graphs in which any pair of crossing edges is connected by a path that, roughly speaking, can be covered with few cells of the drawing. Important examples of such graphs include optimal 2-planar and optimal 3-planar graphs, $d$-map graphs, $d$-framed graphs, graphs with bounded crossing number, and $k$-plane graphs with a bounded number of $\times$-crossings.
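For readers unfamiliar with the two parameters, the snippet below computes them with general-purpose, flow-based routines from networkx; these are not the specialized linear-time algorithms for embedded graphs discussed above and serve only to show what is being measured.

```python
# Vertex and edge connectivity of a small planar graph via networkx.
import networkx as nx

G = nx.octahedral_graph()   # a small 4-regular planar example
print("vertex connectivity:", nx.node_connectivity(G))   # min #vertices whose removal disconnects G
print("edge connectivity:  ", nx.edge_connectivity(G))   # min #edges whose removal disconnects G
```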
We argue that the selective inclusion of data points based on latent objectives is common in practical settings, for example in music sequences. Since this selection process often distorts statistical analysis, previous work primarily views it as a bias to be corrected and proposes various methods to mitigate its effect. However, while controlling this bias is crucial, selection also offers an opportunity to gain deeper insight into the hidden generation process, as it is a fundamental mechanism underlying what we observe. In particular, overlooking selection in sequential data can lead to an incomplete or overcomplicated inductive bias in modeling, such as assuming a universal autoregressive structure for all dependencies. Therefore, rather than merely viewing it as a bias, we explore the causal structure of selection in sequential data to delve deeper into the complete causal process. Specifically, we show that the selection structure is identifiable without any parametric assumptions or interventional experiments. Moreover, even when selection variables coexist with latent confounders, we still establish nonparametric identifiability under appropriate structural conditions. We also propose a provably correct algorithm to detect and identify selection structures as well as other types of dependencies. The framework is validated empirically on both synthetic data and real-world music.
The success of AI models relies on the availability of large, diverse, and high-quality datasets, which can be challenging to obtain due to data scarcity, privacy concerns, and high costs. Synthetic data has emerged as a promising solution by generating artificial data that mimics real-world patterns. This paper provides an overview of synthetic data research, discussing its applications, challenges, and future directions. We present empirical evidence from prior art to demonstrate its effectiveness and highlight the importance of ensuring its factuality, fidelity, and unbiasedness. We emphasize the need for responsible use of synthetic data to build more powerful, inclusive, and trustworthy language models.
As artificial intelligence (AI) models continue to scale up, they are becoming more capable and more integrated into various forms of decision-making systems. For models involved in moral decision-making, also known as artificial moral agents (AMAs), interpretability provides a way to trust and understand the agent's internal reasoning mechanisms for effective use and error correction. In this paper, we provide an overview of this rapidly evolving subfield of AI interpretability, introduce the concept of the Minimum Level of Interpretability (MLI), and recommend an MLI for various types of agents to aid their safe deployment in real-world settings.
We introduce a generic framework that reduces the computational cost of object detection while retaining accuracy in scenarios where objects of varied sizes appear in high-resolution images. Detection proceeds in a coarse-to-fine manner, first on a down-sampled version of the image and then on a sequence of higher-resolution regions identified as likely to improve detection accuracy. Built upon reinforcement learning, our approach consists of a model (R-net) that uses coarse detection results to predict the potential accuracy gain of analyzing a region at higher resolution, and another model (Q-net) that sequentially selects regions to zoom in on. Experiments on the Caltech Pedestrians dataset show that our approach reduces the number of processed pixels by over 50% without a drop in detection accuracy. The merits of our approach become more significant on a high-resolution test set collected from the YFCC100M dataset, where our approach maintains high detection performance while reducing the number of processed pixels by about 70% and the detection time by over 50%.
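A hedged sketch of the coarse-to-fine loop described above: detect on a downsampled image, score candidate regions by their estimated accuracy gain from re-detection at full resolution, and sequentially zoom into the highest-scoring regions. The callables `coarse_detector`, `estimate_gain`, and `fine_detector` are placeholders for the learned components (the paper learns the gain predictor and the region-selection policy with reinforcement learning); this is not the authors' implementation.

```python
# Coarse-to-fine detection loop with placeholder detectors and gain estimator.
import numpy as np

def coarse_to_fine_detect(image, coarse_detector, estimate_gain, fine_detector,
                          candidate_regions, budget=3):
    """Run a cheap pass on a downsampled image, then zoom into the regions
    with the highest predicted accuracy gain."""
    downsampled = image[::4, ::4]                        # cheap low-resolution pass
    detections = list(coarse_detector(downsampled))
    scored = sorted(candidate_regions,
                    key=lambda r: estimate_gain(r, detections), reverse=True)
    for x0, y0, x1, y1 in scored[:budget]:               # sequential zoom-ins
        detections.extend(fine_detector(image[y0:y1, x0:x1], offset=(x0, y0)))
    return detections

# toy usage with dummy stand-ins for the learned components
img = np.zeros((64, 64))
dummy_detector = lambda im, offset=(0, 0): []            # returns no boxes
gain = lambda region, dets: region[2] - region[0]        # prefer wider regions
regions = [(0, 0, 32, 32), (16, 16, 64, 64)]
print(coarse_to_fine_detect(img, dummy_detector, gain, dummy_detector, regions))
```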