In the last years, many neural network-based approaches have been proposed to tackle combinatorial optimization problems such as routing problems. Many of these approaches are based on graph neural networks (GNNs) or related transformers, operating on the Euclidean coordinates representing the routing problems. However, GNNs are inherently not well suited to operate on dense graphs, such as in routing problems. Furthermore, models operating on Euclidean coordinates cannot be applied to non-Euclidean versions of routing problems that are often found in real-world settings. To overcome these limitations, we propose a novel GNN-related edge-based neural model called Graph Edge Attention Network (GREAT). We evaluate the performance of GREAT in the edge-classification task to predict optimal edges in the Traveling Salesman Problem (TSP). We can use such a trained GREAT model to produce sparse TSP graph instances, keeping only the edges GREAT finds promising. Compared to other, non-learning-based methods to sparsify TSP graphs, GREAT can produce very sparse graphs while keeping most of the optimal edges. Furthermore, we build a reinforcement learning-based GREAT framework which we apply to Euclidean and non-Euclidean asymmetric TSP. This framework achieves state-of-the-art results.
Local interactions of uncoordinated individuals produce the collective behaviors of many biological systems, inspiring much of the current research in programmable matter. A striking example is the spontaneous assembly of fire ants into "bridges" comprising their own bodies to traverse obstacles and reach sources of food. Experiments and simulations suggest that, remarkably, these ants always form one bridge -- instead of multiple, competing bridges -- despite a lack of central coordination. We argue that the reliable formation of a single bridge does not require sophistication on behalf of the individuals by provably reproducing this behavior in a self-organizing particle system. We show that the formation of a single bridge by the particles is a statistical inevitability of their preferences to move in a particular direction, such as toward a food source, and their preference for more neighbors. Two parameters, $\eta$ and $\beta$, reflect the strengths of these preferences and determine the Gibbs stationary measure of the corresponding particle system's Markov chain dynamics. We show that a single bridge almost certainly forms when $\eta$ and $\beta$ are sufficiently large. Our proof introduces an auxiliary Markov chain, called an "occupancy chain," that captures only the significant, global changes to the system. Through the occupancy chain, we abstract away information about the motion of individual particles, but we gain a more direct means of analyzing their collective behavior. Such abstractions provide a promising new direction for understanding many other systems of programmable matter.
In recent years, the mathematical limits and algorithmic bounds for probabilistic group testing have become increasingly well-understood, with exact asymptotic thresholds now being known in general scaling regimes for the noiseless setting. In the noisy setting where each test outcome is flipped with constant probability, there have been similar developments, but the overall understanding has lagged significantly behind the noiseless setting. In this paper, we substantially narrow this gap by deriving exact asymptotic thresholds for the noisy setting under two widely-studied random test designs: i.i.d. Bernoulli and near-constant tests-per-item. These thresholds are established by combining components of an existing information-theoretic threshold decoder with a novel analysis of maximum-likelihood decoding (upper bounds), and deriving a novel set of impossibility results by analyzing certain failure events for optimal maximum-likelihood decoding (lower bounds).
Over the past two decades, tools from network science have been leveraged to characterize the organization of both structural and functional networks of the brain. One such measure of network organization is hub node identification. Hubs are specialized nodes within a network that link distinct brain units corresponding to specialized functional processes. Conventional methods for identifying hub nodes utilize different types of centrality measures and participation coefficient to profile various aspects of nodal importance. These methods solely rely on the functional connectivity networks constructed from functional magnetic resonance imaging (fMRI), ignoring the structure-function coupling in the brain. In this paper, we introduce a graph signal processing (GSP) based hub detection framework that utilizes both the structural connectivity and the functional activation to identify hub nodes. The proposed framework models functional activity as graph signals on the structural connectivity. Hub nodes are then detected based on the premise that hub nodes are sparse, have higher level of activity compared to their neighbors, and the non-hub nodes' activity can be modeled as the output of a graph-based filter. Based on these assumptions, an optimization framework, GraFHub, is formulated to learn the coefficients of the optimal polynomial graph filter and detect the hub nodes. The proposed framework is evaluated on both simulated data and resting state fMRI (rs-fMRI) data from Human Connectome Project (HCP).
The vulnerability of artificial neural networks to adversarial perturbations in the black-box setting is widely studied in the literature. The majority of attack methods to construct these perturbations suffer from an impractically large number of queries required to find an adversarial example. In this work, we focus on knowledge distillation as an approach to conduct transfer-based black-box adversarial attacks and propose an iterative training of the surrogate model on an expanding dataset. This work is the first, to our knowledge, to provide provable guarantees on the success of knowledge distillation-based attack on classification neural networks: we prove that if the student model has enough learning capabilities, the attack on the teacher model is guaranteed to be found within the finite number of distillation iterations.
Scaling political actors based on their individual characteristics and behavior helps profiling and grouping them as well as understanding changes in the political landscape. In this paper we introduce the Structural Text-Based Scaling (STBS) model to infer ideological positions of speakers for latent topics from text data. We expand the usual Poisson factorization specification for topic modeling of text data and use flexible shrinkage priors to induce sparsity and enhance interpretability. We also incorporate speaker-specific covariates to assess their association with ideological positions. Applying STBS to U.S. Senate speeches from Congress session 114, we identify immigration and gun violence as the most polarizing topics between the two major parties in Congress. Additionally, we find that, in discussions about abortion, the gender of the speaker significantly influences their position, with female speakers focusing more on women's health. We also see that a speaker's region of origin influences their ideological position more than their religious affiliation.
Advances in artificial intelligence often stem from the development of new environments that abstract real-world situations into a form where research can be done conveniently. This paper contributes such an environment based on ideas inspired by elementary Microeconomics. Agents learn to produce resources in a spatially complex world, trade them with one another, and consume those that they prefer. We show that the emergent production, consumption, and pricing behaviors respond to environmental conditions in the directions predicted by supply and demand shifts in Microeconomics. We also demonstrate settings where the agents' emergent prices for goods vary over space, reflecting the local abundance of goods. After the price disparities emerge, some agents then discover a niche of transporting goods between regions with different prevailing prices -- a profitable strategy because they can buy goods where they are cheap and sell them where they are expensive. Finally, in a series of ablation experiments, we investigate how choices in the environmental rewards, bartering actions, agent architecture, and ability to consume tradable goods can either aid or inhibit the emergence of this economic behavior. This work is part of the environment development branch of a research program that aims to build human-like artificial general intelligence through multi-agent interactions in simulated societies. By exploring which environment features are needed for the basic phenomena of elementary microeconomics to emerge automatically from learning, we arrive at an environment that differs from those studied in prior multi-agent reinforcement learning work along several dimensions. For example, the model incorporates heterogeneous tastes and physical abilities, and agents negotiate with one another as a grounded form of communication.
It has been shown that deep neural networks are prone to overfitting on biased training data. Towards addressing this issue, meta-learning employs a meta model for correcting the training bias. Despite the promising performances, super slow training is currently the bottleneck in the meta learning approaches. In this paper, we introduce a novel Faster Meta Update Strategy (FaMUS) to replace the most expensive step in the meta gradient computation with a faster layer-wise approximation. We empirically find that FaMUS yields not only a reasonably accurate but also a low-variance approximation of the meta gradient. We conduct extensive experiments to verify the proposed method on two tasks. We show our method is able to save two-thirds of the training time while still maintaining the comparable or achieving even better generalization performance. In particular, our method achieves the state-of-the-art performance on both synthetic and realistic noisy labels, and obtains promising performance on long-tailed recognition on standard benchmarks.
Recently, various auxiliary tasks have been proposed to accelerate representation learning and improve sample efficiency in deep reinforcement learning (RL). However, existing auxiliary tasks do not take the characteristics of RL problems into consideration and are unsupervised. By leveraging returns, the most important feedback signals in RL, we propose a novel auxiliary task that forces the learnt representations to discriminate state-action pairs with different returns. Our auxiliary loss is theoretically justified to learn representations that capture the structure of a new form of state-action abstraction, under which state-action pairs with similar return distributions are aggregated together. In low data regime, our algorithm outperforms strong baselines on complex tasks in Atari games and DeepMind Control suite, and achieves even better performance when combined with existing auxiliary tasks.
Recently, neural networks have been widely used in e-commerce recommender systems, owing to the rapid development of deep learning. We formalize the recommender system as a sequential recommendation problem, intending to predict the next items that the user might be interacted with. Recent works usually give an overall embedding from a user's behavior sequence. However, a unified user embedding cannot reflect the user's multiple interests during a period. In this paper, we propose a novel controllable multi-interest framework for the sequential recommendation, called ComiRec. Our multi-interest module captures multiple interests from user behavior sequences, which can be exploited for retrieving candidate items from the large-scale item pool. These items are then fed into an aggregation module to obtain the overall recommendation. The aggregation module leverages a controllable factor to balance the recommendation accuracy and diversity. We conduct experiments for the sequential recommendation on two real-world datasets, Amazon and Taobao. Experimental results demonstrate that our framework achieves significant improvements over state-of-the-art models. Our framework has also been successfully deployed on the offline Alibaba distributed cloud platform.
Multi-relation Question Answering is a challenging task, due to the requirement of elaborated analysis on questions and reasoning over multiple fact triples in knowledge base. In this paper, we present a novel model called Interpretable Reasoning Network that employs an interpretable, hop-by-hop reasoning process for question answering. The model dynamically decides which part of an input question should be analyzed at each hop; predicts a relation that corresponds to the current parsed results; utilizes the predicted relation to update the question representation and the state of the reasoning process; and then drives the next-hop reasoning. Experiments show that our model yields state-of-the-art results on two datasets. More interestingly, the model can offer traceable and observable intermediate predictions for reasoning analysis and failure diagnosis.