In recent years, trust region on-policy reinforcement learning has achieved impressive results in complex control tasks and gaming scenarios. However, contemporary state-of-the-art algorithms in this category focus primarily on improving expected performance and offer no control over worst-case performance outcomes. To address this limitation, we introduce a novel objective function whose optimization guarantees monotonic improvement in the lower bound of near-total performance samples (absolute performance). Building on this theoretical result, we refine the theoretically grounded algorithm through a series of approximations, yielding a practical algorithm called Absolute Policy Optimization (APO). Our experiments demonstrate the effectiveness of our approach on challenging continuous control benchmark tasks and extend its applicability to mastering Atari games. APO significantly outperforms state-of-the-art policy gradient algorithms, with substantial improvements in both expected and worst-case performance.
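To make "a lower bound of near-total performance samples" concrete, here is a minimal sketch that assumes absolute performance is approximated as the mean return minus k population standard deviations. This mean-minus-k-sigma form, the function names, and the return samples are illustrative assumptions, not APO's exact bound. Cantelli's one-sided inequality guarantees that at most 1/(1+k^2) of the samples fall at or below such a bound, which is why raising it constrains the worst-case outcomes.

```python
import random
import statistics


def absolute_performance(returns, k):
    """Assumed stand-in for 'absolute performance': mean return minus k
    population standard deviations (hypothetical, not APO's exact bound)."""
    mu = statistics.fmean(returns)
    sigma = statistics.pstdev(returns)
    return mu - k * sigma


def cantelli_tail_bound(k):
    """Cantelli: P(X <= mu - k*sigma) <= 1 / (1 + k^2) for any k > 0."""
    return 1.0 / (1.0 + k * k)


# Two hypothetical policies with the same mean return but different spread.
steady = [10.0, 10.0, 10.0, 10.0]
risky = [0.0, 20.0, 0.0, 20.0]
k = 2.0
print(absolute_performance(steady, k))  # 10.0
print(absolute_performance(risky, k))   # -10.0

# Empirical check: the share of samples at or below the bound respects Cantelli.
rng = random.Random(0)
samples = [rng.expovariate(1.0) for _ in range(10_000)]
bound = absolute_performance(samples, k)
tail = sum(r <= bound for r in samples) / len(samples)
print(tail <= cantelli_tail_bound(k))  # True
```

Both hypothetical policies have the same expected return, so an objective built only on the mean cannot tell them apart, while the lower bound clearly prefers the steady one.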
We introduce a reversible theory of exact entanglement manipulation by establishing a necessary and sufficient condition for state transfer under trace-preserving transformations that completely preserve the positivity of partial transpose (PPT). Under these free transformations, we show that logarithmic negativity emerges as the pivotal entanglement measure governing transformations between entangled states, analogous to the role of entropy in the second law of thermodynamics. Previous results proved that entanglement is irreversible under quantum operations that completely preserve PPT, and left open the question of reversibility for quantum operations that do not generate entanglement asymptotically. We find, however, that going beyond the complete positivity constraint imposed by standard quantum mechanics enables a reversible theory of exact entanglement manipulation, which may suggest a potential incompatibility between the reversibility of entanglement and the fundamental principles of quantum mechanics.
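For reference, the standard definition of logarithmic negativity is given below, together with the textbook two-qubit Bell-state computation. This only illustrates the measure itself, not the paper's transfer condition. If the partial transpose of a state is positive (a PPT state), its trace norm equals its trace, 1, so the logarithmic negativity of every PPT state is 0.

```latex
% Logarithmic negativity of a bipartite state \rho_{AB}
E_N(\rho_{AB}) = \log_2 \bigl\lVert \rho_{AB}^{T_B} \bigr\rVert_1,
\qquad \lVert X \rVert_1 = \operatorname{Tr}\sqrt{X^\dagger X}.

% Example: Bell state |\Phi^+\rangle = (|00\rangle + |11\rangle)/\sqrt{2}
\rho^{T_B} = \tfrac{1}{2}\bigl(|00\rangle\langle 00| + |01\rangle\langle 10|
           + |10\rangle\langle 01| + |11\rangle\langle 11|\bigr)
           = \tfrac{1}{2}\,\mathrm{SWAP}.
% SWAP has eigenvalues +1, +1, +1, -1, so \rho^{T_B} has eigenvalues 1/2, 1/2, 1/2, -1/2:
\lVert \rho^{T_B} \rVert_1 = 4 \cdot \tfrac{1}{2} = 2
\quad\Longrightarrow\quad E_N = \log_2 2 = 1.
```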
Negative control variables are sometimes used in non-experimental studies to detect the presence of confounding by hidden factors. A negative control outcome (NCO) is an outcome that is influenced by unobserved confounders of the exposure's effect on the outcome of interest, but is not causally affected by the exposure. Tchetgen Tchetgen (2013) introduced the Control Outcome Calibration Approach (COCA) as a formal NCO counterfactual method to detect and correct for residual confounding bias. For identification, COCA treats the NCO as an error-prone proxy of the treatment-free counterfactual outcome of interest. It regresses the NCO on the treatment-free counterfactual and relies on a rank-preserving structural model, which assumes a constant individual-level causal effect. In this work, we establish nonparametric COCA identification of the average causal effect on the treated without requiring rank preservation, thereby accommodating unrestricted effect heterogeneity across units. This nonparametric identification result has important practical implications: it provides confounding control with a single proxy, in contrast to the recently proposed proximal causal inference, which relies on a pair of confounding proxies for identification. For COCA estimation, we propose three strategies: (i) an extended propensity score approach, (ii) an outcome bridge function approach, and (iii) a doubly robust approach. Finally, we illustrate the proposed methods in an application evaluating the causal impact of a Zika virus outbreak on birth rate in Brazil.
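The original rank-preserving COCA logic can be sketched in the simplest linear case. This is the constant-effect setting the abstract generalizes away from, not the proposed nonparametric estimators. Suppose Y(0) = Y - psi*A and the NCO depends on treatment only through Y(0). Then, in an OLS regression of the NCO on (Y, A), the coefficient on A equals -psi times the coefficient on Y, so psi = -coef_A / coef_Y. The data-generating process, coefficients, and function names below are hypothetical.

```python
import random


def simulate(n, psi, seed=0):
    """Hypothetical data: U confounds A and Y; the NCO proxies Y(0)."""
    rng = random.Random(seed)
    A, Y, N = [], [], []
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)                        # unobserved confounder
        a = 1.0 if u + rng.gauss(0.0, 1.0) > 0 else 0.0
        y0 = u + rng.gauss(0.0, 0.5)                   # treatment-free counterfactual
        y = y0 + psi * a                               # constant (rank-preserving) effect
        nco = 2.0 * y0 + rng.gauss(0.0, 0.5)           # depends on A only through Y(0)
        A.append(a)
        Y.append(y)
        N.append(nco)
    return A, Y, N


def cov(x, z):
    mx, mz = sum(x) / len(x), sum(z) / len(z)
    return sum((xi - mx) * (zi - mz) for xi, zi in zip(x, z)) / len(x)


def coca_linear(A, Y, N):
    """OLS of N on (Y, A) via the 2x2 normal equations on centered data,
    then psi = -coef_A / coef_Y under the assumed linear model."""
    s_yy, s_aa, s_ya = cov(Y, Y), cov(A, A), cov(Y, A)
    s_yn, s_an = cov(Y, N), cov(A, N)
    det = s_yy * s_aa - s_ya ** 2
    b_y = (s_aa * s_yn - s_ya * s_an) / det
    b_a = (s_yy * s_an - s_ya * s_yn) / det
    return -b_a / b_y


A, Y, N = simulate(5000, psi=1.5)
print(round(coca_linear(A, Y, N), 2))  # close to the true psi = 1.5
```

A naive treated-versus-untreated comparison of Y would be biased upward here, because U raises both the chance of treatment and Y(0). The NCO calibration removes that bias without observing U.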
Regularization is a critical component of deep learning training, and weight decay is a commonly used approach. Weight decay applies a constant penalty coefficient uniformly across all parameters, which may be unnecessarily restrictive for some parameters while insufficiently restricting others. To adjust penalty coefficients dynamically for different parameter groups, we present constrained parameter regularization (CPR) as an alternative to traditional weight decay. Instead of applying a single constant penalty to all parameters, CPR enforces an upper bound on a statistical measure (e.g., the L$_2$-norm) of each parameter group. Learning thus becomes a constrained optimization problem, which we address with an adaptation of the augmented Lagrangian method. CPR requires only two hyperparameters and incurs no measurable runtime overhead. We also propose a simple but efficient mechanism to adapt the upper bounds during optimization. We provide empirical evidence of CPR's efficacy in experiments on the "grokking" phenomenon, computer vision, and language modeling tasks. Our results show that CPR counteracts the effects of grokking and consistently matches or outperforms traditional weight decay.
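The constrained view can be illustrated on a one-parameter toy problem: minimize (theta - 3)^2 subject to theta^2 <= kappa. The sketch below uses a simplified Lagrange-multiplier (dual ascent) update of the kind augmented Lagrangian methods build on. The multiplier grows while the bound is violated and is clipped at zero otherwise, so the effective penalty adapts instead of being a fixed weight-decay coefficient. The loss, step sizes `lr` and `mu`, and bound `kappa` are hypothetical, and this is not the authors' exact CPR update.

```python
def constrained_toy(steps=5000, lr=0.05, mu=0.1, kappa=1.0):
    """Minimize (theta - 3)^2 subject to R(theta) = theta^2 <= kappa."""
    theta, lam = 0.0, 0.0
    for _ in range(steps):
        grad_loss = 2.0 * (theta - 3.0)
        grad_reg = 2.0 * theta                  # gradient of R(theta) = theta^2
        theta -= lr * (grad_loss + lam * grad_reg)
        # Dual ascent on the multiplier, clipped at zero when the constraint holds.
        lam = max(0.0, lam + mu * (theta ** 2 - kappa))
    return theta, lam


theta, lam = constrained_toy()
print(round(theta, 4), round(lam, 4))  # 1.0 2.0
```

At the solution theta = 1 sits exactly on the bound, and the multiplier settles at 2, the value the KKT condition 2(theta - 3) + 2*lam*theta = 0 requires. A fixed weight-decay coefficient would have to be tuned by hand to land on the same point.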
When optimizing problems with uncertain parameter values in a linear objective, decision-focused learning enables end-to-end learning of these values. We study a stochastic scheduling problem in which processing times are uncertain; this introduces uncertain values into the constraints, so an initial schedule may need to be repaired. Historical realizations of the stochastic processing times are available. We show how existing decision-focused learning techniques based on stochastic smoothing can be adapted to this scheduling problem. An extensive experimental evaluation investigates in which situations decision-focused learning outperforms the state of the art for such problems: scenario-based stochastic optimization.
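Stochastic smoothing can be illustrated generically, independent of the scheduling adaptation described here. A piecewise-constant solver is replaced by its average output over Gaussian perturbations of the predicted costs, which varies smoothly with the costs and can therefore carry a learning signal. The toy "solver" below just picks the cheapest option. It, the noise scale `sigma`, and the sample count are hypothetical stand-ins for a real scheduling solver.

```python
import random


def solve(costs):
    """Toy combinatorial solver: one-hot vector selecting the cheapest option."""
    best = min(range(len(costs)), key=lambda i: costs[i])
    return [1.0 if i == best else 0.0 for i in range(len(costs))]


def smoothed_solve(costs, sigma=1.0, samples=2000, seed=0):
    """Monte Carlo estimate of E[solve(costs + sigma * Z)] with Z ~ N(0, I)."""
    rng = random.Random(seed)
    avg = [0.0] * len(costs)
    for _ in range(samples):
        noisy = [c + sigma * rng.gauss(0.0, 1.0) for c in costs]
        for i, xi in enumerate(solve(noisy)):
            avg[i] += xi / samples
    return avg


print(smoothed_solve([0.0, 10.0]))  # nearly one-hot: clear winner
print(smoothed_solve([1.0, 1.0]))   # roughly [0.5, 0.5]: a tie is smoothed out
```

The raw `solve` jumps abruptly when two costs cross, while the smoothed output moves gradually, which is what makes gradient-based end-to-end training possible.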
We present prompt distribution learning for effectively adapting a pre-trained vision-language model to downstream recognition tasks. Our method not only learns low-bias prompts from a few samples but also captures the distribution of diverse prompts to handle varying visual representations. In this way, we provide high-quality, task-related content that facilitates recognition. Prompt distribution learning is realized by an efficient approach that learns the output embeddings of prompts instead of their input embeddings. This lets us model the embeddings effectively with a Gaussian distribution and derive a surrogate loss for efficient training. Extensive experiments on 12 datasets demonstrate that our method consistently and significantly outperforms existing methods. For example, with 1 sample per category, it improves the average result by a relative 9.1% over human-crafted prompts.
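A minimal sketch can show what modeling prompt output embeddings with a Gaussian buys. The assumptions here are not from the abstract: a diagonal Gaussian is fitted per class from several prompt embeddings, and an image feature is scored by its dot product with each class mean. If a class weight w ~ N(mean, diag(var)), the logit w·x has mean mean·x and variance sum(var_j * x_j^2). The paper's surrogate loss is not reproduced, and the class names and embeddings are hypothetical.

```python
def fit_diag_gaussian(embeddings):
    """Per-dimension mean and (population) variance of a class's prompt embeddings."""
    n, d = len(embeddings), len(embeddings[0])
    mean = [sum(e[j] for e in embeddings) / n for j in range(d)]
    var = [sum((e[j] - mean[j]) ** 2 for e in embeddings) / n for j in range(d)]
    return mean, var


def classify(feature, class_gaussians):
    """Pick the class whose mean embedding gives the largest expected logit."""
    scores = {
        c: sum(m * f for m, f in zip(mean, feature))
        for c, (mean, _var) in class_gaussians.items()
    }
    return max(scores, key=scores.get)


def logit_variance(feature, var):
    """Variance of w . x when w has independent per-dimension variances var."""
    return sum(v * f * f for v, f in zip(var, feature))


# Hypothetical output embeddings of two prompts per class.
gaussians = {
    "cat": fit_diag_gaussian([[1.0, 0.0], [0.8, 0.2]]),
    "dog": fit_diag_gaussian([[0.0, 1.0], [0.2, 0.8]]),
}
print(classify([1.0, 0.0], gaussians))  # cat
```

Working in the output-embedding space is what makes this tractable: the logit is linear in the embedding, so its mean and variance have closed forms.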
Self-supervised learning has been widely used to obtain transferable representations from unlabeled images. In particular, recent contrastive learning methods have shown impressive performance on downstream image classification tasks. These contrastive methods mainly focus on generating invariant global representations at the image level under semantic-preserving transformations. As a result, they tend to overlook the spatial consistency of local representations and are therefore limited as pretraining for localization tasks such as object detection and instance segmentation. Moreover, the aggressively cropped views used in existing contrastive methods can minimize representation distances between semantically different regions of a single image. In this paper, we propose a spatially consistent representation learning algorithm (SCRL) for multi-object and location-specific tasks. In particular, we devise a novel self-supervised objective that tries to produce coherent spatial representations of a randomly cropped local region under geometric translations and zooming operations. On various downstream localization tasks with benchmark datasets, SCRL shows significant performance improvements over image-level supervised pretraining as well as over state-of-the-art self-supervised learning methods.
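Producing consistent representations of "the same local region" across two augmented views requires locating that region in each view's coordinates. Below is a minimal sketch of that geometry, assuming axis-aligned crops resized to a square and no flips. The function names and box convention (x0, y0, x1, y1) are hypothetical, and SCRL's own box sampling is not reproduced. In practice, the mapped boxes would be pooled from each view's feature map with an RoI operator.

```python
def intersect(a, b):
    """Overlap of two boxes (x0, y0, x1, y1) in original-image coordinates."""
    x0, y0 = max(a[0], b[0]), max(a[1], b[1])
    x1, y1 = min(a[2], b[2]), min(a[3], b[3])
    if x0 >= x1 or y0 >= y1:
        return None  # the crops share no region
    return (x0, y0, x1, y1)


def to_view(box, crop, size):
    """Map a box from image coordinates into a crop resized to size x size."""
    sx = size / (crop[2] - crop[0])  # horizontal zoom of this view
    sy = size / (crop[3] - crop[1])  # vertical zoom of this view
    return (
        (box[0] - crop[0]) * sx,
        (box[1] - crop[1]) * sy,
        (box[2] - crop[0]) * sx,
        (box[3] - crop[1]) * sy,
    )


crop1, crop2 = (0, 0, 100, 100), (50, 50, 150, 150)
shared = intersect(crop1, crop2)
print(to_view(shared, crop1, 200))  # (100.0, 100.0, 200.0, 200.0)
print(to_view(shared, crop2, 200))  # (0.0, 0.0, 100.0, 100.0)
```

The same image region lands in different places and at different scales in the two views. Matching features at these mapped locations, rather than whole-image embeddings, is what ties the objective to spatial consistency.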
Graph neural networks (GNNs) have been widely used in representation learning on graphs and have achieved state-of-the-art performance in tasks such as node classification and link prediction. However, most existing GNNs are designed to learn node representations on fixed and homogeneous graphs. These limitations become especially problematic when learning representations on a misspecified graph or on a heterogeneous graph that consists of various types of nodes and edges. In this paper, we propose Graph Transformer Networks (GTNs), which generate new graph structures by identifying useful connections between nodes that are unconnected in the original graph, while learning effective node representations on the new graphs in an end-to-end fashion. The Graph Transformer layer, the core layer of GTNs, learns a soft selection of edge types and composite relations to generate useful multi-hop connections, so-called meta-paths. Our experiments show that GTNs learn new graph structures based on data and tasks without domain knowledge, and yield powerful node representations via convolution on the new graphs. Without domain-specific graph preprocessing, GTNs achieved the best performance on all three benchmark node classification tasks, compared with state-of-the-art methods that require pre-defined meta-paths from domain knowledge.
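The "soft selection of edge types" followed by composition into meta-paths can be sketched with plain adjacency matrices. Each hop takes a softmax-weighted sum of the per-edge-type adjacency matrices, and multiplying two such soft adjacencies yields a learned two-hop (meta-path) graph. This is a simplified illustration: the layer's normalization and multi-channel details are omitted, and the logits and toy graph are hypothetical.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    z = sum(exps)
    return [e / z for e in exps]


def soft_adjacency(adjs, logits):
    """Convex combination of per-edge-type adjacency matrices."""
    w = softmax(logits)
    n = len(adjs[0])
    return [[sum(w[k] * adjs[k][i][j] for k in range(len(adjs))) for j in range(n)]
            for i in range(n)]


def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


def meta_path_adjacency(adjs, logits_1, logits_2):
    """Two-hop graph: soft-selected first hop composed with soft-selected second hop."""
    return matmul(soft_adjacency(adjs, logits_1), soft_adjacency(adjs, logits_2))


# Hypothetical toy graph: edge type 0 links node 0 -> 1, edge type 1 links node 1 -> 2.
A0 = [[0, 1, 0], [0, 0, 0], [0, 0, 0]]
A1 = [[0, 0, 0], [0, 0, 1], [0, 0, 0]]
M = meta_path_adjacency([A0, A1], [10.0, -10.0], [-10.0, 10.0])
print(round(M[0][2], 6))  # 1.0: the learned meta-path "type 0 then type 1" connects 0 -> 2
```

Nodes 0 and 2 share no edge in the original graph. The composed matrix connects them, which is the sense in which GTNs generate new graph structures.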
Graph representation learning aims to learn universal node representations that preserve both node attributes and structural information. The derived node representations can serve various downstream tasks, such as node classification and node clustering. When a graph is heterogeneous, the problem becomes more challenging than node representation learning on a homogeneous graph. Inspired by emerging information-theoretic learning algorithms, in this paper we propose an unsupervised graph neural network, Heterogeneous Deep Graph Infomax (HDGI), for heterogeneous graph representation learning. We use the meta-path structure to analyze the semantic connections in heterogeneous graphs, and use a graph convolution module and a semantic-level attention mechanism to capture local representations. By maximizing local-global mutual information, HDGI effectively learns high-level node representations that can be used in downstream graph-related tasks. Experimental results show that HDGI remarkably outperforms state-of-the-art unsupervised graph representation learning methods on both classification and clustering tasks. By feeding the learned representations into a parametric model such as logistic regression, we even achieve node classification performance comparable to that of state-of-the-art supervised end-to-end GNN models.
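Local-global mutual information maximization in the Deep Graph Infomax family is usually trained as a binary discrimination task. A bilinear discriminator scores each node embedding against a graph summary vector, with real nodes as positives and nodes from a corrupted graph as negatives. The sketch below shows only that loss on given embeddings. The meta-path encoder and semantic attention are omitted, the identity discriminator weight is a hypothetical choice, and HDGI's exact readout and corruption may differ.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def summary(H):
    """Global summary: sigmoid of the mean node embedding."""
    d = len(H[0])
    return [sigmoid(sum(h[j] for h in H) / len(H)) for j in range(d)]


def bilinear(h, W, s):
    return sum(h[i] * W[i][j] * s[j] for i in range(len(h)) for j in range(len(s)))


def infomax_loss(H_pos, H_neg, W):
    """Binary cross-entropy: real nodes should match the summary, corrupted ones should not."""
    s = summary(H_pos)
    loss = 0.0
    for h in H_pos:
        loss -= math.log(sigmoid(bilinear(h, W, s)))
    for h in H_neg:
        loss -= math.log(1.0 - sigmoid(bilinear(h, W, s)))
    return loss / (len(H_pos) + len(H_neg))


W = [[1.0, 0.0], [0.0, 1.0]]                       # hypothetical discriminator weight
real = [[1.0, 1.0], [1.0, 1.0]]                    # embeddings of the real graph
easy = infomax_loss(real, [[-1.0, -1.0], [-1.0, -1.0]], W)
hard = infomax_loss(real, real, W)                 # corrupted nodes look like real ones
print(easy < hard)  # True
```

The loss is low only when embeddings carry information that distinguishes the real graph from its corruption, which is the mutual-information signal the encoder is trained on.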
Attention networks in multimodal learning provide an efficient way to utilize given visual information selectively. However, the computational cost of learning attention distributions for every pair of multimodal input channels is prohibitively expensive. To avoid this cost, co-attention builds a separate attention distribution for each modality, neglecting the interaction between multimodal inputs. In this paper, we propose bilinear attention networks (BAN), which find bilinear attention distributions to utilize given vision-language information seamlessly. BAN considers bilinear interactions between two groups of input channels, while low-rank bilinear pooling extracts the joint representation for each pair of channels. Furthermore, we propose a variant of multimodal residual networks to exploit the eight attention maps of BAN efficiently. We evaluate our model quantitatively and qualitatively on the visual question answering (VQA 2.0) and Flickr30k Entities datasets, showing that BAN significantly outperforms previous methods and achieves new state-of-the-art results on both datasets.
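The low-rank idea that makes attention over every channel pair affordable can be sketched as follows. Each visual and textual channel is projected into a shared rank-r space, and the logit for pair (i, j) is p · (U^T x_i ∘ V^T y_j). A single softmax is then taken over all pairs. This is a single attention map without the rest of the full model, and the projections, vector `p`, and inputs below are hypothetical.

```python
import math


def project(M, x):
    """Compute M^T x for M of shape (len(x), r)."""
    r = len(M[0])
    return [sum(M[i][k] * x[i] for i in range(len(x))) for k in range(r)]


def bilinear_attention(X, Y, U, V, p):
    """Softmax over all (i, j) pairs of low-rank bilinear logits p . (U^T x_i * V^T y_j)."""
    Xp = [project(U, x) for x in X]
    Yp = [project(V, y) for y in Y]
    logits = [[sum(p[k] * xi[k] * yj[k] for k in range(len(p))) for yj in Yp] for xi in Xp]
    m = max(max(row) for row in logits)
    exps = [[math.exp(v - m) for v in row] for row in logits]
    z = sum(sum(row) for row in exps)
    return [[v / z for v in row] for row in exps]


I2 = [[1.0, 0.0], [0.0, 1.0]]            # hypothetical identity projections (rank r = 2)
X = [[1.0, 0.0], [0.0, 1.0]]             # two visual channels (e.g., regions)
Y = [[1.0, 0.0]]                         # one textual channel (e.g., a word)
att = bilinear_attention(X, Y, I2, I2, [1.0, 1.0])
print(att)  # region 0 receives more attention than region 1
```

The cost per pair is O(r) after the two projections, instead of a full d_x × d_y bilinear form per pair, which is what keeps joint attention over both modalities tractable.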
This paper proposes a method to modify traditional convolutional neural networks (CNNs) into interpretable CNNs, in order to clarify knowledge representations in the high conv-layers of CNNs. In an interpretable CNN, each filter in a high conv-layer represents a certain object part. No annotations of object parts or textures are needed to supervise the learning process. Instead, the interpretable CNN automatically assigns each filter in a high conv-layer to an object part during learning. Our method can be applied to CNNs of different types and structures. The clear knowledge representation in an interpretable CNN can help people understand the logic inside a CNN, i.e., which patterns the CNN bases its decisions on. Experiments showed that filters in an interpretable CNN were more semantically meaningful than those in traditional CNNs.
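One way to encourage a filter to respond to a single part is to suppress activations far from the filter's strongest response. The sketch below is a hypothetical masking step of that kind, assuming an L1-distance template centered on the peak with scale `tau` and falloff `beta` (both hypothetical). It is an illustration of the idea rather than the paper's exact formulation or loss.

```python
def masked_feature_map(fmap, tau=1.0, beta=2.0):
    """Keep activations near the peak of an n x n feature map; zero out distant ones.

    Template value at (i, j): tau * max(1 - beta * L1_dist_to_peak / n, -1).
    Output: max(activation * template, 0).
    """
    n = len(fmap)
    peak = max(((i, j) for i in range(n) for j in range(n)),
               key=lambda ij: fmap[ij[0]][ij[1]])
    out = []
    for i in range(n):
        row = []
        for j in range(n):
            dist = abs(i - peak[0]) + abs(j - peak[1])
            t = tau * max(1.0 - beta * dist / n, -1.0)
            row.append(max(fmap[i][j] * t, 0.0))
        out.append(row)
    return out


fmap = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 5.0, 2.0, 0.0],   # peak at (1, 1) with a neighbour at (1, 2)
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 4.0],   # a strong but distant activation at (3, 3)
]
for row in masked_feature_map(fmap):
    print(row)
```

The distant activation at (3, 3) is removed even though it is strong. Only the peak and its attenuated neighbour survive, so the filter's response stays localized on one region.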