We consider a distributed setup for reinforcement learning, where each agent has a copy of the same Markov Decision Process but transitions are sampled from the corresponding Markov chain independently by each agent. We show that in this setting, we can achieve a linear speedup for TD($\lambda$), a family of popular methods for policy evaluation, in the sense that $N$ agents can evaluate a policy $N$ times faster provided the target accuracy is small enough. Notably, this speedup is achieved by ``one-shot averaging,'' a procedure where the agents run TD($\lambda$) with Markov sampling independently and only average their results after the final step. This significantly reduces the amount of communication required to achieve a linear speedup relative to previous work.
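To make the protocol concrete, the following is a minimal sketch of one-shot averaging for TD($\lambda$), assuming a toy three-state Markov reward process with tabular features; the transition matrix, rewards, step size, and trace parameter are illustrative stand-ins, not taken from the paper.

```python
import numpy as np

# Hypothetical 3-state Markov reward process with tabular features;
# P, r, and all hyperparameters are illustrative, not from the paper.
P = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1],
              [0.2, 0.3, 0.5]])
r = np.array([1.0, 0.0, -1.0])
phi = np.eye(3)                       # tabular feature map
gamma, lam, alpha = 0.9, 0.8, 0.05    # discount, trace decay, step size

def td_lambda(steps, rng):
    """One agent's independent TD(lambda) run along its own trajectory."""
    w = np.zeros(3)                   # value-function weights
    z = np.zeros(3)                   # eligibility trace
    s = rng.integers(3)
    for _ in range(steps):
        s_next = rng.choice(3, p=P[s])
        delta = r[s] + gamma * phi[s_next] @ w - phi[s] @ w
        z = gamma * lam * z + phi[s]
        w = w + alpha * delta * z
        s = s_next
    return w

rng = np.random.default_rng(0)
N = 8                                 # number of agents
# One-shot averaging: no communication during training; the agents'
# parameter vectors are averaged exactly once, after the final step.
w_avg = np.mean([td_lambda(20_000, rng) for _ in range(N)], axis=0)
print("averaged value estimates:", w_avg)
```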
Recent advances in deep learning algorithms have shown impressive progress in image copy-move forgery detection (CMFD). However, these algorithms lack generalizability in practical scenarios where the copied regions are not present in the training images, or the cloned regions are part of the background. Additionally, these algorithms utilize convolution operations to distinguish source and target regions, leading to unsatisfactory results when the target regions blend well with the background. To address these limitations, this study proposes a novel end-to-end CMFD framework that integrates the strengths of conventional and deep learning methods. Specifically, the study develops a deep cross-scale PatchMatch (PM) method that is customized for CMFD to locate copy-move regions. Unlike existing deep models, our approach utilizes features extracted from high-resolution scales to seek explicit and reliable point-to-point matching between source and target regions. Furthermore, we propose a novel pairwise rank learning framework to separate source and target regions. By leveraging the strong prior of point-to-point matches, the framework can identify subtle differences and effectively discriminate between source and target regions, even when the target regions blend well with the background. Our framework is fully differentiable and can be trained end-to-end. Comprehensive experimental results highlight the remarkable generalizability of our scheme across various copy-move scenarios, significantly outperforming existing methods.
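To convey what explicit point-to-point matching means here, the following is a toy brute-force version on a dense feature map: mutual nearest neighbors under cosine similarity, with near-self matches suppressed. The actual method uses an efficient cross-scale PatchMatch rather than this quadratic search, and the thresholds are illustrative.

```python
import numpy as np

def point_matches(feat, sim_thresh=0.9, min_offset=4):
    """Toy copy-move matching: mutual nearest neighbors on a dense
    (H, W, C) feature map under cosine similarity. Brute force,
    O((H*W)^2); the paper's cross-scale PatchMatch replaces this
    exhaustive search with an efficient randomized one."""
    H, W, C = feat.shape
    f = feat.reshape(-1, C).astype(np.float64)
    f /= np.linalg.norm(f, axis=1, keepdims=True) + 1e-8
    sim = f @ f.T
    # Forbid matches within a small spatial window so a pixel cannot
    # "match" its own neighborhood.
    ys, xs = np.divmod(np.arange(H * W), W)
    close = (np.abs(ys[:, None] - ys[None, :]) < min_offset) \
          & (np.abs(xs[:, None] - xs[None, :]) < min_offset)
    sim[close] = -1.0
    nn = sim.argmax(axis=1)
    mutual = nn[nn] == np.arange(H * W)      # mutual nearest neighbors
    keep = mutual & (sim[np.arange(H * W), nn] > sim_thresh)
    return [((ys[i], xs[i]), (ys[nn[i]], xs[nn[i]]))
            for i in np.flatnonzero(keep)]
```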
The practical Domain Adaptation (DA) tasks, e.g., Partial DA (PDA), open-set DA, universal DA, and test-time adaptation, have gained increasing attention in the machine learning community. In this paper, we propose a novel approach, dubbed Adversarial Reweighting with $\alpha$-Power Maximization (ARPM), for PDA, where the source domain contains private classes absent in the target domain. In ARPM, we propose a novel adversarial reweighting model that adversarially learns to reweight source domain data, identifying source-private class samples by assigning them smaller weights and thereby mitigating potential negative transfer. Based on the adversarial reweighting, we train the transferable recognition model on the reweighted source distribution so that it can classify common-class data. To reduce the prediction uncertainty of the recognition model on the target domain, we present an $\alpha$-power maximization mechanism in ARPM, which enriches the family of uncertainty-reducing losses for PDA. Extensive experimental results on five PDA benchmarks, i.e., Office-31, Office-Home, VisDA-2017, ImageNet-Caltech, and DomainNet, show that our method is superior to recent PDA methods. Ablation studies also confirm the effectiveness of the components of our approach. To theoretically analyze our method, we deduce an upper bound on the target domain expected error for PDA, which is approximately minimized by our approach. We further extend ARPM to open-set DA, universal DA, and test-time adaptation, and verify its usefulness through experiments.
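As a rough sketch of the uncertainty-reduction idea (the paper's exact formulation and weighting may differ), minimizing the negative $\alpha$-power of the predicted class probabilities pushes target predictions toward one-hot vectors, since $\sum_c p_c^\alpha$ with $\alpha > 1$ is maximized at the vertices of the probability simplex:

```python
import torch

def alpha_power_loss(logits, alpha=2.0):
    """Negative alpha-power of the softmax probabilities, averaged over
    the batch. Minimizing it maximizes sum_c p_c^alpha, which for
    alpha > 1 is largest at one-hot predictions, so target-domain
    prediction uncertainty shrinks. A sketch; the paper's exact form
    and alpha value may differ."""
    p = torch.softmax(logits, dim=1)
    return -(p ** alpha).sum(dim=1).mean()

# Hypothetical usage alongside the classification loss on the
# adversarially reweighted source data:
# loss = cls_loss_reweighted_source + lam * alpha_power_loss(target_logits)
```

For $\alpha$ slightly above $1$, maximizing $\sum_c p_c^\alpha$ behaves like entropy minimization, since $\frac{d}{d\alpha}\sum_c p_c^\alpha \big|_{\alpha=1} = \sum_c p_c \log p_c$; larger $\alpha$ gives one member of the enriched loss family mentioned above.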
Humans learn social skills through both imitation and social interaction. This social learning process is largely understudied by existing research on building language agents. Motivated by this gap, we propose SOTOPIA-$\pi$, an interactive learning method that improves the social intelligence of language agents. This method leverages behavior cloning and self-reinforcement training on social interaction data filtered according to large language model (LLM) ratings. We show that our training method allows a 7B LLM to reach the social goal completion ability of an expert model (a GPT-4-based agent), while improving the safety of language agents and maintaining general QA ability on the MMLU benchmark. We also find that this training paradigm uncovers some difficulties in LLM-based evaluation of social intelligence: LLM-based evaluators overestimate the abilities of language agents trained specifically for social interaction.
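A minimal sketch of the filter-then-train data pipeline described above; `rate_episode`, the 0-10 rating scale, and the episode schema are hypothetical stand-ins for the paper's GPT-4-based rater and data format.

```python
def build_training_set(episodes, rate_episode, threshold=8.0):
    """Keep only interaction episodes whose LLM-assigned rating clears
    `threshold`, then turn each agent utterance into one supervised
    fine-tuning pair for behavior cloning / self-reinforcement.
    `rate_episode` and the schema below are hypothetical."""
    pairs = []
    for ep in episodes:
        if rate_episode(ep) < threshold:      # filter on social-goal rating
            continue
        for i, turn in enumerate(ep["turns"]):
            if turn["speaker"] == "agent":    # clone only the agent's turns
                pairs.append({"context": ep["turns"][:i],
                              "completion": turn["text"]})
    return pairs
```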
Third-party software components like Log4J accelerate software application development but introduce substantial risk. These components have led to many software supply chain attacks. These attacks succeed because third-party software components are implicitly trusted in an application. Although several security defenses exist to reduce the risks from third-party software components, none fulfills the full set of requirements needed to defend against common attacks: no individual solution prevents malicious access to operating system resources, is dependency-aware, and enables the discovery of least privileges, all with low runtime costs. Consequently, none of them can prevent software supply chain attacks. This paper proposes applying the NIST Zero Trust Architecture (ZTA) to software applications. Our Zero Trust Dependencies concept applies the NIST ZTA principles to an application's dependencies. First, we assess the expected effectiveness and feasibility of Zero Trust Dependencies through a study of third-party software components and their vulnerabilities. Then, we present a system design, ZTDSYS, that enables the application of Zero Trust Dependencies to software applications, and a prototype, ZTDJAVA, for Java applications. Finally, with evaluations on recreated vulnerabilities and realistic applications, we show that ZTDJAVA can defend against prevalent vulnerability classes, introduces negligible cost, and is easy to configure and use.
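To illustrate the deny-by-default, per-dependency flavor of Zero Trust Dependencies, here is a hypothetical permission manifest and check; the schema, permission categories, and lookup logic are illustrative sketches, not ZTDSYS's or ZTDJAVA's actual configuration format.

```python
# Hypothetical per-dependency permission manifest: each third-party
# component is granted only the OS resources it demonstrably needs
# (least privilege); anything not listed is denied by default.
POLICY = {
    "org.apache.logging.log4j:log4j-core": {
        "filesystem": {"read": ["logs/"], "write": ["logs/"]},
        "network":    {"connect": []},   # no outbound connections allowed
        "process":    {"exec": False},   # may not spawn processes
    },
    "com.fasterxml.jackson.core:jackson-databind": {
        "filesystem": {"read": [], "write": []},
        "network":    {"connect": []},
        "process":    {"exec": False},
    },
}

def is_allowed(dependency, resource, action, target):
    """Deny-by-default check: unknown dependencies and unlisted
    resources are rejected; path-like grants are prefix-matched."""
    perms = POLICY.get(dependency)
    if perms is None:
        return False
    allowed = perms.get(resource, {}).get(action, [])
    if isinstance(allowed, bool):
        return allowed
    return any(target.startswith(prefix) for prefix in allowed)
```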
We investigate the statistical behavior of gradient descent iterates with dropout in the linear regression model. In particular, non-asymptotic bounds for the convergence of expectations and covariance matrices of the iterates are derived. The results shed more light on the widely cited connection between dropout and $\ell_2$-regularization in the linear model. We indicate a more subtle relationship, owing to interactions between the gradient descent dynamics and the additional randomness induced by dropout. Further, we study a simplified variant of dropout which does not have a regularizing effect and converges to the least squares estimator.
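The interaction can be seen in a small simulation (a toy replication of the setting, with arbitrary step size and retain probability): individual dropout iterates keep fluctuating, while their running average approaches the ridge-type estimator obtained by marginalizing the dropout noise.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, p, lr, T = 200, 5, 0.8, 0.01, 50_000   # p = retain probability

X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

# Gradient descent on the squared loss with a fresh dropout mask on
# the features at every iteration.
w, w_bar = np.zeros(d), np.zeros(d)
for _ in range(T):
    D = rng.binomial(1, p, size=d) / p        # rescaled dropout mask
    grad = -(X * D).T @ (y - (X * D) @ w) / n
    w -= lr * grad
    w_bar += w / T                            # running average of iterates

# Marginalizing the dropout noise yields a ridge-type normal equation
# with a data-dependent diagonal penalty:
G = X.T @ X / n
ridge = np.linalg.solve(G + (1 - p) / p * np.diag(np.diag(G)), X.T @ y / n)
print("last iterate    :", np.round(w, 3))      # still fluctuating
print("averaged iterate:", np.round(w_bar, 3))  # close to the ridge form
print("ridge solution  :", np.round(ridge, 3))
```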
The aim in packing problems is to decide if a given set of pieces can be placed inside a given container. A packing problem is defined by the types of pieces and containers to be handled, and the motions that are allowed to move the pieces. The pieces must be placed so that in the resulting placement, they are pairwise interior-disjoint. We establish a framework that enables us to show that for many combinations of allowed pieces, containers, and motions, the resulting problem is $\exists \mathbb{R}$-complete. This means that the problem is equivalent (under polynomial-time reductions) to deciding whether a given system of polynomial equations and inequalities with integer coefficients has a real solution. We consider packing problems where only translations are allowed as the motions, and problems where arbitrary rigid motions are allowed, i.e., both translations and rotations. When rotations are allowed, we show that it is an $\exists \mathbb{R}$-complete problem to decide if a set of convex polygons, each of which has at most $7$ corners, can be packed into a square. Restricted to translations, we show that the following problems are $\exists \mathbb{R}$-complete: (i) pieces bounded by segments and hyperbolic curves to be packed in a square, and (ii) convex polygons to be packed in a container bounded by segments and hyperbolic curves.
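For readers unfamiliar with the class: an instance of the existential theory of the reals is a sentence of the standard form (this is the textbook definition, not specific to this paper)
$$\exists x_1 \cdots \exists x_n \; \Phi(x_1, \dots, x_n),$$
where $\Phi$ is a quantifier-free Boolean formula over polynomial equations and inequalities with integer coefficients, and the question is whether the sentence is true over the reals; $\exists \mathbb{R}$ consists of all problems polynomial-time reducible to this decision problem. For example, $\exists x \, (x^2 = 2 \wedge x > 0)$ is a true sentence, witnessed by $x = \sqrt{2}$, even though no rational witness exists.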
MAG$\pi$ is a Multiparty, Asynchronous and Generalised $\pi$-calculus that introduces timeouts into session types as a means of reasoning about failure-prone communication. Its type system guarantees that all possible message loss is handled by timeout branches. In this work, we argue that this guarantee is unnecessarily strict. We present MAG$\pi$!, an extension serving as the first introduction of replication into Multiparty Session Types (MPST). Replication is a standard $\pi$-calculus construct used to model infinitely available servers. We lift this construct to the type level and show that it simplifies the specification of distributed client-server interactions. We prove the properties relevant to generalised MPST: subject reduction, session fidelity, and process property verification.
We give a poly-time algorithm for the $k$-edge-connected spanning subgraph ($k$-ECSS) problem that returns a solution of cost no greater than the cheapest $(k+10)$-ECSS on the same graph. Our approach enhances the iterative relaxation framework with a new ingredient, which we call ghost values, that allows for high sparsity in intermediate problems. Our guarantees improve upon the best-known approximation factor of $2$ for $k$-ECSS whenever the optimal value of $(k+10)$-ECSS is close to that of $k$-ECSS. This is a property that holds for the closely related problem $k$-edge-connected spanning multi-subgraph ($k$-ECSM), which is identical to $k$-ECSS except edges can be selected multiple times at the same cost. As a consequence, we obtain a $\left(1+O\left(\frac{1}{k}\right)\right)$-approximation algorithm for $k$-ECSM, which resolves a conjecture of Pritchard and improves upon a recent $\left(1+O\left(\frac{1}{\sqrt{k}}\right)\right)$-approximation algorithm of Karlin, Klein, Oveis Gharan, and Zhang. Moreover, we present a matching lower bound for $k$-ECSM, showing that our approximation ratio is tight up to the constant factor in $O\left(\frac{1}{k}\right)$, unless $P=NP$.
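As background for the iterative relaxation approach, the standard cut-based LP relaxation of $k$-ECSS is the following (the generic formulation; the paper's ghost-value modification of intermediate LPs is not shown):
$$\min \sum_{e \in E} c_e x_e \quad \text{s.t.} \quad \sum_{e \in \delta(S)} x_e \ge k \;\; \text{for all } \emptyset \neq S \subsetneq V, \qquad 0 \le x_e \le 1 \;\; \text{for all } e \in E,$$
where $\delta(S)$ denotes the set of edges with exactly one endpoint in $S$. For $k$-ECSM, the upper bounds $x_e \le 1$ are dropped, since an edge may be purchased multiple times at the same cost.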
Self-supervised learning (SSL) has recently achieved impressive performance on various time series tasks. The most prominent advantage of SSL is that it reduces the dependence on labeled data. With a pre-training and fine-tuning strategy, even a small amount of labeled data suffices to achieve high performance. While many self-supervised surveys have been published for computer vision and natural language processing, a comprehensive survey of time series SSL is still missing. To fill this gap, we review current state-of-the-art SSL methods for time series data in this article. To this end, we first comprehensively review existing surveys related to SSL and time series, and then provide a new taxonomy of existing time series SSL methods. We summarize these methods into three categories: generative-based, contrastive-based, and adversarial-based; together, they are further divided into ten subcategories. To facilitate the experiments and validation of time series SSL methods, we also summarize datasets commonly used in time series forecasting, classification, anomaly detection, and clustering tasks. Finally, we present future directions of SSL for time series analysis.
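As a concrete instance of the contrastive-based category, the following is a minimal InfoNCE-style loss for two augmented views of a batch of time series; it illustrates the category generically rather than reproducing any single surveyed method.

```python
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.1):
    """A minimal InfoNCE loss, the workhorse of contrastive-based SSL.

    z1, z2: (B, D) embeddings of two augmented views (e.g. jittered or
    cropped versions) of the same batch of time series. Matching rows
    are positives; all other rows in the batch serve as negatives."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.T / temperature                   # (B, B) similarities
    labels = torch.arange(z1.size(0), device=z1.device)  # diagonal positives
    return F.cross_entropy(logits, labels)
```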
The Transformer is a promising neural network learner and has achieved great success in various machine learning tasks. Thanks to the recent prevalence of multimodal applications and big data, Transformer-based multimodal learning has become a hot topic in AI research. This paper presents a comprehensive survey of Transformer techniques oriented toward multimodal data. The main contents of this survey include: (1) a background of multimodal learning, the Transformer ecosystem, and the multimodal big data era; (2) a theoretical review of the Vanilla Transformer, the Vision Transformer, and multimodal Transformers, from a geometrically topological perspective; (3) a review of multimodal Transformer applications, via two important paradigms, i.e., for multimodal pretraining and for specific multimodal tasks; (4) a summary of the common challenges and designs shared by multimodal Transformer models and applications; and (5) a discussion of open problems and potential research directions for the community.
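One recurring design among the surveyed multimodal Transformers is cross-attention fusion, sketched below in generic form; the dimensions, single-layer structure, and text-queries-image choice are illustrative, not a specific surveyed model.

```python
import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    """Generic cross-attention fusion: tokens of one modality attend to
    tokens of another. A sketch of a common multimodal-Transformer
    building block, not any particular surveyed architecture."""

    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, text_tokens, image_tokens):
        # Queries come from text; keys/values come from image patches.
        fused, _ = self.attn(text_tokens, image_tokens, image_tokens)
        return self.norm(text_tokens + fused)   # residual + normalization

# Hypothetical usage: txt is (B, Lt, 256), img is (B, Li, 256)
# fused = CrossModalAttention()(txt, img)
```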