Generating on-purpose impacts with rigid robots is challenging as they may lead to severe hardware failures due to abrupt changes in the velocities and torques. Without dedicated hardware and controllers, robots typically operate at a near-zero velocity in the vicinity of contacts. We assume knowing how much of impact the hardware can absorb and focus solely on the controller aspects. The novelty of our approach is twofold: (i) it uses the task-space inverse dynamics formalism that we extend by seamlessly integrating impact tasks; (ii) it does not require separate models with switches or a reset map to operate the robot undergoing impact tasks. Our main idea lies in integrating post-impact states prediction and impact-aware inequality constraints as part of our existing general-purpose whole-body controller. To achieve such prediction, we formulate task-space impacts and its spreading along the kinematic tree of a floating-base robot with subsequent joint velocity and torque jumps. As a result, the feasible solution set accounts for various constraints due to expected impacts. In a multi-contact situation of under-actuated legged robots subject to multiple impacts, we also enforce standing stability margins. By design, our controller does not require precise knowledge of impact location and timing. We assessed our formalism with the humanoid robot HRP-4, generating maximum contact velocities, neither breaking established contacts nor damaging the hardware.
Location-awareness is essential in various wireless applications. The capability of performing precise ranging is substantial in achieving high-accuracy localization. Due to the notorious ambiguity phenomenon, optimal ranging waveforms should be adaptive to the signal-to-noise ratio (SNR). In this letter, we propose to use the Ziv-Zakai bound (ZZB) as the ranging performance metric, as well as an associated waveform design algorithm having theoretical guarantee of achieving the optimal ZZB at a given SNR. Numerical results suggest that, in stark contrast to the well-known high-SNR design philosophy, the detection probability of the ranging signal becomes more important than the resolution in the low-SNR regime.
Causal representation learning has emerged as the center of action in causal machine learning research. In particular, multi-domain datasets present a natural opportunity for showcasing the advantages of causal representation learning over standard unsupervised representation learning. While recent works have taken crucial steps towards learning causal representations, they often lack applicability to multi-domain datasets due to over-simplifying assumptions about the data; e.g. each domain comes from a different single-node perfect intervention. In this work, we relax these assumptions and capitalize on the following observation: there often exists a subset of latents whose certain distributional properties (e.g., support, variance) remain stable across domains; this property holds when, for example, each domain comes from a multi-node imperfect intervention. Leveraging this observation, we show that autoencoders that incorporate such invariances can provably identify the stable set of latents from the rest across different settings.
In many sensor network applications, a fusion center often has additional valuable information, such as context data, which cannot be obtained directly from the sensors. Motivated by this, we study a generalized CEO problem where a CEO has access to context information. The main contribution of this work is twofold. Firstly, we characterize the asymptotically optimal error exponent per rate as the number of sensors and sum rate grow without bound. The proof extends the Berger-Tung coding scheme and the converse argument by Berger et al. (1996) taking into account context information. The resulting expression includes the minimum Chernoff divergence over context information. Secondly, assuming that the sizes of the source and context alphabets are respectively $|\mathcal{X}|$ and $|\mathcal{S}|$, we prove that it is asymptotically optimal to partition all sensors into at most $\binom{|\mathcal{X}|}{2} |\mathcal{S}|$ groups and have the sensors in each group adopt the same encoding scheme. Our problem subsumes the original CEO problem by Berger et al. (1996) as a special case if there is only one letter for context information; in this case, our result tightens its required number of groups from $\binom{|\mathcal{X}|}{2}+2$ to $\binom{|\mathcal{X}|}{2}$. We also numerically demonstrate the effect of context information for a simple Gaussian scenario.
Statistical divergences quantify the difference between probability distributions, thereby allowing for multiple uses in machine-learning. However, a fundamental challenge of these quantities is their estimation from empirical samples since the underlying distributions of the data are usually unknown. In this work, we propose a divergence inspired by the Jensen-Shannon divergence which avoids the estimation of the probability density functions. Our approach embeds the data in an reproducing kernel Hilbert space (RKHS) where we associate data distributions with uncentered covariance operators in this representation space. Therefore, we name this measure the representation Jensen-Shannon divergence (RJSD). We provide an estimator from empirical covariance matrices by explicitly mapping the data to an RKHS using Fourier features. This estimator is flexible, scalable, differentiable, and suitable for minibatch-based optimization problems. Additionally, we provide an estimator based on kernel matrices without an explicit mapping to the RKHS. We provide consistency convergence results for the proposed estimator. Moreover, we demonstrate that this quantity is a lower bound on the Jensen-Shannon divergence, leading to a variational approach to estimate it with theoretical guarantees. We leverage the proposed divergence to train generative networks, where our method mitigates mode collapse and encourages samples diversity. Additionally, RJSD surpasses other state-of-the-art techniques in multiple two-sample testing problems, demonstrating superior performance and reliability in discriminating between distributions.
Split conformal prediction has recently sparked great interest due to its ability to provide formally guaranteed uncertainty sets or intervals for predictions made by black-box neural models, ensuring a predefined probability of containing the actual ground truth. While the original formulation assumes data exchangeability, some extensions handle non-exchangeable data, which is often the case in many real-world scenarios. In parallel, some progress has been made in conformal methods that provide statistical guarantees for a broader range of objectives, such as bounding the best F1-score or minimizing the false negative rate in expectation. In this paper, we leverage and extend these two lines of work by proposing non-exchangeable conformal risk control, which allows controlling the expected value of any monotone loss function when the data is not exchangeable. Our framework is flexible, makes very few assumptions, and allows weighting the data based on its statistical similarity with the test examples; a careful choice of weights may result on tighter bounds, making our framework useful in the presence of change points, time series, or other forms of distribution drift. Experiments with both synthetic and real world data show the usefulness of our method.
Generative Adversarial Networks (GANs) can produce high-quality samples, but do not provide an estimate of the probability density around the samples. However, it has been noted that maximizing the log-likelihood within an energy-based setting can lead to an adversarial framework where the discriminator provides unnormalized density (often called energy). We further develop this perspective, incorporate importance sampling, and show that 1) Wasserstein GAN performs a biased estimate of the partition function, and we propose instead to use an unbiased estimator; and 2) when optimizing for likelihood, one must maximize generator entropy. This is hypothesized to provide a better mode coverage. Different from previous works, we explicitly compute the density of the generated samples. This is the key enabler to designing an unbiased estimator of the partition function and computation of the generator entropy term. The generator density is obtained via a new type of flow network, called one-way flow network, that is less constrained in terms of architecture, as it does not require a tractable inverse function. Our experimental results show that our method converges faster, produces comparable sample quality to GANs with similar architecture, successfully avoids over-fitting to commonly used datasets and produces smooth low-dimensional latent representations of the training data.
Preference-based Reinforcement Learning (PbRL) is a paradigm in which an RL agent learns to optimize a task using pair-wise preference-based feedback over trajectories, rather than explicit reward signals. While PbRL has demonstrated practical success in fine-tuning language models, existing theoretical work focuses on regret minimization and fails to capture most of the practical frameworks. In this study, we fill in such a gap between theoretical PbRL and practical algorithms by proposing a theoretical reward-agnostic PbRL framework where exploratory trajectories that enable accurate learning of hidden reward functions are acquired before collecting any human feedback. Theoretical analysis demonstrates that our algorithm requires less human feedback for learning the optimal policy under preference-based models with linear parameterization and unknown transitions, compared to the existing theoretical literature. Specifically, our framework can incorporate linear and low-rank MDPs with efficient sample complexity. Additionally, we investigate reward-agnostic RL with action-based comparison feedback and introduce an efficient querying algorithm tailored to this scenario.
We study the problem of few-shot graph classification across domains with nonequivalent feature spaces by introducing three new cross-domain benchmarks constructed from publicly available datasets. We also propose an attention-based graph encoder that uses three congruent views of graphs, one contextual and two topological views, to learn representations of task-specific information for fast adaptation, and task-agnostic information for knowledge transfer. We run exhaustive experiments to evaluate the performance of contrastive and meta-learning strategies. We show that when coupled with metric-based meta-learning frameworks, the proposed encoder achieves the best average meta-test classification accuracy across all benchmarks. The source code and data will be released here: //github.com/kavehhassani/metagrl
Model-agnostic meta-learners aim to acquire meta-learned parameters from similar tasks to adapt to novel tasks from the same distribution with few gradient updates. With the flexibility in the choice of models, those frameworks demonstrate appealing performance on a variety of domains such as few-shot image classification and reinforcement learning. However, one important limitation of such frameworks is that they seek a common initialization shared across the entire task distribution, substantially limiting the diversity of the task distributions that they are able to learn from. In this paper, we augment MAML with the capability to identify the mode of tasks sampled from a multimodal task distribution and adapt quickly through gradient updates. Specifically, we propose a multimodal MAML (MMAML) framework, which is able to modulate its meta-learned prior parameters according to the identified mode, allowing more efficient fast adaptation. We evaluate the proposed model on a diverse set of few-shot learning tasks, including regression, image classification, and reinforcement learning. The results not only demonstrate the effectiveness of our model in modulating the meta-learned prior in response to the characteristics of tasks but also show that training on a multimodal distribution can produce an improvement over unimodal training.
Policy gradient methods are often applied to reinforcement learning in continuous multiagent games. These methods perform local search in the joint-action space, and as we show, they are susceptable to a game-theoretic pathology known as relative overgeneralization. To resolve this issue, we propose Multiagent Soft Q-learning, which can be seen as the analogue of applying Q-learning to continuous controls. We compare our method to MADDPG, a state-of-the-art approach, and show that our method achieves better coordination in multiagent cooperative tasks, converging to better local optima in the joint action space.