We present a homotopic approach to solving challenging, optimization-based motion planning problems. The approach uses Homotopy Optimization, which, unlike standard continuation methods for solving homotopy problems, solves a sequence of constrained optimization problems rather than a sequence of nonlinear systems of equations. The insight behind our proposed algorithm is formulating the discovery of this sequence of optimization problems as a search problem in a multidimensional homotopy parameter space. Our proposed algorithm, the Probabilistic Homotopy Optimization algorithm, switches between solve and sample phases, using solutions to easy problems as initial guesses to more challenging problems. We analyze how our algorithm performs in the presence of common challenges to homotopy methods, such as bifurcation, folding, and disconnectedness of the homotopy solution manifold. Finally, we demonstrate its utility via a case study on two dynamic motion planning problems: the cart-pole and the MIT Humanoid.
We investigate the Byzantine attack problem within the context of model training in distributed learning systems. While ensuring the convergence of current model training processes, common solvers (e.g. SGD, Adam, RMSProp, etc.) can be easily compromised by malicious nodes in these systems. Consequently, the training process may either converge slowly or even diverge. To develop effective secure distributed learning solvers, it is crucial to first examine attack methods to assess the robustness of these solvers. In this work, we contribute to the design of attack strategies by initially highlighting the limitations of finite-norm attacks. We then introduce the seesaw attack, which has been demonstrated to be more effective than the finite-norm attack. Through numerical experiments, we evaluate the efficacy of the seesaw attack across various gradient aggregation rules.
Question decomposition has emerged as an effective strategy for prompting Large Language Models (LLMs) to answer complex questions. However, while existing methods primarily focus on unimodal language models, the question decomposition capability of Multimodal Large Language Models (MLLMs) has yet to be explored. To this end, this paper explores visual question decomposition on MLLMs. Specifically, we introduce a systematic evaluation framework including a dataset and several evaluation criteria to assess the quality of the decomposed sub-questions, revealing that existing MLLMs struggle to produce high-quality sub-questions. To address this limitation, we propose a specific finetuning dataset, DecoVQA+, for enhancing the model's question decomposition capability. Aiming at enabling models to perform appropriate selective decomposition, we propose an efficient finetuning pipeline. The finetuning pipeline consists of our proposed dataset and a training objective for selective decomposition. Finetuned MLLMs demonstrate significant improvements in the quality of sub-questions and the policy of selective question decomposition. Additionally, the models also achieve higher accuracy with selective decomposition on VQA benchmark datasets.
We present a new power method to obtain solutions of eigenvalue problems. The method can determine not only the dominant or lowest eigenvalues but also all eigenvalues without the need for a deflation procedure. The method uses a functional of an operator (or a matrix) to select or filter an eigenvalue. The method can freely select a solution by varying a parameter associated to an estimate of the eigenvalue. The convergence of the method is highly dependent on how closely the parameter to the eigenvalues. In this paper, numerical results of the method are shown to be in excellent agreement with the analytical ones.
Randomized sampling based algorithms are widely used in robot motion planning due to the problem's intractability, and are experimentally effective on a wide range of problem instances. Most variants bias their sampling using various heuristics related to the known underlying structure of the search space. In this work, we formalize the intuitive notion of guided search by defining the concept of a guiding space. This new language encapsulates many seemingly distinct prior methods under the same framework, and allows us to reason about guidance, a previously obscured core contribution of different algorithms. We suggest an information theoretic method to evaluate guidance, which experimentally matches intuition when tested on known algorithms in a variety of environments. The language and evaluation of guidance suggests improvements to existing methods, and allows for simple hybrid algorithms that combine guidance from multiple sources.
Recently, there has been a growing interest in knowledge editing for Large Language Models (LLMs). Current approaches and evaluations merely explore the instance-level editing, while whether LLMs possess the capability to modify concepts remains unclear. This paper pioneers the investigation of editing conceptual knowledge for LLMs, by constructing a novel benchmark dataset ConceptEdit and establishing a suite of new metrics for evaluation. The experimental results reveal that, although existing editing methods can efficiently modify concept-level definition to some extent, they also have the potential to distort the related instantial knowledge in LLMs, leading to poor performance. We anticipate this can inspire further progress in better understanding LLMs. Our project homepage is available at //zjunlp.github.io/project/ConceptEdit.
We provide a rigorous convergence proof demonstrating that the well-known semi-analytical Fourier cosine (COS) formula for the inverse Fourier transform of continuous probability distributions can be extended to discrete probability distributions, with the help of spectral filters. We establish general convergence rates for these filters and further show that several classical spectral filters achieve convergence rates one order faster than previously recognized in the literature on the Gibbs phenomenon. Our numerical experiments corroborate the theoretical convergence results. Additionally, we illustrate the computational speed and accuracy of the discrete COS method with applications in computational statistics and quantitative finance. The theoretical and numerical results highlight the method's potential for solving problems involving discrete distributions, particularly when the characteristic function is known, allowing the discrete Fourier transform (DFT) to be bypassed.
We investigate the prominent class of fair representation learning methods for bias mitigation. Using causal reasoning to define and formalise different sources of dataset bias, we reveal important implicit assumptions inherent to these methods. We prove fundamental limitations on fair representation learning when evaluation data is drawn from the same distribution as training data and run experiments across a range of medical modalities to examine the performance of fair representation learning under distribution shifts. Our results explain apparent contradictions in the existing literature and reveal how rarely considered causal and statistical aspects of the underlying data affect the validity of fair representation learning. We raise doubts about current evaluation practices and the applicability of fair representation learning methods in performance-sensitive settings. We argue that fine-grained analysis of dataset biases should play a key role in the field moving forward.
While there is an immense literature on Bayesian methods for clustering, the multiview case has received little attention. This problem focuses on obtaining distinct but statistically dependent clusterings in a common set of entities for different data types. For example, clustering patients into subgroups with subgroup membership varying according to the domain of the patient variables. A challenge is how to model the across-view dependence between the partitions of patients into subgroups. The complexities of the partition space make standard methods to model dependence, such as correlation, infeasible. In this article, we propose CLustering with Independence Centering (CLIC), a clustering prior that uses a single parameter to explicitly model dependence between clusterings across views. CLIC is induced by the product centered Dirichlet process (PCDP), a novel hierarchical prior that bridges between independent and equivalent partitions. We show appealing theoretic properties, provide a finite approximation and prove its accuracy, present a marginal Gibbs sampler for posterior computation, and derive closed form expressions for the marginal and joint partition distributions for the CLIC model. On synthetic data and in an application to epidemiology, CLIC accurately characterizes view-specific partitions while providing inference on the dependence level.
Data augmentation has been widely used to improve generalizability of machine learning models. However, comparatively little work studies data augmentation for graphs. This is largely due to the complex, non-Euclidean structure of graphs, which limits possible manipulation operations. Augmentation operations commonly used in vision and language have no analogs for graphs. Our work studies graph data augmentation for graph neural networks (GNNs) in the context of improving semi-supervised node-classification. We discuss practical and theoretical motivations, considerations and strategies for graph data augmentation. Our work shows that neural edge predictors can effectively encode class-homophilic structure to promote intra-class edges and demote inter-class edges in given graph structure, and our main contribution introduces the GAug graph data augmentation framework, which leverages these insights to improve performance in GNN-based node classification via edge prediction. Extensive experiments on multiple benchmarks show that augmentation via GAug improves performance across GNN architectures and datasets.
We advocate the use of implicit fields for learning generative models of shapes and introduce an implicit field decoder for shape generation, aimed at improving the visual quality of the generated shapes. An implicit field assigns a value to each point in 3D space, so that a shape can be extracted as an iso-surface. Our implicit field decoder is trained to perform this assignment by means of a binary classifier. Specifically, it takes a point coordinate, along with a feature vector encoding a shape, and outputs a value which indicates whether the point is outside the shape or not. By replacing conventional decoders by our decoder for representation learning and generative modeling of shapes, we demonstrate superior results for tasks such as shape autoencoding, generation, interpolation, and single-view 3D reconstruction, particularly in terms of visual quality.