We consider a three-block alternating direction method of multipliers (ADMM) for solving nonconvex, nonseparable optimization problems with linear constraints. Inspired by [1], the third variable is updated twice in each iteration to ensure global convergence. Based on the powerful Kurdyka-Łojasiewicz property, we prove that the sequence generated by the ADMM converges globally to a critical point of the augmented Lagrangian function. We also establish convergence of the proposed ADMM when the update order of the first and second variables is swapped, and when a proximal term is added to the first variable to handle more general nonseparable problems. Moreover, we conduct numerical experiments on three nonconvex problems: multiple measurement vector (MMV), robust PCA (RPCA), and nonnegative matrix completion (NMC). The results demonstrate the efficiency of the proposed ADMM and its advantage over existing methods.
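For concreteness, the update scheme described above can be sketched in LaTeX for the generic three-block model min f(x) + g(y) + h(z) subject to Ax + By + Cz = b; the separable objective, the symbols, and the placement of the two z-updates are illustrative assumptions, not the paper's exact notation or ordering (the paper treats nonseparable couplings):

    \mathcal{L}_\beta(x,y,z,\lambda)
      = f(x) + g(y) + h(z)
      + \langle \lambda,\, Ax + By + Cz - b \rangle
      + \tfrac{\beta}{2} \| Ax + By + Cz - b \|^2

    \begin{aligned}
      z^{k+\frac12} &= \arg\min_z \ \mathcal{L}_\beta(x^k, y^k, z, \lambda^k)
        && \text{(first $z$-update)} \\
      x^{k+1} &= \arg\min_x \ \mathcal{L}_\beta(x, y^k, z^{k+\frac12}, \lambda^k) \\
      y^{k+1} &= \arg\min_y \ \mathcal{L}_\beta(x^{k+1}, y, z^{k+\frac12}, \lambda^k) \\
      z^{k+1} &= \arg\min_z \ \mathcal{L}_\beta(x^{k+1}, y^{k+1}, z, \lambda^k)
        && \text{(second $z$-update)} \\
      \lambda^{k+1} &= \lambda^k + \beta\,(A x^{k+1} + B y^{k+1} + C z^{k+1} - b)
    \end{aligned}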
We give an operational definition of information-theoretic resources within a given multipartite classical or quantum correlation. We present a causal model that serves as the source-coding side of this correlation and introduce a novel concept of resource rate. We argue that, beyond classical secrecy, additional resources exist that are useful for the security of distributed computing problems and that can be captured by the resource rate. Furthermore, we establish a relationship between the resource rate and an extension of Shannon's logarithmic information measure, namely total correlation. Subsequently, we present a novel quantum secrecy monotone and investigate a quantum hybrid key distribution system as an extension of our causal model. Finally, we discuss some connections to the optimal transport (OT) problem.
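For reference, the total correlation invoked above is the standard multivariate extension of mutual information: for random variables X_1, ..., X_n with joint entropy H(X_1, ..., X_n),

    C(X_1, \dots, X_n) = \sum_{i=1}^{n} H(X_i) - H(X_1, \dots, X_n)

which vanishes exactly when the variables are mutually independent.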
We consider a new type of inverse combinatorial optimization, Inverse Submodular Maximization (ISM), for human-in-the-loop multi-robot coordination. Forward combinatorial optimization, defined as the process of solving a combinatorial problem given the reward (cost)-related parameters, is widely used in multi-robot coordination. In the standard pipeline, the reward (cost)-related parameters are first designed offline by domain experts and then used to coordinate robots online. What if these parameters need to be changed by non-expert human supervisors, who watch over the robots during tasks, in order to adapt to new requirements? We are interested in the case where human supervisors can suggest what actions to take, and the robots must adjust their internal parameters based on such suggestions. We study these problems from the perspective of inverse combinatorial optimization, i.e., the process of finding parameters given solutions to the problem. Specifically, we propose a new formulation for ISM, in which we aim to find a new set of parameters that deviates minimally from the current parameters while making the greedy algorithm output the same actions as those suggested by humans. We show that this problem can be formulated as a Mixed Integer Quadratic Program (MIQP). However, the MIQP involves exponentially many binary variables, making it intractable for existing solvers when the problem size is large. We propose a new algorithm under the Branch & Bound paradigm to solve it. In numerical simulations, we demonstrate how to use ISM in multi-robot multi-objective coverage control, and we show that the proposed algorithm achieves significant advantages in running time and peak memory usage compared to directly using an existing solver.
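As a point of reference, the following minimal Python sketch shows the forward greedy step whose output ISM must match; the function names and the cardinality budget are illustrative assumptions, not the paper's interface:

    def greedy_max(f, ground_set, k):
        """Standard greedy for monotone submodular maximization.

        f: set function mapping a set of actions to a reward (assumed
           monotone submodular and parameterized internally).
        ground_set: candidate actions; k: cardinality budget.
        """
        selected = set()
        for _ in range(k):
            # Select the action with the largest marginal gain.
            best = max(
                (a for a in ground_set if a not in selected),
                key=lambda a: f(selected | {a}) - f(selected),
            )
            selected.add(best)
        return selected

ISM then searches for parameters, close to the current ones, under which this greedy trace reproduces the human-suggested actions; encoding the per-step argmax conditions with binary variables is what yields the MIQP mentioned above.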
Optimal behaviours of a system performing a specific task can be achieved by leveraging the coupling between trajectory optimization, stabilization, and design optimization. This approach is particularly advantageous for underactuated systems, i.e., systems with fewer actuators than degrees of freedom, which therefore require more elaborate control strategies. This paper proposes a novel co-design algorithm, Robust Trajectory Control with Design optimization (RTC-D). An inner optimization layer (RTC) simultaneously performs direct transcription (DIRTRAN) to find a nominal trajectory while computing optimal hyperparameters for a stabilizing time-varying linear quadratic regulator (TVLQR). RTC-D augments RTC with a design optimization layer that maximizes the system's robustness through a time-varying Lyapunov-based region of attraction (ROA) analysis. This analysis provides a formal guarantee of stability for a set of off-nominal states. The proposed algorithm has been tested on two underactuated systems: the torque-limited simple pendulum and the cart-pole. Extensive simulations from off-nominal initial conditions demonstrate improved robustness, while real-system experiments show increased insensitivity to torque disturbances.
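To illustrate the TVLQR stabilizer at the core of RTC, here is a minimal Python sketch of the backward Riccati recursion, assuming discrete-time linearizations of the dynamics along the nominal trajectory; the names and the discrete-time formulation are assumptions for illustration, not the paper's implementation:

    import numpy as np

    def tvlqr_gains(A_list, B_list, Q, R, Qf):
        """Backward Riccati recursion for a time-varying LQR.

        A_list, B_list: linearizations along the nominal trajectory.
        Q, R, Qf: stage and terminal cost weights (the kind of
        hyperparameters that RTC tunes).
        Returns gains K_t with u_t = u_nom_t - K_t (x_t - x_nom_t).
        """
        S = Qf
        gains = []
        for A, B in zip(reversed(A_list), reversed(B_list)):
            K = np.linalg.solve(R + B.T @ S @ B, B.T @ S @ A)
            S = Q + A.T @ S @ (A - B @ K)
            gains.append(K)
        return gains[::-1]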
Outflow boundaries play an important role in multiphase fluid dynamics simulations that involve transitions between liquid and vapor phases. These flows are dominated by low Weber numbers and sharp jumps in pressure, velocity, and temperature. Inadequate treatment of these jumps at the outlet generates undesirable fluid disturbances that propagate upstream and lead to instabilities within the computational domain. To mitigate these disturbances, we introduce a forcing term that can be applied to the incompressible Navier-Stokes equations to enforce stability in the numerical solution. The forcing term acts as a damping mechanism to control vortices generated by droplets and bubbles in multiphase flows, and is designed as a general formulation that can be coupled with a fixed-pressure outflow boundary condition to simulate a variety of multiphase flow problems. We demonstrate its applicability to pool and flow boiling problems, where bubble-induced vortices during evaporation and condensation present a challenge at the outflow. Validation and verification cases are chosen to quantify the accuracy and stability of the proposed method against established benchmarks and reference solutions, along with a detailed performance analysis for three-dimensional simulations on leadership supercomputing platforms. Computational experiments are performed using Flash-X, a composable open-source software instrument designed for multiscale fluid dynamics simulations on heterogeneous architectures.
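One common way to realize such a damping forcing term is a sponge-layer ansatz; the form below is a generic illustration of the idea, not necessarily the formulation used in the paper:

    \mathbf{f}(\mathbf{x}, t)
      = -\,\sigma(\mathbf{x}) \bigl( \mathbf{u}(\mathbf{x}, t) - \mathbf{u}_{\mathrm{ref}} \bigr)

where sigma(x) >= 0 ramps up smoothly toward the outlet and u_ref is a reference outflow velocity; added to the right-hand side of the momentum equation, a term of this kind dissipates vortical fluctuations before they interact with the fixed-pressure boundary.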
Mitigating hallucinations of Large Multi-modal Models (LMMs) is crucial to enhancing their reliability as general-purpose assistants. This paper shows that such hallucinations can be significantly exacerbated by preceding user-system dialogues. To measure this precisely, we first present an evaluation benchmark that extends popular multi-modal benchmark datasets with prepended hallucinatory dialogues generated by our novel Adversarial Question Generator, which automatically produces image-related yet adversarial dialogues via adversarial attacks on LMMs. On our benchmark, the zero-shot performance of state-of-the-art LMMs drops significantly on both the VQA and captioning tasks. Next, we reveal that this hallucination is mainly due to a prediction bias toward preceding dialogues rather than visual content. To reduce this bias, we propose Adversarial Instruction Tuning, which robustly fine-tunes LMMs on multi-modal instruction-following datasets augmented with hallucinatory dialogues. Extensive experiments show that our proposed approach successfully reduces dialogue hallucination while maintaining, or even improving, performance.
We consider the problem of evaluating dynamic consistency in discrete-time probabilistic filters that approximate stochastic system state densities with Gaussian mixtures. Dynamic consistency means that the estimated probability distributions correctly describe the actual uncertainties. As such, the problem of consistency testing naturally arises in applications, with regard to estimator tuning and validation. However, due to the general complexity of the density functions involved, straightforward approaches to consistency testing of mixture-based estimators have remained challenging to define and implement. This paper derives a new exact result for Gaussian mixture consistency testing within the framework of normalized deviation squared (NDS) statistics. It is shown that NDS test statistics for generic multivariate Gaussian mixture models exactly follow mixtures of generalized chi-square distributions, for which efficient computational tools are available. The accuracy and utility of the resulting consistency tests are demonstrated numerically on static and dynamic mixture estimation examples.
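To ground the terminology: in the single-Gaussian special case the NDS statistic reduces to the familiar normalized estimation error squared. For an n-dimensional estimate \hat{x} with error covariance P and true state x,

    \epsilon = (x - \hat{x})^{\top} P^{-1} (x - \hat{x}) \sim \chi^2_n

The paper's result generalizes this: when the estimated density is a Gaussian mixture, the analogous statistic follows a mixture of generalized chi-square distributions rather than a single chi-square.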
We propose EmoDistill, a novel speech emotion recognition (SER) framework that leverages cross-modal knowledge distillation during training to learn strong linguistic and prosodic representations of emotion from speech. During inference, our method uses only a stream of speech signals to perform unimodal SER, thus reducing computational overhead and avoiding run-time transcription and prosodic feature extraction errors. During training, our method distills information at both the embedding and logit levels from a pair of pre-trained Prosodic and Linguistic teachers fine-tuned for SER. Experiments on the IEMOCAP benchmark demonstrate that our method outperforms other unimodal and multimodal techniques by a considerable margin, achieving state-of-the-art performance of 77.49% unweighted accuracy and 78.91% weighted accuracy. Detailed ablation studies demonstrate the impact of each component of our method.
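A minimal PyTorch sketch of a dual-teacher distillation loss of this shape is shown below; the temperature, the weights, and the assumption of shared embedding dimensions (projection heads omitted) are illustrative choices, not the paper's reported configuration:

    import torch.nn.functional as F

    def distill_loss(student_logits, student_emb,
                     teacher_logits_pros, teacher_logits_ling,
                     teacher_emb_pros, teacher_emb_ling,
                     labels, T=2.0, alpha=0.5, beta=0.5):
        """Dual-teacher distillation at the logit and embedding levels.

        T, alpha, and beta are illustrative hyperparameters, not the
        paper's reported values.
        """
        # Supervised loss on ground-truth emotion labels.
        ce = F.cross_entropy(student_logits, labels)
        # Logit-level KD: KL to each teacher's softened distribution.
        kd = sum(
            F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                     F.softmax(t / T, dim=-1),
                     reduction="batchmean") * T * T
            for t in (teacher_logits_pros, teacher_logits_ling)
        )
        # Embedding-level KD: match the teacher representations.
        emb = (F.mse_loss(student_emb, teacher_emb_pros)
               + F.mse_loss(student_emb, teacher_emb_ling))
        return ce + alpha * kd + beta * emb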
We present variational inference with sequential sample-average approximation (VISA), a method for approximate inference in computationally intensive models, such as those based on numerical simulations. VISA extends importance-weighted forward-KL variational inference by employing a sequence of sample-average approximations, each of which is considered valid inside a trust region. This makes it possible to reuse model evaluations across multiple gradient steps, thereby reducing computational cost. We perform experiments on high-dimensional Gaussians, Lotka-Volterra dynamics, and a Pickover attractor, demonstrating that VISA achieves approximation accuracy comparable to standard importance-weighted forward-KL variational inference with computational savings of a factor of two or more for conservatively chosen learning rates.
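The sample-reuse idea can be sketched schematically in Python as follows; every name here (draw_samples, grad_estimate, divergence, delta) is an illustrative placeholder, and the loop is a simplification of the actual method:

    def visa_optimize(init_phi, draw_samples, grad_estimate,
                      divergence, delta=0.1, lr=1e-2, steps=1000):
        """Schematic VISA-style loop (an illustration, not the paper's code).

        draw_samples(phi): expensive; evaluates the simulator under the
            proposal with parameters phi.
        grad_estimate(phi, samples): importance-weighted forward-KL
            gradient estimate from samples drawn at the anchor parameters.
        divergence(phi, anchor): how far phi has moved from the anchor
            proposal; delta is the trust-region radius.
        """
        phi = init_phi
        anchor, samples = phi, draw_samples(phi)
        for _ in range(steps):
            # Reuse the cached sample-average approximation while the
            # parameters stay inside the trust region; refresh otherwise.
            if divergence(phi, anchor) > delta:
                anchor, samples = phi, draw_samples(phi)
            phi = phi - lr * grad_estimate(phi, samples)
        return phi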
We consider a binary decision aggregation problem in the presence of both truthful and adversarial experts. The truthful experts report their private signals truthfully given proper incentives, while the adversarial experts can report arbitrarily. The decision maker must design a robust aggregator to forecast the true state of the world based on the experts' reports, without knowing the specific information structure, i.e., the joint distribution of signals, states, and the strategies of the adversarial experts. We seek the optimal aggregator that minimizes regret under the worst-case information structure, where regret is defined as the difference in expected loss between the aggregator and a benchmark that makes the optimal decision given the joint distribution and the reports of the truthful experts. We prove that when the truthful experts are symmetric and the adversarial experts are not too numerous, the truncated mean is optimal: we remove some of the lowest and highest reports and average the remaining ones. Moreover, for many settings, the optimal aggregators lie in the family of piecewise linear functions. The regret is independent of the total number of experts and depends only on the fraction of adversaries. We evaluate our aggregators through numerical experiments on an ensemble learning task. We also obtain negative results for the aggregation problem with adversarial experts under some more general information structures and report spaces.
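The truncated mean is simple to state in code; the following Python sketch (with an illustrative choice of the truncation parameter k) makes the rule concrete:

    def truncated_mean(reports, k):
        """Drop the k lowest and k highest reports, then average the rest.

        k is a design parameter (e.g., tied to the assumed number of
        adversarial experts); the choice here is illustrative.
        """
        r = sorted(reports)
        kept = r[k:len(r) - k]
        if not kept:
            raise ValueError("k is too large for the number of reports")
        return sum(kept) / len(kept)

    # Example: with one adversary among five reports,
    # truncated_mean([0.2, 0.4, 0.5, 0.9, 1.0], k=1) == 0.6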
The existence of representative datasets is a prerequisite for many successful artificial intelligence and machine learning models. However, the subsequent application of these models often involves scenarios that are inadequately represented in the data used for training. The reasons for this are manifold and range from time and cost constraints to ethical considerations. As a consequence, the reliable use of these models, especially in safety-critical applications, remains a major challenge. Leveraging additional, already existing sources of knowledge is key to overcoming the limitations of purely data-driven approaches, and ultimately to increasing the generalization capability of these models. Furthermore, predictions that conform with knowledge are crucial for making trustworthy and safe decisions, even in underrepresented scenarios. This work provides an overview of existing techniques and methods in the literature that combine data-based models with existing knowledge. The identified approaches are structured according to the categories of integration, extraction, and conformity. Special attention is given to applications in the field of autonomous driving.