This work addresses the problem of online exploration and visual sensor coverage of unknown environments. We introduce a novel perception roadmap, which we refer to as the Active Perception Network (APN), that serves as a hierarchical topological graph describing how to traverse and perceive an incrementally built spatial map of the environment. The APN state is incrementally updated to expand a connected configuration space that extends throughout as much of the known space as possible, using efficient difference-awareness techniques that track the discrete changes of the spatial map to inform the updates. A frontier-guided approach is presented for efficient evaluation of information gain and covisible information, which guides view sampling and refinement to ensure that maximum coverage of the unmapped space is maintained within the APN. The updated roadmap is hierarchically decomposed into subgraph regions, which we use to facilitate a non-myopic global view sequence planner. A comparative analysis against several state-of-the-art approaches was conducted, showing significant performance improvements in terms of total exploration time and surface coverage, and demonstrating high computational efficiency that scales to large and complex environments.
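To make the frontier-guided evaluation concrete, the sketch below shows one simple way to extract frontier cells (free cells bordering unknown space) from an occupancy grid; this is an illustrative stand-in, not the APN's actual volumetric implementation, and the labels and function name are our own.

```python
import numpy as np

FREE, OCCUPIED, UNKNOWN = 0, 1, 2  # illustrative cell labels

def frontier_cells(grid):
    """Return indices of free cells 4-adjacent to unknown space.

    A toy occupancy-grid stand-in for the volumetric frontier detection
    that guides view sampling in the paper.
    """
    free = grid == FREE
    unknown = grid == UNKNOWN
    near_unknown = np.zeros_like(unknown)
    for shift, axis in [(1, 0), (-1, 0), (1, 1), (-1, 1)]:
        # np.roll wraps at the borders; a real implementation would pad.
        near_unknown |= np.roll(unknown, shift, axis=axis)
    return np.argwhere(free & near_unknown)
```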
The Gaussian mixed-effects model driven by a stationary integrated Ornstein-Uhlenbeck process has been used for analyzing longitudinal data with an explicit and simple serial-correlation structure within each individual. However, the theoretical aspects of its asymptotic inference have yet to be elucidated. We prove the local asymptotics for the associated log-likelihood function, which in particular guarantees the asymptotic optimality of a suitably chosen maximum-likelihood estimator. We illustrate the obtained asymptotic normality result through simulations for both balanced and unbalanced datasets.
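For concreteness, a common specification of such a model reads as follows (a sketch in generic notation; the paper's exact parametrization may differ):

```latex
% Response of individual i at time t_{ij}: fixed effects, a random
% intercept, an integrated OU process, and measurement noise.
\[
  Y_{ij} = x_{ij}^{\top}\beta + b_i + U_i(t_{ij}) + \varepsilon_{ij},
  \qquad b_i \sim \mathcal{N}(0,\sigma_b^2),\quad
  \varepsilon_{ij} \sim \mathcal{N}(0,\sigma_\varepsilon^2),
\]
% where U_i integrates a stationary OU process V_i:
\[
  U_i(t) = \int_0^t V_i(s)\,ds, \qquad
  dV_i(t) = -\kappa V_i(t)\,dt + \sigma\,dB_i(t),
  \qquad V_i(0) \sim \mathcal{N}\!\big(0, \tfrac{\sigma^2}{2\kappa}\big),
\]
% whose stationary initialization yields the explicit serial-correlation
% structure within each individual mentioned above.
```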
This study investigates the accessibility of open-source electronic health record (EHR) systems for individuals who are visually impaired or blind. Ensuring that EHRs are accessible to visually impaired users is critical for the diversity, equity, and inclusion of all users. The study used a combination of automated and manual accessibility testing techniques, including evaluation with screen readers, to assess three widely used open-source EHR systems. Our assessment focused on the performance of three popular screen readers: JAWS (Windows), NVDA (Windows), and Apple VoiceOver (macOS). The evaluation revealed that although each of the three systems was partially accessible, there is room for improvement, particularly in keyboard navigation and screen reader compatibility. The study concludes with recommendations for making EHR systems more accessible and inclusive for all users.
Previous efforts to support creative problem-solving have included (a) techniques (such as brainstorming and design thinking) to stimulate creative ideas, and (b) software tools to record and share these ideas. Now, generative AI technologies can suggest new ideas that might never have occurred to users, who can then select from these ideas or use them to stimulate even more ideas. Here, we describe such a system, Supermind Ideator. The system uses a large language model (GPT-3.5) and adds prompting, fine-tuning, and a user interface specifically designed to help people use creative problem-solving techniques. Some of these techniques can be applied to any problem; others are specifically intended to help generate innovative ideas about how to design groups of people and/or computers ("superminds"). We also describe our early experiences with using this system and suggest ways it could be extended to support additional techniques for other specific problem-solving domains.
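As a rough illustration of the prompting layer (not Supermind Ideator's actual code; the prompt, model name, and example problem are our own placeholders), a single creativity-technique prompt to an LLM might look like:

```python
from openai import OpenAI  # assumes the OpenAI Python SDK is installed

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical prompt applying one generic creativity technique ("analogies");
# the system's fine-tuned model and curated prompts differ.
problem = "How can a hospital reduce emergency-room waiting times?"
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system",
         "content": ("You help with creative problem solving. Suggest ideas "
                     "by drawing analogies to how other domains solve "
                     "similar problems.")},
        {"role": "user", "content": problem},
    ],
)
print(response.choices[0].message.content)
```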
Generating high-quality, person-generic visual dubbing remains a challenge. Recent work has adopted a two-stage paradigm that decouples rendering from lip synchronization, using an intermediate representation as a conduit. Still, previous methodologies rely on rough landmarks or are confined to a single speaker, limiting their performance. In this paper, we propose DiffDub: diffusion-based dubbing. We first craft a diffusion autoencoder with an inpainting renderer that incorporates a mask to delineate editable zones and unaltered regions. This allows for seamless filling of the lower-face region while preserving the remaining parts. Throughout our experiments, we encountered several challenges. Primarily, the semantic encoder lacked robustness, restricting its ability to capture high-level features. In addition, the modeling ignored facial positioning, causing mouth and nose jitter across frames. To tackle these issues, we employ versatile strategies, including data augmentation and supplementary eye guidance. Moreover, we incorporate a conformer-based reference encoder and motion generator fortified by a cross-attention mechanism. This enables our model to learn person-specific textures from varying references and reduces reliance on paired audio-visual data. Our rigorous experiments comprehensively demonstrate that our approach outperforms existing methods by considerable margins and delivers seamless, intelligible videos in person-generic and multilingual scenarios.
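The editable-zone mask can be pictured with the toy sketch below, which marks everything below a landmark row as editable; DiffDub's actual landmark-derived mask construction is more involved, and the names here are our own.

```python
import numpy as np

def lower_face_mask(height, width, nose_row):
    """Toy binary mask: True marks the editable lower-face region to be
    inpainted, False the preserved upper face. A stand-in for DiffDub's
    landmark-derived mask."""
    mask = np.zeros((height, width), dtype=bool)
    mask[nose_row:, :] = True  # editable zone: everything below the nose line
    return mask

mask = lower_face_mask(256, 256, nose_row=140)  # hypothetical frame size
```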
To improve the predictability of complex computational models in experimentally unknown domains, we propose a Bayesian statistical machine learning framework utilizing the Dirichlet distribution that combines the results of several imperfect models. This framework can be viewed as an extension of Bayesian stacking. To illustrate the method, we study the ability of Bayesian model averaging and mixing techniques to mine nuclear masses. We show that both global and local mixtures of models achieve excellent performance in prediction accuracy and uncertainty quantification and are preferable to classical Bayesian model averaging. Additionally, our statistical analysis indicates that improving model predictions through direct mixing, rather than mixing individually corrected models, leads to more robust extrapolations.
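The following numerical sketch illustrates the core idea of Dirichlet-weighted model mixing (toy numbers and concentration parameters of our own choosing; the full framework infers these from data):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical predictive means and standard deviations of K = 3 imperfect
# models at one input point (e.g., one nucleus).
mu = np.array([938.1, 939.4, 937.8])
sd = np.array([0.8, 0.5, 1.2])

# Dirichlet-distributed mixture weights; the concentrations would be learned
# from training data in the full framework (assumed values here).
alpha = np.array([2.0, 5.0, 1.0])
w = rng.dirichlet(alpha, size=4000)  # draws from the weight simplex

# First two moments of the mixture predictive distribution,
# averaged over the weight draws.
mix_mean = (w * mu).sum(axis=1).mean()
mix_var = (w * (sd**2 + mu**2)).sum(axis=1).mean() - mix_mean**2
print(f"mixture mean {mix_mean:.2f}, std {np.sqrt(mix_var):.2f}")
```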
This paper presents a Bayesian optimization framework for the automatic tuning of shared controllers that are formulated as a Model Predictive Control (MPC) problem. The proposed framework includes the design of performance metrics as well as the representation of user inputs for simulation-based optimization. The framework is applied to the optimization of a shared controller for an Image Guided Therapy robot. VR-based user experiments confirm that the automatically tuned MPC shared controller outperforms a hand-tuned baseline version and demonstrate its generalization ability.
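A minimal sketch of such a tuning loop, assuming scikit-optimize and a synthetic stand-in for the simulation (the weight names, bounds, and cost function are hypothetical):

```python
import numpy as np
from skopt import gp_minimize
from skopt.space import Real

def simulate_shared_control(w_tracking, w_effort):
    """Toy stand-in for replaying recorded user inputs through the MPC
    shared controller in simulation; returns a synthetic cost with a
    known optimum purely for illustration."""
    return (np.log10(w_tracking) - 0.5) ** 2 + (np.log10(w_effort) + 1.0) ** 2

# Hypothetical MPC cost weights to tune; the paper's controller has its own set.
space = [Real(1e-3, 1e2, prior="log-uniform", name="w_tracking"),
         Real(1e-3, 1e2, prior="log-uniform", name="w_effort")]

result = gp_minimize(lambda p: simulate_shared_control(*p), space,
                     n_calls=30, random_state=0)
print("best weights:", result.x, "best cost:", result.fun)
```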
We introduce Resilient Multiple Choice Learning (rMCL), an extension of the MCL approach for conditional distribution estimation in regression settings where multiple targets may be sampled for each training input. Multiple Choice Learning is a simple framework for multimodal density estimation that uses the Winner-Takes-All (WTA) loss over a set of hypotheses. In regression settings, existing MCL variants focus on merging the hypotheses, thereby sacrificing the diversity of the predictions. In contrast, our method relies on a novel learned scoring scheme, underpinned by a mathematical framework based on Voronoi tessellations of the output space, from which we can derive a probabilistic interpretation. After empirically validating rMCL on synthetic data, we further assess its merits on the sound source localization problem, demonstrating its practical usefulness and the relevance of its interpretation.
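A minimal PyTorch sketch of a WTA loss with a learned scoring head (our own simplified formulation, not the paper's exact loss):

```python
import torch
import torch.nn.functional as F

def wta_loss(hyps, scores, target):
    """Winner-takes-all regression loss plus a scoring term.

    hyps:   (B, K, D) K hypotheses per input
    scores: (B, K)    predicted win-probabilities (e.g., sigmoid outputs)
    target: (B, D)    one sampled target per input
    """
    dists = ((hyps - target.unsqueeze(1)) ** 2).sum(-1)  # (B, K) squared errors
    winner = dists.argmin(dim=1)                         # closest hypothesis
    # Regression term: gradients flow only through the winning hypothesis.
    wta = dists.gather(1, winner.unsqueeze(1)).mean()
    # Scoring term: teach the scores to predict which hypothesis wins.
    labels = F.one_hot(winner, num_classes=hyps.shape[1]).float()
    return wta + F.binary_cross_entropy(scores, labels)
```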
This paper proposes two innovative vector transport operators, leveraging the Cayley transform, for the generalized Stiefel manifold endowed with a non-standard inner product. Specifically, it introduces a vector transport based on the differentiated retraction of the Cayley transform, and another that uses the Cayley transform as an approximation to the differentiated matrix exponential. These vector transports are shown to satisfy the Ring-Wirth non-expansive condition under non-standard metrics while preserving isometry. Building upon the novel vector transport operators, we extend the modified Polak-Ribière-Polyak (PRP) conjugate gradient method to the generalized Stiefel manifold. Under a non-monotone line search condition, we prove that our algorithm converges globally to a stationary point. The efficiency of the proposed vector transport operators is empirically validated through numerical experiments on generalized eigenvalue problems and canonical correlation analysis.
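For orientation, on the standard Stiefel manifold with the Euclidean metric the Cayley-transform retraction takes the familiar form below; the paper's operators adapt this construction to the generalized manifold and its non-standard inner product (a sketch, not the paper's exact formulas):

```latex
% Cayley retraction along a tangent vector \xi at X, with W_\xi the
% skew-symmetric matrix associated with \xi:
\[
  R_X(\tau\xi) = \Big( I - \tfrac{\tau}{2} W_\xi \Big)^{-1}
                 \Big( I + \tfrac{\tau}{2} W_\xi \Big) X,
  \qquad W_\xi^{\top} = -W_\xi .
\]
% Differentiating R_X in the tangent direction yields one vector transport;
% using this rational map in place of the matrix exponential yields the other.
```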
Network games are an important class of games that model agent interactions in networked systems, where players are situated at the nodes of a graph and their payoffs depend on the actions taken by their neighbors. We extend the classical framework by considering a game model where the strategies are positive semidefinite matrices having trace one. These (continuous) games can serve as a simple model of quantum strategic interactions. We focus on the zero-sum case, where the sum of all players' payoffs is equal to zero. We establish that in this class of games, Nash equilibria can be characterized as the projection of a spectrahedron, that is, the feasible region of a semidefinite program. Furthermore, we demonstrate that determining whether a game is a semidefinite network game is equivalent to deciding if the value of a semidefinite program is zero. Beyond the zero-sum case, we characterize Nash equilibria as the solutions of a semidefinite linear complementarity problem.
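The semidefinite structure is easy to see in the best-response problem: with the neighbors' strategies fixed, a player's payoff is linear in their own strategy matrix, so the best response is a small SDP. A toy sketch via cvxpy (the payoff matrix and sizes are our own):

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(1)
n = 3  # strategies are n x n PSD matrices with trace one

# Hypothetical symmetric payoff matrix accumulating the (fixed) neighbors'
# linear influence on this player's payoff.
P = rng.standard_normal((n, n))
P = (P + P.T) / 2

X = cp.Variable((n, n), symmetric=True)   # the player's strategy
prob = cp.Problem(cp.Maximize(cp.trace(P @ X)),
                  [X >> 0, cp.trace(X) == 1])
prob.solve()
print("best-response payoff:", round(prob.value, 4))  # equals lambda_max(P)
```

The optimum is attained at the rank-one projector onto the top eigenvector of P, mirroring how trace-one PSD strategies generalize probability vectors.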
In this paper we develop a novel neural network model for predicting the implied volatility surface. Prior financial domain knowledge is taken into account. A new activation function that incorporates the volatility smile is proposed and used for the hidden nodes that process the underlying asset price. In addition, financial conditions, such as the absence of arbitrage and constraints on the boundaries and the asymptotic slope, are embedded into the loss function. This is one of the first studies to propose a methodological framework that incorporates prior financial domain knowledge into neural network architecture design and model training. The proposed model outperforms the benchmark models on S&P 500 index option data spanning 20 years. More importantly, the domain knowledge is satisfied empirically, showing that the model is consistent with existing financial theories and conditions related to the implied volatility surface.
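As an illustration of how a smile shape can be built into an activation, the sketch below defines a convex, positive-valued activation for the moneyness input (a hypothetical form; the paper's exact activation may differ):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SmileActivation(nn.Module):
    """Hypothetical smile-shaped activation: convex in (log-)moneyness,
    mimicking the U-shape of the implied volatility smile."""

    def __init__(self):
        super().__init__()
        self.a = nn.Parameter(torch.tensor(1.0))  # curvature (kept positive)
        self.b = nn.Parameter(torch.tensor(0.0))  # location of the smile minimum
        self.c = nn.Parameter(torch.tensor(0.1))  # base level (kept positive)

    def forward(self, m):
        # Softplus keeps curvature and level positive, so the output is a
        # valid positive volatility contribution for any input.
        return F.softplus(self.a) * (m - self.b) ** 2 + F.softplus(self.c)
```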