Displacement is an important measurement for assessing structural condition, but its field measurement is often hindered by difficulties with sensor installation and measurement accuracy. To overcome the disadvantages of conventional displacement measurement, computer vision (CV)-based methods have been adopted for their remote-sensing capability and accuracy. This paper presents a strategy for non-target structural displacement measurement that uses CV to avoid installing a target on the structure while calibrating the displacement using structured light. The proposed system, called LAVOLUTION, calculates the relative position of the camera with respect to the structure from four equally spaced beams of structured light and obtains a scale factor to convert pixel movement into structural displacement. A jig for the four beams of structured light is designed and a corresponding alignment process is proposed. A method for calculating the scale factor using the designed jig for tunable structured light is proposed and validated via numerical simulations and lab-scale experiments. To confirm the feasibility of the proposed displacement measurement process, experiments on a shaking table and a full-scale bridge are conducted, and the accuracy of the proposed method is compared with that of a reference laser Doppler vibrometer.
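As a rough illustration of the scale-factor idea (not the authors' LAVOLUTION implementation; the beam spacing, spot coordinates, and function name below are hypothetical), the conversion from pixel motion to physical displacement can be sketched as:

```python
import numpy as np

def scale_factor_mm_per_px(spot_px: np.ndarray, beam_spacing_mm: float) -> float:
    """Estimate mm-per-pixel from projected structured-light spots.

    spot_px: (4, 2) pixel coordinates of four equally spaced laser spots.
    beam_spacing_mm: known physical spacing between adjacent parallel beams.
    """
    # Mean pixel distance between adjacent spots along the jig axis.
    gaps_px = np.linalg.norm(np.diff(spot_px, axis=0), axis=1)
    return beam_spacing_mm / gaps_px.mean()

# Hypothetical example: four collinear spots ~100 px apart, 50 mm beam spacing.
spots = np.array([[120.0, 240.0], [220.5, 241.0], [320.8, 240.4], [421.2, 240.9]])
s = scale_factor_mm_per_px(spots, beam_spacing_mm=50.0)
pixel_motion = 3.2  # tracked feature displacement in pixels
print(f"displacement ~ {pixel_motion * s:.2f} mm")
```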
Recent advances in camera design and imaging pipelines allow us to capture high-quality images using smartphones. However, due to the small sensor size and lens limitations of smartphone cameras, we commonly find artifacts or degradation in the processed images. The most common unpleasant effects are noise artifacts, diffraction artifacts, blur, and HDR overexposure. Deep learning methods for image restoration can successfully remove these artifacts, but most approaches are unsuitable for real-time applications on mobile devices due to their heavy computation and memory requirements. In this paper, we propose LPIENet, a lightweight network for perceptual image enhancement designed for deployment on smartphones. Our experiments show that, with far fewer parameters and operations, our model can deal with the mentioned artifacts and achieve competitive performance compared with state-of-the-art methods on standard benchmarks. Moreover, to prove the efficiency and reliability of our approach, we deployed the model directly on commercial smartphones and evaluated its performance. Our model processes 2K-resolution images in under one second on mid-range commercial smartphones.
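To make the "lightweight" design direction concrete, here is a minimal sketch of a residual enhancement network built from depthwise-separable convolutions. This is not the published LPIENet architecture; the channel width and two-block body are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class TinyEnhancer(nn.Module):
    """Minimal lightweight enhancement net: depthwise-separable convs
    predicting a residual correction to the input image (illustrative)."""
    def __init__(self, ch: int = 16):
        super().__init__()
        def dsconv(cin, cout):
            return nn.Sequential(
                nn.Conv2d(cin, cin, 3, padding=1, groups=cin),  # depthwise
                nn.Conv2d(cin, cout, 1),                        # pointwise
                nn.ReLU(inplace=True),
            )
        self.body = nn.Sequential(dsconv(3, ch), dsconv(ch, ch))
        self.head = nn.Conv2d(ch, 3, 3, padding=1)

    def forward(self, x):
        # Residual design: the net only learns the correction, not the image.
        return (x + self.head(self.body(x))).clamp(0, 1)

x = torch.rand(1, 3, 256, 256)   # degraded input in [0, 1]
print(TinyEnhancer()(x).shape)   # torch.Size([1, 3, 256, 256])
```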
Controller design for bipedal walking on dynamic rigid surfaces (DRSes), i.e., rigid surfaces that move in the inertial frame (e.g., ships and airplanes), remains largely uninvestigated. This paper introduces a hierarchical control approach that achieves stable underactuated bipedal robot walking on a horizontally oscillating DRS. The highest layer of our approach is a real-time motion planner that generates desired global behaviors (i.e., center-of-mass trajectories and footstep locations) by stabilizing a reduced-order robot model. One key novelty of this layer is the derivation of the reduced-order model by analytically extending the angular-momentum-based linear inverted pendulum (ALIP) model from stationary to horizontally moving surfaces. The other novelty is the development of a discrete-time foot-placement controller that exponentially stabilizes the hybrid, linear, time-varying ALIP model. The middle layer of the proposed approach is a walking pattern generator that translates the desired global behaviors into full-body reference trajectories for all directly actuated degrees of freedom. The lowest layer is an input-output linearizing controller that exponentially tracks those full-body reference trajectories based on the full-order, hybrid, nonlinear robot dynamics. Simulations of planar underactuated bipedal walking on a swaying DRS confirm that the proposed framework ensures walking stability under different DRS motions and gait types.
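For reference, the standard stationary-surface ALIP model that the paper extends can be written (sagittal direction, our notation: CoM horizontal position $x_c$, constant CoM height $z_c$, mass $m$, and angular momentum $L$ about the contact point) as:

```latex
\begin{aligned}
\dot{x}_c &= \frac{L}{m z_c}, \\
\dot{L}   &= m g\, x_c .
\end{aligned}
```

Part of the paper's contribution is the analytical extension of these dynamics to a horizontally moving surface, which presumably introduces forcing terms from the surface acceleration; the exact form is derived in the paper and not reproduced here.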
For users to trust model predictions, they need to understand model outputs, particularly their confidence; calibration aims to adjust (calibrate) a model's confidence to match its expected accuracy. We argue that the traditional calibration evaluation does not promote effective calibration: for example, it can encourage always assigning a mediocre confidence score to all predictions, which does not help users distinguish correct predictions from wrong ones. Building on these observations, we propose a new calibration metric, MacroCE, that better captures whether the model assigns low confidence to wrong predictions and high confidence to correct predictions. Focusing on the practical application of open-domain question answering, we examine conventional calibration methods applied to the widely used retriever-reader pipeline, none of which brings significant gains under our new MacroCE metric. Toward better calibration, we propose a new calibration method (ConsCal) that considers not just final model predictions but whether multiple model checkpoints make consistent predictions. Altogether, we provide an alternative view of calibration along with a new metric, a re-evaluation of existing calibration methods on our metric, and a more effective calibration method.
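One plausible instantiation of a macro-averaged calibration error, matching the behavior described above (this sketch is ours, not necessarily the paper's exact definition), averages miscalibration over correct and wrong predictions separately so both count equally:

```python
import numpy as np

def macro_ce(confidence: np.ndarray, correct: np.ndarray) -> float:
    """Macro-averaged calibration error (illustrative instantiation):
    mean miscalibration over correct and over wrong predictions,
    averaged so both groups count equally regardless of accuracy."""
    conf, ok = np.asarray(confidence, float), np.asarray(correct, bool)
    ice_pos = np.mean(1.0 - conf[ok])   # correct answers should get conf -> 1
    ice_neg = np.mean(conf[~ok])        # wrong answers should get conf -> 0
    return 0.5 * (ice_pos + ice_neg)

# A model that always outputs 0.5 confidence scores MacroCE = 0.5 here,
# whereas a model that separates right from wrong answers scores low.
conf = np.array([0.9, 0.8, 0.2, 0.1])
correct = np.array([True, True, False, False])
print(macro_ce(conf, correct))  # 0.15
```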
The world is filled with articulated objects whose use is difficult to determine from vision alone; e.g., a door might open inwards or outwards. Humans handle these objects with strategic trial-and-error: first pushing a door, then pulling if that doesn't work. We enable these capabilities in autonomous agents by proposing "Hypothesize, Simulate, Act, Update, and Repeat" (H-SAUR), a probabilistic generative framework that generates a distribution of hypotheses about how objects articulate given input observations, refines its certainty over these hypotheses over time, and infers plausible actions for exploration and goal-conditioned manipulation. We compare our model with existing work on manipulating objects after a handful of exploration actions, on the PartNet-Mobility dataset. We further propose a novel PuzzleBoxes benchmark that contains locked boxes requiring multiple steps to solve. We show that the proposed model significantly outperforms the current state-of-the-art articulated-object manipulation framework, despite using zero training data. We further improve the test-time efficiency of H-SAUR by integrating a prior learned from learning-based vision models.
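The hypothesize-simulate-act-update loop can be caricatured in a few lines of toy code. Everything below (the two-hypothesis door, the stand-in simulator, the likelihood rule) is an invented illustration of the loop's structure, not the H-SAUR implementation:

```python
# Belief over articulation hypotheses for a door (toy example).
hypotheses = {"opens_inward": 0.5, "opens_outward": 0.5}

def simulate(hypothesis: str, action: str) -> float:
    """Stand-in physics rollout: predicted progress under a hypothesis."""
    return 1.0 if (hypothesis, action) in {("opens_inward", "push"),
                                           ("opens_outward", "pull")} else 0.0

def observe(action: str) -> float:
    """Stand-in environment: this door actually opens outward."""
    return 1.0 if action == "pull" else 0.0

for step in range(3):
    # Act: pick the action with the highest expected simulated progress.
    action = max(("push", "pull"),
                 key=lambda a: sum(p * simulate(h, a) for h, p in hypotheses.items()))
    outcome = observe(action)
    # Update: reweight hypotheses by how well their simulation matched reality.
    for h in hypotheses:
        likelihood = 1.0 - abs(simulate(h, action) - outcome)
        hypotheses[h] *= max(likelihood, 1e-3)
    z = sum(hypotheses.values())
    hypotheses = {h: p / z for h, p in hypotheses.items()}
    print(step, action, hypotheses)
```

Running this reproduces the push-then-pull trial-and-error pattern: the first (tied) action fails, the belief collapses onto "opens_outward", and subsequent actions succeed.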
In this paper, we propose a comprehensive benchmark for investigating models' logical reasoning capabilities in complex real-life scenarios. Current explanation datasets often employ synthetic data with simple reasoning structures and therefore cannot express more complex reasoning processes, such as the rebuttal of a reasoning step or the degree of certainty of the evidence. To this end, we propose a comprehensive logical reasoning explanation form. Built on a multi-hop chain of reasoning, the explanation form includes three main components: (1) rebuttal conditions under which a reasoning node can be challenged; (2) logical formulae that uncover the internal structure of reasoning nodes; (3) reasoning strength indicated by degrees of certainty. This fine-grained structure conforms to real logical reasoning scenarios and better fits the human cognitive process, but is simultaneously more challenging for current models. We evaluate the performance of the current best models on this new explanation form. The experimental results show that generating reasoning graphs remains challenging for current models, even with the help of large pre-trained language models.
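The three components could be captured, hypothetically, by a node structure along the following lines. All field names and the example content are our own illustration, not the dataset's schema:

```python
from dataclasses import dataclass, field

@dataclass
class ReasoningNode:
    """Hypothetical node in a multi-hop reasoning graph (illustrative schema)."""
    conclusion: str                                 # the claim this step derives
    premises: list = field(default_factory=list)    # conclusions of parent nodes
    logical_form: str = ""                          # e.g. "P(x) & Q(x) -> R(x)"
    rebuttal_condition: str = ""                    # when this step can be challenged
    certainty: float = 1.0                          # reasoning strength in [0, 1]

node = ReasoningNode(
    conclusion="The suspect was at the scene.",
    premises=["A witness saw the suspect nearby.", "The CCTV footage matches."],
    logical_form="NearScene(x) & OnCCTV(x) -> AtScene(x)",
    rebuttal_condition="unless the witness is unreliable",
    certainty=0.7,
)
```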
Vision-based sensors have shown significant performance, accuracy, and efficiency gains in Simultaneous Localization and Mapping (SLAM) systems in recent years. In this regard, Visual SLAM (VSLAM) methods refer to SLAM approaches that employ cameras for pose estimation and map generation. Many research works have demonstrated that VSLAM can outperform traditional methods that rely on a single sensor type, such as lidar, even at lower cost. VSLAM approaches utilize different camera types (e.g., monocular, stereo, and RGB-D), have been tested on various datasets (e.g., KITTI, TUM RGB-D, and EuRoC) and in dissimilar environments (e.g., indoors and outdoors), and employ multiple algorithms and methodologies to better understand the environment. These variations have made the topic popular among researchers and have resulted in a wide range of VSLAM methodologies. The primary intent of this survey is therefore to present recent advances in VSLAM systems and to discuss existing challenges and trends. We conduct an in-depth literature survey of forty-five impactful papers published in the VSLAM domain, classifying these manuscripts by different characteristics, including novelty domain, objectives, employed algorithms, and semantic level. We also discuss current trends and promising directions for future research.
Reinforcement learning has been demonstrated as a flexible and effective approach for learning a range of continuous control tasks, such as those used by robots to manipulate objects in their environment. But in robotics particularly, real-world rollouts are costly, and sample efficiency can be a major limiting factor when learning a new skill. In game environments, the use of world models has been shown to improve sample efficiency while still achieving good performance, especially when images or other rich observations are provided. In this project, we explore the use of a world model in a deformable robotic manipulation task, evaluating its effect on sample efficiency when learning to fold a cloth in simulation. We compare RGB image observations with a feature space that leverages built-in structure (keypoints representing the cloth configuration), a common approach in robot skill learning, and measure the impact on task performance and learning efficiency with and without the world model. Our experiments show that using keypoints increased the performance of the best model on the task by 50% and that, in general, a learned or constructed reduced feature space improved task performance and sample efficiency. The use of a state-transition predictor (MDN-RNN) in our world models did not have a notable effect on task performance.
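As a rough sketch of what an MDN-RNN transition predictor looks like, the following is a generic construction, not this project's network; the latent and action dimensions, GRU cell, and mixture size are all assumptions:

```python
import torch
import torch.nn as nn
import torch.distributions as D

class MDNRNN(nn.Module):
    """Sketch of a mixture-density transition predictor: an RNN whose output
    parameterizes a Gaussian mixture over the next latent state."""
    def __init__(self, z_dim=8, a_dim=2, hidden=64, k=5):
        super().__init__()
        self.k, self.z_dim = k, z_dim
        self.rnn = nn.GRU(z_dim + a_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, k * (1 + 2 * z_dim))  # logits, mu, log_sigma

    def forward(self, z, a):
        h, _ = self.rnn(torch.cat([z, a], dim=-1))
        logits, mu, log_sigma = self.head(h).split(
            [self.k, self.k * self.z_dim, self.k * self.z_dim], dim=-1)
        mu = mu.view(*h.shape[:2], self.k, self.z_dim)
        sigma = log_sigma.view(*h.shape[:2], self.k, self.z_dim).exp()
        mix = D.Categorical(logits=logits)
        comp = D.Independent(D.Normal(mu, sigma), 1)
        return D.MixtureSameFamily(mix, comp)

model = MDNRNN()
z = torch.randn(4, 10, 8)   # latent states (batch, time, z_dim)
a = torch.randn(4, 10, 2)   # actions taken at each step
# Train by minimizing the NLL of the observed next latent states.
loss = -model(z, a).log_prob(torch.randn(4, 10, 8)).mean()
```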
In domains where sample sizes are limited, efficient learning algorithms are critical. Learning using privileged information (LuPI) offers increased sample efficiency by allowing prediction models access to auxiliary information at training time that is unavailable when the models are used. Recent work showed that, for prediction in linear-Gaussian dynamical systems, a LuPI learner with access to intermediate time-series data is never worse, and often better, in expectation than any unbiased classical learner. We provide new insights into this analysis and generalize it to nonlinear prediction tasks in latent dynamical systems, extending theoretical guarantees to the case where the map connecting latent variables and observations is known up to a linear transform. In addition, we propose algorithms based on random features and representation learning for the case when this map is unknown. A suite of empirical results confirms the theoretical findings and shows the potential of using privileged time-series information in nonlinear prediction.
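In the linear-Gaussian setting, the privileged learner can be sketched as chaining per-step least-squares fits through the intermediate series instead of regressing endpoint on endpoint. The toy dynamics and sample sizes below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 3, 30                              # state dim, training samples
A = rng.normal(size=(d, d)) / np.sqrt(d)  # toy dynamics: x_{t+1} = A x_t + noise
X = [rng.normal(size=(n, d))]
for _ in range(2):                        # two transition steps: x0 -> x1 -> x2
    X.append(X[-1] @ A.T + 0.1 * rng.normal(size=(n, d)))

lstsq = lambda P, Q: np.linalg.lstsq(P, Q, rcond=None)[0]

# Classical learner: regress x2 on x0 directly.
B_direct = lstsq(X[0], X[2])
# LuPI learner: fit each step using the privileged intermediates, then compose.
B_lupi = lstsq(X[0], X[1]) @ lstsq(X[1], X[2])

X0_test = rng.normal(size=(1000, d))
X2_mean = (X0_test @ A.T) @ A.T           # noise-free target mean
for name, B in [("direct", B_direct), ("LuPI", B_lupi)]:
    print(name, np.mean((X0_test @ B - X2_mean) ** 2))
```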
Variables measured throughout the global oceans reveal the effects of the warming climate, as the oceans absorb huge amounts of solar energy. Hence, information regarding the joint spatial distribution of ocean variables is critical for climate monitoring. In this paper, we investigate the spatial correlation structure between ocean temperature and salinity using data harvested from the Argo program and construct a model to capture their bivariate spatial dependence from the surface to the ocean's interior. We develop a flexible class of multivariate nonstationary covariance models defined in three-dimensional (3D) space (longitude × latitude × depth) that allows the variances and correlation to change along the vertical pressure dimension. These models describe the joint spatial distribution of the two variables while incorporating the underlying vertical structure of the ocean. We demonstrate that the proposed cross-covariance models capture the complex vertical cross-covariance structure well, while existing cross-covariance models, including bivariate Matérn models, fit the empirical cross-covariance structure poorly. Furthermore, the results show that using one variable significantly enhances the prediction of the other and that the estimated spatial dependence structures are consistent with ocean stratification.
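Schematically, and in our notation rather than the paper's exact parameterization, such a model lets the marginal standard deviations $\sigma_i$ and cross-correlation $\rho_{ij}$ vary with pressure/depth $p$ while a Matérn kernel $\mathcal{M}_\nu$ controls spatial decay:

```latex
C_{ij}\bigl((\mathbf{s}_1, p_1), (\mathbf{s}_2, p_2)\bigr)
  = \sigma_i(p_1)\,\sigma_j(p_2)\,\rho_{ij}(p_1, p_2)\,
    \mathcal{M}_{\nu}\!\bigl(\|(\mathbf{s}_1, p_1) - (\mathbf{s}_2, p_2)\|\bigr),
\qquad i, j \in \{\text{temperature}, \text{salinity}\}
```

A stationary bivariate Matérn model is the special case where $\sigma_i$ and $\rho_{ij}$ are constants, which is why it struggles to fit depth-varying empirical structure.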
Artificial Intelligence (AI) is rapidly becoming integrated into military Command and Control (C2) systems as a strategic priority for many defence forces. The successful implementation of AI promises to herald a significant leap in C2 agility through automation. However, realistic expectations must be set on what AI can achieve in the foreseeable future. This paper argues that AI could lead to a fragility trap, whereby the delegation of C2 functions to an AI could increase the fragility of C2, resulting in catastrophic strategic failures. This calls for a new framework for AI in C2 that avoids this trap. We argue that antifragility, along with agility, should form the core design principles of AI-enabled C2 systems. This duality is termed Agile, Antifragile, AI-Enabled Command and Control (A3IC2). An A3IC2 system continuously improves its capacity to perform in the face of shocks and surprises through overcompensation from feedback during the C2 decision-making cycle. An A3IC2 system will not only survive within a complex operational environment, it will thrive, benefiting from the inevitable shocks and volatility of war.