Unmanned Surface Vehicles (USVs) play a pivotal role in various applications, including water rescue, commercial operations, scientific exploration, and military operations. Effective control of high-speed unmanned surface boats is a critical component of the overall USV system, particularly in challenging environments with complex surface obstacles and dynamic conditions such as time-varying surges, non-directional forces, and unpredictable winds. In this paper, we propose a data-driven control method based on Koopman theory. It constructs a high-dimensional linear model by mapping the low-dimensional nonlinear dynamics into a higher-dimensional linear space identified from data. The observable dynamical system of the USV is reconstructed dynamically using online error learning. To enhance tracking accuracy, we regulate the identified high-dimensional linear system with a Control Lyapunov Function (CLF)-Control Barrier Function (CBF)-Quadratic Programming (QP) approach. This approach compensates for modeling errors, thereby achieving more precise tracking control.
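Koopman-based identification of this kind is often done with extended dynamic mode decomposition (EDMD). The sketch below illustrates that step under stated assumptions: the dictionary `lift` and the toy plant `step` are hypothetical stand-ins, not the USV model or the observables used in the paper, and the lifted model `psi(x_next) ≈ K psi(x) + B u` is fitted by ordinary least squares.

```python
import numpy as np

def lift(x):
    # Hypothetical dictionary of observables: the state plus one nonlinear term.
    x1, x2 = x
    return np.array([x1, x2, x1 ** 2])

def fit_lifted_model(X, U, Y):
    """EDMD with control: least-squares fit of lift(y) ≈ K @ lift(x) + B @ u."""
    psi_x = np.array([lift(x) for x in X])          # (N, n_lift)
    psi_y = np.array([lift(y) for y in Y])          # (N, n_lift)
    Z = np.hstack([psi_x, U])                       # (N, n_lift + m)
    W, *_ = np.linalg.lstsq(Z, psi_y, rcond=None)   # Z @ W ≈ psi_y, W = [K^T; B^T]
    n = psi_x.shape[1]
    return W[:n].T, W[n:].T                         # K: (n_lift, n_lift), B: (n_lift, m)

def step(x, u):
    # Toy nonlinear plant (hypothetical, not a USV model).
    x1, x2 = x
    return np.array([0.9 * x1, 0.8 * x2 + 0.1 * x1 ** 2 + u[0]])

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 2))
U = rng.uniform(-1, 1, size=(200, 1))
Y = np.array([step(x, u) for x, u in zip(X, U)])
K, B = fit_lifted_model(X, U, Y)
```

Because this toy dictionary is closed under the dynamics, the fit recovers the lifted matrices exactly. For a real vessel the lifted model is only approximate, which is where an online error-learning step (for example, periodically refitting on recent data, one simple option and not necessarily the paper's) would come in.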
Among semiparametric regression models, partially linear additive models provide a useful tool for including additive nonparametric components as well as a parametric component when explaining the relationship between the response and a set of explanatory variables. This paper concerns such models under sparsity assumptions for the covariates included in the linear component. Sparse covariates are frequent in regression problems where variable selection is usually of interest. As in other settings, outliers either in the residuals or in the covariates involved in the linear component have a harmful effect. To simultaneously achieve model selection for the parametric component of the model and resistance to outliers, we combine preliminary robust estimators of the additive component with robust linear MM-regression estimators penalized by a term such as SCAD on the coefficients of the parametric part. Under mild assumptions, consistency results and rates of convergence for the proposed estimators are derived. A Monte Carlo study is carried out to compare, under different models and contamination schemes, the performance of the robust proposal with its classical counterpart. The results show the advantage of using the robust approach. Through the analysis of a real data set, we also illustrate the benefits of the proposed procedure.
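To make the combination of a bounded loss and a SCAD penalty concrete, the sketch below implements the standard SCAD penalty of Fan and Li (2001) and the Tukey bisquare loss commonly used in MM-regression. The function `penalized_mm_objective`, its averaging of the loss, and the relative weight of the penalty are illustrative assumptions, not the paper's exact estimator.

```python
def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty (Fan and Li, 2001) for one coefficient; a = 3.7 is the usual default."""
    t = abs(theta)
    if t <= lam:
        return lam * t                                                # L1-like near zero
    if t <= a * lam:
        return (2 * a * lam * t - t ** 2 - lam ** 2) / (2 * (a - 1))  # quadratic blend
    return (a + 1) * lam ** 2 / 2                                     # constant: large coefficients are not shrunk

def tukey_bisquare_rho(u, c=4.685):
    """Bounded Tukey bisquare loss, scaled so that its maximum is 1."""
    if abs(u) > c:
        return 1.0
    return 1.0 - (1.0 - (u / c) ** 2) ** 3

def penalized_mm_objective(residuals, scale, beta, lam):
    """Hypothetical penalized robust objective: mean bounded loss on scaled residuals + SCAD."""
    loss = sum(tukey_bisquare_rho(r / scale) for r in residuals) / len(residuals)
    return loss + sum(scad_penalty(b, lam) for b in beta)
```

The bounded loss limits the influence of large residuals, while SCAD sets small coefficients to zero without biasing large ones, which is why the two pieces suit simultaneous robustness and variable selection.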
Recent progress in artificial intelligence has led to an ever-increasing use of images and videos by machine analysis algorithms, mainly neural networks. Nonetheless, compression, storage, and transmission of media have traditionally been designed with human beings as the viewers of the content. Recent research on image and video coding for machine analysis has progressed mainly in two almost orthogonal directions. The first is represented by end-to-end (E2E) learned codecs, which offer high performance on image coding but are not yet on par with state-of-the-art conventional video codecs and lack interoperability. The second direction uses the Versatile Video Coding (VVC) standard or another conventional video codec (CVC) together with pre- and post-processing operations targeting machine analysis. While CVC-based methods benefit from interoperability and broad hardware and software support, their machine task performance is often below the desired level, particularly at low bitrates. This paper proposes a hybrid codec for machines, called NN-VVC, which combines the advantages of an E2E-learned image codec and a CVC to achieve high performance in both image and video coding for machines. Our experiments show that the proposed system achieves Bjøntegaard Delta rate (BD-rate) reductions of up to 43.20% and 26.8% over VVC for image and video data, respectively, when evaluated on multiple datasets and machine vision tasks. To the best of our knowledge, this is the first research paper showing a hybrid video codec that outperforms VVC on multiple datasets and multiple machine vision tasks.
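The BD-rate figures above summarize the average bitrate difference between two rate-quality curves. The sketch below follows the classic Bjøntegaard procedure (cubic fit of log-rate against quality, integrated over the overlapping quality range). Using a machine-task metric such as mAP on the quality axis instead of PSNR is an assumption for the coding-for-machines setting, and the sample numbers are made up for illustration.

```python
import numpy as np

def bd_rate(rate_anchor, quality_anchor, rate_test, quality_test):
    """Bjøntegaard delta rate in percent (negative = bitrate savings of the test codec)."""
    log_ra = np.log(rate_anchor)
    log_rt = np.log(rate_test)
    # Fit log-rate as a cubic polynomial of the quality metric (PSNR, mAP, ...).
    p_a = np.polyfit(quality_anchor, log_ra, 3)
    p_t = np.polyfit(quality_test, log_rt, 3)
    # Integrate both fits over the overlapping quality interval.
    lo = max(min(quality_anchor), min(quality_test))
    hi = min(max(quality_anchor), max(quality_test))
    int_a = np.polyval(np.polyint(p_a), hi) - np.polyval(np.polyint(p_a), lo)
    int_t = np.polyval(np.polyint(p_t), hi) - np.polyval(np.polyint(p_t), lo)
    avg_log_diff = (int_t - int_a) / (hi - lo)
    return (np.exp(avg_log_diff) - 1.0) * 100.0

# Hypothetical anchor curve (e.g. VVC) and a test codec that needs half the bitrate.
rates = [100.0, 200.0, 400.0, 800.0]
quality = [30.0, 33.0, 36.0, 39.0]
savings = bd_rate(rates, quality, [r / 2 for r in rates], quality)
```

A codec that reaches the same quality at half the bitrate everywhere yields a BD-rate of about -50%.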
A deep learning initialized iterative (Int-Deep) method is developed for numerically solving the Navier-Stokes-Darcy model. To this end, Newton's iterative method is used to solve the corresponding finite element discretized problem. It is proved that this method converges quadratically, with a convergence rate independent of the finite element mesh size, under certain standard conditions. Next, a deep learning algorithm is proposed for solving this nonlinear coupled problem. Following the ideas of earlier work by Huang, Wang and Yang (2020), an Int-Deep algorithm is constructed for this problem to further improve computational efficiency. A series of numerical examples confirm that the Int-Deep algorithm converges rapidly to the true solution and is robust with respect to the physical parameters in the model.
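The Int-Deep idea, a learned approximate solution used as the starting point for Newton's method, can be illustrated on a small nonlinear system. In this sketch the 2×2 system `F` is a toy problem (not the Navier-Stokes-Darcy discretization), and `x_dl` is a hard-coded, hypothetical stand-in for a network prediction.

```python
import numpy as np

def newton(F, J, x0, tol=1e-12, max_iter=20):
    """Newton's method; returns the final iterate and the history of step norms."""
    x = np.asarray(x0, dtype=float)
    steps = []
    for _ in range(max_iter):
        dx = np.linalg.solve(J(x), -F(x))   # solve J(x) dx = -F(x)
        x = x + dx
        steps.append(float(np.linalg.norm(dx)))
        if steps[-1] < tol:
            break
    return x, steps

# Toy nonlinear system with root (sqrt(2), sqrt(2)).
def F(x):
    return np.array([x[0] ** 2 + x[1] ** 2 - 4.0, x[0] - x[1]])

def J(x):
    return np.array([[2.0 * x[0], 2.0 * x[1]], [1.0, -1.0]])

# Hypothetical deep-learning prediction used as the initial guess.
x_dl = np.array([1.5, 1.3])
x_star, steps = newton(F, J, x_dl)
```

Starting close to the solution, the step norms shrink roughly quadratically, so only a handful of Newton iterations are needed. This is the efficiency gain the Int-Deep initialization targets.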
Point clouds are used in various 3D applications such as cross reality (XR) and realistic 3D displays. In some applications, e.g., live streaming of 3D point clouds, real-time denoising is required to enhance visual quality. However, conventional high-precision denoising methods cannot run in real time on large-scale point clouds owing to the computational cost of K-nearest-neighbor graph construction and noise level estimation. This paper proposes fast graph-based denoising (FGBD) for large-scale point clouds. First, high-speed graph construction is achieved by scanning a point cloud in various directions and searching adjacent neighborhoods on the scanning lines. Second, we propose a fast noise level estimation method using eigenvalues of the covariance matrix on a graph. Finally, we propose a low-cost filter selection method that improves denoising accuracy to compensate for the degradation caused by the acceleration algorithms. In our experiments, processing time was reduced dramatically while accuracy was maintained relative to conventional denoising methods. Denoising ran at 30 fps on frames containing approximately 1 million points.
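The covariance-eigenvalue idea behind the noise level estimation can be sketched on a single locally planar patch: for such a patch, the smallest eigenvalue of the covariance matrix approximates the noise variance along the surface normal. This is a generic PCA-based estimator under that planarity assumption, not the paper's graph-based procedure, and the synthetic patch is hypothetical.

```python
import numpy as np

def estimate_noise_sigma(points):
    """Estimate the noise std of a roughly planar patch from its covariance matrix."""
    centered = points - points.mean(axis=0)
    cov = centered.T @ centered / len(points)   # 3x3 covariance
    eigvals = np.linalg.eigvalsh(cov)           # eigenvalues in ascending order
    # Smallest eigenvalue ~ variance along the surface normal for a planar patch.
    return float(np.sqrt(max(eigvals[0], 0.0)))

rng = np.random.default_rng(0)
n = 5000
patch = np.column_stack([
    rng.uniform(0.0, 1.0, n),    # x
    rng.uniform(0.0, 1.0, n),    # y
    rng.normal(0.0, 0.01, n),    # z: plane z = 0 plus Gaussian noise, sigma = 0.01
])
sigma_hat = estimate_noise_sigma(patch)
```

Only a 3×3 symmetric eigenproblem is solved per neighborhood, which keeps this kind of estimate cheap compared with the graph construction itself.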
This paper analyzes the stability of the class of Time-Accurate and Highly-Stable Explicit Runge-Kutta (TASE-RK) methods, introduced in 2021 by Bassenne et al. (J. Comput. Phys.) for the numerical solution of stiff Initial Value Problems (IVPs). Such methods are easy to implement and require the solution of a limited number of linear systems per step, whose coefficient matrices involve the exact Jacobian $J$ of the problem. To significantly reduce the computational cost of TASE-RK methods without altering their consistency properties, $J$ can be replaced in their formulation by a matrix $A$ (not necessarily tied to $J$), for instance one that is kept fixed for a certain number of consecutive steps or is even constant. However, the stability properties of TASE-RK methods strongly depend on this choice and have so far been studied only under the assumption $A=J$. In this manuscript, we theoretically investigate the conditional and unconditional stability of TASE-RK methods for arbitrary $A$. To this end, we first split the Jacobian as $J=A+B$. Then, using stability diagrams and their connections with the field of values, we analyze both the case in which $A$ and $B$ are simultaneously diagonalizable and the case in which they are not. Numerical experiments on Partial Differential Equations (PDEs) arising from applications confirm the correctness and usefulness of the theoretical results, as well as the good stability and efficiency of TASE-RK methods when $A$ is suitably chosen.
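The effect of choosing $A \neq J$ can be seen on the scalar test equation $y' = \lambda y$ with $\lambda = a + b$, which is what the simultaneously diagonalizable case reduces to eigenvalue by eigenvalue. This sketch assumes the first-order TASE operator $T = (I - \alpha h A)^{-1}$ combined with forward Euler. It illustrates the kind of stability question studied, not the paper's general TASE-RK analysis.

```python
def tase_euler_factor(lam, a, h, alpha=1.0):
    """Amplification factor for y' = lam*y when forward Euler is applied to y' = T*lam*y,
    with the (assumed) first-order TASE operator T = 1 / (1 - alpha*h*a)."""
    T = 1.0 / (1.0 - alpha * h * a)
    return 1.0 + h * T * lam

def is_stable(lam, a, h, alpha=1.0):
    # Works for real or complex lam and a: stability means |R| <= 1.
    return abs(tase_euler_factor(lam, a, h, alpha)) <= 1.0

h = 0.1
lam = -1000.0                                # stiff eigenvalue of J
print(is_stable(lam, a=lam, h=h))            # A = J
print(is_stable(lam, a=-900.0, h=h))         # A != J, B = -100
print(is_stable(lam, a=-10.0, h=h))          # A too far from J
```

With $A = J$ and $\alpha = 1$, the factor equals $1/(1 - h\lambda)$, i.e. backward Euler's. A nearby constant $A$ still gives a stable step, while a poor choice of $A$ does not, which mirrors the dependence on $A$ discussed above.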
This study presents a control framework that leverages vision-language models (VLMs) for multiple tasks and robots. Existing VLM-based control methods achieve high performance across various tasks and robots in the training environment. However, these methods incur high costs for learning control policies for tasks and robots outside the training environment. For industrial and household robots, learning in the novel environments where the robots are deployed is challenging. To address this issue, we propose a control framework that does not require learning control policies. Our framework combines the vision-language model CLIP with randomized control. CLIP computes the similarity between images and texts by embedding them in a shared feature space. This study uses CLIP to compute the similarity between camera images and text describing the target state. In our method, the robot is controlled by a randomized controller that simultaneously explores and climbs the similarity gradient toward higher similarity. Moreover, we fine-tune CLIP to improve the performance of the proposed method. Finally, we confirm the effectiveness of our approach through a multitask simulation and real-robot experiments using a two-wheeled robot and a robot arm.
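The sketch below illustrates similarity-driven randomized control in its simplest form, random-perturbation hill climbing on a cosine similarity. The embedding function `embed_observation` and the `text_embedding` vector are hypothetical stand-ins for CLIP's image and text encoders, and the "robot" is simulated by perturbing its state directly. The paper's controller may differ in how it explores and accepts actions.

```python
import numpy as np

def cosine_similarity(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def embed_observation(x):
    # Hypothetical stand-in for a CLIP image embedding of the camera view at state x.
    return np.array([x[0], x[1], 1.0])

def randomized_control(x0, target_embedding, steps=2000, sigma=0.05, seed=0):
    """Try random actions; keep one only if it increases image-text similarity."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    best = cosine_similarity(embed_observation(x), target_embedding)
    for _ in range(steps):
        action = rng.normal(scale=sigma, size=x.shape)   # exploratory action
        candidate = x + action                           # hypothetical simple dynamics
        score = cosine_similarity(embed_observation(candidate), target_embedding)
        if score > best:                                 # move uphill in similarity
            x, best = candidate, score
    return x, best

text_embedding = np.array([1.0, 0.5, 1.0])   # hypothetical "target state" text embedding
x_final, sim = randomized_control([0.0, 0.0], text_embedding)
```

No policy is trained here: the only learned component is the similarity model, which matches the motivation of avoiding policy learning in new environments.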
In this paper, we develop a linear expectile hidden Markov model for the analysis of cryptocurrency time series in a risk management framework. The proposed methodology makes it possible to focus on extreme returns and describe their temporal evolution by introducing time-dependent coefficients that evolve according to a latent discrete homogeneous Markov chain. As is common in the expectile literature, estimation of the model parameters is based on the asymmetric normal distribution. Maximum likelihood estimates are obtained via an Expectation-Maximization algorithm with efficient M-step update formulas for all parameters. We evaluate the proposed method on both artificial data under several experimental settings and real data, investigating the relationship between daily Bitcoin returns and major world market indices.
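As background for the expectile framework, the $\tau$-expectile minimizes the asymmetric squared loss $|\tau - \mathbf{1}(u<0)|\,u^2$, the same loss that appears in the exponent of the asymmetric normal density. The sketch below computes a sample expectile by iteratively reweighted means. It is a generic illustration and not the paper's EM algorithm, and the data are made up.

```python
def asymmetric_squared_loss(u, tau):
    """Expectile loss: weight tau for non-negative residuals, 1 - tau for negative ones."""
    return abs(tau - (u < 0)) * u ** 2

def expectile(y, tau, tol=1e-12, max_iter=100):
    """Sample tau-expectile via iteratively reweighted means."""
    e = sum(y) / len(y)   # start from the mean, which is the 0.5-expectile
    for _ in range(max_iter):
        w = [tau if yi > e else 1 - tau for yi in y]
        new_e = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
        if abs(new_e - e) < tol:
            return new_e
        e = new_e
    return e

returns = [1.0, 2.0, 3.0, 4.0, 10.0]   # hypothetical sample
```

A high $\tau$ pulls the expectile toward the upper tail (here from 4 to about 7.69 for $\tau = 0.9$), which is how expectiles let the model focus on extreme returns.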
Agents built around Large Language Models (LLMs) can now automate mobile device operations for users. After fine-tuning on a user's mobile operations, these agents can follow high-level user instructions online. They perform goal decomposition, sub-goal sequencing, and interactive environment exploration until the final objective is achieved. However, mobile operations involve personalized user data, which raises privacy concerns and requires user confirmation. Moreover, users' real-world operations are exploratory, and the resulting action data are complex and redundant, which complicates agent learning. To address these issues in our practical application, we design interactive tasks between agents and humans to identify sensitive information and align with personalized user needs. Additionally, we integrate Standard Operating Procedure (SOP) information into the model's in-context learning to improve the agent's comprehension of complex task execution. Our approach is evaluated on the new device-control benchmark AitW, which encompasses 30K unique instructions across multi-step tasks, including application operation, web searching, and web shopping. Experimental results show that the SOP-based agent achieves state-of-the-art performance among LLM-based agents without incurring additional inference costs, with an overall action success rate of 66.92%. The code and data examples are available at //github.com/alipay/mobile-agent.
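One way to picture SOP information inside in-context learning is a prompt that places the procedure next to the instruction and the action history. The sketch below is purely illustrative. The prompt layout, field names, and the example SOP are hypothetical and are not taken from the paper's code.

```python
def build_sop_prompt(instruction, sop_steps, history):
    """Assemble an in-context prompt that embeds a Standard Operating Procedure (SOP).
    The layout and field names are hypothetical."""
    lines = [f"Instruction: {instruction}", "Standard Operating Procedure:"]
    lines += [f"{i}. {step}" for i, step in enumerate(sop_steps, start=1)]
    lines.append("Actions so far:")
    if history:
        lines += [f"- {action}" for action in history]
    else:
        lines.append("- (none)")
    lines.append("Next action:")
    return "\n".join(lines)

prompt = build_sop_prompt(
    "Search for running shoes and add the cheapest pair to the cart",
    ["Open the shopping app", "Type the query in the search bar",
     "Sort results by price", "Open the first result", "Tap 'Add to cart'"],
    [],
)
```

Because the SOP only adds text to the prompt, it requires no extra model calls, which is consistent with the claim of no additional inference cost.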
We tackle the challenging task of monitoring the performance of CFD applications on unstable HPC platforms throughout their development. We have designed and implemented a monitoring framework integrated at the end of a CI/CD pipeline. Measurements collected during the automatic execution of production simulations are analyzed in a visual analytics interface we developed, which provides advanced visualizations and interactions. We have validated this approach by monitoring the CFD code Alya over two years, detecting and resolving platform-related issues and highlighting performance improvements.
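On an unstable platform, a single slow run is not necessarily a regression, so comparisons against past runs need some robustness to noise. The sketch below flags a run whose time exceeds the historical median by more than a threshold of robust standard deviations, computed from the median absolute deviation (MAD). The function, threshold, and timings are hypothetical illustrations, not part of the framework described above.

```python
import statistics

def is_regression(history, new_time, threshold=3.0):
    """Flag a run whose time exceeds the historical median by more than
    `threshold` robust standard deviations (estimated from the MAD)."""
    med = statistics.median(history)
    mad = statistics.median([abs(t - med) for t in history])
    robust_std = 1.4826 * mad   # MAD -> standard deviation for normally distributed noise
    if robust_std == 0:
        return new_time > med
    return (new_time - med) / robust_std > threshold

timings = [10.0, 10.2, 9.9, 10.1, 10.0]   # hypothetical seconds per run from past CI jobs
```

Median and MAD are used instead of mean and standard deviation so that a few outlier runs caused by platform instability do not mask or trigger alerts.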
We present ResMLP, an architecture built entirely upon multi-layer perceptrons for image classification. It is a simple residual network that alternates (i) a linear layer in which image patches interact, independently and identically across channels, and (ii) a two-layer feed-forward network in which channels interact independently per patch. When trained with a modern training strategy using heavy data augmentation and, optionally, distillation, it attains surprisingly good accuracy/complexity trade-offs on ImageNet. We will share our code, based on the Timm library, and pre-trained models.
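The two alternating sublayers can be sketched in NumPy on a `(num_patches, channels)` array: the cross-patch layer multiplies from the left (mixing patches with the same weights for every channel), and the feed-forward network multiplies from the right (mixing channels separately for each patch). This is an illustrative sketch, not the released implementation. The parameter names are hypothetical, and the per-channel affine transform in place of LayerNorm and the per-channel residual scaling (`ls1`, `ls2`) are assumptions.

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))

def affine(x, alpha, beta):
    # Per-channel affine transform (assumed here in place of LayerNorm).
    return alpha * x + beta

def resmlp_block(x, p):
    """x: (num_patches, channels); p: dict of parameters (hypothetical names)."""
    # (i) Cross-patch linear layer: W_patch (N x N) mixes patches, shared across channels.
    z = affine(x, p["a1"], p["b1"])
    x = x + p["ls1"] * (p["W_patch"] @ z + p["c_patch"][:, None])
    # (ii) Two-layer feed-forward network: mixes channels independently for each patch.
    z = affine(x, p["a2"], p["b2"])
    h = gelu(z @ p["W1"] + p["c1"])
    return x + p["ls2"] * (h @ p["W2"] + p["c2"])

rng = np.random.default_rng(0)
N, d = 16, 8   # patches, channels
p = {
    "a1": np.ones(d), "b1": np.zeros(d), "a2": np.ones(d), "b2": np.zeros(d),
    "W_patch": rng.normal(scale=0.1, size=(N, N)), "c_patch": np.zeros(N),
    "W1": rng.normal(scale=0.1, size=(d, 4 * d)), "c1": np.zeros(4 * d),
    "W2": rng.normal(scale=0.1, size=(4 * d, d)), "c2": np.zeros(d),
    "ls1": 0.1 * np.ones(d), "ls2": 0.1 * np.ones(d),
}
x = rng.normal(size=(N, d))
y = resmlp_block(x, p)
```

Stacking such blocks after a linear patch embedding, followed by pooling and a linear classifier, gives the overall residual MLP structure described above.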