The notion of visual similarity is essential for computer vision, and in applications and studies revolving around vector embeddings of images. However, the scarcity of benchmark datasets poses a significant hurdle in exploring how these models perceive similarity. Here we introduce Style Aligned Artwork Datasets (SALADs), and an example of fruit-SALAD with 10,000 images of fruit depictions. This combined semantic category and style benchmark comprises 100 instances each of 10 easy-to-recognize fruit categories, across 10 easy distinguishable styles. Leveraging a systematic pipeline of generative image synthesis, this visually diverse yet balanced benchmark demonstrates salient differences in semantic category and style similarity weights across various computational models, including machine learning models, feature extraction algorithms, and complexity measures, as well as conceptual models for reference. This meticulously designed dataset offers a controlled and balanced platform for the comparative analysis of similarity perception. The SALAD framework allows the comparison of how these models perform semantic category and style recognition task to go beyond the level of anecdotal knowledge, making it robustly quantifiable and qualitatively interpretable.
The efficacy of interpolating via Variably Scaled Kernels (VSKs) is known to be dependent on the definition of a proper scaling function, but no numerical recipes to construct it are available. Previous works suggest that such a function should mimic the target one, but no theoretical evidence is provided. This paper fills both the gaps: it proves that a scaling function reflecting the target one may lead to enhanced approximation accuracy, and it provides a user-independent tool for learning the scaling function by means of Discontinuous Neural Networks ({\delta}NN), i.e., NNs able to deal with possible discontinuities. Numerical evidence supports our claims, as it shows that the key features of the target function can be clearly recovered in the learned scaling function.
Recently, constructions of optimal linear codes from simplicial complexes have attracted much attention and some related nice works were presented. Let $q$ be a prime power. In this paper, by using the simplicial complexes of ${\mathbb F}_{q}^m$ with one single maximal element, we construct four families of linear codes over the ring ${\mathbb F}_{q}+u{\mathbb F}_{q}$ ($u^2=0$), which generalizes the results of [IEEE Trans. Inf. Theory 66(6):3657-3663, 2020]. The parameters and Lee weight distributions of these four families of codes are completely determined. Most notably, via the Gray map, we obtain several classes of optimal linear codes over ${\mathbb F}_{q}$, including (near) Griesmer codes and distance-optimal codes.
High-performance computing (HPC) has revolutionized our ability to perform detailed simulations of complex real-world processes. A prominent contemporary example is from aerospace propulsion, where HPC is used for rotating detonation rocket engine (RDRE) simulations in support of the design of next-generation rocket engines; however, these simulations take millions of core hours even on powerful supercomputers, which makes them impractical for engineering tasks like design exploration and risk assessment. Reduced-order models (ROMs) address this limitation by constructing computationally cheap yet sufficiently accurate approximations that serve as surrogates for the high-fidelity model. This paper contributes a new distributed algorithm that achieves fast and scalable construction of predictive physics-based ROMs trained from sparse datasets of extremely large state dimension. The algorithm learns structured physics-based ROMs that approximate the dynamical systems underlying those datasets. This enables model reduction for problems at a scale and complexity that exceeds the capabilities of existing approaches. We demonstrate our algorithm's scalability using up to $2,048$ cores on the Frontera supercomputer at the Texas Advanced Computing Center. We focus on a real-world three-dimensional RDRE for which one millisecond of simulated physical time requires one million core hours on a supercomputer. Using a training dataset of $2,536$ snapshots each of state dimension $76$ million, our distributed algorithm enables the construction of a predictive data-driven reduced model in just $13$ seconds on $2,048$ cores on Frontera.
State-of-the-art visual localization methods mostly rely on complex procedures to match local descriptors and 3D point clouds. However, these procedures can incur significant costs in terms of inference, storage, and updates over time. In this study, we propose a direct learning-based approach that utilizes a simple network named D2S to represent complex local descriptors and their scene coordinates. Our method is characterized by its simplicity and cost-effectiveness. It solely leverages a single RGB image for localization during the testing phase and only requires a lightweight model to encode a complex sparse scene. The proposed D2S employs a combination of a simple loss function and graph attention to selectively focus on robust descriptors while disregarding areas such as clouds, trees, and several dynamic objects. This selective attention enables D2S to effectively perform a binary-semantic classification for sparse descriptors. Additionally, we propose a simple outdoor dataset to evaluate the capabilities of visual localization methods in scene-specific generalization and self-updating from unlabeled observations. Our approach outperforms the state-of-the-art CNN-based methods in scene coordinate regression in indoor and outdoor environments. It demonstrates the ability to generalize beyond training data, including scenarios involving transitions from day to night and adapting to domain shifts, even in the absence of the labeled data sources. The source code, trained models, dataset, and demo videos are available at the following link: //thpjp.github.io/d2s.
We present HUP-3D, a 3D multi-view multi-modal synthetic dataset for hand-ultrasound (US) probe pose estimation in the context of obstetric ultrasound. Egocentric markerless 3D joint pose estimation has potential applications in mixed reality based medical education. The ability to understand hand and probe movements programmatically opens the door to tailored guidance and mentoring applications. Our dataset consists of over 31k sets of RGB, depth and segmentation mask frames, including pose related ground truth data, with a strong emphasis on image diversity and complexity. Adopting a camera viewpoint-based sphere concept allows us to capture a variety of views and generate multiple hand grasp poses using a pre-trained network. Additionally, our approach includes a software-based image rendering concept, enhancing diversity with various hand and arm textures, lighting conditions, and background images. Furthermore, we validated our proposed dataset with state-of-the-art learning models and we obtained the lowest hand-object keypoint errors. The dataset and other details are provided with the supplementary material. The source code of our grasp generation and rendering pipeline will be made publicly available.
We present a framework for intuitive robot programming by non-experts, leveraging natural language prompts and contextual information from the Robot Operating System (ROS). Our system integrates large language models (LLMs), enabling non-experts to articulate task requirements to the system through a chat interface. Key features of the framework include: integration of ROS with an AI agent connected to a plethora of open-source and commercial LLMs, automatic extraction of a behavior from the LLM output and execution of ROS actions/services, support for three behavior modes (sequence, behavior tree, state machine), imitation learning for adding new robot actions to the library of possible actions, and LLM reflection via human and environment feedback. Extensive experiments validate the framework, showcasing robustness, scalability, and versatility in diverse scenarios, including long-horizon tasks, tabletop rearrangements, and remote supervisory control. To facilitate the adoption of our framework and support the reproduction of our results, we have made our code open-source. You can access it at: //github.com/huawei-noah/HEBO/tree/master/ROSLLM.
Multiparty message-passing protocols are notoriously difficult to design, due to interaction mismatches that lead to errors such as deadlocks. Existing protocol specification formats have been developed to prevent such errors (e.g. multiparty session types (MPST)). In order to further constrain protocols, specifications can be extended with refinements, i.e. logical predicates to control the behaviour of the protocol based on previous values exchanged. Unfortunately, existing refinement theories and implementations are tightly coupled with specification formats. This paper proposes a framework for multiparty message-passing protocols with refinements and its implementation in Rust. Our work decouples correctness of refinements from the underlying model of computation, which results in a specification-agnostic framework. Our contributions are threefold. First, we introduce a trace system which characterises valid refined traces, i.e. a sequence of sending and receiving actions correct with respect to refinements. Second, we give a correct model of computation named refined communicating system (RCS), which is an extension of communicating automata systems with refinements. We prove that RCS only produce valid refined traces. We show how to generate RCS from mainstream protocol specification formats, such as refined multiparty session types (RMPST) or refined choreography automata. Third, we illustrate the flexibility of the framework by developing both a static analysis technique and an improved model of computation for dynamic refinement evaluation. Finally, we provide a Rust toolchain for decentralised RMPST, evaluate our implementation with a set of benchmarks from the literature, and observe that refinement overhead is negligible.
We consider the statistical linear inverse problem of making inference on an unknown source function in an elliptic partial differential equation from noisy observations of its solution. We employ nonparametric Bayesian procedures based on Gaussian priors, leading to convenient conjugate formulae for posterior inference. We review recent results providing theoretical guarantees on the quality of the resulting posterior-based estimation and uncertainty quantification, and we discuss the application of the theory to the important classes of Gaussian series priors defined on the Dirichlet-Laplacian eigenbasis and Mat\'ern process priors. We provide an implementation of posterior inference for both classes of priors, and investigate its performance in a numerical simulation study.
Although recently several foundation models for satellite remote sensing imagery have been proposed, they fail to address major challenges of real/operational applications. Indeed, embeddings that don't take into account the spectral, spatial and temporal dimensions of the data as well as the irregular or unaligned temporal sampling are of little use for most real world uses.As a consequence, we propose an ALIgned Sits Encoder (ALISE), a novel approach that leverages the spatial, spectral, and temporal dimensions of irregular and unaligned SITS while producing aligned latent representations. Unlike SSL models currently available for SITS, ALISE incorporates a flexible query mechanism to project the SITS into a common and learned temporal projection space. Additionally, thanks to a multi-view framework, we explore integration of instance discrimination along a masked autoencoding task to SITS. The quality of the produced representation is assessed through three downstream tasks: crop segmentation (PASTIS), land cover segmentation (MultiSenGE), and a novel crop change detection dataset. Furthermore, the change detection task is performed without supervision. The results suggest that the use of aligned representations is more effective than previous SSL methods for linear probing segmentation tasks.
A common method for estimating the Hessian operator from random samples on a low-dimensional manifold involves locally fitting a quadratic polynomial. Although widely used, it is unclear if this estimator introduces bias, especially in complex manifolds with boundaries and nonuniform sampling. Rigorous theoretical guarantees of its asymptotic behavior have been lacking. We show that, under mild conditions, this estimator asymptotically converges to the Hessian operator, with nonuniform sampling and curvature effects proving negligible, even near boundaries. Our analysis framework simplifies the intensive computations required for direct analysis.