We study the linear elasticity system subject to singular forces. We show existence and uniqueness of solutions in two frameworks: weighted Sobolev spaces, where the weight belongs to the Muckenhoupt class $A_2$, and standard Sobolev spaces, where the integrability index is less than $d/(d-1)$, with $d$ the spatial dimension. We propose a standard finite element scheme and provide optimal error estimates in the $\mathbf{L}^2$--norm. By proving well-posedness, we clarify some issues concerning the study of generalized mixed problems in Banach spaces.
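For orientation, the two conditions named in this abstract can be written out as follows. This is a sketch using standard definitions, not notation taken from the paper itself; the Dirac delta $\delta_{x_0}$ is assumed here only as a typical example of a singular force, and $\Omega \subset \mathbb{R}^d$ (with $d \ge 2$) denotes a hypothetical bounded domain.
\[
w \in A_2 \quad \Longleftrightarrow \quad \sup_{B} \left( \frac{1}{|B|} \int_B w \, dx \right) \left( \frac{1}{|B|} \int_B w^{-1} \, dx \right) < \infty ,
\]
where the supremum is taken over all balls $B \subset \mathbb{R}^d$. For the unweighted setting, with $1/p + 1/p' = 1$,
\[
\delta_{x_0} \in W^{-1,p}(\Omega) \quad \Longleftrightarrow \quad W^{1,p'}_0(\Omega) \hookrightarrow C(\overline{\Omega}) \quad \Longleftrightarrow \quad p' > d \quad \Longleftrightarrow \quad p < \frac{d}{d-1} ,
\]
which is one way to see where the threshold $d/(d-1)$ on the integrability index comes from.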
We give the first numerical calculation of the spectrum of the Laplacian acting on bundle-valued forms on a Calabi-Yau three-fold. Specifically, we show how to compute the approximate eigenvalues and eigenmodes of the Dolbeault Laplacian acting on bundle-valued $(p,q)$-forms on K\"ahler manifolds. We restrict our attention to line bundles over complex projective space and Calabi-Yau hypersurfaces therein. We give three examples. For two of these, $\mathbb{P}^3$ and a Calabi-Yau one-fold (a torus), we compare our numerics with exact results available in the literature and find complete agreement. For the third example, the Fermat quintic three-fold, there are no known analytic results, so our numerical calculations are the first of their kind. The resulting spectra pass a number of non-trivial checks that arise from Serre duality and the Hodge decomposition. The outputs of our algorithm include all the ingredients one needs to compute physical Yukawa couplings in string compactifications.
We present HoVer-UNet, an approach to distill the knowledge of the multi-branch HoVerNet framework for nuclei instance segmentation and classification in histopathology. We propose a compact, streamlined single UNet network with a Mix Vision Transformer backbone, and equip it with a custom loss function to optimally encode the distilled knowledge of HoVerNet, reducing computational requirements without compromising performance. We show that our model achieves results comparable to HoVerNet on the public PanNuke and CoNSeP datasets with a three-fold reduction in inference time. We make the code of our model publicly available at //github.com/DIAGNijmegen/HoVer-UNet.
We present an efficient matrix-free geometric multigrid method for the elastic Helmholtz equation, together with a suitable discretization. Many discretization methods have been considered in the literature for the Helmholtz equations, as well as many solvers and preconditioners, some of which have been adapted to the elastic version of the equation. However, there is very little work considering the interplay between the discretization and the solver. In this work, we aim to bridge this gap. By choosing an appropriate stencil for re-discretization of the equation on the coarse grid, we develop a multigrid method that can be easily implemented as matrix-free, relying on stencils rather than sparse matrices. This is crucial for efficient implementation on modern hardware. Using two-grid local Fourier analysis, we validate the compatibility of our discretization with our solver, and tune the stencil weights so that the convergence rate of the multigrid cycle is optimal. The result is a scalable multigrid preconditioner that can tackle large real-world 3D scenarios.
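To illustrate what "matrix-free, relying on stencils rather than sparse matrices" means, here is a minimal sketch. It is not the authors' method: it assumes the much simpler scalar 1D Helmholtz operator $-u'' - k^2 u$ with homogeneous Dirichlet boundaries and the standard three-point stencil, and the function names `apply_helmholtz_stencil` and `assemble_dense` are hypothetical.

```python
import numpy as np


def apply_helmholtz_stencil(u, h, k):
    """Apply (-d^2/dx^2 - k^2) to u without ever forming a matrix.

    Assumption: 1D scalar problem, uniform grid spacing h, wavenumber k,
    homogeneous Dirichlet boundaries (zero ghost values at both ends).
    """
    padded = np.pad(u, 1)  # adds one zero on each side (constant padding)
    laplacian = (2.0 * padded[1:-1] - padded[:-2] - padded[2:]) / h**2
    return laplacian - k**2 * u


def assemble_dense(n, h, k):
    """Reference only: the same operator as an explicit n-by-n matrix."""
    second_diff = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    return second_diff / h**2 - k**2 * np.eye(n)


if __name__ == "__main__":
    n = 16
    h = 1.0 / (n + 1)
    k = 3.0
    u = np.linspace(0.0, 1.0, n) ** 2
    # The stencil application and the assembled matrix agree.
    print(np.allclose(apply_helmholtz_stencil(u, h, k), assemble_dense(n, h, k) @ u))
```

The stencil version stores only the grid function and a few coefficients, which is the property that makes this style of implementation attractive on memory-bound hardware. The dense matrix is built here only to check the result.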
We introduce the new setting of open-vocabulary object 6D pose estimation, in which a textual prompt is used to specify the object of interest. In contrast to existing approaches, in our setting (i) the object of interest is specified solely through the textual prompt, (ii) no object model (e.g. CAD or video sequence) is required at inference, (iii) the object is imaged from two different viewpoints of two different scenes, and (iv) the object was not observed during the training phase. To operate in this setting, we introduce a novel approach that leverages a Vision-Language Model to segment the object of interest from two distinct scenes and to estimate its relative 6D pose. The key to our approach is a carefully devised strategy to fuse object-level information provided by the prompt with local image features, resulting in a feature space that can generalize to novel concepts. We validate our approach on a new benchmark based on two popular datasets, REAL275 and Toyota-Light, which collectively encompass 39 object instances appearing in four thousand image pairs. The results demonstrate that our approach outperforms both a well-established hand-crafted method and a recent deep learning-based baseline in estimating the relative 6D pose of objects in different scenes. Project website: //jcorsetti.github.io/oryon-website/.
Robots operating in an open world will encounter novel objects with unknown physical properties, such as mass, friction, or size. These robots will need to sense these properties through interaction prior to performing downstream tasks with the objects. We propose a method that autonomously learns tactile exploration policies by developing a generative world model that is leveraged to 1) estimate the object's physical parameters using a differentiable Bayesian filtering algorithm and 2) develop an exploration policy using an information-gathering model predictive controller. We evaluate our method on three simulated tasks where the goal is to estimate a desired object property (mass, height or toppling height) through physical interaction. We find that our method is able to discover policies that efficiently gather information about the desired property in an intuitive manner. Finally, we validate our method on a real robot system for the height estimation task, where our method is able to successfully learn and execute an information-gathering policy from scratch.
We present ParrotTTS, a modularized text-to-speech synthesis model leveraging disentangled self-supervised speech representations. It can train a multi-speaker variant effectively using transcripts from a single speaker. ParrotTTS adapts to a new language in a low-resource setup and generalizes to languages not seen while training the self-supervised backbone. Moreover, without training on bilingual or parallel examples, ParrotTTS can transfer voices across languages while preserving speaker-specific characteristics, e.g., synthesizing fluent Hindi speech using a French speaker's voice and accent. We present extensive results in monolingual and multi-lingual scenarios. ParrotTTS outperforms state-of-the-art multi-lingual TTS models while using only a fraction of the paired data that the latter require.
Model sparsification in deep learning promotes simpler, more interpretable models with fewer parameters. This not only reduces the model's memory footprint and computational needs but also shortens inference time. This work focuses on creating sparse models optimized for multiple tasks with fewer parameters. These parsimonious models also have the potential to match or outperform dense models in terms of performance. In this work, we introduce channel-wise l1/l2 group sparsity in the parameters (or weights) of the shared convolutional layers of the multi-task learning model. This approach facilitates the removal of extraneous groups, i.e., channels (due to l1 regularization), and also imposes a penalty on the weights, further enhancing the learning efficiency for all tasks (due to l2 regularization). We analyzed the results of group sparsity in both single-task and multi-task settings on two widely used Multi-Task Learning (MTL) datasets: NYU-v2 and CelebAMask-HQ. On both datasets, which consist of three different computer vision tasks each, multi-task models with approximately 70% sparsity outperform their dense equivalents. We also investigate how changing the degree of sparsification influences the model's performance, the overall sparsity percentage, the patterns of sparsity, and the inference time.
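The channel-wise l1/l2 penalty described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: groups are taken to be output channels of a weight tensor shaped (out_channels, in_channels, kh, kw), and all function names are hypothetical. The block soft-thresholding step is a standard proximal operator for this penalty, included only to show how whole channels become exactly zero; the abstract does not say which optimizer the paper uses.

```python
import numpy as np


def group_sparsity_penalty(weight):
    """l1/l2 penalty: the sum (l1) over channels of each channel's l2 norm.

    Assumption: weight has shape (out_channels, in_channels, kh, kw) and
    each output channel forms one group.
    """
    flat = weight.reshape(weight.shape[0], -1)
    return float(np.linalg.norm(flat, axis=1).sum())


def prox_group_l1(weight, threshold):
    """Block soft-thresholding: shrink each channel's norm by `threshold`.

    Channels whose l2 norm is at most `threshold` are set exactly to zero.
    """
    flat = weight.reshape(weight.shape[0], -1)
    norms = np.linalg.norm(flat, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - threshold / np.maximum(norms, 1e-12))
    return (flat * scale).reshape(weight.shape)


def channel_sparsity(weight, tol=1e-8):
    """Fraction of output channels whose weights are (numerically) all zero."""
    flat = weight.reshape(weight.shape[0], -1)
    return float(np.mean(np.linalg.norm(flat, axis=1) <= tol))


if __name__ == "__main__":
    w = np.array([[3.0, 4.0], [0.3, 0.4]]).reshape(2, 1, 1, 2)
    print(group_sparsity_penalty(w))                 # channel norms 5.0 and 0.5
    print(channel_sparsity(prox_group_l1(w, 1.0)))   # the weaker channel is removed
```

Because the l2 norm is taken inside each group and the l1 sum across groups, the penalty tends to switch off entire channels rather than scattered individual weights, which is what allows the pruned channels to be dropped from the network at inference time.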
We study the replica symmetry breaking (rsb) concepts on a generic level through the prism of recently introduced interpolating/comparison mechanisms for bilinearly indexed (bli) random processes. In particular, \cite{Stojnicnflgscompyx23} introduced a \emph{fully lifted} (fl) interpolating mechanism and \cite{Stojnicsflgscompyx23} developed its \emph{stationarized fully lifted} (sfl) variant. Here, we present an sfl \emph{matching} mechanism that shows that the results obtained in \cite{Stojnicnflgscompyx23,Stojnicsflgscompyx23} completely correspond to the ones obtained by a statistical physics replica tool with the rsb form suggested by Parisi in \cite{Par79,Parisi80,Par80}. The results are very generic, as they allow one to handle essentially all bilinear models at once. Moreover, given that the results of \cite{Stojnicnflgscompyx23,Stojnicsflgscompyx23} are extendable to many other forms, the concepts presented here automatically extend to any such forms as well.
We study damped wave propagation problems phrased as abstract evolution equations in Hilbert spaces. Under some general assumptions, including a natural compatibility condition for initial values, we establish exponential decay estimates for all mild solutions using the language and tools of Hilbert complexes. This framework turns out to be strong enough to conduct our analysis but also general enough to include a number of interesting examples, some of which are briefly discussed. By a slight modification of the main arguments, we also obtain corresponding decay results for numerical approximations obtained by compatible discretization strategies.
Positron Emission Tomography (PET) enables functional imaging of deep brain structures, but the bulk and weight of current systems preclude their use during many natural human activities, such as locomotion. The proposed long-term solution is to construct a robotic system that can support an imaging system surrounding the subject's head, and then move the system to accommodate natural motion. This requires a system to measure the motion of the head with respect to the imaging ring, for use by both the robotic system and the image reconstruction software. We report here the design, calibration, and experimental evaluation of a parallel string encoder mechanism for sensing this motion. Our results indicate that with kinematic calibration, the measurement system can achieve accuracy within 0.5 mm, especially for small motions.