We introduce the new setting of open-vocabulary object 6D pose estimation, in which a textual prompt is used to specify the object of interest. In contrast to existing approaches, in our setting (i) the object of interest is specified solely through the textual prompt, (ii) no object model (e.g. CAD or video sequence) is required at inference, (iii) the object is imaged from two different viewpoints in two different scenes, and (iv) the object was not observed during the training phase. To operate in this setting, we introduce a novel approach that leverages a Vision-Language Model to segment the object of interest from two distinct scenes and to estimate its relative 6D pose. The key to our approach is a carefully devised strategy to fuse object-level information provided by the prompt with local image features, resulting in a feature space that can generalize to novel concepts. We validate our approach on a new benchmark based on two popular datasets, REAL275 and Toyota-Light, which collectively encompass 39 object instances appearing in four thousand image pairs. The results demonstrate that our approach outperforms both a well-established hand-crafted method and a recent deep learning-based baseline in estimating the relative 6D pose of objects in different scenes. Project page: //jcorsetti.github.io/oryon/.
Faithfully summarizing the knowledge encoded by a deep neural network (DNN) into a few symbolic primitive patterns without losing much information represents a core challenge in explainable AI. To this end, Ren et al. (2023c) have derived a series of theorems to prove that the inference score of a DNN can be explained as a small set of interactions between input variables. However, the lack of generalization power still makes it hard to regard such interactions as faithful primitive patterns encoded by the DNN. Therefore, given different DNNs trained for the same task, we develop a new method to extract interactions that are shared by these DNNs. Experiments show that the extracted interactions can better reflect common knowledge shared by different DNNs.
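As one concrete reading of this decomposition, the sketch below computes Harsanyi-dividend interactions, I(S) = Σ_{T⊆S} (−1)^{|S|−|T|} v(T), for a toy scalar function v of binary input variables and checks that they sum back to the full model output. The toy function and this particular normalisation are our illustrative assumptions, not necessarily the exact definitions of the cited work.

```python
from itertools import chain, combinations

# Harsanyi-dividend interaction: I(S) = sum over T subset of S of
# (-1)^(|S|-|T|) * v(T).  Summing I(S) over all S recovers v(N).
def subsets(s):
    s = list(s)
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

def interaction(v, S):
    return sum((-1) ** (len(S) - len(T)) * v(frozenset(T)) for T in subsets(S))

# Toy "model score" on n = 3 binary variables: an AND of variables 0
# and 1, plus a singleton effect of variable 2 (our made-up example).
n = 3
def v(T):
    return 2.0 * (0 in T and 1 in T) + 0.5 * (2 in T)

N = frozenset(range(n))
total = sum(interaction(v, S) for S in map(frozenset, subsets(N)))
print(total == v(N))   # → True: the interactions sum back to the output
```

Here the decomposition is sparse: only I({0,1}) = 2.0 and I({2}) = 0.5 are nonzero, mirroring the claim that a DNN's output can be explained by a small set of interactions.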
This work aims at making a comprehensive contribution in the general area of parametric inference for discretely observed diffusion processes. Established approaches for likelihood-based estimation invoke a time-discretisation scheme for the approximation of the intractable transition dynamics of the Stochastic Differential Equation (SDE) model over finite time periods. The scheme is applied for a step-size that is either user-selected or determined by the data. Recent research has highlighted the critical effect of the choice of numerical scheme on the behaviour of derived parameter estimates in the setting of hypo-elliptic SDEs. In brief, our work first develops two weak second-order sampling schemes (covering both hypo-elliptic and elliptic SDEs) and produces a small-time expansion for the density of the schemes to form a proxy for the true intractable SDE transition density. We then establish a collection of analytic results for likelihood-based parameter estimates obtained via the formed proxies, thus providing a theoretical framework that showcases the advantages of the developed methodology for SDE calibration. We present numerical results from carrying out classical or Bayesian inference, for both elliptic and hypo-elliptic SDEs.
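For readers unfamiliar with such time-discretisation schemes, the sketch below simulates the simplest one (Euler–Maruyama) on a toy Ornstein–Uhlenbeck model whose exact moments are known in closed form; this is illustrative background only, and the weak second-order schemes developed in the work itself are not reproduced here.

```python
import numpy as np

# Euler-Maruyama for the Ornstein-Uhlenbeck SDE dX = -theta*X dt + sigma dW,
# a toy elliptic model.  One step replaces the intractable transition over
# [t, t+dt] by a Gaussian increment; exact mean/variance at time T are
# known, so the discretisation can be checked directly.
rng = np.random.default_rng(0)
theta, sigma, x0 = 1.0, 0.5, 1.0
T, n_steps, n_paths = 1.0, 200, 20000
dt = T / n_steps

x = np.full(n_paths, x0)
for _ in range(n_steps):
    dW = rng.normal(scale=np.sqrt(dt), size=n_paths)  # Brownian increment
    x = x + (-theta * x) * dt + sigma * dW            # Euler-Maruyama step

mean_exact = x0 * np.exp(-theta * T)
var_exact = sigma**2 / (2.0 * theta) * (1.0 - np.exp(-2.0 * theta * T))
print(f"MC mean {x.mean():.3f} (exact {mean_exact:.3f}), "
      f"MC var {x.var():.3f} (exact {var_exact:.3f})")
```

Euler–Maruyama is only weak first order; the point of higher-order schemes such as those developed here is that the moment error above shrinks faster as dt decreases.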
Background: Recent studies have used basic epicardial adipose tissue (EAT) assessments (e.g., volume and mean HU) to predict risk of atherosclerosis-related, major adverse cardiovascular events (MACE). Objectives: Create novel, hand-crafted EAT features, 'fat-omics', to capture the pathophysiology of EAT and improve MACE prediction. Methods: We segmented EAT using a previously-validated deep learning method with optional manual correction. We extracted 148 radiomic features (morphological, spatial, and intensity) and used Cox elastic-net for feature reduction and prediction of MACE. Results: Traditional fat features gave marginal prediction (EAT-volume/EAT-mean-HU/BMI gave C-index 0.53/0.55/0.57, respectively). Significant improvement was obtained with 15 fat-omics features (C-index=0.69, test set). High-risk features included volume-of-voxels-having-elevated-HU-[-50, -30-HU] and HU-negative-skewness, both of which assess high HU, which has been implicated in fat inflammation. Other high-risk features include kurtosis-of-EAT-thickness, reflecting the heterogeneity of thicknesses, and EAT-volume-in-the-top-25%-of-the-heart, emphasizing adipose near the proximal coronary arteries. Kaplan-Meier plots of Cox-identified, high- and low-risk patients were well separated when split at the median fat-omics risk, with the high-risk group having a hazard ratio 2.4 times that of the low-risk group (P<0.001). Conclusion: Preliminary findings indicate an opportunity to use more finely tuned, explainable assessments of EAT for improved cardiovascular risk prediction.
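The C-index values quoted above are computed with Harrell's concordance index for right-censored survival data; below is a minimal sketch on made-up data (the pairwise comparability rule is the standard one, but the toy times, events, and risk scores are ours).

```python
# Harrell's C-index: the fraction of comparable patient pairs that the
# risk score orders correctly.  A pair (i, j) is comparable when patient
# i has an observed event (event=1, e.g. MACE) strictly before patient
# j's follow-up time; censored patients (event=0) only serve as the
# later member of a pair.
def c_index(times, events, risks):
    concordant, ties, comparable = 0, 0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1        # earlier event, higher risk: correct
                elif risks[i] == risks[j]:
                    ties += 1              # ties count half
    return (concordant + 0.5 * ties) / comparable

# Toy cohort (ours): follow-up years, MACE indicator, model risk score.
times  = [2.0, 4.0, 5.0, 6.0, 9.0]
events = [1,   1,   0,   1,   0  ]
risks  = [0.9, 0.7, 0.6, 0.2, 0.1]
print(c_index(times, events, risks))   # → 1.0 (risks perfectly ordered)
```

A score of 0.5 corresponds to random ranking, which is why the EAT-volume C-index of 0.53 above is described as marginal.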
The Crank-Nicolson (CN) method is a well-known time integrator for evolutionary partial differential equations (PDEs) arising in many real-world applications. Since the solution at any time depends on the solution at previous time steps, the CN method is inherently difficult to parallelize. In this paper, we consider a parallel method for the solution of evolutionary PDEs with the CN scheme. Using an all-at-once approach, we can solve for all time steps simultaneously using a parallelizable-over-time preconditioner within a standard iterative method. Due to the diagonalization of the proposed preconditioner, we can prove that most eigenvalues of the preconditioned matrix are equal to 1 and the others lie in the set $\left\{z\in\mathbb{C}: 1/(1 + \alpha) < |z| < 1/(1 - \alpha)~{\rm and}~\mathrm{Re}(z) > 0\right\}$, where $0 < \alpha < 1$ is a free parameter. Meanwhile, the efficient implementation of the proposed preconditioner is described and a mesh-independent convergence rate of the preconditioned GMRES method is derived under certain conditions. Finally, we verify our theoretical findings via numerical experiments on financial option pricing PDEs.
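As a toy illustration of this eigenvalue clustering, the sketch below applies an α-circulant-style preconditioner to the all-at-once matrix of CN for the scalar test problem u' = λu (our simplified stand-in, not the paper's exact construction): the preconditioned matrix differs from the identity by a rank-one term, so all but one eigenvalue equal 1 and the remaining one lands in the stated annulus.

```python
import numpy as np

# CN for u' = lam*u gives an all-at-once lower-bidiagonal Toeplitz
# matrix T (diagonal a, subdiagonal b).  The alpha-circulant
# preconditioner P copies T but puts alpha*b in the top-right corner,
# making P diagonalizable by a scaled FFT (hence parallel over time)
# while T - P has rank one.
n, tau, lam, alpha = 8, 0.1, -1.0, 0.5
a = 1.0 - tau * lam / 2.0
b = -(1.0 + tau * lam / 2.0)

T = a * np.eye(n) + b * np.eye(n, k=-1)
P = T.copy()
P[0, -1] = alpha * b                        # alpha-circulant corner entry

eig = np.linalg.eigvals(np.linalg.solve(P, T))
ones = np.isclose(eig, 1.0)
outlier = eig[~ones]
print(int(ones.sum()))                      # rank-one difference: n - 1 unit eigenvalues
```

For this scalar model the single outlying eigenvalue indeed satisfies 1/(1 + α) < |z| < 1/(1 − α) with positive real part, matching the set described above.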
In this manuscript we propose and analyze an implicit two-point type method (or inertial method) for obtaining stable approximate solutions to linear ill-posed operator equations. The method is based on the iterated Tikhonov (iT) scheme. We establish convergence for exact data, and stability and semi-convergence for noisy data. In the numerical experiments we consider: i) a 2D inverse potential problem, and ii) an image deblurring problem; the computational efficiency of the method is compared with standard implementations of the iT method.
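A generic two-point (inertial) variant of the iterated Tikhonov iteration can be sketched as below on a small ill-conditioned system; the extrapolation weight β and the exact update form are illustrative choices of ours and may differ from the method analyzed in this manuscript.

```python
import numpy as np

# Two-point (inertial) iterated Tikhonov on an ill-conditioned system
# A x = b: extrapolate from the two most recent iterates, then apply
# the usual iterated-Tikhonov correction.  beta and the update form are
# illustrative assumptions, not the paper's exact scheme.
n = 20
A = np.vander(np.linspace(0.0, 1.0, n), n, increasing=True)  # ill-conditioned
x_true = np.ones(n)
b = A @ x_true                                # exact (noise-free) data

alpha, beta = 1e-3, 0.4
step = np.linalg.inv(A.T @ A + alpha * np.eye(n)) @ A.T

x_prev = x = np.zeros(n)
for _ in range(200):
    z = x + beta * (x - x_prev)               # inertial (two-point) extrapolation
    x_prev, x = x, z + step @ (b - A @ z)     # iterated Tikhonov update

rel_res = np.linalg.norm(A @ x - b) / np.linalg.norm(b)
print(rel_res)
```

With exact data the residual keeps decreasing across iterations; with noisy data one would instead stop early, reflecting the semi-convergence established above.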
Bias correction can often improve the finite sample performance of estimators. We show that the choice of bias correction method has no effect on the higher-order variance of semiparametrically efficient parametric estimators, so long as the estimate of the bias is asymptotically linear. It is also shown that bootstrap, jackknife, and analytical bias estimates are asymptotically linear for estimators with higher-order expansions of a standard form. In particular, we find that for a variety of estimators the straightforward bootstrap bias correction gives the same higher-order variance as more complicated analytical or jackknife bias corrections. In contrast, bias corrections that do not estimate the bias at the parametric rate, such as the split-sample jackknife, result in larger higher-order variances in the i.i.d. setting we focus on. For both a cross-sectional MLE and a panel model with individual fixed effects, we show that the split-sample jackknife has a higher-order variance term that is twice as large as that of the `leave-one-out' jackknife.
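The straightforward bootstrap bias correction mentioned above can be sketched as follows for a toy plug-in variance estimator (our example, not one of the paper's estimators): estimate the bias as the average resampled estimate minus the original estimate, then subtract it.

```python
import numpy as np

# Toy plug-in variance estimator theta_hat = mean((x - xbar)^2), which
# is biased downward by sigma^2/n.  The bootstrap estimates the bias as
# mean_b(theta*_b) - theta_hat, and the corrected estimator is
# theta_hat - bias_hat.
rng = np.random.default_rng(1)
x = rng.normal(loc=0.0, scale=2.0, size=50)   # true variance = 4

theta_hat = np.mean((x - x.mean()) ** 2)      # biased plug-in estimate

B = 5000
boot = np.empty(B)
for bb in range(B):
    xb = rng.choice(x, size=x.size, replace=True)   # resample with replacement
    boot[bb] = np.mean((xb - xb.mean()) ** 2)

bias_hat = boot.mean() - theta_hat            # bootstrap bias estimate
theta_bc = theta_hat - bias_hat               # bias-corrected estimate
print(f"plug-in: {theta_hat:.3f}  bias-corrected: {theta_bc:.3f}")
```

Because the resampled estimates concentrate at parametric (root-n) rate around the plug-in value, this bias estimate is asymptotically linear, which is the condition under which the result above says the choice of correction does not affect the higher-order variance.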
We propose an approach for continuous prediction of turn-taking and backchanneling locations in spoken dialogue by fusing a neural acoustic model with a large language model (LLM). Experiments on the Switchboard human-human conversation dataset demonstrate that our approach consistently outperforms single-modality baseline models. We also develop a novel multi-task instruction fine-tuning strategy to further benefit from LLM-encoded knowledge for understanding the tasks and conversational contexts, leading to additional improvements. Our approach demonstrates the potential of combined LLMs and acoustic models for a more natural and conversational interaction between humans and speech-enabled AI agents.
BCH codes form an important subclass of cyclic codes, and are widely used in compact discs, digital audio tapes and other data storage systems to improve data reliability. As far as we know, there are few results on $q$-ary BCH codes of length $n=\frac{q^{m}+1}{q+1}$. This is because it is harder to deal with BCH codes of such length. In this paper, we study $q$-ary BCH codes with lengths $n=\frac{q^{m}+1}{q+1}$ and $n=q^m+1$. These two classes of BCH codes are always LCD codes. For $n=\frac{q^{m}+1}{q+1}$, the dimensions of narrow-sense BCH codes of length $n$ with designed distance $\delta=\ell q^{\frac{m-1}{2}}+1$ are determined, where $q>2$ and $2\leq \ell \leq q-1$. Moreover, the largest coset leader is given for $m=3$ and the first two largest coset leaders are given for $q=2$. The parameters of BCH codes related to the first few largest coset leaders are investigated. Some binary BCH codes of length $n=\frac{2^m+1}{3}$ have optimal parameters. For ternary narrow-sense BCH codes of length $n=3^m+1$, a lower bound on the minimum distance of their dual codes is developed, which is good in some cases.
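The dimension statements above rest on $q$-cyclotomic cosets modulo $n$: the narrow-sense BCH code with designed distance $\delta$ has dimension $n - |C_1 \cup \cdots \cup C_{\delta-1}|$, where $C_i$ is the $q$-cyclotomic coset of $i$. The sketch below computes this for the binary length $n=(2^m+1)/3$ with $m=7$ (toy parameters of ours, not the paper's main cases).

```python
# q-cyclotomic coset of i modulo n: repeatedly multiply by q mod n.
def cyclotomic_coset(q, n, i):
    coset, x = set(), i % n
    while x not in coset:
        coset.add(x)
        x = (x * q) % n
    return coset

# Dimension of the narrow-sense BCH code of length n over GF(q) with
# designed distance delta: n minus the size of the defining set
# C_1 ∪ ... ∪ C_{delta-1}.
def bch_dimension(q, n, delta):
    defining = set()
    for i in range(1, delta):
        defining |= cyclotomic_coset(q, n, i)
    return n - len(defining)

q, m = 2, 7
n = (2**m + 1) // 3                 # n = 43, a length of the form (2^m+1)/3
print(n, bch_dimension(q, n, 3), bch_dimension(q, n, 5))
```

Here $2$ has order $14$ modulo $43$, so each nonzero coset has size $14$ and the defining sets grow in jumps of whole cosets, which is why consecutive designed distances can yield the same dimension.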
Self-testing is a method to certify quantum states and measurements in a device-independent way. The device-independent certification of quantum properties is based purely on the input-output measurement statistics of the involved devices, with minimal knowledge about their internal workings. Bipartite pure entangled states can be self-tested, but, in the case of multipartite pure entangled states, the answer is not so straightforward. Nevertheless, \v{S}upi\'{c} et al. recently introduced a novel self-testing method for any pure entangled quantum state, which leverages network assistance and relies on bipartite entangled measurements. Hence, their scheme loses the true device-independent flavor of self-testing. In this regard, we provide a self-testing scheme for genuine multipartite pure entangled states in the true sense by employing a generalized Hardy-type non-local argument. Our scheme involves only local operations and classical communication, does not rely on bipartite entangled measurements, and is free from any network assistance. In addition, we provide the device-independent bound on the maximum probability of success for the generalized Hardy-type nonlocality argument.
Due to their inherent capability in semantic alignment of aspects and their context words, attention mechanisms and Convolutional Neural Networks (CNNs) are widely applied for aspect-based sentiment classification. However, these models lack a mechanism to account for relevant syntactical constraints and long-range word dependencies, and hence may mistakenly recognize syntactically irrelevant contextual words as clues for judging aspect sentiment. To tackle this problem, we propose to build a Graph Convolutional Network (GCN) over the dependency tree of a sentence to exploit syntactical information and word dependencies. Based on it, a novel aspect-specific sentiment classification framework is proposed. Experiments on three benchmark collections illustrate that our proposed model has comparable effectiveness to a range of state-of-the-art models, and further demonstrate that both syntactical information and long-range word dependencies are properly captured by the graph convolution structure.
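A single graph-convolution layer over a dependency tree can be sketched as below (the toy sentence, parse, random features, and layer sizes are our assumptions): each word's representation is averaged with those of its syntactic neighbours before a linear map and nonlinearity, so stacking k layers propagates information along k dependency hops regardless of linear distance in the sentence.

```python
import numpy as np

# One graph-convolution layer H' = ReLU(D^-1 (A + I) H W) over a
# dependency tree.  A[i, j] = 1 iff words i and j share a dependency
# arc; self-loops keep each word's own features in the mix.
rng = np.random.default_rng(0)

words = ["the", "food", "was", "great", "but", "service", "slow"]
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 6), (5, 6)]  # toy parse (ours)

n, d = len(words), 4
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0                 # undirected dependency arcs
A_hat = A + np.eye(n)                       # add self-loops
deg = A_hat.sum(axis=1, keepdims=True)      # degrees for normalization

H = rng.normal(size=(n, d))                 # word features (e.g. from a BiLSTM)
W = rng.normal(size=(d, d))                 # learnable layer weights
H_next = np.maximum(A_hat @ H @ W / deg, 0.0)   # ReLU(D^-1 (A+I) H W)
print(H_next.shape)
```

Note how "food" and "great" are two dependency hops apart even though other words sit between them in the raw sequence; this is the long-range, syntax-aware mixing that attention or CNN context windows alone do not enforce.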