In this paper, we tackle the problem of Egocentric Human-Object Interaction (EHOI) detection in an industrial setting. To overcome the lack of public datasets in this context, we propose a pipeline and a tool for generating synthetic images of EHOIs paired with several annotations and data signals (e.g., depth maps or instance segmentation masks). Using the proposed pipeline, we present EgoISM-HOI a new multimodal dataset composed of synthetic EHOI images in an industrial environment with rich annotations of hands and objects. To demonstrate the utility and effectiveness of synthetic EHOI data produced by the proposed tool, we designed a new method that predicts and combines different multimodal signals to detect EHOIs in RGB images. Our study shows that exploiting synthetic data to pre-train the proposed method significantly improves performance when tested on real-world data. Moreover, the proposed approach outperforms state-of-the-art class-agnostic methods. To support research in this field, we publicly release the datasets, source code, and pre-trained models at //iplab.dmi.unict.it/egoism-hoi.
Developability refers to the process of creating a surface without any tearing or shearing from a two-dimensional plane. It finds practical applications in the fabrication industry. An essential characteristic of a developable 3D surface is its zero Gaussian curvature, which means that either one or both of the principal curvatures are zero. This paper introduces a method for reconstructing an approximate developable surface from a neural implicit surface. The central idea of our method involves incorporating a regularization term that operates on the second-order derivatives of the neural implicits, effectively promoting zero Gaussian curvature. Implicit surfaces offer the advantage of smoother deformation with infinite resolution, overcoming the high polygonal constraints of state-of-the-art methods using discrete representations. We draw inspiration from the properties of surface curvature and employ rank minimization techniques derived from compressed sensing. Experimental results on both developable and non-developable surfaces, including those affected by noise, validate the generalizability of our method.
In this paper, we propose an adaptive approach, based on mesh refinement or parametric enrichment, for convection diffusion equations containing randomness in their coefficients. A parametric system of convection diffusion equations obtained by an application of stochastic Galerkin approach is discretized by using a symmetric interior penalty Galerkin (SIPG) method with upwinding for the convection term in the spatial domain. We show the reliability of the proposed residual-based error estimator in the energy norm contributed by the error due to the SIPG discretization, the error due to the data oscillations, and the error due to the (generalized) polynomial chaos discretization in the parametric space. To illustrate the performance of the proposed estimator, several benchmark examples including a random diffusivity parameter, a random velocity parameter, random diffusivity/velocity parameters, and a random (jump) discontinuous diffusivity parameter, are tested.
In this paper, we propose a novel ROM stabilization strategy for under-resolved convection-dominated flows, the approximate deconvolution Leray ROM (ADL-ROM). The new ADL-ROM introduces AD as a new means to increase the accuracy of the classical Leray ROM (L-ROM) without degrading its numerical stability. We also introduce two new AD ROM strategies: the Tikhonov and van Cittert methods. Our numerical investigation for convection-dominated systems shows that, when the filter radius is relatively large, the new ADL-ROM is more accurate than the standard L-ROM. Furthermore, the new ADL-ROM is less sensitive with respect to model parameters than L-ROM.
In this paper, we introduce a set of simple yet effective TOken REduction (TORE) strategies for Transformer-based Human Mesh Recovery from monocular images. Current SOTA performance is achieved by Transformer-based structures. However, they suffer from high model complexity and computation cost caused by redundant tokens. We propose token reduction strategies based on two important aspects, i.e., the 3D geometry structure and 2D image feature, where we hierarchically recover the mesh geometry with priors from body structure and conduct token clustering to pass fewer but more discriminative image feature tokens to the Transformer. Our method massively reduces the number of tokens involved in high-complexity interactions in the Transformer. This leads to a significantly reduced computational cost while still achieving competitive or even higher accuracy in shape recovery. Extensive experiments across a wide range of benchmarks validate the superior effectiveness of the proposed method. We further demonstrate the generalizability of our method on hand mesh recovery. Visit our project page at //frank-zy-dou.github.io/projects/Tore/index.html.
Visual Inertial Odometry (VIO) is an essential component of modern Augmented Reality (AR) applications. However, VIO only tracks the relative pose of the device, leading to drift over time. Absolute pose estimation methods infer the device's absolute pose, but their accuracy depends on the input quality. This paper introduces VIO-APR, a new framework for markerless mobile AR that combines an absolute pose regressor (APR) with a local VIO tracking system. VIO-APR uses VIO to assess the reliability of the APR and the APR to identify and compensate for VIO drift. This feedback loop results in more accurate positioning and more stable AR experiences. To evaluate VIO-APR, we created a dataset that combines camera images with ARKit's VIO system output for six indoor and outdoor scenes of various scales. Over this dataset, VIO-APR improves the median accuracy of popular APR by up to 36\% in position and 29\% in orientation, increases the percentage of frames in the high ($0.25 m, 2^{\circ}$) accuracy level by up to 112\% and reduces the percentage of frames predicted below the low ($5 m, 10^\circ$) accuracy greatly. We implement VIO-APR into a mobile AR application using Unity to demonstrate its capabilities. VIO-APR results in noticeably more accurate localization and a more stable overall experience.
In this paper, we describe a new algorithm called Preferential Attachment k-class Classifier (PreAttacK) for detecting fake accounts in a social network. Recently, several algorithms have obtained high accuracy on this problem. However, they have done so by relying on information about fake accounts' friendships or the content they share with others--the very things we seek to prevent. PreAttacK represents a significant departure from these approaches. We provide some of the first detailed distributional analyses of how new fake (and real) accounts first attempt to request friends after joining a major network (Facebook). We show that even before a new account has made friends or shared content, these initial friend request behaviors evoke a natural multi-class extension of the canonical Preferential Attachment model of social network growth. We use this model to derive a new algorithm, PreAttacK. We prove that in relevant problem instances, PreAttacK near-optimally approximates the posterior probability that a new account is fake under this multi-class Preferential Attachment model of new accounts' (not-yet-answered) friend requests. These are the first provable guarantees for fake account detection that apply to new users, and that do not require strong homophily assumptions. This principled approach also makes PreAttacK the only algorithm with provable guarantees that obtains state-of-the-art performance on new users on the global Facebook network, where it converges to AUC=0.9 after new users send + receive a total of just 20 not-yet-answered friend requests. For comparison, state-of-the-art benchmarks do not obtain this AUC even after observing additional data on new users' first 100 friend requests. Thus, unlike mainstream algorithms, PreAttacK converges before the median new fake account has made a single friendship (accepted friend request) with a human.
In this paper, we consider the full Lambek calculus enriched with subexponential modalities in a distributive setting. We show that the distributive Lambek calculus with subexponentials is complete with respect to its Kripke frames via canonical extensions. In this approach, we consider subexponentials as S4-like modalities and each modality is interpreted with a reflexive and transitive relation similarly to usual Kripke semantics.
In this paper, we propose a novel Feature Decomposition and Reconstruction Learning (FDRL) method for effective facial expression recognition. We view the expression information as the combination of the shared information (expression similarities) across different expressions and the unique information (expression-specific variations) for each expression. More specifically, FDRL mainly consists of two crucial networks: a Feature Decomposition Network (FDN) and a Feature Reconstruction Network (FRN). In particular, FDN first decomposes the basic features extracted from a backbone network into a set of facial action-aware latent features to model expression similarities. Then, FRN captures the intra-feature and inter-feature relationships for latent features to characterize expression-specific variations, and reconstructs the expression feature. To this end, two modules including an intra-feature relation modeling module and an inter-feature relation modeling module are developed in FRN. Experimental results on both the in-the-lab databases (including CK+, MMI, and Oulu-CASIA) and the in-the-wild databases (including RAF-DB and SFEW) show that the proposed FDRL method consistently achieves higher recognition accuracy than several state-of-the-art methods. This clearly highlights the benefit of feature decomposition and reconstruction for classifying expressions.
In this paper, we proposed to apply meta learning approach for low-resource automatic speech recognition (ASR). We formulated ASR for different languages as different tasks, and meta-learned the initialization parameters from many pretraining languages to achieve fast adaptation on unseen target language, via recently proposed model-agnostic meta learning algorithm (MAML). We evaluated the proposed approach using six languages as pretraining tasks and four languages as target tasks. Preliminary results showed that the proposed method, MetaASR, significantly outperforms the state-of-the-art multitask pretraining approach on all target languages with different combinations of pretraining languages. In addition, since MAML's model-agnostic property, this paper also opens new research direction of applying meta learning to more speech-related applications.
BERT, a pre-trained Transformer model, has achieved ground-breaking performance on multiple NLP tasks. In this paper, we describe BERTSUM, a simple variant of BERT, for extractive summarization. Our system is the state of the art on the CNN/Dailymail dataset, outperforming the previous best-performed system by 1.65 on ROUGE-L. The codes to reproduce our results are available at //github.com/nlpyang/BertSum