Tensor robust principal component analysis (TRPCA) is a classical way for low-rank tensor recovery, which minimizes the convex surrogate of tensor rank by shrinking each tensor singular value equally. However, for real-world visual data, large singular values represent more significant information than small singular values. In this paper, we propose a nonconvex TRPCA (N-TRPCA) model based on the tensor adjustable logarithmic norm. Unlike TRPCA, our N-TRPCA can adaptively shrink small singular values more and shrink large singular values less. In addition, TRPCA assumes that the whole data tensor is of low rank. This assumption is hardly satisfied in practice for natural visual data, restricting the capability of TRPCA to recover the edges and texture details from noisy images and videos. To this end, we integrate nonlocal self-similarity into N-TRPCA, and further develop a nonconvex and nonlocal TRPCA (NN-TRPCA) model. Specifically, similar nonlocal patches are grouped as a tensor and then each group tensor is recovered by our N-TRPCA. Since the patches in one group are highly correlated, all group tensors have strong low-rank property, leading to an improvement of recovery performance. Experimental results demonstrate that the proposed NN-TRPCA outperforms existing TRPCA methods in visual data recovery. The demo code is available at //github.com/qguo2010/NN-TRPCA.
In answer set programming (ASP), answer sets capture solutions to search problems of interest and thus the efficient computation of answer sets is of utmost importance. One viable implementation strategy is provided by translation-based ASP where logic programs are translated into other KR formalisms such as Boolean satisfiability (SAT), SAT modulo theories (SMT), and mixed-integer programming (MIP). Consequently, existing solvers can be harnessed for the computation of answer sets. Many of the existing translations rely on program completion and level rankings to capture the minimality of answer sets and default negation properly. In this work, we take level ranking constraints into reconsideration, aiming at their generalizations to cover aggregate-based extensions of ASP in more systematic way. By applying a number of program transformations, ranking constraints can be rewritten in a general form that preserves the structure of monotone and convex aggregates and thus offers a uniform basis for their incorporation into translation-based ASP. The results open up new possibilities for the implementation of translators and solver pipelines in practice.
Due to the mutual occlusion, severe scale variation, and complex spatial distribution, the current multi-person mesh recovery methods cannot produce accurate absolute body poses and shapes in large-scale crowded scenes. To address the obstacles, we fully exploit crowd features for reconstructing groups of people from a monocular image. A novel hypergraph relational reasoning network is proposed to formulate the complex and high-order relation correlations among individuals and groups in the crowd. We first extract compact human features and location information from the original high-resolution image. By conducting the relational reasoning on the extracted individual features, the underlying crowd collectiveness and interaction relationship can provide additional group information for the reconstruction. Finally, the updated individual features and the localization information are used to regress human meshes in camera coordinates. To facilitate the network training, we further build pseudo ground-truth on two crowd datasets, which may also promote future research on pose estimation and human behavior understanding in crowded scenes. The experimental results show that our approach outperforms other baseline methods both in crowded and common scenarios. The code and datasets are publicly available at //github.com/boycehbz/GroupRec.
Recent work has demonstrated the significant potential of denoising diffusion models for generating human motion, including text-to-motion capabilities. However, these methods are restricted by the paucity of annotated motion data, a focus on single-person motions, and a lack of detailed control. In this paper, we introduce three forms of composition based on diffusion priors: sequential, parallel, and model composition. Using sequential composition, we tackle the challenge of long sequence generation. We introduce DoubleTake, an inference-time method with which we generate long animations consisting of sequences of prompted intervals and their transitions, using a prior trained only for short clips. Using parallel composition, we show promising steps toward two-person generation. Beginning with two fixed priors as well as a few two-person training examples, we learn a slim communication block, ComMDM, to coordinate interaction between the two resulting motions. Lastly, using model composition, we first train individual priors to complete motions that realize a prescribed motion for a given joint. We then introduce DiffusionBlending, an interpolation mechanism to effectively blend several such models to enable flexible and efficient fine-grained joint and trajectory-level control and editing. We evaluate the composition methods using an off-the-shelf motion diffusion model, and further compare the results to dedicated models trained for these specific tasks.
Random linear codes (RLCs) are well known to have nice combinatorial properties and near-optimal parameters in many different settings. However, getting explicit constructions matching the parameters of RLCs is challenging, and RLCs are hard to decode efficiently. This motivated several previous works to study the problem of partially derandomizing RLCs, by applying certain operations to an explicit mother code. Among them, one of the most well studied operations is random puncturing, where a series of works culminated in the work of Guruswami and Mosheiff (FOCS' 22), which showed that a random puncturing of a low-biased code is likely to possess almost all interesting local properties of RLCs. In this work, we provide an in-depth study of another, dual operation of random puncturing, known as random shortening, which can be viewed equivalently as random puncturing on the dual code. Our main results show that for any small $\varepsilon$, by starting from a mother code with certain weaker conditions (e.g., having a large distance) and performing a random (or even pseudorandom) shortening, the new code is $\varepsilon$-biased with high probability. Our results hold for any field size and yield a shortened code with constant rate. This can be viewed as a complement to random puncturing, and together, we can obtain codes with properties like RLCs from weaker initial conditions. Our proofs involve several non-trivial methods of estimating the weight distribution of codewords, which may be of independent interest.
Turbo-Codes (TC) are a family of convolutional codes enabling Forward-Error-Correction (FEC) while approaching the theoretical limit of channel capacity predicted by Shannons theorem. One of the bottlenecks of a Turbo Encoder (TE) lies in the non-uniform interleaving stage. Interleaving algorithms require stalling the input vector bits before the bit rearrangement causing a delay in the overall process. This paper presents performance enhancement via a parallel algorithm for the interleaving stage of a Turbo Encoder application compliant with the DVB-RCS2 standard. The algorithm efficiently implements the interleaving operation while utilizing attributes of a given DSP. We will discuss and compare a serial model for the TE, with the presented parallel processed algorithm. Results showed a speed-up factor of up to 3.4 Total-Cycles, 4.8 Write and 7.3 Read.
Sparse variational approximations are popular methods for scaling up inference and learning in Gaussian processes to larger datasets. For $N$ training points, exact inference has $O(N^3)$ cost; with $M \ll N$ features, state of the art sparse variational methods have $O(NM^2)$ cost. Recently, methods have been proposed using more sophisticated features; these promise $O(M^3)$ cost, with good performance in low dimensional tasks such as spatial modelling, but they only work with a very limited class of kernels, excluding some of the most commonly used. In this work, we propose integrated Fourier features, which extends these performance benefits to a very broad class of stationary covariance functions. We motivate the method and choice of parameters from a convergence analysis and empirical exploration, and show practical speedup in synthetic and real world spatial regression tasks.
Whilst contrastive learning yields powerful representations by matching different augmented views of the same instance, it lacks the ability to capture the similarities between different instances. One popular way to address this limitation is by learning global features (after the global pooling) to capture inter-instance relationships based on knowledge distillation, where the global features of the teacher are used to guide the learning of the global features of the student. Inspired by cross-modality learning, we extend this existing framework that only learns from global features by encouraging the global features and intermediate layer features to learn from each other. This leads to our novel self-supervised framework: cross-context learning between global and hypercolumn features (CGH), that enforces the consistency of instance relations between low- and high-level semantics. Specifically, we stack the intermediate feature maps to construct a hypercolumn representation so that we can measure instance relations using two contexts (hypercolumn and global feature) separately, and then use the relations of one context to guide the learning of the other. This cross-context learning allows the model to learn from the differences between the two contexts. The experimental results on linear classification and downstream tasks show that our method outperforms the state-of-the-art methods.
Dichotomy theorems, which characterize the conditions under which a problem can be solved efficiently, have helped identify important tractability borders for as probabilistic query evaluation, view maintenance, query containment (among many more problems). However, dichotomy theorems for many such problems remain elusive under key settings such as bag semantics or for queries with self-joins. This work aims to unearth dichotomies for fundamental problems in reverse data management and knowledge representation. We use a novel approach to discovering dichotomies: instead of creating dedicated algorithms for easy (PTIME) and hard cases (NP-complete), we devise unified algorithms that are guaranteed to terminate in PTIME for easy cases. Using this approach, we discovered new tractable cases for the problem of minimal factorization of provenance formulas as well as dichotomies under bag semantics for the problems of resilience and causal responsibility
Incorporating covariates into functional principal component analysis (PCA) can substantially improve the representation efficiency of the principal components and predictive performance. However, many existing functional PCA methods do not make use of covariates, and those that do often have high computational cost or make overly simplistic assumptions that are violated in practice. In this article, we propose a new framework, called Covariate Dependent Functional Principal Component Analysis (CD-FPCA), in which both the mean and covariance structure depend on covariates. We propose a corresponding estimation algorithm, which makes use of spline basis representations and roughness penalties, and is substantially more computationally efficient than competing approaches of adequate estimation and prediction accuracy. A key aspect of our work is our novel approach for modeling the covariance function and ensuring that it is symmetric positive semi-definite. We demonstrate the advantages of our methodology through a simulation study and an astronomical data analysis.
Multi-relation Question Answering is a challenging task, due to the requirement of elaborated analysis on questions and reasoning over multiple fact triples in knowledge base. In this paper, we present a novel model called Interpretable Reasoning Network that employs an interpretable, hop-by-hop reasoning process for question answering. The model dynamically decides which part of an input question should be analyzed at each hop; predicts a relation that corresponds to the current parsed results; utilizes the predicted relation to update the question representation and the state of the reasoning process; and then drives the next-hop reasoning. Experiments show that our model yields state-of-the-art results on two datasets. More interestingly, the model can offer traceable and observable intermediate predictions for reasoning analysis and failure diagnosis.