The linear transform-based tensor nuclear norm (TNN) methods have recently obtained promising results for tensor completion. The main idea of this type of method is to exploit the low-rank structure of the frontal slices of the target tensor under a linear transform along the third mode. However, the low-rankness of the frontal slices is often not significant under the family of linear transforms. To better pursue a low-rank approximation, we propose a nonlinear transform-based TNN (NTTNN). More concretely, the proposed nonlinear transform is a composite of a linear semi-orthogonal transform along the third mode and an element-wise nonlinear transform on the frontal slices of the resulting tensor; the two components are indispensable and complementary for fully exploiting the underlying low-rankness. Based on the proposed low-rankness metric, i.e., NTTNN, we formulate a low-rank tensor completion (LRTC) model. To tackle the resulting nonlinear and nonconvex optimization model, we design a proximal alternating minimization (PAM) algorithm and establish its theoretical convergence guarantee. Extensive experimental results on hyperspectral images, multispectral images, and videos show that our method outperforms state-of-the-art linear transform-based LRTC methods qualitatively and quantitatively.
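For concreteness, the proposed metric could be written as follows (our notation, reconstructed from the description above rather than taken from the paper): given $\mathcal{X} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$, a semi-orthogonal matrix $\mathbf{\Phi} \in \mathbb{R}^{l \times n_3}$ with $\mathbf{\Phi}\mathbf{\Phi}^{\top} = \mathbf{I}$, and an element-wise nonlinearity $\psi$,
\[
\|\mathcal{X}\|_{\mathrm{NTTNN}} = \sum_{k=1}^{l} \big\| \psi\big( \mathcal{X} \times_3 \mathbf{\Phi} \big)^{(k)} \big\|_{*},
\]
where $\times_3$ denotes the mode-3 tensor-matrix product, $(\cdot)^{(k)}$ extracts the $k$-th frontal slice, and $\|\cdot\|_{*}$ is the matrix nuclear norm.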
Low-rank tensor completion has been widely used in computer vision and machine learning. This paper develops a novel multi-modal core tensor factorization (MCTF) method, combined with a tensor low-rankness measure and a tighter nonconvex relaxation of this measure (NC-MCTF). The proposed models encode the low-rank insights that Tucker decomposition and t-SVD provide for general tensors, and are thus expected to simultaneously model spectral low-rankness in multiple orientations and to accurately restore data with intrinsic low-rank structure from few observed entries. Furthermore, we study the MCTF and NC-MCTF regularized minimization problems and design an effective block successive upper-bound minimization (BSUM) algorithm to solve them. This efficient solver extends MCTF to various tasks, such as tensor completion. Experiments on hyperspectral image (HSI), video, and MRI completion confirm the superior performance of the proposed method.
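For background on the two low-rank insights just mentioned (standard definitions, not the paper's own formulation): Tucker-type measures track the ranks of the mode unfoldings, while t-SVD-type measures track the tubal rank,
\[
\operatorname{rank}_{\mathrm{Tucker}}(\mathcal{X}) = \big( \operatorname{rank}(\mathbf{X}_{(1)}),\, \operatorname{rank}(\mathbf{X}_{(2)}),\, \operatorname{rank}(\mathbf{X}_{(3)}) \big),
\qquad
\operatorname{rank}_{\mathrm{t}}(\mathcal{X}) = \#\big\{ k : \mathcal{S}(k,k,:) \neq \mathbf{0} \big\},
\]
where $\mathbf{X}_{(n)}$ is the mode-$n$ unfolding and $\mathcal{S}$ is the f-diagonal core of the t-SVD $\mathcal{X} = \mathcal{U} * \mathcal{S} * \mathcal{V}^{\top}$; combining both viewpoints is what allows the factorization to capture low-rankness along multiple orientations at once.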
We study the mathematical structure of the solution set (and its tangent space) to the matrix equation $G^*JG=J$ for a given square matrix $J$. In the language of pure mathematics, this is a Lie group: the isometry group of a bilinear (or sesquilinear) form. Generally these groups are described as intersections of a few special groups. The tangent space to $\{G: G^*JG=J \}$ consists of solutions to the linear matrix equation $X^*J+JX=0$. For the complex case, the solution set of this linear equation was computed by De Ter{\'a}n and Dopico. We found that, on its own, the equation $X^*J+JX=0$ is hard to solve. Adding the complementary linear equation $X^*J-JX=0$ to the mix reduces, rather than increases, the complexity. Not only does this make the original problem solvable, it also opens up the broader algebraic and geometric structure. One implication is that the two equations form an $\mathfrak{h}$ and $\mathfrak{m}$ pair familiar from the study of pseudo-Riemannian symmetric spaces. We explicitly demonstrate the computation of the solutions to the equations $X^*J\pm JX=0$ for real and complex matrices. Moreover, any real, complex, or quaternionic case with an arbitrary involution (e.g., transpose, conjugate transpose, and the various quaternion transposes) can be effectively solved with the same strategy. We provide numerical examples and visualizations.
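To illustrate with the smallest interesting case (our example, not one from the paper): take real matrices with $*$ the transpose and $J = \operatorname{diag}(1,-1)$. Since $J$ is symmetric, the two equations say that $JX$ is skew-symmetric or symmetric, respectively:
\[
X^{\top}J + JX = 0 \;\Longleftrightarrow\; JX = -(JX)^{\top} \;\Longleftrightarrow\; X = \begin{pmatrix} 0 & a \\ a & 0 \end{pmatrix},
\qquad
X^{\top}J - JX = 0 \;\Longleftrightarrow\; JX = (JX)^{\top} \;\Longleftrightarrow\; X = \begin{pmatrix} s & t \\ -t & -u \end{pmatrix}.
\]
The first solution space is the one-dimensional Lie algebra $\mathfrak{h} = \mathfrak{so}(1,1)$ of the isometry group of the form; the second is a three-dimensional complement $\mathfrak{m}$, and together they span all of $\mathfrak{gl}(2,\mathbb{R})$, since a matrix in both spaces must satisfy $JX = 0$, i.e., $X = 0$.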
We describe a rational, but low-resolution, model of probability.
Inspired by recent strides in the empirical efficacy of implicit learning in many robotics tasks, we seek to understand the theoretical benefits of implicit formulations in the face of nearly discontinuous functions, a common characteristic of systems that make and break contact with the environment, such as in legged locomotion and manipulation. We present and motivate three formulations for learning a function: one explicit and two implicit. We derive generalization bounds for each of the three approaches, exposing where both explicit methods and implicit methods based on prediction-error losses typically fail to produce tight bounds, in contrast to implicit methods with violation-based losses, which can be fundamentally more robust to steep slopes. Furthermore, we demonstrate that this violation implicit loss can tightly bound graph distance, a quantity that often has physical roots and handles noise in inputs and outputs alike, unlike prediction losses, which consider output noise only. Our insights into the generalizability and physical relevance of violation implicit formulations match evidence from prior works and are validated through a toy problem, inspired by rigid-contact models and referenced throughout our theoretical analysis.
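To fix a schematic picture of the three formulations (our rendering, not the paper's exact definitions): an explicit model learns $f_\theta$ and is trained on a prediction loss $\|f_\theta(x) - y\|$; an implicit model learns $g_\theta(x,y)$ and predicts $\hat y(x) \in \arg\min_y g_\theta(x,y)$, and it can be trained either on the same prediction loss $\|\hat y(x) - y\|$ or on a violation loss evaluated directly at the data pair,
\[
\ell_{\mathrm{viol}}(x, y) = g_\theta(x, y),
\]
which requires no $\arg\min$ during training (modulo some normalization ruling out the trivial $g_\theta \equiv 0$). Near a steep or nearly discontinuous region, small input noise moves $\hat y(x)$ drastically, so prediction losses and the bounds built on them blow up, whereas the violation loss measures closeness to the graph of the function and remains controlled; this is the intuition behind its robustness and its relation to graph distance.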
We study fractional variants of the quasi-norms introduced by Brezis, Van Schaftingen, and Yung in the study of the Sobolev space $\dot W^{1,p}$. The resulting spaces are identified as a special class of real interpolation spaces of Sobolev-Slobodecki\u{\i} spaces. We establish the equivalence between Fourier-analytic definitions and definitions via difference operators acting on measurable functions. We prove various new results on embeddings and non-embeddings, and give applications to harmonic and caloric extensions. For suitable wavelet bases, we characterize the approximation spaces for best $n$-term wavelet approximation via smoothness conditions on the function; this extends a classical result of DeVore, Jawerth, and Popov.
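For orientation (our paraphrase of the known result, not a statement from this abstract), the original Brezis-Van Schaftingen-Yung quantity is the quasi-norm
\[
[u]_{p} = \Big( \sup_{\lambda>0} \lambda^{p}\, \mathcal{L}^{2d}\big\{ (x,y) \in \mathbb{R}^d \times \mathbb{R}^d : |u(x)-u(y)| \ge \lambda |x-y|^{1+d/p} \big\} \Big)^{1/p} \simeq \|\nabla u\|_{L^p},
\]
and the natural fractional variants arise from replacing the exponent $1+d/p$ by $s+d/p$ for a smoothness parameter $s \in (0,1)$.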
Click-through rate (CTR) prediction plays a critical role in recommender systems and online advertising. The data used in these applications are multi-field categorical data, where each feature belongs to one field. Field information has proved to be important, and several works incorporate fields into their models. In this paper, we propose a novel approach to model field information effectively and efficiently. The proposed approach is a direct improvement of FwFM and is named Field-matrixed Factorization Machines (FmFM, or $FM^2$). We also propose a new interpretation of FM and FwFM within the FmFM framework and compare them with FFM. Besides pruning cross terms, our model supports field-specific embedding dimensions, which act as a form of soft pruning. We also propose an efficient way to minimize these dimensions while preserving model performance. The FmFM model can be optimized further by caching the intermediate vectors, after which it takes only thousands of floating-point operations (FLOPs) to make a prediction. Our experimental results show that it can outperform FFM, a more complex model. The FmFM model's performance is also comparable to that of DNN models, which require many more FLOPs at runtime.
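To make the field-matrixed interaction concrete, here is a minimal sketch (our notation; `features` and `M` are hypothetical names, and dimension bookkeeping is simplified):

```python
import numpy as np

def fmfm_score(features, M):
    """Sum of pairwise field-matrixed interactions <v_i M_{F(i),F(j)}, v_j>.

    features: list of (field_id, embedding_vector) for one sample's active features.
    M: dict mapping an ordered field pair (f, g) to a d_f x d_g matrix.
    """
    score = 0.0
    for i, (f_i, v_i) in enumerate(features):
        for f_j, v_j in features[i + 1:]:
            # Transform v_i into field f_j's space, then take the dot product.
            score += v_i @ M[(f_i, f_j)] @ v_j
    return score
```

In this reading, restricting every $M$ to the identity recovers FM's plain dot product, and restricting it to a scalar multiple of the identity recovers FwFM's field-pair weights; since $M$ need not be square, different fields can use different embedding dimensions.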
Unpaired image-to-image translation has been applied successfully to natural images but has received very little attention for manifold-valued data such as in diffusion tensor imaging (DTI). The non-Euclidean nature of DTI prevents current generative adversarial networks (GANs) from generating plausible images and has mainly limited their application to diffusion MRI scalar maps, such as fractional anisotropy (FA) or mean diffusivity (MD). Even though these scalar maps are clinically useful, they mostly ignore fiber orientations and therefore have limited applications for analyzing brain fibers. Here, we propose a manifold-aware CycleGAN that learns to generate high-resolution DTI from unpaired T1w images. We formulate the objective as a Wasserstein distance minimization problem between data distributions on the Riemannian manifold of symmetric positive definite $3\times3$ matrices, SPD(3), using adversarial and cycle-consistency losses. To ensure that the generated diffusion tensors lie on the SPD(3) manifold, we exploit the theoretical properties of the exponential and logarithm maps of the Log-Euclidean metric. We demonstrate that, unlike standard GANs, our method is able to generate realistic high-resolution DTI that can be used to compute diffusion-based metrics and potentially to run fiber tractography algorithms. To evaluate our model's performance, we compute the cosine similarity between the principal orientations of the generated tensors and their ground-truth orientations, the mean squared error (MSE) of their derived FA values, and the Log-Euclidean distance between the tensors. We demonstrate that our method achieves an FA MSE 2.5 times lower than that of a standard CycleGAN and up to 30% better cosine similarity than a manifold-aware Wasserstein GAN, while synthesizing sharp high-resolution DTI.
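A minimal sketch of the Log-Euclidean mechanism referred to above (our own illustration, not the paper's code): a generator can output an unconstrained symmetric matrix in the log-domain and map it back with the matrix exponential, which always lands in SPD(3).

```python
import numpy as np

def spd_log(S):
    """Logarithm map: SPD matrix -> symmetric matrix (a flat vector space)."""
    w, V = np.linalg.eigh(S)
    return V @ np.diag(np.log(w)) @ V.T

def sym_exp(X):
    """Exponential map: symmetric matrix -> SPD matrix (eigenvalues exp(w) > 0)."""
    w, V = np.linalg.eigh(X)
    return V @ np.diag(np.exp(w)) @ V.T

# Log-Euclidean distance between diffusion tensors S1 and S2:
# d(S1, S2) = np.linalg.norm(spd_log(S1) - spd_log(S2))
```

Because the log-domain is a vector space, losses and distances can be computed there with ordinary Euclidean operations, which is what makes this metric convenient for training.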
We propose a scalable Gromov-Wasserstein learning (S-GWL) method and establish a novel, theoretically-supported paradigm for large-scale graph analysis. The proposed method is based on the fact that the Gromov-Wasserstein discrepancy is a pseudometric on graphs. Given two graphs, the optimal transport associated with their Gromov-Wasserstein discrepancy provides the correspondence between their nodes and achieves graph matching. When one of the graphs has isolated but self-connected nodes (i.e., it is a disconnected graph), the optimal transport indicates the clustering structure of the other graph and achieves graph partitioning. Using this concept, we extend our method to multi-graph partitioning and matching by learning a Gromov-Wasserstein barycenter graph for multiple observed graphs; the barycenter graph plays the role of the disconnected graph, and since it is learned, so is the clustering. Our method combines a recursive $K$-partition mechanism with a regularized proximal gradient algorithm, whose time complexity is $\mathcal{O}(K(E+V)\log_K V)$ for graphs with $V$ nodes and $E$ edges. To our knowledge, our method is the first attempt to make the Gromov-Wasserstein discrepancy applicable to large-scale graph analysis and to unify graph partitioning and matching in the same framework. It outperforms state-of-the-art graph partitioning and matching methods, achieving a good trade-off between accuracy and efficiency.
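The partitioning idea in the middle of the abstract can be sketched in a few lines; the snippet below uses the generic Gromov-Wasserstein solver from the POT library as a stand-in for the paper's recursive, regularized proximal gradient method, so it illustrates the concept rather than the S-GWL algorithm itself.

```python
import numpy as np
import ot  # POT: Python Optimal Transport

def gw_partition(adj, K):
    """Cluster a graph by transporting it onto K isolated, self-connected nodes."""
    C1 = adj.astype(float)
    C2 = np.eye(K)                      # disconnected graph: K self-loops
    p = C1.sum(axis=1)
    p /= p.sum()                        # node distribution (degree-based)
    q = np.ones(K) / K                  # uniform distribution over clusters
    T = ot.gromov.gromov_wasserstein(C1, C2, p, q, 'square_loss')
    return T.argmax(axis=1)             # each node goes to its dominant cluster
```

Reading off `argmax` along the rows of the transport plan is exactly the sense in which the optimal transport "indicates the clustering structure" of the graph.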
Attributed graph clustering is challenging as it requires joint modelling of graph structures and node attributes. Recent progress on graph convolutional networks has shown that graph convolution is effective in combining structural and content information, and several recent methods based on it have achieved promising clustering performance on real attributed networks. However, there is limited understanding of how graph convolution affects clustering performance and how to properly use it to optimize performance for different graphs. Existing methods essentially use graph convolution of a fixed, low order that only takes into account neighbours within a few hops of each node, which underutilizes node relations and ignores the diversity of graphs. In this paper, we propose an adaptive graph convolution method for attributed graph clustering that exploits high-order graph convolution to capture global cluster structure and adaptively selects the appropriate order for different graphs. We establish the validity of our method by theoretical analysis and extensive experiments on benchmark datasets. Empirical results show that our method compares favourably with state-of-the-art methods.
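As one reading of the high-order convolution step (a sketch under our assumptions; the adaptive order-selection criterion is omitted), node attributes can be smoothed by repeatedly applying a low-pass filter built from the symmetrically normalized Laplacian, and the smoothed features then fed to any standard clustering routine:

```python
import numpy as np

def high_order_smooth(adj, X, k):
    """Apply a k-th order low-pass graph filter (I - L/2)^k to features X."""
    d = adj.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    A_norm = adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    L = np.eye(adj.shape[0]) - A_norm       # symmetrically normalized Laplacian
    G = np.eye(adj.shape[0]) - 0.5 * L      # low-pass filter
    for _ in range(k):
        X = G @ X                           # one more hop of smoothing
    return X
```

Larger $k$ aggregates information from farther neighbourhoods and so captures more global cluster structure; choosing $k$ per graph is precisely the adaptivity the method is after.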
Network embedding aims to learn latent, low-dimensional vector representations of network nodes that are effective in supporting various network analytic tasks. While prior work on network embedding focuses primarily on preserving network topology to learn node representations, recently proposed attributed network embedding algorithms integrate rich node content information with network topological structure to enhance the quality of network embedding. In reality, networks often have sparse content, incomplete node attributes, and a discrepancy between the node attribute feature space and the network structure space, which severely deteriorates the performance of existing methods. In this paper, we propose a unified framework for attributed network embedding, attri2vec, that learns node embeddings by discovering a latent node attribute subspace via a network-structure-guided transformation of the original attribute space. The resulting latent subspace respects network structure in a more consistent way, towards learning high-quality node representations. We formulate an optimization problem that is solved by an efficient stochastic gradient descent algorithm, with time complexity linear in the number of nodes. We investigate a series of linear and non-linear transformations of node attributes and empirically validate their effectiveness on various types of networks. Another advantage of attri2vec is its ability to solve out-of-sample problems, where embeddings of newly arriving nodes can be inferred from their attributes through the learned mapping function. Experiments on various types of networks confirm that attri2vec is superior to state-of-the-art baselines for node classification, node clustering, and out-of-sample link prediction tasks. The source code of this paper is available at //github.com/daokunzhang/attri2vec.
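A minimal sketch of the out-of-sample step (hypothetical names; the sigmoid stands in for one of the linear or non-linear transformations the paper investigates): once the mapping from attributes to the latent subspace is learned, a new node is embedded from its attributes alone, with no retraining.

```python
import numpy as np

def embed(X, W):
    """Map raw node attributes X into the learned latent subspace."""
    return 1.0 / (1.0 + np.exp(-(X @ W)))   # sigmoid as an example transformation

# Out-of-sample inference for a node unseen during training:
# z_new = embed(x_new[None, :], W_learned)
```

Because inference touches only the attribute-to-subspace mapping $W$, the cost per new node is independent of the network size, which is what makes the out-of-sample setting tractable.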