A novel unconstrained optimization model named weighted trace-penalty minimization (WTPM) is proposed to address the extreme eigenvalue problem arising from the Full Configuration Interaction (FCI) method. Theoretical analysis shows that the global minimizers of the WTPM objective function are the desired eigenvectors themselves, rather than merely a basis of the eigenspace. A detailed analysis of the condition number of the Hessian operator guides the determination of a near-optimal weight matrix. Exploiting the sparsity of FCI matrices, the coordinate descent (CD) method is adapted to WTPM, yielding the WTPM-CD method. The reduced computational and storage costs per iteration demonstrate the efficiency of the proposed algorithm. Finally, numerical experiments show that the method can handle large-scale FCI matrices.
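The abstract does not spell out the objective or the coordinate updates, so the following is a minimal toy sketch in the spirit of (weighted) trace-penalty minimization: it assumes an objective of the form tr(X^T A X W) + (beta/4) ||X^T X - W||_F^2 with a diagonal weight matrix W, and minimizes it by cyclic coordinate descent with exact one-dimensional updates. The paper's actual objective, weight selection, and update formulas may differ.

```python
import numpy as np

def wtpm_objective(X, A, W, beta):
    """Assumed illustrative objective: tr(X^T A X W) + (beta/4) * ||X^T X - W||_F^2."""
    G = X.T @ X - W
    return np.trace(X.T @ A @ X @ W) + 0.25 * beta * np.sum(G * G)

def wtpm_cd_sweep(X, A, W, beta):
    """One cyclic sweep of exact 1D minimization over every entry of X.

    Restricted to a single entry X[i, j] = t, the objective is a quartic in t,
    so it is recovered exactly by interpolating five evaluations and minimized
    by examining the real roots of its cubic derivative. A practical
    large-scale code would use closed-form coordinate updates that exploit the
    sparsity of A instead of re-evaluating the full objective.
    """
    n, k = X.shape
    for i in range(n):
        for j in range(k):
            ts = X[i, j] + np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
            vals = []
            for t in ts:
                X[i, j] = t
                vals.append(wtpm_objective(X, A, W, beta))
            quartic = np.polyfit(ts, vals, 4)      # exact fit: the 1D slice is a quartic
            crit = np.roots(np.polyder(quartic))   # critical points of the slice
            crit = crit[np.abs(crit.imag) < 1e-9].real
            best_t, best_v = None, None
            for t in crit:
                X[i, j] = t
                v = wtpm_objective(X, A, W, beta)
                if best_v is None or v < best_v:
                    best_t, best_v = t, v
            X[i, j] = best_t
    return X

# Toy run on a small dense symmetric matrix (k = 2 columns, distinct weights).
rng = np.random.default_rng(0)
A = rng.standard_normal((30, 30)); A = (A + A.T) / 2
W = np.diag([2.0, 1.0])
X = 0.1 * rng.standard_normal((30, 2))
for sweep in range(20):
    wtpm_cd_sweep(X, A, W, beta=1.0)
print("objective after 20 sweeps:", wtpm_objective(X, A, W, 1.0))
```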
Most conventional crowd counting methods utilize a fully supervised learning framework to establish a mapping between scene images and crowd density maps, and they usually rely on a large quantity of costly and time-intensive pixel-level annotations for training supervision. One way to mitigate this intensive labeling effort and improve counting accuracy is to leverage large amounts of unlabeled images, exploiting the inherent self-structural information and rank consistency within a single image to offer additional qualitative relation supervision during training. Unlike earlier methods that utilize rank relations at the original image level, we explore such rank-consistency relations within latent feature spaces. This enables the incorporation of numerous pyramid partial orders, strengthening the model's representation capability; a notable advantage is that it also increases the utilization ratio of unlabeled samples. Specifically, we propose a Deep Rank-consistEnt pyrAmid Model (DREAM), which makes full use of rank consistency across coarse-to-fine pyramid features in latent spaces for enhanced crowd counting with massive unlabeled images. In addition, we have collected a new unlabeled crowd counting dataset, FUDAN-UCC, comprising 4,000 images for training purposes. Extensive experiments on four benchmark datasets, namely UCF-QNRF, ShanghaiTech PartA and PartB, and UCF-CC-50, show the effectiveness of our method compared with previous semi-supervised methods. The code is available at https://github.com/bridgeqiqi/DREAM.
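As a rough illustration of how rank consistency could be imposed in latent feature spaces with unlabeled images, the sketch below generates nested center crops, pools encoder features at several pyramid levels, and penalizes count surrogates that violate the containment order. The backbone, pooling levels, margin, and the notion of a count surrogate are placeholders; DREAM's actual pyramid construction and losses are defined in the paper.

```python
import torch
import torch.nn.functional as F

def nested_crops(img, num_scales=3):
    """Center crops c_0 ⊃ c_1 ⊃ ..., each larger crop containing the smaller one."""
    _, _, h, w = img.shape
    crops = []
    for s in range(num_scales):
        ratio = 1.0 - 0.25 * s                    # 1.0, 0.75, 0.5 of the original size
        ch, cw = int(h * ratio), int(w * ratio)
        top, left = (h - ch) // 2, (w - cw) // 2
        crops.append(img[:, :, top:top + ch, left:left + cw])
    return crops

def rank_consistency_loss(encoder, img, margin=0.0):
    """Count surrogates from latent features of nested crops should be ordered:
    count(larger crop) >= count(smaller crop)."""
    counts = []
    for crop in nested_crops(img):
        feat = encoder(crop)                      # (B, C, h, w) latent feature map
        # pyramid of pooled features -> one scalar count surrogate per level
        level_counts = [F.adaptive_avg_pool2d(feat, k).sum(dim=(1, 2, 3))
                        for k in (1, 2, 4)]
        counts.append(torch.stack(level_counts, dim=1))   # (B, num_levels)
    loss = 0.0
    for larger, smaller in zip(counts[:-1], counts[1:]):
        loss = loss + F.relu(smaller - larger + margin).mean()
    return loss
```

In a semi-supervised setup, a loss of this form on unlabeled batches would be added to the usual density-map regression loss on the labeled batches.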
Matrix factorization (MF) is a simple collaborative filtering technique that achieves superior recommendation accuracy by decomposing the user-item rating matrix into user and item latent matrices. Because it learns only from observed user-item interactions, however, it may not effectively capture the underlying shared dependencies among users or among items. There is therefore scope to explicitly capture these shared dependencies, further improving recommendation accuracy and the interpretability of the learned results by summarizing user-item interactions. Based on these insights, we propose "Hierarchical Matrix Factorization" (HMF), which incorporates clustering concepts to capture a hierarchy in which leaf nodes correspond to users/items and internal nodes correspond to clusters. Central to our approach, called hierarchical embeddings, is the further decomposition of the user and item latent matrices (embeddings) into probabilistic connection matrices, which link the levels of the hierarchy, and a root cluster latent matrix; each node is thus represented by the weighted average of the embeddings of its parent clusters. The embeddings are differentiable, allowing interactions and clustering to be learned simultaneously with a single gradient descent method. Furthermore, the obtained cluster-specific interactions naturally summarize user-item interactions and provide interpretability. Experimental results on rating and ranking prediction demonstrate the competitiveness of HMF over vanilla and hierarchical MF methods, and in particular its robustness under sparse interactions. We also confirm that the clustering integration in HMF has the potential for faster convergence and mitigation of overfitting compared to MF, and that it provides interpretability, as shown in a cluster-centered case study.
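A minimal sketch of the hierarchical-embedding idea, assuming a single level of clusters shared by users and items; the class name, shapes, and parameterization are illustrative, and the paper may use separate or deeper hierarchies.

```python
import torch
import torch.nn as nn

class HMFSketch(nn.Module):
    """Toy hierarchical embeddings: leaves are weighted averages of cluster embeddings."""

    def __init__(self, n_users, n_items, n_clusters, dim):
        super().__init__()
        self.root = nn.Parameter(torch.randn(n_clusters, dim) * 0.1)  # root cluster latent matrix
        # Unnormalized connection logits; a softmax turns each row into a
        # probability distribution over parent clusters.
        self.user_logits = nn.Parameter(torch.zeros(n_users, n_clusters))
        self.item_logits = nn.Parameter(torch.zeros(n_items, n_clusters))

    def embeddings(self):
        P_u = torch.softmax(self.user_logits, dim=1)   # probabilistic connection matrix (users)
        P_i = torch.softmax(self.item_logits, dim=1)   # probabilistic connection matrix (items)
        U = P_u @ self.root                            # user embeddings
        V = P_i @ self.root                            # item embeddings
        return U, V

    def forward(self, users, items):
        U, V = self.embeddings()
        return (U[users] * V[items]).sum(dim=1)        # predicted ratings

# Usage: predicted ratings for two (user, item) pairs.
model = HMFSketch(n_users=1000, n_items=500, n_clusters=8, dim=32)
pred = model(torch.tensor([0, 1]), torch.tensor([10, 20]))
```

Because the softmax keeps each connection matrix row-stochastic, every user and item embedding remains a weighted average of cluster embeddings, so the interactions and the (soft) clustering can be learned jointly by ordinary gradient descent.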
Millimeter wave (mmWave) sensing is an emerging technology with applications in 3D object characterization and environment mapping. However, realizing precise 3D reconstruction from sparse mmWave signals remains challenging. Existing methods rely on data-driven learning and are constrained by dataset availability and limited generalization. We propose DiffSBR, a differentiable framework for mmWave-based 3D reconstruction. DiffSBR incorporates a differentiable ray tracing engine to simulate radar point clouds from virtual 3D models. A gradient-based optimizer refines the model parameters to minimize the discrepancy between simulated and real point clouds. Experiments using various radar hardware validate DiffSBR's capability for fine-grained 3D reconstruction, even for novel objects not previously seen by the radar. By integrating physics-based simulation with gradient optimization, DiffSBR transcends the limitations of data-driven approaches and pioneers a new paradigm for mmWave sensing.
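Schematically, the reconstruction loop is a standard gradient-based fit of model parameters through a differentiable simulator. In the sketch below, `simulate_point_cloud` is a placeholder for the differentiable ray-tracing engine and Chamfer distance is one plausible choice of discrepancy; neither is claimed to match the paper's exact design.

```python
import torch

def chamfer_distance(p, q):
    """Symmetric Chamfer distance between point clouds p (N, 3) and q (M, 3)."""
    d = torch.cdist(p, q)                              # (N, M) pairwise distances
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()

def reconstruct(real_cloud, simulate_point_cloud, init_params, steps=500, lr=1e-2):
    """Refine 3D model parameters so the simulated radar point cloud matches the real one."""
    params = init_params.detach().clone().requires_grad_(True)
    opt = torch.optim.Adam([params], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        sim_cloud = simulate_point_cloud(params)       # differentiable ray tracing (placeholder)
        loss = chamfer_distance(sim_cloud, real_cloud)
        loss.backward()
        opt.step()
    return params.detach()
```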
Off-resonance artifacts in magnetic resonance imaging (MRI) are visual distortions that occur when the actual resonant frequencies of spins within the imaging volume differ from the expected frequencies used to encode spatial information. These discrepancies can be caused by a variety of factors, including magnetic field inhomogeneities, chemical shifts, or susceptibility differences within the tissues. Such artifacts can manifest as blurring, ghosting, or misregistration of the reconstructed image, and they often compromise its diagnostic quality. We propose to resolve these artifacts by lifting the 2D MRI reconstruction problem to 3D, introducing an additional "spectral" dimension to model this off-resonance. Our approach is inspired by recent progress in modeling radiance fields, and is capable of reconstructing both static and dynamic MR images as well as separating fat and water, which is of independent clinical interest. We demonstrate our approach in the context of PROPELLER (Periodically Rotated Overlapping ParallEL Lines with Enhanced Reconstruction) MRI acquisitions, which are popular for their robustness to motion artifacts. Our method operates in a few minutes on a single GPU, and to our knowledge is the first to correct for chemical shift in gradient echo PROPELLER MRI reconstruction without additional measurements or pretraining data.
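To make the "spectral" lifting concrete, the toy forward model below splits each voxel across a few discrete off-resonance bins and lets each bin accrue extra phase during the readout; a reconstruction would invert such a model. It assumes simple Cartesian-style sampling and does not reproduce the paper's PROPELLER trajectory handling or optimization.

```python
import numpy as np

def off_resonance_signal(image_bins, freqs_hz, kx, ky, t_readout, xs, ys):
    """k-space samples of an image whose voxels are split across spectral bins.

    image_bins : (n_bins, H, W) complex image, one slice per off-resonance bin
    freqs_hz   : (n_bins,) off-resonance frequency of each bin (e.g., water near 0 Hz,
                 fat at the chemical-shift frequency for the field strength)
    kx, ky     : (n_samples,) k-space coordinates (cycles per field of view)
    t_readout  : (n_samples,) time of each sample after excitation (s)
    xs, ys     : (H,), (W,) normalized voxel coordinates in [-0.5, 0.5)
    """
    X, Y = np.meshgrid(xs, ys, indexing="ij")
    signal = np.zeros(len(kx), dtype=complex)
    for img, f in zip(image_bins, freqs_hz):
        for s, (u, v, t) in enumerate(zip(kx, ky, t_readout)):
            enc = np.exp(-2j * np.pi * (u * X + v * Y))               # spatial encoding
            signal[s] += np.sum(img * enc) * np.exp(2j * np.pi * f * t)  # off-resonance phase
    return signal
```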
Common spatial pattern (CSP) analysis is a signal processing technique widely used in brain-computer interface (BCI) systems to increase the signal-to-noise ratio of electroencephalogram (EEG) recordings. Despite its popularity, the performance of CSP is often hindered by nonstationarity and artifacts in EEG signals. The minmax CSP improves the robustness of CSP by using data-driven covariance matrices to accommodate these uncertainties. We show that, by utilizing the optimality conditions, the minmax CSP can be recast as an eigenvector-dependent nonlinear eigenvalue problem (NEPv). We introduce a self-consistent field (SCF) iteration with line search that solves the NEPv of the minmax CSP. Local quadratic convergence of the SCF iteration is illustrated on synthetic datasets. More importantly, experiments with real-world EEG datasets show improved motor-imagery classification rates and shorter running times for the proposed SCF-based solver compared with the existing algorithm for the minmax CSP.
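The following is a generic self-consistent field iteration for an eigenvector-dependent nonlinear eigenvalue problem H(V) V = V Lambda, shown only to illustrate the solver structure; `build_H` stands in for the minmax-CSP-specific construction, and the paper's line-search safeguard is omitted.

```python
import numpy as np

def scf(build_H, V0, n_vectors, tol=1e-8, max_iter=100):
    """Plain SCF iteration for H(V) V = V Lambda.

    build_H(V) returns the (symmetric) matrix evaluated at the current
    eigenvector estimate V. Here the eigenvectors of the smallest eigenvalues
    are kept; swap to the largest if the formulation requires it.
    """
    V = V0
    w = None
    for _ in range(max_iter):
        H = build_H(V)                          # matrix depends on the current eigenvectors
        w, U = np.linalg.eigh(H)
        V_new = U[:, :n_vectors]
        # fix the per-column sign ambiguity before measuring progress
        signs = np.sign(np.sum(V_new * V, axis=0))
        signs[signs == 0] = 1.0
        V_new = V_new * signs
        if np.linalg.norm(V_new - V) < tol:
            return V_new, w[:n_vectors]
        V = V_new
    return V, w[:n_vectors]
```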
Ultra-reliable low-latency communication (URLLC) constitutes a key service class of fifth-generation and beyond cellular networks. Notably, designing and supporting URLLC poses a herculean task due to the fundamental need to identify and accurately characterize the underlying statistical models in which the system operates, e.g., interference statistics, channel conditions, and the behavior of protocols. In general, multi-layer end-to-end approaches that consider all the potential delay and error sources, together with proper statistical tools and methodologies, are required for providing strong reliability and latency guarantees. This paper contributes to the body of knowledge in the latter aspect by providing a tutorial on several statistical tools and methodologies that are useful for designing and analyzing URLLC systems. Specifically, we overview the frameworks related to i) reliability theory, ii) short-packet communications, iii) inequalities, distribution bounds, and tail approximations, iv) rare-event simulation, v) queuing theory and information freshness, and vi) large-scale tools such as stochastic geometry, clustering, compressed sensing, and mean-field games. We also often refer to prominent data-driven algorithms within the scope of the discussed tools/methodologies. Throughout the paper, we briefly review state-of-the-art works that use the addressed tools and methodologies and their link to URLLC systems. Moreover, we discuss novel application examples focused on the physical and medium access control layers. Finally, key research challenges and directions are highlighted to elucidate how URLLC analysis/design research may evolve in the coming years.
Federated Learning (FL) is a decentralized machine-learning paradigm in which a global server iteratively averages the model parameters of local users without accessing their data. User heterogeneity has imposed significant challenges on FL, as it can lead to drifted global models that are slow to converge. Knowledge distillation has recently emerged to tackle this issue by refining the server model using aggregated knowledge from heterogeneous users, rather than directly averaging their model parameters. This approach, however, depends on a proxy dataset, making it impractical unless such a prerequisite is satisfied. Moreover, the ensemble knowledge is not fully utilized to guide local model learning, which may in turn affect the quality of the aggregated model. Inspired by the prior art, we propose a data-free knowledge distillation approach to address heterogeneous FL, where the server learns a lightweight generator to ensemble user information in a data-free manner, which is then broadcast to users, regulating local training using the learned knowledge as an inductive bias. Empirical studies, supported by theoretical implications, show that our approach facilitates FL with better generalization performance using fewer communication rounds, compared with the state of the art.
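A compact sketch of the server/client interplay described above, with assumed interfaces: the server trains a generator mapping (noise, label) to latent features on which the ensemble of user classifier heads agrees with the label, and clients use the broadcast generator as an extra regularizer. Architectures, loss weighting, and sampling details are placeholders rather than the paper's exact recipe.

```python
import torch
import torch.nn.functional as F

def server_generator_step(generator, user_heads, opt, batch=64, n_classes=10, z_dim=32):
    """One server-side update: synthetic latent features should be classified
    consistently by the ensemble of user classifier heads."""
    y = torch.randint(0, n_classes, (batch,))
    z = torch.randn(batch, z_dim)
    feat = generator(z, y)                                            # synthetic latent features
    logits = torch.stack([head(feat) for head in user_heads]).mean(dim=0)  # ensemble prediction
    loss = F.cross_entropy(logits, y)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

def client_loss(feat_extractor, head, generator, x, y, n_classes=10, z_dim=32, alpha=1.0):
    """Local objective: supervised loss plus a regularizer driven by the broadcast generator."""
    sup = F.cross_entropy(head(feat_extractor(x)), y)
    y_fake = torch.randint(0, n_classes, (x.shape[0],))
    z = torch.randn(x.shape[0], z_dim)
    with torch.no_grad():
        feat_fake = generator(z, y_fake)                              # generator is frozen locally
    reg = F.cross_entropy(head(feat_fake), y_fake)                    # learned knowledge as inductive bias
    return sup + alpha * reg
```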
Knowledge graph embedding, which aims to represent entities and relations as low-dimensional vectors (or matrices, tensors, etc.), has been shown to be a powerful technique for predicting missing links in knowledge graphs. Existing knowledge graph embedding models mainly focus on modeling relation patterns such as symmetry/antisymmetry, inversion, and composition. However, many existing approaches fail to model semantic hierarchies, which are common in real-world applications. To address this challenge, we propose a novel knowledge graph embedding model, Hierarchy-Aware Knowledge Graph Embedding (HAKE), which maps entities into the polar coordinate system. HAKE is inspired by the fact that concentric circles in the polar coordinate system can naturally reflect hierarchy. Specifically, the radial coordinate models entities at different levels of the hierarchy, with entities at higher levels expected to have smaller radii; the angular coordinate distinguishes entities at the same level of the hierarchy, which are expected to have roughly the same radii but different angles. Experiments demonstrate that HAKE effectively models the semantic hierarchies in knowledge graphs and significantly outperforms existing state-of-the-art methods on benchmark datasets for the link prediction task.
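A simplified scoring function in this polar-coordinate spirit is sketched below: a radial (modulus) distance separates hierarchy levels and an angular (phase) distance separates entities within a level. The published model includes additional components and constraints (e.g., a mixture bias on the modulus part) that are not shown here.

```python
import torch

def hake_like_score(h_mod, h_phase, r_mod, r_phase, t_mod, t_phase, lam=0.5):
    """Score a (head, relation, tail) triple from modulus and phase embeddings.

    All inputs have shape (..., dim); higher scores mean more plausible triples.
    """
    # radial part: entities at different hierarchy levels differ in modulus
    d_mod = torch.norm(h_mod * r_mod - t_mod, p=2, dim=-1)
    # angular part: entities at the same level are separated by phase
    d_phase = torch.sum(torch.abs(torch.sin((h_phase + r_phase - t_phase) / 2)), dim=-1)
    return -(d_mod + lam * d_phase)
```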
Multi-relation Question Answering is a challenging task, as it requires elaborate analysis of the question and reasoning over multiple fact triples in the knowledge base. In this paper, we present a novel model called the Interpretable Reasoning Network, which employs an interpretable, hop-by-hop reasoning process for question answering. At each hop, the model dynamically decides which part of the input question should be analyzed, predicts a relation that corresponds to the current parsed results, uses the predicted relation to update the question representation and the state of the reasoning process, and then drives the next-hop reasoning. Experiments show that our model yields state-of-the-art results on two datasets. More interestingly, the model offers traceable and observable intermediate predictions for reasoning analysis and failure diagnosis, thereby allowing manual intervention in predicting the final answer.
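The sketch below mimics such a hop-by-hop loop with placeholder modules: attend over question tokens given the current state, predict a relation, then use the predicted relation to update both the question representation and the state. Dimensions, update rules, and termination handling are illustrative, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class HopByHopReasoner(nn.Module):
    def __init__(self, dim, n_relations, n_hops=3):
        super().__init__()
        self.n_hops = n_hops
        self.attend = nn.Linear(dim, dim)               # scores question tokens against the state
        self.relation_clf = nn.Linear(dim, n_relations) # predicts the relation for the current hop
        self.relation_emb = nn.Embedding(n_relations, dim)
        self.state_update = nn.GRUCell(dim, dim)

    def forward(self, question_tokens, init_state):
        """question_tokens: (B, T, dim); init_state: (B, dim)."""
        q, state = question_tokens, init_state
        relations = []
        for _ in range(self.n_hops):
            # decide which part of the question to analyze at this hop
            attn = torch.softmax((q @ self.attend(state).unsqueeze(-1)).squeeze(-1), dim=1)
            focus = (attn.unsqueeze(-1) * q).sum(dim=1)             # (B, dim)
            rel = self.relation_clf(focus).argmax(dim=1)            # relation for this hop
            relations.append(rel)
            # use the predicted relation to update the question representation and state
            r_vec = self.relation_emb(rel)
            q = q - attn.unsqueeze(-1) * r_vec.unsqueeze(1)         # "consume" the analyzed part
            state = self.state_update(r_vec, state)
        return relations
```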
We propose a novel single-shot object detection network named Detection with Enriched Semantics (DES). Our motivation is to enrich the semantics of object detection features within a typical deep detector by means of a semantic segmentation branch and a global activation module. The segmentation branch is supervised by weak segmentation ground truth, i.e., no extra annotation is required. In conjunction with that, we employ a global activation module which learns the relationships between channels and object classes in a self-supervised manner. Comprehensive experimental results on both the PASCAL VOC and MS COCO detection datasets demonstrate the effectiveness of the proposed method. In particular, with a VGG16-based DES, we achieve an mAP of 81.7 on the VOC2007 test set and an mAP of 32.8 on COCO test-dev, with an inference speed of 31.5 milliseconds per image on a Titan Xp GPU. With a lower-resolution version, we achieve an mAP of 79.7 on VOC2007 with an inference speed of 13.0 milliseconds per image.
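As a rough sketch of a global activation module of this kind, the block below recalibrates detection features channel-wise from globally pooled statistics; the paper's exact design, and its self-supervised channel-class objective, may differ.

```python
import torch
import torch.nn as nn

class GlobalActivation(nn.Module):
    """Channel-wise recalibration from global statistics (illustrative placeholder)."""

    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):                       # x: (B, C, H, W) detection feature map
        w = x.mean(dim=(2, 3))                  # global average pooling -> (B, C)
        w = self.fc(w).unsqueeze(-1).unsqueeze(-1)
        return x * w                            # channel-recalibrated features
```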