Logarithmic Number Systems (LNS) hold considerable promise for reducing the number of bits needed to represent a high dynamic range of real numbers with finite precision, and they also support multiplication and division efficiently. However, under LNS, addition and subtraction turn into non-linear functions that must be approximated, typically using precomputed table-based functions. Additionally, multiple layers of error correction are typically needed to improve result accuracy. Unfortunately, previous efforts have not characterized the resulting error bound. We provide the first rigorous analysis of LNS, covering detailed techniques such as co-transformation that are crucial to implementing subtraction with reasonable accuracy. We provide theorems capturing the error due to table interpolations, the finite precision of pre-computed values in the tables, and the error introduced by fixed-point multiplications involved in LNS implementations. We empirically validate our analysis using a Python implementation, showing that our analytical bounds are tight, and that our testing campaign generates inputs diverse enough to almost match (but not exceed) the analytical bounds. We close with discussions on how to adapt our analysis to LNS systems with different bases and also discuss many pragmatic ramifications of our work in the broader arena of scientific computing and machine learning.
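To make the table-based addition concrete, the following is a minimal sketch, not the implementation analyzed in the paper. It assumes base 2, positive operands only, and floating-point table entries rather than fixed-point ones; the names `lns_add`, `phi_plus_table`, `STEP`, and `D_MIN` are hypothetical. It relies on the identity log2(2^a + 2^b) = max(a, b) + log2(1 + 2^d) with d = min(a, b) - max(a, b) <= 0, and approximates the correction term by linear interpolation in a precomputed table.

```python
import math

def lns_encode(x):
    """Represent a positive real by its base-2 logarithm (sign handling omitted)."""
    return math.log2(x)

def phi_plus(d):
    """Exact addition correction term log2(1 + 2**d), used here for d <= 0."""
    return math.log2(1.0 + 2.0 ** d)

STEP = 2.0 ** -4   # hypothetical table spacing
D_MIN = -32.0      # at or below this difference the correction is treated as 0
# Entry i holds phi_plus(-i * STEP); one extra entry keeps interpolation in range.
TABLE = [phi_plus(-i * STEP) for i in range(int(-D_MIN / STEP) + 2)]

def phi_plus_table(d):
    """Approximate phi_plus(d) by linear interpolation between table entries."""
    if d <= D_MIN:
        return 0.0
    pos = -d / STEP
    i = int(pos)
    frac = pos - i
    return TABLE[i] + frac * (TABLE[i + 1] - TABLE[i])

def lns_add(a, b):
    """LNS addition of two log-domain values: larger operand plus a correction."""
    hi, lo = max(a, b), min(a, b)
    return hi + phi_plus_table(lo - hi)

def lns_mul(a, b):
    """LNS multiplication is exact: logarithms simply add."""
    return a + b

a, b = lns_encode(3.0), lns_encode(5.0)
print(lns_add(a, b))          # close to log2(8) = 3, up to interpolation error
print(2.0 ** lns_mul(a, b))   # close to 15
```

Shrinking `STEP` reduces the interpolation error at the cost of a larger table, which is the kind of trade-off an error analysis of this sort has to quantify.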
Semantic scene completion (SSC) aims to predict complete 3D voxel occupancy and semantics from a single-view RGB-D image, and recent SSC methods commonly adopt multi-modal inputs. However, our investigation reveals two limitations: ineffective feature learning from single modalities and overfitting to limited datasets. To address these issues, this paper proposes a novel SSC framework - Adversarial Modality Modulation Network (AMMNet) - with a fresh perspective of optimizing gradient updates. The proposed AMMNet introduces two core modules: a cross-modal modulation enabling the interdependence of gradient flows between modalities, and a customized adversarial training scheme leveraging dynamic gradient competition. Specifically, the cross-modal modulation adaptively re-calibrates the features to better excite representation potentials from each single modality. The adversarial training employs a minimax game of evolving gradients, with customized guidance to strengthen the generator's perception of visual fidelity from both geometric completeness and semantic correctness. Extensive experimental results demonstrate that AMMNet outperforms state-of-the-art SSC methods by a large margin, providing a promising direction for improving the effectiveness and generalization of SSC methods.
We propose a new method for cloth digitalization. Deviating from existing methods, which learn from data captured under relatively casual settings, we propose to learn from data captured under strictly tested measuring protocols and to find plausible physical parameters of the cloths. However, such data is currently absent, so we first propose a new dataset with accurate cloth measurements. Further, owing to the nature of the data capture process, the dataset is considerably smaller than those commonly used in deep learning. To learn from small data, we propose a new Bayesian differentiable cloth model to estimate the complex material heterogeneity of real cloths. It can provide highly accurate digitalization from very limited data samples. Through exhaustive evaluation and comparison, we show our method is accurate in cloth digitalization, efficient in learning from limited data samples, and general in capturing material variations. Code and data are available at https://github.com/realcrane/Bayesian-Differentiable-Physics-for-Cloth-Digitalization
Both dual-functional radar-communication (DFRC) and massive multiple-input multiple-output (MIMO) have been recognized as enabling technologies for 6G wireless networks. This paper considers the advanced waveform design for hardware-efficient massive MIMO DFRC systems. Specifically, the transmit waveform is imposed with the quantized constant-envelope (QCE) constraint, which facilitates the employment of low-resolution digital-to-analog converters (DACs) and power-efficient amplifiers. The waveform design problem is formulated as the minimization of the mean square error (MSE) between the designed and desired beampatterns subject to the constructive interference (CI)-based communication quality of service (QoS) constraints and the QCE constraint. To solve the formulated problem, we first utilize the penalty technique to transform the discrete problem into an equivalent continuous penalty model. Then, we propose an inexact augmented Lagrangian method (ALM) algorithm for solving the penalty model. In particular, the ALM subproblem at each iteration is solved by a custom-built block successive upper-bound minimization (BSUM) algorithm, which admits closed-form updates, making the proposed inexact ALM algorithm computationally efficient. Simulation results demonstrate the superiority of the proposed approach over existing state-of-the-art ones. In addition, extensive simulations are conducted to examine the impact of various system parameters on the trade-off between communication and radar performances.
Optimal Transport is a useful metric to compare probability distributions and to compute a pairing given a ground cost. Its entropic regularization variant (eOT) is crucial for obtaining fast algorithms and for reflecting fuzzy/noisy matchings. This work focuses on Inverse Optimal Transport (iOT), the problem of inferring the ground cost from samples drawn from a coupling that solves an eOT problem. It is a relevant problem that can be used to infer unobserved/missing links, and to obtain meaningful information about the structure of the ground cost yielding the pairing. On the one hand, iOT benefits from convexity; on the other hand, being ill-posed, it requires regularization to handle the sampling noise. This work presents an in-depth theoretical study of $\ell_1$ regularization to model, for instance, Euclidean costs with sparse interactions between features. Specifically, we derive a sufficient condition for the robust recovery of the sparsity of the ground cost that can be seen as a far-reaching generalization of the Lasso's celebrated Irrepresentability Condition. To provide additional insight into this condition, we work out in detail the Gaussian case. We show that as the entropic penalty varies, the iOT problem interpolates between a graphical Lasso and a classical Lasso, thereby establishing a connection between iOT and graph estimation, an important problem in ML.
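As an illustration of the forward eOT problem that iOT inverts, here is a minimal sketch assuming discrete marginals and Sinkhorn's matrix-scaling iterations, a standard eOT solver that is not necessarily the one used in this work. The function name `sinkhorn` and the parameters `cost`, `a`, `b`, `eps`, and `iters` are illustrative choices.

```python
import math

def sinkhorn(cost, a, b, eps=0.1, iters=500):
    """Entropic OT coupling between marginals a and b for ground cost `cost`."""
    n, m = len(a), len(b)
    # Gibbs kernel K_ij = exp(-c_ij / eps).
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(iters):
        # Alternately rescale rows and columns to match the two marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    # The coupling has the form P_ij = u_i * K_ij * v_j.
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

cost = [[0.0, 1.0], [1.0, 0.0]]
P = sinkhorn(cost, [0.5, 0.5], [0.5, 0.5], eps=0.5)
print(P)  # mass concentrates on the cheap diagonal pairs
```

Because the coupling factorizes as u_i K_ij v_j, its logarithm satisfies log P_ij = log u_i + log v_j - c_ij / eps, so a coupling alone pins down the cost only up to additive row and column terms. This is one way to see why recovering the cost needs additional structure or regularization.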
Multimodal Sentiment Analysis (MSA) aims to understand human intentions by integrating emotion-related clues from diverse modalities, such as visual, language, and audio. Unfortunately, the current MSA task invariably suffers from unplanned dataset biases, particularly multimodal utterance-level label bias and word-level context bias. These harmful biases potentially mislead models to focus on statistical shortcuts and spurious correlations, causing severe performance bottlenecks. To alleviate these issues, we present a Multimodal Counterfactual Inference Sentiment (MCIS) analysis framework based on causality rather than conventional likelihood. Concretely, we first formulate a causal graph to discover harmful biases from already-trained vanilla models. In the inference phase, given a factual multimodal input, MCIS imagines two counterfactual scenarios to purify and mitigate these biases. Then, MCIS can make unbiased decisions from biased observations by comparing factual and counterfactual outcomes. We conduct extensive experiments on several standard MSA benchmarks. Qualitative and quantitative results show the effectiveness of the proposed framework.
We consider the differentially private (DP) facility location problem in the so-called super-set output setting proposed by Gupta et al. [SODA 2010]. The best known expected approximation ratio for an $\epsilon$-DP algorithm is $O\left(\frac{\log n}{\sqrt{\epsilon}}\right)$, due to Cohen-Addad et al. [AISTATS 2022], where $n$ denotes the size of the metric space; the best known lower bound is $\Omega(1/\sqrt{\epsilon})$ [NeurIPS 2019]. In this short note, we give a lower bound of $\tilde{\Omega}\left(\min\left\{\log n, \sqrt{\frac{\log n}{\epsilon}}\right\}\right)$ on the expected approximation ratio of any $\epsilon$-DP algorithm, which is the first evidence that the approximation ratio has to grow with the size of the metric space.
The application of eigenvalue theory to dual quaternion Hermitian matrices holds significance in the realm of multi-agent formation control. In this paper, we study the Rayleigh quotient iteration (RQI) for computing the right eigenpairs of dual quaternion Hermitian matrices. Combined with dual representation, the RQI algorithm can effectively compute the extreme eigenvalue, along with the associated eigenvector, of large dual quaternion Hermitian matrices. Furthermore, a convergence analysis of the Rayleigh quotient iteration is derived, demonstrating a local convergence rate of at least cubic, which is faster than the linear convergence rate of the power method. Numerical examples are provided to illustrate the high accuracy and low CPU time cost of the proposed Rayleigh quotient iteration compared with the power method for solving the dual quaternion Hermitian eigenvalue problem.
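The basic structure of Rayleigh quotient iteration can be sketched as follows. This is a simplified illustration for a small real symmetric matrix, which stands in for the dual quaternion Hermitian case; dual quaternion arithmetic and the dual representation are not modeled, and the helper names `solve`, `rayleigh_quotient`, `normalize`, and `rqi` are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        if abs(M[p][c]) < 1e-14:
            raise ZeroDivisionError("shifted matrix is numerically singular")
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = s / M[r][r]
    return x

def rayleigh_quotient(A, x):
    """Return (x^T A x) / (x^T x)."""
    n = len(x)
    Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    return sum(xi * yi for xi, yi in zip(x, Ax)) / sum(xi * xi for xi in x)

def normalize(x):
    norm = sum(xi * xi for xi in x) ** 0.5
    return [xi / norm for xi in x]

def rqi(A, x0, iters=20, tol=1e-12):
    """Rayleigh quotient iteration: inverse iteration with an adaptive shift."""
    n = len(A)
    x = normalize(x0)
    mu = rayleigh_quotient(A, x)
    for _ in range(iters):
        shifted = [[A[i][j] - (mu if i == j else 0.0) for j in range(n)]
                   for i in range(n)]
        try:
            y = solve(shifted, x)
        except ZeroDivisionError:
            break  # mu already matches an eigenvalue to working precision
        x = normalize(y)
        mu_new = rayleigh_quotient(A, x)
        converged = abs(mu_new - mu) < tol
        mu = mu_new
        if converged:
            break
    return mu, x

A = [[2.0, 1.0], [1.0, 2.0]]   # eigenvalues 1 and 3
mu, x = rqi(A, [1.0, 0.5])
print(mu, x)  # mu approaches 3; x aligns with (1, 1) up to sign
```

Each step updates the shift to the current Rayleigh quotient, which is what yields the fast local convergence, whereas the power method uses no shift and converges linearly.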
Graph Neural Networks (GNNs) have demonstrated superior performance in many challenging applications, including few-shot learning tasks. Despite their powerful capacity to learn and generalize from few samples, GNNs usually suffer from severe over-fitting and over-smoothing as the model becomes deep, which limits model scalability. In this work, we propose a novel Attentive GNN to tackle these challenges by incorporating a triple-attention mechanism, i.e., node self-attention, neighborhood attention, and layer memory attention. We explain why the proposed attentive modules can improve GNNs for few-shot learning with theoretical analysis and illustrations. Extensive experiments show that the proposed Attentive GNN outperforms the state-of-the-art GNN-based methods for few-shot learning on the mini-ImageNet and Tiered-ImageNet datasets, in both inductive and transductive settings.
We advocate the use of implicit fields for learning generative models of shapes and introduce an implicit field decoder for shape generation, aimed at improving the visual quality of the generated shapes. An implicit field assigns a value to each point in 3D space, so that a shape can be extracted as an iso-surface. Our implicit field decoder is trained to perform this assignment by means of a binary classifier. Specifically, it takes a point coordinate, along with a feature vector encoding a shape, and outputs a value which indicates whether the point is outside the shape or not. By replacing conventional decoders by our decoder for representation learning and generative modeling of shapes, we demonstrate superior results for tasks such as shape autoencoding, generation, interpolation, and single-view 3D reconstruction, particularly in terms of visual quality.
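The decoder interface described above, a point plus a shape feature vector mapped to an inside/outside value, can be illustrated with a toy stand-in. This sketch is an assumption-laden illustration, not the trained network: `toy_decoder` is a hypothetical analytic function whose "feature vector" is just a sphere's center and radius, and the convention that values above 0.5 mean "inside" is chosen here for illustration.

```python
import math

def toy_decoder(point, feature):
    """Hypothetical stand-in for a trained decoder; feature = (cx, cy, cz, radius)."""
    cx, cy, cz, radius = feature
    dist = math.dist(point, (cx, cy, cz))
    # Sigmoid of the signed distance: above 0.5 inside the sphere, below outside.
    return 1.0 / (1.0 + math.exp(-20.0 * (radius - dist)))

def occupancy_grid(decoder, feature, res=16, threshold=0.5):
    """Query the decoder at cell centers of a res^3 grid over [-1, 1]^3."""
    coords = [-1.0 + 2.0 * (i + 0.5) / res for i in range(res)]
    return [[[decoder((x, y, z), feature) > threshold for z in coords]
             for y in coords] for x in coords]

grid = occupancy_grid(toy_decoder, (0.0, 0.0, 0.0, 0.5), res=8)
filled = sum(cell for plane in grid for row in plane for cell in row)
print(filled, "of", 8 ** 3, "cells are classified as inside")
```

Because the field can be queried at any point, the grid resolution is chosen at extraction time rather than fixed by the decoder, and an iso-surface extractor such as marching cubes could be run on the sampled values.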
High spectral dimensionality and the shortage of annotations make hyperspectral image (HSI) classification a challenging problem. Recent studies suggest that convolutional neural networks can learn discriminative spatial features, which play a paramount role in HSI interpretation. However, most of these methods ignore the distinctive spectral-spatial characteristic of hyperspectral data. In addition, a large amount of unlabeled data remains an unexploited gold mine for efficient data use. Therefore, we propose an integration of generative adversarial networks (GANs) and probabilistic graphical models for HSI classification. Specifically, we use a spectral-spatial generator and a discriminator to identify land cover categories of hyperspectral cubes. Moreover, to take advantage of a large amount of unlabeled data, we adopt a conditional random field to refine the preliminary classification results generated by GANs. Experimental results obtained using two commonly studied datasets demonstrate that the proposed framework achieves encouraging classification accuracy using a small amount of training data.