In this paper, we propose a novel method of formulating an NP-hard wireless channel assignment problem as a higher-order unconstrained binary optimization (HUBO), where the Grover adaptive search (GAS) is used to provide a quadratic speedup for solving the problem. The conventional method relies on a one-hot encoding of the channel indices, resulting in a quadratic formulation. By contrast, we conceive ascending and descending binary encodings of the channel indices, construct a specific quantum circuit, and derive the exact numbers of qubits and gates required by GAS. Our analysis clarifies that the proposed HUBO formulation significantly reduces the number of qubits and the query complexity compared with the conventional quadratic formulation. This advantage is achieved at the cost of an increased number of quantum gates, which we demonstrate can be reduced by our proposed descending binary encoding.
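As a rough illustration of why a binary encoding of channel indices needs fewer qubits than a one-hot encoding, the sketch below counts only the decision-variable qubits. It assumes each of `num_nodes` transmitters selects one of `num_channels` channels; the function names and example sizes are hypothetical. The count deliberately omits the ancillary register that GAS uses to hold the objective value, so it is not the exact qubit count derived in the paper.

```python
def one_hot_variable_qubits(num_nodes: int, num_channels: int) -> int:
    """One binary variable per (node, channel) pair."""
    if num_nodes < 1 or num_channels < 1:
        raise ValueError("num_nodes and num_channels must be positive")
    return num_nodes * num_channels


def binary_variable_qubits(num_nodes: int, num_channels: int) -> int:
    """ceil(log2(num_channels)) binary variables per node."""
    if num_nodes < 1 or num_channels < 1:
        raise ValueError("num_nodes and num_channels must be positive")
    # (K - 1).bit_length() equals ceil(log2(K)) for every integer K >= 1.
    bits_per_node = (num_channels - 1).bit_length()
    return num_nodes * bits_per_node


def search_space_size(num_variable_qubits: int) -> int:
    """Number of basis states spanned by the decision variables."""
    return 2 ** num_variable_qubits


if __name__ == "__main__":
    nodes, channels = 4, 8  # hypothetical problem size
    for name, count in (
        ("one-hot", one_hot_variable_qubits(nodes, channels)),
        ("binary", binary_variable_qubits(nodes, channels)),
    ):
        print(name, count, search_space_size(count))
```

Because the binary encoding shrinks the number of decision variables, the space that a Grover-type search has to cover shrinks with it, which is consistent with the reduced query complexity stated above.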
In this paper we study the Subset Sum Problem (SSP). Assuming the SSP has at most one solution, we provide a randomized quasi-polynomial algorithm with one-sided error: if the SSP has no solution, the algorithm always returns FALSE, while if the SSP has a solution, the algorithm returns TRUE with probability $\frac{1}{2^{\log(n)}}$. This can be seen as distinguishing two types of coins. One coin, when tossed, always returns TAILS, while the other returns HEADS with probability $\frac{1}{2^{\log(n)}}$ and TAILS otherwise. Using the Law of Large Numbers, one can identify the coin type and thereby assert the existence of a solution to the SSP. The algorithm is developed in the more general framework of maximizing the distance to a given point over an intersection of balls.
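The coin analogy can be made concrete with a small amplification sketch. It assumes a hypothetical callable `run_once` that stands in for one execution of the randomized algorithm and never returns TRUE on an instance without a solution. For illustration only, the logarithm is taken to base 2, so the per-run success probability becomes $1/n$; the abstract does not fix the base. The helper names are not from the paper.

```python
import math
import random


def trials_needed(p_success: float, delta: float) -> int:
    """Smallest k such that (1 - p_success)**k <= delta."""
    if not (0.0 < p_success < 1.0) or not (0.0 < delta < 1.0):
        raise ValueError("p_success and delta must lie strictly in (0, 1)")
    return math.ceil(math.log(delta) / math.log(1.0 - p_success))


def detect_solution(run_once, num_trials: int) -> bool:
    """Repeat a one-sided test; a single TRUE certifies a solution."""
    return any(run_once() for _ in range(num_trials))


def make_coin(p_heads: float, rng: random.Random):
    """Simulated coin: returns True (HEADS) with probability p_heads."""
    return lambda: rng.random() < p_heads


if __name__ == "__main__":
    n = 1024                # hypothetical instance size
    p = 1.0 / n             # assumes log base 2, so 2**log2(n) == n
    k = trials_needed(p, delta=0.01)
    rng = random.Random(0)
    print("trials:", k)
    print("always-TAILS coin:", detect_solution(make_coin(0.0, rng), k))
    print("sometimes-HEADS coin:", detect_solution(make_coin(p, rng), k))
```

Since the always-TAILS coin can never produce HEADS, the only possible error is missing a solution, and that happens with probability at most `delta` after `trials_needed(p, delta)` independent runs.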
In this paper, we introduce a new spectral notion of approximation between directed graphs, which we call singular value (SV) approximation. SV-approximation is stronger than previous notions of spectral approximation considered in the literature, including spectral approximation of Laplacians for undirected graphs (Spielman and Teng, STOC 2004), standard approximation for directed graphs (Cohen et al., STOC 2017), and unit-circle (UC) approximation for directed graphs (Ahmadinejad et al., FOCS 2020). Further, SV-approximation enjoys several useful properties not possessed by previous notions of approximation, e.g., it is preserved under products of random-walk matrices and bounded matrices. We provide a nearly linear-time algorithm for SV-sparsifying (and hence UC-sparsifying) Eulerian directed graphs, as well as $\ell$-step random walks on such graphs, for any $\ell\leq \text{poly}(n)$. Combined with the Eulerian scaling algorithms of Cohen et al. (FOCS 2018), given an arbitrary (not necessarily Eulerian) directed graph and a set $S$ of vertices, we can approximate the stationary probability mass of the $(S,S^c)$ cut in an $\ell$-step random walk to within a multiplicative error of $1/\text{polylog}(n)$ and an additive error of $1/\text{poly}(n)$ in nearly linear time. As a starting point for these results, we provide a simple black-box reduction from SV-sparsifying Eulerian directed graphs to SV-sparsifying undirected graphs; such a directed-to-undirected reduction was not known for previous notions of spectral approximation.
In this article we consider the estimation of static parameters for partially observed diffusion processes with discrete-time observations over a fixed time interval. In particular, we assume that one must time-discretize the partially observed diffusion process, work with the resulting biased model, and maximize the associated log-likelihood. Using a novel double randomization scheme based upon Markovian stochastic approximation, we develop a new method to unbiasedly estimate the static parameters, that is, to obtain the maximum likelihood estimator with no time discretization bias. Under suitable assumptions we prove that our estimator is unbiased, and we investigate the method in several numerical examples, showing that it can empirically outperform existing unbiased methodology.
In this work, we propose a novel research problem: assessing positive and risky messages from music products. We first establish a benchmark for multi-angle, multi-level music content assessment and then present an effective multi-task prediction model with ordinality enforcement to solve this problem. Our results show that the proposed method not only significantly outperforms strong task-specific counterparts but can also evaluate multiple aspects concurrently.
In this paper, we propose simultaneous transmitting and reflecting reconfigurable intelligent surface (STAR-RIS) assisted non-orthogonal multiple access (NOMA) networks. The considered STAR-RIS utilizes the mode switching (MS) protocol to serve multiple NOMA users located on both sides of the surface. Under the MS protocol, each STAR-RIS element operates in either full transmission mode or full reflection mode. Within this setting, we propose a novel algorithm to partition the STAR-RIS surface among the available users. This algorithm determines the number of transmitting/reflecting elements that need to be assigned to each user in order to maximize the system sum-rate while guaranteeing the quality-of-service requirements of individual users. For the proposed system, we derive closed-form analytical expressions for the outage probability (OP) and its corresponding asymptotic behavior under different user deployments. Finally, Monte Carlo simulations are performed to verify the correctness of the theoretical analysis. It is shown that the proposed system outperforms the classical NOMA and orthogonal multiple access systems in terms of OP and sum-rate.
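The practice of checking a closed-form outage probability against Monte Carlo simulation can be sketched on a much simpler stand-in model. The example below assumes a single point-to-point Rayleigh-fading link with unit average channel gain, for which the outage probability at target rate $R$ and average SNR $\bar{\gamma}$ is the textbook expression $1 - \exp\left(-(2^R - 1)/\bar{\gamma}\right)$. It is not the STAR-RIS NOMA system analyzed in the paper, and all names and parameter values are hypothetical.

```python
import math
import random


def closed_form_outage(snr_linear: float, rate_bps_hz: float) -> float:
    """Outage probability of a Rayleigh link with unit-mean power gain."""
    threshold = (2.0 ** rate_bps_hz - 1.0) / snr_linear
    return 1.0 - math.exp(-threshold)


def monte_carlo_outage(snr_linear: float, rate_bps_hz: float,
                       num_trials: int, seed: int = 0) -> float:
    """Empirical fraction of channel draws whose capacity is below the target."""
    rng = random.Random(seed)
    outages = 0
    for _ in range(num_trials):
        # |h|^2 is exponentially distributed with mean 1 under Rayleigh fading.
        power_gain = rng.expovariate(1.0)
        if math.log2(1.0 + snr_linear * power_gain) < rate_bps_hz:
            outages += 1
    return outages / num_trials


if __name__ == "__main__":
    snr, rate = 10.0, 1.0  # hypothetical: 10 dB average SNR, 1 bit/s/Hz target
    print("analytical:", closed_form_outage(snr, rate))
    print("simulated: ", monte_carlo_outage(snr, rate, num_trials=100_000))
```

Agreement between the two values within sampling error is the kind of evidence the simulations are meant to provide for the analytical expressions.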
In this paper, we tackle two challenges in multimodal learning for visual recognition: 1) modalities may be missing during training or testing in real-world situations; and 2) computation resources may be insufficient to fine-tune heavy transformer models. To this end, we propose to utilize prompt learning to mitigate these two challenges together. Specifically, our modality-missing-aware prompts can be plugged into multimodal transformers to handle general missing-modality cases, while requiring less than 1% of the learnable parameters needed to train the entire model. We further explore the effect of different prompt configurations and analyze the robustness to missing modalities. Extensive experiments show that our prompt learning framework improves performance under various missing-modality cases while alleviating the need for heavy model re-training. Code is available.
In this paper, we propose a novel Feature Decomposition and Reconstruction Learning (FDRL) method for effective facial expression recognition. We view the expression information as the combination of the shared information (expression similarities) across different expressions and the unique information (expression-specific variations) for each expression. More specifically, FDRL mainly consists of two crucial networks: a Feature Decomposition Network (FDN) and a Feature Reconstruction Network (FRN). In particular, FDN first decomposes the basic features extracted from a backbone network into a set of facial action-aware latent features to model expression similarities. Then, FRN captures the intra-feature and inter-feature relationships for latent features to characterize expression-specific variations, and reconstructs the expression feature. To this end, two modules including an intra-feature relation modeling module and an inter-feature relation modeling module are developed in FRN. Experimental results on both the in-the-lab databases (including CK+, MMI, and Oulu-CASIA) and the in-the-wild databases (including RAF-DB and SFEW) show that the proposed FDRL method consistently achieves higher recognition accuracy than several state-of-the-art methods. This clearly highlights the benefit of feature decomposition and reconstruction for classifying expressions.
In this paper, we introduce the Reinforced Mnemonic Reader for machine reading comprehension tasks, which enhances previous attentive readers in two aspects. First, a reattention mechanism is proposed to refine current attentions by directly accessing past attentions that are temporally memorized in a multi-round alignment architecture, so as to avoid the problems of attention redundancy and attention deficiency. Second, a new optimization approach, called dynamic-critical reinforcement learning, is introduced to extend the standard supervised method. It always encourages the model to predict a more acceptable answer, so as to address the convergence suppression problem that occurs in traditional reinforcement learning algorithms. Extensive experiments on the Stanford Question Answering Dataset (SQuAD) show that our model achieves state-of-the-art results. Moreover, our model outperforms previous systems by over 6% in terms of both Exact Match and F1 metrics on two adversarial SQuAD datasets.
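For readers unfamiliar with the two reported metrics, the following is a simplified sketch of how Exact Match and token-level F1 are commonly computed for extractive question answering. The normalization steps (lowercasing, stripping punctuation and English articles) are an assumption patterned on common SQuAD-style evaluation practice; this is not the official evaluation script, and the function names are illustrative.

```python
import re
import string
from collections import Counter


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, ground_truth: str) -> bool:
    """True when the normalized strings are identical."""
    return normalize_answer(prediction) == normalize_answer(ground_truth)


def f1_score(prediction: str, ground_truth: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    pred_tokens = normalize_answer(prediction).split()
    truth_tokens = normalize_answer(ground_truth).split()
    if not pred_tokens or not truth_tokens:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred_tokens == truth_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(truth_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(truth_tokens)
    return 2.0 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(exact_match("The Eiffel Tower", "eiffel tower"))
    print(f1_score("in Paris France", "Paris"))
```

Exact Match rewards only a fully correct span, whereas F1 gives partial credit for overlapping tokens, which is why the two are usually reported together.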
High spectral dimensionality and the shortage of annotations make hyperspectral image (HSI) classification a challenging problem. Recent studies suggest that convolutional neural networks can learn discriminative spatial features, which play a paramount role in HSI interpretation. However, most of these methods ignore the distinctive spectral-spatial characteristics of hyperspectral data. In addition, a large amount of unlabeled data remains an unexploited gold mine for efficient data use. Therefore, we proposed an integration of generative adversarial networks (GANs) and probabilistic graphical models for HSI classification. Specifically, we used a spectral-spatial generator and a discriminator to identify land cover categories of hyperspectral cubes. Moreover, to take advantage of a large amount of unlabeled data, we adopted a conditional random field to refine the preliminary classification results generated by GANs. Experimental results obtained using two commonly studied datasets demonstrate that the proposed framework achieved encouraging classification accuracy using a small amount of training data.
In this paper, we propose jointly learning attention and recurrent neural network (RNN) models for multi-label classification. While approaches based on the use of either model exist (e.g., for the task of image captioning), training such existing network architectures typically requires pre-defined label sequences. For multi-label classification, it would be desirable to have a robust inference process, so that prediction errors do not propagate and degrade performance. Our proposed model uniquely integrates attention and Long Short-Term Memory (LSTM) models, which not only addresses the above problem but also allows one to identify visual objects of interest with varying sizes without prior knowledge of a particular label ordering. More importantly, label co-occurrence information can be jointly exploited by our LSTM model. Finally, by advancing the technique of beam search, prediction of multiple labels can be efficiently achieved by our proposed network model.
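To make the role of beam search in sequential label prediction concrete, here is a minimal sketch of standard beam search over label sequences. It is not the advanced variant developed in the paper. The scorer `toy_step_log_probs` is a hypothetical stand-in for a recurrent model that returns log-probabilities of the next label given the labels emitted so far, and the rule that each label appears at most once is an assumption made for the multi-label setting.

```python
import math

END = "<end>"  # hypothetical end-of-sequence token


def beam_search(step_log_probs, beam_width: int, max_len: int):
    """Return (labels, log_prob) of the best finished label sequence."""
    beams = [((), 0.0)]  # each beam: (labels emitted so far, cumulative log-prob)
    finished = []
    for _ in range(max_len):
        candidates = []
        for labels, score in beams:
            for label, log_p in step_log_probs(labels).items():
                if label == END:
                    finished.append((labels, score + log_p))
                elif label not in labels:  # assumption: no repeated labels
                    candidates.append((labels + (label,), score + log_p))
        if not candidates:
            break
        candidates.sort(key=lambda item: item[1], reverse=True)
        beams = candidates[:beam_width]  # keep only the top-scoring prefixes
    pool = finished if finished else beams
    return max(pool, key=lambda item: item[1])


# Hypothetical next-label probabilities, keyed by the label prefix.
_TOY_TABLE = {
    (): {"person": 0.6, "dog": 0.3, END: 0.1},
    ("person",): {"dog": 0.7, END: 0.3},
    ("dog",): {"person": 0.2, END: 0.8},
    ("person", "dog"): {END: 1.0},
    ("dog", "person"): {END: 1.0},
}


def toy_step_log_probs(labels):
    """Look up the toy distribution and convert it to log-probabilities."""
    return {label: math.log(p) for label, p in _TOY_TABLE[tuple(labels)].items()}


if __name__ == "__main__":
    print(beam_search(toy_step_log_probs, beam_width=2, max_len=3))
```

Keeping several partial label sequences alive, instead of committing to the single most likely label at each step, is one way to limit the error propagation mentioned above.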