In this paper, we propose an iterative framework, consisting of a generation phase and a training phase, to generate realistic training data and yield a supervised homography network. In the generation phase, given an unlabeled image pair, we utilize the pre-estimated dominant plane masks and homography of the pair, along with another sampled homography that serves as ground truth, to generate a new labeled training pair with realistic motion. In the training phase, the generated data is used to train the supervised homography network, during which the training data is refined via a content consistency module and a quality assessment module. Once an iteration is finished, the trained network is used in the next generation phase to update the pre-estimated homography. Through such an iterative strategy, the quality of the dataset and the performance of the network can be gradually and simultaneously improved. Experimental results show that our method achieves state-of-the-art performance, and that existing supervised methods can also be improved with the generated dataset. Code and dataset are available at //github.com/megvii-research/RealSH.
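As a rough illustration of the sampled-homography step in the generation phase, the sketch below (our own stand-in, not the authors' released code; it omits the dominant plane masks and the pre-estimated homography) warps an image by a randomly sampled homography that then serves as the ground-truth label:

```python
# Hedged sketch: sampling a ground-truth homography for data generation,
# loosely following the generation phase described above (not the authors' code).
import numpy as np
import cv2

def sample_labeled_pair(img, max_shift=32):
    """Warp `img` by a randomly sampled homography; the sampled H is the label."""
    h, w = img.shape[:2]
    src = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
    dst = src + np.random.uniform(-max_shift, max_shift, src.shape).astype(np.float32)
    H = cv2.getPerspectiveTransform(src, dst)  # ground-truth homography
    warped = cv2.warpPerspective(img, H, (w, h))
    return img, warped, H

img = np.random.randint(0, 255, (240, 320, 3), dtype=np.uint8)  # stand-in image
a, b, H_gt = sample_labeled_pair(img)
```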
In the presented work, we propose to apply the framework of graph neural networks (GNNs) to predict the dynamics of a rolling element bearing. This approach offers generalizability and interpretability, with the potential for scalable use in real-time operational digital twin systems for monitoring the health state of rotating machines. By representing the bearing's components as nodes in a graph, the GNN can effectively model the complex relationships and interactions among them. We utilize a dynamic spring-mass-damper model of a bearing to generate the training data for the GNN. In this model, discrete masses represent bearing components such as the rolling elements and the inner and outer raceways, while a Hertzian contact model is employed to calculate the forces between these components. We evaluate the learning and generalization capabilities of the proposed GNN framework by testing bearing configurations that deviate from those seen during training. Through this approach, we demonstrate the effectiveness of the GNN-based method in accurately predicting the dynamics of rolling element bearings, highlighting its potential for real-time health monitoring of rotating machinery.
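As a minimal illustration of the kind of lumped-parameter dynamics described above, the sketch below integrates a single-degree-of-freedom spring-mass-damper with a Hertzian contact force $F = k\,\delta^{3/2}$; all parameter values are illustrative, not the paper's:

```python
# Hedged sketch of a 1-DOF spring-mass-damper step with a Hertzian contact
# force F = k * delta^(3/2), as in the bearing model described above.
import numpy as np

def hertz_force(delta, k=1e9):
    """Hertzian point-contact force; zero when the contact is open (delta <= 0)."""
    return k * np.maximum(delta, 0.0) ** 1.5

def step(x, v, m=0.1, c=50.0, dt=1e-6, preload=1e-5):
    """One explicit-Euler step; `preload` sets the nominal contact deflection."""
    f = hertz_force(preload - x) - c * v  # contact force plus viscous damping
    a = f / m
    return x + dt * v, v + dt * a

x, v = 0.0, 0.0
for _ in range(1000):
    x, v = step(x, v)
```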
We present a polynomial-time quantum algorithm making a single query (in superposition) to a classical oracle, such that for every state $|\psi\rangle$ there exists a choice of oracle that makes the algorithm construct an exponentially close approximation of $|\psi\rangle$. Previous algorithms for this problem either used a linear number of queries and polynomial time, or a constant number of queries and polynomially many ancillae but no nontrivial bound on the runtime. As corollaries we do the following:
- We simplify the proof that statePSPACE $\subseteq$ stateQIP (a quantum state analogue of PSPACE $\subseteq$ IP) and show that a constant number of rounds of interaction suffices.
- We show that $\mathsf{QAC^0_f}$ lower bounds for constructing explicit states would imply breakthrough circuit lower bounds for computing explicit boolean functions.
- We prove that every $n$-qubit state can be constructed to within 0.01 error by an $O(2^n/n)$-size circuit over an appropriate finite gate set. More generally we give a size-error tradeoff which, by a counting argument, is optimal for any finite gate set.
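For intuition on why a counting argument gives optimality (a standard sketch in our own notation, not the paper's proof): over a finite gate set $G$ of arity at most $k$, with ancillae each gate acts on at most $n+s$ wires, so there are at most $(|G|(n+s)^k)^s$ circuits of size $s$, while an $\epsilon$-net of $n$-qubit pure states has size $\exp(\Theta(2^n \log(1/\epsilon)))$. Covering the net forces

```latex
s \,\log\!\bigl(|G|\,(n+s)^{k}\bigr) \;=\; \Omega\!\bigl(2^{n}\log(1/\epsilon)\bigr)
\quad\Longrightarrow\quad
s \;=\; \Omega\!\left(\frac{2^{n}}{n}\right) \ \text{for constant } \epsilon,
```

since $\log(|G|(n+s)^k) = O(\log s)$ for fixed $G$ and $k$, and $\log s = O(n)$ in the relevant regime.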
In this paper, we propose three generic models of capacitated coverage and, more generally, submodular maximization to study task-worker assignment problems that arise in a wide range of gig economy platforms. Our models incorporate the following features: (1) Each task and worker can have an arbitrary matching capacity, which captures the limited number of copies or finite budget of the task and the working capacity of the worker; (2) Each task is associated with a coverage or, more generally, a monotone submodular utility function. Our objective is to design an allocation policy that maximizes the sum of all tasks' utilities, subject to capacity constraints on tasks and workers. We consider two settings: offline, where all tasks and workers are static, and online, where tasks are static while workers arrive dynamically. We present three LP-based rounding algorithms that achieve an optimal approximation ratio of $1-1/\mathsf{e} \approx 0.632$ for offline coverage maximization, and competitive ratios of $(19-67/\mathsf{e}^3)/27 \approx 0.580$ and $0.436$ for online coverage and online monotone submodular maximization, respectively.
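For concreteness, one plausible form of the LP relaxation for the offline coverage variant, in our own notation (the paper's LP may differ): $x_{t,w} \in [0,1]$ fractionally assigns worker $w$ to task $t$, $y_{t,e}$ indicates that element $e$ of weight $w_e$ is covered for task $t$, and $c_t$, $c_w$ are the matching capacities.

```latex
\max \ \sum_{t}\sum_{e} w_{e}\, y_{t,e}
\quad \text{s.t.}\quad
y_{t,e} \le \!\!\sum_{w:\, e\in \mathrm{cov}(t,w)}\!\! x_{t,w},
\qquad \sum_{w} x_{t,w} \le c_{t},
\qquad \sum_{t} x_{t,w} \le c_{w},
\qquad x,\, y \in [0,1].
```

Rounding a fractional optimum of such a relaxation to an integral assignment is where a $1-1/\mathsf{e}$ loss typically arises.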
In this paper, we continue the research on the power of contextual grammars with selection languages from subfamilies of the family of regular languages. In the past, two independent hierarchies have been obtained for external and internal contextual grammars, one based on selection languages defined by structural properties (finite, monoidal, nilpotent, combinational, definite, ordered, non-counting, power-separating, suffix-closed, commutative, circular, or union-free languages), the other one based on selection languages defined by resources (number of non-terminal symbols, production rules, or states needed for generating or accepting them). In a previous paper, the language families of these hierarchies for external contextual grammars were compared and the hierarchies merged. In the present paper, we compare the language families of these hierarchies for internal contextual grammars and merge these hierarchies.
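As a schematic reminder of the two derivation modes (standard definitions, not specific to this paper), with a context $(u,v)$ and selection language $S$:

```latex
\text{external:}\quad w \;\Rightarrow_{\mathrm{ex}}\; u\,w\,v \ \ \text{if } w \in S;
\qquad
\text{internal:}\quad x\,y\,z \;\Rightarrow_{\mathrm{in}}\; x\,u\,y\,v\,z \ \ \text{if } y \in S .
```

For example, with axiom $b$, context $(a,a)$, and selection language $S = \{a\}^{*}\{b\}\{a\}^{*}$, both modes generate $\{a^{n} b a^{n} \mid n \ge 0\}$.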
This paper presents a case for exemplar parallelism of neural networks, using Go as the parallelization framework. It further shows that even limited multi-core hardware systems, such as notebooks and single-board computers, are suitable for these parallelization tasks. The main question was how much speedup can be gained by using concurrent Go goroutines specifically. To find the answer, a simple concurrent feedforward network for MNIST digit recognition was implemented in Go. On a notebook (Lenovo Yoga 2), the first measurements showed a speedup of 252% when utilizing 4 goroutines. Testing a single-board computer (Banana Pi M3) delivered more convincing results: 320% with 4 goroutines, and 432% with 8 goroutines.
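Reading the reported percentages as multiplicative speedup factors $S(p)$ (our interpretation of the figures above), the implied parallel efficiency $E(p) = S(p)/p$ is

```latex
E_{\text{Yoga 2}}(4) = \tfrac{2.52}{4} \approx 0.63, \qquad
E_{\text{Pi M3}}(4) = \tfrac{3.20}{4} = 0.80, \qquad
E_{\text{Pi M3}}(8) = \tfrac{4.32}{8} = 0.54 .
```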
In this paper, we tackle two challenges in multimodal learning for visual recognition: 1) when a modality is missing either during training or testing, as happens in real-world situations; and 2) when the computational resources to fine-tune heavy transformer models are not available. To this end, we propose to utilize prompt learning to mitigate these two challenges together. Specifically, our modality-missing-aware prompts can be plugged into multimodal transformers to handle general missing-modality cases, while requiring less than 1% of the learnable parameters needed to train the entire model. We further explore the effect of different prompt configurations and analyze the robustness to missing modalities. Extensive experiments show the effectiveness of our prompt learning framework, which improves performance under various missing-modality cases while alleviating the need for heavy model re-training. Code is available.
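A minimal sketch of how missing-modality-aware prompts can be prepended to the inputs of a frozen multimodal transformer; names, shapes, and the set of missing-modality cases are illustrative, not the paper's implementation:

```python
# Hedged sketch: missing-modality-aware prompts prepended to transformer inputs.
import torch
import torch.nn as nn

class MissingAwarePrompts(nn.Module):
    def __init__(self, dim=768, prompt_len=16,
                 cases=("complete", "no_text", "no_image")):
        super().__init__()
        # One small learnable prompt per missing-modality case;
        # the transformer backbone itself stays frozen.
        self.prompts = nn.ParameterDict(
            {c: nn.Parameter(torch.randn(prompt_len, dim) * 0.02) for c in cases}
        )

    def forward(self, tokens, case):
        # tokens: (batch, seq_len, dim); prepend the case-specific prompt.
        p = self.prompts[case].unsqueeze(0).expand(tokens.size(0), -1, -1)
        return torch.cat([p, tokens], dim=1)

x = torch.randn(2, 40, 768)
out = MissingAwarePrompts()(x, "no_text")  # shape (2, 56, 768)
```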
In this paper, we propose a deep reinforcement learning framework called GCOMB to learn algorithms that can solve combinatorial problems over large graphs. GCOMB mimics the greedy algorithm for the original problem and incrementally constructs a solution. The framework utilizes a Graph Convolutional Network (GCN) to generate node embeddings that predict which nodes from the entire node set are likely to belong to the solution set. These embeddings enable an efficient training process that learns the greedy policy via Q-learning. Through extensive evaluation on several real and synthetic datasets containing up to a million nodes, we establish that GCOMB is up to 41% better than the state of the art, up to seven times faster than the greedy algorithm, and robust and scalable to large dynamic networks.
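A minimal sketch of the greedy construction driven by learned scores, with a hand-written stand-in for the trained GCN and Q-network:

```python
# Hedged sketch of GCOMB-style greedy construction with learned scores
# (a stand-in scorer replaces the trained GCN + Q-network).
import networkx as nx

def greedy_construct(graph, budget, q_fn):
    """Incrementally add the node with the highest predicted marginal value."""
    solution = set()
    candidates = set(graph.nodes)
    for _ in range(budget):
        best = max(candidates, key=lambda v: q_fn(graph, solution, v))
        solution.add(best)
        candidates.remove(best)
    return solution

# Stand-in for the learned Q-function: score a node by its uncovered degree.
def q_fn(graph, solution, v):
    return sum(1 for u in graph.neighbors(v) if u not in solution)

g = nx.erdos_renyi_graph(100, 0.05, seed=0)
print(greedy_construct(g, budget=5, q_fn=q_fn))
```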
In this paper, we introduce the Reinforced Mnemonic Reader for machine reading comprehension tasks, which enhances previous attentive readers in two aspects. First, a reattention mechanism is proposed to refine current attentions by directly accessing past attentions that are temporally memorized in a multi-round alignment architecture, thereby avoiding the problems of attention redundancy and attention deficiency. Second, a new optimization approach, called dynamic-critical reinforcement learning, is introduced to extend the standard supervised method. It encourages the model to predict a more acceptable answer, addressing the convergence suppression problem that occurs in traditional reinforcement learning algorithms. Extensive experiments on the Stanford Question Answering Dataset (SQuAD) show that our model achieves state-of-the-art results. Meanwhile, our model outperforms previous systems by over 6% in terms of both Exact Match and F1 metrics on two adversarial SQuAD datasets.
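A minimal sketch of one way to refine a current attention distribution with a memorized past attention, in the spirit of the reattention mechanism (the paper's exact formulation differs):

```python
# Hedged sketch of reattention: merging current alignment scores with a
# temporally memorized past attention before normalizing.
import torch
import torch.nn.functional as F

def reattention(q, k, past_attn, gamma=0.5):
    """q: (B, Lq, D); k: (B, Lk, D); past_attn: (B, Lq, Lk)."""
    scores = torch.bmm(q, k.transpose(1, 2)) / q.size(-1) ** 0.5
    # Bias current scores with past attention to suppress redundant alignments.
    refined = scores + gamma * past_attn
    return F.softmax(refined, dim=-1)

B, Lq, Lk, D = 2, 5, 7, 16
attn = reattention(torch.randn(B, Lq, D), torch.randn(B, Lk, D),
                   torch.zeros(B, Lq, Lk))
```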
In this paper, we propose a novel multi-task learning architecture, which incorporates recent advances in attention mechanisms. Our approach, the Multi-Task Attention Network (MTAN), consists of a single shared network containing a global feature pool, together with task-specific soft-attention modules, which are trainable in an end-to-end manner. These attention modules allow for learning of task-specific features from the global pool, whilst simultaneously allowing for features to be shared across different tasks. The architecture can be built upon any feed-forward neural network, is simple to implement, and is parameter efficient. Experiments on the CityScapes dataset show that our method outperforms several baselines in both single-task and multi-task learning, and is also more robust to the various weighting schemes in the multi-task loss function. We further explore the effectiveness of our method through experiments over a range of task complexities, and show how our method scales well with task complexity compared to baselines.
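A minimal sketch of a task-specific soft-attention module gating a shared feature pool, in the spirit of MTAN (simplified; not the released code):

```python
# Hedged sketch: per-task soft attention over globally shared features.
import torch
import torch.nn as nn

class TaskAttention(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.mask = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.BatchNorm2d(channels),
            nn.Sigmoid(),  # per-pixel, per-channel gate in [0, 1]
        )

    def forward(self, shared_feat):
        # Each task learns its own gate over the shared feature pool.
        return self.mask(shared_feat) * shared_feat

shared = torch.randn(2, 64, 32, 32)
seg_feat = TaskAttention(64)(shared)    # one attention module per task
depth_feat = TaskAttention(64)(shared)
```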
In this paper, we propose to jointly learn attention and recurrent neural network (RNN) models for multi-label classification. While approaches based on either model exist (e.g., for the task of image captioning), training such network architectures typically requires pre-defined label sequences. For multi-label classification, it would be desirable to have a robust inference process, so that prediction errors do not propagate and degrade performance. Our proposed model uniquely integrates attention and Long Short-Term Memory (LSTM) models, which not only addresses the above problem but also allows one to identify visual objects of interest with varying sizes without prior knowledge of a particular label ordering. More importantly, label co-occurrence information can be jointly exploited by our LSTM model. Finally, by advancing the technique of beam search, prediction of multiple labels can be efficiently achieved by our proposed network model.
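A minimal sketch of beam search over label sequences, with a stand-in scorer in place of the trained attention-LSTM:

```python
# Hedged sketch: beam search over label sequences for multi-label prediction.
def beam_search(step_logprobs, labels, beam=3, max_len=4):
    """step_logprobs(prefix, label) -> log-prob of emitting `label` after `prefix`."""
    beams = [((), 0.0)]
    for _ in range(max_len):
        # Expand every beam with every unused label, then keep the top `beam`.
        expanded = [(prefix + (l,), score + step_logprobs(prefix, l))
                    for prefix, score in beams for l in labels if l not in prefix]
        beams = sorted(expanded, key=lambda b: -b[1])[:beam]
    return beams

# Stand-in scorer: fixed per-label log-probs, ignoring the prefix.
scores = {"dog": -0.2, "ball": -0.7, "grass": -1.1, "car": -2.3}
print(beam_search(lambda p, l: scores[l], list(scores), beam=2, max_len=2))
```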