Multibody legged robots pose complex rotational control challenges. In this paper, we propose a concise way to understand and formulate a \emph{whole-body orientation} that (i) depends only on the system configuration and not on a history of motion, (ii) represents the orientation of the entire system without being attached to any specific link, and (iii) has a rate of change that approximates the total system angular momentum. We relate this orientation coordinate to past work, and discuss and demonstrate several different uses for it, including on hardware.
The goal of this paper is to detect objects by exploiting their interrelationships. Rather than relying on predefined and labeled graph structures, we infer a graph prior from object co-occurrence statistics. The key idea of our paper is to model object relations as a function of initial class predictions and co-occurrence priors to generate a graph representation of an image for improved classification and bounding box regression. We additionally learn the object-relation joint distribution via energy-based modeling. Sampling from this distribution generates a refined graph representation of the image, which in turn produces improved detection performance. Experiments on the Visual Genome and MS-COCO datasets demonstrate that our method is detector agnostic, end-to-end trainable, and especially beneficial for rare object classes. Moreover, we establish a consistent improvement over object detectors such as DETR and Faster-RCNN, as well as over state-of-the-art methods modeling object interrelationships.
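The co-occurrence prior above can be inferred directly from label statistics. Below is a minimal sketch of one plausible construction (a row-normalized co-occurrence matrix); the paper's exact statistic and normalization may differ.

```python
import numpy as np

def cooccurrence_prior(image_labels, num_classes):
    """Build a row-normalized class co-occurrence matrix from a list of
    per-image label sets (hypothetical construction; the paper's exact
    statistic may differ)."""
    counts = np.zeros((num_classes, num_classes))
    for labels in image_labels:
        for a in labels:
            for b in labels:
                if a != b:
                    counts[a, b] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # Avoid division by zero for classes never observed with others.
    row_sums[row_sums == 0] = 1.0
    return counts / row_sums

# Toy example: class 0 ("person") often co-occurs with class 1 ("bicycle").
prior = cooccurrence_prior([{0, 1}, {0, 1}, {0, 2}], num_classes=3)
```

Each row of `prior` is then a distribution over classes likely to appear alongside a given class, which can condition the refinement of initial per-object predictions.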
In this paper we study the Subset Sum Problem (SSP). Assuming the SSP has at most one solution, we provide a randomized quasi-polynomial algorithm that always returns FALSE if the SSP has no solution, and returns TRUE with probability $\frac{1}{2^{\log(n)}}$ if it has one. This can be seen as distinguishing two types of coins: one coin always comes up TAILS when tossed, while the other also comes up HEADS, with probability $\frac{1}{2^{\log(n)}}$. Using the Law of Large Numbers one can identify the coin type and thereby assert the existence of a solution to the SSP. The algorithm is developed in the more general framework of maximizing the distance to a given point over an intersection of balls.
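The coin-identification step amplifies the one-sided success probability by repetition. The sketch below is a hypothetical harness around the abstract's coin analogy, not the quasi-polynomial SSP algorithm itself: a TRUE outcome certifies a solution, and enough FALSE outcomes make "no solution" overwhelmingly likely.

```python
import math
import random

def identify_coin(run_once, p, confidence=1e-9):
    """One-sided amplification: run_once() can return True (HEADS) only if
    a solution exists, and then only with probability p.  After enough
    TAILS, the no-solution hypothesis is accepted with failure probability
    at most `confidence`."""
    trials = math.ceil(math.log(1.0 / confidence) / p)
    for _ in range(trials):
        if run_once():
            return True   # HEADS observed: a solution certainly exists
    return False          # overwhelmingly likely there is no solution

# Toy stand-in with n = 16, so the HEADS probability is 1 / 2**log2(16) = 1/16.
random.seed(0)
p = 1.0 / 16
solvable = identify_coin(lambda: random.random() < p, p)
unsolvable = identify_coin(lambda: False, p)
```

Note the asymmetry: TRUE is a certificate, while FALSE is only a high-confidence statistical conclusion, exactly as in the two-coin picture.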
This paper introduces a new neural-network-based approach, namely In-Context Operator Networks (ICON), to simultaneously learn operators from the prompted data and apply them to new questions during the inference stage, without any weight update. Existing methods are limited to using a neural network to approximate a specific equation solution or a specific operator, requiring retraining when switching to a new problem with different equations. By training a single neural network as an operator learner, we can not only get rid of retraining (even fine-tuning) the neural network for new problems, but also leverage the commonalities shared across operators, so that only a few demos in the prompt are needed when learning a new operator. Our numerical results show the neural network's capability as a few-shot operator learner for a diverse set of differential equation problems, including forward and inverse problems of ordinary differential equations (ODEs), partial differential equations (PDEs), and mean-field control (MFC) problems, and also show that it can generalize its learning capability to operators beyond the training distribution.
In this paper, we aim to maximize the weighted sum-rate (WSR) of rate splitting multiple access (RSMA) in multi-user multi-antenna transmission networks through the joint optimization of rate allocation and beamforming. Unlike conventional methods such as weighted minimum mean square error (WMMSE) and standard fractional programming (FP), which tackle the non-convex WSR problem iteratively using disciplined convex subproblems and optimization toolboxes, our work pioneers a novel toolbox-free approach. For the first time, we identify the optimal beamforming structure and common rate allocation for WSR maximization in RSMA by leveraging FP and Lagrangian duality. We then propose an algorithm based on FP and fixed-point iteration to optimize the beamforming and common rate allocation without the need for optimization toolboxes. Our numerical results demonstrate that the proposed algorithm attains the same performance as standard FP and classical WMMSE methods while significantly reducing computational time.
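The toolbox-free algorithm relies on the fixed-point iteration pattern $x_{k+1} = g(x_k)$. The sketch below shows only that generic numerical pattern on a classic scalar example; the actual beamforming update derived from FP and Lagrangian duality is problem-specific and not reproduced here.

```python
import math

def fixed_point(update, x0, tol=1e-10, max_iter=1000):
    """Generic fixed-point iteration x_{k+1} = update(x_k): iterate the
    update map until successive iterates agree to within `tol`."""
    x = x0
    for _ in range(max_iter):
        x_next = update(x)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    return x

# Classic scalar example: the unique fixed point of cos(x), x* ~ 0.739085.
x_star = fixed_point(math.cos, 1.0)
```

Such iterations avoid generic solvers entirely: each step is a closed-form evaluation of the update map, which is where the claimed runtime savings over toolbox-based FP and WMMSE come from.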
In this paper, we propose simultaneous transmitting and reflecting reconfigurable intelligent surface (STAR-RIS) assisted non-orthogonal multiple access (NOMA) networks. The considered STAR-RIS utilizes the mode switching (MS) protocol to serve multiple NOMA users located on both sides of the surface. Based on the MS protocol, each STAR-RIS element can operate in full transmission or reflection mode. Within this perspective, we propose a novel algorithm to partition the STAR-RIS surface among the available users. This algorithm aims to determine the proper number of transmitting/reflecting elements that needs to be assigned to each user in order to maximize the system sum-rate while guaranteeing the quality-of-service requirements of individual users. For the proposed system, we derive closed-form analytical expressions for the outage probability (OP) and its corresponding asymptotic behavior under different user deployments. Finally, Monte Carlo simulations are performed in order to verify the correctness of the theoretical analysis. It is shown that the proposed system outperforms the classical NOMA and orthogonal multiple access systems in terms of OP and sum-rate.
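The Monte Carlo verification step pits simulated outage probability against a closed-form expression. The sketch below illustrates that pattern on a single Rayleigh-faded link, a far simpler channel than the paper's STAR-RIS NOMA model, where the exponential-SNR outage probability $1 - e^{-\gamma_{th}/\bar{\gamma}}$ is known exactly.

```python
import numpy as np

def outage_probability_mc(mean_snr, rate_threshold, n_samples=200_000, seed=0):
    """Monte Carlo outage probability for a single Rayleigh-faded link:
    fraction of channel draws whose achievable rate falls below the
    threshold (illustrative stand-in for the paper's system model)."""
    rng = np.random.default_rng(seed)
    snr = rng.exponential(mean_snr, n_samples)   # Rayleigh power fading
    rate = np.log2(1.0 + snr)
    return np.mean(rate < rate_threshold)

mean_snr, r_th = 10.0, 1.0                       # average SNR, rate threshold
gamma_th = 2.0 ** r_th - 1.0                     # equivalent SNR threshold
op_mc = outage_probability_mc(mean_snr, r_th)
op_closed = 1.0 - np.exp(-gamma_th / mean_snr)   # closed-form OP
```

Agreement between `op_mc` and `op_closed` within Monte Carlo noise is exactly the kind of check the abstract describes for its (much more involved) derived OP expressions.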
In this paper, we propose applying a meta-learning approach to low-resource automatic speech recognition (ASR). We formulate ASR for different languages as different tasks, and meta-learn the initialization parameters from many pretraining languages to achieve fast adaptation on an unseen target language, via the recently proposed model-agnostic meta-learning (MAML) algorithm. We evaluate the proposed approach using six languages as pretraining tasks and four languages as target tasks. Preliminary results show that the proposed method, MetaASR, significantly outperforms the state-of-the-art multitask pretraining approach on all target languages with different combinations of pretraining languages. In addition, owing to MAML's model-agnostic property, this paper also opens a new research direction of applying meta learning to more speech-related applications.
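MAML's inner/outer loop can be seen on a toy problem. The sketch below runs first-order MAML (FOMAML) on scalar regression tasks $y = ax$ with model $f(x) = wx$, where each slope plays the role of one pretraining language; this is an illustrative stand-in for the abstract's ASR setup, not its architecture or exact algorithm (MAML proper also backpropagates through the inner step).

```python
import numpy as np

def fomaml_train(task_slopes, alpha=0.05, beta=0.05, steps=400, seed=0):
    """First-order MAML on toy 1-D regression tasks y = a*x with a scalar
    model f(x) = w*x.  Inner loop: one gradient step of task adaptation.
    Outer loop: update the shared initialization with the post-adaptation
    gradient, averaged over the task batch."""
    rng = np.random.default_rng(seed)
    w = 0.0
    for _ in range(steps):
        outer_grad = 0.0
        for a in task_slopes:                 # meta-batch over tasks
            x = rng.uniform(-1.0, 1.0, 32)    # task's support data
            grad = lambda w_: 2.0 * np.mean(x * x) * (w_ - a)
            w_task = w - alpha * grad(w)      # inner adaptation step
            outer_grad += grad(w_task)        # first-order meta-gradient
        w -= beta * outer_grad / len(task_slopes)
    return w

# The meta-learned init sits between the task optima, ready for fast adaptation.
w_init = fomaml_train([1.0, 3.0])
```

After meta-training, a single inner step from `w_init` moves most of the way toward any individual task's optimum, which is the "fast adaptation on an unseen target language" property the abstract exploits.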
BERT, a pre-trained Transformer model, has achieved ground-breaking performance on multiple NLP tasks. In this paper, we describe BERTSUM, a simple variant of BERT, for extractive summarization. Our system is the state of the art on the CNN/Dailymail dataset, outperforming the previous best-performing system by 1.65 on ROUGE-L. The code to reproduce our results is available at //github.com/nlpyang/BertSum
The key issue of few-shot learning is learning to generalize. In this paper, we propose a large margin principle to improve the generalization capacity of metric based methods for few-shot learning. To realize it, we develop a unified framework to learn a more discriminative metric space by augmenting the softmax classification loss function with a large margin distance loss function for training. Extensive experiments on two state-of-the-art few-shot learning models, graph neural networks and prototypical networks, show that our method can improve the performance of existing models substantially with very little computational overhead, demonstrating the effectiveness of the large margin principle and the potential of our method.
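The abstract above augments the softmax classification loss with a large-margin distance loss on the embedding space. The sketch below uses one plausible instantiation, a pairwise pull/push term with a fixed margin; the paper's exact distance loss and weighting may differ.

```python
import numpy as np

def large_margin_loss(embeddings, labels, logits, margin=1.0, lam=0.5):
    """Softmax cross-entropy augmented with a large-margin distance term
    (a pairwise surrogate; hypothetical form).  Same-class embeddings are
    pulled together; different-class pairs closer than `margin` are
    pushed apart."""
    n = len(labels)
    # Standard softmax cross-entropy on the classifier logits.
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    ce = -np.mean(log_probs[np.arange(n), labels])
    # Pairwise large-margin term on the embedding space.
    dists = np.linalg.norm(embeddings[:, None] - embeddings[None, :], axis=-1)
    same = labels[:, None] == labels[None, :]
    pull = dists[same & ~np.eye(n, dtype=bool)].mean()    # intra-class spread
    push = np.maximum(0.0, margin - dists[~same]).mean()  # inter-class margin
    return ce + lam * (pull + push)

# Tiny example: two well-separated classes with confident logits.
emb = np.array([[0.0, 0.0], [0.0, 0.1], [3.0, 0.0], [3.0, 0.1]])
lab = np.array([0, 0, 1, 1])
logit = np.array([[5.0, 0.0], [5.0, 0.0], [0.0, 5.0], [0.0, 5.0]])
loss = large_margin_loss(emb, lab, logit)
```

Because the margin term only adds a differentiable penalty on pairwise distances, it can be attached to a metric-based model such as a prototypical network with negligible computational overhead, as the abstract claims.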
In this paper, we introduce the Reinforced Mnemonic Reader for machine reading comprehension tasks, which enhances previous attentive readers in two aspects. First, a reattention mechanism is proposed to refine current attentions by directly accessing past attentions that are temporally memorized in a multi-round alignment architecture, so as to avoid the problems of attention redundancy and attention deficiency. Second, a new optimization approach, called dynamic-critical reinforcement learning, is introduced to extend the standard supervised method. It always encourages the model to predict a more acceptable answer, addressing the convergence suppression problem that occurs in traditional reinforcement learning algorithms. Extensive experiments on the Stanford Question Answering Dataset (SQuAD) show that our model achieves state-of-the-art results. Meanwhile, our model outperforms previous systems by over 6% in terms of both Exact Match and F1 metrics on two adversarial SQuAD datasets.
In this paper, we propose a conceptually simple and geometrically interpretable objective function, i.e. additive margin Softmax (AM-Softmax), for deep face verification. In general, the face verification task can be viewed as a metric learning problem, so it is important to learn large-margin face features with small intra-class variation and large inter-class difference in order to achieve good performance. Recently, Large-margin Softmax and Angular Softmax have been proposed to incorporate the angular margin in a multiplicative manner. In this work, we introduce a novel additive angular margin for the Softmax loss, which is intuitively appealing and more interpretable than existing works. We also emphasize and discuss the importance of feature normalization in the paper. Most importantly, our experiments on LFW BLUFR and MegaFace show that our additive margin softmax loss consistently performs better than the current state-of-the-art methods using the same network architecture and training dataset. Our code has also been made available at //github.com/happynear/AMSoftmax
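The additive margin enters the loss by subtracting a constant $m$ from the target-class cosine before scaling: $\mathcal{L} = -\log \frac{e^{s(\cos\theta_y - m)}}{e^{s(\cos\theta_y - m)} + \sum_{j \ne y} e^{s\cos\theta_j}}$. Below is a minimal NumPy sketch of that computation; the scale $s = 30$ and margin $m = 0.35$ are common settings and may not match the paper's exact hyperparameters.

```python
import numpy as np

def am_softmax_loss(features, weights, labels, s=30.0, m=0.35):
    """Additive margin softmax: L2-normalize features and class weights,
    subtract margin m from the target-class cosine, scale by s, then take
    softmax cross-entropy."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=0, keepdims=True)
    cos = f @ w                                   # cosine similarities
    n = len(labels)
    cos_margin = cos.copy()
    cos_margin[np.arange(n), labels] -= m         # additive angular margin
    logits = s * cos_margin
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -np.mean(log_probs[np.arange(n), labels])

# Sanity check: perfectly aligned features give near-zero loss, yet the
# margin still makes the objective strictly harder than plain softmax.
W = np.eye(2)                                     # columns: class weights
X = np.array([[1.0, 0.0], [0.0, 1.0]])
y = np.array([0, 1])
loss_m = am_softmax_loss(X, W, y)
loss_0 = am_softmax_loss(X, W, y, m=0.0)
```

Because the margin is additive in cosine space rather than multiplicative in angle space, the decision boundary shifts by a constant, which is what makes AM-Softmax easier to interpret and optimize than its multiplicative predecessors.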