In this paper, we consider a min-max optimization problem under adversarial manipulation, where there are $n$ cost functions, up to $f$ of which may be replaced by arbitrary faulty functions by an adversary. The goal is to find a point $x$ that minimizes the maximum cost over the $n$ functions despite the presence of the faulty functions. The formulation extends naturally to Byzantine fault-tolerant distributed min-max optimization. We present a simple algorithm for Byzantine min-max optimization and provide bounds on its output, present an approximate algorithm for the problem, and then extend the problem to a distributed setting with a corresponding distributed algorithm. To the best of our knowledge, we are the first to consider this problem.
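As a minimal illustration only (not the paper's algorithm), a natural robust surrogate evaluates all $n$ costs at a point, discards the $f$ largest values, and minimizes the maximum of the rest; the zeroth-order descent and the toy cost functions below are assumptions made for the sketch.

```python
import numpy as np

def robust_max(x, funcs, f):
    """Max cost at x after discarding the f largest values, a simple
    surrogate for guarding against up to f faulty functions."""
    vals = np.sort([fn(x) for fn in funcs])
    return vals[-(f + 1)]  # largest value among the n - f smallest

def byzantine_minmax(funcs, f, x0, lr=0.01, iters=2000, eps=1e-4):
    """Minimize the robust max with a generic finite-difference descent
    (a placeholder optimizer; the paper's algorithm may differ)."""
    x = np.array(x0, dtype=float)
    for _ in range(iters):
        grad = np.zeros_like(x)
        for i in range(len(x)):
            e = np.zeros_like(x)
            e[i] = eps
            grad[i] = (robust_max(x + e, funcs, f) - robust_max(x - e, funcs, f)) / (2 * eps)
        x -= lr * grad
    return x

# toy example: 3 quadratic costs plus 1 adversarially replaced (huge constant) cost
funcs = [lambda x, c=c: np.sum((x - c) ** 2) for c in (-1.0, 0.0, 1.0)]
funcs.append(lambda x: 1e6)            # faulty function
print(byzantine_minmax(funcs, f=1, x0=[5.0]))
```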
Recently, a mask-based beamformer with an attention-based spatial covariance matrix aggregator (ASA) was proposed and demonstrated to track moving sources accurately. However, the deep neural network model used in this algorithm is tied to a specific channel configuration, requiring a separate model whenever the channel permutation, channel count, or microphone array geometry changes. To address this limitation, we investigate three approaches to improve the robustness of the ASA-based tracking method against such variations: incorporating random channel configurations during training, employing the transform-average-concatenate (TAC) method to process multi-channel input features (allowing any channel count and providing permutation invariance), and utilizing input features that are robust to variations of the channel configuration. Our experiments on the CHiME-3 and DEMAND datasets demonstrate improved robustness against mismatches in channel permutation, channel count, and microphone array geometry compared to the conventional ASA-based tracking method, without compromising performance in matched conditions. This suggests that a mask-based beamformer with ASA integrating the proposed approaches has the potential to track moving sources with arbitrary microphone arrays.
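A minimal sketch of a transform-average-concatenate (TAC) block of the kind referred to above, assuming per-channel feature vectors as input; the layer sizes and PReLU activations are illustrative choices rather than the exact architecture used in the paper.

```python
import torch
import torch.nn as nn

class TAC(nn.Module):
    """Transform-average-concatenate block: processes an arbitrary number of
    channels with shared weights, so it accepts any channel count and is
    permutation-invariant (up to the matching permutation of its outputs)."""
    def __init__(self, dim, hidden):
        super().__init__()
        self.transform = nn.Sequential(nn.Linear(dim, hidden), nn.PReLU())
        self.average   = nn.Sequential(nn.Linear(hidden, hidden), nn.PReLU())
        self.concat    = nn.Sequential(nn.Linear(2 * hidden, dim), nn.PReLU())

    def forward(self, x):            # x: (batch, channels, dim)
        z = self.transform(x)        # per-channel transform with shared weights
        m = self.average(z.mean(dim=1, keepdim=True))  # cross-channel average
        m = m.expand_as(z)           # broadcast the average back to every channel
        return self.concat(torch.cat([z, m], dim=-1))  # fuse and project back

x = torch.randn(4, 6, 257)           # e.g., 6 microphones, 257-dim features
print(TAC(257, 384)(x).shape)        # works unchanged for any channel count
```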
We construct a randomized vector quantizer that has a smaller maximum error than all known lattice quantizers with the same entropy for dimensions 5, 6, ..., 48, and also a smaller mean squared error than known lattice quantizers with the same entropy for dimensions 35, ..., 48, in the high-resolution limit. Moreover, our randomized quantizer has the desirable property that the quantization error is always uniform over a ball and independent of the input. Our construction is based on applying rejection sampling to universal quantization, which allows us to shape the error distribution into any continuous distribution, not only uniform distributions over basic cells of a lattice as in conventional dithered quantization. We also characterize the high-SNR limit of one-shot channel simulation for any additive noise channel under a mild assumption (e.g., the AWGN channel), up to an additive constant of 1.45 bits.
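A scalar sketch of the dithered (universal) quantization step underlying the construction: with a dither shared between encoder and decoder, the reconstruction error is uniform over a cell and independent of the input. The paper's rejection-sampling step, which reshapes this error into, e.g., a uniform distribution over a ball, is not shown here.

```python
import numpy as np

rng = np.random.default_rng(0)
delta = 1.0                                  # step size of the scalar lattice

def universal_quantize(x, u, delta=1.0):
    """Dithered (universal) quantization: the encoder sends round((x + u)/delta),
    the decoder subtracts the shared dither u. The reconstruction error is
    uniform on [-delta/2, delta/2] and independent of x."""
    q = np.round((x + u) / delta)            # index transmitted to the decoder
    return q * delta - u                     # decoder-side reconstruction

x = rng.normal(size=100_000)                             # arbitrary inputs
u = rng.uniform(-delta / 2, delta / 2, size=x.shape)     # shared dither
err = universal_quantize(x, u, delta) - x
print(err.min(), err.max(), np.corrcoef(err, x)[0, 1])   # ~[-0.5, 0.5], ~0 correlation
```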
In this paper, we study fair multi-agent multi-armed bandit learning when agents do not communicate with each other, except for collision information provided to agents that access the same arm simultaneously. We provide an algorithm with regret $O\left(N^3 \log \frac{B}{\Delta} f(\log T) \log T \right)$ (assuming bounded rewards with an unknown bound), where $f(t)$ is any function diverging to infinity with $t$. This significantly improves previous results, which had the same $O(f(\log T) \log T)$ dependence on the horizon but an exponential dependence on the number of agents. The result is attained by using a distributed auction algorithm to learn the sample-optimal matching and a novel order-statistics-based regret analysis. Simulation results illustrate the dependence of the regret on $\log T$.
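For intuition, a centralized sketch of a classical $\varepsilon$-auction for the matching step is given below; the reward estimates, the max-total-value criterion, and the centralized prices are placeholders, whereas the paper's algorithm is distributed and relies only on collision information.

```python
import numpy as np

def auction_match(values, eps=0.01):
    """Classical epsilon-auction (Bertsekas): each unassigned agent bids on its
    most valuable arm; terminates with an assignment within n*eps of optimal."""
    n, m = values.shape
    prices = np.zeros(m)
    owner = -np.ones(m, dtype=int)             # owner[j] = agent currently holding arm j
    assigned = -np.ones(n, dtype=int)          # assigned[i] = arm held by agent i
    while (assigned < 0).any():
        i = int(np.where(assigned < 0)[0][0])  # pick some unassigned agent
        net = values[i] - prices               # net value of each arm for agent i
        j = int(np.argsort(net)[-1])           # best arm
        second = np.sort(net)[-2]              # second-best net value
        prices[j] += net[j] - second + eps     # raise the price by the bid increment
        if owner[j] >= 0:
            assigned[owner[j]] = -1            # previous owner is outbid
        owner[j], assigned[i] = i, j
    return assigned

est_rewards = np.random.rand(5, 8)             # 5 agents, 8 arms (reward estimates)
print(auction_match(est_rewards))
```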
In this paper, we investigate the problem of multi-user linearly decomposable function computation, where $N$ servers help compute functions for $K$ users, and where each such function can be expressed as a linear combination of $L$ basis subfunctions. The process begins with each server computing some of the subfunctions, then broadcasting a linear combination of its computed outputs to a selected group of users, and finally having each user linearly combine its received data to recover its function. As recently shown, this problem can be translated into a matrix decomposition problem $\mathbf{F}=\mathbf{D}\mathbf{E}$, where $\mathbf{F} \in \mathbf{GF}(q)^{K \times L}$ describes the coefficients that define the users' demands, $\mathbf{E} \in \mathbf{GF}(q)^{N \times L}$ describes which subfunctions each server computes and how it combines the computed outputs, and $\mathbf{D} \in \mathbf{GF}(q)^{K \times N}$ describes which servers each user receives data from and how it combines this data. Our interest here is in reducing the total number of subfunction computations across the servers (the cumulative computational cost), as well as the worst-case computation load, which serves as a measure of computational delay. Our contribution consists of novel bounds on these two computing costs, which we link to the covering and packing radii of classical codes. One of our findings is that, in certain cases, our distributed computing problem -- and by extension our matrix decomposition problem -- is solved optimally when $\mathbf{F}$ is decomposed into a parity-check matrix $\mathbf{D}$ of a perfect code and a matrix $\mathbf{E}$ whose columns are the coset leaders of this same code.
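A small GF(2) illustration of the perfect-code decomposition mentioned above, using the binary $[7,4]$ Hamming code: $\mathbf{D}$ is its parity-check matrix and each column of $\mathbf{E}$ is the coset leader whose syndrome equals the corresponding column of $\mathbf{F}$ (the problem sizes here are arbitrary toy values).

```python
import numpy as np

# Parity-check matrix of the binary [7,4] Hamming code: column i is the binary
# representation of i+1, so every nonzero syndrome matches exactly one column.
H = np.array([[(i >> b) & 1 for i in range(1, 8)] for b in range(3)])  # 3 x 7

def coset_leader(syndrome):
    """Minimum-weight e with H e = syndrome (weight <= 1 for a perfect code)."""
    e = np.zeros(7, dtype=int)
    if syndrome.any():
        col = int(np.where((H.T == syndrome).all(axis=1))[0][0])
        e[col] = 1
    return e

rng = np.random.default_rng(0)
F = rng.integers(0, 2, size=(3, 5))                   # K=3 users, L=5 subfunctions
D = H                                                 # K x N, with N=7 servers
E = np.stack([coset_leader(F[:, j]) for j in range(5)], axis=1)   # N x L
assert ((D @ E) % 2 == F).all()                       # F = D E over GF(2)
print(E.sum())                                        # nonzeros in E: at most one per column
```

Because the Hamming code is perfect with covering radius 1, every column of $\mathbf{E}$ has weight at most 1, which is what keeps the cumulative computational cost small in this toy decomposition.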
Due to their strong capabilities in conducting fluent, multi-turn conversations with users, Large Language Models (LLMs) have the potential to further improve the performance of Conversational Recommender Systems (CRS). Unlike the aimless chit-chat that LLMs excel at, a CRS has a clear target, so it is imperative to control the dialogue flow of the LLM in order to successfully recommend appropriate items to users. Furthermore, user feedback in CRS can help the system better model user preferences, which existing studies have largely ignored. However, simply prompting an LLM to conduct conversational recommendation cannot address these two key challenges. In this paper, we propose the Multi-Agent Conversational Recommender System (MACRS), which contains two essential modules. First, we design a multi-agent act planning framework that controls the dialogue flow using four LLM-based agents. This cooperative multi-agent framework generates candidate responses for different dialogue acts and then chooses the most appropriate one as the system response, helping MACRS plan suitable dialogue acts. Second, we propose a user feedback-aware reflection mechanism that leverages user feedback to reason about errors made in previous turns, adjust the dialogue act planning, and extract higher-level user information from implicit semantics. We conduct extensive experiments based on a user simulator to demonstrate the effectiveness of MACRS in recommendation and user preference collection. Experimental results show that MACRS improves the user interaction experience compared to directly using LLMs.
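The hypothetical sketch below only illustrates the general idea of act-conditioned candidate generation followed by selection; the `llm` stub, prompts, agent roles, and dialogue acts are invented for illustration and are not MACRS's actual components.

```python
# Hypothetical sketch of dialogue-act planning with LLM-based agents.
DIALOGUE_ACTS = ["ask_preference", "recommend", "chit_chat"]

def llm(prompt: str) -> str:
    return "recommend"                 # placeholder: replace with a real LLM call

def plan_turn(history: list[str], feedback: str) -> str:
    dialogue = "\n".join(history)
    # one responder agent per dialogue act proposes a candidate response
    candidates = {act: llm(f"Act as a {act} agent.\nDialogue:\n{dialogue}\nRespond:")
                  for act in DIALOGUE_ACTS}
    # a planner agent reflects on user feedback, then selects which act to use
    reflection = llm(f"User feedback: {feedback}\nWhat went wrong last turn?")
    chosen = llm(f"Reflection: {reflection}\nCandidates: {candidates}\n"
                 f"Return the best act among {DIALOGUE_ACTS}.")
    return candidates.get(chosen, candidates["ask_preference"])

print(plan_turn(["User: I like sci-fi movies."], feedback="none"))
```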
The paper develops a novel motion model, called the Generalized Multi-Speed Dubins Motion Model (GMDM), which extends the Dubins model by considering multiple speeds. While the Dubins model produces time-optimal paths under a constant-speed constraint, these paths can be suboptimal if this constraint is relaxed to allow multiple speeds: a constant speed implies a large minimum turning radius, producing paths with longer maneuvers and larger travel times. In contrast, the multi-speed relaxation allows sharp turns at lower speeds, producing more direct paths with shorter maneuvers and smaller travel times. Furthermore, the inability of the Dubins model to reduce speed can result in fast maneuvers near obstacles and hence paths with high collision risk. In this regard, GMDM gives motion planners the ability to jointly optimize time and risk by allowing the speed to change along the path. GMDM is built upon the six Dubins path types while allowing the speed to change across path segments. It is theoretically established that GMDM provides full reachability of the configuration space for any selection of speeds. Furthermore, it is shown that the Dubins model is a special case of GMDM with constant speed. The solutions of GMDM are analytical and suitable for real-time applications. The performance of GMDM in terms of solution quality (i.e., time/time-risk cost) and computation time is comparatively evaluated against existing motion models in obstacle-free as well as obstacle-rich environments via extensive Monte Carlo simulations. The results show that in obstacle-free environments, GMDM produces near time-optimal paths with significantly lower travel times than the Dubins model while having similar computation times. In obstacle-rich environments, GMDM produces time-risk optimized paths with substantially lower collision risks.
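A back-of-the-envelope illustration of why slower turns can shorten maneuvers, assuming the minimum turning radius is set by a lateral-acceleration limit ($r = v^2/a_{\mathrm{lat}}$); the numbers and this speed-radius relation are assumptions for the sketch, not taken from the paper.

```python
import math

A_LAT = 2.0                          # assumed lateral-acceleration limit, m/s^2

def turn_radius(v):
    return v * v / A_LAT             # faster turns require a larger radius

def turn_time(v, angle):
    return turn_radius(v) * angle / v   # arc length divided by speed

# Turning 90 degrees at full speed vs. slowing down for the turn:
for v in (5.0, 2.0):
    print(f"v={v} m/s  radius={turn_radius(v):.1f} m  "
          f"90-deg turn time={turn_time(v, math.pi / 2):.2f} s")
```

Under this assumption, slowing from 5 m/s to 2 m/s shrinks both the turning radius and the turn duration, which is the intuition behind the multi-speed relaxation.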
Few-shot class-incremental learning (FSCIL) aims to recognize novel classes continually with limited novel class samples. A mainstream baseline for FSCIL is to first train the whole model in the base session and then freeze the feature extractor in the incremental sessions. Despite achieving high overall accuracy, most methods exhibit notably low accuracy on incremental classes. Some recent methods partially alleviate the accuracy imbalance between base and incremental classes by fine-tuning the feature extractor in the incremental sessions, but they in turn introduce an accuracy imbalance between past and current incremental classes. In this paper, we study the causes of such classification accuracy imbalance in FSCIL and abstract them into a unified model bias problem. Based on the analyses, we propose a novel method that mitigates model bias during both training and inference, comprising mapping ability stimulation, separately dual-feature classification, and self-optimizing classifiers. Extensive experiments on three widely used FSCIL benchmark datasets show that our method significantly mitigates the model bias problem and achieves state-of-the-art performance.
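For reference, a sketch of the frozen-backbone baseline described above (not the proposed method), where incremental classes are added as class-mean prototypes and classification is by nearest prototype; the normalization and cosine scoring are illustrative choices.

```python
import torch
import torch.nn.functional as F

class PrototypeClassifier:
    """Frozen-backbone FSCIL baseline: each class is a mean feature vector."""
    def __init__(self, backbone):
        self.backbone = backbone.eval()        # frozen after the base session
        self.prototypes, self.labels = [], []

    @torch.no_grad()
    def add_session(self, images, targets):
        feats = F.normalize(self.backbone(images), dim=-1)
        for c in targets.unique():             # one prototype per new class
            self.prototypes.append(feats[targets == c].mean(0))
            self.labels.append(int(c))

    @torch.no_grad()
    def predict(self, images):
        feats = F.normalize(self.backbone(images), dim=-1)
        protos = F.normalize(torch.stack(self.prototypes), dim=-1)
        return torch.tensor(self.labels)[(feats @ protos.T).argmax(-1)]

clf = PrototypeClassifier(torch.nn.Identity())   # stand-in feature extractor
x, y = torch.randn(20, 16), torch.randint(0, 4, (20,))
clf.add_session(x, y)
print(clf.predict(torch.randn(5, 16)))
```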
Knowledge graph embedding, which aims to represent entities and relations as low-dimensional vectors (or matrices, tensors, etc.), has been shown to be a powerful technique for predicting missing links in knowledge graphs. Existing knowledge graph embedding models mainly focus on modeling relation patterns such as symmetry/antisymmetry, inversion, and composition. However, many existing approaches fail to model semantic hierarchies, which are common in real-world applications. To address this challenge, we propose a novel knowledge graph embedding model---namely, Hierarchy-Aware Knowledge Graph Embedding (HAKE)---which maps entities into the polar coordinate system. HAKE is inspired by the fact that concentric circles in the polar coordinate system naturally reflect hierarchy. Specifically, the radial coordinate aims to model entities at different levels of the hierarchy, where entities with smaller radii are expected to be at higher levels; the angular coordinate aims to distinguish entities at the same level of the hierarchy, where these entities are expected to have roughly the same radii but different angles. Experiments demonstrate that HAKE effectively models the semantic hierarchies in knowledge graphs and significantly outperforms existing state-of-the-art methods on benchmark datasets for the link prediction task.
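A polar-coordinate distance function in the spirit of HAKE can be written as follows, where the modulus part $(\cdot)_m$ separates hierarchy levels, the phase part $(\cdot)_p$ separates entities within a level, $\circ$ denotes the element-wise product, and $\lambda$ weights the two parts (the exact form used in the paper may differ):
$$
d_{r}(\mathbf{h},\mathbf{t}) = \lVert \mathbf{h}_m \circ \mathbf{r}_m - \mathbf{t}_m \rVert_2, \qquad
d_{p}(\mathbf{h},\mathbf{t}) = \big\lVert \sin\!\big((\mathbf{h}_p + \mathbf{r}_p - \mathbf{t}_p)/2\big) \big\rVert_1,
$$
$$
f_{r}(\mathbf{h},\mathbf{t}) = -\,d_{r}(\mathbf{h},\mathbf{t}) - \lambda\, d_{p}(\mathbf{h},\mathbf{t}).
$$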
The key issue in few-shot learning is learning to generalize. In this paper, we propose a large margin principle to improve the generalization capacity of metric-based methods for few-shot learning. To realize this principle, we develop a unified framework that learns a more discriminative metric space by augmenting the softmax classification loss with a large margin distance loss during training. Extensive experiments on two state-of-the-art few-shot learning models, graph neural networks and prototypical networks, show that our method substantially improves the performance of existing models with very little computational overhead, demonstrating the effectiveness of the large margin principle and the potential of our method.
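A sketch of one way to combine the two losses; the triplet-style margin term below is an illustrative choice and not necessarily the paper's exact large margin distance loss.

```python
import torch
import torch.nn.functional as F

def large_margin_loss(embeddings, logits, targets, margin=0.5, alpha=0.1):
    """Softmax cross-entropy augmented with a large-margin distance term that
    pushes the farthest same-class pair to be closer than the nearest
    different-class pair by `margin`. Assumes the batch contains at least one
    positive pair and one negative pair."""
    ce = F.cross_entropy(logits, targets)
    dist = torch.cdist(embeddings, embeddings)            # pairwise distances
    same = targets.unsqueeze(0) == targets.unsqueeze(1)
    eye = torch.eye(len(targets), dtype=torch.bool, device=targets.device)
    pos = dist[same & ~eye].max()                         # farthest positive pair
    neg = dist[~same].min()                               # closest negative pair
    return ce + alpha * F.relu(pos - neg + margin)

emb, logits = torch.randn(8, 64), torch.randn(8, 5)
targets = torch.tensor([0, 0, 1, 1, 2, 2, 3, 4])
print(large_margin_loss(emb, logits, targets))
```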
In this paper, we propose a conceptually simple and geometrically interpretable objective function, i.e., the additive margin Softmax (AM-Softmax), for deep face verification. In general, the face verification task can be viewed as a metric learning problem, so learning large-margin face features with small intra-class variation and large inter-class difference is of great importance for achieving good performance. Recently, Large-margin Softmax and Angular Softmax have been proposed to incorporate the angular margin in a multiplicative manner. In this work, we introduce a novel additive angular margin for the Softmax loss, which is intuitively appealing and more interpretable than existing works. We also emphasize and discuss the importance of feature normalization. Most importantly, our experiments on LFW BLUFR and MegaFace show that our additive margin Softmax loss consistently outperforms the current state-of-the-art methods using the same network architecture and training dataset. Our code is available at //github.com/happynear/AMSoftmax
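With normalized features and weight vectors, scale $s$, and margin $m$, the additive margin Softmax loss is commonly written as
$$
\mathcal{L}_{\mathrm{AMS}} = -\frac{1}{n}\sum_{i=1}^{n}\log\frac{e^{\,s\,(\cos\theta_{y_i}-m)}}{e^{\,s\,(\cos\theta_{y_i}-m)}+\sum_{j\neq y_i} e^{\,s\,\cos\theta_{j}}},
$$
where $\cos\theta_{j} = \mathbf{W}_j^{\top}\mathbf{f}_i / \big(\lVert\mathbf{W}_j\rVert\,\lVert\mathbf{f}_i\rVert\big)$; the margin $m$ is subtracted directly from the target-class cosine rather than applied multiplicatively to the angle.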