This paper investigates the signal detection problem in colored noise with an unknown covariance matrix. In particular, we focus on detecting an unknown non-random signal by capitalizing on the leading eigenvalue of the whitened sample covariance matrix as the test statistic (a.k.a. Roy's largest root test). Since the unknown signal is non-random, the whitened sample covariance matrix turns out to have a non-central $F$-distribution. This distribution assumes a singular or non-singular form depending on whether the number of observations $p\lessgtr$ the system dimensionality $m$. Therefore, we statistically characterize the leading eigenvalue of the singular and non-singular $F$-matrices by deriving their cumulative distribution functions (c.d.f.). Subsequently, they have been utilized in deriving the corresponding receiver operating characteristic (ROC) profiles. We also extend our analysis into the high dimensional domain. It turns out that, when the signal is sufficiently strong, the maximum eigenvalue can reliably detect it in this regime. Nevertheless, weak signals cannot be detected in the high dimensional regime with the leading eigenvalue.
In this paper, we explore the descriptive complexity theory of finite groups by examining the power of the second Ehrenfeucht-Fra\"iss\'e bijective pebble game in Hella's (Ann. Pure Appl. Log., 1989) heirarchy. This is a Spoiler-Duplicator game in which Spoiler can place up to two pebbles each round. While it trivially solves graph isomorphism, it may be nontrivial for finite groups, and other ternary relational structures. We first provide a novel generalization of Weisfeiler-Leman (WL) coloring, which we call 2-ary WL. We then show that the 2-ary WL is equivalent to the second Ehrenfeucht-Fra\"iss\'e bijective pebble game in Hella's heirarchy. Our main result is that, in the pebble game characterization, only $O(1)$ pebbles and $O(1)$ rounds are sufficient to identify all groups without Abelian normal subgroups (a class of groups for which isomorphism testing is known to be in $\mathsf{P}$; Babai, Codenotti, & Qiao, ICALP 2012). In particular, we show that within the first few rounds, Spoiler can force Duplicator to select an isomorphism between two such groups at each subsequent round. By Hella's results (\emph{ibid.}), this is equivalent to saying that these groups are identified by formulas in first-order logic with generalized 2-ary quantifiers, using only $O(1)$ variables and $O(1)$ quantifier depth.
Mitigating the climate crisis requires a rapid transition towards lower-carbon energy. Catalyst materials play a crucial role in the electrochemical reactions involved in numerous industrial processes key to this transition, such as renewable energy storage and electrofuel synthesis. To reduce the energy spent on such activities, we must quickly discover more efficient catalysts to drive electrochemical reactions. Machine learning (ML) holds the potential to efficiently model materials properties from large amounts of data, accelerating electrocatalyst design. The Open Catalyst Project OC20 dataset was constructed to that end. However, ML models trained on OC20 are still neither scalable nor accurate enough for practical applications. In this paper, we propose task-specific innovations applicable to most architectures, enhancing both computational efficiency and accuracy. This includes improvements in (1) the graph creation step, (2) atom representations, (3) the energy prediction head, and (4) the force prediction head. We describe these contributions, referred to as PhAST, and evaluate them thoroughly on multiple architectures. Overall, PhAST improves energy MAE by 4 to 42$\%$ while dividing compute time by 3 to 8$\times$ depending on the targeted task/model. PhAST also enables CPU training, leading to 40$\times$ speedups in highly parallelized settings. Python package: \url{//phast.readthedocs.io}.
In this paper, we tackle the problem of Egocentric Human-Object Interaction (EHOI) detection in an industrial setting. To overcome the lack of public datasets in this context, we propose a pipeline and a tool for generating synthetic images of EHOIs paired with several annotations and data signals (e.g., depth maps or segmentation masks). Using the proposed pipeline, we present EgoISM-HOI a new multimodal dataset composed of synthetic EHOI images in an industrial environment with rich annotations of hands and objects. To demonstrate the utility and effectiveness of synthetic EHOI data produced by the proposed tool, we designed a new method that predicts and combines different multimodal signals to detect EHOIs in RGB images. Our study shows that exploiting synthetic data to pre-train the proposed method significantly improves performance when tested on real-world data. Moreover, to fully understand the usefulness of our method, we conducted an in-depth analysis in which we compared and highlighted the superiority of the proposed approach over different state-of-the-art class-agnostic methods. To support research in this field, we publicly release the datasets, source code, and pre-trained models at //iplab.dmi.unict.it/egoism-hoi.
For the ground state of the Gross-Pitaevskii (GP) eigenvalue problem, we consider a fully discretized Sobolev gradient flow, which can be regarded as the Riemannian gradient descent on the sphere under a metric induced by a modified $H^1$-norm. We prove its global convergence to a critical point of the discrete GP energy and its local exponential convergence to the ground state of the discrete GP energy. The local exponential convergence rate depends on the eigengap of the discrete GP energy. When the discretization is the classical second-order finite difference in two dimensions, such an eigengap can be further proven to be mesh independent, i.e., it has a uniform positive lower bound, thus the local exponential convergence rate is mesh independent. Numerical experiments with discretization by high order $Q^k$ spectral element methods in two and three dimensions are provided to validate the efficiency of the proposed method.
In this paper, we propose a non-parallel any-to-many voice conversion (VC) method termed VoiceGrad. Inspired by WaveGrad, a recently introduced novel waveform generation method, VoiceGrad is based upon the concepts of score matching and Langevin dynamics. It uses weighted denoising score matching to train a score approximator, a fully convolutional network with a U-Net structure designed to predict the gradient of the log density of the speech feature sequences of multiple speakers, and performs VC by using annealed Langevin dynamics to iteratively update an input feature sequence towards the nearest stationary point of the target distribution based on the trained score approximator network. Thanks to the nature of this concept, VoiceGrad enables any-to-many VC, a VC scenario in which the speaker of input speech can be arbitrary, and allows for non-parallel training, which requires no parallel utterances or transcriptions.
In this paper, we introduce the Fongbe to French Speech Translation Corpus (FFSTC) for the first time. This corpus encompasses approximately 31 hours of collected Fongbe language content, featuring both French transcriptions and corresponding Fongbe voice recordings. FFSTC represents a comprehensive dataset compiled through various collection methods and the efforts of dedicated individuals. Furthermore, we conduct baseline experiments using Fairseq's transformer_s and conformer models to evaluate data quality and validity. Our results indicate a score of 8.96 for the transformer_s model and 8.14 for the conformer model, establishing a baseline for the FFSTC corpus.
In this paper, we apply the Paired-Explicit Runge-Kutta (P-ERK) schemes by Vermeire et. al. (2019, 2022) to dynamically partitioned systems arising from adaptive mesh refinement. The P-ERK schemes enable multirate time-integration with no changes in the spatial discretization methodology, making them readily implementable in existing codes that employ a method-of-lines approach. We show that speedup compared to a range of state of the art Runge-Kutta methods can be realized, despite additional overhead due to the dynamic re-assignment of flagging variables and restricting nonlinear stability properties. The effectiveness of the approach is demonstrated for a range of simulation setups for viscous and inviscid convection-dominated compressible flows for which we provide a reproducibility repository. In addition, we perform a thorough investigation of the nonlinear stability properties of the Paired-Explicit Runge-Kutta schemes regarding limitations due to the violation of monotonicity properties of the underlying spatial discretization. Furthermore, we present a novel approach for estimating the relevant eigenvalues of large Jacobians required for the optimization of stability polynomials.
In this paper, we study the cooperative Multi-Agent Reinforcement Learning (MARL) problems using Reward Machines (RMs) to specify the reward functions such that the prior knowledge of high-level events in a task can be leveraged to facilitate the learning efficiency. Unlike the existing work that RMs have been incorporated into MARL for task decomposition and policy learning in relatively simple domains or with an assumption of independencies among the agents, we present Multi-Agent Reinforcement Learning with a Hierarchy of RMs (MAHRM) that is capable of dealing with more complex scenarios when the events among agents can occur concurrently and the agents are highly interdependent. MAHRM exploits the relationship of high-level events to decompose a task into a hierarchy of simpler subtasks that are assigned to a small group of agents, so as to reduce the overall computational complexity. Experimental results in three cooperative MARL domains show that MAHRM outperforms other MARL methods using the same prior knowledge of high-level events.
Traffic prediction is one of the most significant foundations in Intelligent Transportation Systems (ITS). Traditional traffic prediction methods rely only on historical traffic data to predict traffic trends and face two main challenges. 1) insensitivity to unusual events. 2) limited performance in long-term prediction. In this work, we explore how generative models combined with text describing the traffic system can be applied for traffic generation, and name the task Text-to-Traffic Generation (TTG). The key challenge of the TTG task is how to associate text with the spatial structure of the road network and traffic data for generating traffic situations. To this end, we propose ChatTraffic, the first diffusion model for text-to-traffic generation. To guarantee the consistency between synthetic and real data, we augment a diffusion model with the Graph Convolutional Network (GCN) to extract spatial correlations of traffic data. In addition, we construct a large dataset containing text-traffic pairs for the TTG task. We benchmarked our model qualitatively and quantitatively on the released dataset. The experimental results indicate that ChatTraffic can generate realistic traffic situations from the text. Our code and dataset are available at //github.com/ChyaZhang/ChatTraffic.
Object detectors usually achieve promising results with the supervision of complete instance annotations. However, their performance is far from satisfactory with sparse instance annotations. Most existing methods for sparsely annotated object detection either re-weight the loss of hard negative samples or convert the unlabeled instances into ignored regions to reduce the interference of false negatives. We argue that these strategies are insufficient since they can at most alleviate the negative effect caused by missing annotations. In this paper, we propose a simple but effective mechanism, called Co-mining, for sparsely annotated object detection. In our Co-mining, two branches of a Siamese network predict the pseudo-label sets for each other. To enhance multi-view learning and better mine unlabeled instances, the original image and corresponding augmented image are used as the inputs of two branches of the Siamese network, respectively. Co-mining can serve as a general training mechanism applied to most of modern object detectors. Experiments are performed on MS COCO dataset with three different sparsely annotated settings using two typical frameworks: anchor-based detector RetinaNet and anchor-free detector FCOS. Experimental results show that our Co-mining with RetinaNet achieves 1.4%~2.1% improvements compared with different baselines and surpasses existing methods under the same sparsely annotated setting.