The matrixdist R package provides a comprehensive suite of tools for the statistical analysis of matrix distributions, including phase-type, inhomogeneous phase-type, discrete phase-type, and related multivariate distributions. This paper introduces the package and its key features, including the estimation of these distributions and their extensions via expectation-maximisation (EM) algorithms, as well as the implementation of regression through the proportional intensities and mixture-of-experts models. Additionally, the paper provides an overview of the theoretical background, discusses the algorithms and methods implemented in the package, and offers practical examples that illustrate the application of matrixdist to real-world scenarios. The package aims to provide researchers and practitioners with a broad set of tools for analysing and modelling complex data using matrix distributions.
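matrixdist itself is written in R, but the core object it fits is easy to illustrate in general-purpose code. The sketch below evaluates the density of a continuous phase-type distribution, $f(x) = \pi e^{Sx} s$ with exit-rate vector $s = -S\mathbf{1}$, in Python; the initial distribution $\pi$ and sub-intensity matrix $S$ are hypothetical example parameters, not output of the package.

```python
import numpy as np
from scipy.linalg import expm

def phase_type_density(x, pi, S):
    """Density of a continuous phase-type distribution:
    f(x) = pi @ expm(S * x) @ s, with exit rates s = -S @ 1."""
    s = -S @ np.ones(S.shape[0])  # exit-rate vector
    return float(pi @ expm(S * x) @ s)

# Hypothetical 2-phase example: initial probabilities and sub-intensity matrix.
pi = np.array([0.6, 0.4])
S = np.array([[-2.0, 1.0],
              [0.0, -3.0]])
print(phase_type_density(1.0, pi, S))
```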
The rapidly maturing cryptocurrency sector presents an array of challenges and opportunities for both businesses and consumers. This study explores the knowledge, expertise, and purchasing behaviors of individuals who shop using cryptocurrencies, in order to provide a comprehensive understanding of this distinctive consumer cohort. Analyzing data from our survey of 516 participants, we find a wide range of knowledge levels, from novices to experts, with a significant segment exhibiting high purchase frequency despite limited expertise. Regression analyses reveal that, although knowledge significantly influences purchase behavior, its explanatory power remains limited. Additionally, a K-means cluster analysis uncovers three distinct crypto-shopper profiles, each with unique levels of knowledge and expertise. These insights challenge conventional wisdom regarding the link between domain knowledge and adoption, suggesting that the appeal of cryptocurrencies extends beyond technical knowledge. The findings are instrumental for businesses aspiring to address the diverse needs of the crypto-shopper demographic, underscoring the importance of personalized strategies and user experiences. This work also lays the groundwork for future research on the broader implications of crypto acceptance and its intersection with consumer behavior.
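As a minimal sketch of the clustering step described above (assuming standardized survey features; the data matrix below is a random placeholder, not the survey data), a three-cluster K-means in Python could look as follows.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# Placeholder survey matrix: one row per respondent, columns standing in for
# self-reported knowledge, expertise, and purchase-frequency scores.
rng = np.random.default_rng(0)
X = rng.normal(size=(516, 3))

X_std = StandardScaler().fit_transform(X)  # put features on a common scale
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_std)
print(np.bincount(labels))  # respondents per crypto-shopper profile
```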
Multiway data analysis aims to uncover patterns in data structured as multi-indexed arrays, and the covariance of such data plays a crucial role in various machine learning applications. However, the intrinsically high dimension of multiway covariance presents significant challenges. To address these challenges, factorized covariance models have been proposed that rely on a separability assumption: the multiway covariance can be accurately expressed as a sum of Kronecker products of mode-wise covariances. This paper is concerned with the accuracy of such separable models for representing multiway covariances. We reduce the question of whether a given covariance can be represented as a separable multiway covariance to an equivalent question about separability of quantum states. Based on this equivalence, we establish that generic multiway covariances tend not to be separable. Moreover, we show that determining the best separable approximation of a generic covariance is NP-hard. Our results suggest that factorized covariance models might not accurately approximate covariances without additional assumptions ensuring separability. To balance these negative results, we propose an iterative Frank-Wolfe algorithm for computing Kronecker-separable covariance approximations using some additional side information. We establish an oracle complexity bound and empirically observe consistent convergence to a separable limit point, often close to the ``best'' separable approximation. These results suggest that practical methods may be able to find a Kronecker-separable approximation of covariances, despite the worst-case NP-hardness results.
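The paper's Frank-Wolfe algorithm is not reproduced here, but the underlying approximation problem can be illustrated with the classical Van Loan-Pitsianis construction: after rearranging the covariance, the best single-term Kronecker approximation in Frobenius norm (without the separability constraints studied above) reduces to a rank-1 SVD.

```python
import numpy as np

def nearest_kronecker(A, m, p):
    """Best Frobenius-norm approximation A ~ kron(B, C) with B m-by-m and
    C p-by-p, via the Van Loan-Pitsianis rearrangement and a rank-1 SVD."""
    # Each row of R is one p-by-p block of A, flattened.
    R = np.stack([A[i*p:(i+1)*p, j*p:(j+1)*p].ravel()
                  for i in range(m) for j in range(m)])
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    B = np.sqrt(s[0]) * U[:, 0].reshape(m, m)
    C = np.sqrt(s[0]) * Vt[0].reshape(p, p)
    return B, C

# Sanity check on an exactly separable covariance.
rng = np.random.default_rng(1)
B0 = rng.normal(size=(3, 3)); B0 = B0 @ B0.T
C0 = rng.normal(size=(4, 4)); C0 = C0 @ C0.T
A = np.kron(B0, C0)
B, C = nearest_kronecker(A, 3, 4)
print(np.linalg.norm(A - np.kron(B, C)))  # ~0 up to rounding
```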
We propose novel evaluations of the mathematical reasoning capabilities of Large Language Models (LLMs) based on mathematical misconceptions. Our primary approach is to simulate LLMs as a novice learner and as an expert tutor, aiming, respectively, to identify the incorrect answer to a math question that results from a specific misconception and to recognize the misconception(s) behind an incorrect answer. In contrast to traditional LLM-based mathematical evaluations that focus on answering math questions correctly, our approach takes inspiration from principles in the educational learning sciences. We explicitly ask LLMs to mimic a novice learner by answering questions in a specific incorrect manner based on incomplete knowledge, and to mimic an expert tutor by identifying the misconception(s) corresponding to an incorrect answer to a question. Using simple grade-school math problems, our experiments reveal that, while LLMs can easily answer these questions correctly, they struggle to identify 1) the incorrect answer corresponding to specific incomplete knowledge (misconceptions), and 2) the misconceptions that explain particular incorrect answers. Our study points to new opportunities for enhancing LLMs' math reasoning capabilities, especially for developing robust student-simulation and expert-tutoring models in educational applications such as intelligent tutoring systems.
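A minimal sketch of the two simulated roles as prompt templates (the wording and placeholders are illustrative; these are not the paper's exact prompts):

```python
# Hypothetical prompt templates for the two evaluation roles.
NOVICE_PROMPT = (
    "You are a novice student who holds the misconception: {misconception}.\n"
    "Answer the following question the way such a student would, showing\n"
    "the incorrect reasoning.\nQuestion: {question}"
)

TUTOR_PROMPT = (
    "You are an expert math tutor. A student answered the question below\n"
    "incorrectly. Identify the misconception(s) that best explain the error.\n"
    "Question: {question}\nStudent's incorrect answer: {wrong_answer}"
)

print(NOVICE_PROMPT.format(
    misconception="adding fractions by adding numerators and denominators",
    question="What is 1/2 + 1/3?"))
```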
We propose efficient algorithms for enumerating the well-known combinatorial structures of maximal planar graphs, called canonical orderings and Schnyder woods, and the related classical graph drawings by de Fraysseix, Pach, and Pollack [Combinatorica, 1990] and by Schnyder [SODA, 1990], called canonical drawings and Schnyder drawings, respectively. To this end, (i) we devise an algorithm for enumerating special $e$-bipolar orientations of maximal planar graphs, called canonical orientations; (ii) we establish bijections between canonical orientations and canonical drawings, and between canonical orientations and Schnyder drawings; and (iii) we exploit the known correspondence between canonical orientations and canonical orderings, and the known bijection between canonical orientations and Schnyder woods. All our enumeration algorithms have $O(n)$ setup time, space usage, and delay between any two consecutively listed outputs, for an $n$-vertex maximal planar graph.
Solving a linear system $Ax=b$ is a fundamental scientific computing primitive for which numerous solvers and preconditioners have been developed. These come with parameters whose optimal values depend on the system being solved and are often impossible or too expensive to identify; thus in practice sub-optimal heuristics are used. We consider the common setting in which many related linear systems need to be solved, e.g. during a single numerical simulation. In this scenario, can we sequentially choose parameters that attain a near-optimal overall number of iterations, without extra matrix computations? We answer in the affirmative for Successive Over-Relaxation (SOR), a standard solver whose parameter $\omega$ has a strong impact on its runtime. For this method, we prove that a bandit online learning algorithm -- using only the number of iterations as feedback -- can select parameters for a sequence of instances such that the overall cost approaches that of the best fixed $\omega$ as the sequence length increases. Furthermore, when given additional structural information, we show that a contextual bandit method asymptotically achieves the performance of the instance-optimal policy, which selects the best $\omega$ for each instance. Our work provides the first learning-theoretic treatment of high-precision linear system solvers and the first end-to-end guarantees for data-driven scientific computing, demonstrating theoretically the potential to speed up numerical methods using well-understood learning algorithms.
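As a toy illustration of this setup (not the paper's algorithm, and with none of its guarantees), the sketch below runs an epsilon-greedy bandit over a grid of $\omega$ values for a stream of random symmetric positive definite systems, using only the SOR iteration count as feedback.

```python
import numpy as np

def sor_iterations(A, b, omega, tol=1e-8, max_iter=5000):
    """Run SOR on Ax = b; return the number of sweeps needed to reach tol."""
    n = len(b)
    x = np.zeros(n)
    for k in range(1, max_iter + 1):
        for i in range(n):
            sigma = A[i, :i] @ x[:i] + A[i, i+1:] @ x[i+1:]
            x[i] = (1 - omega) * x[i] + omega * (b[i] - sigma) / A[i, i]
        if np.linalg.norm(A @ x - b) <= tol * np.linalg.norm(b):
            return k
    return max_iter

omegas = np.linspace(1.0, 1.9, 10)  # the bandit's "arms"
counts, means = np.zeros(len(omegas)), np.zeros(len(omegas))
rng = np.random.default_rng(0)
for t in range(100):                    # a sequence of related systems
    M = rng.normal(size=(30, 30))
    A = M @ M.T + 30 * np.eye(30)       # SPD, so SOR converges for 0 < omega < 2
    b = rng.normal(size=30)
    if counts.min() == 0:
        arm = int(np.argmin(counts))    # try every omega once
    elif rng.random() < 0.1:
        arm = int(rng.integers(len(omegas)))  # explore
    else:
        arm = int(np.argmin(means))     # exploit the cheapest arm so far
    cost = sor_iterations(A, b, omegas[arm])
    counts[arm] += 1
    means[arm] += (cost - means[arm]) / counts[arm]  # running mean cost
print("selected omega:", omegas[int(np.argmin(means))])
```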
In this paper, the limitations of the YOLOv5s model on small-target detection tasks are studied in depth and addressed. The model's performance is enhanced by introducing a GhostNet-based convolutional module, a RepGFPN-based optimization of the Neck, CA and Transformer attention mechanisms, and an improved loss function based on the Normalized Wasserstein Distance (NWD). The experimental results validate the positive impact of these improvement strategies on model precision, recall, and mAP. In particular, the improved model shows significant superiority in dealing with complex backgrounds and tiny targets in real-world application tests. This study provides an effective optimization strategy for the YOLOv5s model on small-target detection and lays a solid foundation for future related research and applications.
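For concreteness, here is a PyTorch sketch of a GhostNet-style convolution block of the kind mentioned above (layer sizes, ratio, and activation are illustrative, not the paper's exact configuration): a few ordinary filters produce "intrinsic" feature maps, and cheap depthwise convolutions generate the remaining "ghost" maps.

```python
import torch
import torch.nn as nn

class GhostModule(nn.Module):
    """GhostNet-style convolution: ordinary filters generate intrinsic
    features; cheap depthwise filters generate the remaining channels."""
    def __init__(self, c_in, c_out, ratio=2, kernel=1, dw_kernel=3):
        super().__init__()
        c_prim = c_out // ratio  # intrinsic channels
        self.primary = nn.Sequential(
            nn.Conv2d(c_in, c_prim, kernel, padding=kernel // 2, bias=False),
            nn.BatchNorm2d(c_prim), nn.SiLU())
        self.cheap = nn.Sequential(  # depthwise "ghost" features
            nn.Conv2d(c_prim, c_out - c_prim, dw_kernel,
                      padding=dw_kernel // 2, groups=c_prim, bias=False),
            nn.BatchNorm2d(c_out - c_prim), nn.SiLU())

    def forward(self, x):
        y = self.primary(x)
        return torch.cat([y, self.cheap(y)], dim=1)

x = torch.randn(1, 64, 32, 32)
print(GhostModule(64, 128)(x).shape)  # torch.Size([1, 128, 32, 32])
```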
The discrete logarithm problem is a fundamental challenge in number theory with significant implications for cryptographic protocols. In this paper, we investigate the limitations of gradient-based methods for learning the parity bit of the discrete logarithm in finite cyclic groups of prime order. Our main result, supported by theoretical analysis and empirical verification, reveals that the gradient of the loss function concentrates around a fixed point, independent of the base of the logarithm. This concentration property severely limits the ability to learn the parity bit efficiently with gradient-based methods, irrespective of the complexity of the network architecture being trained. Our proof relies on the Boas-Bellman inequality in inner product spaces and involves establishing the approximate orthogonality of the discrete logarithm's parity-bit functions through the spectral norm of certain matrices. Empirical experiments using a neural network-based approach further verify the limitations of gradient-based learning, demonstrating a decreasing success rate in predicting the parity bit as the group order increases.
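As a small sanity check of the learning target (not the paper's experiment): when $g$ is a primitive root of a prime $p$, the exponent $k$ in $x = g^k \bmod p$ is even exactly when $x$ is a quadratic residue, so training labels can be computed directly via Euler's criterion.

```python
def parity_bit(x, p):
    """0 if the discrete log of x (to a primitive-root base) is even,
    1 if odd, by Euler's criterion: x is a QR iff x^((p-1)/2) = 1 mod p."""
    return 0 if pow(x, (p - 1) // 2, p) == 1 else 1

p, g = 23, 5  # a small prime and one of its primitive roots
for k in range(10):
    x = pow(g, k, p)
    assert parity_bit(x, p) == k % 2
print("labels match the parity of k")
```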
Face recognition models embed a face image into a low-dimensional identity vector containing abstract encodings of identity-specific facial features that allow individuals to be distinguished from one another. We tackle the challenging task of inverting the latent space of pre-trained face recognition models without full model access (i.e., the black-box setting). A variety of methods have been proposed in the literature for this task, but they have serious shortcomings such as a lack of realistic outputs and strong requirements on the data set and on access to the face recognition model. By analyzing the black-box inversion problem, we show that the conditional diffusion model loss naturally emerges and that we can effectively sample from the inverse distribution even without an identity-specific loss. Our method, named the identity denoising diffusion probabilistic model (ID3PM), leverages the stochastic nature of the denoising diffusion process to produce high-quality, identity-preserving face images with various backgrounds, lighting, poses, and expressions. We demonstrate state-of-the-art performance in terms of identity preservation and diversity, both qualitatively and quantitatively, and our method is the first black-box face recognition model inversion method that offers intuitive control over the generation process.
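A minimal sketch of the conditional denoising objective this rests on, assuming the standard DDPM forward-noising parameterization (the stand-in denoiser and all sizes are illustrative; the actual ID3PM architecture is not reproduced): the noise predictor sees the noisy image, the timestep, and the identity vector produced by the black-box face recognition model.

```python
import torch
import torch.nn as nn

def diffusion_loss(eps_model, x0, id_vec, alphas_bar):
    """Conditional DDPM loss: predict the injected noise given the noisy
    image x_t, the timestep t, and the conditioning identity vector."""
    t = torch.randint(0, len(alphas_bar), (x0.shape[0],))
    a = alphas_bar[t].view(-1, 1, 1, 1)
    eps = torch.randn_like(x0)
    x_t = a.sqrt() * x0 + (1 - a).sqrt() * eps  # forward noising
    return nn.functional.mse_loss(eps_model(x_t, t, id_vec), eps)

# Tiny stand-in denoiser, just to exercise the loss.
class DummyEps(nn.Module):
    def __init__(self, d_id=512):
        super().__init__()
        self.proj = nn.Linear(d_id, 3)
    def forward(self, x_t, t, id_vec):
        return x_t + self.proj(id_vec)[:, :, None, None]

alphas_bar = torch.cumprod(1 - torch.linspace(1e-4, 0.02, 1000), dim=0)
loss = diffusion_loss(DummyEps(), torch.randn(2, 3, 32, 32),
                      torch.randn(2, 512), alphas_bar)
print(loss.item())
```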
The accurate representation and prediction of physical phenomena through numerical computer codes remains a vast and intricate interdisciplinary topic of research. Especially within the last decades, there has been a considerable push from the applied mathematics and numerics community toward high-performance numerical schemes for solving partial differential equations (PDEs). The resulting landscape of choices among numerical schemes for a given system of PDEs can thus easily appear daunting to an application expert who is familiar with the relevant physics but not necessarily with the numerics. Bespoke high-performance schemes in particular pose a substantial hurdle for domain scientists in terms of their theory and implementation. Here, we propose a unifying scheme for grid-based approximation methods to address this issue. We introduce well-defined restrictions that systematically guide an application expert through the process of classifying a given multiphysics problem, identifying suitable numerical schemes, and implementing them. We introduce a fixed set of input parameters, among them, for example, the governing equations and the hardware configuration. This method not only helps to identify and assemble suitable schemes but also enables the unique combination of multiple methods on a per-field basis. We demonstrate this process and its effectiveness using different approaches, and systematically show how given properties of a PDE problem should be exploited to arrive at an efficient compound discretisation.
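One hypothetical way to encode such a fixed input-parameter set (the field names below are illustrative, not the paper's interface) is a plain configuration record that also carries the per-field scheme choices:

```python
from dataclasses import dataclass

@dataclass
class ProblemSpec:
    """Hypothetical input-parameter set for classifying a multiphysics
    problem and assembling a compound discretisation."""
    governing_equations: list[str]   # e.g. ["advection-diffusion"]
    fields: dict[str, str]           # field name -> chosen scheme
    dimensionality: int = 3
    hardware: str = "cpu"            # e.g. "cpu", "gpu", "cluster"

spec = ProblemSpec(
    governing_equations=["advection-diffusion"],
    fields={"temperature": "finite-volume", "velocity": "finite-difference"},
    hardware="gpu")
print(spec)
```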
We introduce a multi-task setup for identifying and classifying entities, relations, and coreference clusters in scientific articles. We create SciERC, a dataset that includes annotations for all three tasks, and develop a unified framework called Scientific Information Extractor (SciIE) with shared span representations. The multi-task setup reduces cascading errors between tasks and leverages cross-sentence relations through coreference links. Experiments show that our multi-task model outperforms previous models in scientific information extraction without using any domain-specific features. We further show that the framework supports construction of a scientific knowledge graph, which we use to analyze information in the scientific literature.
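A minimal PyTorch sketch of a shared span representation of the kind this builds on (endpoint hidden states concatenated with a span-width embedding; all sizes illustrative): the same span vectors would feed the entity, relation, and coreference heads.

```python
import torch
import torch.nn as nn

class SpanEncoder(nn.Module):
    """Encode each span once from its endpoint states plus a width
    embedding; all three task heads share these representations."""
    def __init__(self, max_width=8, width_dim=20):
        super().__init__()
        self.width_emb = nn.Embedding(max_width, width_dim)

    def forward(self, token_states, spans):
        # token_states: (seq_len, hidden); spans: (start, end), inclusive
        reps = [torch.cat([token_states[s], token_states[e],
                           self.width_emb(torch.tensor(e - s))])
                for s, e in spans]
        return torch.stack(reps)  # (num_spans, 2 * hidden + width_dim)

states = torch.randn(12, 128)
print(SpanEncoder()(states, [(0, 2), (4, 4), (5, 9)]).shape)  # (3, 276)
```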