We provide a universal construction of the category of finite-dimensional C*-algebras and completely positive trace-nonincreasing maps from the rig category of finite-dimensional Hilbert spaces and unitaries. This construction, which can be applied to any dagger rig category, is described in three steps, each associated with their own universal property, and draws on results from dilation theory in finite dimension. In this way, we explicitly construct the category that captures hybrid quantum/classical computation with possible nontermination from the category of its reversible foundations. We discuss how this construction can be used in the design and semantics of quantum programming languages.
In this paper, we propose an orthogonal block wise Kaczmarz (POBK) algorithm based on preprocessing techniques to solve large-scale sparse linear systems $Ax=f$. Firstly, the Reverse Cuthill McKee Algorithm (RCM) algorithm is used to preprocess the linear system, and then a new partitioning strategy is proposed to divide orthogonal blocks into one category, in order to accelerate the convergence rate of the Kaczmarz algorithm. The convergence of the POBK algorithm has been theoretically proven, and a theoretical analysis of its faster convergence is also provided. In addition, the experimental results confirm that this algorithm is far superior to GRBK, RBK(k), and GREBK(k) algorithms in both iteration steps (IT) and CPU time aspects.
We develop a distributed Block Chebyshev-Davidson algorithm to solve large-scale leading eigenvalue problems for spectral analysis in spectral clustering. First, the efficiency of the Chebyshev-Davidson algorithm relies on the prior knowledge of the eigenvalue spectrum, which could be expensive to estimate. This issue can be lessened by the analytic spectrum estimation of the Laplacian or normalized Laplacian matrices in spectral clustering, making the proposed algorithm very efficient for spectral clustering. Second, to make the proposed algorithm capable of analyzing big data, a distributed and parallel version has been developed with attractive scalability. The speedup by parallel computing is approximately equivalent to $\sqrt{p}$, where $p$ denotes the number of processes. {Numerical results will be provided to demonstrate its efficiency in spectral clustering and scalability advantage over existing eigensolvers used for spectral clustering in parallel computing environments.}
Communication protocols form the bedrock of our interconnected world, yet vulnerabilities within their implementations pose significant security threats. Recent developments have seen a surge in fuzzing-based research dedicated to uncovering these vulnerabilities within protocol implementations. However, there still lacks a systematic overview of protocol fuzzing for answering the essential questions such as what the unique challenges are, how existing works solve them, etc. To bridge this gap, we conducted a comprehensive investigation of related works from both academia and industry. Our study includes a detailed summary of the specific challenges in protocol fuzzing, and provides a systematic categorization and overview of existing research efforts. Furthermore, we explore and discuss potential future research directions in protocol fuzzing. This survey serves as a foundational guideline for researchers and practitioners in the field.
We introduce a new class of hybrid preconditioners for solving parametric linear systems of equations. The proposed preconditioners are constructed by hybridizing the deep operator network, namely DeepONet, with standard iterative methods. Exploiting the spectral bias, DeepONet-based components are harnessed to address low-frequency error components, while conventional iterative methods are employed to mitigate high-frequency error components. Our preconditioning framework comprises two distinct hybridization approaches: direct preconditioning (DP) and trunk basis (TB) approaches. In the DP approach, DeepONet is used to approximate an action of an inverse operator to a vector during each preconditioning step. In contrast, the TB approach extracts basis functions from the trained DeepONet to construct a map to a smaller subspace, in which the low-frequency component of the error can be effectively eliminated. Our numerical results demonstrate that utilizing the TB approach enhances the convergence of Krylov methods by a large margin compared to standard non-hybrid preconditioning strategies. Moreover, the proposed hybrid preconditioners exhibit robustness across a wide range of model parameters and problem resolutions.
The proliferation of low-quality online information in today's era has underscored the need for robust and automatic mechanisms to evaluate the trustworthiness of online news publishers. In this paper, we analyse the trustworthiness of online news media outlets by leveraging a dataset of 4033 news stories from 40 different sources. We aim to infer the trustworthiness level of the source based on the classification of individual articles' content. The trust labels are obtained from NewsGuard, a journalistic organization that evaluates news sources using well-established editorial and publishing criteria. The results indicate that the classification model is highly effective in classifying the trustworthiness levels of the news articles. This research has practical applications in alerting readers to potentially untrustworthy news sources, assisting journalistic organizations in evaluating new or unfamiliar media outlets and supporting the selection of articles for their trustworthiness assessment.
We extend Ziv and Lempel's model of finite-state encoders to the realm of lossy compression of individual sequences. In particular, the model of the encoder includes a finite-state reconstruction codebook followed by an information lossless finite-state encoder that compresses the reconstruction codeword with no additional distortion. We first derive two different lower bounds to the compression ratio that depend on the number of states of the lossless encoder. Both bounds are asymptotically achievable by conceptually simple coding schemes. We then show that when the number of states of the lossless encoder is large enough in terms of the reconstruction block-length, the performance can be improved, sometimes significantly so. In particular, the improved performance is achievable using a random-coding ensemble that is universal, not only in terms of the source sequence, but also in terms of the distortion measure.
The availability of the Global Positioning System (GPS) trajectory data is increasing along with the availability of different GPS receivers and with the increasing use of various mobility services. GPS trajectory is an important data source which is used in traffic density detection, transport mode detection, mapping data inferences with the use of different methods such as image processing and machine learning methods. While the data size increases, efficient representation of this type of data is becoming difficult to be used in these methods. A common approach is the representation of GPS trajectory information such as average speed, bearing, etc. in raster image form and applying analysis methods. In this study, we evaluate GPS trajectory data rasterization using the spatial join functions of QGIS, PostGIS+QGIS, and our iterative spatial structured grid aggregation implementation coded in the Python programming language. Our implementation is also parallelizable, and this parallelization is also included as the fourth method. According to the results of experiment carried out with an example GPS trajectory dataset, QGIS method and PostGIS+QGIS method showed relatively low performance with respect to our method using the metric of total processing time. PostGIS+QGIS method achieved the best results for spatial join though its total performance decreased quickly while test area size increases. On the other hand, both of our methods' performances decrease directly proportional to GPS point. And our methods' performance can be increased proportional to the increase with the number of processor cores and/or with multiple computing clusters.
Large Language Models (LLMs) have shown excellent generalization capabilities that have led to the development of numerous models. These models propose various new architectures, tweaking existing architectures with refined training strategies, increasing context length, using high-quality training data, and increasing training time to outperform baselines. Analyzing new developments is crucial for identifying changes that enhance training stability and improve generalization in LLMs. This survey paper comprehensively analyses the LLMs architectures and their categorization, training strategies, training datasets, and performance evaluations and discusses future research directions. Moreover, the paper also discusses the basic building blocks and concepts behind LLMs, followed by a complete overview of LLMs, including their important features and functions. Finally, the paper summarizes significant findings from LLM research and consolidates essential architectural and training strategies for developing advanced LLMs. Given the continuous advancements in LLMs, we intend to regularly update this paper by incorporating new sections and featuring the latest LLM models.
Residual networks (ResNets) have displayed impressive results in pattern recognition and, recently, have garnered considerable theoretical interest due to a perceived link with neural ordinary differential equations (neural ODEs). This link relies on the convergence of network weights to a smooth function as the number of layers increases. We investigate the properties of weights trained by stochastic gradient descent and their scaling with network depth through detailed numerical experiments. We observe the existence of scaling regimes markedly different from those assumed in neural ODE literature. Depending on certain features of the network architecture, such as the smoothness of the activation function, one may obtain an alternative ODE limit, a stochastic differential equation or neither of these. These findings cast doubts on the validity of the neural ODE model as an adequate asymptotic description of deep ResNets and point to an alternative class of differential equations as a better description of the deep network limit.
We describe the new field of mathematical analysis of deep learning. This field emerged around a list of research questions that were not answered within the classical framework of learning theory. These questions concern: the outstanding generalization power of overparametrized neural networks, the role of depth in deep architectures, the apparent absence of the curse of dimensionality, the surprisingly successful optimization performance despite the non-convexity of the problem, understanding what features are learned, why deep architectures perform exceptionally well in physical problems, and which fine aspects of an architecture affect the behavior of a learning task in which way. We present an overview of modern approaches that yield partial answers to these questions. For selected approaches, we describe the main ideas in more detail.