Linear codes are widely studied in coding theory because of their applications in distributed storage, combinatorics, lattices, cryptography and beyond. Constructing linear codes with desirable properties is an interesting research topic. In this paper, based on the augmentation technique, we present two families of linear codes constructed from functions over finite fields. The first family is constructed from monomial functions over finite fields. Its locality is determined, and the weight distributions of two subfamilies of the codes are also given. An infinite family of almost optimal locally recoverable codes and some optimal locally recoverable codes are obtained from these linear codes. In particular, the two subfamilies are proved to be both optimally or almost optimally extendable and self-orthogonal. The second family of linear codes is constructed from weakly regular bent functions over finite fields, and its weight distribution is determined. This family is proved to have locality 3 in some cases and is conjectured to have locality 2 in the others. In particular, two families of optimal locally recoverable codes are derived from these linear codes. Besides, this family of codes is also proved to be both optimally or almost optimally extendable and self-orthogonal.
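The abstract does not spell out the augmentation technique; for context, its textbook form adjoins the all-one vector to a linear code $\mathcal{C}$ over $\mathbb{F}_q$:
\[
\overline{\mathcal{C}} \;=\; \big\{\, \mathbf{c} + a\mathbf{1} \;:\; \mathbf{c} \in \mathcal{C},\ a \in \mathbb{F}_q \,\big\}, \qquad \mathbf{1} = (1,1,\dots,1),
\]
which raises the dimension by one whenever $\mathbf{1} \notin \mathcal{C}$.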
Source code plagiarism is a significant issue in educational practice, and educators need user-friendly tools to cope with such academic dishonesty. This article introduces the latest version of Dolos, a state-of-the-art ecosystem of tools for detecting and preventing plagiarism in educational source code. In this new version, the primary focus has been on enhancing the user experience. Educators can now run the entire plagiarism detection pipeline from a new web app in their browser, eliminating the need for any installation or configuration. Completely redesigned analytics dashboards provide an instant assessment of whether a collection of source files contains suspected cases of plagiarism and how widespread plagiarism is within the collection. The dashboards support hierarchically structured navigation to facilitate zooming in and out of suspect cases. Clusters are an essential new component of the dashboard design, reflecting the observation that plagiarism can occur among larger groups of students. To meet various user needs, the Dolos software stack for source code plagiarism detection now includes a web interface, a JSON application programming interface (API), a command line interface (CLI), a JavaScript library and a preconfigured Docker container. Clear documentation and a free-to-use instance of the web app can be found at https://dolos.ugent.be. The source code is also available on GitHub.
We consider the numerical behavior of the fixed-stress splitting method for coupled poromechanics as undrained regimes are approached. We explain that pressure stability is related to the splitting error of the scheme, not to the fact that the discrete saddle-point matrix never appears in the fixed-stress approach. This observation reconciles previous results regarding the pressure stability of the splitting method. Using examples of compositional poromechanics with application to geological CO$_2$ sequestration, we show that solutions obtained with the fixed-stress scheme and a low-order finite element-finite volume discretization that is not inherently inf-sup stable can exhibit the same pressure oscillations as the corresponding fully implicit scheme. Moreover, pressure-jump stabilization can effectively remove these spurious oscillations in the fixed-stress setting, while also improving the efficiency of the scheme in terms of the number of iterations required at each time step to reach convergence.
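For readers unfamiliar with the scheme, a single-phase Biot sketch of the fixed-stress flow step (the paper's compositional setting is more general) reads
\[
\Big(\frac{1}{M} + \frac{b^{2}}{K_{\mathrm{dr}}}\Big)\,\frac{p^{\,k+1} - p^{\,n}}{\Delta t} + \nabla\cdot\mathbf{q}\big(p^{\,k+1}\big) = f - b\,\frac{\nabla\cdot\mathbf{u}^{\,k} - \nabla\cdot\mathbf{u}^{\,n}}{\Delta t} + \frac{b^{2}}{K_{\mathrm{dr}}}\,\frac{p^{\,k} - p^{\,n}}{\Delta t},
\]
where $k$ indexes the coupling iteration within time step $n$, $b$ is the Biot coefficient, $M$ the Biot modulus and $K_{\mathrm{dr}}$ the drained bulk modulus; the mechanics problem is then solved with the updated pressure. Pressure-jump stabilization of the kind mentioned typically adds a bilinear form of the shape $\beta \sum_{F} \int_{F} h_F\, [\![p]\!]\,[\![q]\!]\,\mathrm{d}s$ over interior faces $F$ to the flow equation.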
We introduce Optimistix: a nonlinear optimisation library built in JAX and Equinox. Optimistix introduces a novel, modular approach for its minimisers and least-squares solvers. This modularity relies on new practical abstractions for optimisation, which we call search and descent, and which generalise classical notions of line-search, trust-region, and learning-rate algorithms. It provides high-level APIs and solvers for minimisation, nonlinear least squares, root-finding, and fixed-point iteration. Optimistix is available at https://github.com/patrick-kidger/optimistix.
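For illustration, a minimal usage sketch based on Optimistix's documented top-level API; the Rosenbrock objective and the tolerances are our own choices:
\begin{verbatim}
import jax.numpy as jnp
import optimistix as optx

# Objective in the fn(y, args) -> scalar form expected by optx.minimise.
def rosenbrock(y, args):
    return jnp.sum(100.0 * (y[1:] - y[:-1] ** 2) ** 2 + (1.0 - y[:-1]) ** 2)

y0 = jnp.zeros(2)
solver = optx.BFGS(rtol=1e-6, atol=1e-6)  # a quasi-Newton minimiser
sol = optx.minimise(rosenbrock, solver, y0)
print(sol.value)  # converges to approximately [1., 1.]
\end{verbatim}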
Locally repairable codes have been extensively investigated in recent years due to their practical applications in distributed and cloud storage systems. However, little work has been done on the asymptotic behavior of locally repairable codes. In particular, there are few results on constructive lower bounds for the asymptotic behavior of locally repairable codes with multiple recovering sets. In this paper, we construct families of asymptotically good locally repairable codes with multiple recovering sets via automorphism groups of function fields of the Garcia-Stichtenoth towers. The main advantage of our construction is that it allows more flexibility in the choice of localities.
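For context, a code has locality $r$ if every symbol can be recovered from at most $r$ other symbols, and multiple recovering sets require several disjoint such groups per symbol. In the single-recovering-set case the Singleton-type bound of Gopalan et al. reads
\[
d \;\le\; n - k - \left\lceil \frac{k}{r} \right\rceil + 2,
\]
and analogous bounds govern the multiple-recovering-set regime; asymptotically good means that rate and relative distance both stay bounded away from zero as $n \to \infty$.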
The latest generative large language models (LLMs) have found their application in data augmentation tasks, where small numbers of text samples are LLM-paraphrased and then used to fine-tune downstream models. However, more research is needed to assess how different prompts, seed data selection strategies, filtering methods, or model settings affect the quality of paraphrased data (and downstream models). In this study, we investigate three text diversity incentive methods well established in crowdsourcing: taboo words, hints from previous outlier solutions, and chaining on previous outlier solutions. Using these incentive methods as part of instructions to LLMs augmenting text datasets, we measure their effects on the lexical diversity of the generated texts and on downstream model performance. We compare the effects across 5 different LLMs, 6 datasets and 2 downstream models. We show that diversity is increased most by taboo words, but downstream model performance is highest with hints.
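As a concrete illustration of the first incentive method, here is a minimal sketch of how a taboo-words augmentation prompt might be assembled; the function name, the word-selection heuristic and the prompt wording are our assumptions, not the study's exact setup:
\begin{verbatim}
# Hypothetical sketch of the taboo-words incentive: ban frequent content
# words from earlier generations so the LLM must paraphrase differently.
from collections import Counter

def taboo_prompt(seed_text, previous_outputs, n_taboo=3):
    words = [w.lower() for out in previous_outputs
             for w in out.split() if len(w) > 4]
    taboo = [w for w, _ in Counter(words).most_common(n_taboo)]
    return (
        f"Paraphrase the following text: {seed_text}\n"
        f"Do not use these words: {', '.join(taboo)}"
    )
\end{verbatim}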
We analyze a Discontinuous Galerkin method for a problem with linear advection-reaction and $p$-type diffusion, with Sobolev indices $p\in (1, \infty)$. The discretization of the diffusion term is based on the full gradient, including jump liftings and interior-penalty stabilization, while for the advective contribution we consider a strengthened version of the classical upwind scheme. The error estimates we develop track the dependence of the local contributions to the error on local P\'eclet numbers. A set of numerical tests supports the theoretical derivations.
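A representative model problem of this type (the paper's exact setting may be more general) is
\[
-\nabla\cdot\big(|\nabla u|^{p-2}\,\nabla u\big) + \boldsymbol{\beta}\cdot\nabla u + \mu\, u = f \quad \text{in } \Omega,
\]
with the local P\'eclet number on each element measuring the relative strength of the advection field $\boldsymbol{\beta}$ against the $p$-type diffusion at the local mesh size.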
Motivated by deterministic identification via (classical) channels, where the encoder is not allowed to use randomization, we revisit the problem of identification via quantum channels, but now with the additional restriction that the message encoding must use pure quantum states rather than general mixed states. Together with the previously considered distinction between simultaneous and general decoders, this suggests a two-dimensional spectrum of different identification capacities, whose behaviour could a priori be very different. We demonstrate two new results as our main findings: first, we show that all four combinations (pure/mixed encoder, simultaneous/general decoder) admit double-exponentially growing code sizes, and that the corresponding identification capacities are all lower bounded by the classical transmission capacity of a general quantum channel, which is given by the Holevo-Schumacher-Westmoreland Theorem. Second, we show that the simultaneous identification capacity of a quantum channel equals the simultaneous identification capacity with pure-state encodings, thus leaving three linearly ordered identification capacities. By considering some simple examples, we finally show that these three are all different: the general identification capacity can be larger than the pure-state-encoded identification capacity, which in turn can be larger than the pure-state-encoded simultaneous identification capacity.
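For reference, double-exponential growth means that block length $n$ at rate $R$ supports $N = 2^{2^{nR}}$ identification messages, and the Holevo-Schumacher-Westmoreland lower bound invoked above is the product-state classical capacity
\[
\chi(\mathcal{N}) \;=\; \max_{\{p_x,\,\rho_x\}} \Big[\, S\Big(\sum_x p_x\, \mathcal{N}(\rho_x)\Big) - \sum_x p_x\, S\big(\mathcal{N}(\rho_x)\big) \Big],
\]
where $S$ denotes the von Neumann entropy.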
Mesh-based Graph Neural Networks (GNNs) have recently shown the ability to simulate complex multiphysics problems with accelerated performance times. However, mesh-based GNNs require a large number of message-passing (MP) steps and suffer from over-smoothing for problems involving very fine meshes. In this work, we develop a multiscale mesh-based GNN framework that mimics a conventional iterative multigrid solver, coupled with adaptive mesh refinement (AMR), to mitigate these challenges of conventional mesh-based GNNs. We use the framework to accelerate phase field (PF) fracture problems involving coupled partial differential equations with a near-singular operator due to the near-zero modulus inside the crack. We define the initial graph representation using all mesh resolution levels. We perform a series of downsampling steps using Transformer MP GNNs to reach the coarsest graph, followed by upsampling steps to return to the original graph. We use skip connections from the embeddings generated during coarsening to prevent over-smoothing. We use Transfer Learning (TL) to significantly reduce the size of the training datasets needed to simulate different crack configurations and loading conditions. The trained framework shows accelerated simulation times while maintaining high accuracy for all cases compared to the physics-based PF fracture model. Finally, this work provides a new approach to accelerating a variety of mesh-based engineering multiphysics problems.
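A minimal sketch of the downsample/upsample pass with skip connections described above; all names and interfaces are illustrative (message passing over mesh edges is elided), not the paper's code:
\begin{verbatim}
# Hypothetical sketch: V-cycle-style forward pass over mesh levels.
# down/up are Transformer message-passing blocks; pools/unpools map
# features between consecutive mesh resolutions (interfaces assumed).
def multiscale_forward(x, down, up, pools, unpools):
    skips = []
    for block, pool in zip(down, pools):
        x = block(x)          # message passing at this resolution
        skips.append(x)       # keep embedding for the skip connection
        x = pool(x)           # restrict to the next coarser mesh
    for block, unpool, skip in zip(up, unpools, reversed(skips)):
        x = unpool(x)         # prolong to the next finer mesh
        x = block(x + skip)   # skip connection counters over-smoothing
    return x
\end{verbatim}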
Inner products of neural network feature maps arise in a wide variety of machine learning frameworks as a method of modeling relations between inputs. This work studies the approximation properties of inner products of neural networks. It is shown that the inner product of a multi-layer perceptron with itself is a universal approximator for symmetric positive-definite relation functions. In the case of asymmetric relation functions, it is shown that the inner product of two different multi-layer perceptrons is a universal approximator. In both cases, a bound is obtained on the number of neurons required to achieve a given accuracy of approximation. In the symmetric case, the function class can be identified with kernels of reproducing kernel Hilbert spaces, whereas in the asymmetric case the function class can be identified with kernels of reproducing kernel Banach spaces. Finally, these approximation results are applied to analyzing the attention mechanism underlying Transformers, showing that any retrieval mechanism defined by an abstract preorder can be approximated by attention through its inner product relations. This result uses the Debreu representation theorem from economics to represent preference relations in terms of utility functions.
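In notation (ours, for concreteness), the models studied take the form
\[
r(x, y) \;\approx\; \langle \phi(x), \psi(y) \rangle,
\]
where $\phi$ and $\psi$ are multi-layer perceptrons, with $\psi = \phi$ in the symmetric positive-definite case.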
In large-scale systems there are fundamental challenges when centralised techniques are used for task allocation: the number of interactions is limited by resource constraints such as computation, storage, and network communication. We can increase scalability by implementing the system as a distributed task-allocation system, sharing tasks across many agents. However, this in turn increases the resource cost of communication and synchronisation, and raises its own scaling difficulties. In this paper we present four algorithms to solve these problems. Their combination enables each agent to improve its task allocation strategy through reinforcement learning, while varying how much of the system it explores in response to how optimal it believes its current strategy to be, given its past experience. We focus on distributed agent systems where the agents' behaviours are constrained by resource usage limits, restricting agents to local rather than system-wide knowledge. We evaluate these algorithms in a simulated environment where agents are given a task composed of multiple subtasks that must be allocated to other agents with differing capabilities, which then carry out those tasks. We also simulate real-life system effects such as networking instability. Our solution is shown to solve the task allocation problem to within 6.7% of the theoretical optimum within the system configurations considered. It provides 5x better performance recovery than no-knowledge-retention approaches when system connectivity is impacted, and is tested on systems of up to 100 agents with less than a 9% impact on the algorithms' performance.
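A minimal sketch of the adaptive-exploration idea described above (epsilon-greedy with confidence-scaled exploration); this is our illustrative reading, not the paper's exact algorithms:
\begin{verbatim}
import random

# Hypothetical: pick which agent to delegate a subtask to. 'confidence'
# in [0, 1] reflects how optimal the agent believes its current strategy
# is, based on past experience; exploration shrinks as confidence grows.
def choose_agent(q_values, confidence):
    epsilon = 1.0 - confidence
    if random.random() < epsilon:
        return random.choice(list(q_values))    # explore the system
    return max(q_values, key=q_values.get)      # exploit best-known agent
\end{verbatim}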