We study gradient flow on the exponential loss for a classification problem with a one-layer softmax attention model, where the key and query weight matrices are trained separately. Under a separability assumption on the data, we show that when gradient flow achieves the minimal loss value, it further implicitly minimizes the nuclear norm of the product of the key and query weight matrices. Such implicit regularization can be described by a Support Vector Machine (SVM) problem with respect to the attention weights. This finding contrasts with prior results showing that the gradient descent induces an implicit regularization on the Frobenius norm on the product weight matrix when the key and query matrices are combined into a single weight matrix for training. For diagonal key and query matrices, our analysis builds upon the reparameterization technique and exploits approximate KKT conditions of the SVM associated with the classification data. Moreover, the results are extended to general weights configurations given proper alignment of the weight matrices' singular spaces with the data features at initialization.
This study delves into the auto-ignition temperature of n-heptane and ethanol mixtures within a counterflow flame configuration under low strain rate, with a particular focus on the impact of ethanol blending on heat release rates. Employing the sensitivity analysis method inspired by Zurada's sensitivity approach for neural network, this study identifies the group of critical species influencing the heat release rate. Further analysis concentration change reveals the intricate interactions among these various radicals across different temperature zones. It is found that, in n-heptane dominant mixtures, inhibition of low-temperature chemistry (LTC) caused by additional ethanol, impacts heat release rate at high temperature zone through diffusion effect of specific radicals such as CH2O, C2H4, C3H6 and H2O2. For ethanol-dominant mixtures, an increase in heat release rate was observed with higher ethanol fraction. Further concentration change analysis elucidated it is primarily attributed to the decomposition of ethanol and its subsequent reactions. This research underscores the significance of incorporating both chemical kinetics and species diffusion effects when analyzing the counterflow configuration of complex fuel mixtures.
We study the parameterized complexity of a generalization of the coordinated motion planning problem on graphs, where the goal is to route a specified subset of a given set of $k$ robots to their destinations with the aim of minimizing the total energy (i.e., the total length traveled). We develop novel techniques to push beyond previously-established results that were restricted to solid grids. We design a fixed-parameter additive approximation algorithm for this problem parameterized by $k$ alone. This result, which is of independent interest, allows us to prove the following two results pertaining to well-studied coordinated motion planning problems: (1) A fixed-parameter algorithm, parameterized by $k$, for routing a single robot to its destination while avoiding the other robots, which is related to the famous Rush-Hour Puzzle; and (2) a fixed-parameter algorithm, parameterized by $k$ plus the treewidth of the input graph, for the standard \textsc{Coordinated Motion Planning} (CMP) problem in which we need to route all the $k$ robots to their destinations. The latter of these results implies, among others, the fixed-parameter tractability of CMP parameterized by $k$ on graphs of bounded outerplanarity, which include bounded-height subgrids. We complement the above results with a lower bound which rules out the fixed-parameter tractability for CMP when parameterized by the total energy. This contrasts the recently-obtained tractability of the problem on solid grids under the same parameterization. As our final result, we strengthen the aforementioned fixed-parameter tractability to hold not only on solid grids but all graphs of bounded local treewidth -- a class including, among others, all graphs of bounded genus.
This study evaluates the usage of virtual reality (VR) technologies as a teaching tool in oral placement therapy, a subset of speech therapy. The researcher distributed instructional videos using traditional lecture and modified three-dimensional video to prompt responses. Data was gathered with a two-part Google Form: In "Section 1: Knowledge Test" participants were asked to determine how well they received the information displayed to them. In "Section 2: Opinion Test" participants were asked diagnostic and subjective questions via Likert scale ranging from 1 ("Strongly Disagree") to 5 ("Strongly Agree") to determine how well they enjoyed viewing the information displayed to them. Averages for Section 1 were 92.00% for the control group (viewing 2D, unmodified video) and 77.88% for the experimental group (viewing 3D, VR video). Almost all participants answered at least 60% of the questions correctly. Averages for 2D and 3D participants were 4.53/5 and 3.82/5, respectively for "positive" prompts. Exactly 50% of participants experiencing VR video preferred the method to a traditional lecture. This study determines that virtual reality is viable as a learning tool, but knowledge obtained is not necessarily as high as using traditional lecture. Further experimentation is required to determine how well oral placement therapists respond to physically interacting with a model instead of only viewing it. Copies of the Google Form used to collect responses, all raw data, and a flowchart outlining each step used to construct the 3D video can be found in the Appendix.
This study evaluates three different lemmatization approaches to Estonian -- Generative character-level models, Pattern-based word-level classification models, and rule-based morphological analysis. According to our experiments, a significantly smaller Generative model consistently outperforms the Pattern-based classification model based on EstBERT. Additionally, we observe a relatively small overlap in errors made by all three models, indicating that an ensemble of different approaches could lead to improvements.
We propose a method to couple local and nonlocal diffusion models. By inheriting desirable properties such as patch tests, asymptotic compatibility and unintrusiveness from related splice and optimization-based coupling schemes, it enables the use of weak (or variational) formulations, is computationally efficient and straightforward to implement. We prove well-posedness of the coupling scheme and demonstrate its properties and effectiveness in a variety of numerical examples.
With the robust uptick in the applications of Bayesian external data borrowing, eliciting a prior distribution with the proper amount of information becomes increasingly critical. The prior effective sample size (ESS) is an intuitive and efficient measure for this purpose. The majority of ESS definitions have been proposed in the context of borrowing control information. While many Bayesian models can be naturally extended to leveraging external information on the treatment effect scale, very little attention has been directed to computing the prior ESS in this setting. In this research, we bridge this methodological gap by extending the popular ELIR ESS definition. We lay out the general framework, and derive the prior ESS for various types of endpoints and treatment effect measures. The posterior distribution and the predictive consistency property of ESS are also examined. The methods are implemented in R programs available on GitHub: //github.com/squallteo/TrtEffESS.
We study minimax optimization problems defined over infinite-dimensional function classes. In particular, we restrict the functions to the class of overparameterized two-layer neural networks and study (i) the convergence of the gradient descent-ascent algorithm and (ii) the representation learning of the neural network. As an initial step, we consider the minimax optimization problem stemming from estimating a functional equation defined by conditional expectations via adversarial estimation, where the objective function is quadratic in the functional space. For this problem, we establish convergence under the mean-field regime by considering the continuous-time and infinite-width limit of the optimization dynamics. Under this regime, gradient descent-ascent corresponds to a Wasserstein gradient flow over the space of probability measures defined over the space of neural network parameters. We prove that the Wasserstein gradient flow converges globally to a stationary point of the minimax objective at a $\mathcal{O}(T^{-1} + \alpha^{-1} ) $ sublinear rate, and additionally finds the solution to the functional equation when the regularizer of the minimax objective is strongly convex. Here $T$ denotes the time and $\alpha$ is a scaling parameter of the neural network. In terms of representation learning, our results show that the feature representation induced by the neural networks is allowed to deviate from the initial one by the magnitude of $\mathcal{O}(\alpha^{-1})$, measured in terms of the Wasserstein distance. Finally, we apply our general results to concrete examples including policy evaluation, nonparametric instrumental variable regression, and asset pricing.
This study introduces a new approach to optimize the geometrical parameters of pipe diffusers in centrifugal compressors for Micro Gas Turbines, tailored for a 100 kW unit. The methodology draws insights from optimized airfoil-type diffusers and addresses the unique topological challenges of pipe diffusers, using diffuser maps to enhance design precision. The effectiveness of this method is validated through 3D-RANS based steady CFD simulations, using the ANSYS CFX solver. Comparative performance assessments at 100 percent rotation speed show that the best-performing pipe diffuser slightly trails its airfoil counterpart in efficiency, achieving 82.2 percent total-to-total isentropic efficiency compared to 84.4 percent. However, it offers a reduced frontal area, enhancing compactness. The analysis also reveals a dualistic impact from the leading-edge geometry of the pipe diffuser, which generates two counter-rotating vortices. These vortices have beneficial effects in pseudo and semi-vaneless spaces while introducing destabilizing factors in channel spaces. This investigation highlights potential trade-offs and outlines conditions under which adverse effects dominate, leading to significant flow separation. These insights pave the way for refining diffuser designs to better balance performance with spatial efficiency, marking a critical step forward in compressor technology of micro gas turbine for decentralized power systems.
We describe ACE0, a lightweight platform for evaluating the suitability and viability of AI methods for behaviour discovery in multiagent simulations. Specifically, ACE0 was designed to explore AI methods for multi-agent simulations used in operations research studies related to new technologies such as autonomous aircraft. Simulation environments used in production are often high-fidelity, complex, require significant domain knowledge and as a result have high R&D costs. Minimal and lightweight simulation environments can help researchers and engineers evaluate the viability of new AI technologies for behaviour discovery in a more agile and potentially cost effective manner. In this paper we describe the motivation for the development of ACE0.We provide a technical overview of the system architecture, describe a case study of behaviour discovery in the aerospace domain, and provide a qualitative evaluation of the system. The evaluation includes a brief description of collaborative research projects with academic partners, exploring different AI behaviour discovery methods.
Deep neural networks have revolutionized many machine learning tasks in power systems, ranging from pattern recognition to signal processing. The data in these tasks is typically represented in Euclidean domains. Nevertheless, there is an increasing number of applications in power systems, where data are collected from non-Euclidean domains and represented as the graph-structured data with high dimensional features and interdependency among nodes. The complexity of graph-structured data has brought significant challenges to the existing deep neural networks defined in Euclidean domains. Recently, many studies on extending deep neural networks for graph-structured data in power systems have emerged. In this paper, a comprehensive overview of graph neural networks (GNNs) in power systems is proposed. Specifically, several classical paradigms of GNNs structures (e.g., graph convolutional networks, graph recurrent neural networks, graph attention networks, graph generative networks, spatial-temporal graph convolutional networks, and hybrid forms of GNNs) are summarized, and key applications in power systems such as fault diagnosis, power prediction, power flow calculation, and data generation are reviewed in detail. Furthermore, main issues and some research trends about the applications of GNNs in power systems are discussed.