Large-scale discrete fracture network (DFN) simulators are standard fare for studies involving the sub-surface transport of particles since direct observation of real world underground fracture networks is generally infeasible. While these simulators have seen numerous successes over several engineering applications, estimations on quantities of interest (QoI) - such as breakthrough time of particles reaching the edge of the system - suffer from a two distinct types of uncertainty. A run of a DFN simulator requires several parameter values to be set that dictate the placement and size of fractures, the density of fractures, and the overall permeability of the system; uncertainty on the proper parameter choices will lead to some amount of uncertainty in the QoI, called epistemic uncertainty. Furthermore, since DFN simulators rely on stochastic processes to place fractures and govern flow, understanding how this randomness affects the QoI requires several runs of the simulator at distinct random seeds. The uncertainty in the QoI attributed to different realizations (i.e. different seeds) of the same random process leads to a second type of uncertainty, called aleatoric uncertainty. In this paper, we perform a Sensitivity Analysis, which directly attributes the uncertainty observed in the QoI to the epistemic uncertainty from each input parameter and to the aleatoric uncertainty. We make several design choices to handle an observed heteroskedasticity in DFN simulators, where the aleatoric uncertainty changes for different inputs, since the quality makes several standard statistical methods inadmissible. Beyond the specific takeaways on which input variables affect uncertainty the most for DFN simulators, a major contribution of this paper is the introduction of a statistically rigorous workflow for characterizing the uncertainty in DFN flow simulations that exhibit heteroskedasticity.
The development of large language models (LLMs) capable of following instructions and engaging in conversational interactions sparked increased interest in their utilization across various support tools. We investigate the utility of modern LLMs in assisting professional writers via an empirical user study (n=30). The design of our collaborative writing interface is grounded in the cognitive process model of writing that views writing as a goal-oriented thinking process encompassing non-linear cognitive activities: planning, translating, and reviewing. Participants are asked to submit a post-completion survey to provide feedback on the potential and pitfalls of LLMs as writing collaborators. Upon analyzing the writer-LLM interactions, we find that while writers seek LLM's help across all three types of cognitive activities, they find LLMs more helpful in translation and reviewing. Our findings from analyzing both the interactions and the survey responses highlight future research directions in creative writing assistance using LLMs.
Mixture-of-Experts (MoE) models are a promising way to scale up model capacity without significantly increasing computational cost. A key component of MoEs is the router, which decides which subset of parameters (experts) process which feature embeddings (tokens). In this paper, we present a comprehensive study of routers in MoEs for computer vision tasks. We introduce a unified MoE formulation that subsumes different MoEs with two parametric routing tensors. This formulation covers both sparse MoE, which uses a binary or hard assignment between experts and tokens, and soft MoE, which uses a soft assignment between experts and weighted combinations of tokens. Routers for sparse MoEs can be further grouped into two variants: Token Choice, which matches experts to each token, and Expert Choice, which matches tokens to each expert. We conduct head-to-head experiments with 6 different routers, including existing routers from prior work and new ones we introduce. We show that (i) many routers originally developed for language modeling can be adapted to perform strongly in vision tasks, (ii) in sparse MoE, Expert Choice routers generally outperform Token Choice routers, and (iii) soft MoEs generally outperform sparse MoEs with a fixed compute budget. These results provide new insights regarding the crucial role of routers in vision MoE models.
The advancement of transformer neural networks has significantly elevated the capabilities of sentence similarity models, particularly in creating effective vector representations of natural language inputs. However, these models face notable challenges in domain-specific contexts, especially in highly specialized scientific sub-fields. Traditional methods often struggle in this regime, either overgeneralizing similarities within a niche or being overly sensitive to minor differences, resulting in inaccurate text classification and subpar vector representation. In an era where retrieval augmentation and search are increasingly crucial, precise and concise numerical representations are essential. In this paper, we target this issue by assembling niche datasets using co-citations as a similarity metric, focusing on biomedical domains. We employ two key strategies for fine-tuning state-of-the-art models: 1. Domain-specific Fine-Tuning, which tailors pretrained models to a single domain, and 2. Universal Applicability with Mixture of Experts (MoE), adapting pretrained models with enforced routing for multiple domains simultaneously. Our training approach emphasizes the use of abstracts for faster training, incorporating Multiple Negative Rankings loss for efficient contrastive learning. Notably, our MoE variants, equipped with $N$ experts, achieve the efficacy of $N$ individual models, heralding a new era of versatile, One-Size-Fits-All transformer networks for various tasks. This methodology marks significant advancements in scientific text classification metrics and holds promise for enhancing vector database search and compilation.
The assumption that data are invariant under the action of a compact group is implicit in many statistical modeling assumptions such as normality, or the assumption of independence and identical distributions. Hence, testing for the presence of such invariances offers a principled way to falsify various statistical models. In this article, we develop sequential, anytime-valid tests of distributional symmetry under the action of general compact groups. The tests that are developed allow for the continuous monitoring of data as it is collected while keeping type-I error guarantees, and include tests for exchangeability and rotational symmetry as special examples. The main tool to this end is the machinery developed for conformal prediction. The resulting test statistic, called a conformal martingale, can be interpreted as a likelihood ratio. We use this interpretation to show that the test statistics are optimal -- in a specific log-optimality sense -- against certain alternatives. Furthermore, we draw a connection between conformal prediction, anytime-valid tests of distributional invariance, and current developments on anytime-valid testing. In particular, we extend existing anytime-valid tests of independence, which leverage exchangeability, to work under general group invariances. Additionally, we discuss testing for invariance under subgroups of the permutation group and orthogonal group, the latter of which corresponds to testing the assumptions behind linear regression models.
The quantum convolutional neural network (QCNN) is a promising quantum machine learning (QML) model that is expected to achieve quantum advantages in classically intractable problems. However, the QCNN requires a large number of measurements for data learning, limiting its practical applications in large-scale problems. To alleviate this requirement, we propose a novel architecture called split-parallelizing QCNN (sp-QCNN), which exploits the prior knowledge of quantum data to design an efficient model. This architecture draws inspiration from geometric quantum machine learning and targets translationally symmetric quantum data commonly encountered in physics and quantum computing science. By splitting the quantum circuit based on translational symmetry, the sp-QCNN can substantially parallelize the conventional QCNN without increasing the number of qubits and improve the measurement efficiency by an order of the number of qubits. To demonstrate its effectiveness, we apply the sp-QCNN to a quantum phase recognition task and show that it can achieve comparable classification accuracy to the conventional QCNN while considerably reducing the measurement resources required. Due to its high measurement efficiency, the sp-QCNN can mitigate statistical errors in estimating the gradient of the loss function, thereby accelerating the learning process. These results open up new possibilities for incorporating the prior data knowledge into the efficient design of QML models, leading to practical quantum advantages.
We use Markov categories to develop generalizations of the theory of Markov chains and hidden Markov models in an abstract setting. This comprises characterizations of hidden Markov models in terms of local and global conditional independences as well as existing algorithms for Bayesian filtering and smoothing applicable in all Markov categories with conditionals. We show that these algorithms specialize to existing ones such as the Kalman filter, forward-backward algorithm, and the Rauch-Tung-Striebel smoother when instantiated in appropriate Markov categories. Under slightly stronger assumptions, we also prove that the sequence of outputs of the Bayes filter is itself a Markov chain with a concrete formula for its transition maps. There are two main features of this categorical framework. The first is its generality, as it can be used in any Markov category with conditionals. In particular, it provides a systematic unified account of hidden Markov models and algorithms for filtering and smoothing in discrete probability, Gaussian probability, measure-theoretic probability, possibilistic nondeterminism and others at the same time. The second feature is the intuitive visual representation of information flow in these algorithms in terms of string diagrams.
Mobile edge computing (MEC) is powerful to alleviate the heavy computing tasks in integrated sensing and communication (ISAC) systems. In this paper, we investigate joint beamforming and offloading design in a three-tier integrated sensing, communication and computation (ISCC) framework comprising one cloud server, multiple mobile edge servers, and multiple terminals. While executing sensing tasks, the user terminals can optionally offload sensing data to either MEC server or cloud servers. To minimize the execution latency, we jointly optimize the transmit beamforming matrices and offloading decision variables under the constraint of sensing performance. An alternating optimization algorithm based on multidimensional fractional programming is proposed to tackle the non-convex problem. Simulation results demonstrates the superiority of the proposed mechanism in terms of convergence and task execution latency reduction, compared with the state-of-the-art two-tier ISCC framework.
While large language models (LLMs) have demonstrated remarkable capabilities across a range of downstream tasks, a significant concern revolves around their propensity to exhibit hallucinations: LLMs occasionally generate content that diverges from the user input, contradicts previously generated context, or misaligns with established world knowledge. This phenomenon poses a substantial challenge to the reliability of LLMs in real-world scenarios. In this paper, we survey recent efforts on the detection, explanation, and mitigation of hallucination, with an emphasis on the unique challenges posed by LLMs. We present taxonomies of the LLM hallucination phenomena and evaluation benchmarks, analyze existing approaches aiming at mitigating LLM hallucination, and discuss potential directions for future research.
Recently, graph neural networks have been gaining a lot of attention to simulate dynamical systems due to their inductive nature leading to zero-shot generalizability. Similarly, physics-informed inductive biases in deep-learning frameworks have been shown to give superior performance in learning the dynamics of physical systems. There is a growing volume of literature that attempts to combine these two approaches. Here, we evaluate the performance of thirteen different graph neural networks, namely, Hamiltonian and Lagrangian graph neural networks, graph neural ODE, and their variants with explicit constraints and different architectures. We briefly explain the theoretical formulation highlighting the similarities and differences in the inductive biases and graph architecture of these systems. We evaluate these models on spring, pendulum, gravitational, and 3D deformable solid systems to compare the performance in terms of rollout error, conserved quantities such as energy and momentum, and generalizability to unseen system sizes. Our study demonstrates that GNNs with additional inductive biases, such as explicit constraints and decoupling of kinetic and potential energies, exhibit significantly enhanced performance. Further, all the physics-informed GNNs exhibit zero-shot generalizability to system sizes an order of magnitude larger than the training system, thus providing a promising route to simulate large-scale realistic systems.
Sampling methods (e.g., node-wise, layer-wise, or subgraph) has become an indispensable strategy to speed up training large-scale Graph Neural Networks (GNNs). However, existing sampling methods are mostly based on the graph structural information and ignore the dynamicity of optimization, which leads to high variance in estimating the stochastic gradients. The high variance issue can be very pronounced in extremely large graphs, where it results in slow convergence and poor generalization. In this paper, we theoretically analyze the variance of sampling methods and show that, due to the composite structure of empirical risk, the variance of any sampling method can be decomposed into \textit{embedding approximation variance} in the forward stage and \textit{stochastic gradient variance} in the backward stage that necessities mitigating both types of variance to obtain faster convergence rate. We propose a decoupled variance reduction strategy that employs (approximate) gradient information to adaptively sample nodes with minimal variance, and explicitly reduces the variance introduced by embedding approximation. We show theoretically and empirically that the proposed method, even with smaller mini-batch sizes, enjoys a faster convergence rate and entails a better generalization compared to the existing methods.