Hierarchical topic modeling aims to discover latent topics from a corpus and organize them into a hierarchy to understand documents with desirable semantic granularity. However, existing work struggles with producing topic hierarchies of low affinity, rationality, and diversity, which hampers document understanding. To overcome these challenges, we in this paper propose Transport Plan and Context-aware Hierarchical Topic Model (TraCo). Instead of early simple topic dependencies, we propose a transport plan dependency method. It constrains dependencies to ensure their sparsity and balance, and also regularizes topic hierarchy building with them. This improves affinity and diversity of hierarchies. We further propose a context-aware disentangled decoder. Rather than previously entangled decoding, it distributes different semantic granularity to topics at different levels by disentangled decoding. This facilitates the rationality of hierarchies. Experiments on benchmark datasets demonstrate that our method surpasses state-of-the-art baselines, effectively improving the affinity, rationality, and diversity of hierarchical topic modeling with better performance on downstream tasks.
This work concerns the minimization of the pseudospectral abscissa of a matrix-valued function dependent on parameters analytically. The problem is motivated by robust stability and transient behavior considerations for a linear control system that has optimization parameters. We describe a subspace procedure to cope with the setting when the matrix-valued function is of large size. The proposed subspace procedure solves a sequence of reduced problems obtained by restricting the matrix-valued function to small subspaces, whose dimensions increase gradually. It possesses desirable features such as the global convergence of the minimal values of the reduced problems to the minimal value of the original problem, and a superlinear convergence exhibited by the decay in the errors of the minimizers of the reduced problems. In mathematical terms, the problem we consider is a large-scale nonconvex minimax eigenvalue optimization problem such that the eigenvalue function appears in the constraint of the inner maximization problem. Devising and analyzing a subspace framework for the minimax eigenvalue optimization problem at hand with the eigenvalue function in the constraint require special treatment that makes use of a Lagrangian and dual variables. There are notable advantages in minimizing the pseudospectral abscissa over maximizing the distance to instability or minimizing the $\mathcal{H}_\infty$ norm; the optimized pseudospectral abscissa provides quantitative information about the worst-case transient growth, and the initial guesses for the parameter values to optimize the pseudospectral abscissa can be arbitrary, unlike the case to optimize the distance to instability and $\mathcal{H}_\infty$ norm that would normally require initial guesses yielding asymptotically stable systems.
Identifying scientific publications that are within a dynamic field of research often requires costly annotation by subject-matter experts. Resources like widely-accepted classification criteria or field taxonomies are unavailable for a domain like artificial intelligence (AI), which spans emerging topics and technologies. We address these challenges by inferring a functional definition of AI research from existing expert labels, and then evaluating state-of-the-art chatbot models on the task of expert data annotation. Using the arXiv publication database as ground-truth, we experiment with prompt engineering for GPT chatbot models to identify an alternative, automated expert annotation pipeline that assigns AI labels with 94% accuracy. For comparison, we fine-tune SPECTER, a transformer language model pre-trained on scientific publications, that achieves 96% accuracy (only 2% higher than GPT) on classifying AI publications. Our results indicate that with effective prompt engineering, chatbots can be used as reliable data annotators even where subject-area expertise is required. To evaluate the utility of chatbot-annotated datasets on downstream classification tasks, we train a new classifier on GPT-labeled data and compare its performance to the arXiv-trained model. The classifier trained on GPT-labeled data outperforms the arXiv-trained model by nine percentage points, achieving 82% accuracy.
We prove impossibility results for adaptivity in non-smooth stochastic convex optimization. Given a set of problem parameters we wish to adapt to, we define a "price of adaptivity" (PoA) that, roughly speaking, measures the multiplicative increase in suboptimality due to uncertainty in these parameters. When the initial distance to the optimum is unknown but a gradient norm bound is known, we show that the PoA is at least logarithmic for expected suboptimality, and double-logarithmic for median suboptimality. When there is uncertainty in both distance and gradient norm, we show that the PoA must be polynomial in the level of uncertainty. Our lower bounds nearly match existing upper bounds, and establish that there is no parameter-free lunch.
We build on recent research on polynomial randomized approximation (PRAX) algorithms for the hard problems of NFA universality and NFA equivalence. Loosely speaking, PRAX algorithms use sampling of infinite domains within any desired accuracy $\delta$. In the spirit of experimental mathematics, we extend the concept of PRAX algorithms to be applicable to the emptiness and universality problems in any domain whose instances admit a tractable distribution as defined in this paper. A technical result here is that a linear (w.r.t. $1/\delta$) number of samples is sufficient, as opposed to the quadratic number of samples in previous papers. We show how the improved and generalized PRAX algorithms apply to universality and emptiness problems in various domains: ordinary automata, tautology testing of propositions, 2D automata, and to solution sets of certain Diophantine equations.
This study investigates a strongly-coupled system of partial differential equations (PDE) governing heat transfer in a copper rod, longitudinal vibrations, and total charge accumulation at electrodes within a magnetizable piezoelectric beam. Conducted within the transmission line framework, the analysis reveals profound interactions between traveling electromagnetic and mechanical waves in magnetizable piezoelectric beams, despite disparities in their velocities. Findings suggest that in the open-loop scenario, the interaction of heat and beam dynamics lacks exponential stability solely considering thermal effects. To confront this challenge, two types of boundary-type state feedback controllers are proposed: (i) employing static feedback controllers entirely and (ii) adopting a hybrid approach wherein the electrical controller dynamically enhances system dynamics. In both cases, solutions of the PDE systems demonstrate exponential stability through meticulously formulated Lyapunov functions with diverse multipliers. The proposed proof technique establishes a robust foundation for demonstrating the exponential stability of Finite-Difference-based model reductions as the discretization parameter approaches zero.
Modelling musical structure is vital yet challenging for artificial intelligence systems that generate symbolic music compositions. This literature review dissects the evolution of techniques for incorporating coherent structure, from symbolic approaches to foundational and transformative deep learning methods that harness the power of computation and data across a wide variety of training paradigms. In the later stages, we review an emerging technique which we refer to as "sub-task decomposition" that involves decomposing music generation into separate high-level structural planning and content creation stages. Such systems incorporate some form of musical knowledge or neuro-symbolic methods by extracting melodic skeletons or structural templates to guide the generation. Progress is evident in capturing motifs and repetitions across all three eras reviewed, yet modelling the nuanced development of themes across extended compositions in the style of human composers remains difficult. We outline several key future directions to realize the synergistic benefits of combining approaches from all eras examined.
Shuffling gradient methods, which are also known as stochastic gradient descent (SGD) without replacement, are widely implemented in practice, particularly including three popular algorithms: Random Reshuffle (RR), Shuffle Once (SO), and Incremental Gradient (IG). Compared to the empirical success, the theoretical guarantee of shuffling gradient methods was not well-understanding for a long time. Until recently, the convergence rates had just been established for the average iterate for convex functions and the last iterate for strongly convex problems (using squared distance as the metric). However, when using the function value gap as the convergence criterion, existing theories cannot interpret the good performance of the last iterate in different settings (e.g., constrained optimization). To bridge this gap between practice and theory, we prove last-iterate convergence rates for shuffling gradient methods with respect to the objective value even without strong convexity. Our new results either (nearly) match the existing last-iterate lower bounds or are as fast as the previous best upper bounds for the average iterate.
The Yang and Prentice (YP) regression models have garnered interest from the scientific community due to their ability to analyze data whose survival curves exhibit intersection. These models include proportional hazards (PH) and proportional odds (PO) models as specific cases. However, they encounter limitations when dealing with multivariate survival data due to potential dependencies between the times-to-event. A solution is introducing a frailty term into the hazard functions, making it possible for the times-to-event to be considered independent, given the frailty term. In this study, we propose a new class of YP models that incorporate frailty. We use the exponential distribution, the piecewise exponential distribution (PE), and Bernstein polynomials (BP) as baseline functions. Our approach adopts a Bayesian methodology. The proposed models are evaluated through a simulation study, which shows that the YP frailty models with BP and PE baselines perform similarly to the generator parametric model of the data. We apply the models in two real data sets.
The fusion of causal models with deep learning introducing increasingly intricate data sets, such as the causal associations within images or between textual components, has surfaced as a focal research area. Nonetheless, the broadening of original causal concepts and theories to such complex, non-statistical data has been met with serious challenges. In response, our study proposes redefinitions of causal data into three distinct categories from the standpoint of causal structure and representation: definite data, semi-definite data, and indefinite data. Definite data chiefly pertains to statistical data used in conventional causal scenarios, while semi-definite data refers to a spectrum of data formats germane to deep learning, including time-series, images, text, and others. Indefinite data is an emergent research sphere inferred from the progression of data forms by us. To comprehensively present these three data paradigms, we elaborate on their formal definitions, differences manifested in datasets, resolution pathways, and development of research. We summarize key tasks and achievements pertaining to definite and semi-definite data from myriad research undertakings, present a roadmap for indefinite data, beginning with its current research conundrums. Lastly, we classify and scrutinize the key datasets presently utilized within these three paradigms.
We introduce a multi-task setup of identifying and classifying entities, relations, and coreference clusters in scientific articles. We create SciERC, a dataset that includes annotations for all three tasks and develop a unified framework called Scientific Information Extractor (SciIE) for with shared span representations. The multi-task setup reduces cascading errors between tasks and leverages cross-sentence relations through coreference links. Experiments show that our multi-task model outperforms previous models in scientific information extraction without using any domain-specific features. We further show that the framework supports construction of a scientific knowledge graph, which we use to analyze information in scientific literature.