We study gradient descent (GD) dynamics on logistic regression problems with large, constant step sizes. For linearly-separable data, it is known that GD converges to the minimizer with arbitrarily large step sizes, a property which no longer holds when the problem is not separable. In fact, the behaviour can be much more complex -- a sequence of period-doubling bifurcations begins at the critical step size $2/\lambda$, where $\lambda$ is the largest eigenvalue of the Hessian at the solution. Using a smaller-than-critical step size guarantees convergence if initialized near the solution, but does this suffice globally? In one dimension, we show that a step size less than $1/\lambda$ suffices for global convergence. However, for all step sizes between $1/\lambda$ and the critical step size $2/\lambda$, one can construct a dataset such that GD converges to a stable cycle. In higher dimensions, this is actually possible even for step sizes less than $1/\lambda$. Our results show that although local convergence is guaranteed for all step sizes less than the critical step size, global convergence is not, and GD may instead converge to a cycle depending on the initialization.
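As an illustrative aside (synthetic data and step sizes chosen by us, not taken from the paper), the sketch below runs constant-step-size GD on a 1D non-separable logistic regression problem and prints the tail of the iterates: small steps settle at the minimizer, while sufficiently large steps can oscillate or settle into a cycle.

```python
import numpy as np

# Minimal numerical sketch (synthetic data, not from the paper).
x = np.array([2.0, -0.5])            # features of opposite sign ...
y = np.array([1.0, 1.0])             # ... with the same label: not separable

def grad(w):
    # gradient of the mean logistic loss (1/n) * sum_i log(1 + exp(-y_i * w * x_i))
    margins = np.clip(y * x * w, -60.0, 60.0)
    return np.mean(-y * x / (1.0 + np.exp(margins)))

def gd_tail(eta, w0=3.0, iters=2000, tail=4):
    w, history = w0, []
    for _ in range(iters):
        w -= eta * grad(w)
        history.append(w)
    return np.round(history[-tail:], 4)

for eta in (0.5, 5.0, 20.0):         # small vs. large constant step sizes
    print(f"step size {eta:5.1f}: last iterates {gd_tail(eta)}")
```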
Kernel methods map data into high-dimensional spaces, enabling linear algorithms to learn nonlinear functions without explicitly storing the feature vectors. Quantum kernel methods promise efficient learning by encoding feature maps into exponentially large Hilbert spaces inherent in quantum systems. In this work we implement quantum kernels on a 10-qubit star-topology register in a nuclear magnetic resonance (NMR) platform. We experimentally encode classical data in the evolution of multiple quantum coherence orders using data-dependent unitary transformations and then demonstrate one-dimensional regression and two-dimensional classification tasks. By extending the register to a double-layered star configuration, we propose an extended quantum kernel to handle non-parametrized operator inputs. By numerically simulating the extended quantum kernel, we show classification of entangling and nonentangling unitaries. These results confirm that quantum kernels exhibit strong capabilities in classical as well as quantum machine learning tasks.
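As a hedged illustration of the pipeline a quantum kernel plugs into, the sketch below runs kernel ridge regression on a toy 1D task; the RBF kernel function is a classical stand-in for the hardware-estimated quantum kernel, and all data and parameters are illustrative.

```python
import numpy as np

# Generic kernel ridge regression pipeline: a quantum kernel would only replace
# the kernel() function below with a fidelity estimated on hardware.
def kernel(x1, x2, gamma=1.0):
    return np.exp(-gamma * np.sum((x1 - x2) ** 2))   # classical RBF stand-in

def gram(X1, X2):
    return np.array([[kernel(a, b) for b in X2] for a in X1])

rng = np.random.default_rng(0)
X_train = rng.uniform(-1, 1, size=(20, 1))           # 1D regression inputs
y_train = np.sin(3 * X_train[:, 0])                  # toy target function

lam = 1e-3                                           # ridge regularization
K = gram(X_train, X_train)
alpha = np.linalg.solve(K + lam * np.eye(len(K)), y_train)

X_test = np.linspace(-1, 1, 5).reshape(-1, 1)
y_pred = gram(X_test, X_train) @ alpha
print(np.round(y_pred, 3))
```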
We introduce the Coarse Payoff-Assessment Learning (CPAL) model, which captures reinforcement learning by boundedly rational decision-makers who focus on the aggregate outcomes of choosing among exogenously defined clusters of alternatives (similarity classes), rather than evaluating each alternative individually. Analyzing a smooth approximation of the model, we show that the learning dynamics exhibit steady-states corresponding to smooth Valuation Equilibria (Jehiel and Samet, 2007). We demonstrate the existence of multiple equilibria in decision trees with generic payoffs and establish the local asymptotic stability of pure equilibria when they occur. Conversely, when trivial choices featuring alternatives within the same similarity class yield sufficiently high payoffs, a unique mixed equilibrium emerges, characterized by indifferences between similarity classes, even under acute sensitivity to payoff differences. Finally, we prove that this unique mixed equilibrium is globally asymptotically stable under the CPAL dynamics.
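As a heavily hedged sketch of the kind of dynamics CPAL describes, the snippet below pairs a logit choice rule over similarity-class valuations with recency-weighted payoff averaging; both ingredients are our assumptions for illustration and not the paper's exact specification.

```python
import numpy as np

# Assumed ingredients: logit choice between similarity classes, and a single
# valuation per class updated toward the realized (aggregate) payoff.
rng = np.random.default_rng(0)
payoffs = {"A": [1.0, 0.2], "B": [0.6, 0.7]}      # alternatives grouped into two similarity classes
V = {"A": 0.0, "B": 0.0}                          # one coarse valuation per class
eta, beta = 0.1, 5.0                              # learning rate and payoff sensitivity

for t in range(5000):
    w = np.array([np.exp(beta * V[c]) for c in ("A", "B")])
    c = ("A", "B")[rng.choice(2, p=w / w.sum())]  # logit choice between classes
    realized = rng.choice(payoffs[c])             # aggregate outcome of the chosen class
    V[c] += eta * (realized - V[c])               # coarse payoff-assessment update
print({k: round(v, 3) for k, v in V.items()})
```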
We consider the computational efficiency of Monte Carlo (MC) and Multilevel Monte Carlo (MLMC) methods applied to partial differential equations with random coefficients. These arise, for example, in groundwater flow modelling, where a commonly used model for the unknown parameter is a random field. We make use of the circulant embedding procedure for sampling from the aforementioned coefficient. To improve the computational complexity of the MLMC estimator in the case of highly oscillatory random fields, we devise and implement a smoothing technique integrated into the circulant embedding method. This allows us to choose the coarsest mesh on the first level of MLMC independently of the correlation length of the covariance function of the random field, leading to considerable savings in computational cost. We illustrate this with numerical experiments, where we see savings by a factor of 5-10 in computational cost for accuracies of practical interest.
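As a hedged illustration of the estimator being accelerated, the sketch below implements the generic MLMC telescoping sum with a placeholder level solver; it does not include the circulant embedding or the smoothing technique of the paper.

```python
import numpy as np

# Generic MLMC skeleton: estimate E[P] via E[P_0] + sum_l E[P_l - P_{l-1}],
# using the same random sample on the coarse and fine level of each correction.
rng = np.random.default_rng(0)

def solve_level(l, omega):
    # Placeholder quantity of interest whose level error decays like 2**(-l);
    # omega plays the role of the sampled random coefficient.
    return np.sin(omega) + 2.0 ** (-l) * np.cos(omega)

def mlmc_estimate(L, samples_per_level):
    total = 0.0
    for l, N in zip(range(L + 1), samples_per_level):
        diffs = []
        for _ in range(N):
            omega = rng.standard_normal()          # coupled coarse/fine sample
            fine = solve_level(l, omega)
            coarse = solve_level(l - 1, omega) if l > 0 else 0.0
            diffs.append(fine - coarse)
        total += np.mean(diffs)
    return total

print(mlmc_estimate(L=4, samples_per_level=[4000, 1000, 250, 60, 15]))
```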
We introduce a novel, data-driven approach for reconstructing temporally coherent 3D motion from unstructured and potentially partial observations of non-rigidly deforming shapes. Our goal is to achieve high-fidelity motion reconstructions for shapes that undergo near-isometric deformations, such as humans wearing loose clothing. The key novelty of our work lies in its ability to combine implicit shape representations with explicit mesh-based deformation models, enabling detailed and temporally coherent motion reconstructions without relying on parametric shape models or decoupling shape and motion. Each frame is represented as a neural field decoded from a feature space where observations over time are fused, hence preserving geometric details present in the input data. Temporal coherence is enforced with a near-isometric deformation constraint between adjacent frames that applies to the underlying surface in the neural field. Our method outperforms state-of-the-art approaches, as demonstrated by its application to human and animal motion sequences reconstructed from monocular depth videos.
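As a hedged illustration of a near-isometric deformation penalty, the snippet below preserves discrete edge lengths between corresponding points of adjacent frames; the paper's constraint acts on the surface underlying the neural field, so this discretization is an assumption made purely for illustration.

```python
import torch

# Penalize changes in edge lengths between two corresponding point sets of
# adjacent frames (a discrete proxy for near-isometry).
def near_isometry_loss(points_t, points_t1, edges):
    # edges: list of (i, j) index pairs defining the discrete connectivity
    i, j = zip(*edges)
    len_t = (points_t[list(i)] - points_t[list(j)]).norm(dim=-1)
    len_t1 = (points_t1[list(i)] - points_t1[list(j)]).norm(dim=-1)
    return ((len_t - len_t1) ** 2).mean()

pts_t = torch.randn(5, 3)
pts_t1 = pts_t + 0.01 * torch.randn(5, 3)        # slightly deformed next frame
print(near_isometry_loss(pts_t, pts_t1, [(0, 1), (1, 2), (2, 3), (3, 4)]))
```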
Bias studies on multilingual models confirm the presence of gender-related stereotypes in masked models processing languages with high NLP resources. We expand on this line of research by introducing Filipino CrowS-Pairs and Filipino WinoQueer: benchmarks that assess both sexist and anti-queer biases in pretrained language models (PLMs) handling texts in Filipino, a low-resource language from the Philippines. The benchmarks consist of 7,074 new challenge pairs resulting from our cultural adaptation of English bias evaluation datasets, a process that we document in detail to guide similar forthcoming efforts. We apply the Filipino benchmarks to masked and causal multilingual models, including those pretrained on Southeast Asian data, and find that they contain considerable amounts of bias. We also find that for multilingual models, the extent of bias learned for a particular language is influenced by how much pretraining data in that language a model was exposed to. Our benchmarks and insights can serve as a foundation for future work analyzing and mitigating bias in multilingual models.
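For context on how such challenge-pair benchmarks are typically scored, here is a minimal sketch of pseudo-log-likelihood scoring with a masked multilingual model; the model name and example pair are illustrative, and the exact scoring protocol of the Filipino benchmarks may differ.

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Minimal sketch of CrowS-Pairs-style pseudo-log-likelihood scoring; the model
# and the sentence pair below are placeholders, not benchmark items.
name = "xlm-roberta-base"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForMaskedLM.from_pretrained(name).eval()

def pseudo_log_likelihood(sentence):
    ids = tok(sentence, return_tensors="pt")["input_ids"][0]
    total = 0.0
    for i in range(1, len(ids) - 1):            # skip special tokens
        masked = ids.clone()
        masked[i] = tok.mask_token_id           # mask one token at a time
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits[0, i]
        total += torch.log_softmax(logits, dim=-1)[ids[i]].item()
    return total

# A model prefers the sentence of a pair to which it assigns the higher score.
print(pseudo_log_likelihood("She is a nurse."), pseudo_log_likelihood("He is a nurse."))
```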
Maximal clique enumeration (MCE) is crucial for tasks like community detection and biological network analysis. Existing algorithms typically adopt the branch-and-bound framework with the vertex-oriented Bron-Kerbosch (BK) branching strategy, which forms the sub-branches by expanding the partial clique with a vertex. In this paper, we present a novel approach called HBBMC, a hybrid framework combining vertex-oriented BK branching and edge-oriented BK branching, where the latter forms the sub-branches by expanding the partial clique with an edge. This hybrid strategy enables more effective pruning and helps achieve a worst-case time complexity better than the best known one under a condition that holds for the majority of real-world graphs. To further enhance efficiency, we introduce an early termination technique, which leverages the topological information of the graph and constructs the maximal cliques directly without branching. Our early termination technique is applicable to all branch-and-bound frameworks. Extensive experiments demonstrate the superior performance of our techniques.
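For reference, the following is the classic vertex-oriented Bron-Kerbosch branching with pivoting that the hybrid framework builds on; it is a textbook sketch, not the paper's HBBMC implementation.

```python
# Vertex-oriented BK branching with pivoting: each sub-branch extends the
# partial clique R with one vertex v, restricting P and X to v's neighbours.
def bron_kerbosch(R, P, X, adj, out):
    if not P and not X:
        out.append(set(R))                 # R is a maximal clique
        return
    pivot = max(P | X, key=lambda u: len(adj[u] & P))
    for v in list(P - adj[pivot]):         # branch only on vertices not covered by the pivot
        bron_kerbosch(R | {v}, P & adj[v], X & adj[v], adj, out)
        P.remove(v)
        X.add(v)

# Toy graph: two triangles sharing an edge -> maximal cliques {0,1,2} and {1,2,3}
adj = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {1, 2}}
cliques = []
bron_kerbosch(set(), set(adj), set(), adj, cliques)
print(cliques)
```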
We construct a polynomial-time classical algorithm that samples from the output distribution of low-depth noisy Clifford circuits with any product-state inputs and final single-qubit measurements in any basis. This class of circuits includes Clifford-magic circuits and Conjugated-Clifford circuits, which are important candidates for demonstrating quantum advantage using non-universal gates. Additionally, our results generalize a simulation algorithm for IQP circuits [Rajakumar et al., SODA'25] to the case of IQP circuits augmented with CNOT gates, which is another class of non-universal circuits that are relevant to current experiments. Importantly, our results do not require randomness assumptions over the circuit families considered (such as anticoncentration properties) and instead hold for every circuit in each class. This allows us to place tight limitations on the robustness of these circuits to noise. In particular, we show that there is no quantum advantage at large depths with realistically noisy Clifford circuits, even with perfect magic state inputs, or IQP circuits with CNOT gates, even with arbitrary diagonal non-Clifford gates. The key insight behind the algorithm is that interspersed noise causes a decay of long-range entanglement, and at depths beyond a critical threshold, the noise builds up to an extent that most correlations can be classically simulated. To prove our results, we merge techniques from percolation theory with tools from Pauli path analysis.
Retrieval-Augmented Generation (RAG) merges retrieval methods with deep learning advancements to address the static limitations of large language models (LLMs) by enabling the dynamic integration of up-to-date external information. This methodology, focusing primarily on the text domain, offers a cost-effective way to mitigate the generation of plausible but incorrect responses by LLMs, thereby enhancing the accuracy and reliability of their outputs through the use of real-world data. As RAG grows in complexity and incorporates multiple concepts that can influence its performance, this paper organizes the RAG paradigm into four categories: pre-retrieval, retrieval, post-retrieval, and generation, offering a detailed perspective from the retrieval viewpoint. It outlines RAG's evolution and discusses the field's progression through the analysis of significant studies. Additionally, the paper introduces evaluation methods for RAG, addressing the challenges faced and proposing future research directions. By offering an organized framework and categorization, the study aims to consolidate existing research on RAG, clarify its technological underpinnings, and highlight its potential to broaden the adaptability and applications of LLMs.
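To make the four-category decomposition concrete, here is a schematic sketch of a RAG pipeline with one placeholder per stage; every component is hypothetical and greatly simplified relative to real systems.

```python
# Schematic of the four stages: pre-retrieval, retrieval, post-retrieval, generation.
def pre_retrieval(query):
    # e.g., query rewriting / expansion before hitting the index
    return query.strip().lower()

def retrieve(query, corpus, k=2):
    # toy lexical retriever: rank documents by term overlap with the query
    return sorted(corpus, key=lambda d: -len(set(query.split()) & set(d.split())))[:k]

def post_retrieval(docs, query):
    # e.g., reranking / filtering; here we simply drop documents with no overlap
    return [d for d in docs if set(query.split()) & set(d.split())]

def generate(query, docs):
    # stand-in for an LLM call conditioned on the retrieved context
    return f"Answer to '{query}' grounded in: {docs}"

corpus = ["grounded answers reduce hallucination", "retrieval adds fresh facts",
          "unrelated note about weather"]
q = pre_retrieval("How does retrieval reduce hallucination?")
print(generate(q, post_retrieval(retrieve(q, corpus), q)))
```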
Graph Neural Networks (GNNs) have received considerable attention on graph-structured data learning for a wide variety of tasks. The propagation mechanism, whose effectiveness has been repeatedly demonstrated, is the most fundamental component of GNNs. Although most GNNs follow a message-passing scheme, little effort has been made to discover and analyze the essential relations among them. In this paper, we establish a surprising connection between different propagation mechanisms and a unified optimization problem, showing that, despite the proliferation of GNN variants, their propagation mechanisms are in fact optimal solutions of an objective that combines a feature fitting function over a wide class of graph kernels with a graph regularization term. Our proposed unified optimization framework, summarizing the commonalities between several of the most representative GNNs, not only provides a macroscopic view for surveying the relations between different GNNs, but also opens up new opportunities for flexibly designing new GNNs. With the proposed framework, we discover that existing works usually utilize naive graph convolutional kernels for the feature fitting function, and we further develop two novel objective functions with adjustable graph kernels that exhibit low-pass or high-pass filtering capabilities, respectively. Moreover, we provide convergence proofs and expressive power comparisons for the proposed models. Extensive experiments on benchmark datasets clearly show that the proposed GNNs not only outperform the state-of-the-art methods but also effectively alleviate over-smoothing, further verifying the feasibility of designing GNNs within our unified optimization framework.
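To make the optimization view concrete, the sketch below iterates a standard special case: propagation as the fixed-point iteration for a feature-fitting objective with Laplacian regularization (the familiar PPNP/APPNP-style update). This is a well-known instance used only for illustration, not the paper's new adjustable kernels.

```python
import numpy as np

# Iterating on  min_Z ||Z - X||_F^2 + alpha * tr(Z^T L Z), with L = I - A_hat,
# yields the update  Z <- (1 - beta) * X + beta * A_hat @ Z,  beta = alpha / (1 + alpha).
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
A_tilde = A + np.eye(4)                          # add self-loops
d = A_tilde.sum(1)
A_hat = A_tilde / np.sqrt(np.outer(d, d))        # D^{-1/2} (A + I) D^{-1/2}

X = np.random.default_rng(0).standard_normal((4, 3))   # node features
alpha = 1.0
beta = alpha / (1.0 + alpha)

Z = X.copy()
for _ in range(50):                              # fixed-point propagation
    Z = (1 - beta) * X + beta * A_hat @ Z
print(np.round(Z, 3))                            # approximates (I + alpha * L)^{-1} X
```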
We introduce a multi-task setup of identifying and classifying entities, relations, and coreference clusters in scientific articles. We create SciERC, a dataset that includes annotations for all three tasks, and develop a unified framework called Scientific Information Extractor (SciIE) with shared span representations. The multi-task setup reduces cascading errors between tasks and leverages cross-sentence relations through coreference links. Experiments show that our multi-task model outperforms previous models in scientific information extraction without using any domain-specific features. We further show that the framework supports construction of a scientific knowledge graph, which we use to analyze information in scientific literature.
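As a hedged illustration of shared span representations, the snippet below builds a span vector by concatenating its endpoint token states; this is a generic convention for span-based models, not the exact SciIE architecture.

```python
import torch

# Build span representations from contextualized token states by concatenating
# the states at each span's start and end positions (a common, generic scheme).
T, H = 10, 8
token_states = torch.randn(T, H)          # contextualized token representations
spans = [(0, 2), (4, 7)]                  # (start, end) token indices, inclusive
span_reprs = torch.stack(
    [torch.cat([token_states[s], token_states[e]]) for s, e in spans]
)
print(span_reprs.shape)                   # (num_spans, 2 * H), shared across tasks
```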