In this work, we initiate the study of Hamiltonian learning for positive temperature bosonic Gaussian states, the quantum generalization of the widely studied problem of learning Gaussian graphical models. We obtain efficient protocols, both in sample and computational complexity, for the task of inferring the parameters of their underlying quadratic Hamiltonian under the assumption of bounded temperature, squeezing, displacement and maximal degree of the interaction graph. Our protocol only requires heterodyne measurements, which are often experimentally feasible, and has a sample complexity that scales logarithmically with the number of modes. Furthermore, we show that it is possible to learn the underlying interaction graph in a similar setting and sample complexity. Taken together, our results put the status of the quantum Hamiltonian learning problem for continuous variable systems in a much more advanced state when compared to spins, where state-of-the-art results are either unavailable or quantitatively inferior to ours. In addition, we use our techniques to obtain the first results on learning Gaussian states in trace distance with a quadratic scaling in precision and polynomial in the number of modes, albeit imposing certain restrictions on the Gaussian states. Our main technical innovations are several continuity bounds for the covariance and Hamiltonian matrix of a Gaussian state, which are of independent interest, combined with what we call the local inversion technique. In essence, the local inversion technique allows us to reliably infer the Hamiltonian of a Gaussian state by only estimating in parallel submatrices of the covariance matrix whose size scales with the desired precision, but not the number of modes. This way we bypass the need to obtain precise global estimates of the covariance matrix, controlling the sample complexity.
We investigate diffusion models to solve the Traveling Salesman Problem. Building on the recent DIFUSCO and T2TCO approaches, we propose IDEQ. IDEQ improves the quality of the solutions by leveraging the constrained structure of the state space of the TSP. Another key component of IDEQ consists in replacing the last stages of DIFUSCO curriculum learning by considering a uniform distribution over the Hamiltonian tours whose orbits by the 2-opt operator converge to the optimal solution as the training objective. Our experiments show that IDEQ improves the state of the art for such neural network based techniques on synthetic instances. More importantly, our experiments show that IDEQ performs very well on the instances of the TSPlib, a reference benchmark in the TSP community: it closely matches the performance of the best heuristics, LKH3, being even able to obtain better solutions than LKH3 on 2 instances of the TSPlib defined on 1577 and 3795 cities. IDEQ obtains 0.3% optimality gap on TSP instances made of 500 cities, and 0.5% on TSP instances with 1000 cities. This sets a new SOTA for neural based methods solving the TSP. Moreover, IDEQ exhibits a lower variance and better scales-up with the number of cities with regards to DIFUSCO and T2TCO.
In this study, we address the central issue of statistical inference for Markov jump processes using discrete time observations. The primary problem at hand is to accurately estimate the infinitesimal generator of a Markov jump process, a critical task in various applications. To tackle this problem, we begin by reviewing established methods for generating sample paths from a Markov jump process conditioned to endpoints, known as Markov bridges. Additionally, we introduce a novel algorithm grounded in the concept of time-reversal, which serves as our main contribution. Our proposed method is then employed to estimate the infinitesimal generator of a Markov jump process. To achieve this, we use a combination of Markov Chain Monte Carlo techniques and the Monte Carlo Expectation-Maximization algorithm. The results obtained from our approach demonstrate its effectiveness in providing accurate parameter estimates. To assess the efficacy of our proposed method, we conduct a comprehensive comparative analysis with existing techniques (Bisection, Uniformization, Direct, Rejection, and Modified Rejection), taking into consideration both speed and accuracy. Notably, our method stands out as the fastest among the alternatives while maintaining high levels of precision.
We design and investigate a variety of multigrid solvers for high-order local discontinuous Galerkin methods applied to elliptic interface and multiphase Stokes problems. Using the template of a standard multigrid V-cycle, we consider a variety of element-wise block smoothers, including Jacobi, multi-coloured Gauss-Seidel, processor-block Gauss-Seidel, and with special interest, smoothers based on sparse approximate inverse (SAI) methods. In particular, we develop SAI methods that: (i) balance the smoothing of velocity and pressure variables in Stokes problems; and (ii) robustly handles high-contrast viscosity coefficients in multiphase problems. Across a broad range of two- and three-dimensional test cases, including Poisson, elliptic interface, steady-state Stokes, and unsteady Stokes problems, we examine a multitude of multigrid smoother and solver combinations. In every case, there is at least one approach that matches the performance of classical geometric multigrid algorithms, e.g., 4 to 8 iterations to reduce the residual by 10 orders of magnitude. We also discuss their relative merits with regard to simplicity, robustness, computational cost, and parallelisation.
Reinforcement learning (RL) algorithms aim to balance exploiting the current best strategy with exploring new options that could lead to higher rewards. Most common RL algorithms use undirected exploration, i.e., select random sequences of actions. Exploration can also be directed using intrinsic rewards, such as curiosity or model epistemic uncertainty. However, effectively balancing task and intrinsic rewards is challenging and often task-dependent. In this work, we introduce a framework, MaxInfoRL, for balancing intrinsic and extrinsic exploration. MaxInfoRL steers exploration towards informative transitions, by maximizing intrinsic rewards such as the information gain about the underlying task. When combined with Boltzmann exploration, this approach naturally trades off maximization of the value function with that of the entropy over states, rewards, and actions. We show that our approach achieves sublinear regret in the simplified setting of multi-armed bandits. We then apply this general formulation to a variety of off-policy model-free RL methods for continuous state-action spaces, yielding novel algorithms that achieve superior performance across hard exploration problems and complex scenarios such as visual control tasks.
This study analyses the publication activity and migration patterns of Ukrainian scholars in the social sciences and humanities (SSH) during the initial two years of the Russo-Ukrainian war. Focusing on scholars who published at least three papers, the study underscores the resilience of these scholars, who continued their academic endeavours within their homeland despite the conflict. The research utilizes data from the Social Sciences Citation Index (SSCI) and the Arts & Humanities Citation Index (AHCI) to illustrate their continued scientific contributions under adverse conditions. It also highlights the crucial role of international collaboration in supporting Ukrainian SSH research, emphasizing that such collaborations primarily manifest through joint research projects rather than relocation of scholars to foreign institutions.
This study presents a novel representation learning model tailored for dynamic networks, which describes the continuously evolving relationships among individuals within a population. The problem is encapsulated in the dimension reduction topic of functional data analysis. With dynamic networks represented as matrix-valued functions, our objective is to map this functional data into a set of vector-valued functions in a lower-dimensional learning space. This space, defined as a metric functional space, allows for the calculation of norms and inner products. By constructing this learning space, we address (i) attribute learning, (ii) community detection, and (iii) link prediction and recovery of individual nodes in the dynamic network. Our model also accommodates asymmetric low-dimensional representations, enabling the separate study of nodes' regulatory and receiving roles. Crucially, the learning method accounts for the time-dependency of networks, ensuring that representations are continuous over time. The functional learning space we define naturally spans the time frame of the dynamic networks, facilitating both the inference of network links at specific time points and the reconstruction of the entire network structure without direct observation. We validated our approach through simulation studies and real-world applications. In simulations, we compared our methods link prediction performance to existing approaches under various data corruption scenarios. For real-world applications, we examined a dynamic social network replicated across six ant populations, demonstrating that our low-dimensional learning space effectively captures interactions, roles of individual ants, and the social evolution of the network. Our findings align with existing knowledge of ant colony behavior.
Eye movements provide a window into human behaviour, attention, and interaction dynamics. Challenges in real-world, multi-person environments have, however, restrained eye-tracking research predominantly to single-person, in-lab settings. We developed a system to stream, record, and analyse synchronised data from multiple mobile eye-tracking devices during collective viewing experiences (e.g., concerts, films, lectures). We implemented lightweight operator interfaces for real-time-monitoring, remote-troubleshooting, and gaze-projection from individual egocentric perspectives to a common coordinate space for shared gaze analysis. We tested the system in a live concert and a film screening with 30 simultaneous viewers during each of two public events (N=60). We observe precise time-synchronisation between devices measured through recorded clock-offsets, and accurate gaze-projection in challenging dynamic scenes. Our novel analysis metrics and visualizations illustrate the potential of collective eye-tracking data for understanding collaborative behaviour and social interaction. This advancement promotes ecological validity in eye-tracking research and paves the way for innovative interactive tools.
This dissertation studies a fundamental open challenge in deep learning theory: why do deep networks generalize well even while being overparameterized, unregularized and fitting the training data to zero error? In the first part of the thesis, we will empirically study how training deep networks via stochastic gradient descent implicitly controls the networks' capacity. Subsequently, to show how this leads to better generalization, we will derive {\em data-dependent} {\em uniform-convergence-based} generalization bounds with improved dependencies on the parameter count. Uniform convergence has in fact been the most widely used tool in deep learning literature, thanks to its simplicity and generality. Given its popularity, in this thesis, we will also take a step back to identify the fundamental limits of uniform convergence as a tool to explain generalization. In particular, we will show that in some example overparameterized settings, {\em any} uniform convergence bound will provide only a vacuous generalization bound. With this realization in mind, in the last part of the thesis, we will change course and introduce an {\em empirical} technique to estimate generalization using unlabeled data. Our technique does not rely on any notion of uniform-convergece-based complexity and is remarkably precise. We will theoretically show why our technique enjoys such precision. We will conclude by discussing how future work could explore novel ways to incorporate distributional assumptions in generalization bounds (such as in the form of unlabeled data) and explore other tools to derive bounds, perhaps by modifying uniform convergence or by developing completely new tools altogether.
The remarkable practical success of deep learning has revealed some major surprises from a theoretical perspective. In particular, simple gradient methods easily find near-optimal solutions to non-convex optimization problems, and despite giving a near-perfect fit to training data without any explicit effort to control model complexity, these methods exhibit excellent predictive accuracy. We conjecture that specific principles underlie these phenomena: that overparametrization allows gradient methods to find interpolating solutions, that these methods implicitly impose regularization, and that overparametrization leads to benign overfitting. We survey recent theoretical progress that provides examples illustrating these principles in simpler settings. We first review classical uniform convergence results and why they fall short of explaining aspects of the behavior of deep learning methods. We give examples of implicit regularization in simple settings, where gradient methods lead to minimal norm functions that perfectly fit the training data. Then we review prediction methods that exhibit benign overfitting, focusing on regression problems with quadratic loss. For these methods, we can decompose the prediction rule into a simple component that is useful for prediction and a spiky component that is useful for overfitting but, in a favorable setting, does not harm prediction accuracy. We focus specifically on the linear regime for neural networks, where the network can be approximated by a linear model. In this regime, we demonstrate the success of gradient flow, and we consider benign overfitting with two-layer networks, giving an exact asymptotic analysis that precisely demonstrates the impact of overparametrization. We conclude by highlighting the key challenges that arise in extending these insights to realistic deep learning settings.
Deep learning constitutes a recent, modern technique for image processing and data analysis, with promising results and large potential. As deep learning has been successfully applied in various domains, it has recently entered also the domain of agriculture. In this paper, we perform a survey of 40 research efforts that employ deep learning techniques, applied to various agricultural and food production challenges. We examine the particular agricultural problems under study, the specific models and frameworks employed, the sources, nature and pre-processing of data used, and the overall performance achieved according to the metrics used at each work under study. Moreover, we study comparisons of deep learning with other existing popular techniques, in respect to differences in classification or regression performance. Our findings indicate that deep learning provides high accuracy, outperforming existing commonly used image processing techniques.