Osteoarthritis is a degenerative condition affecting bones and cartilage, often leading to osteophyte formation, bone density loss, and joint space narrowing. Treatment options to restore normal joint function vary depending on the severity of the condition. This work introduces an innovative deep-learning framework for processing shoulder CT scans. It features semantic segmentation of the proximal humerus and scapula, 3D reconstruction of the bone surfaces, identification of the glenohumeral (GH) joint region, and staging of three common osteoarthritis-related pathologies: osteophyte formation (OS), GH space reduction (JS), and humeroscapular alignment (HSA). The pipeline comprises two cascaded CNN architectures: 3D CEL-UNet for segmentation and 3D Arthro-Net for threefold classification. A retrospective dataset of 571 CT scans from patients with various degrees of GH osteoarthritis-related pathologies was used to train, validate, and test the pipeline. Median root mean squared error and Hausdorff distance values for the 3D reconstruction were 0.22 mm and 1.48 mm for the humerus and 0.24 mm and 1.48 mm for the scapula, outperforming state-of-the-art architectures and making the framework potentially suitable for preoperative planning of PSI-based shoulder arthroplasty. The classification accuracy for OS, JS, and HSA consistently reached about 90% across all three categories. The computational time for the inference pipeline was less than 15 s, demonstrating the framework's efficiency and its compatibility with orthopedic radiology practice. These outcomes represent a promising step toward the medical translation of artificial intelligence tools, with the aim of streamlining the preoperative planning pipeline by delivering high-quality bone surfaces and supporting surgeons in selecting the most suitable surgical approach for each patient's joint condition.
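To make the cascaded design concrete, the following is a minimal inference sketch in PyTorch, assuming hypothetical module handles for the two networks and a three-headed classifier output; it illustrates the segmentation-then-classification flow only and is not the authors' released implementation.

```python
# Hypothetical sketch of the two-stage inference flow: 3D segmentation of the
# proximal humerus and scapula, followed by threefold staging of the cropped
# glenohumeral region. seg_net and cls_net stand in for 3D CEL-UNet and
# 3D Arthro-Net; their interfaces are assumptions for illustration.
import torch
import torch.nn.functional as F

def run_pipeline(ct_volume: torch.Tensor, seg_net: torch.nn.Module,
                 cls_net: torch.nn.Module) -> dict:
    """ct_volume: (1, 1, D, H, W) normalized CT scan."""
    with torch.no_grad():
        # Stage 1: semantic segmentation (0 = background, 1 = humerus, 2 = scapula).
        labels = seg_net(ct_volume).argmax(dim=1)

        # Crop a region of interest around the glenohumeral joint, here crudely
        # approximated by the bounding box of all bone voxels.
        idx = (labels[0] > 0).nonzero()
        lo, hi = idx.min(dim=0).values, idx.max(dim=0).values
        roi = ct_volume[..., lo[0]:hi[0] + 1, lo[1]:hi[1] + 1, lo[2]:hi[2] + 1]
        roi = F.interpolate(roi, size=(64, 64, 64))

        # Stage 2: staging of osteophytes (OS), joint space (JS), and
        # humeroscapular alignment (HSA) from the ROI (assumed three-headed output).
        os_logits, js_logits, hsa_logits = cls_net(roi)
    return {"labels": labels, "OS": int(os_logits.argmax(-1)),
            "JS": int(js_logits.argmax(-1)), "HSA": int(hsa_logits.argmax(-1))}
```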
We propose a framework to learn the time-dependent Hartree-Fock (TDHF) inter-electronic potential of a molecule from its electron density dynamics. Though the entire TDHF Hamiltonian, including the inter-electronic potential, can be computed from first principles, we use this problem as a testbed to develop strategies that can be applied to learn a priori unknown terms arising in other approaches to quantum dynamics, e.g., the emerging problem of learning exchange-correlation potentials for time-dependent density functional theory. We develop, train, and test three models of the TDHF inter-electronic potential, each parameterized by a four-index tensor of size up to $60 \times 60 \times 60 \times 60$. Two of the models preserve Hermitian symmetry, while one model preserves an eight-fold permutation symmetry that implies Hermitian symmetry. Across seven different molecular systems, we find that accounting for the deeper eight-fold symmetry leads to the best-performing model across three metrics: training efficiency, test set predictive power, and direct comparison of the true and learned inter-electronic potentials. All three models, when trained on ensembles of field-free trajectories, generate accurate electron dynamics predictions even in a field-on regime that lies outside the training set. To enable our models to scale to large molecular systems, we derive expressions for Jacobian-vector products that enable iterative, matrix-free training.
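For intuition, the eight-fold permutation symmetry referenced above (the symmetry of real two-electron integrals) can be imposed on a raw four-index tensor by averaging over the symmetry group; the NumPy sketch below is an illustrative construction under an assumed index convention, not the paper's actual parameterization.

```python
# Sketch: project a raw four-index parameter tensor onto the eight-fold
# permutation symmetry of real two-electron integrals by averaging over the
# symmetry group. The index convention is assumed for illustration.
import numpy as np

def symmetrize_8fold(V: np.ndarray) -> np.ndarray:
    """V has shape (n, n, n, n); returns its eight-fold symmetric part."""
    perms = [(0, 1, 2, 3), (1, 0, 2, 3), (0, 1, 3, 2), (1, 0, 3, 2),
             (2, 3, 0, 1), (3, 2, 0, 1), (2, 3, 1, 0), (3, 2, 1, 0)]
    return sum(np.transpose(V, p) for p in perms) / len(perms)

n = 8                            # toy size; the models above use up to n = 60
V = symmetrize_8fold(np.random.default_rng(0).standard_normal((n, n, n, n)))
# Invariance under any group element, e.g. swapping the first index pair:
assert np.allclose(V, V.transpose(1, 0, 2, 3))
```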
Collecting large quantities of high-quality data can be prohibitively expensive or impractical, and constitutes a bottleneck in machine learning. One may instead augment a small set of $n$ data points from the target distribution with data from more accessible sources, e.g., data collected under different circumstances or synthesized by generative models. We refer to such data as `surrogate data'. We study a weighted empirical risk minimization (ERM) approach for integrating surrogate data into training. We analyze this method mathematically under several classical statistical models and validate our findings empirically on datasets from different domains. Our main findings are: $(i)$ Integrating surrogate data can significantly reduce the test error on the original distribution. Surprisingly, this can happen even when the surrogate data are unrelated to the original data. We trace this behavior back to the classical Stein's paradox. $(ii)$ In order to reap the benefit of surrogate data, it is crucial to use optimally weighted ERM. $(iii)$ The test error of models trained on mixtures of real and surrogate data is approximately described by a scaling law. This scaling law can be used to predict the optimal weighting scheme and to choose the amount of surrogate data to add.
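A minimal sketch of the weighted-ERM recipe, assuming a ridge-regression model and a hypothetical surrogate weight chosen on a held-out split; the paper's optimal weighting is model- and regime-specific.

```python
# Minimal sketch of weighted ERM with surrogate data: real points get weight 1,
# surrogate points get weight surr_weight, chosen here on a held-out split.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

def fit_weighted_erm(X_real, y_real, X_surr, y_surr, surr_weight):
    X = np.vstack([X_real, X_surr])
    y = np.concatenate([y_real, y_surr])
    w = np.concatenate([np.ones(len(y_real)), surr_weight * np.ones(len(y_surr))])
    return Ridge(alpha=1e-3).fit(X, y, sample_weight=w)

def pick_weight(X_real, y_real, X_surr, y_surr, X_val, y_val,
                grid=np.linspace(0.0, 1.0, 11)):
    errs = [mean_squared_error(
                y_val,
                fit_weighted_erm(X_real, y_real, X_surr, y_surr, a).predict(X_val))
            for a in grid]
    return grid[int(np.argmin(errs))]
```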
The development of mechanistic models of physical systems is essential for understanding their behavior and formulating predictions that can be validated experimentally. Calibration of these models, especially for complex systems, requires automated optimization methods because manual parameter tuning is impractical. In this study, we use an autoencoder to automatically extract relevant features from the membrane trace of a complex neuron model emulated on the BrainScaleS-2 neuromorphic system, and subsequently leverage sequential neural posterior estimation (SNPE), a simulation-based inference algorithm, to approximate the posterior distribution of the neuron parameters. Our results demonstrate that the autoencoder extracts essential features from the observed membrane traces, from which the SNPE algorithm finds an approximation of the posterior distribution. This suggests that the combination of an autoencoder with the SNPE algorithm is a promising optimization method for complex systems.
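As a rough illustration of the feature-extraction step, a small one-dimensional convolutional autoencoder can compress a membrane trace into a low-dimensional summary vector that is then handed to an SNPE routine; the architecture and sizes below are assumptions for illustration, not the configuration used on BrainScaleS-2.

```python
# Sketch: compress a membrane voltage trace into a low-dimensional feature
# vector with an autoencoder; the latent code serves as the summary statistic
# passed to a simulation-based inference routine such as SNPE.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TraceAutoencoder(nn.Module):
    def __init__(self, trace_len=1024, latent_dim=16):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=9, stride=4, padding=4), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=9, stride=4, padding=4), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * (trace_len // 16), latent_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, trace_len), nn.Unflatten(1, (1, trace_len)),
        )

    def forward(self, x):                    # x: (batch, 1, trace_len)
        z = self.encoder(x)                  # latent summary statistics
        return self.decoder(z), z

model = TraceAutoencoder()
trace = torch.randn(8, 1, 1024)              # toy membrane traces
recon, features = model(trace)
loss = F.mse_loss(recon, trace)              # reconstruction objective
```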
Many partial differential equations (PDEs), such as the Navier--Stokes equations in fluid mechanics, inelastic deformation in solids, and transient parabolic and hyperbolic equations, do not have an exact, primal variational structure. Recently, a variational principle based on the dual (Lagrange multiplier) field was proposed. The essential idea in this approach is to treat the given PDE as a constraint and to invoke an arbitrarily chosen auxiliary potential with strong convexity properties that is then optimized. This leads to a convex dual functional that is minimized subject to Dirichlet boundary conditions on the dual variables, with the guarantee that even PDEs that do not possess a variational structure in primal form can be solved via a variational principle. The vanishing of the first variation of the dual functional is, up to Dirichlet boundary conditions on the dual fields, the weak form of the primal PDE problem with the dual-to-primal change of variables incorporated. We derive the dual weak form for the linear, one-dimensional, transient convection-diffusion equation. A Galerkin discretization is used to obtain the discrete equations, with the trial and test functions chosen as linear combinations of either RePU activation functions (shallow neural network) or B-spline basis functions; the corresponding stiffness matrix is symmetric. For transient problems, a space-time Galerkin implementation is used with tensor-product B-splines as approximating functions. Numerical results are presented for the steady-state and transient convection-diffusion equations and for transient heat conduction. The proposed method delivers sound accuracy for ODEs and PDEs, and rates of convergence are established in the $L^2$ norm and $H^1$ seminorm for the steady-state convection-diffusion problem.
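For reference, the primal PDE treated in the steady-state and transient examples is the linear convection-diffusion equation; a standard one-dimensional statement (with generic, assumed symbols $a$ for the convective velocity, $\kappa$ for the diffusivity, and $f$ for the source) is

$$\frac{\partial u}{\partial t} + a\,\frac{\partial u}{\partial x} - \kappa\,\frac{\partial^2 u}{\partial x^2} = f(x,t) \quad \text{in } (0,L)\times(0,T],$$

which, in the dual approach sketched above, is imposed as a constraint while a strongly convex auxiliary potential of the dual (Lagrange multiplier) field is minimized.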
A time-varying bivariate copula joint model, which jointly models the repeatedly measured longitudinal outcome at each time point and the survival data through both random effects and time-varying bivariate copulas, is proposed in this paper. A regular joint model typically supposes that there exist subject-specific latent random effects or classes shared by the longitudinal and time-to-event processes, and that the two processes are conditionally independent given these latent variables. Under this assumption, the joint likelihood of the two processes is straightforward to derive, and both their association and the heterogeneity among the population are naturally introduced by the unobservable latent variables. However, because of the unobservable nature of these latent variables, the conditional independence assumption is difficult to verify. Therefore, besides the random effects, a time-varying bivariate copula is introduced to account for the extra time-dependent association between the two processes. The proposed model includes the regular joint model as a special case under some copulas. Simulation studies indicate that the parameter estimators in the proposed model are robust against copula misspecification and that the model has superior performance in predicting survival probabilities compared with the regular joint model. A real-data application to the primary biliary cirrhosis (PBC) data is presented.
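As a schematic of the copula construction (the notation here is assumed for illustration), Sklar's theorem allows the joint distribution of the longitudinal outcome $Y(t)$ at time $t$ and the event time $T$, conditional on the random effects $b$, to be written as

$$F_{Y(t),\,T \mid b}(y, s) = C_{\theta(t)}\!\left(F_{Y(t)\mid b}(y),\, F_{T\mid b}(s)\right),$$

where $C_{\theta(t)}$ is a bivariate copula whose dependence parameter $\theta(t)$ varies over time; when $C_{\theta(t)}$ is the independence copula, the construction collapses to the conditional-independence assumption of the regular joint model.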
Liquid composite moulding is an important manufacturing technology for fibre-reinforced composites, due to its cost-effectiveness. Challenges lie in the optimisation of the process, owing to the limited understanding of a key characteristic of textile fabrics: permeability. The problem of computing the permeability coefficient can be modelled by the well-known Stokes-Brinkman equation, which introduces a heterogeneous parameter $\beta$ distinguishing macropore regions from fibre-bundle regions. In the present work, we train a Fourier neural operator to learn the nonlinear map from the heterogeneous coefficient $\beta$ to the velocity field $u$, and recover the corresponding macroscopic permeability $K$. This is a challenging problem, since both the input and output fields span several orders of magnitude; we therefore introduce different regularization techniques for the loss function and perform a quantitative comparison between them.
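For completeness, a common statement of the Stokes-Brinkman model referred to above (notation assumed here) is

$$-\mu \nabla^2 \mathbf{u} + \beta\,\mathbf{u} + \nabla p = \mathbf{f}, \qquad \nabla\cdot\mathbf{u} = 0,$$

with $\beta \approx 0$ in the macropore regions (recovering Stokes flow) and $\beta = \mu/K_{\mathrm{bundle}}$ inside the fibre bundles (recovering a Darcy-type resistance); the macroscopic permeability $K$ is then obtained from the volume-averaged velocity via Darcy's law.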
Reconstructing the physical complexity of many-body dynamical systems can be challenging. Starting from the trajectories of their constitutive units (raw data), typical approaches require selecting appropriate descriptors to convert them into time-series, which are then analyzed to extract interpretable information. However, identifying the most effective descriptor is often non-trivial. Here, we report a data-driven approach to compare the efficiency of various descriptors in extracting information from noisy trajectories and translating it into physically relevant insights. As a prototypical system with non-trivial internal complexity, we analyze molecular dynamics trajectories of an atomistic system where ice and water coexist in equilibrium near the solid/liquid transition temperature. We compare general and specific descriptors often used in aqueous systems: number of neighbors, molecular velocities, Smooth Overlap of Atomic Positions (SOAP), Local Environments and Neighbors Shuffling (LENS), Orientational Tetrahedral Order, and distance from the fifth neighbor ($d_5$). Using Onion Clustering -- an efficient unsupervised method for single-point time-series analysis -- we assess the maximum extractable information for each descriptor and rank them via a high-dimensional metric. Our results show that advanced descriptors like SOAP and LENS outperform classical ones due to higher signal-to-noise ratios. Nonetheless, even simple descriptors can rival or exceed advanced ones after local signal denoising. For example, $d_5$, initially among the weakest, becomes the most effective at resolving the system's non-local dynamical complexity after denoising. This work highlights the critical role of noise in information extraction from molecular trajectories and offers a data-driven approach to identify optimal descriptors for systems with characteristic internal complexity.
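As an example of the simplest descriptor family considered, the distance from the fifth neighbor ($d_5$) can be computed per molecule directly from the oxygen coordinates with a k-d tree; the sketch below ignores periodic boundary conditions for brevity and is only illustrative.

```python
# Sketch: per-atom distance to the fifth nearest neighbor (d_5), a simple
# one-dimensional descriptor of the local environment. Periodic boundary
# conditions are ignored here for brevity.
import numpy as np
from scipy.spatial import cKDTree

def d5_descriptor(positions: np.ndarray) -> np.ndarray:
    """positions: (N, 3) array of oxygen coordinates for one frame."""
    tree = cKDTree(positions)
    # k=6 because the closest "neighbor" returned is the atom itself (distance 0).
    dists, _ = tree.query(positions, k=6)
    return dists[:, 5]

frame = np.random.rand(500, 3) * 25.0   # toy coordinates in a 25 A box
d5 = d5_descriptor(frame)               # one scalar time-series value per atom
```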
Block majorization-minimization (BMM) is a simple iterative algorithm for constrained nonconvex optimization that sequentially minimizes majorizing surrogates of the objective function in each block while the others are held fixed. BMM encompasses a large class of optimization algorithms, such as block coordinate descent and its proximal-point variant, expectation-maximization, and block projected gradient descent. We first establish that, for general constrained nonsmooth nonconvex optimization, BMM with $\rho$-strongly convex and $L_g$-smooth surrogates can produce an $\epsilon$-approximate first-order optimal point within $\widetilde{O}((1+L_g+\rho^{-1})\epsilon^{-2})$ iterations and asymptotically converges to the set of first-order optimal points. Next, we show that BMM combined with trust-region methods with diminishing radius has an improved complexity of $\widetilde{O}((1+L_g) \epsilon^{-2})$, independent of the inverse strong convexity parameter $\rho^{-1}$, allowing improved theoretical and practical performance with `flat' surrogates. Our results hold robustly even when the convex sub-problems are solved only inexactly, as long as the optimality gaps are summable. Central to our analysis is a novel continuous first-order optimality measure, by which we bound the worst-case sub-optimality in each iteration by the first-order improvement the algorithm makes. We apply our general framework to obtain new results on various algorithms, such as the celebrated multiplicative update algorithm of Lee and Seung for nonnegative matrix factorization, regularized nonnegative tensor decomposition, and the classical block projected gradient descent algorithm. Lastly, we numerically demonstrate that the additional use of a diminishing radius can improve the convergence rate of BMM in many instances.
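For concreteness, the Lee-Seung multiplicative updates mentioned above alternate surrogate minimizations over the two factor blocks of $V \approx WH$ under the Frobenius objective; a minimal NumPy sketch (without the diminishing-radius trust-region modification analyzed here) follows.

```python
# Sketch: classical Lee-Seung multiplicative updates for nonnegative matrix
# factorization V ~ W H under the Frobenius loss, an instance of block
# majorization-minimization (each update minimizes a majorizing surrogate
# in one block while the other block is held fixed).
import numpy as np

def nmf_multiplicative(V, rank, n_iter=200, eps=1e-10, seed=0):
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, rank))
    H = rng.random((rank, n))
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # surrogate minimization in H
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # surrogate minimization in W
    return W, H

V = np.abs(np.random.default_rng(1).standard_normal((60, 40)))
W, H = nmf_multiplicative(V, rank=5)
print(np.linalg.norm(V - W @ H) / np.linalg.norm(V))   # relative residual
```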
Modern large language model (LLM) alignment techniques rely on human feedback, but it is unclear whether these techniques fundamentally limit the capabilities of aligned LLMs. In particular, it is unknown whether it is possible to align (stronger) LLMs with superhuman capabilities using (weaker) human feedback without degrading their capabilities. This is an instance of the weak-to-strong generalization problem: using feedback from a weaker (less capable) model to train a stronger (more capable) model. We prove that weak-to-strong generalization is possible by eliciting latent knowledge from pre-trained LLMs. In particular, we cast the weak-to-strong generalization problem as a transfer learning problem in which we wish to transfer a latent concept prior from a weak model to a strong pre-trained model. We prove that a naive fine-tuning approach suffers from fundamental limitations, but an alternative refinement-based approach suggested by the problem structure provably overcomes the limitations of fine-tuning. Finally, we demonstrate the practical applicability of the refinement approach in multiple LLM alignment tasks.
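To make the setting concrete, the toy sketch below illustrates the naive weak-to-strong baseline discussed above, in which a weaker model's predictions supervise a stronger model; it uses placeholder scikit-learn models rather than LLMs and does not reproduce the paper's refinement-based approach.

```python
# Toy sketch of the weak-to-strong setup: a weak model's predictions serve as
# labels for training a stronger model (the "naive fine-tuning" baseline whose
# limitations are analyzed above). Models and data are placeholders.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=4000, n_features=20, random_state=0)
X_weak_train, y_weak_train = X[:200], y[:200]      # scarce "human" supervision
X_unlabeled = X[200:3000]                          # unlabeled data for the strong model
X_test, y_test = X[3000:], y[3000:]

weak = LogisticRegression().fit(X_weak_train, y_weak_train)
weak_labels = weak.predict(X_unlabeled)            # imperfect weak supervision

strong = MLPClassifier(hidden_layer_sizes=(256, 256), max_iter=300, random_state=0)
strong.fit(X_unlabeled, weak_labels)               # naive weak-to-strong training

print("weak acc:  ", weak.score(X_test, y_test))
print("strong acc:", strong.score(X_test, y_test))
```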
Traditional electrostatic simulations are mesh-based methods that convert partial differential equations into algebraic systems of equations whose solutions are approximated numerically. These methods are time-consuming, and any change in the initial or boundary conditions requires solving the numerical problem again. Newer computational methods, such as physics-informed neural networks (PINNs), similarly require re-training when the boundary conditions change. In this work, we propose an end-to-end deep learning approach that models parameter changes to the boundary conditions. The proposed method is demonstrated on the test problem of a long air-filled capacitor structure. The proposed approach is compared to a plain vanilla deep neural network (NN) and a PINN. We show that our method can significantly outperform both the NN and the PINN under dynamic boundary conditions, while retaining its full capability as a forward model.
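A minimal sketch, under assumed names and dimensions, of the kind of parameterized surrogate described above: a feed-forward network takes the spatial coordinates together with the boundary-condition parameters (e.g., plate voltages) as inputs, so changing the boundary values at inference time requires no retraining.

```python
# Sketch: a forward surrogate that conditions on boundary-condition parameters.
# Input = (x, y, V_top, V_bottom); output = electrostatic potential at (x, y).
# Because the boundary voltages are network inputs, changing them at inference
# time needs no retraining. Names and sizes are illustrative assumptions.
import torch
import torch.nn as nn

class ParametricFieldNet(nn.Module):
    def __init__(self, n_bc_params=2, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 + n_bc_params, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, xy, bc_params):
        return self.net(torch.cat([xy, bc_params], dim=-1))

model = ParametricFieldNet()
xy = torch.rand(1024, 2)                          # query points in the domain
bc = torch.tensor([[1.0, 0.0]]).expand(1024, 2)   # e.g., top/bottom plate voltages
phi = model(xy, bc)                               # predicted potential, shape (1024, 1)
```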