This paper investigates the impact of multiscale data on machine learning algorithms, particularly in the context of deep learning. A dataset is multiscale if its distribution shows large variations in scale across different directions. This paper reveals multiscale structures in the loss landscape, including its gradients and Hessians inherited from the data. Correspondingly, it introduces a novel gradient descent approach, drawing inspiration from multiscale algorithms used in scientific computing. This approach seeks to transcend empirical learning rate selection, offering a more systematic, data-informed strategy to enhance training efficiency, especially in the later stages.
We propose a new Riemannian gradient descent method for computing spherical area-preserving mappings of topological spheres using a Riemannian retraction-based framework with theoretically guaranteed convergence. The objective function is based on the stretch energy functional, and the minimization is constrained on a power manifold of unit spheres embedded in 3-dimensional Euclidean space. Numerical experiments on several mesh models demonstrate the accuracy and stability of the proposed framework. Comparisons with two existing state-of-the-art methods for computing area-preserving mappings demonstrate that our algorithm is both competitive and more efficient. Finally, we present a concrete application to the problem of landmark-aligned surface registration of two brain models.
This paper focuses on the inverse elastic impedance and the geometry problem by a Cauchy data pair on the access part of the boundary in a two-dimensional case. Through the decomposition of the displacement, the problem is transform the solution of into a coupled boundary value problem that involves two scalar Helmholtz equations. Firstly, a uniqueness result is given, and a non-iterative algorithm is proposed to solve the data completion problem using a Cauchy data pair on a known part of the solution domain's boundary. Next, we introduce a Newton-type iterative method for reconstructing the boundary and the impedance function using the completion data on the unknown boundary, which is governed by a specific type of boundary conditions. Finally, we provide several examples to demonstrate the effectiveness and accuracy of the proposed method.
Evaluating the expected information gain (EIG) is a critical task in many areas of computational science and statistics, necessitating the approximation of nested integrals. Available techniques for this problem based on quasi-Monte Carlo (QMC) methods have focused on enhancing the efficiency of either the inner or outer integral approximation. In this work, we introduce a novel approach that extends the scope of these efforts to address inner and outer expectations simultaneously. Leveraging the principles of Owen's scrambling of digital nets, we develop a randomized QMC (rQMC) method that improves the convergence behavior of the approximation of nested integrals. We also indicate how to combine this methodology with importance sampling to address a measure concentration arising in the inner integral. Our method capitalizes on the unique structure of nested expectations to offer a more efficient approximation mechanism. By incorporating Owen's scrambling techniques, we handle integrands exhibiting infinite variation in the Hardy--Krause sense, paving the way for theoretically sound error estimates. As the main contribution of this work, we derive asymptotic error bounds for the bias and variance of our estimator, along with regularity conditions under which these error bounds can be attained. In addition, we provide nearly optimal sample sizes for the rQMC approximations, which are helpful for the actual numerical implementations. Moreover, we verify the quality of our estimator through numerical experiments in the context of EIG estimation. Specifically, we compare the computational efficiency of our rQMC method against standard nested MC integration across two case studies: one in thermo-mechanics and the other in pharmacokinetics. These examples highlight our approach's computational savings and enhanced applicability.
In recent years, power analysis has become widely used in applied sciences, with the increasing importance of the replicability issue. When distribution-free methods, such as Partial Least Squares (PLS)-based approaches, are considered, formulating power analysis turns out to be challenging. In this study, we introduce the methodological framework of a new procedure for performing power analysis when PLS-based methods are used. Data are simulated by the Monte Carlo method, assuming the null hypothesis of no effect is false and exploiting the latent structure estimated by PLS in the pilot data. In this way, the complex correlation data structure is explicitly considered in power analysis and sample size estimation. The paper offers insights into selecting statistical tests for the power analysis procedure, comparing accuracy-based tests and those based on continuous parameters estimated by PLS. Simulated and real datasets are investigated to show how the method works in practice.
We consider the general class of time-homogeneous stochastic dynamical systems, both discrete and continuous, and study the problem of learning a representation of the state that faithfully captures its dynamics. This is instrumental to learning the transfer operator or the generator of the system, which in turn can be used for numerous tasks, such as forecasting and interpreting the system dynamics. We show that the search for a good representation can be cast as an optimization problem over neural networks. Our approach is supported by recent results in statistical learning theory, highlighting the role of approximation error and metric distortion in the learning problem. The objective function we propose is associated with projection operators from the representation space to the data space, overcomes metric distortion, and can be empirically estimated from data. In the discrete-time setting, we further derive a relaxed objective function that is differentiable and numerically well-conditioned. We compare our method against state-of-the-art approaches on different datasets, showing better performance across the board.
The recent paper (IEEE Trans. IT 69, 1680) introduced an analytical method for calculating the channel capacity without the need for iteration. This method has certain limitations that restrict its applicability. Furthermore, the paper does not provide an explanation as to why the channel capacity can be solved analytically in this particular case. In order to broaden the scope of this method and address its limitations, we turn our attention to the reverse em-problem, proposed by Toyota (Information Geometry, 3, 1355 (2020)). This reverse em-problem involves iteratively applying the inverse map of the em iteration to calculate the channel capacity, which represents the maximum mutual information. However, several open problems remained unresolved in Toyota's work. To overcome these challenges, we formulate the reverse em-problem based on Bregman divergence and provide solutions to these open problems. Building upon these results, we transform the reverse em-problem into em-problems and derive a non-iterative formula for the reverse em-problem. This formula can be viewed as a generalization of the aforementioned analytical calculation method. Importantly, this derivation sheds light on the information geometrical structure underlying this special case. By effectively addressing the limitations of the previous analytical method and providing a deeper understanding of the underlying information geometrical structure, our work significantly expands the applicability of the proposed method for calculating the channel capacity without iteration.
In logistic regression modeling, Firth's modified estimator is widely used to address the issue of data separation, which results in the nonexistence of the maximum likelihood estimate. Firth's modified estimator can be formulated as a penalized maximum likelihood estimator in which Jeffreys' prior is adopted as the penalty term. Despite its widespread use in practice, the formal verification of the corresponding estimate's existence has not been established. In this study, we establish the existence theorem of Firth's modified estimate in binomial logistic regression models, assuming only the full column rankness of the design matrix. We also discuss other binomial regression models obtained through alternating link functions and prove the existence of similar penalized maximum likelihood estimates for such models.
This work presents a comparative review and classification between some well-known thermodynamically consistent models of hydrogel behavior in a large deformation setting, specifically focusing on solvent absorption/desorption and its impact on mechanical deformation and network swelling. The proposed discussion addresses formulation aspects, general mathematical classification of the governing equations, and numerical implementation issues based on the finite element method. The theories are presented in a unified framework demonstrating that, despite not being evident in some cases, all of them follow equivalent thermodynamic arguments. A detailed numerical analysis is carried out where Taylor-Hood elements are employed in the spatial discretization to satisfy the inf-sup condition and to prevent spurious numerical oscillations. The resulting discrete problems are solved using the FEniCS platform through consistent variational formulations, employing both monolithic and staggered approaches. We conduct benchmark tests on various hydrogel structures, demonstrating that major differences arise from the chosen volumetric response of the hydrogel. The significance of this choice is frequently underestimated in the state-of-the-art literature but has been shown to have substantial implications on the resulting hydrogel behavior.
This paper proposes a~simple, yet powerful, method for balancing distributions of covariates for causal inference based on observational studies. The method makes it possible to balance an arbitrary number of quantiles (e.g., medians, quartiles, or deciles) together with means if necessary. The proposed approach is based on the theory of calibration estimators (Deville and S\"arndal 1992), in particular, calibration estimators for quantiles, proposed by Harms and Duchesne (2006). The method does not require numerical integration, kernel density estimation or assumptions about the distributions. Valid estimates can be obtained by drawing on existing asymptotic theory. An~illustrative example of the proposed approach is presented for the entropy balancing method and the covariate balancing propensity score method. Results of a~simulation study indicate that the method efficiently estimates average treatment effects on the treated (ATT), the average treatment effect (ATE), the quantile treatment effect on the treated (QTT) and the quantile treatment effect (QTE), especially in the presence of non-linearity and mis-specification of the models. The proposed approach can be further generalized to other designs (e.g. multi-category, continuous) or methods (e.g. synthetic control method). An open source software implementing proposed methods is available.
We derive information-theoretic generalization bounds for supervised learning algorithms based on the information contained in predictions rather than in the output of the training algorithm. These bounds improve over the existing information-theoretic bounds, are applicable to a wider range of algorithms, and solve two key challenges: (a) they give meaningful results for deterministic algorithms and (b) they are significantly easier to estimate. We show experimentally that the proposed bounds closely follow the generalization gap in practical scenarios for deep learning.