We study the high-dimensional partial linear model, where the linear part has a high-dimensional sparse regression coefficient and the nonparametric part includes a function whose derivatives are of bounded total variation. We expand upon univariate trend filtering to develop partial linear trend filtering, a doubly penalized least squares estimation approach based on an $\ell_1$ penalty and a total variation penalty. Analogous to the advantages of trend filtering in univariate nonparametric regression, partial linear trend filtering can not only be computed efficiently but also achieves the optimal error rate for estimating the nonparametric function. This in turn leads to the oracle rate for the linear part, as if the underlying nonparametric function were known. We compare the proposed approach with a standard smoothing-spline-based method and show, both empirically and theoretically, that the former outperforms the latter when the underlying function possesses heterogeneous smoothness. We apply our approach to the IDATA study to investigate the relationship between metabolomic profiles and ultra-processed food (UPF) intake, efficiently identifying key metabolites associated with UPF consumption and demonstrating strong predictive performance.
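As an illustrative sketch only (toy data; penalty parameters `lam1`, `lam2` and the function `plt_objective` are hypothetical names, not the authors' implementation), the doubly penalized least squares objective described above can be written down directly:

```python
import numpy as np

# Sketch of the doubly penalized least squares objective behind partial
# linear trend filtering for y_i = x_i' beta + f(t_i) + noise.  The fit
# minimizes
#   (1/2n) ||y - X beta - theta||_2^2 + lam1 ||beta||_1
#       + lam2 ||D^(k+1) theta||_1,
# where theta_i = f(t_i) and D^(k+1) is the (k+1)-th order discrete
# difference operator (total variation of the k-th derivative).

def plt_objective(y, X, beta, theta, lam1, lam2, k=1):
    n = len(y)
    resid = y - X @ beta - theta
    loss = 0.5 * np.sum(resid**2) / n
    tv = np.sum(np.abs(np.diff(theta, n=k + 1)))  # discrete TV penalty
    return loss + lam1 * np.sum(np.abs(beta)) + lam2 * tv

rng = np.random.default_rng(0)
n, p = 50, 10
X = rng.standard_normal((n, p))
t = np.linspace(0.0, 1.0, n)
y = X[:, 0] + np.sin(4 * np.pi * t) + 0.1 * rng.standard_normal(n)

# At beta = 0 and theta = 0, the objective reduces to ||y||^2 / (2n).
val = plt_objective(y, X, np.zeros(p), np.zeros(n), lam1=0.1, lam2=0.5)
```

Minimizing this objective jointly over `beta` and `theta` (e.g. by block coordinate descent with a lasso solver and a univariate trend filtering solver) is one natural computational route, under the assumptions stated above.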
This work considers the Galerkin approximation and analysis of a hyperbolic integrodifferential equation involving a non-positive, variable-sign kernel and a nonlinear-nonlocal damping term with both weak and viscous damping effects. We derive the long-time stability of the solution and its finite-time uniqueness. For the semi-discrete-in-space Galerkin scheme, we derive the long-time stability of the semi-discrete numerical solution and its finite-time error estimate by a technical splitting of intricate terms. We then apply the centered difference method and an interpolating quadrature to construct a fully discrete Galerkin scheme, and prove the long-time stability of the numerical solution and its finite-time error estimate by designing a new semi-norm. Numerical experiments are performed to verify the theoretical findings.
In this contribution, we compare the numerical solutions of high-order asymptotically equivalent partial differential equations with the results of a lattice Boltzmann scheme for an inhomogeneous advection problem in one spatial dimension. We first derive a family of equivalent partial differential equations at various orders, and we compare the lattice Boltzmann experimental results with a spectral approximation of these differential equations. For an unsteady situation, we show that initializing the microscopic moments at a sufficiently high order plays a crucial role in observing an asymptotic error consistent with the order of approximation. For a stationary long-time limit, we observe that the measured asymptotic error converges with a reduced order of precision compared to the one suggested by the asymptotic analysis.
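For readers unfamiliar with the scheme, a minimal two-velocity (D1Q2) BGK lattice Boltzmann sketch for constant-coefficient advection may help fix ideas. This is an illustrative toy, not the inhomogeneous problem or the high-order initialization studied above; all parameter values are assumptions:

```python
import numpy as np

# Minimal D1Q2 BGK lattice Boltzmann sketch for the advection equation
# u_t + a u_x = 0 on a periodic grid, with lattice speed lam = dx/dt = 1.
nx, a, s, steps = 128, 0.5, 1.5, 200   # grid size, velocity, relaxation, steps
x = np.linspace(0.0, 1.0, nx, endpoint=False)
u0 = np.exp(-100 * (x - 0.5) ** 2)     # initial Gaussian pulse

# Equilibria carry the conserved density u and the flux a*u.
fp = u0 * (1 + a) / 2                  # population moving right (+1)
fm = u0 * (1 - a) / 2                  # population moving left  (-1)

for _ in range(steps):
    u = fp + fm                        # conserved moment (density)
    # BGK collision: relax toward equilibrium with rate s in (0, 2);
    # the collision leaves the density fp + fm unchanged.
    fp += s * (u * (1 + a) / 2 - fp)
    fm += s * (u * (1 - a) / 2 - fm)
    # Transport: shift each population one cell along its velocity.
    fp = np.roll(fp, 1)
    fm = np.roll(fm, -1)

u = fp + fm                            # total mass is conserved exactly
```

The asymptotic (equivalent-equation) analysis referred to above expands exactly this kind of collide-and-stream iteration in powers of the time step.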
We study the identification of binary choice models with fixed effects. We provide a condition called sign saturation and show that it is sufficient for identification of the model. In particular, identification is guaranteed even when all the regressors are bounded, including the case of multiple discrete regressors. We also show that without this condition, the model is not identified unless the error distribution belongs to a special class. The same sign saturation condition is also essential for identifying the sign of treatment effects. We provide a test for the sign saturation condition, which can be implemented using existing algorithms for the maximum score estimator.
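To illustrate the kind of computation the maximum score estimator involves, here is a toy cross-sectional example (not the fixed-effects panel setting above, and not the proposed test; data and names are hypothetical):

```python
import numpy as np

# Toy maximum score criterion for a binary choice model
# y = 1{x'b + e > 0}.  The score of a candidate coefficient b counts
# sign agreements between the outcomes y and the index x'b.
rng = np.random.default_rng(0)
n = 2000
X = rng.standard_normal((n, 2))
beta_true = np.array([1.0, 1.0]) / np.sqrt(2)
y = (X @ beta_true + rng.logistic(size=n) > 0).astype(float)

def score(b):
    return np.sum((2 * y - 1) * np.sign(X @ b))

# Maximum score: search over directions on the unit circle (the scale
# of b is not identified, so |b| = 1 is a harmless normalization).
angles = np.linspace(0.0, np.pi, 500)
cands = np.column_stack([np.cos(angles), np.sin(angles)])
best = cands[np.argmax([score(b) for b in cands])]
```

Because the score depends on the data only through signs of linear indices, existing maximum score algorithms can be reused to check conditions like sign saturation, as noted above.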
A finite element method is introduced to track interface evolution governed by the level set equation. The method solves for the level set indicator function in a narrow band around the interface. An extension procedure, which is essential for a narrow band level set method, is introduced based on a finite element $L^2$- or $H^1$-projection combined with the ghost-penalty method. This procedure is formulated as a linear variational problem in a narrow band around the surface, making it computationally efficient and suitable for rigorous error analysis. The extension method is combined with a discontinuous Galerkin space discretization and a BDF time-stepping scheme. The paper analyzes the stability and accuracy of the extension procedure and evaluates the performance of the resulting narrow band finite element method for the level set equation through numerical experiments.
We introduce an energy-based model, which seems especially suited for constrained systems. The proposed model provides an alternative to the popular port-Hamiltonian framework and exhibits similar properties, such as energy dissipation as well as structure-preserving interconnection and Petrov-Galerkin projection. Regarding time discretization, the midpoint rule and discrete gradient methods are dissipation-preserving. Besides the verification of these properties, we present ten examples from different fields of application.
We consider the problem of estimating the error when solving a system of differential-algebraic equations. Richardson extrapolation is a classical technique that can be used to judge when computational errors are irrelevant and to estimate the discretization error. We derive and illustrate Richardson extrapolation using a variety of numerical experiments. We have simulated molecular dynamics with constraints using the GROMACS library and found that the output is not always amenable to Richardson extrapolation. We identify two necessary conditions that are not always satisfied by the GROMACS library.
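The basic error-estimation idea can be sketched on a scalar ODE (a generic illustration under simple assumptions, not the GROMACS constrained-dynamics setting above):

```python
import math

# Richardson extrapolation as an error estimator.  For a method of
# order p, A(h) ~ A + C h^p, so
#   A - A(h/2) ~ (A(h/2) - A(h)) / (2^p - 1)
# estimates the discretization error of the finer solution.

def euler(h):
    """Forward Euler for y' = y, y(0) = 1, integrated to t = 1."""
    y, n = 1.0, int(round(1.0 / h))
    for _ in range(n):
        y += h * y
    return y

h = 0.01
coarse, fine = euler(h), euler(h / 2)
p = 1                                    # forward Euler is first order
est_error = (fine - coarse) / (2**p - 1)
true_error = math.e - fine               # exact solution is e^t at t = 1
```

The estimate is only valid when the leading-order error expansion actually dominates, which is precisely the kind of necessary condition that can fail for simulation output in practice.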
We develop graph-based tests for spherical symmetry of a multivariate distribution using a method based on data augmentation. These tests are constructed using a new notion of signs and ranks that are computed along a path obtained by optimizing an objective function based on pairwise dissimilarities among the observations in the augmented data set. The resulting tests have the exact distribution-free property: irrespective of the dimension of the data, the null distributions of the test statistics remain the same. These tests can be conveniently used for high-dimensional data, even when the dimension is much larger than the sample size. Under appropriate regularity conditions, we prove the consistency of these tests in the high-dimensional asymptotic regime, where the dimension grows to infinity while the sample size may or may not grow with it. We also propose a generalization of our methods to handle situations where the center of symmetry is not specified by the null hypothesis. Several simulated data sets and a real data set are analyzed to demonstrate the utility of the proposed tests.
We study the strong approximation of the solutions to singular stochastic kinetic equations (also referred to as second-order SDEs) driven by $\alpha$-stable processes, using an Euler-type scheme inspired by [11]. For these equations, the stability index $\alpha$ lies in the range $(1,2)$, and the drift term exhibits anisotropic $\beta$-H\"older continuity with $\beta >1 - \frac{\alpha}{2}$. We establish a convergence rate of $(\frac{1}{2} + \frac{\beta}{\alpha(1+\alpha)} \wedge \frac{1}{2})$, which aligns with the results in [4] concerning first-order SDEs.
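A hedged sketch of an Euler-type scheme for such a kinetic (second-order) SDE may be useful; the drift, parameters, and function names below are hypothetical illustrations, not the scheme of [11]:

```python
import numpy as np

# Euler-type scheme for a kinetic SDE
#   dX_t = V_t dt,   dV_t = b(X_t, V_t) dt + dL_t^alpha,
# with L a symmetric alpha-stable process, alpha in (1, 2).
rng = np.random.default_rng(0)

def stable_increment(alpha, size):
    """Symmetric alpha-stable draws via the Chambers-Mallows-Stuck method."""
    V = rng.uniform(-np.pi / 2, np.pi / 2, size)
    W = rng.exponential(1.0, size)
    return (np.sin(alpha * V) / np.cos(V) ** (1 / alpha)
            * (np.cos((1 - alpha) * V) / W) ** ((1 - alpha) / alpha))

def euler_kinetic(alpha, b, x0, v0, T, n):
    h = T / n
    X, Vel = np.empty(n + 1), np.empty(n + 1)
    X[0], Vel[0] = x0, v0
    xi = stable_increment(alpha, n)
    for k in range(n):
        X[k + 1] = X[k] + Vel[k] * h
        # Stable increments over a step of size h scale like h^(1/alpha).
        Vel[k + 1] = Vel[k] + b(X[k], Vel[k]) * h + h ** (1 / alpha) * xi[k]
    return X, Vel

# Example: a Hoelder-type (non-Lipschitz) drift, as allowed by the theory.
b = lambda x, v: -np.sign(v) * np.abs(v) ** 0.7
X, Vel = euler_kinetic(alpha=1.5, b=b, x0=0.0, v0=1.0, T=1.0, n=1000)
```

The convergence rate quoted above concerns the strong (pathwise) error of exactly this type of discretization as the step size tends to zero.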
We consider an unknown multivariate function representing a system, such as a complex numerical simulator, that takes both deterministic and uncertain inputs. Our objective is to estimate the set of deterministic inputs leading to outputs whose probability (with respect to the distribution of the uncertain inputs) of belonging to a given set is less than a given threshold. This problem, which we call Quantile Set Inversion (QSI), occurs for instance in robust (reliability-based) optimization, when looking for the set of solutions that satisfy the constraints with sufficiently high probability. To solve the QSI problem, we propose a Bayesian strategy, based on Gaussian process modeling and the Stepwise Uncertainty Reduction (SUR) principle, to sequentially choose the points at which the function should be evaluated so as to approximate the set of interest efficiently. We illustrate the performance and practical interest of the proposed SUR strategy through several numerical experiments.
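To make the QSI target concrete, here is a brute-force Monte Carlo illustration on a toy problem (not the Gaussian-process / SUR strategy above, which is designed precisely to avoid this kind of exhaustive evaluation; `f` and the threshold are hypothetical):

```python
import numpy as np

# Quantile Set Inversion target, by brute force.  With a toy "simulator"
# f(x, u) = x - u, uncertain input U ~ N(0, 1) and critical set
# C = (-inf, 0], we seek
#   Gamma = { x : P(f(x, U) in C) <= threshold }.
rng = np.random.default_rng(0)
f = lambda x, u: x - u                 # hypothetical cheap simulator
U = rng.standard_normal(200_000)       # draws of the uncertain input
threshold = 0.05

def in_quantile_set(x):
    prob = np.mean(f(x, U) <= 0.0)     # P(f(x, U) in C), estimated by MC
    return prob <= threshold

# Here P(f(x, U) <= 0) = P(U >= x), so Gamma is approximately
# { x : x >= 1.645 } (the 95% standard normal quantile).
```

When each evaluation of `f` is an expensive simulation, this brute-force estimate becomes infeasible, which motivates the sequential design strategy described above.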
When random effects are included to account for dependent observations, the odds ratio interpretation of logistic regression coefficients changes from population-averaged to subject-specific. This is unappealing in many applications, motivating a rich literature on methods that maintain the marginal logistic regression structure without random effects, such as generalized estimating equations. However, for spatial data, random effect approaches are appealing because they provide a full probabilistic characterization of the data that can be used for prediction. We propose a new class of spatial logistic regression models that maintain both population-averaged and subject-specific interpretations through a novel class of bridge processes for spatial random effects. These processes are shown to have appealing computational and theoretical properties, including a scale mixture of normal representation. The new methodology is illustrated with simulations and an analysis of childhood malaria prevalence data in the Gambia.