We consider functional data in which an underlying smooth curve is contaminated not only by errors but also by irregular spikes. We propose an approach that combines regularized spline smoothing with an Expectation-Maximization algorithm to both identify the spikes and estimate the smooth component. Under some assumptions on the error distribution, we prove the consistency of the EM estimates. Next, we demonstrate the performance of our proposal on finite samples, and its robustness to violations of those assumptions, through simulations. Finally, we apply our proposal to data on the annual heatwave index in the US and on weekly electricity consumption in Ireland. In both datasets, we are able to characterize the underlying smooth trends and to pinpoint irregular/extreme behaviors.
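As a rough, hedged illustration of this kind of procedure (not the authors' exact algorithm), the sketch below alternates a weighted smoothing-spline fit with an E-step that scores each observation's probability of being a spike under a two-component Gaussian mixture on the residuals; the basis choice, mixture form, and all tuning constants are assumptions.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline
from scipy.stats import norm

def em_spline_spikes(x, y, n_iter=50, smooth=None):
    """Toy EM: smooth trend + sparse spikes, modelled via a 2-component
    Gaussian mixture on the residuals (narrow = noise, wide = spike)."""
    w = np.ones_like(y)                                  # prob. of being a regular point
    pi, s0, s1 = 0.1, np.std(y) / 10, np.std(y)          # init: spike rate, noise sd, spike sd
    for _ in range(n_iter):
        # M-step (trend): weighted smoothing spline, spikes down-weighted
        fit = UnivariateSpline(x, y, w=w, s=smooth)
        r = y - fit(x)
        # E-step: responsibility of the spike component for each residual
        p_spike = pi * norm.pdf(r, scale=s1)
        p_noise = (1 - pi) * norm.pdf(r, scale=s0)
        g = p_spike / (p_spike + p_noise + 1e-300)
        w = np.clip(1.0 - g, 1e-3, 1.0)
        # M-step (mixture): update spike rate and scales
        pi = g.mean()
        s0 = np.sqrt(np.sum((1 - g) * r**2) / np.sum(1 - g))
        s1 = np.sqrt(np.sum(g * r**2) / max(g.sum(), 1e-12))
    return fit, g                                        # smooth component, spike probabilities

# toy usage: smooth curve plus three injected spikes (smoothing level hand-tuned)
x = np.linspace(0, 1, 200)
y = np.sin(2 * np.pi * x) + 0.05 * np.random.randn(200)
y[[30, 90, 150]] += 2.0
trend, spike_prob = em_spline_spikes(x, y, smooth=1.0)
```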
In order to perform isogeometric analysis with increased smoothness on complex domains, trimming, variational coupling, or unstructured spline methods can be used. The latter two classes of methods require a multi-patch segmentation of the domain and provide continuous bases along patch interfaces. In the context of shell modeling, variational methods are widely used, whereas the application of unstructured spline methods to shell problems is rather scarce. In this paper, we therefore provide a qualitative and a quantitative comparison of a selection of unstructured spline constructions, in particular the D-Patch, Almost-$C^1$, Analysis-Suitable $G^1$, and Approximate $C^1$ constructions. Using this comparison, we aim to provide insight into the selection of methods for practical problems, as well as directions for future research. In the qualitative comparison, the properties of each method are evaluated and compared. In the quantitative comparison, a selection of numerical examples is used to highlight the different advantages and disadvantages of each method. In the latter, a comparison with weak coupling methods such as Nitsche's method or penalty methods is made as well. In brief, it is concluded that the Approximate $C^1$ and Analysis-Suitable $G^1$ constructions converge optimally in the analysis of a biharmonic problem, without the need for special refinement procedures. Furthermore, these methods provide accurate stress fields. On the other hand, the Almost-$C^1$ and D-Patch constructions are relatively easy to construct on complex geometries. The Almost-$C^1$ method does not have limitations on the valence of boundary vertices, unlike the D-Patch, but is only applicable to biquadratic local bases. Following from these conclusions, future research directions are proposed, for example towards making the Approximate $C^1$ and Analysis-Suitable $G^1$ constructions applicable to more complex geometries.
Besov priors are nonparametric priors that can model spatially inhomogeneous functions. They are routinely used in inverse problems and imaging, where they exhibit attractive sparsity-promoting and edge-preserving features. A recent line of work has initiated the study of their asymptotic frequentist convergence properties. In the present paper, we consider the theoretical recovery performance of the posterior distributions associated with Besov-Laplace priors in the density estimation model, under the assumption that the observations are generated by a possibly spatially inhomogeneous true density belonging to a Besov space. We improve on existing results and show that carefully tuned Besov-Laplace priors attain optimal posterior contraction rates. Furthermore, we show that hierarchical procedures involving a hyper-prior on the regularity parameter lead to adaptation to any smoothness level.
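For orientation, one common way of writing a Besov-Laplace prior (stated here for a one-dimensional domain and a sufficiently regular wavelet basis $\{\psi_{lr}\}$; normalizations vary across the literature) is through its wavelet expansion
\[
f=\sum_{l=0}^{\infty}\sum_{r=1}^{2^{l}}2^{-l\left(s+\frac12-\frac1p\right)}u_{lr}\,\psi_{lr},
\qquad u_{lr}\overset{\text{iid}}{\sim}\ \text{density}\ \propto\ \exp\bigl(-|u|^{p}\bigr),
\]
with $p=1$ giving the Laplace case and $s$ the regularity parameter on which the hierarchical procedures above place a hyper-prior.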
Large language models based on self-attention mechanisms have achieved astonishing performance not only on natural language itself, but also in a variety of tasks of a different nature. However, when it comes to processing language, the human brain may not operate according to the same principles. This has sparked a debate about the connection between brain computation and the artificial self-supervision adopted in large language models. One of the most influential hypotheses in brain computation is the predictive coding framework, which proposes to minimize the prediction error by local learning. However, the role of predictive coding and the associated credit assignment in language processing remains unknown. Here, we propose a mean-field learning model within the predictive coding framework, assuming that the synaptic weight of each connection follows a spike-and-slab distribution and that only the distribution is trained. This meta predictive learning is successfully validated on classifying handwritten digits, where pixels are input to the network in sequence, and on toy and real language corpora. Our model reveals that most of the connections become deterministic after learning, while the output connections retain a higher level of variability. The performance of the resulting network ensemble changes continuously with data load and further improves with more training data, in analogy with the emergent behavior of large language models. Our model therefore provides a starting point for investigating the physical and biological correspondences of language processing and the unexpected general intelligence.
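As a hedged sketch of the kind of parameterization described here (not the paper's exact model): with a spike-and-slab weight that is zero with probability $1-\pi$ and drawn from $\mathcal{N}(m, s^2)$ with probability $\pi$, each pre-activation is approximately Gaussian by a mean-field/central-limit argument, so only $(\pi, m, s)$ per connection need to be trained. The layer sizes and nonlinearity below are assumptions.

```python
import numpy as np

def mean_field_layer(x, pi, m, s, rng):
    """Mean-field forward pass through one layer with spike-and-slab weights.

    x  : (batch, n_in) inputs
    pi : (n_in, n_out) slab (non-zero) probabilities
    m  : (n_in, n_out) slab means
    s  : (n_in, n_out) slab standard deviations
    Each weight has mean pi*m and variance pi*(m**2 + s**2) - (pi*m)**2,
    so the pre-activation is approximately Gaussian by the CLT.
    """
    w_mean = pi * m
    w_var = pi * (m**2 + s**2) - w_mean**2
    mu = x @ w_mean                        # mean of the pre-activation
    var = (x**2) @ w_var                   # variance of the pre-activation
    eps = rng.standard_normal(mu.shape)
    z = mu + np.sqrt(var + 1e-12) * eps    # reparameterized Gaussian sample
    return np.tanh(z)                      # assumed nonlinearity

# toy usage: one hidden layer on random data
rng = np.random.default_rng(0)
x = rng.standard_normal((32, 100))
pi = np.full((100, 50), 0.5)
m = 0.1 * rng.standard_normal((100, 50))
s = np.full((100, 50), 0.1)
h = mean_field_layer(x, pi, m, s, rng)
```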
This paper shows how to use the shooting method, a classical numerical algorithm for solving boundary value problems, to compute the Riemannian distance on the Stiefel manifold $\mathrm{St}(n,p)$, the set of $ n \times p $ matrices with orthonormal columns. The main feature is that we provide neat, explicit expressions for the Jacobians. To the author's knowledge, this is the first time explicit formulas have been given for the Jacobians involved in shooting methods for finding the distance between two given points on the Stiefel manifold. This allows us to perform a preliminary analysis of the single shooting method. Numerical experiments demonstrate the accuracy and performance of the algorithms. Finally, we showcase three example applications in summary statistics, shape analysis, and model order reduction.
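The following is a simplified, hedged sketch of single shooting on $\mathrm{St}(n,p)$ under the canonical metric: it uses the closed-form Edelman-Arias-Smith geodesic formula for the exponential map and a generic least-squares solve with a finite-difference Jacobian in place of the paper's explicit Jacobians, and it is only reliable when the two points are reasonably close.

```python
import numpy as np
from scipy.linalg import expm, qr
from scipy.optimize import least_squares

def stiefel_exp(X, H):
    """Canonical-metric exponential map on St(n, p) (Edelman-Arias-Smith geodesic)."""
    n, p = X.shape
    A = X.T @ H                          # skew-symmetric for a tangent vector
    K = H - X @ A
    Q, R = qr(K, mode='economic')
    block = np.block([[A, -R.T], [R, np.zeros((p, p))]])
    MN = expm(block) @ np.vstack([np.eye(p), np.zeros((p, p))])
    return X @ MN[:p] + Q @ MN[p:]

def project_tangent(X, Z):
    """Project an arbitrary n x p matrix onto the tangent space at X."""
    XtZ = X.T @ Z
    return Z - X @ (XtZ + XtZ.T) / 2

def shooting_distance(X, Y):
    """Single shooting: find H with Exp_X(H) = Y, return its canonical norm."""
    n, p = X.shape

    def residual(h):
        H = project_tangent(X, h.reshape(n, p))
        return (stiefel_exp(X, H) - Y).ravel()

    h0 = project_tangent(X, Y - X).ravel()          # first-order initial guess
    sol = least_squares(residual, h0, xtol=1e-12, ftol=1e-12)
    H = project_tangent(X, sol.x.reshape(n, p))
    # canonical metric: <H, H> = tr(H^T (I - X X^T / 2) H)
    return np.sqrt(np.trace(H.T @ H) - 0.5 * np.trace((X.T @ H).T @ (X.T @ H)))

# toy usage: two nearby points on St(5, 2)
rng = np.random.default_rng(1)
X, _ = qr(rng.standard_normal((5, 2)), mode='economic')
H0 = 0.3 * project_tangent(X, rng.standard_normal((5, 2)))
Y = stiefel_exp(X, H0)
print(shooting_distance(X, Y))   # should be close to the canonical norm of H0
```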
The generation of curves and surfaces from given data is a well-known problem in Computer-Aided Design that can be approached using subdivision schemes. These are powerful tools that produce new data from the initial data by means of simple calculations. However, in some applications the collected data are contaminated by noise, and most schemes are not adequate for processing them. In this paper, we present new families of binary univariate linear subdivision schemes based on weighted local polynomial regression. We study their properties, such as convergence, monotonicity, polynomial reproduction, and approximation and denoising capabilities. For the convergence study, we develop some new theoretical results. Finally, some examples are presented to confirm the proven properties.
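As a hedged illustration of the general idea (the window size, weights, and degree below are assumptions, not the specific families studied in the paper): one binary refinement step fits a weighted least-squares polynomial on a sliding window of the current data and evaluates it at the old and the new (half-integer) parameter positions.

```python
import numpy as np

def ls_subdivision_step(f, degree=2, half_width=2, weights=None):
    """One binary subdivision step based on weighted local polynomial regression.

    For each index i, a polynomial of the given degree is fitted by weighted
    least squares to the 2*half_width + 1 values around i (at local positions
    -half_width, ..., half_width) and evaluated at 0 (refined even point)
    and at 1/2 (new odd point).
    """
    t = np.arange(-half_width, half_width + 1, dtype=float)
    if weights is None:
        weights = 1.0 / (1.0 + t**2)               # assumed decaying weights
    fp = np.pad(f, half_width, mode='edge')        # simple boundary handling
    even, odd = [], []
    for i in range(len(f)):
        window = fp[i:i + 2 * half_width + 1]
        coef = np.polyfit(t, window, deg=degree, w=np.sqrt(weights))
        even.append(np.polyval(coef, 0.0))
        odd.append(np.polyval(coef, 0.5))
    out = np.empty(2 * len(f))
    out[0::2], out[1::2] = even, odd
    return out

# toy usage: refine noisy samples of a smooth function twice
x = np.linspace(0, 1, 17)
f = np.sin(2 * np.pi * x) + 0.05 * np.random.randn(x.size)
f1 = ls_subdivision_step(f)
f2 = ls_subdivision_step(f1)
```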
A general class of almost instantaneous fixed-to-variable-length (AIFV) codes is proposed, which contains every binary code that can be constructed when a finite number of bits of decoding delay is allowed. The contributions of the paper are as follows: (i) introducing $N$-bit-delay AIFV codes, constructed from multiple code trees with higher flexibility than the conventional AIFV codes; (ii) proving that the proposed codes can represent any uniquely encodable and uniquely decodable variable-to-variable-length code; (iii) showing how to express codes as multiple code trees with minimum decoding delay; and (iv) formulating the constraints of decodability as comparisons of intervals on the real number line. The theoretical results in this paper are expected to be useful for further study of AIFV codes.
Binary responses arise in a multitude of statistical problems, including binary classification, bioassay, current status data problems, and sensitivity estimation. There has been interest in such problems in the Bayesian nonparametrics community since the early 1970s, but inference given binary data is intractable for a wide range of modern simulation-based models, even when MCMC methods are employed. Recently, Christensen (2023) introduced a novel simulation technique based on counting permutations, which can estimate both posterior distributions and marginal likelihoods for any model from which a random sample can be generated. However, the accompanying implementation of this technique struggles when the sample size is large (n > 250). Here we present perms, a new implementation of this technique which is substantially faster and able to handle larger data problems than the original implementation. It is available both as an R package and as a Python library. The basic usage of perms is illustrated via two simple examples: a tractable toy problem and a bioassay problem. A more complex example involving changepoint analysis is also considered. We also cover the details of the implementation and illustrate the computational speed gains of perms via a simple simulation study.
We prove a discrete analogue of the composition of the fractional integral and the Caputo derivative. This result is relevant in the numerical analysis of fractional PDEs when the Caputo derivative is discretized with the so-called L1 scheme. The proof is based on an asymptotic evaluation of the discrete sums using the Euler-Maclaurin summation formula.
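For the reader's convenience, the continuous composition identity in question and the L1 approximation of the Caputo derivative (written in one common notation, on a uniform grid $t_n = n\tau$ with $0<\alpha<1$) read
\[
\bigl(I^{\alpha}\partial_t^{\alpha}u\bigr)(t)
=\frac{1}{\Gamma(\alpha)}\int_0^{t}(t-s)^{\alpha-1}\,\partial_s^{\alpha}u(s)\,\mathrm{d}s
=u(t)-u(0),
\qquad
\bigl(\partial_\tau^{\alpha}u\bigr)^{n}
=\frac{\tau^{-\alpha}}{\Gamma(2-\alpha)}\sum_{k=0}^{n-1}b_{n-k-1}\bigl(u^{k+1}-u^{k}\bigr),
\quad b_j=(j+1)^{1-\alpha}-j^{1-\alpha};
\]
the paper studies the discrete counterpart of the left-hand identity when the derivative is replaced by its L1 approximation.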
Estimating parameters from data is a fundamental problem in physics, customarily done by minimizing a loss function between a model and observed statistics. In scattering-based analysis, researchers often employ their domain expertise to select a specific range of wavevectors for analysis, a choice that can vary from case to case. We introduce another paradigm that defines a probabilistic generative model from the beginning of data processing and propagates the uncertainty through to parameter estimation, termed ab initio uncertainty quantification (AIUQ). As an illustrative example, we demonstrate this approach with differential dynamic microscopy (DDM), which extracts dynamical information through Fourier analysis at a selected range of wavevectors. We first show that DDM is equivalent to fitting a temporal variogram in reciprocal space using a latent factor model as the generative model. We then derive the maximum marginal likelihood estimator, which optimally weighs the information at all wavevectors and thereby eliminates the need to select a range of wavevectors. Furthermore, we substantially reduce the computational cost by utilizing the generalized Schur algorithm for Toeplitz covariances, without approximation. Simulation studies validate that AIUQ significantly improves estimation accuracy and enables model selection with automated analysis. The utility of AIUQ is also demonstrated on three distinct sets of experiments: first in an isotropic Newtonian fluid, pushing the limits of optically dense systems compared with multiple particle tracking; next in a system undergoing a sol-gel transition, automating the determination of the gelling point and critical exponent; and lastly, in discerning the anisotropic diffusive behavior of colloids in a liquid crystal. These outcomes collectively underscore AIUQ's versatility in capturing system dynamics in an efficient and automated manner.
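For context, the standard DDM analysis that this work builds on computes the image structure function by Fourier transforming frame differences and azimuthally averaging; a minimal sketch (the frame-stack shape, lag set, and binning below are assumptions) is:

```python
import numpy as np

def ddm_structure_function(frames, lags, n_qbins=50):
    """Image structure function D(q, dt) for differential dynamic microscopy.

    frames : array (T, H, W) of intensity images on a uniform time grid
    lags   : iterable of integer frame lags dt
    Returns (q_bins, D) with D of shape (len(lags), n_qbins).
    """
    T, H, W = frames.shape
    qy = np.fft.fftfreq(H)[:, None]
    qx = np.fft.fftfreq(W)[None, :]
    qr = np.sqrt(qx**2 + qy**2)                             # wavevector magnitude grid
    q_edges = np.linspace(0, qr.max(), n_qbins + 1)
    which_bin = np.digitize(qr.ravel(), q_edges) - 1

    D = np.zeros((len(lags), n_qbins))
    for i, dt in enumerate(lags):
        diff = frames[dt:] - frames[:-dt]                   # all frame pairs at lag dt
        power = np.abs(np.fft.fft2(diff))**2                # |FFT of the difference|^2
        mean_power = power.mean(axis=0).ravel()             # average over time
        # azimuthal average over wavevector-magnitude bins
        counts = np.maximum(np.bincount(which_bin, minlength=n_qbins)[:n_qbins], 1)
        D[i] = np.bincount(which_bin, weights=mean_power, minlength=n_qbins)[:n_qbins] / counts
    q_bins = 0.5 * (q_edges[:-1] + q_edges[1:])
    return q_bins, D

# toy usage on random frames (a placeholder for a real microscopy stack)
frames = np.random.rand(64, 128, 128)
q, D = ddm_structure_function(frames, lags=[1, 2, 4, 8])
```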
We hypothesize that, due to the greedy nature of learning in multi-modal deep neural networks, these models tend to rely on just one modality while under-fitting the other modalities. Such behavior is counter-intuitive and hurts the models' generalization, as we observe empirically. To estimate a model's dependence on each modality, we compute the gain in accuracy the model obtains when it has access to that modality in addition to another one. We refer to this gain as the conditional utilization rate. In our experiments, we consistently observe an imbalance in conditional utilization rates between modalities, across multiple tasks and architectures. Since the conditional utilization rate cannot be computed efficiently during training, we introduce a proxy for it based on the pace at which the model learns from each modality, which we refer to as the conditional learning speed. We propose an algorithm to balance the conditional learning speeds between modalities during training and demonstrate that it indeed addresses the issue of greedy learning. The proposed algorithm improves the model's generalization on three datasets: Colored MNIST, Princeton ModelNet40, and NVIDIA Dynamic Hand Gesture.
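A hedged toy illustration of the conditional utilization rate (using simple logistic-regression models on synthetic two-modality data; the paper applies the definition to the multi-modal deep network itself, and all details below are assumptions):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2000
y = rng.integers(0, 2, n)
# modality 1 is informative, modality 2 is mostly noise
m1 = y[:, None] + 0.5 * rng.standard_normal((n, 5))
m2 = 0.1 * y[:, None] + rng.standard_normal((n, 5))
tr, te = slice(0, 1500), slice(1500, None)

def acc(features):
    """Test accuracy of a classifier trained on the given modality subset."""
    clf = LogisticRegression(max_iter=1000).fit(features[tr], y[tr])
    return clf.score(features[te], y[te])

acc_both = acc(np.hstack([m1, m2]))
acc_m1, acc_m2 = acc(m1), acc(m2)

# conditional utilization rates: accuracy gain from adding one modality
# on top of the other
u1_given_2 = acc_both - acc_m2   # utilization of modality 1
u2_given_1 = acc_both - acc_m1   # utilization of modality 2
print(u1_given_2, u2_given_1)    # expect a strong imbalance on this data
```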