We investigate the randomized decision tree complexity of a specific class of read-once threshold functions. A read-once threshold formula is defined by a rooted tree whose internal nodes are each labeled by a threshold function $T_k^n$ (which outputs 1 if and only if at least $k$ of its $n$ input bits are 1) and whose leaves are labeled by distinct variables. Such a tree defines a Boolean function in a natural way. We focus on the randomized decision tree complexity of such functions when the underlying tree is uniform, with all internal nodes labeled by the same threshold function. We prove lower bounds of the form $c(k,n)^d$, where $d$ is the depth of the tree. We also treat trees with alternating levels of AND and OR gates separately, and show asymptotically optimal bounds, extending the known bounds for the binary case.
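For orientation, a minimal sketch of the objects involved (notation ours): the threshold gate is
\[
T_k^n(x_1,\dots,x_n)=\begin{cases}1 & \text{if } \sum_{i=1}^{n} x_i \ge k,\\ 0 & \text{otherwise,}\end{cases}
\]
and the uniform depth-$d$ tree computes $F_d = T_k^n\big(F_{d-1}^{(1)},\dots,F_{d-1}^{(n)}\big)$ on disjoint blocks of variables, with $F_0$ a single variable; AND and OR are the special cases $T_n^n$ and $T_1^n$.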
Modelling noisy data in a network context remains an unavoidable obstacle; fortunately, random matrix theory offers an effective way to describe network environments. This motivates the probabilistic characterisation of such networks (and the accompanying noisy data) using matrix variate models. Denoising network data with a Bayes approach is uncommon in the surveyed literature. This paper adopts the Bayesian viewpoint and introduces a new matrix variate t-model as a prior, relying on the matrix variate gamma distribution for the noise process within a Gaussian graphical network, for cases in which the normality assumption is violated. From a statistical learning viewpoint, such a theoretical consideration benefits the real-world understanding of the structures generating noisy data with network-based attributes, as part of machine learning in data science. A full structural learning procedure is provided for calculating and approximating the resulting posterior of interest, in order to assess the considered model's network centrality measures. Experiments with synthetic and real-world stock price data are performed, not only to validate the proposed algorithm's capabilities but also to show that this model has wider flexibility than originally implied in Billio et al. (2021).
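As an illustrative sketch only (the paper's exact hierarchy may differ), matrix variate t-models commonly arise as scale mixtures of matrix normal distributions: if
\[
X \mid \Sigma \sim \mathrm{MN}_{p\times n}(M,\Sigma,\Omega), \qquad \Sigma \sim \text{(inverse) matrix variate gamma},
\]
then the marginal of $X$ has matrix variate t-type tails; in the special case where the mixing law reduces to an inverse Wishart, the marginal is the classical matrix variate t distribution.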
Model sparsification in deep learning promotes simpler, more interpretable models with fewer parameters. This not only reduces the model's memory footprint and computational needs but also shortens inference time. This work focuses on creating sparse models, optimized across multiple tasks, with fewer parameters. Such parsimonious models can also match or outperform their dense counterparts. In this work, we introduce channel-wise $\ell_1/\ell_2$ group sparsity in the parameters (weights) of the shared convolutional layers of a multi-task learning model. This approach facilitates the removal of extraneous groups, i.e., channels (via the $\ell_1$ component), while also penalizing the magnitudes of the remaining weights (via the $\ell_2$ component), further improving learning across all tasks. We analyze the effect of group sparsity in both single-task and multi-task settings on two widely used Multi-Task Learning (MTL) datasets: NYU-v2 and CelebAMask-HQ. On both datasets, each of which comprises three different computer vision tasks, multi-task models with approximately 70% sparsity outperform their dense equivalents. We also investigate how the degree of sparsification influences the model's performance, the overall sparsity percentage, the sparsity patterns, and the inference time.
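A minimal sketch of such a penalty, assuming one group per output channel of a shared convolution (the function name and the hyperparameters lam_group and lam_l2 are ours, not the paper's values):
\begin{verbatim}
import torch

def group_sparsity_penalty(conv_weight, lam_group=1e-4, lam_l2=1e-4):
    # conv_weight: (out_channels, in_channels, kH, kW); one group per
    # output channel. Summing per-channel l2 norms (a group lasso) drives
    # whole channels to zero; the squared-l2 term shrinks surviving weights.
    group_norms = conv_weight.flatten(start_dim=1).norm(p=2, dim=1)
    return lam_group * group_norms.sum() + lam_l2 * group_norms.pow(2).sum()
\end{verbatim}
This term would be added to the multi-task loss before backpropagation; channels whose group norm reaches zero can then be pruned outright.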
Univariate integer-valued time series have been studied extensively, but the literature on multivariate integer-valued time series models is quite limited, and the complex correlation structure among multivariate integer-valued time series is barely discussed. In this study, we propose a first-order multivariate integer-valued autoregressive model that characterizes the correlation among multivariate integer-valued time series with greater flexibility. Under general conditions, we establish the stationarity and ergodicity of the proposed model. Within this framework, we discuss models with the multivariate Poisson-lognormal distribution and the multivariate geometric-logitnormal distribution, together with their corresponding properties. An estimation method based on the EM algorithm is developed for the model parameters, and extensive simulation studies are performed to evaluate its effectiveness. Finally, a real crime dataset is analyzed to demonstrate the advantage of the proposed model in comparison with other models.
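As a minimal illustrative sketch (names and parameter values hypothetical), a first-order multivariate INAR recursion with matricial binomial thinning can be simulated as follows; the paper's innovation laws (multivariate Poisson-lognormal, geometric-logitnormal) would replace the simple independent-Poisson stand-in used here:
\begin{verbatim}
import numpy as np

def simulate_minar1(T, A, innov, rng):
    # A[i, j] in [0, 1]: thinning probability applied to component j in
    # the equation for component i; innov(rng) returns the integer-valued
    # innovation vector.
    d = A.shape[0]
    X = np.zeros((T, d), dtype=int)
    for t in range(1, T):
        thinned = [sum(rng.binomial(X[t - 1, j], A[i, j]) for j in range(d))
                   for i in range(d)]
        X[t] = np.asarray(thinned) + innov(rng)
    return X

rng = np.random.default_rng(0)
A = np.array([[0.3, 0.1], [0.05, 0.4]])
X = simulate_minar1(500, A, lambda g: g.poisson([1.0, 2.0]), rng)
\end{verbatim}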
An $(r, \delta)$-locally repairable code ($(r, \delta)$-LRC for short) was introduced by Prakash et al. for tolerating multiple failed nodes in distributed storage systems, and has garnered significant interest among researchers. An $(r,\delta)$-LRC is called optimal if its parameters achieve the Singleton-like bound. In this paper, we construct three classes of $q$-ary optimal cyclic $(r,\delta)$-LRCs with new parameters by investigating the defining sets of cyclic codes. Our results generalize the related work of \cite{Chen2022,Qian2020}, and the obtained optimal cyclic $(r, \delta)$-LRCs have flexible parameters. Numerous numerical examples are given to show that our constructions are capable of generating new optimal cyclic $(r, \delta)$-LRCs.
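For reference, the Singleton-like bound of Prakash et al. states that an $[n,k,d]$ code with $(r,\delta)$-locality satisfies
\[
d \le n-k+1-\left(\left\lceil \frac{k}{r} \right\rceil - 1\right)(\delta-1),
\]
and a code attaining this bound with equality is the optimal code referred to above.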
Many approaches have been proposed to use diffusion models to augment training datasets for downstream tasks, such as classification. However, diffusion models are themselves trained on large datasets, often with noisy annotations, and it remains an open question to what extent these models contribute to downstream classification performance. In particular, it remains unclear whether they generalize enough to improve over directly using the additional data of their pre-training process for augmentation. We systematically evaluate a range of existing methods for generating images from diffusion models and study new extensions to assess their benefit for data augmentation. Personalizing diffusion models towards the target data outperforms simpler prompting strategies. However, using the pre-training data of the diffusion model alone, via a simple nearest-neighbour retrieval procedure, leads to even stronger downstream performance. Our study explores the potential of diffusion models for generating new training data, and surprisingly finds that these sophisticated models are not yet able to beat a simple and strong image retrieval baseline on simple downstream vision tasks.
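A minimal sketch of the kind of nearest-neighbour retrieval baseline referred to above, assuming precomputed image embeddings (e.g., from a CLIP-like encoder); function and variable names are ours, and the paper's exact procedure may differ:
\begin{verbatim}
import numpy as np

def retrieve_neighbours(target_emb, pretrain_emb, k=8):
    # Cosine-similarity nearest neighbours: for each target image,
    # return indices of the k closest pre-training images, which are
    # then added to the downstream training set.
    t = target_emb / np.linalg.norm(target_emb, axis=1, keepdims=True)
    p = pretrain_emb / np.linalg.norm(pretrain_emb, axis=1, keepdims=True)
    sims = t @ p.T
    return np.argsort(-sims, axis=1)[:, :k]
\end{verbatim}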
Parameter identification problems in partial differential equations (PDEs) consist in determining one or more unknown functional parameters in a PDE. Here, the Bayesian nonparametric approach to such problems is considered. Focusing on the representative example of inferring the diffusivity function in an elliptic PDE from noisy observations of the PDE solution, the performance of Bayesian procedures based on Gaussian process priors is investigated. Recent asymptotic theoretical guarantees establishing posterior consistency and convergence rates are reviewed and expanded upon. An implementation of the associated posterior-based inference is provided, and illustrated via a numerical simulation study where two different discretisation strategies are devised. The reproducible code is available at: //github.com/MattGiord.
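Concretely, in the representative example the forward model and data take the form (a standard formulation; notation ours):
\[
-\nabla\cdot\big(f\,\nabla u_f\big)=g \ \text{in } \mathcal{O}, \qquad u_f=0 \ \text{on } \partial\mathcal{O}, \qquad Y_i = u_f(x_i) + \sigma \varepsilon_i, \quad \varepsilon_i \overset{iid}{\sim} N(0,1),
\]
with a Gaussian process prior placed on a link-transformed version of the diffusivity $f$ to enforce positivity.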
In inverse scattering problems, a model that allows for the simultaneous recovery of both the domain shape and an impedance boundary condition covers a wide range of problems with impenetrable domains, including recovering the shape of sound-hard and sound-soft obstacles and obstacles with thin coatings. This work develops an optimization framework for recovering the shape and material parameters of a penetrable, dissipative obstacle in the multifrequency setting, using a constrained class of curvature-dependent impedance function models proposed by Antoine, Barucq, and Vernhet. We find that this constrained model improves the robustness of the recovery, compared to more general models, and provides meaningfully better obstacle recovery than simpler models. In numerical examples, we explore the effectiveness of the model for varying levels of dissipation, for noise-corrupted data, and for limited-aperture data.
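Schematically, an impedance boundary condition of the kind considered takes the form (a sketch only; the precise curvature-dependent model follows Antoine, Barucq, and Vernhet)
\[
\frac{\partial u}{\partial \nu} + i k\, \lambda(x)\, u = 0 \quad \text{on } \partial D, \qquad \lambda(x) = \lambda\big(\kappa(x)\big),
\]
where $\nu$ is the outward normal, $k$ the wavenumber, and $\kappa(x)$ the boundary curvature, so that the unknowns are the boundary shape together with a low-dimensional parameterization of $\lambda$.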
A powerful statistical interpolating concept, which we call \emph{fully lifted} (fl), is introduced and developed, establishing a connection between bilinearly indexed random processes and their corresponding fully decoupled (linearly indexed) comparative alternatives. Although the technical considerations are at times quite involved, the final interpolating forms and their underlying relations admit rather elegant expressions, providing a highly desirable and useful tool for further studying various aspects of random processes and their applications. We also discuss the generality of the considered models and show that they encompass many well-known random structures and optimization problems, to which the obtained results then automatically apply.
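A canonical example of such a pair, shown only for orientation (this specific instance is ours, not necessarily the paper's), is the bilinearly indexed Gaussian process and its decoupled comparison counterpart
\[
X_{\mathbf{x},\mathbf{y}} = \mathbf{y}^{T} G \mathbf{x}, \qquad Y_{\mathbf{x},\mathbf{y}} = \|\mathbf{x}\|_2\, \mathbf{g}^{T}\mathbf{y} + \|\mathbf{y}\|_2\, \mathbf{h}^{T}\mathbf{x},
\]
with $G$, $\mathbf{g}$, $\mathbf{h}$ composed of i.i.d. standard normals.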
Many generalised distributions exist for modelling data with vastly diverse characteristics. However, very few of these generalisations of the normal distribution have shape parameters with clear roles, determining, for instance, skewness and tail shape. In this chapter, we review existing skewing mechanisms and their properties in detail. Using this knowledge, we add a skewness parameter to the body-tail generalised normal distribution \cite{BTGN}, yielding the \ac{FIN} with parameters for location, scale, body shape, skewness, and tail weight. Basic statistical properties of the \ac{FIN} are provided, such as the \ac{PDF}, cumulative distribution function, moments, and likelihood equations. Additionally, the \ac{FIN} \ac{PDF} is extended to a multivariate setting using a Student t-copula, yielding the \ac{MFIN}. The \ac{MFIN} is applied to stock returns data, where it outperforms the t-copula multivariate generalised hyperbolic, Azzalini skew-t, hyperbolic, and normal inverse Gaussian distributions.
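The prototypical skewing mechanism of the kind reviewed here is Azzalini's: given a symmetric density $f_0$ with distribution function $F_0$,
\[
f(x;\alpha) = 2\, f_0(x)\, F_0(\alpha x), \qquad \alpha \in \mathbb{R},
\]
is a valid density whose asymmetry is controlled by $\alpha$, recovering $f_0$ at $\alpha = 0$.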
We study the optimal sample complexity of neighbourhood selection in linear structural equation models, and compare this to best subset selection (BSS) for linear models under general design. We show by example that -- even when the structure is \emph{unknown} -- the existence of underlying structure can reduce the sample complexity of neighbourhood selection. This result is complicated by the possibility of path cancellation, which we study in detail, showing that improvements remain possible even in its presence. Finally, we support these theoretical observations with experiments. The proof introduces a modified BSS estimator, called klBSS, and compares its performance to BSS. The analysis of klBSS may be of independent interest, since it applies to arbitrary structured models, not only those induced by a structural equation model. Our results have implications for structure learning in graphical models, which often relies on neighbourhood selection as a subroutine.
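For concreteness, a brute-force sketch of the BSS baseline against which klBSS is compared (illustrative only; klBSS itself is the paper's modified estimator and is not reproduced here):
\begin{verbatim}
import numpy as np
from itertools import combinations

def best_subset(X, y, s):
    # Best subset selection: among all size-s supports, return the one
    # minimizing the residual sum of squares (exponential-time baseline).
    n, p = X.shape
    best_S, best_rss = None, np.inf
    for S in combinations(range(p), s):
        Xs = X[:, list(S)]
        beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
        r = y - Xs @ beta
        rss = float(r @ r)
        if rss < best_rss:
            best_S, best_rss = S, rss
    return best_S
\end{verbatim}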