We address speech enhancement based on variational autoencoders, which involves learning a speech prior distribution in the time-frequency (TF) domain. A zero-mean complex-valued Gaussian distribution is usually assumed for the generative model, where the speech information is encoded in the variance as a function of a latent variable. In contrast to this commonly used approach, we propose a weighted variance generative model, where the contribution of each spectrogram time-frame in parameter learning is weighted. We impose a Gamma prior distribution on the weights, which would effectively lead to a Student's t-distribution instead of Gaussian for speech generative modeling. We develop efficient training and speech enhancement algorithms based on the proposed generative model. Our experimental results on spectrogram auto-encoding and speech enhancement demonstrate the effectiveness and robustness of the proposed approach compared to the standard unweighted variance model.
Sample selection models represent a common methodology for correcting bias induced by data missing not at random. It is well known that these models are not empirically identifiable without exclusion restrictions. In other words, some variables predictive of missingness do not affect the outcome model of interest. The drive to establish this requirement often leads to the inclusion of irrelevant variables in the model. A recent proposal uses adaptive LASSO to circumvent this problem, but its performance depends on the so-called covariance assumption, which can be violated in small to moderate samples. Additionally, there are no tools yet for post-selection inference for this model. To address these challenges, we propose two families of spike-and-slab priors to conduct Bayesian variable selection in sample selection models. These prior structures allow for constructing a Gibbs sampler with tractable conditionals, which is scalable to the dimensions of practical interest. We illustrate the performance of the proposed methodology through a simulation study and present a comparison against adaptive LASSO and stepwise selection. We also provide two applications using publicly available real data. An implementation and code to reproduce the results in this paper can be found at //github.com/adam-iqbal/selection-spike-slab
Influenced mixed moving average fields are a versatile modeling class for spatio-temporal data. However, their predictive distribution is not generally known. Under this modeling assumption, we define a novel spatio-temporal embedding and a theory-guided machine learning approach that employs a generalized Bayesian algorithm to make ensemble forecasts. We employ Lipschitz predictors and determine fixed-time and any-time PAC Bayesian bounds in the batch learning setting. Performing causal forecast is a highlight of our methodology as its potential application to data with spatial and temporal short and long-range dependence. We then test the performance of our learning methodology by using linear predictors and data sets simulated from a spatio-temporal Ornstein-Uhlenbeck process.
Instance segmentation, an important image processing operation for automation in agriculture, is used to precisely delineate individual objects of interest within images, which provides foundational information for various automated or robotic tasks such as selective harvesting and precision pruning. This study compares the one-stage YOLOv8 and the two-stage Mask R-CNN machine learning models for instance segmentation under varying orchard conditions across two datasets. Dataset 1, collected in dormant season, includes images of dormant apple trees, which were used to train multi-object segmentation models delineating tree branches and trunks. Dataset 2, collected in the early growing season, includes images of apple tree canopies with green foliage and immature (green) apples (also called fruitlet), which were used to train single-object segmentation models delineating only immature green apples. The results showed that YOLOv8 performed better than Mask R-CNN, achieving good precision and near-perfect recall across both datasets at a confidence threshold of 0.5. Specifically, for Dataset 1, YOLOv8 achieved a precision of 0.90 and a recall of 0.95 for all classes. In comparison, Mask R-CNN demonstrated a precision of 0.81 and a recall of 0.81 for the same dataset. With Dataset 2, YOLOv8 achieved a precision of 0.93 and a recall of 0.97. Mask R-CNN, in this single-class scenario, achieved a precision of 0.85 and a recall of 0.88. Additionally, the inference times for YOLOv8 were 10.9 ms for multi-class segmentation (Dataset 1) and 7.8 ms for single-class segmentation (Dataset 2), compared to 15.6 ms and 12.8 ms achieved by Mask R-CNN's, respectively.
We develop $(\epsilon,\delta)$-differentially private projection-depth-based medians using the propose-test-release (PTR) and exponential mechanisms. Under general conditions on the input parameters and the population measure, (e.g. we do not assume any moment bounds), we quantify the probability the test in PTR fails, as well as the cost of privacy via finite sample deviation bounds. We demonstrate our main result on the canonical projection-depth-based median. In the Gaussian setting, we show that the resulting deviation bound matches the known lower bound for private Gaussian mean estimation, up to a polynomial function of the condition number of the covariance matrix. In the Cauchy setting, we show that the ``outlier error amplification'' effect resulting from the heavy tails outweighs the cost of privacy. This result is then verified via numerical simulations. Additionally, we present results on general PTR mechanisms and a uniform concentration result on the projected spacings of order statistics.
Sparse attention as a efficient method can significantly decrease the computation cost, but current sparse attention tend to rely on window self attention which block the global information flow. For this problem, we present Shifted Cross Chunk Attention (SCCA), using different KV shifting strategy to extend respective field in each attention layer. Except, we combine Dilated Attention(DA) and Dilated Neighborhood Attention(DNA) to present Shifted Dilated Attention(SDA). Both SCCA and SDA can accumulate attention results in multi head attention to obtain approximate respective field in full attention. In this paper, we conduct language modeling experiments using different pattern of SCCA and combination of SCCA and SDA. The proposed shifted cross chunk attention (SCCA) can effectively extend large language models (LLMs) to longer context combined with Positional interpolation(PI) and LoRA than current sparse attention. Notably, SCCA adopts LLaMA2 7B from 4k context to 8k in single V100. This attention pattern can provide a Plug-and-play fine-tuning method to extend model context while retaining their original architectures, and is compatible with most existing techniques.
In survival analysis, complex machine learning algorithms have been increasingly used for predictive modeling. Given a collection of features available for inclusion in a predictive model, it may be of interest to quantify the relative importance of a subset of features for the prediction task at hand. In particular, in HIV vaccine trials, participant baseline characteristics are used to predict the probability of infection over the intended follow-up period, and investigators may wish to understand how much certain types of predictors, such as behavioral factors, contribute toward overall predictiveness. Time-to-event outcomes such as time to infection are often subject to right censoring, and existing methods for assessing variable importance are typically not intended to be used in this setting. We describe a broad class of algorithm-agnostic variable importance measures for prediction in the context of survival data. We propose a nonparametric efficient estimation procedure that incorporates flexible learning of nuisance parameters, yields asymptotically valid inference, and enjoys double-robustness. We assess the performance of our proposed procedure via numerical simulations and analyze data from the HVTN 702 study to inform enrollment strategies for future HIV vaccine trials.
Non-parametric machine learning models, such as random forests and gradient boosted trees, are frequently used to estimate house prices due to their predictive accuracy, but such methods are often limited in their ability to quantify prediction uncertainty. Conformal Prediction (CP) is a model-agnostic framework for constructing confidence sets around machine learning prediction models with minimal assumptions. However, due to the spatial dependencies observed in house prices, direct application of CP leads to confidence sets that are not calibrated everywhere, i.e., too large of confidence sets in certain geographical regions and too small in others. We survey various approaches to adjust the CP confidence set to account for this and demonstrate their performance on a data set from the housing market in Oslo, Norway. Our findings indicate that calibrating the confidence sets on a \textit{locally weighted} version of the non-conformity scores makes the coverage more consistently calibrated in different geographical regions. We also perform a simulation study on synthetically generated sale prices to empirically explore the performance of CP on housing market data under idealized conditions with known data-generating mechanisms.
The joint modeling of multiple longitudinal biomarkers together with a time-to-event outcome is a challenging modeling task of continued scientific interest. In particular, the computational complexity of high dimensional (generalized) mixed effects models often restricts the flexibility of shared parameter joint models, even when the subject-specific marker trajectories follow highly nonlinear courses. We propose a parsimonious multivariate functional principal components representation of the shared random effects. This allows better scalability, as the dimension of the random effects does not directly increase with the number of markers, only with the chosen number of principal component basis functions used in the approximation of the random effects. The functional principal component representation additionally allows to estimate highly flexible subject-specific random trajectories without parametric assumptions. The modeled trajectories can thus be distinctly different for each biomarker. We build on the framework of flexible Bayesian additive joint models implemented in the R-package 'bamlss', which also supports estimation of nonlinear covariate effects via Bayesian P-splines. The flexible yet parsimonious functional principal components basis used in the estimation of the joint model is first estimated in a preliminary step. We validate our approach in a simulation study and illustrate its advantages by analyzing a study on primary biliary cholangitis.
Normal modal logics extending the logic K4.3 of linear transitive frames are known to lack the Craig interpolation property, except some logics of bounded depth such as S5. We turn this `negative' fact into a research question and pursue a non-uniform approach to Craig interpolation by investigating the following interpolant existence problem: decide whether there exists a Craig interpolant between two given formulas in any fixed logic above K4.3. Using a bisimulation-based characterisation of interpolant existence for descriptive frames, we show that this problem is decidable and coNP-complete for all finitely axiomatisable normal modal logics containing K4.3. It is thus not harder than entailment in these logics, which is in sharp contrast to other recent non-uniform interpolation results. We also extend our approach to Priorean temporal logics (with both past and future modalities) over the standard time flows-the integers, rationals, reals, and finite strict linear orders-none of which is blessed with the Craig interpolation property.
With the rising concern on model interpretability, the application of eXplainable AI (XAI) tools on deepfake detection models has been a topic of interest recently. In image classification tasks, XAI tools highlight pixels influencing the decision given by a model. This helps in troubleshooting the model and determining areas that may require further tuning of parameters. With a wide range of tools available in the market, choosing the right tool for a model becomes necessary as each one may highlight different sets of pixels for a given image. There is a need to evaluate different tools and decide the best performing ones among them. Generic XAI evaluation methods like insertion or removal of salient pixels/segments are applicable for general image classification tasks but may produce less meaningful results when applied on deepfake detection models due to their functionality. In this paper, we perform experiments to show that generic removal/insertion XAI evaluation methods are not suitable for deepfake detection models. We also propose and implement an XAI evaluation approach specifically suited for deepfake detection models.