For regression tasks one often leverages large datasets for training predictive machine learning models. However, using large datasets may not be feasible due to computational limitations or high data labelling costs. Therefore, suitably selecting small training sets from large pools of unlabelled data points is essential to maximize model performance while maintaining efficiency. In this work, we study Farthest Point Sampling (FPS), a data selection approach that aims to minimize the fill distance of the selected set. We derive an upper bound for the maximum expected prediction error, conditional to the location of the unlabelled data points, that linearly depends on the training set fill distance. For empirical validation, we perform experiments using two regression models on three datasets. We empirically show that selecting a training set by aiming to minimize the fill distance, thereby minimizing our derived bound, significantly reduces the maximum prediction error of various regression models, outperforming alternative sampling approaches by a large margin. Furthermore, we show that selecting training sets with the FPS can also increase model stability for the specific case of Gaussian kernel regression approaches.
Tracking ripening tomatoes is time consuming and labor intensive. Artificial intelligence technologies combined with those of computer vision can help users optimize the process of monitoring the ripening status of plants. To this end, we have proposed a tomato ripening monitoring approach based on deep learning in complex scenes. The objective is to detect mature tomatoes and harvest them in a timely manner. The proposed approach is declined in two parts. Firstly, the images of the scene are transmitted to the pre-processing layer. This process allows the detection of areas of interest (area of the image containing tomatoes). Then, these images are used as input to the maturity detection layer. This layer, based on a deep neural network learning algorithm, classifies the tomato thumbnails provided to it in one of the following five categories: green, brittle, pink, pale red, mature red. The experiments are based on images collected from the internet gathered through searches using tomato state across diverse languages including English, German, French, and Spanish. The experimental results of the maturity detection layer on a dataset composed of images of tomatoes taken under the extreme conditions, gave a good classification rate.
We develop a novel multiple hypothesis testing correction with family-wise error rate (FWER) control that efficiently exploits positive dependencies between potentially correlated statistical hypothesis tests. Our proposed algorithm $\texttt{max-rank}$ is conceptually straight-forward, relying on the use of a $\max$-operator in the rank domain of computed test statistics. We compare our approach to the frequently employed Bonferroni correction, theoretically and empirically demonstrating its superiority over Bonferroni in the case of existing positive dependency, and its equivalence otherwise. Our advantage over Bonferroni increases as the number of tests rises, and we maintain high statistical power whilst ensuring FWER control. We specifically frame our algorithm in the context of parallel permutation testing, a scenario that arises in our primary application of conformal prediction, a recently popularized approach for quantifying uncertainty in complex predictive settings.
In this work, we aim to establish a Bayesian adaptive learning framework by focusing on estimating latent variables in deep neural network (DNN) models. Latent variables indeed encode both transferable distributional information and structural relationships. Thus the distributions of the source latent variables (prior) can be combined with the knowledge learned from the target data (likelihood) to yield the distributions of the target latent variables (posterior) with the goal of addressing acoustic mismatches between training and testing conditions. The prior knowledge transfer is accomplished through Variational Bayes (VB). In addition, we also investigate Maximum a Posteriori (MAP) based Bayesian adaptation. Experimental results on device adaptation in acoustic scene classification show that our proposed approaches can obtain good improvements on target devices, and consistently outperforms other cut-edging algorithms.
The privacy in classical federated learning can be breached through the use of local gradient results by using engineered queries from the clients. However, quantum communication channels are considered more secure because the use of measurements in the data causes some loss of information, which can be detected. Therefore, the quantum version of federated learning can be used to provide more privacy. Additionally, sending an $N$ dimensional data vector through a quantum channel requires sending $\log N$ entangled qubits, which can provide exponential efficiency if the data vector is obtained as quantum states. In this paper, we propose a quantum federated learning model where fixed design quantum chips are operated based on the quantum states sent by a centralized server. Based on the coming superposition states, the clients compute and then send their local gradients as quantum states to the server, where they are aggregated to update parameters. Since the server does not send model parameters, but instead sends the operator as a quantum state, the clients are not required to share the model. This allows for the creation of asynchronous learning models. In addition, the model as a quantum state is fed into client-side chips directly; therefore, it does not require measurements on the upcoming quantum state to obtain model parameters in order to compute gradients. This can provide efficiency over the models where parameter vector is sent via classical or quantum channels and local gradients are obtained through the obtained values of these parameters.
Bayesian sampling is an important task in statistics and machine learning. Over the past decade, many ensemble-type sampling methods have been proposed. In contrast to the classical Markov chain Monte Carlo methods, these new methods deploy a large number of interactive samples, and the communication between these samples is crucial in speeding up the convergence. To justify the validity of these sampling strategies, the concept of interacting particles naturally calls for the mean-field theory. The theory establishes a correspondence between particle interactions encoded in a set of coupled ODEs/SDEs and a PDE that characterizes the evolution of the underlying distribution. This bridges numerical algorithms with the PDE theory used to show convergence in time. Many mathematical machineries are developed to provide the mean-field analysis, and we showcase two such examples: The coupling method and the compactness argument built upon the martingale strategy. The former has been deployed to show the convergence of ensemble Kalman sampler and ensemble Kalman inversion, and the latter will be shown to be immensely powerful in proving the validity of the Vlasov-Boltzmann simulator.
Graph-centric artificial intelligence (graph AI) has achieved remarkable success in modeling interacting systems prevalent in nature, from dynamical systems in biology to particle physics. The increasing heterogeneity of data calls for graph neural architectures that can combine multiple inductive biases. However, combining data from various sources is challenging because appropriate inductive bias may vary by data modality. Multimodal learning methods fuse multiple data modalities while leveraging cross-modal dependencies to address this challenge. Here, we survey 140 studies in graph-centric AI and realize that diverse data types are increasingly brought together using graphs and fed into sophisticated multimodal models. These models stratify into image-, language-, and knowledge-grounded multimodal learning. We put forward an algorithmic blueprint for multimodal graph learning based on this categorization. The blueprint serves as a way to group state-of-the-art architectures that treat multimodal data by choosing appropriately four different components. This effort can pave the way for standardizing the design of sophisticated multimodal architectures for highly complex real-world problems.
Incorporating prior knowledge into pre-trained language models has proven to be effective for knowledge-driven NLP tasks, such as entity typing and relation extraction. Current pre-training procedures usually inject external knowledge into models by using knowledge masking, knowledge fusion and knowledge replacement. However, factual information contained in the input sentences have not been fully mined, and the external knowledge for injecting have not been strictly checked. As a result, the context information cannot be fully exploited and extra noise will be introduced or the amount of knowledge injected is limited. To address these issues, we propose MLRIP, which modifies the knowledge masking strategies proposed by ERNIE-Baidu, and introduce a two-stage entity replacement strategy. Extensive experiments with comprehensive analyses illustrate the superiority of MLRIP over BERT-based models in military knowledge-driven NLP tasks.
We present ResMLP, an architecture built entirely upon multi-layer perceptrons for image classification. It is a simple residual network that alternates (i) a linear layer in which image patches interact, independently and identically across channels, and (ii) a two-layer feed-forward network in which channels interact independently per patch. When trained with a modern training strategy using heavy data-augmentation and optionally distillation, it attains surprisingly good accuracy/complexity trade-offs on ImageNet. We will share our code based on the Timm library and pre-trained models.
The remarkable practical success of deep learning has revealed some major surprises from a theoretical perspective. In particular, simple gradient methods easily find near-optimal solutions to non-convex optimization problems, and despite giving a near-perfect fit to training data without any explicit effort to control model complexity, these methods exhibit excellent predictive accuracy. We conjecture that specific principles underlie these phenomena: that overparametrization allows gradient methods to find interpolating solutions, that these methods implicitly impose regularization, and that overparametrization leads to benign overfitting. We survey recent theoretical progress that provides examples illustrating these principles in simpler settings. We first review classical uniform convergence results and why they fall short of explaining aspects of the behavior of deep learning methods. We give examples of implicit regularization in simple settings, where gradient methods lead to minimal norm functions that perfectly fit the training data. Then we review prediction methods that exhibit benign overfitting, focusing on regression problems with quadratic loss. For these methods, we can decompose the prediction rule into a simple component that is useful for prediction and a spiky component that is useful for overfitting but, in a favorable setting, does not harm prediction accuracy. We focus specifically on the linear regime for neural networks, where the network can be approximated by a linear model. In this regime, we demonstrate the success of gradient flow, and we consider benign overfitting with two-layer networks, giving an exact asymptotic analysis that precisely demonstrates the impact of overparametrization. We conclude by highlighting the key challenges that arise in extending these insights to realistic deep learning settings.
Deep learning constitutes a recent, modern technique for image processing and data analysis, with promising results and large potential. As deep learning has been successfully applied in various domains, it has recently entered also the domain of agriculture. In this paper, we perform a survey of 40 research efforts that employ deep learning techniques, applied to various agricultural and food production challenges. We examine the particular agricultural problems under study, the specific models and frameworks employed, the sources, nature and pre-processing of data used, and the overall performance achieved according to the metrics used at each work under study. Moreover, we study comparisons of deep learning with other existing popular techniques, in respect to differences in classification or regression performance. Our findings indicate that deep learning provides high accuracy, outperforming existing commonly used image processing techniques.