Plant diseases are a major cause of production losses and can have a significant impact on the agricultural sector. Detecting pests as early as possible helps increase crop yields and production efficiency. Several robotic monitoring systems have been developed that collect data and provide a greater understanding of environmental processes. An agricultural robot can enable accurate and timely detection of pests by traversing the field autonomously and monitoring the entire cropped area within a field. However, in many cases it is impossible to sample all plants due to resource limitations. In this thesis, the development and evaluation of several sampling algorithms are presented to address the challenge faced by an agriculture-monitoring ground robot designed to locate insects in an agricultural field when complete sampling of all the plants is infeasible. Two situations were investigated in simulation models developed specially as part of this thesis: one where no a priori information on the insects is available, and one where prior information on the insect distributions within the field is known. For the first situation, seven algorithms were tested, each using a different approach to sample the field without prior knowledge of it. For the second situation, we present the development and evaluation of a dynamic sampling algorithm that uses real-time information to prioritize sampling at suspected points, locate hot spots, and adapt sampling plans accordingly. The algorithm's performance was compared to that of two existing algorithms using Tetranychidae data from previous research. Analyses revealed that the dynamic algorithm outperformed the others.
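As a toy illustration of the dynamic idea, the following Python sketch samples a gridded field under a fixed budget and boosts the priority of cells neighbouring a detection. All names (dynamic_sample, is_infested) are hypothetical; the thesis algorithms themselves are not reproduced here.

# Minimal sketch of a budgeted dynamic sampling loop: positive detections
# raise the priority of neighbouring cells so suspected hot spots are
# revisited first. Hypothetical names, not the thesis implementation.
import heapq
import random

def dynamic_sample(grid_w, grid_h, budget, is_infested, prior=None):
    """Sample up to `budget` cells, boosting neighbours of detections."""
    prior = prior or {}
    # Max-heap via negated priority; unknown cells get a random prior.
    heap = [(-prior.get((x, y), random.random()), (x, y))
            for x in range(grid_w) for y in range(grid_h)]
    heapq.heapify(heap)
    visited, detections = set(), []
    while heap and len(visited) < budget:
        _, cell = heapq.heappop(heap)
        if cell in visited:
            continue
        visited.add(cell)
        if is_infested(cell):                  # noisy field observation
            detections.append(cell)
            x, y = cell
            for dx in (-1, 0, 1):              # prioritise the 8-neighbourhood
                for dy in (-1, 0, 1):
                    nb = (x + dx, y + dy)
                    if (nb not in visited
                            and 0 <= nb[0] < grid_w and 0 <= nb[1] < grid_h):
                        heapq.heappush(heap, (-2.0, nb))  # boosted priority
    return detections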
Historical materials are abundant. Yet, piecing together how human knowledge has evolved and spread, both diachronically and synchronically, remains a challenge that can so far be addressed only very selectively. The vast volume of materials precludes comprehensive studies, given the restricted number of human specialists. However, as large amounts of historical materials are now available in digital form, there is a promising opportunity for AI-assisted historical analysis. In this work, we take a pivotal step towards analyzing vast historical corpora by employing innovative machine learning (ML) techniques, enabling in-depth historical insights on a grand scale. Our study centers on the evolution of knowledge within the `Sacrobosco Collection', a digitized collection of 359 early modern printed editions of textbooks on astronomy used at European universities between 1472 and 1650, comprising roughly 76,000 pages, many of which contain astronomical computation tables. An ML-based analysis of these tables helps unveil important facets of the spatio-temporal evolution of knowledge and innovation in mathematical astronomy in this period, as taught at European universities.
We consider uncertainty quantification for the Poisson problem subject to domain uncertainty. For the stochastic parameterization of the random domain, we use the model recently introduced by Kaarnioja, Kuo, and Sloan (SIAM J. Numer. Anal., 2020), in which a countably infinite number of independent random variables enter the random field as periodic functions. We develop lattice quasi-Monte Carlo (QMC) cubature rules for computing the expected value of the solution to the Poisson problem subject to domain uncertainty. These QMC rules can be shown to exhibit the higher-order cubature convergence rates permitted by the periodic setting, independently of the stochastic dimension of the problem. In addition, we present a complete error analysis for the problem, taking into account the approximation errors incurred by truncating the input random field to a finite number of terms and by discretizing the spatial domain using finite elements. The paper concludes with numerical experiments demonstrating the theoretical error estimates.
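To make the cubature concrete, here is a minimal randomly shifted rank-1 lattice rule in Python. The generating vector z must come from a suitable construction (e.g., a component-by-component algorithm); the two-dimensional Fibonacci lattice used in the example is only illustrative, and in the periodic setting of the paper even unshifted rules attain the higher-order rates.

# Sketch of a randomly shifted rank-1 lattice rule for E[f(y)] over the
# unit cube; random shifts provide a practical error estimate.
import numpy as np

def shifted_lattice_rule(f, z, n, n_shifts=8, rng=None):
    """Average f over n lattice points with `n_shifts` random shifts."""
    rng = np.random.default_rng(rng)
    k = np.arange(n)[:, None]                # (n, 1)
    base = (k * z[None, :] % n) / n          # (n, d) lattice points in [0,1)^d
    estimates = []
    for _ in range(n_shifts):
        shift = rng.random(len(z))
        pts = (base + shift) % 1.0           # shifted points, still in [0,1)^d
        estimates.append(np.mean([f(p) for p in pts]))
    return np.mean(estimates), np.std(estimates) / np.sqrt(n_shifts)

# Example: Fibonacci lattice z = (1, 233) with n = 610 points; the test
# integrand has exact integral 1 over the unit square.
est, err = shifted_lattice_rule(
    lambda y: np.prod(1 + 0.1 * (y - 0.5)), np.array([1, 233]), 610)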
Phylogenetics is a branch of computational biology that studies the evolutionary relationships among biological entities. Its long history and numerous applications notwithstanding, inference of phylogenetic trees from sequence data remains challenging: the high complexity of tree space poses a significant obstacle for the current combinatorial and probabilistic techniques. In this paper, we adopt the framework of generative flow networks (GFlowNets) to tackle two core problems in phylogenetics: parsimony-based and Bayesian phylogenetic inference. Because GFlowNets are well-suited for sampling complex combinatorial structures, they are a natural choice for exploring and sampling from the multimodal posterior distribution over tree topologies and evolutionary distances. We demonstrate that our amortized posterior sampler, PhyloGFN, produces diverse and high-quality evolutionary hypotheses on real benchmark datasets. PhyloGFN is competitive with prior works in marginal likelihood estimation and achieves a closer fit to the target distribution than state-of-the-art variational inference methods.
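For readers unfamiliar with the parsimony objective, the following sketch implements the classical Fitch small-parsimony score for a single character on a rooted binary tree, i.e., the kind of reward signal a parsimony-based tree sampler can target. It is an illustration, not PhyloGFN's code.

# Fitch small parsimony: minimum number of state changes on a rooted
# binary tree, computed bottom-up with set intersections/unions.
def fitch_score(tree, leaf_states):
    """`tree`: a leaf name (str) or a (left, right) tuple of subtrees.
    `leaf_states`: dict mapping leaf name -> character state."""
    def visit(node):
        if isinstance(node, str):                 # leaf
            return {leaf_states[node]}, 0
        (ls, lc), (rs, rc) = visit(node[0]), visit(node[1])
        common = ls & rs
        if common:                                # intersection: no extra change
            return common, lc + rc
        return ls | rs, lc + rc + 1               # union: one mutation needed
    _, score = visit(tree)
    return score

# Example: ((A,B),(C,D)) with states A=0, B=0, C=1, D=1 needs one change.
print(fitch_score((("A", "B"), ("C", "D")), {"A": 0, "B": 0, "C": 1, "D": 1}))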
Data on neighbourhood characteristics are not typically collected in epidemiological studies. They are, however, useful in the study of small-area health inequalities. Neighbourhood characteristics are collected in some surveys and could be linked to the data of other studies. We propose to use kriging, based on semi-variogram models, to predict values at non-observed locations, with the aim of constructing bespoke indices of neighbourhood characteristics that can be linked to data from epidemiological studies. We perform a simulation study to assess the feasibility of the method, as well as a case study using data from the RECORD study. Apart from having enough observed data at small distances from the non-observed locations, a well-fitting semi-variogram model, a larger range, and the absence of a nugget effect are factors leading to higher reliability.
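A minimal ordinary-kriging predictor is sketched below, assuming an exponential semi-variogram; the nugget, sill, and range values are placeholders, not parameters fitted to the RECORD data.

# Ordinary kriging in semi-variogram form: solve the bordered linear
# system for the weights, then predict as a weighted sum of observations.
import numpy as np

def exp_semivariogram(h, nugget=0.0, sill=1.0, rng_param=300.0):
    return nugget + (sill - nugget) * (1.0 - np.exp(-h / rng_param))

def ordinary_kriging(obs_xy, obs_z, target_xy, gamma=exp_semivariogram):
    """Predict the value at `target_xy` from observations at `obs_xy`."""
    n = len(obs_xy)
    d = np.linalg.norm(obs_xy[:, None, :] - obs_xy[None, :, :], axis=-1)
    # Semi-variogram matrix bordered by the unbiasedness constraint
    # (weights sum to one, enforced via a Lagrange multiplier).
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = gamma(d)
    A[n, n] = 0.0
    b = np.ones(n + 1)
    b[:n] = gamma(np.linalg.norm(obs_xy - target_xy, axis=-1))
    weights = np.linalg.solve(A, b)[:n]
    return float(weights @ obs_z)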
Recurrent neural networks (RNNs) have yielded promising results both for recognizing objects in challenging conditions and for modeling aspects of primate vision. However, the representational dynamics of recurrent computations remain poorly understood, especially in large-scale visual models. Here, we studied such dynamics in RNNs trained for object classification on MiniEcoset, a novel subset of ecoset. We report two main insights. First, during inference, representations continued to evolve after correct classification, suggesting that the networks lack a notion of being ``done with classification''. Second, using ``readout zones'' to characterize the activation trajectories, we observe that misclassified representations exhibit activation patterns with lower L2 norm and are positioned more peripherally in the readout zones. Such arrangements help the misclassified representations move into the correct zones as time progresses. Our findings generalize to networks with lateral and top-down connections, including both additive and multiplicative interactions with the bottom-up sweep. The results therefore contribute to a general understanding of RNN dynamics in naturalistic tasks. We hope that the analysis framework will aid future investigations of other types of RNNs, including the study of representational dynamics in primate vision.
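The norm comparison can be reproduced schematically as follows; the function name and array shapes are assumptions, not the authors' code.

# Compare the L2 norm of pre-readout activations over time for inputs the
# network ultimately classifies correctly vs. incorrectly.
import numpy as np

def norm_by_correctness(activations, logits, labels):
    """`activations`: (time, batch, features); `logits`: (time, batch, classes);
    `labels`: (batch,). Returns per-timestep mean norms for each group."""
    final_pred = logits[-1].argmax(axis=-1)
    correct = final_pred == labels
    norms = np.linalg.norm(activations, axis=-1)      # (time, batch)
    return norms[:, correct].mean(axis=1), norms[:, ~correct].mean(axis=1)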
Projected distributions have proved to be useful in the study of circular and directional data. Although any multivariate distribution can be used to produce a projected model, these distributions are typically parametric. In this article, we consider a multivariate P\'olya tree on $R^k$ and project it onto the unit hypersphere $S^k$ to define a new Bayesian nonparametric model for directional data. We study the properties of the proposed model and, in particular, concentrate on the implied conditional distributions of some directions given the others to define a directional-directional regression model. We also define a multivariate linear regression model with P\'olya tree error and project it to define a linear-directional regression model. We obtain the posterior characterisation of all models and show their performance with simulated and real datasets.
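The projection step itself is simple, as the sketch below shows; a Gaussian stands in for the multivariate P\'olya tree, whose sampler is beyond the scope of this snippet, and all names are illustrative.

# Draw from any k-variate distribution on R^k and map x -> x/||x|| onto
# the unit hypersphere to obtain directional samples.
import numpy as np

def projected_sample(sampler, n, rng=None):
    rng = np.random.default_rng(rng)
    x = sampler(n, rng)                        # (n, k) draws on R^k
    return x / np.linalg.norm(x, axis=1, keepdims=True)

# Example with a standard trivariate Gaussian as the base distribution.
directions = projected_sample(lambda n, rng: rng.normal(size=(n, 3)), 1000)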
RRAM-based multi-core systems improve the energy efficiency and performance of CNNs. However, the distributed parallel execution of convolutional layers causes critical data dependencies that limit the potential speedup. This paper presents synchronization techniques for the parallel inference of convolutional layers on RRAM-based computing-in-memory (CIM) architectures. We propose an architecture optimization that enables efficient data exchange, and we discuss the impact of different architecture setups on performance. The corresponding compiler algorithms are optimized for high speedup and low memory consumption during CNN inference. We achieve more than 99% of the theoretical acceleration limit with a marginal data-transmission overhead of less than 4% for state-of-the-art CNN benchmarks.
DBSCAN is a fundamental density-based clustering technique that can identify clusters of arbitrary shape. However, it becomes infeasible when handling big data. Centroid-based clustering, on the other hand, is important for detecting patterns in a dataset, since unprocessed data points can be labeled with their nearest centroid; however, it cannot detect non-spherical clusters. For large data, it is not feasible to store and compute the labels of every sample in advance; this can instead be done as and when the information is required. The purpose is accomplished when clustering acts as a tool to identify cluster representatives, and a query is served by assigning the cluster label of the nearest representative. In this paper, we propose an Incremental Prototype-based DBSCAN (IPD) algorithm designed to identify arbitrary-shaped clusters in large-scale data. Additionally, it chooses a set of representatives for each cluster.
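Serving a query from cluster representatives reduces to a nearest-prototype lookup, sketched below; the prototype-selection step of IPD itself is not reproduced, and the names are illustrative.

# Label each query point with the cluster of its nearest representative.
import numpy as np

def assign_labels(queries, prototypes, proto_labels):
    """`queries`: (q, d); `prototypes`: (p, d); `proto_labels`: (p,)."""
    d = np.linalg.norm(queries[:, None, :] - prototypes[None, :, :], axis=-1)
    return proto_labels[d.argmin(axis=1)]      # (q,) cluster labels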
The decoding performance of conventional belief propagation decoders for short and moderate-length LDPC codes is severely limited by message dependence induced by the code structure. Despite similar error-rate performance, we found that the decoding failures of different decoders, characterized by the cross-entropy metric, leave differing room for improvement under ordered statistics decoding (OSD) postprocessing. Bearing in mind that a higher-order postprocessor ensures better performance but incurs greater complexity, we propose dynamically assigning the search scope for each decoding pattern in OSD. Furthermore, segmenting the decoding patterns, with the segmentation determined on the fly by the number of swaps required to reduce the parity-check matrix to systematic form via Gaussian elimination, also helps reduce complexity. Compared with existing methods, our adaptive strategy saves most of the memory consumption and avoids inefficient searching of codeword candidates in extensive simulations, especially for longer codes, at the cost of marginal performance loss.
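The segmentation signal can be illustrated by counting the column swaps needed during Gaussian elimination over GF(2), as in the sketch below; in OSD the columns are first permuted by reliability, which this simplified version omits.

# Reduce a binary parity-check matrix to systematic form over GF(2) and
# count the column swaps required, a proxy for the paper's segmentation cue.
import numpy as np

def systematic_form_swaps(H):
    """Returns (systematic H, number of column swaps)."""
    H = H.copy() % 2
    m, n = H.shape
    swaps = 0
    for r in range(m):
        # If column r has no pivot at or below row r, swap in a later
        # column that does (these swaps drive the order/segment choice).
        if not H[r:, r].any():
            for c in range(r + 1, n):
                if H[r:, c].any():
                    H[:, [r, c]] = H[:, [c, r]]
                    swaps += 1
                    break
        pivot_rows = np.nonzero(H[r:, r])[0]
        if len(pivot_rows) == 0:
            continue                           # rank deficiency: skip row
        p = r + pivot_rows[0]
        H[[r, p]] = H[[p, r]]                  # row swap to place the pivot
        for i in range(m):
            if i != r and H[i, r]:
                H[i] ^= H[r]                   # XOR-eliminate column r elsewhere
    return H, swaps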
Estimating the model parameters of a general family of cure models is a challenging task, mainly due to the flatness and multimodality of the likelihood function. In this work, we propose a fully Bayesian approach to overcome these issues. Posterior inference is carried out by constructing a Metropolis-coupled Markov chain Monte Carlo (MCMC) sampler, which combines Gibbs sampling for the latent cure indicators with Metropolis-Hastings steps using Langevin diffusion dynamics for the parameter updates. The main MCMC algorithm is embedded within a parallel tempering scheme that considers heated versions of the target posterior distribution. Simulations demonstrate that the proposed algorithm freely explores the multimodal posterior distribution and produces robust point estimates, outperforming maximum likelihood estimation via the Expectation-Maximization algorithm. A by-product of our Bayesian implementation is the ability to control the False Discovery Rate when classifying items as cured or not. Finally, the proposed method is illustrated on a real dataset concerning recidivism among offenders released from prison; the event of interest is whether the offender was re-incarcerated after probation.
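A minimal parallel-tempering swap move is sketched below; the within-temperature updates (the Gibbs and Langevin steps of the paper) are omitted, and all names are illustrative.

# One round of Metropolis swap proposals between adjacent tempered chains:
# chain i targets pi(x)^{1/T_i}, and neighbours exchange states with the
# standard acceptance ratio exp((b_i - b_j) * (logpi(x_j) - logpi(x_i))).
import numpy as np

def pt_swap(states, log_post, temps, rng):
    for i in range(len(temps) - 1):
        b1, b2 = 1.0 / temps[i], 1.0 / temps[i + 1]
        lp1, lp2 = log_post(states[i]), log_post(states[i + 1])
        if np.log(rng.random()) < (b1 - b2) * (lp2 - lp1):
            states[i], states[i + 1] = states[i + 1], states[i]
    return states

# Usage: interleave pt_swap with per-chain MCMC updates, e.g.
# states = pt_swap(states, log_post, temps=[1.0, 1.5, 2.25],
#                  rng=np.random.default_rng(0))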