Remote sensing and artificial intelligence are pivotal technologies in precision agriculture. The efficient retrieval of large-scale field imagery combined with machine learning techniques has shown success in various tasks such as phenotyping, weeding, cropping, and disease control. This work introduces a machine learning framework for automated large-scale plant-specific trait annotation for the use case of disease severity scoring for Cercospora Leaf Spot (CLS) in sugar beet. With concepts of Deep Label Distribution Learning (DLDL), special loss functions, and a tailored model architecture, we develop an efficient Vision Transformer-based model for disease severity scoring called SugarViT. One novelty of this work is the combination of remote sensing data with environmental parameters of the experimental sites for disease severity prediction. Although the model is evaluated on this specific use case, it is kept as generic as possible so that it is also applicable to various image-based classification and regression tasks. With our framework, it is even possible to train models on multi-objective problems, as we show by pretraining on environmental metadata.
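The label distribution idea mentioned in the abstract above can be illustrated with a minimal sketch. It assumes a hypothetical 1-10 severity scale, a Gaussian spread `sigma`, and a KL-divergence loss; none of these specifics are given in the abstract, and the function names are illustrative only.

```python
import math

def label_distribution(score, bins, sigma=1.0):
    """Turn a scalar label into a normalized discrete Gaussian over `bins`."""
    weights = [math.exp(-0.5 * ((b - score) / sigma) ** 2) for b in bins]
    total = sum(weights)
    return [w / total for w in weights]

def kl_divergence(target, predicted, eps=1e-12):
    """KL(target || predicted), a common loss between label distributions."""
    return sum(t * math.log((t + eps) / (p + eps))
               for t, p in zip(target, predicted))

def expected_score(dist, bins):
    """Collapse a predicted distribution back to a single severity score."""
    return sum(p * b for p, b in zip(dist, bins))

# Hypothetical severity scale (an assumption, not taken from the abstract).
bins = list(range(1, 11))
target = label_distribution(4.0, bins, sigma=1.0)
uniform = [1.0 / len(bins)] * len(bins)

print(round(expected_score(target, bins), 3))
print(kl_divergence(target, uniform) > kl_divergence(target, target))
```

Training against a soft distribution rather than a one-hot class lets neighbouring severity scores share probability mass, which is the usual motivation for label distribution learning on ordinal targets.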
Multiphysics incompressible fluid dynamics simulations play a crucial role in understanding intricate behaviors of many complex engineering systems that involve interactions between solids, fluids, and various phases like liquid and gas. Numerical modeling of these interactions has generated significant research interest in recent decades and has led to the development of open source simulation tools and commercial software products targeting specific applications or general problem classes in computational fluid dynamics. As the demand increases for these simulations to adapt to platform heterogeneity, ensure composability between different physics models, and effectively utilize inheritance within partial differentiation systems, a fundamental reconsideration of numerical solver design becomes imperative. The discussion presented in this paper emphasizes the importance of these considerations and introduces the Flash-X approach as a potential solution. The software design strategies outlined in the article serve as a guide for Flash-X developers, providing insights into complexities associated with performance portability, composability, and sustainable development. These strategies provide a foundation for improving design of both new and existing simulation tools grappling with these challenges. By incorporating the principles outlined in the Flash-X approach, engineers and researchers can enhance the adaptability, efficiency, and overall effectiveness of their numerical solvers in the ever-evolving field of multiphysics simulations.
We consider optimal experimental design (OED) for nonlinear Bayesian inverse problems governed by large-scale partial differential equations (PDEs). For the optimality criteria of Bayesian OED, we consider both expected information gain and summary statistics including the trace and determinant of the information matrix that involves the evaluation of the parameter-to-observable (PtO) map and its derivatives. However, it is prohibitive to compute and optimize these criteria when the PDEs are very expensive to solve, the parameters to estimate are high-dimensional, and the optimization problem is combinatorial, high-dimensional, and non-convex. To address these challenges, we develop an accurate, scalable, and efficient computational framework to accelerate the solution of Bayesian OED. In particular, the framework is developed based on derivative-informed neural operator (DINO) surrogates with proper dimension reduction techniques and a modified swapping greedy algorithm. We demonstrate the high accuracy of the DINO surrogates in the computation of the PtO map and the optimality criteria compared to high-fidelity finite element approximations. We also show that the proposed method is scalable with increasing parameter dimensions. Moreover, we demonstrate that it achieves high efficiency with over 1000X speedup compared to a high-fidelity Bayesian OED solution for a three-dimensional PDE example with tens of thousands of parameters, including both online evaluation and offline construction costs of the surrogates.
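To make the sensor-selection setting of the abstract above concrete, here is a minimal sketch of a plain swap-based greedy search. It is not the modified swapping greedy algorithm of the paper, and the criterion is a toy assumption: a two-parameter linear Gaussian model whose information matrix is a prior precision plus a sum of outer products of hypothetical sensor sensitivities, scored by its log-determinant.

```python
import math

def log_det_information(selected, sensitivities, prior_precision=1.0):
    """log det of a 2x2 information matrix: prior + sum of g g^T over sensors."""
    a = d = prior_precision
    b = 0.0
    for i in selected:
        g0, g1 = sensitivities[i]
        a += g0 * g0
        b += g0 * g1
        d += g1 * g1
    return math.log(a * d - b * b)

def swapping_greedy(n_candidates, budget, criterion, max_sweeps=20):
    """Start from an arbitrary design and accept any single swap that improves it."""
    selected = list(range(budget))
    best = criterion(selected)
    for _ in range(max_sweeps):
        improved = False
        for pos in range(budget):
            for cand in range(n_candidates):
                if cand in selected:
                    continue
                trial = selected[:pos] + [cand] + selected[pos + 1:]
                value = criterion(trial)
                if value > best + 1e-12:
                    selected, best, improved = trial, value, True
        if not improved:
            break  # no single swap helps: a local optimum of the swap neighbourhood
    return sorted(selected), best

# Hypothetical sensitivities of five candidate sensors to two parameters.
sensitivities = [(1.0, 0.0), (1.0, 0.0), (0.0, 1.0), (0.1, 0.1), (0.0, 2.0)]
design, value = swapping_greedy(
    len(sensitivities), 2, lambda s: log_det_information(s, sensitivities))
print(design, value)
```

In the setting of the abstract, each criterion evaluation would call a surrogate of the PtO map and its derivatives instead of this closed-form toy, which is where the reported speedup would come from.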
We explore some connections between association schemes and the analyses of the semidefinite programming (SDP) based convex relaxations of combinatorial optimization problems in the Lov\'{a}sz--Schrijver lift-and-project hierarchy. Our analysis of the relaxations of the stable set polytope leads to bounds on the clique and stability numbers of some regular graphs reminiscent of classical bounds by Delsarte and Hoffman, as well as the notion of deeply vertex-transitive graphs -- highly symmetric graphs that we show arise naturally from some association schemes. We also study relaxations of the hypergraph matching problem, and determine exactly or provide bounds on the lift-and-project ranks of these relaxations. Our proofs for these results also inspire the study of the general hypermatching pseudo-scheme, which is an association scheme except it is generally non-commutative. We then illustrate the usefulness of obtaining commutative subschemes from non-commutative pseudo-schemes via contraction in this context.
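For orientation, the classical Hoffman ratio bound referred to in the abstract above can be stated as follows; the notation ($n$ vertices, degree $k$, least adjacency eigenvalue $\lambda_{\min}$, stability number $\alpha(G)$) is chosen here and is not taken from the abstract.

```latex
\alpha(G) \;\le\; \frac{-\lambda_{\min}}{k - \lambda_{\min}}\, n
\qquad \text{for a $k$-regular graph $G$ on $n$ vertices.}
```

As a check, the Petersen graph has $n = 10$, $k = 3$, and $\lambda_{\min} = -2$, so the bound gives $10 \cdot 2 / 5 = 4$, which equals its stability number.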
Understanding object recognition patterns in mice is crucial for advancing behavioral neuroscience and has significant implications for human health, particularly in the realm of Alzheimer's research. This study is centered on the development, application, and evaluation of a state-of-the-art computational pipeline designed to analyze such behaviors, specifically focusing on Novel Object Recognition (NOR) and Spontaneous Location Recognition (SLR) tasks. The pipeline integrates three advanced computational models: Any-Maze for initial data collection, DeepLabCut for detailed pose estimation, and Convolutional Neural Networks (CNNs) for nuanced behavioral classification. Employed across four distinct mouse groups, this pipeline demonstrated high levels of accuracy and robustness. Despite certain challenges like video quality limitations and the need for manual calculations, the results affirm the pipeline's efficacy and potential for scalability. The study serves as a proof of concept for a multidimensional computational approach to behavioral neuroscience, emphasizing the pipeline's versatility and readiness for future, more complex analyses.
Super-resolution (SR) techniques have recently been proposed to upscale the outputs of neural radiance fields (NeRF) and generate high-quality images with enhanced inference speeds. However, existing NeRF+SR methods increase training overhead by using extra input features, loss functions, and/or expensive training procedures such as knowledge distillation. In this paper, we aim to leverage SR for efficiency gains without costly training or architectural changes. Specifically, we build a simple NeRF+SR pipeline that directly combines existing modules, and we propose a lightweight augmentation technique, random patch sampling, for training. Compared to existing NeRF+SR methods, our pipeline mitigates the SR computing overhead and can be trained up to 23x faster, making it feasible to run on consumer devices such as the Apple MacBook. Experiments show our pipeline can upscale NeRF outputs by 2-4x while maintaining high quality, increasing inference speeds by up to 18x on an NVIDIA V100 GPU and 12.8x on an M1 Pro chip. We conclude that SR can be a simple but effective technique for improving the efficiency of NeRF models for consumer devices.
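One plausible reading of the random patch sampling mentioned in the abstract above is sketched below: crop a random window from a low-resolution image together with the aligned window from its high-resolution counterpart. The abstract does not give the details, so the function name, the nested-list image format, and the integer `scale` are assumptions.

```python
import random

def sample_patch_pair(low_res, high_res, patch, scale, rng=random):
    """Crop a random patch x patch window and the aligned high-res window."""
    height, width = len(low_res), len(low_res[0])
    top = rng.randrange(height - patch + 1)
    left = rng.randrange(width - patch + 1)
    lr = [row[left:left + patch] for row in low_res[top:top + patch]]
    hr = [row[left * scale:(left + patch) * scale]
          for row in high_res[top * scale:(top + patch) * scale]]
    return lr, hr

# Toy images: each low-res pixel stores its own coordinates, and the
# high-res image is its nearest-neighbour upsampling by `scale`.
scale, size, patch = 2, 8, 4
low_res = [[(r, c) for c in range(size)] for r in range(size)]
high_res = [[(r // scale, c // scale) for c in range(size * scale)]
            for r in range(size * scale)]

rng = random.Random(0)
lr, hr = sample_patch_pair(low_res, high_res, patch, scale, rng)
print(len(lr), len(lr[0]), len(hr), len(hr[0]))
```

Sampling small aligned pairs rather than full frames keeps each training step cheap, which fits the abstract's goal of avoiding costly training procedures.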
Face recognition technology has advanced significantly in recent years due largely to the availability of large and increasingly complex training datasets for use in deep learning models. These datasets, however, typically comprise images scraped from news sites or social media platforms and, therefore, have limited utility in more advanced security, forensics, and military applications. These applications require lower resolution, longer ranges, and elevated viewpoints. To meet these critical needs, we collected and curated the first and second subsets of a large multi-modal biometric dataset designed for use in the research and development (R&D) of biometric recognition technologies under extremely challenging conditions. Thus far, the dataset includes more than 350,000 still images and over 1,300 hours of video footage of approximately 1,000 subjects. To collect this data, we used Nikon DSLR cameras, a variety of commercial surveillance cameras, specialized long-range R&D cameras, and Group 1 and Group 2 UAV platforms. The goal is to support the development of algorithms capable of accurately recognizing people at ranges up to 1,000 m and from high angles of elevation. These advances will include improvements to the state of the art in face recognition and will support new research in the area of whole-body recognition using methods based on gait and anthropometry. This paper describes methods used to collect and curate the dataset, and the dataset's characteristics at the current stage.
The accurate and interpretable prediction of future events in time-series data often requires capturing the representative patterns (also referred to as states) underpinning the observed data. To this end, most existing studies focus on the representation and recognition of states, but ignore the changing transitional relations among them. In this paper, we present the evolutionary state graph, a dynamic graph structure designed to systematically represent the evolving relations (edges) among states (nodes) over time. We analyze the dynamic graphs constructed from the time-series data and show that changes in the graph structures (e.g., edges connecting certain state nodes) can inform the occurrences of events (i.e., time-series fluctuations). Inspired by this, we propose a novel graph neural network model, Evolutionary State Graph Network (EvoNet), to encode the evolutionary state graph for accurate and interpretable time-series event prediction. Specifically, EvoNet models both the node-level (state-to-state) and graph-level (segment-to-segment) propagation, and captures the node-graph (state-to-segment) interactions over time. Experimental results based on five real-world datasets show that our approach not only achieves clear improvements compared with 11 baselines, but also provides more insights towards explaining the results of event predictions.
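The idea of a graph whose edges track state-to-state transitions over time can be sketched in a simplified form. The sketch assumes each time-series segment has already been assigned a single hard state label and that each graph covers a fixed number of transitions; both are assumptions made here, and the function names are hypothetical rather than taken from the paper.

```python
from collections import Counter

def evolutionary_state_graphs(state_sequence, window):
    """Build one transition-count graph per block of `window` transitions."""
    graphs = []
    for start in range(0, len(state_sequence) - 1, window):
        chunk = state_sequence[start:start + window + 1]
        # Edge (u, v) is weighted by how often state u is followed by state v.
        graphs.append(Counter(zip(chunk, chunk[1:])))
    return graphs

def graph_change(prev, curr):
    """Total absolute change in edge weights between two consecutive graphs."""
    edges = set(prev) | set(curr)
    return sum(abs(curr[e] - prev[e]) for e in edges)

# Hypothetical state labels for nine consecutive segments.
states = [0, 0, 1, 1, 2, 0, 2, 0, 2]
graphs = evolutionary_state_graphs(states, window=4)
changes = [graph_change(a, b) for a, b in zip(graphs, graphs[1:])]
print(graphs, changes)
```

A large value of `graph_change` marks an interval where the transition structure shifted, which is the kind of signal the abstract links to event occurrences.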
This work considers the question of how convenient access to copious data impacts our ability to learn causal effects and relations. In what ways is learning causality in the era of big data different from -- or the same as -- the traditional one? To answer this question, this survey provides a comprehensive and structured review of both traditional and frontier methods in learning causality and relations along with the connections between causality and machine learning. This work points out on a case-by-case basis how big data facilitates, complicates, or motivates each approach.
Retrieving object instances among cluttered scenes efficiently requires compact yet comprehensive regional image representations. Intuitively, object semantics can help build the index that focuses on the most relevant regions. However, due to the lack of bounding-box datasets for objects of interest among retrieval benchmarks, most recent work on regional representations has focused on either uniform or class-agnostic region selection. In this paper, we first fill the void by providing a new dataset of landmark bounding boxes, based on the Google Landmarks dataset, that includes $94k$ images with manually curated boxes from $15k$ unique landmarks. Then, we demonstrate how a trained landmark detector, using our new dataset, can be leveraged to index image regions and improve retrieval accuracy while being much more efficient than existing regional methods. In addition, we further introduce a novel regional aggregated selective match kernel (R-ASMK) to effectively combine information from detected regions into an improved holistic image representation. R-ASMK boosts image retrieval accuracy substantially at no additional memory cost, while even outperforming systems that index image regions independently. Our complete image retrieval system improves upon the previous state-of-the-art by significant margins on the Revisited Oxford and Paris datasets. Code and data will be released.
We introduce a multi-task setup of identifying and classifying entities, relations, and coreference clusters in scientific articles. We create SciERC, a dataset that includes annotations for all three tasks, and develop a unified framework called Scientific Information Extractor (SciIE) with shared span representations. The multi-task setup reduces cascading errors between tasks and leverages cross-sentence relations through coreference links. Experiments show that our multi-task model outperforms previous models in scientific information extraction without using any domain-specific features. We further show that the framework supports construction of a scientific knowledge graph, which we use to analyze information in scientific literature.