Interactive notebooks, such as Jupyter, have revolutionized the field of data science by providing an integrated environment for data, code, and documentation. However, their adoption by robotics researchers and model developers has been limited. This study investigates the logging and record-keeping practices of robotics researchers, drawing parallels to the pre-interactive notebook era of data science. Through interviews with robotics researchers, we identified the reliance on diverse and often incompatible tools for managing experimental data, leading to challenges in reproducibility and data traceability. Our findings reveal that robotics researchers can benefit from a specialized version of interactive notebooks that supports comprehensive data entry, continuous context capture, and agile data staging. We propose extending interactive notebooks to better serve the needs of robotics researchers by integrating features akin to traditional lab notebooks. This adaptation aims to enhance the organization, analysis, and reproducibility of experimental data in robotics, fostering a more streamlined and efficient research workflow.
Adaptive gradient algorithms have been widely adopted in training large-scale deep neural networks, especially large foundation models. Despite their huge success in practice, their theoretical advantages over stochastic gradient descent (SGD) have not been fully understood, especially in the large-batch setting commonly used in practice. This is because the only theoretical result demonstrating a benefit of Adagrad over SGD was obtained in the original Adagrad paper, for nonsmooth objective functions. However, for nonsmooth objective functions, convergence can slow down linearly as the batch size increases, so a convergence analysis based on the nonsmooth assumption cannot be applied to large-batch algorithms. In this work, we resolve this gap between theory and practice by providing a new analysis of Adagrad on both convex and nonconvex smooth objectives that is suitable for the large-batch setting. We show that under anisotropic smoothness and noise conditions, an increased batch size does not slow down convergence for Adagrad, so it can still achieve a faster convergence guarantee than SGD even in the large-batch setting. We present detailed comparisons between SGD and Adagrad to provide a better understanding of the benefits of adaptive gradient methods. Experiments on logistic regression and instruction-following fine-tuning tasks provide strong evidence supporting our theoretical analysis.
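For reference, a minimal sketch of the diagonal Adagrad update that such analyses concern, in standard notation (the mini-batch gradient $g_t$, step size $\eta$, and stabilizer $\epsilon$ are the usual textbook choices; the notation is ours, not necessarily the paper's exact setup):
\[
g_t = \frac{1}{B}\sum_{i=1}^{B} \nabla f(x_t; \xi_{t,i}), \qquad
v_t = v_{t-1} + g_t \odot g_t, \qquad
x_{t+1} = x_t - \eta \, \frac{g_t}{\sqrt{v_t} + \epsilon},
\]
with all operations taken coordinate-wise; this per-coordinate scaling is what distinguishes Adagrad from SGD's single global step size.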
This study examines the problem of determining whether to treat individuals based on observed covariates. The most common decision rule is the conditional empirical success (CES) rule proposed by Manski (2004), which assigns individuals to the treatments that yield the best experimental outcomes conditional on the observed covariates. In contrast, shrinkage estimators, which shrink unbiased but noisy preliminary estimates toward the average of these estimates, are a common tool in statistical estimation problems because it is well known that shrinkage estimators have smaller mean squared errors than unshrunk estimators. Inspired by this idea, we propose a computationally tractable shrinkage rule that selects the shrinkage factor by minimizing an upper bound on the maximum regret. We then compare the maximum regret of the proposed shrinkage rule with that of the CES and pooling rules when the parameter space is correctly specified or misspecified. Our theoretical results demonstrate that the shrinkage rule performs well in many cases, and these findings are further supported by numerical experiments. Specifically, we show that the maximum regret of the shrinkage rule can be strictly smaller than that of the CES and pooling rules in certain cases when the parameter space is correctly specified. In addition, we find that the shrinkage rule is robust against misspecification of the parameter space. Finally, we apply our method to experimental data from the National Job Training Partnership Act Study.
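To make the construction concrete (in generic notation of our own, not the paper's): given unbiased but noisy covariate-cell estimates $\hat{\theta}_1, \dots, \hat{\theta}_K$ of conditional treatment effects, a shrinkage rule with factor $\lambda \in [0, 1]$ treats cell $k$ when
\[
\lambda \, \hat{\theta}_k + (1 - \lambda) \, \bar{\theta} > 0, \qquad \bar{\theta} = \frac{1}{K} \sum_{j=1}^{K} \hat{\theta}_j,
\]
so that, in this sketch, $\lambda = 1$ corresponds to no shrinkage (a CES-type rule) and $\lambda = 0$ to full pooling; the proposal is to choose $\lambda$ by minimizing an upper bound on the maximum regret.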
Robotic assistance for experimental manipulation in the life sciences is expected to enable favorable outcomes regardless of the skill of the scientist. Experimental specimens in the life sciences are subject to individual variability and therefore require intricate algorithms for successful autonomous robotic control. As a use case, we study the creation of cranial windows in mice. This operation requires the removal of an 8-mm circular patch of the skull, which is approximately 300 um thick, but the shape and thickness of the mouse skull vary significantly with the strain, sex, and age of the mouse. In this work, we propose an autonomous robotic drilling method with no offline planning, consisting of a trajectory-planning block with execution-time feedback provided by completion-level recognition based on image and force information. The force information increases the completion-level resolution tenfold. We evaluate the proposed method in two ways: first, on an eggshell drilling task, where it achieved a success rate of 95% and an average drilling time of 7.1 min over 20 trials; and second, on postmortem mice, where it achieved a success rate of 70% and an average drilling time of 9.3 min over 20 trials.
In 2002, Kleinberg proposed three axioms for distance-based clustering and proved that no clustering method can satisfy all three. While there has been much subsequent work examining and modifying these axioms for distance-based clustering, little work has been done to explore axioms relevant to the graph partitioning problem when the graph is unweighted and given without a distance matrix. Here, we propose and explore axioms for graph partitioning in this case, including modifications of Kleinberg's axioms and three others: two axioms relevant to the ``Resolution Limit'' and one addressing well-connectedness. We prove that clustering under the Constant Potts Model satisfies all of the axioms, while Modularity clustering and iterative k-core both fail many of the axioms we pose. These theoretical properties of the clustering methods are relevant both for theoretical investigation and for practitioners considering which methods to use in their domain science studies.
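For context, the Constant Potts Model objective referenced above is usually written as follows (this is the standard formulation from the literature, not a quotation from this paper): a partition $\mathcal{C}$ of an unweighted graph is chosen to maximize
\[
\sum_{c \in \mathcal{C}} \left[ e_c - \gamma \binom{n_c}{2} \right],
\]
where $e_c$ is the number of edges within cluster $c$, $n_c$ is its number of nodes, and $\gamma > 0$ is a resolution parameter; the explicit resolution parameter is what makes axioms about the Resolution Limit natural to examine for this method.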
We introduce Meta-Reasoning Prompting (MRP), a novel and efficient system prompting method for large language models (LLMs) inspired by human meta-reasoning. Traditional in-context-learning-based reasoning techniques, such as Tree-of-Thoughts, show promise but lack consistent state-of-the-art performance across diverse tasks due to their specialized nature. MRP addresses this limitation by guiding LLMs to dynamically select and apply different reasoning methods based on the specific requirements of each task, optimizing both performance and computational efficiency. With MRP, LLM reasoning operates in two phases. Initially, the LLM identifies the most appropriate reasoning method using the task input cues and objective descriptions of the available methods. Subsequently, it applies the chosen method to complete the task. This dynamic strategy mirrors human meta-reasoning, allowing the model to excel across a wide range of problem domains. We evaluate the effectiveness of MRP through comprehensive benchmarks. The results demonstrate that MRP achieves or approaches state-of-the-art performance across diverse tasks. MRP represents a significant advance in enabling LLMs to identify the cognitive demands of each problem and to leverage the complementary strengths of different reasoning approaches, enhancing their ability to handle diverse and complex problem domains efficiently. Every LLM deserves Meta-Reasoning Prompting to unlock its full potential and ensure adaptability in an ever-evolving landscape of challenges and applications.
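A minimal sketch of the two-phase flow described above, with a placeholder `llm` callable and an illustrative method pool; the prompts, method names, and descriptions here are hypothetical stand-ins, not the paper's exact templates:

```python
# Sketch of Meta-Reasoning Prompting (MRP): phase 1 selects a reasoning
# method from objective descriptions, phase 2 applies it to the task.

from typing import Callable

# Illustrative pool of reasoning methods with one-line objective descriptions.
METHOD_POOL = {
    "chain_of_thought": "Reason step by step before giving the final answer.",
    "tree_of_thoughts": "Explore and compare multiple reasoning branches.",
    "program_aided": "Write and mentally execute code to compute the answer.",
}

def meta_reasoning_prompt(task: str, llm: Callable[[str], str]) -> str:
    # Phase 1: ask the LLM to pick the most suitable method for this task.
    menu = "\n".join(f"- {name}: {desc}" for name, desc in METHOD_POOL.items())
    selection = llm(
        "You are choosing a reasoning strategy.\n"
        f"Task: {task}\n"
        f"Available methods:\n{menu}\n"
        "Reply with the single method name that best fits this task."
    ).strip()
    method = selection if selection in METHOD_POOL else "chain_of_thought"

    # Phase 2: apply the chosen method to actually solve the task.
    return llm(
        f"Solve the task using the '{method}' strategy: {METHOD_POOL[method]}\n"
        f"Task: {task}"
    )

if __name__ == "__main__":
    # Stub LLM for a dry run; replace with a real model call in practice.
    fake_llm = lambda p: "chain_of_thought" if "choosing" in p else "42"
    print(meta_reasoning_prompt("What is 6 * 7?", fake_llm))
```

In practice the method descriptions would be the objective descriptions of available methods that the abstract mentions, and the stub would be replaced by a real model call.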
It was recently conjectured that every component of a discrete-time rational dynamical system is a solution to an algebraic difference equation that is linear in its highest-shift term (a quasi-linear equation). We prove that the conjecture holds in the special case of holonomic sequences, which can straightforwardly be represented by rational dynamical systems. We propose two algorithms for converting holonomic recurrence equations into such quasi-linear equations. The two algorithms differ in their efficiency and the minimality of orders in their outputs.
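To illustrate the terminology with a small worked example of our own choosing (not taken from the paper): the holonomic recurrence for $f(n) = n!$,
\[
f(n+1) = (n+1) \, f(n),
\]
carries the index $n$ explicitly in its coefficient. Substituting $n + 1 = f(n+1)/f(n)$ into the once-shifted recurrence $f(n+2) = (n+2) f(n+1)$ and clearing denominators gives
\[
f(n+2) \, f(n) = f(n+1)^2 + f(n+1) \, f(n),
\]
an algebraic difference equation with no explicit $n$ that is linear in its highest-shift term $f(n+2)$, i.e. quasi-linear.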
Recent contrastive representation learning methods rely on estimating mutual information (MI) between multiple views of an underlying context. For example, we can derive multiple views of a given image by applying data augmentation, or we can split a sequence into views comprising the past and future of some step in the sequence. Contrastive lower bounds on MI are easy to optimize, but they have a strong underestimation bias when estimating large amounts of MI. We propose decomposing the full MI estimation problem into a sum of smaller estimation problems by splitting one of the views into progressively more informed subviews and applying the chain rule of MI to the decomposed views. This expression contains a sum of unconditional and conditional MI terms, each measuring a modest chunk of the total MI, which facilitates approximation via contrastive bounds. To maximize the sum, we formulate a contrastive lower bound on the conditional MI which can be approximated efficiently. We refer to our general approach as Decomposed Estimation of Mutual Information (DEMI). We show that DEMI can capture a larger amount of MI than standard non-decomposed contrastive bounds in a synthetic setting, and that it learns better representations in a vision domain and for dialogue generation.
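For concreteness, the chain-rule decomposition the abstract refers to, written in generic notation (the subview indexing is ours): splitting a view $y$ into progressively more informed subviews $y_1, \dots, y_K$ gives
\[
I(x; y) = I(x; y_1) + \sum_{k=2}^{K} I(x; y_k \mid y_1, \dots, y_{k-1}),
\]
so each unconditional or conditional term measures a modest chunk of the total MI and can be approximated with a contrastive lower bound, including the conditional bound the method introduces.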
Graphical causal inference as pioneered by Judea Pearl arose from research on artificial intelligence (AI), and for a long time had little connection to the field of machine learning. This article discusses where links have been and should be established, introducing key concepts along the way. It argues that the hard open problems of machine learning and AI are intrinsically related to causality, and explains how the field is beginning to understand them.
Embedding entities and relations into a continuous multi-dimensional vector space has become the dominant approach to knowledge graph embedding in representation learning. However, most existing models fail to represent hierarchical knowledge, such as the similarities and dissimilarities of entities within one domain. We propose to learn domain representations on top of existing knowledge graph embedding models, such that entities with similar attributes are organized into the same domain. Such hierarchical knowledge of domains can provide further evidence for link prediction. Experimental results show that domain embeddings yield a significant improvement over the most recent state-of-the-art baseline knowledge graph embedding models.
Benefiting from the rapid development of deep learning techniques, salient object detection has recently achieved remarkable progress. However, two major challenges still hinder its application on embedded devices: low-resolution output and heavy model weight. To this end, this paper presents an accurate yet compact deep network for efficient salient object detection. More specifically, given a coarse saliency prediction in the deepest layer, we first employ residual learning to learn side-output residual features for saliency refinement, which can be achieved with very limited convolutional parameters while preserving accuracy. Second, we further propose reverse attention to guide such side-output residual learning in a top-down manner. By erasing the currently predicted salient regions from the side-output features, the network can progressively explore the missing object parts and details, which results in high resolution and accuracy. Experiments on six benchmark datasets demonstrate that the proposed approach compares favorably against state-of-the-art methods, with advantages in terms of simplicity, efficiency (45 FPS), and model size (81 MB).
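A minimal PyTorch-style sketch of the reverse-attention idea described above (layer names, shapes, and the residual head are illustrative assumptions, not the paper's exact architecture):

```python
# Sketch of reverse attention for side-output residual refinement:
# the coarse prediction from a deeper stage is inverted and used to
# re-weight side-output features so the network focuses on the
# not-yet-detected object parts and details.

import torch
import torch.nn.functional as F

def reverse_attention_refine(side_feat, coarse_pred, residual_conv):
    """side_feat: (B, C, H, W) side-output features at the current stage.
    coarse_pred: (B, 1, h, w) saliency logits from the deeper stage.
    residual_conv: a small conv block predicting a 1-channel residual."""
    # Upsample the coarse prediction to the side-output resolution.
    up = F.interpolate(coarse_pred, size=side_feat.shape[2:],
                       mode="bilinear", align_corners=False)
    # Reverse attention: emphasize regions NOT yet predicted as salient.
    rev = 1.0 - torch.sigmoid(up)
    attended = side_feat * rev
    # Learn a residual correction and add it to the coarse prediction.
    refined = up + residual_conv(attended)
    return refined

# Example usage with illustrative shapes and a tiny residual head.
if __name__ == "__main__":
    feat = torch.randn(2, 64, 56, 56)
    coarse = torch.randn(2, 1, 28, 28)
    head = torch.nn.Conv2d(64, 1, kernel_size=3, padding=1)
    print(reverse_attention_refine(feat, coarse, head).shape)  # (2, 1, 56, 56)
```

Erasing the already-predicted regions (the `1 - sigmoid` term) is what steers each side output toward the missing parts and details rather than re-learning the whole saliency map.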