Curb detection is essential for environmental awareness in Automated Driving (AD), as curbs typically delimit drivable and non-drivable areas. Annotated data are necessary for developing and validating an AD function. However, public datasets with annotated point cloud curbs are scarce. This paper presents a method for detecting 3D curbs in a sequence of point clouds captured from a LiDAR sensor, which consists of two main steps. First, our approach detects the curbs at each scan using a segmentation deep neural network. Then, a sequence-level processing step estimates the 3D curbs in the reconstructed point cloud using the odometry of the vehicle. From these 3D points of the curb, we obtain polylines structured following the ASAM OpenLABEL standard. These detections can be used as pre-annotations in labelling pipelines to efficiently generate curb-related ground truth data. We validate our approach through an experiment in which different human annotators were asked to annotate curbs in a group of LiDAR-based sequences, with and without our automatically generated pre-annotations. The results show that our detections reduce the manual annotation time by 50.99% while maintaining the data quality level.
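The two-step structure described above can be summarized in a short sketch. The function names below (segment_curbs, fit_polylines) and the overall code organization are hypothetical placeholders, not the authors' implementation; only the flow (per-scan segmentation, odometry-based aggregation into a common frame, polyline export) follows the description in the abstract.

```python
import numpy as np

def detect_3d_curbs(scans, poses, segment_curbs, fit_polylines):
    """Illustrative two-step curb detection over a LiDAR sequence.

    scans         : list of (N_i, 3) point clouds, one per LiDAR scan
    poses         : list of 4x4 odometry matrices mapping each scan to a common frame
    segment_curbs : per-scan segmentation network returning a boolean curb mask
    fit_polylines : post-processing that turns 3D curb points into polylines
    """
    curb_points = []
    for points, pose in zip(scans, poses):
        mask = segment_curbs(points)                         # step 1: scan-level segmentation
        homogeneous = np.c_[points[mask], np.ones(mask.sum())]
        curb_points.append((pose @ homogeneous.T).T[:, :3])  # step 2: move points to world frame
    aggregated = np.vstack(curb_points)                      # reconstructed sequence-level cloud
    return fit_polylines(aggregated)                         # polylines, e.g. for ASAM OpenLABEL export
```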
The inability to naturally enforce safety in Reinforcement Learning (RL), with limited failures, is a core challenge impeding its use in real-world applications. One notion of safety of vast practical relevance is the ability to avoid (unsafe) regions of the state space. Though such a safety goal can be captured by an action-value-like function, a.k.a. safety critics, the associated operator lacks the desired contraction and uniqueness properties that the classical Bellman operator enjoys. In this work, we overcome the non-contractiveness of safety critic operators by leveraging the fact that safety is a binary property. To that end, we study the properties of the binary safety critic associated with a deterministic dynamical system that seeks to avoid reaching an unsafe region. We formulate the corresponding binary Bellman equation (B2E) for safety and study its properties. While the resulting operator is still non-contractive, we fully characterize its fixed points, which, except for a spurious solution, represent maximal persistently safe regions of the state space from which failure can always be avoided. We provide an algorithm that, by design, leverages axiomatic knowledge of safe data to avoid spurious fixed points.
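As a point of reference, one plausible form of a binary Bellman equation for safety in a deterministic system $x_{t+1} = f(x_t, a_t)$ with unsafe set $\mathcal{U}$ is sketched below; the paper's exact B2E is not reproduced here, so this should be read only as an illustration of how binary values replace discounted returns.
\[
B(x, a) \;=\; \max\Big\{\, \mathbf{1}\{f(x,a) \in \mathcal{U}\},\; \min_{a'} B\big(f(x,a), a'\big) \Big\}, \qquad B(x,a) \in \{0,1\},
\]
so that $B(x,a) = 0$ certifies that, taking action $a$ at state $x$, some policy avoids $\mathcal{U}$ forever. Note that the all-ones function also satisfies this equation, which illustrates the kind of spurious fixed point the abstract alludes to.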
Cluster randomization trials commonly employ multiple endpoints. When a single summary of treatment effects across endpoints is of primary interest, global hypothesis testing/effect estimation methods represent a common analysis strategy. However, specification of the joint distribution required by these methods is non-trivial, particularly when endpoint properties differ. We develop rank-based interval estimators for a global treatment effect referred to as the "global win probability," or the probability that a treatment individual responds better than a control individual on average. Using endpoint-specific ranks among the combined sample and within each arm, each individual-level observation is converted to a "win fraction," which quantifies the proportion of wins experienced over every observation in the comparison arm. An individual's multiple observations are then replaced by a single "global win fraction," constructed by averaging the win fractions across endpoints. A linear mixed model is applied directly to the global win fractions to recover point, variance, and interval estimates of the global win probability adjusted for clustering. Simulations demonstrate that our approach performs well with respect to coverage and type I error, and the methods are easily implemented using standard software. A case study using publicly available data is provided with corresponding R and SAS code.
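A minimal sketch of the win-fraction construction is given below, in Python rather than the paper's R/SAS code; the variable names, tie handling, and the mixed-model call are illustrative assumptions, and only the rank-based win-fraction logic follows the description above.

```python
import numpy as np
import statsmodels.api as sm

def win_fractions(values, arm):
    """Per-endpoint win fraction: the proportion of observations in the opposite arm
    that each observation beats (ties counted as half a win, an assumed convention)."""
    values, arm = np.asarray(values, float), np.asarray(arm)
    out = np.empty(len(values))
    for grp in np.unique(arm):
        own, other = values[arm == grp], values[arm != grp]
        wins = (own[:, None] > other).mean(axis=1) + 0.5 * (own[:, None] == other).mean(axis=1)
        out[arm == grp] = wins
    return out

def global_win_probability(df, endpoints):
    """df: one row per individual with columns 'cluster', 'arm' (0/1), and the endpoints."""
    wf = np.column_stack([win_fractions(df[e], df["arm"]) for e in endpoints])
    df = df.assign(global_wf=wf.mean(axis=1))        # average win fractions across endpoints
    model = sm.MixedLM.from_formula("global_wf ~ arm", groups="cluster", data=df)
    fit = model.fit()
    # The treatment arm's mean global win fraction (intercept plus arm effect) serves as an
    # estimate of the global win probability; interval estimates come from fit.conf_int().
    return fit
```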
Optimization under uncertainty is important in many applications, particularly to inform policy and decision making in areas such as public health. A key source of uncertainty arises from the incorporation of environmental variables as inputs into computational models or simulators. Such variables represent uncontrollable features of the optimization problem, and reliable decision making must account for the uncertainty they propagate to the simulator outputs. Often, multiple competing objectives are defined from these outputs, such that the final optimal decision is a compromise between different goals. Here, we present an emulation-based optimization methodology for such problems that extends expected quantile improvement (EQI) to address multi-objective optimization. Focusing on the practically important case of two objectives, we use a sequential design strategy to identify the Pareto front of optimal solutions. Uncertainty from the environmental variables is integrated out using Monte Carlo samples from the simulator. Interrogation of the expected output from the simulator is facilitated by the use of (Gaussian process) emulators. The methodology is demonstrated on an optimization problem from public health involving the dispersion of anthrax spores across a spatial terrain. Environmental variables include meteorological features that impact the dispersion, and the methodology identifies the Pareto front even when there is considerable input uncertainty.
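The sketch below illustrates the general ingredients only (Gaussian process emulators per objective, Monte Carlo integration over the environmental variables, and a non-dominated filter for the Pareto front); it omits the EQI acquisition criterion and the sequential design loop, and all names are illustrative rather than the authors' code.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def expected_objectives(gps, decisions, env_samples):
    """Average each emulated objective over Monte Carlo draws of the environmental variables.
    gps: one fitted GaussianProcessRegressor per objective, trained on (decision, environment) inputs."""
    means = []
    for x in decisions:
        inputs = np.column_stack([np.tile(x, (len(env_samples), 1)), env_samples])
        means.append([gp.predict(inputs).mean() for gp in gps])   # integrate out environmental uncertainty
    return np.array(means)                                        # shape (n_decisions, n_objectives)

def pareto_front(objectives):
    """Indices of non-dominated points, assuming both objectives are minimized."""
    keep = []
    for i, f in enumerate(objectives):
        dominated = np.any(np.all(objectives <= f, axis=1) & np.any(objectives < f, axis=1))
        if not dominated:
            keep.append(i)
    return keep
```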
In this work, some advances in the theory of curvature of two-dimensional probability manifolds corresponding to families of distributions are proposed. It is proved that location-scale distributions are hyperbolic in the Information Geometry sense even when the generatrix is non-even or non-smooth. A novel formula is obtained for the computation of curvature in the case of exponential families; this formula implies some new flatness criteria in dimension 2. Finally, it is observed that many two-parameter distributions widely used in applications are locally hyperbolic, which highlights the role of hyperbolic geometry in the study of commonly employed probability manifolds. These results have benefited from the use of explainable computational tools, which can substantially boost scientific productivity.
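For concreteness, recall the classical smooth case that the result above extends: for a location-scale family with density $\sigma^{-1} g\big((x-\mu)/\sigma\big)$ and a sufficiently regular, even generatrix $g$, the Fisher information metric takes the form
\[
ds^2 \;=\; \frac{c_1\, d\mu^2 + c_2\, d\sigma^2}{\sigma^2},
\]
with constants $c_1, c_2 > 0$ depending only on $g$. After rescaling $\mu$, this is the upper half-plane metric, so the manifold has constant negative curvature. For the Gaussian family, $c_1 = 1$ and $c_2 = 2$, giving curvature $-1/2$.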
The enhancement of 3D object detection is pivotal for precise environmental perception and improved task execution capabilities in autonomous driving. LiDAR point clouds, offering accurate depth information, serve as a crucial source of information for this purpose. Our study focuses on key challenges in 3D object detection. To tackle the challenge of expanding the receptive field of a 3D convolutional kernel, we introduce the Dynamic Feature Fusion Module (DFFM). This module achieves adaptive expansion of the 3D convolutional kernel's receptive field, balancing the expansion against acceptable computational load. This innovation reduces operations, expands the receptive field, and allows the model to dynamically adjust to different object requirements. Simultaneously, we identify redundant information in 3D features. The Feature Selection Module (FSM) quantitatively evaluates and eliminates unimportant features, decoupling output box fitting from feature extraction. This innovation enables the detector to focus on critical features, resulting in model compression, reduced computational burden, and minimized interference from candidate boxes. Extensive experiments confirm that both DFFM and FSM not only improve current benchmarks, particularly for small object detection, but also accelerate the network. Importantly, these modules complement each other effectively.
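A minimal, hypothetical sketch of a feature-selection gate in the spirit of FSM is given below; the actual module is not specified in this summary, so the gating-by-learned-importance design, the class name, and the keep_ratio parameter are all assumptions made purely for illustration.

```python
import torch
import torch.nn as nn

class FeatureSelectionGate(nn.Module):
    """Illustrative feature-selection gate: scores channels and suppresses low-importance ones."""
    def __init__(self, channels, keep_ratio=0.5):
        super().__init__()
        self.score = nn.Sequential(nn.AdaptiveAvgPool3d(1),
                                   nn.Conv3d(channels, channels, kernel_size=1),
                                   nn.Sigmoid())
        self.keep_ratio = keep_ratio

    def forward(self, x):                      # x: (B, C, D, H, W) voxel features
        weights = self.score(x)                # per-channel importance scores in [0, 1]
        k = int(x.shape[1] * self.keep_ratio)
        threshold = weights.flatten(1).topk(k, dim=1).values[:, -1]
        mask = (weights >= threshold.view(-1, 1, 1, 1, 1)).float()
        return x * weights * mask              # keep only the highest-scoring channels
```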
We show that the known list-decoding algorithms for univariate multiplicity and folded Reed-Solomon (FRS) codes can be made to run in nearly-linear time. This yields, to our knowledge, the first known family of codes that can be decoded in nearly linear time, even as they approach the list decoding capacity. Univariate multiplicity codes and FRS codes are natural variants of Reed-Solomon codes that were discovered and studied for their applications to list-decoding. It is known that for every $\epsilon >0$, and rate $R \in (0,1)$, there exist explicit families of these codes that have rate $R$ and can be list-decoded from a $(1-R-\epsilon)$ fraction of errors with constant list size in polynomial time (Guruswami & Wang (IEEE Trans. Inform. Theory, 2013) and Kopparty, Ron-Zewi, Saraf & Wootters (SIAM J. Comput. 2023)). In this work, we present randomized algorithms that perform the above tasks in nearly linear time. Our algorithms have two main components. The first builds upon the lattice-based approach of Alekhnovich (IEEE Trans. Inf. Theory 2005), who designed a nearly linear time list-decoding algorithm for Reed-Solomon codes approaching the Johnson radius. As part of the second component, we design nearly-linear time algorithms for two natural algebraic problems. The first algorithm solves linear differential equations of the form $Q\left(x, f(x), \frac{df}{dx}, \dots,\frac{d^m f}{dx^m}\right) \equiv 0$ where $Q$ has the form $Q(x,y_0,\dots,y_m) = \tilde{Q}(x) + \sum_{i = 0}^m Q_i(x)\cdot y_i$. The second solves functional equations of the form $Q\left(x, f(x), f(\gamma x), \dots,f(\gamma^m x)\right) \equiv 0$ where $\gamma$ is a high-order field element. These algorithms can be viewed as generalizations of classical algorithms of Sieveking (Computing 1972) and Kung (Numer. Math. 1974) for computing the modular inverse of a power series, and might be of independent interest.
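For context, the classical Sieveking-Kung procedure that these algorithms generalize computes the inverse of a power series $f$ with $f(0) \neq 0$ by Newton iteration with precision doubling:
\[
g_0 = f(0)^{-1}, \qquad g_{k+1} \equiv g_k\,(2 - f\, g_k) \pmod{x^{2^{k+1}}},
\]
so that $f g_k \equiv 1 \pmod{x^{2^k}}$ at every step. The cost is dominated by the last iteration, giving $O(M(n))$ time for $n$ coefficients, where $M(n)$ denotes the cost of polynomial multiplication.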
We establish finite-sample guarantees for efficient proper learning of bounded-degree polytrees, a rich class of high-dimensional probability distributions and a subclass of Bayesian networks, a widely-studied type of graphical model. Recently, Bhattacharyya et al. (2021) obtained finite-sample guarantees for recovering tree-structured Bayesian networks, i.e., 1-polytrees. We extend their results by providing an efficient algorithm which learns $d$-polytrees in polynomial time and sample complexity for any bounded $d$ when the underlying undirected graph (skeleton) is known. We complement our algorithm with an information-theoretic sample complexity lower bound, showing that the dependence on the dimension and target accuracy parameters is nearly tight.
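Recall that a $d$-polytree is a Bayesian network whose skeleton is a forest and whose nodes have at most $d$ parents, so the distribution factorizes as
\[
P(x_1, \dots, x_n) \;=\; \prod_{i=1}^{n} P\big(x_i \mid x_{\mathrm{pa}(i)}\big), \qquad |\mathrm{pa}(i)| \le d,
\]
and, roughly speaking, learning with a known skeleton amounts to orienting the edges and estimating these bounded-fan-in conditionals.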
We study propositional proof systems with inference rules that formalize restricted versions of the ability to make assumptions that hold without loss of generality, commonly used informally to shorten proofs. Each system we study is built on resolution. They are called BC${}^-$, RAT${}^-$, SBC${}^-$, and GER${}^-$, denoting respectively blocked clauses, resolution asymmetric tautologies, set-blocked clauses, and generalized extended resolution - all "without new variables." They may be viewed as weak versions of extended resolution (ER) since they are defined by first generalizing the extension rule and then taking away the ability to introduce new variables. Except for SBC${}^-$, they are known to be strictly between resolution and extended resolution. Several separations between these systems were proved earlier by exploiting the fact that they effectively simulate ER. We answer the questions left open: We prove exponential lower bounds for SBC${}^-$ proofs of a binary encoding of the pigeonhole principle, which separates ER from SBC${}^-$. Using this new separation, we prove that both RAT${}^-$ and GER${}^-$ are exponentially separated from SBC${}^-$. This completes the picture of their relative strengths.
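As a reminder of the base notion underlying these systems, a clause $C$ is blocked in a CNF formula $F$ with respect to a literal $l \in C$ if every resolvent of $C$ on $l$ is a tautology, i.e.
\[
\text{for all } D \in F \text{ with } \lnot l \in D:\quad (C \setminus \{l\}) \cup (D \setminus \{\lnot l\}) \text{ is tautological};
\]
the blocked-clause rule adds such clauses, and the ${}^-$ variants above restrict the added clause to variables already occurring in $F$.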
Comparisons of frequency distributions often invoke the concept of shift to describe directional changes in properties such as the mean. In the present study, we sought to define shift as a property in and of itself. Specifically, we define distributional shift (DS) as the concentration of frequencies away from the discrete class having the greatest value (e.g., the right-most bin of a histogram). We derive a measure of DS using the normalized sum of exponentiated cumulative frequencies. We then define relative distributional shift (RDS) as the difference in DS between two distributions, revealing the magnitude and direction by which one distribution is concentrated to lesser or greater discrete classes relative to another. We find that RDS is highly related to popular measures that, while based on the comparison of frequency distributions, do not explicitly consider shift. While RDS provides a useful complement to other comparative measures, DS allows shift to be quantified as a property of individual distributions, similar in concept to a statistical moment.
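A short sketch of one possible reading of the measure is given below; the exact exponentiation base and normalization used by the authors are not specified here, so the base parameter and the min-max rescaling are assumptions made only to illustrate the cumulative-frequency construction.

```python
import numpy as np

def distributional_shift(freqs, base=np.e):
    """Illustrative DS: normalized sum of exponentiated cumulative relative frequencies.
    Higher values indicate concentration away from the greatest (right-most) class."""
    p = np.asarray(freqs, float) / np.sum(freqs)
    cum = np.cumsum(p)                 # cumulative frequencies, left to right
    raw = np.sum(base ** cum)
    k = len(p)
    # Assumed normalization: rescale between the two extremes,
    # all mass in the right-most class (DS = 0) and all mass in the left-most class (DS = 1).
    lo = (k - 1) + base
    hi = k * base
    return (raw - lo) / (hi - lo)

def relative_distributional_shift(freqs_a, freqs_b):
    """RDS: difference in DS between two distributions over the same classes."""
    return distributional_shift(freqs_a) - distributional_shift(freqs_b)
```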
Dynamical systems across the sciences, from electrical circuits to ecological networks, undergo qualitative and often catastrophic changes in behavior, called bifurcations, when their underlying parameters cross a threshold. Existing methods predict oncoming catastrophes in individual systems but are primarily time-series-based and struggle both to categorize qualitative dynamical regimes across diverse systems and to generalize to real data. To address this challenge, we propose a data-driven, physics-informed deep-learning framework for classifying dynamical regimes and characterizing bifurcation boundaries based on the extraction of topologically invariant features. We focus on the paradigmatic case of the supercritical Hopf bifurcation, which is used to model periodic dynamics across a wide range of applications. Our convolutional attention method is trained with data augmentations that encourage the learning of topological invariants, which can be used to detect bifurcation boundaries in unseen systems and to design models of biological systems like oscillatory gene regulatory networks. We further demonstrate our method's use in analyzing real data by recovering distinct proliferation and differentiation dynamics along the pancreatic endocrinogenesis trajectory in gene expression space based on single-cell data. Our method provides valuable insights into the qualitative, long-term behavior of a wide range of dynamical systems, and can detect bifurcations or catastrophic transitions in large-scale physical and biological systems.
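For reference, the supercritical Hopf bifurcation studied here has the normal form (in polar coordinates)
\[
\dot r = r\,(\mu - r^2), \qquad \dot\theta = \omega,
\]
so the fixed point at the origin is stable for $\mu < 0$ and, as $\mu$ crosses zero, loses stability to a stable limit cycle of radius $\sqrt{\mu}$: this is the qualitative change of regime that the classifier is trained to recognize.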