We present DyFOS, an active perception method that dynamically finds optimal states to minimize localization uncertainty while avoiding obstacles and occlusions. We consider the scenario where a perception-denied rover relies on position and uncertainty measurements from a viewer robot to localize itself along an obstacle-filled path. The position uncertainty from the viewer's sensor is a function of the states of the sensor itself, the rover, and the surrounding environment. To find an optimal sensor state that minimizes the rover's localization uncertainty, DyFOS uses a localization uncertainty prediction pipeline within an optimization search. Given samples of the states above, the pipeline predicts the rover's localization uncertainty using a trained state-dependent sensor measurement model (a probabilistic neural network). The pipeline also predicts occlusions and obstacle collisions to discard undesirable viewer states and avoid unnecessary computation. We evaluate the proposed method numerically and in simulation. Our results show that DyFOS is faster than a brute-force search yet performs on par with it, and yields lower localization uncertainties than faster random and heuristic-based searches.
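To make the search concrete, here is a minimal Python sketch of the kind of sampling-based search DyFOS performs: candidate viewer states are pruned by occlusion and collision checks, and the survivors are scored by a predicted measurement covariance. The callables `predict_covariance`, `is_occluded`, and `in_collision` are hypothetical stand-ins for the paper's learned sensor model and geometric checks, not the authors' API.

```python
# Hypothetical sketch of a DyFOS-style sampling-based search; the three
# callables stand in for the paper's learned measurement model and
# geometric checks and are not the authors' actual interfaces.
import numpy as np

def find_optimal_viewer_state(candidate_states, rover_state, environment,
                              predict_covariance, is_occluded, in_collision):
    """Return the candidate viewer state minimizing predicted uncertainty."""
    best_state, best_cost = None, np.inf
    for s in candidate_states:
        # Prune viewer states that cannot yield a useful measurement.
        if in_collision(s, environment) or is_occluded(s, rover_state, environment):
            continue
        # The learned sensor model maps (viewer, rover, environment) states
        # to a predicted covariance of the rover's position measurement.
        cov = predict_covariance(s, rover_state, environment)
        cost = np.trace(cov)  # scalarize uncertainty, e.g. A-optimality
        if cost < best_cost:
            best_state, best_cost = s, cost
    return best_state, best_cost
```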
Instrumental variable (IV) strategies are widely used in political science to establish causal relationships. However, the identifying assumptions required by an IV design are demanding, and it remains challenging for researchers to assess their validity. In this paper, we replicate 67 papers published in three top journals in political science during 2010-2022 and identify several troubling patterns. First, researchers often overestimate the strength of their IVs due to non-i.i.d. errors, such as a clustering structure. Second, the most commonly used t-test for the two-stage-least-squares (2SLS) estimates often severely underestimates uncertainty. Using more robust inferential methods, we find that around 19-30% of the 2SLS estimates in our sample are underpowered. Third, in the majority of the replicated studies, the 2SLS estimates are much larger than the ordinary-least-squares estimates, and their ratio is negatively correlated with the strength of the IVs in studies where the IVs are not experimentally generated, suggesting potential violations of unconfoundedness or the exclusion restriction. To help researchers avoid these pitfalls, we provide a checklist for better practice.
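To fix ideas about the estimator under discussion, below is a minimal numpy sketch of 2SLS on synthetic data with a single instrument; the data-generating process is invented for illustration and comes from none of the replicated studies. The closing comment mirrors the paper's second finding about understated uncertainty.

```python
# Minimal numpy sketch of two-stage least squares with one endogenous
# regressor and one instrument; purely synthetic, illustrative data.
import numpy as np

rng = np.random.default_rng(0)
n = 1000
z = rng.normal(size=n)                    # instrument
u = rng.normal(size=n)                    # unobserved confounder
d = 0.5 * z + u + rng.normal(size=n)      # endogenous treatment
y = 1.0 * d + u + rng.normal(size=n)      # outcome; true effect = 1.0

# First stage: regress treatment on the instrument, keep fitted values.
Z = np.column_stack([np.ones(n), z])
d_hat = Z @ np.linalg.lstsq(Z, d, rcond=None)[0]

# Second stage: regress outcome on the fitted treatment.
X = np.column_stack([np.ones(n), d_hat])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
print("2SLS estimate:", beta[1])          # close to 1.0 for large n

# Caveat the paper stresses: feeding d_hat into an OLS routine and reusing
# its default standard errors understates uncertainty, because the
# first-stage estimation error (and any clustering structure in the errors)
# is ignored; cluster-robust or weak-IV-robust inference is needed.
```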
Recently, there has been growing interest in determining the minimum width required for deep, narrow Multi-Layer Perceptrons (MLPs) to have the universal approximation property. One particularly difficult case is approximating a continuous function under the uniform norm, where a significant gap remains between the known lower and upper bounds. To address this problem, we propose a framework that reduces finding the minimum width for deep, narrow MLPs to determining a purely geometric function $w(d_x, d_y)$, which depends only on the input and output dimensions $d_x$ and $d_y$. Two key steps support this framework. First, we show that deep, narrow MLPs, when granted a small additional width, can approximate a $C^2$-diffeomorphism. Second, using this result, we prove that $w(d_x, d_y)$ equals the optimal minimum width required for deep, narrow MLPs to achieve universality. Combining this framework with the Whitney embedding theorem, we obtain an upper bound on the minimum width of $\max(2d_x+1, d_y) + \alpha(\sigma)$, where $0 \leq \alpha(\sigma) \leq 2$ is a constant depending on the activation function. Furthermore, we prove a lower bound of $4$ on the minimum width when the input and output dimensions both equal two.
Modern policy optimization methods in reinforcement learning, such as TRPO and PPO, owe their success to the use of parameterized policies. However, while theoretical guarantees have been established for this class of algorithms, especially in the tabular setting, the use of general parameterization schemes remains mostly unjustified. In this work, we introduce a novel framework for policy optimization based on mirror descent that naturally accommodates general parameterizations. The policy class induced by our scheme recovers known classes, e.g., softmax, and generates new ones depending on the choice of mirror map. Using our framework, we obtain the first result guaranteeing linear convergence for a policy-gradient-based method with general parameterization. To demonstrate the framework's ability to accommodate general parameterization schemes, we derive its sample complexity when using shallow neural networks, show that this improves upon the previous best results, and empirically validate our theoretical claims on classic control tasks.
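As a toy illustration of how the mirror map shapes the policy class, the sketch below runs tabular mirror descent under the negative-entropy mirror map, which recovers the familiar softmax-style multiplicative update $\pi' \propto \pi \exp(\eta Q)$. The paper's framework covers general mirror maps and parameterized policies; this example only illustrates the tabular special case.

```python
# Tabular mirror descent on one state's action distribution, using the
# negative-entropy mirror map (so the update is pi' ∝ pi * exp(eta * Q)).
# Illustrative only; the paper's framework is far more general.
import numpy as np

def mirror_descent_step(pi, q_values, eta):
    """One mirror descent step on a single state's action distribution."""
    # Dual-space ascent under negative entropy: grad h(pi) = log pi + 1.
    logits = np.log(pi) + eta * q_values
    new_pi = np.exp(logits - logits.max())   # map back via grad h*, stably
    return new_pi / new_pi.sum()

pi = np.full(4, 0.25)                        # uniform policy over 4 actions
q = np.array([1.0, 0.5, 0.0, -0.5])          # action values for this state
for _ in range(10):
    pi = mirror_descent_step(pi, q, eta=0.5)
print(pi)  # probability mass concentrates on the highest-value action
```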
Cooperative use of Unmanned Aerial Vehicles (UAVs) in public and military surveillance applications has attracted significant attention in recent years. Most UAVs are equipped with sensors of bounded coverage and wireless communication equipment of limited range, limitations that make monitoring mobile targets challenging. This paper addresses the surveillance objective of achieving better coverage while building a resilient, long-lived communication network among UAVs. We study the multiple-target-tracking problem by including a relay UAV in the fleet whose trajectory is computed autonomously to maintain a reliably connected network among all UAVs. Optimization problems are formulated for single-hop and multi-hop communications among UAVs. Three heuristic algorithms are proposed for multi-hop communications, and their performance is evaluated. A hybrid algorithm, which dynamically switches between single-hop and multi-hop communications, is also proposed. The effect of the time horizon used in the optimization problem is studied as well. Performance evaluation results show that the relay trajectories generated by the hybrid algorithm achieve network lifetimes within 5% of the maximum possible lifetime, i.e., the lifetime attainable if the entire trajectories of all targets were known a priori.
One of the most famous conjectures in combinatorial optimization is the four-thirds conjecture, which states that the integrality gap of the subtour LP relaxation of the TSP is equal to $\frac43$. For 40 years, the best known upper bound was 1.5, due to Wolsey (1980). Recently, Karlin, Klein, and Oveis Gharan (2022) showed that the max entropy algorithm for the TSP gives an improved bound of $1.5 - 10^{-36}$. In this paper, we show that the approximation ratio of the max entropy algorithm is at least 1.375, even for graphic TSP. Thus the max entropy algorithm does not appear to be the algorithm that will ultimately resolve the four-thirds conjecture in the affirmative, should that be possible.
We propose an extension of Akaike's relative power contribution that can be applied to data with correlated noise terms. The method decomposes the power spectrum into contributions from the independent noises plus additional terms arising from the correlation between each pair of noises. Numerical examples confirm that correlated noise can have the effect of reducing the power spectrum.
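For intuition, such a decomposition can be written via the standard spectral factorization of a multivariate AR model; the notation below is ours and is only meant to be consistent with the abstract, not a reproduction of the paper's derivation.

```latex
% For a multivariate AR process x_t = \sum_m A_m x_{t-m} + \varepsilon_t with
% noise covariance \Sigma = (\sigma_{jk}) and transfer function
% H(f) = \bigl(I - \sum_m A_m e^{-2\pi i m f}\bigr)^{-1}, the spectral matrix
% factorizes as S(f) = H(f)\,\Sigma\,H(f)^{*}, so component i satisfies
\[
S_{ii}(f)
  = \underbrace{\sum_{j} \lvert H_{ij}(f)\rvert^{2}\,\sigma_{jj}}_{\text{independent noises}}
  + \underbrace{\sum_{j \neq k} H_{ij}(f)\,\sigma_{jk}\,\overline{H_{ik}(f)}}_{\text{correlation terms}}.
\]
% Akaike's original relative power contribution keeps only the first sum
% (diagonal \Sigma); retaining the cross terms, which can be negative,
% shows how correlated noise can reduce the power spectrum.
```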
Kernel ridge regression (KRR) is a generalization of linear ridge regression that is non-linear in the data but linear in the parameters. The solution can be obtained either in closed form, which involves a matrix inversion, or iteratively through gradient descent. The iterative approach opens up the possibility of changing the kernel during training, which is what we investigate in this paper. We theoretically analyze the effects this has on model complexity and generalization. Based on our findings, we propose an update scheme for the bandwidth of translation-invariant kernels in which the bandwidth decreases to zero during training, thus circumventing the need for hyper-parameter selection. We demonstrate on real and synthetic data that decreasing the bandwidth during training outperforms using a constant bandwidth selected by cross-validation or marginal likelihood maximization. We also show, theoretically and empirically, that a decreasing bandwidth can achieve zero training error together with good generalization, as well as a double descent behavior; neither phenomenon occurs for KRR with a constant bandwidth, but both are known to appear for neural networks.
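The toy Python sketch below shows one way to instantiate iterative KRR with a shrinking bandwidth: functional gradient descent on the squared loss in the RKHS, with the RBF kernel recomputed as the bandwidth decays each step. Retaining the dual coefficients across kernel updates is an assumption of this sketch; the paper's exact update scheme may differ.

```python
# Toy iterative KRR with a bandwidth that shrinks during training.
# Keeping the dual coefficients across kernel updates is this sketch's
# assumption, not necessarily the paper's scheme.
import numpy as np

def rbf_kernel(X1, X2, bandwidth):
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * bandwidth ** 2))

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(60, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=60)

alpha = np.zeros(60)        # dual coefficients: f(x) = sum_i alpha_i k(x, x_i)
bandwidth, lr, decay = 2.0, 0.01, 0.999
for step in range(2000):
    K = rbf_kernel(X, X, bandwidth)
    # Functional gradient descent on 0.5 * sum_i (f(x_i) - y_i)^2 in the RKHS
    # amounts to alpha <- alpha - lr * (K @ alpha - y).
    alpha -= lr * (K @ alpha - y)
    bandwidth *= decay      # let the bandwidth shrink toward zero

print("train MSE:", np.mean((rbf_kernel(X, X, bandwidth) @ alpha - y) ** 2))
```

As the bandwidth shrinks, the kernel matrix approaches the identity, so the fit can interpolate the training data, consistent with the zero-training-error behavior described above.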
Emotion recognition in conversation (ERC) aims to detect the emotion label of each utterance. Motivated by recent studies showing that presenting training examples in a meaningful order, rather than randomly, can boost model performance, we propose an ERC-oriented hybrid curriculum learning framework. Our framework consists of two curricula: (1) a conversation-level curriculum (CC) and (2) an utterance-level curriculum (UC). In CC, we construct a difficulty measurer based on the frequency of "emotion shifts" within a conversation; conversations are then scheduled from easy to hard according to their difficulty scores. UC is implemented from an emotion-similarity perspective and progressively strengthens the model's ability to identify confusing emotions. With the proposed model-agnostic hybrid curriculum learning strategy, we observe significant performance boosts over a wide range of existing ERC models, and we achieve new state-of-the-art results on four public ERC datasets.
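A minimal sketch of the conversation-level difficulty measurer, assuming difficulty is the fraction of adjacent utterance pairs whose emotion labels differ; the paper's exact scoring and scheduling scheme (e.g., how batches are released over epochs) may differ in detail.

```python
# Illustrative conversation-level difficulty measurer based on the
# frequency of "emotion shifts" between adjacent utterances.
def emotion_shift_difficulty(labels):
    """labels: one emotion label per utterance, in conversation order."""
    if len(labels) < 2:
        return 0.0
    shifts = sum(a != b for a, b in zip(labels, labels[1:]))
    return shifts / (len(labels) - 1)

conversations = [
    ["joy", "joy", "sad", "sad", "anger"],   # two shifts out of four pairs
    ["neutral", "neutral", "neutral"],       # no shifts: an easy conversation
]
# Schedule conversations from easy to hard for curriculum training.
ordered = sorted(conversations, key=emotion_shift_difficulty)
```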
Graph Neural Networks (GNNs) have recently become increasingly popular due to their ability to learn complex systems of relations or interactions, which arise in a broad spectrum of problems ranging from biology and particle physics to social networks and recommendation systems. Despite the plethora of different models for deep learning on graphs, few approaches have been proposed thus far for dealing with graphs that are dynamic in nature (e.g., with features or connectivity evolving over time). In this paper, we present Temporal Graph Networks (TGNs), a generic, efficient framework for deep learning on dynamic graphs represented as sequences of timed events. Thanks to a novel combination of memory modules and graph-based operators, TGNs significantly outperform previous approaches while being more computationally efficient. We furthermore show that several previous models for learning on dynamic graphs can be cast as specific instances of our framework. We perform a detailed ablation study of the framework's components and devise the best configuration, which achieves state-of-the-art performance on several transductive and inductive prediction tasks for dynamic graphs.
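The core memory idea can be sketched in a few lines of PyTorch: each node keeps a memory vector that a recurrent cell updates whenever the node takes part in a timed event. The message construction, time encoding, and graph-based embedding modules of the full framework are omitted here, and `NodeMemory` is our illustrative name, not the library's API.

```python
# Minimal sketch of a per-node memory module in the spirit of TGNs.
# Messages would normally be built from the endpoints' memories, edge
# features, and a time encoding; here they are passed in directly.
import torch
import torch.nn as nn

class NodeMemory(nn.Module):
    def __init__(self, num_nodes, memory_dim, message_dim):
        super().__init__()
        self.register_buffer("memory", torch.zeros(num_nodes, memory_dim))
        self.cell = nn.GRUCell(message_dim, memory_dim)

    def update(self, node_ids, messages):
        # Update only the nodes touched by the current batch of events.
        new_mem = self.cell(messages, self.memory[node_ids])
        # Detaching stored state across batches is a simplification here.
        self.memory[node_ids] = new_mem.detach()

mem = NodeMemory(num_nodes=100, memory_dim=32, message_dim=16)
events = torch.tensor([3, 7])            # nodes involved in two timed events
mem.update(events, torch.randn(2, 16))   # their memories are now updated
```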
Detecting carried objects is one of the requirements for developing systems that reason about activities involving people and objects. We present a novel approach for detecting carried objects in a single video frame that incorporates features from multiple scales. First, the foreground mask of a video frame is segmented into multi-scale superpixels. Human-like regions in the segmented area are then identified by matching features extracted from the superpixels against learned features in a codebook. A carried-object probability map is generated from the complement of the superpixels' matching probabilities to human-like regions, together with background information. Groups of superpixels with high carried-object probability and strong edge support are then merged to obtain the shape of the carried object. We apply our method to two challenging datasets, and the results show that it is competitive with or better than the state of the art.
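As a small illustration of the probability map, assuming it is formed by weighting the complement of the human-match probability by the foreground mask (a simplification of the paper's use of background information):

```python
# Hypothetical simplification of the carried-object probability map:
# superpixels that match the human codebook poorly but lie inside the
# foreground mask get high carried-object probability. The codebook
# matching and merging steps of the actual pipeline are omitted.
import numpy as np

def carried_object_probability(human_match_prob, foreground_mask):
    """Both inputs are per-superpixel arrays with values in [0, 1]."""
    return (1.0 - human_match_prob) * foreground_mask

p = carried_object_probability(np.array([0.9, 0.2, 0.8]),
                               np.array([1.0, 1.0, 0.0]))
# -> [0.1, 0.8, 0.0]: the second superpixel is a likely carried object.
```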