In this paper, we introduce an innovative approach for extracting trajectories from a camera sensor in GPS-denied environments, leveraging visual odometry. The system takes video footage captured by a forward-facing camera mounted on a vehicle as input, with the output being a chain code representing the camera's trajectory. The proposed methodology involves several key steps. Firstly, we employ phase correlation between consecutive frames of the video to extract essential information. Subsequently, we introduce a novel chain code method termed "dynamic chain code," which is based on the x-shift values derived from the phase correlation. The third step involves determining directional changes (forward, left, right) by establishing thresholds and extracting the corresponding chain code. This extracted code is then stored in a buffer for further processing. Notably, our system outperforms traditional methods reliant on spatial features, exhibiting greater speed and robustness in noisy environments. Importantly, our approach operates without external camera calibration information. Moreover, by incorporating visual odometry, our system enhances its accuracy in estimating camera motion, providing a more comprehensive understanding of trajectory dynamics. Finally, the system culminates in the visualization of the normalized camera motion trajectory.
In this paper, we introduce a novel paradigm to enhance the ability of object detector, e.g., expanding categories or improving detection performance, by training on synthetic dataset generated from diffusion models. Specifically, we integrate an instance-level grounding head into a pre-trained, generative diffusion model, to augment it with the ability of localising arbitrary instances in the generated images. The grounding head is trained to align the text embedding of category names with the regional visual feature of the diffusion model, using supervision from an off-the-shelf object detector, and a novel self-training scheme on (novel) categories not covered by the detector. This enhanced version of diffusion model, termed as InstaGen, can serve as a data synthesizer for object detection. We conduct thorough experiments to show that, object detector can be enhanced while training on the synthetic dataset from InstaGen, demonstrating superior performance over existing state-of-the-art methods in open-vocabulary (+4.5 AP) and data-sparse (+1.2 to 5.2 AP) scenarios.
In this paper, we completely solve the reversibility of one-dimensional finite cellular automata (FCA). This means that we will have an efficient method to determine the reversibility of any FCA with all numbers (n) of cells. The complexity of this algorithm is independent of n. We perform calculations on two new kinds of graphs and discover that the reversibility of any FCA exhibits periodicity as n increases. We successfully provide a method to compute the reversibility sequence that encompasses the reversibility of FCA with any number of cells. Additionally, the calculations in this paper are applicable to FCA with various types of boundaries.
In this paper, we propose a web search retrieval approach which automatically detects recency sensitive queries and increases the freshness of the ordinary document ranking by a degree proportional to the probability of the need in recent content. We propose to solve the recency ranking problem by using result diversification principles and deal with the query's non-topical ambiguity appearing when the need in recent content can be detected only with uncertainty. Our offline and online experiments with millions of queries from real search engine users demonstrate the significant increase in satisfaction of users presented with a search result generated by our approach.
In this paper, we introduce two novel methods to design outer polar codes for two previously proposed concatenated polar code architectures: augmented polar codes and local-global polar codes. These methods include a stopping set (SS) construction and a nonstationary density evolution (NDE) construction. Simulation results demonstrate the advantage of these methods over previously proposed constructions based on density evolution (DE) and LLR evolution.
In this paper, we prove the first Bayesian regret bounds for Thompson Sampling in reinforcement learning in a multitude of settings. We simplify the learning problem using a discrete set of surrogate environments, and present a refined analysis of the information ratio using posterior consistency. This leads to an upper bound of order $\widetilde{O}(H\sqrt{d_{l_1}T})$ in the time inhomogeneous reinforcement learning problem where $H$ is the episode length and $d_{l_1}$ is the Kolmogorov $l_1-$dimension of the space of environments. We then find concrete bounds of $d_{l_1}$ in a variety of settings, such as tabular, linear and finite mixtures, and discuss how how our results are either the first of their kind or improve the state-of-the-art.
In this paper, we consider a decentralized learning problem in the presence of stragglers. Although gradient coding techniques have been developed for distributed learning to evade stragglers, where the devices send encoded gradients with redundant training data, it is difficult to apply those techniques directly to decentralized learning scenarios. To deal with this problem, we propose a new gossip-based decentralized learning method with gradient coding (GOCO). In the proposed method, to avoid the negative impact of stragglers, the parameter vectors are updated locally using encoded gradients based on the framework of stochastic gradient coding and then averaged in a gossip-based manner. We analyze the convergence performance of GOCO for strongly convex loss functions. And we also provide simulation results to demonstrate the superiority of the proposed method in terms of learning performance compared with the baseline methods.
In this paper, we extend our prior research named DKIC and propose the perceptual-oriented learned image compression method, PO-DKIC. Specifically, DKIC adopts a dynamic kernel-based dynamic residual block group to enhance the transform coding and an asymmetric space-channel context entropy model to facilitate the estimation of gaussian parameters. Based on DKIC, PO-DKIC introduces PatchGAN and LPIPS loss to enhance visual quality. Furthermore, to maximize the overall perceptual quality under a rate constraint, we formulate this challenge into a constrained programming problem and use the Linear Integer Programming method for resolution. The experiments demonstrate that our proposed method can generate realistic images with richer textures and finer details when compared to state-of-the-art image compression techniques.
Data-driven predictions are often perceived as inaccurate in hindsight due to behavioral responses. In this study, we explore the role of interface design choices in shaping individuals' decision-making processes in response to predictions presented on a shared information display in a strategic setting. We introduce a novel staged experimental design to investigate the effects of design features, such as visualizations of prediction uncertainty and error, within a repeated congestion game. In this game, participants assume the role of taxi drivers and use a shared information display to decide where to search for their next ride. Our experimental design endows agents with varying level-$k$ depths of thinking, allowing some agents to possess greater sophistication in anticipating the decisions of others using the same information display. Through several extensive experiments, we identify trade-offs between displays that optimize individual decisions and those that best serve the collective social welfare of the system. We find that the influence of display characteristics varies based on an agent's strategic sophistication. We observe that design choices promoting individual-level decision-making can lead to suboptimal system outcomes, as manifested by a lower realization of potential social welfare. However, this decline in social welfare is offset by a reduction in the distribution shift, narrowing the gap between predicted and realized system outcomes, which potentially enhances the perceived reliability and trustworthiness of the information display post hoc. Our findings pave the way for new research questions concerning the design of effective prediction interfaces in strategic environments.
In this paper, we consider microgrids that interconnect prosumers with distributed energy resources and dynamic loads. Prosumers are connected through the microgrid to trade energy and gain profit while respecting the network constraints. We establish a local energy market by defining a competitive equilibrium which balances energy and satisfies voltage constraints within the microgrid for all time. Using duality theory, we prove that under some convexity assumptions, a competitive equilibrium is equivalent to a social welfare maximization solution. Additionally, we show that a competitive equilibrium is equivalent to a Nash equilibrium of a standard game. In general, the energy price for each prosumer is different, leading to the concept of locational prices. We investigate a case under which all prosumers have the same locational prices. Additionally, we show that under some assumptions on the resource supply and network topology, locational prices decay to zero after a period of time, implying the available supply will be more than the demand required to stabilize the system. Finally, two numerical examples are provided to validate the results, one of which is a direct application of our results on electric vehicle charging control.
In this paper, we propose a novel multi-task learning architecture, which incorporates recent advances in attention mechanisms. Our approach, the Multi-Task Attention Network (MTAN), consists of a single shared network containing a global feature pool, together with task-specific soft-attention modules, which are trainable in an end-to-end manner. These attention modules allow for learning of task-specific features from the global pool, whilst simultaneously allowing for features to be shared across different tasks. The architecture can be built upon any feed-forward neural network, is simple to implement, and is parameter efficient. Experiments on the CityScapes dataset show that our method outperforms several baselines in both single-task and multi-task learning, and is also more robust to the various weighting schemes in the multi-task loss function. We further explore the effectiveness of our method through experiments over a range of task complexities, and show how our method scales well with task complexity compared to baselines.