亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

Trajectory generation and trajectory prediction are two critical tasks in autonomous driving, which generate various trajectories for testing during development and predict the trajectories of surrounding vehicles during operation, respectively. In recent years, emerging data-driven deep learning-based methods have shown great promise for these two tasks in learning various traffic scenarios and improving average performance without assuming physical models. However, it remains a challenging problem for these methods to ensure that the generated/predicted trajectories are physically realistic. This challenge arises because learning-based approaches often function as opaque black boxes and do not adhere to physical laws. Conversely, existing model-based methods provide physically feasible results but are constrained by predefined model structures, limiting their capabilities to address complex scenarios. To address the limitations of these two types of approaches, we propose a new method that integrates kinematic knowledge into neural stochastic differential equations (SDE) and designs a variational autoencoder based on this latent kinematics-aware SDE (LK-SDE) to generate vehicle motions. Experimental results demonstrate that our method significantly outperforms both model-based and learning-based baselines in producing physically realistic and precisely controllable vehicle trajectories. Additionally, it performs well in predicting unobservable physical variables in the latent space.

相關內容

In finance, durations between successive transactions are usually modeled by the autoregressive conditional duration model based on a continuous distribution omitting zero values. Zero or close-to-zero durations can be caused by either split transactions or independent transactions. We propose a discrete model allowing for excessive zero values based on the zero-inflated negative binomial distribution with score dynamics. This model allows to distinguish between the processes generating split and standard transactions. We use the existing theory on score models to establish the invertibility of the score filter and verify that sufficient conditions hold for the consistency and asymptotic normality of the maximum likelihood of the model parameters. In an empirical study, we find that split transactions cause between 92 and 98 percent of zero and close-to-zero values. Furthermore, the loss of decimal places in the proposed approach is less severe than the incorrect treatment of zero values in continuous models.

Disentangled representation learning aims to learn low-dimensional representations of data, where each dimension corresponds to an underlying generative factor. Currently, Variational Auto-Encoder (VAE) are widely used for disentangled representation learning, with the majority of methods assuming independence among generative factors. However, in real-world scenarios, generative factors typically exhibit complex causal relationships. We thus design a new VAE-based framework named Disentangled Causal Variational Auto-Encoder (DCVAE), which includes a variant of autoregressive flows known as causal flows, capable of learning effective causal disentangled representations. We provide a theoretical analysis of the disentanglement identifiability of DCVAE, ensuring that our model can effectively learn causal disentangled representations. The performance of DCVAE is evaluated on both synthetic and real-world datasets, demonstrating its outstanding capability in achieving causal disentanglement and performing intervention experiments. Moreover, DCVAE exhibits remarkable performance on downstream tasks and has the potential to learn the true causal structure among factors.

When estimating causal effects from observational studies, researchers often need to adjust for many covariates to deconfound the non-causal relationship between exposure and outcome, among which many covariates are discrete. The behavior of commonly used estimators in the presence of many discrete covariates is not well understood since their properties are often analyzed under structural assumptions including sparsity and smoothness, which do not apply in discrete settings. In this work, we study the estimation of causal effects in a model where the covariates required for confounding adjustment are discrete but high-dimensional, meaning the number of categories $d$ is comparable with or even larger than sample size $n$. Specifically, we show the mean squared error of commonly used regression, weighting and doubly robust estimators is bounded by $\frac{d^2}{n^2}+\frac{1}{n}$. We then prove the minimax lower bound for the average treatment effect is of order $\frac{d^2}{n^2 \log^2 n}+\frac{1}{n}$, which characterizes the fundamental difficulty of causal effect estimation in the high-dimensional discrete setting, and shows the estimators mentioned above are rate-optimal up to log-factors. We further consider additional structures that can be exploited, namely effect homogeneity and prior knowledge of the covariate distribution, and propose new estimators that enjoy faster convergence rates of order $\frac{d}{n^2} + \frac{1}{n}$, which achieve consistency in a broader regime. The results are illustrated empirically via simulation studies.

Sim2real transfer has received increasing attention lately due to the success of learning robotic tasks in simulation end-to-end. While there has been a lot of progress in transferring vision-based navigation policies, the existing sim2real strategy for audio-visual navigation performs data augmentation empirically without measuring the acoustic gap. The sound differs from light in that it spans across much wider frequencies and thus requires a different solution for sim2real. We propose the first treatment of sim2real for audio-visual navigation by disentangling it into acoustic field prediction (AFP) and waypoint navigation. We first validate our design choice in the SoundSpaces simulator and show improvement on the Continuous AudioGoal navigation benchmark. We then collect real-world data to measure the spectral difference between the simulation and the real world by training AFP models that only take a specific frequency subband as input. We further propose a frequency-adaptive strategy that intelligently selects the best frequency band for prediction based on both the measured spectral difference and the energy distribution of the received audio, which improves the performance on the real data. Lastly, we build a real robot platform and show that the transferred policy can successfully navigate to sounding objects. This work demonstrates the potential of building intelligent agents that can see, hear, and act entirely from simulation, and transferring them to the real world.

We study the challenging exploration incentive problem in both bandit and reinforcement learning, where the rewards are scale-free and potentially unbounded, driven by real-world scenarios and differing from existing work. Past works in reinforcement learning either assume costly interactions with an environment or propose algorithms finding potentially low quality local maxima. Motivated by EXP-type methods that integrate multiple agents (experts) for exploration in bandits with the assumption that rewards are bounded, we propose new algorithms, namely EXP4.P and EXP4-RL for exploration in the unbounded reward case, and demonstrate their effectiveness in these new settings. Unbounded rewards introduce challenges as the regret cannot be limited by the number of trials, and selecting suboptimal arms may lead to infinite regret. Specifically, we establish EXP4.P's regret upper bounds in both bounded and unbounded linear and stochastic contextual bandits. Surprisingly, we also find that by including one sufficiently competent expert, EXP4.P can achieve global optimality in the linear case. This unbounded reward result is also applicable to a revised version of EXP3.P in the Multi-armed Bandit scenario. In EXP4-RL, we extend EXP4.P from bandit scenarios to reinforcement learning to incentivize exploration by multiple agents, including one high-performing agent, for both efficiency and excellence. This algorithm has been tested on difficult-to-explore games and shows significant improvements in exploration compared to state-of-the-art.

Multi-modal fusion is a fundamental task for the perception of an autonomous driving system, which has recently intrigued many researchers. However, achieving a rather good performance is not an easy task due to the noisy raw data, underutilized information, and the misalignment of multi-modal sensors. In this paper, we provide a literature review of the existing multi-modal-based methods for perception tasks in autonomous driving. Generally, we make a detailed analysis including over 50 papers leveraging perception sensors including LiDAR and camera trying to solve object detection and semantic segmentation tasks. Different from traditional fusion methodology for categorizing fusion models, we propose an innovative way that divides them into two major classes, four minor classes by a more reasonable taxonomy in the view of the fusion stage. Moreover, we dive deep into the current fusion methods, focusing on the remaining problems and open-up discussions on the potential research opportunities. In conclusion, what we expect to do in this paper is to present a new taxonomy of multi-modal fusion methods for the autonomous driving perception tasks and provoke thoughts of the fusion-based techniques in the future.

Conventional entity typing approaches are based on independent classification paradigms, which make them difficult to recognize inter-dependent, long-tailed and fine-grained entity types. In this paper, we argue that the implicitly entailed extrinsic and intrinsic dependencies between labels can provide critical knowledge to tackle the above challenges. To this end, we propose \emph{Label Reasoning Network(LRN)}, which sequentially reasons fine-grained entity labels by discovering and exploiting label dependencies knowledge entailed in the data. Specifically, LRN utilizes an auto-regressive network to conduct deductive reasoning and a bipartite attribute graph to conduct inductive reasoning between labels, which can effectively model, learn and reason complex label dependencies in a sequence-to-set, end-to-end manner. Experiments show that LRN achieves the state-of-the-art performance on standard ultra fine-grained entity typing benchmarks, and can also resolve the long tail label problem effectively.

Most object recognition approaches predominantly focus on learning discriminative visual patterns while overlooking the holistic object structure. Though important, structure modeling usually requires significant manual annotations and therefore is labor-intensive. In this paper, we propose to "look into object" (explicitly yet intrinsically model the object structure) through incorporating self-supervisions into the traditional framework. We show the recognition backbone can be substantially enhanced for more robust representation learning, without any cost of extra annotation and inference speed. Specifically, we first propose an object-extent learning module for localizing the object according to the visual patterns shared among the instances in the same category. We then design a spatial context learning module for modeling the internal structures of the object, through predicting the relative positions within the extent. These two modules can be easily plugged into any backbone networks during training and detached at inference time. Extensive experiments show that our look-into-object approach (LIO) achieves large performance gain on a number of benchmarks, including generic object recognition (ImageNet) and fine-grained object recognition tasks (CUB, Cars, Aircraft). We also show that this learning paradigm is highly generalizable to other tasks such as object detection and segmentation (MS COCO). Project page: //github.com/JDAI-CV/LIO.

Most deep learning-based models for speech enhancement have mainly focused on estimating the magnitude of spectrogram while reusing the phase from noisy speech for reconstruction. This is due to the difficulty of estimating the phase of clean speech. To improve speech enhancement performance, we tackle the phase estimation problem in three ways. First, we propose Deep Complex U-Net, an advanced U-Net structured model incorporating well-defined complex-valued building blocks to deal with complex-valued spectrograms. Second, we propose a polar coordinate-wise complex-valued masking method to reflect the distribution of complex ideal ratio masks. Third, we define a novel loss function, weighted source-to-distortion ratio (wSDR) loss, which is designed to directly correlate with a quantitative evaluation measure. Our model was evaluated on a mixture of the Voice Bank corpus and DEMAND database, which has been widely used by many deep learning models for speech enhancement. Ablation experiments were conducted on the mixed dataset showing that all three proposed approaches are empirically valid. Experimental results show that the proposed method achieves state-of-the-art performance in all metrics, outperforming previous approaches by a large margin.

Object detection typically assumes that training and test data are drawn from an identical distribution, which, however, does not always hold in practice. Such a distribution mismatch will lead to a significant performance drop. In this work, we aim to improve the cross-domain robustness of object detection. We tackle the domain shift on two levels: 1) the image-level shift, such as image style, illumination, etc, and 2) the instance-level shift, such as object appearance, size, etc. We build our approach based on the recent state-of-the-art Faster R-CNN model, and design two domain adaptation components, on image level and instance level, to reduce the domain discrepancy. The two domain adaptation components are based on H-divergence theory, and are implemented by learning a domain classifier in adversarial training manner. The domain classifiers on different levels are further reinforced with a consistency regularization to learn a domain-invariant region proposal network (RPN) in the Faster R-CNN model. We evaluate our newly proposed approach using multiple datasets including Cityscapes, KITTI, SIM10K, etc. The results demonstrate the effectiveness of our proposed approach for robust object detection in various domain shift scenarios.

北京阿比特科技有限公司