Lane marking detection is fundamental to both advanced driver assistance systems and traffic surveillance systems. However, detecting lanes is highly challenging when road lane markings have low visibility, are obscured, or are invisible because of challenging real-world environments and adverse weather. Most lane detection methods struggle with four types of challenges: (i) light effects, i.e., shadows, glare, and reflections created by light sources such as streetlamps, tunnel lights, the sun, and wet roads; (ii) poor visibility of eroded, blurred, dashed, colored, and cracked lane markings caused by natural disasters and adverse weather; (iii) occlusion of lane markings by surrounding objects; and (iv) the presence of confusing lines, e.g., guardrails, pavement markings, and road dividers. In this paper, we propose a simple, real-time, and robust method to detect and track lane markings, built on three key techniques. First, we introduce a comprehensive intensity threshold range (CITR) to improve the performance of the Canny operator in detecting lane edges of different intensities. Second, we propose a robust lane verification technique, the angle and length-based geometric constraint (ALGC), applied to the lines found by the Hough Transform, to verify the characteristics of lane markings and prevent incorrect lane detection. Finally, we propose a novel lane tracking technique that predicts the lane position in the next frame by defining a range of horizontal lane positions, which is updated according to the lane position in the previous frame. To evaluate the proposed method, we used the DSDLDE [1] dataset at 1080x1920 resolution and 24 frames/sec. Experimental results show an average detection rate of 97.36% and an average detection time of 29.06 ms per frame, outperforming the state-of-the-art method.
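
The abstract does not give the ALGC thresholds or the width of the tracking window. The minimal sketch below illustrates both ideas under hypothetical values: line segments in (x1, y1, x2, y2) form (the format OpenCV's `cv2.HoughLinesP` returns) are kept only if their length and angle fall in an assumed lane-like range, and a segment is accepted as the lane only if it crosses a reference row inside a horizontal window centred on the previous frame's lane position. The constants, `LaneTracker`, and its parameters are illustrative assumptions, not the paper's implementation, and CITR edge detection is not modelled.

```python
import math

# Hypothetical ALGC-style thresholds (the paper's actual values are not given).
MIN_ANGLE_DEG = 20.0   # reject near-horizontal lines such as stop lines or shadows
MAX_ANGLE_DEG = 80.0   # reject near-vertical lines such as poles
MIN_LENGTH_PX = 40.0   # reject short fragments from cracks and noise


def passes_geometric_constraint(seg):
    """Angle- and length-based check on one segment (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = seg
    length = math.hypot(x2 - x1, y2 - y1)
    # Angle to the image x-axis, folded into [0, 90] degrees.
    angle = math.degrees(math.atan2(abs(y2 - y1), abs(x2 - x1)))
    return length >= MIN_LENGTH_PX and MIN_ANGLE_DEG <= angle <= MAX_ANGLE_DEG


def x_at_row(seg, row):
    """Column where the segment's supporting line crosses image row `row`."""
    x1, y1, x2, y2 = seg
    return x1 + (row - y1) * (x2 - x1) / (y2 - y1)  # y2 != y1 after the angle check


class LaneTracker:
    """Hypothetical tracker: search only a horizontal window around the last lane position."""

    def __init__(self, half_width_px=60.0):
        self.half_width = half_width_px
        self.last_x = None  # lane column at the reference row in the previous frame

    def update(self, segments, ref_row):
        candidates = [
            x_at_row(seg, ref_row)
            for seg in segments
            if passes_geometric_constraint(seg)
        ]
        if self.last_x is not None:
            # Keep only candidates inside the window centred on the previous position.
            candidates = [x for x in candidates if abs(x - self.last_x) <= self.half_width]
        if candidates:
            ref = self.last_x if self.last_x is not None else candidates[0]
            self.last_x = min(candidates, key=lambda x: abs(x - ref))
        # With no valid candidate the previous position is kept (e.g. brief occlusion).
        return self.last_x


tracker = LaneTracker()
print(tracker.update([(100, 700, 300, 400), (0, 500, 400, 505)], ref_row=700))  # 100.0
print(tracker.update([(400, 700, 600, 400), (110, 700, 310, 400)], ref_row=700))  # 110.0
```

In the second frame the segment at column 400 passes the geometric check but falls outside the window, so only the one near the previous position is accepted. One could also widen or reset the window after several consecutive misses, which this sketch omits.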

Related content

Pose-based anomaly detection is a video-analysis technique that detects anomalous events or behaviors by examining human poses extracted from video frames. Using pose data alleviates privacy and ethical concerns, and pose-based models are also computationally less complex than pixel-based approaches. However, pose data introduces its own challenges, such as noisy skeleton data, the loss of important pixel information, and insufficiently rich features. These problems are exacerbated by a lack of anomaly detection datasets that adequately represent real-world scenarios. In this work, we analyze and quantify the characteristics of two well-known video anomaly datasets to better understand the difficulties of pose-based anomaly detection. We go a step further and explore the discriminating power of pose and trajectory for video anomaly detection, and how their effectiveness depends on context. We believe these experiments contribute to a better understanding of pose-based anomaly detection and of the currently available datasets. This will help researchers approach the task of anomaly detection with a clearer perspective, accelerating the development of robust, better-performing models.
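
The abstract does not say how pose and trajectory are separated. A common convention, assumed here for illustration and not taken from the paper, splits each skeleton into a global component (bounding-box centre and size) and a local component (keypoints normalised by that box), so that trajectory features such as speed can be examined separately from body-shape features. The function names and keypoint format below are hypothetical.

```python
import math


def decompose(keypoints):
    """Split one skeleton into a global part (box centre and size) and a local,
    box-normalised pose. `keypoints` is a list of (x, y) image coordinates."""
    xs = [x for x, _ in keypoints]
    ys = [y for _, y in keypoints]
    cx, cy = (min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2
    w = max(max(xs) - min(xs), 1e-6)  # guard against degenerate boxes
    h = max(max(ys) - min(ys), 1e-6)
    local = [((x - cx) / w, (y - cy) / h) for x, y in keypoints]
    return (cx, cy, w, h), local


def trajectory_speeds(frames):
    """Per-frame displacement of the skeleton's box centre (pixels/frame)."""
    centres = [decompose(kps)[0][:2] for kps in frames]
    return [math.dist(a, b) for a, b in zip(centres, centres[1:])]


# A rigid skeleton moving 3 px right and 4 px down per frame.
base = [(0, 0), (4, 0), (2, 6)]
frames = [[(x + 3 * t, y + 4 * t) for x, y in base] for t in range(3)]
print(trajectory_speeds(frames))  # [5.0, 5.0]
print(decompose(frames[0])[1] == decompose(frames[2])[1])  # True
```

Under this split, an anomaly such as running would tend to appear mainly in the trajectory speeds, while an unusual gesture made while standing still would tend to appear mainly in the local pose.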

Deformable Object Manipulation (DOM) is an important field of research because it underpins practical tasks such as automatic cloth handling, cable routing, and surgical operations. Perception is considered one of the major challenges in DOM because of the complex dynamics and high number of degrees of freedom of deformable objects. In this paper, we develop a novel image-processing algorithm based on Gabor filters to extract useful features from cloth and, based on these features, devise a strategy for cloth-flattening tasks. We evaluate the overall framework experimentally and compare it with three human operators. The results show that our algorithm accurately determines the direction of wrinkles on the cloth in both simulation and real-robot experiments. Moreover, the robot executing flattening tasks with the dewrinkling strategy given by our algorithm performs well compared with other baseline methods. The experiment video is available at //sites.google.com/view/robotic-fabric-flattening/home
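
The abstract does not specify the filter bank. A minimal sketch, assuming a bank of complex Gabor filters at evenly spaced orientations with hypothetical wavelength, sigma, aspect ratio, and kernel size: the orientation with the strongest response at a patch is taken as the direction across the wrinkle, so the wrinkle itself runs perpendicular to it. For clarity it samples a single patch at the image centre in pure Python; a practical pipeline would filter the whole image (OpenCV, for instance, provides `cv2.getGaborKernel`). Angles are measured from the image x-axis with rows increasing downward.

```python
import math


def gabor_kernel(theta, wavelength=8.0, sigma=4.0, gamma=0.5, size=15):
    """Complex Gabor kernel (real, imaginary parts) oriented at angle theta (radians)."""
    half = size // 2
    real, imag = [], []
    for y in range(-half, half + 1):
        row_re, row_im = [], []
        for x in range(-half, half + 1):
            # Rotate coordinates so x_r points along the carrier direction.
            x_r = x * math.cos(theta) + y * math.sin(theta)
            y_r = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(x_r ** 2 + (gamma * y_r) ** 2) / (2 * sigma ** 2))
            phase = 2 * math.pi * x_r / wavelength
            row_re.append(envelope * math.cos(phase))
            row_im.append(envelope * math.sin(phase))
        real.append(row_re)
        imag.append(row_im)
    return real, imag


def response_at(image, cy, cx, kernel):
    """Magnitude of the complex Gabor response centred on pixel (cy, cx)."""
    real, imag = kernel
    half = len(real) // 2
    acc_re = acc_im = 0.0
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            v = image[cy + dy][cx + dx]
            acc_re += v * real[dy + half][dx + half]
            acc_im += v * imag[dy + half][dx + half]
    return math.hypot(acc_re, acc_im)


def wrinkle_direction_deg(image, n_orientations=8):
    """Dominant wrinkle direction (degrees in [0, 180)) at the image centre.
    Assumes the image is at least as large as the kernel in each dimension."""
    cy, cx = len(image) // 2, len(image[0]) // 2
    best_k, best_resp = 0, -1.0
    for k in range(n_orientations):
        theta = math.pi * k / n_orientations
        resp = response_at(image, cy, cx, gabor_kernel(theta))
        if resp > best_resp:
            best_k, best_resp = k, resp
    # The filter orientation is the direction across the stripes;
    # a wrinkle ridge runs perpendicular to it.
    carrier_deg = 180.0 * best_k / n_orientations
    return (carrier_deg + 90.0) % 180.0


# Synthetic "wrinkles" with an 8 px period, matching the kernel wavelength.
vertical = [[math.cos(2 * math.pi * x / 8) for x in range(31)] for _ in range(31)]
horizontal = [[math.cos(2 * math.pi * y / 8) for _ in range(31)] for y in range(31)]
print(wrinkle_direction_deg(vertical))    # 90.0
print(wrinkle_direction_deg(horizontal))  # 0.0
```

Using the magnitude of the complex response makes the estimate insensitive to where the patch falls relative to a ridge or a valley.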

Active speaker detection is a challenging task in audio-visual scene understanding that aims to detect who is speaking in scenarios with one or more speakers. This task has received extensive attention because it is crucial to applications such as speaker diarization, speaker tracking, and automatic video editing. Existing studies try to improve performance by feeding in information from multiple candidates and designing complex models. Although these methods achieve outstanding performance, their high memory and computational cost makes them difficult to apply in resource-limited scenarios. We therefore construct a lightweight active speaker detection architecture by reducing the number of input candidates, splitting 2D and 3D convolutions for audio-visual feature extraction, and applying a gated recurrent unit (GRU), which has low computational complexity, for cross-modal modeling. Experimental results on the AVA-ActiveSpeaker dataset show that our framework achieves competitive mAP (94.1% vs. 94.2%) while requiring significantly fewer resources than the state-of-the-art method, especially in model parameters (1.0M vs. 22.5M, about 23x fewer) and FLOPs (0.6G vs. 2.6G, about 4x fewer). Our framework also performs well on the Columbia dataset, showing good robustness. The code and model weights are available at //github.com/Junhua-Liao/Light-ASD.
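
The abstract does not list the layer sizes, so the following is only a back-of-the-envelope sketch with hypothetical channel counts. It compares a full 3x3x3 convolution with a 2D spatial convolution followed by a 1D temporal one (a (2+1)D-style factorisation, assumed here as one reading of "splitting" convolutions), and a GRU layer with an LSTM layer of the same size under PyTorch's parameter layout (input and hidden weights plus two bias vectors per gate).

```python
def conv_params(c_in, c_out, kt, kh, kw, bias=True):
    """Weights (plus optional biases) of a 3D convolution with a kt x kh x kw kernel."""
    return c_in * c_out * kt * kh * kw + (c_out if bias else 0)


def full_3d(c_in, c_out, k=3):
    """A single k x k x k spatio-temporal convolution."""
    return conv_params(c_in, c_out, k, k, k)


def split_2d_1d(c_in, c_out, k=3):
    """A 1 x k x k spatial (2D) convolution followed by a k x 1 x 1 temporal (1D) one."""
    return conv_params(c_in, c_out, 1, k, k) + conv_params(c_out, c_out, k, 1, 1)


def gru_params(input_size, hidden):
    """One GRU layer: 3 gates, each with input weights, hidden weights and 2 biases."""
    return 3 * (hidden * input_size + hidden * hidden + 2 * hidden)


def lstm_params(input_size, hidden):
    """One LSTM layer with the same convention but 4 gates."""
    return 4 * (hidden * input_size + hidden * hidden + 2 * hidden)


print(full_3d(64, 64), split_2d_1d(64, 64))         # 110656 49280
print(gru_params(128, 128), lstm_params(128, 128))  # 99072 132096
```

At 64 channels the factorised pair needs about 2.2x fewer parameters, and a GRU has three gates to an LSTM's four, so it has exactly 0.75 of the parameters under this convention. The paper's overall reduction reflects its whole architecture, which this arithmetic does not reproduce.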

In many visual systems, visual tracking is based on RGB image sequences, in which some targets become invalid in low-light conditions, significantly degrading tracking performance. Introducing other modalities, such as depth and infrared data, is an effective way to handle the imaging limitations of individual sources, but multi-modal imaging platforms usually require elaborate designs and currently cannot be applied in many real-world applications. Near-infrared (NIR) imaging has become an essential part of many surveillance cameras, which switch between RGB and NIR imaging based on light intensity. These two modalities are heterogeneous, with very different visual properties, and thus pose major challenges for visual tracking. However, existing works have not studied this problem. In this work, we address the cross-modal object tracking problem and contribute a new video dataset of 654 cross-modal image sequences with over 481K frames in total and an average video length of more than 735 frames. To promote research on cross-modal object tracking, we propose a new algorithm that learns a modality-aware target representation to mitigate the appearance gap between the RGB and NIR modalities during tracking. It is plug-and-play and can therefore be flexibly embedded into different tracking frameworks. We conduct extensive experiments on the dataset and demonstrate the effectiveness of the proposed algorithm in two representative tracking frameworks against 17 state-of-the-art tracking methods. We will release the dataset for free academic use; the download link and code will be made available soon.

Owing to their effective and flexible data acquisition, unmanned aerial vehicles (UAVs) have recently become a hotspot in computer vision (CV) and remote sensing (RS). Inspired by the recent success of deep learning (DL), many advanced object detection and tracking approaches have been widely applied to various UAV-related tasks, such as environmental monitoring, precision agriculture, and traffic management. This paper provides a comprehensive survey of the research progress and prospects of DL-based UAV object detection and tracking methods. More specifically, we first outline the challenges and statistics of existing methods, and then present solutions from the perspective of DL-based models in three research topics: object detection from images, object detection from videos, and object tracking from videos. Open datasets related to UAV-dominated object detection and tracking are exhaustively collected, and four benchmark datasets are used to evaluate the performance of several state-of-the-art methods. Finally, prospects and considerations for future work are discussed and summarized. We expect this survey to give researchers from the remote sensing field an overview of DL-based UAV object detection and tracking methods, along with some thoughts on their further development.

Multi-object tracking (MOT) is a crucial component of situational awareness in military defense applications. With the growing use of unmanned aerial systems (UASs), MOT methods for aerial surveillance are in high demand. Applying MOT to UASs presents specific challenges, such as a moving sensor, changing zoom levels, dynamic backgrounds, illumination changes, obscurations, and small objects. In this work, we present a robust object tracking architecture designed to accommodate noise in real-time situations. We propose a kinematic prediction model, called Deep Extended Kalman Filter (DeepEKF), in which a sequence-to-sequence architecture predicts entity trajectories in latent space. DeepEKF uses a learned image embedding along with an attention mechanism trained to weight the importance of different image regions when predicting future states. For visual scoring, we experiment with different similarity measures that compute distances based on entity appearance, including a convolutional neural network (CNN) encoder pre-trained using Siamese networks. In initial evaluation experiments, we show that our method, which combines the scoring structures of the kinematic and visual models within a multiple hypothesis tracking (MHT) framework, improves performance, especially in edge cases where entity motion is unpredictable or the data contains frames with significant gaps.
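
DeepEKF's learned predictor is not specified in equations in the abstract. For orientation, the sketch below shows the classical predict/update cycle whose name it borrows, assuming a per-axis constant-velocity model with hypothetical noise values; for a linear model like this one, an extended Kalman filter reduces to the ordinary filter. `gating_distance` is the squared Mahalanobis distance that MHT-style trackers commonly use to score or gate track-to-detection hypotheses, and calling `predict` with a larger `dt` shows how a purely kinematic model extrapolates across missing frames, the edge case the abstract highlights.

```python
class ConstantVelocityKF1D:
    """Linear Kalman filter for one image axis with state [position, velocity].

    q (process noise) and r (measurement noise) are hypothetical tuning values."""

    def __init__(self, position, q=1.0, r=4.0):
        self.x = [float(position), 0.0]
        self.P = [[r, 0.0], [0.0, 100.0]]  # large initial velocity uncertainty
        self.q, self.r = q, r

    def predict(self, dt=1.0):
        """Advance the state by dt frames; dt > 1 models a gap in the data."""
        p, v = self.x
        self.x = [p + v * dt, v]
        (a, b), (c, d) = self.P
        q = self.q
        # P <- F P F^T + Q with F = [[1, dt], [0, 1]] and white-noise-acceleration Q.
        self.P = [
            [a + dt * (b + c) + dt * dt * d + q * dt ** 3 / 3, b + dt * d + q * dt ** 2 / 2],
            [c + dt * d + q * dt ** 2 / 2, d + q * dt],
        ]
        return self.x[0]

    def gating_distance(self, z):
        """Squared Mahalanobis distance of measurement z from the prediction."""
        innovation = z - self.x[0]
        return innovation * innovation / (self.P[0][0] + self.r)

    def update(self, z):
        """Fuse a position measurement z (measurement matrix H = [1, 0])."""
        (a, b), (c, d) = self.P
        s = a + self.r
        k0, k1 = a / s, c / s  # Kalman gain
        innovation = z - self.x[0]
        self.x = [self.x[0] + k0 * innovation, self.x[1] + k1 * innovation]
        self.P = [[(1 - k0) * a, (1 - k0) * b], [c - k1 * a, d - k1 * b]]


kf = ConstantVelocityKF1D(0.0)
for t in range(1, 21):  # target moves 2 px per frame
    kf.predict()
    kf.update(2.0 * t)
print(round(kf.predict(dt=5)))  # 50: extrapolated position after a 5-frame gap
```

A constant-velocity extrapolation like this is exactly what breaks down when motion is unpredictable, which is the motivation the abstract gives for learning the prediction instead.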

Humans have a natural instinct to identify unknown object instances in their environment. Intrinsic curiosity about these unknown instances aids in learning about them once the corresponding knowledge becomes available. This motivates us to propose a novel computer vision problem called 'Open World Object Detection', in which a model is tasked to: 1) identify objects that have not been introduced to it as 'unknown', without explicit supervision to do so, and 2) incrementally learn these identified unknown categories without forgetting previously learned classes, as the corresponding labels are progressively received. We formulate the problem, introduce a strong evaluation protocol, and provide a novel solution, called ORE (Open World Object Detector), based on contrastive clustering and energy-based unknown identification. Our experimental evaluation and ablation studies analyze the efficacy of ORE in achieving Open World objectives. As an interesting by-product, we find that identifying and characterizing unknown instances helps reduce confusion in an incremental object detection setting, where we achieve state-of-the-art performance with no extra methodological effort. We hope that our work will attract further research into this newly identified yet crucial research direction.
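
The abstract names the two ingredients but not their equations. A hedged sketch under stated assumptions: the energy of a detection's class logits is taken as the standard free-energy score E = -T * log(sum_i exp(l_i / T)), and a detection whose energy exceeds a hypothetical fixed threshold is labelled 'unknown' (the abstract does not say how ORE makes this decision); the clustering term is a hinge-style loss that pulls a feature toward its class prototype and pushes it at least a margin away from the other prototypes. All names, the threshold, and the margin are illustrative.

```python
import math


def energy(logits, temperature=1.0):
    """Free energy E = -T * log(sum_i exp(l_i / T)); lower means more confident."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability (log-sum-exp trick)
    return -temperature * (m + math.log(sum(math.exp(s - m) for s in scaled)))


def classify_open_world(logits, class_names, threshold):
    """Return a known class name, or 'unknown' when the energy is too high.
    The fixed `threshold` is a simplifying assumption."""
    if energy(logits) > threshold:
        return "unknown"
    best = max(range(len(logits)), key=lambda i: logits[i])
    return class_names[best]


def contrastive_clustering_loss(feature, label, prototypes, margin=10.0):
    """Hinge-style loss: pull `feature` toward its class prototype and push it
    at least `margin` away from every other prototype."""
    loss = 0.0
    for i, proto in enumerate(prototypes):
        d = math.dist(feature, proto)
        loss += d if i == label else max(0.0, margin - d)
    return loss


names = ["car", "person"]
print(classify_open_world([10.0, 0.0], names, threshold=-5.0))  # car
print(classify_open_world([0.1, 0.2], names, threshold=-5.0))   # unknown
print(contrastive_clustering_loss([4.0, 0.0], 0, [[0.0, 0.0], [10.0, 0.0]], margin=5.0))  # 4.0
```

Confident detections produce large logits and therefore strongly negative energy, while weak, spread-out logits leave the energy near zero, which is what makes the score usable as an unknown-object signal.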

Object tracking is the cornerstone of many visual analytics systems. Although considerable progress has been made in this area in recent years, robust, efficient, and accurate tracking in real-world video remains a challenge. In this paper, we present a hybrid tracker that combines motion information from the compressed video stream with a general-purpose semantic object detector acting on decoded frames, yielding a fast and efficient tracking engine suitable for a number of visual analytics applications. We compare the proposed approach with several well-known recent trackers on the OTB tracking dataset. The results indicate advantages of the proposed method in terms of speed and/or accuracy. Another advantage of the proposed method over most existing trackers is its simplicity and deployment efficiency, which stem from the fact that it reuses and repurposes resources and information that may already exist in the system for other reasons.
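
The abstract does not describe how motion information and detections are fused. One plausible scheme, assumed here for illustration only: between detector runs, shift the box by the median motion vector of the blocks it covers, and every `detect_every` frames let a detector refine the box. The vector format (block centre plus forward displacement) and the `detect` callable are hypothetical simplifications; real codec motion vectors point to reference frames, may need their sign flipped, and are absent for intra-coded blocks.

```python
from statistics import median


def propagate_box(box, motion_vectors):
    """Shift box (x0, y0, x1, y1) by the median motion of the blocks inside it.

    `motion_vectors` is a list of (block_cx, block_cy, dx, dy), assumed to be the
    forward displacement of each block since the previous frame."""
    x0, y0, x1, y1 = box
    inside = [(dx, dy) for bx, by, dx, dy in motion_vectors
              if x0 <= bx <= x1 and y0 <= by <= y1]
    if not inside:
        return box  # no motion information: keep the box where it was
    mdx = median(dx for dx, _ in inside)  # the median resists stray vectors
    mdy = median(dy for _, dy in inside)
    return (x0 + mdx, y0 + mdy, x1 + mdx, y1 + mdy)


def hybrid_track(first_box, per_frame_mvs, detect, detect_every=10):
    """Run the (expensive) detector every `detect_every` frames and propagate
    the box with compressed-domain motion vectors in between.

    `detect(frame_index, predicted_box)` is a hypothetical callable returning a
    refined box, or None if the detector finds nothing."""
    box, boxes = first_box, []
    for i, mvs in enumerate(per_frame_mvs):
        box = propagate_box(box, mvs)
        if i % detect_every == 0:
            refined = detect(i, box)
            if refined is not None:
                box = refined
        boxes.append(box)
    return boxes


mvs = [(5, 5, 2, 1), (6, 6, 2, 1), (50, 50, -9, -9)]  # last vector lies outside the box
print(propagate_box((0, 0, 10, 10), mvs))  # (2.0, 1.0, 12.0, 11.0)
```

The design choice is about cost: motion vectors come essentially for free from the decoder, so the detector only needs to run on a fraction of the frames.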

Image segmentation is an important component of many image understanding systems. It aims to group pixels in a spatially and perceptually coherent manner. Typically, segmentation algorithms have a set of parameters that control the degree of over-segmentation they produce, and selecting these parameters to achieve human-like perceptual grouping remains a challenge. In this work, we exploit the diversity of segments produced by different parameter choices. We scan the segmentation parameter space and generate a collection of image segmentation hypotheses, ranging from highly over-segmented to under-segmented. These are fed into a cost minimization framework that produces the final segmentation by selecting segments that: (1) better describe the natural contours of the image, and (2) are more stable and persistent across all the segmentation hypotheses. We compare our algorithm with state-of-the-art algorithms and show that it achieves improved results. We also show that our framework is robust to the choice of segmentation kernel that produces the initial set of hypotheses.
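
The cost function itself is not given in the abstract. As a hedged illustration of criterion (2) only, the sketch scores each segment's persistence as its mean best-match IoU against the other hypotheses, then greedily keeps the most persistent non-overlapping segments. The greedy pass is a simple stand-in for the paper's cost minimization, the contour criterion (1) is omitted, and segments are represented as frozensets of pixel ids; all function names are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two segments given as sets of pixel ids."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def persistence(segment, hypotheses, own_index):
    """Mean best-match IoU of `segment` against every other hypothesis."""
    scores = [max(iou(segment, other) for other in hyp)
              for j, hyp in enumerate(hypotheses) if j != own_index]
    return sum(scores) / len(scores) if scores else 0.0


def select_stable_segments(hypotheses):
    """Greedy stand-in for cost minimization: keep the most persistent
    non-overlapping segments."""
    candidates = [(persistence(seg, hypotheses, i), seg)
                  for i, hyp in enumerate(hypotheses) for seg in hyp]
    candidates.sort(key=lambda c: -c[0])  # stable sort keeps ties in input order
    chosen, covered = [], set()
    for _, seg in candidates:
        if not (seg & covered):
            chosen.append(seg)
            covered |= seg
    return chosen


def seg(lo, hi):
    """Toy 1D segment covering pixel ids lo .. hi-1."""
    return frozenset(range(lo, hi))


hyps = [
    [seg(0, 5), seg(5, 10)],
    [seg(0, 5), seg(5, 10)],
    [seg(0, 2), seg(2, 5), seg(5, 10)],  # over-segmented
    [seg(0, 10)],                        # under-segmented
]
print([sorted(s) for s in select_stable_segments(hyps)])  # [[5, 6, 7, 8, 9], [0, 1, 2, 3, 4]]
```

Segments that reappear across many parameter settings score high, while fragments that exist only at one extreme of over- or under-segmentation score low.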

In this paper, we present a new method for detecting road users in an urban environment that improves multiple object tracking. Our method takes a foreground image as input and improves object detection and segmentation. The resulting image can be used as input to trackers that rely on foreground blobs from background subtraction. The first step is to create foreground images for all frames of an urban video. Then, starting from the original blobs of the foreground image, we merge blobs that are close to one another and have similar optical flow. Next, we extract the edges of the different objects to detect multiple objects that might be very close to each other (and merged into the same blob) and to adjust the size of the original blobs. At the same time, we use optical flow to detect occlusions between objects moving in opposite directions. Finally, we decide which information to keep in order to construct a new foreground image whose blobs can be used for tracking. The system is validated on four videos of an urban traffic dataset. Compared with vanilla background subtraction, our method improves recall and precision for the object detection task and improves the CLEAR MOT metrics for the tracking task on most videos.
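
A minimal sketch of the merging step, assuming each blob is summarised by its bounding box and mean optical-flow vector, and that hypothetical thresholds `max_gap` and `max_angle_deg` define "close" and "similar". Union-find groups blobs transitively; the edge-based splitting and the final decision step are not modelled. Blobs moving in opposite directions are never merged, which echoes how the method uses flow direction to separate occluding objects. A blob with zero flow is treated as pointing along +x, so a fuller version would handle static blobs separately.

```python
import math


def box_gap(a, b):
    """Euclidean gap between boxes (x0, y0, x1, y1); 0 if they touch or overlap."""
    dx = max(0, max(a[0], b[0]) - min(a[2], b[2]))
    dy = max(0, max(a[1], b[1]) - min(a[3], b[3]))
    return math.hypot(dx, dy)


def flow_angle_diff_deg(f1, f2):
    """Smallest angle between two flow vectors, in degrees (0 to 180)."""
    d = abs(math.atan2(f1[1], f1[0]) - math.atan2(f2[1], f2[0])) % (2 * math.pi)
    return math.degrees(min(d, 2 * math.pi - d))


def merge_blobs(blobs, max_gap=10.0, max_angle_deg=30.0):
    """Merge blobs that are close and move alike; returns merged bounding boxes.

    Each blob is a dict with "box" (x0, y0, x1, y1) and "flow" (mean dx, dy)."""
    parent = list(range(len(blobs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(blobs)):
        for j in range(i + 1, len(blobs)):
            a, b = blobs[i], blobs[j]
            if (box_gap(a["box"], b["box"]) <= max_gap
                    and flow_angle_diff_deg(a["flow"], b["flow"]) <= max_angle_deg):
                parent[find(i)] = find(j)  # union the two groups

    groups = {}
    for i in range(len(blobs)):
        groups.setdefault(find(i), []).append(blobs[i]["box"])
    return [
        (min(b[0] for b in boxes), min(b[1] for b in boxes),
         max(b[2] for b in boxes), max(b[3] for b in boxes))
        for boxes in groups.values()
    ]


blobs = [
    {"box": (0, 0, 10, 10), "flow": (5, 0)},    # front of a vehicle
    {"box": (15, 0, 25, 10), "flow": (4, 1)},   # rear of the same vehicle, split off
    {"box": (28, 0, 38, 10), "flow": (-5, 0)},  # nearby object moving the other way
]
print(merge_blobs(blobs))  # [(0, 0, 25, 10), (28, 0, 38, 10)]
```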
