亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

Detecting firearms and accurately localizing individuals carrying them in images or videos is of paramount importance in security, surveillance, and content customization. However, this task presents significant challenges in complex environments due to clutter and the diverse shapes of firearms. To address this problem, we propose a novel approach that leverages human-firearm interaction information, which provides valuable clues for localizing firearm carriers. Our approach incorporates an attention mechanism that effectively distinguishes humans and firearms from the background by focusing on relevant areas. Additionally, we introduce a saliency-driven locality-preserving constraint to learn essential features while preserving foreground information in the input image. By combining these components, our approach achieves exceptional results on a newly proposed dataset. To handle inputs of varying sizes, we pass paired human-firearm instances with attention masks as channels through a deep network for feature computation, utilizing an adaptive average pooling layer. We extensively evaluate our approach against existing methods in human-object interaction detection and achieve significant results (AP=77.8\%) compared to the baseline approach (AP=63.1\%). This demonstrates the effectiveness of leveraging attention mechanisms and saliency-driven locality preservation for accurate human-firearm interaction detection. Our findings contribute to advancing the fields of security and surveillance, enabling more efficient firearm localization and identification in diverse scenarios.

相關內容

IFIP TC13 Conference on Human-Computer Interaction是人機交互領域的研究者和實踐者展示其工作的重要平臺。多年來,這些會議吸引了來自幾個國家和文化的研究人員。官網鏈接: · SPL · 可約的 · Cognition · 情景 ·
2023 年 10 月 30 日

Analyses of a software product line (SPL) typically report variable results that are annotated with logical expressions indicating the set of product variants for which the results hold. These expressions can get complicated and difficult to reason about when the SPL has lots of features and product variants. Previous work introduced a visualizer that supports filters for highlighting the analysis results that apply to product variants of interest, but this work was weakly evaluated. In this paper, we report on a controlled user study that evaluates the effectiveness of this new visualizer in helping the user search variable results and compare the results of multiple variants. Our findings indicate that the use of the new visualizer significantly improves the correctness and efficiency of the user's work and reduces the user's cognitive load in working with variable results.

Collaborative filtering (CF) is a pivotal technique in modern recommender systems. The learning process of CF models typically consists of three components: interaction encoder, loss function, and negative sampling. Although many existing studies have proposed various CF models to design sophisticated interaction encoders, recent work shows that simply reformulating the loss functions can achieve significant performance gains. This paper delves into analyzing the relationship among existing loss functions. Our mathematical analysis reveals that the previous loss functions can be interpreted as alignment and uniformity functions: (i) the alignment matches user and item representations, and (ii) the uniformity disperses user and item distributions. Inspired by this analysis, we propose a novel loss function that improves the design of alignment and uniformity considering the unique patterns of datasets called Margin-aware Alignment and Weighted Uniformity (MAWU). The key novelty of MAWU is two-fold: (i) margin-aware alignment (MA) mitigates user/item-specific popularity biases, and (ii) weighted uniformity (WU) adjusts the significance between user and item uniformities to reflect the inherent characteristics of datasets. Extensive experimental results show that MF and LightGCN equipped with MAWU are comparable or superior to state-of-the-art CF models with various loss functions on three public datasets.

The openness and influence of video-sharing platforms (VSPs) such as YouTube and TikTok attracted creators to share videos on various social issues. Although social issue videos (SIVs) affect public opinions and breed misinformation, how VSP users obtain information and interact with SIVs is under-explored. This work surveyed 659 YouTube and 127 TikTok users to understand the motives for consuming SIVs on VSPs. We found that VSP users are primarily motivated by the information and entertainment gratifications to use the platform. VSP users use SIVs for information-seeking purposes and find YouTube and TikTok convenient to interact with SIVs. VSP users moderately watch SIVs for entertainment and inactively engage in social interactions. SIV consumption is associated with information and socialization gratifications of the platform. VSP users appreciate the diversity of information and opinions but would also do their own research and are concerned about the misinformation and echo chamber problems.

In the face of challenges encountered during femur fracture surgery, such as the high rates of malalignment and X-ray exposure to operating personnel, robot-assisted surgery has emerged as an alternative to conventional state-of-the-art surgical methods. This paper introduces the development of Robossis, a haptic system for robot-assisted femur fracture surgery. Robossis comprises a 7-DOF haptic controller and a 6-DOF surgical robot. A unilateral control architecture is developed to address the kinematic mismatch and the motion transfer between the haptic controller and the Robossis surgical robot. A real-time motion control pipeline is designed to address the motion transfer and evaluated through experimental testing. The analysis illustrates that the Robossis surgical robot can adhere to the desired trajectory from the haptic controller with an average translational error of 0.32 mm and a rotational error of 0.07 deg. Additionally, a haptic rendering pipeline is developed to resolve the kinematic mismatch by constraining the haptic controller (user hand) movement within the permissible joint limits of the Robossis surgical robot. Lastly, in a cadaveric lab test, the Robossis system assisted surgeons during a mock femur fracture surgery. The result shows that Robossis can provide an intuitive solution for surgeons to perform femur fracture surgery.

Recent years have witnessed an exponential increase in the demand for face video compression, and the success of artificial intelligence has expanded the boundaries beyond traditional hybrid video coding. Generative coding approaches have been identified as promising alternatives with reasonable perceptual rate-distortion trade-offs, leveraging the statistical priors of face videos. However, the great diversity of distortion types in spatial and temporal domains, ranging from the traditional hybrid coding frameworks to generative models, present grand challenges in compressed face video quality assessment (VQA). In this paper, we introduce the large-scale Compressed Face Video Quality Assessment (CFVQA) database, which is the first attempt to systematically understand the perceptual quality and diversified compression distortions in face videos. The database contains 3,240 compressed face video clips in multiple compression levels, which are derived from 135 source videos with diversified content using six representative video codecs, including two traditional methods based on hybrid coding frameworks, two end-to-end methods, and two generative methods. In addition, a FAce VideO IntegeRity (FAVOR) index for face video compression was developed to measure the perceptual quality, considering the distinct content characteristics and temporal priors of the face videos. Experimental results exhibit its superior performance on the proposed CFVQA dataset. The benchmark is now made publicly available at: //github.com/Yixuan423/Compressed-Face-Videos-Quality-Assessment.

Learning controllers with offline data in decision-making systems is an essential area of research due to its potential to reduce the risk of applications in real-world systems. However, in responsibility-sensitive settings such as healthcare, decision accountability is of paramount importance, yet has not been adequately addressed by the literature. This paper introduces the Accountable Offline Controller (AOC) that employs the offline dataset as the Decision Corpus and performs accountable control based on a tailored selection of examples, referred to as the Corpus Subset. AOC operates effectively in low-data scenarios, can be extended to the strictly offline imitation setting, and displays qualities of both conservation and adaptability. We assess AOC's performance in both simulated and real-world healthcare scenarios, emphasizing its capability to manage offline control tasks with high levels of performance while maintaining accountability.

The emerging trend of AR/VR places great demands on 3D content. However, most existing software requires expertise and is difficult for novice users to use. In this paper, we aim to create sketch-based modeling tools for user-friendly 3D modeling. We introduce Reality3DSketch with a novel application of an immersive 3D modeling experience, in which a user can capture the surrounding scene using a monocular RGB camera and can draw a single sketch of an object in the real-time reconstructed 3D scene. A 3D object is generated and placed in the desired location, enabled by our novel neural network with the input of a single sketch. Our neural network can predict the pose of a drawing and can turn a single sketch into a 3D model with view and structural awareness, which addresses the challenge of sparse sketch input and view ambiguity. We conducted extensive experiments synthetic and real-world datasets and achieved state-of-the-art (SOTA) results in both sketch view estimation and 3D modeling performance. According to our user study, our method of performing 3D modeling in a scene is $>$5x faster than conventional methods. Users are also more satisfied with the generated 3D model than the results of existing methods.

Semantic text similarity plays an important role in software engineering tasks in which engineers are requested to clarify the semantics of descriptive labels (e.g., business terms, table column names) that are often consists of too short or too generic words and appears in their IT systems. We formulate this type of problem as a task of matching descriptive labels to glossary descriptions. We then propose a framework to leverage an existing semantic text similarity measurement (STS) and augment it using semantic label enrichment and set-based collective contextualization where the former is a method to retrieve sentences relevant to a given label and the latter is a method to compute similarity between two contexts each of which is derived from a set of texts (e.g., column names in the same table). We performed an experiment on two datasets derived from publicly available data sources. The result indicated that the proposed methods helped the underlying STS correctly match more descriptive labels with the descriptions.

Deep neural networks (DNNs) are successful in many computer vision tasks. However, the most accurate DNNs require millions of parameters and operations, making them energy, computation and memory intensive. This impedes the deployment of large DNNs in low-power devices with limited compute resources. Recent research improves DNN models by reducing the memory requirement, energy consumption, and number of operations without significantly decreasing the accuracy. This paper surveys the progress of low-power deep learning and computer vision, specifically in regards to inference, and discusses the methods for compacting and accelerating DNN models. The techniques can be divided into four major categories: (1) parameter quantization and pruning, (2) compressed convolutional filters and matrix factorization, (3) network architecture search, and (4) knowledge distillation. We analyze the accuracy, advantages, disadvantages, and potential solutions to the problems with the techniques in each category. We also discuss new evaluation metrics as a guideline for future research.

While it is nearly effortless for humans to quickly assess the perceptual similarity between two images, the underlying processes are thought to be quite complex. Despite this, the most widely used perceptual metrics today, such as PSNR and SSIM, are simple, shallow functions, and fail to account for many nuances of human perception. Recently, the deep learning community has found that features of the VGG network trained on the ImageNet classification task has been remarkably useful as a training loss for image synthesis. But how perceptual are these so-called "perceptual losses"? What elements are critical for their success? To answer these questions, we introduce a new Full Reference Image Quality Assessment (FR-IQA) dataset of perceptual human judgments, orders of magnitude larger than previous datasets. We systematically evaluate deep features across different architectures and tasks and compare them with classic metrics. We find that deep features outperform all previous metrics by huge margins. More surprisingly, this result is not restricted to ImageNet-trained VGG features, but holds across different deep architectures and levels of supervision (supervised, self-supervised, or even unsupervised). Our results suggest that perceptual similarity is an emergent property shared across deep visual representations.

北京阿比特科技有限公司