
Volumetric video offers a highly immersive viewing experience, but its high bandwidth requirements make it challenging to ensure quality of experience (QoE). In this paper, we explore how viewing distance, introduced by six degrees of freedom (6DoF) spatial navigation, affects users' perceived quality. By considering human visual resolution limitations, we propose a visual acuity model that describes the relationship between the virtual viewing distance and the tolerable boundary point cloud density. The proposed model satisfies spatial visual requirements during 6DoF exploration. Additionally, it dynamically adjusts quality levels to balance perceptual quality and bandwidth consumption. Furthermore, we present a QoE model to precisely represent users' perceived quality at different viewing distances. Extensive experimental results demonstrate that the proposed scheme can effectively improve the overall average QoE by up to 26% over real networks and user traces, compared to existing baselines.
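The link between viewing distance and tolerable point density can be illustrated with a small sketch. It is not the model from the paper: it assumes an angular acuity of one arcminute (a commonly cited figure for normal vision), and the function names and the list of per-level point spacings are hypothetical.

```python
import math

# Assumption: the eye resolves about one arcminute; the paper's model may differ.
ARCMIN_RAD = math.radians(1.0 / 60.0)


def resolvable_spacing(distance_m, acuity_rad=ARCMIN_RAD):
    """Smallest point spacing (metres) the viewer can resolve at this distance."""
    return 2.0 * distance_m * math.tan(acuity_rad / 2.0)


def pick_quality_level(distance_m, level_spacings_m):
    """Return the index of the coarsest level whose spacing is not resolvable.

    level_spacings_m is a hypothetical list of point spacings per quality
    level, ordered from coarsest (cheapest) to finest (most expensive).
    """
    limit = resolvable_spacing(distance_m)
    for index, spacing in enumerate(level_spacings_m):
        if spacing <= limit:
            return index
    # Even the finest level is resolvable; it is the best available choice.
    return len(level_spacings_m) - 1


levels = [0.004, 0.002, 0.001, 0.0005]  # hypothetical spacings in metres
near_level = pick_quality_level(1.0, levels)
mid_level = pick_quality_level(4.0, levels)
far_level = pick_quality_level(20.0, levels)
```

Under these assumptions a distant viewer is served the coarsest level, while a close viewer needs the finest one, which is where the bandwidth saving would come from.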

Related content

The Motion Manifold Primitive (MMP) produces, for a given task, a continuous manifold of trajectories, each of which can successfully complete the task. It consists of the decoder function that parametrizes the manifold and the probability density in the latent coordinate space. In this paper, we first show that the MMP performance can significantly degrade due to the geometric distortion in the latent space; by distortion, we mean that similar motions are not located nearby in the latent space. We then propose Isometric Motion Manifold Primitives (IMMP), whose latent coordinate space preserves the geometry of the manifold. For this purpose, we formulate and use a Riemannian metric for the motion space (i.e., parametric curve space), which we call a CurveGeom Riemannian metric. Experiments with planar obstacle-avoiding motions and pushing manipulation tasks show that IMMP significantly outperforms existing MMP methods. Code is available at //github.com/Gabe-YHLee/IMMP-public.

Object Based Audio (OBA) provides a new kind of audio experience that lets audiences personalize and customize their listening and gives them a choice of what and how to hear their audio content. OBA can be applied to different platforms such as broadcasting, streaming and cinema sound. This paper presents a novel approach for creating object-based audio on the production side: Sample-by-Sample Object Based Audio (SSOBA) embedding. SSOBA places audio object samples in a way that allows audiences to easily individualize their chosen audio sources according to their interests and needs. SSOBA is an extra service and not an alternative, so it is also compliant with legacy audio players. The biggest advantage of SSOBA is that it does not require any special additional hardware in the broadcasting chain, so it is easy to implement and to equip legacy players and decoders with enhanced ability. Input audio objects, number of output channels and sampling rates are three important factors that affect SSOBA performance and determine whether it is lossless or lossy. SSOBA adopts interpolation at the decoder side to compensate for eliminated samples. Both subjective and objective experiments are carried out to evaluate the output results at each step. MUSHRA subjective experiments conducted after the encoding step show good-quality performance of SSOBA with up to five objects. SNR measurements and objective experiments, performed after decoding and interpolation, show significant successful recovery and separation of audio objects. Experimental results show that a minimum sampling rate of 96 kHz is needed to encode up to five objects in a Stereo-mode channel to acquire good subjective and objective results simultaneously.

Weak supervision has emerged as a promising approach for rapid and large-scale dataset creation in response to the increasing demand for accelerated NLP development. By leveraging labeling functions, weak supervision allows practitioners to generate datasets quickly by creating learned label models that produce soft-labeled datasets. This paper aims to show how such an approach can be utilized to build an Indonesian NLP dataset from conservation news text. We construct two types of datasets: multi-class classification and sentiment classification. We then provide baseline experiments using various pretrained language models. These baseline results demonstrate test performances of 59.79% accuracy and 55.72% F1-score for sentiment classification, and 66.87% F1-score-macro, 71.5% F1-score-micro, and 83.67% ROC-AUC for multi-class classification. Additionally, we release the datasets and labeling functions used in this work for further research and exploration.

Surgical videos captured from microscopic or endoscopic imaging devices are rich but complex sources of information, depicting different tools and anatomical structures utilized during an extended amount of time. Despite containing crucial workflow information and being commonly recorded in many procedures, usage of surgical videos for automated surgical workflow understanding is still limited. In this work, we exploit scene graphs as a more holistic, semantically meaningful and human-readable way to represent surgical videos while encoding all anatomical structures, tools, and their interactions. To properly evaluate the impact of our solutions, we create a scene graph dataset from semantic segmentations from the CaDIS and CATARACTS datasets. We demonstrate that scene graphs can be leveraged through the use of graph convolutional networks (GCNs) to tackle surgical downstream tasks such as surgical workflow recognition with competitive performance. Moreover, we demonstrate the benefits of surgical scene graphs regarding the explainability and robustness of model decisions, which are crucial in the clinical setting.

When developing policies for prevention of infectious diseases, policymakers often set specific, outcome-oriented targets to achieve. For example, when developing a vaccine allocation policy, policymakers may want to distribute them so that at least a certain fraction of individuals in a census block are disease-free and spillover effects due to interference within blocks are accounted for. The paper proposes methods to estimate a block-level treatment policy that achieves a pre-defined, outcome-oriented target while accounting for spillover effects due to interference. Our policy, the minimum resource threshold policy (MRTP), suggests the minimum fraction of treated units required within a block to meet or exceed the target level of the outcome. We estimate the MRTP from empirical risk minimization using a novel, nonparametric, doubly robust loss function. We then characterize statistical properties of the estimated MRTP in terms of the excess risk bound. We apply our methodology to design a water, sanitation, and hygiene allocation policy for Senegal with the goal of increasing the proportion of households with no children experiencing diarrhea to a level exceeding a specified threshold. Our policy outperforms competing policies and offers new approaches to design allocation policies, especially in international development for communicable diseases.

This paper shows that masked autoencoders (MAE) are scalable self-supervised learners for computer vision. Our MAE approach is simple: we mask random patches of the input image and reconstruct the missing pixels. It is based on two core designs. First, we develop an asymmetric encoder-decoder architecture, with an encoder that operates only on the visible subset of patches (without mask tokens), along with a lightweight decoder that reconstructs the original image from the latent representation and mask tokens. Second, we find that masking a high proportion of the input image, e.g., 75%, yields a nontrivial and meaningful self-supervisory task. Coupling these two designs enables us to train large models efficiently and effectively: we accelerate training (by 3x or more) and improve accuracy. Our scalable approach allows for learning high-capacity models that generalize well: e.g., a vanilla ViT-Huge model achieves the best accuracy (87.8%) among methods that use only ImageNet-1K data. Transfer performance in downstream tasks outperforms supervised pre-training and shows promising scaling behavior.
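The random masking step described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name, the seed parameter, and the 14 x 14 patch grid (196 patches) are assumptions chosen for the example.

```python
import random


def random_patch_mask(num_patches, mask_ratio=0.75, seed=None):
    """Split patch indices into visible and masked sets by uniform sampling.

    Only the visible indices would be passed to the encoder; the masked
    indices mark where the decoder would place mask tokens.
    """
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")
    rng = random.Random(seed)
    num_masked = int(num_patches * mask_ratio)
    shuffled = list(range(num_patches))
    rng.shuffle(shuffled)
    masked = sorted(shuffled[:num_masked])
    visible = sorted(shuffled[num_masked:])
    return visible, masked


# Hypothetical setup: a 14 x 14 grid of patches, 75% of them masked.
visible, masked = random_patch_mask(196, mask_ratio=0.75, seed=0)
```

With a 75% ratio the encoder in this sketch would see only 49 of the 196 patches, which is consistent with the training speed-up the abstract attributes to the asymmetric design.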

This paper presents a new approach for assembling graph neural networks based on framelet transforms. The latter provides a multi-scale representation for graph-structured data. With the framelet system, we can decompose the graph feature into low-pass and high-pass frequencies as extracted features for network training, which then defines a framelet-based graph convolution. The framelet decomposition naturally induces a graph pooling strategy by aggregating the graph feature into low-pass and high-pass spectra, which considers both the feature values and geometry of the graph data and conserves the total information. The graph neural networks with the proposed framelet convolution and pooling achieve state-of-the-art performance in many types of node and graph prediction tasks. Moreover, we propose shrinkage as a new activation for the framelet convolution, which thresholds the high-frequency information at different scales. Compared to ReLU, shrinkage in framelet convolution improves the graph neural network model in terms of denoising and signal compression: noises in both node and structure can be significantly reduced by accurately cutting off the high-pass coefficients from framelet decomposition, and the signal can be compressed to less than half its original size with the prediction performance well preserved.
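As an illustration of shrinkage used as an activation, the sketch below applies soft-thresholding to a list of high-pass coefficients. Soft-thresholding is one common form of shrinkage; whether the paper uses exactly this form, and how it sets the threshold per scale, is an assumption here, and the function name is hypothetical.

```python
def soft_shrink(coeffs, threshold):
    """Soft-threshold each coefficient: sign(c) * max(|c| - threshold, 0)."""
    if threshold < 0:
        raise ValueError("threshold must be non-negative")
    out = []
    for c in coeffs:
        magnitude = abs(c) - threshold
        if magnitude <= 0:
            # Small coefficients are treated as noise and set to zero.
            out.append(0.0)
        else:
            # Large coefficients keep their sign but shrink toward zero.
            out.append(magnitude if c > 0 else -magnitude)
    return out


# Hypothetical high-pass coefficients at one scale.
shrunk = soft_shrink([3.0, -0.5, 0.2, -2.0], threshold=1.0)
```

Unlike ReLU, which zeroes every negative value, this operation zeroes only coefficients with small magnitude, so the surviving coefficients form a sparse representation of the signal.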

Adversarial attack is a technique for deceiving Machine Learning (ML) models, which provides a way to evaluate adversarial robustness. In practice, attack algorithms are manually selected and tuned by human experts to break an ML system. However, manual selection of attackers tends to be sub-optimal, leading to a mistaken assessment of model security. In this paper, a new procedure called Composite Adversarial Attack (CAA) is proposed for automatically searching for the best combination of attack algorithms and their hyper-parameters from a candidate pool of 32 base attackers. We design a search space where an attack policy is represented as an attacking sequence, i.e., the output of the previous attacker is used as the initialization input for its successors. The multi-objective NSGA-II genetic algorithm is adopted for finding the strongest attack policy with minimum complexity. The experimental results show that CAA beats 10 top attackers on 11 diverse defenses with less elapsed time (6 $\times$ faster than AutoAttack), and achieves the new state-of-the-art on $l_{\infty}$, $l_{2}$ and unrestricted adversarial attacks.

Video captioning is the task of automatically generating a textual description of the actions in a video. Although previous work (e.g. sequence-to-sequence model) has shown promising results in abstracting a coarse description of a short video, it is still very challenging to caption a video containing multiple fine-grained actions with a detailed description. This paper aims to address the challenge by proposing a novel hierarchical reinforcement learning framework for video captioning, where a high-level Manager module learns to design sub-goals and a low-level Worker module recognizes the primitive actions to fulfill the sub-goal. With this compositional framework to reinforce video captioning at different levels, our approach significantly outperforms all the baseline methods on a newly introduced large-scale dataset for fine-grained video captioning. Furthermore, our non-ensemble model has already achieved the state-of-the-art results on the widely-used MSR-VTT dataset.

Recent advances in fluorescence microscopy enable acquisition of 3D image volumes with better quality and deeper penetration into tissue. Segmentation is a required step to characterize and analyze biological structures in the images. 3D segmentation using deep learning has achieved promising results in microscopy images. One issue is that deep learning techniques require a large set of groundtruth data, which is impractical to annotate manually for microscopy volumes. This paper describes a 3D nuclei segmentation method using 3D convolutional neural networks. A set of synthetic volumes and the corresponding groundtruth volumes are generated automatically using a generative adversarial network. Segmentation results demonstrate that our proposed method is capable of segmenting nuclei successfully in 3D for various data sets.
