Edge · VR · Virtual Reality (VR) · Streaming · Server

November 17, 2023

User Dynamics-Aware Edge Caching and Computing for Mobile Virtual Reality

Mushu Li,Jie Gao,Conghao Zhou,Xuemin Shen,Weihua Zhuang

from arxiv, 38 pages, 13 figures, single column double spaced, published in IEEE Journal of Selected Topics in Signal Processing

In this paper, we present a novel content caching and delivery approach for mobile virtual reality (VR) video streaming. The proposed approach aims to maximize VR video streaming performance, i.e., minimizing video frame missing rate, by proactively caching popular VR video chunks and adaptively scheduling computing resources at an edge server based on user and network dynamics. First, we design a scalable content placement scheme for deciding which video chunks to cache at the edge server based on tradeoffs between computing and caching resource consumption. Second, we propose a machine learning-assisted VR video delivery scheme, which allocates computing resources at the edge server to satisfy video delivery requests from multiple VR headsets. A Whittle index-based method is adopted to reduce the video frame missing rate by identifying network and user dynamics with low signaling overhead. Simulation results demonstrate that the proposed approach can significantly improve VR video streaming performance over conventional caching and computing resource scheduling strategies.

Related content

Activation function · Functional · ReLU · Networking · Neural Networks

January 9, 2024

Homotopy Relaxation Training Algorithms for Infinite-Width Two-Layer ReLU Neural Networks

Yahong Yang,Qipin Chen,Wenrui Hao

In this paper, we present a novel training approach called the Homotopy Relaxation Training Algorithm (HRTA), aimed at accelerating the training process in contrast to traditional methods. Our algorithm incorporates two key mechanisms: one involves building a homotopy activation function that seamlessly connects the linear activation function with the ReLU activation function; the other technique entails relaxing the homotopy parameter to enhance the training refinement process. We have conducted an in-depth analysis of this novel method within the context of the neural tangent kernel (NTK), revealing significantly improved convergence rates. Our experimental results, especially when considering networks with larger widths, validate the theoretical conclusions. This proposed HRTA exhibits the potential for other activation functions and deep neural networks.
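
The homotopy activation described above can be sketched as a convex blend of the identity and ReLU. This is a minimal illustration, not the authors' code: the blend formula and the names `homotopy_activation` and `s` are assumptions made here for clarity, and the relaxation step mentioned in the abstract is not shown.

```python
def relu(x: float) -> float:
    """Standard ReLU on a scalar."""
    return max(0.0, x)


def homotopy_activation(x: float, s: float) -> float:
    """Blend the identity (s = 0) with ReLU (s = 1).

    Assumed form: (1 - s) * x + s * relu(x). Intermediate values of s
    give a leaky-ReLU-like function whose negative slope is (1 - s).
    """
    return (1.0 - s) * x + s * relu(x)


# Sweep the homotopy parameter from the linear end to the ReLU end.
for s in (0.0, 0.5, 1.0):
    print(s, [homotopy_activation(x, s) for x in (-2.0, 0.0, 3.0)])
```

Under this assumed form, training could start near `s = 0`, where the network is linear, and move `s` toward 1 in stages.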

Multimodal · Transformer · contrastive · Learning · Loss

January 8, 2024

Efficient Multiscale Multimodal Bottleneck Transformer for Audio-Video Classification

from arxiv, Accepted by WACV 2024; well-formatted PDF is in //drive.google.com/file/d/10Zo_ydJZFAm7YsxHDgTjhyc4dEJbW_dk/view?usp=sharing

In recent years, researchers have combined audio and video signals to deal with challenges where actions are not well represented or captured by visual cues. However, how to effectively leverage the two modalities is still under development. In this work, we develop a multiscale multimodal Transformer (MMT) that leverages hierarchical representation learning. Particularly, MMT is composed of a novel multiscale audio Transformer (MAT) and a multiscale video Transformer [43]. To learn a discriminative cross-modality fusion, we further design multimodal supervised contrastive objectives called audio-video contrastive loss (AVC) and intra-modal contrastive loss (IMC) that robustly align the two modalities. MMT surpasses previous state-of-the-art approaches by 7.3% and 2.1% on Kinetics-Sounds and VGGSound in terms of the top-1 accuracy without external training data. Moreover, the proposed MAT significantly outperforms AST [28] by 22.2%, 4.4% and 4.7% on three public benchmark datasets, and is about 3% more efficient based on the number of FLOPs and 9.8% more efficient based on GPU memory usage.

Multimodal · state-of-the-art · Transformer · Learning · Mask

January 8, 2024

Efficient Selective Audio Masked Multimodal Bottleneck Transformer for Audio-Video Classification

from arxiv, Accepted by WACV 2024; well-formatted PDF is in //drive.google.com/file/d/1qvW52lamsvNGMCqPS7q8g8L4NaR_LlbR/view?usp=sharing. arXiv admin note: text overlap with arXiv:2401.04023

Audio and video are the two most common modalities on mainstream media platforms, e.g., YouTube. To learn from multimodal videos effectively, in this work, we propose a novel audio-video recognition approach termed audio video Transformer, AVT, leveraging the effective spatio-temporal representation by the video Transformer to improve action recognition accuracy. For multimodal fusion, simply concatenating multimodal tokens in a cross-modal Transformer requires large computational and memory resources; instead, we reduce the cross-modality complexity through an audio-video bottleneck Transformer. To improve the learning efficiency of the multimodal Transformer, we integrate self-supervised objectives, i.e., audio-video contrastive learning, audio-video matching, and masked audio and video learning, into AVT training, which maps diverse audio and video representations into a common multimodal representation space. We further propose a masked audio segment loss to learn semantic audio activities in AVT. Extensive experiments and ablation studies on three public datasets and two in-house datasets consistently demonstrate the effectiveness of the proposed AVT. Specifically, AVT outperforms its previous state-of-the-art counterparts on Kinetics-Sounds by 8%. AVT also surpasses one of the previous state-of-the-art video Transformers [25] by 10% on VGGSound by leveraging the audio signal. Compared to one of the previous state-of-the-art multimodal methods, MBT [32], AVT is 1.3% more efficient in terms of FLOPs and improves the accuracy by 3.8% on Epic-Kitchens-100.

Segment Anything · Performer · Attention · MoDELS · Mask

January 8, 2024

BA-SAM: Scalable Bias-Mode Attention Mask for Segment Anything Model

Yiran Song,Qianyu Zhou,Xiangtai Li,Deng-Ping Fan,Xuequan Lu,Lizhuang Ma

from arxiv, Code: //github.com/zongzi13545329/BA-SAM

In this paper, we address the challenge of image resolution variation for the Segment Anything Model (SAM). SAM, known for its zero-shot generalizability, exhibits performance degradation when faced with datasets with varying image sizes. Previous approaches tend to resize the image to a fixed size or adopt structure modifications, hindering the preservation of SAM's rich prior knowledge. Besides, such task-specific tuning necessitates a complete retraining of the model, which is costly and unacceptable for deployment in downstream tasks. In this paper, we reformulate this issue as a length extrapolation problem, where token sequence length varies while maintaining a consistent patch size for images of different sizes. To this end, we propose Scalable Bias-Mode Attention Mask (BA-SAM) to enhance SAM's adaptability to varying image resolutions while eliminating the need for structure modifications. Firstly, we introduce a new scaling factor to ensure consistent magnitude in the attention layer's dot product values when the token sequence length changes. Secondly, we present a bias-mode attention mask that allows each token to prioritize neighboring information, mitigating the impact of untrained distant information. Our BA-SAM demonstrates efficacy in two scenarios: zero-shot and fine-tuning. Extensive evaluation on diverse datasets, including DIS5K, DUTS, ISIC, COD10K, and COCO, reveals its ability to significantly mitigate performance degradation in the zero-shot setting and achieve state-of-the-art performance with minimal fine-tuning. Furthermore, we propose a generalized model and benchmark, showcasing BA-SAM's generalizability across all four datasets simultaneously.
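
The idea of a bias that makes each token prioritize its neighbors can be illustrated with a toy one-dimensional example. This sketch is an assumption-laden illustration, not the BA-SAM formulation: the linear distance penalty, the `slope` parameter, and the function name are hypothetical, and the paper's scaling factor is not modeled.

```python
import math


def biased_attention_weights(scores, query_pos, slope=0.5):
    """Softmax over raw scores minus a penalty that grows with token distance.

    scores: raw attention scores from one query to every key (hypothetical input).
    query_pos: index of the query token in the sequence.
    slope: assumed strength of the distance penalty.
    """
    biased = [s - slope * abs(query_pos - j) for j, s in enumerate(scores)]
    peak = max(biased)  # subtract the max for numerical stability
    exps = [math.exp(b - peak) for b in biased]
    total = sum(exps)
    return [e / total for e in exps]


# With equal raw scores, the bias alone decides the ranking:
# keys closer to the query receive larger weights.
weights = biased_attention_weights([1.0, 1.0, 1.0, 1.0], query_pos=0)
print(weights)
```

Because the penalty depends only on relative distance, the same rule applies unchanged when the sequence grows, which is the property a length-extrapolation setting needs.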

Large language model · Controller · Integration · AI · Continuity

January 6, 2024

LLMind: Orchestrating AI and IoT with LLMs for Complex Task Execution

Hongwei Cui,Yuyang Du,Qun Yang,Yulin Shao,Soung Chang Liew

from arxiv, Demo videos are available at //youtu.be/3Al5qRntEEU and //youtu.be/aTGD8EjQ8kM

In this paper, we introduce LLMind, an AI framework that utilizes large language models (LLMs) as a central orchestrator. The framework integrates LLMs with domain-specific AI modules, enabling IoT devices to collaborate effectively in executing complex tasks. The LLM engages in natural conversations with human users via a user-friendly social media platform to come up with a plan to execute complex tasks. In particular, the execution of a complex task, which may involve the collaborations of multiple domain-specific AI modules and IoT devices, is realized through a control script. The LLM generates the control script using a Language-Code transformation approach based on finite-state machines (FSMs). The framework also incorporates semantic analysis and response optimization techniques to enhance speed and effectiveness. Ultimately, this framework is designed not only to innovate IoT device control and enrich user experiences but also to foster an intelligent and integrated IoT device ecosystem that evolves and becomes more sophisticated through continuing user and machine interactions.

Estimation/Estimator · Nonconvex · Integration · Threshold · Smoothing

January 5, 2024

Nonconvex High-Dimensional Time-Varying Coefficient Estimation for Noisy High-Frequency Observations

Minseok Shin,Donggyu Kim

from arxiv, 54 pages, 5 figures

In this paper, we propose a novel high-dimensional time-varying coefficient estimator for noisy high-frequency observations. In high-frequency finance, we often observe that noises dominate a signal of an underlying true process. Thus, we cannot apply usual regression procedures to analyze noisy high-frequency observations. To handle this issue, we first employ a smoothing method for the observed variables. However, the smoothed variables still contain non-negligible noises. To manage these non-negligible noises and the high dimensionality, we propose a nonconvex penalized regression method for each local coefficient. This method produces consistent but biased local coefficient estimators. To estimate the integrated coefficients, we propose a debiasing scheme and obtain a debiased integrated coefficient estimator using debiased local coefficient estimators. Then, to further account for the sparsity structure of the coefficients, we apply a thresholding scheme to the debiased integrated coefficient estimator. We call this scheme the Thresholded dEbiased Nonconvex LASSO (TEN-LASSO) estimator. Furthermore, this paper establishes the concentration properties of the TEN-LASSO estimator and discusses a nonconvex optimization algorithm.
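
The final thresholding step can be pictured with a small sketch. Hard thresholding is assumed here purely for illustration; the abstract does not say which thresholding rule or level the authors use, and `hard_threshold` and `level` are hypothetical names.

```python
def hard_threshold(coefficients, level):
    """Zero out every entry whose magnitude is below `level`.

    Entries at or above the level are kept unchanged, so large
    coefficients are not shrunk (unlike soft thresholding).
    """
    return [c if abs(c) >= level else 0.0 for c in coefficients]


# Hypothetical debiased estimates: two clear signals and two near-zero entries.
debiased = [0.8, -0.05, 0.02, -1.3]
sparse = hard_threshold(debiased, level=0.1)
print(sparse)
```

Debiasing tends to make every entry nonzero, so a step like this is what restores a sparse estimate.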

Factorized · Linear · Tensor · Orthonormal · Analysis

January 5, 2024

Guaranteed Nonconvex Factorization Approach for Tensor Train Recovery

Zhen Qin,Michael B. Wakin,Zhihui Zhu

In this paper, we provide the first convergence guarantee for the factorization approach. Specifically, to avoid the scaling ambiguity and to facilitate theoretical analysis, we optimize over the so-called left-orthogonal TT format which enforces orthonormality among most of the factors. To ensure the orthonormal structure, we utilize the Riemannian gradient descent (RGD) for optimizing those factors over the Stiefel manifold. We first delve into the TT factorization problem and establish the local linear convergence of RGD. Notably, the rate of convergence only experiences a linear decline as the tensor order increases. We then study the sensing problem that aims to recover a TT format tensor from linear measurements. Assuming the sensing operator satisfies the restricted isometry property (RIP), we show that with a proper initialization, which could be obtained through spectral initialization, RGD also converges to the ground-truth tensor at a linear rate. Furthermore, we expand our analysis to encompass scenarios involving Gaussian noise in the measurements. We prove that RGD can reliably recover the ground truth at a linear rate, with the recovery error exhibiting only polynomial growth in relation to the tensor order. We conduct various experiments to validate our theoretical findings.

FRN · INFORMS · Networking · MoDELS · Learning

April 12, 2021

Feature Decomposition and Reconstruction Learning for Effective Facial Expression Recognition

Delian Ruan,Yan Yan,Shenqi Lai,Zhenhua Chai,Chunhua Shen,Hanzi Wang

from arxiv, IEEE/CVF Conference on Computer Vision and Pattern Recognition 2021 (CVPR 2021)

In this paper, we propose a novel Feature Decomposition and Reconstruction Learning (FDRL) method for effective facial expression recognition. We view the expression information as the combination of the shared information (expression similarities) across different expressions and the unique information (expression-specific variations) for each expression. More specifically, FDRL mainly consists of two crucial networks: a Feature Decomposition Network (FDN) and a Feature Reconstruction Network (FRN). In particular, FDN first decomposes the basic features extracted from a backbone network into a set of facial action-aware latent features to model expression similarities. Then, FRN captures the intra-feature and inter-feature relationships for latent features to characterize expression-specific variations, and reconstructs the expression feature. To this end, two modules including an intra-feature relation modeling module and an inter-feature relation modeling module are developed in FRN. Experimental results on both the in-the-lab databases (including CK+, MMI, and Oulu-CASIA) and the in-the-wild databases (including RAF-DB and SFEW) show that the proposed FDRL method consistently achieves higher recognition accuracy than several state-of-the-art methods. This clearly highlights the benefit of feature decomposition and reconstruction for classifying expressions.

Unsupervised · Representation learning · Loss function (machine learning) · Learning · Unlabeled

February 26, 2020

Evolving Losses for Unsupervised Video Representation Learning

AJ Piergiovanni,Anelia Angelova,Michael S. Ryoo

from arxiv, arXiv admin note: text overlap with arXiv:1906.03248

We present a new method to learn video representations from large-scale unlabeled video data. Ideally, this representation will be generic and transferable, directly usable for new tasks such as action recognition and zero- or few-shot learning. We formulate unsupervised representation learning as a multi-modal, multi-task learning problem, where the representations are shared across different modalities via distillation. Further, we introduce the concept of loss function evolution by using an evolutionary search algorithm to automatically find an optimal combination of loss functions capturing many (self-supervised) tasks and modalities. Thirdly, we propose an unsupervised representation evaluation metric using distribution matching to a large unlabeled dataset as a prior constraint, based on Zipf's law. This unsupervised constraint, which is not guided by any labeling, produces similar results to weakly-supervised, task-specific ones. The proposed unsupervised representation learning results in a single RGB network and outperforms previous methods. Notably, it is also more effective than several label-based methods (e.g., ImageNet), with the exception of large, fully labeled video datasets.
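
One way to picture distribution matching against a Zipf prior is to compare sorted cluster sizes with a Zipf distribution. The sketch below is an assumption for illustration only: the use of KL divergence, the clustering that would produce `cluster_sizes`, and the function names are not taken from the abstract.

```python
import math


def zipf_prior(k, exponent=1.0):
    """Zipf probabilities over ranks 1..k: p(rank) proportional to 1 / rank**exponent."""
    weights = [1.0 / (rank ** exponent) for rank in range(1, k + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def kl_to_zipf(cluster_sizes, exponent=1.0):
    """KL divergence from the rank-sorted empirical cluster distribution to a Zipf prior.

    cluster_sizes: hypothetical counts of unlabeled videos assigned to each cluster.
    Lower values mean the learned representation groups data in a more Zipf-like way.
    """
    sizes = sorted(cluster_sizes, reverse=True)
    total = sum(sizes)
    empirical = [s / total for s in sizes]
    prior = zipf_prior(len(sizes), exponent)
    return sum(p * math.log(p / q) for p, q in zip(empirical, prior) if p > 0)


# Sizes in the ratio 1 : 1/2 : 1/3 match the prior; uniform sizes do not.
zipf_like = kl_to_zipf([6, 3, 2])
uniform = kl_to_zipf([4, 4, 4])
print(zipf_like, uniform)
```

A score of this kind needs no labels, so it could serve as a fitness signal when searching over loss combinations.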

Image retrieval · University of Oxford · Extensibility · Dataset · Performer

March 29, 2018

Revisiting Oxford and Paris: Large-Scale Image Retrieval Benchmarking

Filip Radenović,Ahmet Iscen,Giorgos Tolias,Yannis Avrithis,Ondřej Chum

from arxiv, CVPR 2018

In this paper we address issues with image retrieval benchmarking on the standard and popular Oxford 5k and Paris 6k datasets. In particular, annotation errors, the size of the dataset, and the level of challenge are addressed: new annotation for both datasets is created with extra attention to the reliability of the ground truth. Three new protocols of varying difficulty are introduced. The protocols allow fair comparison between different methods, including those using a dataset pre-processing stage. For each dataset, 15 new challenging queries are introduced. Finally, a new set of 1M hard, semi-automatically cleaned distractors is selected. An extensive comparison of the state-of-the-art methods is performed on the new benchmark. Different types of methods are evaluated, ranging from local-feature-based to modern CNN-based methods. The best results are achieved by taking the best of both worlds. Most importantly, image retrieval appears far from being solved.
