We introduce MuVieCAST, a modular network architecture for multi-view consistent style transfer that produces consistently stylized images across multiple viewpoints of the same scene. The architecture supports both sparse and dense views, making it versatile enough to handle a wide range of multi-view image datasets. The approach consists of three modules, each with a specific role: content preservation, image transformation, and multi-view consistency enforcement. We extensively evaluate our approach across multiple application domains, including depth-map-based point cloud fusion, mesh reconstruction, and novel-view synthesis. Our experiments show that the framework generates high-quality stylized images that remain consistent across viewpoints. A user study on novel-view synthesis confirms these results: in approximately 68\% of cases, participants preferred our outputs over those of a recent state-of-the-art method. Our modular framework is extensible and can easily be integrated with various backbone architectures, making it a flexible solution for multi-view style transfer. More results are shown on our project page: muviecast.github.io
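To make the three-module composition concrete, here is a minimal PyTorch sketch of how such a pipeline could be wired together. The small transformation network, the feature-based content loss, and the warp-based consistency term are illustrative assumptions, not MuVieCAST's actual modules.

```python
# Illustrative sketch only: the abstract does not specify the exact modules, so the
# transformation network, content loss, and warp-based consistency loss are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TransformNet(nn.Module):
    """Hypothetical image-transformation module (stand-in for the real backbone)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1),
        )
    def forward(self, x):
        return torch.sigmoid(self.net(x))

def content_loss(feat_extractor, stylized, content):
    """Content preservation: match deep features of the stylized and source images."""
    return F.mse_loss(feat_extractor(stylized), feat_extractor(content))

def consistency_loss(stylized_a, stylized_b, grid_b_to_a):
    """Multi-view consistency: stylized view B, warped into view A via known
    correspondences (a normalized sampling grid), should match stylized view A."""
    warped_b = F.grid_sample(stylized_b, grid_b_to_a, align_corners=False)
    return F.mse_loss(stylized_a, warped_b)
```

In practice the three terms would be summed (with weights) into a single training objective for the transformation module.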
Current large vision-language models (VLMs) often face challenges such as the insufficient capability of a single visual component and excessively long visual token sequences. These issues can limit a model's effectiveness in accurately interpreting complex visual information and overly long contexts. Addressing these challenges is crucial for enhancing the performance and applicability of VLMs. This paper proposes an ensemble-of-experts technique that synergizes the capabilities of individual visual encoders, including those skilled in image-text matching, OCR, image segmentation, etc. The technique introduces a fusion network that unifies the processing of outputs from the different visual experts and bridges the gap between the image encoders and pre-trained LLMs. In addition, we explore different positional encoding schemes to alleviate the waste of positional encodings caused by lengthy image feature sequences, effectively addressing position overflow and length limitations. For instance, in our implementation this technique significantly reduces the positional occupancy of models like SAM from a substantial 4096 to a more manageable 64, or even down to 1. Experimental results demonstrate that VLMs with multiple experts consistently outperform isolated visual encoders and show a significant performance boost as more experts are integrated. We have open-sourced the training code used in this report. All of these resources can be found on our project website.
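As an illustration of reducing positional occupancy, the following PyTorch sketch pools a long visual token sequence (e.g., 4096 tokens from a SAM-like encoder) down to a fixed budget of positions before it enters the LLM. The pooling-plus-projection scheme is an assumption for illustration; the paper's actual fusion network and positional encoding schemes are not reproduced here.

```python
# Minimal sketch of shrinking a long visual token sequence to a smaller positional budget.
import torch
import torch.nn as nn

class TokenCompressor(nn.Module):
    def __init__(self, dim, num_out_tokens=64):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool1d(num_out_tokens)  # e.g. 4096 -> 64 positions
        self.proj = nn.Linear(dim, dim)                    # align to the LLM hidden size

    def forward(self, visual_tokens):          # (batch, 4096, dim) from a SAM-like encoder
        x = visual_tokens.transpose(1, 2)      # (batch, dim, 4096)
        x = self.pool(x).transpose(1, 2)       # (batch, 64, dim)
        return self.proj(x)                    # now occupies 64 positions instead of 4096

tokens = torch.randn(2, 4096, 256)
compressed = TokenCompressor(dim=256, num_out_tokens=64)(tokens)
print(compressed.shape)  # torch.Size([2, 64, 256])
```

Setting `num_out_tokens=1` would collapse the sequence to a single position, matching the most aggressive reduction mentioned above.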
To leverage LLMs for visual synthesis, traditional methods convert raster image information into discrete grid tokens through specialized visual modules, thereby disrupting the model's ability to capture the true semantic representation of visual scenes. This paper posits that an alternative representation of images, vector graphics, can effectively overcome this limitation by enabling a more natural and semantically coherent segmentation of the image information. We therefore introduce StrokeNUWA, a pioneering work exploring a better visual representation, ``stroke tokens'', for vector graphics, which is rich in visual semantics, naturally compatible with LLMs, and highly compressed. Equipped with stroke tokens, StrokeNUWA significantly surpasses traditional LLM-based and optimization-based methods across various metrics on the vector graphic generation task. In addition, StrokeNUWA achieves up to a 94x inference speedup over prior methods, with an exceptional SVG code compression ratio of 6.9%.
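To convey the idea of stroke tokens, the sketch below maps SVG path commands to a small discrete vocabulary by quantizing coordinates. This is purely illustrative: StrokeNUWA's actual tokenizer is not described in the abstract, and the command/coordinate vocabulary here is an assumption.

```python
# Purely illustrative mapping of vector-graphic strokes to a compact discrete vocabulary.
def quantize(value, lo=0.0, hi=100.0, bins=128):
    """Map a continuous SVG coordinate to one of `bins` discrete levels."""
    value = max(lo, min(hi, value))
    return int(round((value - lo) / (hi - lo) * (bins - 1)))

def stroke_to_tokens(stroke):
    """stroke: (command, [coords]) taken from an SVG path, e.g. ("L", [10.5, 42.0])."""
    cmd, coords = stroke
    return [f"<{cmd}>"] + [f"<c{quantize(v)}>" for v in coords]

path = [("M", [10.0, 20.0]), ("L", [10.5, 42.0]), ("C", [15.0, 50.0, 30.0, 55.0, 40.0, 60.0])]
tokens = [t for s in path for t in stroke_to_tokens(s)]
print(tokens)  # ['<M>', '<c13>', '<c25>', '<L>', ...]
```

A sequence of such tokens is far shorter than a grid of raster patch tokens for the same shape, which is the intuition behind the compression figures above.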
This paper presents an algorithm, called BCM-Broadcast, for implementing causal broadcast in distributed mobile systems in the presence of Byzantine failures. The BCM-Broadcast algorithm simultaneously addresses three critical challenges in distributed systems: Byzantine failures, causality, and mobility. We first present a hierarchical architecture for BCM-Broadcast. We then define twelve properties for BCM-Broadcast, including validity, integrity, termination, and causality, and show that BCM-Broadcast satisfies all of them. We also prove the safety of BCM-Broadcast, i.e., that no healthy process delivers a message from any Byzantine process, assuming that the number of Byzantine processes is less than one third of the total number of mobile nodes. Subsequently, we show that the message complexity of BCM-Broadcast is linear. Finally, using a Poisson process, we analyze the probability of violating the safety condition under different mobility scenarios.
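For readers unfamiliar with causal delivery, the sketch below shows the textbook vector-clock delivery condition that causal broadcast enforces. It is only background for the causality requirement; it is not the BCM-Broadcast algorithm itself, which additionally handles Byzantine senders and node mobility.

```python
# Textbook causal-delivery condition with vector clocks (illustration only).
def causally_deliverable(msg_vc, sender, local_vc):
    """Deliver a message from `sender` only if it is the next expected message from
    that sender and every message it causally depends on has already been delivered."""
    if msg_vc[sender] != local_vc[sender] + 1:
        return False
    return all(msg_vc[k] <= local_vc[k] for k in msg_vc if k != sender)

# Process P2's view after delivering one message from P0 and none from P1:
local_vc = {0: 1, 1: 0, 2: 0}
print(causally_deliverable({0: 2, 1: 0, 2: 0}, sender=0, local_vc=local_vc))  # True
print(causally_deliverable({0: 1, 1: 1, 2: 0}, sender=1, local_vc=local_vc))  # True
print(causally_deliverable({0: 2, 1: 1, 2: 0}, sender=1, local_vc=local_vc))  # False: needs P0's 2nd msg first
```

Messages that fail the check are buffered and re-examined after later deliveries advance the local clock.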
Large language models (LLMs) are increasingly relied upon for complex multi-turn conversations across diverse real-world applications. However, existing benchmarks predominantly focus on single-turn evaluations, overlooking the models' capabilities in multi-turn interactions. To address this gap, we introduce MT-Eval, a comprehensive benchmark designed to evaluate multi-turn conversational abilities. By analyzing human-LLM conversations, we categorize interaction patterns into four types: recollection, expansion, refinement, and follow-up. We construct multi-turn queries for each category either by augmenting existing datasets or by creating new examples with GPT-4 to avoid data leakage. To study the factors impacting multi-turn abilities, we create single-turn versions of the 1170 multi-turn queries and compare performance. Our evaluation of 11 well-known LLMs shows that while closed-source models generally surpass open-source ones, certain open-source models exceed GPT-3.5-Turbo in specific tasks. We observe significant performance degradation in multi-turn settings compared to single-turn settings in most models, which is not correlated with the models' fundamental capabilities. Moreover, we identify the distance to relevant content and susceptibility to error propagation as the key factors influencing multi-turn performance. MT-Eval is released publicly to encourage future research towards more robust conversational models.
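To illustrate how a multi-turn query can be compared against a single-turn counterpart, the sketch below collapses earlier turns into one self-contained prompt. The exact construction used in MT-Eval may differ; this only shows the general idea behind the single-turn versions mentioned above.

```python
# Illustrative single-turn conversion of a multi-turn query (not MT-Eval's exact recipe).
def as_single_turn(turns):
    """Collapse a multi-turn query into one self-contained prompt."""
    context = "\n".join(f"Turn {i + 1}: {t}" for i, t in enumerate(turns[:-1]))
    return f"{context}\nNow answer: {turns[-1]}" if context else turns[-1]

multi_turn = ["Summarize this article: ...", "Shorten the summary to one sentence."]
print(as_single_turn(multi_turn))
```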
Geometry problem solving presents a formidable challenge within the NLP community. Existing approaches often rely on models designed for solving math word problems, neglecting the unique characteristics of geometry problems. Additionally, current research predominantly focuses on geometry calculation problems while overlooking other essential aspects such as proving. In this study, we address these limitations by proposing the Geometry-Aware Problem Solver (GAPS) model. GAPS is specifically designed to generate solution programs for geometry problems of various types with the help of its unique problem-type classifier. To achieve this, GAPS treats a solution program as a composition of operators and operands, separating their generation processes. Furthermore, we introduce a geometry-elements enhancement method, which improves GAPS's ability to recognize geometry elements accurately. With these improvements, GAPS shows remarkable performance in solving geometry problems. Our experiments on the UniGeo dataset demonstrate the superiority of GAPS over the state-of-the-art model, Geoformer. Specifically, GAPS achieves an accuracy improvement of more than 5.3% on calculation tasks and an impressive 41.1% on proving tasks. Notably, GAPS reaches an impressive 97.5% accuracy on proving problems, representing a significant advancement in solving geometry proving tasks.
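The following sketch illustrates what "a solution program as a composition of operators and operands" means in executable form. The operator set, arity handling, and the worked example (area of a circle with radius 3) are illustrative assumptions; GAPS generates the two sequences with neural decoders rather than by hand.

```python
# Illustrative execution of a solution program given separate operator/operand streams.
import math

OPS = {
    "mul": lambda a, b: a * b,
    "square": lambda a: a * a,
}

def execute(operators, operands):
    """Apply each operator to the operands it consumes, feeding results forward."""
    stack = list(operands)
    for op in operators:
        fn = OPS[op]
        arity = fn.__code__.co_argcount
        args = [stack.pop(0) for _ in range(arity)]
        stack.insert(0, fn(*args))
    return stack[0]

# Program for "area of a circle with radius 3": square(3), then mul(pi, 9).
print(execute(["square", "mul"], [3.0, math.pi]))  # ~28.27
```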
Large language models (LLMs) have demonstrated significant potential for many downstream tasks, including those requiring human-level intelligence, such as vulnerability detection. However, recent attempts to use LLMs for vulnerability detection are still preliminary, as they lack an in-depth understanding of a subject LLM's vulnerability reasoning capability, that is, whether it originates from the model itself or from external assistance, such as invoking tool support and retrieving vulnerability knowledge. In this paper, we aim to decouple LLMs' vulnerability reasoning capability from their other capabilities, including the ability to actively seek additional information (e.g., via function calling in SOTA models), adopt relevant vulnerability knowledge (e.g., via vector-based matching and retrieval), and follow instructions to output structured results. To this end, we propose a unified evaluation framework named LLM4Vuln, which separates LLMs' vulnerability reasoning from their other capabilities and evaluates how vulnerability reasoning changes when those other capabilities are enhanced. To demonstrate the effectiveness of LLM4Vuln, we designed controlled experiments using 75 ground-truth smart contract vulnerabilities, which were extensively audited as high-risk on Code4rena from August to November 2023, and tested them in 4,950 different scenarios across three representative LLMs (GPT-4, Mixtral, and Code Llama). Our results not only reveal ten findings regarding the varying effects of knowledge enhancement, context supplementation, prompt schemes, and models, but also enable us to identify 9 zero-day vulnerabilities in two pilot bug bounty programs, with over 1,000 USD awarded.
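To show how such controlled scenarios can be enumerated, the sketch below crosses the evaluation dimensions into a scenario grid. The specific option values are assumptions for illustration and do not reproduce the paper's exact configuration or scenario count.

```python
# Illustrative scenario grid over the evaluation dimensions (values are assumptions).
from itertools import product

models = ["GPT-4", "Mixtral", "Code Llama"]
knowledge = ["none", "retrieved-knowledge"]
context = ["no-context", "with-context"]
prompts = ["raw-output", "structured-output"]

scenarios = list(product(models, knowledge, context, prompts))
print(len(scenarios), scenarios[0])  # 24 ('GPT-4', 'none', 'no-context', 'raw-output')
```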
Videos captured from multiple viewpoints can help in perceiving the 3D structure of the world and benefit computer vision tasks such as action recognition and tracking. In this paper, we present a method for self-supervised learning from synchronized multi-view videos. We use a cross-view reconstruction task to inject geometric information into the model. Our approach is based on the masked autoencoder (MAE) framework. In addition to the same-view decoder, we introduce a separate cross-view decoder that leverages a cross-attention mechanism to reconstruct a target-viewpoint video from a source-viewpoint video, helping the learned representations become robust to viewpoint changes. For videos, static regions can be reconstructed trivially, which hinders learning meaningful representations. To tackle this, we introduce a motion-weighted reconstruction loss that improves temporal modeling. We report state-of-the-art results on the NTU-60, NTU-120, and ETRI datasets, as well as in the transfer-learning setting on the NUCLA, PKU-MMD-II, and ROCOG-v2 datasets, demonstrating the robustness of our approach. Code will be made available.
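A minimal sketch of a motion-weighted reconstruction loss is given below, assuming the weight is derived from frame differences so that moving regions contribute more than static ones. The exact weighting used in the paper may differ.

```python
# Sketch of a motion-weighted reconstruction loss (weighting scheme is an assumption).
import torch

def motion_weighted_loss(pred, target, video):
    """pred/target: (B, T, C, H, W) reconstructions; video: the original clip.
    Pixels with larger temporal change receive a larger reconstruction weight."""
    motion = (video[:, 1:] - video[:, :-1]).abs()        # frame differences
    motion = torch.cat([motion[:, :1], motion], dim=1)   # pad so shapes match
    weight = 1.0 + motion.mean(dim=2, keepdim=True)      # per-pixel weight >= 1
    return (weight * (pred - target) ** 2).mean()

video = torch.rand(2, 8, 3, 32, 32)
loss = motion_weighted_loss(torch.rand_like(video), video, video)
print(loss.item())
```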
We propose an efficient cross-camera surveillance system, called STAC, that leverages spatio-temporal associations between multiple cameras to provide real-time analytics and inference under constrained network environments. STAC is built on the proposed omni-scale feature learning people reidentification (reid) algorithm, which allows accurate detection, tracking, and re-identification of people across cameras using the spatio-temporal characteristics of video frames. We integrate STAC with frame filtering and a state-of-the-art video compression technique for streaming (the FFmpeg libx264 codec) to remove redundant information from cross-camera frames. This optimizes both the cost of video transmission and compute/processing, while maintaining high accuracy for real-time query inference. The introduction of the AICity Challenge 2023 Data [1] by NVIDIA has enabled the exploration of systems utilizing multi-camera people tracking algorithms. We evaluate the performance of STAC on this dataset, measuring the accuracy metrics and inference rate for reid. Additionally, we quantify the reduction in video stream size achieved through frame filtering and FFmpeg compression compared to the raw camera streams. For completeness, we make our repository available to reproduce the results at //github.com/VolodymyrVakhniuk/CS444_Final_Project.
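The sketch below illustrates the two bandwidth-saving steps described above: dropping near-duplicate frames and re-encoding with the libx264 codec through FFmpeg. The difference threshold and CRF value are illustrative assumptions, not STAC's tuned settings.

```python
# Illustrative frame filtering (OpenCV) and H.264 re-encoding (FFmpeg, libx264).
import subprocess
import cv2

def filter_redundant_frames(path, diff_threshold=8.0):
    """Keep a frame only if it differs enough from the last kept frame."""
    cap, kept, last = cv2.VideoCapture(path), [], None
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        if last is None or cv2.absdiff(gray, last).mean() > diff_threshold:
            kept.append(frame)
            last = gray
    cap.release()
    return kept

def compress(src, dst, crf=28):
    """Re-encode with H.264 (libx264); a higher CRF yields a smaller stream."""
    subprocess.run(["ffmpeg", "-y", "-i", src, "-c:v", "libx264", "-crf", str(crf), dst],
                   check=True)
```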
A large number of real-world graphs or networks are inherently heterogeneous, involving a diversity of node types and relation types. Heterogeneous graph embedding aims to embed the rich structural and semantic information of a heterogeneous graph into low-dimensional node representations. Existing models usually define multiple metapaths in a heterogeneous graph to capture composite relations and guide neighbor selection. However, these models either omit node content features, discard intermediate nodes along the metapath, or consider only one metapath. To address these three limitations, we propose a new model named Metapath Aggregated Graph Neural Network (MAGNN) to boost the final performance. Specifically, MAGNN employs three major components: node content transformation to encapsulate input node attributes, intra-metapath aggregation to incorporate intermediate semantic nodes, and inter-metapath aggregation to combine messages from multiple metapaths. Extensive experiments on three real-world heterogeneous graph datasets for node classification, node clustering, and link prediction show that MAGNN achieves more accurate predictions than state-of-the-art baselines.
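A heavily simplified PyTorch sketch of the three components follows; the real MAGNN uses learned metapath instance encoders and attention, whereas this sketch substitutes plain means purely to show where intermediate nodes enter the computation.

```python
# Simplified sketch of MAGNN's three steps (means stand in for the learned encoders/attention).
import torch
import torch.nn as nn

class SimplifiedMAGNNLayer(nn.Module):
    def __init__(self, in_dims, hidden):
        super().__init__()
        # (1) node content transformation: one projection per node type
        self.transform = nn.ModuleDict({t: nn.Linear(d, hidden) for t, d in in_dims.items()})

    def forward(self, feats, metapath_instances):
        h = {t: self.transform[t](x) for t, x in feats.items()}      # (1)
        per_metapath = []
        for instances in metapath_instances:                         # (2) intra-metapath
            # each instance is a list of (node_type, node_id) including intermediate nodes
            encoded = [torch.stack([h[t][i] for t, i in inst]).mean(0) for inst in instances]
            per_metapath.append(torch.stack(encoded).mean(0))
        return torch.stack(per_metapath).mean(0)                     # (3) inter-metapath

feats = {"author": torch.randn(4, 8), "paper": torch.randn(5, 8)}
inst_apa = [[("author", 0), ("paper", 1), ("author", 2)]]            # one A-P-A instance
layer = SimplifiedMAGNNLayer({"author": 8, "paper": 8}, hidden=16)
print(layer(feats, [inst_apa]).shape)  # torch.Size([16])
```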
We present the Generative Adversarial Capsule Network (CapsuleGAN), a framework that uses capsule networks (CapsNets) instead of standard convolutional neural networks (CNNs) as discriminators within the generative adversarial network (GAN) setting for modeling image data. We provide guidelines for designing CapsNet discriminators and an updated GAN objective function, which incorporates the CapsNet margin loss, for training CapsuleGAN models. We show that CapsuleGAN outperforms a convolutional GAN at modeling the image data distribution on the MNIST dataset of handwritten digits, as evaluated by the generative adversarial metric and on semi-supervised image classification.
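For reference, here is a sketch of the CapsNet margin loss (Sabour et al.) applied to a two-class real/fake discriminator output; how it is wired into the full GAN objective is simplified here and the real/fake capsule setup is an assumption for illustration.

```python
# Sketch of the capsule margin loss used in place of the usual BCE discriminator loss.
import torch

def margin_loss(v_lengths, targets, m_pos=0.9, m_neg=0.1, lam=0.5):
    """v_lengths: capsule output lengths in [0, 1], one per class (here real/fake).
    targets: one-hot labels of the same shape."""
    pos = targets * torch.clamp(m_pos - v_lengths, min=0) ** 2
    neg = lam * (1 - targets) * torch.clamp(v_lengths - m_neg, min=0) ** 2
    return (pos + neg).sum(dim=1).mean()

# Discriminator capsule lengths for 3 images over [real, fake] classes.
lengths = torch.tensor([[0.95, 0.05], [0.40, 0.70], [0.85, 0.20]])
labels = torch.tensor([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])  # first two real, last fake
print(margin_loss(lengths, labels))
```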