
Despite an exciting new wave of multimodal machine learning models, current approaches still struggle to interpret the complex contextual relationships between the different modalities present in videos. Going beyond existing methods that emphasize simple activities or objects, we propose a new model-agnostic approach for generating detailed textual descriptions that capture multimodal video information. Our method leverages the extensive knowledge learnt by large language models, such as GPT-3.5 or Llama2, to reason about textual descriptions of the visual and aural modalities, obtained from BLIP-2, Whisper and ImageBind. Without needing additional finetuning of video-text models or datasets, we demonstrate that available LLMs have the ability to use these multimodal textual descriptions as proxies for "sight" or "hearing" and perform zero-shot multimodal classification of videos in-context. Our evaluations on popular action recognition benchmarks, such as UCF-101 or Kinetics, show these context-rich descriptions can be successfully used in video understanding tasks. This method points towards a promising new research direction in multimodal classification, demonstrating how an interplay between textual, visual and auditory machine learning models can enable more holistic video understanding.
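The in-context classification step described in this abstract can be illustrated with a minimal sketch. It assumes the frame captions, speech transcript, and audio tags have already been produced as plain strings by upstream models; the function names, the prompt wording, and the label-matching rule are hypothetical and are not taken from the paper. No LLM is called here; the sketch only builds the prompt and parses a returned answer.

```python
from typing import Optional, Sequence


def build_classification_prompt(
    frame_captions: Sequence[str],
    transcript: str,
    audio_tags: Sequence[str],
    class_names: Sequence[str],
) -> str:
    """Assemble a zero-shot prompt from textual proxies for sight and hearing."""
    lines = ["You are given textual descriptions of a video.", ""]
    lines.append("Visual descriptions (one per sampled frame):")
    lines.extend(f"- {caption}" for caption in frame_captions)
    lines.append("")
    # Fall back to a placeholder when a modality produced no text.
    lines.append(f"Speech transcript: {transcript or '(none)'}")
    lines.append("Audio tags: " + (", ".join(audio_tags) or "(none)"))
    lines.append("")
    lines.append("Choose the single best action label from: " + ", ".join(class_names))
    lines.append("Answer with the label only.")
    return "\n".join(lines)


def parse_label(llm_answer: str, class_names: Sequence[str]) -> Optional[str]:
    """Map a free-form answer back to a known class name, or None if no match."""
    answer = llm_answer.strip().lower()
    # Check longer names first so "Playing Guitar" wins over "Playing".
    for name in sorted(class_names, key=len, reverse=True):
        if name.lower() in answer:
            return name
    return None
```

The prompt string would be sent to whichever LLM is available, and `parse_label` maps the reply back onto the benchmark's label set so that accuracy can be computed.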

Related content

Transformation · Networking · Criterion · Inference · Operation

November 1, 2023

PAUMER: Patch Pausing Transformer for Semantic Segmentation

Evann Courdier,Prabhu Teja Sivaprasad,François Fleuret

We study the problem of improving the efficiency of segmentation transformers by using disparate amounts of computation for different parts of the image. Our method, PAUMER, accomplishes this by pausing computation for patches that are deemed to not need any more computation before the final decoder. We use the entropy of predictions computed from intermediate activations as the pausing criterion, and find this aligns well with semantics of the image. Our method has a unique advantage that a single network trained with the proposed strategy can be effortlessly adapted at inference to various run-time requirements by modulating its pausing parameters. On two standard segmentation datasets, Cityscapes and ADE20K, we show that our method operates with about 50% higher throughput with an mIoU drop of about 0.65% and 4.6%, respectively.
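An entropy-based pausing criterion of the kind this abstract mentions could look roughly like the sketch below. It assumes each patch has a vector of class logits from an intermediate prediction head, and that a patch is paused when its predictive entropy falls below a threshold (low entropy read as "confident enough"). The threshold value, the function names, and the list-based data layout are illustrative assumptions, not the paper's implementation.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one patch's class logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in nats; zero-probability classes contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def patches_to_pause(patch_logits: Sequence[Sequence[float]], threshold: float) -> List[int]:
    """Return indices of patches whose predictive entropy is below the threshold."""
    return [
        index
        for index, logits in enumerate(patch_logits)
        if entropy(softmax(logits)) < threshold
    ]
```

Raising or lowering `threshold` at inference time changes how many patches are paused, which is one way to picture the run-time adaptability the abstract describes.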

Estimation/Estimator · MoDELS · Learning · Robustness · Unit

November 1, 2023

A Robust Deep Learning Method with Uncertainty Estimation for the Pathological Classification of Renal Cell Carcinoma based on CT Images

Ni Yao,Hang Hu,Kaicong Chen,Chen Zhao,Yuan Guo,Boya Li,Jiaofen Nan,Yanting Li,Chuang Han,Fubao Zhu,Weihua Zhou,Li Tian

from arxiv, 16 pages, 6 figures

Objectives: To develop and validate a deep learning-based diagnostic model incorporating uncertainty estimation to assist radiologists in the preoperative differentiation of the pathological subtypes of renal cell carcinoma (RCC) based on CT images.

Methods: Data from 668 consecutive patients with pathologically proven RCC were retrospectively collected from Center 1. Using five-fold cross-validation, a deep learning model incorporating uncertainty estimation was developed to classify RCC subtypes into clear cell RCC (ccRCC), papillary RCC (pRCC), and chromophobe RCC (chRCC). An external validation set of 78 patients from Center 2 further evaluated the model's performance.

Results: In the five-fold cross-validation, the model's area under the receiver operating characteristic curve (AUC) for the classification of ccRCC, pRCC, and chRCC was 0.868 (95% CI: 0.826-0.923), 0.846 (95% CI: 0.812-0.886), and 0.839 (95% CI: 0.802-0.88), respectively. In the external validation set, the AUCs were 0.856 (95% CI: 0.838-0.882), 0.787 (95% CI: 0.757-0.818), and 0.793 (95% CI: 0.758-0.831) for ccRCC, pRCC, and chRCC, respectively.

Conclusions: The developed deep learning model demonstrated robust performance in predicting the pathological subtypes of RCC, while the incorporated uncertainty emphasized the importance of understanding model confidence, which is crucial for assisting clinical decision-making for patients with renal tumors.

Clinical relevance statement: Our deep learning approach, integrated with uncertainty estimation, offers clinicians a dual advantage: accurate RCC subtype predictions complemented by diagnostic confidence references, promoting informed decision-making for patients with RCC.

Inference · Learning · Exchangeable · INFORMS · CASES

November 1, 2023

On the Arithmetic and Geometric Fusion of Beliefs for Distributed Inference

Mert Kayaalp,Yunus Inan,Emre Telatar,Ali H. Sayed

from arxiv, Accepted for publication in IEEE Transactions on Automatic Control

We study the asymptotic learning rates under linear and log-linear combination rules of belief vectors in a distributed hypothesis testing problem. We show that under both combination strategies, agents are able to learn the truth exponentially fast, with a faster rate under log-linear fusion. We examine the gap between the rates in terms of network connectivity and information diversity. We also provide closed-form expressions for special cases involving federated architectures and exchangeable networks.
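The two combination rules compared in this abstract can be illustrated with a small sketch: linear (arithmetic) fusion takes a weighted average of the agents' belief vectors, while log-linear (geometric) fusion takes a weighted average of their logarithms and renormalizes. The sketch covers only the fusion step for one round, assumes strictly positive beliefs and weights that sum to one, and uses hypothetical function names; the local Bayesian updates and the network structure studied in the paper are not modeled.

```python
import math
from typing import List, Sequence


def arithmetic_fusion(beliefs: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Linear pooling: weighted average of the belief vectors."""
    n_hypotheses = len(beliefs[0])
    return [
        sum(w * belief[h] for w, belief in zip(weights, beliefs))
        for h in range(n_hypotheses)
    ]


def geometric_fusion(beliefs: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Log-linear pooling: weighted average of log-beliefs, then renormalize."""
    n_hypotheses = len(beliefs[0])
    log_pooled = [
        sum(w * math.log(belief[h]) for w, belief in zip(weights, beliefs))
        for h in range(n_hypotheses)
    ]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(log_pooled)
    unnormalized = [math.exp(v - peak) for v in log_pooled]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]
```

For two agents with beliefs `[0.9, 0.1]` and `[0.5, 0.5]` and equal weights, arithmetic fusion gives `[0.7, 0.3]`, whereas geometric fusion gives `[0.75, 0.25]`.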

SURF · Generalization theory · MoDELS · INTERACT · Principle

November 1, 2023

SURF: A Generalization Benchmark for GNNs Predicting Fluid Dynamics

Stefan Künzli,Florian Grötschla,Joël Mathys,Roger Wattenhofer

Simulating fluid dynamics is crucial for the design and development process, ranging from simple valves to complex turbomachinery. Accurately solving the underlying physical equations is computationally expensive. Therefore, learning-based solvers that model interactions on meshes have gained interest due to their promising speed-ups. However, it is unknown to what extent these models truly understand the underlying physical principles and can generalize rather than interpolate. Generalization is a key requirement for a general-purpose fluid simulator, which should adapt to different topologies, resolutions, or thermodynamic ranges. We propose SURF, a benchmark designed to test the generalization of learned graph-based fluid simulators. SURF comprises individual datasets and provides specific performance and generalization metrics for evaluating and comparing different models. We empirically demonstrate the applicability of SURF by thoroughly investigating the two state-of-the-art graph-based models, yielding new insights into their generalization.

Networking · Markovian · MoDELS · Generative model · Marginalization

October 31, 2023

DAMNETS: A Deep Autoregressive Model for Generating Markovian Network Time Series

Jase Clarkson,Mihai Cucuringu,Andrew Elliott,Gesine Reinert

Generative models for network time series (also known as dynamic graphs) have tremendous potential in fields such as epidemiology, biology and economics, where complex graph-based dynamics are core objects of study. Designing flexible and scalable generative models is a very challenging task due to the high dimensionality of the data, as well as the need to represent temporal dependencies and marginal network structure. Here we introduce DAMNETS, a scalable deep generative model for network time series. DAMNETS outperforms competing methods on all of our measures of sample quality, over both real and synthetic data sets.

MoDELS · BERT · Layer · GLUE · Reducible

October 31, 2023

EELBERT: Tiny Models through Dynamic Embeddings

Gabrielle Cohn,Rishika Agarwal,Deepanshu Gupta,Siddharth Patwardhan

from arxiv, EMNLP 2023, Industry Track 9 pages, 2 figures, 5 tables

We introduce EELBERT, an approach for compression of transformer-based models (e.g., BERT), with minimal impact on the accuracy of downstream tasks. This is achieved by replacing the input embedding layer of the model with dynamic, i.e. on-the-fly, embedding computations. Since the input embedding layer accounts for a significant fraction of the model size, especially for the smaller BERT variants, replacing this layer with an embedding computation function helps us reduce the model size significantly. Empirical evaluation on the GLUE benchmark shows that our BERT variants (EELBERT) suffer minimal regression compared to the traditional BERT models. Through this approach, we are able to develop our smallest model UNO-EELBERT, which achieves a GLUE score within 4% of fully trained BERT-tiny, while being 15x smaller (1.2 MB) in size.
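To make the idea of an on-the-fly embedding computation concrete, here is a minimal sketch that derives a token vector from hashed character n-grams instead of looking it up in a stored table. The abstract does not specify the embedding function, so the n-gram hashing scheme, the function name, and all parameter values below are assumptions chosen purely for illustration.

```python
import hashlib
from typing import List


def ngram_hash_embedding(token: str, dim: int = 8, n: int = 3) -> List[float]:
    """Compute a deterministic embedding from hashed character n-grams.

    No embedding table is stored: the vector is recomputed from the token text.
    """
    if not 1 <= dim <= 32:
        # SHA-256 yields 32 bytes, and this sketch uses one byte per dimension.
        raise ValueError("dim must be between 1 and 32")
    padded = f"<{token}>"  # boundary markers so prefixes and suffixes differ
    ngrams = [padded[i:i + n] for i in range(max(1, len(padded) - n + 1))]
    vector = [0.0] * dim
    for gram in ngrams:
        digest = hashlib.sha256(gram.encode("utf-8")).digest()
        for d in range(dim):
            # Map each byte (0..255) to a value in [-1, 1).
            vector[d] += digest[d] / 128.0 - 1.0
    # Average over n-grams so the scale does not grow with token length.
    return [v / len(ngrams) for v in vector]
```

Because the vector is a pure function of the token string, the only parameters that need to be stored are those of the layers that follow, which is the intuition behind the size reduction reported above.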

Median · Precision/Accuracy · Sparse · INFORMS · Statistic

October 31, 2023

BP3M: Bayesian Positions, Parallaxes, and Proper Motions derived from the Hubble Space Telescope and Gaia data

Kevin A. McKinnon,Andrés del Pino,Constance M. Rockosi,Miranda Apfel,Puragra Guhathakurta,Roeland P. van der Marel,Paul Bennet,Mark A. Fardal,Mattia Libralato,Eduardo Vitral,Laura L. Watkins

from arxiv, 33 pages, 25 figures, 3 tables

We present a hierarchical Bayesian pipeline, BP3M, that measures positions, parallaxes, and proper motions (PMs) for cross-matched sources between Hubble Space Telescope (HST) images and Gaia -- even for sparse fields ($N_*<10$ per image) -- expanding from the recent GaiaHub tool. This technique uses Gaia-measured astrometry as priors to predict the locations of sources in HST images, and is therefore able to put the HST images onto a global reference frame without the use of background galaxies/QSOs. Testing our publicly available code in the Fornax and Draco dSphs, we measure accurate PMs that are a median of 8-13 times more precise than Gaia DR3 alone for $20.5<G<21~\mathrm{mag}$. We are able to explore the effect of observation strategies on BP3M astrometry using synthetic data, finding an optimal strategy to improve parallax and position precision at no cost to the PM uncertainty. Using 1619 HST images in the sparse COSMOS field (median 9 Gaia sources per HST image), we measure BP3M PMs for 2640 unique sources in the $16<G<21.5~\mathrm{mag}$ range, 25% of which have no Gaia PMs; the median BP3M PM uncertainty for $20.25<G<20.75~\mathrm{mag}$ sources is $0.44$ mas/yr compared to $1.03$ mas/yr from Gaia, while the median BP3M PM uncertainty for sources without Gaia-measured PMs ($20.75<G<21.5~\mathrm{mag}$) is $1.16$ mas/yr. The statistics that underpin the BP3M pipeline are a generalized way of combining position measurements from different images, epochs, and telescopes, which allows information to be shared between surveys and archives to achieve higher astrometric precision than that from each catalog alone.
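The general principle of using one catalog's measurement as a prior and updating it with another measurement can be shown with a one-dimensional Gaussian toy example. This is only a sketch of inverse-variance combination under the assumption of independent Gaussian errors; the function name and the numbers used with it are hypothetical, and the actual BP3M pipeline is hierarchical and multi-dimensional.

```python
from typing import Tuple


def combine_gaussian(
    prior_mean: float,
    prior_sigma: float,
    meas_mean: float,
    meas_sigma: float,
) -> Tuple[float, float]:
    """Posterior mean and sigma for a Gaussian prior updated by a Gaussian measurement."""
    # Precision (inverse variance) acts as the weight of each source of information.
    prior_weight = 1.0 / prior_sigma ** 2
    meas_weight = 1.0 / meas_sigma ** 2
    posterior_variance = 1.0 / (prior_weight + meas_weight)
    posterior_mean = posterior_variance * (prior_weight * prior_mean + meas_weight * meas_mean)
    return posterior_mean, posterior_variance ** 0.5
```

The posterior sigma is always smaller than either input sigma, which is the basic reason combining measurements across images, epochs, and telescopes can yield higher precision than any single catalog.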

Performer · Neural Networks · Graph · GPU · Dynamical system

November 10, 2022

Unravelling the Performance of Physics-informed Graph Neural Networks for Dynamical Systems

Abishek Thangamuthu,Gunjan Kumar,Suresh Bishnoi,Ravinder Bhattoo,N M Anoop Krishnan,Sayan Ranu

from arxiv, Accepted at 36th Conference on Neural Information Processing Systems (NeurIPS 2022)

Recently, graph neural networks have attracted considerable attention for simulating dynamical systems because their inductive nature leads to zero-shot generalizability. Similarly, physics-informed inductive biases in deep-learning frameworks have been shown to give superior performance in learning the dynamics of physical systems. There is a growing volume of literature that attempts to combine these two approaches. Here, we evaluate the performance of thirteen different graph neural networks, namely, Hamiltonian and Lagrangian graph neural networks, graph neural ODE, and their variants with explicit constraints and different architectures. We briefly explain the theoretical formulation, highlighting the similarities and differences in the inductive biases and graph architecture of these systems. We evaluate these models on spring, pendulum, gravitational, and 3D deformable solid systems to compare the performance in terms of rollout error, conserved quantities such as energy and momentum, and generalizability to unseen system sizes. Our study demonstrates that GNNs with additional inductive biases, such as explicit constraints and decoupling of kinetic and potential energies, exhibit significantly enhanced performance. Further, all the physics-informed GNNs exhibit zero-shot generalizability to system sizes an order of magnitude larger than the training system, thus providing a promising route to simulate large-scale realistic systems.

Knowledge · Machine Learning · MoDELS · Learned · Conformer

May 10, 2022

Knowledge Augmented Machine Learning with Applications in Autonomous Driving: A Survey

Julian Wörmann,Daniel Bogdoll,Etienne Bührle,Han Chen,Evaristus Fuh Chuo,Kostadin Cvejoski,Ludger van Elst,Tobias Gleißner,Philip Gottschall,Stefan Griesche,Christian Hellert,Christian Hesels,Sebastian Houben,Tim Joseph,Niklas Keil,Johann Kelsch,Hendrik Königshof,Erwin Kraft,Leonie Kreuser,Kevin Krone,Tobias Latka,Denny Mattern,Stefan Matthes,Mohsin Munir,Moritz Nekolla,Adrian Paschke,Maximilian Alexander Pintz,Tianming Qiu,Faraz Qureishi,Syed Tahseen Raza Rizvi,Jörg Reichardt,Laura von Rueden,Stefan Rudolph,Alexander Sagel,Gerhard Schunk,Hao Shen,Hendrik Stapelbroek,Vera Stehr,Gurucharan Srinivas,Anh Tuan Tran,Abhishek Vivekanandan,Ya Wang,Florian Wasserrab,Tino Werner,Christian Wirth,Stefan Zwicklbauer

from arxiv, 93 pages

The existence of representative datasets is a prerequisite of many successful artificial intelligence and machine learning models. However, the subsequent application of these models often involves scenarios that are inadequately represented in the data used for training. The reasons for this are manifold and range from time and cost constraints to ethical considerations. As a consequence, the reliable use of these models, especially in safety-critical applications, is a huge challenge. Leveraging additional, already existing sources of knowledge is key to overcome the limitations of purely data-driven approaches, and eventually to increase the generalization capability of these models. Furthermore, predictions that conform with knowledge are crucial for making trustworthy and safe decisions even in underrepresented scenarios. This work provides an overview of existing techniques and methods in the literature that combine data-based models with existing knowledge. The identified approaches are structured according to the categories integration, extraction and conformity. Special attention is given to applications in the field of autonomous driving.

Multimodal · Sentiment analysis · MoDELS · AIM · Tumblr

May 25, 2018

Multimodal Sentiment Analysis To Explore the Structure of Emotions

Anthony Hu,Seth Flaxman

from arxiv, Accepted as a conference paper at KDD 2018

We propose a novel approach to multimodal sentiment analysis using deep neural networks combining visual analysis and natural language processing. Our goal differs from the standard sentiment analysis goal of predicting whether a sentence expresses positive or negative sentiment; instead, we aim to infer the latent emotional state of the user. Thus, we focus on predicting the emotion word tags attached by users to their Tumblr posts, treating these as "self-reported emotions." We demonstrate that our multimodal model combining both text and image features outperforms separate models based solely on either images or text. Our model's results are interpretable, automatically yielding sensible word lists associated with emotions. We explore the structure of emotions implied by our model and compare it to what has been posited in the psychology literature, and validate our model on a set of images that have been used in psychology studies. Finally, our work also provides a useful tool for the growing academic study of images - both photographs and memes - on social networks.