In this article we present a visual gyroscope based on equirectangular panoramas. We propose a new pipeline where we take advantage of combining three different methods to obtain a robust and accurate estimation of the attitude of the camera. We quantitatively and qualitatively validate our method on two image sequences taken with a $360^\circ$ dual-fisheye camera mounted on different aerial vehicles.

Related content

TDOA · Optimizer · Scenario · Performer · Statistical efficiency

March 15, 2024

Multi-Source Localization and Data Association for Time-Difference of Arrival Measurements

Gabrielle Flood, Filip Elvander

In this work, we consider the problem of localizing multiple signal sources based on time-difference of arrival (TDOA) measurements. In the blind setting, in which the source signals are not known, the localization task is challenging due to the data association problem. That is, it is not known which of the TDOA measurements correspond to the same source. Herein, we propose to perform joint localization and data association by means of an optimal transport formulation. The method operates by finding optimal groupings of TDOA measurements and associating these with candidate source locations. To allow for computationally feasible localization in three-dimensional space, an efficient set of candidate locations is constructed using a minimal multilateration solver based on minimal sets of receiver pairs. In numerical simulations, we demonstrate that the proposed method is robust both to measurement noise and TDOA detection errors. Furthermore, it is shown that the data association provided by the proposed method allows for statistically efficient estimates of the source locations.
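To make the measurement model concrete, the following is a minimal sketch of how a TDOA value relates to a source position and a receiver pair. It is not the optimal transport method of the paper; it only illustrates the quantity being associated. The function names `tdoa` and `residual`, and the propagation speed of 343 m/s (acoustic signals in air), are assumptions introduced here for illustration.

```python
import math

# Assumed propagation speed in m/s (acoustic signal in air); not taken from the paper.
SPEED = 343.0


def tdoa(source, receiver_a, receiver_b, speed=SPEED):
    """Time-difference of arrival of one source between two receivers, in seconds."""
    return (math.dist(source, receiver_a) - math.dist(source, receiver_b)) / speed


def residual(candidate, receiver_pairs, measured, speed=SPEED):
    """Sum of squared differences between predicted and measured TDOAs.

    A small residual suggests that the measurements are consistent with
    the candidate location (hypothetical helper, for illustration only).
    """
    return sum(
        (tdoa(candidate, a, b, speed) - m) ** 2
        for (a, b), m in zip(receiver_pairs, measured)
    )


if __name__ == "__main__":
    pairs = [((0.0, 0.0, 0.0), (4.0, 0.0, 0.0)), ((0.0, 0.0, 0.0), (0.0, 3.0, 0.0))]
    true_source = (1.0, 2.0, 0.5)
    measurements = [tdoa(true_source, a, b) for a, b in pairs]
    print(residual(true_source, pairs, measurements))
```

In the blind multi-source setting, the difficulty is that it is not known which measurements belong to the same source, so a residual like this cannot be evaluated until the measurements have been grouped.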

Optimizer · Controller · Stationary · Discretization · Lipschitz

March 15, 2024

Optimal Control of Stationary Doubly Diffusive Flows on Two and Three Dimensional Bounded Lipschitz Domains: Numerical Analysis

Jai Tushar, Arbaz Khan, Manil T. Mohan

In this work, we propose fully nonconforming, locally exactly divergence-free discretizations based on lowest-order Crouzeix-Raviart finite element and piecewise constant spaces to study the optimal control of the stationary double diffusion model presented in [Bürger, Méndez, Ruiz-Baier, SINUM (2019), 57:1318-1343]. The well-posedness of the discrete uncontrolled state and adjoint equations is discussed using discrete lifting and fixed-point arguments, and convergence results are derived rigorously under minimal regularity. Building upon our recent work [Tushar, Khan, Mohan arXiv (2023)], we prove the local optimality of a reference control using a second-order sufficient optimality condition for the control problem, and use it along with an optimize-then-discretize approach to prove optimal-order a priori error estimates for the control, state and adjoint variables up to the regularity of the solution. The optimal control is computed using a primal-dual active set strategy as a semi-smooth Newton method, and computational tests validate the predicted error decay rates and illustrate the proposed scheme's applicability to optimal control of thermohaline circulation problems.

Estimation/Estimator · 3D · Performer · Automator · Robustness

March 14, 2024

ThermoHands: A Benchmark for 3D Hand Pose Estimation from Egocentric Thermal Image

Fangqiang Ding, Yunzhou Zhu, Xiangyu Wen, Chris Xiaoxuan Lu

from arxiv, 20 pages, 6 pages, 5 tables

In this work, we present ThermoHands, a new benchmark for thermal image-based egocentric 3D hand pose estimation, aimed at overcoming challenges like varying lighting and obstructions (e.g., handwear). The benchmark includes a diverse dataset from 28 subjects performing hand-object and hand-virtual interactions, accurately annotated with 3D hand poses through an automated process. We introduce a bespoke baseline method, TheFormer, utilizing dual transformer modules for effective egocentric 3D hand pose estimation in thermal imagery. Our experimental results highlight TheFormer's leading performance and affirm thermal imaging's effectiveness in enabling robust 3D hand pose estimation in adverse conditions.

MoDELS · Transformation · Transformer model · Image classification · Moment

March 14, 2024

Transformers Get Stable: An End-to-End Signal Propagation Theory for Language Models

Akhil Kedia, Mohd Abbas Zaidi, Sushil Khyalia, Jungho Jung, Harshith Goka, Haejun Lee

from arxiv, Akhil Kedia, Mohd Abbas Zaidi, and Sushil Khyalia contributed equally. Source code is available at https://github.com/akhilkedia/TranformersGetStable

In spite of their huge success, transformer models remain difficult to scale in depth. In this work, we develop a unified signal propagation theory and provide formulae that govern the moments of the forward and backward signal through the transformer model. Our framework can be used to understand and mitigate vanishing/exploding gradients, rank collapse, and instability associated with high attention scores. We also propose DeepScaleLM, an initialization and scaling scheme that conserves unit output/gradient moments throughout the model, enabling the training of very deep models with hundreds of layers. We find that transformer models could be much deeper: our deep models with fewer parameters outperform shallow models in Language Modeling, Speech Translation, and Image Classification, across Encoder-only, Decoder-only and Encoder-Decoder variants, for both Pre-LN and Post-LN transformers, for multiple datasets and model sizes. These improvements also translate into improved performance on downstream Question Answering tasks and improved robustness for image classification.

Multimodal · Language modeling · MoDELS · Comprehensibility · state-of-the-art

March 13, 2024

Mipha: A Comprehensive Overhaul of Multimodal Assistant with Small Language Models

Minjie Zhu, Yichen Zhu, Xin Liu, Ning Liu, Zhiyuan Xu, Chaomin Shen, Yaxin Peng, Zhicai Ou, Feifei Feng, Jian Tang

Multimodal Large Language Models (MLLMs) have showcased impressive skills in tasks related to visual understanding and reasoning. Yet, their widespread application faces obstacles due to the high computational demands during both the training and inference phases, restricting their use to a limited audience within the research and user communities. In this paper, we investigate the design aspects of Multimodal Small Language Models (MSLMs) and propose an efficient multimodal assistant named Mipha, which is designed to create synergy among various aspects: visual representation, language models, and optimization strategies. We show that without increasing the volume of training data, our Mipha-3B outperforms the state-of-the-art large MLLMs, especially LLaVA-1.5-13B, on multiple benchmarks. Through detailed discussion, we provide insights and guidelines for developing strong MSLMs that rival the capabilities of MLLMs. Our code is available at https://github.com/zhuyiche/Mipha.

Performer · Use Case · INFORMS · Integration · Precision/Accuracy

March 13, 2024

SlicerTMS: Real-Time Visualization of Transcranial Magnetic Stimulation for Mental Health Treatment

Loraine Franke, Tae Young Park, Jie Luo, Yogesh Rathi, Steve Pieper, Lipeng Ning, Daniel Haehn

from arxiv, 11 pages, 4 figures, 2 tables, MICCAI

We present a real-time visualization system for Transcranial Magnetic Stimulation (TMS), a non-invasive neuromodulation technique for treating various brain disorders and mental health diseases. Our solution targets the current challenges of slow and labor-intensive practices in treatment planning. Integrating Deep Learning (DL), our system rapidly predicts electric field (E-field) distributions in 0.2 seconds for precise and effective brain stimulation. The core advancement lies in our tool's real-time neuronavigation visualization capabilities, which support clinicians in making more informed decisions quickly and effectively. We assess our system's performance through three studies: First, a real-world use case scenario in a clinical setting, providing concrete feedback on applicability and usability in a practical environment. Second, a comparative analysis with another TMS tool focusing on computational efficiency across various hardware platforms. Lastly, we conducted an expert user study to measure usability and influence in optimizing TMS treatment planning. The system is openly available for community use and further development on GitHub: https://github.com/lorifranke/SlicerTMS.

Directed · Design · Performer · state-of-the-art · Better

March 12, 2024

Advancing Differential Privacy: Where We Are Now and Future Directions for Real-World Deployment

Rachel Cummings, Damien Desfontaines, David Evans, Roxana Geambasu, Yangsibo Huang, Matthew Jagielski, Peter Kairouz, Gautam Kamath, Sewoong Oh, Olga Ohrimenko, Nicolas Papernot, Ryan Rogers, Milan Shen, Shuang Song, Weijie Su, Andreas Terzis, Abhradeep Thakurta, Sergei Vassilvitskii, Yu-Xiang Wang, Li Xiong, Sergey Yekhanin, Da Yu, Huanyu Zhang, Wanrong Zhang

In this article, we present a detailed review of current practices and state-of-the-art methodologies in the field of differential privacy (DP), with a focus on advancing DP's deployment in real-world applications. Key points and high-level contents of the article originated from the discussions at "Differential Privacy (DP): Challenges Towards the Next Frontier," a workshop held in July 2022 with experts from industry, academia, and the public sector seeking answers to broad questions pertaining to privacy and its implications in the design of industry-grade systems. This article aims to provide a reference point for the algorithmic and design decisions within the realm of privacy, highlighting important challenges and potential research directions. Covering a wide spectrum of topics, this article delves into the infrastructure needs for designing private systems, methods for achieving better privacy/utility trade-offs, performing privacy attacks and auditing, as well as communicating privacy with broader audiences and stakeholders.

Multimodal · MoDELS · Performer · Integration · Language modeling

February 19, 2024

The (R)Evolution of Multimodal Large Language Models: A Survey

Davide Caffagni, Federico Cocchi, Luca Barsellotti, Nicholas Moratelli, Sara Sarto, Lorenzo Baraldi, Lorenzo Baraldi, Marcella Cornia, Rita Cucchiara

Connecting text and visual modalities plays an essential role in generative intelligence. For this reason, inspired by the success of large language models, significant research efforts are being devoted to the development of Multimodal Large Language Models (MLLMs). These models can seamlessly integrate visual and textual modalities, both as input and output, while providing a dialogue-based interface and instruction-following capabilities. In this paper, we provide a comprehensive review of recent visual-based MLLMs, analyzing their architectural choices, multimodal alignment strategies, and training techniques. We also conduct a detailed analysis of these models across a wide range of tasks, including visual grounding, image generation and editing, visual understanding, and domain-specific applications. Additionally, we compile and describe training datasets and evaluation benchmarks, conducting comparisons among existing models in terms of performance and computational requirements. Overall, this survey offers a comprehensive overview of the current state of the art, laying the groundwork for future MLLMs.

Vision · Model evaluation · Reducible · Computer vision · DNN

March 24, 2020

A Survey of Methods for Low-Power Deep Learning and Computer Vision

Abhinav Goel, Caleb Tung, Yung-Hsiang Lu, George K. Thiruvathukal

from arxiv, Accepted for publication at 2020 IEEE 6th World Forum on Internet of Things (WF-IoT), New Orleans, LA, USA 2020

Deep neural networks (DNNs) are successful in many computer vision tasks. However, the most accurate DNNs require millions of parameters and operations, making them energy, computation and memory intensive. This impedes the deployment of large DNNs in low-power devices with limited compute resources. Recent research improves DNN models by reducing the memory requirement, energy consumption, and number of operations without significantly decreasing the accuracy. This paper surveys the progress of low-power deep learning and computer vision, specifically with regard to inference, and discusses the methods for compacting and accelerating DNN models. The techniques can be divided into four major categories: (1) parameter quantization and pruning, (2) compressed convolutional filters and matrix factorization, (3) network architecture search, and (4) knowledge distillation. We analyze the accuracy, advantages, disadvantages, and potential solutions to the problems with the techniques in each category. We also discuss new evaluation metrics as a guideline for future research.
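As an illustration of the first category (parameter quantization), here is a minimal sketch of symmetric uniform 8-bit quantization applied to a list of weights. It is one simple scheme chosen for illustration, not a method taken from the survey; the function names `quantize_int8` and `dequantize` are hypothetical.

```python
def quantize_int8(weights):
    """Map floats to integer codes in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    # Guard against an all-zero tensor, where any scale works.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float weights from the integer codes."""
    return [c * scale for c in codes]


if __name__ == "__main__":
    weights = [-0.82, 0.0, 0.13, 0.4, 0.82]
    codes, scale = quantize_int8(weights)
    restored = dequantize(codes, scale)
    worst = max(abs(w - r) for w, r in zip(weights, restored))
    print(codes, scale, worst)
```

Each weight is stored in one byte instead of four, and the rounding error per weight is bounded by half the scale, which is the kind of memory/accuracy trade-off the survey analyzes.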

Optimizer · Extensibility · Optimization · Automator · Neural Networks

March 12, 2020

Hyper-Parameter Optimization: A Review of Algorithms and Applications

Tong Yu, Hong Zhu

Since deep neural networks were developed, they have made huge contributions to everyday lives. Machine learning provides more rational advice than humans are capable of in almost every aspect of daily life. However, despite this achievement, the design and training of neural networks are still challenging and unpredictable procedures. To lower the technical thresholds for common users, automated hyper-parameter optimization (HPO) has become a popular topic in both academic and industrial areas. This paper provides a review of the most essential topics on HPO. The first section introduces the key hyper-parameters related to model training and structure, and discusses their importance and methods to define the value range. Then, the research focuses on major optimization algorithms and their applicability, covering their efficiency and accuracy especially for deep learning networks. This study next reviews major services and toolkits for HPO, comparing their support for state-of-the-art searching algorithms, feasibility with major deep learning frameworks, and extensibility for new modules designed by users. The paper concludes with problems that exist when HPO is applied to deep learning, a comparison between optimization algorithms, and prominent approaches for model evaluation with limited computational resources.
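For readers unfamiliar with HPO, the sketch below shows random search, one of the simplest optimization algorithms in this family. It is an illustration under stated assumptions, not code from the review: `toy_loss` is a hypothetical stand-in for a validation loss, and the search space (a log learning rate and a dropout rate) is chosen only as an example.

```python
import random


def random_search(objective, space, n_trials=50, seed=0):
    """Sample each hyper-parameter uniformly from its range; keep the best trial."""
    rng = random.Random(seed)
    best_params, best_score = None, float("inf")
    for _ in range(n_trials):
        params = {name: rng.uniform(lo, hi) for name, (lo, hi) in space.items()}
        score = objective(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_loss(params):
    """Hypothetical validation loss with its minimum at log10_lr=-3, dropout=0.2."""
    return (params["log10_lr"] + 3.0) ** 2 + (params["dropout"] - 0.2) ** 2


if __name__ == "__main__":
    space = {"log10_lr": (-5.0, -1.0), "dropout": (0.0, 0.5)}
    best, score = random_search(toy_loss, space, n_trials=200)
    print(best, score)
```

In practice each call to the objective would train and evaluate a model, which is why the number of trials, and therefore the efficiency of the search algorithm, matters so much.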