
Most TextVQA approaches integrate objects, scene texts and question words with a simple transformer encoder, which fails to capture the semantic relations between different modalities. This paper proposes a Scene Graph based co-Attention Network (SceneGATE) for TextVQA, which reveals the semantic relations among the objects, Optical Character Recognition (OCR) tokens and the question words. This is achieved by a TextVQA-based scene graph that discovers the underlying semantics of an image. We create a guided-attention module to capture the intra-modal interplay between language and vision as guidance for inter-modal interactions. To teach the relations between the two modalities explicitly, we propose and integrate two attention modules: a scene graph-based semantic relation-aware attention and a positional relation-aware attention. We conduct extensive experiments on two benchmark datasets, Text-VQA and ST-VQA. The results show that SceneGATE outperforms existing methods, which we attribute to the scene graph and its attention modules.

Related content


AI · Less · TOOLS · Processing (programming language) · Operation

September 26, 2023

APPRAISE: a framework for managing AI compliance

Diptish Dey,Debarati Bhaumik

As AI systems increasingly impact society, the EU AI Act (AIA) is the first serious attempt to contain their less desired effects. Among other measures, the act proposes audit as a mechanism and compliance products as tools for organizations to demonstrate compliance. In this paper, a framework for managing AI compliance, APPRAISE, is proposed. The framework is built upon the rationale that striking a balance between generating shareholder value through innovation in AI systems and managing compliance through organizational processes will eventually result in value that is responsible. By adhering to AIA compliance products, the framework operationalizes and hence safeguards compliance. Furthermore, a two-phase experiment with a limited scope is presented. The experiment aims to measure the extent to which companies coordinate technical elements of AI systems to ultimately comply with the AIA. In the first phase a survey is conducted, and in the second phase the survey results are validated with a couple of respondents to generate additional in-depth insights and root causes.

Latent · Support vector · Ground truth · Support vector machine · Model evaluation

September 26, 2023

Seafloor Classification based on an AUV Based Sub-bottom Acoustic Probe Data for Mn-crust survey

Umesh Neettiyath,Harumi Sugimatsu,Blair Thornton

from arxiv, This is a pre-print of the manuscript accepted for publication at IEEE/MTS Oceans Conference in GulfCoast, 2023. The final paper will be published in IEEE Xplore after the conference [2023/09/25]

The possibility of automatically classifying high frequency sub-bottom acoustic reflections collected from an Autonomous Underwater Robot is investigated in this paper. In field surveys of Cobalt-rich Manganese Crusts (Mn-crusts), existing methods rely on visual confirmation of the seafloor from images and thickness measurements using the sub-bottom probe. Using these visual classification results as ground truth, an autoencoder is trained to extract latent features from bundled acoustic reflections. A Support Vector Machine classifier is then trained to classify the latent space to identify seafloor classes. Results from data collected from the seafloor at 1500m deep regions of Mn-crust showed an accuracy of about 70%.

Sample · Optimizer · Identifiable · False negative · False positive

September 25, 2023

Asymptotically optimal sequential anomaly identification with ordering sampling rules

Aristomenis Tsopelakos,Georgios Fellouris

The problem of sequential anomaly detection and identification is considered in the presence of a sampling constraint. Specifically, multiple data streams are generated by distinct sources and the goal is to quickly identify those that exhibit ``anomalous'' behavior, when it is not possible to sample every source at each time instant. Thus, in addition to a stopping rule, which determines when to stop sampling, and a decision rule, which indicates which sources to identify as anomalous upon stopping, one needs to specify a sampling rule that determines which sources to sample at each time instant. The focus of this work is on ordering sampling rules, which sample the data sources, among those currently estimated as anomalous (resp. non-anomalous), for which the corresponding local test statistics have the smallest (resp. largest) values. It is shown that with an appropriate design, which is specified explicitly, an ordering sampling rule leads to the optimal expected time for stopping, among all policies that satisfy the same sampling and error constraints, to a first-order asymptotic approximation as the false positive and false negative error rates under control both go to zero. This is the first asymptotic optimality result for ordering sampling rules when multiple sources can be sampled per time instant. Moreover, this is established under a general setup where the number of anomalies is not required to be a priori known. A novel proof technique is introduced, which unifies different versions of the problem regarding the homogeneity of the sources and prior information on the number of anomalies.
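The selection step of an ordering sampling rule, as described above, can be sketched in a few lines. This is only an illustration under assumptions that are not in the abstract: the function name `ordering_sample`, the use of a fixed threshold on the local statistics to decide which sources are currently estimated as anomalous, and the two count parameters are all hypothetical. The paper specifies its own design for how many sources to draw from each group.

```python
def ordering_sample(stats, n_from_anomalous, n_from_normal, threshold=0.0):
    """Pick which sources to sample next (illustrative sketch only).

    stats: current local test statistic of each source.
    A source is treated as "currently estimated anomalous" when its
    statistic exceeds `threshold` (an assumption made for this sketch).
    """
    anomalous = [i for i, s in enumerate(stats) if s > threshold]
    normal = [i for i, s in enumerate(stats) if s <= threshold]

    # Among the estimated-anomalous sources, take the smallest statistics.
    pick_a = sorted(anomalous, key=lambda i: stats[i])[:n_from_anomalous]
    # Among the estimated-non-anomalous sources, take the largest statistics.
    pick_n = sorted(normal, key=lambda i: stats[i], reverse=True)[:n_from_normal]

    return sorted(pick_a + pick_n)


# Sources 0, 1 and 4 look anomalous; source 1 is the least certain of them.
# Sources 2 and 3 look normal; source 2 is the least certain of them.
print(ordering_sample([2.5, 0.3, -0.2, -4.0, 1.1], 1, 1))  # [1, 2]
```

In both groups the rule picks the sources whose statistics lie closest to the decision boundary, that is, the ones whose current classification is least settled.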

3D · Mask · Point cloud · Autoencoder · Performer

September 25, 2023

Joint-MAE: 2D-3D Joint Masked Autoencoders for 3D Point Cloud Pre-training

Ziyu Guo,Renrui Zhang,Longtian Qiu,Xianzhi Li,Pheng-Ann Heng

from arxiv, Accepted by IJCAI 2023

Masked Autoencoders (MAE) have shown promising performance in self-supervised learning for both 2D and 3D computer vision. However, existing MAE-style methods can only learn from the data of a single modality, i.e., either images or point clouds, which neglects the implicit semantic and geometric correlation between 2D and 3D. In this paper, we explore how the 2D modality can benefit 3D masked autoencoding, and propose Joint-MAE, a 2D-3D joint MAE framework for self-supervised 3D point cloud pre-training. Joint-MAE randomly masks an input 3D point cloud and its projected 2D images, and then reconstructs the masked information of the two modalities. For better cross-modal interaction, we construct Joint-MAE from two hierarchical 2D-3D embedding modules, a joint encoder, and a joint decoder with modal-shared and modal-specific decoders. On top of this, we further introduce two cross-modal strategies to boost the 3D representation learning: local-aligned attention mechanisms for 2D-3D semantic cues, and a cross-reconstruction loss for 2D-3D geometric constraints. With our pre-training paradigm, Joint-MAE achieves superior performance on multiple downstream tasks, e.g., 92.4% accuracy for linear SVM on ModelNet40 and 86.07% accuracy on the hardest split of ScanObjectNN.

Networking · Data acquisition · Baseline · Stream · Layer

September 24, 2023

Scalable data concentrator with baseline interconnection network for triggerless data acquisition systems

Wojciech M. Zabołotny

Triggerless Data Acquisition Systems (DAQs) require transmitting the data stream from multiple links to the processing node. The short input data words must be concentrated and packed into the longer bit vectors the output interface (e.g. PCI Express) uses. In that process, the unneeded data must be eliminated, and a dense stream of useful DAQ data must be created. Additionally, the time order of the data should be preserved. This paper presents a new solution using the Baseline Network with Reversed Outputs (BNRO) for high-speed data routing. A thorough analysis of the network operation enabled increased scalability compared to the previously published concentrator based on an 8x8 network. The presented solution may be scaled by adding additional layers to the BNRO network while minimizing resource consumption. Simulations were done for 4 and 5 layers (16 and 32 inputs). The FPGA synthesis has been performed for 16 inputs. The pipeline registers may be added in each network independently, shortening the critical path and increasing the maximum acceptable clock frequency.
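The required input/output behavior of such a concentrator (drop empty slots, keep time order, pack short words into wider output vectors) can be modeled in software as below. This is a behavioral sketch only and does not model the BNRO routing network or any hardware timing. The function name `concentrate`, the use of `None` for an empty slot, and the convention that lower link indices come first within a cycle are assumptions made for illustration.

```python
def concentrate(cycles, words_per_output):
    """Behavioral model of a data concentrator (illustrative sketch).

    cycles: one list per clock cycle, one slot per input link;
            None marks a link with no useful data in that cycle.
    words_per_output: how many short words fit in one wide output vector.
    Returns (packed_vectors, leftover_words).
    """
    # Drop empty slots while preserving order: earlier cycles first,
    # and lower link index first within a cycle (assumed convention).
    dense = [w for cycle in cycles for w in cycle if w is not None]

    # Emit only completely filled output vectors; keep the remainder
    # so it can be combined with data from later cycles.
    full = len(dense) - len(dense) % words_per_output
    packed = [dense[i:i + words_per_output]
              for i in range(0, full, words_per_output)]
    return packed, dense[full:]


cycles = [[1, None, 2, None],
          [None, None, 3, 4],
          [5, None, None, None]]
print(concentrate(cycles, 2))  # ([[1, 2], [3, 4]], [5])
```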

Precision/Accuracy · SimPLe · Online · Robot · Identifiable

September 23, 2023

AgriSORT: A Simple Online Real-time Tracking-by-Detection framework for robotics in precision agriculture

Leonardo Saraceni,Ionut M. Motoi,Daniele Nardi,Thomas A. Ciarfuglia

from arxiv, 8 pages, 5 figures, submitted to International Conference on Robotics and Automation (ICRA) 2024. Code and dataset will be soon available on my github, after the acceptance to the conference

The problem of multi-object tracking (MOT) consists in detecting and tracking all the objects in a video sequence while keeping a unique identifier for each object. It is a challenging and fundamental problem for robotics. In precision agriculture the challenge of achieving a satisfactory solution is amplified by extreme camera motion, sudden illumination changes, and strong occlusions. Most modern trackers rely on the appearance of objects rather than motion for association, which can be ineffective when most targets are static objects with the same appearance, as in the agricultural case. To this end, on the trail of SORT [5], we propose AgriSORT, a simple, online, real-time tracking-by-detection pipeline for precision agriculture based only on motion information that allows for accurate and fast propagation of tracks between frames. The main focuses of AgriSORT are efficiency, flexibility, minimal dependencies, and ease of deployment on robotic platforms. We test the proposed pipeline on a novel MOT benchmark specifically tailored for the agricultural context, based on video sequences taken in a table grape vineyard, particularly challenging due to strong self-similarity and density of the instances. Both the code and the dataset are available for future comparisons.
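To make the idea of appearance-free association concrete, the sketch below matches existing tracks to new detections using box overlap (IoU) alone. It is a generic illustration of a tracking-by-detection association step, not AgriSORT's actual pipeline, which the abstract does not detail. The greedy matching strategy, the `min_iou` threshold and the function names are assumptions. Boxes are `(x1, y1, x2, y2)` tuples.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def associate(tracks, detections, min_iou=0.3):
    """Greedily pair track boxes with detection boxes by descending IoU.

    Returns a dict {track_index: detection_index}. Unmatched tracks and
    detections are simply absent from the result.
    """
    pairs = sorted(
        ((iou(t, d), ti, di)
         for ti, t in enumerate(tracks)
         for di, d in enumerate(detections)),
        reverse=True,
    )
    matches, used_tracks, used_dets = {}, set(), set()
    for score, ti, di in pairs:
        if score < min_iou:
            break  # all remaining pairs overlap even less
        if ti in used_tracks or di in used_dets:
            continue
        matches[ti] = di
        used_tracks.add(ti)
        used_dets.add(di)
    return matches


tracks = [(0, 0, 10, 10), (20, 20, 30, 30)]
detections = [(21, 21, 31, 31), (1, 1, 11, 11), (100, 100, 110, 110)]
print(associate(tracks, detections))  # {0: 1, 1: 0}
```

In a full tracker the track boxes would first be propagated to the current frame by a motion model before this matching step. That propagation is omitted here.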

Analysis · Processing (programming language) · ML · Underestimation · Medical Image Analysis

September 22, 2023

Metrics reloaded: Recommendations for image analysis validation

Lena Maier-Hein,Annika Reinke,Patrick Godau,Minu D. Tizabi,Florian Buettner,Evangelia Christodoulou,Ben Glocker,Fabian Isensee,Jens Kleesiek,Michal Kozubek,Mauricio Reyes,Michael A. Riegler,Manuel Wiesenfarth,A. Emre Kavur,Carole H. Sudre,Michael Baumgartner,Matthias Eisenmann,Doreen Heckmann-Nötzel,A. Tim Rädsch,Laura Acion,Michela Antonelli,Tal Arbel,Spyridon Bakas,Arriel Benis,Matthew Blaschko,M. Jorge Cardoso,Veronika Cheplygina,Beth A. Cimini,Gary S. Collins,Keyvan Farahani,Luciana Ferrer,Adrian Galdran,Bram van Ginneken,Robert Haase,Daniel A. Hashimoto,Michael M. Hoffman,Merel Huisman,Pierre Jannin,Charles E. Kahn,Dagmar Kainmueller,Bernhard Kainz,Alexandros Karargyris,Alan Karthikesalingam,Hannes Kenngott,Florian Kofler,Annette Kopp-Schneider,Anna Kreshuk,Tahsin Kurc,Bennett A. Landman,Geert Litjens,Amin Madani,Klaus Maier-Hein,Anne L. Martel,Peter Mattson,Erik Meijering,Bjoern Menze,Karel G. M. Moons,Henning Müller,Brennan Nichyporuk,Felix Nickel,Jens Petersen,Nasir Rajpoot,Nicola Rieke,Julio Saez-Rodriguez,Clara I. Sánchez,Shravya Shetty,Maarten van Smeden,Ronald M. Summers,Abdel A. Taha,Aleksei Tiulpin,Sotirios A. Tsaftaris,Ben Van Calster,Gaël Varoquaux,Paul F. Jäger

from arxiv, Shared first authors: Lena Maier-Hein, Annika Reinke. arXiv admin note: substantial text overlap with arXiv:2104.05642

Increasing evidence shows that flaws in machine learning (ML) algorithm validation are an underestimated global problem. Particularly in automatic biomedical image analysis, chosen performance metrics often do not reflect the domain interest, thus failing to adequately measure scientific progress and hindering translation of ML techniques into practice. To overcome this, our large international expert consortium created Metrics Reloaded, a comprehensive framework guiding researchers in the problem-aware selection of metrics. Following the convergence of ML methodology across application domains, Metrics Reloaded fosters the convergence of validation methodology. The framework was developed in a multi-stage Delphi process and is based on the novel concept of a problem fingerprint - a structured representation of the given problem that captures all aspects that are relevant for metric selection, from the domain interest to the properties of the target structure(s), data set and algorithm output. Based on the problem fingerprint, users are guided through the process of choosing and applying appropriate validation metrics while being made aware of potential pitfalls. Metrics Reloaded targets image analysis problems that can be interpreted as a classification task at image, object or pixel level, namely image-level classification, object detection, semantic segmentation, and instance segmentation tasks. To improve the user experience, we implemented the framework in the Metrics Reloaded online tool, which also provides a point of access to explore weaknesses, strengths and specific recommendations for the most common validation metrics. The broad applicability of our framework across domains is demonstrated by an instantiation for various biological and medical image analysis use cases.

GPT-4 · Code · Language modeling · OpenAI · GPT3.5

September 22, 2023

OpenAi's GPT4 as coding assistant

Lefteris Moussiades,George Zografos

from arxiv, 10 pages

Lately, Large Language Models have been widely used in code generation. GPT4 is considered the most potent Large Language Model from OpenAI. In this paper, we examine GPT3.5 and GPT4 as coding assistants. More specifically, we have constructed appropriate tests to check whether the two systems can a) answer typical questions that can arise during code development, b) produce reliable code, and c) contribute to code debugging. The test results are impressive. The performance of GPT4 is outstanding and signals an increase in the productivity of programmers and the reorganization of software development procedures based on these new tools.

Microsoft Surface · Networking · Transform · CNN · Reducible

September 22, 2023

CINFormer: Transformer network with multi-stage CNN feature injection for surface defect segmentation

Xiaoheng Jiang,Kaiyi Guo,Yang Lu,Feng Yan,Hao Liu,Jiale Cao,Mingliang Xu,Dacheng Tao

Surface defect inspection is of great importance for industrial manufacture and production. Though defect inspection methods based on deep learning have made significant progress, these methods still face challenges such as indistinguishable weak defects and defect-like interference in the background. To address these issues, we propose a transformer network with multi-stage CNN (Convolutional Neural Network) feature injection for surface defect segmentation, which is a UNet-like structure named CINFormer. CINFormer presents a simple yet effective feature integration mechanism that injects the multi-level CNN features of the input image into different stages of the transformer network in the encoder. This retains both the CNN's strength in capturing detailed features and the transformer's strength in suppressing background noise, which facilitates accurate defect detection. In addition, CINFormer presents a Top-K self-attention module to focus on tokens with more important information about the defects, so as to further reduce the impact of the redundant background. Extensive experiments conducted on the surface defect datasets DAGM 2007, Magnetic tile, and NEU show that the proposed CINFormer achieves state-of-the-art performance in defect detection.
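One common way to realize a top-k attention step is to keep, for each query, only the k highest-scoring keys and apply the softmax over those alone. The sketch below shows that generic idea in plain Python. It is an assumption-laden illustration and not CINFormer's exact module, whose details are not given in the abstract. The function name, the scaled dot-product scoring, and the single-head, projection-free form are all choices made here for brevity.

```python
import math


def top_k_attention(queries, keys, values, k):
    """Single-head attention where each query attends to its k best keys.

    queries, keys: lists of equal-length vectors; values: one vector per key.
    Keys outside the top k receive zero weight (illustrative sketch).
    """
    dim = len(queries[0])
    out_dim = len(values[0])
    outputs = []
    for q in queries:
        # Scaled dot-product score of this query against every key.
        scores = [sum(a * b for a, b in zip(q, key)) / math.sqrt(dim)
                  for key in keys]
        # Indices of the k highest-scoring keys.
        keep = sorted(range(len(scores)), key=lambda j: scores[j],
                      reverse=True)[:k]
        # Numerically stable softmax restricted to the kept keys.
        peak = max(scores[j] for j in keep)
        weights = {j: math.exp(scores[j] - peak) for j in keep}
        total = sum(weights.values())
        outputs.append([
            sum(weights[j] / total * values[j][d] for j in keep)
            for d in range(out_dim)
        ])
    return outputs


queries = [[1.0, 0.0]]
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
# With k=1 the query simply copies the value of its best-matching key.
print(top_k_attention(queries, keys, values, k=1))  # [[10.0, 0.0]]
```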

Image classification · Feedforward network · INTERACT · Networking · Feedforward

May 7, 2021

ResMLP: Feedforward networks for image classification with data-efficient training

Hugo Touvron,Piotr Bojanowski,Mathilde Caron,Matthieu Cord,Alaaeldin El-Nouby,Edouard Grave,Armand Joulin,Gabriel Synnaeve,Jakob Verbeek,Hervé Jégou

We present ResMLP, an architecture built entirely upon multi-layer perceptrons for image classification. It is a simple residual network that alternates (i) a linear layer in which image patches interact, independently and identically across channels, and (ii) a two-layer feed-forward network in which channels interact independently per patch. When trained with a modern training strategy using heavy data-augmentation and optionally distillation, it attains surprisingly good accuracy/complexity trade-offs on ImageNet. We will share our code based on the Timm library and pre-trained models.
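The two alternating sublayers described above can be sketched with plain nested lists, where `x` holds one row per patch and one column per channel. This is a simplified illustration: it omits any normalization or scaling layers, uses ReLU as a stand-in activation, and the function and parameter names are assumptions rather than the released code's API.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def add(a, b):
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def resmlp_block(x, w_patch, w1, w2):
    """One residual block on x of shape (patches, channels); sketch only.

    w_patch: (patches, patches)  mixes patches, same weights for every channel
    w1:      (channels, hidden)  first layer of the per-patch MLP
    w2:      (hidden, channels)  second layer of the per-patch MLP
    """
    # (i) Cross-patch linear layer: left-multiplying by w_patch mixes rows
    # (patches) and treats every channel column identically.
    x = add(x, matmul(w_patch, x))

    # (ii) Two-layer feed-forward network applied to each patch row
    # independently, mixing channels. ReLU is an assumed activation.
    hidden = [[max(0.0, v) for v in row] for row in matmul(x, w1)]
    return add(x, matmul(hidden, w2))


x = [[1.0, 2.0], [3.0, 4.0]]                  # 2 patches, 2 channels
swap = [[0.0, 1.0], [1.0, 0.0]]               # each patch reads the other
zeros_w1 = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
zeros_w2 = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
print(resmlp_block(x, swap, zeros_w1, zeros_w2))  # [[4.0, 6.0], [4.0, 6.0]]
```

With all weights set to zero the block reduces to the identity, which is the effect of the residual connections around both sublayers.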