
The BDD package Adiar manipulates Binary Decision Diagrams (BDDs) in external memory. This enables handling big BDDs, but performance suffers when dealing with moderate-sized BDDs. This is mostly due to initializing expensive external memory data structures, even when their contents can fit entirely inside internal memory. The contents of these auxiliary data structures always correspond to a graph cut in an input or output BDD. Specifically, these cuts respect the levels of the BDD. We formalise the shape of these cuts and prove sound upper bounds on their maximum size for each BDD operation. We have implemented these upper bounds within Adiar. With these bounds, it can predict whether a faster internal memory variant of the auxiliary data structures can be used. In practice, this improves Adiar's running time across the board. Specifically for moderate-sized BDDs, this results in an average reduction of the computation time by 86.1% (median of 89.7%); in some cases, the reduction is as large as 99.9%. When checking equivalence of hardware circuits from the EPFL Benchmark Suite, this decreased the time for one of the instances by 52 hours.
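To make the idea concrete, here is a minimal sketch (in Python, since Adiar's actual C++ API is not reproduced here) of how a sound upper bound on the maximum level cut could drive the choice between internal and external memory data structures. The bound and all names below are illustrative; the paper's per-operation bounds are tighter.

```python
# Illustrative sketch, not Adiar's implementation: choose between an
# internal- and an external-memory priority queue from a sound (but loose)
# upper bound on the largest level cut of the input BDD.

def max_cut_upper_bound(nodes_per_level):
    """A trivially sound bound: arcs crossing below level i are at most
    twice the number of nodes at or above level i (two out-arcs per node)."""
    bound, seen = 0, 0
    for n in nodes_per_level:
        seen += n
        bound = max(bound, 2 * seen)
    return bound

def pick_priority_queue(nodes_per_level, internal_memory_bytes, bytes_per_elem=16):
    capacity = internal_memory_bytes // bytes_per_elem
    if max_cut_upper_bound(nodes_per_level) <= capacity:
        return "internal"   # contents provably fit: use the fast in-memory variant
    return "external"       # fall back to the external-memory variant

# Example: a small BDD with 1, 2, 4, and 3 nodes on its four levels.
print(pick_priority_queue([1, 2, 4, 3], internal_memory_bytes=1 << 20))
```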

Related content

We present EMDB, the Electromagnetic Database of Global 3D Human Pose and Shape in the Wild. EMDB is a novel dataset that contains high-quality 3D SMPL pose and shape parameters with global body and camera trajectories for in-the-wild videos. We use body-worn, wireless electromagnetic (EM) sensors and a hand-held iPhone to record a total of 58 minutes of motion data, distributed over 81 indoor and outdoor sequences and 10 participants. Together with accurate body poses and shapes, we also provide global camera poses and body root trajectories. To construct EMDB, we propose a multi-stage optimization procedure, which first fits SMPL to the 6-DoF EM measurements and then refines the poses via image observations. To achieve high-quality results, we leverage a neural implicit avatar model to reconstruct detailed human surface geometry and appearance, which allows for improved alignment and smoothness via a dense pixel-level objective. Our evaluations, conducted with a multi-view volumetric capture system, indicate that EMDB has an expected accuracy of 2.3 cm positional and 10.6 degrees angular error, surpassing the accuracy of previous in-the-wild datasets. We evaluate existing state-of-the-art monocular RGB methods for camera-relative and global pose estimation on EMDB. EMDB is publicly available at //ait.ethz.ch/emdb
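As a hedged illustration of the two accuracy figures quoted above, the following snippet computes a positional error in centimetres and a geodesic angular error in degrees between a predicted and a ground-truth pose; this is not EMDB's evaluation code.

```python
# Sketch of the two error metrics (positional and angular), assuming
# 3x3 rotation matrices and positions in meters.
import numpy as np

def positional_error_cm(p_pred, p_gt):
    return 100.0 * np.linalg.norm(p_pred - p_gt)

def angular_error_deg(R_pred, R_gt):
    # Geodesic distance on SO(3): angle of the relative rotation R_pred^T R_gt.
    cos = (np.trace(R_pred.T @ R_gt) - 1.0) / 2.0
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

# Example: a 5-degree rotation about z and a 2 cm offset.
theta = np.radians(5.0)
Rz = np.array([[np.cos(theta), -np.sin(theta), 0],
               [np.sin(theta),  np.cos(theta), 0],
               [0, 0, 1]])
print(positional_error_cm(np.array([0.02, 0, 0]), np.zeros(3)))  # 2.0
print(angular_error_deg(Rz, np.eye(3)))                          # ~5.0
```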

Generating 3D faces from textual descriptions has a multitude of applications, such as gaming, film, and robotics. Recent progress has demonstrated the success of unconditional 3D face generation and text-to-3D shape generation. However, due to the limited number of text-3D face data pairs, text-driven 3D face generation remains an open problem. In this paper, we propose a text-guided 3D face generation method, referred to as TG-3DFace, for generating realistic 3D faces using text guidance. Specifically, we adopt an unconditional 3D face generation framework and equip it with text conditions, which learns text-guided 3D face generation from text-2D face data alone. On top of that, we propose two text-to-face cross-modal alignment techniques, a global contrastive learning scheme and a fine-grained alignment module, to facilitate high semantic consistency between generated 3D faces and input texts. In addition, we apply directional classifier guidance during inference, which encourages creativity for out-of-domain generations. Compared to existing methods, TG-3DFace creates more realistic and aesthetically pleasing 3D faces, improving multi-view consistency (MVIC) by 9% over Latent3D. The rendered face images generated by TG-3DFace achieve higher FID and CLIP scores than text-to-2D face/image generation models, demonstrating our superiority in generating realistic and semantically consistent textures.
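The global contrastive learning component can be pictured as a symmetric InfoNCE objective over paired face and text embeddings, as sketched below; the paper's exact loss and its fine-grained alignment module are not reproduced, and the temperature value is an assumption.

```python
# Hedged sketch of a global text-to-face contrastive loss (CLIP-style
# symmetric InfoNCE); embedding dimensions and temperature are illustrative.
import torch
import torch.nn.functional as F

def contrastive_loss(face_emb, text_emb, temperature=0.07):
    face_emb = F.normalize(face_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = face_emb @ text_emb.t() / temperature   # (B, B) similarity matrix
    targets = torch.arange(face_emb.size(0))         # matched pairs on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Example with a random batch of 8 paired 512-d embeddings.
print(contrastive_loss(torch.randn(8, 512), torch.randn(8, 512)).item())
```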

We present 'CongNaMul', a comprehensive dataset designed for various tasks in soybean sprout image analysis. The CongNaMul dataset is curated to facilitate tasks such as image classification, semantic segmentation, decomposition, and measurement of length and weight. The classification task provides four classes to determine the quality of soybean sprouts: normal, broken, spotted, and broken and spotted, for the development of AI-aided automatic quality inspection technology. For semantic segmentation, images with varying complexity, from single-sprout images to images with multiple sprouts, are included along with human-labelled mask images. The labels comprise four classes: background, head, body, and tail. The dataset also provides images and masks for the image decomposition task, including two separate sprout images and their combined form. Lastly, five physical features of sprouts (head length, body length, body thickness, tail length, and weight) are provided for image-based measurement tasks. This dataset is expected to be a valuable resource for a wide range of research and applications in the advanced analysis of soybean sprout images. We also hope that this dataset can assist researchers studying classification, semantic segmentation, decomposition, and physical feature measurement in other industrial fields in evaluating their models. The dataset is available at the authors' repository. (//bhban.kr/data)
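Since the four quality classes are combinations of two binary attributes (broken, spotted), labels can be handled as a simple product, as in the illustrative mapping below; the mapping is inferred from the class names, not from the dataset's official label files.

```python
# Hypothetical helper: derive the four-way quality class from two binary flags.
def quality_class(broken: bool, spotted: bool) -> str:
    if broken and spotted:
        return "broken and spotted"
    if broken:
        return "broken"
    if spotted:
        return "spotted"
    return "normal"

print(quality_class(broken=False, spotted=True))  # "spotted"
```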

The European Union's Artificial Intelligence Act aims to regulate manipulative and harmful uses of AI, but lacks precise definitions for key concepts. This paper provides technical recommendations to improve the Act's conceptual clarity and enforceability. We review psychological models to define "personality traits," arguing the Act should protect full "psychometric profiles." We urge expanding "behavior" to include "preferences," since preferences causally influence and are influenced by behavior. Clear definitions are provided for "subliminal," "manipulative," and "deceptive" techniques, considering incentives, intent, and covertness. We distinguish "exploiting individuals" from "exploiting groups," emphasising their different policy needs. An "informed decision" is defined by four facets: comprehension, accurate information, no manipulation, and understanding AI's influence. We caution against the Act's therapeutic-use exemption, given the lack of regulation of digital therapeutics by the EMA. Overall, the recommendations strengthen the definitions of vague concepts in the EU AI Act, enhancing its precise applicability in regulating harmful AI manipulation.

This paper presents a new dataset and a general tracker enhancement method for Underwater Visual Object Tracking (UVOT). Despite its significance, underwater tracking has remained largely unexplored due to data inaccessibility. It poses distinct challenges: the underwater environment exhibits non-uniform lighting conditions, low visibility, lack of sharpness, low contrast, camouflage, and reflections from suspended particles. The performance of traditional tracking methods, designed primarily for terrestrial or open-air scenarios, drops in such conditions. We address the problem by proposing a novel underwater image enhancement algorithm designed specifically to boost tracking quality. The method has resulted in a significant performance improvement, of up to 5.0% AUC, for state-of-the-art (SOTA) visual trackers. To develop robust and accurate UVOT methods, large-scale datasets are required. To this end, we introduce a large-scale UVOT benchmark dataset consisting of 400 video segments and 275,000 manually annotated frames, enabling underwater training and evaluation of deep trackers. The videos are labelled with several underwater-specific tracking attributes, including watercolor variation, target distractors, camouflage, target relative size, and low visibility conditions. The UVOT400 dataset, tracking results, and the code are publicly available at //github.com/BasitAlawode/UWVOT400.
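The paper's enhancement algorithm is tracker-oriented and not reproduced here; as a generic point of reference, the sketch below applies a gray-world white balance and a percentile contrast stretch, two common baselines against underwater colour cast and low contrast.

```python
# Generic underwater-style enhancement baseline (illustrative only,
# not the paper's algorithm). img: float32 RGB in [0, 1], shape HxWx3.
import numpy as np

def enhance(img):
    means = img.reshape(-1, 3).mean(axis=0)
    img = img * (means.mean() / (means + 1e-6))  # gray-world white balance
    lo, hi = np.percentile(img, (1, 99))
    return np.clip((img - lo) / (hi - lo + 1e-6), 0.0, 1.0)  # contrast stretch

print(enhance(np.random.rand(4, 4, 3).astype(np.float32)).shape)
```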

Panoramic radiography (panoramic X-ray, PX) is a widely used imaging modality for dental examination. However, PX only provides a flattened 2D image, lacking a 3D view of the oral structure. In this paper, we propose a framework to estimate 3D oral structures from real-world PX. Our framework tackles full 3D reconstruction for varying subjects (patients), where each reconstruction is based only on a single panoramic image. We create an intermediate representation called simulated PX (SimPX) from 3D cone-beam computed tomography (CBCT) data, based on the Beer-Lambert law of X-ray rendering and the rotational principles of PX imaging. SimPX aims not only at truthfully simulating PX, but also at facilitating the reverting process back to 3D data. We propose a novel neural model based on ray tracing which exploits both global and local input features to convert SimPX to 3D output. At inference, a real PX image is translated to a SimPX-style image with semantic regularization, and the translated image is processed by the generation module to produce high-quality outputs. Experiments show that our method outperforms the prior state of the art in reconstruction tasks both quantitatively and qualitatively. Unlike prior methods, our method does not require any prior information, such as the shape of the dental arches, nor a matched PX-CBCT dataset for training, which is difficult to obtain in clinical practice.
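The Beer-Lambert law underlying SimPX states that X-ray intensity decays exponentially with the accumulated attenuation along a ray, I = I0 * exp(-∫ mu ds). Here is a minimal discretised version of that line integral, ignoring the rotational PX geometry and the neural generation module.

```python
# Minimal Beer-Lambert ray attenuation sketch; voxel values and spacing
# are illustrative, not taken from the paper's CBCT pipeline.
import numpy as np

def render_ray(mu_samples, ds, I0=1.0):
    """Attenuate intensity I0 through attenuation coefficients mu_samples
    sampled at spacing ds along one ray (discretised line integral)."""
    return I0 * np.exp(-np.sum(mu_samples) * ds)

# Example: a ray through 100 voxels of soft tissue (mu ~ 0.2/cm), ds = 0.05 cm.
print(render_ray(np.full(100, 0.2), ds=0.05))  # ~exp(-1) ≈ 0.37
```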

This paper presents WALDO (WArping Layer-Decomposed Objects), a novel approach to the prediction of future video frames from past ones. Individual images are decomposed into multiple layers combining object masks and a small set of control points. The layer structure is shared across all frames in each video to build dense inter-frame connections. Complex scene motions are modeled by combining the parametric geometric transformations associated with individual layers, and video synthesis is broken down into discovering the layers associated with past frames, predicting the corresponding transformations for upcoming ones, warping the associated object regions accordingly, and filling in the remaining image parts. Extensive experiments on multiple benchmarks, including urban videos (Cityscapes and KITTI) and videos featuring nonrigid motions (UCF-Sports and H3.6M), show that our method consistently outperforms the state of the art by a significant margin in every case. Code, pretrained models, and video samples synthesized by our approach can be found on the project webpage //16lemoing.github.io/waldo.
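The per-layer warping and compositing step can be sketched as follows, assuming one parametric (here affine) transform per layer and back-to-front blending with warped masks; WALDO's control-point parameterisation and its inpainting of uncovered regions are omitted.

```python
# Hedged sketch of layered warping and compositing; layer contents,
# masks, and transforms are illustrative placeholders.
import numpy as np
from scipy.ndimage import affine_transform

def warp_and_composite(layers, masks, affines, background):
    out = background.copy()
    for img, m, A in zip(layers, masks, affines):   # back-to-front order
        w_img = np.stack([affine_transform(img[..., c], A) for c in range(3)], -1)
        w_m = affine_transform(m, A)[..., None]     # warped soft mask
        out = w_m * w_img + (1.0 - w_m) * out       # alpha-blend onto canvas
    return out

H = W = 32
print(warp_and_composite([np.random.rand(H, W, 3)], [np.ones((H, W))],
                         [np.eye(2)], np.zeros((H, W, 3))).shape)
```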

One of the main tasks of Natural Language Processing (NLP) is Named Entity Recognition (NER). It is used in many applications and can also serve as an intermediate step for other tasks. We present ANER, a web-based named entity recognizer for the Arabic and Arabizi languages. The model is built upon BERT, a transformer-based encoder. It can recognize 50 different entity classes, covering various fields. We trained our model on the WikiFANE_Gold dataset, which consists of Wikipedia articles. We achieved an F1 score of 88.7%, which beats CAMeL Tools' F1 score of 83% on the ANERcorp dataset, which has only 4 classes. We also obtained an F1 score of 77.7% on the NewsFANE_Gold dataset, which contains out-of-domain data from news articles. The system is deployed on a user-friendly web interface that accepts users' input in Arabic or Arabizi. It allows users to explore the entities in the text by highlighting them, and it can direct users to information about entities via Wikipedia. We added the ability to do NER using either our model or CAMeL Tools' model through our website. ANER is publicly accessible at //www.aner.online. We also deployed our model on HuggingFace at //huggingface.co/boda/ANER to allow developers to test and use it.
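Given that the model is published on HuggingFace at //huggingface.co/boda/ANER, it can presumably be loaded with the standard transformers pipeline, as in the sketch below; the exact label set served by the hosted model is not verified here.

```python
# Sketch: loading the published model with the standard transformers API.
# Requires network access on first run to download the model weights.
from transformers import pipeline

ner = pipeline("token-classification", model="boda/ANER",
               aggregation_strategy="simple")
print(ner("ولد أحمد في القاهرة"))  # "Ahmed was born in Cairo"
```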

We present Region-aware Open-vocabulary Vision Transformers (RO-ViT), a contrastive image-text pretraining recipe that bridges the gap between image-level pretraining and open-vocabulary object detection. In the pretraining phase, we propose to randomly crop and resize regions of the positional embeddings instead of using whole-image positional embeddings. This better matches the use of positional embeddings at region level in the detection finetuning phase. In addition, we replace the common softmax cross-entropy loss in contrastive learning with focal loss to better learn from informative yet difficult examples. Finally, we leverage recent advances in novel object proposals to improve open-vocabulary detection finetuning. We evaluate our full model on the LVIS and COCO open-vocabulary detection benchmarks and on zero-shot transfer. RO-ViT achieves a state-of-the-art 34.1 AP_r on LVIS, surpassing the best existing approach by +7.8 points, in addition to competitive zero-shot transfer detection. Surprisingly, RO-ViT improves the image-level representation as well, achieving the state of the art on 9 out of 12 metrics on the COCO and Flickr image-text retrieval benchmarks and outperforming competitive approaches with larger models.
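The cropped positional embedding idea can be sketched as cropping a region of the 2D positional embedding grid and resizing it back to full resolution, so that pretraining sees region-like embeddings; the grid size and crop parameters below are illustrative, not RO-ViT's settings.

```python
# Hedged sketch of cropping and resizing a 2D positional embedding grid.
import torch
import torch.nn.functional as F

def crop_resize_pos_embed(pos, crop_h, crop_w, top, left):
    """pos: (H, W, C) positional embedding grid."""
    H, W, C = pos.shape
    region = pos[top:top + crop_h, left:left + crop_w]   # crop a region
    region = region.permute(2, 0, 1).unsqueeze(0)        # (1, C, h, w)
    full = F.interpolate(region, size=(H, W), mode="bilinear",
                         align_corners=False)            # resize back to full grid
    return full.squeeze(0).permute(1, 2, 0)              # (H, W, C)

pe = torch.randn(14, 14, 768)
print(crop_resize_pos_embed(pe, crop_h=7, crop_w=7, top=3, left=2).shape)
```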

We propose a novel single-shot object detection network named Detection with Enriched Semantics (DES). Our motivation is to enrich the semantics of object detection features within a typical deep detector, by means of a semantic segmentation branch and a global activation module. The segmentation branch is supervised by weak segmentation ground truth, i.e., no extra annotation is required. In conjunction with that, we employ a global activation module which learns the relationship between channels and object classes in a self-supervised manner. Comprehensive experimental results on both the PASCAL VOC and MS COCO detection datasets demonstrate the effectiveness of the proposed method. In particular, with a VGG16-based DES, we achieve an mAP of 81.7 on VOC2007 test and an mAP of 32.8 on COCO test-dev, with an inference speed of 31.5 milliseconds per image on a Titan Xp GPU. With a lower-resolution version, we achieve an mAP of 79.7 on VOC2007 with an inference speed of 13.0 milliseconds per image.
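A module in the spirit of the described global activation could pool features globally, predict per-class scores, and reweight channels accordingly, as sketched below; DES's exact architecture and self-supervised objective may differ.

```python
# Hedged sketch of a global activation module: the shared linear layer
# encodes the channel-to-class relationship used for reweighting.
import torch
import torch.nn as nn

class GlobalActivation(nn.Module):
    def __init__(self, channels, num_classes):
        super().__init__()
        self.fc = nn.Linear(channels, num_classes)   # channel -> class relation

    def forward(self, feat):                         # feat: (B, C, H, W)
        pooled = feat.mean(dim=(2, 3))               # global average pooling
        cls_scores = torch.sigmoid(self.fc(pooled))  # (B, num_classes)
        # Reweight channels by how strongly they support the predicted classes.
        weights = torch.sigmoid(cls_scores @ self.fc.weight)  # (B, C)
        return feat * weights[:, :, None, None], cls_scores

m = GlobalActivation(256, 20)
out, scores = m(torch.randn(2, 256, 32, 32))
print(out.shape, scores.shape)
```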
