日韩在线精品小视频-中文熟妇亚洲视频观看

from arxiv, 14 pages, 8 figures, submitted to the International Conference Data Analysis, Optimization and Their Applications on the Occasion of Boris Mirkin's 80th Birthday January 30-31, 2023, Dolgoprudny, Moscow Region, Moscow Institute of Physics and Technology //mipt.ru/education/chairs/dm/conferences/data-analysis-optimization-and-their-applications-2023.php

We introduce a deterministic approach to edge detection and image segmentation by formulating pseudo-Boolean polynomials on image patches. The approach works by applying a binary classification of blob and edge regions in an image based on the degrees of pseudo-Boolean polynomials calculated on patches extracted from the provided image. We test our method on simple images containing primitive shapes of constant and contrasting colour and establish the feasibility before applying it to complex instances like aerial landscape images. The proposed method is based on the exploitation of the reduction, polynomial degree, and equivalence properties of penalty-based pseudo-Boolean polynomials.

相關內容

邊

關注 1

約束 · Guidance · CASES · Performer · 正則化項 ·

2023 年 10 月 16 日

Enhanced Edge-Perceptual Guided Image Filtering

Jinyu Li

Due to the powerful edge-preserving ability and low computational complexity, Guided image filter (GIF) and its improved versions has been widely applied in computer vision and image processing. However, all of them are suffered halo artifacts to some degree, as the regularization parameter increase. In the case of inconsistent structure of guidance image and input image, edge-preserving ability degradation will also happen. In this paper, a novel guided image filter is proposed by integrating an explicit first-order edge-protect constraint and an explicit residual constraint which will improve the edge-preserving ability in both cases. To illustrate the efficiency of the proposed filter, the performances are shown in some typical applications, which are single image detail enhancement, multi-scale exposure fusion, hyper spectral images classification. Both theoretical analysis and experimental results prove that the powerful edge-preserving ability of the proposed filter.

state-of-the-art · HTTPS · 正則化項 · Networking · 優化器 ·

2023 年 10 月 16 日

Multi-Body Neural Scene Flow

Kavisha Vidanapathirana,Shin-Fang Chng,Xueqian Li,Simon Lucey

from arxiv, Accepted for 3DV'2024

The test-time optimization of scene flow - using a coordinate network as a neural prior - has gained popularity due to its simplicity, lack of dataset bias, and state-of-the-art performance. We observe, however, that although coordinate networks capture general motions by implicitly regularizing the scene flow predictions to be spatially smooth, the neural prior by itself is unable to identify the underlying multi-body rigid motions present in real-world data. To address this, we show that multi-body rigidity can be achieved without the cumbersome and brittle strategy of constraining the $SE(3)$ parameters of each rigid body as done in previous works. This is achieved by regularizing the scene flow optimization to encourage isometry in flow predictions for rigid bodies. This strategy enables multi-body rigidity in scene flow while maintaining a continuous flow field, hence allowing dense long-term scene flow integration across a sequence of point clouds. We conduct extensive experiments on real-world datasets and demonstrate that our approach outperforms the state-of-the-art in 3D scene flow and long-term point-wise 4D trajectory prediction. The code is available at: \href{//github.com/kavisha725/MBNSF}{//github.com/kavisha725/MBNSF}.

多峰值 · MoDELS · Extensibility · INTERACT · HTTPS ·

2023 年 10 月 13 日

Making Multimodal Generation Easier: When Diffusion Models Meet LLMs

Xiangyu Zhao,Bo Liu,Qijiong Liu,Guangyuan Shi,Xiao-Ming Wu

We present EasyGen, an efficient model designed to enhance multimodal understanding and generation by harnessing the capabilities of diffusion models and large language models (LLMs). Unlike existing multimodal models that predominately depend on encoders like CLIP or ImageBind and need ample amounts of training data to bridge the gap between modalities, EasyGen is built upon a bidirectional conditional diffusion model named BiDiffuser, which promotes more efficient interactions between modalities. EasyGen handles image-to-text generation by integrating BiDiffuser and an LLM via a simple projection layer. Unlike most existing multimodal models that are limited to generating text responses, EasyGen can also facilitate text-to-image generation by leveraging the LLM to create textual descriptions, which can be interpreted by BiDiffuser to generate appropriate visual responses. Extensive quantitative and qualitative experiments demonstrate the effectiveness of EasyGen, whose training can be easily achieved in a lab setting. The source code is available at //github.com/zxy556677/EasyGen.

可約的 · SLAM · Agent · 估計/估計量 · 3D ·

2023 年 10 月 13 日

3D Active Metric-Semantic SLAM

Yuezhan Tao,Xu Liu,Igor Spasojevic,Saurav Agarwal,Vijay Kumar

from arxiv, Submitted to RA-L for review

In this letter, we address the problem of exploration and metric-semantic mapping of multi-floor GPS-denied indoor environments using Size Weight and Power (SWaP) constrained aerial robots. Most previous work in exploration assumes that robot localization is solved. However, neglecting the state uncertainty of the agent can ultimately lead to cascading errors both in the resulting map and in the state of the agent itself. Furthermore, actions that reduce localization errors may be at direct odds with the exploration task. We propose a framework that balances the efficiency of exploration with actions that reduce the state uncertainty of the agent. In particular, our algorithmic approach for active metric-semantic SLAM is built upon sparse information abstracted from raw problem data, to make it suitable for SWaP-constrained robots. Furthermore, we integrate this framework within a fully autonomous aerial robotic system that achieves autonomous exploration in cluttered, 3D environments. From extensive real-world experiments, we showed that by including Semantic Loop Closure (SLC), we can reduce the robot pose estimation errors by over 90% in translation and approximately 75% in yaw, and the uncertainties in pose estimates and semantic maps by over 70% and 65%, respectively. Although discussed in the context of indoor multi-floor exploration, our system can be used for various other applications, such as infrastructure inspection and precision agriculture where reliable GPS data may not be available.

塑造 · MoDELS · state-of-the-art · HTTPS · 模型復雜度 ·

2023 年 10 月 12 日

4D Gaussian Splatting for Real-Time Dynamic Scene Rendering

Guanjun Wu,Taoran Yi,Jiemin Fang,Lingxi Xie,Xiaopeng Zhang,Wei Wei,Wenyu Liu,Qi Tian,Xinggang Wang

from arxiv, Work in progress. Project page: //guanjunwu.github.io/4dgs/

Representing and rendering dynamic scenes has been an important but challenging task. Especially, to accurately model complex motions, high efficiency is usually hard to maintain. We introduce the 4D Gaussian Splatting (4D-GS) to achieve real-time dynamic scene rendering while also enjoying high training and storage efficiency. An efficient deformation field is constructed to model both Gaussian motions and shape deformations. Different adjacent Gaussians are connected via a HexPlane to produce more accurate position and shape deformations. Our 4D-GS method achieves real-time rendering under high resolutions, 70 FPS at a 800$\times$800 resolution on an RTX 3090 GPU, while maintaining comparable or higher quality than previous state-of-the-art methods. More demos and code are available at //guanjunwu.github.io/4dgs/.

神經元 · Networking · Neural Networks · 邊緣化 · 閾值 ·

2023 年 10 月 11 日

An On-Chip Trainable Neuron Circuit for SFQ-Based Spiking Neural Networks

Beyza Zeynep Ucpinar,Mustafa Altay Karamuftuoglu,Sasan Razmkhah,Massoud Pedram

from arxiv, 5 pages, 8 figures. The work was presented in EUCAS 2023

We present an on-chip trainable neuron circuit. Our proposed circuit suits bio-inspired spike-based time-dependent data computation for training spiking neural networks (SNN). The thresholds of neurons can be increased or decreased depending on the desired application-specific spike generation rate. This mechanism provides us with a flexible design and scalable circuit structure. We demonstrate the trainable neuron structure under different operating scenarios. The circuits are designed and optimized for the MIT LL SFQ5ee fabrication process. Margin values for all parameters are above 25\% with a 3GHz throughput for a 16-input neuron.

無監督 · 表示學習 · 學成 · CASES · state-of-the-art ·

2021 年 4 月 29 日

A Large-Scale Study on Unsupervised Spatiotemporal Representation Learning

Christoph Feichtenhofer,Haoqi Fan,Bo Xiong,Ross Girshick,Kaiming He

from arxiv, CVPR 2021

We present a large-scale study on unsupervised spatiotemporal representation learning from videos. With a unified perspective on four recent image-based frameworks, we study a simple objective that can easily generalize all these methods to space-time. Our objective encourages temporally-persistent features in the same video, and in spite of its simplicity, it works surprisingly well across: (i) different unsupervised frameworks, (ii) pre-training datasets, (iii) downstream datasets, and (iv) backbone architectures. We draw a series of intriguing observations from this study, e.g., we discover that encouraging long-spanned persistency can be effective even if the timespan is 60 seconds. In addition to state-of-the-art results in multiple benchmarks, we report a few promising cases in which unsupervised pre-training can outperform its supervised counterpart. Code is made available at //github.com/facebookresearch/SlowFast

圖形處理器 · 圖 · Neural Networks · Networking · Performer ·

2021 年 2 月 13 日

How Framelets Enhance Graph Neural Networks

Xuebin Zheng,Bingxin Zhou,Junbin Gao,Yu Guang Wang,Pietro Lio,Ming Li,Guido Montufar

from arxiv, 24 pages, 17 figures, 6 tables

This paper presents a new approach for assembling graph neural networks based on framelet transforms. The latter provides a multi-scale representation for graph-structured data. With the framelet system, we can decompose the graph feature into low-pass and high-pass frequencies as extracted features for network training, which then defines a framelet-based graph convolution. The framelet decomposition naturally induces a graph pooling strategy by aggregating the graph feature into low-pass and high-pass spectra, which considers both the feature values and geometry of the graph data and conserves the total information. The graph neural networks with the proposed framelet convolution and pooling achieve state-of-the-art performance in many types of node and graph prediction tasks. Moreover, we propose shrinkage as a new activation for the framelet convolution, which thresholds the high-frequency information at different scales. Compared to ReLU, shrinkage in framelet convolution improves the graph neural network model in terms of denoising and signal compression: noises in both node and structure can be significantly reduced by accurately cutting off the high-pass coefficients from framelet decomposition, and the signal can be compressed to less than half its original size with the prediction performance well preserved.

圖片分類 · 小樣本學習 · 學成 · Extensibility · HTTPS ·

2019 年 11 月 14 日

Self-Supervised Learning For Few-Shot Image Classification

Da Chen,Yuefeng Chen,Yuhong Li,Feng Mao,Yuan He,Hui Xue

from arxiv, Submitted to ICASSP 2020, Implementation //github.com/phecy/ssl-few-shot

Few-shot image classification aims to classify unseen classes with limited labeled samples. Recent works benefit from the meta-learning process with episodic tasks and can fast adapt to class from training to testing. Due to the limited number of samples for each task, the initial embedding network for meta learning becomes an essential component and can largely affects the performance in practice. To this end, many pre-trained methods have been proposed, and most of them are trained in supervised way with limited transfer ability for unseen classes. In this paper, we proposed to train a more generalized embedding network with self-supervised learning (SSL) which can provide slow and robust representation for downstream tasks by learning from the data itself. We evaluate our work by extensive comparisons with previous baseline methods on two few-shot classification datasets ({\em i.e.,} MiniImageNet and CUB). Based on the evaluation results, the proposed method achieves significantly better performance, i.e., improve 1-shot and 5-shot tasks by nearly \textbf{3\%} and \textbf{4\%} on MiniImageNet, by nearly \textbf{9\%} and \textbf{3\%} on CUB. Moreover, the proposed method can gain the improvement of (\textbf{15\%}, \textbf{13\%}) on MiniImageNet and (\textbf{15\%}, \textbf{8\%}) on CUB by pretraining using more unlabeled data. Our code will be available at \hyperref[//github.com/phecy/SSL-FEW-SHOT.]{//github.com/phecy/ssl-few-shot.}

馬爾可夫隨機場 · 圖 · 圖像分割 · SSL · SC ·

2018 年 5 月 21 日

Consensus Based Medical Image Segmentation Using Semi-Supervised Learning And Graph Cuts

Dwarikanath Mahapatra

Medical image segmentation requires consensus ground truth segmentations to be derived from multiple expert annotations. A novel approach is proposed that obtains consensus segmentations from experts using graph cuts (GC) and semi supervised learning (SSL). Popular approaches use iterative Expectation Maximization (EM) to estimate the final annotation and quantify annotator's performance. Such techniques pose the risk of getting trapped in local minima. We propose a self consistency (SC) score to quantify annotator consistency using low level image features. SSL is used to predict missing annotations by considering global features and local image consistency. The SC score also serves as the penalty cost in a second order Markov random field (MRF) cost function optimized using graph cuts to derive the final consensus label. Graph cut obtains a global maximum without an iterative procedure. Experimental results on synthetic images, real data of Crohn's disease patients and retinal images show our final segmentation to be accurate and more consistent than competing methods.