We introduce the idea of AquaFuse, a physics-based method for synthesizing waterbody properties in underwater imagery. We formulate a closed-form solution for waterbody fusion that facilitates realistic data augmentation and geometrically consistent underwater scene rendering. AquaFuse leverages the physical characteristics of light propagation underwater to synthesize the waterbody from one scene to the object contents of another. Unlike data-driven style transfer, AquaFuse preserves the depth consistency and object geometry in an input scene. We validate this unique feature by comprehensive experiments over diverse underwater scenes. We find that the AquaFused images preserve over 94% depth consistency and 90-95% structural similarity of the input scenes. We also demonstrate that it generates accurate 3D view synthesis by preserving object geometry while adapting to the inherent waterbody fusion process. AquaFuse opens up a new research direction in data augmentation by geometry-preserving style transfer for underwater imaging and robot vision applications.

Related content

Data augmentation

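In machine learning, data augmentation generally refers to methods (such as data distillation or balancing positive and negative samples) that improve the quality of a model's dataset and augment the data.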

MoDELS · Comprehensibility · Apollo · Multimodal · Reviewer

December 13, 2024

Apollo: An Exploration of Video Understanding in Large Multimodal Models

Orr Zohar,Xiaohan Wang,Yann Dubois,Nikhil Mehta,Tong Xiao,Philippe Hansen-Estruch,Licheng Yu,Xiaofang Wang,Felix Juefei-Xu,Ning Zhang,Serena Yeung-Levy,Xide Xia

from arxiv, //apollo-lmms.github.io

Despite the rapid integration of video perception capabilities into Large Multimodal Models (LMMs), the underlying mechanisms driving their video understanding remain poorly understood. Consequently, many design decisions in this domain are made without proper justification or analysis. The high computational cost of training and evaluating such models, coupled with limited open research, hinders the development of video-LMMs. To address this, we present a comprehensive study that helps uncover what effectively drives video understanding in LMMs. We begin by critically examining the primary contributors to the high computational requirements associated with video-LMM research and discover Scaling Consistency, wherein design and training decisions made on smaller models and datasets (up to a critical size) effectively transfer to larger models. Leveraging these insights, we explore many video-specific aspects of video-LMMs, including video sampling, architectures, data composition, training schedules, and more. For example, we demonstrate that fps sampling during training is vastly preferable to uniform frame sampling, and we identify which vision encoders are best for video representation. Guided by these findings, we introduce Apollo, a state-of-the-art family of LMMs that achieve superior performance across different model sizes. Our models can perceive hour-long videos efficiently, with Apollo-3B outperforming most existing 7B models with an impressive 55.1 on LongVideoBench. Apollo-7B is state-of-the-art compared to 7B LMMs, with 70.9 on MLVU and 63.3 on Video-MME.
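The contrast between fps sampling and uniform frame sampling can be illustrated with a small sketch. The function names, the centered-bin rule for uniform sampling, and the example clip lengths are assumptions made for illustration; they are not taken from the Apollo code.

```python
def uniform_frame_indices(num_frames: int, num_samples: int) -> list[int]:
    """Spread a fixed number of samples evenly over the whole clip.

    The time gap between samples grows with clip length.
    """
    if num_frames <= 0 or num_samples <= 0:
        return []
    num_samples = min(num_samples, num_frames)
    step = num_frames / num_samples
    # Take the middle frame of each of the num_samples equal bins.
    return [int(i * step + step / 2) for i in range(num_samples)]


def fps_frame_indices(num_frames: int, native_fps: float, target_fps: float) -> list[int]:
    """Sample at a fixed rate in time, so the gap between samples is constant.

    The number of sampled frames grows with clip length.
    """
    if num_frames <= 0 or native_fps <= 0 or target_fps <= 0:
        return []
    # Never sample more densely than the native frame rate.
    stride = max(1.0, native_fps / target_fps)
    indices = []
    i = 0
    while int(i * stride) < num_frames:
        indices.append(int(i * stride))
        i += 1
    return indices


if __name__ == "__main__":
    short_clip, long_clip = 300, 1800  # 10 s and 60 s at 30 fps (hypothetical)
    print(len(fps_frame_indices(short_clip, 30, 2)), len(fps_frame_indices(long_clip, 30, 2)))
    print(uniform_frame_indices(short_clip, 8))
    print(uniform_frame_indices(long_clip, 8))
```

With fps sampling, consecutive sampled frames are always 0.5 s apart in this example; with uniform sampling of 8 frames, the gap is 1.25 s for the short clip and 7.5 s for the long one, so the apparent motion speed seen by the model depends on clip length.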

Microsoft Surface · Full · Controller · Diversity · Gating

December 13, 2024

Iterating the Transient Light Transport Matrix for Non-Line-of-Sight Imaging

Talha Sultan,Eric Brandt,Khadijeh Masumnia-Bisheh,Simone Riccardo,Pavel Polynkin,Alberto Tosi,Andreas Velten

Active imaging systems sample the Transient Light Transport Matrix (TLTM) for a scene by sequentially illuminating various positions in this scene using a controllable light source, and then measuring the resulting spatiotemporal light transport with time of flight (ToF) sensors. Time-resolved Non-line-of-sight (NLOS) imaging employs an active imaging system that measures part of the TLTM of an intermediary relay surface, and uses the indirect reflections of light encoded within this TLTM to "see around corners". Such imaging systems have applications in diverse areas such as disaster response, remote surveillance, and autonomous navigation. While existing NLOS imaging systems usually measure a subset of the full TLTM, development of customized gated Single Photon Avalanche Diode (SPAD) arrays \cite{riccardo_fast-gated_2022} has made it feasible to probe the full measurement space. In this work, we demonstrate that the full TLTM on the relay surface can be processed with efficient algorithms to computationally focus and detect our illumination in different parts of the hidden scene, turning the relay surface into a second-order active imaging system. These algorithms allow us to iterate on the measured, first-order TLTM, and extract a second-order TLTM for surfaces in the hidden scene. We showcase three applications of TLTMs in NLOS imaging: (1) Scene Relighting with novel illumination, (2) Separation of direct and indirect components of light transport in the hidden scene, and (3) Dual Photography. Additionally, we empirically demonstrate that SPAD arrays enable parallel acquisition of photons, effectively mitigating long acquisition times.

Decomposed · Automator · Reducible · binary · Optimizer

December 13, 2024

Strong Structural Bounds for MaxSAT: The Fine Details of Using Neuromorphic and Quantum Hardware Accelerators

Max Bannach,Jai Grover,Markus Hecher

Hardware accelerators like quantum annealers or neuromorphic chips are capable of finding the ground state of a Hamiltonian. A promising route in utilizing these devices is via methods from automated reasoning: The problem at hand is first encoded into MaxSAT; then MaxSAT is reduced to Max2SAT; and finally, Max2SAT is translated into a Hamiltonian. It was observed that different encodings can dramatically affect the efficiency of the hardware accelerators. Yet, previous studies were only concerned with the size of the encodings rather than with syntactic or structural properties. We establish structure-aware reductions between MaxSAT, Max2SAT, and the quadratic unconstrained binary optimization problem (QUBO) that underlies such hardware accelerators. All these problems turn out to be equivalent under linear-time, treewidth-preserving reductions. As a consequence, we obtain tight lower bounds under ETH and SETH for Max2SAT and QUBO, as well as a new time-optimal fixed-parameter algorithm for QUBO. While our results are tight up to a constant additive factor for the primal treewidth, we require a constant multiplicative factor for the incidence treewidth. To close the emerging gap, we supplement our results with novel time-optimal algorithms for fragments of MaxSAT based on model counting.
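The last step of the pipeline, translating Max2SAT into a QUBO, can be sketched with the textbook penalty encoding: a clause is violated exactly when both of its literals are false, and that indicator is a product of two terms that are each linear in a binary variable. This is a minimal sketch of that standard encoding, not the structure-aware reduction established in the paper. The function names and the DIMACS-style literal convention (`+v` for a variable, `-v` for its negation) are assumptions for illustration.

```python
from itertools import product


def max2sat_to_qubo(clauses):
    """Return (offset, linear, quadratic) such that the QUBO energy of a 0/1
    assignment equals the number of violated clauses.

    Each clause is a pair of nonzero ints; a unit clause can be written (l, l).
    """
    offset, linear, quadratic = 0, {}, {}
    for l1, l2 in clauses:
        v1, v2 = abs(l1), abs(l2)
        # "Literal is false" as a + b*x: 1 - x for a positive literal, x for a negated one.
        a1, b1 = (1, -1) if l1 > 0 else (0, 1)
        a2, b2 = (1, -1) if l2 > 0 else (0, 1)
        # Expand (a1 + b1*x1) * (a2 + b2*x2).
        offset += a1 * a2
        linear[v1] = linear.get(v1, 0) + a2 * b1
        linear[v2] = linear.get(v2, 0) + a1 * b2
        if v1 == v2:
            # x*x = x for binary x, so the product term folds into the linear part.
            linear[v1] = linear.get(v1, 0) + b1 * b2
        else:
            key = (min(v1, v2), max(v1, v2))
            quadratic[key] = quadratic.get(key, 0) + b1 * b2
    return offset, linear, quadratic


def qubo_energy(qubo, assignment):
    """Evaluate the QUBO on a dict mapping variable -> 0 or 1."""
    offset, linear, quadratic = qubo
    energy = offset
    energy += sum(c * assignment[v] for v, c in linear.items())
    energy += sum(c * assignment[u] * assignment[v] for (u, v), c in quadratic.items())
    return energy


def count_violated(clauses, assignment):
    """Count clauses in which every literal is false."""
    def is_true(lit):
        return assignment[abs(lit)] == (1 if lit > 0 else 0)
    return sum(1 for clause in clauses if not any(is_true(lit) for lit in clause))


if __name__ == "__main__":
    clauses = [(1, 2), (-1, 3), (-2, -3), (1, -3), (2, 2)]
    qubo = max2sat_to_qubo(clauses)
    for bits in product((0, 1), repeat=3):
        assignment = dict(zip((1, 2, 3), bits))
        print(bits, qubo_energy(qubo, assignment), count_violated(clauses, assignment))
```

Minimizing this energy is therefore the same as maximizing the number of satisfied clauses. The encoding adds one quadratic term per pair of distinct variables that share a clause, so the interaction graph of the QUBO coincides with the primal graph of the formula.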

Cluster · Decoding · LDPC · Performer · ML

December 11, 2024

Cluster Decomposition for Improved Erasure Decoding of Quantum LDPC Codes

Hanwen Yao, Mert Gökduman, Henry D. Pfister

from arxiv, 12 pages, 8 figures

We introduce a new erasure decoder that applies to arbitrary quantum LDPC codes. Dubbed the cluster decoder, it generalizes the decomposition idea of Vertical-Horizontal (VH) decoding introduced by Connelly et al. in 2022. Like the VH decoder, the idea is to first run the peeling decoder and then post-process the resulting stopping set. The cluster decoder breaks the stopping set into a tree of clusters which can be solved sequentially via Gaussian Elimination (GE). By allowing clusters of unconstrained size, this decoder achieves maximum-likelihood (ML) performance with reduced complexity compared with full GE. When GE is applied only to clusters whose sizes are less than a constant, the performance is degraded but the complexity becomes linear in the block length. Our simulation results show that, for hypergraph product codes, the cluster decoder with constant cluster size achieves near-ML performance similar to VH decoding in the low-erasure-rate regime. For the general quantum LDPC codes we studied, the cluster decoder can be used to estimate the ML performance curve with reduced complexity over a wide range of erasure rates.
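The first stage named in the abstract, the peeling decoder, can be sketched for the simpler classical case of a binary linear code. This is a minimal illustration under that assumption; it does not model the quantum setting or the cluster decomposition itself. The function name `peel`, the representation of checks as lists of bit positions, and the small example code are all hypothetical.

```python
def peel(checks, word):
    """Classical peeling decoder over GF(2).

    checks: list of parity checks, each a list of bit positions that must XOR to 0.
    word:   list of bits with None marking an erased position.
    Returns (decoded_word, stopping_set), where stopping_set lists the positions
    the peeling process could not resolve.
    """
    word = list(word)
    progress = True
    while progress:
        progress = False
        for check in checks:
            erased = [i for i in check if word[i] is None]
            if len(erased) == 1:
                # Exactly one unknown: it must equal the XOR of the known bits.
                parity = 0
                for i in check:
                    if word[i] is not None:
                        parity ^= word[i]
                word[erased[0]] = parity
                progress = True
    stopping_set = [i for i, bit in enumerate(word) if bit is None]
    return word, stopping_set


if __name__ == "__main__":
    # A small 7-bit code with three checks (hypothetical example).
    checks = [[0, 1, 2, 4], [1, 2, 3, 5], [0, 2, 3, 6]]
    codeword = [1, 0, 1, 1, 0, 0, 1]

    # Two erasures that peeling resolves completely.
    print(peel(checks, [None, 0, 1, 1, 0, None, 1]))

    # Three erasures where every check sees at least two unknowns: peeling stalls.
    print(peel(checks, [None, None, None, 1, 0, 0, 1]))
```

In the second call every check touches at least two erased bits, so peeling makes no progress and returns the stopping set `[0, 1, 2]`. Solving the remaining linear system, for instance by Gaussian elimination, is the post-processing step that the abstract assigns to the cluster decoder.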

MoDELS · Inference · Sample · IR · Precision/Accuracy

December 11, 2024

DAmodel: Hierarchical Bayesian Modelling of DA White Dwarfs for Spectrophotometric Calibration

Benjamin M. Boyd,Gautham Narayan,Kaisey S. Mandel,Matthew Grayling,Aidan Berres,Mai Li,Aaron Do,Abhijit Saha,Tim Axelrod,Thomas Matheson,Edward W. Olszewski,Ralph C. Bohlin,Annalisa Calamida,Jay B. Holberg,Ivan Hubeny,John W. Mackenty,Armin Rest,Elena Sabbi,Christopher W. Stubbs

from arxiv, 32 pages, 24 figures, 5 tables, submitted to MNRAS

We use hierarchical Bayesian modelling to calibrate a network of 32 all-sky faint DA white dwarf (DA WD) spectrophotometric standards ($16.5 < V < 19.5$) alongside the three CALSPEC standards, from 912 Å to 32 $\mu$m. The framework is the first of its kind to jointly infer photometric zeropoints and WD parameters ($\log g$, $T_{\text{eff}}$, $A_V$, $R_V$) by simultaneously modelling both photometric and spectroscopic data. We model panchromatic HST/WFC3 UVIS and IR fluxes, HST/STIS UV spectroscopy and ground-based optical spectroscopy to sub-percent precision. Photometric residuals for the sample are the lowest yet, yielding $<0.004$ mag RMS on average from the UV to the NIR, achieved by jointly inferring time-dependent changes in system sensitivity and WFC3/IR count-rate nonlinearity. Our GPU-accelerated implementation enables efficient sampling via Hamiltonian Monte Carlo, critical for exploring the high-dimensional posterior space. The hierarchical nature of the model enables population analysis of intrinsic WD and dust parameters. Inferred SEDs from this model will be essential for calibrating the James Webb Space Telescope as well as next-generation surveys, including Vera Rubin Observatory's Legacy Survey of Space and Time, and the Nancy Grace Roman Space Telescope.

Noise · Robustness · GROUP · state-of-the-art · INFORMS

December 11, 2024

Hidden in the Noise: Two-Stage Robust Watermarking for Images

Kasra Arabi,Benjamin Feuer,R. Teal Witter,Chinmay Hegde,Niv Cohen

As the quality of image generators continues to improve, deepfakes become a topic of considerable societal debate. Image watermarking allows responsible model owners to detect and label their AI-generated content, which can mitigate the harm. Yet, current state-of-the-art methods in image watermarking remain vulnerable to forgery and removal attacks. This vulnerability occurs in part because watermarks distort the distribution of generated images, unintentionally revealing information about the watermarking techniques. In this work, we first demonstrate a distortion-free watermarking method for images, based on a diffusion model's initial noise. However, detecting the watermark requires comparing the initial noise reconstructed for an image to all previously used initial noises. To mitigate these issues, we propose a two-stage watermarking framework for efficient detection. During generation, we augment the initial noise with generated Fourier patterns to embed information about the group of initial noises we used. For detection, we (i) retrieve the relevant group of noises, and (ii) search within the given group for an initial noise that might match our image. This watermarking approach achieves state-of-the-art robustness to forgery and removal against a large battery of attacks.

Skip connections · Neural Networks · Optimizer · Linear · Graph

May 10, 2021

Optimization of Graph Neural Networks: Implicit Acceleration by Skip Connections and More Depth

Keyulu Xu,Mozhi Zhang,Stefanie Jegelka,Kenji Kawaguchi

Graph Neural Networks (GNNs) have been studied through the lens of expressive power and generalization. However, their optimization properties are less well understood. We take the first step towards analyzing GNN training by studying the gradient dynamics of GNNs. First, we analyze linearized GNNs and prove that despite the non-convexity of training, convergence to a global minimum at a linear rate is guaranteed under mild assumptions that we validate on real-world graphs. Second, we study what may affect the GNNs' training speed. Our results show that the training of GNNs is implicitly accelerated by skip connections, more depth, and/or a good label distribution. Empirical results confirm that our theoretical results for linearized GNNs align with the training behavior of nonlinear GNNs. Our results provide the first theoretical support for the success of GNNs with skip connections in terms of optimization, and suggest that deep GNNs with skip connections would be promising in practice.

Performer · Predictor/Decision function · Dataset · Better · Estimation/Estimator

March 10, 2021

ReNAS: Relativistic Evaluation of Neural Architecture Search

Yixing Xu,Yunhe Wang,Kai Han,Yehui Tang,Shangling Jui,Chunjing Xu,Chang Xu

An effective and efficient architecture performance evaluation scheme is essential for the success of Neural Architecture Search (NAS). To save computational cost, most existing NAS algorithms train and evaluate intermediate neural architectures on a small proxy dataset with limited training epochs. But it is difficult to expect an accurate performance estimate for an architecture from such a coarse evaluation. This paper advocates a new neural architecture evaluation scheme, which aims to determine which architecture would perform better instead of accurately predicting the absolute architecture performance. Therefore, we propose a relativistic architecture performance predictor in NAS (ReNAS). We encode neural architectures into feature tensors and further refine the representations with the predictor. The proposed relativistic performance predictor can be deployed in discrete searching methods to search for the desired architectures without additional evaluation. Experimental results on the NAS-Bench-101 dataset suggest that sampling 424 (0.1% of the entire search space) neural architectures and their corresponding validation performance is already enough for learning an accurate architecture performance predictor. The accuracies of our searched neural architectures on the NAS-Bench-101 and NAS-Bench-201 datasets are higher than those of the state-of-the-art methods and show the superiority of the proposed method.

Graph · Structured learning · Robustness · Learned · GNN

March 4, 2021

Deep Graph Structure Learning for Robust Representations: A Survey

Yanqiao Zhu,Weizhi Xu,Jinghao Zhang,Qiang Liu,Shu Wu,Liang Wang

from arxiv, 8 pages, in submission to IJCAI 2021 (Survey Track)

Graph Neural Networks (GNNs) are widely used for analyzing graph-structured data. Most GNN methods are highly sensitive to the quality of graph structures and usually require a perfect graph structure for learning informative embeddings. However, the pervasiveness of noise in graphs necessitates learning robust representations for real-world problems. To improve the robustness of GNN models, many studies have been proposed around the central concept of Graph Structure Learning (GSL), which aims to jointly learn an optimized graph structure and corresponding representations. Towards this end, in the presented survey, we broadly review recent progress of GSL methods for learning robust representations. Specifically, we first formulate a general paradigm of GSL, and then review state-of-the-art methods classified by how they model graph structures, followed by applications that incorporate the idea of GSL in other graph tasks. Finally, we point out some issues in current studies and discuss future directions.

Extensibility · Point cloud · Random sampling · Sample · state-of-the-art

November 25, 2019

RandLA-Net: Efficient Semantic Segmentation of Large-Scale Point Clouds

Qingyong Hu,Bo Yang,Linhai Xie,Stefano Rosa,Yulan Guo,Zhihua Wang,Niki Trigoni,Andrew Markham

from arxiv, Code and data are available at: //github.com/QingyongHu/RandLA-Net

We study the problem of efficient semantic segmentation for large-scale 3D point clouds. By relying on expensive sampling techniques or computationally heavy pre/post-processing steps, most existing approaches can only be trained on and operate over small-scale point clouds. In this paper, we introduce RandLA-Net, an efficient and lightweight neural architecture to directly infer per-point semantics for large-scale point clouds. The key to our approach is to use random point sampling instead of more complex point selection approaches. Although remarkably computation- and memory-efficient, random sampling can discard key features by chance. To overcome this, we introduce a novel local feature aggregation module to progressively increase the receptive field for each 3D point, thereby effectively preserving geometric details. Extensive experiments show that our RandLA-Net can process 1 million points in a single pass, up to 200× faster than existing approaches. Moreover, our RandLA-Net clearly surpasses state-of-the-art approaches for semantic segmentation on two large-scale benchmarks, Semantic3D and SemanticKITTI.
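The trade-off behind random point sampling can be sketched by placing it next to a more expensive selection rule. Farthest point sampling is used here only as an assumed example of a "more complex point selection approach"; the abstract does not name a specific one. Function names and the toy point set are hypothetical, and the code uses plain Python lists rather than any point-cloud library.

```python
import random


def random_sample(num_points: int, k: int, seed: int = 0) -> list[int]:
    """Pick k distinct indices uniformly at random.

    The cost does not depend on the geometry of the cloud, but nothing
    guarantees that the selected points cover the scene.
    """
    rng = random.Random(seed)
    return rng.sample(range(num_points), min(k, num_points))


def farthest_point_sample(points, k: int, start: int = 0) -> list[int]:
    """Greedy farthest point sampling: each new point maximizes its distance
    to the points chosen so far. Good coverage, but O(N * k) distance updates.
    """
    if not points or k <= 0:
        return []

    def sq_dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    k = min(k, len(points))
    chosen = [start]
    # Squared distance from every point to its nearest chosen point.
    nearest = [sq_dist(p, points[start]) for p in points]
    while len(chosen) < k:
        nxt = max(range(len(points)), key=nearest.__getitem__)
        chosen.append(nxt)
        for i, p in enumerate(points):
            d = sq_dist(p, points[nxt])
            if d < nearest[i]:
                nearest[i] = d
    return chosen


if __name__ == "__main__":
    cloud = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
    print(farthest_point_sample(cloud, 3))   # always keeps the isolated far point
    print(random_sample(len(cloud), 3))      # may or may not keep it
```

Random sampling can drop the isolated point at x = 10 by chance, which is the kind of loss the abstract's local feature aggregation module is meant to compensate for.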