
from arxiv, 6 pages, 1 page for references, 6 figures, 1 table, IEEEtran format. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.

We propose a standalone monocular visual Simultaneous Localization and Mapping (vSLAM) initialization pipeline for autonomous robots in space. Our method, a state-of-the-art factor graph optimization pipeline, enhances classical Structure from Small Motion (SfSM) to robustly initialize a monocular agent in weak-perspective projection scenes. Furthermore, it overcomes visual estimation challenges introduced by spacecraft inspection trajectories, such as center-pointing motion, which exacerbates the bas-relief ambiguity, and the presence of a dominant plane in the scene, which causes motion estimation degeneracies in classical Structure from Motion (SfM). We validate our method on realistic, simulated satellite inspection images exhibiting weak-perspective projection, and we demonstrate its effectiveness and improved performance compared to other monocular initialization procedures.

Related content

Optimizer · Biased · MoDELS · Linear · Linear model

October 31, 2024

Implicit Optimization Bias of Next-Token Prediction in Linear Models

Christos Thrampoulidis

from arxiv, v2: fixed typos and writing in various parts; updated figures and future-work section

We initiate an investigation into the optimization properties of next-token prediction (NTP), the dominant training paradigm for modern language models. Specifically, we study the structural properties of the solutions selected by gradient-based optimizers among the many possible minimizers of the NTP objective. By framing NTP as cross-entropy minimization across distinct contexts, each tied with a sparse conditional probability distribution across a finite vocabulary of tokens, we introduce "NTP-separability conditions" that enable reaching the data-entropy lower bound. With this setup, and focusing on linear models with fixed context embeddings, we characterize the optimization bias of gradient descent (GD): Within the data subspace defined by the sparsity patterns of distinct contexts, GD selects parameters that equate the logits' differences of in-support tokens to their log-odds. In the orthogonal subspace, the GD parameters diverge in norm and select the direction that maximizes a margin specific to NTP. These findings extend previous research on implicit bias in one-hot classification to the NTP setting, highlighting key differences and prompting further research into the optimization and generalization properties of NTP, irrespective of the specific architecture used to generate the context embeddings.
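
The data-entropy lower bound mentioned in the abstract can be sketched as follows. The notation is assumed here for illustration and is not taken from the paper: $m$ distinct contexts with empirical frequencies $\pi_j$, a sparse conditional distribution $p_j$ over the vocabulary $\mathcal{V}$ for context $j$, and a softmax model output $q_\theta(\cdot \mid j)$.

```latex
\begin{align*}
\mathrm{CE}(\theta)
  &= \sum_{j=1}^{m} \pi_j \sum_{z \in \mathcal{V}} p_{j,z}\,\bigl(-\log q_\theta(z \mid j)\bigr) \\
  &= \sum_{j=1}^{m} \pi_j \Bigl[ H(p_j) + \mathrm{KL}\bigl(p_j \,\|\, q_\theta(\cdot \mid j)\bigr) \Bigr]
  \;\ge\; \sum_{j=1}^{m} \pi_j\, H(p_j).
\end{align*}
```

Because a softmax assigns strictly positive probability to every token, the KL term can vanish for a sparse $p_j$ only in the limit where the logits of out-of-support tokens go to $-\infty$ relative to the in-support ones. Under these assumptions the bound is approached rather than attained at finite parameters, which is consistent with the abstract's statement that the parameters diverge in norm in one subspace.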

Decoding · Ensemble · LDPC · Belief propagation · Performer

October 31, 2024

A Comparative Study of Ensemble Decoding Methods for Short Length LDPC Codes

Felix Krieg,Jannis Clausius,Marvin Geiselhart,Stephan ten Brink

from arxiv, 6 pages, 7 figures, submitted to IEEE for possible publication

To alleviate the suboptimal performance of belief propagation (BP) decoding of short low-density parity-check (LDPC) codes, a plethora of improved decoding algorithms has been proposed over the last two decades. Many of these methods can be described using the same general framework, which we call ensemble decoding: A set of independent constituent decoders works in parallel on the received sequence, each proposing a codeword candidate. From this list, the maximum likelihood (ML) decision is designated as the decoder output. In this paper, we qualitatively and quantitatively compare different realizations of the ensemble decoder, namely multiple-bases belief propagation (MBBP), automorphism ensemble decoding (AED), scheduling ensemble decoding (SED), noise-aided ensemble decoding (NED) and saturated belief propagation (SBP). While all algorithms can provide gains over traditional BP decoding, ensemble methods that exploit the code structure, such as MBBP and AED, typically show greater performance improvements.
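
The ensemble framework described above (several constituent decoders each propose a candidate, and the most likely candidate is returned) can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: the function names `ml_select` and `ensemble_decode` are hypothetical, the constituent decoders are arbitrary callables, and the channel is assumed to be BPSK over AWGN, where the maximum-likelihood choice within a list is the candidate at the smallest Euclidean distance from the received vector.

```python
import numpy as np


def ml_select(candidates, received):
    """Return the candidate closest to the received sequence.

    Assumption: BPSK mapping (bit 0 -> +1, bit 1 -> -1) over an AWGN channel,
    so the ML decision within the list minimizes Euclidean distance.
    """
    received = np.asarray(received, dtype=float)
    best, best_dist = None, np.inf
    for cw in candidates:
        symbols = 1.0 - 2.0 * np.asarray(cw, dtype=float)  # bits -> +/-1
        dist = float(np.sum((received - symbols) ** 2))
        if dist < best_dist:
            best, best_dist = list(cw), dist
    return best


def ensemble_decode(decoders, received):
    """Run every constituent decoder, then pick the ML candidate."""
    candidates = [decode(received) for decode in decoders]
    return ml_select(candidates, received)


# Toy stand-ins for constituent decoders (hypothetical, no parity checks).
def hard_decision(y):
    return [int(v < 0) for v in y]


def all_zero(y):
    return [0] * len(y)


result = ensemble_decode([all_zero, hard_decision], [0.9, -1.1, 0.2, -0.8])
```

In a real ensemble, each constituent would be a BP variant (for example, BP on a different parity-check matrix or with a different schedule), and candidates that are not valid codewords would typically be discarded before the selection step.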

Estimation/Estimator · Learning · Linear · Diversity · Robot

October 30, 2024

Learning for Deformable Linear Object Insertion Leveraging Flexibility Estimation from Visual Cues

Mingen Li,Changhyun Choi

from arxiv, 7 pages, 9 figures, 3 tables. 2024 IEEE International Conference on Robotics and Automation (ICRA)

Manipulation of deformable linear objects (DLOs), including iron wire, rubber, silk, and nylon rope, is ubiquitous in daily life. These objects exhibit diverse physical properties, such as Young's modulus and bending stiffness. Such diversity poses challenges for developing generalized manipulation policies. However, previous research limited its scope to single-material DLOs and engaged in time-consuming data collection for state estimation. In this paper, we propose a two-stage manipulation approach consisting of material property (e.g., flexibility) estimation and policy learning for DLO insertion with reinforcement learning. First, we design a flexibility estimation scheme that characterizes the properties of different types of DLOs. The ground-truth flexibility data is collected in simulation to train our flexibility estimation module. During manipulation, the robot interacts with the DLOs to estimate flexibility by analyzing their visual configurations. Second, we train a policy conditioned on the estimated flexibility to perform challenging DLO insertion tasks. Our pipeline, trained with diverse insertion scenarios, achieves an 85.6% success rate in simulation and 66.67% in real robot experiments. Please refer to our project page: //lmeee.github.io/DLOInsert/

Better · Partition · Performer · GPUs · Processing (programming language)

October 30, 2024

An Evaluation of Massively Parallel Algorithms for DFA Minimization

Jan Martens,Anton Wijs

from arxiv, In Proceedings GandALF 2024, arXiv:2410.21884

We study parallel algorithms for the minimization of Deterministic Finite Automata (DFAs). In particular, we implement four different massively parallel algorithms on Graphics Processing Units (GPUs). Our results confirm the expectations that the algorithm with the theoretically best time complexity is not practically suitable to run on GPUs due to the large amount of resources needed. We empirically verify that parallel partition refinement algorithms from the literature perform better in practice, even though their time complexity is worse. Lastly, we introduce a novel algorithm based on partition refinement with an extra parallel partial transitive closure step and show that on specific benchmarks it has better run-time complexity and performs better in practice.
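
For readers unfamiliar with partition refinement, the following is a minimal sequential sketch in the style of Moore's algorithm. It is not one of the GPU algorithms evaluated in the paper. The function name `minimize_dfa` and the dictionary-based transition table are assumptions made for illustration, and the DFA is assumed to be complete (every state has a transition on every symbol).

```python
def minimize_dfa(states, alphabet, delta, accepting):
    """Map each state to the id of its equivalence class.

    Assumptions: `delta` is a dict keyed by (state, symbol) and the DFA is
    complete. States that receive the same id are language-equivalent.
    """
    # Initial partition: accepting versus non-accepting states.
    block = {s: int(s in accepting) for s in states}
    while True:
        ids = {}
        new_block = {}
        for s in states:
            # Signature: the state's own block plus the blocks it reaches.
            sig = (block[s],) + tuple(block[delta[s, a]] for a in alphabet)
            new_block[s] = ids.setdefault(sig, len(ids))
        # Blocks can only split, so an unchanged count means stability.
        if len(ids) == len(set(block.values())):
            return new_block
        block = new_block


# States 1 and 2 behave identically and end up in the same block.
delta = {
    (0, "a"): 1, (0, "b"): 2,
    (1, "a"): 1, (1, "b"): 2,
    (2, "a"): 2, (2, "b"): 1,
}
blocks = minimize_dfa([0, 1, 2], ["a", "b"], delta, accepting={1, 2})
```

Each state's signature depends only on the previous partition, so the inner loop has no dependencies between states. That independence is what makes refinement-style approaches natural candidates for data-parallel hardware, although the sketch itself runs sequentially.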

Multimodal · Language modeling · Dataset · MoDELS · WEB

October 30, 2024

Constructing Multimodal Datasets from Scratch for Rapid Development of a Japanese Visual Language Model

Keito Sasagawa,Koki Maeda,Issa Sugiura,Shuhei Kurita,Naoaki Okazaki,Daisuke Kawahara

from arxiv, 15 pages, 7 figures

To develop high-performing Visual Language Models (VLMs), it is essential to prepare multimodal resources, such as image-text pairs, interleaved data, and instruction data. While multimodal resources for English are abundant, there is a significant lack of corresponding resources for non-English languages, such as Japanese. To address this problem, we take Japanese as a non-English language and propose a method for rapidly creating Japanese multimodal datasets from scratch. We collect Japanese image-text pairs and interleaved data from web archives and generate Japanese instruction data directly from images using an existing VLM. Our experimental results show that a VLM trained on these native datasets outperforms those relying on machine-translated content.

APT · Cascade · Identifiable · Normalized · Pattern Recognition

October 29, 2024

A Cascade Approach for APT Campaign Attribution in System Event Logs: Technique Hunting and Subgraph Matching

Yi-Ting Huang,Ying-Ren Guo,Guo-Wei Wong,Meng Chang Chen

As Advanced Persistent Threats (APTs) grow increasingly sophisticated, the demand for effective detection methods has intensified. This study addresses the challenge of identifying APT campaign attacks through system event logs. A cascading approach, named SFM, combines Technique hunting and APT campaign attribution. Our approach assumes that real-world system event logs contain a vast majority of normal events interspersed with a few suspicious, potentially malicious ones, and that these logs are annotated with Techniques from the MITRE ATT&CK framework for attack pattern recognition. Then, we attribute APT campaign attacks by aligning detected Techniques with known attack sequences to determine the most likely APT campaign. Evaluations on five real-world APT campaigns indicate that the proposed approach demonstrates reliable performance.

Design · TOOLS · CASE · Agent · INTERACT

October 29, 2024

Enabling Generative Design Tools with LLM Agents for Mechanical Computation Devices: A Case Study

Qiuyu Lu,Jiawei Fang,Zhihao Yao,Yue Yang,Shiqing Lyu,Haipeng Mi,Lining Yao

from arxiv, 38 pages, 12 figures

In the field of Human-Computer Interaction (HCI), interactive devices with embedded mechanical computation are gaining attention. The rise of these cutting-edge devices has created a need for specialized design tools that democratize the prototyping process. While current tools streamline prototyping through parametric design and simulation, they often come with a steep learning curve and may not fully support creative ideation. In this study, we use fluidic computation interfaces as a case study to explore how design tools for such devices can be augmented by Large Language Model agents (LLMs). Integrated with LLMs, the Generative Design Tool (GDT) better understands the capabilities and limitations of new technologies, proposes diverse and practical applications, and suggests designs that are technically and contextually appropriate. Additionally, it generates design parameters for visualizing results and producing fabrication-ready support files. This paper details the GDT's framework, implementation, and performance while addressing its potential and challenges.

MoDELS · Performer · Processing (programming language) · Learning · Robustness

September 3, 2021

Learning Neural Models for Natural Language Processing in the Face of Distributional Shift

Paul Michel

from arxiv, PhD thesis

The dominant NLP paradigm of training a strong neural predictor to perform one task on a specific dataset has led to state-of-the-art performance in a variety of applications (e.g., sentiment classification, span-prediction-based question answering, or machine translation). However, it builds upon the assumption that the data distribution is stationary, i.e., that the data is sampled from a fixed distribution both at training and test time. This way of training is inconsistent with how we as humans are able to learn from and operate within a constantly changing stream of information. Moreover, it is ill-adapted to real-world use cases where the data distribution is expected to shift over the course of a model's lifetime. The first goal of this thesis is to characterize the different forms this shift can take in the context of natural language processing, and to propose benchmarks and evaluation metrics to measure its effect on current deep learning architectures. We then proceed to take steps to mitigate the effect of distributional shift on NLP models. To this end, we develop methods based on parametric reformulations of the distributionally robust optimization framework. Empirically, we show that these approaches yield more robust models on a selection of realistic problems. In the third and final part of this thesis, we explore ways of efficiently adapting existing models to new domains or tasks. Our contribution to this topic takes inspiration from information geometry to derive a new gradient update rule which alleviates catastrophic forgetting issues during adaptation.

Networking · Dataset · Transfer learning · Learning · Reducible

May 10, 2018

Deep Representation Learning for Domain Adaptation of Semantic Image Segmentation

Assia Benbihi,Matthieu Geist,Cédric Pradalier

Deep Convolutional Neural Networks have pushed the state of the art in semantic segmentation, provided that a large number of images together with pixel-wise annotations are available. Data collection is expensive, and one way to alleviate this cost is to use transfer learning. This reduces the amount of annotated data required for network training, but it does not eliminate this heavy processing step. We propose a method of transfer learning without annotations on the target task for datasets with redundant content and distinct pixel distributions. Our method takes advantage of the approximate content alignment of the images between two datasets when the approximation error prevents the reuse of annotations from one dataset to another. Given the annotations for only one dataset, we train a first network in a supervised manner. This network autonomously learns to generate deep data representations relevant to semantic segmentation. Then, for the images in the new dataset, we train a new network to generate a deep data representation that matches the one from the first network on the previous dataset. The training consists of a regression between feature maps and does not require any annotations on the new dataset. We show that this method reaches performance similar to classic transfer learning on the PASCAL VOC dataset with synthetic transformations.

Visual question answering · Question answering · MoDELS · Identifiable · Attention mechanism

February 15, 2018

Learning to Count Objects in Natural Images for Visual Question Answering

Yan Zhang,Jonathon Hare,Adam Prügel-Bennett

from arxiv, Published in ICLR 2018

Visual Question Answering (VQA) models have struggled with counting objects in natural images so far. We identify a fundamental problem due to soft attention in these models as a cause. To circumvent this problem, we propose a neural network component that allows robust counting from object proposals. Experiments on a toy task show the effectiveness of this component and we obtain state-of-the-art accuracy on the number category of the VQA v2 dataset without negatively affecting other categories, even outperforming ensemble models with our single model. On a difficult balanced pair metric, the component gives a substantial improvement in counting over a strong baseline by 6.6%.