
The Colored Bin Packing Problem (CBPP) is a generalization of the Bin Packing Problem (BPP). The CBPP consists of packing a set of items, each with a weight and a color, into bins of limited capacity, minimizing the number of bins used while satisfying the constraint that two items of the same color cannot be packed side by side in the same bin. In this article, we propose adaptations of BPP heuristics and new heuristics for the CBPP. Moreover, we propose a set of fast neighborhood search algorithms for the CBPP. These neighborhoods are applied in a meta-heuristic approach based on Variable Neighborhood Search (VNS) and in a matheuristic approach that combines linear programming with the meta-heuristics VNS and Greedy Randomized Adaptive Search Procedure (GRASP). The results indicate that our matheuristic is superior to VNS and that both approaches can find near-optimal solutions for a large number of instances, even for those with many items.
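To make the color constraint concrete, here is a minimal sketch of how a classic BPP heuristic (first fit) might be adapted to the CBPP. It is not the authors' heuristic: the function name `first_fit_colored` is hypothetical, and the sketch assumes each bin is an ordered sequence to which items are only appended at the end, so the color check is against the bin's last item.

```python
from typing import List, Tuple

Item = Tuple[int, str]  # (weight, color)


def first_fit_colored(items: List[Item], capacity: int) -> List[List[Item]]:
    """Append each item to the first bin where it fits and where the
    last packed item has a different color; otherwise open a new bin."""
    bins: List[List[Item]] = []
    loads: List[int] = []
    for weight, color in items:
        if weight > capacity:
            raise ValueError(f"item of weight {weight} exceeds capacity {capacity}")
        for i, contents in enumerate(bins):
            fits = loads[i] + weight <= capacity
            color_ok = contents[-1][1] != color  # no same-color neighbors
            if fits and color_ok:
                contents.append((weight, color))
                loads[i] += weight
                break
        else:
            # No existing bin accepted the item: open a new one.
            bins.append([(weight, color)])
            loads.append(weight)
    return bins


example = [(4, "red"), (3, "red"), (3, "blue"), (2, "red")]
print(first_fit_colored(example, capacity=10))
# [[(4, 'red'), (3, 'blue'), (2, 'red')], [(3, 'red')]]
```

Appending only at the end keeps the check cheap, but it can waste bins that a reordering inside a bin would save, which is the kind of gap a neighborhood search could then try to close.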

Related content

Packing


Language modeling · Large language models · MoDELS · Understandability · Sensors

August 19, 2024

Hybrid Reasoning Based on Large Language Models for Autonomous Car Driving

Mehdi Azarafza,Mojtaba Nayyeri,Charles Steinmetz,Steffen Staab,Achim Rettberg

from arxiv, 12 pages, 5 figures

Large Language Models (LLMs) have garnered significant attention for their ability to understand text and images, generate human-like text, and perform complex reasoning tasks. However, their ability to generalize this advanced reasoning with a combination of natural language text for decision-making in dynamic situations requires further exploration. In this study, we investigate how well LLMs can adapt and apply a combination of arithmetic and common-sense reasoning, particularly in autonomous driving scenarios. We hypothesize that LLMs' hybrid reasoning abilities can improve autonomous driving by enabling them to analyze detected objects and sensor data, understand driving regulations and physical laws, and offer additional context. This addresses complex scenarios, such as decisions in low visibility (due to weather conditions), where traditional methods might fall short. We evaluated LLMs on accuracy by comparing their answers with human-generated ground truth inside CARLA. The results showed that when a combination of images (detected objects) and sensor data is fed into the LLM, it can offer precise information for brake and throttle control in autonomous vehicles across various weather conditions. This formulation and the resulting answers can assist in decision-making for auto-pilot systems.

Sparse · Performer · Estimation/Estimator · INFORMS · Pair

August 19, 2024

Sparse Global Matching for Video Frame Interpolation with Large Motion

Chunxu Liu,Guozhen Zhang,Rui Zhao,Limin Wang

from arxiv, Accepted by CVPR 2024. Project page: //sgm-vfi.github.io/

Large motion poses a critical challenge in the Video Frame Interpolation (VFI) task. Existing methods are often constrained by limited receptive fields, resulting in sub-optimal performance when handling scenarios with large motion. In this paper, we introduce a new pipeline for VFI, which can effectively integrate global-level information to alleviate issues associated with large motion. Specifically, we first estimate a pair of initial intermediate flows using a high-resolution feature map for extracting local details. Then, we incorporate a sparse global matching branch to compensate for flow estimation, which consists of identifying flaws in initial flows and generating sparse flow compensation with a global receptive field. Finally, we adaptively merge the initial flow estimation with global flow compensation, yielding a more accurate intermediate flow. To evaluate the effectiveness of our method in handling large motion, we carefully curate a more challenging subset from commonly used benchmarks. Our method demonstrates state-of-the-art performance on these VFI subsets with large motion.

Pyramid · MoDELS · binary · INFORMS · Bag of words

August 17, 2024

A Study of PHOC Spatial Region Configurations for Math Formula Retrieval

Matt Langsenkamp,Bryan Amador,Richard Zanibbi

A Pyramidal Histogram Of Characters (PHOC) represents the spatial location of symbols as binary vectors. The vectors are composed of levels that split a formula into equal-sized regions of one or more types (e.g., rectangles or ellipses). For each region type, this produces a pyramid of overlapping regions, where the first level contains the entire formula, and the final level the finest-grained regions. In this work, we introduce concentric rectangles for regions, and analyze whether subsequent PHOC levels encode redundant information by omitting levels from PHOC configurations. As a baseline, we include a bag of words PHOC containing only the first whole-formula level. Finally, using the ARQMath-3 formula retrieval benchmark, we demonstrate that some levels encoded in the original PHOC configurations are redundant, that PHOC models with rectangular regions outperform earlier PHOC models, and that despite their simplicity, PHOC models are surprisingly competitive with the state-of-the-art. PHOC is not math-specific, and might be used for chemical diagrams, charts, or other graphics.
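The pyramid of binary vectors can be illustrated with a deliberately simplified sketch. It assumes symbols are reduced to a normalized horizontal center in [0, 1) and that level k splits the formula into k equal vertical strips; the name `phoc_vector` and this strip-only, point-based layout are illustrative assumptions, not the region types studied in the paper.

```python
from typing import List, Sequence, Tuple

Symbol = Tuple[str, float]  # (symbol, x-center normalized to [0, 1))


def phoc_vector(symbols: Sequence[Symbol], vocab: Sequence[str], levels: int) -> List[int]:
    """Concatenate, for every region of every level, one bit per
    vocabulary entry that is 1 when the symbol occurs in that region."""
    vector: List[int] = []
    for level in range(1, levels + 1):
        for region in range(level):
            lo, hi = region / level, (region + 1) / level
            present = {sym for sym, x in symbols if lo <= x < hi}
            vector.extend(1 if v in present else 0 for v in vocab)
    return vector


formula = [("x", 0.1), ("+", 0.5), ("y", 0.9)]
print(phoc_vector(formula, vocab=["x", "y", "+"], levels=2))
# [1, 1, 1, 1, 0, 0, 0, 1, 1]
```

With `levels=1` the result is the whole-formula level only, which corresponds to the bag-of-words baseline mentioned above.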

Pair · MoDELS · Identifiable · CASES · INFORMS

August 14, 2024

Only One Relation Possible? Modeling the Ambiguity in Event Temporal Relation Extraction

Yutong Hu,Quzhe Huang,Yansong Feng

Event Temporal Relation Extraction (ETRE) aims to identify the temporal relationship between two events, which plays an important role in natural language understanding. Most previous works follow a single-label classification style, classifying an event pair into either a specific temporal relation (e.g., Before, After), or a special label Vague when there may be multiple possible temporal relations between the pair. In our work, instead of directly making predictions on Vague, we propose a multi-label classification solution for ETRE (METRE) to infer the possibility of each temporal relation independently, where we treat Vague as the case in which there is more than one possible relation between two events. We design a speculation mechanism to explore the possible relations hidden behind Vague, which enables the latent information to be used efficiently. Experiments on TB-Dense, MATRES and UDS-T show that our method can effectively utilize the Vague instances to improve the recognition of specific temporal relations and outperforms most state-of-the-art methods.

Projection · Notability · Identifiable · Statistic · Weight

August 13, 2024

The Complexities of Differential Privacy for Survey Data

Jörg Drechsler,James Bailie

from arxiv, 18 pages, 2 figures

The concept of differential privacy (DP) has gained substantial attention in recent years, most notably since the U.S. Census Bureau announced the adoption of the concept for its 2020 Decennial Census. However, despite its attractive theoretical properties, implementing DP in practice remains challenging, especially when it comes to survey data. In this paper we present some results from an ongoing project funded by the U.S. Census Bureau that is exploring the possibilities and limitations of DP for survey data. Specifically, we identify five aspects that need to be considered when adopting DP in the survey context: the multi-staged nature of data production; the limited privacy amplification from complex sampling designs; the implications of survey-weighted estimates; the weighting adjustments for nonresponse and other data deficiencies; and the imputation of missing values. We summarize the project's key findings with respect to each of these aspects and also discuss some of the challenges that still need to be addressed before DP could become the new data protection standard at statistical agencies.

Convolution · Networking · Dilated convolution · Neural Networks · Learning

August 10, 2024

Dilated Convolution with Learnable Spacings

Ismail Khalfaoui-Hassani

from arxiv, PhD Thesis

This thesis presents and evaluates the Dilated Convolution with Learnable Spacings (DCLS) method. Through various supervised learning experiments in the fields of computer vision, audio, and speech processing, the DCLS method proves to outperform both standard and advanced convolution techniques. The research is organized into several steps, starting with an analysis of the literature and existing convolution techniques that preceded the development of the DCLS method. We were particularly interested in the methods that are closely related to our own and that remain essential to capture the nuances and uniqueness of our approach. The cornerstone of our study is the introduction and application of the DCLS method to convolutional neural networks (CNNs), as well as to hybrid architectures that rely on both convolutional and visual attention approaches. DCLS is shown to be particularly effective in tasks such as classification, semantic segmentation, and object detection. Initially using bilinear interpolation, the study also explores other interpolation methods, finding that Gaussian interpolation slightly improves performance. The DCLS method is further applied to spiking neural networks (SNNs) to enable synaptic delay learning within a neural network that could eventually be transferred to so-called neuromorphic chips. The results show that the DCLS method stands out as a new state-of-the-art technique in SNN audio classification for certain benchmark tasks in this field. These tasks involve datasets with a high temporal component. In addition, we show that DCLS can significantly improve the accuracy of artificial neural networks for the multi-label audio classification task. We conclude with a discussion of the chosen experimental setup, its limitations, the limitations of our method, and our results.
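The interpolation idea can be sketched in one dimension: each kernel weight sits at a real-valued position, and linear interpolation spreads it over the two nearest integer taps, which is what would let the position receive gradients in a framework with automatic differentiation. The function name `build_dense_kernel`, the 1D setting, and the clamping of positions are illustrative assumptions, not the thesis's implementation.

```python
import math
from typing import List, Sequence


def build_dense_kernel(weights: Sequence[float], positions: Sequence[float], size: int) -> List[float]:
    """Scatter each weight onto a dense kernel of length `size`,
    splitting it linearly between the two integer taps around its position."""
    kernel = [0.0] * size
    for w, p in zip(weights, positions):
        p = min(max(p, 0.0), float(size - 1))  # keep the position inside the kernel
        left = int(math.floor(p))
        frac = p - left
        kernel[left] += w * (1.0 - frac)
        if left + 1 < size:
            kernel[left + 1] += w * frac
    return kernel


print(build_dense_kernel([2.0, 4.0], [0.0, 2.5], size=4))
# [2.0, 0.0, 2.0, 2.0]
```

The dense kernel keeps the same total weight as the sparse one, so it can be passed to an ordinary convolution while only the few weights and positions are treated as parameters.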

Paper · Analysis · Lecture notes · Data availability · Basis

August 7, 2024

The State of Reproducibility Stamps for Visualization Research Papers

Tobias Isenberg

from arxiv, 9 pages plus appendix; 12 figures plus 14 figures in the appendix

I analyze the evolution of papers certified by the Graphics Replicability Stamp Initiative (GRSI) to be reproducible, with a specific focus on the subset of publications that address visualization-related topics. With this analysis I show that, while the number of papers is increasing overall and within the visualization field, we still have to improve quite a bit to escape the replication crisis. I base my analysis on the data published by the GRSI as well as publication data for the different venues in visualization and lists of journal papers that have been presented at visualization-focused conferences. I also analyze the differences between the involved journals as well as the percentage of reproducible papers in the different presentation venues. Furthermore, I look at the authors of the publications and, in particular, their affiliation countries to see where most reproducible papers come from. Finally, I discuss potential reasons for the low reproducibility numbers and suggest possible ways to overcome these obstacles. This paper is reproducible itself, with source code and data available from github.com/tobiasisenberg/Visualization-Reproducibility as well as a free paper copy and all supplemental materials at osf.io/mvnbj.

Reducible · 3D · Storage · Performance · FAST

August 7, 2024

Compact 3D Gaussian Splatting for Static and Dynamic Radiance Fields

Joo Chan Lee,Daniel Rho,Xiangyu Sun,Jong Hwan Ko,Eunbyung Park

from arxiv, Project page: //maincold2.github.io/c3dgs/

3D Gaussian splatting (3DGS) has recently emerged as an alternative representation that leverages a 3D Gaussian-based representation and introduces an approximated volumetric rendering, achieving very fast rendering speed and promising image quality. Furthermore, subsequent studies have successfully extended 3DGS to dynamic 3D scenes, demonstrating its wide range of applications. However, a significant drawback arises as 3DGS and its following methods entail a substantial number of Gaussians to maintain the high fidelity of the rendered images, which requires a large amount of memory and storage. To address this critical issue, we place a specific emphasis on two key objectives: reducing the number of Gaussian points without sacrificing performance and compressing the Gaussian attributes, such as view-dependent color and covariance. To this end, we propose a learnable mask strategy that significantly reduces the number of Gaussians while preserving high performance. In addition, we propose a compact but effective representation of view-dependent color by employing a grid-based neural field rather than relying on spherical harmonics. Finally, we learn codebooks to compactly represent the geometric and temporal attributes by residual vector quantization. With model compression techniques such as quantization and entropy coding, we consistently show over 25x reduced storage and enhanced rendering speed compared to 3DGS for static scenes, while maintaining the quality of the scene representation. For dynamic scenes, our approach achieves more than 12x storage efficiency and retains a high-quality reconstruction compared to the existing state-of-the-art methods. Our work provides a comprehensive framework for 3D scene representation, achieving high performance, fast training, compactness, and real-time rendering. Our project page is available at //maincold2.github.io/c3dgs/.

Tokenizer · MoDELS · Optimizer · Inference · Performer

August 5, 2024

Solving Token Gradient Conflict in Mixture-of-Experts for Large Vision-Language Model

Longrong Yang,Dong Shen,Chaoxiang Cai,Fan Yang,Size Li,Di Zhang,Xi Li

The Mixture-of-Experts (MoE) has gained increasing attention in studying Large Vision-Language Models (LVLMs). It uses a sparse model to replace the dense model, achieving comparable performance while activating fewer parameters during inference, thus significantly reducing the inference cost. Existing MoE methods in LVLMs encourage different experts to handle different tokens, and they usually employ a router to predict the routing of each token. However, the predictions are based solely on sample features and do not truly reveal the optimization directions of tokens. This may lead to severe optimization interference between different tokens assigned to an expert. To address this problem, this paper proposes a novel method based on token-level gradient analysis, i.e., Solving Token Gradient Conflict (STGC). Specifically, we first use token-level gradients to identify conflicting tokens in experts. After that, we add a specialized loss tailored to eliminate conflicts among tokens within each expert. Our method can serve as a plug-in for diverse Large Vision-Language Models, and extensive experimental results demonstrate its effectiveness. The code will be publicly available at //github.com/longrongyang/STGC.
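One way to picture the identification step is the sketch below. It assumes, purely for illustration, that a token counts as conflicting when the cosine similarity between its gradient and the mean gradient of all tokens routed to the same expert falls below a threshold; the abstract does not state the exact criterion, and the names `cosine` and `conflicting_tokens` are hypothetical.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def conflicting_tokens(grads: Sequence[Sequence[float]], threshold: float = 0.0) -> List[int]:
    """Return indices of tokens whose gradient points away from the
    expert's mean token gradient (cosine similarity below `threshold`)."""
    if not grads:
        return []
    dim = len(grads[0])
    mean = [sum(g[d] for g in grads) / len(grads) for d in range(dim)]
    return [i for i, g in enumerate(grads) if cosine(g, mean) < threshold]


# Toy gradients of three tokens assigned to one expert.
expert_grads = [[1.0, 0.0], [1.0, 0.2], [-1.0, 0.1]]
print(conflicting_tokens(expert_grads))
# [2]
```

The flagged indices are the tokens a conflict-reducing loss would then target, for example by discouraging the router from sending them to this expert.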

Visual question answering · Question answering · MoDELS · Identifiable · Attention mechanism

February 15, 2018

Learning to Count Objects in Natural Images for Visual Question Answering

Yan Zhang,Jonathon Hare,Adam Prügel-Bennett

from arxiv, Published in ICLR 2018

Visual Question Answering (VQA) models have so far struggled with counting objects in natural images. We identify a fundamental problem caused by soft attention in these models. To circumvent this problem, we propose a neural network component that allows robust counting from object proposals. Experiments on a toy task show the effectiveness of this component, and we obtain state-of-the-art accuracy on the number category of the VQA v2 dataset without negatively affecting other categories, even outperforming ensemble models with our single model. On a difficult balanced pair metric, the component improves counting over a strong baseline by a substantial 6.6%.