亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

We consider the problem of evaluating arbitrary multivariate polynomials over a massive dataset containing multiple inputs, on a distributed computing system with a master node and multiple worker nodes. Generalized Lagrange Coded Computing (GLCC) codes are proposed to simultaneously provide resiliency against stragglers who do not return computation results in time, security against adversarial workers who deliberately modify results for their benefit, and information-theoretic privacy of the dataset amidst possible collusion of workers. GLCC codes are constructed by first partitioning the dataset into multiple groups, then encoding the dataset using carefully designed interpolating polynomials, and sharing multiple encoded data points to each worker, such that interference computation results across groups can be eliminated at the master. Particularly, GLCC codes include the state-of-the-art Lagrange Coded Computing (LCC) codes as a special case, and exhibit a more flexible tradeoff between communication and computation overheads in optimizing system efficiency. Furthermore, we apply GLCC to distributed training of machine learning models, and demonstrate that GLCC codes achieve a speedup of up to $2.5\text{--}3.9\times$ over LCC codes in training time, across experiments for training image classifiers on different datasets, model architectures, and straggler patterns.

相關內容

數據集,又稱為資料集、數據集合或資料集合,是一種由數據所組成的集合。
 Data set(或dataset)是一個數據的集合,通常以表格形式出現。每一列代表一個特定變量。每一行都對應于某一成員的數據集的問題。它列出的價值觀為每一個變量,如身高和體重的一個物體或價值的隨機數。每個數值被稱為數據資料。對應于行數,該數據集的數據可能包括一個或多個成員。

As data volumes expand rapidly, distributed machine learning has become essential for addressing the growing computational demands of modern AI systems. However, training models in distributed environments is challenging with participants hold skew, Non-Independent-Identically distributed (Non-IID) data. Low-Rank Adaptation (LoRA) offers a promising solution to this problem by personalizing low-rank updates rather than optimizing the entire model, LoRA-enabled distributed learning minimizes computational and maximize personalization for each participant. Enabling more robust and efficient training in distributed learning settings, especially in large-scale, heterogeneous systems. Despite the strengths of current state-of-the-art methods, they often require manual configuration of the initial rank, which is increasingly impractical as the number of participants grows. This manual tuning is not only time-consuming but also prone to suboptimal configurations. To address this limitation, we propose AutoRank, an adaptive rank-setting algorithm inspired by the bias-variance trade-off. AutoRank leverages the MCDA method TOPSIS to dynamically assign local ranks based on the complexity of each participant's data. By evaluating data distribution and complexity through our proposed data complexity metrics, AutoRank provides fine-grained adjustments to the rank of each participant's local LoRA model. This adaptive approach effectively mitigates the challenges of double-imbalanced, non-IID data. Experimental results demonstrate that AutoRank significantly reduces computational overhead, enhances model performance, and accelerates convergence in highly heterogeneous federated learning environments. Through its strong adaptability, AutoRank offers a scalable and flexible solution for distributed machine learning.

The acquisition of substantial volumes of 3D articulated object data is expensive and time-consuming, and consequently the scarcity of 3D articulated object data becomes an obstacle for deep learning methods to achieve remarkable performance in various articulated object understanding tasks. Meanwhile, pairing these object data with detailed annotations to enable training for various tasks is also difficult and labor-intensive to achieve. In order to expeditiously gather a significant number of 3D articulated objects with comprehensive and detailed annotations for training, we propose Articulated Object Procedural Generation toolbox, a.k.a. Arti-PG toolbox. Arti-PG toolbox consists of i) descriptions of articulated objects by means of a generalized structure program along with their analytic correspondence to the objects' point cloud, ii) procedural rules about manipulations on the structure program to synthesize large-scale and diverse new articulated objects, and iii) mathematical descriptions of knowledge (e.g. affordance, semantics, etc.) to provide annotations to the synthesized object. Arti-PG has two appealing properties for providing training data for articulated object understanding tasks: i) objects are created with unlimited variations in shape through program-oriented structure manipulation, ii) Arti-PG is widely applicable to diverse tasks by easily providing comprehensive and detailed annotations. Arti-PG now supports the procedural generation of 26 categories of articulate objects and provides annotations across a wide range of both vision and manipulation tasks, and we provide exhaustive experiments which fully demonstrate its advantages. We will make Arti-PG toolbox publicly available for the community to use.

Although quantization for linear layers has been widely used, its application to accelerate the attention process remains limited. To further enhance the efficiency of attention computation compared to SageAttention while maintaining precision, we propose SageAttention2, which utilizes significantly faster 4-bit matrix multiplication (Matmul) alongside additional precision-enhancing techniques. First, we propose to quantize matrixes $(Q, K)$ to INT4 in a hardware-friendly thread-level granularity and quantize matrixes $(\widetilde P, V)$ to FP8. Second, we propose a method to smooth $Q$, enhancing the accuracy of INT4 $QK$. Third, we propose to use an FP32 Matmul buffer for $PV$ to enhance the accuracy of FP8 $\widetilde PV$. The operations per second (OPS) of SageAttention2 surpass FlashAttention2 and xformers by about 3x and 5x on RTX4090, respectively. Comprehensive experiments confirm that our approach incurs negligible end-to-end metrics loss across diverse models, including those for large language processing, image generation, and video generation. The codes are available at //github.com/thu-ml/SageAttention.

Although multiview fusion has demonstrated potential in LiDAR segmentation, its dependence on computationally intensive point-based interactions, arising from the lack of fixed correspondences between views such as range view and Bird's-Eye View (BEV), hinders its practical deployment. This paper challenges the prevailing notion that multiview fusion is essential for achieving high performance. We demonstrate that significant gains can be realized by directly fusing Polar and Cartesian partitioning strategies within the BEV space. Our proposed BEV-only segmentation model leverages the inherent fixed grid correspondences between these partitioning schemes, enabling a fusion process that is orders of magnitude faster (170$\times$ speedup) than conventional point-based methods. Furthermore, our approach facilitates dense feature fusion, preserving richer contextual information compared to sparse point-based alternatives. To enhance scene understanding while maintaining inference efficiency, we also introduce a hybrid Transformer-CNN architecture. Extensive evaluation on the SemanticKITTI and nuScenes datasets provides compelling evidence that our method outperforms previous multiview fusion approaches in terms of both performance and inference speed, highlighting the potential of BEV-based fusion for LiDAR segmentation. Code is available at \url{//github.com/skyshoumeng/PC-BEV.}

In the computer vision and machine learning communities, as well as in many other research domains, rigorous evaluation of any new method, including classifiers, is essential. One key component of the evaluation process is the ability to compare and rank methods. However, ranking classifiers and accurately comparing their performances, especially when taking application-specific preferences into account, remains challenging. For instance, commonly used evaluation tools like Receiver Operating Characteristic (ROC) and Precision/Recall (PR) spaces display performances based on two scores. Hence, they are inherently limited in their ability to compare classifiers across a broader range of scores and lack the capability to establish a clear ranking among classifiers. In this paper, we present a novel versatile tool, named the Tile, that organizes an infinity of ranking scores in a single 2D map for two-class classifiers, including common evaluation scores such as the accuracy, the true positive rate, the positive predictive value, Jaccard's coefficient, and all F-beta scores. Furthermore, we study the properties of the underlying ranking scores, such as the influence of the priors or the correspondences with the ROC space, and depict how to characterize any other score by comparing them to the Tile. Overall, we demonstrate that the Tile is a powerful tool that effectively captures all the rankings in a single visualization and allows interpreting them.

Synthesizing diverse dexterous grasps from uncertain partial observation is an important yet challenging task for physically intelligent embodiments. Previous works on generative grasp synthesis fell short of precisely capturing the complex grasp distribution and reasoning about shape uncertainty in the unstructured and often partially perceived reality. In this work, we introduce a novel model that can generate diverse grasps for a multi-fingered hand while introspectively handling perceptual uncertainty and recognizing unknown object geometry to avoid performance degradation. Specifically, we devise a Deep Latent Variable Model (DLVM) based on Normalizing Flows (NFs), facilitating hierarchical and expressive latent representation for modeling versatile grasps. Our model design counteracts typical pitfalls of its popular alternative in generative grasping, i.e., conditional Variational Autoencoders (cVAEs) whose performance is limited by mode collapse and miss-specified prior issues. Moreover, the resultant feature hierarchy and the exact flow likelihood computation endow our model with shape-aware introspective capabilities, enabling it to quantify the shape uncertainty of partial point clouds and detect objects of novel geometry. We further achieve performance gain by fusing this information with a discriminative grasp evaluator, facilitating a novel hybrid way for grasp evaluation. Comprehensive simulated and real-world experiments show that the proposed idea gains superior performance and higher run-time efficiency against strong baselines, including diffusion models. We also demonstrate substantial benefits of greater diversity for grasping objects in clutter and a confined workspace in the real world.

Multimodal multihop question answering is a complex task that requires reasoning over multiple sources of information, such as images and text, to answer questions. While there has been significant progress in visual question answering, the multihop setting remains unexplored due to the lack of high-quality datasets. Current methods focus on single-hop question answering or a single modality, which makes them unsuitable for real-world scenarios such as analyzing multimodal educational materials, summarizing lengthy academic articles, or interpreting scientific studies that combine charts, images, and text. To address this gap, we propose a novel methodology, introducing the first framework for creating a high-quality dataset that enables training models for multimodal multihop question answering. Our approach consists of a 5-stage pipeline that involves acquiring relevant multimodal documents from Wikipedia, synthetically generating high-level questions and answers, and validating them through rigorous criteria to ensure quality data. We evaluate our methodology by training models on our synthesized dataset and testing on two benchmarks, our results demonstrate that, with an equal sample size, models trained on our synthesized data outperform those trained on human-collected data by 1.9 in exact match (EM) on average. We believe our data synthesis method will serve as a strong foundation for training and evaluating multimodal multihop question answering models.

Multimodality Representation Learning, as a technique of learning to embed information from different modalities and their correlations, has achieved remarkable success on a variety of applications, such as Visual Question Answering (VQA), Natural Language for Visual Reasoning (NLVR), and Vision Language Retrieval (VLR). Among these applications, cross-modal interaction and complementary information from different modalities are crucial for advanced models to perform any multimodal task, e.g., understand, recognize, retrieve, or generate optimally. Researchers have proposed diverse methods to address these tasks. The different variants of transformer-based architectures performed extraordinarily on multiple modalities. This survey presents the comprehensive literature on the evolution and enhancement of deep learning multimodal architectures to deal with textual, visual and audio features for diverse cross-modal and modern multimodal tasks. This study summarizes the (i) recent task-specific deep learning methodologies, (ii) the pretraining types and multimodal pretraining objectives, (iii) from state-of-the-art pretrained multimodal approaches to unifying architectures, and (iv) multimodal task categories and possible future improvements that can be devised for better multimodal learning. Moreover, we prepare a dataset section for new researchers that covers most of the benchmarks for pretraining and finetuning. Finally, major challenges, gaps, and potential research topics are explored. A constantly-updated paperlist related to our survey is maintained at //github.com/marslanm/multimodality-representation-learning.

The existence of representative datasets is a prerequisite of many successful artificial intelligence and machine learning models. However, the subsequent application of these models often involves scenarios that are inadequately represented in the data used for training. The reasons for this are manifold and range from time and cost constraints to ethical considerations. As a consequence, the reliable use of these models, especially in safety-critical applications, is a huge challenge. Leveraging additional, already existing sources of knowledge is key to overcome the limitations of purely data-driven approaches, and eventually to increase the generalization capability of these models. Furthermore, predictions that conform with knowledge are crucial for making trustworthy and safe decisions even in underrepresented scenarios. This work provides an overview of existing techniques and methods in the literature that combine data-based models with existing knowledge. The identified approaches are structured according to the categories integration, extraction and conformity. Special attention is given to applications in the field of autonomous driving.

With the advances of data-driven machine learning research, a wide variety of prediction problems have been tackled. It has become critical to explore how machine learning and specifically deep learning methods can be exploited to analyse healthcare data. A major limitation of existing methods has been the focus on grid-like data; however, the structure of physiological recordings are often irregular and unordered which makes it difficult to conceptualise them as a matrix. As such, graph neural networks have attracted significant attention by exploiting implicit information that resides in a biological system, with interactive nodes connected by edges whose weights can be either temporal associations or anatomical junctions. In this survey, we thoroughly review the different types of graph architectures and their applications in healthcare. We provide an overview of these methods in a systematic manner, organized by their domain of application including functional connectivity, anatomical structure and electrical-based analysis. We also outline the limitations of existing techniques and discuss potential directions for future research.

北京阿比特科技有限公司