
This position paper investigates the integration of Differential Privacy (DP) into the training of Mixture of Experts (MoE) models within the field of natural language processing. As Large Language Models (LLMs) scale to billions of parameters and leverage expansive datasets, they exhibit enhanced linguistic capabilities and emergent abilities. However, this growth raises significant computational and privacy concerns. Our study addresses these issues by exploring the potential of MoE models, known for their computational efficiency, together with DP, the standard for privacy preservation. We present the first known attempt to train MoE models under the constraints of DP, addressing the unique challenges posed by their architecture and the complexities of DP integration. Our initial experiments demonstrate that MoE models can be effectively trained with DP, achieving performance competitive with their non-private counterparts. This initial study aims to provide valuable insights, to ignite further research on privacy-preserving MoE models, and to lay the groundwork for future developments in this evolving field.
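To make the setup concrete, below is a minimal sketch of the standard DP-SGD recipe (per-example gradient clipping plus Gaussian noise) that DP training of an MoE model would build on. The `dp_sgd_step` helper, its parameters, and the size-one microbatch loop are illustrative assumptions, not the paper's implementation; in a real MoE, sparse expert routing makes per-example gradients themselves sparse, which is one of the architectural challenges the abstract alludes to.

```python
import torch

def dp_sgd_step(model, loss_fn, batch, optimizer,
                clip_norm=1.0, noise_multiplier=1.0):
    """One DP-SGD step: clip each example's gradient, then add Gaussian noise."""
    params = [p for p in model.parameters() if p.requires_grad]
    summed = [torch.zeros_like(p) for p in params]

    xs, ys = batch
    for x, y in zip(xs, ys):                       # microbatches of size 1
        loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
        grads = torch.autograd.grad(loss, params)
        norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        scale = torch.clamp(clip_norm / (norm + 1e-6), max=1.0)
        for s, g in zip(summed, grads):
            s.add_(g * scale)                      # clipped per-example grad

    for p, s in zip(params, summed):
        noise = torch.randn_like(s) * noise_multiplier * clip_norm
        p.grad = (s + noise) / len(xs)             # noisy average gradient
    optimizer.step()
```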

Related content

The ACM/IEEE 23rd International Conference on Model Driven Engineering Languages and Systems (MODELS) is the premier conference series for model-driven software and systems engineering, organized with the support of ACM SIGSOFT and IEEE TCSE. Since 1998, MODELS has covered all aspects of modeling, from languages and methods to tools and applications. MODELS participants come from diverse backgrounds, including researchers, academics, engineers, and industry professionals. MODELS 2019 is a forum where participants can exchange cutting-edge research results and innovative practical experience around modeling and model-driven software and systems. This year's edition will provide the modeling community with further opportunities to advance the foundations of modeling and to propose innovative applications of modeling in emerging areas such as cyber-physical systems, embedded systems, socio-technical systems, cloud computing, big data, machine learning, security, open source, and sustainability.
March 25, 2024

Decentralized Finance (DeFi) has emerged as a contemporary alternative that both competes with and complements traditional centralized finance systems. As of 23rd January 2024, according to DefiLlama, the total value locked in DeFi applications across all blockchains is approximately USD 55 billion. A Byzantine Fault Tolerant (BFT) State Machine Replication (SMR) protocol, popularly known as the consensus protocol, is the central component of a blockchain. If forks are possible in a consensus protocol, they can be misused to carry out double-spending attacks, which can be catastrophic given the high volumes of finance transacted on blockchains. Formal verification of the safety of consensus protocols is the gold standard for guaranteeing that forks are not possible. However, it is considered complex and challenging to do; this is reflected by the fact that few complex consensus protocols have been formally verified, Tendermint and QBFT being notable exceptions. We focus on Supra's Pipelined Moonshot consensus protocol. Similar to Tendermint's formal verification, we model Pipelined Moonshot using IVy and formally prove that, for all network sizes, as long as fewer than one third of the validators are Byzantine, the protocol does not allow forks, thus proving that Pipelined Moonshot is safe and double spending cannot be done using forks. The IVy model and proof of safety are available on GitHub.
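The safety claim rests on the classical quorum-intersection argument: with n = 3f + 1 validators and quorums of size 2f + 1, any two quorums share at least f + 1 validators, so at least one honest validator would have to vote for both of two conflicting decisions. The short Python check below illustrates that counting argument; it is a sketch of the underlying arithmetic, not the IVy proof itself.

```python
def min_quorum_overlap(n: int, f: int) -> int:
    """Pigeonhole lower bound on the overlap of two quorums of size 2f+1."""
    q = 2 * f + 1
    return max(0, 2 * q - n)

# Classical BFT setting: n = 3f + 1 validators, at most f of them Byzantine.
for f in range(1, 6):
    n = 3 * f + 1
    overlap = min_quorum_overlap(n, f)
    honest = overlap - f            # worst case: every Byzantine node overlaps
    assert honest >= 1              # an honest validator witnesses both votes
    print(f"n={n:2d} f={f} quorum={2*f+1} overlap>={overlap} honest>={honest}")
```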

Explainable Artificial Intelligence (XAI) strategies play a crucial part in increasing the understanding and trustworthiness of neural networks. Nonetheless, these techniques can generate misleading explanations. Blinding attacks can drastically alter a machine learning model's prediction and explanation by adding visually unnoticeable artifacts to the input, providing misleading information while maintaining the model's accuracy. This poses a serious challenge to ensuring the reliability of XAI methods. To address it, we leverage statistical analysis to highlight the changes in the weights of a CNN following blinding attacks. We introduce a method specifically designed to limit the effectiveness of such attacks during the evaluation phase, avoiding the need for extra training. The proposed method defends against most modern explanation-aware adversarial attacks, achieving an approximate 99% decrease in the Attack Success Rate (ASR) and a roughly 91% reduction in the Mean Squared Error (MSE) between the original explanation and the defended (post-attack) explanation across three distinct types of attacks.
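As a rough illustration of what a statistical weight analysis could look like, the sketch below flags layers whose weight distribution shifts significantly after an attack, using a two-sample Kolmogorov-Smirnov test. The helper name, the choice of test, and the significance threshold are all assumptions made for the example; the abstract does not specify the paper's actual procedure.

```python
from scipy.stats import ks_2samp

def flag_shifted_layers(weights_clean, weights_attacked, alpha=0.01):
    """Flag layers whose weight distribution changes after a blinding attack.

    Both arguments map layer names to weight arrays; a two-sample
    Kolmogorov-Smirnov test detects significant distribution shifts.
    """
    flagged = {}
    for name, w_clean in weights_clean.items():
        stat, p_value = ks_2samp(w_clean.ravel(),
                                 weights_attacked[name].ravel())
        if p_value < alpha:          # distribution shifted significantly
            flagged[name] = (stat, p_value)
    return flagged
```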

This study delves into the potential use of Large Language Models (LLMs) for generating Library of Congress Subject Headings (LCSH). The authors employed ChatGPT to generate subject headings for electronic theses and dissertations (ETDs) based on their titles and summaries. The results revealed that although some generated subject headings were valid, there were issues regarding specificity and exhaustiveness. The study showcases that LLMs can serve as a strategic response to the backlog of items awaiting cataloging in academic libraries, while also offering a cost-effective approach for promptly generating LCSH. Nonetheless, human catalogers remain essential for verifying and enhancing the validity, exhaustiveness, and specificity of LCSH generated by LLMs.
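For illustration, a minimal sketch of how one might prompt an LLM for LCSH suggestions through the OpenAI Python client is shown below. The prompt wording, model choice, and `suggest_lcsh` helper are assumptions for the example; the study's actual prompts and settings are not reproduced here, and, as the authors note, a human cataloger should review whatever the model returns.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def suggest_lcsh(title: str, summary: str, model: str = "gpt-4") -> str:
    """Ask an LLM to propose Library of Congress Subject Headings for an ETD."""
    prompt = (
        "Suggest up to five Library of Congress Subject Headings (LCSH) "
        "for the following electronic thesis or dissertation.\n"
        f"Title: {title}\nSummary: {summary}\n"
        "Return one heading per line, in standard LCSH form."
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```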

Stress models are a promising approach for graph drawing. They minimize the weighted sum of the squared errors between the Euclidean and desired distances for each node pair. The desired distances are typically the graph-theoretic distances obtained from the all-pairs shortest path problem. In a minimized stress function, the obtained coordinates are affected by the non-Euclidean property and the high dimensionality of the graph-theoretic distance matrix. Therefore, the graph-theoretic distances used in stress models are not necessarily the best metric for determining node coordinates. In this study, we propose two methods of adjusting the graph-theoretic distance matrix into a distance matrix suitable for graph drawing while preserving its structure. The first method applies eigenvalue decomposition to the inner-product matrix obtained from the distance matrix and derives a new distance matrix by setting eigenvalues with small absolute values to zero. The second modifies the stress model by adding a term that minimizes the Frobenius norm between the adjusted and original distance matrices. We perform computational experiments on several benchmark graphs to demonstrate that the proposed methods improve quality metrics, including node resolution and the Gabriel graph property, compared to conventional stress models.
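The first method follows the classical multidimensional-scaling construction: double-center the squared distance matrix to obtain the inner-product (Gram) matrix, truncate its eigenvalues, and convert back to distances. Below is a minimal NumPy sketch under that reading; the `keep` parameter and the rule for choosing which eigenvalues to zero are assumptions for illustration.

```python
import numpy as np

def truncate_distance_matrix(D, keep=10):
    """Adjust a graph-theoretic distance matrix by eigenvalue truncation.

    Double-center the squared distances into the inner-product (Gram)
    matrix, zero the eigenvalues of small absolute value, and convert
    the truncated Gram matrix back into a distance matrix.
    """
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                  # Gram matrix
    vals, vecs = np.linalg.eigh(B)
    order = np.argsort(-np.abs(vals))            # sort by |eigenvalue|
    vals[order[keep:]] = 0.0                     # drop the small ones
    B_trunc = (vecs * vals) @ vecs.T
    diag = np.diag(B_trunc)
    D_sq = diag[:, None] + diag[None, :] - 2.0 * B_trunc
    return np.sqrt(np.maximum(D_sq, 0.0))
```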

Amidst the rapid evolution of LLMs, evaluation is increasingly paramount for understanding and advancing these models. Evaluations have revealed that factors such as scaling, training type, and architecture profoundly impact the performance of LLMs. However, the extent and nature of these impacts remain subjects of debate, because most assessments have been restricted to a limited number of models and data points. The effects of these factors on performance scores can be clarified more effectively through a statistical lens. Our study embarks on a thorough re-examination of these LLMs, targeting the inadequacies in current evaluation methods. Leveraging a uniform evaluation framework and an expansive dataset of evaluation results, our research introduces a comprehensive statistical methodology, including ANOVA, Tukey HSD tests, GAMMs, and clustering techniques, offering a robust and transparent approach to deciphering LLM performance data. Contrary to prevailing findings, our results challenge assumptions about emergent abilities and about the influence of training type and architecture in LLMs. These findings furnish new perspectives on the characteristics, intrinsic nature, and developmental trajectories of LLMs. By providing straightforward and reliable methods to scrutinize and reassess LLM performance data, this study contributes a nuanced perspective on LLM efficiency and potential.
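As a small illustration of the first two steps of such a pipeline, the snippet below runs a one-way ANOVA and a Tukey HSD post-hoc test over synthetic benchmark scores grouped by architecture. The data, group labels, and score scale are invented for the example and have no relation to the study's results.

```python
import numpy as np
import pandas as pd
from scipy.stats import f_oneway
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Synthetic evaluation results: one benchmark score per model run.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "architecture": ["dense", "moe", "hybrid"] * 10,
    "score": rng.normal(60, 5, 30),
})

# One-way ANOVA: does architecture explain variance in the scores?
groups = [g["score"].to_numpy() for _, g in df.groupby("architecture")]
print(f_oneway(*groups))

# Tukey HSD: which pairs of architectures differ significantly?
print(pairwise_tukeyhsd(df["score"], df["architecture"]))
```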

The present work aims to describe the hysteresis behaviour arising from cyclic bending experiments on cables by means of the Preisach operator. Pure bending experiments conducted in previous work show that slender structures such as electric cables behave inelastically and that open hysteresis loops arise, with a noticeable difference between the first load cycle and the following ones. The Preisach operator plays an important role in describing the input-output relation of hysteresis behaviours, and it can be expressed as a superposition of relay operators. Here, we use data collected from pure bending experiments as a first approach. We introduce a mathematical formulation of the problem and, starting from the curvature of the cable specimen, recursively define the Preisach plane for this specific case. From this, we derive a suitable kernel function such that its integration over the Preisach plane yields the bending moment of the specimen.
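A discretized Preisach operator is straightforward to sketch: pick threshold pairs (alpha, beta) with alpha >= beta on the Preisach plane, evolve an elementary relay for each pair, and take a weighted sum of the relay outputs. In the sketch below the weights stand in for the kernel function; in the paper that kernel is what is identified from the bending data, so the values here are purely illustrative.

```python
import numpy as np

def relay(u, alpha, beta, state):
    """Elementary relay: switches up at alpha, down at beta, else holds."""
    if u >= alpha:
        return 1
    if u <= beta:
        return -1
    return state

def preisach(signal, alphas, betas, weights):
    """Discretized Preisach operator: weighted superposition of relays."""
    states = [-1] * len(alphas)              # all relays start switched down
    outputs = []
    for u in signal:
        states = [relay(u, a, b, s)
                  for a, b, s in zip(alphas, betas, states)]
        outputs.append(float(np.dot(weights, states)))
    return outputs

# Illustrative thresholds on the half-plane alpha >= beta.
alphas, betas = [0.2, 0.5, 0.8], [-0.2, -0.5, -0.8]
curvature = np.sin(np.linspace(0, 4 * np.pi, 200))   # cyclic input
moment = preisach(curvature, alphas, betas, weights=[0.5, 0.3, 0.2])
```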

Large Language Models (LLMs) have shown excellent generalization capabilities that have led to the development of numerous models. These models propose new architectures or tweak existing ones, refine training strategies, increase context length, use higher-quality training data, and increase training time to outperform baselines. Analyzing new developments is crucial for identifying changes that enhance training stability and improve generalization in LLMs. This survey comprehensively analyzes LLM architectures and their categorization, training strategies, training datasets, and performance evaluations, and discusses future research directions. Moreover, the paper discusses the basic building blocks and concepts behind LLMs, followed by a complete overview of LLMs, including their important features and functions. Finally, the paper summarizes significant findings from LLM research and consolidates essential architectural and training strategies for developing advanced LLMs. Given the continuous advancements in LLMs, we intend to regularly update this paper by incorporating new sections and featuring the latest LLM models.

Recent developments in image classification and natural language processing, coupled with the rapid growth in social media usage, have enabled fundamental advances in detecting breaking events around the world in real time. Emergency response is one area that stands to gain from these advances. By processing billions of texts and images a minute, events can be automatically detected, enabling emergency response workers to better assess rapidly evolving situations and deploy resources accordingly. To date, most event detection techniques in this area have focused on image-only or text-only approaches, limiting detection performance and impacting the quality of information delivered to crisis response teams. In this paper, we present a new multimodal fusion method that leverages both images and texts as input. In particular, we introduce a cross-attention module that can filter uninformative and misleading components from weak modalities on a sample-by-sample basis. In addition, we employ a multimodal graph-based approach to stochastically transition between embeddings of different multimodal pairs during training, both to better regularize the learning process and to deal with limited training data by constructing new matched pairs from different samples. We show that our method outperforms unimodal approaches and strong multimodal baselines by a large margin on three crisis-related tasks.
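A minimal form of such a cross-attention module is sketched below in PyTorch: text features act as queries over image features, so that image components the attention deems uninformative for a given sample receive low weight. The module name, single-block design, and head count are assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn

class CrossAttentionFilter(nn.Module):
    """Text queries attend over image features, down-weighting
    uninformative image components on a per-sample basis."""
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, text_feats, image_feats):
        # query = text, key/value = image: learned per-sample filtering.
        fused, weights = self.attn(text_feats, image_feats, image_feats)
        return self.norm(text_feats + fused), weights

# Hypothetical shapes: batch of 8, 16 text tokens, 49 image patches, dim 256.
t = torch.randn(8, 16, 256)
v = torch.randn(8, 49, 256)
fused, w = CrossAttentionFilter(256)(t, v)
```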

Automatic KB completion for commonsense knowledge graphs (e.g., ATOMIC and ConceptNet) poses unique challenges compared to the much-studied conventional knowledge bases (e.g., Freebase). Commonsense knowledge graphs use free-form text to represent nodes, resulting in orders of magnitude more nodes than conventional KBs (18x more nodes in ATOMIC than in Freebase (FB15K-237)). Importantly, this implies significantly sparser graph structures, a major challenge for existing KB completion methods that assume densely connected graphs over a relatively small set of nodes. In this paper, we present novel KB completion models that address these challenges by exploiting the structural and semantic context of nodes. Specifically, we investigate two key ideas: (1) learning from local graph structure, using graph convolutional networks and automatic graph densification, and (2) transfer learning from pre-trained language models to knowledge graphs for enhanced contextual representation of knowledge. We describe our method for incorporating information from both these sources in a joint model and provide the first empirical results for KB completion on ATOMIC and evaluation with ranking metrics on ConceptNet. Our results demonstrate the effectiveness of language model representations in boosting link prediction performance and the advantages of learning from local graph structure (+1.5 points in MRR for ConceptNet) when training on subgraphs for computational efficiency. Further analysis of model predictions sheds light on the types of commonsense knowledge that language models capture well.
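To make the first idea concrete, here is a minimal sketch of a single graph-convolution step over a dense adjacency matrix, where each node's representation is updated from its local neighborhood; in the setting described above, the input features could come from a pre-trained language model. The layer is a textbook GCN propagation rule, not the paper's exact model.

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """One GCN step: H' = ReLU(D^{-1/2} (A + I) D^{-1/2} H W)."""
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, h, adj):
        a_hat = adj + torch.eye(adj.size(0))        # self-loops
        d_inv_sqrt = torch.diag(a_hat.sum(dim=1).pow(-0.5))
        norm_adj = d_inv_sqrt @ a_hat @ d_inv_sqrt  # symmetric normalization
        return torch.relu(norm_adj @ self.linear(h))

# Tiny example: 4 nodes with 16-dim features (e.g., from a language model).
h = torch.randn(4, 16)
adj = torch.tensor([[0., 1., 0., 0.],
                    [1., 0., 1., 0.],
                    [0., 1., 0., 1.],
                    [0., 0., 1., 0.]])
out = GCNLayer(16, 8)(h, adj)
```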

Deep Convolutional Neural Networks (CNNs) are a special type of neural network that has shown state-of-the-art results on various competitive benchmarks. The powerful learning ability of deep CNNs is largely achieved through the use of multiple non-linear feature extraction stages that can automatically learn hierarchical representations from data. The availability of large amounts of data and improvements in hardware processing units have accelerated research in CNNs, and recently very interesting deep CNN architectures have been reported. The recent race in deep CNN architectures for achieving high performance on challenging benchmarks has shown that innovative architectural ideas, as well as parameter optimization, can improve CNN performance on various vision-related tasks. In this regard, different ideas in CNN design have been explored, such as the use of different activation and loss functions, parameter optimization, regularization, and the restructuring of processing units. However, the major improvement in representational capacity has been achieved by restructuring the processing units. In particular, the idea of using a block as a structural unit instead of a layer is gaining substantial appreciation. This survey thus focuses on the intrinsic taxonomy present in recently reported CNN architectures and consequently classifies the recent innovations in CNN architectures into seven categories, based on spatial exploitation, depth, multi-path, width, feature map exploitation, channel boosting, and attention. Additionally, it covers the elementary understanding of CNN components and sheds light on the current challenges and applications of CNNs.
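The canonical example of a block as the structural unit is the residual block, sketched below in PyTorch: two convolutions wrapped by an identity shortcut, reused as a single repeatable unit when building deep networks. This is a standard textbook block offered to illustrate the idea, not a design taken from the survey.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """A block as the structural unit: two conv layers plus a skip path."""
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        return torch.relu(x + self.body(x))   # identity shortcut
```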
