Katie Everett,Lechao Xiao,Mitchell Wortsman,Alexander A. Alemi,Roman Novak,Peter J. Liu,Izzeddin Gur,Jascha Sohl-Dickstein,Leslie Pack Kaelbling,Jaehoon Lee,Jeffrey Pennington

from arxiv, 63 pages, International Conference on Machine Learning 2024

Robust and effective scaling of models from small to large width typically requires the precise adjustment of many algorithmic and architectural details, such as parameterization and optimizer choices. In this work, we propose a new perspective on parameterization by investigating a key assumption in prior work about the alignment between parameters and data and derive new theoretical results under weaker assumptions and a broader set of optimizers. Our extensive empirical investigation includes tens of thousands of models trained with all combinations of three optimizers, four parameterizations, several alignment assumptions, more than a dozen learning rates, and fourteen model sizes up to 26.8B parameters. We find that the best learning rate scaling prescription would often have been excluded by the assumptions in prior work. Our results show that all parameterizations, not just maximal update parameterization (muP), can achieve hyperparameter transfer; moreover, our novel per-layer learning rate prescription for standard parameterization outperforms muP. Finally, we demonstrate that an overlooked aspect of parameterization, the epsilon parameter in Adam, must be scaled correctly to avoid gradient underflow and propose Adam-atan2, a new numerically stable, scale-invariant version of Adam that eliminates the epsilon hyperparameter entirely.
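The idea behind Adam-atan2 named in the abstract can be illustrated with a minimal scalar sketch. It assumes the standard Adam moment estimates with bias correction and swaps the usual division by (sqrt(v) + epsilon) for a two-argument arctangent. The function name `adam_atan2_step` and its default hyperparameters are illustrative assumptions, not the authors' implementation, and any additional scaling constants the paper may use are omitted.

```python
import math


def adam_atan2_step(param, grad, m, v, step, lr=1e-3, beta1=0.9, beta2=0.999):
    """One scalar Adam-style update using atan2 instead of m / (sqrt(v) + eps).

    Hypothetical helper for illustration only; `step` counts from 1.
    """
    # Standard exponential moving averages of the gradient and its square.
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad * grad

    # Standard Adam bias correction.
    m_hat = m / (1.0 - beta1 ** step)
    v_hat = v / (1.0 - beta2 ** step)

    # atan2(y, x) depends only on the ratio y / x for x > 0, so rescaling the
    # gradient by a positive constant leaves the update unchanged. It is also
    # defined at (0, 0), so no epsilon is needed to guard the division.
    update = math.atan2(m_hat, math.sqrt(v_hat))
    return param - lr * update, m, v


if __name__ == "__main__":
    # Gradients that differ by 30 orders of magnitude give the same first step.
    for g in (1.0, 1e-30):
        new_param, _, _ = adam_atan2_step(1.0, g, 0.0, 0.0, step=1)
        print(g, new_param)
```

In this sketch a very small gradient produces the same step as a large one, whereas a fixed epsilon in standard Adam would dominate the denominator once sqrt(v) falls below it.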

Related content

Scaling

Networking · MoDELS · Learning · Optimizer · Weight

August 21, 2024

On Learnable Parameters of Optimal and Suboptimal Deep Learning Models

Ziwei Zheng,Huizhi Liang,Vaclav Snasel,Vito Latora,Panos Pardalos,Giuseppe Nicosia,Varun Ojha

We scrutinize the structural and operational aspects of deep learning models, particularly focusing on the nuances of learnable parameters (weight) statistics, distribution, node interaction, and visualization. By establishing correlations between variance in weight patterns and overall network performance, we investigate the varying (optimal and suboptimal) performances of various deep-learning models. Our empirical analysis extends across widely recognized datasets such as MNIST, Fashion-MNIST, and CIFAR-10, and various deep learning models such as deep neural networks (DNNs), convolutional neural networks (CNNs), and vision transformer (ViT), enabling us to pinpoint characteristics of learnable parameters that correlate with successful networks. Through extensive experiments on the diverse architectures of deep learning models, we shed light on the critical factors that influence the functionality and efficiency of DNNs. Our findings reveal that successful networks, irrespective of datasets or models, are invariably similar to other successful networks in their converged weights statistics and distribution, while poor-performing networks vary in their weights. In addition, our research shows that the learnable parameters of widely varied deep learning models such as DNN, CNN, and ViT exhibit similar learning characteristics.

Bandit / slot machine · Statistic · Inference · Trial · Estimation / estimator

August 16, 2024

Replicable Bandits for Digital Health Interventions

Kelly W. Zhang,Nowell Closser,Anna L. Trella,Susan A. Murphy

Adaptive treatment assignment algorithms, such as bandit and reinforcement learning algorithms, are increasingly used in digital health intervention clinical trials. Causal inference and related data analyses are critical for evaluating digital health interventions, deciding how to refine the intervention, and deciding whether to roll out the intervention more broadly. However, the replicability of these analyses has received relatively little attention. This work investigates the replicability of statistical analyses from trials deploying adaptive treatment assignment algorithms. We demonstrate that many standard statistical estimators can be inconsistent and fail to be replicable across repetitions of the clinical trial, even as the sample size grows large. We show that this non-replicability is intimately related to properties of the adaptive algorithm itself. We introduce a formal definition of a "replicable bandit algorithm" and prove that under such algorithms, a wide variety of common statistical analyses are guaranteed to be consistent. We present both theoretical results and simulation studies based on a mobile health oral health self-care intervention. Our findings underscore the importance of designing adaptive algorithms with replicability in mind, especially for settings like digital health where deployment decisions rely heavily on replicated evidence. We conclude by discussing open questions on the connections between algorithm design, statistical inference, and experimental replicability.

Automator · Agent · Projection · Large language model · motivation

August 16, 2024

Automated Phishing Detection Using URLs and Webpages

Huilin Wang,Bryan Hooi

Phishing detection is a critical cybersecurity task that involves the identification and neutralization of fraudulent attempts to obtain sensitive information, thereby safeguarding individuals and organizations from data breaches and financial loss. In this project, we address the constraints of traditional reference-based phishing detection by developing an LLM agent framework. This agent harnesses Large Language Models to actively fetch and utilize online information, thus providing a dynamic reference system for more accurate phishing detection. This innovation circumvents the need for a static knowledge base, offering a significant enhancement in adaptability and efficiency for automated security measures. The project report includes an initial study and problem analysis of existing solutions, which motivated us to develop a new framework. We demonstrate the framework with LLMs simulated as agents and detail the techniques required for construction, followed by a complete implementation with a proof-of-concept as well as experiments to evaluate our solution's performance against other similar solutions. The results show that our approach achieves an accuracy of 0.945, significantly outperforming the existing solution (DynaPhish) by 0.445. Furthermore, we discuss the limitations of our approach and suggest improvements that could make it more effective. Overall, the proposed framework has the potential to enhance the effectiveness of current reference-based phishing detection approaches and could be adapted for real-world applications.

SOFT · Learning · MoDELS · Representation · Representation learning

August 14, 2024

Implicit Causal Representation Learning via Switchable Mechanisms

Shayan Shirahmad Gale Bagi,Zahra Gharaee,Oliver Schulte,Mark Crowley

Learning causal representations from observational and interventional data in the absence of known ground-truth graph structures necessitates implicit latent causal representation learning. Implicit learning of causal mechanisms typically involves two categories of interventional data: hard and soft interventions. In real-world scenarios, soft interventions are often more realistic than hard interventions, as the latter require fully controlled environments. Unlike hard interventions, which directly force changes in a causal variable, soft interventions exert influence indirectly by affecting the causal mechanism. However, the subtlety of soft interventions imposes several challenges for learning causal models. One challenge is that the effects of a soft intervention are ambiguous, since parental relations remain intact. In this paper, we tackle the challenges of learning causal models using soft interventions while retaining implicit modelling. We propose ICLR-SM, which models the effects of soft interventions by employing a causal mechanism switch variable designed to toggle between different causal mechanisms. In our experiments, we consistently observe improved learning of identifiable, causal representations, compared to baseline approaches.

Projection · Regularization term · Optimizer · Reviewer · Backward

August 14, 2024

A Sequential Homotopy Method for Mathematical Programming Problems

Andreas Potschka,Hans Georg Bock

from arxiv, 29 pages, 6 figures

We propose a sequential homotopy method for the solution of mathematical programming problems formulated in abstract Hilbert spaces under the Guignard constraint qualification. The method is equivalent to performing projected backward Euler timestepping on a projected gradient/antigradient flow of the augmented Lagrangian. The projected backward Euler equations can be interpreted as the necessary optimality conditions of a primal-dual proximal regularization of the original problem. The regularized problems are always feasible, satisfy a strong constraint qualification guaranteeing uniqueness of Lagrange multipliers, yield unique primal solutions provided that the stepsize is sufficiently small, and can be solved by a continuation in the stepsize. We show that equilibria of the projected gradient/antigradient flow and critical points of the optimization problem are identical, provide sufficient conditions for the existence of global flow solutions, and show that critical points with emanating descent curves cannot be asymptotically stable equilibria of the projected gradient/antigradient flow, practically eradicating convergence to saddle points and maxima. The sequential homotopy method can be used to globalize any locally convergent optimization method that can be used in a homotopy framework. We demonstrate its efficiency for a class of highly nonlinear and badly conditioned control constrained elliptic optimal control problems with a semismooth Newton approach for the regularized subproblems. In contrast to the published article, this version contains a correction that the associate editor considers too insignificant to justify publication in the journal.
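The link between backward Euler timestepping and proximal regularization can be seen in a simplified setting. The derivation below assumes an unconstrained, differentiable objective $f$ on a Hilbert space and a stepsize $h > 0$. These are simplifying assumptions for illustration: the method in the abstract works with the augmented Lagrangian and with projections, both of which this sketch omits.

```latex
% Gradient flow and its backward (implicit) Euler discretization
\dot{x}(t) = -\nabla f(x(t))
\quad\Longrightarrow\quad
x_{k+1} = x_k - h\,\nabla f(x_{k+1})

% Rearranged, this is the first-order optimality condition of a proximal subproblem
\nabla f(x_{k+1}) + \tfrac{1}{h}\,(x_{k+1} - x_k) = 0
\quad\Longleftrightarrow\quad
x_{k+1} \ \text{is a critical point of}\ \ x \mapsto f(x) + \tfrac{1}{2h}\,\|x - x_k\|^2
```

In this simplified picture, letting the stepsize $h$ grow weakens the proximal term, which is one way to read the continuation in the stepsize.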

INFORMS · MoDELS · Loss function (machine learning) · Discretization · Learning

August 13, 2024

Physics Informed Deep Learning for Strain Gradient Continuum Plasticity

Ankit Tyagi,Uttam Suman,Mariya Mamajiwala,Debasish Roy

We use a space-time discretization based on physics informed deep learning (PIDL) to approximate solutions of a class of rate-dependent strain gradient plasticity models. The differential equation governing the plastic flow, the so-called microforce balance for this class of yield-free plasticity models, is very stiff, often leading to numerical corruption and a consequent lack of accuracy or convergence by finite element (FE) methods. Indeed, setting up the discretized framework, especially with an elaborate meshing around the propagating plastic bands whose locations are often unknown a priori, also scales up the computational effort significantly. Taking inspiration from physics informed neural networks, we modify the loss function of a PIDL model in several novel ways to account for the balance laws, either through energetics or via the resulting PDEs once a variational scheme is applied, and the constitutive equations. The initial and the boundary conditions may either be imposed strictly by encoding them within the PIDL architecture, or enforced weakly as a part of the loss function. The flexibility in the implementation of a PIDL technique often makes for its ready interface with powerful optimization schemes, and this in turn provides for many possibilities in posing the problem. We have used freely available open-source libraries that perform fast, parallel computations on GPUs. Using numerical illustrations, we demonstrate how PIDL methods could address the computational challenges posed by strain gradient plasticity models. Also, PIDL methods offer abundant potentialities, vis-à-vis a somewhat straitjacketed and poorer approximant of FE methods, in customizing the formulation as per the problem objective.

Multimodal · MoDELS · Identifiable · Layer · Modality

May 28, 2024

The Evolution of Multimodal Model Architectures

Shakti N. Wadekar,Abhishek Chaurasia,Aman Chadha,Eugenio Culurciello

from arxiv, 30 pages, 6 tables, 7 figures

This work uniquely identifies and characterizes four prevalent multimodal model architectural patterns in the contemporary multimodal landscape. Systematically categorizing models by architecture type facilitates monitoring of developments in the multimodal domain. Distinct from recent survey papers that present general information on multimodal architectures, this research conducts a comprehensive exploration of architectural details and identifies four specific architectural types. The types are distinguished by their respective methodologies for integrating multimodal inputs into the deep neural network model. The first two types (Type A and B) deeply fuse multimodal inputs within the internal layers of the model, whereas the following two types (Type C and D) facilitate early fusion at the input stage. Type-A employs standard cross-attention, whereas Type-B utilizes custom-designed layers for modality fusion within the internal layers. On the other hand, Type-C utilizes modality-specific encoders, while Type-D leverages tokenizers to process the modalities at the model's input stage. The identified architecture types aid the monitoring of any-to-any multimodal model development. Notably, Type-C and Type-D are currently favored in the construction of any-to-any multimodal models. Type-C, distinguished by its non-tokenizing multimodal model architecture, is emerging as a viable alternative to Type-D, which utilizes input-tokenizing techniques. To assist in model selection, this work highlights the advantages and disadvantages of each architecture type based on data and compute requirements, architecture complexity, scalability, simplification of adding modalities, training objectives, and any-to-any multimodal generation capability.

contrastive · Contrastive learning · Similarity · MoDELS · Learned

September 24, 2021

Sequence Level Contrastive Learning for Text Summarization

Shusheng Xu,Xingxing Zhang,Yi Wu,Furu Wei

from arxiv, 2 figures, 12 tables

Contrastive learning models have achieved great success in unsupervised visual representation learning. They maximize the similarities between feature representations of different views of the same image, while minimizing the similarities between feature representations of views of different images. In text summarization, the output summary is a shorter form of the input document and they have similar meanings. In this paper, we propose a contrastive learning model for supervised abstractive text summarization, where we view a document, its gold summary and its model generated summaries as different views of the same mean representation and maximize the similarities between them during training. We improve over a strong sequence-to-sequence text generation model (i.e., BART) on three different summarization datasets. Human evaluation also shows that our model achieves better faithfulness ratings compared to its counterpart without contrastive objectives.

Multimodal · Modality · INFORMS · MoDELS · Reducible

June 30, 2021

Attention Bottlenecks for Multimodal Fusion

Arsha Nagrani,Shan Yang,Anurag Arnab,Aren Jansen,Cordelia Schmid,Chen Sun

Humans perceive the world by concurrently processing and fusing high-dimensional inputs from multiple modalities such as vision and audio. Machine perception models, in stark contrast, are typically modality-specific and optimised for unimodal benchmarks, and hence late-stage fusion of final representations or predictions from each modality ('late-fusion') is still a dominant paradigm for multimodal video classification. Instead, we introduce a novel transformer based architecture that uses 'fusion bottlenecks' for modality fusion at multiple layers. Compared to traditional pairwise self-attention, our model forces information between different modalities to pass through a small number of bottleneck latents, requiring the model to collate and condense the most relevant information in each modality and only share what is necessary. We find that such a strategy improves fusion performance, at the same time reducing computational cost. We conduct thorough ablation studies, and achieve state-of-the-art results on multiple audio-visual classification benchmarks including Audioset, Epic-Kitchens and VGGSound. All code and models will be released.

entity · Graph · Knowledge graph · MoDELS · Similarity

September 11, 2019

Domain Representation for Knowledge Graph Embedding

Cunxiang Wang,Feiliang Ren,Zhichao Lin,Chenxv Zhao,Tian Xie,Yue Zhang

from arxiv, Accepted by NLPCC2019

Embedding entities and relations into a continuous multi-dimensional vector space has become the dominant method for knowledge graph embedding in representation learning. However, most existing models fail to represent hierarchical knowledge, such as the similarities and dissimilarities of entities in one domain. We propose to learn domain representations over existing knowledge graph embedding models, such that entities that have similar attributes are organized into the same domain. Such hierarchical knowledge of domains can give further evidence in link prediction. Experimental results show that domain embeddings give a significant improvement over the most recent state-of-the-art baseline knowledge graph embedding models.