We propose a sequential homotopy method for the solution of mathematical programming problems formulated in abstract Hilbert spaces under the Guignard constraint qualification. The method is equivalent to performing projected backward Euler timestepping on a projected gradient/antigradient flow of the augmented Lagrangian. The projected backward Euler equations can be interpreted as the necessary optimality conditions of a primal-dual proximal regularization of the original problem. The regularized problems are always feasible, satisfy a strong constraint qualification guaranteeing uniqueness of Lagrange multipliers, yield unique primal solutions provided that the stepsize is sufficiently small, and can be solved by a continuation in the stepsize. We show that equilibria of the projected gradient/antigradient flow and critical points of the optimization problem are identical, provide sufficient conditions for the existence of global flow solutions, and show that critical points with emanating descent curves cannot be asymptotically stable equilibria of the projected gradient/antigradient flow, practically eradicating convergence to saddle points and maxima. The sequential homotopy method can be used to globalize any locally convergent optimization method that can be used in a homotopy framework. We demonstrate its efficiency for a class of highly nonlinear and badly conditioned control-constrained elliptic optimal control problems with a semismooth Newton approach for the regularized subproblems. In contrast to the published article, this version contains a correction that the associate editor considers too insignificant to justify publication in the journal.
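The link between backward Euler timestepping and proximal regularization can be illustrated on a toy problem. The sketch below is not the authors' method: it is primal-only (no multipliers, no augmented Lagrangian), one-dimensional, and restricted to a strictly convex quadratic with bound constraints. The names `prox_step` and `solve` and all parameter values are hypothetical. For f(x) = 0.5*a*x**2 - b*x, one backward Euler step of the gradient flow x' = -f'(x) with stepsize dt is exactly the minimizer of f(x) + (x - x_k)**2 / (2*dt), and in one dimension the bound-constrained minimizer is the clipped unconstrained one.

```python
def prox_step(x_k, dt, a, b, lo, hi):
    """One projected backward Euler (proximal) step on [lo, hi].

    Toy objective (an assumption for illustration): f(x) = 0.5*a*x**2 - b*x
    with a > 0, so the regularized subproblem is strictly convex.
    """
    # Stationarity of f(x) + (x - x_k)**2 / (2*dt):
    #   a*x - b + (x - x_k)/dt = 0  =>  x = (x_k + dt*b) / (1 + dt*a)
    x_free = (x_k + dt * b) / (1.0 + dt * a)
    # In 1D with a strictly convex objective, clipping the unconstrained
    # minimizer to the interval gives the constrained minimizer.
    return min(max(x_free, lo), hi)


def solve(x0, dt, a, b, lo, hi, steps=200):
    """Repeat the proximal step; the iterates approach a critical point."""
    x = x0
    for _ in range(steps):
        x = prox_step(x, dt, a, b, lo, hi)
    return x


if __name__ == "__main__":
    # Unconstrained minimizer is b/a = 3; the bound hi = 2 is active.
    print(solve(0.0, 0.5, 2.0, 6.0, 0.0, 2.0))
    # With loose bounds the iterates approach 3.
    print(solve(0.0, 0.5, 2.0, 6.0, -10.0, 10.0))
```

Each subproblem here has a closed-form solution; in the setting of the abstract the regularized subproblems are instead solved by a semismooth Newton method.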

Related content

Operations · Datasets · Comprehensibility · MoDELS · Scaling

October 2, 2024

Strategies for Pretraining Neural Operators

Anthony Zhou, Cooper Lorsung, AmirPouya Hemmasian, Amir Barati Farimani

from arXiv, 29 pages, 5 figures

Pretraining for partial differential equation (PDE) modeling has recently shown promise in scaling neural operators across datasets to improve generalizability and performance. Despite these advances, our understanding of how pretraining affects neural operators is still limited; studies generally propose tailored architectures and datasets that make it challenging to compare or examine different pretraining frameworks. To address this, we compare various pretraining methods without optimizing architecture choices to characterize pretraining dynamics on different models and datasets as well as to understand its scaling and generalization behavior. We find that pretraining is highly dependent on model and dataset choices, but in general transfer learning or physics-based pretraining strategies work best. In addition, pretraining performance can be further improved by using data augmentations. Lastly, pretraining can be additionally beneficial when fine-tuning in scarce data regimes or when generalizing to downstream data similar to the pretraining distribution. Through providing insights into pretraining neural operators for physics prediction, we hope to motivate future work in developing and evaluating pretraining methods for PDEs.

MoDELS · Learning · Performer · Neural Networks · Networking

October 2, 2024

Towards Model Discovery Using Domain Decomposition and PINNs

Tirtho S. Saha, Alexander Heinlein, Cordula Reisch

We enhance machine learning algorithms for learning model parameters in complex systems represented by ordinary differential equations (ODEs) with domain decomposition methods. The study evaluates the performance of two approaches, namely (vanilla) Physics-Informed Neural Networks (PINNs) and Finite Basis Physics-Informed Neural Networks (FBPINNs), in learning the dynamics of test models with a quasi-stationary long-time behavior. We test the approaches for data sets in different dynamical regions and with varying noise levels. As a result, we find that the FBPINN approach performs better than the vanilla PINN approach, even in cases with data from only a quasi-stationary time domain with few dynamics.

Estimation/Estimator · MoDELS · Examples · Identifiable · Full

September 30, 2024

Physics-Regularized Multi-Modal Image Assimilation for Brain Tumor Localization

Michal Balcerak, Tamaz Amiranashvili, Andreas Wagner, Jonas Weidner, Petr Karnakov, Johannes C. Paetzold, Ivan Ezhov, Petros Koumoutsakos, Benedikt Wiestler, Bjoern Menze

from arXiv, Accepted to NeurIPS 2024

Physical models in the form of partial differential equations represent an important prior for many under-constrained problems. One example is tumor treatment planning, which heavily depends on accurate estimates of the spatial distribution of tumor cells in a patient's anatomy. Medical imaging scans can identify the bulk of the tumor, but they cannot reveal its full spatial distribution. Tumor cells at low concentrations remain undetectable, for example, in the most frequent type of primary brain tumors, glioblastoma. Deep-learning-based approaches fail to estimate the complete tumor cell distribution due to a lack of reliable training data. Most existing works therefore rely on physics-based simulations to match observed tumors, providing anatomically and physiologically plausible estimations. However, these approaches struggle with complex and unknown initial conditions and are limited by overly rigid physical models. In this work, we present a novel method that balances data-driven and physics-based cost functions. In particular, we propose a unique discretization scheme that quantifies the adherence of our learned spatiotemporal tumor and brain tissue distributions to their corresponding growth and elasticity equations. This quantification, serving as a regularization term rather than a hard constraint, enables greater flexibility and proficiency in assimilating patient data than existing models. We demonstrate improved coverage of tumor recurrence areas compared to existing techniques on real-world data from a cohort of patients. The method holds the potential to enhance clinical adoption of model-driven treatment planning for glioblastoma.

Robustness · Performer · Examples · Partially observable Markov decision process · Markov

September 30, 2024

Pessimistic Iterative Planning for Robust POMDPs

Maris F. L. Galesloot, Marnix Suilen, Thiago D. Simão, Steven Carr, Matthijs T. J. Spaan, Ufuk Topcu, Nils Jansen

Robust partially observable Markov decision processes (robust POMDPs) extend classical POMDPs to handle additional uncertainty on the transition and observation probabilities via so-called uncertainty sets. Policies for robust POMDPs must not only be memory-based to account for partial observability but also robust against model uncertainty to account for the worst-case instances from the uncertainty sets. We propose the pessimistic iterative planning (PIP) framework, which finds robust memory-based policies for robust POMDPs. PIP alternates between two main steps: (1) selecting an adversarial (non-robust) POMDP via worst-case probability instances from the uncertainty sets; and (2) computing a finite-state controller (FSC) for this adversarial POMDP. We evaluate the performance of this FSC on the original robust POMDP and use this evaluation in step (1) to select the next adversarial POMDP. Within PIP, we propose the rFSCNet algorithm. In each iteration, rFSCNet finds an FSC through a recurrent neural network by using supervision policies optimized for the adversarial POMDP. The empirical evaluation in four benchmark environments showcases improved robustness against several baseline methods and competitive performance compared to a state-of-the-art robust POMDP solver.

Neural Networks · Networking · Square matrix · PDE · Exact

September 30, 2024

First Order System Least Squares Neural Networks

Joost A. A. Opschoor, Philipp C. Petersen, Christoph Schwab

We introduce a conceptual framework for numerically solving linear elliptic, parabolic, and hyperbolic PDEs on bounded, polytopal domains in Euclidean spaces by deep neural networks. The PDEs are recast as minimization of a least-squares (LSQ for short) residual of an equivalent, well-posed first-order system, over parametric families of deep neural networks. The associated LSQ residual a) is equal or proportional to a weak residual of the PDE, b) is additive in terms of contributions from localized subnetworks, indicating where the neural networks are locally "out of equilibrium" with respect to the PDE residual, c) serves as numerical loss function for neural network training, and d) constitutes, even with incomplete training, a computable, (quasi-)optimal numerical error estimator in the context of adaptive LSQ finite element methods. In addition, an adaptive neural network growth strategy is proposed which, assuming exact numerical minimization of the LSQ loss functional, yields sequences of neural networks with realizations that converge rate-optimally to the exact solution of the first order system LSQ formulation.
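To make the first-order system least-squares idea concrete, here is the textbook example for the Poisson problem. It is an illustration chosen here, not taken from the abstract; the symbols $u$, $\sigma$, $f$, and $\Omega$ are assumptions. The second-order equation is rewritten as a first-order system by introducing the flux as a separate unknown, and the loss is the squared $L^2$ residual of that system.

```latex
% Poisson problem: -\Delta u = f in \Omega, u = 0 on \partial\Omega.
% Introduce the flux \sigma = \nabla u to obtain a first-order system:
\begin{align*}
  \sigma - \nabla u &= 0 && \text{in } \Omega, \\
  -\operatorname{div} \sigma &= f && \text{in } \Omega, \\
  u &= 0 && \text{on } \partial\Omega .
\end{align*}
% Least-squares functional, minimized over
% (u, \sigma) \in H^1_0(\Omega) \times H(\operatorname{div}; \Omega):
\begin{equation*}
  J(u, \sigma) = \| \sigma - \nabla u \|_{L^2(\Omega)}^2
               + \| \operatorname{div} \sigma + f \|_{L^2(\Omega)}^2 .
\end{equation*}
```

$J$ vanishes exactly at the solution of the first-order system, and because it is an integral over $\Omega$ it splits into a sum of contributions from subdomains, which is what makes it usable both as a training loss and as a local error indicator.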

Width · Clique · Approximation · Separated · Better

September 30, 2024

Efficient Approximation of Fractional Hypertree Width

Viktoriia Korchemna, Daniel Lokshtanov, Saket Saurabh, Vaishali Surianarayanan, Jie Xue

from arXiv, 28 pages, 1 figure, preliminary version accepted at FOCS 2024

We give two new approximation algorithms to compute the fractional hypertree width of an input hypergraph. The first algorithm takes as input an $n$-vertex $m$-edge hypergraph $H$ of fractional hypertree width at most $\omega$, runs in polynomial time and produces a tree decomposition of $H$ of fractional hypertree width $O(\omega \log n \log \omega)$. As an immediate corollary this yields polynomial time $O(\log^2 n \log \omega)$-approximation algorithms for (generalized) hypertree width as well. To the best of our knowledge our algorithm is the first non-trivial polynomial-time approximation algorithm for fractional hypertree width and (generalized) hypertree width, as opposed to algorithms that run in polynomial time only when $\omega$ is considered a constant. For hypergraphs with the bounded intersection property we get better bounds, comparable with those of the recent algorithm of Lanzinger and Razgon [STACS 2024]. The second algorithm runs in time $n^{\omega}m^{O(1)}$ and produces a tree decomposition of $H$ of fractional hypertree width $O(\omega \log^2 \omega)$. This significantly improves over the $(n+m)^{O(\omega^3)}$ time algorithm of Marx [ACM TALG 2010], which produces a tree decomposition of fractional hypertree width $O(\omega^3)$, both in terms of running time and the approximation ratio. Our main technical contribution, and the key insight behind both algorithms, is a variant of the classic Menger's Theorem for clique separators in graphs: For every graph $G$, vertex sets $A$ and $B$, family ${\cal F}$ of cliques in $G$, and positive rational $f$, either there exists a sub-family of $O(f \cdot \log^2 n)$ cliques in ${\cal F}$ whose union separates $A$ from $B$, or there exist $f \cdot \log |{\cal F}|$ paths from $A$ to $B$ such that no clique in ${\cal F}$ intersects more than $\log |{\cal F}|$ paths.

Vectorization · Discretization · CASE · Scalar · state-of-the-art

September 27, 2024

Localized Evaluation for Constructing Discrete Vector Fields

Tanner Finken, Julien Tierny, Joshua A Levine

from arXiv, 11 pages, Accepted at IEEE Vis Conference 2024

Topological abstractions offer a method to summarize the behavior of vector fields but computing them robustly can be challenging due to numerical precision issues. One alternative is to represent the vector field using a discrete approach, which constructs a collection of pairs of simplices in the input mesh that satisfies criteria introduced by Forman's discrete Morse theory. While numerous approaches exist to compute pairs in the restricted case of the gradient of a scalar field, state-of-the-art algorithms for the general case of vector fields require expensive optimization procedures. This paper introduces a fast, novel approach for pairing simplices of two-dimensional, triangulated vector fields that do not vary in time. The key insight of our approach is that we can employ a local evaluation, inspired by the approach used to construct a discrete gradient field, where every simplex in a mesh is considered by no more than one of its vertices. Specifically, we observe that for any edge in the input mesh, we can uniquely assign an outward direction of flow. We can further expand this consistent notion of outward flow at each vertex, which corresponds to the concept of a downhill flow in the case of scalar fields. Working with outward flow enables a linear-time algorithm that processes the (outward) neighborhoods of each vertex one-by-one, similar to the approach used for scalar fields. We couple our approach to constructing discrete vector fields with a method to extract, simplify, and visualize topological features. Empirical results on analytic and simulation data demonstrate drastic improvements in running time, produce features similar to the current state-of-the-art, and show the application of simplification to large, complex flows.

Multimodal · MoDELS · Identifiable · Layers · Modality

May 28, 2024

The Evolution of Multimodal Model Architectures

Shakti N. Wadekar, Abhishek Chaurasia, Aman Chadha, Eugenio Culurciello

from arXiv, 30 pages, 6 tables, 7 figures

This work uniquely identifies and characterizes four prevalent multimodal model architectural patterns in the contemporary multimodal landscape. Systematically categorizing models by architecture type facilitates monitoring of developments in the multimodal domain. Distinct from recent survey papers that present general information on multimodal architectures, this research conducts a comprehensive exploration of architectural details and identifies four specific architectural types. The types are distinguished by their respective methodologies for integrating multimodal inputs into the deep neural network model. The first two types (Type A and B) deeply fuse multimodal inputs within the internal layers of the model, whereas the following two types (Type C and D) facilitate early fusion at the input stage. Type-A employs standard cross-attention, whereas Type-B utilizes custom-designed layers for modality fusion within the internal layers. On the other hand, Type-C utilizes modality-specific encoders, while Type-D leverages tokenizers to process the modalities at the model's input stage. The identified architecture types aid the monitoring of any-to-any multimodal model development. Notably, Type-C and Type-D are currently favored in the construction of any-to-any multimodal models. Type-C, distinguished by its non-tokenizing multimodal model architecture, is emerging as a viable alternative to Type-D, which utilizes input-tokenizing techniques. To assist in model selection, this work highlights the advantages and disadvantages of each architecture type based on data and compute requirements, architecture complexity, scalability, simplification of adding modalities, training objectives, and any-to-any multimodal generation capability.

Multimodal · Modality · INFORMS · MoDELS · Reducible

June 30, 2021

Attention Bottlenecks for Multimodal Fusion

Arsha Nagrani, Shan Yang, Anurag Arnab, Aren Jansen, Cordelia Schmid, Chen Sun

Humans perceive the world by concurrently processing and fusing high-dimensional inputs from multiple modalities such as vision and audio. Machine perception models, in stark contrast, are typically modality-specific and optimised for unimodal benchmarks, and hence late-stage fusion of final representations or predictions from each modality ('late-fusion') is still a dominant paradigm for multimodal video classification. Instead, we introduce a novel transformer-based architecture that uses 'fusion bottlenecks' for modality fusion at multiple layers. Compared to traditional pairwise self-attention, our model forces information between different modalities to pass through a small number of bottleneck latents, requiring the model to collate and condense the most relevant information in each modality and only share what is necessary. We find that such a strategy improves fusion performance, at the same time reducing computational cost. We conduct thorough ablation studies, and achieve state-of-the-art results on multiple audio-visual classification benchmarks including Audioset, Epic-Kitchens and VGGSound. All code and models will be released.

Estimation/Estimator · contrastive · INFORMS · Mutual information · Representation learning

June 25, 2021

Decomposed Mutual Information Estimation for Contrastive Representation Learning

Alessandro Sordoni, Nouha Dziri, Hannes Schulz, Geoff Gordon, Phil Bachman, Remi Tachet

from arXiv, ICML 2021

Recent contrastive representation learning methods rely on estimating mutual information (MI) between multiple views of an underlying context. E.g., we can derive multiple views of a given image by applying data augmentation, or we can split a sequence into views comprising the past and future of some step in the sequence. Contrastive lower bounds on MI are easy to optimize, but have a strong underestimation bias when estimating large amounts of MI. We propose decomposing the full MI estimation problem into a sum of smaller estimation problems by splitting one of the views into progressively more informed subviews and by applying the chain rule on MI between the decomposed views. This expression contains a sum of unconditional and conditional MI terms, each measuring modest chunks of the total MI, which facilitates approximation via contrastive bounds. To maximize the sum, we formulate a contrastive lower bound on the conditional MI which can be approximated efficiently. We refer to our general approach as Decomposed Estimation of Mutual Information (DEMI). We show that DEMI can capture a larger amount of MI than standard non-decomposed contrastive bounds in a synthetic setting, and learns better representations in a vision domain and for dialogue generation.
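The decomposition described above rests on the chain rule for mutual information. The short derivation below spells it out for a single split; the notation ($x$, $y$ for the two views, $x'$ and $x''$ for the subviews, $K$ for the number of samples in the contrastive bound) is assumed here for illustration and may differ from the paper's.

```latex
% Split the view x into two subviews, x = (x', x'').
% Chain rule for mutual information:
\begin{equation*}
  I(x; y) = I(x', x''; y) = I(x'; y) + I(x''; y \mid x') .
\end{equation*}
% An InfoNCE-style contrastive lower bound computed from K samples
% can never exceed \log K:
\begin{equation*}
  I_{\mathrm{NCE}}(x; y) \le \min\{\, I(x; y),\ \log K \,\} .
\end{equation*}
% Bounding each term on the right-hand side of the chain rule
% separately therefore allows a total estimate of up to 2 \log K,
% whereas a single bound on I(x; y) saturates at \log K.
```

This is why splitting helps when the total mutual information is large: each term carries only part of it, so each contrastive bound is less likely to saturate.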