
We consider an extension of the Newton-MR algorithm for nonconvex unconstrained optimization to settings where Hessian information is only approximated. Under a particular noise model on the Hessian matrix, we investigate the iteration and operation complexities of this variant for achieving appropriate sub-optimality criteria in several nonconvex settings. We first consider functions that satisfy the (generalized) Polyak-\L ojasiewicz condition, a special sub-class of nonconvex functions, and show that, under certain conditions, our algorithm achieves a global linear convergence rate. We then consider more general nonconvex settings, where the rate for obtaining first-order sub-optimality is shown to be sub-linear. In all these settings, we show that our algorithm converges regardless of the degree of Hessian approximation and of the accuracy of the sub-problem solution. Finally, we compare the performance of our algorithm with several alternatives on a few machine learning problems.
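To make the setup concrete, the sketch below implements one inexact Newton-MR iteration: the Hessian is corrupted by a symmetrized random perturbation to mimic a noise model, and the sub-problem is solved only approximately by capping the MINRES iterations. The noise model, the iteration cap, and the plain Armijo backtracking rule are illustrative assumptions, not the paper's exact conditions or line-search criterion.

```python
# Minimal sketch of an inexact Newton-MR step under a Hessian-noise model.
import numpy as np
from scipy.sparse.linalg import minres

def newton_mr_step(f, grad, hess, x, noise_scale=1e-2, sub_iters=20):
    g = grad(x)
    E = noise_scale * np.random.randn(len(x), len(x))
    H_approx = hess(x) + 0.5 * (E + E.T)        # symmetric Hessian perturbation
    p, _ = minres(H_approx, -g, maxiter=sub_iters)  # inexact sub-problem solve
    alpha, beta, c = 1.0, 0.5, 1e-4
    while f(x + alpha * p) > f(x) + c * alpha * (g @ p):  # Armijo backtracking
        alpha *= beta
        if alpha < 1e-12:                       # guard against non-descent p
            break
    return x + alpha * p

# Usage on a toy nonconvex function f(x) = sum(x^4) - sum(x^2):
f = lambda x: np.sum(x**4) - np.sum(x**2)
grad = lambda x: 4 * x**3 - 2 * x
hess = lambda x: np.diag(12 * x**2 - 2)
x = np.array([1.5, -0.7, 0.3])
for _ in range(30):
    x = newton_mr_step(f, grad, hess, x)
```

Note that MINRES, unlike conjugate gradient, tolerates the indefinite Hessians that arise in nonconvex problems, which is why it is the natural sub-problem solver here.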

Related content

The measurements performed by particle physics experiments must account for the imperfect response of the detectors used to observe the interactions. One approach, unfolding, statistically adjusts the experimental data for detector effects. Recently, generative machine learning models have shown promise for performing unbinned unfolding in a high number of dimensions. However, all current generative approaches are limited to unfolding a fixed set of observables, making them unable to perform full-event unfolding in the variable dimensional environment of collider data. A novel modification to the variational latent diffusion model (VLD) approach to generative unfolding is presented, which allows for unfolding of high- and variable-dimensional feature spaces. The performance of this method is evaluated in the context of semi-leptonic top quark pair production at the Large Hadron Collider.
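Full-event unfolding must cope with events whose particle count varies from event to event. The sketch below shows only the generic padding-and-masking representation that such variable-dimensional collider data is typically given before entering a model; it illustrates the data-side problem, not the VLD modification itself, and all names and shapes are illustrative.

```python
# Pad variable-length particle lists into a fixed tensor plus validity mask.
import numpy as np

def pad_events(events, max_particles, n_features):
    """events: list of (n_i, n_features) arrays with varying n_i."""
    X = np.zeros((len(events), max_particles, n_features))
    mask = np.zeros((len(events), max_particles), dtype=bool)
    for i, ev in enumerate(events):
        k = min(len(ev), max_particles)
        X[i, :k] = ev[:k]
        mask[i, :k] = True      # True marks real particles, False padding
    return X, mask

# Two toy events with 3 and 5 reconstructed particles, 4 features each.
events = [np.random.randn(3, 4), np.random.randn(5, 4)]
X, mask = pad_events(events, max_particles=6, n_features=4)
```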

Pretraining methods have recently gained increasing attention for solving PDEs with neural operators. They alleviate the data scarcity problem encountered when learning a neural operator for a single PDE by training on large-scale datasets consisting of various PDEs and exploiting patterns shared among different PDEs to improve solution precision. In this work, we propose the Latent Neural Operator Pretraining (LNOP) framework based on the Latent Neural Operator (LNO) backbone. We achieve universal transformation through pretraining on a hybrid time-dependent PDE dataset to extract representations of different physical systems, and solve various time-dependent PDEs in the latent space through finetuning on single-PDE datasets. Our proposed LNOP framework reduces the solution error by 31.7% on four problems, which can be further improved to 57.1% after finetuning. On out-of-distribution datasets, our LNOP model achieves roughly 50% lower error and 3$\times$ data efficiency on average across different dataset sizes. These results show that our method is more competitive in terms of solution precision, transfer capability and data efficiency than non-pretrained neural operators.
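A rough skeleton of the pretrain-then-finetune workflow might look as follows. The LatentNeuralOperator class is a hypothetical stand-in for the LNO backbone (a single encode-propagate-decode pipeline), and the dataset names in the commented usage are placeholders.

```python
# Skeleton of two-phase (pretrain, finetune) neural-operator training.
import torch

class LatentNeuralOperator(torch.nn.Module):
    def __init__(self, d_in, d_latent):
        super().__init__()
        self.encode = torch.nn.Linear(d_in, d_latent)    # physics -> latent
        self.propagate = torch.nn.Linear(d_latent, d_latent)
        self.decode = torch.nn.Linear(d_latent, d_in)    # latent -> physics
    def forward(self, u):
        return self.decode(self.propagate(torch.tanh(self.encode(u))))

model = LatentNeuralOperator(d_in=64, d_latent=128)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

def train(dataloaders, epochs):
    for _ in range(epochs):
        for loader in dataloaders:          # hybrid dataset: several PDEs
            for u_t, u_next in loader:      # one-step time evolution pairs
                loss = torch.nn.functional.mse_loss(model(u_t), u_next)
                opt.zero_grad()
                loss.backward()
                opt.step()

# Phase 1: pretrain on a mixture of time-dependent PDE datasets, e.g.
# train([burgers_loader, heat_loader, wave_loader], epochs=100)
# Phase 2: finetune on the single target PDE dataset, e.g.
# train([target_pde_loader], epochs=20)
```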

Audio can disclose personally identifiable information (PII), particularly when combined with related text data. It is therefore essential to develop tools to detect privacy leakage in Contrastive Language-Audio Pretraining (CLAP). Existing membership inference attacks (MIAs) need audio as input, risking exposure of voiceprints and requiring costly shadow models. To address these challenges, we propose USMID, a textual unimodal speaker-level membership inference detector for CLAP models, which queries the target model using only text data and does not require training shadow models. We randomly generate textual gibberish that is clearly not in the training dataset, extract feature vectors from these texts using the CLAP model, and train a set of anomaly detectors on them. During inference, the feature vector of each test text is fed into the anomaly detectors to determine whether the speaker is in the training set (anomalous) or not (normal). If available, USMID can further enhance detection by integrating real audio of the tested speaker. Extensive experiments on various CLAP model architectures and datasets demonstrate that USMID outperforms baseline methods while using only text data.
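A minimal sketch of the text-only detection pipeline could look like this. Here encode_text is a placeholder for the target CLAP model's text encoder (no real CLAP API is assumed), and a single IsolationForest stands in for the paper's set of anomaly detectors.

```python
# Text-only membership detection: gibberish defines the non-member region.
import random
import string
import numpy as np
from sklearn.ensemble import IsolationForest

def encode_text(texts):
    # Placeholder: replace with the target CLAP model's text embeddings.
    rng = np.random.default_rng(0)
    return rng.normal(size=(len(texts), 512))

def random_gibberish(n, length=16):
    alphabet = string.ascii_lowercase + " "
    return ["".join(random.choices(alphabet, k=length)) for _ in range(n)]

# 1. Gibberish texts are almost surely outside the training set; their
#    embeddings define the "normal" (non-member) region.
gibberish_feats = encode_text(random_gibberish(500))
detector = IsolationForest(random_state=0).fit(gibberish_feats)

# 2. A test speaker's text is flagged as a member if its embedding looks
#    anomalous relative to the gibberish population.
test_feats = encode_text(["A recording of speaker X reading the news."])
is_member = detector.predict(test_feats)[0] == -1   # -1 = anomalous = member
```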

Stein variational inference (SVI) is a sample-based approximate Bayesian inference technique that generates a sample set by jointly optimizing the samples' locations to minimize an information-theoretic measure of discrepancy with the target probability distribution. SVI thus provides a fast and significantly more sample-efficient approach to Bayesian inference than traditional (random-sampling-based) alternatives. However, the optimization techniques employed in existing SVI methods struggle to address problems in which the target distribution is high-dimensional, poorly conditioned, or non-convex, which severely limits the range of their practical applicability. In this paper, we propose a novel trust-region optimization approach for SVI that successfully addresses each of these challenges. Our method builds upon prior work in SVI by leveraging conditional independences in the target distribution (to achieve high-dimensional scaling) and second-order information (to address poor conditioning), while additionally providing an effective adaptive step control procedure, which is essential for ensuring convergence on challenging non-convex optimization problems. Experimental results show that our method achieves superior numerical performance, both in convergence rate and sample accuracy, and scales better to high-dimensional distributions than previous SVI techniques.
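For context, the sketch below implements the plain first-order SVGD update (RBF kernel with the median heuristic) that methods in this family build on; the paper's contribution replaces this fixed-step update with a second-order, trust-region-controlled step, which is not shown here.

```python
# Plain first-order Stein variational gradient descent on a 2-D Gaussian.
import numpy as np

def svgd_update(X, grad_logp, eps=0.1):
    n = len(X)
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    h = np.median(d2) / np.log(n + 1) + 1e-12        # median heuristic
    K = np.exp(-d2 / h)                              # RBF kernel matrix
    # sum_j grad_{x_j} k(x_j, x_i), computed for every particle i at once
    gK = (2.0 / h) * (K[:, :, None] * (X[None] - X[:, None])).sum(axis=0)
    return X + eps * (K @ grad_logp(X) + gK) / n     # Stein direction step

grad_logp = lambda X: -X                 # target: standard normal N(0, I)
X = np.random.randn(50, 2)
for _ in range(300):
    X = svgd_update(X, grad_logp)
```

The fixed step size eps is exactly what breaks down on poorly conditioned or non-convex targets, which is the failure mode an adaptive trust-region step is designed to control.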

A sequential solver for differential-algebraic equations with embedded optimization criteria (DAEOs) was developed to take advantage of the theoretical work done by Deussen et al. Solvers of this type separate the optimization problem from the differential equation and solve each individually. The new solver relies on the reduction of a DAEO to a sequence of differential inclusions separated by jump events, which occur when the global solution to the optimization problem jumps to a new value. Without explicit treatment, these events reduce the order of convergence of the integration step to one. The solver therefore implements a "local optimizer tracking" procedure to detect and correct jump events. Local optimizer tracking is much less expensive than running a deterministic global optimizer at every time step, and it preserves the order of convergence of the integrator component. The newly developed solver produces correct solutions to DAEOs and runs much faster than sequential DAEO solvers that rely only on global optimization.
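The sketch below illustrates the idea of local optimizer tracking on a toy scalar DAEO: one tracker per local well is advanced by warm-started local optimization, and a jump event is flagged whenever the global minimizer switches wells. The toy functions, the event handling, and the explicit Euler integrator are all illustrative simplifications of the actual solver.

```python
# Toy DAEO: dx/dt = f(x, y*(x)), y*(x) = argmin_y h(x, y). The double-well
# h has two local minimizers that swap global optimality as x crosses 0.
import numpy as np
from scipy.optimize import minimize_scalar

h = lambda x, y: (y**2 - 1) ** 2 + x * y
f = lambda x, y: 1.0 - 0.1 * y

def track_local_min(x, y0):
    # Warm-started *local* optimization keeps each tracker in its own well.
    res = minimize_scalar(lambda y: h(x, y),
                          bounds=(y0 - 0.5, y0 + 0.5), method='bounded')
    return res.x

x, dt, prev_best = -0.5, 0.01, None
trackers = [-1.0, 1.0]                   # one tracker per local well
for step in range(150):
    trackers = [track_local_min(x, y) for y in trackers]
    best = int(np.argmin([h(x, y) for y in trackers]))
    if prev_best is not None and best != prev_best:
        # Jump event detected: the global minimizer switched wells. A full
        # solver would bisect in time to locate the crossing and restart
        # the integrator there to preserve its order of convergence.
        pass
    prev_best = best
    x += dt * f(x, trackers[best])       # explicit Euler integration step
```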

As large-scale neural recordings become common, many neuroscientific investigations focus on identifying functional connectivity from spatio-temporal measurements in two or more brain areas across multiple sessions. Spatio-temporal data in neural recordings can be represented as matrix-variate data, with time as the first dimension and space as the second. In this paper, we exploit a multiple matrix-variate Gaussian graphical model to encode the common underlying spatial functional connectivity across multiple sessions of neural recordings. By effectively integrating information across multiple graphs, we develop a novel inferential framework that allows simultaneous testing to detect meaningful connectivity for a target edge subset of arbitrary size. Our test statistics are based on a group penalized regression approach and a high-dimensional Gaussian approximation technique. The validity of simultaneous testing is demonstrated theoretically under mild assumptions on the sample size and non-stationary autoregressive temporal dependence. Our test is nearly optimal in attaining the boundary of the testable region. Additionally, our method involves only convex optimization and parametric bootstrap, making it computationally attractive. We demonstrate the efficacy of the new method through both simulations and an experimental study involving multiple local field potential (LFP) recordings in the prefrontal cortex (PFC) and visual area V4 during a memory-guided saccade task.
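The generic device behind such a simultaneous test is max-statistic calibration by a multiplier (parametric) bootstrap, sketched below. The per-sample edge scores here are random placeholders for the paper's debiased group-penalized statistics.

```python
# Simultaneous test over an edge subset via multiplier bootstrap of the max.
import numpy as np

rng = np.random.default_rng(0)
n, m = 200, 15                       # samples, edges in the target subset
scores = rng.normal(size=(n, m))     # placeholder per-sample edge scores

T = np.abs(scores.mean(0)) * np.sqrt(n) / scores.std(0)   # per-edge statistic
T_max = T.max()                                           # simultaneous statistic

# Multiplier bootstrap: perturb centered scores with Gaussian weights to
# approximate the null distribution of the maximum statistic.
boot = np.empty(2000)
centered = scores - scores.mean(0)
for b in range(boot.size):
    w = rng.normal(size=n)
    boot[b] = np.abs((w @ centered) / np.sqrt(n) / scores.std(0)).max()

reject = T_max > np.quantile(boot, 0.95)    # simultaneous 5%-level test
```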

Counterfactual explanations have been a popular method of post-hoc explainability for a variety of settings in Machine Learning. Such methods focus on explaining classifiers by generating new data points that are similar to a given reference, while receiving a more desirable prediction. In this work, we investigate a framing for counterfactual generation methods that considers counterfactuals not as independent draws from a region around the reference, but as jointly sampled with the reference from the underlying data distribution. Through this framing, we derive a distance metric tailored to counterfactual similarity that can be applied to a broad range of settings. Through both quantitative and qualitative analyses of counterfactual generation methods, we show that this framing allows us to express more nuanced dependencies among the covariates.
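As a hedged illustration of distribution-aware counterfactual search, the sketch below minimizes a Mahalanobis distance to the reference plus a validity term for a toy logistic classifier. The Mahalanobis metric is only a stand-in for the metric derived from the joint-sampling framing; it captures the same intuition that covariate dependencies should shape what counts as "close".

```python
# Gradient-based counterfactual search with a covariance-aware distance.
import numpy as np

w, b = np.array([1.5, -2.0]), 0.3                # toy logistic classifier
sigmoid = lambda z: 1 / (1 + np.exp(-z))
predict = lambda x: sigmoid(w @ x + b)

# Normally estimated from training data; a fixed correlated covariance is
# assumed here to encode dependencies among the covariates.
Sigma = np.array([[1.0, 0.8], [0.8, 1.0]])
Sigma_inv = np.linalg.inv(Sigma)

def counterfactual(x_ref, lam=2.0, lr=0.05, steps=500):
    x = x_ref.copy()
    for _ in range(steps):
        p = predict(x)
        g_pred = -(1 - p) * w                    # grad of -log p(target=1)
        g_dist = 2 * Sigma_inv @ (x - x_ref)     # grad of Mahalanobis distance
        x -= lr * (g_dist + lam * g_pred)
    return x

x_cf = counterfactual(np.array([-1.0, 1.0]))
print(predict(x_cf))                             # close to the desired class
```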

Several approaches to graphically representing context-specific relations among jointly distributed categorical variables have been proposed, along with structure learning algorithms. While existing optimization-based methods have limited scalability due to the large number of context-specific models, the constraint-based methods are more prone to error than even constraint-based directed acyclic graph learning algorithms, since more relations must be tested. We present an algorithm for learning context-specific models that scales to hundreds of variables. Scalable learning is achieved through a combination of an order-based Markov chain Monte Carlo search and a novel, context-specific sparsity assumption that is analogous to those typically invoked for directed acyclic graphical models. Unlike previous Markov chain Monte Carlo search methods, our Markov chain is guaranteed to have the true posterior of the variable orderings as the stationary distribution. To implement the method, we solve a first case of an open problem recently posed by Alon and Balogh. Future work solving increasingly general instances of this problem would allow our methods to learn increasingly dense models. The method is shown to perform well on synthetic data and real world examples, in terms of both accuracy and scalability.
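The skeleton below shows order-based Metropolis-Hastings over variable orderings with a symmetric adjacent-transposition proposal; by standard MH arguments such a chain is stationary with respect to the distribution proportional to exp(log_score), mirroring the guarantee discussed above. The log_score function is a placeholder for the marginal posterior of an ordering under the context-specific model class.

```python
# Order-based MCMC: Metropolis-Hastings over permutations of the variables.
import math
import random

def log_score(order):
    # Placeholder score; in practice this is the marginal posterior of the
    # ordering under the context-specific model class.
    return -sum(i * v for i, v in enumerate(order)) / len(order)

def order_mcmc(p, iters=10_000):
    order = list(range(p))
    cur = log_score(order)
    for _ in range(iters):
        i = random.randrange(p - 1)
        prop = order[:]                          # adjacent-transposition proposal
        prop[i], prop[i + 1] = prop[i + 1], prop[i]
        new = log_score(prop)
        if math.log(random.random() + 1e-300) < new - cur:   # MH accept/reject
            order, cur = prop, new
    return order

print(order_mcmc(10))
```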

This paper builds on the PINNs algorithm, proposing the ALM-PINNs computational framework to solve various nonlinear partial differential equations and the corresponding parameter identification problems. The numerical solutions obtained by the ALM-PINNs algorithm are compared with both the exact solutions and the numerical solutions produced by the standard PINNs algorithm. This comparison demonstrates that, under the same machine learning framework (TensorFlow 2.0) and neural network architecture, the ALM-PINNs algorithm achieves higher accuracy than the standard PINNs algorithm. Additionally, this paper systematically analyzes the construction principles of the loss function by introducing the probability distribution of random errors as prior information, and provides a theoretical basis for algorithm improvement.
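To illustrate the augmented-Lagrangian idea behind ALM-PINNs, the sketch below trains a PINN for the 1-D Poisson problem u'' = -pi^2 sin(pi x) with u(0) = u(1) = 0, treating the boundary conditions as constraints whose multipliers are updated in an outer loop instead of being fixed penalty weights. PyTorch stands in for the paper's TensorFlow 2.0 setup, and the network size, penalty parameter, and update schedule are illustrative.

```python
# Augmented-Lagrangian PINN sketch for a 1-D Poisson boundary-value problem.
import torch

net = torch.nn.Sequential(torch.nn.Linear(1, 32), torch.nn.Tanh(),
                          torch.nn.Linear(32, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
lam = torch.zeros(2)          # multipliers for the two boundary constraints
mu = 10.0                     # penalty parameter

def residuals():
    x = torch.rand(128, 1, requires_grad=True)
    u = net(x)
    ux = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    uxx = torch.autograd.grad(ux.sum(), x, create_graph=True)[0]
    pde = uxx + torch.pi**2 * torch.sin(torch.pi * x)     # interior residual
    bc = net(torch.tensor([[0.0], [1.0]])).squeeze(1)     # u(0), u(1)
    return pde, bc

for step in range(2000):
    pde, bc = residuals()
    # Augmented Lagrangian: PDE loss + multiplier term + quadratic penalty.
    loss = (pde**2).mean() + (lam * bc).sum() + 0.5 * mu * (bc**2).sum()
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step % 200 == 199:                 # outer ALM multiplier update
        with torch.no_grad():
            lam += mu * bc.detach()
```

The multiplier update is what distinguishes this from a fixed-weight penalty PINN: constraint violations steadily increase the pressure on the boundary terms without requiring a hand-tuned, very large penalty.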

Multi-relation Question Answering is a challenging task, due to the need for elaborate question analysis and reasoning over multiple fact triples in a knowledge base. In this paper, we present a novel model called the Interpretable Reasoning Network, which employs an interpretable, hop-by-hop reasoning process for question answering. The model dynamically decides which part of an input question should be analyzed at each hop; predicts a relation that corresponds to the current parsed results; utilizes the predicted relation to update the question representation and the state of the reasoning process; and then drives the next-hop reasoning. Experiments show that our model yields state-of-the-art results on two datasets. More interestingly, the model offers traceable and observable intermediate predictions for reasoning analysis and failure diagnosis.
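A toy hop-by-hop loop in the spirit of this model is sketched below: at each hop it attends over the question tokens, scores all relations against the attended focus plus the reasoning state, and then updates both the state and the question representation. All dimensions and update rules are illustrative guesses, not the paper's architecture.

```python
# Hop-by-hop reasoning sketch with a traceable list of relation predictions.
import torch

d, n_rel, hops = 64, 100, 3
rel_emb = torch.nn.Embedding(n_rel, d)
attn = torch.nn.Linear(d, 1)

def reason(question_tokens):                    # (seq_len, d) token vectors
    q = question_tokens.clone()
    state = torch.zeros(d)
    trace = []                                  # intermediate predictions
    for _ in range(hops):
        a = torch.softmax(attn(q).squeeze(-1), dim=0)   # which part to analyze
        focus = (a.unsqueeze(-1) * q).sum(0)            # attended question part
        scores = rel_emb.weight @ (focus + state)       # score all relations
        rel = int(scores.argmax())
        trace.append(rel)
        state = state + rel_emb.weight[rel]     # advance the reasoning state
        q = q - a.unsqueeze(-1) * focus         # remove "consumed" information
    return trace

print(reason(torch.randn(12, d)))
```

The returned trace is the point of the design: each hop's relation prediction is observable, which is what makes intermediate reasoning inspectable for failure diagnosis.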
