An increasingly common viewpoint is that protein dynamics data sets reside in a non-linear subspace of low conformational energy. Ideal data analysis tools for such data sets should therefore account for such non-linear geometry. The Riemannian geometry setting can be suitable for a variety of reasons. First, it comes with a rich structure to account for a wide range of geometries that can be modelled after an energy landscape. Second, many standard data analysis tools initially developed for data in Euclidean space can also be generalised to data on a Riemannian manifold. In the context of protein dynamics, a conceptual challenge comes from the lack of a suitable smooth manifold and the lack of guidelines for constructing a smooth Riemannian structure based on an energy landscape. In addition, computational feasibility in computing geodesics and related mappings poses a major challenge. This work considers these challenges. The first part of the paper develops a novel local approximation technique for computing geodesics and related mappings on Riemannian manifolds in a computationally feasible manner. The second part constructs a smooth manifold of point clouds modulo rigid body group actions and a Riemannian structure that is based on an energy landscape for protein conformations. The resulting Riemannian geometry is tested on several data analysis tasks relevant for protein dynamics data. It performs exceptionally well on coarse-grained molecular dynamics simulated data. In particular, the geodesics with given start- and end-points approximately recover corresponding molecular dynamics trajectories for proteins that undergo relatively ordered transitions with medium sized deformations. The Riemannian protein geometry also gives physically realistic summary statistics and retrieves the underlying dimension even for large-sized deformations within seconds on a laptop.

Related content

Analysis

Moving average · Learning · Predictor/decision function · Performer · Batch learning ·

13 December 2023

Mixed moving average field guided learning for spatio-temporal data

Imma Valentina Curato,Orkun Furat,Lorenzo Proietti,Bennet Stroeh

Influenced mixed moving average fields are a versatile modeling class for spatio-temporal data. However, their predictive distribution is not generally known. Under this modeling assumption, we define a novel spatio-temporal embedding and a theory-guided machine learning approach that employs a generalized Bayesian algorithm to make ensemble forecasts. We employ Lipschitz predictors and determine fixed-time and any-time PAC Bayesian bounds in the batch learning setting. Performing causal forecasts is a highlight of our methodology, as is its potential application to data with short- and long-range spatial and temporal dependence. We then test the performance of our learning methodology using linear predictors and data sets simulated from a spatio-temporal Ornstein-Uhlenbeck process.

INTERACT · MoDELS · Identifiable · BERT · Performer ·

13 December 2023

Evaluation of GPT and BERT-based models on identifying protein-protein interactions in biomedical text

Hasin Rehana,Nur Bengisu Çam,Mert Basmaci,Jie Zheng,Christianah Jemiyo,Yongqun He,Arzucan Özgür,Junguk Hur

Detecting protein-protein interactions (PPIs) is crucial for understanding genetic mechanisms, disease pathogenesis, and drug design. However, with the fast-paced growth of biomedical literature, there is a growing need for automated and accurate extraction of PPIs to facilitate scientific knowledge discovery. Pre-trained language models, such as generative pre-trained transformers (GPT) and bidirectional encoder representations from transformers (BERT), have shown promising results in natural language processing (NLP) tasks. We evaluated the performance of PPI identification of multiple GPT and BERT models using three manually curated gold-standard corpora: Learning Language in Logic (LLL) with 164 PPIs in 77 sentences, Human Protein Reference Database with 163 PPIs in 145 sentences, and Interaction Extraction Performance Assessment with 335 PPIs in 486 sentences. BERT-based models achieved the best overall performance, with BioBERT achieving the highest recall (91.95%) and F1-score (86.84%) and PubMedBERT achieving the highest precision (85.25%). Interestingly, despite not being explicitly trained for biomedical texts, GPT-4 achieved commendable performance, comparable to the top-performing BERT models. It achieved a precision of 88.37%, a recall of 85.14%, and an F1-score of 86.49% on the LLL dataset. These results suggest that GPT models can effectively detect PPIs from text data, offering promising avenues for application in biomedical literature mining. Further research could explore how these models might be fine-tuned for even more specialized tasks within the biomedical domain.

Integration · CASE · Transform · Score · Statistical methods ·

12 December 2023

Score-based calibration testing for multivariate forecast distributions

Malte Knüppel,Fabian Krüger,Marc-Oliver Pohle

Calibration tests based on the probability integral transform (PIT) are routinely used to assess the quality of univariate distributional forecasts. However, PIT-based calibration tests for multivariate distributional forecasts face various challenges. We propose two new types of tests based on proper scoring rules, which overcome these challenges. They arise from a general framework for calibration testing in the multivariate case, introduced in this work. The new tests have good size and power properties in simulations and solve various problems of existing tests. We apply the tests to forecast distributions for macroeconomic and financial time series data.
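The univariate PIT idea mentioned above can be illustrated with a minimal sketch. It assumes Gaussian forecast distributions and simulated observations; the function names `normal_cdf` and `pit_values` are hypothetical, and this is not the score-based multivariate test the authors propose. If the forecast distribution is calibrated, the PIT values should look roughly uniform on [0, 1]; an overdispersed forecast piles them up near 0.5.

```python
import math
import numpy as np

def normal_cdf(x, mean=0.0, std=1.0):
    """CDF of N(mean, std^2), evaluated elementwise."""
    z = (np.asarray(x, dtype=float) - mean) / (std * math.sqrt(2.0))
    return 0.5 * (1.0 + np.vectorize(math.erf)(z))

def pit_values(observations, mean, std):
    """Probability integral transform: forecast CDF evaluated at the outcomes."""
    return normal_cdf(observations, mean, std)

# Simulated outcomes from a standard normal (an assumption for illustration).
rng = np.random.default_rng(0)
y = rng.normal(0.0, 1.0, size=5000)

pit_good = pit_values(y, mean=0.0, std=1.0)           # calibrated forecast
pit_overdispersed = pit_values(y, mean=0.0, std=2.0)  # forecast too wide

# Ten-bin histograms: flat for the calibrated forecast, hump-shaped otherwise.
hist_good, _ = np.histogram(pit_good, bins=10, range=(0.0, 1.0))
hist_bad, _ = np.histogram(pit_overdispersed, bins=10, range=(0.0, 1.0))
print(hist_good)
print(hist_bad)
```

A formal test would compare the histogram (or the empirical CDF of the PIT values) against the uniform distribution; the sketch only shows the diagnostic quantity itself.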

Outliers · Papers · Anomaly detection · Extensibility · MINE ·

12 December 2023

Meta-survey on outlier and anomaly detection

Madalina Olteanu,Fabrice Rossi,Florian Yger

The impact of outliers and anomalies on model estimation and data processing is of paramount importance, as evidenced by the extensive body of research spanning various fields over several decades: thousands of research papers have been published on the subject. As a consequence, numerous reviews, surveys, and textbooks have sought to summarize the existing literature, encompassing a wide range of methods from both the statistical and data mining communities. While these endeavors to organize and summarize the research are invaluable, they face inherent challenges due to the pervasive nature of outliers and anomalies in all data-intensive applications, irrespective of the specific application field or scientific discipline. Consequently, the resulting collection of papers remains voluminous and somewhat heterogeneous. To address the need for knowledge organization in this domain, this paper implements the first systematic meta-survey of general surveys and reviews on outlier and anomaly detection. Employing a classical systematic survey approach, the study collects nearly 500 papers using two specialized scientific search engines. From this comprehensive collection, a subset of 56 papers that claim to be general surveys on outlier detection is selected using a snowball search technique to enhance field coverage. A meticulous quality assessment phase further refines the selection to a subset of 25 high-quality general surveys. Using this curated collection, the paper investigates the evolution of the outlier detection field over a 20-year period, revealing emerging themes and methods. Furthermore, an analysis of the surveys sheds light on the survey writing practices adopted by scholars from different communities who have contributed to this field. Finally, the paper delves into several topics where consensus has emerged from the literature. These include taxonomies of outlier types, challenges posed by high-dimensional data, the importance of anomaly scores, the impact of learning conditions, difficulties in benchmarking, and the significance of neural networks. Non-consensual aspects are also discussed, particularly the distinction between local and global outliers and the challenges in organizing detection methods into meaningful taxonomies.

INFORMS · MoDELS · Stochastic dynamical systems · Dynamical systems · Information theory ·

11 December 2023

Information theory for model reduction in stochastic dynamical systems

Matthew S. Schmitt,Maciej Koch-Janusz,Michel Fruchart,Daniel S. Seara,Vincenzo Vitelli

from arxiv, 24 pages, 10 figures

Model reduction is the construction of simple yet predictive descriptions of the dynamics of many-body systems in terms of a few relevant variables. A prerequisite to model reduction is the identification of these relevant variables, a task for which no general method exists. Here, we develop a systematic approach based on the information bottleneck to identify the relevant variables, defined as those most predictive of the future. We elucidate analytically the relation between these relevant variables and the eigenfunctions of the transfer operator describing the dynamics. Further, we show that in the limit of high compression, the relevant variables are directly determined by the slowest-decaying eigenfunctions. Our information-based approach indicates when to optimally stop increasing the complexity of the reduced model. Further, it provides a firm foundation to construct interpretable deep learning tools that perform model reduction. We illustrate how these tools work on benchmark dynamical systems and deploy them on uncurated datasets, such as satellite movies of atmospheric flows downloaded directly from YouTube.
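The link between relevant variables and slowly decaying eigenfunctions can be illustrated on a toy case. The sketch below assumes a hypothetical 4-state Markov chain with two metastable groups, where the transition matrix `P` plays the role of a finite-dimensional transfer operator. It is not the authors' information-bottleneck method; it only shows that the eigenvector with the second-largest eigenvalue separates the two slow groups.

```python
import numpy as np

# Hypothetical 4-state chain: states {0, 1} and {2, 3} form two metastable
# groups, with rare hops (probability 0.01) between the groups.
P = np.array([
    [0.89, 0.10, 0.01, 0.00],
    [0.10, 0.89, 0.00, 0.01],
    [0.01, 0.00, 0.89, 0.10],
    [0.00, 0.01, 0.10, 0.89],
])
assert np.allclose(P.sum(axis=1), 1.0)  # rows are probability distributions

# P is symmetric in this toy example, so eigh applies and returns real
# eigenvalues in ascending order; reverse to get the slowest modes first.
eigenvalues, eigenvectors = np.linalg.eigh(P)
eigenvalues = eigenvalues[::-1]
eigenvectors = eigenvectors[:, ::-1]

# Index 0 is the stationary mode (eigenvalue 1); index 1 decays slowest.
slow_mode = eigenvectors[:, 1]
timescale = -1.0 / np.log(eigenvalues[1])  # implied relaxation time in steps

# The sign of the slow mode is a one-bit reduced variable: which group?
labels = (slow_mode > 0).astype(int)
print(eigenvalues, timescale, labels)
```

In this example the one-bit label discards the fast within-group dynamics and keeps the slow between-group switching, which is the kind of coarse variable a highly compressed description would retain.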

TIP · Reservoir computing · Machine Learning · Learning · Dynamical systems ·

11 December 2023

Extrapolating tipping points and simulating non-stationary dynamics of complex systems using efficient machine learning

Daniel Köglmayr,Christoph Räth

Model-free and data-driven prediction of tipping point transitions in nonlinear dynamical systems is a challenging and outstanding task in complex systems science. We propose a novel, fully data-driven machine learning algorithm based on next-generation reservoir computing to extrapolate the bifurcation behavior of nonlinear dynamical systems using stationary training data samples. We show that this method can extrapolate tipping point transitions. Furthermore, it is demonstrated that the trained next-generation reservoir computing architecture can be used to predict non-stationary dynamics with time-varying bifurcation parameters. In doing so, post-tipping point dynamics of unseen parameter regions can be simulated.

Layer · Approximation · Extensibility · Performer · Singular ·

11 December 2023

Semi-analytic PINN methods for boundary layer problems in a rectangular domain

Gung-Min Gie,Youngjoon Hong,Chang-Yeol Jung,Tselmuun Munkhjin

from arxiv, 22 pages, 6 figures

Singularly perturbed boundary value problems pose a significant challenge for their numerical approximations because of the presence of sharp boundary layers. These sharp boundary layers are responsible for the stiffness of solutions, which leads to large computational errors, if not properly handled. It is well-known that the classical numerical methods as well as the Physics-Informed Neural Networks (PINNs) require some special treatments near the boundary, e.g., using extensive mesh refinements or finer collocation points, in order to obtain an accurate approximate solution especially inside of the stiff boundary layer. In this article, we modify the PINNs and construct our new semi-analytic SL-PINNs suitable for singularly perturbed boundary value problems. Performing the boundary layer analysis, we first find the corrector functions describing the singular behavior of the stiff solutions inside boundary layers. Then we obtain the SL-PINN approximations of the singularly perturbed problems by embedding the explicit correctors in the structure of PINNs or by training the correctors together with the PINN approximations. Our numerical experiments confirm that our new SL-PINN methods produce stable and accurate approximations for stiff solutions.

Unstructured · Extensibility · Performer · Integration · SimPLe ·

9 December 2023

Rapid evaluation of Newtonian potentials on planar domains

Zewen Shen,Kirill Serkh

from arxiv, 25 pages, 5 tables, 11 figures. Accepted by SIAM J. Sci. Comput

The accurate and efficient evaluation of Newtonian potentials over general 2-D domains is important for the numerical solution of Poisson's equation and volume integral equations. In this paper, we present a simple and efficient high-order algorithm for computing the Newtonian potential over a planar domain discretized by an unstructured mesh. The algorithm is based on the use of Green's third identity for transforming the Newtonian potential into a collection of layer potentials over the boundaries of the mesh elements, which can be easily evaluated by the Helsing-Ojala method. One important component of our algorithm is the use of high-order (up to order 20) bivariate polynomial interpolation in the monomial basis, for which we provide extensive justification. The performance of our algorithm is illustrated through several numerical experiments.

Functional · Loss · Scenario · Infinite · Statistical theory ·

8 December 2023

Coherence and avoidance of sure loss for standardized functions and semicopulas

Erich Peter Klement,Damjana Kokol Bukovšek,Blaž Mojškerc,Matjaž Omladič,Susanne Saminger-Platz,Nik Stopar

from arxiv, 32 pages, 2 figures. Paper was revised: some additional explanations were provided and some additional references were added

We discuss avoidance of sure loss and coherence results for semicopulas and standardized functions, i.e., for grounded, 1-increasing functions with value $1$ at $(1,1,\ldots, 1)$. We characterize the existence of a $k$-increasing $n$-variate function $C$ fulfilling $A\leq C\leq B$ for standardized $n$-variate functions $A,B$ and discuss the method for constructing this function. Our proofs also include procedures for extending functions on some countably infinite mesh to functions on the unit box. We provide a characterization when $A$ respectively $B$ coincides with the pointwise infimum respectively supremum of the set of all $k$-increasing $n$-variate functions $C$ fulfilling $A\leq C\leq B$.

Learning to hash · Image retrieval · Convolutional neural networks · Optimizer · Loss function (machine learning) ·

10 June 2020

A survey on deep hashing for image retrieval

Xiaopeng Zhang

Hashing has been widely used in approximate nearest neighbor search for large-scale database retrieval because of its computational and storage efficiency. Deep hashing, which devises convolutional neural network architectures to exploit and extract the semantic information or features of images, has received increasing attention recently. In this survey, several deep supervised hashing methods for image retrieval are evaluated, and I identify three main directions for deep supervised hashing methods. Several comments are made at the end. Moreover, to break through the bottleneck of existing hashing methods, I propose a Shadow Recurrent Hashing (SRH) method as an attempt. Specifically, I devise a CNN architecture to extract the semantic features of images and design a loss function that encourages similar images to be projected close to each other. To this end, I propose a concept: the shadow of the CNN output. During the optimization process, the CNN output and its shadow guide each other so as to approach the optimal solution as closely as possible. Several experiments on the CIFAR-10 dataset show the satisfactory performance of SRH.
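The retrieval step that hashing makes cheap can be sketched as follows. This is a minimal illustration and not the SRH method: it assumes the real-valued features have already been produced by some network, binarizes them by sign thresholding (an assumption), and ranks database items by Hamming distance to the query. The names `binarize` and `hamming_rank` and the feature values are hypothetical.

```python
import numpy as np

def binarize(features):
    """Turn real-valued features into {0, 1} codes by thresholding at zero."""
    return (np.asarray(features) > 0).astype(np.uint8)

def hamming_rank(query_code, database_codes):
    """Return database indices sorted by Hamming distance, plus the distances."""
    distances = np.count_nonzero(database_codes != query_code, axis=1)
    order = np.argsort(distances, kind="stable")
    return order, distances

# Made-up 4-dimensional features for three database images.
features = np.array([
    [ 0.9, -0.2,  0.4, -0.7],
    [ 0.8, -0.1,  0.5,  0.3],
    [-0.6,  0.7, -0.3,  0.2],
])
database = binarize(features)
query = binarize([0.5, -0.4, 0.1, -0.9])

order, distances = hamming_rank(query, database)
print(order, distances)
```

Because the codes are short bit vectors, the distance computation reduces to counting differing bits, which is where the storage and computation savings come from.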