
Dependencies among attributes are a common aspect of tabular data. However, whether existing tabular data generation algorithms preserve these dependencies while generating synthetic data is yet to be explored. In addition to the existing notion of functional dependencies, we introduce the notion of logical dependencies among the attributes in this article. Moreover, we provide a measure to quantify logical dependencies among attributes in tabular data. Utilizing this measure, we compare several state-of-the-art synthetic data generation algorithms and test their capability to preserve logical and functional dependencies on several publicly available datasets. We demonstrate that currently available synthetic tabular data generation algorithms do not fully preserve functional dependencies when they generate synthetic datasets. In addition, we show that some tabular synthetic data generation models can preserve inter-attribute logical dependencies. Our review and comparison of the state of the art reveal research needs and opportunities to develop task-specific synthetic tabular data generation models.
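To make the notion of a functional dependency concrete, here is a minimal sketch of how one might check whether a dependency that holds in a real table survives in a synthetic one. It covers only functional dependencies; the logical-dependency measure mentioned above is not defined in this abstract and is not reproduced here. The function names, the column names (`zip`, `city`), and the toy rows are all hypothetical.

```python
from collections import defaultdict


def fd_violations(rows, lhs, rhs):
    """Return the lhs value tuples that map to more than one rhs value.

    A functional dependency lhs -> rhs holds when this set is empty.
    """
    seen = defaultdict(set)
    for row in rows:
        key = tuple(row[a] for a in lhs)
        seen[key].add(tuple(row[a] for a in rhs))
    return {key for key, values in seen.items() if len(values) > 1}


def fd_holds(rows, lhs, rhs):
    """True if every lhs value determines exactly one rhs value."""
    return not fd_violations(rows, lhs, rhs)


# Hypothetical toy tables (illustrative only, not from any dataset).
real = [
    {"zip": "10115", "city": "Berlin"},
    {"zip": "10115", "city": "Berlin"},
    {"zip": "80331", "city": "Munich"},
]
synthetic = [
    {"zip": "10115", "city": "Berlin"},
    {"zip": "10115", "city": "Munich"},  # breaks zip -> city
    {"zip": "80331", "city": "Munich"},
]

print(fd_holds(real, ["zip"], ["city"]))
print(fd_violations(synthetic, ["zip"], ["city"]))
```

A generator preserves the dependency `zip -> city` in this sense only if the violation set for the synthetic table is empty whenever it is empty for the real table.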

Related content


GROUP · INFORMS · Information geometry · Optimizer · MASS

November 5, 2024

Information geometry of diffeomorphism groups

Boris Khesin,Gerard Misiołek,Klas Modin

from arxiv, 89 pages, 10 figures

The study of diffeomorphism groups and their applications to problems in analysis and geometry has a long history. In geometric hydrodynamics, pioneered by V. Arnold in the 1960s, one considers an ideal fluid flow as the geodesic motion on the infinite-dimensional group of volume-preserving diffeomorphisms of the fluid domain with respect to the metric defined by the kinetic energy. Similar considerations on the space of densities lead to a geometric description of optimal mass transport and the Kantorovich-Wasserstein metric. Likewise, information geometry associated with the Fisher-Rao metric and the Hellinger distance has an equally beautiful infinite-dimensional geometric description and can be regarded as a higher-order Sobolev analogue of optimal transportation. In this work we review various metrics on diffeomorphism groups relevant to this approach and introduce appropriate topology, smooth structures and dynamics on the corresponding infinite-dimensional manifolds. Our main goal is to demonstrate how, alongside topological hydrodynamics, Hamiltonian dynamics and optimal mass transport, information geometry with its elaborate toolbox has become yet another exciting field for applications of geometric analysis on diffeomorphism groups.

Data augmentation · MASS · Momentum · Linear · Lecture notes

November 4, 2024

Data augmentation for the POD formulation of the parametric laminar incompressible Navier-Stokes equations

Alba Muixí,Sergio Zlotnik,Matteo Giacomini,Pedro Díez

from arxiv, 36 pages, 13 figures, 4 tables

A posteriori reduced-order models (ROM), e.g. based on proper orthogonal decomposition (POD), are essential to affordably tackle realistic parametric problems. They rely on a trustworthy training set, that is, a family of full-order solutions (snapshots) representative of all possible outcomes of the parametric problem. Having such a rich collection of snapshots is not, in many cases, computationally viable. A strategy for data augmentation, designed for parametric laminar incompressible flows, is proposed to enrich poorly populated training sets. The goal is to include in the new, artificial snapshots emerging features, not present in the original basis, that enhance the quality of the reduced basis (RB) constructed using POD dimensionality reduction. The methodologies devised are based on exploiting basic physical principles, such as mass and momentum conservation, to construct physically relevant, artificial snapshots at a fraction of the cost of additional full-order solutions. Interestingly, the numerical results show that the ideas exploiting only mass conservation (i.e., incompressibility) do not add significant value with respect to the standard linear combinations of snapshots. Conversely, accounting for the linearized momentum balance via the Oseen equation does improve the quality of the resulting approximation and therefore is an effective data augmentation strategy in the framework of viscous incompressible laminar flows. Numerical experiments of parametric flow problems, in two and three dimensions, at low and moderate values of the Reynolds number are presented to showcase the superior performance of the data-enriched POD-RB with respect to the standard ROM in terms of both accuracy and efficiency.

November 4, 2024

Unsupervised detection of semantic correlations in big data

Santiago Acevedo,Alex Rodriguez,Alessandro Laio

In real-world data, information is stored in extremely large feature vectors. These variables are typically correlated due to complex interactions involving many features simultaneously. Such correlations qualitatively correspond to semantic roles and are naturally recognized by both the human brain and artificial neural networks. This recognition enables, for instance, the prediction of missing parts of an image or text based on their context. We present a method to detect these correlations in high-dimensional data represented as binary numbers. We estimate the binary intrinsic dimension of a dataset, which quantifies the minimum number of independent coordinates needed to describe the data, and is therefore a proxy for semantic complexity. The proposed algorithm is largely insensitive to the so-called curse of dimensionality, and can therefore be used in big data analysis. We test this approach by identifying phase transitions in model magnetic systems, and we then apply it to the detection of semantic correlations of images and text inside deep neural networks.

MoDELS · fMRI · Language modeling · Processing (programming language) · Predictor/decision function

November 4, 2024

fMRI predictors based on language models of increasing complexity recover brain left lateralization

Laurent Bonnasse-Gahot,Christophe Pallier

from arxiv, 38th Conference on Neural Information Processing Systems (NeurIPS 2024)

Over the past decade, studies of naturalistic language processing where participants are scanned while listening to continuous text have flourished. Using word embeddings at first, then large language models, researchers have created encoding models to analyze the brain signals. Presenting these models with the same text as the participants makes it possible to identify brain areas where there is a significant correlation between the functional magnetic resonance imaging (fMRI) time series and the ones predicted by the models' artificial neurons. One intriguing finding from these studies is that they have revealed highly symmetric bilateral activation patterns, somewhat at odds with the well-known left lateralization of language processing. Here, we report analyses of an fMRI dataset where we manipulate the complexity of large language models, testing 28 pretrained models from 8 different families, ranging from 124M to 14.2B parameters. First, we observe that the performance of models in predicting brain responses follows a scaling law, where the fit with brain activity increases linearly with the logarithm of the number of parameters of the model (and its performance on natural language processing tasks). Second, although this effect is present in both hemispheres, it is stronger in the left than in the right hemisphere. Specifically, the left-right difference in brain correlation follows a scaling law with the number of parameters. This finding reconciles computational analyses of brain activity using large language models with the classic observation from aphasic patients showing left hemisphere dominance for language.

Optimizer · Memetic · MoDELS · GROUP · BASIC

November 4, 2024

Deep memetic models for combinatorial optimization problems: application to the tool switching problem

Jhon Edgar Amaya,Carlos Cotta,Antonio J. Fernández-Leiva,Pablo García-Sánchez

from arxiv, 32 pages, 5 figures

Memetic algorithms are techniques that orchestrate the interplay between population-based and trajectory-based algorithmic components. In particular, some memetic models can be regarded, under this broad interpretation, as a group of autonomous basic optimization algorithms that interact cooperatively in order to deal with a specific optimization problem, aiming to obtain better results than the constituent algorithms would obtain separately. Going one step beyond this traditional view of cooperative optimization algorithms, this work tackles deep meta-cooperation, namely the use of cooperative optimization algorithms in which some components can in turn be cooperative methods themselves, thus exhibiting a deep algorithmic architecture. The objective of this paper is to demonstrate that such models can be considered as an efficient alternative to other traditional forms of cooperative algorithms. To validate this claim, different structural parameters, such as the communication topology between the agents, or the parameter that influences the depth of the cooperative effort (the depth of meta-cooperation), have been analyzed. To do this, a comparison with the state-of-the-art cooperative methods to solve a specific combinatorial problem, the Tool Switching Problem, has been performed. Results show that deep models are effective at solving this problem, outperforming metaheuristics proposed in the literature.

MoDELS · Graph · Attention · Origin · Node

November 4, 2024

FedASTA: Federated adaptive spatial-temporal attention for traffic flow prediction

Kaiyuan Li,Yihan Zhang,Huandong Wang,Yan Zhuo,Xinlei Chen

Mobile devices and Internet of Things (IoT) devices now generate large amounts of heterogeneous spatial-temporal data. Modeling spatial-temporal dynamics under privacy concerns remains a challenging problem. Federated learning (FL) has been proposed as a framework that enables model training across distributed devices without sharing the original data, which reduces privacy concerns. Personalized federated learning (PFL) methods further address the data heterogeneity problem. However, these methods do not consider the natural spatial relations among nodes. To model spatial relations, Graph Neural Network (GNN) based FL approaches have been proposed, but they do not take into account the dynamic spatial-temporal relations among edge nodes. Several approaches model spatial-temporal dynamics in a centralized environment, while less effort has been made in the federated setting. To overcome these challenges, we propose a novel Federated Adaptive Spatial-Temporal Attention (FedASTA) framework to model the dynamic spatial-temporal relations. On the client node, FedASTA extracts temporal relations and trend patterns from the decomposed terms of the original time series. Then, on the server node, FedASTA uses the trend patterns from clients to construct an adaptive temporal-spatial aware graph that captures the dynamic correlation between clients. In addition, we design a masked spatial attention module with both the static graph and the constructed adaptive graph to model spatial dependencies among clients. Extensive experiments on five real-world public traffic flow datasets demonstrate that our method achieves state-of-the-art performance in the federated scenario. In addition, experiments in the centralized setting show the effectiveness of our novel adaptive graph construction approach compared with other popular dynamic spatial-temporal aware methods.

Integration · Tensor · Manifold · Vectorization · Orthogonal

November 1, 2024

Collocation methods for nonlinear differential equations on low-rank manifolds

Alec Dektor

from arxiv, 31 pages, 8 figures

We introduce new methods for integrating nonlinear differential equations on low-rank manifolds. These methods rely on interpolatory projections onto the tangent space, enabling low-rank time integration of vector fields that can be evaluated entry-wise. A key advantage of our approach is that it does not require the vector field to exhibit low-rank structure, thereby overcoming significant limitations of traditional dynamical low-rank methods based on orthogonal projection. To construct the interpolatory projectors, we develop a sparse tensor sampling algorithm based on the discrete empirical interpolation method (DEIM) that parameterizes tensor train manifolds and their tangent spaces with cross interpolation. Using these projectors, we propose two time integration schemes on low-rank tensor train manifolds. The first scheme integrates the solution at selected interpolation indices and constructs the solution with cross interpolation. The second scheme generalizes the well-known orthogonal projector-splitting integrator to interpolatory projectors. We demonstrate the proposed methods with applications to several tensor differential equations arising from the discretization of partial differential equations.

MoDELS · Conditionally independent · Mutually independent · Operation · BASIC

November 1, 2024

Self-adhesivity in lattices of abstract conditional independence models

Tobias Boege,Janneke H. Bolt,Milan Studeny

from arxiv, 39 pages, 4 figures; minor revision, final version

We introduce an algebraic concept of the frame for abstract conditional independence (CI) models, together with basic operations with respect to which such a frame should be closed: copying and marginalization. Three standard examples of such frames are (discrete) probabilistic CI structures, semi-graphoids and structural semi-graphoids. We concentrate on those frames which are closed under the operation of set-theoretical intersection because, for these, the respective families of CI models are lattices. This allows one to apply the results from lattice theory and formal concept analysis to describe such families in terms of implications among CI statements. The central concept of this paper is that of self-adhesivity defined in algebraic terms, which is a combinatorial reflection of the self-adhesivity concept studied earlier in the context of polymatroids and information theory. The generalization also leads to a self-adhesivity operator defined on the hyper-level of CI frames. We answer some of the questions related to this approach and raise other open questions. The core of the paper is in computations. The combinatorial approach to computation might overcome some memory and space limitations of software packages based on polyhedral geometry, in particular, if SAT solvers are utilized. We characterize some basic CI families over 4 variables in terms of canonical implications among CI statements. We apply our method in an information-theoretical context to the task of entropic region demarcation over 5 variables.

Column · Origin · Operation · binary · BASIC

November 1, 2024

Apriori_Goal algorithm for constructing association rules for a database with a given classification

Vladimir Billig

An efficient algorithm, Apriori_Goal, is proposed for constructing association rules for a relational database with a given classification. The algorithm's features are related to the specifics of the database and the method of encoding its records. The algorithm proposes five criteria that characterize the quality of the rules being constructed. Different criteria are also proposed for filtering the sets used when constructing association rules. The proposed method of encoding records allows for an efficient implementation of the basic operation underlying the computation of rule characteristics. The algorithm works with a relational database, where the columns can be of different types, both continuous and discrete. Among the columns, a target discrete column is distinguished, which defines the classification of the records. This allows the original database to be divided into $n$ subsets according to the number of categories of the target parameter. A classical example of such databases is medical databases, where the target parameter is the diagnosis established by doctors. A preprocessor, which is an important part of the algorithm, converts the properties of the objects represented by the columns of the original database into binary properties and encodes each record as a single integer. In addition to saving memory, the proposed format allows the complete preservation of information about the binary properties representing the original record. More importantly, the computationally intensive operations on records, required for calculating rule characteristics, are performed almost instantly in this format using a pair of logical operations on integers.
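The integer encoding described above can be illustrated with a small sketch. It packs a record's binary properties into the bits of one integer, then tests whether a record contains an itemset using one bitwise AND and one comparison. This is only an assumption about what "a pair of logical operations" might look like, not the paper's actual preprocessor or criteria; the property names, the helper functions, and the records are hypothetical.

```python
def encode(record, properties):
    """Pack the binary properties that hold for a record into one integer.

    Bit i of the result is set when properties[i] is truthy in the record.
    """
    code = 0
    for bit, name in enumerate(properties):
        if record.get(name):
            code |= 1 << bit
    return code


def contains(code, itemset_mask):
    """True if every property in the itemset is set in the record code."""
    return (code & itemset_mask) == itemset_mask


def support(codes, itemset_mask):
    """Count the encoded records that contain the whole itemset."""
    return sum(1 for code in codes if contains(code, itemset_mask))


# Hypothetical binary properties and records (illustrative only).
properties = ["fever", "cough", "age_over_60"]
records = [
    {"fever": True, "cough": True},
    {"fever": True},
    {"fever": True, "cough": True, "age_over_60": True},
]
codes = [encode(r, properties) for r in records]
itemset = encode({"fever": True, "cough": True}, properties)

print(codes)
print(support(codes, itemset))
```

Because each record is a single integer, the containment test costs the same regardless of how many properties the itemset has, as long as the properties fit in the integer width the implementation uses (Python integers are unbounded).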

XAI · Precision/accuracy · Similarity · Saliency map · Generalization theory

May 17, 2022

A psychological theory of explainability

Scott Cheng-Hsin Yang,Tomas Folke,Patrick Shafto

from arxiv, 14 pages, 2 figures, ICML (accepted, pre camera-ready version)

The goal of explainable Artificial Intelligence (XAI) is to generate human-interpretable explanations, but there are no computationally precise theories of how humans interpret AI-generated explanations. The lack of theory means that validation of XAI must be done empirically, on a case-by-case basis, which prevents systematic theory-building in XAI. We propose a psychological theory of how humans draw conclusions from saliency maps, the most common form of XAI explanation, which for the first time allows for precise prediction of explainee inference conditioned on explanation. Our theory posits that, absent explanation, humans expect the AI to make similar decisions to themselves, and that they interpret an explanation by comparison to the explanations they themselves would give. Comparison is formalized via Shepard's universal law of generalization in a similarity space, a classic theory from cognitive science. A pre-registered user study on AI image classifications with saliency map explanations demonstrates that our theory quantitatively matches participants' predictions of the AI.
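Shepard's universal law of generalization says that perceived similarity decays exponentially with distance in a psychological similarity space. The sketch below shows only that functional form. How the paper embeds saliency maps in a similarity space is not stated in this abstract, so the Euclidean distance, the `sensitivity` parameter, and the function name are assumptions made for illustration.

```python
import math


def shepard_similarity(x, y, sensitivity=1.0):
    """Similarity of two points under an exponential-decay law.

    Computes exp(-sensitivity * d(x, y)), with d taken to be the
    Euclidean distance (an assumption; other metrics are possible).
    """
    distance = math.dist(x, y)
    return math.exp(-sensitivity * distance)


# Identical points are maximally similar; similarity falls with distance.
print(shepard_similarity((0.0, 0.0), (0.0, 0.0)))
print(shepard_similarity((0.0, 0.0), (3.0, 4.0)))
```

A larger `sensitivity` makes similarity fall off faster, so two explanations would need to be closer in the space to count as alike.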