
In this paper, our goal is to generate synthetic data for heterogeneous (mixed-type) tabular datasets with high machine learning utility (MLu). Given that the MLu performance relies on accurately approximating the conditional distributions, we focus on devising a synthetic data generation method based on conditional distribution estimation. We propose a novel synthetic data generation method, MaCoDE, by redefining the multi-class classification task of Masked Language Modeling (MLM) as histogram-based non-parametric conditional density estimation. Our proposed method enables estimating conditional densities across arbitrary combinations of target and conditional variables. Furthermore, we demonstrate that our proposed method bridges the theoretical gap between distributional learning and MLM. To validate the effectiveness of our proposed model, we conduct synthetic data generation experiments on 10 real-world datasets. Given the analogy between predicting masked input tokens in MLM and missing data imputation, we also evaluate the performance of multiple imputations on incomplete datasets with various missing data mechanisms. Moreover, our proposed model offers the advantage of enabling adjustments to data privacy levels without requiring re-training.
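The histogram-based view of conditional density estimation mentioned above can be illustrated with a minimal sketch. It assumes that a continuous variable has been discretized into bins and that some classifier has already produced a probability for each bin; the bin edges, the probabilities, and the function names `histogram_density` and `sample_from_histogram` are hypothetical and are not taken from the MaCoDE implementation.

```python
import bisect
import random


def histogram_density(x, edges, probs):
    """Piecewise-constant density: bin probability divided by bin width.

    edges: sorted bin boundaries, len(edges) == len(probs) + 1 (assumed).
    probs: predicted probability of each bin (assumed to sum to 1).
    """
    if x < edges[0] or x > edges[-1]:
        return 0.0
    # Index of the bin containing x; the right-most edge belongs to the last bin.
    k = min(bisect.bisect_right(edges, x) - 1, len(probs) - 1)
    return probs[k] / (edges[k + 1] - edges[k])


def sample_from_histogram(edges, probs, rng):
    """Draw a bin according to probs, then a uniform value inside that bin."""
    k = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return rng.uniform(edges[k], edges[k + 1])


if __name__ == "__main__":
    edges = [0.0, 1.0, 3.0]   # two bins of width 1 and 2 (illustrative values)
    probs = [0.5, 0.5]        # stand-in for a classifier's output
    rng = random.Random(0)
    print(histogram_density(0.5, edges, probs))
    print(sample_from_histogram(edges, probs, rng))
```

Under this reading, predicting a masked entry is a multi-class classification over bins, and dividing each bin probability by its width turns the classifier output into a density estimate.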

Related content

Since training deep neural networks takes significant computational resources, extending the training dataset with new data is difficult, as it typically requires complete retraining. Moreover, specific applications do not allow costly retraining due to time or computational constraints. We address this issue by proposing a novel Bayesian update method for deep neural networks by using a last-layer Laplace approximation. Concretely, we leverage second-order optimization techniques on the Gaussian posterior distribution of a Laplace approximation, computing the inverse Hessian matrix in closed form. This way, our method allows for fast and effective updates upon the arrival of new data in a stationary setting. A large-scale evaluation study across different data modalities confirms that our updates are a fast and competitive alternative to costly retraining. Furthermore, we demonstrate its applicability in a deep active learning scenario by using our update to improve existing selection strategies.
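To give a feel for why a closed-form update of a Gaussian posterior can avoid retraining, the sketch below applies the standard Sherman-Morrison identity to a posterior covariance when one new observation arrives. This is a swapped-in textbook technique for a linear-Gaussian last layer, not the update rule proposed in the abstract; the function name `sherman_morrison_update`, the feature vector `phi`, and the noise variance are assumptions made for illustration.

```python
def sherman_morrison_update(cov, phi, noise_var=1.0):
    """Return (cov^{-1} + phi phi^T / noise_var)^{-1} without inverting a matrix.

    cov: symmetric posterior covariance as a list of lists (assumed).
    phi: feature vector of the new observation (hypothetical last-layer input).
    """
    n = len(phi)
    # cov @ phi
    cov_phi = [sum(cov[i][j] * phi[j] for j in range(n)) for i in range(n)]
    # noise_var + phi^T cov phi
    denom = noise_var + sum(phi[i] * cov_phi[i] for i in range(n))
    # Rank-one correction of the covariance.
    return [
        [cov[i][j] - cov_phi[i] * cov_phi[j] / denom for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    prior_cov = [[1.0, 0.0], [0.0, 1.0]]
    print(sherman_morrison_update(prior_cov, [1.0, 0.0]))
```

Each new data point costs a rank-one correction rather than a full refit, which is the general reason closed-form posterior updates are cheap compared with retraining.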

In this paper, we propose a new hybrid temporal computing (HTC) framework that leverages both pulse rate and temporal data encoding to design ultra-low energy hardware accelerators. Our approach is inspired by the recently proposed temporal computing, or race logic, which encodes data values as single delays, leading to significantly lower energy consumption due to minimized signal switching. However, race logic is limited in its applications due to inherent restrictions. The new HTC framework overcomes these limitations by encoding signals in both temporal and pulse rate formats for multiplication and in temporal format for propagation. This approach maintains reduced switch energy while being general enough to implement a wide range of arithmetic operations. We demonstrate how HTC multiplication is performed for both unipolar and bipolar data encoding and present the basic designs for multipliers, adders, and MAC units. Additionally, we implement two hardware accelerators: a Finite Impulse Response (FIR) filter and a Discrete Cosine Transform (DCT)/iDCT engine for image compression and DSP applications. Experimental results show that the HTC MAC has a significantly smaller power and area footprint compared to the Unary MAC design and is orders of magnitude faster. Compared to the CBSC MAC, the HTC MAC reduces power consumption by $45.2\%$ and area footprint by $50.13\%$. For the FIR design, the HTC design significantly outperforms the Unary design on all metrics. Compared to the CBSC design, the HTC-based FIR filter reduces power consumption by $36.61\%$ and area cost by $45.85\%$. The HTC-based DCT filter retains the quality of the original image with a decent PSNR, while consuming $23.34\%$ less power and occupying $18.20\%$ less area than the CBSC MAC-based DCT filter.

In this paper, we introduce the Deep Finite Volume Method (DFVM), an innovative deep learning framework tailored for solving high-order (order \(\geq 2\)) partial differential equations (PDEs). Our approach centers on a novel loss function crafted from local conservation laws derived from the original PDE, distinguishing DFVM from traditional deep learning methods. By formulating DFVM in the weak form of the PDE rather than the strong form, we improve accuracy over strong-form methods such as Physics-Informed Neural Networks (PINNs), particularly for PDEs with less smooth solutions. A key technique of DFVM lies in its transformation of all second-order or higher derivatives of neural networks into first-order derivatives, which can be computed directly using Automatic Differentiation (AD). This adaptation significantly reduces computational overhead, which is particularly advantageous for solving high-dimensional PDEs. Numerical experiments demonstrate that DFVM achieves equal or superior solution accuracy compared to existing deep learning methods such as PINN, Deep Ritz Method (DRM), and Weak Adversarial Networks (WAN), while drastically reducing computational costs. Notably, for PDEs with nonsmooth solutions, DFVM yields approximate solutions with relative errors up to two orders of magnitude lower than those obtained by PINN. The implementation of DFVM is available on GitHub at //github.com/Sysuzqs/DFVM.

This paper develops methods for proving Lyapunov stability of dynamical systems subject to disturbances with an unknown distribution. We assume only a finite set of disturbance samples is available and that the true online disturbance realization may be drawn from a different distribution than the given samples. We formulate an optimization problem to search for a sum-of-squares (SOS) Lyapunov function and introduce a distributionally robust version of the Lyapunov function derivative constraint. We show that this constraint may be reformulated as several SOS constraints, ensuring that the search for a Lyapunov function remains in the class of SOS polynomial optimization problems. For general systems, we provide a distributionally robust chance-constrained formulation for neural network Lyapunov function search. Simulations demonstrate the validity and efficiency of either formulation on non-linear uncertain dynamical systems.

We present a novel set of rigorous and computationally efficient topology-based complexity notions that exhibit a strong correlation with the generalization gap in modern deep neural networks (DNNs). DNNs show remarkable generalization properties, yet the source of these capabilities remains elusive, defying the established statistical learning theory. Recent studies have revealed that properties of training trajectories can be indicative of generalization. Building on this insight, state-of-the-art methods have leveraged the topology of these trajectories, particularly their fractal dimension, to quantify generalization. Most existing works compute this quantity by assuming continuous- or infinite-time training dynamics, complicating the development of practical estimators capable of accurately predicting generalization without access to test data. In this paper, we respect the discrete-time nature of training trajectories and investigate the underlying topological quantities that are amenable to topological data analysis tools. This leads to a new family of reliable topological complexity measures that provably bound the generalization error, eliminating the need for restrictive geometric assumptions. These measures are computationally friendly, enabling us to propose simple yet effective algorithms for computing generalization indices. Moreover, our flexible framework can be extended to different domains, tasks, and architectures. Our experimental results demonstrate that our new complexity measures correlate highly with generalization error in industry-standard architectures such as transformers and deep graph networks. Our approach consistently outperforms existing topological bounds across a wide range of datasets, models, and optimizers, highlighting the practical relevance and effectiveness of our complexity measures.

In this paper, we consider contamination by code generation test sets, in particular in their use in modern large language models. We discuss three possible sources of such contamination and show findings supporting each of them: (i) direct data leakage, (ii) indirect data leakage through the use of synthetic data, and (iii) overfitting to evaluation sets during model selection. Key to our findings is a new dataset of 161 prompts with their associated Python solutions, which is released at //huggingface.co/datasets/CohereForAI/lbpp .

Recent contrastive representation learning methods rely on estimating mutual information (MI) between multiple views of an underlying context. E.g., we can derive multiple views of a given image by applying data augmentation, or we can split a sequence into views comprising the past and future of some step in the sequence. Contrastive lower bounds on MI are easy to optimize, but have a strong underestimation bias when estimating large amounts of MI. We propose decomposing the full MI estimation problem into a sum of smaller estimation problems by splitting one of the views into progressively more informed subviews and by applying the chain rule on MI between the decomposed views. This expression contains a sum of unconditional and conditional MI terms, each measuring modest chunks of the total MI, which facilitates approximation via contrastive bounds. To maximize the sum, we formulate a contrastive lower bound on the conditional MI which can be approximated efficiently. We refer to our general approach as Decomposed Estimation of Mutual Information (DEMI). We show that DEMI can capture a larger amount of MI than standard non-decomposed contrastive bounds in a synthetic setting, and learns better representations in a vision domain and for dialogue generation.
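The decomposition described above rests on the chain rule for mutual information. As a sketch, write $x$ and $y$ for the two views and $y'$ for a subview of $y$ (that is, $y'$ is a function of $y$); these symbols are chosen here for illustration and are not taken from the paper.

```latex
\begin{align}
I(x; y) &= I(x; y, y')             && \text{since } y' \text{ is a function of } y \\
        &= I(x; y') + I(x; y \mid y') && \text{chain rule for mutual information}
\end{align}
```

Each term on the right carries only part of the total information, which is why each can be more tractable for a contrastive bound than $I(x; y)$ estimated in one step.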

Graph neural networks (GNNs) are a popular class of machine learning models whose major advantage is their ability to incorporate a sparse and discrete dependency structure between data points. Unfortunately, GNNs can only be used when such a graph-structure is available. In practice, however, real-world graphs are often noisy and incomplete or might not be available at all. With this work, we propose to jointly learn the graph structure and the parameters of graph convolutional networks (GCNs) by approximately solving a bilevel program that learns a discrete probability distribution on the edges of the graph. This allows one to apply GCNs not only in scenarios where the given graph is incomplete or corrupted but also in those where a graph is not available. We conduct a series of experiments that analyze the behavior of the proposed method and demonstrate that it outperforms related methods by a significant margin.

In this paper, we introduce the Reinforced Mnemonic Reader for machine reading comprehension tasks, which enhances previous attentive readers in two aspects. First, a reattention mechanism is proposed to refine current attentions by directly accessing past attentions that are temporally memorized in a multi-round alignment architecture, so as to avoid the problems of attention redundancy and attention deficiency. Second, a new optimization approach, called dynamic-critical reinforcement learning, is introduced to extend the standard supervised method. It always encourages the model to predict a more acceptable answer, so as to address the convergence suppression problem that occurs in traditional reinforcement learning algorithms. Extensive experiments on the Stanford Question Answering Dataset (SQuAD) show that our model achieves state-of-the-art results. Meanwhile, our model outperforms previous systems by over 6% in terms of both Exact Match and F1 metrics on two adversarial SQuAD datasets.

In this paper, we propose a conceptually simple and geometrically interpretable objective function, i.e. additive margin Softmax (AM-Softmax), for deep face verification. In general, the face verification task can be viewed as a metric learning problem, so learning large-margin face features whose intra-class variation is small and inter-class difference is large is of great importance in order to achieve good performance. Recently, Large-margin Softmax and Angular Softmax have been proposed to incorporate the angular margin in a multiplicative manner. In this work, we introduce a novel additive angular margin for the Softmax loss, which is intuitively appealing and more interpretable than the existing works. We also emphasize and discuss the importance of feature normalization in the paper. Most importantly, our experiments on LFW BLUFR and MegaFace show that our additive margin softmax loss consistently performs better than the current state-of-the-art methods using the same network architecture and training dataset. Our code has also been made available at //github.com/happynear/AMSoftmax
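The additive margin can be sketched for a single sample as follows. The code normalizes the feature and class weight vectors, subtracts a margin `m` from the target-class cosine, scales all cosines by `s`, and applies a standard cross-entropy. The function name `am_softmax_loss` and the default values of `s` and `m` are illustrative assumptions; the authors' released code should be treated as the reference.

```python
import math


def am_softmax_loss(features, weights, label, s=30.0, m=0.35):
    """Additive-margin softmax loss for one sample (illustrative sketch).

    features: feature vector as a list of floats.
    weights:  one weight vector per class.
    label:    index of the ground-truth class.
    s, m:     scale and additive margin (assumed default values).
    """
    def normalize(v):
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v]

    f = normalize(features)
    # Cosine similarity between the feature and every class weight.
    cosines = [sum(a * b for a, b in zip(f, normalize(w))) for w in weights]
    # Subtract the margin from the target-class cosine only, then scale.
    logits = [s * (c - m) if j == label else s * c for j, c in enumerate(cosines)]
    # Numerically stable log-sum-exp for the cross-entropy.
    top = max(logits)
    log_sum_exp = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_sum_exp - logits[label]


if __name__ == "__main__":
    feats = [1.0, 0.2]
    class_weights = [[1.0, 0.0], [0.0, 1.0]]
    print(am_softmax_loss(feats, class_weights, label=0))
```

With `m=0` this reduces to a softmax loss on normalized features and weights; a positive `m` makes the same sample incur a larger loss, which pushes the target-class cosine to exceed the others by at least the margin.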
