亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

In this work, we conduct a systematic study of stochastic saddle point problems (SSP) and stochastic variational inequalities (SVI) under the constraint of $(\epsilon,\delta)$-differential privacy (DP) in both Euclidean and non-Euclidean setups. We first consider Lipschitz convex-concave SSPs in the $\ell_p/\ell_q$ setup, $p,q\in[1,2]$. Here, we obtain a bound of $\tilde{O}\big(\frac{1}{\sqrt{n}} + \frac{\sqrt{d}}{n\epsilon}\big)$ on the strong SP-gap, where $n$ is the number of samples and $d$ is the dimension. This rate is nearly optimal for any $p,q\in[1,2]$. Without additional assumptions, such as smoothness or linearity requirements, prior work under DP has only obtained this rate when $p=q=2$ (i.e., only in the Euclidean setup). Further, existing algorithms have each only been shown to work for specific settings of $p$ and $q$ and under certain assumptions on the loss and the feasible set, whereas we provide a general algorithm for DP SSPs whenever $p,q\in[1,2]$. Our result is obtained via a novel analysis of the recursive regularization algorithm. In particular, we develop new tools for analyzing generalization, which may be of independent interest. Next, we turn our attention towards SVIs with a monotone, bounded and Lipschitz operator and consider $\ell_p$-setups, $p\in[1,2]$. Here, we provide the first analysis which obtains a bound on the strong VI-gap of $\tilde{O}\big(\frac{1}{\sqrt{n}} + \frac{\sqrt{d}}{n\epsilon}\big)$. For $p-1=\Omega(1)$, this rate is near optimal due to existing lower bounds. To obtain this result, we develop a modified version of recursive regularization. Our analysis builds on the techniques we develop for SSPs as well as employing additional novel components which handle difficulties arising from adapting the recursive regularization framework to SVIs.

相關內容

在數學中,鞍(an)點(dian)或極(ji)大極(ji)小(xiao)(xiao)點(dian)是函數圖形表(biao)面上的(de)一(yi)點(dian),其正(zheng)交(jiao)方向上的(de)斜率(導數)都為零(ling),但它(ta)不是函數的(de)局(ju)部極(ji)值。鞍(an)點(dian)是在某(mou)一(yi)軸向(峰值之間)有(you)一(yi)個(ge)相對最(zui)小(xiao)(xiao)的(de)臨界(jie)點(dian),在交(jiao)叉(cha)軸上有(you)一(yi)個(ge)相對最(zui)大的(de)臨界(jie)點(dian)。

In this work, we propose a novel activation mechanism called LayerAct for CNNs. This approach is motivated by our theoretical and experimental analyses, which demonstrate that Layer Normalization (LN) can mitigate a limitation of existing activation functions regarding noise robustness. However, LN is known to be disadvantageous in CNNs due to its tendency to make activation outputs homogeneous. The proposed method is designed to be more robust than existing activation functions by reducing the upper bound of influence caused by input shifts without inheriting LN's limitation. We provide analyses and experiments showing that LayerAct functions exhibit superior robustness compared to ElementAct functions. Experimental results on three clean and noisy benchmark datasets for image classification tasks indicate that LayerAct functions outperform other activation functions in handling noisy datasets while achieving superior performance on clean datasets in most cases.

In this study, we propose a cross-domain multi-objective speech assessment model called MOSA-Net, which can estimate multiple speech assessment metrics simultaneously. Experimental results show that MOSA-Net can improve the linear correlation coefficient (LCC) by 0.026 (0.990 vs 0.964 in seen noise environments) and 0.012 (0.969 vs 0.957 in unseen noise environments) in perceptual evaluation of speech quality (PESQ) prediction, compared to Quality-Net, an existing single-task model for PESQ prediction, and improve LCC by 0.021 (0.985 vs 0.964 in seen noise environments) and 0.047 (0.836 vs 0.789 in unseen noise environments) in short-time objective intelligibility (STOI) prediction, compared to STOI-Net (based on CRNN), an existing single-task model for STOI prediction. Moreover, MOSA-Net, originally trained to assess objective scores, can be used as a pre-trained model to be effectively adapted to an assessment model for predicting subjective quality and intelligibility scores with a limited amount of training data. Experimental results show that MOSA-Net can improve LCC by 0.018 (0.805 vs 0.787) in mean opinion score (MOS) prediction, compared to MOS-SSL, a strong single-task model for MOS prediction. In light of the confirmed prediction capability, we further adopt the latent representations of MOSA-Net to guide the speech enhancement (SE) process and derive a quality-intelligibility (QI)-aware SE (QIA-SE) approach accordingly. Experimental results show that QIA-SE provides superior enhancement performance compared with the baseline SE system in terms of objective evaluation metrics and qualitative evaluation test. For example, QIA-SE can improve PESQ by 0.301 (2.953 vs 2.652 in seen noise environments) and 0.18 (2.658 vs 2.478 in unseen noise environments) over a CNN-based baseline SE model.

We propose scaling up 3D scene reconstruction by training with synthesized data. At the core of our work is MegaSynth, a procedurally generated 3D dataset comprising 700K scenes - over 50 times larger than the prior real dataset DL3DV - dramatically scaling the training data. To enable scalable data generation, our key idea is eliminating semantic information, removing the need to model complex semantic priors such as object affordances and scene composition. Instead, we model scenes with basic spatial structures and geometry primitives, offering scalability. Besides, we control data complexity to facilitate training while loosely aligning it with real-world data distribution to benefit real-world generalization. We explore training LRMs with both MegaSynth and available real data. Experiment results show that joint training or pre-training with MegaSynth improves reconstruction quality by 1.2 to 1.8 dB PSNR across diverse image domains. Moreover, models trained solely on MegaSynth perform comparably to those trained on real data, underscoring the low-level nature of 3D reconstruction. Additionally, we provide an in-depth analysis of MegaSynth's properties for enhancing model capability, training stability, and generalization.

In this work, we consider an online robust Markov Decision Process (MDP) where we have the information of finitely many prototypes of the underlying transition kernel. We consider an adaptively updated ambiguity set of the prototypes and propose an algorithm that efficiently identifies the true underlying transition kernel while guaranteeing the performance of the corresponding robust policy. To be more specific, we provide a sublinear regret of the subsequent optimal robust policy. We also provide an early stopping mechanism and a worst-case performance bound of the value function. In numerical experiments, we demonstrate that our method outperforms existing approaches, particularly in the early stage with limited data. This work contributes to robust MDPs by considering possible prior information about the underlying transition probability and online learning, offering both theoretical insights and practical algorithms for improved decision-making under uncertainty.

The reasoning abilities of Large Language Models (LLMs) are attracting increasing attention. In this work, we focus on causal reasoning and address the task of establishing causal relationships based on correlation information, a highly challenging problem on which several LLMs have shown poor performance. We introduce a prompting strategy for this problem that breaks the original task into fixed subquestions, with each subquestion corresponding to one step of a formal causal discovery algorithm, the PC algorithm. The proposed prompting strategy, PC-SubQ, guides the LLM to follow these algorithmic steps, by sequentially prompting it with one subquestion at a time, augmenting the next subquestion's prompt with the answer to the previous one(s). We evaluate our approach on an existing causal benchmark, Corr2Cause: our experiments indicate a performance improvement across five LLMs when comparing PC-SubQ to baseline prompting strategies. Results are robust to causal query perturbations, when modifying the variable names or paraphrasing the expressions.

In this study, we aim to explore the effect of pre-trained conditional generative speech models for the first time on dysarthric speech due to Parkinson's disease recorded in an ideal/non-noisy condition. Considering one category of generative models, i.e., diffusion-based speech enhancement, these models are previously trained to learn the distribution of clean (i.e, recorded in a noise-free environment) typical speech signals. Therefore, we hypothesized that when being exposed to dysarthric speech they might remove the unseen atypical paralinguistic cues during the enhancement process. By considering the automatic dysarthric speech detection task, in this study, we experimentally show that during the enhancement process of dysarthric speech data recorded in an ideal non-noisy environment, some of the acoustic dysarthric speech cues are lost. Therefore such pre-trained models are not yet suitable in the context of dysarthric speech enhancement since they manipulate the pathological speech cues when they process clean dysarthric speech. Furthermore, we show that the removed acoustics cues by the enhancement models in the form of residue speech signal can provide complementary dysarthric cues when fused with the original input speech signal in the feature space.

In this work, we propose a training-free method to inject visual referring into Multimodal Large Language Models (MLLMs) through learnable visual token optimization. We observe the relationship between text prompt tokens and visual tokens in MLLMs, where attention layers model the connection between them. Our approach involves adjusting visual tokens from the MLP output during inference, controlling which text prompt tokens attend to which visual tokens. We optimize a learnable visual token based on an energy function, enhancing the strength of referential regions in the attention map. This enables detailed region description and reasoning without the need for substantial training costs or model retraining. Our method offers a promising direction for integrating referential abilities into MLLMs. Our method support referring with box, mask, scribble and point. The results demonstrate that our method exhibits controllability and interpretability.

In this paper, we propose an efficient, fast, and versatile distillation method to accelerate the generation of pre-trained diffusion models: Flash Diffusion. The method reaches state-of-the-art performances in terms of FID and CLIP-Score for few steps image generation on the COCO2014 and COCO2017 datasets, while requiring only several GPU hours of training and fewer trainable parameters than existing methods. In addition to its efficiency, the versatility of the method is also exposed across several tasks such as text-to-image, inpainting, face-swapping, super-resolution and using different backbones such as UNet-based denoisers (SD1.5, SDXL) or DiT (Pixart-$\alpha$), as well as adapters. In all cases, the method allowed to reduce drastically the number of sampling steps while maintaining very high-quality image generation. The official implementation is available at //github.com/gojasper/flash-diffusion.

In the past decade, we have witnessed the rise of deep learning to dominate the field of artificial intelligence. Advances in artificial neural networks alongside corresponding advances in hardware accelerators with large memory capacity, together with the availability of large datasets enabled researchers and practitioners alike to train and deploy sophisticated neural network models that achieve state-of-the-art performance on tasks across several fields spanning computer vision, natural language processing, and reinforcement learning. However, as these neural networks become bigger, more complex, and more widely used, fundamental problems with current deep learning models become more apparent. State-of-the-art deep learning models are known to suffer from issues that range from poor robustness, inability to adapt to novel task settings, to requiring rigid and inflexible configuration assumptions. Ideas from collective intelligence, in particular concepts from complex systems such as self-organization, emergent behavior, swarm optimization, and cellular systems tend to produce solutions that are robust, adaptable, and have less rigid assumptions about the environment configuration. It is therefore natural to see these ideas incorporated into newer deep learning methods. In this review, we will provide a historical context of neural network research's involvement with complex systems, and highlight several active areas in modern deep learning research that incorporate the principles of collective intelligence to advance its current capabilities. To facilitate a bi-directional flow of ideas, we also discuss work that utilize modern deep learning models to help advance complex systems research. We hope this review can serve as a bridge between complex systems and deep learning communities to facilitate the cross pollination of ideas and foster new collaborations across disciplines.

Machine learning techniques have deeply rooted in our everyday life. However, since it is knowledge- and labor-intensive to pursue good learning performance, human experts are heavily involved in every aspect of machine learning. In order to make machine learning techniques easier to apply and reduce the demand for experienced human experts, automated machine learning (AutoML) has emerged as a hot topic with both industrial and academic interest. In this paper, we provide an up to date survey on AutoML. First, we introduce and define the AutoML problem, with inspiration from both realms of automation and machine learning. Then, we propose a general AutoML framework that not only covers most existing approaches to date but also can guide the design for new methods. Subsequently, we categorize and review the existing works from two aspects, i.e., the problem setup and the employed techniques. Finally, we provide a detailed analysis of AutoML approaches and explain the reasons underneath their successful applications. We hope this survey can serve as not only an insightful guideline for AutoML beginners but also an inspiration for future research.

北京阿比特科技有限公司