
We consider decentralized gradient-free optimization for minimizing Lipschitz continuous functions that satisfy neither smoothness nor convexity assumptions. We propose two novel gradient-free algorithms, the Decentralized Gradient-Free Method (DGFM) and its variant, the Decentralized Gradient-Free Method$^+$ (DGFM$^{+}$). Built on randomized smoothing and gradient tracking, DGFM requires only a single-sample zeroth-order oracle call per iteration, making it less demanding in terms of computational resources for individual computing nodes. Theoretically, DGFM achieves a complexity of $\mathcal O(d^{3/2}\delta^{-1}\varepsilon^{-4})$ for obtaining a $(\delta,\varepsilon)$-Goldstein stationary point. DGFM$^{+}$, an advanced version of DGFM, incorporates variance reduction to further improve the convergence behavior: it samples a mini-batch at each iteration and periodically draws a larger batch of data, which improves the complexity to $\mathcal O(d^{3/2}\delta^{-1}\varepsilon^{-3})$. Moreover, experimental results underscore the empirical advantages of the proposed algorithms on real-world datasets.
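
A minimal sketch of the two ingredients the abstract names, randomized-smoothing zeroth-order gradient estimates and gradient tracking over a mixing matrix. The step size, the two-point estimator, and the mixing matrix W below are illustrative assumptions, not the authors' exact specification.

import numpy as np

def zo_gradient(f, x, delta, rng):
    # Two-point zeroth-order estimate of the gradient of the randomized
    # smoothing f_delta(x) = E_u[f(x + delta*u)], u uniform on the sphere.
    u = rng.standard_normal(x.size)
    u /= np.linalg.norm(u)
    return x.size * (f(x + delta * u) - f(x - delta * u)) / (2 * delta) * u

def dgfm_sketch(fs, W, x0, delta=0.1, eta=0.01, T=1000, seed=0):
    # fs: one local objective per node; W: doubly stochastic mixing matrix.
    rng = np.random.default_rng(seed)
    X = np.tile(x0, (len(fs), 1))                              # local iterates
    G = np.stack([zo_gradient(f, x0, delta, rng) for f in fs])
    Y = G.copy()                                               # gradient trackers
    for _ in range(T):
        X = W @ X - eta * Y                                    # consensus + descent
        G_new = np.stack([zo_gradient(f, x, delta, rng)
                          for f, x in zip(fs, X)])
        Y = W @ Y + G_new - G                                  # tracking update
        G = G_new
    return X.mean(axis=0)

With, say, three nodes, W = np.full((3, 3), 1/3), and a toy nonsmooth objective such as f(x) = np.abs(x).sum(), this runs end to end and illustrates why only zeroth-order (function-value) access is needed at each node.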

Related Content

We propose a new method, Adversarial In-Context Learning (adv-ICL), to optimize prompts for in-context learning (ICL) by employing one LLM as a generator, another as a discriminator, and a third as a prompt modifier. As in traditional adversarial learning, adv-ICL is implemented as a two-player game between the generator and discriminator, where the generator tries to produce output realistic enough to fool the discriminator. In each round, given an input prefixed by task instructions and several exemplars, the generator produces an output. The discriminator is then tasked with classifying the generator's input-output pair as model-generated or real data. Based on the discriminator loss, the prompt modifier proposes possible edits to the generator and discriminator prompts, and the edits that most improve the adversarial loss are selected. We show that adv-ICL yields significant improvements over state-of-the-art prompt optimization techniques for both open- and closed-source models on 11 generation and classification tasks, including summarization, arithmetic reasoning, machine translation, data-to-text generation, and the MMLU and BIG-Bench Hard benchmarks. In addition, because our method uses pre-trained models and updates only prompts rather than model parameters, it is computationally efficient, easy to extend to any LLM and task, and effective in low-resource settings.
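
A compact sketch of one round of this game, assuming only a hypothetical llm(prompt) -> str completion callable; the prompt templates, the "real"/"generated" judging format, and the selection rule below are illustrative stand-ins for the paper's actual protocol.

def adv_loss(llm, gen_prompt, disc_prompt, batch):
    # How much more often the discriminator calls real outputs "real" than
    # generated ones; the generator tries to drive this gap toward zero.
    judge = lambda x, y: llm(f"{disc_prompt}\nInput: {x}\nOutput: {y}\n"
                             "Answer 'real' or 'generated':").strip()
    gap = 0
    for x, y_real in batch:
        y_fake = llm(f"{gen_prompt}\nInput: {x}\nOutput:")
        gap += (judge(x, y_real) == "real") - (judge(x, y_fake) == "real")
    return gap / len(batch)

def adv_icl_round(llm, gen_prompt, disc_prompt, mod_prompt, batch, k=3):
    # The prompt modifier proposes k candidate edits per prompt; keep the
    # edited pair that most improves (here: lowers) the adversarial loss.
    best = (adv_loss(llm, gen_prompt, disc_prompt, batch),
            gen_prompt, disc_prompt)
    for _ in range(k):
        g = llm(f"{mod_prompt}\n\n{gen_prompt}")
        d = llm(f"{mod_prompt}\n\n{disc_prompt}")
        best = min(best, (adv_loss(llm, g, d, batch), g, d), key=lambda t: t[0])
    return best[1], best[2]

Because only prompts change between rounds, the loop needs no gradients or parameter access, which is what makes the method applicable to closed-source models.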

Knowledge distillation (KD) has emerged as a promising yet challenging technique for compressing deep neural networks, aiming to transfer the rich learned representations of proficient, computationally intensive teacher models to compact student models. However, current KD methods for super-resolution (SR) models deliver limited performance and have restricted applications, since they overlook the characteristics of SR tasks. In this paper, we put forth an approach from the perspective of effective data utilization, namely Data Upcycling Knowledge Distillation (DUKD), which transfers the teacher's prior knowledge to the student through upcycled in-domain data derived from the input images. In addition, we realize, for the first time, label consistency regularization in KD for SR models, implemented via paired invertible data augmentations. It constrains the KD training process and leads to better generalization of the student model. Owing to its versatility, DUKD can be applied across a broad spectrum of teacher-student architectures (e.g., CNN and Transformer models) and SR tasks, such as single-image SR, real-world SR, and SR quantization, and is complementary to other compression techniques. Comprehensive experiments on diverse benchmarks demonstrate that DUKD significantly outperforms prior methods.
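
The label-consistency idea lends itself to a small sketch: with a paired invertible augmentation (a flip or rotation, as one example), the student's output on the transformed input, mapped back through the inverse transform, should agree with its output on the original input. The student callable and the MSE form of the penalty are illustrative assumptions, not the paper's exact regularizer.

import numpy as np

# Paired invertible augmentations: (transform, inverse) on H x W image arrays.
PAIRS = [(lambda z: z[:, ::-1], lambda z: z[:, ::-1]),            # horizontal flip
         (lambda z: np.rot90(z, 1), lambda z: np.rot90(z, -1))]   # 90-deg rotation

def label_consistency(student, lr_img, pair=0):
    # The SR mapping should be equivariant: inverting the transform after
    # super-resolving the transformed input should reproduce the direct
    # output; the discrepancy is penalized during distillation.
    t, t_inv = PAIRS[pair]
    return float(np.mean((student(lr_img) - t_inv(student(t(lr_img)))) ** 2))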

With the rise in popularity of digital atlases to communicate spatial variation, there is an increasing need for robust small-area estimates. However, current small-area estimation methods suffer from various modeling problems when data are very sparse or when estimates are required for areas with very small populations. These issues are particularly acute when modeling proportions. Additionally, recent work has shown significant benefits in modeling at both the individual and area levels. We propose a two-stage Bayesian hierarchical small-area estimation approach for proportions that can account for survey design, reduce direct estimate instability, and generate prevalence estimates for small areas with no survey data. Using a simulation study, we show that, compared with existing Bayesian small-area estimation methods, our approach can provide superior predictive performance (in terms of Bayesian mean relative root mean squared error, mean absolute relative bias, and coverage) for proportions under a variety of data conditions, including very sparse and unstable data. To assess the model in practice, we produce and compare modeled estimates of current smoking prevalence for 1,630 small areas in Australia, using 2017-2018 National Health Survey data combined with 2016 census data.
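
As one generic instance of such a two-stage structure (a sketch, not the authors' exact model): stage one fits an individual-level model that accounts for the survey design and yields area-level summaries, and stage two smooths those summaries, stabilizing sparse areas and allowing prediction where no survey data exist.

% Stage 1: individual-level model (with survey weights), yielding area-level
% summaries \hat{\theta}_a with sampling variances \hat{v}_a.
\operatorname{logit} \Pr(y_{ij} = 1) = \mathbf{x}_{ij}^{\top} \boldsymbol{\beta} + u_{a(ij)},
\qquad u_a \sim \mathcal{N}(0, \sigma_u^2)

% Stage 2: area-level smoothing of the stage-1 summaries.
\operatorname{logit} \hat{\theta}_a \mid \theta_a \sim \mathcal{N}\!\left(\operatorname{logit} \theta_a,\; \hat{v}_a\right),
\qquad \operatorname{logit} \theta_a = \mathbf{z}_a^{\top} \boldsymbol{\gamma} + e_a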

In the noisy intermediate-scale quantum era, variational quantum algorithms (VQAs) have emerged as a promising avenue to obtain quantum advantage. However, the success of VQAs depends on the expressive power of parameterized quantum circuits, which is constrained by the limited gate number and the presence of barren plateaus. In this work, we propose and numerically demonstrate a novel approach for VQAs that uses randomized quantum circuits to generate the variational wavefunction. We parameterize the distribution function of these random circuits with artificial neural networks and optimize it to find the solution. This random-circuit approach trades the expressive power of the variational wavefunction against time cost, measured by the sampling cost of the quantum circuits. Given a fixed gate number, we can systematically increase the expressive power by extending the quantum-computing time. With a sufficiently large permissible time cost, the variational wavefunction can approximate any quantum state with arbitrary accuracy. Furthermore, we establish explicit relationships between expressive power, time cost, and gate number for variational quantum eigensolvers. These results highlight the promising potential of the random-circuit approach for achieving high expressive power in quantum computing.
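
A toy sketch of the core idea of optimizing a distribution over circuit parameters rather than a single parameter vector. A diagonal Gaussian stands in for the neural-network-parameterized distribution, energy(theta) for the measured Hamiltonian expectation of the circuit with parameters theta, and the score-function gradient for whatever estimator the paper actually uses.

import numpy as np

def random_circuit_vqe_sketch(energy, d, iters=200, m=64, lr=0.05, seed=0):
    # Minimize E_{theta ~ p}[energy(theta)] over a diagonal Gaussian p via
    # score-function (REINFORCE) gradients with a mean baseline.
    rng = np.random.default_rng(seed)
    mu, log_sig = np.zeros(d), np.zeros(d)
    for _ in range(iters):
        eps = rng.standard_normal((m, d))
        thetas = mu + np.exp(log_sig) * eps            # m sampled circuits
        e = np.array([energy(t) for t in thetas])      # measured energies
        adv = e - e.mean()                             # variance-reduced signal
        mu -= lr * (adv[:, None] * eps).mean(0) / np.exp(log_sig)
        log_sig -= lr * (adv[:, None] * (eps**2 - 1)).mean(0)
    return mu, np.exp(log_sig)

The sample count m is where the expressive-power/time-cost trade-off surfaces: more sampled circuits per step buy a better estimate of the expectation at the price of more quantum-computing time.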

In multivariate functional data analysis, different functional covariates can be homogeneous in some sense. The hidden homogeneity structure is informative about the connectivity or association of different covariates. Covariates with pronounced homogeneity can be analyzed jointly within the same group, which gives rise to a parsimonious way of modeling multivariate functional data. In this paper, we develop a multivariate functional regression technique based on a new regularization approach, termed "coefficient shape alignment", to tackle the potential homogeneity of different functional covariates. The modeling procedure includes two main steps: first, the unknown grouping structure is detected with the new regularization approach, aggregating covariates into disjoint groups; then, a grouped multivariate functional regression model is established based on the detected grouping structure. In this grouped model, the coefficient functions of covariates in the same homogeneous group share the same shape, invariant to scaling. The new regularization approach builds on penalizing the discrepancy between coefficient shapes. The consistency of the detected grouping structure is thoroughly investigated, and conditions that guarantee recovery of the underlying true grouping structure are established. The asymptotic properties of the model estimates are also derived. Extensive simulation studies investigate the finite-sample properties of the developed methods, and their practical utility is illustrated in an analysis of sugar quality evaluation. This work provides a novel means of analyzing the underlying homogeneity of functional covariates and developing parsimonious model structures for multivariate functional data.
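
As an illustrative rendering of such a penalty (the paper's exact penalty, basis representation, and grouping algorithm may differ): after normalizing out scale, coefficient vectors that share a shape incur no cost, so the penalty pulls covariates with similar shapes into the same group.

import numpy as np

def shape_alignment_penalty(B, lam=1.0, eps=1e-8):
    # B: p x K array, row j holding the basis coefficients of the j-th
    # functional coefficient. Normalizing each row makes the comparison
    # invariant to (positive or negative) scaling, hence the sign flip below.
    U = B / (np.linalg.norm(B, axis=1, keepdims=True) + eps)
    p = len(U)
    return lam * sum(min(np.sum((U[j] - U[k]) ** 2),
                         np.sum((U[j] + U[k]) ** 2))
                     for j in range(p) for k in range(j + 1, p))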

Distributed representations provide a vector space that captures meaningful relationships between data instances. The distributed nature of these representations, however, entangles multiple attributes or concepts of data instances (e.g., the topic or sentiment of a text, or characteristics of its author such as age and gender). Recent work has proposed the task of concept erasure, in which, rather than making a concept predictable, the goal is to remove an attribute from distributed representations while retaining as much of the remaining information from the original representation space as possible. In this paper, we propose a new distance-metric-learning-based objective, the Kernelized Rate-Distortion Maximizer (KRaM), for performing concept erasure. KRaM fits a transformation of representations to match a specified distance measure (defined by a labeled concept to erase) using a modified rate-distortion function. Specifically, KRaM's objective function aims to make instances with similar concept labels dissimilar in the learned representation space while retaining other information. We find that optimizing KRaM effectively erases various types of concepts (categorical, continuous, and vector-valued variables) from data representations across diverse domains. We also provide a theoretical analysis of several properties of KRaM's objective. To assess the quality of the learned representations, we propose an alignment score that evaluates their similarity with the original representation space. Additionally, we conduct experiments showcasing KRaM's efficacy in various settings, from erasing binary gender variables in word embeddings to erasing vector-valued variables in GPT-3 representations.
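
As a rough illustration of an objective of this flavor, the sketch below uses the standard coding-rate function as a stand-in for KRaM's modified rate-distortion term; the actual objective, its kernelization, and its handling of continuous labels are the paper's, not what is shown here.

import numpy as np

def coding_rate(Z, eps=0.5):
    # Rate-distortion ("coding rate") of representations Z (n x d):
    # (1/2) logdet(I + d/(n*eps^2) Z^T Z).
    n, d = Z.shape
    return 0.5 * np.linalg.slogdet(np.eye(d) + d / (n * eps**2) * Z.T @ Z)[1]

def kram_style_objective(Z, labels, eps=0.5):
    # Spread instances that share a concept label apart (raise the rate
    # within each label group) while keeping the overall rate bounded,
    # so information unrelated to the concept is retained.
    within = sum(coding_rate(Z[labels == c], eps) for c in np.unique(labels))
    return within - coding_rate(Z, eps)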

Open-set semi-supervised learning (OSSL) embodies a practical scenario within semi-supervised learning, wherein the unlabeled training set encompasses classes absent from the labeled set. Many existing OSSL methods assume that these out-of-distribution data are harmful and put effort into excluding data belonging to unknown classes from the training objective. In contrast, we propose an OSSL framework that facilitates learning from all unlabeled data through self-supervision. Additionally, we utilize an energy-based score to accurately recognize data belonging to the known classes, making our method well-suited for handling uncurated data in deployment. We show through extensive experimental evaluations that our method yields state-of-the-art results on many of the evaluated benchmark problems in terms of closed-set accuracy and open-set recognition when compared with existing methods for OSSL. Our code is available at https://github.com/walline/ssl-tf2-sefoss.
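
The energy-based score mentioned here is, in its usual formulation, the free energy of the classifier logits; a minimal sketch follows (the paper's exact scoring and thresholding rule may differ).

import numpy as np

def energy_score(logits, T=1.0):
    # Free energy E(x) = -T * logsumexp(logits / T), computed with the
    # standard max-shift for numerical stability. Lower energy indicates
    # in-distribution (known-class) data in the usual convention.
    z = logits / T
    m = z.max(axis=-1, keepdims=True)
    return -T * (m.squeeze(-1) + np.log(np.exp(z - m).sum(axis=-1)))

def is_known(logits, tau):
    # Flag samples as known-class when the energy falls below a threshold
    # tau, which would be tuned on held-out labeled data.
    return energy_score(logits) < tau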

The military is investigating methods to improve communication and agility in its multi-domain operations (MDO). The Internet of Things (IoT) has gained traction in both public and government domains, and its use in MDO may revolutionize future battlefields and enable strategic advantage. While this technology offers leverage for military capabilities, it comes with challenges, one of which is uncertainty and its associated risk. A key question is how these uncertainties can be addressed. Recently published studies have proposed information camouflage to transform information from one data domain to another. As this is a comparatively new approach, we investigate the challenges of such transformations and how the associated uncertainties, specifically unknown-unknowns, can be detected and addressed to improve decision-making.

Model-agnostic meta-learners aim to acquire meta-learned parameters from similar tasks in order to adapt to novel tasks from the same distribution with few gradient updates. Thanks to their flexibility in the choice of models, these frameworks demonstrate appealing performance on a variety of domains, such as few-shot image classification and reinforcement learning. However, one important limitation of such frameworks is that they seek a common initialization shared across the entire task distribution, substantially limiting the diversity of the task distributions they can learn from. In this paper, we augment model-agnostic meta-learning (MAML) with the capability to identify the mode of a task sampled from a multimodal task distribution and adapt quickly through gradient updates. Specifically, we propose a multimodal MAML (MMAML) framework that modulates its meta-learned prior parameters according to the identified mode, allowing more efficient fast adaptation. We evaluate the proposed model on a diverse set of few-shot learning tasks, including regression, image classification, and reinforcement learning. The results not only demonstrate the effectiveness of our model in modulating the meta-learned prior in response to the characteristics of tasks, but also show that training on a multimodal distribution can produce an improvement over unimodal training.
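
A minimal sketch of the adaptation path this describes, assuming a FiLM-style scaling as the modulation (one common choice; the actual MMAML modulation network and its form are specified in the paper). A task encoding first modulates the shared prior, then ordinary MAML gradient steps adapt the modulated parameters.

import numpy as np

def mmaml_adapt(theta, modulate, task_embed, grad_fn, steps=5, alpha=0.01):
    # theta:      meta-learned prior parameters (flattened, for the sketch)
    # modulate:   learned network mapping a task encoding to per-parameter
    #             scales (the mode-identification step)
    # grad_fn:    gradient of the task loss at given parameters
    tau = modulate(task_embed)            # modulation from the identified mode
    phi = theta * tau                     # mode-specific starting point
    for _ in range(steps):
        phi = phi - alpha * grad_fn(phi)  # standard inner-loop updates
    return phi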

Multi-relation question answering is a challenging task due to the need for elaborate analysis of questions and reasoning over multiple fact triples in a knowledge base. In this paper, we present a novel model, the Interpretable Reasoning Network, that employs an interpretable, hop-by-hop reasoning process for question answering. The model dynamically decides which part of the input question should be analyzed at each hop; predicts the relation that corresponds to the currently parsed results; uses the predicted relation to update the question representation and the state of the reasoning process; and then drives the next-hop reasoning. Experiments show that our model yields state-of-the-art results on two datasets. More interestingly, the model offers traceable and observable intermediate predictions for reasoning analysis and failure diagnosis.
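
As a minimal sketch of that hop-by-hop loop, with all learned components abstracted into placeholder callables (the names and decomposition below are illustrative, not the paper's exact architecture):

def irn_hops(q, s, attend, predict_rel, update_q, update_s, hops=3):
    # q: question representation; s: reasoning state.
    path = []
    for _ in range(hops):
        part = attend(q, s)           # which part of the question to analyze now
        r = predict_rel(part, s)      # relation predicted for the current hop
        q = update_q(q, r)            # fold the analyzed part out of the question
        s = update_s(s, r)            # advance the reasoning state along r
        path.append(r)
    return path, s

Returning the per-hop relation path is what makes the intermediate predictions traceable, supporting the reasoning analysis and failure diagnosis the abstract mentions.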
