We introduce a novel sufficient dimension-reduction (SDR) method that is robust against outliers, based on $\alpha$-distance covariance (dCov). Under very mild conditions on the predictors, the central subspace is estimated effectively by projection onto the Stiefel manifold, retaining the model-free advantage of not having to estimate the link function. We establish the convergence property of the proposed estimator under some regularity conditions. We compare the performance of our method with existing SDR methods through simulation and real-data analysis, and show that our algorithm improves both computational efficiency and effectiveness.
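
For readers unfamiliar with the statistic, the following is a minimal sketch of the usual sample $\alpha$-distance covariance in its V-statistic form: pairwise Euclidean distances are raised to the power $\alpha$, double-centered, and their elementwise products averaged. It is not the authors' estimator and it omits the optimization over the Stiefel manifold; the function names are illustrative assumptions.

```python
import math


def pairwise_distances(samples, alpha):
    """Matrix of Euclidean distances between samples, raised to the power alpha."""
    return [[math.dist(a, b) ** alpha for b in samples] for a in samples]


def double_center(d):
    """Subtract row and column means from each entry, then add the grand mean."""
    n = len(d)
    row_means = [sum(row) / n for row in d]
    col_means = [sum(d[i][j] for i in range(n)) / n for j in range(n)]
    grand_mean = sum(row_means) / n
    return [
        [d[i][j] - row_means[i] - col_means[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def dcov_squared(x, y, alpha=1.0):
    """Squared sample alpha-distance covariance of paired samples x and y.

    x and y are equal-length lists of coordinate tuples; alpha is assumed
    to lie in the open interval (0, 2).
    """
    n = len(x)
    a = double_center(pairwise_distances(x, alpha))
    b = double_center(pairwise_distances(y, alpha))
    return sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n**2


# Hypothetical usage: predictors projected onto one direction vs. a response.
projected_x = [(0.0,), (1.0,), (2.0,), (4.0,)]
response = [(0.1,), (0.9,), (4.2,), (15.8,)]
score = dcov_squared(projected_x, response, alpha=0.5)
```

In an SDR setting, a score of this kind would be evaluated on projected predictors and maximized over the projection; choosing $\alpha < 1$ damps the influence of large distances, which is one plausible reading of the robustness claim.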

Related content

Estimation/estimator

MoDELS · Language modeling · Performer · Large language models · Mathematics ·

March 15, 2024

Mastering Text, Code and Math Simultaneously via Fusing Highly Specialized Language Models

Ning Ding,Yulin Chen,Ganqu Cui,Xingtai Lv,Ruobing Xie,Bowen Zhou,Zhiyuan Liu,Maosong Sun

The underlying data distributions of natural language, programming code, and mathematical symbols vary vastly, presenting a complex challenge for large language models (LLMs) that strive to achieve high performance across all three domains simultaneously. Achieving a very high level of proficiency for an LLM within a specific domain often requires extensive training with relevant corpora, which is typically accompanied by a sacrifice in performance in other domains. In this paper, we propose to directly fuse models that are already highly specialized. The proposed fusing framework, UltraFuser, consists of three distinct specialists that are already sufficiently trained on language, coding, and mathematics. A token-level gating mechanism is introduced to blend the specialists' outputs. A two-stage training strategy accompanied by balanced sampling is designed to ensure stability. To effectively train the fused model, we further construct a high-quality supervised instruction tuning dataset, UltraChat 2, which includes text, code, and mathematical content. This dataset comprises approximately 300,000 instructions and covers a wide range of topics in each domain. Experiments show that our model can simultaneously achieve mastery of the three crucial domains.
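
The token-level gating idea can be illustrated with a minimal sketch. It assumes the gate emits one score per specialist at each token position and that blending is a softmax-weighted sum of the specialists' logits; the abstract does not specify either detail, so both choices and all function names here are assumptions rather than UltraFuser's actual design.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def blend_token(specialist_logits, gate_scores):
    """Blend one token's logits from several specialists.

    specialist_logits: one logit vector per specialist (same vocabulary size).
    gate_scores: one raw gate score per specialist (assumed to come from a gate).
    """
    weights = softmax(gate_scores)
    vocab_size = len(specialist_logits[0])
    return [
        sum(w * logits[v] for w, logits in zip(weights, specialist_logits))
        for v in range(vocab_size)
    ]


def blend_sequence(per_token_logits, per_token_gate_scores):
    """Apply the token-level blend independently at every position."""
    return [
        blend_token(logits, scores)
        for logits, scores in zip(per_token_logits, per_token_gate_scores)
    ]
```

Because the weights are recomputed per token, a single response can lean on the code specialist for one token and the mathematics specialist for the next.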

Large language models · MoDELS · INFORMS · Output · Estimation/estimator ·

March 15, 2024

Logits of API-Protected LLMs Leak Proprietary Information

Matthew Finlayson,Xiang Ren,Swabha Swayamdipta

The commercialization of large language models (LLMs) has led to the common practice of high-level API-only access to proprietary models. In this work, we show that even with a conservative assumption about the model architecture, it is possible to learn a surprisingly large amount of non-public information about an API-protected LLM from a relatively small number of API queries (e.g., costing under $1,000 for OpenAI's gpt-3.5-turbo). Our findings are centered on one key observation: most modern LLMs suffer from a softmax bottleneck, which restricts the model outputs to a linear subspace of the full output space. We show that this lends itself to a model image or a model signature which unlocks several capabilities with affordable cost: efficiently discovering the LLM's hidden size, obtaining full-vocabulary outputs, detecting and disambiguating different model updates, identifying the source LLM given a single full LLM output, and even estimating the output layer parameters. Our empirical investigations show the effectiveness of our methods, which allow us to estimate the embedding size of OpenAI's gpt-3.5-turbo to be about 4,096. Lastly, we discuss ways that LLM providers can guard against these attacks, as well as how these capabilities can be viewed as a feature (rather than a bug) by allowing for greater transparency and accountability.
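
The key observation can be made concrete with a toy sketch. It assumes the attacker observes raw logits of the form W·h, where W (vocabulary size × hidden size) and h are hypothetical names for the output-layer matrix and the final hidden state. Stacking more outputs than the hidden size and computing the rank then recovers the hidden size. Exact rational arithmetic keeps the toy deterministic; real API outputs are noisy floating-point log-probabilities, so a numerical rank (for example, from singular values) would be needed instead.

```python
from fractions import Fraction


def logits_for(w, h):
    """Full-vocabulary logits W·h for one hidden state h."""
    return [sum(w_row[k] * h[k] for k in range(len(h))) for w_row in w]


def matrix_rank(rows):
    """Rank of a matrix via exact Gaussian elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        # Find a row at or below the current rank with a nonzero entry in this column.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            if factor != 0:
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def estimate_hidden_size(outputs):
    """The stacked outputs span at most hidden-size dimensions, so their rank estimates it."""
    return matrix_rank(outputs)
```

The estimate is only a lower bound unless the number of collected outputs exceeds the true hidden size, which is why the attack needs on the order of a few thousand queries for a model of that width.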

Distillation · Performer · Unimodal · Model evaluation · Representation ·

March 14, 2024

Speech Emotion Recognition with Distilled Prosodic and Linguistic Affect Representations

Debaditya Shome,Ali Etemad

from arxiv, Accepted at ICASSP 2024

We propose EmoDistill, a novel speech emotion recognition (SER) framework that leverages cross-modal knowledge distillation during training to learn strong linguistic and prosodic representations of emotion from speech. During inference, our method uses only a stream of speech signals to perform unimodal SER, thus reducing computation overhead and avoiding run-time transcription and prosodic feature extraction errors. During training, our method distills information at both the embedding and logit levels from a pair of pre-trained Prosodic and Linguistic teachers that are fine-tuned for SER. Experiments on the IEMOCAP benchmark demonstrate that our method outperforms other unimodal and multimodal techniques by a considerable margin, and achieves state-of-the-art performance of 77.49% unweighted accuracy and 78.91% weighted accuracy. Detailed ablation studies demonstrate the impact of each component of our method.
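
Distillation at the logit and embedding levels can be sketched generically as follows. The sketch assumes the common temperature-scaled KL divergence for logits and a mean squared error for embeddings; the abstract does not state which losses EmoDistill uses, so these are stand-ins, and the function names and default temperature are illustrative.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def logit_distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature**2 * kl


def embedding_distillation_loss(student_emb, teacher_emb):
    """Mean squared error between student and teacher embedding vectors."""
    return sum((s - t) ** 2 for s, t in zip(student_emb, teacher_emb)) / len(student_emb)
```

With two teachers, one such pair of losses would be computed against each teacher and the results summed with the ordinary classification loss; at inference time only the student is run.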

Constraint · CC · binary · Lecture notes · Semiotics ·

March 14, 2024

Complexity Classification of Complex-Weighted Counting Acyclic Constraint Satisfaction Problems

Tomoyuki Yamakami

from arxiv, (A4, 10pt, 17 pages) An extended abstract of this current article is scheduled to appear in the Proceedings of the 12th Computing Conference, London, UK, July 11--12, 2024, Lecture Notes in Networks and Systems, Springer-Verlag, 2024

We study the computational complexity of counting constraint satisfaction problems (#CSPs) whose constraints assign complex numbers to Boolean inputs when the corresponding constraint hypergraphs are acyclic. These problems are called acyclic #CSPs or, succinctly, #ACSPs. We wish to determine the computational complexity of all such #ACSPs when arbitrary unary constraints are freely available. Depending on whether we allow or disallow the free use of the specific constraint XOR (binary disequality), we present two complexity classifications of the #ACSPs according to the types of constraints used for the problems. When XOR is freely available, we obtain a complete dichotomy classification. In contrast, when XOR is not available for free, we obtain a trichotomy classification. To deal with the acyclic nature of constraints in those classifications, we develop a new technical tool called acyclic-T-constructibility, or AT-constructibility, and we use it to analyze a complexity upper bound for each #ACSP.

MoDELS · INFORMS · Optimizer · Context window · Language modeling ·

March 14, 2024

RAGGED: Towards Informed Design of Retrieval Augmented Generation Systems

Jennifer Hsia,Afreen Shaikh,Zhiruo Wang,Graham Neubig

Retrieval-augmented generation (RAG) greatly benefits language models (LMs) by providing additional context for tasks such as document-based question answering (DBQA). Despite its potential, the power of RAG is highly dependent on its configuration, raising the question: What is the optimal RAG configuration? To answer this, we introduce the RAGGED framework to analyze and optimize RAG systems. On a set of representative DBQA tasks, we study two classic sparse and dense retrievers, and four top-performing LMs in encoder-decoder and decoder-only architectures. Through RAGGED, we uncover that different models suit substantially varied RAG setups. While encoder-decoder models monotonically improve with more documents, we find decoder-only models can only effectively use < 5 documents, despite often having a longer context window. RAGGED offers further insights into LMs' context utilization habits, where we find that encoder-decoder models rely more on contexts and are thus more sensitive to retrieval quality, while decoder-only models tend to rely on knowledge memorized during training.

Prompt · Better · MoDELS · INTERACT · Performer ·

March 14, 2024

Better Zero-Shot Reasoning with Role-Play Prompting

Aobo Kong,Shiwan Zhao,Hao Chen,Qicheng Li,Yong Qin,Ruiqi Sun,Xin Zhou,Enzhi Wang,Xiaohang Dong

from arxiv, NAACL 2024, Main Conference

Modern large language models (LLMs) exhibit a remarkable capacity for role-playing, enabling them to embody not only human characters but also non-human entities. This versatility allows them to simulate complex human-like interactions and behaviors within various contexts, as well as to emulate specific objects or systems. While these capabilities have enhanced user engagement and introduced novel modes of interaction, the influence of role-playing on LLMs' reasoning abilities remains underexplored. In this study, we introduce a strategically designed role-play prompting methodology and assess its performance under the zero-shot setting across twelve diverse reasoning benchmarks. Our empirical results illustrate that role-play prompting consistently surpasses the standard zero-shot approach across most datasets. Notably, in experiments conducted using ChatGPT, accuracy on AQuA rises from 53.5% to 63.8%, and on Last Letter from 23.8% to 84.2%. Upon further comparison with the Zero-Shot-CoT technique, which prompts the model to "think step by step", our study demonstrates that role-play prompting acts as a more effective trigger for the CoT process. This highlights its potential to augment the reasoning capabilities of LLMs. We release our code at //github.com/NKU-HLT/Role-Play-Prompting.

Sample · Discretization · Batch Size · Computational learning theory ·

March 13, 2024

Tight Group-Level DP Guarantees for DP-SGD with Sampling via Mixture of Gaussians Mechanisms

Arun Ganesh

from arxiv, v2: Added links to open-source implementation of PLD accounting for MoG mechanisms

We give a procedure for computing group-level $(\epsilon, \delta)$-DP guarantees for DP-SGD, when using Poisson sampling or fixed batch size sampling. Up to discretization errors in the implementation, the DP guarantees computed by this procedure are tight (assuming we release every intermediate iterate).

Principle · Class · Scenario · CASES · Representation ·

March 13, 2024

Point-to-set Principle and Constructive Dimension Faithfulness

Satyadev Nandakumar,Subin Pulari,Akhil S

We introduce a constructive analogue of $\Phi$-dimension, a notion of Hausdorff dimension developed using a restricted class of coverings of a set. A class of coverings $\Phi$ is said to be "faithful" to Hausdorff dimension if the $\Phi$-dimension and Hausdorff dimension coincide for every set. We prove a Point-to-Set Principle for $\Phi$-dimension, through which we get Point-to-Set Principles for Hausdorff dimension, continued-fraction dimension, and the dimension of Cantor coverings as special cases. Using the Point-to-Set Principle for Cantor coverings and a new technique for the construction of sequences satisfying a certain Kolmogorov complexity condition, we show that the notions of faithfulness of Cantor coverings at the Hausdorff and constructive levels are equivalent. We adapt the result by Albeverio, Ivanenko, Lebid, and Torbin to derive the necessary and sufficient conditions for the constructive dimension faithfulness of the coverings generated by the Cantor series expansion. This condition yields two general classes of representations of reals: one whose constructive dimensions are equivalent to the constructive Hausdorff dimensions, and another whose effective dimensions differ from the effective Hausdorff dimensions, thereby completely classifying Cantor series expansions of reals.

INFORMS · Robustness · Optimizer · Joint distribution · binary ·

March 13, 2024

Robust Decision Aggregation with Adversarial Experts

Yongkang Guo,Yuqing Kong

We consider a binary decision aggregation problem in the presence of both truthful and adversarial experts. The truthful experts will report their private signals truthfully given proper incentives, while the adversarial experts can report arbitrarily. The decision maker needs to design a robust aggregator to forecast the true state of the world based on the reports of experts. The decision maker does not know the specific information structure, which is a joint distribution of signals, states, and strategies of adversarial experts. We want to find the optimal aggregator minimizing regret under the worst information structure. The regret is defined as the difference in expected loss between the aggregator and a benchmark that makes the optimal decision given the joint distribution and the reports of truthful experts. We prove that when the truthful experts are symmetric and adversarial experts are not too numerous, the truncated mean is optimal; that is, we remove some of the lowest and highest reports and average the remaining ones. Moreover, for many settings, the optimal aggregators are in the family of piecewise linear functions. The regret is independent of the total number of experts and depends only on the ratio of adversaries. We evaluate our aggregators by numerical experiments in an ensemble learning task. We also obtain some negative results for the aggregation problem with adversarial experts under some more general information structures and experts' report spaces.
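
The truncated mean described above can be sketched in a few lines. The sketch assumes the same number k of reports is dropped from each end; how k should be chosen from the number of adversaries is not stated in the abstract, so k is left as a caller-supplied parameter, and the function name is illustrative.

```python
def truncated_mean(reports, k):
    """Drop the k lowest and k highest reports, then average the rest."""
    if k < 0 or 2 * k >= len(reports):
        raise ValueError("k must satisfy 0 <= 2k < number of reports")
    kept = sorted(reports)[k:len(reports) - k]
    return sum(kept) / len(kept)


# Hypothetical usage: three truthful forecasts plus two extreme adversarial reports.
forecast = truncated_mean([0.0, 0.4, 0.5, 0.6, 1.0], k=1)
```

As long as at most k adversaries sit on either side, their reports are either discarded or bounded by truthful ones, which is the intuition behind the robustness of this aggregator.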

INTERACT · Peak · FAST · Graph convolutional neural network/graph convolutional network · MoDELS ·

March 16, 2019

Fast Interactive Object Annotation with Curve-GCN

Huan Ling,Jun Gao,Amlan Kar,Wenzheng Chen,Sanja Fidler

from arxiv, In Computer Vision and Pattern Recognition (CVPR), Long Beach, US, 2019

Manually labeling objects by tracing their boundaries is a laborious process. In Polygon-RNN++ the authors proposed Polygon-RNN that produces polygonal annotations in a recurrent manner using a CNN-RNN architecture, allowing interactive correction via humans-in-the-loop. We propose a new framework that alleviates the sequential nature of Polygon-RNN, by predicting all vertices simultaneously using a Graph Convolutional Network (GCN). Our model is trained end-to-end. It supports object annotation by either polygons or splines, facilitating labeling efficiency for both line-based and curved objects. We show that Curve-GCN outperforms all existing approaches in automatic mode, including the powerful PSP-DeepLab and is significantly more efficient in interactive mode than Polygon-RNN++. Our model runs at 29.3ms in automatic, and 2.6ms in interactive mode, making it 10x and 100x faster than Polygon-RNN++.