
MoDELS · Language Modeling · Scaling · Extensibility · Excel ·

June 3, 2024

MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies

Shengding Hu,Yuge Tu,Xu Han,Chaoqun He,Ganqu Cui,Xiang Long,Zhi Zheng,Yewei Fang,Yuxiang Huang,Weilin Zhao,Xinrong Zhang,Zheng Leng Thai,Kaihuo Zhang,Chongyi Wang,Yuan Yao,Chenyang Zhao,Jie Zhou,Jie Cai,Zhongwu Zhai,Ning Ding,Chao Jia,Guoyang Zeng,Dahai Li,Zhiyuan Liu,Maosong Sun

From arXiv; revised according to peer review

The burgeoning interest in developing Large Language Models (LLMs) with up to a trillion parameters has been met with concerns regarding resource efficiency and practical expense, particularly given the immense cost of experimentation. This scenario underscores the importance of exploring the potential of Small Language Models (SLMs) as a resource-efficient alternative. In this context, we introduce MiniCPM, specifically the 1.2B and 2.4B non-embedding parameter variants, which not only excel in their respective categories but also demonstrate capabilities on par with 7B-13B LLMs. While focusing on SLMs, our approach exhibits scalability in both the model and data dimensions for future LLM research. Regarding model scaling, we employ extensive model wind tunnel experiments for stable and optimal scaling. For data scaling, we introduce a Warmup-Stable-Decay (WSD) learning rate scheduler (LRS), conducive to continuous training and domain adaptation. We present an in-depth analysis of the intriguing training dynamics that occur under the WSD LRS. With the WSD LRS, we can efficiently study the data-model scaling law without extensive retraining experiments along both the model and data axes, from which we derive a much higher compute-optimal data-to-model ratio than the Chinchilla-optimal one. Additionally, we introduce the MiniCPM family, including MiniCPM-DPO, MiniCPM-MoE, and MiniCPM-128K, whose excellent performance further cements MiniCPM's foundation in diverse SLM applications. MiniCPM models are publicly available at //github.com/OpenBMB/MiniCPM .
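The WSD schedule described above can be sketched as a piecewise function of the training step. This is a minimal sketch, not MiniCPM's released code. The linear warmup, the linear decay shape, and the parameter names (`peak_lr`, `warmup_steps`, `decay_start`, `decay_steps`, `min_lr`) are assumptions for illustration. Because the stable phase has no fixed end, a decay branch can be started from any stable-phase checkpoint, which is what makes this kind of schedule convenient for continued training.

```python
def wsd_lr(step: int, peak_lr: float, warmup_steps: int,
           decay_start: int, decay_steps: int, min_lr: float = 0.0) -> float:
    """Learning rate at `step` under a Warmup-Stable-Decay schedule (sketch).

    Assumptions: linear warmup from 0, constant rate until `decay_start`,
    then linear decay to `min_lr` over `decay_steps` steps.
    Requires warmup_steps <= decay_start and decay_steps > 0.
    """
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 toward the peak rate.
        return peak_lr * step / warmup_steps
    if step < decay_start:
        # Stable: hold the peak rate; the end of this phase need not be fixed in advance.
        return peak_lr
    # Decay: anneal from peak_lr to min_lr, then stay at min_lr.
    progress = min(1.0, (step - decay_start) / decay_steps)
    return peak_lr + (min_lr - peak_lr) * progress


# Two runs that share the warmup and stable phases but branch into decay at
# different steps (hypothetical numbers): their schedules agree up to step 100.
short_run = [wsd_lr(s, 1e-2, 10, 100, 20) for s in range(130)]
long_run = [wsd_lr(s, 1e-2, 10, 300, 20) for s in range(330)]
assert short_run[:100] == long_run[:100]
```

Only the decay branch differs between the two runs, so in this sketch a single stable-phase run can serve several training budgets.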

Related Content

MoDELS

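The ACM/IEEE 23rd International Conference on Model Driven Engineering Languages and Systems (MODELS) is the premier conference series for model-driven software and systems engineering, organized with the support of ACM SIGSOFT and IEEE TCSE. Since 1998, MODELS has covered all aspects of modeling, from languages and methods to tools and applications. MODELS attendees come from diverse backgrounds, including researchers, academics, engineers, and industry professionals. MODELS 2019 is a forum where participants can exchange cutting-edge research results and innovative practical experience on modeling and model-driven software and systems. This year's edition will give the modeling community an opportunity to further advance the foundations of modeling and to present innovative applications of modeling in emerging areas such as cyber-physical systems, embedded systems, socio-technical systems, cloud computing, big data, machine learning, security, open source, and sustainability. Official website:

Controller · Learning · Language Modeling · Continuity ·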

July 15, 2024

Lifelong Robot Library Learning: Bootstrapping Composable and Generalizable Skills for Embodied Control with Language Models

Georgios Tziafas,Hamidreza Kasaei

From arXiv; ICRA 2024

Large Language Models (LLMs) have emerged as a new paradigm for embodied reasoning and control, most recently by generating robot policy code that utilizes a custom library of vision and control primitive skills. However, prior work fixes its skill library and steers the LLM with carefully hand-crafted prompt engineering, limiting the agent to a stationary range of addressable tasks. In this work, we introduce LRLL, an LLM-based lifelong learning agent that continuously grows the robot skill library to tackle manipulation tasks of ever-growing complexity. LRLL achieves this with four novel contributions: 1) a soft memory module that allows dynamic storage and retrieval of past experiences to serve as context, 2) a self-guided exploration policy that proposes new tasks in simulation, 3) a skill abstractor that distills recent experiences into new library skills, and 4) a lifelong learning algorithm that enables human users to bootstrap new skills with minimal online interaction. LRLL continuously transfers knowledge from the memory to the library, building composable, general and interpretable policies while bypassing gradient-based optimization, thus relieving the learner from catastrophic forgetting. Empirical evaluation in a simulated tabletop environment shows that LRLL outperforms end-to-end and vanilla LLM approaches in the lifelong setup while learning skills that are transferable to the real world. Project material will become available at the webpage //gtziafas.github.io/LRLL_project.
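The soft memory module (contribution 1 above) can be pictured as a store of past (task, program) pairs. The entries most similar to a new task are retrieved and placed in the LLM prompt as context. The sketch below is hypothetical:

- The class and function names are assumptions.
- The skill calls in the example programs are assumptions.
- The word-overlap (Jaccard) similarity stands in for whatever embedding-based retrieval LRLL actually uses.

```python
from dataclasses import dataclass, field


@dataclass
class Experience:
    task: str     # natural-language task description
    program: str  # policy code that solved the task


@dataclass
class SoftMemory:
    """Toy experience store; word-overlap similarity stands in for embeddings."""
    items: list = field(default_factory=list)

    def add(self, task: str, program: str) -> None:
        self.items.append(Experience(task, program))

    @staticmethod
    def similarity(a: str, b: str) -> float:
        # Jaccard similarity between the lowercase word sets of two descriptions.
        wa, wb = set(a.lower().split()), set(b.lower().split())
        union = wa | wb
        return len(wa & wb) / len(union) if union else 0.0

    def retrieve(self, query: str, k: int = 2) -> list:
        # The k past experiences most similar to the query, best first.
        ranked = sorted(self.items,
                        key=lambda e: self.similarity(query, e.task),
                        reverse=True)
        return ranked[:k]


def build_prompt(memory: SoftMemory, query: str, k: int = 2) -> str:
    """Place retrieved experiences before the new task as in-context examples."""
    parts = [f"# Task: {e.task}\n{e.program}" for e in memory.retrieve(query, k)]
    parts.append(f"# Task: {query}\n")
    return "\n\n".join(parts)


mem = SoftMemory()
mem.add("pick up the red block", "pick('red block')")  # hypothetical skill calls
mem.add("open the drawer", "open_drawer()")
mem.add("stack the red block on the blue block", "stack('red block', 'blue block')")
print(build_prompt(mem, "put the red block on the green block", k=1))
```

In this toy version the memory only grows by `add`. Distilling retrieved programs into new library skills (LRLL's skill abstractor) is not modeled.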

Decomposed · AI · MoDELS · Biased · Identical ·

July 15, 2024

Laypeople's Egocentric Perceptions of Copyright for AI-Generated Art

Gabriel Lima,Nina Grgić-Hlača,Elissa Redmiles

Recent breakthroughs in generative AI (GenAI) have fueled debates concerning the status of AI-generated creations under copyright law. This research investigates laypeople's perceptions ($N$ = 424) of AI-generated art concerning factors associated with copyright protection. Inspired by prior work suggesting that people show egocentric biases when evaluating their own creative outputs, we also test if the same holds for AI-generated art. Namely, we study the differences between the perceptions of those who have something to gain from copyright protection -- creators of AI-generated art -- and uninvested third parties. To answer our research questions, we held an incentivized AI art competition, in which some participants used a GenAI model to generate images for consideration while others evaluated these submissions. We find that participants are most likely to attribute authorship and copyright over AI-generated images to the users who prompted the AI system to generate the image and the artists whose creations were used for training the AI model. We also find that participants egocentrically favored their own art over other participants' art and rated their own creations higher than other people evaluated them. Moreover, our results suggest that people judge their own AI-generated art more favorably with respect to some factors (creativity and effort) but not others (skills). Our findings have implications for future debates concerning the potential copyright protection of AI-generated outputs.

Networking · Convolution · Transform · Code · Reducible ·

July 13, 2024

WeConvene: Learned Image Compression with Wavelet-Domain Convolution and Entropy Model

Haisheng Fu,Jie Liang,Zhenman Fang,Jingning Han,Feng Liang,Guohe Zhang

From arXiv; 16 pages, ECCV 2024

Recently, learned image compression (LIC) has achieved great progress and has even outperformed traditional approaches based on the DCT or the discrete wavelet transform (DWT). However, LIC mainly reduces spatial redundancy in the autoencoder networks and entropy coding, but has not explicitly removed the frequency-domain correlation as the DCT or DWT does. To leverage the best of both worlds, we propose a surprisingly simple but efficient framework, which introduces the DWT to both the convolution layers and the entropy coding of CNN-based LIC. First, in both the core and hyperprior autoencoder networks, we propose a Wavelet-domain Convolution (WeConv) module, which performs convolution after the DWT and then converts the data back to the spatial domain via the inverse DWT. This module is used at selected layers in a CNN to explicitly reduce the frequency-domain correlation and make the signal sparser in the DWT domain. We also propose a wavelet-domain Channel-wise Auto-Regressive entropy Model (WeChARM), in which the output latent representations from the encoder network are first transformed by the DWT before quantization and entropy coding are applied, as in the traditional paradigm. Moreover, the entropy coding is split into two steps: we first code all low-frequency DWT coefficients, and then use them as a prior to code the high-frequency coefficients. Channel-wise entropy coding is further used in each step. By combining WeConv and WeChARM, the proposed WeConvene scheme achieves superior R-D performance compared with other state-of-the-art LIC methods as well as the latest H.266/VVC. On the Kodak dataset, for a baseline network with a -0.4% BD-Rate saving over H.266/VVC, introducing WeConv with the simplest Haar transform improves the saving to -4.7%. This is quite impressive given the simplicity of the Haar transform. Enabling Haar-based WeChARM entropy coding further boosts the saving to -8.2%.
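The WeConv idea, processing in the wavelet domain and then returning to the spatial domain, can be illustrated with a one-level 2D Haar transform. This pure-Python sketch assumes the orthonormal Haar basis on 2x2 blocks and is not the paper's implementation. `wavelet_domain_apply` is a hypothetical helper in which `band_op` stands in for the learned convolution. On a block-constant image all three detail subbands are zero, a small example of the sparsity the DWT is meant to expose.

```python
def haar2d(img):
    """One-level orthonormal 2D Haar DWT; height and width must be even.

    Returns (ll, lh, hl, hh), each half the size of `img` in both dimensions.
    Naming of the two mixed subbands varies between references.
    """
    h, w = len(img), len(img[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    ll, lh, hl, hh = ([[0.0] * (w // 2) for _ in range(h // 2)] for _ in range(4))
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            ll[i // 2][j // 2] = (a + b + c + d) / 2  # local average
            lh[i // 2][j // 2] = (a - b + c - d) / 2  # left minus right column
            hl[i // 2][j // 2] = (a + b - c - d) / 2  # top minus bottom row
            hh[i // 2][j // 2] = (a - b - c + d) / 2  # diagonal difference
    return ll, lh, hl, hh


def ihaar2d(ll, lh, hl, hh):
    """Inverse of haar2d: rebuilds each 2x2 block from its four coefficients."""
    h2, w2 = len(ll), len(ll[0])
    img = [[0.0] * (2 * w2) for _ in range(2 * h2)]
    for i in range(h2):
        for j in range(w2):
            s, x, y, z = ll[i][j], lh[i][j], hl[i][j], hh[i][j]
            img[2 * i][2 * j] = (s + x + y + z) / 2
            img[2 * i][2 * j + 1] = (s - x + y - z) / 2
            img[2 * i + 1][2 * j] = (s + x - y - z) / 2
            img[2 * i + 1][2 * j + 1] = (s - x - y + z) / 2
    return img


def wavelet_domain_apply(img, band_op):
    """Hypothetical WeConv-style wrapper: DWT, per-subband op, inverse DWT."""
    return ihaar2d(*(band_op(band) for band in haar2d(img)))


blocky = [[1, 1, 7, 7],
          [1, 1, 7, 7],
          [4, 4, 2, 2],
          [4, 4, 2, 2]]
ll, lh, hl, hh = haar2d(blocky)
print(ll)          # all energy lands in the low-frequency subband
print(lh, hl, hh)  # the detail subbands are all zero
```

Because the basis is orthonormal, the transform preserves energy and is exactly invertible. Any information loss in a real codec therefore comes from quantization, not from the DWT itself.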

FAST · Path · state-of-the-art · DAG · Block ·

July 13, 2024

Mysticeti: Reaching the Limits of Latency with Uncertified DAGs

Kushal Babel,Andrey Chursin,George Danezis,Anastasios Kichidis,Lefteris Kokoris-Kogias,Arun Koshy,Alberto Sonnino,Mingwei Tian

We introduce Mysticeti-C, the first DAG-based Byzantine consensus protocol to achieve the latency lower bound of 3 message rounds. Since Mysticeti-C is built over DAGs, it also achieves high resource efficiency and censorship resistance. Mysticeti-C achieves this latency improvement by avoiding explicit certification of the DAG blocks and by proposing a novel commit rule under which every block can be committed without delay, resulting in optimal latency in the steady state and under crash failures. We further extend Mysticeti-C to Mysticeti-FPC, which incorporates a fast commit path that achieves even lower latency for transferring assets. Unlike prior fast-commit-path protocols, Mysticeti-FPC minimizes the number of signatures and messages by weaving the fast-path transactions into the DAG. This frees up resources, which in turn results in better performance. We prove safety and liveness in a Byzantine context. We evaluate both Mysticeti protocols and compare them with state-of-the-art consensus and fast-path protocols to demonstrate their low latency and resource efficiency, as well as their more graceful degradation under crash failures. Mysticeti-C is the first Byzantine consensus protocol to achieve a WAN latency of 0.5 s for consensus commit while simultaneously maintaining state-of-the-art throughput of over 200k TPS. Finally, we report on integrating Mysticeti-C as the consensus protocol into the Sui blockchain, resulting in over 4x latency reduction.
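Committing a block without explicit certificates can be illustrated by counting references in the DAG. This simplified sketch assumes n = 3f + 1 validators and the usual 2f + 1 quorum, and uses three rules:

- A round-(r+1) block that references the leader counts as a vote.
- A round-(r+2) block that references 2f + 1 votes counts as an implicit certificate.
- The leader is directly committed once 2f + 1 distinct validators have produced certificates.

The `Block` type and function names are hypothetical. The actual Mysticeti-C commit rule includes further cases that this sketch omits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    author: int          # validator index in 0..n-1
    round: int           # DAG round number
    parents: frozenset   # (author, round) ids of the referenced blocks


def quorum(n: int) -> int:
    """2f + 1, assuming n = 3f + 1 validators of which at most f are Byzantine."""
    f = (n - 1) // 3
    return 2 * f + 1


def is_vote(block: Block, leader: Block) -> bool:
    # A next-round block that references the leader implicitly votes for it.
    return block.round == leader.round + 1 and (leader.author, leader.round) in block.parents


def is_certificate(block: Block, leader: Block, dag: dict, n: int) -> bool:
    # A block two rounds later that references a quorum of votes acts as an
    # implicit certificate; no separate certificate message is sent.
    if block.round != leader.round + 2:
        return False
    votes = [p for p in block.parents if p in dag and is_vote(dag[p], leader)]
    return len(votes) >= quorum(n)


def directly_committed(leader: Block, dag: dict, n: int) -> bool:
    # Commit once a quorum of distinct validators has produced certificates.
    certifiers = {b.author for b in dag.values() if is_certificate(b, leader, dag, n)}
    return len(certifiers) >= quorum(n)


# Hypothetical 4-validator DAG in which every block references all blocks
# of the previous round; the leader is validator 0's round-0 block.
n = 4
dag = {(a, 0): Block(a, 0, frozenset()) for a in range(n)}
dag.update({(a, 1): Block(a, 1, frozenset((x, 0) for x in range(n))) for a in range(n)})
dag.update({(a, 2): Block(a, 2, frozenset((x, 1) for x in range(n))) for a in range(n)})
print(directly_committed(dag[(0, 0)], dag, n))
```

In this sketch the leader, its votes, and the certificates occupy three consecutive DAG rounds. Every "signature" is simply a reference inside an ordinary block.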

Agent · Multimodal · Paper · INTERACT · Language Modeling ·

July 12, 2024

Security Matrix for Multimodal Agents on Mobile Devices: A Systematic and Proof of Concept Study

Yulong Yang,Xinshan Yang,Shuaidong Li,Chenhao Lin,Zhengyu Zhao,Chao Shen,Tianwei Zhang

From arXiv; preprint, work in progress

The rapid progress in the reasoning capability of the Multi-modal Large Language Models (MLLMs) has triggered the development of autonomous agent systems on mobile devices. MLLM-based mobile agent systems consist of perception, reasoning, memory, and multi-agent collaboration modules, enabling automatic analysis of user instructions and the design of task pipelines with only natural language and device screenshots as inputs. Despite the increased human-machine interaction efficiency, the security risks of MLLM-based mobile agent systems have not been systematically studied. Existing security benchmarks for agents mainly focus on Web scenarios, and the attack techniques against MLLMs are also limited in the mobile agent scenario. To close these gaps, this paper proposes a mobile agent security matrix covering 3 functional modules of the agent systems. Based on the security matrix, this paper proposes 4 realistic attack paths and verifies these attack paths through 8 attack methods. By analyzing the attack results, this paper reveals that MLLM-based mobile agent systems are not only vulnerable to multiple traditional attacks, but also raise new security concerns previously unconsidered. This paper highlights the need for security awareness in the design of MLLM-based systems and paves the way for future research on attacks and defense methods.

Performer · Mutually Independent · SIR · Link Function · Minimax ·

July 12, 2024

On the Structural Dimension of Sliced Inverse Regression

Dongming Huang,Songtao Tian,Qian Lin

From arXiv; 75 pages, 10 figures

In this work, we address the longstanding puzzle that Sliced Inverse Regression (SIR) often performs poorly for sufficient dimension reduction when the structural dimension $d$ (the dimension of the central space) exceeds 4. We first show that in the multiple index model $Y=f( \mathbf{P} \boldsymbol{X})+\epsilon$ where $\boldsymbol{X}$ is a $p$-standard normal vector, $\epsilon$ is an independent noise, and $\mathbf{P}$ is a projection operator from $\mathbb R^{p}$ to $\mathbb R^{d}$, if the link function $f$ follows the law of a Gaussian process, then with high probability, the $d$-th eigenvalue $\lambda_{d}$ of $\mathrm{Cov}\left[\mathbb{E}(\boldsymbol{X}\mid Y)\right]$ satisfies $\lambda_{d}\leq C e^{-\theta d}$ for some positive constants $C$ and $\theta$. We then focus on the low signal regime where $\lambda_{d}$ can be arbitrarily small and not larger than $d^{-8.1}$, and prove that the minimax risk of estimating the central space is lower bounded by $\frac{dp}{n\lambda_{d}}$. Combining these two results, we provide a convincing explanation for the poor performance of SIR when $d$ is large, a phenomenon that has perplexed researchers for nearly three decades. The technical tools developed here may be of independent interest for studying other sufficient dimension reduction methods.
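Substituting the eigenvalue bound into the minimax lower bound makes the combined argument explicit. This is a heuristic sketch resting on two assumptions:

- Both results hold for the same model.
- The $\lambda_{d} \leq C e^{-\theta d}$ regime falls within the low-signal regime of the minimax result.

$\mathcal{R}_n$ (the minimax risk with $n$ samples) and the tolerance $\varepsilon$ are notation introduced here.

```latex
\mathcal{R}_n \;\gtrsim\; \frac{dp}{n\lambda_{d}}
\;\geq\; \frac{dp}{n\,C e^{-\theta d}}
\;=\; \frac{dp\,e^{\theta d}}{C\,n},
\qquad\text{so}\qquad
\mathcal{R}_n \leq \varepsilon \;\Longrightarrow\; n \;\gtrsim\; \frac{dp\,e^{\theta d}}{C\,\varepsilon}.
```

Under these assumptions, the sample size needed for a fixed accuracy grows exponentially in $d$. This is consistent with SIR degrading as the structural dimension increases.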

Diversity · MoDELS · Language Modeling · Block · Inference ·

July 11, 2024

MCSD: An Efficient Language Model with Diverse Fusion

Hua Yang,Duohai Li,Shiman Li

From arXiv; 8 pages, 9 figures

Transformers excel in Natural Language Processing (NLP) due to their prowess in capturing long-term dependencies, but their resource consumption grows quadratically with sequence length. To address this challenge, we propose the MCSD model, an efficient language model with linear scaling and fast inference speed. The MCSD model leverages diverse feature fusion, primarily through the multi-channel slope and decay (MCSD) block, to robustly represent features. This block comprises slope and decay sections that extract features across diverse temporal receptive fields, facilitating the capture of both local and global information. In addition, the MCSD block conducts element-wise fusion of diverse features to further enhance its fine-grained feature extraction capability. For inference, we formulate the inference process as a recurrent representation, reducing space complexity to $O(1)$ and time complexity to $O(N)$, respectively. Our experiments show that MCSD attains higher throughput and lower GPU memory consumption than Transformers, while maintaining performance comparable to larger-scale language models on benchmark tests. These attributes position MCSD as a promising base for edge deployment and embodied intelligence.
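The step from a parallel form to an $O(1)$-space recurrent form can be illustrated with an exponentially decayed sum of inputs. This is a generic sketch, not the MCSD block itself; the decay factors and function names are assumptions. The multi-channel variant only hints at how different decay rates give different temporal receptive fields: a small decay emphasizes recent tokens, while a decay close to 1 retains long-range context.

```python
def decayed_sum_parallel(xs, gamma):
    """y_t = sum over s <= t of gamma**(t - s) * x_s, computed directly in O(N^2) time."""
    return [sum(gamma ** (t - s) * xs[s] for s in range(t + 1)) for t in range(len(xs))]


def decayed_sum_recurrent(xs, gamma):
    """Same outputs via h_t = gamma * h_{t-1} + x_t: O(1) state and O(N) time."""
    h, ys = 0.0, []
    for x in xs:
        h = gamma * h + x  # the only state carried between steps
        ys.append(h)
    return ys


def multi_decay_recurrent(xs, gammas):
    """One recurrent state per decay rate (hypothetical multi-channel variant)."""
    states = [0.0] * len(gammas)
    outputs = []
    for x in xs:
        states = [g * h + x for g, h in zip(gammas, states)]
        outputs.append(list(states))
    return outputs


xs = [1.0, -2.0, 0.5, 3.0]
print(decayed_sum_parallel(xs, 0.9))
print(decayed_sum_recurrent(xs, 0.9))  # same values up to rounding
print(multi_decay_recurrent(xs, [0.5, 0.99]))
```

The recurrent form touches each input once and keeps one number per channel, which is the property that makes constant-memory streaming inference possible.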

MoDELS · Language Modeling · Dataset · Large Language Models · Diversity ·

July 10, 2024

AlpaCare: Instruction-tuned Large Language Models for Medical Application

Xinlu Zhang,Chenxin Tian,Xianjun Yang,Lichang Chen,Zekun Li,Linda Ruth Petzold

Instruction-finetuning (IFT) has become crucial in aligning Large Language Models (LLMs) with diverse human needs and has shown great potential in medical applications. However, previous studies mainly fine-tune LLMs on biomedical datasets with limited diversity, which often rely on benchmarks or narrow task scopes, and hence significantly limit their medical instruction-following ability and generalizability. To bridge this gap, we propose creating a diverse, machine-generated medical IFT dataset, MedInstruct-52k, using GPT-4 and ChatGPT with a high-quality expert-curated seed set. We then fine-tune LLaMA-series models on the dataset to develop AlpaCare. Despite using a smaller domain-specific dataset than previous medical LLMs, AlpaCare not only demonstrates superior performance on medical applications, with up to 38.1% absolute gain over the best baselines in medical free-form instruction evaluations, but also achieves 6.7% absolute gains averaged over multiple general-domain benchmarks. Human evaluation further shows that AlpaCare consistently outperforms the best baselines in terms of both correctness and helpfulness. We offer public access to our data, model, and codebase at //github.com/XZhang97666/AlpaCare.

Agent · INTERACT · Round · Multimodal · AI ·

January 7, 2024

Agent AI: Surveying the Horizons of Multimodal Interaction

Zane Durante,Qiuyuan Huang,Naoki Wake,Ran Gong,Jae Sung Park,Bidipta Sarkar,Rohan Taori,Yusuke Noda,Demetri Terzopoulos,Yejin Choi,Katsushi Ikeuchi,Hoi Vo,Li Fei-Fei,Jianfeng Gao

Multi-modal AI systems will likely become a ubiquitous presence in our everyday lives. A promising approach to making these systems more interactive is to embody them as agents within physical and virtual environments. At present, systems leverage existing foundation models as the basic building blocks for the creation of embodied agents. Embedding agents within such environments facilitates the ability of models to process and interpret visual and contextual data, which is critical for the creation of more sophisticated and context-aware AI systems. For example, a system that can perceive user actions, human behavior, environmental objects, audio expressions, and the collective sentiment of a scene can be used to inform and direct agent responses within the given environment. To accelerate research on agent-based multimodal intelligence, we define "Agent AI" as a class of interactive systems that can perceive visual stimuli, language inputs, and other environmentally-grounded data, and can produce meaningful embodied action with infinite agent. In particular, we explore systems that aim to improve agents based on next-embodied action prediction by incorporating external knowledge, multi-sensory inputs, and human feedback. We argue that by developing agentic AI systems in grounded environments, one can also mitigate the hallucinations of large foundation models and their tendency to generate environmentally incorrect outputs. The emerging field of Agent AI subsumes the broader embodied and agentic aspects of multimodal interactions. Beyond agents acting and interacting in the physical world, we envision a future where people can easily create any virtual reality or simulated scene and interact with agents embodied within the virtual environment.

Pruning · Better · CAP · contrastive · MoDELS ·

December 14, 2021

From Dense to Sparse: Contrastive Pruning for Better Pre-trained Language Model Compression

Runxin Xu,Fuli Luo,Chengyu Wang,Baobao Chang,Jun Huang,Songfang Huang,Fei Huang

From arXiv; accepted to AAAI 2022

Pre-trained Language Models (PLMs) have achieved great success in various Natural Language Processing (NLP) tasks under the pre-training and fine-tuning paradigm. With their large numbers of parameters, PLMs are computation-intensive and resource-hungry. Hence, model pruning has been introduced to compress large-scale PLMs. However, most prior approaches consider only task-specific knowledge for downstream tasks and ignore the essential task-agnostic knowledge during pruning, which may cause catastrophic forgetting and lead to poor generalization. To maintain both task-agnostic and task-specific knowledge in the pruned model, we propose ContrAstive Pruning (CAP) under the paradigm of pre-training and fine-tuning. It is designed as a general framework, compatible with both structured and unstructured pruning. Unified in contrastive learning, CAP enables the pruned model to learn from the pre-trained model for task-agnostic knowledge and from the fine-tuned model for task-specific knowledge. Besides, to better retain the performance of the pruned model, the snapshots (i.e., the intermediate models at each pruning iteration) also serve as effective supervision for pruning. Our extensive experiments show that adopting CAP consistently yields significant improvements, especially in extremely high sparsity scenarios. With only 3% of model parameters retained (i.e., 97% sparsity), CAP successfully achieves 99.2% and 96.3% of the original BERT performance on the QQP and MNLI tasks. In addition, our probing experiments demonstrate that the model pruned by CAP tends to generalize better.
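Contrastive objectives of the kind CAP builds on are often written as an InfoNCE loss. Each example's representation from the pruned model should be closest to the same example's representation from a reference model (the pre-trained or fine-tuned model, per the abstract) and far from the other examples' representations. This sketch is a stand-in under stated assumptions: the abstract does not specify CAP's exact loss, similarity, negatives, or temperature, and `info_nce` and `tau` are hypothetical names.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def info_nce(student, teacher, tau=0.1):
    """Mean InfoNCE loss pulling student[i] toward teacher[i].

    student: one representation per example from the pruned model.
    teacher: representations of the same examples from a reference model;
             teacher[j] for j != i serve as negatives for example i.
    """
    n = len(student)
    total = 0.0
    for i in range(n):
        logits = [cosine(student[i], teacher[j]) / tau for j in range(n)]
        m = max(logits)  # subtract the max before exponentiating, for stability
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]  # -log softmax probability of the positive
    return total / n


pruned = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9], [1.0, 0.9]]     # teacher agrees with each example
mismatched = [[0.1, 0.9], [1.0, 0.9], [0.9, 0.1]]  # pairs shuffled
print(info_nce(pruned, aligned), info_nce(pruned, mismatched))
```

In a CAP-like setup, a loss of this shape could be applied once against the pre-trained model and once against the fine-tuned model. How CAP actually combines those terms, and how it uses the pruning snapshots, is not shown here.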



北京阿比特科技有限公司

Registered address: Room 301-191, Floor 3, Building 2, No. 18 Yangfangdian Road, Haidian District, Beijing
