
In this paper, we introduce LLaVA-$\phi$ (LLaVA-Phi), an efficient multi-modal assistant that harnesses the power of the recently advanced small language model, Phi-2, to facilitate multi-modal dialogues. LLaVA-Phi marks a notable advancement in the realm of compact multi-modal models. It demonstrates that even smaller language models, with as few as 2.7B parameters, can effectively engage in intricate dialogues that integrate both textual and visual elements, provided they are trained with high-quality corpora. Our model delivers commendable performance on publicly available benchmarks that encompass visual comprehension, reasoning, and knowledge-based perception. Beyond its remarkable performance in multi-modal dialogue tasks, our model opens new avenues for applications in time-sensitive environments and systems that require real-time interaction, such as embodied agents. It highlights the potential of smaller language models to achieve sophisticated levels of understanding and interaction, while maintaining greater resource efficiency. The project is available at //github.com/zhuyiche/llava-phi.

Related content

Language modeling


Model evaluation · Processing (programming language) · MoDELS · Supervision · Language modeling

February 19, 2024

Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations

Peiyi Wang,Lei Li,Zhihong Shao,R. X. Xu,Damai Dai,Yifei Li,Deli Chen,Y. Wu,Zhifang Sui

from arxiv, Add Step-by-Step reinforcement learning results

In this paper, we present an innovative process-oriented math process reward model called Math-Shepherd, which assigns a reward score to each step of math problem solutions. The training of Math-Shepherd is achieved using automatically constructed process-wise supervision data, breaking the bottleneck of heavy reliance on manual annotation in existing work. We explore the effectiveness of Math-Shepherd in two scenarios: 1) Verification: Math-Shepherd is utilized for reranking multiple outputs generated by Large Language Models (LLMs); 2) Reinforcement Learning: Math-Shepherd is employed to reinforce LLMs with step-by-step Proximal Policy Optimization (PPO). With Math-Shepherd, a series of open-source LLMs demonstrates exceptional performance. For instance, the step-by-step PPO with Math-Shepherd significantly improves the accuracy of Mistral-7B (from 77.9% to 84.1% on GSM8K and from 28.6% to 33.0% on MATH). The accuracy can be further enhanced to 89.1% and 43.5% on GSM8K and MATH, respectively, with the verification of Math-Shepherd. We believe that automatic process supervision holds significant potential for the future evolution of LLMs.
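The verification scenario described above, reranking several candidate solutions by their per-step reward scores, can be sketched as follows. This is only an illustration: the step scores are made-up numbers standing in for a process reward model's output, and collapsing them with `min` is an assumption of this sketch, not something the abstract specifies. The names `Candidate`, `solution_score`, and `rerank` are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical data shape: (final answer, reward score for each solution step).
Candidate = Tuple[str, Sequence[float]]


def solution_score(
    step_scores: Sequence[float],
    aggregate: Callable[[Sequence[float]], float] = min,
) -> float:
    """Collapse per-step scores into one solution score.

    Using min (the weakest step decides) is an assumption of this sketch.
    """
    if not step_scores:
        raise ValueError("a solution needs at least one scored step")
    return aggregate(step_scores)


def rerank(candidates: List[Candidate]) -> List[Candidate]:
    """Sort candidates from highest to lowest aggregated score."""
    return sorted(candidates, key=lambda c: solution_score(c[1]), reverse=True)


# Made-up step scores for three sampled solutions to the same problem.
candidates = [
    ("42", [0.9, 0.8, 0.2]),
    ("36", [0.7, 0.7, 0.6]),
    ("40", [0.95, 0.1, 0.9]),
]
ranked = rerank(candidates)
best_answer = ranked[0][0]
print(best_answer)
```

With `min` as the aggregator, a single low-scoring step is enough to push a candidate down the ranking, which is why the solution with uniformly moderate scores is selected here.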

Loss function (machine learning) · Functional · Loss · Estimation/Estimator · Optimizer

February 18, 2024

$α$-Divergence Loss Function for Neural Density Ratio Estimation

Yoshiaki Kitazawa

from arxiv, $\mathcal{T}_{\text{Lip}}$ in Theorem 7.1 (Theorem B.15.) was changed to the set of all locally Lipschitz continuous functions. In the previous version, $\mathcal{T}_{\text{Lip}}$ was defined as the set of all Lipschitz continuous functions, which is unsuitable for the statement of case (ii) in the theorem

Recently, neural networks have produced state-of-the-art results for density-ratio estimation (DRE), a fundamental technique in machine learning. However, existing methods suffer from optimization issues that arise from the loss functions of DRE: a large sample requirement of Kullback--Leibler (KL)-divergence, vanishing of train loss gradients, and biased gradients of the loss functions. Thus, an $\alpha$-divergence loss function ($\alpha$-Div) that offers concise implementation and stable optimization is proposed in this paper. Furthermore, technical justifications for the proposed loss function are presented. The stability of the proposed loss function is empirically demonstrated and the estimation accuracy of DRE tasks is investigated. Additionally, this study presents a sample requirement for DRE using the proposed loss function in terms of the upper bound of $L_1$ error, which relates to the curse of dimensionality, a common problem in high-dimensional DRE tasks.

Graph · Weight · Algorithmica · Mutually independent · Scenario

February 18, 2024

Odd Cycle Transversal on $P_5$-free Graphs in Polynomial Time

Akanksha Agrawal,Paloma T. Lima,Daniel Lokshtanov,Paweł Rzążewski,Saket Saurabh,Roohani Sharma

An independent set in a graph $G$ is a set of pairwise non-adjacent vertices. A graph $G$ is bipartite if its vertex set can be partitioned into two independent sets. In the Odd Cycle Transversal problem, the input is a graph $G$ along with a weight function $w$ associating a rational weight with each vertex, and the task is to find a smallest weight vertex subset $S$ in $G$ such that $G - S$ is bipartite; the weight of $S$ is $w(S) = \sum_{v\in S} w(v)$. We show that Odd Cycle Transversal is polynomial-time solvable on graphs excluding $P_5$ (a path on five vertices) as an induced subgraph. The problem was previously known to be polynomial-time solvable on $P_4$-free graphs and NP-hard on $P_6$-free graphs [Dabrowski, Feghali, Johnson, Paesani, Paulusma and Rzążewski, Algorithmica 2020]. Bonamy, Dabrowski, Feghali, Johnson and Paulusma [Algorithmica 2019] posed the existence of a polynomial-time algorithm on $P_5$-free graphs as an open problem; this was later re-stated by Rzążewski [Dagstuhl Reports, 9(6): 2019] and by Chudnovsky, King, Pilipczuk, Rzążewski, and Spirkl [SIDMA 2021], who gave an algorithm with running time $n^{O(\sqrt{n})}$.
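To make the problem statement concrete, here is a minimal brute-force sketch of weighted Odd Cycle Transversal. It follows the definition directly (try every vertex subset $S$ and test whether $G - S$ is bipartite), so it takes exponential time and is not the polynomial-time algorithm of the paper. The function names and the example weights are assumptions made for illustration.

```python
from collections import deque
from itertools import combinations


def is_bipartite(vertices, edges):
    """Check 2-colorability of the subgraph induced on `vertices` via BFS."""
    adj = {v: set() for v in vertices}
    for u, v in edges:
        if u in adj and v in adj:  # ignore edges touching removed vertices
            adj[u].add(v)
            adj[v].add(u)
    color = {}
    for start in vertices:
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in color:
                    color[v] = 1 - color[u]
                    queue.append(v)
                elif color[v] == color[u]:
                    return False  # an odd cycle forces this conflict
    return True


def min_weight_oct(vertices, edges, weight):
    """Exhaustively find a minimum-weight S such that G - S is bipartite."""
    best_set, best_weight = None, None
    for size in range(len(vertices) + 1):
        for subset in combinations(vertices, size):
            removed = set(subset)
            remaining = [v for v in vertices if v not in removed]
            if is_bipartite(remaining, edges):
                total = sum(weight[v] for v in subset)
                if best_weight is None or total < best_weight:
                    best_set, best_weight = removed, total
    return best_set, best_weight


# Example: a 5-cycle with illustrative integer weights.
vertices = [0, 1, 2, 3, 4]
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
weight = {0: 3, 1: 1, 2: 2, 3: 2, 4: 5}
transversal, total_weight = min_weight_oct(vertices, edges, weight)
print(transversal, total_weight)
```

The 5-cycle is itself an odd cycle, so the empty set does not work, while deleting any single vertex leaves a path. The cheapest vertex is therefore the answer.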

Gossip protocol · INFORMS · Networking · Node · Closed form

February 18, 2024

Age of $(k,n)$-Threshold Signature Scheme on a Gossip Network

Erkan Bayram,Melih Bastopcu,Mohamed-Ali Belabbas,Tamer Başar

We consider information update systems on a gossip network, which consists of a single source and $n$ receiver nodes. The source encrypts the information into $n$ distinct keys with version stamps, sending a unique key to each node. For decryption in a $(k, n)$-Threshold Signature Scheme, each receiver node requires at least $k+1$ different keys with the same version, shared over peer-to-peer connections. We consider two different schemes: a memory scheme (in which the nodes keep the source's current and previous encrypted messages) and a memoryless scheme (in which the nodes are allowed to only keep the source's current message). We measure the "timeliness" of information updates by using the version age of information. Our work focuses on determining closed-form expressions for the time average age of information in a heterogeneous random graph. Our work not only allows us to verify the expected outcome that a memory scheme results in a lower average age compared to a memoryless scheme, but also provides the quantitative difference between the two. In our numerical results, we quantify the value of memory and demonstrate that the advantages of memory diminish with infrequent source updates, frequent gossipping between nodes, or a decrease in $k$ for a fixed number of nodes.

Performer · Mean squared error · Code · Square matrix · Mean

February 17, 2024

Wireless Distributed Matrix-Vector Multiplication using Over-the-Air Computation and Analog Coding

Jinho Choi

from arxiv, 13 pages, 8 figures

In this paper, we propose an over-the-air (OTA)-based approach for distributed matrix-vector multiplications in the context of distributed machine learning (DML). Thanks to OTA computation, the column-wise partitioning of a large matrix enables efficient workload distribution among workers (i.e., local computing nodes) based on their computing capabilities. In addition, without requiring additional bandwidth, it allows the system to remain scalable even as the number of workers increases to mitigate the impact of slow workers, known as stragglers. However, despite the improvements, there are still instances where some workers experience deep fading and become stragglers, preventing them from transmitting their results. By analyzing the mean squared error (MSE), we demonstrate that incorporating more workers in the OTA-based approach leads to MSE reduction without the need for additional radio resources. Furthermore, we introduce an analog coding scheme to further enhance the performance and compare it with conventional coded multiplication (CM) schemes. Through simulations, it is shown that the OTA-based approach achieves comparable performance to CM schemes while potentially requiring fewer radio resources.
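The column-wise partitioning mentioned above rests on a simple identity: if the columns of $A$ are split among workers, each worker's partial product uses only its own columns and the matching entries of $x$, and the partial products sum to $Ax$. The sketch below illustrates that identity, modeling the over-the-air channel as an element-wise sum plus optional Gaussian noise. That channel model is a simplifying assumption of this sketch (it has no fading, power control, or analog coding), and all function names are hypothetical.

```python
import random


def matvec(A, x):
    """Reference product A @ x for a matrix given as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def partial_product(A, x, cols):
    """Worker side: use only the assigned columns and matching entries of x."""
    return [sum(row[j] * x[j] for j in cols) for row in A]


def ota_sum(partials, noise_std=0.0, rng=None):
    """Receiver side: superposition modeled as an element-wise sum plus noise.

    The additive Gaussian noise model is an assumption of this sketch.
    """
    rng = rng or random.Random(0)
    length = len(partials[0])
    return [
        sum(p[i] for p in partials) + rng.gauss(0.0, noise_std)
        for i in range(length)
    ]


A = [[1, 2, 3, 4],
     [5, 6, 7, 8]]
x = [1, 0, -1, 2]

# Unequal workloads: a slower worker gets one column, a faster one gets three.
assignments = [[0], [1, 2, 3]]
partials = [partial_product(A, x, cols) for cols in assignments]

noiseless = ota_sum(partials)             # equals A @ x when noise_std is 0
noisy = ota_sum(partials, noise_std=0.1)  # a perturbed copy of the product
print(matvec(A, x), noiseless)
```

Because the receiver only ever sees the sum, adding more workers changes how the columns are divided but not the number of channel uses in this simplified model.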

Scenario · Branch · Linear · Decomposed · CASE

February 17, 2024

Efficient $Φ$-Regret Minimization with Low-Degree Swap Deviations in Extensive-Form Games

Brian Hu Zhang,Ioannis Anagnostides,Gabriele Farina,Tuomas Sandholm

Recent breakthrough results by Dagan, Daskalakis, Fishelson and Golowich [2023] and Peng and Rubinstein [2023] established an efficient algorithm attaining at most $\epsilon$ swap regret over extensive-form strategy spaces of dimension $N$ in $N^{\tilde O(1/\epsilon)}$ rounds. On the other extreme, Farina and Pipis [2023] developed an efficient algorithm for minimizing the weaker notion of linear-swap regret in $\mathsf{poly}(N)/\epsilon^2$ rounds. In this paper, we take a step toward bridging the gap between those two results. We introduce the set of $k$-mediator deviations, which generalize the untimed communication deviations recently introduced by Zhang, Farina and Sandholm [2024] to the case of having multiple mediators. We develop parameterized algorithms for minimizing the regret with respect to this set of deviations in $N^{O(k)}/\epsilon^2$ rounds. This closes the gap in the sense that $k=1$ recovers linear swap regret, while $k=N$ recovers swap regret. Moreover, by relating $k$-mediator deviations to low-degree polynomials, we show that regret minimization against degree-$k$ polynomial swap deviations is achievable in $N^{O(kd)^3}/\epsilon^2$ rounds, where $d$ is the depth of the game, assuming constant branching factor. For a fixed degree $k$, this is polynomial for Bayesian games and quasipolynomial more broadly when $d = \mathsf{polylog} N$ -- the usual balancedness assumption on the game tree.

Learning · Processing (programming language) · Forward · Extensibility · CASES

February 16, 2024

Batch-ICL: Effective, Efficient, and Order-Agnostic In-Context Learning

Kaiyi Zhang,Ang Lv,Yuhan Chen,Hansen Ha,Tao Xu,Rui Yan

In this paper, by treating in-context learning (ICL) as a meta-optimization process, we explain why LLMs are sensitive to the order of ICL examples. This understanding leads us to the development of Batch-ICL, an effective, efficient, and order-agnostic inference algorithm for ICL. Differing from the standard N-shot learning approach, Batch-ICL employs $N$ separate 1-shot forward computations and aggregates the resulting meta-gradients. These aggregated meta-gradients are then applied to the forward computation of a zero-shot query to generate the final prediction. This batch processing approach renders the LLM agnostic to the order of ICL examples. Through extensive experiments and analysis, we demonstrate that Batch-ICL consistently outperforms most permutations of ICL examples. In some cases, it even exceeds the performance of the best order for standard ICL, all while reducing the computational resources required. Furthermore, we develop a novel variant of Batch-ICL featuring multiple "epochs" of meta-optimization. This variant implicitly explores permutations of ICL examples, further enhancing ICL performance.

Vectorization · Support vector · Support vector machine · Query vector · Decomposed

February 16, 2024

The $\ell_p$-Subspace Sketch Problem in Small Dimensions with Applications to Support Vector Machines

Yi Li,Honghao Lin,David P. Woodruff

from arxiv, Corrected the citation for Lemma 3.3 and adjusted the constants in the proof accordingly

In the $\ell_p$-subspace sketch problem, we are given an $n\times d$ matrix $A$ with $n>d$, and asked to build a small memory data structure $Q(A,\epsilon)$ so that, for any query vector $x\in\mathbb{R}^d$, we can output a number in $(1\pm\epsilon)\|Ax\|_p^p$ given only $Q(A,\epsilon)$. This problem is known to require $\tilde{\Omega}(d\epsilon^{-2})$ bits of memory for $d=\Omega(\log(1/\epsilon))$. However, for $d=o(\log(1/\epsilon))$, no data structure lower bounds were known. We resolve the memory required to solve the $\ell_p$-subspace sketch problem for any constant $d$ and integer $p$, showing that it is $\Omega(\epsilon^{-2(d-1)/(d+2p)})$ bits and $\tilde{O} (\epsilon^{-2(d-1)/(d+2p)})$ words. This shows that one can beat the $\Omega(\epsilon^{-2})$ lower bound, which holds for $d = \Omega(\log(1/\epsilon))$, for any constant $d$. We also show how to implement the upper bound in a single pass stream, with an additional multiplicative $\operatorname{poly}(\log \log n)$ factor and an additive $\operatorname{poly}(\log n)$ cost in the memory. Our bounds can be applied to point queries for SVMs with additive error, yielding an optimal bound of $\tilde{\Theta}(\epsilon^{-2d/(d+3)})$ for every constant $d$. This is a near-quadratic improvement over the $\Omega(\epsilon^{-(d+1)/(d+3)})$ lower bound of (Andoni et al. 2020). Our techniques rely on a novel connection to low dimensional techniques from geometric functional analysis.

MoDELS · Reducible · Kernel function · Better · SimPLe

February 15, 2024

Modeling Blood Alcohol Concentration Using Fractional Differential Equations Based on the $ψ$-Caputo Derivative

Om Kalthoum Wanassi,Delfim F. M. Torres

from arxiv, This is a preprint of a paper whose final and definite form is published Open Access in 'Math. Meth. Appl. Sci.' at [//doi.org/10.1002/mma.10002]

We propose a novel dynamical model for blood alcohol concentration that incorporates $\psi$-Caputo fractional derivatives. Using the generalized Laplace transform technique, we successfully derive an analytic solution for both the alcohol concentration in the stomach and the alcohol concentration in the blood of an individual. These analytical formulas provide us with a straightforward numerical scheme, which demonstrates the efficacy of the $\psi$-Caputo derivative operator in achieving a better fit to real experimental data on blood alcohol levels available in the literature. Our model significantly outperforms existing classical and fractional models found in the literature. Indeed, by employing a simple yet non-standard kernel function $\psi(t)$, we are able to reduce the error by more than half, resulting in an improvement of 59 percent.

Visual question answering · Dataset · Performer · state-of-the-art · MoDELS

March 20, 2018

VQA-E: Explaining, Elaborating, and Enhancing Your Answers for Visual Questions

Qing Li,Qingyi Tao,Shafiq Joty,Jianfei Cai,Jiebo Luo

Most existing works in visual question answering (VQA) are dedicated to improving the accuracy of predicted answers, while disregarding the explanations. We argue that the explanation for an answer is of the same or even more importance compared with the answer itself, since it makes the question and answering process more understandable and traceable. To this end, we propose a new task of VQA-E (VQA with Explanation), where the computational models are required to generate an explanation with the predicted answer. We first construct a new dataset, and then frame the VQA-E problem in a multi-task learning architecture. Our VQA-E dataset is automatically derived from the VQA v2 dataset by intelligently exploiting the available captions. We have conducted a user study to validate the quality of explanations synthesized by our method. We quantitatively show that the additional supervision from explanations can not only produce insightful textual sentences to justify the answers, but also improve the performance of answer prediction. Our model outperforms the state-of-the-art methods by a clear margin on the VQA v2 dataset.