亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

Reducing the `$\textit{hallucination}$' problem of Large Language Models (LLMs) is crucial for their wide applications. A comprehensive and fine-grained measurement of the hallucination is the first key step for the governance of this issue but is under-explored in the community. Thus, we present $\textbf{ANAH}$, a bilingual dataset that offers $\textbf{AN}$alytical $\textbf{A}$nnotation of $\textbf{H}$allucinations in LLMs within Generative Question Answering. Each answer sentence in our dataset undergoes rigorous annotation, involving the retrieval of a reference fragment, the judgment of the hallucination type, and the correction of hallucinated content. ANAH consists of ~12k sentence-level annotations for ~4.3k LLM responses covering over 700 topics, constructed by a human-in-the-loop pipeline. Thanks to the fine granularity of the hallucination annotations, we can quantitatively confirm that the hallucinations of LLMs progressively accumulate in the answer and use ANAH to train and evaluate hallucination annotators. We conduct extensive experiments on studying generative and discriminative annotators and show that, although current open-source LLMs have difficulties in fine-grained hallucination annotation, the generative annotator trained with ANAH can surpass all open-source LLMs and GPT-3.5, obtain performance competitive with GPT-4, and exhibits better generalization ability on unseen questions.

相關內容

Since the work of Polyanskiy, Poor and Verd\'u on the finite blocklength performance of capacity-achieving codes for discrete memoryless channels, many papers have attempted to find further results for more practically relevant channels. However, it seems that the complexity of computing capacity-achieving codes has not been investigated until now. We study this question for the simplest non-trivial Gaussian channels, i.e., the additive colored Gaussian noise channel. To assess the computational complexity, we consider the classes $\mathrm{FP}_1$ and $\#\mathrm{P}_1$. $\mathrm{FP}_1$ includes functions computable by a deterministic Turing machine in polynomial time, whereas $\#\mathrm{P}_1$ encompasses functions that count the number of solutions verifiable in polynomial time. It is widely assumed that $\mathrm{FP}_1\neq\#\mathrm{P}_1$. It is of interest to determine the conditions under which, for a given $M \in \mathbb{N}$, where $M$ describes the precision of the deviation of $C(P,N)$, for a certain blocklength $n_M$ and a decoding error $\epsilon > 0$ with $\epsilon\in\mathbb{Q}$, the following holds: $R_{n_M}(\epsilon)>C(P,N)-\frac{1}{2^M}$. It is shown that there is a polynomial-time computable $N_*$ such that for sufficiently large $P_*\in\mathbb{Q}$, the sequences $\{R_{n_M}(\epsilon)\}_{{n_M}\in\mathbb{N}}$, where each $R_{n_M}(\epsilon)$ satisfies the previous condition, cannot be computed in polynomial time if $\mathrm{FP}_1\neq\#\mathrm{P}_1$. Hence, the complexity of computing the sequence $\{R_{n_M}(\epsilon)\}_{n_M\in\mathbb{N}}$ grows faster than any polynomial as $M$ increases. Consequently, it is shown that either the sequence of achievable rates $\{R_{n_M}(\epsilon)\}_{n_M\in\mathbb{N}}$ as a function of the blocklength, or the sequence of blocklengths $\{n_M\}_{M\in\mathbb{N}}$ corresponding to the achievable rates, is not a polynomial-time computable sequence.

Shared dynamics models are important for capturing the complexity and variability inherent in Human-Robot Interaction (HRI). Therefore, learning such shared dynamics models can enhance coordination and adaptability to enable successful reactive interactions with a human partner. In this work, we propose a novel approach for learning a shared latent space representation for HRIs from demonstrations in a Mixture of Experts fashion for reactively generating robot actions from human observations. We train a Variational Autoencoder (VAE) to learn robot motions regularized using an informative latent space prior that captures the multimodality of the human observations via a Mixture Density Network (MDN). We show how our formulation derives from a Gaussian Mixture Regression formulation that is typically used approaches for learning HRI from demonstrations such as using an HMM/GMM for learning a joint distribution over the actions of the human and the robot. We further incorporate an additional regularization to prevent "mode collapse", a common phenomenon when using latent space mixture models with VAEs. We find that our approach of using an informative MDN prior from human observations for a VAE generates more accurate robot motions compared to previous HMM-based or recurrent approaches of learning shared latent representations, which we validate on various HRI datasets involving interactions such as handshakes, fistbumps, waving, and handovers. Further experiments in a real-world human-to-robot handover scenario show the efficacy of our approach for generating successful interactions with four different human interaction partners.

We consider the proof system Res($\oplus$) introduced by Itsykson and Sokolov (Ann. Pure Appl. Log.'20), which is an extension of the resolution proof system and operates with disjunctions of linear equations over $\mathbb{F}_2$. We study characterizations of tree-like size and space of Res($\oplus$) refutations using combinatorial games. Namely, we introduce a class of extensible formulas and prove tree-like size lower bounds on it using Prover-Delayer games, as well as space lower bounds. This class is of particular interest since it contains many classical combinatorial principles, including the pigeonhole, ordering, and dense linear ordering principles. Furthermore, we present the width-space relation for Res($\oplus$) generalizing the results by Atserias and Dalmau (J. Comput. Syst. Sci.'08) and their variant of Spoiler-Duplicator games.

Large Language Models (LLMs) have demonstrated remarkable capabilities across various applications, fundamentally reshaping the landscape of natural language processing (NLP) research. However, recent evaluation frameworks often rely on the output probabilities of LLMs for predictions, primarily due to computational constraints, diverging from real-world LLM usage scenarios. While widely employed, the efficacy of these probability-based evaluation strategies remains an open research question. This study aims to scrutinize the validity of such probability-based evaluation methods within the context of using LLMs for Multiple Choice Questions (MCQs), highlighting their inherent limitations. Our empirical investigation reveals that the prevalent probability-based evaluation method inadequately aligns with generation-based prediction. Furthermore, current evaluation frameworks typically assess LLMs through predictive tasks based on output probabilities rather than directly generating responses, owing to computational limitations. We illustrate that these probability-based approaches do not effectively correspond with generative predictions. The outcomes of our study can enhance the understanding of LLM evaluation methodologies and provide insights for future research in this domain.

Finding the most sparse solution to the underdetermined system $\mathbf{y}=\mathbf{Ax}$, given a tolerance, is known to be NP-hard. A popular way to approximate a sparse solution is by using Greedy Pursuit algorithms, and Orthogonal Matching Pursuit (OMP) is one of the most widely used such solutions. For this paper, we implemented an efficient implementation of OMP that leverages Cholesky inverse properties as well as the power of Graphics Processing Units (GPUs) to deliver up to 200x speedup over the OMP implementation found in Scikit-Learn.

CLIP models have recently shown to exhibit Out of Distribution (OoD) generalization capabilities. However, Compositional Out of Distribution (C-OoD) generalization, which is a crucial aspect of a model's ability to understand unseen compositions of known concepts, is relatively unexplored for the CLIP models. Our goal is to address this problem and identify the factors that contribute to the C-OoD in CLIPs. We noted that previous studies regarding compositional understanding of CLIPs frequently fail to ensure that test samples are genuinely novel relative to the CLIP training data. To this end, we carefully synthesized a large and diverse dataset in the single object setting, comprising attributes for objects that are highly unlikely to be encountered in the combined training datasets of various CLIP models. This dataset enables an authentic evaluation of C-OoD generalization. Our observations reveal varying levels of C-OoD generalization across different CLIP models. We propose that the disentanglement of CLIP representations serves as a critical indicator in this context. By utilizing our synthesized datasets and other existing datasets, we assess various disentanglement metrics of text and image representations. Our study reveals that the disentanglement of image and text representations, particularly with respect to their compositional elements, plays a crucial role in improving the generalization of CLIP models in out-of-distribution settings. This finding suggests promising opportunities for advancing out-of-distribution generalization in CLIPs.

We show that deep neural networks (DNNs) can efficiently learn any composition of functions with bounded $F_{1}$-norm, which allows DNNs to break the curse of dimensionality in ways that shallow networks cannot. More specifically, we derive a generalization bound that combines a covering number argument for compositionality, and the $F_{1}$-norm (or the related Barron norm) for large width adaptivity. We show that the global minimizer of the regularized loss of DNNs can fit for example the composition of two functions $f^{*}=h\circ g$ from a small number of observations, assuming $g$ is smooth/regular and reduces the dimensionality (e.g. $g$ could be the modulo map of the symmetries of $f^{*}$), so that $h$ can be learned in spite of its low regularity. The measures of regularity we consider is the Sobolev norm with different levels of differentiability, which is well adapted to the $F_{1}$ norm. We compute scaling laws empirically and observe phase transitions depending on whether $g$ or $h$ is harder to learn, as predicted by our theory.

We study the algorithmic task of finding large independent sets in Erdos-Renyi $r$-uniform hypergraphs on $n$ vertices having average degree $d$. Krivelevich and Sudakov showed that the maximum independent set has density $\left(\frac{r\log d}{(r-1)d}\right)^{1/(r-1)}$. We show that the class of low-degree polynomial algorithms can find independent sets of density $\left(\frac{\log d}{(r-1)d}\right)^{1/(r-1)}$ but no larger. This extends and generalizes earlier results of Gamarnik and Sudan, Rahman and Virag, and Wein on graphs, and answers a question of Bal and Bennett. We conjecture that this statistical-computational gap holds for this problem. Additionally, we explore the universality of this gap by examining $r$-partite hypergraphs. A hypergraph $H=(V,E)$ is $r$-partite if there is a partition $V=V_1\cup\cdots\cup V_r$ such that each edge contains exactly one vertex from each set $V_i$. We consider the problem of finding large balanced independent sets (independent sets containing the same number of vertices in each partition) in random $r$-partite hypergraphs with $n$ vertices in each partition and average degree $d$. We prove that the maximum balanced independent set has density $\left(\frac{r\log d}{(r-1)d}\right)^{1/(r-1)}$ asymptotically. Furthermore, we prove an analogous low-degree computational threshold of $\left(\frac{\log d}{(r-1)d}\right)^{1/(r-1)}$. Our results recover and generalize recent work of Perkins and the second author on bipartite graphs. While the graph case has been extensively studied, this work is the first to consider statistical-computational gaps of optimization problems on random hypergraphs. Our results suggest that these gaps persist for larger uniformities as well as across many models. A somewhat surprising aspect of the gap for balanced independent sets is that the algorithm achieving the lower bound is a simple degree-1 polynomial.

Many emerging Artificial Intelligence (AI) applications require on-demand provisioning of large-scale computing, which can only be enabled by leveraging distributed computing services interconnected through networking. To address such increasing demand for networking to serve AI tasks, we investigate new scheduling strategies to improve communication efficiency and test them on a programmable testbed. We also show relevant challenges and research directions.

In multi-turn dialog, utterances do not always take the full form of sentences \cite{Carbonell1983DiscoursePA}, which naturally makes understanding the dialog context more difficult. However, it is essential to fully grasp the dialog context to generate a reasonable response. Hence, in this paper, we propose to improve the response generation performance by examining the model's ability to answer a reading comprehension question, where the question is focused on the omitted information in the dialog. Enlightened by the multi-task learning scheme, we propose a joint framework that unifies these two tasks, sharing the same encoder to extract the common and task-invariant features with different decoders to learn task-specific features. To better fusing information from the question and the dialog history in the encoding part, we propose to augment the Transformer architecture with a memory updater, which is designed to selectively store and update the history dialog information so as to support downstream tasks. For the experiment, we employ human annotators to write and examine a large-scale dialog reading comprehension dataset. Extensive experiments are conducted on this dataset, and the results show that the proposed model brings substantial improvements over several strong baselines on both tasks. In this way, we demonstrate that reasoning can indeed help better response generation and vice versa. We release our large-scale dataset for further research.

北京阿比特科技有限公司