
Dense retrieval needs to learn discriminative text embeddings to represent the semantic relationship between query and document. It may benefit from the use of large language models (LLMs), given their strong capability for semantic understanding. However, LLMs are pre-trained on text generation tasks, whose working pattern is completely different from representing texts as embeddings. As a result, it is imperative to study how to adapt LLMs properly so that they can be effectively initialized as the backbone encoder for dense retrieval. In this paper, we propose a novel approach, called LLaRA (LLM adapted for dense RetrievAl), which works as a post-hoc adaptation of an LLM for the dense retrieval application. LLaRA consists of two pretext tasks: EBAE (Embedding-Based Auto-Encoding) and EBAR (Embedding-Based Auto-Regression), where the text embeddings from the LLM are used to reconstruct the tokens of the input sentence and to predict the tokens of the next sentence, respectively. LLaRA turns out to be simple, lightweight, and highly effective. It is applied to adapt LLaMA-2-7B (base) on the Wikipedia corpus, where it substantially improves the model's fine-tuned performance on a variety of dense retrieval benchmarks, such as MS MARCO and BEIR. Our model and code will be made publicly available in the BGE repository.
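
To make the two pretext tasks more concrete, here is a rough sketch of how an embedding taken from a decoder-only LLM could be trained to reconstruct the input sentence's tokens (EBAE) and to predict the next sentence's tokens (EBAR). It is an illustration under assumptions, not the authors' implementation: the Hugging Face-style model interface and the bag-of-tokens objective are placeholders.

```python
import torch
import torch.nn.functional as F

def ebae_ebar_loss(llm, sentence_ids, next_sentence_ids):
    """Sketch of LLaRA-style pretext losses (illustrative, not the authors' code).

    The LLM encodes the input sentence; the hidden state of the final position is
    taken as the text embedding.  That single embedding is asked to (EBAE)
    reconstruct the tokens of the input sentence and (EBAR) predict the tokens of
    the next sentence, via a projection onto the vocabulary and a multi-label
    objective over the target token sets.
    """
    hidden = llm(sentence_ids).last_hidden_state        # (B, T, d)
    embedding = hidden[:, -1, :]                         # (B, d) sentence embedding

    # Project the embedding onto the vocabulary (tied to the input embeddings here,
    # purely for illustration).
    logits = embedding @ llm.get_input_embeddings().weight.T   # (B, V)

    def bag_of_tokens_loss(logits, target_ids):
        # Multi-label target: every token that appears in the target sentence.
        target = torch.zeros_like(logits)
        target.scatter_(1, target_ids, 1.0)
        return F.binary_cross_entropy_with_logits(logits, target)

    loss_ebae = bag_of_tokens_loss(logits, sentence_ids)        # reconstruct input
    loss_ebar = bag_of_tokens_loss(logits, next_sentence_ids)   # predict next sentence
    return loss_ebae + loss_ebar
```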

Related content

Large language models are deep learning models trained on massive amounts of text data. They can not only generate natural language text, but also understand the meaning of text in depth and handle a wide range of natural language tasks such as summarization, question answering, and translation. In 2023, large language models and their applications in artificial intelligence became a focal point of global technology research, and their growth in scale has been particularly striking: parameter counts have jumped from the initial billions to as many as a trillion today. This increase in parameters allows models to capture the subtleties of human language more precisely and to understand its complexity more deeply. Over the past year, large language models have made notable progress in absorbing new knowledge, decomposing complex tasks, and aligning text with images. As the technology continues to mature, it will keep expanding its range of applications, providing people with more intelligent and personalized services and further improving the way we live and work.

Diffusion language models (DLMs) are a promising avenue for text generation due to their practical properties, such as tractable controllable generation. They also have the advantage of not having to predict text autoregressively. However, despite these notable features, DLMs have not yet reached the performance levels of their autoregressive counterparts. One way to reduce the performance gap between these two types of language models is to speed up the generation of DLMs. Therefore, we propose a novel methodology to address this issue in this work. It enables the execution of more generation steps within a given time frame, leading to higher-quality outputs. Specifically, our methods estimate the completeness of a DLM's text generation and allow adaptive halting of the generation process. We evaluate our methods on the Plaid, SSD, and CDCD DLMs and create a cohesive perspective on their generation workflows. Finally, we confirm that our methods allow halting these models and decrease the generation time by $10$-$40$\% without a drop in the quality of model samples.
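
As an illustration of what adaptive halting might look like in practice, the sketch below stops a generic diffusion-style generation loop once the decoded tokens stop changing between consecutive steps, which is one plausible stand-in for the completeness estimators described above; `denoise_step` and the stability threshold are assumptions, not the paper's method.

```python
def generate_with_adaptive_halting(denoise_step, x, num_steps, tol=0.99):
    """Illustrative sketch of adaptive halting for a diffusion language model.

    `denoise_step(x, t)` is assumed to return the model's current estimate of the
    per-token logits at step t (e.g. a torch.Tensor of shape (T, V)).  Generation
    halts early once the hard token decisions barely change between steps.
    """
    prev_tokens = None
    for t in reversed(range(num_steps)):
        x = denoise_step(x, t)
        tokens = x.argmax(dim=-1)                 # hard decisions at this step
        if prev_tokens is not None:
            stable = (tokens == prev_tokens).float().mean().item()
            if stable >= tol:                     # e.g. 99% of tokens unchanged
                break                             # halt: output considered complete
        prev_tokens = tokens
    return tokens
```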

Rapid progress in machine learning for natural language processing has the potential to transform debates about how humans learn language. However, the learning environments and biases of current artificial learners and humans diverge in ways that weaken the impact of the evidence obtained from learning simulations. For example, today's most effective neural language models are trained on roughly one thousand times the amount of linguistic data available to a typical child. To increase the relevance of learnability results from computational models, we need to train model learners without significant advantages over humans. If an appropriate model successfully acquires some target linguistic knowledge, it can provide a proof of concept that the target is learnable in a hypothesized human learning scenario. Plausible model learners will enable us to carry out experimental manipulations to make causal inferences about variables in the learning environment, and to rigorously test poverty-of-the-stimulus-style claims arguing for innate linguistic knowledge in humans on the basis of speculations about learnability. Comparable experiments will never be possible with human subjects due to practical and ethical considerations, making model learners an indispensable resource. So far, attempts to deprive current models of unfair advantages obtain sub-human results for key grammatical behaviors such as acceptability judgments. But before we can justifiably conclude that language learning requires more prior domain-specific knowledge than current models possess, we must first explore non-linguistic inputs in the form of multimodal stimuli and multi-agent interaction as ways to make our learners more efficient at learning from limited linguistic input.

The use of synthetic data in machine learning saves a significant amount of time when implementing an effective object detector. However, there is limited research in this domain. This study aims to improve upon previously applied implementations in the task of instance segmentation of pallets in a warehouse environment. It proposes using synthetically generated domain-randomised data as well as data generated through Unity to achieve this. The study achieved performance improvements of 69% and 50% mAP50 on the stacked and racked pallet categories, respectively, when evaluated on real data. Additionally, it was found that a darker evaluation environment had a considerable impact on model performance, which dropped as low as 3% mAP50 when evaluated on images with an 80% brightness reduction. This study also created a two-stage detector that used YOLOv8 and SAM, but this proved to have unstable performance. The use of domain-randomised data yielded negligible performance improvements compared to the Unity-generated data.
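
For reference, a two-stage pipeline of the kind mentioned above could be wired together roughly as follows, with YOLOv8 proposing pallet boxes and SAM converting each box into an instance mask; the checkpoint paths and image file are placeholders, and this is not the study's exact configuration.

```python
import cv2
from ultralytics import YOLO
from segment_anything import sam_model_registry, SamPredictor

# Illustrative two-stage pipeline: YOLOv8 proposes boxes, SAM refines each box
# into an instance mask.  Checkpoint and image paths are placeholders.
detector = YOLO("yolov8n.pt")
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
predictor = SamPredictor(sam)

image = cv2.cvtColor(cv2.imread("warehouse.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

masks = []
for box in detector(image)[0].boxes.xyxy.cpu().numpy():   # (x1, y1, x2, y2)
    m, _, _ = predictor.predict(box=box, multimask_output=False)
    masks.append(m[0])                                     # one binary mask per detection
```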

We present an algorithmic solution to the problem of incremental belief updating in the context of Monte Carlo inference in Bayesian statistical models represented by probabilistic programs. Given a model and a sample-approximated posterior, our solution constructs a set of weighted observations to condition the model on, such that inference results in the same posterior. This problem arises, for example, in multi-level modelling, incremental inference, and inference in the presence of privacy constraints. First, a set of virtual observations is selected; then, observation weights are found through a computationally efficient optimization procedure such that the reconstructed posterior coincides with, or closely approximates, the original posterior. We implement and apply the solution to a number of didactic examples and case studies, showing the efficiency and robustness of our approach. The provided reference implementation is agnostic to the probabilistic programming language and the inference algorithm, and can be applied to most mainstream probabilistic programming environments.
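
A didactic analogue of the weight-finding step, for a conjugate Gaussian model where the reconstructed posterior is available in closed form, might look like the following; the virtual observations, noise scale, and squared-error loss are illustrative choices, not the paper's algorithm.

```python
import numpy as np
from scipy.optimize import minimize

# Didactic sketch (not the paper's method): given samples approximating a posterior
# over a Gaussian mean, find weights for a fixed set of virtual observations such
# that conditioning the prior on the weighted observations reproduces the posterior
# mean and variance.  Weighting an observation by w scales its precision by w.
rng = np.random.default_rng(0)
posterior_samples = rng.normal(1.2, 0.3, size=5_000)     # stand-in sample posterior
mu0, s0 = 0.0, 1.0                                        # Gaussian prior
virtual_obs = np.array([0.5, 1.0, 1.5])                   # chosen virtual observations
obs_sigma = 0.5                                           # nominal observation noise

target_mean, target_var = posterior_samples.mean(), posterior_samples.var()

def reconstructed_posterior(w):
    prec = 1 / s0**2 + np.sum(w) / obs_sigma**2
    mean = (mu0 / s0**2 + np.dot(w, virtual_obs) / obs_sigma**2) / prec
    return mean, 1 / prec

def loss(w):
    m, v = reconstructed_posterior(np.abs(w))             # keep weights non-negative
    return (m - target_mean) ** 2 + (v - target_var) ** 2

w_opt = np.abs(minimize(loss, x0=np.ones(3)).x)
print(reconstructed_posterior(w_opt), (target_mean, target_var))
```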

Robotic surgery promises enhanced precision and adaptability over traditional surgical methods. It also offers the possibility of automating surgical interventions, resulting in reduced stress on the surgeon, better surgical outcomes, and lower costs. Cholecystectomy, the removal of the gallbladder, serves as an ideal model procedure for automation due to its distinct and well-contrasted anatomical features between the gallbladder and liver, along with standardized surgical maneuvers. Dissection is a frequently used subtask in cholecystectomy, in which the surgeon delivers energy via the hook instrument to detach the gallbladder from the liver. Hence, dissection along tissue boundaries is a good candidate for surgical automation. For the da Vinci surgical robot to perform the same procedure as a surgeon automatically, it needs the ability to (1) recognize and distinguish between the two different tissues (e.g., the liver and the gallbladder), (2) understand where the boundary between the two tissues is located in the 3D workspace, (3) locate the instrument tip relative to the boundary in 3D space using visual feedback, and (4) move the instrument along the boundary. This paper presents a novel framework that addresses these challenges through AI-assisted image processing and vision-based robot control. We also present an ex-vivo evaluation of the automated procedure on chicken and pork liver specimens that demonstrates the effectiveness of the proposed framework.
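
A schematic control loop covering the four capabilities might be organised as below; the helper names (`segment_tissues`, `triangulate`, and the `camera` and `robot` interfaces) are hypothetical stand-ins, not the framework presented in the paper.

```python
import numpy as np

def follow_tissue_boundary(camera, robot, segment_tissues, triangulate, step_mm=1.0):
    """Illustrative boundary-following loop (hypothetical helper names, not the
    paper's framework).  Each iteration mirrors the four capabilities above."""
    while True:
        left, right = camera.read_stereo_pair()
        # (1) recognise and distinguish the two tissues in the image
        liver_mask, gallbladder_mask = segment_tissues(left)
        # (2) locate the boundary between the tissues in the 3D workspace
        boundary_px = np.argwhere(liver_mask & np.roll(gallbladder_mask, 1, axis=0))
        boundary_3d = triangulate(boundary_px, left, right)
        if boundary_3d.shape[0] == 0:
            break                                  # dissection line fully traversed
        # (3) locate the instrument tip relative to the boundary using vision
        tip_3d = robot.get_tip_position()
        nearest = boundary_3d[np.argmin(np.linalg.norm(boundary_3d - tip_3d, axis=1))]
        # (4) move the instrument a small step along/towards the boundary
        direction = (nearest - tip_3d) / (np.linalg.norm(nearest - tip_3d) + 1e-9)
        robot.move_to(tip_3d + step_mm * direction)
```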

Every language recognized by a non-deterministic finite automaton can be recognized by a deterministic automaton, at the cost of a potential increase in the number of states, which in the worst case can go from $n$ states to $2^n$ states. In this article, we investigate this classical result in a probabilistic setting where we take a deterministic automaton with $n$ states uniformly at random and add just one random transition. These automata are almost deterministic in the sense that only one state has a non-deterministic choice when reading an input letter. In our model, each state has a fixed probability of being final. We prove that for any $d\geq 1$, with non-negligible probability the minimal (deterministic) automaton of the language recognized by such an automaton has more than $n^d$ states; as a byproduct, the expected size of its minimal automaton grows faster than any polynomial. Our result also holds when each state is final with some probability that depends on $n$, as long as it is not too close to $0$ or $1$ (at distance at least $\Omega(\frac{1}{\sqrt{n}})$, to be precise), therefore allowing models with a sublinear number of final states in expectation.
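
A small experiment in the spirit of this result can be run by generating a random deterministic automaton, adding one extra transition, and counting the states of the accessible subset-construction determinisation (which upper-bounds the minimal automaton rather than computing it); the alphabet, seed, and state count below are arbitrary.

```python
import random
from itertools import product

def random_almost_deterministic(n, alphabet="ab", seed=0):
    """Random deterministic automaton on n states plus one extra random transition."""
    rng = random.Random(seed)
    delta = {(q, c): {rng.randrange(n)} for q, c in product(range(n), alphabet)}
    q, c = rng.randrange(n), rng.choice(alphabet)
    delta[(q, c)].add(rng.randrange(n))          # the single non-deterministic choice
    return delta

def determinised_size(delta, alphabet="ab"):
    """Number of accessible states of the subset-construction determinisation."""
    start = frozenset({0})
    seen, stack = {start}, [start]
    while stack:
        s = stack.pop()
        for c in alphabet:
            t = frozenset(q2 for q in s for q2 in delta[(q, c)])
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return len(seen)

delta = random_almost_deterministic(200)
print(determinised_size(delta))                  # can be much larger than 200
```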

Spatial symmetries and invariances play an important role in the behaviour of materials and should be respected in the description and modelling of material properties. The focus here is the class of physically symmetric and positive definite tensors, as they appear often in the description of materials; one wants to be able to prescribe certain classes of spatial symmetries and invariances for each member of the whole ensemble, while at the same time demanding that the mean or expected value of the ensemble be subject to a possibly 'higher' spatial invariance class. We formulate a modelling framework which not only respects these two requirements, positive definiteness and invariance, but also allows fine control over orientation on one hand and strength/size on the other. As the set of positive definite tensors is not a linear space, but rather an open convex cone in the linear space of physically symmetric tensors, we consider it advantageous to widen the notion of mean to the so-called Fréchet mean on a metric space, which is based on distance measures or metrics between positive definite tensors other than the usual Euclidean one. It is shown how the random ensemble can be modelled and generated, independently in its scaling and orientational or directional aspects, with a Lie algebra representation via a memoryless transformation. The parameters which describe the elements in this Lie algebra are then considered as random fields on the domain of interest. As an example, 2D and 3D models of steady-state heat conduction in a human proximal femur, a bone with high material anisotropy, are formulated with a random thermal conductivity tensor, and the numerical results show the distinct impact that incorporating different material uncertainties (scaling, orientation, and prescribed material symmetry) into the constitutive model has on the desired quantities of interest.
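
The exponential-map construction can be illustrated in a few lines: a random element of the Lie algebra is mapped to a symmetric positive definite tensor, with the scaling (log-eigenvalue) and orientational (rotation) parts drawn independently. The distributions and parameter values below are illustrative only, not the paper's calibrated model.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(1)

def random_spd_tensor(dim=3, log_mean=0.0, log_std=0.3, orient_std=0.2):
    """Random symmetric positive definite tensor via the matrix exponential,
    with independent control of scaling and orientation (illustrative sketch)."""
    # Scaling part: random log-eigenvalues (strength/size).
    log_eigs = rng.normal(log_mean, log_std, size=dim)
    # Orientation part: rotation from the exponential of a random skew-symmetric matrix.
    A = rng.normal(0.0, orient_std, size=(dim, dim))
    R = expm(A - A.T)                                  # exp of skew-symmetric -> rotation
    return R @ np.diag(np.exp(log_eigs)) @ R.T         # symmetric positive definite

K = random_spd_tensor()
print(np.linalg.eigvalsh(K))                           # all eigenvalues strictly positive
```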

Recent contrastive representation learning methods rely on estimating mutual information (MI) between multiple views of an underlying context. For example, we can derive multiple views of a given image by applying data augmentation, or we can split a sequence into views comprising the past and future of some step in the sequence. Contrastive lower bounds on MI are easy to optimize, but have a strong underestimation bias when estimating large amounts of MI. We propose decomposing the full MI estimation problem into a sum of smaller estimation problems by splitting one of the views into progressively more informed subviews and applying the chain rule for MI to the decomposed views. This expression contains a sum of unconditional and conditional MI terms, each measuring a modest chunk of the total MI, which facilitates approximation via contrastive bounds. To maximize the sum, we formulate a contrastive lower bound on the conditional MI which can be approximated efficiently. We refer to our general approach as Decomposed Estimation of Mutual Information (DEMI). We show that DEMI can capture a larger amount of MI than standard non-decomposed contrastive bounds in a synthetic setting, and learns better representations in a vision domain and for dialogue generation.
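
The decomposition can be sketched with standard contrastive machinery: an InfoNCE-style bound for the unconditional term I(x; y1) and another for the conditional term I(x; y2 | y1) in which the critics also see y1. The encoder modules and shapes below are assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn.functional as F

def infonce(scores):
    """Contrastive (InfoNCE) lower bound on MI from a (B, B) score matrix whose
    diagonal holds positive-pair scores and off-diagonals hold negatives."""
    labels = torch.arange(scores.size(0), device=scores.device)
    return torch.log(torch.tensor(scores.size(0), dtype=scores.dtype)) \
        - F.cross_entropy(scores, labels)

def demi_style_bound(x, y1, y2, f_x, f_y1, g_xy1, g_y2y1):
    """Sketch of the chain-rule decomposition I(x; y1, y2) = I(x; y1) + I(x; y2 | y1).
    The unconditional term scores (x, y1) pairs; the conditional term scores (x, y2)
    pairs with both critics also seeing y1.  The encoders f_x, f_y1, g_xy1, g_y2y1
    (and the assumption that x, y1, y2 share a feature dimension) are illustrative."""
    s_uncond = f_x(x) @ f_y1(y1).T
    s_cond = g_xy1(torch.cat([x, y1], -1)) @ g_y2y1(torch.cat([y2, y1], -1)).T
    return infonce(s_uncond) + infonce(s_cond)
```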

This paper is an attempt to explain all the matrix calculus you need in order to understand the training of deep neural networks. We assume no math knowledge beyond what you learned in calculus 1, and provide links to help you refresh the necessary math where needed. Note that you do not need to understand this material before you start learning to train and use deep learning in practice; rather, this material is for those who are already familiar with the basics of neural networks, and wish to deepen their understanding of the underlying math. Don't worry if you get stuck at some point along the way---just go back and reread the previous section, and try writing down and working through some examples. And if you're still stuck, we're happy to answer your questions in the Theory category at forums.fast.ai. Note: There is a reference section at the end of the paper summarizing all the key matrix calculus rules and terminology discussed here. See related articles at //explained.ai

The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles, by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.
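
At the core of the architecture is scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V; a minimal sketch is given below, with the multi-head projections, residual connections, and layer normalisation of the full model omitted.

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(Q, K, V, mask=None):
    """Minimal sketch of the attention used in the Transformer:
    softmax(Q K^T / sqrt(d_k)) V, with an optional mask for disallowed positions."""
    d_k = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / d_k ** 0.5      # (..., T_q, T_k)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    return F.softmax(scores, dim=-1) @ V               # weighted sum of values
```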
