Past work shows that one can associate a notion of Shannon entropy to a Dirichlet polynomial, regarded as an empirical distribution. Indeed, entropy can be extracted from any d:Dir by a two-step process, where the first step is a rig homomorphism out of Dir, the *set* of Dirichlet polynomials, with rig structure given by standard addition and multiplication. In this short note, we show that this rig homomorphism can be upgraded to a rig *functor*, when we replace the set of Dirichlet polynomials by the *category* of ordinary (Cartesian) polynomials. In the Cartesian case, the process has three steps. The first step is a rig functor PolyCart -> Poly sending a polynomial p to (dp)y, where dp is the derivative of p. The second is a rig functor Poly -> Set x Set^op, sending a polynomial q to the pair (q(1),Gamma(q)), where Gamma(q)=Poly(q,y) can be interpreted as the global sections of q viewed as a bundle, and q(1) as its base. To make this precise we define what appears to be a new distributive monoidal structure on Set x Set^op, which can be understood geometrically in terms of rectangles. The last step, as for Dirichlet polynomials, is simply to extract the entropy as a real number from a pair of sets (A,B); it is given by log A - log B^(1/A) and can be thought of as the log aspect ratio of the rectangle.
Recent advances in Neural Radiance Fields (NeRFs) have made it possible to reconstruct and reanimate dynamic portrait scenes with control over head-pose, facial expressions and viewing direction. However, training such models assumes photometric consistency over the deformed region e.g. the face must be evenly lit as it deforms with changing head-pose and facial expression. Such photometric consistency across frames of a video is hard to maintain, even in studio environments, thus making the created reanimatable neural portraits prone to artifacts during reanimation. In this work, we propose CoDyNeRF, a system that enables the creation of fully controllable 3D portraits in real-world capture conditions. CoDyNeRF learns to approximate illumination dependent effects via a dynamic appearance model in the canonical space that is conditioned on predicted surface normals and the facial expressions and head-pose deformations. The surface normals prediction is guided using 3DMM normals that act as a coarse prior for the normals of the human head, where direct prediction of normals is hard due to rigid and non-rigid deformations induced by head-pose and facial expression changes. Using only a smartphone-captured short video of a subject for training, we demonstrate the effectiveness of our method on free view synthesis of a portrait scene with explicit head pose and expression controls, and realistic lighting effects. The project page can be found here: //shahrukhathar.github.io/2023/08/22/CoDyNeRF.html
Skiplists are used in a variety of applications for storing data subject to order criteria. In this article we discuss the design, analysis and performance of a concurrent deterministic skip list on many-core NUMA nodes. We also evaluate the performance of a concurrent lock-free unbounded queue implementation and three implementations of multi-writer, multi-reader~(MWMR) hash tables and compare their performance with equivalent implementations from Intel's Thread Building Blocks~(TBB) library. We focus on strategies for memory management that reduce page faults and cache misses for the memory access patterns in these data structures. This paper proposes hierarchical usage of concurrent data structures in programs to improve memory latencies by reducing memory accesses from remote NUMA nodes.
The majority of prior work on information retrieval (IR) assumes that the corpus is static, whereas in the real world, the documents are continually updated. In this paper, we incorporate often overlooked dynamic nature of knowledge into the retrieval systems. Our work treats retrieval not as static archives but as dynamic knowledge bases better aligned with real-world environments. We conduct a comprehensive evaluation of dual encoders and generative retrieval, utilizing the StreamingQA benchmark designed for the temporal knowledge updates. Our initial results show that while generative retrieval outperforms dual encoders in static settings, the opposite is true in dynamic settings. Surprisingly, however, when we utilize a parameter-efficient pre-training method to enhance adaptability of generative retrieval to new corpora, our resulting model, Dynamic Generative Retrieval (DynamicGR), exhibits unexpected findings. It (1) efficiently compresses new knowledge in their internal index, attaining a remarkable storage capacity due to its fully parametric architecture and (2) outperforms dual encoders not only in static settings but in dynamic scenarios with a 5% margin in hit@5, requiring 4 times less training time.
Doeblin coefficients are a classical tool for analyzing the ergodicity and exponential convergence rates of Markov chains. Propelled by recent works on contraction coefficients of strong data processing inequalities, we investigate whether Doeblin coefficients also exhibit some of the notable properties of canonical contraction coefficients. In this paper, we present several new structural and geometric properties of Doeblin coefficients. Specifically, we show that Doeblin coefficients form a multi-way divergence, exhibit tensorization, and possess an extremal trace characterization. We then show that they also have extremal coupling and simultaneously maximal coupling characterizations. By leveraging these characterizations, we demonstrate that Doeblin coefficients act as a nice generalization of the well-known total variation (TV) distance to a multi-way divergence, enabling us to measure the "distance" between multiple distributions rather than just two. We then prove that Doeblin coefficients exhibit contraction properties over Bayesian networks similar to other canonical contraction coefficients. We additionally derive some other results and discuss an application of Doeblin coefficients to distribution fusion. Finally, in a complementary vein, we introduce and discuss three new quantities: max-Doeblin coefficient, max-DeGroot distance, and min-DeGroot distance. The max-Doeblin coefficient shares a connection with the concept of maximal leakage in information security; we explore its properties and provide a coupling characterization. On the other hand, the max-DeGroot and min-DeGroot measures extend the concept of DeGroot distance to multiple distributions.
In text documents such as news articles, the content and key events usually revolve around a subset of all the entities mentioned in a document. These entities, often deemed as salient entities, provide useful cues of the aboutness of a document to a reader. Identifying the salience of entities was found helpful in several downstream applications such as search, ranking, and entity-centric summarization, among others. Prior work on salient entity detection mainly focused on machine learning models that require heavy feature engineering. We show that fine-tuning medium-sized language models with a cross-encoder style architecture yields substantial performance gains over feature engineering approaches. To this end, we conduct a comprehensive benchmarking of four publicly available datasets using models representative of the medium-sized pre-trained language model family. Additionally, we show that zero-shot prompting of instruction-tuned language models yields inferior results, indicating the task's uniqueness and complexity.
With the rapid development of deep learning, training Big Models (BMs) for multiple downstream tasks becomes a popular paradigm. Researchers have achieved various outcomes in the construction of BMs and the BM application in many fields. At present, there is a lack of research work that sorts out the overall progress of BMs and guides the follow-up research. In this paper, we cover not only the BM technologies themselves but also the prerequisites for BM training and applications with BMs, dividing the BM review into four parts: Resource, Models, Key Technologies and Application. We introduce 16 specific BM-related topics in those four parts, they are Data, Knowledge, Computing System, Parallel Training System, Language Model, Vision Model, Multi-modal Model, Theory&Interpretability, Commonsense Reasoning, Reliability&Security, Governance, Evaluation, Machine Translation, Text Generation, Dialogue and Protein Research. In each topic, we summarize clearly the current studies and propose some future research directions. At the end of this paper, we conclude the further development of BMs in a more general view.
Contextual embeddings, such as ELMo and BERT, move beyond global word representations like Word2Vec and achieve ground-breaking performance on a wide range of natural language processing tasks. Contextual embeddings assign each word a representation based on its context, thereby capturing uses of words across varied contexts and encoding knowledge that transfers across languages. In this survey, we review existing contextual embedding models, cross-lingual polyglot pre-training, the application of contextual embeddings in downstream tasks, model compression, and model analyses.
Graphical causal inference as pioneered by Judea Pearl arose from research on artificial intelligence (AI), and for a long time had little connection to the field of machine learning. This article discusses where links have been and should be established, introducing key concepts along the way. It argues that the hard open problems of machine learning and AI are intrinsically related to causality, and explains how the field is beginning to understand them.
The present paper surveys neural approaches to conversational AI that have been developed in the last few years. We group conversational systems into three categories: (1) question answering agents, (2) task-oriented dialogue agents, and (3) chatbots. For each category, we present a review of state-of-the-art neural approaches, draw the connection between them and traditional approaches, and discuss the progress that has been made and challenges still being faced, using specific systems and models as case studies.
This paper discusses and demonstrates the outcomes from our experimentation on Image Captioning. Image captioning is a much more involved task than image recognition or classification, because of the additional challenge of recognizing the interdependence between the objects/concepts in the image and the creation of a succinct sentential narration. Experiments on several labeled datasets show the accuracy of the model and the fluency of the language it learns solely from image descriptions. As a toy application, we apply image captioning to create video captions, and we advance a few hypotheses on the challenges we encountered.