While there is much excitement about the potential of large multimodal models (LMMs), a comprehensive evaluation is critical to establish their true capabilities and limitations. In support of this aim, we evaluate two state-of-the-art LMMs, GPT-4V and Gemini, on a new visual question answering dataset sourced from an authentic online question answering community. We conduct fine-grained analysis by generating seven types of metadata for nearly 2,000 visual questions, such as image type and the required image processing capabilities. Our zero-shot performance analysis highlights the types of questions that are most challenging for both models, including questions related to the "puzzling" topic, with the "Identification" user intention, with the "Sheet Music" image type, or labeled as "hard" by GPT-4.
The development of deep learning architectures is a resource-demanding process, due to a vast design space, long prototyping times, and high compute costs associated with at-scale model training and evaluation. We set out to simplify this process by grounding it in an end-to-end mechanistic architecture design (MAD) pipeline, encompassing small-scale capability unit tests predictive of scaling laws. Through a suite of synthetic token manipulation tasks such as compression and recall, designed to probe such capabilities, we identify and test new hybrid architectures constructed from a variety of computational primitives. We experimentally validate the resulting architectures via an extensive compute-optimal and a new state-optimal scaling law analysis, training over 500 language models from 70M to 7B parameters. Surprisingly, we find MAD synthetics to correlate with compute-optimal perplexity, enabling accurate evaluation of new architectures via isolated proxy tasks. The new architectures found via MAD, based on simple ideas such as hybridization and sparsity, outperform state-of-the-art Transformer, convolutional, and recurrent architectures (Transformer++, Hyena, Mamba) in scaling, both at compute-optimal budgets and in overtrained regimes. Overall, these results provide evidence that performance on curated synthetic tasks can be predictive of scaling laws, and that an optimal architecture should leverage specialized layers via a hybrid topology.
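As an illustration of the kind of synthetic capability test used in such pipelines, the sketch below generates one associative-recall example of the token-manipulation variety mentioned above. The vocabulary split, sequence layout, and sizes are hypothetical, not the paper's exact task specification:

```python
import random

def make_recall_example(vocab_size=16, num_pairs=4, seed=0):
    """Generate one associative-recall example: a sequence of key-value
    pairs followed by a query key; the target is that key's value."""
    rng = random.Random(seed)
    keys = rng.sample(range(vocab_size), num_pairs)          # distinct key tokens
    values = [rng.randrange(vocab_size, 2 * vocab_size) for _ in keys]
    seq = [tok for kv in zip(keys, values) for tok in kv]    # interleave k1 v1 k2 v2 ...
    query = rng.choice(keys)
    target = values[keys.index(query)]
    return seq + [query], target

tokens, target = make_recall_example()
```

A model solves the task if, given `tokens`, it predicts `target`; performance on batches of such examples serves as the isolated proxy signal.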
Accurate estimation of the velocities and trajectories of surrounding moving objects is a critical element of perception systems in Automated/Autonomous Vehicles (AVs), with a direct impact on their safety. These are non-trivial problems due to the diverse types and sizes of such objects and their dynamic and random behaviour. Recent point cloud based solutions often use Iterative Closest Point (ICP) techniques, which are known to have certain limitations. For example, their computational costs are high due to their iterative nature, and their estimation error often deteriorates as the relative velocities of the target objects increase (>2 m/s). Motivated by such shortcomings, this paper first proposes a novel Detection and Tracking of Moving Objects (DATMO) technique for AVs based on an optical flow approach, which proves to be computationally efficient and highly accurate for such problems. This is achieved by representing the driving scenario as a vector field and applying vector calculus theories to ensure spatiotemporal continuity. We also report the results of a comprehensive performance evaluation of the proposed DATMO technique, carried out using synthetic and real-world data. The results demonstrate the superiority of the proposed technique over existing DATMO techniques in terms of estimation accuracy and processing time across a wide range of relative velocities of moving objects. Finally, we evaluate and discuss the sensitivity of the estimation error of the proposed technique to various system and environmental parameters, as well as to the relative velocities of the moving objects.
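The core idea of recovering object velocity from successive bird's-eye-view grids can be sketched with a crude block-matching estimator. This is a stand-in for a real optical-flow solver, not the paper's vector-field method, and the grid size, cell size, and time step are invented for illustration:

```python
import numpy as np

def estimate_velocity(grid_t0, grid_t1, dt, cell_size, max_shift=5):
    """Estimate a single dominant 2-D velocity between two occupancy grids
    by exhaustive block matching: find the integer shift of the first grid
    that best overlaps the second, then convert cells/frame to m/s."""
    best, best_shift = -np.inf, (0, 0)
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            score = np.sum(np.roll(grid_t0, (dy, dx), axis=(0, 1)) * grid_t1)
            if score > best:
                best, best_shift = score, (dy, dx)
    return np.array(best_shift) * cell_size / dt  # (vy, vx) in m/s

# Synthetic scene: a 3x3 object moves 2 cells to the right between frames 0.1 s apart.
g0 = np.zeros((32, 32)); g0[10:13, 10:13] = 1.0
g1 = np.zeros((32, 32)); g1[10:13, 12:15] = 1.0
v = estimate_velocity(g0, g1, dt=0.1, cell_size=0.2)  # -> [0., 4.] m/s
```

With a 0.2 m cell and 0.1 s frame gap, a 2-cell shift corresponds to 4 m/s; an optical-flow formulation replaces this exhaustive search with a dense, differential solution.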
In this study, the statistical downscaling model (SDSM) is employed for downscaling precipitation (PREC), maximum temperature (Tmax) and minimum temperature (Tmin) over the Krishna River Basin (KRB). Outputs of the Canadian Earth System Model, version 2 (CanESM2) General Circulation Model (GCM) were considered as predictor variables. First, the SDSM is calibrated using 30 years (1961-1990) of data and subsequently validated over 15 years (1991-2005). Upon observing satisfactory performance, the SDSM is further used for projecting the predictand variables (PREC, Tmax and Tmin) for the 21st century under three Representative Concentration Pathway (RCP) scenarios, viz. RCP2.6, RCP4.5 and RCP8.5. The future period is divided into three 30-year time slices, named epoch-1 (2011-2040), epoch-2 (2041-2070) and epoch-3 (2071-2100). Further, 1976-2005 is considered the baseline period, and all future results are compared against it. The results were analysed at various temporal scales, i.e., monthly, seasonal and annual. The study reveals that the KRB is going to become wetter during all seasons. The results are discussed for the worst-case scenario, i.e., RCP8.5 epoch-3. The average annual maximum and minimum temperatures are expected to increase. An extreme event analysis is also carried out considering the 90th and 95th percentile values; these extremes are projected to increase, with more events exceeding the current extreme thresholds. The outcome of this study can be used in flood modelling for the KRB and in modelling future irrigation demands, along with the planning of optimal irrigation in the KRB culturable command area.
With the increasing use of graph-structured data, there is also increasing interest in investigating graph data dependencies and their applications, e.g., in graph data profiling. Graph Generating Dependencies (GGDs) are a class of dependencies for property graphs that can express the relation between different graph patterns and constraints based on their attribute similarities. The rich syntax and semantics of GGDs make them a good candidate for graph data profiling. Nonetheless, GGDs are difficult to define manually, especially when no data experts are available. In this paper, we propose GGDMiner, a framework for discovering approximate GGDs from graph data automatically, with the intention of profiling graph data through GGDs for the user. GGDMiner has three main steps: (1) pre-processing, (2) candidate generation, and (3) GGD extraction. To optimize memory consumption and execution time, GGDMiner uses a factorized representation of each discovered graph pattern, called Answer Graph. Our results show that the discovered set of GGDs can give an overview of the input graph, covering both schema-level information and correlations between graph patterns and attributes.
Multilevel modeling is increasingly relevant in the context of modeling and simulation, since it leads to several potential benefits, such as software reuse and integration, the split of semantically separated levels into sub-models, the possibility to employ different levels of detail, and the potential for parallel execution. The coupling that inevitably exists between the sub-models, however, implies the need to maintain consistency between the various components, all the more so when different simulation paradigms are employed (e.g., sequential vs parallel, discrete vs continuous). In this paper we argue that multilevel modeling is well suited to the simulation of human mobility, since it naturally leads to the decomposition of the model into two layers, the "micro" and "macro" layer, where individual entities (micro) and long-range interactions (macro) are described. We then investigate the challenges of multilevel modeling and describe some preliminary results obtained with prototype implementations of multilevel simulators in the context of epidemic diffusion and vehicle pollution.
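As a toy illustration of the micro/macro decomposition described above (not the paper's prototype simulators; all parameters are invented), the sketch below couples a "macro" SIR difference equation with a "micro" agent layer whose per-agent transitions are sampled from the macro-level rates, keeping the two levels approximately consistent:

```python
import random

random.seed(0)
N, beta, gamma, dt = 1000, 0.3, 0.1, 1.0        # hypothetical epidemic parameters
S, I, R = N - 10.0, 10.0, 0.0                   # macro SIR compartments
agents = ["I"] * 10 + ["S"] * (N - 10)          # micro state initially mirrors macro counts

def micro_step(state, p_inf, p_rec):
    """Sample one agent's transition from the macro-level rates."""
    if state == "S" and random.random() < p_inf:
        return "I"
    if state == "I" and random.random() < p_rec:
        return "R"
    return state

for _ in range(50):
    # Macro layer: discrete-time SIR update for the whole population.
    new_inf = beta * S * I / N * dt
    new_rec = gamma * I * dt
    S, I, R = S - new_inf, I + new_inf - new_rec, R + new_rec
    # Micro layer: per-agent transitions driven by the macro prevalence.
    p_inf = beta * I / N * dt
    agents = [micro_step(a, p_inf, gamma * dt) for a in agents]
```

The consistency problem the abstract raises appears here in miniature: the micro counts track the macro compartments only stochastically, so a real multilevel simulator needs an explicit synchronization mechanism between the layers.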
A crucial aspect of a rumor detection model is its ability to generalize, particularly its ability to detect emerging, previously unknown rumors. Past research has indicated that content-based rumor detection models (i.e., those using solely the source posts as input) tend to perform less effectively on unseen rumors, while the potential of context-based models remains largely untapped. The main contribution of this paper is an in-depth evaluation of the performance gap between content- and context-based models, specifically on detecting new, unseen rumors. Our empirical findings demonstrate that context-based models are still overly dependent on the information derived from the rumors' source posts and tend to overlook the significant role that contextual information can play. We also study the effect of data split strategies on classifier performance. Based on our experimental results, the paper offers practical suggestions on how to minimize the effects of temporal concept drift in static datasets during the training of rumor detection methods.
The existence of adversarial attacks on machine learning models that are imperceptible to a human is still quite a mystery from a theoretical perspective. In this work, we introduce two notions of adversarial attacks: natural or on-manifold attacks, which are perceptible by a human/oracle, and unnatural or off-manifold attacks, which are not. We argue that the existence of off-manifold attacks is a natural consequence of the dimension gap between the intrinsic and ambient dimensions of the data. For 2-layer ReLU networks, we prove that even though the dimension gap does not affect generalization performance on samples drawn from the observed data space, it makes the clean-trained model more vulnerable to adversarial perturbations in the off-manifold direction of the data space. Our main results provide an explicit relationship between the $\ell_2$ and $\ell_{\infty}$ attack strengths of the on/off-manifold attacks and the dimension gap.
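A toy numerical illustration of the dimension-gap argument (the numbers and the linear classifier are invented for exposition; the paper's analysis concerns 2-layer ReLU networks): when data is confined to a 1-D manifold inside R^2, training never constrains the off-manifold weight component, so a small off-manifold perturbation can flip the prediction even though clean accuracy is perfect.

```python
import numpy as np

# Data lives on the 1-D manifold {(t, 0)} inside the ambient space R^2.
# A classifier that fits the data perfectly can still carry arbitrary weight
# in the off-manifold direction e2, since no training sample penalizes it.
w = np.array([1.0, 5.0])          # hypothetical learned weights; w[1] is unconstrained
f = lambda x: np.sign(w @ x)

x_clean = np.array([0.5, 0.0])    # on-manifold sample, true label +1
delta_off = np.array([0.0, -0.2]) # small off-manifold perturbation, l2 norm 0.2

print(f(x_clean))                 # +1: correct on the manifold
print(f(x_clean + delta_off))     # -1: flipped by the off-manifold shift
```

An on-manifold perturbation of the same norm, (-0.2, 0), leaves the sign unchanged here, mirroring the asymmetry between on- and off-manifold attack strengths.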
The Gray-Scott model is a set of reaction-diffusion equations that describes chemical systems far from equilibrium. Interest in this model stems from its ability to generate spatio-temporal structures, including pulses, spots, stripes, and self-replicating patterns. We consider an extension of this model in which the spread of the different chemicals is assumed to be nonlocal, and can thus be represented by an integral operator. In particular, we focus on the case of strictly positive, symmetric, $L^1$ convolution kernels that have a finite second moment. Modeling the equations on a finite interval, we prove the existence of small-time weak solutions in the case of nonlocal Dirichlet and Neumann boundary constraints. We then use this result to develop a finite element numerical scheme that helps us explore the effects of nonlocal diffusion on the formation of pulse solutions.
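A minimal explicit-Euler sketch of the nonlocal system, with the Laplacian replaced by the operator L[u] = J*u - u for a strictly positive, symmetric, normalized kernel J. All parameters, the kernel width, and the zero-padded boundary handling are illustrative choices, not the paper's finite element scheme or its nonlocal Dirichlet/Neumann constraints:

```python
import numpy as np

N, h = 200, 0.05
xk = np.arange(-2.0, 2.0 + h, h)
J = np.exp(-xk**2)
J /= J.sum()                      # strictly positive, symmetric, sums to 1

def nonlocal_lap(u):
    """Nonlocal diffusion operator L[u] = J*u - u (crude zero-padded boundaries)."""
    return np.convolve(u, J, mode="same") - u

# Illustrative Gray-Scott parameters and a localized seed perturbation.
Du, Dv, F, k, dt = 2.0, 1.0, 0.04, 0.06, 0.1
u = np.ones(N)
v = np.zeros(N)
u[N // 2 - 5:N // 2 + 5] = 0.5
v[N // 2 - 5:N // 2 + 5] = 0.5

for _ in range(1000):
    uvv = u * v**2
    u = u + dt * (Du * nonlocal_lap(u) - uvv + F * (1.0 - u))
    v = v + dt * (Dv * nonlocal_lap(v) + uvv - (F + k) * v)
```

Widening or narrowing the kernel `J` changes the effective diffusion range, which is the knob the numerical experiments turn when exploring how nonlocal spread affects pulse formation.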
In pace with developments in the research field of artificial intelligence, knowledge graphs (KGs) have attracted a surge of interest from both academia and industry. As a representation of semantic relations between entities, KGs have proven to be particularly relevant for natural language processing (NLP), experiencing a rapid spread and wide adoption within recent years. Given the increasing amount of research work in this area, several KG-related approaches have been surveyed in the NLP research community. However, a comprehensive study that categorizes established topics and reviews the maturity of individual research streams remains absent to this day. Contributing to closing this gap, we systematically analyzed 507 papers from the literature on KGs in NLP. Our survey encompasses a multifaceted review of tasks, research types, and contributions. As a result, we present a structured overview of the research landscape, provide a taxonomy of tasks, summarize our findings, and highlight directions for future work.
Game theory has by now found numerous applications in various fields, including economics, industry, jurisprudence, and artificial intelligence, where each player cares only about its own interest, in a noncooperative or cooperative manner, but without obvious malice toward other players. However, in many practical applications, such as poker, chess, pursuit-evasion, drug interdiction, coast guard operations, cyber-security, and national defense, players often take apparently adversarial stances; that is, the selfish actions of each player inevitably or intentionally inflict loss on, or wreak havoc upon, other players. Along this line, this paper provides a systematic survey of the three main game models widely employed in adversarial games, i.e., zero-sum normal-form and extensive-form games, Stackelberg (security) games, and zero-sum differential games, from an array of perspectives, including basic knowledge of the game models, (approximate) equilibrium concepts, problem classifications, research frontiers, (approximate) optimal strategy seeking techniques, prevailing algorithms, and practical applications. Finally, promising future research directions are discussed for relevant adversarial games.
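As a concrete taste of the optimal-strategy-seeking techniques such a survey covers, the sketch below runs fictitious play, a classic iterative algorithm for zero-sum normal-form games, on matching pennies; the payoff matrix and iteration count are illustrative. Each player repeatedly best-responds to the opponent's empirical mixture of past actions, and in zero-sum games the empirical frequencies converge to an equilibrium (here, the uniform mixed strategy with game value 0):

```python
import numpy as np

def fictitious_play(A, steps=20000):
    """Fictitious play for a zero-sum game with row-player payoff matrix A.
    Returns the empirical (approximately equilibrium) mixed strategies."""
    m, n = A.shape
    row_counts = np.zeros(m)
    col_counts = np.zeros(n)
    row_counts[0] = 1
    col_counts[0] = 1                     # arbitrary initial plays
    for _ in range(steps):
        i = np.argmax(A @ col_counts)     # row best response to column's history
        j = np.argmin(row_counts @ A)     # column best response to row's history
        row_counts[i] += 1
        col_counts[j] += 1
    return row_counts / row_counts.sum(), col_counts / col_counts.sum()

# Matching pennies: row player wins on a match, loses on a mismatch.
x, y = fictitious_play(np.array([[1.0, -1.0], [-1.0, 1.0]]))
```

Convergence is slow (the play cycles with growing run lengths), which is one reason the modern equilibrium-seeking algorithms surveyed above improve on this classic scheme.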