While road obstacle detection techniques have become increasingly effective, they typically ignore the fact that, in practice, the apparent size of the obstacles decreases as their distance to the vehicle increases. In this paper, we account for this by computing a scale map encoding the apparent size of a hypothetical object at every image location. We then leverage this perspective map to (i) generate training data by injecting onto the road synthetic objects whose size corresponds to the perspective foreshortening; and (ii) incorporate perspective information in the decoding part of the detection network to guide the obstacle detector. Our results on standard benchmarks show that, together, these two strategies significantly boost the obstacle detection performance, allowing our approach to consistently outperform state-of-the-art methods in terms of instance-level obstacle detection.
In Large Language Models (LLMs), there have been consistent advancements in task-specific performance, largely influenced by effective prompt design. While recent research on prompting has enhanced the reasoning capabilities of LLMs, a gap remains in further improving their understanding abilities. In this study, we introduce metacognitive prompting (MP), a strategy inspired by human introspective reasoning processes. Using MP, LLMs undergo a systematic series of structured, self-aware evaluations, drawing on both their vast inherent knowledge and new insights. Our experiments involve five prevalent LLMs: Llama2, Vicuna, PaLM, GPT-3.5, and GPT-4, all of which span various general natural language understanding (NLU) tasks from the GLUE and SuperGLUE benchmarks. Results indicate that, although GPT-4 consistently excels in most tasks, PaLM, when equipped with MP, approaches its performance level. Furthermore, across models and datasets, MP consistently outperforms existing prompting methods, including standard and chain-of-thought prompting. This study underscores the potential to amplify the understanding abilities of LLMs and highlights the benefits of mirroring human introspective reasoning in NLU tasks.
Rare event simulation and rare event probability estimation are important tasks within the analysis of systems subject to uncertainty and randomness. Simultaneously, accurately estimating rare event probabilities is an inherently difficult task that calls for dedicated tools and methods. One way to improve estimation efficiency on difficult rare event estimation problems is to leverage gradients of the computational model representing the system in consideration, e.g., to explore the rare event faster and more reliably. We present a novel approach for estimating rare event probabilities using such model gradients by drawing on a technique to generate samples from non-normalized posterior distributions in Bayesian inference - the Stein variational gradient descent. We propagate samples generated from a tractable input distribution towards a near-optimal rare event importance sampling distribution by exploiting a similarity of the latter with Bayesian posterior distributions. Sample propagation takes the shape of passing samples through a sequence of invertible transforms such that their densities can be tracked and used to construct an unbiased importance sampling estimate of the rare event probability - the Stein variational rare event estimator. We discuss settings and parametric choices of the algorithm and suggest a method for balancing convergence speed with stability by choosing the step width or base learning rate adaptively. We analyze the method's performance on several analytical test functions and two engineering examples in low to high stochastic dimensions ($d = 2 - 869$) and find that it consistently outperforms other state-of-the-art gradient-based rare event simulation methods.
Internet service providers (ISPs) have a variety of quality attributes that determine their attractiveness for data transmission, ranging from quality-of-service metrics such as jitter to security properties such as the presence of DDoS defense systems. ISPs should optimize these attributes in line with their profit objective, i.e., maximize revenue from attracted traffic while minimizing attribute-related cost, all in the context of alternative offers by competing ISPs. However, this attribute optimization is difficult not least because many aspects of ISP competition are barely understood on a systematic level, e.g., the multi-dimensional and cost-driving nature of path quality, and the distributed decision making of ISPs on the same path. In this paper, we improve this understanding by analyzing how ISP competition affects path quality and ISP profits. To that end, we develop a game-theoretic model in which ISPs (i) affect path quality via multiple attributes that entail costs, (ii) are on paths together with other selfish ISPs, and (iii) are in competition with alternative paths when attracting traffic. The model enables an extensive theoretical analysis, surprisingly showing that competition can have both positive and negative effects on path quality and ISP profits, depending on the network topology and the cost structure of ISPs. However, a large-scale simulation, which draws on real-world data to instantiate the model, shows that the positive effects will likely prevail in practice: If the number of selectable paths towards any destination increases from 1 to 5, the prevalence of quality attributes increases by at least 50%, while 75% of ISPs improve their profit.
While language models are powerful and versatile, they often fail to address highly complex problems. This is because solving complex problems requires deliberate thinking, which has been only minimally guided during training. In this paper, we propose a new method called Cumulative Reasoning (CR), which employs language models in a cumulative and iterative manner to emulate human thought processes. By decomposing tasks into smaller components, \ournameb streamlines the problem-solving process, rendering it both more manageable and effective. For logical inference tasks, CR consistently outperforms existing methods with an improvement up to 9.3\%, and achieves the astonishing accuracy of 98.04\% on the curated FOLIO wiki dataset. In the context of the Game of 24, CR achieves an accuracy of 94\%, which signifies a substantial enhancement of 20\% over the previous state-of-the-art method.
In this work, we consider a covert communication scenario, where a transmitter Alice communicates to a receiver Bob with the aid of a probabilistic and uninformed jammer against an adversary warden's detection. The transmission status and power of the jammer are random and follow some priori probabilities. We first analyze the warden's detection performance as a function of the jammer's transmission probability, transmit power distribution, and Alice's transmit power. We then maximize the covert throughput from Alice to Bob subject to a covertness constraint, by designing the covert communication strategies from three different perspectives: Alice's perspective, the jammer's perspective, and the global perspective. Our analysis reveals that the minimum jamming power should not always be zero in the probabilistic jamming strategy, which is different from that in the continuous jamming strategy presented in the literature. In addition, we prove that the minimum jamming power should be the same as Alice's covert transmit power, depending on the covertness and average jamming power constraints. Furthermore, our results show that the probabilistic jamming can outperform the continuous jamming in terms of achieving a higher covert throughput under the same covertness and average jamming power constraints.
Data Science is a modern Data Intelligence practice, which is the core of many businesses and helps businesses build smart strategies around to deal with businesses challenges more efficiently. Data Science practice also helps in automating business processes using the algorithm, and it has several other benefits, which also deliver in a non-profitable framework. In regards to data science, three key components primarily influence the effective outcome of a data science project. Those are 1.Availability of Data 2.Algorithm 3.Processing power or infrastructure
Adversarial attack is a technique for deceiving Machine Learning (ML) models, which provides a way to evaluate the adversarial robustness. In practice, attack algorithms are artificially selected and tuned by human experts to break a ML system. However, manual selection of attackers tends to be sub-optimal, leading to a mistakenly assessment of model security. In this paper, a new procedure called Composite Adversarial Attack (CAA) is proposed for automatically searching the best combination of attack algorithms and their hyper-parameters from a candidate pool of \textbf{32 base attackers}. We design a search space where attack policy is represented as an attacking sequence, i.e., the output of the previous attacker is used as the initialization input for successors. Multi-objective NSGA-II genetic algorithm is adopted for finding the strongest attack policy with minimum complexity. The experimental result shows CAA beats 10 top attackers on 11 diverse defenses with less elapsed time (\textbf{6 $\times$ faster than AutoAttack}), and achieves the new state-of-the-art on $l_{\infty}$, $l_{2}$ and unrestricted adversarial attacks.
Attention networks in multimodal learning provide an efficient way to utilize given visual information selectively. However, the computational cost to learn attention distributions for every pair of multimodal input channels is prohibitively expensive. To solve this problem, co-attention builds two separate attention distributions for each modality neglecting the interaction between multimodal inputs. In this paper, we propose bilinear attention networks (BAN) that find bilinear attention distributions to utilize given vision-language information seamlessly. BAN considers bilinear interactions among two groups of input channels, while low-rank bilinear pooling extracts the joint representations for each pair of channels. Furthermore, we propose a variant of multimodal residual networks to exploit eight-attention maps of the BAN efficiently. We quantitatively and qualitatively evaluate our model on visual question answering (VQA 2.0) and Flickr30k Entities datasets, showing that BAN significantly outperforms previous methods and achieves new state-of-the-arts on both datasets.
Link prediction for knowledge graphs is the task of predicting missing relationships between entities. Previous work on link prediction has focused on shallow, fast models which can scale to large knowledge graphs. However, these models learn less expressive features than deep, multi-layer models -- which potentially limits performance. In this work, we introduce ConvE, a multi-layer convolutional network model for link prediction, and report state-of-the-art results for several established datasets. We also show that the model is highly parameter efficient, yielding the same performance as DistMult and R-GCN with 8x and 17x fewer parameters. Analysis of our model suggests that it is particularly effective at modelling nodes with high indegree -- which are common in highly-connected, complex knowledge graphs such as Freebase and YAGO3. In addition, it has been noted that the WN18 and FB15k datasets suffer from test set leakage, due to inverse relations from the training set being present in the test set -- however, the extent of this issue has so far not been quantified. We find this problem to be severe: a simple rule-based model can achieve state-of-the-art results on both WN18 and FB15k. To ensure that models are evaluated on datasets where simply exploiting inverse relations cannot yield competitive results, we investigate and validate several commonly used datasets -- deriving robust variants where necessary. We then perform experiments on these robust datasets for our own and several previously proposed models, and find that ConvE achieves state-of-the-art Mean Reciprocal Rank across all datasets.
This paper proposes a method to modify traditional convolutional neural networks (CNNs) into interpretable CNNs, in order to clarify knowledge representations in high conv-layers of CNNs. In an interpretable CNN, each filter in a high conv-layer represents a certain object part. We do not need any annotations of object parts or textures to supervise the learning process. Instead, the interpretable CNN automatically assigns each filter in a high conv-layer with an object part during the learning process. Our method can be applied to different types of CNNs with different structures. The clear knowledge representation in an interpretable CNN can help people understand the logics inside a CNN, i.e., based on which patterns the CNN makes the decision. Experiments showed that filters in an interpretable CNN were more semantically meaningful than those in traditional CNNs.