We consider a family of two-valued "fully evaluated left-sequential logics" (FELs), of which Free FEL (defined by Staudt in 2012) is most distinguishing (weakest) and immune to atomic side effects. Next is Memorising FEL, in which evaluations of subexpressions are memorised. The following stronger logic is Conditional FEL (inspired by Guzm\'an and Squier's Conditional logic, 1990). The strongest FEL is static FEL, a sequential version of propositional logic. We use evaluation trees as a simple, intuitive semantics and provide complete axiomatisations for closed terms (left-sequential propositional expressions). For each FEL except Static FEL, we also define its three-valued version, with a constant U for "undefinedness" and again provide complete, independent aziomatisations, each one containing two additional axioms for U on top of the axiomatisations of the two-valued case. In this setting, the strongest FEL is equivalent to Bochvar's strict logic.
Real-world recommender systems often need to balance multiple objectives when deciding which recommendations to present to users. These include behavioural signals (e.g. clicks, shares, dwell time), as well as broader objectives (e.g. diversity, fairness). Scalarisation methods are commonly used to handle this balancing task, where a weighted average of per-objective reward signals determines the final score used for ranking. Naturally, how these weights are computed exactly, is key to success for any online platform. We frame this as a decision-making task, where the scalarisation weights are actions taken to maximise an overall North Star reward (e.g. long-term user retention or growth). We extend existing policy learning methods to the continuous multivariate action domain, proposing to maximise a pessimistic lower bound on the North Star reward that the learnt policy will yield. Typical lower bounds based on normal approximations suffer from insufficient coverage, and we propose an efficient and effective policy-dependent correction for this. We provide guidance to design stochastic data collection policies, as well as highly sensitive reward signals. Empirical observations from simulations, offline and online experiments highlight the efficacy of our deployed approach.
We consider the imaging of cosmic strings by using Cosmic Microwave Background (CMB) data. Mathematically, we study the inversion of an X-ray transform in Lorentzian geometry, called the light ray transform. The inverse problem is highly ill-posed, with additional complexities of being large-scale and dynamic, with unknown parameters that represent multidimensional objects. This presents significant computational challenges for the numerical reconstruction of images that have high spatial and temporal resolution. In this paper, we begin with a microlocal stability analysis for inverting the light ray transform using the Landweber iteration. Next, we discretize the spatiotemporal object and light ray transform and consider iterative computational methods for solving the resulting inverse problem. We provide a numerical investigation and comparison of some advanced iterative methods for regularization including Tikhonov and sparsity-promoting regularizers for various example scalar functions with conormal type singularities.
The Gapeev-Shiryaev conjecture (originating in Gapeev and Shiryaev (2011) and Gapeev and Shiryaev (2013)) can be broadly stated as follows: Monotonicity of the signal-to-noise ratio implies monotonicity of the optimal stopping boundaries. The conjecture was originally formulated both within (i) sequential testing problems for diffusion processes (where one needs to decide which of the two drifts is being indirectly observed) and (ii) quickest detection problems for diffusion processes (where one needs to detect when the initial drift changes to a new drift). In this paper we present proofs of the Gapeev-Shiryaev conjecture both in (i) the sequential testing setting (under Lipschitz/Holder coefficients of the underlying SDEs) and (ii) the quickest detection setting (under analytic coefficients of the underlying SDEs). The method of proof in the sequential testing setting relies upon a stochastic time change and pathwise comparison arguments. Both arguments break down in the quickest detection setting and get replaced by arguments arising from a stochastic maximum principle for hypoelliptic equations (satisfying Hormander's condition) that is of independent interest. Verification of the Gapeev-Shiryaev conjecture establishes the fact that sequential testing and quickest detection problems with monotone signal-to-noise ratios are amenable to known methods of solution.
Traditional reinforcement learning from human feedback (RLHF) approaches relying on parametric models like the Bradley-Terry model fall short in capturing the intransitivity and irrationality in human preferences. Recent advancements suggest that directly working with preference probabilities can yield a more accurate reflection of human preferences, enabling more flexible and accurate language model alignment. In this paper, we propose a self-play-based method for language model alignment, which treats the problem as a constant-sum two-player game aimed at identifying the Nash equilibrium policy. Our approach, dubbed \textit{Self-Play Preference Optimization} (SPPO), approximates the Nash equilibrium through iterative policy updates and enjoys theoretical convergence guarantee. Our method can effectively increase the log-likelihood of the chosen response and decrease that of the rejected response, which cannot be trivially achieved by symmetric pairwise loss such as Direct Preference Optimization (DPO) and Identity Preference Optimization (IPO). In our experiments, using only 60k prompts (without responses) from the UltraFeedback dataset and without any prompt augmentation, by leveraging a pre-trained preference model PairRM with only 0.4B parameters, SPPO can obtain a model from fine-tuning Mistral-7B-Instruct-v0.2 that achieves the state-of-the-art length-controlled win-rate of 28.53% against GPT-4-Turbo on AlpacaEval 2.0. It also outperforms the (iterative) DPO and IPO on MT-Bench and the Open LLM Leaderboard. Notably, the strong performance of SPPO is achieved without additional external supervision (e.g., responses, preferences, etc.) from GPT-4 or other stronger language models.
Inspired by the connection between classical regret measures employed in universal prediction and R\'{e}nyi divergence, we introduce a new class of universal predictors that depend on a real parameter $\alpha\geq 1$. This class interpolates two well-known predictors, the mixture estimators, that include the Laplace and the Krichevsky-Trofimov predictors, and the Normalized Maximum Likelihood (NML) estimator. We point out some advantages of this new class of predictors and study its benefits from two complementary viewpoints: (1) we prove its optimality when the maximal R\'{e}nyi divergence is considered as a regret measure, which can be interpreted operationally as a middle ground between the standard average and worst-case regret measures; (2) we discuss how it can be employed when NML is not a viable option, as an alternative to other predictors such as Luckiness NML. Finally, we apply the $\alpha$-NML predictor to the class of discrete memoryless sources (DMS), where we derive simple formulas to compute the predictor and analyze its asymptotic performance in terms of worst-case regret.
2D-based Industrial Anomaly Detection has been widely discussed, however, multimodal industrial anomaly detection based on 3D point clouds and RGB images still has many untouched fields. Existing multimodal industrial anomaly detection methods directly concatenate the multimodal features, which leads to a strong disturbance between features and harms the detection performance. In this paper, we propose Multi-3D-Memory (M3DM), a novel multimodal anomaly detection method with hybrid fusion scheme: firstly, we design an unsupervised feature fusion with patch-wise contrastive learning to encourage the interaction of different modal features; secondly, we use a decision layer fusion with multiple memory banks to avoid loss of information and additional novelty classifiers to make the final decision. We further propose a point feature alignment operation to better align the point cloud and RGB features. Extensive experiments show that our multimodal industrial anomaly detection model outperforms the state-of-the-art (SOTA) methods on both detection and segmentation precision on MVTec-3D AD dataset. Code is available at //github.com/nomewang/M3DM.
Few-shot Knowledge Graph (KG) completion is a focus of current research, where each task aims at querying unseen facts of a relation given its few-shot reference entity pairs. Recent attempts solve this problem by learning static representations of entities and references, ignoring their dynamic properties, i.e., entities may exhibit diverse roles within task relations, and references may make different contributions to queries. This work proposes an adaptive attentional network for few-shot KG completion by learning adaptive entity and reference representations. Specifically, entities are modeled by an adaptive neighbor encoder to discern their task-oriented roles, while references are modeled by an adaptive query-aware aggregator to differentiate their contributions. Through the attention mechanism, both entities and references can capture their fine-grained semantic meanings, and thus render more expressive representations. This will be more predictive for knowledge acquisition in the few-shot scenario. Evaluation in link prediction on two public datasets shows that our approach achieves new state-of-the-art results with different few-shot sizes.
Knowledge graph embedding, which aims to represent entities and relations as low dimensional vectors (or matrices, tensors, etc.), has been shown to be a powerful technique for predicting missing links in knowledge graphs. Existing knowledge graph embedding models mainly focus on modeling relation patterns such as symmetry/antisymmetry, inversion, and composition. However, many existing approaches fail to model semantic hierarchies, which are common in real-world applications. To address this challenge, we propose a novel knowledge graph embedding model---namely, Hierarchy-Aware Knowledge Graph Embedding (HAKE)---which maps entities into the polar coordinate system. HAKE is inspired by the fact that concentric circles in the polar coordinate system can naturally reflect the hierarchy. Specifically, the radial coordinate aims to model entities at different levels of the hierarchy, and entities with smaller radii are expected to be at higher levels; the angular coordinate aims to distinguish entities at the same level of the hierarchy, and these entities are expected to have roughly the same radii but different angles. Experiments demonstrate that HAKE can effectively model the semantic hierarchies in knowledge graphs, and significantly outperforms existing state-of-the-art methods on benchmark datasets for the link prediction task.
The problem of Multiple Object Tracking (MOT) consists in following the trajectory of different objects in a sequence, usually a video. In recent years, with the rise of Deep Learning, the algorithms that provide a solution to this problem have benefited from the representational power of deep models. This paper provides a comprehensive survey on works that employ Deep Learning models to solve the task of MOT on single-camera videos. Four main steps in MOT algorithms are identified, and an in-depth review of how Deep Learning was employed in each one of these stages is presented. A complete experimental comparison of the presented works on the three MOTChallenge datasets is also provided, identifying a number of similarities among the top-performing methods and presenting some possible future research directions.
Multi-relation Question Answering is a challenging task, due to the requirement of elaborated analysis on questions and reasoning over multiple fact triples in knowledge base. In this paper, we present a novel model called Interpretable Reasoning Network that employs an interpretable, hop-by-hop reasoning process for question answering. The model dynamically decides which part of an input question should be analyzed at each hop; predicts a relation that corresponds to the current parsed results; utilizes the predicted relation to update the question representation and the state of the reasoning process; and then drives the next-hop reasoning. Experiments show that our model yields state-of-the-art results on two datasets. More interestingly, the model can offer traceable and observable intermediate predictions for reasoning analysis and failure diagnosis.