The Matroid Secretary Conjecture is a notorious open problem in online optimization. It claims the existence of an $O(1)$-competitive algorithm for the Matroid Secretary Problem (MSP). Here, the elements of a weighted matroid appear one by one, revealing their weights upon appearance, and the task is to select elements online with the goal of obtaining an independent set of largest possible weight. $O(1)$-competitive MSP algorithms have so far only been obtained for restricted matroid classes and for MSP variations, including Random-Assignment MSP (RA-MSP), where an adversary fixes a number of weights equal to the ground set size of the matroid, which are then assigned randomly to the elements of the ground set. Unfortunately, these approaches heavily rely on knowing the full matroid upfront. This is an arguably undesirable requirement, and there are good reasons to believe that an approach towards resolving the MSP Conjecture should not rely on it. Thus, both Soto [SIAM Journal on Computing 2013] and Oveis Gharan & Vondrak [Algorithmica 2013] raised as an open question whether RA-MSP admits an $O(1)$-competitive algorithm even without knowing the matroid upfront. In this work, we answer this question affirmatively. Our result makes RA-MSP the first well-known MSP variant with an $O(1)$-competitive algorithm that neither needs to know the underlying matroid upfront nor imposes any restriction on it. Our approach is based on first approximately learning the rank-density curve of the matroid, which we then exploit algorithmically.
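For reference, one standard way to state the $O(1)$-competitiveness requirement (a recap in our notation, not a statement from the paper; the placement of the constant varies by convention):

$$\mathbb{E}\big[w(\mathrm{ALG})\big] \;\ge\; c \cdot \mathbb{E}\Big[\max_{I \in \mathcal{I}} w(I)\Big] \quad \text{for some absolute constant } c > 0,$$

where $\mathcal{I}$ denotes the independent sets of the matroid and the expectations are over the uniformly random arrival order (and, in RA-MSP, over the random assignment of the adversarial weights to the ground set elements).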
This paper is motivated by recent developments in the linear bandit literature, which have revealed a discrepancy between the promising empirical performance of algorithms such as Thompson sampling and Greedy and their pessimistic theoretical regret bounds. The challenge arises from the fact that while these algorithms may perform poorly in certain problem instances, they generally excel in typical instances. To address this, we propose a new data-driven technique that tracks the geometry of the uncertainty ellipsoid, enabling us to establish an instance-dependent frequentist regret bound for a broad class of algorithms, including Greedy, OFUL, and Thompson sampling. This result empowers us to identify and ``course-correct'' instances in which the base algorithms perform poorly. The course-corrected algorithms achieve the minimax optimal regret of order $\tilde{\mathcal{O}}(d\sqrt{T})$, while retaining most of the desirable properties of the base algorithms. We present simulation results to validate our findings and compare the performance of our algorithms with the baselines.
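As a purely illustrative sketch of the underlying idea (our own code, not the paper's algorithm; the eigenvalue test and the fallback rule are placeholder choices), one can track the geometry of the uncertainty ellipsoid through the regularized design matrix of a greedy linear bandit and force exploration whenever the ellipsoid becomes too elongated:

import numpy as np

# Greedy linear bandit with a placeholder "course-correction" rule based on
# the smallest eigenvalue of the regularized design matrix V_t, which governs
# the widest axis of the confidence ellipsoid around the ridge estimate.
rng = np.random.default_rng(0)
d, T, lam, noise = 5, 2000, 1.0, 0.1
theta_star = rng.normal(size=d)
theta_star /= np.linalg.norm(theta_star)

V = lam * np.eye(d)                  # regularized design matrix V_t
b = np.zeros(d)                      # running sum of x_s * r_s

for t in range(1, T + 1):
    arms = rng.normal(size=(20, d))
    arms /= np.linalg.norm(arms, axis=1, keepdims=True)        # fresh action set
    theta_hat = np.linalg.solve(V, b)                          # ridge estimate

    poorly_explored = np.linalg.eigvalsh(V).min() < np.sqrt(t) # placeholder test
    if poorly_explored:
        x = arms[rng.integers(len(arms))]        # "course-correct": explore
    else:
        x = arms[np.argmax(arms @ theta_hat)]    # greedy exploitation

    r = x @ theta_star + noise * rng.normal()    # noisy linear reward
    V += np.outer(x, x)
    b += r * x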
The AI community is increasingly focused on merging logic with deep learning to create Neuro-Symbolic (NeSy) paradigms and assist neural approaches with symbolic knowledge. A significant trend in the literature involves integrating axioms and facts in loss functions by grounding logical symbols with neural networks and operators with fuzzy semantics. Logic Tensor Networks (LTN) is one of the main representatives in this category, known for its simplicity, efficiency, and versatility. However, it has been previously shown that not all fuzzy operators perform equally when applied in a differentiable setting. Researchers have proposed several configurations of operators, trading off between effectiveness, numerical stability, and generalization to different formulas. This paper presents a configuration of fuzzy operators for grounding formulas end-to-end in the logarithm space. Our goal is to develop a configuration that is more effective than previous proposals, able to handle any formula, and numerically stable. To achieve this, we propose semantics that are best suited for the logarithm space and introduce novel simplifications and improvements that are crucial for optimization via gradient descent. We use LTN as the framework for our experiments, but the conclusions of our work apply to any similar NeSy framework. Our findings, both formal and empirical, show that the proposed configuration outperforms the state-of-the-art and that each of our modifications is essential in achieving these results.
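For illustration, a minimal sketch of what grounding connectives directly in the logarithm space can look like (assumed semantics of ours in PyTorch, not the exact configuration proposed in the paper; negation is omitted because it requires the usual log1mexp care):

import math
import torch

def log_and(*log_truths):
    # product t-norm in log space: log(a * b) = log a + log b
    return sum(log_truths)

def log_forall(log_truths, dim=-1):
    # universal quantifier as the log of the geometric mean of the truth values
    return log_truths.mean(dim=dim)

def log_exists(log_truths, dim=-1):
    # existential quantifier as the log of the arithmetic mean: logsumexp - log n
    return torch.logsumexp(log_truths, dim=dim) - math.log(log_truths.size(dim))

# Example: satisfaction loss for "forall x: P(x) and Q(x)", with predicates
# grounded by neural scores mapped to log-truths in (-inf, 0] via logsigmoid.
log_p = torch.nn.functional.logsigmoid(torch.randn(8))
log_q = torch.nn.functional.logsigmoid(torch.randn(8))
loss = -log_forall(log_and(log_p, log_q))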
Unobserved confounding is a fundamental obstacle to establishing valid causal conclusions from observational data. Two complementary types of approaches have been developed to address this obstacle: obtaining identification through fortuitous external aids, such as instrumental variables or proxies, or through the ID algorithm, which uses Markov restrictions on the full data distribution encoded in graphical causal models. In this paper we aim to develop a synthesis of these two approaches to identification in causal inference to yield the most general identification algorithm in multivariate systems currently known -- the proximal ID algorithm. In addition to being able to obtain nonparametric identification in all cases where the ID algorithm succeeds, our approach allows us to systematically exploit proxies to adjust for the presence of unobserved confounders that would have otherwise prevented identification. In addition, we outline a class of estimation strategies for causal parameters identified by our method in an important special case. We illustrate our approach with simulation studies and a data application.
Offline optimization paradigms such as offline Reinforcement Learning (RL) or Imitation Learning (IL) allow policy search algorithms to make use of offline data, but require careful incorporation of uncertainty in order to circumvent the challenges of distribution shift. Gradient-based policy search methods are a promising direction due to their effectiveness in high dimensions; however, they require more careful consideration of how they interact with uncertainty estimation. We claim that in order for an uncertainty metric to be amenable to gradient-based optimization, it must be (i) stably convergent to the data when uncertainty is minimized with gradients, and (ii) not prone to underestimating the true uncertainty. We investigate smoothed distance to data as a metric, and show that it not only stably converges to the data, but also allows us to analyze model bias with Lipschitz constants. Moreover, we establish an equivalence between smoothed distance to data and data likelihood, which allows us to use score-matching techniques to learn gradients of distance to data. Importantly, we show that offline model-based policy search problems that maximize data likelihood do not require the value of the likelihood, but only the gradient of the log-likelihood (the score function). Using this insight, we propose Score-Guided Planning (SGP), a planning algorithm for offline RL that utilizes score matching to enable first-order planning in high-dimensional problems, where zeroth-order methods were unable to scale and ensembles were unable to overcome local minima. Website: //sites.google.com/view/score-guided-planning/home
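A minimal planning sketch under assumptions of ours (a pretrained score_net approximating the score of the data distribution over trajectories, a differentiable scalar reward_fn, and an illustrative penalty weight beta), showing that only the score, never the likelihood value, enters the update:

import torch

def plan_with_score(score_net, reward_fn, traj_init, steps=200, lr=1e-2, beta=1.0):
    # First-order planning: ascend reward(traj) + beta * log p_data(traj).
    # The log-likelihood value is never evaluated; only its gradient, which is
    # exactly what the score network provides.
    traj = traj_init.clone().requires_grad_(True)
    opt = torch.optim.Adam([traj], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        (-reward_fn(traj)).backward()                # gradient of the task term
        with torch.no_grad():
            traj.grad -= beta * score_net(traj)      # add -beta * grad log p(traj)
        opt.step()
    return traj.detach()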
Estimating the output size of a join query is a fundamental yet longstanding problem in database query processing. Traditional cardinality estimators used by database systems can routinely underestimate the true join size by orders of magnitude, which leads to a significant system performance penalty. Recently, size upper bounds have been proposed that are based on information inequalities and incorporate sizes and max-degrees of input relations, yet they grossly overestimate the true join size. This paper puts forward a general class of size bounds that are based on information inequalities involving Lp-norms on the degree sequences of the join columns. They generalise prior efforts and can be asymptotically tighter than the known bounds. We give two types of lower and upper bounds: some hold for all entropic vectors, while others hold for all polymatroids. Whereas the former are asymptotically tight but possibly not computable, the latter are computable but not even asymptotically tight. When all degree constraints are over a single variable, we call them "simple", and prove that the polymatroid and entropic bounds are equal, that they are tight up to a query-dependent constant (which is stronger than asymptotically tight), that they are computable in time exponential in the size of the query, and that the worst-case database instance matching the bound has a simple structure called a "normal database".
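To make these statistics concrete, the following sketch (our own illustration, not code from the paper) computes the Lp-norm of the degree sequence of a join column; the special cases p = 1 and p = infinity recover the relation size and the max-degree used by earlier bounds:

import numpy as np
from collections import Counter

def degree_sequence(relation, column):
    # Degrees of the distinct values appearing in `column` of a relation,
    # where the relation is represented as a list of tuples-as-dicts.
    return np.array(list(Counter(t[column] for t in relation).values()), dtype=float)

def lp_norm(deg, p):
    return deg.max() if p == float("inf") else (deg ** p).sum() ** (1.0 / p)

R = [{"x": 1, "y": 1}, {"x": 1, "y": 2}, {"x": 2, "y": 1}, {"x": 3, "y": 1}]
deg_y = degree_sequence(R, "y")          # y=1 occurs 3 times, y=2 once -> [3, 1]
print(lp_norm(deg_y, 1))                 # 4.0       = |R| (cardinality constraint)
print(lp_norm(deg_y, float("inf")))      # 3.0       = max-degree constraint
print(lp_norm(deg_y, 2))                 # sqrt(10)  = an L2-norm constraint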
Ductile damage models and cohesive laws incorporate material plasticity, which entails the growth of irrecoverable deformations even after complete failure. This unrealistic growth remains concealed until the unilateral effects arising from crack closure emerge. We address this issue by proposing a new strategy to cope with the entire process of failure, from its very inception in the form of diffuse damage to the final stage, i.e. the emergence of sharp cracks. To this end, we introduce a new strain field, termed the discontinuity strain, into the conventional additive strain decomposition to account for discontinuities in a continuous sense, so that the standard principle of virtual work applies. We treat this strain field similarly to a strong discontinuity, yet without introducing new kinematic variables or nonlinear boundary conditions. In this paper, we demonstrate the effectiveness of this new strategy using a simple ductile damage constitutive model. The model uses a scalar damage index to control the degradation process. The discontinuity strain field is injected into the strain decomposition once this damage index exceeds a certain threshold. The threshold corresponds to the limit at which the induced imperfections merge and form a discrete crack. With three-point bending tests under pure mode I and mixed-mode conditions, we demonstrate that this augmentation does not exhibit the early crack-closure artifact that is wrongly predicted by plastic damage formulations at load reversal. We also use the concrete damaged plasticity model provided in the commercial finite element program Abaqus for our comparison. Lastly, a high-intensity low-cycle fatigue test demonstrates the unilateral effects resulting from the complete closure of the induced crack.
Contextual features are important data sources for building citywide crowd mobility prediction models. However, the difficulty of applying context lies in the unknown generalizability of both contextual features (e.g., weather, holidays, and points of interest) and context modeling techniques across different scenarios. In this paper, we present a unified analytic framework and a large-scale benchmark for evaluating context generalizability. The benchmark includes crowd mobility data, contextual data, and advanced prediction models. We conduct comprehensive experiments on several crowd mobility prediction tasks such as bike flow, metro passenger flow, and electric vehicle charging demand. Our results reveal several important observations: (1) Using more contextual features may not always result in better predictions with existing context modeling techniques; in particular, the combination of holiday and temporal position can provide more generalizable beneficial information than other contextual feature combinations. (2) Among context modeling techniques, using a gated unit to incorporate raw contextual features into the deep prediction model has good generalizability. In addition, we offer several suggestions about incorporating contextual factors when building crowd mobility prediction applications. Based on our findings, we call for future research efforts devoted to developing new context modeling solutions.
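For illustration, a minimal version of the gated-unit idea (our own sketch with hypothetical module and dimension names, not the benchmark's code), in which raw contextual features modulate the backbone's hidden features through a learned sigmoid gate:

import torch
import torch.nn as nn

class ContextGate(nn.Module):
    def __init__(self, hidden_dim, context_dim):
        super().__init__()
        self.gate = nn.Linear(hidden_dim + context_dim, hidden_dim)
        self.proj = nn.Linear(context_dim, hidden_dim)

    def forward(self, h, c):
        # h: (batch, hidden_dim) features from the prediction backbone
        # c: (batch, context_dim) raw contextual features (weather, holiday, ...)
        g = torch.sigmoid(self.gate(torch.cat([h, c], dim=-1)))
        return h + g * self.proj(c)      # gated injection of context

h = torch.randn(32, 64)                  # e.g., hidden state of a flow-prediction model
c = torch.randn(32, 10)                  # e.g., weather / holiday / temporal-position encodings
fused = ContextGate(64, 10)(h, c)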
Atmospheric systems incorporating thermal dynamics must be stable with respect to both energy and entropy. While energy conservation can be enforced via the preservation of the skew-symmetric structure of the Hamiltonian form of the equations of motion, entropy conservation is typically derived as an additional invariant of the Hamiltonian system, and satisfied via the exact preservation of the chain rule. This is particularly challenging since the function spaces used to represent the thermodynamic variables in compatible finite element discretisations are typically discontinuous at element boundaries. In the present work we circumvent this problem by constructing our equations of motion via weighted averages of skew-symmetric formulations using both flux-form and material-form advection of thermodynamic variables, which allow for the cancellations required to conserve entropy without invoking the chain rule. We show that such formulations allow for stable simulations of both the thermal shallow water and 3D compressible Euler equations on the sphere using mixed compatible finite elements without entropy damping.
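For reference, the standard argument behind the first claim (a recap in our notation, not a formula from the paper): if the semi-discrete equations take the Hamiltonian form $\dot{\mathbf{q}} = \mathcal{J}(\mathbf{q})\,\delta\mathcal{H}/\delta\mathbf{q}$ with $\mathcal{J}$ skew-symmetric, then

$$\frac{\mathrm{d}\mathcal{H}}{\mathrm{d}t} = \left\langle \frac{\delta\mathcal{H}}{\delta\mathbf{q}},\, \dot{\mathbf{q}} \right\rangle = \left\langle \frac{\delta\mathcal{H}}{\delta\mathbf{q}},\, \mathcal{J}\,\frac{\delta\mathcal{H}}{\delta\mathbf{q}} \right\rangle = 0,$$

so energy conservation survives discretisation whenever the discrete operator remains skew-symmetric; conserving a further invariant such as entropy is harder precisely because it additionally relies on a discrete chain rule.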
A core capability of intelligent systems is the ability to quickly learn new tasks by drawing on prior experience. Gradient (or optimization) based meta-learning has recently emerged as an effective approach for few-shot learning. In this formulation, meta-parameters are learned in the outer loop, while task-specific models are learned in the inner loop using only a small amount of data from the current task. A key challenge in scaling these approaches is the need to differentiate through the inner-loop learning process, which can impose considerable computational and memory burdens. By drawing upon implicit differentiation, we develop the implicit MAML algorithm, which depends only on the solution to the inner-level optimization and not the path taken by the inner-loop optimizer. This effectively decouples the meta-gradient computation from the choice of inner-loop optimizer. As a result, our approach is agnostic to the choice of inner-loop optimizer and can gracefully handle many gradient steps without vanishing gradients or memory constraints. Theoretically, we prove that implicit MAML can compute accurate meta-gradients with a memory footprint that is, up to small constant factors, no more than that required to compute a single inner-loop gradient, and with no overall increase in the total computational cost. Experimentally, we show that these benefits of implicit MAML translate into empirical gains on few-shot image recognition benchmarks.
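A sketch of the resulting computation under the proximally regularized inner problem assumed by this formulation (illustrative code of ours, not the authors' implementation): the meta-gradient solves $(I + \tfrac{1}{\lambda}\nabla^2 \hat{L}_{\mathrm{tr}})\, g = \nabla L_{\mathrm{test}}$ at the adapted parameters, approximated here with conjugate gradient and Hessian-vector products for a single flattened parameter vector:

import torch

def implicit_meta_grad(train_loss_fn, test_loss_fn, phi, lam, cg_steps=10):
    # phi: adapted parameters as a flat 1-D tensor with requires_grad=True;
    # train_loss_fn / test_loss_fn map phi to scalar losses on the task's
    # support / query data. Returns an approximation of
    # (I + (1/lam) * Hessian(train_loss))^{-1} grad(test_loss).
    grad_test = torch.autograd.grad(test_loss_fn(phi), phi)[0].detach()

    def A(v):  # matrix-vector product v -> (I + (1/lam) H) v via double backward
        g = torch.autograd.grad(train_loss_fn(phi), phi, create_graph=True)[0]
        hv = torch.autograd.grad(g, phi, grad_outputs=v)[0]
        return v + hv / lam

    x = torch.zeros_like(grad_test)       # plain conjugate gradient iterations
    r = grad_test - A(x)
    p = r.clone()
    rs = r.dot(r)
    for _ in range(cg_steps):
        Ap = A(p)
        alpha = rs / p.dot(Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = r.dot(r)
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x                              # approximate meta-gradient w.r.t. the meta-parameters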
Relying entirely on an attention mechanism, the Transformer introduced by Vaswani et al. (2017) achieves state-of-the-art results for machine translation. In contrast to recurrent and convolutional neural networks, it does not explicitly model relative or absolute position information in its structure. Instead, it requires adding representations of absolute positions to its inputs. In this work we present an alternative approach, extending the self-attention mechanism to efficiently consider representations of the relative positions, or distances between sequence elements. On the WMT 2014 English-to-German and English-to-French translation tasks, this approach yields improvements of 1.3 BLEU and 0.3 BLEU over absolute position representations, respectively. Notably, we observe that combining relative and absolute position representations yields no further improvement in translation quality. We describe an efficient implementation of our method and cast it as an instance of relation-aware self-attention mechanisms that can generalize to arbitrary graph-labeled inputs.
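A single-head sketch in the spirit of this formulation (our own illustrative PyTorch code with assumed names and sizes, not the authors' implementation), where learned embeddings indexed by the clipped relative distance j - i are added to the keys and values:

import torch
import torch.nn as nn
import torch.nn.functional as F

class RelativeSelfAttention(nn.Module):
    def __init__(self, d_model, max_rel=16):
        super().__init__()
        self.wq = nn.Linear(d_model, d_model, bias=False)
        self.wk = nn.Linear(d_model, d_model, bias=False)
        self.wv = nn.Linear(d_model, d_model, bias=False)
        self.max_rel = max_rel
        self.rel_k = nn.Embedding(2 * max_rel + 1, d_model)   # relative-key embeddings
        self.rel_v = nn.Embedding(2 * max_rel + 1, d_model)   # relative-value embeddings

    def forward(self, x):                                     # x: (batch, n, d_model)
        n, d = x.size(1), x.size(2)
        q, k, v = self.wq(x), self.wk(x), self.wv(x)
        pos = torch.arange(n, device=x.device)
        rel = (pos[None, :] - pos[:, None]).clamp(-self.max_rel, self.max_rel) + self.max_rel
        ak, av = self.rel_k(rel), self.rel_v(rel)             # (n, n, d_model)
        scores = torch.einsum("bid,bjd->bij", q, k)           # content-content term
        scores = scores + torch.einsum("bid,ijd->bij", q, ak) # content-relative-key term
        attn = F.softmax(scores / d ** 0.5, dim=-1)
        out = torch.einsum("bij,bjd->bid", attn, v)           # weighted values
        return out + torch.einsum("bij,ijd->bid", attn, av)   # plus relative-value term

x = torch.randn(2, 10, 64)
y = RelativeSelfAttention(64)(x)          # (2, 10, 64)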