Previous work on fatigue prediction in Powder Bed Fusion - Laser Beam has shown that the estimate of the largest pore size within the stressed volume is correlated with the resulting fatigue behavior in porosity-driven failures. However, single-value estimates of the largest pore size are insufficient to capture the experimentally observed scatter in fatigue properties. To address this gap, in this work we incorporate uncertainty quantification into extreme value statistics to estimate the largest pore size distribution in a given volume of material, capturing uncertainty in both the number of pores present and the upper-tail parameters. We then apply this statistical framework to compare porosity equivalence between two geometries: a 4-point bend fatigue specimen and the gauge section of an axial fatigue specimen. Both geometries were manufactured with the same process conditions using Ti-6Al-4V, followed by porosity characterization via X-ray micro-CT. The results show that the largest pore size distribution of the 4-point bend specimen is insufficient to accurately capture the largest pore size observed in the axial fatigue specimen, despite the similar dimensions of the two specimens. Based on our findings, we provide insight into the design of witness coupons that exhibit part-to-coupon porosity equivalence for fatigue.
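As a concrete illustration of this kind of framework (not the paper's exact implementation), the sketch below propagates uncertainty in both the pore count and the upper-tail parameters into a Monte Carlo distribution of the largest pore in a volume. The Poisson pore count, the generalized Pareto peaks-over-threshold tail, and all parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def largest_pore_sizes(n_draws=5000,
                       mean_exceedances=200,     # assumed mean number of pores above the tail threshold
                       threshold_um=20.0,        # assumed peaks-over-threshold cut-off (micrometres)
                       shape_post=(0.05, 0.03),  # assumed posterior mean/sd of the GPD shape parameter
                       scale_post=(5.0, 1.0)):   # assumed posterior mean/sd of the GPD scale (micrometres)
    """Monte Carlo draws of the largest pore size in a volume, propagating
    uncertainty in both the pore count and the upper-tail (GPD) parameters."""
    maxima = np.empty(n_draws)
    for i in range(n_draws):
        n = max(rng.poisson(mean_exceedances), 1)    # uncertain number of tail pores
        xi = rng.normal(*shape_post)                 # uncertain tail shape
        sigma = max(rng.normal(*scale_post), 1e-6)   # uncertain tail scale
        u = rng.random(n)
        # Generalized Pareto quantile function (assumes xi != 0).
        exceedances = sigma / xi * ((1.0 - u) ** (-xi) - 1.0)
        maxima[i] = threshold_um + exceedances.max()
    return maxima

draws = largest_pore_sizes()
print(f"median largest pore: {np.median(draws):.1f} um, "
      f"95th percentile: {np.percentile(draws, 95):.1f} um")
```

The spread of `draws` is what a single-value estimate of the largest pore cannot convey, which is the motivation for the distributional treatment above.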
Retrieval-augmented generation (RAG) mitigates hallucination in Large Language Models (LLMs) by using query pipelines to retrieve relevant external information and grounding responses in retrieved knowledge. However, query pipeline optimization for cancer patient question-answering (CPQA) systems requires separately optimizing multiple components with domain-specific considerations. We propose a novel three-aspect optimization approach for the RAG query pipeline in CPQA systems, utilizing public biomedical databases like PubMed and PubMed Central. Our optimization includes: (1) document retrieval, utilizing a comparative analysis of NCBI resources and introducing Hybrid Semantic Real-time Document Retrieval (HSRDR); (2) passage retrieval, identifying optimal pairings of dense retrievers and rerankers; and (3) semantic representation, introducing Semantic Enhanced Overlap Segmentation (SEOS) for improved contextual understanding. On a custom-developed dataset tailored for cancer-related inquiries, our optimized RAG approach improved the answer accuracy of Claude-3-haiku by 5.24% over chain-of-thought prompting and about 3% over a naive RAG setup. This study highlights the importance of domain-specific query optimization in realizing the full potential of RAG and provides a robust framework for building more accurate and reliable CPQA systems, advancing the development of RAG-based biomedical systems.
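As a generic illustration of how a hybrid (keyword plus dense) retrieval stage can be fused before passage-level reranking, the sketch below uses reciprocal rank fusion, a standard score-fusion heuristic; it is not the paper's HSRDR method, and the PubMed-style document IDs are made up.

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several ranked lists of document IDs with reciprocal rank fusion.
    Each list is assumed to be ordered from most to least relevant."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical outputs of a keyword (PubMed query) retriever and a dense semantic retriever.
keyword_hits = ["PMC101", "PMC205", "PMC333", "PMC042"]
dense_hits   = ["PMC205", "PMC777", "PMC101", "PMC999"]

fused = reciprocal_rank_fusion([keyword_hits, dense_hits])
print(fused)  # fused candidates would then go to a passage retriever and reranker
```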
We derive a new adaptive leverage score sampling strategy for solving the Column Subset Selection Problem (CSSP). The resulting algorithm, called Adaptive Randomized Pivoting, can be viewed as a randomization of Osinsky's recently proposed deterministic algorithm for CSSP. It guarantees, in expectation, an approximation error that matches the optimal existence result in the Frobenius norm. Although the same guarantee can be achieved with volume sampling, our sampling strategy is much simpler and less expensive. To show the versatility of Adaptive Randomized Pivoting, we apply it to select indices in the Discrete Empirical Interpolation Method, in cross/skeleton approximation of general matrices, and in the Nystroem approximation of symmetric positive semi-definite matrices. In all these cases, the resulting randomized algorithms are new, and they enjoy bounds on the expected error that match -- or improve on -- the best known deterministic results. A derandomization of the algorithm for the Nystroem approximation results in a new deterministic algorithm with a rather favorable error bound.
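A minimal sketch of adaptive residual-based column sampling in this spirit (a column is drawn with probability proportional to its squared norm in the current residual, which is then deflated) is given below; it is a generic illustration of the sampling idea, not necessarily the exact Adaptive Randomized Pivoting update.

```python
import numpy as np

def adaptive_column_sampling(A, k, seed=None):
    """Select k column indices of A by adaptive residual-norm sampling:
    at each step a column is drawn with probability proportional to its
    squared residual norm, and the residual is deflated by projecting
    out the chosen column."""
    rng = np.random.default_rng(seed)
    R = A.astype(float).copy()          # current residual
    selected = []
    for _ in range(k):
        norms2 = np.sum(R**2, axis=0)
        j = rng.choice(A.shape[1], p=norms2 / norms2.sum())
        selected.append(j)
        q = R[:, j] / np.linalg.norm(R[:, j])
        R -= np.outer(q, q @ R)         # project onto the orthogonal complement of the chosen column
    return selected

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 30)) @ rng.standard_normal((30, 40))   # synthetic low-rank-ish matrix
cols = adaptive_column_sampling(A, k=5, seed=1)
C = A[:, cols]
err = np.linalg.norm(A - C @ np.linalg.pinv(C) @ A, "fro") / np.linalg.norm(A, "fro")
print(cols, f"relative Frobenius error: {err:.3f}")
```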
We investigate diffusion models to solve the Traveling Salesman Problem (TSP). Building on the recent DIFUSCO and T2TCO approaches, we propose IDEQ. IDEQ improves the quality of the solutions by leveraging the constrained structure of the state space of the TSP. Another key component of IDEQ replaces the last stages of the DIFUSCO curriculum learning with a new training objective: a uniform distribution over the Hamiltonian tours whose orbits under the 2-opt operator converge to the optimal solution. Our experiments show that IDEQ improves the state of the art for such neural-network-based techniques on synthetic instances. More importantly, our experiments show that IDEQ performs very well on the instances of TSPlib, a reference benchmark in the TSP community: it closely matches the performance of the best heuristic, LKH3, and even obtains better solutions than LKH3 on two TSPlib instances with 1577 and 3795 cities. IDEQ obtains a 0.3% optimality gap on TSP instances with 500 cities and 0.5% on TSP instances with 1000 cities. This sets a new state of the art for neural methods solving the TSP. Moreover, IDEQ exhibits lower variance and scales better with the number of cities than DIFUSCO and T2TCO.
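For readers unfamiliar with the 2-opt operator referenced above, it reverses a segment of the tour whenever doing so shortens it; a minimal local-search sketch (independent of the diffusion model, on a hypothetical random Euclidean instance) is:

```python
import numpy as np

def tour_length(tour, D):
    return sum(D[tour[i], tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def two_opt(tour, D):
    """Apply improving 2-opt moves (reverse a tour segment) until none remains."""
    tour = list(tour)
    improved = True
    while improved:
        improved = False
        for i in range(1, len(tour) - 1):
            for j in range(i + 1, len(tour)):
                candidate = tour[:i] + tour[i:j][::-1] + tour[j:]
                if tour_length(candidate, D) < tour_length(tour, D) - 1e-12:
                    tour, improved = candidate, True
    return tour

rng = np.random.default_rng(0)
pts = rng.random((30, 2))                                   # hypothetical 30-city instance
D = np.linalg.norm(pts[:, None] - pts[None], axis=-1)       # Euclidean distance matrix
tour = two_opt(list(range(30)), D)
print("2-opt tour length:", round(tour_length(tour, D), 3))
```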
The widespread adoption of digital distribution channels both enables and forces more and more logistical service providers to manage booking processes actively to maintain competitiveness. As a result, their operational planning is no longer limited to solving vehicle routing problems. Instead, demand management decisions and vehicle routing decisions are optimized integratively with the aim of maximizing revenue and minimizing fulfillment cost. The resulting integrated demand management and vehicle routing problems (i-DMVRPs) can be formulated as Markov decision process models and, theoretically, can be solved via the well-known Bellman equation. Unfortunately, the Bellman equation is intractable for realistic-sized instances. Thus, in the literature, i-DMVRPs are often addressed via decomposition-based solution approaches involving an opportunity cost approximation as a key component. Despite its importance, to the best of our knowledge, there is neither a technique to systematically analyze how the accuracy of the opportunity cost approximation translates into overall solution quality nor are there general guidelines on when to apply which class of approximation approach. In this work, we address this research gap by proposing an explainability technique that quantifies and visualizes the magnitude of approximation errors, their immediate impact, and their relevance in specific regions of the state space. Exploiting reward decomposition, it further yields a characterization of different types of approximation errors. Applying the technique to a generic i-DMVRP in a full-factorial computational study and comparing the results with observations in existing literature, we show that the technique contributes to better explaining algorithmic performance and provides guidance for the algorithm selection and development process.
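For reference, the Bellman equation alluded to above takes, for a generic MDP with states $s$, actions $a$, rewards $r(s,a)$, and transition kernel $P(s'\mid s,a)$ (generic notation, not the paper's i-DMVRP formulation), the form

$$ V(s) \;=\; \max_{a}\Big\{\, r(s,a) \;+\; \sum_{s'} P(s'\mid s,a)\,V(s') \,\Big\}. $$

Decomposition-based approaches replace the exact value function $V$ with an approximation, and the resulting error in the implied opportunity costs is what the proposed explainability technique quantifies and visualizes.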
One of the questions in Rigidity Theory is whether a realization of the vertices of a graph in the plane is flexible, namely, whether it allows a continuous deformation preserving the edge lengths. A flexible realization of a connected graph in the plane exists if and only if the graph has a so-called NAC-coloring, which is a surjective edge coloring by two colors such that for each cycle either all the edges have the same color or there are at least two edges of each color. The question of whether a graph has a NAC-coloring, and hence also of the existence of a flexible realization, has been proven to be NP-complete. We show that this question remains NP-complete on graphs with maximum degree five and on graphs with average degree at most $4+\varepsilon$ for every fixed $\varepsilon >0$. The existence of a NAC-coloring is fixed-parameter tractable when parameterized by treewidth. Since the only existing implementation for checking the existence of a NAC-coloring is rather naive, we propose new algorithms along with their implementation, which is significantly faster. We also focus on searching for all NAC-colorings of a graph, since they provide useful information about its possible flexible realizations.
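For concreteness, a naive brute-force check of the NAC-coloring property (the kind of baseline that faster algorithms improve on) can use the observation that a violating cycle, i.e. one with exactly one edge of some color, exists exactly when an edge of that color joins two vertices already connected in the subgraph of the other color. The sketch below enumerates all surjective 2-colorings of a small graph; it is illustrative only and exponential in the number of edges.

```python
from itertools import product

def has_almost_unicolored_cycle(edges, coloring):
    """A cycle with exactly one edge of some color exists iff an edge of that
    color joins two vertices already connected by edges of the other color."""
    for single, rest in (("red", "blue"), ("blue", "red")):
        parent = {}                      # union-find over edges of color `rest`
        def find(v):
            parent.setdefault(v, v)
            while parent[v] != v:
                parent[v] = parent[parent[v]]
                v = parent[v]
            return v
        for (u, v), c in zip(edges, coloring):
            if c == rest:
                parent[find(u)] = find(v)
        for (u, v), c in zip(edges, coloring):
            if c == single and find(u) == find(v):
                return True
    return False

def nac_colorings(edges):
    """Enumerate all surjective 2-colorings with no almost-unicolored cycle."""
    for coloring in product(("red", "blue"), repeat=len(edges)):
        if len(set(coloring)) == 2 and not has_almost_unicolored_cycle(edges, coloring):
            yield dict(zip(edges, coloring))

# For the 4-cycle, exactly the colorings with two edges of each color qualify, so this prints 6.
square = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(len(list(nac_colorings(square))))
```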
Statistical learning under distribution shift is challenging when neither prior knowledge nor fully accessible data from the target distribution is available. Distributionally robust learning (DRL) aims to control the worst-case statistical performance within an uncertainty set of candidate distributions, but how to properly specify the set remains challenging. To enable distributional robustness without being overly conservative, in this paper, we propose a shape-constrained approach to DRL, which incorporates prior information about the way in which the unknown target distribution differs from its estimate. More specifically, we assume the unknown density ratio between the target distribution and its estimate is isotonic with respect to some partial order. At the population level, we provide a solution to the shape-constrained optimization problem that does not involve the isotonic constraint. At the sample level, we provide consistency results for an empirical estimator of the target in a range of different settings. Empirical studies on both synthetic and real data examples demonstrate the improved accuracy of the proposed shape-constrained approach.
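As a small illustration of the isotonic constraint in the simplest case of a total order (the paper allows general partial orders), the pool-adjacent-violators sketch below projects a crude density-ratio estimate onto the set of nondecreasing sequences; it is a generic monotone-projection sketch, not the paper's estimator.

```python
import numpy as np

def pava(y, w=None):
    """Pool-adjacent-violators: least-squares projection of y onto
    nondecreasing sequences (weights w default to 1)."""
    y = np.asarray(y, dtype=float)
    w = np.ones_like(y) if w is None else np.asarray(w, dtype=float)
    merged = []                                   # blocks as [mean, weight, length]
    for yi, wi in zip(y, w):
        merged.append([yi, wi, 1])
        while len(merged) > 1 and merged[-2][0] > merged[-1][0]:
            m2, m1 = merged.pop(), merged.pop()
            wsum = m1[1] + m2[1]
            merged.append([(m1[0] * m1[1] + m2[0] * m2[1]) / wsum, wsum, m1[2] + m2[2]])
    return np.concatenate([np.full(length, mean) for mean, _, length in merged])

# Hypothetical noisy density-ratio estimates at ordered covariate values.
raw_ratio = np.array([0.6, 0.9, 0.7, 1.1, 1.0, 1.5, 1.3, 1.8])
print(pava(raw_ratio))   # nondecreasing (isotonic) projection
```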
Insurance losses due to flooding can be estimated by simulating and then summing a large number of losses for each year in a large set of hypothetical years of flood events. Replicated realisations lead to Monte Carlo return-level estimates and associated uncertainty. The procedure, however, is highly computationally intensive. We develop and use a new, Bennett-like concentration inequality to provide conservative but relatively accurate estimates of return levels. Bennett's inequality accounts for the different variances of each of the variables in a sum but uses a uniform upper bound on their support. Motivated by the variability in the total insured value of risks within a portfolio, we incorporate both individual upper bounds and individual variances and obtain tractable concentration bounds. Simulation studies and application to a representative portfolio demonstrate a substantial tightening compared with Bennett's bound. We then develop an importance-sampling procedure that repeatedly samples the loss for each year from the distribution implied by the concentration inequality, leading to conservative estimates of the return levels and their uncertainty using orders of magnitude less computation. This enables a simulation study of the sensitivity of the predictions to perturbations in quantities that are usually assumed fixed and known but, in truth, are not.
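For context, the classical Bennett inequality referred to above bounds the upper tail of a sum $S = \sum_{i=1}^{n} X_i$ of independent, mean-zero variables with $X_i \le b$ almost surely and $\sigma^2 = \sum_{i=1}^{n} \mathrm{Var}(X_i)$ as

$$ \Pr\!\left( S \ge t \right) \;\le\; \exp\!\left( -\frac{\sigma^2}{b^2}\, h\!\left(\frac{bt}{\sigma^2}\right) \right), \qquad h(u) = (1+u)\log(1+u) - u, $$

which uses the individual variances but only a single uniform bound $b$ on the summands; the new inequality instead admits per-risk upper bounds, which is what yields the tightening reported above.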
Online optimisation studies the convergence of optimisation methods as the data embedded in the problem changes. Based on this idea, we propose a primal-dual online method for nonlinear time-discrete inverse problems. We analyse the method through regret theory and demonstrate its performance in real-time monitoring of moving bodies in a fluid with Electrical Impedance Tomography (EIT). To do so, we also prove the second-order differentiability of the Complete Electrode Model (CEM) solution operator on $L^\infty$.
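For orientation, with $f_t$ denoting the (time-varying) objective at step $t$ and $x_t$ the iterate produced by the online method, regret-type guarantees bound quantities of the form

$$ \mathcal{R}_T \;=\; \sum_{t=1}^{T} f_t(x_t) \;-\; \min_{x} \sum_{t=1}^{T} f_t(x) $$

(generic notation; the primal-dual setting of the paper works with a corresponding saddle-point formulation).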
The proposed two-dimensional geometrically exact beam element extends our previous work by including the effects of shear distortion as well as of distributed forces and moments acting along the beam. The general flexibility-based formulation exploits the kinematic equations combined with the inverted sectional equations and the integrated form of the equilibrium equations. The resulting set of three first-order differential equations is discretized by finite differences, and the boundary value problem is converted into an initial value problem using the shooting method. Due to the special structure of the governing equations, the scheme remains explicit even though the first derivatives are approximated by central differences, leading to high accuracy. The main advantage of the adopted approach is that the error can be efficiently reduced by refining the computational grid used for the finite differences at the element level while keeping the number of global degrees of freedom low. The efficiency is also increased by dealing directly with the global centerline coordinates and the sectional inclination with respect to the global axes as the primary unknowns at the element level, thereby avoiding transformations between local and global coordinates. Two formulations of the sectional equations, referred to as the Reissner and Ziegler models, are presented and compared. In particular, the stability of an axially loaded beam/column is investigated and the connections to the Haringx and Engesser stability theories are discussed. Both approaches are tested in a series of numerical examples, which illustrate (i) high accuracy with quadratic convergence when the spatial discretization is refined, (ii) easy modeling of variable stiffness along the element (such as rigid joint offsets), and (iii) efficient and accurate characterization of the buckling and post-buckling behavior.
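To illustrate the boundary-value-to-initial-value conversion via the shooting method (on a toy two-point problem, not the beam element's actual governing equations), the sketch below guesses the unknown initial slope, integrates the resulting initial value problem, and adjusts the guess until the far-end condition is satisfied.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import brentq

# Toy nonlinear two-point BVP:  y'' = -sin(y),  y(0) = 0,  y(1) = 1.

def rhs(x, state):
    y, dy = state
    return [dy, -np.sin(y)]

def end_value(s):
    """Integrate the IVP with guessed initial slope s = y'(0) and return y(1)."""
    sol = solve_ivp(rhs, (0.0, 1.0), [0.0, s], rtol=1e-8, atol=1e-10)
    return sol.y[0, -1]

def mismatch(s):
    return end_value(s) - 1.0          # residual of the far-end boundary condition

slope = brentq(mismatch, 0.0, 5.0)     # bracketing interval chosen by inspection
print(f"shooting slope y'(0) = {slope:.6f}, y(1) = {end_value(slope):.6f}")
```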
In large-scale systems there are fundamental challenges when centralised techniques are used for task allocation. The number of interactions is limited by resource constraints on computation, storage, and network communication. We can increase scalability by implementing the system as a distributed task-allocation system, sharing tasks across many agents. However, this also increases the resource cost of communication and synchronisation, and is difficult to scale. In this paper we present four algorithms to solve these problems. The combination of these algorithms enables each agent to improve its task allocation strategy through reinforcement learning, while changing how much it explores the system in response to how optimal it believes its current strategy is, given its past experience. We focus on distributed agent systems where the agents' behaviours are constrained by resource usage limits, limiting agents to local rather than system-wide knowledge. We evaluate these algorithms in a simulated environment where agents are given a task composed of multiple subtasks that must be allocated to other agents with differing capabilities, which then carry out those tasks. We also simulate real-life system effects such as networking instability. Our solution is shown to solve the task allocation problem to within 6.7% of the theoretical optimum within the system configurations considered. It provides 5x better performance recovery than no-knowledge-retention approaches when system connectivity is impacted, and is tested against systems of up to 100 agents with less than a 9% impact on the algorithms' performance.
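As a simplified illustration of the idea that an agent adapts its exploration to how optimal it believes its current strategy is (not the paper's four algorithms), the sketch below keeps running value estimates for candidate worker agents and shrinks the exploration rate as the value of the chosen worker approaches the agent's best-known value. All names and parameters are hypothetical.

```python
import random

class AllocatorAgent:
    """Toy task-allocation learner: epsilon-greedy choice over candidate workers,
    with exploration that shrinks as the chosen worker's value nears the best estimate."""

    def __init__(self, workers, lr=0.2, eps=1.0, eps_min=0.05):
        self.q = {w: 0.0 for w in workers}    # estimated value of allocating to each worker
        self.lr, self.eps, self.eps_min = lr, eps, eps_min

    def choose(self):
        if random.random() < self.eps:
            return random.choice(list(self.q))     # explore
        return max(self.q, key=self.q.get)         # exploit current strategy

    def update(self, worker, reward):
        self.q[worker] += self.lr * (reward - self.q[worker])
        best = max(self.q.values())
        # The closer the chosen worker's value is to the best estimate,
        # the more optimal the current strategy looks, so explore less.
        gap = (best - self.q[worker]) / (abs(best) + 1e-9)
        self.eps = max(self.eps_min, min(1.0, gap))

# Hypothetical workers with different (unknown) success probabilities for a subtask.
success_rate = {"agent_a": 0.9, "agent_b": 0.6, "agent_c": 0.3}
alloc = AllocatorAgent(success_rate)
for _ in range(500):
    w = alloc.choose()
    alloc.update(w, 1.0 if random.random() < success_rate[w] else 0.0)
print(max(alloc.q, key=alloc.q.get), {k: round(v, 2) for k, v in alloc.q.items()})
```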