Resolving coreference and bridging relations in chemical patents is important for precisely understanding the chemical processes they describe, a setting in which chemical domain knowledge is critical. We propose an approach that incorporates external knowledge into a multi-task learning model for both coreference and bridging resolution in the chemical domain. The results show that integrating external knowledge benefits both chemical coreference and bridging resolution.
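As a purely illustrative picture of what such a model might look like (assumed, not the authors' architecture), the sketch below scores mention pairs for both tasks from a shared pair representation augmented with external-knowledge features, and trains the two heads under a joint loss; all module and feature names are hypothetical.

```python
# Hedged sketch of a knowledge-augmented multi-task pair scorer; not the paper's model.
import torch
import torch.nn as nn

class PairScorer(nn.Module):
    def __init__(self, span_dim, kb_dim, hidden=128):
        super().__init__()
        in_dim = 2 * span_dim + kb_dim   # anaphor span + antecedent span + knowledge features
        self.coref_head = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))
        self.bridging_head = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, anaphor, antecedent, kb_feats):
        # kb_feats: external-knowledge features for the pair, e.g. relatedness of the
        # two mentions in a chemical resource (assumed representation).
        pair = torch.cat([anaphor, antecedent, kb_feats], dim=-1)
        return self.coref_head(pair), self.bridging_head(pair)

def joint_loss(coref_logit, bridge_logit, coref_label, bridge_label, alpha=0.5):
    # Weighted sum of the two task losses; alpha is an assumed trade-off weight.
    bce = nn.functional.binary_cross_entropy_with_logits
    return alpha * bce(coref_logit, coref_label) + (1 - alpha) * bce(bridge_logit, bridge_label)
```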
The problem of binary hypothesis testing between two probability measures is considered. New sharp bounds are derived for the best achievable error probability of such tests based on independent and identically distributed observations. Specifically, the asymmetric version of the problem is examined, where different requirements are placed on the two error probabilities. Accurate nonasymptotic expansions with explicit constants are obtained for the error probability, using tools from large deviations and Gaussian approximation. Examples are shown indicating that, in the asymmetric regime, the approximations suggested by the new bounds are significantly more accurate than the approximations provided by either of the two main earlier approaches -- normal approximation and error exponents.
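For context, the two classical baselines that the abstract contrasts with can be stated as follows (these are standard results, not the paper's new bounds). Let $\beta_n^*(\cdot)$ denote the smallest achievable type-II error probability based on $n$ i.i.d. observations, $D(P\|Q)$ the relative entropy, $V(P\|Q)$ the corresponding relative-entropy variance, and $\Phi^{-1}$ the standard normal quantile function. With the type-I error held at most $\epsilon$ (normal approximation, Strassen-type expansion),
\[
-\log \beta_n^*(\epsilon) \;=\; n\,D(P\|Q) \;+\; \sqrt{n\,V(P\|Q)}\;\Phi^{-1}(\epsilon) \;+\; O(\log n),
\]
whereas if the type-I error is required to decay exponentially, $\alpha_n \le e^{-nr}$ with $0 < r < D(Q\|P)$ (error-exponent regime, Hoeffding),
\[
\lim_{n\to\infty} \;-\frac{1}{n}\log \beta_n^*\bigl(e^{-nr}\bigr) \;=\; \min_{R\,:\,D(R\|P)\le r} D(R\|Q).
\]
These are the two approximations against which the accuracy of the new bounds is compared.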
The accuracy of finite element solutions is closely tied to the mesh quality. In particular, geometrically nonlinear problems involving large and strongly localized deformations often result in prohibitively large element distortions. In this work, we propose a novel mesh regularization approach that allows restoring a non-distorted, high-quality mesh in an adaptive manner without the need for expensive re-meshing procedures. The core idea of this approach lies in the definition of a finite element distortion potential that accounts for contributions from different distortion modes such as skewness and aspect ratio of the elements. The regularized mesh is found by minimizing this potential. Moreover, based on the concept of spatial localization functions, the method allows tailored requirements on mesh resolution and quality to be specified for regions with strongly localized mechanical deformation and mesh distortion. In addition, while existing mesh regularization schemes often keep the boundary nodes of the discretization fixed, we propose a mesh-sliding algorithm based on variationally consistent mortar methods that allows unrestricted tangential motion of nodes along the problem boundary. Especially for problems involving significant surface deformation (e.g., frictional contact), this approach allows for improved mesh relaxation compared to schemes with fixed boundary nodes. To transfer data such as tensor-valued history variables of the material model from the old (distorted) to the new (regularized) mesh, a structure-preserving invariant interpolation scheme for second-order tensors is employed, which has been proposed in our previous work and is designed to preserve important mechanical properties of tensor-valued data such as objectivity and positive definiteness.
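To make the idea of a distortion potential concrete, here is a deliberately simplified 2D sketch (assumed, not the paper's formulation): each triangle contributes a penalty that vanishes for an ideal shape, and the interior node positions are obtained by minimizing the summed potential with a standard optimizer. The paper's multiple distortion modes, localization functions, 3D elements, and mortar-based boundary sliding are not reproduced, and boundary nodes are simply held fixed here.

```python
# Minimal, simplified illustration of mesh regularization by minimizing a
# per-element shape-distortion potential over interior node positions.
import numpy as np
from scipy.optimize import minimize

def element_distortion(p, tri):
    a, b, c = p[tri[0]], p[tri[1]], p[tri[2]]
    sq = ((b - a) ** 2).sum() + ((c - b) ** 2).sum() + ((a - c) ** 2).sum()
    area = 0.5 * abs((b - a)[0] * (c - a)[1] - (b - a)[1] * (c - a)[0])
    q = sq / (4.0 * np.sqrt(3.0) * area + 1e-12)   # equals 1 for an equilateral triangle
    return (q - 1.0) ** 2                          # one possible "shape" distortion mode

def total_potential(x_free, nodes, free_idx, tris):
    p = nodes.copy()
    p[free_idx] = x_free.reshape(-1, 2)            # boundary nodes stay fixed in this sketch
    return sum(element_distortion(p, t) for t in tris)

# Tiny example mesh: four triangles around one interior node that starts off-center.
nodes = np.array([[0., 0.], [1., 0.], [1., 1.], [0., 1.], [0.8, 0.2]])
tris = [(0, 1, 4), (1, 2, 4), (2, 3, 4), (3, 0, 4)]
free_idx = np.array([4])

res = minimize(total_potential, nodes[free_idx].ravel(),
               args=(nodes, free_idx, tris), method="L-BFGS-B")
print("relaxed interior node:", res.x)             # should move toward the square's center
```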
We provide a novel dimension-free uniform concentration bound for the empirical risk function of constrained logistic regression. Our bound yields a milder sufficient condition for a uniform law of large numbers than conditions derived by the Rademacher complexity argument and McDiarmid's inequality. The derivation is based on the PAC-Bayes approach with second-order expansion and Rademacher-complexity-based bounds for the residual term of the expansion.
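In commonly used notation (assumed here, since the abstract does not fix it), the objects involved are the empirical and population risks of logistic regression over a norm-constrained parameter set,
\[
R_n(\theta) \;=\; \frac{1}{n}\sum_{i=1}^{n} \log\!\bigl(1 + e^{-y_i \langle x_i,\, \theta\rangle}\bigr),
\qquad
R(\theta) \;=\; \mathbb{E}\bigl[\log\bigl(1 + e^{-y \langle x,\, \theta\rangle}\bigr)\bigr],
\qquad
\theta \in \Theta = \{\theta : \|\theta\| \le B\},
\]
and a uniform concentration bound controls $\sup_{\theta \in \Theta} \lvert R_n(\theta) - R(\theta) \rvert$; "dimension-free" means the resulting bound does not degrade as the dimension of $\theta$ grows.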
We propose a coefficient that measures dependence in paired samples of functions. It has properties similar to the Pearson correlation, but differs in significant ways: 1) it is designed to measure dependence between curves, and 2) it focuses only on extreme curves. The new coefficient is derived within the framework of regular variation in Banach spaces. A consistent estimator is proposed and justified by an asymptotic analysis and a simulation study. The usefulness of the new coefficient is illustrated on financial and climate functional data.
Continuous p-dispersion problems with and without boundary constraints are NP-hard optimization problems with numerous real-world applications, notably in facility location and circle packing, which are widely studied in mathematics and operations research. In this work, we concentrate on general cases with a non-convex multiply-connected region that are rarely studied in the literature due to their intractability and the absence of an efficient optimization model. Using the penalty function approach, we design a unified and almost everywhere differentiable optimization model for these complex problems and propose a tabu search-based global optimization (TSGO) algorithm for solving them. Computational results over a variety of benchmark instances show that the proposed model works very well, allowing popular local optimization methods (e.g., the quasi-Newton methods and the conjugate gradient methods) to reach high-precision solutions due to the differentiability of the model. These results further demonstrate that the proposed TSGO algorithm is very efficient and significantly outperforms several popular global optimization algorithms in the literature, improving the best-known solutions for several existing instances in a short computational time. Experimental analyses are conducted to show the influence of several key ingredients of the algorithm on computational performance.
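The sketch below illustrates, under assumed and simplified choices, how a penalty-function model can make such a problem amenable to local optimizers: the max-min pairwise distance is smoothed with a soft-min, the constraints of an annulus-shaped (multiply-connected) region become quadratic penalties, and L-BFGS is run from random restarts. The paper's actual model and the TSGO tabu-search layer are not reproduced; all constants are illustrative.

```python
# Hedged sketch of a smooth penalty model for continuous p-dispersion on an annulus.
import numpy as np
from scipy.optimize import minimize

P, R_IN, R_OUT, BETA, MU = 8, 0.4, 1.0, 50.0, 100.0    # assumed problem/penalty parameters

def objective(x):
    pts = x.reshape(P, 2)
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    d = d[np.triu_indices(P, k=1)]
    soft_min = -np.log(np.exp(-BETA * d).sum()) / BETA  # smooth surrogate of min pairwise distance
    r = np.linalg.norm(pts, axis=1)
    penalty = (np.maximum(r - R_OUT, 0.0) ** 2 +        # outside the outer circle
               np.maximum(R_IN - r, 0.0) ** 2).sum()    # inside the inner hole
    return -soft_min + MU * penalty                     # minimize = maximize dispersion

best = min((minimize(objective, np.random.uniform(-1, 1, 2 * P), method="L-BFGS-B")
            for _ in range(20)), key=lambda r: r.fun)   # crude multi-start local search
print("approximate dispersion value:", -best.fun)
```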
Analysis of geospatial data has traditionally been model-based, with a mean model, customarily specified as a linear regression on the covariates, and a covariance model, encoding the spatial dependence. We relax the strong assumption of linearity and propose embedding neural networks directly within the traditional geostatistical models to accommodate non-linear mean functions while retaining all other advantages including use of Gaussian Processes to explicitly model the spatial covariance, enabling inference on the covariate effect through the mean and on the spatial dependence through the covariance, and offering predictions at new locations via kriging. We propose NN-GLS, a new neural network estimation algorithm for the non-linear mean in GP models that explicitly accounts for the spatial covariance through generalized least squares (GLS), the same loss used in the linear case. We show that NN-GLS admits a representation as a special type of graph neural network (GNN). This connection facilitates use of standard neural network computational techniques for irregular geospatial data, enabling novel and scalable mini-batching, backpropagation, and kriging schemes. Theoretically, we show that NN-GLS will be consistent for irregularly observed spatially correlated data processes. We also provide a finite sample concentration rate, which quantifies the need to accurately model the spatial covariance in neural networks for dependent data. To our knowledge, these are the first large-sample results for any neural network algorithm for irregular spatial data. We demonstrate the methodology through simulated and real datasets.
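The central idea of replacing the ordinary least-squares loss with a GLS loss can be sketched as follows (a simplified illustration with a fixed exponential covariance; the paper's GNN reformulation, scalable mini-batching, and kriging are not shown, and all function and parameter names are assumptions).

```python
# Hedged sketch: neural-network mean trained with a GLS loss under a fixed spatial covariance.
import torch

def exponential_cov(coords, sigma2=1.0, phi=1.0, tau2=0.1):
    d = torch.cdist(coords, coords)                      # pairwise spatial distances
    return sigma2 * torch.exp(-d / phi) + tau2 * torch.eye(len(coords))

def gls_loss(residual, cov_chol):
    # (y - m(X))^T Sigma^{-1} (y - m(X)) computed via a Cholesky solve
    z = torch.cholesky_solve(residual.unsqueeze(1), cov_chol)
    return (residual.unsqueeze(1) * z).sum()

n = 200
coords = torch.rand(n, 2)
X = torch.rand(n, 5)
y = torch.sin(4 * X[:, 0]) + 0.3 * torch.randn(n)        # toy nonlinear mean, for illustration

mlp = torch.nn.Sequential(torch.nn.Linear(5, 32), torch.nn.ReLU(), torch.nn.Linear(32, 1))
chol = torch.linalg.cholesky(exponential_cov(coords))    # covariance held fixed in this sketch
opt = torch.optim.Adam(mlp.parameters(), lr=1e-2)
for _ in range(500):
    opt.zero_grad()
    loss = gls_loss(y - mlp(X).squeeze(-1), chol)
    loss.backward()
    opt.step()
```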
We present BadGD, a unified theoretical framework that exposes the vulnerabilities of gradient descent algorithms through strategic backdoor attacks. Backdoor attacks involve embedding malicious triggers into a training dataset to disrupt the model's learning process. Our framework introduces three novel constructs: Max RiskWarp Trigger, Max GradWarp Trigger, and Max GradDistWarp Trigger, each designed to exploit specific aspects of gradient descent by distorting empirical risk, deterministic gradients, and stochastic gradients respectively. We rigorously define clean and backdoored datasets and provide mathematical formulations for assessing the distortions caused by these malicious backdoor triggers. By measuring the impact of these triggers on the model training procedure, our framework bridges existing empirical findings with theoretical insights, demonstrating how a malicious party can exploit gradient descent hyperparameters to maximize attack effectiveness. In particular, we show that these exploitations can significantly alter the loss landscape and gradient calculations, leading to compromised model integrity and performance. This research underscores the severe threats posed by such data-centric attacks and highlights the urgent need for robust defenses in machine learning. BadGD sets a new standard for understanding and mitigating adversarial manipulations, ensuring the reliability and security of AI systems.
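As a purely illustrative companion (the paper's formal trigger constructions are not reproduced), the snippet below measures the two quantities such triggers are designed to inflate: the change in empirical risk and the change in the gradient when a simple additive trigger and a target label are applied to a batch; the trigger and model are hypothetical.

```python
# Hedged sketch: quantifying risk and gradient distortion caused by a backdoor trigger.
import torch

def empirical_risk(model, X, y):
    return torch.nn.functional.cross_entropy(model(X), y)

def grad_vector(model, X, y):
    g = torch.autograd.grad(empirical_risk(model, X, y), model.parameters())
    return torch.cat([t.flatten() for t in g])

model = torch.nn.Linear(20, 2)
X = torch.randn(256, 20)
y = torch.randint(0, 2, (256,))

trigger = 0.5 * torch.ones(20)                      # assumed additive trigger pattern
X_bad, y_bad = X + trigger, torch.zeros_like(y)     # triggered inputs mapped to a target label

risk_gap = (empirical_risk(model, X_bad, y_bad) - empirical_risk(model, X, y)).abs()
grad_gap = (grad_vector(model, X_bad, y_bad) - grad_vector(model, X, y)).norm()
print(f"risk distortion {risk_gap.item():.3f}, gradient distortion {grad_gap.item():.3f}")
```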
Untargeted metabolomic profiling through liquid chromatography-mass spectrometry (LC-MS) measures a vast array of metabolites within biospecimens, advancing drug development, disease diagnosis, and risk prediction. However, the low throughput of LC-MS poses a major challenge for biomarker discovery, annotation, and experimental comparison, necessitating the merging of multiple datasets. Current data pooling methods encounter practical limitations due to their vulnerability to data variations and hyperparameter dependence. Here we introduce GromovMatcher, a flexible and user-friendly algorithm that automatically combines LC-MS datasets using optimal transport. By capitalizing on feature intensity correlation structures, GromovMatcher delivers superior alignment accuracy and robustness compared to existing approaches. The algorithm scales to thousands of features while requiring minimal hyperparameter tuning. Manually curated datasets for validating alignment algorithms are limited in the field of untargeted metabolomics, and hence we develop a dataset split procedure to generate pairs of validation datasets to test the alignments produced by GromovMatcher and other methods. Applying our method to experimental patient studies of liver and pancreatic cancer, we discover shared metabolic features related to patient alcohol intake, demonstrating how GromovMatcher facilitates the search for biomarkers associated with lifestyle risk factors linked to several cancer types.
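A rough sketch of the core matching step, using the POT optimal-transport library (an assumed choice; GromovMatcher's m/z and retention-time constraints, handling of unshared features, and coupling thresholding are not reproduced): each study's features are summarized by their intensity-correlation matrix and matched with a Gromov-Wasserstein coupling.

```python
# Hedged sketch: Gromov-Wasserstein matching of LC-MS features via correlation structure.
import numpy as np
import ot

def correlation_matching(intensities1, intensities2):
    # intensities*: (samples, features) matrices of feature intensities
    C1 = np.abs(np.corrcoef(intensities1, rowvar=False))
    C2 = np.abs(np.corrcoef(intensities2, rowvar=False))
    p = np.full(C1.shape[0], 1.0 / C1.shape[0])           # uniform feature weights
    q = np.full(C2.shape[0], 1.0 / C2.shape[0])
    return ot.gromov.gromov_wasserstein(C1, C2, p, q, loss_fun="square_loss")

rng = np.random.default_rng(0)
shared = rng.normal(size=(100, 30))
study1 = shared + 0.1 * rng.normal(size=(100, 30))        # two noisy views of the same
study2 = shared[:, ::-1] + 0.1 * rng.normal(size=(100, 30))  # features, in reversed order
coupling = correlation_matching(study1, study2)
print(coupling.argmax(axis=1))   # ideally close to the reversing permutation (GW may hit a local optimum)
```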
Partial multi-task learning, where each training example is annotated for only one of the target tasks, is a promising idea in remote sensing, as it allows combining datasets annotated for different tasks and predicting more tasks with fewer network parameters. The naïve approach to partial multi-task learning is sub-optimal due to the lack of all-task annotations for learning joint representations. This paper proposes using knowledge distillation to replace the need for ground truth on the alternate task and to enhance the performance of this approach. Experiments conducted on the public ISPRS 2D Semantic Labeling Contest dataset show the effectiveness of the proposed idea for partial multi-task learning on semantic tasks, including object detection and semantic segmentation, in aerial images.
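A schematic sketch of the distillation idea (assumed architecture and loss weights, not the paper's networks): for an image annotated only for task A, the task-A head receives the supervised loss while the task-B head is trained against the soft output of a frozen single-task teacher.

```python
# Hedged sketch of partial multi-task training with knowledge distillation for the unlabeled task.
import torch
import torch.nn.functional as F

class TwoHeadNet(torch.nn.Module):
    def __init__(self, n_classes_a, n_classes_b):
        super().__init__()
        self.backbone = torch.nn.Sequential(torch.nn.Conv2d(3, 16, 3, padding=1), torch.nn.ReLU())
        self.head_a = torch.nn.Conv2d(16, n_classes_a, 1)   # e.g. segmentation logits
        self.head_b = torch.nn.Conv2d(16, n_classes_b, 1)   # stand-in for the second task

    def forward(self, x):
        f = self.backbone(x)
        return self.head_a(f), self.head_b(f)

def partial_mtl_loss(student, teacher_b, x, label_a, temperature=2.0, lam=1.0):
    # x is annotated for task A only; task B is learned from the teacher's soft output.
    out_a, out_b = student(x)
    sup = F.cross_entropy(out_a, label_a)
    with torch.no_grad():
        soft_target = F.softmax(teacher_b(x) / temperature, dim=1)
    distill = F.kl_div(F.log_softmax(out_b / temperature, dim=1),
                       soft_target, reduction="batchmean") * temperature ** 2
    return sup + lam * distill

student = TwoHeadNet(6, 6)
teacher_b = torch.nn.Sequential(torch.nn.Conv2d(3, 6, 1))   # placeholder for a pretrained teacher
x = torch.randn(2, 3, 64, 64)
label_a = torch.randint(0, 6, (2, 64, 64))
loss = partial_mtl_loss(student, teacher_b, x, label_a)
```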
Lattice structures have been widely used in applications due to their superior mechanical properties. To fabricate such structures, a geometric processing step called triangulation is often employed to transform them into the STL format before sending them to 3D printers. Because lattice structures tend to have high geometric complexity, this step usually generates a large number of triangles, making it a memory- and compute-intensive task. The problem is especially pronounced for large-scale lattice structures with millions or billions of struts. To address it, this paper proposes transforming a lattice structure into an intermediate model called a meta-mesh before the actual triangulation. Compared to triangular meshes, meta-meshes are very lightweight and far less compute-demanding. The meta-mesh can also serve as a base mesh that can be reused to conveniently and efficiently triangulate lattice structures at arbitrary resolutions. A CPU+GPU asynchronous meta-meshing pipeline has been developed to efficiently generate meta-meshes from lattice structures. It shifts from the thread-centric GPU algorithm design paradigm commonly used in CAD to the recent warp-centric design paradigm to achieve high performance. This is achieved by a new data compression method, a GPU cache-aware data structure, and a workload-balanced scheduling method that together significantly reduce memory divergence and branch divergence. In experiments with various billion-scale lattice structures, the proposed method proves to be two orders of magnitude faster than previously achievable.