An arc-search interior-point method is a type of interior-point method that approximates the central path with an ellipsoidal arc, which can often reduce the number of iterations. In this work, to further reduce the number of iterations and the computation time for solving linear programming problems, we propose two arc-search interior-point methods that use Nesterov's restarting strategy, a well-known technique for accelerating gradient methods with a momentum term. The first method generates a sequence of iterates within a neighborhood of the central path; we prove that the generated sequence converges to an optimal solution and that the computational complexity is polynomial. The second method incorporates the concept of the Mehrotra-type interior-point method to improve numerical stability. Numerical experiments demonstrate that the second method reduces both the number of iterations and the computational time. In particular, due to the momentum term, the average number of iterations was reduced by 6% compared with an existing arc-search interior-point method.
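For intuition, the momentum-with-restart mechanism referenced above can be sketched in a generic first-order setting. The snippet below is not the proposed interior-point method; it is a minimal illustration of Nesterov's extrapolation step and a function-value restart heuristic on a toy least-squares problem, with all problem data invented for the example.

```python
import numpy as np

# Toy quadratic f(x) = 0.5*||A x - b||^2; gradient is A^T (A x - b).
rng = np.random.default_rng(0)
A = rng.standard_normal((80, 50))
b = rng.standard_normal(80)
grad = lambda x: A.T @ (A @ x - b)
f = lambda x: 0.5 * np.sum((A @ x - b) ** 2)

step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1/L, with L the gradient's Lipschitz constant
x = x_prev = np.zeros(50)
t = 1.0

for k in range(2000):
    # Nesterov extrapolation: a look-ahead point built from a momentum term.
    t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
    y = x + ((t - 1.0) / t_next) * (x - x_prev)
    x_prev, x = x, y - step * grad(y)
    t = t_next
    # Function-value restart: if progress stalls, drop the momentum.
    if f(x) > f(x_prev):
        t = 1.0

print(f"final objective: {f(x):.3e}")
```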
We present a comprehensive computational study of a class of linear system solvers, the {\it Triangle Algorithm} (TA) and the {\it Centering Triangle Algorithm} (CTA), developed by Kalantari \cite{kalantari23}. The algorithms compute an approximate solution or minimum-norm solution to $Ax=b$ or $A^TAx=A^Tb$, where $A$ is an $m \times n$ real matrix of arbitrary rank, and take specialized forms when $A$ is symmetric positive semi-definite. Based on the description and theoretical properties of TA and CTA from \cite{kalantari23}, we give an implementation of the algorithms that is easy to use for practitioners, versatile for a wide range of problems, and robust in that it places no constraints on $A$. Next, we compare our implementation with the Matlab implementations of two state-of-the-art algorithms, GMRES and ``lsqminnorm''. We consider square and rectangular matrices, for $m$ up to $10000$ and $n$ up to $1000000$, encompassing a variety of applications. The results indicate that our implementation outperforms GMRES and ``lsqminnorm'' both in runtime and in the quality of residuals. Moreover, the relative residuals of CTA decrease considerably faster and more consistently than those of GMRES, and our implementation attains high-precision approximations in less time than it takes GMRES to report lack of convergence. With respect to ``lsqminnorm'', our implementation runs faster and produces better solutions. Additionally, we present a theoretical study of the dynamics of the residual iterates of CTA and complement it with revealing visualizations. Lastly, we extend TA to LP feasibility problems, handling non-negativity constraints. Computational results show that our implementation for this extension is on par with those for TA and CTA, suggesting applicability in linear programming and related problems.
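For readers who want to reproduce the flavor of these comparisons, the harness below computes relative residuals $\|Ax-b\|/\|b\|$ with off-the-shelf solvers (SciPy's GMRES and NumPy's minimum-norm least squares). The TA/CTA iterations themselves are specified in \cite{kalantari23} and are not reproduced here; the matrix and all solver settings are illustrative assumptions.

```python
import numpy as np
from scipy.sparse.linalg import gmres

rng = np.random.default_rng(1)
m = n = 500
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

# GMRES on the square system Ax = b.
x_gmres, info = gmres(A, b, restart=50, maxiter=200)

# Minimum-norm least-squares solution (works for any rank of A).
x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)

for name, x in [("gmres", x_gmres), ("lstsq", x_ls)]:
    rel = np.linalg.norm(A @ x - b) / np.linalg.norm(b)
    print(f"{name:6s} relative residual: {rel:.3e}")
```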
New advancements in the detection of synthetic images are critical for fighting disinformation, as the capabilities of generative AI models continuously evolve and can produce hyper-realistic synthetic imagery at unprecedented scale and speed. In this paper, we focus on the challenge of generalizing across different concept classes, e.g., training a detector on human faces and testing it on synthetic animal images, a setting that highlights the ineffectiveness of existing approaches that randomly sample generated images to train their models. By contrast, we propose an approach based on the premise that the robustness of the detector can be enhanced by training it on realistic synthetic images that are selected based on their quality scores according to a probabilistic quality estimation model. We demonstrate the effectiveness of the proposed approach by conducting experiments with generated images from two seminal architectures, StyleGAN2 and Latent Diffusion, using three different concepts for each, so as to measure the cross-concept generalization ability. Our results show that our quality-based sampling method leads to higher detection performance for nearly all concepts, improving the overall effectiveness of the synthetic image detectors.
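The selection step described above reduces, schematically, to scoring every generated image and keeping only the top-scoring fraction for detector training. In the sketch below, `quality_model` and `keep_fraction` are hypothetical stand-ins; the paper's actual estimator is a probabilistic quality model.

```python
import numpy as np

def select_by_quality(images, quality_model, keep_fraction=0.5):
    """Keep the highest-quality synthetic images for detector training.

    `quality_model` is a hypothetical callable mapping an image to a
    scalar quality score (the paper uses a probabilistic estimator).
    """
    scores = np.array([quality_model(img) for img in images])
    k = max(1, int(keep_fraction * len(images)))
    top = np.argsort(scores)[-k:]          # indices of the k best-scoring images
    return [images[i] for i in top]

# Toy usage: images are arrays, and quality is a stand-in statistic.
rng = np.random.default_rng(0)
fake_images = [rng.standard_normal((8, 8)) for _ in range(100)]
selected = select_by_quality(fake_images, quality_model=lambda im: -np.abs(im).mean())
print(len(selected), "images kept for training")
```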
Quadratic minimization problems with orthogonality constraints (QMPO) play an important role in many applications in science and engineering. However, some existing methods may suffer from low accuracy or a heavy computational burden for large-scale QMPO. Krylov subspace methods are popular for large-scale optimization problems. In this work, we propose a block Lanczos method for solving large-scale QMPO. In the proposed method, the original problem is projected onto a small-sized one, and the Riemannian Trust-Region method is employed to solve the reduced QMPO. Convergence results on the optimal solution, the optimal objective function value, the multiplier, and the KKT error are established. Moreover, we give the convergence speed of the optimal solution, and show that if the block Lanczos process terminates, then an exact KKT solution is obtained. Numerical experiments illustrate the numerical behavior of the proposed algorithm and demonstrate that it is more powerful than many state-of-the-art algorithms for large-scale quadratic minimization problems with orthogonality constraints.
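For context, the projection step in a block Lanczos method builds an orthonormal basis of a block Krylov subspace, after which a small reduced problem is solved in that basis. The code below is a generic textbook block Lanczos recurrence for a symmetric matrix, not the paper's specialized QMPO solver; full reorthogonalization is added for numerical robustness.

```python
import numpy as np

def block_lanczos(A, V1, num_blocks):
    """Orthonormal basis of the block Krylov subspace span{V1, A V1, A^2 V1, ...}.

    A is symmetric (n x n); V1 is an n x p block with orthonormal columns.
    Returns Q with orthonormal columns spanning the subspace.
    """
    n, p = V1.shape
    Q = [V1]
    V_prev = np.zeros((n, p))
    B_prev = np.zeros((p, p))
    for _ in range(num_blocks - 1):
        W = A @ Q[-1] - V_prev @ B_prev.T    # three-term block recurrence
        M = Q[-1].T @ W                      # block-diagonal coefficient
        W = W - Q[-1] @ M
        for Qi in Q:                         # full reorthogonalization
            W = W - Qi @ (Qi.T @ W)
        V_next, B_next = np.linalg.qr(W)
        V_prev, B_prev = Q[-1], B_next
        Q.append(V_next)
    return np.hstack(Q)

rng = np.random.default_rng(0)
A = rng.standard_normal((200, 200)); A = (A + A.T) / 2
V1, _ = np.linalg.qr(rng.standard_normal((200, 4)))
Q = block_lanczos(A, V1, num_blocks=10)
print("orthonormality error:", np.linalg.norm(Q.T @ Q - np.eye(Q.shape[1])))
```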
The Teacher-Student Framework (TSF) is a reinforcement learning setting in which a teacher agent guards the training of a student agent by intervening and providing online demonstrations. If the teacher policy is assumed optimal, it has the perfect timing and capability to intervene in the learning process of the student agent, providing a safety guarantee and exploration guidance. Nevertheless, in many real-world settings it is expensive or even impossible to obtain a well-performing teacher policy. In this work, we relax the assumption of a well-performing teacher and develop a new method that can incorporate arbitrary teacher policies with modest or inferior performance. We instantiate an off-policy reinforcement learning algorithm, termed Teacher-Student Shared Control (TS2C), which incorporates teacher intervention based on trajectory-based value estimation. Theoretical analysis shows that the proposed TS2C algorithm attains efficient exploration and a substantial safety guarantee without being affected by the teacher's own performance. Experiments on various continuous control tasks show that our method can exploit teacher policies at different performance levels while maintaining a low training cost. Moreover, the student policy surpasses the imperfect teacher policy in terms of accumulated reward in held-out testing environments. Code is available at //metadriverse.github.io/TS2C.
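The intervention rule can be pictured as a value-based switch: the teacher acts only when its value estimate of the student's proposed action is sufficiently worse than that of its own. The sketch below is schematic; `student`, `teacher`, `value_fn`, and `margin` are hypothetical placeholders, and the actual TS2C criterion uses trajectory-based value estimation.

```python
def shared_control_step(state, student, teacher, value_fn, margin=0.0):
    """Choose which policy acts, based on a value comparison.

    `student(state)` / `teacher(state)` return actions; `value_fn(state, action)`
    is a hypothetical Q-style estimate of long-term return. The teacher
    intervenes only when the student's action looks sufficiently worse.
    """
    a_student = student(state)
    a_teacher = teacher(state)
    if value_fn(state, a_student) < value_fn(state, a_teacher) - margin:
        return a_teacher, True   # teacher intervenes
    return a_student, False      # student keeps control
```

In this schematic, `margin` trades off how conservative the intervention is; larger values let the student explore more before the teacher takes over.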
In this paper, we introduce Optimal Classification Forests, a new family of classifiers that takes advantage of an optimal ensemble of decision trees to derive accurate and interpretable classifiers. We propose a novel mathematical optimization-based methodology in which a given number of trees are constructed simultaneously, each of them providing a predicted class for the observations in the feature space. The classification rule is derived by assigning to each observation its most frequently predicted class among the trees in the forest. We provide a mixed integer linear programming formulation for the problem. We report the results of our computational experiments, from which we conclude that our proposed method has equal or superior performance compared with state-of-the-art tree-based classification methods. More importantly, it achieves high prediction accuracy with, for example, orders of magnitude fewer trees than random forests. We also present three real-world case studies showing that our methodology yields markedly more interpretable classifiers.
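The classification rule itself is easy to state independently of the MILP used to construct the trees: each tree votes a class and the forest outputs the most frequent vote. A minimal sketch with stand-in stump classifiers:

```python
from collections import Counter

def forest_predict(x, trees):
    """Majority vote over a fixed ensemble of trees.

    Each element of `trees` is a callable mapping a feature vector to a
    predicted class label; ties are broken by the first-seen class.
    """
    votes = Counter(tree(x) for tree in trees)
    return votes.most_common(1)[0][0]

# Toy usage with three stump-like classifiers on a 2D point.
trees = [lambda x: int(x[0] > 0), lambda x: int(x[1] > 0), lambda x: 1]
print(forest_predict([0.5, -2.0], trees))  # -> 1 (two of three trees vote 1)
```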
High-order implicit shock tracking (fitting) is a class of high-order, optimization-based numerical methods that approximate solutions of conservation laws with non-smooth features by aligning elements of the computational mesh with those features. This ensures that the non-smooth features are represented perfectly by inter-element jumps and that high-order basis functions approximate the smooth regions of the solution without nonlinear stabilization, which leads to accurate approximations on traditionally coarse meshes. In this work, we introduce a robust implicit shock tracking framework specialized for problems with parameter-dependent lead shocks (i.e., shocks separating a farfield condition from the downstream flow), which commonly arise in high-speed aerodynamics and astrophysics applications. After a shock-aligned mesh is produced at one parameter configuration, all elements upstream of the lead shock are removed, and the nodes on the lead shock are repositioned for new parameter configurations using the implicit shock tracking solver. The proposed framework can be used for most many-query applications involving parametrized lead shocks, such as optimization, uncertainty quantification, parameter sweeps, "what-if" scenarios, and parameter-based continuation. We demonstrate the robustness and flexibility of the framework using a one-dimensional space-time Riemann problem as well as two- and three-dimensional supersonic and hypersonic benchmark problems.
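Schematically, the framework amortizes one expensive shock-aligned solve across a family of parameter configurations. The skeleton below only captures that control flow; `solve_aligned`, `trim_upstream`, and `reposition_shock` are hypothetical placeholders for the mesh generation, element removal, and implicit shock tracking steps described above.

```python
def sweep_lead_shock(base_param, new_params, solve_aligned, trim_upstream,
                     reposition_shock):
    """Many-query loop for parametrized lead shocks (schematic sketch).

    `solve_aligned` produces a shock-aligned mesh and solution at one parameter;
    `trim_upstream` removes elements upstream of the lead shock;
    `reposition_shock` re-solves at a new parameter, moving the lead-shock
    nodes with the shock tracking solver. All three are placeholders.
    """
    mesh, solution = solve_aligned(base_param)            # expensive, done once
    mesh = trim_upstream(mesh, solution)
    results = {}
    for p in new_params:
        mesh, solution = reposition_shock(mesh, solution, p)  # cheaper re-solve
        results[p] = solution
    return results
```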
Face clustering can provide pseudo-labels for massive unlabeled face data and improve the performance of different face recognition models. Existing clustering methods generally aggregate features within subgraphs that are often built from a uniform threshold or a learned cutoff position; this may reduce the recall of the subgraphs and hence degrade the clustering performance. This work proposes an efficient neighborhood-aware subgraph adjustment method that can significantly reduce noise and improve the recall of the subgraphs, and hence drive distant nodes to converge towards the same centers. More specifically, the proposed method consists of two components: face embedding enhancement using the embeddings of neighbors, and enclosed subgraph construction of node pairs for structural information extraction. The embeddings are combined to predict linkage probabilities for all node pairs, replacing cosine similarities, to produce new subgraphs that can further be used for aggregation in GCNs or other clustering methods. The proposed method is validated through extensive experiments against a range of clustering solutions on three benchmark datasets, and the numerical results confirm that it outperforms SOTA solutions in terms of generalization capability.
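The two components can be sketched compactly: embeddings are smoothed with their k nearest neighbors, and a pairwise linkage score replaces raw cosine similarity when building subgraphs. Below, the linkage predictor is a logistic stand-in rather than the paper's trained model, and the value of k is an arbitrary choice.

```python
import numpy as np

def enhance_embeddings(X, k=5):
    """Average each face embedding with its k nearest neighbors (by cosine)."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    sim = Xn @ Xn.T
    np.fill_diagonal(sim, -np.inf)           # exclude self-similarity
    nbrs = np.argsort(sim, axis=1)[:, -k:]   # indices of k nearest neighbors
    enhanced = (X + X[nbrs].mean(axis=1)) / 2.0
    return enhanced / np.linalg.norm(enhanced, axis=1, keepdims=True)

def linkage_probability(ei, ej, w=8.0, b=-4.0):
    """Placeholder pairwise linkage score replacing raw cosine similarity."""
    return 1.0 / (1.0 + np.exp(-(w * float(ei @ ej) + b)))

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 128))
E = enhance_embeddings(X)
print(f"linkage prob of pair (0, 1): {linkage_probability(E[0], E[1]):.3f}")
```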
Classic algorithms and machine learning systems such as neural networks are both abundant in everyday life. While classic computer science algorithms are suitable for the precise execution of exactly defined tasks, such as finding the shortest path in a large graph, neural networks allow learning from data to predict the most likely answer in more complex tasks, such as image classification, which cannot be reduced to an exact algorithm. To get the best of both worlds, this thesis explores combining both concepts, leading to more robust, better-performing, more interpretable, more computationally efficient, and more data-efficient architectures. The thesis formalizes the idea of algorithmic supervision, which allows a neural network to learn from or in conjunction with an algorithm. When integrating an algorithm into a neural architecture, it is important that the algorithm is differentiable, so that the architecture can be trained end-to-end and gradients can be propagated back through the algorithm in a meaningful way. To make algorithms differentiable, this thesis proposes a general method for continuously relaxing algorithms by perturbing variables and approximating the expectation value in closed form, i.e., without sampling. In addition, this thesis proposes differentiable algorithms such as differentiable sorting networks, differentiable renderers, and differentiable logic gate networks. Finally, it presents alternative training strategies for learning with algorithms.
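A one-line instance of this relaxation principle: perturbing the input of the Heaviside step with logistic noise and taking the expectation in closed form yields the sigmoid, a smooth surrogate obtained without sampling. A minimal illustration (the noise scale `beta` is an arbitrary choice):

```python
import numpy as np

def heaviside(x):
    return (x >= 0).astype(float)        # the non-differentiable hard step

def relaxed_heaviside(x, beta=10.0):
    """Closed-form expectation of heaviside(x + eps) for logistic noise eps.

    If eps ~ Logistic(0, 1/beta), then E[heaviside(x + eps)] = sigmoid(beta*x),
    which is smooth in x, so gradients can flow through it.
    """
    return 1.0 / (1.0 + np.exp(-beta * x))

x = np.linspace(-1, 1, 5)
print(heaviside(x))          # hard step:   [0. 0. 1. 1. 1.]
print(relaxed_heaviside(x))  # smooth, differentiable approximation of the same step
```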
When is heterogeneity in the composition of an autonomous robotic team beneficial and when is it detrimental? We investigate and answer this question in the context of a minimally viable model that examines the role of heterogeneous speeds in perimeter defense problems, where defenders share a total allocated speed budget. We consider two distinct problem settings and develop strategies based on dynamic programming and on local interaction rules. We present a theoretical analysis of both approaches and our results are extensively validated using simulations. Interestingly, our results demonstrate that the viability of heterogeneous teams depends on the amount of information available to the defenders. Moreover, our results suggest a universality property: across a wide range of problem parameters the optimal ratio of the speeds of the defenders remains nearly constant.
The prevalence of networked sensors and actuators in many real-world systems, such as smart buildings, factories, power plants, and data centers, generates substantial amounts of multivariate time series data for these systems. The rich sensor data can be continuously monitored for intrusion events through anomaly detection. However, conventional threshold-based anomaly detection methods are inadequate due to the dynamic complexities of these systems, while supervised machine learning methods are unable to exploit the large amounts of data due to the lack of labels. On the other hand, current unsupervised machine learning approaches have not fully exploited the spatial-temporal correlations and other dependencies amongst the multiple variables (sensors/actuators) in the system for detecting anomalies. In this work, we propose an unsupervised multivariate anomaly detection method based on Generative Adversarial Networks (GANs). Instead of treating each data stream independently, our proposed MAD-GAN framework considers the entire variable set concurrently to capture the latent interactions amongst the variables. We also fully exploit both the generator and the discriminator produced by the GAN, using a novel anomaly score called the DR-score to detect anomalies through discrimination and reconstruction. We have tested the proposed MAD-GAN on two recent datasets collected from real-world CPS testbeds: the Secure Water Treatment (SWaT) and the Water Distribution (WADI) datasets. Our experimental results show that MAD-GAN is effective in reporting anomalies caused by various cyber-intrusions in these complex real-world systems.
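The anomaly score combines the two GAN components: a reconstruction term measuring how well the generator can explain a sample, and a discrimination term from the discriminator's output. The sketch below is a simplified stand-in for the DR-score; in particular, the random-search latent inversion and the convex weighting `lam` are illustrative assumptions.

```python
import numpy as np

def dr_score(x, generator, discriminator, latent_dim, lam=0.5, n_restarts=16,
             rng=None):
    """Discrimination-reconstruction anomaly score (schematic sketch).

    Reconstruction term: best residual over random latent candidates
    (a crude stand-in for proper latent-space inversion).
    Discrimination term: how "fake" the discriminator finds x.
    """
    rng = rng or np.random.default_rng(0)
    zs = rng.standard_normal((n_restarts, latent_dim))
    recon = min(np.linalg.norm(x - generator(z)) for z in zs)
    disc = 1.0 - discriminator(x)            # high when x looks anomalous
    return lam * recon + (1.0 - lam) * disc

# Toy usage with linear stand-ins for the trained GAN components.
gen = lambda z: np.tanh(z[:8])
disc = lambda x: float(1.0 / (1.0 + np.exp(np.linalg.norm(x) - 2.0)))
print(f"score: {dr_score(np.ones(8), gen, disc, latent_dim=8):.3f}")
```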