Differential flatness enables efficient planning and control for underactuated robotic systems, but we lack a systematic and practical means of identifying a flat output (or determining whether one exists) for an arbitrary robotic system. In this work, we leverage recent results elucidating the role of symmetry in constructing flat outputs for free-flying robotic systems. Using the tools of Riemannian geometry, Lie group theory, and differential forms, we cast the search for a globally valid, equivariant flat output as an optimization problem. An approximate transcription of this continuum formulation to a quadratic program is performed, and its solutions for two example systems achieve precise agreement with the known closed-form flat outputs. Our results point towards a systematic, automated approach to numerically identifying geometric flat outputs directly from the system model, which is particularly useful when complexity renders pen-and-paper analysis intractable.
Online computation models uncertainty in settings where not all information about a problem instance is known in advance. An online algorithm receives requests that reveal the instance piece by piece and has to respond with irrevocable decisions. Often, an adversary is assumed that constructs the instance with knowledge of the algorithm's deterministic behavior. Thus, the adversary is able to tailor the input to any online algorithm. From a game-theoretic point of view, the adversary and the online algorithm are players in an asymmetric two-player game. To overcome this asymmetry, the online algorithm is equipped with an isomorphic copy of the graph, which is referred to as an unlabeled map. Applying this game-theoretic perspective to online graph problems whose solutions are subsets of the vertices, we analyze the complexity of these online vertex subset games. For this, we introduce a framework for reducing online vertex subset games from TQBF. This framework is based on gadget reductions from 3-SATISFIABILITY to the corresponding offline problem. We further identify a set of rules for extending the 3-SATISFIABILITY reduction and provide schemes for additional gadgets that ensure these rules are fulfilled. By extending the gadget reduction of the vertex subset problem with these additional gadgets, we obtain a reduction for the corresponding online vertex subset game. Finally, we provide example reductions for online vertex subset games based on VERTEX COVER, INDEPENDENT SET, and DOMINATING SET, proving that they are PSPACE-complete. Thus, this paper establishes that the online-with-a-map versions of NP-complete vertex subset problems form a large class of PSPACE-complete problems.
Quantifying treatment effect heterogeneity is a crucial task in many areas of causal inference, e.g., optimal treatment allocation and estimation of subgroup effects. We study the problem of estimating the level sets of the conditional average treatment effect (CATE), identified under the no-unmeasured-confounders assumption. Given a user-specified threshold, the goal is to estimate the set of all units for whom the treatment effect exceeds that threshold. For example, if the cutoff is zero, the estimand is the set of all units who would benefit from receiving treatment. Assigning treatment just to this set represents the optimal treatment rule that maximises the mean population outcome. Similarly, cutoffs greater than zero represent optimal rules under resource constraints. The level set estimator that we study follows the plug-in principle and consists of simply thresholding a good estimator of the CATE. While many CATE estimators have been recently proposed and analysed, how their properties relate to those of the corresponding level set estimators remains unclear. Our first goal is thus to fill this gap by deriving the asymptotic properties of level set estimators depending on which estimator of the CATE is used. Next, we identify a minimax optimal estimator in a model where the CATE, the propensity score and the outcome model are Hölder-smooth of varying orders. We consider data generating processes that satisfy a margin condition governing the probability of observing units for whom the CATE is close to the threshold. We investigate the performance of the estimators in simulations and illustrate our methods on a dataset used to study the effects on mortality of laparoscopic vs. open surgery in the treatment of various conditions of the colon.
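As a minimal illustration of the plug-in principle mentioned above, the sketch below simply thresholds an array of CATE estimates at a user-specified cutoff; the variable names and example values are placeholders of ours, not the specific estimators analysed in the paper.

```python
import numpy as np

def plug_in_level_set(tau_hat: np.ndarray, threshold: float = 0.0) -> np.ndarray:
    """Plug-in level-set estimate: indices of units whose estimated
    conditional average treatment effect (CATE) exceeds the threshold."""
    return np.flatnonzero(tau_hat > threshold)

# Hypothetical usage: tau_hat would hold CATE predictions from any estimator
# fit to the data; a threshold of zero recovers the set of units estimated to
# benefit from treatment, and positive thresholds model resource constraints.
tau_hat = np.array([0.30, -0.10, 0.05, 0.70])
print(plug_in_level_set(tau_hat, threshold=0.0))  # -> [0 2 3]
```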
A good automatic evaluation metric for language generation ideally correlates highly with human judgements of text quality. Yet, there is a dearth of such metrics, which inhibits the rapid and efficient progress of language generators. One exception is the recently proposed Mauve. In theory, Mauve measures an information-theoretic divergence between two probability distributions over strings: one representing the language generator under evaluation; the other representing the true natural language distribution. Mauve's authors argue that its success comes from the qualitative properties of their proposed divergence. Yet in practice, as this divergence is uncomputable, Mauve approximates it by measuring the divergence between multinomial distributions over clusters instead, where cluster assignments are attained by grouping strings based on a pre-trained language model's embeddings. As we show, however, this is not a tight approximation -- in either theory or practice. This raises the question: why does Mauve work so well? In this work, we show that Mauve was right for the wrong reasons, and that its newly proposed divergence is not necessary for its high performance. In fact, classical divergences paired with its proposed cluster-based approximation may actually serve as better evaluation metrics. We finish the paper with a probing analysis; this analysis leads us to conclude that -- by encoding syntactic- and coherence-level features of text, while ignoring surface-level features -- such cluster-based substitutes to string distributions may simply be better for evaluating state-of-the-art language generators.
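For intuition, the following is a minimal sketch (assuming hypothetical embeddings and a plain KL divergence, not the reference Mauve implementation) of the cluster-based approximation described above: embed generated and human-written strings with a pre-trained language model, cluster the pooled embeddings, and compare the resulting multinomial distributions over clusters.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_histograms(model_emb, human_emb, k=8, seed=0):
    """Cluster pooled embeddings and return the two multinomial
    distributions over clusters (generated text vs. human text)."""
    pooled = np.vstack([model_emb, human_emb])
    labels = KMeans(n_clusters=k, random_state=seed, n_init=10).fit_predict(pooled)
    p = np.bincount(labels[: len(model_emb)], minlength=k) + 1e-8  # smoothing
    q = np.bincount(labels[len(model_emb):], minlength=k) + 1e-8
    return p / p.sum(), q / q.sum()

def kl_divergence(p, q):
    """A classical divergence between the two cluster histograms."""
    return float(np.sum(p * np.log(p / q)))

# Hypothetical usage: in practice the embeddings would come from a
# pre-trained language model's encoder rather than random noise.
rng = np.random.default_rng(0)
p, q = cluster_histograms(rng.normal(size=(200, 32)), rng.normal(size=(200, 32)))
print(kl_divergence(p, q))
```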
Singularly perturbed problems present inherent difficulty due to the presence of a thin boundary layer in their solutions. To overcome this difficulty, we propose using deep operator networks (DeepONets), a method previously shown to be effective in approximating nonlinear operators between infinite-dimensional Banach spaces. In this paper, we demonstrate for the first time the application of DeepONets to one-dimensional singularly perturbed problems, achieving promising results that suggest their potential as a robust tool for solving this class of problems. We consider the convergence rate of the approximation error incurred by the operator networks in approximating the solution operator, and examine the generalization gap and empirical risk, all of which are shown to converge uniformly with respect to the perturbation parameter. By utilizing Shishkin mesh points as the evaluation locations of the loss function, we conduct several numerical experiments that provide further support for the effectiveness of operator networks in capturing the singular boundary layer behavior.
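For readers unfamiliar with Shishkin meshes, the sketch below constructs the standard piecewise-uniform mesh for a problem with a single boundary layer at x = 1, using the usual transition parameter τ = min(1/2, σ ε ln N); the constant σ and the layer location are assumptions for illustration rather than the exact construction used in the paper.

```python
import numpy as np

def shishkin_mesh(N: int, eps: float, sigma: float = 2.0) -> np.ndarray:
    """Piecewise-uniform Shishkin mesh on [0, 1] with N subintervals (N even),
    refined near an assumed boundary layer at x = 1."""
    tau = min(0.5, sigma * eps * np.log(N))            # transition point
    coarse = np.linspace(0.0, 1.0 - tau, N // 2 + 1)   # uniform outside the layer
    fine = np.linspace(1.0 - tau, 1.0, N // 2 + 1)     # uniform inside the layer
    return np.concatenate([coarse, fine[1:]])

# Hypothetical usage: these points could serve as the locations at which
# the operator network's loss is evaluated.
print(shishkin_mesh(N=8, eps=1e-3))
```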
Topology optimization is a powerful tool utilized in various fields for structural design. However, its application has primarily been restricted to static or passively moving objects, mainly focusing on hard materials with limited deformations and contact capabilities. Designing soft and actively moving objects, such as soft robots equipped with actuators, poses challenges due to simulating dynamics problems involving large deformations and intricate contact interactions. Moreover, the optimal structure depends on the object's motion, necessitating a simultaneous design approach. To address these challenges, we propose "4D topology optimization," an extension of density-based topology optimization that incorporates the time dimension. This enables the simultaneous optimization of both the structure and self-actuation of soft bodies for specific dynamic tasks. Our method utilizes multi-indexed and hierarchized density variables distributed over the spatiotemporal design domain, representing the material layout, actuator layout, and time-varying actuation. These variables are efficiently optimized using gradient-based methods. Forward and backward simulations of soft bodies are performed using the material point method, a Lagrangian-Eulerian hybrid approach, implemented in a recent automatic differentiation framework. We present several numerical examples of self-actuating soft body designs aimed at achieving locomotion, posture control, and rotation tasks. The results demonstrate the effectiveness of our method in successfully designing soft bodies with complex structures and biomimetic movements, benefiting from its high degree of design freedom.
The vulnerability of deep neural network models to adversarial example attacks is a practical challenge in many artificial intelligence applications. A recent line of work shows that the use of randomization in adversarial training is the key to finding optimal strategies against adversarial example attacks. However, in a fully randomized setting where both the defender and the attacker can use randomized strategies, there is no efficient algorithm for finding such an optimal strategy. To fill the gap, we propose the first algorithm of its kind, called FRAT, which models the problem with a new infinite-dimensional continuous-time flow on probability distribution spaces. FRAT maintains a lightweight mixture of models for the defender, with flexibility to efficiently update mixing weights and model parameters at each iteration. Furthermore, FRAT utilizes lightweight sampling subroutines to construct a random strategy for the attacker. We prove that the continuous-time limit of FRAT converges to a mixed Nash equilibrium in a zero-sum game formed by a defender and an attacker. Experimental results also demonstrate the efficiency of FRAT on CIFAR-10 and CIFAR-100 datasets.
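In notation introduced here purely for illustration (not the paper's own), the zero-sum game over randomized strategies can be written as a saddle-point problem over probability distributions, with a mixed Nash equilibrium defined as usual:

```latex
% Illustrative formulation; p ranges over distributions of defender parameters,
% q over distributions of attacker perturbations, and \ell is the adversarial loss.
L(p, q) \;:=\; \mathbb{E}_{\theta \sim p,\, \delta \sim q}\big[\ell(\theta, \delta)\big],
\qquad
\min_{p \in \mathcal{P}(\Theta)} \; \max_{q \in \mathcal{P}(\Delta)} \; L(p, q).
% A mixed Nash equilibrium (p^\star, q^\star) satisfies, for all p and q,
% L(p^\star, q) \le L(p^\star, q^\star) \le L(p, q^\star).
```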
A lattice quantizer approximates an arbitrary real-valued source vector with a vector taken from a specific discrete lattice. The quantization error is the difference between the source vector and the lattice vector. In a classic 1996 paper, Zamir and Feder show that the globally optimal lattice quantizer (which minimizes the mean square error) has white quantization error: for a uniformly distributed source, the covariance of the error is the identity matrix, multiplied by a positive real factor. We generalize the theorem, showing that the same property holds (i) for any lattice whose mean square error cannot be decreased by a small perturbation of the generator matrix, and (ii) for an optimal product of lattices that are themselves locally optimal in the sense of (i). We derive an upper bound on the normalized second moment (NSM) of the optimal lattice in any dimension, by proving that any lower- or upper-triangular modification to the generator matrix of a product lattice reduces the NSM. Using these tools and employing the best currently known lattice quantizers to build product lattices, we construct improved lattice quantizers in dimensions 13 to 15, 17 to 23, and 25 to 48. In some dimensions, these are the first reported lattices with normalized second moments below the best known upper bound.
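For reference, the normalized second moment and the whiteness property discussed above can be written as follows, where Λ is an n-dimensional lattice with Voronoi cell 𝒱 of volume V; the notation is ours rather than necessarily the paper's.

```latex
% Normalized second moment (NSM) of an n-dimensional lattice \Lambda
% with Voronoi cell \mathcal{V} of volume V (standard definition):
G(\Lambda) \;=\; \frac{1}{n\, V^{1 + 2/n}} \int_{\mathcal{V}} \lVert \mathbf{x} \rVert^{2} \, d\mathbf{x}.
% White quantization error for an optimal lattice (Zamir--Feder):
\mathbb{E}\big[\mathbf{e}\,\mathbf{e}^{\mathsf{T}}\big] \;=\; \sigma^{2} I_{n}
\quad \text{for some } \sigma^{2} > 0.
```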
Modern advancements in large-scale machine learning would be impossible without the paradigm of data-parallel distributed computing. Since distributed computing with large-scale models puts excessive pressure on communication channels, significant recent research has been directed toward co-designing communication compression strategies and training algorithms with the goal of reducing communication costs. While pure data parallelism allows better data scaling, it suffers from poor model scaling properties. Indeed, compute nodes are severely limited by memory constraints, preventing further increases in model size. For this reason, the latest achievements in training giant neural network models also rely on some form of model parallelism. In this work, we take a closer theoretical look at Independent Subnetwork Training (IST), which is a recently proposed and highly effective technique for solving the aforementioned problems. We identify fundamental differences between IST and alternative approaches, such as distributed methods with compressed communication, and provide a precise analysis of its optimization performance on a quadratic model.
Interpretability methods are developed to understand the working mechanisms of black-box models, which is crucial to their responsible deployment. Fulfilling this goal requires both that the explanations generated by these methods are correct and that people can easily and reliably understand them. While the former has been addressed in prior work, the latter is often overlooked, resulting in informal model understanding derived from a handful of local explanations. In this paper, we introduce explanation summary (ExSum), a mathematical framework for quantifying model understanding, and propose metrics for its quality assessment. On two domains, ExSum highlights various limitations in current practice, helps develop accurate model understanding, and reveals easily overlooked properties of the model. We also connect understandability to other properties of explanations such as human alignment, robustness, and counterfactual minimality and plausibility.
Recent advances in 3D fully convolutional networks (FCN) have made it feasible to produce dense voxel-wise predictions of volumetric images. In this work, we show that a multi-class 3D FCN trained on manually labeled CT scans of several anatomical structures (ranging from the large organs to thin vessels) can achieve competitive segmentation results, while avoiding the need for handcrafting features or training class-specific models. To this end, we propose a two-stage, coarse-to-fine approach that first uses a 3D FCN to roughly define a candidate region, which is then used as input to a second 3D FCN. This reduces the number of voxels the second FCN has to classify to ~10% and allows it to focus on more detailed segmentation of the organs and vessels. We utilize training and validation sets consisting of 331 clinical CT images and test our models on a completely unseen data collection acquired at a different hospital that includes 150 CT scans, targeting three anatomical organs (liver, spleen, and pancreas). In challenging organs such as the pancreas, our cascaded approach improves the mean Dice score from 68.5% to 82.2%, achieving the highest reported average score on this dataset. We compare with a 2D FCN method on a separate dataset of 240 CT scans with 18 classes and achieve a significantly higher performance in small organs and vessels. Furthermore, we explore fine-tuning our models to different datasets. Our experiments illustrate the promise and robustness of current 3D FCN based semantic segmentation of medical images, achieving state-of-the-art results. Our code and trained models are available for download: //github.com/holgerroth/3Dunet_abdomen_cascade.
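The coarse-to-fine cascade described above can be summarized by the sketch below; the two network calls and the margin value are hypothetical placeholders rather than the released models, and the linked repository should be consulted for the actual pipeline.

```python
import numpy as np

def cascade_segment(volume, stage1_fcn, stage2_fcn, margin=16):
    """Two-stage, coarse-to-fine segmentation: stage 1 proposes a candidate
    region; stage 2 segments only the cropped region in more detail."""
    coarse = stage1_fcn(volume)                  # (C, D, H, W) class probabilities
    fg = np.argwhere(coarse.argmax(axis=0) > 0)  # candidate (non-background) voxels
    lo = np.maximum(fg.min(axis=0) - margin, 0)  # bounding box with a safety margin
    hi = np.minimum(fg.max(axis=0) + margin, np.array(volume.shape))
    crop = volume[lo[0]:hi[0], lo[1]:hi[1], lo[2]:hi[2]]
    labels = np.zeros(volume.shape, dtype=np.int64)
    labels[lo[0]:hi[0], lo[1]:hi[1], lo[2]:hi[2]] = stage2_fcn(crop)
    return labels
```

Cropping to the candidate region is what shrinks the second network's input to roughly 10% of the voxels and lets it concentrate its capacity on the organ and vessel boundaries.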