The release of SOSTOOLS v4.00 comes as we approach the 20th anniversary of the original release of SOSTOOLS v1.00 back in April, 2002. SOSTOOLS was originally envisioned as a flexible tool for parsing and solving polynomial optimization problems, using the SOS tightening of polynomial positivity constraints, and capable of adapting to the ever-evolving fauna of applications of SOS. There are now a variety of SOS programming parsers beyond SOSTOOLS, including YALMIP, Gloptipoly, SumOfSquares, and others. We hope SOSTOOLS remains the most intuitive, robust and adaptable toolbox for SOS programming. Recent progress in Semidefinite programming has opened up new possibilities for solving large Sum of Squares programming problems, and we hope the next decade will be one where SOS methods will find wide application in different areas. In SOSTOOLS v4.00, we implement a parsing approach that reduces the computational and memory requirements of the parser below that of the SDP solver itself. We have re-developed the internal structure of our polynomial decision variables. Specifically, polynomial and SOS variable declarations made using sossosvar, sospolyvar, sosmatrixvar, etc now return a new polynomial structure, dpvar. This new polynomial structure, is documented in the enclosed dpvar guide, and isolates the scalar SDP decision variables in the SOS program from the independent variables used to construct the SOS program. As a result, the complexity of the parser scales almost linearly in the number of decision variables. As a result of these changes, almost all users will notice a significant increase in speed, with large-scaleproblems experiencing the most dramatic speedups. Parsing time is now always less than 10% of time spent in the SDP solver. Finally, SOSTOOLS now provides support for the MOSEK solver interface as well as the SeDuMi, SDPT3, CSDP, SDPNAL, SDPNAL+, and SDPA solvers.
We consider the Cauchy problem for the Helmholtz equation with a domain in R^d, d>2 with N cylindrical outlets to infinity with bounded inclusions in R^{d-1}. Cauchy data are prescribed on the boundary of the bounded domains and the aim is to find solution on the unbounded part of the boundary. In 1989, Kozlov and Maz'ya proposed an alternating iterative method for solving Cauchy problems associated with elliptic,self-adjoint and positive-definite operators in bounded domains. Different variants of this method for solving Cauchy problems associated with Helmholtz-type operators exists. We consider the variant proposed by Mpinganzima et al. for bounded domains and derive the necessary conditions for the convergence of the procedure in unbounded domains. For the numerical implementation, a finite difference method is used to solve the problem in a simple rectangular domain in R^2 that represent a truncated infinite strip. The numerical results shows that by appropriate truncation of the domain and with appropriate choice of the Robin parameters, the Robin-Dirichlet alternating iterative procedure is convergent.
Given its status as a classic problem and its importance to both theoreticians and practitioners, edit distance provides an excellent lens through which to understand how the theoretical analysis of algorithms impacts practical implementations. From an applied perspective, the goals of theoretical analysis are to predict the empirical performance of an algorithm and to serve as a yardstick to design novel algorithms that perform well in practice. In this paper, we systematically survey the types of theoretical analysis techniques that have been applied to edit distance and evaluate the extent to which each one has achieved these two goals. These techniques include traditional worst-case analysis, worst-case analysis parametrized by edit distance or entropy or compressibility, average-case analysis, semi-random models, and advice-based models. We find that the track record is mixed. On one hand, two algorithms widely used in practice have been born out of theoretical analysis and their empirical performance is captured well by theoretical predictions. On the other hand, all the algorithms developed using theoretical analysis as a yardstick since then have not had any practical relevance. We conclude by discussing the remaining open problems and how they can be tackled.
We introduce a new distortion measure for point processes called functional-covering distortion. It is inspired by intensity theory and is related to both the covering of point processes and logarithmic loss distortion. We obtain the distortion-rate function with feedforward under this distortion measure for a large class of point processes. For Poisson processes, the rate-distortion function is obtained under a general condition called constrained functional-covering distortion, of which both covering and functional-covering are special cases. Also for Poisson processes, we characterize the rate-distortion region for a two-encoder CEO problem and show that feedforward does not enlarge this region.
This paper explores formalizing Geometric (or Clifford) algebras into the Lean 3 theorem prover, building upon the substantial body of work that is the Lean mathematics library, mathlib. As we use Lean source code to demonstrate many of our ideas, we include a brief introduction to the Lean language targeted at a reader with no prior experience with Lean or theorem provers in general. We formalize the multivectors as the quotient of the tensor algebra by a suitable relation, which provides the ring structure automatically, then go on to establish the universal property of the Clifford algebra. We show that this is quite different to the approach taken by existing formalizations of Geometric algebra in other theorem provers; most notably, our approach does not require a choice of basis. We go on to show how operations and structure such as involutions, versors, and the $\mathbb{Z}_2$-grading can be defined using the universal property alone, and how to recover an induction principle from the universal property suitable for proving statements about these definitions. We outline the steps needed to formalize the wedge product and $\mathbb{N}$-grading, and some of the gaps in mathlib that currently make this challenging.
While utilization of digital agents to support crucial decision making is increasing, trust in suggestions made by these agents is hard to achieve. However, it is essential to profit from their application, resulting in a need for explanations for both the decision making process and the model. For many systems, such as common black-box models, achieving at least some explainability requires complex post-processing, while other systems profit from being, to a reasonable extent, inherently interpretable. We propose a rule-based learning system specifically conceptualised and, thus, especially suited for these scenarios. Its models are inherently transparent and easily interpretable by design. One key innovation of our system is that the rules' conditions and which rules compose a problem's solution are evolved separately. We utilise independent rule fitnesses which allows users to specifically tailor their model structure to fit the given requirements for explainability.
Stochastic Gradient Descent (SGD) is a central tool in machine learning. We prove that SGD converges to zero loss, even with a fixed (non-vanishing) learning rate - in the special case of homogeneous linear classifiers with smooth monotone loss functions, optimized on linearly separable data. Previous works assumed either a vanishing learning rate, iterate averaging, or loss assumptions that do not hold for monotone loss functions used for classification, such as the logistic loss. We prove our result on a fixed dataset, both for sampling with or without replacement. Furthermore, for logistic loss (and similar exponentially-tailed losses), we prove that with SGD the weight vector converges in direction to the $L_2$ max margin vector as $O(1/\log(t))$ for almost all separable datasets, and the loss converges as $O(1/t)$ - similarly to gradient descent. Lastly, we examine the case of a fixed learning rate proportional to the minibatch size. We prove that in this case, the asymptotic convergence rate of SGD (with replacement) does not depend on the minibatch size in terms of epochs, if the support vectors span the data. These results may suggest an explanation to similar behaviors observed in deep networks, when trained with SGD.
Emulators that can bypass computationally expensive scientific calculations with high accuracy and speed can enable new studies of fundamental science as well as more potential applications. In this work we discuss solving a system of constraint equations efficiently using a self-learning emulator. A self-learning emulator is an active learning protocol that can be used with any emulator that faithfully reproduces the exact solution at selected training points. The key ingredient is a fast estimate of the emulator error that becomes progressively more accurate as the emulator is improved, and the accuracy of the error estimate can be corrected using machine learning. We illustrate with three examples. The first uses cubic spline interpolation to find the solution of a transcendental equation with variable coefficients. The second example compares a spline emulator and a reduced basis method emulator to find solutions of a parameterized differential equation. The third example uses eigenvector continuation to find the eigenvectors and eigenvalues of a large Hamiltonian matrix that depends on several control parameters.
In variable selection, a selection rule that prescribes the permissible sets of selected variables (called a "selection dictionary") is desirable due to the inherent structural constraints among the candidate variables. The methods that can incorporate such restrictions can improve model interpretability and prediction accuracy. Penalized regression can integrate selection rules by assigning the coefficients to different groups and then applying penalties to the groups. However, no general framework has been proposed to formalize selection rules and their applications. In this work, we establish a framework for structured variable selection that can incorporate universal structural constraints. We develop a mathematical language for constructing arbitrary selection rules, where the selection dictionary is formally defined. We show that all selection rules can be represented as a combination of operations on constructs, which can be used to identify the related selection dictionary. One may then apply some criteria to select the best model. We show that the theoretical framework can help to identify the grouping structure in existing penalized regression methods. In addition, we formulate structured variable selection into mixed-integer optimization problems which can be solved by existing software. Finally, we discuss the significance of the framework in the context of statistics.
This manuscript portrays optimization as a process. In many practical applications the environment is so complex that it is infeasible to lay out a comprehensive theoretical model and use classical algorithmic theory and mathematical optimization. It is necessary as well as beneficial to take a robust approach, by applying an optimization method that learns as one goes along, learning from experience as more aspects of the problem are observed. This view of optimization as a process has become prominent in varied fields and has led to some spectacular success in modeling and systems that are now part of our daily lives.
Reinforcement learning (RL) is a popular paradigm for addressing sequential decision tasks in which the agent has only limited environmental feedback. Despite many advances over the past three decades, learning in many domains still requires a large amount of interaction with the environment, which can be prohibitively expensive in realistic scenarios. To address this problem, transfer learning has been applied to reinforcement learning such that experience gained in one task can be leveraged when starting to learn the next, harder task. More recently, several lines of research have explored how tasks, or data samples themselves, can be sequenced into a curriculum for the purpose of learning a problem that may otherwise be too difficult to learn from scratch. In this article, we present a framework for curriculum learning (CL) in reinforcement learning, and use it to survey and classify existing CL methods in terms of their assumptions, capabilities, and goals. Finally, we use our framework to find open problems and suggest directions for future RL curriculum learning research.