Many algorithms for determining properties of real algebraic or semi-algebraic sets rely upon the ability to compute smooth points. Existing methods to compute smooth points on semi-algebraic sets use symbolic quantifier elimination tools. In this paper, we present a simple algorithm based on computing the critical points of some well-chosen function that guarantees the computation of smooth points in each connected compact component of a real (semi)-algebraic set. Our technique is intuitive in principle, performs well on previously difficult examples, and is straightforward to implement using existing numerical algebraic geometry software. The practical efficiency of our approach is demonstrated by solving a conjecture on the number of equilibria of the Kuramoto model for the $n=4$ case. We also apply our method to design an efficient algorithm to compute the real dimension of (semi)-algebraic sets, the original motivation for this research.
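To give a flavor of the critical-point idea, the sketch below computes the critical points of the squared distance from a "generic" point restricted to a plane curve via the Lagrange conditions. The curve, the chosen point, and the use of sympy are illustrative assumptions; this is not the algorithm of the paper.

```python
# Toy illustration of the critical-point idea: critical points of the squared
# distance from a "generic" point restricted to the circle x^2 + y^2 - 1 = 0,
# obtained from the Lagrange conditions. The curve, the point and the use of
# sympy are illustrative choices; this is not the algorithm of the paper.
import sympy as sp

x, y, lam = sp.symbols('x y lambda', real=True)
f = x**2 + y**2 - 1                            # defining polynomial of the curve
a, b = sp.Rational(3, 7), sp.Rational(2, 5)    # made-up "generic" point
dist2 = (x - a)**2 + (y - b)**2                # function whose critical points we seek

# Lagrange system: grad(dist2) = lambda * grad(f), together with f = 0
eqs = [sp.diff(dist2, x) - lam * sp.diff(f, x),
       sp.diff(dist2, y) - lam * sp.diff(f, y),
       f]
for sol in sp.solve(eqs, [x, y, lam], dict=True):
    print(sol)                                 # critical points lie on the curve
```

Every critical point returned lies on the curve and is smooth there, which is the property the paper exploits on each compact connected component.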
We consider the problem of uncertainty quantification in change point regressions, where the signal can be piecewise polynomial of arbitrary but fixed degree. That is, we seek disjoint intervals which, uniformly at a given confidence level, must each contain a change point location. We propose a procedure based on performing local tests at a number of scales and locations on a sparse grid, which adapts to the choice of grid in the sense that by choosing a sparser grid one explicitly pays a lower price for multiple testing. The procedure is fast, as its computational complexity is always of the order $\mathcal{O} (n \log (n))$ where $n$ is the length of the data, and optimal in the sense that under certain mild conditions every change point is detected with high probability and the widths of the intervals returned match the minimax localisation rates for the associated change point problem up to log factors. A detailed simulation study shows our procedure is competitive against state-of-the-art algorithms for similar problems. Our procedure is implemented in the R package ChangePointInference, which is available at https://github.com/gaviosha/ChangePointInference.
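To make the multiscale idea concrete, here is a small numpy sketch of local CUSUM-type tests on a sparse dyadic grid for the piecewise-constant case. It is not the ChangePointInference implementation: the grid, threshold, and data are made-up choices, and no pruning into disjoint intervals is performed.

```python
# Illustrative numpy sketch of local CUSUM-type tests on a sparse dyadic grid
# for a piecewise-constant signal. This is NOT the ChangePointInference
# implementation: grid, threshold and data are made-up choices, and no
# pruning into disjoint intervals is performed.
import numpy as np

rng = np.random.default_rng(0)
n = 512
signal = np.concatenate([np.zeros(200), 2 * np.ones(150), np.zeros(n - 350)])
y = signal + rng.normal(size=n)

def cusum(x):
    """Maximum absolute CUSUM statistic over all split points of the interval."""
    m = len(x)
    s = np.cumsum(x)
    k = np.arange(1, m)
    return np.max(np.abs(s[k - 1] - k / m * s[-1]) / np.sqrt(k * (m - k) / m))

threshold = np.sqrt(2.5 * np.log(n))          # illustrative threshold choice
flagged = []
length = 16
while length <= n:                            # dyadic scales, half-overlapping shifts
    for start in range(0, n - length + 1, length // 2):
        if cusum(y[start:start + length]) > threshold:
            flagged.append((start, start + length))
    length *= 2

print(flagged)   # intervals whose local test indicates a change point
```

The sparse grid keeps the total number of tests of order $n$, which is what drives the $\mathcal{O}(n \log(n))$ complexity and the explicit multiple-testing price discussed above.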
Several physical problems modeled by second-order partial differential equations can be efficiently solved using mixed finite elements of the Raviart-Thomas family for N-simplexes, introduced in the seventies. When Neumann conditions are prescribed on a curvilinear boundary, the normal component of the flux variable should preferably not take up values at nodes shifted to the boundary of the approximating polytope in the corresponding normal direction, since otherwise the method's accuracy degrades, as shown in \cite{FBRT}. In that work an order-preserving technique was studied, based on a parametric version of these elements with curved simplexes. In this paper an alternative with straight-edged triangles for two-dimensional problems is proposed. The key point of this method is a Petrov-Galerkin formulation of the mixed problem, in which the test-flux space differs slightly from the shape-flux space. After carrying out a well-posedness and stability analysis, error estimates of optimal order are proven.
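As background on the element family involved, the following sketch numerically verifies, on one straight triangle, the defining edge-flux property of the lowest-order (RT0) Raviart-Thomas basis under one common normalization. This is only background and is unrelated to the Petrov-Galerkin construction of the paper.

```python
# Verify the defining edge-flux property of the lowest-order Raviart-Thomas
# (RT0) basis on one straight triangle: int_{e_j} phi_i . n ds = delta_ij,
# with the common normalization phi_i(x) = (x - p_i) / (2|K|), p_i being the
# vertex opposite edge e_i and n the outward unit normal. Background only,
# unrelated to the paper's Petrov-Galerkin construction.
import numpy as np

p = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])        # triangle vertices
area2 = abs((p[1, 0] - p[0, 0]) * (p[2, 1] - p[0, 1])
            - (p[1, 1] - p[0, 1]) * (p[2, 0] - p[0, 0]))   # 2|K|
centroid = p.mean(axis=0)

def phi(i, x):
    return (x - p[i]) / area2

M = np.zeros((3, 3))
for j in range(3):                       # edge e_j is opposite vertex p_j
    a, b = p[(j + 1) % 3], p[(j + 2) % 3]
    edge = b - a
    n = np.array([edge[1], -edge[0]]) / np.linalg.norm(edge)
    if np.dot(n, a - centroid) < 0:      # orient the normal outward
        n = -n
    mid = 0.5 * (a + b)                  # midpoint rule is exact: integrand is affine
    for i in range(3):
        M[i, j] = np.dot(phi(i, mid), n) * np.linalg.norm(edge)

print(np.round(M, 12))                   # should print the 3x3 identity
```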
Uncertainty sampling is a prevalent active learning algorithm that sequentially queries the annotations of the data samples about which the current prediction model is uncertain. However, the usage of uncertainty sampling has been largely heuristic: (i) There is no consensus on the proper definition of "uncertainty" for a specific task under a specific loss; (ii) There is no theoretical guarantee that prescribes a standard protocol to implement the algorithm, for example, how to handle the sequentially arriving annotated data under the framework of optimization algorithms such as stochastic gradient descent. In this work, we systematically examine uncertainty sampling algorithms under both stream-based and pool-based active learning. We propose a notion of equivalent loss which depends on the uncertainty measure used and the original loss function, and we establish that an uncertainty sampling algorithm essentially optimizes against such an equivalent loss. This perspective verifies the properness of existing uncertainty measures from two aspects: surrogate property and loss convexity. Furthermore, we propose a new notion for designing uncertainty measures called \textit{loss as uncertainty}. The idea is to use the conditional expected loss given the features as the uncertainty measure. Such an uncertainty measure has nice analytical properties and enough generality to cover both classification and regression problems, which enable us to provide the first generalization bound for uncertainty sampling algorithms under both stream-based and pool-based settings, in the full generality of the underlying model and problem. Lastly, we establish connections between certain variants of the uncertainty sampling algorithms and risk-sensitive objectives and distributional robustness, which can partly explain the advantage of uncertainty sampling algorithms when the sample size is small.
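A minimal pool-based sketch of the "loss as uncertainty" idea, under the assumption that the conditional expected loss is estimated by plugging the current model's predicted probabilities into the loss (logistic loss here); the data, model, initial labeled set, and budget are illustrative choices, not the paper's experimental setup.

```python
# Minimal pool-based sketch of "loss as uncertainty": score each unlabeled
# point by the plug-in estimate of its conditional expected (logistic) loss
# under the current model and query the maximizer. Data, model, initial set
# and budget are made-up choices for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = (X[:, 0] + 0.5 * X[:, 1] + 0.3 * rng.normal(size=500) > 0).astype(int)

labeled = list(np.where(y == 0)[0][:5]) + list(np.where(y == 1)[0][:5])
pool = [i for i in range(500) if i not in labeled]

for _ in range(40):                            # annotation budget
    model = LogisticRegression().fit(X[labeled], y[labeled])
    p = model.predict_proba(X[pool])[:, 1]
    eps = 1e-12
    # plug-in conditional expected log loss: -p log p - (1 - p) log(1 - p)
    expected_loss = -(p * np.log(p + eps) + (1 - p) * np.log(1 - p + eps))
    pick = pool[int(np.argmax(expected_loss))]
    labeled.append(pick)                       # "annotate" the queried point
    pool.remove(pick)

model = LogisticRegression().fit(X[labeled], y[labeled])
print("accuracy on the full pool:", model.score(X, y))
```

For binary classification with log loss this score coincides with the familiar entropy criterion, which illustrates how a classical uncertainty measure can be read as a conditional expected loss.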
In this paper we extend to two-dimensional data two recently introduced one-dimensional compressibility measures: the $\gamma$ measure, defined in terms of the smallest string attractor, and the $\delta$ measure, defined in terms of the number of distinct substrings of the input string. Concretely, we introduce the two-dimensional measures $\gamma_{2D}$ and $\delta_{2D}$ as natural generalizations of $\gamma$ and $\delta$ and study some of their properties. Among other things, we prove that $\delta_{2D}$ is monotone and can be computed in linear time, and we show that although it is still true that $\delta_{2D} \leq \gamma_{2D}$, the gap between the two measures can be $\Omega(\sqrt{n})$ for families of $n\times n$ matrices, and is therefore asymptotically larger than the gap in one dimension. Finally, we use the measures $\gamma_{2D}$ and $\delta_{2D}$ to provide the first analysis of the space usage of the two-dimensional block tree introduced in [Brisaboa et al., Two-dimensional block trees, The Computer Journal, 2023].
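As a rough illustration, the brute-force sketch below evaluates a $\delta_{2D}$-like quantity on a small binary matrix, assuming (by analogy with the one-dimensional definition) that it is the maximum over block shapes $k \times l$ of the number of distinct $k \times l$ submatrices divided by $k \cdot l$; the precise definition used in the paper may differ, and this enumeration is nowhere near the linear-time algorithm mentioned above.

```python
# Brute-force sketch of a 2D delta-like measure on a small binary matrix:
# max over block shapes (k, l) of (#distinct k x l submatrices) / (k * l).
# The definition is an assumption by analogy with the 1D measure; this
# enumeration is for illustration only, not the paper's linear-time algorithm.
import numpy as np

M = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 1, 0, 0]])

n_rows, n_cols = M.shape
delta_2d = 0.0
for k in range(1, n_rows + 1):
    for l in range(1, n_cols + 1):
        blocks = {M[i:i + k, j:j + l].tobytes()
                  for i in range(n_rows - k + 1)
                  for j in range(n_cols - l + 1)}
        delta_2d = max(delta_2d, len(blocks) / (k * l))

print("delta_2D (brute force):", delta_2d)
```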
The Independent Cutset problem asks whether there is a set of vertices in a given graph that is both independent and a cutset. This problem is $\textsf{NP}$-complete even when the input graph is planar and has maximum degree five. In this paper, we first present an $\mathcal{O}^*(1.4423^{n})$-time algorithm for the problem. We also show how to compute a minimum independent cutset (if any) within the same running time. Since the property of having an independent cutset is MSO$_1$-expressible, our main results concern structural parameterizations of the problem with parameters that are not bounded by a function of the clique-width of the input. We present $\textsf{FPT}$-time algorithms for the problem with respect to the following parameters: the dual of the maximum degree, the dual of the solution size, the size of a dominating set (where a dominating set is given as an additional input), the size of an odd cycle transversal, the distance to chordal graphs, and the distance to $P_5$-free graphs. We close by introducing the notion of $\alpha$-domination, which allows us to identify further fixed-parameter tractable and polynomial-time solvable cases.
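For intuition about the problem itself, here is a naive exponential checker that enumerates vertex subsets of a small graph and reports an independent cutset if one exists; it bears no relation to the $\mathcal{O}^*(1.4423^{n})$ branching algorithm or the FPT results above, and the example graph is arbitrary.

```python
# Naive exponential check for an independent cutset in a small graph:
# enumerate vertex subsets, keep those that are independent and whose
# removal disconnects the graph. Purely for intuition -- not the
# O*(1.4423^n) branching algorithm of the paper.
from itertools import combinations
import networkx as nx

G = nx.cycle_graph(6)                      # small example graph

def independent_cutsets(G):
    V = list(G.nodes)
    for r in range(1, len(V) - 1):
        for S in combinations(V, r):
            S = set(S)
            if any(G.has_edge(u, v) for u, v in combinations(S, 2)):
                continue                   # not an independent set
            H = G.subgraph(set(V) - S)
            if not nx.is_connected(H):
                yield S                    # independent set that is also a cutset

print(next(independent_cutsets(G), None))  # e.g. two non-adjacent vertices of C6
```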
An alternative approach for the panel second stage of data envelopment analysis (DEA) is presented in this paper. Instead of efficiency scores, we propose to model rankings in the second stage using a dynamic ranking model in the score-driven framework. We argue that this approach is suitable as a complement to traditional panel regression, serving as a robustness check. To demonstrate the proposed approach, we determine research efficiency in the higher education sector by examining scientific publications and analyze its relation to good governance. The proposed approach confirms a positive relation with the Voice and Accountability indicator, as found by the standard panel linear regression, while suggesting caution regarding the Government Effectiveness indicator.
We study the problem of learning nonparametric distributions in a finite mixture, and we establish tight bounds on the sample complexity for learning the component distributions in such models. Namely, we are given i.i.d. samples from a pdf $f$ where $$ f=w_1f_1+w_2f_2, \quad w_1+w_2=1, \quad w_1,w_2>0 $$ and we are interested in learning each component $f_i$. Without any assumptions on $f_i$, this problem is ill-posed. In order to identify the components $f_i$, we assume that each $f_i$ can be written as a convolution of a Gaussian and a compactly supported density $\nu_i$ with $\text{supp}(\nu_1)\cap \text{supp}(\nu_2)=\emptyset$. Our main result shows that $(\frac{1}{\varepsilon})^{\Omega(\log\log \frac{1}{\varepsilon})}$ samples are required for estimating each $f_i$. The proof relies on a quantitative Tauberian theorem that yields a fast rate of approximation with Gaussians, which may be of independent interest. To show that this is tight, we also propose an algorithm that uses $(\frac{1}{\varepsilon})^{O(\log\log \frac{1}{\varepsilon})}$ samples to estimate each $f_i$. Unlike existing approaches to learning latent variable models based on moment-matching and tensor methods, our proof instead involves a delicate analysis of an ill-conditioned linear system via orthogonal functions. Combining these bounds, we conclude that the optimal sample complexity of this problem lies strictly between polynomial and exponential, which is not common in learning theory.
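To make the model concrete, the short sketch below draws samples from a mixture $f = w_1 f_1 + w_2 f_2$ in which each component is a standard Gaussian convolved with a compactly supported density, the two supports being disjoint; the specific supports, weights, and sample size are illustrative assumptions only.

```python
# Sampling sketch for the mixture model f = w1*f1 + w2*f2, where each
# component f_i is the convolution of a standard Gaussian with a compactly
# supported density nu_i, and supp(nu_1), supp(nu_2) are disjoint.
# Supports, weights and sample size are made-up choices for illustration.
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
w1 = 0.4                                    # w2 = 1 - w1

# nu_1 uniform on [-3, -1], nu_2 uniform on [1, 3]: disjoint compact supports
component_one = rng.random(n) < w1
latent = np.where(component_one,
                  rng.uniform(-3.0, -1.0, size=n),
                  rng.uniform(1.0, 3.0, size=n))
samples = latent + rng.normal(size=n)       # convolution with a N(0, 1) kernel

print(samples[:5], samples.mean())
```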
The goal of this work is to study waves interacting with partially immersed objects allowed to move freely in the vertical direction, in a regime in which the propagation of the waves is described by the one-dimensional Boussinesq-Abbott system. The problem can be reduced to a transmission problem for this Boussinesq system, in which the transmission conditions between the components of the domain to the left and to the right of the object are determined through the resolution of coupled forced ODEs in time satisfied by the vertical displacement of the object and the average discharge in the portion of the fluid located under the object. We propose a new extended formulation in which these ODEs are complemented by two other forced ODEs satisfied by the trace of the surface elevation at the contact points. The interest of this new extended formulation is that the forcing terms are easy to compute numerically and that the surface elevation at the contact points is obtained for free. Based on this formulation, we propose a second-order scheme that involves a generalization of the MacCormack scheme with a nonlocal flux and a source term, coupled to a second-order Heun scheme for the ODEs. In order to validate this scheme, several explicit solutions of this wave-structure interaction problem are derived and can serve as benchmarks for future codes. As a byproduct, our method provides a second-order scheme for the generation of waves at the entrance of the numerical domain for the Boussinesq-Abbott system.
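For reference, here is a minimal implementation of a second-order Heun step of the kind used for the ODE part of such a coupled scheme, applied to a generic forced system $y' = F(t, y)$; the right-hand side below is a placeholder, not the actual wave-structure ODEs of the paper.

```python
# Minimal second-order Heun (explicit trapezoidal) step for a forced ODE
# system y' = F(t, y). The right-hand side below is a generic placeholder,
# not the wave-structure ODEs of the paper.
import numpy as np

def heun_step(F, t, y, dt):
    """One Heun step: Euler predictor followed by a trapezoidal corrector (order 2)."""
    k1 = F(t, y)
    k2 = F(t + dt, y + dt * k1)
    return y + 0.5 * dt * (k1 + k2)

# Example forcing: damped, forced oscillator y'' + 0.1 y' + y = cos(t) as a system.
def F(t, y):
    pos, vel = y
    return np.array([vel, np.cos(t) - 0.1 * vel - pos])

t, dt, y = 0.0, 0.01, np.array([0.0, 0.0])
for _ in range(1000):
    y = heun_step(F, t, y, dt)
    t += dt
print(t, y)
```

In the coupled scheme described above, such a step would be advanced in lockstep with the MacCormack update of the hyperbolic part, with the ODE forcing terms evaluated from the PDE solution.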
The accurate and efficient simulation of Partial Differential Equations (PDEs) in and around arbitrarily defined geometries is critical for many application domains. Immersed boundary methods (IBMs) alleviate the usually laborious and time-consuming process of creating body-fitted meshes around complex geometry models (described by CAD or other representations, e.g., STL, point clouds), especially when high levels of mesh adaptivity are required. In this work, we advance the field of IBMs in the context of the recently developed Shifted Boundary Method (SBM). In the SBM, the location where boundary conditions are enforced is shifted from the actual boundary of the immersed object to a nearby surrogate boundary, and the boundary conditions are corrected using Taylor expansions. This approach allows choosing surrogate boundaries that conform to a Cartesian mesh without losing accuracy or stability. Our contributions in this work are as follows: (a) we show that the SBM numerical error can be greatly reduced by an optimal choice of the surrogate boundary, (b) we mathematically prove the optimal convergence of the SBM for this optimal choice of the surrogate boundary, (c) we deploy the SBM on massively parallel octree meshes, including algorithmic advances to handle incomplete octrees, and (d) we showcase the applicability of these approaches in a wide variety of simulations involving complex shapes, sharp corners, and different topologies. Specific emphasis is given to Poisson's equation and the linear elasticity equations.
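A small sketch of the Taylor-expansion correction at the heart of the SBM for a Dirichlet condition: at a surrogate-boundary node $\tilde{x}$, one enforces $u(\tilde{x}) \approx g(x) - \nabla u(\tilde{x}) \cdot d$, where $x$ is the closest point on the true boundary and $d = x - \tilde{x}$. The unit-circle geometry and exact solution below are illustrative, and the exact gradient stands in for the discrete one used in the actual weak form.

```python
# Sketch of the SBM Taylor correction for a Dirichlet condition at a
# surrogate-boundary node x_t: enforce u(x_t) ~ g(x) - grad_u(x_t) . d,
# where x is the closest true-boundary point and d = x - x_t.
# Geometry (unit circle) and exact solution are illustrative only; in the
# method itself the gradient is that of the discrete solution.
import numpy as np

def u_exact(p):                        # harmonic test solution u = x^2 - y^2
    return p[0]**2 - p[1]**2

def grad_u(p):
    return np.array([2 * p[0], -2 * p[1]])

x_t = np.array([0.7, 0.6])             # surrogate node inside the unit circle
x = x_t / np.linalg.norm(x_t)          # closest point on the true boundary
d = x - x_t                            # shift vector

g = u_exact(x)                         # Dirichlet data on the true boundary
naive = g                              # ignore the shift entirely
shifted = g - grad_u(x_t) @ d          # first-order SBM-style correction

print("error without correction:", abs(naive - u_exact(x_t)))
print("error with correction:   ", abs(shifted - u_exact(x_t)))
```

The corrected value is markedly closer to $u(\tilde{x})$ than the uncorrected one, which is the mechanism that lets the surrogate boundary conform to the Cartesian mesh without degrading accuracy.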
Effective resistances are ubiquitous in graph algorithms and network analysis. In this work, we study sublinear time algorithms to approximate the effective resistance of an adjacent pair $s$ and $t$. We consider the classical adjacency list model for local algorithms. While recent works have provided sublinear time algorithms for expander graphs, we prove several lower bounds for general graphs of $n$ vertices and $m$ edges: 1. $\Omega(n)$ queries are needed to obtain a $1.01$-approximation of the effective resistance of an adjacent pair $s$ and $t$, even for graphs in which every vertex except $s$ and $t$ has degree at most 3. 2. For graphs of degree at most $d$ and any parameter $\ell$, $\Omega(m/\ell)$ queries are needed to obtain a $c \cdot \min\{d, \ell\}$-approximation, where $c>0$ is a universal constant. Moreover, we complement the first lower bound by providing a sublinear time $(1+\epsilon)$-approximation algorithm for graphs in which every vertex except $s$ and $t$ has degree 2. One of our technical ingredients is a bound on the expansion of a graph in terms of the smallest non-trivial eigenvalue of its Laplacian matrix after removing edges. We establish a new lower bound on the eigenvalues of perturbed graphs (resp. perturbed matrices) by incorporating the effective resistance of the removed edge (resp. the leverage scores of the removed rows), which may be of independent interest.
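For reference, the quantity being approximated satisfies $R_{\mathrm{eff}}(s,t) = (e_s - e_t)^T L^{+} (e_s - e_t)$, where $L^{+}$ is the pseudoinverse of the graph Laplacian. The dense numpy computation below is exact but takes far more than sublinear time, and the example graph is arbitrary.

```python
# Exact (dense, non-sublinear) computation of the effective resistance
# R_eff(s, t) = (e_s - e_t)^T L^+ (e_s - e_t) via the Laplacian pseudoinverse,
# shown only as a reference for the quantity that sublinear algorithms
# approximate. The example graph (a 4-cycle with one chord) is arbitrary.
import numpy as np

edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
n = 4
L = np.zeros((n, n))
for u, v in edges:
    L[u, u] += 1; L[v, v] += 1
    L[u, v] -= 1; L[v, u] -= 1

Lp = np.linalg.pinv(L)
s, t = 0, 2                                # an adjacent pair
chi = np.zeros(n); chi[s], chi[t] = 1.0, -1.0
print("R_eff(s, t) =", chi @ Lp @ chi)     # 0.5: edge 0-2 in parallel with two 2-edge paths
```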