We devise a polynomial-time algorithm for partitioning a simple polygon $P$ into a minimum number of star-shaped polygons. The question of whether such an algorithm exists has been open for more than four decades [Avis and Toussaint, Pattern Recognit., 1981] and has been reiterated frequently, for example in O'Rourke's famous book [Art Gallery Theorems and Algorithms, 1987]. In addition to its strong theoretical motivation, the problem arises in practical domains such as CNC pocket milling, motion planning, and shape parameterization. The only previously known algorithm for a non-trivial special case applies when $P$ is both monotone and rectilinear [Liu and Ntafos, Algorithmica, 1991]. For general polygons, an algorithm was known only for the restricted version in which Steiner points are disallowed [Keil, SIAM J. Comput., 1985], meaning that each corner of a piece in the partition must also be a corner of $P$. Interestingly, the solution size for the restricted version may be linear even for instances where the unrestricted solution has constant size. The covering variant in which the pieces are star-shaped but allowed to overlap--known as the Art Gallery Problem--was recently shown to be $\exists\mathbb{R}$-complete and is thus likely not in NP [Abrahamsen, Adamaszek and Miltzow, STOC 2018 & J. ACM 2022]; this is in stark contrast to our result. Arguably the work most closely related to ours is the polynomial-time algorithm of Chazelle and Dobkin for partitioning a simple polygon into a minimum number of convex pieces [STOC, 1979 & Comput. Geom., 1985].
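As background for the central notion: a polygon is star-shaped exactly when its kernel (the set of points that see the entire interior) is nonempty, and for a counterclockwise simple polygon the kernel is the intersection of the half-planes to the left of the directed edges. The following minimal Python sketch tests star-shapedness this way; it only illustrates the definition and is not the partition algorithm of the paper.

```python
def clip_halfplane(poly, a, b):
    """Keep the part of convex polygon `poly` left of the directed line a->b."""
    def side(p):  # > 0 means p is left of the line through a, b
        return (b[0]-a[0])*(p[1]-a[1]) - (b[1]-a[1])*(p[0]-a[0])
    out = []
    for i in range(len(poly)):
        p, q = poly[i], poly[(i+1) % len(poly)]
        sp, sq = side(p), side(q)
        if sp >= 0:
            out.append(p)
        if sp * sq < 0:  # edge p-q crosses the clipping line
            t = sp / (sp - sq)
            out.append((p[0] + t*(q[0]-p[0]), p[1] + t*(q[1]-p[1])))
    return out

def is_star_shaped(polygon):
    """Polygon as a CCW list of (x, y) vertices; True iff the kernel is nonempty."""
    xs, ys = zip(*polygon)
    m = max(max(xs) - min(xs), max(ys) - min(ys)) + 1
    # Start from a large box and intersect with each edge's left half-plane.
    kernel = [(min(xs)-m, min(ys)-m), (max(xs)+m, min(ys)-m),
              (max(xs)+m, max(ys)+m), (min(xs)-m, max(ys)+m)]
    for i in range(len(polygon)):
        kernel = clip_halfplane(kernel, polygon[i], polygon[(i+1) % len(polygon)])
        if not kernel:
            return False
    return True
```

For example, `is_star_shaped([(0, 0), (4, 0), (4, 4), (0, 4)])` returns `True`, since every convex polygon is star-shaped.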
Scatter plots are popular for displaying 2D data, but in practice, many data sets have more than two dimensions. For the analysis of such multivariate data, it is often necessary to switch between scatter plots of different dimension pairs, e.g., in a scatter plot matrix (SPLOM). Alternative approaches include a "grand tour" for an overview of the entire data set or creating artificial axes from dimensionality reduction (DR). A cross-cutting concern in all these techniques is the viewers' ability to find correspondences between data points in different views. Previous work proposed animations to preserve the mental map across view changes and to trace points as well as clusters between scatter plots of the same underlying data set. In this paper, we evaluate a variety of spline- and rotation-based view transitions in a crowdsourced user study focusing on ecological validity. Using the study results, we assess each animation's suitability for tracing points and clusters across view changes, and we test whether the order of horizontal and vertical rotation affects task accuracy. The results show that rotations with an orthographic camera or a staged expansion of a depth axis significantly outperform all other animation techniques for the traceability of individual points, and we provide a corresponding ranking of the animated transition techniques. However, we could not find any significant differences for the traceability of clusters. We also identified differences by animation direction that could guide further studies to determine potential confounds. We publish the study data for reuse and provide the animation framework as a D3.js plug-in.
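To make the geometry of the rotation-based transitions concrete, the sketch below (a hypothetical Python illustration, not the D3.js plug-in itself) interpolates a horizontal rotation about the vertical axis that swaps the x-axis of a scatter plot from one data dimension to another; under an orthographic camera, the depth component never distorts point positions or sizes.

```python
import numpy as np

def orthographic_rotation_frames(X, dim_out, dim_in, dim_y, n_frames=30):
    """Frames of a horizontal rotation that swaps the x-axis of a scatter
    plot from data dimension `dim_out` to `dim_in` under an orthographic
    camera.  X: (n_points, n_dims) array.  Returns (n_points, 2) arrays."""
    frames = []
    for theta in np.linspace(0.0, np.pi / 2, n_frames):
        # Rotate about the vertical axis; the orthographic projection keeps
        # only the screen-plane component, so depth causes no perspective
        # distortion while points travel between the two views.
        x = X[:, dim_out] * np.cos(theta) + X[:, dim_in] * np.sin(theta)
        frames.append(np.column_stack([x, X[:, dim_y]]))
    return frames
```

At `theta = 0` the frame is the outgoing scatter plot and at `theta = pi/2` it is the incoming one, which is what lets viewers visually trace individual points through the rotation.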
We propose and study a framework for quantifying the importance of the choices of parameter values to the result of a query over a database. These parameters occur as constants in logical queries, such as conjunctive queries. In our framework, the importance of a parameter is its SHAP score, a popular instantiation of the game-theoretic Shapley value for measuring the importance of feature values in machine learning models. We make the case for the rationale of using this score by explaining the intuition behind SHAP, and by showing that we arrive at this score via two different, apparently opposing, approaches to quantifying the contribution of a parameter. Applying the SHAP score requires two components in addition to the query and the database: (a) a probability distribution over the combinations of parameter values, and (b) a utility function that measures the similarity between the result for the original parameters and the result for hypothetical parameters. The main question addressed in the paper is the complexity of calculating the SHAP score for different distributions and similarity measures. We first address the case of probabilistically independent parameters. As one would expect, the problem is hard for fragments of queries that are hard to evaluate, but it is hard even for the fragment of acyclic conjunctive queries. In some cases, though, one can efficiently list all relevant parameter combinations, and then the SHAP score can be computed in polynomial time under reasonably general conditions. Also tractable is the case of full acyclic conjunctive queries for certain (natural) similarity functions. We extend our results to conjunctive queries with inequalities between variables and parameters. Finally, we discuss a simple approximation technique for the case of correlated parameters.
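For illustration, the following Python sketch computes the SHAP score of one parameter by brute force, directly from the Shapley-value definition, with Monte Carlo estimation of the expected utility under independent parameters. All names (`sample_param`, `utility`, `query`) are hypothetical stand-ins for the distribution, similarity measure, and query of the framework, and the exponential enumeration of coalitions is precisely the cost the paper's complexity analysis seeks to avoid.

```python
import itertools, math

def shap_score(i, params, sample_param, utility, query, n_samples=200):
    """Brute-force SHAP score of parameter i (exponential in the number of
    parameters; for illustration only).
    params: actual parameter values, as a dict name -> value.
    sample_param(name): draws a hypothetical value for that parameter.
    query(assignment): evaluates the query under a full parameter assignment.
    utility(result): similarity of a hypothetical result to the original one."""
    names = [n for n in params if n != i]

    def value(coalition):
        # Expected utility when exactly the parameters in `coalition` are
        # fixed to their actual values and the rest are redrawn at random.
        total = 0.0
        for _ in range(n_samples):
            a = {n: (params[n] if n in coalition else sample_param(n))
                 for n in params}
            total += utility(query(a))
        return total / n_samples

    score, k = 0.0, len(names)
    for r in range(k + 1):
        for S in itertools.combinations(names, r):
            # Shapley weight |S|! (n-|S|-1)! / n! with n = k+1 players.
            w = math.factorial(r) * math.factorial(k - r) / math.factorial(k + 1)
            score += w * (value(set(S) | {i}) - value(set(S)))
    return score
```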
A Particle Swarm Optimizer (PSO) for the search of balanced Boolean functions with good cryptographic properties is proposed in this paper. The algorithm is a modified version of the permutation PSO by Hu, Eberhart and Shi, which preserves the Hamming weight of the particles' positions, coupled with the Hill Climbing method devised by Millan, Clark and Dawson to improve the nonlinearity and the deviation from correlation immunity of Boolean functions. The parameters of the PSO velocity equation are tuned by means of two meta-optimization techniques, namely Local Unimodal Sampling (LUS) and Continuous Genetic Algorithms (CGA); we find that CGA produces better results. Using the CGA-evolved parameters, the PSO algorithm is then run on the spaces of Boolean functions from $n=7$ to $n=12$ variables. The experiments show that this new PSO algorithm generates Boolean functions featuring similar or better combinations of nonlinearity, correlation immunity and propagation criterion than those obtained by other optimization methods.
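A core fitness ingredient in this line of work is the nonlinearity of a Boolean function, computable from its Walsh-Hadamard spectrum as $nl(f) = 2^{n-1} - \max_a |W_f(a)|/2$, i.e., the Hamming distance to the nearest affine function. The Python sketch below shows this standard computation; it is illustrative and not the authors' implementation.

```python
import numpy as np

def walsh_hadamard(tt):
    """Walsh-Hadamard spectrum of a Boolean function given as a truth table
    (0/1 array of length 2^n), using the sign convention f -> (-1)^f."""
    w = 1 - 2 * np.asarray(tt, dtype=np.int64)  # map 0/1 to +1/-1
    h = 1
    while h < len(w):  # in-place fast transform, O(n * 2^n)
        for i in range(0, len(w), 2 * h):
            a, b = w[i:i+h].copy(), w[i+h:i+2*h].copy()
            w[i:i+h], w[i+h:i+2*h] = a + b, a - b
        h *= 2
    return w

def nonlinearity(tt):
    """nl(f) = 2^(n-1) - max |W_f| / 2, distance to the nearest affine function."""
    n = int(np.log2(len(tt)))
    return 2 ** (n - 1) - int(np.max(np.abs(walsh_hadamard(tt)))) // 2
```

Maximizing this quantity over Hamming-weight-preserving particle moves is the kind of fitness evaluation the PSO performs at each iteration.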
The mesh divergence problem occurring at subsonic and transonic speeds with the adjoint Euler equations is reviewed. By examining a recently derived analytic adjoint solution, it is shown that the divergence is explained by the adjoint solution being singular at the wall. The wall singularity is caused by the adjoint singularity at the trailing edge, but not in the way previously conjectured.
Synthetic ground motions (GMs) play a fundamental role in both deterministic and probabilistic seismic engineering assessments. This paper shows that the family of filtered and modulated white-noise stochastic GM models overlooks a key parameter: the high-pass filter's corner frequency, $f_c$. In the simulated motions, this omission causes significant distortions in the long-period range of the linear-response spectra and in the linear-response spectral correlations. To address this, we incorporate $f_c$ as an explicitly fitted parameter in a site-based stochastic model. We optimize $f_c$ by individually matching the long-period linear-response spectrum (i.e., $Sa(T)$ for $T \geq 1$ s) of synthetic GMs with that of each recorded GM. We show that by fitting $f_c$, the resulting stochastically simulated GMs can precisely capture the spectral amplitudes, the variability (i.e., the variances of $\log(Sa(T))$), and the correlation structure (i.e., the correlation of $\log(Sa(T))$ between distinct periods $T_1$ and $T_2$) of recorded GMs. To quantify the impact of $f_c$, a sensitivity analysis is conducted through linear regression, relating the logarithmic linear-response spectrum $\log(Sa(T))$ to seven GM parameters, including the optimized $f_c$. The results indicate that the variance of $f_c$ observed in natural GMs, along with its correlation with the other GM parameters, accounts for 26\% of the spectral variability at long periods. Neglecting either the variance of $f_c$ or its correlations typically results in a substantial overestimation of the linear-response spectral correlation.
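As a hedged illustration of the fitting idea (the paper's exact filter and optimization procedure may differ), the Python sketch below high-pass filters a synthetic motion with a second-order Butterworth filter at corner frequency $f_c$ and picks the $f_c$ that best matches the recorded long-period response spectrum in a least-squares sense. The damping ratio and the search bounds on $f_c$ are illustrative assumptions.

```python
import numpy as np
from scipy import signal
from scipy.optimize import minimize_scalar

def sa_spectrum(acc, dt, periods, zeta=0.05):
    """Pseudo-acceleration linear-response spectrum of an acceleration record."""
    t = np.arange(len(acc)) * dt
    sa = []
    for T in periods:
        w = 2 * np.pi / T
        # SDOF oscillator: x'' + 2*zeta*w*x' + w^2*x = -a_g
        sdof = signal.lti([-1.0], [1.0, 2 * zeta * w, w ** 2])
        _, x, _ = signal.lsim(sdof, acc, t)
        sa.append(w ** 2 * np.max(np.abs(x)))
    return np.array(sa)

def fit_corner_frequency(synth_acc, rec_acc, dt, periods):
    """Choose f_c (Hz) so the high-pass-filtered synthetic motion matches the
    recorded long-period spectrum (periods with T >= 1 s) in least squares."""
    target = np.log(sa_spectrum(rec_acc, dt, periods))
    def misfit(fc):
        b, a = signal.butter(2, fc, btype="highpass", fs=1 / dt)
        filt = signal.lfilter(b, a, synth_acc)
        return np.sum((np.log(sa_spectrum(filt, dt, periods)) - target) ** 2)
    return minimize_scalar(misfit, bounds=(0.05, 0.5), method="bounded").x
```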
We address the problem of uncertainty propagation and Bayesian fusion on unimodular Lie groups. Starting from a stochastic differential equation (SDE) defined on a Lie group via the McKean-Gangolli injection, we first convert it to a parametric SDE in exponential coordinates. The coefficient transformation for this conversion is stated for both the Itô and the Stratonovich interpretation of the SDE. We then derive a mean and covariance fitting formula for probability distributions on Lie groups defined by a concentrated distribution in exponential coordinates. It is used to derive the mean and covariance propagation equations for the SDE defined by injection, which coincide with the result derived from a Fokker-Planck equation in previous work. We also propose a simple modification to the update step of Kalman filters using the fitting formula, which improves the fusion accuracy at moderate computation time.
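As a simple illustration of the concentrated-distribution viewpoint (a Monte Carlo sketch over assumed SO(3) samples, not the paper's closed-form propagation formulas), the following Python code fits a group mean and a covariance in exponential coordinates by iterating the mean until the coordinates average to zero.

```python
import numpy as np
from scipy.linalg import expm, logm

def hat(v):
    """so(3) hat map: R^3 -> 3x3 skew-symmetric matrix."""
    return np.array([[0, -v[2], v[1]], [v[2], 0, -v[0]], [-v[1], v[0], 0]])

def vee(M):
    """Inverse of the hat map."""
    return np.array([M[2, 1], M[0, 2], M[1, 0]])

def fit_mean_cov(samples, iters=10):
    """Fit a concentrated distribution on SO(3): a group mean mu and the
    covariance of the exponential coordinates xi_i = log(mu^{-1} X_i)."""
    mu = samples[0]
    for _ in range(iters):
        xis = np.array([vee(np.real(logm(mu.T @ X))) for X in samples])
        mu = mu @ expm(hat(xis.mean(axis=0)))  # shift mu until E[xi] = 0
    xis = np.array([vee(np.real(logm(mu.T @ X))) for X in samples])
    return mu, np.cov(xis.T)
```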
This paper investigates the spectral norm version of the column subset selection problem. Given a matrix $\mathbf{A}\in\mathbb{R}^{n\times d}$ and a positive integer $k\leq\text{rank}(\mathbf{A})$, the objective is to select exactly $k$ columns of $\mathbf{A}$ that minimize the spectral norm of the residual matrix after projecting $\mathbf{A}$ onto the space spanned by the selected columns. We use the method of interlacing polynomials introduced by Marcus, Spielman and Srivastava to derive a new upper bound on the minimal approximation error. This new bound is asymptotically sharp when the spectrum of $\mathbf{A}$ obeys a power-law decay. The relevant expected characteristic polynomials can be written as an extension of the expected polynomial for the restricted invertibility problem, incorporating two extra variable substitution operators. Finally, we propose a deterministic polynomial-time algorithm that achieves this error bound up to a computational error.
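For concreteness, the objective can be stated in a few lines of Python; the brute-force minimizer below is exponential in $d$ and serves only to define the quantity that the paper's deterministic polynomial-time algorithm approximates.

```python
import numpy as np
from itertools import combinations

def residual_spectral_norm(A, cols):
    """Spectral norm of A after projecting out the span of the chosen columns."""
    Q, _ = np.linalg.qr(A[:, list(cols)])     # orthonormal basis of the span
    return np.linalg.norm(A - Q @ (Q.T @ A), 2)  # largest singular value

def best_subset_bruteforce(A, k):
    """Exact minimizer by enumeration -- feasible only for small d."""
    return min(combinations(range(A.shape[1]), k),
               key=lambda S: residual_spectral_norm(A, S))
```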
Studying the response of a climate system to perturbations has practical significance. Standard methods that compute the trajectory-wise deviation caused by perturbations may suffer from the chaotic nature of the system, which makes the model error dominate the true response after a short lead time. The statistical response, which computes the change described by the statistics, provides a systematic way of reaching robust conclusions with an appropriate quantification of the uncertainty and extreme events. In this paper, information theory is applied to compute the statistical response and to find the most sensitive perturbation direction of different El Ni\~no-Southern Oscillation (ENSO) events to initial value and model parameter perturbations. Depending on the initial phase and the time horizon, different state variables contribute to the most sensitive perturbation direction. While initial perturbations in sea surface temperature (SST) and thermocline depth usually lead to the most significant response of SST at short and long range, respectively, an initial adjustment of the zonal advection can be crucial for triggering strong statistical responses at medium range, around 5 to 7 months, especially at the transient phases between El Ni\~no and La Ni\~na. It is also shown that the response in the variance triggered by external random forcing perturbations, such as wind bursts, often dominates the mean response, making the resulting most sensitive direction very different from that of trajectory-wise methods. Finally, despite the strongly non-Gaussian climatological distributions, using Gaussian approximations in the information-theoretic framework is efficient and accurate for computing the statistical response, allowing the method to be applied to sophisticated operational systems.
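Under the Gaussian approximation mentioned above, the "signal" part of the information-theoretic response to a small mean shift $\delta\mu = L\delta\theta$ is the quadratic form $\frac{1}{2}\delta\theta^\top L^\top \Sigma^{-1} L\,\delta\theta$, so the most sensitive unit perturbation direction is the leading eigenvector of that form. The Python sketch below is a minimal illustration under this assumption, with $L$ (a linearized response operator) and $\Sigma$ (the climatological covariance) taken as given inputs; it omits the dispersion part of the relative entropy.

```python
import numpy as np

def most_sensitive_direction(L, Sigma):
    """Unit perturbation delta maximizing the Gaussian 'signal' response
    0.5 * dmu^T Sigma^{-1} dmu with the linearized mean shift dmu = L @ delta.
    Returns the leading eigenvector of L^T Sigma^{-1} L and its information gain."""
    Q = L.T @ np.linalg.solve(Sigma, L)   # symmetric PSD quadratic form
    vals, vecs = np.linalg.eigh(Q)        # ascending eigenvalues
    return vecs[:, -1], 0.5 * vals[-1]
```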
The Kaczmarz algorithm is an iterative method for solving linear systems of equations. It stands out among iterative algorithms when dealing with large systems for two reasons. First, at each iteration, the Kaczmarz algorithm uses a single equation, resulting in minimal computational work per iteration. Second, solving the entire system may require only a small subset of the equations. These characteristics have attracted significant attention to the Kaczmarz algorithm. Researchers have observed that randomly choosing equations can improve its convergence rate, an insight that led to the Randomized Kaczmarz algorithm and, subsequently, to several other variations. In this paper, we extensively analyze the original Kaczmarz algorithm and many of its variations using large-scale dense random systems as benchmarks. Through our investigation, we verify that, for consistent systems, various row sampling schemes can outperform both the original and the Randomized Kaczmarz method; in particular, sampling without replacement and using quasirandom numbers are the fastest techniques. For inconsistent systems, however, the Conjugate Gradient method for Least-Squares problems outperforms all variations of the Kaczmarz method.
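For reference, a minimal Python implementation of the Randomized Kaczmarz method, with the standard row-sampling probabilities proportional to squared row norms as in Strohmer and Vershynin, looks as follows; it is a generic sketch, not the benchmark code of the paper.

```python
import numpy as np

def randomized_kaczmarz(A, b, iters=10_000, seed=0):
    """Randomized Kaczmarz: at each step, orthogonally project the iterate
    onto the hyperplane of one equation a_i^T x = b_i, picking row i with
    probability proportional to ||a_i||^2."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    row_norms = np.einsum("ij,ij->i", A, A)   # squared row norms
    probs = row_norms / row_norms.sum()
    x = np.zeros(n)
    for _ in range(iters):
        i = rng.choice(m, p=probs)
        x += (b[i] - A[i] @ x) / row_norms[i] * A[i]
    return x
```

Each update touches a single row of `A`, which is exactly the low per-iteration cost highlighted above.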
Large Language Models (LLMs) have shown excellent generalization capabilities that have led to the development of numerous models. These models propose new architectures, tweak existing architectures with refined training strategies, increase context length, use higher-quality training data, and increase training time to outperform baselines. Analyzing new developments is crucial for identifying changes that enhance training stability and improve generalization in LLMs. This survey paper comprehensively analyses LLM architectures and their categorization, training strategies, training datasets, and performance evaluations, and it discusses future research directions. Moreover, the paper covers the basic building blocks and concepts behind LLMs, followed by a complete overview of notable LLMs, including their important features and functions. Finally, the paper summarizes significant findings from LLM research and consolidates essential architectural and training strategies for developing advanced LLMs. Given the continuous advancements in LLMs, we intend to regularly update this paper by incorporating new sections and featuring the latest LLM models.