While conformal predictors reap the benefits of rigorous statistical guarantees on their error frequency, the size of their corresponding prediction sets is critical to their practical utility. Unfortunately, there is currently a lack of finite-sample analysis and guarantees for their prediction set sizes. To address this shortfall, we theoretically quantify the expected size of the prediction sets under the split conformal prediction framework. As this precise formulation cannot usually be calculated directly, we further derive point estimates and high-probability interval bounds that can be empirically computed, providing a practical method for characterizing the expected set size. We corroborate the efficacy of our results with experiments on real-world datasets for both regression and classification problems.
To analyze the worst-case running time of branching algorithms, the majority of work in exponential time algorithms focuses on designing complicated branching rules over developing better analysis methods for simple algorithms. In the mid-$2000$s, Fomin et al. [2005] introduced measure & conquer, an advanced general analysis method, sparking widespread adoption for obtaining tighter worst-case running time upper bounds for many fundamental NP-complete problems. Yet, much potential in this direction remains untapped, as most subsequent work applied it without further advancement. Motivated by this, we present piecewise analysis, a new general method that analyzes the running time of branching algorithms. Our approach is to define a similarity ratio that divides instances into groups and then analyze the running time within each group separately. The similarity ratio is a scale between two parameters of an instance I. Instead of relying on a single measure and a single analysis for the whole instance space, our method allows to take advantage of different intrinsic properties of instances with different similarity ratios. To showcase its potential, we reanalyze two $17$-year-old algorithms from Fomin et al. [2007] that solve $4$-Coloring and #$3$-Coloring respectively. The original analysis in their paper gave running times of $O(1.7272^n)$ and $O(1.6262^n)$ respectively for these algorithms, our analysis improves these running times to $O(1.7215^n)$ and $O(1.6232^n)$. Among the two improvements, our new running time $O(1.7215^n)$ is the first improvement in the best known running time for the 4-Coloring problem since 2007.
This paper is devoted to find the numerical solutions of one dimensional general nonlinear system of third-order boundary value problems (BVPs) for the pair of functions using Galerkin weighted residual method. We derive mathematical formulations in matrix form, in details, by exploiting Bernstein polynomials as basis functions. A reasonable accuracy is found when the proposed method is used on few examples. At the end of the study, a comparison is made between the approximate and exact solutions, and also with the solutions of the existing methods. Our results converge monotonically to the exact solutions. In addition, we show that the the derived formulations may be applicable by reducing higher order complicated BVP into a lower order system of BVPs, and the performance of the numerical solutions is satisfactory.
We characterize the geometry and topology of the set of all weight vectors for which a linear neural network computes the same linear transformation $W$. This set of weight vectors is called the fiber of $W$ (under the matrix multiplication map), and it is embedded in the Euclidean weight space of all possible weight vectors. The fiber is an algebraic variety that is not necessarily a manifold. We describe a natural way to stratify the fiber--that is, to partition the algebraic variety into a finite set of manifolds of varying dimensions called strata. We call this set of strata the rank stratification. We derive the dimensions of these strata and the relationships by which they adjoin each other. Although the strata are disjoint, their closures are not. Our strata satisfy the frontier condition: if a stratum intersects the closure of another stratum, then the former stratum is a subset of the closure of the latter stratum. Each stratum is a manifold of class $C^\infty$ embedded in weight space, so it has a well-defined tangent space and normal space at every point (weight vector). We show how to determine the subspaces tangent to and normal to a specified stratum at a specified point on the stratum, and we construct elegant bases for those subspaces. To help achieve these goals, we first derive what we call a Fundamental Theorem of Linear Neural Networks, analogous to what Strang calls the Fundamental Theorem of Linear Algebra. We show how to decompose each layer of a linear neural network into a set of subspaces that show how information flows through the neural network. Each stratum of the fiber represents a different pattern by which information flows (or fails to flow) through the neural network. The topology of a stratum depends solely on this decomposition. So does its geometry, up to a linear transformation in weight space.
The LCP array is an important tool in stringology, allowing to speed up pattern matching algorithms and enabling compact representations of the suffix tree. Recently, Conte et al. [DCC 2023] and Cotumaccio et al. [SPIRE 2023] extended the definition of this array to Wheeler DFAs and, ultimately, to arbitrary labeled graphs, proving that it can be used to efficiently solve matching statistics queries on the graph's paths. In this paper, we provide the first efficient algorithm building the LCP array of a directed labeled graph with $n$ nodes and $m$ edges labeled over an alphabet of size $\sigma$. After arguing that the natural generalization of a compact-space LCP-construction algorithm by Beller et al. [J. Discrete Algorithms 2013] runs in time $\Omega(n\sigma)$, we present a new algorithm based on dynamic range stabbing building the LCP array in $O(n\log \sigma)$ time and $O(n\log\sigma)$ bits of working space.
Spatially correlated data with an excess of zeros, usually referred to as zero-inflated spatial data, arise in many disciplines. Examples include count data, for instance, abundance (or lack thereof) of animal species and disease counts, as well as semi-continuous data like observed precipitation. Spatial two-part models are a flexible class of models for such data. Fitting two-part models can be computationally expensive for large data due to high-dimensional dependent latent variables, costly matrix operations, and slow mixing Markov chains. We describe a flexible, computationally efficient approach for modeling large zero-inflated spatial data using the projection-based intrinsic conditional autoregression (PICAR) framework. We study our approach, which we call PICAR-Z, through extensive simulation studies and two environmental data sets. Our results suggest that PICAR-Z provides accurate predictions while remaining computationally efficient. An important goal of our work is to allow researchers who are not experts in computation to easily build computationally efficient extensions to zero-inflated spatial models; this also allows for a more thorough exploration of modeling choices in two-part models than was previously possible. We show that PICAR-Z is easy to implement and extend in popular probabilistic programming languages such as nimble and stan.
We consider an expected-value ranking and selection (R&S) problem where all k solutions' simulation outputs depend on a common parameter whose uncertainty can be modeled by a distribution. We define the most probable best (MPB) to be the solution that has the largest probability of being optimal with respect to the distribution and design an efficient sequential sampling algorithm to learn the MPB when the parameter has a finite support. We derive the large deviations rate of the probability of falsely selecting the MPB and formulate an optimal computing budget allocation problem to find the rate-maximizing static sampling ratios. The problem is then relaxed to obtain a set of optimality conditions that are interpretable and computationally efficient to verify. We devise a series of algorithms that replace the unknown means in the optimality conditions with their estimates and prove the algorithms' sampling ratios achieve the conditions as the simulation budget increases. Furthermore, we show that the empirical performances of the algorithms can be significantly improved by adopting the kernel ridge regression for mean estimation while achieving the same asymptotic convergence results. The algorithms are benchmarked against a state-of-the-art contextual R&S algorithm and demonstrated to have superior empirical performances.
With the robust uptick in the applications of Bayesian external data borrowing, eliciting a prior distribution with the proper amount of information becomes increasingly critical. The prior effective sample size (ESS) is an intuitive and efficient measure for this purpose. The majority of ESS definitions have been proposed in the context of borrowing control information. While many Bayesian models can be naturally extended to leveraging external information on the treatment effect scale, very little attention has been directed to computing the prior ESS in this setting. In this research, we bridge this methodological gap by extending the popular ELIR ESS definition. We lay out the general framework, and derive the prior ESS for various types of endpoints and treatment effect measures. The posterior distribution and the predictive consistency property of ESS are also examined. The methods are implemented in R programs available on GitHub: //github.com/squallteo/TrtEffESS.
We develop a thermodynamic theory for machine learning (ML) systems. Similar to physical thermodynamic systems which are characterized by energy and entropy, ML systems possess these characteristics as well. This comparison inspire us to integrate the concept of temperature into ML systems grounded in the fundamental principles of thermodynamics, and establish a basic thermodynamic framework for machine learning systems with non-Boltzmann distributions. We introduce the concept of states within a ML system, identify two typical types of state, and interpret model training and refresh as a process of state phase transition. We consider that the initial potential energy of a ML system is described by the model's loss functions, and the energy adheres to the principle of minimum potential energy. For a variety of energy forms and parameter initialization methods, we derive the temperature of systems during the phase transition both analytically and asymptotically, highlighting temperature as a vital indicator of system data distribution and ML training complexity. Moreover, we perceive deep neural networks as complex heat engines with both global temperature and local temperatures in each layer. The concept of work efficiency is introduced within neural networks, which mainly depends on the neural activation functions. We then classify neural networks based on their work efficiency, and describe neural networks as two types of heat engines.
We investigate the computational power of particle methods, a well-established class of algorithms with applications in scientific computing and computer simulation. The computational power of a compute model determines the class of problems it can solve. Automata theory allows describing the computational power of abstract machines (automata) and the problems they can solve. At the top of the Chomsky hierarchy of formal languages and grammars are Turing machines, which resemble the concept on which most modern computers are built. Although particle methods can be interpreted as automata based on their formal definition, their computational power has so far not been studied. We address this by analyzing Turing completeness of particle methods. In particular, we prove two sets of restrictions under which a particle method is still Turing powerful, and we show when it loses Turing powerfulness. This contributes to understanding the theoretical foundations of particle methods and provides insight into the powerfulness of computer simulations.
We propose a theoretical framework to compute, rapidly and accurately, the signal-to-noise ratio at the output of spatial-division multiplexing (SDM) linear MIMO equalizers with arbitrary numbers of spatial modes and filter taps and demonstrate three orders of magnitude of speed-up compared to Monte Carlo simulations.