Regression analysis based on many covariates is becoming increasingly common. However, when the number of covariates $p$ is of the same order as the number of observations $n$, statistical protocols like maximum likelihood estimation of regression and nuisance parameters become unreliable due to overfitting. Overfitting typically leads to systematic estimation biases and to increased estimator variances. It is crucial to quantify these effects correctly, for both inference and prediction purposes. Several methods have been proposed in the literature to overcome overfitting bias or to adjust estimates. The vast majority of these focus on the regression parameters only, either via empirical regularization methods or by expansion for small ratios $p/n$. Failing to also correct the nuisance parameter estimates may lead to significant errors in outcome predictions. In this paper we use the leave-one-out method to derive the compact set of non-linear equations for the overfitting biases of maximum likelihood (ML) estimators in parametric regression models that was obtained previously using the replica method. We show that these equations enable one to correct regression and nuisance parameter estimators and make them asymptotically unbiased. To illustrate the theory we perform simulation studies for multiple regression models. In all cases we find excellent agreement between theory and simulations.
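The overfitting bias discussed above is easy to reproduce numerically. The following sketch (my own illustration, not the paper's correction equations; the Gaussian design, the signal strength $\gamma^2 = 5$ and the ratio $p/n = 0.2$ are choices made for this example) fits a plain, uncorrected ML logistic regression in the proportional regime and reports how strongly the estimated coefficients are inflated relative to the truth.
\begin{verbatim}
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n, p = 4000, 800                  # p/n = 0.2: classical asymptotics no longer apply
beta_true = rng.normal(0.0, np.sqrt(5.0 / p), size=p)  # signal strength gamma^2 = 5
X = rng.standard_normal((n, p))
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ beta_true)))

def nll_and_grad(beta):           # logistic negative log-likelihood and its gradient
    eta = X @ beta
    return (np.logaddexp(0.0, eta).sum() - y @ eta,
            X.T @ (1.0 / (1.0 + np.exp(-eta)) - y))

beta_hat = minimize(nll_and_grad, np.zeros(p), jac=True, method="L-BFGS-B").x

# Least-squares slope of beta_hat on beta_true: roughly 1 for an unbiased estimator,
# clearly above 1 here because plain ML overfits when p/n is not small.
print(f"inflation factor: {beta_hat @ beta_true / (beta_true @ beta_true):.2f}")
\end{verbatim}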
In the envy-free perfect matching problem, $n$ items with unit supply are available to be sold to $n$ buyers with unit demand. The objective is to find an allocation and prices such that both the seller's revenue and the buyers' surpluses are maximized, given the buyers' valuations for the items, and all items must be sold. Previous work has shown that this problem can be solved in cubic time, using maximum weight perfect matchings to find optimal envy-free allocations and shortest paths to find optimal envy-free prices. In this work, I consider the case where buyers have fixed budgets and items have quality measures, so that each valuation is the product of these two quantities. Under this approach, I prove that the valuation matrix has the inverse Monge property, thus simplifying the search for optimal envy-free allocations and, consequently, for optimal envy-free prices, through a strategy based on dynamic programming. As a result, I propose an algorithm that finds optimal solutions in quadratic time.
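A minimal sketch of the structure that product valuations induce (my own illustration, not the dynamic program proposed above): by the rearrangement inequality, a maximum weight perfect matching pairs the sorted budgets with the sorted qualities, and for that assortative matching the revenue-maximal envy-free prices follow from the tight adjacent no-envy constraints, with the lowest-budget buyer left with zero surplus.
\begin{verbatim}
def envy_free_product_valuations(budgets, qualities):
    # Buyers and items are re-indexed in decreasing order of budget and quality,
    # so buyer i is assigned the i-th best item (assortative matching).
    b = sorted(budgets, reverse=True)
    q = sorted(qualities, reverse=True)
    n = len(b)
    prices = [0.0] * n
    prices[n - 1] = b[n - 1] * q[n - 1]        # lowest-budget buyer gets zero surplus
    for i in range(n - 2, -1, -1):
        # tight adjacent no-envy constraint: b[i]*q[i] - p[i] = b[i]*q[i+1] - p[i+1]
        prices[i] = prices[i + 1] + b[i] * (q[i] - q[i + 1])
    return list(range(n)), prices              # (allocation, prices) in sorted order

alloc, prices = envy_free_product_valuations([5.0, 3.0, 2.0], [4.0, 2.0, 1.0])
print(prices)   # [15.0, 5.0, 2.0] -> revenue 22, and no buyer envies another item
\end{verbatim}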
We present a method for finding envy-free prices in a combinatorial auction where the number of consumers $n$ equals the number of distinct items for sale, each consumer can buy a single item and each item has only one unit available. This is a particular case of the {\it unit-demand envy-free pricing problem}, and was recently revisited by Arbib et al. (2019). These authors proved that, using a Fibonacci heap to solve the maximum weight perfect matching and the Bellman-Ford algorithm to obtain the envy-free prices, the problem can be solved with overall time complexity $O(n^3)$. We propose a method based on a dynamic programming design strategy that seeks the optimal envy-free prices by increasing the consumers' utilities. It has the same cubic time complexity as the aforementioned approach, but theoretical and empirical results indicate that our method runs faster than the shortest-paths strategy, reducing the average time to determine optimal envy-free prices by approximately 48\%.
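For context, the shortest-paths baseline referred to above can be sketched as follows (my reading of the standard difference-constraint formulation, not the authors' implementation). Given valuations $v_{ij}$ and a perfect matching $\mu$, the maximum envy-free prices satisfy $p_{\mu(i)} \le v_{i\mu(i)}$ (individual rationality) and $p_{\mu(i)} - p_k \le v_{i\mu(i)} - v_{ik}$ (buyer $i$ does not envy item $k$), so they are shortest-path distances from a virtual source; Bellman-Ford is used because the edge weights can be negative.
\begin{verbatim}
def max_envy_free_prices(v, mu):
    n = len(v)
    # Edge (u, w, weight) encodes the price constraint p[w] <= p[u] + weight;
    # node n is the virtual source with price 0.
    edges = [(n, mu[i], v[i][mu[i]]) for i in range(n)]
    edges += [(k, mu[i], v[i][mu[i]] - v[i][k])
              for i in range(n) for k in range(n) if k != mu[i]]
    dist = [float("inf")] * n + [0.0]
    for _ in range(n):                      # Bellman-Ford relaxation rounds
        for u, w, weight in edges:
            if dist[u] + weight < dist[w]:
                dist[w] = dist[u] + weight
    return dist[:n]

v = [[20.0, 10.0, 5.0], [12.0, 6.0, 3.0], [8.0, 4.0, 2.0]]
print(max_envy_free_prices(v, mu=[0, 1, 2]))   # [15.0, 5.0, 2.0]
\end{verbatim}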
In many scientific applications the aim is to infer a function which is smooth in some areas, but rough or even discontinuous in other areas of its domain. Such spatially inhomogeneous functions can be modelled in Besov spaces with suitable integrability parameters. In this work we study adaptive Bayesian inference over Besov spaces in the white noise model, from the point of view of rates of contraction, using $p$-exponential priors, which range between Laplace and Gaussian and possess regularity and scaling hyper-parameters. To achieve adaptation, we employ empirical and hierarchical Bayes approaches for tuning these hyper-parameters. Our results show that, while it is known that Gaussian priors can attain the minimax rate only in Besov spaces of spatially homogeneous functions, Laplace priors attain the minimax or a nearly minimax rate both in Besov spaces of spatially homogeneous functions and in Besov spaces permitting spatial inhomogeneities.
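For orientation (the notation here is mine and may differ from the paper's), the white noise model in sequence form and a typical $p$-exponential prior specification read
$$ y_k = f_k + \tfrac{1}{\sqrt{n}}\, z_k, \quad z_k \stackrel{iid}{\sim} N(0,1), \qquad f_k = \tau\, k^{-1/2-\alpha}\, \xi_k, \quad \xi_k \ \text{i.i.d. with density} \propto e^{-|x|^p/p}, $$
where $1 \le p \le 2$ interpolates between Laplace ($p=1$) and Gaussian ($p=2$) tails, and $\alpha>0$ and $\tau>0$ are the regularity and scaling hyper-parameters that the empirical and hierarchical Bayes procedures tune.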
We introduce a new stochastic algorithm to locate the index-1 saddle points of a function $V:\mathbb R^d \to \mathbb R$, with $d$ possibly large. This algorithm can be seen as an analogue of stochastic gradient descent, which is a natural stochastic process for locating local minima. It relies on two ingredients: (i) the concentration properties on index-1 saddle points of the first eigenmodes of the Witten Laplacian (associated with $V$) on $1$-forms, and (ii) a probabilistic representation of a partial differential equation involving this differential operator. Numerical examples on simple molecular systems illustrate the efficacy of the proposed approach.
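For reference, in one common semiclassical convention (the paper may use a different normalization or temperature parameter $h$), the Witten Laplacians associated with $V$ on functions and on $1$-forms are
$$ \Delta^{(0)}_{V,h} = -h^2\Delta + |\nabla V|^2 - h\,\Delta V, \qquad \Delta^{(1)}_{V,h} = \Delta^{(0)}_{V,h}\otimes \mathrm{Id} + 2h\,\mathrm{Hess}\, V, $$
and the low-lying eigenmodes of $\Delta^{(1)}_{V,h}$ concentrate, in the small-$h$ regime, on the index-1 saddle points of $V$, which is the concentration property referred to in ingredient (i).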
Many economic panel and dynamic models, such as models of rational behavior and Euler equations, imply that the parameters of interest are identified by conditional moment restrictions with high-dimensional conditioning instruments. We develop a novel inference method for parameters identified by conditional moment restrictions where the dimension of the conditioning instruments is high and there is no prior information about which conditioning instruments are weak or irrelevant. Building on Bierens (1990), we propose penalized maximum statistics and combine bootstrap inference with model selection. Our method optimizes the asymptotic power against a set of $n^{-1/2}$-local alternatives of interest by solving a data-dependent max-min problem for tuning parameter selection. We demonstrate the efficacy of our method with two empirical examples: the elasticity of intertemporal substitution and rational unbiased reporting of ability status. Extensive Monte Carlo experiments based on the first empirical example show that our inference procedure is superior to those available in the literature in realistic settings.
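In the notation used here (which may differ from the paper's), the setting is a conditional moment restriction
$$ \mathbb{E}\big[\rho(Z,\theta_0)\mid X\big]=0 \ \ \text{a.s.}, $$
which, following Bierens (1990), is equivalent to the continuum of unconditional restrictions $\mathbb{E}\big[\rho(Z,\theta_0)\,w(X,\xi)\big]=0$ for almost all $\xi$, with a weight such as $w(X,\xi)=\exp\big(\xi'\Phi(X)\big)$ for a bounded one-to-one transformation $\Phi$; a max-type statistic is then built from the supremum over $\xi$ of the sample analogues $n^{-1/2}\sum_{i=1}^{n}\rho(Z_i,\theta)\,w(X_i,\xi)$.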
The solution set of a system of polynomial equations typically contains ill-behaved, singular points. Resolution is a fundamental process in geometry in which we replace singular points with smooth points while keeping the rest of the solution set unchanged. Resolutions are not unique: the usual way to describe them involves repeatedly performing a fundamental operation known as "blowing up", and the complexity of the resolution depends heavily on certain choices. The process can be translated into various versions of a 2-player game, the so-called Hironaka game, and a winning strategy for the first player provides a solution to the resolution problem. In this paper we introduce a new approach to the Hironaka game that uses reinforcement learning agents to find optimal resolutions of singularities. In certain domains, the trained model outperforms state-of-the-art selection heuristics in the total number of polynomial additions performed, which provides a proof of concept that recent developments in machine learning have the potential to improve the performance of algorithms in symbolic computation.
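A standard textbook illustration of a single blow-up (not taken from the paper): the cuspidal curve $y^2 = x^3$ is singular at the origin, and in the chart of the blow-up of the plane where $y = x y_1$ one has
$$ y^2 - x^3 \;=\; x^2\,\big(y_1^2 - x\big), $$
so the strict transform is the smooth curve $y_1^2 = x$ and the exceptional divisor is $\{x=0\}$; one blow-up resolves the cusp, whereas more complicated singularities require a sequence of blow-ups whose choices the Hironaka game, and here the reinforcement learning agent, has to make.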
We provide the first convergence guarantees for Consistency Models (CMs), a newly emerging type of one-step generative models that can generate samples comparable to those generated by Diffusion Models. Our main result is that, under basic assumptions on the score-matching errors, the consistency errors and the smoothness of the data distribution, CMs can efficiently sample from any realistic data distribution in one step with small $W_2$ error. Our results (1) hold for $L^2$-accurate score and consistency assumptions (rather than $L^\infty$-accurate ones); (2) do not require strong assumptions on the data distribution such as a log-Sobolev inequality; (3) scale polynomially in all parameters; and (4) match the state-of-the-art convergence guarantees for score-based generative models (SGMs). We also show that the Multistep Consistency Sampling procedure can further reduce the error compared to one-step sampling, which supports the original claim in "Consistency Models" (Song et al., 2023). Our results further imply a total variation (TV) error guarantee when Langevin-based modifications are applied to the output distributions.
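For concreteness, here is a sketch of the Multistep Consistency Sampling procedure as described in the original Consistency Models paper, paraphrased from memory; consistency_fn stands for a trained consistency function $f_\theta(x,t)$ and is an assumption of this sketch rather than the API of any particular library.
\begin{verbatim}
import numpy as np

def multistep_consistency_sampling(consistency_fn, shape, T, taus, eps, rng):
    """taus: decreasing intermediate noise levels T > tau_1 > ... > eps."""
    x = T * rng.standard_normal(shape)             # start from pure noise at level T
    x = consistency_fn(x, T)                       # this alone is one-step sampling
    for tau in taus:
        z = rng.standard_normal(shape)
        x_hat = x + np.sqrt(tau**2 - eps**2) * z   # re-noise down to level tau
        x = consistency_fn(x_hat, tau)             # map back towards the data
    return x

# Dummy call with a stand-in consistency function, only to show the interface:
samples = multistep_consistency_sampling(lambda x, t: x / (1.0 + t), shape=(4, 2),
                                         T=80.0, taus=[10.0, 1.0], eps=0.01,
                                         rng=np.random.default_rng(0))
\end{verbatim}
Each extra step re-noises to a lower level and maps back, which is the mechanism by which the multistep procedure can reduce the error relative to one-step sampling.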
Humans effortlessly infer the 3D shape of objects. What computations underlie this ability? Although various computational models have been proposed, none of them capture the human ability to match object shape across viewpoints. Here, we ask whether and how this gap might be closed. We begin with a relatively novel class of computational models, 3D neural fields, which encapsulate the basic principles of classic analysis-by-synthesis in a deep neural network (DNN). First, we find that a 3D Light Field Network (3D-LFN) supports 3D matching judgments that are well aligned with humans for within-category comparisons, adversarially defined comparisons that accentuate the 3D failure cases of standard DNN models, and adversarially defined comparisons for algorithmically generated shapes with no category structure. We then investigate the source of the 3D-LFN's ability to achieve human-aligned performance through a series of computational experiments. Exposure to multiple viewpoints of objects during training and a multi-view learning objective are the primary factors behind model-human alignment; even conventional DNN architectures come much closer to human behavior when trained with multi-view objectives. Finally, we find that while models trained with multi-view learning objectives are able to partially generalize to new object categories, they fall short of human alignment. This work provides a foundation for understanding human shape inferences within neurally mappable computational architectures and highlights important questions for future work.
We propose a new way to assess certain short constructed responses to mathematics items. Our approach uses a pipeline that identifies the key values specified by the student in their response. This allows us to determine the correctness of the response, as well as to identify any misconceptions. The information from the value identification pipeline can then be used to provide feedback to the teacher and student. The value identification pipeline consists of two fine-tuned language models. The first model determines whether a value is implicit in the student response. The second model identifies where in the response the key value is specified. We consider both a generic model that can be used for any prompt and value, and models that are specific to each prompt and value. The value identification pipeline is a more accurate and informative way to assess short constructed responses than traditional rubric-based scoring. It can be used to provide more targeted feedback to students, which can help them improve their understanding of mathematics.
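A schematic of such a two-stage pipeline (my own sketch: the checkpoint arguments are placeholders for the fine-tuned models, which are not specified here, and the input formatting is invented for illustration). The first stage classifies whether the key value is expressed in the response, and the second stage extracts the span where it is specified.
\begin{verbatim}
from transformers import pipeline

def build_value_identifier(presence_ckpt, span_ckpt):
    # presence_ckpt / span_ckpt: paths to fine-tuned checkpoints (placeholders)
    presence_clf = pipeline("text-classification", model=presence_ckpt)
    span_finder = pipeline("question-answering", model=span_ckpt)

    def identify(response: str, key_value: str):
        # Stage 1: is the key value expressed (possibly implicitly) in the response?
        verdict = presence_clf(f"value: {key_value} [SEP] response: {response}")[0]
        if verdict["label"] != "PRESENT":     # the label set depends on the fine-tune
            return {"present": False}
        # Stage 2: locate where in the response the value is specified.
        span = span_finder(question=f"Where is the value {key_value} specified?",
                           context=response)
        return {"present": True, "text": span["answer"],
                "start": span["start"], "end": span["end"]}

    return identify
\end{verbatim}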
Although measuring held-out accuracy has been the primary approach to evaluate generalization, it often overestimates the performance of NLP models, while alternative approaches for evaluating models either focus on individual tasks or on specific behaviors. Inspired by principles of behavioral testing in software engineering, we introduce CheckList, a task-agnostic methodology for testing NLP models. CheckList includes a matrix of general linguistic capabilities and test types that facilitate comprehensive test ideation, as well as a software tool to generate a large and diverse number of test cases quickly. We illustrate the utility of CheckList with tests for three tasks, identifying critical failures in both commercial and state-of-the-art models. In a user study, a team responsible for a commercial sentiment analysis model found new and actionable bugs in an extensively tested model. In another user study, NLP practitioners with CheckList created twice as many tests and found almost three times as many bugs as users without it.
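A minimal, dependency-free sketch of a CheckList-style Minimum Functionality Test (MFT), one of the test types in the capability-by-test-type matrix: generate many cases from a template, run the model, and report the failure rate. The real CheckList tooling provides far richer templating and reporting; predict_sentiment is a stand-in for whatever model is under test, and the naive keyword model at the end exists only to make the report visible.
\begin{verbatim}
import itertools

def negation_mft(predict_sentiment):
    negations = ["don't", "do not", "never"]
    things = ["this movie", "the food", "that airline"]
    cases = [f"I {neg} like {thing}."
             for neg, thing in itertools.product(negations, things)]
    expected = "negative"                  # every templated case should be negative
    failures = [c for c in cases if predict_sentiment(c) != expected]
    rate = len(failures) / len(cases)
    print(f"MFT 'simple negation': {len(failures)}/{len(cases)} failures ({rate:.0%})")
    for c in failures[:3]:
        print("  failed case:", c)

# A deliberately naive keyword model that ignores negation fails every case:
negation_mft(lambda text: "positive" if "like" in text else "negative")
\end{verbatim}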