Given data on the choices consumers make across different assortments, a key challenge is to develop parsimonious models that describe and predict consumer choice behavior. One such choice model is the marginal distribution model, which requires only the specification of the marginal distributions of the random utilities of the alternatives to explain choice data. In this paper, we develop an exact characterization of the set of choice probabilities that can be represented by this model and show that verifying the consistency of choice probability data with this model is equivalent to solving a polynomial-size linear program. We extend these results to the case where alternatives are grouped based on the marginal distribution of their utilities. Building on these representability conditions, we find the best fit to the choice data, which reduces to solving a mixed-integer convex program, and we develop novel prediction intervals for the choice probabilities of unseen assortments. Our numerical results show that the marginal distribution model provides much better representational power, estimation performance, and prediction accuracy than multinomial logit, and much better computational performance than the random utility model.
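As a rough illustration of what such an LP-based consistency check can look like, the sketch below encodes one plausible constraint structure: one threshold variable per assortment, ordered according to the observed choice probabilities. The toy data, the constraint form, and all names are assumptions for exposition, not the paper's exact characterization.

```python
# Hypothetical illustration of an LP feasibility check for choice-data
# consistency. Assumed structure (not the paper's exact condition): one
# threshold lambda_S per assortment S, with lambda_S < lambda_T whenever
# some shared alternative i is chosen more often from S than from T.
from itertools import combinations
from scipy.optimize import linprog

# choice data: p[S][i] = observed probability of choosing i from assortment S
p = {
    frozenset({1, 2}): {1: 0.6, 2: 0.4},
    frozenset({1, 3}): {1: 0.5, 3: 0.5},
    frozenset({1, 2, 3}): {1: 0.45, 2: 0.3, 3: 0.25},
}
assortments = list(p)
idx = {S: k for k, S in enumerate(assortments)}
n = len(assortments)

# Strict inequalities lambda_S < lambda_T are modelled as
# lambda_S - lambda_T <= -eps with a small slack eps.
# Ties p_i(S) == p_i(T) would add equality constraints lambda_S == lambda_T
# (omitted: none occur in this toy data).
eps = 1e-6
A_ub, b_ub = [], []
for S, T in combinations(assortments, 2):
    for i in S & T:
        if p[S][i] > p[T][i]:  # i chosen more often from S => lower threshold
            row = [0.0] * n
            row[idx[S]], row[idx[T]] = 1.0, -1.0
            A_ub.append(row); b_ub.append(-eps)
        elif p[S][i] < p[T][i]:
            row = [0.0] * n
            row[idx[T]], row[idx[S]] = 1.0, -1.0
            A_ub.append(row); b_ub.append(-eps)

res = linprog(c=[0.0] * n, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * n)
print("consistent with the illustrative LP:", res.success)
```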
Hypernetworks, neural networks that predict the parameters of another neural network, are powerful models that have been successfully used in diverse applications from image generation to multi-task learning. Unfortunately, existing hypernetworks are often challenging to train. Training typically converges far more slowly than for non-hypernetwork models, and the rate of convergence can be very sensitive to hyperparameter choices. In this work, we identify a fundamental and previously unidentified problem that contributes to the challenge of training hypernetworks: a magnitude proportionality between the inputs and outputs of the hypernetwork. We demonstrate both analytically and empirically that this can lead to unstable optimization, thereby slowing down convergence, and sometimes even preventing any learning. We present a simple solution to this problem using a revised hypernetwork formulation that we call Magnitude Invariant Parametrizations (MIP). We demonstrate the proposed solution on several hypernetwork tasks, where it consistently stabilizes training and achieves faster convergence. Furthermore, we perform a comprehensive ablation study including choices of activation function, normalization strategies, input dimensionality, and hypernetwork architecture; and find that MIP improves training in all scenarios. We provide easy-to-use code that can turn existing networks into MIP-based hypernetworks.
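A minimal numpy sketch of the magnitude-proportionality issue the abstract describes, together with one magnitude-invariant reparametrization in the spirit of MIP. The toy linear hypernetwork and the `mip_like` gating below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 8))  # linear hypernetwork: params = W @ z

# Problem: the norm of the predicted parameters scales linearly with the
# norm of the hypernetwork input.
z = rng.standard_normal(8)
for scale in [0.1, 1.0, 10.0]:
    params = W @ (scale * z)
    print(scale, np.linalg.norm(params))  # grows proportionally with scale

def mip_like(W, z):
    # Illustrative magnitude-invariant variant: feed only the direction
    # z/||z|| to the hypernetwork and re-inject the magnitude through a
    # bounded gate, so predicted parameter norms stay in a controlled range.
    direction = z / (np.linalg.norm(z) + 1e-8)
    gate = np.tanh(np.log1p(np.linalg.norm(z)))  # bounded in (0, 1)
    return gate * (W @ direction)

for scale in [0.1, 1.0, 10.0]:
    print(scale, np.linalg.norm(mip_like(W, scale * z)))  # stays bounded
```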
A novel numerical strategy is introduced for computing approximations of solutions to a Cahn-Hilliard model with degenerate mobilities. This model has recently been introduced as a second-order phase-field approximation for surface diffusion flows. Its numerical discretization is challenging due to the degeneracy of the mobilities, which generally requires an implicit treatment to avoid stability issues, at the price of increased computational cost. To mitigate this drawback, we consider new first- and second-order Scalar Auxiliary Variable (SAV) schemes that, unlike existing approaches, focus on the relaxation of the mobility rather than of the Cahn-Hilliard energy. These schemes are introduced and analysed theoretically in the general context of gradient flows and then specialised for the Cahn-Hilliard equation with degenerate mobilities. Various numerical experiments are conducted to highlight the advantages of these new schemes in terms of accuracy, effectiveness and computational cost.
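For context, the classical energy-based SAV construction, which these schemes modify by relaxing the mobility instead of the energy, treats a gradient flow $\partial_t \phi = -\mathcal{G}\mu$ with energy $E(\phi) = \tfrac{1}{2}(\phi, \mathcal{L}\phi) + E_1(\phi)$ by introducing the scalar auxiliary variable $r = \sqrt{E_1(\phi) + C}$; a first-order scheme then reads

$$\frac{\phi^{n+1} - \phi^n}{\Delta t} = -\mathcal{G}\mu^{n+1}, \qquad \mu^{n+1} = \mathcal{L}\phi^{n+1} + \frac{r^{n+1}}{\sqrt{E_1(\phi^n) + C}}\,\frac{\delta E_1}{\delta \phi}(\phi^n),$$

$$r^{n+1} - r^n = \frac{1}{2\sqrt{E_1(\phi^n) + C}} \int_\Omega \frac{\delta E_1}{\delta \phi}(\phi^n)\,\bigl(\phi^{n+1} - \phi^n\bigr)\,dx,$$

which is linear in $(\phi^{n+1}, \mu^{n+1}, r^{n+1})$ and unconditionally stable for a modified energy. The mobility-based relaxation of the abstract is not reproduced here.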
In survey sampling, the collected data do not necessarily represent the target population, and the samples are often biased. However, information on the survey weights aids in eliminating this selection bias. The Horvitz-Thompson estimator is a well-known unbiased, consistent, and asymptotically normal estimator; however, it is not efficient. Thus, this study derives the semiparametric efficiency bound for various target parameters by treating the survey weight as a random variable, and then proposes a semiparametric optimal estimator based on certain working models on the survey weights. The proposed estimator is consistent, asymptotically normal, and efficient within the class of regular and asymptotically linear estimators. Further, a limited simulation study is conducted to investigate the finite-sample performance of the proposed method. The method is then applied to the 1999 Canadian Workplace and Employee Survey data.
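A minimal sketch of the Horvitz-Thompson baseline the study starts from, estimating a population mean under informative sampling; the synthetic population and inclusion probabilities are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 100_000                              # population size
y = rng.normal(50, 10, N)                # outcome in the population
# inclusion probability depends on y => the sample is selection-biased
pi = np.clip(0.001 + 0.00002 * y, 0.0001, 1.0)

sampled = rng.random(N) < pi             # Poisson sampling
y_s, pi_s = y[sampled], pi[sampled]

naive = y_s.mean()                       # biased: ignores the design
ht = np.sum(y_s / pi_s) / N              # Horvitz-Thompson: weight by 1/pi
print(f"true {y.mean():.2f}  naive {naive:.2f}  HT {ht:.2f}")
```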
We propose a causal framework for decomposing a group disparity in an outcome in terms of an intermediate treatment variable. Our framework captures the contributions of group differences in baseline potential outcome, treatment prevalence, average treatment effect, and selection into treatment. This framework is counterfactually formulated and readily informs policy interventions. The decomposition component for differential selection into treatment is particularly novel, revealing a new mechanism for explaining and ameliorating disparities. This framework reformulates the classic Kitagawa-Blinder-Oaxaca decomposition in causal terms, supplements causal mediation analysis by explaining group disparities instead of group effects, and resolves conceptual difficulties of recent random equalization decompositions. We also provide a conditional decomposition that allows researchers to incorporate covariates in defining the estimands and corresponding interventions. We develop nonparametric estimators based on efficient influence functions of the decompositions. We show that, under mild conditions, these estimators are $\sqrt{n}$-consistent, asymptotically normal, semiparametrically efficient, and doubly robust. We apply our framework to study the causal role of education in intergenerational income persistence. We find that both differential prevalence of and differential selection into college graduation significantly contribute to the disparity in income attainment between income origin groups.
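One elementary way to see where these four components come from (a hedged sketch, not necessarily the paper's exact estimands): writing $\tau = Y(1) - Y(0)$ so that $Y = Y(0) + T\tau$, for each group $g$,

$$E[Y \mid G = g] = E[Y(0) \mid G = g] + E[T \mid G = g]\,E[\tau \mid G = g] + \operatorname{Cov}(T, \tau \mid G = g),$$

so the disparity $E[Y \mid G = 1] - E[Y \mid G = 0]$ splits into a baseline potential-outcome term, a prevalence-times-effect term (which can be further split into prevalence and effect contributions), and a covariance term capturing differential selection into treatment.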
In the present work, we introduce a novel approach to enhance the precision of reduced order models by exploiting a multi-fidelity perspective and DeepONets. Reduced models provide a real-time numerical approximation by simplifying the original model. The error introduced by such a simplification is usually neglected, sacrificed for the sake of fast computation. We propose to couple model reduction with machine-learning-based residual learning, such that this error can be learned by a neural network and inferred for new predictions. We emphasize that the framework maximizes the exploitation of high-fidelity information, using it both for building the reduced order model and for learning the residual. In this work, we explore the integration of proper orthogonal decomposition (POD), and gappy POD for sensor data, with the recent DeepONet architecture. Numerical investigations for a parametric benchmark function and a nonlinear parametric Navier-Stokes problem are presented.
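An illustrative sketch of the ROM-plus-learned-residual workflow, with a small scikit-learn MLP standing in for a DeepONet; the toy problem, the nearest-snapshot reduced model, and all names are assumptions for exposition.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def high_fidelity(mu, x):
    # toy parametric "solver" standing in for an expensive simulation
    return np.sin(mu * x) + 0.1 * np.sin(10 * mu * x)

x = np.linspace(0, np.pi, 200)
mus_train = np.linspace(1.0, 3.0, 40)
snapshots = np.stack([high_fidelity(m, x) for m in mus_train])  # (40, 200)

# POD basis from the snapshot matrix (truncated SVD)
U, s, Vt = np.linalg.svd(snapshots, full_matrices=False)
basis = Vt[:3]                                                  # keep 3 modes

def rom(mu):
    # cheap surrogate: project the nearest training snapshot onto the POD
    # basis (a stand-in for a genuine reduced solver)
    j = np.argmin(np.abs(mus_train - mu))
    return (snapshots[j] @ basis.T) @ basis

# learn the ROM residual as a function of the parameter
residuals = np.stack([high_fidelity(m, x) - rom(m) for m in mus_train])
net = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=5000, random_state=0)
net.fit(mus_train.reshape(-1, 1), residuals)

mu_test = 2.17
corrected = rom(mu_test) + net.predict([[mu_test]])[0]
print("ROM error:      ", np.abs(high_fidelity(mu_test, x) - rom(mu_test)).max())
print("corrected error:", np.abs(high_fidelity(mu_test, x) - corrected).max())
```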
Hidden Markov models (HMMs) are flexible tools for clustering dependent data coming from unknown populations, allowing nonparametric modelling of the population densities. Identifiability fails when the data are in fact independent, and we study the frontier between learnable and unlearnable two-state nonparametric HMMs. Interesting new phenomena emerge when the cluster distributions are modelled via density functions (the 'emission' densities) belonging to standard smoothness classes, rather than via multinomials. Notably, in contrast to the multinomial setting previously considered, identifying a direction separating the two emission densities becomes a critical, and challenging, issue. Surprisingly, it is possible to "borrow strength" from estimators of the smoother density to improve estimation of the other. We conduct a precise analysis of minimax rates, showing a transition depending on the relative smoothnesses of the emission densities.
The last decade has seen many attempts to generalise the definition of modes, or MAP estimators, of a probability distribution $\mu$ on a space $X$ to the case that $\mu$ has no continuous Lebesgue density, and in particular to infinite-dimensional Banach and Hilbert spaces $X$. This paper examines the properties of and connections among these definitions. We construct a systematic taxonomy -- or `periodic table' -- of modes that includes the established notions as well as large hitherto-unexplored classes. We establish implications between these definitions and provide counterexamples to distinguish them. We also distinguish those definitions that are merely `grammatically correct' from those that are `meaningful' in the sense of satisfying certain `common-sense' axioms for a mode, among them the correct handling of discrete measures and those with continuous Lebesgue densities. However, despite there being 17 such `meaningful' definitions of mode, we show that none of them satisfy the `merging modes property', under which the modes of $\mu|_{A}$, $\mu|_{B}$ and $\mu|_{A \cup B}$ enjoy a straightforward relationship for open, positive-mass $A,B \subseteq X$.
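As a concrete example of an established notion that such a taxonomy organises, the strong mode of Dashti, Law, Stuart, and Voss declares a point $u^\star \in X$ a mode of $\mu$ if

$$\lim_{r \downarrow 0} \frac{\mu(B_r(u^\star))}{\sup_{u \in X} \mu(B_r(u))} = 1,$$

where $B_r(u)$ denotes the open ball of radius $r$ centred at $u$; when $\mu$ has a continuous Lebesgue density, this recovers the usual density maximiser.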
Partial orders are a natural model for the social hierarchies that may constrain "queue-like" rank-order data. However, the computational cost of counting the linear extensions of a general partial order on a ground set with more than a few tens of elements is prohibitive. Vertex-series-parallel partial orders (VSPs) are a subclass of partial orders which admit rapid counting and represent the sorts of relations we expect to see in a social hierarchy. Yet no Bayesian analysis of VSPs has been given to date. We construct a marginally consistent family of priors over VSPs with a parameter controlling the prior distribution over VSP depth. The prior for VSPs is given in closed form. We extend an existing observation model for queue-like rank-order data to represent noise in our data and carry out Bayesian inference on "Royal Acta" data and Formula 1 race data. Model comparison shows our model is a better fit to the data than Plackett-Luce mixtures, Mallows mixtures, and "bucket order" models, and is competitive with more complex models fitting general partial orders.
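The tractability of VSPs comes from the recursive structure of their linear extensions: a series composition concatenates extensions, while a parallel composition interleaves them freely. A hedged sketch, with an assumed tuple encoding of the VSP decomposition tree:

```python
from math import comb

def count_le(node):
    """Return (number of linear extensions, number of elements) of a VSP
    given as ('leaf',), ('series', L, R) or ('parallel', L, R)."""
    kind = node[0]
    if kind == "leaf":
        return 1, 1
    le_l, n_l = count_le(node[1])
    le_r, n_r = count_le(node[2])
    if kind == "series":
        return le_l * le_r, n_l + n_r  # all of L precedes all of R
    # parallel: choose the positions of the left block among n_l + n_r slots
    return comb(n_l + n_r, n_l) * le_l * le_r, n_l + n_r

# two chains of length 2 in parallel: C(4, 2) * 1 * 1 = 6 linear extensions
chain2 = ("series", ("leaf",), ("leaf",))
print(count_le(("parallel", chain2, chain2))[0])  # 6
```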
When testing for patterns in an ordered binary sequence of a player's wins and losses over a period of time, the Runs Test may show results that contradict the intuition conveyed by scatter plots of win proportions over time. We design a test suited to this purpose by computing the gaps between consecutive wins and then using exact binomial tests together with non-parametric tests, namely Kendall's Tau and the Siegel-Tukey test for the scale problem, to detect heteroscedastic patterns and the direction in which wins occur. The modifications suggested by Jan Vegelius (1982) are applied to the Siegel-Tukey test to adjust for tied ranks.
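A hedged sketch of the gap-based approach on synthetic data: gaps between consecutive wins are extracted, a trend is tested with Kendall's Tau, and early versus late gaps are compared for scale differences. The Mann-Whitney comparison of absolute deviations stands in for the tie-adjusted Siegel-Tukey step and is an illustrative simplification.

```python
import numpy as np
from scipy.stats import kendalltau, mannwhitneyu

rng = np.random.default_rng(2)
# a player who improves over time: win probability drifts from 0.3 to 0.7
p = np.linspace(0.3, 0.7, 200)
results = rng.random(200) < p                  # True = win

wins = np.flatnonzero(results)
gaps = np.diff(wins)                           # games between consecutive wins

# direction of wins: decreasing gaps over time indicate improvement
tau, p_trend = kendalltau(np.arange(len(gaps)), gaps)
print(f"Kendall tau {tau:.2f}, p = {p_trend:.3f}")

# scale comparison of early vs. late gaps (Siegel-Tukey would rank-transform
# first; a plain Mann-Whitney on absolute deviations is used here instead)
half = len(gaps) // 2
early, late = gaps[:half], gaps[half:]
dev_e = np.abs(early - np.median(early))
dev_l = np.abs(late - np.median(late))
print(mannwhitneyu(dev_e, dev_l))
```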