This paper studies many-to-one assignment markets, or matching markets with wages. Although it is well known that the core of this model is non-empty, the structure of the core has not been fully investigated. To the known dissimilarities with the one-to-one assignment game, we add that the bargaining set does not coincide with the core and that the kernel may not be included in the core. Moreover, not all extreme core allocations can be obtained by means of a lexicographic maximization or minimization procedure, as is the case in the one-to-one assignment game. The maximum and minimum competitive salaries are characterized in two ways: axiomatically and by means of easily verifiable properties of an associated directed graph. Regarding the remaining extreme core allocations of the many-to-one assignment game, we propose a lexicographic procedure that, for each order on the set of workers, sequentially maximizes or minimizes each worker's competitive salary. This procedure yields all extreme vectors of competitive salaries, that is, all extreme core allocations.
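A minimal computational sketch of such a lexicographic procedure, assuming the set of competitive salary vectors is given as a polytope $\{s : As \le b\}$ (the constraint matrices and names below are illustrative, not the paper's notation):

```python
# Sketch of the lexicographic procedure: for a fixed order on the workers,
# sequentially maximize or minimize each worker's competitive salary over
# the salary polytope, pinning down earlier workers' salaries as we go.
import numpy as np
from scipy.optimize import linprog

def lexicographic_extreme(A, b, order, maximize):
    """A @ s <= b describes the salary polytope; order is a permutation of
    worker indices; maximize[k] chooses max (True) or min (False) at step k."""
    m = A.shape[1]
    A_eq, b_eq = [], []
    s = None
    for k, j in enumerate(order):
        c = np.zeros(m)
        c[j] = -1.0 if maximize[k] else 1.0      # linprog minimizes c @ s
        res = linprog(c, A_ub=A, b_ub=b,
                      A_eq=np.array(A_eq) if A_eq else None,
                      b_eq=np.array(b_eq) if b_eq else None,
                      bounds=[(0, None)] * m)    # salaries assumed nonnegative
        s = res.x
        row = np.zeros(m)
        row[j] = 1.0                             # pin worker j's salary at its optimum
        A_eq.append(row)
        b_eq.append(s[j])
    return s
```

Each step solves one linear program and fixes one coordinate, so after one pass through the order the result is an extreme point of the polytope.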
Various approaches to iterative refinement (IR) for least-squares problems have been proposed in the literature, and it may not be clear which approach is suitable for a given problem. We consider three approaches to IR for least-squares problems when two precisions are used, review their theoretical guarantees, their known shortcomings, and the conditions under which each method can be expected to recognize that the correct solution has been found, and extend the uniform-precision analysis of an IR approach based on the semi-normal equations to the two-precision case. We focus on the situation where it is desired to refine the solution to the working-precision level. We show that the IR methods exhibit different sensitivities to the conditioning of the problem and to the size of the least-squares residual, which should be taken into account when choosing an IR approach. We also discuss a new approach based on solving multiple least-squares problems.
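As a hedged illustration of the general two-precision pattern (not a reproduction of any of the paper's three methods), here is a minimal QR-based sketch: factorize once in the lower working precision, form residuals in higher precision, and reuse the factors for each correction:

```python
# Two-precision iterative refinement for min ||b - A x||_2 (QR-based sketch):
# factorization and correction solves run in float32, residuals in float64.
import numpy as np

def ls_iterative_refinement(A, b, iters=5):
    A32 = A.astype(np.float32)
    Q, R = np.linalg.qr(A32)                       # low-precision factorization
    x = np.linalg.solve(R, Q.T @ b.astype(np.float32)).astype(np.float64)
    for _ in range(iters):
        r = b - A @ x                              # residual in high precision
        dx = np.linalg.solve(R, Q.T @ r.astype(np.float32))
        x = x + dx.astype(np.float64)              # update kept in high precision
    return x
```

Whether such a scheme makes progress depends on the conditioning of $A$ and the size of the least-squares residual, which is exactly the sensitivity the abstract highlights.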
The aim of the present work is to design, theoretically analyze, and numerically test a generalized Dryja-Smith-Widlund (GDSW) preconditioner for composite Discontinuous Galerkin discretizations of multicompartment parabolic reaction-diffusion equations, where the solution can exhibit natural discontinuities across the domain. We prove that the condition number of the preconditioned operator for the discrete system arising at each time step admits a scalable and quasi-optimal upper bound. The GDSW preconditioner is then applied to the EMI (Extracellular - Membrane - Intracellular) reaction-diffusion system, recently proposed to model microscopically the spatiotemporal evolution of cardiac bioelectrical potentials. Numerical tests validate the scalability and quasi-optimality of the EMI-GDSW preconditioner and investigate its robustness with respect to the time-step size as well as to jumps in the diffusion coefficients.
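For orientation, GDSW-type preconditioners are two-level overlapping additive Schwarz methods; in generic notation (not the paper's), the preconditioner has the form
$$ M^{-1}_{\mathrm{GDSW}} = \Phi\, (\Phi^T A \Phi)^{-1} \Phi^T + \sum_{i=1}^{N} R_i^T A_i^{-1} R_i, $$
where $R_i$ restricts to the $i$-th overlapping subdomain, $A_i = R_i A R_i^T$ is the local problem, and the columns of $\Phi$ are coarse basis functions obtained by energy-minimizing extensions of interface values into the subdomain interiors.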
It is well known that a non-cooperative game may have multiple equilibria. In this paper we consider the efficiency of games, measured by the ratio between the aggregate payoff over all Nash equilibria and that over all admissible controls. This efficiency measure is typically unstable with respect to small perturbations of the game. This seemingly undesirable property can actually be good news in practice: a small change in the game mechanism may improve the efficiency of the game dramatically. We introduce a game mediator with limited resources and investigate mechanism designs aimed at improving the efficiency.
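As a toy illustration of this efficiency notion (our example, not the paper's), consider a Prisoner's Dilemma: the unique Nash equilibrium is mutual defection, so the efficiency is the equilibrium aggregate payoff divided by the best achievable aggregate payoff:

```python
# Efficiency ratio in a Prisoner's Dilemma: the unique Nash equilibrium
# (D, D) yields aggregate payoff 2, while (C, C) yields 6, so the
# efficiency is 2/6. Payoff numbers are illustrative.
payoffs = {          # (row action, col action) -> (row payoff, col payoff)
    ("C", "C"): (3, 3), ("C", "D"): (0, 5),
    ("D", "C"): (5, 0), ("D", "D"): (1, 1),
}
nash = ("D", "D")                        # mutual defection: no unilateral deviation helps
best = max(payoffs.values(), key=sum)    # payoff pair with the largest aggregate
efficiency = sum(payoffs[nash]) / sum(best)
print(efficiency)                        # 2/6 = 0.333...
```

A small change to the payoffs (e.g., a mediator subsidizing cooperation) can move the equilibrium and raise this ratio sharply, which is the instability the abstract exploits.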
This paper compares statistical experiments in discounted problems, ranging from the simplest ones, where the state is fixed and the flow of information is exogenous, to more complex ones, where the decision-maker controls the flow of information or the state changes over time.
Modern large language model (LLM) alignment techniques rely on human feedback, but it is unclear whether these techniques fundamentally limit the capabilities of aligned LLMs. In particular, it is unclear whether it is possible to align (stronger) LLMs that have superhuman capabilities using (weaker) human feedback without degrading their capabilities. This is an instance of the weak-to-strong generalization problem: using weaker (less capable) feedback to train a stronger (more capable) model. We prove that weak-to-strong generalization is possible by eliciting latent knowledge from pre-trained LLMs. In particular, we cast the weak-to-strong generalization problem as a transfer learning problem in which we wish to transfer a latent concept from a weak model to a strong pre-trained model. We prove that a naive fine-tuning approach suffers from fundamental limitations, but an alternative refinement-based approach suggested by the problem structure provably overcomes them. Finally, we demonstrate the practical applicability of the refinement approach on three LLM alignment tasks.
This paper proposes two novel vector transport operators, based on the Cayley transform, for the generalized Stiefel manifold equipped with a non-standard metric. Specifically, it introduces the differentiated Cayley retraction and a Cayley-transform approximation to the differentiated matrix exponential. These vector transports are shown to satisfy the Ring-Wirth non-expansive condition under non-standard metrics, and one of them is also isometric. Building upon the novel vector transport operators, we extend the modified Polak-Ribière-Polyak (PRP) conjugate gradient method to the generalized Stiefel manifold. Under a non-monotone line search condition, we prove that our algorithm converges globally to a stationary point. The efficiency of the proposed vector transport operators is empirically validated through numerical experiments on generalized eigenvalue problems and canonical correlation analysis.
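To fix ideas, here is a minimal sketch of the underlying Cayley-transform construction on the ordinary Stiefel manifold $\{X : X^T X = I\}$; the paper's operators live on the generalized Stiefel manifold with a non-standard metric, so this is only the simplest instance:

```python
# Cayley-transform retraction on the ordinary Stiefel manifold St(n, p).
# For a tangent direction Z at X, the Cayley transform of a skew-symmetric
# generator W produces an orthogonal matrix, so the result stays on St(n, p).
import numpy as np

def cayley_retract(X, Z, t=1.0):
    n = X.shape[0]
    P = np.eye(n) - 0.5 * (X @ X.T)
    W = P @ Z @ X.T - X @ Z.T @ P         # skew-symmetric generator; Y'(0) = Z
    I = np.eye(n)
    # Y(t) = (I - t/2 W)^{-1} (I + t/2 W) X remains feasible because the
    # Cayley transform of a skew-symmetric matrix is orthogonal.
    return np.linalg.solve(I - 0.5 * t * W, (I + 0.5 * t * W) @ X)
```

Differentiating such a retraction in the direction of a second tangent vector is what yields a vector transport of the kind the paper builds on.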
Through an uncertainty quantification (UQ) perspective, we show that score-based generative models (SGMs) are provably robust to the multiple sources of error that arise in practical implementations. Our primary tool is the Wasserstein uncertainty propagation (WUP) theorem, a model-form UQ bound that describes how the $L^2$ error from learning the score function propagates to a Wasserstein-1 ($\mathbf{d}_1$) ball around the true data distribution under the evolution of the Fokker-Planck equation. We show how errors due to (a) finite-sample approximation, (b) early stopping, (c) the choice of score-matching objective, (d) the expressiveness of the score-function parametrization, and (e) the choice of reference distribution impact the quality of the generative model in terms of a $\mathbf{d}_1$ bound involving computable quantities. The WUP theorem relies on Bernstein estimates for Hamilton-Jacobi-Bellman partial differential equations (PDEs) and on the regularizing properties of diffusion processes. Specifically, PDE regularity theory shows that stochasticity is the key mechanism ensuring that SGM algorithms are provably robust. The WUP theorem applies to integral probability metrics beyond $\mathbf{d}_1$, such as the total variation distance and the maximum mean discrepancy. Sample complexity and generalization bounds in $\mathbf{d}_1$ follow directly from the WUP theorem. Our approach requires minimal assumptions, is agnostic to the manifold hypothesis, and avoids absolute-continuity assumptions on the target distribution. Additionally, our results clarify the trade-offs among the multiple error sources in SGMs.
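Recall that $\mathbf{d}_1$ is the integral probability metric generated by 1-Lipschitz test functions,
$$ \mathbf{d}_1(\mu, \nu) = \sup_{\mathrm{Lip}(f) \le 1} \left| \int f \, d\mu - \int f \, d\nu \right|, $$
and replacing the class of 1-Lipschitz functions by functions bounded by one, or by the unit ball of a reproducing kernel Hilbert space, yields the total variation distance and the maximum mean discrepancy, respectively.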
In this paper, we develop a new and effective approach to nonparametric quantile regression that accommodates ultrahigh-dimensional data arising from spatio-temporal processes. The approach circumvents the computational challenges that are known to hinder existing nonparametric quantile regression methods when the number of predictors is much larger than the available sample size. We investigate conditions under which estimation is feasible and of good overall quality, and we obtain sharp approximations that we employ in devising statistical inference methodology. These include simultaneous confidence intervals and tests of hypotheses, whose asymptotics rest on a non-trivial functional central limit theorem tailored to martingale differences. Additionally, we provide finite-sample results through various simulations which, accompanied by an illustrative application to realistic data on electricity demand, offer guarantees on the performance of the proposed methodology.
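For reference, quantile regression at level $\tau \in (0,1)$ estimates the conditional quantile by minimizing the check (pinball) loss,
$$ \rho_\tau(u) = u\,\big(\tau - \mathbf{1}\{u < 0\}\big), \qquad \hat{f} \in \arg\min_{f} \sum_{i} \rho_\tau\big(y_i - f(x_i)\big), $$
whose minimizer over constant functions is the empirical $\tau$-quantile of the responses.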
In this paper we develop a novel neural network model for predicting the implied volatility surface. Prior financial domain knowledge is taken into account. A new activation function that incorporates the volatility smile is proposed and used for the hidden nodes that process the underlying asset price. In addition, financial conditions, such as the absence of arbitrage, the boundary behavior, and the asymptotic slope, are embedded into the loss function. This is one of the first studies to propose a methodological framework that incorporates prior financial domain knowledge into both neural network architecture design and model training. The proposed model outperforms the benchmark models on S&P 500 index option data spanning 20 years. More importantly, the domain knowledge is satisfied empirically, showing that the model is consistent with existing financial theories and conditions related to the implied volatility surface.
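As a hedged sketch of how one such condition can be embedded into the loss (the penalty, weight, and model interface here are illustrative assumptions, not the paper's specification), a soft penalty can keep the total implied variance $w(k, T)$ non-decreasing in maturity, which rules out calendar-spread arbitrage:

```python
# Illustrative no-arbitrage penalty: penalize negative dw/dT so that total
# implied variance w(k, T) is non-decreasing in maturity T.
import torch

def surface_loss(model, k, T, w_market, lam=1.0):
    T = T.clone().requires_grad_(True)           # need gradients w.r.t. maturity
    w = model(k, T)                              # predicted total variance surface
    fit = torch.mean((w - w_market) ** 2)        # data-fitting term
    dw_dT = torch.autograd.grad(w.sum(), T, create_graph=True)[0]
    calendar_penalty = torch.relu(-dw_dT).mean() # positive only where dw/dT < 0
    return fit + lam * calendar_penalty
```

Butterfly-arbitrage and asymptotic-slope conditions can be enforced the same way, by penalizing violations of the corresponding derivative inequalities on a grid of strikes and maturities.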
This paper does not describe a working system. Instead, it presents a single idea about representation that allows advances made by several different groups to be combined into an imaginary system called GLOM. The advances include transformers, neural fields, contrastive representation learning, distillation, and capsules. GLOM answers the question: How can a neural network with a fixed architecture parse an image into a part-whole hierarchy that has a different structure for each image? The idea is simply to use islands of identical vectors to represent the nodes in the parse tree. If GLOM can be made to work, it should significantly improve the interpretability of the representations produced by transformer-like systems when applied to vision or language.