We present a hybrid approach to the automated measurement of vagueness and subjectivity in texts. We first introduce the expert system VAGO, we illustrate it on a small benchmark of fact vs. opinion sentences, and then test it on the larger French press corpus FreSaDa to confirm the higher prevalence of subjective markers in satirical vs. regular texts. We then build a neural clone of VAGO, based on a BERT-like architecture, trained on the symbolic VAGO scores obtained on FreSaDa. Using explainability tools (LIME), we show the interest of this neural version for the enrichment of the lexicons of the symbolic version, and for the production of versions in other languages.
Non-contrastive SSL methods like BYOL and SimSiam rely on asymmetric predictor networks to avoid representational collapse without negative samples. Yet, how predictor networks facilitate stable learning is not fully understood. While previous theoretical analyses assumed Euclidean losses, most practical implementations rely on cosine similarity. To gain further theoretical insight into non-contrastive SSL, we analytically study learning dynamics in conjunction with Euclidean and cosine similarity in the eigenspace of closed-form linear predictor networks. We show that both avoid collapse through implicit variance regularization albeit through different dynamical mechanisms. Moreover, we find that the eigenvalues act as effective learning rate multipliers and propose a family of isotropic loss functions (IsoLoss) that equalize convergence rates across eigenmodes. Empirically, IsoLoss speeds up the initial learning dynamics and increases robustness, thereby allowing us to dispense with the EMA target network typically used with non-contrastive methods. Our analysis sheds light on the variance regularization mechanisms of non-contrastive SSL and lays the theoretical grounds for crafting novel loss functions that shape the learning dynamics of the predictor's spectrum.
In this paper we consider the finite element approximation of Maxwell's problem and analyse the prescription of essential boundary conditions in a weak sense using Nitsche's method. To avoid indefiniteness of the problem, the original equations are augmented with the gradient of a scalar field that allows one to impose the zero divergence of the magnetic induction, even if the exact solution for this scalar field is zero. Two finite element approximations are considered, namely, one in which the approximation spaces are assumed to satisfy the appropriate inf-sup condition that render the standard Galerkin method stable, and another augmented and stabilised one that permits the use of finite element interpolations of arbitrary order. Stability and convergence results are provided for the two finite element formulations considered.
Generative diffusion models have achieved spectacular performance in many areas of generative modeling. While the fundamental ideas behind these models come from non-equilibrium physics, in this paper we show that many aspects of these models can be understood using the tools of equilibrium statistical mechanics. Using this reformulation, we show that generative diffusion models undergo second-order phase transitions corresponding to symmetry breaking phenomena. We argue that this lead to a form of instability that lies at the heart of their generative capabilities and that can be described by a set of mean field critical exponents. We conclude by analyzing recent work connecting diffusion models and associative memory networks in view of the thermodynamic formulations.
Uniform continuity bounds on entropies are generally expressed in terms of a single distance measure between a pair of probability distributions or quantum states, typically, the total variation distance or trace distance. However, if an additional distance measure between the probability distributions or states is known, then the continuity bounds can be significantly strengthened. Here, we prove a tight uniform continuity bound for the Shannon entropy in terms of both the local- and total variation distances, sharpening an inequality proven in [I. Sason, IEEE Trans. Inf. Th., 59, 7118 (2013)]. We also obtain a uniform continuity bound for the von Neumann entropy in terms of both the operator norm- and trace distances. The bound is tight when the quotient of the trace distance by the operator norm distance is an integer. We then apply our results to compute upper bounds on the quantum- and private classical capacities of channels. We begin by refining the concept of approximate degradable channels, namely, $\varepsilon$-degradable channels, which are, by definition, $\varepsilon$-close in diamond norm to their complementary channel when composed with a degrading channel. To this end, we introduce the notion of $(\varepsilon,\nu)$-degradable channels; these are $\varepsilon$-degradable channels that are, in addition, $\nu$-close in completely bounded spectral norm to their complementary channel, when composed with the same degrading channel. This allows us to derive improved upper bounds to the quantum- and private classical capacities of such channels. Moreover, these bounds can be further improved by considering certain unstabilized versions of the above norms. We show that upper bounds on the latter can be efficiently expressed as semidefinite programs. We illustrate our results by obtaining a new upper bound on the quantum capacity of the qubit depolarizing channel.
In this study, we propose a novel multi-objective Bayesian optimization (MOBO) method to efficiently identify the Pareto front (PF) defined by risk measures for black-box functions under the presence of input uncertainty (IU). Existing BO methods for Pareto optimization in the presence of IU are risk-specific or without theoretical guarantees, whereas our proposed method addresses general risk measures and has theoretical guarantees. The basic idea of the proposed method is to assume a Gaussian process (GP) model for the black-box function and to construct high-probability bounding boxes for the risk measures using the GP model. Furthermore, in order to reduce the uncertainty of non-dominated bounding boxes, we propose a method of selecting the next evaluation point using a maximin distance defined by the maximum value of a quasi distance based on bounding boxes. As theoretical analysis, we prove that the algorithm can return an arbitrary-accurate solution in a finite number of iterations with high probability, for various risk measures such as Bayes risk, worst-case risk, and value-at-risk. We also give a theoretical analysis that takes into account approximation errors because there exist non-negligible approximation errors (e.g., finite approximation of PFs and sampling-based approximation of bounding boxes) in practice. We confirm that the proposed method outperforms compared with existing methods not only in the setting with IU but also in the setting of ordinary MOBO through numerical experiments.
The categorical models of the differential lambda-calculus are additive categories because of the Leibniz rule which requires the summation of two expressions. This means that, as far as the differential lambda-calculus and differential linear logic are concerned, these models feature finite non-determinism and indeed these languages are essentially non-deterministic. In a previous paper we introduced a categorical framework for differentiation which does not require additivity and is compatible with deterministic models such as coherence spaces and probabilistic models such as probabilistic coherence spaces. Based on this semantics we develop a syntax of a deterministic version of the differential lambda-calculus. One nice feature of this new approach to differentiation is that it is compatible with general fixpoints of terms, so our language is actually a differential extension of PCF for which we provide a fully deterministic operational semantics.
We study the stability of randomized Taylor schemes for ODEs. We consider three notions of probabilistic stability: asymptotic stability, mean-square stability, and stability in probability. We prove fundamental properties of the probabilistic stability regions and benchmark them against the absolute stability regions for deterministic Taylor schemes.
Bagging is a commonly used ensemble technique in statistics and machine learning to improve the performance of prediction procedures. In this paper, we study the prediction risk of variants of bagged predictors under the proportional asymptotics regime, in which the ratio of the number of features to the number of observations converges to a constant. Specifically, we propose a general strategy to analyze the prediction risk under squared error loss of bagged predictors using classical results on simple random sampling. Specializing the strategy, we derive the exact asymptotic risk of the bagged ridge and ridgeless predictors with an arbitrary number of bags under a well-specified linear model with arbitrary feature covariance matrices and signal vectors. Furthermore, we prescribe a generic cross-validation procedure to select the optimal subsample size for bagging and discuss its utility to eliminate the non-monotonic behavior of the limiting risk in the sample size (i.e., double or multiple descents). In demonstrating the proposed procedure for bagged ridge and ridgeless predictors, we thoroughly investigate the oracle properties of the optimal subsample size and provide an in-depth comparison between different bagging variants.
We introduce a Bayesian conditional autoregressive model for analyzing patient-specific and neighborhood risks of stillbirth and preterm birth within a city. Our fully Bayesian approach automatically learns the amount of spatial heterogeneity and spatial dependence between neighborhoods. Our model provides meaningful inferences and uncertainty quantification for both covariate effects and neighborhood risk probabilities through their posterior distributions. We apply our methodology to data from the city of Philadelphia. Using electronic health records (45,919 deliveries at hospitals within the University of Pennsylvania Health System) and United States Census Bureau data from 363 census tracts in Philadelphia, we find that both patient-level characteristics (e.g. self-identified race/ethnicity) and neighborhood-level characteristics (e.g. violent crime) are highly associated with patients' odds of stillbirth or preterm birth. Our neighborhood risk analysis further reveals that census tracts in West Philadelphia and North Philadelphia are at highest risk of these outcomes. Specifically, neighborhoods with higher rates of women in poverty or on public assistance have greater neighborhood risk for these outcomes, while neighborhoods with higher rates of college-educated women or women in the labor force have lower risk. Our findings could be useful for targeted individual and neighborhood interventions.
Recent work pre-training Transformers with self-supervised objectives on large text corpora has shown great success when fine-tuned on downstream NLP tasks including text summarization. However, pre-training objectives tailored for abstractive text summarization have not been explored. Furthermore there is a lack of systematic evaluation across diverse domains. In this work, we propose pre-training large Transformer-based encoder-decoder models on massive text corpora with a new self-supervised objective. In PEGASUS, important sentences are removed/masked from an input document and are generated together as one output sequence from the remaining sentences, similar to an extractive summary. We evaluated our best PEGASUS model on 12 downstream summarization tasks spanning news, science, stories, instructions, emails, patents, and legislative bills. Experiments demonstrate it achieves state-of-the-art performance on all 12 downstream datasets measured by ROUGE scores. Our model also shows surprising performance on low-resource summarization, surpassing previous state-of-the-art results on 6 datasets with only 1000 examples. Finally we validated our results using human evaluation and show that our model summaries achieve human performance on multiple datasets.