In this note we use the State of the Union Address (SOTU) dataset from Kaggle to make some surprising (and some not so surprising) observations pertaining to the general timeline of American history, and the character and nature of the addresses themselves. Our main approach is using vector embeddings, such as BERT (DistilBERT) and GPT-2. While it is widely believed that BERT (and its variations) is most suitable for NLP classification tasks, we find that GPT-2 in conjunction with nonlinear dimension reduction methods such as UMAP provides better separation and stronger clustering. This makes GPT-2 + UMAP an interesting alternative. In our case, no model fine-tuning is required, and the pre-trained out-of-the-box GPT-2 model is enough. We also used a fine-tuned DistilBERT model for classification, detecting which President delivered which address, with very good results (accuracy 93\% - 95\% depending on the run). An analogous task was performed to determine the year of writing, and we were able to pin it down to about 4 years (which is a single presidential term). It is worth noting that SOTU addresses provide relatively small writing samples (with about 8000 words on average, and varying widely from under 2000 words to more than 20000), and that the number of authors is relatively large (we used SOTU addresses of 42 US presidents). This shows that the techniques employed turn out to be rather efficient, while all the computations described in this note can be performed using a single GPU instance of Google Colab. The accompanying code is available on GitHub.
We present an asymptotic expansion formula of an estimator for the drift coefficient of the fractional Ornstein-Uhlenbeck process. As the machinery, we apply the general expansion scheme for Wiener functionals recently developed by the authors [26]. The central limit theorem in the principal part of the expansion has the classical scaling T^{1/2}. However, the asymptotic expansion formula is complex in that the order of the correction term is the classical T^{-1/2} for H in (1/2,5/8), but T^{4H-3} for H in [5/8, 3/4).
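As a quick consistency check on the two regimes stated above (elementary arithmetic only, not part of the expansion itself), the two correction orders coincide exactly at the threshold $H = 5/8$:

\[
4H - 3 = -\tfrac{1}{2} \iff H = \tfrac{5}{8},
\qquad
-\tfrac{1}{2} < 4H - 3 < 0 \quad \text{for } H \in \left(\tfrac{5}{8}, \tfrac{3}{4}\right).
\]

Hence for $H \in (5/8, 3/4)$ the correction term $T^{4H-3}$ decays more slowly than the classical $T^{-1/2}$, and the exponent tends to $0$ as $H \uparrow 3/4$.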
This article focuses on the coherent forecasting of the recently introduced novel geometric AR(1) (NoGeAR(1)) model, an INAR model based on the inflated-parameter binomial thinning approach. Various techniques are available to achieve h-step-ahead coherent forecasts of count time series, like median and mode forecasting. However, the literature addressing coherent forecasting in the context of overdispersed count time series remains limited. Here, we study the forecasting distribution corresponding to the NoGeAR(1) process using the Monte Carlo (MC) approximation method. Accordingly, several forecasting measures are employed in the simulation study to facilitate a thorough comparison of the forecasting capability of NoGeAR(1) with other models. The methodology is also demonstrated using real-life data, specifically the data on CW{\ss} TeXpert downloads and Barbados COVID-19 data.
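As a rough illustration of Monte Carlo coherent forecasting, the following minimal sketch approximates the h-step-ahead forecasting distribution by simulation and reads off the median and mode, which are integer-valued and therefore coherent. It assumes a classical INAR(1) process with ordinary binomial thinning and Poisson innovations, not the NoGeAR(1) model or its inflated-parameter thinning; all function names and parameter values are hypothetical.

```python
import math
import random
from collections import Counter
from statistics import median_low


def binomial_thinning(x, alpha, rng):
    """Return alpha o x: each of the x units survives with probability alpha."""
    return sum(1 for _ in range(x) if rng.random() < alpha)


def poisson(lam, rng):
    """Draw a Poisson(lam) variate with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def simulate_h_step(x_t, h, alpha, lam, n_paths, seed=0):
    """Simulate n_paths draws of X_{t+h} given X_t = x_t (assumed INAR(1))."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_paths):
        x = x_t
        for _ in range(h):
            # X_{s+1} = alpha o X_s + innovation
            x = binomial_thinning(x, alpha, rng) + poisson(lam, rng)
        draws.append(x)
    return draws


def coherent_forecasts(draws):
    """Integer-valued point forecasts from the simulated distribution."""
    counts = Counter(draws)
    # Iterate over sorted values so ties resolve to the smallest count value.
    mode = max(sorted(counts), key=counts.get)
    return {"median": median_low(draws), "mode": mode}


if __name__ == "__main__":
    draws = simulate_h_step(x_t=5, h=3, alpha=0.5, lam=1.0, n_paths=2000, seed=1)
    print(coherent_forecasts(draws))
```

Using `median_low` rather than the ordinary median keeps the forecast on the integer support even when the number of simulated paths is even.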
We consider an arbitrary bounded discrete time series. From its statistical feature, without any use of the Fourier transform, we find an almost periodic function which suitably characterizes the time series.
In this paper, we present a new high-order discontinuous Galerkin (DG) method, in which neither a penalty parameter nor a stabilization parameter is needed. We refer to this method as penalty-free DG (\PFDG). In this method, the trial and test functions belong to the broken Sobolev space, in which the functions are in general discontinuous on the mesh skeleton and do not meet the Dirichlet boundary conditions. However, a subset can be distinguished in this space, where the functions are continuous and satisfy the Dirichlet boundary conditions, and this subset is called admissible. The trial solution is chosen to lie in an \emph{augmented} admissible subset, in which a small violation of the continuity condition is permitted. This subset is constructed by applying special augmented constraints to the linear combination of finite element basis functions. In this approach, all the advantages of the DG method are retained without the necessity of using stability parameters or numerical fluxes. Several benchmark problems in two dimensions (Poisson equation, linear elasticity, hyperelasticity, and biharmonic equation) on polygonal meshes (triangles, quadrilaterals, and weakly convex polygons) as well as a three-dimensional Poisson problem on hexahedral meshes are considered. Numerical results are presented that affirm the sound accuracy and optimal convergence of the method in the $L^2$ norm and the energy seminorm.
This paper is concerned with games of infinite duration played over potentially infinite graphs. Recently, Ohlmann (LICS 2022) presented a characterisation of objectives admitting optimal positional strategies, by means of universal graphs: an objective is positional if and only if it admits well-ordered monotone universal graphs. We extend Ohlmann's characterisation to encompass (finite or infinite) memory upper bounds. We prove that objectives admitting optimal strategies with $\varepsilon$-memory less than $m$ (a memory that cannot be updated when reading an $\varepsilon$-edge) are exactly those which admit well-founded monotone universal graphs whose antichains have size bounded by $m$. We also give a characterisation of chromatic memory by means of appropriate universal structures. Our results apply to finite as well as infinite memory bounds (for instance, to objectives with finite but unbounded memory, or with countable memory strategies). We illustrate the applicability of our framework by carrying out a few case studies, we provide examples witnessing limitations of our approach, and we discuss general closure properties which follow from our results.
By abstracting over well-known properties of De Bruijn's representation with nameless dummies, we design a new theory of syntax with variable binding and capture-avoiding substitution. We propose it as a simpler alternative to Fiore, Plotkin, and Turi's approach, with which we establish a strong formal link. We also show that our theory easily incorporates simple types and equations between terms.
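For readers unfamiliar with the nameless representation being abstracted over, the following minimal sketch shows textbook De Bruijn indices for the untyped lambda calculus, with index shifting and capture-avoiding substitution. It illustrates only the classical concrete representation, not the abstract theory proposed here; the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Var:
    index: int  # number of binders between the occurrence and its binder


@dataclass(frozen=True)
class Lam:
    body: "Term"


@dataclass(frozen=True)
class App:
    fn: "Term"
    arg: "Term"


Term = Union[Var, Lam, App]


def shift(term, amount, cutoff=0):
    """Add `amount` to every variable whose index is >= cutoff (the free ones)."""
    if isinstance(term, Var):
        return Var(term.index + amount) if term.index >= cutoff else term
    if isinstance(term, Lam):
        return Lam(shift(term.body, amount, cutoff + 1))
    return App(shift(term.fn, amount, cutoff), shift(term.arg, amount, cutoff))


def substitute(term, index, replacement):
    """Replace variable `index` in `term` by `replacement`, avoiding capture."""
    if isinstance(term, Var):
        return replacement if term.index == index else term
    if isinstance(term, Lam):
        # Under a binder, the target index and the free variables of the
        # replacement both move up by one.
        return Lam(substitute(term.body, index + 1, shift(replacement, 1)))
    return App(
        substitute(term.fn, index, replacement),
        substitute(term.arg, index, replacement),
    )


def beta_reduce(redex):
    """Contract (lambda. body) arg to body[0 := arg]."""
    assert isinstance(redex, App) and isinstance(redex.fn, Lam)
    body, arg = redex.fn.body, redex.arg
    return shift(substitute(body, 0, shift(arg, 1)), -1)


if __name__ == "__main__":
    # (lambda x. lambda y. x) z, with z the free variable of index 0
    const = Lam(Lam(Var(1)))
    print(beta_reduce(App(const, Var(0))))  # Lam(body=Var(index=1))
```

In the usage example the free variable keeps pointing outside the remaining binder (index 1 under one lambda), so no capture occurs and no renaming is ever needed.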
The goal of Universal Cross-Domain Retrieval (UCDR) is to achieve robust performance in generalized test scenarios, wherein data may belong to strictly unknown domains and categories during training. Recently, pre-trained models with prompt tuning have shown strong generalization capabilities and attained noteworthy achievements in various downstream tasks, such as few-shot learning and video-text retrieval. However, applying them directly to UCDR may not be sufficient to handle both domain shift (i.e., adapting to unfamiliar domains) and semantic shift (i.e., transferring to unknown categories). To this end, we propose \textbf{Pro}mpting-to-\textbf{S}imulate (ProS), the first method to apply prompt tuning for UCDR. ProS employs a two-step process to simulate Content-aware Dynamic Prompts (CaDP) which can impact models to produce generalized features for UCDR. Concretely, in the Prompt Units Learning stage, we introduce two Prompt Units to individually capture domain and semantic knowledge in a mask-and-align way. Then, in the Context-aware Simulator Learning stage, we train a Content-aware Prompt Simulator under simulated test scenarios to produce the corresponding CaDP. Extensive experiments conducted on three benchmark datasets show that our method achieves new state-of-the-art performance without bringing excessive parameters. Our method is publicly available at //github.com/fangkaipeng/ProS.
In this paper we consider a superlinear one-dimensional elliptic boundary value problem that generalizes the one studied by Moore and Nehari in [43]. Specifically, we deal with piecewise-constant weight functions in front of the nonlinearity with an arbitrary number $\kappa\geq 1$ of vanishing regions. We study, from an analytic and numerical point of view, the number of positive solutions, depending on the value of a parameter $\lambda$ and on $\kappa$. Our main results are twofold. On the one hand, we study analytically the behavior of the solutions, as $\lambda\downarrow-\infty$, in the regions where the weight vanishes. Our result leads us to conjecture the existence of $2^{\kappa+1}-1$ solutions for sufficiently negative $\lambda$. On the other hand, we support such a conjecture with the results of numerical simulations which also shed light on the structure of the global bifurcation diagrams in $\lambda$ and the profiles of positive solutions. Finally, we give additional numerical results suggesting that the same high multiplicity result holds true for a much larger class of weights, also arbitrarily close to situations where there is uniqueness of positive solutions.
In this work, we present new constructions for topological subsystem codes using semi-regular Euclidean and hyperbolic tessellations. They give us new families of codes, and we also provide a new family of codes obtained through an already existing construction, due to Sarvepalli and Brown. We also prove new results that allow us to obtain the parameters of these new codes.
In this short note we formulate a stabilizer formalism in the language of noncommutative graphs. The classes of noncommutative graphs we consider are obtained via unitary representations of compact groups, and suitably chosen operators on finite-dimensional Hilbert spaces. Furthermore, in this framework, we generalize previous results in this area for determining when such noncommutative graphs have anticliques.