This paper characterizes the stylized facts of financial market returns and volatility and addresses the insufficient treatment of the tail behavior of asset returns, with the aim of measuring and managing stock market risk more effectively. To this end, a fat-tailed distribution and the leverage effect are introduced into the stochastic volatility (SV) model, and the model parameters are estimated by Markov chain Monte Carlo (MCMC). The fitted model is then combined with extreme value theory (EVT) to fit the tail distribution of the standardized residuals, which yields a new dynamic financial risk measurement model, termed SV-EVT-VaR. Empirical results on daily S&P 500 index returns and simulated returns show that the SV-EVT-based models outperform other models in out-of-sample backtesting and in capturing the fat tails and leverage effect of financial returns.
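As a hedged illustration of the EVT step, the sketch below applies the standard peaks-over-threshold quantile estimator for a generalized Pareto tail and scales the result by a one-step-ahead volatility forecast. The abstract does not state the formula, so the estimator, the function names, and every numeric input are assumptions chosen for illustration, not the paper's implementation.

```python
def pot_quantile(u, xi, beta, n, n_exceed, q):
    """Tail quantile of standardized loss residuals from a generalized Pareto fit.

    Standard peaks-over-threshold estimator (assumed here, not taken from the paper):
        z_q = u + (beta / xi) * (((n / n_exceed) * (1 - q)) ** (-xi) - 1)
    u is the threshold, xi the shape, beta the scale, n the sample size,
    n_exceed the number of residuals above u, and q the confidence level.
    """
    if xi == 0:
        raise ValueError("xi == 0 requires the exponential-tail limit instead")
    tail_ratio = (n / n_exceed) * (1.0 - q)
    return u + (beta / xi) * (tail_ratio ** (-xi) - 1.0)


def dynamic_var(mu_next, sigma_next, z_q):
    """Conditional VaR for losses: mean forecast plus volatility forecast times z_q."""
    return mu_next + sigma_next * z_q


# Hypothetical inputs: 1000 standardized loss residuals, 100 of them above u = 1.5.
z_99 = pot_quantile(u=1.5, xi=0.2, beta=0.6, n=1000, n_exceed=100, q=0.99)
# sigma_next would come from the fitted SV model; 0.012 is a made-up value.
var_99 = dynamic_var(mu_next=0.0, sigma_next=0.012, z_q=z_99)
```

In a dynamic model of this kind, the residual quantile is typically estimated once from the filtered residuals, while the volatility forecast is updated each day.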
This paper studies the performance of the spectral method in the estimation and uncertainty quantification of the unobserved preference scores of compared entities in a very general and more realistic setup in which the comparison graph consists of hyper-edges of possibly heterogeneous sizes and the number of comparisons can be as low as one for a given hyper-edge. Such a setting is pervasive in real applications, and it circumvents the need to specify the graph randomness and the restrictive homogeneous sampling assumption imposed in the commonly used Bradley-Terry-Luce (BTL) or Plackett-Luce (PL) models. Furthermore, in scenarios where the BTL or PL models are appropriate, we unravel the relationship between the spectral estimator and the Maximum Likelihood Estimator (MLE). We discover that a two-step spectral method, in which we apply the optimal weighting estimated from the equal-weighting vanilla spectral method, can achieve the same asymptotic efficiency as the MLE. Given the asymptotic distributions of the estimated preference scores, we also introduce a comprehensive framework to carry out both one-sample and two-sample ranking inferences, applicable to both fixed and random graph settings. Notably, this is the first time that effective two-sample rank testing methods have been proposed. Finally, we substantiate our findings via comprehensive numerical simulations and subsequently apply our developed methodologies to perform statistical inferences on statistics journals and movie rankings.
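To make the idea of a spectral estimator concrete, the following sketch covers only the simplest special case: pairwise comparisons with equal weighting, in the style of Rank Centrality. It does not implement the hyper-edge setting, the two-step weighting, or the inference framework described above; the function name and the toy data are assumptions.

```python
def spectral_scores(wins, iters=2000):
    """Estimate preference scores from pairwise outcomes via a Markov chain.

    wins[i][j] is how often item i beat item j. The chain moves from i to j
    in proportion to how often j beat i, so its stationary distribution puts
    more mass on items that win more often.
    """
    n = len(wins)
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            total = wins[i][j] + wins[j][i]
            if i != j and total > 0:
                # Dividing by n keeps every row sum at or below 1.
                P[i][j] = (wins[j][i] / total) / n
        P[i][i] = 1.0 - sum(P[i])  # self-loop absorbs the remaining mass
    # Power iteration for the stationary distribution pi = pi P.
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


# Toy data: noiseless BTL win fractions for hypothetical true scores 1, 2, 4.
true_w = [1.0, 2.0, 4.0]
wins = [[0.0 if i == j else 100 * true_w[i] / (true_w[i] + true_w[j])
         for j in range(3)] for i in range(3)]
scores = spectral_scores(wins)
```

With noiseless BTL win fractions the chain satisfies detailed balance, so the stationary distribution is proportional to the true scores; with real data it is only an estimate.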
The unpredictability and volatility of the stock market make it challenging to earn a substantial profit using any generalised scheme. Many previous studies have tried different techniques to build a machine learning model that can make a significant profit in the US stock market through live trading. However, very few studies have focused on the importance of finding the best features for a particular trading period. Our best approach used performance to narrow the feature set from a total of 148 to about 30. Furthermore, the top 25 features were dynamically selected each time before training our machine learning model. The model uses ensemble learning with four classifiers (Gaussian Naive Bayes, Decision Tree, Logistic Regression with L1 regularization, and Stochastic Gradient Descent) to decide whether to go long or short on a particular stock. Our best model traded daily between July 2011 and January 2019, generating a 54.35% profit. Finally, our work shows that mixtures of weighted classifiers perform better than any individual predictor at making trading decisions in the stock market.
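One plausible reading of a mixture of weighted classifiers is a weighted vote over the four classifiers' long/short signals. The abstract does not describe the combination rule, so the sketch below, including the weights and the tie-breaking choice, is an assumption for illustration only.

```python
def weighted_vote(votes, weights):
    """Combine +1 (long) and -1 (short) votes using per-classifier weights.

    votes and weights are dicts keyed by classifier name. A non-positive
    weighted sum maps to 'short'; that tie-breaking rule is arbitrary.
    """
    score = sum(weights[name] * vote for name, vote in votes.items())
    return "long" if score > 0 else "short"


# Hypothetical signals for one stock on one day.
votes = {"gaussian_nb": 1, "decision_tree": -1, "logistic_l1": -1, "sgd": -1}
# Hypothetical weights, e.g. derived from recent validation performance.
weights = {"gaussian_nb": 0.55, "decision_tree": 0.15, "logistic_l1": 0.15, "sgd": 0.15}
equal = {name: 0.25 for name in votes}

weighted_decision = weighted_vote(votes, weights)
majority_decision = weighted_vote(votes, equal)
```

The example is built so that the weighted decision differs from a plain majority vote, which is the situation in which weighting matters.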
Large skew-t factor copula models are attractive for the modeling of financial data because they allow for asymmetric and extreme tail dependence. We show that the copula implicit in the skew-t distribution of Azzalini and Capitanio (2003) allows for a higher level of pairwise asymmetric dependence than two popular alternative skew-t copulas. Estimation of this copula in high dimensions is challenging, and we propose a fast and accurate Bayesian variational inference (VI) approach to do so. The method uses a conditionally Gaussian generative representation of the skew-t distribution to define an augmented posterior that can be approximated accurately. A fast stochastic gradient ascent algorithm is used to solve the variational optimization. The new methodology is used to estimate copula models for intraday returns from 2017 to 2021 on 93 U.S. equities. The copula captures substantial heterogeneity in asymmetric dependence over equity pairs, in addition to the variability in pairwise correlations. We show that intraday predictive densities from the skew-t copula are more accurate than from some other copula models, while portfolio selection strategies based on the estimated pairwise tail dependencies improve performance relative to the benchmark index.
This paper introduces a new method of discretization that collocates both endpoints of the domain and enables the complete convergence of the costate variables associated with the Hamilton boundary-value problem. This is achieved by adding an \emph{exceptional sample} to the roots of the Legendre-Lobatto polynomial, which makes the associated differentiation matrix full-rank. We study the location of the new sample for which the differentiation matrix is most robust to perturbations, and we prove that this location is also the choice that mitigates the Runge phenomenon associated with polynomial interpolation. Two benchmark problems are successfully implemented in support of our theoretical findings. The new method is observed to converge exponentially with the number of discretization points used.
Our work studies the fair allocation of indivisible items to a set of agents, and falls within the scope of establishing improved approximation guarantees. It is well known by now that the classic solution concepts in fair division, such as envy-freeness and proportionality, fail to exist in the presence of indivisible items. Unfortunately, the question of existence remains unresolved even for some relaxations of envy-freeness, most notably for the notion of EFX, which has attracted significant attention in the relevant literature. This in turn has motivated the quest for approximation algorithms, resulting in the currently best known $(\phi-1)$-approximation guarantee by Amanatidis et al. (2020), where $\phi$ is the golden ratio. So far, it has been notoriously hard to obtain any further advancements beyond this factor. Our main contribution is that we achieve better approximations for certain special cases in which the agents agree on their perception of some items in terms of their worth. In particular, we first provide an algorithm with a $2/3$-approximation when the agents agree on which items are the top $n$ (but not necessarily on their exact ranking), with $n$ being the number of agents. To do so, we also study a general framework that can be of independent interest for obtaining further guarantees.
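For readers unfamiliar with approximate EFX, the sketch below computes the largest $\alpha$ for which a given allocation is $\alpha$-EFX, using the common definition with additive valuations: agent $i$'s value for its own bundle must be at least $\alpha$ times its value for any other bundle after removing any single item. This is a checker only, not the approximation algorithm of the paper, and the valuations are made up.

```python
def efx_ratio(valuations, allocation):
    """Largest alpha such that the allocation is alpha-EFX (additive valuations).

    valuations[i][g] is agent i's value for item g; allocation[i] is the set
    of items given to agent i. Returns float('inf') when no comparison has a
    positive right-hand side.
    """
    def value(agent, bundle):
        return sum(valuations[agent][g] for g in bundle)

    ratio = float("inf")
    for i, own in enumerate(allocation):
        own_value = value(i, own)
        for j, other in enumerate(allocation):
            if i == j:
                continue
            for g in other:
                reduced = value(i, other - {g})  # other bundle minus one item
                if reduced > 0:
                    ratio = min(ratio, own_value / reduced)
    return ratio


# Hypothetical instance: 2 agents, 3 items.
valuations = [[5, 3, 2], [4, 4, 1]]
ratio_a = efx_ratio(valuations, [{0}, {1, 2}])  # exact EFX holds (ratio >= 1)
ratio_b = efx_ratio(valuations, [{2}, {0, 1}])  # falls below phi - 1 ~ 0.618
```

An allocation is $\alpha$-EFX exactly when the returned ratio is at least $\alpha$, so the same function can be used to check the $\phi-1$ and $2/3$ thresholds mentioned above.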
Scientific optical 3D modeling requires the possibility to implement highly flexible and customizable mathematical models as well as high computing power. However, established ray tracing software for optical design and modeling purposes often has limitations in terms of access to underlying mathematical models and the possibility of accelerating the mostly CPU-based computation. To address these limitations, we propose the use of NVIDIA's OptiX Ray Tracing Engine as a highly flexible and high-performing alternative. OptiX offers a highly customizable ray tracing framework with onboard GPU support for parallel computing, as well as access to optimized ray tracing algorithms for accelerated computation. To demonstrate the capabilities of our approach, a realistic focus variation instrument is modeled, describing optical instrument components (light sources, lenses, detector, etc.) as well as the measuring sample surface mathematically or as meshed files. Using this focus variation instrument model, exemplary virtual measurements of arbitrary and standardized sample surfaces are carried out, generating image stacks of more than 100 images and tracing more than 1E9 light rays per image. The performance and accuracy of the simulations are qualitatively evaluated, and virtually generated detector images are compared with images acquired by a respective physical measuring device.
Matrix factorization (MF) is a classical collaborative filtering algorithm for recommender systems. It decomposes the user-item interaction matrix into a product of a low-dimensional user representation matrix and an item representation matrix. In typical recommendation scenarios, the user-item interaction paradigm is usually a two-stage process and requires static clustering analysis of the obtained user and item representations. This process, however, is time-consuming and computationally intensive, making it difficult to apply in real time to e-commerce or Internet of Things environments with billions of users and trillions of items. To address this, we propose a unified matrix factorization method based on dynamic multi-view clustering (MFDMC) that employs an end-to-end training paradigm. Specifically, in each view, a user/item representation is regarded as a weighted projection of all clusters. The representation of each cluster is learnable, enabling the dynamic discarding of bad clusters. Furthermore, we employ multi-view clustering to represent multiple roles of users/items, effectively utilizing the representation space and improving the interpretability of the user/item representations for downstream tasks. Extensive experiments show that our proposed MFDMC achieves state-of-the-art performance on real-world recommendation datasets. Additionally, comprehensive visualization and ablation studies confirm that our method provides meaningful and interpretable user/item representations for downstream tasks.
With the rapid development of facial forgery techniques, forgery detection has attracted increasing attention due to security concerns. Existing approaches attempt to use frequency information to mine subtle artifacts in high-quality forged faces. However, their exploitation of frequency information is coarse-grained, and more importantly, their vanilla learning process struggles to extract fine-grained forgery traces. To address this issue, we propose a progressive enhancement learning framework to exploit both the RGB and fine-grained frequency clues. Specifically, we perform a fine-grained decomposition of RGB images to completely decouple the real and fake traces in the frequency space. Subsequently, we propose a progressive enhancement learning framework based on a two-branch network, combined with self-enhancement and mutual-enhancement modules. The self-enhancement module captures the traces in different input spaces based on spatial noise enhancement and channel attention. The mutual-enhancement module concurrently enhances RGB and frequency features by communicating in the shared spatial dimension. The progressive enhancement process facilitates the learning of discriminative features with fine-grained face forgery clues. Extensive experiments on several datasets show that our method outperforms the state-of-the-art face forgery detection methods.
Time series forecasting is widely used in business intelligence, e.g., to forecast stock market prices and sales and to support the analysis of data trends. Most time series of interest are macroscopic time series that are aggregated from microscopic data. However, existing work mostly models the macroscopic time series directly, and few studies have examined forecasting macroscopic time series by leveraging data at the microscopic level. In this paper, we assume that the microscopic time series follow some unknown probabilistic mixture distribution. We theoretically show that once the ground-truth latent mixture components are identified, the estimation of time series from each component can be improved because of lower variance, which in turn benefits the estimation of the macroscopic time series. Inspired by the power of Seq2seq and its variants in modeling time series data, we propose Mixture of Seq2seq (MixSeq), an end-to-end mixture model to cluster microscopic time series, where all the components come from a family of Seq2seq models with different parameters. Extensive experiments on both synthetic and real-world data show the superiority of our approach.
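The lower-variance argument can be illustrated with the law of total variance: the variance of pooled data equals the average within-component variance plus the variance of the component means, so conditioning on the correct component can only reduce the variance that remains to be modeled. The sketch below checks this numerically on made-up values; it illustrates the statistical intuition only and is not MixSeq or the paper's theoretical result.

```python
from statistics import fmean, pvariance


def variance_decomposition(groups):
    """Return (total, within, between) population variances for grouped data.

    total  : variance of all values pooled together
    within : size-weighted average of each group's own variance
    between: size-weighted variance of the group means around the grand mean
    """
    all_values = [x for group in groups for x in group]
    n = len(all_values)
    grand_mean = fmean(all_values)
    within = sum(len(g) * pvariance(g) for g in groups) / n
    between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups) / n
    return pvariance(all_values), within, between


# Hypothetical microscopic observations from two latent components.
component_a = [1.0, 2.0, 3.0]
component_b = [11.0, 12.0, 13.0]
total, within, between = variance_decomposition([component_a, component_b])
```

Here most of the pooled variance comes from the gap between the two component means, which is the part that disappears once the components are identified.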
Although measuring held-out accuracy has been the primary approach to evaluate generalization, it often overestimates the performance of NLP models, while alternative approaches for evaluating models either focus on individual tasks or on specific behaviors. Inspired by principles of behavioral testing in software engineering, we introduce CheckList, a task-agnostic methodology for testing NLP models. CheckList includes a matrix of general linguistic capabilities and test types that facilitate comprehensive test ideation, as well as a software tool to generate a large and diverse number of test cases quickly. We illustrate the utility of CheckList with tests for three tasks, identifying critical failures in both commercial and state-of-the-art models. In a user study, a team responsible for a commercial sentiment analysis model found new and actionable bugs in an extensively tested model. In another user study, NLP practitioners with CheckList created twice as many tests, and found almost three times as many bugs as users without it.
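The flavor of behavioral testing can be sketched in a few lines: generate many test cases from a template, run the model, and report the failure rate for one capability. The code below is a minimal stand-alone sketch and does not use the CheckList tool's API; the helper names and the keyword-based toy model are hypothetical.

```python
from itertools import product


def expand_template(template, **slots):
    """Fill a template with every combination of the given slot values."""
    keys = list(slots)
    combos = product(*(slots[k] for k in keys))
    return [template.format(**dict(zip(keys, combo))) for combo in combos]


def toy_sentiment(text):
    """Hypothetical stand-in model: a keyword matcher that ignores negation."""
    words = text.lower().replace(".", "").split()
    if any(w in {"great", "good"} for w in words):
        return "positive"
    if any(w in {"terrible", "bad"} for w in words):
        return "negative"
    return "neutral"


def failure_rate(model, cases, expected):
    """Fraction of cases where the model's label differs from the expected one."""
    failures = [case for case in cases if model(case) != expected]
    return len(failures) / len(cases), failures


nouns, adjectives = ["flight", "food"], ["great", "good"]
# Basic functionality: plainly positive sentences should be labeled positive.
plain = expand_template("The {noun} was {adj}.", noun=nouns, adj=adjectives)
plain_rate, _ = failure_rate(toy_sentiment, plain, "positive")
# Negation capability: negated positive sentences should be labeled negative.
negated = expand_template("The {noun} was not {adj}.", noun=nouns, adj=adjectives)
negated_rate, negated_failures = failure_rate(toy_sentiment, negated, "negative")
```

The toy model passes the plain cases but fails every negated one, which is the kind of capability-specific failure that held-out accuracy alone can hide.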