We introduce quaternary modified four $\mu$-circulant codes as a modification of four-circulant codes. We give basic properties of quaternary modified four $\mu$-circulant Hermitian self-dual codes, and we construct such codes with large minimum weights. Two quaternary Hermitian self-dual $[56,28,16]$ codes are constructed for the first time. These codes improve the previously known lower bound on the largest minimum weight among all quaternary (linear) $[56,28]$ codes. In addition, they imply the existence of a quantum $[[56,0,16]]$ code.
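For intuition, here is a minimal sketch of the classical four-circulant construction over GF(4) together with the Hermitian self-duality check; the paper's modified four $\mu$-circulant construction refines this template. The first rows below and the use of the `galois` package are illustrative assumptions, not taken from the paper.

```python
# Sketch: classical four-circulant construction over GF(4) and the
# Hermitian self-duality test. Requires the `galois` package.
import numpy as np
import galois

GF4 = galois.GF(4)  # elements 0, 1, w, w^2 encoded as integers 0..3

def circulant(first_row):
    """n x n circulant matrix whose i-th row is first_row shifted right by i."""
    n = len(first_row)
    return GF4([[first_row[(j - i) % n] for j in range(n)] for i in range(n)])

def four_circulant(a, b):
    """M = [[A, B], [B^T, A^T]] from circulants A, B (note -1 = 1 over GF(4));
    the code is generated by G = [I | M]."""
    A, B = circulant(a), circulant(b)
    n = len(a)
    M = GF4.Zeros((2 * n, 2 * n))
    M[:n, :n], M[:n, n:] = A, B
    M[n:, :n], M[n:, n:] = B.T, A.T
    return M

def is_hermitian_self_dual(M):
    """[I | M] is Hermitian self-dual iff M Mbar^T = I, where conjugation
    over GF(4) is the Frobenius map x -> x^2."""
    return np.array_equal(M @ (M ** 2).T, GF4.Identity(M.shape[0]))

# In practice one searches over first rows a, b for self-dual examples.
print(is_hermitian_self_dual(four_circulant([1, 2, 0], [2, 3, 1])))
```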
We propose a game-based formulation for learning dimensionality-reducing representations of feature vectors when only prior knowledge of future prediction tasks is available. In this game, the first player chooses a representation, and then the second player adversarially chooses a prediction task from a given class representing the prior knowledge. The first player aims to minimize, and the second player to maximize, the regret: the minimal prediction loss using the representation, compared to the same loss using the original features. For the canonical setting in which the representation, the response to predict, and the predictors are all linear functions, and under the mean squared error loss, we derive the theoretically optimal representation in pure strategies, which shows the effectiveness of the prior knowledge, and the optimal regret in mixed strategies, which shows the usefulness of randomizing the representation. For general representations and loss functions, we propose an efficient algorithm for optimizing a randomized representation. The algorithm requires only the gradients of the loss function and is based on incrementally adding a representation rule to a mixture of such rules.
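In the linear-quadratic setting, the adversarial regret of a fixed representation has a closed form. The sketch below assumes a noiseless linear response $y = \theta^\top x$ and a unit-ball task class, which are illustrative choices rather than necessarily the paper's exact class.

```python
# Minimal sketch of the linear-quadratic game: representation z = T x,
# adversarial linear task y = theta^T x with ||theta|| <= 1, MSE loss.
import numpy as np

def adversarial_regret(T, Sigma):
    """max_{||theta||<=1} [ min_w E(w^T T x - y)^2 - min_v E(v^T x - y)^2 ].
    With a noiseless linear response the second term is 0, and the first is
    theta^T (Sigma - Sigma T^T (T Sigma T^T)^{-1} T Sigma) theta, so the
    worst case is the largest eigenvalue of that residual matrix."""
    TS = T @ Sigma
    residual = Sigma - TS.T @ np.linalg.solve(T @ Sigma @ T.T, TS)
    return np.linalg.eigvalsh(residual).max()

rng = np.random.default_rng(0)
d, k = 6, 2
X = rng.standard_normal((500, d))
Sigma = X.T @ X / len(X)                 # empirical feature covariance
T = rng.standard_normal((k, d))          # a candidate representation
print(adversarial_regret(T, Sigma))
```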
Maximum entropy (Maxent) models are a class of statistical models that use the maximum entropy principle to estimate probability distributions from data. Due to the size of modern data sets, Maxent models need efficient optimization algorithms to scale well to big data applications. State-of-the-art algorithms for Maxent models, however, were not originally designed to handle big data sets; these algorithms either rely on technical devices that may yield unreliable numerical results, scale poorly, or require smoothness assumptions that many practical Maxent models lack. In this paper, we present novel optimization algorithms that overcome the shortcomings of state-of-the-art algorithms for training large-scale, non-smooth Maxent models. Our proposed first-order algorithms leverage the Kullback--Leibler divergence to train large-scale and non-smooth Maxent models efficiently. For Maxent models over a discrete probability distribution on $n$ elements, built from samples each containing $m$ features, estimating the stepsize parameters and performing each iteration of our algorithms require on the order of $O(mn)$ operations and can be trivially parallelized. Moreover, the strong convexity of the Kullback--Leibler divergence with respect to the $\ell_{1}$ norm allows for larger stepsize parameters, thereby speeding up the convergence rate of our algorithms. To illustrate the efficiency of our novel algorithms, we consider the problem of estimating probabilities of fire occurrences as a function of ecological features in the Western US MTBS-Interagency wildfire data set. Our numerical results show that our algorithms outperform the state of the art by one order of magnitude and yield results that agree with physical models of wildfire occurrence and with previous statistical analyses of wildfire drivers.
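For intuition, here is a generic entropic mirror descent step on the probability simplex, where the KL divergence serves as the Bregman distance and its strong convexity with respect to the $\ell_1$ norm dictates the stepsize. This is a textbook sketch with a toy non-smooth objective standing in for a Maxent training loss, not the paper's algorithm.

```python
# Entropic mirror descent on the simplex for a non-smooth objective:
# each iteration costs O(mn) for the subgradient plus O(n) for the update.
import numpy as np

def entropic_mirror_descent(subgrad, n, iters, G):
    p = np.full(n, 1.0 / n)                     # uniform initialization
    eta = np.sqrt(2 * np.log(n) / iters) / G    # standard stepsize from
    avg = np.zeros(n)                           # 1-strong convexity w.r.t. l1
    for _ in range(iters):
        g = subgrad(p)
        p = p * np.exp(-eta * g)                # multiplicative (KL) update
        p /= p.sum()                            # O(n) renormalization
        avg += p
    return avg / iters                          # averaged iterate

# Toy non-smooth objective f(p) = ||A p - b||_1 over the simplex.
rng = np.random.default_rng(1)
m, n = 5, 100
A, b = rng.standard_normal((m, n)), rng.standard_normal(m)
subgrad = lambda p: A.T @ np.sign(A @ p - b)    # one subgradient, O(mn) work
G = np.abs(A).sum(axis=0).max()                 # bound on ||subgrad||_inf
p_hat = entropic_mirror_descent(subgrad, n, iters=2000, G=G)
print(np.abs(A @ p_hat - b).sum())
```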
We introduce a causal regularisation extension to anchor regression (AR) for improved out-of-distribution (OOD) generalisation. We present anchor-compatible losses, aligning with the anchor framework to ensure robustness against distribution shifts. Various multivariate analysis (MVA) algorithms, such as (Orthonormalized) PLS, RRR, and MLR, fall within the anchor framework. We observe that simple regularisation enhances robustness in OOD settings. Estimators for selected algorithms are provided, showcasing consistency and efficacy in synthetic and real-world climate science problems. The empirical validation highlights the versatility of anchor regularisation, emphasizing its compatibility with MVA approaches and its role in enhancing replicability while guarding against distribution shifts. The extended AR framework advances causal inference methodologies, addressing the need for reliable OOD generalisation.
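As context, here is a minimal sketch of the plain anchor regression estimator of Rothenhäusler et al., the base case that the proposed anchor-compatible MVA losses extend: OLS on anchor-transformed data, where $\gamma = 1$ recovers OLS and larger $\gamma$ trades in-distribution accuracy for robustness to anchor-driven shifts. The data-generating process below is a toy assumption.

```python
# Plain anchor regression (AR) via OLS on anchor-transformed data.
import numpy as np

def anchor_regression(X, Y, A, gamma):
    """Solve min_b ||(I - P_A)(Y - X b)||^2 + gamma ||P_A (Y - X b)||^2
    by OLS on transformed data, with P_A the projection onto the anchors."""
    P = A @ np.linalg.solve(A.T @ A, A.T)       # projection onto anchor span
    W = np.eye(len(X)) + (np.sqrt(gamma) - 1) * P
    return np.linalg.lstsq(W @ X, W @ Y, rcond=None)[0]

rng = np.random.default_rng(2)
n = 300
A = rng.standard_normal((n, 1))                 # anchor (exogenous) variable
H = rng.standard_normal(n)                      # hidden confounder
X = A[:, 0] + H + rng.standard_normal(n)
Y = 2 * X + H + 0.5 * A[:, 0] + rng.standard_normal(n)
print(anchor_regression(X[:, None], Y, A, gamma=10.0))
```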
We consider the chance-constrained binary knapsack problem (CKP), where the item weights are independent and normally distributed. We introduce a continuous relaxation for the CKP, represented as a non-convex optimization problem, which we call the non-convex relaxation. A comparative study shows that the non-convex relaxation provides an upper bound for the CKP that is at least as tight as those obtained from other continuous relaxations for the CKP. Furthermore, the obtained upper bound is guaranteed to be at most twice the optimal objective value of the CKP. Despite its non-convex nature, we show that the non-convex relaxation can be solved in polynomial time. Building on this relaxation, we then propose a polynomial-time 1/2-approximation algorithm for the CKP, which provides a lower bound. Computational results demonstrate that the non-convex relaxation and the proposed approximation algorithm yield tight upper and lower bounds for the CKP within a short computation time, confirming the quality of the obtained bounds.
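With independent normal weights, the chance constraint admits a closed deterministic form: $\Pr(w^\top x \le c) \ge 1-\epsilon$ becomes $\mu^\top x + z_{1-\epsilon}\sqrt{\sigma^2{}^\top x} \le c$, using $x_i^2 = x_i$ for binary $x$. The sketch below checks this constraint and runs a naive greedy; it is a baseline for illustration, not the paper's relaxation-based 1/2-approximation.

```python
# Deterministic equivalent of the normal chance constraint plus a
# naive greedy baseline for the CKP.
import numpy as np
from scipy.stats import norm

def feasible(x, mu, sigma2, c, eps):
    z = norm.ppf(1 - eps)                       # standard normal quantile
    return mu @ x + z * np.sqrt(sigma2 @ x) <= c

def greedy_ckp(profit, mu, sigma2, c, eps):
    x = np.zeros(len(profit))
    for i in np.argsort(-profit / mu):          # profit per mean-weight ratio
        x[i] = 1
        if not feasible(x, mu, sigma2, c, eps):
            x[i] = 0                            # roll back infeasible item
    return x

rng = np.random.default_rng(3)
n = 20
profit = rng.uniform(1, 10, n)
mu = rng.uniform(1, 10, n)
sigma2 = rng.uniform(0.1, 2.0, n)
x = greedy_ckp(profit, mu, sigma2, c=30.0, eps=0.05)
print(x, profit @ x)
```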
In decision-making, maxitive functions are used for worst-case and best-case evaluations. Maxitivity gives rise to a rich structure that is well-studied in the context of the pointwise order. In this article, we investigate maxitivity with respect to general preorders and provide a representation theorem for such functionals. The results are illustrated for different stochastic orders in the literature, including the usual stochastic order, the increasing convex/concave order, and the dispersive order.
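For orientation, the textbook pointwise-order special case (a reminder, not the paper's general preorder result): a functional $\Phi$ is maxitive when $\Phi(f \vee g) = \Phi(f) \vee \Phi(g)$, and the prototypical example is the worst-case evaluation induced by a possibility distribution $\pi$, which is maxitive over unions:
\[
\sup_{\omega \in A \cup B} \pi(\omega) \;=\; \Big(\sup_{\omega \in A} \pi(\omega)\Big) \vee \Big(\sup_{\omega \in B} \pi(\omega)\Big).
\]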
Data sets tend to live in low-dimensional non-linear subspaces. Ideal data analysis tools for such data sets should therefore account for this non-linear geometry. The symmetric Riemannian geometry setting is suitable for a variety of reasons. First, it comes with a rich mathematical structure that accounts for a wide range of non-linear geometries and has empirically been shown to capture data geometry in classical non-linear embeddings. Second, many standard data analysis tools initially developed for data in Euclidean space can be generalised efficiently to data on a symmetric Riemannian manifold. A conceptual challenge is the lack of guidelines both for constructing a symmetric Riemannian structure on the data space itself and for adapting data analysis algorithms that succeed on symmetric Riemannian manifolds to this setting. This work considers these challenges in the setting of pullback Riemannian geometry through a diffeomorphism. The first part of the paper characterises diffeomorphisms that result in proper, stable and efficient data analysis. The second part then uses these best practices to guide the construction of such diffeomorphisms through deep learning. As a proof of concept, different types of pullback geometries -- among which the proposed construction -- are tested on several data analysis tasks and toy data sets. The numerical experiments confirm the theoretical predictions: for proper, stable and efficient data analysis, the diffeomorphism generating the pullback geometry must map the data manifold into a geodesic subspace of the pulled-back Riemannian manifold while preserving local isometry around the data manifold, and pulling back positive curvature can be problematic in terms of stability.
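To make the pullback idea concrete, here is a minimal sketch for the simplest case of pulling back the Euclidean structure on $\mathbb{R}^2$ through a hand-picked diffeomorphism (the paper treats general symmetric Riemannian manifolds and learns the map with deep learning): all manifold operations reduce to Euclidean ones in $\varphi$-coordinates.

```python
# Pullback geometry through a diffeomorphism phi, Euclidean target case.
import numpy as np

# Illustrative diffeomorphism of R^2 and its inverse: phi flattens the
# parabola y = x^2 onto the x-axis (a stand-in for a learned map).
phi = lambda x: np.stack([x[..., 0], x[..., 1] - x[..., 0] ** 2], axis=-1)
phi_inv = lambda y: np.stack([y[..., 0], y[..., 1] + y[..., 0] ** 2], axis=-1)

def dist(x, y):
    """Pullback distance: Euclidean distance in phi-coordinates."""
    return np.linalg.norm(phi(x) - phi(y))

def geodesic(x, y, t):
    """Pullback geodesic: straight line in phi-coordinates, mapped back."""
    return phi_inv((1 - t) * phi(x) + t * phi(y))

def frechet_mean(points):
    """Barycenter under the pullback metric."""
    return phi_inv(phi(points).mean(axis=0))

pts = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 4.0]])   # lie on y = x^2
print(geodesic(pts[0], pts[2], 0.5), frechet_mean(pts))
```

Note how the geodesic midpoint and the mean both stay on the parabola: the data manifold is a geodesic subspace in this pullback geometry, exactly the property the theory asks for.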
We consider Gibbs distributions, which are families of probability distributions over a discrete space $\Omega$ with probability mass function of the form $\mu^\Omega_\beta(\omega) \propto e^{\beta H(\omega)}$ for $\beta$ in an interval $[\beta_{\min}, \beta_{\max}]$ and $H(\omega) \in \{0\} \cup [1, n]$. The partition function is the normalization factor $Z(\beta) = \sum_{\omega \in \Omega} e^{\beta H(\omega)}$. Two important parameters of these distributions are the log partition ratio $q = \log \tfrac{Z(\beta_{\max})}{Z(\beta_{\min})}$ and the counts $c_x = |H^{-1}(x)|$. These are correlated with system parameters in a number of physical applications and sampling algorithms. Our first main result is to estimate the counts $c_x$ using roughly $\tilde O(\frac{q}{\varepsilon^2})$ samples for general Gibbs distributions and $\tilde O(\frac{n^2}{\varepsilon^2})$ samples for integer-valued distributions (ignoring some second-order terms and parameters), and we show this is optimal up to logarithmic factors. We illustrate with improved algorithms for counting connected subgraphs, independent sets, and perfect matchings. As a key subroutine, we also develop algorithms to compute the partition function $Z$ using $\tilde O(\frac{q}{\varepsilon^2})$ samples for general Gibbs distributions and $\tilde O(\frac{n^2}{\varepsilon^2})$ samples for integer-valued distributions.
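For intuition, here is the textbook telescoping estimator for the log partition ratio $q$, based on the identity $Z(\beta_{i+1})/Z(\beta_i) = \mathbb{E}_{\mu_{\beta_i}}[e^{(\beta_{i+1}-\beta_i)H}]$; the paper's contribution is adaptive schedules achieving the stated sample bounds. The sampling oracle and the toy system below are illustrative assumptions.

```python
# Telescoping estimator for q = log Z(beta_max)/Z(beta_min), given an
# oracle sampler(beta, m) returning H-values of m samples from mu_beta.
import numpy as np

def log_partition_ratio(sampler, betas, m):
    log_ratio = 0.0
    for b0, b1 in zip(betas[:-1], betas[1:]):
        H = sampler(b0, m)                       # H(omega) for m samples
        # log of the empirical mean of e^{(b1-b0) H}, computed stably
        log_ratio += np.logaddexp.reduce((b1 - b0) * H) - np.log(m)
    return log_ratio                             # estimate of q

# Toy system: Omega = {0,1}^n with H = number of ones, so H is
# Binomial(n, e^beta/(1+e^beta)) under mu_beta and Z(beta) = (1+e^beta)^n.
rng = np.random.default_rng(4)
n = 50
def sampler(beta, m):
    p = np.exp(beta) / (1 + np.exp(beta))
    return rng.binomial(n, p, size=m).astype(float)

betas = np.linspace(0.0, 1.0, 11)
q_hat = log_partition_ratio(sampler, betas, m=2000)
print(q_hat, n * (np.log(1 + np.exp(1.0)) - np.log(2)))  # estimate vs truth
```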
We compute the minimum distance of the parameterized code of order $1$ over an even cycle.
Robust Markov Decision Processes (RMDPs) are a widely used framework for sequential decision-making under parameter uncertainty. RMDPs have been extensively studied when the objective is to maximize the discounted return, but little is known about average optimality (optimizing the long-run average of the rewards obtained over time) and Blackwell optimality (remaining discount optimal for all discount factors sufficiently close to 1). In this paper, we prove several foundational results for RMDPs beyond the discounted return. We show that average optimal policies can be chosen stationary and deterministic for sa-rectangular RMDPs but, perhaps surprisingly, that history-dependent (Markovian) policies strictly outperform stationary policies for average optimality in s-rectangular RMDPs. We also study Blackwell optimality for sa-rectangular RMDPs, where we show that {\em approximate} Blackwell optimal policies always exist, although Blackwell optimal policies may not. We also provide a sufficient condition for their existence, which encompasses virtually all examples from the literature. We then discuss the connection between average and Blackwell optimality, and we describe several algorithms to compute the optimal average return. Interestingly, our approach leverages the connections between RMDPs and stochastic games.
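For context, here is a sketch of standard discounted robust value iteration for an sa-rectangular RMDP, using a finite uncertainty set per state-action pair as a simple special case; average and Blackwell optimality concern the behaviour of such solutions as the discount factor $\gamma$ approaches 1.

```python
# Discounted robust value iteration, sa-rectangular, finite uncertainty
# sets: P has shape (S, A, K, S), giving K candidate kernels per (s, a).
import numpy as np

def robust_value_iteration(P, r, gamma, iters=500):
    S, A, K, _ = P.shape
    V = np.zeros(S)
    for _ in range(iters):
        worst = (P @ V).min(axis=2)              # adversarial kernel, (S, A)
        V = (r + gamma * worst).max(axis=1)      # greedy over actions
    policy = (r + gamma * (P @ V).min(axis=2)).argmax(axis=1)
    return V, policy

rng = np.random.default_rng(5)
S, A, K = 4, 2, 3
P = rng.dirichlet(np.ones(S), size=(S, A, K))    # candidate transition kernels
r = rng.uniform(0, 1, (S, A))
V, pi = robust_value_iteration(P, r, gamma=0.99)
print(V, pi)
```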
We show how to determine a sufficiently large sample size for a $K$-armed randomized experiment in order to estimate conditional counterfactual expectations in data-driven subgroups. The subgroups can be output by any feature-space partitioning algorithm, including those defined by binning users with similar predictive scores or by a learned policy tree. After carefully specifying the inference target, a minimum confidence level, and a maximum margin of error, the key is to recast the original goal as a simultaneous inference problem, in which the sample size recommended to offset the increased possibility of estimation error is directly related to the number of inferences to be conducted. Given a fixed sample size budget, our result also lets us invert the question to one about the feasible number of treatment arms or partition complexity (e.g., the number of decision tree leaves). Using policy trees to learn the subgroups, we evaluate our nominal guarantees on a large publicly available randomized experiment test data set.
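The core calculation can be sketched as follows, assuming the textbook Bonferroni-adjusted normal-quantile bound; the paper's guarantee has this simultaneous-inference form, though its exact constants may differ.

```python
# Per-cell sample size from (confidence level, margin of error, number of
# simultaneous inferences), then the inverted budget question.
import numpy as np
from scipy.stats import norm

def sample_size_per_cell(sigma, moe, alpha, n_inferences):
    """Smallest n with z_{1 - alpha/(2*n_inferences)} * sigma / sqrt(n) <= moe."""
    z = norm.ppf(1 - alpha / (2 * n_inferences))
    return int(np.ceil((z * sigma / moe) ** 2))

# Example: K = 3 arms and G = 8 policy-tree leaves => 24 simultaneous CIs.
K, G = 3, 8
n_cell = sample_size_per_cell(sigma=1.0, moe=0.1, alpha=0.05, n_inferences=K * G)
print(n_cell, "per cell;", n_cell * K * G, "total")

# Inverted question: with a fixed budget, how many leaves are feasible?
budget = 200_000
feasible_G = max(g for g in range(1, 200)
                 if sample_size_per_cell(1.0, 0.1, 0.05, K * g) * K * g <= budget)
print("max leaves:", feasible_G)
```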