We consider an infinite family of exponents $e(l,k)$ with two parameters, $l$ and $k$, and derive sufficient conditions for $e(l,k)$ to be 0-APN over $\mathbb{F}_{2^n}$. These conditions allow us, for each choice of $l$ and $k$, to generate an infinite list of dimensions $n$ for which $x^{e(l,k)}$ is 0-APN far more efficiently than is possible in general. We observe that the Gold and Inverse exponents, as well as the inverses of the Gold exponents, can be expressed in the form $e(l,k)$ for suitable $l$ and $k$. We characterize all cases in which $e(l,k)$ can be cyclotomic equivalent to a representative from the Gold, Kasami, Welch, Niho, and Inverse families of exponents. We characterize when $e(l,k)$ can lie in the same cyclotomic coset as the Dobbertin exponent (without considering inverses) and provide computational data showing that the Dobbertin inverse is never equivalent to $e(l,k)$. We computationally test the APN-ness of $e(l,k)$ for small values of $l$ and $k$ over $\mathbb{F}_{2^n}$ for $n \le 100$, and sketch the limits to which such tests can be performed using currently available technology. We conclude that there are no APN monomials among the tested functions outside of known classes.
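As a brief illustration of this kind of computational test, the sketch below brute-forces the 0-APN property of a power function $x^e$ over $\mathbb{F}_{2^n}$, using the standard characterization that $x^e$ is 0-APN if and only if $x^e + (x+1)^e + 1 = 0$ has no roots outside $\mathbb{F}_2$. The field size, modulus polynomial, and tested exponent are illustrative choices, not parameters from the paper.

```python
# Brute-force 0-APN test for x^e over GF(2^n), a minimal sketch.
# The modulus (the AES polynomial for n = 8) and the exponent below
# are illustrative choices, not taken from the paper.

def gf_mul(a, b, mod, n):
    """Carry-less multiplication modulo an irreducible polynomial."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> n:          # reduce as soon as the degree reaches n
            a ^= mod
    return r

def gf_pow(a, e, mod, n):
    """Square-and-multiply exponentiation in GF(2^n)."""
    r = 1
    while e:
        if e & 1:
            r = gf_mul(r, a, mod, n)
        a = gf_mul(a, a, mod, n)
        e >>= 1
    return r

def is_zero_apn(e, mod, n):
    """x^e is 0-APN iff x^e + (x+1)^e + 1 = 0 has no roots outside GF(2)."""
    for x in range(2, 1 << n):  # skip 0 and 1; x + 1 in GF(2^n) is x ^ 1
        if gf_pow(x, e, mod, n) ^ gf_pow(x ^ 1, e, mod, n) ^ 1 == 0:
            return False
    return True

if __name__ == "__main__":
    n, mod = 8, 0b100011011        # GF(2^8) with the AES polynomial
    print(is_zero_apn(3, mod, n))  # Gold exponent 2^1 + 1; expected True
```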
Likelihood-free inference involves inferring parameter values given observed data and a simulator model. The simulator is computer code which takes parameters, performs stochastic calculations, and outputs simulated data. In this work, we view the simulator as a function whose inputs are (1) the parameters and (2) a vector of pseudo-random draws. We attempt to infer all these inputs conditional on the observations. This is challenging, as the resulting posterior can be high dimensional and involve strong dependence. We approximate the posterior using normalizing flows, a flexible parametric family of densities. Training data is generated by likelihood-free importance sampling with a large bandwidth value $\epsilon$, which makes the target similar to the prior. The training data is "distilled" by using it to train an updated normalizing flow. The process is iterated, using the updated flow as the importance sampling proposal and slowly reducing $\epsilon$ so that the target becomes closer to the posterior. Unlike most other likelihood-free methods, we avoid the need to reduce the data to low dimensional summary statistics, and hence can achieve more accurate results. We illustrate our method on two challenging examples, from queueing and epidemiology.
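The iterate-and-distill loop can be made concrete with a toy sketch. Below, a Gaussian density stands in for the normalizing flow, a one-parameter toy simulator stands in for the real model, and, for brevity, only the parameter is distilled, whereas the method described above also infers the pseudo-random draws; everything here is an illustrative assumption, not the paper's implementation.

```python
# Toy sketch of the distilled importance-sampling loop described above.
import numpy as np

rng = np.random.default_rng(0)
y_obs = 2.0                                   # observed data (toy scalar)

def simulator(theta, u):
    return theta + u                          # toy model: data = parameter + noise

def fit_gaussian(x, w):
    """'Distill' weighted samples into an updated proposal (mean, std)."""
    mu = np.average(x, weights=w)
    sd = np.sqrt(np.average((x - mu) ** 2, weights=w)) + 1e-8
    return mu, sd

mu, sd = 0.0, 5.0                             # initial proposal = broad prior
for eps in [5.0, 2.0, 1.0, 0.5, 0.1]:         # slowly shrink the bandwidth
    theta = rng.normal(mu, sd, size=5000)     # draws from the current "flow"
    u = rng.normal(size=5000)                 # pseudo-random simulator inputs
    dist = np.abs(simulator(theta, u) - y_obs)
    prior = np.exp(-0.5 * (theta / 5.0) ** 2)            # N(0, 5^2), unnormalized
    prop = np.exp(-0.5 * ((theta - mu) / sd) ** 2) / sd  # current proposal density
    w = prior * np.exp(-0.5 * (dist / eps) ** 2) / prop  # importance weights
    mu, sd = fit_gaussian(theta, w)           # distill into the next proposal
print(mu, sd)                                 # approximate posterior mean and sd
```

Note that the pseudo-random draws `u` cancel out of the importance weights because the proposal and target share the same distribution over them; a full implementation would distill a joint flow over parameters and draws.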
Behavioral cloning (BC) can recover a good policy from abundant expert data, but may fail when expert data is insufficient. This paper considers a situation where, besides the small amount of expert data, a supplementary dataset is available, which can be collected cheaply from sub-optimal policies. Imitation learning with a supplementary dataset is an emerging practical framework, but its theoretical foundation remains under-developed. To advance understanding, we first investigate a direct extension of BC, called NBCU, that learns from the union of all available data. Our analysis shows that, although NBCU suffers an imitation gap that is larger than that of BC in the worst case, there exist special cases where NBCU performs as well as or better than BC. This discovery implies that noisy data can also be helpful if utilized carefully. Therefore, we further introduce a discriminator-based importance sampling technique to re-weight the supplementary data, yielding the WBCU method. With our newly developed landscape-based analysis, we prove that WBCU can outperform BC under mild conditions. Empirical studies show that WBCU achieves the best performance on two challenging tasks where prior state-of-the-art methods fail.
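A minimal sketch of the discriminator-based re-weighting idea follows (my own toy construction in the spirit of WBCU, not the paper's algorithm): a logistic discriminator is trained to separate expert from supplementary state-action pairs, the supplementary data is re-weighted by the odds $d/(1-d)$, and a weighted behavioral cloning objective is then fit. The data and the linear policy class are illustrative assumptions.

```python
# Sketch of discriminator-based re-weighting for behavioral cloning.
import numpy as np

rng = np.random.default_rng(1)
expert = rng.normal(1.0, 0.3, size=(50, 2))        # few expert (s, a) pairs
supp = rng.normal(0.4, 0.8, size=(500, 2))         # cheap sub-optimal pairs

# 1) Train a logistic discriminator: expert = 1, supplementary = 0.
X = np.vstack([expert, supp])
y = np.concatenate([np.ones(len(expert)), np.zeros(len(supp))])
Xb = np.hstack([X, np.ones((len(X), 1))])          # add a bias column
w = np.zeros(3)
for _ in range(2000):                              # plain gradient ascent
    p = 1.0 / (1.0 + np.exp(-Xb @ w))
    w += 0.1 * Xb.T @ (y - p) / len(y)

# 2) Importance weights d/(1-d) estimate (expert density)/(supp density).
d = 1.0 / (1.0 + np.exp(-np.hstack([supp, np.ones((len(supp), 1))]) @ w))
iw = d / (1.0 - d + 1e-8)

# 3) Weighted behavioral cloning: fit action a from state s by weighted
#    least squares over expert data (weight 1) plus re-weighted supp data.
s = np.concatenate([expert[:, 0], supp[:, 0]])
a = np.concatenate([expert[:, 1], supp[:, 1]])
ww = np.concatenate([np.ones(len(expert)), iw])
S = np.stack([s, np.ones_like(s)], axis=1)
theta = np.linalg.solve(S.T @ (ww[:, None] * S), S.T @ (ww * a))
print(theta)                                       # linear policy a ~ theta[0]*s + theta[1]
```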
A dynamic graph algorithm is a data structure that answers queries about a property of the current graph while supporting graph modifications such as edge insertions and deletions. Prior work has shown strong conditional lower bounds for general dynamic graphs, yet graph families that arise in practice often exhibit structural properties that the existing lower bound constructions do not possess. We study three specific graph families that are ubiquitous, namely constant-degree graphs, power-law graphs, and expander graphs, and give the first conditional lower bounds for them. Our results show that, even when restricting attention to one of these graph classes, no algorithm for fundamental graph problems such as distance computation, distance approximation, or maximum matching can simultaneously achieve sub-polynomial update time and query time. For example, we show that the same lower bounds as for general graphs hold for maximum matching and ($s,t$)-distance in constant-degree graphs, power-law graphs, or expanders. Namely, in an $m$-edge graph, there exists no dynamic algorithm with both $O(m^{1/2 - \epsilon})$ update time and $O(m^{1 -\epsilon})$ query time, for any small $\epsilon > 0$. Note that for ($s,t$)-distance the trivial dynamic algorithm achieves an almost matching upper bound of constant update time and $O(m)$ query time. We prove similar bounds for the other graph families and for other fundamental problems such as densest subgraph detection and perfect matching.
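For concreteness, the "trivial" dynamic algorithm mentioned for ($s,t$)-distance is easy to write down: store the edge set with constant-time insertions and deletions, and answer each query with a fresh $O(m)$ breadth-first search. The sketch below is my own rendering of this folklore baseline, not code from the paper.

```python
# The trivial dynamic (s,t)-distance algorithm referenced above:
# O(1) expected-time updates (hash-set adjacency) and O(m) queries (BFS).
from collections import defaultdict, deque

class TrivialDynamicDistance:
    def __init__(self):
        self.adj = defaultdict(set)

    def insert(self, u, v):          # O(1) expected
        self.adj[u].add(v)
        self.adj[v].add(u)

    def delete(self, u, v):          # O(1) expected
        self.adj[u].discard(v)
        self.adj[v].discard(u)

    def distance(self, s, t):        # O(m) via breadth-first search
        dist, queue = {s: 0}, deque([s])
        while queue:
            x = queue.popleft()
            if x == t:
                return dist[x]
            for y in self.adj[x]:
                if y not in dist:
                    dist[y] = dist[x] + 1
                    queue.append(y)
        return float("inf")

g = TrivialDynamicDistance()
g.insert(1, 2); g.insert(2, 3); g.insert(1, 3)
g.delete(1, 3)
print(g.distance(1, 3))              # 2
```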
Likelihood-free inference methods typically make use of a distance between simulated and real data. A common example is the maximum mean discrepancy (MMD), which has previously been used for approximate Bayesian computation, minimum distance estimation, generalised Bayesian inference, and within the nonparametric learning framework. The MMD is commonly estimated at a root-$m$ rate, where $m$ is the number of simulated samples. This can lead to significant computational challenges since a large $m$ is required to obtain an accurate estimate, which is crucial for parameter estimation. In this paper, we propose a novel estimator for the MMD with significantly improved sample complexity. The estimator is particularly well suited for computationally expensive smooth simulators with low- to mid-dimensional inputs. This claim is supported through both theoretical results and an extensive simulation study on benchmark simulators.
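For reference, the standard root-$m$ baseline discussed above is the unbiased U-statistic estimator of the squared MMD; the sketch below shows this baseline with a Gaussian kernel (the paper's improved estimator is different, and the kernel and data here are illustrative choices).

```python
# Standard unbiased (U-statistic) estimator of MMD^2 with a Gaussian kernel,
# which converges at the root-m rate discussed above. This is the baseline
# only; the paper's improved estimator is different.
import numpy as np

def gaussian_kernel(x, y, lengthscale=1.0):
    d2 = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / (2.0 * lengthscale ** 2))

def mmd2_unbiased(x, y, lengthscale=1.0):
    m, n = len(x), len(y)
    kxx = gaussian_kernel(x, x, lengthscale)
    kyy = gaussian_kernel(y, y, lengthscale)
    kxy = gaussian_kernel(x, y, lengthscale)
    # Remove diagonal terms so each within-sample sum is an unbiased U-statistic.
    term_x = (kxx.sum() - np.trace(kxx)) / (m * (m - 1))
    term_y = (kyy.sum() - np.trace(kyy)) / (n * (n - 1))
    return term_x + term_y - 2.0 * kxy.mean()

rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=(200, 2))   # "observed" data
sim = rng.normal(0.3, 1.0, size=(200, 2))    # simulator output
print(mmd2_unbiased(real, sim))
```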
We consider the problem of estimating the optimal transport map between two probability distributions $P$ and $Q$ in $\mathbb{R}^d$, on the basis of i.i.d. samples. All existing statistical analyses of this problem require the assumption that the transport map is Lipschitz, a strong requirement that, in particular, excludes any examples where the transport map is discontinuous. As a first step towards developing estimation procedures for discontinuous maps, we consider the important special case where the data distribution $Q$ is a discrete measure supported on a finite number of points in $\mathbb{R}^d$. We study a computationally efficient estimator initially proposed by Pooladian and Niles-Weed (2021), based on entropic optimal transport, and show in the semi-discrete setting that it converges at the minimax-optimal rate $n^{-1/2}$, independent of dimension. Other standard map estimation techniques both lack finite-sample guarantees in this setting and provably suffer from the curse of dimensionality. We confirm these results in numerical experiments, and provide experiments in other settings not covered by our theory, which suggest that the entropic estimator is a promising methodology for other discontinuous transport map estimation problems.
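The entropic estimator of Pooladian and Niles-Weed (2021) admits a compact sketch: run Sinkhorn iterations between the two samples, then map each source point to the barycentric projection of the entropic coupling. The regularization level, sample sizes, and toy semi-discrete target below are illustrative assumptions.

```python
# Sketch of the entropic map estimator: Sinkhorn + barycentric projection.
import numpy as np

def sinkhorn(x, y, eps=0.5, iters=500):
    """Entropic OT between uniform empirical measures on x and y."""
    c = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)  # squared cost
    k = np.exp(-c / eps)
    a = np.full(len(x), 1.0 / len(x))
    b = np.full(len(y), 1.0 / len(y))
    v = np.ones(len(y))
    for _ in range(iters):
        u = a / (k @ v)
        v = b / (k.T @ u)
    return u[:, None] * k * v[None, :]        # entropic coupling pi

def entropic_map(x, y, pi):
    """Barycentric projection: T(x_i) = sum_j pi_ij y_j / sum_j pi_ij."""
    return (pi @ y) / pi.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
x = rng.normal(size=(300, 2))                 # samples from P
y = rng.choice([-1.0, 1.0], size=300)[:, None] * np.ones((300, 2))  # discrete Q
pi = sinkhorn(x, y)
print(entropic_map(x, y, pi)[:3])             # estimated map at three points
```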
Programming education should aim to provide students with a broad range of skills that they will later use while developing software. An important aspect of this is the ability to write code that is not only correct but also of high quality. Unfortunately, this is difficult to control in the setting of a massive open online course. In this paper, we carry out an analysis of the code quality of submissions from JetBrains Academy, a platform for studying programming in an industry-like, project-based setting with an embedded code quality assessment tool called Hyperstyle. We analyzed more than a million Java submissions and more than 1.3 million Python submissions, studied the most prevalent types of code quality issues, and examined the dynamics of how students fix them. We provide several case studies of different issues, as well as an analysis of why certain issues remain unfixed even after several attempts. We also studied abnormally long sequences of submissions in which students attempted to fix code quality issues after already passing the task. Our results point the way towards improving online courses, for example by ensuring that tasks themselves do not incentivize students to write poor-quality code.
We consider the posets of equivalence relations on finite sets under the standard embedding ordering and under the consecutive embedding ordering. In the latter case, the relations are also assumed to have an underlying linear order, which governs consecutive embeddings. For each poset we ask the well-quasi-order and atomicity decidability questions: given finitely many equivalence relations $\rho_1,\dots,\rho_k$, is the downward closed set Av$(\rho_1,\dots,\rho_k)$, consisting of all equivalence relations which do not contain any of $\rho_1,\dots,\rho_k$: (a) well-quasi-ordered, meaning that it contains no infinite antichains? and (b) atomic, meaning that it is not a union of two proper downward closed subsets, or, equivalently, that it satisfies the joint embedding property?
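The two orderings can be contrasted with a small brute-force containment check. In the sketch below (an illustrative construction, with my own block-partition representation), a relation embeds in the standard sense whenever some subset induces an isomorphic pattern, while the consecutive sense additionally requires the image to be an interval of the underlying linear order.

```python
# Brute-force embedding tests for the two orderings described above.
from itertools import combinations

def induced(blocks, points, ordered):
    """Pattern induced by restricting a partition to `points`."""
    labels = {p: i for i, b in enumerate(blocks) for p in b}
    pat = [labels[p] for p in sorted(points)]
    if ordered:                              # linear order matters
        seen = {}
        return tuple(seen.setdefault(c, len(seen)) for c in pat)
    return tuple(sorted(pat.count(c) for c in set(pat)))  # block sizes only

def embeds(small, big, consecutive=False):
    k = sum(len(b) for b in small)
    n = sum(len(b) for b in big)
    target = induced(small, range(k), consecutive)
    windows = ([range(i, i + k) for i in range(n - k + 1)] if consecutive
               else combinations(range(n), k))
    return any(induced(big, w, consecutive) == target for w in windows)

rho = [[0, 1], [2]]                # a 2-element class and a singleton
sigma = [[0, 3], [1], [2]]         # 0 ~ 3, but not consecutively
print(embeds(rho, sigma))                    # True: standard embedding exists
print(embeds(rho, sigma, consecutive=True))  # False: no consecutive copy
```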
In this paper, we analyze an operator splitting scheme for the nonlinear heat equation in $\Omega\subset\mathbb{R}^d$ ($d\geq 1$): $\partial_t u = \Delta u + \lambda |u|^{p-1} u$ in $\Omega\times(0,\infty)$, $u=0$ on $\partial\Omega\times(0,\infty)$, $u({\bf x},0)=\phi({\bf x})$ in $\Omega$, where $\lambda\in\{-1,1\}$ and $\phi \in W^{1,q}(\Omega)\cap L^{\infty}(\Omega)$ with $2\leq p < \infty$ and $d(p-1)/2<q<\infty$. We establish the well-posedness of the approximation of $u$ in the $L^r$-space ($r\geq q$), and furthermore we derive a convergence rate of order $\mathcal{O}(\tau)$ in the time step $\tau>0$. Finally, we give some numerical examples to confirm the reliability of the analysis.
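A splitting scheme of this type alternates a diffusion substep with the exact flow of the reaction ODE $u' = \lambda|u|^{p-1}u$, which has the closed form $u(t) = \mathrm{sign}(u_0)\big(|u_0|^{1-p} - \lambda(p-1)t\big)^{-1/(p-1)}$. The sketch below is a minimal one-dimensional Lie splitting with backward-Euler diffusion (my own discretization choices, not the paper's scheme); with $\lambda = -1$ the reaction flow is globally defined.

```python
# Minimal 1D Lie splitting for u_t = u_xx + lam*|u|**(p-1)*u, u = 0 on the
# boundary. Grid, step size, and initial datum are illustrative choices.
import numpy as np

lam, p = -1.0, 3.0                 # absorbing nonlinearity, p = 3
N, T, tau = 200, 0.1, 1e-3         # interior grid points, final time, step
x = np.linspace(0.0, 1.0, N + 2)   # grid including the two boundary nodes
h = x[1] - x[0]
u = np.sin(np.pi * x[1:-1])        # initial datum phi on the interior nodes

# Backward-Euler diffusion substep: solve (I - tau * Laplacian) u_new = u.
A = np.eye(N) * (1 + 2 * tau / h**2)
A -= np.eye(N, k=1) * (tau / h**2)
A -= np.eye(N, k=-1) * (tau / h**2)

def nonlinear_step(u, dt):
    """Exact flow of u' = lam*|u|**(p-1)*u (sign-preserving closed form)."""
    out = np.zeros_like(u)
    mask = u != 0.0
    base = np.abs(u[mask]) ** (1 - p) - lam * (p - 1) * dt
    out[mask] = np.sign(u[mask]) * base ** (-1.0 / (p - 1))
    return out

for _ in range(round(T / tau)):
    u = np.linalg.solve(A, u)      # diffusion substep
    u = nonlinear_step(u, tau)     # reaction substep
print(u.max())                     # decays faster than pure diffusion
```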
In this paper, we apply the median-of-means principle to derive robust versions of local averaging rules in non-parametric regression. For various estimators, including nearest neighbor and kernel procedures, we obtain non-asymptotic exponential inequalities under only a second moment assumption on the noise. We then establish a corresponding lower bound on tail probabilities, showing that these bounds cannot be significantly improved.
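The median-of-means construction is direct to instantiate, for example for $k$-nearest-neighbor regression: split the sample into blocks, compute the local average within each block, and report the median of the block estimates. The sketch below illustrates this on heavy-tailed noise with a finite second moment; all constants are illustrative.

```python
# Median-of-means k-NN regression, a minimal sketch of the robustified
# local averaging rules described above.
import numpy as np

def mom_knn_predict(x0, X, Y, k=5, blocks=10, rng=None):
    """Split the data into blocks, run k-NN in each, return the median."""
    if rng is None:
        rng = np.random.default_rng(0)
    idx = rng.permutation(len(X))
    estimates = []
    for part in np.array_split(idx, blocks):
        d = np.linalg.norm(X[part] - x0, axis=1)
        nearest = part[np.argsort(d)[:k]]
        estimates.append(Y[nearest].mean())   # block-wise k-NN local average
    return np.median(estimates)               # robust aggregation over blocks

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(2000, 1))
noise = rng.standard_t(df=2.1, size=2000)     # heavy-tailed, finite 2nd moment
Y = np.sin(3 * X[:, 0]) + 0.5 * noise
print(mom_knn_predict(np.array([0.2]), X, Y)) # should be near sin(0.6) ~ 0.56
```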
Convergence (virtual) bidding is an important part of two-settlement electric power markets, as it can effectively reduce discrepancies between the day-ahead and real-time markets. Consequently, there is extensive research into the bidding strategies of virtual participants aiming to obtain optimal bids to submit to the day-ahead market. In this paper, we introduce a general price-based stochastic optimization framework to obtain optimal convergence bid curves. Within this framework, we develop a computationally tractable linear programming-based optimization model, which produces bid prices and volumes simultaneously. We also show that different approximations and simplifications in the general model lead naturally to state-of-the-art convergence bidding approaches, such as self-scheduling and opportunistic bidding. Our general framework also provides a straightforward way to compare the performance of these models, which is demonstrated by numerical experiments on the California (CAISO) market.
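A toy scenario-based linear program conveys the flavor of such a model (a sketch under made-up prices and limits, modeling only INC (virtual supply) bids, and not the paper's formulation): choose volumes at candidate bid price levels to maximize the expected day-ahead minus real-time spread profit, subject to a volume cap.

```python
# Toy scenario-based LP for convergence (virtual) bidding; all inputs are
# made-up numbers, and only INC (virtual supply) bids are modeled.
import numpy as np
from scipy.optimize import linprog

da = np.array([42.0, 55.0, 38.0, 60.0])   # day-ahead price scenarios ($/MWh)
rt = np.array([40.0, 50.0, 45.0, 52.0])   # matching real-time prices
w = np.full(4, 0.25)                      # scenario probabilities
bid_prices = np.array([35.0, 45.0, 55.0]) # candidate INC bid price levels
v_max = 100.0                             # total volume limit (MWh)

# An INC bid at price p clears in scenario s iff da[s] >= p, earning
# (da[s] - rt[s]) per MWh; expected profit is linear in the volumes.
clears = da[None, :] >= bid_prices[:, None]            # (levels, scenarios)
exp_profit = (clears * (da - rt)[None, :] * w).sum(axis=1)

res = linprog(c=-exp_profit,                           # maximize => minimize -c
              A_ub=np.ones((1, len(bid_prices))), b_ub=[v_max],
              bounds=[(0, None)] * len(bid_prices))
print(res.x, -res.fun)   # optimal volume at each price level, expected profit
```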