In this paper we analyze the $L_2$ error of neural network regression estimates with one hidden layer. Under the assumption that the Fourier transform of the regression function decays suitably fast, we show that an estimate, where all initial weights are chosen according to proper uniform distributions and where the weights are learned by gradient descent, achieves a rate of convergence of $1/\sqrt{n}$ (up to a logarithmic factor). Our statistical analysis implies that the key aspect behind this result is the proper choice of the initial inner weights and the adjustment of the outer weights via gradient descent. This indicates that we can also simply use linear least squares to choose the outer weights. We prove a corresponding theoretical result and compare our new linear least squares neural network estimate with standard neural network estimates via simulated data. Our simulations show that our theoretical considerations lead to an estimate with improved performance in many cases.
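As a rough illustration of the second estimator suggested above (a sketch only, not the paper's exact construction), the snippet below draws the inner weights and biases of a one-hidden-layer network from uniform distributions, keeps them fixed, and fits only the outer weights by linear least squares; the network width, the ranges of the uniform distributions, and the sigmoid activation are assumptions made for the example.

```python
import numpy as np

def lls_network_estimate(x, y, width=50, inner_range=5.0, seed=0):
    """Fit a one-hidden-layer network: random (fixed) inner weights,
    outer weights chosen by linear least squares."""
    rng = np.random.default_rng(seed)
    n, d = x.shape
    # Inner weights and biases drawn from uniform distributions (illustrative ranges).
    W = rng.uniform(-inner_range, inner_range, size=(d, width))
    b = rng.uniform(-inner_range, inner_range, size=width)

    def features(z):
        h = 1.0 / (1.0 + np.exp(-(z @ W + b)))       # sigmoid hidden layer
        return np.hstack([h, np.ones((len(z), 1))])  # add an output bias column

    beta, *_ = np.linalg.lstsq(features(x), y, rcond=None)  # outer weights
    return lambda z: features(z) @ beta

# Example: regression on noisy observations of a smooth function.
rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, size=(200, 1))
y = np.sin(3 * x[:, 0]) + 0.1 * rng.standard_normal(200)
f_hat = lls_network_estimate(x, y)
```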
Permutation pattern-avoidance is a central concept in both enumerative and extremal combinatorics. In this paper we study the effect of permutation pattern-avoidance on the complexity of optimization problems. In the context of the dynamic optimality conjecture (Sleator, Tarjan, STOC 1983), Chalermsook, Goswami, Kozma, Mehlhorn, and Saranurak (FOCS 2015) conjectured that the amortized access cost of an optimal binary search tree (BST) is $O(1)$ whenever the access sequence avoids some fixed pattern. They showed a bound of $2^{\alpha(n)^{O(1)}}$, which was recently improved to $2^{\alpha(n)(1+o(1))}$ by Chalermsook, Pettie, and Yingchareonthawornchai (2023); here $n$ is the BST size and $\alpha(\cdot)$ the inverse-Ackermann function. In this paper we resolve the conjecture, showing a tight $O(1)$ bound. This indicates a barrier to dynamic optimality: any candidate online BST (e.g., splay trees or greedy trees) must match this optimum, but current analysis techniques only give superconstant bounds. More broadly, we argue that the easiness of pattern-avoiding input is a general phenomenon, not limited to BSTs or even to data structures. To illustrate this, we show that when the input avoids an arbitrary, fixed, a priori unknown pattern, one can efficiently compute a $k$-server solution of $n$ requests from a unit interval, with total cost $n^{O(1/\log k)}$, in contrast to the worst-case $\Theta(n/k)$ bound; and a traveling salesman tour of $n$ points from a unit box, of length $O(\log{n})$, in contrast to the worst-case $\Theta(\sqrt{n})$ bound; similar results hold for the Euclidean minimum spanning tree, Steiner tree, and nearest-neighbor graphs. We show both results to be tight. Our techniques build on the Marcus-Tardos proof of the Stanley-Wilf conjecture, and on the recently emerging concept of twin-width; we believe our techniques to be more generally applicable.
A myriad of approaches have been proposed to characterise the mesoscale structure of networks - most often as a partition based on patterns variously called communities, blocks, or clusters. Clearly, distinct methods designed to detect different types of patterns may provide a variety of answers to the network's mesoscale structure. Yet, even multiple runs of a given method can sometimes yield diverse and conflicting results, producing entire landscapes of partitions which potentially include multiple (locally optimal) mesoscale explanations of the network. Such ambiguity motivates a closer look at the ability of these methods to find multiple qualitatively different 'ground truth' partitions in a network. Here, we propose the stochastic cross-block model (SCBM), a generative model which allows two distinct partitions to be built into the mesoscale structure of a single benchmark network. We demonstrate a use case of the benchmark model by appraising the power of stochastic block models (SBMs) to detect implicitly planted coexisting bi-community and core-periphery structures of different strengths. Given our model design and experimental set-up, we find that the ability to detect the two partitions individually varies by SBM variant and that the coexistence of both partitions is recovered only in a very limited number of cases. Our findings suggest that in most instances only one structure - one that dominates in some way - can be detected, even in the presence of other partitions. They underline the need to consider entire landscapes of partitions when different competing explanations exist and motivate future research to advance partition coexistence detection methods. Our model also contributes to the field of benchmark networks more generally by enabling further exploration of the ability of new and existing methods to detect ambiguity in the mesoscale structure of networks.
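To make the benchmark idea concrete, here is a heavily simplified and hypothetical sketch in the spirit of a model that plants two partitions at once (it is not the SCBM itself): the edge probability of each node pair combines a two-community structure under one partition with a core-periphery structure under a second, independent partition. All function names, probabilities, and the mixing weight are illustrative assumptions.

```python
import numpy as np

def two_partition_benchmark(n=200, p_comm=(0.10, 0.02), p_core=(0.10, 0.05, 0.01),
                            mix=0.5, seed=0):
    """Toy benchmark with two planted partitions (illustrative, not the SCBM).

    Partition A: two communities (high within-, low between-block probability).
    Partition B: core-periphery (core-core > core-periphery > periphery-periphery).
    Edge probabilities are a convex combination of the two structures."""
    rng = np.random.default_rng(seed)
    comm = rng.integers(0, 2, size=n)           # partition A labels
    core = (rng.random(n) < 0.3).astype(int)    # partition B labels (1 = core)

    same_comm = comm[:, None] == comm[None, :]
    p_a = np.where(same_comm, p_comm[0], p_comm[1])

    core_pairs = core[:, None] + core[None, :]  # 2: core-core, 1: mixed, 0: periphery
    p_b = np.choose(core_pairs, [p_core[2], p_core[1], p_core[0]])

    p = mix * p_a + (1 - mix) * p_b
    upper = np.triu(rng.random((n, n)) < p, k=1)
    adj = upper | upper.T                       # symmetric adjacency, no self-loops
    return adj, comm, core

adj, comm_labels, core_labels = two_partition_benchmark()
```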
The fundamental problem in the toxicity detection task is that toxicity itself is ill-defined. This forces models to be trained on subjective and vague data, which results in non-robust and inaccurate models: garbage in, garbage out. This work suggests a new, stress-level-based definition of toxicity designed to be objective and context-aware. Alongside this definition, we also describe possible ways of applying it to dataset creation and model training.
We show how the relatively initial or relatively terminal fixed points for a well-behaved functor $F$ form a pair of adjoint functors between $F$-coalgebras and $F$-algebras. We use the language of locally presentable categories to find sufficient conditions for the existence of this adjunction. We show that relative fixed points may be characterized as (co)equalizers of the free (co)monad on $F$. In particular, when $F$ is a polynomial functor on $\mathsf{Set}$ the relative fixed points are a quotient or subset of the free term algebra or the cofree term coalgebra. We give examples of relative fixed points for polynomial functors, one of which is the Sierpinski carpet. Lastly, we prove a general preservation result for relative fixed points.
As an optical processor, a Diffractive Deep Neural Network (D2NN) utilizes engineered diffractive surfaces designed through machine learning to perform all-optical information processing, completing its tasks at the speed of light propagation through thin optical layers. With sufficient degrees-of-freedom, D2NNs can perform arbitrary complex-valued linear transformations using spatially coherent light. Similarly, D2NNs can also perform arbitrary linear intensity transformations with spatially incoherent illumination; however, under spatially incoherent light, these transformations are non-negative, acting on diffraction-limited optical intensity patterns at the input field-of-view (FOV). Here, we expand the use of spatially incoherent D2NNs to complex-valued information processing for executing arbitrary complex-valued linear transformations using spatially incoherent light. Through simulations, we show that as the number of optimized diffractive features increases beyond a threshold dictated by the product of the input and output space-bandwidth products, a spatially incoherent diffractive visual processor can approximate any complex-valued linear transformation and be used for all-optical image encryption using incoherent illumination. The findings are important for the all-optical processing of information under natural light using various forms of diffractive surface-based optical processors.
For over two decades Internet access has been a topic of research and debate. Up-to-date evidence about key predictors such as age is important considering not only the complexities of access to the online medium but also the ever-changing nature of the Internet. This paper attempts to provide a stocktake of current trends in Internet access in New Zealand and their association with age. It relies on secondary analysis of data from a larger online panel survey of 1,001 adult users. The chi-square test of independence and Cramer's V were used for the analysis. A key finding uncovers an emerging gap in the quality of Internet access. While fibre is the predominant type of broadband connection at home, older adults are significantly less likely to have it, and more likely to adopt wireless broadband. Also, a large majority across all age groups have a positive view of the Internet. This view was even more common among older adults, who, interestingly, were slightly more likely to say that their concern about the security of their personal details online had increased in the last year. The implications of the results are discussed and some directions for future research are proposed.
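For readers unfamiliar with the reported analysis, the following sketch shows how a chi-square test of independence and a Cramer's V effect size can be computed for a cross-tabulation of age group against connection type; the counts in the table are made up for illustration and are not the survey's data.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical contingency table: rows = age groups, columns = connection types
# (fibre, wireless broadband, other). The counts are illustrative only.
table = np.array([
    [120, 40, 30],   # 18-44
    [100, 55, 35],   # 45-64
    [ 60, 80, 45],   # 65+
])

chi2, p_value, dof, expected = chi2_contingency(table)

# Cramer's V: effect size for the association between the two variables.
n = table.sum()
min_dim = min(table.shape) - 1
cramers_v = np.sqrt(chi2 / (n * min_dim))

print(f"chi2={chi2:.2f}, p={p_value:.4f}, dof={dof}, Cramer's V={cramers_v:.3f}")
```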
Centralised techniques for task allocation face fundamental challenges in large-scale systems: the number of interactions they can handle is limited by resource constraints on computation, storage, and network communication. Scalability can be increased by implementing the system as a distributed task-allocation system that shares tasks across many agents. However, distribution also increases the resource cost of communication and synchronisation, which is itself difficult to scale. In this paper we present four algorithms to address these problems. Their combination enables each agent to improve its task-allocation strategy through reinforcement learning, while adjusting how much it explores the system according to how close to optimal it believes its current strategy is, given its past experience. We focus on distributed agent systems in which agents' behaviours are constrained by resource-usage limits, restricting them to local rather than system-wide knowledge. We evaluate the algorithms in a simulated environment in which agents receive a task composed of multiple subtasks that must be allocated to other agents with differing capabilities, which then carry out those subtasks. We also simulate real-life system effects such as network instability. Within the system configurations considered, our solution solves the task allocation problem to within 6.7% of the theoretical optimum. It provides 5x better performance recovery than approaches without knowledge retention when system connectivity is impaired, and is tested on systems of up to 100 agents with less than a 9% impact on the algorithms' performance.
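As a rough illustration of the kind of mechanism described above (this is not one of the paper's four algorithms), the sketch below shows an agent that learns, bandit-style, which peer to allocate a subtask to, and that scales its exploration rate by how unsettled its current strategy looks, here approximated by the gap between its best and average action values. All names, constants, and the confidence heuristic are hypothetical.

```python
import random
from collections import defaultdict

class AdaptiveAllocator:
    """Toy agent: learns which peer to allocate a subtask to, exploring more
    when its current strategy does not yet clearly favour one peer."""
    def __init__(self, peers, alpha=0.1, eps_min=0.05, eps_max=0.5):
        self.peers = peers
        self.q = defaultdict(float)          # (subtask_type, peer) -> estimated value
        self.alpha = alpha
        self.eps_min, self.eps_max = eps_min, eps_max

    def exploration_rate(self, subtask_type):
        values = [self.q[(subtask_type, p)] for p in self.peers]
        spread = max(values) - sum(values) / len(values)
        # Small spread -> the agent is unsure which peer is best -> explore more.
        confidence = min(1.0, spread)
        return self.eps_max - (self.eps_max - self.eps_min) * confidence

    def choose_peer(self, subtask_type):
        if random.random() < self.exploration_rate(subtask_type):
            return random.choice(self.peers)
        return max(self.peers, key=lambda p: self.q[(subtask_type, p)])

    def update(self, subtask_type, peer, reward):
        key = (subtask_type, peer)
        self.q[key] += self.alpha * (reward - self.q[key])

# Usage: allocate a 'compute' subtask, observe the outcome, and learn from it.
agent = AdaptiveAllocator(peers=["a1", "a2", "a3"])
peer = agent.choose_peer("compute")
agent.update("compute", peer, reward=1.0)
```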
In this paper we develop a novel neural network model for predicting the implied volatility surface. Prior financial domain knowledge is taken into account. A new activation function that incorporates the volatility smile is proposed and used for the hidden nodes that process the underlying asset price. In addition, financial conditions, such as the absence of arbitrage, the boundaries, and the asymptotic slope, are embedded into the loss function. This is one of the very first studies to discuss a methodological framework that incorporates prior financial domain knowledge into neural network architecture design and model training. The proposed model outperforms the benchmark models on S&P 500 index option data spanning 20 years. More importantly, the domain knowledge is satisfied empirically, showing that the model is consistent with existing financial theories and conditions related to the implied volatility surface.
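The following sketch illustrates, in a heavily simplified and hypothetical form, the two ingredients mentioned above: a smile-shaped activation on the branch that processes moneyness, and a soft penalty in the loss that discourages calendar-spread arbitrage (total implied variance decreasing in maturity). The architecture, the activation, and the penalty are assumptions made for illustration, not the paper's actual design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SmileActivation(nn.Module):
    """Hypothetical convex, 'smile-shaped' activation (smooth approximation of |x|)."""
    def forward(self, x):
        return F.softplus(x) + F.softplus(-x)

class IVNet(nn.Module):
    """Small network mapping (moneyness, maturity) to a positive implied volatility."""
    def __init__(self, hidden=32):
        super().__init__()
        self.moneyness_branch = nn.Sequential(nn.Linear(1, hidden), SmileActivation())
        self.maturity_branch = nn.Sequential(nn.Linear(1, hidden), nn.Softplus())
        self.head = nn.Linear(2 * hidden, 1)

    def forward(self, m, t):
        h = torch.cat([self.moneyness_branch(m), self.maturity_branch(t)], dim=-1)
        return F.softplus(self.head(h))          # keep the predicted volatility positive

def loss_fn(model, m, t, iv_obs, lam=1.0, dt=0.05):
    mse = torch.mean((model(m, t) - iv_obs) ** 2)
    # Soft no-calendar-arbitrage penalty: total implied variance w = iv^2 * t
    # should be non-decreasing in maturity; penalise violations on the batch.
    w_now = model(m, t) ** 2 * t
    w_later = model(m, t + dt) ** 2 * (t + dt)
    penalty = torch.mean(torch.relu(w_now - w_later))
    return mse + lam * penalty
```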
This paper does not describe a working system. Instead, it presents a single idea about representation which allows advances made by several different groups to be combined into an imaginary system called GLOM. The advances include transformers, neural fields, contrastive representation learning, distillation and capsules. GLOM answers the question: How can a neural network with a fixed architecture parse an image into a part-whole hierarchy which has a different structure for each image? The idea is simply to use islands of identical vectors to represent the nodes in the parse tree. If GLOM can be made to work, it should significantly improve the interpretability of the representations produced by transformer-like systems when applied to vision or language.
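As a toy illustration of the 'islands of identical vectors' idea (GLOM itself is not implemented here, and this sketch is purely hypothetical), the snippet below groups adjacent locations whose part-level embedding vectors nearly agree; each resulting island would stand for one node of the parse tree at that level.

```python
import numpy as np

def find_islands(vectors, tol=1e-3):
    """Group consecutive locations whose embedding vectors (one vector per
    location, all at a single level of the hierarchy) are nearly identical.

    Returns a list of (start, end) index ranges; each range is one 'island'."""
    islands, start = [], 0
    for i in range(1, len(vectors)):
        if np.linalg.norm(vectors[i] - vectors[i - 1]) > tol:
            islands.append((start, i))
            start = i
    islands.append((start, len(vectors)))
    return islands

# Six locations with 4-dimensional part-level vectors: two islands of agreement.
v = np.array([[1, 0, 0, 0]] * 3 + [[0, 1, 0, 0]] * 3, dtype=float)
print(find_islands(v))   # [(0, 3), (3, 6)]
```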
Recent advances in 3D fully convolutional networks (FCN) have made it feasible to produce dense voxel-wise predictions of volumetric images. In this work, we show that a multi-class 3D FCN trained on manually labeled CT scans of several anatomical structures (ranging from the large organs to thin vessels) can achieve competitive segmentation results, while avoiding the need for handcrafting features or training class-specific models. To this end, we propose a two-stage, coarse-to-fine approach that first uses a 3D FCN to roughly define a candidate region, which is then used as input to a second 3D FCN. This reduces the number of voxels the second FCN has to classify to ~10% and allows it to focus on more detailed segmentation of the organs and vessels. We utilize training and validation sets consisting of 331 clinical CT images and test our models on a completely unseen data collection acquired at a different hospital that includes 150 CT scans, targeting three anatomical organs (liver, spleen, and pancreas). In challenging organs such as the pancreas, our cascaded approach improves the mean Dice score from 68.5% to 82.2%, achieving the highest reported average score on this dataset. We compare with a 2D FCN method on a separate dataset of 240 CT scans with 18 classes and achieve significantly higher performance in small organs and vessels. Furthermore, we explore fine-tuning our models to different datasets. Our experiments illustrate the promise and robustness of current 3D FCN based semantic segmentation of medical images, achieving state-of-the-art results. Our code and trained models are available for download: //github.com/holgerroth/3Dunet_abdomen_cascade.
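To sketch how such a coarse-to-fine cascade can be wired together (schematically; the two networks are left as placeholder callables rather than the paper's trained 3D FCNs), the snippet below thresholds the first-stage prediction into a candidate region, crops the volume around it with a margin, runs the second stage on the crop only, and pastes the refined result back into the full volume.

```python
import numpy as np

def cascade_segment(volume, stage1, stage2, margin=8, threshold=0.5):
    """Two-stage coarse-to-fine segmentation (schematic).

    stage1, stage2: placeholder callables mapping a 3D volume to a voxel-wise
    probability map of the same shape."""
    coarse = stage1(volume)                      # coarse candidate-region prediction
    mask = coarse > threshold
    if not mask.any():
        return coarse                            # nothing found; keep the coarse output

    # Bounding box of the candidate region, padded by a safety margin.
    idx = np.argwhere(mask)
    lo = np.maximum(idx.min(axis=0) - margin, 0)
    hi = np.minimum(idx.max(axis=0) + margin + 1, volume.shape)

    crop = volume[lo[0]:hi[0], lo[1]:hi[1], lo[2]:hi[2]]
    refined = np.zeros_like(coarse)
    refined[lo[0]:hi[0], lo[1]:hi[1], lo[2]:hi[2]] = stage2(crop)
    return refined

# Usage with dummy stand-ins for the two FCNs.
dummy_fcn = lambda v: (v > v.mean()).astype(float)
volume = np.random.rand(64, 64, 64)
segmentation = cascade_segment(volume, dummy_fcn, dummy_fcn)
```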