This paper studies the prediction task of tensor-on-tensor regression in which both covariates and responses are multi-dimensional arrays (a.k.a., tensors) across time with arbitrary tensor order and data dimension. Existing methods either focused on linear models without accounting for possibly nonlinear relationships between covariates and responses, or directly employed black-box deep learning algorithms that failed to utilize the inherent tensor structure. In this work, we propose a Factor Augmented Tensor-on-Tensor Neural Network (FATTNN) that integrates tensor factor models into deep neural networks. We begin with summarizing and extracting useful predictive information (represented by the ``factor tensor'') from the complex structured tensor covariates, and then proceed with the prediction task using the estimated factor tensor as input of a temporal convolutional neural network. The proposed methods effectively handle nonlinearity between complex data structures, and improve over traditional statistical models and conventional deep learning approaches in both prediction accuracy and computational cost. By leveraging tensor factor models, our proposed methods exploit the underlying latent factor structure to enhance the prediction, and in the meantime, drastically reduce the data dimensionality that speeds up the computation. The empirical performances of our proposed methods are demonstrated via simulation studies and real-world applications to three public datasets. Numerical results show that our proposed algorithms achieve substantial increases in prediction accuracy and significant reductions in computational time compared to benchmark methods.
We study push-based sampling and transmission policies for a status update system consisting of a general finite-state continuous-time Markov chain (CTMC) information source with known dynamics, with the goal of minimizing the average age of incorrect information (AoII). The problem setting we investigate involves an exponentially distributed delay channel for transmissions and a constraint on the average sampling rate. We first show that the optimum sampling and transmission policy is a 'multi-threshold policy', where the thresholds depend on both the estimation value and the state of the original process, and sampling and transmission need to be initiated when the instantaneous AoII exceeds the corresponding threshold, called the estimation- and state-aware transmission (ESAT) policy. Subsequently, we formulate the problem of finding the thresholds as a constrained semi-Markov decision process (CSMDP) and the Lagrangian approach. Additionally, we propose two lower complexity sub-optimum policies, namely the estimation-aware transmission (EAT) policy, and the single-threshold (ST) policy, for which it is possible to obtain these thresholds for CTMCs with relatively larger number of states. The underlying CSMDP formulation relies on the 'multi-regime phase-type' (MRPH) distribution which is a generalization of the well-known phase-type distribution, which allows us to obtain the distribution of time until absorption in a CTMC whose transition rates change with respect to time in a piece-wise manner. The effectiveness of the proposed ESAT, EAT and ST sampling and transmission policies are shown through numerical examples, along with comparisons with a baseline scheme that transmits packets according to a Poisson process in out-of-sync periods.
Image search and retrieval tasks can perpetuate harmful stereotypes, erase cultural identities, and amplify social disparities. Current approaches to mitigate these representational harms balance the number of retrieved items across population groups defined by a small number of (often binary) attributes. However, most existing methods overlook intersectional groups determined by combinations of group attributes, such as gender, race, and ethnicity. We introduce Multi-Group Proportional Representation (MPR), a novel metric that measures representation across intersectional groups. We develop practical methods for estimating MPR, provide theoretical guarantees, and propose optimization algorithms to ensure MPR in retrieval. We demonstrate that existing methods optimizing for equal and proportional representation metrics may fail to promote MPR. Crucially, our work shows that optimizing MPR yields more proportional representation across multiple intersectional groups specified by a rich function class, often with minimal compromise in retrieval accuracy.
This paper revisits the simple, long-studied, yet still unsolved problem of making image classifiers robust to imperceptible perturbations. Taking CIFAR10 as an example, SOTA clean accuracy is about $100$%, but SOTA robustness to $\ell_{\infty}$-norm bounded perturbations barely exceeds $70$%. To understand this gap, we analyze how model size, dataset size, and synthetic data quality affect robustness by developing the first scaling laws for adversarial training. Our scaling laws reveal inefficiencies in prior art and provide actionable feedback to advance the field. For instance, we discovered that SOTA methods diverge notably from compute-optimal setups, using excess compute for their level of robustness. Leveraging a compute-efficient setup, we surpass the prior SOTA with $20$% ($70$%) fewer training (inference) FLOPs. We trained various compute-efficient models, with our best achieving $74$% AutoAttack accuracy ($+3$% gain). However, our scaling laws also predict robustness slowly grows then plateaus at $90$%: dwarfing our new SOTA by scaling is impractical, and perfect robustness is impossible. To better understand this predicted limit, we carry out a small-scale human evaluation on the AutoAttack data that fools our top-performing model. Concerningly, we estimate that human performance also plateaus near $90$%, which we show to be attributable to $\ell_{\infty}$-constrained attacks' generation of invalid images not consistent with their original labels. Having characterized limiting roadblocks, we outline promising paths for future research.
Semantic segmentation of road elements in 2D images is a crucial task in the recognition of some static objects such as lane lines and free space. In this paper, we propose DHSNet,which extracts the objects features with a end-to-end architecture along with a heatmap proposal. Deformable convolutions are also utilized in the proposed network. The DHSNet finely combines low-level feature maps with high-level ones by using upsampling operators as well as downsampling operators in a U-shape manner. Besides, DHSNet also aims to capture static objects of various shapes and scales. We also predict a proposal heatmap to detect the proposal points for more accurate target aiming in the network.
This paper suggests a statistical framework for describing the relations between the physical and conceptual entities of a brain-like model. Features and concept instances are put into context, where the paper suggests that features may be the electrical wiring, although chemical connections are also possible. With this idea, the actual length of the connection is important, because it is related to firing rates and neuron synchronization, but the signal type is less important. The paper then suggests that concepts are neuron groups that link feature sets and concept instances are determined by chemical signals from those groups. Therefore, features become the static horizontal framework of the neural system and concepts are vertically interconnected combinations of these. With regards to functionality, the neuron is then considered to be functional and the more horizontal memory structures can be glial. This would also suggest that features can be distributed entities and not concentrated to a single area.
Recent studies suggest that with sufficiently wide models, most SGD solutions can, up to permutation, converge into the same basin. This phenomenon, known as the model re-basin regime, has significant implications for model averaging by ensuring the linear mode connectivity. However, current re-basin strategies are ineffective in many scenarios due to a lack of comprehensive understanding of underlying mechanisms. Addressing this gap, this paper provides novel insights into understanding and improving the standard practice. Firstly, we decompose re-normalization into rescaling and reshift, uncovering that rescaling plays a crucial role in re-normalization while re-basin performance is sensitive to shifts in model activation. The finding calls for a more nuanced handling of the activation shift. Secondly, we identify that the merged model suffers from the issue of activation collapse and magnitude collapse. Varying the learning rate, weight decay, and initialization method can mitigate the issues and improve model performance. Lastly, we propose a new perspective to unify the re-basin and pruning, under which a lightweight yet effective post-pruning technique is derived, which can significantly improve the model performance after pruning. Our implementation is available at //github.com/XingyuQu/rethink-re-basin.
Rank-rank regressions are widely used in economic research to evaluate phenomena such as intergenerational income persistence or mobility. However, when covariates are incorporated to capture between-group persistence, the resulting coefficients can be difficult to interpret as such. We propose the conditional rank-rank regression, which uses conditional ranks instead of unconditional ranks, to measure average within-group income persistence. This property is analogous to that of the unconditional rank-rank regression that measures the overall income persistence. The difference between conditional and unconditional rank-rank regression coefficients therefore can measure between-group persistence. We develop a flexible estimation approach using distribution regression and establish a theoretical framework for large sample inference. An empirical study on intergenerational income mobility in Switzerland demonstrates the advantages of this approach. The study reveals stronger intergenerational persistence between fathers and sons compared to fathers and daughters, with the within-group persistence explaining 62% of the overall income persistence for sons and 52% for daughters. Families of small size or with highly educated fathers exhibit greater persistence in passing on their economic status.
Generative commonsense reasoning which aims to empower machines to generate sentences with the capacity of reasoning over a set of concepts is a critical bottleneck for text generation. Even the state-of-the-art pre-trained language generation models struggle at this task and often produce implausible and anomalous sentences. One reason is that they rarely consider incorporating the knowledge graph which can provide rich relational information among the commonsense concepts. To promote the ability of commonsense reasoning for text generation, we propose a novel knowledge graph augmented pre-trained language generation model KG-BART, which encompasses the complex relations of concepts through the knowledge graph and produces more logical and natural sentences as output. Moreover, KG-BART can leverage the graph attention to aggregate the rich concept semantics that enhances the model generalization on unseen concept sets. Experiments on benchmark CommonGen dataset verify the effectiveness of our proposed approach by comparing with several strong pre-trained language generation models, particularly KG-BART outperforms BART by 5.80, 4.60, in terms of BLEU-3, 4. Moreover, we also show that the generated context by our model can work as background scenarios to benefit downstream commonsense QA tasks.
Knowledge graph embedding, which aims to represent entities and relations as low dimensional vectors (or matrices, tensors, etc.), has been shown to be a powerful technique for predicting missing links in knowledge graphs. Existing knowledge graph embedding models mainly focus on modeling relation patterns such as symmetry/antisymmetry, inversion, and composition. However, many existing approaches fail to model semantic hierarchies, which are common in real-world applications. To address this challenge, we propose a novel knowledge graph embedding model---namely, Hierarchy-Aware Knowledge Graph Embedding (HAKE)---which maps entities into the polar coordinate system. HAKE is inspired by the fact that concentric circles in the polar coordinate system can naturally reflect the hierarchy. Specifically, the radial coordinate aims to model entities at different levels of the hierarchy, and entities with smaller radii are expected to be at higher levels; the angular coordinate aims to distinguish entities at the same level of the hierarchy, and these entities are expected to have roughly the same radii but different angles. Experiments demonstrate that HAKE can effectively model the semantic hierarchies in knowledge graphs, and significantly outperforms existing state-of-the-art methods on benchmark datasets for the link prediction task.
Multi-relation Question Answering is a challenging task, due to the requirement of elaborated analysis on questions and reasoning over multiple fact triples in knowledge base. In this paper, we present a novel model called Interpretable Reasoning Network that employs an interpretable, hop-by-hop reasoning process for question answering. The model dynamically decides which part of an input question should be analyzed at each hop; predicts a relation that corresponds to the current parsed results; utilizes the predicted relation to update the question representation and the state of the reasoning process; and then drives the next-hop reasoning. Experiments show that our model yields state-of-the-art results on two datasets. More interestingly, the model can offer traceable and observable intermediate predictions for reasoning analysis and failure diagnosis, thereby allowing manual manipulation in predicting the final answer.