A recurring challenge in the application of redistricting simulation algorithms lies in extracting useful summaries and comparisons from a large ensemble of districting plans. Researchers often compute summary statistics for each district in a plan, and then study their distribution across the plans in the ensemble. This approach discards rich geographic information that is inherent in districting plans. We introduce the projective average, an operation that projects a district-level summary statistic back to the underlying geography and then averages this statistic across plans in the ensemble. Compared to traditional district-level summaries, projective averages are a powerful tool for geographically granular, sub-district analysis of districting plans along a variety of dimensions. However, care must be taken to account for variation within redistricting ensembles, to avoid misleading conclusions. We propose and validate a multiple-testing procedure to control the probability of incorrectly identifying outlier plans or regions when using projective averages.
For approximate inference in the generalized quadratic equations model, many state-of-the-art algorithms lack any prior knowledge of the target signal structure, exhibits slow convergence, and can not handle any analytic prior knowledge of the target signal structure. So, this paper proposes a new algorithm, Quadratic Message passing (QMP). QMP has a complexity as low as $O(N^{3})$. The SE derived for QMP can capture precisely the per-iteration behavior of the simulated algorithm. Simulation results confirm QMP outperforms many state-of-the-art algorithms.
The recently proposed soft finite element method (SoftFEM) reduces the stiffness (condition numbers), consequently improving the overall approximation accuracy. The method subtracts a least-square term that penalizes the gradient jumps across mesh interfaces from the FEM stiffness bilinear form while maintaining the system's coercivity. Herein, we present two generalizations for SoftFEM that aim to improve the approximation accuracy and further reduce the discrete systems' stiffness. Firstly and most naturally, we generalize SoftFEM by adding a least-square term to the mass bilinear form. Superconvergent results of rates $h^6$ and $h^8$ for eigenvalues are established for linear uniform elements; $h^8$ is the highest order of convergence known in the literature. Secondly, we generalize SoftFEM by applying the blended Gaussian-type quadratures. We demonstrate further reductions in stiffness compared to traditional FEM and SoftFEM. The coercivity and analysis of the optimal error convergences follow the work of SoftFEM. Thus, this paper focuses on the numerical study of these generalizations. For linear and uniform elements, analytical eigenpairs, exact eigenvalue errors, and superconvergent error analysis are established. Various numerical examples demonstrate the potential of generalized SoftFEMs for spectral approximation, particularly in high-frequency regimes.
The accuracy and complexity of machine learning algorithms based on kernel optimization are determined by the set of kernels over which they are able to optimize. An ideal set of kernels should: admit a linear parameterization (for tractability); be dense in the set of all kernels (for robustness); be universal (for accuracy). Recently, a framework was proposed for using positive matrices to parameterize a class of positive semi-separable kernels. Although this class can be shown to meet all three criteria, previous algorithms for optimization of such kernels were limited to classification and furthermore relied on computationally complex Semidefinite Programming (SDP) algorithms. In this paper, we pose the problem of learning semiseparable kernels as a minimax optimization problem and propose a SVD-QCQP primal-dual algorithm which dramatically reduces the computational complexity as compared with previous SDP-based approaches. Furthermore, we provide an efficient implementation of this algorithm for both classification and regression -- an implementation which enables us to solve problems with 100 features and up to 30,000 datums. Finally, when applied to benchmark data, the algorithm demonstrates the potential for significant improvement in accuracy over typical (but non-convex) approaches such as Neural Nets and Random Forest with similar or better computation time.
Benchmarking heuristic algorithms is vital to understand under which conditions and on what kind of problems certain algorithms perform well. In most current research into heuristic optimization algorithms, only a very limited number of scenarios, algorithm configurations and hyper-parameter settings are explored, leading to incomplete and often biased insights and results. This paper presents a novel approach we call explainable benchmarking. Introducing the IOH-Xplainer software framework, for analyzing and understanding the performance of various optimization algorithms and the impact of their different components and hyper-parameters. We showcase the framework in the context of two modular optimization frameworks. Through this framework, we examine the impact of different algorithmic components and configurations, offering insights into their performance across diverse scenarios. We provide a systematic method for evaluating and interpreting the behaviour and efficiency of iterative optimization heuristics in a more transparent and comprehensible manner, allowing for better benchmarking and algorithm design.
We explore the sampling problem within the framework where parallel evaluations of the gradient of the log-density are feasible. Our investigation focuses on target distributions characterized by smooth and strongly log-concave densities. We revisit the parallelized randomized midpoint method and employ proof techniques recently developed for analyzing its purely sequential version. Leveraging these techniques, we derive upper bounds on the Wasserstein distance between the sampling and target densities. These bounds quantify the runtime improvement achieved by utilizing parallel processing units, which can be considerable.
By relaxing conditions for natural structure learning algorithms, a family of constraint-based algorithms containing all exact structure learning algorithms under the faithfulness assumption, we define localised natural structure learning algorithms (LoNS). We also provide a set of necessary and sufficient assumptions for consistency of LoNS, which can be thought of as a strict relaxation of the restricted faithfulness assumption. We provide a practical LoNS algorithm that runs in exponential time, which is then compared with related existing structure learning algorithms, namely PC/SGS and the relatively recent Sparsest Permutation algorithm. Simulation studies are also provided.
Artificial intelligence workloads, especially transformer models, exhibit emergent sparsity in which computations perform selective sparse access to dense data. The workloads are inefficient on hardware designed for dense computations and do not map well onto sparse data representations. We build a vectorized and parallel matrix-multiplication system A X B = C that eliminates unnecessary computations and avoids branches based on a runtime evaluation of sparsity. We use a combination of dynamic code lookup to adapt to the specific sparsity encoded in the B matrix and preprocessing of sparsity maps of the A and B matrices to compute conditional branches once for the whole computation. For a wide range of sparsity, from 60% to 95% zeros, our implementation performs fewer instructions and increases performance when compared with Intel MKL's dense or sparse matrix multiply routines. Benefits can be as large as 2 times speedup and 4 times fewer instructions.
Humans perceive the world by concurrently processing and fusing high-dimensional inputs from multiple modalities such as vision and audio. Machine perception models, in stark contrast, are typically modality-specific and optimised for unimodal benchmarks, and hence late-stage fusion of final representations or predictions from each modality (`late-fusion') is still a dominant paradigm for multimodal video classification. Instead, we introduce a novel transformer based architecture that uses `fusion bottlenecks' for modality fusion at multiple layers. Compared to traditional pairwise self-attention, our model forces information between different modalities to pass through a small number of bottleneck latents, requiring the model to collate and condense the most relevant information in each modality and only share what is necessary. We find that such a strategy improves fusion performance, at the same time reducing computational cost. We conduct thorough ablation studies, and achieve state-of-the-art results on multiple audio-visual classification benchmarks including Audioset, Epic-Kitchens and VGGSound. All code and models will be released.
Cold-start problems are long-standing challenges for practical recommendations. Most existing recommendation algorithms rely on extensive observed data and are brittle to recommendation scenarios with few interactions. This paper addresses such problems using few-shot learning and meta learning. Our approach is based on the insight that having a good generalization from a few examples relies on both a generic model initialization and an effective strategy for adapting this model to newly arising tasks. To accomplish this, we combine the scenario-specific learning with a model-agnostic sequential meta-learning and unify them into an integrated end-to-end framework, namely Scenario-specific Sequential Meta learner (or s^2 meta). By doing so, our meta-learner produces a generic initial model through aggregating contextual information from a variety of prediction tasks while effectively adapting to specific tasks by leveraging learning-to-learn knowledge. Extensive experiments on various real-world datasets demonstrate that our proposed model can achieve significant gains over the state-of-the-arts for cold-start problems in online recommendation. Deployment is at the Guess You Like session, the front page of the Mobile Taobao.
Benefit from the quick development of deep learning techniques, salient object detection has achieved remarkable progresses recently. However, there still exists following two major challenges that hinder its application in embedded devices, low resolution output and heavy model weight. To this end, this paper presents an accurate yet compact deep network for efficient salient object detection. More specifically, given a coarse saliency prediction in the deepest layer, we first employ residual learning to learn side-output residual features for saliency refinement, which can be achieved with very limited convolutional parameters while keep accuracy. Secondly, we further propose reverse attention to guide such side-output residual learning in a top-down manner. By erasing the current predicted salient regions from side-output features, the network can eventually explore the missing object parts and details which results in high resolution and accuracy. Experiments on six benchmark datasets demonstrate that the proposed approach compares favorably against state-of-the-art methods, and with advantages in terms of simplicity, efficiency (45 FPS) and model size (81 MB).