Variational regularization is commonly used to solve linear inverse problems, and involves augmenting a data fidelity by a regularizer. The regularizer is used to promote a priori information and is weighted by a regularization parameter. Selection of an appropriate regularization parameter is critical, with various choices leading to very different reconstructions. Classical strategies used to determine a suitable parameter value include the discrepancy principle and the L-curve criterion, and in recent years a supervised machine learning approach called bilevel learning has been employed. Bilevel learning is a powerful framework to determine optimal parameters and involves solving a nested optimization problem. While previous strategies enjoy various theoretical results, the well-posedness of bilevel learning in this setting is still an open question. In particular, a necessary property is positivity of the determined regularization parameter. In this work, we provide a new condition that better characterizes positivity of optimal regularization parameters than the existing theory. Numerical results verify and explore this new condition for both small and high-dimensional problems.
Synthetic data is essential for assessing clustering techniques, complementing and extending real data, and allowing for more complete coverage of a given problem's space. In turn, synthetic data generators have the potential of creating vast amounts of data -- a crucial activity when real-world data is at premium -- while providing a well-understood generation procedure and an interpretable instrument for methodically investigating cluster analysis algorithms. Here, we present Clugen, a modular procedure for synthetic data generation, capable of creating multidimensional clusters supported by line segments using arbitrary distributions. Clugen is open source, comprehensively unit tested and documented, and is available for the Python, R, Julia, and MATLAB/Octave ecosystems. We demonstrate that our proposal can produce rich and varied results in various dimensions, is fit for use in the assessment of clustering algorithms, and has the potential to be a widely used framework in diverse clustering-related research tasks.
We extend the theory of locally checkable labeling problems (LCLs) from the classical LOCAL model to a number of other models that have been studied recently, including the quantum-LOCAL model, finitely-dependent processes, non-signaling model, dynamic-LOCAL model, and online-LOCAL model [e.g. STOC 2024, ICALP 2023]. First, we demonstrate the advantage that finitely-dependent processes have over the classical LOCAL model. We show that all LCL problems solvable with locality $O(\log^* n)$ in the LOCAL model admit a finitely-dependent distribution (with constant locality). In particular, this gives a finitely-dependent coloring for regular trees, answering an open question by Holroyd [2023]. This also introduces a new formal barrier for understanding the distributed quantum advantage: it is not possible to exclude quantum advantage for any LCL in the $\Theta(\log^* n)$ complexity class by using non-signaling arguments. Second, we put limits on the capabilities of all of these models. To this end, we introduce a model called randomized online-LOCAL, which is strong enough to simulate e.g. SLOCAL and dynamic-LOCAL, and we show that it is also strong enough to simulate any non-signaling distribution and hence any quantum-LOCAL algorithm. We prove the following result for trees: if we can solve an LCL problem with locality $o(\log^{(5)} n)$ in the randomized online-LOCAL model, we can solve it with locality $O(\log^* n)$ in the classical deterministic LOCAL model. Put together, these results show that in trees the set of LCLs that can be solved with locality $O(\log^* n)$ is the same across all these models: locality $O(\log^* n)$ in quantum-LOCAL, non-signaling model, dynamic-LOCAL, or online-LOCAL is not stronger than locality $O(\log^* n)$ in the classical deterministic LOCAL model.
We prove lower bounds on the error of any estimator for the mean of a real probability distribution under the knowledge that the distribution belongs to a given set. We apply these lower bounds both to parametric and nonparametric estimation. In the nonparametric case, we apply our results to the question of sub-Gaussian estimation for distributions with finite variance to obtain new lower bounds in the small error probability regime, and present an optimal estimator in that regime. In the (semi-)parametric case, we use the Fisher information to provide distribution-dependent lower bounds that are constant-tight asymptotically, of order $\sqrt{2\log(1/\delta)/(nI)}$ where $I$ is the Fisher information of the distribution. We use known minimizers of the Fisher information on some nonparametric set of distributions to give lower bounds in cases such as corrupted distributions, or bounded/semi-bounded distributions.
Survival models capture the relationship between an accumulating hazard and the occurrence of a singular event stimulated by that accumulation. When the model for the hazard is sufficiently flexible survival models can accommodate a wide range of behaviors. If the hazard model is less flexible, for example when it is constrained by an external physical process, then the resulting survival model can be much too rigid. In this paper I introduce a modified survival model that generalizes the relationship between accumulating hazard and event occurrence with particular emphasis on capturing thresholding behavior. Finally I demonstrate the utility of this approach on a physiological application.
Mixture models are often used to identify meaningful subpopulations (i.e., clusters) in observed data such that the subpopulations have a real-world interpretation (e.g., as cell types). However, when used for subpopulation discovery, mixture model inference is usually ill-defined a priori because the assumed observation model is only an approximation to the true data-generating process. Thus, as the number of observations increases, rather than obtaining better inferences, the opposite occurs: the data is explained by adding spurious subpopulations that compensate for the shortcomings of the observation model. However, there are two important sources of prior knowledge that we can exploit to obtain well-defined results no matter the dataset size: known causal structure (e.g., knowing that the latent subpopulations cause the observed signal but not vice-versa) and a rough sense of how wrong the observation model is (e.g., based on small amounts of expert-labeled data or some understanding of the data-generating process). We propose a new model selection criteria that, while model-based, uses this available knowledge to obtain mixture model inferences that are robust to misspecification of the observation model. We provide theoretical support for our approach by proving a first-of-its-kind consistency result under intuitive assumptions. Simulation studies and an application to flow cytometry data demonstrate our model selection criteria consistently finds the correct number of subpopulations.
We consider the problem of inferring graph topology from smooth graph signals in a novel but practical scenario where data are located in distributed clients and prohibited from leaving local clients due to factors such as privacy concerns. The main difficulty in this task is how to exploit the potentially heterogeneous data of all clients under data silos. To this end, we first propose an auto-weighted multiple graph learning model to jointly learn a personalized graph for each local client and a single consensus graph for all clients. The personalized graphs match local data distributions, thereby mitigating data heterogeneity, while the consensus graph captures the global information. Moreover, the model can automatically assign appropriate contribution weights to local graphs based on their similarity to the consensus graph. We next devise a tailored algorithm to solve the induced problem, where all raw data are processed locally without leaving clients. Theoretically, we establish a provable estimation error bound and convergence analysis for the proposed model and algorithm. Finally, extensive experiments on synthetic and real data are carried out, and the results illustrate that our approach can learn graphs effectively in the target scenario.
2D-based Industrial Anomaly Detection has been widely discussed, however, multimodal industrial anomaly detection based on 3D point clouds and RGB images still has many untouched fields. Existing multimodal industrial anomaly detection methods directly concatenate the multimodal features, which leads to a strong disturbance between features and harms the detection performance. In this paper, we propose Multi-3D-Memory (M3DM), a novel multimodal anomaly detection method with hybrid fusion scheme: firstly, we design an unsupervised feature fusion with patch-wise contrastive learning to encourage the interaction of different modal features; secondly, we use a decision layer fusion with multiple memory banks to avoid loss of information and additional novelty classifiers to make the final decision. We further propose a point feature alignment operation to better align the point cloud and RGB features. Extensive experiments show that our multimodal industrial anomaly detection model outperforms the state-of-the-art (SOTA) methods on both detection and segmentation precision on MVTec-3D AD dataset. Code is available at //github.com/nomewang/M3DM.
Humans perceive the world by concurrently processing and fusing high-dimensional inputs from multiple modalities such as vision and audio. Machine perception models, in stark contrast, are typically modality-specific and optimised for unimodal benchmarks, and hence late-stage fusion of final representations or predictions from each modality (`late-fusion') is still a dominant paradigm for multimodal video classification. Instead, we introduce a novel transformer based architecture that uses `fusion bottlenecks' for modality fusion at multiple layers. Compared to traditional pairwise self-attention, our model forces information between different modalities to pass through a small number of bottleneck latents, requiring the model to collate and condense the most relevant information in each modality and only share what is necessary. We find that such a strategy improves fusion performance, at the same time reducing computational cost. We conduct thorough ablation studies, and achieve state-of-the-art results on multiple audio-visual classification benchmarks including Audioset, Epic-Kitchens and VGGSound. All code and models will be released.
Embedding entities and relations into a continuous multi-dimensional vector space have become the dominant method for knowledge graph embedding in representation learning. However, most existing models ignore to represent hierarchical knowledge, such as the similarities and dissimilarities of entities in one domain. We proposed to learn a Domain Representations over existing knowledge graph embedding models, such that entities that have similar attributes are organized into the same domain. Such hierarchical knowledge of domains can give further evidence in link prediction. Experimental results show that domain embeddings give a significant improvement over the most recent state-of-art baseline knowledge graph embedding models.
The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.