Treewidth (tw) is an important parameter that, when bounded, yields tractability for many problems. For example, graph problems expressible in Monadic Second Order (MSO) logic and QUANTIFIED SAT or, more generally, QUANTIFIED CSP, are FPT parameterized by the tw of the input's (primal) graph plus the length of the MSO-formula [Courcelle, Information & Computation 1990] and the quantifier rank [Chen, ECAI 2004], resp. The algorithms from these (meta-)results have running times whose dependence on tw is a tower of exponents. A conditional lower bound by Fichte et al. [LICS 2020] shows that, for QUANTIFIED SAT, the height of this tower is equal to the number of quantifier alternations. Lower bounds showing that at least double-exponential factors in the running time are necessary are rare: there are very few (for tw and vertex cover vc parameterizations) and they are for problems that are complete for #NP, $\Sigma_2^p$, $\Pi_2^p$, or higher levels of the polynomial hierarchy. We show, for the first time, that it is not necessary to go higher up in the polynomial hierarchy to obtain such lower bounds. We design a novel, yet simple versatile technique based on Sperner families to obtain such lower bounds and apply it to 3 problems: METRIC DIMENSION, STRONG METRIC DIMENSION, and GEODETIC SET. We prove that they do not admit $2^{2^{o(tw)}} \cdot n^{O(1)}$-time algorithms, even on bounded diameter graphs, unless the ETH fails. For STRONG METRIC DIMENSION, the lower bound holds even for vc. This is impossible for the other two as they admit $2^{O({vc}^2)} \cdot n^{O(1)}$-time algorithms. We show that, unless the ETH fails, they do not admit $2^{o({vc}^2)}\cdot n^{O(1)}$-time algorithms, which are also rare results. The lower bounds for the vc parameterizations also yield rare lower bounds on the vertex-kernel sizes of these problems. We complement our lower bounds with matching upper bounds.
Side-channel analysis has been proven effective at detecting hardware Trojans in integrated circuits (ICs). However, most detection techniques rely on large external probes and antennas for data collection and require a long measurement time to detect Trojans. Such limitations make these techniques impractical for run-time deployment and ineffective in detecting small Trojans with subtle side-channel signatures. To overcome these challenges, we propose a Programmable Sensor Array (PSA) for run-time hardware Trojan detection, localization, and identification. PSA is a tampering-resilient integrated on-chip magnetic field sensor array that can be re-programmed to change the sensors' shape, size, and location. Using PSA, EM side-channel measurement results collected from sensors at different locations on an IC can be analyzed to localize and identify the Trojan. The PSA has better performance than conventional external magnetic probes and state-of-the-art on-chip single-coil magnetic field sensors. We fabricated an AES-128 test chip with four AES Hardware Trojans. They were successfully detected, located, and identified with the proposed on-chip PSA within 10 milliseconds using our proposed cross-domain analysis.
This study presents a comprehensive overview of PIML techniques in the context of condition monitoring. The central concept driving PIML is the incorporation of known physical laws and constraints into machine learning algorithms, enabling them to learn from available data while remaining consistent with physical principles. Through fusing domain knowledge with data-driven learning, PIML methods offer enhanced accuracy and interpretability in comparison to purely data-driven approaches. In this comprehensive survey, detailed examinations are performed with regard to the methodology by which known physical principles are integrated within machine learning frameworks, as well as their suitability for specific tasks within condition monitoring. Incorporation of physical knowledge into the ML model may be realized in a variety of methods, with each having its unique advantages and drawbacks. The distinct advantages and limitations of each methodology for the integration of physics within data-driven models are detailed, considering factors such as computational efficiency, model interpretability, and generalizability to different systems in condition monitoring and fault detection. Several case studies and works of literature utilizing this emerging concept are presented to demonstrate the efficacy of PIML in condition monitoring applications. From the literature reviewed, the versatility and potential of PIML in condition monitoring may be demonstrated. Novel PIML methods offer an innovative solution for addressing the complexities of condition monitoring and associated challenges. This comprehensive survey helps form the foundation for future work in the field. As the technology continues to advance, PIML is expected to play a crucial role in enhancing maintenance strategies, system reliability, and overall operational efficiency in engineering systems.
Mapping speech tokens to the same feature space as text tokens has become the paradigm for the integration of speech modality into decoder-only large language models (LLMs). An alternative approach is to use an encoder-decoder architecture that incorporates speech features through cross-attention. This approach, however, has received less attention in the literature. In this work, we connect the Whisper encoder with ChatGLM3 and provide in-depth comparisons of these two approaches using Chinese automatic speech recognition (ASR) and name entity recognition (NER) tasks. We evaluate them not only by conventional metrics like the F1 score but also by a novel fine-grained taxonomy of ASR-NER errors. Our experiments reveal that encoder-decoder architecture outperforms decoder-only architecture with a short context, while decoder-only architecture benefits from a long context as it fully exploits all layers of the LLM. By using LLM, we significantly reduced the entity omission errors and improved the entity ASR accuracy compared to the Conformer baseline. Additionally, we obtained a state-of-the-art (SOTA) F1 score of 0.805 on the AISHELL-NER test set by using chain-of-thought (CoT) NER which first infers long-form ASR transcriptions and then predicts NER labels.
Tensor PCA is a stylized statistical inference problem introduced by Montanari and Richard to study the computational difficulty of estimating an unknown parameter from higher-order moment tensors. Unlike its matrix counterpart, Tensor PCA exhibits a statistical-computational gap, i.e., a sample size regime where the problem is information-theoretically solvable but conjectured to be computationally hard. This paper derives computational lower bounds on the run-time of memory bounded algorithms for Tensor PCA using communication complexity. These lower bounds specify a trade-off among the number of passes through the data sample, the sample size, and the memory required by any algorithm that successfully solves Tensor PCA. While the lower bounds do not rule out polynomial-time algorithms, they do imply that many commonly-used algorithms, such as gradient descent and power method, must have a higher iteration count when the sample size is not large enough. Similar lower bounds are obtained for Non-Gaussian Component Analysis, a family of statistical estimation problems in which low-order moment tensors carry no information about the unknown parameter. Finally, stronger lower bounds are obtained for an asymmetric variant of Tensor PCA and related statistical estimation problems. These results explain why many estimators for these problems use a memory state that is significantly larger than the effective dimensionality of the parameter of interest.
Cell-free massive multi-input multi-output (MIMO) has recently gained much attention for its potential in shaping the landscape of sixth-generation (6G) wireless systems. This paper proposes a hierarchical network architecture tailored for cell-free massive MIMO, seamlessly integrating co-located and distributed antennas. A central base station (CBS), equipped with an antenna array, positions itself near the center of the coverage area, complemented by distributed access points spanning the periphery. The proposed architecture remarkably outperforms conventional cell-free networks, demonstrating superior sum throughput while maintaining a comparable worst-case per-user spectral efficiency. Meanwhile, the implementation cost associated with the fronthaul network is substantially diminished.
Large language models (LLMs) with enormous pre-training tokens and parameters emerge diverse abilities, including math reasoning, code generation, and instruction following. These abilities are further enhanced by supervised fine-tuning (SFT). While the open-source community has explored ad-hoc SFT for enhancing individual capabilities, proprietary LLMs exhibit versatility across various skills. Therefore, understanding the facilitation of multiple abilities via SFT is paramount. In this study, we specifically focuses on the interplay of data composition between mathematical reasoning, code generation, and general human-aligning abilities during SFT. We propose four intriguing research questions to explore the association between model performance and various factors including data amount, composition ratio, model size and SFT strategies. Our experiments reveal that distinct capabilities scale differently and larger models generally show superior performance with same amount of data. Mathematical reasoning and code generation consistently improve with increasing data amount, whereas general abilities plateau after roughly a thousand samples. Moreover, we observe data composition appears to enhance various abilities under limited data conditions, yet can lead to performance conflicts when data is plentiful. Our findings also suggest the amount of composition data influences performance more than the composition ratio. In analysis of SFT strategies, we find that sequentially learning multiple skills risks catastrophic forgetting. Our proposed Dual-stage Mixed Fine-tuning (DMT) strategy offers a promising solution to learn multiple abilities with different scaling patterns.
Delay and Doppler ambiguities of comb reference signal patterns are investigated through time delay and Doppler shift detection using high-resolution sensing algorithms. Necessary conditions of designing comb RS patterns and synthesizing different reference signal patterns in general are derived under the goal of eliminating side peaks and preserving the best achievable ambiguity performance of OFDM signals for target detection.
We give a procedure for computing group-level $(\epsilon, \delta)$-DP guarantees for DP-SGD, when using Poisson sampling or fixed batch size sampling. Up to discretization errors in the implementation, the DP guarantees computed by this procedure are tight (assuming we release every intermediate iterate).
Answering questions that require reading texts in an image is challenging for current models. One key difficulty of this task is that rare, polysemous, and ambiguous words frequently appear in images, e.g., names of places, products, and sports teams. To overcome this difficulty, only resorting to pre-trained word embedding models is far from enough. A desired model should utilize the rich information in multiple modalities of the image to help understand the meaning of scene texts, e.g., the prominent text on a bottle is most likely to be the brand. Following this idea, we propose a novel VQA approach, Multi-Modal Graph Neural Network (MM-GNN). It first represents an image as a graph consisting of three sub-graphs, depicting visual, semantic, and numeric modalities respectively. Then, we introduce three aggregators which guide the message passing from one graph to another to utilize the contexts in various modalities, so as to refine the features of nodes. The updated nodes have better features for the downstream question answering module. Experimental evaluations show that our MM-GNN represents the scene texts better and obviously facilitates the performances on two VQA tasks that require reading scene texts.
We introduce a generic framework that reduces the computational cost of object detection while retaining accuracy for scenarios where objects with varied sizes appear in high resolution images. Detection progresses in a coarse-to-fine manner, first on a down-sampled version of the image and then on a sequence of higher resolution regions identified as likely to improve the detection accuracy. Built upon reinforcement learning, our approach consists of a model (R-net) that uses coarse detection results to predict the potential accuracy gain for analyzing a region at a higher resolution and another model (Q-net) that sequentially selects regions to zoom in. Experiments on the Caltech Pedestrians dataset show that our approach reduces the number of processed pixels by over 50% without a drop in detection accuracy. The merits of our approach become more significant on a high resolution test set collected from YFCC100M dataset, where our approach maintains high detection performance while reducing the number of processed pixels by about 70% and the detection time by over 50%.