This paper proposes a distributed algorithm for average consensus in a multi-agent system under a fixed bidirectional communication topology, in the presence of malicious agents (nodes) that may try to influence the average consensus outcome by manipulating their updates. The proposed algorithm converges asymptotically to the average of the initial values of the non-malicious nodes, which we refer to as the trustworthy average, as long as the underlying topology that describes the information exchange among the non-malicious nodes is connected. We first present a distributed iterative algorithm that assumes that each node receives (at each iteration or periodically) side information about the trustworthiness of the other nodes. Each node uses these trust assessments to determine whether or not to incorporate messages received from its neighbors, and to make proper adjustments in its calculation whenever a previously trustworthy neighbor becomes untrustworthy or vice versa. We show that, as long as the trust assessments for each non-malicious node eventually correctly reflect the status (malicious or non-malicious) of its neighboring nodes, the algorithm guarantees asymptotic convergence to the trustworthy average. We subsequently discuss how the proposed algorithm can be enhanced with functionality that enables each node to obtain trust assessments about its neighbors by utilizing information that it receives from its two-hop neighbors at infrequent, perhaps randomly chosen, time instants.
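The idea of converging to the trustworthy average can be illustrated with a minimal sketch. It is not the algorithm of the paper: it assumes the trust assessments are already correct and fixed from the first iteration, and it uses Metropolis weights, which are a standard choice for average consensus on undirected graphs and an assumption here. The function names `metropolis_weights` and `trustworthy_average` are hypothetical.

```python
import numpy as np

def metropolis_weights(adj, trusted):
    """Metropolis weights on the subgraph induced by the trusted nodes.

    adj     : (n, n) symmetric 0/1 adjacency matrix of the full topology
    trusted : (n,) boolean array, True for nodes assessed as non-malicious
    """
    n = adj.shape[0]
    # Keep only edges whose two endpoints are both trusted.
    a = adj * np.outer(trusted, trusted)
    deg = a.sum(axis=1)
    w = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j and a[i, j]:
                w[i, j] = 1.0 / (1.0 + max(deg[i], deg[j]))
        # Self-weight makes every row sum to one.
        w[i, i] = 1.0 - w[i].sum()
    return w

def trustworthy_average(adj, x0, trusted, iters=200):
    """Run the linear iteration x <- W x, ignoring untrusted neighbors."""
    w = metropolis_weights(adj, trusted)
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(iters):
        x = w @ x
    return x
```

Because the weight matrix restricted to the trusted nodes is symmetric and doubly stochastic, the trusted nodes' values approach the average of their own initial values whenever the trusted subgraph is connected; values held by untrusted nodes are never mixed in.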
We introduce two new stochastic conjugate frameworks for a class of nonconvex and possibly also nonsmooth optimization problems. These frameworks are built upon the Stochastic Recursive Gradient Algorithm (SARAH), and we thus refer to them as Acc-Prox-CG-SARAH and Acc-Prox-CG-SARAH-RS, respectively. They are efficiently accelerated, easy to implement, tuning-free, and can be smoothly extended and modified. We devise a deterministic restart scheme for stochastic optimization and apply it in our second stochastic conjugate framework, which is the key difference between the two approaches. In addition, we apply the ProbAbilistic Gradient Estimator (PAGE) and further develop a practical variant, denoted as Acc-Prox-CG-SARAH-ST, in order to reduce potential computational overhead. We provide a comprehensive and rigorous convergence analysis for all three approaches and establish linear convergence rates for unconstrained minimization problems with nonconvex and nonsmooth objective functions. Experiments demonstrate that Acc-Prox-CG-SARAH and Acc-Prox-CG-SARAH-RS both consistently outperform state-of-the-art methods, and that Acc-Prox-CG-SARAH-ST achieves comparable convergence speed. Through both theory and experiments, we verify the strong computational efficiency of the deterministic restart scheme in stochastic optimization methods.
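For readers unfamiliar with the base method, the recursive gradient estimator of plain SARAH can be sketched as follows. This is only the underlying estimator on a smooth least-squares problem; it includes none of the proximal, conjugate, acceleration, or restart components of the frameworks named above. The function name `sarah`, the test problem, and all parameter values are assumptions made for illustration.

```python
import numpy as np

def sarah(A, b, step=0.1, epochs=30, inner=100, seed=0):
    """Plain SARAH on f(w) = (1 / 2n) * sum_i (a_i . w - b_i)^2."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    w = np.zeros(d)

    def grad_i(w, i):
        # Gradient of the i-th component function.
        return (A[i] @ w - b[i]) * A[i]

    for _ in range(epochs):
        # Outer loop: start each epoch from a full gradient.
        v = A.T @ (A @ w - b) / n
        w_prev, w = w, w - step * v
        for _ in range(inner):
            i = rng.integers(n)
            # Recursive estimator: correct the previous estimate
            # with a gradient difference at one sampled component.
            v = grad_i(w, i) - grad_i(w_prev, i) + v
            w_prev, w = w, w - step * v
    return w
```

Unlike SVRG, the estimator is updated from the previous iterate rather than from a fixed snapshot, which is why it is called recursive.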
This paper provides norm-based generalization bounds for the Transformer architecture that do not depend on the input sequence length. We employ a covering number based approach to prove our bounds. We use three novel covering number bounds for the function class of bounded linear transformations to upper bound the Rademacher complexity of the Transformer. Furthermore, we show this generalization bound applies to the common Transformer training technique of masking and then predicting the masked word. We also run a simulated study on a sparse majority data set that empirically validates our theoretical findings.
This paper proposes a spatially common sparsity based adaptive channel estimation and feedback scheme for frequency division duplex based massive multi-input multi-output (MIMO) systems, which adapts training overhead and pilot design to reliably estimate and feed back the downlink channel state information (CSI) with significantly reduced overhead. Specifically, a non-orthogonal downlink pilot design is first proposed, which is very different from standard orthogonal pilots. By exploiting the spatially common sparsity of massive MIMO channels, a compressive sensing (CS) based adaptive CSI acquisition scheme is proposed, where the consumed time slot overhead depends adaptively only on the sparsity level of the channels. Additionally, a distributed sparsity adaptive matching pursuit algorithm is proposed to jointly estimate the channels of multiple subcarriers. Furthermore, by exploiting the temporal channel correlation, a closed-loop channel tracking scheme is provided, which adaptively designs the non-orthogonal pilot according to the previous channel estimation to achieve an enhanced CSI acquisition. Finally, we generalize the results of the multiple-measurement-vectors case in CS and derive the Cramer-Rao lower bound of the proposed scheme, which guides the design of the non-orthogonal pilot signals for improved performance. Simulation results demonstrate that the proposed scheme outperforms its counterparts and that it is capable of approaching the performance bound.
This paper proposes the use of causal modeling to detect and mitigate algorithmic bias. We provide a brief description of causal modeling and a general overview of our approach. We then use the Adult dataset, which is available for download from the UC Irvine Machine Learning Repository, to develop (1) a prediction model, which is treated as a black box, and (2) a causal model for bias mitigation. In this paper, we focus on gender bias and the problem of binary classification. We show that gender bias in the prediction model is statistically significant at the 0.05 level. We demonstrate the effectiveness of the causal model in mitigating gender bias by cross-validation. Furthermore, we show that the overall classification accuracy is improved slightly. Our novel approach is intuitive, easy-to-use, and can be implemented using existing statistical software tools such as "lavaan" in R. Hence, it enhances explainability and promotes trust.
A generative AI model can generate extremely realistic-looking content, posing growing challenges to the authenticity of information. To address these challenges, watermarking has been leveraged to detect AI-generated content. Specifically, a watermark is embedded into AI-generated content before it is released. Content is detected as AI-generated if a similar watermark can be decoded from it. In this work, we perform a systematic study of the robustness of such watermark-based AI-generated content detection. We focus on AI-generated images. Our work shows that an attacker can post-process a watermarked image by adding a small, human-imperceptible perturbation to it, such that the post-processed image evades detection while maintaining its visual quality. We show the effectiveness of our attack both theoretically and empirically. Moreover, to evade detection, our adversarial post-processing method adds much smaller perturbations to AI-generated images, and thus better maintains their visual quality, than existing popular post-processing methods such as JPEG compression, Gaussian blur, and Brightness/Contrast. Our work shows the insufficiency of existing watermark-based detection of AI-generated content, highlighting the urgent need for new methods. Our code is publicly available: //github.com/zhengyuan-jiang/WEvade.
The paper considers simultaneous nonparametric inference for a wide class of M-regression models with time-varying coefficients. The covariates and errors of the regression model are modeled as a general class of piece-wise locally stationary time series and are allowed to be cross-dependent. We introduce an integration technique to study the M-estimators, whose limiting properties are established using Bahadur representation and Gaussian approximation theory. Facilitated by a self-convolved bootstrap proposed in this paper, we introduce a unified framework to conduct general classes of Exact Function Tests, Lack-of-fit Tests, and Qualitative Tests for the time-varying coefficient M-regression under complex temporal dynamics. As an application, we use our method to study the anthropogenic warming trend and time-varying structures of the ENSO effect using global climate data from 1882 to 2005.
This paper studies a long-term resource allocation problem over multiple periods where each period requires a multi-stage decision-making process. We formulate the problem as an online allocation problem in an episodic finite-horizon constrained Markov decision process with an unknown non-stationary transition function and stochastic non-stationary reward and resource consumption functions. We propose the observe-then-decide regime and improve the existing decide-then-observe regime; the two settings differ in how the observations and feedback about the reward and resource consumption functions are given to the decision-maker. We develop an online dual mirror descent algorithm that achieves near-optimal regret bounds for both settings. For the observe-then-decide regime, we prove that the expected regret against the dynamic clairvoyant optimal policy is bounded by $\tilde O(\rho^{-1}{H^{3/2}}S\sqrt{AT})$ where $\rho\in(0,1)$ is the budget parameter, $H$ is the length of the horizon, $S$ and $A$ are the numbers of states and actions, and $T$ is the number of episodes. For the decide-then-observe regime, we show that the regret against the static optimal policy that has access to the mean reward and mean resource consumption functions is bounded by $\tilde O(\rho^{-1}{H^{3/2}}S\sqrt{AT})$ with high probability. We test the numerical efficiency of our method for a variant of the resource-constrained inventory management problem.
This paper presents a novel approach to enhance the performance of binary classification models for code comment quality through the application of Generative Artificial Intelligence (AI). By leveraging the OpenAI API, a dataset comprising 1239 newly generated code-comment pairs, extracted from various GitHub repositories and open-source projects, has been labelled as "Useful" or "Not Useful", and integrated into the existing corpus of 9048 pairs in the C programming language. The dataset, generated with a cutting-edge Large Language Model architecture, yields notable improvements in model accuracy. Specifically, when it is incorporated into the Support Vector Machine (SVM) model, precision rises by 6 percentage points, from 0.79 to 0.85. Additionally, the Artificial Neural Network (ANN) model exhibits a 1.5-percentage-point increase in recall, from 0.731 to 0.746. This paper sheds light on the potential of Generative AI in augmenting code comment quality classification models. The results affirm the effectiveness of this methodology, indicating its applicability in broader contexts within software development and quality assurance domains. The findings underscore the significance of integrating generative techniques to advance the accuracy and efficacy of machine learning models in practical software engineering scenarios.
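The two reported gains (precision from 0.79 to 0.85, recall from 0.731 to 0.746) can be read either as absolute differences in percentage points or as relative changes. The short sketch below computes both readings; the helper name `improvement` is hypothetical and not taken from the paper.

```python
def improvement(before, after):
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute_points = (after - before) * 100
    relative_percent = (after - before) / before * 100
    return absolute_points, relative_percent

# SVM precision and ANN recall as reported in the abstract.
svm_abs, svm_rel = improvement(0.79, 0.85)
ann_abs, ann_rel = improvement(0.731, 0.746)
print(f"SVM precision: +{svm_abs:.1f} points ({svm_rel:.2f}% relative)")
print(f"ANN recall:    +{ann_abs:.1f} points ({ann_rel:.2f}% relative)")
```

The quoted figures of 6 and 1.5 match the absolute differences; the relative changes are somewhat larger, about 7.59% and 2.05%.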
This paper proposes a generic method to learn interpretable convolutional filters in a deep convolutional neural network (CNN) for object classification, where each interpretable filter encodes features of a specific object part. Our method does not require additional annotations of object parts or textures for supervision. Instead, we use the same training data as traditional CNNs. Our method automatically assigns each interpretable filter in a high conv-layer to an object part of a certain category during the learning process. Such explicit knowledge representations in the conv-layers of a CNN help people clarify the logic encoded in the CNN, i.e., answering what patterns the CNN extracts from an input image and uses for prediction. We have tested our method using different benchmark CNNs with various structures to demonstrate its broad applicability. Experiments have shown that our interpretable filters are much more semantically meaningful than traditional filters.
The recent proliferation of knowledge graphs (KGs) coupled with incomplete or partial information, in the form of missing relations (links) between entities, has fueled a lot of research on knowledge base completion (also known as relation prediction). Several recent works suggest that convolutional neural network (CNN) based models generate richer and more expressive feature embeddings and hence also perform well on relation prediction. However, we observe that these KG embeddings treat triples independently and thus fail to cover the complex and hidden information that is inherently implicit in the local neighborhood surrounding a triple. To this end, our paper proposes a novel attention-based feature embedding that captures both entity and relation features in any given entity's neighborhood. Additionally, we encapsulate relation clusters and multihop relations in our model. Our empirical study offers insights into the efficacy of our attention-based model, and we show marked performance gains in comparison to state-of-the-art methods on all datasets.
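A minimal sketch of attention over an entity's neighborhood, in which each neighboring triple contributes both relation and entity features, might look as follows. The exact layer used in the paper is not given in the abstract, so the concatenate-project-score form, the LeakyReLU scoring, and the names `attend_neighborhood`, `W`, and `a` are all assumptions made for illustration.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def attend_neighborhood(h, neighbors, W, a):
    """Aggregate the triples around one entity with softmax attention.

    h         : (d,) embedding of the central entity
    neighbors : list of (relation_vec, tail_vec) pairs, each of shape (d,)
    W         : (d_out, 3 * d) hypothetical projection of [h; r; t]
    a         : (d_out,) hypothetical attention vector
    """
    # One feature vector per triple, built from entity AND relation features.
    triples = np.stack([np.concatenate([h, r, t]) for r, t in neighbors])
    c = triples @ W.T                      # (k, d_out)
    scores = leaky_relu(c @ a)             # (k,)
    scores = scores - scores.max()         # numerical stability for softmax
    alpha = np.exp(scores) / np.exp(scores).sum()
    # New entity embedding: attention-weighted sum of triple features.
    return alpha, alpha @ c
```

The returned weights `alpha` sum to one over the neighborhood, so triples judged more relevant contribute more to the updated entity embedding instead of each triple being treated independently.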