We consider the task of constructing confidence intervals with differential privacy. We propose two private variants of the non-parametric bootstrap, which privately compute the median of the results of multiple "little" bootstraps run on partitions of the data and give asymptotic bounds on the coverage error of the resulting confidence intervals. For a fixed differential privacy parameter $\epsilon$, our methods enjoy the same error rates as that of the non-private bootstrap to within logarithmic factors in the sample size $n$. We empirically validate the performance of our methods for mean estimation, median estimation, and logistic regression with both real and synthetic data. Our methods achieve similar coverage accuracy to existing methods (and non-private baselines) while providing notably shorter ($\gtrsim 10$ times) confidence intervals than previous approaches.
Interactive coding allows two parties to conduct a distributed computation despite noise corrupting a certain fraction of their communication. Dani et al.\@ (Inf.\@ and Comp., 2018) suggested a novel setting in which the amount of noise is unbounded and can significantly exceed the length of the (noise-free) computation. While no solution is possible in the worst case, under the restriction of oblivious noise, Dani et al.\@ designed a coding scheme that succeeds with a polynomially small failure probability. We revisit the question of conducting computations under this harsh type of noise and devise a computationally-efficient coding scheme that guarantees the success of the computation, except with an exponentially small probability. This higher degree of correctness matches the case of coding schemes with a bounded fraction of noise. Our simulation of an $N$-bit noise-free computation in the presence of $T$ corruptions, communicates an optimal number of $O(N+T)$ bits and succeeds with probability $1-2^{-\Omega(N)}$. We design this coding scheme by introducing an intermediary noise model, where an oblivious adversary can choose the locations of corruptions in a worst-case manner, but the effect of each corruption is random: the noise either flips the transmission with some probability or otherwise erases it. This randomized abstraction turns out to be instrumental in achieving an optimal coding scheme.
Anomaly synthesis strategies can effectively enhance unsupervised anomaly detection. However, existing strategies have limitations in the coverage and controllability of anomaly synthesis, particularly for weak defects that are very similar to normal regions. In this paper, we propose Global and Local Anomaly co-Synthesis Strategy (GLASS), a novel unified framework designed to synthesize a broader coverage of anomalies under the manifold and hypersphere distribution constraints of Global Anomaly Synthesis (GAS) at the feature level and Local Anomaly Synthesis (LAS) at the image level. Our method synthesizes near-in-distribution anomalies in a controllable way using Gaussian noise guided by gradient ascent and truncated projection. GLASS achieves state-of-the-art results on the MVTec AD (detection AUROC of 99.9\%), VisA, and MPDD datasets and excels in weak defect detection. The effectiveness and efficiency have been further validated in industrial applications for woven fabric defect detection. The code and dataset are available at: \url{//github.com/cqylunlun/GLASS}.
We introduce a practical sign-dependent sequence selection metric for probabilistic amplitude shaping and propose a simple method to predict the gains in signal-to-noise ratio (SNR) for sequence selection. The proposed metric provides a $0.5$ dB SNR gain for single-polarized 256-QAM transmission over a long-haul fiber link.
Nonlinear activation functions are pivotal to the success of deep neural nets, and choosing the appropriate activation function can significantly affect their performance. Most networks use fixed activation functions (e.g., ReLU, GELU, etc.), and this choice might limit their expressiveness. Furthermore, different layers may benefit from diverse activation functions. Consequently, there has been a growing interest in trainable activation functions. In this paper, we introduce DiTAC, a trainable highly-expressive activation function based on an efficient diffeomorphic transformation (called CPAB). Despite introducing only a negligible number of trainable parameters, DiTAC enhances model expressiveness and performance, often yielding substantial improvements. It also outperforms existing activation functions (regardless whether the latter are fixed or trainable) in tasks such as semantic segmentation, image generation, regression problems, and image classification. Our code is available at //github.com/BGU-CS-VIL/DiTAC.
Determining the approximate degree composition for Boolean functions remains a significant unsolved problem in Boolean function complexity. In recent decades, researchers have concentrated on proving that approximate degree composes for special types of inner and outer functions. An important and extensively studied class of functions are the recursive functions, i.e.~functions obtained by composing a base function with itself a number of times. Let $h^d$ denote the standard $d$-fold composition of the base function $h$. The main result of this work is to show that the approximate degree composes if either of the following conditions holds: \begin{itemize} \item The outer function $f:\{0,1\}^n\to \{0,1\}$ is a recursive function of the form $h^d$, with $h$ being any base function and $d= \Omega(\log\log n)$. \item The inner function is a recursive function of the form $h^d$, with $h$ being any constant arity base function (other than AND and OR) and $d= \Omega(\log\log n)$, where $n$ is the arity of the outer function. \end{itemize} In terms of proof techniques, we first observe that the lower bound for composition can be obtained by introducing majority in between the inner and the outer functions. We then show that majority can be \emph{efficiently eliminated} if the inner or outer function is a recursive function.
We investigate the tractability of a simple fusion of two fundamental structures on graphs, a spanning tree and a perfect matching. Specifically, we consider the following problem: given an edge-weighted graph, find a minimum-weight spanning tree among those containing a perfect matching. On the positive side, we design a simple greedy algorithm for the case when the graph is complete (or complete bipartite) and the edge weights take at most two values. On the negative side, the problem is NP-hard even when the graph is complete (or complete bipartite) and the edge weights take at most three values, or when the graph is cubic, planar, and bipartite and the edge weights take at most two values. We also consider an interesting variant. We call a tree strongly balanced if on one side of the bipartition of the vertex set with respect to the tree, all but one of the vertices have degree $2$ and the remaining one is a leaf. This property is a sufficient condition for a tree to have a perfect matching, which enjoys an additional property. When the underlying graph is bipartite, strongly balanced spanning trees can be written as matroid intersection, and this fact was recently utilized to design an approximation algorithm for some kind of connectivity augmentation problem. The natural question is its tractability in nonbipartite graphs. As a negative answer, it turns out NP-hard to test whether a given graph has a strongly balanced spanning tree or not even when the graph is subcubic and planar.
Brain decoding that classifies cognitive states using the functional fluctuations of the brain can provide insightful information for understanding the brain mechanisms of cognitive functions. Among the common procedures of decoding the brain cognitive states with functional magnetic resonance imaging (fMRI), extracting the time series of each brain region after brain parcellation traditionally averages across the voxels within a brain region. This neglects the spatial information among the voxels and the requirement of extracting information for the downstream tasks. In this study, we propose to use a fully connected neural network that is jointly trained with the brain decoder to perform an adaptively weighted average across the voxels within each brain region. We perform extensive evaluations by cognitive state decoding, manifold learning, and interpretability analysis on the Human Connectome Project (HCP) dataset. The performance comparison of the cognitive state decoding presents an accuracy increase of up to 5\% and stable accuracy improvement under different time window sizes, resampling sizes, and training data sizes. The results of manifold learning show that our method presents a considerable separability among cognitive states and basically excludes subject-specific information. The interpretability analysis shows that our method can identify reasonable brain regions corresponding to each cognitive state. Our study would aid the improvement of the basic pipeline of fMRI processing.
Conventional entity typing approaches are based on independent classification paradigms, which make them difficult to recognize inter-dependent, long-tailed and fine-grained entity types. In this paper, we argue that the implicitly entailed extrinsic and intrinsic dependencies between labels can provide critical knowledge to tackle the above challenges. To this end, we propose \emph{Label Reasoning Network(LRN)}, which sequentially reasons fine-grained entity labels by discovering and exploiting label dependencies knowledge entailed in the data. Specifically, LRN utilizes an auto-regressive network to conduct deductive reasoning and a bipartite attribute graph to conduct inductive reasoning between labels, which can effectively model, learn and reason complex label dependencies in a sequence-to-set, end-to-end manner. Experiments show that LRN achieves the state-of-the-art performance on standard ultra fine-grained entity typing benchmarks, and can also resolve the long tail label problem effectively.
Humans perceive the world by concurrently processing and fusing high-dimensional inputs from multiple modalities such as vision and audio. Machine perception models, in stark contrast, are typically modality-specific and optimised for unimodal benchmarks, and hence late-stage fusion of final representations or predictions from each modality (`late-fusion') is still a dominant paradigm for multimodal video classification. Instead, we introduce a novel transformer based architecture that uses `fusion bottlenecks' for modality fusion at multiple layers. Compared to traditional pairwise self-attention, our model forces information between different modalities to pass through a small number of bottleneck latents, requiring the model to collate and condense the most relevant information in each modality and only share what is necessary. We find that such a strategy improves fusion performance, at the same time reducing computational cost. We conduct thorough ablation studies, and achieve state-of-the-art results on multiple audio-visual classification benchmarks including Audioset, Epic-Kitchens and VGGSound. All code and models will be released.
Intent classification and slot filling are two essential tasks for natural language understanding. They often suffer from small-scale human-labeled training data, resulting in poor generalization capability, especially for rare words. Recently a new language representation model, BERT (Bidirectional Encoder Representations from Transformers), facilitates pre-training deep bidirectional representations on large-scale unlabeled corpora, and has created state-of-the-art models for a wide variety of natural language processing tasks after simple fine-tuning. However, there has not been much effort on exploring BERT for natural language understanding. In this work, we propose a joint intent classification and slot filling model based on BERT. Experimental results demonstrate that our proposed model achieves significant improvement on intent classification accuracy, slot filling F1, and sentence-level semantic frame accuracy on several public benchmark datasets, compared to the attention-based recurrent neural network models and slot-gated models.