Many protocols in distributed computing rely on a source of randomness, usually called a random beacon, both for their applicability and security. This is especially true for proof-of-stake blockchain protocols in which the next miner or set of miners have to be chosen randomly and each party's likelihood to be selected is in proportion to their stake in the cryptocurrency. Current random beacons used in proof-of-stake protocols, such as Ouroboros and Algorand, have two fundamental limitations: Either (i)~they rely on pseudorandomness, e.g.~assuming that the output of a hash function is uniform, which is a widely-used but unproven assumption, or (ii)~they generate their randomness using a distributed protocol in which several participants are required to submit random numbers which are then used in the generation of a final random result. However, in this case, there is no guarantee that the numbers provided by the parties are uniformly random and there is no incentive for the parties to honestly generate uniform randomness. Most random beacons have both limitations. In this thesis, we provide a protocol for distributed generation of randomness. Our protocol does not rely on pseudorandomness at all. Similar to some of the previous approaches, it uses random inputs by different participants to generate a final random result. However, the crucial difference is that we provide a game-theoretic guarantee showing that it is in everyone's best interest to submit uniform random numbers. Hence, our approach is the first to incentivize honest behavior instead of just assuming it. Moreover, the approach is trustless and generates unbiased random numbers. It is also tamper-proof and no party can change the output or affect its distribution. Finally, it is designed with modularity in mind and can be easily plugged into existing distributed protocols such as proof-of-stake blockchains.
Kernel ridge regression, KRR, is a generalization of linear ridge regression that is non-linear in the data, but linear in the parameters. The solution can be obtained either as a closed-form solution, which includes a matrix inversion, or iteratively through gradient descent. Using the iterative approach opens up for changing the kernel during training, something that is investigated in this paper. We theoretically address the effects this has on model complexity and generalization. Based on our findings, we propose an update scheme for the bandwidth of translational-invariant kernels, where we let the bandwidth decrease to zero during training, thus circumventing the need for hyper-parameter selection. We demonstrate on real and synthetic data how decreasing the bandwidth during training outperforms using a constant bandwidth, selected by cross-validation and marginal likelihood maximization. We also show theoretically and empirically that using a decreasing bandwidth, we are able to achieve both zero training error in combination with good generalization, and a double descent behavior, phenomena that do not occur for KRR with constant bandwidth but are known to appear for neural networks.
Unsupervised anomaly segmentation aims to detect patterns that are distinct from any patterns processed during training, commonly called abnormal or out-of-distribution patterns, without providing any associated manual segmentations. Since anomalies during deployment can lead to model failure, detecting the anomaly can enhance the reliability of models, which is valuable in high-risk domains like medical imaging. This paper introduces Masked Modality Cycles with Conditional Diffusion (MMCCD), a method that enables segmentation of anomalies across diverse patterns in multimodal MRI. The method is based on two fundamental ideas. First, we propose the use of cyclic modality translation as a mechanism for enabling abnormality detection. Image-translation models learn tissue-specific modality mappings, which are characteristic of tissue physiology. Thus, these learned mappings fail to translate tissues or image patterns that have never been encountered during training, and the error enables their segmentation. Furthermore, we combine image translation with a masked conditional diffusion model, which attempts to `imagine' what tissue exists under a masked area, further exposing unknown patterns as the generative model fails to recreate them. We evaluate our method on a proxy task by training on healthy-looking slices of BraTS2021 multi-modality MRIs and testing on slices with tumors. We show that our method compares favorably to previous unsupervised approaches based on image reconstruction and denoising with autoencoders and diffusion models.
Residual neural networks are widely used in computer vision tasks. They enable the construction of deeper and more accurate models by mitigating the vanishing gradient problem. Their main innovation is the residual block which allows the output of one layer to bypass one or more intermediate layers and be added to the output of a later layer. Their complex structure and the buffering required by the residual block make them difficult to implement on resource-constrained platforms. We present a novel design flow for implementing deep learning models for field programmable gate arrays optimized for ResNets, using a strategy to reduce their buffering overhead to obtain a resource-efficient implementation of the residual layer. Our high-level synthesis (HLS)-based flow encompasses a thorough set of design principles and optimization strategies, exploiting in novel ways standard techniques such as temporal reuse and loop merging to efficiently map ResNet models, and potentially other skip connection-based NN architectures, into FPGA. The models are quantized to 8-bit integers for both weights and activations, 16-bit for biases, and 32-bit for accumulations. The experimental results are obtained on the CIFAR-10 dataset using ResNet8 and ResNet20 implemented with Xilinx FPGAs using HLS on the Ultra96-V2 and Kria KV260 boards. Compared to the state-of-the-art on the Kria KV260 board, our ResNet20 implementation achieves 2.88X speedup with 0.5% higher accuracy of 91.3%, while ResNet8 accuracy improves by 2.8% to 88.7%. The throughputs of ResNet8 and ResNet20 are 12971 FPS and 3254 FPS on the Ultra96 board, and 30153 FPS and 7601 FPS on the Kria KV26, respectively. They Pareto-dominate state-of-the-art solutions concerning accuracy, throughput, and energy.
Submodular maximization under various constraints is a fundamental problem studied continuously, in both computer science and operations research, since the late $1970$'s. A central technique in this field is to approximately optimize the multilinear extension of the submodular objective, and then round the solution. The use of this technique requires a solver able to approximately maximize multilinear extensions. Following a long line of work, Buchbinder and Feldman (2019) described such a solver guaranteeing $0.385$-approximation for down-closed constraints, while Oveis Gharan and Vondr\'ak (2011) showed that no solver can guarantee better than $0.478$-approximation. In this paper, we present a solver guaranteeing $0.401$-approximation, which significantly reduces the gap between the best known solver and the inapproximability result. The design and analysis of our solver are based on a novel bound that we prove for DR-submodular functions. This bound improves over a previous bound due to Feldman et al. (2011) that is used by essentially all state-of-the-art results for constrained maximization of general submodular/DR-submodular functions. Hence, we believe that our new bound is likely to find many additional applications in related problems, and to be a key component for further improvement.
We propose to extract meaning representations from autoregressive language models by considering the distribution of all possible trajectories extending an input text. This strategy is prompt-free, does not require fine-tuning, and is applicable to any pre-trained autoregressive model. Moreover, unlike vector-based representations, distribution-based representations can also model asymmetric relations (e.g., direction of logical entailment, hypernym/hyponym relations) by using algebraic operations between likelihood functions. These ideas are grounded in distributional perspectives on semantics and are connected to standard constructions in automata theory, but to our knowledge they have not been applied to modern language models. We empirically show that the representations obtained from large models align well with human annotations, outperform other zero-shot and prompt-free methods on semantic similarity tasks, and can be used to solve more complex entailment and containment tasks that standard embeddings cannot handle. Finally, we extend our method to represent data from different modalities (e.g., image and text) using multimodal autoregressive models.
The stochastic gradient descent (SGD) algorithm has been widely used in statistical estimation for large-scale data due to its computational and memory efficiency. While most existing works focus on the convergence of the objective function or the error of the obtained solution, we investigate the problem of statistical inference of true model parameters based on SGD when the population loss function is strongly convex and satisfies certain smoothness conditions. Our main contributions are two-fold. First, in the fixed dimension setup, we propose two consistent estimators of the asymptotic covariance of the average iterate from SGD: (1) a plug-in estimator, and (2) a batch-means estimator, which is computationally more efficient and only uses the iterates from SGD. Both proposed estimators allow us to construct asymptotically exact confidence intervals and hypothesis tests. Second, for high-dimensional linear regression, using a variant of the SGD algorithm, we construct a debiased estimator of each regression coefficient that is asymptotically normal. This gives a one-pass algorithm for computing both the sparse regression coefficients and confidence intervals, which is computationally attractive and applicable to online data.
Deep reinforcement learning algorithms can perform poorly in real-world tasks due to the discrepancy between source and target environments. This discrepancy is commonly viewed as the disturbance in transition dynamics. Many existing algorithms learn robust policies by modeling the disturbance and applying it to source environments during training, which usually requires prior knowledge about the disturbance and control of simulators. However, these algorithms can fail in scenarios where the disturbance from target environments is unknown or is intractable to model in simulators. To tackle this problem, we propose a novel model-free actor-critic algorithm -- namely, state-conservative policy optimization (SCPO) -- to learn robust policies without modeling the disturbance in advance. Specifically, SCPO reduces the disturbance in transition dynamics to that in state space and then approximates it by a simple gradient-based regularizer. The appealing features of SCPO include that it is simple to implement and does not require additional knowledge about the disturbance or specially designed simulators. Experiments in several robot control tasks demonstrate that SCPO learns robust policies against the disturbance in transition dynamics.
We address the task of automatically scoring the competency of candidates based on textual features, from the automatic speech recognition (ASR) transcriptions in the asynchronous video job interview (AVI). The key challenge is how to construct the dependency relation between questions and answers, and conduct the semantic level interaction for each question-answer (QA) pair. However, most of the recent studies in AVI focus on how to represent questions and answers better, but ignore the dependency information and interaction between them, which is critical for QA evaluation. In this work, we propose a Hierarchical Reasoning Graph Neural Network (HRGNN) for the automatic assessment of question-answer pairs. Specifically, we construct a sentence-level relational graph neural network to capture the dependency information of sentences in or between the question and the answer. Based on these graphs, we employ a semantic-level reasoning graph attention network to model the interaction states of the current QA session. Finally, we propose a gated recurrent unit encoder to represent the temporal question-answer pairs for the final prediction. Empirical results conducted on CHNAT (a real-world dataset) validate that our proposed model significantly outperforms text-matching based benchmark models. Ablation studies and experimental results with 10 random seeds also show the effectiveness and stability of our models.
Most object recognition approaches predominantly focus on learning discriminative visual patterns while overlooking the holistic object structure. Though important, structure modeling usually requires significant manual annotations and therefore is labor-intensive. In this paper, we propose to "look into object" (explicitly yet intrinsically model the object structure) through incorporating self-supervisions into the traditional framework. We show the recognition backbone can be substantially enhanced for more robust representation learning, without any cost of extra annotation and inference speed. Specifically, we first propose an object-extent learning module for localizing the object according to the visual patterns shared among the instances in the same category. We then design a spatial context learning module for modeling the internal structures of the object, through predicting the relative positions within the extent. These two modules can be easily plugged into any backbone networks during training and detached at inference time. Extensive experiments show that our look-into-object approach (LIO) achieves large performance gain on a number of benchmarks, including generic object recognition (ImageNet) and fine-grained object recognition tasks (CUB, Cars, Aircraft). We also show that this learning paradigm is highly generalizable to other tasks such as object detection and segmentation (MS COCO). Project page: //github.com/JDAI-CV/LIO.
Object detection typically assumes that training and test data are drawn from an identical distribution, which, however, does not always hold in practice. Such a distribution mismatch will lead to a significant performance drop. In this work, we aim to improve the cross-domain robustness of object detection. We tackle the domain shift on two levels: 1) the image-level shift, such as image style, illumination, etc, and 2) the instance-level shift, such as object appearance, size, etc. We build our approach based on the recent state-of-the-art Faster R-CNN model, and design two domain adaptation components, on image level and instance level, to reduce the domain discrepancy. The two domain adaptation components are based on H-divergence theory, and are implemented by learning a domain classifier in adversarial training manner. The domain classifiers on different levels are further reinforced with a consistency regularization to learn a domain-invariant region proposal network (RPN) in the Faster R-CNN model. We evaluate our newly proposed approach using multiple datasets including Cityscapes, KITTI, SIM10K, etc. The results demonstrate the effectiveness of our proposed approach for robust object detection in various domain shift scenarios.