The automated interpretation and inversion of seismic data have advanced significantly with the development of Deep Learning (DL) methods. However, these methods often require numerous costly well logs, limiting their application only to mature or synthetic data. This paper presents ContrasInver, a method that achieves seismic inversion using as few as two or three well logs, significantly reducing current requirements. In ContrasInver, we propose three key innovations to address the challenges of applying semi-supervised learning to regression tasks with ultra-sparse labels. The Multi-dimensional Sample Generation (MSG) technique pioneers a paradigm for sample generation in multi-dimensional inversion. It produces a large number of diverse samples from a single well, while establishing lateral continuity in seismic data. MSG yields substantial improvements over current techniques, even without the use of semi-supervised learning. The Region-Growing Training (RGT) strategy leverages the inherent continuity of seismic data, effectively propagating accuracy from closer to more distant regions based on the proximity of well logs. The Impedance Vectorization Projection (IVP) vectorizes impedance values and performs semi-supervised learning in a compressed space. We demonstrated that the Jacobian matrix derived from this space can filter out some outlier components in pseudo-label vectors, thereby solving the value confusion issue in semi-supervised regression learning. In the experiments, ContrasInver achieved state-of-the-art performance in the synthetic data SEAM I. In the field data with two or three well logs, only the methods based on the components proposed in this paper were able to achieve reasonable results. It's the first data-driven approach yielding reliable results on the Netherlands F3 and Delft, using only three and two well logs respectively.
Reinterpretable cameras are defined by their post-processing capabilities that exceed traditional imaging. We present "SoDaCam" that provides reinterpretable cameras at the granularity of photons, from photon-cubes acquired by single-photon devices. Photon-cubes represent the spatio-temporal detections of photons as a sequence of binary frames, at frame-rates as high as 100 kHz. We show that simple transformations of the photon-cube, or photon-cube projections, provide the functionality of numerous imaging systems including: exposure bracketing, flutter shutter cameras, video compressive systems, event cameras, and even cameras that move during exposure. Our photon-cube projections offer the flexibility of being software-defined constructs that are only limited by what is computable, and shot-noise. We exploit this flexibility to provide new capabilities for the emulated cameras. As an added benefit, our projections provide camera-dependent compression of photon-cubes, which we demonstrate using an implementation of our projections on a novel compute architecture that is designed for single-photon imaging.
We present FIMO, an innovative dataset comprising formal mathematical problem statements sourced from the International Mathematical Olympiad (IMO) Shortlisted Problems. Designed to facilitate advanced automated theorem proving at the IMO level, FIMO is currently tailored for the Lean formal language. It comprises 149 formal problem statements, accompanied by both informal problem descriptions and their corresponding LaTeX-based informal proofs. Through initial experiments involving GPT-4, our findings underscore the existing limitations in current methodologies, indicating a substantial journey ahead before achieving satisfactory IMO-level automated theorem proving outcomes.
Recently Transformer and Convolution neural network (CNN) based models have shown promising results in EEG signal processing. Transformer models can capture the global dependencies in EEG signals through a self-attention mechanism, while CNN models can capture local features such as sawtooth waves. In this work, we propose an end-to-end neural epilepsy detection model, EENED, that combines CNN and Transformer. Specifically, by introducing the convolution module into the Transformer encoder, EENED can learn the time-dependent relationship of the patient's EEG signal features and notice local EEG abnormal mutations closely related to epilepsy, such as the appearance of spikes and the sprinkling of sharp and slow waves. Our proposed framework combines the ability of Transformer and CNN to capture different scale features of EEG signals and holds promise for improving the accuracy and reliability of epilepsy detection. Our source code will be released soon on GitHub.
Data augmentation (DA) is a key factor in medical image analysis, such as in prostate cancer (PCa) detection on magnetic resonance images. State-of-the-art computer-aided diagnosis systems still rely on simplistic spatial transformations to preserve the pathological label post transformation. However, such augmentations do not substantially increase the organ as well as tumor shape variability in the training set, limiting the model's ability to generalize to unseen cases with more diverse localized soft-tissue deformations. We propose a new anatomy-informed transformation that leverages information from adjacent organs to simulate typical physiological deformations of the prostate and generates unique lesion shapes without altering their label. Due to its lightweight computational requirements, it can be easily integrated into common DA frameworks. We demonstrate the effectiveness of our augmentation on a dataset of 774 biopsy-confirmed examinations, by evaluating a state-of-the-art method for PCa detection with different augmentation settings.
Inspired by the remarkable success of Latent Diffusion Models (LDMs) for image synthesis, we study LDM for text-to-video generation, which is a formidable challenge due to the computational and memory constraints during both model training and inference. A single LDM is usually only capable of generating a very limited number of video frames. Some existing works focus on separate prediction models for generating more video frames, which suffer from additional training cost and frame-level jittering, however. In this paper, we propose a framework called "Reuse and Diffuse" dubbed $\textit{VidRD}$ to produce more frames following the frames already generated by an LDM. Conditioned on an initial video clip with a small number of frames, additional frames are iteratively generated by reusing the original latent features and following the previous diffusion process. Besides, for the autoencoder used for translation between pixel space and latent space, we inject temporal layers into its decoder and fine-tune these layers for higher temporal consistency. We also propose a set of strategies for composing video-text data that involve diverse content from multiple existing datasets including video datasets for action recognition and image-text datasets. Extensive experiments show that our method achieves good results in both quantitative and qualitative evaluations. Our project page is available $\href{//anonymous0x233.github.io/ReuseAndDiffuse/}{here}$.
We propose personalized Tucker decomposition (perTucker) to address the limitations of traditional tensor decomposition methods in capturing heterogeneity across different datasets. perTucker decomposes tensor data into shared global components and personalized local components. We introduce a mode orthogonality assumption and develop a proximal gradient regularized block coordinate descent algorithm that is guaranteed to converge to a stationary point. By learning unique and common representations across datasets, we demonstrate perTucker's effectiveness in anomaly detection, client classification, and clustering through a simulation study and two case studies on solar flare detection and tonnage signal classification.
Surjectivity and injectivity are the most fundamental problems in cellular automata (CA). We simplify and modify Amoroso's algorithm into optimum and make it compatible with fixed, periodic and reflective boundaries. A new algorithm (injectivity tree algorithm) for injectivity is also proposed. After our theoretic analysis and experiments, our algorithm for injectivity can save much space and 90\% or even more time compared with Amoroso's algorithm for injectivity so that it can support the decision of CA with larger neighborhood sizes. At last, we prove that the reversibility with the periodic boundary and global injectivity of one-dimensional CA is equivalent.
In backscatter communication (BC), a passive tag transmits information by just affecting an external electromagnetic field through load modulation. Thereby, the feed current of the excited tag antenna is modulated by adapting the passive termination load. This paper studies the achievable information rates with a freely adaptable passive load. As a prerequisite, we unify monostatic, bistatic, and ambient BC with circuit-based system modeling. We present the crucial insight that channel capacity is described by existing results on peak-power-limited quadrature Gaussian channels, because the steady-state tag current phasor lies on a disk. Consequently, we derive the channel capacity for the case of an unmodulated external field, for general passive, purely reactive, or purely resistive tag loads. We find that modulating both resistance and reactance is important for very high rates. We discuss the capacity-achieving load statistics, rate asymptotics, technical conclusions, and rate losses from value-range-constrained loads (which are found to be small for moderate constraints). We then demonstrate that near-capacity rates can be attained by more practical schemes: (i) amplitude-and-phase-shift keying on the reflection coefficient and (ii) simple load circuits of a few switched resistors and capacitors. Finally, we draw conclusions for the ambient BC channel capacity in important special cases.
The recent proliferation of knowledge graphs (KGs) coupled with incomplete or partial information, in the form of missing relations (links) between entities, has fueled a lot of research on knowledge base completion (also known as relation prediction). Several recent works suggest that convolutional neural network (CNN) based models generate richer and more expressive feature embeddings and hence also perform well on relation prediction. However, we observe that these KG embeddings treat triples independently and thus fail to cover the complex and hidden information that is inherently implicit in the local neighborhood surrounding a triple. To this effect, our paper proposes a novel attention based feature embedding that captures both entity and relation features in any given entity's neighborhood. Additionally, we also encapsulate relation clusters and multihop relations in our model. Our empirical study offers insights into the efficacy of our attention based model and we show marked performance gains in comparison to state of the art methods on all datasets.
We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications. BERT is conceptually simple and empirically powerful. It obtains new state-of-the-art results on eleven natural language processing tasks, including pushing the GLUE score to 80.5% (7.7% point absolute improvement), MultiNLI accuracy to 86.7% (4.6% absolute improvement), SQuAD v1.1 question answering Test F1 to 93.2 (1.5 point absolute improvement) and SQuAD v2.0 Test F1 to 83.1 (5.1 point absolute improvement).