Title: Brain-Like Object Recognition with High-Performing Shallow Recurrent ANNs
Abstract: Although initially inspired by brain anatomy, over the past few years these artificial neural networks have evolved from the simple eight-layer architecture of AlexNet into far more complex architectures, and typical deep models from machine learning are usually hard to map onto the brain's anatomy because of their large number of layers and their lack of biologically important connections. Here we demonstrate that better anatomical alignment with the brain and high performance on both machine learning and neuroscience measures are not necessarily in conflict. We developed CORnet-S, a shallow neural network with four anatomically mapped areas and recurrent connectivity, guided during development by Brain-Score, a new large-scale composite of neural and behavioral benchmarks for quantifying the functional fidelity of models of the primate ventral visual stream. Despite being much shallower than most models, CORnet-S is the top model on Brain-Score and outperforms similar models on ImageNet. Moreover, our extensive analyses of CORnet-S circuitry variants show that recurrence is the main predictor of both Brain-Score and ImageNet top-1 performance. Finally, we report that the temporal evolution of the CORnet-S "IT" neural population resembles actual monkey IT population dynamics. Taken together, these results establish CORnet-S, a compact recurrent neural network, as the current best model of the primate ventral visual stream.
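The core idea, a handful of anatomically mapped areas that each process their input repeatedly through the same weights over several time steps, can be illustrated with a toy PyTorch sketch. This is not the authors' implementation; the module names, channel sizes, and unroll counts below are illustrative assumptions.

```python
import torch
import torch.nn as nn

class RecurrentArea(nn.Module):
    """Toy stand-in for one CORnet-S-style area: a conv block unrolled over time,
    with its own output fed back through the same weights on later passes."""
    def __init__(self, in_ch, out_ch, times=2):
        super().__init__()
        self.times = times
        self.input_conv = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        self.recur_conv = nn.Conv2d(out_ch, out_ch, 3, padding=1)
        self.norm = nn.BatchNorm2d(out_ch)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        state = self.relu(self.norm(self.input_conv(x)))
        for _ in range(self.times - 1):              # recurrent passes within the area
            state = self.relu(self.norm(self.recur_conv(state)))
        return state

class ToyCORnet(nn.Module):
    """Four 'areas' (V1, V2, V4, IT) followed by a linear decoder."""
    def __init__(self, num_classes=1000):
        super().__init__()
        self.V1 = RecurrentArea(3, 64, times=1)
        self.V2 = RecurrentArea(64, 128, times=2)
        self.V4 = RecurrentArea(128, 256, times=4)
        self.IT = RecurrentArea(256, 512, times=2)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.decoder = nn.Linear(512, num_classes)

    def forward(self, x):
        x = self.IT(self.V4(self.V2(self.V1(x))))
        return self.decoder(self.pool(x).flatten(1))

logits = ToyCORnet()(torch.randn(2, 3, 64, 64))      # -> shape (2, 1000)
```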
Title: FEEDBACK RECURRENT AUTOENCODER
Abstract: In this work, we propose a new recurrent autoencoder architecture, the feedback recurrent autoencoder (FRAE), for online compression of sequential data. The recurrent structure of FRAE is designed to efficiently extract redundancy along the time dimension and allows a compact discrete representation of the data to be learned. We demonstrate its effectiveness on speech spectrogram compression. Specifically, we show that FRAE, combined with a powerful neural vocoder, can produce high-quality speech waveforms at a low, fixed bitrate. We further show that by adding a learned prior over the latent space and using an entropy coder, we can achieve an even lower, variable bitrate.
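The defining feature of a feedback recurrent autoencoder is that the decoder's previous output is fed back into the encoder at the next time step, so the transmitted code only needs to carry what the decoder cannot already predict. A minimal PyTorch sketch of that feedback loop follows; the layer sizes and the straight-through rounding quantizer are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class ToyFRAE(nn.Module):
    """Minimal feedback recurrent autoencoder sketch: at each step the encoder sees
    the current frame together with the decoder's previous reconstruction."""
    def __init__(self, frame_dim=80, hidden=128, code_dim=8):
        super().__init__()
        self.enc_rnn = nn.GRUCell(frame_dim * 2, hidden)
        self.to_code = nn.Linear(hidden, code_dim)
        self.dec_rnn = nn.GRUCell(code_dim, hidden)
        self.to_frame = nn.Linear(hidden, frame_dim)

    def forward(self, frames):                        # frames: (T, B, frame_dim)
        T, B, D = frames.shape
        h_enc = frames.new_zeros(B, self.enc_rnn.hidden_size)
        h_dec = frames.new_zeros(B, self.dec_rnn.hidden_size)
        feedback = frames.new_zeros(B, D)             # decoder output from previous step
        outputs, codes = [], []
        for t in range(T):
            h_enc = self.enc_rnn(torch.cat([frames[t], feedback], dim=-1), h_enc)
            z = self.to_code(h_enc)
            z_q = z + (torch.round(z) - z).detach()   # straight-through rounding as a stand-in quantizer
            h_dec = self.dec_rnn(z_q, h_dec)
            feedback = self.to_frame(h_dec)
            outputs.append(feedback)
            codes.append(z_q)
        return torch.stack(outputs), torch.stack(codes)

recon, codes = ToyFRAE()(torch.randn(20, 4, 80))      # 20 frames, batch of 4
```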
Title: Individual differences among deep neural network models
Abstract: Deep neural networks (DNNs) excel at visual recognition tasks and are increasingly used as a modeling framework for neural computation in the primate brain. However, each DNN instance, just like each individual brain, has a unique connectivity and representational profile. Here we investigate individual differences among DNN instances that arise from varying only the random initialization of the network weights. Using representational similarity analysis, we show that this minimal change in initial conditions before training leads to substantial differences in intermediate and higher-level network representations, despite indistinguishable network-level classification performance. We locate the origin of the effect in an under-constrained alignment of category exemplars rather than misaligned category centroids. Furthermore, although network regularization can improve the consistency of the learned representations, considerable differences remain. These results suggest that computational neuroscientists working with DNNs should base their inferences on multiple network instances rather than on a single off-the-shelf network.
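Representational similarity analysis compares networks by correlating their representational dissimilarity matrices (RDMs), which makes the comparison independent of any particular unit-to-unit alignment. Below is a small sketch of such a comparison between two seed-varied instances; the random "activations" and the Pearson-distance RDM are placeholders for real layer activations and whatever dissimilarity measure one prefers.

```python
import numpy as np
from scipy.stats import spearmanr

def rdm(activations):
    """Representational dissimilarity matrix: 1 - Pearson correlation between
    the activation patterns of every pair of stimuli (rows)."""
    return 1.0 - np.corrcoef(activations)

def rdm_similarity(rdm_a, rdm_b):
    """Spearman correlation between the upper triangles of two RDMs."""
    iu = np.triu_indices_from(rdm_a, k=1)
    rho, _ = spearmanr(rdm_a[iu], rdm_b[iu])
    return rho

# Two hypothetical network instances differing only in their random seed,
# probed with the same 100 stimuli at some intermediate layer (512 units).
acts_seed0 = np.random.default_rng(0).normal(size=(100, 512))
acts_seed1 = np.random.default_rng(1).normal(size=(100, 512))
print(rdm_similarity(rdm(acts_seed0), rdm(acts_seed1)))
```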
Author bio: Johannes Mehrer, graduate student in the Memory and Perception group at the Cognition and Brain Sciences Unit, University of Cambridge, UK; et al.
Deep learning has recently demonstrated excellent performance for multi-view stereo (MVS). However, one major limitation of current learned MVS approaches is scalability: the memory-consuming cost volume regularization makes learned MVS hard to apply to high-resolution scenes. In this paper, we introduce a scalable multi-view stereo framework based on the recurrent neural network. Instead of regularizing the entire 3D cost volume in one go, the proposed Recurrent Multi-view Stereo Network (R-MVSNet) sequentially regularizes the 2D cost maps along the depth direction via the gated recurrent unit (GRU). This dramatically reduces the memory consumption and makes high-resolution reconstruction feasible. We first show the state-of-the-art performance achieved by the proposed R-MVSNet on the recent MVS benchmarks. Then, we further demonstrate the scalability of the proposed method on several large-scale scenarios, where previous learned approaches often fail due to the memory constraint. Code is available at //github.com/YoYo000/MVSNet.
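The memory saving comes from sweeping a convolutional GRU along the depth dimension, so that only one 2D cost map (plus the hidden state) is resident at a time rather than the full 3D cost volume. The following PyTorch sketch illustrates that sweep; the cell design, channel counts, and the final softmax over depth are simplified assumptions, not the released R-MVSNet code.

```python
import torch
import torch.nn as nn

class ConvGRUCell(nn.Module):
    """Simple convolutional GRU cell used to regularize one 2D cost map at a time."""
    def __init__(self, channels, hidden):
        super().__init__()
        self.gates = nn.Conv2d(channels + hidden, 2 * hidden, 3, padding=1)
        self.cand = nn.Conv2d(channels + hidden, hidden, 3, padding=1)
        self.hidden = hidden

    def forward(self, x, h):
        z, r = torch.chunk(torch.sigmoid(self.gates(torch.cat([x, h], 1))), 2, dim=1)
        h_tilde = torch.tanh(self.cand(torch.cat([x, r * h], 1)))
        return (1 - z) * h + z * h_tilde

def regularize_cost_maps(cost_maps, cell, to_score):
    """Sweep the GRU along depth: only one 2D cost map and the hidden state are
    held in memory per step, instead of the whole 3D cost volume."""
    B, D, C, H, W = cost_maps.shape
    h = cost_maps.new_zeros(B, cell.hidden, H, W)
    scores = []
    for d in range(D):                                # one depth plane at a time
        h = cell(cost_maps[:, d], h)
        scores.append(to_score(h))
    return torch.softmax(torch.cat(scores, dim=1), dim=1)   # probability over depth

cell = ConvGRUCell(channels=8, hidden=16)
to_score = nn.Conv2d(16, 1, 3, padding=1)
prob_volume = regularize_cost_maps(torch.randn(1, 48, 8, 32, 32), cell, to_score)
print(prob_volume.shape)                              # (1, 48, 32, 32)
```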
We introduce Spatial-Temporal Memory Networks (STMN) for video object detection. At its core, we propose a novel Spatial-Temporal Memory module (STMM) as the recurrent computation unit to model long-term temporal appearance and motion dynamics. The STMM's design enables the integration of ImageNet pre-trained backbone CNN weights for both the feature stack as well as the prediction head, which we find to be critical for accurate detection. Furthermore, in order to tackle object motion in videos, we propose a novel MatchTrans module to align the spatial-temporal memory from frame to frame. We compare our method to state-of-the-art detectors on ImageNet VID, and conduct ablative studies to dissect the contribution of our different design choices. We obtain state-of-the-art results with the VGG backbone, and competitive results with the ResNet backbone. To our knowledge, this is the first video object detector that is equipped with an explicit memory mechanism to model long-term temporal dynamics.
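Conceptually, the detector carries a spatial memory across frames: each frame's backbone features update the memory, and the detection head reads from the memory rather than from the current frame alone. The toy PyTorch sketch below shows that recurrence; it uses a simple gated blend in place of the STMM and omits the MatchTrans alignment step, so all names and layer choices are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ToyMemoryUpdate(nn.Module):
    """Very small stand-in for a spatial-temporal memory update: the memory is a
    gated blend of the previous memory and the current frame's features."""
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Conv2d(2 * channels, channels, 3, padding=1)
        self.transform = nn.Conv2d(2 * channels, channels, 3, padding=1)

    def forward(self, feat, memory):
        x = torch.cat([feat, memory], dim=1)
        g = torch.sigmoid(self.gate(x))               # how much new evidence to write
        return (1 - g) * memory + g * torch.relu(self.transform(x))

def detect_video(frames, backbone, memory_cell, head):
    """Per-frame detection while carrying a spatial memory across frames."""
    memory, outputs = None, []
    for frame in frames:                              # frames: list of (B, 3, H, W) tensors
        feat = backbone(frame)
        memory = feat if memory is None else memory_cell(feat, memory)
        outputs.append(head(memory))                  # detection head reads the memory
    return outputs

backbone = nn.Conv2d(3, 32, 3, stride=2, padding=1)   # placeholder for a pretrained CNN
head = nn.Conv2d(32, 5, 1)                            # placeholder per-location predictions
cell = ToyMemoryUpdate(32)
outs = detect_video([torch.randn(1, 3, 64, 64) for _ in range(4)], backbone, cell, head)
print(len(outs), outs[0].shape)                       # 4 frames, (1, 5, 32, 32) each
```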