Structural Improvements of Convolutional Neural Networks

Primary Supervisor
Gao, Yongsheng
Other Supervisors
Zhou, Jun

Over the last decade, deep learning has demonstrated outstanding performance across a wide range of application domains. Among the different types of deep frameworks, convolutional neural networks (CNNs), inspired by the biological processes of the visual system, can learn to extract discriminative features from raw inputs without any prior manipulation. However, efficient information circulation and the ability to explore effective new features remain two key challenges for a successful deep neural network. In this thesis, we present novel structural improvements to CNN frameworks that enhance how effectively and efficiently they explore and exploit features.
To this end, we first propose a novel residual-dense lattice network (RDL-Net), a two-dimensional triangular lattice of convolutional units connected using residual and dense connections. RDL-Net harnesses the advantages of both residual and dense aggregations without over-allocating parameters for feature re-usage. This property improves the network’s capacity to extract and exploit features both effectively and efficiently. Our extensive experiments on 1D sequential speech signals show that RDL-Nets achieve higher speech enhancement performance than many state-of-the-art CNN-based speech enhancement approaches.
We then adapt the RDL topology to spatial (2D) signals. Building on the RDL-Net design, we present an attention-based pyramid dilated lattice network (APDL-Net) for blind image denoising. The proposed framework employs a novel pyramid dilated convolution strategy alongside a channel-wise attention mechanism to capture contextual information corresponding to different noise levels by training a single model. Extensive empirical studies on image denoising and JPEG artifact suppression verify the effectiveness and efficiency of the APDL architecture.
We also investigate the capability of the lattice topology for hyperspectral image classification. For this purpose, we introduce a new attention-based lattice network (ALN) empowered by a unique joint spectral-spatial attention mechanism that captures spectral and spatial information effectively. The proposed ALN achieves higher accuracy and computational efficiency than state-of-the-art deep learning benchmark approaches for hyperspectral image classification.
In addition to the above architectural improvements of CNNs, and inspired by geographical analysis, we propose a novel channel-wise spatially autocorrelated (CSA) attention mechanism. CSA exploits the spatial relationships between the channels of feature maps. It also employs a unique hybrid spatial contiguity measure based on directional metrics to measure the degree of spatial closeness between feature maps. Furthermore, CSA adds negligible learnable parameters and light computational overhead to the deep model, making it a powerful yet efficient attention module. Experimental results on large-scale image classification and object detection datasets demonstrate that CSA-Nets consistently outperform different state-of-the-art attention-based CNNs.
Besides the above architectural and attention-based advances, this research presents a simple and novel feature pooling method called gradient-based pooling (GP). This method uses the spatial gradient of the pixels within a pooling region as the key for selecting potentially discriminative information, whereas other common pooling methods mostly rely on pixel values. The superiority of GP over other pooling methods is demonstrated through experiments on different benchmark image classification tasks.
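The idea behind gradient-based pooling can be illustrated with a small sketch. The abstract does not give the exact formulation, so the code below is only one possible reading and not the thesis implementation. It assumes central-difference gradients with replicated borders, non-overlapping windows, and a rule that keeps the pixel with the largest gradient magnitude in each window. The function names `gradient_magnitude` and `gradient_pool` are hypothetical.

```python
def gradient_magnitude(img):
    """Central-difference gradient magnitude; borders are replicated (assumption)."""
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            # Clamp neighbour indices so border pixels reuse their own row/column.
            dy = img[min(r + 1, h - 1)][c] - img[max(r - 1, 0)][c]
            dx = img[r][min(c + 1, w - 1)] - img[r][max(c - 1, 0)]
            mag[r][c] = (dx * dx + dy * dy) ** 0.5
    return mag


def gradient_pool(img, size=2):
    """Per non-overlapping window, keep the value whose gradient magnitude is largest.

    Illustrative selection rule only; the thesis may define GP differently.
    """
    mag = gradient_magnitude(img)
    h, w = len(img), len(img[0])
    pooled = []
    for r0 in range(0, h - size + 1, size):
        row = []
        for c0 in range(0, w - size + 1, size):
            cells = [(r, c)
                     for r in range(r0, r0 + size)
                     for c in range(c0, c0 + size)]
            # max() returns the first maximal cell, so ties go to the
            # earliest pixel in row-major order.
            r, c = max(cells, key=lambda rc: mag[rc[0]][rc[1]])
            row.append(img[r][c])
        pooled.append(row)
    return pooled
```

Under these assumptions, `gradient_pool([[5, 3, 9, 9], [5, 3, 9, 9]])` selects 3 from the first window because that pixel sits on the steepest transition, whereas max pooling would select 5. This shows how a gradient-driven rule can differ from a value-driven one.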

Thesis Type
Thesis (PhD Doctorate)
Degree Program
Doctor of Philosophy (PhD)
School of Eng & Built Env
Rights Statement
The author owns the copyright in this thesis, unless stated otherwise.
channel-wise spatially autocorrelated (CSA)
hybrid spatial contiguity
gradient-based pooling (GP)
residual-dense lattice network (RDL-Net)
state-of-the-art CNN-based