Non-coding Regions of the Human Transcriptome; Incidence and Potential Relevance of Selected Sequence Insertions
The human genome is made up of vast and complex deoxyribonucleic acid (DNA) molecules whose function, complexity and intellectual beauty we are barely beginning to understand. It is composed of about 3 billion base pairs. Within each strand, every nucleotide is linked to the next by a single covalent (phosphodiester) bond. Of all these nucleotides, only ~2% code for the basic building blocks of organic life, the protein molecules. Much of the remaining DNA is seemingly without any purpose. The main cause of the large size of the human genome is the insertion and expansion of repetitive DNA sequences. There are many different types of repetitive DNA. The retroposons stand out because they can be transcribed, reverse transcribed and then reinserted into the genome at new and seemingly random locations. The genome therefore accumulates an ever-increasing number of retroposon copies. The most numerous human retroposon, the Alu repeat, has more than 1 million copies, comprising 10% of the human genome. Alu repeats were once thought to be “selfish” elements that insert themselves at random where they cause only minor “discomfort” to their surroundings. They are now being assigned ever more functions and potential functions. In recent years this has culminated in the realization that a previously obscure ribonucleic acid (RNA) editing enzyme, adenosine deaminase acting on RNA (ADAR), edits thousands of these elements in precursor mRNA (pre-mRNA), for reasons that are not yet known. Another interesting feature of Alu repeats makes them especially attractive targets for ADARs. Most of the 1 million copies have significant base-pairing potential with other Alu repeats in the opposite orientation. Alu repeats are also frequently found inside genes, providing abundant potential for the formation of secondary structures in otherwise single-stranded gene transcripts.
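The base-pairing argument can be made concrete with a small sketch. Suppose two copies of the same repeat are inserted in opposite orientations. Within one transcript they then appear as a sequence and its reverse complement, which can fold back onto each other into a double-stranded stem. Two copies in the same orientation generally cannot do this. The Python below checks ungapped Watson-Crick complementarity between two equal-length segments and lists the adenosines that sit in the paired stem.

The sequence used is a hypothetical toy, not an Alu consensus sequence. The model also ignores several things that govern real RNA folding and ADAR site selection:
- G-U wobble pairs
- tolerated mismatches
- loops

```python
# Minimal illustrative sketch (not the method used in this work): ungapped
# Watson-Crick pairing between two segments of one RNA transcript.
# Assumptions: equal-length segments, no G-U wobble, no gaps or loops.

COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA string (A<->U, C<->G)."""
    return rna.translate(COMPLEMENT)[::-1]


def paired_positions(seg_a: str, seg_b: str) -> list[int]:
    """Indices i of seg_a that pair with seg_b[len - 1 - i] when seg_b
    folds back antiparallel onto seg_a."""
    if len(seg_a) != len(seg_b):
        raise ValueError("this ungapped sketch needs equal-length segments")
    target = reverse_complement(seg_b)
    return [i for i, (x, y) in enumerate(zip(seg_a, target)) if x == y]


def pairing_fraction(seg_a: str, seg_b: str) -> float:
    """Fraction of seg_a positions that can pair with seg_b."""
    return len(paired_positions(seg_a, seg_b)) / len(seg_a)


def paired_adenosines(seg_a: str, seg_b: str) -> list[int]:
    """Paired adenosines in seg_a. These are purely illustrative candidates
    for A-to-I editing, since ADAR acts on double-stranded RNA."""
    return [i for i in paired_positions(seg_a, seg_b) if seg_a[i] == "A"]


if __name__ == "__main__":
    repeat = "GGCAAUAC"                          # hypothetical toy sequence
    inverted_copy = reverse_complement(repeat)   # opposite orientation
    direct_copy = repeat                         # same orientation
    print(pairing_fraction(repeat, inverted_copy))   # 1.0
    print(pairing_fraction(repeat, direct_copy))     # 0.25
    print(paired_adenosines(repeat, inverted_copy))  # [3, 4, 6]
```

In this toy case the inverted copy pairs along its full length, while the direct copy pairs at only two of eight positions.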
This report describes a number of approaches that ultimately aim to uncover potential functions and the evolutionary significance of Alu repeats and certain other types of repetitive DNA. To that end, several genes of clinical significance are investigated for their potential to form secondary structures that may influence their expression. These genes are distinguished by their high content of Alu repeats and include the insulin receptor and the low density lipoprotein receptor. Selected gene groups and gene families are also analyzed for their Alu content. In this analysis a pattern seems to emerge: Alu repeats may be involved in the tissue-specific expression of different subunit isoforms of the mitochondrial electron transport chain. The genes coding for the electron transport chain are also investigated for their propensity to give rise to pseudogenes, another type of repetitive DNA that may be of functional significance. The report also presents a large survey of Alu repeat pairs located across and immediately flanking exons. This survey confirms a number of features of Alu distribution, in genes and in the human genome in general, that others have documented in the past. Significantly, it also shows that inverted and direct Alu repeat pairs are distributed differently when an exon lies between them: direct pairs are favored over inverted pairs. This difference may well reflect the general ability of inverted Alu repeats to form secondary structures. Such structures are hypothesized to be disruptive to a gene and thus selected against. However, they could also represent a mechanism for regulating gene expression and may therefore be of clinical importance.
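The direct-versus-inverted comparison in the survey can be illustrated with a small sketch, given repeat annotations with genomic coordinates and strand. Each pair made of one Alu upstream and one Alu downstream of an exon is classified as direct (same strand) or inverted (opposite strands), and the two classes are tallied. A second helper computes the fraction of a gene span covered by Alu repeats, a simple "Alu content" measure.

Several choices here are hypothetical and made for illustration only, not the criteria used in this work:
- the `Interval` record
- the 2,000 bp flanking window
- the rule of counting every upstream/downstream combination

```python
from collections import Counter
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Interval:
    """Hypothetical annotation record: half-open [start, end) plus strand."""
    start: int
    end: int
    strand: str = "+"


def flanking_pairs(exon, alus, window=2000):
    """All (upstream, downstream) Alu pairs within `window` bp of the exon.
    The window size is an arbitrary illustrative choice."""
    upstream = [a for a in alus if a.end <= exon.start and exon.start - a.end <= window]
    downstream = [a for a in alus if a.start >= exon.end and a.start - exon.end <= window]
    return list(product(upstream, downstream))


def classify(pair):
    """Same strand -> direct pair; opposite strands -> inverted pair."""
    up, down = pair
    return "direct" if up.strand == down.strand else "inverted"


def tally(exons, alus, window=2000):
    """Count direct and inverted Alu pairs intervened by each exon."""
    counts = Counter()
    for exon in exons:
        for pair in flanking_pairs(exon, alus, window):
            counts[classify(pair)] += 1
    return counts


def alu_fraction(gene, alus):
    """Fraction of the gene span covered by Alus (clipped to the gene,
    overlapping or touching repeats merged so no base is counted twice)."""
    spans = sorted(
        (max(a.start, gene.start), min(a.end, gene.end))
        for a in alus
        if a.end > gene.start and a.start < gene.end
    )
    covered = 0
    cur_start = cur_end = None
    for s, e in spans:
        if cur_end is None or s > cur_end:
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = s, e
        else:
            cur_end = max(cur_end, e)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / (gene.end - gene.start)


if __name__ == "__main__":
    gene = Interval(0, 3000)
    exon = Interval(1000, 1200)
    alus = [Interval(500, 800, "+"), Interval(1500, 1800, "+"), Interval(1900, 2200, "-")]
    print(tally([exon], alus))       # Counter({'direct': 1, 'inverted': 1})
    print(alu_fraction(gene, alus))  # 0.3
```

Counting every combination means an exon with several Alus on each side contributes several pairs. A survey could instead keep only the nearest pair per exon; the sketch does not settle which choice is appropriate.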
Significantly, the regions where these elements are found are not usually investigated in genes suspected of causing a genetic disorder. They may therefore explain some disorders in which no mutations can be found in exons, splice sites or promoters. Despite the extensive bioinformatic searches performed in the course of this work, it would be surprising if many more potential functions were not still waiting to be discovered.
Thesis (PhD Doctorate)
Doctor of Philosophy (PhD)
School of Biomolecular and Physical Sciences