Targeting a Complex Transcriptome: The Construction of the Mouse Full-Length cDNA Encyclopedia

File version
Version of Record (VoR)
Author(s)
Carninci, Piero
Waki, Kazunori
Shiraki, Toshiyuki
Konno, Hideaki
Shibata, Kazuhiro
Itoh, Masayoshi
Aizawa, Katsunori
Arakawa, Takahiro
Ishii, Yoshiyuki
Sasaki, Daisuke
Bono, Hidemasa
Kondo, Shinji
Sugahara, Yuichi
Saito, Rintaro
Osato, Naoki
Fukuda, Shiro
Sato, Kenjiro
Watahiki, Akira
Hirozane-Kishikawa, Tomoko
Nakamura, Mari
Shibata, Yuko
Yasunishi, Ayako
Kikuchi, Noriko
Yoshiki, Atsushi
Kusakabe, Moriaki
Gustincich, Stefano
Beisel, Kirk
Pavan, William
Aidinis, Vassilis
Nakagawara, Akira
Held, William A.
Iwata, Hiroo
Kono, Tomohiro
Nakauchi, Hiromitsu
Lyons, Paul
Wells, C.
Hume, David A.
Fagiolini, Michela
et al.
Griffith University Author(s)
Year published
2003
Abstract
We report the construction of the mouse full-length cDNA encyclopedia, the most extensive view of a complex transcriptome, on the basis of preparing and sequencing 246 libraries. Before cloning, cDNAs were enriched in full-length by Cap-Trapper, and in most cases, aggressively subtracted/normalized. We have produced 1,442,236 successful 3'-end sequences clustered into 171,144 groups, from which 60,770 clones were fully sequenced cDNAs annotated in the FANTOM-2 annotation. We have also produced 547,149 5'-end reads, which clustered into 124,258 groups. Altogether, these cDNAs were further grouped in 70,000 transcriptional units (TU), which represent the best coverage of a transcriptome so far. By monitoring the extent of normalization/subtraction, we define the tentative equivalent coverage (TEC), which was estimated to be equivalent to >12,000,000 ESTs derived from standard libraries. High coverage explains discrepancies between the very large numbers of clusters (and TUs) of this project, which also include non-protein-coding RNAs, and the lower gene number estimation of genome annotations. Altogether, 5'-end clusters identify regions that are potential promoters for 8637 known genes and 5'-end clusters suggest the presence of almost 63,000 transcriptional starting points. An estimate of the frequency of polyadenylation signals suggests that at least half of the singletons in the EST set represent real mRNAs. Clones accounting for about half of the predicted TUs await further sequencing. The continued high-discovery rate suggests that the task of transcriptome discovery is not yet complete.
Journal Title
Genome Research
Volume
13
Issue
6b
Copyright Statement
© 2003 Cold Spring Harbor Laboratory Press. The attached file is reproduced here in accordance with the copyright policy of the publisher. Please refer to the journal's website for access to the definitive, published version.
Subject
Genetics not elsewhere classified
Biological Sciences
Medical and Health Sciences