Transcript Annotation in FANTOM3: Mouse Gene Catalog Based on Physical cDNAs
Author(s)
Kasukawa, Takeya
Oyama, Rieko
Gough, Julian
Frith, Martin
Engstrom, Par
Lenhard, Boris
Aturaliya, Rajith
Batalov, Serge
Beisel, Kirk W
Bult, Carol J
Fletcher, Colin F
Forrest, Alistair Raymond Russell
Furuno, Masaaki
Hill, David
Itoh, Masayoshi
Kanamori-Katayama, Mutsumi
Katayama, Shintaro
Katoh, Masaru
Kawashima, Tsugumi
Quackenbush, John
Ravasi, Timothy
Ring, Brian Z
Shibata, Kazuhiro
Sugiura, Koji
Takenaka, Yoichi
Teasdale, Rohan D.
Wells, Christine A.
Zhu, Yunxia
Kai, Chikatoshi
Kawai, Jun
Hume, David A.
Carninci, Piero
Hayashizaki, Yoshihide
Size
237534 bytes
File type(s)
application/pdf
Abstract
The international FANTOM consortium aims to produce a comprehensive picture of the mammalian transcriptome, based upon an extensive cDNA collection and functional annotation of full-length enriched cDNAs. The previous dataset, FANTOM2, comprised 60,770 full-length enriched cDNAs. Functional annotation revealed that this cDNA dataset contained only about half of the estimated number of mouse protein-coding genes, indicating that a number of cDNAs still remained to be collected and identified. To pursue the complete gene catalog that covers all predicted mouse genes, cloning and sequencing of full-length enriched cDNAs has been continued since FANTOM2. In FANTOM3, 42,031 newly isolated cDNAs were subjected to functional annotation, and the annotation of 4,347 FANTOM2 cDNAs was updated. To accomplish accurate functional annotation, we improved our automated annotation pipeline by introducing new coding sequence prediction programs and developed a Web-based annotation interface for simplifying the annotation procedures to reduce manual annotation errors. Automated coding sequence and function prediction was followed with manual curation and review by expert curators. A total of 102,801 full-length enriched mouse cDNAs were annotated. Out of 102,801 transcripts, 56,722 were functionally annotated as protein coding (including partial or truncated transcripts), providing to our knowledge the greatest current coverage of the mouse proteome by full-length cDNAs. The total number of distinct non-protein-coding transcripts increased to 34,030. The FANTOM3 annotation system, consisting of automated computational prediction, manual curation, and final expert curation, facilitated the comprehensive characterization of the mouse transcriptome, and could be applied to the transcriptomes of other species.
Journal Title
PLoS Genetics
Volume
2
Issue
4
Rights Statement
Copyright 2006 Maeda et al. This is an Open Access article distributed under the terms of the Creative Commons Attribution License CCAL. (http://www.plos.org/journals/license.html)
Subject
Genetics