The Collection of Whole Transcriptome and its Application to Medical Research

Yoshihide Hayashizaki

We have been working to establish a comprehensive mouse full-length cDNA collection and sequence database, the RIKEN Mouse Genome Encyclopedia, covering as many genes as possible. More recently we have been building higher-level annotation (Functional Annotation of Mouse cDNA, FANTOM) that draws not only on homology searches but also on expression profiles, mapping information, and a protein-protein interaction database. More than 1,000,000 clones prepared from 163 tissues were end-sequenced and grouped into 128,000 clusters, and 60,000 representative clones were fully sequenced. The 60,000 sequences contained 35,000 unique sequences, including more than 24,000 clear protein-encoding genes.

The next generation of life sciences will clearly build on whole-genome information and resources. On the basis of our cDNA clones, we developed additional systems to explore gene function: a cDNA microarray system on which all of these clones are printed, a protein-protein interaction screening system, a protein-DNA interaction screening system, and others. The integrated database of all this information is very useful for analyzing gene transcriptional networks and for connecting genes to phenotypes, which facilitates the positional candidate approach. This talk discusses the prospects for applying these genome resources.