Authors: D. Davison
Affiliation: Bristol-Myers Squibb, Pharmaceutical Research Institute, United States
Pages: 6 - 11
Keywords: EST clustering, number of human genes, genomics
The definitive prototype for nanotechnology is the cell. Its many machines and exquisitely controlled internal and external movements are a reference for all researchers working in the field. The complete instructions for every molecular machine in a cell are specified in its DNA. The interactions of those parts are emergent properties of the individual components (RNAs, fats, sugars, and proteins). At present, there is considerable controversy among biologists regarding the number of human genes and proteins. In part, the disagreement stems from differences in definition. In this presentation we define a gene as a transcription unit. Each transcription unit may have zero to many splice forms (known as “alternative splices”). While there are several methods for gene prediction, all involving computational tools, no two of them agree. Estimates of the gene count range from 20,000 to 120,000. One way to approach this question, involving both computation and experiment, is to look at copies of fragments of messenger RNA (mRNA), called expressed sequence tags (ESTs). mRNA comes only from a gene being expressed by a cell or tissue. By clustering ESTs that overlap in sequence, we can try to reconstruct the expressed gene. The final result is a very rough representation of the ‘true expressed transcript’. Our results consistently indicate that there are some 70,000 transcription units, with an average of 1.2 different transcripts per transcription unit. Thus (70,000 × 1.2 ≈ 84,000), we estimate the total number of human genes at about 85,000. Post-translational modification will make the total number of proteins much higher.
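The clustering idea can be illustrated with a minimal sketch. It is not the method used in this work, which the abstract does not describe; it assumes, purely for illustration, that two ESTs belong to the same transcription unit when they share at least one exact substring of length k, and it merges such ESTs with a union-find structure. The function names (kmers, cluster_ests), the parameters k and min_shared, and the toy sequences are all hypothetical.

```python
from itertools import combinations


def kmers(seq, k):
    """Return the set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cluster_ests(ests, k=8, min_shared=1):
    """Group ESTs that share at least min_shared exact k-mers.

    Assumption for illustration only: sharing a k-mer is taken as
    evidence of overlap. Real pipelines use alignment and must cope
    with sequencing errors, repeats, and chimeric clones.
    Returns a sorted list of clusters, each a list of EST indices.
    """
    parent = list(range(len(ests)))

    def find(i):
        # Follow parent links to the cluster root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[max(ri, rj)] = min(ri, rj)

    kmer_sets = [kmers(s, k) for s in ests]
    # All-pairs comparison: quadratic, acceptable only for a toy example.
    for i, j in combinations(range(len(ests)), 2):
        if len(kmer_sets[i] & kmer_sets[j]) >= min_shared:
            union(i, j)

    clusters = {}
    for i in range(len(ests)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Toy data: overlapping fragments cut from two made-up transcripts.
transcript_a = "ATGGCGTACGTTAGCCTAGGATCCGATTACA"
transcript_b = "TTGACCGGTAAACCCGGGTTTAAACGCGCGAT"
ests = [
    transcript_a[0:16],
    transcript_b[0:18],
    transcript_a[8:24],
    transcript_b[8:28],
    transcript_a[16:31],
]
print(cluster_ests(ests))  # → [[0, 2, 4], [1, 3]]
```

Note that ESTs 0 and 4 share no sequence directly; they end up in the same cluster only because EST 2 overlaps both. This transitive linking is what lets short fragments be assembled into a rough picture of a longer transcript, and it is also why a single spurious overlap can wrongly merge two unrelated clusters.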