How is one to determine how many genes are in the human genome?
I don’t have the articles to cite, but I remember an issue of “Nature Genetics” (around July 2000) where one article said there were only 35,000 genes and, on the next page, another article said there were 120,000 genes.
The two articles obviously used different methods to come to their conclusions. But which method is/was more accurate and what’s the latest estimate?
The number I have seen most frequently in casual discussion was 30,000. Up until the very last couple of years of the HGP, there were continued estimates that they would find up to 140,000 genes, but that number was significantly reduced when the actual studies began to be released.
I wouldn’t say that the question changes daily - it was anticipated to be one thing, and now it’s more or less known to be another. About 30 000 is generally accepted as the number of genes in the human genome. If you want a hard number, like 31 219 or something, that isn’t going to happen. Every genome sequence will be different by a few genes - some exist in multiple copies in some people, and in fewer copies in others, etc. Besides, not all genes have been “discovered” yet. There is a lot of “junk” DNA which is constantly being found to actually take part in making protein variant X as opposed to Y, where that “junk” gets removed - stuff like that.
Actually, the answer does not change on a daily basis. There were estimates prior to the first publication of the HGP results that varied from the high teen thousands (not many supporters) to the mid hundred thousands (several speculative supporters). However, upon the first publication of the results of the HGP, everyone realized that the HGP team had done their work well and that the number of genes was in the 30,000 range. That has not been in dispute since the initial publications of the HGP team.
Note that this government site on the Human Genome Project was updated as recently as September 11, 2002 and it lists as a goal
The site to which I originally linked was updated at least as recently as February 2002. Among the comments on the introduction page was this paragraph:
Just as a side note, a biophysicist by the name of Andras Pellionisz has suggested that the ‘junk DNA’ in cells (DNA outside genes) isn’t really junk after all. The news story is here. The Slashdot thread that mentions it says that genes only occupy 2-3% of the human genome, and that deleting the ‘junk’ is lethal.
So maybe those 30,000 genes aren’t all there is to consider.
I really hate the term “junk DNA”. That biophysicist hasn’t discovered anything fundamentally new. It’s been generally well-known for quite a long time within the genetics and genomics communities that this so-called junk is indeed functional - it just doesn’t code for any actual protein sequence. For instance, the telomeres, the ends of the chromosomes, consist of highly-repetitive DNA sequence that doesn’t code for anything. Yet they play an important role in maintaining chromosome stability. Similarly, the centromere does not code for anything, but is crucial in controlling the correct segregation of sister chromatids during cell division. There is a sequence before the actual start of a gene called the promoter region that directly controls whether that gene is expressed, though the region itself codes for nothing.
As to the question of whether a region is coding or non-coding, we’ve sequenced enough DNA, both genic and non-genic, to be able to make pretty good guesses as to the composition of some random DNA fragment. Good enough that we can program a computer to do the predictions, such as the NCBI ORF Finder. At the simplest level, we can guess that anything between an initiation codon (ATG) and a termination codon (one of TAA, TAG, or TGA) in the same reading frame may be a gene. More sophisticated algorithms will take into account the distinctive characteristics of promoter regions, intron-exon boundaries, etc., to make their predictions.
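To make the "simplest level" guess concrete, here is a rough Python sketch of a start-to-stop scan. It is only an illustration, not how the NCBI ORF Finder is actually implemented: the function name `find_orfs` and the `min_codons` parameter are made up for this example, it looks at the forward strand only (a real tool would also scan the reverse complement), and it knows nothing about introns or promoters.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=1):
    """Return (start, end, frame) for each ATG...stop run on the forward strand.

    start is the 0-based index of the A in ATG; end is one past the stop codon.
    min_codons counts codons from ATG up to, but not including, the stop.
    Hypothetical helper for illustration only.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):                      # three forward reading frames
        start = None
        for i in range(frame, len(seq) - 2, 3): # step one codon at a time
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                       # first ATG opens a candidate
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None                    # look for the next ATG
    return orfs

print(find_orfs("CCATGAAATAGCC"))  # [(2, 11, 2)]
```

In the toy sequence the only hit is ATG AAA TAG, starting at position 2. Note that the TGA at position 3 is ignored because it sits in a different frame with no ATG open before it. In practice you would set `min_codons` fairly high, since short start-to-stop runs turn up by chance all over random sequence.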