Human genome

The human genome is the genome of Homo sapiens, which is composed of 23 pairs of chromosomes (22 pairs of autosomes plus the X and Y sex chromosomes) with a total of approximately 3 billion DNA base pairs containing an estimated 20,000–25,000 genes. The Human Genome Project has produced a reference sequence of the euchromatic human genome, which is used worldwide in the biomedical sciences. The human genome has fewer genes than expected: only about 1.5% of it codes for proteins, and the rest consists of RNA genes, regulatory sequences, introns and, controversially, so-called "junk" DNA.

Chromosomes


There are 24 distinct human chromosomes: 22 autosomal chromosomes, plus the sex-determining X and Y chromosomes. Chromosomes 1–22 are numbered roughly in order of decreasing size. Somatic cells usually have one copy of chromosomes 1–22 from each parent, plus an X chromosome from the mother, and either an X or Y chromosome from the father, for a total of 46.

Genes
There are an estimated 20,000–25,000 human protein-coding genes. The estimate of the number of human genes has been repeatedly revised down from initial predictions of 100,000 or more as genome sequence quality and gene-finding methods have improved, and it could continue to drop further.

Surprisingly, the number of human genes seems to be less than a factor of two greater than that of many much simpler organisms, such as the roundworm and the fruit fly. However, human cells make extensive use of alternative splicing to produce several different proteins from a single gene, and the human proteome is thought to be much larger than those of the aforementioned organisms.

Most human genes have multiple exons, and human introns are frequently much longer than the flanking exons.

Human genes are distributed unevenly across the chromosomes. Each chromosome contains various gene-rich and gene-poor regions, which seem to be correlated with chromosome bands and GC content. The significance of these nonrandom patterns of gene density is not well understood.

In addition to protein-coding genes, the human genome contains thousands of RNA genes, including transfer RNA, ribosomal RNA, microRNA, and other non-coding RNA genes.

Regulatory sequences
The human genome has many different regulatory sequences, which are crucial to controlling gene expression. These are typically short sequences that appear near or within genes. A systematic understanding of these regulatory sequences and how they together act as a gene regulatory network is only beginning to emerge from computational, high-throughput expression and comparative genomics studies.

Identification of regulatory sequences relies in part on evolutionary conservation. The evolutionary branch between the human and the mouse, for example, occurred 70–90 million years ago. Computer comparisons that identify non-coding sequences conserved between the two genomes therefore indicate that those sequences are likely to be important in functions such as gene regulation.
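The idea of scanning two genomes for conserved non-coding sequences can be illustrated with a minimal sketch. It assumes the two sequences have already been aligned to equal length, and it slides a fixed window along them reporting windows whose identity meets a threshold. The function name `conserved_windows`, the window size, the threshold, and the two toy sequences are all hypothetical and chosen only for illustration; real comparative genomics pipelines use proper alignment and statistical models of conservation.

```python
def conserved_windows(seq_a, seq_b, window=10, min_identity=0.9):
    """Return (start, identity) for each window meeting the identity threshold.

    Assumes seq_a and seq_b are already aligned and of equal length.
    Alignment gaps ('-') never count as matches.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    hits = []
    for start in range(len(seq_a) - window + 1):
        pairs = zip(seq_a[start:start + window], seq_b[start:start + window])
        matches = sum(1 for a, b in pairs if a == b and a != "-")
        identity = matches / window
        if identity >= min_identity:
            hits.append((start, identity))
    return hits


# Toy, made-up aligned fragments (not real genomic data).
human = "ACGTACGTACGGTTCAGCTAGCTAGGATCC"
mouse = "ACGTACGTACGATGCATTTAGCAAGGCTCA"

hits = conserved_windows(human, mouse, window=10, min_identity=0.9)
for start, identity in hits:
    print(f"window at {start}: {identity:.0%} identical")
```

In this toy input only the windows near the start of the fragment pass the threshold, which is the kind of signal that would mark a candidate conserved element for further study.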

Another comparative genomic approach to locating regulatory sequences in humans is the sequencing of the puffer fish genome. These vertebrates have essentially the same genes and regulatory sequences as humans, but with only one-eighth the "junk" DNA. The compact DNA sequence of the puffer fish makes it much easier to locate the regulatory sequences.

Other DNA
Protein-coding sequences (specifically, coding exons) comprise less than 1.5% of the human genome. Aside from genes and known regulatory sequences, the human genome contains vast regions of DNA whose function, if any, remains unknown. These regions in fact comprise the vast majority, by some estimates 97%, of the human genome. Much of this consists of:

Repeat elements

 * Tandem repeats
   * Satellite DNA
   * Minisatellites
   * Microsatellites
 * Interspersed repeats
   * SINEs
   * LINEs

Transposons

 * Retrotransposons
   * LTR
     * Ty1-copia
     * Ty3-gypsy
   * Non-LTR
 * DNA Transposons

Pseudogenes
However, there is also a large amount of sequence that does not fall under any known classification.

Much of this sequence may be an evolutionary artifact that serves no present-day purpose, and these regions are sometimes collectively referred to as "junk" DNA. There are, however, a variety of emerging indications that many sequences within these regions are likely to function in ways that are not fully understood. Recent experiments using microarrays have revealed that a substantial fraction of non-genic DNA is in fact transcribed into RNA, which raises the possibility that the resulting transcripts have some unknown function. Also, the evolutionary conservation across mammalian genomes of much more sequence than can be explained by protein-coding regions indicates that many, and perhaps most, functional elements in the genome remain unknown. The investigation of the vast quantity of sequence in the human genome whose function remains unknown is currently a major avenue of scientific inquiry.

Variation
Most studies of human genetic variation have focused on single nucleotide polymorphisms (SNPs), which are substitutions in individual bases along a chromosome. Most analyses estimate that SNPs occur on average somewhere between once every 100 and once every 1,000 base pairs in the human genome, although they do not occur at a uniform density. From this follows the popular statement that "we are all, regardless of race, genetically 99.9% the same", although most geneticists would qualify it somewhat. For example, a much larger fraction of the genome is now thought to be involved in copy number variation. A large-scale collaborative effort to catalog SNP variations in the human genome is being undertaken by the International HapMap Project.
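The arithmetic behind the "99.9%" figure can be sketched as follows. This is a rough illustration only: it uses the approximate 3 billion base pair genome size quoted in this article, treats SNP density as uniform (which the text notes it is not), and ignores other kinds of variation. The names `GENOME_SIZE_BP` and `snp_summary` are introduced here for the example.

```python
GENOME_SIZE_BP = 3_000_000_000  # approximate genome size quoted in the article


def snp_summary(bases_per_snp):
    """Return (expected variable sites, fraction of identical bases).

    Assumes one SNP per `bases_per_snp` bases, spread uniformly.
    """
    snp_sites = GENOME_SIZE_BP // bases_per_snp
    identical_fraction = 1 - 1 / bases_per_snp
    return snp_sites, identical_fraction


# The two ends of the density range given in the text.
for spacing in (100, 1_000):
    sites, identical = snp_summary(spacing)
    print(f"1 SNP per {spacing:,} bp: about {sites:,} sites, {identical:.1%} identical")
```

At the sparse end of the range (one SNP per 1,000 bases) this gives roughly 3 million variable sites and 99.9% identity; at the dense end it gives roughly 30 million sites and 99% identity.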

The genomic loci and length of certain types of small repetitive sequences are highly variable from person to person, which is the basis of DNA fingerprinting and DNA paternity testing technologies. The heterochromatic portions of the human genome, which total several hundred million base pairs, are also thought to be quite variable within the human population (they are so repetitive and so long that they cannot be accurately sequenced with current technology). These regions contain few genes, and it is unclear whether any significant effect results from typical variation in repeats or heterochromatin.

Most gross genomic mutations in germ cells probably result in inviable embryos; however, a number of human diseases are related to large-scale genomic abnormalities. Down syndrome, Turner syndrome, and a number of other diseases result from nondisjunction of entire chromosomes. Cancer cells frequently have aneuploidy of chromosomes and chromosome arms, although a cause-and-effect relationship between aneuploidy and cancer has not been established.

Genetic disorders
Genetic disorders are conditions caused by abnormal expression of one or more genes, resulting in a clinical phenotype. A disorder may be caused by a gene mutation, an abnormal number of chromosomes, or triplet repeat expansion mutations. Defective genes can be inherited from the parents, in which case the disorder is known as a hereditary disease. There are around 4,000 known genetic disorders, with cystic fibrosis among the most common.

Genetic disorders are often studied by means of population genetics. Treatment is performed by a geneticist-physician trained in clinical genetics. The results of the Human Genome Project are likely to provide increased availability of genetic testing for gene-related disorders, and eventually improved treatment. Parents can be screened for hereditary conditions and counselled on the consequences, the probability that a condition will be inherited, and how to avoid or ameliorate it in their offspring.

One major gross effect on human phenotypes derives from gene dosage, whose effects play a role in disorders caused by duplication, omission, or disruption of chromosomes. For example, those afflicted with Down syndrome, or trisomy 21, experience high rates of Alzheimer's disease, an effect thought to be related to the overexpression of the Alzheimer's-related amyloid precursor protein, whose gene is located on chromosome 21. By contrast, people with Down syndrome experience lower rates of breast cancer, possibly due to the overexpression of a tumor suppressor gene.

Evolution
Comparative genomics studies of mammalian genomes suggest that approximately 5% of the human genome has been conserved by evolution since the divergence of those species approximately 200 million years ago, and that this conserved portion contains the vast majority of genes. Intriguingly, since genes and known regulatory sequences probably comprise less than 2% of the genome, this suggests that there may be more unknown functional sequence than known functional sequence. A smaller, yet still large, fraction of human genes seems to be shared among most known vertebrates. The chimpanzee genome is 95% identical to the human genome. On average, a typical human protein-coding gene differs from its chimpanzee ortholog by only two amino acid substitutions; nearly one third of human genes have exactly the same protein translation as their chimpanzee orthologs. A major difference between the two genomes is human chromosome 2, which is equivalent to a fusion product of chimpanzee chromosomes 12 and 13 (designated 2A and 2B in later nomenclature).
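The reasoning that unknown functional sequence may outweigh known functional sequence follows from simple arithmetic on the percentages quoted at the start of this section. The sketch below assumes the approximate 3 billion base pair genome size used elsewhere in this article and treats "less than 2%" as an upper bound of exactly 2%; the constant and function names are introduced only for this example.

```python
GENOME_SIZE_BP = 3_000_000_000   # approximate genome size used in this article
CONSERVED_PERCENT = 5            # sequence conserved across mammals
KNOWN_FUNCTIONAL_PERCENT = 2     # upper bound: genes plus known regulatory sequences


def base_pairs(percent):
    """Convert a whole-number percentage of the genome into base pairs."""
    return GENOME_SIZE_BP * percent // 100


conserved_bp = base_pairs(CONSERVED_PERCENT)
known_bp = base_pairs(KNOWN_FUNCTIONAL_PERCENT)
unexplained_bp = conserved_bp - known_bp  # conserved, but function unknown

print(f"conserved:   {conserved_bp:,} bp")
print(f"known:       {known_bp:,} bp")
print(f"unexplained: {unexplained_bp:,} bp")
```

Under these assumptions about 150 million base pairs are conserved, of which at most 60 million are accounted for by genes and known regulatory sequences, leaving at least 90 million base pairs of conserved sequence with no known function.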

Humans have undergone an extraordinary loss of olfactory receptor genes during our recent evolution, which explains our relatively crude sense of smell compared to most other mammals. Evolutionary evidence suggests that the emergence of color vision in humans and several other primate species has diminished the need for the sense of smell.

Mitochondrial genome
The human mitochondrial genome, while usually not included when referring to the "human genome", is of tremendous interest to geneticists, since it undoubtedly plays a role in mitochondrial disease. It also sheds light on human evolution; for example, analysis of variation in the human mitochondrial genome has led to the postulation of a recent common ancestor for all humans on the maternal line of descent (see Mitochondrial Eve).

Because mitochondria lack a system for checking for copying errors, mitochondrial DNA (mtDNA) has a more rapid rate of variation than nuclear DNA. This 20-fold increase in the mutation rate allows mtDNA to be used for more accurate tracing of maternal ancestry. Studies of mtDNA in populations have allowed ancient migration paths to be traced, such as the migration of Native Americans from Siberia or Polynesians from southeastern Asia. It has also been used to show that there is no trace of Neanderthal DNA in the European gene mixture inherited through the purely maternal line.

Epigenome
A variety of features of the human genome that transcend its primary DNA sequence, such as chromatin packaging, histone modifications and DNA methylation, are important in regulating gene expression, genome replication and other cellular processes. These "epigenetic" features are thought to be involved in cancer and other abnormalities, and some may be heritable across generations.