When scientists said they sequenced the human genome two decades ago, their announcement was a bit premature. While researchers did have access to the DNA sequence of most protein-coding genes, around 8% of this genome still remained unsequenced to this day. It is now done. However, these latter percentages, once considered "junk" DNA, contain more than just junk.
The Human Genome Project was a program launched at the end of 1988 whose mission was to establish the complete DNA sequencing of the human genome. Its "completion" was announced on April 14, 2003. Quotation marks are in order here. At the time, researchers had indeed completed most of the puzzle. However, technological limitations prevented the integration of approximately 15% of the human DNA sequence in the table.
Most unmapped regions at the time were concentrated around telomeres and centromeres. The former are the caps located at the ends of the chromosomes while the latter are the densely packed middle sections of the chromosomes. In 2013, researchers then reduced this gap to just 8% , but they still could not integrate and place about 200 million base pairs, the equivalent of an entire chromosome.
More recently, a consortium led by the National Human Genome Research Institute, the University of California, and the University of Washington finally succeeded in mapping the entire human genome . This new work has been published in the journal Science.
As a reminder, DNA is made up of small molecules called nucleotides, each containing a phosphate group, a sugar molecule and a nitrogenous base. The four types of nitrogenous bases are adenine, thymine, guanine and cytosine. All of them pair up to form the rungs of the DNA double helix that encodes our genetic identity. Two strands of these double helices form a chromosome and we inherit 23 pairs of chromosomes of our biological parents.
This genetic material is present in each of our cells. Each cell then reads the genes that concern it. For example, a skin cell consults information on the texture and color of the skin or the amount of hair, but does not read the instructions on the eyes, the heart or the stomach.
Concretely, DNA sequencing is the process ofdetermining the order of the building blocks of the base pair in a section of DNA. To sequence the human genome in recent years, researchers have relied on short-read technologies. The latter are able to scan several hundred base pairs at a time, separating them into tiny DNA fragments compared to the much larger entire genome. The project was then similar to assembling a puzzle of ten million pieces of blue sky, which left lots of gaps .
This work was also difficult in that both chromosomes in a chromosome pair came from each parent, making it difficult to distinguish between DNA sequences from the same stretch of the genome more difficult.
To circumvent these difficulties, the researchers here turned to a strange human tissue that forms when a sperm fertilizes an egg without a nucleus. In this configuration, the egg is not viable and attaches to the uterus to develop with all the chromosomes from the father, but none from the mother.
From this result, the researchers created a cell line containing 23 pairs of chromosomes from a single person. They then used new techniques to sequence this DNA. These new so-called "long-read" techniques use lasers to scan 20,000 to one million base pairs at a time, creating much larger puzzle pieces, which have closed gaps previously seen . To sum up, this very complicated new work has thus made it possible to fill in the "holes" in the sequences that the technology of the time was unable to complete.
Having a complete sequence will allow researchers to study regions that were previously inaccessible. However, the latter could harbor complex DNA mutations that hold clues to certain diseases.