Biochemistry Questions Biochemistry Questions / Explain the process of Illumina sequencing, including library preparation, cluster generation, and sequencing by synthesis. Discuss the key features and innovations of Illumina sequencing.

Illumina sequencing is the dominant DNA sequencing platform used in laboratories around the world. An advancement from Sanger sequencing, this sequencing-by-synthesis technology uses dye-terminator nucleotides and cluster amplification to allow many thousands of Sanger sequencing reactions to happen in parallel.

The first stage of Illumina sequencing is to prepare the library. The DNA of interest is broken up into short sections of ~400bp, and adaptor molecules are added to the ends. These adaptor molecules allow the hybridisation of the DNA of interest to the flow cell (a sort-of slide with complementary DNA bound to its surface), as well as containing known restriction and polymerase-binding sites. Barcode sequences can also be added to the adaptor molecules, allowing the sequencing of multiple different samples simultaneously to increase throughput and reduce costs. The DNA-adaptor molecules strands are washed over the flow cell, binding to complementary DNA strands attached to the flow cell. This allows the separation of the DNA of interest, similar to spreading a liquid culture on agar to grow colonies. PCR is carried out on the DNA strands, through a specific process known as bridge or cluster amplification. This involves repeating the three stages of PCR (strand separation, annealing, and extension) to form clusters of identical DNA strands, amplifying the fluorescence signal and enabling it to be detected by the sensors in the device. A restriction enzyme is used to remove one set of strands in either the 5’→3’ or 3’→5’ direction, ensuring only ssDNA of one direction is present, reducing the likelihood of ambiguous fluorescence. A DNA polymerase is then added, along with reversibly-fluorescent chain-terminating nucleotides. This is different to Sanger sequencing, where the fluorescence is not reversible. Each ssDNA strand is extended by 1 nucleotide at a time, allowing for the fluorescence from each cluster to be detected, before the fluorescence is removed by a chemical deblocking step. This allows each strand to be extended by 1 nucleotide. This process is repeated until sequence reads of ~150bps are obtained. These reads can then be assembled computationally by methods first pioneered by Celera Genomics during their human genome sequencing work (a shotgun approach), producing a consensus sequence.

Due to the short sequencing reads of Illumina sequencing (~150bp), it is difficult to assemble complete genomes containing long repeat regions (such as satellites or telomeres), giving a discontinuous sequence.

Illumina sequencing has allowed the rapid decrease in DNA sequencing cost, enabling a highly automated process that has led to the cost of sequencing a genome to just $$2;000$$ USD. This cost will likely fall further with third generation sequencing technologies (such as PacBio and Nanopore), reducing library preparation processes.