Explain how de Bruijn graphs are used in genome assembly.
Answer
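De Bruijn graphs are the basis of most short-read genome assemblers. Construction: 1) Break each read into k-mers (overlapping substrings of length k). 2) Create one node for each unique (k-1)-mer. 3) Add a directed edge for each k-mer, running from its (k-1)-mer prefix to its (k-1)-mer suffix. In the idealized formulation, assembly means finding an Eulerian path (one that traverses every edge exactly once) through the graph; in practice, real graphs rarely admit a single unambiguous Eulerian path, so assemblers simplify the graph and report non-branching paths as contigs. Advantages: the graph grows with the number of distinct k-mers rather than the number of reads, so high coverage is handled efficiently, and read overlaps are captured implicitly without all-vs-all comparison. Challenges: repeats of length k-1 or more collapse into shared nodes and create tangles; sequencing errors create spurious nodes that appear as dead-end tips and bubbles (bubbles also arise from heterozygous variants); and the choice of k is a trade-off (larger k resolves more repeats but requires higher coverage to keep the graph connected). Assemblers such as SPAdes and MEGAHIT combine techniques such as error correction, coverage information, paired-end reads, and multiple k values to improve contiguity.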
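The construction and the idealized Eulerian-path step can be sketched in a few lines of Python. This is a minimal illustration, not how SPAdes or MEGAHIT work internally: it assumes error-free reads from a single strand, ignores reverse complements and k-mer multiplicity, and assumes the graph actually has an Eulerian path. The function names (`build_de_bruijn`, `eulerian_path`, `spell`, `assemble`) and the toy reads are hypothetical, chosen only for this example.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer node to its successor nodes, one edge per distinct k-mer."""
    kmers = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmers.add(read[i:i + k])
    graph = defaultdict(list)
    for kmer in sorted(kmers):  # sorted only to make the result deterministic
        graph[kmer[:-1]].append(kmer[1:])  # edge: prefix -> suffix
    return graph


def eulerian_path(graph):
    """Hierholzer's algorithm. Assumes a non-empty graph that has an Eulerian path."""
    in_deg = defaultdict(int)
    for successors in graph.values():
        for v in successors:
            in_deg[v] += 1
    nodes = set(graph) | set(in_deg)
    # Start at the node with one more outgoing than incoming edge, if there is one;
    # otherwise the graph is a cycle and any node will do.
    start = next(
        (n for n in sorted(nodes) if len(graph.get(n, [])) - in_deg[n] == 1),
        min(nodes),
    )
    remaining = {u: list(vs) for u, vs in graph.items()}  # unused edges per node
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if remaining.get(u):
            stack.append(remaining[u].pop())  # follow an unused edge
        else:
            path.append(stack.pop())  # dead end: node is final in the path
    return path[::-1]


def spell(path):
    """Spell the sequence: first node, then the last base of each later node."""
    return path[0] + "".join(node[-1] for node in path[1:])


def assemble(reads, k):
    return spell(eulerian_path(build_de_bruijn(reads, k)))


if __name__ == "__main__":
    # Toy reads (made up for illustration) that overlap along ATGGCGTGCA.
    reads = ["ATGGCGT", "GGCGTGC", "CGTGCA"]
    print(assemble(reads, 4))  # prints ATGGCGTGCA
```

With k = 4 every 3-mer in this toy sequence is unique, so the graph is a simple chain and the path is unambiguous. With a smaller k, or a sequence containing a repeat of length k-1 or more, several Eulerian paths can exist and the sketch would return just one of them, which is why real assemblers fall back on graph simplification and extra evidence such as coverage and read pairs.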