The Diploid Genome Sequence of an Individual Human
Samuel Levy1*, Granger Sutton1, Pauline C. Ng1, Lars Feuk2, Aaron L. Halpern1, Brian P. Walenz1, Nelson Axelrod1, Jiaqi Huang1, Ewen F. Kirkness1, Gennady Denisov1, Yuan Lin1, Jeffrey R. MacDonald2, Andy Wing Chun Pang2, Mary Shago2, Timothy B. Stockwell1, Alexia Tsiamouri1, Vineet Bafna3, Vikas Bansal3, Saul A. Kravitz1, Dana A. Busam1, Karen Y. Beeson1, Tina C. McIntosh1, Karin A. Remington1, Josep F. Abril4, John Gill1, Jon Borman1, Yu-Hui Rogers1, Marvin E. Frazier1, Stephen W. Scherer2, Robert L. Strausberg1, J. Craig Venter1
1 J. Craig Venter Institute, Rockville, Maryland, United States of America,
2 Program in Genetics and Genomic Biology, The Hospital for Sick Children, and Molecular and Medical Genetics, University of Toronto, Toronto, Ontario, Canada,
3 Department of Computer Science and Engineering, University of California San Diego, La Jolla, California, United States of America,
4 Genetics Department, Facultat de Biologia, Universitat de Barcelona, Barcelona, Catalonia, Spain
Presented here is a genome sequence of an individual human. It was produced from 32 million random DNA fragments, sequenced by Sanger dideoxy technology and assembled into 4,528 scaffolds, comprising 2,810 million bases (Mb) of contiguous sequence with approximately 7.5-fold coverage for any given region. We developed a modified version of the Celera assembler to facilitate the identification and comparison of alternate alleles within this individual diploid genome. Comparison of this genome and the National Center for Biotechnology Information human reference assembly revealed more than 4.1 million DNA variants, encompassing 12.3 Mb. These variants (of which 1,288,319 were novel) included 3,213,401 single nucleotide polymorphisms (SNPs), 53,823 block substitutions (2–206 bp), 292,102 heterozygous insertion/deletion events (indels) (1–571 bp), 559,473 homozygous indels (1–82,711 bp), 90 inversions, as well as numerous segmental duplications and copy number variation regions. Non-SNP DNA variation accounts for 22% of all events identified in the donor; however, these events involve 74% of all variant bases. This suggests an important role for non-SNP genetic alterations in defining the diploid genome structure. Moreover, 44% of genes were heterozygous for one or more variants. Using a novel haplotype assembly strategy, we were able to span 1.5 Gb of genome sequence in segments >200 kb, providing further precision to the diploid nature of the genome. These data depict a definitive molecular portrait of a diploid human genome that provides a starting point for future genome comparisons and enables an era of individualized genomic information.
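The 22% and 74% figures quoted above can be checked directly against the counts given in the abstract. The following is a minimal sketch, not the authors' analysis code: the dictionary keys and function names are illustrative, the tally covers only the five variant classes for which the abstract gives counts (segmental duplications and copy number variation regions are not enumerated there), and the base-level estimate assumes that each SNP contributes exactly one base to the rounded 12.3 Mb total.

```python
# Variant counts as stated in the abstract (key names are illustrative).
variant_counts = {
    "snp": 3_213_401,
    "block_substitution": 53_823,
    "heterozygous_indel": 292_102,
    "homozygous_indel": 559_473,
    "inversion": 90,
}

# Total bases involved in variants, as rounded in the abstract (Mb).
TOTAL_VARIANT_BASES_MB = 12.3


def non_snp_event_fraction(counts):
    """Fraction of variant events that are not SNPs."""
    total = sum(counts.values())
    return (total - counts["snp"]) / total


def non_snp_base_fraction(counts, total_bases_mb):
    """Fraction of variant bases not attributable to SNPs.

    Assumption: every SNP accounts for exactly one variant base.
    """
    snp_bases_mb = counts["snp"] / 1_000_000
    return (total_bases_mb - snp_bases_mb) / total_bases_mb


total_events = sum(variant_counts.values())
event_fraction = non_snp_event_fraction(variant_counts)
base_fraction = non_snp_base_fraction(variant_counts, TOTAL_VARIANT_BASES_MB)

print(f"Total counted variant events: {total_events:,}")
print(f"Non-SNP share of events: {event_fraction:.1%}")
print(f"Non-SNP share of variant bases: {base_fraction:.1%}")
```

Under these assumptions the counted events sum to just over 4.1 million, and the two shares round to the 22% and 74% reported in the abstract.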
Funding. Funding was provided by the J. Craig Venter Institute, Genome Canada/Ontario Genomics Institute, The Hospital for Sick Children Foundation, the McLaughlin Centre for Molecular Medicine, and the Canada Foundation for Innovation. SWS is an Investigator of the Canadian Institutes of Health Research (CIHR) and a Fellow of the Canadian Institute for Advanced Research. LF is supported by a CIHR scholarship.
Competing interests. The authors have declared that no competing interests exist.
Academic Editor: Edward M. Rubin, Lawrence Berkeley National Laboratories, United States of America
Citation: Levy S, Sutton G, Ng PC, Feuk L, Halpern AL, et al. (2007) The Diploid Genome Sequence of an Individual Human. PLoS Biol 5(10): e254. doi:10.1371/journal.pbio.0050254
Received: May 9, 2007; Accepted: July 30, 2007; Published: September 4, 2007