Semi-automated assembly of high-quality diploid human reference genomes.
Jarvis, Erich D; Formenti, Giulio; Rhie, Arang; Guarracino, Andrea; Yang, Chentao; Wood, Jonathan; Tracey, Alan; Thibaud-Nissen, Francoise; Vollger, Mitchell R; Porubsky, David; Cheng, Haoyu; Asri, Mobin; Logsdon, Glennis A; Carnevali, Paolo; Chaisson, Mark J P; Chin, Chen-Shan; Cody, Sarah; Collins, Joanna; Ebert, Peter; Escalona, Merly; Fedrigo, Olivier; Fulton, Robert S; Fulton, Lucinda L; Garg, Shilpa; Gerton, Jennifer L; Ghurye, Jay; Granat, Anastasiya; Green, Richard E; Harvey, William; Hasenfeld, Patrick; Hastie, Alex; Haukness, Marina; Jaeger, Erich B; Jain, Miten; Kirsche, Melanie; Kolmogorov, Mikhail; Korbel, Jan O; Koren, Sergey; Korlach, Jonas; Lee, Joyce; Li, Daofeng; Lindsay, Tina; Lucas, Julian; Luo, Feng; Marschall, Tobias; Mitchell, Matthew W; McDaniel, Jennifer; Nie, Fan; Olsen, Hugh E; Olson, Nathan D.
Affiliation
  • Jarvis ED; Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA. ejarvis@rockefeller.edu.
  • Formenti G; Howard Hughes Medical Institute, Chevy Chase, MD, USA. ejarvis@rockefeller.edu.
  • Rhie A; Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA. gformenti@rockefeller.edu.
  • Guarracino A; Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA.
  • Yang C; Genomics Research Centre, Human Technopole, Viale Rita Levi-Montalcini, Milan, Italy.
  • Wood J; BGI-Shenzhen, Shenzhen, China.
  • Tracey A; Tree of Life, Wellcome Sanger Institute, Cambridge, UK.
  • Thibaud-Nissen F; Tree of Life, Wellcome Sanger Institute, Cambridge, UK.
  • Vollger MR; National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA.
  • Porubsky D; Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA.
  • Cheng H; Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA.
  • Asri M; Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA.
  • Logsdon GA; Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.
  • Carnevali P; UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA.
  • Chaisson MJP; Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA.
  • Chin CS; Chan Zuckerberg Initiative, Redwood City, CA, USA.
  • Cody S; Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA.
  • Collins J; Foundation for Biological Data Science, Belmont, CA, USA.
  • Ebert P; McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA.
  • Escalona M; Tree of Life, Wellcome Sanger Institute, Cambridge, UK.
  • Fedrigo O; Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany.
  • Fulton RS; Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, USA.
  • Fulton LL; Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA.
  • Garg S; McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA.
  • Gerton JL; McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA.
  • Ghurye J; Department of Biology, University of Copenhagen, Copenhagen, Denmark.
  • Granat A; Stowers Institute for Medical Research, Kansas City, MO, USA.
  • Green RE; Dovetail Genomics, Scotts Valley, CA, USA.
  • Harvey W; Illumina, Inc., San Diego, CA, USA.
  • Hasenfeld P; UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA.
  • Hastie A; Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA.
  • Haukness M; European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany.
  • Jaeger EB; Bionano Genomics, San Diego, CA, USA.
  • Jain M; UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA.
  • Kirsche M; Illumina, Inc., San Diego, CA, USA.
  • Kolmogorov M; UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA.
  • Korbel JO; Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA.
  • Koren S; Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA.
  • Korlach J; European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany.
  • Lee J; Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA.
  • Li D; Pacific Biosciences, Menlo Park, CA, USA.
  • Lindsay T; Bionano Genomics, San Diego, CA, USA.
  • Lucas J; Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA.
  • Luo F; The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO, USA.
  • Marschall T; McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA.
  • Mitchell MW; UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA.
  • McDaniel J; School of Computing, Clemson University, Clemson, SC, USA.
  • Nie F; Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany.
  • Olsen HE; Coriell Institute for Medical Research, Camden, NJ, USA.
  • Olson ND; Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA.
Nature; 611(7936): 519-531, 2022 Nov.
Article in En | MEDLINE | ID: mdl-36261518
The current human reference genome, GRCh38, represents over 20 years of effort to generate a high-quality assembly, which has benefitted society1,2. However, it still has many gaps and errors, and does not represent a biological genome as it is a blend of multiple individuals3,4. Recently, a high-quality telomere-to-telomere reference, CHM13, was generated with the latest long-read technologies, but it was derived from a hydatidiform mole cell line with a nearly homozygous genome5. To address these limitations, the Human Pangenome Reference Consortium formed with the goal of creating high-quality, cost-effective, diploid genome assemblies for a pangenome reference that represents human genetic diversity6. Here, in our first scientific report, we determined which combination of current genome sequencing and assembly approaches yields the most complete and accurate diploid genome assembly with minimal manual curation. Approaches that used highly accurate long reads and parent-child data with graph-based haplotype phasing during assembly outperformed those that did not. Developing a combination of the top-performing methods, we generated our first high-quality diploid reference assembly, containing only approximately four gaps per chromosome on average, with most chromosomes within ±1% of the length of CHM13. Nearly 48% of protein-coding genes have non-synonymous amino acid changes between haplotypes, and centromeric regions showed the highest diversity. Our findings serve as a foundation for assembling near-complete diploid human genomes at scale for a pangenome reference to capture global genetic variation from single nucleotides to structural rearrangements.
Subjects

Full text: 1 Database: MEDLINE Main subject: Genome, Human / Chromosome Mapping / Genomics / Diploidy Limits: Humans Language: En Journal: Nature Publication year: 2022 Document type: Article Country of affiliation: United States
