Random forest classifiers trained on simulated data enable accurate short read-based genotyping of structural variants in the alpha globin region at Chr16p13.3.
Hansen, Nancy F; Wang, Xunde; Tegegn, Mickias B; Liu, Zhi; Gouveia, Mateus H; Hill, Gracelyn; Lin, Jennifer C; Okulosubo, Temiloluwa; Shriner, Daniel; Thein, Swee Lay; Mullikin, James C.
Affiliation
  • Hansen NF; Cancer Genetics and Comparative Genomics Branch, National Human Genome Research Institute, NIH, Bethesda, MD 20892, USA.
  • Wang X; Sickle Cell Branch, National Heart, Lung and Blood Institute, NIH, Bethesda, MD 20892, USA.
  • Tegegn MB; Sickle Cell Branch, National Heart, Lung and Blood Institute, NIH, Bethesda, MD 20892, USA.
  • Liu Z; Cancer Genetics and Comparative Genomics Branch, National Human Genome Research Institute, NIH, Bethesda, MD 20892, USA.
  • Gouveia MH; Center for Research on Genomics and Global Health, National Human Genome Research Institute, NIH, Bethesda, MD 20892, USA.
  • Hill G; Cancer Genetics and Comparative Genomics Branch, National Human Genome Research Institute, NIH, Bethesda, MD 20892, USA.
  • Lin JC; Cancer Genetics and Comparative Genomics Branch, National Human Genome Research Institute, NIH, Bethesda, MD 20892, USA.
  • Okulosubo T; Sickle Cell Branch, National Heart, Lung and Blood Institute, NIH, Bethesda, MD 20892, USA.
  • Shriner D; Center for Research on Genomics and Global Health, National Human Genome Research Institute, NIH, Bethesda, MD 20892, USA.
  • Thein SL; Sickle Cell Branch, National Heart, Lung and Blood Institute, NIH, Bethesda, MD 20892, USA.
  • Mullikin JC; Cancer Genetics and Comparative Genomics Branch, National Human Genome Research Institute, NIH, Bethesda, MD 20892, USA.
bioRxiv ; 2023 Nov 27.
Article in English | MEDLINE | ID: mdl-38076833
ABSTRACT
Structural variation is generally difficult to characterize with short-read sequencing in regions where reads do not align well to a reference. Here, we use machine learning classifiers and short sequence reads to genotype structural variants in the alpha globin locus on chromosome 16, a medically relevant region that is challenging to genotype in individuals. Using models trained only on simulated data, we accurately genotype two hard-to-distinguish deletions in two separate human cohorts. Furthermore, population allele frequencies produced by our methods across a wide set of ancestries agree more closely with previously determined frequencies than those obtained using currently available genotyping software.
Full text: 1 Databases: MEDLINE Language: English Journal: bioRxiv Year: 2023 Document type: Article Country of affiliation: United States
