Search | Biblioteca Virtual em Saúde (Virtual Health Library)

Analysis and benchmarking of small and large genomic variants across tandem repeats.

English, Adam C; Dolzhenko, Egor; Ziaei Jam, Helyaneh; McKenzie, Sean K; Olson, Nathan D; De Coster, Wouter; Park, Jonghun; Gu, Bida; Wagner, Justin; Eberle, Michael A; Gymrek, Melissa; Chaisson, Mark J P; Zook, Justin M; Sedlazeck, Fritz J.

Nat Biotechnol ; 2024 Apr 26.

Article in English | MEDLINE | ID: mdl-38671154

ABSTRACT

Tandem repeats (TRs) are highly polymorphic in the human genome, have thousands of associated molecular traits and are linked to over 60 disease phenotypes. However, they are often excluded from at-scale studies because of challenges with variant calling and representation, as well as a lack of a genome-wide standard. Here, to promote the development of TR methods, we created a catalog of TR regions and explored TR properties across 86 haplotype-resolved long-read human assemblies. We curated variants from the Genome in a Bottle (GIAB) HG002 individual to create a TR dataset to benchmark existing and future TR analysis methods. We also present an improved variant comparison method that handles variants greater than 4 bp in length and varying allelic representation. The 8.1% of the genome covered by the TR catalog holds ~24.9% of variants per individual, including 124,728 small and 17,988 large variants for the GIAB HG002 'truth-set' TR benchmark. We demonstrate the utility of this pipeline across short-read and long-read technologies.
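As a rough illustration of what the catalog figures above imply, the arithmetic below compares variant density inside and outside the TR catalog. It is only a sketch: it assumes the 8.1% and 24.9% values can be treated as exact fractions of genome length and of per-individual variant count, and the function name is hypothetical rather than part of the published pipeline.

```python
def variant_density_ratio(genome_fraction: float, variant_fraction: float) -> float:
    """Ratio of variant density inside a region set to the density outside it."""
    inside = variant_fraction / genome_fraction
    outside = (1.0 - variant_fraction) / (1.0 - genome_fraction)
    return inside / outside


# Figures quoted in the abstract, treated here as exact fractions (an assumption).
TR_GENOME_FRACTION = 0.081
TR_VARIANT_FRACTION = 0.249

# Share of variants relative to share of genome covered by the catalog.
enrichment = TR_VARIANT_FRACTION / TR_GENOME_FRACTION
# Variant density inside the catalog relative to the rest of the genome.
ratio = variant_density_ratio(TR_GENOME_FRACTION, TR_VARIANT_FRACTION)

print(f"share of variants per share of genome: {enrichment:.2f}x")
print(f"density inside vs. outside the catalog: {ratio:.2f}x")
```

Under these assumptions, TR regions carry roughly three times their proportional share of variants, and variant density inside the catalog is close to four times the density elsewhere.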

Benchmarking of small and large variants across tandem repeats.

English, Adam; Dolzhenko, Egor; Jam, Helyaneh Ziaei; Mckenzie, Sean; Olson, Nathan D; De Coster, Wouter; Park, Jonghun; Gu, Bida; Wagner, Justin; Eberle, Michael A; Gymrek, Melissa; Chaisson, Mark J P; Zook, Justin M; Sedlazeck, Fritz J.

bioRxiv ; 2023 Nov 01.

Article in English | MEDLINE | ID: mdl-37961319

ABSTRACT

Tandem repeats (TRs) are highly polymorphic in the human genome, have thousands of associated molecular traits, and are linked to over 60 disease phenotypes. However, their complexity often excludes them from at-scale studies due to challenges with variant calling, representation, and lack of a genome-wide standard. To promote TR methods development, we create a comprehensive catalog of TR regions and explore its properties across 86 samples. We then curate variants from the GIAB HG002 individual to create a tandem repeat benchmark. We also present a variant comparison method that handles small and large alleles and varying allelic representation. The 8.1% of the genome covered by the TR catalog holds ~24.9% of variants per individual, including 124,728 small and 17,988 large variants for the GIAB HG002 TR benchmark. We work with the GIAB community to demonstrate the utility of this benchmark across short and long read technologies.

Efficient targeted recombination with CRISPR/Cas9 in hybrids of Caenorhabditis nematodes with suppressed recombination.

Xie, Dongying; Gu, Bida; Liu, Yiqing; Ye, Pohao; Ma, Yiming; Wen, Tongshu; Song, Xiaoyuan; Zhao, Zhongying.

BMC Biol ; 21(1): 203, 2023 09 29.

Article in English | MEDLINE | ID: mdl-37775783

ABSTRACT

BACKGROUND: Homology-based recombination (HR) is the cornerstone of genetic mapping. However, a lack of sufficient sequence homology or the presence of a genomic rearrangement prevents HR through crossing, which inhibits genetic mapping in relevant genomic regions. This is particularly true in species hybrids whose genomic sequences are highly divergent along with various genome arrangements, making the mapping of genetic loci, such as hybrid incompatibility (HI) loci, through crossing impractical. We previously mapped tens of HI loci between two nematodes, Caenorhabditis briggsae and C. nigoni, through the repeated backcrossing of GFP-linked C. briggsae fragments into C. nigoni. However, the median introgression size was over 7 Mb, indicating apparent HR suppression and preventing the subsequent cloning of the causative gene underlying a given HI phenotype. Therefore, a robust method that permits recombination independent of sequence homology is desperately desired. RESULTS: Here, we report a method of highly efficient targeted recombination (TR) induced by CRISPR/Cas9 with dual guide RNAs (gRNAs), which circumvents the HR suppression in hybrids between the two species. We demonstrated that a single gRNA was able to induce efficient TR between highly homologous sequences only in the F1 hybrids but not in the hybrids that carry a GFP-linked C. briggsae fragment in an otherwise C. nigoni background. We achieved highly efficient TR, regardless of sequence homology or genetic background, when dual gRNAs were used that each specifically targeted one parental chromosome. We further showed that dual gRNAs were able to induce efficient TR within genomic regions that had undergone inversion, in which HR-based recombination was expected to be suppressed, supporting the idea that dual-gRNA-induced TR can be achieved through nonhomology-based end joining between two parental chromosomes. 
CONCLUSIONS: Recombination suppression can be circumvented through CRISPR/Cas9 with dual gRNAs, regardless of sequence homology or the genetic background of the species hybrid. This method is expected to be applicable to other situations in which recombination is suppressed in interspecies or intrapopulation hybrids.

Subjects

Caenorhabditis , Animals , Caenorhabditis/genetics , CRISPR-Cas Systems , Chromosome Mapping , Genome , Recombination, Genetic

vamos: variable-number tandem repeats annotation using efficient motif sets.

Ren, Jingwen; Gu, Bida; Chaisson, Mark J P.

Genome Biol ; 24(1): 175, 2023 07 27.

Article in English | MEDLINE | ID: mdl-37501141

ABSTRACT

Roughly 3% of the human genome is composed of variable-number tandem repeats (VNTRs): arrays of motifs of at least six bases. These loci are highly polymorphic, yet current approaches that define and merge variants based on alignment breakpoints do not capture their full diversity. Here we present a method, vamos (VNTR Annotation using efficient Motif Sets), that instead annotates VNTRs using repeat composition under different levels of motif diversity. Using vamos we estimate 7.4-16.7 alleles per locus when applied to 74 haplotype-resolved human assemblies, compared to breakpoint-based approaches that estimate 4.0-5.5 alleles per locus.

Subjects

Minisatellite Repeats , Humans

Exploration of model misspecification in latent class methods for longitudinal data: Correlation structure matters.

Neely, Megan L; Pieper, Carl F; Gu, Bida; Dmitrieva, Natalia O; Pendergast, Jane F.

Stat Med ; 42(14): 2420-2438, 2023 06 30.

Article in English | MEDLINE | ID: mdl-37019876

ABSTRACT

Modeling longitudinal trajectories and identifying latent classes of trajectories is of great interest in biomedical research, and software to identify such latent classes is readily available for latent class trajectory analysis (LCTA), growth mixture modeling (GMM) and covariance pattern mixture models (CPMM). In biomedical applications, the level of within-person correlation is often non-negligible, which can impact the model choice and interpretation. LCTA does not incorporate this correlation. GMM does so through random effects, while CPMM specifies a model for the within-class marginal covariance matrix. Previous work has investigated the impact of constraining covariance structures, both within and across classes, in GMMs, an approach often used to solve convergence problems. Using simulation, we focused specifically on how misspecification of the temporal correlation structure and strength, but correct variances, impacts class enumeration and parameter estimation under LCTA and CPMM. We found (1) even in the presence of weak correlation, LCTA often does not reproduce original classes, (2) CPMM performs well in class enumeration when the correct correlation structure is selected, and (3) regardless of misspecification of the correlation structure, both LCTA and CPMM give unbiased estimates of the class trajectory parameters when the within-individual correlation is weak and the number of classes is correctly specified. However, the bias increases markedly when the correlation is moderate for LCTA and when the incorrect correlation structure is used for CPMM. This work highlights the importance of correlation alone in obtaining appropriate model interpretations and provides insight into model choice.
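To make the notion of a within-person correlation structure concrete, the sketch below builds the correlation matrix for repeated measurements under independence (consistent with the statement that LCTA does not incorporate this correlation) and under a first-order autoregressive, AR(1), structure. AR(1) and the value rho = 0.5 are illustrative assumptions only; the abstract does not state which structures or strengths were simulated, and the function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def independence_corr(n_times: int) -> Matrix:
    """Identity correlation: no within-person correlation across time points."""
    return [[1.0 if i == j else 0.0 for j in range(n_times)] for i in range(n_times)]


def ar1_corr(n_times: int, rho: float) -> Matrix:
    """AR(1) correlation: corr(y_i, y_j) = rho ** |i - j| (assumed example structure)."""
    return [[rho ** abs(i - j) for j in range(n_times)] for i in range(n_times)]


def show(name: str, matrix: Matrix) -> None:
    """Print a small matrix with two decimals per entry."""
    print(name)
    for row in matrix:
        print("  " + " ".join(f"{value:.2f}" for value in row))


show("independence", independence_corr(4))
show("AR(1), rho = 0.5 (illustrative)", ar1_corr(4, 0.5))
```

Fitting a model that assumes the first matrix to data generated under the second is one example of the kind of misspecification the abstract describes.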

Subjects

Biomedical Research , Software , Humans , Computer Simulation , Latent Class Analysis , Bias

FF-QuantSC: accurate quantification of fetal fraction by a neural network model.

Yuan, Yuying; Chai, Xianghua; Liu, Na; Gu, Bida; Li, Shengting; Gao, Ya; Zhou, Lijun; Liu, Qiang; Yang, Fan; Liu, Jingjuan; Qiu, Jiao; Zhang, Jinjin; Hou, Yumei; Cen, Miaolan; Tian, Zhongming; Tang, Weijiang; Zhang, Hongyun; Chen, Fang; Yin, Ye; Wang, Wei.

Mol Genet Genomic Med ; 8(6): e1232, 2020 06.

Article in English | MEDLINE | ID: mdl-32281746

ABSTRACT

BACKGROUND: Noninvasive prenatal testing (NIPT) is one of the most commonly employed clinical measures for screening of fetal aneuploidy. Fetal fraction (ff) has been demonstrated to be one of the key factors affecting the performance of NIPT. Accurate quantification of ff plays a vital role in NIPT. METHODS: In this study, we present a new approach, the accurate Quantification of Fetal Fraction with Shallow-Coverage sequencing of maternal plasma DNA (FF-QuantSC), for the estimation of ff in NIPT. The method employs a neural network model and utilizes differential genomic patterns between fetal and maternal genomes to quantify ff. RESULTS: Our results show that the ff predicted by FF-QuantSC exhibits high correlation with the Y chromosome-based method on male pregnancies, and achieves the highest accuracy compared with other ff estimation approaches. We also demonstrate that the model generates statistically similar results on both male and female pregnancies. CONCLUSION: FF-QuantSC achieves high accuracy in ff quantification. The method is suitable for application in both male and female pregnancies. Since the method does not require additional information beyond NIPT routines, it can be easily incorporated into current NIPT settings without causing extra costs. We believe that FF-QuantSC shall provide valuable additions to NIPT.

Subjects

Neural Networks, Computer , Noninvasive Prenatal Testing/methods , Sequence Analysis, DNA/methods , Adult , Female , Humans , Noninvasive Prenatal Testing/standards , Pregnancy , Sensitivity and Specificity , Sequence Analysis, DNA/standards , Software
