ABSTRACT
BACKGROUND: Homology-based recombination (HR) is the cornerstone of genetic mapping. However, insufficient sequence homology or the presence of a genomic rearrangement prevents HR through crossing, which inhibits genetic mapping in the affected genomic regions. This is particularly true in species hybrids whose genomic sequences are highly divergent and whose genomes differ in arrangement, making the mapping of genetic loci, such as hybrid incompatibility (HI) loci, through crossing impractical. We previously mapped tens of HI loci between two nematodes, Caenorhabditis briggsae and C. nigoni, through the repeated backcrossing of GFP-linked C. briggsae fragments into C. nigoni. However, the median introgression size was over 7 Mb, indicating apparent HR suppression and preventing the subsequent cloning of the causative gene underlying a given HI phenotype. Therefore, a robust method that permits recombination independent of sequence homology is highly desirable. RESULTS: Here, we report a method of highly efficient targeted recombination (TR) induced by CRISPR/Cas9 with dual guide RNAs (gRNAs), which circumvents the HR suppression in hybrids between the two species. We demonstrated that a single gRNA was able to induce efficient TR between highly homologous sequences only in the F1 hybrids but not in the hybrids that carry a GFP-linked C. briggsae fragment in an otherwise C. nigoni background. We achieved highly efficient TR, regardless of sequence homology or genetic background, when dual gRNAs were used that each specifically targeted one parental chromosome. We further showed that dual gRNAs were able to induce efficient TR within genomic regions that had undergone inversion, in which HR-based recombination was expected to be suppressed, supporting the idea that dual-gRNA-induced TR can be achieved through nonhomology-based end joining between two parental chromosomes.
CONCLUSIONS: Recombination suppression can be circumvented through CRISPR/Cas9 with dual gRNAs, regardless of sequence homology or the genetic background of the species hybrid. This method is expected to be applicable to other situations in which recombination is suppressed in interspecies or intrapopulation hybrids.
Subjects
Caenorhabditis, Animals, Caenorhabditis/genetics, CRISPR-Cas Systems, Chromosome Mapping, Genome, Genetic Recombination
ABSTRACT
Modeling longitudinal trajectories and identifying latent classes of trajectories is of great interest in biomedical research, and software for identifying such classes is readily available for latent class trajectory analysis (LCTA), growth mixture modeling (GMM), and covariance pattern mixture models (CPMM). In biomedical applications, the level of within-person correlation is often non-negligible, which can affect model choice and interpretation. LCTA does not incorporate this correlation. GMM does so through random effects, while CPMM specifies a model for the within-class marginal covariance matrix. Previous work has investigated the impact of constraining covariance structures, both within and across classes, in GMMs, an approach often used to solve convergence problems. Using simulation, we focused specifically on how misspecification of the temporal correlation structure and strength, with correctly specified variances, affects class enumeration and parameter estimation under LCTA and CPMM. We found that (1) even in the presence of weak correlation, LCTA often does not reproduce the original classes; (2) CPMM performs well in class enumeration when the correct correlation structure is selected; and (3) regardless of misspecification of the correlation structure, both LCTA and CPMM give unbiased estimates of the class trajectory parameters when the within-individual correlation is weak and the number of classes is correctly specified. However, the bias increases markedly when the correlation is moderate for LCTA and when the incorrect correlation structure is used for CPMM. This work highlights the importance of correlation alone in obtaining appropriate model interpretations and provides insight into model choice.
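To make "correlation structure with correct variances" concrete, the following is a minimal sketch of three within-class covariance matrices that share the same diagonal but differ in their off-diagonal pattern. The abstract does not name the structures used in the simulation; AR(1) and exchangeable are assumed here as common illustrative choices, and all function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def ar1_cov(n_times: int, variance: float, rho: float) -> Matrix:
    """AR(1) structure (assumed example): correlation decays as rho**|i - j|."""
    return [[variance * rho ** abs(i - j) for j in range(n_times)]
            for i in range(n_times)]


def exchangeable_cov(n_times: int, variance: float, rho: float) -> Matrix:
    """Exchangeable structure (assumed example): the same rho at every lag."""
    return [[variance if i == j else variance * rho for j in range(n_times)]
            for i in range(n_times)]


def independence_cov(n_times: int, variance: float) -> Matrix:
    """No within-person correlation, which is what LCTA assumes."""
    return exchangeable_cov(n_times, variance, 0.0)


if __name__ == "__main__":
    structures = {
        "independence": independence_cov(4, 2.0),
        "AR(1), rho=0.5": ar1_cov(4, 2.0, 0.5),
        "exchangeable, rho=0.5": exchangeable_cov(4, 2.0, 0.5),
    }
    for name, cov in structures.items():
        print(name)
        for row in cov:
            print("  ", row)
```

All three matrices have identical variances on the diagonal, so fitting one when the data follow another is a misspecification of correlation only, which is the situation the simulation isolates.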
Subjects
Biomedical Research, Software, Humans, Computer Simulation, Latent Class Analysis, Bias
ABSTRACT
Tandem repeats (TRs) are highly polymorphic in the human genome, have thousands of associated molecular traits and are linked to over 60 disease phenotypes. However, they are often excluded from at-scale studies because of challenges with variant calling and representation, as well as a lack of a genome-wide standard. Here, to promote the development of TR methods, we created a catalog of TR regions and explored TR properties across 86 haplotype-resolved long-read human assemblies. We curated variants from the Genome in a Bottle (GIAB) HG002 individual to create a TR dataset to benchmark existing and future TR analysis methods. We also present an improved variant comparison method that handles variants greater than 4 bp in length and varying allelic representation. The 8.1% of the genome covered by the TR catalog holds ~24.9% of variants per individual, including 124,728 small and 17,988 large variants for the GIAB HG002 'truth-set' TR benchmark. We demonstrate the utility of this pipeline across short-read and long-read technologies.
ABSTRACT
Diverse sets of complete human genomes are required to construct a pangenome reference and to understand the extent of complex structural variation. Here, we sequence 65 diverse human genomes and build 130 haplotype-resolved assemblies (130 Mbp median continuity), closing 92% of all previous assembly gaps [1,2] and reaching telomere-to-telomere (T2T) status for 39% of the chromosomes. We highlight complete sequence continuity of complex loci, including the major histocompatibility complex (MHC), SMN1/SMN2, NBPF8, and AMY1/AMY2, and fully resolve 1,852 complex structural variants (SVs). In addition, we completely assemble and validate 1,246 human centromeres. We find up to 30-fold variation in α-satellite high-order repeat (HOR) array length and characterize the pattern of mobile element insertions into α-satellite HOR arrays. While most centromeres predict a single site of kinetochore attachment, epigenetic analysis suggests the presence of two hypomethylated regions for 7% of centromeres. Combining our data with the draft pangenome reference [1] significantly enhances genotyping accuracy from short-read data, enabling whole-genome inference [3] to a median quality value (QV) of 45. Using this approach, 26,115 SVs per sample are detected, substantially increasing the number of SVs now amenable to downstream disease association studies.
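For readers unfamiliar with quality values, the sketch below converts a QV to a per-base error rate. It assumes the conventional Phred-scaled definition, QV = -10 log10(error probability); the abstract does not restate the definition, and the function names are illustrative.

```python
import math


def qv_to_error_rate(qv: float) -> float:
    """Per-base error probability for a Phred-scaled quality value (assumed scale)."""
    return 10 ** (-qv / 10)


def error_rate_to_qv(error_rate: float) -> float:
    """Inverse conversion: Phred-scaled QV for a per-base error probability."""
    return -10 * math.log10(error_rate)


if __name__ == "__main__":
    rate = qv_to_error_rate(45)
    print(f"QV 45 -> error rate {rate:.2e}")          # 3.16e-05
    print(f"about one error per {round(1 / rate)} bases")  # 31623
```

Under that assumption, a median QV of 45 corresponds to roughly one expected error per 32 kbp of inferred sequence.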
ABSTRACT
Roughly 3% of the human genome is composed of variable-number tandem repeats (VNTRs): arrays of motifs of at least six bases. These loci are highly polymorphic, yet current approaches that define and merge variants based on alignment breakpoints do not capture their full diversity. Here we present vamos (VNTR Annotation using efficient Motif Sets), a method that instead annotates VNTRs by their repeat composition under different levels of motif diversity. Applied to 74 haplotype-resolved human assemblies, vamos estimates 7.4-16.7 alleles per locus, compared with 4.0-5.5 alleles per locus for breakpoint-based approaches.
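The idea of annotating a VNTR by repeat composition can be illustrated with a toy example. The sketch below is not the vamos algorithm: it assumes exact motif matches and a motif set given in advance, and the function name `annotate` and the sequences are made up for illustration.

```python
from typing import List, Optional, Sequence, Tuple


def annotate(seq: str, motifs: Sequence[str]) -> Optional[List[int]]:
    """Return indices of motifs that tile seq exactly, or None if no tiling exists."""
    n = len(seq)
    reachable = [False] * (n + 1)   # reachable[i]: seq[:i] can be tiled by motifs
    back: List[Optional[Tuple[int, int]]] = [None] * (n + 1)
    reachable[0] = True
    for i in range(n):
        if not reachable[i]:
            continue
        for k, motif in enumerate(motifs):
            j = i + len(motif)
            if j <= n and not reachable[j] and seq.startswith(motif, i):
                reachable[j] = True
                back[j] = (i, k)    # remember how position j was reached
    if not reachable[n]:
        return None
    composition: List[int] = []
    pos = n
    while pos > 0:                  # walk back from the end to recover the tiling
        prev, k = back[pos]
        composition.append(k)
        pos = prev
    return composition[::-1]


if __name__ == "__main__":
    motifs = ["ACGGTT", "ACGATT"]   # hypothetical six-base motifs
    allele_a = "ACGGTT" + "ACGATT" + "ACGGTT"
    allele_b = "ACGGTT" + "ACGGTT" + "ACGATT"
    print(annotate(allele_a, motifs))  # [0, 1, 0]
    print(annotate(allele_b, motifs))  # [0, 0, 1]
```

The two toy alleles have the same length but different motif compositions, so a composition-based annotation counts them as distinct alleles where a purely length-based comparison would not.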
Subjects
Minisatellite Repeats, Humans
ABSTRACT
Tandem repeats (TRs) are highly polymorphic in the human genome, have thousands of associated molecular traits, and are linked to over 60 disease phenotypes. However, their complexity often excludes them from at-scale studies due to challenges with variant calling, representation, and lack of a genome-wide standard. To promote TR methods development, we create a comprehensive catalog of TR regions and explore its properties across 86 samples. We then curate variants from the GIAB HG002 individual to create a tandem repeat benchmark. We also present a variant comparison method that handles small and large alleles and varying allelic representation. The 8.1% of the genome covered by the TR catalog holds ~24.9% of variants per individual, including 124,728 small and 17,988 large variants for the GIAB HG002 TR benchmark. We work with the GIAB community to demonstrate the utility of this benchmark across short and long read technologies.
ABSTRACT
BACKGROUND: Noninvasive prenatal testing (NIPT) is one of the most commonly employed clinical measures for screening for fetal aneuploidy. Fetal fraction (ff) has been demonstrated to be one of the key factors affecting the performance of NIPT, so accurate quantification of ff plays a vital role in NIPT. METHODS: In this study, we present a new approach, the accurate Quantification of Fetal Fraction with Shallow-Coverage sequencing of maternal plasma DNA (FF-QuantSC), for the estimation of ff in NIPT. The method employs a neural network model and utilizes differential genomic patterns between fetal and maternal genomes to quantify ff. RESULTS: Our results show that the ff predicted by FF-QuantSC correlates highly with that of the Y chromosome-based method in male pregnancies, and that FF-QuantSC achieves the highest accuracy among the ff estimation approaches compared. We also demonstrate that the model generates statistically similar results for male and female pregnancies. CONCLUSION: FF-QuantSC achieves high accuracy in ff quantification and is suitable for application in both male and female pregnancies. Because the method requires no information beyond that collected in routine NIPT, it can be easily incorporated into current NIPT settings without extra cost. We believe that FF-QuantSC will be a valuable addition to NIPT.