ABSTRACT
Modern sequencing instruments bring unprecedented opportunities to study within-host viral evolution in conjunction with viral transmission between hosts. However, no computational simulators are available to assist the characterization of within-host dynamics. This limits our ability to interpret epidemiological predictions incorporating within-host evolution and to validate computational inference tools. To fill this need, we developed Apollo, a GPU-accelerated, out-of-core tool for within-host simulation of viral evolution and infection dynamics across population, tissue, and cellular levels. Apollo scales to hundreds of millions of viral genomes and can handle complex demographic and population genetic models. Apollo can replicate real within-host viral evolution, accurately recapturing, from initial population-genetic configurations, the viral sequences observed in an HIV cohort. As a practical application, we used Apollo-simulated viral genomes and transmission networks to validate a widely used viral transmission inference tool and to uncover its limitations.
ABSTRACT
Linkage disequilibrium (LD) is a fundamental concept in genetics, critical for studying genetic associations and molecular evolution. However, LD measurements are only reliable for common genetic variants, leaving low-frequency variants unanalyzed. In this work, we introduce cumulative LD (cLD), a stable statistic that captures rare-variant LD between genetic regions and reflects biological interactions between variants in addition to a lack of recombination. We derived the theoretical variance of cLD using the delta method to demonstrate that it is more stable than LD for rare variants. This property is also verified by bootstrapped simulations using real data. In application, we find that cLD reveals an increased genetic association between genes in 3D chromatin interactions, a phenomenon for which a recent analysis based on standard LD between common variants reported a negative result. Additionally, we show that cLD is higher between gene pairs reported in interaction databases, identifies unreported protein-protein interactions, and reveals interacting genes distinguishing case/control samples in association studies.
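For orientation, the standard pairwise LD that the abstract contrasts with cLD can be sketched as below. This is only the textbook D and r² computation on phased 0/1 haplotypes, not the cLD statistic itself (whose definition is not given here); the function name `pairwise_ld` and the input encoding are assumptions made for illustration.

```python
from typing import Sequence, Tuple


def pairwise_ld(hap_a: Sequence[int], hap_b: Sequence[int]) -> Tuple[float, float]:
    """Return (D, r^2) for two biallelic sites.

    Each input lists the allele (0 or 1) carried at one site by the same
    phased haplotypes, in the same order. Hypothetical helper for
    illustration; it implements standard LD, not cLD.
    """
    if len(hap_a) != len(hap_b) or len(hap_a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(hap_a)
    p_a = sum(hap_a) / n                                   # frequency of allele 1 at site A
    p_b = sum(hap_b) / n                                   # frequency of allele 1 at site B
    p_ab = sum(a * b for a, b in zip(hap_a, hap_b)) / n    # frequency of the 1-1 haplotype
    d = p_ab - p_a * p_b                                   # coefficient of disequilibrium
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when a site is monomorphic")
    return d, d * d / denom


if __name__ == "__main__":
    # Two perfectly correlated sites across four haplotypes.
    print(pairwise_ld([1, 1, 0, 0], [1, 1, 0, 0]))
```

When an allele is rare, the denominator above is close to zero and small changes in haplotype counts move r² substantially, which is consistent with the abstract's point that standard LD is unreliable for low-frequency variants.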
Subjects
Genomics, Single Nucleotide Polymorphism, Linkage Disequilibrium, Single Nucleotide Polymorphism/genetics
ABSTRACT
Transcriptome-wide association studies (TWAS) have been successful in identifying putative disease susceptibility genes by integrating gene expression predictions with genome-wide association studies (GWAS) data. However, current TWAS models only consider cis-located variants to predict gene expression. Here, we introduce transTF-TWAS, which includes transcription factor (TF)-linked trans-located variants for model building. Using data from the Genotype-Tissue Expression project, we predicted alternative splicing and gene expression and applied these models to large GWAS datasets for breast, prostate, and lung cancers. Our analysis revealed 887 putative cancer susceptibility genes at Bonferroni-corrected P < 0.05, including 465 in regions not reported by previous GWAS and 137 previously unreported genes in known GWAS loci. We demonstrate that transTF-TWAS surpasses other approaches in both building gene prediction models and identifying disease-associated genes. These results shed new light on several genetically driven key regulators and their associated regulatory networks underlying disease susceptibility.
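The "Bonferroni-corrected P < 0.05" criterion mentioned above can be illustrated with a minimal sketch. The number of tests used in the study is not stated in the abstract, so the values below are made up purely for illustration, and the helper name `bonferroni_adjust` is hypothetical.

```python
from typing import List, Sequence


def bonferroni_adjust(p_values: Sequence[float]) -> List[float]:
    """Bonferroni-adjust p-values: multiply each by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def significant(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Flag tests whose Bonferroni-adjusted p-value falls below alpha."""
    return [p_adj < alpha for p_adj in bonferroni_adjust(p_values)]


if __name__ == "__main__":
    # Illustrative p-values only; these are not results from the study.
    raw = [0.01, 0.5, 0.00001]
    print(bonferroni_adjust(raw))
    print(significant(raw))
```

Comparing adjusted p-values with 0.05 is equivalent to comparing raw p-values with 0.05 divided by the number of tests.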
ABSTRACT
Genetic interactions play critical roles in genotype-phenotype associations. We developed a novel interaction-integrated linear mixed model (ILMM) that integrates a priori knowledge into linear mixed models. ILMM enables statistical integration of genetic interactions upfront and overcomes the problems of searching for combinations. To demonstrate its utility, with 3D genomic interactions (assessed by Hi-C experiments) as a priori knowledge, we applied ILMM to whole-genome sequencing data for Autism Spectrum Disorders (ASD) and to brain transcriptome data, revealing the 3D-genetic basis of ASD and 3D-expression quantitative trait loci (3D-eQTLs) for brain tissues. Notably, we report a potential mechanism involving distal regulation between FOXP2 and DNMT3A that confers risk of ASD.
Subjects
Autism Spectrum Disorder, Autistic Disorder, Humans, Autism Spectrum Disorder/genetics, Autistic Disorder/genetics, Brain, Genetic Predisposition to Disease, Genomics, Whole Genome Sequencing
ABSTRACT
The coronavirus disease 2019 outbreak continues worldwide, with many variants emerging, some of which are considered variants of concern (VOCs). The WHO designated Omicron as a VOC and assigned it to lineage B.1.1.529. Here, we used computational studies to examine the VOCs, including Omicron subvariants, and one variant of interest. We found that the binding affinity between the human receptor angiotensin-converting enzyme 2 (hACE2) and the receptor-binding domains (RBDs) increased in the order wild type (Wuhan strain) < Beta < Alpha < Omicron BA.5 < Gamma < Delta < Omicron BA.2.75 < BA.1 < BA.3 < BA.2. Interactions between docked complexes revealed that RBD residue positions such as 452, 478, 493, 498, 501, and 505 are crucial in creating strong interactions with hACE2. Omicron BA.2 shows the highest binding capacity to the hACE2 receptor among all the mutant complexes. The L452R, F486V, and T478K mutations of BA.5 significantly impact the interaction network at the BA.5 RBD-hACE2 interface. For the first time, we report that His505, an active residue on the RBD, forms a salt bridge in BA.2, leading to increased mutation stability. When the active RBD residues are mutated, binding affinity and intermolecular interactions increase across all mutant complexes. By examining the differences among variants, this study may provide a solid foundation for structure-based drug design for newly emerging variants.
Subjects
COVID-19, Humans, Disease Outbreaks, Protein Binding, SARS-CoV-2/genetics, Coronavirus Spike Glycoprotein/genetics
ABSTRACT
The COVID-19 pandemic has illustrated the importance of infection tracking. The role of asymptomatic, undiagnosed individuals in driving infections within this pandemic has become increasingly evident. Modern phylogenetic tools that take into account asymptomatic or undiagnosed individuals can help guide public health responses. We fine-tuned established phylogenetic pipelines using published SARS-CoV-2 genomic data to estimate plausible transmission networks, including the inference of unsampled infection sources. The system utilised Bayesian phylogenetics and TransPhylo to capture the evolutionary and infection dynamics of SARS-CoV-2. Our analyses gave insight into transmission within a population, including unsampled sources of infection, and the results aligned with epidemiological observations. We were able to observe the effects of preventive measures in Canada's "Atlantic bubble" and in populations such as New York State. The tools also inferred the cross-species transmission of SARS-CoV-2 from humans to lions and tigers in New York City's Bronx Zoo. These phylogenetic tools offer a powerful approach for responding to COVID-19 and other emerging infectious disease outbreaks.
Subjects
COVID-19, Bayes Theorem, Phylogeny
ABSTRACT
DNA sequencing technologies provide unprecedented opportunities to analyze within-host evolution of microorganism populations. Often, within-host populations are analyzed via pooled sequencing of the population, which contains multiple individuals or "haplotypes." However, current next-generation sequencing instruments, in conjunction with single-molecule barcoded linked-reads, cannot distinguish long haplotypes directly. Computational reconstruction of haplotypes from pooled sequencing has been attempted in virology, bacterial genomics, metagenomics, and human genetics, using algorithms based on either cross-host genetic sharing or within-host genomic reads. Here, we describe PoolHapX, a flexible computational approach that integrates information from both genetic sharing and genomic sequencing. We demonstrated that PoolHapX outperforms state-of-the-art tools tailored to specific organismal systems, and is robust to within-host evolution. Importantly, together with barcoded linked-reads, PoolHapX can infer whole-chromosome-scale haplotypes from 50 pools each containing 12 different haplotypes. By analyzing real data, we uncovered dynamic variations in the evolutionary processes of within-patient HIV populations previously unobserved in single position-based analysis.
Subjects
Genetic Techniques, Microbial Genetics/methods, Haplotypes, Software, Algorithms, Biological Evolution, HIV/genetics, Humans, Plasmodium vivax/genetics
ABSTRACT
Keywords: HIV; Canada; molecular phylogenetics; viral evolution; person-to-person transmission inference; transmission network; summary statistics.