Results 1 - 20 of 24,149
1.
Cell ; 184(12): 3256-3266.e13, 2021 06 10.
Article in English | MEDLINE | ID: mdl-34048699

ABSTRACT

Northern East Asia was inhabited by modern humans as early as 40 thousand years ago (ka), as demonstrated by the Tianyuan individual. Using genome-wide data obtained from 25 individuals dated to 33.6-3.4 ka from the Amur region, we show that Tianyuan-related ancestry was widespread in northern East Asia before the Last Glacial Maximum (LGM). At the close of the LGM stadial, the earliest northern East Asian appeared in the Amur region, and this population is basal to ancient northern East Asians. Human populations in the Amur region have maintained genetic continuity from 14 ka, and these early inhabitants represent the closest East Asian source known for Ancient Paleo-Siberians. We also observed that EDAR V370A was likely to have been elevated to high frequency after the LGM, suggesting the possible timing for its selection. This study provides a deep look into the population dynamics of northern East Asia.


Subject(s)
Population Dynamics , DNA, Ancient/analysis , Asia, Eastern , Female , Genetic Variation , Genetics, Population , Genome, Human , Geography , Humans , Ice Cover , Likelihood Functions , Male , Models, Genetic , Phylogeny , Principal Component Analysis , Time Factors
2.
Cell ; 181(5): 997-1003.e9, 2020 05 28.
Article in English | MEDLINE | ID: mdl-32359424

ABSTRACT

Coronavirus disease 2019 (COVID-19) is caused by SARS-CoV-2 infection and was first reported in central China in December 2019. Extensive molecular surveillance in Guangdong, China's most populous province, during early 2020 resulted in 1,388 reported RNA-positive cases from 1.6 million tests. In order to understand the molecular epidemiology and genetic diversity of SARS-CoV-2 in China, we generated 53 genomes from infected individuals in Guangdong using a combination of metagenomic sequencing and tiling amplicon approaches. Combined epidemiological and phylogenetic analyses indicate multiple independent introductions to Guangdong, although phylogenetic clustering is uncertain because of low virus genetic variation early in the pandemic. Our results illustrate how the timing, size, and duration of putative local transmission chains were constrained by national travel restrictions and by the province's large-scale intensive surveillance and intervention measures. Despite these successes, COVID-19 surveillance in Guangdong is still required, because the number of cases imported from other countries has increased.


Subject(s)
Betacoronavirus/genetics , Coronavirus Infections/epidemiology , Pneumonia, Viral/epidemiology , Bayes Theorem , COVID-19 , China/epidemiology , Coronavirus Infections/virology , Epidemiological Monitoring , Humans , Likelihood Functions , Pandemics , Pneumonia, Viral/virology , SARS-CoV-2 , Travel
3.
Cell ; 181(5): 990-996.e5, 2020 05 28.
Article in English | MEDLINE | ID: mdl-32386545

ABSTRACT

The novel coronavirus SARS-CoV-2 was first detected in the Pacific Northwest region of the United States in January 2020, with subsequent COVID-19 outbreaks detected in all 50 states by early March. To uncover the sources of SARS-CoV-2 introductions and patterns of spread within the United States, we sequenced nine viral genomes from early reported COVID-19 patients in Connecticut. Our phylogenetic analysis places the majority of these genomes with viruses sequenced from Washington state. By coupling our genomic data with domestic and international travel patterns, we show that early SARS-CoV-2 transmission in Connecticut was likely driven by domestic introductions. Moreover, the risk of domestic importation to Connecticut exceeded that of international importation by mid-March regardless of our estimated effects of federal travel restrictions. This study provides evidence of widespread sustained transmission of SARS-CoV-2 within the United States and highlights the critical need for local surveillance.


Subject(s)
Betacoronavirus/genetics , Coronavirus Infections/transmission , Pneumonia, Viral/transmission , Travel , Betacoronavirus/isolation & purification , COVID-19 , Connecticut/epidemiology , Coronavirus Infections/epidemiology , Coronavirus Infections/virology , Epidemiological Monitoring , Humans , Likelihood Functions , Pandemics , Phylogeny , Pneumonia, Viral/epidemiology , Pneumonia, Viral/virology , SARS-CoV-2 , Travel/legislation & jurisprudence , United States/epidemiology , Washington/epidemiology
4.
Cell ; 161(3): 450-457, 2015 Apr 23.
Article in English | MEDLINE | ID: mdl-25910205

ABSTRACT

Until only a few years ago, single-particle electron cryo-microscopy (cryo-EM) was usually not the first choice for many structural biologists due to its limited resolution in the range of nanometer to subnanometer. Now, this method rivals X-ray crystallography in terms of resolution and can be used to determine atomic structures of macromolecules that are either refractory to crystallization or difficult to crystallize in specific functional states. In this review, I discuss the recent breakthroughs in both hardware and software that transformed cryo-microscopy, enabling understanding of complex biomolecules and their functions at atomic level.


Subject(s)
Crystallography, X-Ray/instrumentation , Crystallography, X-Ray/methods , Molecular Conformation , Image Processing, Computer-Assisted , Likelihood Functions , Software
5.
Am J Hum Genet ; 111(2): 227-241, 2024 02 01.
Article in English | MEDLINE | ID: mdl-38232729

ABSTRACT

Distinguishing genomic alterations in cancer-associated genes that have functional impact on tumor growth and disease progression from the ones that are passengers and confer no fitness advantage has important clinical implications. Evidence-based methods for nominating drivers are limited by existing knowledge on the oncogenic effects and therapeutic benefits of specific variants from clinical trials or experimental settings. As clinical sequencing becomes a mainstay of patient care, applying computational methods to mine the rapidly growing clinical genomic data holds promise in uncovering functional candidates beyond the existing knowledge base and expanding the patient population that could potentially benefit from genetically targeted therapies. We propose a statistical and computational method (MAGPIE) that builds on a likelihood approach leveraging the mutual exclusivity pattern within an oncogenic pathway for identifying probabilistically both the specific genes within a pathway and the individual mutations within such genes that are truly the drivers. Alterations in a cancer-associated gene are assumed to be a mixture of driver and passenger mutations with the passenger rates modeled in relationship to tumor mutational burden. We use simulations to study the operating characteristics of the method and assess false-positive and false-negative rates in driver nomination. When applied to a large study of primary melanomas, the method accurately identifies the known driver genes within the RTK-RAS pathway and nominates several rare variants as prime candidates for functional validation. A comprehensive evaluation of MAGPIE against existing tools has also been conducted leveraging the Cancer Genome Atlas data.


Subject(s)
Computational Biology , Neoplasms , Humans , Computational Biology/methods , Likelihood Functions , Neoplasms/genetics , Genomics/methods , Mutation/genetics , Algorithms
6.
Am J Hum Genet ; 111(4): 654-667, 2024 04 04.
Article in English | MEDLINE | ID: mdl-38471507

ABSTRACT

Allele-specific methylation (ASM) is an epigenetic modification whereby one parental allele becomes methylated and the other unmethylated at a specific locus. ASM is most often driven by the presence of nearby heterozygous variants that influence methylation, but also occurs somatically in the context of genomic imprinting. In this study, we investigate ASM using publicly available single-cell reduced representation bisulfite sequencing (scRRBS) data on 608 B cells sampled from six healthy B cell samples and 1,230 cells from 11 chronic lymphocytic leukemia (CLL) samples. We developed a likelihood-based criterion to test whether a CpG exhibited ASM, based on the distributions of methylated and unmethylated reads both within and across cells. Applying our likelihood ratio test, 65,998 CpG sites exhibited ASM in healthy B cell samples according to a Bonferroni criterion (p < 8.4 × 10^-9), and 32,862 CpG sites exhibited ASM in CLL samples (p < 8.5 × 10^-9). We also called ASM at the sample level. To evaluate the accuracy of our method, we called heterozygous variants from the scRRBS data, which enabled variant-based calls of ASM within each cell. Comparing sample-level ASM calls to the variant-based measures of ASM, we observed a positive predictive value of 76%-100% across samples. We observed high concordance of ASM across samples and an overrepresentation of ASM in previously reported imprinted genes and genes with imprinting binding motifs. Our study demonstrates that single-cell bisulfite sequencing is a potentially powerful tool to investigate ASM, especially as studies expand to increase the number of samples and cells sequenced.
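
A Bonferroni criterion like the one quoted in this abstract divides a family-wise significance level by the number of tests performed. The sketch below is illustrative only: the abstract does not state the family-wise level or the number of CpG sites tested, so the level of 0.05 is an assumption and the function names are hypothetical.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value cutoff that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def implied_number_of_tests(alpha: float, threshold: float) -> float:
    """Number of tests that would produce the given Bonferroni cutoff."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    return alpha / threshold


# Assumption: a family-wise level of 0.05 (not stated in the abstract).
# Under that assumption, a cutoff of 8.4e-9 corresponds to roughly
# six million tested sites.
n_sites = implied_number_of_tests(0.05, 8.4e-9)
print(f"implied number of tests: {n_sites:.3g}")
```

If the true family-wise level differed from 0.05, the implied number of tests would scale proportionally.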


Subject(s)
DNA Methylation , Leukemia, Lymphocytic, Chronic, B-Cell , Sulfites , Humans , DNA Methylation/genetics , Alleles , Leukemia, Lymphocytic, Chronic, B-Cell/genetics , Likelihood Functions , Genomic Imprinting/genetics , CpG Islands/genetics
7.
Am J Hum Genet ; 110(2): 314-325, 2023 02 02.
Article in English | MEDLINE | ID: mdl-36610401

ABSTRACT

Admixture estimation plays a crucial role in ancestry inference and genome-wide association studies (GWASs). Computer programs such as ADMIXTURE and STRUCTURE are commonly employed to estimate the admixture proportions of sample individuals. However, these programs can be overwhelmed by the computational burdens imposed by the 10^5 to 10^6 samples and millions of markers commonly found in modern biobanks. An attractive strategy is to run these programs on a set of ancestry-informative SNP markers (AIMs) that exhibit substantially different frequencies across populations. Unfortunately, existing methods for identifying AIMs require knowing ancestry labels for a subset of the sample. This supervised learning approach creates a chicken-and-egg scenario. In this paper, we present an unsupervised, scalable framework that seamlessly carries out AIM selection and likelihood-based estimation of admixture proportions. Our simulated and real data examples show that this approach is scalable to modern biobank datasets. OpenADMIXTURE, our Julia implementation of the method, is open source and available for free.
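
Likelihood-based admixture estimation of the kind this abstract refers to typically rests on a binomial model: the genotype of individual i at marker j is treated as two draws with success probability equal to the admixture-weighted allele frequency. The sketch below writes out that standard log-likelihood in Python for illustration. It is not taken from OpenADMIXTURE (which is written in Julia), the function and argument names are hypothetical, and the constant binomial coefficient is omitted because it does not depend on the parameters.

```python
import math
from typing import Sequence


def admixture_loglik(
    genotypes: Sequence[Sequence[int]],  # genotypes[i][j] in {0, 1, 2}
    q: Sequence[Sequence[float]],        # q[i][k]: ancestry proportions of individual i
    f: Sequence[Sequence[float]],        # f[k][j]: allele frequency in population k at marker j
    eps: float = 1e-12,
) -> float:
    """Binomial admixture log-likelihood, up to an additive constant."""
    total = 0.0
    for i, row in enumerate(genotypes):
        for j, g in enumerate(row):
            # Admixture-weighted allele frequency for this individual and marker.
            p = sum(q[i][k] * f[k][j] for k in range(len(f)))
            # Keep p strictly inside (0, 1) so the logarithms are defined.
            p = min(max(p, eps), 1.0 - eps)
            total += g * math.log(p) + (2 - g) * math.log(1.0 - p)
    return total


# One individual fully assigned to population 0, one marker, heterozygous genotype.
print(admixture_loglik([[1]], [[1.0, 0.0]], [[0.5], [0.1]]))
```

Estimation then amounts to maximizing this function over q and f subject to each row of q summing to one; restricting the markers to a small informative subset shrinks the inner loop, which is the motivation for AIM selection.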


Subject(s)
Biological Specimen Banks , Genome-Wide Association Study , Humans , Genome-Wide Association Study/methods , Likelihood Functions , Population Groups , Software , Genetics, Population
8.
Nat Methods ; 20(8): 1232-1236, 2023 08.
Article in English | MEDLINE | ID: mdl-37386188

ABSTRACT

Phylogenetic models of molecular evolution are central to numerous biological applications spanning diverse timescales, from hundreds of millions of years involving orthologous proteins to just tens of days relating to single cells within an organism. A fundamental problem in these applications is estimating model parameters, for which maximum likelihood estimation is typically employed. Unfortunately, maximum likelihood estimation is a computationally expensive task, in some cases prohibitively so. To address this challenge, we here introduce CherryML, a broadly applicable method that achieves several orders of magnitude speedup by using a quantized composite likelihood over cherries in the trees. The massive speedup offered by our method should enable researchers to consider more complex and biologically realistic models than previously possible. Here we demonstrate CherryML's utility by applying it to estimate a general 400 × 400 rate matrix for residue-residue coevolution at contact sites in three-dimensional protein structures; we estimate that using current state-of-the-art methods such as the expectation-maximization algorithm for the same task would take >100,000 times longer.


Subject(s)
Evolution, Molecular , Proteins , Phylogeny , Likelihood Functions , Algorithms , Models, Genetic
9.
Nat Methods ; 20(1): 139-148, 2023 01.
Article in English | MEDLINE | ID: mdl-36522500

ABSTRACT

Quantitative data analysis is important for any single-molecule localization microscopy (SMLM) workflow to extract biological insights from the coordinates of the single fluorophores. However, current approaches are restricted to simple geometries or require identical structures. Here, we present LocMoFit (Localization Model Fit), an open-source framework to fit an arbitrary model to localization coordinates. It extracts meaningful parameters from individual structures and can select the most suitable model. In addition to analyzing complex, heterogeneous and dynamic structures for in situ structural biology, we demonstrate how LocMoFit can assemble multi-protein distribution maps of six nuclear pore components, calculate single-particle averages without any assumption about geometry or symmetry, and perform a time-resolved reconstruction of the highly dynamic endocytic process from static snapshots. We provide extensive simulation and visualization routines to validate the robustness of LocMoFit and tutorials to enable any user to increase the information content they can extract from their SMLM data.


Subject(s)
Fluorescent Dyes , Single Molecule Imaging , Likelihood Functions , Fluorescent Dyes/chemistry
10.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38324624

ABSTRACT

Connections between circular RNAs (circRNAs) and microRNAs (miRNAs) assume a pivotal position in the onset, evolution, diagnosis and treatment of diseases and tumors. Selecting the most promising circRNA-related miRNAs and using them as biological markers or drug targets could help address complex human diseases through preventive strategies, diagnostic procedures and therapeutic approaches. Compared to traditional biological experiments, leveraging computational models to integrate diverse biological data in order to infer potential associations proves to be a more efficient and cost-effective approach. This paper developed a Convolutional Autoencoder model for CircRNA-MiRNA Associations (CA-CMA) prediction. Initially, this model merged the natural language characteristics of the circRNA and miRNA sequence with the features of circRNA-miRNA interactions. Subsequently, it utilized all circRNA-miRNA pairs to construct a molecular association network, which was then fine-tuned by labeled samples to optimize the network parameters. Finally, the prediction outcome was obtained using a deep neural network classifier. This model innovatively combines the likelihood objective that preserves the neighborhood through optimization, to learn the continuous feature representation of words and preserve the spatial information of two-dimensional signals. During the process of 5-fold cross-validation, CA-CMA exhibited exceptional performance compared to numerous prior computational approaches, as evidenced by its mean area under the receiver operating characteristic curve of 0.9138 and a minimal SD of 0.0024. Furthermore, recent literature has confirmed the accuracy of 25 out of the top 30 circRNA-miRNA pairs identified with the highest CA-CMA scores during case studies. The results of these experiments highlight the robustness and versatility of our model.


Subject(s)
MicroRNAs , Neoplasms , Humans , MicroRNAs/genetics , RNA, Circular/genetics , Likelihood Functions , Neural Networks, Computer , Neoplasms/genetics , Computational Biology/methods
11.
Nucleic Acids Res ; 52(17): 10119-10131, 2024 Sep 23.
Article in English | MEDLINE | ID: mdl-39180401

ABSTRACT

The amount of genomic region data continues to increase. Integrating across diverse genomic region sets requires consensus regions, which enable comparing regions across experiments, but also by necessity lose precision in region definitions. We require methods to assess this loss of precision and build optimal consensus region sets. Here, we introduce the concept of flexible intervals and propose three novel methods for building consensus region sets, or universes: a coverage cutoff method, a likelihood method, and a Hidden Markov Model. We then propose three novel measures for evaluating how well a proposed universe fits a collection of region sets: a base-level overlap score, a region boundary distance score, and a likelihood score. We apply our methods and evaluation approaches to several collections of region sets and show how these methods can be used to evaluate fit of universes and build optimal universes. We describe scenarios where the common approach of merging regions to create consensus leads to undesirable outcomes and provide principled alternatives that provide interoperability of interval data while minimizing loss of resolution.


Subject(s)
Genomics , Markov Chains , Genomics/methods , Humans , Algorithms , Likelihood Functions
12.
Nucleic Acids Res ; 52(18): 10862-10878, 2024 Oct 14.
Article in English | MEDLINE | ID: mdl-39268572

ABSTRACT

Bacteria employ CRISPR-Cas systems for defense by integrating invader-derived sequences, termed spacers, into the CRISPR array, which constitutes an immunity memory. While spacer deletions occur randomly across the array, newly acquired spacers are predominantly integrated at the leader end. Consequently, spacer arrays can be used to derive the chronology of spacer insertions. Reconstruction of ancestral spacer acquisitions and deletions could help unravel the coevolution of phages and bacteria, the evolutionary dynamics in microbiomes, or track pathogens. However, standard reconstruction methods produce misleading results by overlooking insertion order and joint deletions of spacers. Here, we present SpacerPlacer, a maximum likelihood-based ancestral reconstruction approach for CRISPR array evolution. We used SpacerPlacer to reconstruct and investigate ancestral deletion events of 4565 CRISPR arrays, revealing that spacer deletions occur 374 times more frequently than mutations and that spacers are regularly deleted jointly, with an average of 2.7 spacers. Surprisingly, we observed a decrease in the spacer deletion frequency towards both ends of the reconstructed arrays. While the resulting trailer-end conservation is commonly observed, a reduced deletion frequency is now also detectable towards the variable leader end. Finally, our results point to the hypothesis that frequent loss of recently acquired spacers may provide a selective advantage.


Subject(s)
CRISPR-Cas Systems , Evolution, Molecular , Sequence Deletion , Bacteria/genetics , Bacteria/virology , Clustered Regularly Interspaced Short Palindromic Repeats , Bacteriophages/genetics , Likelihood Functions , Software
13.
PLoS Genet ; 19(10): e1010999, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37816069

ABSTRACT

Identifying regions of the genome that act as barriers to gene flow between recently diverged taxa has remained challenging given the many evolutionary forces that generate variation in genetic diversity and divergence along the genome, and the stochastic nature of this variation. Progress has been impeded by a conceptual and methodological divide between analyses that infer the demographic history of speciation and genome scans aimed at identifying locally maladaptive alleles, i.e., genomic barriers to gene flow. Here we implement genomewide IM blockwise likelihood estimation (gIMble), a composite likelihood approach for the quantification of barriers, that bridges this divide. This analytic framework captures background selection and selection against barriers in a model of isolation with migration (IM) as heterogeneity in effective population size (N_e) and effective migration rate (m_e), respectively. Variation in both effective demographic parameters is estimated in sliding windows via pre-computed likelihood grids. gIMble includes modules for pre-processing/filtering of genomic data and performing parametric bootstraps using coalescent simulations. To demonstrate the new approach, we analyse data from a well-studied pair of sister species of tropical butterflies with a known history of post-divergence gene flow: Heliconius melpomene and H. cydno. Our analyses uncover both large-effect barrier loci (including well-known wing-pattern genes) and a genome-wide signal of a polygenic barrier architecture.


Subject(s)
Butterflies , Gene Flow , Animals , Likelihood Functions , Genetic Speciation , Butterflies/genetics , Biological Evolution
14.
PLoS Genet ; 19(2): e1010638, 2023 02.
Article in English | MEDLINE | ID: mdl-36809357

ABSTRACT

Mediation analysis is commonly used to identify mechanisms and intermediate factors between causes and outcomes. Studies drawing on polygenic scores (PGSs) can readily employ traditional regression-based procedures to assess whether trait M mediates the relationship between the genetic component of outcome Y and outcome Y itself. However, this approach suffers from attenuation bias, as PGSs capture only a (small) part of the genetic variance of a given trait. To overcome this limitation, we developed MA-GREML: a method for Mediation Analysis using Genome-based Restricted Maximum Likelihood (GREML) estimation. Using MA-GREML to assess mediation between genetic factors and traits comes with two main advantages. First, we circumvent the limited predictive accuracy of PGSs that regression-based mediation approaches suffer from. Second, compared to methods employing summary statistics from genome-wide association studies, the individual-level data approach of GREML allows direct control for confounders of the association between M and Y. In addition to typical GREML parameters (e.g., the genetic correlation), MA-GREML estimates (i) the effect of M on Y, (ii) the direct effect (i.e., the genetic variance of Y that is not mediated by M), and (iii) the indirect effect (i.e., the genetic variance of Y that is mediated by M). MA-GREML also provides standard errors of these estimates and assesses the significance of the indirect effect. We use analytical derivations and simulations to show the validity of our approach under two main assumptions, viz., that M precedes Y and that environmental confounders of the association between M and Y are controlled for. We conclude that MA-GREML is an appropriate tool to assess the mediating role of trait M in the relationship between the genetic component of Y and outcome Y.
Using data from the US Health and Retirement Study, we provide evidence that genetic effects on Body Mass Index (BMI), cognitive functioning and self-reported health in later life run partially through educational attainment. For mental health, we do not find significant evidence for an indirect effect through educational attainment. Further analyses show that the additive genetic factors of these four outcomes do partially (cognition and mental health) and fully (BMI and self-reported health) run through an earlier realization of these traits.


Subject(s)
Genome-Wide Association Study , Genome , Humans , Likelihood Functions , Phenotype , Multifactorial Inheritance
15.
Proc Natl Acad Sci U S A ; 120(44): e2310708120, 2023 Oct 31.
Article in English | MEDLINE | ID: mdl-37871206

ABSTRACT

Analyses of genome sequence data have revealed pervasive interspecific gene flow and enriched our understanding of the role of gene flow in speciation and adaptation. Inference of gene flow using genomic data requires powerful statistical methods. Yet current likelihood-based methods involve heavy computation and are feasible for small datasets only. Here, we implement the multispecies-coalescent-with-migration model in the Bayesian program bpp, which can be used to test for gene flow and estimate migration rates, as well as species divergence times and population sizes. We develop Markov chain Monte Carlo algorithms for efficient sampling from the posterior, enabling the analysis of genome-scale datasets with thousands of loci. Implementation of both introgression and migration models in the same program allows us to test whether gene flow occurred continuously over time or in pulses. Analyses of genomic data from Anopheles mosquitoes demonstrate rich information in typical genomic datasets about the mode and rate of gene flow.


Subject(s)
Algorithms , Gene Flow , Animals , Phylogeny , Computer Simulation , Bayes Theorem , Likelihood Functions , Models, Genetic
16.
PLoS Genet ; 19(9): e1010546, 2023 09.
Article in English | MEDLINE | ID: mdl-37721937

ABSTRACT

Genome-wide association studies (GWAS) are commonly used to identify genomic variants that are associated with complex traits, and estimate the magnitude of this association for each variant. However, it has been widely observed that the association estimates of variants tend to be lower in a replication study than in the study that discovered those associations. A phenomenon known as Winner's Curse is responsible for this upward bias present in association estimates of significant variants in the discovery study. We review existing Winner's Curse correction methods which require only GWAS summary statistics in order to make adjustments. In addition, we propose modifications to improve existing methods and propose a novel approach which uses the parametric bootstrap. We evaluate and compare methods, first using a wide variety of simulated data sets and then, using real data sets for three different traits. The metric, estimated mean squared error (MSE) over significant SNPs, was primarily used for method assessment. Our results indicate that widely used conditional likelihood based methods tend to perform poorly. The other considered methods behave much more similarly, with our proposed bootstrap method demonstrating very competitive performance. To complement this review, we have developed an R package, 'winnerscurse' which can be used to implement these various Winner's Curse adjustment methods to GWAS summary statistics.
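
The upward bias described in this abstract can be reproduced with a small simulation: when only estimates that pass a significance threshold are kept, the surviving estimates overstate the true effect on average. The sketch below illustrates the phenomenon only. It is not one of the correction methods reviewed in the paper and does not use the 'winnerscurse' R package; the effect size, standard error, threshold, and function name are all hypothetical values chosen for illustration.

```python
import random
import statistics


def simulate_winners_curse(
    true_beta: float = 0.02,    # assumed true effect shared by all variants
    se: float = 0.01,           # assumed standard error of each estimate
    z_threshold: float = 3.0,   # assumed significance cutoff on |z|
    n_variants: int = 10_000,
    seed: int = 1,
) -> list[float]:
    """Return the effect estimates that pass the significance cutoff."""
    rng = random.Random(seed)
    selected = []
    for _ in range(n_variants):
        # Discovery-study estimate: truth plus sampling noise.
        beta_hat = rng.gauss(true_beta, se)
        if abs(beta_hat / se) > z_threshold:
            selected.append(beta_hat)
    return selected


selected = simulate_winners_curse()
mean_abs_selected = statistics.mean([abs(b) for b in selected])
print(f"variants passing cutoff: {len(selected)}")
print(f"mean |estimate| among them: {mean_abs_selected:.4f} (true effect 0.02)")
```

Every retained estimate exceeds 0.03 in absolute value by construction, so the average among significant variants is necessarily larger than the true effect of 0.02; a replication study, which does not condition on significance, would recover estimates centered on the truth.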


Subject(s)
Genome-Wide Association Study , Polymorphism, Single Nucleotide , Genome-Wide Association Study/methods , Likelihood Functions , Genetic Association Studies , Bias , Phenotype , Polymorphism, Single Nucleotide/genetics
17.
Proc Natl Acad Sci U S A ; 120(3): e2207595120, 2023 01 17.
Article in English | MEDLINE | ID: mdl-36623178

ABSTRACT

Over the past two decades, multiple countries with high vaccine coverage have experienced resurgent outbreaks of mumps. Worryingly, in these countries, a high proportion of cases have been among those who have completed the recommended vaccination schedule, raising alarm about the effectiveness of existing vaccines. Two putative mechanisms of vaccine failure have been proposed as driving observed trends: 1) gradual waning of vaccine-derived immunity (necessitating additional booster doses) and 2) the introduction of novel viral genotypes capable of evading vaccinal immunity. Focusing on the United States, we conduct statistical likelihood-based hypothesis testing using a mechanistic transmission model on age-structured epidemiological, demographic, and vaccine uptake time series data. We find that the data are most consistent with the waning hypothesis and estimate that 32.8% (32%, 33.5%) of individuals lose vaccine-derived immunity by age 18 y. Furthermore, we show using our transmission model how waning vaccine immunity reproduces qualitatively and quantitatively consistent features of epidemiological data, namely 1) the shift in mumps incidence toward older individuals, 2) the recent recurrence of mumps outbreaks, and 3) the high proportion of mumps cases among previously vaccinated individuals.


Subject(s)
Mumps , Vaccines , Humans , United States/epidemiology , Adolescent , Mumps/epidemiology , Mumps/prevention & control , Likelihood Functions , Mumps virus/genetics , Causality , Disease Outbreaks , Vaccination
18.
PLoS Genet ; 19(3): e1010683, 2023 03.
Article in English | MEDLINE | ID: mdl-36972309

ABSTRACT

Prokaryotic evolution is influenced by the exchange of genetic information between species through a process referred to as recombination. The rate of recombination is a useful measure for the adaptive capacity of a prokaryotic population. We introduce Rhometa (https://github.com/sid-krish/Rhometa), a new software package to determine recombination rates from shotgun sequencing reads of metagenomes. It extends the composite likelihood approach for population recombination rate estimation and enables the analysis of modern short-read datasets. We evaluated Rhometa over a broad range of sequencing depths and complexities, using simulated and real experimental short-read data aligned to external reference genomes. Rhometa offers a comprehensive solution for determining population recombination rates from contemporary metagenomic read datasets. Rhometa extends the capabilities of conventional sequence-based composite likelihood population recombination rate estimators to include modern aligned metagenomic read datasets with diverse sequencing depths, thereby enabling the effective application of these techniques and their high accuracy rates to the field of metagenomics. Using simulated datasets, we show that our method performs well, with its accuracy improving with increasing numbers of genomes. Rhometa was validated on a real S. pneumoniae transformation experiment, where we show that it obtains plausible estimates of the rate of recombination. Finally, the program was also run on ocean surface water metagenomic datasets, through which we demonstrate that the program works on uncultured metagenomic datasets.


Subject(s)
Metagenome , Metagenomics , Metagenomics/methods , Metagenome/genetics , Sequence Analysis, DNA/methods , Likelihood Functions , High-Throughput Nucleotide Sequencing/methods , Software , Recombination, Genetic/genetics , Algorithms
19.
Proc Natl Acad Sci U S A ; 120(20): e2219816120, 2023 05 16.
Article in English | MEDLINE | ID: mdl-37159476

ABSTRACT

Current methods for near real-time estimation of effective reproduction numbers from surveillance data overlook mobility fluxes of infectors and susceptible individuals within a spatially connected network (the metapopulation). Exchanges of infections among different communities may thus be misrepresented unless explicitly measured and accounted for in the renewal equations. Here, we first derive the equations that include spatially explicit effective reproduction numbers, ℛ_k(t), in an arbitrary community k. These equations embed a suitable connection matrix blending mobility among connected communities and mobility-related containment measures. Then, we propose a tool to estimate, in a Bayesian framework involving particle filtering, the values of ℛ_k(t) maximizing a suitable likelihood function reproducing observed patterns of infections in space and time. We validate our tools against synthetic data and apply them to real COVID-19 epidemiological records in a severely affected and carefully monitored Italian region. Differences arising between connected and disconnected reproduction numbers (the latter being calculated with existing methods, to which our formulation reduces by setting mobility to zero) suggest that current standards may be improved in their estimation of disease transmission over time.
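
For orientation, the disconnected (single-community) case mentioned in this abstract is usually built on the classical renewal equation, in which new infections at time t equal the reproduction number times a weighted sum of past incidence. The sketch below shows only the simplest ratio estimator that follows from that equation. It is not the paper's spatially explicit formulation or its Bayesian particle filter, and the generation-interval weights and function name are hypothetical.

```python
from typing import Sequence


def renewal_reproduction_number(
    incidence: Sequence[float],     # incidence[t]: new infections on day t
    gen_interval: Sequence[float],  # gen_interval[s-1]: weight of infections s days ago
    t: int,
) -> float:
    """Ratio estimator R(t) = I(t) / sum_{s>=1} w_s * I(t - s)."""
    if t < len(gen_interval):
        raise ValueError("not enough history before day t")
    # Infection pressure exerted on day t by earlier cases.
    pressure = sum(
        w * incidence[t - s] for s, w in enumerate(gen_interval, start=1)
    )
    if pressure == 0:
        raise ValueError("no past infections; R(t) is undefined")
    return incidence[t] / pressure


# Hypothetical generation-interval weights summing to one.
weights = [0.2, 0.5, 0.3]
print(renewal_reproduction_number([10] * 10, weights, t=5))
```

With constant incidence and weights that sum to one, the estimator returns a value of one, which is the expected behavior for an epidemic that is neither growing nor shrinking.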


Subject(s)
COVID-19 , Humans , Basic Reproduction Number , Incidence , Bayes Theorem , COVID-19/epidemiology , Likelihood Functions
20.
Genet Epidemiol ; 48(6): 241-257, 2024 Sep.
Article in English | MEDLINE | ID: mdl-38606643

ABSTRACT

Recent advancement in genome-wide association studies (GWAS) comes from not only increasingly larger sample sizes but also the shift in focus towards underrepresented populations. Multipopulation GWAS increase power to detect novel risk variants and improve fine-mapping resolution by leveraging evidence and differences in linkage disequilibrium (LD) from diverse populations. Here, we expand upon our previous approach for single-population fine-mapping through Joint Analysis of Marginal SNP Effects (JAM) to a multipopulation analysis (mJAM). Under the assumption that true causal variants are common across studies, we implement a hierarchical model framework that conditions on multiple SNPs while explicitly incorporating the different LD structures across populations. The mJAM framework can be used to first select index variants using the mJAM likelihood with different feature selection approaches. In addition, we present a novel approach leveraging the ideas of mediation to construct credible sets for these index variants. Construction of such credible sets can be performed given any existing index variants. We illustrate the implementation of the mJAM likelihood through two implementations: mJAM-SuSiE (a Bayesian approach) and mJAM-Forward selection. Through simulation studies based on realistic effect sizes and levels of LD, we demonstrated that mJAM performs well for constructing concise credible sets that include the underlying causal variants. In real data examples taken from the most recent multipopulation prostate cancer GWAS, we showed several practical advantages of mJAM over other existing multipopulation methods.


Subject(s)
Bayes Theorem , Genome-Wide Association Study , Linkage Disequilibrium , Polymorphism, Single Nucleotide , Humans , Genome-Wide Association Study/methods , Models, Genetic , Prostatic Neoplasms/genetics , Male , Likelihood Functions , Models, Statistical , Chromosome Mapping/methods , Chromosome Mapping/statistics & numerical data , Computer Simulation