ABSTRACT
Mechanistic statistical models are commonly used to study the flow of biological processes. For example, in landscape genetics, the aim is to infer spatial mechanisms that govern gene flow in populations. Existing statistical approaches in landscape genetics do not account for temporal dependence in the data and may be computationally prohibitive. We infer mechanisms with a Bayesian hierarchical dyadic model that scales well with large data sets and that accounts for spatial and temporal dependence. We construct a fully connected network comprising spatio-temporal data for the dyadic model and use normalized composite likelihoods to account for the dependence structure in space and time. We develop a dyadic model to account for physical mechanisms commonly found in physical-statistical models and apply our methods to ancient human DNA data to infer the mechanisms that affected human movement in Bronze Age Europe.
Subject(s)
Bayes Theorem , Statistical Models , Spatio-Temporal Analysis , Humans , Europe , Gene Flow , Likelihood Functions , Population Genetics/statistics & numerical data , Human Migration/statistics & numerical data , DNA/genetics
ABSTRACT
Estimation of admixture proportions has become one of the most commonly used computational tools in population genomics. However, there is remarkably little population genetic theory on statistical properties of these variables. We develop theoretical results that can accurately predict means and variances of admixture proportions within a population using models with recombination and genetic drift. Based on established theory on measures of multilocus disequilibrium, we show that there is a set of recurrence relations that can be used to derive expectations for higher moments of the admixture proportions distribution. We obtain closed form solutions for some special cases. Using these results, we develop a method for estimating admixture parameters from estimated admixture proportions obtained from programs such as Structure or Admixture. We apply this method to HapMap 3 data and find that the population history of African Americans, as expected, is not best explained by a single admixture event between people of European and African ancestry. The model of constant gene flow starting at 8 generations and ending at 2 generations before present gives the best fit.
Subject(s)
Gene Flow , Population Genetics , Linkage Disequilibrium , Mathematical Concepts , Genetic Models , White People , Humans , Black or African American/genetics , Genetic Drift , Population Genetics/statistics & numerical data , Genetic Recombination , White People/genetics
ABSTRACT
The embedding problem of Markov matrices in Markov semigroups is a classic problem that regained a lot of impetus and activities through recent needs in phylogeny and population genetics. Here, we give an account for dimensions d ⩽ 4, including a complete and simplified treatment of the case d = 3, and derive the results in a systematic fashion, with an eye on the potential applications. Further, we reconsider the setup of the corresponding problem for time-inhomogeneous Markov chains, which is needed for real-world applications because transition rates need not be constant over time. Additional cases of this more general embedding occur for any d ⩾ 3. We review the known case of d = 3 and describe the setting for future work on d = 4.
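As a small illustration of the embedding problem, the simplest case d = 2 can be worked out in closed form. The sketch below is not taken from the paper; it relies on the classical result that a 2×2 Markov matrix [[1-a, a], [b, 1-b]] is embeddable exactly when its determinant 1 - a - b is positive. The function names `embed_2x2` and `exp_generator_2x2` are hypothetical.

```python
import math


def embed_2x2(a: float, b: float):
    """Return a rate matrix Q with exp(Q) = [[1-a, a], [b, 1-b]], or None.

    Assumes the classical d = 2 criterion: the matrix is embeddable
    if and only if det M = 1 - a - b > 0.
    """
    if not (0.0 <= a <= 1.0 and 0.0 <= b <= 1.0):
        raise ValueError("a and b must be probabilities")
    s = a + b
    if s >= 1.0:
        return None  # determinant is zero or negative: no generator exists
    if s == 0.0:
        return [[0.0, 0.0], [0.0, 0.0]]  # identity matrix, zero generator
    c = -math.log(1.0 - s) / s  # scaling that makes exp(Q) hit M exactly
    return [[-c * a, c * a], [c * b, -c * b]]


def exp_generator_2x2(q):
    """Closed-form matrix exponential of a 2x2 rate matrix."""
    alpha, beta = q[0][1], q[1][0]
    total = alpha + beta
    if total == 0.0:
        return [[1.0, 0.0], [0.0, 1.0]]
    f = (1.0 - math.exp(-total)) / total
    return [[1.0 - f * alpha, f * alpha], [f * beta, 1.0 - f * beta]]
```

Calling `exp_generator_2x2(embed_2x2(0.2, 0.3))` recovers the off-diagonal entries 0.2 and 0.3 up to rounding, while `embed_2x2(0.6, 0.5)` returns `None` because the determinant is negative. The cases d = 3 and d = 4 treated in the paper need considerably more care than this sketch suggests.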
Subject(s)
Markov Chains , Mathematical Concepts , Phylogeny , Population Genetics/statistics & numerical data , Population Genetics/methods , Genetic Models , Humans
ABSTRACT
Fixation index (Fst) statistics provide critical insights into evolutionary processes affecting the structure of genetic variation within and among populations. Fst statistics have been widely applied in population and evolutionary genetics to identify genomic regions targeted by selection pressures. The FSTest 1.3 software was developed to estimate four Fst statistics (Hudson, Weir and Cockerham, Nei, and Wright) using high-throughput genotyping or sequencing data. Here, we introduce FSTest 1.3 and compare its performance with two widely used software packages, VCFtools 0.1.16 and PLINK 2.0. Chromosome 1 of the 1000 Genomes Phase III variant data belonging to South Asian (n = 211) and African (n = 274) populations was included as an example case in this study. Different Fst estimates were calculated for each single-nucleotide polymorphism (SNP) in a pairwise comparison of South Asian against African populations, and the results of FSTest 1.3 were confirmed by VCFtools 0.1.16 and PLINK 2.0. Two different sliding window approaches, one based on a fixed number of SNPs and another based on a fixed number of base pairs (bp), were conducted using FSTest 1.3 and VCFtools 0.1.16. Our results showed that regions with low coverage genotypic data could lead to an overestimation of Fst in sliding window analysis using a fixed number of bp. FSTest 1.3 could mitigate this challenge by estimating the average of consecutive SNPs along the chromosome. FSTest 1.3 allows direct analysis of VCF files with a small amount of code and can calculate Fst estimates on a desktop computer for more than a million SNPs in a few minutes. FSTest 1.3 is freely available at https://github.com/similab/FSTest.
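To make the per-SNP and fixed-SNP-count window calculations concrete, here is a minimal sketch of Hudson's Fst estimator. It is not FSTest's code: it assumes the commonly used bias-corrected form of Hudson's estimator, and it assumes windows are combined as a ratio of summed numerators to summed denominators. The function names and the tuple layout `(p1, p2, n1, n2)` are hypothetical; `n1` and `n2` are numbers of sampled chromosomes.

```python
def hudson_components(p1, p2, n1, n2):
    """Numerator and denominator of Hudson's Fst for one biallelic SNP.

    p1, p2: allele frequencies in the two populations.
    n1, n2: number of sampled chromosomes in each population (must exceed 1).
    """
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    den = p1 * (1 - p2) + p2 * (1 - p1)
    return num, den


def hudson_fst(p1, p2, n1, n2):
    """Per-SNP Hudson Fst; NaN when the site is monomorphic in both groups."""
    num, den = hudson_components(p1, p2, n1, n2)
    return num / den if den > 0 else float("nan")


def windowed_fst(snps, window):
    """Sliding windows over a fixed number of consecutive SNPs.

    snps: list of (p1, p2, n1, n2) tuples ordered along the chromosome.
    Each window is summarised as sum(numerators) / sum(denominators).
    """
    results = []
    for start in range(len(snps) - window + 1):
        parts = [hudson_components(*s) for s in snps[start:start + window]]
        total_num = sum(p[0] for p in parts)
        total_den = sum(p[1] for p in parts)
        results.append(total_num / total_den if total_den > 0 else float("nan"))
    return results
```

Because every window here holds the same number of SNPs, a sparsely genotyped region cannot dominate a window through a handful of sites, which is the intuition behind the fixed-SNP-count approach described above.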
Subject(s)
African People , Human Chromosome Pair 1 , Genetic Variation , Population Genetics , South Asian People , Humans , Asian People/genetics , Biological Evolution , Human Chromosome Pair 1/genetics , Genomics , Genotype , Population Genetics/methods , Population Genetics/statistics & numerical data , South Asian People/genetics , African People/genetics , Genetic Variation/genetics
ABSTRACT
The killer-cell immunoglobulin-like receptor (KIR) complex on chromosome 19 encodes receptors that modulate the activity of natural killer cells, and variation in these genes has been linked to infectious and autoimmune disease, as well as having bearing on pregnancy and transplant outcomes. The medical relevance and high variability of KIR genes make short-read sequencing an attractive technology for interrogating the region, providing a high-throughput, high-fidelity sequencing method that is cost-effective. However, because this gene complex is characterized by extensive nucleotide polymorphism, structural variation including gene fusions and deletions, and a high level of homology between genes, its interrogation at high resolution has been thwarted by bioinformatic challenges, with most studies limited to examining presence or absence of specific genes. Here, we present the PING (Pushing Immunogenetics to the Next Generation) pipeline, which incorporates empirical data, novel alignment strategies and a custom alignment processing workflow to enable high-throughput KIR sequence analysis from short-read data. PING provides KIR gene copy number classification functionality for all KIR genes through use of a comprehensive alignment reference. The gene copy number determined per individual enables an innovative genotype determination workflow using genotype-matched references. Together, these methods address the challenges imposed by the structural complexity and overall homology of the KIR complex. To determine copy number and genotype determination accuracy, we applied PING to European and African validation cohorts and a synthetic dataset. PING demonstrated exceptional copy number determination performance across all datasets and robust genotype determination performance. Finally, an investigation into discordant genotypes for the synthetic dataset provides insight into misaligned reads, advancing our understanding of the interpretation of short-read sequencing data in complex genomic regions. PING promises to support a new era of studies of KIR polymorphism, delivering high-resolution KIR genotypes that are highly accurate, enabling high-quality, high-throughput KIR genotyping for disease and population studies.
Subject(s)
Immunogenetics/statistics & numerical data , KIR Receptors/genetics , Southern Africa , Alleles , Computational Biology , Computer Simulation , Nucleic Acid Databases/statistics & numerical data , Europe , Gene Dosage , Population Genetics/statistics & numerical data , Genotype , High-Throughput Nucleotide Sequencing/statistics & numerical data , Humans , Genetic Polymorphism , KIR Receptors/classification , Sequence Alignment/statistics & numerical data , Software Design
ABSTRACT
Age-related clonal hematopoiesis (ARCH) is characterized by age-associated accumulation of somatic mutations in hematopoietic stem cells (HSCs) or their pluripotent descendants. HSCs harboring driver mutations will be positively selected and cells carrying these mutations will rise in frequency. While ARCH is a known risk factor for blood malignancies, such as Acute Myeloid Leukemia (AML), why some people who harbor ARCH driver mutations do not progress to AML remains unclear. Here, we model the interaction of positive and negative selection in deeply sequenced blood samples from individuals who subsequently progressed to AML, compared to healthy controls, using deep learning and population genetics. Our modeling allows us to discriminate amongst evolutionary classes with high accuracy and captures signatures of purifying selection in most individuals. Purifying selection, acting on benign or mildly damaging passenger mutations, appears to play a critical role in preventing disease-predisposing clones from rising to dominance and is associated with longer disease-free survival. Through exploring a range of evolutionary models, we show how different classes of selection shape clonal dynamics and health outcomes thus enabling us to better identify individuals at a high risk of malignancy.
Subject(s)
Clonal Evolution , Clonal Hematopoiesis/genetics , Hematopoietic Stem Cells/metabolism , Myeloid Leukemia/genetics , Mutation , Acute Disease , Adult , Aged , Deep Learning , Population Genetics/methods , Population Genetics/statistics & numerical data , Hematopoietic Stem Cells/cytology , Humans , Kaplan-Meier Estimate , Myeloid Leukemia/pathology , Middle Aged , Genetic Models , Health Care Outcome Assessment/methods , Health Care Outcome Assessment/statistics & numerical data
ABSTRACT
The migration and admixture history of populations has long been a subject of interest. The West Coast of India harbours rich diversity, with various ethno-linguistic groups, many of which have a well-documented history of migration. The Roman Catholics are one such distinct group, whose origin has been much debated. Some historians and anthropologists relate them to the ancient group of Gaud Saraswat Brahmins, while others regard them as members of the Lost Tribes of the Jews who migrated to India in the first century. Historical records suggest that this community was later forcibly converted to Christianity by the Portuguese in Goa during the sixteenth century. To date, no genetic study has been done on this group to infer their origin and genetic affinity. Hence, we analysed 110 Roman Catholics from three different locations on the West Coast of India (Goa, Kumta and Mangalore) using both uniparental and autosomal markers to understand their genetic history. We found that the Roman Catholics have close affinity with the Indo-European linguistic groups, particularly Brahmins. Additionally, we detected a genetic signal of Jews in the linkage disequilibrium-based admixture analysis, which was absent in other Indo-European populations inhabiting the same geographical regions. Haplotype-based analysis suggests that the Roman Catholics consist of South Asian-specific ancestry and showed high drift. Ancestry-specific historical population size estimation points to a possible bottleneck around the time of the Goan inquisition (fifteenth century). Analysis of the Roman Catholics data along with ancient DNA data from the Neolithic and Bronze Age revealed that the Roman Catholics fit well in a basic model of ancient ancestral composition, typical of most of the Indo-European caste groups of India. Mitochondrial DNA (mtDNA) analysis suggests that most of the Roman Catholics have aboriginal Indian maternal genetic ancestry, while the Y chromosomal DNA analysis indicates a high frequency of the R1a lineage, which is predominant in groups with a higher ancestral North Indian (ANI) component. Therefore, we conclude that the Roman Catholics of the Goa, Kumta and Mangalore regions are the remnants of very early lineages of the Brahmin community of India, having Indo-European genetic affinity along with cryptic Jewish admixture, which needs to be explored further.
Subject(s)
Catholicism , Ethnicity/genetics , Molecular Evolution , Genetic Variation , Population Genetics/statistics & numerical data , Geography , Population Dynamics , Ethnicity/statistics & numerical data , Europe , Humans , India , Jews/genetics , Phylogeny
ABSTRACT
Polygenic Risk Scores (PRS) for Alzheimer's disease (AD) offer unique possibilities for reliable identification of individuals at high and low risk of AD. However, there is little agreement in the field as to what approach should be used for genetic risk score calculations, how to model the effect of APOE, what the optimal p-value threshold (pT) for SNP selection is and how to compare scores between studies and methods. We show that the best prediction accuracy is achieved with a model with two predictors (APOE and PRS excluding the APOE region) with pT<0.1 for SNP selection. Prediction accuracy in a sample across different PRS approaches is similar, but individuals' scores and their associated ranking differ. We show that standardising PRS against the population mean, as opposed to the sample mean, makes the individuals' scores comparable between studies. Our work highlights the best strategies for polygenic profiling when assessing individuals for AD risk.
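The difference between standardising against a population reference and standardising within a sample can be shown with a short sketch. It is illustrative only and does not reproduce the study's pipeline: the score is assumed to be a plain weighted sum of allele dosages, the reference mean and standard deviation are assumed to come from an external population panel, and all function names are hypothetical.

```python
import statistics


def polygenic_score(dosages, weights):
    """Weighted sum of allele dosages (0, 1 or 2) with per-SNP effect sizes."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


def standardise(score, mean, sd):
    """Z-score of a raw score against a given reference mean and SD."""
    if sd <= 0:
        raise ValueError("sd must be positive")
    return (score - mean) / sd


def standardise_within_sample(scores):
    """Z-scores computed from the sample's own mean and SD.

    The result depends on who else is in the sample, so the same raw
    score can map to different z-scores in different studies.
    """
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)
    return [standardise(s, mean, sd) for s in scores]
```

With a fixed reference (for example `standardise(raw, pop_mean, pop_sd)` with values from a population panel), an individual with raw score 1.2 receives the same z-score in every study, whereas `standardise_within_sample` shifts with the composition of each cohort.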
Subject(s)
Alzheimer Disease/genetics , Apolipoproteins E/genetics , Genome-Wide Association Study/methods , Multifactorial Inheritance/genetics , Single Nucleotide Polymorphism , Alleles , Alzheimer Disease/diagnosis , Case-Control Studies , Gene Frequency , Population Genetics/methods , Population Genetics/statistics & numerical data , Genome-Wide Association Study/statistics & numerical data , Genotype , Humans , Reproducibility of Results , Risk Assessment/methods , Risk Assessment/statistics & numerical data , Risk Factors , Sensitivity and Specificity
ABSTRACT
As populations boom and bust, the accumulation of genetic diversity is modulated, encoding histories of living populations in present-day variation. Many methods exist to decode these histories, and all must make strong model assumptions. It is typical to assume that mutations accumulate uniformly across the genome at a constant rate that does not vary between closely related populations. However, recent work shows that mutational processes in human and great ape populations vary across genomic regions and evolve over time. This perturbs the mutation spectrum (relative mutation rates in different local nucleotide contexts). Here, we develop theoretical tools in the framework of Kingman's coalescent to accommodate mutation spectrum dynamics. We present mutation spectrum history inference (mushi), a method to perform nonparametric inference of demographic and mutation spectrum histories from allele frequency data. We use mushi to reconstruct trajectories of effective population size and mutation spectrum divergence between human populations, identify mutation signatures and their dynamics in different human populations, and calibrate the timing of a previously reported mutational pulse in the ancestors of Europeans. We show that mutation spectrum histories can be placed in a well-studied theoretical setting and rigorously inferred from genomic variation data, like other features of evolutionary history.
Subject(s)
Gene Frequency/genetics , Population Genetics/statistics & numerical data , Genetic Models , Mutation/genetics , Animals , Genetic Variation/genetics , Genomics , Hominidae/genetics , Humans , Mutation Rate , Population Density
ABSTRACT
Determining the number of contributors (NOC) accurately in a forensic DNA mixture profile can be challenging. To address this issue, there have been various studies that examined the uncertainty in estimating the NOC in a DNA mixture profile. However, the focus of these studies lies primarily on dominant populations residing within Europe and North America. Thus, there is limited representation of Asian populations in these studies. Further, the effects of allele dropout on NOC estimation have not been explored. As such, this study assesses the uncertainty of NOC in simulated DNA mixture profiles of Chinese, Malay, and Indian populations, which are the predominant ethnic populations in Asia. The Caucasian ethnic population was also included to provide a basis of comparison with other similar studies. Our results showed that without considering allele dropout, the NOC from DNA mixture profiles derived from up to four contributors of the same ethnic population could be estimated with confidence in the Chinese, Malay, Indian and Caucasian populations. The same results can be observed on DNA mixture profiles originating from a combination of differing ethnic populations. The inclusion of an overall 30% allele dropout rate increased the probability (risk) of underestimating the NOC in a DNA mixture profile; even a 3-person DNA mixture profile has a > 99% risk of underestimating the NOC as two or fewer contributors. However, such risks could be mitigated when the highly polymorphic SE33 locus was included in the dataset. Lastly, there was a negligible level of risk in misinterpreting the NOC in a mixture profile as deriving from a single source profile. In summary, our studies showcased novel results representative of the Chinese, Malay, and Indian ethnic populations when examining the uncertainty in NOC estimation in a DNA mixture profile. Our results would be useful in the estimation of NOC in a DNA mixture profile in the Asian context.
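The abstract does not say which counting rule was used, so the following is only a sketch of one widely used rule, the maximum allele count: each diploid contributor carries at most two alleles per locus, so the locus with the most distinct alleles sets a lower bound on the NOC. The function name and the allele values in the usage note are hypothetical.

```python
import math


def min_contributors(profile):
    """Lower bound on the number of contributors by maximum allele count.

    profile: dict mapping a locus name to the alleles observed at that locus.
    Each diploid contributor can add at most two distinct alleles per locus.
    """
    max_alleles = max((len(set(alleles)) for alleles in profile.values()), default=0)
    return math.ceil(max_alleles / 2)
```

For instance, a profile with five distinct alleles at SE33 and two at another locus gives a lower bound of three contributors. If dropout removes one of the five SE33 alleles, the bound falls to two, which illustrates how allele dropout pushes this kind of estimate downward and why a highly polymorphic locus helps.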
Subject(s)
DNA/genetics , Ethnicity/genetics , Population Genetics/statistics & numerical data , Asia/epidemiology , China/epidemiology , DNA Fingerprinting/statistics & numerical data , Europe/epidemiology , Humans , India/epidemiology , Malawi/epidemiology , Microsatellite Repeats/genetics , Theoretical Models , North America/epidemiology , Population Groups/genetics
ABSTRACT
FST and kinship are key parameters often estimated in modern population genetics studies in order to quantitatively characterize structure and relatedness. Kinship matrices have also become a fundamental quantity used in genome-wide association studies and heritability estimation. The most frequently-used estimators of FST and kinship are method-of-moments estimators whose accuracies depend strongly on the existence of simple underlying forms of structure, such as the independent subpopulations model of non-overlapping, independently evolving subpopulations. However, modern data sets have revealed that these simple models of structure likely do not hold in many populations, including humans. In this work, we analyze the behavior of these estimators in the presence of arbitrarily-complex population structures, which results in an improved estimation framework specifically designed for arbitrary population structures. After generalizing the definition of FST to arbitrary population structures and establishing a framework for assessing bias and consistency of genome-wide estimators, we calculate the accuracy of existing FST and kinship estimators under arbitrary population structures, characterizing biases and estimation challenges unobserved under their originally-assumed models of structure. We then present our new approach, which consistently estimates kinship and FST when the minimum kinship value in the dataset is estimated consistently. We illustrate our results using simulated genotypes from an admixture model, constructing a one-dimensional geographic scenario that departs nontrivially from the independent subpopulations model. Our simulations reveal the potential for severe biases in estimates of existing approaches that are overcome by our new framework. This work may significantly improve future analyses that rely on accurate kinship and FST estimates.
Subject(s)
Population Genetics/statistics & numerical data , Genome-Wide Association Study/statistics & numerical data , Inbreeding , Genetic Models , Genotype , Humans , Pedigree , Single Nucleotide Polymorphism/genetics
ABSTRACT
We estimated HLA allele and haplotype frequencies of the Saudi Arabian population from a sample of 45,457 registered stem cell donors. The most frequent HLA alleles were A*02:01g (18.5%), C*06:02g (16.1%), B*51:01g (14.1%), DRB1*07:01g (16.2%), DQB1*02:01g (30.5%), and DPB1*04:01g (33.6%). The most frequent 5-locus haplotypes were A*02:05g~C*06:02g~B*50:01g~DRB1*07:01g~DQB1*02:01g (1.73%), A*02:01g~C*06:02g~B*50:01g~DRB1*07:01g~DQB1*02:01g (1.66%), and A*26:01g~C*07:02g~B*08:01g~DRB1*03:01g~DQB1*02:01g (1.38%). Furthermore, we used the calculated haplotype frequencies to estimate stem cell donor matching probabilities for Saudi Arabian donor and patient populations under various matching requirements. These results are relevant for strategic donor registry planning in the Kingdom of Saudi Arabia.
Subject(s)
Donor Selection/methods , HLA-D Antigens/genetics , Hematopoietic Stem Cell Transplantation/methods , Histocompatibility Antigens Class I/genetics , Alleles , Arabs/genetics , Datasets as Topic , Gene Frequency , Population Genetics/statistics & numerical data , HLA-D Antigens/immunology , Haplotypes , Histocompatibility Antigens Class I/immunology , Histocompatibility Testing , Humans , Registries/statistics & numerical data , Saudi Arabia , Tissue Donors
ABSTRACT
Several studies have shown that the Brazilian Northeast is a region with high rates of inbreeding as well as a high incidence of autosomal recessive diseases. The elaboration of public health policies focused on the epidemiological surveillance of congenital anomalies and rare genetic diseases in this region is urgently needed. However, the vast territory, socio-demographic heterogeneity, economic difficulties and low number of professionals with expertise in medical genetics make strategic planning a challenging task. Surnames can be compared to a genetic system with multiple neutral alleles and allow some approximation of population structure. Here, surname analysis of more than 37 million people was combined with health and socio-demographic indicators covering all 1794 municipalities of the nine states of the region. The data distribution showed a heterogeneous spatial pattern (Global Moran Index, GMI = 0.58; p < 0.001), with higher isonymy rates in the east of the region and the highest rates in the Quilombo dos Palmares region - the largest conglomerate of escaped slaves in Latin America. A positive correlation was found between the isonymy index and the frequency of live births with congenital anomalies (r = 0.268; p < 0.001), and the two indicators were spatially correlated (GMI = 0.50; p < 0.001). With this approach, quantitative information on the genetic structure of the Brazilian Northeast population was obtained, which may represent an economical and useful tool for decision-making in the medical field.
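The analogy between surnames and neutral alleles can be made concrete with a small sketch. The abstract does not state which isonymy index was computed, so this assumes the simple within-population random isonymy, the sum of squared surname frequencies, which mirrors expected homozygosity at a multi-allelic locus. The function name and the surnames in the usage note are illustrative.

```python
from collections import Counter


def random_isonymy(surnames):
    """Probability that two draws (with replacement) share a surname.

    Computed as the sum of squared relative surname frequencies, the
    surname analogue of expected homozygosity for a neutral locus.
    """
    counts = Counter(surnames)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("surnames must not be empty")
    return sum((c / total) ** 2 for c in counts.values())
```

For a municipality with surnames `["Silva", "Silva", "Souza", "Lima"]` the frequencies are 0.5, 0.25 and 0.25, giving an index of 0.375. Higher values indicate fewer, more common surnames, which the study interprets as a proxy for a more closed population structure.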
Subject(s)
Medical Genetics/statistics & numerical data , Population Genetics/statistics & numerical data , Names , Adolescent , Adult , Aged , Brazil , Female , Humans , Male , Middle Aged , Population Dynamics , Young Adult
ABSTRACT
With the advent of next-generation sequencing, large-scale initiatives for mining whole genomes and exomes have been employed to better understand global or population-level genetic architecture. India encompasses more than 17% of the world population with extensive genetic diversity, but is under-represented in the global sequencing datasets. This gave us the impetus to perform and analyze the whole genome sequencing of 1029 healthy Indian individuals under the pilot phase of the 'IndiGen' program. We generated a compendium of 55,898,122 single allelic genetic variants from geographically distinct Indian genomes and calculated the allele frequency, allele count, allele number, along with the number of heterozygous or homozygous individuals. In the present study, these variants were systematically annotated using publicly available population databases and can be accessed through a browsable online database named 'IndiGenomes' (http://clingen.igib.res.in/indigen/). The IndiGenomes database will help clinicians and researchers in exploring the genetic component underlying medical conditions. To date, this is the most comprehensive genetic variant resource for the Indian population and is made freely available for academic utility. The resource has also been accessed extensively by the worldwide community since its launch.
Subject(s)
Genetic Databases , Genetic Variation , Human Genome , Human Genome Project , Software , Adult , Exome , Female , Population Genetics/statistics & numerical data , Humans , India , Internet , Male , Molecular Sequence Annotation , Whole Genome Sequencing
ABSTRACT
We investigated HLA class I (HLA-A, -B, and -C) and class II (HLA-DRB1, -DQB1, -DPA1, and -DPB1) alleles by NGS-based typing among 759 Brazilian individuals from three populations in the city of Rio de Janeiro based on their self-declared skin color (Caucasian, N = 521, AFND-ID: 3730; Parda, N = 170, AFND-ID: 3728; Black, N = 68, AFND-ID: 3727) to calculate allelic and haplotypic frequencies, plus linkage disequilibrium. Only the HLA-DRB1 locus deviated from Hardy-Weinberg equilibrium (in the Caucasian and Black populations). The three populations shared the most frequent allele on HLA-A, -C, -DRB1, -DPA1, and -DPB1. Genotype and frequency data are available in the Allele Frequencies Net Database.
Subject(s)
Histocompatibility Antigens Class II/genetics , Histocompatibility Antigens Class I/genetics , Alleles , Brazil , Gene Frequency , Population Genetics/statistics & numerical data , Haplotypes , High-Throughput Nucleotide Sequencing , Humans , Linkage Disequilibrium , Population Groups/genetics
ABSTRACT
In this study, we report for the first time HLA allele and haplotype frequencies in the modern Panamanian population at a two-field (four digits) resolution level. Reported frequencies were calculated from genotype data for the HLA-A, -B, -C, -DPB1, -DQB1 and -DRB1 loci of 462 healthy unrelated Panamanian adults of Hispanic ethnicity. In addition to providing new insights on the allelic structure of the Panamanian population and its origin, these data are critical for better planning of healthcare strategies in the country and for future research exploring the association with certain chronic and infectious diseases.
Subject(s)
Hispanic or Latino/genetics , Histocompatibility Antigens Class II/genetics , Histocompatibility Antigens Class I/genetics , Adolescent , Adult , Aged , Alleles , Female , Gene Frequency , Population Genetics/statistics & numerical data , Haplotypes , Healthy Volunteers , Humans , Linkage Disequilibrium , Male , Middle Aged , Panama , Young Adult
ABSTRACT
We investigated HLA class I (HLA-A, -B, and -C) and class II (HLA-DRB1, -DQB1, -DPA1, and -DPB1) alleles by NGS-based typing among 478 Brazilian individuals from two populations in the city of Barra Mansa based on their self-declared skin color (Caucasian, N = 405, AFND-ID: 3729; Black, N = 73, AFND-ID: 3731) to calculate allelic and haplotypic frequencies, plus linkage disequilibrium. No locus deviated from Hardy-Weinberg equilibrium. Both populations shared the most frequent allele on HLA-A, -C, -DPA1, and -DPB1. Genotype and frequency data are available in the Allele Frequencies Net Database.
Subject(s)
Histocompatibility Antigens Class II/genetics , Histocompatibility Antigens Class I/genetics , Alleles , Brazil , Gene Frequency , Population Genetics/statistics & numerical data , Haplotypes , High-Throughput Nucleotide Sequencing , Humans , Linkage Disequilibrium , Population Groups/genetics
ABSTRACT
Resources are rarely distributed uniformly within a population. Heterogeneity in the concentration of a drug, the quality of breeding sites, or wealth can all affect evolutionary dynamics. In this study, we represent a collection of properties affecting the fitness at a given location using a color. A green node is rich in resources while a red node is poorer. More colors can represent a broader spectrum of resource qualities. For a population evolving according to the birth-death Moran model, the first question we address is which structures, identified by graph connectivity and graph coloring, are evolutionarily equivalent. We prove that all properly two-colored, undirected, regular graphs are evolutionarily equivalent (where "properly colored" means that no two neighbors have the same color). We then compare the effects of background heterogeneity on properly two-colored graphs to those with alternative schemes in which the colors are permuted. Finally, we discuss dynamic coloring as a model for spatiotemporal resource fluctuations, and we illustrate that random dynamic colorings often diminish the effects of background heterogeneity relative to a proper two-coloring.
Subject(s)
Biological Evolution , Biological Models , Animals , Color , Computational Biology , Computer Graphics , Computer Simulation , Genetic Fitness , Population Genetics/statistics & numerical data , Humans , Mathematical Concepts , Mutation , Population Dynamics/statistics & numerical data , Probability , Spatio-Temporal Analysis
ABSTRACT
BACKGROUND: Paeonia decomposita, endemic to China, has important ornamental, medicinal, and economic value and is regarded as an endangered plant. The genetic diversity and population structure have seldom been described. A conservation management plan is not currently available. RESULTS: In the present study, 16 pairs of simple sequence repeat (SSR) primers were used to evaluate the genetic diversity and population structure. A total of 122 alleles were obtained with a mean of 7.625 alleles per locus. The expected heterozygosity (He) varied from 0.043 to 0.901 (mean 0.492) in 16 primers. Moderate genetic diversity (He = 0.405) among populations was revealed, with Danba identified as the center of genetic diversity. Mantel tests revealed a positive correlation between geographic and genetic distance among populations (r = 0.592, P = 0.0001), demonstrating consistency with the isolation by distance model. Analysis of molecular variance (AMOVA) indicated that the principal molecular variance existed within populations (73.48%) rather than among populations (26.52%). Bayesian structure analysis and principal coordinate analysis (PCoA) supported the classification of the populations into three clusters. CONCLUSIONS: This is the first study of the genetic diversity and population structure of P. decomposita using SSR. Three management units were proposed as conservation measures. The results will be beneficial for the conservation and exploitation of the species, providing a theoretical basis for further research of its evolution and phylogeography.
Subject(s)
Conservation of Natural Resources , Plant DNA/genetics , Endangered Species/statistics & numerical data , Genetic Variation , Population Genetics/statistics & numerical data , Paeonia/genetics , Alleles , China , Loss of Heterozygosity , Microsatellite Repeats , Phylogeny , Phylogeography
ABSTRACT
Genetic surveillance of malaria parasites supports malaria control programmes, treatment guidelines and elimination strategies. Surveillance studies often pose questions about malaria parasite ancestry (e.g. how antimalarial resistance has spread) and employ statistical methods that characterise parasite population structure. Many of the methods used to characterise structure are unsupervised machine learning algorithms which depend on a genetic distance matrix, notably principal coordinates analysis (PCoA) and hierarchical agglomerative clustering (HAC). PCoA and HAC are sensitive to both the definition of genetic distance and algorithmic specification. Importantly, neither algorithm infers malaria parasite ancestry. As such, PCoA and HAC can inform (e.g. via exploratory data visualisation and hypothesis generation), but not answer comprehensively, key questions about malaria parasite ancestry. We illustrate the sensitivity of PCoA and HAC using 393 Plasmodium falciparum whole genome sequences collected from Cambodia and neighbouring regions (where antimalarial resistance has emerged and spread recently) and we provide tentative guidance for the use and interpretation of PCoA and HAC in malaria parasite genetic epidemiology. This guidance includes a call for fully transparent and reproducible analysis pipelines that feature (i) a clearly outlined scientific question; (ii) a clear justification of analytical methods used to answer the scientific question along with discussion of any inferential limitations; (iii) publicly available genetic distance matrices when downstream analyses depend on them; and (iv) sensitivity analyses. To bridge the inferential disconnect between the output of non-inferential unsupervised learning algorithms and the scientific questions of interest, tailor-made statistical models are needed to infer malaria parasite ancestry. In the absence of such models speculative reasoning should feature only as discussion but not as results.