ABSTRACT
Malaria parasites are haploid within humans, but infections often contain genetically distinct groups of clonal parasites. When the per-infection number of genetically distinct clones (i.e., the multiplicity of infection, MOI) exceeds one, and per-infection genetic data are generated in bulk, important information is obscured: for example, the MOI, the phases of the haploid genotypes of genetically distinct clones (i.e., how the alleles concatenate into sequences), and their frequencies. This complicates many downstream analyses, including relatedness estimation. MOIs, parasite sequences, their frequencies, and degrees of relatedness are used ubiquitously in malaria studies: for example, to monitor anti-malarial drug resistance and to track changes in transmission. In this article, MrsFreqPhase methods, designed to statistically estimate malaria parasite MOI, relatedness, frequency and phase, are reviewed. An overview, a historical account of the literature, and a statistical description of contemporary software are provided for each method class. The article ends with a look towards future method development, needed to make best use of new data types generated by cutting-edge malaria studies reliant on MrsFreqPhase methods.
Subjects
Malaria, Malaria/parasitology, Humans, Plasmodium/genetics, Plasmodium/classification
ABSTRACT
A reliable pedigree serves as the backbone of genetic evolution in domesticated animals, providing guidance for daily management and breeding strategies. However, in commercial chicken breeding, pedigree errors and omissions are common. The large-scale application of genomic selection provides an opportunity to reconstruct chicken pedigrees using SNP markers. Here, to reconstruct pedigrees in chickens, we detected high-quality SNPs from 2866 parent-offspring pairs and calculated their genomic relationship and identity by descent (IBD). The results showed that the IBD values for parent-offspring pairs ranged from 0.48 to 0.58, clearly distinguishing them from nonparent-offspring pairs and demonstrating robustness in parentage assignment. In contrast, the genomic relatedness coefficients varied from 0.32 to 0.65. The accuracy of pedigree reconstruction significantly improved as the SNP number and minor allele frequency (MAF) increased. When the number of SNPs exceeded 200, better inference power was exhibited with IBD than with genomic relatedness. Upon reaching an effective SNP quantity of 350, despite a MAF of 0.01, the accuracy of the pedigrees inferred reached a remarkable level of 99%. Furthermore, with a doubled SNP quantity of 700 and a MAF of 0.05, the accuracy increased to a perfect 100%. This study demonstrated the feasibility of accurately constructing pedigrees in chickens using low-density SNP markers and emphasized the importance of considering the number and MAFs of these markers to achieve optimal outcomes. The adoption of the IBD as a suitable metric for pedigree inference is promising for improving the efficiency and accuracy of genetic breeding programs. These findings are paramount for the development of cost-effective yet accurate parentage verification systems.
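The parentage-assignment idea in this abstract can be sketched as a simple range check on pairwise IBD. This is a minimal Python sketch, not the study's pipeline: the names `is_parent_offspring` and `assign_parents`, and the use of the reported 0.48 to 0.58 range as a decision window, are assumptions made for illustration.

```python
# Hypothetical decision window, taken from the IBD range the abstract
# reports for true parent-offspring pairs (an assumption, not a published cutoff).
PO_IBD_MIN = 0.48
PO_IBD_MAX = 0.58


def is_parent_offspring(ibd: float) -> bool:
    """Return True if a pairwise IBD value falls inside the assumed window."""
    return PO_IBD_MIN <= ibd <= PO_IBD_MAX


def assign_parents(candidates: dict[str, float]) -> list[str]:
    """Return the candidate IDs whose IBD with the offspring passes the check.

    `candidates` maps a candidate parent ID to its pairwise IBD with the offspring.
    """
    return [name for name, ibd in candidates.items() if is_parent_offspring(ibd)]


# Illustrative values only, not data from the study.
matches = assign_parents({"sire_A": 0.51, "sire_B": 0.12, "dam_C": 0.49})
```

In practice a window like this would need to be calibrated on pairs with known relationships, since the abstract reports that accuracy depends on SNP number and MAF.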
ABSTRACT
Recent positive selection can result in an excess of long identity-by-descent (IBD) haplotype segments overlapping a locus. The statistical methods that we propose here address three major objectives in studying selective sweeps: scanning for regions of interest, identifying possible sweeping alleles, and estimating a selection coefficient s. First, we implement a selection scan to locate regions with excess IBD rates. Second, we estimate the allele frequency and location of an unknown sweeping allele by aggregating over variants that are more abundant in an inferred outgroup with excess IBD rate versus the rest of the sample. Third, we propose an estimator for the selection coefficient and quantify uncertainty using the parametric bootstrap. Comparing against state-of-the-art methods in extensive simulations, we show that our methods are more precise at estimating s when s≥0.015. We also show that our 95% confidence intervals contain s in nearly 95% of our simulations. We apply these methods to study positive selection in European ancestry samples from the Trans-Omics for Precision Medicine project. We analyze eight loci where IBD rates are more than four standard deviations above the genome-wide median, including LCT where the maximum IBD rate is 35 standard deviations above the genome-wide median. Overall, we present robust and accurate approaches to study recent adaptive evolution without knowing the identity of the causal allele or using time series data.
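The scanning step described here (flagging loci whose IBD rate lies several standard deviations above the genome-wide median) can be illustrated with a short Python sketch. The function name `flag_excess_ibd`, the input layout (one IBD rate per genomic window), and the use of the plain sample standard deviation are assumptions; the authors' scan may standardize rates differently.

```python
from statistics import median, stdev


def flag_excess_ibd(rates, n_sd=4.0):
    """Return indices of windows whose IBD rate exceeds median + n_sd * SD.

    `rates` holds one IBD rate per genomic window (hypothetical layout).
    """
    center = median(rates)
    spread = stdev(rates)  # sample standard deviation over all windows
    threshold = center + n_sd * spread
    return [i for i, rate in enumerate(rates) if rate > threshold]


# Toy data: twenty background windows and one window with a strong excess.
rates = [1.0] * 20 + [50.0]
flagged = flag_excess_ibd(rates)
```

Because an extreme window inflates the standard deviation itself, a robust spread estimate may be preferable on real data; the plain version is kept here for clarity.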
ABSTRACT
An array of microhaplotypes (small segments of ≤200 nucleotides containing multiple heterozygous SNPs that exhibit multiple allelic combinations) was identified in the Plasmodium vivax genome by Siegel et al. Interestingly, microhaplotypes have significant potential to distinguish relapse from reinfection and to identify genetic relatedness across vivax-endemic areas. It is essential to validate the universal applicability of microhaplotypes.
ABSTRACT
Genetic genealogy provides crucial insights into the complex biological relationships within contemporary and ancient human populations by analyzing shared alleles and chromosomal segments that are identical by descent to understand kinship, migration patterns, and population dynamics. Within forensic science, forensic investigative genetic genealogy (FIGG) has gained prominence by leveraging next-generation sequencing technologies and population-specific genomic resources, opening new investigative avenues. In this review, we synthesize current knowledge, underscore recent advancements, and discuss the growing role of FIGG in forensic genomics. FIGG has been pivotal in revitalizing dormant inquiries and offering new genetic leads in numerous cold cases. Its effectiveness relies on the extensive single-nucleotide polymorphism profiles contributed by individuals from diverse populations to specialized genomic databases. Advances in computational genomics and the growth of human genomic databases have spurred a profound shift in the application of genetic genealogy across forensics, anthropology, and ancient DNA studies. As the field progresses, FIGG is evolving from a nascent practice into a more sophisticated and specialized discipline, shaping the future of forensic investigations.
ABSTRACT
The coalescent is a stochastic process representing ancestral lineages in a population undergoing neutral genetic drift. Originally defined for a well-mixed population, the coalescent has been adapted in various ways to accommodate spatial, age, and class structure, along with other features of real-world populations. To further extend the range of population structures to which coalescent theory applies, we formulate a coalescent process for a broad class of neutral drift models with arbitrary - but fixed - spatial, age, sex, and class structure, haploid or diploid genetics, and any fixed mating pattern. Here, the coalescent is represented as a random sequence of mappings [Formula: see text] from a finite set G to itself. The set G represents the "sites" (in individuals, in particular locations and/or classes) at which these alleles can live. The state of the coalescent, C_t: G → G, maps each site g∈G to the site containing g's ancestor, t time-steps into the past. Using this representation, we define and analyze coalescence time, coalescence branch length, mutations prior to coalescence, and stationary probabilities of identity-by-descent and identity-by-state. For low mutation, we provide a recipe for computing identity-by-descent and identity-by-state probabilities via the coalescent. Applying our results to a diploid population with arbitrary sex ratio r, we find that measures of genetic dissimilarity, among any set of sites, are scaled by 4r(1-r) relative to the even sex ratio case.
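The closing result, a scaling of 4r(1-r) relative to the even sex ratio case, can be checked numerically. The sketch below assumes that r denotes the fraction of the population belonging to one sex, which the abstract does not spell out; the function name `sex_ratio_scaling` is hypothetical.

```python
def sex_ratio_scaling(r: float) -> float:
    """Scaling factor 4r(1-r) for genetic dissimilarity at sex ratio r.

    Assumes r is the fraction of one sex, strictly between 0 and 1.
    """
    if not 0.0 < r < 1.0:
        raise ValueError("r must lie strictly between 0 and 1")
    return 4.0 * r * (1.0 - r)


even = sex_ratio_scaling(0.5)     # even sex ratio: factor of 1, no rescaling
skewed = sex_ratio_scaling(0.25)  # three-to-one skew: factor of 0.75
```

The factor is symmetric in r and 1-r and reaches its maximum of 1 at r = 0.5, so any departure from an even sex ratio reduces the scaled dissimilarity.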
Subjects
Genetic Drift, Population Genetics, Genetic Models, Mutation, Stochastic Processes, Humans, Diploidy
ABSTRACT
The case report by Mabry et al. (1970) of a family with four children with elevated tissue non-specific alkaline phosphatase, seizures and profound developmental disability became the basis for phenotyping children with the features that became known as Mabry syndrome. Aside from improvements in the services available to patients and families, however, the diagnosis and treatment of this, and many other developmental disabilities, did not change significantly until the advent of massively parallel sequencing. As more patients with features of Mabry syndrome were identified, exome and genome sequencing were used to identify the glycophosphatidylinositol (GPI) biosynthesis disorders (GPIBDs) as a group of congenital disorders of glycosylation (CDG). Biallelic variants of the phosphatidylinositol glycan (PIG) biosynthesis, type V (PIGV) gene identified in Mabry syndrome became evidence of the first in a phenotypic series that is numbered HPMRS1-6 in the order of discovery. HPMRS1 (MIM 239300) is the phenotype resulting from inheritance of biallelic PIGV variants. Similarly, HPMRS2 (MIM 614749), HPMRS5 (MIM 616025) and HPMRS6 (MIM 616809) result from disruption of the PIGO, PIGW and PIGY genes expressed in the endoplasmic reticulum. By contrast, HPMRS3 (MIM 614207) and HPMRS4 (MIM 615716) result from disruption of the post-GPI attachment to proteins genes PGAP2 (HPMRS3) and PGAP3 (HPMRS4). The GPIBDs are currently numbered GPIBD1-21. Working with Dr. Mabry, in 2020, we were able to use improved laboratory diagnostics to complete the molecular diagnosis of patients he had originally described in 1970. We identified biallelic variants of the PGAP2 gene in the first reported HPMRS patients. We discuss the longevity of the Mabry syndrome index patients in the context of the utility of pyridoxine treatment of seizures and evidence for putative glycolipid storage in patients with HPMRS3.
From the perspective of the laboratory innovations made that enabled the identification of the HPMRS phenotype in Dr. Mabry's patients, the need for treatment innovations that will benefit patients and families affected by developmental disabilities is clear.
Subjects
Congenital Disorders of Glycosylation, Developmental Disabilities, Glycosylphosphatidylinositols, Humans, Developmental Disabilities/genetics, Glycosylphosphatidylinositols/genetics, Congenital Disorders of Glycosylation/genetics, Phenotype, Male, Mutation, Female, Membrane Proteins/genetics, Mannosyltransferases
ABSTRACT
Genomic surveillance is crucial for identifying at-risk populations for targeted malaria control and elimination. Identity-by-descent (IBD) is increasingly being used in Plasmodium population genomics to estimate genetic relatedness, effective population size (Ne), population structure, and signals of positive selection. Despite its potential, a thorough evaluation of IBD segment detection tools for species with high recombination rates, such as P. falciparum, remains absent. Here, we perform comprehensive benchmarking of IBD callers - probabilistic (hmmIBD, isoRelate), identity-by-state-based (hap-IBD, phased IBD) and others (Refined IBD) - using population genetic simulations tailored for high recombination, and IBD quality metrics at both the IBD segment level and the IBD-based downstream inference level. Our results demonstrate that low marker density per genetic unit, related to high recombination relative to mutation, significantly compromises the accuracy of detected IBD segments. In genomes with high recombination rates resembling P. falciparum, most IBD callers exhibit high false negative rates for shorter IBD segments, which can be partially mitigated through optimization of IBD caller parameters, especially those related to marker density. Notably, IBD detected with optimized parameters allows for more accurate capture of selection signals and population structure; IBD-based Ne inference is very sensitive to IBD detection errors, with IBD called from hmmIBD uniquely providing less biased estimates of Ne in this context. Validation with empirical data from the MalariaGEN Pf7 database, representing different transmission settings, corroborates these findings. We conclude that context-specific evaluation and parameter optimization are essential for accurate IBD detection in high-recombining species and recommend hmmIBD for quality-sensitive analysis, such as estimation of Ne in these species.
Our optimization and high-level benchmarking methods not only improve IBD segment detection in high-recombining genomes but also enhance overall genomic analysis, paving the way for more accurate genomic surveillance and targeted intervention strategies for malaria.
ABSTRACT
Chromosomal microarrays (CMA) incorporate single nucleotide polymorphisms to enable the detection of regions of homozygosity (ROH). Here, we retrospectively analyzed 6288 prenatal cases that underwent CMA to explore the clinical implications of large ROH in prenatal diagnosis. We analyzed cases with ROH larger than 10 megabases and reviewed the ultrasound findings, karyotype results and pregnancy follow-up data. Cases with possible imprinting disorders were assessed by methylation-specific multiplex ligation-dependent probe amplification. In total, we identified 50 cases with large ROH; chromosomes 1 and 2 were the most frequently affected. About 59.18% of the ROH cases had ultrasound abnormalities, with the most common findings being ultrasound soft-marker abnormalities. Seven fetuses had ROH that covered almost the entire chromosome and four had terminal ROH that involved almost the entire long arm of a chromosome, which indicated uniparental disomy (UPD), of which 70% showed abnormal ultrasound findings. Ten cases with multiple ROH on different chromosomes indicated the third to fifth degree of consanguinity. In this study, we highlighted the clinical relevance of large ROH related to UPD. The analysis of ROH allowed us to gain further understanding of complex cytogenetic and disease mechanisms in prenatal diagnosis.
Subjects
Homozygote, Prenatal Diagnosis, Uniparental Disomy, Humans, Female, Pregnancy, Prenatal Diagnosis/methods, Uniparental Disomy/genetics, Uniparental Disomy/diagnosis, Retrospective Studies, Single Nucleotide Polymorphism/genetics, Genomic Imprinting/genetics, Adult
ABSTRACT
Identity by descent (IBD) segments, uninterrupted DNA segments derived from the same ancestral chromosomes, are widely used as indicators of relationships in genetics. A great deal of research focuses on IBD segments between related pairs, while statistical analyses of segments in unrelated individuals are rare. In this study, we investigated the basic informative features of IBD segments in unrelated pairs in Chinese populations from the 1000 Genomes Project. A total of 5922 IBD segments in Chinese interpopulation unrelated individual pairs were detected via IBIS, and the average IBD segment length was 3.71 Mb. It was found that 17.86% of unrelated pairs shared at least one IBD segment in the Chinese cohort. Furthermore, a total of 49 chromosomal regions where IBD segments clustered in high abundance were identified, which might be sharing hotspots in the human genome. Such regions could also be observed in populations of other ancestries, which implies that similar IBD backgrounds also exist there. Altogether, these results demonstrate the distribution of common background IBD segments, which helps improve the accuracy of pedigree studies based on IBD analysis.
Subjects
Asian People, Human Genome, Humans, Asian People/genetics, Human Genome/genetics, Pedigree, Research Design, China
ABSTRACT
Additive and dominance genetic variances underlying the expression of quantitative traits are important quantities for predicting short-term responses to selection, but they are notoriously challenging to estimate in most non-model wild populations. Specifically, large-sized or panmictic populations may be characterized by low variance in genetic relatedness among individuals which, in turn, can prevent accurate estimation of quantitative genetic parameters. We used estimates of genome-wide identity-by-descent (IBD) sharing from autosomal SNP loci to estimate quantitative genetic parameters for ecologically important traits in nine-spined sticklebacks (Pungitius pungitius) from a large, outbred population. Using empirical and simulated datasets, with varying sample sizes and pedigree complexity, we assessed the performance of different crossing schemes in estimating additive genetic variance and heritability for all traits. We found that low variance in relatedness characteristic of wild outbred populations with high migration rate can impair the estimation of quantitative genetic parameters and bias heritability estimates downwards. On the other hand, the use of a half-sib/full-sib design allowed precise estimation of genetic variance components and revealed significant additive variance and heritability for all measured traits, with negligible dominance contributions. Genome-partitioning and QTL mapping analyses revealed that most traits had a polygenic basis and were controlled by genes at multiple chromosomes. Furthermore, different QTL contributed to variation in the same traits in different populations suggesting heterogeneous underpinnings of parallel evolution at the phenotypic level. Our results provide important guidelines for future studies aimed at estimating adaptive potential in the wild, particularly for those conducted in outbred large-sized populations.
Subjects
Genome, Multifactorial Inheritance, Humans, Genome/genetics, Chromosome Mapping, Phenotype, Genetic Models, Single Nucleotide Polymorphism/genetics
ABSTRACT
Biological relatedness is a key consideration in studies of behavior, population structure, and trait evolution. Except for parent-offspring dyads, pedigrees capture relatedness imperfectly. The number and length of DNA segments that are identical-by-descent (IBD) yield the most precise estimates of relatedness. Here, we leverage novel methods for estimating locus-specific IBD from low coverage whole genome resequencing data to demonstrate the feasibility and value of resolving fine-scaled gradients of relatedness in free-living animals. Using primarily 4-6× coverage data from a rhesus macaque (Macaca mulatta) population with available long-term pedigree data, we show that we can call the number and length of IBD segments across the genome with high accuracy even at 0.5× coverage. The resulting estimates demonstrate substantial variation in genetic relatedness within kin classes, leading to overlapping distributions between kin classes. They identify cryptic genetic relatives that are not represented in the pedigree and reveal elevated recombination rates in females relative to males, which allows us to discriminate maternal and paternal kin using genotype data alone. Our findings represent a breakthrough in the ability to understand the predictors and consequences of genetic relatedness in natural populations, contributing to our understanding of a fundamental component of population structure in the wild.
ABSTRACT
Runs of homozygosity (ROH) and identity-by-descent (IBD) sharing can be studied in diploid coalescent models by noting that ROH and IBD-sharing at a genomic site are predicted to be inversely related to coalescence times, which in turn can be mathematically obtained in terms of parameters describing consanguinity rates. Comparing autosomal and X-chromosomal coalescent models, we consider ROH and IBD-sharing in relation to consanguinity that proceeds via multiple forms of first-cousin mating. We predict that across populations with different levels of consanguinity, (1) in a manner that is qualitatively parallel to the increase of autosomal IBD-sharing with autosomal ROH, X-chromosomal IBD-sharing increases with X-chromosomal ROH, owing to the dependence of both quantities on consanguinity levels; (2) even in the absence of consanguinity, X-chromosomal ROH and IBD-sharing levels exceed corresponding values for the autosomes, owing to the smaller population size and lower coalescence time for the X chromosome than for autosomes; (3) with matrilateral consanguinity, the relative increase in ROH and IBD-sharing on the X chromosome compared to the autosomes is greater than in the absence of consanguinity. Examining genome-wide SNPs in human populations for which consanguinity levels have been estimated, we find that autosomal and X-chromosomal ROH and IBD-sharing levels generally accord with the predictions. We find that each 1% increase in autosomal ROH is associated with an increase of 2.1% in X-chromosomal ROH, and each 1% increase in autosomal IBD-sharing is associated with an increase of 1.6% in X-chromosomal IBD-sharing. For each calculation, particularly for ROH, the estimate is reasonably close to the increase of 2% predicted by the population-size difference between autosomes and X chromosomes. The results support the utility of coalescent models for understanding patterns of genomic sharing and their dependence on sex-biased processes.
Subjects
Genome, Genomics, Humans, Consanguinity, Homozygote, X Chromosome, Single Nucleotide Polymorphism, Inbreeding
ABSTRACT
[This corrects the article DOI: 10.3389/fgene.2022.1028662.].
ABSTRACT
Kinship testing is widely needed in forensic science practice. This paper reviews the definitions of common concepts, and summarizes the basic principles, advantages and disadvantages, and application scope of kinship analysis methods, including the identity by state (IBS) method, the likelihood ratio (LR) method, the method of moments (MoM), and the identity by descent (IBD) segment method. This paper also discusses the research hotspots of challenging kinship testing, complex kinship testing, forensic genetic genealogy analysis, and non-human biological samples.
Subjects
DNA Fingerprinting, Forensic Genetics, Forensic Genetics/methods, Forensic Sciences, Pedigree, Humans
ABSTRACT
OBJECTIVES: To calculate the likelihood ratios of incest cases using identity by descent (IBD) patterns. METHODS: Each unique IBD pattern was formed by denoting the alleles from the members of a pedigree with the same digit. The probability of each IBD pattern was obtained by multiplying the prior probability by the frequency of non-IBD alleles. The pedigree likelihoods of incest cases under different hypotheses were obtained by summing all IBD pattern probabilities, and the likelihood ratio (LR) was calculated by comparing the likelihoods of different pedigrees. RESULTS: The IBD patterns and the formulae for calculating LR for father-daughter incest and brother-sister incest were obtained. CONCLUSIONS: The calculations of LR for incest cases were illustrated based on IBD patterns.
Subjects
Incest, Siblings, Male, Humans, Probability
ABSTRACT
Distant genetic relatives can be linked to a crime scene sample by computing identity-by-state (IBS) and identity-by-descent (IBD) shared by individuals. To test the methods of genetic genealogy estimation and optimize the parameters for forensic investigation, a family-based genetic genealogy analysis was performed using a dataset of 262 Han Chinese individuals from 11 families. The dataset covered relative pairs from the 1st to the 14th degree, but the 7th degree was the most distant kinship to be fully investigated, and each individual had ~200 relatives within the 7th degree. The KING algorithm, which calculates IBS and IBD statistics, correctly discriminated the first-degree relationships of monozygotic twin, parent-offspring and full sibling. The inferred relationships were reliable within the fifth degree, with a false positive rate <1.8%. The IBD segment algorithm, GERMLINE + ERSA, provided reliable inference extending to the eighth degree. Analysis of IBD segments produced false negative estimations (<27.4%) rather than false positives (0%) within the eighth-degree inferences. We studied different minimum IBD segment threshold settings (from >0 to 6 cM); the inferred results did not differ much. In distant relative analysis, genetically undetectable relationships begin to occur from the sixth degree (second cousin once removed), which means that offspring separated by seven meiotic divisions may share no ancestral IBD segment at all. KING and GERMLINE + ERSA worked complementarily to ensure accurate inference from the first degree to the eighth degree. Using simulated low call rate data, the KING algorithm showed better tolerance to marker decrease than the GERMLINE + ERSA segment algorithm.
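To illustrate how a kinship coefficient is turned into a degree of relationship, the sketch below applies the power-of-two cutoffs commonly used with KING output, where the boundary between degree d and degree d+1 sits at 2^-(d+1.5). The abstract does not state which cutoffs the study applied, so both the cutoffs and the function name `infer_degree` are assumptions for illustration.

```python
def infer_degree(kinship: float, max_degree: int = 3):
    """Map a kinship coefficient to a degree of relationship.

    Degree 0 stands for duplicates or monozygotic twins (expected kinship 0.5),
    degree 1 for parent-offspring or full siblings (expected 0.25), and so on.
    Returns None when the coefficient falls below the cutoff for max_degree.
    """
    for degree in range(max_degree + 1):
        lower_bound = 2.0 ** -(degree + 1.5)  # assumed boundary with the next degree
        if kinship > lower_bound:
            return degree
    return None


# Illustrative coefficients at the expected values for each class.
twin = infer_degree(0.5)       # degree 0
sibling = infer_degree(0.25)   # degree 1
uncle = infer_degree(0.125)    # degree 2
distant = infer_degree(0.01)   # below the third-degree cutoff
```

Beyond roughly the third degree the expected coefficients become too close to zero to separate reliably with point cutoffs, which is consistent with the abstract's use of an IBD segment method for more distant relatives.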
Subjects
East Asian People, Forensic Genetics, Single Nucleotide Polymorphism, Humans, Algorithms, Pedigree
ABSTRACT
Malaria genomic surveillance often estimates parasite genetic relatedness using metrics such as Identity-By-Descent (IBD). Yet, strong positive selection stemming from antimalarial drug resistance or other interventions may bias IBD-based estimates. In this study, we utilized simulations, a true IBD inference algorithm, and empirical datasets from different malaria transmission settings to investigate the extent of such bias and explore potential correction strategies. We analyzed whole genome sequence data generated from 640 new and 4,026 publicly available Plasmodium falciparum clinical isolates. Our findings demonstrated that positive selection distorts IBD distributions, leading to underestimated effective population size and blurred population structure. Additionally, we discovered that the removal of IBD peak regions partially restored the accuracy of IBD-based inferences, with this effect contingent on the population's background genetic relatedness. Consequently, we advocate for selection correction for parasite populations undergoing strong, recent positive selection, particularly in high malaria transmission settings.
ABSTRACT
The effective size of a population (Ne) in the recent past can be estimated through analysis of identity-by-descent (IBD) segments. Several methods have been developed for estimating Ne from autosomal IBD segments, but no such effort has been made with X chromosome IBD segments. In this work, we propose a method to estimate the X chromosome effective population size from X chromosome IBD segments. We show how to use the estimated autosome Ne and X chromosome Ne to estimate the female and male effective population sizes. We demonstrate the accuracy of our autosome and X chromosome Ne estimation with simulated data. We find that the estimated female and male effective population sizes generally reflect the simulated sex-specific effective population sizes across the past 100 generations but that short-term differences between the estimated sex-specific Ne across tens of generations may not reliably indicate true sex-specific differences. We analyzed the effective size of populations represented by samples of sequenced UK White British and UK Indian individuals from the UK Biobank.
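The step from autosomal and X-chromosome Ne to sex-specific sizes can be sketched with the classical relations N_A = 4·Nf·Nm/(Nf + Nm) and N_X = 9·Nf·Nm/(4·Nm + 2·Nf). Whether the authors use exactly these relations is an assumption here, and the function name `sex_specific_ne` is hypothetical. Writing p = Nf/(Nf + Nm), the ratio N_X/N_A equals 9/(16 - 8p), which can be solved for p and then for Nf and Nm.

```python
def sex_specific_ne(ne_auto: float, ne_x: float) -> tuple[float, float]:
    """Solve the classical autosome and X relations for (Nf, Nm).

    Assumed relations:
        ne_auto = 4 * Nf * Nm / (Nf + Nm)
        ne_x    = 9 * Nf * Nm / (4 * Nm + 2 * Nf)
    """
    ratio = ne_x / ne_auto
    # The ratio must lie strictly between 9/16 (all male) and 9/8 (all female).
    if not 9.0 / 16.0 < ratio < 9.0 / 8.0:
        raise ValueError("ne_x / ne_auto is outside the range the model allows")
    female_fraction = 2.0 - 9.0 / (8.0 * ratio)  # p = Nf / (Nf + Nm)
    n_female = ne_auto / (4.0 * (1.0 - female_fraction))
    n_male = ne_auto / (4.0 * female_fraction)
    return n_female, n_male


# Equal sexes: Nf = Nm = 500 gives ne_auto = 1000 and ne_x = 750.
n_f, n_m = sex_specific_ne(1000.0, 750.0)
```

Because the admissible range of N_X/N_A is narrow, small estimation errors in either Ne can push the ratio out of range or shift the solution substantially, which fits the abstract's caution about short-term sex-specific differences.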
Subjects
Population Genetics, X Chromosome, Humans, Male, Female, Population Density
ABSTRACT
Several methods exist for detecting genetic relatedness or identity by comparing DNA information. These methods generally require genotype calls, either single-nucleotide polymorphisms or short tandem repeats, at the sites used for comparison. For some DNA samples, like those obtained from bone fragments or single rootless hairs, there is often not enough DNA present to generate genotype calls that are accurate and complete enough for these comparisons. Here, we describe IBDGem, a fast and robust computational procedure for detecting genomic regions of identity-by-descent by comparing low-coverage shotgun sequence data against genotype calls from a known query individual. At less than 1× genome coverage, IBDGem reliably detects segments of relatedness and can make high-confidence identity detections with as little as 0.01× genome coverage.