ABSTRACT
A reliable pedigree serves as the backbone of genetic evolution in domesticated animals, providing guidance for daily management and breeding strategies. However, in commercial chicken breeding, pedigree errors and omissions are common. The large-scale application of genomic selection provides an opportunity to reconstruct chicken pedigrees using SNP markers. Here, to reconstruct pedigrees in chickens, we detected high-quality SNPs from 2866 parent-offspring pairs and calculated their genomic relationship and identity by descent (IBD). The results showed that the IBD values for parent-offspring pairs ranged from 0.48 to 0.58, clearly distinguishing them from nonparent-offspring pairs and demonstrating robustness in parentage assignment. In contrast, the genomic relatedness coefficients varied from 0.32 to 0.65. The accuracy of pedigree reconstruction significantly improved as the SNP number and minor allele frequency (MAF) increased. When the number of SNPs exceeded 200, better inference power was exhibited with IBD than with genomic relatedness. Upon reaching an effective SNP quantity of 350, despite a MAF of 0.01, the accuracy of the pedigrees inferred reached a remarkable level of 99%. Furthermore, with a doubled SNP quantity of 700 and a MAF of 0.05, the accuracy increased to a perfect 100%. This study demonstrated the feasibility of accurately constructing pedigrees in chickens using low-density SNP markers and emphasized the importance of considering the number and MAFs of these markers to achieve optimal outcomes. The adoption of the IBD as a suitable metric for pedigree inference is promising for improving the efficiency and accuracy of genetic breeding programs. These findings are paramount for the development of cost-effective yet accurate parentage verification systems.
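As a rough illustration of threshold-based parentage screening, the Python sketch below flags candidate parents whose pairwise IBD with an offspring falls inside the 0.48 to 0.58 window reported above. The function name, the sample IDs, and the IBD values are hypothetical, and the way the study computed or thresholded IBD is not stated here. A real pipeline would also need age or generation data to orient each pair.

```python
from typing import Dict, List, Tuple

# Assumed decision window, taken from the parent-offspring IBD range in the abstract.
PO_IBD_MIN = 0.48
PO_IBD_MAX = 0.58


def candidate_parents(ibd: Dict[Tuple[str, str], float], offspring: str) -> List[str]:
    """Return individuals whose IBD with `offspring` lies inside the window."""
    hits = []
    for (a, b), value in ibd.items():
        if offspring not in (a, b):
            continue  # pair does not involve the focal offspring
        if PO_IBD_MIN <= value <= PO_IBD_MAX:
            hits.append(b if a == offspring else a)
    return sorted(hits)


# Hypothetical pairwise IBD estimates (not data from the study).
ibd_values = {
    ("chick1", "hen7"): 0.51,
    ("chick1", "rooster3"): 0.49,
    ("chick1", "hen9"): 0.27,
    ("chick2", "hen7"): 0.12,
}

print(candidate_parents(ibd_values, "chick1"))
```

Pairs outside the window are simply left unassigned, which mirrors the idea that parent-offspring pairs separate cleanly from other pairs on this metric.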
ABSTRACT
One of the limitations of implementing animal breeding programs in small-scale or extensive production systems is the lack of production and genealogical records. In this context, molecular markers could supply information for the breeding program. This study addresses the inclusion of molecular data in traditional genetic evaluation models, either as a random effect through molecular pedigree reconstruction or as a fixed effect through Bayesian clustering. The methods were tested for lactation curve traits in 14 dairy goat herds with incomplete phenotypic data and pedigree information. With MOLCOAN, the number of relationships increased by 37.3% relative to the original pedigree, and the animals clustered into five genetic groups. These data were used to estimate additive variance, error variance, and heritability with four different models, including pedigree and molecular information. Deviance Information Criterion (DIC) values indicated a better fit for the models that include molecular information, either as fixed (genetic clusters) or as random (molecular matrix) effects. Molecular information from simple markers can complement genetic improvement strategies in populations with little information.
Subjects
Goats, Lactation, Female, Animals, Pedigree, Bayes Theorem, Lactation/genetics, Phenotype, Goats/genetics, Genetic Models, Milk
ABSTRACT
The conservation and management of wildlife populations, particularly for threatened and endangered species, are greatly aided by abundance, growth rate, and density measures. Traditional methods of estimating abundance and related metrics represent trade-offs in effort and precision of estimates. Pedigree reconstruction is an emerging, attractive alternative approach because it uses one-time, noninvasive sampling of individuals to infer the existence of unsampled individuals. However, advances in pedigree reconstruction could improve its utility, including forming a measure of precision for the method, establishing the spatial sampling effort required for accurate estimates, ascertaining the spatial extent of abundance estimates derived from pedigree reconstruction, and assessing how population density affects the estimator's performance. Using established relationships for a stochastic, spatially explicit simulated moose (Alces americanus) population, pedigree reconstruction provided accurate estimates of the adult moose population size and trend. Novel bootstrapped confidence intervals performed as expected with intensive sampling but underperformed with moderate sampling efforts that could produce abundance estimates with low bias. Adult population estimates more closely reflected the total number of adults in the extant population, rather than the number of adults inhabiting the area where sampling occurred. Increasing sampling effort, measured as the proportion of individuals sampled and as the proportion of a hypothetical study area, yielded similar asymptotic patterns over time. Simulations indicated a positive relationship between animal density and the sampling effort required for unbiased estimates. These results indicate that pedigree reconstruction can produce accurate abundance estimates and may be particularly valuable for surveying smaller areas and low-density populations.
ABSTRACT
Robust estimates of demographic parameters are critical for effective wildlife conservation and management but are difficult to obtain for elusive species. We estimated the breeding and adult population sizes, as well as the minimum population size, in a high-density brown bear population on the Shiretoko Peninsula, in Hokkaido, Japan, using DNA-based pedigree reconstruction. A total of 1288 individuals, collected in and around the Shiretoko Peninsula between 1998 and 2020, were genotyped at 21 microsatellite loci. Among them, 499 individuals were identified by intensive genetic sampling conducted in two consecutive years (2019 and 2020) mainly by noninvasive methods (e.g., hair and fecal DNA). Among them, both parents were assigned for 330 bears, and either maternity or paternity was assigned to 47 and 76 individuals, respectively. The subsequent pedigree reconstruction indicated a range of breeding and adult (≥4 years old) population sizes: 128-173 for female breeders and 66-91 male breeders, and 155-200 for female adults and 84-109 male adults. The minimum population size was estimated to be 449 (252 females and 197 males) in 2019. Long-term continuous genetic sampling prior to a short-term intensive survey would enable parentage to be identified in a population with a high probability, thus enabling reliable estimates of breeding population size for elusive species.
ABSTRACT
Western redcedar (WRC) is an ecologically and economically important forest tree species characterized by low genetic diversity with high self-compatibility and high heartwood durability. Using sequence capture genotyping of target genic and non-genic regions, we genotyped 44 parent trees and 1520 offspring trees representing 26 polycross (PX) families collected from three progeny test sites using 45,378 SNPs. Trees were phenotyped for eight traits related to growth, heartwood and foliar chemistry associated with wood durability and deer browse resistance. We used the genomic realized relationship matrix for paternity assignment, maternal pedigree correction, and to estimate genetic parameters. We compared genomics-based (GBLUP) and two pedigree-based (ABLUP: polycross and reconstructed full-sib [FS] pedigrees) models. Models were extended to estimate dominance genetic effects. Pedigree reconstruction revealed significant unequal male contribution and separated the 26 PX families into 438 FS families. Traditional maternal PX pedigree analysis resulted in up to 51% overestimation in genetic gain and 44% in diversity. Genomic analysis resulted in up to 22% improvement in offspring breeding value (BV) theoretical accuracy, 35% increase in expected genetic gain for forward selection, and doubled selection intensity for backward selection. Overall, all traits showed low to moderate heritability (0.09-0.28), moderate genotype by environment interaction (type-B genetic correlation: 0.51-0.80), low to high expected genetic gain (6.01%-55%), and no significant negative genetic correlation reflecting no large trade-offs for multi-trait selection. Only three traits showed a significant dominance effect. GBLUP resulted in smaller but more accurate heritability estimates for five traits, but larger estimates for the wood traits. Comparison between all, genic-coding, genic-non-coding and intergenic SNPs showed little difference in genetic estimates. 
In summary, we show that GBLUP overcomes the PX limitations, successfully captures expected historical and hidden relatedness as well as linkage disequilibrium (LD), and results in increased breeding efficiency in WRC.
ABSTRACT
Genealogical relationships are fundamental components of genetic studies. However, it is often challenging to infer correct and complete pedigrees even when genome-wide information is available. For example, inbreeding can obscure genetic differences between individuals, making it difficult to even distinguish first-degree relatives such as parent-offspring from full siblings. Similarly, genotyping errors can interfere with the detection of genetic similarity between parents and their offspring. Inbreeding is common in natural, domesticated, and experimental populations and genotyping of these populations often has more errors than in human data sets, so efficient methods for building pedigrees under these conditions are necessary. Here, we present a new method for parent-offspring inference in inbred pedigrees called specific parent-offspring relationship estimation (spore). spore is vastly superior to existing pedigree-inference methods at detecting parent-offspring relationships, in particular when inbreeding is high or in the presence of genotyping errors, or both. spore therefore fills an important void in the arsenal of pedigree inference tools.
Subjects
Inbreeding, Genetic Models, Genome, Humans, Pedigree
ABSTRACT
Along with rapid advances in high-throughput-sequencing technology, the development and application of molecular markers has been critical for the progress that has been made in crop breeding and genetic research. Desirable molecular markers should be able to rapidly genotype tens of thousands of breeding accessions with tens to hundreds of markers. In this study, we developed a multiplex molecular marker, the haplotype-tag polymorphism (HTP), that integrates Maize6H-60K array data from 3,587 maize inbred lines with 6,375 blocks from the recombination block map. After applying strict filtering criteria, we obtained 6,163 highly polymorphic HTPs, which were evenly distributed in the genome. Furthermore, we developed a genome-wide HTP analysis toolkit, HTPtools, which we used to establish an HTP database (HTPdb) covering the whole genomes of 3,587 maize inbred lines commonly used in breeding. A total of 172,921 non-redundant HTP allelic variations were obtained. Three major HTPtools modules combine seven algorithms (e.g., chain Bayes probability and the heterotic-pattern prediction algorithm) and a new plotting engine named "BCplot" that enables rapid visualization of the background information of multiple backcross groups. HTPtools was designed for big-data analyses such as complex pedigree reconstruction and maize heterotic-pattern prediction. The HTP-based analytical strategy and the toolkit developed in this study are applicable for high-throughput genotyping and for genetic mapping, germplasm resource analyses, and genomics-informed breeding in maize.
Subjects
Single Nucleotide Polymorphism, Zea mays, Bayes Theorem, Genomics, Haplotypes, Plant Breeding, Zea mays/genetics
ABSTRACT
Applications of genetic-based estimates of population size are expanding, especially for species for which traditional demographic estimation methods are intractable due to the rarity of adult encounters. Estimates of breeding population size ($N_S$) are particularly amenable to genetic-based approaches as the parameter can be estimated using pedigrees reconstructed from genetic data gathered from discrete juvenile cohorts, therefore eliminating the need to sample adults in the population. However, a critical evaluation of how genotyping and sampling effort influence bias in pedigree reconstruction, and how these biases subsequently influence estimates of $N_S$, is needed to evaluate the efficacy of the approach under a range of scenarios. We simulated a model system to understand the interactive effects of genotyping and sampling effort on error in genetic pedigrees reconstructed from the program COLONY. We then evaluated how errors in pedigree reconstruction influenced bias and precision in estimates of $N_S$ using three different rarefaction estimators. Results indicated that pedigree error can be minimal when adequate genetic data are available, such as when juvenile sample sizes are large and/or individuals are genotyped at many informative loci. However, even in cases for which data are limited, using results of the simulation analysis to understand the magnitude and sources of bias in reconstructed pedigrees can still be informative when estimating $N_S$. We applied results of the simulation analysis to evaluate $\hat{N}_S$ for a population of federally endangered Atlantic sturgeon (Acipenser oxyrinchus oxyrinchus) in the Delaware River, USA. Our results indicated that $N_S$ is likely to be three orders of magnitude lower compared with historic breeding population sizes, which is a considerable advancement in our understanding of the population status of Atlantic sturgeon in the Delaware River.
Our analyses are broadly applicable in the design and interpretation of studies seeking to estimate $N_S$ and can help to guide conservation decisions when ecological uncertainty is high. The utility of these results is expected to grow as rapid advances in genetic technologies increase the popularity of genetic population monitoring and estimation.
Subjects
Breeding, Population Genetics, Animals, Bias, Fishes/genetics, Humans, Pedigree, Population Density
ABSTRACT
BACKGROUND: Larix kaempferi is one of the major timber species in Northeast Asia. Demand for the reforestation of the species is rising in South Korea due to an increase in large timber production and utilization. However, progeny trials for the species have not been explored, making it challenging to foster advanced generations of tree improvement. In the present study, genetic testing and selection for diameter growth were conducted using pedigree reconstruction and phenotypic spatial distribution analysis in a plantation of L. kaempferi. The aim of the present study was to select superior larch individuals using pedigree reconstruction and phenotypic spatial distribution as a substitute for progeny trials. The plantation of seed orchard crops was established in 1990, and one hundred and eighty-eight trees were selected as the study material. Genetic variation was investigated first to validate its adequacy as breeding material. Genetic testing was carried out using a model considering pedigree information and spatial autoregression of the phenotypes. RESULTS: The expected heterozygosity of the mother trees and offspring was 0.672 and 0.681, respectively, indicating a comparable level of genetic variation between the two groups. The pedigree reconstruction using maternity analysis assigned one to six progenies to ninety-two candidate mothers. The accuracy of genetic testing increased greatly with the animal model incorporating an AR1 ⊗ AR1 structure compared to the animal model alone. The estimated genetic variance of the former was 9.086 whereas that of the latter was 4.9E-5 for DBH. The predicted breeding values of the offspring for DBH ranged from -5.937 cm to 5.655 cm and the estimated heritability of diameter growth was 0.344. CONCLUSIONS: The genetic testing approach based on pedigree reconstruction and phenotypic spatial distribution analysis was considered a useful analytical scheme that could replace or supplement progeny trials.
Subjects
Larix, Genetic Testing, Larix/classification, Larix/genetics, Phenotype, Plant Breeding, Spatial Analysis
ABSTRACT
Despite decades of methods development for classifying relatives in genetic studies, pairwise relatedness methods' recalls are above 90% only for first through third-degree relatives. The top-performing approaches, which leverage identity-by-descent segments, often use only kinship coefficients, while others, including estimation of recent shared ancestry (ERSA), use the number of segments relatives share. To quantify the potential for using segment numbers in relatedness inference, we leveraged information theory measures to analyze exact (i.e. produced by a simulator) identity-by-descent segments from simulated relatives. Over a range of settings, we found that the mutual information between the relatives' degree of relatedness and a tuple of their kinship coefficient and segment number is on average 4.6% larger than between the degree and the kinship coefficient alone. We further evaluated identity-by-descent segment number utility by building a Bayes classifier to predict first through sixth-degree relationships using different feature sets. When trained and tested with exact segments, the inclusion of segment numbers improves the recall by between 0.28% and 3% for second through sixth-degree relatives. However, the recalls improve by less than 1.8% per degree when using inferred segments, suggesting limitations due to identity-by-descent detection accuracy. Last, we compared our Bayes classifier that includes segment numbers with both ERSA and IBIS and found comparable recalls, with the Bayes classifier and ERSA slightly outperforming each other across different degrees. Overall, this study shows that identity-by-descent segment numbers can improve relatedness inference, but errors from current SNP array-based detection methods yield dampened signals in practice.
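To make the information-theoretic comparison concrete, here is a minimal Python sketch of a plug-in mutual information estimate on discretized features. It compares I(degree; kinship) with I(degree; (kinship, segment count)) on a tiny invented dataset. The binning labels and the toy values are assumptions for illustration only and do not reproduce the simulations or estimators used in the study.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def mutual_information(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """Plug-in estimate of I(X; Y) in bits from paired discrete samples."""
    n = len(xs)
    px = Counter(xs)
    py = Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), count in pxy.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (count / n) * math.log2(count * n / (px[x] * py[y]))
    return mi


# Hypothetical relatives: true degree, binned kinship, binned IBD segment count.
degrees = [2, 2, 3, 3]
kinship_bin = ["hi", "lo", "lo", "lo"]
segment_bin = ["many", "many", "few", "few"]

mi_kinship = mutual_information(degrees, kinship_bin)
mi_joint = mutual_information(degrees, list(zip(kinship_bin, segment_bin)))

print(f"I(degree; kinship)            = {mi_kinship:.3f} bits")
print(f"I(degree; kinship, segments)  = {mi_joint:.3f} bits")
```

In this toy case the segment-count bin resolves the pairs that kinship alone cannot separate, so the joint feature carries more information about the degree.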
Subjects
Human Genome, Information Theory, Bayes Theorem, Humans, Pedigree, Single Nucleotide Polymorphism
ABSTRACT
Pedigree inference from genotype data is a challenging problem, particularly when pedigrees are sparsely sampled and individuals may be distantly related to their closest genotyped relatives. We present a method that infers small pedigrees of close relatives and then assembles them into larger pedigrees. To assemble large pedigrees, we introduce several formulas and tools including a likelihood for the degree separating two small pedigrees, a generalization of the fast DRUID point estimate of the degree separating two pedigrees, a method for detecting individuals who share background identity-by-descent (IBD) that does not reflect recent common ancestry, and a method for identifying the ancestral branches through which distant relatives are connected. Our method also takes several approaches that help to improve the accuracy and efficiency of pedigree inference. In particular, we incorporate age information directly into the likelihood rather than using ages only for consistency checks and we employ a heuristic branch-and-bound-like approach to more efficiently explore the space of possible pedigrees. Together, these approaches make it possible to construct large pedigrees that are challenging or intractable for current inference methods.
Subjects
Genotype, Pedigree, Algorithms, Female, Humans, Likelihood Functions, Male, Genetic Models
ABSTRACT
In social species, reproductive success and rates of dispersal vary among individuals resulting in spatially structured populations. Network analyses of familial relationships may provide insights on how these parameters influence population-level demographic patterns. These methods, however, have rarely been applied to genetically derived pedigree data from wild populations. Here, we use parent-offspring relationships to construct familial networks from polygamous boreal woodland caribou (Rangifer tarandus caribou) in Saskatchewan, Canada, to inform recovery efforts. We collected samples from 933 individuals at 15 variable microsatellite loci along with caribou-specific primers for sex identification. Using network measures, we assess the contribution of individual caribou to the population with several centrality measures and then determine which measures are best suited to inform on the population demographic structure. We investigate the centrality of individuals from eighteen different local areas, along with the entire population. We found substantial differences in centrality of individuals in different local areas, that in turn contributed differently to the full network, highlighting the importance of analyzing networks at different scales. The full network revealed that boreal caribou in Saskatchewan form a complex, interconnected familial network, as the removal of edges with high betweenness did not result in distinct subgroups. Alpha, betweenness, and eccentricity centrality were the most informative measures to characterize the population demographic structure and for spatially identifying areas of highest fitness levels and family cohesion across the range. We found varied levels of dispersal, fitness, and cohesion in family groups. Synthesis and applications: Our results demonstrate the value of different network measures in assessing genetically derived familial networks.
The spatial application of the familial networks identified individuals presenting different fitness levels, short- and long-distance dispersing ability across the range in support of population monitoring and recovery efforts.
ABSTRACT
The proportion of samples with one or more close relatives in a genetic dataset increases rapidly with sample size, necessitating relatedness modeling and enabling pedigree-based analyses. Despite this, relatives are generally unreported and current inference methods typically detect only the degree of relatedness of sample pairs and not pedigree relationships. We developed CREST, an accurate and fast method that identifies the pedigree relationships of close relatives. CREST utilizes identity by descent (IBD) segments shared between a pair of samples and their mutual relatives, leveraging the fact that sharing rates among these individuals differ across pedigree configurations. Furthermore, CREST exploits the profound differences in sex-specific genetic maps to classify pairs as maternally or paternally related (e.g., paternal half-siblings) using the locations of autosomal IBD segments shared between the pair. In simulated data, CREST correctly classifies 91.5%-100% of grandparent-grandchild (GP) pairs, 80.0%-97.5% of avuncular (AV) pairs, and 75.5%-98.5% of half-sibling (HS) pairs compared to PADRE's rates of 38.5%-76.0% of GP, 60.5%-92.0% of AV, and 73.0%-95.0% of HS pairs. Turning to the real 20,032-sample Generation Scotland (GS) dataset, CREST identified seven pedigrees with incorrect relationship types or maternal/paternal parent sexes, five of which we confirmed as mistakes, and two with uncertain relationships. After correcting these, CREST correctly determines relationship types for 93.5% of GP, 97.7% of AV, and 92.2% of HS pairs that have sufficient mutual relative data; the parent sex in 100% of HS and 99.6% of GP pairs; and it completes this analysis in 2.8 h including IBD detection in eight threads.
Subjects
Human Genome/genetics, Female, Genetic Linkage/genetics, Genotype, Humans, Male, Genetic Models, Pedigree, Scotland
ABSTRACT
BACKGROUND: Orang-utans comprise three critically endangered species endemic to the islands of Borneo and Sumatra. Though whole-genome sequencing has recently accelerated our understanding of their evolutionary history, the costs of implementing routine genome screening and diagnostics remain prohibitive. Capitalizing on a tri-fold locus discovery approach, combining data from published whole-genome sequences, novel whole-exome sequencing, and microarray-derived genotype data, we aimed to develop a highly informative gene-focused panel of targets that can be used to address a broad range of research questions. RESULTS: We identified and present genomic co-ordinates for 175,186 SNPs and 2315 Y-chromosomal targets, plus 185 genes either known or presumed to be pathogenic in cardiovascular (N = 109) or respiratory (N = 43) diseases in humans - the primary and secondary causes of captive orang-utan mortality - or a majority of other human diseases (N = 33). As proof of concept, we designed and synthesized 'SeqCap' hybrid capture probes for these targets, demonstrating cost-effective target enrichment and reduced-representation sequencing. CONCLUSIONS: Our targets are of broad utility in studies of orang-utan ancestry, admixture and disease susceptibility and aetiology, and thus are of value in addressing questions key to the survival of these species. To facilitate comparative analyses, these targets could now be standardized for future orang-utan population genomic studies. The targets are broadly compatible with commercial target enrichment platforms and can be utilized as published here to synthesize applicable probes.
Subjects
Genomics, Pongo, Animals, Borneo, Disease Susceptibility, Humans, Indonesia, Pongo/genetics
ABSTRACT
White Guinea yam is mostly a dioecious outcrossing crop with male and female flowers produced on distinct plants. Fertile parents produce high fruit set in an open pollination polycross block, which is a cost-effective and convenient way of generating variability in yam breeding. However, the pollen parent of progeny from polycross mating is usually unknown. This study aimed to determine paternity in white Guinea yam half-sib progenies from polycross mating design. A total of 394 half-sib progenies from random open pollination involving nine female and three male parents was genotyped with 6602 SNP markers from DArTSeq platform to recover full pedigree. A higher proportion of expected heterozygosity, allelic richness, and evenness were observed in the half-sib progenies. A complete pedigree was established for all progenies from two families (TDr1685 and TDr1688) with 100% accuracy, while in the remaining families, paternity was assigned successfully only for 56 to 98% of the progenies. Our results indicated unequal paternal contribution under natural open pollination in yam, suggesting unequal pollen migrations or gene flow among the crossing parents. A total of 3.8% of progenies lacking paternal identity due to foreign pollen contamination outside the polycross block was observed. This study established the efficient determination of parental reconstruction and allelic contributions in the white Guinea yam half-sib progenies generated from open pollination polycross using SNP markers. Findings are useful for parental reconstruction, accurate dissection of the genetic effects, and selection in white Guinea yam breeding program utilizing polycross mating design.
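One common way to frame SNP-based paternity assignment is Mendelian exclusion: a true father and his offspring should not carry opposite homozygous genotypes at the same locus. The Python sketch below counts such conflicts and returns the best candidate, or no father when every candidate conflicts too often (as might happen with foreign pollen). This is an assumed, simplified approach with invented genotypes and parent names. The abstract does not state which assignment method the study used.

```python
from typing import Dict, List, Optional

Genotype = List[Optional[int]]  # alternate-allele count per SNP: 0, 1, 2, or None if missing


def opposing_homozygotes(parent: Genotype, child: Genotype) -> int:
    """Count loci where parent and child are opposite homozygotes (0 vs 2)."""
    return sum(
        1
        for p, c in zip(parent, child)
        if p is not None and c is not None and abs(p - c) == 2
    )


def assign_father(
    child: Genotype, candidates: Dict[str, Genotype], max_mismatches: int = 0
) -> Optional[str]:
    """Return the candidate with the fewest conflicts, or None if all exceed the limit.

    Ties are resolved by dictionary order, which a real analysis should not rely on.
    """
    scores = {name: opposing_homozygotes(g, child) for name, g in candidates.items()}
    best = min(scores, key=scores.get)
    return best if scores[best] <= max_mismatches else None


# Hypothetical male parents and progeny (not genotypes from the study).
males = {
    "M1": [0, 2, 1, 1, 0],
    "M2": [2, 2, 0, 0, 1],
    "M3": [1, 0, 1, 2, 2],
}
progeny_a = [0, 2, 1, 2, 0]  # compatible with M1
progeny_b = [2, 0, 2, 0, 2]  # conflicts with every candidate

print(assign_father(progeny_a, males))
print(assign_father(progeny_b, males))
```

With thousands of markers, a small non-zero `max_mismatches` would usually be needed to tolerate genotyping error.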
ABSTRACT
Sustainable and efficient forestry in a rapidly changing climate is a daunting task. The sessile nature of trees makes adaptation to climate change challenging; thereby, ecological services and economic potential are under risk. Current long-term and costly gene resources management practices have been primarily directed at a few economically important species and are confined to defined ecological boundaries. Here, we present a novel in situ gene-resource management approach that conserves forest biodiversity and improves productivity and adaptation through utilizing basic forest regeneration installations located across a wide range of environments without reliance on structured tree breeding/conservation methods. We utilized 4,267 25- to 35-year-old European larch trees growing in 21 reforestation installations across four distinct climatic regions in Austria. With the aid of marker-based pedigree reconstruction, we applied multi-trait, multi-site quantitative genetic analyses that enabled the identification of broadly adapted and productive individuals. Height and wood density, proxies to fitness and productivity, yielded in situ heritability estimates of 0.23 ± 0.07 and 0.30 ± 0.07, values similar to those from traditional "structured" pedigrees methods. In addition, individual trees selected with this approach are expected to yield genetic response of 1.1 and 0.7 standard deviations for fitness and productivity attributes, respectively, and be broadly adapted to a range of climatic conditions. Genetic evaluation across broad climatic gradients permitted the delineation of suitable reforestation areas under current and future climates. This simple and resource-efficient management of gene resources is applicable to most tree species.
ABSTRACT
Genomic tools are lacking for invasive and native populations of sea lamprey (Petromyzon marinus). Our objective was to discover single nucleotide polymorphism (SNP) loci to conduct pedigree analyses to quantify reproductive contributions of adult sea lampreys and dispersion of sibling larval sea lampreys of different ages in Great Lakes tributaries. Additional applications of data were explored using additional geographically expansive samples. We used restriction site-associated DNA sequencing (RAD-Seq) to discover genetic variation in Duffins Creek (DC), Ontario, Canada, and the St. Clair River (SCR), Michigan, USA. We subsequently developed RAD capture baits to genotype 3,446 RAD loci that contained 11,970 SNPs. Based on RAD capture assays, estimates of variance in SNP allele frequency among five Great Lakes tributary populations (mean $F_{ST}$ = 0.008; range 0.00-0.018) were concordant with previous microsatellite-based studies; however, outlier loci were identified that contributed substantially to spatial population genetic structure. At finer scales within streams, simulations indicated that accuracy in genetic pedigree reconstruction was high when 200 or 500 independent loci were used, even in situations of high spawner abundance (e.g., 1,000 adults). Based on empirical collections of larval sea lamprey genotypes, we found that age-1 and age-2 families of full and half-siblings were widely but nonrandomly distributed within stream reaches sampled. Using the genomic scale set of SNP loci developed in this study, biologists can rapidly genotype sea lamprey in non-native and native ranges to investigate questions pertaining to population structuring and reproductive ecology at previously unattainable scales.
ABSTRACT
A probabilistic reconstruction of genealogies in a polyploid population (from 2x to 4x) is investigated, by considering genetic data analyzed as the probability of allele presence in a given genotype. Based on the likelihood of all possible crossbreeding patterns, our model enables us to infer and to quantify the whole potential genealogies in the population. We explain in particular how to deal with the uncertain allelic multiplicity that may occur with polyploids. Then we build an ad hoc penalized likelihood to compare genealogies and to decide whether a particular individual brings sufficient information to be included in the taken genealogy. This decision criterion enables us in a next part to suggest a greedy algorithm in order to explore missing links and to rebuild some connections in the genealogies, retrospectively. As a by-product, we also give a way to infer the individuals that may have been favored by breeders over the years. In the last part we highlight the results given by our model and our algorithm, firstly on a simulated population and then on a real population of rose bushes. Most of the methodology relies on the maximum likelihood principle and on graph theory.
Subjects
Plant Genes, Genetic Models, Pedigree, Polyploidy, Rosa/genetics, Algorithms, Likelihood Functions, Probability
ABSTRACT
As genetic datasets increase in size, the fraction of samples with one or more close relatives grows rapidly, resulting in sets of mutually related individuals. We present DRUID (deep relatedness utilizing identity by descent), a method that works by inferring the identical-by-descent (IBD) sharing profile of an ungenotyped ancestor of a set of close relatives. Using this IBD profile, DRUID infers relatedness between unobserved ancestors and more distant relatives, thereby combining information from multiple samples to remove one or more generations between the deep relationships to be identified. DRUID constructs sets of close relatives by detecting full siblings and also uses an approach to identify the aunts/uncles of two or more siblings, recovering 92.2% of real aunts/uncles with zero false positives. In real and simulated data, DRUID correctly infers up to 10.5% more relatives than PADRE when using data from two sets of distantly related siblings, and 10.7%-31.3% more relatives given two sets of siblings and their aunts/uncles. DRUID frequently infers relationships either correctly or within one degree of the truth, with PADRE classifying 43.3%-58.3% of tenth degree relatives in this way compared to 79.6%-96.7% using DRUID.
Subjects
Human Genome/genetics, Single Nucleotide Polymorphism/genetics, Female, Population Genetics/methods, Humans, Male, Pedigree, Siblings
ABSTRACT
Large-scale human genetics studies are ascertaining increasing proportions of populations as they continue growing in both number and scale. As a result, the amount of cryptic relatedness within these study cohorts is growing rapidly and has significant implications on downstream analyses. We demonstrate this growth empirically among the first 92,455 exomes from the DiscovEHR cohort and, via a custom simulation framework we developed called SimProgeny, show that these measures are in line with expectations given the underlying population and ascertainment approach. For example, within DiscovEHR we identified ~66,000 close (first- and second-degree) relationships, involving 55.6% of study participants. Our simulation results project that >70% of the cohort will be involved in these close relationships, given that DiscovEHR scales to 250,000 recruited individuals. We reconstructed 12,574 pedigrees by using these relationships (including 2,192 nuclear families) and leveraged them for multiple applications. The pedigrees substantially improved the phasing accuracy of 20,947 rare, deleterious compound heterozygous mutations. Reconstructed nuclear families were critical for identifying 3,415 de novo mutations in ~1,783 genes. Finally, we demonstrate the segregation of known and suspected disease-causing mutations, including a tandem duplication that occurs in LDLR and causes familial hypercholesterolemia, through reconstructed pedigrees. In summary, this work highlights the prevalence of cryptic relatedness expected among large healthcare population-genomic studies and demonstrates several analyses that are uniquely enabled by large amounts of cryptic relatedness.
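A first step toward turning pairwise relationships into pedigrees can be sketched as grouping individuals into connected components, so that everyone linked by a chain of close relationships lands in the same family network. The Python example below uses a small union-find structure for this. It is only an assumed illustration with invented sample IDs. The abstract does not describe the reconstruction procedure, and real pedigree building must also resolve relationship types and generations within each group.

```python
from typing import Dict, Iterable, List, Tuple


def group_into_families(pairs: Iterable[Tuple[str, str]]) -> List[List[str]]:
    """Group individuals into connected components of the relationship graph."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two family networks

    groups: Dict[str, set] = {}
    for individual in list(parent):
        groups.setdefault(find(individual), set()).add(individual)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical first- and second-degree relationships between sample IDs.
relationships = [("A", "B"), ("B", "C"), ("D", "E")]
print(group_into_families(relationships))
```

Each returned group is a candidate family network that a downstream step could then arrange into an explicit pedigree.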