Results 1 - 20 of 126
1.
Am J Hum Genet ; 110(1): 23-29, 2023 01 05.
Article in English | MEDLINE | ID: mdl-36480927

ABSTRACT

We present LDAK-GBAT, a tool for gene-based association testing using summary statistics from genome-wide association studies that is computationally efficient, produces well-calibrated p values, and is significantly more powerful than existing tools. LDAK-GBAT takes approximately 30 min to analyze imputed data (2.9M common, genic SNPs), requiring less than 10 Gb memory. It shows good control of type 1 error given an appropriate reference panel. Across 109 phenotypes (82 from the UK Biobank, 18 from the Million Veteran Program, and nine from the Psychiatric Genetics Consortium), LDAK-GBAT finds on average 19% (SE: 1%) more significant genes than the existing tool sumFREGAT-ACAT, with even greater gains in comparison with MAGMA, GCTA-fastBAT, sumFREGAT-SKAT-O, and sumFREGAT-PCA.


Subject(s)
Genetic Testing , Genome-Wide Association Study , Phenotype , Polymorphism, Single Nucleotide/genetics
2.
PLoS Genet ; 19(1): e1010054, 2023 01.
Article in English | MEDLINE | ID: mdl-36656906

ABSTRACT

We introduce a fast, new algorithm for inferring from allele count data the FST parameters describing genetic distances among a set of populations and/or unrelated diploid individuals, and a tree with branch lengths corresponding to FST values. The tree can reflect historical processes of splitting and divergence, but seeks to represent the actual genetic variance as accurately as possible with a tree structure. We generalise two major approaches to defining FST, via correlations and mismatch probabilities of sampled allele pairs, which measure shared and non-shared components of genetic variance. A diploid individual can be treated as a population of two gametes, which allows inference of coancestry coefficients for individuals as well as for populations, or a combination of the two. A simulation study illustrates that our fast method-of-moments estimation of FST values, simultaneously for multiple populations/individuals, gains statistical efficiency over pairwise approaches when the population structure is close to tree-like. We apply our approach to genome-wide genotypes from the 26 worldwide human populations of the 1000 Genomes Project. We first analyse at the population level, then a subset of individuals and in a final analysis we pool individuals from the more homogeneous populations. This flexible analysis approach gives advantages over traditional approaches to population structure/coancestry, including visual and quantitative assessments of long-standing questions about the relative magnitudes of within- and between-population genetic differences.
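For orientation, the kind of pairwise moment estimator that this abstract contrasts with its multi-population method can be sketched as follows. This is a minimal sketch of a Hudson-type pairwise FST estimator (a ratio of sums across SNPs), not the tree-based algorithm the paper introduces. The function name `hudson_fst` and the input layout (one `(allele_count, n_alleles_sampled)` tuple per SNP) are assumptions made for illustration.

```python
def hudson_fst(pop1, pop2):
    """Pairwise FST from per-SNP allele counts (hypothetical helper).

    pop1, pop2: lists of (allele_count, n_alleles_sampled) tuples,
    one per SNP, in the same SNP order; each sample size must be >= 2.
    """
    num_total = 0.0
    den_total = 0.0
    for (c1, n1), (c2, n2) in zip(pop1, pop2):
        p1 = c1 / n1
        p2 = c2 / n2
        # Squared frequency difference, corrected for finite-sample noise.
        num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
        # Probability that two alleles, one drawn from each population, differ.
        den = p1 * (1 - p2) + p2 * (1 - p1)
        num_total += num
        den_total += den
    if den_total == 0:
        raise ValueError("no SNP is polymorphic across the two samples")
    # Ratio of sums across SNPs rather than a mean of per-SNP ratios.
    return num_total / den_total


# Two SNPs with fixed differences between the samples.
fixed = hudson_fst([(10, 10), (0, 10)], [(0, 10), (10, 10)])
# One SNP with identical sample frequencies in both populations.
same = hudson_fst([(50, 100)], [(50, 100)])
print(fixed, same)
```

With fixed differences the estimate is 1, while identical sample frequencies give a slightly negative value because of the finite-sample correction.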


Subject(s)
Algorithms , Genetics, Population , Humans , Genotype , Computer Simulation , Alleles
3.
Am J Hum Genet ; 109(11): 2080-2087, 2022 11 03.
Article in English | MEDLINE | ID: mdl-36288729

ABSTRACT

Genetic epilepsy with febrile seizures plus (GEFS+) is an autosomal dominant familial epilepsy syndrome characterized by distinctive phenotypic heterogeneity within families. The SCN1B c.363C>G (p.Cys121Trp) variant has been identified in independent, multi-generational families with GEFS+. Although the variant is present in population databases (at very low frequency), there is strong clinical, genetic, and functional evidence to support pathogenicity. Recurrent variants may be due to a founder event in which the variant has been inherited from a common ancestor. Here, we report evidence of a single founder event giving rise to the SCN1B c.363C>G variant in 14 independent families with epilepsy. A common haplotype was observed in all families, and the age of the most recent common ancestor was estimated to be approximately 800 years ago. Analysis of UK Biobank whole-exome-sequencing data identified 74 individuals with the same variant. All individuals carried haplotypes matching the epilepsy-affected families, suggesting all instances of the variant derive from a single mutational event. This unusual finding of a variant causing an autosomal dominant, early-onset disease in an outbred population that has persisted over many generations can be attributed to the relatively mild phenotype in most carriers and incomplete penetrance. Founder events are well established in autosomal recessive and late-onset disorders but are rarely observed in early-onset, autosomal dominant diseases. These findings suggest variants present in the population at low frequencies should be considered potentially pathogenic in mild phenotypes with incomplete penetrance and may be more important contributors to the genetic landscape than previously thought.


Subject(s)
Epilepsy , Seizures, Febrile , Child , Humans , Pedigree , Electroencephalography , Seizures, Febrile/genetics , Phenotype , Epilepsy/genetics
4.
Bioessays ; 44(5): e2100170, 2022 05.
Article in English | MEDLINE | ID: mdl-35279859

ABSTRACT

Complex-trait genetics has advanced dramatically through methods to estimate the heritability tagged by SNPs, both genome-wide and in genomic regions of interest such as those defined by functional annotations. The models underlying many of these analyses are inadequate, and consequently many SNP-heritability results published to date are inaccurate. Here, we review the modelling issues, both for analyses based on individual genotype data and association test statistics, highlighting the role of a low-dimensional model for the heritability of each SNP. We use state-of-the-art models to present updated results about how heritability is distributed with respect to functional annotations in the human genome, and how it varies with allele frequency, which can reflect purifying selection. Our results give finer detail to the picture that has emerged in recent years of complex trait heritability widely dispersed across the genome. Confounding due to population structure remains a problem that summary statistic analyses cannot reliably overcome. Also see the video abstract here: https://youtu.be/WC2u03V65MQ.


Subject(s)
Genome-Wide Association Study , Polymorphism, Single Nucleotide , Gene Frequency , Genome, Human/genetics , Genome-Wide Association Study/methods , Genotype , Humans , Models, Genetic , Phenotype , Polymorphism, Single Nucleotide/genetics , Quantitative Trait, Heritable
5.
Genet Epidemiol ; 46(7): 347-371, 2022 10.
Article in English | MEDLINE | ID: mdl-35842778

ABSTRACT

The inclusion of ancestrally diverse participants in genetic studies can lead to new discoveries and is important to ensure equitable health care benefit from research advances. Here, members of the Ethical, Legal, Social, Implications (ELSI) committee of the International Genetic Epidemiology Society (IGES) offer perspectives on methods and analysis tools for the conduct of inclusive genetic epidemiology research, with a focus on admixed and ancestrally diverse populations in support of reproducible research practices. We emphasize the importance of distinguishing socially defined population categorizations from genetic ancestry in the design, analysis, reporting, and interpretation of genetic epidemiology research findings. Finally, we discuss the current state of genomic resources used in genetic association studies, functional interpretation, and clinical and public health translation of genomic findings with respect to diverse populations.


Subject(s)
Genetics, Population , Genomics , Epidemiologic Studies , Genetic Association Studies , Humans , Molecular Epidemiology
6.
Mol Biol Evol ; 39(4)2022 04 11.
Article in English | MEDLINE | ID: mdl-35460423

ABSTRACT

Throughout human evolutionary history, large-scale migrations have led to intermixing (i.e., admixture) between previously separated human groups. Although classical and recent work have shown that studying admixture can yield novel historical insights, the extent to which this process contributed to adaptation remains underexplored. Here, we introduce a novel statistical model, specific to admixed populations, that identifies loci under selection while determining whether the selection likely occurred post-admixture or prior to admixture in one of the ancestral source populations. Through extensive simulations, we show that this method is able to detect selection, even in recently formed admixed populations, and to accurately differentiate between selection occurring in the ancestral or admixed population. We apply this method to genome-wide SNP data of ∼4,000 individuals in five admixed Latin American cohorts from Brazil, Chile, Colombia, Mexico, and Peru. Our approach replicates previous reports of selection in the human leukocyte antigen region that are consistent with selection post-admixture. We also report novel signals of selection in genomic regions spanning 47 genes, reinforcing many of these signals with an alternative, commonly used local-ancestry-inference approach. These signals include several genes involved in immunity, which may reflect responses to endemic pathogens of the Americas and to the challenge of infectious disease brought by European contact. In addition, some of the strongest signals inferred to be under selection in the Native American ancestral groups of modern Latin Americans overlap with genes implicated in energy metabolism phenotypes, plausibly reflecting adaptations to novel dietary sources available in the Americas.


Subject(s)
Genetics, Population , Genome, Human , Genomics/methods , Hispanic or Latino/genetics , Humans , Polymorphism, Single Nucleotide/genetics , White People/genetics
7.
PLoS Comput Biol ; 18(3): e1009960, 2022 03.
Article in English | MEDLINE | ID: mdl-35263345

ABSTRACT

We present a novel algorithm, implemented in the software ARGinfer, for probabilistic inference of the Ancestral Recombination Graph under the Coalescent with Recombination. Our Markov Chain Monte Carlo algorithm takes advantage of the Succinct Tree Sequence data structure that has allowed great advances in simulation and point estimation, but not yet probabilistic inference. Unlike previous methods, which employ the Sequentially Markov Coalescent approximation, ARGinfer uses the Coalescent with Recombination, allowing more accurate inference of key evolutionary parameters. We show using simulations that ARGinfer can accurately estimate many properties of the evolutionary history of the sample, including the topology and branch lengths of the genealogical tree at each sequence site, and the times and locations of mutation and recombination events. ARGinfer approximates posterior probability distributions for these and other quantities, providing interpretable assessments of uncertainty that we show to be well calibrated. ARGinfer is currently limited to tens of DNA sequences of several hundreds of kilobases, but has scope for further computational improvements to increase its applicability.


Subject(s)
Models, Genetic , Software , Algorithms , Bayes Theorem , Markov Chains , Phylogeny , Recombination, Genetic/genetics
8.
Nat Rev Genet ; 16(1): 33-44, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25404112

ABSTRACT

Relatedness is a fundamental concept in genetics but is surprisingly hard to define in a rigorous yet useful way. Traditional relatedness coefficients specify expected genome sharing between individuals in pedigrees, but actual genome sharing can differ considerably from these expected values, which in any case vary according to the pedigree that happens to be available. Nowadays, we can measure genome sharing directly from genome-wide single-nucleotide polymorphism (SNP) data; however, there are many such measures in current use, and we lack good criteria for choosing among them. Here, we review SNP-based measures of relatedness and criteria for comparing them. We discuss how useful pedigree-based concepts remain today and highlight opportunities for further advances in quantitative genetics, with a focus on heritability estimation and phenotype prediction.


Subject(s)
Genetic Variation , Genetics, Population/methods , Models, Genetic , Pedigree , Phenotype , Polymorphism, Single Nucleotide/genetics , Computer Simulation , Humans
9.
PLoS Genet ; 14(11): e1007774, 2018 11.
Article in English | MEDLINE | ID: mdl-30383746

ABSTRACT

Mitochondrial DNA (mtDNA) is useful to assist with identification of the source of a biological sample, or to confirm matrilineal relatedness. Although the autosomal genome is much larger, mtDNA has an advantage for forensic applications of multiple copy number per cell, allowing better recovery of sequence information from degraded samples. In addition, biological samples such as fingernails, old bones, teeth and hair have mtDNA but little or no autosomal DNA. The relatively low mutation rate of the mitochondrial genome (mitogenome) means that there can be large sets of matrilineal-related individuals sharing a common mitogenome. Here we present the mitolina simulation software that we use to describe the distribution of the number of mitogenomes in a population that match a given mitogenome, and investigate its dependence on population size and growth rate, and on a database count of the mitogenome. Further, we report on the distribution of the number of meioses separating pairs of individuals with matching mitogenome. Our results have important implications for assessing the weight of mtDNA profile evidence in forensic science, but mtDNA analysis has many non-human applications, for example in tracking the source of ivory. Our methods and software can also be used for simulations to help validate models of population history in human or non-human populations.


Subject(s)
DNA, Mitochondrial/genetics , Genome, Mitochondrial , Models, Genetic , Chromosomes, Human, Y/genetics , Computer Simulation , Databases, Nucleic Acid , Female , Forensic Genetics/statistics & numerical data , Genetic Variation , Genetics, Population , Haplotypes , Humans , Iran , Male , Mutation , Software , United States
10.
Genet Epidemiol ; 43(8): 930-940, 2019 12.
Article in English | MEDLINE | ID: mdl-31541496

ABSTRACT

Linkage disequilibrium SCore regression (LDSC) has become a popular approach to estimate confounding bias, heritability, and genetic correlation using only genome-wide association study (GWAS) test statistics. SumHer is a newly introduced alternative with similar aims. We show using theory and simulations that both approaches fail to adequately account for confounding bias, even when the assumed heritability model is correct. Consequently, these methods may estimate heritability poorly if there was an inadequate adjustment for confounding in the original GWAS analysis. We also show that the choice of a summary statistic for use in LDSC or SumHer can have a large impact on resulting inferences. Further, covariate adjustments in the original GWAS can alter the target of heritability estimation, which can be problematic for test statistics from a meta-analysis of GWAS with different covariate adjustments.


Subject(s)
Bias , Data Interpretation, Statistical , Inheritance Patterns , Models, Genetic , Computer Simulation , Genome-Wide Association Study , Humans , Linkage Disequilibrium , Polymorphism, Single Nucleotide
11.
Genome Res ; 27(10): 1715-1729, 2017 10.
Article in English | MEDLINE | ID: mdl-28864458

ABSTRACT

Gene panel and exome sequencing have revealed a high rate of molecular diagnoses among diseases where the genetic architecture has proven suitable for sequencing approaches, with a large number of distinct and highly penetrant causal variants identified among a growing list of disease genes. The challenge is, given the DNA sequence of a new patient, to distinguish disease-causing from benign variants. Large samples of human standing variation data highlight regional variation in the tolerance to missense variation within the protein-coding sequence of genes. This information is not well captured by existing bioinformatic tools, but is effective in improving variant interpretation. To address this limitation in existing tools, we introduce the missense tolerance ratio (MTR), which summarizes available human standing variation data within genes to encapsulate population level genetic variation. We find that patient-ascertained pathogenic variants preferentially cluster in low MTR regions (P < 0.005) of well-informed genes. By evaluating 20 publicly available predictive tools across genes linked to epilepsy, we also highlight the importance of understanding the empirical null distribution of existing prediction tools, as these vary across genes. Subsequently integrating the MTR with the empirically selected bioinformatic tools in a gene-specific approach demonstrates a clear improvement in the ability to predict pathogenic missense variants from background missense variation in disease genes. Among an independent test sample of case and control missense variants, case variants (0.83 median score) consistently achieve higher pathogenicity prediction probabilities than control variants (0.02 median score; Mann-Whitney U test, P < 1 × 10^-16). We focus on the application to epilepsy genes; however, the framework is applicable to disease genes beyond epilepsy.


Subject(s)
Computational Biology/methods , Epilepsy/genetics , Genomics/methods , Pharmacogenomic Variants , Precision Medicine/methods , Epilepsy/diagnosis , Humans
12.
PLoS Genet ; 13(11): e1007028, 2017 Nov.
Article in English | MEDLINE | ID: mdl-29099833

ABSTRACT

The introduction of forensic autosomal DNA profiles was controversial, but the problems were successfully addressed, and DNA profiling has gone on to revolutionise forensic science. Y-chromosome profiles are valuable when there is a mixture of male-source and female-source DNA, and interest centres on the identity of the male source(s) of the DNA. The problem of evaluating evidential weight is even more challenging for Y profiles than for autosomal profiles. Numerous approaches have been proposed, but they fail to deal adequately with the fact that men with matching Y-profiles are related in extended patrilineal clans, many of which may not be represented in available databases. The higher mutation rates of modern profiling kits have led to increased discriminatory power but they have also exacerbated the problem of fairly conveying evidential value. Because the relevant population is difficult to define, yet the number of matching relatives is fixed as population size varies, it is typically infeasible to derive population-based match probabilities relevant to a specific crime. We propose a conceptually simple solution, based on a simulation model and software to approximate the distribution of the number of males with a matching Y profile. We show that this distribution is robust to different values for the variance in reproductive success and the population growth rate. We also use importance sampling reweighting to derive the distribution of the number of matching males conditional on a database frequency, finding that this conditioning typically has only a modest impact. We illustrate the use of our approach to quantify the value of Y profile evidence for a court in a way that is both scientifically valid and easily comprehensible by a judge or juror.


Subject(s)
Chromosomes, Human, Y/genetics , DNA/genetics , DNA Fingerprinting/methods , Forensic Genetics/methods , Humans , Male , Probability , Reproduction , Software
13.
J Math Biol ; 78(6): 1727-1769, 2019 05.
Article in English | MEDLINE | ID: mdl-30734077

ABSTRACT

In population genetics, the Dirichlet (also called the Balding-Nichols) model has for 20 years been considered the key model to approximate the distribution of allele fractions within populations in a multi-allelic setting. It has often been noted that the Dirichlet assumption is approximate because positive correlations among alleles cannot be accommodated under the Dirichlet model. However, the validity of the Dirichlet distribution has never been systematically investigated in a general framework. This paper attempts to address this problem by providing a general overview of how allele fraction data under the most common multi-allelic mutational structures should be modeled. The Dirichlet and alternative models are investigated by simulating allele fractions from a diffusion approximation of the multi-allelic Wright-Fisher process with mutation, and applying a moment-based analysis method. The study shows that the optimal modeling strategy for the distribution of allele fractions depends on the specific mutation process. The Dirichlet model is only an exceptionally good approximation for the pure drift, Jukes-Cantor and parent-independent mutation processes with small mutation rates. Alternative models are required and proposed for the other mutation processes, such as a Beta-Dirichlet model for the infinite alleles mutation process, and a Hierarchical Beta model for the Kimura, Hasegawa-Kishino-Yano and Tamura-Nei processes. Finally, a novel Hierarchical Beta approximation is developed, a Pyramidal Hierarchical Beta model, for the generalized time-reversible and single-step mutation processes.
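As a rough illustration of the Dirichlet (Balding-Nichols) model discussed in this abstract, the sketch below draws subpopulation allele fractions from a Dirichlet distribution with parameters p_i(1 - F)/F, where p_i are ancestral allele fractions and F is the differentiation parameter. Under this parameterisation each fraction has mean p_i and variance F p_i (1 - p_i). The function name and arguments are assumptions for illustration; the sketch does not reproduce the paper's diffusion simulations or its alternative Beta-based models.

```python
import random


def sample_allele_fractions(ancestral, fst, rng):
    """Draw one vector of subpopulation allele fractions (hypothetical helper).

    ancestral: ancestral allele fractions (positive, summing to 1).
    fst: differentiation parameter F, with 0 < F < 1.
    rng: a random.Random instance, so that runs can be seeded.
    """
    scale = (1.0 - fst) / fst
    # A Dirichlet draw is a vector of independent gamma draws, normalised.
    gammas = [rng.gammavariate(p * scale, 1.0) for p in ancestral]
    total = sum(gammas)
    return [g / total for g in gammas]


rng = random.Random(0)
ancestral = [0.5, 0.3, 0.2]
draws = [sample_allele_fractions(ancestral, 0.05, rng) for _ in range(20000)]

# Empirical mean and variance of the first allele's fraction.
mean0 = sum(d[0] for d in draws) / len(draws)
var0 = sum((d[0] - mean0) ** 2 for d in draws) / len(draws)
print(mean0, var0)
```

The empirical mean and variance of the first allele should land close to 0.5 and 0.05 × 0.5 × 0.5 = 0.0125, respectively.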


Subject(s)
Alleles , Data Analysis , Genetics, Population/methods , Models, Genetic , Computer Simulation , Datasets as Topic , Humans , Mutation Rate
14.
PLoS Genet ; 12(9): e1006288, 2016 09.
Article in English | MEDLINE | ID: mdl-27589268

ABSTRACT

The prediction of phenotypic traits using high-density genomic data has many applications such as the selection of plants and animals of commercial interest; and it is expected to play an increasing role in medical diagnostics. Statistical models used for this task are usually tested using cross-validation, which implicitly assumes that new individuals (whose phenotypes we would like to predict) originate from the same population the genomic prediction model is trained on. In this paper we propose an approach based on clustering and resampling to investigate the effect of increasing genetic distance between training and target populations when predicting quantitative traits. This is important for plant and animal genetics, where genomic selection programs rely on the precision of predictions in future rounds of breeding. Therefore, estimating how quickly predictive accuracy decays is important in deciding which training population to use and how often the model has to be recalibrated. We find that the correlation between true and predicted values decays approximately linearly with respect to either FST or mean kinship between the training and the target populations. We illustrate this relationship using simulations and a collection of data sets from mice, wheat and human genetics.


Subject(s)
Breeding , Models, Statistical , Quantitative Trait Loci/genetics , Selection, Genetic , Animals , Genetic Variation , Genomics , Genotype , Humans , Mice , Phenotype , Polymorphism, Single Nucleotide
15.
Bioinformatics ; 33(8): 1246-1247, 2017 04 15.
Article in English | MEDLINE | ID: mdl-28003266

ABSTRACT

Motivation: Sequencing pools of individuals (Pool-Seq) is a cost-effective way to gain insight into the genetics of complex traits, but as yet no parametric method has been developed to both test for genetic effects and estimate their magnitude. Here, we propose GWAlpha, a flexible method to obtain parametric estimates of genetic effects genome-wide from Pool-Seq experiments. Results: We showed that GWAlpha powerfully replicates the results of Genome-Wide Association Studies (GWAS) from model organisms. We perform simulation studies that illustrate the effect on power of sample size and number of pools and test the method on different experimental data. Availability and Implementation: GWAlpha is implemented in Python, designed to run on the Linux operating system and tested on Mac OS. It is freely available at https://github.com/aflevel/GWAlpha . Contact: afournier@unimelb.edu.au. Supplementary information: Supplementary data are available at Bioinformatics online.


Subject(s)
Genome-Wide Association Study/methods , Software , Genome , Phenotype , Sample Size
17.
PLoS Genet ; 11(8): e1005397, 2015 Aug.
Article in English | MEDLINE | ID: mdl-26291793

ABSTRACT

The Ari peoples of Ethiopia are comprised of different occupational groups that can be distinguished genetically, with Ari Cultivators and the socially marginalised Ari Blacksmiths recently shown to have a similar level of genetic differentiation between them (FST ≈ 0.023 - 0.04) as that observed among multiple ethnic groups sampled throughout Ethiopia. Anthropologists have proposed two competing theories to explain the origins of the Ari Blacksmiths as (i) remnants of a population that inhabited Ethiopia prior to the arrival of agriculturists (e.g. Cultivators), or (ii) relatively recently related to the Cultivators but presently marginalized in the community due to their trade. Two recent studies by different groups analysed genome-wide DNA from samples of Ari Blacksmiths and Cultivators and suggested that genetic patterns between the two groups were more consistent with model (i) and subsequent assimilation of the indigenous peoples into the expanding agriculturalist community. We analysed the same samples using approaches designed to attenuate signals of genetic differentiation that are attributable to allelic drift within a population. By doing so, we provide evidence that the genetic differences between Ari Blacksmiths and Cultivators can be entirely explained by bottleneck effects consistent with hypothesis (ii). This finding serves as both a cautionary tale about interpreting results from unsupervised clustering algorithms, and suggests that social constructions are contributing directly to genetic differentiation over a relatively short time period among previously genetically similar groups.


Subject(s)
Ethnicity/genetics , Alleles , Cluster Analysis , Culture , Ethiopia , Genetic Drift , Genetics, Medical , Humans
18.
Hum Mutat ; 38(1): 78-85, 2017 01.
Article in English | MEDLINE | ID: mdl-27650164

ABSTRACT

The aryl hydrocarbon receptor interacting protein (AIP) founder mutation R304* (or p.R304*; NM_003977.3:c.910C>T, p.Arg304Ter) identified in Northern Ireland (NI) predisposes to acromegaly/gigantism; its population health impact remains unexplored. We measured R304* carrier frequency in 936 Mid Ulster, 1,000 Greater Belfast (both in NI) and 2,094 Republic of Ireland (ROI) volunteers and in 116 NI or ROI acromegaly/gigantism patients. Carrier frequencies were 0.0064 in Mid Ulster (95% CI = 0.0027-0.013; P = 0.0005 vs. ROI), 0.001 in Greater Belfast (0.00011-0.0047) and zero in ROI (0-0.0014). R304* prevalence was elevated in acromegaly/gigantism patients in NI (11/87, 12.6%, P < 0.05), but not in ROI (2/29, 6.8%) versus non-Irish patients (0-2.41%). Haploblock conservation supported a common ancestor for all the 18 identified Irish pedigrees (81 carriers, 30 affected). Time to most recent common ancestor (tMRCA) was 2,550 (1,275-5,000) years. tMRCA-based simulations predicted 432 (90-5,175) current carriers, including 86 affected (18-1,035) for 20% penetrance. In conclusion, R304* is frequent in Mid Ulster, resulting in numerous acromegaly/gigantism cases. tMRCA is consistent with historical/folklore accounts of Irish giants. Forward simulations predict many undetected carriers; geographically targeted population screening improves asymptomatic carrier identification, complementing clinical testing of patients/relatives. We generated disease awareness locally, necessary for early diagnosis and improved outcomes of AIP-related disease.
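The affected counts quoted in this abstract follow from the predicted carrier counts by applying the stated 20% penetrance. The short sketch below reproduces that arithmetic only; the function name is an assumption for illustration, and rounding to a whole number of people is assumed rather than stated in the abstract.

```python
def expected_affected(carriers, penetrance=0.20):
    """Expected number of affected carriers, rounded to a whole person."""
    return round(carriers * penetrance)


# Point estimate and range of current carriers quoted in the abstract.
for carriers in (432, 90, 5175):
    print(carriers, expected_affected(carriers))
```

The three results (86, 18 and 1,035) match the point estimate and range of affected individuals given in the abstract.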


Subject(s)
Acromegaly/epidemiology , Acromegaly/genetics , Genetic Predisposition to Disease , Gigantism/epidemiology , Gigantism/genetics , Intracellular Signaling Peptides and Proteins/genetics , Acromegaly/diagnosis , Adolescent , Adult , Aged , Aged, 80 and over , Alleles , Amino Acid Substitution , Chromosome Mapping , Cross-Sectional Studies , Female , Gene Frequency , Genotype , Gigantism/diagnosis , Heterozygote , Humans , Ireland/epidemiology , Male , Mass Screening , Middle Aged , Phenotype , Risk , Young Adult
19.
Genome Res ; 24(9): 1550-7, 2014 Sep.
Article in English | MEDLINE | ID: mdl-24963154

ABSTRACT

BLUP (best linear unbiased prediction) is widely used to predict complex traits in plant and animal breeding, and increasingly in human genetics. The BLUP mathematical model, which consists of a single random effect term, was adequate when kinships were measured from pedigrees. However, when genome-wide SNPs are used to measure kinships, the BLUP model implicitly assumes that all SNPs have the same effect-size distribution, which is a severe and unnecessary limitation. We propose MultiBLUP, which extends the BLUP model to include multiple random effects, allowing greatly improved prediction when the random effects correspond to classes of SNPs with distinct effect-size variances. The SNP classes can be specified in advance, for example, based on SNP functional annotations, and we also provide an adaptive procedure for determining a suitable partition of SNPs. We apply MultiBLUP to genome-wide association data from the Wellcome Trust Case Control Consortium (seven diseases), and from much larger studies of celiac disease and inflammatory bowel disease, finding that it consistently provides better prediction than alternative methods. Moreover, MultiBLUP is computationally very efficient; for the largest data set, which includes 12,678 individuals and 1.5 M SNPs, the total analysis can be run on a single desktop PC in less than a day and can be parallelized to run even faster. Tools to perform MultiBLUP are freely available in our software LDAK.


Subject(s)
Algorithms , Models, Genetic , Polymorphism, Single Nucleotide , Quantitative Trait, Heritable , Animals , Humans , Mice
20.
Stat Appl Genet Mol Biol ; 15(5): 431-445, 2016 10 01.
Article in English | MEDLINE | ID: mdl-27416618

ABSTRACT

In recent years statistical models for the analysis of complex (low-template and/or mixed) DNA profiles have moved from using only presence/absence information about allelic peaks in an electropherogram, to quantitative use of peak heights. This is challenging because peak heights are very variable and affected by a number of factors. We present a new peak-height model with important novel features, including over- and double-stutter, and a new approach to dropin. Our model is incorporated in open-source R code likeLTD. We apply it to 108 laboratory-generated crime-scene profiles and demonstrate techniques of model validation that are novel in the field. We use the results to explore the benefits of modeling peak heights, finding that it is not always advantageous, and to assess the merits of pre-extraction replication. We also introduce an approximation that can reduce computational complexity when there are multiple low-level contributors who are not of interest to the investigation, and we present a simple approximate adjustment for linkage between loci, making it possible to accommodate linkage when evaluating complex DNA profiles.


Subject(s)
DNA Fingerprinting , DNA/genetics , Forensic Genetics , Algorithms , Alleles , Computer Simulation , DNA Fingerprinting/methods , DNA Fingerprinting/standards , Forensic Genetics/methods , Forensic Genetics/standards , Genetic Linkage , Humans , Likelihood Functions , Models, Genetic , Models, Statistical , Reproducibility of Results , Software