Results 1 - 17 of 17
1.
G3 (Bethesda) ; 11(9)2021 09 06.
Article in English | MEDLINE | ID: mdl-34544142

ABSTRACT

Eagle is an R package for multi-locus association mapping on a genome-wide scale. It is unlike other multi-locus packages in that it is easy to use for R users and non-users alike. It has two modes of use, command line and graphical user interface. Eagle is fully documented and has its own supporting website, http://eagle.r-forge.r-project.org/index.html. Eagle is a significant improvement over the method-of-choice, single-locus association mapping. It has greater power to detect SNP-trait associations. It is based on model selection, linear mixed models, and a clever idea on how random effects can be used to identify SNP-trait associations. Through an example with real mouse data, we demonstrate Eagle's ability to bring clarity and increased insight to single-locus findings. Initially, we see Eagle complementing single-locus analyses. However, over time, we hope the community will make, increasingly, multi-locus association mapping their method-of-choice for the analysis of genome-wide association study data.


Subject(s)
Eagles, Genome-Wide Association Study, Animals, Chromosome Mapping, Genome, Mice, Single Nucleotide Polymorphism
2.
Nat Genet ; 52(11): 1256-1264, 2020 11.
Article in English | MEDLINE | ID: mdl-33128049

ABSTRACT

Despite advances in sequencing technologies, assembly of complex plant genomes remains elusive due to polyploidy and high repeat content. Here we report PolyGembler for grouping and ordering contigs into pseudomolecules by genetic linkage analysis. Our approach also provides an accurate method with which to detect and fix assembly errors. Using simulated data, we demonstrate that our approach is of high accuracy and outperforms three existing state-of-the-art genetic mapping tools. Particularly, our approach is more robust to the presence of missing genotype data and genotyping errors. We used our method to construct pseudomolecules for allotetraploid lawn grass utilizing PacBio long reads in combination with restriction site-associated DNA sequencing, and for diploid Ipomoea trifida and autotetraploid potato utilizing contigs assembled from Illumina reads in combination with genotype data generated by single-nucleotide polymorphism arrays and genotyping by sequencing, respectively. We resolved 13 assembly errors for a published I. trifida genome assembly and anchored eight unplaced scaffolds in the published potato genome.


Subject(s)
Algorithms, Plant Chromosomes, Genetic Linkage, Plant Genome, Polyploidy, Computer Simulation, Genotype, Ipomoea/genetics, Plant Breeding, Poaceae/genetics, Protein Array Analysis, Solanum tuberosum/genetics
3.
Bioinformatics ; 36(5): 1509-1516, 2020 03 01.
Article in English | MEDLINE | ID: mdl-31596455

ABSTRACT

MOTIVATION: We present Eagle, a new method for multi-locus association mapping. The motivation for developing Eagle was to make multi-locus association mapping 'easy' and the method-of-choice. Eagle's strengths are that it (i) is considerably more powerful than single-locus association mapping, (ii) does not suffer from multiple testing issues, (iii) gives results that are immediately interpretable and (iv) has a computational footprint comparable to single-locus association mapping. RESULTS: By conducting a large simulation study, we will show that Eagle finds true and avoids false single-nucleotide polymorphism trait associations better than competing single- and multi-locus methods. We also analyze data from a published mouse study. Eagle found over 50% more validated findings than the state-of-the-art single-locus method. AVAILABILITY AND IMPLEMENTATION: Eagle has been implemented as an R package, with a browser-based Graphical User Interface for users less familiar with R. It is freely available via the CRAN website at https://cran.r-project.org. Videos, Quick Start guides, FAQs and Demos are available via the Eagle website http://eagle.r-forge.r-project.org. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Eagles, Animals, Genome, Genome-Wide Association Study, Mice, Single Nucleotide Polymorphism, Software
4.
Front Genet ; 9: 237, 2018.
Article in English | MEDLINE | ID: mdl-30023001

ABSTRACT

The analysis of large genomic data is hampered by issues such as a small number of observations and a large number of predictive variables (commonly known as "large P small N"), high dimensionality or highly correlated data structures. Machine learning methods are renowned for dealing with these problems. To date machine learning methods have been applied in Genome-Wide Association Studies for identification of candidate genes, epistasis detection, gene network pathway analyses and genomic prediction of phenotypic values. However, the utility of two machine learning methods, Gradient Boosting Machine (GBM) and Extreme Gradient Boosting Method (XGBoost), in identifying a subset of SNP markers for genomic prediction of breeding values has never been explored before. In this study, using 38,082 SNP markers and body weight phenotypes from 2,093 Brahman cattle (1,097 bulls as a discovery population and 996 cows as a validation population), we examined the efficiency of three machine learning methods, namely Random Forests (RF), GBM and XGBoost, in (a) the identification of top 400, 1,000, and 3,000 ranked SNPs; (b) using the subsets of SNPs to construct genomic relationship matrices (GRMs) for the estimation of genomic breeding values (GEBVs). For comparison purposes, we also calculated the GEBVs from (1) 400, 1,000, and 3,000 SNPs that were randomly selected and evenly spaced across the genome, and (2) from all the SNPs. We found that RF and especially GBM are efficient methods in identifying a subset of SNPs with direct links to candidate genes affecting the growth trait. In comparison to the estimate of prediction accuracy of GEBVs from using all SNPs (0.43), the 3,000 top SNPs identified by RF (0.42) and GBM (0.46) had similar values to those of the whole SNP panel. The performance of the subsets of SNPs from RF and GBM was substantially better than that of evenly spaced subsets across the genome (0.18-0.29). Of the three methods, RF and GBM consistently outperformed XGBoost in genomic prediction accuracy.

5.
Gates Open Res ; 2: 41, 2018.
Article in English | MEDLINE | ID: mdl-33062940

ABSTRACT

Background: The chloroplast (cp) genome is an important resource for studying plant diversity and phylogeny. Assembly of the cp genomes from next-generation sequencing data is complicated by the presence of two large inverted repeats contained in the cp DNA. Methods: We constructed a complete circular cp genome assembly for the hexaploid sweetpotato using extremely low coverage (<1×) Oxford Nanopore whole-genome sequencing (WGS) data coupled with Illumina sequencing data for polishing. Results: The sweetpotato cp genome of 161,274 bp contains 152 genes, of which there are 96 protein coding genes, 8 rRNA genes and 48 tRNA genes. Using the cp genome assembly as a reference, we constructed complete cp genome assemblies for a further 17 sweetpotato cultivars from East Africa and an I. triloba line using Illumina WGS data. Analysis of the sweetpotato cp genomes demonstrated the presence of two distinct subpopulations in East Africa. Phylogenetic analysis of the cp genomes of the species from the Convolvulaceae Ipomoea section Batatas revealed that the most closely related diploid wild species of the hexaploid sweetpotato is I. trifida. Conclusions: Nanopore long reads are helpful in construction of cp genome assemblies, especially in solving the two long inverted repeats. We are generally able to extract cp sequences from WGS data of sufficiently high coverage for assembly of cp genomes. The cp genomes can be used to investigate the population structure and the phylogenetic relationship for the sweetpotato.

6.
Theor Appl Genet ; 128(6): 1163-74, 2015 Jun.
Article in English | MEDLINE | ID: mdl-25800009

ABSTRACT

KEY MESSAGE: We present new association mapping methods which address the unique challenges of analyzing genome-wide data from multi-environment plant studies. Association studies on a genome-wide scale are being performed in plants. Unlike human studies, plant studies contain replicates whose data may be recorded across different environments. Plant studies also often employ elaborate experimental designs for controlling extraneous phenotypic variation. As a result, the genome-wide analysis of data from plant studies can be challenging. In this paper, we present QK-based association mapping for the analysis of data from plant association studies. In doing so, we have developed: (a) a general multivariate QK framework for association mapping in plant studies of arbitrary complexity; (b) a new weighted two-stage analysis approach for QK-based association mapping; (c) a heuristic procedure for determining when two-stage analysis is appropriate; and (d) a Monte Carlo sampling procedure for controlling the genome-wide type I error rate. We conduct a simulation study to evaluate the performance of our genome-wide mapping technique. We also analyze data from a multi-environment association study in wheat.


Subject(s)
Chromosome Mapping/methods, Genetic Association Studies, Genetic Models, Plants/genetics, Computer Simulation, Plant Genome, Genotype, Linear Models, Monte Carlo Method, Phenotype, Quantitative Trait Loci, Triticum/genetics
7.
Theor Appl Genet ; 127(8): 1753-70, 2014 Aug.
Article in English | MEDLINE | ID: mdl-24927820

ABSTRACT

KEY MESSAGE: An efficient whole genome method of QTL analysis is presented for Multi-parent advanced generation integrated crosses. Multi-parent advanced generation inter-cross (MAGIC) populations have been developed for mice and several plant species and are useful for the genetic dissection of complex traits. The analysis of quantitative trait loci (QTL) in these populations presents some additional challenges compared with traditional mapping approaches. In particular, pedigree and marker information need to be integrated and founder genetic data needs to be incorporated into the analysis. Here, we present a method for QTL analysis that utilizes the probability of inheriting founder alleles across the whole genome simultaneously, either for intervals or markers. The probabilities can be found using three-point or Hidden Markov Model (HMM) methods. This whole-genome approach is evaluated in a simulation study and it is shown to be a powerful method of analysis. The HMM probabilities lead to low rates of false positives and low bias of estimated QTL effect sizes. An implementation of the approach is available as an R package. In addition, we illustrate the approach using a bread wheat MAGIC population.


Subject(s)
Chromosome Mapping/methods, Genetic Crosses, Plant Genome/genetics, Quantitative Trait Loci/genetics, Triticum/genetics, Animals, Plant Chromosomes/genetics, Computer Simulation, Genetic Linkage, Genetic Loci, Markov Chains, Mice, Probability
8.
Plant Biotechnol J ; 10(7): 826-39, 2012 Sep.
Article in English | MEDLINE | ID: mdl-22594629

ABSTRACT

We present the first results from a novel multiparent advanced generation inter-cross (MAGIC) population derived from four elite wheat cultivars. The large size of this MAGIC population (1579 progeny), its diverse genetic composition and high levels of recombination all contribute to its value as a genetic resource. Applications of this resource include interrogation of the wheat genome and the analysis of gene-trait association in agronomically important wheat phenotypes. Here, we report the utilization of a MAGIC population for the first time for linkage map construction. We have constructed a linkage map with 1162 DArT, single nucleotide polymorphism and simple sequence repeat markers distributed across all 21 chromosomes. We benchmark this map against a high-density DArT consensus map created by integrating more than 100 biparental populations. The linkage map forms the basis for further exploration of the genetic architecture within the population, including characterization of linkage disequilibrium, founder contribution and inclusion of an alien introgression into the genetic map. Finally, we demonstrate the application of the resource for quantitative trait loci mapping using the complex traits plant height and hectolitre weight as a proof of principle.


Subject(s)
Genetic Crosses, Triticum/genetics, Chromosome Mapping, Plant Chromosomes/genetics, Genetic Markers, Population Genetics, Plant Genome/genetics, Inbreeding, Linkage Disequilibrium/genetics, Genetic Models, Quantitative Trait Loci/genetics, Genetic Recombination/genetics, Reproducibility of Results, Triticum/anatomy & histology
9.
Bioinformatics ; 27(5): 727-9, 2011 Mar 01.
Article in English | MEDLINE | ID: mdl-21217121

ABSTRACT

UNLABELLED: Multiparent crosses of recombinant inbred lines provide opportunity to map markers and quantitative trait loci (QTL) with much greater resolution than is possible in biparental crosses. Realizing the full potential of these crosses requires computational tools capable of handling the increased statistical complexity of the analyses. R/mpMap provides a flexible and extensible environment, which interfaces easily with other packages to satisfy this demand. Functions in the package encompass simulation, marker map construction, haplotype reconstruction and QTL mapping. We demonstrate the easy-to-use features of mpMap through a simulated data example. AVAILABILITY: www.cmis.csiro.au/mpMap.


Subject(s)
Chromosome Mapping/methods, Genetic Crosses, Genetic Models, Quantitative Trait Loci, Software, Computational Biology/methods, Computer Simulation, Genetic Markers, Haplotypes, Likelihood Functions
10.
J Hered ; 101(4): 521-4, 2010.
Article in English | MEDLINE | ID: mdl-20304977

ABSTRACT

In polyploids, the copy number (or marker dosage) of a dominant marker is the number of copies of a dominant allele carried by a parent. It is estimated using a hypothesis testing procedure. This procedure, however, suffers from issues associated with multiple testing and assumes all chromosomes pair uniformly. In this paper, a new Bayesian approach is presented for estimating the copy number of a dominant marker in polyploids. By using a probability model that explicitly accounts for preferentially paired chromosomes, the Bayesian approach more closely reflects reality. Its superiority over the hypothesis testing procedure is evidenced through the analysis of simulated and sugarcane data. The Bayesian methodology described in this paper is implemented in a C program, bdose, which is freely available (see Supplementary Materials).


Subject(s)
Bayes Theorem, Gene Dosage/genetics, Polyploidy, Algorithms, Alleles, Chromosomes/genetics, Dominant Genes, Regression Analysis
11.
Theor Appl Genet ; 119(5): 899-911, 2009 Sep.
Article in English | MEDLINE | ID: mdl-19585099

ABSTRACT

In this paper, we present an innovative and powerful approach for mapping quantitative trait loci (QTL) in experimental populations. This deviates from the traditional approach of (composite) interval mapping which uses a QTL profile to simultaneously determine the number and location of QTL. Instead, we look before we leap by employing separate detection and localization stages. In the detection stage, we use an iterative variable selection process coupled with permutation to identify the number and synteny of QTL. In the localization stage, we position the detected QTL through a series of one-dimensional interval mapping scans. Results from a detailed simulation study and real analysis of wheat data are presented. We achieve impressive increases in the power of QTL detection compared to composite interval mapping. We also accurately estimate the size and position of QTL. An R library, DLMap, implements the methods described here and is freely available from CRAN ( http://cran.r-project.org/ ).


Subject(s)
Chromosome Mapping/methods, Quantitative Trait Loci/genetics, Triticum/genetics, Genetic Crosses, Genetic Linkage, Plant Genome/genetics
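The abstract of entry 11 mentions a detection stage that couples variable selection with permutation. As a rough illustration only, and not the DLMap implementation, a permutation-based genome-wide threshold could be sketched along the following lines. The function names, the squared-correlation test statistic, and the default settings are all assumptions introduced here for the example.

```python
import random


def squared_correlation(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant vector carries no association signal
    return (sxy * sxy) / (sxx * syy)


def max_statistic(markers, phenotype):
    """Largest marker-trait statistic across all markers (hypothetical helper)."""
    return max(squared_correlation(m, phenotype) for m in markers)


def permutation_threshold(markers, phenotype, n_perm=200, alpha=0.05, seed=0):
    """Empirical (1 - alpha) quantile of the maximum statistic under permutation.

    Shuffling the phenotype breaks any marker-trait association while keeping
    the correlation structure among markers intact.
    """
    rng = random.Random(seed)
    shuffled = list(phenotype)
    null_maxima = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null_maxima.append(max_statistic(markers, shuffled))
    null_maxima.sort()
    index = min(n_perm - 1, int((1 - alpha) * n_perm))
    return null_maxima[index]
```

Under these assumptions, a marker would be flagged when its observed statistic exceeds the returned threshold. Taking the maximum over markers in each permutation is what controls the error rate across the whole scan rather than per marker.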
12.
Theor Appl Genet ; 119(3): 483-96, 2009 Aug.
Article in English | MEDLINE | ID: mdl-19449176

ABSTRACT

Genetic studies in polyploid plants rely heavily on the collection of data from dominant marker loci. A dominant marker locus is a locus for which only the presence or absence of an observable (dominant) allele is recorded. Before these marker loci can be used for genetic exploration, the number of copies of a dominant allele carried by a parent (copy number) must be determined for each marker locus. Copy number in polyploids is estimated using a hypothesis testing procedure. The performance of this estimation procedure has never been evaluated. In this paper, I quantify whether the highly sought after single-copy markers can be accurately identified, if the performance of the estimation procedure improves with increasing sample size, and whether the estimation procedure is capable of accurately estimating the copy number of high copy markers. I found that the probability of incorrectly estimating copy number is quite low and that more data can actually reduce the accuracy of the estimation procedure when the testing assumptions are violated. Fortunately, when a significant result is obtained, it is almost always correct. The challenge often is in obtaining a significant result.


Subject(s)
Gene Dosage, Plant Genes, Plants/genetics, Polyploidy, Alleles, Chi-Square Distribution, Plant Chromosomes, Dominant Genes, Genetic Markers, Genetic Models, Statistical Models
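The abstract of entry 12 states that copy number in polyploids is estimated with a hypothesis testing procedure. One simple way to picture such a procedure is a chi-square goodness-of-fit test of the observed presence/absence counts in the progeny against the segregation ratio expected for each candidate dose. The sketch below is illustrative and not the procedure evaluated in the paper. It assumes an autotetraploid parent crossed to a parent lacking the marker, with random chromosome pairing, which gives presence:absence ratios of 1:1 for a single-dose marker and 5:1 for a double-dose marker. The function names and the 5% significance level are likewise assumptions.

```python
# Chi-square critical value for 1 degree of freedom at the 5% level.
CHI2_CRITICAL_DF1_ALPHA05 = 3.841

# Assumed probability that a progeny shows the marker, by parental dose
# (autotetraploid x nulliplex cross, random chromosome pairing).
EXPECTED_PRESENT = {1: 1 / 2, 2: 5 / 6}


def chi_square_statistic(present, absent, p_present):
    """Goodness-of-fit statistic for observed counts against one ratio."""
    total = present + absent
    exp_present = total * p_present
    exp_absent = total * (1 - p_present)
    return ((present - exp_present) ** 2 / exp_present
            + (absent - exp_absent) ** 2 / exp_absent)


def consistent_doses(present, absent):
    """Doses whose expected ratio is not rejected at the 5% level."""
    return [
        dose
        for dose, p in sorted(EXPECTED_PRESENT.items())
        if chi_square_statistic(present, absent, p) < CHI2_CRITICAL_DF1_ALPHA05
    ]


print(consistent_doses(52, 48))  # only the single-dose ratio fits
print(consistent_doses(7, 3))    # small sample: both ratios remain plausible
```

With only ten progeny, neither ratio is rejected, so under these assumptions the test cannot settle the dose. This is in the spirit of the abstract's remark that the challenge often lies in obtaining a significant result.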
13.
Genetics ; 171(2): 791-801, 2005 Oct.
Article in English | MEDLINE | ID: mdl-15965250

ABSTRACT

Mapping markers from linkage data continues to be a task performed in many genetic epidemiological studies. Data collected in a study may be used to refine published map estimates and a study may use markers that do not appear in any published map. Furthermore, inaccuracies in meiotic maps can seriously bias linkage findings. To make best use of the available marker information, multilocus linkage analyses are performed. However, two computational issues greatly limit the number of markers currently mapped jointly; the number of candidate marker orders increases exponentially with marker number and computing exact multilocus likelihoods on general pedigrees is computationally demanding. In this article, a new Markov chain Monte Carlo (MCMC) approach that solves both these computational problems is presented. The MCMC approach allows many markers to be mapped jointly, using data observed on general pedigrees with unobserved individuals. The performance of the new mapping procedure is demonstrated through the analysis of simulated and real data. The MCMC procedure performs extremely well, even when there are millions of candidate orders, and gives results superior to those of CRI-MAP.


Subject(s)
Chromosome Mapping/methods, Genetic Markers/genetics, Markov Chains, Genetic Models, Monte Carlo Method, Bayes Theorem, Likelihood Functions, Pedigree
14.
Hum Hered ; 59(2): 98-108, 2005.
Article in English | MEDLINE | ID: mdl-15838179

ABSTRACT

On extended pedigrees with extensive missing data, the calculation of multilocus likelihoods for linkage analysis is often beyond the computational bounds of exact methods. Growing interest therefore surrounds the implementation of Monte Carlo estimation methods. In this paper, we demonstrate the speed and accuracy of a new Markov chain Monte Carlo method for the estimation of linkage likelihoods through an analysis of real data from a study of early-onset Alzheimer's disease. For those data sets where comparison with exact analysis is possible, we achieved up to a 100-fold increase in speed. Our approach is implemented in the program lm_bayes within the framework of the freely available MORGAN 2.6 package for Monte Carlo genetic analysis (http://www.stat.washington.edu/thompson/Genepi/MORGAN/Morgan.shtml).


Subject(s)
Alzheimer Disease/genetics, Lod Score, Markov Chains, Monte Carlo Method, Genetic Linkage, Genetic Markers/genetics, Humans, Pedigree, Software
15.
BMC Genet ; 6 Suppl 1: S141, 2005 Dec 30.
Article in English | MEDLINE | ID: mdl-16451601

ABSTRACT

The Genetic Analysis Workshop 14 simulated data presents an interesting, challenging, and plausible example of a complex disease interaction in a dataset. This paper summarizes the ease of detection for each of the simulated Kofendrerd Personality Disorder (KPD) genes across all of the replicates for five standard linkage statistics. Using the KPD affection status, we have analyzed the microsatellite markers flanking each of the disease genes, plus an additional 2 markers that were not linked to any of the disease loci. All markers were analyzed using the following two-point linkage methods: 1) a MMLS, which is a standard admixture LOD score maximized over theta, alpha, and mode of inheritance, 2) a MLS calculated by GENEHUNTER, 3) the Kong and Cox LOD score as computed by MERLIN, 4) a MOD score (standard heterogeneity LOD maximized over theta, alpha, and a grid of genetic model parameters), and 5) the PPL, a Bayesian statistic that directly measures the strength of evidence for linkage to a marker. All of the major loci (D1-D4) were detectable with varying probabilities in the different populations. However, the modifier genes (D5 and D6) were difficult to detect, with similar distributions under the null and alternative across populations and statistics. The pooling of the four datasets in each replicate (n = 350 pedigrees) greatly improved the chance of detecting the major genes using all five methods, but failed to increase the chance to detect D5 and D6.


Subject(s)
Chromosome Mapping/methods, Disease/genetics, Microsatellite Repeats/genetics, Population Genetics, Humans
16.
BMC Genet ; 6 Suppl 1: S44, 2005 Dec 30.
Article in English | MEDLINE | ID: mdl-16451655

ABSTRACT

The calculation of multipoint likelihoods is computationally challenging, with the exact calculation of multipoint probabilities only possible on small pedigrees with many markers or large pedigrees with few markers. This paper explores the utility of calculating multipoint likelihoods using data on markers flanking a hypothesized position of the trait locus. The calculation of such likelihoods is often feasible, even on large pedigrees with missing data and complex structures. Performance characteristics of the flanking marker procedure are assessed through the calculation of multipoint heterogeneity LOD scores on data simulated for Genetic Analysis Workshop 14 (GAW14). Analysis is restricted to data on the Aipotu population on chromosomes 1, 3, and 4, where chromosomes 1 and 3 are known to contain disease loci. The flanking marker procedure performs well, even when missing data and genotyping errors are introduced.


Subject(s)
Computer Simulation, Genetic Databases, Chromosome Mapping, Human Chromosome Pair 1/genetics, Genetic Heterogeneity, Genetic Linkage, Genetic Markers, Genotype, Humans, Likelihood Functions, Lod Score, Quantitative Trait Loci/genetics, Software
17.
BMC Genet ; 4 Suppl 1: S71, 2003 Dec 31.
Article in English | MEDLINE | ID: mdl-14975139

ABSTRACT

Our Markov chain Monte Carlo (MCMC) methods were used in linkage analyses of the Framingham Heart Study data using all available pedigrees. Our goal was to detect and map loci associated with covariate-adjusted traits log triglyceride (lnTG) and high-density lipoprotein cholesterol (HDL) using multipoint LOD score analysis, Bayesian oligogenic linkage analysis and identity-by-descent (IBD) scoring methods. Each method used all marker data for all markers on a chromosome. Bayesian linkage analysis detected a linkage signal on chromosome 7 for lnTG and HDL, corroborating previously published results. However, these results were not replicated in a classical linkage analysis of the data or by using IBD scoring methods. We conclude that Bayesian linkage analysis provides a powerful paradigm for mapping trait loci but interpretation of the Bayesian linkage signals is subjective. In the absence of a LOD score method accommodating genetically complex traits and linkage heterogeneity, validation of these signals remains elusive.


Subject(s)
Chromosome Mapping/statistics & numerical data, Multifactorial Inheritance/genetics, Quantitative Trait Loci/genetics, HDL Cholesterol/blood, Human Chromosome Pair 7/genetics, Female, Humans, Linkage Disequilibrium/genetics, Male, Markov Chains, Matched-Pair Analysis, Monte Carlo Method, Pedigree, Siblings, Triglycerides/blood