ABSTRACT
Genome-wide association studies (GWAS) are integral to studying genotype-phenotype relationships and gaining a deeper understanding of the genetic architecture underlying trait variation. A plethora of genetic associations between distinct loci and various traits have been successfully discovered and published for the model plant Arabidopsis thaliana. This success and the free availability of full genomes and phenotypic data for more than 1,000 different natural inbred lines led to the development of several data repositories. AraPheno (https://arapheno.1001genomes.org) serves as a central repository of population-scale phenotypes in A. thaliana, while the AraGWAS Catalog (https://aragwas.1001genomes.org) provides a publicly available, manually curated and standardized collection of marker-trait associations for all available phenotypes from AraPheno. In this major update, we introduce the next generation of both platforms, including new data, features and tools. We included novel results on associations between knockout mutations and all AraPheno traits. Furthermore, AraPheno has been extended to display RNA-Seq data for hundreds of accessions, providing expression information for over 28,000 genes for these accessions. All data, including the imputed genotype matrix used for GWAS, are easily downloadable via the respective databases.
Subjects
Arabidopsis/genetics, Computational Biology, Genetic Databases, Plant Genome, Genome-Wide Association Study, Phenotype, Computational Biology/methods, Gene Knockout Techniques, Genome-Wide Association Study/methods, Genotype, Mutation, Quantitative Trait Loci, Heritable Quantitative Trait, RNA Sequence Analysis, Web Browser
ABSTRACT
The abundance of high-quality genotype and phenotype data for the model organism Arabidopsis thaliana enables scientists to study the genetic architecture of many complex traits at an unprecedented level of detail using genome-wide association studies (GWAS). GWAS have been a great success in A. thaliana and many SNP-trait associations have been published. With the AraGWAS Catalog (https://aragwas.1001genomes.org) we provide a publicly available, manually curated and standardized GWAS catalog for all publicly available phenotypes from the central A. thaliana phenotype repository, AraPheno. All GWAS have been recomputed on the latest imputed genotype release of the 1001 Genomes Consortium using a standardized GWAS pipeline to ensure comparability between results. The catalog currently includes 167 phenotypes and more than 222,000 SNP-trait associations with P < 10⁻⁴, of which 3887 are significant under permutation-based thresholds. The AraGWAS Catalog can be accessed via a modern web interface and provides various features to easily access, download and visualize the results and summary statistics across GWAS.
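The abstract does not say how the permutation-based thresholds are computed. As a minimal sketch of the general idea, assuming a max-statistic permutation test with absolute Pearson correlation standing in for the catalog's actual association statistic, the Python below shuffles the phenotype, records the strongest association found in each shuffle, and takes an upper quantile of those maxima as the threshold. All function names and the toy data are hypothetical.

```python
import random


def pearson_r(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant vector carries no association signal
    return sxy / (sxx * syy) ** 0.5


def max_abs_r(genotypes, phenotype):
    """Strongest |r| over all SNPs; each row of genotypes is one SNP."""
    return max(abs(pearson_r(snp, phenotype)) for snp in genotypes)


def permutation_threshold(genotypes, phenotype, n_perm=200, alpha=0.05, seed=0):
    """Upper (1 - alpha) quantile of the max statistic under shuffled phenotypes."""
    rng = random.Random(seed)
    shuffled = list(phenotype)
    null_max = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any genotype-phenotype link
        null_max.append(max_abs_r(genotypes, shuffled))
    null_max.sort()
    index = min(n_perm - 1, int((1 - alpha) * n_perm))
    return null_max[index]


# Toy data (assumed): 50 biallelic SNPs, 60 accessions, SNP 0 is causal.
rng = random.Random(1)
genotypes = [[rng.randint(0, 1) for _ in range(60)] for _ in range(50)]
phenotype = [g + rng.gauss(0, 0.1) for g in genotypes[0]]

threshold = permutation_threshold(genotypes, phenotype)
significant = [
    i for i, snp in enumerate(genotypes)
    if abs(pearson_r(snp, phenotype)) > threshold
]
print(significant)
```

Because the threshold comes from the maximum over all SNPs in each shuffle, it controls the family-wise error rate across the whole scan without assuming the tests are independent.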
Subjects
Arabidopsis/genetics, Genetic Databases, Genome-Wide Association Study, Single Nucleotide Polymorphism, User-Computer Interface
ABSTRACT
Natural genetic variation makes it possible to discover evolutionary changes that have been maintained in a population because they are advantageous. To understand genotype-phenotype relationships and to investigate trait architecture, both high-resolution genotypic and phenotypic data are necessary. Arabidopsis thaliana is a prime model for these purposes. This herb naturally occurs across much of the Eurasian continent and North America. Thus, it is exposed to a wide range of environmental factors and has been subject to natural selection under distinct conditions. Full genome sequencing data for more than 1000 different natural inbred lines are available, and this has encouraged the distributed generation of many types of phenotypic data. To leverage these data for meta-analyses, AraPheno (https://arapheno.1001genomes.org) provides a central repository of population-scale phenotypes for A. thaliana inbred lines. AraPheno includes various features to easily access, download and visualize the phenotypic data. This will facilitate comparative analysis of the many different types of phenotypic data, which is the basis for further enhancing our understanding of the genotype-phenotype map.
Subjects
Arabidopsis/genetics, Arabidopsis/metabolism, Genetic Databases, Genetic Association Studies/methods, Genotype, Phenotype, Search Engine, Database Management Systems, Software, Web Browser
ABSTRACT
Arabidopsis thaliana is an important model organism for understanding the genetics and molecular biology of plants. Its highly selfing nature, small size, short generation time, small genome size, and wide geographic distribution make it an ideal model organism for understanding natural variation. Genome-wide association studies (GWAS) have proven a useful technique for identifying genetic loci responsible for natural variation in A. thaliana. Previously genotyped accessions (natural inbred lines) can be grown in replicate under different conditions and phenotyped for different traits. These important features greatly simplify association mapping of traits and allow for systematic dissection of the genetics of natural variation by the entire A. thaliana community. To facilitate this, we present GWAPP, an interactive Web-based application for conducting GWAS in A. thaliana. Using an efficient implementation of a linear mixed model, traits measured for a subset of 1386 publicly available ecotypes can be uploaded and mapped with a mixed model and other methods in just a couple of minutes. GWAPP features an extensive, interactive, and user-friendly interface that includes interactive Manhattan plots and linkage disequilibrium plots. It also facilitates exploratory data analysis by implementing features such as the inclusion of candidate polymorphisms in the model as cofactors.
Subjects
Arabidopsis/genetics, Genome-Wide Association Study/methods, Internet, Linkage Disequilibrium/genetics, Software, User-Computer Interface
ABSTRACT
SUMMARY: We present JAWAMix5, an out-of-core open-source toolkit for association mapping using high-throughput sequence data. Taking advantage of its HDF5-based implementation, JAWAMix5 stores genotype data on disk and accesses them as though stored in main memory. Therefore, it offers a scalable and fast analysis without concerns about memory usage, whatever the size of the dataset. We have implemented eight functions for association studies, including standard methods (linear models, linear mixed models, rare variants test, analysis in nested association mapping design and local variance component analysis), as well as a novel Bayesian local variance component analysis. Application to real data demonstrates that JAWAMix5 is reasonably fast compared with traditional solutions that load the complete dataset into memory, and that the memory usage is efficient regardless of the dataset size. AVAILABILITY: The source code, a 'batteries-included' executable and user manual can be freely downloaded from http://code.google.com/p/jawamix5/.
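To make the out-of-core idea concrete, here is a minimal sketch that uses Python's standard `mmap` module on a flat binary file as a stand-in for HDF5. JAWAMix5 itself is HDF5-based, and its API is not shown here. The file layout (one byte per genotype call, SNPs stored row by row) and all function names are assumptions for illustration only.

```python
import mmap
import os
import tempfile


def write_genotypes(path, matrix):
    """Store a SNP-by-sample matrix of small integers, one byte per call."""
    with open(path, "wb") as handle:
        for row in matrix:
            handle.write(bytes(row))


def allele_count(path, snp_index, n_samples):
    """Sum the calls of one SNP, touching only that SNP's bytes on disk."""
    with open(path, "rb") as handle:
        with mmap.mmap(handle.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            start = snp_index * n_samples
            # Slicing the map reads only the pages that back this row.
            return sum(mapped[start:start + n_samples])


# Toy matrix (assumed): 3 SNPs by 4 samples, calls coded 0, 1 or 2.
matrix = [[0, 1, 1, 0], [2, 0, 0, 2], [1, 1, 1, 1]]
fd, path = tempfile.mkstemp(suffix=".bin")
os.close(fd)
try:
    write_genotypes(path, matrix)
    counts = [allele_count(path, i, 4) for i in range(len(matrix))]
finally:
    os.remove(path)
print(counts)  # [2, 4, 4]
```

The calling code indexes the file as if it were an in-memory array, while the operating system pages in only the rows that are actually read, so memory use depends on the slice rather than on the dataset size.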
Subjects
Genome-Wide Association Study/methods, Software, Bayes Theorem, Genotype, High-Throughput Nucleotide Sequencing, Linear Models
ABSTRACT
Colonization of new habitats is expected to require genetic adaptations to overcome environmental challenges. Here, we use full genome re-sequencing and extensive common garden experiments to investigate demographic and selective processes associated with colonization of Japan by Lotus japonicus over the past ~20,000 years. Based on patterns of genomic variation, we infer the details of the colonization process where L. japonicus gradually spread from subtropical conditions to much colder climates in northern Japan. We identify genomic regions with extreme genetic differentiation between northern and southern subpopulations and perform population structure-corrected association mapping of phenotypic traits measured in a common garden. Comparing the results of these analyses, we find that signatures of extreme subpopulation differentiation overlap strongly with phenotype association signals for overwintering and flowering time traits. Our results provide evidence that these traits were direct targets of selection during colonization and point to associated candidate genes.
Subjects
Acclimatization/genetics, Lotus/genetics, Biological Evolution, Plant Genes/genetics, Genetic Variation, Plant Genome/genetics, Genome-Wide Association Study, Genotype, Geography, Japan, Lotus/growth & development, Lotus/physiology, Phenotype, Genetic Selection
ABSTRACT
Genome-wide association studies (GWAS) are an effective method for investigating the genetics of natural phenotypic variation in many different model organisms. Here we present GWA-Portal, an interactive web application that enables researchers to upload their phenotypes and easily carry out GWAS directly in the browser. We present all the steps needed, from uploading the phenotype to interpreting the results, using a published root phenotype.
Subjects
Computational Biology/methods, Genome-Wide Association Study/methods, Software, Nucleic Acid Databases, Population Genetics/methods, Genomics/methods, Plants/genetics, User-Computer Interface, Web Browser
ABSTRACT
Large-scale studies such as the Arabidopsis thaliana '1,001 Genomes' Project require routine genotyping of stocks to avoid sample contamination. To genotype samples efficiently and economically, sequencing must be inexpensive and data processing simple. Here we present SNPmatch, a tool that identifies strains (or inbred lines, or accessions) by matching them to a SNP database. We tested the tool by performing low-coverage resequencing of over 2,000 strains from our lab seed stock collection. SNPmatch correctly genotyped samples from 1-fold coverage sequencing data, and could also identify the parents of F1 or F2 individuals. SNPmatch can be run either on the command line or through AraGeno (https://arageno.gmi.oeaw.ac.at), a web interface that permits sample genotyping from a user-uploaded VCF or BED file.
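The abstract does not describe SNPmatch's scoring, so the following is only a sketch of the matching idea, not the tool's actual method: it ranks database accessions by plain concordance, the fraction of a sample's covered SNP positions whose allele agrees with the reference accession. The dictionary layout, accession names and function names are hypothetical.

```python
def match_fraction(sample, reference):
    """Fraction of the sample's covered positions that agree with a reference."""
    shared = [pos for pos in sample if pos in reference]
    if not shared:
        return 0.0  # no overlapping sites, so nothing to compare
    agreeing = sum(1 for pos in shared if sample[pos] == reference[pos])
    return agreeing / len(shared)


def best_match(sample, database):
    """Return the best-scoring accession name and the score of every accession."""
    scores = {name: match_fraction(sample, ref) for name, ref in database.items()}
    return max(scores, key=scores.get), scores


# Toy SNP database (assumed): (chromosome, position) -> allele per accession.
database = {
    "acc_A": {("Chr1", 100): "A", ("Chr1", 250): "T", ("Chr2", 40): "G", ("Chr3", 9): "C"},
    "acc_B": {("Chr1", 100): "G", ("Chr1", 250): "T", ("Chr2", 40): "C", ("Chr3", 9): "C"},
}
# A low-coverage sample observes only some of the database positions.
sample = {("Chr1", 100): "G", ("Chr2", 40): "C", ("Chr3", 9): "C"}

name, scores = best_match(sample, database)
print(name, scores)
```

Scoring only the positions the sample actually covers is what lets sparse, low-coverage data still discriminate between accessions, provided enough informative sites overlap.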
Subjects
Arabidopsis, Genotyping Techniques, Arabidopsis/classification, Arabidopsis/genetics, Plant Genome, DNA Sequence Analysis
ABSTRACT
Seed dormancy is a complex life history trait that determines the timing of germination and is crucial for local adaptation. Genetic studies of dormancy are challenging, because the trait is highly plastic and strongly influenced by the maternal environment. Using a combination of statistical and experimental approaches, we show that multiple alleles at the previously identified dormancy locus DELAY OF GERMINATION1 jointly explain as much as 57% of the variation observed in Swedish Arabidopsis thaliana, but give rise to spurious associations that seriously mislead genome-wide association studies unless modeled correctly. Field experiments confirm that the major alleles affect germination as well as survival under natural conditions, and demonstrate that locally adaptive traits can sometimes be dissected genetically.
Subjects
Arabidopsis/genetics, Arabidopsis/physiology, Genetic Variation, Plant Dormancy, Alleles, Genome-Wide Association Study, Sweden
ABSTRACT
BACKGROUND: Plant phenotypic data conceal a wealth of information which, when accurately analysed and linked to other data types, brings to light knowledge about the mechanisms of life. As phenotyping is a field of research comprising manifold, diverse and time-consuming experiments, the findings can be fostered by reusing and combining existing datasets. Their correct interpretation, and thus replicability, comparability and interoperability, is possible provided that the collected observations are equipped with an adequate set of metadata. So far there have been no common standards governing phenotypic data description, which has hampered data exchange and reuse. RESULTS: In this paper we propose guidelines for the proper handling of information about plant phenotyping experiments, in terms of both the recommended content of the description and its formatting. We provide a document called "Minimum Information About a Plant Phenotyping Experiment", which specifies what information about each experiment should be given, and a Phenotyping Configuration for the ISA-Tab format, which allows this information to be organised in practice within a dataset. We provide examples of ISA-Tab-formatted phenotypic data, and a general description of a few systems where the recommendations have been implemented. CONCLUSIONS: Acceptance of the rules described in this paper by the plant phenotyping community will help to achieve findable, accessible, interoperable and reusable data.
ABSTRACT
Genome-wide association (GWA) mapping is a powerful technique to address the molecular basis of genotype to phenotype relationships and to map regulators of biological processes. This chapter presents a protocol for genome-wide association mapping in Arabidopsis thaliana using the user-friendly internet application GWAPP, and provides a specific protocol for acquiring root trait data suitable for GWA studies using the semi-automated, high-throughput phenotyping pipeline BRAT for early root growth.
Subjects
Arabidopsis/genetics, Chromosome Mapping, Genome-Wide Association Study, Plant Roots/genetics, Heritable Quantitative Trait, Arabidopsis/growth & development, Chromosome Mapping/methods, Genome-Wide Association Study/methods, Phenotype, Plant Roots/growth & development, Web Browser
ABSTRACT
Despite advances in sequencing, the goal of obtaining a comprehensive view of genetic variation in populations is still far from reached. We sequenced 180 lines of A. thaliana from Sweden to obtain as complete a picture as possible of variation in a single region. Whereas simple polymorphisms in the unique portion of the genome are readily identified, other polymorphisms are not. The massive variation in genome size identified by flow cytometry seems largely to be due to 45S rDNA copy number variation, with lines from northern Sweden having particularly large numbers of copies. Strong selection is evident in the form of long-range linkage disequilibrium (LD), as well as in LD between nearby compensatory mutations. Many footprints of selective sweeps were found in lines from northern Sweden, and a massive global sweep was shown to have involved a 700-kb transposition.
Subjects
Arabidopsis/genetics, Genetic Variation, Plant Genome, Genetic Selection, Chromosome Mapping, Plant Chromosomes, DNA Copy Number Variations, Molecular Evolution, Population Genetics, Genome-Wide Association Study, High-Throughput Nucleotide Sequencing, INDEL Mutation, Linkage Disequilibrium, Single Nucleotide Polymorphism, Sweden
ABSTRACT
Population structure causes genome-wide linkage disequilibrium between unlinked loci, leading to statistical confounding in genome-wide association studies. Mixed models have been shown to handle the confounding effects of a diffuse background of large numbers of loci of small effect well, but they do not always account for loci of larger effect. Here we propose a multi-locus mixed model as a general method for mapping complex traits in structured populations. Simulations suggest that our method outperforms existing methods in terms of power as well as false discovery rate. We apply our method to human and Arabidopsis thaliana data, identifying new associations and evidence for allelic heterogeneity. We also show how a priori knowledge from an A. thaliana linkage mapping study can be integrated into our method using a Bayesian approach. Our implementation is computationally efficient, making the analysis of large data sets (n > 10,000) practicable.
Subjects
Genetic Loci, Human Genome, Plant Genome, Genetic Models, Population Groups/genetics, Arabidopsis/genetics, Bayes Theorem, Chromosome Mapping/methods, Genome-Wide Association Study/methods, Genotype, Humans, Linkage Disequilibrium, Molecular Dynamics Simulation, Single Nucleotide Polymorphism, Quantitative Trait Loci
ABSTRACT
With large-scale genomic data becoming the norm in biological studies, the storing, integrating, viewing and searching of such data have become a major challenge. In this article, we describe the development of an Arabidopsis thaliana database that hosts the geographic information and genetic polymorphism data for over 6000 accessions and genome-wide association study (GWAS) results for 107 phenotypes representing the largest collection of Arabidopsis polymorphism data and GWAS results to date. Taking advantage of a series of the latest web 2.0 technologies, such as Ajax (Asynchronous JavaScript and XML), GWT (Google-Web-Toolkit), MVC (Model-View-Controller) web framework and Object Relationship Mapper, we have created a web-based application (web app) for the database, that offers an integrated and dynamic view of geographic information, genetic polymorphism and GWAS results. Essential search functionalities are incorporated into the web app to aid reverse genetics research. The database and its web app have proven to be a valuable resource to the Arabidopsis community. The whole framework serves as an example of how biological data, especially GWAS, can be presented and accessed through the web. In the end, we illustrate the potential to gain new insights through the web app by two examples, showcasing how it can be used to facilitate forward and reverse genetics research. Database URL: http://arabidopsis.usc.edu/