ABSTRACT
Pan-genomics is an emerging approach for studying the genetic diversity within plant populations. In contrast to common resequencing studies that compare whole genome sequencing data with a single reference genome, the construction of a pan-genome (PG) involves the direct comparison of multiple genomes to one another, thereby enabling the detection of genomic sequences and genes not present in the reference, as well as the analysis of gene content diversity. Although multiple studies describing PGs of various plant species have been published in recent years, a better understanding regarding the effect of the computational procedures used for PG construction could guide researchers in making more informed methodological decisions. Here, we examine the effect of several key methodological factors on the obtained gene pool and on gene presence-absence detections by constructing and comparing multiple PGs of Arabidopsis thaliana and cultivated soybean, as well as conducting a meta-analysis on published PGs. These factors include the construction method, the sequencing depth, and the extent of input data used for gene annotation. We observe substantial differences between PGs constructed using three common procedures (de novo assembly and annotation, map-to-pan, and iterative assembly) and that results are dependent on the extent of the input data. Specifically, we report low agreement between the gene content inferred using different procedures and input data. Our results should increase the awareness of the community to the consequences of methodological decisions made during the process of PG construction and emphasize the need for further investigation of commonly applied methodologies.
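The notion of "agreement between the gene content inferred using different procedures" can be made concrete with a small sketch. The abstract does not say which agreement measure was used, so the Jaccard index below is an assumption chosen for illustration. The gene identifiers and presence-absence calls are hypothetical, and the sketch assumes that genes from the two pan-genomes have already been matched to shared identifiers, which is itself a nontrivial step in practice.

```python
def jaccard_agreement(pav_a, pav_b):
    """Jaccard index between the sets of genes called present by two procedures.

    pav_a, pav_b: dicts mapping a gene ID to True (present) or False (absent)
    for the same accession. Genes missing from a dict are treated as absent.
    """
    present_a = {gene for gene, present in pav_a.items() if present}
    present_b = {gene for gene, present in pav_b.items() if present}
    union = present_a | present_b
    if not union:
        # Both procedures call no genes present: treat as full agreement.
        return 1.0
    return len(present_a & present_b) / len(union)


# Hypothetical presence-absence calls for one accession (not data from the study).
de_novo = {"geneA": True, "geneB": True, "geneC": False, "geneD": True}
map_to_pan = {"geneA": True, "geneB": False, "geneC": False, "geneD": True, "geneE": True}

agreement = jaccard_agreement(de_novo, map_to_pan)
print(f"Jaccard agreement: {agreement:.2f}")
```

A value near 1 would indicate that the two procedures call nearly the same genes present, and a value near 0 that they share few calls.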
Subjects
Arabidopsis, Genomics, Genomics/methods, Plant Genome, DNA Sequence Analysis, Molecular Sequence Annotation, Plants/genetics, Arabidopsis/genetics
ABSTRACT
Insertions and deletions (indels) of short DNA segments are common evolutionary events. Numerous studies have shown that deletions occur more often than insertions in both prokaryotes and eukaryotes. This raises the question of why neutral sequences are not eradicated from the genome. We suggest that this is due to a phenomenon we term border-induced selection. Under this view, a neutral sequence is bordered by conserved regions. Deletions occurring near the borders occasionally protrude into the conserved region and are thereby subject to strong purifying selection. Thus, for short neutral sequences, an insertion bias is expected. Here, we develop a set of increasingly complex models of indel dynamics that incorporate border-induced selection. Furthermore, we show that short conserved sequences within the neutrally evolving sequence help explain: (i) the presence of very long sequences; (ii) the high variance of sequence lengths; and (iii) the possible emergence of multimodality in sequence length distributions. Finally, we fitted our models to the human intron length distribution, as introns are thought to be mostly neutral and bordered by conserved exons. We show that when accounting for the occurrence of short conserved sequences within introns, we reproduce the main features, including the presence of long introns and the multimodality of the intron length distribution.
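The border-induced selection argument can be illustrated with a toy calculation. This is a deliberately simplified sketch and not one of the models developed in the paper. It assumes that a deletion of fixed length starts at a uniformly chosen position inside the neutral sequence and extends in one direction, that any deletion reaching the conserved border is removed by selection, and that insertions are always tolerated. The raw deletion-to-insertion ratio of 1.5 is a hypothetical value.

```python
def deletion_acceptance(neutral_len, del_len):
    """Fraction of deletions of length del_len that stay inside the neutral sequence.

    Assumption: the deletion starts at a uniformly chosen position within the
    neutral sequence and extends rightward; it is accepted only if it does not
    reach the conserved border.
    """
    if neutral_len <= 0:
        return 0.0
    allowed_starts = max(0, neutral_len - del_len + 1)
    return allowed_starts / neutral_len


def effective_del_to_ins_ratio(neutral_len, del_len, raw_ratio):
    """Deletion-to-insertion ratio after removing deletions that hit the border."""
    return raw_ratio * deletion_acceptance(neutral_len, del_len)


# Hypothetical parameters: deletions arise 1.5 times as often as insertions.
RAW_RATIO = 1.5
DEL_LEN = 3

for length in (4, 10, 100, 1000):
    ratio = effective_del_to_ins_ratio(length, DEL_LEN, RAW_RATIO)
    bias = "insertion bias" if ratio < 1 else "deletion bias"
    print(f"neutral length {length:>4}: effective ratio {ratio:.3f} ({bias})")
```

Under these assumptions, very short neutral sequences show an effective insertion bias even though deletions arise more often, while long sequences approach the raw deletion bias.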
Subjects
Molecular Evolution, INDEL Mutation, Humans, Introns, Genome, Genomics
ABSTRACT
B chromosomes are enigmatic elements in thousands of plant and animal genomes that persist in populations despite being nonessential. They circumvent the laws of Mendelian inheritance but the molecular mechanisms underlying this behavior remain unknown. Here we present the sequence, annotation, and analysis of the maize B chromosome providing insight into its drive mechanism. The sequence assembly reveals detailed locations of the elements involved with the cis and trans functions of its drive mechanism, consisting of nondisjunction at the second pollen mitosis and preferential fertilization of the egg by the B-containing sperm. We identified 758 protein-coding genes in 125.9 Mb of B chromosome sequence, of which at least 88 are expressed. Our results demonstrate that transposable elements in the B chromosome are shared with the standard A chromosome set but multiple lines of evidence fail to detect a syntenic genic region in the A chromosomes, suggesting a distant origin. The current gene content is a result of continuous transfer from the A chromosomal complement over an extended evolutionary time with subsequent degradation but with selection for maintenance of this nonvital chromosome.
Subjects
Plant Chromosomes/genetics, Molecular Evolution, Pollen/genetics, Pregnancy Proteins/genetics, Zea mays/genetics, Meiosis/genetics, Mitosis/genetics
ABSTRACT
OBJECTIVE: Substance abuse is common among patients with schizophrenia and is related to a worse course and outcome of illness. Unfortunately, little is known about how substance abuse affects the cognitive function of schizophrenia patients, whose cognitive function is often already compromised. Neurocognitive functioning includes inhibition control and decision-making, and both schizophrenia and substance use disorder are related to impairments of inhibition control. However, the influence of substance abuse on inhibition capacities among schizophrenia patients is unclear. METHODS: This study measured the influence of substance use disorder on inhibition capacities and risky decision-making in a group of 39 schizophrenia patients who were evaluated using a socio-demographic questionnaire and a clinical assessment based on the Positive and Negative Syndrome Scale for Schizophrenia. To assess inhibition control we used the Matching Familiar Figures Test (MFFT) and the Stroop task, and to evaluate decision-making we used the Iowa Gambling Task (IGT) and a self-report questionnaire, the Barratt Impulsiveness Scale (BIS). RESULTS: Univariate analysis found significant differences between the groups with regard to criminal history (χ2 = 5.97, p=.015), smoking status (χ2 = 12.30, p<.001), and total BIS score (t= -2.69, df = 37, p=.01). Our model did not find a significant effect of substance abuse on the first response time or number of errors on the MFFT, on the total interference index of Stroop performance, or on the net score for risky decision-making in the IGT. The two groups did not differ significantly either in first response time or in number of errors on the MFFT (F = 0.54, p=.47, d = 0.24, 95% CI [-0.4, 0.88]; F = 0.28, p=.60, d = 0.61, 95% CI [0, 1.26], respectively), nor did they differ in the total interference index of the Stroop task (F(1)=0.49, p=.49, d = 0.25, 95% CI [-0.38, 0.88]).
CONCLUSION: The analyses did not detect any statistically significant effect of substance abuse on inhibition control or risky decision-making processes in outpatients diagnosed with schizophrenia, despite increased impulsivity, criminal history and smoking status. These results neither support nor disprove previous findings.
Subjects
Gambling, Schizophrenia, Substance-Related Disorders, Decision Making, Humans, Neuropsychological Tests, Outpatients, Schizophrenia/complications, Substance-Related Disorders/complications
ABSTRACT
The study of intraspecific genomic variation in eukaryotic species has been the focus of numerous genome resequencing projects in recent years. One emerging approach for the analysis of intraspecific diversity uses the concept of a pan-genome, which theoretically represents the full set of genomic sequences and coding genes from all individuals of a given species. This approach has many advantages over reference-based methods and has been successfully applied to study both prokaryotic and eukaryotic species. However, the process of pan-genome construction still presents considerable scientific and technical challenges, especially for eukaryotic species with large and complex genomes. Although general approaches for the construction of pan-genomes have been devised, currently available software tools implement only certain modules of the entire computational procedure. Therefore, each pan-genome project requires the development of tailored analysis pipelines, which complicates and prolongs the process and impairs research reproducibility and comparison across studies. Here, we present Panoramic, a software package for the automatic construction of eukaryotic pan-genomes. Panoramic takes raw sequencing reads as input and applies two alternative approaches for pan-genome construction. Panoramic makes pan-genome construction a considerably easier task by providing a simple user interface and efficient data processing algorithms. We demonstrate the use of Panoramic by constructing the pan-genome of the model plant species Arabidopsis thaliana from sequencing data of 20 diverse ecotypes.
Subjects
Eukaryota, Genome, Genomics, Software, Eukaryota/genetics, Reproducibility of Results
ABSTRACT
Deciphering the global distribution of polyploid plants is fundamental for understanding plant evolution and ecology. Many factors have been hypothesized to affect the uneven distribution of polyploid plants across the globe. Nevertheless, the lack of large comparative datasets has restricted such studies to local floras and to narrow taxonomical scopes, limiting our understanding of the underlying drivers of polyploid plant distribution. We present a map portraying the worldwide polyploid frequencies, based on extensive spatial data coupled with phylogeny-based polyploidy inference for tens of thousands of species. This allowed us to assess the potential global drivers affecting polyploid distribution. Our data reveal a clear latitudinal trend, with polyploid frequency increasing away from the equator. Climate, especially temperature, appears to be the most influential predictor of polyploid distribution. However, we find this effect to be mostly indirect, mediated predominantly by variation in plant lifeforms and, to a lesser extent, by taxonomical composition and species richness. Thus, our study presents an emerging view of polyploid distribution that highlights attributes that facilitate the establishment of new polyploid lineages by providing polyploids with sufficient time (that is, perenniality) and space (low species richness) to compete with pre-adapted diploid relatives.
Subjects
Biological Evolution, Phylogeography, Plants/genetics, Polyploidy, Forests
ABSTRACT
Phylogeny reconstruction is a key instrument in numerous biological analyses, ranging from evolutionary and ecological research to conservation and systems biology. The increasing accumulation of genomic data makes it possible to reconstruct phylogenies both with high accuracy and at increasingly fine resolution. Yet, taking advantage of the enormous amount of sequence data available requires the use of computational tools for efficient data retrieval and processing, or else the process could quickly become an error-prone endeavour. Here, we present OneTwoTree (http://onetwotree.tau.ac.il/), a Web-based tool for tree reconstruction based on the supermatrix paradigm. Given a list of taxon names of interest as the sole input requirement, OneTwoTree retrieves all available sequence data from NCBI GenBank, clusters these into orthology groups, identifies the most informative set of markers, searches for an appropriate outgroup, and assembles a partitioned sequence matrix that is then used for the final phylogeny reconstruction step. OneTwoTree further allows users to control various steps of the process, such as the merging of sequences from similar clusters, or phylogeny reconstruction based on markers from a specific genome type. By comparing the performance of OneTwoTree to a manually reconstructed phylogeny of the Antirrhineae tribe, we show that the use of OneTwoTree resulted in substantially higher data coverage in terms of both taxon sampling and the number of informative markers assembled. OneTwoTree provides a flexible online tool for species-tree reconstruction, intended to assist researchers with varying levels of prior expertise in the task of phylogeny reconstruction.
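The supermatrix step mentioned above, assembling a partitioned sequence matrix from per-marker alignments, can be sketched generically as follows. This is not OneTwoTree's code. The function name, the use of "?" for taxa lacking a marker, and the example marker and taxon names are all assumptions made for illustration.

```python
def build_supermatrix(alignments):
    """Concatenate per-marker alignments into one partitioned supermatrix.

    alignments: dict mapping marker name -> {taxon: aligned sequence}.
    Taxa missing from a marker are padded with '?' (an assumed convention).
    Returns (matrix, partitions): matrix maps taxon -> concatenated sequence,
    partitions lists (marker, start, end) with 1-based inclusive columns.
    """
    taxa = sorted({taxon for aln in alignments.values() for taxon in aln})
    pieces = {taxon: [] for taxon in taxa}
    partitions = []
    offset = 0
    for marker, aln in alignments.items():
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"marker {marker!r} is empty or not aligned to one length")
        length = lengths.pop()
        for taxon in taxa:
            pieces[taxon].append(aln.get(taxon, "?" * length))
        partitions.append((marker, offset + 1, offset + length))
        offset += length
    matrix = {taxon: "".join(parts) for taxon, parts in pieces.items()}
    return matrix, partitions


# Hypothetical toy alignments for two markers and three taxa.
toy = {
    "ITS": {"sp1": "ACGT", "sp2": "AC-T"},
    "rbcL": {"sp1": "GGA", "sp3": "GGC"},
}
matrix, partitions = build_supermatrix(toy)
for taxon, seq in matrix.items():
    print(taxon, seq)
print(partitions)
```

The partition list records which columns belong to which marker, so that a downstream tree-inference program can fit a separate substitution model per marker.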
Subjects
Computational Biology/methods, Phylogeny, Internet, Plantaginaceae/classification, Plantaginaceae/genetics
ABSTRACT
PREMISE OF THE STUDY: Flowering plants display a variety of sexual systems, ranging from complete cosexuality (hermaphroditism) to separate-sexed individuals (dioecy). While dioecy is relatively rare, it has evolved many times and is present in many plant families. Transitions in sexual systems are hypothesized to be affected by large genomic events such as whole-genome duplication, or polyploidy, and several models have been proposed to explain the observed patterns of association. METHODS: In this study, we assessed the association between ploidy and sexual system (separate or combined sexes). To this end, we assembled a database of ploidy levels and sexual systems for ~1000 species, spanning 18 genera and 15 families. We applied several phylogenetic comparative approaches, including Pagel's coevolutionary framework and sister clade analyses, for detecting correlations between ploidy level and sexual system. KEY RESULTS: Our results indicate a broad association between polyploidy and sexual system dimorphism, with low evolutionary stability of the diploid-dioecious condition observed in several clades. A detailed examination of the clades exhibiting this correlation reveals that it is underlain by various patterns of transition rate asymmetry. CONCLUSIONS: We conclude that the long-hypothesized connection between ploidy and sexual system holds in some clades, although it may well be affected by factors that differ from clade to clade. Our results further demonstrate that to better understand the evolutionary processes involved, more sophisticated methods and extensive and detailed data sets are required for both broad and focused inquiry.
Subjects
Plant Genome/genetics, Magnoliopsida/genetics, Genetic Models, Polyploidy, Biological Evolution, Diploidy, Phylogeny, Reproduction/genetics
ABSTRACT
Dioecy, the sexual system in which male and female organs are found in separate individuals, allows greater specialization for sex-specific functions and can be advantageous under various ecological and environmental conditions. However, dioecy is rare among flowering plants. Previous studies identified contradictory trends regarding the relative diversification rates of dioecious lineages vs their nondioecious counterparts, depending on the methods and data used. We gathered detailed species-level data for dozens of genera that contain both dioecious and nondioecious species. We then applied a probabilistic approach that accounts for differential speciation, extinction, and transition rates between states to examine whether there is an association between dioecy and lineage diversification. We found a bimodal distribution, whereby dioecious lineages exhibited higher diversification in certain genera but lower diversification in others. Additional analyses did not uncover an ecological or life history trait that could explain a context-dependent effect of dioecy on diversification. Furthermore, in-depth simulations demonstrated that such bimodality also arises when neutral characters are simulated across the observed trees. Our analyses suggest that, at least for these genera with the currently available data, dioecy neither consistently places a strong brake on diversification nor acts as a strong driver of it.
Subjects
Biodiversity, Magnoliopsida/physiology, Phylogeny, Computer Simulation, Databases as Topic, Probability, Heritable Quantitative Trait, Reproduction
ABSTRACT
We announce the release of chromEvol version 2.0, a software tool for inferring the pattern of chromosome number change along a phylogeny. The software facilitates the inference of the expected number of polyploidy and dysploidy transitions along each branch of a phylogeny and estimates ancestral chromosome numbers at internal nodes. The new version features a novel extension of the model accounting for general multiplication events, other than doubling of the number of chromosomes. This allows the monoploid number (commonly referred to as x, or the base-number) of a group of interest to be inferred in a statistical framework. In addition, we devise an inference scheme, which allows explicit categorization of each terminal taxon as either diploid or polyploid. The new version also supports intraspecific variation in chromosome number and allows hypothesis testing regarding the root chromosome number. The software, alongside a detailed usage manual, is available at http://www.tau.ac.il/~itaymay/cp/chromEvol/.