ABSTRACT
Genetic differences between Arabidopsis thaliana accessions underlie the plant's extensive phenotypic variation, and until now these have been interpreted largely in the context of the annotated reference accession Col-0. Here we report the sequencing, assembly and annotation of the genomes of 18 natural A. thaliana accessions, and their transcriptomes. When assessed on the basis of the reference annotation, one-third of protein-coding genes are predicted to be disrupted in at least one accession. However, re-annotation of each genome revealed that alternative gene models often restore coding potential. Gene expression in seedlings differed for nearly half of expressed genes and was frequently associated with cis variants within 5 kilobases, as were intron retention alternative splicing events. Sequence and expression variation is most pronounced in genes that respond to the biotic environment. Our data further promote evolutionary and functional studies in A. thaliana, especially the MAGIC genetic reference population descended from these accessions.
Subjects
Arabidopsis/genetics; Gene Expression Profiling; Gene Expression Regulation, Plant/genetics; Genome, Plant/genetics; Transcription, Genetic/genetics; Arabidopsis/classification; Arabidopsis Proteins/genetics; Base Sequence; Genes, Plant/genetics; Genomics; Haplotypes/genetics; INDEL Mutation/genetics; Molecular Sequence Annotation; Phylogeny; Polymorphism, Single Nucleotide/genetics; Proteome/genetics; Seedlings/genetics; Sequence Analysis, DNA
ABSTRACT
The classic phytohormones cytokinin and auxin play essential roles in the maintenance of stem-cell systems embedded in shoot and root meristems, and exhibit complex functional interactions. Here we show that the activity of both hormones directly converges on the promoters of two A-type ARABIDOPSIS RESPONSE REGULATOR (ARR) genes, ARR7 and ARR15, which are negative regulators of cytokinin signalling and have important meristematic functions. Whereas ARR7 and ARR15 expression in the shoot apical meristem (SAM) is induced by cytokinin, auxin has a negative effect, which is, at least in part, mediated by the AUXIN RESPONSE FACTOR5/MONOPTEROS (MP) transcription factor. Our results provide a mechanistic framework for hormonal control of the apical stem-cell niche and demonstrate how root and shoot stem-cell systems differ in their response to phytohormones.
Subjects
Arabidopsis/cytology; Arabidopsis/metabolism; Plant Growth Regulators/metabolism; Plant Shoots/cytology; Stem Cell Niche/cytology; Stem Cell Niche/metabolism; Stem Cells/cytology; Arabidopsis/drug effects; Arabidopsis/genetics; Arabidopsis Proteins/genetics; Arabidopsis Proteins/metabolism; Cytokinins/metabolism; Cytokinins/pharmacology; DNA-Binding Proteins/deficiency; DNA-Binding Proteins/genetics; DNA-Binding Proteins/metabolism; Gene Expression Regulation, Plant; Indoleacetic Acids/metabolism; Indoleacetic Acids/pharmacology; Meristem/cytology; Meristem/drug effects; Meristem/genetics; Meristem/metabolism; Plant Growth Regulators/pharmacology; Plant Roots; Plant Shoots/drug effects; Plant Shoots/metabolism; Signal Transduction/drug effects; Stem Cell Niche/drug effects; Stem Cells/drug effects; Transcription Factors/deficiency; Transcription Factors/genetics; Transcription Factors/metabolism
ABSTRACT
We present Oqtans, an open-source workbench for quantitative transcriptome analysis that is integrated in Galaxy. Its distinguishing features include customizable computational workflows and a modular pipeline architecture that facilitates comparative assessment of tool and data quality. Oqtans integrates an assortment of machine learning-powered tools into Galaxy, which show superior or equal performance to state-of-the-art tools. Implemented tools comprise a complete transcriptome analysis workflow: short-read alignment, transcript identification/quantification and differential expression analysis. Oqtans and Galaxy facilitate persistent storage, data exchange and documentation of intermediate results and analysis workflows. We illustrate how Oqtans aids the interpretation of data from different experiments in easy-to-understand use cases. Users can easily create their own workflows and extend Oqtans by integrating specific tools. Oqtans is available as (i) a cloud machine image with a demo instance at cloud.oqtans.org, (ii) a public Galaxy instance at galaxy.cbio.mskcc.org, (iii) a git repository containing all installed software (oqtans.org/git), most of which is also available from (iv) the Galaxy Toolshed and (v) a share string to use along with Galaxy CloudMan.
Subjects
RNA/genetics; Sequence Analysis, RNA/methods; Transcriptome; Base Sequence; Internet; Software
ABSTRACT
What is more inspiring than a discussion with the leading scientists in your field? As a student or a young researcher, you have likely been influenced by mentors guiding you in your career and leading you to your current position. Any discussion with or advice from an expert is certainly very helpful for young people. But how often do we have the opportunity to meet experts? Do we make the most out of these situations? Meetings organized for young scientists are a great opportunity not only for the attendees: they are an opportunity for experts to meet bright students and learn from them in return. In this article, we introduce several successful events organized by Regional Student Groups all around the world, bridging the gap between experts and young scientists. We highlight how rewarding it is for all participants: young researchers, experts, and organizers. We then discuss the various benefits and emphasize the importance of organizing and attending such meetings. As a young researcher, seeking mentorship and additional skills training is a crucial step in career development. Keep in mind that one day, you may be an inspiring mentor, too.
Subjects
Education, Continuing/methods; Mentors; Systems Biology/education
ABSTRACT
Exchanging ideas with like-minded, enthusiastic people interested in the same topic is crucial for the advancement of a scientist's career. Several Regional Student Groups (RSGs) of the International Society for Computational Biology (ISCB) Student Council have cooperated in the last six years to organize scientific workshops and conferences. With motivated students, it is possible to create a memorable event for fellow scientists; in doing so, the organizers gain valuable experiences. While collaborating across borders and time zones can be difficult, feedback from event organizers was always positive. When limited resources are juxtaposed with great ideas and a network of contacts, the outcome is always an amazing experience, despite organizers being separated geographically across different countries.
Subjects
Computational Biology/organization & administration; Communication; Computational Biology/methods; Humans; International Cooperation; Science; Societies, Scientific; Students
ABSTRACT
BACKGROUND: Extreme weather events induced by climate change, particularly droughts, have detrimental consequences for crop yields and food security. Concurrently, these conditions provoke substantial changes in the soil bacterial microbiota and affect plant health. Early recognition of soil affected by drought enables farmers to implement appropriate agricultural management practices. In this context, interpretable machine learning holds immense potential for drought stress classification of soil based on marker taxa. RESULTS: This study demonstrates that the 16S rRNA-based metagenomic approach of Differential Abundance Analysis methods and machine learning-based Shapley Additive Explanation values provide similar information. They exhibit their potential as complementary approaches for identifying marker taxa and investigating their enrichment or depletion under drought stress in grass lineages. Additionally, the Random Forest Classifier trained on a diverse range of relative abundance data from the soil bacterial microbiome of various plant species achieves a high accuracy of 92.3% at the genus rank for drought stress prediction. It demonstrates its generalization capacity for the lineages tested. CONCLUSIONS: In the detection of drought stress in soil bacterial microbiota, this study emphasizes the potential of an optimized and generalized location-based ML classifier. By identifying marker taxa, this approach holds promising implications for microbe-assisted plant breeding programs and contributes to the development of sustainable agriculture practices. These findings are crucial for preserving global food security in the face of climate change.
ABSTRACT
Whole-genome bisulfite sequencing (WGBS) is the standard method for profiling DNA methylation at single-nucleotide resolution. Different tools have been developed to extract differentially methylated regions (DMRs), often built upon assumptions from mammalian data. Here, we present MethylScore, a pipeline to analyse WGBS data and to account for the substantially more complex and variable nature of plant DNA methylation. MethylScore uses an unsupervised machine learning approach to segment the genome by classification into states of high and low methylation. It processes data from genomic alignments to DMR output and is designed to be usable by novice and expert users alike. We show how MethylScore can identify DMRs from hundreds of samples and how its data-driven approach can stratify associated samples without prior information. We identify DMRs in the A. thaliana 1,001 Genomes dataset to unveil known and unknown genotype-epigenotype associations.
ABSTRACT
Genomic selection is an integral tool for breeders to accurately select plants directly from genotype data, leading to faster and more resource-efficient breeding programs. Several prediction methods have been established in the last few years. These range from classical linear mixed models to complex non-linear machine learning approaches, such as Support Vector Regression, and modern deep learning-based architectures. Many of these methods have been extensively evaluated on different crop species with varying outcomes. In this work, our aim is to systematically compare 12 different phenotype prediction models, ranging from basic genomic selection methods to more advanced deep learning-based techniques. More importantly, we assess the performance of these models on simulated phenotype data as well as on real-world data from Arabidopsis thaliana and two breeding datasets from soy and corn. The synthetic phenotypic data allow us to analyze all prediction models and especially the selected markers under controlled and predefined settings. We show that Bayes B and linear regression models with sparsity constraints perform best under different simulation settings with respect to explained variance. Further, we can confirm results from other studies that there is no superiority of more complex neural network-based architectures for phenotype prediction compared to well-established methods. However, on real-world data, for which several prediction models yield comparable results with slight advantages for Elastic Net, this picture is less clear, suggesting that there is a lot of room for future research.
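The abstract ranks models by explained variance but does not give the formula. The following is a minimal sketch, assuming explained variance is measured as the coefficient of determination R² between observed and predicted phenotypes. The function name and the toy values are hypothetical and are not taken from the study.

```python
def explained_variance(observed, predicted):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot.

    Assumption: this is one common way to quantify explained variance;
    the study may use a different definition.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and equally long")
    mean_observed = sum(observed) / len(observed)
    # Residual sum of squares: what the model fails to explain.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: variation of the phenotype around its mean.
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed phenotypes have zero variance")
    return 1 - ss_res / ss_tot


# Toy phenotypes (hypothetical values).
perfect = explained_variance([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
mean_only = explained_variance([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])
```

Under this definition a model that always predicts the mean phenotype scores 0 and a perfect model scores 1, which makes scores comparable across models with very different architectures.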
ABSTRACT
MOTIVATION: Understanding transcriptional regulation is one of the main challenges in computational biology. An important problem is the identification of transcription factor (TF) binding sites in promoter regions of potential TF target genes. It is typically approached by position weight matrix-based motif identification algorithms using Gibbs sampling, or heuristics to extend seed oligos. Such algorithms succeed in identifying single, relatively well-conserved binding sites, but tend to fail when it comes to the identification of combinations of several degenerate binding sites, such as those often found in cis-regulatory modules. RESULTS: We propose a new algorithm that combines the benefits of existing motif finding with those of support vector machines (SVMs) to find degenerate motifs in order to improve the modeling of regulatory modules. In experiments on microarray data from Arabidopsis thaliana, we were able to show that the newly developed strategy significantly improves the recognition of TF targets. AVAILABILITY: The Python source code (open source-licensed under GPL), the data for the experiments and a Galaxy-based web service are available at http://www.fml.mpg.de/raetsch/suppl/kirmes/.
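The position weight matrix (PWM) baseline mentioned above can be illustrated with a short sketch. This is not the authors' SVM-based algorithm. It only shows, under simple assumptions (uniform background, a pseudocount of 1, sequences containing only A, C, G and T), how a log-odds PWM is built from aligned sites and used to scan a promoter. All names and toy sequences are hypothetical.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PWM from equally long, aligned binding sites."""
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + 4 * pseudocount
    pwm = []
    for position in range(width):
        column = [site[position] for site in sites]
        # Log2 ratio of the smoothed base frequency to the background frequency.
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in BASES
        })
    return pwm


def best_hit(pwm, sequence):
    """Return (start, score) of the highest-scoring window, or None if too short."""
    width = len(pwm)
    best = None
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(column[base] for column, base in zip(pwm, window))
        if best is None or score > best[1]:
            best = (start, score)
    return best


# Toy example: four aligned sites and one short promoter fragment.
toy_pwm = build_pwm(["TGACGT", "TGACGT", "TGACGA", "TGACGT"])
hit_start, hit_score = best_hit(toy_pwm, "AAATGACGTCCC")
```

A scan like this scores each window independently, which is why it works for single, well-conserved sites but gives little leverage on combinations of weak, degenerate sites.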
Subjects
Computational Biology/methods; Euchromatin/chemistry; Promoter Regions, Genetic; Software; Algorithms; Arabidopsis/genetics; Base Sequence; Binding Sites; Transcription Factors/chemistry
ABSTRACT
BACKGROUND: Assessment of seed germination is an essential task for seed researchers to measure the quality and performance of seeds. Usually, seed assessments are done manually, which is a cumbersome, time-consuming and error-prone process. Classical image analysis methods are not well suited for large-scale germination experiments, because they often rely on manual adjustments of color-based thresholds. Here, we propose a machine learning approach using modern artificial neural networks with region proposals for accurate seed germination detection and high-throughput seed germination experiments. RESULTS: We generated labeled imaging data of the germination process of more than 2400 seeds for three different crops, Zea mays (maize), Secale cereale (rye) and Pennisetum glaucum (pearl millet), with a total of more than 23,000 images. Different state-of-the-art convolutional neural network (CNN) architectures with region proposals have been trained using transfer learning to automatically identify seeds within petri dishes and to predict whether the seeds germinated or not. Our proposed models achieved a high mean average precision (mAP) on a hold-out test data set of approximately 97.9%, 94.2% and 94.3% for Zea mays, Secale cereale and Pennisetum glaucum, respectively. Further, various single-value germination indices, such as Mean Germination Time and Germination Uncertainty, can be computed more accurately with the predictions of our proposed model compared to manual counting. CONCLUSION: Our proposed machine learning-based method can help to speed up the assessment of seed germination experiments for different seed cultivars. It has lower error rates and a higher performance compared to conventional and manual methods, leading to more accurate germination indices and quality assessments of seeds.
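The two germination indices named above can be computed from per-interval counts of newly germinated seeds. The abstract does not state the formulas, so the sketch below assumes the commonly used definitions: Mean Germination Time as the count-weighted mean of observation times, and Germination Uncertainty as the Shannon entropy (in bits) of the relative germination frequencies. Function names and the toy counts are hypothetical.

```python
import math


def mean_germination_time(times, newly_germinated):
    """MGT = sum(n_i * t_i) / sum(n_i), with n_i seeds newly germinated at time t_i."""
    if len(times) != len(newly_germinated):
        raise ValueError("times and counts must have the same length")
    total = sum(newly_germinated)
    if total == 0:
        raise ValueError("no seed germinated")
    return sum(t * n for t, n in zip(times, newly_germinated)) / total


def germination_uncertainty(newly_germinated):
    """U = -sum(f_i * log2(f_i)), with f_i = n_i / sum(n_i); empty intervals are skipped."""
    total = sum(newly_germinated)
    if total == 0:
        raise ValueError("no seed germinated")
    frequencies = [n / total for n in newly_germinated if n > 0]
    return -sum(f * math.log2(f) for f in frequencies)


# Toy experiment: observations at 24, 48 and 72 hours.
hours = [24, 48, 72]
counts = [5, 10, 5]
mgt = mean_germination_time(hours, counts)   # in hours
uncertainty = germination_uncertainty(counts)  # in bits
```

Both indices depend only on when each seed is first scored as germinated, so any detector that assigns a germination time per seed, whether manual or model-based, can feed these functions.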
ABSTRACT
BACKGROUND: Major advances in selection progress for cattle have been made following the introduction of genomic tools over the past 10-12 years. These tools depend upon the Bos taurus reference genome (UMD3.1.1), which was created using now-outdated technologies and is hindered by a variety of deficiencies and inaccuracies. RESULTS: We present the new reference genome for cattle, ARS-UCD1.2, based on the same animal as the original to facilitate transfer and interpretation of results obtained from the earlier version, but applying a combination of modern technologies in a de novo assembly to increase continuity, accuracy, and completeness. The assembly includes 2.7 Gb and is >250× more continuous than the original assembly, with contig N50 >25 Mb and L50 of 32. We also greatly expanded supporting RNA-based data for annotation that identifies 30,396 total genes (21,039 protein coding). The new reference assembly is accessible in annotated form for public use. CONCLUSIONS: We demonstrate that improved continuity of assembled sequence warrants the adoption of ARS-UCD1.2 as the new cattle reference genome and that increased assembly accuracy will benefit future research on this species.
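The contig N50 and L50 figures quoted above are standard assembly continuity statistics. As a minimal sketch, assuming the usual definitions: contigs are sorted from longest to shortest and summed until at least half of the total assembly length is covered. N50 is the length of the contig at which that happens, and L50 is the number of contigs needed. The function name and the toy contig lengths are hypothetical.

```python
def n50_l50(contig_lengths):
    """Return (N50, L50) for a list of contig lengths."""
    if not contig_lengths:
        raise ValueError("at least one contig is required")
    ordered = sorted(contig_lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        # The first contig that pushes the running sum to half the assembly defines N50.
        if running >= half_total:
            return length, count


# Toy assembly: total length 290, so half is 145.
n50, l50 = n50_l50([30, 80, 20, 70, 40, 50])
```

A higher N50 together with a lower L50 indicates that fewer, longer contigs cover half of the assembly, which is the sense in which one assembly is more continuous than another.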
Subjects
Breeding/standards; Cattle/genetics; Genome; Genomics/standards; Polymorphism, Genetic; Animals; Breeding/methods; Genomics/methods; RNA-Seq/methods; RNA-Seq/standards; Reference Standards; Sequence Analysis, DNA/methods; Sequence Analysis, DNA/standards
ABSTRACT
We have conducted a study on the long-term availability of bioinformatics Web services: an observation of 927 Web services published in the annual Nucleic Acids Research Web Server Issues between 2003 and 2009. We found that 72% of Web sites are still available at the published addresses; only 9% of services are completely unavailable. Older addresses often redirect to new pages. We checked the functionality of all available services: for 33%, we could not test functionality because there was no example data or a related problem; 13% were truly no longer working as expected; we could positively confirm functionality only for 45% of all services. Additionally, we conducted a survey among 872 Web Server Issue corresponding authors; 274 replied. Of all respondents, 78% indicate that their services have been developed solely by students and researchers without a permanent position. Consequently, these services are in danger of falling into disrepair after the original developers move to another institution, and indeed, for 24% of services, there is no plan for maintenance, according to the respondents. We introduce a Web service quality scoring system that correlates with the number of citations: services with a high score are cited 1.8 times more often than low-scoring services. We have identified key characteristics that are predictive of a service's survival, providing reviewers, editors, and Web service developers with the means to assess or improve Web services. A Web service conforming to these criteria receives more citations and provides more reliable service for its users. The most effective way of ensuring continued access to a service is a persistent Web address, offered either by the publishing journal, or created on the authors' own initiative, for example at http://bioweb.me. The community would benefit the most from a policy requiring any source code needed to reproduce results to be deposited in a public repository.
Subjects
Computational Biology/standards; Information Storage and Retrieval/standards; Internet/standards; Periodicals as Topic; Computational Biology/methods; Humans; Reproducibility of Results
ABSTRACT
Identifying cis-regulatory modules (CRMs) is an important milestone towards the ultimate goal of understanding transcriptional regulation in eukaryotic cells. It has been approached, among others, by motif-finding algorithms that identify overrepresented motifs in regulatory sequences. These methods succeed in finding single, well-conserved motifs, but fail to identify combinations of degenerate binding sites, like the ones often found in CRMs. We have developed a method that combines the abilities of existing motif finding with the discriminative power of a machine learning technique to model the regulation of genes (Schultheiss et al. (2009) Bioinformatics 25, 2126-2133). Our software is called KIRMES, which stands for kernel-based identification of regulatory modules in eukaryotic sequences. Starting from a set of genes thought to be co-regulated, KIRMES can identify the key CRMs responsible for this behavior and can be used to determine for any other gene not included on that list if it is also regulated by the same mechanism. Such gene sets can be derived from microarrays, chromatin immunoprecipitation experiments combined with next-generation sequencing or promoter/whole genome microarrays. The use of an established machine learning method makes the approach fast to use and robust with respect to noise. Easily understood visualizations make the returned results interpretable and provide a starting point for further analysis. Even for complex regulatory relationships, KIRMES can be a helpful tool in directing the design of biological experiments.
Subjects
Computational Biology/methods; Regulatory Sequences, Nucleic Acid/genetics; Arabidopsis/genetics; Arabidopsis/metabolism; Arabidopsis Proteins/metabolism; Artificial Intelligence; Binding Sites; Genome, Plant/genetics; Homeodomain Proteins/metabolism; Software
ABSTRACT
Despite the independent evolution of multicellularity in plants and animals, the basic organization of their stem cell niches is remarkably similar. Here, we report the genome-wide regulatory potential of WUSCHEL, the key transcription factor for stem cell maintenance in the shoot apical meristem of the reference plant Arabidopsis thaliana. WUSCHEL acts by directly binding to at least two distinct DNA motifs in more than 100 target promoters and preferentially affects the expression of genes with roles in hormone signaling, metabolism, and development. Striking examples are the direct transcriptional repression of CLAVATA1, which is part of a negative feedback regulation of WUSCHEL, and the immediate regulation of transcriptional repressors of the TOPLESS family, which are involved in auxin signaling. Our results shed light on the complex transcriptional programs required for the maintenance of a dynamic and essential stem cell niche.