ABSTRACT
Aging is associated with progressive phenotypic changes. Virtually all cellular phenotypes are produced by proteins, and their structural alterations can lead to age-related diseases. However, we still lack comprehensive knowledge of proteins undergoing structural-functional changes during cellular aging and their contributions to age-related phenotypes. Here, we conducted proteome-wide analysis of early age-related protein structural changes in budding yeast using limited proteolysis-mass spectrometry (LiP-MS). The results, compiled in the online ProtAge catalog, revealed age-related functional changes in regulators of translation, protein folding, and amino acid metabolism. Mechanistically, we found that folded glutamate synthase Glt1 polymerizes into supramolecular self-assemblies during aging, causing breakdown of cellular amino acid homeostasis. Inhibiting Glt1 polymerization by mutating the polymerization interface restored amino acid levels in aged cells, attenuated mitochondrial dysfunction, and led to lifespan extension. Altogether, this comprehensive map of protein structural changes enables identifying mechanisms of age-related phenotypes and offers opportunities for their reversal.
Subjects
Cellular Senescence , Longevity , Longevity/genetics , Polymerization , Amino Acids
ABSTRACT
Increasing the proportion of locally produced plant protein in currently meat-rich diets could substantially reduce greenhouse gas emissions and loss of biodiversity1. However, plant protein production is hampered by the lack of a cool-season legume equivalent to soybean in agronomic value2. Faba bean (Vicia faba L.) has a high yield potential and is well suited for cultivation in temperate regions, but genomic resources are scarce. Here, we report a high-quality chromosome-scale assembly of the faba bean genome and show that it has expanded to a massive 13 Gb in size through an imbalance between the rates of amplification and elimination of retrotransposons and satellite repeats. Genes and recombination events are evenly dispersed across chromosomes and the gene space is remarkably compact considering the genome size, although with substantial copy number variation driven by tandem duplication. Demonstrating practical application of the genome sequence, we develop a targeted genotyping assay and use high-resolution genome-wide association analysis to dissect the genetic basis of seed size and hilum colour. The resources presented constitute a genomics-based breeding platform for faba bean, enabling breeders and geneticists to accelerate the improvement of sustainable protein production across the Mediterranean, subtropical and northern temperate agroecological zones.
Subjects
Crops, Agricultural , Diploidy , Genetic Variation , Genome, Plant , Genomics , Plant Breeding , Plant Proteins , Vicia faba , Chromosomes, Plant/genetics , Crops, Agricultural/genetics , Crops, Agricultural/metabolism , DNA Copy Number Variations/genetics , DNA, Satellite/genetics , Gene Amplification/genetics , Genes, Plant/genetics , Genetic Variation/genetics , Genome, Plant/genetics , Genome-Wide Association Study , Geography , Plant Breeding/methods , Plant Proteins/genetics , Plant Proteins/metabolism , Recombination, Genetic , Retroelements/genetics , Seeds/anatomy & histology , Seeds/genetics , Vicia faba/anatomy & histology , Vicia faba/genetics , Vicia faba/metabolism
ABSTRACT
[This corrects the article DOI: 10.1371/journal.pcbi.1007419.].
ABSTRACT
Automated protein annotation using the Gene Ontology (GO) plays an important role in the biosciences. Evaluation has always been considered central to developing novel annotation methods, but little attention has been paid to the evaluation metrics themselves. Evaluation metrics define how well an annotation method performs and allow methods to be ranked against one another. Unfortunately, most of these metrics were adopted from the machine learning literature without establishing whether they were appropriate for GO annotations. We propose a novel approach for comparing GO evaluation metrics called Artificial Dilution Series (ADS). Our approach uses existing annotation data to generate a series of annotation sets with different levels of correctness (referred to as their signal level). We calculate the evaluation metric being tested for each annotation set in the series, allowing us to identify whether it can separate different signal levels. Finally, we contrast these results with several false positive annotation sets, which are designed to expose systematic weaknesses in GO assessment. We compared 37 evaluation metrics for GO annotation using ADS and identified drastic differences between metrics. We show that some metrics struggle to differentiate between different signal levels, while others give erroneously high scores to the false positive data sets. Based on our findings, we provide guidelines on which evaluation metrics perform well with the Gene Ontology and propose improvements to several well-known evaluation metrics. In general, we argue that evaluation metrics should be tested for their performance and we provide software for this purpose (https://bitbucket.org/plyusnin/ads/). ADS is applicable to other areas of science where the evaluation of prediction results is non-trivial.
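The dilution idea in the abstract above can be illustrated with a minimal sketch. It is not the ADS software: the function names `dilute` and `mean_jaccard`, the noise model (replacing a true term with a randomly chosen wrong term), and the choice of mean Jaccard similarity as the metric under test are all assumptions made for illustration.

```python
import random
from typing import Dict, List, Set

Annotations = Dict[str, Set[str]]


def dilute(truth: Annotations, vocabulary: List[str], signal: float, seed: int = 0) -> Annotations:
    """Keep each true term with probability `signal`; otherwise swap in a wrong term.

    Hypothetical noise model: the real ADS procedure may generate noise differently.
    """
    rng = random.Random(seed)
    diluted: Annotations = {}
    for gene in sorted(truth):
        true_terms = truth[gene]
        wrong_terms = [t for t in vocabulary if t not in true_terms]
        new_terms: Set[str] = set()
        for term in sorted(true_terms):
            if rng.random() < signal or not wrong_terms:
                new_terms.add(term)  # signal: the correct annotation survives
            else:
                new_terms.add(rng.choice(wrong_terms))  # noise: a wrong annotation
        diluted[gene] = new_terms
    return diluted


def mean_jaccard(truth: Annotations, predicted: Annotations) -> float:
    """Example metric under test: Jaccard similarity averaged over genes."""
    scores = []
    for gene, true_terms in truth.items():
        pred_terms = predicted.get(gene, set())
        union = true_terms | pred_terms
        scores.append(len(true_terms & pred_terms) / len(union) if union else 1.0)
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    vocab = ["GO:%d" % i for i in range(20)]
    truth = {"geneA": {"GO:1", "GO:2"}, "geneB": {"GO:3"}, "geneC": {"GO:4", "GO:5", "GO:6"}}
    # A useful metric should decrease as the signal level decreases.
    for level in (1.0, 0.75, 0.5, 0.25, 0.0):
        print(level, round(mean_jaccard(truth, dilute(truth, vocab, level)), 3))
```

A metric that fails to fall as the signal level drops would, in the abstract's terms, struggle to differentiate between signal levels.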
Subjects
Computational Biology/methods , Molecular Sequence Annotation/classification , Molecular Sequence Annotation/methods , Algorithms , Benchmarking/methods , Databases, Genetic , Databases, Protein , Gene Ontology/trends , Reproducibility of Results , Software
ABSTRACT
We present AAI-profiler, a web server for exploratory analysis and quality control in comparative genomics. AAI-profiler summarizes proteome-wide sequence search results to identify novel species, assess the need for taxonomic reclassification and detect multi-isolate and contaminated samples. AAI-profiler visualises results using a scatterplot that shows the Average Amino-acid Identity (AAI) from the query proteome to all similar species in the sequence database. Taxonomic groups are indicated by colour and marker styles, making outliers easy to spot. AAI-profiler uses SANSparallel to perform high-performance homology searches, making proteome-wide analysis possible. We demonstrate the efficacy of AAI-profiler in the discovery of a close relationship between two bacterial symbionts of an omnivorous pirate bug (Orius) and a thrips (Frankliniella occidentalis), an important pest in agriculture. The symbionts represent novel species within the genus Rosenbergiella, so far described only in floral nectar. AAI-profiler is easy to use: the analysis presented required only two mouse clicks and was completed in a few minutes. AAI-profiler is available at http://ekhidna2.biocenter.helsinki.fi/AAI.
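As a rough illustration of the quantity plotted by the server, the sketch below computes an average amino-acid identity per species from a list of search hits. The input layout (query protein, species, percent identity) and the function name `aai_by_species` are hypothetical, and taking the best hit per query protein is an assumption; the actual AAI-profiler pipeline may apply additional filters such as alignment coverage.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical hit record: (query protein id, species name, percent identity)
Hit = Tuple[str, str, float]


def aai_by_species(hits: Iterable[Hit]) -> Dict[str, float]:
    """Average, per species, the best percent identity found for each query protein."""
    best: Dict[str, Dict[str, float]] = defaultdict(dict)
    for query, species, identity in hits:
        previous = best[species].get(query, 0.0)
        best[species][query] = max(previous, identity)  # keep only the best hit
    return {species: sum(per_query.values()) / len(per_query)
            for species, per_query in best.items()}


if __name__ == "__main__":
    example_hits = [
        ("prot1", "Species A", 90.0),
        ("prot1", "Species A", 70.0),  # weaker second hit, ignored
        ("prot2", "Species A", 50.0),
        ("prot1", "Species B", 40.0),
    ]
    for species, aai in sorted(aai_by_species(example_hits).items()):
        print(species, aai)
```

Plotting these per-species averages against, for example, the number of matched proteins would give a scatterplot in the spirit of the one the abstract describes.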
Subjects
Bacterial Proteins/genetics , Chlamydia trachomatis/classification , Erwinia/classification , Phylogeny , Proteome/genetics , Software , Amino Acid Sequence , Animals , Bacterial Proteins/classification , Bacterial Proteins/metabolism , Chlamydia trachomatis/genetics , Chlamydia trachomatis/isolation & purification , Erwinia/genetics , Erwinia/isolation & purification , Gene Expression , Genomics/methods , Heteroptera/microbiology , Internet , Molecular Sequence Annotation , Proteome/classification , Proteome/metabolism , Sequence Homology, Amino Acid , Symbiosis/physiology , Thysanoptera/microbiology
ABSTRACT
The unprecedented growth of high-throughput sequencing has led to an ever-widening annotation gap in protein databases. While computational prediction methods are available to make up the shortfall, a majority of public web servers are hindered by practical limitations and poor performance. Here, we introduce PANNZER2 (Protein ANNotation with Z-scoRE), a fast functional annotation web server that provides both Gene Ontology (GO) annotations and free text description predictions. PANNZER2 uses SANSparallel to perform high-performance homology searches, making bulk annotation based on sequence similarity practical. PANNZER2 can output GO annotations from multiple scoring functions, enabling users to see which predictions are robust across predictors. Finally, PANNZER2 predictions scored within the top 10 methods for molecular function and biological process in the CAFA2 NK-full benchmark. The PANNZER2 web server is updated on a monthly schedule and is accessible at http://ekhidna2.biocenter.helsinki.fi/sanspanz/. The source code is available under the GNU Public Licence v3.
Subjects
Computational Biology/trends , Gene Ontology/trends , Internet , Software , Algorithms , Databases, Protein/trends , High-Throughput Nucleotide Sequencing , Molecular Sequence Annotation
ABSTRACT
The colorectal cancer (CRC) genome is unstable, and different types of instability, such as chromosomal instability (CIN) and microsatellite instability (MSI), are thought to reflect distinct cancer-initiating mechanisms. Although 85% of sporadic CRCs reveal CIN, 15% reveal mismatch repair (MMR) malfunction and MSI, the hallmarks of Lynch syndrome with inherited heterozygous germline mutations in MMR genes. Our study was designed to comprehensively follow genome-wide expression changes and their implications during colon tumorigenesis. We conducted a long-term feeding experiment in the mouse to address expression changes arising in histologically normal colonic mucosa as putative cancer-preceding events, and the effect of inherited predisposition (Mlh1+/-) and Western-style diet (WD) on those. During the 21-month experiment, carcinomas developed mainly in WD-fed mice and were evenly distributed between genotypes. Unexpectedly, the heterozygote (B6.129-Mlh1tm1Rak) mice did not show MSI in their CRCs. Instead, both wildtype and heterozygote CRC mice showed a distinct mRNA expression profile and shortage of several chromosomal segregation gene-specific transcripts (Mlh1, Bub1, Mis18a, Tpx2, Rad9a, Pms2, Cenpe, Ncapd3, Odf2 and Dclre1b) in their colon mucosa, as well as an increased mitotic activity and abundant numbers of unbalanced/atypical mitoses in tumours. Our genome-wide expression profiling experiment demonstrates that cancer-preceding changes are already seen in histologically normal colon mucosa and that decreased expression of Mlh1 and other chromosomal segregation genes may form a field defect in mucosa, which triggers MMR-proficient, chromosomally unstable CRC.
Subjects
Colon/metabolism , Colonic Neoplasms/genetics , Intestinal Mucosa/metabolism , MutL Protein Homolog 1/deficiency , Animals , Colonic Neoplasms/metabolism , Colorectal Neoplasms, Hereditary Nonpolyposis/genetics , DNA Mismatch Repair/genetics , Female , Genetic Predisposition to Disease/genetics , Germ-Line Mutation/genetics , Heterozygote , Male , Mice , Mice, Inbred C57BL , Microsatellite Instability , Mitosis/genetics
ABSTRACT
BACKGROUND: Competitive gene set analysis is a standard exploratory tool for gene expression data. Permutation-based competitive gene set analysis methods are preferable to parametric ones because the latter make strong statistical assumptions which are not always met. For permutation-based methods, we permute samples, as opposed to genes, as doing so preserves the inter-gene correlation structure. Unfortunately, up until now, sample permutation-based methods have required a minimum of six replicates per sample group. RESULTS: We propose a new permutation-based competitive gene set analysis method for multi-group gene expression data with as few as three replicates per group. The method is based on an advanced sample permutation technique that utilizes all groups within a data set for pairwise comparisons. We present a comprehensive evaluation of different permutation techniques using multiple data sets, and contrast the performance of our method, mGSZm, with other state-of-the-art methods. We show that mGSZm is robust, and that, despite using fewer than six replicates, we are able to consistently identify a high proportion of the top-ranked gene sets from the analysis of a substantially larger data set. Further, we highlight other methods where performance is highly variable and appears dependent on the underlying data set being analyzed. CONCLUSIONS: Our results demonstrate that robust gene set analysis of multi-group gene expression data is feasible with as few as three replicates. In doing so, we have extended the applicability of such approaches to resource-constrained experiments where additional data generation is prohibitively difficult or expensive. An R package implementing the proposed method and supplementary materials are available from the website http://ekhidna.biocenter.helsinki.fi/downloads/pashupati/mGSZm.html .
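To make the distinction between sample and gene permutation concrete, here is a minimal sketch of a plain two-group sample permutation test with a simple competitive score. It is not mGSZm: the score (mean group difference inside the gene set minus that outside it), the function names, and the data layout are assumptions for illustration, and the multi-group permutation scheme described in the abstract is not reproduced.

```python
import random
from typing import Dict, List, Set


def set_score(expr: Dict[str, List[float]], labels: List[int], gene_set: Set[str]) -> float:
    """Competitive score: mean group shift of genes in the set minus that of all other genes."""
    def shift(values: List[float]) -> float:
        group1 = [v for v, lab in zip(values, labels) if lab == 1]
        group0 = [v for v, lab in zip(values, labels) if lab == 0]
        return sum(group1) / len(group1) - sum(group0) / len(group0)

    inside = [shift(v) for g, v in expr.items() if g in gene_set]
    outside = [shift(v) for g, v in expr.items() if g not in gene_set]
    return sum(inside) / len(inside) - sum(outside) / len(outside)


def sample_permutation_p(expr: Dict[str, List[float]], labels: List[int],
                         gene_set: Set[str], n_perm: int = 200, seed: int = 0) -> float:
    """Empirical two-sided P-value obtained by shuffling sample labels.

    One shuffled label vector is applied to every gene at once, so the
    correlation between genes is left intact in each permuted data set.
    """
    rng = random.Random(seed)
    observed = abs(set_score(expr, labels, gene_set))
    hits = 0
    for _ in range(n_perm):
        shuffled = labels[:]
        rng.shuffle(shuffled)
        if abs(set_score(expr, shuffled, gene_set)) >= observed:
            hits += 1
    return (1 + hits) / (1 + n_perm)


if __name__ == "__main__":
    expression = {
        "g1": [0.1, 0.2, 0.0, 2.1, 1.9, 2.0],
        "g2": [0.0, 0.3, 0.1, 1.8, 2.2, 2.1],
        "g3": [1.0, 1.1, 0.9, 1.0, 0.9, 1.1],
        "g4": [0.5, 0.4, 0.6, 0.5, 0.6, 0.4],
    }
    groups = [0, 0, 0, 1, 1, 1]  # three replicates per group
    print(sample_permutation_p(expression, groups, {"g1", "g2"}))
```

With three replicates per group there are only 20 distinct label arrangements, which limits how small such an empirical P-value can become; this illustrates why small replicate numbers are a problem for naive sample permutation.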
Subjects
Gene Expression Profiling/methods , Oligonucleotide Array Sequence Analysis/methods , Animals , Data Interpretation, Statistical , Humans , Mice
ABSTRACT
Automated annotation of protein function is challenging. As the number of sequenced genomes rapidly grows, the overwhelming majority of protein products can only be annotated computationally. If computational predictions are to be relied upon, it is crucial that the accuracy of these methods be high. Here we report the results from the first large-scale community-based critical assessment of protein function annotation (CAFA) experiment. Fifty-four methods representing the state of the art for protein function prediction were evaluated on a target set of 866 proteins from 11 organisms. Two findings stand out: (i) today's best protein function prediction algorithms substantially outperform widely used first-generation methods, with large gains on all types of targets; and (ii) although the top methods perform well enough to guide experiments, there is considerable need for improvement of currently available tools.
Subjects
Computational Biology/methods , Molecular Biology/methods , Molecular Sequence Annotation , Proteins/physiology , Algorithms , Animals , Databases, Protein , Exoribonucleases/classification , Exoribonucleases/genetics , Exoribonucleases/physiology , Forecasting , Humans , Proteins/chemistry , Proteins/classification , Proteins/genetics , Species Specificity
ABSTRACT
MOTIVATION: The last decade has seen a remarkable growth in protein databases. This growth comes at a price: a growing number of submitted protein sequences lack functional annotation. Approximately 32% of sequences submitted to the most comprehensive protein database, UniProtKB, are labelled as 'Unknown protein' or alike. Moreover, the functionally annotated entries are reported to contain 30-40% errors. Here, we introduce a high-throughput tool for more reliable functional annotation called Protein ANNotation with Z-score (PANNZER). PANNZER predicts Gene Ontology (GO) classes and free text descriptions about protein functionality. PANNZER uses weighted k-nearest neighbour methods with statistical testing to maximize the reliability of a functional annotation. RESULTS: Our results in free text description line prediction show that we outperformed all competing methods by a clear margin. In GO prediction we show clear improvement over our older method, which performed well in the CAFA 2011 challenge.
Subjects
Data Mining , Databases, Protein , Molecular Sequence Annotation , Proteins/metabolism , Vocabulary, Controlled , Cluster Analysis , Computational Biology/methods , Data Interpretation, Statistical , Databases, Genetic , Gene Ontology , Humans , Proteins/genetics
ABSTRACT
MOTIVATION: Gene set analysis is the analysis of a set of genes that collectively contribute to a biological process. Most popular gene set analysis methods are based on empirical P-values, which require a large number of permutations. Despite numerous gene set analysis methods developed in the past decade, the most popular methods still suffer from serious limitations. RESULTS: We present a gene set analysis method (mGSZ) based on the Gene Set Z-scoring function (GSZ) and asymptotic P-values. Asymptotic P-value calculation requires fewer permutations, and thus speeds up the gene set analysis process. We compare the GSZ-scoring function with seven popular gene set scoring functions and show that GSZ stands out as the best scoring function. In addition, we show improved performance of the GSA method when the max-mean statistic is replaced by the GSZ scoring function. We demonstrate the importance of both gene and sample permutations by showing the consequences in the absence of one or the other. A comparison of asymptotic and empirical methods of P-value estimation demonstrates a clear advantage of asymptotic P-values over empirical P-values. We show that mGSZ outperforms the state-of-the-art methods based on two different evaluations. We compared mGSZ results with permutation and rotation tests and show that rotation does not improve our asymptotic P-values. We also propose well-known asymptotic distribution models for three of the compared methods. AVAILABILITY AND IMPLEMENTATION: mGSZ is available as an R package from cran.r-project.org.
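The contrast between empirical and asymptotic P-values can be sketched as follows. This is an illustration only: fitting a normal distribution to the permutation scores is an assumption chosen for simplicity, and the abstract does not state which distribution mGSZ actually fits. The function names are hypothetical.

```python
import random
from statistics import NormalDist
from typing import List


def empirical_p(observed: float, null_scores: List[float]) -> float:
    """Empirical P-value: resolution is limited to 1 / (number of permutations + 1)."""
    exceed = sum(1 for score in null_scores if score >= observed)
    return (1 + exceed) / (1 + len(null_scores))


def asymptotic_p(observed: float, null_scores: List[float]) -> float:
    """Asymptotic P-value from a distribution fitted to the permutation scores.

    Assumption: a normal null distribution, used here purely for illustration.
    """
    null = NormalDist.from_samples(null_scores)
    return 1.0 - null.cdf(observed)


if __name__ == "__main__":
    rng = random.Random(1)
    few_permutations = [rng.gauss(0.0, 1.0) for _ in range(50)]
    observed_score = 4.0
    # With 50 permutations the empirical P-value cannot drop below 1/51,
    # while the fitted distribution can still resolve the tail.
    print("empirical :", empirical_p(observed_score, few_permutations))
    print("asymptotic:", asymptotic_p(observed_score, few_permutations))
```

The example shows why a fitted null distribution can rank strongly enriched gene sets with far fewer permutations than an empirical estimate would need.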
Subjects
Computational Biology/methods , Gene Expression Profiling/methods , Algorithms , Data Interpretation, Statistical , Escherichia coli/genetics , Female , Gene Expression Regulation, Leukemic , Humans , Leukemia/genetics , Male , Models, Statistical , Sex Factors , Software , Tumor Suppressor Protein p53/genetics
ABSTRACT
Soft rot disease is economically one of the most devastating bacterial diseases affecting plants worldwide. In this study, we present novel insights into the phylogeny and virulence of the soft rot model Pectobacterium sp. SCC3193, which was isolated from a diseased potato stem in Finland in the early 1980s. Genomic approaches, including proteome and genome comparisons of all sequenced soft rot bacteria, revealed that SCC3193, previously included in the species Pectobacterium carotovorum, can now be more accurately classified as Pectobacterium wasabiae. Together with the recently revised phylogeny of a few P. carotovorum strains and an increasing number of studies on P. wasabiae, our work indicates that P. wasabiae has been unnoticed but present in potato fields worldwide. A combination of genomic approaches and in planta experiments identified features that separate SCC3193 and other P. wasabiae strains from the rest of soft rot bacteria, such as the absence of a type III secretion system that contributes to virulence of other soft rot species. Experimentally established virulence determinants include the putative transcriptional regulator SirB, two partially redundant type VI secretion systems and two horizontally acquired clusters (Vic1 and Vic2), which contain predicted virulence genes. Genome comparison also revealed other interesting traits that may be related to life in planta or other specific environmental conditions. These traits include a predicted benzoic acid/salicylic acid carboxyl methyltransferase of eukaryotic origin. The novelties found in this work indicate that soft rot bacteria have a reservoir of unknown traits that may be utilized in the poorly understood latent stage in planta. The genomic approaches and the comparison of the model strain SCC3193 to other sequenced Pectobacterium strains, including the type strain of P. wasabiae, provide a solid basis for further investigation of the virulence, distribution and phylogeny of soft rot bacteria and, potentially, other bacteria as well.
Subjects
Gene Transfer, Horizontal , Multigene Family , Pectobacterium/genetics , Pectobacterium/pathogenicity , Phylogeny , Plant Diseases/genetics , Virulence Factors/genetics , Plant Diseases/microbiology , Plant Roots/microbiology , Solanum tuberosum/microbiology , Virulence Factors/metabolism
ABSTRACT
BACKGROUND: Lynch syndrome (LS) is one of the most common hereditary cancer syndromes worldwide. Dominantly inherited mutation in one of four DNA mismatch repair genes combined with somatic events leads to mismatch repair deficiency and microsatellite instability (MSI) in tumours. Due to a high lifetime risk of cancer, regular surveillance plays a key role in cancer prevention; yet the observation of frequent interval cancers points to insufficient cancer prevention by colonoscopy-based methods alone. This study aimed to identify precancerous functional changes in colonic mucosa that could facilitate the monitoring and prevention of cancer development in LS. METHODS: The study material comprised colon biopsy specimens (n = 71) collected during colonoscopy examinations from LS carriers (tumour-free, or diagnosed with adenoma, or diagnosed with carcinoma) and a control group, which included sporadic cases without LS or neoplasia. The majority (80%) of LS carriers had an inherited genetic MLH1 mutation. The remaining 20% included MSH2 mutation carriers (13%) and MSH6 mutation carriers (7%). The transcriptomes were first analysed with RNA-sequencing and followed up with Gorilla Ontology analysis and Reactome Knowledgebase and Ingenuity Pathway Analyses to detect functional changes that might be associated with the initiation of the neoplastic process in LS individuals. FINDINGS: With pathway and gene ontology analyses combined with measurement of mitotic perimeters from colonic mucosa and tumours, we found an increased tendency to chromosomal instability (CIN), already present in macroscopically normal LS mucosa. Our results suggest that CIN is an earlier aberration than MSI and may be the initial cancer driving aberration, whereas MSI accelerates tumour formation. Furthermore, our results suggest that MLH1 deficiency plays a significant role in the development of CIN. 
INTERPRETATION: The results validate our previous findings from mice and highlight early mitotic abnormalities as an important contributor and precancerous marker of colorectal tumourigenesis in LS. FUNDING: This work was supported by grants from the Jane and Aatos Erkko Foundation, the Academy of Finland (330606 and 331284), Cancer Foundation Finland sr, and the Sigrid Jusélius Foundation. Open access is funded by Helsinki University Library.
Subjects
Colorectal Neoplasms, Hereditary Nonpolyposis , Microsatellite Instability , Mitosis , Humans , Colorectal Neoplasms, Hereditary Nonpolyposis/genetics , Colorectal Neoplasms, Hereditary Nonpolyposis/pathology , Colorectal Neoplasms, Hereditary Nonpolyposis/complications , Female , Male , Mitosis/genetics , Middle Aged , Mutation , Adult , Aged , MutL Protein Homolog 1/genetics , Gene Expression Profiling , Colorectal Neoplasms/genetics , Colorectal Neoplasms/pathology , Colorectal Neoplasms/etiology , Carcinogenesis/genetics , DNA Mismatch Repair/genetics , Transcriptome
ABSTRACT
BACKGROUND: Gene Ontology (GO) is a popular standard in the annotation of gene products and provides information related to genes across all species. The structure of GO is dynamic and is updated on a daily basis. However, the popular existing methods use outdated versions of GO. Moreover, these tools are slow to process large datasets consisting of more than 20,000 genes. RESULTS: We have developed GOParGenPy, a platform independent software tool to generate the binary data matrix showing the GO class membership, including parental classes, of a set of GO annotated genes. GOParGenPy is at least an order of magnitude faster than popular tools for Gene Ontology analysis and it can handle larger datasets than the existing tools. It can use any available version of the GO structure and allows the user to select the source of GO annotation. GO structure selection is critical for analysis, as we show that GO classes have rapid turnover between different GO structure releases. CONCLUSIONS: GOParGenPy is an easy to use software tool which can generate sparse or full binary matrices from GO annotated gene sets. The obtained binary matrix can then be used with any analysis environment and with any analysis methods.
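The binary membership matrix described above, including parental classes, can be sketched in a few lines. This is not GOParGenPy's code: the function names, the dictionary-based input layout, and the toy term identifiers are hypothetical, and a dense list-of-lists matrix is used for readability where a real tool might prefer a sparse format.

```python
from typing import Dict, List, Set, Tuple


def ancestors(term: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """Return the term itself plus every ancestor reachable through parent links."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(parents.get(current, ()))
    return seen


def membership_matrix(annotations: Dict[str, Set[str]],
                      parents: Dict[str, Set[str]]) -> Tuple[List[str], List[str], List[List[int]]]:
    """Build a genes x GO-classes 0/1 matrix with annotations propagated to parental classes."""
    expanded = {gene: set().union(*(ancestors(t, parents) for t in terms))
                for gene, terms in annotations.items()}
    genes = sorted(expanded)
    terms = sorted(set().union(*expanded.values()))
    matrix = [[1 if term in expanded[gene] else 0 for term in terms] for gene in genes]
    return genes, terms, matrix


if __name__ == "__main__":
    # Toy ontology: C -> B -> A and D -> A (child -> parent)
    toy_parents = {"GO:C": {"GO:B"}, "GO:B": {"GO:A"}, "GO:D": {"GO:A"}}
    toy_annotations = {"gene1": {"GO:C"}, "gene2": {"GO:D"}}
    genes, terms, matrix = membership_matrix(toy_annotations, toy_parents)
    print(terms)
    for gene, row in zip(genes, matrix):
        print(gene, row)
```

Because the parent map is passed in explicitly, the same code can be run against any release of the GO structure, which mirrors the abstract's point that the choice of GO version matters.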
Subjects
Gene Ontology , Genes , Molecular Sequence Annotation/methods , Proteins/genetics , Software , Artificial Intelligence , Molecular Sequence Annotation/classification , Proteins/chemistry , Proteins/classification , Search Engine/methods , Software/classification , Vocabulary, Controlled
ABSTRACT
WRKY transcription factors (TFs) have been mainly associated with plant defense, but recent studies have suggested additional roles in the regulation of other physiological processes. Here, we explored the possible contribution of two related group III WRKY TFs, WRKY70 and WRKY54, to osmotic stress tolerance. These TFs are positive regulators of plant defense, and co-operate as negative regulators of salicylic acid (SA) biosynthesis and senescence. We employed single and double mutants of wrky54 and wrky70, as well as a WRKY70 overexpressor line, to explore the role of these TFs in osmotic stress (polyethylene glycol) responses. Their effect on gene expression was characterized by microarrays and verified by quantitative PCR. Stomatal phenotypes were assessed by water retention and stomatal conductance measurements. The wrky54wrky70 double mutants exhibited clearly enhanced tolerance to osmotic stress. However, gene expression analysis showed reduced induction of osmotic stress-responsive genes in addition to reduced accumulation of the osmoprotectant proline. By contrast, the enhanced tolerance was correlated with improved water retention and enhanced stomatal closure. These findings demonstrate that WRKY70 and WRKY54 co-operate as negative regulators of stomatal closure and, consequently, osmotic stress tolerance in Arabidopsis, suggesting that they have an important role, not only in plant defense, but also in abiotic stress signaling.
Subjects
Abscisic Acid/pharmacology , Arabidopsis Proteins/genetics , Arabidopsis/physiology , Plant Growth Regulators/pharmacology , Plant Stomata/physiology , Salicylic Acid/pharmacology , Abscisic Acid/analysis , Arabidopsis/genetics , Arabidopsis Proteins/metabolism , Gene Expression Profiling , Gene Expression Regulation, Plant , Models, Biological , Oligonucleotide Array Sequence Analysis , Osmotic Pressure , Plant Growth Regulators/analysis , Plant Stomata/genetics , Plants, Genetically Modified , Proline/analysis , Salicylic Acid/analysis , Signal Transduction , Stress, Physiological , Transcription Factors/genetics , Transcription Factors/metabolism , Water/metabolism
ABSTRACT
Structural comparison reveals remote homology that often fails to be detected by sequence comparison. The DALI web server (http://ekhidna2.biocenter.helsinki.fi/dali) is a platform for structural analysis that provides database searches and interactive visualization, including structural alignments annotated with secondary structure, protein families and sequence logos, and 3D structure superimposition supported by color-coded sequence and structure conservation. Here, we are using DALI to mine the AlphaFold Database version 1, which increased the structural coverage of protein families by 20%. We found 100 remote homologous relationships hitherto unreported in the current reference database for protein domains, Pfam 35.0. In particular, we linked 35 domains of unknown function (DUFs) to the previously characterized families, generating a functional hypothesis that can be explored downstream in structural biology studies. Other findings include gene fusions, tandem duplications, and adjustments to domain boundaries. The evidence for homology can be browsed interactively through live examples on DALI's website.
Subjects
Proteins , Databases, Protein , Sequence Alignment , Proteins/chemistry , Protein Domains , Protein Structure, Secondary
ABSTRACT
The facility of next-generation sequencing has led to an explosion of gene catalogs for novel genomes, transcriptomes and metagenomes, which are functionally uncharacterized. Computational inference has emerged as a necessary substitute for first-hand experimental evidence. PANNZER (Protein ANNotation with Z-scoRE) is a high-throughput functional annotation web server that stands out among similar publicly accessible web servers in supporting submission of up to 100,000 protein sequences at once and providing both Gene Ontology (GO) annotations and free text description predictions. Here, we demonstrate the use of PANNZER and discuss future plans and challenges. We present two case studies to illustrate problems related to data quality and method evaluation. Some commonly used evaluation metrics and evaluation datasets promote methods that favor unspecific and broad functional classes over more informative and specific classes. We argue that this can bias the development of automated function prediction methods. The PANNZER web server and source code are available at http://ekhidna2.biocenter.helsinki.fi/sanspanz/.
Subjects
Algorithms , Computational Biology , Databases, Protein , Molecular Sequence Annotation , Proteins , Software , Proteins/chemistry , Proteins/genetics
ABSTRACT
Tegmental nuclei in the ventral midbrain and anterior hindbrain control motivated behavior, mood, memory, and movement. These nuclei contain inhibitory GABAergic and excitatory glutamatergic neurons, whose molecular diversity and development remain largely unresolved. Many tegmental neurons originate in the embryonic ventral rhombomere 1 (r1), where GABAergic fate is regulated by the transcription factor (TF) Tal1. We used single-cell mRNA sequencing of the mouse ventral r1 to characterize the Tal1-dependent and independent neuronal precursors. We describe gene expression dynamics during bifurcation of the GABAergic and glutamatergic lineages and show how active Notch signaling promotes GABAergic fate selection in post-mitotic precursors. We identify GABAergic precursor subtypes that give rise to distinct tegmental nuclei and demonstrate that Sox14 and Zfpm2, two TFs downstream of Tal1, are necessary for the differentiation of specific tegmental GABAergic neurons. Our results provide a framework for understanding the development of cellular diversity in the tegmental nuclei.
Subjects
GABAergic Neurons/metabolism , Glutamic Acid/metabolism , Rhombencephalon/metabolism , Tegmentum Mesencephali/metabolism , Animals , Cell Differentiation , Cell Lineage , DNA-Binding Proteins/metabolism , Dorsal Raphe Nucleus/metabolism , Embryo, Mammalian/cytology , Female , Forkhead Box Protein O1/metabolism , Homeodomain Proteins/metabolism , Male , Mice, Inbred C57BL , Neural Stem Cells/metabolism , RNA, Messenger/genetics , RNA, Messenger/metabolism , Receptors, Notch/metabolism , SOXB2 Transcription Factors/metabolism , Signal Transduction/drug effects , T-Cell Acute Lymphocytic Leukemia Protein 1/metabolism , Transcription Factors/metabolism
ABSTRACT
Smoking, a major risk factor for morbidity, affects numerous regulatory systems of the human body, including DNA methylation. Most previous studies with genome-wide methylation data are based on conventional association analysis and earlier threshold-based gene set analysis, which lack the sensitivity to reveal all the relevant effects of smoking. The aim of the present study was to investigate the impact of active smoking on DNA methylation at three biological levels: 5'-C-phosphate-G-3' (CpG) sites, genes and functionally related genes (gene sets). Gene set analysis was done with mGSZ, a modern threshold-free method previously developed by us that utilizes all the genes in the experiment and their differential methylation scores. Applying such a method in a DNA methylation study is novel. Epigenome-wide methylation levels were profiled from Young Finns Study (YFS) participants' whole blood from the 2011 follow-up using Illumina Infinium HumanMethylation450 BeadChips. We identified three novel smoking-related CpG sites and replicated 57 of the previously identified ones. We found that smoking is associated with hypomethylation in shores (genomic regions 0-2 kilobases from a CpG island). We identified smoking-related methylation changes in 13 gene sets with false discovery rate (FDR) ≤ 0.05, among which is olfactory receptor activity, the flagship novel finding of the present study. Overall, we extended the current knowledge by identifying: (i) three novel smoking-related CpG sites, (ii) effects similar to those of aging on average methylation in shores, and (iii) a novel finding that the olfactory receptor activity pathway responds to tobacco smoke and toxin exposure through epigenetic mechanisms.
Subjects
Cigarette Smoking/adverse effects , DNA Methylation , Epigenesis, Genetic , Adult , Aging/genetics , Cigarette Smoking/blood , Cigarette Smoking/genetics , CpG Islands/genetics , Epigenome/genetics , Female , Finland , Follow-Up Studies , Genome-Wide Association Study , Humans , Longitudinal Studies , Male , Middle Aged , Non-Smokers , Prospective Studies , Receptors, Odorant/metabolism , Signal Transduction/genetics , Smell/genetics , Smoke/adverse effects , Smokers , Nicotiana/adverse effects
ABSTRACT
BACKGROUND: The analysis of over-represented functional classes in a list of genes is one of the most essential bioinformatics research topics. Typical examples of such lists are the differentially expressed genes from transcriptional analysis, which need to be linked to functional information represented in the Gene Ontology (GO). Despite the importance of this procedure, there is little work on consistent evaluation of various GO analysis methods. In particular, there is no literature on creating benchmark datasets for GO analysis tools. RESULTS: We propose a methodology for the evaluation of GO analysis tools, which consists of creating gene lists with a selected signal level and a selected number of independent over-represented classes. The methodology starts with a real-life GO data matrix, and therefore the generated datasets have similar features to real positive datasets. The user can select the signal level for over-representation, the number of independent positive classes in the dataset, and the size of the final gene list. We present the use of the effective number and various normalizations while embedding the signal in a selected class or classes, and the use of binary correlation to ensure that the selected signal classes are independent of each other. The usefulness of generated datasets is demonstrated by comparing different GO class ranking and GO clustering methods. CONCLUSION: The presented methods aid the development and evaluation of GO analysis methods as they enable thorough testing with different signal types and different signal levels. As an example, our comparisons reveal clear differences between compared GO clustering and GO de-correlation methods. The implementation is coded in Matlab and is freely available at the dedicated website http://ekhidna.biocenter.helsinki.fi/users/petri/public/POSGODA/POSGODA.html.