Results 1 - 20 of 12,680
1.
Cell; 185(2): 345-360.e28, 2022 01 20.
Article in English | MEDLINE | ID: mdl-35063075

ABSTRACT

We present a whole-cell fully dynamical kinetic model (WCM) of JCVI-syn3A, a minimal cell with a reduced genome of 493 genes that has retained few regulatory proteins or small RNAs. Cryo-electron tomograms provide the cell geometry and ribosome distributions. Time-dependent behaviors of concentrations and reaction fluxes from stochastic-deterministic simulations over a cell cycle reveal how the cell balances demands of its metabolism, genetic information processes, and growth, and offer insight into the principles of life for this minimal cell. The energy economy of each process including active transport of amino acids, nucleosides, and ions is analyzed. WCM reveals how emergent imbalances lead to slowdowns in the rates of transcription and translation. Integration of experimental data is critical in building a kinetic model from which emerges a genome-wide distribution of mRNA half-lives, multiple DNA replication events that can be compared to qPCR results, and the experimentally observed doubling behavior.


Subjects
Cells/cytology; Computer Simulation; Adenosine Triphosphate/metabolism; Cell Cycle/genetics; Cell Proliferation/genetics; Cells/metabolism; DNA Replication/genetics; Gene Expression Regulation; Imaging, Three-Dimensional; Kinetics; Lipids/chemistry; Metabolic Networks and Pathways; Metabolome; Molecular Sequence Annotation; Nucleotides/metabolism; Thermodynamics; Time Factors
2.
Cell; 184(13): 3542-3558.e16, 2021 06 24.
Article in English | MEDLINE | ID: mdl-34051138

ABSTRACT

Structural variations (SVs) and gene copy number variations (gCNVs) have contributed to crop evolution, domestication, and improvement. Here, we assembled 31 high-quality genomes of genetically diverse rice accessions. Coupling with two existing assemblies, we developed pan-genome-scale genomic resources including a graph-based genome, providing access to rice genomic variations. Specifically, we discovered 171,072 SVs and 25,549 gCNVs and used an Oryza glaberrima assembly to infer the derived states of SVs in the Oryza sativa population. Our analyses of SV formation mechanisms, impacts on gene expression, and distributions among subpopulations illustrate the utility of these resources for understanding how SVs and gCNVs shaped rice environmental adaptation and domestication. Our graph-based genome enabled genome-wide association study (GWAS)-based identification of phenotype-associated genetic variations undetectable when using only SNPs and a single reference assembly. Our work provides rich population-scale resources paired with easy-to-access tools to facilitate rice breeding as well as plant functional genomics and evolutionary biology research.


Subjects
Ecotype; Genetic Variation; Genome, Plant; Oryza/genetics; Adaptation, Physiological/genetics; Agriculture; Domestication; Gene Expression Profiling; Gene Expression Regulation, Plant; Genes, Plant; Genomic Structural Variation; Molecular Sequence Annotation; Phenotype
3.
Cell; 182(1): 145-161.e23, 2020 07 09.
Article in English | MEDLINE | ID: mdl-32553272

ABSTRACT

Structural variants (SVs) underlie important crop improvement and domestication traits. However, resolving the extent, diversity, and quantitative impact of SVs has been challenging. We used long-read nanopore sequencing to capture 238,490 SVs in 100 diverse tomato lines. This panSV genome, along with 14 new reference assemblies, revealed large-scale intermixing of diverse genotypes, as well as thousands of SVs intersecting genes and cis-regulatory regions. Hundreds of SV-gene pairs exhibit subtle and significant expression changes, which could broadly influence quantitative trait variation. By combining quantitative genetics with genome editing, we show how multiple SVs that changed gene dosage and expression levels modified fruit flavor, size, and production. In the last example, higher order epistasis among four SVs affecting three related transcription factors allowed introduction of an important harvesting trait in modern tomato. Our findings highlight the underexplored role of SVs in genotype-to-phenotype relationships and their widespread importance and utility in crop improvement.


Subjects
Crops, Agricultural/genetics; Gene Expression Regulation, Plant; Genomic Structural Variation; Solanum lycopersicum/genetics; Alleles; Cytochrome P-450 Enzyme System/genetics; Ecotype; Epistasis, Genetic; Fruit/genetics; Gene Duplication; Genome, Plant; Genotype; Inbreeding; Molecular Sequence Annotation; Phenotype; Plant Breeding; Quantitative Trait Loci/genetics
4.
Cell; 182(1): 162-176.e13, 2020 07 09.
Article in English | MEDLINE | ID: mdl-32553274

ABSTRACT

Soybean is one of the most important vegetable oil and protein feed crops. To capture the entire genomic diversity, a complete high-quality pan-genome must be constructed from diverse soybean accessions. In this study, we performed individual de novo genome assemblies for 26 representative soybeans that were selected from 2,898 deeply sequenced accessions. Using these assembled genomes together with three previously reported genomes, we constructed a graph-based genome and performed pan-genome analysis, which identified numerous genetic variations that cannot be detected by direct mapping of short sequence reads onto a single reference genome. The structural variations from the 2,898 accessions that were genotyped based on the graph-based genome and the RNA sequencing (RNA-seq) data from the representative 26 accessions helped to link genetic variations to candidate genes that are responsible for important traits. This pan-genome resource will promote evolutionary and functional genomics studies in soybean.


Subjects
Genome, Plant; Glycine max/growth & development; Glycine max/genetics; Base Sequence; Chromosomes, Plant/genetics; Domestication; Ecotype; Gene Duplication; Gene Expression Regulation, Plant; Gene Fusion; Geography; Molecular Sequence Annotation; Phylogeny; Polymorphism, Single Nucleotide/genetics; Polyploidy
5.
Cell; 183(4): 875-889.e17, 2020 11 12.
Article in English | MEDLINE | ID: mdl-33035453

ABSTRACT

Banyan trees are distinguished by their extraordinary aerial roots. The Ficus genus includes species that have evolved a species-specific mutualism system with wasp pollinators. We sequenced genomes of the Chinese banyan tree, F. microcarpa, and a species lacking aerial roots, F. hispida, and one wasp genome coevolving with F. microcarpa, Eupristina verticillata. Comparative analysis of the two Ficus genomes revealed dynamic karyotype variation associated with adaptive evolution. Copy number expansion of auxin-related genes from duplications and elevated auxin production are associated with aerial root development in F. microcarpa. A male-specific AGAMOUS paralog, FhAG2, was identified as a candidate gene for sex determination in F. hispida. Population genomic analyses of Ficus species revealed genomic signatures of morphological and physiological coadaptation with their pollinators involving terpenoid- and benzenoid-derived compounds. These three genomes offer insights into and genomic resources for investigating the geneses of aerial roots, monoecy and dioecy, and codiversification in a symbiotic system.


Subjects
Biological Evolution; Ficus/genetics; Genome, Plant; Pollination/physiology; Trees/genetics; Wasps/physiology; Animals; Chromosomes, Plant/genetics; DNA Transposable Elements/genetics; Female; Gene Expression Profiling; Gene Expression Regulation, Plant; Genes, Plant; Indoleacetic Acids/metabolism; Molecular Sequence Annotation; Phylogeny; Plant Roots/growth & development; Segmental Duplications, Genomic/genetics; Sex Chromosomes/genetics; Volatile Organic Compounds/analysis
6.
Cell; 183(7): 2020-2035.e16, 2020 12 23.
Article in English | MEDLINE | ID: mdl-33326746

ABSTRACT

Thousands of proteins localize to the nucleus; however, it remains unclear which contain transcriptional effectors. Here, we develop HT-recruit, a pooled assay where protein libraries are recruited to a reporter, and their transcriptional effects are measured by sequencing. Using this approach, we measure gene silencing and activation for thousands of domains. We find a relationship between repressor function and evolutionary age for the KRAB domains, discover that Homeodomain repressor strength is collinear with Hox genetic organization, and identify activities for several domains of unknown function. Deep mutational scanning of the CRISPRi KRAB maps the co-repressor binding surface and identifies substitutions that improve stability/silencing. By tiling 238 proteins, we find repressors as short as ten amino acids. Finally, we report new activator domains, including a divergent KRAB. These results provide a resource of 600 human proteins containing effectors and demonstrate a scalable strategy for assigning functions to protein domains.


Subjects
High-Throughput Screening Assays; Transcription Factors/metabolism; Amino Acid Sequence; CRISPR-Cas Systems/genetics; Female; Gene Silencing; Genes, Reporter; HEK293 Cells; Homeodomain Proteins/genetics; Homeodomain Proteins/metabolism; Humans; K562 Cells; Lentivirus/physiology; Molecular Sequence Annotation; Mutation/genetics; Nuclear Proteins/metabolism; Promoter Regions, Genetic/genetics; Protein Domains; Repressor Proteins/chemistry; Repressor Proteins/metabolism; Reproducibility of Results; Transcription, Genetic; Zinc Fingers
7.
Cell; 179(5): 1068-1083.e21, 2019 Nov 14.
Article in English | MEDLINE | ID: mdl-31730850

ABSTRACT

Ocean microbial communities strongly influence the biogeochemistry, food webs, and climate of our planet. Despite recent advances in understanding their taxonomic and genomic compositions, little is known about how their transcriptomes vary globally. Here, we present a dataset of 187 metatranscriptomes and 370 metagenomes from 126 globally distributed sampling stations and establish a resource of 47 million genes to study community-level transcriptomes across depth layers from pole-to-pole. We examine gene expression changes and community turnover as the underlying mechanisms shaping community transcriptomes along these axes of environmental variation and show how their individual contributions differ for multiple biogeochemically relevant processes. Furthermore, we find the relative contribution of gene expression changes to be significantly lower in polar than in non-polar waters and hypothesize that in polar regions, alterations in community activity in response to ocean warming will be driven more strongly by changes in organismal composition than by gene regulatory mechanisms.


Subjects
Gene Expression Regulation; Metagenome; Oceans and Seas; Transcriptome/genetics; Geography; Microbiota/genetics; Molecular Sequence Annotation; RNA, Messenger/genetics; RNA, Messenger/metabolism; Seawater/microbiology; Temperature
8.
Cell; 172(5): 910-923.e16, 2018 02 22.
Article in English | MEDLINE | ID: mdl-29474919

ABSTRACT

To better understand the gene regulatory mechanisms that program developmental processes, we carried out simultaneous genome-wide measurements of mRNA, translation, and protein through meiotic differentiation in budding yeast. Surprisingly, we observed that the levels of several hundred mRNAs are anti-correlated with their corresponding protein products. We show that rather than arising from canonical forms of gene regulatory control, the regulation of at least 380 such cases, or over 8% of all measured genes, involves temporally regulated switching between production of a canonical, translatable transcript and a 5' extended isoform that is not efficiently translated into protein. By this pervasive mechanism for the modulation of protein levels through a natural developmental program, a single transcription factor can coordinately activate and repress protein synthesis for distinct sets of genes. The distinction is not based on whether or not an mRNA is induced but rather on the type of transcript produced.


Subjects
Meiosis/genetics; Saccharomyces cerevisiae Proteins/metabolism; Saccharomyces cerevisiae/genetics; Saccharomyces cerevisiae/metabolism; Gene Expression Regulation, Fungal; Genes, Fungal; Models, Biological; Molecular Sequence Annotation; Protein Biosynthesis; Protein Isoforms/genetics; Protein Isoforms/metabolism; Proteome/metabolism; RNA, Messenger/genetics; RNA, Messenger/metabolism; Reproducibility of Results; Saccharomyces cerevisiae/cytology; Saccharomyces cerevisiae Proteins/genetics; Transcription Factors/metabolism
9.
Cell; 171(2): 287-304.e15, 2017 Oct 05.
Article in English | MEDLINE | ID: mdl-28985561

ABSTRACT

The evolution of land flora transformed the terrestrial environment. Land plants evolved from an ancestral charophycean alga from which they inherited developmental, biochemical, and cell biological attributes. Additional biochemical and physiological adaptations to land, and a life cycle with an alternation between multicellular haploid and diploid generations that facilitated efficient dispersal of desiccation tolerant spores, evolved in the ancestral land plant. We analyzed the genome of the liverwort Marchantia polymorpha, a member of a basal land plant lineage. Relative to charophycean algae, land plant genomes are characterized by genes encoding novel biochemical pathways, new phytohormone signaling pathways (notably auxin), expanded repertoires of signaling pathways, and increased diversity in some transcription factor families. Compared with other sequenced land plants, M. polymorpha exhibits low genetic redundancy in most regulatory pathways, with this portion of its genome resembling that predicted for the ancestral land plant.


Subjects
Biological Evolution; Embryophyta/genetics; Genome, Plant; Marchantia/genetics; Adaptation, Biological; Embryophyta/physiology; Gene Expression Regulation, Plant; Marchantia/physiology; Molecular Sequence Annotation; Signal Transduction; Transcription, Genetic
10.
Nat Immunol; 20(7): 902-914, 2019 07.
Article in English | MEDLINE | ID: mdl-31209404

ABSTRACT

Lupus nephritis is a potentially fatal autoimmune disease for which the current treatment is ineffective and often toxic. To develop mechanistic hypotheses of disease, we analyzed kidney samples from patients with lupus nephritis and from healthy control subjects using single-cell RNA sequencing. Our analysis revealed 21 subsets of leukocytes active in disease, including multiple populations of myeloid cells, T cells, natural killer cells and B cells that demonstrated both pro-inflammatory responses and inflammation-resolving responses. We found evidence of local activation of B cells correlated with an age-associated B-cell signature and evidence of progressive stages of monocyte differentiation within the kidney. A clear interferon response was observed in most cells. Two chemokine receptors, CXCR4 and CX3CR1, were broadly expressed, implying a potentially central role in cell trafficking. Gene expression of immune cells in urine and kidney was highly correlated, which would suggest that urine might serve as a surrogate for kidney biopsies.


Subjects
Kidney/immunology; Lupus Nephritis/immunology; Biomarkers; Biopsy; Cluster Analysis; Computational Biology/methods; Epithelial Cells/metabolism; Flow Cytometry; Gene Expression Profiling; Gene Expression Regulation; Humans; Immunophenotyping; Interferons/metabolism; Kidney/metabolism; Kidney/pathology; Leukocytes/immunology; Leukocytes/metabolism; Lupus Nephritis/genetics; Lupus Nephritis/metabolism; Lupus Nephritis/pathology; Lymphocytes/immunology; Lymphocytes/metabolism; Molecular Sequence Annotation; Myeloid Cells/immunology; Myeloid Cells/metabolism; Single-Cell Analysis; Transcriptome
11.
Cell; 163(6): 1539-54, 2015 Dec 03.
Article in English | MEDLINE | ID: mdl-26638078

ABSTRACT

Lifespan is a remarkably diverse trait ranging from a few days to several hundred years in nature, but the mechanisms underlying the evolution of lifespan differences remain elusive. Here we de novo assemble a reference genome for the naturally short-lived African turquoise killifish, providing a unique resource for comparative and experimental genomics. The identification of genes under positive selection in this fish reveals potential candidates to explain its compressed lifespan. Several aging genes are under positive selection in this short-lived fish and long-lived species, raising the intriguing possibility that the same gene could underlie evolution of both compressed and extended lifespans. Comparative genomics and linkage analysis identify candidate genes associated with lifespan differences between various turquoise killifish strains. Remarkably, these genes are clustered on the sex chromosome, suggesting that short lifespan might have co-evolved with sex determination. Our study provides insights into the evolutionary forces that shape lifespan in nature.


Subjects
Biological Evolution; Killifishes/genetics; Aging; Animals; DNA Helicases/genetics; Genome; Humans; Longevity; Molecular Sequence Annotation; Molecular Sequence Data; Selection, Genetic
12.
Nature; 632(8023): 166-173, 2024 Aug.
Article in English | MEDLINE | ID: mdl-39020176

ABSTRACT

Gene expression in Arabidopsis is regulated by more than 1,900 transcription factors (TFs), which have been identified genome-wide by the presence of well-conserved DNA-binding domains. Activator TFs contain activation domains (ADs) that recruit coactivator complexes; however, for nearly all Arabidopsis TFs, we lack knowledge about the presence, location and transcriptional strength of their ADs (ref. 1). To address this gap, here we use a yeast library approach to experimentally identify Arabidopsis ADs on a proteome-wide scale, and find that more than half of the Arabidopsis TFs contain an AD. We annotate 1,553 ADs, the vast majority of which are, to our knowledge, previously unknown. Using the dataset generated, we develop a neural network to accurately predict ADs and to identify sequence features that are necessary to recruit coactivator complexes. We uncover six distinct combinations of sequence features that result in activation activity, providing a framework to interrogate the subfunctionalization of ADs. Furthermore, we identify ADs in the ancient AUXIN RESPONSE FACTOR family of TFs, revealing that AD positioning is conserved in distinct clades. Our findings provide a deep resource for understanding transcriptional activation, a framework for examining function in intrinsically disordered regions and a predictive model of ADs.


Subjects
Arabidopsis Proteins; Arabidopsis; Gene Expression Regulation, Plant; Protein Domains; Transcription Factors; Transcriptional Activation; Arabidopsis/chemistry; Arabidopsis/genetics; Arabidopsis/metabolism; Arabidopsis Proteins/chemistry; Arabidopsis Proteins/classification; Arabidopsis Proteins/metabolism; Conserved Sequence/genetics; Datasets as Topic; Gene Expression Regulation, Plant/genetics; Indoleacetic Acids/metabolism; Intrinsically Disordered Proteins; Molecular Sequence Annotation; Neural Networks, Computer; Proteome/chemistry; Proteome/metabolism; Transcription Factors/chemistry; Transcription Factors/classification; Transcription Factors/metabolism; Transcriptional Activation/genetics
13.
Nat Immunol; 18(10): 1160-1172, 2017 Oct.
Article in English | MEDLINE | ID: mdl-28783152

ABSTRACT

Regulatory T cells (Treg cells) perform two distinct functions: they maintain self-tolerance, and they support organ homeostasis by differentiating into specialized tissue Treg cells. We found that epigenetic modifications defined the molecular characteristics of tissue Treg cells. Tagmentation-based whole-genome bisulfite sequencing revealed more than 11,000 regions that were methylated differentially in pairwise comparisons of tissue Treg cell populations and lymphoid T cells. Similarities in the epigenetic landscape led to the identification of a common tissue Treg cell population that was present in many organs and was characterized by gain and loss of DNA methylation that included many gene sites associated with the TH2 subset of helper T cells, such as the gene encoding cytokine IL-33 receptor ST2, as well as the production of tissue-regenerative factors. Furthermore, the ST2-expressing population was dependent on the transcriptional regulator BATF and could be expanded by IL-33. Thus, tissue Treg cells integrate multiple waves of epigenetic reprogramming that define their tissue-restricted specialization.


Subjects
DNA Methylation; Genome-Wide Association Study; T-Lymphocytes, Regulatory/metabolism; Animals; Biomarkers; Cluster Analysis; Computational Biology/methods; CpG Islands; Epigenesis, Genetic; Gene Expression Profiling; Gene Expression Regulation; Gene Ontology; High-Throughput Nucleotide Sequencing; Immunophenotyping; Mice; Mice, Transgenic; Molecular Sequence Annotation; Organ Specificity/genetics; Organ Specificity/immunology; Promoter Regions, Genetic; Th2 Cells/metabolism; Transcription Initiation Site; Transcriptome
14.
Nature; 622(7981): 41-47, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37794265

ABSTRACT

Scientists have been trying to identify every gene in the human genome since the initial draft was published in 2001. In the years since, much progress has been made in identifying protein-coding genes, currently estimated to number fewer than 20,000, with an ever-expanding number of distinct protein-coding isoforms. Here we review the status of the human gene catalogue and the efforts to complete it in recent years. Beside the ongoing annotation of protein-coding genes, their isoforms and pseudogenes, the invention of high-throughput RNA sequencing and other technological breakthroughs have led to a rapid growth in the number of reported non-coding RNA genes. For most of these non-coding RNAs, the functional relevance is currently unclear; we look at recent advances that offer paths forward to identifying their functions and towards eventually completing the human gene catalogue. Finally, we examine the need for a universal annotation standard that includes all medically significant genes and maintains their relationships with different reference genomes for the use of the human gene catalogue in clinical settings.


Subjects
Genes; Genome, Human; Molecular Sequence Annotation; Protein Isoforms; Humans; Genome, Human/genetics; Molecular Sequence Annotation/standards; Molecular Sequence Annotation/trends; Protein Isoforms/genetics; Human Genome Project; Pseudogenes; RNA/genetics
15.
Nature; 622(7983): 646-653, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37704037

ABSTRACT

We are now entering a new era in protein sequence and structure annotation, with hundreds of millions of predicted protein structures made available through the AlphaFold database (ref. 1). These models cover nearly all proteins that are known, including those challenging to annotate for function or putative biological role using standard homology-based approaches. In this study, we examine the extent to which the AlphaFold database has structurally illuminated this 'dark matter' of the natural protein universe at high predicted accuracy. We further describe the protein diversity that these models cover as an annotated interactive sequence similarity network, accessible at https://uniprot3d.org/atlas/AFDB90v4. By searching for novelties from sequence, structure and semantic perspectives, we uncovered the β-flower fold, added several protein families to the Pfam database (ref. 2) and experimentally demonstrated that one of these belongs to a new superfamily of translation-targeting toxin-antitoxin systems, TumE-TumA. This work underscores the value of large-scale efforts in identifying, annotating and prioritizing new protein families. By leveraging the recent deep learning revolution in protein bioinformatics, we can now shed light into uncharted areas of the protein universe at an unprecedented scale, paving the way to innovations in life sciences and biotechnology.


Subjects
Databases, Protein; Deep Learning; Molecular Sequence Annotation; Protein Folding; Proteins; Structural Homology, Protein; Amino Acid Sequence; Internet; Proteins/chemistry; Proteins/classification; Proteins/metabolism
16.
Nature; 622(7983): 637-645, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37704730

ABSTRACT

Proteins are key to all cellular processes and their structure is important in understanding their function and evolution. Sequence-based predictions of protein structures have increased in accuracy (ref. 1), and over 214 million predicted structures are available in the AlphaFold database (ref. 2). However, studying protein structures at this scale requires highly efficient methods. Here, we developed a structural-alignment-based clustering algorithm, Foldseek cluster, that can cluster hundreds of millions of structures. Using this method, we have clustered all of the structures in the AlphaFold database, identifying 2.30 million non-singleton structural clusters, of which 31% lack annotations, representing probable previously undescribed structures. Clusters without annotation tend to have few representatives, covering only 4% of all proteins in the AlphaFold database. Evolutionary analysis suggests that most clusters are ancient in origin but 4% seem to be species specific, representing lower-quality predictions or examples of de novo gene birth. We also show how structural comparisons can be used to predict domain families and their relationships, identifying examples of remote structural similarity. On the basis of these analyses, we identify several examples of human immune-related proteins with putative remote homology in prokaryotic species, illustrating the value of this resource for studying protein function and evolution across the tree of life.


Subjects
Algorithms; Cluster Analysis; Proteins; Structural Homology, Protein; Humans; Databases, Protein; Proteins/chemistry; Proteins/classification; Proteins/metabolism; Sequence Alignment; Molecular Sequence Annotation; Prokaryotic Cells/chemistry; Phylogeny; Species Specificity; Evolution, Molecular
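
As a rough illustration of the clustering idea described in entry 16, the Python sketch below groups items by greedy, representative-based clustering and then separates non-singleton clusters from singletons. It is not the Foldseek cluster algorithm: the `folds` fingerprints, the `similarity` function, and the 0.8 threshold are hypothetical stand-ins for real structural-alignment scores.

```python
def similarity(a, b):
    """Fraction of matching positions between two equal-length strings
    (a toy stand-in for a structural-alignment score)."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(names, fingerprints, threshold):
    """Assign each item to the first representative it is similar enough to;
    an item that matches no representative founds a new cluster."""
    clusters = {}  # representative name -> list of member names
    for name in names:
        for rep in clusters:
            if similarity(fingerprints[rep], fingerprints[name]) >= threshold:
                clusters[rep].append(name)
                break
        else:
            clusters[name] = [name]
    return clusters


# Hypothetical per-residue fingerprints (H = helix, E = strand, C = coil).
folds = {
    "p1": "HHHHEEEE",
    "p2": "HHHHEEEC",
    "p3": "EEEECCCC",
    "p4": "EEEECCCH",
    "p5": "CCCCCCCC",
}

clusters = greedy_cluster(sorted(folds), folds, threshold=0.8)
non_singletons = {rep: m for rep, m in clusters.items() if len(m) > 1}
singletons = [rep for rep, m in clusters.items() if len(m) == 1]
print(non_singletons)
print(singletons)
```

Greedy assignment depends on input order, so the sketch sorts the names first to keep the result reproducible.
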
17.
Nature; 604(7905): 310-315, 2022 04.
Article in English | MEDLINE | ID: mdl-35388217

ABSTRACT

Comprehensive genome annotation is essential to understand the impact of clinically relevant variants. However, the absence of a standard for clinical reporting and browser display complicates the process of consistent interpretation and reporting. To address these challenges, Ensembl/GENCODE (ref. 1) and RefSeq (ref. 2) launched a joint initiative, the Matched Annotation from NCBI and EMBL-EBI (MANE) collaboration, to converge on human gene and transcript annotation and to jointly define a high-value set of transcripts and corresponding proteins. Here, we describe the MANE transcript sets for use as universal standards for variant reporting and browser display. The MANE Select set identifies a representative transcript for each human protein-coding gene, whereas the MANE Plus Clinical set provides additional transcripts at loci where the Select transcripts alone are not sufficient to report all currently known clinical variants. Each MANE transcript represents an exact match between the exonic sequences of an Ensembl/GENCODE transcript and its counterpart in RefSeq such that the identifiers can be used synonymously. We have now released MANE Select transcripts for 97% of human protein-coding genes, including all American College of Medical Genetics and Genomics Secondary Findings list v3.0 (ref. 3) genes. MANE transcripts are accessible from major genome browsers and key resources. Widespread adoption of these transcript sets will increase the consistency of reporting, facilitate the exchange of data regardless of the annotation source and help to streamline clinical interpretation.


Subjects
Computational Biology; Databases, Genetic; Genomics; Genome; Humans; Information Dissemination; Molecular Sequence Annotation; National Library of Medicine (U.S.); United States
18.
Mol Cell; 79(3): 504-520.e9, 2020 08 06.
Article in English | MEDLINE | ID: mdl-32707033

ABSTRACT

Protein kinases are essential for signal transduction and control of most cellular processes, including metabolism, membrane transport, motility, and cell cycle. Despite the critical role of kinases in cells and their strong association with diseases, good coverage of their interactions is available for only a fraction of the 535 human kinases. Here, we present a comprehensive mass-spectrometry-based analysis of a human kinase interaction network covering more than 300 kinases. The interaction dataset is a high-quality resource with more than 5,000 previously unreported interactions. We extensively characterized the obtained network and were able to identify previously described, as well as predict new, kinase functional associations, including those of the less well-studied kinases PIM3 and protein O-mannose kinase (POMK). Importantly, the presented interaction map is a valuable resource for assisting biomedical studies. We uncover dozens of kinase-disease associations spanning from genetic disorders to complex diseases, including cancer.


Subjects
Gene Regulatory Networks; Genetic Diseases, Inborn/genetics; Neoplasms/genetics; Protein Kinases/genetics; Protein Serine-Threonine Kinases/genetics; Proto-Oncogene Proteins/genetics; Computational Biology/methods; Datasets as Topic; Gene Expression Regulation; Gene Ontology; Genetic Diseases, Inborn/enzymology; Genetic Diseases, Inborn/pathology; Humans; Metabolic Networks and Pathways/genetics; Molecular Sequence Annotation; Muscular Dystrophies/enzymology; Muscular Dystrophies/genetics; Muscular Dystrophies/pathology; Neoplasms/enzymology; Neoplasms/pathology; Neurodegenerative Diseases/enzymology; Neurodegenerative Diseases/genetics; Neurodegenerative Diseases/pathology; Protein Interaction Mapping/methods; Protein Kinases/chemistry; Protein Kinases/classification; Protein Kinases/metabolism; Protein Serine-Threonine Kinases/chemistry; Protein Serine-Threonine Kinases/metabolism; Proto-Oncogene Proteins/chemistry; Proto-Oncogene Proteins/metabolism; Signal Transduction
19.
Genome Res; 34(5): 757-768, 2024 06 25.
Article in English | MEDLINE | ID: mdl-38866548

ABSTRACT

Large-scale genomic initiatives, such as the Earth BioGenome Project, require efficient methods for eukaryotic genome annotation. Here we present an automatic gene finder, GeneMark-ETP, integrating genomic-, transcriptomic-, and protein-derived evidence that has been developed with a focus on large plant and animal genomes. GeneMark-ETP first identifies genomic loci where extrinsic data are sufficient for making gene predictions with "high confidence." The genes situated in the genomic space between the high-confidence genes are predicted in the next stage. The set of high-confidence genes serves as an initial training set for the statistical model. Further on, the model parameters are iteratively updated in the rounds of gene prediction and parameter re-estimation. Upon reaching convergence, GeneMark-ETP makes the final predictions and delivers the whole complement of predicted genes. GeneMark-ETP outperforms gene finders using a single type of extrinsic evidence. Comparisons with gene finders MAKER2 and TSEBRA, those that use both transcript- and protein-derived extrinsic evidence, show that GeneMark-ETP delivers state-of-the-art gene-prediction accuracy, with the margin of outperforming existing approaches increasing in its application to larger and more complex eukaryotic genomes.


Subjects
Molecular Sequence Annotation; Molecular Sequence Annotation/methods; Animals; Software; Genome; Genomics/methods; Eukaryota/genetics; Algorithms
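
The train, predict, and re-estimate cycle summarized in entry 19 can be sketched with a deliberately tiny toy model. This is not GeneMark-ETP's statistical model: the one-dimensional `scores`, the two class means used as "parameters", and the function names are all hypothetical, chosen only to show how a high-confidence seed set initializes a model that is then refined until its predictions stop changing.

```python
from statistics import mean

CLASSES = ("gene", "background")


def estimate_means(scores, labels):
    """Re-estimate the toy model: one mean score per class, from labeled items only."""
    return {
        cls: mean(s for s, lab in zip(scores, labels) if lab == cls)
        for cls in CLASSES
    }


def nearest_class(score, params):
    """Predict the class whose mean is closest to the score."""
    return min(params, key=lambda cls: abs(score - params[cls]))


def self_train(scores, seeds, max_rounds=50):
    """Seed the model with high-confidence labels, then alternate prediction
    and parameter re-estimation until the labels converge."""
    params = estimate_means(scores, seeds)  # initial training set: seeds only
    labels = None
    for round_no in range(1, max_rounds + 1):
        # Prediction step: seeds keep their label, the rest are predicted.
        new_labels = [
            seed if seed is not None else nearest_class(score, params)
            for score, seed in zip(scores, seeds)
        ]
        if new_labels == labels:  # convergence: nothing changed this round
            return labels, params, round_no
        labels = new_labels
        params = estimate_means(scores, labels)  # re-estimation step
    return labels, params, max_rounds


# Hypothetical evidence scores; None marks loci without high-confidence support.
scores = [0.95, 0.90, 0.60, 0.40, 0.10, 0.05]
seeds = ["gene", "gene", None, None, "background", "background"]

labels, params, rounds = self_train(scores, seeds)
print(labels, rounds)
```

The seed set must contain at least one example of each class; otherwise the initial estimate has nothing to average.
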
20.
Genome Res; 34(5): 769-777, 2024 06 25.
Article in English | MEDLINE | ID: mdl-38866550

ABSTRACT

Gene prediction has remained an active area of bioinformatics research for a long time. Still, gene prediction in large eukaryotic genomes presents a challenge that must be addressed by new algorithms. The amount and significance of the evidence available from transcriptomes and proteomes vary across genomes, between genes, and even along a single gene. User-friendly and accurate annotation pipelines that can cope with such data heterogeneity are needed. The previously developed annotation pipelines BRAKER1 and BRAKER2 use RNA-seq or protein data, respectively, but not both. A further significant performance improvement integrating all three data types was made by the recently released GeneMark-ETP. We here present the BRAKER3 pipeline that builds on GeneMark-ETP and AUGUSTUS, and further improves accuracy using the TSEBRA combiner. BRAKER3 annotates protein-coding genes in eukaryotic genomes using both short-read RNA-seq and a large protein database, along with statistical models learned iteratively and specifically for the target genome. We benchmarked the new pipeline on genomes of 11 species under an assumed level of relatedness of the target species proteome to available proteomes. BRAKER3 outperforms BRAKER1 and BRAKER2. The transcript-level F1-score is increased by about 20 percentage points on average, and the difference is most pronounced for species with large and complex genomes. BRAKER3 also outperforms other existing tools, MAKER2, Funannotate, and FINDER. The code of BRAKER3 is available on GitHub and as a ready-to-run Docker container for execution with Docker or Singularity. Overall, BRAKER3 is an accurate, easy-to-use tool for eukaryotic genome annotation.


Subjects
Molecular Sequence Annotation; Software; Molecular Sequence Annotation/methods; Humans; RNA-Seq/methods; Algorithms; Animals; Genome; Computational Biology/methods; Genomics/methods; Transcriptome
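
Entry 20 reports accuracy as a transcript-level F1-score and a gain in percentage points. The sketch below shows how such a score could be computed, assuming that a predicted transcript counts as correct only when its chromosome, strand, and exon coordinates exactly match a reference transcript. That matching rule, the coordinates, and the two prediction sets are hypothetical and are not taken from the BRAKER3 benchmark.

```python
def transcript_f1(predicted, reference):
    """F1 = harmonic mean of precision and recall over exactly matching transcripts."""
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)


# A transcript is (chromosome, strand, tuple of (start, end) exons).
t1 = ("chr1", "+", ((100, 200), (300, 400)))
t2 = ("chr1", "+", ((100, 200), (350, 400)))
t3 = ("chr1", "-", ((900, 1000),))
t4 = ("chr2", "+", ((50, 150), (250, 300)))
reference = {t1, t2, t3, t4}

# Hypothetical tool A: three predictions, one with a shifted exon boundary.
pred_a = {t1, t3, ("chr2", "+", ((50, 150), (260, 300)))}
# Hypothetical tool B: four predictions, one with a shifted exon boundary.
pred_b = {t1, t2, t3, ("chr2", "+", ((60, 150), (250, 300)))}

f1_a = transcript_f1(pred_a, reference)
f1_b = transcript_f1(pred_b, reference)
gain_points = (f1_b - f1_a) * 100  # difference in percentage points
print(f"A: {f1_a:.3f}  B: {f1_b:.3f}  gain: {gain_points:.1f} points")
```

A gain in percentage points is the plain difference between two scores expressed in percent, not a relative improvement.
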