Results 1 - 14 of 14
1.
Nature ; 626(7998): 377-384, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38109938

ABSTRACT

Many of the Earth's microbes remain uncultured and understudied, limiting our understanding of the functional and evolutionary aspects of their genetic material, which remain largely overlooked in most metagenomic studies [1]. Here we analysed 149,842 environmental genomes from multiple habitats [2-6] and compiled a curated catalogue of 404,085 functionally and evolutionarily significant novel (FESNov) gene families exclusive to uncultivated prokaryotic taxa. All FESNov families span multiple species, exhibit strong signals of purifying selection and qualify as new orthologous groups, thus nearly tripling the number of bacterial and archaeal gene families described to date. The FESNov catalogue is enriched in clade-specific traits, including 1,034 novel families that can distinguish entire uncultivated phyla, classes and orders, probably representing synapomorphies that facilitated their evolutionary divergence. Using genomic context analysis and structural alignments, we predicted functional associations for 32.4% of FESNov families, including 4,349 high-confidence associations with important biological processes. These predictions provide a valuable hypothesis-driven framework that we used for the experimental validation of a new gene family involved in cell motility and a novel set of antimicrobial peptides. We also demonstrate that the relative abundance profiles of novel families can discriminate between environments and clinical conditions, leading to the discovery of potentially new biomarkers associated with colorectal cancer. We expect this work to enhance future metagenomics studies and expand our knowledge of the genetic repertoire of uncultivated organisms.


Subjects
Archaea , Bacteria , Ecosystem , Evolution, Molecular , Genes, Archaeal , Genes, Bacterial , Genomics , Knowledge , Antimicrobial Peptides/genetics , Archaea/classification , Archaea/genetics , Bacteria/classification , Bacteria/genetics , Biomarkers , Cell Movement/genetics , Colorectal Neoplasms/genetics , Genomics/methods , Genomics/trends , Metagenomics/trends , Multigene Family , Phylogeny , Reproducibility of Results
2.
Nucleic Acids Res ; 51(D1): D389-D394, 2023 01 06.
Article in English | MEDLINE | ID: mdl-36399505

ABSTRACT

The eggNOG (evolutionary gene genealogy Non-supervised Orthologous Groups) database is a bioinformatics resource providing orthology data and comprehensive functional information for organisms from all domains of life. Here, we present a major update of the database and website (version 6.0), which increases the number of covered organisms to 12 535 reference species, expands functional annotations, and implements new functionality. In total, eggNOG 6.0 provides a hierarchy of over 17M orthologous groups (OGs) computed at 1601 taxonomic levels, spanning 10 756 bacterial, 457 archaeal and 1322 eukaryotic organisms. OGs have been thoroughly annotated using recent knowledge from functional databases, including KEGG, Gene Ontology, UniProtKB, BiGG, CAZy, CARD, PFAM and SMART. eggNOG also offers phylogenetic trees for all OGs, maximising utility and versatility for end users while allowing researchers to investigate the evolutionary history of speciation and duplication events as well as the phylogenetic distribution of functional terms within each OG. Furthermore, the eggNOG 6.0 website contains new functionality to mine orthology and functional data with ease, including the possibility of generating phylogenetic profiles for multiple OGs across species or identifying single-copy OGs at custom taxonomic levels. eggNOG 6.0 is available at http://eggnog6.embl.de.


Subjects
Databases, Genetic , Genomics , Phylogeny , Computational Biology , Eukaryota/genetics
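Identifying single-copy OGs at a chosen taxonomic level, one of the website features described above, amounts to keeping the groups in which every selected species contributes exactly one member. A minimal Python sketch, assuming OG membership is available as a hypothetical mapping from OG identifier to (species, protein) pairs; it applies a strict one-copy rule and is not the eggNOG 6.0 web interface or its API.

```python
from collections import Counter


def single_copy_ogs(og_members, species):
    """Return OGs in which each species in `species` has exactly one member (hypothetical helper)."""
    selected = []
    for og, members in og_members.items():
        copies = Counter(sp for sp, _protein in members)  # number of members per species
        if all(copies[sp] == 1 for sp in species):        # missing species count as 0
            selected.append(og)
    return sorted(selected)


# Hypothetical toy data: OG2 has a duplication in sp1, OG3 is missing from sp2.
og_members = {
    "OG1": [("sp1", "p1"), ("sp2", "p2")],
    "OG2": [("sp1", "p3"), ("sp1", "p4"), ("sp2", "p5")],
    "OG3": [("sp1", "p6")],
}
print(single_copy_ogs(og_members, ["sp1", "sp2"]))  # ['OG1']
```

Changing the species list changes the answer: restricted to sp2 alone, both OG1 and OG2 qualify, which mirrors how the choice of taxonomic level affects which OGs count as single-copy.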
3.
Methods Mol Biol ; 2512: 121-152, 2022.
Article in English | MEDLINE | ID: mdl-35818004

ABSTRACT

The pangenome of a species is the sum of the genomes of its individuals. As coding sequences often represent only a small fraction of each genome, analyzing the pangene set can be a cost-effective strategy for plants with large genomes or highly heterozygous species. Here, we describe a step-by-step protocol to analyze plant pangene sets with the software GET_HOMOLOGUES-EST. After a short introduction, where the main concepts are illustrated, the remaining sections cover the installation and typical operations required to analyze and annotate pantranscriptomes and gene sets of plants. The recipes include instructions on how to call core and accessory genes, how to compute a presence-absence pangenome matrix, and how to identify and analyze private genes, present only in some genotypes. Downstream phylogenetic analyses are also discussed.


Subjects
Software , Humans , Phylogeny
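The core/accessory classification and the presence-absence matrix described in this protocol can be sketched in a few lines of Python. This is a minimal illustration, not the GET_HOMOLOGUES-EST interface: it assumes clusters have already been computed and are given as a hypothetical mapping from cluster name to the set of genotypes carrying it, and it assumes "private" clusters are those found in every genotype of a chosen group and in no other genotype.

```python
def presence_absence_matrix(clusters, genotypes):
    """Return {cluster: [1/0 per genotype]} with columns in the order of `genotypes`."""
    return {name: [1 if g in carriers else 0 for g in genotypes]
            for name, carriers in clusters.items()}


def core_and_accessory(clusters, genotypes):
    """Core = present in all genotypes; accessory = absent from at least one."""
    all_g = set(genotypes)
    core = sorted(n for n, carriers in clusters.items() if all_g <= carriers)
    accessory = sorted(n for n, carriers in clusters.items() if not all_g <= carriers)
    return core, accessory


def private_to(clusters, genotypes, group):
    """Clusters present in every genotype of `group` and absent from all others (assumed definition)."""
    group = set(group)
    others = set(genotypes) - group
    return sorted(n for n, carriers in clusters.items()
                  if group <= carriers and not (carriers & others))


# Hypothetical toy data: three genotypes, four clusters.
genotypes = ["G1", "G2", "G3"]
clusters = {
    "c1": {"G1", "G2", "G3"},
    "c2": {"G1", "G2"},
    "c3": {"G3"},
    "c4": {"G1", "G3"},
}
print(core_and_accessory(clusters, genotypes))             # (['c1'], ['c2', 'c3', 'c4'])
print(private_to(clusters, genotypes, ["G3"]))             # ['c3']
print(presence_absence_matrix(clusters, genotypes)["c2"])  # [1, 1, 0]
```

Storing each cluster as a set of carriers keeps the core test a single subset check; the 0/1 matrix is derived from the same data only when a tabular view is needed.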
4.
Nucleic Acids Res ; 50(W1): W577-W582, 2022 07 05.
Article in English | MEDLINE | ID: mdl-35544233

ABSTRACT

Phylogenomics data have grown exponentially over the last decades. It is currently common for genome-wide projects to generate hundreds or even thousands of phylogenetic trees and multiple sequence alignments, which may also be very large in size. However, the analysis and interpretation of such data still depends on custom bioinformatic and visualisation workflows that are largely unattainable for non-expert users. Here, we present PhyloCloud, an online platform aimed at hosting, indexing and exploring large phylogenetic tree collections, also providing seamless access to common analyses and operations, such as node annotation, searching, topology editing, automatic tree rooting, orthology detection and more. In addition, PhyloCloud provides quick access to tools that allow users to build their own phylogenies using fast predefined workflows, graphically compare tree topologies, or query taxonomic databases such as NCBI or GTDB. Finally, PhyloCloud offers a novel tree visualisation system based on ETE Toolkit v4.0, which can be used to explore very large trees and enhance them with custom annotations and multiple sequence alignments. The platform allows for sharing tree collections and specific tree views via private links, or making them fully public, also serving as a repository of phylogenomic data. PhyloCloud is available at https://phylocloud.cgmlab.org.


Subjects
Computational Biology , Genome , Phylogeny , Sequence Alignment , Databases, Genetic
5.
Nature ; 601(7892): 252-256, 2022 01.
Article in English | MEDLINE | ID: mdl-34912116

ABSTRACT

Microbial genes encode the majority of the functional repertoire of life on Earth. However, despite increasing efforts in metagenomic sequencing of various habitats [1-3], little is known about the distribution of genes across the global biosphere, with implications for human and planetary health. Here we constructed a non-redundant gene catalogue of 303 million species-level genes (clustered at 95% nucleotide identity) from 13,174 publicly available metagenomes across 14 major habitats and used it to show that most genes are specific to a single habitat. The small fraction of genes found in multiple habitats is enriched in antibiotic-resistance genes and markers for mobile genetic elements. By further clustering these species-level genes into 32 million protein families, we observed that a small fraction of these families contain the majority of the genes (0.6% of families account for 50% of the genes). The majority of species-level genes and protein families are rare. Furthermore, species-level genes, and in particular the rare ones, show low rates of positive (adaptive) selection, supporting a model in which most genetic variability observed within each protein family is neutral or nearly neutral.


Subjects
Metagenome , Metagenomics , Anti-Bacterial Agents/pharmacology , Drug Resistance, Microbial , Ecosystem , Humans , Metagenome/genetics
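The skew reported in this abstract (0.6% of families holding 50% of the genes) is the smallest fraction of families, taken from largest to smallest, whose sizes add up to half of all genes. A minimal Python sketch of that calculation, assuming family sizes are available as a plain list of gene counts; the function name is hypothetical and the toy numbers are invented for illustration, not taken from the study.

```python
def fraction_of_families_for_half(family_sizes):
    """Smallest fraction of families (largest first) that together hold >= 50% of genes."""
    if not family_sizes:
        raise ValueError("need at least one family")
    sizes = sorted(family_sizes, reverse=True)
    total = sum(sizes)
    running = 0
    for i, size in enumerate(sizes, start=1):
        running += size
        if 2 * running >= total:  # integer comparison avoids float rounding at exactly 50%
            return i / len(sizes)
    return 1.0  # not reached for non-negative sizes; kept as a safe fallback


# Toy long-tailed distribution: one large family and 999 singletons.
print(fraction_of_families_for_half([1000] + [1] * 999))  # 0.001
```

A uniform distribution gives 0.5, so values far below 0.5 indicate that a few large families dominate the gene count.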
6.
Science ; 374(6568): 717-723, 2021 Nov 05.
Article in English | MEDLINE | ID: mdl-34735222

ABSTRACT

The evolutionary origin of metazoan cell types such as neurons and muscles is not known. Using whole-body single-cell RNA sequencing in a sponge, an animal without a nervous system or musculature, we identified 18 distinct cell types. These include nitric oxide-sensitive contractile pinacocytes, amoeboid phagocytes, and secretory neuroid cells that reside in close contact with digestive choanocytes that express scaffolding and receptor proteins. Visualizing neuroid cells by correlative x-ray and electron microscopy revealed secretory vesicles and cellular projections enwrapping choanocyte microvilli and cilia. Our data show a communication system that is organized around sponge digestive chambers, using conserved modules that became incorporated into the pre- and postsynapse in the nervous systems of other animals.


Subjects
Biological Evolution , Porifera/cytology , Animals , Cell Communication , Cell Surface Extensions/ultrastructure , Cilia/physiology , Cilia/ultrastructure , Digestive System/cytology , Mesoderm/cytology , Nervous System/cytology , Nervous System Physiological Phenomena , Nitric Oxide/metabolism , Porifera/genetics , Porifera/metabolism , RNA-Seq , Secretory Vesicles/ultrastructure , Signal Transduction , Single-Cell Analysis , Transcriptome
7.
Mol Biol Evol ; 38(12): 5825-5829, 2021 12 09.
Article in English | MEDLINE | ID: mdl-34597405

ABSTRACT

Even though automated functional annotation of genes represents a fundamental step in most genomic and metagenomic workflows, it remains challenging at large scales. Here, we describe a major upgrade to eggNOG-mapper, a tool for functional annotation based on precomputed orthology assignments, now optimized for vast (meta)genomic data sets. Improvements in version 2 include a full update of both the genomes and functional databases to those from eggNOG v5, as well as several efficiency enhancements and new features. Most notably, eggNOG-mapper v2 now allows for: 1) de novo gene prediction from raw contigs, 2) built-in pairwise orthology prediction, 3) fast protein domain discovery, and 4) automated GFF decoration. eggNOG-mapper v2 is available as a standalone tool or as an online service at http://eggnog-mapper.embl.de.


Subjects
Databases, Genetic , Metagenomics , Genomics , Metagenome , Molecular Sequence Annotation , Phylogeny , Software
8.
Front Plant Sci ; 10: 434, 2019.
Article in English | MEDLINE | ID: mdl-31031782

ABSTRACT

The Spanish Barley Core Collection (SBCC) is a source of genetic variability of potential interest for breeding, particularly for adaptation to Mediterranean environments. Two backcross populations (BC2F5) were developed using the elite cultivar Cierzo as the recurrent parent. The donor parents, namely SBCC042 and SBCC073, were selected from the SBCC lines due to their outstanding yield in drought environments. Flowering time, yield and drought-related traits were evaluated in two field trials in Zaragoza (Spain) during the 2014-15 and 2015-16 seasons and validated in the 2017-18 season. Two hundred sixty-four lines of each population were genotyped with the Barley Illumina iSelect 50k SNP chip. Genetic maps for each population were generated. The map for SBCC042 × Cierzo contains 12,893 SNPs distributed in 9 linkage groups. The map for SBCC073 × Cierzo includes 12,026 SNPs in 7 linkage groups. Both populations shared two QTL hotspots. There are QTLs for flowering time, thousand-kernel weight (TKW), and hectoliter weight on a segment of 23 Mb at ~515 Mb on chromosome 1H, which encompasses the HvFT3 gene. In both populations, flowering was accelerated by the landrace allele, which also increased the TKW. In the same region, better soil coverage was contributed by SBCC042 but coincident with a lower hectoliter weight. The second large hotspot was on chromosome 6H and contained QTLs with wide intervals for grain yield, plant height and TKW. Landrace alleles contributed to increased plant height and TKW and reduced grain yield. Only SBCC042 contributed favorable alleles for "green area," with three significant QTLs that increased ground coverage after winter, which might be exploited as an adaptive trait of this landrace. Some genes of interest found in or very close to the peaks of the QTLs are highlighted. Strategies to deploy the QTLs found for breeding and pre-breeding are proposed.

9.
Mol Ecol ; 28(8): 1994-2012, 2019 04.
Article in English | MEDLINE | ID: mdl-30614595

ABSTRACT

Landraces are local populations of crop plants adapted to a particular environment. Extant landraces are surviving genetic archives, keeping signatures of the selection processes experienced by them until settling in their current niches. This study intends to establish relationships between genetic diversity of barley (Hordeum vulgare L.) landraces collected in Spain and the climate of their collection sites. A high-resolution climatic data set (5 × 5 km spatial, 1-day temporal grid) was computed from over 2,000 temperature and 7,000 precipitation stations across peninsular Spain. This data set, spanning the period 1981-2010, was used to derive agroclimatic variables meaningful for cereal production at the collection sites of 135 barley landraces. Variables summarize temperature, precipitation, evapotranspiration, potential vernalization and frost probability at different times of the year and time scales (season and month). SNP genotyping of the landraces was carried out combining Illumina Infinium assays and genotyping-by-sequencing, yielding 9,920 biallelic markers (7,479 with position on the barley reference genome). The association of these SNPs with agroclimatic variables was analysed at two levels of genetic diversity, with and without taking into account population structure. The whole data sets and analysis pipelines are documented and available at https://eead-csic-compbio.github.io/barley-agroclimatic-association. We found differential adaptation of the germplasm groups identified to be dominated by reactions to cold temperature and late-season frost occurrence, as well as to water availability. Several significant associations pointing at specific adaptations to agroclimatic features related to temperature and water availability were observed, and candidate genes underlying some of the main regions are proposed.


Subjects
Adaptation, Physiological/genetics , Climate , Hordeum/genetics , Selection, Genetic/genetics , Environment , Europe , Genetic Variation/genetics , Genome, Plant/genetics , Genotype , Hordeum/growth & development , Microsatellite Repeats/genetics , Phenotype , Seasons , Spain
10.
New Phytol ; 218(4): 1631-1644, 2018 06.
Article in English | MEDLINE | ID: mdl-29206296

ABSTRACT

Few pan-genomic studies have been conducted in plants, and none of them have focused on the intraspecific diversity and evolution of their plastid genomes. We address this issue in Brachypodium distachyon and its close relatives B. stacei and B. hybridum, for which a large genomic data set has been compiled. We analyze inter- and intraspecific plastid comparative genomics and phylogenomic relationships within a family-wide framework. Major indel differences were detected between Brachypodium plastomes. Within B. distachyon, we detected two main lineages, a mostly Extremely Delayed Flowering (EDF+) clade and a mostly Spanish (S+) - Turkish (T+) clade, plus nine chloroplast capture and two plastid DNA (ptDNA) introgression and micro-recombination events. Early Oligocene (30.9 million yr ago (Ma)) and Late Miocene (10.1 Ma) divergence times were inferred for the respective stem and crown nodes of Brachypodium and a very recent Mid-Pleistocene (0.9 Ma) time for the B. distachyon split. Flowering time variation is a main factor driving rapid intraspecific divergence in B. distachyon, although it is counterbalanced by repeated introgression between previously isolated lineages. Swapping of plastomes between the three different genomic groups, EDF+, T+, S+, probably resulted from random backcrossing followed by stabilization through selection pressure.


Subjects
Brachypodium/classification , Brachypodium/genetics , Ecotype , Flowers/physiology , Genome, Plastid , Genomics , Phylogeny , Recombination, Genetic/genetics , Base Sequence , Evolution, Molecular , Genes, Plant , Genetic Variation , Geography , Haplotypes/genetics , Mediterranean Region , Time Factors
11.
Front Plant Sci ; 8: 647, 2017.
Article in English | MEDLINE | ID: mdl-28507554

ABSTRACT

Drought causes important losses in crop production every season. Improvement for drought tolerance could take advantage of the diversity held in germplasm collections, much of which has not yet been incorporated into modern breeding. Spanish landraces constitute a promising resource for barley breeding, as they were widely grown until the last century and still show good yielding ability under stress. Here, we study the transcriptome expression landscape in two genotypes, an outstanding Spanish landrace-derived inbred line (SBCC073) and a modern cultivar (Scarlett). Gene expression of adult plants after prolonged stresses, either drought or drought combined with heat, was monitored. The transcriptome of mature leaves showed few changes under severe drought, whereas abundant gene expression changes were observed under combined mild drought and heat. Developing inflorescences of SBCC073 exhibited mostly unaltered gene expression, whereas numerous changes were found in the same tissues for Scarlett. Genotypic differences in physiological traits and gene expression patterns confirmed the different behavior of landrace SBCC073 and cultivar Scarlett under abiotic stress, suggesting that they responded to stress following different strategies. A comparison with related studies in barley, addressing gene expression responses to drought, revealed common biological processes, but moderate agreement regarding individual differentially expressed transcripts. Special emphasis was placed on the search for co-expressed genes and underlying common regulatory motifs. Overall, 11 transcription factors were identified, and one of them matched cis-regulatory motifs discovered upstream of co-expressed genes involved in those responses.

12.
Front Plant Sci ; 8: 184, 2017.
Article in English | MEDLINE | ID: mdl-28261241

ABSTRACT

The pan-genome of a species is defined as the union of all the genes and non-coding sequences found in all its individuals. However, constructing a pan-genome for plants with large genomes is daunting both in sequencing cost and the scale of the required computational analysis. A more affordable alternative is to focus on the genic repertoire by using transcriptomic data. Here, the software GET_HOMOLOGUES-EST was benchmarked with genomic and RNA-seq data of 19 Arabidopsis thaliana ecotypes and then applied to the analysis of transcripts from 16 Hordeum vulgare genotypes. The goal was to sample their pan-genomes and classify sequences as core, if detected in all accessions, or accessory, when absent in some of them. The resulting sequence clusters were used to simulate pan-genome growth, and to compile Average Nucleotide Identity matrices that summarize intra-species variation. Although transcripts were found to under-estimate pan-genome size by at least 10%, we concluded that clusters of expressed sequences can recapitulate phylogeny and reproduce two properties observed in A. thaliana gene models: accessory loci show lower expression and higher non-synonymous substitution rates than core genes. Finally, accessory sequences were observed to preferentially encode transposon components in both species, plus disease resistance genes in cultivated barleys, and a variety of protein domains from other families that appear frequently associated with presence/absence variation in the literature. These results demonstrate that pan-genome analyses are useful to explore germplasm diversity.
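The pan-genome growth simulation mentioned in this abstract is commonly done by adding accessions in random order and recording how the union (pan-genome) and intersection (core) of their clusters change. A minimal Python sketch of that idea, assuming each accession is given as a set of cluster identifiers; the function name is hypothetical, and the sketch only averages over random permutations without fitting any growth model to the curves.

```python
import random


def pangenome_growth(accessions, permutations=10, seed=0):
    """Mean pan-genome and core sizes after adding 1..n accessions in random order."""
    rng = random.Random(seed)  # fixed seed for reproducible sampling
    n = len(accessions)
    pan_totals = [0] * n
    core_totals = [0] * n
    for _ in range(permutations):
        order = list(accessions)
        rng.shuffle(order)
        pan, core = set(), None
        for i, clusters in enumerate(order):
            pan |= clusters  # union can only grow or stay flat
            core = set(clusters) if core is None else core & clusters  # intersection can only shrink
            pan_totals[i] += len(pan)
            core_totals[i] += len(core)
    pan_curve = [t / permutations for t in pan_totals]
    core_curve = [t / permutations for t in core_totals]
    return pan_curve, core_curve


# Hypothetical toy accessions that all share cluster "a".
toy = [{"a", "b"}, {"a", "c"}, {"a", "d"}]
print(pangenome_growth(toy))  # ([2.0, 3.0, 4.0], [2.0, 1.0, 1.0])
```

With real data the curves depend on the sampling order, which is why the sizes are averaged over several permutations; a pan-genome curve that keeps rising as accessions are added suggests an open pan-genome.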

13.
Plant Genome ; 9(2), 2016 07.
Article in English | MEDLINE | ID: mdl-27898833

ABSTRACT

Powdery mildew causes severe yield losses in barley production worldwide. Although many resistance genes have been described, only a few have been cloned. A strong QTL (quantitative trait locus) conferring resistance to a wide array of powdery mildew isolates was identified in a Spanish barley landrace on the long arm of chromosome 7H. Previous studies narrowed down the QTL position, but were unable to identify candidate genes or physically locate the resistance. In this study, the exome of three recombinant lines from a high-resolution mapping population was sequenced and analyzed, narrowing the position of the resistance down to a single physical contig. Closer inspection of the region revealed a cluster of closely related NBS-LRR (nucleotide-binding site-leucine-rich repeat containing protein) genes. Large differences were found between the resistant lines and the reference genome of cultivar Morex, in the form of PAV (presence-absence variation) in the composition of the NBS-LRR cluster. Finally, a template-guided assembly was performed and subsequent expression analysis revealed that one of the newly assembled candidate genes is transcribed. In summary, the results suggest that NBS-LRR genes, absent from the reference and the susceptible genotypes, could be functional and responsible for the powdery mildew resistance. The procedure followed is an example of the use of NGS (next-generation sequencing) tools to tackle the challenges of gene cloning when the target gene is absent from the reference genome.


Subjects
Disease Resistance/genetics , Hordeum/genetics , Multigene Family/genetics , NLR Proteins/genetics , Chromosome Mapping , Chromosomes, Plant/genetics , Fungi/physiology , Hordeum/microbiology , Quantitative Trait Loci
14.
Methods Mol Biol ; 1482: 279-95, 2016.
Article in English | MEDLINE | ID: mdl-27557774

ABSTRACT

The plant-dedicated mirror of the Regulatory Sequence Analysis Tools (RSAT, http://plants.rsat.eu) offers specialized options for researchers dealing with plant transcriptional regulation. The website contains fully sequenced genomes of species regularly updated from Ensembl Plants and other sources (currently 40), and supports an array of tasks frequently required for the analysis of regulatory sequences, such as retrieving upstream sequences, motif discovery, motif comparison, and pattern matching. RSAT::Plants also integrates the footprintDB collection of DNA motifs. This protocol explains step-by-step how to discover DNA motifs in regulatory regions of clusters of co-expressed genes in plants. It also explains how to empirically control the significance of the result, and how to associate the discovered motifs with putative binding factors.


Subjects
Computational Biology/methods , Genomics/methods , Regulatory Elements, Transcriptional/genetics , Software , Gene Expression Regulation, Plant , Genome, Plant/genetics , Nucleotide Motifs/genetics , Transcription Factors/genetics
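Pattern matching of a consensus motif, one of the tasks listed in the RSAT::Plants abstract, can be illustrated with a minimal Python sketch that scans both strands of an upstream sequence for a motif written in IUPAC degenerate nucleotide codes. The function names are hypothetical and this is not the RSAT::Plants interface; it matches a consensus string only, without the matrix-based scoring or the empirical significance control that the protocol describes.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def find_motif(motif, seq):
    """Return sorted (start, strand) hits of an IUPAC motif on both strands, 0-based on the + strand."""
    core = "".join(IUPAC[c] for c in motif.upper())
    pattern = re.compile(f"(?=({core}))")  # lookahead so overlapping hits are reported
    seq = seq.upper()
    hits = [(m.start(), "+") for m in pattern.finditer(seq)]
    rc = reverse_complement(seq)
    k, n = len(motif), len(seq)
    # A hit at position p on the reverse complement covers seq[n - p - k : n - p].
    hits += [(n - m.start() - k, "-") for m in pattern.finditer(rc)]
    return sorted(hits)


print(find_motif("GATA", "ATATCG"))        # [(1, '-')]
print(find_motif("CACGTG", "AACACGTGTT"))  # [(2, '+'), (2, '-')]
```

Palindromic sites such as CACGTG are reported once per strand, so a caller counting distinct sites would need to collapse hits that share a start position.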