1.
Nat Methods ; 2024 Jun 07.
Article in English | MEDLINE | ID: mdl-38849569

ABSTRACT

The Long-read RNA-Seq Genome Annotation Assessment Project Consortium was formed to evaluate the effectiveness of long-read approaches for transcriptome analysis. Using different protocols and sequencing platforms, the consortium generated over 427 million long-read sequences from complementary DNA and direct RNA datasets, encompassing human, mouse and manatee species. Developers utilized these data to address challenges in transcript isoform detection, quantification and de novo transcript detection. The study revealed that libraries with longer, more accurate sequences produce more accurate transcripts than those with increased read depth, whereas greater read depth improved quantification accuracy. In well-annotated genomes, tools based on reference sequences demonstrated the best performance. Incorporating additional orthogonal data and replicate samples is advised when aiming to detect rare and novel transcripts or using reference-free approaches. This collaborative study offers a benchmark for current practices and provides direction for future method development in transcriptome analysis.

2.
Nucleic Acids Res ; 52(D1): D255-D264, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37971353

ABSTRACT

RegulonDB is a database that contains the most comprehensive corpus of knowledge of the regulation of transcription initiation of Escherichia coli K-12, including data from both classical molecular biology and high-throughput methodologies. Here, we describe biological advances since our last NAR paper of 2019. We explain the changes to satisfy FAIR requirements. We also present a full reconstruction of the RegulonDB computational infrastructure, which has significantly improved data storage, retrieval and accessibility and thus supports a more intuitive and user-friendly experience. The integration of graphical tools provides clear visual representations of genetic regulation data, facilitating data interpretation and knowledge integration. RegulonDB version 12.0 can be accessed at https://regulondb.ccg.unam.mx.


Subjects
Databases, Genetic; Escherichia coli K12; Gene Expression Regulation, Bacterial; Computational Biology/methods; Escherichia coli K12/genetics; Internet; Transcription, Genetic
3.
Nat Rev Genet ; 20(11): 693-701, 2019 Nov.
Article in English | MEDLINE | ID: mdl-31455890

ABSTRACT

Human genomics is undergoing a step change from being a predominantly research-driven activity to one driven through health care as many countries in Europe now have nascent precision medicine programmes. To maximize the value of the genomic data generated, these data will need to be shared between institutions and across countries. In recognition of this challenge, 21 European countries recently signed a declaration to transnationally share data on at least 1 million human genomes by 2022. In this Roadmap, we identify the challenges of data sharing across borders and demonstrate that European research infrastructures are well-positioned to support the rapid implementation of widespread genomic data access.


Subjects
Biomedical Research; Genome, Human; Human Genome Project; Europe; Humans
5.
J Proteome Res ; 23(1): 418-429, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-38038272

ABSTRACT

The inherent diversity of approaches in proteomics research has led to a wide range of software solutions for data analysis. These software solutions encompass multiple tools, each employing different algorithms for various tasks such as peptide-spectrum matching, protein inference, quantification, statistical analysis, and visualization. To enable an unbiased comparison of commonly used bottom-up label-free proteomics workflows, we introduce WOMBAT-P, a versatile platform designed for automated benchmarking and comparison. WOMBAT-P simplifies the processing of public data by utilizing the sample and data relationship format for proteomics (SDRF-Proteomics) as input. This feature streamlines the analysis of annotated local or public ProteomeXchange data sets, promoting efficient comparisons among diverse outputs. Through an evaluation using experimental ground truth data and a realistic biological data set, we uncover significant disparities and a limited overlap in the quantified proteins. WOMBAT-P not only enables rapid execution and seamless comparison of workflows but also provides valuable insights into the capabilities of different software solutions. These benchmarking metrics are a valuable resource for researchers in selecting the most suitable workflow for their specific data sets. The modular architecture of WOMBAT-P promotes extensibility and customization. The software is available at https://github.com/wombat-p/WOMBAT-Pipelines.


Subjects
Benchmarking; Proteomics; Workflow; Software; Proteins; Data Analysis
6.
Nucleic Acids Res ; 50(D1): D1062-D1068, 2022 Jan 07.
Article in English | MEDLINE | ID: mdl-34718760

ABSTRACT

PhylomeDB is a unique knowledge base providing public access to minable and browsable catalogues of pre-computed genome-wide collections of annotated sequences, alignments and phylogenies (i.e. phylomes) of homologous genes, as well as to their corresponding phylogeny-based orthology and paralogy relationships. In addition, PhylomeDB trees and alignments can be downloaded for further processing to detect and date gene duplication events, infer past events of inter-species hybridization and horizontal gene transfer, as well as to uncover footprints of selection, introgression, gene conversion, or other relevant evolutionary processes in the genes and organisms of interest. Here, we describe the latest evolution of PhylomeDB (version 5). This new version includes a newly implemented web interface and several new functionalities such as optimized searching procedures, the possibility to create user-defined phylome collections, and a fully redesigned data structure. This release also represents a significant core data expansion, with the database providing access to 534 phylomes, comprising over 8 million trees, and homology relationships for genes in over 6000 species. This makes PhylomeDB the largest and most comprehensive public repository of gene phylogenies. PhylomeDB is available at http://www.phylomedb.org.


Subjects
Databases, Genetic; Evolution, Molecular; Genome/genetics; Software; Animals; Humans; Knowledge Bases; Molecular Sequence Annotation; Phylogeny; Plants/genetics; Proteome/genetics
7.
Nucleic Acids Res ; 50(W1): W623-W632, 2022 Jul 05.
Article in English | MEDLINE | ID: mdl-35552456

ABSTRACT

The Orthology Benchmark Service (https://orthology.benchmarkservice.org) is the gold standard for orthology inference evaluation, supported and maintained by the Quest for Orthologs consortium. It is an essential resource to compare existing and new methods of orthology inference (the bedrock for many comparative genomics and phylogenetic analysis) over a standard dataset and through common procedures. The Quest for Orthologs Consortium is dedicated to maintaining the resource up to date, through regular updates of the Reference Proteomes and increasingly accessible data through the OpenEBench platform. For this update, we have added a new benchmark based on curated orthology assertion from the Vertebrate Gene Nomenclature Committee, and provided an example meta-analysis of the public predictions present on the platform.


Subjects
Benchmarking; Genomics; Phylogeny; Genomics/methods; Proteome
8.
Nucleic Acids Res ; 48(W1): W538-W545, 2020 Jul 02.
Article in English | MEDLINE | ID: mdl-32374845

ABSTRACT

The identification of orthologs (genes in different species which descended from the same gene in their last common ancestor) is a prerequisite for many analyses in comparative genomics and molecular evolution. Numerous algorithms and resources have been conceived to address this problem, but benchmarking and interpreting them is fraught with difficulties (need to compare them on a common input dataset, absence of ground truth, computational cost of calling orthologs). To address this, the Quest for Orthologs consortium maintains a reference set of proteomes and provides a web server for continuous orthology benchmarking (http://orthology.benchmarkservice.org). Furthermore, consensus ortholog calls derived from public benchmark submissions are provided on the Alliance of Genome Resources website, the joint portal of NIH-funded model organism databases.


Subjects
Multigene Family; Proteome; Software; Animals; Benchmarking; Consensus; Genomics; Humans; Mice; Phylogeny; Rats
9.
Nature ; 505(7484): 546-9, 2014 Jan 23.
Article in English | MEDLINE | ID: mdl-24352233

ABSTRACT

Sugar beet (Beta vulgaris ssp. vulgaris) is an important crop of temperate climates which provides nearly 30% of the world's annual sugar production and is a source for bioethanol and animal feed. The species belongs to the order Caryophyllales, is diploid with 2n = 18 chromosomes, has an estimated genome size of 714-758 megabases and shares an ancient genome triplication with other eudicot plants. Leafy beets have been cultivated since Roman times, but sugar beet is one of the most recently domesticated crops. It arose in the late eighteenth century when lines accumulating sugar in the storage root were selected from crosses made with chard and fodder beet. Here we present a reference genome sequence for sugar beet as the first non-rosid, non-asterid eudicot genome, advancing comparative genomics and phylogenetic reconstructions. The genome sequence comprises 567 megabases, of which 85% could be assigned to chromosomes. The assembly covers a large proportion of the repetitive sequence content that was estimated to be 63%. We predicted 27,421 protein-coding genes supported by transcript data and annotated them on the basis of sequence homology. Phylogenetic analyses provided evidence for the separation of Caryophyllales before the split of asterids and rosids, and revealed lineage-specific gene family expansions and losses. We sequenced spinach (Spinacia oleracea), another Caryophyllales species, and validated features that separate this clade from rosids and asterids. Intraspecific genomic variation was analysed based on the genome sequences of sea beet (Beta vulgaris ssp. maritima; progenitor of all beet crops) and four additional sugar beet accessions. We identified seven million variant positions in the reference genome, and also large regions of low variability, indicating artificial selection. The sugar beet genome sequence enables the identification of genes affecting agronomically relevant traits, supports molecular breeding and maximizes the plant's potential in energy biotechnology.


Subjects
Beta vulgaris/genetics; Crops, Agricultural/genetics; Genome, Plant/genetics; Biofuels/supply & distribution; Carbohydrate Metabolism; Chromosomes, Plant/genetics; Ethanol/metabolism; Genomics; In Situ Hybridization, Fluorescence; Molecular Sequence Data; Phylogeny; Sequence Analysis, DNA; Spinacia oleracea/genetics
10.
Nat Methods ; 13(5): 425-30, 2016 May.
Article in English | MEDLINE | ID: mdl-27043882

ABSTRACT

Achieving high accuracy in orthology inference is essential for many comparative, evolutionary and functional genomic analyses, yet the true evolutionary history of genes is generally unknown and orthologs are used for very different applications across phyla, requiring different precision-recall trade-offs. As a result, it is difficult to assess the performance of orthology inference methods. Here, we present a community effort to establish standards and an automated web-based service to facilitate orthology benchmarking. Using this service, we characterize 15 well-established inference methods and resources on a battery of 20 different benchmarks. Standardized benchmarking provides a way for users to identify the most effective methods for the problem at hand, sets a minimum requirement for new tools and resources, and guides the development of more accurate orthology inference methods.


Subjects
Computational Biology/standards; Genomics/standards; Phylogeny; Proteomics/standards; Archaea/classification; Archaea/genetics; Bacteria/classification; Bacteria/genetics; Computational Biology/methods; Databases, Genetic; Eukaryota/classification; Eukaryota/genetics; Gene Ontology; Genomics/methods; Models, Genetic; Proteomics/methods; Sequence Analysis, Protein; Sequence Homology; Species Specificity
11.
Bioinformatics ; 34(2): 323-329, 2018 Jan 15.
Article in English | MEDLINE | ID: mdl-28968857

ABSTRACT

The Quest for Orthologs (QfO) is an open collaboration framework for experts in comparative phylogenomics and related research areas who have an interest in highly accurate orthology predictions and their applications. We here report highlights and discussion points from the QfO meeting 2015 held in Barcelona. Achievements in recent years have established a basis to support developments for improved orthology prediction and to explore new approaches. Central to the QfO effort is proper benchmarking of methods and services, as well as design of standardized datasets and standardized formats to allow sharing and comparison of results. Simultaneously, analysis pipelines have been improved, evaluated and adapted to handle large datasets. All this would not have occurred without the long-term collaboration of Consortium members. Meeting regularly to review and coordinate complementary activities from a broad spectrum of innovative researchers clearly benefits the community. Highlights of the meeting include addressing sources of and legitimacy of disagreements between orthology calls, the context dependency of orthology definitions, special challenges encountered when analyzing very anciently rooted orthologies, orthology in the light of whole-genome duplications, and the concept of orthologous versus paralogous relationships at different levels, including domain-level orthology. Furthermore, particular needs for different applications (e.g. plant genomics, ancient gene families and others) and the infrastructure for making orthology inferences available (e.g. interfaces with model organism databases) were discussed, with several ongoing efforts that are expected to be reported on during the upcoming 2017 QfO meeting.

12.
Nucleic Acids Res ; 45(W1): W484-W489, 2017 Jul 03.
Article in English | MEDLINE | ID: mdl-28531339

ABSTRACT

Considerable effort has been devoted to systematically retrieving information on genes and proteins, as well as the relationships between them. Despite the importance of chemical compounds and drugs as central bio-entities in pharmacological and biological research, only a limited number of freely available chemical text-mining and search engine technologies are currently accessible. Here we present LimTox (Literature Mining for Toxicology), a web-based online biomedical search tool with a special focus on adverse hepatobiliary reactions. It integrates a range of text mining, named entity recognition and information extraction components. LimTox relies on machine-learning, rule-based, pattern-based and term lookup strategies. The system processes scientific abstracts, a set of full-text articles and medical agency assessment reports. Although the main focus of LimTox is on adverse liver events, it also enables basic searches for other organ-level toxicity associations (nephrotoxicity, cardiotoxicity, thyrotoxicity and phospholipidosis). The tool supports specialized search queries for chemical compounds/drugs, genes (with additional emphasis on key enzymes in drug metabolism, namely cytochrome P450 enzymes, or CYPs) and biochemical liver markers. The LimTox website is free and open to all users, with no login requirement. LimTox can be accessed at http://limtox.bioinfo.cnio.es.


Subjects
Drug-Related Side Effects and Adverse Reactions; Software; Cytochrome P-450 Enzyme System; Data Mining; Genes; Internet; Liver/drug effects
13.
Genome Res ; 25(9): 1256-67, 2015 Sep.
Article in English | MEDLINE | ID: mdl-26194102

ABSTRACT

Selenoproteins are proteins that incorporate selenocysteine (Sec), a nonstandard amino acid encoded by UGA, normally a stop codon. Sec synthesis requires the enzyme Selenophosphate synthetase (SPS or SelD), conserved in all prokaryotic and eukaryotic genomes encoding selenoproteins. Here, we study the evolutionary history of SPS genes, providing a map of selenoprotein function spanning the whole tree of life. SPS is itself a selenoprotein in many species, although functionally equivalent homologs that replace the Sec site with cysteine (Cys) are common. Many metazoans, however, possess SPS genes with substitutions other than Sec or Cys (collectively referred to as SPS1). Using complementation assays in fly mutants, we show that these genes share a common function, which appears to be distinct from the synthesis of selenophosphate carried out by the Sec- and Cys- SPS genes (termed SPS2), and unrelated to Sec synthesis. We show here that SPS1 genes originated through a number of independent gene duplications from an ancestral metazoan selenoprotein SPS2 gene that most likely already carried the SPS1 function. Thus, in SPS genes, parallel duplications and subsequent convergent subfunctionalization have resulted in the segregation to different loci of functions initially carried by a single gene. This evolutionary history constitutes a remarkable example of emergence and evolution of gene function, which we have been able to trace thanks to the singular features of SPS genes, wherein the amino acid at a single site determines unequivocally protein function and is intertwined to the evolutionary fate of the entire selenoproteome.


Subjects
Biological Evolution; Phosphotransferases/genetics; Phosphotransferases/metabolism; Animals; Biomarkers; Eukaryota/genetics; Eukaryota/metabolism; Gene Duplication; Humans; Insecta; Phylogeny; Prokaryotic Cells/metabolism; Selection, Genetic; Selenium/metabolism; Selenoproteins/genetics; Selenoproteins/metabolism; Urochordata; Vertebrates
15.
Plant Cell ; 27(4): 954-68, 2015 Apr.
Article in English | MEDLINE | ID: mdl-25873387

ABSTRACT

Here, we report the draft genome sequence of Solanum commersonii, which consists of ∼830 megabases with an N50 of 44,303 bp anchored to 12 chromosomes, using the potato (Solanum tuberosum) genome sequence as a reference. Compared with potato, S. commersonii shows a striking reduction in heterozygosity (1.5% versus 53 to 59%), and differences in genome sizes were mainly due to variations in intergenic sequence length. Gene annotation by ab initio prediction supported by RNA-seq data produced a catalog of 1703 predicted microRNAs, 18,882 long noncoding RNAs of which 20% are shown to target cold-responsive genes, and 39,290 protein-coding genes with a significant repertoire of nonredundant nucleotide binding site-encoding genes and 126 cold-related genes that are lacking in S. tuberosum. Phylogenetic analyses indicate that domesticated potato and S. commersonii lineages diverged ∼2.3 million years ago. Three duplication periods corresponding to genome enrichment for particular gene families related to response to salt stress, water transport, growth, and defense response were discovered. The draft genome sequence of S. commersonii substantially increases our understanding of the domesticated germplasm, facilitating translation of acquired knowledge into advances in crop stability in light of global climate and environmental changes.


Subjects
Genome, Plant/genetics; Solanum tuberosum/genetics; Solanum/genetics; Acclimatization; Biological Evolution; Phylogeny; Solanum/classification; Solanum tuberosum/classification
16.
PLoS Biol ; 12(11): e1002005, 2014 Nov.
Article in English | MEDLINE | ID: mdl-25423365

ABSTRACT

Myriapods (e.g., centipedes and millipedes) display a simple homonomous body plan relative to other arthropods. All members of the class are terrestrial, but they attained terrestriality independently of insects. Myriapoda is the only arthropod class not represented by a sequenced genome. We present an analysis of the genome of the centipede Strigamia maritima. It retains a compact genome that has undergone less gene loss and shuffling than previously sequenced arthropods, and many orthologues of genes conserved from the bilaterian ancestor that have been lost in insects. Our analysis locates many genes in conserved macro-synteny contexts, and many small-scale examples of gene clustering. We describe several examples where S. maritima shows different solutions from insects to similar problems. The insect olfactory receptor gene family is absent from S. maritima, and olfaction in air is likely effected by expansion of other receptor gene families. For some genes S. maritima has evolved paralogues to generate coding sequence diversity, where insects use alternate splicing. This is most striking for the Dscam gene, which in Drosophila generates more than 100,000 alternate splice forms, but in S. maritima is encoded by over 100 paralogues. We see an intriguing linkage between the absence of any known photosensory proteins in a blind organism and the additional absence of canonical circadian clock genes. The phylogenetic position of myriapods allows us to identify where in arthropod phylogeny several particular molecular mechanisms and traits emerged. For example, we conclude that juvenile hormone signalling evolved with the emergence of the exoskeleton in the arthropods and that RR-1 containing cuticle proteins evolved in the lineage leading to Mandibulata. We also identify when various gene expansions and losses occurred. The genome of S. maritima offers us a unique glimpse into the ancestral arthropod genome, while also displaying many adaptations to its specific life history.


Subjects
Arthropods/genetics; Genome; Synteny; Animals; Circadian Rhythm Signaling Peptides and Proteins/genetics; DNA Methylation; Evolution, Molecular; Female; Genome, Mitochondrial; Hormones/genetics; Male; Multigene Family; Phylogeny; Polymorphism, Genetic; Protein Kinases/genetics; RNA, Untranslated/genetics; Receptors, Odorant/genetics; Selenoproteins/genetics; Sex Chromosomes; Transcription Factors/genetics
18.
Nucleic Acids Res ; 42(7): e54, 2014 Apr.
Article in English | MEDLINE | ID: mdl-24476915

ABSTRACT

Reconstructing the evolutionary relationships of species is a major goal in biology. Despite the increasing number of completely sequenced genomes, a large number of phylogenetic projects rely on targeted sequencing and analysis of a relatively small sample of marker genes. The selection of these phylogenetic markers should ideally be based on accurate predictions of their combined, rather than individual, potential to accurately resolve the phylogeny of interest. Here we present and validate a new phylogenomics strategy to efficiently select a minimal set of stable markers able to reconstruct the underlying species phylogeny. In contrast to previous approaches, our methodology does not only rely on the ability of individual genes to reconstruct a known phylogeny, but it also explores the combined power of sets of concatenated genes to accurately infer phylogenetic relationships of species not previously analyzed. We applied our approach to two broad sets of cyanobacterial and ascomycetous fungal species, and provide two minimal sets of six and four genes, respectively, necessary to fully resolve the target phylogenies. This approach paves the way for the informed selection of phylogenetic markers in the effort of reconstructing the tree of life.


Subjects
Genomics/methods; Phylogeny; Ascomycota/classification; Ascomycota/genetics; Cyanobacteria/classification; Cyanobacteria/genetics; Genes, Bacterial; Genes, Fungal; Genetic Markers
19.
Nucleic Acids Res ; 42(Database issue): D897-902, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24275491

ABSTRACT

Phylogenetic trees representing the evolutionary relationships of homologous genes are the entry point for many evolutionary analyses. For instance, the use of a phylogenetic tree can aid in the inference of orthology and paralogy relationships, and in the detection of relevant evolutionary events such as gene family expansions and contractions, horizontal gene transfer, recombination or incomplete lineage sorting. Similarly, given the plurality of evolutionary histories among genes encoded in a given genome, there is a need for the combined analysis of genome-wide collections of phylogenetic trees (phylomes). Here, we introduce a new release of PhylomeDB (http://phylomedb.org), a public repository of phylomes. Currently, PhylomeDB hosts 120 public phylomes, comprising >1.5 million maximum likelihood trees and multiple sequence alignments. In the current release, phylogenetic trees are annotated with taxonomic, protein-domain arrangement, functional and evolutionary information. PhylomeDB is also a major source for phylogeny-based predictions of orthology and paralogy, covering >10 million proteins across 1059 sequenced species. Here we describe newly implemented PhylomeDB features, and discuss a benchmark of the orthology predictions provided by the database, the impact of proteome updates and the use of the phylome approach in the analysis of newly sequenced genomes and transcriptomes.


Subjects
Databases, Genetic; Genome; Phylogeny; Gene Expression Profiling; Humans; Internet; Proteins/classification; Proteins/genetics; Proteome
20.
Proc Natl Acad Sci U S A ; 110(13): 5247-52, 2013 Mar 26.
Article in English | MEDLINE | ID: mdl-23503846

ABSTRACT

Red seaweeds are key components of coastal ecosystems and are economically important as food and as a source of gelling agents, but their genes and genomes have received little attention. Here we report the sequencing of the 105-Mbp genome of the florideophyte Chondrus crispus (Irish moss) and the annotation of the 9,606 genes. The genome features an unusual structure characterized by gene-dense regions surrounded by repeat-rich regions dominated by transposable elements. Despite its fairly large size, this genome shows features typical of compact genomes, e.g., on average only 0.3 introns per gene, short introns, low median distance between genes, small gene families, and no indication of large-scale genome duplication. The genome also gives insights into the metabolism of marine red algae and adaptations to the marine environment, including genes related to halogen metabolism, oxylipins, and multicellularity (microRNA processing and transcription factors). Particularly interesting are features related to carbohydrate metabolism, which include a minimalistic gene set for starch biosynthesis, the presence of cellulose synthases acquired before the primary endosymbiosis showing the polyphyly of cellulose synthesis in Archaeplastida, and cellulases absent in terrestrial plants as well as the occurrence of a mannosylglycerate synthase potentially originating from a marine bacterium. To explain the observations on genome structure and gene content, we propose an evolutionary scenario involving an ancestral red alga that was driven by early ecological forces to lose genes, introns, and intergenic DNA; this loss was followed by an expansion of genome size as a consequence of activity of transposable elements.


Subjects
Chondrus/genetics; Evolution, Molecular; Genes, Plant; Base Sequence; MicroRNAs/genetics; Molecular Sequence Data; Plant Proteins/genetics; RNA, Plant/genetics