Results 1 - 20 of 331
1.
Nature ; 583(7818): 693-698, 2020 07.
Article in English | MEDLINE | ID: mdl-32728248

ABSTRACT

The Encyclopedia of DNA Elements (ENCODE) Project launched in 2003 with the long-term goal of developing a comprehensive map of functional elements in the human genome. These included genes, biochemical regions associated with gene regulation (for example, transcription factor binding sites, open chromatin, and histone marks) and transcript isoforms. The marks serve as sites for candidate cis-regulatory elements (cCREs) that may serve functional roles in regulating gene expression. The project has been extended to model organisms, particularly the mouse. In the third phase of ENCODE, nearly a million and more than 300,000 cCRE annotations have been generated for human and mouse, respectively, and these have provided a valuable resource for the scientific community.


Subjects
Databases, Genetic; Genome/genetics; Genomics; Molecular Sequence Annotation; Animals; Binding Sites; Chromatin/genetics; Chromatin/metabolism; DNA Methylation; Databases, Genetic/standards; Databases, Genetic/trends; Gene Expression Regulation/genetics; Genome, Human/genetics; Genomics/standards; Genomics/trends; Histones/metabolism; Humans; Mice; Molecular Sequence Annotation/standards; Quality Control; Regulatory Sequences, Nucleic Acid/genetics; Transcription Factors/metabolism
2.
Nucleic Acids Res ; 52(D1): D1246-D1252, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37956338

ABSTRACT

Advancements in high-throughput technology offer researchers an extensive range of multi-omics data that provide deep insights into the complex landscape of cancer biology. However, traditional statistical models and databases are inadequate to interpret these high-dimensional data within a multi-omics framework. To address this limitation, we introduce DriverDBv4, an updated iteration of the DriverDB cancer driver gene database (http://driverdb.bioinfomics.org/). This updated version offers several significant enhancements: (i) an increase in the number of cohorts from 33 to 70, encompassing approximately 24 000 samples; (ii) inclusion of proteomics data, augmenting the existing types of omics data and thus expanding the analytical scope; (iii) implementation of multiple multi-omics algorithms for identification of cancer drivers; (iv) new visualization features designed to succinctly summarize high-context data and redesigned existing sections to accommodate the increased volume of datasets and (v) two new functions in Customized Analysis, specifically designed for multi-omics driver identification and subgroup expression analysis. DriverDBv4 facilitates comprehensive interpretation of multi-omics data across diverse cancer types, thereby enriching the understanding of cancer heterogeneity and aiding in the development of personalized clinical approaches. The database is designed to foster a more nuanced understanding of the multi-faceted nature of cancer.


Subjects
Databases, Genetic; Multiomics; Neoplasms; Humans; Algorithms; Databases, Genetic/standards; Neoplasms/genetics; Neoplasms/physiopathology
3.
Nucleic Acids Res ; 52(D1): D174-D182, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37962376

ABSTRACT

JASPAR (https://jaspar.elixir.no/) is a widely-used open-access database presenting manually curated high-quality and non-redundant DNA-binding profiles for transcription factors (TFs) across taxa. In this 10th release and 20th-anniversary update, the CORE collection has expanded with 329 new profiles. We updated three existing profiles and provided orthogonal support for 72 profiles from the previous release's UNVALIDATED collection. Altogether, the JASPAR 2024 update provides a 20% increase in CORE profiles from the previous release. A trimming algorithm enhanced profiles by removing low information content flanking base pairs, which were likely uninformative (within the capacity of the PFM models) for TFBS predictions and modelling TF-DNA interactions. This release includes enhanced metadata, featuring a refined classification for plant TFs' structural DNA-binding domains. The new JASPAR collections prompt updates to the genomic tracks of predicted TF binding sites (TFBSs) in 8 organisms, with human and mouse tracks available as native tracks in the UCSC Genome browser. All data are available through the JASPAR web interface and programmatically through its API and the updated Bioconductor and pyJASPAR packages. Finally, a new TFBS extraction tool enables users to retrieve predicted JASPAR TFBSs intersecting their genomic regions of interest.
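
The flank-trimming idea mentioned in this abstract can be illustrated with a minimal sketch. This is not the JASPAR implementation: it assumes a uniform background, no pseudocounts, and a hypothetical threshold `min_ic` of 0.5 bits, and the function names are made up for illustration. Each column of a position frequency matrix (PFM) holds counts for A, C, G and T; columns are dropped from either end while their information content stays below the threshold, and interior columns are never removed.

```python
import math


def column_information_content(counts):
    """Information content in bits of one PFM column (uniform background assumed)."""
    total = sum(counts)
    if total == 0:
        return 0.0
    ic = 2.0  # log2(4), the maximum for a four-letter alphabet
    for count in counts:
        if count > 0:
            p = count / total
            ic += p * math.log2(p)
    return ic


def trim_flanks(pfm_columns, min_ic=0.5):
    """Drop low-information columns from both ends; min_ic is a hypothetical cutoff."""
    ics = [column_information_content(col) for col in pfm_columns]
    start, end = 0, len(pfm_columns)
    while start < end and ics[start] < min_ic:
        start += 1
    while end > start and ics[end - 1] < min_ic:
        end -= 1
    return pfm_columns[start:end]


# Toy PFM: uninformative first and last columns around an informative core.
example = [
    [25, 25, 25, 25],
    [100, 0, 0, 0],
    [0, 90, 10, 0],
    [30, 20, 25, 25],
]
trimmed = trim_flanks(example)
```

In this toy matrix only the two central columns survive; a low-information column sitting between two informative ones would be kept, since only flanks are trimmed.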


Subjects
Databases, Genetic; Protein Binding; Transcription Factors; Animals; Humans; Mice; Databases, Genetic/standards; Databases, Genetic/trends; Transcription Factors/genetics; Transcription Factors/metabolism; Plants/genetics
4.
Am J Hum Genet ; 108(10): 1813-1816, 2021 10 07.
Article in English | MEDLINE | ID: mdl-34626580

ABSTRACT

The use of approved nomenclature in publications is vital to enable effective scientific communication and is particularly crucial when discussing genes of clinical relevance. Here, we discuss several examples of cases where the failure of researchers to use a HUGO Gene Nomenclature Committee (HGNC)-approved symbol in publications has led to confusion between unrelated human genes in the literature. We also inform authors of the steps they can take to ensure that they use approved nomenclature in their manuscripts and discuss how referencing HGNC IDs can remove ambiguity when referring to genes that have previously been published with confusing alias symbols.


Subjects
Databases, Genetic/standards; Genes/genetics; Genome, Human; Research Personnel/standards; Terminology as Topic; Genomics; Humans
6.
Trends Genet ; 36(6): 390-394, 2020 06.
Article in English | MEDLINE | ID: mdl-32396832

ABSTRACT

Although public repository requirements are aimed at researchers and designed to ensure that the utility of the limited data we have is optimized, these policies also have ramifications for research participants. In this opinion article, I discuss how the nature of such repositories can subject participants whose data are 'banked' to unwitting participation in scientific projects they might find objectionable. In addition, concerns about the privacy of banked genomic data are exacerbated by recent projects that demonstrate the ability to re-identify genomic data, raising the specter of discriminatory or oppressive use of this information. These concerns are most likely to discourage participation in research that requires data sharing among those who have experienced these phenomena and are less likely to discount their likelihood.


Subjects
Biological Variation, Population; Biomedical Research/standards; Databases, Genetic/standards; Genomics/standards; Information Dissemination/methods; Metadata/standards; Humans; Patient Selection; Privacy
7.
Brief Bioinform ; 22(1): 288-297, 2021 01 18.
Article in English | MEDLINE | ID: mdl-31998941

ABSTRACT

Circular RNAs (circRNAs) are covalently closed RNA molecules that have been linked to various diseases, including cancer. However, a precise function and working mechanism are lacking for the vast majority. Following many different experimental and computational approaches to identify circRNAs, multiple circRNA databases were developed as well. Unfortunately, there are several major issues with the current circRNA databases, which substantially hamper progress in the field. First, as the overlap in content is limited, a true reference set of circRNAs is lacking. This results from the low abundance and highly specific expression of circRNAs, and varying sequencing methods, data-analysis pipelines, and circRNA detection tools. A second major issue is the use of ambiguous nomenclature. Thus, redundant or even conflicting names for circRNAs across different databases contribute to the reproducibility crisis. Third, circRNA databases, in essence, rely on the position of the circRNA back-splice junction, whereas alternative splicing could result in circRNAs with different length and sequence. To uniquely identify a circRNA molecule, the full circular sequence is required. Fourth, circRNA databases annotate circRNAs' microRNA binding and protein-coding potential, but these annotations are generally based on presumed circRNA sequences. Finally, several databases are not regularly updated, contain incomplete data or suffer from connectivity issues. In this review, we present a comprehensive overview of the current circRNA databases and their content, features, and usability. In addition to discussing the current issues regarding circRNA databases, we offer important suggestions to streamline further research in this growing field.


Subjects
Databases, Genetic/standards; RNA, Circular/genetics; Animals; Databases, Genetic/trends; Genomics/methods; Humans; RNA, Circular/chemistry
8.
Brief Bioinform ; 22(1): 545-556, 2021 01 18.
Article in English | MEDLINE | ID: mdl-32026945

ABSTRACT

MOTIVATION: Although gene set enrichment analysis has become an integral part of high-throughput gene expression data analysis, the assessment of enrichment methods remains rudimentary and ad hoc. In the absence of suitable gold standards, evaluations are commonly restricted to selected datasets and biological reasoning on the relevance of resulting enriched gene sets. RESULTS: We develop an extensible framework for reproducible benchmarking of enrichment methods based on defined criteria for applicability, gene set prioritization and detection of relevant processes. This framework incorporates a curated compendium of 75 expression datasets investigating 42 human diseases. The compendium features microarray and RNA-seq measurements, and each dataset is associated with a precompiled GO/KEGG relevance ranking for the corresponding disease under investigation. We perform a comprehensive assessment of 10 major enrichment methods, identifying significant differences in runtime and applicability to RNA-seq data, fraction of enriched gene sets depending on the null hypothesis tested and recovery of the predefined relevance rankings. We make practical recommendations on how methods originally developed for microarray data can efficiently be applied to RNA-seq data, how to interpret results depending on the type of gene set test conducted and which methods are best suited to effectively prioritize gene sets with high phenotype relevance. AVAILABILITY: http://bioconductor.org/packages/GSEABenchmarkeR. CONTACT: ludwig.geistlinger@sph.cuny.edu.


Subjects
Gene Expression Profiling/methods; Genomics/methods; RNA-Seq/methods; Animals; Benchmarking; Databases, Genetic/standards; Gene Expression Profiling/standards; Genomics/standards; Humans; RNA-Seq/standards; Software
9.
Brief Bioinform ; 22(1): 463-473, 2021 01 18.
Article in English | MEDLINE | ID: mdl-31885040

ABSTRACT

Small noncoding RNAs (sRNA/sncRNAs) are generated from different genomic loci and play important roles in biological processes, such as cell proliferation and the regulation of gene expression. Next-generation sequencing (NGS) has provided an unprecedented opportunity to discover and quantify diverse kinds of sncRNA, such as tRFs (tRNA-derived small RNA fragments), phasiRNAs (phased, secondary, small-interfering RNAs), Piwi-interacting RNAs (piRNAs) and plant-specific 24-nt short interfering RNAs (siRNAs). However, currently available web-based tools do not provide approaches to comprehensively analyze all of these diverse sncRNAs. This study presents a novel integrated platform, sRNAtools (https://bioinformatics.caf.ac.cn/sRNAtools), that can be used in conjunction with high-throughput sequencing to identify and functionally annotate sncRNAs, including profiling microRNAs, piRNAs, tRNAs, small nuclear RNAs, small nucleolar RNAs and rRNAs and discovering isomiRs, tRFs, phasiRNAs and plant-specific 24-nt siRNAs for up to 21 model organisms. Different modules, including single case, batch case, group case and target case, are developed to provide users with flexible ways of studying sncRNA. In addition, sRNAtools supports different ways of uploading small RNA sequencing data in a very interactive queue system, while local versions based on the program package/Docker/VirtualBox are also available. We believe that sRNAtools will greatly benefit the scientific community as an integrated tool for studying sncRNAs.


Subjects
Genomics/methods; High-Throughput Nucleotide Sequencing/methods; RNA, Small Untranslated/genetics; Software; Animals; Databases, Genetic/standards; Humans; RNA, Small Untranslated/chemistry
10.
Nucleic Acids Res ; 49(D1): D743-D750, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33221926

ABSTRACT

Metagenomics has become a standard strategy to comprehend the functional potential of microbial communities, including the human microbiome. Currently, the number of metagenomes in public repositories is increasing exponentially. The Sequence Read Archive (SRA) and MG-RAST are the two main repositories for metagenomic data. These databases allow scientists to reanalyze samples and explore new hypotheses. However, mining samples from them can be a limiting factor, since the metadata available in these repositories is often misannotated, misleading, and decentralized, creating an overly complex environment for sample reanalysis. The main goal of the HumanMetagenomeDB is to simplify the identification and use of public human metagenomes of interest. HumanMetagenomeDB version 1.0 contains metadata of 69 822 metagenomes. We standardized 203 attributes, based on standardized ontologies, describing host characteristics (e.g. sex, age and body mass index), diagnosis information (e.g. cancer, Crohn's disease and Parkinson's disease), location (e.g. country, longitude and latitude), sampling site (e.g. gut, lung and skin) and sequencing attributes (e.g. sequencing platform, average length and sequence quality). Further, HumanMetagenomeDB version 1.0 metagenomes encompass 58 countries, 9 main sample sites (i.e. body parts), 58 diagnoses and multiple ages, ranging from newborn to 91 years old. The HumanMetagenomeDB is publicly available at https://webapp.ufz.de/hmgdb/.


Subjects
Data Curation; Databases, Genetic/standards; Metadata/standards; Metagenome; Humans; Metagenomics; Reference Standards; User-Computer Interface
11.
PLoS Comput Biol ; 17(7): e1009113, 2021 07.
Article in English | MEDLINE | ID: mdl-34228723

ABSTRACT

PCR amplification plays an integral role in the measurement of mixed microbial communities via high-throughput DNA sequencing of the 16S ribosomal RNA (rRNA) gene. Yet PCR is also known to introduce multiple forms of bias in 16S rRNA studies. Here we present a paired modeling and experimental approach to characterize and mitigate PCR NPM-bias (PCR bias from non-primer-mismatch sources) in microbiota surveys. We use experimental data from mock bacterial communities to validate our approach and human gut microbiota samples to characterize PCR NPM-bias under real-world conditions. Our results suggest that PCR NPM-bias can skew estimates of microbial relative abundances by a factor of 4 or more, but that this bias can be mitigated using log-ratio linear models.
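
One way to read "log-ratio linear models" is sketched below. The abstract does not give the model's form, so this rests on an assumption: if each taxon amplifies with a constant per-cycle efficiency, the log of the abundance ratio between two taxa is linear in the number of PCR cycles, and the intercept at zero cycles estimates the ratio before amplification. The function names and the simulated numbers are hypothetical, and the fit is ordinary least squares rather than the authors' method.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def estimate_unbiased_ratio(cycles, observed_ratios):
    """Extrapolate the taxon A / taxon B ratio back to zero PCR cycles.

    Returns (ratio at cycle 0, per-cycle multiplicative bias).
    """
    log_ratios = [math.log(r) for r in observed_ratios]
    intercept, slope = fit_line(cycles, log_ratios)
    return math.exp(intercept), math.exp(slope)


# Simulated data: a true 1:1 ratio where taxon A gains 5% on taxon B each cycle.
cycles = [20, 25, 30, 35]
observed = [1.0 * 1.05 ** c for c in cycles]
ratio_at_zero, per_cycle_bias = estimate_unbiased_ratio(cycles, observed)
```

With these simulated values the fit recovers a starting ratio of about 1.0 and a per-cycle bias of about 1.05, even though every observed ratio is well above 1.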


Subjects
Bacteria/genetics; Databases, Genetic/standards; Gastrointestinal Microbiome/genetics; Polymerase Chain Reaction/standards; Bias; DNA, Bacterial/genetics; Humans
12.
Nat Methods ; 15(8): 595-597, 2018 08.
Article in English | MEDLINE | ID: mdl-30013044

ABSTRACT

Existing benchmark datasets for use in evaluating variant-calling accuracy are constructed from a consensus of known short-variant callers, and they are thus biased toward easy regions that are accessible by these algorithms. We derived a new benchmark dataset from the de novo PacBio assemblies of two fully homozygous human cell lines, which provides a relatively more accurate and less biased estimate of small-variant-calling error rates in a realistic context.


Subjects
Databases, Genetic/statistics & numerical data; Genetic Variation; Algorithms; Benchmarking; Cell Line, Tumor; Databases, Genetic/standards; Diploidy; Female; Genome, Human; Homozygote; Humans; Hydatidiform Mole/genetics; Pregnancy; Synthetic Biology; Uterine Neoplasms/genetics; Whole Genome Sequencing/statistics & numerical data
13.
BMC Cancer ; 21(1): 810, 2021 Jul 15.
Article in English | MEDLINE | ID: mdl-34266411

ABSTRACT

BACKGROUND: Bladder cancer (BC) is the ninth most common malignant tumor. We constructed a risk signature using immune-related gene pairs (IRGPs) to predict the prognosis of BC patients. METHODS: The mRNA transcriptome, simple nucleotide variation and clinical data of BC patients were downloaded from The Cancer Genome Atlas (TCGA) database (TCGA-BLCA). The mRNA transcriptome and clinical data were also extracted from Gene Expression Omnibus (GEO) datasets (GSE31684). A risk signature was built based on the IRGPs. The ability of the signature to predict prognosis was analyzed with survival curves and Cox regression. The relationships between immunological parameters [immune cell infiltration, immune checkpoints, tumor microenvironment (TME) and tumor mutation burden (TMB)] and the risk score were investigated. Finally, gene set enrichment analysis (GSEA) was used to explore molecular mechanisms underlying the risk score. RESULTS: The risk signature utilized 30 selected IRGPs. The prognosis of the high-risk group was significantly worse than that of the low-risk group. We used the GSE31684 dataset to validate the signature. Close relationships were found between the risk score and immunological parameters. Finally, GSEA showed that gene sets related to the extracellular matrix (ECM), stromal cells and epithelial-mesenchymal transition (EMT) were enriched in the high-risk group. In the low-risk group, we found a number of immune-related pathways in the enriched pathways and biofunctions. CONCLUSIONS: We used a new tool, IRGPs, to build a risk signature to predict the prognosis of BC. By evaluating immune parameters and molecular mechanisms, we gained a better understanding of the mechanisms underlying the risk signature. This signature can also be used as a tool to predict the effect of immunotherapy in patients with BC.
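
The abstract does not say how a gene pair is scored, so the following is only a sketch under a common convention for gene-pair signatures, assumed here rather than taken from the paper: a pair (A, B) scores 1 when gene A is expressed at a lower level than gene B in the same sample, and 0 otherwise, and the risk score is a weighted sum of those indicators. The gene names, coefficients and expression values are all hypothetical.

```python
def pair_indicators(expression, gene_pairs):
    """Return 1 for each pair (a, b) where expression of a is below b, else 0."""
    return [1 if expression[a] < expression[b] else 0 for a, b in gene_pairs]


def risk_score(expression, gene_pairs, coefficients):
    """Weighted sum of pair indicators; coefficients would come from a fitted model."""
    indicators = pair_indicators(expression, gene_pairs)
    return sum(c * i for c, i in zip(coefficients, indicators))


# Hypothetical sample with made-up gene names and values.
sample = {"GENE_A": 2.0, "GENE_B": 5.0, "GENE_C": 1.0}
pairs = [("GENE_A", "GENE_B"), ("GENE_B", "GENE_C")]
weights = [0.5, -0.25]

indicators = pair_indicators(sample, pairs)
score = risk_score(sample, pairs, weights)
```

Because each indicator depends only on the ordering of two genes within one sample, a score of this kind does not change under any monotone rescaling of that sample's expression values, which is the usual reason given for using pairs when combining cohorts from different platforms.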


Subjects
Databases, Genetic/standards; Gene Expression Regulation, Neoplastic/genetics; Urinary Bladder Neoplasms/genetics; Aged; Humans; Prognosis; Survival Analysis; Urinary Bladder Neoplasms/mortality
15.
Nucleic Acids Res ; 47(D1): D649-D659, 2019 01 08.
Article in English | MEDLINE | ID: mdl-30357420

ABSTRACT

The Genomes Online Database (GOLD) (https://gold.jgi.doe.gov) is an open online resource, which maintains an up-to-date catalog of genome and metagenome projects in the context of a comprehensive list of associated metadata. Information in GOLD is organized into four levels: Study, Biosample/Organism, Sequencing Project and Analysis Project. Currently GOLD hosts information on 33 415 Studies, 49 826 Biosamples, 313 324 Organisms, 215 881 Sequencing Projects and 174 454 Analysis Projects with a total of 541 metadata fields, of which 80 are based on controlled vocabulary (CV) terms. GOLD provides a user-friendly web interface to browse sequencing projects and launch advanced search tools across four classification levels. Users submit metadata on a wide range of Sequencing and Analysis Projects in GOLD before depositing sequence data to the Integrated Microbial Genomes (IMG) system for analysis. GOLD conforms with and supports the rules set by the Genomic Standards Consortium (GSC) Minimum Information standards. The current version of GOLD (v.7) has seen the number of projects and associated metadata increase exponentially over the years. This paper provides an update on the current status of GOLD and highlights the new features added over the last two years.


Subjects
Databases, Genetic/standards; Genomics/methods; Software/standards; Gene Ontology
16.
Nucleic Acids Res ; 47(D1): D786-D792, 2019 01 08.
Article in English | MEDLINE | ID: mdl-30304474

ABSTRACT

The HUGO Gene Nomenclature Committee (HGNC) based at EMBL's European Bioinformatics Institute (EMBL-EBI) assigns unique symbols and names to human genes. There are over 40 000 approved gene symbols in our current database of which over 19 000 are for protein-coding genes. The Vertebrate Gene Nomenclature Committee (VGNC) was established in 2016 to assign standardized nomenclature in line with human for vertebrate species that lack their own nomenclature committees. The VGNC initially assigned nomenclature for over 15 000 protein-coding genes in chimpanzee. We have extended this process to other vertebrate species, naming over 14 000 protein-coding genes in cow and dog and over 13 000 in horse to date. Our HGNC website https://www.genenames.org has undergone a major design update, simplifying the homepage to provide easy access to our search tools and making the site more mobile friendly. Our gene families pages are now known as 'gene groups' and have increased in number to over 1200, with nearly half of all named genes currently assigned to at least one gene group. This article provides an overview of our online data and resources, focusing on our work over the last two years.


Subjects
Computational Biology/standards; Databases, Genetic/standards; Genomics/standards; Terminology as Topic; Animals; Cattle; Dogs; Horses/genetics; Humans; Pan troglodytes/genetics; Search Engine
17.
BMC Biol ; 18(1): 37, 2020 04 07.
Article in English | MEDLINE | ID: mdl-32264902

ABSTRACT

Metagenomics studies leverage genomic reference databases to generate discoveries in basic science and translational research. However, current microbial studies use disparate reference databases that lack consistent standards of specimen inclusion, data preparation, taxon labelling and accessibility, hindering their quality and comprehensiveness, and calling for the establishment of recommendations for reference genome database assembly. Here, we analyze existing fungal and bacterial databases and discuss guidelines for the development of a master reference database that promises to improve the quality and quantity of omics research.


Subjects
Bacteria/genetics; Databases, Genetic/standards; Fungi/genetics; Metagenomics/standards; Metagenomics/instrumentation
18.
PLoS Biol ; 15(6): e2002477, 2017 Jun.
Article in English | MEDLINE | ID: mdl-28594819

ABSTRACT

Open data is a vital pillar of open science and a key enabler for reproducibility, data reuse, and novel discoveries. Enforcement of open-data policies, however, largely relies on manual efforts, which invariably lag behind the increasingly automated generation of biological data. To address this problem, we developed a general approach to automatically identify datasets overdue for public release by applying text mining to identify dataset references in published articles and parse query results from repositories to determine if the datasets remain private. We demonstrate the effectiveness of this approach on 2 popular National Center for Biotechnology Information (NCBI) repositories: Gene Expression Omnibus (GEO) and Sequence Read Archive (SRA). Our Wide-Open system identified a large number of overdue datasets, which spurred administrators to respond directly by releasing 400 datasets in one week.
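
The two steps described in this abstract, finding dataset references in article text and then checking whether each dataset is still private, can be sketched as follows. This is not the Wide-Open code. The regular expression covers only GEO series (`GSE` plus digits) and SRA study (`SRP` plus digits) accessions, the `is_public` callback is a hypothetical stand-in for a repository query, and `SRP012345` is a made-up accession used only for the example.

```python
import re

# Assumed accession shapes: GEO series (GSE123) and SRA studies (SRP123).
ACCESSION_PATTERN = re.compile(r"\b(?:GSE|SRP)\d+\b")


def find_dataset_references(text):
    """Return the distinct accession numbers mentioned in the text, sorted."""
    return sorted(set(ACCESSION_PATTERN.findall(text)))


def find_overdue(accessions, is_public):
    """Keep accessions that an article cites but the repository does not expose.

    is_public is a caller-supplied function (hypothetical here) that would
    query the repository and return True when the dataset is released.
    """
    return [acc for acc in accessions if not is_public(acc)]


article_text = (
    "Expression data are available from GEO under GSE31684; raw reads are "
    "in SRA under SRP012345. GSE31684 was also used for validation."
)
referenced = find_dataset_references(article_text)

# Stand-in for a real repository lookup: pretend only GSE31684 is released.
overdue = find_overdue(referenced, lambda acc: acc == "GSE31684")
```

Keeping the repository check behind a callback leaves the text-mining step testable without network access; a real checker would also need to handle rate limits and the other accession types that each repository issues.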


Subjects
Access to Information; Biomedical Research/methods; Databases, Genetic; Animals; Biomedical Research/trends; Biotechnology/trends; Computational Biology/trends; Data Mining; Databases, Bibliographic; Databases, Genetic/standards; Databases, Genetic/trends; Gene Expression Regulation; Humans; Library Automation; Molecular Sequence Data; National Library of Medicine (U.S.); Periodicals as Topic; Reproducibility of Results; Time Factors; United States
20.
Nucleic Acids Res ; 46(D1): D221-D228, 2018 01 04.
Article in English | MEDLINE | ID: mdl-29126148

ABSTRACT

The Consensus Coding Sequence (CCDS) project provides a dataset of protein-coding regions that are identically annotated on the human and mouse reference genome assembly in genome annotations produced independently by NCBI and the Ensembl group at EMBL-EBI. This dataset is the product of an international collaboration that includes NCBI, Ensembl, HUGO Gene Nomenclature Committee, Mouse Genome Informatics and University of California, Santa Cruz. Identically annotated coding regions, which are generated using an automated pipeline and pass multiple quality assurance checks, are assigned a stable and tracked identifier (CCDS ID). Additionally, coordinated manual review by expert curators from the CCDS collaboration helps in maintaining the integrity and high quality of the dataset. The CCDS data are available through an interactive web page (https://www.ncbi.nlm.nih.gov/CCDS/CcdsBrowse.cgi) and an FTP site (ftp://ftp.ncbi.nlm.nih.gov/pub/CCDS/). In this paper, we outline the ongoing work, growth and stability of the CCDS dataset and provide updates on new collaboration members and new features added to the CCDS user interface. We also present expert curation scenarios, with specific examples highlighting the importance of an accurate reference genome assembly and the crucial role played by input from the research community.


Subjects
Consensus Sequence; Databases, Genetic; Open Reading Frames; Animals; Data Curation/methods; Data Curation/standards; Databases, Genetic/standards; Guidelines as Topic; Humans; Mice; Molecular Sequence Annotation; National Library of Medicine (U.S.); United States; User-Computer Interface