Results 1 - 20 of 29
1.
Mol Cell ; 61(6): 903-13, 2016 Mar 17.
Article in English | MEDLINE | ID: mdl-26990993

ABSTRACT

Transcriptome-wide maps of RNA binding protein (RBP)-RNA interactions by immunoprecipitation (IP)-based methods such as RNA IP (RIP) and crosslinking and IP (CLIP) are key starting points for evaluating the molecular roles of the thousands of human RBPs. A significant bottleneck to the application of these methods in diverse cell lines, tissues, and developmental stages is the availability of validated IP-quality antibodies. Using IP followed by immunoblot assays, we have developed a validated repository of 438 commercially available antibodies that interrogate 365 unique RBPs. In parallel, 362 short-hairpin RNA (shRNA) constructs against 276 unique RBPs were also used to confirm specificity of these antibodies. These antibodies can characterize subcellular RBP localization. With the burgeoning interest in the roles of RBPs in cancer, neurobiology, and development, these resources are invaluable to the broad scientific community. Detailed information about these resources is publicly available at the ENCODE portal (https://www.encodeproject.org/).


Subject(s)
Databases, Genetic; RNA-Binding Proteins/genetics; RNA/metabolism; Transcriptome/genetics; Binding Sites; Humans; Protein Binding; RNA/genetics; RNA, Small Interfering/classification; RNA, Small Interfering/genetics; RNA-Binding Proteins/metabolism
2.
BMC Bioinformatics ; 22(1): 459, 2021 Sep 25.
Article in English | MEDLINE | ID: mdl-34563119

ABSTRACT

BACKGROUND: We present ARCHes, a fast and accurate haplotype-based approach for inferring an individual's ancestry composition. Our approach works by modeling haplotype diversity from a large, admixed cohort of hundreds of thousands, then annotating those models with population information from reference panels of known ancestry. RESULTS: The running time of ARCHes does not depend on the size of a reference panel because training and testing are separate processes, and the inferred population-annotated haplotype models can be written to disk and reused to label large test sets in parallel (in our experiments, it averages less than one minute to assign ancestry from 32 populations using 10 CPU). We test ARCHes on public data from the 1000 Genomes Project and the Human Genome Diversity Project (HGDP) as well as simulated examples of known admixture. CONCLUSIONS: Our results demonstrate that ARCHes outperforms RFMix at correctly assigning both global and local ancestry at finer population scales regardless of the amount of population admixture.


Subject(s)
Genetics, Population; Genome, Human; Haplotypes; Humans; Polymorphism, Single Nucleotide
3.
Nucleic Acids Res ; 44(D1): D726-32, 2016 Jan 04.
Article in English | MEDLINE | ID: mdl-26527727

ABSTRACT

The Encyclopedia of DNA Elements (ENCODE) Project is in its third phase of creating a comprehensive catalog of functional elements in the human genome. This phase of the project includes an expansion of assays that measure diverse RNA populations, identify proteins that interact with RNA and DNA, probe regions of DNA hypersensitivity, and measure levels of DNA methylation in a wide range of cell and tissue types to identify putative regulatory elements. To date, results for almost 5000 experiments have been released for use by the scientific community. These data are available for searching, visualization and download at the new ENCODE Portal (www.encodeproject.org). The revamped ENCODE Portal provides new ways to browse and search the ENCODE data based on the metadata that describe the assays as well as summaries of the assays that focus on data provenance. In addition, it is a flexible platform that allows integration of genomic data from multiple projects. The portal experience was designed to improve access to ENCODE data by relying on metadata that allow reusability and reproducibility of the experiments.
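
The metadata-based searching described in this abstract can be illustrated with a minimal sketch that only builds a search URL and makes no network request. The `/search/` path and the `type`, `format` and `limit` query parameters follow the portal's REST interface as commonly documented. The helper name `build_search_url` and the example filter (`assay_title` set to `eCLIP`) are assumptions chosen for illustration and do not come from the abstract.

```python
from urllib.parse import urlencode

# Portal address as given in the abstract.
BASE_URL = "https://www.encodeproject.org"


def build_search_url(object_type, filters=None, limit=25):
    """Build a metadata search URL for the portal (hypothetical helper).

    object_type: kind of record to search, e.g. "Experiment".
    filters: optional dict of metadata field -> value (field names are assumptions).
    limit: maximum number of results requested.
    """
    params = [("type", object_type)]
    # Sort the filters so the generated URL is deterministic.
    for field, value in sorted((filters or {}).items()):
        params.append((field, value))
    params.append(("format", "json"))
    params.append(("limit", str(limit)))
    return f"{BASE_URL}/search/?{urlencode(params)}"


url = build_search_url("Experiment", {"assay_title": "eCLIP"})
print(url)
```

The resulting URL could be passed to any HTTP client; the response layout is not described in the abstract, so it is left out here.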


Subject(s)
Databases, Genetic; Genome, Human; Genomics; Animals; DNA/metabolism; Genes; Humans; Mice; Proteins/metabolism; RNA/metabolism
4.
Genome Res ; 22(9): 1790-7, 2012 Sep.
Article in English | MEDLINE | ID: mdl-22955989

ABSTRACT

As the sequencing of healthy and disease genomes becomes more commonplace, detailed annotation provides interpretation for individual variation responsible for normal and disease phenotypes. Current approaches focus on direct changes in protein coding genes, particularly nonsynonymous mutations that directly affect the gene product. However, most individual variation occurs outside of genes and, indeed, most markers generated from genome-wide association studies (GWAS) identify variants outside of coding segments. Identification of potential regulatory changes that perturb these sites will lead to a better localization of truly functional variants and interpretation of their effects. We have developed a novel approach and database, RegulomeDB, which guides interpretation of regulatory variants in the human genome. RegulomeDB includes high-throughput, experimental data sets from ENCODE and other sources, as well as computational predictions and manual annotations to identify putative regulatory potential and identify functional variants. These data sources are combined into a powerful tool that scores variants to help separate functional variants from a large pool and provides a small set of putative sites with testable hypotheses as to their function. We demonstrate the applicability of this tool to the annotation of noncoding variants from 69 full sequenced genomes as well as that of a personal genome, where thousands of functionally associated variants were identified. Moreover, we demonstrate a GWAS where the database is able to quickly identify the known associated functional variant and provide a hypothesis as to its function. Overall, we expect this approach and resource to be valuable for the annotation of human genome sequences.
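
The idea of scoring variants by combined evidence, as described in this abstract, can be sketched with a toy ranking. The evidence kinds, their weights and the function names below are all hypothetical; they are not RegulomeDB's actual scoring categories, which the abstract does not spell out.

```python
# Hypothetical evidence weights for illustration only;
# these are NOT RegulomeDB's real scoring categories.
EVIDENCE_WEIGHTS = {
    "eqtl": 3,
    "tf_binding": 2,
    "motif": 1,
    "open_chromatin": 1,
}


def score_variant(evidence):
    """Sum the weights of the distinct evidence kinds seen for one variant."""
    return sum(EVIDENCE_WEIGHTS.get(kind, 0) for kind in set(evidence))


def rank_variants(variants):
    """Return variant ids ordered by descending score, ties broken by id.

    variants: dict mapping variant id -> iterable of evidence kinds.
    """
    return sorted(variants, key=lambda v: (-score_variant(variants[v]), v))


# Placeholder variant ids, not real rsIDs.
toy_variants = {
    "var_A": ["tf_binding", "motif"],
    "var_B": ["eqtl", "tf_binding", "open_chromatin"],
    "var_C": [],
}
ranking = rank_variants(toy_variants)
print(ranking)
```

A ranking of this kind narrows a large pool of variants to a short list worth testing, which is the use the abstract describes.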


Subject(s)
Databases, Genetic; Genetic Variation; Genome, Human; Molecular Sequence Annotation; DNA-Binding Proteins/genetics; Genome-Wide Association Study; Genotype; Humans; Internet; Intracellular Signaling Peptides and Proteins/genetics; Lupus Erythematosus, Systemic/genetics; Nuclear Proteins/genetics; Open Reading Frames; Polymorphism, Single Nucleotide; Regulatory Sequences, Nucleic Acid; Tumor Necrosis Factor alpha-Induced Protein 3
5.
Nucleic Acids Res ; 40(Database issue): D700-5, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22110037

ABSTRACT

The Saccharomyces Genome Database (SGD, http://www.yeastgenome.org) is the community resource for the budding yeast Saccharomyces cerevisiae. The SGD project provides the highest-quality manually curated information from peer-reviewed literature. The experimental results reported in the literature are extracted and integrated within a well-developed database. These data are combined with quality high-throughput results and provided through Locus Summary pages, a powerful query engine and rich genome browser. The acquisition, integration and retrieval of these data allow SGD to facilitate experimental design and analysis by providing an encyclopedia of the yeast genome, its chromosomal features, their functions and interactions. Public access to these data is provided to researchers and educators via web pages designed for optimal ease of use.


Subject(s)
Databases, Genetic; Genome, Fungal; Saccharomyces cerevisiae/genetics; Genes, Fungal; Genomics; High-Throughput Nucleotide Sequencing; Molecular Sequence Annotation; Phenotype; Software; Terminology as Topic
6.
Res Sq ; 2023 Jul 19.
Article in English | MEDLINE | ID: mdl-37503119

ABSTRACT

The Encyclopedia of DNA elements (ENCODE) project is a collaborative effort to create a comprehensive catalog of functional elements in the human genome. The current database comprises more than 19000 functional genomics experiments across more than 1000 cell lines and tissues using a wide array of experimental techniques to study the chromatin structure, regulatory and transcriptional landscape of the Homo sapiens and Mus musculus genomes. All experimental data, metadata, and associated computational analyses created by the ENCODE consortium are submitted to the Data Coordination Center (DCC) for validation, tracking, storage, and distribution to community resources and the scientific community. The ENCODE project has engineered and distributed uniform processing pipelines in order to promote data provenance and reproducibility as well as allow interoperability between genomic resources and other consortia. All data files, reference genome versions, software versions, and parameters used by the pipelines are captured and available via the ENCODE Portal. The pipeline code, developed using Docker and Workflow Description Language (WDL; https://openwdl.org/), is publicly available on GitHub, with images available on Docker Hub (https://hub.docker.com), enabling access to a diverse range of biomedical researchers. ENCODE pipelines maintained and used by the DCC can be installed to run on personal computers, local HPC clusters, or in cloud computing environments via Cromwell. Access to the pipelines and data via the cloud lets small labs use the data or software without access to institutional compute clusters. Standardization of the computational methodologies for analysis and quality control leads to comparable results from different ENCODE collections, a prerequisite for successful integrative analyses.

7.
bioRxiv ; 2023 Apr 06.
Article in English | MEDLINE | ID: mdl-37066421

ABSTRACT

The Encyclopedia of DNA elements (ENCODE) project is a collaborative effort to create a comprehensive catalog of functional elements in the human genome. The current database comprises more than 19000 functional genomics experiments across more than 1000 cell lines and tissues using a wide array of experimental techniques to study the chromatin structure, regulatory and transcriptional landscape of the Homo sapiens and Mus musculus genomes. All experimental data, metadata, and associated computational analyses created by the ENCODE consortium are submitted to the Data Coordination Center (DCC) for validation, tracking, storage, and distribution to community resources and the scientific community. The ENCODE project has engineered and distributed uniform processing pipelines in order to promote data provenance and reproducibility as well as allow interoperability between genomic resources and other consortia. All data files, reference genome versions, software versions, and parameters used by the pipelines are captured and available via the ENCODE Portal. The pipeline code, developed using Docker and Workflow Description Language (WDL; https://openwdl.org/), is publicly available on GitHub, with images available on Docker Hub (https://hub.docker.com), enabling access to a diverse range of biomedical researchers. ENCODE pipelines maintained and used by the DCC can be installed to run on personal computers, local HPC clusters, or in cloud computing environments via Cromwell. Access to the pipelines and data via the cloud lets small labs use the data or software without access to institutional compute clusters. Standardization of the computational methodologies for analysis and quality control leads to comparable results from different ENCODE collections, a prerequisite for successful integrative analyses.

8.
Nucleic Acids Res ; 38(Database issue): D433-6, 2010 Jan.
Article in English | MEDLINE | ID: mdl-19906697

ABSTRACT

The Saccharomyces Genome Database (SGD; http://www.yeastgenome.org) is a scientific database for the molecular biology and genetics of the yeast Saccharomyces cerevisiae, which is commonly known as baker's or budding yeast. The information in SGD includes functional annotations, mapping and sequence information, protein domains and structure, expression data, mutant phenotypes, physical and genetic interactions and the primary literature from which these data are derived. Here we describe how published phenotypes and genetic interaction data are annotated and displayed in SGD.


Subject(s)
Computational Biology/methods; Databases, Nucleic Acid; Genome, Fungal; Mutation; Saccharomyces cerevisiae/genetics; Computational Biology/trends; DNA, Fungal; Databases, Genetic; Databases, Protein; Genes, Fungal; Information Storage and Retrieval/methods; Internet; Phenotype; Protein Structure, Tertiary; Software
9.
BMJ Open ; 12(10): e049657, 2022 Oct 12.
Article in English | MEDLINE | ID: mdl-36223959

ABSTRACT

OBJECTIVES: The enormous toll of the COVID-19 pandemic has heightened the urgency of collecting and analysing population-scale datasets in real time to monitor and better understand the evolving pandemic. The objectives of this study were to examine the relationship of risk factors to COVID-19 susceptibility and severity and to develop risk models to accurately predict COVID-19 outcomes using rapidly obtained self-reported data. DESIGN: A cross-sectional study. SETTING: AncestryDNA customers in the USA who consented to research. PARTICIPANTS: The AncestryDNA COVID-19 Study collected self-reported survey data on symptoms, outcomes, risk factors and exposures for over 563 000 adult individuals in the USA in just under 4 months, including over 4700 COVID-19 cases as measured by a self-reported positive test. RESULTS: We replicated previously reported associations between several risk factors and COVID-19 susceptibility and severity outcomes, and additionally found that differences in known exposures accounted for many of the susceptibility associations. A notable exception was elevated susceptibility for men even after adjusting for known exposures and age (adjusted OR=1.36, 95% CI=1.19 to 1.55). We also demonstrated that self-reported data can be used to build accurate risk models to predict individualised COVID-19 susceptibility (area under the curve (AUC)=0.84) and severity outcomes including hospitalisation and critical illness (AUC=0.87 and 0.90, respectively). The risk models achieved robust discriminative performance across different age, sex and genetic ancestry groups within the study. CONCLUSIONS: The results highlight the value of self-reported epidemiological data to rapidly provide public health insights into the evolving COVID-19 pandemic.
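
The two statistics reported in this abstract, an odds ratio with a 95% CI and an area under the curve (AUC), can be illustrated with a minimal sketch on made-up numbers. The study reports an adjusted odds ratio, which would come from a regression model; the function below computes only an unadjusted odds ratio with a Wald interval from a 2x2 table, so it is a simplification. All counts, scores and function names are assumptions for illustration and are not study data.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval.

    a, b: exposed cases and exposed non-cases.
    c, d: unexposed cases and unexposed non-cases.
    z: normal quantile (1.96 gives an approximate 95% interval).
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


def auc(case_scores, control_scores):
    """AUC as the probability that a random case scores above a random control.

    Ties count as half a win (the Mann-Whitney formulation).
    """
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Made-up 2x2 table and risk scores, not taken from the study.
or_value, or_low, or_high = odds_ratio_ci(30, 70, 20, 80)
toy_auc = auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.4, 0.1])
print(round(or_value, 3), round(or_low, 3), round(or_high, 3))
print(toy_auc)  # 0.875
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which gives context for the reported values of 0.84 to 0.90.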


Subject(s)
COVID-19; Adult; COVID-19/epidemiology; Cross-Sectional Studies; Humans; Male; Pandemics; Risk Factors; SARS-CoV-2
10.
Nat Genet ; 54(4): 374-381, 2022 Apr.
Article in English | MEDLINE | ID: mdl-35410379

ABSTRACT

Multiple COVID-19 genome-wide association studies (GWASs) have identified reproducible genetic associations indicating that there is a genetic component to susceptibility and severity risk. To complement these studies, we collected deep coronavirus disease 2019 (COVID-19) phenotype data from a survey of 736,723 AncestryDNA research participants. With these data, we defined eight phenotypes related to COVID-19 outcomes: four phenotypes that align with previously studied COVID-19 definitions and four 'expanded' phenotypes that focus on susceptibility given exposure, mild clinical manifestations and an aggregate score of symptom severity. We performed a replication analysis of 12 previously reported COVID-19 genetic associations with all eight phenotypes in a trans-ancestry meta-analysis of AncestryDNA research participants. In this analysis, we show distinct patterns of association at the 12 loci with the eight outcomes that we assessed. We also performed a genome-wide discovery analysis of all eight phenotypes, which did not yield new genome-wide significant loci but did suggest that three of the four 'expanded' COVID-19 phenotypes have enhanced power to capture protective genetic associations relative to the previously studied phenotypes. Thus, we conclude that continued large-scale ascertainment of deep COVID-19 phenotype data would likely represent a boon for COVID-19 therapeutic target identification.
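
The abstract does not say which meta-analysis method was used to combine ancestry groups. As one common possibility, the sketch below shows a fixed-effect inverse-variance meta-analysis of per-group effect estimates. The function name and the example effect sizes are assumptions for illustration and are not results from the study.

```python
import math


def fixed_effect_meta(estimates):
    """Inverse-variance fixed-effect meta-analysis.

    estimates: list of (beta, standard_error) pairs, one per group.
    Returns the pooled beta, its standard error and the z statistic.
    """
    # Each group is weighted by the inverse of its variance.
    weights = [1.0 / (se ** 2) for _, se in estimates]
    total_weight = sum(weights)
    pooled_beta = sum(w * beta for w, (beta, _) in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled_beta, pooled_se, pooled_beta / pooled_se


# Made-up per-group effects for a single variant (log odds ratio, standard error).
pooled_beta, pooled_se, z_stat = fixed_effect_meta([(0.10, 0.05), (0.20, 0.10)])
print(round(pooled_beta, 4), round(pooled_se, 4), round(z_stat, 3))
```

More precisely estimated groups dominate the pooled estimate, so in this toy case the result sits closer to 0.10 than to 0.20.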


Subject(s)
COVID-19; Genome-Wide Association Study; COVID-19/genetics; Genetic Predisposition to Disease; Humans; Phenotype; Polymorphism, Single Nucleotide/genetics
11.
Nucleic Acids Res ; 36(Database issue): D577-81, 2008 Jan.
Article in English | MEDLINE | ID: mdl-17982175

ABSTRACT

The Saccharomyces Genome Database (SGD; http://www.yeastgenome.org/) collects and organizes biological information about the chromosomal features and gene products of the budding yeast Saccharomyces cerevisiae. Although published data from traditional experimental methods are the primary sources of evidence supporting Gene Ontology (GO) annotations for a gene product, high-throughput experiments and computational predictions can also provide valuable insights in the absence of an extensive body of literature. Therefore, GO annotations available at SGD now include high-throughput data as well as computational predictions provided by the GO Annotation Project (GOA UniProt; http://www.ebi.ac.uk/GOA/). Because the annotation method used to assign GO annotations varies by data source, GO resources at SGD have been modified to distinguish data sources and annotation methods. In addition to providing information for genes that have not been experimentally characterized, GO annotations from independent sources can be compared to those made by SGD to help keep the literature-based GO annotations current.


Subject(s)
Databases, Genetic; Genes, Fungal; Saccharomyces cerevisiae Proteins/genetics; Saccharomyces cerevisiae/genetics; Computational Biology; Genome, Fungal; Genomics; Internet; Saccharomyces cerevisiae Proteins/chemistry; Saccharomyces cerevisiae Proteins/physiology; User-Computer Interface; Vocabulary, Controlled
12.
Nucleic Acids Res ; 35(Database issue): D468-71, 2007 Jan.
Article in English | MEDLINE | ID: mdl-17142221

ABSTRACT

The recent explosion in protein data generated from both directed small-scale studies and large-scale proteomics efforts has greatly expanded the quantity of available protein information and has prompted the Saccharomyces Genome Database (SGD; http://www.yeastgenome.org/) to enhance the depth and accessibility of protein annotations. In particular, we have expanded ongoing efforts to improve the integration of experimental information and sequence-based predictions and have redesigned the protein information web pages. A key feature of this redesign is the development of a GBrowse-derived interactive Proteome Browser customized to improve the visualization of sequence-based protein information. This Proteome Browser has enabled SGD to unify the display of hidden Markov model (HMM) domains, protein family HMMs, motifs, transmembrane regions, signal peptides, hydropathy plots and profile hits using several popular prediction algorithms. In addition, a physico-chemical properties page has been introduced to provide easy access to basic protein information. Improvements to the layout of the Protein Information page and integration of the Proteome Browser will facilitate the ongoing expansion of sequence-specific experimental information captured in SGD, including post-translational modifications and other user-defined annotations. Finally, SGD continues to improve upon the availability of genetic and physical interaction data in an ongoing collaboration with BioGRID by providing direct access to more than 82,000 manually-curated interactions.


Subject(s)
Databases, Protein; Proteomics; Saccharomyces cerevisiae Proteins/chemistry; Computer Graphics; Genome, Fungal; Internet; Saccharomyces cerevisiae/genetics; Saccharomyces cerevisiae Proteins/genetics; Sequence Analysis, Protein; User-Computer Interface
13.
Bioinformatics ; 23(23): 3232-40, 2007 Dec 01.
Article in English | MEDLINE | ID: mdl-17942445

ABSTRACT

MOTIVATION: The rate at which gene-related findings appear in the scientific literature makes it difficult if not impossible for biomedical scientists to keep fully informed and up to date. The importance of these findings argues for the development of automated methods that can find, extract and summarize this information. This article reports on methods for determining the molecular function claims that are being made in a scientific article, specifically those that are backed by experimental evidence. RESULTS: The most significant result is that for molecular function claims based on direct assays, our methods achieved recall of 70.7% and precision of 65.7%. Furthermore, our methods correctly identified in the text 44.6% of the specific molecular function claims backed up by direct assays, but with a precision of only 0.92%, a disappointing outcome that led to an examination of the different kinds of errors. These results were based on an analysis of 1823 articles from the literature of Saccharomyces cerevisiae (budding yeast). AVAILABILITY: The annotation files for S.cerevisiae are available from ftp://genome-ftp.stanford.edu/pub/yeast/data_download/literature_curation/gene_association.sgd.gz. The draft protocol vocabulary is available by request from the first author.
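
Recall and precision, the two measures this abstract reports, can be computed as in the following minimal sketch. The article labels and claim strings are invented placeholders, and the function name is an assumption; none of them come from the study's data.

```python
def precision_recall(predicted, gold):
    """Compute precision and recall of predicted claims against a gold standard.

    predicted, gold: iterables of hashable items, here (article, claim) pairs.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    # Precision: share of predictions that are correct.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of gold-standard claims that were found.
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Placeholder (article, molecular function claim) pairs.
predicted_claims = [
    ("article_1", "kinase activity"),
    ("article_1", "ATP binding"),
    ("article_2", "DNA binding"),
]
gold_claims = [
    ("article_1", "kinase activity"),
    ("article_2", "DNA binding"),
    ("article_3", "helicase activity"),
    ("article_3", "ATPase activity"),
]
precision, recall = precision_recall(predicted_claims, gold_claims)
print(round(precision, 3), recall)
```

High recall paired with very low precision, as in the second result the abstract reports, means most true claims are found but they are buried among many incorrect predictions.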


Subject(s)
Artificial Intelligence; Evidence-Based Medicine/methods; Information Storage and Retrieval/methods; Natural Language Processing; Periodicals as Topic; Saccharomyces cerevisiae Proteins/classification; Saccharomyces cerevisiae Proteins/metabolism; Reproducibility of Results; Sensitivity and Specificity
14.
Nucleic Acids Res ; 34(Database issue): D442-5, 2006 Jan 01.
Article in English | MEDLINE | ID: mdl-16381907

ABSTRACT

Sequencing and annotation of the entire Saccharomyces cerevisiae genome has made it possible to gain a genome-wide perspective on yeast genes and gene products. To make this information available on an ongoing basis, the Saccharomyces Genome Database (SGD) (http://www.yeastgenome.org/) has created the Genome Snapshot (http://db.yeastgenome.org/cgi-bin/genomeSnapShot.pl). The Genome Snapshot summarizes the current state of knowledge about the genes and chromosomal features of S.cerevisiae. The information is organized into two categories: (i) number of each type of chromosomal feature annotated in the genome and (ii) number and distribution of genes annotated to Gene Ontology terms. Detailed lists are accessible through SGD's Advanced Search tool (http://db.yeastgenome.org/cgi-bin/search/featureSearch), and all the data presented on this page are available from the SGD ftp site (ftp://ftp.yeastgenome.org/yeast/).


Subject(s)
Databases, Genetic; Genome, Fungal; Saccharomyces cerevisiae Proteins/genetics; Saccharomyces cerevisiae/genetics; Chromosomes, Fungal; Computer Graphics; Genomics; Internet; Saccharomyces cerevisiae Proteins/classification; Saccharomyces cerevisiae Proteins/physiology; User-Computer Interface
15.
Nucleic Acids Res ; 33(Database issue): D374-7, 2005 Jan 01.
Article in English | MEDLINE | ID: mdl-15608219

ABSTRACT

The Saccharomyces Genome Database (SGD; http://www.yeastgenome.org/) is a scientific database of gene, protein and genomic information for the yeast Saccharomyces cerevisiae. SGD has recently developed two new resources that facilitate nucleotide and protein sequence comparisons between S.cerevisiae and other organisms. The Fungal BLAST tool provides directed searches against all fungal nucleotide and protein sequences available from GenBank, divided into categories according to organism, status of completeness and annotation, and source. The Model Organism BLASTP Best Hits resource displays, for each S.cerevisiae protein, the single most similar protein from several model organisms and presents links to the database pages of those proteins, facilitating access to curated information about potential orthologs of yeast proteins.


Subject(s)
Databases, Genetic; Genome, Fungal; Saccharomyces cerevisiae Proteins/genetics; Saccharomyces cerevisiae/genetics; Sequence Homology, Amino Acid; Sequence Homology, Nucleic Acid; Software; Saccharomyces cerevisiae Proteins/chemistry; Sequence Analysis
16.
Nat Commun ; 8: 14238, 2017 Feb 07.
Article in English | MEDLINE | ID: mdl-28169989

ABSTRACT

Despite strides in characterizing human history from genetic polymorphism data, progress in identifying genetic signatures of recent demography has been limited. Here we identify very recent fine-scale population structure in North America from a network of over 500 million genetic (identity-by-descent, IBD) connections among 770,000 genotyped individuals of US origin. We detect densely connected clusters within the network and annotate these clusters using a database of over 20 million genealogical records. Recent population patterns captured by IBD clustering include immigrants such as Scandinavians and French Canadians; groups with continental admixture such as Puerto Ricans; settlers such as the Amish and Appalachians who experienced geographic or cultural isolation; and broad historical trends, including reduced north-south gene flow. Our results yield a detailed historical portrait of North America after European settlement and support substantial genetic heterogeneity in the United States beyond that uncovered by previous studies.


Subject(s)
Demography/statistics & numerical data; Genetics, Population/methods; Population Dynamics/trends; Population/genetics; Cluster Analysis; Demography/methods; Emigrants and Immigrants; Gene Flow/genetics; Genotyping Techniques; Haplotypes/genetics; Humans; Polymorphism, Single Nucleotide; Population Dynamics/statistics & numerical data; Sequence Analysis, DNA; United States/ethnology
17.
PLoS One ; 12(4): e0175310, 2017.
Article in English | MEDLINE | ID: mdl-28403240

ABSTRACT

The Encyclopedia of DNA elements (ENCODE) project is an ongoing collaborative effort to create a comprehensive catalog of functional elements initiated shortly after the completion of the Human Genome Project. The current database exceeds 6500 experiments across more than 450 cell lines and tissues using a wide array of experimental techniques to study the chromatin structure, regulatory and transcriptional landscape of the H. sapiens and M. musculus genomes. All ENCODE experimental data, metadata, and associated computational analyses are submitted to the ENCODE Data Coordination Center (DCC) for validation, tracking, storage, unified processing, and distribution to community resources and the scientific community. As the volume of data increases, the identification and organization of experimental details becomes increasingly intricate and demands careful curation. The ENCODE DCC has created a general purpose software system, known as SnoVault, that supports metadata and file submission, a database used for metadata storage, web pages for displaying the metadata and a robust API for querying the metadata. The software is fully open source; code and installation instructions can be found at http://github.com/ENCODE-DCC/snovault/ (for the generic database) and http://github.com/ENCODE-DCC/encoded/ (for storing genomic data in the manner of ENCODE). The core database engine, SnoVault (which is completely independent of ENCODE, genomic data, or bioinformatic data), has been released as a separate Python package.


Subject(s)
Databases, Genetic; Genomics/methods; Metadata; Software; Animals; DNA/genetics; Genome; Humans; Mice
18.
Nucleic Acids Res ; 32(Database issue): D311-4, 2004 Jan 01.
Article in English | MEDLINE | ID: mdl-14681421

ABSTRACT

The Saccharomyces Genome Database (SGD; http://www.yeastgenome.org/), a scientific database of the molecular biology and genetics of the yeast Saccharomyces cerevisiae, has recently developed several new resources that allow the comparison and integration of information on a genome-wide scale, enabling the user not only to find detailed information about individual genes, but also to make connections across groups of genes with common features and across different species. The Fungal Alignment Viewer displays alignments of sequences from multiple fungal genomes, while the Sequence Similarity Query tool displays PSI-BLAST alignments of each S.cerevisiae protein with similar proteins from any species whose sequences are contained in the non-redundant (nr) protein data set at NCBI. The Yeast Biochemical Pathways tool integrates groups of genes by their common roles in metabolism and displays the metabolic pathways in a graphical form. Finally, the Find Chromosomal Features search interface provides a versatile tool for querying multiple types of information in SGD.


Subject(s)
Computational Biology; Databases, Genetic; Genome, Fungal; Saccharomyces cerevisiae Proteins/genetics; Saccharomyces cerevisiae/genetics; Amino Acid Sequence; Animals; Humans; Information Storage and Retrieval; Internet; Molecular Sequence Data; Saccharomyces cerevisiae Proteins/chemistry; Sequence Alignment; Sequence Homology; Software
19.
Article in English | MEDLINE | ID: mdl-26980513

ABSTRACT

The Encyclopedia of DNA Elements (ENCODE) Data Coordinating Center (DCC) is responsible for organizing, describing and providing access to the diverse data generated by the ENCODE project. The description of these data, known as metadata, includes the biological sample used as input, the protocols and assays performed on these samples, the data files generated from the results and the computational methods used to analyze the data. Here, we outline the principles and philosophy used to define the ENCODE metadata in order to create a metadata standard that can be applied to diverse assays and multiple genomic projects. In addition, we present how the data are validated and used by the ENCODE DCC in creating the ENCODE Portal (https://www.encodeproject.org/). Database URL: www.encodeproject.org.


Subject(s)
Computational Biology/methods; DNA/genetics; Databases, Genetic; Algorithms; Animals; Caenorhabditis elegans; Computational Biology/standards; Data Collection; Drosophila melanogaster; High-Throughput Nucleotide Sequencing; Humans; Mice; Nucleic Acids/genetics; Quality Control; Reproducibility of Results; Sequence Alignment
20.
Article in English | MEDLINE | ID: mdl-25776021

ABSTRACT

The Encyclopedia of DNA elements (ENCODE) project is an ongoing collaborative effort to create a catalog of genomic annotations. To date, the project has generated over 4000 experiments across more than 350 cell lines and tissues using a wide array of experimental techniques to study the chromatin structure, regulatory network and transcriptional landscape of the Homo sapiens and Mus musculus genomes. All ENCODE experimental data, metadata and associated computational analyses are submitted to the ENCODE Data Coordination Center (DCC) for validation, tracking, storage and distribution to community resources and the scientific community. As the volume of data increases, the organization of experimental details becomes increasingly complicated and demands careful curation to identify related experiments. Here, we describe the ENCODE DCC's use of ontologies to standardize experimental metadata. We discuss how ontologies, when used to annotate metadata, provide improved searching capabilities and facilitate the ability to find connections within a set of experiments. Additionally, we provide examples of how ontologies are used to annotate ENCODE metadata and how the annotations can be identified via ontology-driven searches at the ENCODE portal. As genomic datasets grow larger and more interconnected, standardization of metadata becomes increasingly vital to allow for exploration and comparison of data between different scientific projects.
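
An ontology-driven search of the kind this abstract describes can be sketched as follows: a query for a term also returns records annotated with any of its descendant terms. The term identifiers, the toy hierarchy, the experiment labels and the function names are all hypothetical. They stand in for real ontology terms and portal records, which the abstract does not list.

```python
from collections import defaultdict

# Toy child -> parents edges; identifiers are invented, not real ontology IDs.
PARENTS_OF = {
    "TOY:organ": [],
    "TOY:liver": ["TOY:organ"],
    "TOY:heart": ["TOY:organ"],
    "TOY:liver_lobe": ["TOY:liver"],
}

# Toy annotations: experiment label -> ontology term of its biological sample.
ANNOTATIONS = {
    "exp1": "TOY:liver_lobe",
    "exp2": "TOY:heart",
    "exp3": "TOY:liver",
}


def descendants(term, parents_of):
    """Return every term reachable from `term` by following child links."""
    children_of = defaultdict(set)
    for child, parents in parents_of.items():
        for parent in parents:
            children_of[parent].add(child)
    found, stack = set(), [term]
    while stack:
        current = stack.pop()
        for child in children_of[current]:
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def ontology_search(term, annotations, parents_of):
    """Return experiments annotated with `term` or any of its descendants."""
    wanted = {term} | descendants(term, parents_of)
    return sorted(exp for exp, annotated in annotations.items() if annotated in wanted)


liver_hits = ontology_search("TOY:liver", ANNOTATIONS, PARENTS_OF)
organ_hits = ontology_search("TOY:organ", ANNOTATIONS, PARENTS_OF)
print(liver_hits)   # ['exp1', 'exp3']
print(organ_hits)   # ['exp1', 'exp2', 'exp3']
```

A plain string match on "TOY:liver" would miss `exp1`; expanding the query through the hierarchy is what lets related experiments be found together.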


Subject(s)
Data Curation/methods; Databases, Genetic; Gene Ontology; Gene Regulatory Networks/physiology; Molecular Sequence Annotation/methods; Transcription, Genetic/physiology; Animals; Humans; Mice