Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 20 de 86
Filtrar
2.
Genetics ; 224(1)2023 05 04.
Artículo en Inglés | MEDLINE | ID: mdl-36866529

RESUMEN

The Gene Ontology (GO) knowledgebase (http://geneontology.org) is a comprehensive resource concerning the functions of genes and gene products (proteins and noncoding RNAs). GO annotations cover genes from organisms across the tree of life as well as viruses, though most gene function knowledge currently derives from experiments carried out in a relatively small number of model organisms. Here, we provide an updated overview of the GO knowledgebase, as well as the efforts of the broad, international consortium of scientists that develops, maintains, and updates the GO knowledgebase. The GO knowledgebase consists of three components: (1) the GO-a computational knowledge structure describing the functional characteristics of genes; (2) GO annotations-evidence-supported statements asserting that a specific gene product has a particular functional characteristic; and (3) GO Causal Activity Models (GO-CAMs)-mechanistic models of molecular "pathways" (GO biological processes) created by linking multiple GO annotations using defined relations. Each of these components is continually expanded, revised, and updated in response to newly published discoveries and receives extensive QA checks, reviews, and user feedback. For each of these components, we provide a description of the current contents, recent developments to keep the knowledgebase up to date with new discoveries, and guidance on how users can best make use of the data that we provide. We conclude with future directions for the project.


Asunto(s)
Bases de Datos Genéticas , Proteínas , Ontología de Genes , Proteínas/genética , Anotación de Secuencia Molecular , Biología Computacional
3.
Genetics ; 224(1)2023 05 04.
Artículo en Inglés | MEDLINE | ID: mdl-36607068

RESUMEN

As one of the first model organism knowledgebases, Saccharomyces Genome Database (SGD) has been supporting the scientific research community since 1993. As technologies and research evolve, so does SGD: from updates in software architecture, to curation of novel data types, to incorporation of data from, and collaboration with, other knowledgebases. We are continuing to make steps toward providing the community with an S. cerevisiae pan-genome. Here, we describe software upgrades, a new nomenclature system for genes not found in the reference strain, and additions to gene pages. With these improvements, we aim to remain a leading resource for students, researchers, and the broader scientific community.


Asunto(s)
Saccharomyces , Humanos , Saccharomyces/genética , Saccharomyces cerevisiae/genética , Genoma Fúngico , Bases de Datos Genéticas , Programas Informáticos
5.
Genome Med ; 14(1): 6, 2022 01 18.
Artículo en Inglés | MEDLINE | ID: mdl-35039090

RESUMEN

BACKGROUND: Identification of clinically significant genetic alterations involved in human disease has been dramatically accelerated by developments in next-generation sequencing technologies. However, the infrastructure and accessible comprehensive curation tools necessary for analyzing an individual patient genome and interpreting genetic variants to inform healthcare management have been lacking. RESULTS: Here we present the ClinGen Variant Curation Interface (VCI), a global open-source variant classification platform for supporting the application of evidence criteria and classification of variants based on the ACMG/AMP variant classification guidelines. The VCI is among a suite of tools developed by the NIH-funded Clinical Genome Resource (ClinGen) Consortium and supports an FDA-recognized human variant curation process. Essential to this is the ability to enable collaboration and peer review across ClinGen Expert Panels supporting users in comprehensively identifying, annotating, and sharing relevant evidence while making variant pathogenicity assertions. To facilitate evidence-based improvements in human variant classification, the VCI is publicly available to the genomics community. Navigation workflows support users providing guidance to comprehensively apply the ACMG/AMP evidence criteria and document provenance for asserting variant classifications. CONCLUSIONS: The VCI offers a central platform for clinical variant classification that fills a gap in the learning healthcare system, facilitates widespread adoption of standards for clinical curation, and is available at https://curation.clinicalgenome.org.


Asunto(s)
Variación Genética , Genoma Humano , Humanos , Pruebas Genéticas , Genómica
6.
Genetics ; 220(4)2022 04 04.
Artículo en Inglés | MEDLINE | ID: mdl-34897464

RESUMEN

Saccharomyces cerevisiae is used to provide fundamental understanding of eukaryotic genetics, gene product function, and cellular biological processes. Saccharomyces Genome Database (SGD) has been supporting the yeast research community since 1993, serving as its de facto hub. Over the years, SGD has maintained the genetic nomenclature, chromosome maps, and functional annotation, and developed various tools and methods for analysis and curation of a variety of emerging data types. More recently, SGD and six other model organism focused knowledgebases have come together to create the Alliance of Genome Resources to develop sustainable genome information resources that promote and support the use of various model organisms to understand the genetic and genomic bases of human biology and disease. Here we describe recent activities at SGD, including the latest reference genome annotation update, the development of a curation system for mutant alleles, and new pages addressing homology across model organisms as well as the use of yeast to study human disease.


Asunto(s)
Saccharomyces , Alelos , Bases de Datos Genéticas , Genoma Fúngico , Humanos , Saccharomyces/genética , Saccharomyces cerevisiae/genética
8.
Cell Genom ; 1(2)2021 Nov 10.
Artículo en Inglés | MEDLINE | ID: mdl-35072136

RESUMEN

The Global Alliance for Genomics and Health (GA4GH) aims to accelerate biomedical advances by enabling the responsible sharing of clinical and genomic data through both harmonized data aggregation and federated approaches. The decreasing cost of genomic sequencing (along with other genome-wide molecular assays) and increasing evidence of its clinical utility will soon drive the generation of sequence data from tens of millions of humans, with increasing levels of diversity. In this perspective, we present the GA4GH strategies for addressing the major challenges of this data revolution. We describe the GA4GH organization, which is fueled by the development efforts of eight Work Streams and informed by the needs of 24 Driver Projects and other key stakeholders. We present the GA4GH suite of secure, interoperable technical standards and policy frameworks and review the current status of standards, their relevance to key domains of research and clinical care, and future plans of GA4GH. Broad international participation in building, adopting, and deploying GA4GH standards and frameworks will catalyze an unprecedented effort in data sharing that will be critical to advancing genomic medicine and ensuring that all populations can access its benefits.

9.
Cell ; 183(4): 905-917.e16, 2020 11 12.
Artículo en Inglés | MEDLINE | ID: mdl-33186529

RESUMEN

The generation of functional genomics datasets is surging, because they provide insight into gene regulation and organismal phenotypes (e.g., genes upregulated in cancer). The intent behind functional genomics experiments is not necessarily to study genetic variants, yet they pose privacy concerns due to their use of next-generation sequencing. Moreover, there is a great incentive to broadly share raw reads for better statistical power and general research reproducibility. Thus, we need new modes of sharing beyond traditional controlled-access models. Here, we develop a data-sanitization procedure allowing raw functional genomics reads to be shared while minimizing privacy leakage, enabling principled privacy-utility trade-offs. Our protocol works with traditional Illumina-based assays and newer technologies such as 10x single-cell RNA sequencing. It involves quantifying the privacy leakage in reads by statistically linking study participants to known individuals. We carried out these linkages using data from highly accurate reference genomes and more realistic environmental samples.


Asunto(s)
Seguridad Computacional , Genómica , Privacidad , Genoma Humano , Genotipo , Secuenciación de Nucleótidos de Alto Rendimiento , Humanos , Fenotipo , Filogenia , Reproducibilidad de los Resultados , Análisis de Secuencia de ARN , Análisis de la Célula Individual
11.
Nature ; 583(7818): 744-751, 2020 07.
Artículo en Inglés | MEDLINE | ID: mdl-32728240

RESUMEN

The Encyclopedia of DNA Elements (ENCODE) project has established a genomic resource for mammalian development, profiling a diverse panel of mouse tissues at 8 developmental stages from 10.5 days after conception until birth, including transcriptomes, methylomes and chromatin states. Here we systematically examined the state and accessibility of chromatin in the developing mouse fetus. In total we performed 1,128 chromatin immunoprecipitation with sequencing (ChIP-seq) assays for histone modifications and 132 assay for transposase-accessible chromatin using sequencing (ATAC-seq) assays for chromatin accessibility across 72 distinct tissue-stages. We used integrative analysis to develop a unified set of chromatin state annotations, infer the identities of dynamic enhancers and key transcriptional regulators, and characterize the relationship between chromatin state and accessibility during developmental gene regulation. We also leveraged these data to link enhancers to putative target genes and demonstrate tissue-specific enrichments of sequence variants associated with disease in humans. The mouse ENCODE data sets provide a compendium of resources for biomedical researchers and achieve, to our knowledge, the most comprehensive view of chromatin dynamics during mammalian fetal development to date.


Asunto(s)
Cromatina/genética , Cromatina/metabolismo , Conjuntos de Datos como Asunto , Desarrollo Fetal/genética , Histonas/metabolismo , Anotación de Secuencia Molecular , Secuencias Reguladoras de Ácidos Nucleicos/genética , Animales , Cromatina/química , Secuenciación de Inmunoprecipitación de Cromatina , Enfermedad/genética , Elementos de Facilitación Genéticos/genética , Femenino , Regulación del Desarrollo de la Expresión Génica/genética , Variación Genética , Histonas/química , Humanos , Masculino , Ratones , Ratones Endogámicos C57BL , Especificidad de Órganos/genética , Reproducibilidad de los Resultados , Transposasas/metabolismo
12.
Nature ; 583(7818): 693-698, 2020 07.
Artículo en Inglés | MEDLINE | ID: mdl-32728248

RESUMEN

The Encylopedia of DNA Elements (ENCODE) Project launched in 2003 with the long-term goal of developing a comprehensive map of functional elements in the human genome. These included genes, biochemical regions associated with gene regulation (for example, transcription factor binding sites, open chromatin, and histone marks) and transcript isoforms. The marks serve as sites for candidate cis-regulatory elements (cCREs) that may serve functional roles in regulating gene expression1. The project has been extended to model organisms, particularly the mouse. In the third phase of ENCODE, nearly a million and more than 300,000 cCRE annotations have been generated for human and mouse, respectively, and these have provided a valuable resource for the scientific community.


Asunto(s)
Bases de Datos Genéticas , Genoma/genética , Genómica , Anotación de Secuencia Molecular , Animales , Sitios de Unión , Cromatina/genética , Cromatina/metabolismo , Metilación de ADN , Bases de Datos Genéticas/normas , Bases de Datos Genéticas/tendencias , Regulación de la Expresión Génica/genética , Genoma Humano/genética , Genómica/normas , Genómica/tendencias , Histonas/metabolismo , Humanos , Ratones , Anotación de Secuencia Molecular/normas , Control de Calidad , Secuencias Reguladoras de Ácidos Nucleicos/genética , Factores de Transcripción/metabolismo
13.
Sci Rep ; 10(1): 7933, 2020 05 13.
Artículo en Inglés | MEDLINE | ID: mdl-32404971

RESUMEN

ChIP-seq is one of the core experimental resources available to understand genome-wide epigenetic interactions and identify the functional elements associated with diseases. The analysis of ChIP-seq data is important but poses a difficult computational challenge, due to the presence of irregular noise and bias on various levels. Although many peak-calling methods have been developed, the current computational tools still require, in some cases, human manual inspection using data visualization. However, the huge volumes of ChIP-seq data make it almost impossible for human researchers to manually uncover all the peaks. Recently developed convolutional neural networks (CNN), which are capable of achieving human-like classification accuracy, can be applied to this challenging problem. In this study, we design a novel supervised learning approach for identifying ChIP-seq peaks using CNNs, and integrate it into a software pipeline called CNN-Peaks. We use data labeled by human researchers who annotate the presence or absence of peaks in some genomic segments, as training data for our model. The trained model is then applied to predict peaks in previously unseen genomic segments from multiple ChIP-seq datasets including benchmark datasets commonly used for validation of peak calling methods. We observe a performance superior to that of previous methods.


Asunto(s)
Secuenciación de Inmunoprecipitación de Cromatina , Biología Computacional/métodos , Redes Neurales de la Computación , Programas Informáticos , Algoritmos , Sitios de Unión , Secuenciación de Inmunoprecipitación de Cromatina/métodos , Bases de Datos de Ácidos Nucleicos , Epigénesis Genética , Epigenómica/métodos , Histonas/metabolismo , Humanos , Motivos de Nucleótidos , Unión Proteica , Sitio de Iniciación de la Transcripción
14.
Database (Oxford) ; 20202020 01 01.
Artículo en Inglés | MEDLINE | ID: mdl-32128557

RESUMEN

The identification and accurate quantitation of protein abundance has been a major objective of proteomics research. Abundance studies have the potential to provide users with data that can be used to gain a deeper understanding of protein function and regulation and can also help identify cellular pathways and modules that operate under various environmental stress conditions. One of the central missions of the Saccharomyces Genome Database (SGD; https://www.yeastgenome.org) is to work with researchers to identify and incorporate datasets of interest to the wider scientific community, thereby enabling hypothesis-driven research. A large number of studies have detailed efforts to generate proteome-wide abundance data, but deeper analyses of these data have been hampered by the inability to compare results between studies. Recently, a unified protein abundance dataset was generated through the evaluation of more than 20 abundance datasets, which were normalized and converted to common measurement units, in this case molecules per cell. We have incorporated these normalized protein abundance data and associated metadata into the SGD database, as well as the SGD YeastMine data warehouse, resulting in the addition of 56 487 values for untreated cells grown in either rich or defined media and 28 335 values for cells treated with environmental stressors. Abundance data for protein-coding genes are displayed in a sortable, filterable table on Protein pages, available through Locus Summary pages. A median abundance value was incorporated, and a median absolute deviation was calculated for each protein-coding gene and incorporated into SGD. These values are displayed in the Protein section of the Locus Summary page. The inclusion of these data has enhanced the quality and quantity of protein experimental information presented at SGD and provides opportunities for researchers to access and utilize the data to further their research.


Asunto(s)
Genoma Fúngico/genética , Proteínas de Saccharomyces cerevisiae/genética , Saccharomyces cerevisiae/genética , Bases de Datos Genéticas , Genómica/métodos , Internet , Proteoma/genética , Proteoma/metabolismo , Proteómica/métodos , Saccharomyces cerevisiae/metabolismo , Proteínas de Saccharomyces cerevisiae/metabolismo , Interfaz Usuario-Computador
15.
Nucleic Acids Res ; 48(D1): D743-D748, 2020 01 08.
Artículo en Inglés | MEDLINE | ID: mdl-31612944

RESUMEN

The Saccharomyces Genome Database (SGD; www.yeastgenome.org) maintains the official annotation of all genes in the Saccharomyces cerevisiae reference genome and aims to elucidate the function of these genes and their products by integrating manually curated experimental data. Technological advances have allowed researchers to profile RNA expression and identify transcripts at high resolution. These data can be configured in web-based genome browser applications for display to the general public. Accordingly, SGD has incorporated published transcript isoform data in our instance of JBrowse, a genome visualization platform. This resource will help clarify S. cerevisiae biological processes by furthering studies of transcriptional regulation, untranslated regions, genome engineering, and expression quantification in S. cerevisiae.


Asunto(s)
Genoma Fúngico , Proteínas de Saccharomyces cerevisiae/genética , Saccharomyces cerevisiae/genética , Transcriptoma , Biología Computacional/métodos , Bases de Datos Genéticas , Genómica , Anotación de Secuencia Molecular , Sistemas de Lectura Abierta , Isoformas de Proteínas , RNA-Seq , Valores de Referencia , Interfaz Usuario-Computador , Navegador Web
16.
Nucleic Acids Res ; 48(D1): D882-D889, 2020 01 08.
Artículo en Inglés | MEDLINE | ID: mdl-31713622

RESUMEN

The Encyclopedia of DNA Elements (ENCODE) is an ongoing collaborative research project aimed at identifying all the functional elements in the human and mouse genomes. Data generated by the ENCODE consortium are freely accessible at the ENCODE portal (https://www.encodeproject.org/), which is developed and maintained by the ENCODE Data Coordinating Center (DCC). Since the initial portal release in 2013, the ENCODE DCC has updated the portal to make ENCODE data more findable, accessible, interoperable and reusable. Here, we report on recent updates, including new ENCODE data and assays, ENCODE uniform data processing pipelines, new visualization tools, a dataset cart feature, unrestricted public access to ENCODE data on the cloud (Amazon Web Services open data registry, https://registry.opendata.aws/encode-project/) and more comprehensive tutorials and documentation.


Asunto(s)
ADN/genética , Bases de Datos Genéticas , Genoma Humano , Programas Informáticos , Animales , Genómica , Humanos , Ratones
17.
Curr Protoc Bioinformatics ; 68(1): e89, 2019 12.
Artículo en Inglés | MEDLINE | ID: mdl-31751002

RESUMEN

The Encyclopedia of DNA Elements (ENCODE) web portal hosts genomic data generated by the ENCODE Consortium, Genomics of Gene Regulation, The NIH Roadmap Epigenomics Consortium, and the modENCODE and modERN projects. The goal of the ENCODE project is to build a comprehensive map of the functional elements of the human and mouse genomes. Currently, the portal database stores over 500 TB of raw and processed data from over 15,000 experiments spanning assays that measure gene expression, DNA accessibility, DNA and RNA binding, DNA methylation, and 3D chromatin structure across numerous cell lines, tissue types, and differentiation states with selected genetic and molecular perturbations. The ENCODE portal provides unrestricted access to the aforementioned data and relevant metadata as a service to the scientific community. The metadata model captures the details of the experiments, raw and processed data files, and processing pipelines in human and machine-readable form and enables the user to search for specific data either using a web browser or programmatically via REST API. Furthermore, ENCODE data can be freely visualized or downloaded for additional analyses. © 2019 The Authors. Basic Protocol: Query the portal Support Protocol 1: Batch downloading Support Protocol 2: Using the cart to download files Support Protocol 3: Visualize data Alternate Protocol: Query building and programmatic access.


Asunto(s)
Cromatina/metabolismo , ADN/genética , Bases de Datos Genéticas , Epigenómica/métodos , Animales , Metilación de ADN , Genoma Humano , Humanos , Internet , Metadatos , Ratones , Programas Informáticos
18.
PLoS One ; 14(8): e0221858, 2019.
Artículo en Inglés | MEDLINE | ID: mdl-31454399

RESUMEN

BACKGROUND: Genomic data have become major resources to understand complex mechanisms at fine-scale temporal and spatial resolution in functional and evolutionary genetic studies, including human diseases, such as cancers. Recently, a large number of whole genomes of evolving populations of yeast (Saccharomyces cerevisiae W303 strain) were sequenced in a time-dependent manner to identify temporal evolutionary patterns. For this type of study, a chromosome-level sequence assembly of the strain or population at time zero is required to compare with the genomes derived later. However, there is no fully automated computational approach in experimental evolution studies to establish the chromosome-level genome assembly using unique features of sequencing data. METHODS AND RESULTS: In this study, we developed a new software pipeline, the integrative meta-assembly pipeline (IMAP), to build chromosome-level genome sequence assemblies by generating and combining multiple initial assemblies using three de novo assemblers from short-read sequencing data. We significantly improved the continuity and accuracy of the genome assembly using a large collection of sequencing data and hybrid assembly approaches. We validated our pipeline by generating chromosome-level assemblies of yeast strains W303 and SK1, and compared our results with assemblies built using long-read sequencing and various assembly evaluation metrics. We also constructed chromosome-level sequence assemblies of S. cerevisiae strain Sigma1278b, and three commonly used fungal strains: Aspergillus nidulans A713, Neurospora crassa 73, and Thielavia terrestris CBS 492.74, for which long-read sequencing data are not yet available. Finally, we examined the effect of IMAP parameters, such as reference and resolution, on the quality of the final assembly of the yeast strains W303 and SK1. CONCLUSIONS: We developed a cost-effective pipeline to generate chromosome-level sequence assemblies using only short-read sequencing data. Our pipeline combines the strengths of reference-guided and meta-assembly approaches. Our pipeline is available online at http://github.com/jkimlab/IMAP including a Docker image, as well as a Perl script, to help users install the IMAP package, including several prerequisite programs. Users can use IMAP to easily build the chromosome-level assembly for the genome of their interest.


Asunto(s)
Análisis de Secuencia de ADN , Programas Informáticos , Cromosomas Fúngicos , Genoma Fúngico , Anotación de Secuencia Molecular , Sintenía/genética
19.
Database (Oxford) ; 20192019 01 01.
Artículo en Inglés | MEDLINE | ID: mdl-30715277

RESUMEN

Proteins seldom function individually. Instead, they interact with other proteins or nucleic acids to form stable macromolecular complexes that play key roles in important cellular processes and pathways. One of the goals of Saccharomyces Genome Database (SGD; www.yeastgenome.org) is to provide a complete picture of budding yeast biological processes. To this end, we have collaborated with the Molecular Interactions team that provides the Complex Portal database at EMBL-EBI to manually curate the complete yeast complexome. These data, from a total of 589 complexes, were previously available only in SGD's YeastMine data warehouse (yeastmine.yeastgenome.org) and the Complex Portal (www.ebi.ac.uk/complexportal). We have now incorporated these macromolecular complex data into the SGD core database and designed complex-specific reports to make these data easily available to researchers. These web pages contain referenced summaries focused on the composition and function of individual complexes. In addition, detailed information about how subunits interact within the complex, their stoichiometry and the physical structure are displayed when such information is available. Finally, we generate network diagrams displaying subunits and Gene Ontology annotations that are shared between complexes. Information on macromolecular complexes will continue to be updated in collaboration with the Complex Portal team and curated as more data become available.


Asunto(s)
ADN de Hongos , Bases de Datos Genéticas , Proteínas Fúngicas , Genoma Fúngico/genética , Saccharomyces/genética , ADN de Hongos/química , ADN de Hongos/genética , ADN de Hongos/metabolismo , Proteínas Fúngicas/química , Proteínas Fúngicas/genética , Proteínas Fúngicas/metabolismo , Genómica
SELECCIÓN DE REFERENCIAS
DETALLE DE LA BÚSQUEDA