1.
J Biomed Inform ; 118: 103779, 2021 Jun.
Article in English | MEDLINE | ID: mdl-33839304

ABSTRACT

The automatic recognition of gene names and their corresponding database identifiers in biomedical text is an important first step for many downstream text-mining applications. While current methods for tagging gene entities have been developed for biomedical literature, their performance on species other than human is substantially lower due to the lack of annotation data. We therefore present the NLM-Gene corpus, a high-quality manually annotated corpus for genes developed at the US National Library of Medicine (NLM), covering ambiguous gene names, with an average of 29 gene mentions (10 unique identifiers) per document, and a broader representation of different species (including Homo sapiens, Mus musculus, Rattus norvegicus, Drosophila melanogaster, Arabidopsis thaliana, and Danio rerio) when compared to previous gene annotation corpora. NLM-Gene consists of 550 PubMed abstracts from 156 biomedical journals, doubly annotated by six experienced NLM indexers, randomly paired for each document to control for bias. The annotators worked in three annotation rounds until they reached complete agreement. This gold-standard corpus can serve as a benchmark to develop and test new gene text-mining algorithms. Using this new resource, we have developed a new gene-finding algorithm based on deep learning that improves on both the precision and the recall of existing tools. The NLM-Gene annotated corpus is freely available at ftp://ftp.ncbi.nlm.nih.gov/pub/lu/NLMGene. We have also applied this tool to the entire PubMed/PMC, with the results freely accessible through our web-based tool PubTator (www.ncbi.nlm.nih.gov/research/pubtator).
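
As a rough illustration of what "precision and recall" mean for a gene tagger scored against a gold-standard corpus, the sketch below compares predicted and gold (document, gene identifier) pairs. The function name, the pair representation, and the sample identifiers are placeholder assumptions for illustration only; this is not the evaluation protocol described by the authors.

```python
def precision_recall_f1(gold, predicted):
    """Score predicted (doc_id, gene_id) pairs against gold-standard pairs."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    # Guard against empty inputs to avoid division by zero.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical annotations: document labels and identifiers are placeholders.
gold_pairs = {("doc1", "1001"), ("doc1", "1002"), ("doc2", "2001")}
predicted_pairs = {("doc1", "1001"), ("doc2", "2001"), ("doc2", "2002")}

precision, recall, f1 = precision_recall_f1(gold_pairs, predicted_pairs)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Scoring at the level of unique identifiers per document, rather than individual mentions, is one possible choice; a mention-level score would compare text spans instead.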


Subject(s)
Drosophila melanogaster; Genes, vif; Animals; Data Mining; National Library of Medicine (U.S.); PubMed; Rats; United States
2.
Database (Oxford) ; 2024, 2024 Aug 09.
Article in English | MEDLINE | ID: mdl-39126204

ABSTRACT

The automatic recognition of biomedical relationships is an important step in the semantic understanding of the information contained in the unstructured text of the published literature. The BioRED track at BioCreative VIII aimed to foster the development of such methods by providing the participants the BioRED-BC8 corpus, a collection of 1000 PubMed documents manually curated for diseases, genes/proteins, chemicals, cell lines, gene variants, and species, as well as pairwise relationships between them: disease-gene, chemical-gene, disease-variant, gene-gene, chemical-disease, chemical-chemical, chemical-variant, and variant-variant. Furthermore, relationships are categorized into the following semantic categories: positive correlation, negative correlation, binding, conversion, drug interaction, comparison, cotreatment, and association. Unlike most of the previous publicly available corpora, all relationships are expressed at the document level as opposed to the sentence level, and as such, the entities are normalized to the corresponding concept identifiers of the standardized vocabularies, namely, diseases and chemicals are normalized to MeSH, genes (and proteins) to National Center for Biotechnology Information (NCBI) Gene, species to NCBI Taxonomy, cell lines to Cellosaurus, and gene/protein variants to Single Nucleotide Polymorphism Database. Finally, each annotated relationship is categorized as 'novel' or not, depending on whether it is a novel finding or experimental verification in the publication in which it is expressed. This distinction helps differentiate novel findings from other relationships in the same text that provide known facts and/or background knowledge. The BioRED-BC8 corpus uses the previous BioRED corpus of 600 PubMed articles as the training dataset and includes a set of 400 newly published articles to serve as the test data for the challenge.
All test articles were manually annotated for the BioCreative VIII challenge by expert biocurators at the National Library of Medicine, using the original annotation guidelines, where each article is doubly annotated in a three-round annotation process until full agreement is reached between all curators. This manuscript details the characteristics of the BioRED-BC8 corpus as a critical resource for biomedical named entity recognition and relation extraction. Using this new resource, we have demonstrated advancements in biomedical text-mining algorithm development. Database URL: https://codalab.lisn.upsaclay.fr/competitions/16381.
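
The annotation scheme described in this abstract (entity-type pairs, semantic categories, a novelty flag, document-level scope) can be pictured as a small data model. The sketch below is only an illustration built from the lists in the abstract; the class name, field names, and example identifiers are hypothetical and do not reflect the corpus's actual file format.

```python
from dataclasses import dataclass

# Entity-type pairs that may hold a relationship, as listed in the abstract.
# frozenset makes each pair unordered; ("gene", "gene") collapses to {"gene"}.
RELATION_PAIRS = {
    frozenset(pair)
    for pair in [
        ("disease", "gene"), ("chemical", "gene"), ("disease", "variant"),
        ("gene", "gene"), ("chemical", "disease"), ("chemical", "chemical"),
        ("chemical", "variant"), ("variant", "variant"),
    ]
}

# Semantic categories listed in the abstract.
RELATION_CATEGORIES = {
    "positive correlation", "negative correlation", "binding", "conversion",
    "drug interaction", "comparison", "cotreatment", "association",
}


@dataclass(frozen=True)
class Relation:
    """A document-level relationship between two normalized entities."""
    doc_id: str        # document the relation is asserted in (hypothetical field)
    type1: str         # entity type of the first concept
    concept1: str      # normalized concept identifier (placeholder values below)
    type2: str
    concept2: str
    category: str      # one of RELATION_CATEGORIES
    novel: bool        # True if the relation is a novel finding in this document


def is_valid(relation: Relation) -> bool:
    """Check the type pair and category against the lists in the abstract."""
    pair = frozenset((relation.type1, relation.type2))
    return pair in RELATION_PAIRS and relation.category in RELATION_CATEGORIES


# Placeholder identifiers, not real annotations.
example = Relation("doc1", "chemical", "C-0001", "disease", "D-0001",
                   "association", novel=True)
print(is_valid(example))
```

Because relations are document-level, the model stores concept identifiers rather than text offsets; a sentence-level scheme would instead need mention spans.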


Subject(s)
Data Curation; Humans; Data Curation/methods; Data Mining/methods; Semantics; PubMed
3.
Database (Oxford) ; 2022, 2022 Dec 01.
Article in English | MEDLINE | ID: mdl-36458799

ABSTRACT

The automatic recognition of chemical names and their corresponding database identifiers in biomedical text is an important first step for many downstream text-mining applications. The task is even more challenging when considering the identification of these entities in the article's full text and, furthermore, the identification of candidate substances for that article's metadata [Medical Subject Heading (MeSH) article indexing]. The National Library of Medicine (NLM)-Chem track at BioCreative VII aimed to foster the development of algorithms that can predict with high quality the chemical entities in the biomedical literature and further identify the chemical substances that are candidates for article indexing. As a result of this challenge, the NLM-Chem track produced two comprehensive, manually curated corpora annotated with chemical entities and indexed with chemical substances: the chemical identification corpus and the chemical indexing corpus. The NLM-Chem BioCreative VII (NLM-Chem-BC7) Chemical Identification corpus consists of 204 full-text PubMed Central (PMC) articles, fully annotated for chemical entities by 12 NLM indexers for both span (i.e. named entity recognition) and normalization (i.e. entity linking) using MeSH. This resource was used for the training and testing of the Chemical Identification task to evaluate the accuracy of algorithms in predicting chemicals mentioned in recently published full-text articles. The NLM-Chem-BC7 Chemical Indexing corpus consists of 1333 recently published PMC articles, equipped with chemical substance indexing performed manually by experts at the NLM. This resource was used for the evaluation of the Chemical Indexing task, which evaluated the accuracy of algorithms in predicting the chemicals that should be indexed, i.e. appear in the listing of MeSH terms for the document.
This set was further enriched after the challenge in two ways: (i) 11 NLM indexers manually verified each of the candidate terms appearing in the prediction results of the challenge participants, but not in the MeSH indexing, and the chemical indexing terms appearing in the MeSH indexing list, but not in the prediction results, and (ii) the challenge organizers algorithmically merged the chemical entity annotations in the full text for all predicted chemical entities and used a statistical approach to keep those with the highest degree of confidence. As a result, the NLM-Chem-BC7 Chemical Indexing corpus is a gold-standard corpus for chemical indexing of journal articles and a silver-standard corpus for chemical entity identification in full-text journal articles. Together, these resources are currently the most comprehensive resources for chemical entity recognition, and we demonstrate improvements in the chemical entity recognition algorithms. We detail the characteristics of these novel resources and make them available for the community. Database URL: https://ftp.ncbi.nlm.nih.gov/pub/lu/NLM-Chem-BC7-corpus/.
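
Enrichment step (i) above amounts to two set differences per article: terms that participants predicted but that are absent from the MeSH indexing, and terms in the MeSH indexing that no participant predicted. A minimal sketch follows; the function name, dictionary keys, and term identifiers are illustrative assumptions, not the organizers' actual tooling.

```python
def terms_to_verify(indexed_terms, predicted_terms):
    """Split disagreements between MeSH indexing and pooled predictions."""
    indexed, predicted = set(indexed_terms), set(predicted_terms)
    return {
        # Predicted by participants but missing from the MeSH indexing.
        "predicted_not_indexed": predicted - indexed,
        # Present in the MeSH indexing but missed by every prediction.
        "indexed_not_predicted": indexed - predicted,
    }


# Placeholder term identifiers for a single hypothetical article.
indexed = {"T001", "T002", "T003"}
predicted = {"T002", "T003", "T004", "T005"}

queue = terms_to_verify(indexed, predicted)
print(sorted(queue["predicted_not_indexed"]))
print(sorted(queue["indexed_not_predicted"]))
```

Terms on which the indexing and the predictions already agree need no review, which is why only the two difference sets are returned.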


Subject(s)
Algorithms; Data Mining; United States; Humans; National Library of Medicine (U.S.); PubMed; Databases, Factual
4.
Mol Cell Biol ; 23(21): 7554-65, 2003 Nov.
Article in English | MEDLINE | ID: mdl-14560003

ABSTRACT

The role of aggregation of abnormal proteins in cellular toxicity is of general importance for understanding many neurological disorders. Here, using a yeast model, we demonstrate that mutations in many proteins involved in endocytosis and actin function dramatically enhance the toxic effect of polypeptides with an expanded polyglutamine (polyQ) domain. This enhanced cytotoxicity required polyQ aggregation and was dependent on the yeast protein Rnq1 in its prion form. In wild-type cells, expression of expanded polyQ followed by its aggregation led to specific and acute inhibition of endocytosis, which preceded growth inhibition. Some components of the endocytic machinery were efficiently recruited into the polyQ aggregates. Furthermore, in cells with polyQ aggregates, cortical actin patches were delocalized and actin was recruited into the polyQ aggregates. Aggregation of polyQ in mammalian HEK293 cells also led to defects in endocytosis. Therefore, it appears that inhibition of endocytosis is a direct consequence of polyQ aggregation and could significantly contribute to cytotoxicity.


Subject(s)
Endocytosis/physiology; Fungal Proteins/metabolism; Peptides/metabolism; Saccharomyces cerevisiae Proteins; Actins/metabolism; Animals; Cell Fractionation; Cell Line; Cell Survival; Fungal Proteins/genetics; Genes, Fungal; Humans; Lipid Metabolism; Mutation; Prions/metabolism; Protein Structure, Tertiary; Receptors, Transferrin/metabolism; Yeasts/physiology; Yeasts/ultrastructure
5.
Curr Opin Cell Biol ; 20(6): 688-93, 2008 Dec.
Article in English | MEDLINE | ID: mdl-18840522

ABSTRACT

Centrioles play an important role in organizing microtubules and are precisely duplicated once per cell cycle. New (daughter) centrioles typically arise in association with existing (mother) centrioles (canonical assembly), suggesting that mother centrioles direct the formation of daughter centrioles. However, under certain circumstances, centrioles can also self-assemble free of an existing centriole (de novo assembly). Recent work indicates that the canonical and de novo pathways utilize a common mechanism and that a mother centriole spatially constrains the self-assembly process to occur within its immediate vicinity. Other recently identified mechanisms further regulate canonical assembly so that during each cell cycle, one and only one daughter centriole is assembled per mother centriole.


Subject(s)
Centrioles/physiology; Animals; Cell Cycle; Cell Division; Humans; Microtubules/metabolism; Models, Biological
6.
Cell Biochem Biophys ; 41(2): 295-318, 2004.
Article in English | MEDLINE | ID: mdl-15475615

ABSTRACT

Endocytosis is a protein- and lipid-trafficking pathway that occurs in all eukaryotic cells. It involves the internalization of plasma membrane proteins and lipids into the cell and the subsequent degradation of proteins in the lysosome or the recycling of proteins and lipids back to the plasma membrane. Over the past decade, studies in yeast and mammalian cells have revealed endocytosis to be a very complex molecular process that depends on regulated interactions between a variety of proteins and lipids. The Eps15 homology (EH) domain is a conserved, modular protein-interaction domain found in several endocytosis proteins. EH proteins can function as key regulators of endocytosis through their ability to interact with many of the other proteins involved in this process.


Subject(s)
Endocytosis; Gene Expression Regulation; Animals; Biological Transport; Biophysics/methods; Clathrin/chemistry; Humans; Lipids/chemistry; Models, Biological; Protein Binding; Protein Conformation; Protein Structure, Tertiary; Proteins/chemistry; Saccharomyces cerevisiae/metabolism
7.
Traffic ; 5(12): 963-78, 2004 Dec.
Article in English | MEDLINE | ID: mdl-15522098

ABSTRACT

Pan1p is an essential protein of the yeast Saccharomyces cerevisiae that is required for the internalization step of endocytosis and organization of the actin cytoskeleton. Pan1p, which binds several other endocytic proteins, is composed of multiple protein-protein interaction domains including two Eps15 Homology (EH) domains, a coiled-coil domain, an acidic Arp2/3-activating region, and a proline-rich domain. In this study, we have induced high-level expression of various domains of Pan1p in wild-type cells to assess the dominant consequences on viability, endocytosis, and actin organization. We found that the most severe phenotypes, with blocked endocytosis and aggregated actin, required expression of nearly full-length Pan1p, and also required the endocytic regulatory protein kinase Prk1p. The central coiled-coil domain was the smallest fragment whose overexpression caused any dominant effects; these effects were made more pronounced by inclusion of the second EH domain. Co-overexpressing nonoverlapping amino- and carboxy-terminal fragments did not mimic the effects of the intact protein, whereas fragments that overlapped within the coiled-coil region could. Yeast two-hybrid and in vivo coimmunoprecipitation analyses suggest that Pan1p may form dimers or higher order oligomers. Collectively, our data support a view of Pan1p as a dimeric/oligomeric scaffold whose functions require both the amino- and carboxy-termini, linked by the central region.


Subject(s)
Fungal Proteins/physiology; Actins/metabolism; Endocytosis/physiology; Fungal Proteins/genetics; Fungal Proteins/metabolism; Fungal Proteins/toxicity; Genes, Reporter; Microfilament Proteins; Peptide Fragments/genetics; Peptide Fragments/physiology; Protein Kinase C; Protein Serine-Threonine Kinases/metabolism; Protein Structure, Tertiary; Receptor Protein-Tyrosine Kinases/metabolism; Receptors, G-Protein-Coupled/metabolism; Receptors, Mating Factor; Receptors, Pheromone/metabolism; Saccharomyces cerevisiae/physiology; Saccharomyces cerevisiae Proteins/metabolism
8.
Cell ; 111(7): 991-1002, 2002 Dec 27.
Article in English | MEDLINE | ID: mdl-12507426

ABSTRACT

A rapid cascade of regulatory events defines the developmental fates of embryonic cells. However, once established, these developmental fates and the underlying transcriptional programs can be remarkably stable. Here, we describe two proteins, MEP-1 and LET-418/Mi-2, required for maintenance of somatic differentiation in C. elegans. In animals lacking MEP-1 and LET-418, germline-specific genes become derepressed in somatic cells, and Polycomb group (PcG) and SET domain-related proteins promote this ectopic expression. MEP-1 and LET-418 interact in vivo with the germline protein PIE-1. Our findings support a model in which PIE-1 inhibits MEP-1 and associated factors to maintain the pluripotency of germ cells, while at later times MEP-1 and LET-418 remodel chromatin to establish new stage- or cell-type-specific differentiation potential.


Subject(s)
Adenosine Triphosphatases; Autoantigens/metabolism; Caenorhabditis elegans Proteins/metabolism; Caenorhabditis elegans/embryology; Cell Lineage/physiology; DNA Helicases; Embryo, Nonmammalian/embryology; Germ Cells/metabolism; Transcription Factors/metabolism; Animals; Autoantigens/genetics; Caenorhabditis elegans/cytology; Caenorhabditis elegans/metabolism; Caenorhabditis elegans Proteins/genetics; Cell Differentiation/physiology; Embryo, Nonmammalian/cytology; Embryo, Nonmammalian/metabolism; Gene Expression Regulation, Developmental/physiology; Germ Cells/cytology; Histone Deacetylases/genetics; Histone Deacetylases/metabolism; Mi-2 Nucleosome Remodeling and Deacetylase Complex; Nuclear Proteins/genetics; Nuclear Proteins/metabolism; Transcription Factors/genetics