1.
Drug Discov Today ; 19(7): 882-9, 2014 Jul.
Article in English | MEDLINE | ID: mdl-24201223

ABSTRACT

In the Semantic Enrichment of the Scientific Literature (SESL) project, researchers from academia and from life science and publishing companies collaborated in a pre-competitive way to integrate and share information for type 2 diabetes mellitus (T2DM) in adults. This case study demonstrates the benefits of semantic interoperability after integrating the scientific literature with biomedical data resources, such as UniProt Knowledgebase (UniProtKB) and the Gene Expression Atlas (GXA). We annotated scientific documents in a standardized way, by applying public terminological resources for diseases and proteins, and other text-mining approaches. Finally, we compared the genetic causes of T2DM across the data resources to demonstrate the benefits of the SESL triple store. Our solution enables publishers to distribute their content with little overhead into remote data infrastructures, such as any Virtual Knowledge Broker.


Subject(s)
Biomedical Research/methods; Data Mining/methods; Diabetes Mellitus, Type 2/genetics; Semantics; Systems Integration; Animals; Diabetes Mellitus, Type 2/diagnosis; Humans; Knowledge Bases
2.
Drug Discov Today ; 18(9-10): 428-34, 2013 May.
Article in English | MEDLINE | ID: mdl-23247259

ABSTRACT

Research in the life sciences requires ready access to primary data, derived information and relevant knowledge from a multitude of sources. Integration and interoperability of such resources are crucial for sharing content across research domains relevant to the life sciences. In this article we present a perspective review of data integration, with emphasis on a semantics-driven approach that pushes content into a shared infrastructure, reduces data redundancy and clarifies any inconsistencies. This enables much improved access to life science data from numerous primary sources. The Semantic Enrichment of the Scientific Literature (SESL) pilot project demonstrates the feasibility of using already available open semantic web standards and technologies to integrate public and proprietary data resources, which span structured and unstructured content. This has been accomplished through a precompetitive consortium, which provides a cost-effective approach for numerous stakeholders to work together to solve common problems.


Subject(s)
Data Collection; Information Dissemination; Information Storage and Retrieval; Systems Integration; Biological Science Disciplines; Humans; Internet
3.
Bioinformatics ; 28(2): 254-60, 2012 Jan 15.
Article in English | MEDLINE | ID: mdl-22135416

ABSTRACT

MOTIVATION: The scientific literature contains a wealth of information about biological systems. Manual curation lacks the scalability to extract this information due to the ever-increasing numbers of papers being published. The development and application of text mining technologies has been proposed as a way of dealing with this problem. However, the inter-species ambiguity of the genomic nomenclature makes mapping of gene mentions identified in text to their corresponding Entrez gene identifiers an extremely difficult task. We propose a novel method, which transforms a MEDLINE record into a mixture of adjacency matrices; by performing a random walk over the resulting graph, we can perform multi-class supervised classification allowing the assignment of taxonomy identifiers to individual gene mentions. The ability to achieve good performance at this task has a direct impact on the performance of normalizing gene mentions to Entrez gene identifiers. Such graph mixtures add flexibility and allow us to generate probabilistic classification schemes that naturally reflect the uncertainties inherent, even in literature-derived data. RESULTS: Our method performs well in terms of both micro- and macro-averaged performance, achieving micro-F1 of 0.76 and macro-F1 of 0.36 on the publicly available DECA corpus. Re-curation of the DECA corpus was performed, with our method achieving 0.88 micro-F1 and 0.51 macro-F1. Our method improves over standard classification techniques [such as support vector machines (SVMs)] in a number of ways: flexibility, interpretability and its resistance to the effects of class bias in the training data. Good performance is achieved without the need for computationally expensive parse tree generation or 'bag of words classification'.


Subject(s)
Data Mining; Genes; Terminology as Topic; Animals; Humans; MEDLINE; Software; Species Specificity; Support Vector Machine; United States
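
The abstract above reports micro-averaged and macro-averaged F1 scores that differ widely (0.76 versus 0.36). The sketch below illustrates how the two averages can diverge when one class dominates. It is not the authors' evaluation code. The function names and the toy labels are assumptions made for illustration, and the gold and predicted lists are invented data, not taken from the DECA corpus. The labels are NCBI taxonomy identifiers used as class names.

```python
from collections import Counter


def f1(tp, fp, fn):
    """F1 score from true positives, false positives and false negatives."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(gold, predicted):
    """Return (micro-F1, macro-F1) for single-label multi-class predictions."""
    labels = sorted(set(gold) | set(predicted))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[g] += 1  # gold class gains a false negative
    # Micro: pool the counts over all classes, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: compute F1 per class, then take the unweighted mean.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    return micro, macro


# Invented toy data: eight human (9606) mentions, one mouse (10090),
# one fruit fly (7227); the classifier always predicts human.
gold = ["9606"] * 8 + ["10090", "7227"]
predicted = ["9606"] * 10

micro, macro = micro_macro_f1(gold, predicted)
print(f"micro-F1 = {micro:.2f}, macro-F1 = {macro:.2f}")
```

Because the macro average weights every class equally, the two rare species that are never predicted pull it down sharply, while the micro average is dominated by the frequent class.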
4.
Drug Discov Today ; 16(21-22): 940-7, 2011 Nov.
Article in English | MEDLINE | ID: mdl-21963522

ABSTRACT

The life science industries (including pharmaceuticals, agrochemicals and consumer goods) are exploring new business models for research and development that focus on external partnerships. In parallel, there is a desire to make better use of data obtained from sources such as human clinical samples to inform and support early research programmes. Success in both areas depends upon the successful integration of heterogeneous data from multiple providers and scientific domains, something that is already a major challenge within the industry. This issue is exacerbated by the absence of agreed standards that unambiguously identify the entities, processes and observations within experimental results. In this article we highlight the risks to future productivity that are associated with incomplete biological and chemical vocabularies and suggest a new model to address this long-standing issue.


Subject(s)
Biomedical Research/methods; Drug Discovery/methods; Drug Industry/standards; Terminology as Topic; Biomedical Research/standards; Cooperative Behavior; Databases, Factual; Humans; Vocabulary
5.
Hum Genomics ; 5(1): 17-29, 2010 Oct.
Article in English | MEDLINE | ID: mdl-21106487

ABSTRACT

Keeping up with the rapidly growing literature has become virtually impossible for most scientists. This can have dire consequences. First, we may waste research time and resources on reinventing the wheel simply because we can no longer maintain a reliable grasp on the published literature. Second, and perhaps more detrimental, judicious (or serendipitous) combination of knowledge from different scientific disciplines, which would require following disparate and distinct research literatures, is rapidly becoming impossible for even the most ardent readers of research publications. Text mining, the automated extraction of information from (electronically) published sources, could potentially fulfil an important role, but only if we know how to harness its strengths and overcome its weaknesses. As we do not expect that the rate at which scientific results are published will decrease, text mining tools are now becoming essential in order to cope with, and derive maximum benefit from, this information explosion. In genomics, this is particularly pressing as more and more rare disease-causing variants are found and need to be understood. Not being conversant with this technology may put scientists and biomedical regulators at a severe disadvantage. In this review, we introduce the basic concepts underlying modern text mining and its applications in genomics and systems biology. We hope that this review will serve three purposes: (i) to provide a timely and useful overview of the current status of this field, including a survey of present challenges; (ii) to enable researchers to decide how and when to apply text mining tools in their own research; and (iii) to highlight how the research communities in genomics and systems biology can help to make text mining from biomedical abstracts and texts more straightforward.


Subject(s)
Data Mining/methods; Genomics/methods; Periodicals as Topic; Systems Biology/methods; Publication Bias; Terminology as Topic
6.
Am J Hum Genet ; 81(6): 1119-32, 2007 Dec.
Article in English | MEDLINE | ID: mdl-17999355

ABSTRACT

We have conducted a multistage genomewide association study, using 1,620,742 single-nucleotide polymorphisms to systematically investigate the genetic factors influencing intrinsic skin pigmentation in a population of South Asian descent. Polymorphisms in three genes (SLC24A5, TYR, and SLC45A2) yielded highly significant replicated associations with skin-reflectance measurements, an indirect measure of melanin content in the skin. The associations detected in these three genes, in an additive manner, collectively account for a large fraction of the natural variation of skin pigmentation in a South Asian population. Our study is the first to interrogate polymorphisms across the genome, to find genetic determinants of the natural variation of skin pigmentation within a human population.


Subject(s)
Antigens, Neoplasm/genetics; Antiporters/genetics; Genome, Human; Melanins/analysis; Membrane Transport Proteins/genetics; Polymorphism, Single Nucleotide; Skin Physiological Phenomena; Skin Pigmentation/genetics; Bangladesh; Gene Frequency; Humans; India; Pakistan; Phenotype; Sri Lanka