Results 1 - 5 of 5
1.
J Biomed Inform ; 94: 103202, 2019 06.
Article in English | MEDLINE | ID: mdl-31075531

ABSTRACT

CONTEXT: Citation screening (also called study selection) is a phase of the systematic review process that has attracted growing interest in the use of text mining (TM) methods to reduce time and effort. Search results are usually imbalanced between the relevant and irrelevant classes of returned citations. Class imbalance, among other factors, has been a persistent problem that impairs the performance of TM models, particularly in automatic citation screening for systematic reviews. As a result, classification models that use only the basic title and abstract data often fall short of expectations. OBJECTIVE: In this study, we explore the effects of using full bibliography data in addition to title and abstract on text classification performance for automatic citation screening. METHODS: We experiment with binary and Word2vec feature representations and SVM models using 4 software engineering (SE) and 15 medical review datasets. We build and compare 3 types of models (binary-non-linear, Word2vec-linear and Word2vec-non-linear kernels) with each dataset using the two feature sets. RESULTS: The bibliography-enriched data exhibited consistently improved performance in terms of recall, work saved over sampling (WSS) and Matthews correlation coefficient (MCC) in 3 of the 4 SE datasets that are fairly large in size. For the medical datasets, the results vary; however, in the majority of cases the performance is the same or better. CONCLUSION: Inclusion of the bibliography data has the potential to improve the performance of the models, but to date the results are inconclusive.
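
The three evaluation measures named in the results can be computed from a confusion matrix. The following is a minimal sketch, not the authors' code: it assumes the definition of WSS commonly used in the citation-screening literature, WSS = (TN + FN) / N - (1 - recall), and the function name `screening_metrics` and the example counts are hypothetical.

```python
import math


def screening_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Recall, WSS and MCC from confusion-matrix counts.

    "Positive" means a citation labelled relevant (to be included).
    """
    n = tp + fp + tn + fn
    # Recall: share of truly relevant citations that the model retrieved.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # WSS (assumed definition): fraction of citations the reviewer is spared
    # from reading, minus the fraction of relevant citations that were missed.
    wss = (tn + fn) / n - (1.0 - recall) if n else 0.0
    # MCC: correlation between predicted and true labels, robust to imbalance.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"recall": recall, "wss": wss, "mcc": mcc}


if __name__ == "__main__":
    # Hypothetical imbalanced result set: 50 relevant citations out of 1000.
    print(screening_metrics(tp=45, fp=55, tn=895, fn=5))
```

MCC is reported alongside recall because, on heavily imbalanced data such as these search results, a model can reach high recall simply by labelling most citations as relevant; MCC penalises that behaviour.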


Subject(s)
Bibliographies as Topic , Data Mining/methods , Automation , Computational Biology/methods , Models, Theoretical
2.
J Biomed Inform ; 73: 1-13, 2017 09.
Article in English | MEDLINE | ID: mdl-28711679

ABSTRACT

CONTEXT: Independent validation of published scientific results through study replication is a pre-condition for accepting the validity of such results. In computational research, full replication is often unrealistic for independent validation of results; therefore, study reproduction has been justified as the minimum acceptable standard for evaluating the validity of scientific claims. The application of text mining techniques to citation screening in the context of systematic literature reviews is a relatively young and growing computational field with high relevance for software engineering, medical research and other fields. However, there is little work so far on reproduction studies in the field. OBJECTIVE: In this paper, we investigate the reproducibility of studies in this area based on information contained in published articles, and we propose reporting guidelines that could improve reproducibility. METHODS: The study was approached in two ways. Initially, we attempted to reproduce results from six studies that were based on the same raw dataset. Then, based on this experience, we identified steps considered essential to successful reproduction of text mining experiments and characterized them to measure how reproducible a study is given the information provided on these steps. In total, 33 articles were systematically assessed for reproducibility using this approach. RESULTS: Our work revealed that it is currently difficult, if not impossible, to independently reproduce the results published in any of the studies investigated. The lack of information about the datasets used limits the reproducibility of about 80% of the studies assessed. Also, information about the machine learning algorithms is inadequate in about 27% of the papers. On the plus side, the third-party software tools used are mostly free and available. CONCLUSIONS: The reproducibility potential of most of the studies can be significantly improved if more attention is paid to the information provided on the datasets used, how they were partitioned and utilized, and how any randomization was controlled. We introduce a checklist of information that needs to be provided in order to ensure that a published study can be reproduced.
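
One way to make dataset partitioning and randomization reportable, as the conclusions recommend, is to derive every split from an explicitly recorded seed. The sketch below is an illustration under that assumption, not the checklist from the paper; the function name `stratified_split` and its parameters are hypothetical.

```python
import random


def stratified_split(labels, test_fraction=0.2, seed=42):
    """Split item indices into train/test sets, preserving class proportions.

    The seed is an explicit argument, so reporting (seed, test_fraction)
    together with the dataset is enough to regenerate the same partition.
    """
    rng = random.Random(seed)  # local generator: no hidden global state
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    train, test = [], []
    for label in sorted(by_class):  # fixed class order keeps runs identical
        indices = by_class[label]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


if __name__ == "__main__":
    # Hypothetical imbalanced dataset: 10 relevant, 90 irrelevant citations.
    labels = [1] * 10 + [0] * 90
    train_idx, test_idx = stratified_split(labels, test_fraction=0.2, seed=7)
    print(len(train_idx), len(test_idx))
```

Using a local `random.Random` instance rather than the module-level functions keeps the split independent of any other random calls made elsewhere in an experiment.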


Subject(s)
Checklist , Data Mining , Review Literature as Topic , Biomedical Research , Humans , Publications , Reproducibility of Results
3.
Int J Med Inform ; 76(2-3): 137-44, 2007.
Article in English | MEDLINE | ID: mdl-17010664

ABSTRACT

With the continued expansion of electronic patient record systems ahead of comprehensive evidence, metrics, or future-proofing, health informatics in Europe and beyond is embarking on a faith-driven adventure that also risks data swamping of end-users. An alternative approach is an information broker system, drawing from departmental data sources. A 3-year study in health and social care has produced a first demonstrator which can search for specified information in heterogeneous distributed data stores, copy it where source-specific permission allows, and then merge the search results into one integrated picture in a real-time process which is also captured in an audit system. The research project has addressed a number of issues during the study, including updating the concepts of role-based access, semantic interoperability, and harnessing web-based services bound at the time of need. A demonstrator now exists, and provides a platform for further application and development research. This paper summarises how this opens up a viable alternative approach for the next generation of health record systems, enabling record searching and integration as and when it is needed for specific patient-related purposes, whilst being independent of organisations, diagnostic approaches, or service delivery structures, and reducing the risks of data swamping.
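
The broker flow described in this abstract (search several sources, apply source-specific permission, merge the results, record an audit trail) can be illustrated with a small in-memory sketch. This is not the demonstrator's design: the classes `DataSource` and `Broker`, the role names, and the record fields are all hypothetical, and real sources would be remote services rather than local functions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable


@dataclass
class DataSource:
    """A departmental data store exposed through a search function."""
    name: str
    allowed_roles: set                 # roles this source permits to copy data
    search: Callable[[str], list]      # patient_id -> list of record dicts


@dataclass
class Broker:
    sources: list
    audit_log: list = field(default_factory=list)

    def query(self, user: str, role: str, patient_id: str) -> list:
        """Query every source, honouring per-source permission, and merge."""
        merged = []
        for source in self.sources:
            permitted = role in source.allowed_roles
            # Every attempt is audited, whether or not access was granted.
            self.audit_log.append({
                "time": datetime.now(timezone.utc).isoformat(),
                "user": user,
                "role": role,
                "source": source.name,
                "patient_id": patient_id,
                "permitted": permitted,
            })
            if not permitted:
                continue
            for record in source.search(patient_id):
                # Tag each record with its origin in the integrated view.
                merged.append({**record, "source": source.name})
        return merged


if __name__ == "__main__":
    gp = DataSource("gp_records", {"nurse", "doctor"},
                    lambda pid: [{"patient_id": pid, "note": "annual check"}])
    social = DataSource("social_care", {"social_worker"},
                        lambda pid: [{"patient_id": pid, "note": "home visit"}])
    broker = Broker([gp, social])
    print(broker.query(user="alice", role="nurse", patient_id="p1"))
    print(len(broker.audit_log))
```

Keeping the permission check at the source level, rather than in one central table, mirrors the "source-specific permission" wording of the abstract: each data owner decides which roles may copy its data.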


Subject(s)
Information Management/methods , Internet , Medical Records Systems, Computerized/organization & administration , Software , Access to Information , Efficiency, Organizational , Humans , Information Storage and Retrieval , Medical Informatics Applications
4.
Stud Health Technol Inform ; 112: 3-16, 2005.
Article in English | MEDLINE | ID: mdl-15923711

ABSTRACT

We describe a prototype information broker that has been developed to address typical healthcare information needs, using web services to obtain data from autonomous, heterogeneous sources. Some key features are reviewed: how data sources are turned into data services; how we enforce a distributed access control policy; and how semantic interoperability is achieved between the broker and its data services. Finally, we discuss the role that such a broker might have in a Grid context, as well as the limitations this reveals in current Grid provision.


Subject(s)
Information Management/methods , Information Services , Information Systems/organization & administration , Computer Systems , Humans , Information Management/instrumentation , Information Storage and Retrieval , United Kingdom , User-Computer Interface , Vocabulary, Controlled
5.
Stud Health Technol Inform ; 116: 905-10, 2005.
Article in English | MEDLINE | ID: mdl-16160373

ABSTRACT

With the continued expansion of Electronic Patient Record systems ahead of comprehensive evidence, metrics, or future-proofing, European health informatics is embarking on a faith-driven adventure that also risks data swamping of end-users. An alternative approach is an information broker system, drawing from departmental data sources. A three-year study in health and social care has produced a first demonstrator which can search for specified information in heterogeneous distributed data stores, copy it where source-specific permission allows, and then merge the search results in a real-time process.


Subject(s)
Electronic Health Records , Information Storage and Retrieval , Humans , Medical Records Systems, Computerized