Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 13 de 13
Filtrar
Más filtros












Base de datos
Intervalo de año de publicación
1.
Comput Biol Med ; 171: 108189, 2024 Mar.
Artículo en Inglés | MEDLINE | ID: mdl-38447502

RESUMEN

Recently, Large Language Models (LLMs) have demonstrated impressive capability to solve a wide range of tasks. However, despite their success across various tasks, no prior work has investigated their capability in the biomedical domain yet. To this end, this paper aims to evaluate the performance of LLMs on benchmark biomedical tasks. For this purpose, a comprehensive evaluation of 4 popular LLMs in 6 diverse biomedical tasks across 26 datasets has been conducted. To the best of our knowledge, this is the first work that conducts an extensive evaluation and comparison of various LLMs in the biomedical domain. Interestingly, we find based on our evaluation that in biomedical datasets that have smaller training sets, zero-shot LLMs even outperform the current state-of-the-art models when they were fine-tuned only on the training set of these datasets. This suggests that pre-training on large text corpora makes LLMs quite specialized even in the biomedical domain. We also find that not a single LLM can outperform other LLMs in all tasks, with the performance of different LLMs may vary depending on the task. While their performance is still quite poor in comparison to the biomedical models that were fine-tuned on large training sets, our findings demonstrate that LLMs have the potential to be a valuable tool for various biomedical tasks that lack large annotated data.


Asunto(s)
Benchmarking , Lenguaje , Femenino , Humanos , Útero
2.
Sensors (Basel) ; 22(24)2022 Dec 08.
Artículo en Inglés | MEDLINE | ID: mdl-36559994

RESUMEN

We propose a new generative model named adaptive cycle-consistent generative adversarial network, or Ad CycleGAN to perform image translation between normal and COVID-19 positive chest X-ray images. An independent pre-trained criterion is added to the conventional Cycle GAN architecture to exert adaptive control on image translation. The performance of Ad CycleGAN is compared with the Cycle GAN without the external criterion. The quality of the synthetic images is evaluated by quantitative metrics including Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Peak Signal-to-Noise Ratio (PSNR), Universal Image Quality Index (UIQI), visual information fidelity (VIF), Frechet Inception Distance (FID), and translation accuracy. The experimental results indicate that the synthetic images generated either by the Cycle GAN or by the Ad CycleGAN have lower MSE and RMSE, and higher scores in PSNR, UIQI, and VIF in homogenous image translation (i.e., Y → Y) compared to the heterogenous image translation process (i.e., X → Y). The synthetic images by Ad CycleGAN through the heterogeneous image translation have significantly higher FID score compared to Cycle GAN (p < 0.01). The image translation accuracy of Ad CycleGAN is higher than that of Cycle GAN when normal images are converted to COVID-19 positive images (p < 0.01). Therefore, we conclude that the Ad CycleGAN with the independent criterion can improve the accuracy of GAN image translation. The new architecture has more control on image synthesis and can help address the common class imbalance issue in machine learning methods and artificial intelligence applications with medical images.


Asunto(s)
Inteligencia Artificial , COVID-19 , Humanos , Rayos X , Procesamiento de Imagen Asistido por Computador/métodos , COVID-19/diagnóstico por imagen , Aprendizaje Automático
3.
AMIA Jt Summits Transl Sci Proc ; 2022: 323-330, 2022.
Artículo en Inglés | MEDLINE | ID: mdl-35854731

RESUMEN

We present a cycle-consistent adversarial network (Cycle GAN) with dynamic criterion to synthesize blood cells parasitized by malaria plasmodia. The result shows 100% of the synthetic images are correctly classified by the pretrained classifier compared to 99.61% of the real images, 76.6% generated by the Cycle GAN without the dynamic criterion. The average score of Frechet Inception Distance (FID) of the generated images by the enhanced Cycle GAN is 0.0043 (Std=0.0005), which is significantly lower than the FID score of the variational autoencoder (VAE) model (0.0085 (Std=0.0007)). We conclude that the new Cycle GAN model with dynamic criterion can generate high quality malaria infected blood cell images with good diversity. The new method provides new augmentation technique to enhance the image diversity where the acquisition of well-annotated images is highly restricted, and to improve the robustness of medical image automatic processing by deep neural networks.

4.
Inf Sci (N Y) ; 608: 1557-1571, 2022 Aug.
Artículo en Inglés | MEDLINE | ID: mdl-35855405

RESUMEN

In response to fighting COVID-19 pandemic, researchers in machine learning and artificial intelligence have constructed some medical knowledge graphs (KG) based on existing COVID-19 datasets, however, these KGs contain a considerable amount of semantic relations which are incomplete or missing. In this paper, we focus on the task of knowledge graph embedding (KGE), which serves an important solution to infer the missing relations. In the past, there have been a collection of knowledge graph embedding models with different scoring functions to learn entity and relation embeddings published. However, these models share the same problems of rarely taking important features of KG like attribute features, other than relation triples, into account, while dealing with the heterogeneous, complex and incomplete COVID-19 medical data. To address the above issue, we propose a graph feature collection network (GFCNet) for COVID-19 KGE task, which considers both neighbor and attribute features in KGs. The extensive experiments conducted on the COVID-19 drug KG dataset show promising results and prove the effectiveness and efficiency of our proposed model. In addition, we also explain the future directions of deepening the study on COVID-19 KGE task.

5.
Comput Methods Programs Biomed ; 174: 17-23, 2019 Jun.
Artículo en Inglés | MEDLINE | ID: mdl-29801696

RESUMEN

BACKGROUND: Computer-aided medical decision-making (CAMDM) is the method to utilize massive EMR data as both empirical and evidence support for the decision procedure of healthcare activities. Well-developed information infrastructure, such as hospital information systems and disease surveillance systems, provides abundant data for CAMDM. However, the complexity of EMR data with abstract medical knowledge makes the conventional model incompetent for the analysis. Thus a deep belief networks (DBN) based model is proposed to simulate the information analysis and decision-making procedure in medical practice. The purpose of this paper is to evaluate a deep learning architecture as an effective solution for CAMDM. METHODS: A two-step model is applied in our study. At the first step, an optimized seven-layer deep belief network (DBN) is applied as an unsupervised learning algorithm to perform model training to acquire feature representation. Then a support vector machine model is adopted to DBN at the second step of the supervised learning. There are two data sets used in the experiments. One is a plain text data set indexed by medical experts. The other is a structured dataset on primary hypertension. The data are randomly divided to generate the training set for the unsupervised learning and the testing set for the supervised learning. The model performance is evaluated by the statistics of mean and variance, the average precision and coverage on the data sets. Two conventional shallow models (support vector machine / SVM and decision tree / DT) are applied as the comparisons to show the superiority of our proposed approach. RESULTS: The deep learning (DBN + SVM) model outperforms simple SVM and DT on two data sets in terms of all the evaluation measures, which confirms our motivation that the deep model is good at capturing the key features with less dependence when the index is built up by manpower. CONCLUSIONS: Our study shows the two-step deep learning model achieves high performance for medical information retrieval over the conventional shallow models. It is able to capture the features of both plain text and the highly-structured database of EMR data. The performance of the deep model is superior to the conventional shallow learning models such as SVM and DT. It is an appropriate knowledge-learning model for information retrieval of EMR system. Therefore, deep learning provides a good solution to improve the performance of CAMDM systems.


Asunto(s)
Aprendizaje Profundo , Registros Electrónicos de Salud , Medicina Tradicional China/métodos , Algoritmos , Teorema de Bayes , Toma de Decisiones Clínicas , Simulación por Computador , Humanos , Almacenamiento y Recuperación de la Información , Modelos Estadísticos , Máquina de Vectores de Soporte
6.
BMC Med Genomics ; 9 Suppl 2: 48, 2016 08 10.
Artículo en Inglés | MEDLINE | ID: mdl-27510822

RESUMEN

BACKGROUND: Genomic variations are associated with the metabolism and the occurrence of adverse reactions of many therapeutic agents. The polymorphisms on over 2000 locations of cytochrome P450 enzymes (CYP) due to many factors such as ethnicity, mutations, and inheritance attribute to the diversity of response and side effects of various drugs. The associations of the single nucleotide polymorphisms (SNPs), the internal pharmacokinetic patterns and the vulnerability of specific adverse reactions become one of the research interests of pharmacogenomics. The conventional genomewide association studies (GWAS) mainly focuses on the relation of single or multiple SNPs to a specific risk factors which are a one-to-many relation. However, there are no robust methods to establish a many-to-many network which can combine the direct and indirect associations between multiple SNPs and a serial of events (e.g. adverse reactions, metabolic patterns, prognostic factors etc.). In this paper, we present a novel deep learning model based on generative stochastic networks and hidden Markov chain to classify the observed samples with SNPs on five loci of two genes (CYP2D6 and CYP1A2) respectively to the vulnerable population of 14 types of adverse reactions. METHODS: A supervised deep learning model is proposed in this study. The revised generative stochastic networks (GSN) model with transited by the hidden Markov chain is used. The data of the training set are collected from clinical observation. The training set is composed of 83 observations of blood samples with the genotypes respectively on CYP2D6*2, *10, *14 and CYP1A2*1C, *1 F. The samples are genotyped by the polymerase chain reaction (PCR) method. A hidden Markov chain is used as the transition operator to simulate the probabilistic distribution. The model can perform learning at lower cost compared to the conventional maximal likelihood method because the transition distribution is conditional on the previous state of the hidden Markov chain. A least square loss (LASSO) algorithm and a k-Nearest Neighbors (kNN) algorithm are used as the baselines for comparison and to evaluate the performance of our proposed deep learning model. RESULTS: There are 53 adverse reactions reported during the observation. They are assigned to 14 categories. In the comparison of classification accuracy, the deep learning model shows superiority over the LASSO and kNN model with a rate over 80 %. In the comparison of reliability, the deep learning model shows the best stability among the three models. CONCLUSIONS: Machine learning provides a new method to explore the complex associations among genomic variations and multiple events in pharmacogenomics studies. The new deep learning algorithm is capable of classifying various SNPs to the corresponding adverse reactions. We expect that as more genomic variations are added as features and more observations are made, the deep learning model can improve its performance and can act as a black-box but reliable verifier for other GWAS studies.


Asunto(s)
Efectos Colaterales y Reacciones Adversas Relacionados con Medicamentos , Aprendizaje Automático , Polimorfismo de Nucleótido Simple , Algoritmos , Efectos Colaterales y Reacciones Adversas Relacionados con Medicamentos/clasificación , Efectos Colaterales y Reacciones Adversas Relacionados con Medicamentos/genética , Humanos , Modelos Biológicos , Procesos Estocásticos
7.
J Bioinform Comput Biol ; 11(3): 1341005, 2013 Jun.
Artículo en Inglés | MEDLINE | ID: mdl-23796182

RESUMEN

We propose a novel dimensional analysis approach to employing meta information in order to find the relationships within the unstructured or semi-structured document/passages for improving genomics information retrieval performance. First, we make use of the auxiliary information as three basic dimensions, namely "temporal", "journal", and "author". The reference section is treated as a commensurable quantity of the three basic dimensions. Then, the sample space and subspaces are built up and a set of events are defined to meet the basic requirement of dimensional homogeneity to be commensurable quantities. After that, the classic graph analysis algorithm in the Web environments is applied on each dimension respectively to calculate the importance of each dimension. Finally, we integrate all the dimension networks and re-rank the outputs for evaluation. Our experimental results show the proposed approach is superior and promising.


Asunto(s)
Genómica/métodos , Humanos , Almacenamiento y Recuperación de la Información/métodos , Internet
9.
BMC Bioinformatics ; 13 Suppl 9: S2, 2012 Jun 11.
Artículo en Inglés | MEDLINE | ID: mdl-22901087

RESUMEN

BACKGROUND: The growth of the biomedical information requires most information retrieval systems to provide short and specific answers in response to complex user queries. Semantic information in the form of free text that is structured in a way makes it straightforward for humans to read but more difficult for computers to interpret automatically and search efficiently. One of the reasons is that most traditional information retrieval models assume terms are conditionally independent given a document/passage. Therefore, we are motivated to consider term associations within different contexts to help the models understand semantic information and use it for improving biomedical information retrieval performance. RESULTS: We propose a term association approach to discover term associations among the keywords from a query. The experiments are conducted on the TREC 2004-2007 Genomics data sets and the TREC 2004 HARD data set. The proposed approach is promising and achieves superiority over the baselines and the GSP results. The parameter settings and different indices are investigated that the sentence-based index produces the best results in terms of the document-level, the word-based index for the best results in terms of the passage-level and the paragraph-based index for the best results in terms of the passage2-level. Furthermore, the best term association results always come from the best baseline. The tuning number k in the proposed recursive re-ranking algorithm is discussed and locally optimized to be 10. CONCLUSIONS: First, modelling term association for improving biomedical information retrieval using factor analysis, is one of the major contributions in our work. Second, the experiments confirm that term association considering co-occurrence and dependency among the keywords can produce better results than the baselines treating the keywords independently. Third, the baselines are re-ranked according to the importance and reliance of latent factors behind term associations. These latent factors are decided by the proposed model and their term appearances in the first round retrieved passages.


Asunto(s)
Algoritmos , Biología Computacional/métodos , Minería de Datos , Almacenamiento y Recuperación de la Información/métodos , Genómica/métodos , Semántica
10.
BMC Genomics ; 13 Suppl 3: S2, 2012 Jun 11.
Artículo en Inglés | MEDLINE | ID: mdl-22759611

RESUMEN

BACKGROUND: In the biomedical domain, there are immense data and tremendous increase of genomics and biomedical relevant publications. The wealth of information has led to an increasing amount of interest in and need for applying information retrieval techniques to access the scientific literature in genomics and related biomedical disciplines. In many cases, the desired information of a query asked by biologists is a list of a certain type of entities covering different aspects that are related to the question, such as cells, genes, diseases, proteins, mutations, etc. Hence, it is important of a biomedical IR system to be able to provide relevant and diverse answers to fulfill biologists' information needs. However traditional IR model only concerns with the relevance between retrieved documents and user query, but does not take redundancy between retrieved documents into account. This will lead to high redundancy and low diversity in the retrieval ranked lists. RESULTS: In this paper, we propose an approach which employs a topic generative model called Latent Dirichlet Allocation (LDA) to promoting ranking diversity for biomedical information retrieval. Different from other approaches or models which consider aspects on word level, our approach assumes that aspects should be identified by the topics of retrieved documents. We present LDA model to discover topic distribution of retrieval passages and word distribution of each topic dimension, and then re-rank retrieval results with topic distribution similarity between passages based on N-size slide window. We perform our approach on TREC 2007 Genomics collection and two distinctive IR baseline runs, which can achieve 8% improvement over the highest Aspect MAP reported in TREC 2007 Genomics track. CONCLUSIONS: The proposed method is the first study of adopting topic model to genomics information retrieval, and demonstrates its effectiveness in promoting ranking diversity as well as in improving relevance of ranked lists of genomics search. Moreover, we proposes a distance measure to quantify how much a passage can increase topical diversity by considering both topical importance and topical coefficient by LDA, and the distance measure is a modified Euclidean distance.


Asunto(s)
Algoritmos , Genómica/métodos , Almacenamiento y Recuperación de la Información/métodos , Investigación Biomédica/métodos , Modelos Genéticos , Reproducibilidad de los Resultados
11.
Int J Data Min Bioinform ; 6(2): 115-29, 2012.
Artículo en Inglés | MEDLINE | ID: mdl-22724293

RESUMEN

In this paper, we present a context-sensitive approach to re-ranking retrieved documents for further improving the effectiveness of high-performance biomedical literature retrieval systems. For each topic, a two-dimensional positive context is learnt from the top N retrieved documents and a group of negative contexts are learnt from the last N' documents in initial retrieval ranked list. The contextual space contains lexical context and conceptual context. The probabilities that retrieved documents are generated within the contextual space are then computed for document re-ranking. Empirical evaluation on the TREC Genomics full-text collection and three high-performance biomedical literature retrieval runs demonstrates that the context-sensitive re-ranking approach yields better retrieval performance.


Asunto(s)
Algoritmos , Almacenamiento y Recuperación de la Información/métodos , Evaluación de la Tecnología Biomédica/métodos , Genómica
12.
BMC Bioinformatics ; 12 Suppl 5: S6, 2011.
Artículo en Inglés | MEDLINE | ID: mdl-21989123

RESUMEN

BACKGROUND: The users desire to be provided short, specific answers to questions and put them in context by linking original sources from the biomedical literature. Through the use of information retrieval technologies, information systems retrieve information to index data based on all kinds of pre-defined searching techniques/functions such that various ranking strategies are designed depending on different sources. In this paper, we propose a robust approach to optimizing multi-source information for improving genomics retrieval performance. RESULTS: In the proposed approach, we first consider a common scenario for a metasearch system that has access to multiple baselines with retrieving and ranking documents/passages by their own models. Then, given selected baselines from multiple sources, we investigate three modified fusion methods in the proposed approach, reciprocal, CombMNZ and CombSUM, to re-rank the candidates as the outputs for evaluation. Our empirical study on both 2007 and 2006 genomics data sets demonstrates the viability of the proposed approach for obtaining better performance. Furthermore, the experimental results show that the reciprocal method provides notable improvements on the individual baseline, especially on the passage2-level MAP and the aspect-level MAP. CONCLUSIONS: From the extensive experiments on two TREC genomics data sets, we draw the following conclusions. For the three fusion methods proposed in the robust approach, the reciprocal method outperforms the CombMNZ and CombSUM methods obviously, and CombSUM works well on the passage2-level when compared with CombMNZ. Based on the multiple sources of DFR, BM25 and language model, we can observe that the alliance of giants achieves the best result. Meanwhile, under the same combination, the better the baseline performance is, the more contribution the baseline provides. These conclusions are very useful to direct the fusion work in the field of biomedical information retrieval.


Asunto(s)
Genómica , Almacenamiento y Recuperación de la Información/métodos , Perfilación de la Expresión Génica , Humanos
13.
BMC Bioinformatics ; 12 Suppl 5: S8, 2011.
Artículo en Inglés | MEDLINE | ID: mdl-21989180

RESUMEN

BACKGROUND: In the biomedical domain, the desired information of a question (query) asked by biologists usually is a list of a certain type of entities covering different aspects that are related to the question, such as genes, proteins, diseases, mutations, etc. Hence it is important for a biomedical information retrieval system to be able to provide comprehensive and diverse answers to fulfill biologists' information needs. However, traditional retrieval models assume that the relevance of a document is independent of the relevance of other documents. This assumption may result in high redundancy and low diversity in the retrieval ranked lists. RESULTS: In this paper, we propose a relevance-novelty combined model, named RelNov model, based on the framework of an undirected graphical model. It consists of two component models, namely the aspect-term relevance model and the aspect-term novelty model. They model the relevance of a document and the novelty of a document respectively. We show that our approach can achieve 16.4% improvement over the highest aspect level MAP reported in the TREC 2007 Genomics track, and 9.8% improvement over the highest passage level MAP reported in the TREC 2007 Genomics track. CONCLUSIONS: The proposed combination model which models aspects, terms, topic relevance and document novelty as potential functions is demonstrated to be effective in promoting ranking diversity as well as in improving relevance of ranked lists for genomics search. We also show that the use of aspect plays an important role in the model. Moreover, the proposed model can integrate various different relevance and novelty measures easily.


Asunto(s)
Genómica/métodos , Almacenamiento y Recuperación de la Información , Algoritmos , Semántica , Unified Medical Language System , Vocabulario Controlado
SELECCIÓN DE REFERENCIAS
DETALLE DE LA BÚSQUEDA
...