Results 1-20 of 481
1.
Am J Hum Genet ; 111(10): 2190-2202, 2024 Oct 03.
Article in English | MEDLINE | ID: mdl-39255797

ABSTRACT

Phenotype-driven gene prioritization is fundamental to diagnosing rare genetic disorders. While traditional approaches rely on curated knowledge graphs with phenotype-gene relations, recent advances in large language models (LLMs) promise a streamlined text-to-gene solution. In this study, we evaluated five LLMs, two from the generative pre-trained transformer (GPT) series and three from the Llama 2 series, assessing their task completeness, gene prediction accuracy, and adherence to the required output structure. We conducted experiments exploring various combinations of models, prompts, phenotypic input types, and task difficulty levels. The best-performing LLM, GPT-4, achieved an average accuracy of 17.0% in identifying the diagnosed gene within its top 50 predictions, which still falls behind traditional tools. However, accuracy increased with model size. Results were consistent over time, as shown on a dataset curated after 2023. Advanced techniques such as retrieval-augmented generation (RAG) and few-shot learning did not improve accuracy. Sophisticated prompts were more likely to improve task completeness, especially in smaller models, whereas complicated prompts tended to lower the output-structure compliance rate. LLMs also achieved better-than-random prediction accuracy with free-text input, though performance was slightly lower than with standardized concept input. Bias analysis showed that highly cited genes, such as BRCA1, TP53, and PTEN, are more likely to be predicted. Our study provides valuable insights into integrating LLMs with genomic analysis, contributing to the ongoing discussion on their use in clinical workflows.
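The headline metric, the share of cases whose diagnosed gene appears among a model's top 50 predictions, can be sketched as follows. The function name, the one-diagnosed-gene-per-case input shape, and the case-insensitive matching of gene symbols are assumptions made for illustration, not details taken from the study.

```python
from typing import Sequence


def top_k_accuracy(ranked_predictions: Sequence[Sequence[str]],
                   diagnosed_genes: Sequence[str], k: int = 50) -> float:
    """Fraction of cases whose diagnosed gene appears among the first k predictions."""
    if len(ranked_predictions) != len(diagnosed_genes):
        raise ValueError("expected one ranked gene list per case")
    if not diagnosed_genes:
        return 0.0
    hits = 0
    for ranked, truth in zip(ranked_predictions, diagnosed_genes):
        # Case-insensitive symbol matching (an assumption), so "brca1" equals "BRCA1".
        top_k = {gene.strip().upper() for gene in ranked[:k]}
        hits += truth.strip().upper() in top_k
    return hits / len(diagnosed_genes)
```

For example, with k=2, the ranked lists [["BRCA1", "TP53"], ["PTEN"], ["fbn1"]] against diagnosed genes ["TP53", "BRCA2", "FBN1"] give 2 hits out of 3 cases.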


Subjects
Phenotype , Rare Diseases , Humans , Rare Diseases/genetics , Computational Biology/methods
2.
Brief Bioinform ; 25(4)2024 May 23.
Article in English | MEDLINE | ID: mdl-39013383

ABSTRACT

Unlike in animals, the variability of transcription factors (TFs) and their binding regions (TFBRs) across plant species is a major problem that most existing TFBR-finding software fails to address, leaving such tools of little use. This limitation has led to the underdevelopment of plant regulatory research and the rampant use of Arabidopsis-like model species, generating misleading results. Here, we report a revolutionary transformer-based deep-learning approach, PTFSpot, which learns from the co-variability of TF structures and their binding regions to provide a universal TF-DNA interaction model that detects TFBRs free from the limitations of TF- and species-specific models. In a series of extensive benchmarking studies on multiple experimentally validated datasets, it not only outperformed existing software by a margin of >30% but also consistently delivered >90% accuracy, even for species and TF families never encountered during model building. PTFSpot now makes it possible to accurately annotate TFBRs across any plant genome, even in the complete absence of TF information, free from the bottlenecks of species- and TF-specific models.


Subjects
Deep Learning , Transcription Factors , Transcription Factors/metabolism , Binding Sites , Software , Arabidopsis/metabolism , Arabidopsis/genetics , Genome, Plant , Computational Biology/methods , Plants/metabolism , Plants/genetics
3.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38388680

ABSTRACT

CRISPR-Cas9 is a groundbreaking genome-editing tool that harnesses a bacterial defense system to alter DNA sequences accurately. This technology holds vast promise in domains such as biotechnology, agriculture, and medicine. However, such power is not without peril; one issue is the potential for unintended (off-target) modifications, which highlights the need for accurate prediction and mitigation strategies. Although previous studies have improved off-target prediction by applying deep learning, they often struggle with the precision-recall trade-off, which limits their effectiveness, and they do not adequately interpret the complex decision-making of their models. To address these limitations, we thoroughly explored deep learning networks, particularly recurrent neural network (RNN)-based models, leveraging their established success in handling sequence data. We also employed a genetic algorithm for hyperparameter tuning to optimize these models' performance. Our experiments demonstrate a significant performance improvement over the current state of the art in off-target prediction, highlighting the efficacy of our approach. Furthermore, using the integrated gradients method, we interpret our models, yielding a detailed analysis of the factors that contribute to off-target predictions, in particular the presence of two sub-regions in the seed region of the single guide RNA, which extends the established biological hypothesis of off-target effects. To the best of our knowledge, our model is the first to combine high efficacy, interpretability, and a desirable balance between precision and recall.
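Genetic-algorithm hyperparameter tuning of the kind mentioned above can be sketched as a loop of selection, crossover, and mutation over candidate settings. The search space, population size, operators, and the fitness function are all hypothetical: in practice fitness would be a trained model's validation score (for example, a precision-recall metric), which the abstract does not specify.

```python
import random
from typing import Callable, Dict, List

# Hypothetical search space for an RNN classifier; the study's actual space is not given.
SEARCH_SPACE: Dict[str, List[float]] = {
    "hidden_units": [32, 64, 128, 256],
    "num_layers": [1, 2, 3],
    "dropout": [0.0, 0.1, 0.2, 0.3],
    "learning_rate": [1e-4, 3e-4, 1e-3, 3e-3],
}

Genome = Dict[str, float]


def random_genome(rng: random.Random) -> Genome:
    """Draw one value per hyperparameter uniformly from the search space."""
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}


def crossover(a: Genome, b: Genome, rng: random.Random) -> Genome:
    """Uniform crossover: each hyperparameter is inherited from either parent."""
    return {name: (a[name] if rng.random() < 0.5 else b[name]) for name in SEARCH_SPACE}


def mutate(g: Genome, rng: random.Random, rate: float = 0.2) -> Genome:
    """Resample each hyperparameter with probability `rate`."""
    return {name: (rng.choice(SEARCH_SPACE[name]) if rng.random() < rate else value)
            for name, value in g.items()}


def genetic_search(fitness: Callable[[Genome], float], pop_size: int = 12,
                   generations: int = 15, elite: int = 2, seed: int = 0) -> Genome:
    """Evolve hyperparameter settings, carrying the `elite` best into each new generation."""
    rng = random.Random(seed)
    population = [random_genome(rng) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]   # truncation selection
        next_gen = ranked[:elite]           # elitism: best settings survive unchanged
        while len(next_gen) < pop_size:
            a, b = rng.sample(parents, 2)
            next_gen.append(mutate(crossover(a, b, rng), rng))
        population = next_gen
    return max(population, key=fitness)
```

Because of elitism, the best fitness found never decreases from one generation to the next; each fitness call would be a full training run in a real setting, so population size and generation count are usually kept small.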


Subjects
CRISPR-Cas Systems , Deep Learning , Gene Editing/methods , RNA, Guide, CRISPR-Cas Systems , Neural Networks, Computer
4.
Proc Natl Acad Sci U S A ; 120(34): e2219150120, 2023 08 22.
Article in English | MEDLINE | ID: mdl-37579149

ABSTRACT

Glial cells account for between 50% and 90% of all human brain cells, and serve a variety of important developmental, structural, and metabolic functions. Recent experimental efforts suggest that astrocytes, a type of glial cell, are also directly involved in core cognitive processes such as learning and memory. While it is well established that astrocytes and neurons are connected to one another in feedback loops across many timescales and spatial scales, there is a gap in understanding the computational role of neuron-astrocyte interactions. To help bridge this gap, we draw on recent advances in AI and astrocyte imaging technology. In particular, we show that neuron-astrocyte networks can naturally perform the core computation of a Transformer, a particularly successful type of AI architecture. In doing so, we provide a concrete, normative, and experimentally testable account of neuron-astrocyte communication. Because Transformers are so successful across a wide variety of task domains, such as language, vision, and audition, our analysis may help explain the ubiquity, flexibility, and power of the brain's neuron-astrocyte networks.
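The "core computation of a Transformer" referred to here is self-attention. As a point of reference only, the standard single-head scaled dot-product attention, softmax(QK^T / sqrt(d_k))V, can be written in plain Python as below; this is the textbook Transformer operation, not the authors' neuron-astrocyte model, and the helper names and matrix sizes are illustrative.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: Matrix, w_q: Matrix, w_k: Matrix, w_v: Matrix) -> Matrix:
    """Single-head scaled dot-product self-attention over the rows (tokens) of x."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    scores = matmul(q, transpose(k))  # token-to-token similarity
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v)         # each output is a weighted mix of value vectors
```

Each output row is a convex combination of the value vectors, weighted by how strongly that token's query matches every key.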


Subjects
Astrocytes , Neurons , Humans , Astrocytes/physiology , Neurons/physiology , Neuroglia/physiology , Brain
5.
Proc Natl Acad Sci U S A ; 120(32): e2303499120, 2023 08 08.
Article in English | MEDLINE | ID: mdl-37523536

ABSTRACT

Transformer neural networks have revolutionized structural biology with their ability to predict protein structures at unprecedentedly high accuracy. Here, we report the predictive modeling performance of state-of-the-art transformer-based protein structure prediction methods for 69 protein targets from the recently concluded 15th Critical Assessment of Structure Prediction (CASP15) challenge. Our study shows the power of transformers in protein structure modeling and highlights future areas of improvement.


Subjects
Electric Power Supplies , Neural Networks, Computer
6.
Brief Bioinform ; 24(2)2023 03 19.
Article in English | MEDLINE | ID: mdl-36907657

ABSTRACT

Discovering precursor microRNAs (pre-miRNAs) is the core of miRNA discovery. Many tools that use traditional sequence and structural features have been published to discover miRNAs. However, in practical applications such as genome annotation, their actual performance has been very low. The problem is more severe in plants, where, unlike in animals, pre-miRNAs are much more complex and difficult to identify. A huge gap exists between animals and plants in the available software for miRNA discovery and in species-specific miRNA information. Here, we present miWords, a composite deep learning system of transformers and convolutional neural networks that treats the genome as a pool of sentences made of words with specific occurrence preferences and contexts, to accurately identify pre-miRNA regions across plant genomes. Comprehensive benchmarking was performed against more than 10 software tools representing different genres and on many experimentally validated datasets. miWords emerged as the best, exceeding 98% accuracy with a performance lead of ~10%. miWords was also evaluated across the Arabidopsis genome, where it again outperformed the compared tools. As a demonstration, miWords was run across the tea genome, reporting 803 pre-miRNA regions, all validated by small RNA-seq reads from multiple samples, most of which were functionally supported by degradome sequencing data. miWords is freely available as stand-alone source code at https://scbb.ihbt.res.in/miWords/index.php.
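Treating a genome as sentences of words can be illustrated by splitting sequence windows into overlapping k-mers. This is a minimal sketch of that general idea; the word size k, the stride, the window length, and the U-to-T conversion are assumptions, since the abstract does not state how miWords tokenizes sequences.

```python
from typing import List


def kmer_words(seq: str, k: int = 3, stride: int = 1) -> List[str]:
    """Split a nucleotide sequence into overlapping k-mer 'words'."""
    seq = seq.upper().replace("U", "T")  # treat RNA and DNA input alike
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def sentences(genome: str, window: int, k: int = 3) -> List[List[str]]:
    """Cut a long sequence into non-overlapping windows ('sentences') of k-mer words."""
    return [kmer_words(genome[i:i + window], k)
            for i in range(0, len(genome) - window + 1, window)]
```

For instance, kmer_words("ACGTA", 3) yields ["ACG", "CGT", "GTA"]; each such word list can then be mapped to token IDs for a transformer.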


Subjects
Arabidopsis , Deep Learning , MicroRNAs , Animals , MicroRNAs/genetics , MicroRNAs/chemistry , Software , Genomics , Genome, Plant , Arabidopsis/genetics
7.
Brief Bioinform ; 25(1)2023 11 22.
Article in English | MEDLINE | ID: mdl-38033290

ABSTRACT

Within drug discovery, the goal of AI scientists and cheminformaticians is to help identify molecular starting points that will develop into safe and efficacious drugs while reducing costs, time, and failure rates. To achieve this goal, it is crucial to represent molecules in a digital format that makes them machine-readable and facilitates accurate prediction of the properties that drive decision-making. Over the years, molecular representations have evolved from intuitive, human-readable formats to bespoke numerical descriptors and fingerprints, and now to learned representations that capture patterns and salient features across vast chemical spaces. Among these, sequence-based and graph-based representations of small molecules have become highly popular. However, each approach has strengths and weaknesses across dimensions such as generality, computational cost, invertibility for generative applications, and interpretability, which can be critical in informing practitioners' decisions. As the drug discovery landscape evolves, opportunities for innovation continue to emerge. These include creating molecular representations for high-value, low-data regimes, distilling broader biological and chemical knowledge into novel learned representations, and modeling up-and-coming therapeutic modalities.


Subjects
Drug Discovery , Intuition , Humans , Learning
8.
Brief Bioinform ; 24(5)2023 09 20.
Article in English | MEDLINE | ID: mdl-37605947

ABSTRACT

Predicting the biological properties of molecules is crucial in computer-aided drug development, yet it is often impeded by data scarcity and imbalance in many practical applications. Existing approaches are based on self-supervised learning or 3D data and use an increasing number of parameters to improve performance. These approaches may not take full advantage of established chemical knowledge and could inadvertently introduce noise into the respective model. In this study, we introduce a more elegant transformer-based framework with focused attention for molecular representation (TransFoxMol) to improve artificial intelligence (AI) understanding of molecular structure-property relationships. TransFoxMol incorporates a multi-scale 2D molecular environment into a graph neural network + Transformer module and uses prior chemical maps to obtain a more focused attention landscape than existing approaches. Experimental results show that TransFoxMol achieves state-of-the-art performance on MoleculeNet benchmarks and surpasses baselines that use self-supervised learning or geometry-enhanced strategies on small-scale datasets. Subsequent analyses indicate that TransFoxMol's predictions are highly interpretable and that the clever use of chemical knowledge enables AI to perceive molecules in a simple but rational way, enhancing performance.


Subjects
Artificial Intelligence , Benchmarking , Neural Networks, Computer
9.
Brief Bioinform ; 24(5)2023 09 20.
Article in English | MEDLINE | ID: mdl-37523217

ABSTRACT

Cell-type annotation is a critical step in the analysis of single-cell RNA sequencing (scRNA-seq) data that allows the study of heterogeneity across multiple cell populations. Currently, this is most commonly done using unsupervised clustering algorithms, which project single-cell expression data into a lower-dimensional space and then cluster cells based on their distances from each other. However, because these methods do not use reference datasets, they can achieve only a rough classification of cell types, and it is difficult to improve recognition accuracy further. To address this issue, we propose a novel supervised annotation method, scDeepInsight. scDeepInsight is capable of performing manifold assignments. It can perform data integration through batch normalization, supervised training on the reference dataset, outlier detection, and cell-type annotation on query datasets. Moreover, it can help identify active genes or marker genes related to cell types. The scDeepInsight model is trained in a distinctive way. Tabular scRNA-seq data are first converted into images through the DeepInsight methodology. DeepInsight can create a trainable image transformer that converts non-image RNA data into images by comprehensively comparing the interrelationships among multiple genes. The converted images are then fed into convolutional neural networks such as EfficientNet-b3, enabling automatic feature extraction to identify the cell types of scRNA-seq samples. We benchmarked scDeepInsight against six other mainstream cell annotation methods. The average accuracy of scDeepInsight reached 87.5%, more than 7% higher than that of the state-of-the-art methods.
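The core of a DeepInsight-style conversion is giving every gene a fixed pixel location and then painting each cell's expression values onto that grid. The sketch below assumes the 2-D gene coordinates have already been produced by a dimensionality-reduction step over genes (for example t-SNE or kernel PCA, not shown); DeepInsight's own pipeline includes further steps this simplified version omits, and averaging genes that share a pixel is a choice made here.

```python
from typing import List, Sequence, Tuple


def pixel_positions(coords: Sequence[Tuple[float, float]], size: int) -> List[Tuple[int, int]]:
    """Min-max rescale 2-D gene coordinates onto a size x size grid of (row, col) pixels."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    x_lo, x_hi, y_lo, y_hi = min(xs), max(xs), min(ys), max(ys)

    def scale(v: float, lo: float, hi: float) -> int:
        # A degenerate axis (all genes share one value) maps to pixel 0.
        return 0 if hi == lo else round((v - lo) / (hi - lo) * (size - 1))

    return [(scale(y, y_lo, y_hi), scale(x, x_lo, x_hi)) for x, y in coords]


def to_image(expression: Sequence[float], positions: Sequence[Tuple[int, int]],
             size: int) -> List[List[float]]:
    """Place one cell's expression values at their genes' pixels; empty pixels stay 0."""
    total = [[0.0] * size for _ in range(size)]
    count = [[0] * size for _ in range(size)]
    for value, (r, c) in zip(expression, positions):
        total[r][c] += value
        count[r][c] += 1
    # Genes that land on the same pixel are averaged (a choice made for this sketch).
    return [[t / n if n else 0.0 for t, n in zip(t_row, n_row)]
            for t_row, n_row in zip(total, count)]
```

The gene positions are computed once from the reference data and reused for every cell, so neighbouring pixels hold related genes and a CNN can exploit that local structure.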


Subjects
Deep Learning , Single-Cell Gene Expression Analysis , Algorithms , Benchmarking , Cluster Analysis , Sequence Analysis, RNA , Gene Expression Profiling
10.
Brief Bioinform ; 24(3)2023 05 19.
Article in English | MEDLINE | ID: mdl-37088981

ABSTRACT

BACKGROUND: The ubiquitous presence of short extrachromosomal circular DNAs (eccDNAs) in eukaryotic cells has perplexed generations of biologists. Their widespread genomic origins, lacking apparent specificity, led some studies to conclude that their formation is random or near-random. Despite this, the search for specific formation of short eccDNAs continues, with a recent surge of interest in biomarker development. RESULTS: To shed new light on the conflicting views on the randomness of short eccDNAs, we present DeepCircle, a bioinformatics framework incorporating convolution- and attention-based neural networks to assess their predictability. Short human eccDNAs from different datasets indeed have low similarity in genomic location, but DeepCircle successfully learned shared DNA sequence features to make accurate cross-dataset predictions (accuracy: convolution-based models, 79.65 ± 4.7%; attention-based models, 83.31 ± 4.18%). CONCLUSIONS: The excellent performance of our models shows that the intrinsic predictability of eccDNAs is encoded in their sequences across tissue origins. Our work demonstrates how the perceived lack of specificity in genomics data can be reassessed by deep learning models to uncover unexpected similarity.
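The abstract does not describe how DeepCircle encodes sequences. One-hot encoding is a common input format for convolution- and attention-based DNA models, so it is shown here purely as an assumed, generic preprocessing step; the fixed length, padding, and handling of ambiguous bases are illustrative choices.

```python
from typing import List

BASES = "ACGT"


def one_hot(seq: str, length: int) -> List[List[int]]:
    """Encode DNA as a length x 4 matrix with columns A, C, G, T.

    Sequences longer than `length` are truncated; shorter ones are zero-padded.
    Ambiguous bases such as N become all-zero rows.
    """
    seq = seq.upper()[:length]
    rows = [[1 if base == b else 0 for b in BASES] for base in seq]
    rows.extend([0, 0, 0, 0] for _ in range(length - len(seq)))
    return rows
```

A fixed length lets sequences of varying size be batched; a convolution then slides its filters along the first axis of this matrix.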


Subjects
DNA, Circular , DNA , Humans , Genome , Eukaryotic Cells , Biomarkers
11.
BMC Bioinformatics ; 25(1): 122, 2024 Mar 21.
Article in English | MEDLINE | ID: mdl-38515052

ABSTRACT

BACKGROUND: Nanobodies, also known as VHH or single-domain antibodies, are unique antibody fragments derived solely from heavy chains. They combine the advantages of small molecules and conventional antibodies, making them promising therapeutics. The paratope is the specific region of an antibody that binds to an antigen. Paratope prediction involves identifying and characterizing the antigen-binding site on an antibody, a process crucial for understanding the specificity and affinity of antibody-antigen interactions. Various computational methods and experimental approaches have been developed to predict and analyze paratopes, contributing to advances in antibody engineering, drug development, and immunotherapy. However, existing predictive models trained on conventional antibodies may not be suitable for nanobodies, and the limited availability of nanobody datasets makes it difficult to construct accurate models. METHODS: To address these challenges, we developed a novel nanobody prediction model, NanoBERTa-ASP (Antibody Specificity Prediction), specifically designed to predict nanobody-antigen binding sites. The model adopts a training strategy better suited to nanobodies, based on the natural language processing (NLP) model BERT (Bidirectional Encoder Representations from Transformers). Specifically, it uses RoBERTa (Robustly Optimized BERT Pretraining Approach), a masked language modeling approach, to learn contextual information from the nanobody sequence and predict its binding site. RESULTS: NanoBERTa-ASP achieved exceptional performance in predicting nanobody binding sites, outperforming existing methods, which indicates its proficiency in capturing sequence information specific to nanobodies and accurately identifying their binding sites. Furthermore, NanoBERTa-ASP provides insights into the interaction mechanisms between nanobodies and antigens, contributing to a better understanding of nanobodies and facilitating the design and development of nanobodies with therapeutic potential. CONCLUSION: NanoBERTa-ASP represents a significant advance in nanobody paratope prediction. Its superior performance highlights the potential of deep learning approaches in nanobody research. By leveraging the increasing volume of nanobody data, NanoBERTa-ASP can further refine its predictions, enhance its performance, and contribute to the development of novel nanobody-based therapeutics. GitHub repository: https://github.com/WangLabforComputationalBiology/NanoBERTa-ASP.
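Masked language modeling trains a model to recover hidden residues from their context. The sketch below shows BERT-style input corruption on an amino-acid sequence; the 15% selection rate and the 80/10/10 split (mask token / random residue / unchanged) are the defaults of the original BERT recipe, and the single-residue tokenization and "<mask>" token name are assumptions, since the abstract does not give NanoBERTa-ASP's settings.

```python
import random
from typing import List, Optional, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MASK = "<mask>"  # hypothetical mask-token name


def mask_sequence(seq: str, rng: random.Random,
                  mask_prob: float = 0.15) -> Tuple[List[str], List[Optional[str]]]:
    """BERT-style corruption of an amino-acid sequence.

    Returns the corrupted tokens and, per position, the original residue the model
    must predict (None where no prediction is required).
    """
    tokens = list(seq)
    targets: List[Optional[str]] = [None] * len(tokens)
    for i, residue in enumerate(seq):
        if rng.random() >= mask_prob:
            continue                              # position not selected for prediction
        targets[i] = residue
        roll = rng.random()
        if roll < 0.8:
            tokens[i] = MASK                      # 80%: replace with the mask token
        elif roll < 0.9:
            tokens[i] = rng.choice(AMINO_ACIDS)   # 10%: replace with a random residue
        # remaining 10%: keep the original residue
    return tokens, targets
```

RoBERTa's dynamic masking amounts to drawing a fresh mask each time a sequence is seen, which calling this function once per sequence per epoch reproduces.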


Subjects
Single-Domain Antibodies , Binding Sites, Antibody , Single-Domain Antibodies/chemistry , Antibodies , Binding Sites , Antibody Specificity
12.
BMC Bioinformatics ; 25(1): 208, 2024 Jun 08.
Article in English | MEDLINE | ID: mdl-38849719

ABSTRACT

BACKGROUND: Drug design is a challenging and important task that requires generating novel, effective molecules that can bind to specific protein targets. Artificial intelligence algorithms have recently shown promising potential to expedite the drug design process. However, existing methods adopt multi-objective approaches, which limit the number of objectives. RESULTS: In this paper, we expand this line of research from the many-objective perspective by proposing a novel framework that integrates a latent Transformer-based model for molecular generation with a drug design system that incorporates absorption, distribution, metabolism, excretion, and toxicity prediction, molecular docking, and many-objective metaheuristics. We compared the performance of two latent Transformer models (ReLSO and FragNet) on a molecular generation task and showed that ReLSO outperforms FragNet in reconstruction and latent space organization. We then explored six different many-objective metaheuristics based on evolutionary algorithms and particle swarm optimization on a drug design task involving potential drug candidates for human lysophosphatidic acid receptor 1, a cancer-related protein target. CONCLUSION: We show that the multi-objective evolutionary algorithm based on dominance and decomposition performs best at finding molecules that satisfy many objectives, such as high binding affinity, low toxicity, and high drug-likeness. Our framework demonstrates the potential of combining Transformers and many-objective computational intelligence for drug design.
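The Pareto-dominance relation that many-objective metaheuristics build on can be shown briefly. This sketch assumes every objective is oriented for maximization (so toxicity would be negated), and the example scores are made-up numbers, not results from the study; the dominance-and-decomposition algorithm named above also uses weight-vector decomposition, which is not shown.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` Pareto-dominates `b`: no worse on every objective, better on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def non_dominated(points: Sequence[Sequence[float]]) -> List[int]:
    """Indices of points that no other point dominates (the first Pareto front)."""
    return [i for i, p in enumerate(points)
            if not any(dominates(q, p) for j, q in enumerate(points) if j != i)]
```

For example, with hypothetical (binding score, negated toxicity, drug-likeness) triples [(0.9, -0.2, 0.7), (0.8, -0.1, 0.6), (0.7, -0.3, 0.5), (0.9, -0.2, 0.8)], the front is indices [1, 3]: candidate 0 is beaten by candidate 3, and candidate 2 by candidate 0. As the number of objectives grows, more candidates become mutually non-dominated, which is why many-objective methods add mechanisms such as decomposition.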


Subjects
Algorithms , Drug Design , Humans , Molecular Docking Simulation , Receptors, Lysophosphatidic Acid/metabolism , Receptors, Lysophosphatidic Acid/chemistry , Artificial Intelligence
13.
Hum Brain Mapp ; 45(11): e26803, 2024 Aug 01.
Article in English | MEDLINE | ID: mdl-39119860

ABSTRACT

Accurate segmentation of chronic stroke lesions from mono-spectral magnetic resonance imaging scans (e.g., T1-weighted images) is a difficult task due to the arbitrary shape, complex texture, variable size and intensities, and varied locations of the lesions. Due to this inherent spatial heterogeneity, existing machine learning methods have shown moderate performance for chronic lesion delineation. In this study, we introduced: (1) a method that integrates transformers' deformable feature attention mechanism with convolutional deep learning architecture to improve the accuracy and generalizability of stroke lesion segmentation, and (2) an ecological data augmentation technique based on inserting real lesions into intact brain regions. Our combination of these two approaches resulted in a significant increase in segmentation performance, with a Dice index of 0.82 (±0.39), outperforming the existing methods trained and tested on the same Anatomical Tracings of Lesions After Stroke (ATLAS) 2022 dataset. Our method performed relatively well even for cases with small stroke lesions. We validated the robustness of our method through an ablation study and by testing it on new unseen brain scans from the Ischemic Stroke Lesion Segmentation (ISLES) 2015 dataset. Overall, our proposed approach of transformers with ecological data augmentation offers a robust way to delineate chronic stroke lesions with clinically relevant accuracy. Our method can be extended to other challenging tasks that require automated detection and segmentation of diverse brain abnormalities from clinical scans.
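The Dice index used to score the segmentations measures overlap between predicted and reference lesion masks, 2|A∩B| / (|A| + |B|). A minimal sketch on flattened binary masks follows; how two empty masks are scored varies between evaluation codebases, so the convention used here is an assumption.

```python
from typing import Sequence


def dice(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Dice index 2|A∩B| / (|A| + |B|) for binary masks flattened to 0/1 sequences."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Convention chosen here: two empty masks count as perfect agreement.
    return 1.0 if total == 0 else 2.0 * intersection / total
```

For instance, masks [1, 1, 0, 0] and [1, 0, 1, 0] share one voxel out of 2 + 2 labelled voxels, giving a Dice index of 0.5. Because small lesions contain few voxels, a handful of misclassified voxels moves their Dice score sharply.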


Subjects
Deep Learning , Magnetic Resonance Imaging , Stroke , Humans , Magnetic Resonance Imaging/methods , Magnetic Resonance Imaging/standards , Stroke/diagnostic imaging , Stroke/pathology , Neuroimaging/methods , Neuroimaging/standards , Ischemic Stroke/diagnostic imaging , Image Processing, Computer-Assisted/methods , Aged , Brain/diagnostic imaging , Brain/pathology
14.
J Cardiovasc Magn Reson ; 26(1): 101035, 2024.
Article in English | MEDLINE | ID: mdl-38460841

ABSTRACT

BACKGROUND: Patients are increasingly using Generative Pre-trained Transformer 4 (GPT-4) to better understand their own radiology findings. PURPOSE: To evaluate the performance of GPT-4 in transforming cardiovascular magnetic resonance (CMR) reports into text that is comprehensible to medical laypersons. METHODS: ChatGPT with GPT-4 architecture was used to generate three explained versions of each of 20 different CMR reports (n = 60) using the same prompt: "Explain the radiology report in a language understandable to a medical layperson". Two cardiovascular radiologists evaluated understandability, factual correctness, completeness of relevant findings, and lack of potential harm, while 13 medical laypersons evaluated the understandability of the original and the GPT-4 reports on a Likert scale (1 "strongly disagree", 5 "strongly agree"). Readability was measured using the Automated Readability Index (ARI). Linear mixed-effects models (values given as median [interquartile range]) and the intraclass correlation coefficient (ICC) were used for statistical analysis. RESULTS: GPT-4 reports were generated in 52 ± 13 s on average. GPT-4 reports achieved a lower ARI score (10 [9-12] vs 5 [4-6]; p < 0.001) and were subjectively easier for laypersons to understand than the original reports (1 [1] vs 4 [4,5]; p < 0.001). Eighteen of 20 (90%) standard CMR reports and 2/60 (3%) GPT-generated reports had an ARI score corresponding to the 8th-grade level or higher. Radiologists' ratings of the GPT-4 reports were high for correctness (5 [4, 5]), completeness (5 [5]), and lack of potential harm (5 [5]), with "strong agreement" for factual correctness in 94% (113/120) and for completeness of relevant findings in 81% (97/120) of reports. Test-retest agreement for layperson understandability ratings among the three simplified reports generated from the same original report was substantial (ICC: 0.62; p < 0.001). Interrater agreement between radiologists was almost perfect for lack of potential harm (ICC: 0.93, p < 0.001) and moderate to substantial for completeness (ICC: 0.76, p < 0.001) and factual correctness (ICC: 0.55, p < 0.001). CONCLUSION: GPT-4 can reliably transform complex CMR reports into more understandable, layperson-friendly language while largely maintaining factual correctness and completeness, and can thus help convey patient-relevant radiology information in an easy-to-understand manner.
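The Automated Readability Index combines average word length in characters with average sentence length in words: ARI = 4.71 × (characters / words) + 0.5 × (words / sentences) − 21.43, with the score intended to approximate the US school grade level needed to read the text. Tools differ in what they count as a word, character, or sentence, so the ASCII-only regular expressions below are assumptions and scores may differ slightly from the software used in the study.

```python
import re


def automated_readability_index(text: str) -> float:
    """ARI = 4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43."""
    # Words: ASCII letter/digit runs, allowing internal apostrophes or hyphens.
    words = re.findall(r"[A-Za-z0-9]+(?:['-][A-Za-z0-9]+)*", text)
    if not words:
        raise ValueError("text contains no words")
    characters = sum(ch.isalnum() for w in words for ch in w)  # letters and digits only
    # Crude sentence count: runs of terminal punctuation, at least one sentence.
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    return 4.71 * characters / len(words) + 0.5 * len(words) / sentences - 21.43
```

For "The cat sat. The dog ran." (6 words, 18 characters, 2 sentences) the score is 4.71 × 3 + 0.5 × 3 − 21.43 = −5.8; long clinical terms and long sentences push the score up, which is why simplified reports score lower.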


Subjects
Comprehension , Magnetic Resonance Imaging , Predictive Value of Tests , Humans , Reproducibility of Results , Observer Variation , Health Literacy , Patient Education as Topic , Cardiovascular Diseases/diagnostic imaging , Female , Male
15.
J Chem Inf Model ; 64(16): 6259-6280, 2024 Aug 26.
Article in English | MEDLINE | ID: mdl-39136669

ABSTRACT

Molecular Property Prediction (MPP) is vital for drug discovery, crop protection, and environmental science. Over the last decades, diverse computational techniques have been developed, from using simple physical and chemical properties and molecular fingerprints in statistical models and classical machine learning to advanced deep learning approaches. In this review, we aim to distill insights from current research on employing transformer models for MPP. We analyze the currently available models and explore key questions that arise when training and fine-tuning a transformer model for MPP. These questions encompass the choice and scale of the pretraining data, optimal architecture selections, and promising pretraining objectives. Our analysis highlights areas not yet covered in current research, inviting further exploration to enhance the field's understanding. Additionally, we address the challenges in comparing different models, emphasizing the need for standardized data splitting and robust statistical analysis.


Subjects
Machine Learning , Drug Discovery/methods , Deep Learning
16.
RNA Biol ; 21(1): 1-10, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38357904

ABSTRACT

RNA modifications play crucial roles in various biological processes and diseases. Accurate prediction of RNA modification sites is essential for understanding their functions. In this study, we propose a hybrid approach that fuses a pre-trained sequence representation with various sequence features to predict multiple types of RNA modification in one combined prediction framework. We developed MRM-BERT, a deep learning method that combines the pre-trained DNABERT deep sequence representation module with a convolutional neural network (CNN) exploiting four traditional sequence feature encodings to improve prediction performance. MRM-BERT was evaluated on multiple datasets covering 12 commonly occurring RNA modifications, including m6A, m5C, and m1A. The results demonstrate that our hybrid model outperforms other models in terms of the area under the receiver operating characteristic curve (AUC) for all 12 types of RNA modification. MRM-BERT is available as an online tool (http://117.122.208.21:8501) and as source code (https://github.com/abhhba999/MRM-BERT), allowing users to predict RNA modification sites and visualize the results. Overall, our study provides an effective and efficient approach to predicting multiple RNA modifications, contributing to the understanding of RNA biology and the development of therapeutic strategies.
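The AUC reported for each modification type equals the probability that a randomly chosen positive site receives a higher score than a randomly chosen negative site, with ties counted as one half (the Mann-Whitney formulation). The quadratic pairwise version below is written for clarity rather than speed; library routines such as scikit-learn's roc_auc_score compute the same quantity efficiently.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of random positive > score of random negative), ties counting 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

For labels [1, 1, 0, 0] with scores [0.9, 0.2, 0.5, 0.1], three of the four positive-negative pairs are ordered correctly, so the AUC is 0.75.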


Subjects
Neural Networks, Computer , RNA , RNA/genetics , ROC Curve , Software
17.
J Biomed Inform ; 156: 104674, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38871012

ABSTRACT

OBJECTIVE: Biomedical Named Entity Recognition (bio NER) is the task of recognizing named entities in biomedical texts. This paper introduces a new model that addresses bio NER by considering additional external contexts. Unlike prior methods that mainly use the original input sequences for sequence labeling, the model uses additional contexts to enhance the representation of entities in the original sequences, since such contexts can provide extra information that explains the concepts behind biomedical entities. METHODS: To exploit additional context, given an original input sequence, the model first retrieves relevant sentences from PubMed and then ranks the retrieved sentences to form the context. It next combines the context with the original input sequence to form a new, enhanced sequence. The original and enhanced sequences are fed into PubMedBERT to learn feature representations. To obtain more fine-grained features, the model stacks a BiLSTM layer on top of PubMedBERT. The final named entity labels are predicted by a CRF layer. The model is jointly trained end to end to take advantage of the additional context for NER on the original sequence. RESULTS: Experimental results on six biomedical datasets show that the proposed model achieves promising performance compared with strong baselines and confirm the contribution of additional contexts to bio NER. CONCLUSION: The promising results confirm three important points. First, additional context from PubMed helps improve the recognition of biomedical entities. Second, PubMed is more appropriate than the Google search engine for providing relevant information for bio NER. Finally, more relevant sentences from the context are more beneficial than irrelevant ones for enhancing the original input sequences. The model can flexibly integrate any type of additional context for the NER task.
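A CRF output layer chooses the best label sequence as a whole instead of labelling each token independently; at inference this is typically done with Viterbi decoding. In this sketch the per-token emission scores stand in for BiLSTM outputs and the transition scores for the CRF's learned parameters; the BIO label set and the example numbers are illustrative assumptions.

```python
from typing import List, Sequence


def viterbi(emissions: Sequence[Sequence[float]],
            transitions: Sequence[Sequence[float]],
            labels: Sequence[str]) -> List[str]:
    """Return the highest-scoring label sequence under a linear-chain CRF.

    emissions[t][j]   score of label j at token t (e.g. produced by a BiLSTM)
    transitions[i][j] score of moving from label i to label j
    """
    n = len(labels)
    score = list(emissions[0])  # best path score ending in each label so far
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(n):
            # Best previous label i for arriving at label j at token t.
            best_i = max(range(n), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    best = max(range(n), key=lambda j: score[j])
    path = [best]
    for pointers in reversed(backpointers):  # follow pointers back to token 0
        best = pointers[best]
        path.append(best)
    return [labels[j] for j in reversed(path)]
```

With labels ["O", "B", "I"], emissions [[1.0, 0.5, 0.0], [0.0, 0.0, 2.0]], and a heavy penalty (-10) on the O-to-I transition, greedy per-token choice would give the invalid ["O", "I"], while Viterbi returns ["B", "I"] (score 2.5).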


Subjects
Natural Language Processing , PubMed , Humans , Algorithms , Data Mining/methods , Semantics , Medical Informatics/methods
18.
Int J Clin Oncol ; 2024 Apr 15.
Article in English | MEDLINE | ID: mdl-38619651

ABSTRACT

Breast cancer is the most prevalent cancer among women, and its diagnosis requires the accurate identification and classification of histological features for effective patient management. Artificial intelligence, particularly through deep learning, represents the next frontier in cancer diagnosis and management. Notably, convolutional neural networks and emerging Vision Transformers (ViT) have been reported to automate pathologists' tasks, including tumor detection and classification, in addition to improving the efficiency of pathology services. Deep learning applications have also been extended to predicting protein expression, molecular subtype, mutation status, therapeutic efficacy, and outcome directly from hematoxylin and eosin-stained slides, bypassing the need for immunohistochemistry or genetic testing. This review explores the current status and prospects of deep learning in breast cancer diagnosis, with a focus on whole-slide image analysis. Artificial intelligence is increasingly applied to many tasks in breast pathology, ranging from disease diagnosis to outcome prediction, thus serving as a valuable tool for assisting pathologists and supporting breast cancer management.

19.
BMC Nephrol ; 25(1): 337, 2024 Oct 09.
Article in English | MEDLINE | ID: mdl-39385124

ABSTRACT

Recent advancements in computer vision within the field of artificial intelligence (AI) have made significant inroads into the medical domain. However, the application of AI for classifying renal pathology remains challenging due to the subtle variations in multiple renal pathological classifications. Vision Transformers (ViT), an adaptation of the Transformer model for image recognition, have demonstrated superior capabilities in capturing global features and providing greater explainability. In our study, we developed a ViT model using a diverse set of stained renal histopathology images to evaluate its effectiveness in classifying renal pathology. A total of 1861 whole slide images (WSI) stained with HE, MASSON, PAS, and PASM were collected from 635 patients. Renal tissue images were then extracted, tiled, and categorized into 14 classes on the basis of renal pathology. We employed the classic ViT model from the Timm library, utilizing images sized 384 × 384 pixels with 16 × 16 pixel patches, to train the classification model. A comparative analysis was conducted to evaluate the performance of the ViT model against traditional convolutional neural network (CNN) models. The results indicated that the ViT model demonstrated superior recognition ability (accuracy: 0.96-0.99). Furthermore, we visualized the identification process of the ViT models to investigate potentially significant pathological ultrastructures. Our study demonstrated that ViT models outperformed CNN models in accurately classifying renal pathology. Additionally, ViT models are able to focus on specific, significant structures within renal histopathology, which could be crucial for identifying novel and meaningful pathological features in the diagnosis and treatment of renal disease.


Subjects
Kidney Diseases , Kidney , Humans , Kidney Diseases/pathology , Kidney Diseases/classification , Kidney/pathology , Neural Networks, Computer , Artificial Intelligence , Image Processing, Computer-Assisted/methods
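The input setup in the abstract (384 × 384 images split into 16 × 16 patches) fixes how many tokens a ViT processes per tile. The sketch below works through that arithmetic and a toy patch split. It assumes 3-channel RGB tiles and a single prepended class token, as in the original ViT design; the abstract does not name the exact timm variant, and `vit_token_count` and `patchify` are hypothetical helper names.

```python
from typing import List, Tuple


def vit_token_count(image_size: int, patch_size: int, channels: int = 3,
                    class_token: bool = True) -> Tuple[int, int, int]:
    """Return (num_patches, values_per_patch, sequence_length) for a square input."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    per_side = image_size // patch_size                 # patches along each edge
    num_patches = per_side * per_side                   # non-overlapping grid
    patch_dim = patch_size * patch_size * channels      # flattened length before the linear projection
    seq_len = num_patches + (1 if class_token else 0)   # optional [CLS]-style token
    return num_patches, patch_dim, seq_len


def patchify(image: List[List[int]], patch_size: int) -> List[List[int]]:
    """Split a single-channel H x W grid into flattened, non-overlapping patches (row-major)."""
    h, w = len(image), len(image[0])
    if h % patch_size or w % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")
    patches = []
    for top in range(0, h, patch_size):
        for left in range(0, w, patch_size):
            # Collect the patch_size x patch_size block starting at (top, left).
            patches.append([v for row in image[top:top + patch_size]
                            for v in row[left:left + patch_size]])
    return patches
```

For the study's settings, `vit_token_count(384, 16)` gives 24 × 24 = 576 patches of 16 × 16 × 3 = 768 values each, or 577 tokens once the class token is added. Global self-attention over all of these tokens is what lets a ViT relate distant regions of a tile, the "global features" the abstract contrasts with CNNs.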
20.
J Med Internet Res ; 26: e53968, 2024 May 20.
Article in English | MEDLINE | ID: mdl-38767953

ABSTRACT

BACKGROUND: In 2023, the United States experienced its highest recorded number of suicides, exceeding 50,000 deaths. Among psychiatric disorders, major depressive disorder stands out as the most common, affecting 15% to 17% of the population and carrying a notable suicide risk of approximately 15%. However, not everyone with depression has suicidal thoughts. While "suicidal depression" is not a clinical diagnosis, it may be observed in daily life, which underscores the need for awareness. OBJECTIVE: This study aims to examine the dynamics, emotional tones, and topics of posts in the r/Depression subreddit, focusing specifically on users who also engaged in the r/SuicideWatch community. The objective was to use natural language processing techniques and models to better understand the complexities of depression among users with potential suicidal ideation, with the goal of improving suicide intervention and prevention strategies. METHODS: Archived English-language posts spanning 2019 to 2022 were extracted from the r/Depression and r/SuicideWatch Reddit communities, yielding a final data set of over 150,000 posts contributed by approximately 25,000 unique overlapping users. A broad mix of methods, including trend and survival analysis, was applied to these posts to explore the dynamics of users across the 2 subreddits. BERT-family models were used to extract features for sentiment and thematic analysis. RESULTS: On August 16, 2020, the post count in r/SuicideWatch surpassed that of r/Depression. The transition from r/Depression to r/SuicideWatch was shortest in 2020, lasting only 26 days. Sadness emerged as the most prevalent emotion among overlapping users in the r/Depression community. In addition, physical activity changes, negative self-view, and suicidal thoughts were identified as the most common depression symptoms, all showing strong positive correlations with the emotional tone of disappointment.
Furthermore, the topic "struggles with depression and motivation in school and work" (12%) emerged as the most discussed topic aside from suicidal thoughts, with users categorized by their inclination toward suicidal ideation. CONCLUSIONS: Our study underscores the effectiveness of natural language processing techniques for exploring language markers and patterns associated with mental health challenges in online communities such as r/Depression and r/SuicideWatch. These insights offer novel perspectives distinct from previous research. Future work could further refine and optimize machine classification using these techniques, which could lead to more effective intervention and prevention strategies.


Subjects
COVID-19 , Suicidal Ideation , Humans , COVID-19/psychology , COVID-19/epidemiology , Natural Language Processing , Depression/psychology , Pandemics , United States , Social Media , Suicide/psychology , Suicide/statistics & numerical data , Depressive Disorder, Major/psychology , SARS-CoV-2
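The transition durations reported in RESULTS depend on how a user's move between communities is measured. One plausible reading, which is an assumption because the abstract does not define it, is the number of days between a user's first post in r/Depression and their first post in r/SuicideWatch. The sketch below computes that quantity from (user, subreddit, date) records; the record layout and the helper names `first_post_dates` and `transition_days` are hypothetical, and any sample data used with it would be synthetic rather than the study's.

```python
from datetime import date
from typing import Dict, List, Tuple

# Hypothetical record layout: (user_id, subreddit_name, post_date)
Post = Tuple[str, str, date]


def first_post_dates(posts: List[Post]) -> Dict[Tuple[str, str], date]:
    """Earliest post date for every (user, subreddit) pair."""
    first: Dict[Tuple[str, str], date] = {}
    for user, sub, day in posts:
        key = (user, sub)
        if key not in first or day < first[key]:
            first[key] = day
    return first


def transition_days(posts: List[Post], src: str = "Depression",
                    dst: str = "SuicideWatch") -> Dict[str, int]:
    """Days from a user's first post in `src` to their first post in `dst`.

    Assumed definition: only users active in both communities whose first
    `dst` post falls on or after their first `src` post are included.
    """
    first = first_post_dates(posts)
    result: Dict[str, int] = {}
    for (user, sub), start in first.items():
        if sub != src:
            continue
        end = first.get((user, dst))
        # Skip users never seen in dst, or who posted there before src.
        if end is not None and end >= start:
            result[user] = (end - start).days
    return result
```

Grouping these per-user values by the year of the first `src` post would give a yearly comparison of transition times. A survival analysis, as named in METHODS, would additionally treat users who never reach `dst` as censored rather than dropping them, as this sketch does.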