Search | Biblioteca Virtual en Salud

1.

Accurate staging of chick embryonic tissues via deep learning of salient features.

Groves, Ian; Holmshaw, Jacob; Furley, David; Manning, Elizabeth; Chinnaiya, Kavitha; Towers, Matthew; Evans, Benjamin D; Placzek, Marysia; Fletcher, Alexander G.

Development ; 150(22), 2023 Nov 15.

Article in English | MEDLINE | ID: mdl-37830145

ABSTRACT

Recent work shows that the developmental potential of progenitor cells in the HH10 chick brain changes rapidly, accompanied by subtle changes in morphology. This demands increased temporal resolution for studies of the brain at this stage, necessitating precise and unbiased staging. Here, we investigated whether we could train a deep convolutional neural network to sub-stage HH10 chick brains using a small dataset of 151 expertly labelled images. By augmenting our images with biologically informed transformations and data-driven preprocessing steps, we successfully trained a classifier to sub-stage HH10 brains to 87.1% test accuracy. To determine whether our classifier could be generally applied, we re-trained it using images (269) of randomised control and experimental chick wings, and obtained similarly high test accuracy (86.1%). Saliency analyses revealed that biologically relevant features are used for classification. Our strategy enables training of image classifiers for various applications in developmental biology with limited microscopy data.

Subject(s)

Deep Learning , Animals , Neural Networks, Computer , Brain , Microscopy , Wings, Animal
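
The abstract for record 1 mentions augmenting a small image set with label-preserving transformations. The sketch below is only an illustration of that general idea, assuming that horizontal flips and 90-degree rotations preserve the label; the abstract does not say which transformations the authors used, and the function names here are hypothetical.

```python
import random

def flip_horizontal(image):
    """Mirror each row of a 2D image given as a list of rows."""
    return [row[::-1] for row in image]

def rotate_90(image):
    """Rotate a 2D image 90 degrees clockwise."""
    return [list(col) for col in zip(*image[::-1])]

def augment(image, rng):
    """Return one randomly flipped and rotated copy of the image.

    Assumption: flips and quarter-turn rotations do not change the label.
    """
    out = [list(row) for row in image]
    if rng.random() < 0.5:
        out = flip_horizontal(out)
    for _ in range(rng.randrange(4)):  # 0 to 3 quarter turns
        out = rotate_90(out)
    return out

if __name__ == "__main__":
    tiny = [[1, 2], [3, 4]]
    print(augment(tiny, random.Random(0)))
```

Each call to `augment` yields one extra training example with the same pixel values in a new arrangement, which is how a dataset of a few hundred images can be expanded before training.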

2.

Mix-Key: graph mixup with key structures for molecular property prediction.

Jiang, Tianyi; Wang, Zeyu; Yu, Wenchao; Wang, Jinhuan; Yu, Shanqing; Bao, Xiaoze; Wei, Bin; Xuan, Qi.

Brief Bioinform ; 25(3), 2024 Mar 27.

Article in English | MEDLINE | ID: mdl-38706318

ABSTRACT

Molecular property prediction faces the challenge of limited labeled data as it necessitates a series of specialized experiments to annotate target molecules. Data augmentation techniques can effectively address the issue of data scarcity. In recent years, Mixup has achieved significant success in traditional domains such as image processing. However, its application in molecular property prediction is relatively limited due to the irregular, non-Euclidean nature of graphs and the fact that minor variations in molecular structures can lead to alterations in their properties. To address these challenges, we propose a novel data augmentation method called Mix-Key tailored for molecular property prediction. Mix-Key aims to capture crucial features of molecular graphs, focusing separately on the molecular scaffolds and functional groups. By generating isomers that are relatively invariant to the scaffolds or functional groups, we effectively preserve the core information of molecules. Additionally, to capture interactive information between the scaffolds and functional groups while ensuring correlation between the original and augmented graphs, we introduce molecular fingerprint similarity and node similarity. Through these steps, Mix-Key determines the mixup ratio between the original graph and two isomers, thus generating more informative augmented molecular graphs. We extensively validate our approach on molecular datasets of different scales with several Graph Neural Network architectures. The results demonstrate that Mix-Key consistently outperforms other data augmentation methods in enhancing molecular property prediction on several datasets.

Subject(s)

Algorithms , Molecular Structure , Computational Biology/methods , Software
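
Record 2 builds on Mixup and on molecular fingerprint similarity. As a rough illustration, the sketch below shows plain Mixup on feature vectors together with a Tanimoto similarity between fingerprints. It is not the Mix-Key algorithm: how Mix-Key turns similarities into a mixup ratio is not described in the abstract, so the ratio `lam` is simply passed in, and all names are hypothetical.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    union = len(bits_a | bits_b)
    return len(bits_a & bits_b) / union if union else 1.0

def mixup(x_a, x_b, y_a, y_b, lam):
    """Plain Mixup: convex combination of two examples and of their labels.

    lam is the mixing ratio in [0, 1]; choosing it is left to the caller.
    """
    x = [lam * a + (1 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y

if __name__ == "__main__":
    print(tanimoto({1, 2, 3}, {2, 3, 4}))
    print(mixup([1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0], 0.5))
```

On graphs the combination step is harder than on vectors because nodes of two molecules have no natural alignment, which is the difficulty the abstract points to.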

3.

DaDL-SChlo: protein subchloroplast localization prediction based on generative adversarial networks and pre-trained protein language model.

Wang, Xiao; Han, Lijun; Wang, Rong; Chen, Haoran.

Brief Bioinform ; 24(3), 2023 May 19.

Article in English | MEDLINE | ID: mdl-36929854

ABSTRACT

The chloroplast is a crucial site for photosynthesis in plants. Determining the location and distribution of proteins in subchloroplasts is significant for studying the energy conversion of chloroplasts and regulating the utilization of light energy in crop production. However, the prediction accuracy of the currently developed protein subcellular site predictors is still limited due to the complex protein sequence features and the scarcity of labeled samples. We propose DaDL-SChlo, a multi-location protein subchloroplast localization predictor, which addresses the above problems by fusing pre-trained protein language model deep learning features with traditional handcrafted features and using generative adversarial networks for data augmentation. The experimental results of cross-validation and independent testing show that DaDL-SChlo greatly improves the prediction performance for protein subchloroplast localization compared with the state-of-the-art predictors. Specifically, the overall actual accuracy outperforms the state-of-the-art predictors by 10.7% on 10-fold cross-validation and 12.6% on independent testing. DaDL-SChlo is a promising and efficient predictor for protein subchloroplast localization. The datasets and codes of DaDL-SChlo are available at https://github.com/xwanggroup/DaDL-SChlo.

Subject(s)

Chloroplasts , Language , Protein Transport , Chloroplasts/metabolism , Research Design

4.

A prediction model for blood-brain barrier penetrating peptides based on masked peptide transformers with dynamic routing.

Ma, Chunwei; Wolfinger, Russ.

Brief Bioinform ; 24(6), 2023 Sep 22.

Article in English | MEDLINE | ID: mdl-37985456

ABSTRACT

Blood-brain barrier penetrating peptides (BBBPs) are short peptide sequences that possess the ability to traverse the selective blood-brain interface, making them valuable drug candidates or carriers for various payloads. However, the in vivo or in vitro validation of BBBPs is resource-intensive and time-consuming, driving the need for accurate in silico prediction methods. Unfortunately, the scarcity of experimentally validated BBBPs hinders the efficacy of current machine-learning approaches in generating reliable predictions. In this paper, we present DeepB3P3, a novel framework for BBBPs prediction. Our contribution encompasses four key aspects. Firstly, we propose a novel deep learning model consisting of a transformer encoder layer, a convolutional network backbone, and a capsule network classification head. This integrated architecture effectively learns representative features from peptide sequences. Secondly, we introduce masked peptides as a powerful data augmentation technique to compensate for small training set sizes in BBBP prediction. Thirdly, we develop a novel threshold-tuning method to handle imbalanced data by approximating the optimal decision threshold using the training set. Lastly, DeepB3P3 provides an accurate estimation of the uncertainty level associated with each prediction. Through extensive experiments, we demonstrate that DeepB3P3 achieves state-of-the-art accuracy of up to 98.31% on a benchmarking dataset, solidifying its potential as a promising computational tool for the prediction and discovery of BBBPs.

Subject(s)

Blood-Brain Barrier , Peptides , Machine Learning , Amino Acid Sequence , Computational Biology/methods
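
Record 4 describes masked peptides as a data augmentation technique. A minimal sketch of that idea follows, assuming each residue is independently replaced by a placeholder token with a fixed probability. The masking rate, the token `X`, and the function names are illustrative assumptions and are not taken from DeepB3P3.

```python
import random

def mask_peptide(sequence, mask_rate, rng, mask_token="X"):
    """Replace each residue by mask_token with probability mask_rate."""
    return "".join(
        mask_token if rng.random() < mask_rate else residue
        for residue in sequence
    )

def augment_peptides(sequences, copies, mask_rate, seed=0):
    """Create `copies` masked variants of every input peptide."""
    rng = random.Random(seed)
    return [
        mask_peptide(seq, mask_rate, rng)
        for seq in sequences
        for _ in range(copies)
    ]

if __name__ == "__main__":
    # "ACDEFGHIK" is an arbitrary example string, not a validated peptide.
    for variant in augment_peptides(["ACDEFGHIK"], copies=3, mask_rate=0.2):
        print(variant)
```

Every masked variant keeps the label of its parent peptide, so a small positive set yields several training examples per sequence.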

5.

Prediction of blood-brain barrier penetrating peptides based on data augmentation with Augur.

Gu, Zhi-Feng; Hao, Yu-Duo; Wang, Tian-Yu; Cai, Pei-Ling; Zhang, Yang; Deng, Ke-Jun; Lin, Hao; Lv, Hao.

BMC Biol ; 22(1): 86, 2024 Apr 19.

Article in English | MEDLINE | ID: mdl-38637801

ABSTRACT

BACKGROUND: The blood-brain barrier serves as a critical interface between the bloodstream and brain tissue, mainly composed of pericytes, neurons, endothelial cells, and tightly connected basal membranes. It plays a pivotal role in safeguarding the brain from harmful substances, thus protecting the integrity of the nervous system and preserving overall brain homeostasis. However, this remarkable selective transmission also poses a formidable challenge in the treatment of central nervous system diseases, hindering the delivery of large-molecule drugs into the brain. In response to this challenge, many researchers have devoted themselves to developing drug delivery systems capable of breaching the blood-brain barrier. Among these, blood-brain barrier penetrating peptides have emerged as promising candidates. These peptides have the advantages of high biosafety, ease of synthesis, and exceptional penetration efficiency, making them an effective drug delivery solution. While previous studies have developed a few prediction models for blood-brain barrier penetrating peptides, their performance has often been hampered by the issue of limited positive data. RESULTS: In this study, we present Augur, a novel prediction model using borderline-SMOTE-based data augmentation and machine learning. We extract highly interpretable physicochemical properties of blood-brain barrier penetrating peptides while solving the issues of small sample size and imbalance of positive and negative samples. Experimental results demonstrate the superior prediction performance of Augur with an AUC value of 0.932 on the training set and 0.931 on the independent test set. CONCLUSIONS: This newly developed Augur model demonstrates superior performance in predicting blood-brain barrier penetrating peptides, offering valuable insights for drug development targeting neurological disorders. This breakthrough may enhance the efficiency of peptide-based drug discovery and pave the way for innovative treatment strategies for central nervous system diseases.

Subject(s)

Cell-Penetrating Peptides , Central Nervous System Diseases , Humans , Blood-Brain Barrier/chemistry , Endothelial Cells , Cell-Penetrating Peptides/chemistry , Cell-Penetrating Peptides/pharmacology , Cell-Penetrating Peptides/therapeutic use , Brain , Central Nervous System Diseases/drug therapy
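
Record 5 relies on borderline-SMOTE to enlarge the positive class. The sketch below shows only the interpolation step shared by the SMOTE family: a synthetic sample is placed on the segment between a minority sample and one of its nearest minority neighbours. Borderline-SMOTE additionally restricts the seed samples to minority points near the class boundary; that selection step is omitted here, and the function name and parameters are hypothetical.

```python
import random

def smote_samples(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points by interpolating minority samples.

    minority: list of feature vectors (lists of floats), at least two.
    k: number of nearest minority neighbours considered per seed point.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

if __name__ == "__main__":
    positives = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    for point in smote_samples(positives, n_new=3):
        print(point)
```

Because new points lie between existing minority samples, the class balance improves without duplicating records exactly.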

6.

Assessing the reliability of point mutation as data augmentation for deep learning with genomic data.

Lee, Hyunjung; Ozbulak, Utku; Park, Homin; Depuydt, Stephen; De Neve, Wesley; Vankerschaver, Joris.

BMC Bioinformatics ; 25(1): 170, 2024 Apr 30.

Article in English | MEDLINE | ID: mdl-38689247

ABSTRACT

BACKGROUND: Deep neural networks (DNNs) have the potential to revolutionize our understanding and treatment of genetic diseases. An inherent limitation of deep neural networks, however, is their high demand for data during training. To overcome this challenge, other fields, such as computer vision, use various data augmentation techniques to artificially increase the available training data for DNNs. Unfortunately, most data augmentation techniques used in other domains do not transfer well to genomic data. RESULTS: Most genomic data possesses peculiar properties and data augmentations may significantly alter the intrinsic properties of the data. In this work, we propose a novel data augmentation technique for genomic data inspired by biology: point mutations. By employing point mutations as substitutes for codons, we demonstrate that our newly proposed data augmentation technique enhances the performance of DNNs across various genomic tasks that involve coding regions, such as translation initiation and splice site detection. CONCLUSION: Silent and missense mutations are found to positively influence effectiveness, while nonsense mutations and random mutations in non-coding regions generally lead to degradation. Overall, point mutation-based augmentations in genomic datasets present valuable opportunities for improving the accuracy and reliability of predictive models for DNA sequences.

Subject(s)

Deep Learning , Genomics , Point Mutation , Genomics/methods , Humans , Reproducibility of Results , Neural Networks, Computer
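
Record 6 reports that silent mutations in coding regions work well as augmentation. A minimal sketch of that idea follows, assuming the standard genetic code and an in-frame coding sequence: each codon may be swapped for a synonymous codon that differs by one base. The mutation rate and function names are illustrative assumptions, not the authors' settings.

```python
import random
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(cds):
    """Translate an in-frame DNA coding sequence into amino acids."""
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))

def silent_point_mutations(codon):
    """List codons one base away that encode the same amino acid."""
    options = []
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                candidate = codon[:pos] + base + codon[pos + 1:]
                if CODON_TABLE[candidate] == CODON_TABLE[codon]:
                    options.append(candidate)
    return options

def augment_cds(cds, rate, rng):
    """Apply silent point mutations to each codon with probability `rate`."""
    mutated = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        options = silent_point_mutations(codon) if len(codon) == 3 else []
        if options and rng.random() < rate:
            codon = rng.choice(options)
        mutated.append(codon)
    return "".join(mutated)

if __name__ == "__main__":
    original = "ATGGCTGGATAA"
    variant = augment_cds(original, rate=0.5, rng=random.Random(0))
    print(original, variant, translate(variant))
```

The augmented sequence differs at the nucleotide level but translates to the same protein, so the label of the original example can be kept.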

7.

DeepAEG: a model for predicting cancer drug response based on data enhancement and edge-collaborative update strategies.

Lao, Chuanqi; Zheng, Pengfei; Chen, Hongyang; Liu, Qiao; An, Feng; Li, Zhao.

BMC Bioinformatics ; 25(1): 105, 2024 Mar 09.

Article in English | MEDLINE | ID: mdl-38461284

ABSTRACT

MOTIVATION: The prediction of cancer drug response is a challenging subject in modern personalized cancer therapy due to the uncertainty of drug efficacy and the heterogeneity of patients. It has been shown that the characteristics of the drug itself and the genomic characteristics of the patient can greatly influence the results of cancer drug response. Therefore, accurate, efficient, and comprehensive methods for drug feature extraction and genomics integration are crucial to improve the prediction accuracy. RESULTS: Accurate prediction of cancer drug response is vital for guiding the design of anticancer drugs. In this study, we propose an end-to-end deep learning model named DeepAEG which is based on a complete-graph update mode to predict IC50. Specifically, we integrate an edge update mechanism on the basis of a hybrid graph convolutional network to comprehensively learn the potential high-dimensional representation of topological structures in drugs, including atomic characteristics and chemical bond information. Additionally, we present a novel approach for enhancing simplified molecular input line entry specification data by employing sequence recombination to eliminate the defect of single sequence representation of drug molecules. Our extensive experiments show that DeepAEG outperforms other existing methods across multiple evaluation parameters in multiple test sets. Furthermore, we identify several potential anticancer agents, including bortezomib, which has proven to be an effective clinical treatment option. Our results highlight the potential value of DeepAEG in guiding the design of specific cancer treatment regimens.

Subject(s)

Antineoplastic Agents , Neoplasms , Humans , Antineoplastic Agents/pharmacology , Antineoplastic Agents/therapeutic use , Neoplasms/drug therapy , Neoplasms/genetics , Bortezomib , Genomics , Uncertainty

8.

MFNet: Meta-learning based on frequency-space mix for MRI segmentation in nasopharyngeal carcinoma.

Li, Yin; Chen, Qi; Li, Hao; Wang, Song; Chen, Nutan; Han, Ting; Wang, Kai; Yu, Qingqing; Cao, Zhantao; Tang, Jun.

J Cell Mol Med ; 28(9): e18355, 2024 May.

Article in English | MEDLINE | ID: mdl-38685683

ABSTRACT

Deep learning techniques have been applied to medical image segmentation and demonstrated expert-level performance. Due to the poor generalization abilities of the models in the deployment in different centres, common solutions, such as transfer learning and domain adaptation techniques, have been proposed to mitigate this issue. However, these solutions necessitate retraining the models with target domain data and annotations, which limits their deployment in clinical settings in unseen domains. We evaluated the performance of domain generalization methods on the task of MRI segmentation of nasopharyngeal carcinoma (NPC) by collecting a new dataset of 321 patients with manually annotated MRIs from two hospitals. We transformed the modalities of MRI, including T1WI, T2WI and CE-T1WI, from the spatial domain to the frequency domain using Fourier transform. To address the bottleneck of domain generalization in MRI segmentation of NPC, we propose a meta-learning approach based on frequency domain feature mixing. We evaluated the performance of MFNet against existing techniques for generalizing NPC segmentation in terms of Dice and MIoU. Our method evidently outperforms the baseline in handling the generalization of NPC segmentation. MFNet clearly demonstrates its effectiveness for generalizing NPC MRI segmentation to unseen domains (Dice = 67.59%, MIoU = 75.74% T1WI). MFNet enhances the model's generalization capabilities by incorporating mixed-feature meta-learning. Our approach offers a novel perspective to tackle the domain generalization problem in the field of medical imaging by effectively exploiting the unique characteristics of medical images.

Subject(s)

Magnetic Resonance Imaging , Nasopharyngeal Carcinoma , Nasopharyngeal Neoplasms , Humans , Magnetic Resonance Imaging/methods , Nasopharyngeal Carcinoma/diagnostic imaging , Nasopharyngeal Neoplasms/diagnostic imaging , Deep Learning , Image Processing, Computer-Assisted/methods , Female , Male , Algorithms
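
Record 8 mixes features in the frequency domain after a Fourier transform. The sketch below illustrates one common form of frequency-domain mixing, assumed here for illustration only: the amplitude spectra of two signals are blended while the phase of the first is kept. It works on 1D lists with a naive DFT to stay dependency-free; the abstract does not specify MFNet's actual mixing rule, and all names are hypothetical.

```python
import cmath

def dft(signal):
    """Naive discrete Fourier transform of a real or complex sequence."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def idft(spectrum):
    """Inverse DFT, returning the real part of each sample."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]

def amplitude_mix(source, target, lam):
    """Blend amplitude spectra (weight lam on target), keep the source phase."""
    mixed = []
    for a, b in zip(dft(source), dft(target)):
        amplitude = (1 - lam) * abs(a) + lam * abs(b)
        mixed.append(cmath.rect(amplitude, cmath.phase(a)))
    return idft(mixed)

if __name__ == "__main__":
    print(amplitude_mix([1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0], 0.5))
```

With `lam = 0` the source signal is recovered unchanged, and larger values move its amplitude spectrum toward that of the target, which in imaging is often associated with scanner or site appearance rather than anatomy.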

9.

A novel machine learning-based screening identifies statins as inhibitors of the calcium pump SERCA.

Cruz-Cortés, Carlos; Velasco-Saavedra, M Andrés; Fernández-de Gortari, Eli; Guerrero-Serna, Guadalupe; Aguayo-Ortiz, Rodrigo; Espinoza-Fonseca, L Michel.

J Biol Chem ; 299(5): 104681, 2023 May.

Article in English | MEDLINE | ID: mdl-37030504

ABSTRACT

We report a novel small-molecule screening approach that combines data augmentation and machine learning to identify Food and Drug Administration (FDA)-approved drugs interacting with the calcium pump (Sarcoplasmic reticulum Ca2+-ATPase, SERCA) from skeletal (SERCA1a) and cardiac (SERCA2a) muscle. This approach uses information about small-molecule effectors to map and probe the chemical space of pharmacological targets, thus allowing large databases of small molecules, including approved and investigational drugs, to be screened with high precision. We chose SERCA because it plays a major role in the excitation-contraction-relaxation cycle in muscle and it represents a major target in both skeletal and cardiac muscle. The machine learning model predicted that SERCA1a and SERCA2a are pharmacological targets for seven statins, a group of FDA-approved 3-hydroxy-3-methylglutaryl coenzyme A reductase inhibitors used in the clinic as lipid-lowering medications. We validated the machine learning predictions by using in vitro ATPase assays to show that several FDA-approved statins are partial inhibitors of SERCA1a and SERCA2a. Complementary atomistic simulations predict that these drugs bind to two different allosteric sites of the pump. Our findings suggest that SERCA-mediated Ca2+ transport may be targeted by some statins (e.g., atorvastatin), thus providing a molecular pathway to explain statin-associated toxicity reported in the literature. These studies show the applicability of data augmentation and machine learning-based screening as a general platform for the identification of off-target interactions, and this approach also extends to drug discovery.

Subject(s)

Hydroxymethylglutaryl-CoA Reductase Inhibitors , Sarcoplasmic Reticulum Calcium-Transporting ATPases , Hydroxymethylglutaryl-CoA Reductase Inhibitors/pharmacology , Hydroxymethylglutaryl-CoA Reductase Inhibitors/metabolism , Myocardium/enzymology , Sarcoplasmic Reticulum Calcium-Transporting ATPases/antagonists & inhibitors , Machine Learning

10.

Flattening the curve: How to get better results with small deep-mutational-scanning datasets.

Wirnsberger, Gregor; Pritisanac, Iva; Oberdorfer, Gustav; Gruber, Karl.

Proteins ; 92(7): 886-902, 2024 Jul.

Article in English | MEDLINE | ID: mdl-38501649

ABSTRACT

Proteins are used in various biotechnological applications, often requiring the optimization of protein properties by introducing specific amino-acid exchanges. Deep mutational scanning (DMS) is an effective high-throughput method for evaluating the effects of these exchanges on protein function. DMS data can then inform the training of a neural network to predict the impact of mutations. Most approaches use some representation of the protein sequence for training and prediction. As proteins are characterized by complex structures and intricate residue interaction networks, directly providing structural information as input reduces the need to learn these features from the data. We introduce a method for encoding protein structures as stacked 2D contact maps, which capture residue interactions, their evolutionary conservation, and mutation-induced interaction changes. Furthermore, we explored techniques to augment neural network training performance on smaller DMS datasets. To validate our approach, we trained three neural network architectures originally used for image analysis on three DMS datasets, and we compared their performances with networks trained solely on protein sequences. The results confirm the effectiveness of the protein structure encoding in machine learning efforts on DMS data. Using structural representations as direct input to the networks, along with data augmentation and pretraining, significantly reduced demands on training data size and improved prediction performance, especially on smaller datasets, while performance on large datasets was on par with state-of-the-art sequence convolutional neural networks. The methods presented here have the potential to provide the same workflow as DMS without the experimental and financial burden of testing thousands of mutants. 
Additionally, we present an open-source, user-friendly software tool to make these data analysis techniques accessible, particularly to biotechnology and protein engineering researchers who wish to apply them to their mutagenesis data.

Subject(s)

Neural Networks, Computer , Proteins , Proteins/chemistry , Proteins/genetics , Proteins/metabolism , Mutation , Databases, Protein , Computational Biology/methods , Deep Learning , Algorithms , Protein Conformation , Software , Machine Learning , Humans

11.

A Bayesian multivariate factor analysis model for causal inference using time-series observational data on mixed outcomes.

Samartsidis, Pantelis; Seaman, Shaun R; Harrison, Abbie; Alexopoulos, Angelos; Hughes, Gareth J; Rawlinson, Christopher; Anderson, Charlotte; Charlett, André; Oliver, Isabel; De Angelis, Daniela.

Biostatistics ; 2023 Dec 06.

Article in English | MEDLINE | ID: mdl-38058013

ABSTRACT

Assessing the impact of an intervention by using time-series observational data on multiple units and outcomes is a frequent problem in many fields of scientific research. Here, we propose a novel Bayesian multivariate factor analysis model for estimating intervention effects in such settings and develop an efficient Markov chain Monte Carlo algorithm to sample from the high-dimensional and nontractable posterior of interest. The proposed method is one of the few that can simultaneously deal with outcomes of mixed type (continuous, binomial, count), increase efficiency in the estimates of the causal effects by jointly modeling multiple outcomes affected by the intervention, and easily provide uncertainty quantification for all causal estimands of interest. Using the proposed approach, we evaluate the impact that Local Tracing Partnerships had on the effectiveness of England's Test and Trace programme for COVID-19.

12.

Cross-site validation of lung cancer diagnosis by electronic nose with deep learning: a multicenter prospective study.

Lee, Meng-Rui; Kao, Mu-Hsiang; Hsieh, Ya-Chu; Sun, Min; Tang, Kea-Tiong; Wang, Jann-Yuan; Ho, Chao-Chi; Shih, Jin-Yuan; Yu, Chong-Jen.

Respir Res ; 25(1): 203, 2024 May 10.

Article in English | MEDLINE | ID: mdl-38730430

ABSTRACT

BACKGROUND: Although electronic nose (eNose) has been intensively investigated for diagnosing lung cancer, cross-site validation remains a major obstacle to be overcome and no studies have yet been performed. METHODS: Patients with lung cancer, as well as healthy control and diseased control groups, were prospectively recruited from two referral centers between 2019 and 2022. Deep learning models for detecting lung cancer with eNose breathprint were developed using a training cohort from one site and then tested on a cohort from the other site. Semi-Supervised Domain-Generalized (Semi-DG) Augmentation (SDA) and Noise-Shift Augmentation (NSA) methods with or without fine-tuning were applied to improve performance. RESULTS: In this study, 231 participants were enrolled, comprising a training/validation cohort of 168 individuals (90 with lung cancer, 16 healthy controls, and 62 diseased controls) and a test cohort of 63 individuals (28 with lung cancer, 10 healthy controls, and 25 diseased controls). The model had satisfactory results in the validation cohort from the same hospital, whereas directly applying the trained model to the test cohort yielded suboptimal results (AUC, 0.61, 95% CI: 0.47–0.76). The performance improved after applying data augmentation methods in the training cohort (SDA, AUC: 0.89 [0.81–0.97]; NSA, AUC: 0.90 [0.89–1.00]). Additionally, after applying fine-tuning methods, the performance further improved (SDA plus fine-tuning, AUC: 0.95 [0.89–1.00]; NSA plus fine-tuning, AUC: 0.95 [0.90–1.00]). CONCLUSION: Our study revealed that deep learning models developed for eNose breathprint can achieve cross-site validation with data augmentation and fine-tuning. Accordingly, eNose breathprints emerge as a convenient, non-invasive, and potentially generalizable solution for lung cancer detection. CLINICAL TRIAL REGISTRATION: This study is not a clinical trial and was therefore not registered.

Subject(s)

Deep Learning , Electronic Nose , Lung Neoplasms , Adult , Aged , Female , Humans , Male , Middle Aged , Breath Tests/methods , Lung Neoplasms/diagnosis , Prospective Studies , Reproducibility of Results

13.

Development of generic metabolic Raman calibration models using solution titration in aqueous phase and data augmentation for in-line cell culture analysis.

Zhang, Zhijun; Lang, Zhe; Chen, Gong; Zhou, Hang; Zhou, Weichang.

Biotechnol Bioeng ; 121(7): 2193-2204, 2024 Jul.

Article in English | MEDLINE | ID: mdl-38639160

ABSTRACT

This study presents a novel approach for developing generic metabolic Raman calibration models for in-line cell culture analysis using glucose and lactate stock solution titration in an aqueous phase and data augmentation techniques. First, a successful set-up of the titration method was achieved by adding glucose or lactate solution at several different constant rates into the aqueous phase of a bench-top bioreactor. Subsequently, the in-line glucose and lactate concentrations were calculated and interpolated based on the rate of glucose and lactate addition, enabling data augmentation and enhancing the robustness of the metabolic calibration model. Nine different combinations of spectra pretreatment, wavenumber range selection, and number of latent variables were evaluated and optimized using aqueous titration data as training set and a historical cell culture data set as validation and prediction set. Finally, Raman spectroscopy data collected from 11 historical cell culture batches (spanning four culture modes and scales ranging from 3 to 200 L) were utilized to predict the corresponding glucose and lactate values. The results demonstrated a high prediction accuracy, with average root mean square errors of prediction of 0.65 g/L for glucose and 0.48 g/L for lactate. This innovative method establishes a generic metabolic calibration model, and its applicability can be extended to other metabolites, reducing the cost of deploying real-time cell culture monitoring using Raman spectroscopy in bioprocesses.

Subject(s)

Cell Culture Techniques , Glucose , Lactic Acid , Spectrum Analysis, Raman , Spectrum Analysis, Raman/methods , Glucose/metabolism , Lactic Acid/metabolism , Lactic Acid/analysis , Calibration , Cell Culture Techniques/methods , Bioreactors , Models, Biological , CHO Cells , Cricetulus , Culture Media/chemistry , Animals
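
Record 13 calculates in-line concentrations from the rate at which stock solution is added. One way to do this is a simple mass balance, sketched below under stated assumptions: perfect mixing, a constant volumetric feed rate, and no consumption in the cell-free aqueous phase. The abstract does not give the authors' formula, and the parameter names and numbers are hypothetical.

```python
def concentration(t_min, rate_ml_per_min, stock_g_per_l,
                  initial_volume_l, initial_conc_g_per_l=0.0):
    """Analyte concentration (g/L) after t_min minutes of constant feeding.

    Mass balance: total mass of analyte divided by total liquid volume.
    """
    added_l = rate_ml_per_min * t_min / 1000.0
    mass_g = initial_conc_g_per_l * initial_volume_l + stock_g_per_l * added_l
    return mass_g / (initial_volume_l + added_l)

def label_spectra(timestamps_min, **titration):
    """Pair each spectrum timestamp with its calculated concentration."""
    return [(t, concentration(t, **titration)) for t in timestamps_min]

if __name__ == "__main__":
    # Hypothetical run: 400 g/L stock fed at 10 mL/min into 1 L of water.
    for t, c in label_spectra([0, 50, 100], rate_ml_per_min=10,
                              stock_g_per_l=400, initial_volume_l=1.0):
        print(t, round(c, 2))
```

Because every spectrum acquired during the titration receives a calculated reference value, one titration run yields many labelled spectra without off-line sampling.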

14.

Inferring HIV transmission patterns from viral deep-sequence data via latent typed point processes.

Bu, Fan; Kagaayi, Joseph; Grabowski, Mary Kate; Ratmann, Oliver; Xu, Jason.

Biometrics ; 80(1), 2024 Jan 29.

Article in English | MEDLINE | ID: mdl-38372402

ABSTRACT

Viral deep-sequencing data play a crucial role toward understanding disease transmission network flows, providing higher resolution compared to standard Sanger sequencing. To more fully utilize these rich data and account for the uncertainties in outcomes from phylogenetic analyses, we propose a spatial Poisson process model to uncover human immunodeficiency virus (HIV) transmission flow patterns at the population level. We represent pairings of individuals with viral sequence data as typed points, with coordinates representing covariates such as gender and age and point types representing the unobserved transmission statuses (linkage and direction). Points are associated with observed scores on the strength of evidence for each transmission status that are obtained through standard deep-sequence phylogenetic analysis. Our method is able to jointly infer the latent transmission statuses for all pairings and the transmission flow surface on the source-recipient covariate space. In contrast to existing methods, our framework does not require preclassification of the transmission statuses of data points, and instead learns them probabilistically through a fully Bayesian inference scheme. By directly modeling continuous spatial processes with smooth densities, our method enjoys significant computational advantages compared to previous methods that rely on discretization of the covariate space. We demonstrate that our framework can capture age structures in HIV transmission at high resolution, bringing valuable insights in a case study on viral deep-sequencing data from Southern Uganda.

Subject(s)

HIV Infections , HIV-1 , Humans , HIV Infections/epidemiology , Phylogeny , Bayes Theorem

15.

A discrete approximation method for modeling interval-censored multistate data.

You, Lu; Liu, Xiang; Krischer, Jeffrey.

Stat Med ; 43(12): 2452-2471, 2024 May 30.

Article in English | MEDLINE | ID: mdl-38599784

ABSTRACT

Many longitudinal studies are designed to monitor participants for major events related to the progression of diseases. Data arising from such longitudinal studies are usually subject to interval censoring since the events are only known to occur between two monitoring visits. In this work, we propose a new method to handle interval-censored multistate data within a proportional hazards model framework where the hazard rate of events is modeled by a nonparametric function of time and the covariates affect the hazard rate proportionally. The main idea of this method is to simplify the likelihood functions of a discrete-time multistate model through an approximation and the application of data augmentation techniques, where the assumed presence of censored information facilitates a simpler parameterization. Then the expectation-maximization algorithm is used to estimate the parameters in the model. The performance of the proposed method is evaluated by numerical studies. Finally, the method is employed to analyze a dataset on tracking the advancement of coronary allograft vasculopathy following heart transplantation.

Subject(s)

Algorithms , Heart Transplantation , Proportional Hazards Models , Humans , Likelihood Functions , Heart Transplantation/statistics & numerical data , Longitudinal Studies , Computer Simulation , Models, Statistical , Data Interpretation, Statistical

16.

Deep neural networks for wearable sensor-based activity recognition in Parkinson's disease: investigating generalizability and model complexity.

Davidashvilly, Shelly; Cardei, Maria; Hssayeni, Murtadha; Chi, Christopher; Ghoraani, Behnaz.

Biomed Eng Online ; 23(1): 17, 2024 Feb 09.

Article in English | MEDLINE | ID: mdl-38336781

ABSTRACT

BACKGROUND: The research gap addressed in this study is the applicability of deep neural network (NN) models to wearable sensor data to recognize different activities performed by patients with Parkinson's Disease (PwPD) and the generalizability of these models to PwPD using labeled healthy data. METHODS: The experiments were carried out utilizing three datasets containing wearable motion sensor readings on common activities of daily living. The collected readings were from two accelerometer sensors. PAMAP2 and MHEALTH are publicly available datasets collected from 10 and 9 healthy, young subjects, respectively. A private dataset of a similar nature collected from 14 PwPD was utilized as well. Deep NN models were implemented with varying levels of complexity to investigate the impact of data augmentation, manual axis reorientation, model complexity, and domain adaptation on activity recognition performance. RESULTS: A moderately complex model trained on the augmented PAMAP2 dataset and adapted to the Parkinson domain using domain adaptation achieved the best activity recognition performance with an accuracy of 73.02%, which was significantly higher than the accuracy of 63% reported in previous studies. The model's F1 score of 49.79% was a significant improvement over the best cross-testing F1 score of 33.66% with only data augmentation and the F1 score of 2.88% without data augmentation or domain adaptation. CONCLUSION: These findings suggest that deep NN models trained on healthy data have the potential to recognize activities performed by PwPD accurately and that data augmentation and domain adaptation can improve the generalizability of models in the healthy-to-PwPD transfer scenario. The simple/moderately complex architectures tested in this study could generalize better to the PwPD domain when trained on a healthy dataset compared to the most complex architectures used. The findings of this study could contribute to the development of accurate wearable-based activity monitoring solutions for PwPD, improving clinical decision-making and patient outcomes based on patient activity levels.

Subject(s)

Parkinson Disease , Wearable Electronic Devices , Humans , Parkinson Disease/diagnosis , Activities of Daily Living , Neural Networks, Computer , Motion
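
The abstract for this entry names manual axis reorientation and data augmentation but does not say how either was implemented. The following is a minimal sketch of what such steps might look like for triaxial accelerometer windows. The function names `reorient` and `jitter`, the axis mapping, and the noise level are illustrative assumptions, not details taken from the study.

```python
import random


def reorient(sample, order=(0, 1, 2), signs=(1, 1, 1)):
    """Permute and flip the axes of one (x, y, z) accelerometer reading."""
    return tuple(signs[i] * sample[order[i]] for i in range(3))


def jitter(window, sigma=0.05, rng=None):
    """Return a copy of the window with Gaussian noise added to every axis."""
    rng = rng or random.Random()
    return [tuple(v + rng.gauss(0.0, sigma) for v in s) for s in window]


# A short window of (x, y, z) readings in m/s^2 (made-up values).
window = [(0.0, 9.8, 0.1), (0.1, 9.7, 0.2), (0.0, 9.9, 0.0)]

# Hypothetical mismatch between datasets: the target sensor's x axis is the
# source sensor's y axis, and its z axis is inverted.
aligned = [reorient(s, order=(1, 0, 2), signs=(1, 1, -1)) for s in window]

# Augment the aligned window with small random noise (seeded for repeatability).
augmented = jitter(aligned, sigma=0.05, rng=random.Random(0))
```

Reorientation here is a fixed permutation with sign flips, which is only adequate when sensors differ by axis-aligned mounting; arbitrary mounting angles would need a rotation matrix instead.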

17.

Examining the role of artificial intelligence to advance knowledge and address barriers to research in eating disorders.

Norris, Mark L; Obeid, Nicole; El-Emam, Khaled.

Int J Eat Disord ; 57(6): 1357-1368, 2024 Jun.

Article in English | MEDLINE | ID: mdl-38597344

ABSTRACT

OBJECTIVE: To provide a brief overview of artificial intelligence (AI) application within the field of eating disorders (EDs) and propose focused solutions for research. METHOD: An overview and summary of AI application pertinent to EDs is provided, with a focus on AI's ability to address issues relating to data sharing and pooling (and associated privacy concerns), data augmentation, and bias within datasets. RESULTS: In addition to clinical applications, AI offers useful tools to help combat commonly encountered challenges in ED research, including issues relating to low prevalence of specific subpopulations of patients, small overall sample sizes, and bias within datasets. DISCUSSION: There is tremendous potential to embed and utilize various facets of AI to help improve our understanding of EDs and further evaluate and investigate questions that ultimately seek to improve outcomes. Beyond the technology, regulation of AI, ethical guidelines for its application, and the trust of providers and patients are all needed for ultimate adoption and acceptance into ED practice. PUBLIC SIGNIFICANCE: AI offers significant potential within the realm of EDs and encompasses a broad set of techniques that offer utility in various facets of ED research and, by extension, delivery of clinical care. Beyond the technology, regulation, ethical guidelines for application, and the trust of providers and patients are needed for the ultimate adoption and acceptance of AI into ED practice.

Subject(s)

Artificial Intelligence , Feeding and Eating Disorders , Humans , Feeding and Eating Disorders/therapy , Biomedical Research

18.

Machine Learning Data Augmentation Strategy for Electron Energy Loss Spectroscopy: Generative Adversarial Networks.

Del-Pozo-Bueno, Daniel; Kepaptsoglou, Demie; Ramasse, Quentin M; Peiró, Francesca; Estradé, Sònia.

Microsc Microanal ; 30(2): 278-293, 2024 Apr 29.

Article in English | MEDLINE | ID: mdl-38684097

ABSTRACT

Recent advances in machine learning (ML) have highlighted a novel challenge concerning the quality and quantity of data required to effectively train algorithms in supervised ML procedures. This article introduces a data augmentation (DA) strategy for electron energy loss spectroscopy (EELS) data, employing generative adversarial networks (GANs). We present an innovative approach, called the data augmentation generative adversarial network (DAG), which facilitates data generation from a very limited number of spectra, around 100. Throughout this study, we explore the optimal configuration for GANs to produce realistic spectra. Notably, our DAG generates realistic spectra, and the spectra produced by the generator are successfully used in real-world applications to train classifiers based on artificial neural networks (ANNs) and support vector machines (SVMs) that have been successful in classifying experimental EEL spectra.

19.

Does synthetic data augmentation improve the performances of machine learning classifiers for identifying health problems in patient-nurse verbal communications in home healthcare settings?

Scroggins, Jihye Kim; Topaz, Maxim; Song, Jiyoun; Zolnoori, Maryam.

J Nurs Scholarsh ; 2024 Jul 03.

Article in English | MEDLINE | ID: mdl-38961517

ABSTRACT

BACKGROUND: Identifying health problems in audio-recorded patient-nurse communication is important to improve outcomes in home healthcare patients who have complex conditions with increased risks of hospital utilization. Training machine learning classifiers for identifying problems requires resource-intensive human annotation. OBJECTIVE: To generate synthetic patient-nurse communication and to automatically annotate for common health problems encountered in home healthcare settings using GPT-4. We also examined whether augmenting real-world patient-nurse communication with synthetic data can improve the performance of machine learning to identify health problems. DESIGN: Secondary data analysis of patient-nurse verbal communication data in home healthcare settings. METHODS: The data were collected from one of the largest home healthcare organizations in the United States. We used 23 audio recordings of patient-nurse communications from 15 patients. The audio recordings were transcribed verbatim and manually annotated for health problems (e.g., circulation, skin, pain) indicated in the Omaha System Classification scheme. Synthetic data of patient-nurse communication were generated using the in-context learning prompting method, enhanced by chain-of-thought prompting to improve the automatic annotation performance. Machine learning classifiers were applied to three training datasets: real-world communication, synthetic communication, and real-world communication augmented by synthetic communication. RESULTS: Average F1 scores improved from 0.62 to 0.63 after training data were augmented with synthetic communication. The largest increase was observed using the XGBoost classifier, where F1 scores improved from 0.61 to 0.64 (about 5% improvement). When trained solely on either real-world communication or synthetic communication, the classifiers showed comparable F1 scores of 0.62 and 0.61, respectively. CONCLUSION: Integrating synthetic data improves machine learning classifiers' ability to identify health problems in home healthcare, with performance comparable to training on real-world data alone, highlighting the potential of synthetic data in healthcare analytics. CLINICAL RELEVANCE: This study demonstrates the clinical relevance of leveraging synthetic patient-nurse communication data to enhance machine learning classifier performances to identify health problems in home healthcare settings, which will contribute to more accurate and efficient problem identification and detection of home healthcare patients with complex health conditions.
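
The "about 5% improvement" quoted for XGBoost appears to be a relative change in F1 rather than an absolute one. A short sketch of that arithmetic follows, using only the F1 values reported in the abstract; the helper name `relative_improvement` is a hypothetical label chosen for illustration.

```python
def relative_improvement(before, after):
    """Relative change from `before` to `after`, expressed in percent."""
    return (after - before) / before * 100.0


# F1 scores as reported in the abstract.
xgboost_gain = relative_improvement(0.61, 0.64)   # XGBoost classifier
average_gain = relative_improvement(0.62, 0.63)   # average over classifiers

print(f"XGBoost: {xgboost_gain:.1f}%")   # prints "XGBoost: 4.9%"
print(f"Average: {average_gain:.1f}%")   # prints "Average: 1.6%"
```

Read this way, the XGBoost gain is 0.03 F1 points in absolute terms, and the average gain across classifiers is 0.01 F1 points.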

20.

Ethereum Phishing Scam Detection Based on Data Augmentation Method and Hybrid Graph Neural Network Model.

Chen, Zhen; Liu, Sheng-Zheng; Huang, Jia; Xiu, Yu-Han; Zhang, Hao; Long, Hai-Xia.

Sensors (Basel) ; 24(12)2024 Jun 20.

Article in English | MEDLINE | ID: mdl-38931803

ABSTRACT

The rapid advancement of blockchain technology has fueled the prosperity of the cryptocurrency market. Unfortunately, it has also facilitated certain criminal activities, particularly the increasing issue of phishing scams on blockchain platforms such as Ethereum. Consequently, developing an efficient phishing detection system is critical for ensuring the security and reliability of cryptocurrency transactions. However, existing methods have shortcomings in dealing with sample imbalance and effective feature extraction. To address these issues, this study proposes an Ethereum phishing scam detection method based on DA-HGNN (Data Augmentation Method and Hybrid Graph Neural Network Model), validated by real Ethereum datasets to prove its effectiveness. Initially, basic node features consisting of 11 attributes were designed. This study applied a sliding window sampling method based on node transactions for data augmentation. Since phishing nodes often initiate numerous transactions, the augmented sample set tended to be balanced. Subsequently, the Temporal Features Extraction Module employed Conv1D (One-Dimensional Convolutional neural network) and GRU-MHA (GRU-Multi-Head Attention) models to uncover intrinsic relationships between features from the time sequences and to mine adequate local features, culminating in the extraction of temporal features. The GAE (Graph Autoencoder) concept was then leveraged, with SAGEConv (Graph SAGE Convolution) as the encoder. In the SAGEConv reconstruction module, by reconstructing the relationships between transaction graph nodes, the structural features of the nodes were learned, obtaining reconstructed node embedding representations. Ultimately, phishing fraud nodes were further identified by integrating temporal features, basic features, and embedding representations. A real Ethereum dataset was collected for evaluation, and the DA-HGNN model achieved an AUC-ROC (Area Under the Receiver Operating Characteristic Curve) of 0.994, a Recall of 0.995, and an F1-score of 0.994, outperforming existing methods and baseline models.
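
The sliding window sampling described in this abstract can be illustrated with a small sketch. The window size, stride, function name `sliding_windows`, and the toy transaction lists are assumptions made for illustration; the abstract does not give the parameters or implementation the authors used.

```python
def sliding_windows(transactions, size, stride):
    """Yield overlapping windows of consecutive transactions for one node."""
    for start in range(0, len(transactions) - size + 1, stride):
        yield transactions[start:start + size]


# Toy data: a node with many transactions and a node with few.
busy_node_txs = list(range(10))   # e.g. a phishing node with 10 transactions
quiet_node_txs = list(range(4))   # e.g. an ordinary node with 4 transactions

busy_samples = list(sliding_windows(busy_node_txs, size=4, stride=2))
quiet_samples = list(sliding_windows(quiet_node_txs, size=4, stride=2))

print(len(busy_samples), len(quiet_samples))  # prints "4 1"
```

Because each node contributes one sample per window, nodes with long transaction histories produce more samples, which is consistent with the abstract's remark that augmentation helps offset the scarcity of phishing nodes.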
