Búsqueda | Portal de Búsqueda de la BVS

1.

Breaking the barriers of data scarcity in drug-target affinity prediction.

Pei, Qizhi; Wu, Lijun; Zhu, Jinhua; Xia, Yingce; Xie, Shufang; Qin, Tao; Liu, Haiguang; Liu, Tie-Yan; Yan, Rui.

Brief Bioinform ; 24(6)2023 09 22.

Artículo en Inglés | MEDLINE | ID: mdl-37903413

RESUMEN

Accurate prediction of drug-target affinity (DTA) is of vital importance in early-stage drug discovery, facilitating the identification of drugs that can effectively interact with specific targets and regulate their activities. While wet experiments remain the most reliable method, they are time-consuming and resource-intensive, resulting in limited data availability that poses challenges for deep learning approaches. Existing methods have primarily focused on developing techniques based on the available DTA data, without adequately addressing the data scarcity issue. To overcome this challenge, we present the Semi-Supervised Multi-task training (SSM) framework for DTA prediction, which incorporates three simple yet highly effective strategies: (1) A multi-task training approach that combines DTA prediction with masked language modeling using paired drug-target data. (2) A semi-supervised training method that leverages large-scale unpaired molecules and proteins to enhance drug and target representations. This approach differs from previous methods that only employed molecules or proteins in pre-training. (3) The integration of a lightweight cross-attention module to improve the interaction between drugs and targets, further enhancing prediction accuracy. Through extensive experiments on benchmark datasets such as BindingDB, DAVIS and KIBA, we demonstrate the superior performance of our framework. Additionally, we conduct case studies on specific drug-target binding activities, virtual screening experiments, drug feature visualizations and real-world applications, all of which showcase the significant potential of our work. In conclusion, our proposed SSM-DTA framework addresses the data limitation challenge in DTA prediction and yields promising results, paving the way for more efficient and accurate drug discovery processes.

Asunto(s)

Benchmarking , Descubrimiento de Drogas , Sistemas de Liberación de Medicamentos

2.

GANSamples-ac4C: Enhancing ac4C site prediction via generative adversarial networks and transfer learning.

Li, Fei; Zhang, Jiale; Li, Kewei; Peng, Yu; Zhang, Haotian; Xu, Yiping; Yu, Yue; Zhang, Yuteng; Liu, Zewen; Wang, Ying; Huang, Lan; Zhou, Fengfeng.

Anal Biochem ; 689: 115495, 2024 Jun.

Artículo en Inglés | MEDLINE | ID: mdl-38431142

RESUMEN

RNA modification, N4-acetylcytidine (ac4C), is enzymatically catalyzed by N-acetyltransferase 10 (NAT10) and plays an essential role across tRNA, rRNA, and mRNA. It influences various cellular functions, including mRNA stability and rRNA biosynthesis. Wet-lab detection of ac4C modification sites is highly resource-intensive and costly. Therefore, various machine learning and deep learning techniques have been employed for computational detection of ac4C modification sites. The known ac4C modification sites are limited for training an accurate and stable prediction model. This study introduces GANSamples-ac4C, a novel framework that synergizes transfer learning and generative adversarial network (GAN) to generate synthetic RNA sequences to train a better ac4C modification site prediction model. Comparative analysis reveals that GANSamples-ac4C outperforms existing state-of-the-art methods in identifying ac4C sites. Moreover, our result underscores the potential of synthetic data in mitigating the issue of data scarcity for biological sequence prediction tasks. Another major advantage of GANSamples-ac4C is its interpretable decision logic. Multi-faceted interpretability analyses detect key regions in the ac4C sequences influencing the discriminating decision between positive and negative samples, a pronounced enrichment of G in this region, and ac4C-associated motifs. These findings may offer novel insights for ac4C research. The GANSamples-ac4C framework and its source code are publicly accessible at http://www.healthinformaticslab.org/supp/.

Asunto(s)

Citidina/análogos & derivados , Aprendizaje Automático , ARN , Estabilidad del ARN

3.

Understanding Rejection Mechanisms of Trace Organic Contaminants by Polyamide Membranes via Data-Knowledge Codriven Machine Learning.

Wang, Hejia; Zeng, Jin; Dai, Ruobin; Wang, Zhiwei.

Environ Sci Technol ; 58(13): 5878-5888, 2024 Apr 02.

Artículo en Inglés | MEDLINE | ID: mdl-38498471

RESUMEN

Data-driven machine learning (ML) provides a promising approach to understanding and predicting the rejection of trace organic contaminants (TrOCs) by polyamide (PA). However, various confounding variables, coupled with data scarcity, restrict the direct application of data-driven ML. In this study, we developed a data-knowledge codriven ML model via domain-knowledge embedding and explored its application in comprehending TrOC rejection by PA membranes. Domain-knowledge embedding enhanced both the predictive performance and the interpretability of the ML model. The contribution of key mechanisms, including size exclusion, charge effect, hydrophobic interaction, etc., that dominate the rejections of the three TrOC categories (neutral hydrophilic, neutral hydrophobic, and charged TrOCs) was quantified. Logâ¯D and molecular charge emerge as key factors contributing to the discernible variations in the rejection among the three TrOC categories. Furthermore, we quantitatively compared the TrOC rejection mechanisms between nanofiltration (NF) and reverse osmosis (RO) PA membranes. The charge effect and hydrophobic interactions possessed higher weights for NF to reject TrOCs, while the size exclusion in RO played a more important role. This study demonstrated the effectiveness of the data-knowledge codriven ML method in understanding TrOC rejection by PA membranes, providing a methodology to formulate a strategy for targeted TrOC removal.

Asunto(s)

Nylons , Purificación del Agua , Ósmosis , Purificación del Agua/métodos , Membranas Artificiales , Filtración

4.

Burden assessment of antimicrobial use and resistance in livestock in data-scarce contexts.

Afonso, J S; Babo Martins, S; Fastl, C; Chaters, G; Hoza, A S; Shirima, G; Nyasebwa, O M; Rushton, J.

Rev Sci Tech ; 43: 168-176, 2024 Aug.

Artículo en Inglés | MEDLINE | ID: mdl-39222100

RESUMEN

Misuse and overuse of antimicrobials in livestock production are identified as drivers for antimicrobial resistance (AMR). To improve decision-making concerning livestock health, it is important to understand the impact of AMR in livestock and aquaculture, within and beyond farm level, as well as expenditure on antimicrobial use (AMU). Such understanding provides grounds for systematic disease prioritisation and establishes a baseline for understanding the value of different strategies to mitigate animal health problems and for the monitoring and evaluation of the impact of those strategies. Yet limited data availability and quality surrounding AMU and AMR create barriers to furthering the knowledge of such impact. These data constraints are also more prevalent in contexts that lack the necessary resources to develop and maintain systematic and centralised data collection and collation systems. Even in regions with robust AMU and AMR monitoring systems in place, data limitations remain, such that the expenditure on antimicrobials and impacts of AMR remain unclear. Additionally, the current research funding strategies have been less focused on primary data collection, adding further barriers to filling the data void and reducing the global AMU/AMR knowledge gap. To work around the data scarcity and leverage previous and ongoing research efforts, it is vital to gain comprehensive knowledge of the people, projects and research consortia dedicated to the topic of AMU/AMR.

Les utilisations incorrecte et excessive d'agents antimicrobiens dans la production animale figurent parmi les facteurs connus de développement de résistances aux agents antimicrobiens (RAM). Pour améliorer la prise de décision relative à la santé des cheptels, il est essentiel de comprendre l'impact de la RAM chez les animaux d'élevage terrestres et aquatiques, aussi bien au niveau des élevages qu'au-delà, et de pouvoir quantifier les dépenses consacrées à l'utilisation d'agents antimicrobiens (UAM). Cette compréhension apporte les éléments d'information pour la priorisation systématique des maladies et établit un cadre de référence pour comprendre la valeur respective des différentes stratégies d'atténuation des problèmes de santé animale et pour assurer le suivi et l'évaluation d'impact de ces stratégies. Cependant, la disponibilité et la qualité limitées des données relatives à l'UAM et à la RAM font obstacle à une connaissance plus poussée de cet impact. Ces contraintes liées aux données sont plus répandues dans les contextes dépourvus des ressources nécessaires pour élaborer et entretenir des systèmes de collecte de données systématiques et centralisés. MÃªme dans les régions où des systèmes robustes de suivi de l'UAM et de la RAM sont en place, le problème de l'insuffisance de données reste posé de sorte que la réalité des coûts induits par les agents antimicrobiens et l'impact de la RAM demeurent incertains. De plus, les stratégies actuelles de financement de la recherche ont été moins axées sur la collecte de données primaires, ce qui ajoute des obstacles supplémentaires pour l'obtention des données manquantes et compromet les efforts visant à réduire les écarts de connaissances sur l'UAM et la RAM à l'échelle mondiale. Afin de remédier à la pénurie de données et de mettre à profit les recherches antérieures et en cours, il est indispensable de savoir quels sont les acteurs, les projets et les consortiums de recherche qui travaillent sur l'UAM et la RAM.

El uso incorrecto y excesivo de antimicrobianos en la producción ganadera se considera un impulsor de la resistencia a los antimicrobianos (RAM). Para mejorar la toma de decisiones relativas a la sanidad del ganado, es importante comprender el impacto de la RAM en la ganadería y la acuicultura, a nivel de las granjas y más allá, así como el coste con el uso de antimicrobianos (UAM). Tal comprensión permite una priorización sistemática de enfermedades y establece una línea base para comprender el valor de las distintas estrategias destinadas a mitigar los problemas de sanidad animal, así como para supervisar y evaluar el impacto de esas estrategias. Sin embargo, la limitada disponibilidad y calidad de los datos en torno al UAM y a la RAM crean barreras que impiden ampliar la comprensión de dicho impacto. Estas limitaciones de datos también son más frecuentes en contextos que carecen de los recursos necesarios para desarrollar y mantener sistemas sistemáticos y centralizados de recopilación y cotejo de datos. Incluso en las regiones que cuentan con sistemas sólidos de seguimiento del UAM y la RAM, los datos siguen siendo limitados, de modo que los costes con antimicrobianos y las repercusiones de la resistencia a estos siguen sin estar claros. Además, las actuales estrategias de financiación de la investigación se han centrado menos en la recopilación de datos primarios, lo que añade más obstáculos a la hora de llenar el vacío de datos y reducir la brecha mundial de conocimientos sobre el UAM y la RAM. Para superar la escasez de datos y aprovechar las iniciativas de investigación previas y en curso, es fundamental adquirir un conocimiento detallado de las personas, los proyectos y los consorcios de investigación dedicados al tema del uso de antimicrobianos y la resistencia a estos.

Asunto(s)

Ganado , Animales , Farmacorresistencia Bacteriana , Crianza de Animales Domésticos/métodos , Antiinfecciosos/uso terapéutico , Antibacterianos

5.

Towards the Improvement of Soil Salinity Mapping in a Data-Scarce Context Using Sentinel-2 Images in Machine-Learning Models.

Sirpa-Poma, J W; Satgé, F; Resongles, E; Pillco-Zolá, R; Molina-Carpio, J; Flores Colque, M G; Ormachea, M; Pacheco Mollinedo, P; Bonnet, M-P.

Sensors (Basel) ; 23(23)2023 Nov 22.

Artículo en Inglés | MEDLINE | ID: mdl-38067701

RESUMEN

Several recent studies have evidenced the relevance of machine-learning for soil salinity mapping using Sentinel-2 reflectance as input data and field soil salinity measurement (i.e., Electrical Conductivity-EC) as the target. As soil EC monitoring is costly and time consuming, most learning databases used for training/validation rely on a limited number of soil samples, which can affect the model consistency. Based on the low soil salinity variation at the Sentinel-2 pixel resolution, this study proposes to increase the learning database's number of observations by assigning the EC value obtained on the sampled pixel to the eight neighboring pixels. The method allowed extending the original learning database made up of 97 field EC measurements (OD) to an enhanced learning database made up of 691 observations (ED). Two classification machine-learning models (i.e., Random Forest-RF and Support Vector Machine-SVM) were trained with both OD and ED to assess the efficiency of the proposed method by comparing the models' outcomes with EC observations not used in the models´ training. The use of ED led to a significant increase in both models' consistency with the overall accuracy of the RF (SVM) model increasing from 0.25 (0.26) when using the OD to 0.77 (0.55) when using ED. This corresponds to an improvement of approximately 208% and 111%, respectively. Besides the improved accuracy reached with the ED database, the results showed that the RF model provided better soil salinity estimations than the SVM model and that feature selection (i.e., Variance Inflation Factor-VIF and/or Genetic Algorithm-GA) increase both models´ reliability, with GA being the most efficient. This study highlights the potential of machine-learning and Sentinel-2 image combination for soil salinity monitoring in a data-scarce context, and shows the importance of both model and features selection for an optimum machine-learning set-up.

6.

Bayesian Convolutional Neural Networks in Medical Imaging Classification: A Promising Solution for Deep Learning Limits in Data Scarcity Scenarios.

Bargagna, Filippo; De Santi, Lisa Anita; Martini, Nicola; Genovesi, Dario; Favilli, Brunella; Vergaro, Giuseppe; Emdin, Michele; Giorgetti, Assuero; Positano, Vincenzo; Santarelli, Maria Filomena.

J Digit Imaging ; 36(6): 2567-2577, 2023 12.

Artículo en Inglés | MEDLINE | ID: mdl-37787869

RESUMEN

Deep neural networks (DNNs) have already impacted the field of medicine in data analysis, classification, and image processing. Unfortunately, their performance is drastically reduced when datasets are scarce in nature (e.g., rare diseases or early-research data). In such scenarios, DNNs display poor capacity for generalization and often lead to highly biased estimates and silent failures. Moreover, deterministic systems cannot provide epistemic uncertainty, a key component to asserting the model's reliability. In this work, we developed a probabilistic system for classification as a framework for addressing the aforementioned criticalities. Specifically, we implemented a Bayesian convolutional neural network (BCNN) for the classification of cardiac amyloidosis (CA) subtypes. We prepared four different CNNs: base-deterministic, dropout-deterministic, dropout-Bayesian, and Bayesian. We then trained them on a dataset of 1107 PET images from 47 CA and control patients (data scarcity scenario). The Bayesian model achieved performances (78.28 (1.99) % test accuracy) comparable to the base-deterministic, dropout-deterministic, and dropout-Bayesian ones, while showing strongly increased "Out of Distribution" input detection (validation-test accuracy mismatch reduction). Additionally, both the dropout-Bayesian and the Bayesian models enriched the classification through confidence estimates, while reducing the criticalities of the dropout-deterministic and base-deterministic approaches. This in turn increased the model's reliability, also providing much needed insights into the network's estimates. The obtained results suggest that a Bayesian CNN can be a promising solution for addressing the challenges posed by data scarcity in medical imaging classification tasks.

Asunto(s)

Aprendizaje Profundo , Humanos , Reproducibilidad de los Resultados , Teorema de Bayes , Redes Neurales de la Computación , Diagnóstico por Imagen

7.

Comparing critical source areas for the sediment and nutrients of calibrated and uncalibrated models in a plateau watershed in southwest China.

Chen, Meijun; Janssen, Annette B G; de Klein, Jeroen J M; Du, Xinzhong; Lei, Qiuliang; Li, Ying; Zhang, Tianpeng; Pei, Wei; Kroeze, Carolien; Liu, Hongbin.

J Environ Manage ; 326(Pt B): 116712, 2023 Jan 15.

Artículo en Inglés | MEDLINE | ID: mdl-36402022

RESUMEN

Controlling non-point source pollution is often difficult and costly. Therefore, focusing on areas that contribute the most, so-called critical source areas (CSAs), can have economic and ecological benefits. CSAs are often determined using a modelling approach, yet it has proved difficult to calibrate the models in regions with limited data availability. Since identifying CSAs is based on the relative contributions of sub-basins to the total load, it has been suggested that uncalibrated models could be used to identify CSAs to overcome data scarcity issues. Here, we use the SWAT model to study the extent to which an uncalibrated model can be applied to determine CSAs. We classify and rank sub-basins to identify CSAs for sediment, total nitrogen (TN), and total phosphorus (TP) in the Fengyu River Watershed (China) with and without model calibration. The results show high similarity (81%-93%) between the identified sediment and TP CSA number and locations before and after calibration both on the yearly and seasonal scale. For TN alone, the results show moderate similarity on the yearly scale (73%). This may be because, in our study area, TN is determined more by groundwater flow after calibration than by surface water flow. We conclude that CSA identification with the uncalibrated model for TP is always good because its CSA number and locations changed least, and for sediment, it is generally satisfactory. The use of the uncalibrated model for TN is acceptable, as its CSA locations did not change after calibration; however, the TN CSA number changed by over 60% compared to the figures before calibration on both yearly and seasonal scales. Therefore, we advise using an uncalibrated model to identify CSAs for TN only if water yield composition changes are expected to be limited. This study shows that CSAs can be identified based on relative loading estimates with uncalibrated models in data-deficient regions.

Asunto(s)

Contaminación Difusa , Contaminantes Químicos del Agua , Contaminantes Químicos del Agua/análisis , Ríos , Fósforo/análisis , Nitrógeno/análisis , China , Nutrientes , Agua , Monitoreo del Ambiente

8.

Common and Founder Mutations for Monogenic Traits in Sub-Saharan African Populations.

Krause, Amanda; Seymour, Heather; Ramsay, Michèle.

Annu Rev Genomics Hum Genet ; 19: 149-175, 2018 08 31.

Artículo en Inglés | MEDLINE | ID: mdl-30169122

RESUMEN

This review highlights molecular genetic studies of monogenic traits where common pathogenic mutations occur in black families from sub-Saharan Africa. Examples of founder mutations have been identified for oculocutaneous albinism, cystic fibrosis, Fanconi anemia, and Gaucher disease. Although there are few studies from Africa, some of the mutations traverse populations across the continent, and they are almost all different from the common mutations observed in non-African populations. Myotonic dystrophy is curiously absent among Africans, and nonsyndromic deafness does not arise from mutations in GJB2 and GJB7. Locus heterogeneity is present for Huntington disease, with two common triplet expansion loci in Africa, HTT and JPH3. These findings have important clinical consequences for diagnosis, treatment, and genetic counseling in affected families. We currently have just a glimpse of the molecular etiology of monogenic diseases in sub-Saharan Africa, a proverbial "ears of the hippo" situation.

Asunto(s)

Población Negra/genética , Efecto Fundador , Enfermedades Genéticas Congénitas/genética , Mutación , África del Sur del Sahara , Heterogeneidad Genética , Humanos

9.

Transfer learning in deep neural network-based receiver coil sensitivity map estimation.

Arshad, Madiha; Qureshi, Mahmood; Inam, Omair; Omer, Hammad.

MAGMA ; 34(5): 717-728, 2021 Oct.

Artículo en Inglés | MEDLINE | ID: mdl-33772694

RESUMEN

INTRODUCTION: The success of parallel Magnetic Resonance Imaging algorithms like SENSitivity Encoding (SENSE) depends on an accurate estimation of the receiver coil sensitivity maps. Deep learning-based receiver coil sensitivity map estimation depends upon the size of training dataset and generalization capabilities of the trained neural network. When there is a mismatch between the training and testing datasets, retraining of the neural networks is required from a scratch which is costly and time consuming. MATERIALS AND METHODS: A transfer learning approach, i.e., end-to-end fine-tuning is proposed to address the data scarcity and generalization problems of deep learning-based receiver coil sensitivity map estimation. First, generalization capabilities of a pre-trained U-Net (initially trained on 1.5T receiver coil sensitivity maps) are thoroughly assessed for 3T receiver coil sensitivity map estimation. Later, end-to-end fine-tuning is performed on the pre-trained U-Net to estimate the 3T receiver coil sensitivity maps. RESULT AND CONCLUSION: Peak Signal-to-Noise Ratio, Root Mean Square Error and central line profiles (of the SENSE reconstructed images) show a successful SENSE reconstruction by utilizing the receiver coil sensitivity maps estimated by the proposed method.

Asunto(s)

Procesamiento de Imagen Asistido por Computador , Redes Neurales de la Computación , Algoritmos , Aprendizaje Automático , Imagen por Resonancia Magnética , Relación Señal-Ruido

10.

Fuzzy-based assessment of groundwater intrinsic vulnerability of a volcanic aquifer in the Chilean Andean Valley.

Duhalde, Denisse J; Arumí, José L; Oyarzún, Ricardo A; Rivera, Diego A.

Environ Monit Assess ; 190(7): 390, 2018 Jun 11.

Artículo en Inglés | MEDLINE | ID: mdl-29892906

RESUMEN

A fuzzy logic approach has been proposed to face the uncertainty caused by sparse data in the assessment of the intrinsic vulnerability of a groundwater system with parametric methods in Las Trancas Valley, Andean Mountain, south-central Chile, a popular touristic place in Chile, but lacking of a centralized drinking and sewage water public systems; this situation is a potentially source of groundwater pollution. Based on DRASTIC, GOD, and EKv and the expert knowledge of the study area, the Mamdani fuzzy approach was generated and the spatial data were processed by ArcGIS. The groundwater system exhibited areas with high, medium, and low intrinsic vulnerability indices. The fuzzy approach results were compared with traditional methods results, which, in general, have shown a good spatial agreement even though significant changes were also identified in the spatial distribution of the indices. The Mamdani logic approach has shown to be a useful and practical tool to assess the intrinsic vulnerability of an aquifer under sparse data conditions.

Asunto(s)

Monitoreo del Ambiente/métodos , Agua Subterránea/análisis , Abastecimiento de Agua/estadística & datos numéricos , Chile , Ambiente , Lógica Difusa , Contaminación del Agua/análisis , Contaminación del Agua/estadística & datos numéricos

11.

Generative Artificial Intelligence Enhancements for Reducing Image-based Training Data Requirements.

Chen, Dake; Han, Ying; Duncan, Jacque; Jia, Lin; Shan, Jing.

Ophthalmol Sci ; 4(5): 100531, 2024.

Artículo en Inglés | MEDLINE | ID: mdl-39071920

RESUMEN

Objective: Training data fuel and shape the development of artificial intelligence (AI) models. Intensive data requirements are a major bottleneck limiting the success of AI tools in sectors with inherently scarce data. In health care, training data are difficult to curate, triggering growing concerns that the current lack of access to health care by under-privileged social groups will translate into future bias in health care AIs. In this report, we developed an autoencoder to grow and enhance inherently scarce datasets to alleviate our dependence on big data. Design: Computational study with open-source data. Subjects: The data were obtained from 6 open-source datasets comprising patients aged 40-80 years in Singapore, China, India, and Spain. Methods: The reported framework generates synthetic images based on real-world patient imaging data. As a test case, we used autoencoder to expand publicly available training sets of optic disc photos, and evaluated the ability of the resultant datasets to train AI models in the detection of glaucomatous optic neuropathy. Main Outcome Measures: Area under the receiver operating characteristic curve (AUC) were used to evaluate the performance of the glaucoma detector. A higher AUC indicates better detection performance. Results: Results show that enhancing datasets with synthetic images generated by autoencoder led to superior training sets that improved the performance of AI models. Conclusions: Our findings here help address the increasingly untenable data volume and quality requirements for AI model development and have implications beyond health care, toward empowering AI adoption for all similarly data-challenged fields. Financial Disclosures: The authors have no proprietary or commercial interest in any materials discussed in this article.

12.

Fusion of biomedical imaging studies for increased sample size and diversity: a case study of brain MRI.

Aiskovich, Matias; Castro, Eduardo; Reinen, Jenna M; Fadnavis, Shreyas; Mehta, Anushree; Li, Hongyang; Dhurandhar, Amit; Cecchi, Guillermo A; Polosecki, Pablo.

Front Radiol ; 4: 1283392, 2024.

Artículo en Inglés | MEDLINE | ID: mdl-38645773

RESUMEN

Data collection, curation, and cleaning constitute a crucial phase in Machine Learning (ML) projects. In biomedical ML, it is often desirable to leverage multiple datasets to increase sample size and diversity, but this poses unique challenges, which arise from heterogeneity in study design, data descriptors, file system organization, and metadata. In this study, we present an approach to the integration of multiple brain MRI datasets with a focus on homogenization of their organization and preprocessing for ML. We use our own fusion example (approximately 84,000 images from 54,000 subjects, 12 studies, and 88 individual scanners) to illustrate and discuss the issues faced by study fusion efforts, and we examine key decisions necessary during dataset homogenization, presenting in detail a database structure flexible enough to accommodate multiple observational MRI datasets. We believe our approach can provide a basis for future similarly-minded biomedical ML projects.

13.

Automatic Acne Severity Grading with a Small and Imbalanced Data Set of Low-Resolution Images.

Bernhard, Rémi; Bletterer, Arnaud; Le Caro, Maëlle; García Álvarez, Estrella; Kostov, Belchin; Herrera Egea, Diego.

Dermatol Ther (Heidelb) ; 2024 Oct 08.

Artículo en Inglés | MEDLINE | ID: mdl-39379778

RESUMEN

INTRODUCTION: Developing automatic acne vulgaris grading systems based on machine learning is an expensive endeavor in terms of data acquisition. A machine learning practitioner will need to gather high-resolution pictures from a considerable number of different patients, with a well-balanced distribution between acne severity grades and potentially very tedious labeling. We developed a deep learning model to grade acne severity with respect to the Investigator's Global Assessment (IGA) scale that can be trained on low-resolution images, with pictures from a small number of different patients, a strongly imbalanced severity grade distribution and minimal labeling. METHODS: A total of 1374 triplets of images (frontal and lateral views) from 391 different patients suffering from acne labeled with the IGA severity grade by an expert dermatologist were used to train and validate a deep learning model that predicts the IGA severity grade. RESULTS: On the test set we obtained 66.67% accuracy with an equivalent performance for all grades despite the highly imbalanced severity grade distribution of our database. Importantly, we obtained performance on par with more tedious methods in terms of data acquisition which have the same simple labeling as ours but require either a more balanced severity grade distribution or large numbers of high-resolution images. CONCLUSIONS: Our deep learning model demonstrated promising accuracy despite the limited data set on which it was trained, indicating its potential for further development both as an assistance tool for medical practitioners and as a way to provide patients with an immediately available and standardized acne grading tool. TRIAL REGISTRATION: chinadrugtrials.org.cn identifier CTR20211314.

14.

A multi-module algorithm for heartbeat classification based on unsupervised learning and adaptive feature transfer.

Wang, Yanan; Hu, Shuaicong; Liu, Jian; Zhong, Gaoyan; Yang, Cuiwei.

Comput Biol Med ; 170: 108072, 2024 Mar.

Artículo en Inglés | MEDLINE | ID: mdl-38301518

RESUMEN

The scarcity of annotated data is a common issue in the realm of heartbeat classification based on deep learning. Transfer learning (TL) has emerged as an effective strategy for addressing this issue. However, current TL techniques in this realm overlook the probability distribution differences between the source domain (SD) and target domain (TD) databases. The motivation of this paper is to address the challenge of labeled data scarcity at the model level while exploring an effective method to eliminate domain discrepancy between SD and TD databases, especially when SD and TD are derived from inconsistent tasks. This study proposes a multi-module heartbeat classification algorithm. Initially, unsupervised feature extractors are designed to extract rich features from unlabeled SD and TD data. Subsequently, a novel adaptive transfer method is proposed to effectively eliminate domain discrepancy between features of SD for pre-training (PTF-SD) and features of TD for fine-tuning (FTF-TD). Finally, the adapted PTF-SD is employed to pre-train a designed classifier, and FTF-TD is used for classifier fine-tuning, with the objective of evaluating the algorithm's performance on the TD task. In our experiments, MNIST-DB serves as the SD database for handwritten digit image classification task, MIT-DB as the TD database for heartbeat classification task. The overall accuracy of classifying heartbeats into normal heartbeats, supraventricular ectopic beats (SVEBs), and ventricular ectopic beats (VEBs) reaches 96.7 %. Specifically, the sensitivity (Sen), positive predictive value (PPV), and F1 score for SVEBs are 0.802, 0.701, and 0.748, respectively. For VEBs, Sen, PPV, and F1 score are 0.976, 0.840, and 0.903, respectively. The results indicate that the proposed multi-module algorithm effectively addresses the challenge labeled data scarcity in heartbeat classification through unsupervised learning and adaptive feature transfer methods.

Asunto(s)

Aprendizaje Automático no Supervisado , Complejos Prematuros Ventriculares , Humanos , Frecuencia Cardíaca , Electrocardiografía/métodos , Procesamiento de Señales Asistido por Computador , Algoritmos

15.

Overcoming data scarcity in radiomics/radiogenomics using synthetic radiomic features.

Ahmadian, Milad; Bodalal, Zuhir; van der Hulst, Hedda J; Vens, Conchita; Karssemakers, Luc H E; Bogveradze, Nino; Castagnoli, Francesca; Landolfi, Federica; Hong, Eun Kyoung; Gennaro, Nicolo; Pizzi, Andrea Delli; Beets-Tan, Regina G H; van den Brekel, Michiel W M; Castelijns, Jonas A.

Comput Biol Med ; 174: 108389, 2024 May.

Artículo en Inglés | MEDLINE | ID: mdl-38593640

RESUMEN

PURPOSE: To evaluate the potential of synthetic radiomic data generation in addressing data scarcity in radiomics/radiogenomics models. METHODS: This study was conducted on a retrospectively collected cohort of 386 colorectal cancer patients (n = 2570 lesions) for whom matched contrast-enhanced CT images and gene TP53 mutational status were available. The full cohort data was divided into a training cohort (n = 2055 lesions) and an independent and fixed test set (n = 515 lesions). Differently sized training sets were subsampled from the training cohort to measure the impact of sample size on model performance and assess the added value of synthetic radiomic augmentation at different sizes. Five different tabular synthetic data generation models were used to generate synthetic radiomic data based on "real-world" radiomics data extracted from this cohort. The quality and reproducibility of the generated synthetic radiomic data were assessed. Synthetic radiomics were then combined with "real-world" radiomic training data to evaluate their impact on the predictive model's performance. RESULTS: A prediction model was generated using only "real-world" radiomic data, revealing the impact of data scarcity in this particular data set through a lack of predictive performance at low training sample numbers (n = 200, 400, 1000 lesions with average AUC = 0.52, 0.53, and 0.56 respectively, compared to 0.64 when using 2055 training lesions). Synthetic tabular data generation models created reproducible synthetic radiomic data with properties highly similar to "real-world" data (for n = 1000 lesions, average Chi-square = 0.932, average basic statistical correlation = 0.844). The integration of synthetic radiomic data consistently enhanced the performance of predictive models trained with small sample size sets (AUC enhanced by 9.6%, 11.3%, and 16.7% for models trained on n_samples = 200, 400, and 1000 lesions, respectively). In contrast, synthetic data generated from randomised/noisy radiomic data failed to enhance predictive performance underlining the requirement of true signal data to do so. CONCLUSION: Synthetic radiomic data, when combined with real radiomics, could enhance the performance of predictive models. Tabular synthetic data generation might help to overcome limitations in medical AI stemming from data scarcity.

Asunto(s)

Neoplasias Colorrectales , Tomografía Computarizada por Rayos X , Humanos , Neoplasias Colorrectales/diagnóstico por imagen , Neoplasias Colorrectales/genética , Femenino , Masculino , Tomografía Computarizada por Rayos X/métodos , Estudios Retrospectivos , Persona de Mediana Edad , Anciano , Genómica , Proteína p53 Supresora de Tumor/genética , Radiómica

16.

Beyond seen faults: Zero-shot diagnosis of power circuit breakers using symptom description transfer.

Yang, Qiuyu; Zhai, Zhenlin; Lin, Yuyi; Liao, Yuxiang; Xie, Jingyi; Xue, Xue; Ruan, Jiangjun.

ISA Trans ; 2024 Sep 20.

Artículo en Inglés | MEDLINE | ID: mdl-39358098

RESUMEN

Power circuit breakers (CBs) are vital for the control and protection of power systems, yet diagnosing their faults accurately remains a challenge due to the diversity of fault types and the complexity of their structures. Traditional data-driven methods, although effective, require extensive labeled data for each fault class, limiting their applicability in real-world scenarios where many faults are unseen. This paper addresses these limitations by introducing symptom description transfer-based zero-shot fault diagnosis (SDT-ZSFD), a method that leverages zero-shot learning for fault diagnosis. Our approach constructs a fault symptom description (FSD) framework, which embeds a fault symptom layer between the feature layer and the label layer to facilitate knowledge transfer from seen to unseen fault classes. The method utilizes current and acceleration signals collected during CB operation to extract features. By applying sparse principal component analysis to these signals, we derive high-quality features that are mapped to the FSD framework, enabling effective zero-shot learning. Our method achieves a satisfactory recognition rate by accurately diagnosing unseen faults based on these symptoms. This approach not only overcomes the data scarcity problem but also holds potential for practical applications in power system maintenance. The SDT-ZSFD method offers a reliable solution for CB fault diagnosis and provides a foundation for future improvements in symptom-based zero-shot diagnostic mechanisms and algorithmic robustness.

17.

Synthesizing affective neurophysiological signals using generative models: A review paper.

Nia, Alireza F; Tang, Vanessa; Talou, Gonzalo Maso; Billinghurst, Mark.

J Neurosci Methods ; 406: 110129, 2024 06.

Artículo en Inglés | MEDLINE | ID: mdl-38614286

RESUMEN

The integration of emotional intelligence in machines is an important step in advancing human-computer interaction. This demands the development of reliable end-to-end emotion recognition systems. However, the scarcity of public affective datasets presents a challenge. In this literature review, we emphasize the use of generative models to address this issue in neurophysiological signals, particularly Electroencephalogram (EEG) and Functional Near-Infrared Spectroscopy (fNIRS). We provide a comprehensive analysis of different generative models used in the field, examining their input formulation, deployment strategies, and methodologies for evaluating the quality of synthesized data. This review serves as a comprehensive overview, offering insights into the advantages, challenges, and promising future directions in the application of generative models in emotion recognition systems. Through this review, we aim to facilitate the progression of neurophysiological data augmentation, thereby supporting the development of more efficient and reliable emotion recognition systems.

Asunto(s)

Electroencefalografía , Emociones , Espectroscopía Infrarroja Corta , Humanos , Electroencefalografía/métodos , Espectroscopía Infrarroja Corta/métodos , Emociones/fisiología , Encéfalo/fisiología , Inteligencia Emocional/fisiología , Modelos Neurológicos

18.

Estimating time-varying cholera transmission and oral cholera vaccine effectiveness in Haiti and Cameroon, 2021-2023.

Hulland, Erin N; Charpignon, Marie-Laure; El Hayek, Ghinwa Y; Zhao, Lihong; Desai, Angel N; Majumder, Maimuna S.

medRxiv ; 2024 Aug 16.

Artículo en Inglés | MEDLINE | ID: mdl-39185512

RESUMEN

In 2023, cholera affected approximately 1 million people and caused more than 5000 deaths globally, predominantly in low-income and conflict settings. In recent years, the number of new cholera outbreaks has grown rapidly. Further, ongoing cholera outbreaks have been exacerbated by conflict, climate change, and poor infrastructure, resulting in prolonged crises. As a result, the demand for treatment and intervention is quickly outpacing existing resource availability. Prior to improved water and sanitation systems, cholera, a disease primarily transmitted via contaminated water sources, also routinely ravaged high-income countries. Crumbling infrastructure and climate change are now putting new locations at risk - even in high-income countries. Thus, understanding the transmission and prevention of cholera is critical. Combating cholera requires multiple interventions, the two most common being behavioral education and water treatment. Two-dose oral cholera vaccination (OCV) is often used as a complement to these interventions. Due to limited supply, countries have recently switched to single-dose vaccines (OCV1). One challenge lies in understanding where to allocate OCV1 in a timely manner, especially in settings lacking well-resourced public health surveillance systems. As cholera occurs and propagates in such locations, timely, accurate, and openly accessible outbreak data are typically inaccessible for disease modeling and subsequent decision-making. In this study, we demonstrated the value of open-access data to rapidly estimate cholera transmission and vaccine effectiveness. Specifically, we obtained non-machine readable (NMR) epidemic curves for recent cholera outbreaks in two countries, Haiti and Cameroon, from figures published in situation and disease outbreak news reports. We used computational digitization techniques to derive weekly counts of cholera cases, resulting in nominal differences when compared against the reported cumulative case counts (i.e., a relative error rate of 5.67% in Haiti and 0.54% in Cameroon). Given these digitized time series, we leveraged EpiEstim-an open-source modeling platform-to derive rapid estimates of time-varying disease transmission via the effective reproduction number ( R t ). To compare OCV1 effectiveness in the two considered countries, we additionally used VaxEstim, a recent extension of EpiEstim that facilitates the estimation of vaccine effectiveness via the relation among three inputs: the basic reproduction number ( R 0 ), R t , and vaccine coverage. Here, with Haiti and Cameroon as case studies, we demonstrated the first implementation of VaxEstim in low-resource settings. Importantly, we are the first to use VaxEstim with digitized data rather than traditional epidemic surveillance data. In the initial phase of the outbreak, weekly rolling average estimates of R t were elevated in both countries: 2.60 in Haiti [95% credible interval: 2.42-2.79] and 1.90 in Cameroon [1.14-2.95]. These values are largely consistent with previous estimates of R 0 in Haiti, where average values have ranged from 1.06 to 3.72, and in Cameroon, where average values have ranged from 1.10 to 3.50. In both Haiti and Cameroon, this initial period of high transmission preceded a longer period during which R t oscillated around the critical threshold of 1. Our results derived from VaxEstim suggest that Haiti had higher OCV1 effectiveness than Cameroon (75.32% effective [54.00-86.39%] vs. 54.88% [18.94-84.90%]). These estimates of OCV1 effectiveness are generally aligned with those derived from field studies conducted in other countries. Thus, our case study reinforces the validity of VaxEstim as an alternative to costly, time-consuming field studies of OCV1 effectiveness. Indeed, prior work in South Sudan, Bangladesh, and the Democratic Republic of the Congo reported OCV1 effectiveness ranging from approximately 40% to 80%. This work underscores the value of combining NMR sources of outbreak case data with computational techniques and the utility of VaxEstim for rapid, inexpensive estimation of vaccine effectiveness in data-poor outbreak settings.

19.

[A region-level contrastive learning-based deep model for glomerular ultrastructure segmentation on electron microscope images].

Lin, G; Zhang, Z; Lu, Y; Geng, J; Zhou, Z; Lu, L; Cao, L.

Nan Fang Yi Ke Da Xue Xue Bao ; 43(5): 815-824, 2023 May 20.

Artículo en Zh | MEDLINE | ID: mdl-37313824

RESUMEN

OBJECTIVE: We propose a novel region- level self-supervised contrastive learning method USRegCon (ultrastructural region contrast) based on the semantic similarity of ultrastructures to improve the performance of the model for glomerular ultrastructure segmentation on electron microscope images. METHODS: USRegCon used a large amount of unlabeled data for pre- training of the model in 3 steps: (1) The model encoded and decoded the ultrastructural information in the image and adaptively divided the image into multiple regions based on the semantic similarity of the ultrastructures; (2) Based on the divided regions, the first-order grayscale region representations and deep semantic region representations of each region were extracted by region pooling operation; (3) For the first-order grayscale region representations, a grayscale loss function was proposed to minimize the grayscale difference within regions and maximize the difference between regions. For deep semantic region representations, a semantic loss function was introduced to maximize the similarity of positive region pairs and the difference of negative region pairs in the representation space. These two loss functions were jointly used for pre-training of the model. RESULTS: In the segmentation task for 3 ultrastructures of the glomerular filtration barrier based on the private dataset GlomEM, USRegCon achieved promising segmentation results for basement membrane, endothelial cells, and podocytes, with Dice coefficients of (85.69 ± 0.13)%, (74.59 ± 0.13)%, and (78.57 ± 0.16)%, respectively, demonstrating a good performance of the model superior to many existing image-level, pixel-level, and region-level self-supervised contrastive learning methods and close to the fully- supervised pre-training method based on the large- scale labeled dataset ImageNet. CONCLUSION: USRegCon facilitates the model to learn beneficial region representations from large amounts of unlabeled data to overcome the scarcity of labeled data and improves the deep model performance for glomerular ultrastructure recognition and boundary segmentation.

Asunto(s)

Enfermedades Renales , Podocitos , Humanos , Electrones , Células Endoteliales , Aprendizaje

20.

Estimation of the impact of combined sewer overflows on surface water quality in a sparsely monitored area.

Bertels, Daan; De Meester, Joke; Dirckx, Geert; Willems, Patrick.

Water Res ; 244: 120498, 2023 Oct 01.

Artículo en Inglés | MEDLINE | ID: mdl-37639989

RESUMEN

Combined sewer overflows (CSOs) can have a severe negative, local impact on surface water systems. To assure good ecological surface water quality and drinking water production that meets the demands, the impact of sewer system overflows on the surrounding water bodies for current and future climate conditions needs to be assessed. Typically, integrated, detailed hydrological and hydrodynamic water quantity and quality models are used for this purpose, but often data and computational resource requirements limit their applicability. Therefore, an alternative computationally efficient, integrated water quantity and quality model of sewer systems and their receiving surface waters is proposed to assess the impact of CSOs on surface water quality in a sparsely observed area. A conceptual model approach to estimate CSO discharges is combined with an empirical model for estimating CSO pollutant concentrations based on waste water treatment plant influent observations. Both methods are compared with observations and independent results of established reference methods as to evaluate their performance. The methodology is demonstrated by modelling the current impact of CSOs on the water abstraction area of a major drinking water production centre in Flanders, Belgium. It is concluded that the proposed conceptual models achieve similar results for daily WWTP effluent and CSO frequency, whereby the accumulated CSO volume is similar to more detailed full hydrodynamic models. Further, the estimated pollutant concentrations correspond with another dataset based on high resolution sampled overflows. As a result, the proposed computational efficient method can give insights in the impact of CSOs on the water quality at a catchment level and can be used for planning monitoring campaigns or performing analyses of e.g. the current and future water availability for a data scarce areas.

Asunto(s)

Agua Potable , Contaminantes Ambientales , Calidad del Agua , Clima , Hidrodinámica

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

RESUMEN

RESUMEN

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

RESUMEN

Asunto(s)

RESUMEN

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

ENVIAR RESULTADO:

SELECCIÓN DE REFERENCIAS

DETALLE DE LA BÚSQUEDA