Results 1 - 20 of 216
1.
Brief Bioinform ; 24(2)2023 03 19.
Article in English | MEDLINE | ID: mdl-36892153

ABSTRACT

Accurate and effective drug-target interaction (DTI) prediction can greatly shorten the drug development lifecycle and reduce the cost of drug development. In the deep-learning-based paradigm for predicting DTI, robust drug and protein feature representations and their interaction features play a key role in improving the accuracy of DTI prediction. Additionally, the class imbalance problem and the overfitting problem in the drug-target dataset can also affect the prediction accuracy, and reducing the consumption of computational resources and speeding up the training process are also critical considerations. In this paper, we propose shared-weight-based MultiheadCrossAttention, a precise and concise attention mechanism that can establish the association between target and drug, making our models more accurate and faster. Then, we use the cross-attention mechanism to construct two models: MCANet and MCANet-B. In MCANet, the cross-attention mechanism is used to extract the interaction features between drugs and proteins for improving the feature representation ability of drugs and proteins, and the PolyLoss loss function is applied to alleviate the overfitting problem and the class imbalance problem in the drug-target dataset. In MCANet-B, the robustness of the model is improved by combining multiple MCANet models and prediction accuracy further increases. We train and evaluate our proposed methods on six public drug-target datasets and achieve state-of-the-art results. In comparison with other baselines, MCANet saves considerable computational resources while maintaining accuracy in the leading position; however, MCANet-B greatly improves prediction accuracy by combining multiple models while maintaining a balance between computational resource consumption and prediction accuracy.
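The abstract names PolyLoss as its remedy for class imbalance and overfitting but does not say which variant is used. The following is a minimal sketch of the common Poly-1 form (cross-entropy plus a single polynomial correction term) for one binary drug-target pair. The function name `poly1_binary_loss` and the default `epsilon` are illustrative assumptions, not taken from the paper.

```python
import math


def poly1_binary_loss(p_positive: float, label: int, epsilon: float = 1.0) -> float:
    """Poly-1 loss for a single binary example (assumed variant of PolyLoss).

    loss = -log(p_t) + epsilon * (1 - p_t), where p_t is the probability
    the model assigns to the true class.
    """
    p_t = p_positive if label == 1 else 1.0 - p_positive
    p_t = min(max(p_t, 1e-12), 1.0)  # guard against log(0)
    cross_entropy = -math.log(p_t)
    return cross_entropy + epsilon * (1.0 - p_t)


# Hypothetical predicted interaction probabilities for two positive pairs.
confident = poly1_binary_loss(0.9, 1)
uncertain = poly1_binary_loss(0.6, 1)
```

With `epsilon = 0` the function reduces to plain cross-entropy, which makes the extra term easy to isolate when comparing the two losses.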


Subjects
Drug Development , Drug Discovery , Drug Discovery/methods , Proteins/metabolism , Drug Delivery Systems , Protein Domains
2.
Brief Bioinform ; 24(4)2023 07 20.
Article in English | MEDLINE | ID: mdl-37193676

ABSTRACT

Protein-deoxyribonucleic acid (DNA) interactions are important in a variety of biological processes. Accurately predicting protein-DNA binding affinity has been one of the most attractive and challenging issues in computational biology. However, the existing approaches still have much room for improvement. In this work, we propose an ensemble model for Protein-DNA Binding Affinity prediction (emPDBA), which combines six base models with one meta-model. The complexes are classified into four types based on the DNA structure (double-stranded or other forms) and the percentage of interface residues. For each type, emPDBA is trained with the sequence-based, structure-based and energy features from binding partners and complex structures. Through feature selection by the sequential forward selection method, it is found that there do exist considerable differences in the key factors contributing to intermolecular binding affinity. The complex classification is beneficial for the important feature extraction for binding affinity prediction. The performance comparison of our method with other peer ones on the independent testing dataset shows that emPDBA outperforms the state-of-the-art methods with the Pearson correlation coefficient of 0.53 and the mean absolute error of 1.11 kcal/mol. The comprehensive results demonstrate that our method has a good performance for protein-DNA binding affinity prediction. Availability and implementation: The source code is available at https://github.com/ChunhuaLiLab/emPDBA/.
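The two figures of merit quoted in this abstract, the Pearson correlation coefficient and the mean absolute error, can be computed as in the sketch below. The function names and the example affinities are made up for illustration and are not data from the paper.

```python
import math


def pearson_r(predicted, observed):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return cov / math.sqrt(var_p * var_o)


def mean_absolute_error(predicted, observed):
    """Mean absolute error, in the same unit as the inputs (e.g. kcal/mol)."""
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(predicted)


# Made-up binding affinities in kcal/mol, for illustration only.
predicted = [-9.1, -10.4, -8.2, -11.0]
observed = [-8.5, -11.2, -8.0, -10.1]
r = pearson_r(predicted, observed)
mae = mean_absolute_error(predicted, observed)
```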


Subjects
Proteins , Software , Proteins/chemistry , Computational Biology/methods , DNA/genetics , Protein Binding
3.
Proc Natl Acad Sci U S A ; 119(18): e2103302119, 2022 05 03.
Article in English | MEDLINE | ID: mdl-35476520

ABSTRACT

Short-term forecasting of the COVID-19 pandemic is required to facilitate the planning of COVID-19 health care demand in hospitals. Here, we evaluate the performance of 12 individual models and 19 predictors to anticipate French COVID-19-related health care needs from September 7, 2020, to March 6, 2021. We then build an ensemble model by combining the individual forecasts and retrospectively test this model from March 7, 2021, to July 6, 2021. We find that the inclusion of early predictors (epidemiological, mobility, and meteorological predictors) can halve the RMS error for 14-d-ahead forecasts, with epidemiological and mobility predictors contributing the most to the improvement. On average, the ensemble model is the best or second-best model, depending on the evaluation metric. Our approach facilitates the comparison and benchmarking of competing models through their integration in a coherent analytical framework, ensuring that avenues for future improvements can be identified.
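The abstract does not state how the individual forecasts are combined. As a minimal sketch, the code below assumes an unweighted mean across models and scores it with the root-mean-square error mentioned in the text. The function names and the numbers are illustrative only.

```python
import math


def ensemble_mean(forecasts):
    """Average several aligned forecasts (one list per model) point by point."""
    n_models = len(forecasts)
    return [sum(values) / n_models for values in zip(*forecasts)]


def rmse(predicted, observed):
    """Root-mean-square error between a forecast and the observations."""
    squared = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return math.sqrt(squared / len(observed))


# Made-up daily hospital admissions forecast by three hypothetical models.
model_forecasts = [[100, 110, 125], [90, 105, 115], [110, 115, 120]]
observed = [98, 112, 121]
combined = ensemble_mean(model_forecasts)
error = rmse(combined, observed)
```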


Subjects
COVID-19 , COVID-19/epidemiology , Delivery of Health Care , France/epidemiology , Health Services Needs and Demand , Humans , Pandemics/prevention & control , Retrospective Studies
4.
BMC Bioinformatics ; 25(1): 256, 2024 Aug 04.
Article in English | MEDLINE | ID: mdl-39098908

ABSTRACT

BACKGROUND: Antioxidant proteins are involved in several biological processes and can protect DNA and cells from the damage of free radicals. These proteins regulate the body's oxidative stress and perform a significant role in many antioxidant-based drugs. The current in vitro-based medications are costly, time-consuming, and unable to efficiently screen and identify the targeted motif of antioxidant proteins. METHODS: In this work, we proposed an accurate prediction method, namely StackedEnC-AOP, to discriminate antioxidant proteins. The training sequences are encoded by incorporating a discrete wavelet transform (DWT) into the evolutionary matrix, decomposing the PSSM-based images via two levels of DWT to form a pseudo position-specific scoring matrix (PsePSSM-DWT)-based embedded vector. Additionally, the Evolutionary difference formula and composite physicochemical properties methods are also employed to collect the structural and sequential descriptors. Then the combined vector of sequential features, evolutionary descriptors, and physicochemical properties is produced to cover the flaws of individual encoding schemes. To reduce the computational cost of the combined features vector, the optimal features are chosen using minimum redundancy and maximum relevance (mRMR). The optimal feature vector is trained using a stacking-based ensemble meta-model. RESULTS: Our developed StackedEnC-AOP method reported a prediction accuracy of 98.40% and an AUC of 0.99 via training sequences. To evaluate model validation, the StackedEnC-AOP training model using an independent set achieved an accuracy of 96.92% and an AUC of 0.98. CONCLUSION: Our proposed StackedEnC-AOP strategy performed significantly better than current computational models with a ~5% and ~3% improved accuracy via training and independent sets, respectively. The efficacy and consistency of our proposed StackedEnC-AOP make it a valuable tool for data scientists and can execute a key role in research academia and drug design.
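The two-level wavelet decomposition described in the METHODS can be illustrated on a single one-dimensional signal, such as one column of a PSSM. The abstract does not name the wavelet family, so this sketch assumes the Haar wavelet. The function names and the example column are hypothetical.

```python
import math


def haar_step(signal):
    """One Haar DWT level: returns (approximation, detail) coefficients."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_two_level(signal):
    """Two-level decomposition: level-2 approximation plus both detail bands."""
    approx1, detail1 = haar_step(signal)
    approx2, detail2 = haar_step(approx1)
    return approx2, detail2, detail1


# Hypothetical PSSM column (scores for one amino acid along a short sequence).
column = [4.0, 2.0, 6.0, 8.0, -1.0, 0.0, 3.0, 5.0]
approx2, detail2, detail1 = haar_two_level(column)
```

The three coefficient bands together contain as many values as the input, so summary statistics of each band are typically what ends up in a fixed-length feature vector.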


Subjects
Antioxidants , Proteins , Antioxidants/chemistry , Proteins/chemistry , Proteins/metabolism , Computational Biology/methods , Machine Learning , Algorithms , Wavelet Analysis , Support Vector Machine , Databases, Protein , Position-Specific Scoring Matrices
5.
BMC Bioinformatics ; 25(1): 102, 2024 Mar 07.
Article in English | MEDLINE | ID: mdl-38454333

ABSTRACT

BACKGROUND: Viral infections have been the main health issue in the last decade. Antiviral peptides (AVPs) are a subclass of antimicrobial peptides (AMPs) with substantial potential to protect the human body against various viral diseases. However, there has been significant production of antiviral vaccines and medications. Recently, the development of AVPs as an antiviral agent suggests an effective way to treat virus-affected cells. Recently, the involvement of intelligent machine learning techniques for developing peptide-based therapeutic agents is becoming an increasing interest due to its significant outcomes. The existing wet-laboratory-based drugs are expensive, time-consuming, and cannot effectively perform in screening and predicting the targeted motif of antiviral peptides. METHODS: In this paper, we proposed a novel computational model called Deepstacked-AVPs to discriminate AVPs accurately. The training sequences are numerically encoded using a novel Tri-segmentation-based position-specific scoring matrix (PSSM-TS) and word2vec-based semantic features. Composition/Transition/Distribution-Transition (CTDT) is also employed to represent the physicochemical properties based on structural features. Apart from these, the fused vector is formed using PSSM-TS features, semantic information, and CTDT descriptors to compensate for the limitations of single encoding methods. Information gain (IG) is applied to choose the optimal feature set. The selected features are trained using a stacked-ensemble classifier. RESULTS: The proposed Deepstacked-AVPs model achieved a predictive accuracy of 96.60%, an area under the curve (AUC) of 0.98, and a precision-recall (PR) value of 0.97 using training samples. In the case of the independent samples, our model obtained an accuracy of 95.15%, an AUC of 0.97, and a PR value of 0.97. CONCLUSION: Our Deepstacked-AVPs model outperformed existing models with a ~4% and ~2% higher accuracy using training and independent samples, respectively. The reliability and efficacy of the proposed Deepstacked-AVPs model make it a valuable tool for scientists and may perform a beneficial role in pharmaceutical design and research academia.
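Information gain, which the METHODS use for feature selection, is the drop in label entropy after splitting the samples by a feature's value. The sketch below handles discrete features only, so continuous descriptors would need to be binned first. How the paper discretizes its features is not stated, and the names here are illustrative.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus their entropy conditioned on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Toy data: 1 = antiviral peptide, 0 = non-antiviral (made-up labels).
labels = [1, 1, 0, 0]
informative = information_gain(["high", "high", "low", "low"], labels)
uninformative = information_gain(["high", "low", "high", "low"], labels)
```

Ranking features by this score and keeping the top ones is one simple way to obtain a reduced feature set.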


Subjects
Biological Evolution , Peptides , Humans , Reproducibility of Results , Peptides/chemistry , Antiviral Agents/pharmacology
6.
Brief Bioinform ; 23(1)2022 01 17.
Article in English | MEDLINE | ID: mdl-34553746

ABSTRACT

Single-cell Hi-C data are a common data source for studying the differences in the three-dimensional structure of cell chromosomes. The development of single-cell Hi-C technology makes it possible to obtain batches of single-cell Hi-C data. How to quickly and effectively discriminate cell types has become one hot research field. However, the existing computational methods to predict cell types based on Hi-C data are found to be low in accuracy. Therefore, we propose a high accuracy cell classification algorithm, called scHiCStackL, based on single-cell Hi-C data. In our work, we first improve the existing data preprocessing method for single-cell Hi-C data, which allows the generated cell embedding to better represent cells. Then, we construct a two-layer stacking ensemble model for classifying cells. Experimental results show that the cell embedding generated by our data preprocessing method increases by 0.23, 1.22, 1.46 and 1.61% compared with the cell embedding generated by the previously published method scHiCluster, in terms of the Acc, MCC, F1 and Precision confidence intervals, respectively, on the task of classifying human cells in the ML1 and ML3 datasets. When using the two-layer stacking ensemble framework with the cell embedding, scHiCStackL improves by 13.33, 19, 19.27 and 14.5 over the scHiCluster, in terms of the Acc, ARI, NMI and F1 confidence intervals, respectively. In summary, scHiCStackL achieves superior performance in predicting cell types using the single-cell Hi-C data. The webserver and source code of scHiCStackL are freely available at http://hww.sdu.edu.cn:8002/scHiCStackL/ and https://github.com/HaoWuLab-Bioinformatics/scHiCStackL, respectively.


Subjects
Algorithms , Software , Humans , Machine Learning
7.
Ecol Appl ; 34(2): e2934, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38071693

ABSTRACT

Species distribution models are vital to management decisions that require understanding habitat use patterns, particularly for species of conservation concern. However, the production of distribution maps for individual species is often hampered by data scarcity, and existing species maps are rarely spatially validated due to limited occurrence data. Furthermore, community-level maps based on stacked species distribution models lack important community assemblage information (e.g., competitive exclusion) relevant to conservation. Thus, multispecies, guild, or community models are often used in conservation practice instead. To address these limitations, we aimed to generate fine-scale, spatially continuous, nationwide maps for species represented in the North American Breeding Bird Survey (BBS) between 1992 and 2019. We developed ensemble models for each species at three spatial resolutions (0.5, 2.5, and 5 km) across the conterminous United States. We also compared species richness patterns from stacked single-species models with those of 19 functional guilds developed using the same data to assess the similarity between predictions. We successfully modeled 192 bird species at 5-km resolution, 160 species at 2.5-km resolution, and 80 species at 0.5-km resolution. However, the species we could model represent only 28%-56% of species found in the conterminous US BBSs across resolutions owing to data limitations. We found that stacked maps and guild maps generally had high correlations across resolutions (median = 84%), but spatial agreement varied regionally by resolution and was most pronounced between the East and West at the 5-km resolution. The spatial differences between our stacked maps and guild maps illustrate the importance of spatial validation in conservation planning. Overall, our species maps are useful for single-species conservation and can support fine-scale decision-making across the United States and support community-level conservation when used in tandem with guild maps. However, there remain data scarcity issues for many species of conservation concern when using the BBS for single-species models.
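A stacked richness map of the kind compared with guild maps in this abstract is, in its simplest form, the cell-by-cell sum of single-species presence maps. The sketch below assumes binary (0/1) presence grids of equal size. The study's actual thresholding and stacking choices are not given in the abstract, and the function name is hypothetical.

```python
def stacked_richness(presence_maps):
    """Sum equally sized 2-D 0/1 presence grids, one per species, cell by cell."""
    rows, cols = len(presence_maps[0]), len(presence_maps[0][0])
    return [
        [sum(grid[r][c] for grid in presence_maps) for c in range(cols)]
        for r in range(rows)
    ]


# Three made-up species on a 2 x 3 grid.
species_a = [[1, 0, 0], [1, 1, 0]]
species_b = [[1, 1, 0], [0, 1, 0]]
species_c = [[0, 1, 0], [0, 1, 1]]
richness = stacked_richness([species_a, species_b, species_c])
```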


Subjects
Birds , Ecosystem , Animals , United States
8.
BMC Med Res Methodol ; 24(1): 131, 2024 Jun 07.
Article in English | MEDLINE | ID: mdl-38849766

ABSTRACT

BACKGROUND: Dynamical mathematical models defined by a system of differential equations are typically not easily accessible to non-experts. However, forecasts based on these types of models can help gain insights into the mechanisms driving the process and may outcompete simpler phenomenological growth models. Here we introduce a friendly toolbox, SpatialWavePredict, to characterize and forecast the spatial wave sub-epidemic model, which captures diverse wave dynamics by aggregating multiple asynchronous growth processes and has outperformed simpler phenomenological growth models in short-term forecasts of various infectious diseases outbreaks including SARS, Ebola, and the early waves of the COVID-19 pandemic in the US. RESULTS: This tutorial-based primer introduces and illustrates a user-friendly MATLAB toolbox for fitting and forecasting time-series trajectories using an ensemble spatial wave sub-epidemic model based on ordinary differential equations. Scientists, policymakers, and students can use the toolbox to conduct real-time short-term forecasts. The five-parameter epidemic wave model in the toolbox aggregates linked overlapping sub-epidemics and captures a rich spectrum of epidemic wave dynamics, including oscillatory wave behavior and plateaus. An ensemble strategy aims to improve forecasting performance by combining the resulting top-ranked models. The toolbox provides a tutorial for forecasting time-series trajectories, including the full uncertainty distribution derived through parametric bootstrapping, which is needed to construct prediction intervals and evaluate their accuracy. Functions are available to assess forecasting performance, estimation methods, error structures in the data, and forecasting horizons. The toolbox also includes functions to quantify forecasting performance using metrics that evaluate point and distributional forecasts, including the weighted interval score. 
CONCLUSIONS: We have developed the first comprehensive toolbox to characterize and forecast time-series data using an ensemble spatial wave sub-epidemic model. As an epidemic situation or contagion occurs, the tools presented in this tutorial can help policymakers guide the implementation of containment strategies and assess the impact of control interventions. We demonstrate the functionality of the toolbox with examples, including a tutorial video, and illustrate it using daily data on the COVID-19 pandemic in the USA.
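The weighted interval score named in the RESULTS combines the absolute error of the median with interval scores for several central prediction intervals. The toolbox itself is written in MATLAB. The sketch below uses Python only for illustration and follows the standard definition with weights of 1/2 for the median and alpha/2 for each interval. The function names and the numbers are assumptions.

```python
def interval_score(lower, upper, observed, alpha):
    """Interval score of a central (1 - alpha) prediction interval."""
    score = upper - lower  # width penalizes vague forecasts
    if observed < lower:
        score += (2.0 / alpha) * (lower - observed)
    elif observed > upper:
        score += (2.0 / alpha) * (observed - upper)
    return score


def weighted_interval_score(median, intervals, observed):
    """intervals maps alpha to the (lower, upper) bounds of the (1 - alpha) interval."""
    total = 0.5 * abs(observed - median)
    for alpha, (lower, upper) in intervals.items():
        total += (alpha / 2.0) * interval_score(lower, upper, observed, alpha)
    return total / (len(intervals) + 0.5)


# Made-up forecast of daily cases: median plus 50% and 95% prediction intervals.
forecast_intervals = {0.5: (90.0, 110.0), 0.05: (70.0, 130.0)}
wis = weighted_interval_score(100.0, forecast_intervals, observed=104.0)
```

Lower scores are better: the score grows both when the intervals are wide and when the observation falls outside them.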


Subjects
COVID-19 , Forecasting , Humans , COVID-19/epidemiology , Forecasting/methods , SARS-CoV-2 , Epidemics/statistics & numerical data , Pandemics , Models, Theoretical , Hemorrhagic Fever, Ebola/epidemiology , Models, Statistical
9.
Conserv Biol ; : e14316, 2024 Jul 01.
Article in English | MEDLINE | ID: mdl-38946355

ABSTRACT

Assessing the extinction risk of species based on the International Union for Conservation of Nature (IUCN) Red List (RL) is key to guiding conservation policies and reducing biodiversity loss. This process is resource demanding, however, and requires continuous updating, which becomes increasingly difficult as new species are added to the RL. Automatic methods, such as comparative analyses used to predict species RL category, can be an efficient alternative to keep assessments up to date. Using amphibians as a study group, we predicted which species are more likely to change their RL category and thus should be prioritized for reassessment. We used species biological traits, environmental variables, and proxies of climate and land-use change as predictors of RL category. We produced an ensemble prediction of IUCN RL category for each species by combining 4 different model algorithms: cumulative link models, phylogenetic generalized least squares, random forests, and neural networks. By comparing RL categories with the ensemble prediction and accounting for uncertainty among model algorithms, we identified species that should be prioritized for future reassessment based on the mismatch between predicted and observed values. The most important predicting variables across models were species' range size and spatial configuration of the range, biological traits, climate change, and land-use change. We compared our proposed prioritization index and the predicted RL changes with independent IUCN RL reassessments and found high performance of both the prioritization and the predicted directionality of changes in RL categories. Ensemble modeling of RL category is a promising tool for prioritizing species for reassessment while accounting for models' uncertainty. 
This approach is broadly applicable to all taxa on the IUCN RL and to regional and national assessments and may improve allocation of the limited human and economic resources available to maintain an up-to-date IUCN RL.


Use of comparative extinction risk analysis to prioritize the reassessment of amphibians on the IUCN Red List

10.
Methods ; 211: 23-30, 2023 03.
Article in English | MEDLINE | ID: mdl-36740001

ABSTRACT

The enhancer is a DNA sequence that can increase the activity of promoters and thus speed up the frequency of gene transcription. The enhancer plays an essential role in activating gene expression. Currently, gene sequencing technology has been developed for 30 years from the first generation to the third generation, and a variety of biological sequence data have increased significantly every year. Despite the importance of enhancer functions, identifying enhancers through biochemical experiments is very expensive. Therefore, we need to study new methods for the identification and classification of enhancers. Based on the K-mer principle, this study proposed a feature extraction method that others have not used in convolutional neural networks. Then, we combined it with one-hot encoding to build an efficient one-dimensional convolutional neural network ensemble model for predicting enhancers and their strengths. Finally, we used five commonly used classification problem evaluation indicators to compare with the models proposed by other researchers. The model proposed in this paper has a better performance by using the same independent test dataset as other models.
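The abstract does not spell out its K-mer-based feature extraction, so the sketch below shows only the two generic building blocks it mentions: plain k-mer frequency counting and one-hot encoding of a DNA sequence. It is not the paper's specific method, and the function names and the example sequence are illustrative.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(sequence, k=3):
    """Relative frequency of every possible DNA k-mer, in lexicographic order."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    windows = max(len(sequence) - k + 1, 1)
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    return [counts[kmer] / windows for kmer in kmers]


def one_hot(sequence):
    """One 4-element indicator vector per base; unknown bases map to all zeros."""
    table = {"A": [1, 0, 0, 0], "C": [0, 1, 0, 0], "G": [0, 0, 1, 0], "T": [0, 0, 0, 1]}
    return [table.get(base, [0, 0, 0, 0]) for base in sequence]


# Made-up DNA fragment.
fragment = "ACGTACGGT"
features = kmer_frequencies(fragment, k=2)  # 16 values, one per dinucleotide
encoded = one_hot(fragment)                 # 9 rows of 4 indicators each
```

The k-mer vector has a fixed length of 4^k regardless of sequence length, whereas the one-hot matrix keeps positional information, which is one reason the two encodings are often combined.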


Subjects
Deep Learning , Enhancer Elements, Genetic , Neural Networks, Computer , Promoter Regions, Genetic
11.
Oecologia ; 204(3): 589-601, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38386057

ABSTRACT

Procambarus clarkii is a notorious invasive species that has led to ecological concerns owing to its high viability and rapid reproduction. South Korea, a country exposed to a high risk of introduction of invasive species due to active international trade, has suffered from recent massive invasions by invasive species, necessitating the evaluation of potential areas requiring intensive monitoring. In this study, we developed two different types of species distribution models, CLIMEX and random forest, for P. clarkii using occurrence records from the United States. The potential distribution in the United States was predicted along coastal lines and inland regions located below 40°N latitude. The model was then applied to evaluate the potential distribution in South Korea, and an ensemble map was constructed to identify the most vulnerable domestic regions. According to both models, the domestic potential distribution was highest in most areas located at low altitudes. In the ensemble model, most of the low-altitude western regions, the eastern coast, and some southern inland regions were predicted to be suitable for the distribution of P. clarkii, and a similar distribution pattern was predicted when the model was projected into the future climate. Through this study, it is possible to secure basic data that can be used for the early monitoring of the introduction and subsequent distribution of P. clarkii.


Subjects
Astacoidea , Moths , Animals , Commerce , Climate Change , Internationality , Introduced Species
12.
Biomed Eng Online ; 23(1): 77, 2024 Aug 05.
Article in English | MEDLINE | ID: mdl-39098936

ABSTRACT

BACKGROUND: Timely prevention of major adverse cardiovascular events (MACEs) is imperative for reducing cardiovascular diseases-related mortality. Perivascular adipose tissue (PVAT), the adipose tissue surrounding coronary arteries, has attracted increased amounts of attention. Developing a model for predicting the incidence of MACE utilizing machine learning (ML) integrating clinical and PVAT features may facilitate targeted preventive interventions and improve patient outcomes. METHODS: From January 2017 to December 2019, we analyzed a cohort of 1077 individuals who underwent coronary CT scanning at our facility. Clinical features were collected alongside imaging features, such as coronary artery calcium (CAC) scores and perivascular adipose tissue (PVAT) characteristics. Logistic regression (LR), Framingham Risk Score, and ML algorithms were employed for MACE prediction. RESULTS: We screened seven critical features to improve the practicability of the model. MACE patients tended to be older, smokers, and hypertensive. Imaging biomarkers such as CAC scores and PVAT characteristics differed significantly between patients with and without a 3-year MACE risk in a population that did not exhibit disparities in laboratory results. The ensemble model, which leverages multiple ML algorithms, demonstrated superior predictive performance compared with the other models. Finally, the ensemble model was used for risk stratification prediction to explore its clinical application value. CONCLUSIONS: The developed ensemble model effectively predicted MACE incidence based on clinical and imaging features, highlighting the potential of ML algorithms in cardiovascular risk prediction and personalized medicine. Early identification of high-risk patients may facilitate targeted preventive interventions and improve patient outcomes.


Subjects
Adipose Tissue , Cardiovascular Diseases , Machine Learning , Humans , Adipose Tissue/diagnostic imaging , Female , Male , Middle Aged , Cardiovascular Diseases/diagnostic imaging , Risk Assessment , Aged , Tomography, X-Ray Computed , Risk Factors , Coronary Vessels/diagnostic imaging
13.
Environ Res ; 247: 118176, 2024 Apr 15.
Article in English | MEDLINE | ID: mdl-38215922

ABSTRACT

With the ongoing process of industrialization, declining air quality is increasingly becoming a critical concern. Accurate prediction of the Air Quality Index (AQI), an all-inclusive measure of the extent of pollutants present in the atmosphere, is of paramount importance. This study introduces a novel methodology that combines stacking ensemble and error correction to improve AQI prediction. Additionally, the reptile search algorithm (RSA) is employed for optimizing model parameters. In this study, AQI data from four distinct regions, comprising 34864 data samples, are collected. Initially, we perform cross-validation on ten commonly used single models to obtain prediction results. Then, based on evaluation indices, five models are selected for the ensemble. The results of the study show that the model proposed in this paper achieves an improvement of around 10% in accuracy compared to the conventional model. Thus, the model introduced in this study offers a more scientifically grounded approach to tackling air pollution.


Subjects
Air Pollutants , Air Pollution , Environmental Pollutants , Air Pollution/analysis , Air Pollutants/analysis , Algorithms , Research Design
14.
Sensors (Basel) ; 24(9)2024 May 04.
Article in English | MEDLINE | ID: mdl-38733032

ABSTRACT

Performing a minimally invasive surgery comes with a significant advantage regarding rehabilitating the patient after the operation. But it also causes difficulties, mainly for the surgeon or expert who performs the surgical intervention, since only visual information is available and they cannot use their tactile senses during keyhole surgeries. This is the case with laparoscopic hysterectomy since some organs are also difficult to distinguish based on visual information, making laparoscope-based hysterectomy challenging. In this paper, we propose a solution based on semantic segmentation, which can create pixel-accurate predictions of surgical images and differentiate the uterine arteries, ureters, and nerves. We trained three binary semantic segmentation models based on the U-Net architecture with the EfficientNet-b3 encoder; then, we developed two ensemble techniques that enhanced the segmentation performance. Our pixel-wise ensemble examines the segmentation map of the binary networks on the lowest level of pixels. The other algorithm developed is a region-based ensemble technique that takes this examination to a higher level and makes the ensemble based on every connected component detected by the binary segmentation networks. We also introduced and trained a classic multi-class semantic segmentation model as a reference and compared it to the ensemble-based approaches. We used 586 manually annotated images from 38 surgical videos for this research and published this dataset.
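The abstract describes a pixel-wise ensemble that merges the outputs of three binary segmentation networks but does not give the merging rule. One plausible reading, sketched below purely as an assumption, assigns each pixel to the class whose binary network is most confident, and to background when no network exceeds a threshold. The function name, class numbering, and threshold are hypothetical.

```python
BACKGROUND = 0


def pixelwise_ensemble(prob_maps, threshold=0.5):
    """Merge per-class probability maps (class ids 1..n) into one label map.

    Assumed rule: the most confident class above `threshold` wins each pixel.
    """
    height, width = len(prob_maps[0]), len(prob_maps[0][0])
    labels = [[BACKGROUND] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            best_class, best_prob = BACKGROUND, threshold
            for class_id, prob_map in enumerate(prob_maps, start=1):
                if prob_map[y][x] > best_prob:
                    best_class, best_prob = class_id, prob_map[y][x]
            labels[y][x] = best_class
    return labels


# Made-up 1 x 3 probability maps for artery (1), ureter (2), and nerve (3).
artery = [[0.9, 0.2, 0.1]]
ureter = [[0.6, 0.3, 0.2]]
nerve = [[0.1, 0.7, 0.4]]
label_map = pixelwise_ensemble([artery, ureter, nerve])
```

A region-based variant would make the same kind of decision once per connected component rather than once per pixel.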


Subjects
Algorithms , Laparoscopy , Neural Networks, Computer , Ureter , Uterine Artery , Humans , Laparoscopy/methods , Female , Ureter/diagnostic imaging , Ureter/surgery , Uterine Artery/surgery , Uterine Artery/diagnostic imaging , Image Processing, Computer-Assisted/methods , Semantics , Hysterectomy/methods
15.
Sensors (Basel) ; 24(15)2024 Aug 02.
Article in English | MEDLINE | ID: mdl-39124045

ABSTRACT

Creating an effective deep learning technique for accurately diagnosing leak signals across diverse environments is crucial for integrating artificial intelligence (AI) into the power plant industry. We propose an automatic weight redistribution ensemble model based on transfer learning (TL) for detecting leaks in diverse power plant environments, overcoming the challenges of site-specific AI methods. This innovative model processes time series acoustic data collected from multiple homogeneous sensors located at different positions into three-dimensional root-mean-square (RMS) and frequency volume features, enabling accurate leak detection. Utilizing a TL-driven, two-stage learning process, we first train residual-network-based models for each domain using these preprocessed features. Subsequently, these models are retrained in an ensemble for comprehensive leak detection across domains, with control weight ratios finely adjusted through a softmax score-based approach. The experiment results demonstrate that the proposed method effectively distinguishes low-level leaks and noise compared to existing techniques, even when the data available for model training are very limited.
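The softmax score-based weighting mentioned in this abstract can be pictured as turning one score per domain model into normalized weights and then averaging the models' leak probabilities with those weights. How the paper derives the scores and retrains the ensemble is not described here, so the sketch below is only an assumed, simplified form with hypothetical names.

```python
import math


def softmax(scores):
    """Numerically stable softmax: non-negative weights that sum to one."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_ensemble(model_probs, model_scores):
    """Combine per-model leak probabilities using softmax weights of the scores."""
    weights = softmax(model_scores)
    return sum(w * p for w, p in zip(weights, model_probs))


# Made-up leak probabilities from three domain models and their scores.
leak_probs = [0.20, 0.85, 0.60]
scores = [0.1, 2.0, 1.0]
combined = weighted_ensemble(leak_probs, scores)
```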

16.
BMC Med ; 21(1): 19, 2023 01 16.
Article in English | MEDLINE | ID: mdl-36647108

ABSTRACT

BACKGROUND: Beginning May 7, 2022, multiple nations reported an unprecedented surge in monkeypox cases. Unlike past outbreaks, differences in affected populations, transmission mode, and clinical characteristics have been noted. With the existing uncertainties of the outbreak, real-time short-term forecasting can guide and evaluate the effectiveness of public health measures. METHODS: We obtained publicly available data on confirmed weekly cases of monkeypox at the global level and for seven countries (with the highest burden of disease at the time this study was initiated) from the Our World in Data (OWID) GitHub repository and CDC website. We generated short-term forecasts of new cases of monkeypox across the study areas using an ensemble n-sub-epidemic modeling framework based on weekly cases using 10-week calibration periods. We report and assess the weekly forecasts with quantified uncertainty from the top-ranked, second-ranked, and ensemble sub-epidemic models. Overall, we conducted 324 weekly sequential 4-week ahead forecasts across the models from the week of July 28th, 2022, to the week of October 13th, 2022. RESULTS: The last 10 of 12 forecasting periods (starting the week of August 11th, 2022) show either a plateauing or declining trend of monkeypox cases for all models and areas of study. According to our latest 4-week ahead forecast from the top-ranked model, a total of 6232 (95% PI 487.8, 12,468.0) cases could be added globally from the week of 10/20/2022 to the week of 11/10/2022. At the country level, the top-ranked model predicts that the USA will report the highest cumulative number of new cases for the 4-week forecasts (median based on OWID data: 1806 (95% PI 0.0, 5544.5)). The top-ranked and weighted ensemble models outperformed all other models in short-term forecasts. 
CONCLUSIONS: Our top-ranked model consistently predicted a decreasing trend in monkeypox cases on the global and country-specific scale during the last ten sequential forecasting periods. Our findings reflect the potential impact of increased immunity and behavioral modification among high-risk populations.


Subjects
Epidemics, Mpox, Humans, Mpox/epidemiology, Disease Outbreaks, Forecasting, Public Health
17.
Mod Pathol ; 36(5): 100118, 2023 05.
Article in English | MEDLINE | ID: mdl-36805793

ABSTRACT

Screening of lymph node metastases in colorectal cancer (CRC) can be a cumbersome task, but it is amenable to artificial intelligence (AI)-assisted diagnostic solutions. Here, we propose a deep learning-based workflow for the evaluation of CRC lymph node metastases from digitized hematoxylin and eosin-stained sections. A segmentation model was trained on 100 whole-slide images (WSIs). It achieved a Matthews correlation coefficient of 0.86 (±0.154) and an acceptable Hausdorff distance of 135.59 µm (±72.14 µm), indicating a high congruence with the ground truth. For metastasis detection, 2 models (Xception and Vision Transformer) were independently trained first on a patch-based breast cancer lymph node data set and were then fine-tuned using the CRC data set. After fine-tuning, the ensemble model showed significant improvements in the F1 score (from 0.797 to 0.949; P < .00001) and the area under the receiver operating characteristic curve (from 0.959 to 0.978; P < .00001). Four independent cohorts (3 internal and 1 external) of CRC lymph nodes were used for validation in cascading segmentation and metastasis detection models. Our approach showed excellent performance, with high sensitivity (0.995, 1.0) and specificity (0.967, 1.0) in 2 validation cohorts of adenocarcinoma cases (n = 3836 slides) when comparing slide-level labels with the ground truth (pathologist reports). Similarly, an acceptable performance was achieved in a validation cohort (n = 172 slides) with mucinous and signet-ring cell histology (sensitivity, 0.872; specificity, 0.936). The patch-based classification confidence was aggregated to overlay the potential metastatic regions within each lymph node slide for visualization. We also applied our method to a consecutive case series of lymph nodes obtained over the past 6 months at our institution (n = 217 slides). The overlays of prediction within lymph node regions matched 100% when compared with a microscope evaluation by an expert pathologist.
Our results provide the basis for a computer-assisted diagnostic tool for easy and efficient lymph node screening in patients with CRC.
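For readers unfamiliar with the metrics reported above, the following sketch computes the Matthews correlation coefficient and the F1 score from confusion-matrix counts using their standard definitions. The counts in the example are invented for illustration and are not taken from the study.

```python
import math

def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal total is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

def f1(tp, fp, fn):
    """F1 score: harmonic mean of precision and recall."""
    denom = 2 * tp + fp + fn
    return 0.0 if denom == 0 else 2 * tp / denom

if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(mcc(tp=90, fp=5, tn=95, fn=10))
    print(f1(tp=90, fp=5, fn=10))
```

Unlike F1, the MCC also takes true negatives into account, which is one reason it is often reported alongside F1 on imbalanced data.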


Subjects
Artificial Intelligence, Colorectal Neoplasms, Humans, Lymphatic Metastasis/pathology, Computer-Assisted Diagnosis, Lymph Nodes/pathology, Machine Learning, Colorectal Neoplasms/diagnosis, Colorectal Neoplasms/pathology
18.
Brief Bioinform ; 22(5)2021 09 02.
Article in English | MEDLINE | ID: mdl-33388743

ABSTRACT

MOTIVATION: mRNA location corresponds to the location of protein translation and contributes to precise spatial and temporal management of protein function. However, current methods for assigning the subcellular localization of eukaryotic mRNA have important limitations: (1) turning a multiclass problem into multiple binary classifications makes the training process tedious; (2) the majority of models trained with classical algorithms are based on the extraction of single-sequence information; (3) the existing state-of-the-art models have not reached an ideal level of prediction and generalization ability. To achieve better assignment of subcellular localization of eukaryotic mRNA, a better and more comprehensive model must be developed. RESULTS: In this paper, SubLocEP is proposed as a two-layer integrated prediction model for accurate prediction of the location of sequence samples. Unlike the existing models based on limited features, SubLocEP comprehensively considers additional feature attributes and is combined with LightGBM to generate single-feature classifiers. The initial integration model (single-layer model) is generated according to the categories of a feature. Subsequently, two single-layer integration models are weighted (sequence-based : physicochemical properties = 3:2) to produce the final two-layer model. The performance of SubLocEP on independent datasets indicates that it is an accurate and stable prediction model with strong generalization ability. Additionally, an online tool has been developed that contains the experimental data and offers a convenient way to estimate the subcellular localization of eukaryotic mRNA.
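The 3:2 weighting of the two single-layer models described above can be sketched as a weighted average of class probabilities. The abstract gives only the ratio, so combining probabilities by weighted averaging, the function name, and the example values are assumptions made for illustration.

```python
def fuse(seq_probs, phys_probs, w_seq=3.0, w_phys=2.0):
    """Weighted average of two models' class-probability vectors.

    seq_probs: class probabilities from the sequence-based model.
    phys_probs: class probabilities from the physicochemical model.
    The default weights reflect the 3:2 ratio stated in the abstract.
    Returns the fused probabilities and the index of the top class.
    """
    total = w_seq + w_phys
    fused = [(w_seq * a + w_phys * b) / total
             for a, b in zip(seq_probs, phys_probs)]
    label = max(range(len(fused)), key=fused.__getitem__)
    return fused, label

if __name__ == "__main__":
    # Hypothetical probabilities over four subcellular locations.
    probs, label = fuse([0.5, 0.2, 0.2, 0.1], [0.1, 0.6, 0.2, 0.1])
    print(probs, label)
```

Because the weights are normalized by their sum, the fused vector remains a valid probability distribution whenever both inputs are.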


Subjects
Genetic Models, Proteins/genetics, Long Noncoding RNA/genetics, Messenger RNA/genetics, Support Vector Machine, Genetic Databases, Eukaryota/cytology, Eukaryota/genetics, Eukaryota/metabolism, Eukaryotic Cells/metabolism, Eukaryotic Cells/ultrastructure, Humans, Proteins/metabolism, Long Noncoding RNA/metabolism, Messenger RNA/metabolism
19.
Amino Acids ; 55(9): 1121-1136, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37402073

ABSTRACT

The ongoing COVID-19 pandemic has caused dramatic loss of human life. There is an urgent need for safe and efficient drugs against coronavirus infection. Anti-coronavirus peptides (ACovPs) can inhibit coronavirus infection. With high-efficiency, low-toxicity, and broad-spectrum inhibitory effects on coronaviruses, they are promising candidates to be developed into a new type of anti-coronavirus drug. Experimental identification of ACovPs is the traditional approach, but it is inefficient and expensive. With the accumulation of experimental data on ACovPs, computational prediction provides a cheaper and faster way to find candidate anti-coronavirus peptides. In this study, we combine several state-of-the-art machine learning methodologies into an ensemble, building nine classification models for the prediction of ACovPs. These models were pre-trained using deep neural networks, and the performance of our ensemble model, ACP-Dnnel, was evaluated across three datasets and an independent dataset. We followed Chou's 5-step rules: (1) we constructed the benchmark datasets data1, data2, and data3 for training and testing, and introduced the independent validation dataset ACVP-M; (2) we analyzed the peptide sequence composition features of the benchmark dataset; (3) we constructed the ACP-Dnnel model with a deep convolutional neural network (DCNN) merged with a bi-directional long short-term memory (BiLSTM) network as the base model for pre-training to extract the features embedded in the benchmark dataset, and then combined nine classification algorithms into an ensemble that predicts by voting; (4) tenfold cross-validation was introduced during the training process, and the final model performance was evaluated; (5) finally, we constructed a user-friendly web server accessible to the public at http://150.158.148.228:5000/ . The highest accuracy (ACC) of ACP-Dnnel reaches 97%, and the Matthews correlation coefficient (MCC) value exceeds 0.9.
On three different datasets, its average accuracy is 96.0%. On the latest independent validation dataset, ACP-Dnnel improved the MCC, SP, and ACC values by 6.2%, 7.5%, and 6.3%, respectively. These results suggest that ACP-Dnnel can be helpful for the laboratory identification of ACovPs, speeding up anti-coronavirus peptide drug discovery and development. The web server for anti-coronavirus peptide prediction is available at http://150.158.148.228:5000/ .
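The voting step in the abstract above, in which nine classifiers jointly decide whether a peptide is an ACovP, can be illustrated with a simple majority vote. The abstract does not state the voting rule, so hard (unweighted) majority voting and the function name are assumptions made for illustration.

```python
def majority_vote(votes):
    """Return 1 if more than half of the binary votes are 1, else 0.

    votes: a list of 0/1 predictions, one per classifier. With an odd
    number of classifiers (nine in the abstract) ties cannot occur.
    """
    return 1 if 2 * sum(votes) > len(votes) else 0

if __name__ == "__main__":
    # Hypothetical predictions from nine classifiers for one peptide.
    print(majority_vote([1, 1, 0, 1, 0, 1, 1, 0, 1]))
```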


Subjects
COVID-19, Pandemics, Humans, Peptides/pharmacology, Peptides/chemistry, Computer Neural Networks, Algorithms, Machine Learning
20.
J Biomed Inform ; 144: 104390, 2023 08.
Article in English | MEDLINE | ID: mdl-37182592

ABSTRACT

Recent work has shown that predictive models can be applied to structured electronic health record (EHR) data to stratify autism likelihood from an early age (<1 year). Integrating clinical narratives (or notes) with structured data has been shown to improve prediction performance in other clinical applications, but the added predictive value of this information in early autism prediction has not yet been explored. In this study, we aimed to enhance the performance of early autism prediction by using both structured EHR data and clinical narratives. We built models based on structured data and clinical narratives separately, and then an ensemble model that integrated both sources of data. Using data from Duke University Health System spanning 14 years, we assessed the value of these models for predicting later autism diagnosis (by age 4 years) from data collected from ages 30 to 360 days. Our sample included 11,750 children above age 3 years (385 meeting autism diagnostic criteria). The ensemble model for autism prediction showed superior performance and at age 30 days achieved 46.8% sensitivity (95% confidence interval, CI: 22.0%, 52.9%), 28.0% positive predictive value (PPV) at high (90%) specificity (CI: 2.0%, 33.1%), and AUC4 (with at least 4-year follow-up for controls) reaching 0.769 (CI: 0.715, 0.811). Prediction by 360 days achieved 44.5% sensitivity (CI: 23.6%, 62.9%) and 13.7% PPV at high (90%) specificity (CI: 9.6%, 18.9%), with AUC4 reaching 0.797 (CI: 0.746, 0.840). Results show that incorporating clinical narratives in early autism prediction achieved promising accuracy by age 30 days, outperforming models based on structured data only. Furthermore, findings suggest that additional features learned from clinician narratives might be hypothesis generating for understanding early development in autism.
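Reporting sensitivity and PPV "at 90% specificity", as in the abstract above, means first choosing a score threshold that keeps false positives among controls within 10% and then measuring the other metrics at that threshold. The sketch below shows one way to do this; the threshold rule, function name, and example data are assumptions made for illustration and are not taken from the study.

```python
import math

def metrics_at_specificity(scores, labels, target_spec=0.90):
    """Sensitivity, PPV and achieved specificity at a fixed specificity.

    scores: model risk scores (higher means more likely positive).
    labels: 1 for cases, 0 for controls.
    A sample is flagged when its score is strictly above the threshold.
    """
    neg = sorted((s for s, y in zip(scores, labels) if y == 0), reverse=True)
    pos = [s for s, y in zip(scores, labels) if y == 1]
    # Largest number of false positives allowed by the target specificity
    # (the small epsilon guards against floating-point rounding).
    max_fp = math.floor(len(neg) * (1.0 - target_spec) + 1e-9)
    thr = neg[max_fp] if max_fp < len(neg) else float("-inf")
    tp = sum(s > thr for s in pos)
    fp = sum(s > thr for s in neg)
    sensitivity = tp / len(pos) if pos else 0.0
    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    specificity = 1.0 - fp / len(neg) if neg else 0.0
    return sensitivity, ppv, specificity

if __name__ == "__main__":
    # Hypothetical scores: ten controls followed by three cases.
    scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 9.5, 3]
    labels = [0] * 10 + [1] * 3
    print(metrics_at_specificity(scores, labels))
```

Because PPV depends on how rare the condition is in the sample, a model can have moderate sensitivity at 90% specificity and still a low PPV when cases are a small fraction of the cohort.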


Subjects
Autistic Disorder, Electronic Health Records, Child, Humans, Infant, Preschool Child, Autistic Disorder/diagnosis, Predictive Value of Tests, Narration, Electronics