Results 1 - 20 of 7,052
1.
J Korean Med Sci ; 36(5): e46, 2021 Feb 01.
Article in English | MEDLINE | ID: mdl-33527788

ABSTRACT

BACKGROUND: It is difficult to distinguish subtle differences shown in computed tomography (CT) images of coronavirus disease 2019 (COVID-19) and bacterial pneumonia patients, which often leads to an inaccurate diagnosis. It is desirable to design and evaluate interpretable feature extraction techniques to describe the patient's condition. METHODS: This is a retrospective cohort study of 170 patients with confirmed COVID-19 or bacterial pneumonia whose CT images were acquired at Yeungnam University Hospital in Daegu, Korea. The lung and lesion regions were segmented to crop the lesion into 2D patches to train a classifier model that could differentiate between COVID-19 and bacterial pneumonia. The K-means algorithm was used to cluster deep features extracted by the trained model into 20 groups. Each lesion patch cluster was described by a characteristic imaging term for comparison. For each CT image containing multiple lesions, a histogram of lesion types was constructed using the cluster information. Finally, a Support Vector Machine classifier was trained with the histogram and radiomics features to distinguish diseases and severity. RESULTS: The 20 clusters constructed from 170 patients were reviewed based on common radiographic appearance types. Two clusters showed typical findings of COVID-19, and two other clusters showed typical findings related to bacterial pneumonia. Notably, one cluster showed bilateral diffuse ground-glass opacities (GGOs) in the central and peripheral lungs and was considered to be a key factor for severity classification. The proposed method achieved an accuracy of 91.2% for classifying COVID-19 and bacterial pneumonia patients and 95% for severity classification. The CT quantitative parameters represented by the values of cluster 8 were correlated with existing laboratory data and clinical parameters.
CONCLUSION: Deep chest CT analysis with constructed lesion clusters revealed well-known COVID-19 CT manifestations comparable to manual CT analysis. The constructed histogram features improved accuracy for both diseases and severity classification, and showed correlations with laboratory data and clinical parameters. The constructed histogram features can provide guidance for improved analysis and treatment of COVID-19.
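The histogram step described in the abstract above can be illustrated with a minimal sketch. It assumes each lesion patch of a scan has already been assigned to one of the 20 K-means clusters; the function name `lesion_histogram` and the example cluster labels are hypothetical and are not taken from the paper.

```python
from collections import Counter

def lesion_histogram(cluster_ids, n_clusters=20):
    """Return the normalized histogram of lesion-cluster labels for one CT scan."""
    if not cluster_ids:
        # A scan without detected lesions yields an all-zero histogram.
        return [0.0] * n_clusters
    counts = Counter(cluster_ids)
    total = len(cluster_ids)
    return [counts.get(k, 0) / total for k in range(n_clusters)]

# Hypothetical scan: four lesion patches assigned to clusters 8, 8, 3, and 8.
hist = lesion_histogram([8, 8, 3, 8])
```

A vector like `hist` could then be concatenated with radiomics features and passed to any classifier, such as the support vector machine the abstract mentions.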


Subject(s)
/diagnostic imaging , Lung/diagnostic imaging , Pneumonia, Bacterial/diagnostic imaging , Tomography, X-Ray Computed , Adult , Aged , Algorithms , Artificial Intelligence , Cluster Analysis , Deep Learning , Female , Humans , Male , Middle Aged , Pattern Recognition, Automated , Reproducibility of Results , Republic of Korea/epidemiology , Retrospective Studies , Severity of Illness Index , Support Vector Machine
2.
J Med Internet Res ; 23(1): e25535, 2021 01 06.
Article in English | MEDLINE | ID: mdl-33404516

ABSTRACT

BACKGROUND: Effectively identifying patients with COVID-19 using nonpolymerase chain reaction biomedical data is critical for achieving optimal clinical outcomes. Currently, there is a lack of comprehensive understanding in various biomedical features and appropriate analytical approaches for enabling the early detection and effective diagnosis of patients with COVID-19. OBJECTIVE: We aimed to combine low-dimensional clinical and lab testing data, as well as high-dimensional computed tomography (CT) imaging data, to accurately differentiate between healthy individuals, patients with COVID-19, and patients with non-COVID viral pneumonia, especially at the early stage of infection. METHODS: In this study, we recruited 214 patients with nonsevere COVID-19, 148 patients with severe COVID-19, 198 noninfected healthy participants, and 129 patients with non-COVID viral pneumonia. The participants' clinical information (ie, 23 features), lab testing results (ie, 10 features), and CT scans upon admission were acquired and used as 3 input feature modalities. To enable the late fusion of multimodal features, we constructed a deep learning model to extract a 10-feature high-level representation of CT scans. We then developed 3 machine learning models (ie, k-nearest neighbor, random forest, and support vector machine models) based on the combined 43 features from all 3 modalities to differentiate between the following 4 classes: nonsevere, severe, healthy, and viral pneumonia. RESULTS: Multimodal features provided substantial performance gain from the use of any single feature modality. All 3 machine learning models had high overall prediction accuracy (95.4%-97.7%) and high class-specific prediction accuracy (90.6%-99.9%). CONCLUSIONS: Compared to the existing binary classification benchmarks that are often focused on single-feature modality, this study's hybrid deep learning-machine learning framework provided a novel and effective breakthrough for clinical applications. 
Our findings, which come from a relatively large sample size, and analytical workflow will supplement and assist with clinical decision support for current COVID-19 diagnostic methods and other clinical applications with high-dimensional multimodal biomedical features.
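As a rough illustration of the feature-level combination described above (23 clinical, 10 laboratory, and 10 CT-derived features merged into 43), the sketch below concatenates the three modalities and classifies with a plain k-nearest-neighbor vote. The function names and the toy vectors are hypothetical; the deep learning model that produces the CT representation is assumed to exist elsewhere and is not reproduced here.

```python
import math
from collections import Counter

def fuse_features(clinical, lab, ct_embedding):
    """Concatenate the three modalities into a single 43-dimensional vector."""
    if (len(clinical), len(lab), len(ct_embedding)) != (23, 10, 10):
        raise ValueError("expected 23 clinical, 10 lab, and 10 CT features")
    return list(clinical) + list(lab) + list(ct_embedding)

def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training vectors closest to query (Euclidean)."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Toy, made-up vectors (constant per sample), purely to exercise the code.
train_x = [fuse_features([v] * 23, [v] * 10, [v] * 10) for v in (0.0, 0.1, 1.0, 1.1)]
train_y = ["healthy", "healthy", "severe", "severe"]
query = fuse_features([0.9] * 23, [0.9] * 10, [0.9] * 10)
label = knn_predict(train_x, train_y, query, k=3)
```

In practice the modalities would need to be scaled to comparable ranges before a distance-based classifier is applied; that step is omitted here.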


Subject(s)
/diagnosis , Decision Support Systems, Clinical , Health , Machine Learning , Pneumonia, Viral/diagnosis , /diagnostic imaging , Diagnosis, Differential , Humans , Middle Aged , Pneumonia, Viral/diagnostic imaging , Support Vector Machine , Tomography, X-Ray Computed
3.
Toxicol Lett ; 340: 4-14, 2021 Apr 01.
Article in English | MEDLINE | ID: mdl-33421549

ABSTRACT

Reproductive toxicity endpoints are a significant safety concern in the assessment of the adverse effects of chemicals in drug discovery. Computational models that can accurately predict a chemical's toxic potential are increasingly pursued to replace traditional animal experiments. Thus, ensemble learning models were built to predict the reproductive toxicity of compounds. Our ensemble models were developed using support vector machine, random forest, and extreme gradient boosting methods and 9 molecular fingerprints calculated for a dataset containing 1823 chemicals. The best prediction performance was achieved by the Ensemble-Top12 model, with an accuracy (ACC) of 86.33 %, a sensitivity (SEN) of 82.02 %, a specificity (SPE) of 90.19 %, and an area under the receiver operating characteristic curve (AUC) of 0.937 in 5-fold cross-validation and ACC, SEN, SPE, and AUC values of 84.38 %, 86.90 %, 90.67 %, and 0.920, respectively, in external validation. We also defined the applicability domain (AD) of the ensemble model by calculating the Tanimoto distance of the training set. Compared with models in the existing literature, our ensemble model achieves relatively high ACC, SPE, and AUC values. We also identified several fingerprint features related to chemical reproductive toxicity. Considering the model's performance, we recommend using the Ensemble-Top12 model to predict reproductive toxicity in early drug development.
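The applicability-domain idea mentioned above can be sketched with a Tanimoto distance on binary fingerprints. This is only an illustration: fingerprints are represented as sets of on-bit indices, and the nearest-neighbor rule and the 0.5 threshold are assumptions made for the example, not the criterion used by the authors.

```python
def tanimoto_distance(fp_a, fp_b):
    """Tanimoto (Jaccard) distance between two fingerprints given as sets of on-bit indices."""
    a, b = set(fp_a), set(fp_b)
    union = a | b
    if not union:
        # Two empty fingerprints are treated as identical.
        return 0.0
    return 1.0 - len(a & b) / len(union)

def in_applicability_domain(query_fp, training_fps, threshold=0.5):
    """True if the nearest training compound lies within the (assumed) distance threshold."""
    return min(tanimoto_distance(query_fp, fp) for fp in training_fps) <= threshold

# Hypothetical fingerprints, for illustration only.
training = [{1, 2, 3, 4}, {10, 11}]
near = in_applicability_domain({1, 2, 3, 5}, training)   # shares 3 of 5 bits with the first compound
far = in_applicability_domain({20, 21}, training)        # shares no bits with any training compound
```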


Subject(s)
Algorithms , Machine Learning , Reproduction/drug effects , Animals , Computer Simulation , Humans , Quantitative Structure-Activity Relationship , Support Vector Machine
4.
Food Chem ; 339: 127795, 2021 Mar 01.
Article in English | MEDLINE | ID: mdl-32836023

ABSTRACT

Anthocyanin derivatives and chromatic characteristics of 234 different-vintage red wines were investigated based on a targeted HPLC-MS/MS and CIELAB approach. The K-means cluster analysis showed that the evolution pattern varies amongst anthocyanin derivative classes. Their stabilities are: pinotins > flavanyl-pyranoanthocyanins, vitisin A > monomeric anthocyanin, direct anthocyanin-flavan-3-ols condensation products > vitisin B, anthocyanin ethyl-linked flavan-3-ols products. The proportion of most pyranoanthocyanins becomes more significant among all detected anthocyanin derivatives during wine aging, whereas flavanol-related anthocyanin derivatives (except for flavanyl-pyranoanthocyanins) decreased drastically. PLSR showed that aging tawny characteristics are related to pyranoanthocyanins except for vitisin B, especially pinotins, whereas monomeric anthocyanins and flavanol-related derivatives (except for flavanyl-pyranoanthocyanins) contribute to red violet color. Aging color density, however, is more associated with the content of vitisin A and flavanyl-pyranoanthocyanins. Two predictive models based on random forest and support vector machine modeling showed good performance in predicting the extent of wine aging.


Subject(s)
Anthocyanins/analysis , Anthocyanins/metabolism , Food Analysis/methods , Metabolomics/methods , Wine/analysis , Benzofurans/analysis , Benzofurans/metabolism , Chromatography, High Pressure Liquid , Color , Food Analysis/statistics & numerical data , Least-Squares Analysis , Multivariate Analysis , Phenols/analysis , Phenols/metabolism , Support Vector Machine , Tandem Mass Spectrometry , Time Factors
5.
Water Res ; 188: 116535, 2021 Jan 01.
Article in English | MEDLINE | ID: mdl-33147564

ABSTRACT

Similar to the worldwide proliferation of urbanization, micropollutants have been involved in aquatic and ecological environmental systems. These pollutants have the propensity to wreak havoc on human health and the ecological system; hence, it is important to persistently monitor micropollutants in the environment. Micropollutants are commonly quantified via target analysis using high resolution mass spectrometry and the stable isotope labeled (SIL) standard. However, the cost-intensiveness of this standard presents a major obstacle in measuring micropollutants. This study resolved this problem by developing data-driven models, including deep learning (DL) and machine learning (ML), to estimate the concentration of micropollutants without resorting to the SIL standard. Our study hypothesized that natural organic matter (NOM) could replace internal standards if there was a specific mass spectrum (MS) subset, including NOM information, which correlated with an SIL standard peak. Therefore, we analyzed the MS to find the specific MS subsets for replacing the SIL standard peak. Thirty-five alternative MS subsets were determined for applying DL and ML as input data. Thereafter, we trained four different DL models, namely, ResNet101, GoogLeNet, VGG16, and Inception v3, as well as three different ML models, i.e., random forest (RF), support vector machine (SVM), and artificial neural network (ANN). A total of 680 MS data were used for the model training to estimate five different micropollutants, namely Sulpiride, Metformin, and Benzotriazole. Among the DL models, ResNet 101 exhibited the highest model performance, showing that the average validation R2 and MSE were 0.84 and 0.26 ng/L, respectively, while RF was the best in the ML models, manifesting R2 and MSE values of 0.69 and 0.58 ng/L. The trained models showed accurate training and validation results for the estimation of the five micropollutant concentrations. 
Therefore, this study demonstrates that the suggested analysis has potential as an alternative micropollutant measurement method with advantages in speed and cost.
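The regression metrics quoted in this abstract (R2 and MSE) can be computed as in the following minimal sketch. The helper names and the toy concentrations are hypothetical and unrelated to the study's data.

```python
import math

def mse(y_true, y_pred):
    """Mean squared error between measured and estimated values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root-mean-square error, expressed in the same units as the measurements."""
    return math.sqrt(mse(y_true, y_pred))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Toy concentrations (made-up values), purely to exercise the functions.
measured = [1.0, 2.0, 3.0, 4.0]
estimated = [1.0, 2.0, 3.0, 5.0]
scores = (mse(measured, estimated), rmse(measured, estimated), r_squared(measured, estimated))
```

An MSE carries squared units, so an error stated in ng/L may be easier to interpret as a root-mean-square error; the sketch returns both for that reason.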


Subject(s)
Machine Learning , Neural Networks, Computer , Humans , Isotopes , Reference Standards , Support Vector Machine
6.
Environ Pollut ; 270: 116281, 2021 Feb 01.
Article in English | MEDLINE | ID: mdl-33348140

ABSTRACT

Mapping soil contamination enables the delineation of areas where protection measures are needed. Traditional soil sampling on a grid pattern followed by chemical analysis and geostatistical interpolation methods (GIMs), such as Kriging interpolation, can be costly, slow and not well-suited to highly heterogeneous soil environments. Here we propose a novel method to map soil contamination by combining high-resolution aerial imaging (HRAI) with machine learning algorithms. To support model establishment and validation, 1068 soil samples were collected from an arsenic (As) contaminated area in Zhongxiang, Hubei province, China. The average arsenic concentration was 39.88 mg/kg (SD = 213.70 mg/kg), with individual sample points determined as low risk (66.9%), medium risk (29.4%), or high risk (3.7%). Then, identified features were extracted from an HRAI image of the study area. Four machine learning algorithms were developed to predict As risk levels, including (i) support vector machine (SVM), (ii) multi-layer perceptron (MLP), (iii) random forest (RF), and (iv) extreme random forest (ERF). Among these, we found that the ERF algorithm performed best overall and that its prediction performance was generally better than that of traditional Kriging interpolation. The accuracy of ERF in test area 1 reached 0.87, performing better than RF (0.81), MLP (0.78) and SVM (0.77). The F1-score of ERF for discerning high-risk points in test area 1 was as high as 0.8. The complexity of the distribution of points with different risk levels was a decisive factor in model prediction ability. Identified features in the study area associated with fertilizer factories had the most important contribution to the ERF model. This study demonstrates that HRAI combined with machine learning has good potential to predict As soil risk levels.
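Because high-risk points make up only a small share of the samples, the abstract reports a class-specific F1-score alongside accuracy. A minimal sketch of a per-class F1 computation is given below; the function name and the example labels are hypothetical.

```python
def f1_for_class(y_true, y_pred, positive):
    """F1-score treating `positive` as the class of interest and all others as negative."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined), so report 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Made-up risk labels for five sample points.
truth = ["high", "high", "low", "medium", "high"]
predicted = ["high", "low", "low", "high", "high"]
f1_high = f1_for_class(truth, predicted, positive="high")
```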


Subject(s)
Arsenic , Arsenic/analysis , China , Environmental Pollution , Machine Learning , Soil , Support Vector Machine
7.
Food Chem ; 335: 127640, 2021 Jan 15.
Article in English | MEDLINE | ID: mdl-32738536

ABSTRACT

In order to distinguish different vegetable oils, adulterated vegetable oils, and to identify and quantify counterfeit vegetable oils, a method based on a small sample size of total synchronous fluorescence (TSyF) spectra combined with convolutional neural network (CNN) was proposed. Four typical vegetable oils were classified by three ways of fine-tuning the pre-trained CNN, the pre-trained CNN as a feature extractor, and traditional chemometrics. The pre-trained CNN was combined with support vector machines to distinguish adulterated sesame oil and counterfeit sesame oil separately with 100% correct classification rates. The pre-trained CNN combined with partial least square regression was used to predict the level of counterfeit sesame oil. The coefficient of determination for calibration (Rc2) values were all greater than 0.99, and the root mean square errors of validation were 0.81% and 1.72%, respectively. These results show that it is feasible to combine TSyF spectra with CNN for vegetable oil identification.


Subject(s)
Neural Networks, Computer , Plant Oils/chemistry , Spectrometry, Fluorescence/methods , Food Quality , Fraud , Least-Squares Analysis , Sesame Oil/chemistry , Support Vector Machine
8.
Food Chem ; 337: 127986, 2021 Feb 01.
Article in English | MEDLINE | ID: mdl-32920269

ABSTRACT

We have developed a novel approach that involves inception-resnet network (IRN) modeling based on infrared spectroscopy (IR) for rapid and specific detection of the fish allergen parvalbumin. SDS-PAGE and ELISA were used to validate the new method. Through training and learning with parvalbumin IR spectra from 16 fish species, IRN, support vector machine (SVM), and random forest (RF) models were successfully established and compared. The IRN model extracted highly representative features from the IR spectra, leading to high accuracy in recognizing parvalbumin (up to 97.3%) in a variety of seafood matrices. The proposed infrared spectroscopic IRN (IR-IRN) method was rapid (~20 min, cf. ELISA ~4 h) and required minimal expert knowledge for application. Thus, it could be extended for large-scale field screening and identification of parvalbumin or other potential allergens in complex food matrices.


Subject(s)
Fish Products/analysis , Fish Proteins/analysis , Neural Networks, Computer , Parvalbumins/analysis , Spectrophotometry, Infrared/statistics & numerical data , Allergens/chemistry , Animals , Electrophoresis, Polyacrylamide Gel , Enzyme-Linked Immunosorbent Assay , Fishes/immunology , Food Analysis/methods , Food Analysis/statistics & numerical data , Food Hypersensitivity , Mice, Inbred BALB C , Parvalbumins/immunology , Reproducibility of Results , Spectrophotometry, Infrared/methods , Support Vector Machine
9.
Neural Netw ; 133: 193-206, 2021 Jan.
Article in English | MEDLINE | ID: mdl-33220643

ABSTRACT

Motor imagery (MI) brain-computer interface (BCI) and neurofeedback (NF) with electroencephalogram (EEG) signals are commonly used for motor function improvement in healthy subjects and to restore neurological functions in stroke patients. Generally, in order to decrease noisy and redundant information in unrelated EEG channels, channel selection methods are used, which provide feasible BCI and NF implementations with better performance. Our assumption is that there are causal interactions between the channels of the EEG signal in MI tasks that are repeated in different trials of a BCI and NF experiment. Therefore, a novel method for EEG channel selection is proposed which is based on Granger causality (GC) analysis. Additionally, a machine-learning approach is used to cluster independent component analysis (ICA) components of the EEG signal into artifact and normal EEG clusters. After channel selection, features are extracted using the common spatial pattern (CSP) and regularized CSP (RCSP), and MI tasks are classified into left and right hand MI with the k-nearest neighbor (k-NN), support vector machine (SVM) and linear discriminant analysis (LDA) classifiers. The goal of this study is to achieve a method that uses fewer EEG channels with higher classification performance in MI-based BCI and NF by causal constraint. Applied to the Physionet MI dataset, the proposed GC-based method, with only eight selected channels, the RCSP feature extractor, and the best classifier for each subject, results in 93.03% accuracy, 92.93% sensitivity, and 93.12% specificity, which are increases of 3.95%, 3.73%, and 4.13%, respectively, in comparison with a correlation-based channel selection method.


Subject(s)
Brain-Computer Interfaces , Electroencephalography/methods , Imagination/physiology , Movement/physiology , Neurofeedback/methods , Neurofeedback/physiology , Brain-Computer Interfaces/trends , Causality , Discriminant Analysis , Humans , Support Vector Machine
10.
Neurology ; 96(5): e758-e771, 2021 02 02.
Article in English | MEDLINE | ID: mdl-33361262

ABSTRACT

OBJECTIVE: We assessed preoperative structural brain networks and clinical characteristics of patients with drug-resistant temporal lobe epilepsy (TLE) to identify correlates of postsurgical seizure recurrences. METHODS: We examined data from 51 patients with TLE who underwent anterior temporal lobe resection (ATLR) and 29 healthy controls. For each patient, using the preoperative structural, diffusion, and postoperative structural MRI, we generated 2 networks: presurgery network and surgically spared network. Standardizing these networks with respect to controls, we determined the number of abnormal nodes before surgery and expected to be spared by surgery. We incorporated these 2 abnormality measures and 13 commonly acquired clinical data from each patient into a robust machine learning framework to estimate patient-specific chances of seizures persisting after surgery. RESULTS: Patients with more abnormal nodes had a lower chance of complete seizure freedom at 1 year and, even if seizure-free at 1 year, were more likely to relapse within 5 years. The number of abnormal nodes was greater and their locations more widespread in the surgically spared networks of patients with poor outcome than in patients with good outcome. We achieved an area under the curve of 0.84 ± 0.06 and specificity of 0.89 ± 0.09 in predicting unsuccessful seizure outcomes (International League Against Epilepsy [ILAE] 3-5) as opposed to complete seizure freedom (ILAE 1) at 1 year. Moreover, the model-predicted likelihood of seizure relapse was significantly correlated with the grade of surgical outcome at year 1 and associated with relapses up to 5 years after surgery. CONCLUSION: Node abnormality offers a personalized, noninvasive marker that can be combined with clinical data to better estimate the chances of seizure freedom at 1 year and subsequent relapse up to 5 years after ATLR. 
CLASSIFICATION OF EVIDENCE: This study provides Class II evidence that node abnormality predicts postsurgical seizure recurrence.
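The node-abnormality measure described above (standardizing a patient's network against controls and counting abnormal nodes) could look roughly like the following sketch. The per-node values, the use of a z-score, and the cutoff of 2 are all assumptions made for illustration; the abstract does not state the exact node metric or threshold.

```python
from statistics import mean, stdev

def count_abnormal_nodes(patient, controls, z_threshold=2.0):
    """Count nodes whose value deviates from the control distribution by more than z_threshold.

    patient:  list of per-node values for one patient
    controls: list of per-node value lists, one list per healthy control
    """
    abnormal = 0
    for node, value in enumerate(patient):
        reference = [c[node] for c in controls]
        sd = stdev(reference)
        # Skip nodes with no variability in controls to avoid division by zero.
        if sd > 0 and abs((value - mean(reference)) / sd) > z_threshold:
            abnormal += 1
    return abnormal

# Made-up values for a two-node network and three controls.
controls = [[1.0, 5.0], [2.0, 5.5], [3.0, 6.0]]
n_abnormal = count_abnormal_nodes([5.0, 5.6], controls)
```

Computing the same count once on the presurgery network and once on the surgically spared network would give the two abnormality measures the abstract refers to.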


Subject(s)
Anterior Temporal Lobectomy/methods , Brain/surgery , Drug Resistant Epilepsy/surgery , Epilepsy, Temporal Lobe/surgery , Neural Pathways/surgery , Support Vector Machine , Adult , Brain/diagnostic imaging , Case-Control Studies , Diffusion Magnetic Resonance Imaging , Drug Resistant Epilepsy/diagnostic imaging , Epilepsy, Temporal Lobe/diagnostic imaging , Female , Humans , Machine Learning , Magnetic Resonance Imaging , Male , Middle Aged , Neural Pathways/diagnostic imaging , Postoperative Period , Preoperative Period , Recurrence , Treatment Outcome
11.
Article in English | MEDLINE | ID: mdl-33322123

ABSTRACT

Substances that do not degrade over time have proven to be harmful to the environment and are dangerous to living organisms. Being able to predict the biodegradability of substances without costly experiments is useful. Recently, quantitative structure-activity relationship (QSAR) models have offered effective solutions to this problem. However, the molecular descriptor datasets usually suffer from unbalanced class distributions, which adversely affect the efficiency and generalization of the derived models. Accordingly, this study aims at validating the performance of balanced random trees (RTs) and boosted C5.0 decision trees (DTs) in constructing QSAR models that classify the ready biodegradation of substances, and their ability to deal with unbalanced data. The balanced RTs algorithm builds individual trees using balanced bootstrap samples, while the boosted C5.0 DT is modeled using cost-sensitive learning. We employed the two-dimensional molecular descriptor dataset, which is publicly available through the University of California, Irvine (UCI) machine learning repository. The molecular descriptors were ranked according to their contributions to the balanced RTs classification process. The performance of the proposed models was compared with previously reported results. Based on the statistical measures, the experimental results showed that the proposed models outperform the classification results of the support vector machine (SVM), K-nearest neighbors (KNN), and discriminant analysis (DA). Classification measures were analyzed in terms of accuracy, sensitivity, specificity, precision, false positive rate, false negative rate, F1 score, receiver operating characteristic (ROC) curve, and area under the ROC curve (AUROC).
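A balanced bootstrap sample, as used when growing each tree in the balanced RTs approach mentioned above, can be sketched as follows. The choice to draw as many samples per class as the smallest class contains is an assumption for this example, and the class labels are hypothetical.

```python
import random

def balanced_bootstrap(labels, seed=0):
    """Return indices of a bootstrap sample with an equal number of draws from every class."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    # Assumption: each class contributes as many draws as the smallest class has members.
    n_per_class = min(len(idx) for idx in by_class.values())
    sample = []
    for idx in by_class.values():
        # Sampling with replacement, as in an ordinary bootstrap.
        sample.extend(rng.choice(idx) for _ in range(n_per_class))
    return sample

# Hypothetical labels: 3 "ready" and 7 "not_ready" biodegradable substances.
labels = ["ready"] * 3 + ["not_ready"] * 7
sample_idx = balanced_bootstrap(labels)
```

Each tree would be trained on a different sample by varying `seed`.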


Subject(s)
Decision Trees , Machine Learning , Support Vector Machine , Algorithms , Discriminant Analysis , Humans , ROC Curve
12.
PLoS One ; 15(12): e0243907, 2020.
Article in English | MEDLINE | ID: mdl-33320890

ABSTRACT

One of the fundamental challenges when dealing with medical imaging datasets is class imbalance. Class imbalance occurs when the number of instances in the class of interest is relatively low compared to the rest of the data. This study aims to apply oversampling strategies in an attempt to balance the classes and improve classification performance. We evaluated four different classifiers, namely k-nearest neighbors (k-NN), support vector machine (SVM), multilayer perceptron (MLP) and decision trees (DT), with 73 oversampling strategies. In this work, we used imbalanced learning oversampling techniques to improve classification in datasets that are distinctively sparser and clustered. This work reports the best oversampling and classifier combinations and concludes that using oversampling methods always outperforms using no oversampling, hence improving the classification results.
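The simplest member of the oversampling family discussed above is random oversampling, which duplicates minority-class samples until every class is as large as the largest one. The sketch below shows only this baseline; it is not one of the specific 73 strategies evaluated in the study, and the names and toy data are hypothetical.

```python
import random
from collections import Counter

def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen minority samples until all classes have equal size."""
    rng = random.Random(seed)
    by_class = {}
    for s, y in zip(samples, labels):
        by_class.setdefault(y, []).append(s)
    target = max(len(items) for items in by_class.values())
    out_x, out_y = [], []
    for y, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for s in items + extra:
            out_x.append(s)
            out_y.append(y)
    return out_x, out_y

# Toy data: five majority samples and two minority samples.
x = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.9], [1.0]]
y = ["control"] * 5 + ["patient"] * 2
x_bal, y_bal = random_oversample(x, y)
class_counts = Counter(y_bal)
```

Oversampling is normally applied to the training split only, so that duplicated samples do not leak into the evaluation data.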


Subject(s)
Diabetes Mellitus/diagnostic imaging , Diabetic Neuropathies/diagnostic imaging , Machine Learning , Magnetic Resonance Imaging , Algorithms , Decision Trees , Diabetes Mellitus/classification , Diabetes Mellitus/pathology , Diabetic Neuropathies/classification , Diabetic Neuropathies/pathology , Female , Humans , Male , Neuroimaging/methods , Support Vector Machine
13.
Sheng Wu Yi Xue Gong Cheng Xue Za Zhi ; 37(6): 1037-1044, 2020 Dec 25.
Article in Chinese | MEDLINE | ID: mdl-33369343

ABSTRACT

To enhance the accuracy of computer-aided diagnosis of adolescent depression based on electroencephalogram signals, this study collected signals of 32 female adolescents (16 depressed and 16 healthy, age: 16.3 ± 1.3) with eyes closed for 4 min in a resting state. First, based on the phase synchronization between the signals, the phase-locked value (PLV) method was used to calculate brain functional connectivity in the θ and α frequency bands, respectively. Then, based on the graph theory method, the network parameters, such as strength of the weighted network, average characteristic path length, and average clustering coefficient, were calculated separately ( P < 0.05). Next, using the relationship between multiple thresholds and network parameters, the area under the curve (AUC) of each network parameter was extracted as new features ( P < 0.05). Finally, support vector machine (SVM) was used to classify the two groups with the network parameters and their AUC as features. The study results show that with strength, average characteristic path length, and average clustering coefficient as features, the classification accuracy in the θ band is increased from 69% to 71%, 66% to 77%, and 50% to 68%, respectively. In the α band, the accuracy is increased from 72% to 79%, 69% to 82%, and 65% to 75%, respectively. Overall, when the AUC of network parameters was used as a feature in the α band, the classification accuracy is improved compared to the network parameter feature. In the θ band, only the AUC of average clustering coefficient was applied to classification, and the accuracy is improved by 17.6%. The study proved that based on graph theory, the method of feature optimization of brain function network could provide some theoretical support for the computer-aided diagnosis of adolescent depression.
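The phase-locking value used above to measure functional connectivity is the magnitude of the average unit phasor of the phase difference between two signals. The sketch below assumes the instantaneous phases (in radians) have already been extracted for the frequency band of interest; that extraction step and the example phase series are assumptions, not details from the study.

```python
import cmath
import math

def phase_locking_value(phase_a, phase_b):
    """PLV = |mean(exp(i * (phase_a - phase_b)))|, ranging from 0 (no locking) to 1 (perfect locking)."""
    if len(phase_a) != len(phase_b) or not phase_a:
        raise ValueError("phase series must be non-empty and of equal length")
    total = sum(cmath.exp(1j * (a - b)) for a, b in zip(phase_a, phase_b))
    return abs(total) / len(phase_a)

# Made-up phases: a constant lag of 0.5 rad gives perfect locking.
locked_a = [0.1 * k for k in range(100)]
locked_b = [0.1 * k - 0.5 for k in range(100)]
plv_locked = phase_locking_value(locked_a, locked_b)

# Phase differences alternating between 0 and pi cancel out.
plv_unlocked = phase_locking_value([0.0, math.pi], [0.0, 0.0])
```

Computing this value for every pair of channels yields the weighted connectivity matrix from which graph measures such as strength or clustering coefficient can be derived.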


Subject(s)
Brain , Support Vector Machine , Adolescent , Brain/diagnostic imaging , Diagnosis, Computer-Assisted , Electroencephalography , Female , Humans
14.
Sheng Wu Yi Xue Gong Cheng Xue Za Zhi ; 37(6): 1056-1064, 2020 Dec 25.
Article in Chinese | MEDLINE | ID: mdl-33369345

ABSTRACT

In the process of lower limb rehabilitation training, fatigue estimation is of great significance to improve the accuracy of intention recognition and avoid secondary injury. However, most of the existing methods only consider surface electromyography (sEMG) features but ignore electrocardiogram (ECG) features when performing fatigue estimation, which leads to low and unstable recognition efficiency. To address this problem, a method that uses the fusion features of ECG and sEMG signals to estimate fatigue during lower limb rehabilitation was proposed, and an improved particle swarm optimization-support vector machine classifier (improved PSO-SVM) was proposed and used to identify the fusion feature vector. Finally, accurate recognition of the three states of relaxation, transition, and fatigue was achieved, and the recognition rates were 98.5%, 93.5%, and 95.5%, respectively. Comparative experiments showed that the average recognition rate of this method was 4.50% higher than that of sEMG features alone, and 13.66% higher than that of the combined features of ECG and sEMG without feature fusion. This shows that the feature fusion of ECG and sEMG signals in the process of lower limb rehabilitation training can be used to recognize fatigue more accurately.


Subject(s)
Fatigue , Lower Extremity , Algorithms , Electrocardiography , Electromyography , Fatigue/diagnosis , Humans , Support Vector Machine
15.
BMC Bioinformatics ; 21(1): 584, 2020 Dec 17.
Article in English | MEDLINE | ID: mdl-33334319

ABSTRACT

BACKGROUND: Predicting physical interaction between proteins is one of the greatest challenges in computational biology. There is a considerable variety of protein interactions and a huge number of protein sequences and synthetic peptides with unknown interacting counterparts. Most co-evolutionary methods discover a combination of physical interplays and functional associations. However, there are only a handful of approaches which specifically infer physical interactions. Hybrid co-evolutionary methods exploit inter-protein residue coevolution to unravel specific physical interacting proteins. In this study, we introduce a hybrid co-evolutionary-based approach to predict physical interplays between pairs of protein families, starting from protein sequences only. RESULTS: In the present analysis, pairs of multiple sequence alignments are constructed for each dimer and the covariation between residues in those pairs is calculated by CCMpred (Contacts from Correlated Mutations predicted) and three mutual information based approaches for ten accessible surface area threshold groups. Then, whole residue couplings between proteins of each dimer are unified into a single Frobenius norm value. Norms of residue contact matrices of all dimers in different accessible surface area thresholds are fed into a support vector machine as single or multiple feature models. The results of training the classifiers by single features show no apparent differences in accuracy among methods for different accessible surface area thresholds. Nevertheless, the mutual information product and context likelihood of relatedness procedures may roughly have overall higher and lower performance, respectively, than the other two methods for different accessible surface area cut-offs. The results also demonstrate that training the support vector machine with multiple norm features for several accessible surface area thresholds leads to a considerable improvement of prediction performance.
In this context, CCMpred roughly achieves an overall better performance than mutual information based approaches. The best accuracy, sensitivity, specificity, precision and negative predictive value for that method are 0.98, 1, 0.962, 0.96, and 0.962, respectively. CONCLUSIONS: In this paper, by feeding norm values of protein dimers into support vector machines in different accessible surface area thresholds, we demonstrate that even a small number of proteins in pairs of multiple alignments could allow one to accurately discriminate between positive and negative dimers.
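The step of collapsing an inter-protein residue coupling matrix into a single Frobenius norm, as described in the abstract above, can be written as a short sketch. The matrix values are made up; in the study the couplings would come from CCMpred or the mutual information based methods.

```python
import math

def frobenius_norm(matrix):
    """Square root of the sum of squared entries of a (possibly non-square) matrix."""
    return math.sqrt(sum(value * value for row in matrix for value in row))

# Made-up 2 x 2 coupling matrix between residues of two proteins.
couplings = [[3.0, 0.0],
             [0.0, 4.0]]
norm_feature = frobenius_norm(couplings)
```

One such value per accessible surface area threshold would give the multi-feature input vector for the classifier.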


Subject(s)
Proteins/chemistry , Support Vector Machine , Databases, Protein , Dimerization , Evolution, Molecular , Protein Interaction Maps , Proteins/metabolism
16.
BMC Bioinformatics ; 21(Suppl 16): 559, 2020 Dec 16.
Article in English | MEDLINE | ID: mdl-33323099

ABSTRACT

BACKGROUND: Millions of people are suffering from cancers, but accurate early diagnosis and effective treatment are still difficult for all doctors. Common ways against cancer include surgical operation, radiotherapy and chemotherapy. However, they are all very harmful for patients. Recently, anticancer peptides (ACPs) have been discovered to be a potential way to treat cancer. Since ACPs are natural biologics, they are safer than other methods. However, finding ACPs experimentally is expensive, so we propose a new machine learning method to identify ACPs. RESULTS: Firstly, we extracted the features of ACPs in two aspects: sequence and chemical characteristics of amino acids. For sequence, the average composition of the 20 amino acids was extracted. For chemical characteristics, we classified amino acids into six groups based on the patterns of hydrophobic and hydrophilic residues. Then, a deep belief network was used to encode the features of ACPs. Finally, we proposed Random Relevance Vector Machines to identify the true ACPs. We call this method 'DRACP' and tested its performance on two independent datasets. Its AUC and AUPR are higher than 0.9 in both datasets. CONCLUSION: We developed a novel method named 'DRACP' and compared it with some traditional methods. The cross-validation results showed its effectiveness in identifying ACPs.
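The sequence-based feature mentioned above, the composition of the 20 standard amino acids, is commonly computed as the relative frequency of each residue in the peptide. The sketch below follows that common definition, which is assumed rather than confirmed to match the authors' exact feature; the example peptide is made up.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes

def amino_acid_composition(sequence):
    """Return a 20-element list with the fraction of each standard amino acid in the sequence."""
    residues = [c for c in sequence.upper() if c in AMINO_ACIDS]
    if not residues:
        return [0.0] * len(AMINO_ACIDS)
    # Non-standard symbols are ignored, so fractions are relative to standard residues only.
    return [residues.count(aa) / len(residues) for aa in AMINO_ACIDS]

# Made-up peptide, for illustration only.
composition = amino_acid_composition("AAKK")
```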


Subject(s)
Antineoplastic Agents/therapeutic use , Computational Biology/methods , Peptides/therapeutic use , Humans , Machine Learning , Neoplasms/drug therapy , Peptides/chemistry , ROC Curve , Support Vector Machine
17.
PLoS One ; 15(12): e0242899, 2020.
Article in English | MEDLINE | ID: mdl-33320858

ABSTRACT

The coronavirus disease (COVID-19) is an ongoing global pandemic caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Chest computed tomography (CT) is an effective method for detecting lung illnesses, including COVID-19. However, CT scans are expensive and time-consuming. Therefore, this work focuses on detecting COVID-19 using chest X-ray images, because X-ray imaging is more widely available, faster, and cheaper than CT. Many machine learning approaches, such as deep learning, neural networks, and support vector machines, have used X-ray images to detect COVID-19. Although the accuracy of those approaches is acceptable, they require long computation times and more memory. Therefore, this work employs an Optimised Genetic Algorithm-Extreme Learning Machine (OGA-ELM) with three selection criteria (i.e., random, K-tournament, and roulette wheel) to detect COVID-19 using X-ray images. The most crucial strengths of the Extreme Learning Machine (ELM) are: (i) its strong ability to avoid overfitting; (ii) its usability in binary and multi-type classifiers; and (iii) its ability to work as a kernel-based support vector machine with the structure of a neural network. These advantages make the ELM efficient in achieving excellent learning performance. ELMs have been applied successfully in many domains, including medical domains such as breast cancer detection, pathological brain detection, and ductal carcinoma in situ detection, but have not yet been tested on COVID-19 detection. Hence, this work aims to assess the effectiveness of OGA-ELM in detecting COVID-19 using chest X-ray images. To reduce the dimensionality of the histogram of oriented gradients (HOG) features, we use principal component analysis. The performance of OGA-ELM is evaluated on a benchmark dataset containing 188 chest X-ray images in two classes: healthy and COVID-19 infected. The experimental results show that OGA-ELM achieves 100.00% accuracy with fast computation time. This demonstrates that OGA-ELM is an efficient method for detecting COVID-19 using chest X-ray images.
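The three selection criteria named in this abstract (random, K-tournament, and roulette wheel) are standard genetic-algorithm operators. The sketch below shows their textbook forms, assuming higher fitness is better and fitness values are non-negative; the function names, population and fitness values are hypothetical, and the paper's own OGA-ELM implementation may differ.

```python
import random


def random_selection(population, fitness, rng):
    """Pick one individual uniformly at random, ignoring fitness."""
    return rng.choice(population)


def tournament_selection(population, fitness, rng, k=3):
    """Draw k distinct contenders and return the fittest of them."""
    contenders = rng.sample(range(len(population)), k)
    best = max(contenders, key=lambda i: fitness[i])
    return population[best]


def roulette_wheel_selection(population, fitness, rng):
    """Pick one individual with probability proportional to its fitness.

    Assumes all fitness values are non-negative and at least one is positive.
    """
    total = sum(fitness)
    pick = rng.uniform(0, total)
    running = 0.0
    for individual, f in zip(population, fitness):
        running += f
        if f > 0 and pick <= running:
            return individual
    return population[-1]  # guard against floating-point round-off


# Hypothetical population of candidate ELM configurations.
rng = random.Random(42)
population = ["elm_a", "elm_b", "elm_c", "elm_d"]
fitness = [0.10, 0.90, 0.50, 0.20]
print(tournament_selection(population, fitness, rng, k=2))
print(roulette_wheel_selection(population, fitness, rng))
```

Tournament selection lets the tournament size k control selection pressure, while roulette wheel selection is sensitive to the scale of the fitness values.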


Subject(s)
/diagnosis , Machine Learning , Thorax/diagnostic imaging , Algorithms , /physiopathology , Humans , Lung/diagnostic imaging , Lung/physiopathology , Lung/virology , Neural Networks, Computer , Support Vector Machine , Thorax/physiopathology , Thorax/virology , Tomography, X-Ray Computed
18.
BMC Med Inform Decis Mak ; 20(1): 335, 2020 Dec 14.
Article in English | MEDLINE | ID: mdl-33317534

ABSTRACT

BACKGROUND: Acute myocardial infarction (AMI) is a serious cardiovascular disease with a high readmission rate within 30 days of discharge. Accurate prediction of AMI readmission is crucial for identifying the high-risk group and optimizing the distribution of medical resources. METHODS: In this study, we propose a stacking-based model to predict the risk of 30-day unplanned all-cause hospital readmission for AMI patients based on clinical data. First, we applied the neighborhood cleaning rule (NCR) under-sampling method to alleviate class imbalance and then used the SelectFromModel (SFM) feature selection method to select effective features. Second, we adopted a self-adaptive approach to select base classifiers from eight candidate models according to their performance on the datasets. Finally, we constructed a three-layer stacking model in which layers 1 and 2 were base layers and layer 3 was the meta layer. The predictions of the base layers were used to train the meta layer to make the final forecast. RESULTS: The results show that the proposed model achieves the highest AUC (0.720), which is higher than that of decision tree (0.681), support vector machine (0.707), random forest (0.701), extra trees (0.709), AdaBoost (0.702), bootstrap aggregating (0.704), gradient boosting decision tree (0.710) and extreme gradient enhancement (0.713). CONCLUSION: Our model can effectively predict the risk of 30-day all-cause hospital readmission for AMI patients and provide decision support for administrators.
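The models in this abstract are compared by AUC, the area under the ROC curve. A minimal sketch of one common way to compute it is given below, using the equivalence between AUC and the probability that a randomly chosen positive case is scored above a randomly chosen negative case. The function name and the example labels and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """Compute AUC by pairwise comparison of positive and negative scores.

    labels are 0 (negative) or 1 (positive); ties count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical readmission labels and predicted risks.
labels = [0, 0, 1, 1]
scores = [0.10, 0.40, 0.35, 0.80]
print(roc_auc(labels, scores))
```

This pairwise form is quadratic in the number of cases, which is acceptable for small cohorts; rank-based formulations are preferable for large ones.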


Subject(s)
Decision Support Techniques , Myocardial Infarction/therapy , Patient Readmission , Clinical Decision-Making , Humans , Models, Theoretical , Myocardial Infarction/diagnosis , Myocardial Infarction/epidemiology , Patient Discharge , Risk Assessment , Risk Factors , Support Vector Machine , Time Factors , Treatment Outcome
19.
Environ Monit Assess ; 192(12): 776, 2020 Nov 21.
Article in English | MEDLINE | ID: mdl-33219864

ABSTRACT

Contamination from pesticides and nitrate in groundwater is a significant threat to water quality in general and in agriculturally intensive regions in particular. Three widely used machine learning models, namely artificial neural networks (ANN), support vector machines (SVM), and extreme gradient boosting (XGB), were evaluated for their efficacy in predicting contamination levels using sparse data with non-linear relationships. The predictive ability of the models was assessed using a dataset consisting of 303 wells across 12 Midwestern states in the USA. Multiple hydrogeologic, water quality, and land use features were chosen as the independent variables, and classes were based on measured concentration ranges of nitrate and pesticide. This study evaluates the classification performance of the models for two-, three-, and four-class scenarios and compares them with the corresponding regression models. The study also examines the issue of class imbalance and tests the efficacy of three class imbalance mitigation techniques (oversampling, weighting, and oversampling combined with weighting) for all the scenarios. The models' performance is reported using multiple metrics, both insensitive to class imbalance (accuracy) and sensitive to class imbalance (F1 score and MCC). Finally, the study assesses the importance of features using game-theoretic Shapley values to rank features consistently and offer model interpretability.
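Two of the class imbalance mitigation techniques named in this abstract, weighting and oversampling, can be sketched in their simplest forms. The abstract does not say which variants the study used, so the inverse-frequency weighting heuristic and random duplication of minority samples below are assumptions, and all names and data are hypothetical.

```python
import random
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    This inverse-frequency heuristic is an assumption, not the paper's rule.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * cnt) for c, cnt in counts.items()}


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority rows until all classes match the largest."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, cnt in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - cnt):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Hypothetical wells: 8 below a contamination threshold (0), 2 above it (1).
rows = [[float(i)] for i in range(10)]
labels = [0] * 8 + [1] * 2
print(balanced_class_weights(labels))
new_rows, new_labels = random_oversample(rows, labels)
print(Counter(new_labels))
```

Oversampling should be applied only to the training split, since duplicating rows before splitting leaks copies of the same sample into the evaluation data.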


Subject(s)
Environmental Monitoring , Groundwater , Machine Learning , Neural Networks, Computer , Support Vector Machine
20.
BMC Bioinformatics ; 21(1): 489, 2020 Oct 30.
Article in English | MEDLINE | ID: mdl-33126851

ABSTRACT

BACKGROUND: As one of the most common post-transcriptional modifications (PTCM) in RNA, 5-cytosine-methylation plays important roles in many biological functions such as RNA metabolism and cell fate decision. Through accurate identification of 5-methylcytosine (m5C) sites on RNA, researchers can better understand the exact role of 5-cytosine-methylation in these biological functions. In recent years, computational methods for predicting m5C sites have attracted considerable interest because of their efficiency and low cost. However, neither the accuracy nor the efficiency of these methods is satisfactory yet, and both need further improvement. RESULTS: In this work, we have developed a new computational method, m5CPred-SVM, to identify m5C sites in three species: H. sapiens, M. musculus and A. thaliana. To build this model, we first collected benchmark datasets following three recently published methods. Then, six types of sequence-based features were generated from RNA segments, and the sequential forward feature selection strategy was used to obtain the optimal feature subset. After that, the performance of models based on different learning algorithms was compared, and the model based on the support vector machine provided the highest prediction accuracy. Finally, our proposed method, m5CPred-SVM, was compared with several existing methods, and the results showed that m5CPred-SVM offered substantially higher prediction accuracy than previously published methods. It is expected that our method, m5CPred-SVM, can become a useful tool for accurate identification of m5C sites. CONCLUSION: In this study, by introducing position-specific propensity related features, we built a new model, m5CPred-SVM, to predict RNA m5C sites in three different species. The results show that our model outperformed the existing state-of-the-art models. Our model is available to users through a web server at https://zhulab.ahu.edu.cn/m5CPred-SVM .
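The sequential forward feature selection strategy mentioned in this abstract is a greedy search: start with no features, repeatedly add the single feature that most improves a score, and stop when no addition helps. The sketch below shows the generic procedure. The scoring function is a hypothetical stand-in (in practice it would typically be a cross-validated accuracy of the classifier), and the feature names are invented for illustration.

```python
def sequential_forward_selection(features, score, max_features=None):
    """Greedily add the feature that most improves score(subset).

    Stops when no remaining feature improves the best score so far,
    or when max_features have been selected.
    """
    selected = []
    remaining = list(features)
    best_score = float("-inf")
    while remaining and (max_features is None or len(selected) < max_features):
        # Score every one-feature extension of the current subset.
        candidates = [(score(selected + [f]), f) for f in remaining]
        top_score, top_feature = max(candidates, key=lambda pair: pair[0])
        if top_score <= best_score:
            break  # no candidate improves on the current subset
        selected.append(top_feature)
        remaining.remove(top_feature)
        best_score = top_score
    return selected, best_score


# Hypothetical per-feature gains with a small penalty per added feature.
GAINS = {"kmer": 0.50, "pspf": 0.30, "noise": 0.01}


def toy_score(subset):
    return sum(GAINS[f] for f in subset) - 0.05 * len(subset)


subset, best = sequential_forward_selection(list(GAINS), toy_score)
print(subset, round(best, 3))
```

Because the search is greedy, it evaluates on the order of n² subsets for n features instead of all 2ⁿ, at the cost of possibly missing feature combinations that are only useful together.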


Subject(s)
5-Methylcytosine/metabolism , RNA/genetics , Support Vector Machine , Animals , Arabidopsis/genetics , Base Sequence , Humans , Internet , Mice , ROC Curve