Results 1 - 20 of 27
1.
PLoS One ; 16(11): e0260315, 2021.
Article in English | MEDLINE | ID: mdl-34797894

ABSTRACT

Overdose prescription errors sometimes cause serious life-threatening adverse drug events, while underdose errors lead to diminished therapeutic effects. Therefore, it is important to detect and prevent these errors. In the present study, we used the one-class support vector machine (OCSVM), one of the most common unsupervised machine learning algorithms for anomaly detection, to identify overdose and underdose prescriptions. We extracted prescription data from electronic health records in Kyushu University Hospital between January 1, 2014 and December 31, 2019. We constructed an OCSVM model for each of the 21 candidate drugs using three features: age, weight, and dose. Clinical overdose and underdose prescriptions, which were identified and rectified by pharmacists before administration, were collected. Synthetic overdose and underdose prescriptions were created using the maximum and minimum doses, defined by drug labels or the UpToDate database. We applied these prescription data to the OCSVM model and evaluated its detection performance. We also performed comparative analysis with other unsupervised outlier detection algorithms (local outlier factor, isolation forest, and robust covariance). Twenty-seven out of 31 clinical overdose and underdose prescriptions (87.1%) were detected as abnormal by the model. The constructed OCSVM models showed high performance for detecting synthetic overdose prescriptions (precision 0.986, recall 0.964, and F-measure 0.973) and synthetic underdose prescriptions (precision 0.980, recall 0.794, and F-measure 0.839). In comparative analysis, OCSVM showed the best performance. Our models detected the majority of clinical overdose and underdose prescriptions and demonstrated high performance in synthetic data analysis. OCSVM models, constructed using features such as age, weight, and dose, are useful for detecting overdose and underdose prescriptions.
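The precision, recall, and F-measure figures in the abstract above can be related with a short calculation. The sketch below is only illustrative: it computes the F-measure as the harmonic mean of the reported precision and recall, and the function name `f_measure` is made up for this example. The pooled values it produces (about 0.975 and 0.877) differ from the reported 0.973 and 0.839; one plausible explanation, which is an assumption since the abstract does not say how results were aggregated, is that the paper averaged per-drug scores across the 21 models.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score; beta = 1 gives the harmonic mean of precision and recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Figures quoted in the abstract (synthetic overdose / underdose prescriptions).
overdose_f = f_measure(0.986, 0.964)
underdose_f = f_measure(0.980, 0.794)

# Share of clinical error prescriptions flagged as abnormal: 27 of 31.
clinical_detection_rate = 27 / 31

print(f"overdose F: {overdose_f:.3f}")
print(f"underdose F: {underdose_f:.3f}")
print(f"clinical detection rate: {clinical_detection_rate:.1%}")  # 87.1%
```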


Subjects
Drug Overdose/diagnosis , Prescription Drugs/adverse effects , Prescriptions/statistics & numerical data , Adolescent , Adult , Aged , Aged, 80 and over , Algorithms , Child, Preschool , Data Analysis , Data Collection/statistics & numerical data , Data Management/statistics & numerical data , Databases, Factual/statistics & numerical data , Electronic Health Records/statistics & numerical data , Humans , Infant , Mental Recall , Middle Aged , Support Vector Machine/statistics & numerical data , Unsupervised Machine Learning/statistics & numerical data , Young Adult
2.
Sci Rep ; 11(1): 9501, 2021 05 04.
Article in English | MEDLINE | ID: mdl-33947902

ABSTRACT

In this study, we aimed to develop and validate a machine learning-based mortality prediction model for patients hospitalized with heat-related illness. After 2393 hospitalized patients were extracted from a multicenter heat-related illness registry in Japan, subjects were divided into a training set for development (n = 1516, data from 2014, 2017-2019) and a test set for validation (n = 877, data from 2020). Twenty-four variables, including patient characteristics, vital signs, and laboratory test data at hospital arrival, were used as predictor features for machine learning. The outcome was death during hospital stay. In validation, the developed machine learning models (logistic regression, support vector machine, random forest, XGBoost) demonstrated favorable performance for outcome prediction, with significantly increased values of the area under the precision-recall curve (AUPR) of 0.415 [95% confidence interval (CI) 0.336-0.494], 0.395 [CI 0.318-0.472], 0.426 [CI 0.346-0.506], and 0.528 [CI 0.442-0.614], respectively, compared to that of the conventional acute physiology and chronic health evaluation (APACHE)-II score of 0.287 [CI 0.222-0.351] as a reference standard. The area under the receiver operating characteristic curve (AUROC) values were also high, exceeding 0.92 in all models, although the differences from APACHE-II were not statistically significant. This is the first demonstration of the potential of machine learning-based mortality prediction models for heat-related illnesses.
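Because the abstract above ranks models by AUPR rather than AUROC, a small sketch of how that metric can be computed may help. The version below uses the step-wise "average precision" estimate, which is one common way to summarize a precision-recall curve; whether the study used this estimator or an interpolated one is not stated, and the labels and scores are invented for illustration. Tied scores are simply taken in input order here.

```python
def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve (average precision).

    labels: 1 for the positive class (e.g. in-hospital death), 0 otherwise.
    scores: predicted risk, higher means more likely positive.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    # Rank cases from highest to lowest predicted risk.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    ap = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            ap += hits / rank  # precision at the rank of each true positive
    return ap / total_pos


# Hypothetical toy data: 2 deaths among 5 patients.
toy_labels = [1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.2, 0.1]
toy_ap = average_precision(toy_labels, toy_scores)
print(round(toy_ap, 4))  # (1/1 + 2/3) / 2 = 0.8333
```

For a random ranking the expected average precision is roughly the fraction of positive cases, which is why AUPR values well below 1 can still represent a clear improvement when the outcome is rare.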


Subjects
Hospital Mortality/trends , Machine Learning/statistics & numerical data , APACHE , Aged , Area Under Curve , Female , Hot Temperature , Humans , Intensive Care Units/statistics & numerical data , Japan , Length of Stay/statistics & numerical data , Logistic Models , Male , Middle Aged , Prognosis , ROC Curve , Registries , Support Vector Machine/statistics & numerical data
3.
PLoS One ; 16(5): e0250631, 2021.
Article in English | MEDLINE | ID: mdl-33979356

ABSTRACT

Environmental Microorganism Data Set Fifth Version (EMDS-5) is a microscopic image dataset including original Environmental Microorganism (EM) images and two sets of Ground Truth (GT) images. The GT image sets include a single-object GT image set and a multi-object GT image set. EMDS-5 has 21 types of EMs, each of which contains 20 original EM images, 20 single-object GT images and 20 multi-object GT images. EMDS-5 can be used to evaluate image preprocessing, image segmentation, feature extraction, image classification and image retrieval functions. To demonstrate the effectiveness of EMDS-5, for each function we select the most representative algorithms and evaluation indicators for testing and evaluation. The image preprocessing functions comprise two parts: image denoising and image edge detection. For image denoising, nine kinds of filters are used to remove 13 kinds of noise. For edge detection, six edge detection operators are used to detect the edges of the images, and two evaluation indicators, peak signal-to-noise ratio and mean structural similarity, are used for evaluation. Image segmentation includes single-object image segmentation and multi-object image segmentation. Six methods are used for single-object image segmentation, while k-means and U-net are used for multi-object segmentation. We extract nine features from the images in EMDS-5 and use the Support Vector Machine (SVM) classifier for testing. For image classification, we select the VGG16 feature to test SVM, k-Nearest Neighbors and Random Forests. We test two types of retrieval approaches: texture feature retrieval and deep learning feature retrieval. We select the last layer of features of the VGG16 and ResNet50 networks as feature vectors. We use mean average precision as the evaluation index for retrieval. EMDS-5 is available at the URL: https://github.com/NEUZihan/EMDS-5.git.


Subjects
Algorithms , Databases, Factual/statistics & numerical data , Environmental Microbiology/standards , Image Processing, Computer-Assisted/methods , Imaging, Three-Dimensional/methods , Support Vector Machine/statistics & numerical data , Signal-To-Noise Ratio
4.
Bioorg Med Chem ; 38: 116119, 2021 05 15.
Article in English | MEDLINE | ID: mdl-33831697

ABSTRACT

In response to the pandemic caused by SARS-CoV-2, we constructed a hybrid support vector machine (SVM) classification model using a set of publicly posted SARS-CoV-2 pseudotyped particle (PP) entry assay repurposing screen data to identify novel potent compounds as a starting point for drug development to treat COVID-19 patients. Two different molecular descriptor systems, atom typing descriptors and 3D fingerprints (FPs), were employed to construct the SVM classification models. Both models achieved reasonable performance, with an area under the receiver operating characteristic curve (AUC-ROC) of 0.84 and 0.82, respectively. The consensus prediction, from which compounds with inconsistent classifications were excluded, outperformed the two individual models with a significantly improved AUC-ROC of 0.91. The consensus model was then used to screen the 173,898 compounds in the NCATS annotated and diverse chemical libraries. Of the 255 compounds selected for experimental confirmation, 116 compounds exhibited inhibitory activities in the SARS-CoV-2 PP entry assay, with IC50 values ranging between 0.17 µM and 62.2 µM, representing an enrichment factor of 3.2. These 116 active compounds with diverse and novel structures could potentially serve as starting points for chemistry optimization for COVID-19 drug discovery.
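Two ideas in the abstract above lend themselves to a short sketch: the consensus rule (keep a compound only when both models agree) and the enrichment factor. The code is a minimal illustration under stated assumptions. The function names and the toy predictions are hypothetical, and the enrichment factor is taken here to be the confirmed hit rate divided by a baseline hit rate. The abstract does not give the baseline, so the value below is merely what the reported numbers would imply under that definition.

```python
def consensus(pred_a, pred_b):
    """Return {compound index: label} for compounds where both models agree.

    Compounds with inconsistent classifications are dropped, mirroring the
    exclusion rule described in the abstract.
    """
    return {i: a for i, (a, b) in enumerate(zip(pred_a, pred_b)) if a == b}


def enrichment_factor(hits, tested, baseline_rate):
    """Observed hit rate divided by a baseline (e.g. unselected-library) hit rate."""
    return (hits / tested) / baseline_rate


# Hypothetical predictions from the two descriptor-based models (1 = active).
atom_typing_pred = [1, 0, 1, 1]
fingerprint_pred = [1, 1, 1, 0]
agreed = consensus(atom_typing_pred, fingerprint_pred)
print(agreed)  # {0: 1, 2: 1}

# Numbers quoted in the abstract: 116 confirmed actives out of 255 tested.
hit_rate = 116 / 255
# Baseline implied by the reported enrichment factor of 3.2 (an inference,
# not a figure stated in the abstract).
implied_baseline = hit_rate / 3.2
print(f"hit rate: {hit_rate:.3f}, implied baseline: {implied_baseline:.3f}")
```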


Subjects
Antiviral Agents/pharmacology , SARS-CoV-2/drug effects , Support Vector Machine/statistics & numerical data , Virus Internalization/drug effects , Area Under Curve , Databases, Chemical/statistics & numerical data , Drug Repositioning , HEK293 Cells , Humans , Microbial Sensitivity Tests , ROC Curve , Small Molecule Libraries/pharmacology
5.
BMC Anesthesiol ; 21(1): 66, 2021 03 02.
Article in English | MEDLINE | ID: mdl-33653263

ABSTRACT

BACKGROUND: Estimating the depth of anaesthesia (DoA) is critical in modern anaesthetic practice. Multiple DoA monitors based on electroencephalograms (EEGs) have been widely used for DoA monitoring; however, these monitors may be inaccurate under certain conditions. In this work, we hypothesize that heart rate variability (HRV)-derived features based on a deep neural network can distinguish different anaesthesia states, providing a secondary tool for DoA assessment. METHODS: A novel method of distinguishing different anaesthesia states was developed based on four HRV-derived features in the time and frequency domain combined with a deep neural network. Four features were extracted from an electrocardiogram, including the HRV high-frequency power, low-frequency power, high-to-low-frequency power ratio, and sample entropy. Next, these features were used as inputs for the deep neural network, which utilized the expert assessment of consciousness level as the reference output. Finally, the deep neural network was compared with the logistic regression, support vector machine, and decision tree models. The datasets of 23 anaesthesia patients were used to assess the proposed method. RESULTS: The accuracies of the four models, in distinguishing the anaesthesia states, were 86.2% (logistic regression), 87.5% (support vector machine), 87.2% (decision tree), and 90.1% (deep neural network). The accuracy of deep neural network was higher than those of the logistic regression (p < 0.05), support vector machine (p < 0.05), and decision tree (p < 0.05) approaches. Our method outperformed the logistic regression, support vector machine, and decision tree methods. CONCLUSIONS: The incorporation of four HRV-derived features in the time and frequency domain and a deep neural network could accurately distinguish between different anaesthesia states; however, this study is a pilot feasibility study. The proposed method, together with other evaluation methods such as EEG, is expected to assist anaesthesiologists in the accurate evaluation of the DoA.


Subjects
Anesthesia/statistics & numerical data , Electrocardiography/methods , Heart Rate/drug effects , Neural Networks, Computer , Decision Trees , Female , Humans , Male , Middle Aged , Reproducibility of Results , Support Vector Machine/statistics & numerical data
6.
Protein J ; 40(1): 54-62, 2021 02.
Article in English | MEDLINE | ID: mdl-33454893

ABSTRACT

To investigate structure-dependent peptide mobility behavior in ion mobility spectrometry (IMS), the quantitative structure-spectrum relationship (QSSR) is systematically modeled and predicted for the collision cross section (Ω) values of a total of 162 singly protonated tripeptide fragments extracted from Bacillus subtilis lipase A. Two types of structure characterization methods, namely local and global descriptors, and three machine learning methods, namely partial least squares (PLS), support vector machine (SVM) and Gaussian process (GP), are employed to parameterize and correlate the structures and Ω values of these peptide samples. In this procedure, the local descriptor is derived from principal component analysis (PCA) of 516 physicochemical properties for the 20 standard amino acids, and can be used to sequentially characterize the three amino acid residues composing a tripeptide. The global descriptor is calculated using the CODESSA method, which can generate > 200 statistically significant variables to characterize the whole molecular structure of a tripeptide. The obtained QSSR models are evaluated rigorously via tenfold cross-validation and Monte Carlo cross-validation (MCCV). A comprehensive comparison is performed on the resulting statistics arising from the systematic combination of different descriptor types and machine learning methods. It is revealed that the local descriptor-based QSSR models have a better fitting ability and predictive power, but worse interpretability, than those based on the global descriptor. In addition, since QSSR modeling with the local descriptor does not consider the three-dimensional conformation of tripeptide samples, the method is considerably more efficient than modeling with the global descriptor.


Subjects
Amino Acids/chemistry , Bacillus subtilis/chemistry , Bacterial Proteins/chemistry , Lipase/chemistry , Oligopeptides/chemistry , Support Vector Machine/statistics & numerical data , Amino Acids/metabolism , Bacillus subtilis/enzymology , Bacterial Proteins/metabolism , Ion Mobility Spectrometry/statistics & numerical data , Least-Squares Analysis , Lipase/metabolism , Monte Carlo Method , Oligopeptides/metabolism , Principal Component Analysis , Quantitative Structure-Activity Relationship
7.
Burns ; 47(4): 812-820, 2021 06.
Article in English | MEDLINE | ID: mdl-32928613

ABSTRACT

Accurate classification of burn severities is of vital importance for proper burn treatments. A recent article reported that the combination of Raman spectroscopy and optical coherence tomography (OCT) classifies different degrees of burns with an overall accuracy of 85% [1]. In this study, we demonstrate the feasibility of using Raman spectroscopy alone to classify burn severities on ex vivo porcine skin tissues. To create different levels of burns, four burn conditions were designed: (i) 200°F for 10 s, (ii) 200°F for 30 s, (iii) 450°F for 10 s and (iv) 450°F for 30 s. Raman spectra from 500-2000 cm⁻¹ were collected from samples of the four burn conditions as well as the unburnt condition. Classifications were performed using kernel support vector machine (KSVM) with features extracted from the spectra by principal component analysis (PCA) and partial least squares (PLS). Both techniques yielded an average accuracy of approximately 92%, which was independently evaluated by leave-one-out cross-validation (LOOCV). By comparison, PCA+KSVM provides higher accuracy in classifying severe burns, while PLS performs better in classifying mild burns. Variable importance in the projection (VIP) scores from the PLS models reveal that proteins and lipids, amide III, and amino acids are important indicators in separating unburnt or mild burns (200°F), while amide I has a more pronounced impact in separating severe burns (450°F).


Subjects
Burns/diagnostic imaging , Spectrum Analysis, Raman/standards , Burns/complications , Humans , Principal Component Analysis , Severity of Illness Index , Spectrum Analysis, Raman/methods , Support Vector Machine/standards , Support Vector Machine/statistics & numerical data
8.
J Med Chem ; 63(16): 8761-8777, 2020 08 27.
Article in English | MEDLINE | ID: mdl-31512867

ABSTRACT

In qualitative or quantitative studies of structure-activity relationships (SARs), machine learning (ML) models are trained to recognize structural patterns that differentiate between active and inactive compounds. Understanding model decisions is challenging but of critical importance to guide compound design. Moreover, the interpretation of ML results provides an additional level of model validation based on expert knowledge. A number of complex ML approaches, especially deep learning (DL) architectures, have distinctive black-box character. Herein, a locally interpretable explanatory method termed Shapley additive explanations (SHAP) is introduced for rationalizing activity predictions of any ML algorithm, regardless of its complexity. Models resulting from random forest (RF), nonlinear support vector machine (SVM), and deep neural network (DNN) learning are interpreted, and structural patterns determining the predicted probability of activity are identified and mapped onto test compounds. The results indicate that SHAP has high potential for rationalizing predictions of complex ML models.
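The Shapley values underlying SHAP, as described in the abstract above, can be computed exactly for a very small model by enumerating feature subsets. The sketch below does that in plain Python. It is not the SHAP library and not the authors' procedure: the toy model, the all-zeros baseline used to represent an "absent" feature, and the function names are assumptions made for illustration. Exact enumeration grows exponentially with the number of features, which is why practical tools rely on approximations.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of each feature for the prediction model(x).

    A feature outside the coalition is replaced by its baseline value.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(z):
    """Hypothetical 'activity' model: one additive term, one interaction."""
    return 2.0 * z[0] + z[1] * z[2]


x = [1.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print(phi)
```

In this toy case the additive feature receives its full contribution of 2.0, the two interacting features split their joint contribution equally, and the values sum to the difference between the prediction and the baseline prediction.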


Subjects
Deep Learning/statistics & numerical data , Organic Chemicals/chemistry , Support Vector Machine/statistics & numerical data
9.
J Exp Biol ; 222(Pt 24)2019 12 18.
Article in English | MEDLINE | ID: mdl-31753908

ABSTRACT

For analysis of vocal syntax, accurate classification of call sequence structures in different behavioural contexts is essential. However, an effective, intelligent program for classifying call sequences from numerous recorded sound files is still lacking. Here, we employed three machine learning algorithms (logistic regression, support vector machine and decision trees) to classify call sequences of social vocalizations of greater horseshoe bats (Rhinolophus ferrumequinum) in aggressive and distress contexts. The three machine learning algorithms obtained highly accurate classification rates (logistic regression 98%, support vector machine 97% and decision trees 96%). The algorithms also extracted three of the most important features for the classification: the transition between two adjacent syllables, the probability of occurrences of syllables in each position of a sequence, and the characteristics of a sequence. The results of statistical analysis also supported the classification of the algorithms. The study provides the first efficient method for data mining of call sequences and the possibility of linguistic parameters in animal communication. It suggests the presence of song-like syntax in the social vocalizations emitted within a non-breeding context in a bat species.
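Two of the features named in the abstract above, the transition between adjacent syllables and the probability of a syllable occurring at each position, can be estimated from labelled sequences by simple counting. The sketch below shows one way to do so. The syllable labels "A" and "B" and the three sequences are invented, and the abstract does not describe how the authors encoded these features, so this is only one plausible reading.

```python
from collections import Counter


def transition_probabilities(sequences):
    """P(next syllable | current syllable), estimated from adjacent pairs."""
    pair_counts = Counter()
    source_counts = Counter()
    for seq in sequences:
        for current, following in zip(seq, seq[1:]):
            pair_counts[(current, following)] += 1
            source_counts[current] += 1
    return {pair: n / source_counts[pair[0]] for pair, n in pair_counts.items()}


def positional_probabilities(sequences):
    """P(syllable | position in the sequence), estimated by counting."""
    counts = Counter()
    totals = Counter()
    for seq in sequences:
        for position, syllable in enumerate(seq):
            counts[(position, syllable)] += 1
            totals[position] += 1
    return {key: n / totals[key[0]] for key, n in counts.items()}


# Hypothetical call sequences; each character is one syllable type.
calls = ["ABA", "ABB", "BA"]
transitions = transition_probabilities(calls)
positions = positional_probabilities(calls)
print(transitions)
print(positions)
```

Probabilities of this kind could then be arranged into a fixed-length feature vector for a classifier such as logistic regression or a support vector machine.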


Subjects
Chiroptera/physiology , Machine Learning/statistics & numerical data , Vocalization, Animal , Animals , Decision Trees , Echolocation , Logistic Models , Support Vector Machine/statistics & numerical data
10.
Comput Methods Programs Biomed ; 179: 104992, 2019 Oct.
Article in English | MEDLINE | ID: mdl-31443858

ABSTRACT

BACKGROUND AND OBJECTIVE: Coronary artery disease (CAD) is one of the most common diseases worldwide. An early and accurate diagnosis of CAD allows timely administration of appropriate treatment and helps to reduce mortality. Herein, we describe an innovative machine learning methodology that enables accurate detection of CAD and apply it to data collected from Iranian patients. METHODS: We first tested ten traditional machine learning algorithms, and then the three best-performing algorithms (three types of SVM) were used in the rest of the study. To improve the performance of these algorithms, data preprocessing with normalization was carried out. Moreover, a genetic algorithm and particle swarm optimization, coupled with stratified 10-fold cross-validation, were used twice: for optimization of classifier parameters and for parallel selection of features. RESULTS: The presented approach enhanced the performance of all traditional machine learning algorithms used in this study. We also introduced a new optimization technique called N2Genetic optimizer (a new genetic training). Our experiments demonstrated that N2Genetic-nuSVM provided an accuracy of 93.08% and an F1-score of 91.51% when predicting CAD outcomes among the patients included in the well-known Z-Alizadeh Sani dataset. These results are competitive and comparable to the best results in the field. CONCLUSIONS: We showed that machine-learning techniques optimized by the proposed approach can lead to highly accurate models intended for both clinical and research use.
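The stratified 10-fold cross-validation mentioned in the abstract above keeps the class ratio roughly equal in every fold. The sketch below shows one simple way to assign samples to stratified folds using only the standard library. The labels are hypothetical (30 positive and 70 negative cases), and the round-robin assignment is an illustrative choice rather than the procedure used in the study.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Split sample indices into k folds with near-equal class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of each class out to the folds in turn.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


# Hypothetical labels: 1 = disease present, 0 = disease absent.
labels = [1] * 30 + [0] * 70
folds = stratified_folds(labels, k=10)

for test_fold in folds:
    train = [i for fold in folds if fold is not test_fold for i in fold]
    # A classifier would be fitted on `train` and scored on `test_fold` here.
    assert len(train) + len(test_fold) == len(labels)

print([sum(labels[i] for i in fold) for fold in folds])  # positives per fold
```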


Subjects
Coronary Artery Disease/diagnosis , Machine Learning , Algorithms , Data Mining/statistics & numerical data , Databases, Factual/statistics & numerical data , Diagnosis, Computer-Assisted/statistics & numerical data , Female , Humans , Machine Learning/statistics & numerical data , Male , Models, Cardiovascular , Support Vector Machine/statistics & numerical data
11.
J Proteome Res ; 18(8): 3195-3202, 2019 08 02.
Article in English | MEDLINE | ID: mdl-31314536

ABSTRACT

Deep learning (DL), a type of machine learning approach, is a powerful tool for analyzing large sets of data that are derived from biomedical sciences. However, it remains unknown whether DL is suitable for identifying contributing factors, such as biomarkers, in quantitative proteomics data. In this study, we describe an optimized DL-based analytical approach using a data set that was generated by selected reaction monitoring-mass spectrometry (SRM-MS), comprising SRM-MS data from 1008 samples for the diagnosis of pancreatic cancer, to test its classification power. Its performance was compared with that of 5 conventional multivariate and machine learning methods: random forest (RF), support vector machine (SVM), logistic regression (LR), k-nearest neighbors (k-NN), and naïve Bayes (NB). The DL method yielded the best classification (AUC 0.9472 for the test data set) of all approaches. We also optimized the parameters of DL individually to determine which factors were the most significant. In summary, the DL method has advantages in classifying the quantitative proteomics data of pancreatic cancer patients, and our results suggest that its implementation can improve the performance of diagnostic assays in clinical settings.


Subjects
Deep Learning/statistics & numerical data , Machine Learning/statistics & numerical data , Mass Spectrometry/statistics & numerical data , Proteomics/statistics & numerical data , Algorithms , Bayes Theorem , Cluster Analysis , Humans , Logistic Models , Pancreatic Neoplasms/diagnosis , Pancreatic Neoplasms/pathology , Support Vector Machine/statistics & numerical data
12.
BMC Psychiatry ; 19(1): 210, 2019 07 05.
Article in English | MEDLINE | ID: mdl-31277632

ABSTRACT

BACKGROUND: Previous resting-state functional magnetic resonance imaging (rs-fMRI) studies have revealed intrinsic regional activity alterations in obsessive-compulsive disorder (OCD), but those results were based on group analyses, which limits their applicability to clinical diagnosis and treatment at the level of the individual. METHODS: We examined fractional amplitude low-frequency fluctuation (fALFF) and applied support vector machine (SVM) to discriminate OCD patients from healthy controls on the basis of rs-fMRI data. Values of fALFF, calculated from 68 drug-naive OCD patients and 68 demographically matched healthy controls, served as input features for the classification procedure. RESULTS: The classifier achieved 72% accuracy (p ≤ 0.001). This discrimination was based on regions that included the left superior temporal gyrus, the right middle temporal gyrus, the left supramarginal gyrus and the superior parietal lobule. CONCLUSIONS: These results indicate that OCD-related abnormalities in temporal and parietal lobe activation have predictive power for group membership; furthermore, the findings suggest that machine learning techniques can be used to aid in the identification of individuals with OCD in clinical diagnosis.


Subjects
Magnetic Resonance Imaging/statistics & numerical data , Obsessive-Compulsive Disorder/diagnostic imaging , Support Vector Machine/statistics & numerical data , Adult , Brain/physiopathology , Brain Mapping/methods , Case-Control Studies , Female , Humans , Limbic System/diagnostic imaging , Limbic System/physiopathology , Magnetic Resonance Imaging/methods , Male , Multivariate Analysis , Obsessive-Compulsive Disorder/pathology , Parietal Lobe/diagnostic imaging , Parietal Lobe/physiopathology , Rest/psychology , Temporal Lobe/diagnostic imaging , Temporal Lobe/physiopathology , Young Adult
13.
Technol Health Care ; 27(S1): 31-46, 2019.
Article in English | MEDLINE | ID: mdl-31045525

ABSTRACT

In the practical control of surface electromyography (sEMG)-driven devices, algorithms should recognize human motion from sEMG quickly and accurately. This study proposes two feature engineering (FE) techniques, namely feature-vector resampling and time-lag techniques, to improve the accuracy and speed of the least squares support vector machine (LSSVM) for wrist palmar angle estimation from sEMG features. The root mean square error and correlation coefficient of LSSVM with FE are 9.50 ± 2.32 degrees and 0.971 ± 0.018, respectively. The average training time and average execution time of LSSVM with FE in processing 12600 sEMG points are 0.016 s and 0.053 s, respectively. To evaluate the proposed algorithm, its estimation results are compared with those of three other methods, namely LSSVM, radial basis function (RBF) neural network, and RBF with FE. Experimental results verify that introducing a time lag into the feature vector can greatly improve the estimation accuracy of both RBF and LSSVM, while the feature-vector resampling technique can significantly increase the training and execution speed of the RBF neural network and LSSVM. Among the algorithms applied in this study, LSSVM with FE techniques performed best in terms of training and execution speed, as well as estimation accuracy.
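The abstract above names two feature-engineering steps, time-lag and feature-vector resampling, without defining them. The sketch below shows one plausible reading, stated as an assumption: a time-lagged feature vector stacks the current sample with a few earlier ones, and resampling keeps every n-th feature vector to cut the amount of data processed. The function names and the toy signal are hypothetical; the root mean square error helper matches the error metric reported in the abstract.

```python
from math import sqrt


def time_lag_features(signal, lags):
    """Build rows of [x(t), x(t-1), ..., x(t-lags)] for every valid t."""
    return [signal[t - lags:t + 1][::-1] for t in range(lags, len(signal))]


def resample_rows(rows, step):
    """Keep every `step`-th feature vector to reduce training and run time."""
    return rows[::step]


def rmse(predicted, actual):
    """Root mean square error between two equal-length sequences."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return sqrt(sum(squared) / len(squared))


# Hypothetical sEMG feature values (e.g. a smoothed amplitude envelope).
envelope = [1, 2, 3, 4, 5]
lagged = time_lag_features(envelope, lags=2)
print(lagged)                    # [[3, 2, 1], [4, 3, 2], [5, 4, 3]]
print(resample_rows(lagged, 2))  # [[3, 2, 1], [5, 4, 3]]
print(rmse([1.0, 2.0], [1.0, 4.0]))
```

The lagged rows would serve as inputs to a regressor such as LSSVM, with the measured wrist angle at time t as the target.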


Subjects
Electromyography/methods , Support Vector Machine , Adult , Algorithms , Electromyography/statistics & numerical data , Humans , Least-Squares Analysis , Neural Networks, Computer , Support Vector Machine/statistics & numerical data , Young Adult
14.
Medicine (Baltimore) ; 98(14): e15022, 2019 Apr.
Article in English | MEDLINE | ID: mdl-30946334

ABSTRACT

BACKGROUND: To explore whether radiomics combined with computed tomography (CT) images can be used to establish a model for differentiating high-grade (International Society of Urological Pathology [ISUP] grade III-IV) from low-grade (ISUP I-II) clear cell renal cell carcinoma (ccRCC). METHODS: For this retrospective study, 3-phase contrast-enhanced CT images were collected from 227 patients with pathologically confirmed ISUP-grade ccRCC (155 cases in the low-grade group and 72 cases in the high-grade group). First, we delineated the largest dimension of the tumor in the corticomedullary and nephrographic CT images to obtain the region of interest. Second, variance selection, single variable selection, and the least absolute shrinkage and selection operator were used to select features in the corticomedullary phase, nephrographic phase, and 2-phase union samples, respectively. Finally, a model was constructed using the optimal features, and the receiver operating characteristic curve and area under the curve (AUC) were used to evaluate the predictive performance of the features in the training and validation cohorts. A Z test was employed to compare the differences in AUC values. RESULTS: The support vector machine (SVM) model constructed using the selected features for the 2-phase union samples can effectively distinguish between high- and low-grade ccRCC, and obtained the highest prediction accuracy. Its AUC values in the training cohort and the validation cohort were 0.88 and 0.91, respectively. The results of the Z test showed that the differences between the 3 groups were not statistically significant. CONCLUSION: The SVM model constructed from CT-based radiomic features can effectively identify the ISUP grades of ccRCC.


Subjects
Carcinoma, Renal Cell/diagnosis , Kidney Neoplasms/diagnosis , Neoplasm Grading/methods , Support Vector Machine/statistics & numerical data , Tomography, X-Ray Computed/statistics & numerical data , Area Under Curve , Carcinoma, Renal Cell/pathology , Diagnosis, Differential , Female , Humans , Kidney Neoplasms/pathology , Male , Middle Aged , Predictive Value of Tests , ROC Curve , Retrospective Studies
15.
Nat Protoc ; 14(4): 1206-1234, 2019 04.
Article in English | MEDLINE | ID: mdl-30894694

ABSTRACT

Blood-based diagnostic tests, using individual biomarkers or panels of biomarkers, may revolutionize disease diagnostics and enable minimally invasive therapy monitoring. However, selection of the most relevant biomarkers from liquid biosources remains an immense challenge. We recently presented the thromboSeq pipeline, which enables RNA sequencing and cancer classification via self-learning and swarm intelligence-enhanced bioinformatics algorithms using blood platelet RNA. Here, we provide the wet-lab protocol for the generation of platelet RNA-sequencing libraries and the dry-lab protocol for the development of swarm intelligence-enhanced machine-learning-based classification algorithms. The wet-lab protocol includes platelet RNA isolation, mRNA amplification, and preparation for next-generation sequencing. The dry-lab protocol describes the automated FASTQ file pre-processing to quantified gene counts, quality controls, data normalization and correction, and swarm intelligence-enhanced support vector machine (SVM) algorithm development. This protocol enables platelet RNA profiling from 500 pg of platelet RNA and allows automated and optimized biomarker panel selection. The wet-lab protocol can be performed in 5 d before sequencing, and the algorithm development can be completed in 2 d, depending on computational resources. The protocol requires basic molecular biology skills and a basic understanding of Linux and R. In all, with this protocol, we aim to enable the scientific community to test platelet RNA for diagnostic algorithm development.


Subjects
Blood Platelets/metabolism , DNA, Complementary/analysis , RNA, Messenger/analysis , Sequence Analysis, RNA/methods , Support Vector Machine/statistics & numerical data , Biomarkers/blood , Blood Platelets/chemistry , Computational Biology/methods , DNA, Complementary/genetics , High-Throughput Nucleotide Sequencing/methods , Humans , RNA Splicing , RNA, Messenger/genetics , Sequence Analysis, RNA/statistics & numerical data
16.
BMC Genomics ; 20(1): 167, 2019 Mar 04.
Article in English | MEDLINE | ID: mdl-30832569

ABSTRACT

BACKGROUND: Deep learning has achieved tremendous success in numerous artificial intelligence applications and is unsurprisingly penetrating into various biomedical domains. High-throughput omics data in the form of molecular profile matrices, such as transcriptomes and metabolomes, have long existed as a valuable resource for facilitating diagnosis of patient statuses/stages. It is timely and imperative to compare deep learning neural networks against classical machine learning methods in the setting of matrix-formed omics data in terms of classification accuracy and robustness. RESULTS: Using 37 high-throughput omics datasets, covering transcriptomes and metabolomes, we evaluated the classification power of deep learning compared to traditional machine learning methods. Representative deep learning methods, Multi-Layer Perceptrons (MLP) and Convolutional Neural Networks (CNN), were deployed and explored in seeking optimal architectures for the best classification performance. Together with five classical supervised classification methods (Linear Discriminant Analysis, Multinomial Logistic Regression, Naïve Bayes, Random Forest, Support Vector Machine), MLP and CNN were comparatively tested on the 37 datasets to predict disease stages or to discriminate diseased samples from normal samples. MLPs achieved the highest overall accuracy among all methods tested. More thorough analyses revealed that single hidden layer MLPs with ample hidden units outperformed deeper MLPs. Furthermore, MLP was one of the most robust methods against imbalanced class composition and inaccurate class labels. CONCLUSION: Our results indicate that shallow MLPs (of one or two hidden layers) with ample hidden neurons are sufficient to achieve superior and robust classification performance in exploiting numerical matrix-formed omics data for diagnostic purposes. Specific observations regarding optimal network width, class imbalance tolerance, and inaccurate labeling tolerance will inform future improvement of neural network applications on functional genomics data.


Subjects
Deep Learning/trends , Gene Expression Profiling/statistics & numerical data , Machine Learning/trends , Neural Networks, Computer , Algorithms , Artificial Intelligence/statistics & numerical data , Bayes Theorem , Deep Learning/statistics & numerical data , Gene Expression Profiling/methods , Humans , Logistic Models , Machine Learning/statistics & numerical data , Metabolome/genetics , Support Vector Machine/statistics & numerical data , Support Vector Machine/trends
17.
PLoS One ; 13(1): e0188996, 2018.
Article in English | MEDLINE | ID: mdl-29304512

ABSTRACT

Hyperspectral image classification with a limited number of training samples and without loss of accuracy is desirable, as collecting such data is often expensive and time-consuming. However, classifiers trained with limited samples usually end up with a large generalization error. To overcome this problem, we propose a fuzziness-based active learning framework (FALF), in which we implement the idea of selecting optimal training samples to enhance generalization performance for two different kinds of classifiers, discriminative and generative (e.g. SVM and KNN). The optimal samples are selected by first estimating the boundary of each class and then calculating the fuzziness-based distance between each sample and the estimated class boundaries. Samples that lie at smaller distances from the boundaries and have higher fuzziness are chosen as target candidates for the training set. Through detailed experimentation on three publicly available datasets, we showed that, when trained with the proposed sample selection framework, both classifiers achieved higher classification accuracy and lower processing time with a small amount of training data than when the training samples were selected randomly. Our experiments demonstrate the effectiveness of our proposed method, which compares favorably with state-of-the-art methods.


Subjects
Image Enhancement/methods , Machine Learning/statistics & numerical data , Fuzzy Logic , Models, Statistical , Remote Sensing Technology/statistics & numerical data , Support Vector Machine/statistics & numerical data
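
The selection idea in the abstract above (prefer samples with high fuzziness) can be illustrated with a minimal sketch. This is not the authors' FALF implementation: it assumes a standard entropy-style fuzziness measure over class-membership vectors, it ignores the boundary-distance term the abstract mentions, and the names `fuzziness`, `select_most_fuzzy`, and the `pool` values are hypothetical.

```python
import math


def fuzziness(memberships):
    """Entropy-style fuzziness of one membership vector (values in [0, 1]).

    Assumption: this common measure stands in for the paper's own definition.
    It is 0 for crisp memberships (all 0 or 1) and largest when every value is 0.5.
    """
    total = 0.0
    for mu in memberships:
        for p in (mu, 1.0 - mu):
            if p > 0.0:  # treat 0 * log(0) as 0
                total -= p * math.log(p)
    return total / len(memberships)


def select_most_fuzzy(pool, k):
    """Return indices of the k samples in the pool with the highest fuzziness."""
    ranked = sorted(range(len(pool)), key=lambda i: fuzziness(pool[i]), reverse=True)
    return ranked[:k]


# Hypothetical classifier outputs for four unlabeled samples (two classes).
pool = [
    [0.98, 0.02],  # confidently class 0
    [0.55, 0.45],  # close to the class boundary
    [0.10, 0.90],  # fairly confident class 1
    [0.50, 0.50],  # maximally ambiguous
]
picked = select_most_fuzzy(pool, 2)
print(picked)
```

In an active learning loop, the picked samples would be labeled and added to the training set before the classifier is retrained.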
18.
Neural Netw ; 98: 114-121, 2018 Feb.
Article in English | MEDLINE | ID: mdl-29227960

ABSTRACT

Support vector ordinal regression (SVOR) is a popular method for tackling ordinal regression problems. A solution path provides a compact representation of the optimal solutions for all values of the regularization parameter, which is extremely useful for model selection. However, due to the complicated formulation of SVOR (including multiple equalities and extra variables), no solution path algorithm has yet been proposed for SVOR. In this paper, we propose a regularization path algorithm for SVOR that can track the two sets of variables of SVOR w.r.t. the regularization parameter. Technically, we use the QR decomposition to handle the singular matrices in the regularization path. Experimental results on a variety of datasets not only confirm the effectiveness of our regularization path algorithm, but also show its superiority for model selection.


Subjects
Algorithms , Datasets as Topic/statistics & numerical data , Support Vector Machine/statistics & numerical data
19.
Cancer Biomark ; 21(2): 393-413, 2018 Feb 06.
Article in English | MEDLINE | ID: mdl-29226857

ABSTRACT

Prostate cancer is the second leading cause of cancer deaths among men. Early detection can effectively reduce the mortality caused by prostate cancer. Because of the high and multiple resolutions of prostate MRI, proper diagnostic systems and tools are required. In the past, researchers developed computer-aided diagnosis (CAD) systems that help radiologists detect abnormalities. In this paper, we employed novel machine learning techniques, namely a Bayesian approach, support vector machine (SVM) kernels (polynomial, radial basis function (RBF), and Gaussian), and decision trees, to detect prostate cancer. Moreover, different feature-extraction strategies are proposed to improve detection performance. These strategies are based on texture, morphological, scale-invariant feature transform (SIFT), and elliptic Fourier descriptor (EFD) features. Performance was evaluated on single features as well as combinations of features using machine learning classification techniques. Cross-validation (jackknife k-fold) was performed, and performance was evaluated in terms of the receiver operating characteristic (ROC) curve, specificity, sensitivity, positive predictive value (PPV), negative predictive value (NPV), and false positive rate (FPR). With single feature-extraction strategies, the SVM Gaussian kernel gave the highest accuracy of 98.34% with an AUC of 0.999. With combinations of feature-extraction strategies, the SVM Gaussian kernel with texture + morphological and EFDs + morphological features gave the highest accuracy of 99.71% and an AUC of 1.00.


Subjects
Machine Learning/statistics & numerical data , Prostatic Neoplasms/diagnosis , Support Vector Machine/statistics & numerical data , Bayes Theorem , Humans , Male , Prostatic Neoplasms/pathology
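
The evaluation measures named in the abstract above (sensitivity, specificity, PPV, NPV, FPR, accuracy) all follow from the four cells of a binary confusion matrix. The sketch below shows the standard definitions only; the function name `diagnostic_metrics` and the counts are hypothetical and are not taken from the study, and every denominator is assumed to be nonzero.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification metrics from confusion-matrix counts.

    tp/fp/tn/fn are true positives, false positives, true negatives, false negatives.
    Assumes every denominator below is nonzero.
    """
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate (recall)
        "specificity": tn / (tn + fp),  # true negative rate
        "ppv": tp / (tp + fp),          # positive predictive value (precision)
        "npv": tn / (tn + fn),          # negative predictive value
        "fpr": fp / (fp + tn),          # false positive rate = 1 - specificity
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical counts, chosen only to illustrate the arithmetic.
m = diagnostic_metrics(tp=90, fp=5, tn=95, fn=10)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

Note that FPR and specificity always sum to 1, so reporting both is redundant but common in diagnostic studies.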
20.
Comput Inform Nurs ; 35(8): 408-416, 2017 Aug.
Article in English | MEDLINE | ID: mdl-28800580

ABSTRACT

We constructed a support vector machine model to determine whether an inpatient will suffer a fall on a given day, based on the patient's status on the previous day. Using fall report data from our own facility and intensity-of-nursing-care-needs data accumulated through hospital information systems, a dataset comprising approximately 1.2 million patient-days was created. Approximately 50% of the dataset was used as training and testing data. A multistep grid search was conducted using a semicomprehensive combination of three parameters. A discriminant model was created for each parameter setting and scored on the testing data by sensitivity and specificity to identify the highest-scoring setting. The highest-scoring model had a sensitivity of 64.9% and a specificity of 69.6%. By adopting a method that relies on daily data recorded in the electronic medical record system and accurately predicts unknown data, we were able to overcome issues described in previous studies while constructing a discriminant model for patients' fall risk that does not burden nurses and patients with information gathering.


Subjects
Accidental Falls/prevention & control , Inpatients/classification , Support Vector Machine/statistics & numerical data , Electronic Health Records/statistics & numerical data , Female , Hospitals , Humans , Male , Nurse's Role , Risk Assessment