1.
PLoS One ; 16(11): e0260315, 2021.
Article En | MEDLINE | ID: mdl-34797894

Overdose prescription errors sometimes cause serious life-threatening adverse drug events, while underdose errors lead to diminished therapeutic effects. Therefore, it is important to detect and prevent these errors. In the present study, we used the one-class support vector machine (OCSVM), one of the most common unsupervised machine learning algorithms for anomaly detection, to identify overdose and underdose prescriptions. We extracted prescription data from electronic health records in Kyushu University Hospital between January 1, 2014 and December 31, 2019. We constructed an OCSVM model for each of the 21 candidate drugs using three features: age, weight, and dose. Clinical overdose and underdose prescriptions, which were identified and rectified by pharmacists before administration, were collected. Synthetic overdose and underdose prescriptions were created using the maximum and minimum doses, defined by drug labels or the UpToDate database. We applied these prescription data to the OCSVM model and evaluated its detection performance. We also performed comparative analysis with other unsupervised outlier detection algorithms (local outlier factor, isolation forest, and robust covariance). Twenty-seven out of 31 clinical overdose and underdose prescriptions (87.1%) were detected as abnormal by the model. The constructed OCSVM models showed high performance for detecting synthetic overdose prescriptions (precision 0.986, recall 0.964, and F-measure 0.973) and synthetic underdose prescriptions (precision 0.980, recall 0.794, and F-measure 0.839). In comparative analysis, OCSVM showed the best performance. Our models detected the majority of clinical overdose and underdose prescriptions and demonstrated high performance in synthetic data analysis. OCSVM models, constructed using features such as age, weight, and dose, are useful for detecting overdose and underdose prescriptions.


Drug Overdose/diagnosis , Prescription Drugs/adverse effects , Prescriptions/statistics & numerical data , Adolescent , Adult , Aged , Aged, 80 and over , Algorithms , Child, Preschool , Data Analysis , Data Collection/statistics & numerical data , Data Management/statistics & numerical data , Databases, Factual/statistics & numerical data , Electronic Health Records/statistics & numerical data , Humans , Infant , Mental Recall , Middle Aged , Support Vector Machine/statistics & numerical data , Unsupervised Machine Learning/statistics & numerical data , Young Adult
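The detection metrics quoted in the abstract above (precision, recall, F-measure) can be derived from confusion counts. The following is a minimal sketch, not the authors' code: the function name and the example counts are hypothetical. The F-measures reported in the abstract may be averaged across the 21 per-drug models, so they need not equal the harmonic mean of the pooled precision and recall.

```python
def detection_metrics(tp, fp, fn):
    """Return (precision, recall, f_measure) from confusion counts.

    tp: abnormal prescriptions correctly flagged
    fp: normal prescriptions wrongly flagged
    fn: abnormal prescriptions that were missed
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F-measure is the harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical counts, for illustration only (not taken from the study).
precision, recall, f_measure = detection_metrics(tp=90, fp=10, fn=30)

# Clinical detection rate stated in the abstract: 27 of 31 prescriptions.
clinical_rate = 27 / 31
```

The last line reproduces the 87.1% clinical detection rate given in the abstract.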
2.
Sci Rep ; 11(1): 9501, 2021 05 04.
Article En | MEDLINE | ID: mdl-33947902

In this study, we aimed to develop and validate a machine learning-based mortality prediction model for hospitalized heat-related illness patients. After 2393 hospitalized patients were extracted from a multicentered heat-related illness registry in Japan, subjects were divided into the training set for development (n = 1516, data from 2014, 2017-2019) and the test set (n = 877, data from 2020) for validation. Twenty-four variables including characteristics of patients, vital signs, and laboratory test data at hospital arrival were trained as predictor features for machine learning. The outcome was death during hospital stay. In validation, the developed machine learning models (logistic regression, support vector machine, random forest, XGBoost) demonstrated favorable performance for outcome prediction with significantly increased values of the area under the precision-recall curve (AUPR) of 0.415 [95% confidence interval (CI) 0.336-0.494], 0.395 [CI 0.318-0.472], 0.426 [CI 0.346-0.506], and 0.528 [CI 0.442-0.614], respectively, compared to that of the conventional acute physiology and chronic health evaluation (APACHE)-II score of 0.287 [CI 0.222-0.351] as a reference standard. The area under the receiver operating characteristic curve (AUROC) values were also high over 0.92 in all models, although there were no statistical differences compared to APACHE-II. This is the first demonstration of the potential of machine learning-based mortality prediction models for heat-related illnesses.


Hospital Mortality/trends , Machine Learning/statistics & numerical data , APACHE , Aged , Area Under Curve , Female , Hot Temperature , Humans , Intensive Care Units/statistics & numerical data , Japan , Length of Stay/statistics & numerical data , Logistic Models , Male , Middle Aged , Prognosis , ROC Curve , Registries , Support Vector Machine/statistics & numerical data
3.
PLoS One ; 16(5): e0250631, 2021.
Article En | MEDLINE | ID: mdl-33979356

Environmental Microorganism Data Set Fifth Version (EMDS-5) is a microscopic image dataset including original Environmental Microorganism (EM) images and two sets of Ground Truth (GT) images. The GT image sets include a single-object GT image set and a multi-object GT image set. EMDS-5 has 21 types of EMs, each of which contains 20 original EM images, 20 single-object GT images and 20 multi-object GT images. EMDS-5 can be used to evaluate image preprocessing, image segmentation, feature extraction, image classification and image retrieval functions. In order to prove the effectiveness of EMDS-5, for each function, we select the most representative algorithms and evaluation indicators for testing and evaluation. The image preprocessing functions contain two parts: image denoising and image edge detection. Image denoising uses nine kinds of filters to denoise 13 kinds of noise, respectively. For edge detection, six edge detection operators are used to detect the edges of the images, and two evaluation indicators, peak signal-to-noise ratio and mean structural similarity, are used for evaluation. Image segmentation includes single-object image segmentation and multi-object image segmentation. Six methods are used for single-object image segmentation, while k-means and U-net are used for multi-object segmentation. We extract nine features from the images in EMDS-5 and use the Support Vector Machine (SVM) classifier for testing. In terms of image classification, we select the VGG16 feature to test SVM, k-Nearest Neighbors, and Random Forests. We test two types of retrieval approaches: texture feature retrieval and deep learning feature retrieval. We select the last layer of features of the VGG16 network and the ResNet50 network as feature vectors. We use mean average precision as the evaluation index for retrieval. EMDS-5 is available at https://github.com/NEUZihan/EMDS-5.git.


Algorithms , Databases, Factual/statistics & numerical data , Environmental Microbiology/standards , Image Processing, Computer-Assisted/methods , Imaging, Three-Dimensional/methods , Support Vector Machine/statistics & numerical data , Signal-To-Noise Ratio
4.
Bioorg Med Chem ; 38: 116119, 2021 05 15.
Article En | MEDLINE | ID: mdl-33831697

In response to the pandemic caused by SARS-CoV-2, we constructed a hybrid support vector machine (SVM) classification model using a set of publicly posted SARS-CoV-2 pseudotyped particle (PP) entry assay repurposing screen data to identify novel potent compounds as a starting point for drug development to treat COVID-19 patients. Two different molecular descriptor systems, atom typing descriptors and 3D fingerprints (FPs), were employed to construct the SVM classification models. Both models achieved reasonable performance, with an area under the receiver operating characteristic curve (AUC-ROC) of 0.84 and 0.82, respectively. The consensus prediction, from which compounds with inconsistent classifications were excluded, outperformed the two individual models with a significantly improved AUC-ROC of 0.91. The consensus model was then used to screen the 173,898 compounds in the NCATS annotated and diverse chemical libraries. Of the 255 compounds selected for experimental confirmation, 116 compounds exhibited inhibitory activities in the SARS-CoV-2 PP entry assay with IC50 values ranging between 0.17 µM and 62.2 µM, representing an enrichment factor of 3.2. These 116 active compounds with diverse and novel structures could potentially serve as starting points for chemistry optimization for COVID-19 drug discovery.


Antiviral Agents/pharmacology , SARS-CoV-2/drug effects , Support Vector Machine/statistics & numerical data , Virus Internalization/drug effects , Area Under Curve , Databases, Chemical/statistics & numerical data , Drug Repositioning , HEK293 Cells , Humans , Microbial Sensitivity Tests , ROC Curve , Small Molecule Libraries/pharmacology
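The hit rate and enrichment factor quoted in the abstract above can be related with a short calculation. This is a sketch under an assumed definition (enrichment factor = hit rate among selected compounds divided by a baseline hit rate). The abstract does not state the baseline, so the value computed below is only a back-calculation implied by that assumption, and the function names are hypothetical.

```python
def hit_rate(actives, tested):
    """Fraction of tested compounds that were confirmed active."""
    return actives / tested


def enrichment_factor(selected_rate, baseline_rate):
    """Assumed definition: selected hit rate relative to a baseline hit rate."""
    return selected_rate / baseline_rate


# Figures from the abstract: 116 actives among 255 confirmed compounds.
selected_rate = hit_rate(116, 255)

# Baseline implied by the reported enrichment factor of 3.2
# (an inference under the assumed definition, not a reported value).
implied_baseline = selected_rate / 3.2
```

Under this assumption, roughly 45% of the selected compounds were active, against an implied baseline of roughly 14%.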
5.
BMC Anesthesiol ; 21(1): 66, 2021 03 02.
Article En | MEDLINE | ID: mdl-33653263

BACKGROUND: Estimating the depth of anaesthesia (DoA) is critical in modern anaesthetic practice. Multiple DoA monitors based on electroencephalograms (EEGs) have been widely used for DoA monitoring; however, these monitors may be inaccurate under certain conditions. In this work, we hypothesize that heart rate variability (HRV)-derived features based on a deep neural network can distinguish different anaesthesia states, providing a secondary tool for DoA assessment. METHODS: A novel method of distinguishing different anaesthesia states was developed based on four HRV-derived features in the time and frequency domain combined with a deep neural network. Four features were extracted from an electrocardiogram, including the HRV high-frequency power, low-frequency power, high-to-low-frequency power ratio, and sample entropy. Next, these features were used as inputs for the deep neural network, which utilized the expert assessment of consciousness level as the reference output. Finally, the deep neural network was compared with the logistic regression, support vector machine, and decision tree models. The datasets of 23 anaesthesia patients were used to assess the proposed method. RESULTS: The accuracies of the four models, in distinguishing the anaesthesia states, were 86.2% (logistic regression), 87.5% (support vector machine), 87.2% (decision tree), and 90.1% (deep neural network). The accuracy of deep neural network was higher than those of the logistic regression (p < 0.05), support vector machine (p < 0.05), and decision tree (p < 0.05) approaches. Our method outperformed the logistic regression, support vector machine, and decision tree methods. CONCLUSIONS: The incorporation of four HRV-derived features in the time and frequency domain and a deep neural network could accurately distinguish between different anaesthesia states; however, this study is a pilot feasibility study. 
The proposed method, together with other evaluation methods such as EEG, is expected to assist anaesthesiologists in the accurate evaluation of the DoA.


Anesthesia/statistics & numerical data , Electrocardiography/methods , Heart Rate/drug effects , Neural Networks, Computer , Decision Trees , Female , Humans , Male , Middle Aged , Reproducibility of Results , Support Vector Machine/statistics & numerical data
6.
Protein J ; 40(1): 54-62, 2021 02.
Article En | MEDLINE | ID: mdl-33454893

To investigate the structure-dependent peptide mobility behavior in ion mobility spectrometry (IMS), a quantitative structure-spectrum relationship (QSSR) is systematically modeled and predicted for the collision cross section Ω values of a total of 162 singly protonated tripeptide fragments extracted from Bacillus subtilis lipase A. Two different types of structure characterization methods, namely local and global descriptors, as well as three machine learning methods, namely partial least squares (PLS), support vector machine (SVM) and Gaussian process (GP), are employed to parameterize and correlate the structures and Ω values of these peptide samples. In this procedure, the local descriptor is derived from the principal component analysis (PCA) of 516 physicochemical properties for 20 standard amino acids, which can be used to sequentially characterize the three amino acid residues composing a tripeptide. The global descriptor is calculated using the CODESSA method, which can generate > 200 statistically significant variables to characterize the whole molecular structure of a tripeptide. The obtained QSSR models are evaluated rigorously via tenfold cross-validation and Monte Carlo cross-validation (MCCV). A comprehensive comparison is performed on the resulting statistics arising from the systematic combination of different descriptor types and machine learning methods. It is revealed that the local descriptor-based QSSR models have a better fitting ability and predictive power, but worse interpretability, than those based on the global descriptor. In addition, since QSSR modeling using the local descriptor does not consider the three-dimensional conformation of tripeptide samples, the method would be considerably more efficient than modeling with the global descriptor.


Amino Acids/chemistry , Bacillus subtilis/chemistry , Bacterial Proteins/chemistry , Lipase/chemistry , Oligopeptides/chemistry , Support Vector Machine/statistics & numerical data , Amino Acids/metabolism , Bacillus subtilis/enzymology , Bacterial Proteins/metabolism , Ion Mobility Spectrometry/statistics & numerical data , Least-Squares Analysis , Lipase/metabolism , Monte Carlo Method , Oligopeptides/metabolism , Principal Component Analysis , Quantitative Structure-Activity Relationship
7.
Burns ; 47(4): 812-820, 2021 06.
Article En | MEDLINE | ID: mdl-32928613

Accurate classification of burn severities is of vital importance for proper burn treatments. A recent article reported that a combination of Raman spectroscopy and optical coherence tomography (OCT) classifies different degrees of burns with an overall accuracy of 85% [1]. In this study, we demonstrate the feasibility of using Raman spectroscopy alone to classify burn severities on ex vivo porcine skin tissues. To create different levels of burns, four burn conditions were designed: (i) 200°F for 10 s, (ii) 200°F for 30 s, (iii) 450°F for 10 s and (iv) 450°F for 30 s. Raman spectra from 500-2000 cm⁻¹ were collected from samples of the four burn conditions as well as the unburnt condition. Classifications were performed using kernel support vector machine (KSVM) with features extracted from the spectra by principal component analysis (PCA), and using partial least squares (PLS). Both techniques yielded an average accuracy of approximately 92%, which was independently evaluated by leave-one-out cross-validation (LOOCV). By comparison, PCA+KSVM provides higher accuracy in classifying severe burns, while PLS performs better in classifying mild burns. Variable importance in the projection (VIP) scores from the PLS models reveal that proteins and lipids, amide III, and amino acids are important indicators in separating unburnt or mild burns (200°F), while amide I has a more pronounced impact in separating severe burns (450°F).


Burns/diagnostic imaging , Spectrum Analysis, Raman/standards , Burns/complications , Humans , Principal Component Analysis , Severity of Illness Index , Spectrum Analysis, Raman/methods , Support Vector Machine/standards , Support Vector Machine/statistics & numerical data
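The leave-one-out cross-validation (LOOCV) named in the abstract above can be sketched as follows. This is an illustration only: a nearest-centroid classifier stands in for the kernel SVM, the two-dimensional toy points stand in for PCA scores of Raman spectra, and all names and labels are hypothetical.

```python
import math


def nearest_centroid_predict(train_x, train_y, query):
    """Classify `query` by the closest class mean (stand-in for the KSVM)."""
    sums, counts = {}, {}
    for x, y in zip(train_x, train_y):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
        counts[y] = counts.get(y, 0) + 1
    best_label, best_dist = None, math.inf
    for label, acc in sums.items():
        centroid = [value / counts[label] for value in acc]
        dist = math.dist(centroid, query)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loocv_accuracy(xs, ys, predict):
    """Hold out each sample once, train on the rest, and score the prediction."""
    correct = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if predict(train_x, train_y, xs[i]) == ys[i]:
            correct += 1
    return correct / len(xs)


# Toy, well-separated data (hypothetical; not from the study).
xs = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [2.0, 2.1], [2.2, 1.9], [1.9, 2.3]]
ys = ["mild", "mild", "mild", "severe", "severe", "severe"]
accuracy = loocv_accuracy(xs, ys, nearest_centroid_predict)
```

Because every sample is held out exactly once, LOOCV uses all data for both training and testing, which suits small sample sets at the cost of one model fit per sample.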
8.
J Med Chem ; 63(16): 8761-8777, 2020 08 27.
Article En | MEDLINE | ID: mdl-31512867

In qualitative or quantitative studies of structure-activity relationships (SARs), machine learning (ML) models are trained to recognize structural patterns that differentiate between active and inactive compounds. Understanding model decisions is challenging but of critical importance to guide compound design. Moreover, the interpretation of ML results provides an additional level of model validation based on expert knowledge. A number of complex ML approaches, especially deep learning (DL) architectures, have distinctive black-box character. Herein, a locally interpretable explanatory method termed Shapley additive explanations (SHAP) is introduced for rationalizing activity predictions of any ML algorithm, regardless of its complexity. Models resulting from random forest (RF), nonlinear support vector machine (SVM), and deep neural network (DNN) learning are interpreted, and structural patterns determining the predicted probability of activity are identified and mapped onto test compounds. The results indicate that SHAP has high potential for rationalizing predictions of complex ML models.


Deep Learning/statistics & numerical data , Organic Chemicals/chemistry , Support Vector Machine/statistics & numerical data
9.
J Exp Biol ; 222(Pt 24)2019 12 18.
Article En | MEDLINE | ID: mdl-31753908

For analysis of vocal syntax, accurate classification of call sequence structures in different behavioural contexts is essential. However, an effective, intelligent program for classifying call sequences from numerous recorded sound files is still lacking. Here, we employed three machine learning algorithms (logistic regression, support vector machine and decision trees) to classify call sequences of social vocalizations of greater horseshoe bats (Rhinolophus ferrumequinum) in aggressive and distress contexts. The three machine learning algorithms obtained highly accurate classification rates (logistic regression 98%, support vector machine 97% and decision trees 96%). The algorithms also extracted three of the most important features for the classification: the transition between two adjacent syllables, the probability of occurrences of syllables in each position of a sequence, and the characteristics of a sequence. The results of statistical analysis also supported the classification of the algorithms. The study provides the first efficient method for data mining of call sequences and the possibility of linguistic parameters in animal communication. It suggests the presence of song-like syntax in the social vocalizations emitted within a non-breeding context in a bat species.


Chiroptera/physiology , Machine Learning/statistics & numerical data , Vocalization, Animal , Animals , Decision Trees , Echolocation , Logistic Models , Support Vector Machine/statistics & numerical data
10.
Comput Methods Programs Biomed ; 179: 104992, 2019 Oct.
Article En | MEDLINE | ID: mdl-31443858

BACKGROUND AND OBJECTIVE: Coronary artery disease (CAD) is one of the commonest diseases around the world. An early and accurate diagnosis of CAD allows a timely administration of appropriate treatment and helps to reduce the mortality. Herein, we describe an innovative machine learning methodology that enables an accurate detection of CAD and apply it to data collected from Iranian patients. METHODS: We first tested ten traditional machine learning algorithms, and then the three best-performing algorithms (three types of SVM) were used in the rest of the study. To improve the performance of these algorithms, data preprocessing with normalization was carried out. Moreover, a genetic algorithm and particle swarm optimization, coupled with stratified 10-fold cross-validation, were used twice: for optimization of classifier parameters and for parallel selection of features. RESULTS: The presented approach enhanced the performance of all traditional machine learning algorithms used in this study. We also introduced a new optimization technique called N2Genetic optimizer (a new genetic training). Our experiments demonstrated that N2Genetic-nuSVM provided an accuracy of 93.08% and an F1-score of 91.51% when predicting CAD outcomes among the patients included in the well-known Z-Alizadeh Sani dataset. These results are competitive and comparable to the best results in the field. CONCLUSIONS: We showed that machine-learning techniques optimized by the proposed approach can lead to highly accurate models intended for both clinical and research use.


Coronary Artery Disease/diagnosis , Machine Learning , Algorithms , Data Mining/statistics & numerical data , Databases, Factual/statistics & numerical data , Diagnosis, Computer-Assisted/statistics & numerical data , Female , Humans , Machine Learning/statistics & numerical data , Male , Models, Cardiovascular , Support Vector Machine/statistics & numerical data
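The stratified 10-fold cross-validation mentioned in the abstract above keeps class proportions roughly equal across folds. The following is a minimal, deterministic sketch of the splitting step only; the function name and the toy labels are hypothetical, and a real pipeline would usually shuffle indices within each class first.

```python
from collections import defaultdict


def stratified_k_fold(labels, k):
    """Return k lists of test indices with class proportions kept roughly equal."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    # Deal each class's indices round-robin so every fold gets its share.
    for indices in by_class.values():
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


# Toy labels (hypothetical): 20 positive and 10 negative cases.
labels = ["CAD"] * 20 + ["normal"] * 10
folds = stratified_k_fold(labels, k=10)

# For each fold, the remaining indices form the training set.
train_indices_for_fold_0 = [i for i in range(len(labels)) if i not in folds[0]]
```

With these toy labels each fold holds two "CAD" cases and one "normal" case, mirroring the 2:1 ratio of the full set.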
11.
J Proteome Res ; 18(8): 3195-3202, 2019 08 02.
Article En | MEDLINE | ID: mdl-31314536

Deep learning (DL), a type of machine learning approach, is a powerful tool for analyzing large sets of data that are derived from biomedical sciences. However, it remains unknown whether DL is suitable for identifying contributing factors, such as biomarkers, in quantitative proteomics data. In this study, we describe an optimized DL-based analytical approach using a data set that was generated by selected reaction monitoring-mass spectrometry (SRM-MS), comprising SRM-MS data from 1008 samples for the diagnosis of pancreatic cancer, to test its classification power. Its performance was compared with that of 5 conventional multivariate and machine learning methods: random forest (RF), support vector machine (SVM), logistic regression (LR), k-nearest neighbors (k-NN), and naïve Bayes (NB). The DL method yielded the best classification (AUC 0.9472 for the test data set) of all approaches. We also optimized the parameters of DL individually to determine which factors were the most significant. In summary, the DL method has advantages in classifying the quantitative proteomics data of pancreatic cancer patients, and our results suggest that its implementation can improve the performance of diagnostic assays in clinical settings.


Deep Learning/statistics & numerical data , Machine Learning/statistics & numerical data , Mass Spectrometry/statistics & numerical data , Proteomics/statistics & numerical data , Algorithms , Bayes Theorem , Cluster Analysis , Humans , Logistic Models , Pancreatic Neoplasms/diagnosis , Pancreatic Neoplasms/pathology , Support Vector Machine/statistics & numerical data
12.
BMC Psychiatry ; 19(1): 210, 2019 07 05.
Article En | MEDLINE | ID: mdl-31277632

BACKGROUND: Previous resting-state functional magnetic resonance imaging (rs-fMRI) studies have revealed intrinsic regional activity alterations in obsessive-compulsive disorder (OCD), but those results were based on group analyses, which limits their applicability to clinical diagnosis and treatment at the level of the individual. METHODS: We examined fractional amplitude low-frequency fluctuation (fALFF) and applied support vector machine (SVM) to discriminate OCD patients from healthy controls on the basis of rs-fMRI data. Values of fALFF, calculated from 68 drug-naive OCD patients and 68 demographically matched healthy controls, served as input features for the classification procedure. RESULTS: The classifier achieved 72% accuracy (p ≤ 0.001). This discrimination was based on regions that included the left superior temporal gyrus, the right middle temporal gyrus, the left supramarginal gyrus and the superior parietal lobule. CONCLUSIONS: These results indicate that OCD-related abnormalities in temporal and parietal lobe activation have predictive power for group membership; furthermore, the findings suggest that machine learning techniques can be used to aid in the identification of individuals with OCD in clinical diagnosis.


Magnetic Resonance Imaging/statistics & numerical data , Obsessive-Compulsive Disorder/diagnostic imaging , Support Vector Machine/statistics & numerical data , Adult , Brain/physiopathology , Brain Mapping/methods , Case-Control Studies , Female , Humans , Limbic System/diagnostic imaging , Limbic System/physiopathology , Magnetic Resonance Imaging/methods , Male , Multivariate Analysis , Obsessive-Compulsive Disorder/pathology , Parietal Lobe/diagnostic imaging , Parietal Lobe/physiopathology , Rest/psychology , Temporal Lobe/diagnostic imaging , Temporal Lobe/physiopathology , Young Adult
13.
Technol Health Care ; 27(S1): 31-46, 2019.
Article En | MEDLINE | ID: mdl-31045525

In the practical implementation of control of surface electromyography (sEMG)-driven devices, algorithms should recognize human motion from sEMG with fast speed and high accuracy. This study proposes two feature engineering (FE) techniques, namely, feature-vector resampling and time-lag techniques, to improve the accuracy and speed of the least squares support vector machine (LSSVM) for wrist palmar angle estimation from sEMG features. The root mean square error and correlation coefficient of LSSVM with FE are 9.50 ± 2.32 degrees and 0.971 ± 0.018, respectively. The average training time and average execution time of LSSVM with FE in processing 12600 sEMG points are 0.016 s and 0.053 s, respectively. To evaluate the proposed algorithm, its estimation results are compared with those of three other methods, namely, LSSVM, radial basis function (RBF) neural network, and RBF with FE. Experimental results verify that introducing a time lag into the feature vector can greatly improve the estimation accuracy of both RBF and LSSVM; meanwhile, the feature-vector resampling technique can significantly increase the training and execution speed of both the RBF neural network and LSSVM. Among the algorithms applied in this study, LSSVM with FE techniques performed best in terms of training and execution speed, as well as estimation accuracy.


Electromyography/methods , Support Vector Machine , Adult , Algorithms , Electromyography/statistics & numerical data , Humans , Least-Squares Analysis , Neural Networks, Computer , Support Vector Machine/statistics & numerical data , Young Adult
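The time-lag idea and the two evaluation measures named in the abstract above (root mean square error and correlation coefficient) can be sketched as follows. The abstract does not specify how the lagged feature vector is built, so the construction below, which appends earlier samples to the current one, is an assumption, and all function names are hypothetical.

```python
import math


def add_time_lags(series, n_lags):
    """Build rows [x[t], x[t-1], ..., x[t-n_lags]] for every t >= n_lags."""
    return [
        [series[t - lag] for lag in range(n_lags + 1)]
        for t in range(n_lags, len(series))
    ]


def rmse(predicted, actual):
    """Root mean square error between two equal-length sequences."""
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Toy values (hypothetical; not sEMG data from the study).
features = add_time_lags([1.0, 2.0, 3.0, 4.0, 5.0], n_lags=2)
error = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
r = pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

Each lagged row gives the regressor a short history of the signal rather than a single instantaneous value.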
14.
Medicine (Baltimore) ; 98(14): e15022, 2019 Apr.
Article En | MEDLINE | ID: mdl-30946334

BACKGROUND: To explore whether radiomics combined with computed tomography (CT) images can be used to establish a model for differentiating high-grade (International Society of Urological Pathology [ISUP] grade III-IV) from low-grade (ISUP I-II) clear cell renal cell carcinoma (ccRCC). METHODS: For this retrospective study, 3-phase contrast-enhanced CT images were collected from 227 patients with pathologically confirmed ISUP-grade ccRCC (155 cases in the low-grade group and 72 cases in the high-grade group). First, we delineated the largest dimension of the tumor in the corticomedullary and nephrographic CT images to obtain the region of interest. Second, variance selection, single variable selection, and the least absolute shrinkage and selection operator were used to select features in the corticomedullary phase, nephrographic phase, and 2-phase union samples, respectively. Finally, a model was constructed using the optimal features, and the receiver operating characteristic curve and area under the curve (AUC) were used to evaluate the predictive performance of the features in the training and validation cohorts. A Z test was employed to compare the differences in AUC values. RESULTS: The support vector machine (SVM) model constructed using the selected features for the 2-phase union samples can effectively distinguish between high- and low-grade ccRCC, and obtained the highest prediction accuracy. Its AUC values in the training cohort and the validation cohort were 0.88 and 0.91, respectively. The results of the Z test showed that the differences between the 3 groups were not statistically significant. CONCLUSION: The SVM model constructed from CT-based radiomic features can effectively identify the ISUP grades of ccRCC.


Carcinoma, Renal Cell/diagnosis , Kidney Neoplasms/diagnosis , Neoplasm Grading/methods , Support Vector Machine/statistics & numerical data , Tomography, X-Ray Computed/statistics & numerical data , Area Under Curve , Carcinoma, Renal Cell/pathology , Diagnosis, Differential , Female , Humans , Kidney Neoplasms/pathology , Male , Middle Aged , Predictive Value of Tests , ROC Curve , Retrospective Studies
15.
Nat Protoc ; 14(4): 1206-1234, 2019 04.
Article En | MEDLINE | ID: mdl-30894694

Blood-based diagnostics tests, using individual or panels of biomarkers, may revolutionize disease diagnostics and enable minimally invasive therapy monitoring. However, selection of the most relevant biomarkers from liquid biosources remains an immense challenge. We recently presented the thromboSeq pipeline, which enables RNA sequencing and cancer classification via self-learning and swarm intelligence-enhanced bioinformatics algorithms using blood platelet RNA. Here, we provide the wet-lab protocol for the generation of platelet RNA-sequencing libraries and the dry-lab protocol for the development of swarm intelligence-enhanced machine-learning-based classification algorithms. The wet-lab protocol includes platelet RNA isolation, mRNA amplification, and preparation for next-generation sequencing. The dry-lab protocol describes the automated FASTQ file pre-processing to quantified gene counts, quality controls, data normalization and correction, and swarm intelligence-enhanced support vector machine (SVM) algorithm development. This protocol enables platelet RNA profiling from 500 pg of platelet RNA and allows automated and optimized biomarker panel selection. The wet-lab protocol can be performed in 5 d before sequencing, and the algorithm development can be completed in 2 d, depending on computational resources. The protocol requires basic molecular biology skills and a basic understanding of Linux and R. In all, with this protocol, we aim to enable the scientific community to test platelet RNA for diagnostic algorithm development.


Blood Platelets/metabolism , DNA, Complementary/analysis , RNA, Messenger/analysis , Sequence Analysis, RNA/methods , Support Vector Machine/statistics & numerical data , Biomarkers/blood , Blood Platelets/chemistry , Computational Biology/methods , DNA, Complementary/genetics , High-Throughput Nucleotide Sequencing/methods , Humans , RNA Splicing , RNA, Messenger/genetics , Sequence Analysis, RNA/statistics & numerical data
16.
BMC Genomics ; 20(1): 167, 2019 Mar 04.
Article En | MEDLINE | ID: mdl-30832569

BACKGROUND: Deep learning has achieved tremendous success in numerous artificial intelligence applications and is unsurprisingly penetrating into various biomedical domains. High-throughput omics data in the form of molecular profile matrices, such as transcriptomes and metabolomes, have long existed as a valuable resource for facilitating diagnosis of patient statuses/stages. It is timely and imperative to compare deep learning neural networks against classical machine learning methods in the setting of matrix-formed omics data in terms of classification accuracy and robustness. RESULTS: Using 37 high-throughput omics datasets, covering transcriptomes and metabolomes, we evaluated the classification power of deep learning compared to traditional machine learning methods. Representative deep learning methods, Multi-Layer Perceptrons (MLP) and Convolutional Neural Networks (CNN), were deployed and explored in seeking optimal architectures for the best classification performance. Together with five classical supervised classification methods (Linear Discriminant Analysis, Multinomial Logistic Regression, Naïve Bayes, Random Forest, Support Vector Machine), MLP and CNN were comparatively tested on the 37 datasets to predict disease stages or to discriminate diseased samples from normal samples. MLPs achieved the highest overall accuracy among all methods tested. More thorough analyses revealed that single hidden layer MLPs with ample hidden units outperformed deeper MLPs. Furthermore, MLP was one of the most robust methods against imbalanced class composition and inaccurate class labels. CONCLUSION: Our results indicate that shallow MLPs (of one or two hidden layers) with ample hidden neurons are sufficient to achieve superior and robust classification performance in exploiting numerical matrix-formed omics data for diagnostic purposes. Specific observations regarding optimal network width, class imbalance tolerance, and inaccurate labeling tolerance will inform future improvement of neural network applications on functional genomics data.


Deep Learning/trends , Gene Expression Profiling/statistics & numerical data , Machine Learning/trends , Neural Networks, Computer , Algorithms , Artificial Intelligence/statistics & numerical data , Bayes Theorem , Deep Learning/statistics & numerical data , Gene Expression Profiling/methods , Humans , Logistic Models , Machine Learning/statistics & numerical data , Metabolome/genetics , Support Vector Machine/statistics & numerical data , Support Vector Machine/trends
17.
PLoS One ; 13(1): e0188996, 2018.
Article En | MEDLINE | ID: mdl-29304512

Hyperspectral image classification with a limited number of training samples and without loss of accuracy is desirable, as collecting such data is often expensive and time-consuming. However, classifiers trained with limited samples usually end up with a large generalization error. To overcome this problem, we propose a fuzziness-based active learning framework (FALF), in which we implement the idea of selecting optimal training samples to enhance generalization performance for two different kinds of classifiers, discriminative and generative (e.g. SVM and KNN). The optimal samples are selected by first estimating the boundary of each class and then calculating the fuzziness-based distance between each sample and the estimated class boundaries. Those samples that are at smaller distances from the boundaries and have higher fuzziness are chosen as target candidates for the training set. Through detailed experimentation on three publicly available datasets, we showed that when trained with the proposed sample selection framework, both classifiers achieved higher classification accuracy and lower processing time with a small amount of training data than when the training samples were selected randomly. Our experiments demonstrate the effectiveness of our proposed method, which compares favorably with state-of-the-art methods.
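The selection step described above can be sketched as ranking unlabeled samples by the fuzziness of their class-membership vectors. The abstract does not give the formula, so this sketch assumes one common entropy-style definition (in the style of De Luca and Termini) and omits the boundary-distance term; the function names and the toy membership values are hypothetical.

```python
import math


def fuzziness(memberships):
    """Entropy-style fuzziness of a class-membership vector.

    Assumed definition: -(1/n) * sum(mu*ln(mu) + (1-mu)*ln(1-mu)).
    It is 0 for a crisp vector and maximal when every mu is 0.5.
    """
    total = 0.0
    for mu in memberships:
        for p in (mu, 1.0 - mu):
            if p > 0.0:  # treat 0 * ln(0) as 0
                total -= p * math.log(p)
    return total / len(memberships)


def select_most_fuzzy(candidates, k):
    """Return the indices of the k candidates with the highest fuzziness."""
    ranked = sorted(
        range(len(candidates)),
        key=lambda i: fuzziness(candidates[i]),
        reverse=True,
    )
    return ranked[:k]


# Toy membership vectors for four unlabeled samples (two classes).
pool = [
    [0.98, 0.02],  # confidently class 0
    [0.55, 0.45],  # close to the class boundary
    [0.10, 0.90],  # fairly confident class 1
    [0.50, 0.50],  # maximally ambiguous
]
picked = select_most_fuzzy(pool, k=2)
print(picked)
```

In an active learning loop, the picked samples would be labeled, added to the training set, and the classifier retrained before the next round of selection.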


Image Enhancement/methods , Machine Learning/statistics & numerical data , Fuzzy Logic , Models, Statistical , Remote Sensing Technology/statistics & numerical data , Support Vector Machine/statistics & numerical data
18.
Neural Netw ; 98: 114-121, 2018 Feb.
Article En | MEDLINE | ID: mdl-29227960

Support vector ordinal regression (SVOR) is a popular method for tackling ordinal regression problems. A solution path provides a compact representation of the optimal solutions for all values of the regularization parameter, which is extremely useful for model selection. However, due to the complicated formulation of SVOR (including multiple equalities and extra variables), no solution path algorithm has yet been proposed for SVOR. In this paper, we propose a regularization path algorithm for SVOR which can track the two sets of variables of SVOR w.r.t. the regularization parameter. Technically, we use the QR decomposition to handle the singular matrices in the regularization path. Experimental results on a variety of datasets not only confirm the effectiveness of our regularization path algorithm, but also show its superiority for model selection.


Algorithms , Datasets as Topic/statistics & numerical data , Support Vector Machine/statistics & numerical data
19.
Cancer Biomark ; 21(2): 393-413, 2018 Feb 06.
Article En | MEDLINE | ID: mdl-29226857

Prostate cancer is the second leading cause of cancer deaths among men. Early detection can effectively reduce the mortality rate caused by prostate cancer. Because prostate MRIs are high-resolution and multiresolution, proper diagnostic systems and tools are required. In the past, researchers developed computer-aided diagnosis (CAD) systems that help the radiologist detect abnormalities. In this research paper, we employed novel machine learning techniques, namely a Bayesian approach, support vector machine (SVM) kernels (polynomial, radial basis function (RBF), and Gaussian), and Decision Tree, for detecting prostate cancer. Moreover, different feature extraction strategies are proposed to improve the detection performance. The feature extraction strategies are based on texture, morphological, scale-invariant feature transform (SIFT), and elliptic Fourier descriptor (EFD) features. The performance was evaluated for single features as well as combinations of features using machine learning classification techniques. Cross-validation (jack-knife k-fold) was performed, and performance was evaluated in terms of the receiver operating characteristic (ROC) curve and specificity, sensitivity, positive predictive value (PPV), negative predictive value (NPV), and false positive rate (FPR). Based on single feature extraction strategies, the SVM Gaussian kernel gives the highest accuracy of 98.34% with an AUC of 0.999. Using combinations of feature extraction strategies, the SVM Gaussian kernel with texture + morphological and EFDs + morphological features gives the highest accuracy of 99.71% and an AUC of 1.00.
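The evaluation measures named above all follow from the four cells of a binary confusion matrix. The sketch below uses the standard definitions; the function name `diagnostic_metrics` and the counts are made up for illustration and are not taken from the study.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard binary diagnostic measures from confusion-matrix counts.

    tp/fp/tn/fn are the numbers of true positives, false positives,
    true negatives, and false negatives.
    """
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
        "fpr": fp / (fp + tn),          # false positive rate = 1 - specificity
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical counts for 100 test images (not data from the paper).
m = diagnostic_metrics(tp=45, fp=5, tn=40, fn=10)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

Under k-fold cross-validation, these counts would typically be accumulated over the held-out folds before the measures are computed.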


Machine Learning/statistics & numerical data , Prostatic Neoplasms/diagnosis , Support Vector Machine/statistics & numerical data , Bayes Theorem , Humans , Male , Prostatic Neoplasms/pathology
20.
Comput Inform Nurs ; 35(8): 408-416, 2017 Aug.
Article En | MEDLINE | ID: mdl-28800580

We constructed a model using a support vector machine to determine whether an inpatient will suffer a fall on a given day, depending on patient status on the previous day. Using fall report data from our own facility and intensity-of-nursing-care-needs data accumulated through hospital information systems, a dataset comprising approximately 1.2 million patient-days was created. Approximately 50% of the dataset was used as training and testing data. A multistep grid search was conducted using the semicomprehensive combination of three parameters. For each parameter combination, a discriminant model was created and scored on the testing data by calculating sensitivity and specificity, to identify the combination with the highest score. The best-scoring model had a sensitivity of 64.9% and a specificity of 69.6%. By adopting a method that relies on daily data recorded in the electronic medical record system and accurately predicts unknown data, we were able to overcome issues described in previous studies while simultaneously constructing a discriminant model for patients' fall risk that does not burden nurses and patients with information gathering.
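A multistep grid search over three parameters can be sketched as a coarse grid followed by progressively finer grids centred on the best point found so far. This is an assumption about what "multistep" means here; the abstract does not describe the grids, the parameters, or how sensitivity and specificity were combined into a score. The function names and the toy score below are hypothetical stand-ins for training and scoring a model.

```python
import itertools


def grid_search(score, grids):
    """Evaluate `score` on every combination; return (best_params, best_score)."""
    best = max(itertools.product(*grids), key=lambda p: score(*p))
    return best, score(*best)


def refine(center, step, points=5):
    """Build a grid of `points` values spaced by `step`, centred on `center`."""
    half = points // 2
    return [center + (i - half) * step for i in range(points)]


def multistep_grid_search(score, grids, steps, rounds=2):
    """Coarse search first, then `rounds` finer searches around the best point."""
    best, best_score = grid_search(score, grids)
    for _ in range(rounds):
        steps = [s / 4 for s in steps]  # shrink the spacing each round
        grids = [refine(c, s) for c, s in zip(best, steps)]
        best, best_score = grid_search(score, grids)
    return best, best_score


def toy_score(a, b, c):
    """Stand-in for 'train a model and score it'; peaks at (1.3, -0.4, 2.1)."""
    return -((a - 1.3) ** 2 + (b + 0.4) ** 2 + (c - 2.1) ** 2)


coarse = [[-4, -2, 0, 2, 4]] * 3  # e.g. exponents of three hyperparameters
best, best_score = multistep_grid_search(toy_score, coarse, steps=[2, 2, 2])
print(best, best_score)
```

In practice `toy_score` would be replaced by a function that trains the classifier with the given parameters and returns a score computed from sensitivity and specificity on held-out data.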


Accidental Falls/prevention & control , Inpatients/classification , Support Vector Machine/statistics & numerical data , Electronic Health Records/statistics & numerical data , Female , Hospitals , Humans , Male , Nurse's Role , Risk Assessment
...