Results 1 - 9 of 9
1.
Methods ; 231: 15-25, 2024 Aug 30.
Article in English | MEDLINE | ID: mdl-39218170

ABSTRACT

Predicting drug-target interactions (DTI) is a crucial stage in drug discovery and development. Understanding the interaction between drugs and targets is essential for pinpointing the specific relationship between drug molecules and targets, akin to solving a link prediction problem using information technology. While knowledge graph (KG) and knowledge graph embedding (KGE) methods have advanced rapidly and demonstrated impressive performance in drug discovery, they often lack authenticity and accuracy in identifying DTI. This leads to increased misjudgment rates and reduced efficiency in drug development. To address these challenges, our focus lies in refining the accuracy of DTI prediction models through KGE, with a specific emphasis on causal intervention confidence measures (CI). These measures aim to assess triplet scores, enhancing the precision of the predictions. Comparative experiments conducted on three datasets and utilizing nine KGE models reveal that our proposed confidence measure approach via causal intervention significantly improves the accuracy of DTI link prediction compared to traditional approaches. Furthermore, our experimental analysis delves deeper into the embedding of intervention values, offering valuable insights for guiding the design and development of subsequent drug development experiments. As a result, our predicted outcomes serve as valuable guidance in the pursuit of more efficient drug development processes.
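The abstract does not name the nine KGE models it benchmarks; TransE is one common representative of this family. A minimal NumPy sketch of how such a model scores a (drug, relation, target) triplet for link prediction, with all embeddings randomly generated purely for illustration:

```python
import numpy as np

def transe_score(h, r, t):
    """TransE plausibility score for a (head, relation, tail) triplet.
    Higher (less negative) means the link is judged more plausible."""
    return -np.linalg.norm(h + r - t, ord=1)

rng = np.random.default_rng(0)
dim = 8
drug, interacts_with, target = rng.normal(size=(3, dim))

# A well-trained embedding places h + r close to t for true triplets,
# so a tail built as h + r + small noise should outscore a random one.
true_tail = drug + interacts_with + 0.01 * rng.normal(size=dim)
print(transe_score(drug, interacts_with, true_tail))   # close to zero
print(transe_score(drug, interacts_with, target))      # more negative
```

A confidence measure such as the causal-intervention score proposed in the paper would then be layered on top of raw triplet scores like these.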

2.
Comput Struct Biotechnol J ; 23: 1234-1243, 2024 Dec.
Article in English | MEDLINE | ID: mdl-38550971

ABSTRACT

Effective management of chronic diseases and cancer can greatly benefit from disease-specific biomarkers that enable informative screening and timely diagnosis. IgG N-glycans found in human plasma have the potential to be minimally invasive disease-specific biomarkers for all stages of disease development due to their plasticity in response to various genetic and environmental stimuli. Data analysis and machine learning (ML) approaches can assist in harnessing the potential of IgG glycomics towards biomarker discovery and the development of reliable predictive tools for disease screening. This study proposes an ML-based N-glycomic analysis framework that can be employed to build, optimise, and evaluate multiple ML pipelines to stratify patients based on disease risk in an interpretable manner. To design and test this framework, a published colorectal cancer (CRC) dataset from the Study of Colorectal Cancer in Scotland (SOCCS) cohort (1999-2006) was used. In particular, among the different pipelines tested, an XGBoost-based ML pipeline, which was tuned using multi-objective optimisation, calibrated using an inductive Venn-Abers predictor (IVAP), and evaluated via a nested cross-validation (NCV) scheme, achieved a mean area under the Receiver Operating Characteristic Curve (AUC-ROC) of 0.771 when classifying between age- and sex-matched healthy controls and CRC patients. This performance suggests the potential of using the relative abundance of IgG N-glycans to define populations at elevated CRC risk who merit investigation or surveillance. Finally, the IgG N-glycans that highly impact CRC classification decisions were identified using a global model-agnostic interpretability technique, namely Accumulated Local Effects (ALE). We envision that open-source computational frameworks, such as the one presented herein, will be useful in supporting the translation of glycan-based biomarkers into clinical applications.
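A minimal scikit-learn sketch of the nested cross-validation (NCV) scheme described above, using synthetic data in place of the SOCCS glycomics matrix; the XGBoost model, multi-objective tuning, and Venn-Abers calibration of the actual pipeline are omitted, and a logistic regression with a small grid stands in for the tuned classifier:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score

# Synthetic stand-in for the glycomics feature matrix (not the SOCCS data).
X, y = make_classification(n_samples=300, n_features=20, random_state=0)

# The inner loop tunes hyperparameters; the outer loop then gives an
# unbiased performance estimate because no outer test fold is ever
# seen during tuning.
inner = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
outer = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
tuned = GridSearchCV(LogisticRegression(max_iter=1000),
                     {"C": [0.1, 1.0, 10.0]}, cv=inner, scoring="roc_auc")
auc_per_fold = cross_val_score(tuned, X, y, cv=outer, scoring="roc_auc")
print(auc_per_fold.mean())
```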

3.
Comput Methods Programs Biomed ; 248: 108118, 2024 May.
Article in English | MEDLINE | ID: mdl-38489935

ABSTRACT

BACKGROUND: Estimating the risk of a difficult tracheal intubation should help clinicians in better anaesthesia planning, to maximize patient safety. Routine bedside screenings suffer from low sensitivity. OBJECTIVE: To develop and evaluate machine learning (ML) and deep learning (DL) algorithms for the reliable prediction of intubation risk, using information about airway morphology. METHODS: Observational, prospective cohort study enrolling n=623 patients who underwent tracheal intubation: 53/623 difficult cases (prevalence 8.51%). First, we used our previously validated deep convolutional neural network (DCNN) to extract 2D image coordinates for 27 + 13 relevant anatomical landmarks in two preoperative photos (frontal and lateral views). We then propose a method to determine the 3D pose of the camera with respect to the patient and to obtain the 3D world coordinates of these landmarks. Next, we compute a novel set of dM=59 morphological features (distances, areas, angles and ratios), engineered with our anaesthesiologists to characterize each individual's airway anatomy for prediction. Subsequently, we propose four ad hoc ML pipelines for difficult intubation prognosis, each with four stages: feature scaling, imputation, resampling for imbalanced learning, and binary classification (Logistic Regression, Support Vector Machines, Random Forests and eXtreme Gradient Boosting). These compound ML pipelines were fed with the dM=59 morphological features, alongside dD=7 demographic variables. We trained them with automatic hyperparameter tuning (Bayesian search) and probability calibration (Platt scaling). In addition, we developed an ad hoc multi-input DCNN to estimate the intubation risk directly from each pair of photographs, i.e. without any intermediate morphological description. Performance was evaluated using optimal Bayesian decision theory.
It was compared against experts' judgement and against state-of-the-art methods (three clinical formulae, four ML, four DL models). RESULTS: Our four ad hoc ML pipelines with engineered morphological features achieved similar discrimination capabilities: median AUCs between 0.746 and 0.766. They significantly outperformed both expert judgement and all state-of-the-art methods (highest AUC at 0.716). Conversely, our multi-input DCNN yielded low performance due to overfitting. This same behaviour occurred for the state-of-the-art DL algorithms. Overall, the best method was our XGB pipeline, with the fewest false negatives at the optimal Bayesian decision threshold. CONCLUSIONS: We proposed and validated ML models to assist clinicians in anaesthesia planning, providing a reliable calibrated estimate of airway intubation risk, which outperformed expert assessments and state-of-the-art methods. Our novel set of engineered features succeeded in providing informative descriptions for prognosis.


Subject(s)
Tracheal intubation, Machine learning, Humans, Bayes theorem, Prospective studies, Tracheal intubation/methods
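A minimal scikit-learn sketch of one of the four-stage compound pipelines described in entry 3 (feature scaling, imputation, classification). The resampling stage, Bayesian hyperparameter search, and Platt calibration are omitted here (class weighting stands in for resampling), and the data are a synthetic stand-in with roughly the cohort's 8.5% prevalence:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Imbalanced synthetic stand-in: 623 samples, 59 + 7 = 66 features,
# ~8.5% positives, with a few missing values to exercise the imputer.
X, y = make_classification(n_samples=623, n_features=66, weights=[0.915],
                           random_state=0)
X[::50, 0] = np.nan

pipe = Pipeline([
    ("scale", StandardScaler()),          # NaNs pass through untouched
    ("impute", SimpleImputer(strategy="median")),
    ("clf", LogisticRegression(class_weight="balanced", max_iter=1000)),
])
pipe.fit(X, y)
risk = pipe.predict_proba(X)[:, 1]        # per-patient difficulty probability
```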
4.
Article in English | MEDLINE | ID: mdl-37982231

ABSTRACT

To enhance the accuracy of motor imagery (MI) EEG signal recognition, two methods, namely power spectral density and wavelet packet decomposition combined with a common spatial pattern, were employed to explore the feature information in MI EEG signals in depth. The extracted MI EEG signal features were subjected to serial feature fusion, and the F-test method was used to select the features with the highest information content. To further improve the accuracy of MI EEG signal classification, the Platt scaling probability calibration method was then used to calibrate the results obtained from six base classifiers, namely random forest (RF), support vector machine (SVM), logistic regression (LR), Gaussian naïve Bayes (GNB), eXtreme Gradient Boosting (XGBoost), and Light Gradient Boosting Machine (LightGBM). From the resulting 12 classifiers, three to four with higher accuracy were selected for model fusion. The proposed method was validated on Dataset 2a of the 4th International BCI Competition, where the average accuracy across the MI EEG data of nine subjects reached 91.46%. This indicates that model fusion is an effective way to improve classification accuracy and provides a useful reference for research on MI brain-computer interfaces.
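Platt scaling, used in entries 3-5 and 7 above, fits a one-dimensional logistic regression that maps a classifier's raw decision scores to calibrated probabilities on held-out data. A minimal sketch with an SVM and synthetic stand-in data (scikit-learn also packages this as the `sigmoid` method of `CalibratedClassifierCV`):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=600, n_features=10, random_state=1)
X_fit, X_cal, y_fit, y_cal = train_test_split(X, y, random_state=1)

# 1. Fit the base classifier; its decision scores are uncalibrated.
svm = SVC(kernel="rbf").fit(X_fit, y_fit)
scores = svm.decision_function(X_cal).reshape(-1, 1)

# 2. Platt scaling: logistic regression sigma(a*score + b) fit on a
#    held-out calibration set maps scores to probabilities.
platt = LogisticRegression().fit(scores, y_cal)
probs = platt.predict_proba(scores)[:, 1]
```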

5.
BioData Min ; 14(1): 38, 2021 Aug 13.
Article in English | MEDLINE | ID: mdl-34389029

ABSTRACT

BACKGROUND: Although many patients receive good prognoses with standard therapy, 30-50% of diffuse large B-cell lymphoma (DLBCL) cases may relapse after treatment. Statistical or computational intelligence models are powerful tools for assessing prognoses; however, many cannot generate accurate risk (probability) estimates. Thus, probability calibration-based versions of traditional machine learning algorithms are developed in this paper to predict the risk of relapse in patients with DLBCL. METHODS: Five machine learning algorithms were assessed, namely, naïve Bayes (NB), logistic regression (LR), random forest (RF), support vector machine (SVM) and feedforward neural network (FFNN), and three methods were used to develop probability calibration-based versions of each of the above algorithms, namely, Platt scaling (Platt), isotonic regression (IsoReg) and shape-restricted polynomial regression (RPR). Performance comparisons were based on the average results of the stratified hold-out test, which was repeated 500 times. We used the AUC to evaluate the discrimination ability (i.e., classification ability) of the model and assessed the model calibration (i.e., risk prediction accuracy) using the Hosmer-Lemeshow (H-L) goodness-of-fit test, the expected calibration error (ECE), the maximum calibration error (MCE) and the Brier score (BS). RESULTS: Sex, stage, IPI, KPS, GCB, CD10 and rituximab were significant factors predicting the 3-year recurrence rate of patients with DLBCL. For the 5 uncalibrated algorithms, the LR (ECE = 8.517, MCE = 20.100, BS = 0.188) and FFNN (ECE = 8.238, MCE = 20.150, BS = 0.184) models were well-calibrated. The errors of the initial risk estimate of the NB (ECE = 15.711, MCE = 34.350, BS = 0.212), RF (ECE = 12.740, MCE = 27.200, BS = 0.201) and SVM (ECE = 9.872, MCE = 23.800, BS = 0.194) models were large. With probability calibration, the biased NB, RF and SVM models were well-corrected. The calibration errors of the LR and FFNN models were not further improved regardless of the probability calibration method.
Among the 3 calibration methods, RPR achieved the best calibration for both the RF and SVM models. The power of IsoReg was not obvious for the NB, RF or SVM models. CONCLUSIONS: Although these algorithms all have good classification ability, several cannot generate accurate risk estimates. Probability calibration is an effective method of improving the accuracy of these poorly calibrated algorithms. Our risk model of DLBCL demonstrates good discrimination and calibration ability and has the potential to help clinicians make optimal therapeutic decisions to achieve precision medicine.
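The ECE values reported above follow the standard binned definition (the paper appears to report them scaled by 100). A minimal NumPy implementation on a toy example:

```python
import numpy as np

def expected_calibration_error(y_true, y_prob, n_bins=10):
    """ECE: bin predictions by confidence, then average the absolute gap
    between each bin's mean predicted probability and its observed
    positive rate, weighted by the fraction of samples in the bin."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (y_prob > lo) & (y_prob <= hi)
        if mask.any():
            gap = abs(y_prob[mask].mean() - y_true[mask].mean())
            ece += mask.mean() * gap
    return ece

y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_prob = np.array([0.1, 0.2, 0.9, 0.8, 0.7, 0.3, 0.6, 0.4])
print(expected_calibration_error(y_true, y_prob))  # 0.25 on this toy data
```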

6.
Article in English | MEDLINE | ID: mdl-34299986

ABSTRACT

Identifying high-risk drivers before an accident happens is necessary for traffic accident control and prevention. Due to the class-imbalance nature of driving data, high-risk samples as the minority class are usually handled poorly by standard classification algorithms. Instead of applying preset sampling or cost-sensitive learning, this paper proposes a novel automated machine learning framework that simultaneously and automatically searches for the optimal sampling, cost-sensitive loss function, and probability calibration to handle the class-imbalance problem in the recognition of risky drivers. The hyperparameters that control sampling ratio and class weight, along with other hyperparameters, are optimized by Bayesian optimization. To demonstrate the performance of the proposed automated learning framework, we establish a risky driver recognition model as a case study, using video-extracted vehicle trajectory data of 2427 private cars on a German highway. Based on rear-end collision risk evaluation, only 4.29% of all drivers are labeled as risky drivers. The inputs of the recognition model are the discrete Fourier transform coefficients of the target vehicle's longitudinal speed, lateral speed, and the gap between the target vehicle and its preceding vehicle. Among 12 sampling methods, 2 cost-sensitive loss functions, and 2 probability calibration methods, the result of automated machine learning is consistent with manual searching but much more computationally efficient. We find that the combination of Support Vector Machine-based Synthetic Minority Oversampling TEchnique (SVMSMOTE) sampling, cost-sensitive cross-entropy loss function, and isotonic regression can significantly improve the recognition ability and reduce the error of predicted probability.


Subject(s)
Automobile driving, Traffic accidents, Bayes theorem, Humans, Machine learning, Support vector machine
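A dependency-light sketch of two of the ingredients the automated search in entry 6 selected: minority oversampling followed by isotonic-regression calibration. Plain random oversampling stands in for SVMSMOTE (which synthesises new minority points rather than duplicating existing ones), and the data are synthetic with roughly the study's 4% positive rate:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.isotonic import IsotonicRegression
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Imbalanced stand-in (~4% positives, like the risky-driver labels).
X, y = make_classification(n_samples=2000, n_features=12, weights=[0.96],
                           random_state=0)
X_fit, X_cal, y_fit, y_cal = train_test_split(X, y, stratify=y, random_state=0)

# Random oversampling: duplicate minority samples until classes balance.
minority = np.flatnonzero(y_fit == 1)
extra = np.random.default_rng(0).choice(
    minority, size=len(y_fit) - 2 * len(minority))
idx = np.concatenate([np.arange(len(y_fit)), extra])
clf = LogisticRegression(max_iter=1000).fit(X_fit[idx], y_fit[idx])

# Isotonic regression: a monotone map from raw scores to calibrated
# probabilities, fit on a held-out calibration set.
iso = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
iso.fit(clf.predict_proba(X_cal)[:, 1], y_cal)
probs = iso.predict(clf.predict_proba(X_cal)[:, 1])
```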
7.
BMC Med Inform Decis Mak ; 21(1): 14, 2021 01 07.
Article in English | MEDLINE | ID: mdl-33413321

ABSTRACT

BACKGROUND: Under the influences of chemotherapy regimens, clinical staging, immunologic expressions and other factors, the survival rates of patients with diffuse large B-cell lymphoma (DLBCL) differ. The accurate prediction of mortality hazards is key to precision medicine, which can help clinicians make optimal therapeutic decisions to extend the survival times of individual patients with DLBCL. Thus, we have developed a predictive model to predict the mortality hazard of DLBCL patients within 2 years of treatment. METHODS: We evaluated 406 patients with DLBCL and collected 17 variables from each patient. The predictive variables were selected by the Cox model, the logistic model and the random forest algorithm. Five classifiers were chosen as the base models for ensemble learning: the naïve Bayes, logistic regression, random forest, support vector machine and feedforward neural network models. We first calibrated the biased outputs from the five base models by using probability calibration methods (including shape-restricted polynomial regression, Platt scaling and isotonic regression). Then, we aggregated the outputs from the various base models to predict the 2-year mortality of DLBCL patients by using three strategies (stacking, simple averaging and weighted averaging). Finally, we assessed model performance over 300 hold-out tests. RESULTS: Gender, stage, IPI, KPS and rituximab were significant factors for predicting the deaths of DLBCL patients within 2 years of treatment. The stacking model whose base models were first calibrated by shape-restricted polynomial regression performed best among all methods (AUC = 0.820, ECE = 8.983, MCE = 21.265). In contrast, the performance of the stacking model without probability calibration was inferior (AUC = 0.806, ECE = 9.866, MCE = 24.850). In both the simple averaging and weighted averaging models, the prediction error of the ensemble model also decreased with probability calibration.
CONCLUSIONS: Among all the methods compared, the proposed model has the lowest prediction error when predicting the 2-year mortality of DLBCL patients. These promising results may indicate that our modeling strategy of applying probability calibration to ensemble learning is successful.


Subject(s)
Diffuse large B-cell lymphoma, Bayes theorem, Calibration, Humans, Logistic models, Diffuse large B-cell lymphoma/drug therapy, Prognosis
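A minimal sketch of the simple- and weighted-averaging fusion strategies from entry 7, with three of the five base models and synthetic stand-in data; the per-model probability calibration applied in the paper is omitted, and the weights shown are illustrative rather than learned:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in; the study used 17 clinical variables from 406 patients.
X, y = make_classification(n_samples=400, n_features=17, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Each base model contributes one column of predicted probabilities.
bases = [GaussianNB(), LogisticRegression(max_iter=1000),
         RandomForestClassifier(random_state=0)]
preds = np.column_stack([m.fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
                         for m in bases])

simple_avg = preds.mean(axis=1)       # simple averaging
w = np.array([0.2, 0.4, 0.4])         # e.g. performance-derived weights
weighted_avg = preds @ w              # weighted averaging
```

Stacking, the third strategy, would instead feed `preds` as features into a second-level learner.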
8.
PeerJ ; 8: e10501, 2020.
Article in English | MEDLINE | ID: mdl-33354434

ABSTRACT

BACKGROUND: Low-coverage sequencing is a cost-effective way to obtain reads spanning an entire genome. However, read depth at each locus is low, making sequencing error difficult to separate from actual variation. Prior to variant calling, sequencer reads are aligned to a reference genome, with alignments stored in Sequence Alignment/Map (SAM) files. Each alignment has a mapping quality (MAPQ) score indicating the probability that a read is incorrectly aligned. This study investigated the recalibration of the probability estimates used to compute MAPQ scores for improving variant calling performance in single-sample, low-coverage settings. MATERIALS AND METHODS: Simulated tomato, hot pepper and rice genomes were implanted with known variants. From these, simulated paired-end reads were generated at low coverage and aligned to the original reference genomes. Features extracted from the SAM-formatted alignment files for tomato were used to train machine learning models to detect incorrectly aligned reads and output estimates of the probability of misalignment for each read in all three data sets. MAPQ scores were then re-computed from these estimates. Next, the SAM files were updated with the new MAPQ scores. Finally, variant calling was performed on the original and recalibrated alignments and the results compared. RESULTS: Incorrectly aligned reads comprised only 0.16% of the reads in the training set. This severe class imbalance required special consideration for model training. The F1 score for detecting misaligned reads ranged from 0.76 to 0.82. The best performing model was used to compute new MAPQ scores. Single Nucleotide Polymorphism (SNP) detection was improved after mapping score recalibration. In rice, recall for called SNPs increased by 5.2%, while for tomato and pepper it increased by 3.1% and 1.5%, respectively. For all three data sets the precision of SNP calls ranged from 0.91 to 0.95, and was largely unchanged both before and after mapping score recalibration.
CONCLUSION: Recalibrating MAPQ scores delivers modest improvements in single-sample variant calling results. Some variant callers operate on multiple samples simultaneously. They exploit every sample's reads to compensate for the low read-depth of individual samples. This improves polymorphism detection and genotype inference. It may be that small improvements in single-sample settings translate to larger gains in a multi-sample experiment. A study to investigate this is ongoing.
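MAPQ is Phred-scaled, so recomputing it from a model's estimated misalignment probability is a one-liner; the cap of 60 below is a common aligner convention, not something specified in the abstract:

```python
import math

def mapq_from_error_prob(p_misaligned, cap=60):
    """Convert an estimated misalignment probability into a Phred-scaled
    SAM mapping quality: MAPQ = -10 * log10(P(read is misplaced)),
    capped because a probability of exactly 0 is not representable."""
    if p_misaligned <= 0.0:
        return cap
    return min(cap, int(round(-10 * math.log10(p_misaligned))))

print(mapq_from_error_prob(0.01))   # 20: 1-in-100 chance of misplacement
print(mapq_from_error_prob(0.001))  # 30: 1-in-1000 chance
```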

9.
Proc IEEE Int Symp Biomed Imaging ; 2020: 363-367, 2020 Apr.
Article in English | MEDLINE | ID: mdl-35261721

ABSTRACT

In this work, we improve the performance of multi-atlas segmentation (MAS) by integrating the recently proposed VoteNet model with the joint label fusion (JLF) approach. Specifically, we first illustrate that using a deep convolutional neural network to predict atlas probabilities can better distinguish correct atlas labels from incorrect ones than relying on image intensity difference as is typical in JLF. Motivated by this finding, we propose VoteNet+, an improved deep network to locally predict the probability of an atlas label to differ from the label of the target image. Furthermore, we show that JLF is more suitable for the VoteNet framework as a label fusion method than plurality voting. Lastly, we use Platt scaling to calibrate the probabilities of our new model. Results on LPBA40 3D MR brain images show that our proposed method can achieve better performance than VoteNet.
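Plurality voting, the label-fusion baseline that JLF is compared against in entry 9, simply takes the most frequent atlas label at each voxel. A minimal NumPy sketch on a toy 3-atlas, 3-voxel example (JLF instead computes locally weighted votes):

```python
import numpy as np

def plurality_vote(atlas_labels):
    """Fuse per-atlas label maps by taking the most frequent label per voxel.
    atlas_labels: (n_atlases, n_voxels) integer array; ties go to the
    lowest label index."""
    n_classes = atlas_labels.max() + 1
    votes = np.stack([(atlas_labels == c).sum(axis=0)
                      for c in range(n_classes)])
    return votes.argmax(axis=0)

atlases = np.array([[1, 2, 0],
                    [1, 2, 2],
                    [0, 2, 2]])
print(plurality_vote(atlases))  # [1 2 2]
```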
