1.
Biol Direct ; 16(1): 7, 2021 Feb 08.
Article in English | MEDLINE | ID: mdl-33557857

ABSTRACT

Cancer is a polygenic disease, and each cancer type has a different mutation profile. Genomic data can be used to detect these profiles and to diagnose and differentiate cancer types. Variant calling provides mutation information, while gene expression data reveal altered cell behaviour. Combining mutation and expression information can lead to accurate discrimination between cancer types. In this study, we transferred the information from existing mutations into a novel gene selection method for gene expression data, and we tested the proposed method on diagnosing and differentiating cancer types. The method is disease specific, because both the mutations and the expression data are filtered according to the selected cancer types. Our experimental results show that the proposed gene selection method yields similar or improved performance metrics compared with classical feature selection methods and curated gene sets.
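
A minimal Python sketch of the general idea of mutation-guided gene selection, assuming a simple frequency threshold: keep genes that are mutated in at least a fraction `min_freq` of the patients of each selected cancer type, then restrict the expression data to those patients and genes. The abstract does not state the exact selection rule, so `select_mutated_genes`, `filter_expression`, the threshold, and the dictionary layout are hypothetical and should not be read as the authors' method.

```python
from collections import Counter


def select_mutated_genes(mutations, labels, target_types, min_freq=0.5):
    """Hypothetical rule: keep genes mutated in >= min_freq of the patients of any target type.

    mutations: patient_id -> set of mutated gene symbols
    labels: patient_id -> cancer type
    """
    selected = set()
    for cancer_type in target_types:
        patients = [p for p, t in labels.items() if t == cancer_type]
        if not patients:
            continue
        # Count, for each gene, how many patients of this type carry a mutation in it.
        counts = Counter(g for p in patients for g in mutations.get(p, set()))
        selected |= {g for g, c in counts.items() if c / len(patients) >= min_freq}
    return sorted(selected)


def filter_expression(expression, labels, target_types, genes):
    """Keep only patients of the target types and only the selected genes.

    expression: patient_id -> {gene symbol: expression value}
    Returns patient_id -> list of expression values ordered like `genes`.
    """
    return {
        p: [profile[g] for g in genes]
        for p, profile in expression.items()
        if labels.get(p) in target_types
    }
```

The filtered vectors would then be passed to any standard classifier; filtering both mutations and expressions by the chosen cancer types is what makes the selection disease specific.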


Subject(s)
Gene Expression Profiling/methods , Genomics/statistics & numerical data , Machine Learning , Neoplasms/classification , Algorithms , Neoplasms/genetics
2.
Med Biol Eng Comput ; 58(11): 2757-2773, 2020 Nov.
Article in English | MEDLINE | ID: mdl-32910301

ABSTRACT

In recent years, there has been increasing interest in building e-health systems. Such systems deliver health services through internet and communication technologies and aim to reduce the costs arising from patients' outpatient visits. Several recent studies propose machine learning-based telediagnosis and telemonitoring systems for Parkinson's disease (PD). Motivated by studies showing the potential of speech disorders in PD telemonitoring systems, in this study we aim to estimate the severity of PD from patients' voice recordings, using the motor Unified Parkinson's Disease Rating Scale (UPDRS) as the measure of severity. For this purpose, we apply various speech processing algorithms to the patients' voice signals and use the resulting features as input to a two-stage estimation model. The first stage applies a wrapper-based feature selection algorithm, Boruta, to select the most informative speech features. The second stage feeds the selected features to extreme gradient boosting, a decision tree-based boosting algorithm that has recently been applied successfully to many machine learning tasks owing to its generalization ability and speed. The feature selection analysis showed that the vibration pattern of the vocal folds is an important indicator of PD severity. We also investigate the effectiveness of using age and years since diagnosis as covariates together with the speech features. The lowest mean absolute error, 3.87, was obtained by combining these covariates and speech features with prediction-level fusion. Graphical abstract: framework for the proposed UPDRS estimation model.
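
The two quantities behind the reported result, prediction-level fusion and mean absolute error (MAE), can be illustrated with a short Python sketch. It assumes that fusion is a weighted average of the UPDRS predictions of a speech-feature model and a covariate model; the abstract does not specify the fusion rule, and the Boruta and XGBoost stages are not reproduced here. The function names, the `weight` parameter, and the toy scores are hypothetical.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted UPDRS scores."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def fuse_predictions(pred_a, pred_b, weight=0.5):
    """Hypothetical prediction-level fusion: weighted average of two models' outputs."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return [weight * a + (1.0 - weight) * b for a, b in zip(pred_a, pred_b)]


if __name__ == "__main__":
    y_true = [20.0, 30.0, 25.0]      # toy motor-UPDRS targets (hypothetical)
    speech = [22.0, 27.0, 25.0]      # predictions of a speech-feature model
    covariates = [18.0, 31.0, 27.0]  # predictions of an age/disease-duration model
    fused = fuse_predictions(speech, covariates)
    print(round(mean_absolute_error(y_true, speech), 3))      # 1.667
    print(round(mean_absolute_error(y_true, covariates), 3))  # 1.667
    print(round(mean_absolute_error(y_true, fused), 3))       # 0.667
```

In the toy data the two models err in opposite directions, so averaging their predictions cancels part of the error; fusion helps only to the extent that the models' errors are not perfectly correlated.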


Subject(s)
Algorithms , Diagnosis, Computer-Assisted , Parkinson Disease/diagnosis , Speech , Age Factors , Aged , Female , Humans , Machine Learning , Male , Middle Aged , Self-Assessment , Severity of Illness Index , Signal Processing, Computer-Assisted , Tape Recording , Telemedicine/methods
3.
BMC Bioinformatics ; 20(1): 324, 2019 Jun 13.
Article in English | MEDLINE | ID: mdl-31195961

ABSTRACT

BACKGROUND: As DNA sequencing technologies improve and become cheaper, genomic data can be used to diagnose many diseases, such as cancer. Raw human genome data are very large for computational systems to process, so a compact and accurate representation of the valuable information in DNA is needed. Complex genetic disorders often result from multiple gene mutations, and the mutations do not contribute equally to the development of a disease. Inspired by the field of information retrieval, we propose using the term frequency (tf) and BM25 term weighting measures together with the inverse document frequency (idf) and relevance frequency (rf) measures to weight genes based on their mutations. The underlying assumption is that the more mutations a gene has in patients with a certain disease, and the fewer it has in other patients, the more discriminative that gene is. RESULTS: We evaluated the proposed representations on the task of cancer type classification, applying various machine learning techniques with the tf-idf and tf-rf schemes and their BM25 versions. Our results show that the BM25-tf-rf representation leads to higher classification accuracy and f-score than the other representations; the highest accuracy (76.44%) and f-score (76.95%) are achieved with the BM25-tf-rf based data representation. CONCLUSIONS: In our experiments, the BM25-tf-rf scheme combined with the proposed neural network model is the best-performing classification system for our case study of cancer type classification. This system is further used for causal gene analysis: several of the genes that contribute most to its decisions are reported in the literature as target or causal genes.
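
The weighting schemes can be sketched in Python by treating each patient as a "document" whose "terms" are genes and whose term counts are mutation counts. The exact definitions here are assumptions and may differ from the paper's variants: idf is log(N/df); rf follows the common text-classification form log2(2 + a/max(1, c)), where a and c count patients inside and outside a class that carry a mutation in the gene; class-specific rf values are collapsed by taking their maximum; and BM25 uses the usual defaults k1 = 1.2 and b = 0.75. All function names are hypothetical.

```python
import math


def doc_freq(gene, patients):
    """Number of patients with at least one mutation in `gene`."""
    return sum(1 for muts in patients.values() if muts.get(gene, 0) > 0)


def idf(gene, patients):
    """Inverse document frequency: log(N / df), 0 for genes never mutated."""
    df = doc_freq(gene, patients)
    return math.log(len(patients) / df) if df else 0.0


def rf(gene, patients, labels, cls):
    """Relevance frequency of `gene` for class `cls` (assumed log2(2 + a / max(1, c)) form)."""
    a = sum(1 for p, m in patients.items() if labels[p] == cls and m.get(gene, 0) > 0)
    c = sum(1 for p, m in patients.items() if labels[p] != cls and m.get(gene, 0) > 0)
    return math.log2(2 + a / max(1, c))


def rf_max(gene, patients, labels):
    """Class-independent rf: maximum over all classes (an assumption)."""
    return max(rf(gene, patients, labels, cls) for cls in set(labels.values()))


def bm25_tf(tf, doc_len, avg_len, k1=1.2, b=0.75):
    """BM25 saturation of a raw count, normalised by the patient's total mutation count."""
    return tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_len))


def represent(muts, genes, patients, labels, scheme="bm25-tf-rf"):
    """Weighted gene vector for one patient; scheme is tf-idf, tf-rf, bm25-tf-idf or bm25-tf-rf.

    patients: patient_id -> {gene: mutation count}; the training collection.
    Weights are recomputed per call for clarity, not efficiency.
    """
    avg_len = sum(sum(m.values()) for m in patients.values()) / len(patients)
    doc_len = sum(muts.values())
    vector = []
    for g in genes:
        tf = muts.get(g, 0)
        if scheme.startswith("bm25"):
            tf = bm25_tf(tf, doc_len, avg_len)
        weight = rf_max(g, patients, labels) if scheme.endswith("rf") else idf(g, patients)
        vector.append(tf * weight)
    return vector
```

The BM25 transform saturates: however many mutations a gene has, its transformed count stays below k1 + 1, which limits the influence of heavily mutated genes, while rf, unlike idf, uses the class labels and so rewards genes mutated mainly within one cancer type.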


Subject(s)
Genomics/methods , Models, Genetic , Models, Statistical , Mutation/genetics , Databases, Genetic , Exons/genetics , Humans , Introns/genetics , Machine Learning , Neoplasms/genetics
4.
Int J Data Min Bioinform ; 10(2): 162-74, 2014.
Article in English | MEDLINE | ID: mdl-25796736

ABSTRACT

Computational annotation and prediction of protein structure is very important in the post-genome era, because many different proteins exist and most of them have yet to be verified. Mutual information-based feature selection methods can be used to select minimal yet predictive subsets of features. However, protein features are organised into natural partitions (views). Selecting individual features ignores these views, dismantles them, and mixes their variables with those of other views; for such multi-view datasets this at best yields a complex, uninterpretable predictive system. In this paper, instead of selecting a subset of individual features, we pass each feature subset through a clustering step so that it is represented in discrete form by its cluster indices; this makes mutual information-based methods applicable to view selection. We present experimental results on a multi-view protein dataset used to predict protein structure.
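
View selection by mutual information can be sketched in Python once each view has been clustered: the cluster indices of a view form one discrete variable, and views are ranked by their mutual information with the class labels. In this sketch the clustering step (for example, k-means run on each view) is assumed to have been done already, the view names are hypothetical, and only a simple relevance ranking is shown rather than the specific selection criterion used in the paper.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """I(X;Y) in bits between two equal-length sequences of discrete values."""
    if len(x) != len(y) or not x:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    # sum over observed pairs of p(a,b) * log2(p(a,b) / (p(a) p(b)))
    return sum(
        (c / n) * math.log2(c * n / (px[a] * py[b]))
        for (a, b), c in joint.items()
    )


def rank_views(view_clusters, labels):
    """Rank views by the MI between their cluster indices and the class labels.

    view_clusters: view name -> list of cluster indices, one per protein
    """
    scores = {view: mutual_information(idx, labels) for view, idx in view_clusters.items()}
    return sorted(scores, key=scores.get, reverse=True), scores
```

Because each view is reduced to a single discrete variable, the ranking selects or discards whole views and keeps their variables together, which is the interpretability argument made in the abstract.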


Subject(s)
Algorithms , Databases, Protein , Models, Chemical , Proteins/chemistry , Proteins/ultrastructure , Sequence Alignment/methods , Sequence Analysis, Protein/methods , Amino Acid Sequence , Computer Simulation , Data Mining/methods , Models, Molecular , Molecular Sequence Data , Pattern Recognition, Automated/methods , Protein Conformation
5.
IEEE J Biomed Health Inform ; 17(4): 828-34, 2013 Jul.
Article in English | MEDLINE | ID: mdl-25055311

ABSTRACT

There has been increasing interest in speech pattern analysis of Parkinsonism for building predictive telediagnosis and telemonitoring models. For this purpose, we collected a wide variety of voice samples, including sustained vowels, words, and sentences, compiled from a set of speaking exercises for people with Parkinson's disease (PD). Learning from such a dataset, which contains multiple speech recordings per subject, raises two main questions: 1) How predictive are the various types of voice samples (e.g., sustained vowels versus words) for PD diagnosis? 2) How well do central tendency and dispersion metrics serve as representatives of all the sample recordings of a subject? Investigating our Parkinson dataset with well-known machine learning tools, we found that, as reported in the literature, sustained vowels carry more PD-discriminative information. We also found that representing the samples of a subject with central tendency and dispersion metrics, rather than using each voice recording of each subject as an independent data sample, improves the generalization of the predictive model.
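
The subject-level representation can be sketched in Python as below: all recordings of a subject are collapsed into one vector of per-feature means, medians, and standard deviations. The choice of these three statistics, the use of the population standard deviation (which is defined even for a subject with a single recording), the function name, and the input layout are assumptions for illustration.

```python
import statistics
from collections import defaultdict


def summarize_subjects(recordings):
    """Collapse per-recording feature vectors into one vector per subject.

    recordings: list of (subject_id, [feature values]) pairs, all of the same length.
    Returns subject_id -> [means..., medians..., population std devs...].
    """
    by_subject = defaultdict(list)
    for subject, features in recordings:
        by_subject[subject].append(features)
    summary = {}
    for subject, rows in by_subject.items():
        columns = list(zip(*rows))  # one tuple of values per acoustic feature
        means = [statistics.mean(c) for c in columns]
        medians = [statistics.median(c) for c in columns]
        stds = [statistics.pstdev(c) for c in columns]
        summary[subject] = means + medians + stds
    return summary
```

One practical consequence of summarizing per subject is that recordings of the same person can no longer be split across training and test folds, which is one plausible reason why this representation generalizes better than treating each recording independently.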


Subject(s)
Parkinson Disease/physiopathology , Pattern Recognition, Automated/methods , Sound Spectrography/methods , Speech/physiology , Voice/physiology , Adult , Aged , Databases, Factual , Female , Humans , Male , Middle Aged , Support Vector Machine
6.
Biomed Eng Online ; 2: 5, 2003 Mar 04.
Article in English | MEDLINE | ID: mdl-12685939

ABSTRACT

This study proposes an intelligent data analysis approach to investigate and interpret the factors that distinguish diabetes mellitus patients with and without ischemic (non-embolic type) stroke in a small population. The database consists of 16 features collected from 44 diabetic patients. The features include age, gender, duration of diabetes, cholesterol, high-density lipoprotein and triglyceride levels, neuropathy, nephropathy, retinopathy, peripheral vascular disease, myocardial infarction rate, glucose level, medication, and blood pressure. Metric and non-metric features are distinguished. First, the mean and covariance of the data are estimated and the correlated components are observed. Second, the major components are extracted by principal component analysis. Finally, as common examples of local and global classification approaches, a k-nearest neighbor classifier and a multilayer perceptron (a high-degree polynomial classifier) are used for classification, both with all components and with only the major components. Macrovascular changes emerged as the principal distinguishing factors of ischemic stroke in diabetes mellitus, whereas microvascular changes were generally ineffective discriminators. Recommendations were made according to the rules of evidence-based medicine. In brief, this case study, based on a small population, supports theories of stroke in diabetes mellitus patients and concludes that the use of intelligent data analysis improves personalized preventive intervention.
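
The local classifier in this pipeline, k-nearest neighbor, can be sketched in Python on z-score standardized features. This is a simplified illustration under stated assumptions: all features are treated as numeric (the study distinguishes metric and non-metric features), the principal component analysis step and the multilayer perceptron are omitted, and the function names, feature values, and labels used with it are hypothetical.

```python
import math
import statistics
from collections import Counter


def fit_scaler(rows):
    """Per-feature mean and population standard deviation of the training rows."""
    columns = list(zip(*rows))
    means = [statistics.mean(c) for c in columns]
    # Guard against constant features, whose standard deviation is zero.
    stds = [statistics.pstdev(c) or 1.0 for c in columns]
    return means, stds


def scale(row, means, stds):
    """Z-score one feature vector using the training statistics."""
    return [(v - m) / s for v, m, s in zip(row, means, stds)]


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training points closest in Euclidean distance."""
    nearest = sorted(zip(train_x, train_y), key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

Standardizing first matters because features such as cholesterol level and age are on very different scales; without it, the Euclidean distance would be dominated by the larger-valued feature. A PCA projection fitted on the standardized training data could be inserted between `scale` and `knn_predict` to mirror the "major components" setting.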


Subject(s)
Brain Infarction/epidemiology , Diabetes Mellitus/epidemiology , Models, Statistical , Brain Ischemia/epidemiology , Comorbidity , Factor Analysis, Statistical , Humans , Risk Factors