Results 1-20 of 47
1.
Adv Exp Med Biol ; 820: 49-59, 2015.
Article in English | MEDLINE | ID: mdl-25417015

ABSTRACT

Disordered proteins lack a specific 3D structure in their native state and have been implicated in numerous cellular functions as well as in severe diseases, e.g., cardiovascular and neurodegenerative diseases and diabetes. Owing to their conformational flexibility, they often interact with a multitude of protein molecules; this one-to-many interaction, which is vital for their versatile functioning, involves short consensus protein sequences that are normally detected using slow and cumbersome experimental procedures. In this work we exploit information from disorder-oriented protein interaction networks focused specifically on humans in order to assemble, by means of overrepresentation, a set of sequence patterns that mediate the functioning of disordered proteins; hence, we are able to identify how a single protein achieves such functional promiscuity. Next, we study the sequence characteristics of the extracted patterns, which exhibit a striking preference for a very limited subset of amino acids: leucine, glutamic acid, and serine are particularly frequent among the extracted patterns, and we also observe a nontrivial propensity towards alanine and glycine. Furthermore, based on the extracted patterns, we set out to infer potential functional implications in order to verify our findings and potentially extend our knowledge of how disordered proteins function. We observe that the extracted patterns are primarily involved in regulation, binding, and posttranslational modifications, which constitute the most prominent functions of disordered proteins.


Subjects
Models, Molecular; Protein Conformation; Protein Interaction Maps; Proteins/chemistry; Algorithms; Amino Acid Sequence; Humans; Molecular Sequence Data; Protein Binding; Proteins/metabolism
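A minimal sketch of pattern extraction by overrepresentation is given here. The abstract does not specify the authors' algorithm, so this assumes a simplified scheme: count every fixed-length k-mer in a set of sequences, estimate its expected count from pooled residue frequencies, and keep k-mers whose observed/expected ratio passes a threshold. The function name `overrepresented_kmers` and its thresholds are hypothetical.

```python
from collections import Counter
from math import prod


def overrepresented_kmers(seqs, k=3, min_ratio=2.0, min_count=2):
    """Return {kmer: (observed, expected)} for k-mers enriched over a
    residue-composition background (a simplified, hypothetical criterion)."""
    # Background residue frequencies pooled over all sequences.
    residues = Counter("".join(seqs))
    total = sum(residues.values())
    freq = {aa: n / total for aa, n in residues.items()}

    # Observed k-mer counts and the number of k-length windows scanned.
    observed = Counter()
    windows = 0
    for s in seqs:
        for i in range(len(s) - k + 1):
            observed[s[i:i + k]] += 1
            windows += 1

    enriched = {}
    for kmer, obs in observed.items():
        # Expected count if residues were independent draws from the background.
        expected = windows * prod(freq[aa] for aa in kmer)
        if obs >= min_count and obs / expected >= min_ratio:
            enriched[kmer] = (obs, expected)
    return enriched
```

For example, `overrepresented_kmers(["AAAALESAAAA", "GGGGLESGGGG", "CCCCLESCCCC"])` reports "LES" with an observed count of 3. Real motif discovery usually also allows wildcards and residue classes, which this sketch omits.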
2.
Biomed Rep ; 20(3): 45, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38357244

ABSTRACT

Allicin is a thiosulphinate molecule produced in garlic (Allium sativum) and has a wide range of biological actions and pharmaceutical applications. Its precursor molecule is the non-proteinogenic amino acid alliin (S-allylcysteine sulphoxide). The alliin biosynthetic pathway in garlic involves a group of enzymes that includes the γ-glutamyl-transpeptidase isoenzymes of Allium sativum, AsGGT1, AsGGT2, and AsGGT3, which catalyze the removal of the γ-glutamyl group from γ-glutamyl-S-allyl-L-cysteine to produce S-allyl-L-cysteine. This removal is followed by an S-oxygenation, which leads to the biosynthesis of alliin. The aim of the present study is to annotate the previously discovered garlic γ-glutamyl-transpeptidase genes, as well as a fourth candidate gene (AsGGT4) that has not yet been described. The annotation includes identifying the loci of the genes in the garlic genome, revealing the overall structure and conserved regions of these genes, and elucidating the evolutionary history of these enzymes through phylogenetic analysis. The genomic structure of the γ-glutamyl-transpeptidase genes is conserved: each gene consists of seven exons, and the genes are located on different chromosomes. The AsGGT3 and AsGGT4 enzymes contain a signal peptide. In addition, the AsGGT3 protein sequence was corrected; four indel events in the AsGGT3 coding regions suggest that, at least in the garlic variety Ershuizao, AsGGT3 may be a pseudogene. Finally, protein structure prediction tools were used to visualize the tertiary structure of the candidate peptide.

3.
Comput Struct Biotechnol J ; 23: 2152-2162, 2024 Dec.
Article in English | MEDLINE | ID: mdl-38827234

ABSTRACT

Background and objective: Systemic autoinflammatory diseases (SAIDs) are characterized by widespread inflammation, but most of them lack specific biomarkers for accurate diagnosis. Although a number of machine learning algorithms have been used to analyze SAID datasets and have aided the discovery of novel biomarkers, there is growing recognition of the importance of SAID timeseries clustering, as it can capture the temporal dynamics of gene expression patterns. Methodology: This paper proposes a novel clustering methodology to efficiently associate three-dimensional data. The algorithm uses competitive learning to create a self-organizing neural network and adjusts neuron positions in a time-dependent, high-dimensional feature space in order to assign them as cluster centers. The clustering was evaluated quantitatively with well-known clustering indices. Furthermore, a differential expression analysis and classification pipeline was employed to assess the ability of the proposed methodology to extract more accurate pathway-specific genes from its clusters; for this purpose, a comparative analysis against a heuristic timeseries clustering method was also conducted. Results: The proposed methodology achieved better overall clustering index scores, as well as better classification metrics using genes derived from its clusters. Notable cases include a threefold increase in the Calinski-Harabasz index, a twofold improvement in the Davies-Bouldin index, and a ∼60% increase in classification specificity. Conclusion: A novel clustering methodology was developed and applied to several gene expression timeseries datasets from systemic autoinflammatory diseases, and its ability to efficiently produce well-separated clusters, compared with existing heuristic methods, was demonstrated.
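The competitive-learning loop behind a self-organizing map can be sketched as follows. This is a generic 1-D SOM, not the authors' time-dependent method; it assumes each sample's gene-by-time matrix has been flattened into a single feature vector, and the learning-rate and neighborhood schedules (as well as the names `train_som` and `assign_clusters`) are illustrative assumptions.

```python
import numpy as np


def train_som(data, n_neurons=4, epochs=50, lr0=0.5, sigma0=1.0, seed=0):
    """Train a 1-D self-organizing map and return the neuron weight vectors,
    which act as cluster centers. data has shape (n_samples, n_features)."""
    rng = np.random.default_rng(seed)
    # Initialize neurons on randomly chosen samples.
    weights = data[rng.choice(len(data), n_neurons, replace=False)].astype(float)
    grid = np.arange(n_neurons)  # neuron positions on a 1-D lattice
    n_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:
            frac = step / n_steps
            lr = lr0 * (1 - frac)                   # linearly decaying learning rate
            sigma = max(sigma0 * (1 - frac), 1e-3)  # shrinking neighborhood width
            # Competition: the best-matching unit (BMU) is the closest neuron.
            bmu = np.argmin(np.linalg.norm(weights - x, axis=1))
            # Cooperation: Gaussian neighborhood around the BMU on the lattice.
            h = np.exp(-((grid - bmu) ** 2) / (2 * sigma ** 2))
            # Adaptation: pull each neuron towards the sample, weighted by h.
            weights += lr * h[:, None] * (x - weights)
            step += 1
    return weights


def assign_clusters(data, weights):
    """Label each sample with the index of its nearest neuron."""
    d = np.linalg.norm(data[:, None, :] - weights[None, :, :], axis=2)
    return d.argmin(axis=1)
```

Clustering indices such as Calinski-Harabasz or Davies-Bouldin would then be computed on the labels returned by `assign_clusters`.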

4.
Eur Heart J Digit Health ; 5(5): 542-550, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39318697

ABSTRACT

Aims: Coronary artery disease (CAD) is a highly prevalent disease with modifiable risk factors. In patients with suspected obstructive CAD, evaluating the pre-test probability model is crucial for diagnosis, although its accuracy remains controversial. Machine learning (ML) predictive models can help clinicians detect CAD early and improve outcomes. This study aimed to identify early-stage CAD using ML in conjunction with a panel of clinical and laboratory tests. Methods and results: The study sample included 3316 patients enrolled in the Ludwigshafen Risk and Cardiovascular Health (LURIC) study. A comprehensive array of attributes was considered, and an ML pipeline was developed. Subsequently, we used five approaches to generate high-quality virtual patient data to improve the performance of the artificial intelligence models. An extension study using data from the Young Finns Study (YFS) was carried out to assess the generalizability of the results. After applying virtual augmented data, accuracy increased by approximately 5%, from 0.75 to 0.79 for random forests (RFs) and from 0.76 to 0.80 for gradient boosting (GB). Sensitivity showed a significant boost for RFs, rising by about 9.4% (from 0.81 to 0.89), while GB exhibited a 4.8% increase (from 0.83 to 0.87). Specificity also rose significantly for RFs, by ∼24% (from 0.55 to 0.70), while GB exhibited a 37% increase (from 0.51 to 0.74). The extension analysis was consistent with the initial study. Conclusion: Accurate predictions of angiographic CAD can be obtained using a set of routine laboratory markers, age, sex, and smoking status, potentially limiting the need for invasive diagnostic techniques. The extension analysis in the YFS demonstrated the potential of these findings in a younger population and confirmed their applicability to atherosclerotic vascular disease.

5.
Article in English | MEDLINE | ID: mdl-38082601

ABSTRACT

An emerging area in data science that has lately gained attention is virtual population (VP) and synthetic data generation. This field has the potential to significantly affect the healthcare industry by providing a means to augment clinical research databases that have a shortage of subjects. The current study provides a comparative analysis of five distinct approaches for creating virtual data populations from real patient data. The dataset used for the analyses comprised clinical data collected from patients scheduled for elective coronary artery bypass graft (CABG) surgery. The five computational techniques employed to augment the dataset were: (i) Tabular Preset, (ii) Gaussian Copula Model, (iii) a Generative Adversarial Network (GAN)-based deep learning data synthesizer (CTGAN), (iv) a variation of the CTGAN model (Copula GAN), and (v) a VAE-based deep learning data synthesizer (TVAE). The techniques were assessed on their effectiveness in producing high-quality virtual data. For this purpose, dataset correlation matrices, cosine similarity distance, density histograms, and kernel density estimation were employed to compare each attribute with its synthetic equivalent. Our findings demonstrate that the Gaussian Copula Model prevails in creating virtual data with consistent distributions (Kolmogorov-Smirnov (KS) and Chi-Squared (CS) test scores of 0.9 and 0.98, respectively) and correlation patterns (average cosine similarity of 0.95). Clinical Relevance- It has been shown that the use of a VP can increase the predictive performance of an ML model compared with using a smaller, non-augmented population.


Subjects
Coronary Artery Bypass; Heart; Humans; Chronic Disease; Data Accuracy; Data Science
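The Gaussian copula approach that performed best in this comparison can be sketched with NumPy and the standard library. This is a textbook copula with empirical marginals, not the implementation used by the authors; `gaussian_copula_sample` and `correlation_cosine_similarity` are hypothetical names. Each column is mapped to normal scores through its ranks, the latent correlation matrix is estimated, correlated normals are sampled, and each column is mapped back through its empirical quantiles. The second function computes the cosine similarity between flattened correlation matrices, which is one plausible reading of the correlation-pattern metric in the abstract.

```python
import numpy as np
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def gaussian_copula_sample(real, n_samples, seed=0):
    """Draw synthetic rows from a Gaussian copula fitted to `real`
    (shape (n_rows, n_cols)), keeping each column's empirical marginal."""
    rng = np.random.default_rng(seed)
    n, d = real.shape
    # 1. Rank-transform each column into (0, 1), then into normal scores.
    ranks = real.argsort(axis=0).argsort(axis=0) + 1  # ranks 1..n per column
    u = ranks / (n + 1)                               # strictly inside (0, 1)
    z = np.vectorize(_STD_NORMAL.inv_cdf)(u)
    # 2. Estimate the latent correlation between the normal scores.
    corr = np.corrcoef(z, rowvar=False)
    # 3. Sample correlated normals and map them back to uniforms.
    z_new = rng.multivariate_normal(np.zeros(d), corr, size=n_samples)
    u_new = np.vectorize(_STD_NORMAL.cdf)(z_new)
    # 4. Invert each empirical marginal via quantile lookup.
    return np.column_stack(
        [np.quantile(real[:, j], u_new[:, j]) for j in range(d)]
    )


def correlation_cosine_similarity(a, b):
    """Cosine similarity between the flattened correlation matrices of two tables."""
    ca = np.corrcoef(a, rowvar=False).ravel()
    cb = np.corrcoef(b, rowvar=False).ravel()
    return float(ca @ cb / (np.linalg.norm(ca) * np.linalg.norm(cb)))
```

Because the marginals are inverted by quantile lookup, synthetic values never leave the observed range of each column; categorical attributes would need a separate encoding step not shown here.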
6.
Article in English | MEDLINE | ID: mdl-38083327

ABSTRACT

A preliminary analysis was conducted on data acquired from RNA sequencing and SomaScan platforms for the classification of patients with Inflammation of Unknown Origin. To this end, a multimodal data integration approach combining the two platforms was designed to assess the potential of learning estimators, using the differentially expressed features from the independent profiling experiments of both platforms. The classification task was to differentiate Inflammation of Unknown Origin patients from a multitude of Systemic Autoinflammatory Disease patients. Separate false discovery rate analyses were performed on each dataset to extract statistically significant features between the two designated sample groups. The genomic analysis achieved higher overall classification metrics than the proteomic analysis, averaging a ~19% increase across all metrics and classifiers, with a ~0.07% increase in standard error. The multimodal data integration approach achieved results similar to those of the individual platforms' analyses; more specifically, with the simple Logistic Regression estimator it achieved the same classification accuracy, sensitivity, and specificity scores as the best individual analysis. Clinical Relevance- This study highlights the advantage of exploiting RNA sequencing data to identify potential disease-specific biomarkers for Inflammation of Unknown Origin, even against other Systemic Autoinflammatory Diseases. These findings are further emphasized by the lack of an apparent clinical distinction between Inflammation of Unknown Origin and other Systemic Autoinflammatory Diseases.


Subjects
Hereditary Autoinflammatory Diseases; Proteomics; Humans; Proteomics/methods; RNA-Seq; Genomics/methods; Sequence Analysis, RNA/methods; Syndrome
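The abstract does not name the false discovery rate procedure used; the Benjamini-Hochberg step-up procedure is a common choice and is sketched here as an assumption. `benjamini_hochberg` is a hypothetical helper that returns which features are kept at a given FDR level.

```python
import numpy as np


def benjamini_hochberg(pvals, alpha=0.05):
    """Return a boolean mask of features declared significant at FDR level
    `alpha` using the Benjamini-Hochberg step-up procedure."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Compare the i-th smallest p-value against its threshold i * alpha / m.
    thresholds = alpha * np.arange(1, m + 1) / m
    below = p[order] <= thresholds
    mask = np.zeros(m, dtype=bool)
    if below.any():
        # Step-up: keep every feature ranked at or below the largest passing rank.
        k = np.max(np.nonzero(below)[0])
        mask[order[: k + 1]] = True
    return mask
```

For instance, p-values `[0.03, 0.04]` at `alpha=0.05` are both kept: 0.03 fails its own threshold of 0.025, but the step-up rule keeps every hypothesis ranked at or below the largest passing rank.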
7.
Article in English | MEDLINE | ID: mdl-36086666

ABSTRACT

A meta-analysis was conducted to compare high-throughput technologies in the classification of Adult-Onset Still's Disease patients, using differentially expressed genes from independent profiling experiments. We used two publicly available datasets from the Gene Expression Omnibus and performed a separate differential expression analysis on each dataset to extract statistically significant genes. We then mapped the genes between the two datasets and employed well-established machine learning algorithms to evaluate the resulting genes as candidate biomarkers. Using next-generation sequencing data, we achieved the maximum (100%) classification accuracy, sensitivity, and specificity with the Gradient Boosting and Random Forest classifiers, compared with 83% for the DNA microarray data. Clinical Relevance- When biomarkers derived from one study are applied to the data of another, the results often diverge significantly. Here we establish that, in cross-profiling meta-analysis approaches based on differential expression analysis, next-generation sequencing data provide more accurate results than microarray experiments in the classification of Adult-Onset Still's Disease patients.


Subjects
Gene Expression Profiling; Still's Disease, Adult-Onset; Biomarkers; Gene Expression Profiling/methods; Humans; Machine Learning; Oligonucleotide Array Sequence Analysis/methods; Still's Disease, Adult-Onset/diagnosis; Still's Disease, Adult-Onset/genetics
8.
Annu Int Conf IEEE Eng Med Biol Soc ; 2022: 329-332, 2022 Jul.
Article in English | MEDLINE | ID: mdl-36085667

ABSTRACT

Glucose prediction is used in diabetes self-management because it allows patients to take suitable actions for proper glycemic regulation. The aim of this work is short-term personalized glucose prediction in patients with type 1 diabetes mellitus (T1DM). To this end, we compared two models, an autoregressive moving average (ARMA) model and a long short-term memory (LSTM) model, for different prediction horizons. The models were compared using root mean square error (RMSE) and mean absolute error (MAE), and were trained and tested on data from 29 real patients. The results showed that the LSTM model performed better than ARMA, with RMSE of 3.13, 6.41, and 8.81 mg/dL and MAE of 1.98, 5.06, and 6.47 mg/dL for 5-, 15-, and 30-minute prediction horizons, respectively.


Subjects
Diabetes Mellitus, Type 1; Benchmarking; Blood Glucose; Diabetes Mellitus, Type 1/diagnosis; Glucose; Health Behavior; Humans
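As a simplified illustration of the classical baseline, the sketch below fits a purely autoregressive AR(p) model by least squares (the moving-average part of ARMA is omitted) and forecasts recursively several steps ahead, together with the RMSE and MAE metrics. It assumes CGM readings every 5 minutes, so the 15- and 30-minute horizons would correspond to 3 and 6 steps; the sampling interval and all function names are assumptions, not details taken from the paper.

```python
import numpy as np


def fit_ar(series, p):
    """Least-squares fit of an AR(p) model with intercept,
    x_t = c + a_1 x_{t-1} + ... + a_p x_{t-p}. Returns (c, a)."""
    x = np.asarray(series, dtype=float)
    # Column i holds lag i+1 for every target x_t with t >= p.
    lags = np.column_stack([x[p - i - 1 : len(x) - i - 1] for i in range(p)])
    design = np.column_stack([np.ones(len(lags)), lags])
    coef, *_ = np.linalg.lstsq(design, x[p:], rcond=None)
    return coef[0], coef[1:]


def forecast(history, c, a, steps):
    """Recursive multi-step forecast: each prediction is fed back as input.
    Returns the prediction `steps` samples after the end of `history`."""
    buf = [float(v) for v in history[-len(a):]]
    for _ in range(steps):
        buf.append(c + sum(a_i * buf[-1 - i] for i, a_i in enumerate(a)))
    return buf[-1]


def rmse(y_true, y_pred):
    """Root mean square error."""
    d = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(d ** 2)))


def mae(y_true, y_pred):
    """Mean absolute error."""
    d = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(d)))
```

Because errors compound when predictions are fed back, RMSE and MAE generally grow with the horizon, which matches the trend of the reported figures.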
9.
Annu Int Conf IEEE Eng Med Biol Soc ; 2022: 1020-1023, 2022 Jul.
Article in English | MEDLINE | ID: mdl-36086001

ABSTRACT

Although several studies have used AI (artificial intelligence)-based solutions to enhance decision-making for mechanical ventilation and for mortality in COVID-19, the extraction of explainable predictors regarding heparin's effect on intensive care and mortality has remained unresolved. In the present study, we developed an explainable AI (XAI) workflow to shed light on predictors of admission to the intensive care unit (ICU) and of mortality among hospitalized COVID-19 patients who received heparin. AI-empowered classifiers, such as hybrid extreme gradient boosting (HXGBoost) with customized loss functions, were trained on curated time-series clinical data to develop robust AI models. Shapley additive explanation (SHAP) analysis was conducted to determine the positive or negative impact of each predictor on the model's output. HXGBoost predicted the risk of intensive care and mortality with accuracies of 0.84 and 0.85, respectively. SHAP analysis indicated that a low percentage of lymphocytes at day 7, increased FiO2 at days 1 and 5, and low SatO2 at days 3 and 7 increase the probability of mortality, and highlighted the positive effect of heparin administration in the early days of hospitalization for reducing mortality.


Subjects
COVID-19; Respiration, Artificial; Artificial Intelligence; Heparin/therapeutic use; Hospital Mortality; Humans
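SHAP values are based on Shapley values from cooperative game theory. The brute-force sketch below computes exact Shapley values for a single prediction by enumerating all feature coalitions, which is feasible only for a handful of features; the SHAP library relies on much faster estimators (e.g., TreeSHAP for tree ensembles). Replacing absent features with a fixed `baseline` vector is a simplifying assumption, since SHAP typically averages over background data instead, and the function name is hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one instance `x` by enumerating all feature
    coalitions. Features outside a coalition are set to `baseline`.
    Cost grows as 2**n_features, so this is only for small examples."""
    n = len(x)

    def value(coalition):
        # Model output with only the coalition's features taken from x.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n!
            w = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = set(subset)
                phi[i] += w * (value(coalition | {i}) - value(coalition))
    return phi
```

For a linear model the result reduces to coefficient × (feature value − baseline), and the values always sum to the difference between the prediction for `x` and the prediction for `baseline`.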
10.
Comput Biol Med ; 141: 105176, 2022 Feb.
Article in English | MEDLINE | ID: mdl-35007991

ABSTRACT

Coronavirus disease 2019 (COVID-19), which is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), continues to cause profound damage to the global healthcare system because of its high transmissibility. Currently, there is an urgent unmet need to identify the underlying dynamic associations among COVID-19 patients and to distinguish patient subgroups with common clinical profiles towards the development of robust classifiers for ICU admission and mortality. To address this need, we propose a four-step pipeline that: (i) enhances the quality of multiple timeseries clinical data through an automated data curation workflow, (ii) deploys Dynamic Bayesian Networks (DBNs) to detect features with increased connectivity based on dynamic association analysis across multiple time points, (iii) uses Self-Organizing Maps (SOMs) and trajectory analysis for the early identification of COVID-19 patients with common clinical profiles, and (iv) trains robust multiple additive regression trees (MART) for ICU admission and mortality classification based on the extracted homogeneous clusters, to identify risk factors and biomarkers of disease progression. Incorporating the extracted clusters and the dynamically associated clinical data improved classification performance for ICU admission to a sensitivity of 0.83 and a specificity of 0.83, and for mortality to a sensitivity of 0.74 and a specificity of 0.76. Including additional information further enhanced the classifiers, increasing sensitivity and specificity for mortality by 4%. According to the risk factor analysis, the number of lymphocytes, SatO2, PO2/FiO2, and O2 supply type were highlighted as risk factors for ICU admission, and the percentage of neutrophils and lymphocytes, PO2/FiO2, LDH, and ALP for mortality, among others.
To our knowledge, this is the first study that combines dynamic modeling with clustering analysis to identify homogeneous groups of COVID-19 patients towards the development of robust classifiers for ICU admission and mortality.


Subjects
COVID-19; Bayes Theorem; Hospitalization; Humans; Intensive Care Units; Retrospective Studies; SARS-CoV-2
11.
BMC Bioinformatics ; 12: 142, 2011 May 10.
Article in English | MEDLINE | ID: mdl-21569261

ABSTRACT

BACKGROUND: In peptides and proteins, only a small percentage of peptide bonds adopt the cis configuration. Especially in the case of amide peptide bonds, the number of cis conformations is quite limited, which hampered systematic studies until recently. Lately, however, the growing number of protein 3D structures in databases has produced a considerable number of sequences containing non-proline cis formations (cis-nonPro). RESULTS: In our work, we extract regular expression-type patterns that are descriptive of the regions surrounding cis-nonPro formations. For this purpose, three types of pattern discovery are performed: i) exact pattern discovery, ii) pattern discovery using a chemical equivalency set, and iii) pattern discovery using a structural equivalency set. Afterwards, using each pattern as a predicate, we search the Eukaryotic Linear Motif (ELM) resource to identify potential functional implications of regions with cis-nonPro peptide bonds. The patterns extracted from each type of pattern discovery are further employed to formulate a pattern-based classifier, which is used to discriminate between cis-nonPro and trans-nonPro formations. CONCLUSIONS: In terms of functional implications, we observe a significant association of cis-nonPro peptide bonds with ligand/binding functionalities. As for the pattern-based classification scheme, the best results were obtained using the structural equivalency set, which yielded 70% accuracy, 77% sensitivity, and 63% specificity.


Subjects
Algorithms; Proline/chemistry; Proteins/chemistry; Structural Homology, Protein; Amides/chemistry; Crystallography, X-Ray; Databases, Factual; Databases, Protein; Peptides/chemistry; Proline/analysis; Protein Conformation; Protein Structure, Tertiary
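A pattern-based classifier of the kind described can be sketched with Python's `re` module. The equivalency classes below are illustrative placeholders, not the chemical or structural equivalency sets used in the paper, and the pattern syntax (upper-case literal residues, `.` wildcards, lower-case class symbols) is an assumption made for this example.

```python
import re

# Hypothetical chemical equivalency classes (illustrative, not the paper's set).
CHEM_CLASSES = {
    "a": "DE",     # acidic
    "b": "KRH",    # basic
    "r": "FWY",    # aromatic
    "h": "AVLIM",  # aliphatic / hydrophobic
}


def pattern_to_regex(pattern, classes=CHEM_CLASSES):
    """Translate a pattern such as 'G.aW' into a compiled regex. Upper-case
    letters are literal residues, '.' matches any residue, and lower-case
    letters stand for equivalency classes."""
    parts = []
    for ch in pattern:
        if ch == ".":
            parts.append(".")
        elif ch in classes:
            parts.append("[" + classes[ch] + "]")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def classify_window(window, cis_patterns):
    """Predict 'cis' if any cis-associated pattern occurs in the sequence
    window around the peptide bond, otherwise 'trans'."""
    return "cis" if any(p.search(window) for p in cis_patterns) else "trans"
```

For example, `pattern_to_regex("G.aW")` compiles to `G.[DE]W`, so the window `AAGSEWKK` is labeled "cis".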
12.
Front Immunol ; 12: 700582, 2021.
Article in English | MEDLINE | ID: mdl-34456913

ABSTRACT

Multiple sclerosis (MS) is one of the most common autoimmune diseases and is typically diagnosed and monitored using magnetic resonance imaging (MRI) in combination with clinical manifestations. The purpose of this review is to highlight the main applications of Machine Learning (ML) models in the MS field using MRI, together with their performance. We reviewed articles from the last decade and grouped them, according to the application of ML in MS using MRI data, into four categories: 1) automated diagnosis of MS, 2) prediction of MS disease progression, 3) differentiation of MS stages, and 4) differentiation of MS from similar disorders.


Subjects
Image Interpretation, Computer-Assisted/methods; Machine Learning; Magnetic Resonance Imaging/methods; Multiple Sclerosis/diagnostic imaging; Neuroimaging/methods; Humans
13.
Comput Struct Biotechnol J ; 19: 5546-5555, 2021.
Article in English | MEDLINE | ID: mdl-34712399

ABSTRACT

Artificial Intelligence (AI) has recently altered the landscape of cancer research and medical oncology through traditional Machine Learning (ML) algorithms and cutting-edge Deep Learning (DL) architectures. In this review article we focus on the ML aspect of AI applications in cancer research and present the most indicative studies with respect to the ML algorithms and data used. The PubMed and dblp databases were searched to obtain the most relevant research works of the last five years. Based on a comparison of these studies and their clinical outcomes concerning medical ML applications in cancer research, three main clinical scenarios were identified. We give an overview of well-known DL and Reinforcement Learning (RL) methodologies and their application in clinical practice, and we briefly discuss Systems Biology in cancer research. We also provide a thorough examination of the clinical scenarios with respect to disease diagnosis, patient classification, and cancer prognosis and survival. The most relevant studies identified in the preceding year are presented along with their primary findings. Furthermore, we examine effective implementation and the main points that need to be addressed to improve the robustness, explainability, and transparency of predictive models. Finally, we summarize the most recent advances in AI/ML applications in cancer research and medical oncology, as well as the challenges and open issues that need to be addressed before data-driven models can be implemented in healthcare systems to assist physicians in their daily practice.

14.
Comput Struct Biotechnol J ; 19: 3058-3068, 2021.
Article in English | MEDLINE | ID: mdl-34136104

ABSTRACT

Unlike autoimmune diseases, systemic autoinflammatory diseases (SAIDs) have no known constitutive, disease-defining biomarker. Kawasaki disease (KD) is one of the "undiagnosed" types of SAIDs, whose pathogenic mechanism and associated gene mutations remain unknown. To address this issue, we have developed a sequential computational workflow that clusters KD patients with similar gene expression profiles across the three KD phases (Acute, Subacute, and Convalescent) and uses the resulting clustermap to detect prominent genes that can serve as diagnostic biomarkers for KD. Self-Organizing Maps (SOMs) were employed to cluster patients with similar gene expression across the three phases through inter-phase and intra-phase clustering. Then, false discovery rate (FDR)-based feature selection was applied to detect genes that deviate significantly across the per-phase clusters. Our results revealed five genes as candidate biomarkers for KD diagnosis, namely HLA-DQB1, HLA-DRA, ZBTB48, TNFRSF13C, and CASD1. To our knowledge, these five genes are reported here for the first time in the literature. The value of the discovered genes for KD diagnosis, compared with the known ones, was demonstrated by training boosting ensembles (AdaBoost and XGBoost) for KD classification on common-platform and cross-platform datasets. Classifiers trained on the proposed genes from the common-platform data yielded average increases of 4.40% in accuracy, 5.52% in sensitivity, and 3.57% in specificity over the known genes in the Acute and Subacute phases, and increases of 2.30% in accuracy, 2.20% in sensitivity, and 4.70% in specificity in the cross-platform analysis.

15.
Diagnostics (Basel) ; 12(1)2021 Dec 28.
Article in English | MEDLINE | ID: mdl-35054223

ABSTRACT

BACKGROUND: Although several studies have been launched towards predicting risk factors for mortality and for admission to the intensive care unit (ICU) in COVID-19, none of them focuses on developing explainable AI models to define an ICU scoring index using dynamically associated biological markers. METHODS: We propose a multimodal approach that combines explainable AI models with dynamic modeling methods to shed light on the clinical features of COVID-19. Dynamic Bayesian networks were used to seek associations among cytokines across four time intervals after hospitalization. Explainable gradient boosting trees were trained to predict the risk of ICU admission and mortality towards the development of an ICU scoring index. RESULTS: Our results highlight LDH, IL-6, IL-8, Cr, number of monocytes, lymphocyte count, and TNF as risk predictors for ICU admission and survival, along with LDH, age, CRP, Cr, WBC, and lymphocyte count for mortality in the ICU, with prediction accuracies of 0.79 and 0.81, respectively. These risk factors were combined with dynamically associated biological markers to develop an ICU scoring index with an accuracy of 0.9. CONCLUSIONS: To our knowledge, this is the first multimodal, explainable AI model that quantifies the risk of intensive care with an accuracy of up to 0.9 across multiple time points.

16.
Comput Biol Med ; 116: 103577, 2020 Jan.
Article in English | MEDLINE | ID: mdl-32001012

ABSTRACT

Genomic profiling in cancer studies has generated comprehensive gene expression patterns for diverse phenotypes. Computational methods that use transcriptomics datasets have been proposed to model gene expression data. Dynamic Bayesian Networks (DBNs) have been used to model time series datasets and to infer regulatory networks. Furthermore, cancer classification through DBN-based approaches could reveal the importance of exploiting knowledge from statistically significant genes and key regulatory molecules. Although microarray datasets have been used extensively by several classification methods for decision-making, the use of new knowledge from the pathway level has not been adequately addressed in the literature in terms of DBNs for cancer classification. In the present study, we identify the genes that act as regulators and mediate the activity of transcription factors found in all promoters of our differentially expressed gene sets. These features serve as potential priors for distinguishing tumor from normal samples using a DBN-based classification approach. We used three microarray datasets from the Gene Expression Omnibus (GEO), a public functional genomics repository, and performed differential expression analysis. Promoter and pathway analysis of the identified genes revealed the key regulators that influence the transcription mechanisms of these genes. We applied the DBN algorithm to the selected genes and identified the features that can accurately classify the samples into tumors and controls. Both accuracy and Area Under the Curve (AUC) were high for the gene sets comprising the differentially expressed genes along with their master regulators (accuracy: 70.8%-98.5%; AUC: 0.562-0.985).


Subjects
Gene Regulatory Networks; Neoplasms; Algorithms; Bayes Theorem; Computational Biology; Gene Expression Profiling; Gene Regulatory Networks/genetics; Neoplasms/genetics; Oligonucleotide Array Sequence Analysis
17.
IEEE Open J Eng Med Biol ; 1: 49-56, 2020.
Article in English | MEDLINE | ID: mdl-35402956

ABSTRACT

Lymphoma development constitutes one of the most serious clinicopathological manifestations in patients with Sjögren's Syndrome (SS). Over recent decades, the risk of lymphomagenesis in SS patients has been studied with the aim of identifying novel biomarkers and risk factors that predict lymphoma development in this patient population. Objective: The current study aims to explore whether the genetic susceptibility profiles of SS patients, along with known clinical, serological, and histological risk factors, enhance the accuracy of predicting lymphoma development in this patient population. Methods: The potential predictive role of genetic variants and of clinical and laboratory risk factors was investigated through a Machine Learning (ML)-based framework that encapsulates ensemble classifiers. Results: Ensemble methods improve classification accuracy using approaches that are sensitive to minor perturbations in the training phase. Evaluation of the proposed methodology with 10-fold stratified cross-validation yielded considerable balanced accuracy (GB: 0.7780 ± 0.1514, RF Gini: 0.7626 ± 0.1787, RF Entropy: 0.7590 ± 0.1837). Conclusions: The initial clinical, serological, histological, and genetic findings at early diagnosis have been exploited in an attempt to establish predictive tools for clinical practice and to further our understanding of lymphoma development in SS.

18.
BMC Bioinformatics ; 10: 113, 2009 Apr 20.
Article in English | MEDLINE | ID: mdl-19379512

ABSTRACT

BACKGROUND: Polypeptides are composed of amino acids covalently linked by peptide bonds. The majority of peptide bonds in proteins occur in the trans conformation. Despite their infrequent occurrence, cis peptide bonds play a key role in protein structure and function, as well as in many significant biological processes. RESULTS: We perform a systematic analysis of regions in protein sequences that contain a proline cis peptide bond in order to discover non-random associations between the primary sequence and the nature of proline cis/trans isomerization. For this purpose, an efficient pattern discovery algorithm is employed that discovers regular expression-type patterns that are overrepresented (i.e., appear frequently repeated) in a set of sequences. Four types of pattern discovery are performed: i) exact pattern discovery, ii) pattern discovery using a chemical equivalency set, iii) pattern discovery using a structural equivalency set, and iv) pattern discovery using certain physicochemical properties of amino acids. The extracted patterns are carefully validated using a specially implemented scoring function and a significance measure (i.e., a log-probability estimate) indicative of their specificity. The score threshold is 0.90 for the first three types of pattern discovery and 0.80 for the last type. Regarding the significance measure, all patterns yielded values in the range [-31, -9], which ensures that the derived patterns are highly unlikely to have emerged by chance. Most of the highest-scoring patterns are consistent with previous investigations of the neighborhood of cis proline peptide bonds, and many new ones are identified. Finally, the extracted patterns are systematically compared against the PROSITE database in order to gain insight into the functional implications of cis prolyl bonds.
CONCLUSION: Cis patterns with matches in the PROSITE database fell mostly into two main functional clusters: family signatures and protein signatures. However, considerable propensity was also observed for targeting signals, active and phosphorylation sites, and domain signatures.


Subjects
Peptides/chemistry , Proline/chemistry , Protein Databases , Molecular Models , Protein Conformation , Proteins/chemistry
19.
J Biomed Inform ; 42(1): 140-9, 2009 Feb.
Article in English | MEDLINE | ID: mdl-18586558

ABSTRACT

In protein structures, the peptide bond is found in the trans conformation in the majority of cases; only a small fraction of peptide bonds in proteins are reported to be in the cis conformation. Most of these cis instances (>90%) occur when the peptide bond is an imide (X-Pro) rather than an amide bond (X-nonPro). Because cis/trans isomerization is implicated in many biologically significant processes, accurate prediction of peptide bond conformation is of considerable interest. In this study, we evaluate the effect of a wide range of features on the reliable prediction of both proline and non-proline cis/trans isomerization. We use evolutionary profiles, secondary structure information, real-valued solvent accessibility predictions for each amino acid and the physicochemical properties of the surrounding residues. We also explore the predictive impact of a modified feature vector consisting of condensed position-specific scoring matrices (PSSMX), secondary structure and solvent accessibility. The best discriminating ability is achieved using the first feature vector combined with a wrapper feature selection algorithm and a support vector machine (SVM). The proposed method achieves 70% accuracy, 75% sensitivity and 71% positive predictive value (PPV) in predicting the conformation of the peptide bond between any two amino acids. The output of the feature selection stage is examined in order to identify discriminatory features, as well as the contribution of each neighboring residue to the formation of the peptide bond, thus advancing our understanding of cis/trans isomerization.
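The three reported figures (accuracy, sensitivity, PPV) are standard ratios over a confusion matrix. The following minimal Python sketch shows how they are computed. It assumes cis is treated as the positive class, which the abstract does not state explicitly. The labels are toy values, not data from the study, and the helper names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Confusion-matrix counts for a binary cis/trans classifier."""
    tp: int  # cis bonds predicted as cis
    fp: int  # trans bonds predicted as cis
    tn: int  # trans bonds predicted as trans
    fn: int  # cis bonds predicted as trans


def confusion(y_true: list[str], y_pred: list[str], positive: str = "cis") -> Confusion:
    """Tally confusion counts, treating `positive` as the positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have equal length")
    c = Confusion(0, 0, 0, 0)
    for t, p in zip(y_true, y_pred):
        if t == positive and p == positive:
            c.tp += 1
        elif t != positive and p == positive:
            c.fp += 1
        elif t != positive and p != positive:
            c.tn += 1
        else:
            c.fn += 1
    return c


def accuracy(c: Confusion) -> float:
    """Fraction of all bonds whose conformation is predicted correctly."""
    return (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)


def sensitivity(c: Confusion) -> float:
    """Fraction of true cis bonds that are recovered (recall)."""
    return c.tp / (c.tp + c.fn)


def ppv(c: Confusion) -> float:
    """Fraction of cis predictions that are correct (precision)."""
    return c.tp / (c.tp + c.fp)


truth = ["cis", "cis", "trans", "trans", "cis"]
pred = ["cis", "trans", "trans", "cis", "cis"]
c = confusion(truth, pred)
print(c)  # Confusion(tp=2, fp=1, tn=1, fn=1)
print(accuracy(c), sensitivity(c), ppv(c))
```

Reporting sensitivity and PPV alongside accuracy matters here because cis bonds are rare. On a heavily imbalanced set, a classifier that always predicts trans would score high accuracy while recovering no cis bonds at all.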


Subjects
Artificial Intelligence , Chemical Phenomena , Molecular Models , Statistical Models , Proteins/chemistry , Algorithms , Peptidylprolyl Isomerase , Proline/chemistry , Protein Conformation , Protein Secondary Structure , ROC Curve , Reproducibility of Results , Sensitivity and Specificity , Stereoisomerism
20.
Sensors (Basel) ; 9(2): 731-55, 2009.
Article in English | MEDLINE | ID: mdl-22399936

ABSTRACT

The objective of the current study was the development of a reliable modeling platform to calculate, in real time, the personal exposure and associated health risk of filling station employees from current environmental parameters (traffic, meteorological conditions and amount of fuel traded) measured by an appropriate sensor network. A set of Artificial Neural Networks (ANNs) was developed to predict the benzene exposure pattern of the filling station employees. Furthermore, a Physiologically Based Pharmacokinetic (PBPK) risk assessment model, fed with data obtained from the ANN model, was developed to calculate the lifetime probability distribution of leukemia for the employees. A Bayesian algorithm was employed at crucial points of both model subcompartments. The application was evaluated at two filling stations (one urban and one rural). Among the several algorithms available for developing the ANN exposure model, Bayesian regularization provided the best results and appears to be a promising technique for predicting the exposure pattern of this occupational group. In assessing leukemia risk, with the aim of providing a distribution curve based on the exposure levels and the varying susceptibility of the population, the Bayesian algorithm was a prerequisite for the Monte Carlo approach integrated in the PBPK-based risk model. In conclusion, the modeling system described herein is capable of exploiting the information collected by environmental sensors to estimate, in real time, the personal exposure and the resulting health risk for employees of gasoline filling stations.
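The abstract mentions a Monte Carlo approach inside the PBPK-based risk model but gives no model equations. The Python sketch below is deliberately not a PBPK model and contains no Bayesian step. It makes three assumptions: exposure varies log-normally, individual susceptibility varies log-normally, and the dose-risk relation is linear. Its only purpose is to show how Monte Carlo sampling turns variability in exposure and susceptibility into a risk distribution curve. All function names, parameter names and input values are hypothetical placeholders.

```python
import math
import random
import statistics


def simulate_risks(
    exposure_median: float,
    exposure_gsd: float,
    potency: float,
    susceptibility_gsd: float,
    n: int = 10_000,
    seed: int = 0,
) -> list[float]:
    """Monte Carlo draws of individual lifetime risk (simplified, not PBPK).

    Exposure and individual susceptibility are each sampled from a
    log-normal distribution given its median and geometric standard
    deviation (GSD >= 1); risk is modeled as linear in exposure, scaled by
    the susceptibility multiplier, and capped at 1.
    """
    rng = random.Random(seed)
    mu_e, sigma_e = math.log(exposure_median), math.log(exposure_gsd)
    sigma_s = math.log(susceptibility_gsd)
    risks = []
    for _ in range(n):
        exposure = rng.lognormvariate(mu_e, sigma_e)
        susceptibility = rng.lognormvariate(0.0, sigma_s)  # median multiplier of 1
        risks.append(min(1.0, potency * exposure * susceptibility))
    return risks


def summarize(risks: list[float]) -> tuple[float, float]:
    """Median and 95th percentile of the simulated risk distribution."""
    cuts = statistics.quantiles(risks, n=20)  # 19 cut points: 5th, 10th, ..., 95th
    return statistics.median(risks), cuts[18]


# Placeholder inputs in arbitrary units, not values from the study.
risks = simulate_risks(exposure_median=1.0, exposure_gsd=2.0,
                       potency=1e-3, susceptibility_gsd=1.5, n=5_000, seed=1)
median_risk, p95_risk = summarize(risks)
print(f"median={median_risk:.2e}, 95th percentile={p95_risk:.2e}")
```

Sorting `risks` and plotting it against rank yields the kind of cumulative distribution curve the abstract describes. In the actual system, the ANN exposure predictions and PBPK dose calculations would take the place of the simple log-normal and linear assumptions used here.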
