1.
Int Arch Occup Environ Health ; 95(8): 1785-1796, 2022 10.
Article in English | MEDLINE | ID: mdl-35551477

ABSTRACT

PURPOSE: Exposures related to beryllium (Be) are an enduring concern among workers in the nuclear weapons and other high-tech industries, calling for regular and rigorous biological monitoring. Conventional biomonitoring of Be in urine is informative of neither cumulative exposure nor health outcomes. Biomarkers of exposure to Be based on non-invasive biomonitoring could help refine disease risk assessment. In a cohort of workers with Be exposure, we used blood plasma extracellular vesicles (EVs) to discover novel biomarkers of exposure to Be. METHODS: EVs were isolated from plasma using size-exclusion chromatography and subjected to mass spectrometry-based proteomics. A protein-based classifier was developed using LASSO regression and validated by ELISA. RESULTS: We discovered a dual biomarker signature comprising zymogen granule protein 16B and putative protein FAM10A4 that differentiated between Be-exposed and unexposed subjects. ELISA-based quantification of the biomarkers in an independent cohort of samples confirmed higher expression of the signature in the Be-exposed group, with high predictive accuracy (AUROC = 0.919). Furthermore, the biomarkers efficiently discriminated between high- and low-exposure groups (AUROC = 0.749). CONCLUSIONS: This is the first report of EV biomarkers associated with Be exposure and exposure levels. The biomarkers could be implemented in resource-limited settings for Be exposure assessment.
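The AUROC values above summarize how well a biomarker score separates two groups. As a minimal sketch (not the authors' pipeline, which used LASSO regression and ELISA), AUROC can be computed from any classifier score with the rank-based Mann-Whitney formulation: the probability that a randomly chosen exposed subject scores higher than a randomly chosen unexposed one, with ties counted as one half. The function name `auroc` and the toy scores are illustrative assumptions.

```python
def auroc(scores, labels):
    """AUROC as the probability that a positive (label 1) outscores a negative
    (label 0), counting ties as half a win (Mann-Whitney formulation)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy scores (invented): 1 = exposed, 0 = unexposed.
# 5 of the 6 exposed/unexposed pairs are ranked correctly, so this is 5/6.
print(auroc([0.9, 0.8, 0.4, 0.35, 0.1], [1, 1, 0, 1, 0]))
```

The pairwise loop is O(n·m), which is adequate for cohort-sized samples; sorting-based rank formulas give the same value faster on large datasets.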


Subject(s)
Beryllium , Extracellular Vesicles , Beryllium/metabolism , Biomarkers , Extracellular Vesicles/chemistry , Extracellular Vesicles/metabolism , Humans , Mass Spectrometry , Proteomics/methods
2.
Eur Spine J ; 2022 Dec 24.
Article in English | MEDLINE | ID: mdl-36565345

ABSTRACT

PURPOSE: Chronic low back pain (cLBP) is a common health condition worldwide and a leading cause of disability, with an estimated lifetime prevalence of 80-90% in industrialized countries. However, we have had limited success in treating cLBP, likely due to its non-specific, heterogeneous nature that goes beyond detectable anatomical changes. We propose that omics technologies, as precision medicine tools, are well suited to provide insight into its pathophysiology and to provide diagnostic markers and therapeutic targets. Therefore, in this review, we explore the current state of omics technologies in the diagnosis and classification of cLBP. We identify factors that may serve as markers to differentiate between acute and chronic cases of low back pain (LBP). Finally, we discuss some challenges that must be overcome to successfully apply precision medicine to the diagnosis and treatment of cLBP. METHODS: A literature search for current applications of omics technologies to chronic low back pain was performed using the following search terms: "back pain," "low back pain," "proteomics," "transcriptomics," "epigenomics," "genomics," and "omics." We reviewed molecular markers identified in 35 studies that hold promise for providing molecular insights into cLBP. RESULTS: Genome-wide association studies (GWAS) have found evidence for the role of single nucleotide polymorphisms (SNPs) associated with pain pathways in individuals with cLBP. Epigenomic modifications in patients with cLBP have been found to be enriched among genes involved in immune signaling and inflammation. Transcriptomic profiles of patients with cLBP show multiple lines of evidence for the role of inflammation in cLBP. The glycomic profiles of patients with cLBP are similar to those of patients with inflammatory conditions. Proteomics and microbiomics show promise, but few studies are currently available.
CONCLUSION: Omics technologies have identified associations between inflammatory and pain pathways in the pathophysiology of cLBP. However, to integrate information across the range of studies, it is important for the field to identify and adopt standardized definitions of cLBP and of control patients. Additionally, most papers have applied a single omics method to a sample of cLBP patients, an approach that has yielded limited insight into the pathophysiology of cLBP. Therefore, we recommend a multi-omics approach applied to large global consortia for advancing subphenotyping and better management of cLBP, via improved identification of diagnostic markers and therapeutic targets.

3.
J Biomed Inform ; 107: 103455, 2020 07.
Article in English | MEDLINE | ID: mdl-32497685

ABSTRACT

Modeling factors influencing disease phenotypes from biomarker profiling study datasets is a critical task in biomedicine. Such datasets are typically generated from high-throughput 'omic' technologies, which help examine disease mechanisms at an unprecedented resolution. These datasets are challenging because they are high-dimensional. The disease mechanisms they study are also complex, because many diseases are multifactorial, resulting from the collective activity of several factors, each with a small effect. Bayesian rule learning (BRL) is a rule model inferred from learning Bayesian networks from data, and has been shown to be effective in modeling high-dimensional datasets. However, BRL is not efficient at modeling multifactorial diseases, since it suffers from data fragmentation during learning. In this paper, we overcome this limitation by implementing and evaluating three types of ensemble model combination strategies with BRL: uniform combination (UC; the same as bagging), Bayesian model averaging (BMA), and Bayesian model combination (BMC), collectively called Ensemble Bayesian Rule Learning (EBRL). We also introduce a novel method to visualize EBRL models, called the Bayesian Rule Ensemble Visualizing tool (BREVity), which helps extract and interpret the most important rule patterns guiding the predictions made by the ensemble model. Our results, using twenty-five public, high-dimensional gene expression datasets of multifactorial diseases, suggest that EBRL models using either UC or BMC achieve better predictive performance than BMA and other classic machine learning methods. Furthermore, BMC is found to be more reliable than UC when the ensemble includes sub-optimal models resulting from the stochasticity of the model search process.
Together, EBRL and BREVity provide researchers with a promising and novel tool for modeling multifactorial diseases from high-dimensional datasets that leverages the strengths of ensemble methods for predictive performance, while also providing interpretable explanations for its predictions.
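The three combination strategies differ mainly in how the member models' predictions are weighted. The sketch below is an illustrative reading of those strategies, not the EBRL implementation: UC averages predictions equally; BMA weights each model by its posterior probability (here assuming a uniform prior over models, so weights are proportional to each model's marginal likelihood); and BMC samples weight vectors for the whole ensemble from a flat Dirichlet and weights each sampled combination by how well it explains the training data. All function names, the sample count, and the toy inputs are hypothetical.

```python
import math
import random


def uniform_combination(probs):
    """UC (as in bagging): average the members' P(class=1) with equal weights."""
    return sum(probs) / len(probs)


def normalize_log_weights(log_w):
    """Exponentiate and normalize log-weights (max-shifted for numerical stability)."""
    m = max(log_w)
    w = [math.exp(x - m) for x in log_w]
    total = sum(w)
    return [x / total for x in w]


def bma_weights(log_marginal_likelihoods):
    """BMA: weight each model by its posterior, assuming a uniform prior over models."""
    return normalize_log_weights(log_marginal_likelihoods)


def bmc_weights(train_probs, train_labels, n_samples=500, seed=0):
    """BMC sketch. train_probs[j][i] is model j's P(class=1) for training sample i.
    Sample ensemble weight vectors from a flat Dirichlet, score each weighted ensemble
    by its training log-likelihood, and average the vectors by the resulting posterior."""
    rng = random.Random(seed)
    k = len(train_probs)
    samples, log_liks = [], []
    for _ in range(n_samples):
        g = [rng.gammavariate(1.0, 1.0) for _ in range(k)]  # Gamma(1,1) draws -> flat Dirichlet
        total = sum(g)
        w = [x / total for x in g]
        ll = 0.0
        for i, y in enumerate(train_labels):
            p = sum(w[j] * train_probs[j][i] for j in range(k))
            p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
            ll += math.log(p) if y == 1 else math.log(1.0 - p)
        samples.append(w)
        log_liks.append(ll)
    post = normalize_log_weights(log_liks)
    # The ensemble prediction is linear in the weights, so averaging the weight vectors
    # gives the same prediction as averaging the sampled combinations' predictions.
    return [sum(post[s] * samples[s][j] for s in range(n_samples)) for j in range(k)]


def combine(probs, weights):
    """Weighted combination of the members' P(class=1) for one new sample."""
    return sum(p * w for p, w in zip(probs, weights))


# Toy example (invented): model 0 fits the training labels better than model 1.
train_probs = [[0.9, 0.8, 0.2, 0.1], [0.5, 0.4, 0.6, 0.5]]
weights = bmc_weights(train_probs, [1, 1, 0, 0])
print(weights, combine([0.7, 0.5], weights))
```

A known property of BMA is that, as data accumulate, its weights tend to concentrate on the single best-scoring model, whereas BMC assigns weight to combinations of models.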


Subject(s)
Machine Learning , Bayes Theorem
4.
Neuroimage ; 178: 183-197, 2018 09.
Article in English | MEDLINE | ID: mdl-29793060

ABSTRACT

Deep neural networks are increasingly being used both in supervised learning for classification tasks and in unsupervised learning to derive complex patterns from the input data. However, the successful implementation of deep neural networks on neuroimaging datasets requires an adequate sample size for training and well-defined, signal-intensity-based structural differentiation. There is a lack of effective automated diagnostic tools for the reliable detection of brain dysmaturation in the neonatal period, owing to small sample sizes and complex, undifferentiated brain structures, despite its importance for both translational research and clinical care. Volumetric information alone is insufficient for diagnosis. In this study, we developed a computational framework for the automated classification of brain dysmaturation from neonatal MRI by combining a specific deep neural network implementation with neonatal structural brain segmentation, as a method for both clinical pattern recognition and data-driven inference into the underlying structural morphology. We implemented three-dimensional convolutional neural networks (3D-CNNs) to specifically classify dysplastic cerebella, a subset of surface-based subcortical brain dysmaturation, in term infants born with congenital heart disease (CHD). We obtained a classification accuracy of 0.985 ± 0.0241 for subtle cerebellar dysplasia in CHD using 10-fold cross-validation. Furthermore, the hidden-layer activations and class activation maps depicted regional vulnerability of the superior surface of the cerebellum (composed mostly of the posterior lobe and the midline vermis) with regard to differentiating the dysplastic process from normal tissue. The posterior lobe and the midline vermis provide regional differentiation that is relevant not only to the clinical diagnosis of cerebellar dysplasia, but also to genetic mechanisms and neurodevelopmental outcome correlates.
These findings not only contribute to the detection and classification of a subset of neonatal brain dysmaturation, but also provide insight into the pathogenesis of cerebellar dysplasia in CHD. In addition, this is one of the first examples of the application of deep learning to a neuroimaging dataset in which the hidden-layer activations revealed diagnostically and biologically relevant features about the clinical pathogenesis. The code developed for this project is open source, published under the BSD License, and designed to be generalizable to applications both within and beyond neonatal brain imaging.


Subject(s)
Cerebellum/diagnostic imaging , Cerebellum/pathology , Image Processing, Computer-Assisted/methods , Magnetic Resonance Imaging/methods , Neural Networks, Computer , Neuroimaging/methods , Pattern Recognition, Automated/methods , Cerebellar Diseases/diagnostic imaging , Cerebellar Diseases/etiology , Deep Learning , Heart Defects, Congenital/complications , Humans , Infant, Newborn
5.
BMC Cancer ; 16: 184, 2016 Mar 04.
Article in English | MEDLINE | ID: mdl-26944944

ABSTRACT

BACKGROUND: Adenocarcinoma (ADC) and squamous cell carcinoma (SCC) are the most prevalent histological types of lung cancer. Distinguishing between these subtypes is critically important because they have different implications for prognosis and treatment. Normally, histopathological analyses are used to distinguish between the two, using tissue obtained from small endoscopic samples or needle aspirations. However, the lack of cell architecture in these small tissue samples hampers the process of distinguishing between the two subtypes. Molecular profiling can also be used to discriminate between the two lung cancer subtypes, provided that the biopsy is composed of at least 50% tumor cells. However, in some cases, a biopsy may consist of a mix of tumor and tumor-adjacent histologically normal (TAHN) tissue. When this happens, a new biopsy is required, with associated costs, risks, and discomfort to the patient. To avoid this problem, we hypothesize that a computational method can distinguish between lung cancer subtypes given tumor and TAHN tissue. METHODS: Using publicly available datasets for gene expression and DNA methylation, we performed four classification tasks, depending on the possible combinations of tumor and TAHN tissue. First, we used a feature selector (ReliefF/Limma) to select relevant variables, which were then used to build a simple naïve Bayes classification model. Then, we evaluated the classification performance of our models by measuring the area under the receiver operating characteristic curve (AUC). Finally, we analyzed the relevance of the selected genes using hierarchical clustering and IPA® software for gene functional analysis. RESULTS: All Bayesian models achieved high classification performance (AUC > 0.94), which was confirmed by hierarchical cluster analysis.
Of the genes selected, 25 (93%) were found to be related to cancer (19 were associated with ADC or SCC), confirming the biological relevance of our method. CONCLUSIONS: The results from this study confirm that computational methods using tumor and TAHN tissue can serve as a prognostic tool for lung cancer subtype classification. Our study complements results from other studies in which TAHN tissue has been used as a prognostic tool for prostate cancer. The clinical implications of this finding could greatly benefit lung cancer patients.
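The classification step above pairs a feature selector with a naïve Bayes model. The abstract does not specify the naïve Bayes variant, so the sketch below assumes Gaussian class-conditional densities on already-selected continuous features (for example, expression or methylation values); feature selection with ReliefF or Limma is not reproduced. Function names, the variance floor, and the toy data are illustrative.

```python
import math
from statistics import mean, pvariance


def fit_gaussian_nb(X, y, var_floor=1e-6):
    """Estimate per-class priors and per-feature means/variances.
    Naive Bayes assumes features are independent given the class."""
    model = {}
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        cols = list(zip(*rows))
        model[c] = (
            len(rows) / len(X),
            [mean(col) for col in cols],
            [max(pvariance(col), var_floor) for col in cols],  # floor avoids zero variance
        )
    return model


def predict_proba(model, x):
    """Posterior P(class | x): log prior plus summed Gaussian log-densities, normalized."""
    log_post = {}
    for c, (prior, mus, variances) in model.items():
        lp = math.log(prior)
        for xi, mu, var in zip(x, mus, variances):
            lp += -0.5 * math.log(2 * math.pi * var) - (xi - mu) ** 2 / (2 * var)
        log_post[c] = lp
    m = max(log_post.values())
    z = sum(math.exp(v - m) for v in log_post.values())
    return {c: math.exp(v - m) / z for c, v in log_post.items()}


# Toy data (invented): two selected features per sample.
X = [[1.0, 5.0], [1.2, 4.8], [0.9, 5.1], [3.0, 1.0], [3.1, 1.2], [2.9, 0.8]]
y = ["ADC", "ADC", "ADC", "SCC", "SCC", "SCC"]
model = fit_gaussian_nb(X, y)
print(predict_proba(model, [1.1, 5.0]))
```

The posterior for one class can then serve as the score passed to an AUC computation.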


Subject(s)
Genomics/methods , Lung Neoplasms/diagnosis , Lung Neoplasms/genetics , Adenocarcinoma/diagnosis , Adenocarcinoma/genetics , Bayes Theorem , Carcinoma, Squamous Cell/diagnosis , Carcinoma, Squamous Cell/genetics , Cluster Analysis , Computational Biology/methods , DNA Methylation , Databases, Nucleic Acid , Datasets as Topic , Gene Expression Profiling/methods , Gene Expression Regulation, Neoplastic , Gene Regulatory Networks , Humans , Prognosis , Reproducibility of Results
6.
BMC Bioinformatics ; 16: 226, 2015 Jul 23.
Article in English | MEDLINE | ID: mdl-26202217

ABSTRACT

BACKGROUND: Most 'transcriptomic' data from microarrays are generated from small sample sizes compared to the large number of measured biomarkers, making it very difficult to build accurate and generalizable disease state classification models. Integrating information from different, but related, 'transcriptomic' data may help build better classification models. However, most proposed methods for integrative analysis of 'transcriptomic' data cannot incorporate domain knowledge, which can improve model performance. To this end, we have developed a methodology that leverages transfer rule learning and functional modules, which we call TRL-FM, to capture and abstract domain knowledge in the form of classification rules to facilitate integrative modeling of multiple gene expression data. TRL-FM is an extension of the transfer rule learner (TRL) that we developed previously. The goal of this study was to test our hypothesis that "an integrative model obtained via the TRL-FM approach outperforms traditional models based on single gene expression data sources". RESULTS: To evaluate the feasibility of the TRL-FM framework, we compared the area under the ROC curve (AUC) of models developed with TRL-FM and other traditional methods, using 21 microarray datasets generated from three studies on brain cancer, prostate cancer, and lung disease, respectively. The results show that TRL-FM statistically significantly outperforms TRL as well as traditional models based on single source data. In addition, TRL-FM performed better than other integrative models driven by meta-analysis and cross-platform data merging. CONCLUSIONS: The capability of utilizing transferred abstract knowledge derived from source data using feature mapping enables the TRL-FM framework to mimic the human process of learning and adaptation when performing related tasks. 
The novel TRL-FM methodology for integrative modeling of multiple 'transcriptomic' datasets is able to intelligently incorporate domain knowledge that traditional methods might disregard, to boost predictive power and generalization performance. In this study, TRL-FM's abstraction of knowledge is achieved in the form of functional modules, but the overall framework is generalizable in that different approaches to acquiring abstract knowledge can be integrated into it.


Subject(s)
Algorithms , Models, Genetic , Biomarkers/metabolism , Databases, Factual , Gene Expression , Humans , Neoplasms/metabolism , Neoplasms/pathology
7.
Biomed Eng Online ; 14 Suppl 2: S7, 2015.
Article in English | MEDLINE | ID: mdl-26329721

ABSTRACT

BACKGROUND: Pediatric cardiomyopathies are a rare yet heterogeneous group of pathologies of the myocardium that are routinely examined clinically using cardiovascular magnetic resonance imaging (cMRI). This powerful, non-invasive, gold-standard tool yields high-resolution temporal images that characterize myocardial tissue. The complexities associated with annotating images and extracting markers necessitate the development of efficient workflows to acquire, manage, and transform these data into actionable knowledge for patient care, to reduce mortality and morbidity. METHODS: We developed and tested a novel informatics framework called cMRI-BED for biomarker extraction and discovery from such complex pediatric cMRI data, which includes a suite of tools for image processing, marker extraction, and predictive modeling. We applied our workflow to obtain and analyze a dataset of 83 de-identified cases and controls containing cMRI-derived biomarkers for classifying positive versus negative findings of cardiomyopathy in children. Bayesian rule learning (BRL) methods were applied to derive understandable models in the form of propositional rules with posterior probabilities pertaining to their validity. Popular machine learning methods in the WEKA data mining toolkit were applied using default parameters to assess cross-validation performance on this dataset, using accuracy and percentage area under the ROC curve (AUC). RESULTS: The best 10-fold cross-validation predictive performance obtained on this cMRI-derived biomarker dataset was 80.72% accuracy and 79.6% AUC, by a BRL decision tree model, which is promising for this type of rare data. Moreover, we were able to verify that myocardial delayed enhancement (MDE) status, which is known to be an important qualitative factor in the classification of cardiomyopathies, is picked up by our rule models as an important variable for prediction.
CONCLUSIONS: Preliminary results show the feasibility of our framework for processing such data while also yielding actionable predictive classification rules that can augment knowledge conveyed in cardiac radiology outcome reports. Interactions between MDE status and other cMRI parameters that are depicted in our rules warrant further investigation and validation. Predictive rules learned from cMRI data to classify positive and negative findings of cardiomyopathy can enhance scientific understanding of the underlying interactions among imaging-derived parameters.


Subject(s)
Cardiomyopathies/diagnosis , Cardiomyopathies/metabolism , Magnetic Resonance Imaging , Medical Informatics/methods , Myocardium/metabolism , Adolescent , Bayes Theorem , Biomarkers/metabolism , Cardiomyopathies/classification , Child , Child, Preschool , Female , Humans , Image Processing, Computer-Assisted , Infant , Infant, Newborn , Male , ROC Curve , Young Adult
8.
Cancer ; 120(24): 3902-13, 2014 Dec 15.
Article in English | MEDLINE | ID: mdl-25100294

ABSTRACT

BACKGROUND: Esophageal adenocarcinoma (EAC) is associated with a dismal prognosis. The identification of cancer biomarkers can advance the possibility for early detection and better monitoring of tumor progression and/or response to therapy. The authors present results from the development of a serum-based, 4-protein (biglycan, myeloperoxidase, annexin-A6, and protein S100-A9) biomarker panel for EAC. METHODS: A vertically integrated, proteomics-based biomarker discovery approach was used to identify candidate serum biomarkers for the detection of EAC. Liquid chromatography-tandem mass spectrometry analysis was performed on formalin-fixed, paraffin-embedded tissue samples that were collected from across the Barrett esophagus (BE)-EAC disease spectrum. The mass spectrometry-based spectral count data were used to guide the selection of candidate serum biomarkers. Then, the serum enzyme-linked immunosorbent assay data were validated in an independent cohort and were used to develop a multiparametric risk-assessment model to predict the presence of disease. RESULTS: With a minimum threshold of 10 spectral counts, 351 proteins were identified as differentially abundant along the spectrum of Barrett esophagus, high-grade dysplasia, and EAC (P<.05). Eleven proteins from this data set were then tested using enzyme-linked immunosorbent assays in serum samples, of which 5 proteins were significantly elevated in abundance among patients who had EAC compared with normal controls, which mirrored trends across the disease spectrum present in the tissue data. By using serum data, a Bayesian rule-learning predictive model with 4 biomarkers was developed to accurately classify disease class; the cross-validation results for the merged data set yielded accuracy of 87% and an area under the receiver operating characteristic curve of 93%. CONCLUSIONS: Serum biomarkers hold significant promise for the early, noninvasive detection of EAC.


Subject(s)
Adenocarcinoma/diagnosis , Annexin A6/blood , Biglycan/blood , Biomarkers, Tumor/blood , Calgranulin B/blood , Early Detection of Cancer/methods , Esophageal Neoplasms/diagnosis , Peroxidase/blood , Adenocarcinoma/blood , Barrett Esophagus/blood , Chromatography, Liquid , Esophageal Neoplasms/blood , Humans , Models, Biological , Tandem Mass Spectrometry
9.
BMC Bioinformatics ; 12: 309, 2011 Jul 28.
Article in English | MEDLINE | ID: mdl-21798039

ABSTRACT

BACKGROUND: Several data mining methods require data that are discrete, and other methods often perform better with discrete data. We introduce an efficient Bayesian discretization (EBD) method for optimal discretization of variables that runs efficiently on high-dimensional biomedical datasets. The EBD method consists of two components, namely, a Bayesian score to evaluate discretizations and a dynamic programming search procedure to efficiently search the space of possible discretizations. We compared the performance of EBD to Fayyad and Irani's (FI) discretization method, which is commonly used for discretization. RESULTS: On 24 biomedical datasets obtained from high-throughput transcriptomic and proteomic studies, the classification performances of the C4.5 classifier and the naïve Bayes classifier were statistically significantly better when the predictor variables were discretized using EBD over FI. EBD was statistically significantly more stable to the variability of the datasets than FI. However, EBD was less robust, though not statistically significantly so, than FI and produced slightly more complex discretizations than FI. CONCLUSIONS: On a range of biomedical datasets, a Bayesian discretization method (EBD) yielded better classification performance and stability but was less robust than the widely used FI discretization method. The EBD discretization method is easy to implement, permits the incorporation of prior knowledge and belief, and is sufficiently fast for application to high-dimensional data.
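The two EBD components described above, a decomposable Bayesian score and a dynamic-programming search, can be sketched as follows. This is a simplified illustration, not the published EBD score: each interval is scored by the Dirichlet-multinomial marginal likelihood of its class labels (symmetric Dirichlet, assumed `alpha=1`), and every cut point adds a fixed log-prior penalty (`log_cut_penalty`, a hypothetical stand-in for EBD's prior). Because the total score is a sum over intervals, the best discretization of the first i sorted values can be built from the best discretization of a shorter prefix, which gives an O(n²) search over all cut-point placements.

```python
import math


def interval_log_ml(counts, alpha=1.0):
    """Log Dirichlet-multinomial marginal likelihood of the class counts in one interval,
    with a symmetric Dirichlet(alpha) prior (an assumed, simplified score)."""
    k = len(counts)
    n = sum(counts)
    score = math.lgamma(alpha * k) - math.lgamma(alpha * k + n)
    for c in counts:
        score += math.lgamma(alpha + c) - math.lgamma(alpha)
    return score


def discretize(values, labels, n_classes, alpha=1.0, log_cut_penalty=-2.0):
    """Return the cut points maximizing the summed interval scores plus a per-cut
    log-prior penalty (hypothetical). Labels must be integers 0..n_classes-1."""
    pairs = sorted(zip(values, labels))
    xs = [v for v, _ in pairs]
    ys = [c for _, c in pairs]
    n = len(xs)
    best = [-math.inf] * (n + 1)  # best[i]: best score for the first i sorted values
    back = [0] * (n + 1)          # back[i]: start of the last interval in that solution
    best[0] = 0.0
    for i in range(1, n + 1):
        if i < n and xs[i - 1] == xs[i]:
            continue  # an interval cannot end between tied values
        counts = [0] * n_classes
        for j in range(i - 1, -1, -1):  # last interval covers sorted positions j..i-1
            counts[ys[j]] += 1          # counts updated incrementally: O(n^2) pairs overall
            if best[j] == -math.inf:
                continue
            s = best[j] + interval_log_ml(counts, alpha) + (log_cut_penalty if j > 0 else 0.0)
            if s > best[i]:
                best[i], back[i] = s, j
    cuts, i = [], n
    while i > 0:  # walk back through the chosen intervals
        j = back[i]
        if j > 0:
            cuts.append((xs[j - 1] + xs[j]) / 2.0)
        i = j
    return sorted(cuts)


print(discretize([1, 2, 3, 4, 10, 11, 12, 13], [0, 0, 0, 0, 1, 1, 1, 1], n_classes=2))  # [7.0]
```

A larger (less negative) `log_cut_penalty` favors more intervals; in EBD, the corresponding prior is where prior knowledge and belief can enter.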


Subject(s)
Bayes Theorem , Gene Expression Profiling/methods , Proteomics/methods , Algorithms , Data Mining
10.
Bioinformatics ; 26(5): 668-75, 2010 Mar 01.
Article in English | MEDLINE | ID: mdl-20080512

ABSTRACT

MOTIVATION: Disease state prediction from biomarker profiling studies is an important problem because more accurate classification models will potentially lead to the discovery of better, more discriminative markers. Data mining methods are routinely applied to such analyses of biomedical datasets generated from high-throughput 'omic' technologies applied to clinical samples from tissues or bodily fluids. Past work has demonstrated that rule models can be successfully applied to this problem, since they can produce understandable models that facilitate review of discriminative biomarkers by biomedical scientists. While many rule-based methods produce rules that make predictions under uncertainty, they typically do not quantify the uncertainty in the validity of the rule itself. This article describes an approach that uses a Bayesian score to evaluate rule models. RESULTS: We have combined the expressiveness of rules with the mathematical rigor of Bayesian networks (BNs) to develop and evaluate a Bayesian rule learning (BRL) system. This system uses a novel variant of the K2 algorithm for building BNs from the training data to provide probabilistic scores for IF-antecedent-THEN-consequent rules using heuristic best-first search. We then apply rule-based inference to evaluate the learned models during two runs of 10-fold cross-validation. The BRL system is evaluated on 24 published 'omic' datasets, and on average it performs on par with or better than other readily available rule learning methods. Moreover, BRL produces models that contain on average 70% fewer variables, which means that the biomarker panels for disease prediction contain fewer markers for further verification and validation by bench scientists.
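The core of the BRL system described above is a K2-style Bayesian score for the parents of the class variable, from which one IF-THEN rule per parent-value combination is read off. The sketch below is a simplified illustration: it uses the standard K2 score (uniform Dirichlet with one pseudo-count per cell) and greedy hill-climbing instead of the paper's best-first search and its novel K2 variant, and it assumes discretized variables encoded as integers. Function names, the `max_parents` limit, and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict
from itertools import product


def k2_log_score(rows, target, parents, arity):
    """Log K2 score of `target` given `parents`: a uniform Dirichlet with one
    pseudo-count per cell. Unobserved parent configurations contribute 0."""
    r = arity[target]
    counts = defaultdict(Counter)
    for row in rows:
        counts[tuple(row[p] for p in parents)][row[target]] += 1
    score = 0.0
    for cfg_counts in counts.values():
        n_ij = sum(cfg_counts.values())
        score += math.lgamma(r) - math.lgamma(r + n_ij)
        score += sum(math.lgamma(1 + c) for c in cfg_counts.values())
    return score


def greedy_parents(rows, target, candidates, arity, max_parents=2):
    """Greedily add the parent that most improves the score (a simplification of
    the best-first search described in the abstract)."""
    parents = []
    best = k2_log_score(rows, target, parents, arity)
    while len(parents) < max_parents:
        scored = [(k2_log_score(rows, target, parents + [c], arity), c)
                  for c in candidates if c not in parents]
        if not scored:
            break
        score, choice = max(scored)
        if score <= best:
            break
        parents.append(choice)
        best = score
    return parents, best


def make_rules(rows, target, parents, arity):
    """One IF-THEN rule per parent configuration, predicting the class with the highest
    posterior-mean probability (count + 1) / (n + r)."""
    rules = []
    r = arity[target]
    for cfg in product(*(range(arity[p]) for p in parents)):
        c = Counter(row[target] for row in rows
                    if tuple(row[p] for p in parents) == cfg)
        n = sum(c.values())
        probs = [(c[v] + 1) / (n + r) for v in range(r)]
        cls = max(range(r), key=probs.__getitem__)
        rules.append((dict(zip(parents, cfg)), cls, probs[cls]))
    return rules


# Toy data (invented): y copies x1; x2 is noise.
rows = [{"x1": a, "x2": b, "y": a} for a in (0, 1) for b in (0, 1)] * 5
arity = {"x1": 2, "x2": 2, "y": 2}
parents, _ = greedy_parents(rows, "y", ["x1", "x2"], arity)
for cond, cls, p in make_rules(rows, "y", parents, arity):
    print(f"IF {cond} THEN y={cls} (P={p:.3f})")
```

Because every rule comes from the same scored structure, its probability is a posterior estimate rather than an ad hoc confidence, which is the property the abstract emphasizes.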


Subject(s)
Bayes Theorem , Data Mining/methods , Proteomics/methods , Biomarkers/analysis , Proteome/metabolism
11.
J Biomed Inform ; 44 Suppl 1: S17-S23, 2011 Dec.
Article in English | MEDLINE | ID: mdl-21571094

ABSTRACT

We present a novel framework for integrative biomarker discovery from related but separate data sets created in biomarker profiling studies. The framework takes prior knowledge in the form of interpretable, modular rules, and uses them during the learning of rules on a new data set. The framework consists of two methods of transfer of knowledge from source to target data: transfer of whole rules and transfer of rule structures. We evaluated the methods on three pairs of data sets: one genomic and two proteomic. We used standard measures of classification performance and three novel measures of amount of transfer. Preliminary evaluation shows that whole-rule transfer improves classification performance over using the target data alone, especially when there is more source data than target data. It also improves performance over using the union of the data sets.


Subject(s)
Biomarkers , Gene Expression Profiling/methods , Algorithms , Artificial Intelligence , Proteomics
12.
Muscle Nerve ; 42(1): 104-11, 2010 Jul.
Article in English | MEDLINE | ID: mdl-20583124

ABSTRACT

Recent studies using mass spectrometry have discovered candidate biomarkers for amyotrophic lateral sclerosis (ALS). However, those studies used small numbers of ALS and control subjects. Additional studies using larger subject cohorts are required to verify these candidate biomarkers. Cerebrospinal fluid (CSF) samples from 100 patients with ALS, 100 disease controls, and 41 healthy control subjects were examined by mass spectrometry. Sixty-one mass spectral peaks exhibited altered levels between ALS and control subjects. Mass peaks for cystatin C and transthyretin were reduced in ALS, whereas mass peaks for posttranslationally modified transthyretin and C-reactive protein (CRP) were increased. As determined by enzyme-linked immunoassay, CRP levels were 5.84 ± 1.01 ng/ml for controls and 11.24 ± 1.52 ng/ml for ALS subjects. This study verified prior mass spectrometry results for cystatin C and transthyretin in ALS. CRP levels were increased in the CSF of ALS patients, and cystatin C levels correlated with survival in patients with limb-onset disease. Our biomarker panel predicted ALS with an overall accuracy of 82%.


Subject(s)
Amyotrophic Lateral Sclerosis/diagnosis , Amyotrophic Lateral Sclerosis/genetics , Proteomics , Adult , Amyotrophic Lateral Sclerosis/cerebrospinal fluid , Biomarkers , C-Reactive Protein/cerebrospinal fluid , Cystatin C/blood , Disease Progression , Enzyme-Linked Immunosorbent Assay , Female , Humans , Indicators and Reagents , Male , Mass Spectrometry , Middle Aged , Prealbumin/analysis , Prealbumin/metabolism , Survival
13.
BMC Bioinformatics ; 10 Suppl 9: S16, 2009 Sep 17.
Article in English | MEDLINE | ID: mdl-19761570

ABSTRACT

BACKGROUND: The incorporation of biological knowledge can enhance the analysis of biomedical data. We present a novel method that uses a proteomic knowledge base to enhance the performance of a rule-learning algorithm in identifying putative biomarkers of disease from high-dimensional proteomic mass spectral data. In particular, we use the Empirical Proteomics Ontology Knowledge Base (EPO-KB) that contains previously identified and validated proteomic biomarkers to select m/zs in a proteomic dataset prior to analysis to increase performance. RESULTS: We show that using EPO-KB as a pre-processing method, specifically selecting all biomarkers found only in the biofluid of the proteomic dataset, reduces the dimensionality by 95% and provides a statistically significantly greater increase in performance over no variable selection and random variable selection. CONCLUSION: Knowledge-based variable selection even with a sparsely-populated resource such as the EPO-KB increases overall performance of rule-learning for disease classification from high-dimensional proteomic mass spectra.
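The knowledge-based pre-processing step above amounts to filtering the dataset's m/z features against known biomarkers for the same biofluid. As a minimal sketch, assuming each knowledge-base entry carries an m/z value and a biofluid label (hypothetical field names, not the actual EPO-KB schema) and that features are matched within a parts-per-million tolerance (an assumed matching rule; the abstract does not state one):

```python
def select_mz_features(dataset_mzs, kb_entries, biofluid, ppm_tol=500.0):
    """Keep the dataset m/z features that lie within ppm_tol parts per million of a
    knowledge-base biomarker recorded for the same biofluid."""
    targets = [e["mz"] for e in kb_entries if e["biofluid"] == biofluid]
    return [mz for mz in dataset_mzs
            if any(abs(mz - t) / t * 1e6 <= ppm_tol for t in targets)]


# Invented entries; the field names are hypothetical, not the real EPO-KB schema.
kb = [{"mz": 11700.0, "biofluid": "serum"}, {"mz": 13880.0, "biofluid": "CSF"}]
print(select_mz_features([11702.0, 12500.0, 13881.0], kb, "serum"))  # keeps 11702.0 only
```

Restricting matches to the dataset's own biofluid mirrors the selection criterion described in the abstract; the retained features would then be passed to the rule learner.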


Subject(s)
Algorithms , Computational Biology/methods , Proteome/analysis , Proteomics/methods , Pattern Recognition, Automated/methods
14.
PLoS One ; 14(8): e0220283, 2019.
Article in English | MEDLINE | ID: mdl-31381589

ABSTRACT

Finding the optimal blood pressure (BP) target and BP treatment after acute ischemic or hemorrhagic stroke is an area of controversy and a significant unmet need in the critical care of stroke patients. Numerous large prospective clinical trials have been conducted to address this question but have generated neutral or conflicting results. One major limitation that may have contributed to so many neutral or conflicting trial results is the "one-size-fits-all" approach to BP targets, whereas the optimal BP target likely varies between individuals. We address this problem with the Acute Intervention Model of Blood Pressure (AIM-BP) framework: an individualized, human-interpretable model of BP and its control in the acute care setting. The framework consists of two components: first, a model of BP homeostasis and the various effects that perturb it; and second, a parameter estimator that can learn clinically important model parameters on a patient-by-patient basis. By estimating the parameters of the AIM-BP model for a given patient, the effectiveness of antihypertensive medication can be quantified separately from the patient's spontaneous BP trends. We hypothesize that AIM-BP is a sufficient framework for estimating the parameters of a homeostasis perturbation model of a stroke patient's BP time course, and that the AIM-BP parameter estimator can do so as accurately and consistently as a state-of-the-art maximum likelihood estimation method. We demonstrate that this is the case in a proof of concept of the AIM-BP framework, using simulated clinical scenarios modeled on stroke patients from real-world intensive care datasets.
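As an illustration of separating a drug effect from spontaneous BP trends, consider a hypothetical first-order homeostasis model (not the AIM-BP equations, which the abstract does not give): at each step, BP relaxes toward a set point at rate `k` and is lowered by `effect` per unit dose. Because the per-step change is linear in the unknowns, the parameters can be recovered by ordinary least squares, which coincides with maximum likelihood estimation if the per-step changes carry i.i.d. Gaussian noise. The model form, names, and simulated values are assumptions for illustration.

```python
def simulate(bp0, setpoint, k, effect, doses):
    """Hypothetical model: bp[t+1] = bp[t] + k * (setpoint - bp[t]) - effect * dose[t]."""
    bp = [bp0]
    for d in doses:
        bp.append(bp[-1] + k * (setpoint - bp[-1]) - effect * d)
    return bp


def solve3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, 3):
            f = M[r][col] / M[col][col]
            for c in range(col, 4):
                M[r][c] -= f * M[col][c]
    x = [0.0] * 3
    for r in range(2, -1, -1):
        x[r] = (M[r][3] - sum(M[r][c] * x[c] for c in range(r + 1, 3))) / M[r][r]
    return x


def estimate(bp, doses):
    """Least-squares fit of delta_bp = a + b * bp + c * dose via the normal equations,
    then map back: k = -b, setpoint = a / k, effect = -c."""
    rows = [(1.0, bp[t], doses[t]) for t in range(len(doses))]
    ys = [bp[t + 1] - bp[t] for t in range(len(doses))]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    rhs = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    a, b, c = solve3(A, rhs)
    k = -b
    return a / k, k, -c


# Simulated (noise-free) course: start at 180 mmHg, set point 140, k = 0.2, 2 mmHg per dose unit.
doses = [0, 0, 5, 0, 0, 0, 5, 0, 0, 0, 0, 3, 0, 0, 0]
bp = simulate(180.0, 140.0, 0.2, 2.0, doses)
print(estimate(bp, doses))
```

With noise-free simulated data, the fit recovers the generating parameters up to floating-point error; with real measurements, the same fit yields per-patient estimates of spontaneous drift and drug effect.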


Subject(s)
Blood Pressure , Critical Care/methods , Intracranial Hemorrhages/complications , Precision Medicine/methods , Stroke/complications , Stroke/physiopathology , Aged , Humans , Linear Models , Male , Stroke/therapy
15.
World J Clin Oncol ; 9(5): 98-109, 2018 Sep 14.
Article in English | MEDLINE | ID: mdl-30254965

ABSTRACT

AIM: To develop a framework to incorporate background domain knowledge into classification rule learning for knowledge discovery in biomedicine. METHODS: Bayesian rule learning (BRL) is a rule-based classifier that uses a greedy best-first search over a space of Bayesian belief networks (BNs) to find the optimal BN to explain the input dataset, and then infers classification rules from this BN. BRL uses a Bayesian score to evaluate the quality of BNs. In this paper, we extended the Bayesian score to include informative structure priors, which encode our prior domain knowledge about the dataset. We call this extension of BRL "BRLp". The structure prior has a hyperparameter λ that allows the user to tune the degree to which prior knowledge is incorporated in the model learning process. We studied the effect of λ on model learning using a simulated dataset and a real-world lung cancer prognostic biomarker dataset, by measuring the degree of incorporation of our specified prior knowledge. We also monitored its effect on the model's predictive performance. Finally, we compared BRLp to other state-of-the-art classifiers commonly used in biomedicine. RESULTS: We evaluated the degree of incorporation of prior knowledge into BRLp with simulated data by measuring the graph edit distance between the true data-generating model and the model learned by BRLp. We specified the true model using informative structure priors. We observed that by increasing the value of λ we were able to increase the influence of the specified structure priors on model learning. A large value of λ caused BRLp to return the true model. This also led to a gain in predictive performance, measured by the area under the receiver operating characteristic curve (AUC). We then obtained a publicly available real-world lung cancer prognostic biomarker dataset and specified a known biomarker from the literature [the epidermal growth factor receptor (EGFR) gene].
We again observed that larger values of λ led to an increased incorporation of EGFR into the final BRLp model. This relevant background knowledge also led to a gain in AUC. CONCLUSION: BRLp enables tunable structure priors to be incorporated during Bayesian classification rule learning that integrates data and knowledge as demonstrated using lung cancer biomarker data.
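
The abstract does not give the exact form of the BRLp structure prior. As a rough illustration of how a λ-weighted prior can shift model selection, here is a minimal sketch that assumes a simple prior: subtract λ for every edge that disagrees with the user-specified structure, with disagreement measured by a simplified graph edit distance (edge insertions plus deletions on a fixed node set). The candidate structures, their data scores, and the node "GeneA" are hypothetical; only EGFR comes from the abstract.

```python
# Hypothetical sketch: an edge-disagreement structure prior weighted by lam.
# This is NOT the BRLp score from the paper; names and numbers are made up.

def edge_distance(edges_a, edges_b):
    """Simplified graph edit distance on a fixed node set:
    edge insertions plus deletions (size of the symmetric difference)."""
    return len(set(edges_a) ^ set(edges_b))


def penalized_score(data_log_score, edges, prior_edges, lam):
    """Assumed log-posterior: data score plus a log structure prior that
    subtracts lam for every edge that disagrees with the prior structure."""
    return data_log_score - lam * edge_distance(edges, prior_edges)


def best_structure(candidates, prior_edges, lam):
    """Return the name of the (name, data_log_score, edges) candidate
    with the highest penalized score."""
    return max(candidates,
               key=lambda c: penalized_score(c[1], c[2], prior_edges, lam))[0]


prior = {("EGFR", "Outcome")}                        # prior knowledge: EGFR -> Outcome
candidates = [
    ("data-only", -100.0, {("GeneA", "Outcome")}),   # hypothetical: fits data better
    ("with-EGFR", -103.0, {("EGFR", "Outcome")}),    # hypothetical: matches the prior
]
for lam in (0.0, 1.0, 2.0):
    print(lam, best_structure(candidates, prior, lam))
```

With these made-up scores, the data-only structure wins at λ = 0 and λ = 1, while at λ = 2 the prior-consistent structure wins, mirroring the qualitative trend the abstract reports.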

16.
Data (Basel) ; 2(1)2017 Mar.
Article in English | MEDLINE | ID: mdl-28243594

ABSTRACT

Many clinical research datasets have a large percentage of missing values, which directly impacts their usefulness for training high-accuracy classifiers in supervised machine learning. While missing value imputation methods have been shown to work well with smaller percentages of missing values, their ability to impute sparse clinical research data can be problem-specific. We previously attempted to learn quantitative guidelines for ordering cardiac magnetic resonance imaging during the evaluation for pediatric cardiomyopathy, but missing data significantly reduced our usable sample size. In this work, we sought to determine whether increasing the usable sample size through imputation would allow us to learn better guidelines. We first review several machine learning methods for estimating missing data. We then apply four popular methods (mean imputation, decision tree, k-nearest neighbors, and self-organizing maps) to a clinical research dataset of pediatric patients undergoing evaluation for cardiomyopathy. Using Bayesian Rule Learning (BRL) to learn ruleset models, we compared the performance of imputation-augmented models with that of unaugmented models. We found that all four imputation-augmented models performed similarly to unaugmented models. While imputation did not improve performance, it did provide evidence for the robustness of our learned models.
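
Two of the four methods named above, mean imputation and k-nearest-neighbor imputation, are simple enough to sketch directly. The following is a minimal illustration, not the authors' implementation: missing values are represented as None, and the k-NN distance (root-mean-square difference over columns both rows observe) is our own assumption.

```python
import math


def mean_impute(rows):
    """Replace each missing value (None) with the mean of its column's observed values."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed) if observed else None)
    return [[means[j] if r[j] is None else r[j] for j in range(n_cols)] for r in rows]


def knn_impute(rows, k=2):
    """Fill each missing value with the mean of that column over the k nearest
    rows that observe it. Distance is RMS difference over co-observed columns."""
    def dist(a, b):
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            donors = sorted(
                (dist(row, other), other[j])
                for t, other in enumerate(rows)
                if t != i and other[j] is not None
            )
            nearest = [v for _, v in donors[:k]]
            filled[i][j] = sum(nearest) / len(nearest) if nearest else None
    return filled


rows = [[1.0, 2.0], [2.0, None], [3.0, 6.0], [10.0, 20.0]]
print(mean_impute(rows)[1][1])      # column mean of 2, 6 and 20
print(knn_impute(rows, k=2)[1][1])  # mean of the two nearest donors (2 and 6)
```

The toy data show why the choice of method matters for sparse, skewed data: the outlying row pulls the column mean up, while k-NN imputation draws only on the most similar rows.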

17.
Data (Basel) ; 2(1)2017 Mar.
Article in English | MEDLINE | ID: mdl-28331847

ABSTRACT

The comprehensibility of good predictive models learned from high-dimensional gene expression data is attractive because it can lead to biomarker discovery. Several good classifiers provide comparable predictive performance but differ in their abilities to summarize the observed data. We extend a Bayesian Rule Learning (BRL-GSS) algorithm, previously shown to be a significantly better predictor than other classical approaches in this domain. It searches a space of Bayesian networks using a decision tree representation of its parameters with global constraints, and infers a set of IF-THEN rules. The number of parameters, and therefore the number of rules, grows combinatorially with the number of predictor variables in the model. We relax these global constraints to a more generalizable local structure (BRL-LSS). BRL-LSS yields a more parsimonious set of rules because it does not have to generate all combinatorial rules. The search space of local structures is much richer than the space of global structures. We design BRL-LSS with the same worst-case time complexity as BRL-GSS while exploring a richer and more complex model space. We measure predictive performance using the area under the ROC curve (AUC) and accuracy. We measure model parsimony by the average number of rules and variables needed to describe the observed data. We evaluate the predictive and parsimony performance of BRL-GSS, BRL-LSS, and the state-of-the-art C4.5 decision tree algorithm with 10-fold cross-validation on ten microarray gene-expression diagnostic datasets. In these experiments, we observe that BRL-LSS is similar to BRL-GSS in terms of predictive performance, while generating a much more parsimonious set of rules to explain the same observed data. BRL-LSS also needs fewer variables than C4.5 to explain the data with similar predictive performance. 
We also conduct a feasibility study to demonstrate the general applicability of our BRL methods to newer RNA sequencing gene-expression data.
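
The parsimony argument above (global structures need one rule per joint parent configuration, local structures only one per decision-tree leaf) can be illustrated with a small sketch. This is not the BRL-GSS or BRL-LSS code: it only counts and enumerates rules for a fixed, hand-written tree. The gene names G1 to G3 and the labels "Normal" and "Tumor" are hypothetical.

```python
from itertools import product


def global_rules(domains):
    """Global structure: one IF-THEN rule per joint assignment of all parent
    variables, so the rule count is the product of their arities."""
    names = list(domains)
    return [dict(zip(names, combo)) for combo in product(*(domains[n] for n in names))]


def local_rules(tree, path=()):
    """Local structure: one IF-THEN rule per root-to-leaf path of a decision tree.
    A tree is either a class label (str) or a pair (variable, {value: subtree})."""
    if isinstance(tree, str):
        return [(dict(path), tree)]
    variable, branches = tree
    rules = []
    for value, subtree in branches.items():
        rules.extend(local_rules(subtree, path + ((variable, value),)))
    return rules


# Hypothetical discretized genes and class labels.
domains = {"G1": ["low", "high"], "G2": ["low", "high"], "G3": ["low", "high"]}
tree = ("G1", {
    "low": "Normal",
    "high": ("G2", {
        "low": "Normal",
        "high": ("G3", {"low": "Normal", "high": "Tumor"}),
    }),
})

print(len(global_rules(domains)))  # 2 * 2 * 2 = 8 rule antecedents
for conditions, label in local_rules(tree):
    print("IF", conditions, "THEN", label)  # 4 rules
```

Over the same three binary variables, the global representation needs 8 rules while this local tree needs only 4, because branches that already determine the class (for example G1 = low) are not split further.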

18.
J Comput Biol ; 13(2): 394-406, 2006 Mar.
Article in English | MEDLINE | ID: mdl-16597248

ABSTRACT

Protein fold recognition is an important step towards understanding protein three-dimensional structures and their functions. A conditional graphical model, namely segmentation conditional random fields (SCRFs), is proposed as an effective solution to this problem. In contrast to traditional graphical models such as the hidden Markov model (HMM), SCRFs follow a discriminative approach. The model is therefore flexible enough to include any features, such as overlapping or long-range interaction features over the whole sequence. The model also employs a convex objective function, which yields globally optimal solutions for the model parameters. In addition, the segmentation setting in SCRFs makes their graphical structures intuitively similar to protein 3-D structures and, more importantly, provides a framework to model the long-range interactions between secondary structures directly. Our model is applied to predict the parallel beta-helix fold, an important fold in bacterial pathogenesis and carbohydrate binding/cleavage. Cross-family validation shows that SCRFs not only score all known beta-helices higher than non-beta-helices in the Protein Data Bank (PDB), but also accurately locate rungs in known beta-helix proteins. Our method outperforms BetaWrap, a state-of-the-art algorithm for predicting beta-helix folds, and HMMER, a general motif detection algorithm based on HMMs, and has the additional advantage of general applicability to other protein folds. Applying our prediction model to the UniProt database, we identify previously unknown potential beta-helices.
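
The "segmentation setting" mentioned above means the model scores whole labelled segments rather than single residues. As a simplified sketch of that idea only, here is generic semi-Markov (segment-level) Viterbi decoding with a hand-written scoring function. It does not include the long-range inter-segment interactions or the learned feature weights of the paper's SCRF; the toy sequence, labels, and scorer are hypothetical.

```python
def segment_viterbi(n, labels, seg_score, max_len):
    """Semi-Markov Viterbi decoding: the highest-scoring split of positions
    0..n-1 into labelled segments of length <= max_len.
    seg_score(start, end, prev_label, label) scores segment [start, end)."""
    # best[end][label] = (score, start, prev_label) for the best segmentation of
    # [0, end) whose last segment is [start, end) with that label.
    best = [dict() for _ in range(n + 1)]
    best[0][None] = (0.0, None, None)
    for end in range(1, n + 1):
        for label in labels:
            options = [
                (prev_score + seg_score(start, end, prev, label), start, prev)
                for start in range(max(0, end - max_len), end)
                for prev, (prev_score, _, _) in best[start].items()
            ]
            if options:
                best[end][label] = max(options, key=lambda o: o[0])
    # Trace back from the best-scoring final label.
    label = max(best[n], key=lambda lab: best[n][lab][0])
    total = best[n][label][0]
    segments, end = [], n
    while end > 0:
        _, start, prev = best[end][label]
        segments.append((start, end, label))
        end, label = start, prev
    return total, segments[::-1]


# Toy scorer (hypothetical): +1 per residue matching the label's symbol,
# -1 per mismatch, and a -0.5 penalty per segment. prev_label is unused here.
seq = "HHHCCHH"


def toy_score(start, end, prev_label, label):
    return sum(1 if seq[i] == label else -1 for i in range(start, end)) - 0.5


print(segment_viterbi(len(seq), ["H", "C"], toy_score, max_len=4))
```

Because the score is defined per segment, a feature can look at an entire candidate segment at once, which is what lets segment-level models express structure-like units that per-residue HMM states cannot.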


Subject(s)
Databases, Protein , Protein Folding , Proteins/chemistry , Proteins/genetics , Sequence Alignment/statistics & numerical data , Algorithms , Animals , Chickens , Computational Biology , Humans , Mice , Models, Molecular , Protein Structure, Secondary , Rats , Software
19.
Data (Basel) ; 1(3)2016 Dec.
Article in English | MEDLINE | ID: mdl-28239609

ABSTRACT

Human microbiome data from genomic sequencing technologies are accumulating rapidly, giving us insights into bacterial taxa that contribute to health and disease. Predictive modeling of such microbiota count data to classify human infection by parasitic worms, such as helminths, can help in detection and management across global populations. Real-world microbiome datasets are typically sparse, containing hundreds of measurements for bacterial species, of which only a few are detected in the analyzed bio-specimens. This sparsity means that more observations are needed for accurate predictive modeling, a challenge that has previously been addressed with various feature-reduction methods. To our knowledge, integrative methods such as transfer learning have not yet been explored in the microbiome domain as a way to deal with data sparsity by incorporating knowledge from different but related datasets. One way of incorporating this knowledge is to use a meaningful mapping among the features of these datasets. In this paper, we claim that such a mapping exists among the members of each individual cluster, where clusters are grouped based on phylogenetic dependency among taxa and their association with the phenotype. We validate our claim by showing that models incorporating associations in such a grouped feature space show no performance deterioration on the given classification task. We test this hypothesis using classification models that detect helminth infection in the microbiota of human fecal samples obtained from Indonesia and Liberia. In our experiments, we first learn binary classifiers for helminth infection detection using Naive Bayes, Support Vector Machines, Multilayer Perceptrons, and Random Forest methods. 
Next, we add taxonomic modeling by using the SMART-scan module to group the data, and learn classifiers using the same four methods to test the validity of the resulting groupings. We observed performance improvements of 6% to 23% in the Area Under the receiver operating characteristic (ROC) Curve (AUC) and 7% to 26% in Balanced Accuracy (Bacc), over 10 runs of 10-fold cross-validation. These results show that grouping our microbiota data by phylogenetic dependency yields a noticeable improvement in classification performance for helminth infection detection. These promising results from this feasibility study demonstrate that methods such as SMART-scan can be used in the future for knowledge transfer from different but related microbiome datasets through phylogenetically related functional mapping, enabling novel integrative biomarker discovery.
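
The two evaluation measures reported above can be computed from scratch as follows. This is a minimal sketch with made-up labels and scores (1 = infected, 0 = uninfected) and an arbitrary 0.45 decision threshold; it does not reproduce the study's repeated 10-fold cross-validation, over which such values would be averaged.

```python
def auc(labels, scores):
    """ROC AUC as the probability that a random positive scores higher than a
    random negative (ties count 1/2); labels are 1 (positive) or 0 (negative)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def balanced_accuracy(labels, predictions):
    """Mean of sensitivity (recall on class 1) and specificity (recall on class 0)."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 0)
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    return 0.5 * (tp / n_pos + tn / n_neg)


# Made-up example: 1 = infected, 0 = uninfected.
y_true = [1, 1, 0, 0, 0]
y_score = [0.9, 0.4, 0.5, 0.2, 0.1]
y_pred = [1 if s >= 0.45 else 0 for s in y_score]
print(auc(y_true, y_score), balanced_accuracy(y_true, y_pred))
```

Balanced accuracy averages per-class recall, so unlike plain accuracy it is not inflated when one class (for example, uninfected samples) dominates the dataset.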

20.
Article in English | MEDLINE | ID: mdl-26306226

ABSTRACT

In this era of precision medicine, understanding the epigenetic differences between lung cancer subtypes could lead to personalized therapies that might reverse these alterations. Traditional methods for analyzing microarray data rely on known pathways. We propose a novel workflow, the Junction trees to Knowledge (J2K) framework, for creating interpretable graphical representations derived directly from in silico analysis of microarray data. Our workflow has three steps: preprocessing (discretization and feature selection), construction of a Bayesian network, and transformation of that network into a junction tree. We used data from The Cancer Genome Atlas to perform preliminary analyses of this J2K framework. We found relevant cliques of methylated sites that are junctions of the network, along with potential methylation biomarkers in lung cancer pathogenesis.
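
The abstract does not describe how J2K converts the Bayesian network into a junction tree, so the following is only a sketch of the standard textbook construction: moralize the network, triangulate it by eliminating nodes in a given order, collect the maximal cliques, and connect them with a maximum-weight spanning tree on separator size (Kruskal's algorithm). The elimination order and the node names M1, M2, Gene, and Subtype are hypothetical, and the preprocessing step is not shown.

```python
from itertools import combinations


def moralize(parents):
    """Moral graph: drop edge directions and connect ('marry') every pair of co-parents."""
    adj = {v: set() for v in parents}
    for child, ps in parents.items():
        for p in ps:
            adj.setdefault(p, set())
            adj[child].add(p)
            adj[p].add(child)
        for a, b in combinations(ps, 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def elimination_cliques(adj, order):
    """Triangulate by eliminating nodes in `order` (must list every node),
    returning the maximal cliques formed along the way."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}
    cliques = []
    for v in order:
        clique = frozenset(adj[v] | {v})
        for a, b in combinations(adj[v], 2):  # fill-in edges among v's neighbours
            adj[a].add(b)
            adj[b].add(a)
        for nbr in adj[v]:
            adj[nbr].discard(v)
        del adj[v]
        if not any(clique <= c for c in cliques):
            cliques.append(clique)
    return cliques


def junction_tree(cliques):
    """Join cliques with a maximum-weight spanning tree on separator size (Kruskal)."""
    root = list(range(len(cliques)))

    def find(i):
        while root[i] != i:
            i = root[i]
        return i

    pairs = sorted(
        ((len(cliques[i] & cliques[j]), i, j) for i, j in combinations(range(len(cliques)), 2)),
        reverse=True,
    )
    tree = []
    for weight, i, j in pairs:
        if weight > 0 and find(i) != find(j):
            root[find(i)] = find(j)
            tree.append((i, j, cliques[i] & cliques[j]))
    return tree


# Hypothetical network: two methylation sites regulate a gene that affects subtype.
parents = {"M1": [], "M2": [], "Gene": ["M1", "M2"], "Subtype": ["Gene"]}
cliques = elimination_cliques(moralize(parents), ["Subtype", "M1", "M2", "Gene"])
print(cliques)
print(junction_tree(cliques))
```

In this toy network the two methylation sites end up in one clique with the gene they share, and the gene becomes the separator joining that clique to the subtype clique, the kind of "junction" the abstract refers to.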
