Results 1 - 20 of 25
1.
Biomedicines ; 11(11)2023 Nov 10.
Article in English | MEDLINE | ID: mdl-38002015

ABSTRACT

Endometriosis is defined as the presence of estrogen-dependent endometrial-like tissue outside the uterine cavity. Despite extensive research, endometriosis is still an enigmatic disease and is challenging to diagnose and treat. A common clinical finding is the association of endometriosis with multiple diseases. We use a total of 627,566 clinically collected records from cases of endometriosis (0.82%) and controls (99.18%) to construct and evaluate predictive models. We develop a machine learning platform to construct diagnostic tools for endometriosis. The platform consists of logistic regression, decision tree, random forest, AdaBoost, and XGBoost for prediction, and uses SHapley Additive exPlanations (SHAP) values to quantify the importance of features. In the model selection phase, the constructed XGBoost model performs better than the other algorithms, achieving an area under the curve (AUC) of 0.725 on the test set during the evaluation phase, with a specificity of 62.9% and a sensitivity of 68.6%. The model yields a low positive predictive value of 1.5% but a satisfactory negative predictive value of 99.58%. Moreover, the feature importance analysis points to age, infertility, uterine fibroids, anxiety, and allergic rhinitis as the top five most important features for predicting endometriosis. Although these results show the feasibility of using machine learning to improve the diagnosis of endometriosis, more research is required to improve the performance of predictive models for this condition. This state of affairs is attributable in part to the complex nature of the condition and in part to the administrative nature of our features. Should more informative features become available, a higher AUC could likely be achieved. As a result, we view the constructed predictive model merely as a tool to provide auxiliary information in clinical practice.
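The pipeline described above (a gradient-boosted tree classifier on a highly imbalanced cohort, evaluated by AUC, with tree-based feature attributions) can be sketched on synthetic data. This is an illustrative sketch only: scikit-learn's GradientBoostingClassifier stands in for XGBoost, the class imbalance roughly mimics the ~1% case rate, and all features are synthetic.

```python
# Sketch of the modeling pipeline on synthetic data; GradientBoostingClassifier
# stands in for XGBoost, and all features are hypothetical.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Highly imbalanced synthetic data mimicking a rare-condition cohort (~1% positives).
X, y = make_classification(n_samples=20000, n_features=10, weights=[0.99], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

model = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
print(f"test AUC: {auc:.3f}")

# In the real pipeline, per-feature attributions would come from
# shap.TreeExplainer(model); here the built-in impurity importances are a proxy.
top = np.argsort(model.feature_importances_)[::-1][:5]
print("top-5 features:", top.tolist())
```

With the actual study's tooling, `shap.TreeExplainer(model).shap_values(X_te)` would replace the impurity-based proxy used above.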

2.
HIV AIDS (Auckl) ; 15: 387-397, 2023.
Article in English | MEDLINE | ID: mdl-37426767

ABSTRACT

Background: HIV is a growing public health burden that threatens thousands of people in Kazakhstan. Countries around the world, including Kazakhstan, face significant problems in predicting HIV infection prevalence. It is crucial to understand the epidemiological trends of infectious diseases and to monitor the prevalence of HIV over the long term. Thus, in this study, we aimed to forecast the prevalence of HIV in Kazakhstan over the 10 years from 2020 to 2030 using mathematical modeling and time series analysis. Methods: We use statistical Autoregressive Integrated Moving Average (ARIMA) models and a nonlinear epidemic Susceptible-Infected (SI) model to forecast the HIV infection prevalence rate in Kazakhstan. We estimated the parameters of the models using open data on the prevalence of HIV infection among women and men (aged 15-49 years) in Kazakhstan provided by the Kazakhstan Bureau of National Statistics. We also predict the effect of pre-exposure prophylaxis (PrEP) control measures on the prevalence rate. Results: The ARIMA (1,2,0) model suggests that the prevalence of HIV infection in Kazakhstan will increase from 0.29 in 2021 to 0.47 by 2030. The SI model, on the other hand, suggests that this parameter will increase to 0.60 by 2030 based on the same data. Both models were supported by the corrected Akaike Information Criterion (AICc) and by goodness-of-fit measures. Simulating PrEP-based HIV prevention within the SI model showed a significant reduction in the HIV prevalence rate. Conclusion: This study revealed that ARIMA (1,2,0) predicts a linear increasing trend, while the SI model forecasts a nonlinear increase with a higher prevalence of HIV. Therefore, healthcare providers and policymakers are encouraged to use these models to calculate the cost required for the regional allocation of healthcare resources. Moreover, these models can be used for planning effective healthcare treatments.

3.
Sci Rep ; 13(1): 8412, 2023 05 24.
Article in English | MEDLINE | ID: mdl-37225754

ABSTRACT

Diabetes mellitus (DM) affects the quality of life and leads to disability, high morbidity, and premature mortality. DM is a risk factor for cardiovascular, neurological, and renal diseases, and places a major burden on healthcare systems globally. Predicting the one-year mortality of patients with DM can considerably help clinicians tailor treatments to patients at risk. In this study, we aimed to show the feasibility of predicting the one-year mortality of DM patients based on administrative health data. We use clinical data for 472,950 patients who were admitted to hospitals across Kazakhstan between mid-2014 and December 2019 and were diagnosed with DM. The data were divided into four year-specific cohorts (2016-, 2017-, 2018-, and 2019-cohorts) to predict mortality within a specific year based on clinical and demographic information collected up to the end of the preceding year. We then develop a comprehensive machine learning platform to construct a predictive model of one-year mortality for each year-specific cohort. In particular, the study implements and compares the performance of nine classification rules for predicting the one-year mortality of DM patients. The results show that gradient-boosting ensemble learning methods perform better than the other algorithms across all year-specific cohorts, achieving an area under the curve (AUC) between 0.78 and 0.80 on independent test sets. The feature importance analysis conducted by calculating SHAP (SHapley Additive exPlanations) values shows that age, duration of diabetes, hypertension, and sex are the top four most important features for predicting one-year mortality. In conclusion, the results show that it is possible to use machine learning to build accurate predictive models of one-year mortality for DM patients based on administrative health data. In the future, integrating this information with laboratory data or patients' medical history could further improve the performance of the predictive models.
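The year-specific cohort construction (features up to the end of year Y-1 predicting death during year Y) can be sketched roughly as follows; the table and column names are hypothetical, not the study's actual schema.

```python
# Sketch of year-specific cohort construction with a tiny hypothetical table.
import pandas as pd

records = pd.DataFrame({
    "patient_id": [1, 2, 3, 4],
    "dm_diagnosis_year": [2015, 2016, 2015, 2017],
    "death_year": [2018, None, None, 2019],
    "age": [70, 55, 63, 80],
})

def build_cohort(df, year):
    # Patients diagnosed before the target year and still alive at its start.
    alive = df["death_year"].isna() | (df["death_year"] >= year)
    cohort = df[(df["dm_diagnosis_year"] < year) & alive].copy()
    # Label: death during the target year; feature: diabetes duration so far.
    cohort["died_within_year"] = (cohort["death_year"] == year).astype(int)
    cohort["diabetes_duration"] = year - cohort["dm_diagnosis_year"]
    return cohort

cohort_2018 = build_cohort(records, 2018)
print(cohort_2018[["patient_id", "died_within_year", "diabetes_duration"]])
```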


Subject(s)
Diabetes Mellitus, Quality of Life, Humans, Kazakhstan/epidemiology, Diabetes Mellitus/epidemiology, Premature Mortality, Machine Learning
4.
Annu Int Conf IEEE Eng Med Biol Soc ; 2021: 319-324, 2021 11.
Article in English | MEDLINE | ID: mdl-34891300

ABSTRACT

Deep learning methods, and in particular Convolutional Neural Networks (CNNs), have shown breakthrough performance in a wide variety of classification applications, including electroencephalogram-based Brain Computer Interfaces (BCIs). Despite the advances in the field, BCIs are still far from subject-independent decoding of brain activities, primarily due to substantial inter-subject variability. In this study, we examine the potential application of an ensemble CNN classifier to integrate the capabilities of CNN architectures and ensemble learning for decoding EEG signals collected in motor imagery experiments. The results demonstrate the superiority of the proposed ensemble CNN over the average of the base CNN classifiers, with an improvement of up to 9% in classification accuracy depending on the test subject. The results also show improvement with respect to the performance of a number of state-of-the-art methods that have been previously used for subject-independent classification on the same datasets used here (i.e., the BCI Competition IV 2A and 2B datasets).
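The aggregation step of such an ensemble is straightforward soft voting over the base classifiers' probability outputs. In this sketch the base CNN outputs are simulated random arrays; in the real pipeline each slice would be a trained CNN's softmax output per trial.

```python
# Soft-voting aggregation of base-classifier probabilities (simulated here).
import numpy as np

rng = np.random.default_rng(0)
n_models, n_trials, n_classes = 5, 100, 2
# Simulated softmax outputs of 5 base CNNs on 100 motor-imagery trials.
base_probs = rng.dirichlet(np.ones(n_classes), size=(n_models, n_trials))

ensemble_probs = base_probs.mean(axis=0)     # soft voting across models
predictions = ensemble_probs.argmax(axis=1)  # final class per trial
print(predictions[:10])
```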


Subject(s)
Brain-Computer Interfaces, Imagination, Algorithms, Electroencephalography, Neural Networks (Computer)
5.
Annu Int Conf IEEE Eng Med Biol Soc ; 2021: 910-914, 2021 11.
Article in English | MEDLINE | ID: mdl-34891438

ABSTRACT

Common Spatial Pattern (CSP) is a popular feature extraction algorithm used for electroencephalogram (EEG) data classification in brain-computer interfaces. One of the critical operations used in CSP is taking the average of trial covariance matrices for each class. In this regard, the arithmetic mean, which minimizes the sum of squared Euclidean distances to the data points, is conventionally used; however, this operation ignores the Riemannian geometry of the manifold of covariance matrices. To alleviate this problem, Fréchet means determined using different Riemannian distances have been used. In this paper, we are primarily concerned with the following question: Does using the Fréchet mean with Riemannian distances instead of the arithmetic mean for averaging CSP covariance matrices improve the subject-independent classification of motor imagery (MI)? To answer this question, we conduct a comparative study using the largest MI dataset to date, with 54 subjects and a total of 21,600 trials of left- and right-hand MI. The results indicate a general trend of statistically significantly better performance when the Riemannian geometry is used.
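The contrast between the two averaging operations can be illustrated numerically. The log-Euclidean Fréchet mean below is one Riemannian alternative to the arithmetic mean; the covariance matrices are synthetic, and the paper's study considers other Riemannian distances as well.

```python
# Arithmetic mean vs. log-Euclidean Fréchet mean of covariance matrices.
import numpy as np
from scipy.linalg import expm, logm

rng = np.random.default_rng(0)
covs = []
for _ in range(10):
    A = rng.normal(size=(4, 4))
    covs.append(A @ A.T + 4 * np.eye(4))  # symmetric positive definite (SPD)

arithmetic_mean = np.mean(covs, axis=0)
# Log-Euclidean Fréchet mean: average in the matrix-log domain, then map back.
log_euclidean_mean = expm(np.mean([logm(C) for C in covs], axis=0))

print(np.linalg.eigvalsh(log_euclidean_mean).min() > 0)  # result is still SPD
```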


Subject(s)
Brain-Computer Interfaces, Algorithms, Electroencephalography, Hand, Humans, Imagery (Psychotherapy)
6.
IEEE J Biomed Health Inform ; 23(5): 2009-2020, 2019 09.
Article in English | MEDLINE | ID: mdl-30668507

ABSTRACT

Constructing accurate predictive models is at the heart of brain-computer interfaces (BCIs) because these models can ultimately translate brain activities into communication and control commands. The majority of previous work in BCI uses spatial, temporal, or spatiotemporal features of event-related potentials (ERPs). In this study, we examined the discriminatory effect of spatiospectral features of ERPs to capture the most relevant set of neural activities from electroencephalographic recordings that represent users' mental intent. In this regard, we model ERP waveforms using a sum of sinusoids with unknown amplitudes, frequencies, and phases. The effect of this signal modeling step is to represent high-dimensional ERP waveforms in a substantially lower-dimensional space, which includes their dominant power spectral contents. We found that the most discriminative frequencies for accurate decoding of visual attention modulated ERPs lie in a spectral range below 6.4 Hz. This was empirically verified by treating the dominant frequency contents of ERP waveforms as feature vectors in the state-of-the-art machine learning techniques used herein. The constructed predictive models achieved remarkable performance, which for some subjects was as high as 94% as measured by the area under the curve. Using these spectral contents, we further studied the discriminatory effect of each channel and proposed an efficient strategy to choose subject-specific subsets of channels that generally led to classifiers with comparable performance.
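The core idea of representing an ERP waveform by its dominant low-frequency spectral content can be sketched with a plain FFT. The waveform and sampling rate below are synthetic stand-ins; only the 6.4 Hz cutoff comes from the abstract.

```python
# Represent a waveform by its low-frequency spectral content (< 6.4 Hz).
import numpy as np

fs = 256                      # sampling rate (Hz), hypothetical
t = np.arange(0, 1, 1 / fs)   # one-second epoch
erp = 2 * np.sin(2 * np.pi * 3 * t) + 0.5 * np.sin(2 * np.pi * 20 * t)

spectrum = np.abs(np.fft.rfft(erp))
freqs = np.fft.rfftfreq(len(erp), 1 / fs)
low_band = freqs < 6.4
features = spectrum[low_band]          # low-dimensional feature vector
dominant = freqs[np.argmax(spectrum)]  # strongest spectral component
print("dominant frequency:", dominant, "Hz")
```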


Subject(s)
Brain-Computer Interfaces, Electroencephalography/methods, Evoked Potentials/physiology, Adult, Humans, Machine Learning, Computer-Assisted Signal Processing, Young Adult
7.
BMC Bioinformatics ; 18(Suppl 4): 154, 2017 Mar 22.
Article in English | MEDLINE | ID: mdl-28361669

ABSTRACT

BACKGROUND: Time-Frequency (TF) analysis has been extensively used for the analysis of non-stationary numeric signals in the past decade. At the same time, recent studies have statistically confirmed the non-stationarity of genomic non-numeric sequences and suggested the use of non-stationary analysis for these sequences. The conventional approach to analyzing non-numeric genomic sequences with techniques specific to numerical data is to convert the non-numerical data into numerical values in some way and then apply time- or transform-domain signal processing algorithms. Nevertheless, this approach raises questions regarding the relative magnitudes imposed by numeric transforms, which can potentially lead to spurious patterns or misinterpretation of results. RESULTS: In this paper, using the notion of interpretive signal processing (ISP) and by redefining correlation functions for non-numeric sequences, a general class of TF transforms is extended and applied to non-numerical genomic sequences. The technique has been successfully evaluated on synthetic and real DNA sequences. CONCLUSION: The proposed framework is fairly generic and is believed to be useful for extracting quantitative and visual information regarding local and global periodicity, symmetry, (non-)stationarity and spectral color of genomic sequences. The notion of interpretive time-frequency analysis introduced in this work can be considered a first step towards the development of a rigorous mathematical construct for genomic signal processing.
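A toy version of defining correlation directly on non-numeric sequences, without first mapping symbols to numbers, is to count symbol matches at each lag. This is only a sketch of the idea, not the paper's ISP correlation functions.

```python
# Symbolic autocorrelation: fraction of matching symbols at each lag.
def symbolic_autocorrelation(seq, max_lag):
    return [
        sum(a == b for a, b in zip(seq, seq[lag:])) / (len(seq) - lag)
        for lag in range(1, max_lag + 1)
    ]

dna = "ATGATGATGATGATG"  # a period-3 toy sequence
corr = symbolic_autocorrelation(dna, 6)
print([round(c, 2) for c in corr])  # peaks at lags 3 and 6 reveal the periodicity
```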


Subject(s)
Algorithms, Brachyura/genetics, Genome, Genomics/methods, DNA Sequence Analysis/methods, Animals, Statistical Models, Time Factors
8.
BMC Syst Biol ; 11(Suppl 3): 19, 2017 03 14.
Article in English | MEDLINE | ID: mdl-28361705

ABSTRACT

BACKGROUND: Alcoholism has a strong genetic component. Twin studies have demonstrated that a large proportion of the phenotypic variance of alcoholism, ranging from 50% to 80%, is heritable. The search for genetic variants associated with this complex behavior has epitomized sequence-based studies for nearly a decade. The limited success of genome-wide association studies (GWAS), possibly precipitated by the polygenic nature of complex traits and behaviors, however, has demonstrated the need for novel, multivariate models capable of quantitatively capturing interactions between a host of genetic variants and their association with non-genetic factors. In this regard, capturing the network of SNP by SNP or SNP by environment interactions has recently gained much interest. RESULTS: Here, we assessed 3,776 individuals to construct a network capable of detecting and quantifying the interactions within and between plausible genetic and environmental factors of alcoholism. In this regard, we propose the use of a first-order dependence tree of maximum weight as a potential statistical learning technique to delineate the pattern of dependencies underpinning such a complex trait. Using a prediction-based analysis, we further rank the genes, demographic factors, biological pathways, and the interactions represented by our SNP×SNP×E network. The proposed framework is quite general and can potentially be applied to the study of other complex traits.
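A first-order dependence tree of maximum weight (a Chow-Liu-style construction) can be sketched by weighting variable pairs with mutual information and extracting a maximum spanning tree. The binary variables below are synthetic stand-ins for SNP and environment factors, not the study's data.

```python
# Maximum-weight first-order dependence tree over synthetic binary variables.
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from sklearn.metrics import mutual_info_score

rng = np.random.default_rng(0)
n, p = 500, 5
X = rng.integers(0, 2, size=(n, p))
X[:, 1] = X[:, 0] ^ (rng.random(n) < 0.1)  # variable 1 depends on variable 0

# Pairwise mutual information as edge weights (upper triangle).
mi = np.zeros((p, p))
for i in range(p):
    for j in range(i + 1, p):
        mi[i, j] = mutual_info_score(X[:, i], X[:, j])

# Maximum spanning tree = minimum spanning tree on negated weights.
tree = minimum_spanning_tree(-mi).toarray()
edges = [(i, j) for i in range(p) for j in range(p) if tree[i, j] != 0]
print("dependence-tree edges:", edges)
```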


Subject(s)
Alcoholism/genetics, Gene-Environment Interaction, Single Nucleotide Polymorphism, Systems Biology, Genome-Wide Association Study, Humans, Racial Groups/genetics
9.
Bioinformatics ; 32(22): 3461-3468, 2016 11 15.
Article in English | MEDLINE | ID: mdl-27485443

ABSTRACT

MOTIVATION: The biomarker discovery process in high-throughput genomic profiles has presented the statistical learning community with a challenging problem, namely learning when the number of variables is comparable to or exceeds the sample size. In these settings, many classical techniques, including linear discriminant analysis (LDA), falter. The poor performance of LDA is attributed to the ill-conditioned nature of the sample covariance matrix when the dimension and sample size are comparable. To alleviate this problem, regularized LDA (RLDA) has classically been proposed, in which the sample covariance matrix is replaced by its ridge estimate. However, the performance of RLDA depends heavily on the regularization parameter used in the ridge estimate of the sample covariance matrix. RESULTS: We propose a range-search technique for efficient estimation of the optimum regularization parameter. Using an extensive set of simulations based on synthetic and gene expression microarray data, we demonstrate the robustness of the proposed technique to Gaussianity, an assumption used in developing the core estimator. We compare the performance of the technique in terms of accuracy and efficiency with classical techniques for estimating the regularization parameter. In terms of accuracy, the results indicate that the proposed method vastly improves on similar techniques that use the classical plug-in estimator. In that respect, it is better than or comparable to cross-validation-based search strategies while, depending on the sample size and dimensionality, being tens to hundreds of times faster to compute. AVAILABILITY AND IMPLEMENTATION: The source code is available at https://github.com/danik0411/optimum-rlda. CONTACT: amin.zollanvari@nu.edu.kz. SUPPLEMENTARY INFORMATION: Supplementary materials are available at Bioinformatics online.
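The tuning problem RLDA poses can be illustrated with a simple shrinkage sweep in scikit-learn. The paper's contribution is a much faster range-search alternative to this kind of cross-validated search, which is not reproduced here; the data are synthetic.

```python
# RLDA sensitivity to the regularization (shrinkage) parameter, via a CV sweep.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

# High-dimensional, small-sample regime: p comparable to n.
X, y = make_classification(n_samples=80, n_features=60, n_informative=10,
                           random_state=0)

scores = {}
for gamma in [0.01, 0.1, 0.3, 0.5, 0.7, 0.9]:
    clf = LinearDiscriminantAnalysis(solver="lsqr", shrinkage=gamma)
    scores[gamma] = cross_val_score(clf, X, y, cv=5).mean()

best = max(scores, key=scores.get)
print("best shrinkage:", best, "accuracy:", round(scores[best], 3))
```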


Subject(s)
Algorithms, Biomarkers, Genomics, Animals, Biometry, Discriminant Analysis, Humans, Normal Distribution, Sample Size
10.
EURASIP J Bioinform Syst Biol ; 2016(1): 2, 2016 Dec.
Article in English | MEDLINE | ID: mdl-26834782

ABSTRACT

In classification, prior knowledge is incorporated in a Bayesian framework by assuming that the feature-label distribution belongs to an uncertainty class of feature-label distributions governed by a prior distribution. A posterior distribution is then derived from the prior and the sample data. An optimal Bayesian classifier (OBC) minimizes the expected misclassification error relative to the posterior distribution. From an application perspective, prior construction is critical. The prior distribution is formed by mapping a set of mathematical relations among the features and labels, the prior knowledge, into a distribution governing the probability mass across the uncertainty class. In this paper, we consider prior knowledge in the form of stochastic differential equations (SDEs). We consider a vector SDE in integral form involving a drift vector and dispersion matrix. Having constructed the prior, we develop the optimal Bayesian classifier between two models and examine, via synthetic experiments, the effects of uncertainty in the drift vector and dispersion matrix. We apply the theory to a set of SDEs for the purpose of differentiating the evolutionary history between two species.

11.
Cancer Inform ; 14(Suppl 5): 109-21, 2015.
Article in English | MEDLINE | ID: mdl-27081307

ABSTRACT

High-dimensional data generally refer to data in which the number of variables is larger than the sample size. Analyzing such datasets poses great challenges for classical statistical learning because the finite-sample performance of methods developed within classical statistical learning does not live up to classical asymptotic premises in which the sample size unboundedly grows for a fixed dimensionality of observations. Much work has been done in developing mathematical-statistical techniques for analyzing high-dimensional data. Despite remarkable progress in this field, many practitioners still utilize classical methods for analyzing such datasets. This state of affairs can be attributed, in part, to a lack of knowledge and, in part, to the ready-to-use computational and statistical software packages that are well developed for classical techniques. Moreover, many scientists working in a specific field of high-dimensional statistical learning are either not aware of other existing machineries in the field or are not willing to try them out. The primary goal in this work is to bring together various machineries of high-dimensional analysis, give an overview of the important results, and present the operating conditions upon which they are grounded. When appropriate, readers are referred to relevant review articles for more information on a specific subject.

12.
Bioinformatics ; 30(23): 3349-55, 2014 Dec 01.
Article in English | MEDLINE | ID: mdl-25123902

ABSTRACT

MOTIVATION: It is commonly assumed in pattern recognition that cross-validation error estimation is 'almost unbiased' as long as the number of folds is not too small. While this is true for random sampling, it is not true with separate sampling, where the populations are independently sampled, which is a common situation in bioinformatics. RESULTS: We demonstrate, via analytical and numerical methods, that classical cross-validation can have strong bias under separate sampling, depending on the difference between the sampling ratios and the true population probabilities. We propose a new separate-sampling cross-validation error estimator, and prove that it satisfies an 'almost unbiased' theorem similar to that of random-sampling cross-validation. We present two case studies with previously published data, which show that the results can change drastically if the correct form of cross-validation is used. AVAILABILITY AND IMPLEMENTATION: The source code in C++, along with the Supplementary Materials, is available at: http://gsp.tamu.edu/Publications/supplementary/zollanvari13/.
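The source of the bias can be seen in miniature: under separate sampling the training-set class ratio need not match the population prior, so the plain cross-validation error, which weights classes by their sample counts, is biased. The sketch below reweights the class-conditional error rates by an assumed population prior (0.9/0.1, hypothetical); it conveys the flavor of the correction, not the paper's exact estimator.

```python
# Separate sampling in miniature: equal class counts, unequal population prior.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
# 100 samples per class, although the assumed population is 90%/10%.
X = np.vstack([rng.normal(0, 1, (100, 5)), rng.normal(1, 1, (100, 5))])
y = np.array([0] * 100 + [1] * 100)

pred = cross_val_predict(LogisticRegression(), X, y, cv=5)
err0 = np.mean(pred[y == 0] != 0)  # class-conditional error rates
err1 = np.mean(pred[y == 1] != 1)

naive_cv_error = np.mean(pred != y)       # implicitly weights classes 50/50
weighted_error = 0.9 * err0 + 0.1 * err1  # weights by assumed population prior
print(round(naive_cv_error, 3), round(weighted_error, 3))
```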


Subject(s)
Selection Bias, Humans, Neoplasms/genetics, Oligonucleotide Array Sequence Analysis, Parkinson Disease/genetics, Probability, Transcriptome
13.
Pattern Recognit ; 47(6): 2178-2192, 2014 Jun 01.
Article in English | MEDLINE | ID: mdl-24729636

ABSTRACT

The most important aspect of any classifier is its error rate, because this quantifies its predictive capacity. Thus, the accuracy of error estimation is critical. Error estimation is problematic in small-sample classifier design because the error must be estimated using the same data from which the classifier has been designed. Use of prior knowledge, in the form of a prior distribution on an uncertainty class of feature-label distributions to which the true, but unknown, feature-label distribution belongs, can facilitate accurate error estimation (in the mean-square sense) in circumstances where accurate completely model-free error estimation is impossible. This paper provides analytic asymptotically exact finite-sample approximations for various performance metrics of the resulting Bayesian Minimum Mean-Square-Error (MMSE) error estimator in the case of linear discriminant analysis (LDA) in the multivariate Gaussian model. These performance metrics include the first, second, and cross moments of the Bayesian MMSE error estimator with the true error of LDA, and therefore the Root-Mean-Square (RMS) error of the estimator. We lay down the theoretical groundwork for Kolmogorov double-asymptotics in a Bayesian setting, which enables us to derive asymptotic expressions for the desired performance metrics. From these we produce analytic finite-sample approximations and demonstrate their accuracy via numerical examples. Various examples illustrate the behavior of these approximations and their use in determining the necessary sample size to achieve a desired RMS. The Supplementary Material contains derivations for some equations and added figures.

14.
Article in English | MEDLINE | ID: mdl-24303313

ABSTRACT

Here we describe a prediction-based framework to analyze omic data and generate models for both disease diagnosis and identification of cellular pathways which are significant in complex diseases. Our framework differs from previous analysis in its use of underlying biology (cellular pathways/gene-sets) to produce predictive feature-disease models. In our study of alcoholism, lung cancer, and schizophrenia, we demonstrate the framework's ability to robustly analyze omic data of multiple types and sources, identify significant features sets, and produce accurate predictive models.

15.
Sankhya Ser A ; 75(2)2013 Aug 01.
Article in English | MEDLINE | ID: mdl-24288447

ABSTRACT

We provide a fundamental theorem that can be used in conjunction with Kolmogorov asymptotic conditions to derive the first moments of well-known estimators of the actual error rate in linear discriminant analysis of a multivariate Gaussian model under the assumption of a common known covariance matrix. The estimators studied in this paper are the plug-in and smoothed resubstitution error estimators, neither of which has been studied before under Kolmogorov asymptotic conditions. As a result of this work, we present an optimal smoothing parameter that makes smoothed resubstitution an unbiased estimator of the true error. For the sake of completeness, we further show how to utilize the presented fundamental theorem to recover several previously reported results, namely the first moment of the resubstitution estimator and the actual error rate. We provide numerical examples to show the accuracy of the resulting finite-sample approximations in situations where the number of dimensions is comparable to or even larger than the sample size.
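To convey the idea of smoothed resubstitution, the sketch below replaces the 0/1 resubstitution loss for a univariate LDA discriminant with a sigmoid of the discriminant value, governed by a smoothing parameter b. This is only an illustration of the smoothing mechanism; the paper's optimal choice of b is not reproduced here, and the data are synthetic.

```python
# Plain vs. smoothed resubstitution for a univariate LDA discriminant.
import numpy as np

rng = np.random.default_rng(0)
x0 = rng.normal(-1, 1, 50)  # class 0 training sample
x1 = rng.normal(+1, 1, 50)  # class 1 training sample

m0, m1 = x0.mean(), x1.mean()
w, c = m1 - m0, (m0 + m1) / 2  # univariate LDA discriminant: w * (x - c)

def smoothed_resub(b):
    # Sigmoid of the discriminant replaces the hard 0/1 misclassification loss.
    s0 = 1 / (1 + np.exp(-(w * (x0 - c)) / b))  # P(x0 assigned to class 1)
    s1 = 1 / (1 + np.exp(+(w * (x1 - c)) / b))  # P(x1 assigned to class 0)
    return 0.5 * (s0.mean() + s1.mean())

plain_resub = 0.5 * (np.mean(w * (x0 - c) > 0) + np.mean(w * (x1 - c) < 0))
print("plain:", round(plain_resub, 3), "smoothed (b=1):", round(smoothed_resub(1.0), 3))
```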

16.
Pattern Recognit ; 46(11): 3017-3029, 2013 Nov.
Article in English | MEDLINE | ID: mdl-24039299

ABSTRACT

This paper provides exact analytical expressions for the first and second moments of the true error for linear discriminant analysis (LDA) when the data are univariate and taken from two stochastic Gaussian processes. The key point is that we assume a general setting in which the sample data from each class do not need to be identically distributed or independent within or between classes. We compare the true errors of designed classifiers under the typical i.i.d. model and when the data are correlated, providing exact expressions and demonstrating that, depending on the covariance structure, correlated data can result in classifiers with either greater error or less error than when training with uncorrelated data. The general theory is applied to autoregressive and moving-average models of the first order, and it is demonstrated using real genomic data.

17.
J Am Med Inform Assoc ; 20(e2): e281-7, 2013 Dec.
Article in English | MEDLINE | ID: mdl-23907284

ABSTRACT

OBJECTIVE: To develop methods for visual analysis of temporal phenotype data available through electronic health records (EHR). MATERIALS AND METHODS: 24 580 adults from the multiparameter intelligent monitoring in intensive care V.6 (MIMIC II) EHR database of critically ill patients were analyzed, with significant temporal associations visualized as a map of associations between hospital length of stay (LOS) and ICD-9-CM codes. An expanded phenotype, using ICD-9-CM, microbiology, and computerized physician order entry data, was defined for hospital-acquired Clostridium difficile (HA-CDI). LOS, estimated costs, 30-day post-discharge mortality, and antecedent medication provider order entry were evaluated for HA-CDI cases compared to randomly selected controls. RESULTS: Temporal phenome analysis revealed 191 significant codes (p value, adjusted for false discovery rate, ≤0.05). HA-CDI was identified in 414 cases, and was associated with longer median LOS, 20 versus 9 days, and adjusted HR 0.33 (95% CI 0.28 to 0.39). This prolongation carries an estimated annual incremental cost increase of US$1.2-2.0 billion in the USA alone. DISCUSSION: Comprehensive EHR data have made large-scale phenome-based analysis feasible. Time-dependent pathological disease states have dynamic phenomic evolution, which may be captured through visual analytical approaches. Although MIMIC II is a single institutional retrospective database, our approach should be portable to other EHR data sources, including prospective 'learning healthcare systems'. For example, interventions to prevent HA-CDI could be dynamically evaluated using the same techniques. CONCLUSIONS: The new visual analytical method described in this paper led directly to the identification of numerous hospital-acquired conditions, which could be further explored through an expanded phenotype definition.


Subject(s)
Cross Infection/diagnosis, Electronic Health Records, Phenotype, Adult, Critical Illness, Humans, Iatrogenic Disease, Intensive Care Units/organization & administration, International Classification of Diseases, Time
18.
Pattern Recognit ; 46(10): 2783-2797, 2013 Oct 01.
Article in English | MEDLINE | ID: mdl-26279589

ABSTRACT

Contemporary high-throughput technologies provide measurements of very large numbers of variables but often with very small sample sizes. This paper proposes an optimization-based paradigm for utilizing prior knowledge to design better performing classifiers when sample sizes are limited. We derive approximate expressions for the first and second moments of the true error rate of the proposed classifier under the assumption of two widely-used models for the uncertainty classes; ε-contamination and p-point classes. The applicability of the approximate expressions is discussed by defining the problem of finding optimal regularization parameters through minimizing the expected true error. Simulation results using the Zipf model show that the proposed paradigm yields improved classifiers that outperform traditional classifiers that use only training data. Our application of interest involves discrete gene regulatory networks possessing labeled steady-state distributions. Given prior operational knowledge of the process, our goal is to build a classifier that can accurately label future observations obtained in the steady state by utilizing both the available prior knowledge and the training data. We examine the proposed paradigm on networks containing NF-κB pathways, where it shows significant improvement in classifier performance over the classical data-only approach to classifier design. Companion website: http://gsp.tamu.edu/Publications/supplementary/shahrokh12a.

19.
Brief Bioinform ; 13(4): 430-45, 2012 Jul.
Article in English | MEDLINE | ID: mdl-22833495

ABSTRACT

Recent advances in high-throughput biotechnologies have led to the rapid growing research interest in reverse engineering of biomolecular systems (REBMS). 'Data-driven' approaches, i.e. data mining, can be used to extract patterns from large volumes of biochemical data at molecular-level resolution while 'design-driven' approaches, i.e. systems modeling, can be used to simulate emergent system properties. Consequently, both data- and design-driven approaches applied to -omic data may lead to novel insights in reverse engineering biological systems that could not be expected before using low-throughput platforms. However, there exist several challenges in this fast growing field of reverse engineering biomolecular systems: (i) to integrate heterogeneous biochemical data for data mining, (ii) to combine top-down and bottom-up approaches for systems modeling and (iii) to validate system models experimentally. In addition to reviewing progress made by the community and opportunities encountered in addressing these challenges, we explore the emerging field of synthetic biology, which is an exciting approach to validate and analyze theoretical system models directly through experimental synthesis, i.e. analysis-by-synthesis. The ultimate goal is to address the present and future challenges in reverse engineering biomolecular systems (REBMS) using integrated workflow of data mining, systems modeling and synthetic biology.


Subject(s)
Data Mining/methods, Systems Biology, Bioengineering/methods, Biotechnology
20.
Article in English | MEDLINE | ID: mdl-22779044

ABSTRACT

The immense corpus of biomedical literature existing today poses challenges in information search and integration. Many links between pieces of knowledge occur or are significant only under certain contexts, rather than under the entire corpus. This study proposes using networks of ontology concepts, linked based on their co-occurrences in annotations of abstracts of biomedical literature and descriptions of experiments, to draw conclusions based on context-specific queries and to better integrate existing knowledge. In particular, a Bayesian network framework is constructed to allow for the linking of related terms from two biomedical ontologies under the queried context concept. Edges in such a Bayesian network allow associations between biomedical concepts to be quantified and inference to be made about the existence of some concepts given prior information about others. This approach could potentially be a powerful inferential tool for context-specific queries, applicable to ontologies in other fields as well.
