1.
J Infect Dis ; 226(5): 766-777, 2022 09 13.
Article in English | MEDLINE | ID: mdl-35267024

ABSTRACT

BACKGROUND: Excessive complement activation has been implicated in the pathogenesis of coronavirus disease 2019 (COVID-19), but the mechanisms leading to this response remain unclear. METHODS: We measured plasma levels of key complement markers, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) RNA and antibodies against SARS-CoV-2 and seasonal human common cold coronaviruses (CCCs) in hospitalized patients with COVID-19 of moderate (n = 18) and critical severity (n = 37) and in healthy controls (n = 10). RESULTS: We confirmed that complement activation is systemically increased in patients with COVID-19 and is associated with a worse disease outcome. We showed that plasma levels of C1q and circulating immune complexes were markedly increased in patients with severe COVID-19 and correlated with higher immunoglobulin (Ig) G titers, greater complement activation, and higher disease severity score. Additional analyses showed that the classical pathway was the main arm responsible for augmented complement activation in severe patients. In addition, we demonstrated that a rapid IgG response to SARS-CoV-2 and an anamnestic IgG response to the nucleoprotein of the CCCs were strongly correlated with circulating immune complex levels, complement activation, and disease severity. CONCLUSIONS: These findings indicate that early, nonneutralizing IgG responses may play a key role in complement overactivation in severe COVID-19. Our work underscores the urgent need to develop therapeutic strategies to modify complement overactivation in patients with COVID-19.


Subject(s)
COVID-19 , Antiviral Antibodies , Coronavirus Nucleocapsid Proteins , Humans , Immunoglobulin G , SARS-CoV-2
2.
Sensors (Basel) ; 22(9), 2022 May 05.
Article in English | MEDLINE | ID: mdl-35591199

ABSTRACT

Nutrient regulation in aquaponic environments has been a topic of research for many years. Most studies have focused on appropriate control of nutrients in an aquaponic set-up, but very little research has been conducted on commercial-scale applications. In our model, the input data were sourced on a weekly basis from three commercial aquaponic farms in Southeast Texas over the course of a year. Due to the limited number of data points, dimensionality reduction techniques such as a pairwise correlation matrix were used to remove the highly correlated predictors. Feature selection techniques such as the XGBoost classifier and Recursive Feature Elimination with ExtraTreesClassifier were used to rank the features in order of their relative importance. Ammonium and calcium were found to be the top two nutrient predictors, and based on the months in which lettuce was cultivated, the median of these nutrient values from the historical dataset served as the optimal concentration to be maintained in the aquaponic solution to sustain healthy growth of tilapia fish and lettuce plants in a coupled set-up. To accomplish this, Vernier sensors were used to measure the nutrient values and actuator systems were built to dispense the appropriate nutrient into the ecosystem via a closed loop.


Subject(s)
Ecosystem , Nutrients , Animals , Fishes , Lactuca , Machine Learning
3.
BMC Genomics ; 20(Suppl 6): 435, 2019 Jun 13.
Article in English | MEDLINE | ID: mdl-31189480

ABSTRACT

BACKGROUND: Single-cell gene expression measurements offer opportunities for deriving a mechanistic understanding of complex diseases, including cancer. However, due to the complex regulatory machinery of the cell, gene regulatory network (GRN) model inference based on such data still manifests significant uncertainty. RESULTS: The goal of this paper is to develop optimal classification of single-cell trajectories accounting for potential model uncertainty. Partially-observed Boolean dynamical systems (POBDS) are used for modeling gene regulatory networks observed through noisy gene-expression data. We derive the exact optimal Bayesian classifier (OBC) for binary classification of single-cell trajectories. The application of the OBC becomes impractical for large GRNs, due to computational and memory requirements. To address this, we introduce a particle-based single-cell classification method that is highly scalable for large GRNs with much lower complexity than the optimal solution. CONCLUSION: The performance of the proposed particle-based method is demonstrated through numerical experiments using a POBDS model of the well-known T-cell large granular lymphocyte (T-LGL) leukemia network with noisy time-series gene-expression data.


Subject(s)
Algorithms , Bayes Theorem , Computational Biology/methods , Gene Regulatory Networks , Large Granular Lymphocytic Leukemia/genetics , Single-Cell Analysis/methods , Gene Expression Profiling , Humans , Biological Models , Genetic Models , Uncertainty
4.
BMC Bioinformatics ; 18(1): 519, 2017 Nov 25.
Article in English | MEDLINE | ID: mdl-29178844

ABSTRACT

BACKGROUND: Gene regulatory networks govern the function of key cellular processes, such as control of the cell cycle, response to stress, DNA repair mechanisms, and more. Boolean networks have been used successfully in modeling gene regulatory networks. In the Boolean network model, the transcriptional state of each gene is represented by 0 (inactive) or 1 (active), and the relationship among genes is represented by logical gates updated at discrete time points. However, the Boolean gene states are never observed directly, but only indirectly and incompletely through noisy measurements based on expression technologies such as cDNA microarrays, RNA-Seq, and cell imaging-based assays. The Partially-Observed Boolean Dynamical System (POBDS) signal model is distinct from other deterministic and stochastic Boolean network models in removing the requirement of a directly observable Boolean state vector and allowing uncertainty in the measurement process, addressing the scenario encountered in practice in transcriptomic analysis. RESULTS: BoolFilter is an R package that implements the POBDS model and associated algorithms for state and parameter estimation. It allows the user to estimate the Boolean states, network topology, and measurement parameters from time series of transcriptomic data using exact and approximated (particle) filters, as well as simulate the transcriptomic data for a given Boolean network model. Some of its infrastructure, such as the network interface, is the same as in the previously published R package for Boolean Networks BoolNet, which enhances compatibility and user accessibility to the new package. CONCLUSIONS: We introduce the R package BoolFilter for Partially-Observed Boolean Dynamical Systems (POBDS). 
The BoolFilter package provides a useful toolbox for the bioinformatics community, with state-of-the-art algorithms for simulation of time series transcriptomic data as well as the inverse process of system identification from data obtained with various expression technologies such as cDNA microarrays, RNA-Seq, and cell imaging-based assays.
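The signal model described in this abstract (hidden 0/1 gene states updated by logic gates, seen only through noisy measurements) can be sketched in a few lines. This is not the BoolFilter API, which is an R package. The three-gene network, the bit-flip process noise, and the Gaussian observation parameters are all hypothetical choices made for illustration.

```python
import random

def step(state, p_flip, rng):
    """One synchronous update of a hypothetical 3-gene Boolean network,
    followed by independent random bit flips (process noise)."""
    a, b, c = state
    nxt = (int(not c), a, int(a and b))   # illustrative logic gates
    return tuple(x ^ (rng.random() < p_flip) for x in nxt)

def observe(state, rng, base=1.0, delta=4.0, sigma=0.5):
    """Noisy expression reading: Gaussian around a low (inactive)
    or high (active) mean for each gene."""
    return [rng.gauss(base + delta * x, sigma) for x in state]

def simulate(n_steps, p_flip=0.05, seed=0):
    """Return the hidden state sequence and the noisy readings."""
    rng = random.Random(seed)
    state = (0, 0, 0)
    states, readings = [], []
    for _ in range(n_steps):
        state = step(state, p_flip, rng)
        states.append(state)
        readings.append(observe(state, rng))
    return states, readings

hidden, measured = simulate(n_steps=20)
```

A state estimator would receive only `measured`; `hidden` is what exact or particle filters of the kind the abstract mentions try to recover.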


Subject(s)
Software , Algorithms , Gene Expression Profiling , Gene Regulatory Networks , Biological Models , User-Computer Interface
5.
Bioinformatics ; 30(23): 3349-55, 2014 Dec 01.
Article in English | MEDLINE | ID: mdl-25123902

ABSTRACT

MOTIVATION: It is commonly assumed in pattern recognition that cross-validation error estimation is 'almost unbiased' as long as the number of folds is not too small. While this is true for random sampling, it is not true with separate sampling, where the populations are independently sampled, which is a common situation in bioinformatics. RESULTS: We demonstrate, via analytical and numerical methods, that classical cross-validation can have strong bias under separate sampling, depending on the difference between the sampling ratios and the true population probabilities. We propose a new separate-sampling cross-validation error estimator, and prove that it satisfies an 'almost unbiased' theorem similar to that of random-sampling cross-validation. We present two case studies with previously published data, which show that the results can change drastically if the correct form of cross-validation is used. AVAILABILITY AND IMPLEMENTATION: The source code in C++, along with the Supplementary Materials, is available at: http://gsp.tamu.edu/Publications/supplementary/zollanvari13/.
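A small numerical sketch shows why the sampling ratio matters. Pooling held-out errors weights each class by its share of the sample, whereas the true error weights each class by its population probability. The prior-weighted form below is a generic illustration of that gap and is not claimed to be the estimator proposed in the paper; the counts and the prior are hypothetical.

```python
def pooled_error(errors0, n0, errors1, n1):
    """Classical estimate: all held-out errors pooled, so the classes are
    implicitly weighted by the sampling ratio n0 / (n0 + n1)."""
    return (errors0 + errors1) / (n0 + n1)

def prior_weighted_error(errors0, n0, errors1, n1, prior0):
    """Class-conditional error rates weighted by an assumed known prior."""
    return prior0 * errors0 / n0 + (1.0 - prior0) * errors1 / n1

# Hypothetical counts: 50 samples per class, class 1 is harder to classify.
pooled = pooled_error(5, 50, 15, 50)
weighted = prior_weighted_error(5, 50, 15, 50, prior0=0.9)
```

With these numbers the pooled estimate is 0.20 while the prior-weighted estimate is about 0.12. The two agree only when the sampling ratio matches the prior.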


Subject(s)
Selection Bias , Humans , Neoplasms/genetics , Oligonucleotide Array Sequence Analysis , Parkinson Disease/genetics , Probability , Transcriptome
6.
Bioinformatics ; 28(4): 564-72, 2012 Feb 15.
Article in English | MEDLINE | ID: mdl-22155863

ABSTRACT

MOTIVATION: Peptide detection is a crucial step in mass spectrometry (MS) based proteomics. Most existing algorithms are based upon greedy isotope template matching and thus may be prone to error propagation and ineffective at detecting overlapping peptides. In addition, existing algorithms usually work at different charge states separately, isolating useful information that can be drawn from other charge states, which may lead to poor detection of low abundance peptides. RESULTS: BPDA2d models spectra as a mixture of candidate peptide signals and systematically evaluates all possible combinations of peptide candidates to interpret the given spectra. For each candidate, BPDA2d takes into account its elution profile, charge state distribution and isotope pattern, and it combines all evidence to infer the candidate's signal and existence probability. By piecing all evidence together, especially by deriving information across charge states, low abundance peptides can be better identified and peptide detection rates can be improved. Instead of local template matching, BPDA2d performs global optimization for all candidates and systematically optimizes their signals. Since BPDA2d looks for the optimal interpretation among all possible interpretations of the given spectra, it is capable of handling complex spectra where features overlap. BPDA2d estimates the posterior existence probability of detected peptides, which can be directly used for probability-based evaluation in subsequent processing steps. Our experiments indicate that BPDA2d outperforms state-of-the-art detection methods on both simulated data and real liquid chromatography-mass spectrometry data, according to sensitivity and detection accuracy. AVAILABILITY: The BPDA2d software package is available at http://gsp.tamu.edu/Publications/supplementary/sun11a/.


Subject(s)
Algorithms , Peptides/analysis , Proteomics/methods , Software , Bayes Theorem , Liquid Chromatography , Mass Spectrometry , Probability
7.
Metabolism ; 141: 155399, 2023 04.
Article in English | MEDLINE | ID: mdl-36642114

ABSTRACT

BACKGROUND: Production rates of the short-chain fatty acids (SCFA) acetate, propionate, and butyrate, which are beneficial metabolites of the intestinal microbiota, are difficult to measure in humans due to inaccessibility of the intestine to perform measurements, and the high first-pass metabolism of SCFAs in colonocytes and liver. We developed a stable tracer pulse approach to estimate SCFA whole-body production (WBP) in the accessible pool representing the systemic circulation and interstitial fluid. Compartmental modeling of plasma enrichment data allowed us to additionally calculate SCFA kinetics and pool sizes in the inaccessible pool likely representing the intestine with microbiota. We also studied the effects of aging and the presence of Chronic Obstructive Pulmonary Disease (COPD) on SCFA kinetics. METHODS: In this observational study, we designed a two-compartmental model to determine SCFA kinetics in 31 young (20-29 y) and 71 older (55-87 y) adults, as well as in 33 clinically stable patients with moderate to very severe COPD (mean (SD) FEV1, 46.5 (16.2)% of predicted). Participants received in the fasted state a pulse containing stable tracers of acetate, propionate, and butyrate intravenously and blood was sampled four times over a 30 min period. We measured tracer-tracee ratios by GC-MS and used parameters obtained from two-exponential curve fitting to calculate non-compartmental SCFA WBP and perform compartmental analysis. Statistics were done by ANCOVA. RESULTS: Acetate, propionate, and butyrate WBP and fluxes between the accessible and inaccessible pools were lower in older than young adults (all q < 0.0001). Moreover, older participants had lower acetate (q < 0.0001) and propionate (q = 0.019) production rates in the inaccessible pool as well as smaller sizes of the accessible and inaccessible acetate pools (both q < 0.0001) than young participants. 
WBP, compartmental SCFA kinetics, and pool sizes did not differ between COPD patients and older adults (all q > 0.05). Overall and independent of the group studied, calculated production rates in the inaccessible pool were on average 7 (acetate), 11 (propionate), and 16 (butyrate) times higher than non-compartmental WBP, and sizes of inaccessible pools were 24 (acetate), 31 (propionate), and 55 (butyrate) times higher than sizes of accessible pools (all p < 0.0001). CONCLUSION: Non-compartmental production measurements of SCFAs in the accessible pool (i.e. systemic circulation) substantially underestimate the SCFA production in the inaccessible pool, which likely represents the intestine with microbiota, as assessed by compartmental analysis.
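The non-compartmental step can be sketched under a standard pulse-tracer assumption: if the fitted tracer-tracee ratio decays as a sum of two exponentials, whole-body production is taken as the tracer dose divided by the area under that curve. This relation and every parameter value below are assumptions for illustration; the abstract does not give the authors' equations or fitted values.

```python
import math

def ttr(t, a1, k1, a2, k2):
    """Two-exponential decay of the tracer-tracee ratio after a pulse."""
    return a1 * math.exp(-k1 * t) + a2 * math.exp(-k2 * t)

def area_under_curve(a1, k1, a2, k2):
    """Analytic integral of the fitted curve from t = 0 to infinity."""
    return a1 / k1 + a2 / k2

def whole_body_production(dose, a1, k1, a2, k2):
    """Pulse-dose estimate: tracer dose divided by the area under the curve."""
    return dose / area_under_curve(a1, k1, a2, k2)

# Hypothetical fitted parameters (arbitrary units, illustrative only).
params = dict(a1=0.08, k1=0.5, a2=0.02, k2=0.05)
wbp = whole_body_production(dose=10.0, **params)
```

The compartmental analysis in the abstract reuses the same fitted parameters to estimate fluxes and pool sizes, which this sketch does not attempt.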


Subject(s)
Volatile Fatty Acids , Propionates , Young Adult , Humans , Aged , Acetates/metabolism , Butyrates , Aging
8.
BMC Genomics ; 13 Suppl 6: S2, 2012.
Article in English | MEDLINE | ID: mdl-23134670

ABSTRACT

MOTIVATION: Mass spectrometry is a complex technique used for large-scale protein profiling with clinical and pharmaceutical applications. While individual components in the system have been studied extensively, little work has been done to integrate various modules and evaluate them from a systems point of view. RESULTS: In this work, we investigate this problem by putting together the different modules in a typical proteomics workflow, in order to capture and analyze key factors that impact the number of identified peptides and quantified proteins, protein quantification error, differential expression results, and classification performance. The proposed proteomics pipeline model can be used to optimize the workflow as well as to pinpoint critical bottlenecks worth investing time and resources into for improving performance. Using the model-based approach proposed here, one can study systematically the critical problem of proteomic biomarker discovery, by means of simulation using ground-truthed synthetic MS data.


Subject(s)
Theoretical Models , Proteomics , Algorithms , High Pressure Liquid Chromatography , Mass Spectrometry , Peptides/analysis
9.
Bioinformatics ; 27(21): 3056-64, 2011 Nov 01.
Article in English | MEDLINE | ID: mdl-21914630

ABSTRACT

MOTIVATION: In small-sample settings, bolstered error estimation has been shown to perform better than cross-validation and competitively with bootstrap with regard to various criteria. The key issue for bolstering performance is the variance setting for the bolstering kernel. Heretofore, this variance has been determined in a non-parametric manner from the data. Although bolstering based on this variance setting works well for small feature sets, results can deteriorate for high-dimensional feature spaces. RESULTS: This article computes an optimal kernel variance depending on the classification rule, sample size, model and feature space (both the original number of features and the number remaining after feature selection). A key point is that the optimal variance is robust relative to the model. This allows us to develop a method for selecting a suitable variance to use in real-world applications where the model is not known, but the other factors in determining the optimal kernel are known. AVAILABILITY: Companion website at http://compbio.tgen.org/paper_supp/high_dim_bolstering. CONTACT: edward@mail.ece.tamu.edu.
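To make the role of the kernel variance concrete, here is a minimal sketch of bolstered resubstitution for a linear classifier, assuming a spherical Gaussian kernel of standard deviation `sigma` centred on each training point. The kernel mass on the wrong side of the boundary then has a closed form. The sketch takes `sigma` as a given input and does not implement the variance selection that the article develops; the data are hypothetical.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def bolstered_resub_error(points, labels, w, b, sigma):
    """Bolstered resubstitution for the linear rule sign(w.x + b).
    Each training point carries a spherical Gaussian kernel with standard
    deviation sigma; its error contribution is the kernel mass that falls
    on the wrong side of the decision boundary."""
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    total = 0.0
    for x, y in zip(points, labels):      # labels are +1 or -1
        margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm_w
        total += normal_cdf(-margin / sigma)
    return total / len(points)

# Hypothetical two-point sample separated by the line x1 = 0.
points = [(1.0, 0.0), (-1.0, 0.0)]
labels = [1, -1]
err_wide = bolstered_resub_error(points, labels, w=(1.0, 0.0), b=0.0, sigma=1.0)
err_narrow = bolstered_resub_error(points, labels, w=(1.0, 0.0), b=0.0, sigma=0.1)
```

As `sigma` shrinks, the estimate approaches plain resubstitution (zero here). A larger `sigma` raises it, which is why the choice of kernel variance drives the estimator's bias.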


Subject(s)
Gene Expression Profiling , Algorithms , Breast Neoplasms/genetics , Breast Neoplasms/metabolism , Statistical Data Interpretation , Female , Humans , Multiple Myeloma/genetics , Multiple Myeloma/metabolism , Oligonucleotide Array Sequence Analysis , Reproducibility of Results , Sample Size
10.
PLoS One ; 17(8): e0269401, 2022.
Article in English | MEDLINE | ID: mdl-35972941

ABSTRACT

With the recent advances in the field of alternate agriculture, there has been an ever-growing demand for aquaponics as a potential substitute for traditional agricultural techniques for improving sustainable food production. However, the lack of data-driven methods and approaches for aquaponic cultivation remains a challenge. The objective of this research is to investigate statistical methods to make inferences using small datasets for nutrient control in aquaponics to optimize yield. In this work, we employed the Density-Based Synthetic Minority Over-sampling TEchnique (DB-SMOTE) to address dataset imbalance, and ExtraTreesClassifier and Recursive Feature Elimination (RFE) to choose the relevant features. Synthetic data generation techniques such as Monte-Carlo (MC) sampling were used to generate enough data points, and different feature engineering techniques were used on the predictors before evaluating the performance of kernel-based classifiers with the goal of controlling nutrients in the aquaponic solution for optimal growth.


Subject(s)
Machine Learning , Nutrients , Agriculture
11.
BMC Bioinformatics ; 12 Suppl 10: S5, 2011 Oct 18.
Article in English | MEDLINE | ID: mdl-22165852

ABSTRACT

BACKGROUND: RNA-Seq is a recently developed high-throughput sequencing technology for profiling the entire transcriptome in any organism. It has several major advantages over current hybridization-based approaches such as microarrays. However, the cost per sample by RNA-Seq is still prohibitive for most laboratories. With continued improvement in sequence output, it would be cost-effective if multiple samples are multiplexed and sequenced in a single lane with sufficient transcriptome coverage. The objective of this analysis is to evaluate what sequencing depth might be sufficient for gene expression profiling in the chicken by RNA-Seq. RESULTS: Two cDNA libraries from chicken lungs were sequenced initially, and 4.9 million (M) and 1.6 M (60 bp) reads were generated, respectively. With significant improvements in sequencing technology, two technical replicate cDNA libraries were re-sequenced. Totals of 29.6 M and 28.7 M (75 bp) reads were obtained with the two samples. More than 90% of annotated genes were detected in the data sets with 28.7-29.6 M reads, while only 68% of genes were detected in the data set with 1.6 M reads. The correlation coefficients of gene expression between technical replicates within the same sample were 0.9458 and 0.8442. To evaluate the appropriate depth needed for mRNA profiling, a random sampling method was used to generate different numbers of reads from each sample. There was a significant increase in correlation coefficients from a sequencing depth of 1.6 M to 10 M for all genes except highly abundant genes. No significant improvement was observed from the depth of 10 M to 20 M (75 bp) reads. CONCLUSION: The analysis from the current study demonstrated that 30 M (75 bp) reads are sufficient to detect all annotated genes in chicken lungs. Ten million (75 bp) reads could detect about 80% of annotated chicken genes, and RNA-Seq at this depth can serve as a replacement for microarray technology. Furthermore, the depth of sequencing had a significant impact on measuring gene expression of low-abundance genes. Finally, the combination of experimental and simulation approaches is a powerful way to address the relationship between the depth of sequencing and transcriptome coverage.
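The random-sampling idea in this abstract can be sketched as drawing reads without replacement at several depths and counting the genes that still receive at least one read. The toy library below, in which gene `i` contributes `i + 1` reads, and the chosen depths are hypothetical and unrelated to the chicken data.

```python
import random

def detected_fraction(read_genes, depth, n_genes, rng):
    """Subsample `depth` reads without replacement and return the fraction
    of annotated genes hit by at least one sampled read."""
    sample = rng.sample(read_genes, depth)
    return len(set(sample)) / n_genes

# Hypothetical library: gene i contributes i + 1 reads, so low-index
# genes are rare and are the first to drop out at shallow depth.
n_genes = 200
read_genes = [g for g in range(n_genes) for _ in range(g + 1)]

rng = random.Random(0)
curve = {
    depth: detected_fraction(read_genes, depth, n_genes, rng)
    for depth in (100, 1000, len(read_genes))
}
```

Plotting `curve` against depth gives a saturation curve. The depth at which it flattens is the kind of quantity the study reports.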


Subject(s)
Chickens/genetics , Gene Expression Profiling , High-Throughput Nucleotide Sequencing , Lung/metabolism , RNA Sequence Analysis , Animals , Gene Library , Molecular Sequence Annotation , Messenger RNA/genetics
12.
Math Biosci Eng ; 18(6): 7685-7710, 2021 09 06.
Article in English | MEDLINE | ID: mdl-34814270

ABSTRACT

Mathematical models are widely recognized as an important tool for analyzing and understanding the dynamics of infectious disease outbreaks, predicting their future trends, and evaluating public health intervention measures for disease control and elimination. We propose a novel stochastic metapopulation state-space model for COVID-19 transmission, which is based on a discrete-time spatio-temporal susceptible, exposed, infected, recovered, and deceased (SEIRD) model. The proposed framework allows the hidden SEIRD states and unknown transmission parameters to be estimated from noisy, incomplete time series of reported epidemiological data, by application of unscented Kalman filtering (UKF), maximum-likelihood adaptive filtering, and metaheuristic optimization. Experiments using both synthetic data and real data from the Fall 2020 COVID-19 wave in the state of Texas demonstrate the effectiveness of the proposed model.
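For orientation, a discrete-time SEIRD update for a single population can be written as below. This is only the deterministic skeleton: the stochastic terms, the coupling between regions, and the Kalman-filter estimation described in the abstract are omitted, and the rate values are hypothetical.

```python
def seird_step(state, beta, sigma, gamma, mu):
    """One day of a deterministic, single-population SEIRD update.
    state = (S, E, I, R, D); the rates are hypothetical per-day values."""
    s, e, i, r, d = state
    n_alive = s + e + i + r
    new_exposed = beta * s * i / n_alive   # S -> E
    new_infectious = sigma * e             # E -> I
    new_recovered = gamma * i              # I -> R
    new_deaths = mu * i                    # I -> D
    return (
        s - new_exposed,
        e + new_exposed - new_infectious,
        i + new_infectious - new_recovered - new_deaths,
        r + new_recovered,
        d + new_deaths,
    )

state = (9990.0, 0.0, 10.0, 0.0, 0.0)
for _ in range(30):
    state = seird_step(state, beta=0.3, sigma=0.2, gamma=0.1, mu=0.01)
```

In a state-space formulation, `state` is the hidden variable and reported case or death counts are its noisy, partial observations.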


Subject(s)
COVID-19 , Humans , Theoretical Models , SARS-CoV-2
13.
BMC Bioinformatics ; 11: 490, 2010 Sep 29.
Article in English | MEDLINE | ID: mdl-20920238

ABSTRACT

BACKGROUND: Mass spectrometry (MS) is an essential analytical tool in proteomics. Many existing algorithms for peptide detection are based on isotope template matching and usually work at different charge states separately, making them ineffective at detecting overlapping peptides and low abundance peptides. RESULTS: We present BPDA, a Bayesian approach for peptide detection in data produced by MS instruments with high enough resolution to baseline-resolve isotopic peaks, such as MALDI-TOF and LC-MS. We model the spectra as a mixture of candidate peptide signals, and the model is parameterized by MS physical properties. BPDA is based on a rigorous statistical framework and avoids problems, such as voting and ad-hoc thresholding, generally encountered in algorithms based on template matching. It systematically evaluates all possible combinations of peptide candidates to interpret a given spectrum, and iteratively finds the best fitting peptide signal in order to minimize the mean squared error of the inferred spectrum to the observed spectrum. In contrast to previous detection methods, BPDA performs deisotoping and deconvolution of mass spectra simultaneously, which enables better identification of weak peptide signals and produces higher sensitivities and more robust results. Unlike template-matching algorithms, BPDA can handle complex data where features overlap. Our experimental results indicate that BPDA performs well on simulated data and real MS data sets, for various resolutions and signal to noise ratios, and compares very favorably with commonly used commercial and open-source software, such as flexAnalysis, OpenMS, and Decon2LS, according to sensitivity and detection accuracy.
CONCLUSION: Unlike previous detection methods, which only employ isotopic distributions and work at each single charge state alone, BPDA takes into account the charge state distribution as well, thus lending information to better identify weak peptide signals and produce more robust results. The proposed approach is based on a rigorous statistical framework, which avoids problems generally encountered in algorithms based on template matching. Our experiments indicate that BPDA performs well on both simulated data and real data, and compares very favorably with commonly used commercial and open-source software. The BPDA software can be downloaded from http://gsp.tamu.edu/Publications/supplementary/sun10a/bpda.


Subject(s)
Algorithms , Mass Spectrometry/methods , Peptides/analysis , Peptides/chemistry , Software , Protein Databases , Proteome/analysis
14.
IEEE/ACM Trans Comput Biol Bioinform ; 17(4): 1105-1114, 2020.
Article in English | MEDLINE | ID: mdl-30418915

ABSTRACT

We propose a novel methodology for fault detection and diagnosis in partially-observed Boolean dynamical systems (POBDS). These are stochastic, highly nonlinear, and derivativeless systems, rendering difficult the application of classical fault detection and diagnosis methods. The methodology comprises two main approaches. The first addresses the case when the normal mode of operation is known but not the fault modes. It applies an innovations filter (IF) to detect deviations from the nominal normal mode of operation. The second approach is applicable when the set of possible fault models is finite and known, in which case we employ a multiple model adaptive estimation (MMAE) approach based on a likelihood-ratio (LR) statistic. Unknown system parameters are estimated by an adaptive expectation-maximization (EM) algorithm. Particle filtering techniques are used to reduce the computational complexity in the case of systems with large state-spaces. The efficacy of the proposed methodology is demonstrated by numerical experiments with a large gene regulatory network (GRN) with stuck-at faults observed through a single noisy time series of RNA-seq gene expression measurements.


Subject(s)
Computational Biology/methods , Gene Regulatory Networks/genetics , Genetic Models , Algorithms , RNA-Seq , Saccharomycetales/genetics , Stochastic Processes
15.
BMC Bioinformatics ; 10 Suppl 11: S10, 2009 Oct 08.
Article in English | MEDLINE | ID: mdl-19811675

ABSTRACT

BACKGROUND: Nanomaterials are being manufactured on a commercial scale for use in medical, diagnostic, energy, component and communications industries. However, concerns over the safety of engineered nanomaterials have surfaced. Humans can be exposed to nanomaterials in different ways such as inhalation or exposure through the integumentary system. RESULTS: The interactions of engineered nanomaterials with primary human cells were investigated, using a systems biology approach combining gene expression microarray profiling with dynamic experimental parameters. In this experiment, primary human epidermal keratinocytes were exposed to several low-micron to nano-scale materials, and gene expression was profiled over both time and dose to compile a comprehensive picture of nanomaterial-cellular interactions. Very few gene-expression studies so far have dealt with both time and dose response simultaneously. Here, we propose different approaches to this kind of analysis. First, we used heat maps and multi-dimensional scaling (MDS) plots to visualize the dose response of nanomaterials over time. Then, in order to find the most common patterns in gene-expression profiles, we used self-organizing maps (SOM) combined with two different criteria to determine the number of clusters. The consistency of SOM results is discussed in the context of the information derived from the MDS plots. Finally, in order to identify the genes that have significantly different responses among different dose levels of each treatment while accounting for the effect of time, we used a two-way ANOVA model, in connection with Tukey's additivity test and the Box-Cox transformation. The results are discussed in the context of the cellular responses to engineered nanomaterials. CONCLUSION: The analysis presented here leads to interesting and complementary conclusions about the response across time of human epidermal keratinocytes after exposure to nanomaterials. For example, we observed that gene expression for most treatments becomes closer to the expression of the baseline cultures as time proceeds. The genes found to be differentially expressed are involved in a number of cellular processes, including regulation of transcription and translation, protein localization, transport, cell cycle progression, cell migration, cytoskeletal reorganization, signal transduction, and development.


Subject(s)
Epidermis/metabolism , Gene Expression Profiling , Keratinocytes/metabolism , Nanostructures/chemistry , Epidermal Cells , Humans , Keratinocytes/cytology , Oligonucleotide Array Sequence Analysis
16.
IEEE/ACM Trans Comput Biol Bioinform ; 16(4): 1250-1261, 2019.
Article in English | MEDLINE | ID: mdl-29993697

ABSTRACT

Control of gene regulatory networks (GRNs) to shift gene expression from undesirable states to desirable ones has received much attention in recent years. Most of the existing methods assume that the cost of intervention at each state and time point, referred to as the immediate cost function, is fully known. In this paper, we employ the Partially-Observed Boolean Dynamical System (POBDS) signal model for a time sequence of noisy expression measurements from a Boolean GRN and develop a Bayesian Inverse Reinforcement Learning (BIRL) approach to address the realistic case in which the only available knowledge regarding the immediate cost function is provided by the sequence of measurements and interventions recorded in an experimental setting by an expert. The Boolean Kalman Smoother (BKS) algorithm is used for optimally mapping the available gene-expression data into a sequence of Boolean states, and then the BIRL method is efficiently combined with the Q-learning algorithm for quantification of the immediate cost function. The performance of the proposed methodology is investigated by applying a state-feedback controller to two GRN models: a melanoma WNT5A Boolean network and a p53-MDM2 negative feedback loop Boolean network, when the cost of the undesirable states, and thus the identity of the undesirable genes, is learned using the proposed methodology.


Subject(s)
Gene Expression Profiling/methods , Gene Regulatory Networks , Machine Learning , Algorithms , Bayes Theorem , Computational Biology/methods , Neoplastic Gene Expression Regulation , Humans , Melanoma/metabolism , Biological Models , Genetic Models , Proto-Oncogene Proteins c-mdm2/metabolism , Skin Neoplasms/metabolism , Software , Tumor Suppressor Protein p53/metabolism , Wnt-5a Protein/metabolism
17.
Cancer Inform ; 18: 1176935119860822, 2019.
Article in English | MEDLINE | ID: mdl-31360060

ABSTRACT

Observational case-control studies for biomarker discovery in cancer studies often collect data that are sampled separately from the case and control populations. We present an analysis of the bias in the estimation of the precision of classifiers designed on separately sampled data. The analysis consists of both theoretical and numerical results, which show that classifier precision estimates can display strong bias under separate sampling, with the bias magnitude depending on the difference between the true case prevalence in the population and the sample prevalence in the data. We show that this bias is systematic in the sense that it cannot be reduced by increasing sample size. If information about the true case prevalence is available from public health records, then a modified precision estimator that uses the known prevalence displays smaller bias, which can in fact be reduced to zero as sample size increases under regularity conditions on the classification algorithm. The accuracy of the theoretical analysis and the performance of the precision estimators under separate sampling are confirmed by numerical experiments using synthetic and real data from published observational case-control studies. The results with real data confirmed that under separately sampled data, the usual estimator produces larger, i.e., more optimistic, precision estimates than the estimator using the true prevalence value.
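The contrast between the two precision estimates can be sketched with Bayes' rule. The adjusted form below rebuilds precision from class-conditional rates and an externally known prevalence. It is a natural reading of "a modified precision estimator that uses the known prevalence" but is an assumption here, not a transcription of the paper's estimator; the study counts are hypothetical.

```python
def naive_precision(tp, fp):
    """Usual estimate: fraction of predicted positives that are true cases."""
    return tp / (tp + fp)

def prevalence_adjusted_precision(tp, n_cases, fp, n_controls, prevalence):
    """Precision rebuilt from class-conditional rates and a known prevalence."""
    tpr = tp / n_cases        # sensitivity, estimated from the case sample
    fpr = fp / n_controls     # false-positive rate, from the control sample
    return prevalence * tpr / (prevalence * tpr + (1.0 - prevalence) * fpr)

# Hypothetical study: 100 cases, 100 controls, true prevalence 5%.
naive = naive_precision(tp=80, fp=10)
adjusted = prevalence_adjusted_precision(80, 100, 10, 100, prevalence=0.05)
```

Here the naive estimate is about 0.89 and the adjusted one about 0.30, so the naive figure is the more optimistic of the two, matching the direction reported in the abstract. The two coincide when the assumed prevalence equals the sample prevalence (0.5 in this example).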

18.
Article in English | MEDLINE | ID: mdl-29053466

ABSTRACT

This paper studies classification of gene-expression trajectories coming from two classes, healthy and mutated (cancerous), using Boolean networks with perturbation (BNps) to model the dynamics of each class at the state level. Each class has its own BNp, which is partially known based on gene pathways. We employ a Gaussian model at the observation level to describe the expression values of the genes given the hidden binary states at each time point. We use expectation maximization (EM) to learn the BNps and the unknown model parameters, derive closed-form updates for the parameters, and propose a learning algorithm. After learning, a plug-in Bayes classifier is used to classify unlabeled trajectories, which can have missing data. Measuring gene expression at different times yields trajectories only when measurements come from a single cell. In multiple-cell scenarios, the expression values are averages over many cells with possibly different states. Via the central limit theorem, we propose another model for expression data in multiple-cell scenarios. Simulations demonstrate that single-cell trajectory data can outperform multiple-cell average expression data with respect to classification error, especially in high-noise situations. We also consider data generated via a mammalian cell-cycle network, both the wild type and a variant with a common mutation affecting p27.
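
The generative side of such a model can be sketched as follows. This is an illustration only: it assumes the common BNp formulation in which each gene's next value is its Boolean update XOR-ed with an independent Bernoulli(p) flip, and a Gaussian observation whose mean depends on the hidden binary state. The two-gene network, the parameter names, and the function `simulate_bnp` are hypothetical and are not taken from the paper; the EM learning and classification steps are not shown.

```python
import random


def simulate_bnp(update_fns, x0, steps, p, mu, sigma, seed=0):
    """Simulate hidden states and Gaussian observations of a toy BNp.

    update_fns: one Boolean update function per gene, mapping the full
                state tuple to that gene's next value (0 or 1).
    p:          per-gene perturbation (bit-flip) probability.
    mu:         (mean when the gene is 0, mean when the gene is 1).
    sigma:      standard deviation of the observation noise.
    """
    rng = random.Random(seed)
    state = tuple(x0)
    states, observations = [], []
    for _ in range(steps):
        # Deterministic network update, then independent random flips.
        updated = tuple(f(state) for f in update_fns)
        state = tuple(bit ^ (rng.random() < p) for bit in updated)
        states.append(state)
        # Noisy expression value for each gene given its hidden state.
        observations.append(tuple(rng.gauss(mu[bit], sigma) for bit in state))
    return states, observations


# Hypothetical two-gene network: gene 0 is repressed by gene 1,
# and gene 1 copies gene 0.
toy_network = [lambda s: 1 - s[1], lambda s: s[0]]

hidden, observed = simulate_bnp(
    toy_network, x0=(0, 0), steps=5, p=0.05, mu=(0.0, 1.0), sigma=0.2
)
for h, o in zip(hidden, observed):
    print(h, tuple(round(v, 2) for v in o))
```

Setting `p=0` and `sigma=0` reduces the sketch to the noiseless Boolean network, which is a convenient sanity check before adding perturbation and observation noise.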


Subject(s)
Gene Expression Profiling/methods , Gene Regulatory Networks/genetics , Single-Cell Analysis/methods , Algorithms , Animals , Bayes Theorem , Humans , Models, Genetic , Models, Statistical , Neoplasms/genetics , Neoplasms/metabolism
19.
IEEE Trans Biomed Eng ; 66(10): 2861-2868, 2019 10.
Article in English | MEDLINE | ID: mdl-30716030

ABSTRACT

Dengue has become one of the most important arthropod-borne diseases worldwide. Dengue phenotypes are based on laboratory and clinical exams, which are known to be inaccurate. OBJECTIVE: We present a machine learning approach for the prediction of dengue fever severity based solely on human genome data. METHODS: One hundred and two Brazilian dengue patients and controls were genotyped for 322 innate immunity single nucleotide polymorphisms (SNPs). Our model uses a support vector machine algorithm to find the optimal loci classification subset, and an artificial neural network (ANN) is then used to classify patients into dengue fever or severe dengue. RESULTS: The ANN trained on 13 key immune SNPs selected under dominant or recessive models produced median values of accuracy greater than 86%, and sensitivity and specificity over 98% and 51%, respectively. CONCLUSION: The proposed classification method, using only genome markers, can be used to identify individuals at high risk for developing the severe dengue phenotype even in uninfected conditions. SIGNIFICANCE: Our results suggest that the genetic context is a key element in phenotype definition in dengue. The methodology proposed here is extendable to other Mendelian-based and genetically influenced diseases.


Subject(s)
Genome, Human , Machine Learning , Severe Dengue/genetics , Brazil , Case-Control Studies , Genotype , Humans , Phenotype , Polymorphism, Single Nucleotide , Predictive Value of Tests , Prognosis , Sensitivity and Specificity
20.
Hum Immunol ; 69(2): 122-8, 2008 Feb.
Article in English | MEDLINE | ID: mdl-18361938

ABSTRACT

Dengue disease can clinically evolve from an asymptomatic and mild disease, known as dengue fever (DF), to a severe disease known as dengue hemorrhagic fever (DHF). Recent evidence has shown how host genetic factors can be correlated with severe dengue susceptibility or protection. Many of these genes, such as CD209, TNF-α, vitamin D receptor, and Fc gamma receptor IIA, are components of the innate immune system, suggesting that innate responses might have a role in dengue pathogenesis. MBL2 gene polymorphisms have been shown to modulate susceptibility or protection in many viral diseases. We investigated the involvement of the MBL2 gene in the dengue clinical outcome through the analysis of MBL2 exon 1 polymorphisms (at codons 52, 54, and 57) known to be associated with reduced serum levels of the MBL protein. The genotypes of 110 well-characterized dengue-positive patients were statistically analyzed to establish possible correlations between MBL2 polymorphisms and parameters such as sex, type of infection (primary or secondary response), race/ethnicity, course of infection, and age. We found significant correlations between wild-type AA MBL2 genotype and age as associated risk factors for development of dengue-related thrombocytopenia.


Subject(s)
Dengue Virus , Dengue/genetics , Mannose-Binding Lectin/genetics , Severe Dengue/genetics , Thrombocytopenia/genetics , Adolescent , Adult , Age Factors , Brazil , Dengue/blood , Disease Susceptibility , Female , Humans , Male , Mannose-Binding Lectin/blood , Middle Aged , Polymorphism, Genetic , Risk Factors , Severe Dengue/blood