Búsqueda | Portal Regional de la BVS

1.

Short-chain fatty acid production in accessible and inaccessible body pools as assessed by novel stable tracer pulse approach is reduced by aging independent of presence of COPD.

Kirschner, Sarah K; Ghane, Parisa; Park, Jaekwan K; Simbo, Sunday Y; Ivanov, Ivan; Braga-Neto, Ulisses M; Ten Have, Gabriëlla A M; Thaden, John J; Engelen, Mariëlle P K J; Deutz, Nicolaas E P.

Metabolism ; 141: 155399, 2023 04.

Artículo en Inglés | MEDLINE | ID: mdl-36642114

RESUMEN

BACKGROUND: Production rates of the short-chain fatty acids (SCFA) acetate, propionate, and butyrate, which are beneficial metabolites of the intestinal microbiota, are difficult to measure in humans due to inaccessibility of the intestine to perform measurements, and the high first-pass metabolism of SCFAs in colonocytes and liver. We developed a stable tracer pulse approach to estimate SCFA whole-body production (WBP) in the accessible pool representing the systemic circulation and interstitial fluid. Compartmental modeling of plasma enrichment data allowed us to additionally calculate SCFA kinetics and pool sizes in the inaccessible pool likely representing the intestine with microbiota. We also studied the effects of aging and the presence of Chronic Obstructive Pulmonary Disease (COPD) on SCFA kinetics. METHODS: In this observational study, we designed a two-compartmental model to determine SCFA kinetics in 31 young (20-29 y) and 71 older (55-87 y) adults, as well as in 33 clinically stable patients with moderate to very severe COPD (mean (SD) FEV1, 46.5 (16.2)% of predicted). Participants received in the fasted state a pulse containing stable tracers of acetate, propionate, and butyrate intravenously and blood was sampled four times over a 30 min period. We measured tracer-tracee ratios by GC-MS and used parameters obtained from two-exponential curve fitting to calculate non-compartmental SCFA WBP and perform compartmental analysis. Statistics were done by ANCOVA. RESULTS: Acetate, propionate, and butyrate WBP and fluxes between the accessible and inaccessible pools were lower in older than young adults (all q < 0.0001). Moreover, older participants had lower acetate (q < 0.0001) and propionate (q = 0.019) production rates in the inaccessible pool as well as smaller sizes of the accessible and inaccessible acetate pools (both q < 0.0001) than young participants. WBP, compartmental SCFA kinetics, and pool sizes did not differ between COPD patients and older adults (all q > 0.05). Overall and independent of the group studied, calculated production rates in the inaccessible pool were on average 7 (acetate), 11 (propionate), and 16 (butyrate) times higher than non-compartmental WBP, and sizes of inaccessible pools were 24 (acetate), 31 (propionate), and 55 (butyrate) times higher than sizes of accessible pools (all p < 0.0001). CONCLUSION: Non-compartmental production measurements of SCFAs in the accessible pool (i.e. systemic circulation) substantially underestimate the SCFA production in the inaccessible pool, which likely represents the intestine with microbiota, as assessed by compartmental analysis.

Asunto(s)

Ácidos Grasos Volátiles , Propionatos , Adulto Joven , Humanos , Anciano , Acetatos/metabolismo , Butiratos , Envejecimiento

2.

Adaptive Particle Filtering for Fault Detection in Partially-Observed Boolean Dynamical Systems.

Bahadorinejad, Arghavan; Imani, Mahdi; Braga-Neto, Ulisses M.

IEEE/ACM Trans Comput Biol Bioinform ; 17(4): 1105-1114, 2020.

Artículo en Inglés | MEDLINE | ID: mdl-30418915

RESUMEN

We propose a novel methodology for fault detection and diagnosis in partially-observed Boolean dynamical systems (POBDS). These are stochastic, highly nonlinear, and derivativeless systems, rendering difficult the application of classical fault detection and diagnosis methods. The methodology comprises two main approaches. The first addresses the case when the normal mode of operation is known but not the fault modes. It applies an innovations filter (IF) to detect deviations from the nominal normal mode of operation. The second approach is applicable when the set of possible fault models is finite and known, in which case we employ a multiple model adaptive estimation (MMAE) approach based on a likelihood-ratio (LR) statistic. Unknown system parameters are estimated by an adaptive expectation-maximization (EM) algorithm. Particle filtering techniques are used to reduce the computational complexity in the case of systems with large state-spaces. The efficacy of the proposed methodology is demonstrated by numerical experiments with a large gene regulatory network (GRN) with stuck-at faults observed through a single noisy time series of RNA-seq gene expression measurements.

Asunto(s)

Biología Computacional/métodos , Redes Reguladoras de Genes/genética , Modelos Genéticos , Algoritmos , RNA-Seq , Saccharomycetales/genética , Procesos Estocásticos

3.

On the Bias of Precision Estimation Under Separate Sampling.

Xie, Shuilian; Braga-Neto, Ulisses M.

Cancer Inform ; 18: 1176935119860822, 2019.

Artículo en Inglés | MEDLINE | ID: mdl-31360060

RESUMEN

Observational case-control studies for biomarker discovery in cancer studies often collect data that are sampled separately from the case and control populations. We present an analysis of the bias in the estimation of the precision of classifiers designed on separately sampled data. The analysis consists of both theoretical and numerical results, which show that classifier precision estimates can display strong bias under separating sampling, with the bias magnitude depending on the difference between the true case prevalence in the population and the sample prevalence in the data. We show that this bias is systematic in the sense that it cannot be reduced by increasing sample size. If information about the true case prevalence is available from public health records, then a modified precision estimator that uses the known prevalence displays smaller bias, which can in fact be reduced to zero as sample size increases under regularity conditions on the classification algorithm. The accuracy of the theoretical analysis and the performance of the precision estimators under separate sampling are confirmed by numerical experiments using synthetic and real data from published observational case-control studies. The results with real data confirmed that under separately sampled data, the usual estimator produces larger, ie, more optimistic, precision estimates than the estimator using the true prevalence value.

4.

Control of Gene Regulatory Networks Using Bayesian Inverse Reinforcement Learning.

Imani, Mahdi; Braga-Neto, Ulisses M.

IEEE/ACM Trans Comput Biol Bioinform ; 16(4): 1250-1261, 2019.

Artículo en Inglés | MEDLINE | ID: mdl-29993697

RESUMEN

Control of gene regulatory networks (GRNs) to shift gene expression from undesirable states to desirable ones has received much attention in recent years. Most of the existing methods assume that the cost of intervention at each state and time point, referred to as the immediate cost function, is fully known. In this paper, we employ the Partially-Observed Boolean Dynamical System (POBDS) signal model for a time sequence of noisy expression measurement from a Boolean GRN and develop a Bayesian Inverse Reinforcement Learning (BIRL) approach to address the realistic case in which the only available knowledge regarding the immediate cost function is provided by the sequence of measurements and interventions recorded in an experimental setting by an expert. The Boolean Kalman Smoother (BKS) algorithm is used for optimally mapping the available gene-expression data into a sequence of Boolean states, and then the BIRL method is efficiently combined with the Q-learning algorithm for quantification of the immediate cost function. The performance of the proposed methodology is investigated by applying a state-feedback controller to two GRN models: a melanoma WNT5A Boolean network and a p53-MDM2 negative feedback loop Boolean network, when the cost of the undesirable states, and thus the identity of the undesirable genes, is learned using the proposed methodology.

Asunto(s)

Perfilación de la Expresión Génica/métodos , Redes Reguladoras de Genes , Aprendizaje Automático , Algoritmos , Teorema de Bayes , Biología Computacional/métodos , Regulación Neoplásica de la Expresión Génica , Humanos , Melanoma/metabolismo , Modelos Biológicos , Modelos Genéticos , Proteínas Proto-Oncogénicas c-mdm2/metabolismo , Neoplasias Cutáneas/metabolismo , Programas Informáticos , Proteína p53 Supresora de Tumor/metabolismo , Proteína Wnt-5a/metabolismo

5.

Sequential Experimental Design for Optimal Structural Intervention in Gene Regulatory Networks Based on the Mean Objective Cost of Uncertainty.

Imani, Mahdi; Dehghannasiri, Roozbeh; Braga-Neto, Ulisses M; Dougherty, Edward R.

Cancer Inform ; 17: 1176935118790247, 2018.

Artículo en Inglés | MEDLINE | ID: mdl-30093796

RESUMEN

Scientists are attempting to use models of ever-increasing complexity, especially in medicine, where gene-based diseases such as cancer require better modeling of cell regulation. Complex models suffer from uncertainty and experiments are needed to reduce this uncertainty. Because experiments can be costly and time-consuming, it is desirable to determine experiments providing the most useful information. If a sequence of experiments is to be performed, experimental design is needed to determine the order. A classical approach is to maximally reduce the overall uncertainty in the model, meaning maximal entropy reduction. A recently proposed method takes into account both model uncertainty and the translational objective, for instance, optimal structural intervention in gene regulatory networks, where the aim is to alter the regulatory logic to maximally reduce the long-run likelihood of being in a cancerous state. The mean objective cost of uncertainty (MOCU) quantifies uncertainty based on the degree to which model uncertainty affects the objective. Experimental design involves choosing the experiment that yields the greatest reduction in MOCU. This article introduces finite-horizon dynamic programming for MOCU-based sequential experimental design and compares it with the greedy approach, which selects one experiment at a time without consideration of the full horizon of experiments. A salient aspect of the article is that it demonstrates the advantage of MOCU-based design over the widely used entropy-based design for both greedy and dynamic programming strategies and investigates the effect of model conditions on the comparative performances.

6.

Optimal Fault Detection and Diagnosis in Transcriptional Circuits Using Next-Generation Sequencing.

Bahadorinejad, Arghavan; Braga-Neto, Ulisses M.

IEEE/ACM Trans Comput Biol Bioinform ; 15(2): 516-525, 2018.

Artículo en Inglés | MEDLINE | ID: mdl-29610100

RESUMEN

We propose a methodology for model-based fault detection and diagnosis for stochastic Boolean dynamical systems indirectly observed through a single time series of transcriptomic measurements using Next Generation Sequencing (NGS) data. The fault detection consists of an innovations filter followed by a fault certification step, and requires no knowledge about the possible system faults. The innovations filter uses the optimal Boolean state estimator, called the Boolean Kalman Filter (BKF). In the presence of knowledge about the possible system faults, we propose an additional step of fault diagnosis based on a multiple model adaptive estimation (MMAE) method consisting of a bank of BKFs running in parallel. Performance is assessed by means of false detection and misdiagnosis rates, as well as average times until correct detection and diagnosis. The efficacy of the proposed methodology is demonstrated via numerical experiments using a p53-MDM2 negative feedback loop Boolean network with stuck-at faults that model molecular events commonly found in cancer.

Asunto(s)

Perfilación de la Expresión Génica/métodos , Secuenciación de Nucleótidos de Alto Rendimiento/métodos , Análisis de Secuencia de ADN/métodos , Algoritmos , Biología Computacional , Humanos , Neoplasias/genética , Neoplasias/metabolismo

7.

A fast Branch-and-Bound algorithm for U-curve feature selection

Atashpaz-Gargari, Esmaeil; Reis, Marcelo da Silva; Braga-Neto, Ulisses M.; Barrera, Junior; Dougherty, Edward R..

Pattern Recognit, v. 73, p. 172-188, jan. 2018

Artículo en Inglés | Sec. Est. Saúde SP, SESSP-IBPROD, Sec. Est. Saúde SP | ID: bud-2394

RESUMEN

We introduce a fast Branch-and-Bound algorithm for optimal feature selection based on a U-curve assumption for the cost function. The U-curve assumption, which is based on the peaking phenomenon of the classification error, postulates that the cost over the chains of the Boolean lattice that represents the search space describes a U-shaped curve. The proposed algorithm is an improvement over the original algorithm for U-curve feature selection introduced recently. Extensive simulation experiments are carried out to assess the performance of the proposed algorithm (IUBB), comparing it to the original algorithm (UBB), as well as exhaustive search and Generalized Sequential Forward Search. The results show that the IUBB algorithm makes fewer evaluations and achieves better solutions under a fixed computational budget. We also show that the IUBB algorithm is robust with respect to violations of the U-curve assumption. We investigate the application of the IUBB algorithm in the design of imaging W-operators and in classification feature selection, using the average mean conditional entropy (MCE) as the cost function for the search.

8.

A fast Branch-and-Bound algorithm for U-curve feature selection

Atashpaz-Gargari, Esmaeil; Reis, Marcelo da Silva; Braga-Neto, Ulisses M; Barrera, Junior; Dougherty, Edward R.

Pattern Recognit ; 73: p. 172-188, 2018.

Artículo en Inglés | Sec. Est. Saúde SP, SESSP-IBPROD, Sec. Est. Saúde SP | ID: but-ib14869

RESUMEN

We introduce a fast Branch-and-Bound algorithm for optimal feature selection based on a U-curve assumption for the cost function. The U-curve assumption, which is based on the peaking phenomenon of the classification error, postulates that the cost over the chains of the Boolean lattice that represents the search space describes a U-shaped curve. The proposed algorithm is an improvement over the original algorithm for U-curve feature selection introduced recently. Extensive simulation experiments are carried out to assess the performance of the proposed algorithm (IUBB), comparing it to the original algorithm (UBB), as well as exhaustive search and Generalized Sequential Forward Search. The results show that the IUBB algorithm makes fewer evaluations and achieves better solutions under a fixed computational budget. We also show that the IUBB algorithm is robust with respect to violations of the U-curve assumption. We investigate the application of the IUBB algorithm in the design of imaging W-operators and in classification feature selection, using the average mean conditional entropy (MCE) as the cost function for the search.

9.

BoolFilter: an R package for estimation and identification of partially-observed Boolean dynamical systems.

Mcclenny, Levi D; Imani, Mahdi; Braga-Neto, Ulisses M.

BMC Bioinformatics ; 18(1): 519, 2017 Nov 25.

Artículo en Inglés | MEDLINE | ID: mdl-29178844

RESUMEN

BACKGROUND: Gene regulatory networks govern the function of key cellular processes, such as control of the cell cycle, response to stress, DNA repair mechanisms, and more. Boolean networks have been used successfully in modeling gene regulatory networks. In the Boolean network model, the transcriptional state of each gene is represented by 0 (inactive) or 1 (active), and the relationship among genes is represented by logical gates updated at discrete time points. However, the Boolean gene states are never observed directly, but only indirectly and incompletely through noisy measurements based on expression technologies such as cDNA microarrays, RNA-Seq, and cell imaging-based assays. The Partially-Observed Boolean Dynamical System (POBDS) signal model is distinct from other deterministic and stochastic Boolean network models in removing the requirement of a directly observable Boolean state vector and allowing uncertainty in the measurement process, addressing the scenario encountered in practice in transcriptomic analysis. RESULTS: BoolFilter is an R package that implements the POBDS model and associated algorithms for state and parameter estimation. It allows the user to estimate the Boolean states, network topology, and measurement parameters from time series of transcriptomic data using exact and approximated (particle) filters, as well as simulate the transcriptomic data for a given Boolean network model. Some of its infrastructure, such as the network interface, is the same as in the previously published R package for Boolean Networks BoolNet, which enhances compatibility and user accessibility to the new package. CONCLUSIONS: We introduce the R package BoolFilter for Partially-Observed Boolean Dynamical Systems (POBDS). The BoolFilter package provides a useful toolbox for the bioinformatics community, with state-of-the-art algorithms for simulation of time series transcriptomic data as well as the inverse process of system identification from data obtained with various expression technologies such as cDNA microarrays, RNA-Seq, and cell imaging-based assays.

Asunto(s)

Programas Informáticos , Algoritmos , Perfilación de la Expresión Génica , Redes Reguladoras de Genes , Modelos Biológicos , Interfaz Usuario-Computador

10.

Bayesian estimation of the discrete coefficient of determination.

Chen, Ting; Braga-Neto, Ulisses M.

EURASIP J Bioinform Syst Biol ; 2016(1): 1, 2016 Dec.

Artículo en Inglés | MEDLINE | ID: mdl-26807133

RESUMEN

The discrete coefficient of determination (CoD) measures the nonlinear interaction between discrete predictor and target variables and has had far-reaching applications in Genomic Signal Processing. Previous work has addressed the inference of the discrete CoD using classical parametric and nonparametric approaches. In this paper, we introduce a Bayesian framework for the inference of the discrete CoD. We derive analytically the optimal minimum mean-square error (MMSE) CoD estimator, as well as a CoD estimator based on the Optimal Bayesian Predictor (OBP). For the latter estimator, exact expressions for its bias, variance, and root-mean-square (RMS) are given. The accuracy of both Bayesian CoD estimators with non-informative and informative priors, under fixed or random parameters, is studied via analytical and numerical approaches. We also demonstrate the application of the proposed Bayesian approach in the inference of gene regulatory networks, using gene-expression data from a previously published study on metastatic melanoma.

11.

Statistical Detection of Intrinsically Multivariate Predictive Genes.

Chen, Ting; Braga-Neto, Ulisses M.

IEEE/ACM Trans Comput Biol Bioinform ; 12(4): 951-63, 2015.

Artículo en Inglés | MEDLINE | ID: mdl-26357335

RESUMEN

Canalizing genes possess broad regulatory power over a wide swath of regulatory processes. On the other hand, it has been hypothesized that the phenomenon of intrinsically multivariate prediction (IMP) is associated with canalization. However, applications have relied on user-selectable thresholds on the IMP score to decide on the presence of IMP. A methodology is developed here that avoids arbitrary thresholds, by providing a statistical test for the IMP score. In addition, the proposed procedure allows the incorporation of prior knowledge if available, which can alleviate the problem of loss of power due to small sample sizes. The issue of multiplicity of tests is addressed by family-wise error rate (FWER) and false discovery rate (FDR) controlling approaches. The proposed methodology is demonstrated by experiments using synthetic and real gene-expression data from studies on melanoma and ionizing radiation (IR) responsive genes. The results with the real data identified DUSP1 and p53, two well-known canalizing genes associated with melanoma and IR response, respectively, as the genes with a clear majority of IMP predictor pairs. This validates the potential of the proposed methodology as a tool for discovery of canalizing genes from binary gene-expression data. The procedure is made available through an R package.

Asunto(s)

Biología Computacional/métodos , Perfilación de la Expresión Génica/métodos , Modelos Genéticos , Simulación por Computador , Humanos , Melanoma/genética , Análisis Multivariante , Procesos Estocásticos

12.

Bayesian ABC-MCMC Classification of Liquid Chromatography-Mass Spectrometry Data.

Banerjee, Upamanyu; Braga-Neto, Ulisses M.

Cancer Inform ; 14(Suppl 5): 175-182, 2015.

Artículo en Inglés | MEDLINE | ID: mdl-28096647

RESUMEN

Proteomics promises to revolutionize cancer treatment and prevention by facilitating the discovery of molecular biomarkers. Progress has been impeded, however, by the small-sample, high-dimensional nature of proteomic data. We propose the application of a Bayesian approach to address this issue in classification of proteomic profiles generated by liquid chromatography-mass spectrometry (LC-MS). Our approach relies on a previously proposed model of the LC-MS experiment, as well as on the theory of the optimal Bayesian classifier (OBC). Computation of the OBC requires the combination of a likelihood-free methodology called approximate Bayesian computation (ABC) as well as Markov chain Monte Carlo (MCMC) sampling. Numerical experiments using synthetic LC-MS data based on an actual human proteome indicate that the proposed ABC-MCMC classification rule outperforms classical methods such as support vector machines, linear discriminant analysis, and 3-nearest neighbor classification rules in the case when sample size is small or the number of selected proteins used to classify is large.

13.

Cross-validation under separate sampling: strong bias and how to correct it.

Braga-Neto, Ulisses M; Zollanvari, Amin; Dougherty, Edward R.

Bioinformatics ; 30(23): 3349-55, 2014 Dec 01.

Artículo en Inglés | MEDLINE | ID: mdl-25123902

RESUMEN

MOTIVATION: It is commonly assumed in pattern recognition that cross-validation error estimation is 'almost unbiased' as long as the number of folds is not too small. While this is true for random sampling, it is not true with separate sampling, where the populations are independently sampled, which is a common situation in bioinformatics. RESULTS: We demonstrate, via analytical and numerical methods, that classical cross-validation can have strong bias under separate sampling, depending on the difference between the sampling ratios and the true population probabilities. We propose a new separate-sampling cross-validation error estimator, and prove that it satisfies an 'almost unbiased' theorem similar to that of random-sampling cross-validation. We present two case studies with previously published data, which show that the results can change drastically if the correct form of cross-validation is used. AVAILABILITY AND IMPLEMENTATION: The source code in C++, along with the Supplementary Materials, is available at: http://gsp.tamu.edu/Publications/supplementary/zollanvari13/.

Asunto(s)

Sesgo de Selección , Humanos , Neoplasias/genética , Análisis de Secuencia por Matrices de Oligonucleótidos , Enfermedad de Parkinson/genética , Probabilidad , Transcriptoma

14.

Unbiased bootstrap error estimation for linear discriminant analysis.

Vu, Thang; Sima, Chao; Braga-Neto, Ulisses M; Dougherty, Edward R.

EURASIP J Bioinform Syst Biol ; 2014: 15, 2014 Dec.

Artículo en Inglés | MEDLINE | ID: mdl-28194165

RESUMEN

Convex bootstrap error estimation is a popular tool for classifier error estimation in gene expression studies. A basic question is how to determine the weight for the convex combination between the basic bootstrap estimator and the resubstitution estimator such that the resulting estimator is unbiased at finite sample sizes. The well-known 0.632 bootstrap error estimator uses asymptotic arguments to propose a fixed 0.632 weight, whereas the more recent 0.632+ bootstrap error estimator attempts to set the weight adaptively. In this paper, we study the finite sample problem in the case of linear discriminant analysis under Gaussian populations. We derive exact expressions for the weight that guarantee unbiasedness of the convex bootstrap error estimator in the univariate and multivariate cases, without making asymptotic simplifications. Using exact computation in the univariate case and an accurate approximation in the multivariate case, we obtain the required weight and show that it can deviate significantly from the constant 0.632 weight, depending on the sample size and Bayes error for the problem. The methodology is illustrated by application on data from a well-known cancer classification study.

15.

Modeling and systematic analysis of biomarker validation using selected reaction monitoring.

Atashpaz-Gargari, Esmaeil; Braga-Neto, Ulisses M; Dougherty, Edward R.

EURASIP J Bioinform Syst Biol ; 2014: 17, 2014 Dec.

Artículo en Inglés | MEDLINE | ID: mdl-28194167

RESUMEN

BACKGROUND: Discovery and validation of protein biomarkers with high specificity is the main challenge of current proteomics studies. Different mass spectrometry models are used as shotgun tools for the discovery of biomarkers. Validation of a set of selected biomarkers from a list of candidates is an important stage in the biomarker identification pipeline. Validation is typically done by triple quadrupole (QQQ) mass spectrometry (MS) running in selected reaction monitoring (SRM) mode. Although the individual modules of this pipeline have been studied, there is little work on integrating the components from a systematic point of view. RESULTS: This paper analyzes the SRM experiment pipeline in a systematic fashion, by modeling the main stages of the biomarker validation process. The proposed models for SRM and protein mixture are then used to study the effect of different parameters on the final performance of biomarker validation. Sample complexity, purification, peptide ionization, and peptide specificity are among the parameters of the SRM experiment that are studied. We focus on the sensitivity of the SRM pipeline to the working parameters, in order to identify the bottlenecks where time and energy should be spent in designing the experiment. CONCLUSIONS: The model presented in this paper can be utilized to observe the effect of different instrument and experimental settings on biomarker validation by SRM. On the other hand, the model would be beneficial for optimization of the work flow as well as identification of the bottlenecks of the pipeline. Also, it creates the required infrastructure for predicting the performance of the SRM pipeline for a specific setting of the parameters.

16.

Statistical detection of boolean regulatory relationships.

Chen, Ting; Braga-Neto, Ulisses M.

IEEE/ACM Trans Comput Biol Bioinform ; 10(5): 1310-21, 2013.

Artículo en Inglés | MEDLINE | ID: mdl-24384715

RESUMEN

A statistic tool for the detection of multivariate Boolean relationships is presented, with applications in the inference of gene regulatory mechanisms. A statistical test is developed for the detection of a nonzero discrete coefficient of determination (CoD) between predictor and target variables. This is done by framing the problem in the context of a stochastic logic model that naturally allows the inclusion of prior knowledge if available. The rejection region, p-value, statistical power, and confidence interval are derived and analyzed. Furthermore, the issue of multiplicity of tests due to presence of numerous candidate genes and logic relationships is addressed via FWER- and FDR-controlling approaches. The methodology is demonstrated by experiments using synthetic data and real data from a study on ionizing radiation (IR)-responsive genes. The results indicate that the proposed methodology is a promising tool for detection of gene regulatory relationships from gene-expression data. Software that implements the COD test is available online as an R package.

Asunto(s)

Algoritmos , Regulación de la Expresión Génica/genética , Modelos Logísticos , Modelos Genéticos , Proteoma/genética , Transducción de Señal/genética , Simulación por Computador

17.

Evaluation of the coverage and depth of transcriptome by RNA-Seq in chickens.

Wang, Ying; Ghaffari, Noushin; Johnson, Charles D; Braga-Neto, Ulisses M; Wang, Hui; Chen, Rui; Zhou, Huaijun.

BMC Bioinformatics ; 12 Suppl 10: S5, 2011 Oct 18.

Artículo en Inglés | MEDLINE | ID: mdl-22165852

RESUMEN

BACKGROUND: RNA-Seq is the recently developed high-throughput sequencing technology for profiling the entire transcriptome in any organism. It has several major advantages over current hybridization-based approach such as microarrays. However, the cost per sample by RNA-Seq is still prohibitive for most laboratories. With continued improvement in sequence output, it would be cost-effective if multiple samples are multiplexed and sequenced in a single lane with sufficient transcriptome coverage. The objective of this analysis is to evaluate what sequencing depth might be sufficient to interrogate gene expression profiling in the chicken by RNA-Seq. RESULTS: Two cDNA libraries from chicken lungs were sequenced initially, and 4.9 million (M) and 1.6 M (60 bp) reads were generated, respectively. With significant improvements in sequencing technology, two technical replicate cDNA libraries were re-sequenced. Totals of 29.6 M and 28.7 M (75 bp) reads were obtained with the two samples. More than 90% of annotated genes were detected in the data sets with 28.7-29.6 M reads, while only 68% of genes were detected in the data set with 1.6 M reads. The correlation coefficients of gene expression between technical replicates within the same sample were 0.9458 and 0.8442. To evaluate the appropriate depth needed for mRNA profiling, a random sampling method was used to generate different number of reads from each sample. There was a significant increase in correlation coefficients from a sequencing depth of 1.6 M to 10 M for all genes except highly abundant genes. No significant improvement was observed from the depth of 10 M to 20 M (75 bp) reads. CONCLUSION: The analysis from the current study demonstrated that 30 M (75 bp) reads is sufficient to detect all annotated genes in chicken lungs. Ten million (75 bp) reads could detect about 80% of annotated chicken genes, and RNA-Seq at this depth can serve as a replacement of microarray technology. Furthermore, the depth of sequencing had a significant impact on measuring gene expression of low abundant genes. Finally, the combination of experimental and simulation approaches is a powerful approach to address the relationship between the depth of sequencing and transcriptome coverage.

Asunto(s)

Pollos/genética , Perfilación de la Expresión Génica , Secuenciación de Nucleótidos de Alto Rendimiento , Pulmón/metabolismo , Análisis de Secuencia de ARN , Animales , Biblioteca de Genes , Anotación de Secuencia Molecular , ARN Mensajero/genética

18.

Description of a prospective 17DD yellow fever vaccine cohort in Recife, Brazil.

de Melo, Andréa Barbosa; da Silva, Maria da Paz C; Magalhães, Maria Cecília F; Gonzales Gil, Laura Helena Vega; Freese de Carvalho, Eduardo M; Braga-Neto, Ulisses M; Bertani, Giovani Rota; Marques, Ernesto T A; Cordeiro, Marli Tenório.

Am J Trop Med Hyg ; 85(4): 739-47, 2011 Oct.

Artículo en Inglés | MEDLINE | ID: mdl-21976581

RESUMEN

From September 2005 to March 2007, 238 individuals being vaccinated for the first time with the yellow fever (YF) -17DD vaccine were enrolled in a cohort established in Recife, Brazil. A prospective study indicated that, after immunization, anti-YF immunoglobulin M (IgM) and anti-YF IgG were present in 70.6% (IgM) and 98.3% (IgG) of the vaccinated subjects. All vaccinees developed protective immunity, which was detected by the plaque reduction neutralization test (PRNT) with a geometric mean titer of 892. Of the 238 individuals, 86.6% had IgG antibodies to dengue virus; however, the presence of anti-dengue IgG did not interfere significantly with the development of anti-YF neutralizing antibodies. In a separate retrospective study of individuals immunized with the 17DD vaccine, the PRNT values at 5 and 10 years post-vaccination remained positive but showed a significant decrease in neutralization titer (25% with PRNT titers < 100 after 5 years and 35% after 10 years).

Asunto(s)

Vacuna contra la Fiebre Amarilla/administración & dosificación , Anticuerpos Antivirales/sangre , Brasil , Ensayo de Inmunoadsorción Enzimática , Humanos , Pruebas de Neutralización , Estudios Prospectivos , Ensayo de Placa Viral , Vacuna contra la Fiebre Amarilla/inmunología

19.

High-dimensional bolstered error estimation.

Sima, Chao; Braga-Neto, Ulisses M; Dougherty, Edward R.

Bioinformatics ; 27(21): 3056-64, 2011 Nov 01.

Artículo en Inglés | MEDLINE | ID: mdl-21914630

RESUMEN

MOTIVATION: In small-sample settings, bolstered error estimation has been shown to perform better than cross-validation and competitively with bootstrap with regard to various criteria. The key issue for bolstering performance is the variance setting for the bolstering kernel. Heretofore, this variance has been determined in a non-parametric manner from the data. Although bolstering based on this variance setting works well for small feature sets, results can deteriorate for high-dimensional feature spaces. RESULTS: This article computes an optimal kernel variance depending on the classification rule, sample size, model and feature space, both the original number and the number remaining after feature selection. A key point is that the optimal variance is robust relative to the model. This allows us to develop a method for selecting a suitable variance to use in real-world applications where the model is not known, but the other factors in determining the optimal kernel are known. AVAILABILITY: Companion website at http://compbio.tgen.org/paper_supp/high_dim_bolstering. CONTACT: edward@mail.ece.tamu.edu.

Asunto(s)

Perfilación de la Expresión Génica , Algoritmos , Neoplasias de la Mama/genética , Neoplasias de la Mama/metabolismo , Interpretación Estadística de Datos , Femenino , Humanos , Mieloma Múltiple/genética , Mieloma Múltiple/metabolismo , Análisis de Secuencia por Matrices de Oligonucleótidos , Reproducibilidad de los Resultados , Tamaño de la Muestra

20.

The illusion of distribution-free small-sample classification in genomics.

Dougherty, Edward R; Zollanvari, Amin; Braga-Neto, Ulisses M.

Curr Genomics ; 12(5): 333-41, 2011 Aug.

Artículo en Inglés | MEDLINE | ID: mdl-22294876

RESUMEN

Classification has emerged as a major area of investigation in bioinformatics owing to the desire to discriminate phenotypes, in particular, disease conditions, using high-throughput genomic data. While many classification rules have been posed, there is a paucity of error estimation rules and an even greater paucity of theory concerning error estimation accuracy. This is problematic because the worth of a classifier depends mainly on its error rate. It is common place in bio-informatics papers to have a classification rule applied to a small labeled data set and the error of the resulting classifier be estimated on the same data set, most often via cross-validation, without any assumptions being made on the underlying feature-label distribution. Concomitant with a lack of distributional assumptions is the absence of any statement regarding the accuracy of the error estimate. Without such a measure of accuracy, the most common one being the root-mean-square (RMS), the error estimate is essentially meaningless and the worth of the entire paper is questionable. The concomitance of an absence of distributional assumptions and of a measure of error estimation accuracy is assured in small-sample settings because even when distribution-free bounds exist (and that is rare), the sample sizes required under the bounds are so large as to make them useless for small samples. Thus, distributional bounds are necessary and the distributional assumptions need to be stated. Owing to the epistemological dependence of classifiers on the accuracy of their estimated errors, scientifically meaningful distribution-free classification in high-throughput, small-sample biology is an illusion.

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

RESUMEN

Asunto(s)

RESUMEN

RESUMEN

Asunto(s)

RESUMEN

RESUMEN

RESUMEN

Asunto(s)

RESUMEN

RESUMEN

Asunto(s)

RESUMEN

RESUMEN

Asunto(s)

RESUMEN

RESUMEN

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

ENVIAR RESULTADO:

SELECCIÓN DE REFERENCIAS

DETALLE DE LA BÚSQUEDA