Search | Biblioteca Virtual en Salud Odontología. Uruguay

1.

A transcriptomic and epigenomic cell atlas of the mouse primary motor cortex.

Yao, Zizhen; Liu, Hanqing; Xie, Fangming; Fischer, Stephan; Adkins, Ricky S; Aldridge, Andrew I; Ament, Seth A; Bartlett, Anna; Behrens, M Margarita; Van den Berge, Koen; Bertagnolli, Darren; de Bézieux, Hector Roux; Biancalani, Tommaso; Booeshaghi, A Sina; Bravo, Héctor Corrada; Casper, Tamara; Colantuoni, Carlo; Crabtree, Jonathan; Creasy, Heather; Crichton, Kirsten; Crow, Megan; Dee, Nick; Dougherty, Elizabeth L; Doyle, Wayne I; Dudoit, Sandrine; Fang, Rongxin; Felix, Victor; Fong, Olivia; Giglio, Michelle; Goldy, Jeff; Hawrylycz, Mike; Herb, Brian R; Hertzano, Ronna; Hou, Xiaomeng; Hu, Qiwen; Kancherla, Jayaram; Kroll, Matthew; Lathia, Kanan; Li, Yang Eric; Lucero, Jacinta D; Luo, Chongyuan; Mahurkar, Anup; McMillen, Delissa; Nadaf, Naeem M; Nery, Joseph R; Nguyen, Thuc Nghi; Niu, Sheng-Yong; Ntranos, Vasilis; Orvis, Joshua; Osteen, Julia K.

Nature ; 598(7879): 103-110, 2021 10.

Article in English | MEDLINE | ID: mdl-34616066

ABSTRACT

Single-cell transcriptomics can provide quantitative molecular signatures for large, unbiased samples of the diverse cell types in the brain [1-3]. With the proliferation of multi-omics datasets, a major challenge is to validate and integrate results into a biological understanding of cell-type organization. Here we generated transcriptomes and epigenomes from more than 500,000 individual cells in the mouse primary motor cortex, a structure that has an evolutionarily conserved role in locomotion. We developed computational and statistical methods to integrate multimodal data and quantitatively validate cell-type reproducibility. The resulting reference atlas, containing over 56 neuronal cell types that are highly replicable across analysis methods, sequencing technologies and modalities, is a comprehensive molecular and genomic account of the diverse neuronal and non-neuronal cell types in the mouse primary motor cortex. The atlas includes a population of excitatory neurons that resemble pyramidal cells in layer 4 in other cortical regions [4]. We further discovered thousands of concordant marker genes and gene regulatory elements for these cell types. Our results highlight the complex molecular regulation of cell types in the brain and will directly enable the design of reagents to target specific cell types in the mouse primary motor cortex for functional analysis.

Subject(s)

Epigenomics , Gene Expression Profiling , Motor Cortex/cytology , Neurons/classification , Single-Cell Analysis , Transcriptome , Animals , Atlases as Topic , Datasets as Topic , Epigenesis, Genetic , Female , Male , Mice , Motor Cortex/anatomy & histology , Neurons/cytology , Neurons/metabolism , Organ Specificity , Reproducibility of Results

2.

Capturing discrete latent structures: choose LDs over PCs.

Alexander, Theresa A; Irizarry, Rafael A; Bravo, Héctor Corrada.

Biostatistics ; 24(1): 1-16, 2022 12 12.

Article in English | MEDLINE | ID: mdl-34467372

ABSTRACT

High-dimensional biological data collection across heterogeneous groups of samples has become increasingly common, creating high demand for dimensionality reduction techniques that capture the underlying structure of the data. Discovering low-dimensional embeddings that describe the separation of any underlying discrete latent structure in data is an important motivation for applying these techniques, since these latent classes can represent important sources of unwanted variability, such as batch effects, or interesting sources of signal, such as unknown cell types. The features that define this discrete latent structure are often hard to identify in high-dimensional data. Principal component analysis (PCA) is one of the most widely used methods as an unsupervised step for dimensionality reduction. This reduction technique finds linear transformations of the data which explain total variance. When the goal is detecting discrete structure, PCA is applied with the assumption that classes will be separated in directions of maximum variance. However, PCA will fail to accurately find discrete latent structure if this assumption does not hold. Visualization techniques, such as t-Distributed Stochastic Neighbor Embedding (t-SNE) and Uniform Manifold Approximation and Projection (UMAP), attempt to mitigate these problems with PCA by creating a low-dimensional space where similar objects are modeled by nearby points in the low-dimensional embedding and dissimilar objects are modeled by distant points with high probability. However, since t-SNE and UMAP are computationally expensive, a PCA reduction is often done before applying them, which makes them sensitive to PCA's shortcomings. Also, t-SNE is limited to only two or three dimensions as a visualization tool, which may not be adequate for retaining discriminatory information. The linear transformations of PCA are preferable to the non-linear transformations provided by methods like t-SNE and UMAP for interpretable feature weights.
Here, we propose iterative discriminant analysis (iDA), a dimensionality reduction technique designed to mitigate these limitations. iDA produces an embedding that carries discriminatory information which optimally separates latent clusters using linear transformations that permit post hoc analysis to determine features that define these latent structures.
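
To illustrate the abstract's central point, the following minimal sketch (not the iDA algorithm itself) simulates two classes that differ only along a low-variance axis, then compares the first principal component with Fisher's linear discriminant direction. Class labels are assumed known here purely for illustration, and every function name and simulation parameter is hypothetical.

```python
import math
import random


def covariance_2d(points):
    """Return (sxx, sxy, syy), the sample covariance of a list of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return sxx, sxy, syy


def unit(vx, vy):
    """Scale a 2-D vector to unit length."""
    norm = math.hypot(vx, vy)
    return vx / norm, vy / norm


def first_pc(points):
    """Direction of maximum total variance (leading eigenvector of the covariance)."""
    sxx, sxy, syy = covariance_2d(points)
    half_trace = (sxx + syy) / 2
    det = sxx * syy - sxy ** 2
    lam = half_trace + math.sqrt(max(half_trace ** 2 - det, 0.0))
    vx, vy = sxy, lam - sxx  # solves (S - lam * I) v = 0
    if abs(vx) + abs(vy) < 1e-12:  # covariance is already diagonal
        vx, vy = (1.0, 0.0) if sxx >= syy else (0.0, 1.0)
    return unit(vx, vy)


def fisher_direction(class0, class1):
    """Fisher's discriminant direction, proportional to Sw^-1 (m1 - m0)."""
    a0, b0, c0 = covariance_2d(class0)
    a1, b1, c1 = covariance_2d(class1)
    a, b, c = a0 + a1, b0 + b1, c0 + c1  # pooled within-class scatter
    dx = sum(p[0] for p in class1) / len(class1) - sum(p[0] for p in class0) / len(class0)
    dy = sum(p[1] for p in class1) / len(class1) - sum(p[1] for p in class0) / len(class0)
    det = a * c - b * b
    return unit((c * dx - b * dy) / det, (a * dy - b * dx) / det)


def separation(class0, class1, direction):
    """Squared gap between projected class means over the summed projected variances."""
    p0 = [p[0] * direction[0] + p[1] * direction[1] for p in class0]
    p1 = [p[0] * direction[0] + p[1] * direction[1] for p in class1]
    m0, m1 = sum(p0) / len(p0), sum(p1) / len(p1)
    v0 = sum((x - m0) ** 2 for x in p0) / (len(p0) - 1)
    v1 = sum((x - m1) ** 2 for x in p1) / (len(p1) - 1)
    return (m1 - m0) ** 2 / (v0 + v1)


def simulate(n=200, seed=0):
    """Two classes: large shared variance on x, small class shift on y."""
    rng = random.Random(seed)
    class0 = [(rng.gauss(0, 10), rng.gauss(-1, 0.5)) for _ in range(n)]
    class1 = [(rng.gauss(0, 10), rng.gauss(1, 0.5)) for _ in range(n)]
    return class0, class1


class0, class1 = simulate()
pc1 = first_pc(class0 + class1)
ld1 = fisher_direction(class0, class1)
print("PC1 direction:", pc1, "separation:", separation(class0, class1, pc1))
print("LD1 direction:", ld1, "separation:", separation(class0, class1, ld1))
```

In this toy setting PC1 follows the high-variance axis, which carries almost no class information, while the discriminant direction recovers the separating axis. The abstract's method addresses the harder case in which the class labels are latent.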

Subject(s)

Algorithms , Humans , Principal Component Analysis

3.

Multivariable association discovery in population-scale meta-omics studies.

Mallick, Himel; Rahnavard, Ali; McIver, Lauren J; Ma, Siyuan; Zhang, Yancong; Nguyen, Long H; Tickle, Timothy L; Weingart, George; Ren, Boyu; Schwager, Emma H; Chatterjee, Suvo; Thompson, Kelsey N; Wilkinson, Jeremy E; Subramanian, Ayshwarya; Lu, Yiren; Waldron, Levi; Paulson, Joseph N; Franzosa, Eric A; Bravo, Hector Corrada; Huttenhower, Curtis.

PLoS Comput Biol ; 17(11): e1009442, 2021 11.

Article in English | MEDLINE | ID: mdl-34784344

ABSTRACT

It is challenging to associate features such as human health outcomes, diet, environmental conditions, or other metadata to microbial community measurements, due in part to their quantitative properties. Microbiome multi-omics are typically noisy, sparse (zero-inflated), high-dimensional, extremely non-normal, and often in the form of count or compositional measurements. Here we introduce an optimized combination of novel and established methodology to assess multivariable association of microbial community features with complex metadata in population-scale observational studies. Our approach, MaAsLin 2 (Microbiome Multivariable Associations with Linear Models), uses generalized linear and mixed models to accommodate a wide variety of modern epidemiological studies, including cross-sectional and longitudinal designs, as well as a variety of data types (e.g., counts and relative abundances) with or without covariates and repeated measurements. To construct this method, we conducted a large-scale evaluation of a broad range of scenarios under which straightforward identification of meta-omics associations can be challenging. These simulation studies reveal that MaAsLin 2's linear model preserves statistical power in the presence of repeated measures and multiple covariates, while accounting for the nuances of meta-omics features and controlling false discovery. We also applied MaAsLin 2 to a microbial multi-omics dataset from the Integrative Human Microbiome (HMP2) project which, in addition to reproducing established results, revealed a unique, integrated landscape of inflammatory bowel diseases (IBD) across multiple time points and omics profiles.
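
The abstract refers to controlling false discovery across many feature-by-metadata tests. As a hedged illustration, the sketch below implements the Benjamini-Hochberg adjustment, one standard false discovery rate procedure. Whether and how MaAsLin 2 applies it is not stated in the abstract, and the p-values are invented for the example.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values, one per microbial feature tested against a covariate.
example_pvalues = [0.01, 0.04, 0.03, 0.20]
example_qvalues = benjamini_hochberg(example_pvalues)
print(example_qvalues)
```

Features whose adjusted value falls below a chosen threshold (for example 0.25 or 0.05, an analyst's choice) would then be reported as associations.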

Subject(s)

Computational Biology , Gastrointestinal Microbiome , Multivariate Analysis , Computer Simulation , Humans , Inflammatory Bowel Diseases/genetics , Inflammatory Bowel Diseases/metabolism , Inflammatory Bowel Diseases/pathology

4.

Terminus enables the discovery of data-driven, robust transcript groups from RNA-seq data.

Sarkar, Hirak; Srivastava, Avi; Bravo, Héctor Corrada; Love, Michael I; Patro, Rob.

Bioinformatics ; 36(Suppl_1): i102-i110, 2020 07 01.

Article in English | MEDLINE | ID: mdl-32657377

ABSTRACT

MOTIVATION: Advances in sequencing technology, inference algorithms and differential testing methodology have enabled transcript-level analysis of RNA-seq data. Yet, the inherent inferential uncertainty in transcript-level abundance estimation, even among the most accurate approaches, means that robust transcript-level analysis often remains a challenge. Conversely, gene-level analysis remains a common and robust approach for understanding RNA-seq data, but it coarsens the resulting analysis to the level of genes, even if the data strongly support specific transcript-level effects. RESULTS: We introduce a new data-driven approach for grouping together transcripts in an experiment based on their inferential uncertainty. Transcripts that share large numbers of ambiguously-mapping fragments with other transcripts, in complex patterns, often cannot have their abundances confidently estimated. Yet, the total transcriptional output of that group of transcripts will have greatly reduced inferential uncertainty, thus allowing more robust and confident downstream analysis. Our approach, implemented in the tool terminus, groups together transcripts in a data-driven manner allowing transcript-level analysis where it can be confidently supported, and deriving transcriptional groups where the inferential uncertainty is too high to support a transcript-level result. AVAILABILITY AND IMPLEMENTATION: Terminus is implemented in Rust, and is freely available and open source. It can be obtained from https://github.com/COMBINE-lab/Terminus. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
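
The claim that a group's total output has much lower inferential uncertainty than its member transcripts can be illustrated with a toy simulation. The sketch below is not Terminus. It assumes hypothetical replicate abundance estimates (such as bootstrap or posterior samples) for two transcripts that share ambiguously mapping fragments, so that the split between them is unstable while their sum is not.

```python
import random
import statistics


def simulate_replicates(n_reps=200, group_total=100.0, seed=0):
    """Hypothetical replicate estimates for two transcripts sharing fragments."""
    rng = random.Random(seed)
    replicates = []
    for _ in range(n_reps):
        total = rng.gauss(group_total, 2.0)  # the group total is well determined
        share = rng.uniform(0.2, 0.8)        # the split between transcripts is not
        replicates.append((total * share, total * (1.0 - share)))
    return replicates


def coefficient_of_variation(values):
    """Standard deviation relative to the mean, a simple uncertainty summary."""
    return statistics.stdev(values) / statistics.mean(values)


replicates = simulate_replicates()
cv_a = coefficient_of_variation([a for a, _ in replicates])
cv_b = coefficient_of_variation([b for _, b in replicates])
cv_group = coefficient_of_variation([a + b for a, b in replicates])
print(f"transcript A: {cv_a:.3f}  transcript B: {cv_b:.3f}  group: {cv_group:.3f}")
```

Each transcript alone shows large relative variability across replicates, whereas the grouped abundance is stable, which is the situation in which collapsing transcripts into a group supports a more confident downstream analysis.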

Subject(s)

Gene Expression Profiling , Software , Algorithms , RNA-Seq , Sequence Analysis, RNA

5.

Heterogeneity of transcription factor binding specificity models within and across cell lines.

Sharmin, Mahfuza; Bravo, Héctor Corrada; Hannenhalli, Sridhar.

Genome Res ; 26(8): 1110-23, 2016 08.

Article in English | MEDLINE | ID: mdl-27311443

ABSTRACT

Complex gene expression patterns are mediated by the binding of transcription factors (TFs) to specific genomic loci. The in vivo occupancy of a TF is, in large part, determined by the TF's DNA binding interaction partners, motivating genomic context-based models of TF occupancy. However, approaches thus far have assumed a uniform TF binding model to explain genome-wide cell-type-specific binding sites. Therefore, the cell type heterogeneity of TF occupancy models, as well as the extent to which binding rules underlying a TF's occupancy are shared across cell types, has not been investigated. Here, we develop an ensemble-based approach (TRISECT) to identify the heterogeneous binding rules for cell-type-specific TF occupancy and analyze the inter-cell-type sharing of such rules. Comprehensive analysis of 23 TFs, each with ChIP-seq data in four to 12 different cell types, shows that by explicitly capturing the heterogeneity of binding rules, TRISECT accurately identifies in vivo TF occupancy. Importantly, many of the binding rules derived from individual cell types are shared across cell types and reveal distinct yet functionally coherent putative target genes in different cell types. Closer inspection of the predicted cell-type-specific interaction partners provides insights into the context-specific functional landscape of a TF. Together, our novel ensemble-based approach reveals, for the first time, a widespread heterogeneity of binding rules, comprising the interaction partners within a cell type, many of which nevertheless transcend cell types. Notably, the putative targets of shared binding rules in different cell types, while distinct, exhibit significant functional coherence.

Subject(s)

DNA-Binding Proteins/genetics , Genetic Heterogeneity , Protein Binding/genetics , Transcription Factors/genetics , Binding Sites/genetics , Cell Lineage/genetics , Computational Biology , Gene Expression Regulation , Genomics , Humans , Sensitivity and Specificity

6.

Smooth quantile normalization.

Hicks, Stephanie C; Okrah, Kwame; Paulson, Joseph N; Quackenbush, John; Irizarry, Rafael A; Bravo, Héctor Corrada.

Biostatistics ; 19(2): 185-198, 2018 04 01.

Article in English | MEDLINE | ID: mdl-29036413

ABSTRACT

Between-sample normalization is a critical step in genomic data analysis to remove systematic bias and unwanted technical variation in high-throughput data. Global normalization methods are based on the assumption that observed variability in global properties is due to technical reasons and is unrelated to the biology of interest. For example, some methods correct for differences in sequencing read counts by scaling features to have similar median values across samples, but these fail to reduce other forms of unwanted technical variation. Methods such as quantile normalization transform the statistical distributions across samples to be the same and assume global differences in the distribution are induced by only technical variation. However, it remains unclear how to proceed with normalization if these assumptions are violated, for example, if there are global differences in the statistical distributions between biological conditions or groups, and external information, such as negative or control features, is not available. Here, we introduce a generalization of quantile normalization, referred to as smooth quantile normalization (qsmooth), which is based on the assumption that the statistical distribution of each sample should be the same (or have the same distributional shape) within biological groups or conditions, but allowing that they may differ between groups. We illustrate the advantages of our method on several high-throughput datasets with global differences in distributions corresponding to different biological conditions. We also perform a Monte Carlo simulation study to illustrate the bias-variance tradeoff and root mean squared error of qsmooth compared to other global normalization methods. A software implementation is available from https://github.com/stephaniehicks/qsmooth.
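
The contrast between forcing one distribution on all samples and forcing one distribution per biological group can be sketched as follows. This is a simplified illustration, not the qsmooth implementation: the `weight` parameter is a hypothetical fixed constant that blends the overall and group-specific reference quantiles, whereas the abstract does not describe how qsmooth chooses between them. Samples are assumed to have equal length, and ties are broken by sort order.

```python
def reference_quantiles(samples):
    """Average the sorted values across samples, position by position."""
    sorted_samples = [sorted(s) for s in samples]
    n = len(sorted_samples[0])
    return [sum(s[k] for s in sorted_samples) / len(sorted_samples) for k in range(n)]


def assign_by_rank(sample, target):
    """Replace each value with the target quantile that matches its rank."""
    order = sorted(range(len(sample)), key=lambda i: sample[i])
    out = [0.0] * len(sample)
    for rank, i in enumerate(order):
        out[i] = target[rank]
    return out


def blended_quantile_normalize(samples, groups, weight):
    """weight=1.0 gives standard quantile normalization across all samples;
    weight=0.0 gives quantile normalization within each group separately."""
    overall = reference_quantiles(samples)
    group_refs = {
        g: reference_quantiles([s for s, lab in zip(samples, groups) if lab == g])
        for g in set(groups)
    }
    result = []
    for sample, g in zip(samples, groups):
        target = [weight * o + (1.0 - weight) * r for o, r in zip(overall, group_refs[g])]
        result.append(assign_by_rank(sample, target))
    return result


# Hypothetical data: group "b" is globally about ten times higher than group "a".
samples = [[3, 1, 2], [2, 4, 6], [10, 20, 30], [20, 40, 60]]
groups = ["a", "a", "b", "b"]
print(blended_quantile_normalize(samples, groups, weight=1.0))
print(blended_quantile_normalize(samples, groups, weight=0.0))
```

With `weight=1.0` the global difference between groups is erased, which is appropriate only if that difference is technical. With `weight=0.0` samples are made comparable within each group while the between-group difference is preserved.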

Subject(s)

Biostatistics/methods , Data Interpretation, Statistical , Genomics/statistics & numerical data , High-Throughput Nucleotide Sequencing/statistics & numerical data , Models, Statistical , Humans

7.

gEAR: Gene Expression Analysis Resource portal for community-driven, multi-omic data exploration.

Orvis, Joshua; Gottfried, Brian; Kancherla, Jayaram; Adkins, Ricky S; Song, Yang; Dror, Amiel A; Olley, Dustin; Rose, Kevin; Chrysostomou, Elena; Kelly, Michael C; Milon, Beatrice; Matern, Maggie S; Azaiez, Hela; Herb, Brian; Colantuoni, Carlo; Carter, Robert L; Ament, Seth A; Kelley, Matthew W; White, Owen; Bravo, Hector Corrada; Mahurkar, Anup; Hertzano, Ronna.

Nat Methods ; 18(8): 843-844, 2021 08.

Article in English | MEDLINE | ID: mdl-34172972

Subject(s)

Algorithms , Brain/metabolism , Computational Biology/methods , Gene Expression Regulation , Genomics/methods , Software , Transcriptome , Computer Graphics , Humans

8.

Orchestrating high-throughput genomic analysis with Bioconductor.

Huber, Wolfgang; Carey, Vincent J; Gentleman, Robert; Anders, Simon; Carlson, Marc; Carvalho, Benilton S; Bravo, Hector Corrada; Davis, Sean; Gatto, Laurent; Girke, Thomas; Gottardo, Raphael; Hahne, Florian; Hansen, Kasper D; Irizarry, Rafael A; Lawrence, Michael; Love, Michael I; MacDonald, James; Obenchain, Valerie; Oles, Andrzej K; Pagès, Hervé; Reyes, Alejandro; Shannon, Paul; Smyth, Gordon K; Tenenbaum, Dan; Waldron, Levi; Morgan, Martin.

Nat Methods ; 12(2): 115-21, 2015 Feb.

Article in English | MEDLINE | ID: mdl-25633503

ABSTRACT

Bioconductor is an open-source, open-development software project for the analysis and comprehension of high-throughput data in genomics and molecular biology. The project aims to enable interdisciplinary research, collaboration and rapid development of scientific software. Based on the statistical programming language R, Bioconductor comprises 934 interoperable packages contributed by a large, diverse community of scientists. Packages cover a range of bioinformatic and statistical applications. They undergo formal initial review and continuous automated testing. We present an overview for prospective users and contributors.

Subject(s)

Computational Biology , Gene Expression Profiling , Genomics/methods , High-Throughput Screening Assays/methods , Software , Programming Languages , User-Computer Interface

9.

Epiviz: interactive visual analytics for functional genomics data.

Chelaru, Florin; Smith, Llewellyn; Goldstein, Naomi; Bravo, Héctor Corrada.

Nat Methods ; 11(9): 938-40, 2014 Sep.

Article in English | MEDLINE | ID: mdl-25086505

ABSTRACT

Visualization is an integral aspect of genomics data analysis. Algorithmic-statistical analysis and interactive visualization are most effective when used iteratively. Epiviz (http://epiviz.cbcb.umd.edu/), a web-based genome browser, and the Epivizr Bioconductor package allow interactive, extensible and reproducible visualization within a state-of-the-art data-analysis platform.

Subject(s)

Chromosome Mapping/methods , Data Mining/methods , Databases, Genetic , Genomics/methods , Internet , Software , User-Computer Interface , Algorithms , Database Management Systems

10.

BatchQC: interactive software for evaluating sample and batch effects in genomic data.

Manimaran, Solaiappan; Selby, Heather Marie; Okrah, Kwame; Ruberman, Claire; Leek, Jeffrey T; Quackenbush, John; Haibe-Kains, Benjamin; Bravo, Hector Corrada; Johnson, W Evan.

Bioinformatics ; 32(24): 3836-3838, 2016 12 15.

Article in English | MEDLINE | ID: mdl-27540268

ABSTRACT

Sequencing and microarray samples often are collected or processed in multiple batches or at different times. This often produces technical biases that can lead to incorrect results in the downstream analysis. There are several existing batch adjustment tools for '-omics' data, but they do not indicate a priori whether adjustment needs to be conducted or how correction should be applied. We present a software pipeline, BatchQC, which addresses these issues using interactive visualizations and statistics that evaluate the impact of batch effects in a genomic dataset. BatchQC can also apply existing adjustment tools and allow users to evaluate their benefits interactively. We used the BatchQC pipeline on both simulated and real data to demonstrate the effectiveness of this software toolkit. AVAILABILITY AND IMPLEMENTATION: BatchQC is available through Bioconductor: http://bioconductor.org/packages/BatchQC and GitHub: https://github.com/mani2012/BatchQC. CONTACT: wej@bu.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Computational Biology/methods , Genomics/methods , Software , Genome , Humans , User-Computer Interface

11.

Individual-specific changes in the human gut microbiota after challenge with enterotoxigenic Escherichia coli and subsequent ciprofloxacin treatment.

Pop, Mihai; Paulson, Joseph N; Chakraborty, Subhra; Astrovskaya, Irina; Lindsay, Brianna R; Li, Shan; Bravo, Héctor Corrada; Harro, Clayton; Parkhill, Julian; Walker, Alan W; Walker, Richard I; Sack, David A; Stine, O Colin.

BMC Genomics ; 17: 440, 2016 06 08.

Article in English | MEDLINE | ID: mdl-27277524

ABSTRACT

BACKGROUND: Enterotoxigenic Escherichia coli (ETEC) is a major cause of diarrhea in inhabitants from low-income countries and in visitors to these countries. The impact of the human intestinal microbiota on the initiation and progression of ETEC diarrhea is not yet well understood. RESULTS: We used 16S rRNA (ribosomal RNA) gene sequencing to study changes in the fecal microbiota of 12 volunteers during a human challenge study with ETEC (H10407) and subsequent treatment with ciprofloxacin. Five subjects developed severe diarrhea and seven experienced few or no symptoms. Diarrheal symptoms were associated with high concentrations of fecal E. coli as measured by quantitative culture, quantitative PCR, and normalized number of 16S rRNA gene sequences. Large changes in other members of the microbiota varied greatly from individual to individual, whether or not diarrhea occurred. Nonetheless the variation within an individual was small compared to variation between individuals. Ciprofloxacin treatment reorganized microbiota populations; however, the original structure was largely restored at one and three month follow-up visits. CONCLUSION: Symptomatic ETEC infections, but not asymptomatic infections, were associated with high fecal concentrations of E. coli. Both infection and ciprofloxacin treatment caused variable changes in other bacteria that generally reverted to baseline levels after three months.

Subject(s)

Ciprofloxacin/therapeutic use , Enterotoxigenic Escherichia coli/drug effects , Enterotoxigenic Escherichia coli/physiology , Escherichia coli Infections/drug therapy , Escherichia coli Infections/microbiology , Gastrointestinal Microbiome/drug effects , Adult , Ciprofloxacin/pharmacology , Diarrhea/drug therapy , Diarrhea/microbiology , Feces/microbiology , Female , High-Throughput Nucleotide Sequencing , Humans , Male , Metagenome , Metagenomics/methods , Middle Aged , RNA, Ribosomal, 16S , ROC Curve , Treatment Outcome , Young Adult

12.

Differential abundance analysis for microbial marker-gene surveys.

Paulson, Joseph N; Stine, O Colin; Bravo, Héctor Corrada; Pop, Mihai.

Nat Methods ; 10(12): 1200-2, 2013 Dec.

Article in English | MEDLINE | ID: mdl-24076764

ABSTRACT

We introduce a methodology to assess differential abundance in sparse high-throughput microbial marker-gene survey data. Our approach, implemented in the metagenomeSeq Bioconductor package, relies on a novel normalization technique and a statistical model that accounts for undersampling, a common feature of large-scale marker-gene studies. Using simulated data and several published microbiota data sets, we show that metagenomeSeq outperforms the tools currently used in this field.
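
The abstract does not name its normalization technique. The sketch below assumes a cumulative-sum-scaling style of normalization, in which each sample is scaled by the sum of its counts up to a chosen quantile rather than by its total, so that a few dominant taxa do not drive the scaling factor. The fixed quantile, the quantile rule, the scale constant, and the function names are illustrative assumptions and not the metagenomeSeq implementation.

```python
import math


def css_scaling_factor(counts, quantile=0.5):
    """Sum of the counts that do not exceed the given quantile of nonzero counts."""
    nonzero = sorted(c for c in counts if c > 0)
    if not nonzero:
        raise ValueError("sample has no nonzero counts")
    cutoff = nonzero[max(math.ceil(quantile * len(nonzero)) - 1, 0)]
    return sum(c for c in nonzero if c <= cutoff)


def css_normalize(counts, quantile=0.5, scale=1000.0):
    """Scale one sample's counts by its cumulative-sum factor."""
    factor = css_scaling_factor(counts, quantile)
    return [scale * c / factor for c in counts]


def total_sum_normalize(counts, scale=1000.0):
    """Conventional scaling by the sample's total count, for comparison."""
    total = sum(counts)
    return [scale * c / total for c in counts]


# Two hypothetical samples that differ only in one dominant taxon.
sample_a = [0, 1, 2, 3, 100]
sample_b = [0, 1, 2, 3, 1000]
print(css_normalize(sample_a)[:4], css_normalize(sample_b)[:4])
print(total_sum_normalize(sample_a)[:4], total_sum_normalize(sample_b)[:4])
```

Under total-sum scaling the low-abundance taxa appear to differ between the two samples only because the dominant taxon changed. Scaling by the lower part of the count distribution leaves them identical in this toy case.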

Subject(s)

Genetic Markers , Metagenomics/methods , Microbiota , RNA, Ribosomal, 16S/genetics , Algorithms , Animals , Area Under Curve , Cluster Analysis , Computer Simulation , Databases, Genetic , Gene Expression Profiling/methods , Genetic Variation , Humans , Intestines/microbiology , Mice , Models, Genetic , Models, Statistical , Normal Distribution , Phenotype , Sequence Analysis, DNA , Software

13.

Distinct genomic and epigenomic features demarcate hypomethylated blocks in colon cancer.

Sharmin, Mahfuza; Bravo, Héctor Corrada; Hannenhalli, Sridhar.

BMC Cancer ; 16: 88, 2016 Feb 11.

Article in English | MEDLINE | ID: mdl-26868017

ABSTRACT

BACKGROUND: Large mega base-pair genomic regions show robust alterations in DNA methylation levels in multiple cancers. A vast majority of these regions are hypomethylated in cancers. These regions are generally enriched for CpG islands, Lamin Associated Domains and Large organized chromatin lysine modification domains, and are associated with stochastic variability in gene expression. Given the size and consistency of hypomethylated blocks (HMB) across cancer types, we hypothesized that the immediate causes of methylation instability are likely to be encoded in the genomic region near HMB boundaries, in terms of specific genomic or epigenomic signatures. However, a detailed characterization of the HMB boundaries has not been reported. METHOD: Here, we focused on ~13 k HMBs, encompassing approximately half of the genome, identified in colon cancer. We modeled the genomic features of HMB boundaries by Random Forest to identify their salient features, in terms of transcription factor (TF) binding motifs. Additionally, we analyzed various epigenomic marks and chromatin structural features of HMB boundaries relative to the non-HMB genomic regions. RESULT: We found that the classical promoter epigenomic mark, H3K4me3, is highly enriched at HMB boundaries, as are CTCF bound sites. HMB boundaries harbor distinct combinations of TF motifs. Our Random Forest model based on TF motifs can accurately distinguish boundaries not only from regions inside and outside HMBs, but surprisingly, from active promoters as well. Interestingly, the distinguishing TFs and their interacting proteins are involved in chromatin modification. Finally, HMB boundaries significantly coincide with the boundaries of Topologically Associating Domains of the chromatin. CONCLUSION: Our analyses suggest that the overall architecture of HMBs is guided by pre-existing chromatin architecture, and is associated with aberrant activity of promoter-like sequences at the boundary.

Subject(s)

Colonic Neoplasms/genetics , DNA Methylation/genetics , Epigenomics , Genome, Human , Cell Line, Tumor , Chromatin/genetics , Colonic Neoplasms/pathology , CpG Islands/genetics , Histones/genetics , Humans , Promoter Regions, Genetic

14.

Tackling the widespread and critical impact of batch effects in high-throughput data.

Leek, Jeffrey T; Scharpf, Robert B; Bravo, Héctor Corrada; Simcha, David; Langmead, Benjamin; Johnson, W Evan; Geman, Donald; Baggerly, Keith; Irizarry, Rafael A.

Nat Rev Genet ; 11(10): 733-9, 2010 10.

Article in English | MEDLINE | ID: mdl-20838408

ABSTRACT

High-throughput technologies are widely used, for example to assay genetic variants, gene and protein expression, and epigenetic modifications. One often overlooked complication with such studies is batch effects, which occur because measurements are affected by laboratory conditions, reagent lots and personnel differences. This becomes a major problem when batch effects are correlated with an outcome of interest and lead to incorrect conclusions. Using both published studies and our own analyses, we argue that batch effects (as well as other technical and biological artefacts) are widespread and critical to address. We review experimental and computational approaches for doing so.

Subject(s)

Biotechnology/methods , Genomics/methods , Oligonucleotide Array Sequence Analysis/methods , Sequence Analysis, DNA/methods , Biotechnology/standards , Biotechnology/statistics & numerical data , Computational Biology/methods , Genomics/standards , Genomics/statistics & numerical data , Oligonucleotide Array Sequence Analysis/standards , Oligonucleotide Array Sequence Analysis/statistics & numerical data , Periodicals as Topic/standards , Research Design/standards , Research Design/statistics & numerical data , Sequence Analysis, DNA/standards , Sequence Analysis, DNA/statistics & numerical data

15.

Gene expression anti-profiles as a basis for accurate universal cancer signatures.

Bravo, Héctor Corrada; Pihur, Vasyl; McCall, Matthew; Irizarry, Rafael A; Leek, Jeffrey T.

BMC Bioinformatics ; 13: 272, 2012 Oct 22.

Article in English | MEDLINE | ID: mdl-23088656

ABSTRACT

BACKGROUND: Early screening for cancer is arguably one of the greatest public health advances over the last fifty years. However, many cancer screening tests are invasive (digital rectal exams), expensive (mammograms, imaging) or both (colonoscopies). This has spurred growing interest in developing genomic signatures that can be used for cancer diagnosis and prognosis. However, progress has been slowed by heterogeneity in cancer profiles and the lack of effective computational prediction tools for this type of data. RESULTS: We developed anti-profiles as a first step towards translating experimental findings suggesting that stochastic across-sample hyper-variability in the expression of specific genes is a stable and general property of cancer into predictive and diagnostic signatures. Using single-chip microarray normalization and quality assessment methods, we developed an anti-profile for colon cancer in tissue biopsy samples. To demonstrate the translational potential of our findings, we applied the signature developed in the tissue samples, without any further retraining or normalization, to screen patients for colon cancer based on genomic measurements from peripheral blood in an independent study (AUC of 0.89). This method achieved higher accuracy than the signature underlying commercially available peripheral blood screening tests for colon cancer (AUC of 0.81). We also confirmed the existence of hyper-variable genes across a range of cancer types and found that a significant proportion of tissue-specific genes are hyper-variable in cancer. Based on these observations, we developed a universal cancer anti-profile that accurately distinguishes cancer from normal regardless of tissue type (ten-fold cross-validation AUC > 0.92). CONCLUSIONS: We have introduced anti-profiles as a new approach for developing cancer genomic signatures that specifically takes advantage of gene expression heterogeneity. 
We have demonstrated that anti-profiles can be successfully applied to develop peripheral-blood based diagnostics for cancer and used anti-profiles to develop a highly accurate universal cancer signature. By using single-chip normalization and quality assessment methods, no further retraining of signatures developed by the anti-profile approach would be required before their application in clinical settings. Our results suggest that anti-profiles may be used to develop inexpensive and non-invasive universal cancer screening tests.
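
The abstract reports performance as AUC but does not spell out how an anti-profile is scored. As one plausible reading, sketched under stated assumptions, a sample can be scored by counting the genes whose expression falls outside a range observed in normal samples, and the scores can then be evaluated by AUC. The median plus or minus `k` MAD range, the value of `k`, and all names below are hypothetical and are not taken from the paper.

```python
import statistics


def normal_ranges(normal_samples, k=3.0):
    """Per-gene (low, high) range from normal samples: median +/- k * MAD."""
    n_genes = len(normal_samples[0])
    ranges = []
    for g in range(n_genes):
        values = [sample[g] for sample in normal_samples]
        med = statistics.median(values)
        mad = statistics.median([abs(v - med) for v in values])
        ranges.append((med - k * mad, med + k * mad))
    return ranges


def deviation_score(sample, ranges):
    """Number of genes whose expression lies outside the normal range."""
    return sum(1 for v, (low, high) in zip(sample, ranges) if v < low or v > high)


def auc(positive_scores, negative_scores):
    """Probability that a positive outranks a negative, counting ties as half."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical expression values for two genes in five normal samples.
normals = [[5.0, 5.0], [5.2, 4.8], [4.8, 5.2], [5.1, 4.9], [4.9, 5.1]]
ranges = normal_ranges(normals)
tumor_scores = [deviation_score(s, ranges) for s in ([7.0, 5.0], [2.0, 9.0])]
control_scores = [deviation_score(s, ranges) for s in ([5.0, 5.0], [5.1, 5.0])]
print(tumor_scores, control_scores, auc(tumor_scores, control_scores))
```

Because such a score depends only on deviation from a normal range rather than on the direction of change, it can in principle tolerate the sample-to-sample heterogeneity that the abstract identifies as an obstacle for conventional signatures.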

Subject(s)

Colonic Neoplasms/genetics , Gene Expression Profiling/methods , Area Under Curve , Biomarkers, Tumor/blood , Colonic Neoplasms/diagnosis , Genetic Variation , Genomics , Humans , Oligonucleotide Array Sequence Analysis/methods , Prognosis , Transcriptome

16.

The partitioned LASSO-patternsearch algorithm with application to gene expression data.

Shi, Weiliang; Wahba, Grace; Irizarry, Rafael A; Bravo, Hector Corrada; Wright, Stephen J.

BMC Bioinformatics ; 13: 98, 2012 May 15.

Article in English | MEDLINE | ID: mdl-22587526

ABSTRACT

BACKGROUND: In systems biology, the task of reverse engineering gene pathways from data has been limited not just by the curse of dimensionality (the interaction space is huge) but also by systematic error in the data. The gene expression barcode reduces spurious association driven by batch effects and probe effects. The binary nature of the resulting expression calls lends itself perfectly to modern regularization approaches that thrive in high-dimensional settings. RESULTS: The Partitioned LASSO-Patternsearch algorithm is proposed to identify patterns of multiple dichotomous risk factors for outcomes of interest in genomic studies. A partitioning scheme is used to identify promising patterns by solving many LASSO-Patternsearch subproblems in parallel. All variables that survive this stage proceed to an aggregation stage where the most significant patterns are identified by solving a reduced LASSO-Patternsearch problem in just these variables. This approach was applied to genetic data sets with expression levels dichotomized by the gene expression barcode. Most of the genes and second-order interactions thus selected are known to be related to the outcomes. CONCLUSIONS: We demonstrate with simulations and data analyses that the proposed method not only selects variables and patterns more accurately, but also provides smaller models with better prediction accuracy, in comparison to several alternative methodologies.

Asunto(s)

Algoritmos , Simulación por Computador , Perfilación de la Expresión Génica/estadística & datos numéricos , Expresión Génica , Modelos Genéticos , Neoplasias de la Mama/genética , Neoplasias de la Mama/mortalidad , Femenino , Genómica , Humanos
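
The two-stage scheme described in the abstract above (independent subproblems on partitions of the variables, then one reduced problem on the survivors) can be illustrated with a toy sketch. This is not the authors' implementation. As simplifying assumptions, it uses a squared-error LASSO solved by coordinate descent in place of the paper's LASSO-Patternsearch fit, restricts patterns to main effects and pairwise products, and partitions the variables into disjoint blocks. The function names and the parameters `block_size` and `lam` are hypothetical.

```python
import itertools


def soft_threshold(z, lam):
    """Soft-thresholding operator used by the LASSO coordinate update."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def center(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def lasso_patterns(rows, y, variables, lam, sweeps=100):
    """Fit a squared-error LASSO over main effects and pairwise products
    of the given binary variables; return {pattern: coefficient} for the
    patterns with a nonzero coefficient. (Assumption: stand-in for the
    paper's LASSO-Patternsearch step.)"""
    patterns = [(j,) for j in variables]
    patterns += list(itertools.combinations(variables, 2))
    # A pattern is "on" for a row when all of its variables equal 1.
    cols = [center([float(all(r[j] for j in pat)) for r in rows])
            for pat in patterns]
    n = len(y)
    resid = center(y)
    beta = [0.0] * len(patterns)
    for _ in range(sweeps):
        for k, col in enumerate(cols):
            norm = sum(c * c for c in col) / n
            if norm == 0.0:
                continue  # constant column carries no information
            rho = sum(c * (r + c * beta[k]) for c, r in zip(col, resid)) / n
            new = soft_threshold(rho, lam) / norm
            delta = new - beta[k]
            if delta != 0.0:
                resid = [r - c * delta for c, r in zip(col, resid)]
                beta[k] = new
    return {pat: b for pat, b in zip(patterns, beta) if b != 0.0}


def partitioned_search(rows, y, n_vars, block_size, lam):
    """Stage 1: one subproblem per block of variables.
    Stage 2: a reduced problem on the surviving variables only."""
    blocks = [list(range(s, min(s + block_size, n_vars)))
              for s in range(0, n_vars, block_size)]
    survivors = set()
    for block in blocks:  # subproblems are independent, so they could run in parallel
        for pattern in lasso_patterns(rows, y, block, lam):
            survivors.update(pattern)
    survivors = sorted(survivors)
    return survivors, lasso_patterns(rows, y, survivors, lam)


# Toy data: all 64 combinations of six binary variables; the outcome is
# the interaction of variables 1 and 4.
rows = list(itertools.product([0, 1], repeat=6))
y = [r[1] * r[4] for r in rows]
survivors, model = partitioned_search(rows, y, n_vars=6, block_size=3, lam=0.05)
print(survivors, model)
```

On this balanced toy design, variables 1 and 4 are the only survivors of the partition stage, and the aggregation stage keeps the single pattern `(1, 4)`.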

17.

Reply to: "a fair comparison".

Paulson, Joseph N; Bravo, Héctor Corrada; Pop, Mihai.

Nat Methods ; 11(4): 359-60, 2014 Apr.

Article in English | MEDLINE | ID: mdl-24681718

Subject(s)

Genetic Markers; Metagenomics/methods; Microbiota; RNA, Ribosomal, 16S/genetics; Animals; Humans

18.

Examining the relative influence of familial, genetic, and environmental covariate information in flexible risk models.

Bravo, Héctor Corrada; Lee, Kristine E; Klein, Barbara E K; Klein, Ronald; Iyengar, Sudha K; Wahba, Grace.

Proc Natl Acad Sci U S A ; 106(20): 8128-33, 2009 May 19.

Article in English | MEDLINE | ID: mdl-19420224

ABSTRACT

We present a method for examining the relative influence of familial, genetic, and environmental covariate information in flexible nonparametric risk models. Our goal is to investigate the relative importance of these three sources of information as they are associated with a particular outcome. To that end, we developed a method for incorporating arbitrary pedigree information in a smoothing spline ANOVA (SS-ANOVA) model. By expressing pedigree data as a positive semidefinite kernel matrix, the SS-ANOVA model is able to estimate a log-odds ratio as a multicomponent function of several variables: one or more functional components representing information from environmental covariates and/or genetic marker data and another representing pedigree relationships. We report a case study on models for retinal pigmentary abnormalities in the Beaver Dam Eye Study. Our model verifies known facts about the epidemiology of this eye lesion (found in eyes with early age-related macular degeneration) and shows significantly increased predictive ability in models that include all three of the genetic, environmental, and familial data sources. The case study also shows that models that contain only two of these data sources, that is, pedigree-environmental covariates, or pedigree-genetic markers, or environmental covariates-genetic markers, have comparable predictive ability, but less than the model with all three. This result is consistent with the notions that genetic marker data encode, at least in part, pedigree data, and that familial correlations encode shared environment data as well.

Subject(s)

Disease Susceptibility/etiology; Models, Theoretical; Risk; Adult; Aged; Aged, 80 and over; Analysis of Variance; Computer Simulation; Environment; Family Health; Genetic Markers; Humans; Macular Degeneration/etiology; Middle Aged; Pedigree; Polymorphism, Single Nucleotide; ROC Curve
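
The abstract above describes encoding pedigree data as a positive semidefinite kernel matrix but does not say how that kernel is built. Purely as an assumption about what such an encoding could look like, the sketch below computes the standard kinship-coefficient matrix of a small pedigree with the usual recursion; this matrix is positive semidefinite, but it is not necessarily the kernel used in the paper. The function name and the input layout (parents listed before their children, `None` for founders) are hypothetical.

```python
def kinship_matrix(pedigree):
    """Kinship coefficients for a pedigree.

    pedigree[i] is a (father, mother) pair of indices into the same list,
    or (None, None) for a founder. Parents must appear before children.
    Simplifying assumption: a parent pair is either fully known or fully
    unknown, and founders are unrelated and not inbred.
    """
    n = len(pedigree)
    K = [[0.0] * n for _ in range(n)]
    for i, (father, mother) in enumerate(pedigree):
        known = father is not None and mother is not None
        for j in range(i):
            # Kinship with an earlier individual is the average of the
            # parents' kinship with that individual.
            K[i][j] = K[j][i] = 0.5 * (K[father][j] + K[mother][j]) if known else 0.0
        # Self-kinship is 1/2 plus half the kinship between the parents.
        K[i][i] = 0.5 * (1.0 + (K[father][mother] if known else 0.0))
    return K


# Toy pedigree: founders 0 and 1 have children 2 and 3;
# individual 3 and founder 4 have child 5.
pedigree = [(None, None), (None, None), (0, 1), (0, 1), (None, None), (3, 4)]
K = kinship_matrix(pedigree)
print(K[2][3])  # full siblings
print(K[5][2])  # child and parent's sibling
```

A matrix like `K` could then serve as the pedigree component of a kernel-based risk model, alongside separate components for environmental covariates and genetic markers.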

19.

Reporting guidelines for human microbiome research: the STORMS checklist.

Mirzayi, Chloe; Renson, Audrey; Zohra, Fatima; Elsafoury, Shaimaa; Geistlinger, Ludwig; Kasselman, Lora J; Eckenrode, Kelly; van de Wijgert, Janneke; Loughman, Amy; Marques, Francine Z; MacIntyre, David A; Arumugam, Manimozhiyan; Azhar, Rimsha; Beghini, Francesco; Bergstrom, Kirk; Bhatt, Ami; Bisanz, Jordan E; Braun, Jonathan; Bravo, Hector Corrada; Buck, Gregory A; Bushman, Frederic; Casero, David; Clarke, Gerard; Collado, Maria Carmen; Cotter, Paul D; Cryan, John F; Demmer, Ryan T; Devkota, Suzanne; Elinav, Eran; Escobar, Juan S; Fettweis, Jennifer; Finn, Robert D; Fodor, Anthony A; Forslund, Sofia; Franke, Andre; Furlanello, Cesare; Gilbert, Jack; Grice, Elizabeth; Haibe-Kains, Benjamin; Handley, Scott; Herd, Pamela; Holmes, Susan; Jacobs, Jonathan P; Karstens, Lisa; Knight, Rob; Knights, Dan; Koren, Omry; Kwon, Douglas S; Langille, Morgan; Lindsay, Brianna.

Nat Med ; 27(11): 1885-1892, 2021 Nov.

Article in English | MEDLINE | ID: mdl-34789871

ABSTRACT

The particularly interdisciplinary nature of human microbiome research makes the organization and reporting of results spanning epidemiology, biology, bioinformatics, translational medicine and statistics a challenge. Commonly used reporting guidelines for observational or genetic epidemiology studies lack key features specific to microbiome studies. Therefore, a multidisciplinary group of microbiome epidemiology researchers adapted guidelines for observational and genetic studies to culture-independent human microbiome studies, and also developed new reporting elements for laboratory, bioinformatics and statistical analyses tailored to microbiome studies. The resulting tool, called 'Strengthening The Organization and Reporting of Microbiome Studies' (STORMS), is composed of a 17-item checklist organized into six sections that correspond to the typical sections of a scientific publication, presented as an editable table for inclusion in supplementary materials. The STORMS checklist provides guidance for concise and complete reporting of microbiome studies that will facilitate manuscript preparation, peer review, and reader comprehension of publications and comparative analysis of published results.

Subject(s)

Computational Biology/methods; Dysbiosis/microbiology; Microbiota/physiology; Observational Studies as Topic/methods; Research Design; Humans; Translational Science, Biomedical

20.

A phylogenetic mixture model for the evolution of gene expression.

Eng, Kevin H; Bravo, Héctor Corrada; Keles, Sündüz.

Mol Biol Evol ; 26(10): 2363-72, 2009 Oct.

Article in English | MEDLINE | ID: mdl-19602540

ABSTRACT

Microarray platforms are used increasingly to make comparative inferences through genome-wide surveys of gene expression. Although recent studies focus on describing the evidence for natural selection using estimates of the within- and between-taxa mutational variances, these methods do not explicitly or flexibly account for predicted nonindependence due to phylogenetic associations between measurements. In the interest of parsing the effects of selection, we introduce a mixture model for the comparative analysis of variation in gene expression across multiple taxa. This class of models isolates the phylogenetic signal from the nonphylogenetic and the heritable signal from the nonheritable while measuring the proper amount of correction. As a result, the mixture model resolves outstanding differences between existing models, relates different ways to estimate the across-taxa variance, and induces a likelihood ratio test for selection. We investigate by simulation and application the feasibility and utility of estimation of the required parameters and the power of the proposed test. We illustrate analysis under this mixture model with a gene duplication family data set.

Subject(s)

Evolution, Molecular; Gene Expression Regulation, Fungal; Models, Genetic; Phylogeny; Saccharomyces cerevisiae/genetics; Analysis of Variance; Calibration; Computer Simulation; Likelihood Functions; Multigene Family/genetics
