Search | BVS Colombia Search Portal

1.

Large-Scale Exome Sequencing Study Implicates Both Developmental and Functional Changes in the Neurobiology of Autism.

Satterstrom, F Kyle; Kosmicki, Jack A; Wang, Jiebiao; Breen, Michael S; De Rubeis, Silvia; An, Joon-Yong; Peng, Minshi; Collins, Ryan; Grove, Jakob; Klei, Lambertus; Stevens, Christine; Reichert, Jennifer; Mulhern, Maureen S; Artomov, Mykyta; Gerges, Sherif; Sheppard, Brooke; Xu, Xinyi; Bhaduri, Aparna; Norman, Utku; Brand, Harrison; Schwartz, Grace; Nguyen, Rachel; Guerrero, Elizabeth E; Dias, Caroline; Betancur, Catalina; Cook, Edwin H; Gallagher, Louise; Gill, Michael; Sutcliffe, James S; Thurm, Audrey; Zwick, Michael E; Børglum, Anders D; State, Matthew W; Cicek, A Ercument; Talkowski, Michael E; Cutler, David J; Devlin, Bernie; Sanders, Stephan J; Roeder, Kathryn; Daly, Mark J; Buxbaum, Joseph D.

Cell ; 180(3): 568-584.e23, 2020 02 06.

Article in English | MEDLINE | ID: mdl-31981491

ABSTRACT

We present the largest exome sequencing study of autism spectrum disorder (ASD) to date (n = 35,584 total samples, 11,986 with ASD). Using an enhanced analytical framework to integrate de novo and case-control rare variation, we identify 102 risk genes at a false discovery rate of 0.1 or less. Of these genes, 49 show higher frequencies of disruptive de novo variants in individuals ascertained to have severe neurodevelopmental delay, whereas 53 show higher frequencies in individuals ascertained to have ASD; comparing ASD cases with mutations in these groups reveals phenotypic differences. Expressed early in brain development, most risk genes have roles in regulation of gene expression or neuronal communication (i.e., mutations effect neurodevelopmental and neurophysiological changes), and 13 fall within loci recurrently hit by copy number variants. In cells from the human cortex, expression of risk genes is enriched in excitatory and inhibitory neuronal lineages, consistent with multiple paths to an excitatory-inhibitory imbalance underlying ASD.

Subject(s)

Autistic Disorder/genetics , Cerebral Cortex/growth & development , Exome Sequencing/methods , Gene Expression Regulation, Developmental , Neurobiology/methods , Case-Control Studies , Cell Lineage , Cohort Studies , Exome , Female , Gene Frequency , Genetic Predisposition to Disease , Humans , Male , Mutation, Missense , Neurons/metabolism , Phenotype , Sex Factors , Single-Cell Analysis/methods

2.

Bayesian estimation of cell type-specific gene expression with prior derived from single-cell data.

Wang, Jiebiao; Roeder, Kathryn; Devlin, Bernie.

Genome Res ; 31(10): 1807-1818, 2021 10.

Article in English | MEDLINE | ID: mdl-33837133

ABSTRACT

When assessed over a large number of samples, bulk RNA sequencing provides reliable data for gene expression at the tissue level. Single-cell RNA sequencing (scRNA-seq) deepens those analyses by evaluating gene expression at the cellular level. Both data types lend insights into disease etiology. With current technologies, scRNA-seq data are known to be noisy. Constrained by costs, scRNA-seq data are typically generated from a relatively small number of subjects, which limits their utility for some analyses, such as identification of gene expression quantitative trait loci (eQTLs). To address these issues while maintaining the unique advantages of each data type, we develop a Bayesian method (bMIND) to integrate bulk and scRNA-seq data. With a prior derived from scRNA-seq data, we propose to estimate sample-level cell type-specific (CTS) expression from bulk expression data. The CTS expression enables large-scale sample-level downstream analyses, such as detection of CTS differentially expressed genes (DEGs) and eQTLs. Through simulations, we show that bMIND improves the accuracy of sample-level CTS expression estimates and increases the power to discover CTS DEGs when compared to existing methods. To further our understanding of two complex phenotypes, autism spectrum disorder and Alzheimer's disease, we apply bMIND to gene expression data of relevant brain tissue to identify CTS DEGs. Our results complement findings for CTS DEGs obtained from snRNA-seq studies, replicating certain DEGs in specific cell types while nominating other novel genes for those cell types. Finally, we calculate CTS eQTLs for 11 brain regions by analyzing Genotype-Tissue Expression Project data, creating a new resource for biological insights.
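To illustrate the core idea of estimating a sample's cell type-specific (CTS) expression from its bulk value under a prior learned from scRNA-seq, here is a deliberately simplified model. It assumes a single gene, known cell-type fractions `w`, independent Gaussian priors per cell type (`prior_mean`, `prior_var`, hypothetically taken from scRNA-seq), and Gaussian bulk noise (`noise_var`). Under those assumptions the posterior mean has a closed form. bMIND itself fits a richer hierarchical model, so this sketch shows Bayesian conditioning in miniature and does not reproduce the package's algorithm.

```python
def posterior_cts_mean(bulk, w, prior_mean, prior_var, noise_var):
    """Posterior mean of per-cell-type expression a_k given
    bulk = sum_k w[k] * a[k] + noise, with independent priors
    a_k ~ N(prior_mean[k], prior_var[k]) and noise ~ N(0, noise_var)."""
    if not (len(w) == len(prior_mean) == len(prior_var)):
        raise ValueError("w, prior_mean and prior_var must have equal length")
    # Prior predictive mean and variance of the bulk observation.
    pred_mean = sum(wk * mk for wk, mk in zip(w, prior_mean))
    pred_var = sum(wk * wk * vk for wk, vk in zip(w, prior_var)) + noise_var
    if pred_var <= 0:
        raise ValueError("predictive variance must be positive")
    residual = bulk - pred_mean
    # Cov(a_k, bulk) = w_k * prior_var_k, so each cell type absorbs a share
    # of the residual proportional to its weighted prior variance.
    return [mk + wk * vk * residual / pred_var
            for wk, mk, vk in zip(w, prior_mean, prior_var)]
```

With `noise_var = 0` the estimates reproduce the bulk value exactly (for example, a bulk value of 4.32 with fractions 0.6/0.4 and prior means 5/2 gives 5.6 and 2.4). A larger `noise_var` pulls the estimates back toward the prior.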

Subject(s)

Autism Spectrum Disorder , Single-Cell Analysis , Autism Spectrum Disorder/genetics , Bayes Theorem , Gene Expression , Gene Expression Profiling/methods , Humans , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods

3.

Transcriptional risk scores in Alzheimer's disease: From pathology to cognition.

Pyun, Jung-Min; Park, Young Ho; Wang, Jiebiao; Bennett, David A; Bice, Paula J; Kim, Jun Pyo; Kim, SangYun; Saykin, Andrew J; Nho, Kwangsik.

Alzheimers Dement ; 20(1): 243-252, 2024 Jan.

Article in English | MEDLINE | ID: mdl-37563770

ABSTRACT

INTRODUCTION: Our previously developed blood-based transcriptional risk scores (TRS) showed associations with diagnosis and neuroimaging biomarkers for Alzheimer's disease (AD). Here, we developed brain-based TRS. METHODS: We integrated AD genome-wide association study summary statistics and expression quantitative trait locus data to prioritize target genes using Mendelian randomization. We calculated TRS using brain transcriptome data of two independent cohorts (N = 878) and performed association analysis of TRS with diagnosis, amyloidopathy, tauopathy, and cognition. We compared the AD classification performance of TRS with that of polygenic risk scores (PRS). RESULTS: Higher TRS values were significantly associated with AD, amyloidopathy, tauopathy, worse cognition, and faster cognitive decline, and these associations were replicated in an independent cohort. Including TRS increased the AD classification performance of PRS by up to 16%, with an area under the curve (AUC) of 0.850. DISCUSSION: Our results suggest that brain-based TRS improves the AD classification of PRS and may be a potential AD biomarker. HIGHLIGHTS: Transcriptional risk score (TRS) is developed using brain RNA-Seq data. Higher TRS values are shown in Alzheimer's disease (AD). TRS improves the AD classification power of PRS by up to 16%. TRS is associated with AD pathology presence. TRS is associated with worse cognitive performance and faster cognitive decline.

Subject(s)

Alzheimer Disease , Tauopathies , Humans , Alzheimer Disease/diagnostic imaging , Alzheimer Disease/genetics , Genome-Wide Association Study , Cognition , Risk Factors , Biomarkers , Genetic Risk Score

4.

Robust and accurate estimation of cellular fraction from tissue omics data via ensemble deconvolution.

Cai, Manqi; Yue, Molin; Chen, Tianmeng; Liu, Jinling; Forno, Erick; Lu, Xinghua; Billiar, Timothy; Celedón, Juan; McKennan, Chris; Chen, Wei; Wang, Jiebiao.

Bioinformatics ; 38(11): 3004-3010, 2022 05 26.

Article in English | MEDLINE | ID: mdl-35438146

ABSTRACT

MOTIVATION: Tissue-level omics data such as transcriptomics and epigenomics are an average across diverse cell types. To extract cell-type-specific (CTS) signals, dozens of cellular deconvolution methods have been proposed to infer cell-type fractions from tissue-level data. However, these methods produce vastly different results under various real data settings. Simulation-based benchmarking studies showed no universally best deconvolution approach. There have been attempts at ensemble methods, but they only aggregate multiple single-cell references or reference-free deconvolution methods. RESULTS: To achieve a robust estimation of cellular fractions, we proposed EnsDeconv (Ensemble Deconvolution), which adopts CTS robust regression to synthesize the results from 11 single deconvolution methods, 10 reference datasets, 5 marker gene selection procedures, 5 data normalizations and 2 transformations. Unlike most benchmarking studies based on simulations, we compiled four large real datasets of 4937 tissue samples in total with measured cellular fractions and bulk gene expression from different tissues. Comprehensive evaluations demonstrated that EnsDeconv yields more stable, robust and accurate fractions than existing methods. We illustrated that EnsDeconv-estimated cellular fractions enable various CTS downstream analyses such as differential fractions associated with clinical variables. We further extended EnsDeconv to analyze bulk DNA methylation data. AVAILABILITY AND IMPLEMENTATION: EnsDeconv is freely available as an R package from https://github.com/randel/EnsDeconv. The RNA microarray data from the TRAUMA study are available and can be accessed in GEO (GSE36809). The demographic and clinical phenotypes can be shared on reasonable request to the corresponding authors. The RNA-seq data from the EVAPR study cannot be shared publicly due to the privacy of individuals who participated in the clinical research, in compliance with the IRB approval at the University of Pittsburgh. The RNA microarray data from the FHS study are available from dbGaP (phs000007.v32.p13). The RNA-seq data from the ROS study were downloaded from the AD Knowledge Portal. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
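EnsDeconv combines many deconvolution outputs using cell type-specific robust regression. To convey the ensemble idea in a few lines, the sketch below swaps in a much simpler robust aggregation: for one sample, take the per-cell-type median across methods and renormalize so the fractions sum to one. The function name and the input layout (a list of per-method fraction vectors in a shared cell-type order) are assumptions for illustration. The median stands in for the package's regression and does not reimplement it.

```python
from statistics import median

def ensemble_fractions(estimates):
    """Combine cell-type fraction estimates from several methods for one sample.

    estimates: list of per-method lists, each giving one fraction per cell
    type in the same order. Returns fractions that sum to 1."""
    if not estimates:
        raise ValueError("need at least one method's estimates")
    n_types = len(estimates[0])
    if any(len(e) != n_types for e in estimates):
        raise ValueError("all methods must report the same cell types")
    # Median per cell type: a single outlying method cannot drag it far.
    med = [median(e[k] for e in estimates) for k in range(n_types)]
    total = sum(med)
    if total <= 0:
        raise ValueError("median fractions sum to zero")
    return [m / total for m in med]
```

For example, with estimates `[0.5, 0.3, 0.2]`, `[0.6, 0.3, 0.1]` and an outlying `[0.1, 0.1, 0.8]`, the combined result is `[0.5, 0.3, 0.2]`. The outlier does not shift it.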

Subject(s)

RNA , Transcriptome , Sequence Analysis, RNA , RNA-Seq , Computer Simulation

5.

Identification of cell-type-specific marker genes from co-expression patterns in tissue samples.

Qiu, Yixuan; Wang, Jiebiao; Lei, Jing; Roeder, Kathryn.

Bioinformatics ; 37(19): 3228-3234, 2021 Oct 11.

Article in English | MEDLINE | ID: mdl-33904573

ABSTRACT

MOTIVATION: Marker genes, defined as genes that are expressed primarily in a single cell type, can be identified from the single-cell transcriptome; however, such data are not always available for the many uses of marker genes, such as deconvolution of bulk tissue. Marker genes for a cell type, however, are highly correlated in bulk data, because their expression levels depend primarily on the proportion of that cell type in the samples. Therefore, when many tissue samples are analyzed, it is possible to identify these marker genes from the correlation pattern. RESULTS: To capitalize on this pattern, we develop a new algorithm to detect marker genes by combining published information about likely marker genes with bulk transcriptome data in the form of a semi-supervised algorithm. The algorithm then exploits the correlation structure of the bulk data to refine the published marker genes by adding or removing genes from the list. AVAILABILITY AND IMPLEMENTATION: We implement this method as an R package markerpen, hosted on CRAN (https://CRAN.R-project.org/package=markerpen). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

6.

ESCO: single cell expression simulation incorporating gene co-expression.

Tian, Jinjin; Wang, Jiebiao; Roeder, Kathryn.

Bioinformatics ; 37(16): 2374-2381, 2021 Aug 25.

Article in English | MEDLINE | ID: mdl-33624750

ABSTRACT

MOTIVATION: Gene-gene co-expression networks (GCN) are of biological interest for the useful information they provide for understanding gene-gene interactions. The advent of single cell RNA-sequencing allows us to examine more subtle gene co-expression occurring within a cell type. Many imputation and denoising methods have been developed to deal with the technical challenges observed in single cell data; meanwhile, several simulators have been developed for benchmarking and assessing these methods. Most of these simulators, however, either do not incorporate gene co-expression or generate co-expression in an inconvenient manner. RESULTS: Therefore, with the focus on gene co-expression, we propose a new simulator, ESCO, which adopts the idea of the copula to impose gene co-expression, while preserving the strengths of available simulators, which perform well at simulating the marginal distribution of each gene's expression. Using ESCO, we assess the performance of imputation methods on GCN recovery and find that imputation generally helps GCN recovery when the data are not too sparse, and the ensemble imputation method works best among leading methods. In contrast, imputation fails to help in the presence of an excessive fraction of zero counts, where simple data-aggregation methods are a better choice. These findings are further verified with mouse and human brain cell data. AVAILABILITY AND IMPLEMENTATION: The ESCO implementation is available as R package ESCO. Users can either download the development version via GitHub (https://github.com/JINJINT/ESCO) or the archived version via Zenodo (https://zenodo.org/record/4455890). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

7.

CCmed: cross-condition mediation analysis for identifying replicable trans-associations mediated by cis-gene expression.

Yang, Fan; Gleason, Kevin J; Wang, Jiebiao; Duan, Jubao; He, Xin; Pierce, Brandon L; Chen, Lin S.

Bioinformatics ; 37(17): 2513-2520, 2021 Sep 09.

Article in English | MEDLINE | ID: mdl-33647928

ABSTRACT

MOTIVATION: Trans-acting expression quantitative trait loci (eQTLs) collectively explain a substantial proportion of expression variation, yet are challenging to detect and replicate since their effects are often individually weak. A large proportion of genetic effects on distal genes are mediated through cis-gene expression. Cis-association (between SNP and cis-gene) and gene-gene correlation conditional on SNP genotype could establish trans-association (between SNP and trans-gene). Both cis-association and gene-gene conditional correlation have effects shared across relevant tissues and conditions, and trans-associations mediated by cis-gene expression also have effects shared across relevant conditions. RESULTS: We proposed a Cross-Condition Mediation analysis method (CCmed) for detecting cis-mediated trans-associations with replicable effects in relevant conditions/studies. CCmed integrates cis-association and gene-gene conditional correlation statistics from multiple tissues/studies. Motivated by the bimodal effect-sharing patterns of eQTLs, we proposed two variations of CCmed, CCmedmost and CCmedspec for detecting cross-tissue and tissue-specific trans-associations, respectively. We analyzed data of 13 brain tissues from the Genotype-Tissue Expression (GTEx) project, and identified trios with cis-mediated trans-associations across brain tissues, many of which showed evidence of trans-association in two replication studies. We also identified trans-genes associated with schizophrenia loci in at least two brain tissues. AVAILABILITY AND IMPLEMENTATION: CCmed software is available at http://github.com/kjgleason/CCmed. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

8.

Small nucleolar RNAs in plasma extracellular vesicles and their discriminatory power as diagnostic biomarkers of Alzheimer's disease.

Fitz, Nicholas F; Wang, Jiebiao; Kamboh, M Ilyas; Koldamova, Radosveta; Lefterov, Iliya.

Neurobiol Dis ; 159: 105481, 2021 11.

Article in English | MEDLINE | ID: mdl-34411703

ABSTRACT

The clinical diagnosis of Alzheimer's disease (AD) at its early stage remains a difficult task. Advanced imaging technologies and laboratory assays that detect the Aβ peptides Aβ42 and Aβ40, as well as total and phosphorylated tau in CSF, provide a set of biomarkers of developing AD brain pathology and facilitate the diagnostic process. The search for biofluid biomarkers other than CSF, and the development of biomarker assays, have accelerated significantly and now represent the fastest-growing field in AD research. The goal of this study was to determine the differential enrichment of noncoding RNAs (ncRNAs) in plasma-derived extracellular vesicles (EV) of AD patients and cognitively normal controls (NC). Using RNA-seq, we profiled four major classes of ncRNAs: miRNAs, snoRNAs, tRNAs, and piRNAs. We report a significant enrichment of SNORDs, a group of snoRNAs, in AD samples compared to NC. To verify the differential enrichment of two clusters of SNORDs, SNORD115 and SNORD116, localized on human chromosome 15q11-q13, we used plasma samples of an independent group of AD patients and NC. Using droplet digital PCR (ddPCR), we found that SNORD115 and SNORD116 have high discriminatory power to differentiate AD samples from NC. The results of our study present evidence that AD is associated with changes in the enrichment of SNORDs, transcribed from imprinted genomic loci, in plasma EV, and provide a rationale to further explore the validity of those SNORDs as plasma biomarkers of AD.

Subject(s)

Alzheimer Disease/metabolism , Extracellular Vesicles/metabolism , RNA, Small Nucleolar/metabolism , Aged , Aged, 80 and over , Alzheimer Disease/diagnosis , Biomarkers/metabolism , Case-Control Studies , Female , Humans , Male , Sensitivity and Specificity

9.

Using multiple measurements of tissue to estimate subject- and cell-type-specific gene expression.

Wang, Jiebiao; Devlin, Bernie; Roeder, Kathryn.

Bioinformatics ; 36(3): 782-788, 2020 02 01.

Article in English | MEDLINE | ID: mdl-31400192

ABSTRACT

MOTIVATION: Patterns of gene expression, quantified at the level of tissue or cells, can inform on etiology of disease. There are now rich resources for tissue-level (bulk) gene expression data, which have been collected from thousands of subjects, and resources involving single-cell RNA-sequencing (scRNA-seq) data are expanding rapidly. The latter yields cell type information, although the data can be noisy and typically are derived from a small number of subjects. RESULTS: Complementing these approaches, we develop a method to estimate subject- and cell-type-specific (CTS) gene expression from tissue using an empirical Bayes method that borrows information across multiple measurements of the same tissue per subject (e.g. multiple regions of the brain). Analyzing expression data from multiple brain regions from the Genotype-Tissue Expression project (GTEx) reveals CTS expression, which then permits downstream analyses, such as identification of CTS expression Quantitative Trait Loci (eQTL). AVAILABILITY AND IMPLEMENTATION: We implement this method as an R package MIND, hosted on https://github.com/randel/MIND. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
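Several measurements of the same tissue from one subject (e.g., multiple brain regions), each with different cell-type fractions, are what make that subject's CTS expression estimable. Setting aside MIND's empirical Bayes machinery, a plain least-squares sketch shows why. With known fractions and at least as many informative regions as cell types, the normal equations have a unique solution. The fraction matrix `w`, the function names and the toy numbers are assumptions for illustration.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: fractions are not informative")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def subject_cts_expression(bulk, w):
    """Least-squares CTS expression of one gene for one subject.

    bulk: one bulk value per region; w: per-region cell-type fractions."""
    k = len(w[0])
    # Normal equations: (W^T W) a = W^T bulk.
    wtw = [[sum(row[i] * row[j] for row in w) for j in range(k)] for i in range(k)]
    wtb = [sum(row[i] * y for row, y in zip(w, bulk)) for i in range(k)]
    return solve(wtw, wtb)
```

Take three regions with fractions `[[0.7, 0.3], [0.4, 0.6], [0.2, 0.8]]` and bulk values `[7.6, 5.2, 3.6]`. The function recovers CTS expression `[10, 2]`. If every region had identical fractions, the system would be singular, which is why the variation across regions matters.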

Subject(s)

Gene Expression Profiling , Software , Bayes Theorem , Sequence Analysis, RNA , Single-Cell Analysis

10.

Identifying cis-mediators for trans-eQTLs across many human tissues using genomic mediation analysis.

Yang, Fan; Wang, Jiebiao; Pierce, Brandon L; Chen, Lin S.

Genome Res ; 27(11): 1859-1871, 2017 11.

Article in English | MEDLINE | ID: mdl-29021290

ABSTRACT

The impact of inherited genetic variation on gene expression in humans is well-established. The majority of known expression quantitative trait loci (eQTLs) impact expression of local genes (cis-eQTLs). More research is needed to identify effects of genetic variation on distant genes (trans-eQTLs) and understand their biological mechanisms. One common trans-eQTL mechanism is "mediation" by a local (cis) transcript. Thus, mediation analysis can be applied to genome-wide SNP and expression data in order to identify transcripts that are "cis-mediators" of trans-eQTLs, including "cis-hubs" involved in regulation of many trans-genes. Identifying such mediators helps us understand regulatory networks and suggests biological mechanisms underlying trans-eQTLs, both of which are relevant for understanding susceptibility to complex diseases. The multitissue expression data from the Genotype-Tissue Expression (GTEx) program provide a unique opportunity to study cis-mediation across human tissue types. However, the presence of complex hidden confounding effects in biological systems can make mediation analyses challenging and prone to confounding bias, particularly when conducted among diverse samples. To address this problem, we propose a new method: Genomic Mediation analysis with Adaptive Confounding adjustment (GMAC). It searches a very large pool of variables and adaptively selects potential confounding variables for each mediation test. Analyses of simulated data and GTEx data demonstrate that the adaptive selection of confounders by GMAC improves the power and precision of mediation analysis. Application of GMAC to GTEx data provides new insights into the observed patterns of cis-hubs and trans-eQTL regulation across tissue types.

Subject(s)

Gene Expression Profiling/methods , Genomics/methods , Quantitative Trait Loci , Databases, Genetic , Gene Expression Regulation , Gene Regulatory Networks , Genetic Predisposition to Disease , Genome-Wide Association Study , Humans , Polymorphism, Single Nucleotide , Selection, Genetic , Tissue Distribution

11.

Using multivariate mixed-effects selection models for analyzing batch-processed proteomics data with non-ignorable missingness.

Wang, Jiebiao; Wang, Pei; Hedeker, Donald; Chen, Lin S.

Biostatistics ; 20(4): 648-665, 2019 10 01.

Article in English | MEDLINE | ID: mdl-29939200

ABSTRACT

In quantitative proteomics, mass tag labeling techniques have been widely adopted in mass spectrometry experiments. These techniques allow peptides (short amino acid sequences) and proteins from multiple samples of a batch to be detected and quantified in a single experiment, and as such greatly improve the efficiency of protein profiling. However, the batch-processing of samples also results in severe batch effects and non-ignorable missing data occurring at the batch level. Motivated by the breast cancer proteomic data from the Clinical Proteomic Tumor Analysis Consortium, in this work, we developed two tailored multivariate MIxed-effects SElection models (mvMISE) to jointly analyze multiple correlated peptides/proteins in labeled proteomics data, considering the batch effects and the non-ignorable missingness. By taking a multivariate approach, we can borrow information across multiple peptides of the same protein or multiple proteins from the same biological pathway, and thus achieve better statistical efficiency and biological interpretation. These two different models account for different correlation structures among a group of peptides or proteins. Specifically, to model multiple peptides from the same protein, we employed a factor-analytic random effects structure to characterize the high and similar correlations among peptides. To model biological dependence among multiple proteins in a functional pathway, we introduced a graphical lasso penalty on the error precision matrix, and implemented an efficient algorithm based on the alternating direction method of multipliers. Simulations demonstrated the advantages of the proposed models. Applying the proposed methods to the motivating data set, we identified phosphoproteins and biological pathways that showed different activity patterns in triple-negative breast tumors versus other breast tumors. The proposed methods can also be applied to other high-dimensional multivariate analyses based on clustered data with or without non-ignorable missingness.

Subject(s)

Algorithms , Biostatistics/methods , Models, Statistical , Proteomics/methods , Humans

12.

A meta-analysis approach with filtering for identifying gene-level gene-environment interactions.

Wang, Jiebiao; Liu, Qianying; Pierce, Brandon L; Huo, Dezheng; Olopade, Olufunmilayo I; Ahsan, Habibul; Chen, Lin S.

Genet Epidemiol ; 42(5): 434-446, 2018 07.

Article in English | MEDLINE | ID: mdl-29430690

ABSTRACT

There is a growing recognition that gene-environment interaction (G × E) plays a pivotal role in the development and progression of complex diseases. Despite a wealth of genetic data on various complex diseases/traits generated from association and sequencing studies, detecting G × E via genome-wide analysis remains challenging due to power issues. In genome-wide G × E studies, a common strategy to improve power is to first conduct a filtering test and retain only the genetic variants that pass the filtering step for subsequent G × E analyses. Two-stage, multistage, and unified tests have been proposed to jointly consider the filtering statistics in G × E tests. However, such G × E tests based on data from a single study may still be underpowered. Meanwhile, large-scale consortia have been formed to borrow strength across studies and populations. In this work, motivated by existing single-study G × E tests with filtering and the need for meta-analysis G × E approaches based on consortia data, we propose a meta-analysis framework for detecting gene-based G × E effects, and introduce meta-analysis-based filtering statistics in the gene-level G × E tests. Simulations demonstrate the advantages of the proposed method, the ofGEM test. We apply the proposed tests to existing data from two breast cancer consortia to identify the genes harboring genetic variants with age-dependent penetrance (i.e., gene-age interaction effects). We develop an R software package ofGEM for the proposed meta-analysis tests.

Subject(s)

Gene-Environment Interaction , Age Factors , Age of Onset , Breast Neoplasms/genetics , Computer Simulation , Female , Genetic Predisposition to Disease , Genome-Wide Association Study , Humans , Models, Genetic , Penetrance , Risk Factors

13.

Imputing Gene Expression in Uncollected Tissues Within and Beyond GTEx.

Wang, Jiebiao; Gamazon, Eric R; Pierce, Brandon L; Stranger, Barbara E; Im, Hae Kyung; Gibbons, Robert D; Cox, Nancy J; Nicolae, Dan L; Chen, Lin S.

Am J Hum Genet ; 98(4): 697-708, 2016 Apr 07.

Article in English | MEDLINE | ID: mdl-27040689

ABSTRACT

Gene expression and its regulation can vary substantially across tissue types. In order to generate knowledge about gene expression in human tissues, the Genotype-Tissue Expression (GTEx) program has collected transcriptome data in a wide variety of tissue types from post-mortem donors. However, many tissue types are difficult to access and are not collected in every GTEx individual. Furthermore, in non-GTEx studies, the accessibility of certain tissue types greatly limits the feasibility and scale of studies of multi-tissue expression. In this work, we developed multi-tissue imputation methods to impute gene expression in uncollected or inaccessible tissues. Via simulation studies, we showed that the proposed methods outperform existing imputation methods in multi-tissue expression imputation and that incorporating imputed expression data can improve power to detect phenotype-expression correlations. By analyzing data from nine selected tissue types in the GTEx pilot project, we demonstrated that harnessing expression quantitative trait loci (eQTLs) and tissue-tissue expression-level correlations can aid imputation of transcriptome data from uncollected GTEx tissues. More importantly, we showed that by using GTEx data as a reference, one can impute expression levels in inaccessible tissues in non-GTEx expression studies.

Subject(s)

Gene Expression Regulation , Genotype , Quantitative Trait Loci , Transcriptome , Humans , Phenotype , Pilot Projects , Reproducibility of Results

14.

Aberrant GAP43 Gene Expression Is Alzheimer Disease Pathology-Specific.

Pyun, Jung-Min; Park, Young Ho; Wang, Jiebiao; Bice, Paula J; Bennett, David A; Kim, SangYun; Saykin, Andrew J; Nho, Kwangsik.

Ann Neurol ; 93(5): 1047-1048, 2023 05.

Article in English | MEDLINE | ID: mdl-36897291

Subject(s)

Alzheimer Disease , Humans , Alzheimer Disease/genetics , Nerve Tissue Proteins/genetics , Membrane Glycoproteins/genetics , Gene Expression , GAP-43 Protein/genetics , GAP-43 Protein/metabolism

15.

BLEND: Probabilistic Cellular Deconvolution with Automated Reference Selection.

Huang, Penghui; Cai, Manqi; McKennan, Chris; Wang, Jiebiao.

bioRxiv ; 2024 Aug 06.

Article in English | MEDLINE | ID: mdl-39149243

ABSTRACT

Cellular deconvolution aims to estimate cell type fractions from bulk transcriptomic and other omics data. Most existing deconvolution methods fail to account for the heterogeneity in cell type-specific (CTS) expression across bulk samples, ignore discrepancies between CTS expression in bulk and cell type reference data, and provide no guidance on cell type reference selection or integration. To address these issues, we introduce BLEND, a hierarchical Bayesian method that leverages multiple reference datasets. BLEND learns the most suitable references for each bulk sample by exploring the convex hulls of references and employs a "bag-of-words" representation for bulk count data for deconvolution. To speed up the computation, we provide an efficient EM algorithm for parameter estimation. Notably, BLEND requires no data transformation, normalization, cell type marker gene selection, or reference quality evaluation. Benchmarking studies on both simulated and real human brain data highlight BLEND's superior performance in various scenarios. The analysis of Alzheimer's disease data illustrates BLEND's application in real data and reference resource integration.
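BLEND treats bulk counts as a "bag of words" and estimates parameters with an EM algorithm. The sketch below shows only the textbook EM for mixture proportions when each cell type has a single fixed expression profile (gene probabilities summing to one). The E-step assigns each gene's reads to cell types in proportion to fraction × profile, and the M-step re-estimates the fractions from those assignments. BLEND also learns per-sample combinations of multiple references, which this sketch omits. All names and numbers are illustrative assumptions.

```python
def em_fractions(counts, profiles, n_iter=5000, tol=1e-12):
    """EM estimate of cell-type fractions for one bulk sample.

    counts: read count per gene; profiles: per-cell-type lists of gene
    probabilities, each summing to 1."""
    k = len(profiles)
    pi = [1.0 / k] * k  # start from equal fractions
    for _ in range(n_iter):
        expected = [0.0] * k
        for g, n_g in enumerate(counts):
            # E-step: split gene g's reads across cell types in
            # proportion to fraction * profile probability.
            weights = [pi[c] * profiles[c][g] for c in range(k)]
            denom = sum(weights)
            if denom == 0.0:
                continue  # gene unexplained by every profile; skip it
            for c in range(k):
                expected[c] += n_g * weights[c] / denom
        # M-step: new fractions are the expected shares of all reads.
        total = sum(expected)
        if total == 0.0:
            raise ValueError("no reads could be assigned to any cell type")
        new_pi = [v / total for v in expected]
        step = max(abs(a - b) for a, b in zip(new_pi, pi))
        pi = new_pi
        if step < tol:
            break
    return pi
```

Take profiles `[0.5, 0.3, 0.2]` and `[0.1, 0.2, 0.7]` and counts `[340, 260, 400]`, which are exactly a 60/40 mixture of 1000 reads. The EM iterations converge to fractions of about `[0.6, 0.4]`.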

16.

scMD facilitates cell type deconvolution using single-cell DNA methylation references.

Cai, Manqi; Zhou, Jingtian; McKennan, Chris; Wang, Jiebiao.

Commun Biol ; 7(1): 1, 2024 01 02.

Article in English | MEDLINE | ID: mdl-38168620

ABSTRACT

The proliferation of single-cell RNA-sequencing data has led to the widespread use of cellular deconvolution, aiding the extraction of cell-type-specific information from extensive bulk data. However, those advances have been mostly limited to transcriptomic data. With recent developments in single-cell DNA methylation (scDNAm), there are emerging opportunities for deconvolving bulk DNAm data, particularly for solid tissues like the brain that lack cell-type references. Due to technical limitations, current scDNAm sequences represent a small proportion of the whole genome for each single cell, and those detected regions differ across cells. This makes scDNAm data ultra-high dimensional and ultra-sparse. To deal with these challenges, we introduce scMD (single cell Methylation Deconvolution), a cellular deconvolution framework to reliably estimate cell type fractions from tissue-level DNAm data. To analyze large-scale complex scDNAm data, scMD employs a statistical approach to aggregate scDNAm data at the cell cluster level, identify cell-type marker DNAm sites, and create precise cell-type signature matrices that surpass state-of-the-art sorted-cell or RNA-derived references. Through thorough benchmarking in several datasets, we demonstrate scMD's superior performance in estimating cellular fractions from bulk DNAm data. With scMD-estimated cellular fractions, we identify cell type fractions and cell type-specific differentially methylated cytosines associated with Alzheimer's disease.
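Each cell covers only a small, cell-specific subset of CpG sites, so pooling cells before estimating methylation is a natural first step. The sketch below pools per-cell calls within each cluster, computing methylated reads over covered reads and leaving out cells with no coverage at a site. It then flags a site as a marker for a cluster when that cluster's level differs from every other cluster's by at least a chosen margin. The data layout, the `min_gap` rule and the function names are assumptions for illustration. scMD's actual aggregation and marker-selection statistics are not reproduced.

```python
def cluster_methylation(cells, cluster_of, n_sites):
    """Pooled methylation level per cluster and site.

    cells: {cell_id: {site_index: (methylated_reads, total_reads)}} holding
    only covered sites; cluster_of: {cell_id: cluster_label}.
    Returns {cluster: [level, or None if uncovered, for each site]}."""
    meth, cov = {}, {}
    for cell, calls in cells.items():
        cl = cluster_of[cell]
        meth.setdefault(cl, [0] * n_sites)
        cov.setdefault(cl, [0] * n_sites)
        for site, (m, t) in calls.items():
            meth[cl][site] += m
            cov[cl][site] += t
    return {cl: [meth[cl][s] / cov[cl][s] if cov[cl][s] else None
                 for s in range(n_sites)]
            for cl in meth}

def marker_sites(levels, target, min_gap=0.5):
    """Sites where `target` differs from every other cluster by >= min_gap."""
    others = [c for c in levels if c != target]
    hits = []
    for s, v in enumerate(levels[target]):
        if v is None:
            continue
        rest = [levels[c][s] for c in others]
        if all(r is not None and abs(v - r) >= min_gap for r in rest):
            hits.append(s)
    return hits
```

With toy cells in which the "neuron" cluster is methylated at site 0 and the "glia" cluster is not, the pooled levels are `[1.0, 0.0, 1.0]` and `[0.0, 0.0, 1.0]`, and only site 0 is flagged as a neuron marker.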

Subject(s)

Brain , DNA Methylation , Brain/metabolism , Gene Expression Profiling , Genome , RNA/metabolism

17.

Amyloid age and tau PET timeline to symptomatic Alzheimer's disease in Down syndrome.

Schworer, Emily K; Zammit, Matthew D; Wang, Jiebiao; Handen, Benjamin L; Betthauser, Tobey; Laymon, Charles M; Tudorascu, Dana L; Cohen, Annie D; Zaman, Shahid H; Ances, Beau M; Mapstone, Mark; Head, Elizabeth; Klunk, William E; Christian, Bradley T; Hartley, Sigan L.

medRxiv ; 2024 Aug 09.

Article in English | MEDLINE | ID: mdl-39211859

ABSTRACT

Background: Adults with Down syndrome (DS) are at risk for Alzheimer's disease (AD). Recent natural history cohort studies have characterized AD biomarkers, with a focus on amyloid-beta (Aβ) PET and tau PET. Leveraging these well-characterized biomarkers, the present study examined the timeline to symptomatic AD based on the estimated years since reaching Aβ+, referred to as "amyloid age", and in relation to tau in a large cohort of individuals with DS. Methods: In this multicenter cohort study, adults with DS aged 25 to 57 years (n = 167) were assessed twice between 2017 and 2022, approximately 32 months apart, as part of the Alzheimer Biomarker Consortium - Down Syndrome. Adults with DS completed amyloid and tau PET scans and were administered the modified Cued Recall Test (mCRT) and the Down Syndrome Mental Status Examination. Study partners completed the National Task Group-Early Detection Screen for Dementia. Findings: Mixed linear regressions showed significant quadratic associations between amyloid age and cognitive performance, and cubic associations between amyloid age and tau, both at baseline and across 32 months. In cross-sectional broken stick regression models, differences in mCRT scores were detected beginning 2.7 years after Aβ+, with an estimated decline of 1.3 points per year. On average, tau began to increase 2.7 to 6.1 years after Aβ+. On average, participants with mild cognitive impairment were 7.4 years post Aβ+ and those with dementia were 12.7 years post Aβ+. Interpretation: In DS, the timeline from Aβ+ (Centiloid = 18) and tau deposition to initial cognitive decline and dementia is short relative to late-onset AD. The established timeline based on amyloid age (or equivalent Centiloid values) is important for clinical practice and for informing AD clinical trials, and it avoids the limitations of timelines based on chronological age. Funding: National Institute on Aging and the National Institute for Child Health and Human Development.

Research in Context. Evidence before this study: We searched PubMed for articles on the progression of Aβ and tau deposition in adults with Down syndrome published from database inception to March 1, 2024, using the terms "amyloid", "Down syndrome", "tau", "Alzheimer's disease", "cognitive decline", and "amyloid chronicity", with no language restrictions. One previous study outlined the progression of tau in adults with Down syndrome without considering cognitive decline or clinical status. Other studies reported cognitive decline associated with Aβ burden and estimated years to AD symptom onset in Down syndrome. Amyloid age estimates have also been derived for older neurotypical adults and compared with cognitive performance, but this has not been investigated in Down syndrome. Added value of this study: The timeline to symptomatic Alzheimer's disease in relation to amyloid, expressed as duration of Aβ+, and to tau has yet to be described in adults with Down syndrome. Our longitudinal study is the first to provide a timeline of cognitive decline and of the transition to mild cognitive impairment and dementia in relation to Aβ+. Implications of all the available evidence: In a cohort study of 167 adults with Down syndrome, cognitive decline began 2.7 to 5.4 years and tau deposition began 2.7 to 6.1 years after Aβ+ (Centiloid = 18). Adults with Down syndrome converted to MCI after about 7 years and to dementia after about 12 to 13 years of Aβ+. This shortened timeline from Aβ+ and tau deposition to AD symptomatology in DS, based on amyloid age (or corresponding Centiloid values), can inform clinical AD intervention trials and is useful in clinical settings.
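The Findings report change points estimated with broken stick regression. As a hedged illustration of that model class only, the sketch below fits a single change point by grid search: the predicted score is flat before the knot and changes linearly with amyloid age after it. It assumes one knot, ordinary least squares at each candidate knot, and noise-free hypothetical data; the function names, knot grid, and numbers are hypothetical, and this is not the study's analysis code or model specification.

```python
# Minimal broken-stick (single change point) regression sketch.
# Hypothetical illustration: names, knot grid, and data are not from the study.


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b, sse)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx > 0 else 0.0  # flat line if x has no spread
    a = my - b * mx
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, sse


def fit_broken_stick(amyloid_age, score, knots):
    """Fit score = a + b * max(0, amyloid_age - knot) for every candidate
    knot and return (knot, a, b, sse) for the knot with the smallest SSE.

    The prediction is flat at `a` before the knot and changes by `b`
    points per year after it."""
    best = None
    for k in knots:
        hinge = [max(0.0, t - k) for t in amyloid_age]
        a, b, sse = fit_line(hinge, score)
        if best is None or sse < best[3]:
            best = (k, a, b, sse)
    return best


# Hypothetical noise-free data: flat at 30 points, then -1.3 points/year
# starting 2.7 years after Abeta+ (amyloid age = years since Abeta+).
ages = [t / 2 for t in range(-20, 31)]  # -10.0 .. 15.0 years
scores = [30.0 - 1.3 * max(0.0, t - 2.7) for t in ages]
knot, a, b, sse = fit_broken_stick(ages, scores, [k / 10 for k in range(101)])
print(f"change point {knot:.1f} y, slope {b:.2f} points/y")
```

With real, noisy scores the SSE surface is flatter, so a finer grid, a likelihood-based fit, or a mixed-effects formulation for repeated visits would likely be needed.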

18.

scMD: cell type deconvolution using single-cell DNA methylation references.

Cai, Manqi; Zhou, Jingtian; McKennan, Chris; Wang, Jiebiao.

bioRxiv ; 2023 Aug 06.

Article in English | MEDLINE | ID: mdl-37577715

ABSTRACT

The proliferation of single-cell RNA sequencing data has led to the widespread use of cellular deconvolution, which extracts cell type-specific information from large bulk datasets. However, these advances have been mostly limited to transcriptomic data. Recent developments in single-cell DNA methylation (scDNAm) profiling have opened new avenues for deconvolving bulk DNA methylation (DNAm) data, particularly for solid tissues such as the brain that lack cell-type references. Because of technical limitations, current scDNAm data cover only a small proportion of the genome in each cell, and the covered regions differ across cells. This makes scDNAm data ultra-high-dimensional and ultra-sparse. To address these challenges, we introduce scMD (single-cell Methylation Deconvolution), a cellular deconvolution framework that reliably estimates cell type fractions from tissue-level DNAm data. To analyze large-scale, complex scDNAm data, scMD employs a statistical approach to aggregate scDNAm data at the cell-cluster level, identify cell-type marker DNAm sites, and create a precise cell-type signature matrix that surpasses state-of-the-art sorted-cell or RNA-derived references. Through thorough benchmarking on several datasets, we demonstrate scMD's superior performance in estimating cellular fractions from bulk DNAm data. Using scMD-estimated cellular fractions, we identify cell type fractions and cell type-specific differentially methylated cytosines associated with Alzheimer's disease.
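The two generic steps behind reference-based methylation deconvolution can be sketched as follows: pool sparse single-cell calls into cluster-level mean methylation (averaging only over cells that actually covered each site), then estimate bulk cell type fractions by least squares constrained to the probability simplex. This is a minimal sketch under stated assumptions, not scMD's implementation: the data layout, function names, and toy cells are hypothetical, scMD's statistical aggregation and marker-site selection are not reproduced, and projected gradient descent is used here as a common stand-in solver.

```python
# Generic sketch of reference-based deconvolution from sparse single-cell
# methylation calls. Not scMD's implementation; names are hypothetical.
from collections import defaultdict


def cluster_signature(cells, sites):
    """Average sparse 0/1 methylation calls within each cluster.

    `cells` is a list of (cluster_label, {site: 0 or 1}) pairs; a site a cell
    did not cover is simply absent from its dict. Returns
    {cluster: [mean methylation per site]}, averaging only over covering
    cells (None where no cell in the cluster covered the site)."""
    totals = defaultdict(lambda: [0.0] * len(sites))
    counts = defaultdict(lambda: [0] * len(sites))
    for label, calls in cells:
        for j, site in enumerate(sites):
            if site in calls:
                totals[label][j] += calls[site]
                counts[label][j] += 1
    return {
        label: [t / c if c else None for t, c in zip(totals[label], counts[label])]
        for label in totals
    }


def project_to_simplex(v):
    """Euclidean projection onto {w : w >= 0, sum(w) = 1} (sort-based)."""
    u = sorted(v, reverse=True)
    cumsum, theta = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        cumsum += uj
        t = (cumsum - 1.0) / j
        if uj - t > 0:  # holds for j = 1..rho, so theta ends at rho's value
            theta = t
    return [max(x - theta, 0.0) for x in v]


def estimate_fractions(signature, bulk, iters=2000):
    """Minimise ||S w - b||^2 over the simplex by projected gradient descent.

    `signature` has one row per site and one column per cell type; `bulk`
    holds the bulk methylation level at the same sites."""
    n_types = len(signature[0])
    # Step 1/L with L = 2 * ||S||_F^2, an upper bound on the gradient's
    # Lipschitz constant 2 * ||S||_2^2.
    step = 1.0 / (2.0 * sum(x * x for row in signature for x in row))
    w = [1.0 / n_types] * n_types
    for _ in range(iters):
        resid = [sum(r[k] * w[k] for k in range(n_types)) - b
                 for r, b in zip(signature, bulk)]
        grad = [2.0 * sum(r[k] * e for r, e in zip(signature, resid))
                for k in range(n_types)]
        w = project_to_simplex([wk - step * g for wk, g in zip(w, grad)])
    return w


sites = ["s1", "s2", "s3", "s4"]
cells = [  # hypothetical sparse calls: each cell covers only some sites
    ("neuron", {"s1": 1, "s2": 0, "s3": 1}),
    ("neuron", {"s1": 1, "s3": 0, "s4": 0}),
    ("neuron", {"s2": 0, "s4": 1}),
    ("glia", {"s1": 0, "s2": 1}),
    ("glia", {"s2": 1, "s3": 0, "s4": 1}),
    ("glia", {"s1": 0, "s4": 1}),
]
profiles = cluster_signature(cells, sites)
types = ["neuron", "glia"]
# Site x cell-type matrix, keeping only sites covered in every cluster.
S = [[profiles[t][j] for t in types] for j in range(len(sites))
     if all(profiles[t][j] is not None for t in types)]
true_w = [0.7, 0.3]  # hypothetical mixing proportions for a synthetic bulk
bulk = [sum(r[k] * true_w[k] for k in range(len(types))) for r in S]
print(dict(zip(types, estimate_fractions(S, bulk))))
```

The simplex constraint keeps the estimates non-negative and summing to one, which is the usual interpretation of cell type fractions; an unconstrained least-squares fit can return negative proportions when cell types have similar profiles.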

19.

Accurate estimation of rare cell type fractions from tissue omics data via hierarchical deconvolution.

Huang, Penghui; Cai, Manqi; Lu, Xinghua; McKennan, Chris; Wang, Jiebiao.

bioRxiv ; 2023 Mar 16.

Article in English | MEDLINE | ID: mdl-36993280

ABSTRACT

Bulk transcriptomics in tissue samples reflects the average expression levels across different cell types and is highly influenced by cellular fractions. It is therefore critical to estimate cellular fractions, both to deconfound differential expression analyses and to infer cell type-specific differential expression. Since experimentally counting cells is infeasible in most tissues and studies, in silico cellular deconvolution methods have been developed as an alternative. However, existing methods are designed for tissues consisting of clearly distinguishable cell types and struggle to estimate the fractions of highly correlated or rare cell types. To address this challenge, we propose Hierarchical Deconvolution (HiDecon), which estimates cellular fractions in bulk data using single-cell RNA sequencing references and a hierarchical cell type tree that models the similarities among cell types and cell differentiation relationships. By coordinating cell fractions across the layers of the hierarchical tree, HiDecon passes cellular fraction information up and down the tree, which helps correct estimation biases by pooling information across related cell types. The flexible tree structure also enables estimating rare cell fractions by splitting the tree to higher resolutions. Through simulations and real data applications with measured cellular fractions as ground truth, we demonstrate that HiDecon significantly outperforms existing methods and accurately estimates cellular fractions.
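To make "passing fraction information up and down the tree" concrete, here is a toy sketch that assumes a two-level tree with separate estimates at each level: a parent's fraction is the sum of its children's, and a top-down step rescales each group of children so that it sums to the parent's estimate. This proportional reconciliation rule is a simplified illustration chosen for clarity, not the HiDecon algorithm; the tree, function names, and numbers are hypothetical.

```python
# Toy two-level cell type tree reconciliation. Not HiDecon's algorithm;
# the tree, names, and fractions below are hypothetical.


def reconcile_top_down(tree, parent_est, child_est):
    """Rescale child fractions so each group sums to its parent's estimate.

    tree:       {parent: [children]}
    parent_est: {parent: fraction} estimated at the coarse level
    child_est:  {child: fraction} estimated at the fine level
    """
    out = {}
    for parent, children in tree.items():
        subtotal = sum(child_est[c] for c in children)
        for c in children:
            if subtotal > 0:
                # Keep the children's relative proportions, match parent total.
                out[c] = parent_est[parent] * child_est[c] / subtotal
            else:
                # No fine-level signal: split the parent's fraction evenly.
                out[c] = parent_est[parent] / len(children)
    return out


def aggregate_bottom_up(tree, child_frac):
    """A parent's fraction is the sum of its children's fractions."""
    return {p: sum(child_frac[c] for c in cs) for p, cs in tree.items()}


tree = {
    "neuron": ["excitatory", "inhibitory"],
    "glia": ["astrocyte", "oligodendrocyte", "microglia"],
}
parent_est = {"neuron": 0.6, "glia": 0.4}
child_est = {"excitatory": 0.40, "inhibitory": 0.10, "astrocyte": 0.20,
             "oligodendrocyte": 0.25, "microglia": 0.05}
fine = reconcile_top_down(tree, parent_est, child_est)
print(fine)
print(aggregate_bottom_up(tree, fine))
```

After reconciliation, the fine-level fractions (including the small microglia fraction) are consistent with the coarse-level estimates, which is the kind of cross-layer coherence the abstract describes.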

20.

DeepGAMI: deep biologically guided auxiliary learning for multimodal integration and imputation to improve genotype-phenotype prediction.

Chandrashekar, Pramod Bharadwaj; Alatkar, Sayali; Wang, Jiebiao; Hoffman, Gabriel E; He, Chenfeng; Jin, Ting; Khullar, Saniya; Bendl, Jaroslav; Fullard, John F; Roussos, Panos; Wang, Daifeng.

Genome Med ; 15(1): 88, 2023 10 31.

Article in English | MEDLINE | ID: mdl-37904203

ABSTRACT

BACKGROUND: Genotypes are strongly associated with disease phenotypes, particularly in brain disorders. However, the molecular and cellular mechanisms behind this association remain elusive. With emerging multimodal data for these mechanisms, machine learning methods can be applied for phenotype prediction at different scales, but because machine learning models are often black boxes, integrating these modalities and interpreting biological mechanisms can be challenging. In addition, multimodal data are often only partially available, which complicates the development of predictive models. METHOD: To address these challenges, we developed DeepGAMI, an interpretable neural network model that improves genotype-phenotype prediction from multimodal data. DeepGAMI leverages functional genomic information, such as eQTLs and gene regulation, to guide neural network connections. It also includes an auxiliary learning layer for cross-modal imputation, which imputes latent features of missing modalities and thus allows phenotypes to be predicted from a single modality. Finally, DeepGAMI uses integrated gradients to prioritize multimodal features for various phenotypes. RESULTS: We applied DeepGAMI to several multimodal datasets, including genotype and bulk and cell-type gene expression data in brain diseases, and gene expression and electrophysiology data of mouse neuronal cells. In cross-validation and independent validation, DeepGAMI outperformed existing methods for classifying disease types and cellular and clinical phenotypes, even when using single modalities (e.g., AUC of 0.79 for schizophrenia and 0.73 for cognitive impairment in Alzheimer's disease). CONCLUSION: We demonstrated that DeepGAMI improves phenotype prediction and prioritizes phenotypic features and networks in multiple multimodal datasets in complex brains and brain diseases. It also prioritized disease-associated variants, genes, and regulatory networks linked to different phenotypes, providing novel insights into the interpretation of gene regulatory mechanisms. DeepGAMI is open-source and available for general use.
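Two of the interpretability ideas named above, prior-knowledge masking of network connections and integrated gradients attribution, can be illustrated on a toy model. The sketch below is a minimal, hypothetical example, not DeepGAMI's architecture or code: it assumes a single hidden layer whose weights are multiplied elementwise by a 0/1 mask (standing in for SNP-to-gene eQTL links), and it approximates integrated gradients with a midpoint Riemann sum and central finite-difference gradients instead of automatic differentiation. The weights, mask, inputs, and baseline are made up.

```python
import math

# Toy sketch: masked ("biologically guided") layer + integrated gradients.
# Not DeepGAMI's code; all names and numbers are hypothetical.


def masked_layer(x, weights, mask):
    """Hidden unit j = tanh(sum_i x[i] * W[i][j] * M[i][j]).

    M[i][j] = 1 only where prior knowledge (e.g. an eQTL linking SNP i to
    gene j) allows a connection; all other weights are switched off."""
    n_out = len(weights[0])
    return [math.tanh(sum(x[i] * weights[i][j] * mask[i][j] for i in range(len(x))))
            for j in range(n_out)]


def model(x, weights, mask, out_w):
    """Scalar phenotype score: weighted sum of the masked hidden layer."""
    h = masked_layer(x, weights, mask)
    return sum(hj * wj for hj, wj in zip(h, out_w))


def integrated_gradients(f, x, baseline, steps=200, eps=1e-6):
    """Approximate IG_i = (x_i - x'_i) * integral_0^1 dF/dx_i(x' + a(x - x')) da
    with a midpoint Riemann sum and central finite-difference gradients."""
    n = len(x)
    totals = [0.0] * n
    for s in range(steps):
        a = (s + 0.5) / steps
        point = [b + a * (xi - b) for xi, b in zip(x, baseline)]
        for i in range(n):
            up, down = point[:], point[:]
            up[i] += eps
            down[i] -= eps
            totals[i] += (f(up) - f(down)) / (2 * eps)
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, totals)]


# Hypothetical inputs: 3 SNP dosages -> 2 genes -> 1 phenotype score.
weights = [[0.8, -0.5], [0.3, 0.6], [0.9, -0.4]]
mask = [[1, 0], [1, 1], [0, 1]]  # 1 = allowed SNP-gene link (e.g. eQTL)
out_w = [1.0, -0.7]


def f(v):
    return model(v, weights, mask, out_w)


x = [1.0, 2.0, 1.0]
baseline = [0.0, 0.0, 0.0]
ig = integrated_gradients(f, x, baseline)
print("attributions:", [round(v, 4) for v in ig])
print("sum vs F(x) - F(baseline):", round(sum(ig), 4), round(f(x) - f(baseline), 4))
```

Integrated gradients satisfy a completeness property (the attributions sum to F(x) minus F(baseline)), which makes a convenient sanity check on the numerical approximation; likewise, changing a masked-off weight should leave the output unchanged.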

Subject(s)

Alzheimer Disease; Machine Learning; Animals; Mice; Neural Networks, Computer; Genotype; Phenotype; Alzheimer Disease/genetics
