1.
Cell ; 180(3): 568-584.e23, 2020 02 06.
Article in English | MEDLINE | ID: mdl-31981491

ABSTRACT

We present the largest exome sequencing study of autism spectrum disorder (ASD) to date (n = 35,584 total samples, 11,986 with ASD). Using an enhanced analytical framework to integrate de novo and case-control rare variation, we identify 102 risk genes at a false discovery rate of 0.1 or less. Of these genes, 49 show higher frequencies of disruptive de novo variants in individuals ascertained to have severe neurodevelopmental delay, whereas 53 show higher frequencies in individuals ascertained to have ASD; comparing ASD cases with mutations in these groups reveals phenotypic differences. Expressed early in brain development, most risk genes have roles in regulation of gene expression or neuronal communication (i.e., mutations effect neurodevelopmental and neurophysiological changes), and 13 fall within loci recurrently hit by copy number variants. In cells from the human cortex, expression of risk genes is enriched in excitatory and inhibitory neuronal lineages, consistent with multiple paths to an excitatory-inhibitory imbalance underlying ASD.


Subject(s)
Autistic Disorder/genetics; Cerebral Cortex/growth & development; Exome Sequencing/methods; Gene Expression Regulation, Developmental; Neurobiology/methods; Case-Control Studies; Cell Lineage; Cohort Studies; Exome; Female; Gene Frequency; Genetic Predisposition to Disease; Humans; Male; Mutation, Missense; Neurons/metabolism; Phenotype; Sex Factors; Single-Cell Analysis/methods
2.
Genome Res ; 31(10): 1807-1818, 2021 10.
Article in English | MEDLINE | ID: mdl-33837133

ABSTRACT

When assessed over a large number of samples, bulk RNA sequencing provides reliable data for gene expression at the tissue level. Single-cell RNA sequencing (scRNA-seq) deepens those analyses by evaluating gene expression at the cellular level. Both data types lend insights into disease etiology. With current technologies, scRNA-seq data are known to be noisy. Constrained by costs, scRNA-seq data are typically generated from a relatively small number of subjects, which limits their utility for some analyses, such as identification of gene expression quantitative trait loci (eQTLs). To address these issues while maintaining the unique advantages of each data type, we develop a Bayesian method (bMIND) to integrate bulk and scRNA-seq data. With a prior derived from scRNA-seq data, we propose to estimate sample-level cell type-specific (CTS) expression from bulk expression data. The CTS expression enables large-scale sample-level downstream analyses, such as detection of CTS differentially expressed genes (DEGs) and eQTLs. Through simulations, we show that bMIND improves the accuracy of sample-level CTS expression estimates and increases the power to discover CTS DEGs when compared to existing methods. To further our understanding of two complex phenotypes, autism spectrum disorder and Alzheimer's disease, we apply bMIND to gene expression data of relevant brain tissue to identify CTS DEGs. Our results complement findings for CTS DEGs obtained from snRNA-seq studies, replicating certain DEGs in specific cell types while nominating other novel genes for those cell types. Finally, we calculate CTS eQTLs for 11 brain regions by analyzing Genotype-Tissue Expression Project data, creating a new resource for biological insights.


Subject(s)
Autism Spectrum Disorder; Single-Cell Analysis; Autism Spectrum Disorder/genetics; Bayes Theorem; Gene Expression; Gene Expression Profiling/methods; Humans; Sequence Analysis, RNA/methods; Single-Cell Analysis/methods
3.
Alzheimers Dement ; 20(1): 243-252, 2024 Jan.
Article in English | MEDLINE | ID: mdl-37563770

ABSTRACT

INTRODUCTION: Our previously developed blood-based transcriptional risk scores (TRS) showed associations with diagnosis and neuroimaging biomarkers for Alzheimer's disease (AD). Here, we developed brain-based TRS. METHODS: We integrated AD genome-wide association study summary and expression quantitative trait locus data to prioritize target genes using Mendelian randomization. We calculated TRS using brain transcriptome data of two independent cohorts (N = 878) and performed association analysis of TRS with diagnosis, amyloidopathy, tauopathy, and cognition. We compared AD classification performance of TRS with polygenic risk scores (PRS). RESULTS: Higher TRS values were significantly associated with AD, amyloidopathy, tauopathy, worse cognition, and faster cognitive decline, which were replicated in an independent cohort. The AD classification performance of PRS was increased with the inclusion of TRS up to 16% with the area under the curve value of 0.850. DISCUSSION: Our results suggest brain-based TRS improves the AD classification of PRS and may be a potential AD biomarker. HIGHLIGHTS: Transcriptional risk score (TRS) is developed using brain RNA-Seq data. Higher TRS values are shown in Alzheimer's disease (AD). TRS improves the AD classification power of PRS up to 16%. TRS is associated with AD pathology presence. TRS is associated with worse cognitive performance and faster cognitive decline.


Subject(s)
Alzheimer Disease; Tauopathies; Humans; Alzheimer Disease/diagnostic imaging; Alzheimer Disease/genetics; Genome-Wide Association Study; Cognition; Risk Factors; Biomarkers; Genetic Risk Score
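
The abstract above reports classification performance as an area under the ROC curve (AUC). As a minimal sketch of how such a value can be computed from risk scores, the snippet below uses the rank-based (Mann-Whitney) formulation. The helper name `auc` and the toy scores and labels are illustrative assumptions, not data or code from the study.

```python
def auc(scores, labels):
    """AUC as the probability that a random case outscores a random control.

    Ties count as half a win (Mann-Whitney formulation).
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical risk scores: label 1 = case, label 0 = control.
toy_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
toy_labels = [1, 1, 1, 0, 0, 0]
toy_auc = auc(toy_scores, toy_labels)  # 8 of 9 case-control pairs are ordered correctly
```

The pairwise loop is quadratic in sample size, which is acceptable for a sketch; a sort-based rank computation would be the usual choice for larger cohorts.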
4.
Bioinformatics ; 38(11): 3004-3010, 2022 05 26.
Article in English | MEDLINE | ID: mdl-35438146

ABSTRACT

MOTIVATION: Tissue-level omics data such as transcriptomics and epigenomics are an average across diverse cell types. To extract cell-type-specific (CTS) signals, dozens of cellular deconvolution methods have been proposed to infer cell-type fractions from tissue-level data. However, these methods produce vastly different results under various real data settings. Simulation-based benchmarking studies showed no universally best deconvolution approaches. There have been attempts of ensemble methods, but they only aggregate multiple single-cell references or reference-free deconvolution methods. RESULTS: To achieve a robust estimation of cellular fractions, we proposed EnsDeconv (Ensemble Deconvolution), which adopts CTS robust regression to synthesize the results from 11 single deconvolution methods, 10 reference datasets, 5 marker gene selection procedures, 5 data normalizations and 2 transformations. Unlike most benchmarking studies based on simulations, we compiled four large real datasets of 4937 tissue samples in total with measured cellular fractions and bulk gene expression from different tissues. Comprehensive evaluations demonstrated that EnsDeconv yields more stable, robust and accurate fractions than existing methods. We illustrated that EnsDeconv estimated cellular fractions enable various CTS downstream analyses such as differential fractions associated with clinical variables. We further extended EnsDeconv to analyze bulk DNA methylation data. AVAILABILITY AND IMPLEMENTATION: EnsDeconv is freely available as an R-package from https://github.com/randel/EnsDeconv. The RNA microarray data from the TRAUMA study are available and can be accessed in GEO (GSE36809). The demographic and clinical phenotypes can be shared on reasonable request to the corresponding authors. 
The RNA-seq data from the EVAPR study cannot be shared publicly due to the privacy of individuals that participated in the clinical research in compliance with the IRB approval at the University of Pittsburgh. The RNA microarray data from the FHS study are available from dbGaP (phs000007.v32.p13). The RNA-seq data from ROS study is downloaded from AD Knowledge Portal. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
RNA; Transcriptome; Sequence Analysis, RNA; RNA-Seq; Computer Simulation
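
To make the idea of cellular deconvolution concrete, here is a minimal two-cell-type sketch: bulk expression is modeled as a mixture of two reference signatures, and the fraction is estimated by constrained least squares. This is not EnsDeconv or any of the methods it combines; the function name, the signatures, and the gene values are hypothetical.

```python
def estimate_fraction(bulk, sig_a, sig_b):
    """Least-squares fraction p of cell type A.

    Assumes bulk[g] ~ p * sig_a[g] + (1 - p) * sig_b[g] for every gene g.
    """
    diff = [a - b for a, b in zip(sig_a, sig_b)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        raise ValueError("signatures are identical; fraction is not identifiable")
    num = sum((x - b) * d for x, b, d in zip(bulk, sig_b, diff))
    # Clip to [0, 1] so the estimate is a valid fraction.
    return min(1.0, max(0.0, num / denom))


# Hypothetical reference signatures over three genes.
neuron = [10.0, 1.0, 5.0]
glia = [2.0, 8.0, 5.0]

# Simulated bulk sample containing 30% neurons.
true_p = 0.3
bulk = [true_p * n + (1 - true_p) * g for n, g in zip(neuron, glia)]
p_hat = estimate_fraction(bulk, neuron, glia)
```

With more than two cell types the same model becomes a non-negative least-squares problem with a sum-to-one constraint, and the choice of reference, marker genes, and normalization starts to matter, which is the sensitivity the abstract describes.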
5.
Bioinformatics ; 37(19): 3228-3234, 2021 Oct 11.
Article in English | MEDLINE | ID: mdl-33904573

ABSTRACT

MOTIVATION: Marker genes, defined as genes that are expressed primarily in a single-cell type, can be identified from the single-cell transcriptome; however, such data are not always available for the many uses of marker genes, such as deconvolution of bulk tissue. Marker genes for a cell type, however, are highly correlated in bulk data, because their expression levels depend primarily on the proportion of that cell type in the samples. Therefore, when many tissue samples are analyzed, it is possible to identify these marker genes from the correlation pattern. RESULTS: To capitalize on this pattern, we develop a new algorithm to detect marker genes by combining published information about likely marker genes with bulk transcriptome data in the form of a semi-supervised algorithm. The algorithm then exploits the correlation structure of the bulk data to refine the published marker genes by adding or removing genes from the list. AVAILABILITY AND IMPLEMENTATION: We implement this method as an R package markerpen, hosted on CRAN (https://CRAN.R-project.org/package=markerpen). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
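
The correlation argument in this abstract can be illustrated with a small sketch: if two genes are expressed mainly in one cell type, their bulk expression tracks that cell type's proportion across samples, so they correlate strongly with each other. The proportions, scaling factors, and deviations below are made-up values, and this is not the markerpen algorithm.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical proportions of one cell type in five bulk samples.
proportions = [0.1, 0.3, 0.5, 0.2, 0.4]

# Two marker genes: expression scales with the proportion, plus small deviations.
marker_1 = [100 * p + e for p, e in zip(proportions, [1, -1, 0, 1, -1])]
marker_2 = [60 * p + e for p, e in zip(proportions, [0, 1, -1, 0, 1])]

# A gene whose expression does not follow the proportion.
other = [5.0, 4.0, 5.0, 4.0, 5.0]

r_markers = pearson(marker_1, marker_2)  # close to 1
r_other = pearson(marker_1, other)       # much weaker
```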

6.
Bioinformatics ; 37(16): 2374-2381, 2021 Aug 25.
Article in English | MEDLINE | ID: mdl-33624750

ABSTRACT

MOTIVATION: Gene-gene co-expression networks (GCN) are of biological interest for the useful information they provide for understanding gene-gene interactions. The advent of single cell RNA-sequencing allows us to examine more subtle gene co-expression occurring within a cell type. Many imputation and denoising methods have been developed to deal with the technical challenges observed in single cell data; meanwhile, several simulators have been developed for benchmarking and assessing these methods. Most of these simulators, however, either do not incorporate gene co-expression or generate co-expression in an inconvenient manner. RESULTS: Therefore, with the focus on gene co-expression, we propose a new simulator, ESCO, which adopts the idea of the copula to impose gene co-expression, while preserving the highlights of available simulators, which perform well for simulation of gene expression marginally. Using ESCO, we assess the performance of imputation methods on GCN recovery and find that imputation generally helps GCN recovery when the data are not too sparse, and the ensemble imputation method works best among leading methods. In contrast, imputation fails to help in the presence of an excessive fraction of zero counts, where simple data aggregating methods are a better choice. These findings are further verified with mouse and human brain cell data. AVAILABILITY AND IMPLEMENTATION: The ESCO implementation is available as R package ESCO. Users can either download the development version via github (https://github.com/JINJINT/ESCO) or the archived version via Zenodo (https://zenodo.org/record/4455890). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
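
As a rough illustration of the copula idea mentioned above, the sketch below draws correlated standard normals, maps them to uniforms, and pushes each uniform through the quantile function of a count distribution. Poisson marginals are chosen here only for simplicity; they, the function names, and all parameter values are assumptions and do not reflect ESCO's actual model.

```python
import math
import random
from statistics import NormalDist


def poisson_quantile(u, lam, max_k=1000):
    """Smallest k with P(X <= k) >= u for X ~ Poisson(lam), capped at max_k."""
    k = 0
    pmf = math.exp(-lam)
    cdf = pmf
    while cdf < u and k < max_k:
        k += 1
        pmf *= lam / k
        cdf += pmf
    return k


def simulate_pair(n, rho, lam_1, lam_2, seed=0):
    """Simulate n cells for two genes coupled by a Gaussian copula."""
    rng = random.Random(seed)
    phi = NormalDist().cdf
    counts_1, counts_2 = [], []
    for _ in range(n):
        # Correlated standard normals with correlation rho.
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        # Normal CDF gives uniforms; the Poisson quantile gives counts.
        counts_1.append(poisson_quantile(phi(z1), lam_1))
        counts_2.append(poisson_quantile(phi(z2), lam_2))
    return counts_1, counts_2


gene_1, gene_2 = simulate_pair(2000, rho=0.8, lam_1=5.0, lam_2=3.0, seed=1)
```

The appeal of this construction is that the dependence (`rho`) and the marginal distributions are specified separately, so a simulator can keep whatever marginal model it already uses and add co-expression on top.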

7.
Bioinformatics ; 37(17): 2513-2520, 2021 Sep 09.
Article in English | MEDLINE | ID: mdl-33647928

ABSTRACT

MOTIVATION: Trans-acting expression quantitative trait loci (eQTLs) collectively explain a substantial proportion of expression variation, yet are challenging to detect and replicate since their effects are often individually weak. A large proportion of genetic effects on distal genes are mediated through cis-gene expression. Cis-association (between SNP and cis-gene) and gene-gene correlation conditional on SNP genotype could establish trans-association (between SNP and trans-gene). Both cis-association and gene-gene conditional correlation have effects shared across relevant tissues and conditions, and trans-associations mediated by cis-gene expression also have effects shared across relevant conditions. RESULTS: We proposed a Cross-Condition Mediation analysis method (CCmed) for detecting cis-mediated trans-associations with replicable effects in relevant conditions/studies. CCmed integrates cis-association and gene-gene conditional correlation statistics from multiple tissues/studies. Motivated by the bimodal effect-sharing patterns of eQTLs, we proposed two variations of CCmed, CCmedmost and CCmedspec for detecting cross-tissue and tissue-specific trans-associations, respectively. We analyzed data of 13 brain tissues from the Genotype-Tissue Expression (GTEx) project, and identified trios with cis-mediated trans-associations across brain tissues, many of which showed evidence of trans-association in two replication studies. We also identified trans-genes associated with schizophrenia loci in at least two brain tissues. AVAILABILITY AND IMPLEMENTATION: CCmed software is available at http://github.com/kjgleason/CCmed. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

8.
Neurobiol Dis ; 159: 105481, 2021 11.
Article in English | MEDLINE | ID: mdl-34411703

ABSTRACT

The clinical diagnosis of Alzheimer's disease, at its early stage, remains a difficult task. Advanced imaging technologies and laboratory assays to detect Aβ peptides Aβ42 and Aβ40, total and phosphorylated tau in CSF provide a set of biomarkers of developing AD brain pathology and facilitate the diagnostic process. The search for biofluid biomarkers, other than in CSF, and the development of biomarker assays have accelerated significantly and now represent the fastest-growing field in AD research. The goal of this study was to determine the differential enrichment of noncoding RNAs (ncRNAs) in plasma-derived extracellular vesicles (EV) of AD patients and Cognitively Normal controls (NC). Using RNA-seq, we profiled four significant classes of ncRNAs: miRNAs, snoRNAs, tRNAs, and piRNAs. We report a significant enrichment of SNORDs, a group of snoRNAs, in AD samples compared to NC. To verify the differential enrichment of two clusters of SNORDs, SNORD115 and SNORD116, localized on human chromosome 15q11-q13, we used plasma samples of an independent group of AD patients and NC. We applied the ddPCR technique and identified SNORD115 and SNORD116 with a high discriminatory power to differentiate AD samples from NC. The results of our study present evidence that AD is associated with changes in the enrichment of SNORDs, transcribed from imprinted genomic loci, in plasma EV and provide a rationale to further explore the validity of those SNORDs as plasma biomarkers of AD.


Subject(s)
Alzheimer Disease/metabolism; Extracellular Vesicles/metabolism; RNA, Small Nucleolar/metabolism; Aged; Aged, 80 and over; Alzheimer Disease/diagnosis; Biomarkers/metabolism; Case-Control Studies; Female; Humans; Male; Sensitivity and Specificity
9.
Bioinformatics ; 36(3): 782-788, 2020 02 01.
Article in English | MEDLINE | ID: mdl-31400192

ABSTRACT

MOTIVATION: Patterns of gene expression, quantified at the level of tissue or cells, can inform on etiology of disease. There are now rich resources for tissue-level (bulk) gene expression data, which have been collected from thousands of subjects, and resources involving single-cell RNA-sequencing (scRNA-seq) data are expanding rapidly. The latter yields cell type information, although the data can be noisy and typically are derived from a small number of subjects. RESULTS: Complementing these approaches, we develop a method to estimate subject- and cell-type-specific (CTS) gene expression from tissue using an empirical Bayes method that borrows information across multiple measurements of the same tissue per subject (e.g. multiple regions of the brain). Analyzing expression data from multiple brain regions from the Genotype-Tissue Expression project (GTEx) reveals CTS expression, which then permits downstream analyses, such as identification of CTS expression Quantitative Trait Loci (eQTL). AVAILABILITY AND IMPLEMENTATION: We implement this method as an R package MIND, hosted on https://github.com/randel/MIND. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Gene Expression Profiling; Software; Bayes Theorem; Sequence Analysis, RNA; Single-Cell Analysis
10.
Genome Res ; 27(11): 1859-1871, 2017 11.
Article in English | MEDLINE | ID: mdl-29021290

ABSTRACT

The impact of inherited genetic variation on gene expression in humans is well-established. The majority of known expression quantitative trait loci (eQTLs) impact expression of local genes (cis-eQTLs). More research is needed to identify effects of genetic variation on distant genes (trans-eQTLs) and understand their biological mechanisms. One common trans-eQTLs mechanism is "mediation" by a local (cis) transcript. Thus, mediation analysis can be applied to genome-wide SNP and expression data in order to identify transcripts that are "cis-mediators" of trans-eQTLs, including those "cis-hubs" involved in regulation of many trans-genes. Identifying such mediators helps us understand regulatory networks and suggests biological mechanisms underlying trans-eQTLs, both of which are relevant for understanding susceptibility to complex diseases. The multitissue expression data from the Genotype-Tissue Expression (GTEx) program provides a unique opportunity to study cis-mediation across human tissue types. However, the presence of complex hidden confounding effects in biological systems can make mediation analyses challenging and prone to confounding bias, particularly when conducted among diverse samples. To address this problem, we propose a new method: Genomic Mediation analysis with Adaptive Confounding adjustment (GMAC). It enables the search of a very large pool of variables, and adaptively selects potential confounding variables for each mediation test. Analyses of simulated data and GTEx data demonstrate that the adaptive selection of confounders by GMAC improves the power and precision of mediation analysis. Application of GMAC to GTEx data provides new insights into the observed patterns of cis-hubs and trans-eQTL regulation across tissue types.


Subject(s)
Gene Expression Profiling/methods; Genomics/methods; Quantitative Trait Loci; Databases, Genetic; Gene Expression Regulation; Gene Regulatory Networks; Genetic Predisposition to Disease; Genome-Wide Association Study; Humans; Polymorphism, Single Nucleotide; Selection, Genetic; Tissue Distribution
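
The notion of cis-mediation described in the abstract above can be sketched with ordinary least squares: if a SNP affects a distant (trans) gene only through a local (cis) transcript, the SNP-trans association should shrink toward zero once the cis transcript is adjusted for. The toy data below are constructed so that this holds exactly. The sketch does not implement GMAC and omits its adaptive confounder selection; all names and values are illustrative assumptions.

```python
def mean(v):
    return sum(v) / len(v)


def slope(x, y):
    """OLS slope of y on x (model includes an intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residuals(x, y):
    """Residuals of y after regressing it on x (with intercept)."""
    b = slope(x, y)
    a = mean(y) - b * mean(x)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


# Toy genotypes (0/1/2 copies of the minor allele) for six samples.
snp = [0, 1, 2, 0, 1, 2]
# cis transcript depends on the SNP; trans transcript depends only on cis.
cis = [2 * g + e for g, e in zip(snp, [0.1, -0.2, 0.1, -0.1, 0.2, -0.1])]
trans = [1.5 * c + e for c, e in zip(cis, [0.1, 0.0, -0.1, -0.1, 0.0, 0.1])]

# Total SNP effect on the trans gene, ignoring the cis gene.
total_effect = slope(snp, trans)
# SNP effect adjusted for the cis gene: regress the cis gene out of both
# variables, then regress residual on residual (Frisch-Waugh-Lovell).
direct_effect = slope(residuals(cis, snp), residuals(cis, trans))
```

In real expression data, hidden confounders that affect both the cis and trans transcripts can distort this adjusted estimate, which is the problem the abstract says GMAC is designed to address.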
11.
Biostatistics ; 20(4): 648-665, 2019 10 01.
Article in English | MEDLINE | ID: mdl-29939200

ABSTRACT

In quantitative proteomics, mass tag labeling techniques have been widely adopted in mass spectrometry experiments. These techniques allow peptides (short amino acid sequences) and proteins from multiple samples of a batch to be detected and quantified in a single experiment, and as such greatly improve the efficiency of protein profiling. However, the batch-processing of samples also results in severe batch effects and non-ignorable missing data occurring at the batch level. Motivated by the breast cancer proteomic data from the Clinical Proteomic Tumor Analysis Consortium, in this work, we developed two tailored multivariate MIxed-effects SElection models (mvMISE) to jointly analyze multiple correlated peptides/proteins in labeled proteomics data, considering the batch effects and the non-ignorable missingness. By taking a multivariate approach, we can borrow information across multiple peptides of the same protein or multiple proteins from the same biological pathway, and thus achieve better statistical efficiency and biological interpretation. These two different models account for different correlation structures among a group of peptides or proteins. Specifically, to model multiple peptides from the same protein, we employed a factor-analytic random effects structure to characterize the high and similar correlations among peptides. To model biological dependence among multiple proteins in a functional pathway, we introduced a graphical lasso penalty on the error precision matrix, and implemented an efficient algorithm based on the alternating direction method of multipliers. Simulations demonstrated the advantages of the proposed models. Applying the proposed methods to the motivating data set, we identified phosphoproteins and biological pathways that showed different activity patterns in triple negative breast tumors versus other breast tumors.
The proposed methods can also be applied to other high-dimensional multivariate analyses based on clustered data with or without non-ignorable missingness.


Subject(s)
Algorithms; Biostatistics/methods; Models, Statistical; Proteomics/methods; Humans
12.
Genet Epidemiol ; 42(5): 434-446, 2018 07.
Article in English | MEDLINE | ID: mdl-29430690

ABSTRACT

There is a growing recognition that gene-environment interaction (G × E) plays a pivotal role in the development and progression of complex diseases. Despite a wealth of genetic data on various complex diseases/traits generated from association and sequencing studies, detecting G × E via genome-wide analysis remains challenging due to power issues. In genome-wide G × E studies, a common strategy to improve power is to first conduct a filtering test and retain only the genetic variants that pass the filtering step for subsequent G × E analyses. Two-stage, multistage, and unified tests have been proposed to jointly consider the filtering statistics in G × E tests. However, such G × E tests based on data from a single study may still be underpowered. Meanwhile, large-scale consortia have been formed to borrow strength across studies and populations. In this work, motivated by existing single-study G × E tests with filtering and the needs for meta-analysis G × E approaches based on consortia data, we propose a meta-analysis framework for detecting gene-based G × E effects, and introduce meta-analysis-based filtering statistics in the gene-level G × E tests. Simulations demonstrate the advantages of the proposed method, the ofGEM test. We apply the proposed tests to existing data from two breast cancer consortia to identify the genes harboring genetic variants with age-dependent penetrance (i.e., gene-age interaction effects). We develop an R software package ofGEM for the proposed meta-analysis tests.


Subject(s)
Gene-Environment Interaction; Age Factors; Age of Onset; Breast Neoplasms/genetics; Computer Simulation; Female; Genetic Predisposition to Disease; Genome-Wide Association Study; Humans; Models, Genetic; Penetrance; Risk Factors
13.
Am J Hum Genet ; 98(4): 697-708, 2016 Apr 07.
Article in English | MEDLINE | ID: mdl-27040689

ABSTRACT

Gene expression and its regulation can vary substantially across tissue types. In order to generate knowledge about gene expression in human tissues, the Genotype-Tissue Expression (GTEx) program has collected transcriptome data in a wide variety of tissue types from post-mortem donors. However, many tissue types are difficult to access and are not collected in every GTEx individual. Furthermore, in non-GTEx studies, the accessibility of certain tissue types greatly limits the feasibility and scale of studies of multi-tissue expression. In this work, we developed multi-tissue imputation methods to impute gene expression in uncollected or inaccessible tissues. Via simulation studies, we showed that the proposed methods outperform existing imputation methods in multi-tissue expression imputation and that incorporating imputed expression data can improve power to detect phenotype-expression correlations. By analyzing data from nine selected tissue types in the GTEx pilot project, we demonstrated that harnessing expression quantitative trait loci (eQTLs) and tissue-tissue expression-level correlations can aid imputation of transcriptome data from uncollected GTEx tissues. More importantly, we showed that by using GTEx data as a reference, one can impute expression levels in inaccessible tissues in non-GTEx expression studies.


Subject(s)
Gene Expression Regulation; Genotype; Quantitative Trait Loci; Transcriptome; Humans; Phenotype; Pilot Projects; Reproducibility of Results
15.
Commun Biol ; 7(1): 1, 2024 01 02.
Article in English | MEDLINE | ID: mdl-38168620

ABSTRACT

The proliferation of single-cell RNA-sequencing data has led to the widespread use of cellular deconvolution, aiding the extraction of cell-type-specific information from extensive bulk data. However, those advances have been mostly limited to transcriptomic data. With recent developments in single-cell DNA methylation (scDNAm), there are emerging opportunities for deconvolving bulk DNAm data, particularly for solid tissues like brain that lack cell-type references. Due to technical limitations, current scDNAm sequences represent a small proportion of the whole genome for each single cell, and those detected regions differ across cells. This makes scDNAm data ultra-high dimensional and ultra-sparse. To deal with these challenges, we introduce scMD (single cell Methylation Deconvolution), a cellular deconvolution framework to reliably estimate cell type fractions from tissue-level DNAm data. To analyze large-scale complex scDNAm data, scMD employs a statistical approach to aggregate scDNAm data at the cell cluster level, identify cell-type marker DNAm sites, and create precise cell-type signature matrixes that surpass state-of-the-art sorted-cell or RNA-derived references. Through thorough benchmarking in several datasets, we demonstrate scMD's superior performance in estimating cellular fractions from bulk DNAm data. With scMD-estimated cellular fractions, we identify cell type fractions and cell type-specific differentially methylated cytosines associated with Alzheimer's disease.


Subject(s)
Brain; DNA Methylation; Brain/metabolism; Gene Expression Profiling; Genome; RNA/metabolism
16.
bioRxiv ; 2023 Aug 06.
Article in English | MEDLINE | ID: mdl-37577715

ABSTRACT

The proliferation of single-cell RNA sequencing data has led to the widespread use of cellular deconvolution, aiding the extraction of cell type-specific information from extensive bulk data. However, those advances have been mostly limited to transcriptomic data. With recent development in single-cell DNA methylation (scDNAm), new avenues have been opened for deconvolving bulk DNAm data, particularly for solid tissues like the brain that lack cell-type references. Due to technical limitations, current scDNAm sequences represent a small proportion of the whole genome for each single cell, and those detected regions differ across cells. This makes scDNAm data ultra-high dimensional and ultra-sparse. To deal with these challenges, we introduce scMD (single cell Methylation Deconvolution), a cellular deconvolution framework to reliably estimate cell type fractions from tissue-level DNAm data. To analyze large-scale complex scDNAm data, scMD employs a statistical approach to aggregate scDNAm data at the cell cluster level, identify cell-type marker DNAm sites, and create a precise cell-type signature matrix that surpasses state-of-the-art sorted-cell or RNA-derived references. Through thorough benchmarking in several datasets, we demonstrate scMD's superior performance in estimating cellular fractions from bulk DNAm data. With scMD-estimated cellular fractions, we identify cell type fractions and cell type-specific differentially methylated cytosines associated with Alzheimer's disease.

17.
bioRxiv ; 2023 Mar 16.
Article in English | MEDLINE | ID: mdl-36993280

ABSTRACT

Bulk transcriptomics in tissue samples reflects the average expression levels across different cell types and is highly influenced by cellular fractions. As such, it is critical to estimate cellular fractions to both deconfound differential expression analyses and infer cell type-specific differential expression. Since experimentally counting cells is infeasible in most tissues and studies, in silico cellular deconvolution methods have been developed as an alternative. However, existing methods are designed for tissues consisting of clearly distinguishable cell types and have difficulties estimating highly correlated or rare cell types. To address this challenge, we propose Hierarchical Deconvolution (HiDecon) that uses single-cell RNA sequencing references and a hierarchical cell type tree, which models the similarities among cell types and cell differentiation relationships, to estimate cellular fractions in bulk data. By coordinating cell fractions across layers of the hierarchical tree, cellular fraction information is passed up and down the tree, which helps correct estimation biases by pooling information across related cell types. The flexible hierarchical tree structure also enables estimating rare cell fractions by splitting the tree to higher resolutions. Through simulations and real data applications with the ground truth of measured cellular fractions, we demonstrate that HiDecon significantly outperforms existing methods and accurately estimates cellular fractions.

18.
J Expo Sci Environ Epidemiol ; 33(2): 264-272, 2023 03.
Article in English | MEDLINE | ID: mdl-36114292

ABSTRACT

BACKGROUND: Phthalate exposure in pregnancy is typically estimated using maternal urinary phthalate metabolite levels. Our aim was to evaluate the association of urinary and placental tissue phthalates, and to explore the role of maternal and pregnancy characteristics that may bias estimates. METHODS: Fifty pregnancies were selected from the CANDLE Study, recruited from 2006 to 2011 in Tennessee. Linear models were used to estimate associations of urinary phthalates (2nd, 3rd trimesters) and placental tissue phthalates (birth). Potential confounders and modifiers were evaluated in categories: temporality (time between urine and placenta sample), fetal sex, demographics, social advantage, reproductive history, medication use, nutrition and adiposity. Molar and quantile normalized phthalates were calculated to facilitate comparison of placental and urinary levels. RESULTS: Metabolites detectable in >80% of both urine and placental samples were MEP, MnBP, MBzP, MECPP, MEOHP, MEHHP, and MEHP. MEP was most abundant in urine (geometric mean [GM] 7.00 × 10^2 nmol/l) and in placental tissue (GM 2.56 × 10^4 nmol/l). MEHP was the least abundant in urine (GM 5.32 × 10^1 nmol/l) and second most abundant in placental tissue (2.04 × 10^4 nmol/l). In aggregate, MEHP differed the most between urine and placenta (2.21 log units), and MEHHP differed the least (0.07 log units). MECPP was positively associated between urine and placenta (regression coefficient: 0.31; 95% CI 0.09, 0.53). Other urine-placenta metabolite associations were modified by measures of social advantage, reproductive history, medication use, and adiposity. CONCLUSION: Phthalates were ubiquitous in 50 full-term placental samples, as has already been shown in maternal urine. MEP and MEHP were the most abundant.
Measurement and comparison of urinary and placental phthalates can advance knowledge on phthalate toxicity in pregnancy and provide insight into the validity and accuracy of relying on maternal urinary concentrations to estimate placental exposures. IMPACT STATEMENT: This is the first report of correlations/associations of urinary and placental tissue phthalates in human pregnancy. Epidemiologists have relied exclusively on maternal urinary phthalate metabolite concentrations to assess exposures in pregnant women and risk to their fetuses. Even though it has not yet been confirmed empirically, it is widely assumed that urinary concentrations are strongly and positively correlated with placental and fetal levels. Our data suggest that may not be the case, and these associations may vary by phthalate metabolite and associations may be modified by measures of social advantage, reproductive history, medication use, and adiposity.


Subject(s)
Environmental Pollutants , Phthalic Acids , Humans , Pregnancy , Female , Placenta , Phthalic Acids/urine , Pregnancy Trimesters , Obesity , Environmental Pollutants/urine , Environmental Exposure , Maternal Exposure
19.
Genome Med ; 15(1): 88, 2023 10 31.
Article in English | MEDLINE | ID: mdl-37904203

ABSTRACT

BACKGROUND: Genotypes are strongly associated with disease phenotypes, particularly in brain disorders. However, the molecular and cellular mechanisms behind this association remain elusive. With emerging multimodal data for these mechanisms, machine learning methods can be applied for phenotype prediction at different scales, but due to the black-box nature of machine learning, integrating these modalities and interpreting biological mechanisms can be challenging. Additionally, the partial availability of these multimodal data presents a challenge in developing these predictive models. METHOD: To address these challenges, we developed DeepGAMI, an interpretable neural network model to improve genotype-phenotype prediction from multimodal data. DeepGAMI leverages functional genomic information, such as eQTLs and gene regulation, to guide neural network connections. Additionally, it includes an auxiliary learning layer for cross-modal imputation, allowing the imputation of latent features of missing modalities and thus the prediction of phenotypes from a single modality. Finally, DeepGAMI uses integrated gradients to prioritize multimodal features for various phenotypes. RESULTS: We applied DeepGAMI to several multimodal datasets, including genotype and bulk and cell-type gene expression data in brain diseases, and gene expression and electrophysiology data of mouse neuronal cells. Using cross-validation and independent validation, DeepGAMI outperformed existing methods for classifying disease types, and cellular and clinical phenotypes, even using single modalities (e.g., AUC score of 0.79 for schizophrenia and 0.73 for cognitive impairment in Alzheimer's disease). CONCLUSION: We demonstrated that DeepGAMI improves phenotype prediction and prioritizes phenotypic features and networks in multiple multimodal datasets in complex brains and brain diseases.
Also, it prioritized disease-associated variants, genes, and regulatory networks linked to different phenotypes, providing novel insights into the interpretation of gene regulatory mechanisms. DeepGAMI is open-source and available for general use.


Subject(s)
Alzheimer Disease , Machine Learning , Animals , Mice , Neural Networks, Computer , Genotype , Phenotype , Alzheimer Disease/genetics
20.
J Autism Dev Disord ; 52(8): 3712-3717, 2022 Aug.
Article in English | MEDLINE | ID: mdl-34318432

ABSTRACT

Little is known about the financial well-being of families raising children with autism spectrum disorders (ASD). Family financial well-being has important impacts on the development of children with ASD. The study uses a 2019 survey collected from Chinese families raising a child with ASD (N = 3064) to examine their financial well-being and its association with health expenditures for children. Analyses adjust for extensive control variables (i.e., demographic and socioeconomic characteristics of children, respondents, and their families). Findings suggest that the amount of health expenditures is negatively associated with respondents' perception of their financial status. The significance of health expenditures disappears after adjusting for household material hardship. Health expenditures affect financial well-being mainly by competing for resources with other family needs.


Subject(s)
Autism Spectrum Disorder , Autistic Disorder , Child , China , Cost of Illness , Health Expenditures , Humans