Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 20 de 101
Filtrar
Mais filtros

Base de dados
País/Região como assunto
Tipo de documento
Intervalo de ano de publicação
1.
Cell ; 175(7): 1827-1841.e17, 2018 12 13.
Artigo em Inglês | MEDLINE | ID: mdl-30550786

RESUMO

Newborn mice emit signals that promote parenting from mothers and fathers but trigger aggressive responses from virgin males. Although pup-directed attacks by males require vomeronasal function, the specific infant cues that elicit this behavior are unknown. We developed a behavioral paradigm based on reconstituted pup cues and showed that discrete infant morphological features combined with salivary chemosignals elicit robust male aggression. Seven vomeronasal receptors were identified based on infant-mediated activity, and the involvement of two receptors, Vmn2r65 and Vmn2r88, in infant-directed aggression was demonstrated by genetic deletion. Using the activation of these receptors as readouts for biochemical fractionation, we isolated two pheromonal compounds, the submandibular gland protein C and hemoglobins. Unexpectedly, none of the identified vomeronasal receptors and associated cues were specific to pups. Thus, infant-mediated aggression by virgin males relies on the recognition of pup's physical traits in addition to parental and infant chemical cues.


Assuntos
Agressão , Órgão Vomeronasal/metabolismo , Animais , Animais Recém-Nascidos , Deleção de Genes , Masculino , Camundongos , Camundongos Mutantes
2.
Nat Methods ; 2024 Jun 14.
Artigo em Inglês | MEDLINE | ID: mdl-38877315

RESUMO

The growth of omic data presents evolving challenges in data manipulation, analysis and integration. Addressing these challenges, Bioconductor provides an extensive community-driven biological data analysis platform. Meanwhile, tidy R programming offers a revolutionary data organization and manipulation standard. Here we present the tidyomics software ecosystem, bridging Bioconductor to the tidy R paradigm. This ecosystem aims to streamline omic analysis, ease learning and encourage cross-disciplinary collaborations. We demonstrate the effectiveness of tidyomics by analyzing 7.5 million peripheral blood mononuclear cells from the Human Cell Atlas, spanning six data frameworks and ten analysis tools.

3.
Genome Res ; 33(8): 1258-1268, 2023 08.
Artigo em Inglês | MEDLINE | ID: mdl-37699658

RESUMO

Three-dimensional (3D) chromatin structure has been shown to play a role in regulating gene transcription during biological transitions. Although our understanding of loop formation and maintenance is rapidly improving, much less is known about the mechanisms driving changes in looping and the impact of differential looping on gene transcription. One limitation has been a lack of well-powered differential looping data sets. To address this, we conducted a deeply sequenced Hi-C time course of megakaryocyte development comprising four biological replicates and 6 billion reads per time point. Statistical analysis revealed 1503 differential loops. Gained loop anchors were enriched for AP-1 occupancy and were characterized by large increases in histone H3K27ac (over 11-fold) but relatively small increases in CTCF and RAD21 binding (1.26- and 1.23-fold, respectively). Linear modeling revealed that changes in histone H3K27ac, chromatin accessibility, and JUN binding were better correlated with changes in looping than RAD21 and almost as well correlated as CTCF. Changes to epigenetic features between-rather than at-boundaries were highly predictive of changes in looping. Together these data suggest that although CTCF and RAD21 may be the core machinery dictating where loops form, other features (both at the anchors and within the loop boundaries) may play a larger role than previously anticipated in determining the relative loop strength across cell types and conditions.


Assuntos
Cromatina , Histonas , Histonas/metabolismo , Fator de Ligação a CCCTC/genética , Fator de Ligação a CCCTC/metabolismo , Cromatina/genética , Cromossomos/metabolismo , Diferenciação Celular/genética
4.
Nat Methods ; 20(8): 1187-1195, 2023 08.
Artigo em Inglês | MEDLINE | ID: mdl-37308696

RESUMO

Most approaches to transcript quantification rely on fixed reference annotations; however, the transcriptome is dynamic and depending on the context, such static annotations contain inactive isoforms for some genes, whereas they are incomplete for others. Here we present Bambu, a method that performs machine-learning-based transcript discovery to enable quantification specific to the context of interest using long-read RNA-sequencing. To identify novel transcripts, Bambu estimates the novel discovery rate, which replaces arbitrary per-sample thresholds with a single, interpretable, precision-calibrated parameter. Bambu retains the full-length and unique read counts, enabling accurate quantification in presence of inactive isoforms. Compared to existing methods for transcript discovery, Bambu achieves greater precision without sacrificing sensitivity. We show that context-aware annotations improve quantification for both novel and known transcripts. We apply Bambu to quantify isoforms from repetitive HERVH-LTR7 retrotransposons in human embryonic stem cells, demonstrating the ability for context-specific transcript expression analysis.


Assuntos
Perfilação da Expressão Gênica , Transcriptoma , Humanos , RNA-Seq , Perfilação da Expressão Gênica/métodos , Análise de Sequência de RNA/métodos , Isoformas de Proteínas/genética
5.
PLoS Genet ; 19(5): e1010517, 2023 05.
Artigo em Inglês | MEDLINE | ID: mdl-37216410

RESUMO

Integrative approaches that simultaneously model multi-omics data have gained increasing popularity because they provide holistic system biology views of multiple or all components in a biological system of interest. Canonical correlation analysis (CCA) is a correlation-based integrative method designed to extract latent features shared between multiple assays by finding the linear combinations of features-referred to as canonical variables (CVs)-within each assay that achieve maximal across-assay correlation. Although widely acknowledged as a powerful approach for multi-omics data, CCA has not been systematically applied to multi-omics data in large cohort studies, which has only recently become available. Here, we adapted sparse multiple CCA (SMCCA), a widely-used derivative of CCA, to proteomics and methylomics data from the Multi-Ethnic Study of Atherosclerosis (MESA) and Jackson Heart Study (JHS). To tackle challenges encountered when applying SMCCA to MESA and JHS, our adaptations include the incorporation of the Gram-Schmidt (GS) algorithm with SMCCA to improve orthogonality among CVs, and the development of Sparse Supervised Multiple CCA (SSMCCA) to allow supervised integration analysis for more than two assays. Effective application of SMCCA to the two real datasets reveals important findings. Applying our SMCCA-GS to MESA and JHS, we identified strong associations between blood cell counts and protein abundance, suggesting that adjustment of blood cell composition should be considered in protein-based association studies. Importantly, CVs obtained from two independent cohorts also demonstrate transferability across the cohorts. For example, proteomic CVs learned from JHS, when transferred to MESA, explain similar amounts of blood cell count phenotypic variance in MESA, explaining 39.0% ~ 50.0% variation in JHS and 38.9% ~ 49.1% in MESA. Similar transferability was observed for other omics-CV-trait pairs. This suggests that biologically meaningful and cohort-agnostic variation is captured by CVs. We anticipate that applying our SMCCA-GS and SSMCCA on various cohorts would help identify cohort-agnostic biologically meaningful relationships between multi-omics data and phenotypic traits.


Assuntos
Análise de Correlação Canônica , Proteômica , Humanos , Proteômica/métodos , Multiômica , Estudos de Coortes
6.
Hum Mol Genet ; 32(3): 402-416, 2023 01 13.
Artigo em Inglês | MEDLINE | ID: mdl-35994039

RESUMO

Genomic imprinting results in gene expression bias caused by parental chromosome of origin and occurs in genes with important roles during human brain development. However, the cell-type and temporal specificity of imprinting during human neurogenesis is generally unknown. By detecting within-donor allelic biases in chromatin accessibility and gene expression that are unrelated to cross-donor genotype, we inferred imprinting in both primary human neural progenitor cells and their differentiated neuronal progeny from up to 85 donors. We identified 43/20 putatively imprinted regulatory elements (IREs) in neurons/progenitors, and 133/79 putatively imprinted genes in neurons/progenitors. Although 10 IREs and 42 genes were shared between neurons and progenitors, most putative imprinting was only detected within specific cell types. In addition to well-known imprinted genes and their promoters, we inferred novel putative IREs and imprinted genes. Consistent with both DNA methylation-based and H3K27me3-based regulation of imprinted expression, some putative IREs also overlapped with differentially methylated or histone-marked regions. Finally, we identified a progenitor-specific putatively imprinted gene overlapping with copy number variation that is associated with uniparental disomy-like phenotypes. Our results can therefore be useful in interpreting the function of variants identified in future parent-of-origin association studies.


Assuntos
Variações do Número de Cópias de DNA , Metilação de DNA , Humanos , Metilação de DNA/genética , Impressão Genômica/genética , Dissomia Uniparental , Diferenciação Celular/genética
7.
Brief Bioinform ; 24(5)2023 09 20.
Artigo em Inglês | MEDLINE | ID: mdl-37738402

RESUMO

Understanding the function of the human microbiome is important but the development of statistical methods specifically for the microbial gene expression (i.e. metatranscriptomics) is in its infancy. Many currently employed differential expression analysis methods have been designed for different data types and have not been evaluated in metatranscriptomics settings. To address this gap, we undertook a comprehensive evaluation and benchmarking of 10 differential analysis methods for metatranscriptomics data. We used a combination of real and simulated data to evaluate performance (i.e. type I error, false discovery rate and sensitivity) of the following methods: log-normal (LN), logistic-beta (LB), MAST, DESeq2, metagenomeSeq, ANCOM-BC, LEfSe, ALDEx2, Kruskal-Wallis and two-part Kruskal-Wallis. The simulation was informed by supragingival biofilm microbiome data from 300 preschool-age children enrolled in a study of childhood dental disease (early childhood caries, ECC), whereas validations were sought in two additional datasets from the ECC study and an inflammatory bowel disease study. The LB test showed the highest sensitivity in both small and large samples and reasonably controlled type I error. Contrarily, MAST was hampered by inflated type I error. Upon application of the LN and LB tests in the ECC study, we found that genes C8PHV7 and C8PEV7, harbored by the lactate-producing Campylobacter gracilis, had the strongest association with childhood dental disease. This comprehensive model evaluation offers practical guidance for selection of appropriate methods for rigorous analyses of differential expression in metatranscriptomics. Selection of an optimal method increases the possibility of detecting true signals while minimizing the chance of claiming false ones.


Assuntos
Benchmarking , Doenças Estomatognáticas , Criança , Humanos , Pré-Escolar , Biofilmes , Simulação por Computador , Ácido Láctico
8.
Mol Cell ; 67(6): 1037-1048.e6, 2017 Sep 21.
Artigo em Inglês | MEDLINE | ID: mdl-28890333

RESUMO

The three-dimensional arrangement of the human genome comprises a complex network of structural and regulatory chromatin loops important for coordinating changes in transcription during human development. To better understand the mechanisms underlying context-specific 3D chromatin structure and transcription during cellular differentiation, we generated comprehensive in situ Hi-C maps of DNA loops in human monocytes and differentiated macrophages. We demonstrate that dynamic looping events are regulatory rather than structural in nature and uncover widespread coordination of dynamic enhancer activity at preformed and acquired DNA loops. Enhancer-bound loop formation and enhancer activation of preformed loops together form multi-loop activation hubs at key macrophage genes. Activation hubs connect 3.4 enhancers per promoter and exhibit a strong enrichment for activator protein 1 (AP-1)-binding events, suggesting that multi-loop activation hubs involving cell-type-specific transcription factors represent an important class of regulatory chromatin structures for the spatiotemporal control of transcription.


Assuntos
Diferenciação Celular , Montagem e Desmontagem da Cromatina , Cromatina/metabolismo , DNA/metabolismo , Macrófagos/metabolismo , Fator de Transcrição AP-1/metabolismo , Transcrição Gênica , Sítios de Ligação , Linhagem Celular Tumoral , Cromatina/química , Cromatina/genética , DNA/química , DNA/genética , Elementos Facilitadores Genéticos , Regulação da Expressão Gênica , Sequenciamento de Nucleotídeos em Larga Escala , Humanos , Conformação de Ácido Nucleico , Fenótipo , Ligação Proteica , Fatores de Tempo , Fator de Transcrição AP-1/genética
9.
Am J Hum Genet ; 108(9): 1647-1668, 2021 09 02.
Artigo em Inglês | MEDLINE | ID: mdl-34416157

RESUMO

Interpretation of the function of non-coding risk loci for neuropsychiatric disorders and brain-relevant traits via gene expression and alternative splicing quantitative trait locus (e/sQTL) analyses is generally performed in bulk post-mortem adult tissue. However, genetic risk loci are enriched in regulatory elements active during neocortical differentiation, and regulatory effects of risk variants may be masked by heterogeneity in bulk tissue. Here, we map e/sQTLs, and allele-specific expression in cultured cells representing two major developmental stages, primary human neural progenitors (n = 85) and their sorted neuronal progeny (n = 74), identifying numerous loci not detected in either bulk developing cortical wall or adult cortex. Using colocalization and genetic imputation via transcriptome-wide association, we uncover cell-type-specific regulatory mechanisms underlying risk for brain-relevant traits that are active during neocortical differentiation. Specifically, we identified a progenitor-specific eQTL for CENPW co-localized with common variant associations for cortical surface area and educational attainment.


Assuntos
Proteínas Cromossômicas não Histona/genética , Regulação da Expressão Gênica no Desenvolvimento , Neocórtex/metabolismo , Neurogênese/genética , Neurônios/metabolismo , Locos de Características Quantitativas , Alelos , Doença de Alzheimer/diagnóstico , Doença de Alzheimer/genética , Doença de Alzheimer/metabolismo , Diferenciação Celular , Cromatina/química , Cromatina/metabolismo , Proteínas Cromossômicas não Histona/metabolismo , Mapeamento Cromossômico , Escolaridade , Feminino , Feto , Predisposição Genética para Doença , Genoma Humano , Estudo de Associação Genômica Ampla , Humanos , Masculino , Neocórtex/citologia , Neocórtex/crescimento & desenvolvimento , Células-Tronco Neurais/citologia , Células-Tronco Neurais/metabolismo , Neurônios/citologia , Neuroticismo , Doença de Parkinson/diagnóstico , Doença de Parkinson/genética , Doença de Parkinson/metabolismo , Cultura Primária de Células , Prognóstico , Esquizofrenia/diagnóstico , Esquizofrenia/genética , Esquizofrenia/metabolismo , Transcriptoma
10.
Biostatistics ; 24(2): 388-405, 2023 04 14.
Artigo em Inglês | MEDLINE | ID: mdl-33948626

RESUMO

The relative proportion of RNA isoforms expressed for a given gene has been associated with disease states in cancer, retinal diseases, and neurological disorders. Examination of relative isoform proportions can help determine biological mechanisms, but such analyses often require a per-gene investigation of splicing patterns. Leveraging large public data sets produced by genomic consortia as a reference, one can compare splicing patterns in a data set of interest with those of a reference panel in which samples are divided into distinct groups, such as tissue of origin, or disease status. We propose A latent Dirichlet model to Compare expressed isoform proportions TO a Reference panel (ACTOR), a latent Dirichlet model with Dirichlet Multinomial observations to compare expressed isoform proportions in a data set to an independent reference panel. We use a variational Bayes procedure to estimate posterior distributions for the group membership of one or more samples. Using the Genotype-Tissue Expression project as a reference data set, we evaluate ACTOR on simulated and real RNA-seq data sets to determine tissue-type classifications of genes. ACTOR is publicly available as an R package at https://github.com/mccabes292/actor.


Assuntos
Teorema de Bayes , Humanos , Isoformas de Proteínas/genética , Isoformas de Proteínas/análise , Isoformas de Proteínas/metabolismo , Análise de Sequência de RNA/métodos
11.
Bioinformatics ; 39(5)2023 05 04.
Artigo em Inglês | MEDLINE | ID: mdl-37042725

RESUMO

MOTIVATION: Enrichment analysis is a widely utilized technique in genomic analysis that aims to determine if there is a statistically significant association between two sets of genomic features. To conduct this type of hypothesis testing, an appropriate null model is typically required. However, the null distribution that is commonly used can be overly simplistic and may result in inaccurate conclusions. RESULTS: bootRanges provides fast functions for generation of block bootstrapped genomic ranges representing the null hypothesis in enrichment analysis. As part of a modular workflow, bootRanges offers greater flexibility for computing various test statistics leveraging other Bioconductor packages. We show that shuffling or permutation schemes may result in overly narrow test statistic null distributions and over-estimation of statistical significance, while creating new range sets with a block bootstrap preserves local genomic correlation structure and generates more reliable null distributions. It can also be used in more complex analyses, such as accessing correlations between cis-regulatory elements (CREs) and genes across cell types or providing optimized thresholds, e.g. log fold change (logFC) from differential analysis. AVAILABILITY AND IMPLEMENTATION: bootRanges is freely available in the R/Bioconductor package nullranges hosted at https://bioconductor.org/packages/nullranges.


Assuntos
Genoma , Genômica , Genômica/métodos , Software
12.
Bioinformatics ; 39(5)2023 05 04.
Artigo em Inglês | MEDLINE | ID: mdl-37084270

RESUMO

MOTIVATION: Deriving biological insights from genomic data commonly requires comparing attributes of selected genomic loci to a null set of loci. The selection of this null set is non-trivial, as it requires careful consideration of potential covariates, a problem that is exacerbated by the non-uniform distribution of genomic features including genes, enhancers, and transcription factor binding sites. Propensity score-based covariate matching methods allow the selection of null sets from a pool of possible items while controlling for multiple covariates; however, existing packages do not operate on genomic data classes and can be slow for large data sets making them difficult to integrate into genomic workflows. RESULTS: To address this, we developed matchRanges, a propensity score-based covariate matching method for the efficient and convenient generation of matched null ranges from a set of background ranges within the Bioconductor framework. AVAILABILITY AND IMPLEMENTATION: Package: https://bioconductor.org/packages/nullranges, Code: https://github.com/nullranges, Documentation: https://nullranges.github.io/nullranges.


Assuntos
Genômica , Software , Genômica/métodos , Genoma , Sequências Reguladoras de Ácido Nucleico , Projetos de Pesquisa
13.
Bioinformatics ; 39(4)2023 04 03.
Artigo em Inglês | MEDLINE | ID: mdl-37067481

RESUMO

SUMMARY: Exclusion regions are sections of reference genomes with abnormal pileups of short sequencing reads. Removing reads overlapping them improves biological signal, and these benefits are most pronounced in differential analysis settings. Several labs created exclusion region sets, available primarily through ENCODE and Github. However, the variety of exclusion sets creates uncertainty which sets to use. Furthermore, gap regions (e.g. centromeres, telomeres, short arms) create additional considerations in generating exclusion sets. We generated exclusion sets for the latest human T2T-CHM13 and mouse GRCm39 genomes and systematically assembled and annotated these and other sets in the excluderanges R/Bioconductor data package, also accessible via the BEDbase.org API. The package provides unified access to 82 GenomicRanges objects covering six organisms, multiple genome assemblies, and types of exclusion regions. For human hg38 genome assembly, we recommend hg38.Kundaje.GRCh38_unified_blacklist as the most well-curated and annotated, and sets generated by the Blacklist tool for other organisms. AVAILABILITY AND IMPLEMENTATION: https://bioconductor.org/packages/excluderanges/. Package website: https://dozmorovlab.github.io/excluderanges/.


Assuntos
Genoma Humano , Software , Animais , Humanos , Camundongos , Incerteza
14.
PLoS Genet ; 17(3): e1009398, 2021 03.
Artigo em Inglês | MEDLINE | ID: mdl-33684137

RESUMO

Traditional predictive models for transcriptome-wide association studies (TWAS) consider only single nucleotide polymorphisms (SNPs) local to genes of interest and perform parameter shrinkage with a regularization process. These approaches ignore the effect of distal-SNPs or other molecular effects underlying the SNP-gene association. Here, we outline multi-omics strategies for transcriptome imputation from germline genetics to allow more powerful testing of gene-trait associations by prioritizing distal-SNPs to the gene of interest. In one extension, we identify mediating biomarkers (CpG sites, microRNAs, and transcription factors) highly associated with gene expression and train predictive models for these mediators using their local SNPs. Imputed values for mediators are then incorporated into the final predictive model of gene expression, along with local SNPs. In the second extension, we assess distal-eQTLs (SNPs associated with genes not in a local window around it) for their mediation effect through mediating biomarkers local to these distal-eSNPs. Distal-eSNPs with large indirect mediation effects are then included in the transcriptomic prediction model with the local SNPs around the gene of interest. Using simulations and real data from ROS/MAP brain tissue and TCGA breast tumors, we show considerable gains of percent variance explained (1-2% additive increase) of gene expression and TWAS power to detect gene-trait associations. This integrative approach to transcriptome-wide imputation and association studies aids in identifying the complex interactions underlying genetic regulation within a tissue and important risk genes for various traits and disorders.


Assuntos
Biologia Computacional/métodos , Estudo de Associação Genômica Ampla/métodos , Software , Perfilação da Expressão Gênica/métodos , Humanos , Modelos Genéticos , Especificidade de Órgãos/genética , Fenótipo , Polimorfismo de Nucleotídeo Único , Locos de Características Quantitativas , Reprodutibilidade dos Testes
15.
PLoS Genet ; 17(10): e1009865, 2021 10.
Artigo em Inglês | MEDLINE | ID: mdl-34699533

RESUMO

Chromatin accessibility and gene expression in relevant cell contexts can guide identification of regulatory elements and mechanisms at genome-wide association study (GWAS) loci. To identify regulatory elements that display differential activity across adipocyte differentiation, we performed ATAC-seq and RNA-seq in a human cell model of preadipocytes and adipocytes at days 4 and 14 of differentiation. For comparison, we created a consensus map of ATAC-seq peaks in 11 human subcutaneous adipose tissue samples. We identified 58,387 context-dependent chromatin accessibility peaks and 3,090 context-dependent genes between all timepoint comparisons (log2 fold change>1, FDR<5%) with 15,919 adipocyte- and 18,244 preadipocyte-dependent peaks. Adipocyte-dependent peaks showed increased overlap (60.1%) with Roadmap Epigenomics adipocyte nuclei enhancers compared to preadipocyte-dependent peaks (11.5%). We linked context-dependent peaks to genes based on adipocyte promoter capture Hi-C data, overlap with adipose eQTL variants, and context-dependent gene expression. Of 16,167 context-dependent peaks linked to a gene, 5,145 were linked by two or more strategies to 1,670 genes. Among GWAS loci for cardiometabolic traits, adipocyte-dependent peaks, but not preadipocyte-dependent peaks, showed significant enrichment (LD score regression P<0.005) for waist-to-hip ratio and modest enrichment (P < 0.05) for HDL-cholesterol. We identified 659 peaks linked to 503 genes by two or more approaches and overlapping a GWAS signal, suggesting a regulatory mechanism at these loci. To identify variants that may alter chromatin accessibility between timepoints, we identified 582 variants in 454 context-dependent peaks that demonstrated allelic imbalance in accessibility (FDR<5%), of which 55 peaks also overlapped GWAS variants. At one GWAS locus for palmitoleic acid, rs603424 was located in an adipocyte-dependent peak linked to SCD and exhibited allelic differences in transcriptional activity in adipocytes (P = 0.003) but not preadipocytes (P = 0.09). These results demonstrate that context-dependent peaks and genes can guide discovery of regulatory variants at GWAS loci and aid identification of regulatory mechanisms.


Assuntos
Diferenciação Celular/genética , Cromatina/genética , Expressão Gênica/genética , Locos de Características Quantitativas/genética , Adipócitos/metabolismo , Tecido Adiposo/metabolismo , Alelos , Desequilíbrio Alélico/genética , Sítios de Ligação/genética , Doenças Cardiovasculares/genética , Doenças Cardiovasculares/metabolismo , Cromatina/metabolismo , Sequenciamento de Cromatina por Imunoprecipitação/métodos , Epigenômica/métodos , Técnicas Genéticas , Estudo de Associação Genômica Ampla/métodos , Humanos , Doenças Metabólicas/genética , Doenças Metabólicas/metabolismo , Regiões Promotoras Genéticas/genética , Sequências Reguladoras de Ácido Nucleico/genética
16.
PLoS Genet ; 17(4): e1009455, 2021 04.
Artigo em Inglês | MEDLINE | ID: mdl-33872308

RESUMO

Expression quantitative trait loci (eQTL) studies are used to understand the regulatory function of non-coding genome-wide association study (GWAS) risk loci, but colocalization alone does not demonstrate a causal relationship of gene expression affecting a trait. Evidence for mediation, that perturbation of gene expression in a given tissue or developmental context will induce a change in the downstream GWAS trait, can be provided by two-sample Mendelian Randomization (MR). Here, we introduce a new statistical method, MRLocus, for Bayesian estimation of the gene-to-trait effect from eQTL and GWAS summary data for loci with evidence of allelic heterogeneity, that is, containing multiple causal variants. MRLocus makes use of a colocalization step applied to each nearly-LD-independent eQTL, followed by an MR analysis step across eQTLs. Additionally, our method involves estimation of the extent of allelic heterogeneity through a dispersion parameter, indicating variable mediation effects from each individual eQTL on the downstream trait. Our method is evaluated against other state-of-the-art methods for estimation of the gene-to-trait mediation effect, using an existing simulation framework. In simulation, MRLocus often has the highest accuracy among competing methods, and in each case provides more accurate estimation of uncertainty as assessed through interval coverage. MRLocus is then applied to five candidate causal genes for mediation of particular GWAS traits, where gene-to-trait effects are concordant with those previously reported. We find that MRLocus's estimation of the causal effect across eQTLs within a locus provides useful information for determining how perturbation of gene expression or individual regulatory elements will affect downstream traits. The MRLocus method is implemented as an R package available at https://mikelove.github.io/mrlocus.


Assuntos
Perfilação da Expressão Gênica/estatística & dados numéricos , Predisposição Genética para Doença , Estudo de Associação Genômica Ampla/estatística & dados numéricos , Locos de Características Quantitativas/genética , Simulação por Computador , Regulação da Expressão Gênica/genética , Humanos , Desequilíbrio de Ligação , Análise da Randomização Mendeliana , Modelos Genéticos , Transcriptoma/genética
17.
Brief Bioinform ; 22(3)2021 05 20.
Artigo em Inglês | MEDLINE | ID: mdl-32789507

RESUMO

The NanoString RNA counting assay for formalin-fixed paraffin embedded samples is unique in its sensitivity, technical reproducibility and robustness for analysis of clinical and archival samples. While commercial normalization methods are provided by NanoString, they are not optimal for all settings, particularly when samples exhibit strong technical or biological variation or where housekeeping genes have variable performance across the cohort. Here, we develop and evaluate a more comprehensive normalization procedure for NanoString data with steps for quality control, selection of housekeeping targets, normalization and iterative data visualization and biological validation. The approach was evaluated using a large cohort ($N=\kern0.5em 1649$) from the Carolina Breast Cancer Study, two cohorts of moderate sample size ($N=359$ and$130$) and a small published dataset ($N=12$). The iterative process developed here eliminates technical variation (e.g. from different study phases or sites) more reliably than the three other methods, including NanoString's commercial package, without diminishing biological variation, especially in long-term longitudinal multiphase or multisite cohorts. We also find that probe sets validated for nCounter, such as the PAM50 gene signature, are impervious to batch issues. This work emphasizes that systematic quality control, normalization and visualization of NanoString nCounter data are an imperative component of study design that influences results in downstream analyses.


Assuntos
Neoplasias da Mama , Bases de Dados de Ácidos Nucleicos , Perfilação da Expressão Gênica , Regulação Neoplásica da Expressão Gênica , RNA Neoplásico , RNA , Neoplasias da Mama/genética , Neoplasias da Mama/metabolismo , Feminino , Humanos , RNA/biossíntese , RNA/genética , RNA Neoplásico/biossíntese , RNA Neoplásico/genética
18.
Bioinformatics ; 38(10): 2773-2780, 2022 05 13.
Artigo em Inglês | MEDLINE | ID: mdl-35561168

RESUMO

MOTIVATION: Allelic expression analysis aids in detection of cis-regulatory mechanisms of genetic variation, which produce allelic imbalance (AI) in heterozygotes. Measuring AI in bulk data lacking time or spatial resolution has the limitation that cell-type-specific (CTS), spatial- or time-dependent AI signals may be dampened or not detected. RESULTS: We introduce a statistical method airpart for identifying differential CTS AI from single-cell RNA-sequencing data, or dynamics AI from other spatially or time-resolved datasets. airpart outputs discrete partitions of data, pointing to groups of genes and cells under common mechanisms of cis-genetic regulation. In order to account for low counts in single-cell data, our method uses a Generalized Fused Lasso with Binomial likelihood for partitioning groups of cells by AI signal, and a hierarchical Bayesian model for AI statistical inference. In simulation, airpart accurately detected partitions of cell types by their AI and had lower Root Mean Square Error (RMSE) of allelic ratio estimates than existing methods. In real data, airpart identified differential allelic imbalance patterns across cell states and could be used to define trends of AI signal over spatial or time axes. AVAILABILITY AND IMPLEMENTATION: The airpart package is available as an R/Bioconductor package at https://bioconductor.org/packages/airpart. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Assuntos
Desequilíbrio Alélico , Modelos Estatísticos , Alelos , Teorema de Bayes , Simulação por Computador , Software
19.
Bioinformatics ; 38(7): 2042-2045, 2022 03 28.
Artigo em Inglês | MEDLINE | ID: mdl-35134826

RESUMO

MOTIVATION: The R programming language is one of the most widely used programming languages for transforming raw genomic datasets into meaningful biological conclusions through analysis and visualization, which has been largely facilitated by infrastructure and tools developed by the Bioconductor project. However, existing plotting packages rely on relative positioning and sizing of plots, which is often sufficient for exploratory analysis but is poorly suited for the creation of publication-quality multi-panel images inherent to scientific manuscript preparation. RESULTS: We present plotgardener, a coordinate-based genomic data visualization package that offers a new paradigm for multi-plot figure generation in R. Plotgardener allows precise, programmatic control over the placement, esthetics and arrangements of plots while maximizing user experience through fast and memory-efficient data access, support for a wide variety of data and file types, and tight integration with the Bioconductor environment. Plotgardener also allows precise placement and sizing of ggplot2 plots, making it an invaluable tool for R users and data scientists from virtually any discipline. AVAILABILITY AND IMPLEMENTATION: Package: https://bioconductor.org/packages/plotgardener, Code: https://github.com/PhanstielLab/plotgardener, Documentation: https://phanstiellab.github.io/plotgardener/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Assuntos
Linguagens de Programação , Software , Genômica , Genoma , Visualização de Dados
20.
Nucleic Acids Res ; 49(8): e48, 2021 05 07.
Artigo em Inglês | MEDLINE | ID: mdl-33524140

RESUMO

Targeted mRNA expression panels, measuring up to 800 genes, are used in academic and clinical settings due to low cost and high sensitivity for archived samples. Most samples assayed on targeted panels originate from bulk tissue comprised of many cell types, and cell-type heterogeneity confounds biological signals. Reference-free methods are used when cell-type-specific expression references are unavailable, but limited feature spaces render implementation challenging in targeted panels. Here, we present DeCompress, a semi-reference-free deconvolution method for targeted panels. DeCompress leverages a reference RNA-seq or microarray dataset from similar tissue to expand the feature space of targeted panels using compressed sensing. Ensemble reference-free deconvolution is performed on this artificially expanded dataset to estimate cell-type proportions and gene signatures. In simulated mixtures, four public cell line mixtures, and a targeted panel (1199 samples; 406 genes) from the Carolina Breast Cancer Study, DeCompress recapitulates cell-type proportions with less error than reference-free methods and finds biologically relevant compartments. We integrate compartment estimates into cis-eQTL mapping in breast cancer, identifying a tumor-specific cis-eQTL for CCR3 (C-C Motif Chemokine Receptor 3) at a risk locus. DeCompress improves upon reference-free methods without requiring expression profiles from pure cell populations, with applications in genomic analyses and clinical settings.


Assuntos
Benchmarking/métodos , Biologia Computacional/métodos , Perfilação da Expressão Gênica/métodos , Neoplasias/genética , RNA Mensageiro/metabolismo , Algoritmos , Animais , Neoplasias da Mama/genética , Neoplasias da Mama/metabolismo , Bases de Dados Genéticas , Feminino , Genômica , Humanos , Neoplasias Pulmonares/genética , Neoplasias Pulmonares/metabolismo , Masculino , Neoplasias/metabolismo , Neoplasias da Próstata/genética , Neoplasias da Próstata/metabolismo , Locos de Características Quantitativas , RNA Mensageiro/genética , RNA-Seq , Receptores CCR3/genética , Receptores CCR3/metabolismo , Análise de Célula Única
SELEÇÃO DE REFERÊNCIAS
Detalhe da pesquisa