Results 1 - 20 of 40
1.
bioRxiv ; 2024 Mar 29.
Article in English | MEDLINE | ID: mdl-37066352

ABSTRACT

Knowledge of locations and activities of cis-regulatory elements (CREs) is needed to decipher basic mechanisms of gene regulation and to understand the impact of genetic variants on complex traits. Previous studies identified candidate CREs (cCREs) using epigenetic features in one species, making comparisons difficult between species. In contrast, we conducted an interspecies study defining epigenetic states and identifying cCREs in blood cell types to generate regulatory maps that are comparable between species, using integrative modeling of eight epigenetic features jointly in human and mouse in our ValIdated Systematic IntegratiON (VISION) Project. The resulting catalogs of cCREs are useful resources for further studies of gene regulation in blood cells, indicated by high overlap with known functional elements and strong enrichment for human genetic variants associated with blood cell phenotypes. The contribution of each epigenetic state in cCREs to gene regulation, inferred from a multivariate regression, was used to estimate epigenetic state Regulatory Potential (esRP) scores for each cCRE in each cell type, which were used to categorize dynamic changes in cCREs. Groups of cCREs displaying similar patterns of regulatory activity in human and mouse cell types, obtained by joint clustering on esRP scores, harbored distinctive transcription factor binding motifs that were similar between species. An interspecies comparison of cCREs revealed both conserved and species-specific patterns of epigenetic evolution. Finally, we showed that comparisons of the epigenetic landscape between species can reveal elements with similar roles in regulation, even in the absence of genomic sequence alignment.

2.
bioRxiv ; 2023 Jun 09.
Article in English | MEDLINE | ID: mdl-37333291

ABSTRACT

Spatial transcriptomics (ST) profiles gene expression in intact tissues. However, ST data measured at each spatial location may represent gene expression of multiple cell types, making it difficult to identify cell-type-specific transcriptional variation across spatial contexts. Existing cell-type deconvolutions of ST data often require single-cell transcriptomic references, which can be limited by availability, completeness and platform effect of such references. We present RETROFIT, a reference-free Bayesian method that produces sparse and interpretable solutions to deconvolve cell types underlying each location independent of single-cell transcriptomic references. Results from synthetic and real ST datasets acquired by Slide-seq and Visium platforms demonstrate that RETROFIT outperforms existing reference-based and reference-free methods in estimating cell-type composition and reconstructing gene expression. Applying RETROFIT to human intestinal development ST data reveals spatiotemporal patterns of cellular composition and transcriptional specificity. RETROFIT is available at https://bioconductor.org/packages/release/bioc/html/retrofit.html.

3.
PLoS Comput Biol ; 19(1): e1010758, 2023 01.
Article in English | MEDLINE | ID: mdl-36607897

ABSTRACT

Inferring gene co-expression networks is a useful process for understanding gene regulation and pathway activity. The networks are usually undirected graphs where genes are represented as nodes and an edge represents a significant co-expression relationship. When expression data of multiple (p) genes in multiple (K) conditions (e.g., treatments, tissues, strains) are available, joint estimation of networks harnessing shared information across them can significantly increase the power of analysis. In addition, examining condition-specific patterns of co-expression can provide insights into the underlying cellular processes activated in a particular condition. Condition adaptive fused graphical lasso (CFGL) is an existing method that incorporates condition specificity in a fused graphical lasso (FGL) model for estimating multiple co-expression networks. However, with computational complexity of O(p²K log K), the current implementation of CFGL is prohibitively slow even for a moderate number of genes and can only be used for a maximum of three conditions. In this paper, we propose a faster alternative of CFGL named rapid condition adaptive fused graphical lasso (RCFGL). In RCFGL, we incorporate the condition specificity into another popular model for joint network estimation, known as fused multiple graphical lasso (FMGL). We use a more efficient algorithm in the iterative steps compared to CFGL, enabling faster computation with complexity of O(p²K) and making it easily generalizable for more than three conditions. We also present a novel screening rule to determine if the full network estimation problem can be broken down into estimation of smaller disjoint sub-networks, thereby reducing the complexity further. We demonstrate the computational advantage and superior performance of our method compared to two non-condition adaptive methods, FGL and FMGL, and one condition adaptive method, CFGL, in both simulation study and real data analysis.
We used RCFGL to jointly estimate the gene co-expression networks in different brain regions (conditions) using a cohort of heterogeneous stock rats. We also provide an accommodating C and Python based package that implements RCFGL.


Subjects
Algorithms; Brain; Animals; Rats; Computer Simulation; Gene Regulatory Networks/genetics
4.
Biometrics ; 79(3): 2272-2285, 2023 09.
Article in English | MEDLINE | ID: mdl-36056911

ABSTRACT

High-throughput biological experiments are essential tools for identifying biologically interesting candidates in large-scale omics studies. The results of a high-throughput biological experiment rely heavily on the operational factors chosen in its experimental and data-analytic procedures. Understanding how these operational factors influence the reproducibility of the experimental outcome is critical for selecting the optimal parameter settings and designing reliable high-throughput workflows. However, the influence of an operational factor may differ between strong and weak candidates in a high-throughput experiment, complicating the selection of parameter settings. To address this issue, we propose a novel segmented regression model, called segmented correspondence curve regression, to assess the influence of operational factors on the reproducibility of high-throughput experiments. Our model dissects the heterogeneous effects of operational factors on strong and weak candidates, providing a principled way to select operational parameters. Based on this framework, we also develop a sup-likelihood ratio test for the existence of heterogeneity. Simulation studies show that our estimation and testing procedures yield well-calibrated type I errors and are substantially more powerful in detecting and locating the differences in reproducibility across workflows than the existing method. Using this model, we investigated an important design question for ChIP-seq experiments: How many reads should one sequence to obtain reliable results in a cost-effective way? Our results reveal new insights into the impact of sequencing depth on the binding-site identification reproducibility, helping biologists determine the most cost-effective sequencing depth to achieve sufficient reproducibility for their study goals.
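
Illustrative note (not part of the record): the abstract does not spell out the model, but correspondence curve regression is built on an empirical correspondence curve, commonly described as the share of candidates that fall in the top fraction t of both replicates. The sketch below computes only that curve, under this assumed definition; it is not the segmented regression or the sup-likelihood ratio test, and the function names are hypothetical.

```python
def descending_ranks(values):
    """Rank 1 = strongest candidate; ties are broken by input order (an assumption)."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


def correspondence_curve(rep1, rep2, fractions):
    """For each fraction t, return the share of candidates ranked
    within the top t*n of BOTH replicates (assumed definition)."""
    n = len(rep1)
    r1, r2 = descending_ranks(rep1), descending_ranks(rep2)
    curve = []
    for t in fractions:
        cutoff = t * n
        both_top = sum(1 for a, b in zip(r1, r2) if a <= cutoff and b <= cutoff)
        curve.append(both_top / n)
    return curve
```

For perfectly reproducible replicates the curve follows the diagonal (value t at fraction t); values below it at small t indicate disagreement among the strongest candidates.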


Subjects
High-Throughput Nucleotide Sequencing; Reproducibility of Results; Computer Simulation; High-Throughput Nucleotide Sequencing/methods
5.
Nat Commun ; 13(1): 6874, 2022 11 12.
Article in English | MEDLINE | ID: mdl-36371401

ABSTRACT

Joint analyses of genomic datasets obtained in multiple different conditions are essential for understanding the biological mechanism that drives tissue-specificity and cell differentiation, but they still remain computationally challenging. To address this we introduce CLIMB (Composite LIkelihood eMpirical Bayes), a statistical methodology that learns patterns of condition-specificity present in genomic data. CLIMB provides a generic framework facilitating a host of analyses, such as clustering genomic features sharing similar condition-specific patterns and identifying which of these features are involved in cell fate commitment. We apply CLIMB to three sets of hematopoietic data, which examine CTCF ChIP-seq measured in 17 different cell populations, RNA-seq measured across constituent cell populations in three committed lineages, and DNase-seq in 38 cell populations. Our results show that CLIMB improves upon existing alternatives in statistical precision, while capturing interpretable and biologically relevant clusters in the data.


Subjects
Genome; Genomics; Bayes Theorem; Cluster Analysis; Sequence Analysis, DNA/methods
6.
Tohoku J Exp Med ; 258(3): 225-236, 2022 Oct 26.
Article in English | MEDLINE | ID: mdl-36047132

ABSTRACT

The therapeutic effects and mechanisms of action of total glucosides of paeony (TGP) in treating ulcerative colitis remain to be clarified. A mouse model of ulcerative colitis was treated with TGP, and indexes including scores of disease activity index, gross morphologic damage and histological damage, and inflammatory and oxidative stress markers were determined. Patients with ulcerative colitis received TGP capsule therapy, and indexes including efficacy of colonoscopy and histology, scores of Ulcerative Colitis Activity Index (UCAI) and Short Inflammatory Bowel Disease Questionnaire (SIBDQ), and inflammatory parameters were assessed. The expressions of toll-like receptor 4 (TLR4) and nuclear factor-kappa B (NF-κB) were measured in colonic tissues of mice and patients. TGP treatment significantly increased weight, decreased scores of disease activity index, gross morphologic damage and histological damage, and reduced the levels of tumor necrosis factor-α, interleukin-1β, malondialdehyde and myeloperoxidase in the mouse model. Patients treated with TGP capsule had significantly higher relief rates of diarrhea, abdominal pain, and bloody purulent stool, decreased UCAI and increased SIBDQ scores, and lower levels of erythrocyte sedimentation rate, C-reactive protein and CD4+/CD8+ T-cell ratio than patients receiving routine therapy. The overall response rate of TGP capsule was significantly higher than that of routine therapy. TGP treatment significantly suppressed the expressions of TLR4 and NF-κB in colonic tissues of both the mouse model and patients with ulcerative colitis. TGP shows a good therapeutic effect on ulcerative colitis in animals and human patients, and the underlying mechanisms may be related to the inhibition of TLR4/NF-κB signaling by TGP.


Subjects
Colitis, Ulcerative; Glucosides; Paeonia; Animals; Humans; C-Reactive Protein; Colitis, Ulcerative/drug therapy; Glucosides/pharmacology; Glucosides/therapeutic use; Interleukin-1beta; Malondialdehyde; NF-kappa B/metabolism; Paeonia/chemistry; Peroxidase/metabolism; Signal Transduction; Toll-Like Receptor 4/metabolism; Tumor Necrosis Factor-alpha/metabolism; Mice
7.
Nutrients ; 14(8)2022 Apr 08.
Article in English | MEDLINE | ID: mdl-35458125

ABSTRACT

Vitamin A (VA) deficiency and diarrheal diseases are both serious public health issues worldwide. VA deficiency is associated with impaired intestinal barrier function and increased risk of mucosal infection-related mortality. The bioactive form of VA, retinoic acid, is a well-known regulator of mucosal integrity. Using Citrobacter rodentium-infected mice as a model for diarrheal diseases in humans, previous studies showed that VA-deficient (VAD) mice failed to clear C. rodentium as compared to their VA-sufficient (VAS) counterparts. However, the distinct intestinal gene responses that are dependent on the host's VA status still need to be discovered. The mRNAs extracted from the small intestine (SI) and the colon were sequenced and analyzed on three levels: differential gene expression, enrichment, and co-expression. C. rodentium infection interacted differentially with VA status to alter colon gene expression. Novel functional categories downregulated by this pathogen were identified, highlighted by genes related to the metabolism of VA, vitamin D, and ion transport, including improper upregulation of Cl⁻ secretion and disrupted HCO₃⁻ metabolism. Our results suggest that derangement of micronutrient metabolism and ion transport, together with the compromised immune responses in VAD hosts, may be responsible for the higher mortality to C. rodentium under conditions of inadequate VA.


Subjects
Enterobacteriaceae Infections; Vitamin A Deficiency; Animals; Citrobacter rodentium; Colon/metabolism; Diarrhea/complications; Intestinal Mucosa/metabolism; Intestine, Small/metabolism; Mice; Mice, Inbred C57BL; Vitamin A/metabolism; Vitamin A Deficiency/complications
8.
Stat Med ; 41(10): 1884-1899, 2022 05 10.
Article in English | MEDLINE | ID: mdl-35178743

ABSTRACT

High-throughput experiments are an essential part of modern biological and biomedical research. The outcomes of high-throughput biological experiments often have many missing observations due to signals below detection levels. For example, most single-cell RNA-seq (scRNA-seq) protocols experience high levels of dropout due to the small amount of starting material, leading to a majority of reported expression levels being zero. Though missing data contain information about reproducibility, they are often excluded in the reproducibility assessment, potentially generating misleading assessments. In this article, we develop a regression model to assess how the reproducibility of high-throughput experiments is affected by the choices of operational factors (eg, platform or sequencing depth) when a large number of measurements are missing. Using a latent variable approach, we extend correspondence curve regression, a recently proposed method for assessing the effects of operational factors on reproducibility, to incorporate missing values. Using simulations, we show that our method is more accurate in detecting differences in reproducibility than existing measures of reproducibility. We illustrate the usefulness of our method using a single-cell RNA-seq dataset collected on HCT116 cells. We compare the reproducibility of different library preparation platforms and study the effect of sequencing depth on reproducibility, thereby determining the cost-effective sequencing depth that is required to achieve sufficient reproducibility.


Subjects
Gene Expression Profiling; Single-Cell Analysis; Gene Expression Profiling/methods; High-Throughput Nucleotide Sequencing/methods; Humans; Reproducibility of Results; Sequence Analysis, RNA; Single-Cell Analysis/methods
9.
Methods Mol Biol ; 2301: 17-37, 2022.
Article in English | MEDLINE | ID: mdl-34415529

ABSTRACT

Hi-C experiments are costly to perform and involve multiple complex experimental steps. Reproducibility of Hi-C data is essential for ensuring the validity of the scientific conclusions drawn from the data. In this chapter, we describe several recently developed computational methods for assessing reproducibility of Hi-C replicate experiments. These methods can also be used to assess the similarity between any two Hi-C samples.


Subjects
Reproducibility of Results; Software
10.
J Nutr Biochem ; 98: 108814, 2021 12.
Article in English | MEDLINE | ID: mdl-34242724

ABSTRACT

Vitamin A (VA) deficiency remains prevalent in resource-limited areas. Using Citrobacter rodentium infection in mice as a model for diarrheal diseases, previous reports showed reduced pathogen clearance and survival due to vitamin A deficient (VAD) status. To characterize the impact of preexisting VA deficiency on gene expression patterns in the intestines, and to discover novel target genes in VA-related biological pathways, VA deficiency in mice was induced by diet. Total mRNAs were extracted from small intestine (SI) and colon, and sequenced. Differentially Expressed Gene (DEG), Gene Ontology (GO) enrichment, and co-expression network analyses were performed. Comparison of VA-sufficient (VAS) and VAD groups detected 49 DEGs in SI and 94 in colon. By GO information, SI DEGs were significantly enriched in categories relevant to retinoid metabolic process, molecule binding, and immune function. Three co-expression modules showed significant correlation with VA status in SI; these modules contained four known retinoic acid targets. In addition, other SI genes of interest (e.g., Mbl2, Cxcl14, and Nr0b2) in these modules were suggested as new candidate genes regulated by VA. Furthermore, our analysis showed that markers of two cell types in SI, mast cells and Tuft cells, were significantly altered by VA status. In colon, "cell division" was the only enriched category and was negatively associated with VA. Thus, these data suggested that SI and colon have distinct networks under the regulation of dietary VA, and that preexisting VA deficiency could have a significant impact on the host response to a variety of disease conditions.


Subjects
Colon/metabolism; Intestine, Small/metabolism; RNA-Seq/methods; Vitamin A Deficiency/genetics; Animals; Citrobacter rodentium; Enterobacteriaceae Infections/genetics; Enterobacteriaceae Infections/microbiology; Gene Expression Profiling/methods; Gene Ontology; Mice; Mice, Inbred C57BL; RNA, Messenger/genetics; Transcriptome; Tretinoin/metabolism; Vitamin A/genetics; Vitamin A/metabolism
11.
Nat Commun ; 12(1): 1964, 2021 03 30.
Article in English | MEDLINE | ID: mdl-33785739

ABSTRACT

Genome-wide association meta-analysis (GWAMA) is an effective approach to enlarge sample sizes and empower the discovery of novel associations between genotype and phenotype. Independent replication has been used as a gold-standard for validating genetic associations. However, as current GWAMA often seeks to aggregate all available datasets, it becomes impossible to find a large enough independent dataset to replicate new discoveries. Here we introduce a method, MAMBA (Meta-Analysis Model-based Assessment of replicability), for assessing the "posterior-probability-of-replicability" for identified associations by leveraging the strength and consistency of association signals between contributing studies. We demonstrate using simulations that MAMBA is more powerful and robust than existing methods, and produces more accurate genetic effects estimates. We apply MAMBA to a large-scale meta-analysis of addiction phenotypes with 1.2 million individuals. In addition to accurately identifying replicable common variant associations, MAMBA also pinpoints novel replicable rare variant associations from imputation-based GWAMA and hence greatly expands the set of analyzable variants.


Subjects
Algorithms; Computational Biology/methods; Genome-Wide Association Study/methods; Meta-Analysis as Topic; Models, Genetic; Polymorphism, Single Nucleotide; Genetic Association Studies/methods; Genotype; Phenotype; Reproducibility of Results; Sample Size; Software
12.
Genome Res ; 30(3): 472-484, 2020 03.
Article in English | MEDLINE | ID: mdl-32132109

ABSTRACT

Thousands of epigenomic data sets have been generated in the past decade, but it is difficult for researchers to effectively use all the data relevant to their projects. Systematic integrative analysis can help meet this need, and the VISION project was established for validated systematic integration of epigenomic data in hematopoiesis. Here, we systematically integrated extensive data recording epigenetic features and transcriptomes from many sources, including individual laboratories and consortia, to produce a comprehensive view of the regulatory landscape of differentiating hematopoietic cell types in mouse. By using IDEAS as our integrative and discriminative epigenome annotation system, we identified and assigned epigenetic states simultaneously along chromosomes and across cell types, precisely and comprehensively. Combining nuclease accessibility and epigenetic states produced a set of more than 200,000 candidate cis-regulatory elements (cCREs) that efficiently capture enhancers and promoters. The transitions in epigenetic states of these cCREs across cell types provided insights into mechanisms of regulation, including decreases in numbers of active cCREs during differentiation of most lineages, transitions from poised to active or inactive states, and shifts in nuclease accessibility of CTCF-bound elements. Regression modeling of epigenetic states at cCREs and gene expression produced a versatile resource to improve selection of cCREs potentially regulating target genes. These resources are available from our VISION website to aid research in genomics and hematopoiesis.


Subjects
Epigenesis, Genetic; Hematopoiesis/genetics; Hematopoietic Stem Cells/metabolism; Animals; Mice; Regulatory Elements, Transcriptional; Transcriptome
13.
Nucleic Acids Res ; 48(8): e43, 2020 05 07.
Article in English | MEDLINE | ID: mdl-32086521

ABSTRACT

Quantitative comparison of epigenomic data across multiple cell types or experimental conditions is a promising way to understand the biological functions of epigenetic modifications. However, differences in sequencing depth and signal-to-noise ratios in the data from different experiments can hinder our ability to identify real biological variation from raw epigenomic data. Proper normalization is required prior to data analysis to gain meaningful insights. Most existing methods for data normalization standardize signals by rescaling either background regions or peak regions, assuming that the same scale factor is applicable to both background and peak regions. While such methods adjust for differences in sequencing depths, they do not address differences in the signal-to-noise ratios across different experiments. We developed a new data normalization method, called S3norm, that normalizes the sequencing depths and signal-to-noise ratios across different data sets simultaneously by a monotonic nonlinear transformation. We show empirically that the epigenomic data normalized by our method, compared to existing methods, can better capture real biological variation, such as impact on gene expression regulation.
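
Illustrative note (not part of the record): the abstract describes a monotonic nonlinear transformation that aligns peak and background signal at the same time. One simple transformation with that property is a power law y = alpha * x^beta. The sketch below is a simplified stand-in, not the S3norm algorithm: it assumes strictly positive signal, assumes the peak and background bins shared with a reference are already known, and matches mean log-signal in closed form. All function and parameter names are hypothetical.

```python
import math


def mean_log(values):
    """Mean of natural-log signal; values must be strictly positive."""
    return sum(math.log(v) for v in values) / len(values)


def fit_power_transform(target_peak, target_bg, ref_peak, ref_bg):
    """Return (alpha, beta) for y = alpha * x**beta such that the transformed
    target matches the reference's mean log-signal in peaks AND in background."""
    tp, tb = mean_log(target_peak), mean_log(target_bg)
    rp, rb = mean_log(ref_peak), mean_log(ref_bg)
    if tp == tb:
        raise ValueError("target peak and background have identical mean log-signal")
    beta = (rp - rb) / (tp - tb)       # rescales the signal-to-noise gap
    alpha = math.exp(rp - beta * tp)   # rescales overall depth
    return alpha, beta


def apply_power_transform(signal, alpha, beta):
    """Apply y = alpha * x**beta to every bin."""
    return [alpha * x ** beta for x in signal]
```

A single scale factor (beta fixed at 1) can align either peaks or background but not both, which is the limitation of existing methods that the abstract points to; the second parameter is what lets both constraints hold at once. The transformation is increasing whenever alpha and beta are positive.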


Subjects
Epigenomics/methods; Sequence Analysis, DNA/methods; Gene Expression; Histone Code; RNA-Seq; Software
14.
J Healthc Inform Res ; 4(1): 91-109, 2020 Mar.
Article in English | MEDLINE | ID: mdl-35415437

ABSTRACT

With wearable, relatively unobtrusive health monitors and smartphone sensors, it is increasingly easy to collect continuously streaming physiological data in a passive mode without placing much burden on participants. At the same time, smartphones provide the ability to survey participants to provide "ground-truth" reporting on psychological states, although this comes at an increased cost in participant burden. In this paper, we examined how analytical approaches from the field of machine learning could allow us to distill the collected physiological data into actionable decision rules about each individual's psychological state, with the eventual goal of identifying important psychological states (e.g., risk moments) without the need for ongoing burdensome active assessment (e.g., self-report). As a first step towards this goal, we compared two methods: (1) a k-nearest neighbor classifier that uses dynamic time warping distance, and (2) a random forests classifier to predict low and high states of affective arousal based on features extracted using the tsfresh Python package. Then, we compared random-forest-based predictive models tailored for the individual with individual-general models. Results showed that the individual-specific model outperformed the general one. Our results support the feasibility of using passively collected wearable data to predict psychological states, suggesting that by relying on both types of data, the active collection can be reduced or eliminated.
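
Illustrative note (not part of the record): the first method named in the abstract, a k-nearest neighbor classifier using dynamic time warping (DTW) distance, can be sketched in a few lines. This is a textbook DTW with absolute-difference cost and no warping window, not the authors' code; the function names and the "low"/"high" labels are assumptions for illustration.

```python
from collections import Counter


def dtw_distance(a, b):
    """Classic dynamic time warping distance between two numeric sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # a advances, b repeats
                                    cost[i][j - 1],      # b advances, a repeats
                                    cost[i - 1][j - 1])  # both advance
    return cost[n][m]


def knn_dtw_predict(train_series, train_labels, query, k=1):
    """Label a query series by majority vote among its k DTW-nearest training series."""
    ranked = sorted(zip(train_series, train_labels),
                    key=lambda pair: dtw_distance(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

DTW lets series of different lengths or with shifted timing be compared directly, which suits physiological streams; the cost is quadratic in series length for every pairwise comparison.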

15.
IUBMB Life ; 72(1): 27-38, 2020 01.
Article in English | MEDLINE | ID: mdl-31769130

ABSTRACT

Members of the GATA family of transcription factors play key roles in the differentiation of specific cell lineages by regulating the expression of target genes. Three GATA factors play distinct roles in hematopoietic differentiation. In order to better understand how these GATA factors function to regulate genes throughout the genome, we are studying the epigenomic and transcriptional landscapes of hematopoietic cells in a model-driven, integrative fashion. We have formed the collaborative multi-lab VISION project to conduct ValIdated Systematic IntegratiON of epigenomic data in mouse and human hematopoiesis. The epigenomic data included nuclease accessibility in chromatin, CTCF occupancy, and histone H3 modifications for 20 cell types covering hematopoietic stem cells, multilineage progenitor cells, and mature cells across the blood cell lineages of mouse. The analysis used the Integrative and Discriminative Epigenome Annotation System (IDEAS), which learns all common combinations of features (epigenetic states) simultaneously in two dimensions, along chromosomes and across cell types. The result is a segmentation that effectively paints the regulatory landscape in readily interpretable views, revealing constitutively active or silent loci as well as the loci specifically induced or repressed in each stage and lineage. Nuclease accessible DNA segments in active chromatin states were designated candidate cis-regulatory elements in each cell type, providing one of the most comprehensive registries of candidate hematopoietic regulatory elements to date. Applications of VISION resources are illustrated for the regulation of genes encoding GATA1, GATA2, GATA3, and Ikaros. VISION resources are freely available from our website http://usevision.org.


Subjects
Chromatin/metabolism; Epigenome; GATA Transcription Factors/metabolism; Gene Expression Regulation; Hematopoiesis; Hematopoietic Stem Cells/cytology; Hematopoietic Stem Cells/metabolism; Animals; Cell Differentiation; Chromatin/genetics; GATA Transcription Factors/genetics; Humans
16.
Genome Biol ; 20(1): 282, 2019 12 18.
Article in English | MEDLINE | ID: mdl-31847870

ABSTRACT

The spatial organization of chromatin in the nucleus has been implicated in regulating gene expression. Maps of high-frequency interactions between different segments of chromatin have revealed topologically associating domains (TADs), within which most of the regulatory interactions are thought to occur. TADs are not homogeneous structural units but appear to be organized into a hierarchy. We present OnTAD, an optimized nested TAD caller from Hi-C data, to identify hierarchical TADs. OnTAD reveals new biological insights into the role of different TAD levels, boundary usage in gene regulation, the loop extrusion model, and compartmental domains. OnTAD is available at https://github.com/anlin00007/OnTAD.


Subjects
Chromatin Assembly and Disassembly; Chromatin/metabolism; Algorithms; Epigenesis, Genetic; Genomics; Software
17.
Genome Biol ; 20(1): 57, 2019 03 19.
Article in English | MEDLINE | ID: mdl-30890172

ABSTRACT

BACKGROUND: Hi-C is currently the most widely used assay to investigate the 3D organization of the genome and to study its role in gene regulation, DNA replication, and disease. However, Hi-C experiments are costly to perform and involve multiple complex experimental steps; thus, accurate methods for measuring the quality and reproducibility of Hi-C data are essential to determine whether the output should be used further in a study. RESULTS: Using real and simulated data, we profile the performance of several recently proposed methods for assessing reproducibility of population Hi-C data, including HiCRep, GenomeDISCO, HiC-Spector, and QuASAR-Rep. By explicitly controlling noise and sparsity through simulations, we demonstrate the deficiencies of performing simple correlation analysis on pairs of matrices, and we show that methods developed specifically for Hi-C data produce better measures of reproducibility. We also show how to use established measures, such as the ratio of intra- to interchromosomal interactions, and novel ones, such as QuASAR-QC, to identify low-quality experiments. CONCLUSIONS: In this work, we assess reproducibility and quality measures by varying sequencing depth, resolution and noise levels in Hi-C data from 13 cell lines, with two biological replicates each, as well as 176 simulated matrices. Through this extensive validation and benchmarking of Hi-C data, we describe best practices for reproducibility and quality assessment of Hi-C experiments. We make all software publicly available at http://github.com/kundajelab/3DChromatin_ReplicateQC to facilitate adoption in the community.
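
Illustrative note (not part of the record): of the quality measures the abstract mentions, the ratio of intra- to interchromosomal interactions is simple enough to sketch. The input layout below, a list of (chromosome, chromosome, count) tuples, is a hypothetical simplification of a Hi-C contact list, and the abstract does not state which ratio values indicate a low-quality experiment.

```python
def intra_inter_ratio(contacts):
    """contacts: iterable of (chrom_a, chrom_b, read_pair_count) tuples
    (hypothetical layout). Returns intra- divided by interchromosomal counts."""
    contacts = list(contacts)  # allow generators; we iterate twice
    intra = sum(count for a, b, count in contacts if a == b)
    inter = sum(count for a, b, count in contacts if a != b)
    if inter == 0:
        raise ValueError("no interchromosomal contacts; ratio is undefined")
    return intra / inter
```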


Subjects
Genomics/standards; High-Throughput Nucleotide Sequencing/standards; Neoplasms/genetics; Quality Control; Software; Humans; Reproducibility of Results; Tumor Cells, Cultured
18.
PLoS Comput Biol ; 14(11): e1006571, 2018 11.
Article in English | MEDLINE | ID: mdl-30485278

ABSTRACT

Sequencing of the T cell receptor (TCR) repertoire is a powerful tool for deeper study of immune response, but the unique structure of this type of data makes its meaningful quantification challenging. We introduce a new method, the Gamma-GPD spliced threshold model, to address this difficulty. This biologically interpretable model captures the distribution of the TCR repertoire, demonstrates stability across varying sequencing depths, and permits comparative analysis across any number of sampled individuals. We apply our method to several datasets and obtain insights regarding the differentiating features in the T cell receptor repertoire among sampled individuals across conditions. We have implemented our method in the open-source R package powerTCR.


Subjects
High-Throughput Nucleotide Sequencing/methods; Immune System; Receptors, Antigen, T-Cell/genetics; Alternative Splicing; Animals; Brain Neoplasms/metabolism; CD4-Positive T-Lymphocytes/cytology; Clone Cells; Cluster Analysis; Computer Simulation; Glioblastoma/metabolism; Humans; Likelihood Functions; Lung/metabolism; Mice; Programming Languages; Receptors, Antigen, T-Cell/chemistry; Sarcoidosis/metabolism; Software
19.
PLoS Comput Biol ; 14(9): e1006436, 2018 09.
Article in English | MEDLINE | ID: mdl-30240439

ABSTRACT

Co-expression network analysis provides useful information for studying gene regulation in biological processes. Examining condition-specific patterns of co-expression can provide insights into the underlying cellular processes activated in a particular condition. One challenge in this type of analysis is that the sample sizes in each condition are usually small, making the statistical inference of co-expression patterns highly underpowered. A joint network construction that borrows information from related structures across conditions has the potential to improve the power of the analysis. One possible approach to constructing the co-expression network is to use the Gaussian graphical model. Though several methods are available for joint estimation of multiple graphical models, they do not fully account for the heterogeneity between samples and between co-expression patterns introduced by condition specificity. Here we develop the condition-adaptive fused graphical lasso (CFGL), a data-driven approach to incorporate condition specificity in the estimation of co-expression networks. We show that this method improves the accuracy with which networks are learned. The application of this method on a rat multi-tissue dataset and The Cancer Genome Atlas (TCGA) breast cancer dataset provides interesting biological insights. In both analyses, we identify numerous modules enriched for Gene Ontology functions and observe that the modules that are upregulated in a particular condition are often involved in condition-specific activities. Interestingly, we observe that the genes strongly associated with survival time in the TCGA dataset are less likely to be network hubs, suggesting that genes associated with cancer progression are likely to govern specific functions or execute final biological functions in pathways, rather than regulating a large number of biological processes. 
Additionally, we observed that the tumor-specific hub genes tend to have few shared edges with normal tissue, revealing tumor-specific regulatory mechanism.


Subjects
Brain/metabolism; Breast Neoplasms/metabolism; Gene Expression Profiling; Gene Expression Regulation, Neoplastic; Myocardium/metabolism; Algorithms; Animals; Area Under Curve; Breast Neoplasms/genetics; Computer Graphics; Computer Simulation; Databases, Factual; Female; Heart; Humans; Male; Neoplasms/metabolism; Normal Distribution; Rats; Software
20.
J Am Stat Assoc ; 113(523): 1028-1039, 2018.
Article in English | MEDLINE | ID: mdl-31249430

ABSTRACT

The identification of reproducible signals from the results of replicate high-throughput experiments is an important part of modern biological research. Often little is known about the dependence structure and the marginal distribution of the data, motivating the development of a nonparametric approach to assess reproducibility. The procedure, which we call the maximum rank reproducibility (MaRR) procedure, uses a maximum rank statistic to parse reproducible signals from noise without making assumptions about the distribution of reproducible signals. Because it uses the rank scale this procedure can be easily applied to a variety of data types. One application is to assess the reproducibility of RNA-seq technology using data produced by the sequencing quality control (SEQC) consortium, which coordinated a multi-laboratory effort to assess reproducibility across three RNA-seq platforms. Our results on simulations and SEQC data show that the MaRR procedure effectively controls false discovery rates, has desirable power properties, and compares well to existing methods. Supplementary materials for this article are available online.
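
Illustrative note (not part of the record): the abstract says the procedure works on the rank scale with a maximum rank statistic. The sketch below computes only that statistic, assuming it is the larger of an item's two ranks across a pair of replicates with rank 1 as the strongest signal. It does not implement the step that separates reproducible signals from noise or the false discovery rate control; function names are hypothetical.

```python
def descending_ranks(values):
    """Rank 1 = strongest signal; ties are broken by input order (an assumption)."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


def max_rank_statistics(rep1, rep2):
    """Per-item maximum of the two replicate ranks; small values mean the item
    ranks highly in both replicates."""
    r1, r2 = descending_ranks(rep1), descending_ranks(rep2)
    return [max(a, b) for a, b in zip(r1, r2)]
```

Because only ranks enter the statistic, the same code applies to read counts, p-values (after reversing the order), or any other score, which is the portability the abstract attributes to the rank scale.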
