Results 1 - 20 of 42
1.
Genome Res ; 2024 Jul 01.
Article in English | MEDLINE | ID: mdl-38951027

ABSTRACT

Knowledge of locations and activities of cis-regulatory elements (CREs) is needed to decipher basic mechanisms of gene regulation and to understand the impact of genetic variants on complex traits. Previous studies identified candidate CREs (cCREs) using epigenetic features in one species, making comparisons difficult between species. In contrast, we conducted an interspecies study defining epigenetic states and identifying cCREs in blood cell types to generate regulatory maps that are comparable between species, using integrative modeling of eight epigenetic features jointly in human and mouse in our Validated Systematic Integration (VISION) Project. The resulting catalogs of cCREs are useful resources for further studies of gene regulation in blood cells, indicated by high overlap with known functional elements and strong enrichment for human genetic variants associated with blood cell phenotypes. The contribution of each epigenetic state in cCREs to gene regulation, inferred from a multivariate regression, was used to estimate epigenetic state Regulatory Potential (esRP) scores for each cCRE in each cell type, which were used to categorize dynamic changes in cCREs. Groups of cCREs displaying similar patterns of regulatory activity in human and mouse cell types, obtained by joint clustering on esRP scores, harbored distinctive transcription factor binding motifs that were similar between species. An interspecies comparison of cCREs revealed both conserved and species-specific patterns of epigenetic evolution. Finally, we showed that comparisons of the epigenetic landscape between species can reveal elements with similar roles in regulation, even in the absence of genomic sequence alignment.

2.
PLoS Comput Biol ; 19(1): e1010758, 2023 01.
Article in English | MEDLINE | ID: mdl-36607897

ABSTRACT

Inferring gene co-expression networks is a useful process for understanding gene regulation and pathway activity. The networks are usually undirected graphs where genes are represented as nodes and an edge represents a significant co-expression relationship. When expression data of multiple (p) genes in multiple (K) conditions (e.g., treatments, tissues, strains) are available, joint estimation of networks harnessing shared information across them can significantly increase the power of analysis. In addition, examining condition-specific patterns of co-expression can provide insights into the underlying cellular processes activated in a particular condition. Condition adaptive fused graphical lasso (CFGL) is an existing method that incorporates condition specificity in a fused graphical lasso (FGL) model for estimating multiple co-expression networks. However, with computational complexity of O(p²K log K), the current implementation of CFGL is prohibitively slow even for a moderate number of genes and can only be used for a maximum of three conditions. In this paper, we propose a faster alternative to CFGL named rapid condition adaptive fused graphical lasso (RCFGL). In RCFGL, we incorporate the condition specificity into another popular model for joint network estimation, known as fused multiple graphical lasso (FMGL). We use a more efficient algorithm in the iterative steps compared to CFGL, enabling faster computation with complexity of O(p²K) and making it easily generalizable for more than three conditions. We also present a novel screening rule to determine if the full network estimation problem can be broken down into estimation of smaller disjoint sub-networks, thereby reducing the complexity further. We demonstrate the computational advantage and superior performance of our method compared to two non-condition adaptive methods, FGL and FMGL, and one condition adaptive method, CFGL, in both a simulation study and real data analysis. We used RCFGL to jointly estimate the gene co-expression networks in different brain regions (conditions) using a cohort of heterogeneous stock rats. We also provide an accommodating C and Python based package that implements RCFGL.
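
The idea of splitting a network estimation problem into disjoint sub-networks can be illustrated with the classical single-condition screening rule for the graphical lasso, where variables are grouped into connected components of the thresholded sample covariance matrix and each component is then estimated on its own. The sketch below is only that classical rule under stated assumptions, not the multi-condition rule proposed for RCFGL; the function name `screen_blocks` and the penalty argument `lam` are hypothetical.

```python
def screen_blocks(cov, lam):
    """Group variables into blocks that can be estimated separately.

    Two variables are linked when the absolute off-diagonal entry of the
    sample covariance matrix exceeds the penalty lam (hypothetical name).
    Connected components of that graph are returned as sorted index lists.
    """
    p = len(cov)
    seen = [False] * p
    blocks = []
    for start in range(p):
        if seen[start]:
            continue
        # Depth-first search starting from an unvisited variable.
        stack, block = [start], []
        seen[start] = True
        while stack:
            i = stack.pop()
            block.append(i)
            for j in range(p):
                if not seen[j] and j != i and abs(cov[i][j]) > lam:
                    seen[j] = True
                    stack.append(j)
        blocks.append(sorted(block))
    return blocks


# Toy covariance: genes 0 and 1 co-vary strongly, gene 2 is nearly independent.
toy_cov = [
    [1.0, 0.8, 0.05],
    [0.8, 1.0, 0.02],
    [0.05, 0.02, 1.0],
]
toy_blocks = screen_blocks(toy_cov, lam=0.1)
```

With a larger penalty more entries fall below the threshold, so the problem splits into more and smaller blocks.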


Subject(s)
Algorithms , Brain , Animals , Rats , Computer Simulation , Gene Regulatory Networks/genetics
3.
Genome Res ; 30(3): 472-484, 2020 03.
Article in English | MEDLINE | ID: mdl-32132109

ABSTRACT

Thousands of epigenomic data sets have been generated in the past decade, but it is difficult for researchers to effectively use all the data relevant to their projects. Systematic integrative analysis can help meet this need, and the VISION project was established for validated systematic integration of epigenomic data in hematopoiesis. Here, we systematically integrated extensive data recording epigenetic features and transcriptomes from many sources, including individual laboratories and consortia, to produce a comprehensive view of the regulatory landscape of differentiating hematopoietic cell types in mouse. By using IDEAS as our integrative and discriminative epigenome annotation system, we identified and assigned epigenetic states simultaneously along chromosomes and across cell types, precisely and comprehensively. Combining nuclease accessibility and epigenetic states produced a set of more than 200,000 candidate cis-regulatory elements (cCREs) that efficiently capture enhancers and promoters. The transitions in epigenetic states of these cCREs across cell types provided insights into mechanisms of regulation, including decreases in numbers of active cCREs during differentiation of most lineages, transitions from poised to active or inactive states, and shifts in nuclease accessibility of CTCF-bound elements. Regression modeling of epigenetic states at cCREs and gene expression produced a versatile resource to improve selection of cCREs potentially regulating target genes. These resources are available from our VISION website to aid research in genomics and hematopoiesis.


Subject(s)
Epigenesis, Genetic , Hematopoiesis/genetics , Hematopoietic Stem Cells/metabolism , Animals , Mice , Regulatory Elements, Transcriptional , Transcriptome
4.
Biometrics ; 79(3): 2272-2285, 2023 09.
Article in English | MEDLINE | ID: mdl-36056911

ABSTRACT

High-throughput biological experiments are essential tools for identifying biologically interesting candidates in large-scale omics studies. The results of a high-throughput biological experiment rely heavily on the operational factors chosen in its experimental and data-analytic procedures. Understanding how these operational factors influence the reproducibility of the experimental outcome is critical for selecting the optimal parameter settings and designing reliable high-throughput workflows. However, the influence of an operational factor may differ between strong and weak candidates in a high-throughput experiment, complicating the selection of parameter settings. To address this issue, we propose a novel segmented regression model, called segmented correspondence curve regression, to assess the influence of operational factors on the reproducibility of high-throughput experiments. Our model dissects the heterogeneous effects of operational factors on strong and weak candidates, providing a principled way to select operational parameters. Based on this framework, we also develop a sup-likelihood ratio test for the existence of heterogeneity. Simulation studies show that our estimation and testing procedures yield well-calibrated type I errors and are substantially more powerful in detecting and locating the differences in reproducibility across workflows than the existing method. Using this model, we investigated an important design question for ChIP-seq experiments: How many reads should one sequence to obtain reliable results in a cost-effective way? Our results reveal new insights into the impact of sequencing depth on the binding-site identification reproducibility, helping biologists determine the most cost-effective sequencing depth to achieve sufficient reproducibility for their study goals.
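
The quantity that correspondence curve methods build on can be illustrated with an empirical correspondence curve: for each fraction t, the share of candidates that rank in the top t fraction on both replicates. The following is a minimal sketch, assuming higher scores mean stronger candidates and no tied scores; it is not the segmented regression model itself, and the names `correspondence_curve`, `scores_a`, and `scores_b` are hypothetical.

```python
def correspondence_curve(scores_a, scores_b, fractions):
    """Fraction of candidates in the top t fraction of BOTH replicates."""
    n = len(scores_a)
    # Rank 0 is the strongest candidate within each replicate.
    order_a = sorted(range(n), key=lambda i: -scores_a[i])
    order_b = sorted(range(n), key=lambda i: -scores_b[i])
    rank_a = {idx: r for r, idx in enumerate(order_a)}
    rank_b = {idx: r for r, idx in enumerate(order_b)}
    curve = []
    for t in fractions:
        cutoff = round(t * n)  # number of top candidates kept per replicate
        both = sum(
            1 for i in range(n) if rank_a[i] < cutoff and rank_b[i] < cutoff
        )
        curve.append(both / n)
    return curve


# Toy replicates: the two strongest candidates agree, weaker ones swap ranks.
rep1 = [9.1, 7.4, 6.0, 3.2, 1.5]
rep2 = [8.7, 6.9, 2.0, 4.1, 1.1]
toy_curve = correspondence_curve(rep1, rep2, [0.2, 0.4, 0.6, 1.0])
```

For perfectly reproducible replicates the curve follows the diagonal (the value at t equals t); departures below the diagonal show where in the ranking the agreement breaks down.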


Subject(s)
High-Throughput Nucleotide Sequencing , Reproducibility of Results , Computer Simulation , High-Throughput Nucleotide Sequencing/methods
5.
Stat Med ; 41(10): 1884-1899, 2022 05 10.
Article in English | MEDLINE | ID: mdl-35178743

ABSTRACT

High-throughput experiments are an essential part of modern biological and biomedical research. The outcomes of high-throughput biological experiments often have a lot of missing observations due to signals below detection levels. For example, most single-cell RNA-seq (scRNA-seq) protocols experience high levels of dropout due to the small amount of starting material, leading to a majority of reported expression levels being zero. Though missing data contain information about reproducibility, they are often excluded in the reproducibility assessment, potentially generating misleading assessments. In this article, we develop a regression model to assess how the reproducibility of high-throughput experiments is affected by the choices of operational factors (eg, platform or sequencing depth) when a large number of measurements are missing. Using a latent variable approach, we extend correspondence curve regression, a recently proposed method for assessing the effects of operational factors on reproducibility, to incorporate missing values. Using simulations, we show that our method is more accurate in detecting differences in reproducibility than existing measures of reproducibility. We illustrate the usefulness of our method using a single-cell RNA-seq dataset collected on HCT116 cells. We compare the reproducibility of different library preparation platforms and study the effect of sequencing depth on reproducibility, thereby determining the cost-effective sequencing depth that is required to achieve sufficient reproducibility.


Subject(s)
Gene Expression Profiling , Single-Cell Analysis , Gene Expression Profiling/methods , High-Throughput Nucleotide Sequencing/methods , Humans , Reproducibility of Results , Sequence Analysis, RNA , Single-Cell Analysis/methods
6.
Nucleic Acids Res ; 48(8): e43, 2020 05 07.
Article in English | MEDLINE | ID: mdl-32086521

ABSTRACT

Quantitative comparison of epigenomic data across multiple cell types or experimental conditions is a promising way to understand the biological functions of epigenetic modifications. However, differences in sequencing depth and signal-to-noise ratios in the data from different experiments can hinder our ability to identify real biological variation from raw epigenomic data. Proper normalization is required prior to data analysis to gain meaningful insights. Most existing methods for data normalization standardize signals by rescaling either background regions or peak regions, assuming that the same scale factor is applicable to both background and peak regions. While such methods adjust for differences in sequencing depths, they do not address differences in the signal-to-noise ratios across different experiments. We developed a new data normalization method, called S3norm, that normalizes the sequencing depths and signal-to-noise ratios across different data sets simultaneously by a monotonic nonlinear transformation. We show empirically that the epigenomic data normalized by our method, compared to existing methods, can better capture real biological variation, such as impact on gene expression regulation.
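
To make the contrast with single-scale-factor normalization concrete, the sketch below fits one simple monotonic nonlinear transform, a power law y = a * x**b, so that the mean signal in peak regions and the mean signal in background regions both match a reference. The power-law form, the function name `fit_power_transform`, and all argument names are assumptions for illustration; the abstract does not state the exact transform used by S3norm.

```python
def fit_power_transform(peak, background, target_peak_mean, target_bg_mean):
    """Find (a, b) so that a * x**b matches two target means.

    peak / background: positive signal values from the data set to normalize.
    Targets: mean reference signal in the same two classes of regions.
    Assumes every peak value exceeds every background value, so the
    peak/background mean ratio grows with b and bisection on b is valid.
    """
    def mean_pow(values, b):
        return sum(v ** b for v in values) / len(values)

    target_ratio = target_peak_mean / target_bg_mean
    lo, hi = 0.01, 10.0  # assumed search range for the exponent
    for _ in range(200):
        b = (lo + hi) / 2.0
        ratio = mean_pow(peak, b) / mean_pow(background, b)
        if ratio < target_ratio:
            lo = b
        else:
            hi = b
    b = (lo + hi) / 2.0
    # With b fixed, a is chosen to match the background mean exactly.
    a = target_bg_mean / mean_pow(background, b)
    return a, b


# Toy check: build targets from a known transform and recover it.
toy_peak = [8.0, 10.0, 12.0]
toy_bg = [1.0, 2.0, 3.0]
true_a, true_b = 2.0, 1.5
peak_target = sum(true_a * v ** true_b for v in toy_peak) / len(toy_peak)
bg_target = sum(true_a * v ** true_b for v in toy_bg) / len(toy_bg)
a_hat, b_hat = fit_power_transform(toy_peak, toy_bg, peak_target, bg_target)
```

A single scale factor corresponds to fixing b = 1, which can match only one of the two means; the free exponent is what lets the signal-to-noise ratio be adjusted together with the depth.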


Subject(s)
Epigenomics/methods , Sequence Analysis, DNA/methods , Gene Expression , Histone Code , RNA-Seq , Software
7.
Tohoku J Exp Med ; 258(3): 225-236, 2022 Oct 26.
Article in English | MEDLINE | ID: mdl-36047132

ABSTRACT

The therapeutic effects and mechanisms of action of total glucosides of paeony (TGP) in treating ulcerative colitis (UC) remain to be clarified. A mouse model of ulcerative colitis was treated with TGP, and indexes including scores of disease activity index, gross morphologic damage, and histological damage, as well as inflammatory and oxidative stress markers, were determined. Patients with ulcerative colitis received TGP capsule therapy, and indexes including efficacy on colonoscopy and histology, scores of the Ulcerative Colitis Activity Index (UCAI) and the Short Inflammatory Bowel Disease Questionnaire (SIBDQ), and inflammatory parameters were assessed. The expressions of toll-like receptor 4 (TLR4) and nuclear factor-kappa B (NF-κB) were measured in colonic tissues of mice and patients. TGP treatment significantly increased weight, decreased scores of disease activity index, gross morphologic damage, and histological damage, and reduced the levels of tumor necrosis factor-α, interleukin-1β, malondialdehyde, and myeloperoxidase in the mouse model. Patients treated with TGP capsule had significantly higher relief rates of diarrhea, abdominal pain, and bloody purulent stool, decreased UCAI and increased SIBDQ scores, and lower levels of erythrocyte sedimentation rate, C-reactive protein, and CD4+/CD8+ T-cell ratio than patients receiving routine therapy. The overall response rate of TGP capsule was significantly higher than that of routine therapy. TGP treatment significantly suppressed the expressions of TLR4 and NF-κB in colonic tissues of both the mouse model and patients with UC. TGP shows a good therapeutic effect on ulcerative colitis in animals and human patients, and the underlying mechanisms may be related to the inhibition of TLR4/NF-κB signaling by TGP.


Subject(s)
Colitis, Ulcerative , Glucosides , Paeonia , Animals , Humans , C-Reactive Protein , Colitis, Ulcerative/drug therapy , Glucosides/pharmacology , Glucosides/therapeutic use , Interleukin-1beta , Malondialdehyde , NF-kappa B/metabolism , Paeonia/chemistry , Peroxidase/metabolism , Signal Transduction , Toll-Like Receptor 4/metabolism , Tumor Necrosis Factor-alpha/metabolism , Mice
8.
Genome Res ; 27(11): 1939-1949, 2017 11.
Article in English | MEDLINE | ID: mdl-28855260

ABSTRACT

Hi-C is a powerful technology for studying genome-wide chromatin interactions. However, current methods for assessing Hi-C data reproducibility can produce misleading results because they ignore spatial features in Hi-C data, such as domain structure and distance dependence. We present HiCRep, a framework for assessing the reproducibility of Hi-C data that systematically accounts for these features. In particular, we introduce a novel similarity measure, the stratum adjusted correlation coefficient (SCC), for quantifying the similarity between Hi-C interaction matrices. Not only does it provide a statistically sound and reliable evaluation of reproducibility, SCC can also be used to quantify differences between Hi-C contact matrices and to determine the optimal sequencing depth for a desired resolution. The measure consistently shows higher accuracy than existing approaches in distinguishing subtle differences in reproducibility and depicting interrelationships of cell lineages. The proposed measure is straightforward to interpret and easy to compute, making it well-suited for providing standardized, interpretable, automatable, and scalable quality control. The freely available R package HiCRep implements our approach.
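
The idea of a stratum-adjusted correlation can be sketched as follows: contacts are stratified by genomic distance (the diagonals of the contact matrix), a Pearson correlation is computed within each stratum, and the per-stratum values are combined in a weighted average. This is a simplified illustration, not the HiCRep implementation: it omits any smoothing step, and the weights (stratum size times the product of the two standard deviations) and the name `stratum_weighted_correlation` are assumptions.

```python
import math


def stratum_weighted_correlation(mat_a, mat_b, max_offset):
    """Weighted average of per-diagonal Pearson correlations.

    mat_a, mat_b: square contact matrices (lists of lists) of equal size.
    Stratum k holds the entries (i, i + k), i.e. bin pairs k bins apart.
    Each stratum is weighted by N_k * sd_a * sd_b (an assumed weighting).
    """
    n = len(mat_a)
    num = 0.0
    den = 0.0
    for k in range(1, max_offset + 1):
        xs = [mat_a[i][i + k] for i in range(n - k)]
        ys = [mat_b[i][i + k] for i in range(n - k)]
        m = len(xs)
        if m < 2:
            continue  # too few bin pairs to compute a correlation
        mx, my = sum(xs) / m, sum(ys) / m
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / m
        sd_x = math.sqrt(sum((x - mx) ** 2 for x in xs) / m)
        sd_y = math.sqrt(sum((y - my) ** 2 for y in ys) / m)
        if sd_x == 0.0 or sd_y == 0.0:
            continue  # correlation is undefined for a constant stratum
        # rho_k * weight_k simplifies to m * cov.
        num += m * cov
        den += m * sd_x * sd_y
    return num / den if den > 0 else float("nan")


# Toy symmetric contact matrix with non-constant off-diagonals.
toy = [
    [9.0, 5.0, 2.0, 1.0],
    [5.0, 8.0, 4.0, 3.0],
    [2.0, 4.0, 9.0, 6.0],
    [1.0, 3.0, 6.0, 7.0],
]
rescaled = [[2.0 * v + 1.0 for v in row] for row in toy]
scc_same = stratum_weighted_correlation(toy, rescaled, max_offset=3)
```

Because correlations are computed within each distance stratum, the strong decay of contact frequency with distance, which is shared by all Hi-C maps, does not inflate the similarity score.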


Subject(s)
Chromatin Immunoprecipitation/methods , Computational Biology/methods , Cell Line , Cell Lineage , High-Throughput Nucleotide Sequencing , Humans , Reproducibility of Results , Sequence Analysis, RNA , Software
9.
IUBMB Life ; 72(1): 27-38, 2020 01.
Article in English | MEDLINE | ID: mdl-31769130

ABSTRACT

Members of the GATA family of transcription factors play key roles in the differentiation of specific cell lineages by regulating the expression of target genes. Three GATA factors play distinct roles in hematopoietic differentiation. In order to better understand how these GATA factors function to regulate genes throughout the genome, we are studying the epigenomic and transcriptional landscapes of hematopoietic cells in a model-driven, integrative fashion. We have formed the collaborative multi-lab VISION project to conduct ValIdated Systematic IntegratiON of epigenomic data in mouse and human hematopoiesis. The epigenomic data included nuclease accessibility in chromatin, CTCF occupancy, and histone H3 modifications for 20 cell types covering hematopoietic stem cells, multilineage progenitor cells, and mature cells across the blood cell lineages of mouse. The analysis used the Integrative and Discriminative Epigenome Annotation System (IDEAS), which learns all common combinations of features (epigenetic states) simultaneously in two dimensions: along chromosomes and across cell types. The result is a segmentation that effectively paints the regulatory landscape in readily interpretable views, revealing constitutively active or silent loci as well as the loci specifically induced or repressed in each stage and lineage. Nuclease accessible DNA segments in active chromatin states were designated candidate cis-regulatory elements in each cell type, providing one of the most comprehensive registries of candidate hematopoietic regulatory elements to date. Applications of VISION resources are illustrated for the regulation of genes encoding GATA1, GATA2, GATA3, and Ikaros. VISION resources are freely available from our website http://usevision.org.


Subject(s)
Chromatin/metabolism , Epigenome , GATA Transcription Factors/metabolism , Gene Expression Regulation , Hematopoiesis , Hematopoietic Stem Cells/cytology , Hematopoietic Stem Cells/metabolism , Animals , Cell Differentiation , Chromatin/genetics , GATA Transcription Factors/genetics , Humans
10.
PLoS Comput Biol ; 14(9): e1006436, 2018 09.
Article in English | MEDLINE | ID: mdl-30240439

ABSTRACT

Co-expression network analysis provides useful information for studying gene regulation in biological processes. Examining condition-specific patterns of co-expression can provide insights into the underlying cellular processes activated in a particular condition. One challenge in this type of analysis is that the sample sizes in each condition are usually small, making the statistical inference of co-expression patterns highly underpowered. A joint network construction that borrows information from related structures across conditions has the potential to improve the power of the analysis. One possible approach to constructing the co-expression network is to use the Gaussian graphical model. Though several methods are available for joint estimation of multiple graphical models, they do not fully account for the heterogeneity between samples and between co-expression patterns introduced by condition specificity. Here we develop the condition-adaptive fused graphical lasso (CFGL), a data-driven approach to incorporate condition specificity in the estimation of co-expression networks. We show that this method improves the accuracy with which networks are learned. The application of this method on a rat multi-tissue dataset and The Cancer Genome Atlas (TCGA) breast cancer dataset provides interesting biological insights. In both analyses, we identify numerous modules enriched for Gene Ontology functions and observe that the modules that are upregulated in a particular condition are often involved in condition-specific activities. Interestingly, we observe that the genes strongly associated with survival time in the TCGA dataset are less likely to be network hubs, suggesting that genes associated with cancer progression are likely to govern specific functions or execute final biological functions in pathways, rather than regulating a large number of biological processes. Additionally, we observed that the tumor-specific hub genes tend to have few shared edges with normal tissue, revealing tumor-specific regulatory mechanisms.


Subject(s)
Brain/metabolism , Breast Neoplasms/metabolism , Gene Expression Profiling , Gene Expression Regulation, Neoplastic , Myocardium/metabolism , Algorithms , Animals , Area Under Curve , Breast Neoplasms/genetics , Computer Graphics , Computer Simulation , Databases, Factual , Female , Heart , Humans , Male , Neoplasms/metabolism , Normal Distribution , Rats , Software
11.
PLoS Comput Biol ; 14(11): e1006571, 2018 11.
Article in English | MEDLINE | ID: mdl-30485278

ABSTRACT

Sequencing of the T cell receptor (TCR) repertoire is a powerful tool for deeper study of immune response, but the unique structure of this type of data makes its meaningful quantification challenging. We introduce a new method, the Gamma-GPD spliced threshold model, to address this difficulty. This biologically interpretable model captures the distribution of the TCR repertoire, demonstrates stability across varying sequencing depths, and permits comparative analysis across any number of sampled individuals. We apply our method to several datasets and obtain insights regarding the differentiating features in the T cell receptor repertoire among sampled individuals across conditions. We have implemented our method in the open-source R package powerTCR.


Subject(s)
High-Throughput Nucleotide Sequencing/methods , Immune System , Receptors, Antigen, T-Cell/genetics , Alternative Splicing , Animals , Brain Neoplasms/metabolism , CD4-Positive T-Lymphocytes/cytology , Clone Cells , Cluster Analysis , Computer Simulation , Glioblastoma/metabolism , Humans , Likelihood Functions , Lung/metabolism , Mice , Programming Languages , Receptors, Antigen, T-Cell/chemistry , Sarcoidosis/metabolism , Software
12.
Biometrics ; 74(3): 803-813, 2018 09.
Article in English | MEDLINE | ID: mdl-29192968

ABSTRACT

The outcome of high-throughput biological experiments is affected by many operational factors in the experimental and data-analytical procedures. Understanding how these factors affect the reproducibility of the outcome is critical for establishing workflows that produce replicable discoveries. In this article, we propose a regression framework, based on a novel cumulative link model, to assess the covariate effects of operational factors on the reproducibility of findings from high-throughput experiments. In contrast to existing graphical approaches, our method allows one to succinctly characterize the simultaneous and independent effects of covariates on reproducibility and to compare reproducibility while controlling for potential confounding variables. We also establish a connection between our model and certain Archimedean copula models. This connection not only offers our regression framework an interpretation in copula models, but also provides guidance on choosing the functional forms of the regression. Furthermore, it also opens a new way to interpret and utilize these copulas in the context of reproducibility. Using simulations, we show that our method produces calibrated type I error and is more powerful in detecting difference in reproducibility than existing measures of agreement. We illustrate the usefulness of our method using a ChIP-seq study and a microarray study.


Subject(s)
Confounding Factors, Epidemiologic , High-Throughput Screening Assays/statistics & numerical data , Regression Analysis , Algorithms , Binding Sites , CCCTC-Binding Factor/chemistry , Calibration , Computer Simulation , Gene Expression Profiling/statistics & numerical data , High-Throughput Screening Assays/standards , Humans , Microarray Analysis/statistics & numerical data , Models, Statistical , Reproducibility of Results
13.
Comput Stat Data Anal ; 116: 49-66, 2017 Dec.
Article in English | MEDLINE | ID: mdl-29255337

ABSTRACT

Expectile regression is a useful tool for exploring the relation between the response and the explanatory variables beyond the conditional mean. A continuous threshold expectile regression is developed for modeling data in which the effect of a covariate on the response variable is linear but varies below and above an unknown threshold in a continuous way. The estimators for the threshold and the regression coefficients are obtained using a grid search approach. The asymptotic properties for all the estimators are derived, and the estimator for the threshold is shown to achieve root-n consistency. A weighted CUSUM type test statistic is proposed for the existence of a threshold at a given expectile, and its asymptotic properties are derived under both the null and the local alternative models. This test only requires fitting the model under the null hypothesis in the absence of a threshold, thus it is computationally more efficient than the likelihood-ratio type tests. Simulation studies show that the proposed estimators and test have desirable finite sample performance in both homoscedastic and heteroscedastic cases. The application of the proposed method to Dutch growth data and baseball pitcher salary data reveals interesting insights. The proposed method is implemented in the R package cthreshER.
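
In symbols, a continuous threshold (bent-line) expectile model of the kind described can be written as below. The notation is assumed for illustration and may differ from the paper's: a single covariate x, coefficients β0, β1, β2, threshold t, and expectile level τ.

```latex
\[
\mu_\tau(Y \mid x) = \beta_0 + \beta_1 x + \beta_2 (x - t)_+,
\qquad (x - t)_+ = \max(x - t,\, 0),
\]
\[
(\hat\beta, \hat t) = \arg\min_{\beta,\, t} \sum_{i=1}^{n}
\rho_\tau\bigl(y_i - \beta_0 - \beta_1 x_i - \beta_2 (x_i - t)_+\bigr),
\qquad \rho_\tau(u) = \lvert \tau - \mathbf{1}\{u < 0\} \rvert\, u^2 .
\]
```

For each candidate t on a grid, the inner minimization over β is an asymmetric least-squares fit, and the grid value with the smallest total loss is kept. The slope is β1 below the threshold and β1 + β2 above it, and the fitted line stays continuous at t.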

14.
J Stat Plan Inference ; 185: 41-55, 2017 Jun.
Article in English | MEDLINE | ID: mdl-28943710

ABSTRACT

We introduce a rank-based bent linear regression with an unknown change point. Using a linear reparameterization technique, we propose a rank-based estimate that can make simultaneous inference on all model parameters, including the location of the change point, in a computationally efficient manner. We also develop a score-like test for the existence of a change point, based on a weighted CUSUM process. This test only requires fitting the model under the null hypothesis in absence of a change point, thus it is computationally more efficient than likelihood-ratio type tests. The asymptotic properties of the test are derived under both the null and the local alternative models. Simulation studies and two real data examples show that the proposed methods are robust against outliers and heavy-tailed errors in both parameter estimation and hypothesis testing.

15.
BMC Bioinformatics ; 17 Suppl 1: 5, 2016 Jan 11.
Article in English | MEDLINE | ID: mdl-26818110

ABSTRACT

BACKGROUND: Determining differentially expressed genes (DEGs) between biological samples is key to understanding how genotype gives rise to phenotype. RNA-seq and microarray are two main technologies for profiling gene expression levels. However, considerable discrepancy has been found between DEGs detected using the two technologies. Integrating data across these two platforms has the potential to improve the power and reliability of DEG detection. METHODS: We propose a rank-based semi-parametric model to determine DEGs using information across different sources and apply it to the integration of RNA-seq and microarray data. By incorporating both the significance of differential expression and the consistency across platforms, our method effectively detects DEGs with moderate but consistent signals. We demonstrate the effectiveness of our method using simulation studies, MAQC/SEQC data and a synthetic microRNA dataset. CONCLUSIONS: Our integration method is not only robust to noise and heterogeneity in the data, but also adaptive to the structure of data. In our simulations and real data studies, our approach shows higher discriminative power and identifies more biologically relevant DEGs than eBayes, DEseq and some commonly used meta-analysis methods.


Subject(s)
High-Throughput Nucleotide Sequencing/methods , Models, Statistical , Oligonucleotide Array Sequence Analysis/methods , RNA/genetics , Sequence Analysis, RNA/methods , Transcriptome , Gene Expression Profiling/methods , Humans , Reproducibility of Results
16.
Genome Res ; 22(9): 1813-31, 2012 Sep.
Article in English | MEDLINE | ID: mdl-22955991

ABSTRACT

Chromatin immunoprecipitation (ChIP) followed by high-throughput DNA sequencing (ChIP-seq) has become a valuable and widely used approach for mapping the genomic location of transcription-factor binding and histone modifications in living cells. Despite its widespread use, there are considerable differences in how these experiments are conducted, how the results are scored and evaluated for quality, and how the data and metadata are archived for public use. These practices affect the quality and utility of any global ChIP experiment. Through our experience in performing ChIP-seq experiments, the ENCODE and modENCODE consortia have developed a set of working standards and guidelines for ChIP experiments that are updated routinely. The current guidelines address antibody validation, experimental replication, sequencing depth, data and metadata reporting, and data quality assessment. We discuss how ChIP quality, assessed in these ways, affects different uses of ChIP-seq data. All data sets used in the analysis have been deposited for public viewing and downloading at the ENCODE (http://encodeproject.org/ENCODE/) and modENCODE (http://www.modencode.org/) portals.


Subject(s)
Chromatin Immunoprecipitation/methods , Databases, Genetic , High-Throughput Nucleotide Sequencing/methods , Animals , Genome/genetics , Genomics/methods , Guidelines as Topic , Histones/metabolism , Humans , Internet , Transcription Factors/metabolism
17.
Nat Methods ; 9(6): 609-14, 2012 Jun.
Article in English | MEDLINE | ID: mdl-22522655

ABSTRACT

We evaluated how variations in sequencing depth and other parameters influence interpretation of chromatin immunoprecipitation-sequencing (ChIP-seq) experiments. Using Drosophila melanogaster S2 cells, we generated ChIP-seq data sets for a site-specific transcription factor (Suppressor of Hairy-wing) and a histone modification (H3K36me3). We detected a chromatin-state bias: open chromatin regions yielded higher coverage, which led to false positives if not corrected. This bias had a greater effect on detection specificity than any base-composition bias. Paired-end sequencing revealed that single-end data underestimated ChIP-library complexity at high coverage. Removal of reads originating at the same base reduced false-positives but had little effect on detection sensitivity. Even at mappable-genome coverage depth of ∼1 read per base pair, ∼1% of the narrow peaks detected on a tiling array were missed by ChIP-seq. Evaluation of widely used ChIP-seq analysis tools suggests that adjustments or algorithm improvements are required to handle data sets with deep coverage.


Subject(s)
Chromatin Immunoprecipitation/methods , Chromatin/chemistry , Algorithms , Animals , Chromatin Immunoprecipitation/standards , Drosophila Proteins/genetics , Drosophila melanogaster , False Positive Reactions , Gene Library , High-Throughput Nucleotide Sequencing , Histone-Lysine N-Methyltransferase/genetics , Oligonucleotide Array Sequence Analysis , Repressor Proteins/genetics , Sensitivity and Specificity
18.
Article in English | MEDLINE | ID: mdl-31303886

ABSTRACT

Paced by advances in high performance computing, and algorithms for multi-physics and multi-scale simulation, a number of groups have recently established numerical models of flowing blood systems, where cell-scale interactions are explicitly resolved. To be biologically representative, these models account for some or all of: (1) fluid dynamics of the carrier flow, (2) structural dynamics of the cells and vessel walls, (3) interaction and transport biochemistry, and, (4) methods for scaling to physiologically representative numbers of cells. In this article, our interest is the modelling of the tumour micro-environment. We review the broader area of cell-scale resolving blood flow modelling, while focusing on the particular interactions of tumour cells and white blood cells, known to play an important role in metastasis.

19.
PLoS Comput Biol ; 9(11): e1003326, 2013.
Article in English | MEDLINE | ID: mdl-24244136

ABSTRACT

Mapping the chromosomal locations of transcription factors, nucleosomes, histone modifications, chromatin remodeling enzymes, chaperones, and polymerases is one of the key tasks of modern biology, as evidenced by the Encyclopedia of DNA Elements (ENCODE) Project. To this end, chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) is the standard methodology. Mapping such protein-DNA interactions in vivo using ChIP-seq presents multiple challenges not only in sample preparation and sequencing but also for computational analysis. Here, we present step-by-step guidelines for the computational analysis of ChIP-seq data. We address all the major steps in the analysis of ChIP-seq data: sequencing depth selection, quality checking, mapping, data normalization, assessment of reproducibility, peak calling, differential binding analysis, controlling the false discovery rate, peak annotation, visualization, and motif analysis. At each step in our guidelines we discuss some of the software tools most frequently used. We also highlight the challenges and problems associated with each step in ChIP-seq data analysis. We present a concise workflow for the analysis of ChIP-seq data in Figure 1 that complements and expands on the recommendations of the ENCODE and modENCODE projects. Each step in the workflow is described in detail in the following sections.


Subject(s)
Chromatin Immunoprecipitation , Computational Biology/methods , Sequence Analysis, DNA/methods , High-Throughput Nucleotide Sequencing , Humans , Reproducibility of Results
20.
Nat Commun ; 15(1): 5357, 2024 Jun 25.
Article in English | MEDLINE | ID: mdl-38918381

ABSTRACT

Large national-level electronic health record (EHR) datasets offer new opportunities for disentangling the role of genes and environment through deep phenotype information and approximate pedigree structures. Here we use the approximate geographical locations of patients as a proxy for spatially correlated community-level environmental risk factors. We develop a spatial mixed linear effect (SMILE) model that incorporates both genetics and environmental contribution. We extract EHR and geographical locations from 257,620 nuclear families and compile 1083 disease outcome measurements from the MarketScan dataset. We augment the EHR with publicly available environmental data, including levels of particulate matter 2.5 (PM2.5), nitrogen dioxide (NO2), climate, and sociodemographic data. We refine the estimates of genetic heritability and quantify community-level environmental contributions. We also use wind speed and direction as instrumental variables to assess the causal effects of air pollution. In total, we find PM2.5 or NO2 have statistically significant causal effects on 135 diseases, including respiratory, musculoskeletal, digestive, metabolic, and sleep disorders, where PM2.5 and NO2 tend to affect biologically distinct disease categories. These analyses showcase several robust strategies for jointly modeling genetic and environmental effects on disease risk using large EHR datasets and will benefit upcoming biobank studies in the era of precision medicine.


Subject(s)
Air Pollution , Nitrogen Dioxide , Particulate Matter , Humans , Air Pollution/adverse effects , Particulate Matter/adverse effects , Nitrogen Dioxide/adverse effects , Nitrogen Dioxide/analysis , Risk Factors , Environmental Exposure/adverse effects , Male , Female , Electronic Health Records , Air Pollutants/adverse effects , Air Pollutants/analysis , Air Pollutants/toxicity , Genetic Predisposition to Disease , Gene-Environment Interaction , Middle Aged , Adult
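
The abstract above mentions using wind speed and direction as instrumental variables to assess causal effects of air pollution. The toy Python sketch below shows the general idea with a single-instrument Wald estimator. It is not the SMILE model or the authors' analysis: the function names and the four-point data set are hypothetical, constructed so that an unobserved confounder `u` biases the ordinary regression slope while the instrument `z` is unrelated to `u`.

```python
from typing import Sequence


def _cov(a: Sequence[float], b: Sequence[float]) -> float:
    """Population covariance of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / n


def wald_iv_estimate(z: Sequence[float], x: Sequence[float], y: Sequence[float]) -> float:
    """Single-instrument IV estimate of the effect of x on y: cov(z, y) / cov(z, x)."""
    return _cov(z, y) / _cov(z, x)


def ols_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Ordinary least-squares slope of y on x, for comparison."""
    return _cov(x, y) / _cov(x, x)


# Hypothetical toy data (not from the study):
z = [-1, 1, -1, 1]                               # instrument, e.g. a wind indicator
u = [-1, -1, 1, 1]                               # unobserved confounder, uncorrelated with z
x = [zi + ui for zi, ui in zip(z, u)]            # exposure depends on z and u
y = [2 * xi + 3 * ui for xi, ui in zip(x, u)]    # true effect of x on y is 2

print(wald_iv_estimate(z, x, y))  # 2.0
print(ols_slope(x, y))            # 3.5 (biased upward by the confounder)
```

The estimator is only valid if the instrument affects the outcome solely through the exposure, which is assumed here by construction.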