Results 1 - 20 of 93
1.
Cell ; 176(3): 535-548.e24, 2019 01 24.
Article in English | MEDLINE | ID: mdl-30661751

ABSTRACT

The splicing of pre-mRNAs into mature transcripts is remarkable for its precision, but the mechanisms by which the cellular machinery achieves such specificity are incompletely understood. Here, we describe a deep neural network that accurately predicts splice junctions from an arbitrary pre-mRNA transcript sequence, enabling precise prediction of noncoding genetic variants that cause cryptic splicing. Synonymous and intronic mutations with predicted splice-altering consequence validate at a high rate on RNA-seq and are strongly deleterious in the human population. De novo mutations with predicted splice-altering consequence are significantly enriched in patients with autism and intellectual disability compared to healthy controls and validate against RNA-seq in 21 out of 28 of these patients. We estimate that 9%-11% of pathogenic mutations in patients with rare genetic disorders are caused by this previously underappreciated class of disease variation.


Subjects
Forecasting/methods; RNA Precursors/genetics; RNA Splicing/genetics; Algorithms; Alternative Splicing/genetics; Autistic Disorder/genetics; Deep Learning; Exons/genetics; Humans; Intellectual Disability/genetics; Introns/genetics; Neural Networks, Computer; RNA Precursors/metabolism; RNA Splice Sites/genetics; RNA Splice Sites/physiology
2.
Proc Natl Acad Sci U S A ; 119(11): e2106053119, 2022 03 15.
Article in English | MEDLINE | ID: mdl-35275789

ABSTRACT

Significance: Deep profiling of the plasma proteome at scale has been a challenge for traditional approaches. We achieve superior performance across the dimensions of precision, depth, and throughput using a panel of surface-functionalized superparamagnetic nanoparticles in comparison to conventional workflows for deep proteomics interrogation. Our automated workflow leverages competitive nanoparticle-protein binding equilibria that quantitatively compress the large dynamic range of proteomes to an accessible scale. Using machine learning, we dissect the contribution of individual physicochemical properties of nanoparticles to the composition of protein coronas. Our results suggest that nanoparticle functionalization can be tailored to protein sets. This work demonstrates the feasibility of deep, precise, unbiased plasma proteomics at a scale compatible with large-scale genomics enabling multiomic studies.


Subjects
Blood Proteins; Deep Learning; Nanoparticles; Proteomics; Blood Proteins/chemistry; Nanoparticles/chemistry; Protein Corona/chemistry; Proteome; Proteomics/methods
3.
Cell ; 139(3): 623-33, 2009 Oct 30.
Article in English | MEDLINE | ID: mdl-19879847

ABSTRACT

The C. elegans cell lineage provides a unique opportunity to look at how cell lineage affects patterns of gene expression. We developed an automatic cell lineage analyzer that converts high-resolution images of worms into a data table showing fluorescence expression with single-cell resolution. We generated expression profiles of 93 genes in 363 specific cells from L1 stage larvae and found that cells with identical fates can be formed by different gene regulatory pathways. Molecular signatures identified repeating cell fate modules within the cell lineage and enabled the generation of a molecular differentiation map that reveals points in the cell lineage when developmental fates of daughter cells begin to diverge. These results demonstrate insights that become possible using computational approaches to analyze quantitative expression from many genes in parallel using a digital gene expression atlas.


Subjects
Caenorhabditis elegans/cytology; Caenorhabditis elegans/genetics; Cell Lineage; Gene Expression Profiling; Animals; Caenorhabditis elegans/metabolism; Caenorhabditis elegans Proteins; Cell Differentiation; Gene Expression Profiling/methods
4.
Bioinformatics ; 36(4): 1082-1090, 2020 02 15.
Article in English | MEDLINE | ID: mdl-31584621

ABSTRACT

MOTIVATION: We propose Meltos, a novel computational framework to address the challenging problem of building tumor phylogeny trees using somatic structural variants (SVs) among multiple samples. Meltos leverages the tumor phylogeny tree built on somatic single nucleotide variants (SNVs) to identify high confidence SVs and produce a comprehensive tumor lineage tree, using a novel optimization formulation. While we do not assume the evolutionary progression of SVs is necessarily the same as SNVs, we show that a tumor phylogeny tree using high-quality somatic SNVs can act as a guide for calling and assigning somatic SVs on a tree. Meltos utilizes multiple genomic read signals for potential SV breakpoints in whole genome sequencing data and proposes a probabilistic formulation for estimating variant allele fractions (VAFs) of SV events. RESULTS: In order to assess the ability of Meltos to correctly refine SNV trees with SV information, we tested Meltos on two simulated datasets with five genomes in both. We also assessed Meltos on two real cancer datasets. We tested Meltos on multiple samples from a liposarcoma tumor and on a multi-sample breast cancer data (Yates et al., 2015), where the authors provide validated structural variation events together with deep, targeted sequencing for a collection of somatic SNVs. We show Meltos has the ability to place high confidence validated SV calls on a refined tumor phylogeny tree. We also showed the flexibility of Meltos to either estimate VAFs directly from genomic data or to use copy number corrected estimates. AVAILABILITY AND IMPLEMENTATION: Meltos is available at https://github.com/ih-lab/Meltos. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Neoplasms; Genome; Genomic Structural Variation; Genomics; High-Throughput Nucleotide Sequencing; Humans; Neoplasms/genetics; Phylogeny; Sequence Analysis; Software
5.
Nat Methods ; 14(4): 414-416, 2017 Apr.
Article in English | MEDLINE | ID: mdl-28263960

ABSTRACT

We present single-cell interpretation via multikernel learning (SIMLR), an analytic framework and software which learns a similarity measure from single-cell RNA-seq data in order to perform dimension reduction, clustering and visualization. On seven published data sets, we benchmark SIMLR against state-of-the-art methods. We show that SIMLR is scalable and greatly enhances clustering performance while improving the visualization and interpretability of single-cell sequencing data.


Subjects
Sequence Analysis, RNA/methods; Single-Cell Analysis/methods; Software; Algorithms; Computational Biology/methods; Humans; Neutrophils/cytology; Neutrophils/physiology
6.
Nat Methods ; 14(9): 915-920, 2017 Sep.
Article in English | MEDLINE | ID: mdl-28714986

ABSTRACT

In read cloud approaches, microfluidic partitioning of long genomic DNA fragments and barcoding of shorter fragments derived from these fragments retains long-range information in short sequencing reads. This combination of short reads with long-range information represents a powerful alternative to single-molecule long-read sequencing. We develop Genome-wide Reconstruction of Complex Structural Variants (GROC-SVs) for SV detection and assembly from read cloud data and apply this method to Illumina-sequenced 10x Genomics sarcoma and breast cancer data sets. Compared with short-fragment sequencing, GROC-SVs substantially improves the specificity of breakpoint detection at comparable sensitivity. This approach also performs sequence assembly across multiple breakpoints simultaneously, enabling the reconstruction of events exhibiting remarkable complexity. We show that chromothriptic rearrangements occurred before copy number amplifications, and that rates of single-nucleotide variants and SVs are not correlated. Our results support the use of read cloud approaches to advance the characterization of large and complex structural variation.


Subjects
Algorithms; Chromosome Mapping/methods; DNA Mutational Analysis/methods; Genetic Variation/genetics; Genome/genetics; High-Throughput Nucleotide Sequencing/methods; Sequence Analysis, DNA/methods
7.
Proteomics ; 18(2)2018 01.
Article in English | MEDLINE | ID: mdl-29265724

ABSTRACT

SIMLR (Single-cell Interpretation via Multi-kernel LeaRning), an open-source tool that implements a novel framework to learn a sample-to-sample similarity measure from expression data observed for heterogeneous samples, is presented here. SIMLR can be effectively used to perform tasks such as dimension reduction, clustering, and visualization of heterogeneous populations of samples. SIMLR was benchmarked against state-of-the-art methods for these three tasks on several public datasets, showing it to be scalable and capable of greatly improving clustering performance, as well as providing valuable insights by making the data more interpretable via better visualization. SIMLR is available on GitHub (https://github.com/BatzoglouLabSU/SIMLR) in both R and MATLAB implementations. Furthermore, it is also available as an R package on http://bioconductor.org.


Subjects
Genomics/methods; Machine Learning; Sequence Analysis, RNA/methods; Single-Cell Analysis/methods; Software; Algorithms; Humans
8.
BMC Genomics ; 19(1): 467, 2018 Jun 18.
Article in English | MEDLINE | ID: mdl-29914369

ABSTRACT

BACKGROUND: De novo mutations (DNMs) are associated with neurodevelopmental and congenital diseases, and their detection can contribute to understanding disease pathogenicity. However, accurate detection is challenging because of their small number relative to the genome-wide false positives in next generation sequencing (NGS) data. Software such as DeNovoGear and TrioDeNovo have been developed to detect DNMs, but at good sensitivity they still produce many false positive calls. RESULTS: To address this challenge, we develop HAPDeNovo, a program that leverages phasing information from linked read sequencing, to remove false positive DNMs from candidate lists generated by DNM-detection tools. Short reads from each phasing block are allocated to each of the two haplotypes followed by generating a haploid genotype for each putative DNM. HAPDeNovo removes variants that are called as heterozygous in one of the haplotypes because they are almost certainly false positives. Our experiments on 10X Chromium linked read sequencing trio data reveal that HAPDeNovo eliminates 80 to 99% of false positives regardless of how large the candidate DNM set is. CONCLUSIONS: HAPDeNovo leverages the haplotype information from linked read sequencing to remove spurious false positive DNMs effectively, and it increases accuracy of DNM detection dramatically without sacrificing sensitivity.


Subjects
Genome, Human; Haplotypes; Mutation; Software; Algorithms; Computational Biology; DNA Mutational Analysis; Genotype; High-Throughput Nucleotide Sequencing; Humans
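
The filtering rule described in the abstract of entry 8 (discard a candidate de novo mutation when it is called heterozygous within a single haplotype) can be illustrated with a minimal Python sketch. The record layout, field names (`hap1_call`, `hap2_call`), and call labels below are assumptions made for illustration only; they are not HAPDeNovo's actual input format or implementation.

```python
# Hypothetical candidate records: the field names and call labels are
# assumptions made for this sketch, not HAPDeNovo's actual data format.
def filter_by_haplotype_calls(candidates):
    """Drop candidates called heterozygous within either single haplotype."""
    kept = []
    for cand in candidates:
        calls = (cand["hap1_call"], cand["hap2_call"])
        # A haploid genotype should never look heterozygous, so such a call
        # is treated as evidence of a false positive.
        if "het" in calls:
            continue
        kept.append(cand)
    return kept


candidates = [
    {"site": "chr1:1000", "hap1_call": "alt", "hap2_call": "ref"},
    {"site": "chr1:2000", "hap1_call": "het", "hap2_call": "ref"},
    {"site": "chr2:500", "hap1_call": "ref", "hap2_call": "het"},
]
kept_sites = [c["site"] for c in filter_by_haplotype_calls(candidates)]
```

In this toy input only the first candidate survives, because the other two carry a heterozygous call on one haplotype.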
9.
Genome Res ; 25(2): 280-9, 2015 Feb.
Article in English | MEDLINE | ID: mdl-25273070

ABSTRACT

Identity-by-descent (IBD) inference is the problem of establishing a genetic connection between two individuals through a genomic segment that is inherited by both individuals from a recent common ancestor. IBD inference is an important preceding step in a variety of population genomic studies, ranging from demographic studies to linking genomic variation with phenotype and disease. The problem of accurate IBD detection has become increasingly challenging with the availability of large collections of human genotypes and genomes: Given a cohort's size, a quadratic number of pairwise genome comparisons must be performed. Therefore, computation time and the false discovery rate can also scale quadratically. To enable accurate and efficient large-scale IBD detection, we present Parente2, a novel method for detecting IBD segments. Parente2 is based on an embedded log-likelihood ratio and uses a model that accounts for linkage disequilibrium by explicitly modeling haplotype frequencies. Parente2 operates directly on genotype data without the need to phase data prior to IBD inference. We evaluate Parente2's performance through extensive simulations using real data, and we show that it provides substantially higher accuracy compared to previous state-of-the-art methods while maintaining high computational efficiency.


Subjects
Genetic Testing/methods; Genomics/methods; Pedigree; Algorithms; Datasets as Topic; Genetic Linkage; Genetic Testing/standards; Genomics/standards; Haplotypes; Humans; Linkage Disequilibrium; Models, Genetic; Models, Statistical; Reproducibility of Results; Sensitivity and Specificity
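
The quadratic growth in pairwise genome comparisons mentioned in the abstract of entry 9 follows from the number of unordered pairs in a cohort of size n, which is n(n-1)/2. The short Python sketch below only illustrates that count; the cohort sizes are arbitrary examples, not figures from the paper.

```python
from math import comb


def pairwise_comparisons(cohort_size):
    """Number of unordered pairs of individuals: n * (n - 1) / 2."""
    return comb(cohort_size, 2)


# Arbitrary example cohort sizes: a 10x larger cohort needs roughly 100x
# more pairwise comparisons.
small = pairwise_comparisons(1_000)
large = pairwise_comparisons(10_000)
```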
10.
Genome Res ; 25(10): 1570-80, 2015 Oct.
Article in English | MEDLINE | ID: mdl-26286554

ABSTRACT

Although an increasing amount of human genetic variation is being identified and recorded, determining variants within repeated sequences of the human genome remains a challenge. Most population and genome-wide association studies have therefore been unable to consider variation in these regions. Core to the problem is the lack of a sequencing technology that produces reads with sufficient length and accuracy to enable unique mapping. Here, we present a novel methodology of using read clouds, obtained by accurate short-read sequencing of DNA derived from long fragment libraries, to confidently align short reads within repeat regions and enable accurate variant discovery. Our novel algorithm, Random Field Aligner (RFA), captures the relationships among the short reads governed by the long read process via a Markov Random Field. We utilized a modified version of the Illumina TruSeq synthetic long-read protocol, which yielded shallow-sequenced read clouds. We test RFA through extensive simulations and apply it to discover variants on the NA12878 human sample, for which shallow TruSeq read cloud sequencing data are available, and on an invasive breast carcinoma genome that we sequenced using the same method. We demonstrate that RFA facilitates accurate recovery of variation in 155 Mb of the human genome, including 94% of 67 Mb of segmental duplication sequence and 96% of 11 Mb of transcribed sequence, that are currently hidden from short-read technologies.


Subjects
Genetic Variation; Genome, Human; Sequence Analysis, DNA/methods; Algorithms; Carcinoma, Ductal/genetics; Carcinoma, Ductal, Breast/genetics; DNA Fragmentation; Humans; Sequence Alignment/methods
11.
PLoS Comput Biol ; 13(10): e1005621, 2017 Oct.
Article in English | MEDLINE | ID: mdl-29023470

ABSTRACT

Biological networks entail important topological features and patterns critical to understanding interactions within complicated biological systems. Despite great progress in understanding their structure, much more can be done to improve our inference and network analysis. Spectral methods play a key role in many network-based applications. Fundamental to spectral methods is the Laplacian, a matrix that captures the global structure of the network. Unfortunately, the Laplacian does not take into account intricacies of the network's local structure and is sensitive to noise in the network. These two properties are fundamental to biological networks and cannot be ignored. We propose an alternative matrix, Vicus. The Vicus matrix captures the local neighborhood structure of the network and thus is more effective at modeling biological interactions. We demonstrate the advantages of Vicus in the context of spectral methods by extensive empirical benchmarking on tasks such as single cell dimensionality reduction, protein module discovery and ranking genes for cancer subtyping. Our experiments show that using Vicus, spectral methods result in more accurate and robust performance in all of these tasks.


Subjects
Algorithms; Computational Biology/methods; Gene Expression Profiling/methods; Protein Interaction Mapping/methods; Animals; Cluster Analysis; Escherichia coli/genetics; Escherichia coli/metabolism; Humans; Leukocytes/metabolism; Mice; Models, Biological; Neoplasms/genetics; Neoplasms/metabolism
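
For readers unfamiliar with the Laplacian referred to in the abstract of entry 11, the following Python sketch builds the standard unnormalized graph Laplacian L = D - A for a toy network. It shows only this textbook matrix; it does not implement Vicus, whose construction is not given in the abstract, and the three-node graph is an arbitrary example.

```python
def graph_laplacian(adjacency):
    """Unnormalized graph Laplacian L = D - A for a symmetric adjacency matrix."""
    n = len(adjacency)
    laplacian = [[0.0] * n for _ in range(n)]
    for i in range(n):
        degree = sum(adjacency[i])  # D[i][i] is the degree of node i
        for j in range(n):
            laplacian[i][j] = (degree if i == j else 0.0) - adjacency[i][j]
    return laplacian


# Toy undirected path graph: 0 - 1 - 2
A = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
L = graph_laplacian(A)
```

Each row of L sums to zero, which is why the all-ones vector is always an eigenvector with eigenvalue 0 in spectral methods built on this matrix.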
12.
Proc Natl Acad Sci U S A ; 112(10): E1116-25, 2015 Mar 10.
Article in English | MEDLINE | ID: mdl-25713363

ABSTRACT

Follicular lymphoma (FL) is incurable with conventional therapies and has a clinical course typified by multiple relapses after therapy. These tumors are genetically characterized by B-cell leukemia/lymphoma 2 (BCL2) translocation and mutation of genes involved in chromatin modification. By analyzing purified tumor cells, we identified additional novel recurrently mutated genes and confirmed mutations of one or more chromatin modifier genes within 96% of FL tumors and two or more in 76% of tumors. We defined the hierarchy of somatic mutations arising during tumor evolution by analyzing the phylogenetic relationship of somatic mutations across the coding genomes of 59 sequentially acquired biopsies from 22 patients. Among all somatically mutated genes, CREBBP mutations were most significantly enriched within the earliest inferable progenitor. These mutations were associated with a signature of decreased antigen presentation characterized by reduced transcript and protein abundance of MHC class II on tumor B cells, in line with the role of CREBBP in promoting class II transactivator (CIITA)-dependent transcriptional activation of these genes. CREBBP mutant B cells stimulated less proliferation of T cells in vitro compared with wild-type B cells from the same tumor. Transcriptional signatures of tumor-infiltrating T cells were indicative of reduced proliferation, and this corresponded to decreased frequencies of tumor-infiltrating CD4 helper T cells and CD8 memory cytotoxic T cells. These observations therefore implicate CREBBP mutation as an early event in FL evolution that contributes to immune evasion via decreased antigen presentation.


Subjects
Antigen-Presenting Cells/immunology; Lymphoma, Follicular/genetics; Mutation; Neoplastic Stem Cells/pathology; CREB-Binding Protein/genetics; Chromatin/metabolism; Flow Cytometry; Histocompatibility Antigens Class II/genetics; Humans; Lymphoma, Follicular/immunology; Polymerase Chain Reaction
13.
Bioinformatics ; 32(12): i216-i224, 2016 06 15.
Article in English | MEDLINE | ID: mdl-27307620

ABSTRACT

MOTIVATION: Despite rapid progress in sequencing technology, assembling de novo the genomes of new species as well as reconstructing complex metagenomes remain major technological challenges. New synthetic long read (SLR) technologies promise significant advances towards these goals; however, their applicability is limited by high sequencing requirements and the inability of current assembly paradigms to cope with combinations of short and long reads. RESULTS: Here, we introduce Architect, a new de novo scaffolder aimed at SLR technologies. Unlike previous assembly strategies, Architect does not require a costly subassembly step; instead it assembles genomes directly from the SLR's underlying short reads, which we refer to as read clouds. This enables a 4- to 20-fold reduction in sequencing requirements and a 5-fold increase in assembly contiguity on both genomic and metagenomic datasets relative to state-of-the-art assembly strategies aimed directly at fully subassembled long reads. AVAILABILITY AND IMPLEMENTATION: Our source code is freely available at https://github.com/kuleshov/architect CONTACT: kuleshov@stanford.edu.


Subjects
Genome; Genomics; High-Throughput Nucleotide Sequencing; Sequence Analysis, DNA
14.
Bioinformatics ; 32(11): 1686-96, 2016 06 01.
Article in English | MEDLINE | ID: mdl-26353840

ABSTRACT

MOTIVATION: Population low-coverage whole-genome sequencing is rapidly emerging as a prominent approach for discovering genomic variation and genotyping a cohort. This approach combines substantially lower cost than full-coverage sequencing with whole-genome discovery of low-allele frequency variants, to an extent that is not possible with array genotyping or exome sequencing. However, a challenging computational problem arises of jointly discovering variants and genotyping the entire cohort. Variant discovery and genotyping are relatively straightforward tasks on a single individual that has been sequenced at high coverage, because the inference decomposes into the independent genotyping of each genomic position for which a sufficient number of confidently mapped reads are available. However, in low-coverage population sequencing, the joint inference requires leveraging the complex linkage disequilibrium (LD) patterns in the cohort to compensate for sparse and missing data in each individual. The potentially massive computation time for such inference, as well as the missing data that confound low-frequency allele discovery, need to be overcome for this approach to become practical. RESULTS: Here, we present Reveel, a novel method for single nucleotide variant calling and genotyping of large cohorts that have been sequenced at low coverage. Reveel introduces a novel technique for leveraging LD that deviates from previous Markov-based models, and which is aimed at computational efficiency as well as accuracy in capturing LD patterns present in rare haplotypes. We evaluate Reveel's performance through extensive simulations as well as real data from the 1000 Genomes Project, and show that it achieves higher accuracy in low-frequency allele discovery and substantially lower computation cost than previous state-of-the-art methods. 
AVAILABILITY AND IMPLEMENTATION: http://reveel.stanford.edu/ CONTACT: serafim@cs.stanford.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Sequence Analysis, DNA; Algorithms; Genotype; Linkage Disequilibrium; Polymorphism, Single Nucleotide
15.
Genome Res ; 23(7): 1097-108, 2013 Jul.
Article in English | MEDLINE | ID: mdl-23568837

ABSTRACT

Cancer evolution involves cycles of genomic damage, epigenetic deregulation, and increased cellular proliferation that eventually culminate in the carcinoma phenotype. Early neoplasias, which are often found concurrently with carcinomas and are histologically distinguishable from normal breast tissue, are less advanced in phenotype than carcinomas and are thought to represent precursor stages. To elucidate their role in cancer evolution we performed comparative whole-genome sequencing of early neoplasias, matched normal tissue, and carcinomas from six patients, for a total of 31 samples. By using somatic mutations as lineage markers we built trees that relate the tissue samples within each patient. On the basis of these lineage trees we inferred the order, timing, and rates of genomic events. In four out of six cases, an early neoplasia and the carcinoma share a mutated common ancestor with recurring aneuploidies, and in all six cases evolution accelerated in the carcinoma lineage. Transition spectra of somatic mutations are stable and consistent across cases, suggesting that accumulation of somatic mutations is a result of increased ancestral cell division rather than specific mutational mechanisms. In contrast to highly advanced tumors that are the focus of much of the current cancer genome sequencing, neither the early neoplasia genomes nor the carcinomas are enriched with potentially functional somatic point mutations. Aneuploidies that occur in common ancestors of neoplastic and tumor cells are the earliest events that affect a large number of genes and may predispose breast tissue to eventual development of invasive carcinoma.


Subjects
Breast Neoplasms/genetics; Cell Transformation, Neoplastic/genetics; Genome, Human; Mutation; Alleles; Aneuploidy; Breast Neoplasms/pathology; Carcinoma/genetics; Carcinoma/pathology; Disease Progression; Female; Gene Frequency; High-Throughput Nucleotide Sequencing; Humans; Polymorphism, Single Nucleotide
16.
Genome Res ; 22(9): 1748-59, 2012 Sep.
Article in English | MEDLINE | ID: mdl-22955986

ABSTRACT

Genome-wide association studies have been successful in identifying single nucleotide polymorphisms (SNPs) associated with a large number of phenotypes. However, an associated SNP is likely part of a larger region of linkage disequilibrium. This makes it difficult to precisely identify the SNPs that have a biological link with the phenotype. We have systematically investigated the association of multiple types of ENCODE data with disease-associated SNPs and show that there is significant enrichment for functional SNPs among the currently identified associations. This enrichment is strongest when integrating multiple sources of functional information and when highest confidence disease-associated SNPs are used. We propose an approach that integrates multiple types of functional data generated by the ENCODE Consortium to help identify "functional SNPs" that may be associated with the disease phenotype. Our approach generates putative functional annotations for up to 80% of all previously reported associations. We show that for most associations, the functional SNP most strongly supported by experimental evidence is a SNP in linkage disequilibrium with the reported association rather than the reported SNP itself. Our results show that the experimental data sets generated by the ENCODE Consortium can be successfully used to suggest functional hypotheses for variants associated with diseases and other phenotypes.


Subjects
Chromosome Mapping; Genetic Linkage; Genome, Human; Genome-Wide Association Study; Regulatory Sequences, Nucleic Acid; Base Sequence; Binding Sites/genetics; Chromatin Immunoprecipitation; Coronary Artery Disease/genetics; Gene Expression Regulation; High-Throughput Nucleotide Sequencing; Humans; Linkage Disequilibrium; Molecular Sequence Annotation; Molecular Sequence Data; Nucleotide Motifs; Phenotype; Polymorphism, Single Nucleotide; STAT1 Transcription Factor/metabolism
17.
Genome Res ; 22(9): 1735-47, 2012 Sep.
Article in English | MEDLINE | ID: mdl-22955985

ABSTRACT

Gene regulation at functional elements (e.g., enhancers, promoters, insulators) is governed by an interplay of nucleosome remodeling, histone modifications, and transcription factor binding. To enhance our understanding of gene regulation, the ENCODE Consortium has generated a wealth of ChIP-seq data on DNA-binding proteins and histone modifications. We additionally generated nucleosome positioning data on two cell lines, K562 and GM12878, by MNase digestion and high-depth sequencing. Here we relate 14 chromatin signals (12 histone marks, DNase, and nucleosome positioning) to the binding sites of 119 DNA-binding proteins across a large number of cell lines. We developed a new method for unsupervised pattern discovery, the Clustered AGgregation Tool (CAGT), which accounts for the inherent heterogeneity in signal magnitude, shape, and implicit strand orientation of chromatin marks. We applied CAGT on a total of 5084 data set pairs to obtain an exhaustive catalog of high-resolution patterns of histone modifications and nucleosome positioning signals around bound transcription factors. Our analyses reveal extensive heterogeneity in how histone modifications are deposited, and how nucleosomes are positioned around binding sites. With the exception of the CTCF/cohesin complex, asymmetry of nucleosome positioning is predominant. Asymmetry of histone modifications is also widespread, for all types of chromatin marks examined, including promoter, enhancer, elongation, and repressive marks. The fine-resolution signal shapes discovered by CAGT unveiled novel correlation patterns between chromatin marks, nucleosome positioning, and sequence content. Meta-analyses of the signal profiles revealed a common vocabulary of chromatin signals shared across multiple cell lines and binding proteins.


Subjects
Chromatin Assembly and Disassembly; Genetic Heterogeneity; Regulatory Sequences, Nucleic Acid; Binding Sites/genetics; Cell Line; Cluster Analysis; Computational Biology/methods; Humans; K562 Cells; Nucleosomes/genetics; Nucleosomes/metabolism; Protein Binding; Software; Transcription Initiation Site
18.
Genome Res ; 22(9): 1813-31, 2012 Sep.
Article in English | MEDLINE | ID: mdl-22955991

ABSTRACT

Chromatin immunoprecipitation (ChIP) followed by high-throughput DNA sequencing (ChIP-seq) has become a valuable and widely used approach for mapping the genomic location of transcription-factor binding and histone modifications in living cells. Despite its widespread use, there are considerable differences in how these experiments are conducted, how the results are scored and evaluated for quality, and how the data and metadata are archived for public use. These practices affect the quality and utility of any global ChIP experiment. Through our experience in performing ChIP-seq experiments, the ENCODE and modENCODE consortia have developed a set of working standards and guidelines for ChIP experiments that are updated routinely. The current guidelines address antibody validation, experimental replication, sequencing depth, data and metadata reporting, and data quality assessment. We discuss how ChIP quality, assessed in these ways, affects different uses of ChIP-seq data. All data sets used in the analysis have been deposited for public viewing and downloading at the ENCODE (http://encodeproject.org/ENCODE/) and modENCODE (http://www.modencode.org/) portals.


Subjects
Chromatin Immunoprecipitation/methods; Databases, Genetic; High-Throughput Nucleotide Sequencing/methods; Animals; Genome/genetics; Genomics/methods; Guidelines as Topic; Histones/metabolism; Humans; Internet; Transcription Factors/metabolism
19.
Bioinformatics ; 29(13): i361-70, 2013 Jul 01.
Article in English | MEDLINE | ID: mdl-23813006

ABSTRACT

SUMMARY: The increasing availability of high-throughput sequencing technologies has led to thousands of human genomes having been sequenced in the past years. Efforts such as the 1000 Genomes Project further add to the availability of human genome variation data. However, to date, there is no method that can map reads of a newly sequenced human genome to a large collection of genomes. Instead, methods rely on aligning reads to a single reference genome. This leads to inherent biases and lower accuracy. To tackle this problem, a new alignment tool BWBBLE is introduced in this article. We (i) introduce a new compressed representation of a collection of genomes, which explicitly tackles the genomic variation observed at every position, and (ii) design a new alignment algorithm based on the Burrows-Wheeler transform that maps short reads from a newly sequenced genome to an arbitrary collection of two or more (up to millions of) genomes with high accuracy and no inherent bias to one specific genome. AVAILABILITY: http://viq854.github.com/bwbble.


Subjects
Genome, Human; Sequence Alignment/methods; Sequence Analysis, DNA/methods; Software; Algorithms; Genetic Variation; Genomics/methods; High-Throughput Nucleotide Sequencing; Humans
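
The abstract of entry 19 builds on the Burrows-Wheeler transform (BWT). As background, the Python sketch below computes the plain BWT of a single string by sorting its rotations, and inverts it with the naive table-rebuilding method. This is the textbook transform only, written for clarity rather than efficiency; it does not reproduce BWBBLE's compressed multi-genome representation or its alignment algorithm, and the `$` sentinel is a conventional choice assumed here.

```python
def bwt(text, sentinel="$"):
    """Burrows-Wheeler transform via sorted rotations (quadratic, for illustration)."""
    if sentinel in text:
        raise ValueError("text must not contain the sentinel character")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The transform is the last column of the sorted rotation table.
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(transformed, sentinel="$"):
    """Invert the transform by repeatedly prepending and re-sorting columns."""
    table = [""] * len(transformed)
    for _ in range(len(transformed)):
        table = sorted(ch + row for ch, row in zip(transformed, table))
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]


transformed = bwt("banana")
restored = inverse_bwt(transformed)
```

Practical read aligners avoid materializing the rotation table and instead derive the transform from a suffix array, then search it with an FM-index.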
20.
Bioinformatics ; 29(13): i18-26, 2013 Jul 01.
Article in English | MEDLINE | ID: mdl-23812982

ABSTRACT

MOTIVATION: Advances in high-resolution microscopy have recently made possible the analysis of gene expression at the level of individual cells. The fixed lineage of cells in the adult worm Caenorhabditis elegans makes this organism an ideal model for studying complex biological processes like development and aging. However, annotating individual cells in images of adult C. elegans typically requires expertise and significant manual effort. Automation of this task is therefore critical to enabling high-resolution studies of a large number of genes. RESULTS: In this article, we describe an automated method for annotating a subset of 154 cells (including various muscle, intestinal and hypodermal cells) in high-resolution images of adult C. elegans. We formulate the task of labeling cells within an image as a combinatorial optimization problem, where the goal is to minimize a scoring function that compares cells in a test input image with cells from a training atlas of manually annotated worms according to various spatial and morphological characteristics. We propose an approach for solving this problem based on reduction to minimum-cost maximum-flow and apply a cross-entropy-based learning algorithm to tune the weights of our scoring function. We achieve 84% median accuracy across a set of 154 cell labels in this highly variable system. These results demonstrate the feasibility of the automatic annotation of microscopy-based images in adult C. elegans.


Subjects
Caenorhabditis elegans/cytology; Gene Expression Profiling; Imaging, Three-Dimensional/methods; Algorithms; Animals; Caenorhabditis elegans/genetics; Caenorhabditis elegans/metabolism; Cell Division; Cell Lineage; Microscopy, Confocal