Results 1 - 11 of 11
1.
Nat Methods ; 17(2): 137-145, 2020 02.
Article in English | MEDLINE | ID: mdl-31792435

ABSTRACT

Recent technological advancements have enabled the profiling of a large number of genome-wide features in individual cells. However, single-cell data present unique challenges that require the development of specialized methods and software infrastructure to successfully derive biological insights. The Bioconductor project has rapidly grown to meet these demands, hosting community-developed open-source software distributed as R packages. Featuring state-of-the-art computational methods, standardized data infrastructure and interactive data visualization tools, we present an overview and online book (https://osca.bioconductor.org) of single-cell methods for prospective users.


Subjects
Single-Cell Analysis/methods , Gene Expression Profiling , Genome , High-Throughput Nucleotide Sequencing , Software
3.
Genome Res ; 29(8): 1235-1249, 2019 08.
Article in English | MEDLINE | ID: mdl-31201210

ABSTRACT

In interphase eukaryotic cells, almost all heterochromatin is located adjacent to the nucleolus or to the nuclear lamina, thus defining nucleolus-associated domains (NADs) and lamina-associated domains (LADs), respectively. Here, we determined the first genome-scale map of murine NADs in mouse embryonic fibroblasts (MEFs) via deep sequencing of chromatin associated with purified nucleoli. We developed a Bioconductor package called NADfinder and demonstrated that it identifies NADs more accurately than other peak-calling tools, owing to its critical feature of chromosome-level local baseline correction. We detected two distinct classes of NADs. Type I NADs associate frequently with both the nucleolar periphery and the nuclear lamina, and generally display characteristics of constitutive heterochromatin, including late DNA replication, enrichment of H3K9me3, and little gene expression. In contrast, Type II NADs associate with nucleoli but do not overlap with LADs. Type II NADs tend to replicate earlier, display greater gene expression, and are more often enriched in H3K27me3 than Type I NADs. The nucleolar associations of both classes of NADs were confirmed via DNA-FISH, which also detected Type I but not Type II probes enriched at the nuclear lamina. Type II NADs are enriched in distinct gene classes, including factors important for differentiation and development. In keeping with this, we observed that a Type II NAD is developmentally regulated, and present in MEFs but not in undifferentiated embryonic stem (ES) cells.


Subjects
Cell Nucleolus/metabolism , Fibroblasts/metabolism , Gene Expression Regulation, Developmental , Genome , Heterochromatin/classification , Animals , Cell Nucleolus/ultrastructure , Cells, Cultured , Chromosome Mapping/methods , DNA Replication , Embryo, Mammalian , Fibroblasts/ultrastructure , Heterochromatin/chemistry , Heterochromatin/ultrastructure , Histones/genetics , Histones/metabolism , In Situ Hybridization, Fluorescence , Mice , Nuclear Lamina/metabolism , Nuclear Lamina/ultrastructure
4.
PLoS Comput Biol ; 14(5): e1006135, 2018 05.
Article in English | MEDLINE | ID: mdl-29723188

ABSTRACT

Biological experiments involving genomics or other high-throughput assays typically yield a data matrix that can be explored and analyzed using the R programming language with packages from the Bioconductor project. Improvements in the throughput of these assays have resulted in an explosion of data even from routine experiments, which poses a challenge to the existing computational infrastructure for statistical data analysis. For example, single-cell RNA sequencing (scRNA-seq) experiments frequently generate large matrices containing expression values for each gene in each cell, requiring sparse or file-backed representations for memory-efficient manipulation in R. These alternative representations are not easily compatible with high-performance C++ code used for computationally intensive tasks in existing R/Bioconductor packages. Here, we describe a C++ interface named beachmat, which enables agnostic data access from various matrix representations. This allows package developers to write efficient C++ code that is interoperable with dense, sparse and file-backed matrices, amongst others. We evaluated the performance of beachmat for accessing data from each matrix representation using both simulated and real scRNA-seq data, and defined a clear memory/speed trade-off to motivate the choice of an appropriate representation. We also demonstrate how beachmat can be incorporated into the code of other packages to drive analyses of a very large scRNA-seq data set.
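
The idea of representation-agnostic access described in this abstract can be illustrated with a minimal sketch. beachmat itself is a C++ interface; the Python below is not its API, and every class and function name here (`MatrixReader`, `DenseReader`, `SparseColumnReader`, `column_sums`) is hypothetical. It only shows how an algorithm written against a single column accessor can run unchanged on dense and sparse storage.

```python
from abc import ABC, abstractmethod
from typing import List


class MatrixReader(ABC):
    """Hypothetical common interface: callers only ever ask for columns."""

    nrow: int
    ncol: int

    @abstractmethod
    def get_column(self, j: int) -> List[float]:
        """Return column j as a dense list of length nrow."""


class DenseReader(MatrixReader):
    """Ordinary row-major storage."""

    def __init__(self, rows: List[List[float]]):
        self.rows = rows
        self.nrow = len(rows)
        self.ncol = len(rows[0]) if rows else 0

    def get_column(self, j: int) -> List[float]:
        return [row[j] for row in self.rows]


class SparseColumnReader(MatrixReader):
    """Compressed sparse column storage: only non-zero entries are kept."""

    def __init__(self, nrow: int, indptr: List[int], indices: List[int], values: List[float]):
        self.nrow = nrow
        self.ncol = len(indptr) - 1
        self.indptr, self.indices, self.values = indptr, indices, values

    def get_column(self, j: int) -> List[float]:
        column = [0.0] * self.nrow
        # Entries of column j live in the slice indptr[j]:indptr[j + 1].
        for k in range(self.indptr[j], self.indptr[j + 1]):
            column[self.indices[k]] = self.values[k]
        return column


def column_sums(mat: MatrixReader) -> List[float]:
    """Works for any representation that implements get_column."""
    return [sum(mat.get_column(j)) for j in range(mat.ncol)]


# The same 3 x 2 matrix stored two ways.
dense = DenseReader([[0.0, 2.0], [1.0, 0.0], [0.0, 3.0]])
sparse = SparseColumnReader(nrow=3, indptr=[0, 1, 3], indices=[1, 0, 2], values=[1.0, 2.0, 3.0])
```

Calling `column_sums(dense)` and `column_sums(sparse)` gives the same result. The sparse reader expands each column on request, which hints at the memory/speed trade-off the abstract mentions.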


Subjects
Computational Biology/methods , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, RNA/methods , Software , Algorithms , Databases, Genetic , Humans
5.
Nat Methods ; 12(2): 115-21, 2015 Feb.
Article in English | MEDLINE | ID: mdl-25633503

ABSTRACT

Bioconductor is an open-source, open-development software project for the analysis and comprehension of high-throughput data in genomics and molecular biology. The project aims to enable interdisciplinary research, collaboration and rapid development of scientific software. Based on the statistical programming language R, Bioconductor comprises 934 interoperable packages contributed by a large, diverse community of scientists. Packages cover a range of bioinformatic and statistical applications. They undergo formal initial review and continuous automated testing. We present an overview for prospective users and contributors.


Subjects
Computational Biology , Gene Expression Profiling , Genomics/methods , High-Throughput Screening Assays/methods , Software , Programming Languages , User-Computer Interface
6.
BMC Genomics ; 18(1): 379, 2017 05 15.
Article in English | MEDLINE | ID: mdl-28506212

ABSTRACT

BACKGROUND: Genome editing technologies developed around the CRISPR-Cas9 nuclease system have facilitated the investigation of a broad range of biological questions. These nucleases also hold tremendous promise for treating a variety of genetic disorders. In the context of their therapeutic application, it is important to identify the spectrum of genomic sequences that are cleaved by a candidate nuclease when programmed with a particular guide RNA, as well as the cleavage efficiency of these sites. Powerful new experimental approaches, such as GUIDE-seq, facilitate the sensitive, unbiased genome-wide detection of nuclease cleavage sites within the genome. Flexible bioinformatics analysis tools for processing GUIDE-seq data are needed. RESULTS: Here, we describe an open source, open development software suite, GUIDEseq, for GUIDE-seq data analysis and annotation as a Bioconductor package in R. The GUIDEseq package provides a flexible platform with more than 60 adjustable parameters for the analysis of datasets associated with custom nuclease applications. These parameters allow data analysis to be tailored to different nuclease platforms with different length and complexity in their guide and PAM recognition sequences or their DNA cleavage position. They also enable users to customize sequence aggregation criteria, and vary peak calling thresholds that can influence the number of potential off-target sites recovered. GUIDEseq also annotates potential off-target sites that overlap with genes based on genome annotation information, as these may be the most important off-target sites for further characterization. In addition, GUIDEseq enables the comparison and visualization of off-target site overlap between different datasets for a rapid comparison of different nuclease configurations or experimental conditions. 
For each identified off-target, the GUIDEseq package outputs the mapped GUIDE-seq read count as well as the cleavage score from a user-specified off-target cleavage score prediction algorithm, permitting the identification of genomic sequences with unexpected cleavage activity. CONCLUSION: The GUIDEseq package enables analysis of GUIDE-seq data from various nuclease platforms for any species with a defined genomic sequence. This software package has been used successfully to analyze several GUIDE-seq datasets. The software, source code and documentation are freely available at http://www.bioconductor.org/packages/release/bioc/html/GUIDEseq.html .


Subjects
CRISPR-Cas Systems/genetics , Databases, Genetic , Deoxyribonucleases/metabolism , Sequence Analysis, DNA , Software , Statistics as Topic , Molecular Sequence Annotation
7.
PLoS Comput Biol ; 9(8): e1003118, 2013.
Article in English | MEDLINE | ID: mdl-23950696

ABSTRACT

We describe Bioconductor infrastructure for representing and computing on annotated genomic ranges and integrating genomic data with the statistical computing features of R and its extensions. At the core of the infrastructure are three packages: IRanges, GenomicRanges, and GenomicFeatures. These packages provide scalable data structures for representing annotated ranges on the genome, with special support for transcript structures, read alignments and coverage vectors. Computational facilities include efficient algorithms for overlap and nearest neighbor detection, coverage calculation and other range operations. This infrastructure directly supports more than 80 other Bioconductor packages, including those for sequence analysis, differential expression analysis and visualization.
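
Two of the range operations named in this abstract, overlap detection and coverage calculation, can be sketched in a few lines. This is a conceptual illustration in Python, not the IRanges or GenomicRanges API. The function names are hypothetical, ranges are assumed to be 1-based and inclusive on a single sequence, and the overlap search is a naive pairwise scan rather than the efficient algorithms the packages provide.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # (start, end), 1-based and inclusive: an assumption of this sketch


def find_overlaps(query: List[Range], subject: List[Range]) -> List[Tuple[int, int]]:
    """Return (query_index, subject_index) pairs that share at least one position.

    Naive O(n * m) scan, written for clarity rather than speed.
    """
    hits = []
    for qi, (qs, qe) in enumerate(query):
        for si, (ss, se) in enumerate(subject):
            # Two closed intervals overlap when each starts before the other ends.
            if qs <= se and ss <= qe:
                hits.append((qi, si))
    return hits


def coverage(ranges: List[Range], length: int) -> List[int]:
    """Per-position depth over positions 1..length, using a difference array.

    Assumes every range lies within 1..length.
    """
    diff = [0] * (length + 2)
    for start, end in ranges:
        diff[start] += 1      # depth rises at the first covered position
        diff[end + 1] -= 1    # and falls just past the last one
    depth, running = [], 0
    for pos in range(1, length + 1):
        running += diff[pos]
        depth.append(running)
    return depth


reads = [(1, 4), (3, 6), (8, 9)]
genes = [(5, 8)]
hits = find_overlaps(reads, genes)
depth = coverage(reads, 10)
```

Here the second and third reads overlap the single gene, and `depth` peaks at 2 where the first two reads coincide (positions 3 and 4).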


Subjects
Databases, Genetic , Genomics/methods , Software , Algorithms , Animals , Genomics/standards , Humans , Mice , Sequence Alignment , Sequence Analysis, DNA
8.
BMC Bioinformatics ; 11: 237, 2010 May 11.
Article in English | MEDLINE | ID: mdl-20459804

ABSTRACT

BACKGROUND: Chromatin immunoprecipitation (ChIP) followed by high-throughput sequencing (ChIP-seq) or ChIP followed by genome tiling array analysis (ChIP-chip) have become standard technologies for genome-wide identification of DNA-binding protein target sites. A number of algorithms have been developed in parallel that allow identification of binding sites from ChIP-seq or ChIP-chip datasets and subsequent visualization in the University of California Santa Cruz (UCSC) Genome Browser as custom annotation tracks. However, summarizing these tracks can be a daunting task, particularly if there are a large number of binding sites or the binding sites are distributed widely across the genome. RESULTS: We have developed ChIPpeakAnno as a Bioconductor package within the statistical programming environment R to facilitate batch annotation of enriched peaks identified from ChIP-seq, ChIP-chip, cap analysis of gene expression (CAGE) or any experiments resulting in a large number of enriched genomic regions. The binding sites annotated with ChIPpeakAnno can be viewed easily as a table, a pie chart or plotted in histogram form, i.e., the distribution of distances to the nearest genes for each set of peaks. In addition, we have implemented functionalities for determining the significance of overlap between replicates or binding sites among transcription factors within a complex, and for drawing Venn diagrams to visualize the extent of the overlap between replicates. Furthermore, the package includes functionalities to retrieve sequences flanking putative binding sites for PCR amplification, cloning, or motif discovery, and to identify Gene Ontology (GO) terms associated with adjacent genes. CONCLUSIONS: ChIPpeakAnno enables batch annotation of the binding sites identified from ChIP-seq, ChIP-chip, CAGE or any technology that results in a large number of enriched genomic regions within the statistical programming environment R. 
Allowing users to pass their own annotation data such as a different Chromatin immunoprecipitation (ChIP) preparation and a dataset from literature, or existing annotation packages, such as GenomicFeatures and BSgenome, provides flexibility. Tight integration to the biomaRt package enables up-to-date annotation retrieval from the BioMart database.
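
The core annotation step this abstract describes, assigning each peak to its nearest gene and recording the distance, can be sketched as follows. This is a simplified Python illustration and not ChIPpeakAnno's implementation or API. The function name `annotate_peaks`, the use of peak midpoints, a single chromosome, and the omission of strand are all assumptions made to keep the example short.

```python
from bisect import bisect_left
from typing import Dict, List, Tuple


def annotate_peaks(peaks: List[Tuple[int, int]], tss: Dict[str, int]) -> List[Tuple[str, int]]:
    """For each peak (start, end), return (nearest_gene, signed_distance).

    Distance is the peak midpoint minus the gene's start site position.
    Assumes a non-empty gene table on one chromosome; strand is ignored.
    """
    ordered = sorted(tss.items(), key=lambda item: item[1])
    positions = [pos for _, pos in ordered]
    result = []
    for start, end in peaks:
        mid = (start + end) // 2
        i = bisect_left(positions, mid)
        # Only the start sites immediately left and right of the midpoint can be nearest.
        candidates = [k for k in (i - 1, i) if 0 <= k < len(ordered)]
        best = min(candidates, key=lambda k: abs(mid - positions[k]))
        result.append((ordered[best][0], mid - positions[best]))
    return result


# Hypothetical gene start sites and peaks, for illustration only.
tss_sites = {"geneA": 1000, "geneB": 5000}
annotations = annotate_peaks([(900, 1100), (4000, 4200), (2000, 2200)], tss_sites)
```

The list of signed distances returned here is the kind of data that a histogram of distances to the nearest gene would be drawn from.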


Subjects
Chromatin Immunoprecipitation/methods , Oligonucleotide Array Sequence Analysis/methods , Software , Binding Sites , Genome
9.
Bioinformatics ; 25(19): 2607-8, 2009 Oct 01.
Article in English | MEDLINE | ID: mdl-19654119

ABSTRACT

UNLABELLED: ShortRead is a package for input, quality assessment, manipulation and output of high-throughput sequencing data. ShortRead is provided in the R and Bioconductor environments, allowing ready access to additional facilities for advanced statistical analysis, data transformation, visualization and integration with diverse genomic resources. AVAILABILITY AND IMPLEMENTATION: This package is implemented in R and available at the Bioconductor web site; the package contains a 'vignette' outlining typical work flows.


Subjects
Computational Biology/methods , Sequence Analysis, DNA/methods , Software , Databases, Genetic , Gene Expression Profiling
10.
F1000Res ; 8: 21, 2019.
Article in English | MEDLINE | ID: mdl-30828438

ABSTRACT

Bioconductor's SummarizedExperiment class unites numerical assay quantifications with sample- and experiment-level metadata. SummarizedExperiment is the standard Bioconductor class for assays that produce matrix-like data, used by over 200 packages. We describe the restfulSE package, a deployment of this data model that supports remote storage. We illustrate use of SummarizedExperiment with remote HDF5 and Google BigQuery back ends, with two applications in cancer genomics. Our intent is to allow the use of familiar and semantically meaningful programmatic idioms to query genomic data, while abstracting the remote interface from end users and developers.
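
The data model in this abstract, an assay matrix kept in step with its sample metadata, can be sketched as a small container. This is a Python illustration of the concept only: it is not the SummarizedExperiment class or the restfulSE API, it holds data in memory rather than on a remote back end, and the names `ExperimentSketch` and `subset_samples` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ExperimentSketch:
    """Hypothetical container: one assay matrix plus per-sample metadata."""

    assay: List[List[float]]            # rows = features, columns = samples
    feature_ids: List[str]              # one identifier per assay row
    sample_data: List[Dict[str, str]]   # one metadata record per assay column

    def __post_init__(self):
        # Reject inputs whose matrix and metadata dimensions disagree.
        if len(self.assay) != len(self.feature_ids):
            raise ValueError("one feature id is needed per assay row")
        if any(len(row) != len(self.sample_data) for row in self.assay):
            raise ValueError("one metadata record is needed per assay column")

    def subset_samples(self, key: str, value: str) -> "ExperimentSketch":
        """Keep samples whose metadata matches, slicing the assay columns in step."""
        keep = [j for j, meta in enumerate(self.sample_data) if meta.get(key) == value]
        return ExperimentSketch(
            assay=[[row[j] for j in keep] for row in self.assay],
            feature_ids=list(self.feature_ids),
            sample_data=[self.sample_data[j] for j in keep],
        )


# Made-up values, for illustration only.
se = ExperimentSketch(
    assay=[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
    feature_ids=["gene1", "gene2"],
    sample_data=[{"tissue": "tumor"}, {"tissue": "normal"}, {"tissue": "tumor"}],
)
tumor = se.subset_samples("tissue", "tumor")
```

Because the subset is expressed in terms of metadata, the caller never handles column indices directly. A remote back end could, in principle, answer the same query without changing the calling code, which is the kind of abstraction the abstract describes.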


Subjects
Genomics , Software , Genome
11.
Methods Mol Biol ; 1418: 67-90, 2016.
Article in English | MEDLINE | ID: mdl-27008010

ABSTRACT

Annotation resources make up a significant proportion of the Bioconductor project (Huber et al., Nat Methods 12:115-121, 2015). A diverse set of online resources is also available, accessed through specific packages. Here we describe the most popular of these resources and give some high-level examples of how to use them.


Subjects
Computational Biology/methods , Genomics/methods , Molecular Sequence Annotation/methods , Software