Results 1 - 20 of 32
1.
Biostatistics ; 24(1): 1-16, 2022 12 12.
Article in English | MEDLINE | ID: mdl-34467372

ABSTRACT

High-dimensional biological data collection across heterogeneous groups of samples has become increasingly common, creating high demand for dimensionality reduction techniques that capture the underlying structure of the data. Discovering low-dimensional embeddings that describe the separation of any underlying discrete latent structure in data is an important motivation for applying these techniques, since these latent classes can represent important sources of unwanted variability, such as batch effects, or interesting sources of signal, such as unknown cell types. The features that define this discrete latent structure are often hard to identify in high-dimensional data. Principal component analysis (PCA) is one of the most widely used methods as an unsupervised step for dimensionality reduction. This reduction technique finds linear transformations of the data that explain total variance. When the goal is detecting discrete structure, PCA is applied with the assumption that classes will be separated in directions of maximum variance. However, PCA will fail to accurately find discrete latent structure if this assumption does not hold. Visualization techniques, such as t-Distributed Stochastic Neighbor Embedding (t-SNE) and Uniform Manifold Approximation and Projection (UMAP), attempt to mitigate these problems with PCA by creating a low-dimensional space where similar objects are modeled by nearby points and dissimilar objects are modeled by distant points with high probability. However, since t-SNE and UMAP are computationally expensive, a PCA reduction is often done before applying them, which makes them sensitive to PCA's shortcomings. Also, t-SNE is limited to only two or three dimensions as a visualization tool, which may not be adequate for retaining discriminatory information. Finally, when interpretable feature weights are needed, the linear transformations of PCA are preferable to the non-linear transformations provided by methods like t-SNE and UMAP.
Here, we propose iterative discriminant analysis (iDA), a dimensionality reduction technique designed to mitigate these limitations. iDA produces an embedding that carries discriminatory information which optimally separates latent clusters using linear transformations that permit post hoc analysis to determine features that define these latent structures.
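The argument above hinges on PCA's first component following the direction of maximum variance rather than the direction that separates classes. The Python sketch below illustrates that failure mode on hypothetical toy data; it is not iDA, and every name in it is made up for this example. Two classes differ only along y but share a large spread along x; PC1 is computed from the 2x2 covariance matrix in closed form, and the distance between class means is compared along PC1 and along y.

```python
import math
from statistics import mean

# Hypothetical toy data: two classes separated along y (y = -1 vs y = +1),
# with a much larger spread along x that both classes share.
xs = [-10.0, -5.0, 0.0, 5.0, 10.0]
class_a = [(x, -1.0) for x in xs]
class_b = [(x, 1.0) for x in xs]
points = class_a + class_b


def covariance_2d(pts):
    """Return (var_x, var_y, cov_xy) with the population (1/n) convention."""
    n = len(pts)
    mx = sum(p[0] for p in pts) / n
    my = sum(p[1] for p in pts) / n
    vxx = sum((p[0] - mx) ** 2 for p in pts) / n
    vyy = sum((p[1] - my) ** 2 for p in pts) / n
    vxy = sum((p[0] - mx) * (p[1] - my) for p in pts) / n
    return vxx, vyy, vxy


def leading_eigenvector(vxx, vyy, vxy):
    """Unit eigenvector of [[vxx, vxy], [vxy, vyy]] for its largest eigenvalue (PC1)."""
    half_trace = (vxx + vyy) / 2
    det = vxx * vyy - vxy ** 2
    lam = half_trace + math.sqrt(half_trace ** 2 - det)
    if abs(vxy) > 1e-12:
        v = (lam - vyy, vxy)
    else:  # diagonal matrix: PC1 is the axis with the larger variance
        v = (1.0, 0.0) if vxx >= vyy else (0.0, 1.0)
    norm = math.hypot(v[0], v[1])
    return (v[0] / norm, v[1] / norm)


def class_gap(direction):
    """Absolute difference of the two class means after projecting onto `direction`."""
    def project(pts):
        return [p[0] * direction[0] + p[1] * direction[1] for p in pts]
    return abs(mean(project(class_a)) - mean(project(class_b)))


pc1 = leading_eigenvector(*covariance_2d(points))
print("PC1:", pc1)
print("class gap along PC1:", class_gap(pc1))
print("class gap along y:", class_gap((0.0, 1.0)))
```

On this data PC1 points along x, so the projected class means coincide, while the low-variance y direction separates them completely; a discriminant-style embedding is aimed at directions of the second kind.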


Subjects
Algorithms; Humans; Principal Component Analysis
2.
Nat Med ; 27(11): 1885-1892, 2021 11.
Article in English | MEDLINE | ID: mdl-34789871

ABSTRACT

The particularly interdisciplinary nature of human microbiome research makes the organization and reporting of results spanning epidemiology, biology, bioinformatics, translational medicine and statistics a challenge. Commonly used reporting guidelines for observational or genetic epidemiology studies lack key features specific to microbiome studies. Therefore, a multidisciplinary group of microbiome epidemiology researchers adapted guidelines for observational and genetic studies to culture-independent human microbiome studies, and also developed new reporting elements for laboratory, bioinformatics and statistical analyses tailored to microbiome studies. The resulting tool, called 'Strengthening The Organization and Reporting of Microbiome Studies' (STORMS), is composed of a 17-item checklist organized into six sections that correspond to the typical sections of a scientific publication, presented as an editable table for inclusion in supplementary materials. The STORMS checklist provides guidance for concise and complete reporting of microbiome studies that will facilitate manuscript preparation, peer review, and reader comprehension of publications and comparative analysis of published results.


Subjects
Computational Biology/methods; Dysbiosis/microbiology; Microbiota/physiology; Observational Studies as Topic/methods; Research Design; Humans; Translational Science, Biomedical
3.
PLoS Comput Biol ; 17(11): e1009442, 2021 11.
Article in English | MEDLINE | ID: mdl-34784344

ABSTRACT

It is challenging to associate features such as human health outcomes, diet, environmental conditions, or other metadata to microbial community measurements, due in part to their quantitative properties. Microbiome multi-omics are typically noisy, sparse (zero-inflated), high-dimensional, extremely non-normal, and often in the form of count or compositional measurements. Here we introduce an optimized combination of novel and established methodology to assess multivariable association of microbial community features with complex metadata in population-scale observational studies. Our approach, MaAsLin 2 (Microbiome Multivariable Associations with Linear Models), uses generalized linear and mixed models to accommodate a wide variety of modern epidemiological studies, including cross-sectional and longitudinal designs, as well as a variety of data types (e.g., counts and relative abundances) with or without covariates and repeated measurements. To construct this method, we conducted a large-scale evaluation of a broad range of scenarios under which straightforward identification of meta-omics associations can be challenging. These simulation studies reveal that MaAsLin 2's linear model preserves statistical power in the presence of repeated measures and multiple covariates, while accounting for the nuances of meta-omics features and controlling false discovery. We also applied MaAsLin 2 to a microbial multi-omics dataset from the Integrative Human Microbiome Project (HMP2), which, in addition to reproducing established results, revealed a unique, integrated landscape of inflammatory bowel diseases (IBD) across multiple time points and omics profiles.
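As a rough illustration of the per-feature modelling idea (normalize, transform, then fit a linear model against metadata), the Python sketch below applies total-sum scaling, a log2 transform with a small pseudocount, and an ordinary least-squares slope to a hypothetical two-feature count table. It is not MaAsLin 2's implementation, which is an R package that also handles mixed models, multiple covariates and false-discovery control; the pseudocount value, the covariate and all counts are assumptions.

```python
import math


def total_sum_scale(counts):
    """Convert one sample's counts to relative abundances (total-sum scaling)."""
    total = sum(counts)
    return [c / total for c in counts]


def log_transform(values, pseudocount):
    """log2 with a pseudocount so zero-inflated features do not hit log(0).

    The pseudocount value is an assumption of this sketch.
    """
    return [math.log2(v + pseudocount) for v in values]


def ols_slope(x, y):
    """Least-squares slope of y on x for a model with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Hypothetical count table: rows are samples, columns are microbial features.
counts = [
    [10, 90],
    [20, 80],
    [40, 60],
    [80, 20],
]
covariate = [0.0, 1.0, 2.0, 3.0]  # a hypothetical continuous metadata variable

rel = [total_sum_scale(row) for row in counts]
slopes = []
for j in range(len(counts[0])):
    feature = log_transform([r[j] for r in rel], pseudocount=1e-5)
    slopes.append(ols_slope(covariate, feature))
print(slopes)
```

Here the first feature's relative abundance doubles with each unit of the covariate, so its log2-scale slope comes out very close to 1, and the second feature gets a negative slope. A real analysis would also attach standard errors and p-values to each coefficient and correct them for multiple testing.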


Subjects
Computational Biology; Gastrointestinal Microbiome; Multivariate Analysis; Computer Simulation; Humans; Inflammatory Bowel Diseases/genetics; Inflammatory Bowel Diseases/metabolism; Inflammatory Bowel Diseases/pathology
4.
Nature ; 598(7879): 103-110, 2021 10.
Article in English | MEDLINE | ID: mdl-34616066

ABSTRACT

Single-cell transcriptomics can provide quantitative molecular signatures for large, unbiased samples of the diverse cell types in the brain [1-3]. With the proliferation of multi-omics datasets, a major challenge is to validate and integrate results into a biological understanding of cell-type organization. Here we generated transcriptomes and epigenomes from more than 500,000 individual cells in the mouse primary motor cortex, a structure that has an evolutionarily conserved role in locomotion. We developed computational and statistical methods to integrate multimodal data and quantitatively validate cell-type reproducibility. The resulting reference atlas, containing over 56 neuronal cell types that are highly replicable across analysis methods, sequencing technologies and modalities, is a comprehensive molecular and genomic account of the diverse neuronal and non-neuronal cell types in the mouse primary motor cortex. The atlas includes a population of excitatory neurons that resemble pyramidal cells in layer 4 in other cortical regions [4]. We further discovered thousands of concordant marker genes and gene regulatory elements for these cell types. Our results highlight the complex molecular regulation of cell types in the brain and will directly enable the design of reagents to target specific cell types in the mouse primary motor cortex for functional analysis.


Subjects
Epigenomics; Gene Expression Profiling; Motor Cortex/cytology; Neurons/classification; Single-Cell Analysis; Transcriptome; Animals; Atlases as Topic; Datasets as Topic; Epigenesis, Genetic; Female; Male; Mice; Motor Cortex/anatomy & histology; Neurons/cytology; Neurons/metabolism; Organ Specificity; Reproducibility of Results
6.
Bioinformatics ; 36(Suppl_1): i102-i110, 2020 07 01.
Article in English | MEDLINE | ID: mdl-32657377

ABSTRACT

MOTIVATION: Advances in sequencing technology, inference algorithms and differential testing methodology have enabled transcript-level analysis of RNA-seq data. Yet, the inherent inferential uncertainty in transcript-level abundance estimation, even among the most accurate approaches, means that robust transcript-level analysis often remains a challenge. Conversely, gene-level analysis remains a common and robust approach for understanding RNA-seq data, but it coarsens the resulting analysis to the level of genes, even if the data strongly support specific transcript-level effects. RESULTS: We introduce a new data-driven approach for grouping together transcripts in an experiment based on their inferential uncertainty. Transcripts that share large numbers of ambiguously-mapping fragments with other transcripts, in complex patterns, often cannot have their abundances confidently estimated. Yet, the total transcriptional output of that group of transcripts will have greatly reduced inferential uncertainty, thus allowing more robust and confident downstream analysis. Our approach, implemented in the tool terminus, groups together transcripts in a data-driven manner allowing transcript-level analysis where it can be confidently supported, and deriving transcriptional groups where the inferential uncertainty is too high to support a transcript-level result. AVAILABILITY AND IMPLEMENTATION: Terminus is implemented in Rust, and is freely available and open source. It can be obtained from https://github.com/COMBINE-lab/Terminus. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
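The premise above, that a group of ambiguous transcripts can have a far more certain total than any of its members, can be checked with a few lines of Python. The sketch below uses hypothetical inferential replicates (for example, bootstrap draws from a quantifier) for two transcripts whose estimates trade off against each other, and compares the coefficient of variation of each transcript with that of their sum. It only illustrates the premise; it is not terminus's grouping criterion.

```python
from statistics import mean, pstdev

# Hypothetical inferential replicates for two transcripts that share many
# ambiguously mapping fragments: when one estimate rises, the other tends to fall.
t1 = [100.0, 60.0, 140.0, 80.0, 120.0]
t2 = [52.0, 88.0, 14.0, 70.0, 28.0]


def cv(values):
    """Coefficient of variation: standard deviation relative to the mean."""
    return pstdev(values) / mean(values)


# Abundance of the would-be transcript group in each replicate.
group = [a + b for a, b in zip(t1, t2)]

print(f"CV t1 = {cv(t1):.3f}, CV t2 = {cv(t2):.3f}, CV group = {cv(group):.3f}")
```

Because the two estimates are strongly anti-correlated across replicates, their sum barely moves, which is the situation in which reporting a group-level abundance is better supported than reporting either transcript alone.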


Subjects
Gene Expression Profiling; Software; Algorithms; RNA-Seq; Sequence Analysis, RNA
7.
Microbiome ; 8(1): 35, 2020 03 13.
Article in English | MEDLINE | ID: mdl-32169095

ABSTRACT

BACKGROUND: There are a variety of bioinformatic pipelines and downstream analysis methods for analyzing 16S rRNA marker-gene surveys. However, appropriate assessment datasets and metrics are needed, as there is limited guidance for deciding between available analysis methods. Mixtures of environmental samples are useful for assessing analysis methods, as one can evaluate methods based on expected values calculated from unmixed sample measurements and the mixture design. Previous studies have used mixtures of environmental samples to assess other sequencing methods such as RNA-seq, but no studies have used mixtures of environmental samples to assess 16S rRNA sequencing. RESULTS: We developed a framework for assessing 16S rRNA sequencing analysis methods that uses a novel two-sample titration mixture dataset and metrics to evaluate qualitative and quantitative characteristics of count tables. Our qualitative assessment evaluates feature presence/absence, exploiting features present only in unmixed samples or titrations, by testing whether random sampling can account for their observed relative abundance. Our quantitative assessment evaluates feature relative and differential abundance by comparing observed and expected values. We demonstrated the framework by evaluating count tables generated with three commonly used bioinformatic pipelines: (i) DADA2, a sequence inference method; (ii) Mothur, a de novo clustering method; and (iii) QIIME, an open-reference clustering method. The qualitative assessment results indicated that the majority of Mothur and QIIME features present only in unmixed samples or titrations were accounted for by random sampling alone, but this was not the case for DADA2 features. Combined with count table sparsity (the proportion of zero-valued cells in a count table), these results indicate that DADA2 has a higher false-negative rate, whereas Mothur and QIIME have higher false-positive rates.
The quantitative assessment results indicated the observed relative abundance and differential abundance values were consistent with expected values for all three pipelines. CONCLUSIONS: We developed a novel framework for assessing 16S rRNA marker-gene survey methods and demonstrated the framework by evaluating count tables generated with three bioinformatic pipelines. This framework is a valuable community resource for assessing 16S rRNA marker-gene survey bioinformatic methods and will help scientists identify appropriate analysis methods for their marker-gene surveys.
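The quantitative assessment rests on computing expected values from the unmixed samples and the mixture design. A minimal Python sketch of that arithmetic follows, assuming the simplest possible model: a titration containing a fraction theta of sample A has an expected feature proportion equal to the theta-weighted average of the two unmixed proportions. The two-fold dilution series, the feature proportions and the observed values are all hypothetical, and the sketch ignores any correction for differing amounts of bacterial DNA in the two samples.

```python
def expected_relative_abundance(p_unmixed_a, p_unmixed_b, theta):
    """Expected feature proportion in a mixture containing a fraction theta of sample A."""
    if not 0.0 <= theta <= 1.0:
        raise ValueError("theta must lie in [0, 1]")
    return theta * p_unmixed_a + (1.0 - theta) * p_unmixed_b


# Hypothetical feature: 4% of reads in unmixed sample A, 0.5% in unmixed sample B.
p_a, p_b = 0.04, 0.005

# Hypothetical two-fold dilution series of A into B: 1/2, 1/4, 1/8, 1/16.
thetas = [1 / 2 ** t for t in range(1, 5)]
expected = [expected_relative_abundance(p_a, p_b, th) for th in thetas]

# Hypothetical observed proportions from a pipeline's count table.
observed = [0.021, 0.014, 0.009, 0.007]
errors = [obs - exp for obs, exp in zip(observed, expected)]

for th, exp, err in zip(thetas, expected, errors):
    print(f"theta={th:.4f}  expected={exp:.6f}  observed-expected={err:+.6f}")
```

Expected differential abundance between two titrations follows the same way, as the ratio (or log ratio) of their expected proportions.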


Subjects
Computational Biology/methods; Data Analysis; Microbiota/genetics; RNA, Ribosomal, 16S/genetics; Sequence Analysis, DNA/methods; Adult; Clinical Trials as Topic; Female; Genetic Markers; Humans; Male; Software; Young Adult
8.
F1000Res ; 8: 1769, 2019.
Article in English | MEDLINE | ID: mdl-32148761

ABSTRACT

An increasing emphasis on understanding the dynamics of microbial communities in various settings has led to the proliferation of longitudinal metagenomic sampling studies. Data from whole metagenomic shotgun sequencing and marker-gene survey studies have characteristics that drive novel statistical methodological development for estimating time intervals of differential abundance. When designing a study and choosing the sampling frequency in advance, one may wish to model the ability to detect an effect, for example because of constraints on cost or ease of access. Additionally, while every study is unique, in certain scenarios one statistical framework may be more appropriate than another. Here, we present a simulation paradigm implemented in the R Bioconductor software package microbiomeDASim, available at http://bioconductor.org/packages/microbiomeDASim. microbiomeDASim allows investigators to simulate longitudinally differentially abundant microbiome features with a variety of known functional forms, with flexible parameters to control the desired signal-to-noise ratio. We present performance metrics for one particular method, metaSplines.
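To make the simulation idea concrete, the Python sketch below generates longitudinal trajectories for a control group and a treated group, where treated subjects follow a flat baseline that turns into a linear increase after an onset time. Every name and parameter here is hypothetical and it does not reproduce the microbiomeDASim R API; in particular, it uses independent Gaussian noise, whereas real longitudinal measurements are usually correlated within a subject. The ratio slope / sigma plays the role of a signal-to-noise knob.

```python
import random


def simulate_subject(times, baseline, slope, onset, sigma, rng):
    """One subject's trajectory: flat baseline, then a linear increase after `onset`,
    plus independent Gaussian noise (a simplification of real longitudinal data)."""
    values = []
    for t in times:
        signal = baseline + slope * max(0.0, t - onset)
        values.append(signal + rng.gauss(0.0, sigma))
    return values


def simulate_study(n_per_group, times, baseline, slope, onset, sigma, seed=0):
    """Simulate control subjects (no change) and treated subjects (linear change)."""
    rng = random.Random(seed)  # seeded so simulations are reproducible
    control = [simulate_subject(times, baseline, 0.0, onset, sigma, rng)
               for _ in range(n_per_group)]
    treated = [simulate_subject(times, baseline, slope, onset, sigma, rng)
               for _ in range(n_per_group)]
    return control, treated


times = list(range(10))
control, treated = simulate_study(n_per_group=5, times=times, baseline=5.0,
                                  slope=0.5, onset=3, sigma=1.0, seed=42)
```

Repeating such simulations over a grid of slopes, noise levels, group sizes and sampling schedules, and recording how often a chosen method recovers the true interval of differential abundance, is one way to explore study design choices before collecting samples.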


Subjects
Microbiota; Software; Sequence Analysis, DNA
9.
F1000Res ; 7: 1096, 2018.
Article in English | MEDLINE | ID: mdl-30135734

ABSTRACT

Interactive and integrative data visualization tools and libraries are integral to exploration and analysis of genomic data. Web-based genome browsers allow integrative data exploration of a large number of data sets for a specific region in the genome. Currently available web-based genome browsers are developed for specific use cases and datasets; therefore, integrating and extending the visualizations and the underlying libraries from these tools is a challenging task. Genomic data visualization and software libraries that enable bioinformatic researchers and developers to implement customized genomic data viewers and data analyses for their applications are much needed. Using recent advances in core web platform APIs and technologies, including Web Components, we developed the Epiviz Component Library, a reusable and extensible data visualization library and application framework for genomic data. Epiviz Components can be integrated with most JavaScript libraries and frameworks designed for HTML. To demonstrate the ease of integration with other frameworks, we developed an R/Bioconductor package, epivizrChart, that provides interactive, shareable and reproducible visualizations of genomic data objects in R and Shiny, and can also create standalone HTML documents. The component library is modular by design, reusable and natively extensible, and therefore simplifies the process of managing and developing bioinformatic applications.


Subjects
Computer Graphics; Databases, Nucleic Acid; Genomics; Software; Web Browser
10.
Biostatistics ; 19(2): 185-198, 2018 04 01.
Article in English | MEDLINE | ID: mdl-29036413

ABSTRACT

Between-sample normalization is a critical step in genomic data analysis to remove systematic bias and unwanted technical variation in high-throughput data. Global normalization methods are based on the assumption that observed variability in global properties is due to technical reasons and is unrelated to the biology of interest. For example, some methods correct for differences in sequencing read counts by scaling features to have similar median values across samples, but these fail to reduce other forms of unwanted technical variation. Methods such as quantile normalization transform the statistical distributions across samples to be the same and assume global differences in the distribution are induced by only technical variation. However, it remains unclear how to proceed with normalization if these assumptions are violated, for example, if there are global differences in the statistical distributions between biological conditions or groups, and external information, such as negative or control features, is not available. Here, we introduce a generalization of quantile normalization, referred to as smooth quantile normalization (qsmooth), which is based on the assumption that the statistical distribution of each sample should be the same (or have the same distributional shape) within biological groups or conditions, but allowing that they may differ between groups. We illustrate the advantages of our method on several high-throughput datasets with global differences in distributions corresponding to different biological conditions. We also perform a Monte Carlo simulation study to illustrate the bias-variance tradeoff and root mean squared error of qsmooth compared to other global normalization methods. A software implementation is available from https://github.com/stephaniehicks/qsmooth.
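The idea of normalizing within groups while still borrowing from the overall distribution can be sketched as a blend of two reference distributions. The Python sketch below is a simplification, not the qsmooth package: it uses a single fixed weight w for all quantiles, whereas qsmooth estimates a quantile-specific weight from the data. With w = 1 it reduces to ordinary quantile normalization, and with w = 0 each biological group is quantile-normalized separately. It assumes every sample has the same number of features and no ties; the example data are hypothetical.

```python
from statistics import mean


def reference_quantiles(samples):
    """Average the sorted values of several samples rank by rank."""
    sorted_samples = [sorted(s) for s in samples]
    return [mean(vals) for vals in zip(*sorted_samples)]


def smooth_quantile_normalize(samples, groups, w):
    """Blend overall and within-group reference quantiles with a fixed weight w.

    w = 1 gives ordinary quantile normalization; w = 0 normalizes each group
    separately. Assumes equal-length samples and no ties.
    """
    overall = reference_quantiles(samples)
    group_refs = {
        g: reference_quantiles([s for s, gs in zip(samples, groups) if gs == g])
        for g in set(groups)
    }
    normalized = []
    for sample, g in zip(samples, groups):
        target = [w * o + (1 - w) * q for o, q in zip(overall, group_refs[g])]
        # Feature indices ordered from smallest to largest value in this sample.
        order = sorted(range(len(sample)), key=lambda i: sample[i])
        out = [0.0] * len(sample)
        for rank, idx in enumerate(order):
            out[idx] = target[rank]
        normalized.append(out)
    return normalized


# Hypothetical data: two groups whose distributions differ globally.
samples = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [10.0, 20.0, 30.0], [12.0, 22.0, 32.0]]
groups = ["a", "a", "b", "b"]
for w in (1.0, 0.5, 0.0):
    print(w, smooth_quantile_normalize(samples, groups, w))
```

With w = 1 the global difference between groups "a" and "b" is erased, which is exactly the failure mode the abstract describes when that difference is biological; with w = 0 it is preserved.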


Subjects
Biostatistics/methods; Data Interpretation, Statistical; Genomics/statistics & numerical data; High-Throughput Nucleotide Sequencing/statistics & numerical data; Models, Statistical; Humans
11.
Bioinformatics ; 32(24): 3836-3838, 2016 12 15.
Article in English | MEDLINE | ID: mdl-27540268

ABSTRACT

Sequencing and microarray samples are often collected or processed in multiple batches or at different times. This often produces technical biases that can lead to incorrect results in downstream analysis. There are several existing batch adjustment tools for '-omics' data, but they do not indicate a priori whether adjustment needs to be conducted or how correction should be applied. We present a software pipeline, BatchQC, which addresses these issues using interactive visualizations and statistics that evaluate the impact of batch effects in a genomic dataset. BatchQC can also apply existing adjustment tools and allow users to evaluate their benefits interactively. We used the BatchQC pipeline on both simulated and real data to demonstrate the effectiveness of this software toolkit. AVAILABILITY AND IMPLEMENTATION: BatchQC is available through Bioconductor: http://bioconductor.org/packages/BatchQC and GitHub: https://github.com/mani2012/BatchQC CONTACT: wej@bu.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
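One simple statistic of the kind used to judge whether batch adjustment is needed is the share of a feature's variance explained by batch labels. The Python sketch below computes it (between-batch sum of squares over total sum of squares) for a single hypothetical gene, then applies a naive location-only adjustment that subtracts each batch mean and adds back the grand mean. Both pieces are generic illustrations chosen for this example, not BatchQC's own statistics or the adjustment tools it wraps.

```python
from statistics import mean


def batch_r2(values, batches):
    """Fraction of a feature's total variance explained by batch (between-batch SS / total SS)."""
    grand = mean(values)
    ss_total = sum((v - grand) ** 2 for v in values)
    ss_between = 0.0
    for b in set(batches):
        members = [v for v, bb in zip(values, batches) if bb == b]
        ss_between += len(members) * (mean(members) - grand) ** 2
    return ss_between / ss_total if ss_total > 0 else 0.0


def center_by_batch(values, batches):
    """Naive location-only adjustment: subtract each batch mean, add back the grand mean."""
    grand = mean(values)
    batch_means = {
        b: mean([v for v, bb in zip(values, batches) if bb == b]) for b in set(batches)
    }
    return [v - batch_means[b] + grand for v, b in zip(values, batches)]


expr = [5.1, 4.9, 5.0, 7.0, 7.2, 6.8]  # hypothetical expression of one gene
batch = ["b1", "b1", "b1", "b2", "b2", "b2"]

before = batch_r2(expr, batch)
after = batch_r2(center_by_batch(expr, batch), batch)
print(f"variance explained by batch: before={before:.3f}, after={after:.3g}")
```

A high value before adjustment flags a possible batch effect, but if batch is confounded with a biological condition, this kind of centering would also remove the biology, which is why evaluating the design first matters.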


Subjects
Computational Biology/methods; Genomics/methods; Software; Genome; Humans; User-Computer Interface
12.
Genome Res ; 26(8): 1110-23, 2016 08.
Article in English | MEDLINE | ID: mdl-27311443

ABSTRACT

Complex gene expression patterns are mediated by the binding of transcription factors (TFs) to specific genomic loci. The in vivo occupancy of a TF is, in large part, determined by the TF's DNA binding interaction partners, motivating genomic context-based models of TF occupancy. However, approaches thus far have assumed a uniform TF binding model to explain genome-wide cell-type-specific binding sites. Therefore, the cell type heterogeneity of TF occupancy models, as well as the extent to which binding rules underlying a TF's occupancy are shared across cell types, has not been investigated. Here, we develop an ensemble-based approach (TRISECT) to identify the heterogeneous binding rules for cell-type-specific TF occupancy and analyze the inter-cell-type sharing of such rules. Comprehensive analysis of 23 TFs, each with ChIP-seq data in four to 12 different cell types, shows that by explicitly capturing the heterogeneity of binding rules, TRISECT accurately identifies in vivo TF occupancy. Importantly, many of the binding rules derived from individual cell types are shared across cell types and reveal distinct yet functionally coherent putative target genes in different cell types. Closer inspection of the predicted cell-type-specific interaction partners provides insights into the context-specific functional landscape of a TF. Together, our novel ensemble-based approach reveals, for the first time, a widespread heterogeneity of binding rules, comprising the interaction partners within a cell type, many of which nevertheless transcend cell types. Notably, the putative targets of shared binding rules in different cell types, while distinct, exhibit significant functional coherence.


Subjects
DNA-Binding Proteins/genetics; Genetic Heterogeneity; Protein Binding/genetics; Transcription Factors/genetics; Binding Sites/genetics; Cell Lineage/genetics; Computational Biology; Gene Expression Regulation; Genomics; Humans; Sensitivity and Specificity
13.
BMC Genomics ; 17: 440, 2016 06 08.
Article in English | MEDLINE | ID: mdl-27277524

ABSTRACT

BACKGROUND: Enterotoxigenic Escherichia coli (ETEC) is a major cause of diarrhea in inhabitants of low-income countries and in visitors to these countries. The impact of the human intestinal microbiota on the initiation and progression of ETEC diarrhea is not yet well understood. RESULTS: We used 16S rRNA (ribosomal RNA) gene sequencing to study changes in the fecal microbiota of 12 volunteers during a human challenge study with ETEC (H10407) and subsequent treatment with ciprofloxacin. Five subjects developed severe diarrhea and seven experienced few or no symptoms. Diarrheal symptoms were associated with high concentrations of fecal E. coli as measured by quantitative culture, quantitative PCR, and normalized number of 16S rRNA gene sequences. Large changes in other members of the microbiota varied greatly from individual to individual, whether or not diarrhea occurred. Nonetheless, the variation within an individual was small compared to the variation between individuals. Ciprofloxacin treatment reorganized microbiota populations; however, the original structure was largely restored at one- and three-month follow-up visits. CONCLUSION: Symptomatic ETEC infections, but not asymptomatic infections, were associated with high fecal concentrations of E. coli. Both infection and ciprofloxacin treatment caused variable changes in other bacteria that generally reverted to baseline levels after three months.


Subjects
Ciprofloxacin/therapeutic use; Enterotoxigenic Escherichia coli/drug effects; Enterotoxigenic Escherichia coli/physiology; Escherichia coli Infections/drug therapy; Escherichia coli Infections/microbiology; Gastrointestinal Microbiome/drug effects; Adult; Ciprofloxacin/pharmacology; Diarrhea/drug therapy; Diarrhea/microbiology; Feces/microbiology; Female; High-Throughput Nucleotide Sequencing; Humans; Male; Metagenome; Metagenomics/methods; Middle Aged; RNA, Ribosomal, 16S; ROC Curve; Treatment Outcome; Young Adult
14.
mBio ; 7(3)2016 05 10.
Article in English | MEDLINE | ID: mdl-27165796

ABSTRACT

UNLABELLED: Macrophages are mononuclear phagocytes that constitute a first line of defense against pathogens. While lethal to many microbes, they are the primary host cells of Leishmania spp. parasites, the obligate intracellular pathogens that cause leishmaniasis. We conducted transcriptomic profiling of two Leishmania species and the human macrophage over the course of intracellular infection by using high-throughput RNA sequencing to characterize the global gene expression changes and reprogramming events that underlie the interactions between the pathogen and its host. A systematic exclusion of the generic effects of large-particle phagocytosis revealed a vigorous, parasite-specific response of the human macrophage early in the infection that was greatly tempered at later time points. An analogous temporal expression pattern was observed with the parasite, suggesting that much of the reprogramming that occurs as parasites transform into intracellular forms generally stabilizes shortly after entry. Following that, the parasite establishes an intracellular niche within macrophages, with minimal communication between the parasite and the host cell later during the infection. No significant difference was observed between parasite species transcriptomes or in the transcriptional response of macrophages infected with each species. Our comparative analysis of gene expression changes that occur as mouse and human macrophages are infected by Leishmania spp. points toward a general signature of the Leishmania-macrophage infectome. IMPORTANCE: Little is known about the transcriptional changes that occur within mammalian cells harboring intracellular pathogens. This study characterizes the gene expression signatures of Leishmania spp. parasites and the coordinated response of infected human macrophages as the pathogen enters and persists within them. 
After accounting for the generic effects of large-particle phagocytosis, we observed a parasite-specific response of the human macrophages early in infection that was reduced at later time points. A similar expression pattern was observed in the parasites. Our analyses provide specific insights into the interplay between human macrophages and Leishmania parasites and constitute an important general resource for the study of how pathogens evade host defenses and modulate the functions of the cell to survive intracellularly.


Subjects
Host-Parasite Interactions; Leishmania/genetics; Macrophages/parasitology; Transcriptome; Animals; Gene Expression Profiling/methods; High-Throughput Nucleotide Sequencing; Humans; Leishmania/immunology; Leishmania/metabolism; Macrophages/immunology; Macrophages/metabolism; Metabolic Networks and Pathways/genetics; Mice; Phagocytosis
15.
BMC Cancer ; 16: 88, 2016 Feb 11.
Article in English | MEDLINE | ID: mdl-26868017

ABSTRACT

BACKGROUND: Large, megabase-scale genomic regions show robust alterations in DNA methylation levels in multiple cancers. The vast majority of these regions are hypomethylated in cancers. These regions are generally enriched for CpG islands, Lamin Associated Domains and Large organized chromatin lysine modification domains, and are associated with stochastic variability in gene expression. Given the size and consistency of hypomethylated blocks (HMB) across cancer types, we hypothesized that the immediate causes of methylation instability are likely to be encoded in the genomic region near HMB boundaries, in terms of specific genomic or epigenomic signatures. However, a detailed characterization of the HMB boundaries has not been reported. METHOD: Here, we focused on ~13,000 HMBs, encompassing approximately half of the genome, identified in colon cancer. We modeled the genomic features of HMB boundaries with a Random Forest to identify their salient features, in terms of transcription factor (TF) binding motifs. Additionally, we analyzed various epigenomic marks and chromatin structural features of HMB boundaries relative to non-HMB genomic regions. RESULT: We found that the classical promoter epigenomic mark H3K4me3 is highly enriched at HMB boundaries, as are CTCF-bound sites. HMB boundaries harbor distinct combinations of TF motifs. Our Random Forest model based on TF motifs can accurately distinguish boundaries not only from regions inside and outside HMBs but, surprisingly, from active promoters as well. Interestingly, the distinguishing TFs and their interacting proteins are involved in chromatin modification. Finally, HMB boundaries significantly coincide with the boundaries of Topologically Associating Domains of the chromatin. CONCLUSION: Our analyses suggest that the overall architecture of HMBs is guided by pre-existing chromatin architecture and is associated with aberrant activity of promoter-like sequences at the boundary.


Subjects
Colonic Neoplasms/genetics; DNA Methylation/genetics; Epigenomics; Genome, Human; Cell Line, Tumor; Chromatin/genetics; Colonic Neoplasms/pathology; CpG Islands/genetics; Histones/genetics; Humans; Promoter Regions, Genetic
16.
Cancer Inform ; 14: 71-81, 2015.
Article in English | MEDLINE | ID: mdl-26078586

ABSTRACT

Gene expression signatures are commonly used to create cancer prognosis and diagnosis methods, yet only a small number of them are successfully deployed in the clinic since many fail to replicate performance on subsequent validation. A primary reason for this lack of reproducibility is the fact that these signatures attempt to model the highly variable and unstable genomic behavior of cancer. Our group recently introduced gene expression anti-profiles as a robust methodology to derive gene expression signatures based on the observation that while gene expression measurements are highly heterogeneous across tumors of a specific cancer type relative to the normal tissue, their degree of deviation from normal tissue expression in specific genes involved in tissue differentiation is a stable tumor mark that is reproducible across experiments and cancer types. Here we show that constructing gene expression signatures based on variability and the anti-profile approach yields classifiers capable of successfully distinguishing benign growths from cancerous growths based on deviation from normal expression. We then show that this same approach generates stable and reproducible signatures that predict probability of relapse and survival based on tumor gene expression. These results suggest that using the anti-profile framework for the discovery of genomic signatures is an avenue leading to the development of reproducible signatures suitable for adoption in clinical settings.
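The anti-profile idea of scoring a sample by how far it deviates from normal tissue can be sketched in a few lines of Python. In this sketch, which is a simplification under stated assumptions rather than the authors' implementation, genes are selected by a tumor-to-normal variance ratio, each selected gene gets a "normal region" of median plus or minus k times the median absolute deviation across normal samples, and a sample's score is the number of selected genes falling outside their normal region. The selection rule, the value k = 5 and all expression values are hypothetical.

```python
from statistics import median, pvariance


def mad(values):
    """Median absolute deviation (unscaled)."""
    m = median(values)
    return median([abs(v - m) for v in values])


def select_genes(normal, tumor, n_genes):
    """Rank genes by tumor-to-normal variance ratio (a hypothetical selection rule)."""
    ratios = []
    for g in range(len(normal[0])):
        vn = pvariance([s[g] for s in normal])
        vt = pvariance([s[g] for s in tumor])
        ratios.append((vt / vn if vn > 0 else float("inf"), g))
    ratios.sort(reverse=True)
    return [g for _, g in ratios[:n_genes]]


def normal_regions(normal, genes, k):
    """Per-gene interval median +/- k * MAD computed from normal samples."""
    regions = {}
    for g in genes:
        vals = [s[g] for s in normal]
        m, d = median(vals), mad(vals)
        regions[g] = (m - k * d, m + k * d)
    return regions


def anti_profile_score(sample, regions):
    """Count how many signature genes fall outside their normal expression region."""
    return sum(1 for g, (lo, hi) in regions.items() if not lo <= sample[g] <= hi)


# Hypothetical expression matrices: rows are samples, columns are genes.
normal = [[5.0, 3.0, 8.0], [5.2, 3.1, 8.1], [4.8, 2.9, 7.9], [5.1, 3.0, 8.2]]
tumor = [[9.0, 3.0, 2.0], [1.0, 3.1, 12.0], [7.5, 2.9, 8.0]]

selected = select_genes(normal, tumor, n_genes=2)
regions = normal_regions(normal, selected, k=5.0)
print(selected, anti_profile_score([5.0, 3.0, 8.0], regions),
      anti_profile_score([9.0, 3.0, 2.0], regions))
```

Note that the score does not ask tumors to agree with each other in direction, only to deviate from the tightly regulated normal range, which is what makes this kind of signature robust to tumor heterogeneity; a decision threshold on the score would have to be chosen on training data.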

17.
Nat Methods ; 12(2): 115-21, 2015 Feb.
Article in English | MEDLINE | ID: mdl-25633503

ABSTRACT

Bioconductor is an open-source, open-development software project for the analysis and comprehension of high-throughput data in genomics and molecular biology. The project aims to enable interdisciplinary research, collaboration and rapid development of scientific software. Based on the statistical programming language R, Bioconductor comprises 934 interoperable packages contributed by a large, diverse community of scientists. Packages cover a range of bioinformatic and statistical applications. They undergo formal initial review and continuous automated testing. We present an overview for prospective users and contributors.


Subjects
Computational Biology; Gene Expression Profiling; Genomics/methods; High-Throughput Screening Assays/methods; Software; Programming Languages; User-Computer Interface
18.
Genome Med ; 6(8): 61, 2014.
Article in English | MEDLINE | ID: mdl-25191524

ABSTRACT

BACKGROUND: One of the most provocative recent observations in cancer epigenetics is the discovery of large hypomethylated blocks, including single-copy genes, in colorectal cancer that correspond in location to heterochromatic LOCKs (large organized chromatin lysine-modifications) and LADs (lamin-associated domains). METHODS: Here we performed a comprehensive genome-scale analysis of 10 breast, 28 colon, nine lung, 38 thyroid, 18 pancreas cancers, and five pancreas neuroendocrine tumors, as well as matched normal tissue from most of these cases and 51 premalignant lesions. We used a new statistical approach that allows the identification of large hypomethylated blocks on the Illumina HumanMethylation450 BeadChip platform. RESULTS: We find that hypomethylated blocks are a universal feature of common solid human cancer, and that they occur at the earliest stage of premalignant tumors and progress through clinical stages of thyroid and colon cancer development. We also find that the disrupted CpG islands widely reported previously, including hypermethylated island bodies and hypomethylated shores, are enriched in hypomethylated blocks, with flattening of the methylation signal within and flanking the islands. Finally, we found that genes showing higher between-individual gene expression variability are enriched within these hypomethylated blocks. CONCLUSION: Thus, hypomethylated blocks appear to be a universal defining epigenetic alteration in human cancer, at least for common solid tumors.

19.
Nat Methods ; 11(9): 938-40, 2014 Sep.
Article in English | MEDLINE | ID: mdl-25086505

ABSTRACT

Visualization is an integral aspect of genomics data analysis. Algorithmic-statistical analysis and interactive visualization are most effective when used iteratively. Epiviz (http://epiviz.cbcb.umd.edu/), a web-based genome browser, and the Epivizr Bioconductor package allow interactive, extensible and reproducible visualization within a state-of-the-art data-analysis platform.


Subjects
Chromosome Mapping/methods; Data Mining/methods; Databases, Genetic; Genomics/methods; Internet; Software; User-Computer Interface; Algorithms; Database Management Systems