Results 1 - 20 of 24
1.
Bioinformatics ; 36(18): 4682-4690, 2020 09 15.
Article in English | MEDLINE | ID: mdl-32618995

ABSTRACT

MOTIVATION: Genomic data repositories like The Cancer Genome Atlas, Encyclopedia of DNA Elements, Bioconductor's AnnotationHub and ExperimentHub etc., provide public access to large amounts of genomic data as flat files. Researchers often download a subset of data files from these repositories to perform exploratory data analysis. We developed Epiviz File Server, a Python library that implements an in situ data query system for local or remotely hosted indexed genomic files, not only for visualization but also data transformation. The File Server library decouples data retrieval and transformation from specific visualization and analysis tools and provides an abstract interface to define computations independent of the location, format or structure of the file. We demonstrate the File Server in two use cases: (i) integration with Galaxy workflows and (ii) using Epiviz to create a custom genome browser from the Epigenome Roadmap dataset. AVAILABILITY AND IMPLEMENTATION: Epiviz File Server is open source and is available on GitHub at http://github.com/epiviz/epivizFileServer. The documentation for the File Server library is available at http://epivizfileserver.rtfd.io.


Subjects
Genome , Genomics , Computers , Information Storage and Retrieval , Software
2.
Bioinformatics ; 36(7): 2195-2201, 2020 04 01.
Article in English | MEDLINE | ID: mdl-31782758

ABSTRACT

MOTIVATION: Integrative analysis of genomic data that includes statistical methods in combination with visual exploration has gained widespread adoption. Many existing methods involve a combination of tools and resources: user interfaces that provide visualization of large genomic datasets, and computational environments that focus on data analyses over various subsets of a given dataset. Over the last few years, we have developed Epiviz as an integrative and interactive genomic data analysis tool that incorporates visualization tightly with a state-of-the-art statistical analysis framework. RESULTS: In this article, we present Epiviz Feed, a proactive and automatic visual analytics system integrated with Epiviz that alleviates the burden of manually executing data analysis required to test biologically meaningful hypotheses. Results of interest that are proactively identified by server-side computations are listed as notifications in a feed. The feed turns genomic data analysis into a collaborative work between the analyst and the computational environment, which shortens the analysis time and allows the analyst to explore results efficiently. We discuss three ways in which the proposed system advances the field of genomic data analysis: (i) it takes the first step of proactive data analysis by utilizing available CPU power from the server to automate the analysis process; (ii) it summarizes hypothesis test results in a way that analysts can easily understand and investigate; (iii) it enables filtering and grouping of analysis results for quick search. This effort provides initial work on systems that substantially expand how computational and visualization frameworks can be tightly integrated to facilitate interactive genomic data analysis. AVAILABILITY AND IMPLEMENTATION: The source code for Epiviz Feed application is available at http://github.com/epiviz/epiviz_feed_polymer. The Epiviz Computational Server is available at http://github.com/epiviz/epiviz-feed-computation. Please refer to the Epiviz documentation site for details: http://epiviz.github.io/.


Subjects
Genomics , Software , Genome , Research Design
3.
Bioinformatics ; 35(19): 3870-3872, 2019 10 01.
Article in English | MEDLINE | ID: mdl-30821316

ABSTRACT

SUMMARY: We developed the metagenomeFeatures R Bioconductor package along with annotation packages for three 16S rRNA databases (Greengenes, RDP and SILVA) to facilitate working with 16S rRNA databases and marker-gene survey feature data. The metagenomeFeatures package defines two classes, MgDb for working with 16S rRNA sequence databases, and mgFeatures for marker-gene survey feature data. The associated annotation packages provide a consistent interface to the different databases facilitating database comparison and exploration. The mgFeatures-class represents a crucial step in the development of a common data structure for working with 16S marker-gene survey data in R. AVAILABILITY AND IMPLEMENTATION: https://bioconductor.org/packages/release/bioc/html/metagenomeFeatures.html. SUPPLEMENTARY INFORMATION: Supplementary material is available at Bioinformatics online.


Subjects
Nucleic Acid Databases , Software , 16S Ribosomal RNA , Surveys and Questionnaires
4.
J Immunol ; 201(4): 1154-1164, 2018 08 15.
Article in English | MEDLINE | ID: mdl-29997126

ABSTRACT

The uptake and destruction of bacteria by phagocytic cells is an essential defense mechanism in metazoans. To identify novel genes involved in the phagocytosis of Staphylococcus aureus, a major human pathogen, we assessed the phagocytic capacity of adult blood cells (hemocytes) of the fruit fly, Drosophila melanogaster, by testing several lines of the Drosophila Genetic Reference Panel. Natural genetic variation in the gene RNA-binding Fox protein 1 (Rbfox1) correlated with low phagocytic capacity in hemocytes, pointing to Rbfox1 as a candidate regulator of phagocytosis. Loss of Rbfox1 resulted in increased expression of the Ig superfamily member Down syndrome adhesion molecule 4 (Dscam4). Silencing of Dscam4 in Rbfox1-depleted blood cells rescued the fly's cellular immune response to S. aureus, indicating that downregulation of Dscam4 by Rbfox1 is critical for S. aureus phagocytosis in Drosophila. To our knowledge, this study is the first to demonstrate a link between Rbfox1, Dscam4, and host defense against S. aureus.


Subjects
Drosophila Proteins/metabolism , Drosophila melanogaster/immunology , Hemocytes/immunology , Cellular Immunity , RNA Splicing Factors/metabolism , RNA-Binding Proteins/metabolism , Staphylococcal Infections/immunology , Staphylococcus aureus/physiology , Animals , Cell Adhesion Molecules/genetics , Cell Adhesion Molecules/metabolism , Drosophila Proteins/genetics , Gene Knockout Techniques , Humans , Phagocytosis , RNA Splicing Factors/genetics , RNA-Binding Proteins/genetics , Staphylococcal Infections/genetics
5.
Nucleic Acids Res ; 46(6): 2777-2787, 2018 04 06.
Article in English | MEDLINE | ID: mdl-29529268

ABSTRACT

Large studies profiling microbial communities and their association with healthy or disease phenotypes are now commonplace. Processed data from many of these studies are publicly available but significant effort is required for users to effectively organize, explore and integrate it, limiting the utility of these rich data resources. Effective integrative and interactive visual and statistical tools to analyze many metagenomic samples can greatly increase the value of these data for researchers. We present Metaviz, a tool for interactive exploratory data analysis of annotated microbiome taxonomic community profiles derived from marker gene or whole metagenome shotgun sequencing. Metaviz is uniquely designed to address the challenge of browsing the hierarchical structure of metagenomic data features while rendering visualizations of data values that are dynamically updated in response to user navigation. We use Metaviz to provide the UMD Metagenome Browser web service, allowing users to browse and explore data for more than 7000 microbiomes from published studies. Users can also deploy Metaviz as a web service, or use it to analyze data through the metavizr package to interoperate with state-of-the-art analysis tools available through Bioconductor. Metaviz is free and open source with the code, documentation and tutorials publicly accessible.
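The hierarchical browsing described in this abstract implies rolling feature counts up a taxonomy as the user moves between levels. A minimal sketch of that aggregation step follows, assuming toy lineages and counts invented for illustration; the function name `aggregate_at_depth` and the data layout are hypothetical and are not taken from Metaviz or metavizr.

```python
from collections import defaultdict

# Hypothetical toy data: each feature has a lineage (root to leaf) and
# per-sample counts. Names and values are illustrative only.
LINEAGES = {
    "otu1": ("Bacteria", "Firmicutes", "Clostridia"),
    "otu2": ("Bacteria", "Firmicutes", "Bacilli"),
    "otu3": ("Bacteria", "Proteobacteria", "Gammaproteobacteria"),
}
COUNTS = {
    "otu1": {"s1": 10, "s2": 0},
    "otu2": {"s1": 5, "s2": 7},
    "otu3": {"s1": 1, "s2": 20},
}


def aggregate_at_depth(lineages, counts, depth):
    """Sum per-sample counts over all features sharing an ancestor at `depth`."""
    totals = defaultdict(lambda: defaultdict(int))
    for feature, lineage in lineages.items():
        node = lineage[depth]
        for sample, value in counts[feature].items():
            totals[node][sample] += value
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {node: dict(samples) for node, samples in totals.items()}


# Depth 1 corresponds to the phylum level in this toy hierarchy.
phylum_level = aggregate_at_depth(LINEAGES, COUNTS, depth=1)
```

Changing `depth` re-aggregates the same leaf counts at a coarser or finer level, which is the kind of recomputation a navigation event would trigger.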


Subjects
Computational Biology/methods , Metagenome/genetics , Metagenomics/methods , Whole Genome Sequencing/methods , Bacteria/classification , Bacteria/genetics , Child , Computational Biology/statistics & numerical data , Diarrhea/diagnosis , Diarrhea/genetics , Humans , Internet , Metagenomics/statistics & numerical data , Reproducibility of Results , Web Browser , Whole Genome Sequencing/statistics & numerical data
6.
BMC Bioinformatics ; 20(1): 421, 2019 Aug 13.
Article in English | MEDLINE | ID: mdl-31409274

ABSTRACT

BACKGROUND: Ultra-fast pseudo-alignment approaches are the tool of choice in transcript-level RNA sequencing (RNA-seq) analyses. Unfortunately, these methods couple the tasks of pseudo-alignment and transcript quantification. This coupling precludes the direct use of pseudo-alignment in other expression analyses, including alternative splicing or differential gene expression analysis, without including a non-essential transcript quantification step. RESULTS: In this paper, we introduce a transcriptome segmentation approach to decouple these two tasks. We propose an efficient algorithm to generate maximal disjoint segments given a transcriptome reference library on which ultra-fast pseudo-alignment can be used to produce per-sample segment counts. We show how to apply these maximally unambiguous count statistics in two specific expression analyses - alternative splicing and gene differential expression - without the need for a transcript quantification step. Our experiments based on simulated and experimental data showed that the use of segment counts, like other methods that rely on local coverage statistics, provides an advantage over approaches that rely on transcript quantification in detecting and correctly estimating local splicing in the case of incomplete transcript annotations. CONCLUSIONS: The transcriptome segmentation approach implemented in Yanagi exploits the computational and space efficiency of pseudo-alignment approaches. It significantly expands their applicability and interpretability in a variety of RNA-seq analyses by providing the means to model and capture local coverage variation in these analyses.
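To make the notion of disjoint segments concrete, the sketch below cuts annotated exon intervals at every exon boundary and labels each piece with the transcripts that cover it. This is a deliberately simplified illustration, not the Yanagi algorithm: it ignores splice junctions, read length and any merging needed for maximality, and the function name and toy coordinates are assumptions made for the example.

```python
def disjoint_segments(transcripts):
    """Split exon intervals into disjoint pieces labeled by covering transcripts.

    `transcripts` maps a transcript name to a list of half-open exon
    intervals (start, end). Returns a list of (start, end, names) tuples.
    """
    # Every exon boundary is a potential segment boundary.
    cuts = sorted({p for exons in transcripts.values() for iv in exons for p in iv})
    segments = []
    for start, end in zip(cuts, cuts[1:]):
        covering = frozenset(
            name
            for name, exons in transcripts.items()
            if any(s <= start and end <= e for s, e in exons)
        )
        if covering:  # skip intronic or intergenic gaps
            segments.append((start, end, covering))
    return segments


# Hypothetical two-transcript gene: tx2 uses a shorter second exon.
TOY = {
    "tx1": [(0, 100), (200, 300)],
    "tx2": [(0, 100), (250, 300)],
}
segs = disjoint_segments(TOY)
```

In this toy case the region 200-250 is covered by `tx1` alone, so reads counted there are unambiguous evidence for that transcript's longer exon without any transcript-level quantification.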


Subjects
Algorithms , Transcriptome , Alternative Splicing , Animals , Area Under Curve , Drosophila/genetics , Humans , RNA/chemistry , RNA/metabolism , ROC Curve , RNA Sequence Analysis
7.
BMC Genomics ; 19(1): 799, 2018 Nov 06.
Article in English | MEDLINE | ID: mdl-30400812

ABSTRACT

BACKGROUND: Count data derived from high-throughput deoxyribonucleic acid (DNA) sequencing is frequently used in quantitative molecular assays. Due to properties inherent to the sequencing process, unnormalized count data is compositional, measuring relative and not absolute abundances of the assayed features. This compositional bias confounds inference of absolute abundances. Commonly used count data normalization approaches like library size scaling/rarefaction/subsampling cannot correct for compositional or any other relevant technical bias that is uncorrelated with library size. RESULTS: We demonstrate that existing techniques for estimating compositional bias fail with sparse metagenomic 16S count data and propose an empirical Bayes normalization approach to overcome this problem. In addition, we clarify the assumptions underlying frequently used scaling normalization methods in light of compositional bias, including scaling methods that were not designed directly to address it. CONCLUSIONS: Compositional bias, induced by the sequencing machine, confounds inferences of absolute abundances. We present a normalization technique for compositional bias correction in sparse sequencing count data, and demonstrate its improved performance in metagenomic 16S survey data. Based on the distribution of technical bias estimates arising from several publicly available large-scale 16S count datasets, we argue that detailed experiments specifically addressing the influence of compositional bias in metagenomics are needed.
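A small numerical illustration may help show why library size scaling cannot remove compositional bias. The sketch below uses hypothetical absolute abundances (invented for this example) in which only one feature changes, and shows that scaling to proportions makes the unchanged features appear to change as well. It illustrates the problem only; it is not the empirical Bayes method proposed in the paper.

```python
def library_size_scale(counts):
    """Scale one sample's counts to proportions (library size normalization)."""
    total = sum(counts)
    return [c / total for c in counts]


# Hypothetical absolute abundances for three features in two conditions.
# Only feature 0 changes (tenfold increase); features 1 and 2 are constant.
absolute_a = [100, 100, 100]
absolute_b = [1000, 100, 100]

prop_a = library_size_scale(absolute_a)
prop_b = library_size_scale(absolute_b)

# Apparent fold change of each feature after scaling to proportions.
apparent_fold = [b / a for a, b in zip(prop_a, prop_b)]
```

Here the true fold changes are 10, 1 and 1, but the proportions give 2.5, 0.25 and 0.25: the two constant features appear to drop fourfold purely because the total changed.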


Subjects
Algorithms , Computational Biology/methods , High-Throughput Nucleotide Sequencing/methods , Metagenomics/methods , Microbiota , 16S Ribosomal RNA/genetics , Bayes Theorem
8.
PLoS Pathog ; 12(4): e1005511, 2016 Apr.
Article in English | MEDLINE | ID: mdl-27046031

ABSTRACT

Intracellular colonization and persistent infection by the kinetoplastid protozoan parasite, Trypanosoma cruzi, underlie the pathogenesis of human Chagas disease. To obtain global insights into the T. cruzi infective process, transcriptome dynamics were simultaneously captured in the parasite and host cells in an infection time course of human fibroblasts. Extensive remodeling of the T. cruzi transcriptome was observed during the early establishment of intracellular infection, coincident with a major developmental transition in the parasite. Contrasting this early response, few additional changes in steady state mRNA levels were detected once mature T. cruzi amastigotes were formed. Our findings suggest that transcriptome remodeling is required to establish a modified template to guide developmental transitions in the parasite, whereas homeostatic functions are regulated independently of transcriptomic changes, similar to that reported in related trypanosomatids. Despite complex mechanisms for regulation of phenotypic expression in T. cruzi, transcriptomic signatures derived from distinct developmental stages mirror known or projected characteristics of T. cruzi biology. Focusing on energy metabolism, we were able to validate predictions forecast in the mRNA expression profiles. We demonstrate measurable differences in the bioenergetic properties of the different mammalian-infective stages of T. cruzi and present additional findings that underscore the importance of mitochondrial electron transport in T. cruzi amastigote growth and survival. Consequences of T. cruzi colonization for the host include dynamic expression of immune response genes and cell cycle regulators with upregulation of host cholesterol and lipid synthesis pathways, which may serve to fuel intracellular T. cruzi growth. 
Thus, in addition to the biological inferences gained from gene ontology and functional enrichment analysis of differentially expressed genes in parasite and host, our comprehensive, high resolution transcriptomic dataset provides a substantially more detailed interpretation of T. cruzi infection biology and offers a basis for future drug and vaccine discovery efforts.


Subjects
Fibroblasts/metabolism , Transcriptome/immunology , Trypanosoma cruzi/immunology , Animals , Cultured Cells , Gene Expression Profiling , Humans , Intracellular Space/immunology , Protozoan Proteins/genetics , Messenger RNA/metabolism
9.
Bioinformatics ; 32(11): 1618-24, 2016 06 01.
Article in English | MEDLINE | ID: mdl-27246923

ABSTRACT

MOTIVATION: DNA methylation aberrations are now known to, almost universally, accompany the initiation and progression of cancers. In particular, the colon cancer epigenome contains specific genomic regions that, along with differences in methylation levels with respect to normal colon tissue, also show increased epigenetic and gene expression heterogeneity at the population level, i.e. across tumor samples, in comparison with other regions in the genome. Tumors are highly heterogeneous at the clonal level as well, and the relationship between clonal and population heterogeneity is poorly understood. RESULTS: We present an approach that uses sequencing reads from high-throughput sequencing of bisulfite-converted DNA to reconstruct heterogeneous cell populations by assembling cell-specific methylation patterns. Our methodology is based on the solution of a specific class of minimum cost network flow problems. We use our methods to analyze the relationship between clonal heterogeneity and population heterogeneity in high-coverage data from multiple samples of colon tumor and matched normal tissues. AVAILABILITY AND IMPLEMENTATION: http://github.com/hcorrada/methylFlow CONTACT: hcorrada@umiacs.umd.edu SUPPLEMENTARY INFORMATION: Supplementary information is available at Bioinformatics online.


Subjects
DNA Methylation , Epigenomics , Genomics , High-Throughput Nucleotide Sequencing , DNA Sequence Analysis , Sulfites
10.
Bioinformatics ; 32(12): 1873-9, 2016 06 15.
Article in English | MEDLINE | ID: mdl-26873931

ABSTRACT

MOTIVATION: Developing targeted therapeutics and identifying biomarkers relies on large amounts of research participant data. Beyond human DNA, scientists now investigate the DNA of micro-organisms inhabiting the human body. Recent work shows that an individual's collection of microbial DNA consistently identifies that person and could be used to link a real-world identity to a sensitive attribute in a research dataset. Unfortunately, the current suite of DNA-specific privacy-preserving analysis tools does not meet the requirements for microbiome sequencing studies. RESULTS: To address privacy concerns around microbiome sequencing, we implement metagenomic analyses using secure computation. Our implementation allows comparative analysis over combined data without revealing the feature counts for any individual sample. We focus on three analyses and perform an evaluation on datasets currently used by the microbiome research community. We use our implementation to simulate sharing data between four policy-domains. Additionally, we describe an application of our implementation for patients to combine data that allows drug developers to query against and compensate patients for the analysis. AVAILABILITY AND IMPLEMENTATION: The software is freely available for download at: http://cbcb.umd.edu/~hcorrada/projects/secureseq.html SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. CONTACT: hcorrada@umiacs.umd.edu.
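The abstract does not say which secure computation protocol is used, so the following is only a sketch of one simple primitive, additive secret sharing, that lets several parties compute a combined feature count without any party seeing another's raw value. The modulus, function names and counts are all assumptions made for illustration and should not be read as the authors' implementation.

```python
import secrets

# Illustrative modulus; it only needs to exceed the largest possible true sum.
MODULUS = 2**61 - 1


def share(value, n_parties):
    """Split `value` into n additive shares that sum to it modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def reconstruct(shares):
    """Recombine shares (or partial sums of shares) into the hidden value."""
    return sum(shares) % MODULUS


# Hypothetical counts of one microbial feature held by three data owners.
private_counts = [12, 0, 7]
n = len(private_counts)

# Each owner splits its count and sends one share to every party.
all_shares = [share(c, n) for c in private_counts]

# Party j adds the j-th share from every owner; it never sees a raw count.
partial_sums = [sum(owner[j] for owner in all_shares) % MODULUS for j in range(n)]

# Only the combined total is revealed.
total = reconstruct(partial_sums)
```

Each individual share is uniformly random, so a single party's view reveals nothing about any one count; only the sum (19 in this toy case) is recovered when the partial sums are combined.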


Subjects
Microbiota , DNA , Humans , Metagenomics , Privacy , Software
11.
Nucleic Acids Res ; 43(14): 6799-813, 2015 Aug 18.
Article in English | MEDLINE | ID: mdl-26150419

ABSTRACT

Protozoan parasites of the genus Leishmania are the etiological agents of leishmaniasis, a group of diseases with a worldwide incidence of 0.9-1.6 million cases per year. We used RNA-seq to conduct a high-resolution transcriptomic analysis of the global changes in gene expression and RNA processing events that occur as L. major transforms from non-infective procyclic promastigotes to infective metacyclic promastigotes. Careful statistical analysis across multiple biological replicates and the removal of batch effects provided a high quality framework for comprehensively analyzing differential gene expression and transcriptome remodeling in this pathogen as it acquires its infectivity. We also identified precise 5' and 3' UTR boundaries for a majority of Leishmania genes and detected widespread alternative trans-splicing and polyadenylation. An investigation of possible correlations between stage-specific preferential trans-splicing or polyadenylation sites and differentially expressed genes revealed a lack of systematic association, establishing that differences in expression levels cannot be attributed to stage-regulated alternative RNA processing. Our findings build on and improve existing expression datasets and provide a substantially more detailed view of L. major biology that will inform the field and potentially provide a stronger basis for drug discovery and vaccine development efforts.


Subjects
Developmental Gene Expression Regulation , Leishmania major/genetics , Post-Transcriptional RNA Processing , Gene Expression Profiling , Gene Ontology , Protozoan Genes , Leishmania major/growth & development , Leishmania major/metabolism , Polyadenylation , RNA Sequence Analysis , Trans-Splicing
12.
Nurs Res ; 66(2): 115-122, 2017.
Article in English | MEDLINE | ID: mdl-28125511

ABSTRACT

BACKGROUND: A statistical methodology is available to estimate the proportion of cell types (cellular heterogeneity) in adult whole blood specimens used in epigenome-wide association studies (EWAS). However, there is no methodology to estimate the proportion of cell types in umbilical cord blood (also a heterogeneous tissue) used in EWAS. OBJECTIVES: The objectives of this study were to determine whether differences in DNA methylation (DNAm) patterns in umbilical cord blood are the result of blood cell type proportion changes that typically occur across gestational age and to demonstrate the effect of cell type proportion confounding by comparing preterm infants exposed and not exposed to antenatal steroids. METHODS: We obtained DNAm profiles of cord blood using the Illumina HumanMethylation27k BeadChip array for 385 neonates from the Boston Birth Cohort. We estimated cell type proportions for six cell types using the deconvolution method developed by . RESULTS: The cell type proportion estimates segregated into two groups that were significantly different by gestational age, indicating that gestational age was associated with cell type proportion. Among infants exposed to antenatal steroids, the number of differentially methylated CpGs dropped from 127 to 1 after controlling for cell type proportion. DISCUSSION: EWAS utilizing cord blood are confounded by cell type proportion. Careful study design including correction for cell type proportion and interpretation of results of EWAS using cord blood are critical.
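The idea behind estimating cell type proportions from methylation profiles can be sketched as a least-squares fit of a mixed sample against reference profiles. The example below is a toy with only two cell types and invented reference values; it is not the six-cell-type deconvolution method referred to in the abstract, and the function name is hypothetical.

```python
def estimate_proportion(mixture, ref_a, ref_b):
    """Least-squares estimate of p in mixture = p*ref_a + (1-p)*ref_b.

    The estimate is clipped to the valid range [0, 1].
    """
    num = sum((m - b) * (a - b) for m, a, b in zip(mixture, ref_a, ref_b))
    den = sum((a - b) ** 2 for a, b in zip(ref_a, ref_b))
    return min(1.0, max(0.0, num / den))


# Hypothetical methylation fractions at four CpG sites in two pure cell types.
ref_a = [0.9, 0.1, 0.8, 0.2]
ref_b = [0.1, 0.7, 0.3, 0.6]

# Simulate a noiseless mixed sample containing 30% of cell type A.
true_p = 0.3
mixture = [true_p * a + (1 - true_p) * b for a, b in zip(ref_a, ref_b)]

p_hat = estimate_proportion(mixture, ref_a, ref_b)
```

Because the mixed profile depends on `p`, two groups of samples with different cell type proportions will differ in methylation even when no cell type changes its own methylation state, which is the confounding the study describes.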


Subjects
DNA Methylation , Fetal Blood/metabolism , Gestational Age , Cell Differentiation , Cell Physiological Phenomena , Female , Humans , Newborn Infant
13.
Biostatistics ; 16(4): 627-40, 2015 Oct.
Article in English | MEDLINE | ID: mdl-25964664

ABSTRACT

The recent growth of high-throughput transcriptome technology has been paralleled by the development of statistical methodologies to analyze the data they produce. Some of these newly developed methods are based on the assumption that the data observed or a transformation of the data are relatively symmetric with light tails, usually summarized by assuming a Gaussian random component. It is indeed very difficult to assess this assumption for small sample sizes. In this article, we utilize L-moments statistics as the basis of exploratory data analysis, the assessment of distributional assumptions, and the hypothesis testing of high-throughput transcriptomic data. In particular, we use L-moments ratios for assessing the shape (skewness and kurtosis) of high-throughput transcriptome data. Based on these statistics, we propose an algorithm for identifying genes with distributions that are markedly different from the majority in the data. In addition, we also illustrate the utility of this framework to characterize the robustness of distributional assumptions. We apply it to RNA-seq data and find that methods based on the simple t-test for differential expression analysis using L-moments as weights are robust.
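For readers unfamiliar with L-moments, the sketch below computes the first two sample L-moments and the L-skewness and L-kurtosis ratios from unbiased probability-weighted moments, using the standard textbook formulas. It is a generic illustration of the statistics named in the abstract, not the authors' code, and the function name is an assumption.

```python
def sample_l_moments(data):
    """Return (l1, l2, L-skewness, L-kurtosis) for a sample of size >= 4."""
    x = sorted(data)
    n = len(x)
    if n < 4:
        raise ValueError("need at least 4 observations")
    # b[r] accumulates the unbiased probability-weighted moment of order r.
    b = [0.0] * 4
    for i, xi in enumerate(x):  # i is the 0-based rank of xi
        w = 1.0
        for r in range(4):
            b[r] += w * xi
            if r < 3:
                # Extend the weight to i(i-1)...(i-r) / ((n-1)...(n-1-r)).
                w *= (i - r) / (n - 1 - r)
    b0, b1, b2, b3 = (v / n for v in b)
    l1 = b0
    l2 = 2 * b1 - b0
    l3 = 6 * b2 - 6 * b1 + b0
    l4 = 20 * b3 - 30 * b2 + 12 * b1 - b0
    if l2 == 0:
        raise ValueError("L-scale is zero; ratios are undefined")
    return l1, l2, l3 / l2, l4 / l2


l1, l2, t3, t4 = sample_l_moments([1.0, 2.0, 3.0, 4.0, 5.0])
```

For this evenly spaced, symmetric sample the L-skewness is zero; a sample with one large outlier on the right would give a positive L-skewness, which is how the ratios flag genes whose expression distributions depart from symmetry.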


Subjects
Statistical Data Interpretation , Gene Expression Profiling/methods , Transcriptome/genetics , Sample Size
14.
Nucleic Acids Res ; 42(6): 3503-14, 2014 Apr.
Article in English | MEDLINE | ID: mdl-24435799

ABSTRACT

The amount of tissue-specific expression variability (EV) across individuals is an essential characteristic of a gene and believed to have evolved, in part, under functional constraints. However, the determinants and functional implications of EV are only beginning to be investigated. Our analyses based on multiple expression profiles in 41 primary human tissues show that a gene's EV is significantly correlated with a number of features pertaining to the genomic, epigenomic, regulatory, polymorphic, functional, structural and network characteristics of the gene. We found that (i) EV of a gene is encoded, in part, by its genomic context and is further influenced by the epigenome; (ii) strong promoters induce less variable expression; (iii) less variable gene loci evolve under purifying selection against copy number polymorphisms; (iv) genes that encode inherently disordered or highly interacting proteins exhibit lower variability; and (v) genes with less variable expression are enriched for house-keeping functions, while genes with highly variable expression tend to function in development and extra-cellular response and are associated with human diseases. Thus, our analysis reveals a number of potential mediators as well as functional and evolutionary correlates of EV, and provides new insights into the inherent variability in eukaryotic gene expression.


Subjects
Gene Expression , Genetic Variation , Disease/genetics , Genetic Epigenesis , Genomics , Humans , Intrinsically Disordered Proteins/genetics , Genetic Polymorphism , Genetic Promoter Regions , Transcriptome
15.
BMC Bioinformatics ; 16 Suppl 11: S4, 2015.
Article in English | MEDLINE | ID: mdl-26328750

ABSTRACT

BACKGROUND: Computational and visual data analysis for genomics has traditionally involved a combination of tools and resources, of which the most ubiquitous consist of genome browsers, focused mainly on integrative visualization of large numbers of big datasets, and computational environments, focused on data modeling of a small number of moderately sized datasets. Workflows that involve the integration and exploration of multiple heterogeneous data sources, small and large, public and user specific have been poorly addressed by these tools. In our previous work, we introduced Epiviz, which bridges the gap between the two types of tools, simplifying these workflows. RESULTS: In this paper we expand on the design decisions behind Epiviz, and introduce a series of new advanced features that further support the type of interactive exploratory workflow we have targeted. We discuss three ways in which Epiviz advances the field of genomic data analysis: 1) it brings code to interactive visualizations at various different levels; 2) takes the first steps in the direction of collaborative data analysis by incorporating user plugins from source control providers, as well as by allowing analysis states to be shared among the scientific community; 3) combines established analysis features that have never before been available simultaneously in a genome browser. In our discussion section, we present security implications of the current design, as well as a series of limitations and future research steps. CONCLUSIONS: Since many of the design choices of Epiviz are novel in genomics data analysis, this paper serves both as a document of our own approaches with lessons learned, as well as a start point for future efforts in the same direction for the genomics community.


Subjects
Computational Biology/methods , Computer Graphics , Genomics/methods , Proteins/genetics , Software , Algorithms , Human Genome , Humans , Information Storage and Retrieval , Workflow
16.
BMC Genomics ; 16: 1108, 2015 Dec 29.
Article in English | MEDLINE | ID: mdl-26715493

ABSTRACT

BACKGROUND: Parasites of the genus Leishmania are the causative agents of leishmaniasis, a group of diseases that range in manifestations from skin lesions to fatal visceral disease. The life cycle of Leishmania parasites is split between its insect vector and its mammalian host, where it resides primarily inside of macrophages. Once intracellular, Leishmania parasites must evade or deactivate the host's innate and adaptive immune responses in order to survive and replicate. RESULTS: We performed transcriptome profiling using RNA-seq to simultaneously identify global changes in murine macrophage and L. major gene expression as the parasite entered and persisted within murine macrophages during the first 72 h of an infection. Differential gene expression, pathway, and gene ontology analyses enabled us to identify modulations in host and parasite responses during an infection. The most substantial and dynamic gene expression responses by both macrophage and parasite were observed during early infection. Murine genes related to both pro- and anti-inflammatory immune responses and glycolysis were substantially upregulated and genes related to lipid metabolism, biogenesis, and Fc gamma receptor-mediated phagocytosis were downregulated. Upregulated parasite genes included those aimed at mitigating the effects of an oxidative response by the host immune system while downregulated genes were related to translation, cell signaling, fatty acid biosynthesis, and flagellum structure. CONCLUSIONS: The gene expression patterns identified in this work yield signatures that characterize multiple developmental stages of L. major parasites and the coordinated response of Leishmania-infected macrophages in the real-time setting of a dual biological system. 
This comprehensive dataset offers a clearer and more sensitive picture of the interplay between host and parasite during intracellular infection, providing additional insights into how pathogens are able to evade host defenses and modulate the biological functions of the cell in order to survive in the mammalian environment.


Subjects
Host-Pathogen Interactions/genetics , Leishmania major/physiology , Macrophages/metabolism , Animals , Gene Expression Profiling , Leishmania major/genetics , Mice , Transcriptome/genetics
17.
Bioinformatics ; 30(10): 1363-9, 2014 May 15.
Article in English | MEDLINE | ID: mdl-24478339

ABSTRACT

MOTIVATION: The recently released Infinium HumanMethylation450 array (the '450k' array) provides a high-throughput assay to quantify DNA methylation (DNAm) at ∼450 000 loci across a range of genomic features. Although less comprehensive than high-throughput sequencing-based techniques, this product is more cost-effective and promises to be the most widely used DNAm high-throughput measurement technology over the next several years. RESULTS: Here we describe a suite of computational tools that incorporate state-of-the-art statistical techniques for the analysis of DNAm data. The software is structured to easily adapt to future versions of the technology. We include methods for preprocessing, quality assessment and detection of differentially methylated regions from the kilobase to the megabase scale. We show how our software provides a powerful and flexible development platform for future methods. We also illustrate how our methods empower the technology to make discoveries previously thought to be possible only with sequencing-based methods. AVAILABILITY AND IMPLEMENTATION: http://bioconductor.org/packages/release/bioc/html/minfi.html. CONTACT: khansen@jhsph.edu; rafa@jimmy.harvard.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
DNA Methylation , High-Throughput Nucleotide Sequencing/methods , Oligonucleotide Array Sequence Analysis/methods , Aged , Algorithms , Colonic Neoplasms/genetics , Genome , Humans , Single Nucleotide Polymorphism , Software
18.
Bioinformatics ; 30(9): 1214-9, 2014 May 01.
Article in English | MEDLINE | ID: mdl-24413520

ABSTRACT

MOTIVATION: Base-calling of sequencing data produced by high-throughput sequencing platforms is a fundamental process in current bioinformatics analysis. However, existing third-party probabilistic or machine-learning methods that significantly improve the accuracy of base-calls on these platforms are impractical for production use due to their computational inefficiency. RESULTS: We directly formulate base-calling as a blind deconvolution problem and implemented BlindCall as an efficient solver to this inverse problem. BlindCall produced base-calls at accuracy comparable to state-of-the-art probabilistic methods while processing data at rates 10 times faster in most cases. The computational complexity of BlindCall scales linearly with read length making it better suited for new long-read sequencing technologies.


Subjects
High-Throughput Nucleotide Sequencing/methods , DNA Sequence Analysis/methods , Algorithms , Humans , Probability , Reproducibility of Results , Software , Time Factors
19.
F1000Res ; 9: 601, 2020.
Article in English | MEDLINE | ID: mdl-32742640

ABSTRACT

The rich data produced by the second phase of the Human Microbiome Project (iHMP) offers a unique opportunity to test hypotheses that interactions between microbial communities and a human host might impact an individual's health or disease status. In this work we describe infrastructure that integrates Metaviz, an interactive microbiome data analysis and visualization tool, with the iHMP Data Coordination Center web portal and the HMP2Data R/Bioconductor package. We describe integrative statistical and visual analyses of two datasets from iHMP using Metaviz along with the metagenomeSeq R/Bioconductor package for statistical analysis of differential abundance. These use cases demonstrate the utility of a combined approach to access and analyze data from this resource.


Subjects
Data Analysis , Microbiota , Statistical Data Interpretation , Humans , Research Design
20.
JCO Clin Cancer Inform ; 4: 71-88, 2020 01.
Article in English | MEDLINE | ID: mdl-31990579

ABSTRACT

PURPOSE: In this work, we introduce CDGnet (Cancer-Drug-Gene Network), an evidence-based network approach for recommending targeted cancer therapies. CDGnet represents a user-friendly informatics tool that expands the range of targeted therapy options for patients with cancer who undergo molecular profiling by including the biologic context via pathway information. METHODS: CDGnet considers biologic pathway information specifically by looking at targets or biomarkers downstream of oncogenes and is personalized for individual patients via user-inputted molecular alterations and cancer type. It integrates a number of different sources of knowledge: patient-specific inputs (molecular alterations and cancer type), US Food and Drug Administration-approved therapies and biomarkers (curated from DailyMed), pathways for specific cancer types (from Kyoto Encyclopedia of Genes and Genomes [KEGG]), gene-drug connections (from DrugBank), and oncogene information (from KEGG). We consider 4 different evidence-based categories for therapy recommendations. Our tool is delivered via an R/Shiny Web application. For the 2 categories that use pathway information, we include an interactive Sankey visualization built on top of d3.js that also provides links to PubChem. RESULTS: We present a scenario for a patient who has estrogen receptor (ER)-positive breast cancer with FGFR1 amplification. Although many therapies exist for patients with ER-positive breast cancer, FGFR1 amplifications may confer resistance to such treatments. CDGnet provides therapy recommendations, including PIK3CA, MAPK, and RAF inhibitors, by considering targets or biomarkers downstream of FGFR1. CONCLUSION: CDGnet provides results in a number of easily accessible and usable forms, separating targeted cancer therapies into categories in an evidence-based manner that incorporates biologic pathway information.
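The step of "looking at targets or biomarkers downstream of oncogenes" can be pictured as a reachability search in a directed pathway graph. The sketch below uses a breadth-first search over a made-up graph with placeholder node and drug names; none of the edges, names or data structures come from CDGnet, KEGG or DrugBank, and the real tool applies additional evidence categories not modeled here.

```python
from collections import deque


def downstream(graph, start):
    """Return every node reachable from `start` along directed edges."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen and nxt != start:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Hypothetical pathway: "ALTERED" stands for the patient's altered oncogene.
PATHWAY = {"ALTERED": ["B", "C"], "B": ["D"], "C": ["D"], "D": [], "E": ["B"]}

# Hypothetical gene-to-drug map; "E" is upstream, so its drug is excluded.
DRUG_TARGETS = {"D": ["drug_x"], "E": ["drug_y"], "C": ["drug_z"]}

reachable = downstream(PATHWAY, "ALTERED")
candidates = sorted(d for gene in reachable for d in DRUG_TARGETS.get(gene, []))
```

Only drugs whose targets lie downstream of the altered gene are returned, which mirrors the abstract's scenario of recommending inhibitors of targets downstream of an amplified gene.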


Subjects
Antineoplastic Agents/therapeutic use , Tumor Biomarkers/genetics , Evidence-Based Medicine , Gene Regulatory Networks , Molecular Targeted Therapy , Neoplasms/drug therapy , Precision Medicine , Tumor Biomarkers/antagonists & inhibitors , Humans , Neoplasms/genetics , Neoplasms/pathology , Patient Selection