1.
Bioinformatics ; 36(18): 4682-4690, 2020 09 15.
Article in English | MEDLINE | ID: mdl-32618995

ABSTRACT

MOTIVATION: Genomic data repositories like The Cancer Genome Atlas, Encyclopedia of DNA Elements, Bioconductor's AnnotationHub and ExperimentHub etc., provide public access to large amounts of genomic data as flat files. Researchers often download a subset of data files from these repositories to perform exploratory data analysis. We developed Epiviz File Server, a Python library that implements an in situ data query system for local or remotely hosted indexed genomic files, not only for visualization but also data transformation. The File Server library decouples data retrieval and transformation from specific visualization and analysis tools and provides an abstract interface to define computations independent of the location, format or structure of the file. We demonstrate the File Server in two use cases: (i) integration with Galaxy workflows and (ii) using Epiviz to create a custom genome browser from the Epigenome Roadmap dataset. AVAILABILITY AND IMPLEMENTATION: Epiviz File Server is open source and is available on GitHub at http://github.com/epiviz/epivizFileServer. The documentation for the File Server library is available at http://epivizfileserver.rtfd.io.


Subject(s)
Genome; Genomics; Computers; Information Storage and Retrieval; Software
2.
JCO Clin Cancer Inform ; 4: 71-88, 2020 01.
Article in English | MEDLINE | ID: mdl-31990579

ABSTRACT

PURPOSE: In this work, we introduce CDGnet (Cancer-Drug-Gene Network), an evidence-based network approach for recommending targeted cancer therapies. CDGnet represents a user-friendly informatics tool that expands the range of targeted therapy options for patients with cancer who undergo molecular profiling by including the biologic context via pathway information. METHODS: CDGnet considers biologic pathway information specifically by looking at targets or biomarkers downstream of oncogenes and is personalized for individual patients via user-inputted molecular alterations and cancer type. It integrates a number of different sources of knowledge: patient-specific inputs (molecular alterations and cancer type), US Food and Drug Administration-approved therapies and biomarkers (curated from DailyMed), pathways for specific cancer types (from Kyoto Encyclopedia of Genes and Genomes [KEGG]), gene-drug connections (from DrugBank), and oncogene information (from KEGG). We consider 4 different evidence-based categories for therapy recommendations. Our tool is delivered via an R/Shiny Web application. For the 2 categories that use pathway information, we include an interactive Sankey visualization built on top of d3.js that also provides links to PubChem. RESULTS: We present a scenario for a patient who has estrogen receptor (ER)-positive breast cancer with FGFR1 amplification. Although many therapies exist for patients with ER-positive breast cancer, FGFR1 amplifications may confer resistance to such treatments. CDGnet provides therapy recommendations, including PIK3CA, MAPK, and RAF inhibitors, by considering targets or biomarkers downstream of FGFR1. CONCLUSION: CDGnet provides results in a number of easily accessible and usable forms, separating targeted cancer therapies into categories in an evidence-based manner that incorporates biologic pathway information.


Subject(s)
Antineoplastic Agents/therapeutic use; Biomarkers, Tumor/genetics; Evidence-Based Medicine; Gene Regulatory Networks; Molecular Targeted Therapy; Neoplasms/drug therapy; Precision Medicine; Biomarkers, Tumor/antagonists & inhibitors; Humans; Neoplasms/genetics; Neoplasms/pathology; Patient Selection
3.
F1000Res ; 9: 601, 2020.
Article in English | MEDLINE | ID: mdl-32742640

ABSTRACT

The rich data produced by the second phase of the Human Microbiome Project (iHMP) offers a unique opportunity to test hypotheses that interactions between microbial communities and a human host might impact an individual's health or disease status. In this work we describe infrastructure that integrates Metaviz, an interactive microbiome data analysis and visualization tool, with the iHMP Data Coordination Center web portal and the HMP2Data R/Bioconductor package. We describe integrative statistical and visual analyses of two datasets from iHMP using Metaviz along with the metagenomeSeq R/Bioconductor package for statistical analysis of differential abundance analysis. These use cases demonstrate the utility of a combined approach to access and analyze data from this resource.


Subject(s)
Data Analysis; Microbiota; Data Interpretation, Statistical; Humans; Research Design
4.
Bioinformatics ; 36(7): 2195-2201, 2020 04 01.
Article in English | MEDLINE | ID: mdl-31782758

ABSTRACT

MOTIVATION: Integrative analysis of genomic data that includes statistical methods in combination with visual exploration has gained widespread adoption. Many existing methods involve a combination of tools and resources: user interfaces that provide visualization of large genomic datasets, and computational environments that focus on data analyses over various subsets of a given dataset. Over the last few years, we have developed Epiviz as an integrative and interactive genomic data analysis tool that incorporates visualization tightly with state-of-the-art statistical analysis framework. RESULTS: In this article, we present Epiviz Feed, a proactive and automatic visual analytics system integrated with Epiviz that alleviates the burden of manually executing data analysis required to test biologically meaningful hypotheses. Results of interest that are proactively identified by server-side computations are listed as notifications in a feed. The feed turns genomic data analysis into a collaborative work between the analyst and the computational environment, which shortens the analysis time and allows the analyst to explore results efficiently. We discuss three ways where the proposed system advances the field of genomic data analysis: (i) takes the first step of proactive data analysis by utilizing available CPU power from the server to automate the analysis process; (ii) summarizes hypothesis test results in a way that analysts can easily understand and investigate; (iii) enables filtering and grouping of analysis results for quick search. This effort provides initial work on systems that substantially expand how computational and visualization frameworks can be tightly integrated to facilitate interactive genomic data analysis. AVAILABILITY AND IMPLEMENTATION: The source code for Epiviz Feed application is available at http://github.com/epiviz/epiviz_feed_polymer. The Epiviz Computational Server is available at http://github.com/epiviz/epiviz-feed-computation. Please refer to Epiviz documentation site for details: http://epiviz.github.io/.


Subject(s)
Genomics; Software; Genome; Research Design
5.
BMC Bioinformatics ; 20(1): 421, 2019 Aug 13.
Article in English | MEDLINE | ID: mdl-31409274

ABSTRACT

BACKGROUND: Ultra-fast pseudo-alignment approaches are the tool of choice in transcript-level RNA sequencing (RNA-seq) analyses. Unfortunately, these methods couple the tasks of pseudo-alignment and transcript quantification. This coupling precludes the direct usage of pseudo-alignment to other expression analyses, including alternative splicing or differential gene expression analysis, without including a non-essential transcript quantification step. RESULTS: In this paper, we introduce a transcriptome segmentation approach to decouple these two tasks. We propose an efficient algorithm to generate maximal disjoint segments given a transcriptome reference library on which ultra-fast pseudo-alignment can be used to produce per-sample segment counts. We show how to apply these maximally unambiguous count statistics in two specific expression analyses - alternative splicing and gene differential expression - without the need of a transcript quantification step. Our experiments based on simulated and experimental data showed that the use of segment counts, like other methods that rely on local coverage statistics, provides an advantage over approaches that rely on transcript quantification in detecting and correctly estimating local splicing in the case of incomplete transcript annotations. CONCLUSIONS: The transcriptome segmentation approach implemented in Yanagi exploits the computational and space efficiency of pseudo-alignment approaches. It significantly expands their applicability and interpretability in a variety of RNA-seq analyses by providing the means to model and capture local coverage variation in these analyses.
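The idea of disjoint segments can be illustrated with a toy example. The abstract does not describe Yanagi's algorithm, so the sketch below is not it: it assumes a simplified genomic-coordinate model, and the function name `disjoint_segments` and the transcript names are hypothetical. It cuts the exons of overlapping transcripts at every exon boundary and keeps the regions in which the set of covering transcripts is constant, so a read falling in a segment is unambiguous about which transcripts could have produced it.

```python
from typing import Dict, FrozenSet, List, Tuple

Interval = Tuple[int, int]                    # half-open [start, end)
Segment = Tuple[int, int, FrozenSet[str]]     # (start, end, covering transcripts)


def disjoint_segments(transcripts: Dict[str, List[Interval]]) -> List[Segment]:
    """Split exons into disjoint regions with a constant set of covering transcripts.

    Illustrative simplification only; not the Yanagi segmentation algorithm.
    """
    # Every exon start or end is a potential segment boundary.
    points = sorted({p for exons in transcripts.values() for iv in exons for p in iv})
    segments: List[Segment] = []
    for start, end in zip(points, points[1:]):
        covering = frozenset(
            name
            for name, exons in transcripts.items()
            if any(s <= start and end <= e for s, e in exons)
        )
        if not covering:
            continue  # region not covered by any exon (e.g. an intron)
        if segments and segments[-1][1] == start and segments[-1][2] == covering:
            # Adjacent region with the same covering set: extend the previous segment.
            segments[-1] = (segments[-1][0], end, covering)
        else:
            segments.append((start, end, covering))
    return segments


# Hypothetical two-transcript gene: t2 uses a shorter first and second exon.
example = {"t1": [(0, 100), (200, 300)], "t2": [(50, 100), (200, 250)]}
for seg in disjoint_segments(example):
    print(seg[0], seg[1], sorted(seg[2]))
```

Per-sample counts over such segments are what the abstract refers to as segment counts; how reads are assigned to segments is outside this sketch.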


Subject(s)
Algorithms; Transcriptome; Alternative Splicing; Animals; Area Under Curve; Drosophila/genetics; Humans; RNA/chemistry; RNA/metabolism; ROC Curve; Sequence Analysis, RNA
6.
Bioinformatics ; 35(19): 3870-3872, 2019 10 01.
Article in English | MEDLINE | ID: mdl-30821316

ABSTRACT

SUMMARY: We developed the metagenomeFeatures R Bioconductor package along with annotation packages for three 16S rRNA databases (Greengenes, RDP and SILVA) to facilitate working with 16S rRNA databases and marker-gene survey feature data. The metagenomeFeatures package defines two classes, MgDb for working with 16S rRNA sequence databases, and mgFeatures for marker-gene survey feature data. The associated annotation packages provide a consistent interface to the different databases facilitating database comparison and exploration. The mgFeatures-class represents a crucial step in the development of a common data structure for working with 16S marker-gene survey data in R. AVAILABILITY AND IMPLEMENTATION: https://bioconductor.org/packages/release/bioc/html/metagenomeFeatures.html. SUPPLEMENTARY INFORMATION: Supplementary material is available at Bioinformatics online.


Subject(s)
Databases, Nucleic Acid; Software; RNA, Ribosomal, 16S; Surveys and Questionnaires
7.
BMC Genomics ; 19(1): 799, 2018 Nov 06.
Article in English | MEDLINE | ID: mdl-30400812

ABSTRACT

BACKGROUND: Count data derived from high-throughput deoxyribonucleic acid (DNA) sequencing is frequently used in quantitative molecular assays. Due to properties inherent to the sequencing process, unnormalized count data is compositional, measuring relative and not absolute abundances of the assayed features. This compositional bias confounds inference of absolute abundances. Commonly used count data normalization approaches like library size scaling/rarefaction/subsampling cannot correct for compositional or any other relevant technical bias that is uncorrelated with library size. RESULTS: We demonstrate that existing techniques for estimating compositional bias fail with sparse metagenomic 16S count data and propose an empirical Bayes normalization approach to overcome this problem. In addition, we clarify the assumptions underlying frequently used scaling normalization methods in light of compositional bias, including scaling methods that were not designed directly to address it. CONCLUSIONS: Compositional bias, induced by the sequencing machine, confounds inferences of absolute abundances. We present a normalization technique for compositional bias correction in sparse sequencing count data, and demonstrate its improved performance in metagenomic 16S survey data. Based on the distribution of technical bias estimates arising from several publicly available large scale 16S count datasets, we argue that detailed experiments specifically addressing the influence of compositional bias in metagenomics are needed.
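A small numeric example may help show why library size scaling alone does not remove compositional bias. The sketch below is not the empirical Bayes method the abstract proposes; the function name `library_size_scale` and the abundance values are made up for illustration. Only one feature changes in absolute abundance between the two hypothetical samples, yet after scaling the unchanged features also appear to differ.

```python
from typing import List, Sequence


def library_size_scale(counts: Sequence[float]) -> List[float]:
    """Divide each count by the sample total (library size scaling)."""
    total = sum(counts)
    if total == 0:
        raise ValueError("cannot scale a sample with zero total count")
    return [c / total for c in counts]


# Hypothetical absolute abundances of three features in two samples.
# Only the third feature truly differs between the samples.
sample_a = [100, 100, 100]
sample_b = [100, 100, 400]

prop_a = library_size_scale(sample_a)
prop_b = library_size_scale(sample_b)

# Features 1 and 2 did not change in absolute terms, but their
# proportions drop because the total is dominated by feature 3.
for i, (pa, pb) in enumerate(zip(prop_a, prop_b), start=1):
    print(f"feature {i}: sample A = {pa:.3f}, sample B = {pb:.3f}")
```

In this toy case the proportions of the first two features fall from one third to one sixth, which is the kind of spurious difference that a compositional bias correction aims to undo.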


Subject(s)
Algorithms; Computational Biology/methods; High-Throughput Nucleotide Sequencing/methods; Metagenomics/methods; Microbiota; RNA, Ribosomal, 16S/genetics; Bayes Theorem
8.
Microbiome ; 6(1): 197, 2018 11 05.
Article in English | MEDLINE | ID: mdl-30396371

ABSTRACT

The Mid-Atlantic Microbiome Meet-up (M3) organization brings together academic, government, and industry groups to share ideas and develop best practices for microbiome research. In January of 2018, M3 held its fourth meeting, which focused on recent advances in biodefense, specifically those relating to infectious disease, and the use of metagenomic methods for pathogen detection. Presentations highlighted the utility of next-generation sequencing technologies for identifying and tracking microbial community members across space and time. However, they also stressed the current limitations of genomic approaches for biodefense, including insufficient sensitivity to detect low-abundance pathogens and the inability to quantify viable organisms. Participants discussed ways in which the community can improve software usability and shared new computational tools for metagenomic processing, assembly, annotation, and visualization. Looking to the future, they identified the need for better bioinformatics toolkits for longitudinal analyses, improved sample processing approaches for characterizing viruses and fungi, and more consistent maintenance of database resources. Finally, they addressed the necessity of improving data standards to incentivize data sharing. Here, we summarize the presentations and discussions from the meeting, identifying the areas where microbiome analyses have improved our ability to detect and manage biological threats and infectious disease, as well as gaps of knowledge in the field that require future funding and focus.


Subject(s)
Biological Warfare Agents; Computational Biology/methods; High-Throughput Nucleotide Sequencing/methods; Metagenomics/methods; Humans; Microbiota/physiology; Sequence Analysis, DNA/methods
9.
J Immunol ; 201(4): 1154-1164, 2018 08 15.
Article in English | MEDLINE | ID: mdl-29997126

ABSTRACT

The uptake and destruction of bacteria by phagocytic cells is an essential defense mechanism in metazoans. To identify novel genes involved in the phagocytosis of Staphylococcus aureus, a major human pathogen, we assessed the phagocytic capacity of adult blood cells (hemocytes) of the fruit fly, Drosophila melanogaster, by testing several lines of the Drosophila Genetic Reference Panel. Natural genetic variation in the gene RNA-binding Fox protein 1 (Rbfox1) correlated with low phagocytic capacity in hemocytes, pointing to Rbfox1 as a candidate regulator of phagocytosis. Loss of Rbfox1 resulted in increased expression of the Ig superfamily member Down syndrome adhesion molecule 4 (Dscam4). Silencing of Dscam4 in Rbfox1-depleted blood cells rescued the fly's cellular immune response to S. aureus, indicating that downregulation of Dscam4 by Rbfox1 is critical for S. aureus phagocytosis in Drosophila. To our knowledge, this study is the first to demonstrate a link between Rbfox1, Dscam4, and host defense against S. aureus.


Subject(s)
Drosophila Proteins/metabolism; Drosophila melanogaster/immunology; Hemocytes/immunology; Immunity, Cellular; RNA Splicing Factors/metabolism; RNA-Binding Proteins/metabolism; Staphylococcal Infections/immunology; Staphylococcus aureus/physiology; Animals; Cell Adhesion Molecules/genetics; Cell Adhesion Molecules/metabolism; Drosophila Proteins/genetics; Gene Knockout Techniques; Humans; Phagocytosis; RNA Splicing Factors/genetics; RNA-Binding Proteins/genetics; Staphylococcal Infections/genetics
10.
Nucleic Acids Res ; 46(6): 2777-2787, 2018 04 06.
Article in English | MEDLINE | ID: mdl-29529268

ABSTRACT

Large studies profiling microbial communities and their association with healthy or disease phenotypes are now commonplace. Processed data from many of these studies are publicly available but significant effort is required for users to effectively organize, explore and integrate it, limiting the utility of these rich data resources. Effective integrative and interactive visual and statistical tools to analyze many metagenomic samples can greatly increase the value of these data for researchers. We present Metaviz, a tool for interactive exploratory data analysis of annotated microbiome taxonomic community profiles derived from marker gene or whole metagenome shotgun sequencing. Metaviz is uniquely designed to address the challenge of browsing the hierarchical structure of metagenomic data features while rendering visualizations of data values that are dynamically updated in response to user navigation. We use Metaviz to provide the UMD Metagenome Browser web service, allowing users to browse and explore data for more than 7000 microbiomes from published studies. Users can also deploy Metaviz as a web service, or use it to analyze data through the metavizr package to interoperate with state-of-the-art analysis tools available through Bioconductor. Metaviz is free and open source with the code, documentation and tutorials publicly accessible.


Subject(s)
Computational Biology/methods; Metagenome/genetics; Metagenomics/methods; Whole Genome Sequencing/methods; Bacteria/classification; Bacteria/genetics; Child; Computational Biology/statistics & numerical data; Diarrhea/diagnosis; Diarrhea/genetics; Humans; Internet; Metagenomics/statistics & numerical data; Reproducibility of Results; Web Browser; Whole Genome Sequencing/statistics & numerical data
11.
Nurs Res ; 66(2): 115-122, 2017.
Article in English | MEDLINE | ID: mdl-28125511

ABSTRACT

BACKGROUND: A statistical methodology is available to estimate the proportion of cell types (cellular heterogeneity) in adult whole blood specimens used in epigenome-wide association studies (EWAS). However, there is no methodology to estimate the proportion of cell types in umbilical cord blood (also a heterogeneous tissue) used in EWAS. OBJECTIVES: The objectives of this study were to determine whether differences in DNA methylation (DNAm) patterns in umbilical cord blood are the result of blood cell type proportion changes that typically occur across gestational age and to demonstrate the effect of cell type proportion confounding by comparing preterm infants exposed and not exposed to antenatal steroids. METHODS: We obtained DNAm profiles of cord blood using the Illumina HumanMethylation27k BeadChip array for 385 neonates from the Boston Birth Cohort. We estimated cell type proportions for six cell types using the deconvolution method developed by . RESULTS: The cell type proportion estimates segregated into two groups that were significantly different by gestational age, indicating that gestational age was associated with cell type proportion. Among infants exposed to antenatal steroids, the number of differentially methylated CpGs dropped from 127 to 1 after controlling for cell type proportion. DISCUSSION: EWAS utilizing cord blood are confounded by cell type proportion. Careful study design including correction for cell type proportion and interpretation of results of EWAS using cord blood are critical.


Subject(s)
DNA Methylation; Fetal Blood/metabolism; Gestational Age; Cell Differentiation; Cell Physiological Phenomena; Female; Humans; Infant, Newborn
12.
Bioinformatics ; 32(11): 1618-24, 2016 06 01.
Article in English | MEDLINE | ID: mdl-27246923

ABSTRACT

MOTIVATION: DNA methylation aberrations are now known to, almost universally, accompany the initiation and progression of cancers. In particular, the colon cancer epigenome contains specific genomic regions that, along with differences in methylation levels with respect to normal colon tissue, also show increased epigenetic and gene expression heterogeneity at the population level, i.e. across tumor samples, in comparison with other regions in the genome. Tumors are highly heterogeneous at the clonal level as well, and the relationship between clonal and population heterogeneity is poorly understood. RESULTS: We present an approach that uses sequencing reads from high-throughput sequencing of bisulfite-converted DNA to reconstruct heterogeneous cell populations by assembling cell-specific methylation patterns. Our methodology is based on the solution of a specific class of minimum cost network flow problems. We use our methods to analyze the relationship between clonal heterogeneity and population heterogeneity in high-coverage data from multiple samples of colon tumor and matched normal tissues. AVAILABILITY AND IMPLEMENTATION: http://github.com/hcorrada/methylFlow CONTACT: hcorrada@umiacs.umd.edu SUPPLEMENTARY INFORMATION: Supplementary information is available at Bioinformatics online.


Subject(s)
DNA Methylation; Epigenomics; Genomics; High-Throughput Nucleotide Sequencing; Sequence Analysis, DNA; Sulfites
13.
PLoS Pathog ; 12(4): e1005511, 2016 Apr.
Article in English | MEDLINE | ID: mdl-27046031

ABSTRACT

Intracellular colonization and persistent infection by the kinetoplastid protozoan parasite, Trypanosoma cruzi, underlie the pathogenesis of human Chagas disease. To obtain global insights into the T. cruzi infective process, transcriptome dynamics were simultaneously captured in the parasite and host cells in an infection time course of human fibroblasts. Extensive remodeling of the T. cruzi transcriptome was observed during the early establishment of intracellular infection, coincident with a major developmental transition in the parasite. Contrasting this early response, few additional changes in steady state mRNA levels were detected once mature T. cruzi amastigotes were formed. Our findings suggest that transcriptome remodeling is required to establish a modified template to guide developmental transitions in the parasite, whereas homeostatic functions are regulated independently of transcriptomic changes, similar to that reported in related trypanosomatids. Despite complex mechanisms for regulation of phenotypic expression in T. cruzi, transcriptomic signatures derived from distinct developmental stages mirror known or projected characteristics of T. cruzi biology. Focusing on energy metabolism, we were able to validate predictions forecast in the mRNA expression profiles. We demonstrate measurable differences in the bioenergetic properties of the different mammalian-infective stages of T. cruzi and present additional findings that underscore the importance of mitochondrial electron transport in T. cruzi amastigote growth and survival. Consequences of T. cruzi colonization for the host include dynamic expression of immune response genes and cell cycle regulators with upregulation of host cholesterol and lipid synthesis pathways, which may serve to fuel intracellular T. cruzi growth. 
Thus, in addition to the biological inferences gained from gene ontology and functional enrichment analysis of differentially expressed genes in parasite and host, our comprehensive, high resolution transcriptomic dataset provides a substantially more detailed interpretation of T. cruzi infection biology and offers a basis for future drug and vaccine discovery efforts.


Subject(s)
Fibroblasts/metabolism; Transcriptome/immunology; Trypanosoma cruzi/immunology; Animals; Cells, Cultured; Gene Expression Profiling; Humans; Intracellular Space/immunology; Protozoan Proteins/genetics; RNA, Messenger/metabolism
14.
Bioinformatics ; 32(12): 1873-9, 2016 06 15.
Article in English | MEDLINE | ID: mdl-26873931

ABSTRACT

MOTIVATION: Developing targeted therapeutics and identifying biomarkers relies on large amounts of research participant data. Beyond human DNA, scientists now investigate the DNA of micro-organisms inhabiting the human body. Recent work shows that an individual's collection of microbial DNA consistently identifies that person and could be used to link a real-world identity to a sensitive attribute in a research dataset. Unfortunately, the current suite of DNA-specific privacy-preserving analysis tools does not meet the requirements for microbiome sequencing studies. RESULTS: To address privacy concerns around microbiome sequencing, we implement metagenomic analyses using secure computation. Our implementation allows comparative analysis over combined data without revealing the feature counts for any individual sample. We focus on three analyses and perform an evaluation on datasets currently used by the microbiome research community. We use our implementation to simulate sharing data between four policy-domains. Additionally, we describe an application of our implementation for patients to combine data that allows drug developers to query against and compensate patients for the analysis. AVAILABILITY AND IMPLEMENTATION: The software is freely available for download at: http://cbcb.umd.edu/~hcorrada/projects/secureseq.html SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. CONTACT: hcorrada@umiacs.umd.edu.
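The abstract does not name the secure computation protocol used, so the following is only a generic sketch of one standard building block, additive secret sharing, chosen here for brevity and not taken from the paper. It shows how a combined feature count can be computed without any single party seeing an individual sample's count. All names (`share`, `secure_sum`, `MODULUS`) and the counts are hypothetical, and the parties are simulated inside one process.

```python
import secrets
from typing import List, Sequence

# Assumed modulus for the share arithmetic; must exceed any possible total.
MODULUS = 2**61 - 1


def share(value: int, n_parties: int) -> List[int]:
    """Split a value into n random-looking shares that sum to it mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def reconstruct(shares: Sequence[int]) -> int:
    """Recombine shares; fewer than all of them reveal nothing about the value."""
    return sum(shares) % MODULUS


def secure_sum(private_values: Sequence[int], n_parties: int = 3) -> int:
    """Sum private values so that each party only ever sees shares."""
    # Each data owner sends one share of its value to every party.
    all_shares = [share(v, n_parties) for v in private_values]
    # Each party adds up the shares it received (one column per party).
    partial_sums = [sum(column) % MODULUS for column in zip(*all_shares)]
    # Only the partial sums are combined, which yields the total count.
    return reconstruct(partial_sums)


# Hypothetical counts of one microbial feature in four private samples.
print(secure_sum([12, 0, 7, 30]))
```

A real deployment would run the parties on separate machines and would need protocols for the non-linear steps of the analyses; this sketch covers addition only.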


Subject(s)
Microbiota; DNA; Humans; Metagenomics; Privacy; Software
15.
BMC Genomics ; 16: 1108, 2015 Dec 29.
Article in English | MEDLINE | ID: mdl-26715493

ABSTRACT

BACKGROUND: Parasites of the genus Leishmania are the causative agents of leishmaniasis, a group of diseases that range in manifestations from skin lesions to fatal visceral disease. The life cycle of Leishmania parasites is split between its insect vector and its mammalian host, where it resides primarily inside of macrophages. Once intracellular, Leishmania parasites must evade or deactivate the host's innate and adaptive immune responses in order to survive and replicate. RESULTS: We performed transcriptome profiling using RNA-seq to simultaneously identify global changes in murine macrophage and L. major gene expression as the parasite entered and persisted within murine macrophages during the first 72 h of an infection. Differential gene expression, pathway, and gene ontology analyses enabled us to identify modulations in host and parasite responses during an infection. The most substantial and dynamic gene expression responses by both macrophage and parasite were observed during early infection. Murine genes related to both pro- and anti-inflammatory immune responses and glycolysis were substantially upregulated and genes related to lipid metabolism, biogenesis, and Fc gamma receptor-mediated phagocytosis were downregulated. Upregulated parasite genes included those aimed at mitigating the effects of an oxidative response by the host immune system while downregulated genes were related to translation, cell signaling, fatty acid biosynthesis, and flagellum structure. CONCLUSIONS: The gene expression patterns identified in this work yield signatures that characterize multiple developmental stages of L. major parasites and the coordinated response of Leishmania-infected macrophages in the real-time setting of a dual biological system. 
This comprehensive dataset offers a clearer and more sensitive picture of the interplay between host and parasite during intracellular infection, providing additional insights into how pathogens are able to evade host defenses and modulate the biological functions of the cell in order to survive in the mammalian environment.


Subject(s)
Host-Pathogen Interactions/genetics; Leishmania major/physiology; Macrophages/metabolism; Animals; Gene Expression Profiling; Leishmania major/genetics; Mice; Transcriptome/genetics
16.
BMC Bioinformatics ; 16 Suppl 11: S4, 2015.
Article in English | MEDLINE | ID: mdl-26328750

ABSTRACT

BACKGROUND: Computational and visual data analysis for genomics has traditionally involved a combination of tools and resources, of which the most ubiquitous consist of genome browsers, focused mainly on integrative visualization of large numbers of big datasets, and computational environments, focused on data modeling of a small number of moderately sized datasets. Workflows that involve the integration and exploration of multiple heterogeneous data sources, small and large, public and user specific have been poorly addressed by these tools. In our previous work, we introduced Epiviz, which bridges the gap between the two types of tools, simplifying these workflows. RESULTS: In this paper we expand on the design decisions behind Epiviz, and introduce a series of new advanced features that further support the type of interactive exploratory workflow we have targeted. We discuss three ways in which Epiviz advances the field of genomic data analysis: 1) it brings code to interactive visualizations at various different levels; 2) takes the first steps in the direction of collaborative data analysis by incorporating user plugins from source control providers, as well as by allowing analysis states to be shared among the scientific community; 3) combines established analysis features that have never before been available simultaneously in a genome browser. In our discussion section, we present security implications of the current design, as well as a series of limitations and future research steps. CONCLUSIONS: Since many of the design choices of Epiviz are novel in genomics data analysis, this paper serves both as a document of our own approaches with lessons learned, as well as a start point for future efforts in the same direction for the genomics community.


Subject(s)
Computational Biology/methods; Computer Graphics; Genomics/methods; Proteins/genetics; Software; Algorithms; Genome, Human; Humans; Information Storage and Retrieval; Workflow
17.
Nucleic Acids Res ; 43(14): 6799-813, 2015 Aug 18.
Article in English | MEDLINE | ID: mdl-26150419

ABSTRACT

Protozoan parasites of the genus Leishmania are the etiological agents of leishmaniasis, a group of diseases with a worldwide incidence of 0.9-1.6 million cases per year. We used RNA-seq to conduct a high-resolution transcriptomic analysis of the global changes in gene expression and RNA processing events that occur as L. major transforms from non-infective procyclic promastigotes to infective metacyclic promastigotes. Careful statistical analysis across multiple biological replicates and the removal of batch effects provided a high quality framework for comprehensively analyzing differential gene expression and transcriptome remodeling in this pathogen as it acquires its infectivity. We also identified precise 5' and 3' UTR boundaries for a majority of Leishmania genes and detected widespread alternative trans-splicing and polyadenylation. An investigation of possible correlations between stage-specific preferential trans-splicing or polyadenylation sites and differentially expressed genes revealed a lack of systematic association, establishing that differences in expression levels cannot be attributed to stage-regulated alternative RNA processing. Our findings build on and improve existing expression datasets and provide a substantially more detailed view of L. major biology that will inform the field and potentially provide a stronger basis for drug discovery and vaccine development efforts.


Subject(s)
Gene Expression Regulation, Developmental; Leishmania major/genetics; RNA Processing, Post-Transcriptional; Gene Expression Profiling; Gene Ontology; Genes, Protozoan; Leishmania major/growth & development; Leishmania major/metabolism; Polyadenylation; Sequence Analysis, RNA; Trans-Splicing
18.
Biostatistics ; 16(4): 627-40, 2015 Oct.
Article in English | MEDLINE | ID: mdl-25964664

ABSTRACT

The recent growth of high-throughput transcriptome technology has been paralleled by the development of statistical methodologies to analyze the data they produce. Some of these newly developed methods are based on the assumption that the data observed or a transformation of the data are relatively symmetric with light tails, usually summarized by assuming a Gaussian random component. It is indeed very difficult to assess this assumption for small sample sizes. In this article, we utilize L-moments statistics as the basis of exploratory data analysis, the assessment of distributional assumptions, and the hypothesis testing of high-throughput transcriptomic data. In particular, we use L-moments ratios for assessing the shape (skewness and kurtosis) of high-throughput transcriptome data. Based on these statistics, we propose an algorithm for identifying genes with distributions that are markedly different from the majority in the data. In addition, we also illustrate the utility of this framework to characterize the robustness of distributional assumptions. We apply it to RNA-seq data and find that methods based on the simple t-test for differential expression analysis using L-moments as weights are robust.
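As a concrete reference for the L-moment ratios mentioned above, the sketch below computes the first four sample L-moments from unbiased probability-weighted moments, then the L-skewness and L-kurtosis ratios. This is the standard textbook estimator, not code from the paper; the function name `sample_l_moments` and the example values are illustrative assumptions.

```python
from typing import Dict, Sequence


def sample_l_moments(values: Sequence[float]) -> Dict[str, float]:
    """Return l1..l4 plus L-skewness (t3) and L-kurtosis (t4) of a sample."""
    x = sorted(values)
    n = len(x)
    if n < 4:
        raise ValueError("need at least 4 observations for the fourth L-moment")

    # Unbiased probability-weighted moments b0..b3 (i is the 1-based rank).
    b = [0.0, 0.0, 0.0, 0.0]
    for i, xi in enumerate(x, start=1):
        b[0] += xi
        b[1] += xi * (i - 1) / (n - 1)
        b[2] += xi * (i - 1) * (i - 2) / ((n - 1) * (n - 2))
        b[3] += xi * (i - 1) * (i - 2) * (i - 3) / ((n - 1) * (n - 2) * (n - 3))
    b0, b1, b2, b3 = (total / n for total in b)

    # L-moments are fixed linear combinations of the b_r.
    l1 = b0
    l2 = 2 * b1 - b0
    l3 = 6 * b2 - 6 * b1 + b0
    l4 = 20 * b3 - 30 * b2 + 12 * b1 - b0
    if l2 == 0:
        raise ValueError("L-moment ratios are undefined for a constant sample")
    return {"l1": l1, "l2": l2, "l3": l3, "l4": l4, "t3": l3 / l2, "t4": l4 / l2}


# Hypothetical expression values for one gene across five samples.
print(sample_l_moments([1.0, 2.0, 3.0, 4.0, 5.0]))
```

For a symmetric sample the L-skewness `t3` is zero, so large absolute values of `t3` flag asymmetric distributions.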


Subject(s)
Data Interpretation, Statistical , Gene Expression Profiling/methods , Transcriptome/genetics , Sample Size
19.
PeerJ ; 2: e561, 2014.
Article in English | MEDLINE | ID: mdl-25332844

ABSTRACT

Batch effects are responsible for the failure of promising genomic prognostic signatures, major ambiguities in published genomic results, and retractions of widely publicized findings. Batch effect corrections have been developed to remove these artifacts, but they are designed to be used in population studies. However, genomic technologies are beginning to be used in clinical applications where samples are analyzed one at a time for diagnostic, prognostic, and predictive applications. There are currently no batch correction methods that have been developed specifically for prediction. In this paper, we propose a new method called frozen surrogate variable analysis (fSVA) that borrows strength from a training set for individual sample batch correction. We show that fSVA improves prediction accuracy in simulations and in public genomic studies. fSVA is available as part of the sva Bioconductor package.
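
The idea of "freezing" a correction on a training set and then applying it to one new sample at a time can be sketched as follows. This is a deliberately simplified stand-in, not the fSVA algorithm: it estimates latent factors by a plain SVD of the centered training matrix and does not protect outcome-related signal as surrogate variable analysis does. The function names `fit_frozen_factors` and `correct_new_sample` are hypothetical.

```python
import numpy as np


def fit_frozen_factors(train, n_factors):
    """Estimate per-gene means and latent-factor loadings from training data.

    train is a genes x samples matrix. The returned loadings are the first
    n_factors left singular vectors of the gene-centered matrix; they are
    kept fixed ("frozen") for all later samples.
    """
    train = np.asarray(train, dtype=float)
    gene_means = train.mean(axis=1)
    centered = train - gene_means[:, None]
    u, _, _ = np.linalg.svd(centered, full_matrices=False)
    return gene_means, u[:, :n_factors]


def correct_new_sample(sample, gene_means, loadings):
    """Remove the frozen factors from a single new sample (vector of genes)."""
    sample = np.asarray(sample, dtype=float)
    centered = sample - gene_means
    scores = loadings.T @ centered  # coordinates of the sample on each factor
    return sample - loadings @ scores
```

The design point this illustrates is that nothing is re-estimated when a new sample arrives: the means and loadings come only from the training set, so each clinical sample can be corrected on its own. The actual method is available through the sva Bioconductor package named in the abstract.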

20.
Genome Biol ; 15(6): R76, 2014 Jun 27.
Article in English | MEDLINE | ID: mdl-24995464

ABSTRACT

BACKGROUND: Diarrheal diseases continue to contribute significantly to morbidity and mortality in infants and young children in developing countries. There is an urgent need to better understand the contributions of novel, potentially uncultured, diarrheal pathogens to severe diarrheal disease, as well as distortions in normal gut microbiota composition that might facilitate severe disease. RESULTS: We use high-throughput 16S rRNA gene sequencing to compare fecal microbiota composition in children under five years of age who have been diagnosed with moderate to severe diarrhea (MSD) with the microbiota from diarrhea-free controls. Our study includes 992 children from four low-income countries in West and East Africa, and Southeast Asia. Known pathogens, as well as bacteria currently not considered important diarrhea-causing pathogens, are positively associated with MSD; these include Escherichia/Shigella, Granulicatella species, and Streptococcus mitis/pneumoniae groups. In both cases and controls, there tend to be distinct negative correlations between facultative anaerobic lineages and obligate anaerobic lineages. Overall genus-level microbiota composition exhibits a shift in controls from low to high levels of Prevotella and in MSD cases from high to low levels of Escherichia/Shigella in younger versus older children; however, there was significant variation among many genera by both site and age. CONCLUSIONS: Our findings expand the current understanding of microbiota-associated diarrhea pathogenicity in young children from developing countries. Our findings are necessarily based on correlative analyses and must be further validated through epidemiological and molecular techniques.


Subject(s)
Diarrhea, Infantile/microbiology , Dysentery/microbiology , Intestines/microbiology , Microbiota/genetics , Bangladesh , Base Sequence , Case-Control Studies , Child, Preschool , Feces/microbiology , Female , Gambia , Humans , Infant , Infant, Newborn , Kenya , Male , Mali , Molecular Typing , Poverty , RNA, Bacterial/genetics , RNA, Ribosomal, 16S/genetics