1.
Bioinformatics ; 35(14): i13-i22, 2019 07 15.
Article in English | MEDLINE | ID: mdl-31510682

ABSTRACT

MOTIVATION: Bacterial metagenomics profiling for metagenomic whole genome sequencing (mWGS) usually starts by aligning sequencing reads to a collection of reference genomes. Current profiling tools are designed to work against a small representative collection of genomes, and do not scale well to larger reference genome collections. However, large reference genome collections can provide a more complete and accurate profile of the bacterial population in a metagenomics dataset. In this paper, we discuss a scalable, efficient and affordable approach to this problem, bringing big data solutions within the reach of laboratories with modest resources. RESULTS: We developed Flint, a metagenomics profiling pipeline that is built on top of the Apache Spark framework, and is designed for fast real-time profiling of metagenomic samples against a large collection of reference genomes. Flint takes advantage of Spark's built-in parallelism and streaming engine architecture to quickly map reads against a large (170 GB) reference collection of 43 552 bacterial genomes from Ensembl. Flint runs on Amazon's Elastic MapReduce service, and is able to profile 1 million Illumina paired-end reads against over 40 K genomes on 64 machines in 67 s, an order of magnitude faster than the state of the art, while using a much larger reference collection. Streaming the sequencing reads allows this approach to sustain mapping rates of 55 million reads per hour, at an hourly cluster cost of $8.00 USD, while avoiding the need to store large quantities of intermediate alignments. AVAILABILITY AND IMPLEMENTATION: Flint is open source software, available under the MIT License (MIT). Source code is available at https://github.com/camilo-v/flint. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Cloud Computing , High-Throughput Nucleotide Sequencing , Microbiota , Algorithms , Metagenomics , Sequence Analysis, DNA , Software
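
As a rough check on the figures quoted in the Flint abstract above, the numbers can be converted to common units. The sketch below uses only values stated in the abstract; the function and variable names are illustrative assumptions and are not part of Flint.

```python
# Figures quoted in the Flint abstract.
READS_PER_RUN = 1_000_000               # Illumina paired-end reads in one run
RUN_SECONDS = 67                        # wall-clock time on 64 machines
SUSTAINED_READS_PER_HOUR = 55_000_000   # streaming mapping rate
CLUSTER_COST_PER_HOUR_USD = 8.00        # hourly cluster cost


def reads_per_hour(reads: int, seconds: float) -> float:
    """Convert a single timed run into an hourly rate."""
    return reads / seconds * 3600


def cost_per_million_reads(cost_per_hour: float, hourly_reads: float) -> float:
    """Cluster cost in USD for every million reads mapped."""
    return cost_per_hour / (hourly_reads / 1_000_000)


# Hourly rate implied by the 67-second run (hypothetical helper names).
batch_rate = reads_per_hour(READS_PER_RUN, RUN_SECONDS)

# Cost per million reads at the sustained streaming rate.
cost = cost_per_million_reads(CLUSTER_COST_PER_HOUR_USD, SUSTAINED_READS_PER_HOUR)

print(f"Rate implied by one run: {batch_rate / 1e6:.1f} million reads/hour")
print(f"Cost at sustained rate:  ${cost:.3f} per million reads")
```

The single-run figure works out to roughly 54 million reads per hour, which is in line with the sustained rate of 55 million reads per hour reported in the abstract.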
2.
Front Bioinform ; 3: 1154588, 2023.
Article in English | MEDLINE | ID: mdl-37405310

ABSTRACT

Abundance profiles from metagenomic sequencing data synthesize information from billions of sequenced reads coming from thousands of microbial genomes. Analyzing and understanding these profiles can be a challenge since the data they represent are complex. Particularly challenging is their visualization, as existing techniques are inadequate when the number of taxa is in the thousands. We present a technique, and accompanying software, for the visualization of metagenomic abundance profiles using a space-filling curve that transforms a profile into an interactive 2D image. We created Jasper, an easy-to-use tool for the visualization and exploration of metagenomic profiles from DNA sequencing data. It orders taxa using a space-filling Hilbert curve, and creates a "Microbiome Map", where each position in the image represents the abundance of a single taxon from a reference collection. Jasper can order taxa in multiple ways, and the resulting microbiome maps can highlight "hot spots" of microbes that are dominant in taxonomic clades or biological conditions. We use Jasper to visualize samples from a variety of microbiome studies, and discuss ways in which microbiome maps can be an invaluable tool to visualize spatial, temporal, disease, and differential profiles. Our approach can create detailed microbiome maps involving hundreds of thousands of microbial reference genomes with the potential to unravel latent relationships (taxonomic, spatio-temporal, functional, and other) that could remain hidden using traditional visualization techniques. The maps can also be converted into animated movies that bring to life the dynamicity of microbiomes.
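
The idea of laying an ordered abundance profile along a Hilbert curve, as described in the abstract above, can be sketched as follows. This is a minimal illustration using the textbook index-to-coordinate conversion for a Hilbert curve; it is not Jasper's code, and the names `hilbert_d2xy` and `microbiome_map` are hypothetical.

```python
def hilbert_d2xy(side: int, d: int) -> tuple[int, int]:
    """Map position d along a Hilbert curve to (x, y) on a side x side grid.

    side must be a power of two; d ranges over 0 .. side*side - 1.
    """
    x = y = 0
    t = d
    s = 1
    while s < side:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            # Rotate (and possibly flip) the current quadrant.
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def microbiome_map(abundances: list[float]) -> list[list[float]]:
    """Place an already ordered abundance profile on a square grid.

    The i-th taxon goes to the i-th position along the curve, so taxa that
    are neighbours in the ordering stay close together in the image.
    Unused cells (when the profile does not fill the grid) stay at 0.0.
    """
    side = 1
    while side * side < len(abundances):
        side *= 2
    grid = [[0.0] * side for _ in range(side)]
    for d, value in enumerate(abundances):
        x, y = hilbert_d2xy(side, d)
        grid[y][x] = value
    return grid


# Four taxa fill a 2 x 2 grid; rows are indexed by y, columns by x.
print(microbiome_map([0.1, 0.2, 0.3, 0.4]))
```

Because consecutive positions along the curve are always adjacent cells, an ordering that groups related taxa tends to produce contiguous regions in the image, which is what makes "hot spots" visible.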

3.
Microb Genom ; 8(12)2022 12.
Article in English | MEDLINE | ID: mdl-36748547

ABSTRACT

The use of whole metagenomic data to infer the relative abundance of all its microbes is well established. The same data can be used to determine the replication rate of all eubacterial taxa with circular chromosomes. Despite their availability, the replication rate profiles (metareplicome) have not been fully exploited in microbiome analyses. Another relatively new approach is the application of causal inferencing to analyse microbiome data that goes beyond correlational studies. A novel scalable pipeline called MeRRCI (Metagenome, metaResistome, and metaReplicome for Causal Inferencing) was developed. MeRRCI combines efficient computation of the metagenome (bacterial relative abundance), metaresistome (antimicrobial gene abundance) and metareplicome (replication rates), and integrates environmental variables (metadata) for causality analysis using Bayesian networks. MeRRCI was applied to an infant gut microbiome data set to investigate the microbial community's response to antibiotics. Our analysis suggests that the current treatment stratagem contributes to preterm infant gut dysbiosis, allowing a proliferation of pathobionts. The study highlights the specific antibacterial resistance genes that may contribute to exponential cell division in the presence of antibiotics for various pathogens, namely Klebsiella pneumoniae, Citrobacter freundii, Staphylococcus epidermidis, Veillonella parvula and Clostridium perfringens. These organisms often contribute to the harmful long-term sequelae seen in these young infants.


Subject(s)
Infant, Premature , Metagenome , Infant , Infant, Newborn , Humans , Anti-Bacterial Agents/pharmacology , Dysbiosis , Bayes Theorem , Bacteria , Drug Resistance, Bacterial/genetics
4.
Proc Natl Acad Sci U S A ; 105(8): 3100-5, 2008 Feb 26.
Article in English | MEDLINE | ID: mdl-18287045

ABSTRACT

One of the hallmarks of the Gram-negative bacterium Pseudomonas aeruginosa is its ability to thrive in diverse environments that include humans with a variety of debilitating diseases or immune deficiencies. Here we report the complete sequence and comparative analysis of the genomes of two representative P. aeruginosa strains isolated from cystic fibrosis (CF) patients whose genetic disorder predisposes them to infections by this pathogen. The comparison of the genomes of the two CF strains with those of other P. aeruginosa presents a picture of a mosaic genome, consisting of a conserved core component, interrupted in each strain by combinations of specific blocks of genes. These strain-specific segments of the genome are found in limited chromosomal locations, referred to as regions of genomic plasticity. The ability of P. aeruginosa to shape its genomic composition to favor survival in the widest range of environmental reservoirs, with corresponding enhancement of its metabolic capacity, is supported by the identification of a genomic island in one of the sequenced CF isolates, encoding enzymes capable of degrading terpenoids produced by trees. This work suggests that niche adaptation is a major evolutionary force influencing the composition of bacterial genomes. Unlike genome reduction seen in host-adapted bacterial pathogens, the genetic capacity of P. aeruginosa is determined by the ability of individual strains to acquire or discard genomic segments, giving rise to strains with customized genomic repertoires. Consequently, this organism can survive in a wide range of environmental reservoirs that can serve as sources of the infecting organisms.


Subject(s)
Cystic Fibrosis/complications , Environment , Evolution, Molecular , Genome, Bacterial , Phylogeny , Pseudomonas Infections/microbiology , Pseudomonas aeruginosa/genetics , Base Sequence , Genomics , Humans , Molecular Sequence Data , Pseudomonas Infections/etiology , Sequence Alignment , Sequence Analysis, DNA
5.
Nat Commun ; 10(1): 3028, 2019 07 10.
Article in English | MEDLINE | ID: mdl-31292434

ABSTRACT

Cerebellar neuronal progenitors undergo a series of divisions before irreversibly exiting the cell cycle and differentiating into neurons. Dysfunction of this process underlies many neurological diseases including ataxia and the most common pediatric brain tumor, medulloblastoma. To better define the pathways controlling the most abundant neuronal cells in the mammalian cerebellum, cerebellar granule cell progenitors (GCPs), we performed RNA-sequencing of GCPs exiting the cell cycle. Time-series modeling of GCP cell cycle exit identified downregulation of activity of the epigenetic reader protein Brd4. Brd4 binding to the Gli1 locus is controlled by Casein Kinase 1δ (CK1δ)-dependent phosphorylation during GCP proliferation, and decreases during GCP cell cycle exit. Importantly, conditional deletion of Brd4 in vivo in the developing cerebellum induces cerebellar morphological deficits and ataxia. These studies define an essential role for Brd4 in cerebellar granule cell neurogenesis and are critical for designing clinical trials utilizing Brd4 inhibitors in neurological indications.


Subject(s)
Cerebellar Ataxia/genetics , Cerebellar Cortex/growth & development , Neural Stem Cells/physiology , Neurogenesis/physiology , Nuclear Proteins/metabolism , Transcription Factors/metabolism , Animals , Animals, Newborn , Casein Kinase Idelta , Cell Cycle/physiology , Cell Differentiation/physiology , Cell Proliferation/physiology , Cerebellar Ataxia/pathology , Cerebellar Cortex/cytology , Cerebellar Cortex/pathology , Disease Models, Animal , Down-Regulation , Humans , Mice , Mice, Knockout , Neurons/physiology , Nuclear Proteins/genetics , Phosphorylation/physiology , Primary Cell Culture , Transcription Factors/genetics , Zinc Finger Protein GLI1/metabolism
6.
Sci Rep ; 7(1): 17344, 2017 12 11.
Article in English | MEDLINE | ID: mdl-29229974

ABSTRACT

We studied the transcriptome landscape of skin cutaneous melanoma (SKCM) using 103 primary tumor samples from TCGA, and measured the expression levels of both protein coding genes and non-coding RNAs (ncRNAs). In particular, we emphasized pseudogenes potentially relevant to this cancer. While cataloguing the profiles based on the known biotypes, all the employed RNA-Seq methods generated just a small consensus of significant biotypes. We thus designed an approach to reconcile the profiles from all methods following a simple strategy: we selected genes that were confirmed as differentially expressed by the ensemble predictions obtained in a regression model. The main advantages of this approach are: 1) Selection of a high-confidence gene set identifying relevant pathways; 2) Use of a regression model whose covariates embed all method-driven outcomes to predict an averaged profile; 3) Method-specific assessment of prediction power and significance. Furthermore, the approach can be generalized to any biological system for which noisy RNA-Seq profiles are computed. As our analyses concerned bio-annotations of both high-quality protein coding genes and ncRNAs, we considered the associations between pseudogenes and parental genes (targets). Among the candidate targets that were validated, we identified PINK1, which is studied in patients with Parkinson's disease and cancer (especially melanoma).


Subject(s)
High-Throughput Nucleotide Sequencing/methods , High-Throughput Nucleotide Sequencing/standards , Melanoma/genetics , Models, Statistical , Pseudogenes , Sequence Analysis, RNA/methods , Skin Neoplasms/genetics , Transcriptome , Gene Expression Regulation, Neoplastic , Humans , Melanoma, Cutaneous Malignant
7.
Methods Mol Biol ; 1167: 157-83, 2014.
Article in English | MEDLINE | ID: mdl-24823777

ABSTRACT

The detection of transcripts and the measurement of their associated activity at the pseudogene scale have recently become important topics of research. As an integral part of many recent studies aimed at establishing a role for a variety of noncoding RNA structures, pseudogenes have grown substantially in popularity owing to the discovery of regulatory properties and complex mechanisms of action that, while requiring further investigation, analysis, and validation, also promise to have a broad impact on human disease. Currently, there are relatively few methodologies specifically designed to detect pseudogene transcripts, and tools that either replace or integrate manual annotation procedures are very much needed. In particular, it seems to us justified that we engage in advancing the computational treatment of pseudogenes at the whole transcriptome level. Catalogs of human pseudogenes have started to be delivered through RNA-Seq technologies. However, only a limited number of transcriptomes have been covered. Furthermore, while most proposals have led to the production of a targeted algorithm, especially used for detection, few computational pipelines were designed following a comprehensive approach addressing identification and quantification of transcriptional activity within a unifying methodological frame. Given the currently incomplete evidence, the limited impact resulting from the lack of extensive testing, and the unresolved uncertainties affecting the reproducibility of results, our motivation for proposing a new computational approach is strong and timely. We have considered a hybrid approach, based on the assembly of a variety of computational tools, including RNA-Seq methods and machine learning applications, all applied to transcriptome data of various complexities. Our initial strategy is to provide lists of pseudogenes to be validated against the currently known examples, in order to extend our knowledge further. An ultimate goal that is naturally linked to this work is to provide an automatic approach that analyzes transcriptomes with the goal of detecting candidate pseudogenes through characteristic features and that allows efficient and reproducible pseudogene classification models.


Subject(s)
Computational Biology/methods , Genomics/methods , Pseudogenes/genetics , Transcription, Genetic , Brain/metabolism , Databases, Nucleic Acid , Humans , Sequence Analysis, RNA , Transcriptome
8.
J Clin Bioinforma ; 3(1): 8, 2013 Apr 18.
Article in English | MEDLINE | ID: mdl-23594746

ABSTRACT

BACKGROUND: Exploring stromal changes associated with tumor growth and development is a growing area of oncologic research. In order to study molecular changes in the stroma it is recommended to separate tumor tissue from stromal tissue. This is relevant to xenograft models where tumors can be small and difficult to separate from host tissue. We introduce a novel definition of cross-alignment/cross-hybridization to compare qualitatively the ability of high-throughput mRNA sequencing, RNA-Seq, and microarrays to detect tumor and stromal expression from mixed 'pseudo-xenograft' samples vis-à-vis genes and pathways in cross-alignment (RNA-Seq) and cross-hybridization (microarrays). Samples consisted of normal mouse lung and human breast cancer cells; these were combined in fixed proportions to create a titration series of 25% steps. Our definition identifies genes in a given species (human or mouse) with undetectable expression in same-species RNA but detectable expression in cross-species RNA. We demonstrate the comparative value of this method and discuss its potential contribution in cancer research. RESULTS: Our method can identify genes from either species that demonstrate cross-hybridization and/or cross-alignment properties. Surprisingly, the set of genes identified using a simpler and more common approach (using a 'pure' cross-species sample and calling all detected genes as 'crossers') is not a superset of the genes identified using our technique. The observed levels of cross-hybridization are relatively low: 5.3% of human genes detected in mouse, and 3.5% of mouse genes detected in human. Observed levels of cross-alignment are practically comparable to the levels of cross-hybridization: 6.5% of human genes detected in mouse, and 2.3% of mouse genes detected in human. We also observed a relatively high percentage of orthologs: 40.3% of cross-hybridizing genes, and 32.2% of cross-aligning genes. Normalizing the gene catalog to use Consensus Coding Sequence (CCDS) IDs (Genome Res 19:1316-1323, 2009), our results show that the observed levels of cross-hybridization are low: 2.7% of human CCDS IDs are detected in mouse, and 2.4% of mouse CCDS IDs are detected in human. Levels of cross-alignment using the RNA-Seq data are comparable for the mouse, 2.2% of mouse CCDS IDs detected in human, and 9.9% of human CCDS IDs detected in mouse. However, the lists of cross-aligning/cross-hybridizing genes contain many that are of specific interest to oncologic researchers. CONCLUSIONS: The conservative definition that we propose identifies genes in mouse whose expression can be attributed to human RNA, and vice versa, as well as revealing genes with cross-alignment/cross-hybridization properties which could not be identified using a simpler but more established approach. The overall percentage of genes affected by cross-hybridization/cross-alignment is small, but includes genes that are of interest to oncologic researchers. Which platform to use with mixed xenograft samples, microarrays or RNA-Seq, appears to be primarily a question of cost and whether the detection and measurement of expression of specific genes of interest are likely to be affected by cross-hybridization or cross-alignment.

9.
Adv Bioinformatics ; 2011: 271563, 2011.
Article in English | MEDLINE | ID: mdl-22194743

ABSTRACT

The GenSensor Suite consists of four web tools for elucidating relationships among genes and proteins. GenPath results show which biochemical, regulatory, or other gene set categories are over- or under-represented in an input list compared to a background list. All common gene sets are available for searching in GenPath, plus some specialized sets. Users can add custom background lists. GenInteract builds an interaction gene list from a single gene input and then analyzes this in GenPath. GenPubMed uses a PubMed query to identify a list of PubMed IDs, from which a gene list is extracted and queried in GenPath. GenViewer allows the user to query one gene set against another in GenPath. GenPath results are presented with relevant P- and q-values in an uncluttered, fully linked, and integrated table. Users can easily copy this table and paste it directly into a spreadsheet or document.
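
The abstract above does not say which statistical test GenPath uses to decide whether a gene set is over- or under-represented. One common choice for this kind of question is the hypergeometric test, with Benjamini-Hochberg adjustment to obtain q-values. The sketch below shows that approach purely as an assumption; all function names and example numbers are hypothetical and are not taken from GenSensor.

```python
from math import comb


def hypergeom_upper_tail(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) for an over-representation test.

    N: genes in the background list
    K: background genes that belong to the gene set
    n: genes in the input list
    k: input genes that belong to the gene set
    """
    total = comb(N, n)
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def benjamini_hochberg(p_values: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each q-value is monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q


# Made-up example: 8 of 40 input genes fall in a set that holds
# 100 of 5000 background genes.
p = hypergeom_upper_tail(k=8, n=40, K=100, N=5000)
print(f"over-representation p-value: {p:.3g}")
print(benjamini_hochberg([0.01, 0.04, 0.03]))
```

Under-representation would use the lower tail, P(X <= k), computed the same way over the range 0 to k.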
