Search | BVS IEC

Unravelling plasmidome distribution and interaction with its hosting microbiome.

Brown Kav, Aya; Rozov, Roye; Bogumil, David; Sørensen, Søren Johannes; Hansen, Lars Hestbjerg; Benhar, Itai; Halperin, Eran; Shamir, Ron; Mizrahi, Itzhak.

Environ Microbiol ; 22(1): 32-44, 2020 01.

Article in English | MEDLINE | ID: mdl-31602783

ABSTRACT

Horizontal gene transfer via plasmids plays a pivotal role in microbial evolution. The forces that shape plasmidome functionality and distribution in natural environments are insufficiently understood. Here, we present a comparative study of plasmidomes across adjacent microbial environments present in different individual rumen microbiomes. Our findings show that the rumen plasmidome displays enormous unknown functional potential that is currently unannotated in available databases. Nevertheless, this unknown functionality is conserved and shared with published rat gut plasmidome data. Moreover, the rumen plasmidome is highly diverse compared with the microbiome that hosts these plasmids, across both similar and different rumen habitats. Our analysis demonstrates that its structure is shaped more by stochasticity than by selection. Nevertheless, the plasmidome is an active partner in its intricate relationship with the host microbiome, with both interacting with and responding to their environment.

Subjects

Bacteria/genetics, Microbiota/genetics, Plasmids/genetics, Rumen/microbiology, Animals, Horizontal Gene Transfer

Faucet: streaming de novo assembly graph construction.

Rozov, Roye; Goldshlager, Gil; Halperin, Eran; Shamir, Ron.

Bioinformatics ; 34(1): 147-154, 2018 01 01.

Article in English | MEDLINE | ID: mdl-29036597

ABSTRACT

Motivation: We present Faucet, a two-pass streaming algorithm for assembly graph construction. Faucet builds an assembly graph incrementally as each read is processed. Thus, reads need not be stored locally, as they can be processed while downloading data and then discarded. We demonstrate this functionality by performing streaming graph assembly of publicly available data, and observe that the ratio of disk use to raw data size decreases as coverage is increased. Results: Faucet pairs the de Bruijn graph obtained from the reads with additional metadata derived from them. We show these metadata (coverage counts collected at junction k-mers and connections bridging between junction pairs) contain most salient information needed for assembly, and demonstrate they enable cleaning of metagenome assembly graphs, greatly improving contiguity while maintaining accuracy. We compared Faucet's resource use and assembly quality to state-of-the-art metagenome assemblers, as well as leading resource-efficient genome assemblers. Faucet used orders of magnitude less time and disk space than the specialized metagenome assemblers MetaSPAdes and Megahit, while also improving on their memory use; this broadly matched performance of other assemblers optimizing resource efficiency, namely Minia and LightAssembler. However, on metagenomes tested, Faucet's outputs had 14-110% higher mean NGA50 lengths compared with Minia, and 2- to 11-fold higher mean NGA50 lengths compared with LightAssembler, the only other streaming assembler available. Availability and implementation: Faucet is available at https://github.com/Shamir-Lab/Faucet. Contact: rshamir@tau.ac.il or eranhalperin@gmail.com. Supplementary information: Supplementary data are available at Bioinformatics online.
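The two-pass idea described in this abstract can be illustrated with a minimal sketch. It is not Faucet's implementation: it assumes a plain Python set in place of whatever compact structure the tool uses, ignores reverse complements, and the function names (`first_pass`, `second_pass`) are hypothetical. The first pass collects k-mers while reads stream by; the second pass counts coverage only at junction k-mers, meaning those with more than one successor or predecessor.

```python
from collections import Counter

def kmers(read, k):
    """Yield every k-mer of a read, left to right."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]

def first_pass(read_stream, k):
    """Collect the k-mers seen; each read is dropped after processing."""
    seen = set()
    for read in read_stream:
        seen.update(kmers(read, k))
    return seen

def successors(kmer, seen):
    """k-mers in `seen` that overlap `kmer` by k-1 bases on the right."""
    return [kmer[1:] + b for b in "ACGT" if kmer[1:] + b in seen]

def predecessors(kmer, seen):
    """k-mers in `seen` that overlap `kmer` by k-1 bases on the left."""
    return [b + kmer[:-1] for b in "ACGT" if b + kmer[:-1] in seen]

def second_pass(read_stream, k, seen):
    """Count coverage only at junction k-mers (in- or out-degree > 1)."""
    junctions = {
        km for km in seen
        if len(successors(km, seen)) > 1 or len(predecessors(km, seen)) > 1
    }
    coverage = Counter()
    for read in read_stream:
        for km in kmers(read, k):
            if km in junctions:
                coverage[km] += 1
    return coverage

# Two toy reads that diverge after "CGT"; the stream is consumed twice.
reads = ["ACGTA", "ACGTC"]
seen = first_pass(iter(reads), 3)
print(dict(second_pass(iter(reads), 3, seen)))  # {'CGT': 2}
```

Because each pass only iterates over the stream, the reads themselves never need to be held in memory or on disk, which is the property the abstract emphasizes.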

Subjects

Genomics/methods, Metagenome, Microbiota/genetics, DNA Sequence Analysis/methods, Software, Algorithms, Humans

Recycler: an algorithm for detecting plasmids from de novo assembly graphs.

Rozov, Roye; Brown Kav, Aya; Bogumil, David; Shterzer, Naama; Halperin, Eran; Mizrahi, Itzhak; Shamir, Ron.

Bioinformatics ; 33(4): 475-482, 2017 02 15.

Article in English | MEDLINE | ID: mdl-28003256

ABSTRACT

Motivation: Plasmids and other mobile elements are central contributors to microbial evolution and genome innovation. Recently, they have been found to have important roles in antibiotic resistance and in affecting production of metabolites used in industrial and agricultural applications. However, their characterization through deep sequencing remains challenging, in spite of rapid drops in cost and throughput increases for sequencing. Here, we attempt to ameliorate this situation by introducing a new circular element assembly algorithm, leveraging assembly graphs provided by a conventional de novo assembler and alignments of paired-end reads to assemble cyclic sequences likely to be plasmids, phages and other circular elements. Results: We introduce Recycler, the first tool that can extract complete circular contigs from sequence data of isolate microbial genomes, plasmidome and metagenome sequence data. We show that Recycler greatly increases the number of true plasmids recovered relative to other approaches while remaining highly accurate. We demonstrate this trend via simulations of plasmidomes, comparisons of predictions with reference data for isolate samples, and assessments of annotation accuracy on metagenome data. In addition, we provide validation by DNA amplification of 77 plasmids predicted by Recycler from the different sequenced samples, in which Recycler showed mean accuracy of 89% across all data types (isolate, microbiome and plasmidome). Availability and Implementation: Recycler is available at http://github.com/Shamir-Lab/Recycler. Contact: imizrahi@bgu.ac.il. Supplementary information: Supplementary data are available at Bioinformatics online.
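As a rough illustration of what extracting a cyclic sequence from an assembly graph involves, the sketch below searches a small directed contig graph for one simple cycle through a given node. This is a generic depth-first search, not Recycler's algorithm, which according to the abstract also draws on paired-end read alignments; the graph encoding (a dict of adjacency lists) and the function name `find_cycle` are assumptions made for the example.

```python
def find_cycle(graph, start):
    """Return one simple cycle through `start` as a list of nodes, or None.

    `graph` maps each node (e.g. a contig ID) to a list of successor nodes.
    """
    stack = [(start, [start])]
    while stack:
        node, path = stack.pop()
        for nxt in graph.get(node, []):
            if nxt == start:
                return path            # the path closes back onto the start
            if nxt not in path:        # keep the path simple (no repeats)
                stack.append((nxt, path + [nxt]))
    return None

# Toy graph: contigs a -> b -> c -> a form a circle; d is a dead end.
graph = {"a": ["b"], "b": ["c"], "c": ["a", "d"], "d": []}
print(find_cycle(graph, "a"))  # ['a', 'b', 'c']
print(find_cycle(graph, "d"))  # None
```

A real tool would still have to decide which of many candidate cycles correspond to genuine circular elements; this sketch only shows the graph-traversal step.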

Subjects

Bacteria/genetics, Bacterial Genome, Metagenome, Plasmids, DNA Sequence Analysis/methods, Software, Algorithms, Escherichia coli/genetics, High-Throughput Nucleotide Sequencing/methods

Patient-specific induced pluripotent stem-cell-derived models of LEOPARD syndrome.

Carvajal-Vergara, Xonia; Sevilla, Ana; D'Souza, Sunita L; Ang, Yen-Sin; Schaniel, Christoph; Lee, Dung-Fang; Yang, Lei; Kaplan, Aaron D; Adler, Eric D; Rozov, Roye; Ge, Yongchao; Cohen, Ninette; Edelmann, Lisa J; Chang, Betty; Waghray, Avinash; Su, Jie; Pardo, Sherly; Lichtenbelt, Klaske D; Tartaglia, Marco; Gelb, Bruce D; Lemischka, Ihor R.

Nature ; 465(7299): 808-12, 2010 Jun 10.

Article in English | MEDLINE | ID: mdl-20535210

ABSTRACT

The generation of reprogrammed induced pluripotent stem cells (iPSCs) from patients with defined genetic disorders holds the promise of increased understanding of the aetiologies of complex diseases and may also facilitate the development of novel therapeutic interventions. We have generated iPSCs from patients with LEOPARD syndrome (an acronym formed from its main features; that is, lentigines, electrocardiographic abnormalities, ocular hypertelorism, pulmonary valve stenosis, abnormal genitalia, retardation of growth and deafness), an autosomal-dominant developmental disorder belonging to a relatively prevalent class of inherited RAS-mitogen-activated protein kinase signalling diseases, which also includes Noonan syndrome, with pleomorphic effects on several tissues and organ systems. The patient-derived cells have a mutation in the PTPN11 gene, which encodes the SHP2 phosphatase. The iPSCs have been extensively characterized and produce multiple differentiated cell lineages. A major disease phenotype in patients with LEOPARD syndrome is hypertrophic cardiomyopathy. We show that in vitro-derived cardiomyocytes from LEOPARD syndrome iPSCs are larger, have a higher degree of sarcomeric organization and preferential localization of NFATC4 in the nucleus when compared with cardiomyocytes derived from human embryonic stem cells or wild-type iPSCs derived from a healthy brother of one of the LEOPARD syndrome patients. These features correlate with a potential hypertrophic state. We also provide molecular insights into signalling pathways that may promote the disease phenotype.

Subjects

Induced Pluripotent Stem Cells/pathology, LEOPARD Syndrome/pathology, Biological Models, Precision Medicine, Adult, Cell Differentiation, Cell Line, Cell Lineage, Cultured Cells, Embryonic Stem Cells/metabolism, Enzyme Activation, Female, Fibroblasts/metabolism, Fibroblasts/pathology, Gene Expression Profiling, Homeodomain Proteins/genetics, Humans, Induced Pluripotent Stem Cells/enzymology, Induced Pluripotent Stem Cells/metabolism, LEOPARD Syndrome/drug therapy, LEOPARD Syndrome/metabolism, Male, Mitogen-Activated Protein Kinases/metabolism, Cardiac Myocytes/metabolism, Cardiac Myocytes/pathology, NFATC Transcription Factors/genetics, NFATC Transcription Factors/metabolism, Nanog Homeobox Protein, Octamer Transcription Factor-3/genetics, Phosphoproteins/analysis, Polymerase Chain Reaction, Protein Tyrosine Phosphatase Non-Receptor Type 11/genetics, Protein Tyrosine Phosphatase Non-Receptor Type 11/metabolism, SOXB1 Transcription Factors/genetics

Systems-level dynamic analyses of fate change in murine embryonic stem cells.

Lu, Rong; Markowetz, Florian; Unwin, Richard D; Leek, Jeffrey T; Airoldi, Edoardo M; MacArthur, Ben D; Lachmann, Alexander; Rozov, Roye; Ma'ayan, Avi; Boyer, Laurie A; Troyanskaya, Olga G; Whetton, Anthony D; Lemischka, Ihor R.

Nature ; 462(7271): 358-62, 2009 Nov 19.

Article in English | MEDLINE | ID: mdl-19924215

ABSTRACT

Molecular regulation of embryonic stem cell (ESC) fate involves a coordinated interaction between epigenetic, transcriptional and translational mechanisms. It is unclear how these different molecular regulatory mechanisms interact to regulate changes in stem cell fate. Here we present a dynamic systems-level study of cell fate change in murine ESCs following a well-defined perturbation. Global changes in histone acetylation, chromatin-bound RNA polymerase II, messenger RNA (mRNA), and nuclear protein levels were measured over 5 days after downregulation of Nanog, a key pluripotency regulator. Our data demonstrate how a single genetic perturbation leads to progressive widespread changes in several molecular regulatory layers, and provide a dynamic view of information flow in the epigenome, transcriptome and proteome. We observe that a large proportion of changes in nuclear protein levels are not accompanied by concordant changes in the expression of corresponding mRNAs, indicating important roles for translational and post-translational regulation of ESC fate. Gene-ontology analysis across different molecular layers indicates that although chromatin reconfiguration is important for altering cell fate, it is preceded by transcription-factor-mediated regulatory events. The temporal order of gene expression alterations shows the order of the regulatory network reconfiguration and offers further insight into the gene regulatory network. Our studies extend the conventional systems biology approach to include many molecular species, regulatory layers and temporal series, and underscore the complexity of the multilayer regulatory mechanisms responsible for changes in protein expression that determine stem cell fate.

Subjects

Cell Differentiation, Embryonic Stem Cells/cytology, Embryonic Stem Cells/metabolism, Animals, Genetic Epigenesis, Gene Expression Profiling, Developmental Gene Expression Regulation, Mice, Proteome, Time Factors

Fast lossless compression via cascading Bloom filters.

Rozov, Roye; Shamir, Ron; Halperin, Eran.

BMC Bioinformatics ; 15 Suppl 9: S7, 2014.

Article in English | MEDLINE | ID: mdl-25252952

ABSTRACT

BACKGROUND: Data from large Next Generation Sequencing (NGS) experiments present challenges both in terms of costs associated with storage and in time required for file transfer. It is sometimes possible to store only a summary relevant to particular applications, but generally it is desirable to keep all information needed to revisit experimental results in the future. Thus, the need for efficient lossless compression methods for NGS reads arises. It has been shown that NGS-specific compression schemes can improve results over generic compression methods, such as the Lempel-Ziv algorithm, Burrows-Wheeler transform, or Arithmetic Coding. When a reference genome is available, effective compression can be achieved by first aligning the reads to the reference genome, and then encoding each read using the alignment position combined with the differences in the read relative to the reference. These reference-based methods have been shown to compress better than reference-free schemes, but the alignment step they require demands several hours of CPU time on a typical dataset, whereas reference-free methods can usually compress in minutes. RESULTS: We present a new approach that achieves highly efficient compression by using a reference genome, but completely circumvents the need for alignment, affording a great reduction in the time needed to compress. In contrast to reference-based methods that first align reads to the genome, we hash all reads into Bloom filters to encode, and decode by querying the same Bloom filters using read-length subsequences of the reference genome. Further compression is achieved by using a cascade of such filters. CONCLUSIONS: Our method, called BARCODE, runs an order of magnitude faster than reference-based methods, while compressing an order of magnitude better than reference-free methods, over a broad range of sequencing coverage. In high coverage (50-100 fold), compared to the best tested compressors, BARCODE saves 80-90% of the running time while only increasing space slightly.
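The encode-by-hashing, decode-by-querying scheme in this abstract can be sketched as follows. This is a simplified illustration under stated assumptions, not BARCODE itself: it handles only distinct reads that occur exactly in the reference, uses a two-level cascade with an explicit exception set at the end, and the class and function names (`BloomFilter`, `encode`, `decode`) and parameters (`num_bits`, `num_hashes`) are hypothetical.

```python
import hashlib

class BloomFilter:
    """A small Bloom filter over strings, using seeded SHA-256 hashes."""
    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))

def reference_windows(reference, read_len):
    """All read-length substrings of the reference."""
    return {reference[i:i + read_len] for i in range(len(reference) - read_len + 1)}

def encode(reads, reference, read_len, num_bits=256, num_hashes=3):
    reads = set(reads)
    b1 = BloomFilter(num_bits, num_hashes)
    for r in reads:
        b1.add(r)
    # Second level: reference windows that b1 wrongly accepts.
    false_pos = {w for w in reference_windows(reference, read_len)
                 if w in b1 and w not in reads}
    b2 = BloomFilter(num_bits, num_hashes)
    for w in false_pos:
        b2.add(w)
    # True reads that b2 wrongly flags are listed explicitly.
    exceptions = {r for r in reads if r in b2}
    return b1, b2, exceptions

def decode(b1, b2, exceptions, reference, read_len):
    """Recover the reads by querying the filters with reference windows."""
    return {w for w in reference_windows(reference, read_len)
            if w in b1 and (w not in b2 or w in exceptions)}

reference = "ACGTACGGTCAATGCCGTA"
reads = ["ACGTA", "GGTCA", "TGCCG"]
b1, b2, exc = encode(reads, reference, 5, num_bits=16, num_hashes=2)
print(decode(b1, b2, exc, reference, 5) == set(reads))  # True
```

The second filter and the exception set together cancel the first filter's false positives, so decoding is exact for this restricted input even when the filters are deliberately tiny. Reads that differ from the reference, and repeated reads, would need separate handling that this sketch omits.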

Subjects

Data Compression/methods, High-Throughput Nucleotide Sequencing/methods, Algorithms, Data Compression/economics, Genome, High-Throughput Nucleotide Sequencing/economics, Software

MGMR: leveraging RNA-Seq population data to optimize expression estimation.

Rozov, Roye; Halperin, Eran; Shamir, Ron.

BMC Bioinformatics ; 13 Suppl 6: S2, 2012 Apr 19.

Article in English | MEDLINE | ID: mdl-22537041

ABSTRACT

BACKGROUND: RNA-Seq is a technique that uses Next Generation Sequencing to identify transcripts and estimate transcription levels. When applying this technique for quantification, one must contend with reads that align to multiple positions in the genome (multireads). Previous efforts to resolve multireads have shown that RNA-Seq expression estimation can be improved using probabilistic allocation of reads to genes. These methods use a probabilistic generative model for data generation and resolve ambiguity using likelihood-based approaches. In many instances, RNA-Seq experiments are performed in the context of a population. The generative models of current methods do not take into account such population information, and it is an open question whether this information can improve quantification of the individual samples. RESULTS: In order to explore the contribution of population-level information in RNA-Seq quantification, we apply a hierarchical probabilistic generative model, which assumes that expression levels of different individuals are sampled from a Dirichlet distribution with parameters specific to the population, and reads are sampled from the distribution of expression levels. We introduce an optimization procedure for the estimation of the model parameters, and use HapMap data and simulated data to demonstrate that the model yields a significant improvement in the accuracy of expression levels of paralogous genes. CONCLUSIONS: We provide a proof of principle of the benefit of drawing on population commonalities to estimate expression. The results of our experiments demonstrate this approach can be beneficial, primarily for estimation at the gene level.
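The hierarchical generative model described here can be made concrete with a small forward-sampling sketch. It covers only the generative side (a population-level Dirichlet, per-individual expression levels, then reads), not the parameter-estimation procedure the abstract mentions. The function names and the example Dirichlet parameters are assumptions chosen for illustration, and multireads are not modelled.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one vector from Dirichlet(alpha) by normalizing Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def sample_individual(alpha, num_reads, rng):
    """Sample one individual's expression levels, then its read counts.

    alpha: population-level Dirichlet parameters, one per gene.
    Returns (expression levels, per-gene read counts).
    """
    theta = sample_dirichlet(alpha, rng)
    counts = [0] * len(alpha)
    # Each read is assigned to a gene according to the expression levels.
    for gene in rng.choices(range(len(alpha)), weights=theta, k=num_reads):
        counts[gene] += 1
    return theta, counts

rng = random.Random(0)
population_alpha = [2.0, 1.0, 0.5]   # hypothetical parameters for three genes
for _ in range(3):                    # three individuals from one population
    theta, counts = sample_individual(population_alpha, 100, rng)
    print([round(t, 2) for t in theta], counts)
```

Individuals drawn this way share the same `population_alpha`, which is the population commonality that the model is meant to exploit when estimating expression for each sample.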

Subjects

Gene Expression Profiling/methods, Population Genetics/methods, Statistical Models, Genome, HapMap Project, High-Throughput Nucleotide Sequencing, Humans, RNA Sequence Analysis
