1.
Science ; 369(6502)2020 07 24.
Article En | MEDLINE | ID: mdl-32703847

The extensive heterogeneity of biological data poses challenges to analysis and interpretation. Construction of a large-scale mechanistic model of Escherichia coli enabled us to integrate and cross-evaluate a massive, heterogeneous dataset based on measurements reported by various groups over decades. We identified inconsistencies with functional consequences across the data, including that the total output of the ribosomes and RNA polymerases described by data are not sufficient for a cell to reproduce measured doubling times, that measured metabolic parameters are neither fully compatible with each other nor with overall growth, and that essential proteins are absent during the cell cycle, and the cell is robust to this absence. Finally, considering these data as a whole leads to successful predictions of new experimental outcomes, in this case protein half-lives.


Data Analysis , Datasets as Topic , Escherichia coli Proteins , Escherichia coli , Computer Simulation
2.
Nat Protoc ; 13(1): 155-169, 2018 Jan.
Article En | MEDLINE | ID: mdl-29266096

Although kinases are important regulators of many cellular processes, measuring their activity in live cells remains challenging. We have developed kinase translocation reporters (KTRs), which enable multiplexed measurements of the dynamics of kinase activity at a single-cell level. These KTRs are composed of an engineered construct in which a kinase substrate is fused to a bipartite nuclear localization signal (bNLS) and nuclear export signal (NES), as well as to a fluorescent protein for microscopy-based detection of its localization. The negative charge introduced by phosphorylation of the substrate is used to directly modulate nuclear import and export, thereby regulating the reporter's distribution between the cytoplasm and nucleus. The relative cytoplasmic versus nuclear fluorescence of the KTR construct (the C/N ratio) is used as a proxy for the kinase activity in living, single cells. Multiple KTRs can be studied in the same cell by fusing them to different fluorescent proteins. Here, we present a protocol to execute and analyze live-cell microscopy experiments using KTRs. We describe strategies for development of new KTRs and procedures for lentiviral expression of KTRs in a cell line of choice. Cells are then plated in a 96-well plate, from which multichannel fluorescent images are acquired with automated time-lapse microscopy. We provide detailed guidance for a computational analysis and parameterization pipeline. The entire procedure, from virus production to data analysis, can be completed in ∼10 d.
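The C/N ratio described in the abstract above can be illustrated with a minimal sketch. It assumes that the mean cytoplasmic and nuclear fluorescence intensities have already been measured for one cell at each time point (segmentation is not shown), and it subtracts a constant background, which is an assumption of this example rather than a step stated in the abstract. The function names `cn_ratio` and `cn_trace` are hypothetical.

```python
def cn_ratio(cyto_mean, nuc_mean, background=0.0):
    """Cytoplasmic-to-nuclear fluorescence ratio for one cell at one time point.

    `background` is an assumed constant offset subtracted from both
    compartments; it is not taken from the protocol itself.
    """
    cyto = cyto_mean - background
    nuc = nuc_mean - background
    if nuc <= 0:
        raise ValueError("nuclear signal must exceed background")
    return cyto / nuc


def cn_trace(cyto_series, nuc_series, background=0.0):
    """C/N ratio over time for one cell, given paired intensity series."""
    return [cn_ratio(c, n, background) for c, n in zip(cyto_series, nuc_series)]


if __name__ == "__main__":
    # Reporter moving from the nucleus to the cytoplasm: the ratio rises.
    print(cn_trace([200.0, 400.0], [200.0, 100.0]))
```

A rising trace would correspond to the reporter leaving the nucleus, which the abstract uses as a proxy for kinase activity.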


Molecular Imaging/methods , Nuclear Localization Signals/metabolism , Phosphotransferases , Recombinant Fusion Proteins/metabolism , Single-Cell Analysis/methods , Cell Nucleus/chemistry , Cell Nucleus/metabolism , Cytoplasm/chemistry , Cytoplasm/metabolism , Genes, Reporter , HEK293 Cells , Humans , Image Processing, Computer-Assisted , Luminescent Proteins/chemistry , Luminescent Proteins/genetics , Luminescent Proteins/metabolism , Nuclear Localization Signals/genetics , Phosphorylation , Phosphotransferases/analysis , Phosphotransferases/metabolism , Recombinant Fusion Proteins/chemistry , Recombinant Fusion Proteins/genetics
3.
PeerJ ; 5: e4026, 2017.
Article En | MEDLINE | ID: mdl-29204318

The amino acid content of the proteins encoded by a genome may predict the coding potential of that genome and may reflect lifestyle restrictions of the organism. Here, we calculated the Kullback-Leibler divergence from the mean amino acid content as a metric to compare the amino acid composition for a large set of bacterial and phage genome sequences. Using these data, we demonstrate that (i) there is a significant difference between amino acid utilization in different phylogenetic groups of bacteria and phages; (ii) many of the bacteria with the most skewed amino acid utilization profiles, or the bacteria that host phages with the most skewed profiles, are endosymbionts or parasites; (iii) the skews in the distribution are not restricted to certain metabolic processes but are common across all bacterial genomic subsystems; (iv) amino acid utilization profiles strongly correlate with GC content in bacterial genomes but very weakly correlate with the G+C percent in phage genomes. These findings might be exploited to distinguish coding from non-coding sequences in large data sets, such as metagenomic sequence libraries, to help in prioritizing subsequent analyses.
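As a rough illustration of the metric named in the abstract above, the sketch below computes the Kullback-Leibler divergence of each genome's amino acid composition from the mean composition of the set. The abstract does not give the exact formulation, so the direction of the divergence (genome relative to mean), the log base (2), the pseudocount, and the unweighted mean over genomes are all assumptions of this example. The function names are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_frequencies(protein_seqs, pseudocount=1.0):
    """Relative amino acid frequencies across all proteins of one genome.

    The pseudocount (an assumption) keeps every frequency above zero so
    the logarithm below is always defined.
    """
    counts = Counter()
    for seq in protein_seqs:
        counts.update(c for c in seq.upper() if c in AMINO_ACIDS)
    total = sum(counts[a] for a in AMINO_ACIDS) + pseudocount * len(AMINO_ACIDS)
    return {a: (counts[a] + pseudocount) / total for a in AMINO_ACIDS}


def kl_divergence(p, q):
    """D_KL(p || q) in bits over the 20 standard amino acids."""
    return sum(p[a] * math.log2(p[a] / q[a]) for a in AMINO_ACIDS)


def divergence_from_mean(genomes):
    """Map each genome name to its divergence from the mean composition.

    `genomes` maps a name to a list of protein sequences.
    """
    freqs = {name: aa_frequencies(seqs) for name, seqs in genomes.items()}
    mean = {a: sum(f[a] for f in freqs.values()) / len(freqs) for a in AMINO_ACIDS}
    return {name: kl_divergence(f, mean) for name, f in freqs.items()}


if __name__ == "__main__":
    toy = {"skewed": ["AAAAAAAAAA"], "mixed": ["MKVLGDEW"]}
    for name, value in divergence_from_mean(toy).items():
        print(name, round(value, 3))
```

Genomes with a larger value have a composition further from the set average, which is the sense of "skewed" used in the abstract.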

4.
Front Microbiol ; 6: 381, 2015.
Article En | MEDLINE | ID: mdl-26005436

Phages are the most abundant biological entities on Earth and play major ecological roles, yet the current sequenced phage genomes do not adequately represent their diversity, and little is known about the abundance and distribution of these sequenced genomes in nature. Although the study of phage ecology has benefited tremendously from the emergence of metagenomic sequencing, a systematic survey of phage genes and genomes in various ecosystems is still lacking, and fundamental questions about phage biology, lifestyle, and ecology remain unanswered. To address these questions and improve comparative analysis of phages in different metagenomes, we screened a core set of publicly available metagenomic samples for sequences related to completely sequenced phages using the web tool, Phage Eco-Locator. We then adopted and deployed an array of mathematical and statistical metrics for a multidimensional estimation of the abundance and distribution of phage genes and genomes in various ecosystems. Experiments using those metrics individually showed their usefulness in emphasizing the pervasive, yet uneven, distribution of known phage sequences in environmental metagenomes. Using these metrics in combination allowed us to resolve phage genomes into clusters that correlated with their genotypes and taxonomic classes as well as their ecological properties. We propose adding this set of metrics to current metaviromic analysis pipelines, where they can provide insight regarding phage mosaicism, habitat specificity, and evolution.

5.
Front Genet ; 4: 41, 2013.
Article En | MEDLINE | ID: mdl-23579547

Metagenomics is a primary tool for the description of microbial and viral communities. The sheer magnitude of the data generated in each metagenome makes key differences in function and taxonomy between communities difficult to elucidate. Here we discuss the application of seven different data mining and statistical analyses by comparing and contrasting the metabolic functions of 212 microbial metagenomes within and between 10 environments. Not all approaches are appropriate for all questions, and researchers should decide which approach addresses their questions. This work demonstrated the use of each approach: for example, random forests provided a robust and enlightening description of both the clustering of metagenomes and the metabolic processes that were important in separating microbial communities from different environments. All analyses identified that the presence of phage genes within the microbial community was a predictor of whether the microbial community was host-associated or free-living. Several analyses identified the subtle differences that occur with environments, such as those seen in different regions of the marine environment.

6.
Sci Rep ; 3: 1033, 2013.
Article En | MEDLINE | ID: mdl-23301154

All sequence data contain inherent information that can be measured by Shannon's uncertainty theory. Such measurement is valuable in evaluating large data sets, such as metagenomic libraries, to prioritize their analysis and annotation, thus saving computational resources. Here, Shannon's index of complete phage and bacterial genomes was examined. The information content of a genome was found to be highly dependent on the genome length, GC content, and sequence word size. In metagenomic sequences, the amount of information correlated with the number of matches found by comparison to sequence databases. A sequence with more information (higher uncertainty) has a higher probability of being significantly similar to other sequences in the database. Measuring uncertainty may be used for rapid screening for sequences with matches in available databases, prioritizing computational resources, and indicating which sequences with no known similarities are likely to be important for more detailed analysis.
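The dependence on word size mentioned in the abstract above can be seen in a minimal sketch of Shannon's index computed over overlapping words (k-mers) of a sequence. The use of overlapping words, log base 2, and the function name `shannon_index` are assumptions of this example; the abstract does not specify how words were counted.

```python
import math
from collections import Counter


def shannon_index(sequence, word_size=3):
    """Shannon uncertainty (bits) of the overlapping-word distribution."""
    seq = sequence.upper()
    words = [seq[i:i + word_size] for i in range(len(seq) - word_size + 1)]
    if not words:
        raise ValueError("sequence is shorter than the word size")
    counts = Counter(words)
    total = len(words)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


if __name__ == "__main__":
    # A homopolymer carries no uncertainty; a more varied sequence carries more.
    for seq in ("AAAAAAAAAAAA", "ACGTACGTACGT", "ACGGTCATTGCA"):
        print(seq, [round(shannon_index(seq, k), 3) for k in (1, 2, 3)])
```

Short sequences cap the attainable value because they contain few words, which is consistent with the length dependence the abstract reports.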


Computing Methodologies , Information Theory , Metagenomics/methods , Sequence Analysis, DNA/statistics & numerical data , Algorithms , Bacteria/genetics , Bacteriophages/genetics , Base Composition , Base Sequence , Genome, Bacterial/genetics , Genome, Viral/genetics , Genomic Library , Metagenome/genetics
7.
Environ Microbiol ; 14(11): 3043-65, 2012 Nov.
Article En | MEDLINE | ID: mdl-23039259

Oxygen minimum zones (OMZs) are oceanographic features that affect ocean productivity and biodiversity, and contribute to ocean nitrogen loss and greenhouse gas emissions. Here we describe the viral communities associated with the Eastern Tropical South Pacific (ETSP) OMZ off Iquique, Chile for the first time through abundance estimates and viral metagenomic analysis. The viral-to-microbial ratio (VMR) in the ETSP OMZ fluctuated in the oxycline and declined in the anoxic core to below one on several occasions. The number of viral genotypes (unique genomes as defined by sequence assembly) ranged from 2040 at the surface to 98 in the oxycline, which is the lowest viral diversity recorded to date in the ocean. Within the ETSP OMZ viromes, only 4.95% of genotypes were shared between surface and anoxic core viromes using reciprocal BLASTn sequence comparison. ETSP virome comparison with surface marine viromes (Sargasso Sea, Gulf of Mexico, Kingman Reef, Chesapeake Bay) revealed a dissimilarity of ETSP OMZ viruses to those from other oceanic regions. From the 1.4 million non-redundant DNA sequences sampled within the altered oxygen conditions of the ETSP OMZ, more than 97.8% were novel. Of the average 3.2% of sequences that showed similarity to the SEED non-redundant database, phage sequences dominated the surface viromes, eukaryotic virus sequences dominated the oxycline viromes, and phage sequences dominated the anoxic core viromes. The viral community of the ETSP OMZ was characterized by fluctuations in abundance, taxa and diversity across the oxygen gradient. The ecological significance of these changes was difficult to predict; however, it appears that the reduction in oxygen coincides with an increased shedding of eukaryotic viruses in the oxycline, and a shift to unique viral genotypes in the anoxic core.


Biodiversity , Oxygen/metabolism , Seawater/virology , Virus Physiological Phenomena , Anaerobiosis , Bacteria/classification , Bacteria/genetics , Bacteriophages/genetics , Bacteriophages/physiology , Chile , Genotype , Nitrogen/metabolism , Oceans and Seas , Oxidation-Reduction , Phylogeny , Sulfur/metabolism , Viruses/genetics
8.
PLoS One ; 7(8): e42888, 2012.
Article En | MEDLINE | ID: mdl-22936998

Next-generation sequencing technologies are rapidly transforming molecular systematic studies of non-model animal taxa. The arachnid order Opiliones (commonly known as "harvestmen") includes more than 6,400 described species placed into four well-supported lineages (suborders). Fossil plus molecular clock evidence indicates that these lineages were diverging in the late Silurian to mid-Carboniferous, with some fossil harvestmen representing the earliest known land animals. Perhaps because of this ancient divergence, phylogenetic resolution of subordinal interrelationships within Opiliones has been difficult. We present the first phylogenomics analysis for harvestmen, derived from comparative RNA-Seq data for eight species representing all suborders. Over 30 gigabases of original Illumina short-read data were used in de novo assemblies, resulting in 50-80,000 transcripts per taxon. Transcripts were compared to published scorpion and tick genomics data, and a stringent filtering process was used to identify over 350 putatively single-copy, orthologous protein-coding genes shared among taxa. Phylogenetic analyses using various partitioning strategies, data coding schemes, and analytical methods overwhelmingly support the "classical" hypothesis of Opiliones relationships, including the higher-level clades Palpatores and Phalangida. Relaxed molecular clock analyses using multiple alternative fossil calibration strategies corroborate ancient divergences within Opiliones that are possibly deeper than the recorded fossil record indicates. The assembled data matrices, comprising genes that are conserved, highly expressed, and varying in length and phylogenetic informativeness, represent an important resource for future molecular systematic studies of Opiliones and other arachnid groups.


Arachnida/genetics , Phylogeny , Animals , Arachnida/classification , Transcriptome/genetics
9.
Nucleic Acids Res ; 40(16): e126, 2012 Sep.
Article En | MEDLINE | ID: mdl-22584627

Prophages are phages in lysogeny that are integrated into, and replicated as part of, the host bacterial genome. These mobile elements can have tremendous impact on their bacterial hosts' genomes and phenotypes, which may lead to strain emergence and diversification, increased virulence or antibiotic resistance. However, finding prophages in microbial genomes remains a problem with no definitive solution. The majority of existing tools rely on detecting genomic regions enriched in protein-coding genes with known phage homologs, which hinders the de novo discovery of phage regions. In this study, a weighted phage detection algorithm, PhiSpy, was developed based on seven distinctive characteristics of prophages, i.e. protein length, transcription strand directionality, customized AT and GC skew, the abundance of unique phage words, phage insertion points and the similarity of phage proteins. The first five characteristics are capable of identifying prophages without any sequence similarity with known phage genes. PhiSpy locates prophages by ranking genomic regions enriched in distinctive phage traits, which leads to the successful prediction of 94% of prophages in 50 complete bacterial genomes with a 6% false-negative rate and a 0.66% false-positive rate.
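Of the characteristics listed in the abstract above, AT and GC skew are the simplest to illustrate. The sketch below computes the standard skew definitions, (G - C)/(G + C) and (A - T)/(A + T), over sliding windows. This is not PhiSpy's "customized" skew, whose details the abstract does not give, and it is not PhiSpy code; the window and step sizes and the function names are assumptions of this example.

```python
def skew(window, plus, minus):
    """Standard skew (plus - minus) / (plus + minus); 0.0 if neither base occurs."""
    seq = window.upper()
    p = seq.count(plus)
    m = seq.count(minus)
    return 0.0 if p + m == 0 else (p - m) / (p + m)


def windowed_skews(sequence, window=1000, step=500):
    """Return (start, gc_skew, at_skew) for each full window along the sequence."""
    results = []
    for start in range(0, len(sequence) - window + 1, step):
        chunk = sequence[start:start + window]
        results.append((start, skew(chunk, "G", "C"), skew(chunk, "A", "T")))
    return results


if __name__ == "__main__":
    for start, gc, at in windowed_skews("GGGGCCAAAT", window=5, step=5):
        print(start, round(gc, 2), round(at, 2))
```

Abrupt changes in skew between neighbouring windows are the kind of compositional signal that can mark foreign DNA without relying on similarity to known phage genes.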


Algorithms , Genome, Bacterial , Prophages/genetics , Base Composition , Codon , DNA, Bacterial/chemistry , Transcription, Genetic , Viral Proteins/genetics
10.
BMC Bioinformatics ; 11: 319, 2010 Jun 14.
Article En | MEDLINE | ID: mdl-20546611

BACKGROUND: The SEED integrates many publicly available genome sequences into a single resource. The database contains accurate and up-to-date annotations based on the subsystems concept that leverages clustering between genomes and other clues to accurately and efficiently annotate microbial genomes. The backend is used as the foundation for many genome annotation tools, such as the Rapid Annotation using Subsystems Technology (RAST) server for whole genome annotation, the metagenomics RAST server for random community genome annotations, and the annotation clearinghouse for exchanging annotations from different resources. In addition to a web user interface, the SEED also provides a Web services-based API for programmatic access to the data in the SEED, allowing the development of third-party tools and mash-ups. RESULTS: The currently exposed Web services encompass over forty different methods for accessing data related to microbial genome annotations. The Web services provide comprehensive access to the database back end, allowing any programmer access to the most consistent and accurate genome annotations available. The Web services are deployed using a platform independent service-oriented approach that allows the user to choose the most suitable programming platform for their application. Example code demonstrates that Web services can be used to access the SEED using common bioinformatics programming languages such as Perl, Python, and Java. CONCLUSIONS: We present a novel approach to access the SEED database. Using Web services, a robust API for access to genomics data is provided, without requiring large volume downloads all at once. The API ensures timely access to the most current datasets available, including the new genomes as soon as they come online.


Databases, Genetic , Genome , Metagenomics/methods , Software