ABSTRACT
Environmental factors restrict the distribution of microbial eukaryotes, but the exact boundaries for eukaryotic life are not known. Here, we examine protistan communities at the extremes of salinity and osmotic pressure, and report rich assemblages inhabiting Bannock and Discovery, two deep-sea superhaline anoxic basins in the Mediterranean. Using an rRNA-based approach, we detected 1,538 protistan rRNA gene sequences from water samples with total salinity ranging from 39 to 280 g/kg, and obtained evidence that this DNA was endogenous to the extreme habitat sampled. Statistical analyses indicate that the discovered phylotypes represent only a fraction of the species actually inhabiting both the brine and the brine-seawater interface, with as much as 82% of the actual richness missed by our survey. Jaccard indices (i.e., a comparison of community membership) suggest that the brine/interface protistan communities are unique to the Bannock and Discovery basins, and share little (0.8-2.8%) in species composition with overlying waters of typical marine salinity and oxygen tension. The protistan communities from the basins' brine and brine/seawater interface appear to be particularly enriched with dinoflagellates, ciliates and other alveolates, as well as fungi, and are conspicuously poor in stramenopiles. The uniqueness and diversity of brine and brine-interface protistan communities make them promising targets for protistan discovery.
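As an illustration of the Jaccard comparison mentioned in the abstract, the following is a minimal sketch, assuming each community is represented simply as a set of phylotype labels. The function name and the OTU labels are hypothetical and are not taken from the study.

```python
def jaccard_index(community_a, community_b):
    """Shared taxa divided by the total number of distinct taxa in both samples."""
    a, b = set(community_a), set(community_b)
    union = a | b
    if not union:
        return 0.0  # two empty samples: define the index as 0
    return len(a & b) / len(union)


# Hypothetical OTU labels, for illustration only.
brine = {"otu1", "otu2", "otu3", "otu4"}
seawater = {"otu4", "otu5", "otu6", "otu7", "otu8"}

print(jaccard_index(brine, seawater))  # 1 shared taxon out of 8 distinct taxa
```

A value near 0 indicates that two samples share almost no members, which is the sense in which the abstract reports 0.8-2.8% overlap.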
Subjects
Oxygen/analysis , Seawater/microbiology , Sodium Chloride/analysis , Water Microbiology , Mediterranean Sea , Phylogeny , Ribosomal RNA/genetics , Species Specificity
ABSTRACT
SNPs located within the open reading frame of a gene that alter the amino acid sequence of the encoded protein [nonsynonymous SNPs (nsSNPs)] might directly or indirectly affect the functionality of the protein, alone or through its interactions in a multi-protein complex, by increasing or decreasing the activity of the metabolic pathway. Understanding the functional consequences of such changes and drawing conclusions about the molecular basis of diseases involves integrating information from multiple heterogeneous sources, including sequence data, structure data and pathway relations between proteins. The data from NCBI's SNP database (dbSNP), gene and protein databases from Entrez, protein structures from the PDB and pathway information from KEGG have all been cross-referenced in the StSNP web server, in an effort to provide combined, integrated reports about nsSNPs. StSNP provides 'on the fly' comparative modeling of nsSNPs with links to metabolic pathway information, along with real-time visual comparative analysis of the modeled structures using the Friend software application. The use of metabolic pathways in StSNP allows a researcher to examine possible disease-related pathways associated with a particular nsSNP(s), and to link the diseases with the currently available molecular structure data. The server is publicly available at http://glinka.bio.neu.edu/StSNP/.
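The definition of a nonsynonymous SNP given above can be sketched in a few lines. This is an illustrative example only, assuming the standard genetic code and a single-codon comparison; the function name `classify_snp` is hypothetical and this is not how StSNP itself is implemented.

```python
from itertools import product

# Standard genetic code, listed in TCAG order (first base varies slowest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_snp(ref_codon, alt_codon):
    """Return (kind, reference residue, variant residue) for a codon change."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    kind = "synonymous" if ref_aa == alt_aa else "nonsynonymous"
    return kind, ref_aa, alt_aa


print(classify_snp("GAG", "GTG"))  # Glu -> Val: the protein sequence changes
print(classify_snp("GAA", "GAG"))  # Glu -> Glu: the protein sequence is unchanged
```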
Subjects
Chromosome Mapping/methods , Computational Biology/methods , Genetic Databases , Single Nucleotide Polymorphism/genetics , Proteins/chemistry , Proteins/genetics , DNA Sequence Analysis/methods , Protein Sequence Analysis/methods , Software , Algorithms , Database Management Systems , Humans , Internet , Open Reading Frames/genetics , Protein Interaction Mapping/methods , Proteins/analysis , Sequence Alignment/methods
ABSTRACT
TOPOFIT-DB (T-DB) is a public web-based database of protein structural alignments based on the TOPOFIT method, providing a comprehensive resource for comparative analysis of protein structure families. The TOPOFIT method is based on the discovery of a saturation point on the alignment curve (the topomax point), which makes it possible to objectively identify the border between the common and variable parts of a protein structural family, providing additional insight into protein comparison and functional annotation. TOPOFIT also effectively detects non-sequential relations between protein structures. T-DB lets users retrieve and analyze structural neighbors for a protein, run a one-to-all calculation of a user-provided structure against the entire current PDB release with T-Server, and perform pair-wise comparisons using the TOPOFIT method through the T-Pair web page. All outputs are reported in various web-based tables and graphics, with automated viewing of the structure-sequence alignments in the Friend software package for complete, detailed analysis. T-DB presents researchers with the opportunity for comprehensive studies of the variability in proteins and is publicly available at http://mozart.bio.neu.edu/topofit/index.php.
Subjects
Protein Databases , Protein Structural Homology , Internet , Mathematical Computing , User-Computer Interface
ABSTRACT
BACKGROUND: The main tool to discover novel microbial eukaryotes is the rRNA approach. This approach has important biases, including PCR discrimination against certain rRNA gene species, which makes molecular inventories skewed relative to the source communities. The degree of this bias has not been quantified, and it remains unclear whether species missed from clone libraries could be recovered by increasing sequencing efforts, or whether they cannot be detected in principle. Here we attempt to discriminate between these possibilities by statistically analysing four protistan inventories obtained using different general eukaryotic PCR primers. RESULTS: We show that each PCR primer set-specific clone library is not a sample from the community diversity but rather from a fraction of this diversity. Therefore, even sequencing such clone libraries to saturation would only recover that fraction, which, according to the parametric models, ranges from 17 ± 4% to 49 ± 10%, depending on the set of primers. The pooled data is thus qualitatively richer than individual libraries, even if normalized to the same sequencing effort. CONCLUSION: The use of a single pair of primers leads to significant underestimation of the true community richness at all levels of taxonomic hierarchy. The majority of available protistan rRNA gene surveys likely sampled less than half of the target diversity, and might have completely missed the rest. The use of multiple PCR primers reduces this bias but does not necessarily eliminate it.
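To make the idea of "richness missed by a survey" concrete, here is a minimal sketch using the bias-corrected Chao1 estimator. Note that this is a nonparametric estimator swapped in for illustration; the abstract refers to parametric models, which are not reproduced here. The function name and the clone counts are hypothetical.

```python
def chao1(otu_counts):
    """Bias-corrected Chao1 richness estimate from per-OTU clone counts."""
    s_obs = sum(1 for n in otu_counts if n > 0)  # OTUs actually observed
    f1 = sum(1 for n in otu_counts if n == 1)    # OTUs seen exactly once
    f2 = sum(1 for n in otu_counts if n == 2)    # OTUs seen exactly twice
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


# Hypothetical clone library: number of clones recovered per OTU.
library = [10, 5, 2, 2, 1, 1, 1, 1]
estimate = chao1(library)
observed = sum(1 for n in library if n > 0)

print(estimate)             # estimated total richness
print(observed / estimate)  # fraction of estimated richness actually observed
```

Many singletons relative to doubletons push the estimate well above the observed count, which is the general signature of an undersampled community.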
Subjects
Protozoan DNA/genetics , Eukaryota/genetics , Genetic Variation , Ribosomal RNA/genetics , Animals , DNA Primers/genetics , Ribosomal DNA/genetics , Phylogeny , DNA Sequence Analysis
ABSTRACT
Similarity of protein structures has been analyzed using three-dimensional Delaunay triangulation patterns derived from the backbone representation. It has been found that structurally related proteins have a common spatial invariant part, a set of tetrahedrons, mathematically described as a common spatial subgraph volume of the three-dimensional contact graph derived from Delaunay tessellation (DT). Based on this property of protein structures, we present a novel common volume superimposition (TOPOFIT) method to produce structural alignments. Structural alignments are usually evaluated by the number of equivalent (aligned) positions (N(e)) with the corresponding root mean square deviation (RMSD). The superimposition of the DT patterns allows one to uniquely identify a maximal common number of equivalent residues in the structural alignment. In other words, TOPOFIT identifies a feature point on the RMSD versus N(e) curve, a topomax point, up to which the topologies of two structures correspond to each other, including backbone and interresidue contacts, whereas a growing number of mismatches between the DT patterns occurs at larger RMSD (N(e)) after the topomax point. It has been found that the topomax point is present in all alignments from different protein structural classes; therefore, the TOPOFIT method identifies common, invariant structural parts between proteins. The alignments produced by the TOPOFIT method correlate well with alignments produced by other current methods. This novel method opens new opportunities for the comparative analysis of protein structures and for more detailed studies on understanding the molecular principles of tertiary structure organization and functionality. The TOPOFIT method also helps to detect conformational changes and topological differences in variable parts, which are particularly important for studies of variations in active/binding sites and protein classification.
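The two quantities used to evaluate an alignment, N(e) and RMSD, can be sketched as follows. This is a minimal illustration, assuming the two structures have already been superimposed and that the equivalent positions are given as paired coordinates; it does not implement TOPOFIT or the Delaunay tessellation step, and the coordinates are made up.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation over paired, already superimposed positions."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


# Hypothetical C-alpha coordinates (in angstroms) for two aligned positions.
structure_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
structure_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]

n_e = len(structure_a)  # number of equivalent (aligned) positions
print(n_e, rmsd(structure_a, structure_b))
```

Adding more equivalent positions generally raises N(e) at the cost of a larger RMSD, which is the trade-off traced by the curve described in the abstract.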
Subjects
Proteins/chemistry , Software , Protein Structural Homology , Computer Graphics , Protein Tertiary Structure
ABSTRACT
This is the second paper in a series of three that investigates eukaryotic microbial diversity and taxon distribution in the Cariaco Basin, Venezuela, the ocean's largest anoxic marine basin. Here, we use phylogenetic information, multivariate community analyses and statistical richness predictions to test whether protists exhibit habitat specialization within defined geochemical layers of the water column. We also analyze spatio-temporal distributions of protists across two seasons and two geographic sites within the basin. Non-metric multidimensional scaling indicates that these two basin sites are inhabited by distinct protistan assemblages, an observation that is supported by the minimal overlap in observed and predicted richness of sampled sites. A comparison of parametric richness estimations indicates that protistan communities in closely spaced, but geochemically different, habitats are very dissimilar, and may share as few as 5% of total operational taxonomic units (OTUs). This is supported by a canonical correspondence analysis, indicating that the empirically observed OTUs are organized along opposing gradients in oxidants and reductants. Our phylogenetic analyses identify many new clades at species to class levels, some of which appear restricted to specific layers of the water column and have a significantly nonrandom distribution. These findings suggest that many pelagic protists are restricted to specific habitats and likely diversify, at least in part, due to separation by geochemical barriers.
Subjects
Ecosystem , Eukaryota/physiology , Seawater/parasitology , Caribbean Region , Phylogeny , Protozoan RNA/genetics , Ribosomal RNA/genetics , Venezuela
ABSTRACT
Microbial diversity and distribution are topics of intensive research. In two companion papers in this issue, we describe the results of the Cariaco Microbial Observatory (Caribbean Sea, Venezuela). The Basin contains the largest body of marine anoxic water, and presents an opportunity to study protistan communities across biogeochemical gradients. In the first paper, we survey 18S ribosomal RNA (rRNA) gene sequence diversity using both Sanger- and pyrosequencing-based approaches, employing multiple PCR primers, and state-of-the-art statistical analyses to estimate microbial richness missed by the survey. Sampling the Basin at three stations, in two seasons, and at four depths with distinct biogeochemical regimes, we obtained the largest, and arguably the least biased collection of over 6000 nearly full-length protistan rRNA gene sequences from a given oceanographic regime to date, and over 80,000 pyrosequencing tags. These represent all major and many minor protistan taxa, at frequencies globally similar between the two sequence collections. This large data set provided, via the recently developed parametric modeling, the first statistically sound prediction of the total size of protistan richness in a large and varied environment, such as the Cariaco Basin: over 36,000 species, defined as almost full-length 18S rRNA gene sequence clusters sharing over 99% sequence homology. This richness is a small fraction of the grand total of known protists (over 100,000-500,000 species), suggesting a degree of protistan endemism.
Subjects
Eukaryota/classification , Eukaryota/isolation & purification , Seawater/parasitology , Biodiversity , Caribbean Region , Protozoan DNA/genetics , rRNA Genes , Polymerase Chain Reaction , 18S Ribosomal RNA/genetics , Venezuela
ABSTRACT
The rRNA approach is the principal tool to study microbial diversity, but it has important biases. These include polymerase chain reaction (PCR) primer bias and the relative inefficiency of DNA extraction techniques. Such sources of potential undersampling of microbial diversity are well known, but the scale of the undersampling has not been quantified. Using a marine tidal flat bacterial community as a model, we show that even with unlimited sampling and sequencing effort, a single combination of PCR primers/DNA extraction technique enables theoretical recovery of only half of the richness recoverable with three such combinations. This shows that different combinations of PCR primers/DNA extraction techniques recover, in principle, different species as well as higher taxa. The majority of earlier estimates of microbial richness seem to be underestimates. The combined use of multiple PCR primer sets, multiple DNA extraction techniques, and deep community sequencing will minimize the biases and recover substantially more species than prior studies, but we caution that even this approach, which has yet to be used, may still leave an unknown number of species and higher taxa undetected.
Subjects
Biodiversity , DNA Primers/genetics , Ribosomal DNA/genetics , Ribosomal DNA/isolation & purification , Diagnostic Errors , Environmental Microbiology , Metagenomics/methods , Polymerase Chain Reaction/methods , Bacteria/classification , Bacteria/genetics , Bacteria/isolation & purification , Bacterial DNA/genetics , Bacterial DNA/isolation & purification
ABSTRACT
BACKGROUND: The impact of climate on biodiversity is indisputable. Climate changes over geological time must have significantly influenced the evolution of biodiversity, ultimately leading to its present pattern. Here we consider the paleoclimate data record, inferring that present-day hot and cold environments should contain, respectively, the largest and the smallest diversity of ancestral lineages of microbial eukaryotes. METHODOLOGY/PRINCIPAL FINDINGS: We investigate this hypothesis by analyzing an original dataset of 18S rRNA gene sequences from Western Greenland in the Arctic, and data from the existing literature on 18S rRNA gene diversity in hydrothermal vent, temperate sediments, and anoxic water column communities. Unexpectedly, the community from the cold environment emerged as one of the richest observed to date in protistan species, and most diverse in ancestral lineages. CONCLUSIONS/SIGNIFICANCE: This pattern is consistent with natural selection sweeps on aerobic non-psychrophilic microbial eukaryotes repeatedly caused by low temperatures and global anoxia of snowball Earth conditions. It implies that cold refuges persisted through the periods of greenhouse conditions, which agrees with some, although not all, current views on the extent of the past global cooling and warming events. We therefore identify cold environments as promising targets for microbial discovery.
Subjects
Biodiversity , Climate , Eukaryota , Water Microbiology , Animals , Arctic Regions , Biological Evolution , Climate Change , Eukaryota/classification , Eukaryota/genetics , Greenland , Oceans and Seas , Phylogeny , 18S Ribosomal RNA/classification , 18S Ribosomal RNA/genetics , Temperature
ABSTRACT
Friend is a bioinformatics application designed for simultaneous analysis and visualization of multiple structures and sequences of proteins and/or DNA/RNA. The application provides basic functionalities, such as structure visualization with different rendering and coloring, sequence alignment and simple phylogeny analysis, along with a number of extended features to perform more complex analyses of sequence-structure relationships, including structural alignment of proteins, investigation of specific interaction motifs, and studies of protein-protein and protein-DNA interactions and protein super-families. It is also useful for functional annotation of proteins, protein modeling and protein folding studies. Friend provides three levels of usage: (1) an extensive GUI for a scientist with no programming experience, (2) a command line interface for scripting for a scientist with some programming experience and (3) the ability to extend Friend with user-written libraries for an experienced programmer. The application is linked and communicates with local and remote sequence and structure databases. AVAILABILITY: http://mozart.bio.neu.edu/friend.
Subjects
Computational Biology/methods , Software , Computational Biology/instrumentation , Computer Graphics , DNA/chemistry , Database Management Systems , Genetic Databases , Protein Databases , Exons , Internet , Muramidase/chemistry , Nucleic Acid Conformation , Phylogeny , Protein Folding , Proteins , RNA/chemistry , Sequence Alignment , Protein Sequence Analysis , User-Computer Interface
ABSTRACT
Comparative analysis of exon/intron organization of genes and their resulting protein structures is important for understanding evolutionary relationships between species, rules of protein organization and protein functionality. We present the Structural Exon Database (SEDB), an application with a Web interface that allows users to retrieve the exon/intron organization of genes and map the location of the exon boundaries and the intron phase onto a multiple structural alignment. SEDB is linked with Friend, an integrated analytical multiple sequence/structure viewer, which allows simultaneous visualization of exon boundaries on structure and sequence alignments. With SEDB, researchers can study the correlations of gene structure with the properties of the encoded three-dimensional protein structures across eukaryotic organisms. AVAILABILITY: SEDB is publicly available at http://glinka.bio.neu.edu/SEDB/SEDB.html. SUPPLEMENTARY INFORMATION: On the SEDB Web site.
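The intron phase mentioned above follows from simple arithmetic on coding exon lengths: an intron's phase is the number of coding bases upstream of it, modulo 3. The sketch below illustrates this under the assumption that only the coding portions of exons are supplied; the function name and the example lengths are hypothetical, and this is not SEDB's own code.

```python
def intron_phases(coding_exon_lengths):
    """Phase of each intron: coding bases upstream of it, modulo 3.

    Phase 0 falls between two codons, phase 1 after the first base of a
    codon, and phase 2 after the second base.
    """
    phases = []
    upstream = 0
    for length in coding_exon_lengths[:-1]:  # no intron follows the last exon
        upstream += length
        phases.append(upstream % 3)
    return phases


# Hypothetical gene with four coding exons (lengths in nucleotides).
print(intron_phases([100, 50, 31, 20]))
```

Dividing the same running total by 3 gives the residue position at which each exon boundary falls, which is the kind of coordinate that can be mapped onto a structural alignment.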