1.
PLoS Comput Biol ; 16(11): e1007845, 2020 11.
Article in English | MEDLINE | ID: mdl-33137102

ABSTRACT

For any given bacteriophage genome or phage-derived sequences in metagenomic data sets, we are unable to assign a function to 50-90% of genes, or more. Structural protein-encoding genes constitute a large fraction of the average phage genome and are among the most divergent and difficult-to-identify genes using homology-based methods. To understand the functions encoded by phages, their contributions to their environments, and to help gauge their utility as potential phage therapy agents, we have developed a new approach to classify phage ORFs into ten major classes of structural proteins or into an "other" category. The resulting tool is named PhANNs (Phage Artificial Neural Networks). We built a database of 538,213 manually curated phage protein sequences that we split into eleven subsets (10 for cross-validation, one for testing) using a novel clustering method that ensures there are no homologous proteins between sets yet maintains the maximum sequence diversity for training. An Artificial Neural Network ensemble trained on features extracted from those sets reached a test F1-score of 0.875 and test accuracy of 86.2%. PhANNs can rapidly classify proteins into one of the ten structural classes or, if not predicted to fall in one of the ten classes, as "other," providing a new approach for functional annotation of phage proteins. PhANNs is open source and can be run from our web server or installed locally.
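
The two test metrics quoted above (F1-score and accuracy) can be illustrated with a minimal sketch. The abstract does not say how the F1-score was averaged over the eleven classes, so the macro averaging below is an assumption, and the function names and toy labels are hypothetical rather than taken from PhANNs.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores.

    Macro averaging is an assumption; the abstract does not state
    which averaging scheme produced the reported score.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    classes = set(y_true) | set(y_pred)
    scores = []
    for c in classes:
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels for illustration only; not data from the paper.
y_true = ["capsid", "tail", "other", "capsid"]
y_pred = ["capsid", "other", "other", "capsid"]
print(accuracy(y_true, y_pred), macro_f1(y_true, y_pred))
```

Accuracy rewards the majority classes, while a macro-averaged F1 weights every class equally, which is one reason a multi-class tool might report both.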


Subjects
Bacteriophages/metabolism ; Databases, Protein ; Internet ; Viral Structural Proteins/classification ; Neural Networks, Computer ; Reproducibility of Results ; Viral Structural Proteins/genetics
2.
Proc Natl Acad Sci U S A ; 112(45): 14024-9, 2015 Nov 10.
Article in English | MEDLINE | ID: mdl-26512100

ABSTRACT

Observations from human microbiome studies are often conflicting or inconclusive. Many factors likely contribute to these issues including small cohort sizes, sample collection, and handling and processing differences. The field of microbiome research is moving from 16S rDNA gene sequencing to a more comprehensive genomic and functional representation through whole-genome sequencing (WGS) of complete communities. Here we performed quantitative and qualitative analyses comparing WGS metagenomic data from human stool specimens using the Illumina Nextera XT and Illumina TruSeq DNA PCR-free kits, and the KAPA Biosystems Hyper Prep PCR and PCR-free systems. Significant differences in taxonomy are observed among the four different next-generation sequencing library preparations using a DNA mock community and a cell control of known concentration. We also revealed biases in error profiles, duplication rates, and loss of reads representing organisms that have a high %G+C content that can significantly impact results. As with all methods, the use of benchmarking controls has revealed critical differences among methods that impact sequencing results and later would impact study interpretation. We recommend that the community adopt PCR-free-based approaches to reduce PCR bias that affects calculations of abundance and to improve assemblies for accurate taxonomic assignment. Furthermore, the inclusion of a known-input cell spike-in control provides accurate quantitation of organisms in clinical samples.


Subjects
Gene Library ; Genome, Bacterial/genetics ; High-Throughput Nucleotide Sequencing/methods ; Metagenomics/methods ; Microbiota/genetics ; Analysis of Variance ; Base Composition ; Base Sequence ; Feces/chemistry ; Humans ; Metagenomics/trends ; Molecular Sequence Data ; Polymerase Chain Reaction ; Sequence Analysis, DNA ; Species Specificity
3.
BMC Genomics ; 18(1): 296, 2017 04 13.
Article in English | MEDLINE | ID: mdl-28407798

ABSTRACT

BACKGROUND: Metagenomics is the study of the microbial genomes isolated from communities found on our bodies or in our environment. By correctly determining the relation between human health and the human-associated microbial communities, novel mechanisms of health and disease can be found, thus enabling the development of novel diagnostics and therapeutics. Due to the diversity of the microbial communities, strategies developed for aligning human genomes cannot be utilized, and genomes of the microbial species in the community must be assembled de novo. However, in order to obtain the best metagenomic assemblies, it is important to choose the proper assembler. Due to the rapidly evolving nature of metagenomics, new assemblers are constantly created, and the field has not yet agreed on a standardized process. Furthermore, the truth sets used to compare these methods are either too simple (computationally derived diverse communities) or too complex (microbial communities of unknown composition), yielding results that are hard to interpret. In this analysis, we interrogate the strengths and weaknesses of five popular assemblers through the use of defined biological samples of known genomic composition and abundance. We assessed the performance of each assembler on its ability to reassemble genomes, call taxonomic abundances, and recreate open reading frames (ORFs). RESULTS: We tested five metagenomic assemblers: Omega, metaSPAdes, IDBA-UD, metaVelvet and MEGAHIT on known and synthetic metagenomic data sets. MetaSPAdes excelled in diverse sets, IDBA-UD performed well all around, metaVelvet had high accuracy in high abundance organisms, and MEGAHIT was able to accurately differentiate similar organisms within a community. At the ORF level, metaSPAdes and MEGAHIT had the fewest missing ORFs within diverse and similar communities, respectively. CONCLUSIONS: Depending on the metagenomics question asked, the correct assembler for the task at hand will differ. It is important to choose the appropriate assembler, and thus clearly define the biological problem of an experiment, as different assemblers will give different answers to the same question.


Subjects
Chromosome Mapping/methods ; Computational Biology/methods ; Metagenomics/methods ; Data Accuracy ; Genome, Bacterial ; Humans ; Open Reading Frames ; Software
4.
Proteins ; 84 Suppl 1: 34-50, 2016 09.
Article in English | MEDLINE | ID: mdl-26473983

ABSTRACT

The Critical Assessment of protein Structure Prediction (CASP) experiment would not have been possible without the prediction targets provided by the experimental structural biology community. In this article, selected crystallographers providing targets for the CASP11 experiment discuss the functional and biological significance of the target proteins, highlight their most interesting structural features, and assess whether these features were correctly reproduced in the predictions submitted to CASP11. Proteins 2016; 84(Suppl 1):34-50. © 2015 The Authors. Proteins: Structure, Function, and Bioinformatics Published by Wiley Periodicals, Inc.


Subjects
Computational Biology/statistics & numerical data ; Models, Molecular ; Models, Statistical ; Proteins/chemistry ; Software ; Bacteria/chemistry ; Computational Biology/methods ; Computer Graphics ; Crystallography, X-Ray ; Databases, Protein ; Humans ; International Cooperation ; Protein Folding ; Protein Interaction Domains and Motifs ; Protein Multimerization ; Protein Structure, Secondary ; Sequence Homology, Amino Acid ; Viruses/chemistry
5.
Proteins ; 82 Suppl 2: 26-42, 2014 Feb.
Article in English | MEDLINE | ID: mdl-24318984

ABSTRACT

For the last two decades, CASP has assessed the state of the art in techniques for protein structure prediction and identified areas which required further development. CASP would not have been possible without the prediction targets provided by the experimental structural biology community. In the latest experiment, CASP10, more than 100 structures were suggested as prediction targets, some of which appeared to be extraordinarily difficult for modeling. In this article, authors of some of the most challenging targets discuss which specific scientific question motivated the experimental structure determination of the target protein, which structural features were especially interesting from a structural or functional perspective, and to what extent these features were correctly reproduced in the predictions submitted to CASP10. Specifically, the following targets will be presented: the acid-gated urea channel, a difficult to predict transmembrane protein from the important human pathogen Helicobacter pylori; the structure of human interleukin (IL)-34, a recently discovered helical cytokine; the structure of a functionally uncharacterized enzyme OrfY from Thermoproteus tenax formed by a gene duplication and a novel fold; an ORFan domain of mimivirus sulfhydryl oxidase R596; the fiber protein gene product 17 from bacteriophage T7; the bacteriophage CBA-120 tailspike protein; a virus coat protein from metagenomic samples of the marine environment; and finally, an unprecedented class of structure prediction targets based on engineered disulfide-rich small proteins.


Subjects
Computational Biology/methods ; Protein Conformation ; Proteins/chemistry ; Amino Acid Sequence ; Models, Molecular ; Molecular Sequence Data ; Proteins/genetics ; Sequence Alignment
6.
PLoS Comput Biol ; 8(8): e1002657, 2012.
Article in English | MEDLINE | ID: mdl-22927809

ABSTRACT

Phages play critical roles in the survival and pathogenicity of their hosts, via lysogenic conversion factors, and in nutrient redistribution, via cell lysis. Analyses of phage- and viral-encoded genes in environmental samples provide insights into the physiological impact of viruses on microbial communities and human health. However, phage ORFs are extremely diverse, and over 70% of them are dissimilar to any genes with annotated functions in GenBank. Better identification of viruses would also aid in better detection and diagnosis of disease, in vaccine development, and generally in better understanding the physiological potential of any environment. In contrast to enzymes, viral structural protein function can be much more challenging to detect from sequence data because of low sequence conservation, few known conserved catalytic sites or sequence domains, and relatively limited experimental data. We have designed a method of predicting phage structural protein sequences that uses Artificial Neural Networks (ANNs). First, we trained ANNs to classify viral structural proteins using amino acid frequency; these correctly classify a large fraction of test cases with a high degree of specificity and sensitivity. Subsequently, we added estimates of protein isoelectric points as a feature to ANNs that classify specialized families of proteins, namely major capsid and tail proteins. As expected, these more specialized ANNs are more accurate than the structural ANNs. To experimentally validate the ANN predictions, several ORFs with no significant similarities to known sequences that are ANN-predicted structural proteins were examined by transmission electron microscopy. Some of these self-assembled into structures strongly resembling virion structures. Thus, our ANNs are new tools for identifying phage and potential prophage structural proteins that are difficult or impossible to detect by other bioinformatic analysis. The networks will be valuable when sequence is available but in vitro propagation of the phage may not be practical or possible.
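
The amino acid frequency feature mentioned in this abstract can be sketched as a 20-element vector of residue fractions. This is an illustrative guess at the kind of input such a network might receive, not the authors' feature pipeline; the function name, the residue ordering, and the handling of non-standard residues are all assumptions.

```python
# The 20 standard amino acids in one-letter code; the ordering is arbitrary.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_frequencies(sequence):
    """Return the fraction of each standard residue in a protein sequence.

    Non-standard or ambiguous letters (for example X or *) are skipped,
    which is an assumption made for this sketch.
    """
    counts = {aa: 0 for aa in AMINO_ACIDS}
    total = 0
    for residue in sequence.upper():
        if residue in counts:
            counts[residue] += 1
            total += 1
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Hypothetical four-residue peptide, used only to show the output shape.
features = amino_acid_frequencies("MKKA")
print(dict(zip(AMINO_ACIDS, features)))
```

Because the vector ignores residue order, it stays informative even when sequence similarity to known proteins is too low for homology searches, which is the situation the abstract describes.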


Subjects
Bacteriophages/physiology ; Neural Networks, Computer ; Viral Proteins/chemistry ; Bacteriophages/genetics ; Genes, Viral ; Open Reading Frames
7.
Cell Metab ; 25(5): 1054-1062.e5, 2017 May 02.
Article in English | MEDLINE | ID: mdl-28467925

ABSTRACT

The presence of advanced fibrosis in nonalcoholic fatty liver disease (NAFLD) is the most important predictor of liver mortality. There are limited data on the diagnostic accuracy of gut microbiota-derived signature for predicting the presence of advanced fibrosis. In this prospective study, we characterized the gut microbiome compositions using whole-genome shotgun sequencing of DNA extracted from stool samples. This study included 86 uniquely well-characterized patients with biopsy-proven NAFLD, of which 72 had mild/moderate (stage 0-2 fibrosis) NAFLD, and 14 had advanced fibrosis (stage 3 or 4 fibrosis). We identified a set of 40 features (p < 0.006), which included 37 bacterial species that were used to construct a Random Forest classifier model to distinguish mild/moderate NAFLD from advanced fibrosis. The model had a robust diagnostic accuracy (AUC 0.936) for detecting advanced fibrosis. This study provides preliminary evidence for a fecal-microbiome-derived metagenomic signature to detect advanced fibrosis in NAFLD.
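
The AUC reported above summarizes how well a classifier's scores rank positive cases above negative ones. A minimal sketch of that calculation follows; it uses the pairwise (Mann-Whitney) definition of the area under the ROC curve, and the labels and scores are made up for illustration and are unrelated to the study's cohort or its Random Forest model.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Returns the probability that a randomly chosen positive case
    (label 1) receives a higher score than a randomly chosen negative
    case (label 0), counting ties as one half.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical scores: 1 = advanced fibrosis, 0 = mild/moderate disease.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
print(roc_auc(labels, scores))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so it does not depend on any single decision threshold.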


Subjects
Bacteria/isolation & purification ; Gastrointestinal Microbiome ; Liver Cirrhosis/microbiology ; Non-alcoholic Fatty Liver Disease/microbiology ; Adult ; Aged ; Bacteria/genetics ; Feces/microbiology ; Female ; Humans ; Liver Cirrhosis/diagnosis ; Male ; Metagenomics/methods ; Middle Aged ; Non-alcoholic Fatty Liver Disease/diagnosis ; Prognosis ; Prospective Studies
8.
Sci Rep ; 6: 31731, 2016 08 25.
Article in English | MEDLINE | ID: mdl-27558918

ABSTRACT

As reports on possible associations between microbes and the host increase in number, more meaningful interpretations of this information require an ability to compare data sets across studies. This is dependent upon standardization of workflows to ensure comparability both within and between studies. Here we propose the standard use of an alternate collection and stabilization method that would facilitate such comparisons. The DNA Genotek OMNIgene∙Gut Stool Microbiome Kit was compared to the currently accepted community standard of freezing to store human stool samples prior to whole genome sequencing (WGS) for microbiome studies. This stabilization and collection device allows for ambient temperature storage, automation, and ease of shipping/transfer of samples. The device permitted the same data reproducibility as with frozen samples, and yielded higher recovery of nucleic acids. Collection and stabilization of stool microbiome samples with the DNA Genotek collection device, combined with our extraction and WGS, provides a robust, reproducible workflow that enables standardized global collection, storage, and analysis of stool for microbiome studies.


Subjects
Microbiota ; Specimen Handling/methods ; Temperature ; Algorithms ; Cohort Studies ; DNA, Bacterial/analysis ; DNA, Bacterial/chemistry ; Feces ; Freezing ; Humans ; Linear Models ; Models, Statistical ; Nucleic Acids/chemistry ; Open Reading Frames ; Quality Control ; Reproducibility of Results ; Whole Genome Sequencing
9.
Nat Commun ; 5: 4498, 2014 Jul 24.
Article in English | MEDLINE | ID: mdl-25058116

ABSTRACT

Metagenomics, or sequencing of the genetic material from a complete microbial community, is a promising tool to discover novel microbes and viruses. Viral metagenomes typically contain many unknown sequences. Here we describe the discovery of a previously unidentified bacteriophage present in the majority of published human faecal metagenomes, which we refer to as crAssphage. Its ~97 kbp genome is six times more abundant in publicly available metagenomes than all other known phages together; it comprises up to 90% and 22% of all reads in virus-like particle (VLP)-derived metagenomes and total community metagenomes, respectively; and it totals 1.68% of all human faecal metagenomic sequencing reads in the public databases. The majority of crAssphage-encoded proteins match no known sequences in the database, which is why it was not detected before. Using a new co-occurrence profiling approach, we predict a Bacteroides host for this phage, consistent with Bacteroides-related protein homologues and a unique carbohydrate-binding domain encoded in the phage genome.


Subjects
Bacteriophages/isolation & purification ; Feces/virology ; Metagenome ; Bacteriophages/genetics ; Bacteroides/virology ; Clustered Regularly Interspaced Short Palindromic Repeats ; Feces/microbiology ; Female ; Humans ; Molecular Sequence Data ; Viral Proteins/genetics
11.
J Bacteriol ; 185(21): 6434-47, 2003 Nov.
Article in English | MEDLINE | ID: mdl-14563879

ABSTRACT

Two bacteriophages of an environmental isolate of Vibrio parahaemolyticus were isolated and sequenced. The VP16T and VP16C phages were separated from a mixed lysate based on plaque morphology and exhibit 73 to 88% sequence identity over about 80% of their genomes. Only about 25% of their predicted open reading frames are similar to genes with known functions in the GenBank database. Both phages have cos sites and open reading frames encoding proteins closely related to coliphage lambda's terminase protein (the large subunit). As in coliphage lambda and other siphophages, a large operon in each phage appears to encode proteins involved in DNA packaging and capsid assembly and presumably in host lysis; we refer to this as the structural operon. In addition, both phages have open reading frames closely related to genes encoding DNA polymerase and helicase proteins. Both phages also encode several putative transcription regulators, an apparent polypeptide deformylase, and a protein related to a virulence-associated protein, VapE, of Dichelobacter nodosus. Despite the similarity of the proteins and genome organization, each of the phages also encodes a few proteins not encoded by the other. We did not identify genes closely related to genes encoding integrase proteins belonging to either the tyrosine or serine recombinase family, and we have no evidence so far that these phages can lysogenize the V. parahaemolyticus strain 16 host. Surprisingly for active lytic viruses, the two phages have a codon usage that is very different from that of the host, suggesting the possibility that they may be relative newcomers to growth in V. parahaemolyticus. The DNA sequences should allow us to characterize the lifestyles of VP16T and VP16C and the interactions between these phages and their host at the molecular level, as well as their relationships to other marine and nonmarine phages.
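
The codon usage comparison between phage and host noted in this abstract can be sketched as follows. The abstract does not state which statistic the authors used, so the relative-frequency tables and the total variation distance below are assumptions chosen for simplicity, and the function names and sequences are hypothetical.

```python
from collections import Counter


def codon_frequencies(cds):
    """Relative frequency of each codon in an in-frame coding sequence.

    Trailing bases that do not fill a codon, and codons containing
    anything other than A, C, G or T, are ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    codons = [c for c in codons if set(c) <= set("ACGT")]
    counts = Counter(codons)
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()} if total else {}


def usage_distance(freq_a, freq_b):
    """Total variation distance between two codon frequency tables.

    0.0 means identical usage; 1.0 means no codon is shared.
    This metric is an illustrative choice, not the paper's method.
    """
    keys = set(freq_a) | set(freq_b)
    return 0.5 * sum(abs(freq_a.get(k, 0.0) - freq_b.get(k, 0.0)) for k in keys)


# Made-up sequences; real comparisons would pool many genes per genome.
phage = codon_frequencies("ATGAAAAAATAA")
host = codon_frequencies("ATGAAGAAGTAA")
print(usage_distance(phage, host))
```

In practice a comparison like this would be run over all predicted open reading frames of each genome, since single genes are too short to give stable codon frequencies.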


Subjects
Bacteriophages/genetics ; Genome, Viral ; Vibrio parahaemolyticus/virology ; Bacteriophages/isolation & purification ; Bacteriophages/ultrastructure ; Base Composition ; Molecular Sequence Data ; Open Reading Frames ; Sequence Homology ; Water Microbiology