Search | VHL Regional Portal

1.

Few-Shot Learning Enables Population-Scale Analysis of Leaf Traits in Populus trichocarpa.

Lagergren, John; Pavicic, Mirko; Chhetri, Hari B; York, Larry M; Hyatt, Doug; Kainer, David; Rutter, Erica M; Flores, Kevin; Bailey-Bale, Jack; Klein, Marie; Taylor, Gail; Jacobson, Daniel; Streich, Jared.

Plant Phenomics ; 5: 0072, 2023.

Article in English | MEDLINE | ID: mdl-37519935

ABSTRACT

Plant phenotyping is typically a time-consuming and expensive endeavor, requiring large groups of researchers to meticulously measure biologically relevant plant traits, and is the main bottleneck in understanding plant adaptation and the genetic architecture underlying complex traits at population scale. In this work, we address these challenges by leveraging few-shot learning with convolutional neural networks to segment the leaf body and visible venation of 2,906 Populus trichocarpa leaf images obtained in the field. In contrast to previous methods, our approach (a) does not require experimental or image preprocessing, (b) uses the raw RGB images at full resolution, and (c) requires very few samples for training (e.g., just 8 images for vein segmentation). Traits relating to leaf morphology and vein topology are extracted from the resulting segmentations using traditional open-source image-processing tools, validated using real-world physical measurements, and used to conduct a genome-wide association study to identify genes controlling the traits. In this way, the current work is designed to provide the plant phenotyping community with (a) methods for fast and accurate image-based feature extraction that require minimal training data and (b) a new population-scale dataset, including 68 different leaf phenotypes, for domain scientists and machine learning researchers. All of the few-shot learning code, data, and results are made publicly available.

2.

Rex in Caldicellulosiruptor bescii: Novel regulon members and its effect on the production of ethanol and overflow metabolites.

Sander, Kyle; Chung, Daehwan; Hyatt, Doug; Westpheling, Janet; Klingeman, Dawn M; Rodriguez, Miguel; Engle, Nancy L; Tschaplinski, Timothy J; Davison, Brian H; Brown, Steven D.

Microbiologyopen ; 8(2): e00639, 2019 Feb.

Article in English | MEDLINE | ID: mdl-29797457

ABSTRACT

Rex is a global redox-sensing transcription factor that senses and responds to the intracellular [NADH]/[NAD+] ratio to regulate genes for central metabolism, and a variety of metabolic processes in Gram-positive bacteria. We decipher and validate four new members of the Rex regulon in Caldicellulosiruptor bescii: a gene encoding a class V aminotransferase, the HydG FeFe Hydrogenase maturation protein, an oxidoreductase, and a gene encoding a hypothetical protein. Structural genes for the NiFe and FeFe hydrogenases, pyruvate:ferredoxin oxidoreductase, as well as the rex gene itself are also members of this regulon, as has been predicted previously in different organisms. A C. bescii rex deletion strain constructed in an ethanol-producing strain made 54% more ethanol (0.16 mmol/L) than its genetic parent after 36 hr of fermentation, though only under nitrogen-limited conditions. Metabolomic interrogation shows this rex-deficient ethanol-producing strain synthesizes other reduced overflow metabolism products, likely in response to more reduced intracellular redox conditions and the accumulation of pyruvate. These results suggest ethanol production is strongly dependent on the native intracellular redox state in C. bescii, and highlight the combined promise of using this gene and manipulation of culture conditions to yield strains capable of producing ethanol at higher yields and final titer.

Subjects

Ethanol/metabolism; Firmicutes/genetics; Metabolic Networks and Pathways/genetics; Regulon; Transcription Factors/metabolism; Metabolome; Oxidation-Reduction

3.

Isolation and Whole-Genome Sequencing of Environmental Campylobacter.

Kelley, Brittni R; Ellis, J Christopher; Hyatt, Doug; Jacobson, Dan; Johnson, Jeremiah.

Curr Protoc Microbiol ; 51(1): e64, 2018 Nov.

Article in English | MEDLINE | ID: mdl-30369079

ABSTRACT

As a leading cause of bacterial-derived gastroenteritis worldwide, Campylobacter has a significant impact on human health. In the developed world, most campylobacteriosis cases are attributed to the consumption of undercooked, contaminated poultry; however, it has been shown that Campylobacter can be transmitted to humans through contaminated water and other types of food, including beef and milk. As such, high-resolution microbial source-tracking is essential for health department officials to determine the source(s) of Campylobacter outbreaks. For these reasons, this protocol provides the techniques needed for isolation of Campylobacter from agricultural and environmental sources, as well as human clinical specimens. Additionally, we describe a simple method for preparing high-quality genomic DNA that can be used for whole-genome sequencing and downstream bioinformatics analyses of Campylobacter genotypes. © 2018 by John Wiley & Sons, Inc.

Subjects

Campylobacter Infections/microbiology; Campylobacter/genetics; DNA, Bacterial/genetics; DNA, Bacterial/isolation & purification; Environmental Microbiology; Food Microbiology; Whole Genome Sequencing/methods; Computational Biology; DNA, Bacterial/chemistry; Humans; Molecular Epidemiology/methods; Sequence Analysis, DNA

4.

Functional phylogenomics analysis of bacteria and archaea using consistent genome annotation with UniFam.

Chai, Juanjuan; Kora, Guruprasad; Ahn, Tae-Hyuk; Hyatt, Doug; Pan, Chongle.

BMC Evol Biol ; 14: 207, 2014 Oct 09.

Article in English | MEDLINE | ID: mdl-25293379

ABSTRACT

BACKGROUND: Phylogenetic studies have provided detailed knowledge on the evolutionary mechanisms of genes and species in Bacteria and Archaea. However, the evolution of cellular functions, represented by metabolic pathways and biological processes, has not been systematically characterized. Many clades in the prokaryotic tree of life have now been covered by sequenced genomes in GenBank. This enables a large-scale functional phylogenomics study of many computationally inferred cellular functions across all sequenced prokaryotes. RESULTS: A total of 14,727 GenBank prokaryotic genomes were re-annotated using a new protein family database, UniFam, to obtain consistent functional annotations for accurate comparison. The functional profile of a genome was represented by the biological process Gene Ontology (GO) terms in its annotation. The GO term enrichment analysis differentiated the functional profiles between selected archaeal taxa. 706 prokaryotic metabolic pathways were inferred from these genomes using Pathway Tools and MetaCyc. The consistency between the distribution of metabolic pathways in the genomes and the phylogenetic tree of the genomes was measured using parsimony scores and retention indices. The ancestral functional profiles at the internal nodes of the phylogenetic tree were reconstructed to track the gains and losses of metabolic pathways in evolutionary history. CONCLUSIONS: Our functional phylogenomics analysis shows divergent functional profiles of taxa and clades. Such function-phylogeny correlation stems from a set of clade-specific cellular functions with low parsimony scores. On the other hand, many cellular functions are sparsely dispersed across many clades with high parsimony scores. These different types of cellular functions have distinct evolutionary patterns reconstructed from the prokaryotic tree.

Subjects

Archaea/genetics; Bacteria/genetics; Molecular Sequence Annotation/methods; Databases, Protein; Genome, Archaeal; Genome, Bacterial; Phylogeny
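The abstract of entry 4 measures how well the presence or absence of a pathway agrees with a phylogenetic tree using parsimony scores and retention indices. The sketch below illustrates that idea for a single binary character on a small rooted binary tree. It assumes Fitch parsimony for counting state changes and the standard retention index RI = (g - s) / (g - m); the abstract does not say which parsimony algorithm the authors used, and the function names, the nested-tuple tree encoding, and the toy data are hypothetical.

```python
def fitch_score(tree, states):
    """Return (possible_root_states, steps) for a nested-tuple binary tree.

    Leaves are taxon names (str); internal nodes are (left, right) tuples.
    `states` maps each taxon name to 0 (pathway absent) or 1 (present).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_steps = fitch_score(left, states)
    right_set, right_steps = fitch_score(right, states)
    common = left_set & right_set
    if common:
        # Children agree on at least one state: no extra change needed.
        return common, left_steps + right_steps
    # Children disagree: one state change is required at this node.
    return left_set | right_set, left_steps + right_steps + 1


def retention_index(tree, states):
    """Retention index of one binary character; None if uninformative."""
    _, s = fitch_score(tree, states)          # observed number of steps
    present = sum(states.values())
    absent = len(states) - present
    g = min(present, absent)                  # maximum steps on any tree
    m = 1 if present and absent else 0        # minimum steps on any tree
    if g == m:
        return None                           # character cannot discriminate
    return (g - s) / (g - m)


tree = (("A", "B"), ("C", "D"))
clade_specific = {"A": 1, "B": 1, "C": 0, "D": 0}
scattered = {"A": 1, "B": 0, "C": 1, "D": 0}
print(fitch_score(tree, clade_specific)[1], retention_index(tree, clade_specific))
print(fitch_score(tree, scattered)[1], retention_index(tree, scattered))
```

In this toy case the clade-specific pathway needs fewer changes (a lower parsimony score) than the scattered one, which matches the contrast the abstract draws between clade-specific and sparsely dispersed functions.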

5.

Quality scores for 32,000 genomes.

Land, Miriam L; Hyatt, Doug; Jun, Se-Ran; Kora, Guruprasad H; Hauser, Loren J; Lukjancenko, Oksana; Ussery, David W.

Stand Genomic Sci ; 9: 20, 2014.

Article in English | MEDLINE | ID: mdl-25780509

ABSTRACT

BACKGROUND: More than 80% of the microbial genomes in GenBank are of 'draft' quality (12,553 draft vs. 2,679 finished, as of October, 2013). We have examined all the microbial DNA sequences available for complete, draft, and Sequence Read Archive genomes in GenBank as well as three other major public databases, and assigned quality scores for more than 30,000 prokaryotic genome sequences. RESULTS: Scores were assigned using four categories: the completeness of the assembly, the presence of full-length rRNA genes, tRNA composition and the presence of a set of 102 conserved genes in prokaryotes. Most (~88%) of the genomes had quality scores of 0.8 or better and can be safely used for standard comparative genomics analysis. We compared genomes across factors that may influence the score. We found that although sequencing depth coverage of over 100x did not ensure a better score, sequencing read length was a better indicator of sequencing quality. With few exceptions, most of the 30,000 genomes have nearly all the 102 essential genes. CONCLUSIONS: The score can be used to set thresholds for screening data when analyzing "all published genomes" and reference data is either not available or not applicable. The scores highlighted organisms for which commonly used tools do not perform well. This information can be used to improve tools and to serve a broad group of users as more diverse organisms are sequenced. Unexpectedly, the comparison of predicted tRNAs across 15,000 high quality genomes showed that anticodons beginning with an 'A' (codons ending with a 'U') are almost non-existent, with the exception of one arginine codon (CGU); this has been noted previously in the literature for a few genomes, but not with the depth found here.
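The closing remark of entry 5 pairs anticodons beginning with 'A' with codons ending in 'U'. This follows from antiparallel base pairing: the first anticodon base (5' end) pairs with the third codon base (3' end). The sketch below shows that correspondence under strict Watson-Crick pairing only. Ignoring wobble pairing and modified bases is a simplifying assumption, and the function names are hypothetical.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon_for(codon):
    """Return the strict Watson-Crick anticodon (5'->3') for an RNA codon (5'->3')."""
    return "".join(COMPLEMENT[base] for base in reversed(codon.upper()))


def codons_with_a_initial_anticodon():
    """List all codons whose strict Watson-Crick anticodon starts with 'A'."""
    bases = "ACGU"
    codons = [x + y + z for x in bases for y in bases for z in bases]
    return [codon for codon in codons if anticodon_for(codon).startswith("A")]


print(anticodon_for("CGU"))                # the arginine codon named in the abstract
print(codons_with_a_initial_anticodon())   # every codon in this list ends in 'U'
```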

6.

Gene and translation initiation site prediction in metagenomic sequences.

Hyatt, Doug; LoCascio, Philip F; Hauser, Loren J; Uberbacher, Edward C.

Bioinformatics ; 28(17): 2223-30, 2012 Sep 01.

Article in English | MEDLINE | ID: mdl-22796954

ABSTRACT

MOTIVATION: Gene prediction in metagenomic sequences remains a difficult problem. Current sequencing technologies do not achieve sufficient coverage to assemble the individual genomes in a typical sample; consequently, sequencing runs produce a large number of short sequences whose exact origin is unknown. Since these sequences are usually smaller than the average length of a gene, algorithms must make predictions based on very little data. RESULTS: We present MetaProdigal, a metagenomic version of the gene prediction program Prodigal, that can identify genes in short, anonymous coding sequences with a high degree of accuracy. The novel value of the method consists of enhanced translation initiation site identification, ability to identify sequences that use alternate genetic codes and confidence values for each gene call. We compare the results of MetaProdigal with other methods and conclude with a discussion of future improvements. AVAILABILITY: The Prodigal software is freely available under the General Public License from http://code.google.com/p/prodigal/.

Subjects

Metagenomics/methods; Models, Genetic; Peptide Chain Initiation, Translational; Software; Algorithms; Base Sequence; Computer Simulation; Mycoplasma/genetics; Open Reading Frames; Sequence Analysis, DNA/methods

7.

Exhaustive database searching for amino acid mutations in proteomes.

Hyatt, Doug; Pan, Chongle.

Bioinformatics ; 28(14): 1895-901, 2012 Jul 15.

Article in English | MEDLINE | ID: mdl-22581177

ABSTRACT

MOTIVATION: Amino acid mutations in proteins can be found by searching tandem mass spectra acquired in shotgun proteomics experiments against protein sequences predicted from genomes. Traditionally, unconstrained searches for amino acid mutations have been accomplished by using a sequence tagging approach that combines de novo sequencing with database searching. However, this approach is limited by the performance of de novo sequencing. RESULTS: The Sipros algorithm v2.0 was developed to perform unconstrained database searching using high-resolution tandem mass spectra by exhaustively enumerating all single non-isobaric mutations for every residue in a protein database. The performance of Sipros for amino acid mutation identification exceeded that of an established sequence tagging algorithm, Inspect, based on benchmarking results from a Rhodopseudomonas palustris proteomics dataset. To demonstrate the viability of the algorithm for meta-proteomics, Sipros was used to identify amino acid mutations in a natural microbial community in acid mine drainage. AVAILABILITY: The Sipros algorithm is freely available at http://code.google.com/p/sipros.

Subjects

Algorithms; Amino Acids; Mutation; Proteomics/methods; Amino Acid Sequence; Computational Biology/methods; Databases, Protein; Microbial Consortia/genetics; Mining; Proteins/chemistry; Proteome/analysis; Rhodopseudomonas/genetics; Tandem Mass Spectrometry
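The abstract of entry 7 describes exhaustively enumerating every single non-isobaric mutation for each residue. The sketch below is one way to read that step, not the Sipros implementation. It takes "non-isobaric" to mean that the substituted residue differs in monoisotopic mass by more than a small tolerance, which in practice excludes only the leucine/isoleucine swap. The function name, the tolerance value, and the restriction to the 20 standard amino acids are assumptions; the masses are standard monoisotopic residue masses.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
MONO_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def single_mutations(sequence, tol=1e-4):
    """Yield (position, original, substitute, mass_shift) for every single
    substitution whose mass shift exceeds `tol` (hypothetical cutoff)."""
    for position, original in enumerate(sequence):
        original_mass = MONO_MASS[original]   # KeyError on non-standard letters
        for substitute, mass in MONO_MASS.items():
            if substitute == original:
                continue
            shift = mass - original_mass
            if abs(shift) <= tol:
                continue                      # isobaric swap, e.g. L <-> I
            yield position, original, substitute, shift


for position, original, substitute, shift in single_mutations("AL"):
    print(f"{original}{position + 1}{substitute}\t{shift:+.5f}")
```

Each residue yields at most 19 candidates, so the search space grows linearly with database size, which is what makes exhaustive enumeration of single mutations tractable.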

8.

BESC knowledgebase public portal.

Syed, Mustafa H; Karpinets, Tatiana V; Parang, Morey; Leuze, Michael R; Park, Byung H; Hyatt, Doug; Brown, Steven D; Moulton, Steve; Galloway, Michael D; Uberbacher, Edward C.

Bioinformatics ; 28(5): 750-1, 2012 Mar 01.

Article in English | MEDLINE | ID: mdl-22238270

ABSTRACT

UNLABELLED: The BioEnergy Science Center (BESC) is undertaking large experimental campaigns to understand the biosynthesis and biodegradation of biomass and to develop biofuel solutions. BESC is generating large volumes of diverse data, including genome sequences, omics data and assay results. The purpose of the BESC Knowledgebase is to serve as a centralized repository for experimentally generated data and to provide an integrated, interactive and user-friendly analysis framework. The Portal makes available tools for visualization, integration and analysis of data either produced by BESC or obtained from external resources. AVAILABILITY: http://besckb.ornl.gov.

Subjects

Biofuels; Knowledge Bases; Bacteria/metabolism; Eukaryota/metabolism; Genomics; Plants/metabolism

9.

Quantitative tracking of isotope flows in proteomes of microbial communities.

Pan, Chongle; Fischer, Curt R; Hyatt, Doug; Bowen, Benjamin P; Hettich, Robert L; Banfield, Jillian F.

Mol Cell Proteomics ; 10(4): M110.006049, 2011 Apr.

Article in English | MEDLINE | ID: mdl-21285414

ABSTRACT

Stable isotope probing (SIP) has been used to track nutrient flows in microbial communities, but existing protein-based SIP methods capable of quantifying the degree of label incorporation into peptides and proteins have been demonstrated only by targeting usually less than 100 proteins per sample. Our method automatically (i) identifies the sequence of and (ii) quantifies the degree of heavy atom enrichment for thousands of proteins from microbial community proteome samples. These features make our method suitable for comparing isotopic differences between closely related protein sequences, and for detecting labeling patterns in low-abundance proteins or proteins derived from rare community members. The proteomic SIP method was validated using proteome samples of known stable isotope incorporation levels at 0.4%, ~50%, and ~98%. The method was then used to monitor incorporation of (15)N into established and regrowing microbial biofilms. The results indicate organism-specific migration patterns from established communities into regrowing communities and provide insights into metabolism during biofilm formation. The proteomic SIP method can be extended to many systems to track fluxes of (13)C or (15)N in microbial communities.

Subjects

Ammonium Sulfate/metabolism; Biofilms/growth & development; Microbial Consortia; Proteome/metabolism; Actinobacteria/metabolism; Bacterial Proteins/metabolism; Bacteriophages/metabolism; Isotope Labeling; Leptospira/metabolism; Leptospira/virology; Metabolic Networks and Pathways; Nitrogen Isotopes; Tandem Mass Spectrometry; Thermoplasmales/metabolism; Viral Proteins/metabolism

10.

Enigmatic, ultrasmall, uncultivated Archaea.

Baker, Brett J; Comolli, Luis R; Dick, Gregory J; Hauser, Loren J; Hyatt, Doug; Dill, Brian D; Land, Miriam L; Verberkmoes, Nathan C; Hettich, Robert L; Banfield, Jillian F.

Proc Natl Acad Sci U S A ; 107(19): 8806-11, 2010 May 11.

Article in English | MEDLINE | ID: mdl-20421484

ABSTRACT

Metagenomics has provided access to genomes of as yet uncultivated microorganisms in natural environments, yet there are gaps in our knowledge, particularly for Archaea that occur at relatively low abundance and in extreme environments. Ultrasmall cells (<500 nm in diameter) from lineages without cultivated representatives that branch near the crenarchaeal/euryarchaeal divide have been detected in a variety of acidic ecosystems. We reconstructed composite, near-complete approximately 1-Mb genomes for three lineages, referred to as ARMAN (archaeal Richmond Mine acidophilic nanoorganisms), from environmental samples and a biofilm filtrate. Genes of two lineages are among the smallest yet described, enabling a 10% higher coding density than found in genomes of the same size, and there are noncontiguous genes. No biological function could be inferred for up to 45% of genes and no more than 63% of the predicted proteins could be assigned to a revised set of archaeal clusters of orthologous groups. Some core metabolic genes are more common in Crenarchaeota than Euryarchaeota, up to 21% of genes have the highest sequence identity to bacterial genes, and 12 belong to clusters of orthologous groups that were previously exclusive to bacteria. A small subset of 3D cryo-electron tomographic reconstructions clearly show penetration of the ARMAN cell wall and cytoplasmic membranes by protuberances extended from cells of the archaeal order Thermoplasmatales. Interspecies interactions, the presence of a unique internal tubular organelle [Comolli, et al. (2009) ISME J 3:159-167], and many genes previously only affiliated with Crenarchaea or Bacteria indicate extensive unique physiology in organisms that branched close to the time that Cren- and Euryarchaeotal lineages diverged.

Subjects

Archaea/cytology; Archaea/genetics; Archaea/metabolism; Archaea/ultrastructure; Archaeal Proteins/classification; Archaeal Proteins/genetics; Biofilms; Cell Cycle; DNA Replication; Genome, Archaeal/genetics; Genome, Bacterial/genetics; Molecular Sequence Data; Protein Biosynthesis; Proteomics; Species Specificity; Transcription, Genetic

11.

Prodigal: prokaryotic gene recognition and translation initiation site identification.

Hyatt, Doug; Chen, Gwo-Liang; Locascio, Philip F; Land, Miriam L; Larimer, Frank W; Hauser, Loren J.

BMC Bioinformatics ; 11: 119, 2010 Mar 08.

Article in English | MEDLINE | ID: mdl-20211023

ABSTRACT

BACKGROUND: The quality of automated gene prediction in microbial organisms has improved steadily over the past decade, but there is still room for improvement. Increasing the number of correct identifications, both of genes and of the translation initiation sites for each gene, and reducing the overall number of false positives, are all desirable goals. RESULTS: With our years of experience in manually curating genomes for the Joint Genome Institute, we developed a new gene prediction algorithm called Prodigal (PROkaryotic DYnamic programming Gene-finding ALgorithm). With Prodigal, we focused specifically on the three goals of improved gene structure prediction, improved translation initiation site recognition, and reduced false positives. We compared the results of Prodigal to existing gene-finding methods to demonstrate that it met each of these objectives. CONCLUSION: We built a fast, lightweight, open source gene prediction program called Prodigal (http://compbio.ornl.gov/prodigal/). Prodigal achieved good results compared to existing methods, and we believe it will be a valuable asset to automated microbial annotation pipelines.

Subjects

Peptide Chain Initiation, Translational/genetics; Software; Algorithms; Databases, Genetic; Genome, Bacterial; Prokaryotic Cells

12.

GrailEXP and Genome Analysis Pipeline for genome annotation.

Uberbacher, Edward C; Hyatt, Doug; Shah, Manesh.

Curr Protoc Hum Genet ; Chapter 6: Unit 6.5, 2004 Feb.

Article in English | MEDLINE | ID: mdl-18428363

ABSTRACT

The Gene Recognition and Analysis Internet Link (GRAIL) is one of the most widely used systems for evaluating the protein-coding potential of anonymous DNA sequences. This unit describes the use of the XGRAIL and genQuest client-server applications to locate exons in DNA sequences, to develop gene models, and to search databases for homologs. A support protocol describes how to obtain the GRAIL and genQuest client software by anonymous FTP.

Subjects

DNA/genetics; Databases, Genetic; Genome; Exons; Internet; Sequence Analysis, DNA; User-Computer Interface

13.

GrailEXP and Genome Analysis Pipeline for genome annotation.

Uberbacher, Edward C; Hyatt, Doug; Shah, Manesh.

Curr Protoc Bioinformatics ; Chapter 4: Unit 4.9, 2004 Feb.

Article in English | MEDLINE | ID: mdl-18428726

ABSTRACT

The Basic Protocol describes the use of GrailEXP, the latest version of the gene finding system from Oak Ridge National Laboratory. GrailEXP provides gene models, by making use of sequence similarity with Expressed Sequence Tags (ESTs) and known genes. GrailEXP also provides alternatively spliced constructs for each gene based on the available EST evidence. The Support Protocol describes the use of the Genome Analysis Pipeline, a web application which allows users to perform comprehensive sequence analysis by offering a selection from a wide choice of supported gene finders, other biological feature finders, and database searches.

Subjects

Chromosome Mapping/methods; Database Management Systems; Databases, Genetic; Information Storage and Retrieval/methods; Sequence Alignment/methods; Sequence Analysis, DNA/methods; Software; Algorithms; Base Sequence; Molecular Sequence Data
