Results 1 - 20 of 28
1.
Plant Dis ; 106(6): 1573-1596, 2022 Jun.
Article in English | MEDLINE | ID: mdl-35538602

ABSTRACT

Publicly available and validated DNA reference sequences useful for phylogeny estimation and identification of fungal pathogens are an increasingly important resource in the efforts of plant protection organizations to facilitate safe international trade of agricultural commodities. Colletotrichum species are among the most frequently encountered and regulated plant pathogens at U.S. ports-of-entry. The RefSeq Targeted Loci (RTL) project at NCBI (BioProject no. PRJNA177353) contains a database of curated fungal internal transcribed spacer (ITS) sequences that interact extensively with NCBI Taxonomy, resulting in verified name-strain-sequence type associations for >12,000 species. We present a publicly available dataset of verified and curated name-type strain-sequence associations for all available Colletotrichum species. This includes an updated GenBank Taxonomy for 238 species associated with up to 11 protein coding loci and an updated RTL ITS dataset for 226 species. We demonstrate that several marker loci are well suited for phylogenetic inference and identification. We improve understanding of phylogenetic relationships among verified species, verify or improve phylogenetic circumscriptions of 14 species complexes, and reveal that determining relationships among these major clades will require additional data. We present detailed comparisons between phylogenetic and similarity-based approaches to species identification, revealing complex patterns among single marker loci that often lead to misidentification when based on single-locus similarity approaches. We also demonstrate that species-level identification is elusive for a subset of samples regardless of analytical approach, which may be explained by novel species diversity in our dataset and incomplete lineage sorting and lack of accumulated synapomorphies at these loci.


Subject(s)
Colletotrichum , Colletotrichum/genetics , Commerce , DNA , Internationality , Phylogeny
2.
Curr Issues Mol Biol ; 43(2): 978-995, 2021 Aug 26.
Article in English | MEDLINE | ID: mdl-34563039

ABSTRACT

This paper describes the microbial community composition and the genes for key metabolic processes, particularly nitrogen fixation, in the mucous-enveloped gut digesta of green (Lytechinus variegatus) and purple (Strongylocentrotus purpuratus) sea urchins, using a shotgun metagenomics approach. Both green and purple urchins showed high relative abundances of Gammaproteobacteria at 30% and 60%, respectively. However, Alphaproteobacteria had a higher relative abundance in the green urchins (20%) than in the purple urchins (2%). At the genus level, Vibrio was dominant in both green (~9%) and purple (~10%) urchins, whereas Psychromonas was prevalent only in purple urchins (~24%). An enrichment of Roseobacter and Ruegeria was found in the green urchins, whereas purple urchins revealed a higher abundance of Shewanella, Photobacterium, and Bacteroides (q-value < 0.01). Analysis of key metabolic genes at the KEGG-Level-2 categories revealed genes for the metabolism of amino acids (~20%), nucleotides (~5%), cofactors and vitamins (~6%), energy (~5%), and carbohydrates (~13%), and an abundance of genes for the assimilatory nitrogen reduction pathway in both urchins. Overall, the results from this study revealed the differences in the microbial community and genes designated for the metabolic processes in the nutrient-rich sea urchin gut digesta, suggesting their likely importance to the host and their environment.


Subject(s)
Bacteria/genetics , Computational Biology , Gastrointestinal Microbiome/genetics , Lytechinus/microbiology , Metagenomics , Strongylocentrotus purpuratus/microbiology , Animals , Bacteria/classification , Bacteria/metabolism , High-Throughput Nucleotide Sequencing , Sequence Analysis, DNA
3.
BMC Bioinformatics ; 21(1): 412, 2020 Sep 21.
Article in English | MEDLINE | ID: mdl-32957925

ABSTRACT

BACKGROUND: It is a computational challenge for current metagenomic classifiers to keep up with the pace of training data generated from genome sequencing projects, such as the exponentially-growing NCBI RefSeq bacterial genome database. When new reference sequences are added to training data, statically trained classifiers must be rerun on all data, resulting in a highly inefficient process. The rich literature of "incremental learning" addresses the need to update an existing classifier to accommodate new data without sacrificing much accuracy compared to retraining the classifier with all data. RESULTS: We demonstrate how classification improves over time by incrementally training a classifier on progressive RefSeq snapshots and testing it on: (a) all known current genomes (as a ground truth set) and (b) a real experimental metagenomic gut sample. We demonstrate that as a classifier model's knowledge of genomes grows, classification accuracy increases. The proof-of-concept naïve Bayes implementation, when updated yearly, now runs in 1/4th of the non-incremental time with no accuracy loss. CONCLUSIONS: It is evident that classification improves by having the most current knowledge at its disposal. Therefore, it is of utmost importance to make classifiers computationally tractable to keep up with the data deluge. The incremental learning classifier can be efficiently updated without the cost of reprocessing or the need for access to the existing database, and therefore saves storage as well as computation resources.


Subject(s)
Gastrointestinal Microbiome/genetics , Genome, Bacterial , Machine Learning , Metagenomics/methods , Algorithms , Bacteria/genetics , Bayes Theorem , Humans , Metagenome , Sequence Analysis, DNA/methods
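
The incremental-learning idea in this abstract can be illustrated with a small sketch. This is not the authors' implementation: the class name, the k-mer size, the toy sequences, and the taxon labels are all assumptions made for illustration, and the model uses uniform class priors with add-one smoothing. It only shows that a count-based naive Bayes model can absorb a new reference genome by adding counts, without revisiting earlier genomes.

```python
import math
from collections import Counter, defaultdict


def kmers(seq, k=3):
    """Yield all overlapping k-mers of a DNA string."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


class IncrementalNaiveBayes:
    """Hypothetical multinomial naive Bayes over k-mer counts, updatable in place."""

    def __init__(self, k=3):
        self.k = k
        self.counts = defaultdict(Counter)  # label -> k-mer counts
        self.totals = Counter()             # label -> total k-mers seen

    def update(self, label, genome):
        """Add one new reference genome without reprocessing earlier ones."""
        for km in kmers(genome, self.k):
            self.counts[label][km] += 1
            self.totals[label] += 1

    def classify(self, read):
        """Return the label with the highest smoothed log-likelihood for the read."""
        vocab = 4 ** self.k  # number of possible DNA k-mers, used for add-one smoothing
        best_label, best_score = None, -math.inf
        for label, kmer_counts in self.counts.items():
            score = 0.0
            for km in kmers(read, self.k):
                score += math.log((kmer_counts[km] + 1) / (self.totals[label] + vocab))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


model = IncrementalNaiveBayes(k=3)
model.update("taxon_A", "ACGTACGTACGTACGT")    # first snapshot (toy data)
first = model.classify("GGGCCCGGGCCC")          # only taxon_A is known so far
model.update("taxon_B", "GGGCCCGGGCCCGGGCCC")   # later snapshot: add counts, no retraining
second = model.classify("GGGCCCGGGCCC")
```

In this toy run the same read is assigned to taxon_A while only that taxon is known and to taxon_B once the second genome has been added, which mirrors the abstract's point that accuracy depends on how current the reference knowledge is.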
4.
BMC Genomics ; 21(1): 47, 2020 Jan 14.
Article in English | MEDLINE | ID: mdl-31937263

ABSTRACT

BACKGROUND: The red flour beetle Tribolium castaneum has emerged as an important model organism for the study of gene function in development and physiology, for ecological and evolutionary genomics, for pest control and a plethora of other topics. RNA interference (RNAi), transgenesis and genome editing are well established and the resources for genome-wide RNAi screening have become available in this model. All these techniques depend on a high quality genome assembly and precise gene models. However, the first version of the genome assembly was generated by Sanger sequencing, and with a small set of RNA sequence data limiting annotation quality. RESULTS: Here, we present an improved genome assembly (Tcas5.2) and an enhanced genome annotation resulting in a new official gene set (OGS3) for Tribolium castaneum, which significantly increase the quality of the genomic resources. By adding large-distance jumping library DNA sequencing to join scaffolds and fill small gaps, the gaps in the genome assembly were reduced and the N50 increased to 4753 kbp. The precision of the gene models was enhanced by the use of a large body of RNA-Seq reads of different life history stages and tissue types, leading to the discovery of 1452 novel gene sequences. We also added new features such as alternative splicing, well defined UTRs and microRNA target predictions. For quality control, 399 gene models were evaluated by manual inspection. The current gene set was submitted to GenBank and accepted as a RefSeq genome by NCBI. CONCLUSIONS: The new genome assembly (Tcas5.2) and the official gene set (OGS3) provide enhanced genomic resources for genetic work in Tribolium castaneum. The much improved information on transcription start sites supports transgenic and gene editing approaches. Further, novel types of information such as splice variants and microRNA target genes open additional possibilities for analysis.


Subject(s)
Genes, Insect , Genome, Insect , Genomics , Tribolium/genetics , Animals , Binding Sites , Computational Biology/methods , Genomics/methods , MicroRNAs/genetics , Molecular Sequence Annotation , Phylogeny , RNA Interference , Reproducibility of Results
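
The abstract reports assembly contiguity as an N50 value. As a reminder of what that statistic measures, here is a minimal sketch using the common definition (the length of the scaffold at which the running total, taken from longest to shortest, first reaches half of the assembly size). The function name and the toy lengths are assumptions for illustration and are unrelated to the Tcas5.2 data.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no scaffold lengths given")


# Toy assembly of five scaffolds, 300 units in total.
toy_lengths = [100, 80, 60, 40, 20]
toy_n50 = n50(toy_lengths)
```

Joining scaffolds raises the N50 because fewer, longer pieces are then needed to cover half of the assembly.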
5.
BMC Genomics ; 20(1): 591, 2019 Jul 18.
Article in English | MEDLINE | ID: mdl-31319791

ABSTRACT

BACKGROUND: During the last decade, plant biotechnological laboratories have sparked a monumental revolution with the rapid development of next-generation sequencing technologies at affordable prices. Soon, these sequencing technologies and assembling of whole genomes will extend beyond the plant computational biologists and become commonplace within the plant biology disciplines. The current availability of large-scale genomic resources for non-traditional plant model systems (the so-called 'orphan crops') is enabling the construction of high-density integrated physical and genetic linkage maps with potential applications in plant breeding. The newly available fully sequenced plant genomes represent an incredible opportunity for comparative analyses that may reveal new aspects of genome biology and evolution. The analysis of the expansion and evolution of gene families across species is a common approach to infer biological functions. To date, the extent and role of gene families in plants has only been partially addressed and many gene families remain to be investigated. Manual identification of gene families is highly time-consuming and laborious, requiring an iterative process of manual and computational analysis to identify members of a given family, typically combining numerous BLAST searches and manually cleaning data. Due to the increasing abundance of genome sequences and the agronomical interest in plant gene families, the field needs a clear, automated annotation tool. RESULTS: Here, we present the geneHummus package, an R-based pipeline for the identification and characterization of plant gene families. The impact of this pipeline comes from a reduction in hands-on annotation time combined with high specificity and sensitivity in extracting only proteins from the RefSeq database and providing the conserved domain architectures based on SPARCLE. As a case study we focused on the auxin receptor factors gene (ARF) family in Cicer arietinum (chickpea) and other legumes. CONCLUSION: We anticipate that our pipeline should be suitable for any taxonomic plant family, and likely other gene families, vastly improving the speed and ease of genomic data processing.


Subject(s)
Fabaceae/genetics , Genes, Plant , Multigene Family , Software , Cicer/genetics , Phylogeny , Plant Proteins/genetics , Receptors, Cell Surface/genetics , Transcriptome
6.
BMC Genomics ; 20(1): 835, 2019 Nov 11.
Article in English | MEDLINE | ID: mdl-31711414

ABSTRACT

BACKGROUND: Tail-anchored membrane proteins (TAMPs) differ from other integral membrane proteins, because they contain a single transmembrane domain at the extreme carboxyl-terminus and are therefore obliged to target to membranes post-translationally. Although 3-5% of all transmembrane proteins are predicted to be TAMPs, only a small number are well characterized. RESULTS: To identify novel putative TAMPs across different species, we used TAMPfinder software to identify 859, 657 and 119 putative TAMPs in human (Homo sapiens), plant (Arabidopsis thaliana), and yeast (Saccharomyces cerevisiae), respectively. Bioinformatics analyses of these putative TAMP sequences suggest that the list is highly enriched for authentic TAMPs. To experimentally validate the software predictions, several previously uncharacterized human and plant proteins identified by TAMPfinder were expressed in cells and visualized at subcellular membranes by fluorescence microscopy and further analyzed by carbonate extraction or by bimolecular fluorescence complementation. With the exception of the pro-apoptotic protein harakiri, which is peripherally bound to the membrane, this subset of novel proteins behaves like genuine TAMPs. Comprehensive bioinformatics analysis of the generated TAMP datasets revealed previously unappreciated common and species-specific features such as the unusual size distribution of TAMPs and their propensity to be part of larger complexes. Additionally, novel features of the amino acid sequences that anchor TAMPs to membranes were also revealed. CONCLUSIONS: The findings in this study more than double the number of predicted annotated TAMPs and provide new insights into the common and species-specific features of TAMPs. Furthermore, the list of TAMPs and annotations provide a resource for further investigation.


Subject(s)
Arabidopsis Proteins/chemistry , Arabidopsis Proteins/metabolism , Membrane Proteins/chemistry , Membrane Proteins/metabolism , Saccharomyces cerevisiae Proteins/chemistry , Saccharomyces cerevisiae Proteins/metabolism , Animals , Arabidopsis/genetics , Arabidopsis Proteins/genetics , Cell Line , Computer Simulation , Gene Ontology , Genome , Humans , Membrane Proteins/genetics , Mice , Protein Interaction Mapping , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae Proteins/genetics , Software
7.
Am J Med Genet A ; 176(7): 1667-1669, 2018 07.
Article in English | MEDLINE | ID: mdl-29740950

ABSTRACT

Pathogenic variants in CHD2 (chromodomain helicase DNA-binding protein 2) have been reported in neurodevelopmental disorders with a broad spectrum of phenotypic variability, ranging from mild intellectual disability to atonic-myoclonic epilepsy. However, given the paucity of reported cases the extent of this phenotypic spectrum is currently unknown. Furthermore, all confirmed pathogenic CHD2 variants reported to date have been de novo, preventing the study of intrafamilial phenotypic heterogeneity and creating ambiguity regarding recurrence risk, penetrance, and expressivity. Here, we report the first known case of an inherited pathogenic CHD2 variant in affected mother and daughter. This case demonstrates intrafamilial phenotypic heterogeneity and confirms potential heritability of CHD2-related neurodevelopmental disorders.


Subject(s)
DNA-Binding Proteins/genetics , Mutation , Neurodevelopmental Disorders/genetics , Neurodevelopmental Disorders/pathology , Adult , Child, Preschool , Electroencephalography , Humans , Middle Aged , Mothers , Nuclear Family , Phenotype , Young Adult
8.
Int J Syst Evol Microbiol ; 68(7): 2386-2392, 2018 Jul.
Article in English | MEDLINE | ID: mdl-29792589

ABSTRACT

Average nucleotide identity analysis is a useful tool to verify taxonomic identities in prokaryotic genomes, for both complete and draft assemblies. Using optimum threshold ranges appropriate for different prokaryotic taxa, we have reviewed all prokaryotic genome assemblies in GenBank with regard to their taxonomic identity. We present the methods used to make such comparisons, the current status of GenBank verifications, and recent developments in confirming species assignments in new genome submissions.


Subject(s)
Databases, Nucleic Acid , Genome, Archaeal , Genome, Bacterial , Nucleotides/genetics , Phylogeny , Base Composition , Prokaryotic Cells , Sequence Analysis, DNA
9.
J Gen Virol ; 98(10): 2596-2606, 2017 Oct.
Article in English | MEDLINE | ID: mdl-28884679

ABSTRACT

Integration is an important feature of retroviruses and retrovirus-based therapeutic transfection vectors. The non-primate lentivirus equine infectious anaemia virus (EIAV) primarily targets macrophages/monocytes in vivo. Investigation of the integration features of EIAVDLV121 strains, which are adapted to donkey monocyte-derived macrophages (MDMs), is of great interest. In this study, we analysed the integration features of EIAVDLV121 in equine MDMs during in vitro infection. Our previously published integration sites (IS) for EIAVFDDV13 in fetal equine dermal (FED) cells were also analysed in parallel as references. Sequencing of the host genomic regions flanking the viral IS showed that reference sequence (RefSeq) genes were preferentially targeted for integration by EIAVDLV121. Introns, AT-rich regions, long interspersed nuclear elements (LINEs) and DNA transposons were also predominantly biased toward viral insertion, which is consistent with EIAVFDDV13 integration into the horse genome in FED cells. In addition, the most significantly enriched Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways, specifically gap junctions for EIAVDLV121 and tight junctions for EIAVFDDV13, are regulators of metabolic function, which is consistent with the common bioprocesses, specifically cell cycle and chromosome/DNA organization, identified by gene ontology (GO) analysis. Our results demonstrate that EIAV integration occurs in regions that harbour structural and topological features of local chromatin in both macrophages and fibroblasts. Our data on EIAV will facilitate further understanding of lentivirus infection and the development of safer and more effective gene therapy vectors.

10.
Genome Biol ; 25(1): 60, 2024 02 26.
Article in English | MEDLINE | ID: mdl-38409096

ABSTRACT

Assembled genome sequences are being generated at an exponential rate. Here we present FCS-GX, part of NCBI's Foreign Contamination Screen (FCS) tool suite, optimized to identify and remove contaminant sequences in new genomes. FCS-GX screens most genomes in 0.1-10 min. Testing FCS-GX on artificially fragmented genomes demonstrates high sensitivity and specificity for diverse contaminant species. We used FCS-GX to screen 1.6 million GenBank assemblies and identified 36.8 Gbp of contamination, comprising 0.16% of total bases, with half from 161 assemblies. We updated assemblies in NCBI RefSeq to reduce detected contamination to 0.01% of bases. FCS-GX is available at https://github.com/ncbi/fcs/ or https://doi.org/10.5281/zenodo.10651084 .


Subject(s)
Databases, Nucleic Acid , Genome , Software
11.
Heliyon ; 9(1): e12895, 2023 Jan.
Article in English | MEDLINE | ID: mdl-36643900

ABSTRACT

The present research aimed to evaluate the diversity of all monkeypox virus strains with a special focus on recently isolated ones by a comprehensive phylogenetic analysis of all available sequences, based on the concatenate of four viral genes. Almost all current strains from 2022 showed a high level of similarity to each other on the analyzed stretches: 218 strains shared an identical sequence. Among all analyzed strains, the highest number of differences was counted compared to a RefSeq strain (Zaire-96-I-16) on the whole concatenate. Our analysis supported the distinction between Clade I (formerly Congo Basin clade), IIa and IIb (together formerly West African clade) strains and classified all 2022 strains in the last one. The high number of differences and long branch observable concerning strain Zaire-96-I-16 is most probably caused by a sequencing error. As this strain represents one of the two available reference sequences in GenBank, it is advisable to confirm or exclude the mutation concerned. The developed method, based on four gene sequences, reflected the established whole-genome-based intraspecies classification. Although this method provides significantly less information about the strains compared to whole genome analyses, since its resolution is much lower, it still enables the rapid subspecies classification of the strains into the established clades. The genes in the analyzed concatenate are so conserved that further differentiation of contemporary strains is impossible; these strains are identical in the analyzed sections. On the other hand, since whole genome analyses are compute-intensive, the described method offers a simpler and more accessible alternative for monitoring and preliminary typing of newly sequenced monkeypox virus strains.

12.
bioRxiv ; 2023 06 06.
Article in English | MEDLINE | ID: mdl-37292984

ABSTRACT

Assembled genome sequences are being generated at an exponential rate. Here we present FCS-GX, part of NCBI's Foreign Contamination Screen (FCS) tool suite, optimized to identify and remove contaminant sequences in new genomes. FCS-GX screens most genomes in 0.1-10 minutes. Testing FCS-GX on artificially fragmented genomes demonstrates sensitivity >95% for diverse contaminant species and specificity >99.93%. We used FCS-GX to screen 1.6 million GenBank assemblies and identified 36.8 Gbp of contamination (0.16% of total bases), with half from 161 assemblies. We updated assemblies in NCBI RefSeq to reduce detected contamination to 0.01% of bases. FCS-GX is available at https://github.com/ncbi/fcs/.

13.
BMC Res Notes ; 16(1): 220, 2023 Sep 14.
Article in English | MEDLINE | ID: mdl-37710312

ABSTRACT

OBJECTIVE: The 1,000 wheat exome project captured the single nucleotide variants in the coding regions of a diverse set of 890 wheat accessions to analyse the contribution of introgression to adaptation of wheat. However, this highly useful single nucleotide polymorphism (SNP) dataset is based on RefSeq v1.0 of the International Wheat Genome Sequencing Consortium (IWGSC) assembly of the bread wheat genome of Chinese Spring. This reference sequence has recently been updated using optical maps and long-read sequencing to produce the improved RefSeq v2.1. Our objective was to develop a reliable high-density SNP dataset positioned onto RefSeq v2.1 because it is the current standard reference sequence used by wheat researchers. RESULTS: The 3,039,822 SNPs originally positioned on RefSeq v1.0 were projected to v2.1 using Liftoff with four different flanking regions, and 2,946,536 SNPs were consistently lifted to the same location irrespective of the flanking region lengths. Of these, 2,799,166 were located on the positive ('+') strand. The distribution of the SNPs across the 21 chromosomes on RefSeq v2.1 was similar to that of RefSeq v1.0. Among the SNPs that were based on unanchored scaffolds in RefSeq v1.0, 11,938 were projected to one of the 21 pseudomolecules in the upgraded assembly. This SNP dataset constitutes a much-needed standardized resource for the wheat research community.


Subject(s)
Exome , Triticum , Chromosome Mapping , Polymorphism, Single Nucleotide , Triticum/genetics
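
The consistency filter described in the results (keeping only SNPs that land at the same location for every flanking-region length) can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the function name, the dictionary layout, the flank lengths, and the toy coordinates are hypothetical, and the sketch does not call Liftoff itself.

```python
def consistent_positions(lifted_by_flank):
    """Keep SNPs mapped to the same (chromosome, position) in every run.

    lifted_by_flank maps a flank length to {snp_id: (chrom, pos)};
    a SNP that is missing from any run, or placed differently, is dropped.
    """
    runs = list(lifted_by_flank.values())
    if not runs:
        return {}
    consensus = {}
    for snp_id, location in runs[0].items():
        if all(run.get(snp_id) == location for run in runs[1:]):
            consensus[snp_id] = location
    return consensus


# Hypothetical toy data: four flank lengths, three SNPs.
lifted = {
    50:  {"snp1": ("1A", 1200), "snp2": ("2B", 500), "snp3": ("3D", 90)},
    100: {"snp1": ("1A", 1200), "snp2": ("2B", 510), "snp3": ("3D", 90)},
    150: {"snp1": ("1A", 1200), "snp2": ("2B", 500)},
    200: {"snp1": ("1A", 1200), "snp2": ("2B", 500), "snp3": ("3D", 90)},
}
kept = consistent_positions(lifted)

# Share of SNPs retained according to the counts quoted in the abstract.
retained_share = 2_946_536 / 3_039_822
```

In the toy data only snp1 survives, because snp2 moves in one run and snp3 is lost in another. With the counts given in the abstract, roughly 96.9% of the original SNPs were retained.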
14.
Front Plant Sci ; 13: 903819, 2022.
Article in English | MEDLINE | ID: mdl-35845653

ABSTRACT

Accelerating breeding efforts for developing biofortified bread wheat varieties necessitates understanding the genetic control of grain zinc concentration (GZnC) and grain iron concentration (GFeC). Hence, the major objective of this study was to perform genome-wide association mapping to identify consistently significant genotyping-by-sequencing markers associated with GZnC and GFeC using a large panel of 5,585 breeding lines from the International Maize and Wheat Improvement Center. These lines were grown between 2018 and 2021 in an optimally irrigated environment at Obregon, Mexico, while some of them were also grown in a water-limiting drought-stressed environment and a space-limiting small plot environment and evaluated for GZnC and GFeC. The lines showed a large and continuous variation for GZnC ranging from 27 to 74.5 ppm and GFeC ranging from 27 to 53.4 ppm. We performed 742,113 marker-trait association tests in 73 datasets and identified 141 markers consistently associated with GZnC and GFeC in three or more datasets, which were located on all wheat chromosomes except 3A and 7D. Among them, 29 markers were associated with both GZnC and GFeC, indicating a shared genetic basis for these micronutrients and the possibility of simultaneously improving both. In addition, several significant GZnC and GFeC associated markers were common across the irrigated, water-limiting drought-stressed, and space-limiting small plot environments, thereby indicating the feasibility of indirect selection for these micronutrients in any of these environments. Moreover, the many significant markers identified had minor effects on GZnC and GFeC, suggesting a quantitative genetic control of these traits. Our findings provide important insights into the complex genetic basis of GZnC and GFeC in bread wheat while implying limited prospects for marker-assisted selection and the need for using genomic selection.

15.
Front Microbiol ; 12: 755101, 2021.
Article in English | MEDLINE | ID: mdl-34745061

ABSTRACT

Contaminating sequences in public genome databases are a pervasive issue with potentially far-reaching consequences. This problem has attracted much attention in the recent literature and many different tools are now available to detect contaminants. Although these methods are based on diverse algorithms that can sometimes produce widely different estimates of the contamination level, the majority of genomic studies rely on a single method of detection, which represents a risk of systematic error. In this work, we used two orthogonal methods to assess the level of contamination among National Center for Biotechnology Information Reference Sequence Database (RefSeq) bacterial genomes. First, we applied the most popular solution, CheckM, which is based on gene markers. We then complemented this approach with a genome-wide method, termed Physeter, which now implements a k-folds algorithm to avoid inaccurate detection due to potential contamination of the reference database. We demonstrate that CheckM cannot currently be applied to all available genomes and bacterial groups. While it performed well on the majority of RefSeq genomes, it produced dubious results for 12,326 organisms. Among those, Physeter identified 239 contaminated genomes that had been missed by CheckM. In conclusion, we emphasize the importance of using multiple methods of detection while providing an upgrade of our own detection tool, Physeter, which minimizes incorrect contamination estimates in the context of unavoidably contaminated reference databases.

16.
PeerJ ; 9: e11348, 2021.
Article in English | MEDLINE | ID: mdl-33996287

ABSTRACT

TQMD is a tool for high-performance computing clusters which downloads, stores and produces lists of dereplicated prokaryotic genomes. It has been developed to counter the ever-growing number of prokaryotic genomes and their uneven taxonomic distribution. It is based on word-based alignment-free methods (k-mers), an iterative single-linkage approach and a divide-and-conquer strategy to remain both efficient and scalable. We studied the performance of TQMD by verifying the influence of its parameters and heuristics on the clustering outcome. We further compared TQMD to two other dereplication tools (dRep and Assembly-Dereplicator). Our results showed that TQMD is primarily optimized to dereplicate at higher taxonomic levels (phylum/class), as opposed to the other dereplication tools, but also works at lower taxonomic levels (species/strain) like the other dereplication tools. TQMD is available from source and as a Singularity container at https://bitbucket.org/phylogeno/tqmd
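
The combination of k-mer comparison and single-linkage clustering mentioned in the abstract can be illustrated with a toy sketch. This is not TQMD's code: the function names, the Jaccard index as the similarity measure, the threshold, and the sequences are assumptions chosen for illustration, and the all-against-all loop below does not reproduce the divide-and-conquer strategy the tool uses for scalability.

```python
def kmer_set(seq, k=4):
    """Return the set of distinct k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard index of two k-mer sets (shared k-mers over all k-mers)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def dereplicate(genomes, k=4, threshold=0.7):
    """Single-linkage clustering; return one representative name per cluster."""
    names = list(genomes)
    sets = {name: kmer_set(genomes[name], k) for name in names}
    parent = {name: name for name in names}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if jaccard(sets[a], sets[b]) >= threshold:
                parent[find(b)] = find(a)  # link the two clusters

    return sorted({find(name) for name in names})


# Hypothetical toy genomes: g1 and g2 are near-identical, g3 is unrelated.
genomes = {
    "g1": "ACGTACGTACGTAAAA",
    "g2": "ACGTACGTACGTAAAT",
    "g3": "TTGGCCTTGGCCTTGG",
}
representatives = dereplicate(genomes, k=4, threshold=0.7)
```

Here g1 and g2 share 6 of 8 distinct 4-mers (Jaccard index 0.75), so they collapse into one cluster represented by g1, while g3 stays on its own. Lowering the threshold merges more distant genomes, which is the direction in which dereplication at higher taxonomic levels operates.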

17.
Genome Biol ; 21(1): 115, 2020 05 12.
Article in English | MEDLINE | ID: mdl-32398145

ABSTRACT

Genomic analyses are sensitive to contamination in public databases caused by incorrectly labeled reference sequences. Here, we describe Conterminator, an efficient method to detect and remove incorrectly labeled sequences by an exhaustive all-against-all sequence comparison. Our analysis reports contamination of 2,161,746, 114,035, and 14,148 sequences in the RefSeq, GenBank, and NR databases, respectively, spanning the whole range from draft to "complete" model organism genomes. Our method scales linearly with input size and can process 3.3 TB in 12 days on a 32-core computer. Conterminator can help ensure the quality of reference databases. Source code (GPLv3): https://github.com/martin-steinegger/conterminator.


Subject(s)
DNA Contamination , Databases, Nucleic Acid/statistics & numerical data , Animals , Genome , Humans , Mice
18.
Front Cell Infect Microbiol ; 10: 527102, 2020.
Article in English | MEDLINE | ID: mdl-33194784

ABSTRACT

Whole genome sequencing has become a powerful tool in modern microbiology. Bacterial genomes in particular are sequenced in high numbers. Whole genome sequencing is used not only in research projects, but also in surveillance projects and outbreak investigations. Many whole genome analysis workflows begin with the production of a genome assembly. To accomplish this, a number of different sequencing technologies and assembly methods are available. Here, a summary is provided of the most frequently used sequencing technologies and genome assembly approaches reported for the bacterial RefSeq genomes and for the bacterial genomes submitted as belonging to a surveillance project. The data are presented both in total and broken down on a per-year basis. Information associated with over 400,000 publicly available genomes dated April 2020 and prior was used. The information summarized includes (i) the most frequently used sequencing technologies, (ii) the most common combinations of sequencing technologies, (iii) the most reported sequencing depth, and (iv) the most frequently used assembly software solutions. In all, this mini review provides an overview of the currently most common workflows for producing bacterial whole genome sequence assemblies.


Subject(s)
Genome, Bacterial , High-Throughput Nucleotide Sequencing , Sequence Analysis, DNA , Technology , Whole Genome Sequencing
19.
mSphere ; 5(6)2020 11 04.
Article in English | MEDLINE | ID: mdl-33148820

ABSTRACT

Continued influx of metagenome-derived proteins with misannotated taxonomy into conventional databases, including RefSeq, threatens to eliminate the value of taxonomy identifiers. To prevent this, urgent efforts should be undertaken by submitters of metagenomic data sets as well as by database managers.


Subject(s)
Databases, Genetic/standards , Metagenome , Proteins/genetics , Algorithms , Databases, Genetic/statistics & numerical data , Metagenomics/methods , Metagenomics/standards
20.
Front Microbiol ; 11: 1701, 2020.
Article in English | MEDLINE | ID: mdl-32849358

ABSTRACT

Mycobacterium avium comprises four subspecies that contain both human and veterinary pathogens. At the inception of this study, twenty-eight M. avium genomes had been annotated as RefSeq genomes, facilitating direct comparisons. These genomes represent strains from around the world and provided a unique opportunity to examine genome dynamics in this species. Each genome was confirmed to be classified correctly based on SNP genotyping, nucleotide identity and presence/absence of repetitive elements or other typing methods. The Mycobacterium avium subspecies paratuberculosis (Map) genome size and organization was remarkably consistent, averaging 4.8 Mb with a variance of only 29.6 kb among the 13 strains. Comparing recombination events along with the larger genome size and variance observed among Mycobacterium avium subspecies avium (Maa) and Mycobacterium avium subspecies hominissuis (Mah) strains (collectively termed non-Map) suggests horizontal gene transfer occurs in non-Map, but not in Map strains. Overall, M. avium subspecies could be divided into two major sub-divisions, with the Map type II (bovine strains) clustering tightly on one end of a phylogenetic spectrum and Mah strains clustering more loosely together on the other end. The most evolutionarily distinct Map strain was an ovine strain, designated Telford, which had >1,000 SNPs and showed large rearrangements compared to the bovine type II strains. The Telford strain clustered with Maa strains as an intermediate between Map type II and Mah. SNP analysis and genome organization analyses repeatedly demonstrated the conserved nature of Map versus the mosaic nature of non-Map M. avium strains. Finally, core and pangenomes were developed for Map and non-Map strains. A total of 80% of Map genes belonged to the Map core genome, while only 40% of non-Map genes belonged to the non-Map core genome. These genomes provide a more complete and detailed comparison of these subspecies strains as well as a blueprint for how genetic diversity originated.
