Results 1 - 20 of 28
1.
Plant J ; 2024 Jun 14.
Article in English | MEDLINE | ID: mdl-38872506

ABSTRACT

Tea, one of the most widely consumed beverages globally, exhibits remarkable genomic diversity underlying its flavour and health-related compounds. In this study, we present the construction and analysis of a tea pangenome comprising 11 genomes, with a focus on three newly sequenced genomes: the purple-leaved assamica cultivar "Zijuan", the temperature-sensitive sinensis cultivar "Anjibaicha" and the wild accession "L618". These assemblies exhibited excellent quality scores, as they profited from the latest sequencing technologies. Our analysis incorporates a detailed investigation of the transposon complement across the tea pangenome, revealing shared patterns of transposon distribution among the studied genomes and improved transposon resolution with long-read technologies, as shown by long terminal repeat (LTR) Assembly Index analysis. Furthermore, our study encompasses a gene-centric exploration of the pangenome that examines the genomic landscape of the catechin pathway, providing insights into copy number alterations and gene-centric variants, especially for anthocyanidin synthases. We constructed a gene-centric pangenome by structurally and functionally annotating all available genomes using an identical pipeline, which both increased gene completeness and allowed for a high functional annotation rate. This improved and consistently annotated gene set will allow for a better comparison between tea genomes. We used this improved pangenome to capture the core and dispensable gene repertoire, elucidating the functional diversity present within the tea species. This pangenome might serve as a valuable resource for understanding the fundamental genetic basis of traits such as flavour, stress tolerance, and disease resistance, with implications for tea breeding programmes.
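The split into a core and a dispensable gene repertoire can be illustrated with a minimal sketch. It assumes a simple presence/absence view in which a gene family is "core" only if it occurs in every genome; the genome and family names are hypothetical, and the abstract does not state the thresholds or clustering method the authors actually used.

```python
from typing import Dict, Set, Tuple


def split_core_dispensable(presence: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Split gene families into core (in every genome) and dispensable (the rest).

    `presence` maps a genome name to the set of gene-family IDs found in it.
    Assumes at least one genome is given.
    """
    genomes = list(presence.values())
    all_families = set().union(*genomes)   # every family seen anywhere
    core = set.intersection(*genomes)      # families shared by all genomes
    dispensable = all_families - core      # families missing from at least one genome
    return core, dispensable


# Hypothetical toy input: three genomes, four gene families.
presence = {
    "genome_A": {"fam1", "fam2", "fam3"},
    "genome_B": {"fam1", "fam2", "fam4"},
    "genome_C": {"fam1", "fam3", "fam4"},
}
core, dispensable = split_core_dispensable(presence)
```

In practice a "soft core" threshold (for example, present in most but not all genomes) is often used to tolerate annotation gaps; that choice is left out of this sketch.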

2.
Nat Methods ; 19(4): 429-440, 2022 04.
Article in English | MEDLINE | ID: mdl-35396482

ABSTRACT

Evaluating metagenomic software is key for optimizing metagenome interpretation and is the focus of the Initiative for the Critical Assessment of Metagenome Interpretation (CAMI). The CAMI II challenge engaged the community to assess methods on realistic and complex datasets with long- and short-read sequences, created computationally from around 1,700 new and known genomes, as well as 600 new plasmids and viruses. Here we analyze 5,002 results by 76 program versions. Substantial improvements were seen in assembly, some due to long-read data. Related strains were still challenging for assembly and genome recovery through binning, as was assembly quality for the latter. Profilers markedly matured, with taxon profilers and binners excelling at higher bacterial ranks, but underperforming for viruses and Archaea. Clinical pathogen detection results revealed a need to improve reproducibility. Runtime and memory usage analyses identified efficient programs, including top performers on other metrics. The results identify challenges and guide researchers in selecting methods for analyses.


Subject(s)
Metagenome, Metagenomics, Archaea/genetics, Metagenomics/methods, Reproducibility of Results, DNA Sequence Analysis, Software
3.
Bioinformatics ; 36(22-23): 5548-5550, 2021 Apr 01.
Article in English | MEDLINE | ID: mdl-33326008

ABSTRACT

SUMMARY: We present NCBI-taxonomist, a command-line tool written in Python that collects and manages taxonomic data from the National Center for Biotechnology Information (NCBI). NCBI-taxonomist does not depend on a pre-downloaded taxonomic database but can store data locally. NCBI-taxonomist has six commands to map, collect, extract, resolve, import and group taxonomic data that can be linked together to create powerful analytical pipelines. Because many life science databases use the same taxonomic information, the data managed by NCBI-taxonomist is not limited to NCBI and can be used to find data linked to taxonomic information present in other scientific databases. AVAILABILITY AND IMPLEMENTATION: NCBI-taxonomist is implemented in Python 3 (≥3.8) and available at https://gitlab.com/janpb/ncbi-taxonomist and via PyPI (https://pypi.org/project/ncbi-taxonomist/), as a Docker container (https://gitlab.com/janpb/ncbi-taxonomist/container_registry/) and Singularity (v3.5.3) image (https://cloud.sylabs.io/library/jpb/ncbi-taxonomist). NCBI-taxonomist is licensed under the GPLv3.

4.
Nature ; 540(7634): 539-543, 2016 Dec 22.
Article in English | MEDLINE | ID: mdl-27880757

ABSTRACT

Current knowledge of RNA virus biodiversity is both biased and fragmentary, reflecting a focus on culturable or disease-causing agents. Here we profile the transcriptomes of over 220 invertebrate species sampled across nine animal phyla and report the discovery of 1,445 RNA viruses, including some that are sufficiently divergent to comprise new families. The identified viruses fill major gaps in the RNA virus phylogeny and reveal an evolutionary history that is characterized by both host switching and co-divergence. The invertebrate virome also reveals remarkable genomic flexibility that includes frequent recombination, lateral gene transfer among viruses and hosts, gene gain and loss, and complex genomic rearrangements. Together, these data present a view of the RNA virosphere that is more phylogenetically and genomically diverse than that depicted in current classification schemes and provide a more solid foundation for studies in virus ecology and evolution.

5.
Nature ; 524(7563): 102-4, 2015 Aug 06.
Article in English | MEDLINE | ID: mdl-26106863

ABSTRACT

An epidemic of Ebola virus disease of unprecedented scale has been ongoing for more than a year in West Africa. As of 29 April 2015, there have been 26,277 reported total cases (of which 14,895 have been laboratory confirmed) resulting in 10,899 deaths. The source of the outbreak was traced to the prefecture of Guéckédou in the forested region of southeastern Guinea. The virus later spread to the capital, Conakry, and to the neighbouring countries of Sierra Leone, Liberia, Nigeria, Senegal and Mali. In March 2014, when the first cases were detected in Conakry, the Institut Pasteur of Dakar, Senegal, deployed a mobile laboratory in Donka hospital to provide diagnostic services to the greater Conakry urban area and other regions of Guinea. Through this process we sampled 85 Ebola viruses (EBOV) from patients infected from July to November 2014, and report their full genome sequences here. Phylogenetic analysis reveals the sustained transmission of three distinct viral lineages co-circulating in Guinea, including the urban setting of Conakry and its surroundings. One lineage is unique to Guinea and closely related to the earliest sampled viruses of the epidemic. A second lineage contains viruses probably reintroduced from neighbouring Sierra Leone on multiple occasions, while a third lineage later spread from Guinea to Mali. Each lineage is defined by multiple mutations, including non-synonymous changes in the virion protein 35 (VP35), glycoprotein (GP) and RNA-dependent RNA polymerase (L) proteins. The viral GP is characterized by a glycosylation site modification and mutations in the mucin-like domain that could modify the outer shape of the virion. These data illustrate the ongoing ability of EBOV to develop lineage-specific and potentially phenotypically important variation.


Subject(s)
Ebolavirus/genetics, Genetic Variation/genetics, Ebola Hemorrhagic Fever/epidemiology, Ebola Hemorrhagic Fever/virology, Mutation/genetics, Phylogeny, Ebolavirus/isolation & purification, Molecular Evolution, Viral Genome/genetics, Glycoproteins/genetics, Glycoproteins/metabolism, Glycosylation, Guinea/epidemiology, Ebola Hemorrhagic Fever/transmission, Humans, Mali/epidemiology, Molecular Sequence Data, Mucins/chemistry, Nucleocapsid Proteins, Nucleoproteins/genetics, Protein Tertiary Structure/genetics, RNA-Dependent RNA Polymerase/genetics, Sierra Leone/epidemiology, Viral Core Proteins/genetics
6.
Bioinformatics ; 35(21): 4511-4514, 2019 11 01.
Article in English | MEDLINE | ID: mdl-31077305

ABSTRACT

SUMMARY: Entrezpy is a Python library that automates the querying and downloading of data from the Entrez databases at the National Center for Biotechnology Information by interacting with E-Utilities. Entrezpy implements complex queries by automatically creating E-Utility parameters from the results obtained, which can then be used directly in subsequent queries. Entrezpy also allows the user to cache and retrieve results locally, implements interactions with all Entrez databases as part of an analysis pipeline and adjusts parameters within an ongoing query or using prior results. Entrezpy's modular design makes it easy to extend and adjust existing E-Utility functions. AVAILABILITY AND IMPLEMENTATION: Entrezpy is implemented in Python 3 (≥3.6) and depends only on the Python Standard Library. It is available via PyPI (https://pypi.org/project/entrezpy/) and at https://gitlab.com/ncbipy/entrezpy.git. Entrezpy is licensed under the LGPLv3; documentation is at http://entrezpy.readthedocs.io/.


Subject(s)
Software, Factual Databases
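The idea of deriving the parameters of a follow-up E-Utility request from an earlier result can be sketched with the Python Standard Library alone. This is not Entrezpy's API: the helper names are hypothetical and the sample XML is a hand-written stand-in for an ESearch reply. Only the E-Utilities parameter names (db, term, usehistory, query_key, WebEnv, rettype, retmode) are taken from NCBI's public interface, and no network request is made.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(db: str, term: str) -> str:
    """Build an ESearch URL that stores its result on the Entrez history server."""
    params = {"db": db, "term": term, "usehistory": "y", "retmax": 0}
    return f"{EUTILS}/esearch.fcgi?{urlencode(params)}"


def efetch_params_from_esearch(db: str, esearch_xml: str) -> dict:
    """Derive EFetch parameters from the QueryKey and WebEnv of an ESearch reply."""
    root = ET.fromstring(esearch_xml)
    return {
        "db": db,
        "query_key": root.findtext("QueryKey"),
        "WebEnv": root.findtext("WebEnv"),
        "rettype": "fasta",
        "retmode": "text",
    }


# Hand-written stand-in for an ESearch XML reply (values are made up).
SAMPLE = (
    "<eSearchResult><Count>2</Count><QueryKey>1</QueryKey>"
    "<WebEnv>MCID_example</WebEnv></eSearchResult>"
)

search_url = esearch_url("nuccore", "Ebolavirus[Organism]")
params = efetch_params_from_esearch("nuccore", SAMPLE)
fetch_url = f"{EUTILS}/efetch.fcgi?{urlencode(params)}"
```

Chaining requests through the history server avoids passing long ID lists between calls, which is the kind of bookkeeping a library such as Entrezpy is described as automating.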
7.
Mol Biol Evol ; 35(10): 2572-2581, 2018 10 01.
Article in English | MEDLINE | ID: mdl-30099499

ABSTRACT

Overlapping genes in viruses maximize the coding capacity of their genomes and allow the generation of new genes without major increases in genome size. Despite their importance, the evolution and function of overlapping genes are often not well understood, in part due to difficulties in their detection. In addition, most bioinformatic approaches for the detection of overlapping genes require the comparison of multiple genome sequences that may not be available in metagenomic surveys of virus biodiversity. We introduce a simple new method for identifying candidate functional overlapping genes using single virus genome sequences. Our method uses randomization tests to estimate the expected length of open reading frames and then identifies overlapping open reading frames that significantly exceed this length and are thus predicted to be functional. We applied this method to 2548 reference RNA virus genomes and find that it has both high sensitivity and low false discovery for genes that overlap by at least 50 nucleotides. Notably, this analysis provided evidence for 29 previously undiscovered functional overlapping genes, some of which are coded in the antisense direction suggesting there are limitations in our current understanding of RNA virus replication.


Subject(s)
Overlapping Genes, Genetic Techniques, Viral Genome, Open Reading Frames, RNA Viruses/genetics
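The randomization idea described in this abstract can be sketched as follows. This is a simplified illustration, not the authors' method: it assumes an ORF is any stop-free run of codons in a given frame, shuffles individual nucleotides (the paper's randomization scheme may differ), and reports how often a shuffled sequence yields a stop-free run at least as long as the observed one. All function names are hypothetical.

```python
import random

STOPS = {"TAA", "TAG", "TGA"}


def longest_orf_codons(seq: str, frame: int = 0) -> int:
    """Length, in codons, of the longest stop-free run in the given frame."""
    best = run = 0
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in STOPS:
            run = 0                     # a stop codon ends the current run
        else:
            run += 1
            best = max(best, run)
    return best


def orf_length_pvalue(seq: str, frame: int = 0, n_shuffles: int = 1000, seed: int = 0) -> float:
    """Empirical p-value: share of shuffles whose longest run matches or beats the observed one."""
    rng = random.Random(seed)           # fixed seed keeps the sketch reproducible
    observed = longest_orf_codons(seq, frame)
    bases = list(seq)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(bases)              # preserves base composition, destroys order
        if longest_orf_codons("".join(bases), frame) >= observed:
            hits += 1
    return (hits + 1) / (n_shuffles + 1)  # add-one correction avoids p = 0
```

A candidate overlapping ORF would be flagged when its p-value falls below a chosen significance level; applying this per frame to a single genome is what lets the approach work without a multi-genome alignment.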
8.
Theor Appl Genet ; 127(5): 1223-35, 2014 May.
Article in English | MEDLINE | ID: mdl-24590356

ABSTRACT

KEY MESSAGE: Combining several different approaches, we have examined the structure, variability, and distribution of Tvv1 retrotransposons. Tvv1 is an unusual example of a low-copy retrotransposon metapopulation dispersed unevenly among very distant species and is promising for the development of molecular markers. Retrotransposons are ubiquitous throughout the genomes of the vascular plants, but individual retrotransposon families tend to be confined to the level of plant genus or at most family. This restricts the general applicability of a family as a source of molecular markers. Here, we characterize a new plant retrotransposon named Tvv1_Sdem, a member of the Copia superfamily of LTR retrotransposons, from the genome of the wild potato Solanum demissum. Comparative analyses based on structure and sequence showed a high level of similarity of Tvv1_Sdem with Tvv1-VB, a retrotransposon previously described in the grapevine (Vitis vinifera) genome. Extending the analysis to other species by in silico and in vitro approaches revealed the presence of Tvv1 family members in potato, tomato, and poplar genomes, and led to the identification of full-length copies of Tvv1 in these species. We were also able to identify polymorphisms in UTL sequences between Tvv1_Sdem copies from wild and cultivated potatoes that are useful as molecular markers. Taken together, our results suggest that the Tvv1 family of retrotransposons has a monophyletic origin and has been maintained in both the rosids and the asterids, the major clades of dicotyledonous plants, since their divergence about 100 MYA. To our knowledge, Tvv1 represents an unusual plant retrotransposon metapopulation comprising highly similar members disjointedly dispersed among very distant species. The twin features of Tvv1 presence in evolutionarily distant genomes and the diversity of its UTL region in each species make it useful as a source of robust molecular markers for diversity studies and breeding.


Subject(s)
Plant Genome, Retroelements/genetics, Solanum/genetics, Vitis/genetics, Conserved Sequence, Gene Dosage, Medicago truncatula/genetics, Oryza/genetics, Phylogeny, DNA Sequence Analysis, Zea mays/genetics
9.
Plant J ; 71(4): 550-63, 2012 Aug.
Article in English | MEDLINE | ID: mdl-22448600

ABSTRACT

Intergenic sequences evolve rapidly in plant genomes through a process known as genomic turnover. To investigate the influence of DNA transposons on genomic turnover, we compared 1 Mbp of orthologous genomic sequences from Brachypodium distachyon and Brachypodium sylvaticum. We found that B. distachyon and B. sylvaticum diverged approximately 1.7-2.0 million years ago. Of a total of 219 genes identified on the analyzed sequences, 211 were colinear. However, only 24 transposable elements of a total of 451 were orthologous (i.e. inserted in the common ancestor). We characterized in detail 59 insertions and 60 excisions of DNA transposons in one or other species, which altered 17% of the intergenic space. The DNA transposon excision sites showed complex and highly diagnostic sequence motifs for double-strand break (DSB) repair. DNA transposon excisions can lead to extensive deletions of hundreds of base pairs of flanking sequence if the DSB is repaired by 'single-strand annealing', or insertions of up to several hundred base pairs of 'filler DNA' if the DSB is repaired by 'synthesis-dependent strand annealing'. In some cases, DSBs were repaired by a combination of both methods. We present a model for the evolution of intergenic sequences in which repair of DSBs upon DNA transposon excision is a major factor in the rapid turnover and erosion of intergenic sequences.


Subject(s)
Brachypodium/genetics, DNA Transposable Elements, Intergenic DNA, Plant Genome, Base Sequence, Biological Evolution, Conserved Sequence, DNA, DNA Repair, Molecular Evolution, Genetic Models, Molecular Sequence Data, Genetic Polymorphism, Sequence Deletion, Nucleic Acid Sequence Homology
10.
Genome Res ; 20(9): 1229-37, 2010 Sep.
Article in English | MEDLINE | ID: mdl-20530251

ABSTRACT

Colinearity of genes in plant genomes generally decreases with increasing evolutionary distance while the actual number of genes remains more or less constant. To characterize the molecular mechanisms of this "gene movement," we identified non-colinear genes by three-way comparison of the genomes of Brachypodium, rice, and sorghum. We found that genomic fragments of up to 50 kb containing the non-colinear genes are duplicated to acceptor sites elsewhere in the genome. Apparent movement of genes may usually be the result of subsequent deletions of genes in the donor region. Often, the duplicated fragments are precisely bordered by transposable elements (TEs) at the acceptor site. Highly diagnostic sequence motifs at these borders strongly suggest that these gene movements were the result of double-strand break (DSB) repair through synthesis-dependent strand annealing. In these cases, a copy of the foreign DNA fragment is used as filler DNA to repair the DSB linked with the transposition of TEs. Interestingly, most TEs we found associated with gene movement have a very low copy number in the genome and for several we did not find autonomous copies. This suggests that some of these elements spontaneously arose from unspecific interaction with TE proteins that are encoded by autonomous elements. Additionally, we found evidence that gene movements can also be caused when DSBs are repaired after template slippage or unequal crossing-over events. The observed frequency of gene movements can explain the erosion of gene colinearity between plant genomes during evolution.


Subject(s)
DNA Transposable Elements/genetics, Plant Genome/genetics, Brachypodium/genetics, Double-Stranded DNA Breaks, Gene Duplication, Plant Genes, Oryza/genetics, Sorghum/genetics
11.
Plant Biotechnol J ; 11(1): 23-32, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23046423

ABSTRACT

Agronomically important traits are frequently controlled by rare, genotype-specific alleles. Such genes can only be mapped in a population derived from the donor genotype. This requires the development of a specific genetic map, which is difficult in wheat because of the low level of polymorphism among elite cultivars. The absence of sufficient polymorphism, the complexity of the hexaploid wheat genome and the lack of complete sequence information make the construction of genetic maps with a high density of reproducible and polymorphic markers challenging. We developed a genotype-specific genetic map of chromosome 3B from winter wheat cultivars Arina and Forno. Chromosome 3B was isolated from the two cultivars and then sequenced to 10-fold coverage. This resulted in a single-nucleotide polymorphism (SNP) database of the complete chromosome. Based on proposed synteny with the Brachypodium model genome and gene annotation, sequences close to coding regions were used for the development of 70 SNP-based markers. They were mapped on an Arina × Forno Recombinant Inbred Lines population and found to be spread over the complete chromosome 3B. While overall synteny was well maintained, numerous exceptions and inversions of syntenic gene order were identified. Additionally, we found that the majority of recombination events occurred in distal parts of chromosome 3B, particularly in hot-spot regions. Compared with the earlier map based on SSR and RFLP markers, the number of markers increased fourfold. The approach presented here allows fast development of genotype-specific polymorphic markers that can be used for mapping and marker-assisted selection.


Subject(s)
Brachypodium/genetics, Chromosome Mapping, Plant Chromosomes, Single Nucleotide Polymorphism, Triticum/genetics, Plant Genes, Genetic Markers, Plant Genome, Genotype, Microsatellite Repeats, Restriction Fragment Length Polymorphism
12.
Virus Evol ; 8(2): veac082, 2022.
Article in English | MEDLINE | ID: mdl-36533143

ABSTRACT

Despite a rapid expansion in the number of documented viruses following the advent of metagenomic sequencing, the identification and annotation of highly divergent RNA viruses remain challenging, particularly from poorly characterized hosts and environmental samples. Protein structures are more conserved than primary sequence data, such that structure-based comparisons provide an opportunity to reveal the viral 'dusk matter': viral sequences with low, but detectable, levels of sequence identity to known viruses with available protein structures. Here, we present a new open computational resource, RdRp-scan, that contains a standardized bioinformatic toolkit to identify and annotate divergent RNA viruses in metagenomic sequence data based on the detection of RNA-dependent RNA polymerase (RdRp) sequences. By combining RdRp-specific hidden Markov models (HMMs) and structural comparisons, we show that RdRp-scan can efficiently detect RdRp sequences with identity levels as low as 10 per cent to those from known viruses and not identifiable using standard sequence-to-sequence comparisons. In addition, to facilitate the annotation and placement of newly detected and divergent virus-like sequences into the diversity of RNA viruses, RdRp-scan provides new custom and curated databases of viral RdRp sequences and core motifs, as well as pre-built RdRp multiple sequence alignments. In parallel, our analysis of the sequence diversity detected by RdRp-scan revealed that while most of the taxonomically unassigned RdRps fell into pre-established clusters, some fell into potentially new orders of RNA viruses related to the Wolframvirales and Tolivirales. Finally, a survey of the conserved A, B, and C RdRp motifs within the RdRp-scan sequence database revealed additional variations of both sequence and position that might provide new insights into the structure, function, and evolution of viral polymerases.

13.
Fungal Genet Biol ; 48(3): 327-34, 2011 Mar.
Article in English | MEDLINE | ID: mdl-20955813

ABSTRACT

The two fungal pathogens Blumeria graminis f. sp. tritici (B.g. tritici) and hordei (B.g. hordei) cause powdery mildew specifically in wheat and barley, respectively. They have the same life cycle, but their growth is restricted to the respective host. Here, we compared the sequences of two loci in both cereal mildews to determine their divergence time and their relationship with the evolution of their hosts. We sequenced a total of 273.3 kb derived from B.g. tritici BAC sequences and compared them with the orthologous regions in the B.g. hordei genome. Protein-coding genes were colinear and well conserved. In contrast, the intergenic regions showed very low conservation, mostly due to different integration patterns of transposable elements. To estimate the divergence time of B.g. tritici and B.g. hordei, we used conserved intergenic sequences including orthologous transposable elements. This revealed that B.g. tritici and B.g. hordei diverged about 10 million years ago (MYA), two million years after wheat and barley (12 MYA). These data suggest that B.g. tritici and B.g. hordei have co-evolved with their hosts during most of their evolutionary history after host divergence, possibly after a short phase of host expansion when the same pathogen could still grow on the two diverged hosts.


Subject(s)
Ascomycota/genetics, Molecular Evolution, Hordeum/microbiology, Plant Diseases/microbiology, Genetic Polymorphism, Triticum/microbiology, DNA Transposable Elements, Fungal DNA/chemistry, Fungal DNA/genetics, Intergenic DNA, Genetic Speciation, Molecular Sequence Data, DNA Sequence Analysis, Sequence Homology, Synteny
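A divergence time of the kind reported in this abstract is commonly derived from the sequence distance between orthologous, presumably neutral regions using T = K / (2r). The sketch below assumes a Jukes-Cantor correction and a hypothetical substitution rate; the abstract does not state which distance model or rate the authors applied, so the numbers are illustrative only.

```python
import math


def jukes_cantor_distance(p: float) -> float:
    """Correct an observed proportion of differing sites (p < 0.75) for multiple hits."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def divergence_time_years(k: float, rate_per_site_per_year: float) -> float:
    """T = K / (2r): both lineages accumulate substitutions since the split."""
    return k / (2.0 * rate_per_site_per_year)


# Hypothetical inputs: 10% observed differences and an assumed rate of 1e-8.
observed_p = 0.10
assumed_rate = 1e-8  # substitutions per site per year (an assumption, not from the source)
k = jukes_cantor_distance(observed_p)
t_years = divergence_time_years(k, assumed_rate)
```

The factor of two matters: halving it would double the estimated time, which is why the rate and its calibration dominate the uncertainty of such estimates.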
14.
Funct Integr Genomics ; 10(4): 509-21, 2010 Nov.
Article in English | MEDLINE | ID: mdl-20464438

ABSTRACT

The barley mutant allele sdw3 confers a gibberellin-insensitive, semi-dwarf phenotype with potential for breeding of new semi-dwarfed barley cultivars. Towards map-based cloning, sdw3 was delimited by high-resolution genetic mapping to a 0.04 cM interval in a "cold spot" of recombination of the proximal region of the short arm of barley chromosome 2H. Extensive synteny between the barley Sdw3 locus (Hvu_sdw3) and the orthologous regions (Osa_sdw3, Sbi_sdw3, Bsy_sdw3) of three other grass species (Oryza sativa, Sorghum bicolor, Brachypodium sylvaticum) allowed for efficient synteny-based marker saturation in the target interval. Comparative sequence analysis revealed colinearity for 23 out of the 38, 35, and 29 genes identified in Brachypodium, rice, and Sorghum, respectively. Markers co-segregating with Hvu_sdw3 were generated from two of these genes. Initial attempts at chromosome walking in barley were performed with seven orthologous gene probes which were delimiting physical distances of 223, 123, and 127 kb in Brachypodium, rice, and Sorghum, respectively. Six non-overlapping small bacterial artificial chromosome (BAC) clone contigs (cumulative length of 670 kb) were obtained, which indicated a considerably larger physical size of Hvu_sdw3. Low-pass sequencing of selected BAC clones from these barley contigs exhibited a substantially lower gene frequency per physical distance and the presence of additional non-colinear genes. Four candidate genes for sdw3 were identified within barley BAC sequences that either co-segregated with the gene sdw3 or were located adjacent to these co-segregating genes. Identification of genic sequences in the sdw3 context provides tools for marker-assisted selection. Eventual identification of the actual gene will contribute new information for a basic understanding of the mechanisms underlying growth regulation in barley.


Subject(s)
Chromosome Mapping, Plant Genes, Hordeum/genetics, Synteny, Base Sequence, Brachypodium/genetics, Bacterial Artificial Chromosomes, Plant Chromosomes, Genetic Markers, Plant Genome, Genotype, Gibberellins/pharmacology, Molecular Sequence Data, Oryza/genetics, Plant Growth Regulators/pharmacology, Genetic Polymorphism, Seedlings/drug effects, Seedlings/physiology, Sorghum/genetics
15.
Microbiol Resour Announc ; 9(2)2020 Jan 09.
Article in English | MEDLINE | ID: mdl-31919150

ABSTRACT

Here, we report the detection of a novel alphavirus in Australian mosquitoes, provisionally named Yada Yada virus (YYV). Phylogenetic analysis indicated that YYV belongs to the mosquito-specific alphavirus complex. The assembled genome is 11,612 nucleotides in length and encodes two open reading frames.

16.
Virus Evol ; 6(2): veaa064, 2020 Jul.
Article in English | MEDLINE | ID: mdl-33240526

ABSTRACT

The Flaviviridae family of positive-sense RNA viruses contains important pathogens of humans and other animals, including Zika virus, dengue virus, and hepatitis C virus. The Flaviviridae are currently divided into four genera (Hepacivirus, Pegivirus, Pestivirus, and Flavivirus), each with a diverse host range. Members of the genus Hepacivirus are associated with an array of animal species, including humans, non-human primates, other mammalian species, as well as birds and fish, while the closely related pegiviruses have been identified in a variety of mammalian taxa, also including humans. Using a combination of total RNA and whole-genome sequencing we identified four novel hepaci-like viruses and one novel variant of a known hepacivirus in five species of Australian wildlife. The hosts infected comprised native Australian marsupials and birds, as well as a native gecko (Gehyra lauta). From these data we identified a distinct marsupial clade of hepaci-like viruses that also included an engorged Ixodes holocyclus tick collected while feeding on Australian long-nosed bandicoots (Perameles nasuta). Distinct lineages of hepaci-like viruses associated with geckos and birds were also identified. By mining the SRA database we similarly identified three new hepaci-like viruses from avian and primate hosts, as well as two novel pegi-like viruses associated with primates. The phylogenetic history of the hepaci- and pegi-like viruses as a whole, combined with co-phylogenetic analysis, provided support for virus-host co-divergence over the course of vertebrate evolution, although with frequent cross-species virus transmission. Overall, our work highlights the diversity of the Hepacivirus and Pegivirus genera as well as the uncertain phylogenetic distinction between them.

17.
Genome Biol ; 21(1): 103, 2020 04 28.
Article in English | MEDLINE | ID: mdl-32345331

ABSTRACT

There is an increasing demand for accurate and fast metagenome classifiers that can identify not only bacteria but all members of a microbial community. We used a recently developed concept in read mapping to develop a highly accurate metagenomic classification pipeline named CCMetagen. The pipeline substantially outperforms other commonly used software in identifying bacteria and fungi and can efficiently use the entire NCBI nucleotide collection as a reference to detect species with incomplete genome data from all biological kingdoms. CCMetagen is user-friendly, and the results can be easily integrated into microbial community analysis software for streamlined and automated microbiome studies.


Subject(s)
Bacteria/classification, Eukaryota/classification, Fungi/classification, Metagenomics/methods, Software, Animals, Archaea/classification, Archaea/genetics, Bacteria/genetics, Birds/microbiology, Eukaryota/genetics, Fungi/genetics, Gene Expression Profiling
18.
J Med Chem ; 63(17): 9590-9602, 2020 09 10.
Article in English | MEDLINE | ID: mdl-32787108

ABSTRACT

Proline-rich antimicrobial peptides (PrAMPs) are promising lead compounds for developing new antimicrobials; however, their narrow spectrum of action is limiting. PrAMPs kill bacteria by binding to their ribosomes and inhibiting protein synthesis. In this study, 133 derivatives of the PrAMP Bac7(1-16) were synthesized to identify the crucial residues for ribosome inactivation and antimicrobial activity. Then, five new Bac7(1-16) derivatives were conceived and characterized by antibacterial and membrane permeabilization assays, X-ray crystallography, and molecular dynamics simulations. Some derivatives displayed broad-spectrum activity, encompassing Escherichia coli, Klebsiella pneumoniae, Acinetobacter baumannii, Pseudomonas aeruginosa, and Staphylococcus aureus. Two peptides out of five acquired a weak membrane-perturbing activity while maintaining the ability to inhibit protein synthesis. These derivatives became independent of the SbmA transporter, commonly used by native PrAMPs, suggesting that they obtained a novel route to enter bacterial cells. PrAMP-derived compounds could become new-generation antimicrobials to combat antibiotic-resistant pathogens.


Subject(s)
Antimicrobial Cationic Peptides/chemistry, Antimicrobial Cationic Peptides/pharmacology, Bacteria/drug effects, Bacteria/metabolism, Proline/chemistry, Antimicrobial Cationic Peptides/metabolism, Microbial Sensitivity Tests, Permeability, Ribosomes/drug effects, Ribosomes/metabolism
19.
Viruses ; 12(12)2020 12 10.
Article in English | MEDLINE | ID: mdl-33322070

ABSTRACT

Viruses represent important test cases for data federation due to their genome size and the rapid increase in sequence data in publicly available databases. However, some consequences of previously decentralized (unfederated) data are a lack of consensus on, or comparisons between, feature annotations. Unifying or displaying alternative annotations should be a priority both for communities with robust entry representation and for nascent communities with burgeoning data sources. To this end, during this three-day continuation of the Virus Hunting Toolkit codeathon series (VHT-2), a new integrated and federated viral index was developed. This Federated Index of Viral Experiments (FIVE) integrates pre-existing and novel functional and taxonomy annotations and virus-host pairings. Variability in the context of viral genomic diversity is often overlooked in virus databases. As a proof of concept, FIVE was the first attempt to include viral genome variation for HIV, the most well-studied human pathogen, through viral genome diversity graphs. As of the publication of this manuscript, FIVE is the first implementation of a virus-specific federated index of such scope. FIVE is coded in BigQuery for optimal access to large quantities of data and is publicly accessible. Many database or index federation projects fail to provide easier alternatives to access or query information. To this end, a Python API query system was developed to enhance the accessibility of FIVE.


Subject(s)
Computational Biology, Genetic Databases, Metagenomics/methods, Viruses/genetics, Computational Biology/methods, Genetic Variation, Viral Genome, Host-Pathogen Interactions, Humans, User-Computer Interface, Viral Proteins/genetics, Viral Proteins/metabolism, Viruses/metabolism, Web Browser
20.
Genes (Basel) ; 10(9)2019 09 16.
Article in English | MEDLINE | ID: mdl-31527408

ABSTRACT

A wealth of viral data sits untapped in publicly available metagenomic data sets when it might be extracted to create a usable index for the virological research community. We hypothesized that work of this complexity and scale could be done in a hackathon setting. Ten teams comprising over 40 participants from six countries assembled to create a crowd-sourced set of analysis and processing pipelines for a complex biological data set in a three-day event on the San Diego State University campus starting 9 January 2019. Prior to the hackathon, 141,676 metagenomic data sets from the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) were pre-assembled into contiguous assemblies (contigs) by NCBI staff. During the hackathon, a subset consisting of 2953 SRA data sets (approximately 55 million contigs) was selected, which were further filtered for a minimal length of 1 kb. This resulted in 4.2 million (Mio) contigs, which were aligned using BLAST against all known virus genomes, phylogenetically clustered and assigned metadata. Out of the 4.2 Mio contigs, 360,000 contigs were labeled with domains and an additional subset containing 4400 contigs was screened for virus or virus-like genes. The work yielded valuable insights into both SRA data and the cloud infrastructure required to support such efforts, revealing analysis bottlenecks and possible workarounds, mainly: (i) conservative assemblies of SRA data improve initial analysis steps; (ii) existing bioinformatic software with weak multithreading/multicore support can be elevated by wrapper scripts to use all cores within a computing node; (iii) redesigning existing bioinformatic algorithms for a cloud infrastructure facilitates their use by a wider audience; and (iv) a cloud infrastructure allows a diverse group of researchers to collaborate effectively. The scientific findings will be extended during a follow-up event. Here, we present the applied workflows, initial results, and lessons learned from the hackathon.


Subject(s)
Cloud Computing/standards, Viral Genome, Metagenome, Metagenomics/methods, Big Data, Human Genome, Humans, Metagenomics/standards, Software
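The length filter mentioned in this abstract (keeping contigs of at least 1 kb) can be sketched in a few lines. This is a minimal illustration with hypothetical function names and toy records, not the hackathon's actual pipeline, which the abstract does not describe at this level of detail.

```python
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue                      # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)           # sequence may span several lines
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(records: Iterable[Tuple[str, str]], min_len: int = 1000) -> List[Tuple[str, str]]:
    """Keep only records whose sequence is at least `min_len` bases long."""
    return [(h, s) for h, s in records if len(s) >= min_len]


# Toy input: one 1,200-base contig split over two lines, one 4-base contig.
fasta_lines = [">contig_1", "A" * 600, "A" * 600, ">contig_2", "ACGT"]
kept = filter_by_length(read_fasta(fasta_lines))
```

Because `read_fasta` is a generator, the same filter can stream over a file handle without loading millions of contigs into memory at once.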