Results 1 - 20 of 631
1.
Cell ; 184(13): 3376-3393.e17, 2021 06 24.
Article in English | MEDLINE | ID: mdl-34043940

ABSTRACT

We present a global atlas of 4,728 metagenomic samples from mass-transit systems in 60 cities over 3 years, representing the first systematic, worldwide catalog of the urban microbial ecosystem. This atlas provides an annotated, geospatial profile of microbial strains, functional characteristics, antimicrobial resistance (AMR) markers, and genetic elements, including 10,928 viruses, 1,302 bacteria, 2 archaea, and 838,532 CRISPR arrays not found in reference databases. We identified 4,246 known species of urban microorganisms and a consistent set of 31 species found in 97% of samples that were distinct from human commensal organisms. Profiles of AMR genes varied widely in type and density across cities. Cities showed distinct microbial taxonomic signatures that were driven by climate and geographic differences. These results constitute a high-resolution global metagenomic atlas that enables discovery of organisms and genes, highlights potential public health and forensic applications, and provides a culture-independent view of AMR burden in cities.


Subjects
Drug Resistance, Bacterial/genetics, Metagenomics, Microbiota/genetics, Urban Population, Biodiversity, Databases, Genetic, Humans
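The "consistent set of 31 species found in 97% of samples" is, in essence, a prevalence filter over a sample-by-species table. A minimal sketch, assuming a simple presence/absence input (the `samples` mapping and `core_species` function are hypothetical; the atlas's own data format and detection thresholds are not described here):

```python
from collections import Counter


def core_species(samples: dict[str, set[str]], min_prevalence: float = 0.97) -> set[str]:
    """Return species detected in at least `min_prevalence` of all samples.

    `samples` is a hypothetical presence/absence table: sample ID -> set of
    species names detected in that sample.
    """
    n_samples = len(samples)
    if n_samples == 0:
        return set()
    # Count, for each species, the number of samples in which it was detected.
    prevalence = Counter(sp for detected in samples.values() for sp in detected)
    return {sp for sp, hits in prevalence.items() if hits / n_samples >= min_prevalence}


if __name__ == "__main__":
    toy = {
        "s1": {"A", "B", "C"},
        "s2": {"A", "B"},
        "s3": {"A", "C"},
        "s4": {"A", "B"},
    }
    # A is in 4/4 samples, B in 3/4, C in 2/4.
    print(sorted(core_species(toy, min_prevalence=0.75)))  # prints ['A', 'B']
```

In a real atlas, "detected" would itself depend on abundance and classification-confidence cutoffs, which this sketch leaves to the caller.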
2.
Am J Hum Genet ; 109(1): 180-191, 2022 01 06.
Article in English | MEDLINE | ID: mdl-34968422

ABSTRACT

Next-generation sequencing (NGS) technologies have transformed medical genetics. However, short read lengths limit the identification of structural variants, the sequencing of repetitive regions, the phasing of distant nucleotide changes, and the distinction of highly homologous genomic regions. Long-read sequencing technologies may improve the characterization of genes that are currently difficult to assess. We used a combination of targeted DNA capture, long-read sequencing, and a customized bioinformatics pipeline to fully assemble the RH region, which harbors variation relevant to red cell donor-recipient mismatch, particularly among patients with sickle cell disease. RHD and RHCE are a pair of duplicated genes located within an ∼175 kb region on human chromosome 1 that have high sequence similarity and frequent structural variations. To achieve the assembly, we used palindrome repeats in PacBio SMRT reads to obtain consensus sequences of 2.1 to 2.9 kb average length with over 99% accuracy. We used these long consensus sequences to identify 771 assembly markers and to phase the RHD-RHCE region with high confidence. The dataset enabled direct linkage between coding and intronic variants, phasing of distant SNPs to determine RHD-RHCE haplotypes, and identification of known and novel structural variations along with their breakpoints. Phasing is limited by the frequency of heterozygous assembly markers and was therefore most successful in samples from African Black individuals, who have increased heterogeneity at the RH locus. Overall, this approach allows RH genotyping and de novo assembly in an unbiased and comprehensive manner, which is necessary to expand the application of NGS technology to high-resolution RH typing.


Subjects
Blood Transfusion, Gene Duplication, Genetic Variation, Rh-Hr Blood-Group System/genetics, Alleles, Anemia, Sickle Cell/genetics, Anemia, Sickle Cell/therapy, Chromosome Breakage, Computational Biology/methods, Gene Frequency, Genetic Heterogeneity, Genetic Linkage, Genomics/methods, Haplotypes, High-Throughput Nucleotide Sequencing, Humans, Polymorphism, Genetic, Polymorphism, Single Nucleotide, Sequence Analysis, DNA/methods
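Successive passes around a circular SMRT template read the insert alternately on the forward and reverse strands, which is why repeated passes of one molecule can be collapsed into a more accurate consensus. The following is a deliberately simplified sketch, not the authors' pipeline: it assumes the passes (hypothetical input strings) differ only by substitutions and are equal in length and already aligned once oriented, then takes a column-wise majority vote. Real consensus callers also align passes, handle insertions and deletions, and weight bases by quality.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def majority_consensus(passes: list[str]) -> str:
    """Column-wise majority vote over repeated passes of one molecule.

    Assumes even-indexed passes are forward strand and odd-indexed passes are
    reverse strand, and that after orientation all passes are equal-length and
    aligned (substitution errors only). Ties go to the base seen first.
    """
    oriented = [p if i % 2 == 0 else revcomp(p) for i, p in enumerate(passes)]
    if len({len(p) for p in oriented}) != 1:
        raise ValueError("need at least one pass, all equal length after orientation")
    consensus = []
    for column in zip(*oriented):
        base, _ = Counter(column).most_common(1)[0]
        consensus.append(base)
    return "".join(consensus)


if __name__ == "__main__":
    # Forward pass, reverse-strand pass, and a forward pass with one error (G->C).
    print(majority_consensus(["AACGT", "ACGTT", "AACCT"]))  # prints AACGT
```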
3.
Brief Bioinform ; 24(6)2023 09 22.
Article in English | MEDLINE | ID: mdl-37798248

ABSTRACT

Although current long-read sequencing technologies produce long reads that facilitate assembly for genome reconstruction, they have high sequencing error rates. While various assemblers with different approaches have been developed, no systematic evaluation of long-read assemblers for diploid genomes with varying heterozygosity has been performed. Here, we evaluated a series of processes, including the estimation of genome characteristics such as genome size and heterozygosity, de novo assembly, polishing, and removal of allelic contigs, using six genomes with various heterozygosity levels. We evaluated five long-read-only assemblers (Canu, Flye, miniasm, NextDenovo and Redbean) and five hybrid assemblers that combine short and long reads (HASLR, MaSuRCA, Platanus-allee, SPAdes and WENGAN), and proposed a concrete guideline for constructing a haplotype representation according to the degree of heterozygosity, followed by polishing and purging of haplotigs, using stable and high-performance assemblers: Redbean, Flye and MaSuRCA.


Subjects
High-Throughput Nucleotide Sequencing, Sequence Analysis, DNA, Haplotypes, Heterozygote, Alleles
4.
Brief Bioinform ; 24(4)2023 07 20.
Article in English | MEDLINE | ID: mdl-37406192

ABSTRACT

Recent advances in long read technologies not only enable large consortia to aim to sequence all eukaryotes on Earth, but they also allow individual laboratories to sequence their species of interest with relatively low investment. Long read technologies embody the promise of overcoming scaffolding problems associated with repeats and low complexity sequences, but the number of contigs often far exceeds the number of chromosomes and they may contain many insertion and deletion errors around homopolymer tracts. To overcome these issues, we have implemented the ILRA pipeline to correct long read-based assemblies. Contigs are first reordered, renamed, merged, circularized, or filtered if erroneous or contaminated. Illumina short reads are used subsequently to correct homopolymer errors. We successfully tested our approach by improving the genome sequences of Homo sapiens, Trypanosoma brucei, and Leptosphaeria spp., and by generating four novel Plasmodium falciparum assemblies from field samples. We found that correcting homopolymer tracts reduced the number of genes incorrectly annotated as pseudogenes, but an iterative approach seems to be required to correct more sequencing errors. In summary, we describe and benchmark the performance of our new tool, which improved the quality of novel long read assemblies up to 1 Gbp. The pipeline is available at GitHub: https://github.com/ThomasDOtto/ILRA.


Subjects
Genome, High-Throughput Nucleotide Sequencing, Humans, Sequence Analysis, DNA, Pseudogenes, Chromosomes
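ILRA uses Illumina short reads to correct homopolymer errors in long-read contigs. As an illustration of where such errors concentrate (this is not ILRA's code), the sketch below locates homopolymer tracts in a contig with a regular expression; the function name and the `min_len` cutoff of 5 are arbitrary assumptions.

```python
import re


def homopolymer_runs(seq: str, min_len: int = 5) -> list[tuple[int, int, str]]:
    """Find runs of a single base (A/C/G/T) of length >= min_len.

    Returns (start, end, base) tuples with 0-based, end-exclusive coordinates.
    This only locates error-prone tracts; it does not correct them.
    """
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    # ([ACGT]) captures one base; \1{n,} requires n or more further copies of it.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return [(m.start(), m.end(), m.group(1)) for m in pattern.finditer(seq.upper())]


if __name__ == "__main__":
    contig = "ACG" + "AAAAA" + "G" + "TTTTTT" + "C"  # hypothetical contig
    print(homopolymer_runs(contig))  # prints [(3, 8, 'A'), (9, 15, 'T')]
```

A correction step would then compare short-read alignments against each reported tract to decide its true length.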
5.
Genomics ; 116(5): 110902, 2024 Jul 23.
Article in English | MEDLINE | ID: mdl-39053612

ABSTRACT

A pioneering pink cultivar of Auricularia cornea, first commercially cultivated in 2022, lacks genomic data, hindering research in genetic breeding, gene discovery, and product development. Here, we report the de novo assembly of the pink A. cornea Fen-A1 genome and provide a detailed functional annotation. The genome is 73.17 Mb in size, contains 86 scaffolds (N50 ∼ 5.49 Mb), has 59.09% GC content, and encodes 19,120 predicted genes with a BUSCO completeness of 92.60%. Comparative genomic analysis reveals the phylogenetic relatedness of Fen-A1 and remarkable gene family dynamics. Putative genes were mapped to 3 antibiotic-related, 36 light-dependent and 25 terpene metabolites. In addition, 789 CAZyme genes were classified, revealing the dynamics of quality loss due to postharvest refrigeration. Overall, our work is the first report of a pink A. cornea genome and provides comprehensive insight into its complex functions.
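N50 summarizes assembly contiguity: it is the length L such that sequences of length L or longer together contain at least half of the assembled bases. A minimal sketch of the calculation, using made-up lengths rather than the Fen-A1 scaffolds:

```python
def n50(lengths: list[int]) -> int:
    """Return N50: the largest L such that sequences >= L hold >= 50% of all bases."""
    if not lengths:
        raise ValueError("at least one sequence length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid floating-point halves.
        if 2 * running >= total:
            return length
    return 0  # not reached for non-negative lengths


if __name__ == "__main__":
    print(n50([2, 3, 4, 5, 6]))  # prints 5 (6 + 5 = 11 of 20 bases)
```

The same function gives contig N50 or scaffold N50 depending on which lengths are passed in.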

6.
BMC Genomics ; 25(1): 92, 2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38254015

ABSTRACT

BACKGROUND: Gorals (Naemorhedus) resemble both goats and antelopes, which prompts much debate about intragenus species delimitation and the phylogenetic status of the genus Naemorhedus within the subfamily Caprinae. Their evolution is believed to be linked to the uplift of the Qinghai-Tibet Plateau (QTP). Resolving their genetic information is therefore important for understanding their phylogenetics. RESULTS: Based on a sample from the eastern margin of the QTP, we constructed the first reference genome for the Himalayan goral Naemorhedus goral, using PacBio long-read sequencing and Hi-C technology. The 2.59 Gb assembled genome had a contig N50 of 3.70 Mb and a scaffold N50 of 106.66 Mb, and was anchored onto 28 pseudochromosomes. A total of 20,145 protein-coding genes were predicted in the assembled genome, of which 99.93% were functionally annotated. Phylogenetically, the goral was closely related to the muskox at the mitochondrial genome level and nested within the takin-muskox clade on the genome tree, rather than with other so-called goat-antelopes. The cladogenetic events among muskox, takin and goral occurred sequentially during the late Miocene (~11-5 Mya), when the QTP experienced a third dramatic uplift with consequent profound changes in climate and environment. Several chromosome fusions and translocations were observed between goral and takin/muskox. The expanded gene families in the goral genome were mainly related to the metabolism of drugs and diseases, as were the positively selected genes. The effective population size (Ne) of the goral has continued to decrease since ~1 Mya, during the Pleistocene with its active glaciations. CONCLUSION: The high-quality goral genome provides insights into the evolution of this threatened group and valuable information for its conservation.


Subjects
Antelopes, Animals, Antelopes/genetics, Phylogeny, Goats/genetics, Gene Rearrangement, Chromosomes
7.
Mol Biol Evol ; 40(3)2023 03 04.
Article in English | MEDLINE | ID: mdl-36869750

ABSTRACT

As the accuracy and throughput of nanopore sequencing improve, it is increasingly common to perform long-read first de novo genome assemblies followed by polishing with accurate short reads. We briefly introduce FMLRC2, the successor to the original FM-index Long Read Corrector (FMLRC), and illustrate its performance as a fast and accurate de novo assembly polisher for both bacterial and eukaryotic genomes.


Subjects
Eukaryota, Nanopores, Sequence Analysis, DNA, Eukaryota/genetics, Bacteria/genetics, Genome, Bacterial, High-Throughput Nucleotide Sequencing
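FMLRC2 is built around an FM-index, which answers "how many times does this k-mer occur in the short reads?" by backward search over a Burrows-Wheeler transform (BWT). The toy sketch below shows that core query for a single string; it is not FMLRC2's implementation. Real correctors index a large collection of reads (for example with a multi-string BWT) and use compressed occurrence tables, and the naive suffix sort here is only suitable for short inputs.

```python
from collections import Counter


def bwt(text: str) -> str:
    """Burrows-Wheeler transform via a naive sorted suffix array (toy inputs only)."""
    text += "$"  # sentinel: must not occur in the input and sorts before A/C/G/T
    suffix_array = sorted(range(len(text)), key=lambda i: text[i:])
    # The BWT is the character preceding each sorted suffix (text[-1] wraps to "$").
    return "".join(text[i - 1] for i in suffix_array)


class FMIndex:
    """Minimal FM-index supporting exact pattern counting by backward search."""

    def __init__(self, text: str) -> None:
        self.bwt = bwt(text)
        counts = Counter(self.bwt)
        # C[c]: number of characters in the BWT that sort strictly before c.
        self.C: dict[str, int] = {}
        running = 0
        for char in sorted(counts):
            self.C[char] = running
            running += counts[char]
        # occ[c][i]: occurrences of c in bwt[:i] (uncompressed, O(sigma * n) memory).
        self.occ: dict[str, list[int]] = {char: [0] for char in counts}
        for symbol in self.bwt:
            for char, table in self.occ.items():
                table.append(table[-1] + (symbol == char))

    def count(self, pattern: str) -> int:
        """Number of occurrences of `pattern` in the indexed text."""
        lo, hi = 0, len(self.bwt)
        for char in reversed(pattern):
            if char not in self.C:
                return 0
            # LF-mapping: narrow the suffix-array interval to suffixes starting with char.
            lo = self.C[char] + self.occ[char][lo]
            hi = self.C[char] + self.occ[char][hi]
            if lo >= hi:
                return 0
        return hi - lo


if __name__ == "__main__":
    index = FMIndex("ACGTACGA")  # hypothetical "short-read" text
    print(index.count("ACG"), index.count("CGA"), index.count("TT"))  # prints 2 1 0
```

Count queries cost time proportional to the pattern length, independent of how many reads are indexed, which is what makes this structure attractive for checking long-read k-mers against deep short-read data.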
8.
Brief Bioinform ; 23(3)2022 05 13.
Article in English | MEDLINE | ID: mdl-35511110

ABSTRACT

BACKGROUND: The long reads of third-generation sequencing significantly benefit the quality of de novo genome assembly. However, their relatively high single-base error rate has been criticized. Sequencing accuracy and throughput continue to improve, and many advanced tools are constantly emerging. PacBio HiFi sequencing and Oxford Nanopore Technologies (ONT) PromethION are two up-to-date platforms with low error rates and ultralong, high-throughput reads. There is therefore an urgent need to select appropriate sequencing platforms, depths and genome assembly tools for high-quality genomes in the era of explosive data production. METHODS: We performed 455 (7 assemblers with 4 polishing pipelines or without polishing on 13 subsets with different depths) and 88 (4 assemblers with or without polishing on 11 subsets with different depths) de novo assemblies of yeast S288C on high-coverage ONT and HiFi datasets, respectively. Assembly quality was evaluated with the Quality Assessment Tool (QUAST), Benchmarking Universal Single-Copy Orthologs (BUSCO) and the newly proposed Comprehensive_score (C_score). In addition, we applied four preferred pipelines to assemble the genomes of nonreference yeast strains. RESULTS: The assembler plays an essential role in genome construction, especially for low-depth datasets. For ONT datasets, Flye is superior to other tools by C_score evaluation. Polishing with Pilon and Medaka improves the accuracy and continuity of the preassemblies, respectively, and combining the two worked well in most quality metrics. For HiFi datasets, Flye and NextDenovo performed better than other tools, and polishing is also necessary. Sufficient data depth is required for high-quality genome construction from ONT (>80X) and HiFi (>20X) datasets.


Subjects
Genome, Fungal, High-Throughput Nucleotide Sequencing, Saccharomyces cerevisiae, Genome, High-Throughput Nucleotide Sequencing/methods, Saccharomyces cerevisiae/genetics, Sequence Analysis, DNA/methods
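Depth thresholds such as >80X for ONT and >20X for HiFi are ratios of total sequenced bases to genome size. The sketch below computes depth, the bases needed for a target depth, and the random-subsampling fraction one might use to create lower-depth subsets; the function names are hypothetical, and the 12 Mb genome size in the usage lines is an approximation for budding yeast assumed for illustration.

```python
import math


def coverage_depth(read_lengths: list[int], genome_size: int) -> float:
    """Mean sequencing depth: total sequenced bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return sum(read_lengths) / genome_size


def bases_needed(target_depth: float, genome_size: int) -> int:
    """Total bases required to reach `target_depth` on average."""
    return math.ceil(target_depth * genome_size)


def subsample_fraction(current_depth: float, target_depth: float) -> float:
    """Fraction of reads to keep at random to approximate a lower target depth."""
    if current_depth <= 0:
        raise ValueError("current_depth must be positive")
    return min(1.0, target_depth / current_depth)


if __name__ == "__main__":
    genome = 12_000_000  # assumed ~12 Mb budding-yeast genome, for illustration
    print(bases_needed(80, genome))  # prints 960000000
    print(subsample_fraction(200.0, 80.0))  # prints 0.4
```

Mean depth says nothing about how evenly reads cover the genome, so real subsets are usually checked with per-base coverage as well.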
9.
New Phytol ; 243(6): 2251-2264, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39073105

ABSTRACT

The shape of rice grains not only determines the thousand-grain weight but also correlates closely with grain quality. Here we identified an ultra-large grain accession (ULG) with a thousand-grain weight exceeding 60 g. An integrated analysis of QTL, BSA, de novo genome assembly, transcriptome sequencing, and gene editing was conducted to dissect the molecular basis of ULG formation. The ULG pyramided advantageous alleles from at least four known grain-shape genes (OsLG3, OsMADS1, GS3, and GL3.1) and one novel locus, qULG2-b, which encodes a leucine-rich repeat receptor-like kinase. The collective impacts of OsLG3, OsMADS1, GS3, and GL3.1 on grain size were confirmed in transgenic plants and near-isogenic lines. Transcriptome analysis identified 112 genes cooperatively regulated by these four genes that were prominently involved in photosynthesis and carbon metabolism. By leveraging the pleiotropy of these genes, we enhanced the grain yield, appearance, and stress tolerance of rice var. SN265. Beyond showing that pyramiding multiple grain-size regulatory genes can produce ULG, our study provides a theoretical framework and valuable genomic resources for improving rice varieties by leveraging the pleiotropy of grain-size regulatory genes.


Subjects
Edible Grain, Gene Expression Regulation, Plant, Oryza, Quantitative Trait Loci, Oryza/genetics, Oryza/growth & development, Oryza/metabolism, Edible Grain/genetics, Edible Grain/growth & development, Quantitative Trait Loci/genetics, Genes, Plant, Plants, Genetically Modified, Plant Proteins/metabolism, Plant Proteins/genetics, Phenotype, Alleles, Stress, Physiological/genetics
10.
Front Zool ; 21(1): 17, 2024 Jun 20.
Article in English | MEDLINE | ID: mdl-38902827

ABSTRACT

Many questions in biology benefit greatly from the use of a variety of model systems. High-throughput sequencing methods have been a triumph in the democratization of diverse model systems. They allow for the economical sequencing of an entire genome or transcriptome of interest, and with technical variations can even provide insight into genome organization and the expression and regulation of genes. The analysis and biological interpretation of such large datasets can present significant challenges that depend on the 'scientific status' of the model system. While high-quality genome and transcriptome references are readily available for well-established model systems, the establishment of such references for an emerging model system often requires extensive resources such as finances, expertise and computation capabilities. The de novo assembly of a transcriptome represents an excellent entry point for genetic and molecular studies in emerging model systems as it can efficiently assess gene content while also serving as a reference for differential gene expression studies. However, the process of de novo transcriptome assembly is non-trivial, and as a rule must be empirically optimized for every dataset. For the researcher working with an emerging model system, and with little to no experience with assembling and quantifying short-read data from the Illumina platform, these processes can be daunting. In this guide we outline the major challenges faced when establishing a reference transcriptome de novo and we provide advice on how to approach such an endeavor. We describe the major experimental and bioinformatic steps, provide some broad recommendations and cautions for the newcomer to de novo transcriptome assembly and differential gene expression analyses. Moreover, we provide an initial selection of tools that can assist in the journey from raw short-read data to assembled transcriptome and lists of differentially expressed genes.
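Differential expression analysis starts from per-gene read counts that must be normalized for library size before samples can be compared. The sketch below shows the simplest version of that step, counts per million (CPM) followed by a log2 fold change with a pseudocount; the function names, gene names, and counts are hypothetical. Dedicated tools such as DESeq2 or edgeR go further, modelling count dispersion and testing significance, so this only illustrates the arithmetic.

```python
import math


def cpm(counts: dict[str, int]) -> dict[str, float]:
    """Counts per million: scale raw read counts by total library size."""
    library_size = sum(counts.values())
    if library_size == 0:
        raise ValueError("library contains no reads")
    return {gene: c * 1e6 / library_size for gene, c in counts.items()}


def log2_fold_changes(
    a: dict[str, int], b: dict[str, int], pseudocount: float = 1.0
) -> dict[str, float]:
    """log2((CPM_b + pseudocount) / (CPM_a + pseudocount)) for genes in both libraries.

    A positive pseudocount keeps zero-count genes from causing log2(0) or division
    by zero. No dispersion modelling or significance testing is done here.
    """
    cpm_a, cpm_b = cpm(a), cpm(b)
    shared = cpm_a.keys() & cpm_b.keys()
    return {
        gene: math.log2((cpm_b[gene] + pseudocount) / (cpm_a[gene] + pseudocount))
        for gene in shared
    }


if __name__ == "__main__":
    control = {"geneA": 500, "geneB": 1500}  # hypothetical counts
    treated = {"geneA": 1000, "geneB": 1000}
    for gene, lfc in sorted(log2_fold_changes(control, treated).items()):
        print(gene, round(lfc, 3))
```

Note that geneA doubles in CPM even though raw counts only look moderately different, because the two libraries differ in composition; that is exactly why normalization precedes comparison.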

11.
Plant Cell Rep ; 43(3): 77, 2024 Feb 22.
Article in English | MEDLINE | ID: mdl-38386216

ABSTRACT

KEY MESSAGE: We report the mitochondrial genome of Ventilago leiocarpa for the first time. RNA editing events that create stop codons at two sites and a start codon at one site were verified. Ventilago leiocarpa, a member of the Rhamnaceae family, is frequently used in traditional medicine due to the medicinal properties of its roots. In this study, we successfully assembled the mitogenome of V. leiocarpa using both BGI short reads and Nanopore long reads. This mitogenome has a total length of 331,839 bp. Annotation identified 36 unique protein-coding, 16 tRNA and 3 rRNA genes in this mitogenome. Furthermore, we confirmed the presence of a branched structure through long-read mapping, PCR amplification, and Sanger sequencing. Specifically, ctg1 can form a single circular molecule or combine with ctg4 to form a linear molecule. Likewise, ctg2 can form a single circular molecule or be connected to ctg4 to form a linear molecule. Subsequently, through a comparative analysis of the mitogenome and chloroplast genome (cpgenome) sequences, we identified ten mitochondrial plastid sequences (MTPTs), including two complete protein-coding genes and five complete tRNA genes. The existence of the MTPTs was verified by long reads. Collinearity analysis showed that the mitogenomes of Rosales are highly divergent in structure. Finally, we identified 545 RNA editing sites involving 36 protein-coding genes using Deepred-mt. To validate our findings, we conducted PCR amplification and Sanger sequencing, which confirmed the generation of stop codons in atp9-223 and rps10-391, as well as the generation of a start codon in nad4L-2. This study reports the complex structure and RNA editing events of the V. leiocarpa mitogenome, which will provide valuable information for the study of mitochondrial gene expression.


Subjects
Asteraceae, Genome, Mitochondrial, Rhamnaceae, Genome, Mitochondrial/genetics, Gene Expression, RNA, Transfer/genetics
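Plant mitochondrial RNA editing is predominantly C-to-U. Positions 223 and 391 fall on first codon positions in frame 1 (222 and 390 are multiples of 3), so a C-to-U edit there can turn CGA, CAA, or CAG into the stop codons UGA, UAA, or UAG; at position 2, a C-to-U edit yields AUG only from an ACG codon. The sketch below applies such an edit to a hypothetical coding sequence and reports the affected codon; the function names are illustrative, and the actual V. leiocarpa sequences are not reproduced here.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def apply_c_to_u(cds: str, position: int) -> str:
    """Apply a C-to-U edit at a 1-based CDS position (RNA alphabet assumed)."""
    idx = position - 1
    if not 0 <= idx < len(cds) or cds[idx] != "C":
        raise ValueError(f"no C at position {position}")
    return cds[:idx] + "U" + cds[idx + 1:]


def codon_at(cds: str, position: int) -> str:
    """Codon (reading frame starting at base 1) that contains a 1-based position."""
    start = (position - 1) // 3 * 3
    return cds[start:start + 3]


def edit_effect(cds: str, position: int) -> tuple[str, str]:
    """Return (codon before editing, codon after a C-to-U edit at `position`)."""
    edited = apply_c_to_u(cds, position)
    return codon_at(cds, position), codon_at(edited, position)


if __name__ == "__main__":
    before, after = edit_effect("AUGGCUCGA", 7)  # hypothetical CDS fragment
    print(before, after, after in STOP_CODONS)  # prints CGA UGA True
    print(edit_effect("ACGGCU", 2))  # prints ('ACG', 'AUG')
```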
12.
Proc Natl Acad Sci U S A ; 118(35)2021 08 31.
Article in English | MEDLINE | ID: mdl-34408075

ABSTRACT

Genomic structural variants (SVs) can play important roles in adaptation and speciation. Yet the overall fitness effects of SVs are poorly understood, partly because accurate population-level identification of SVs requires multiple high-quality genome assemblies. Here, we use 31 chromosome-scale, haplotype-resolved genome assemblies of Theobroma cacao (an outcrossing, long-lived tree species that is the source of chocolate) to investigate the fitness consequences of SVs in natural populations. Among the 31 accessions, we find over 160,000 SVs, which together cover eight times more of the genome than single-nucleotide polymorphisms and short indels (125 versus 15 Mb). Our results indicate that a vast majority of these SVs are deleterious: they segregate at low frequencies and are depleted from functional regions of the genome. We show that SVs influence gene expression, which likely impairs gene function and contributes to the detrimental effects of SVs. We also provide empirical support for a theoretical prediction that SVs, particularly inversions, increase genetic load through the accumulation of deleterious nucleotide variants as a result of suppressed recombination. Despite the overall detrimental effects, we identify individual SVs bearing signatures of local adaptation, several of which are associated with genes differentially expressed between populations. Genes involved in pathogen resistance are strongly enriched among these candidates, highlighting the contribution of SVs to this important local adaptation trait. Beyond revealing empirical evidence for the evolutionary importance of SVs, these 31 de novo assemblies provide a valuable resource for genetic and breeding studies in T. cacao.


Subjects
Adaptation, Physiological, Cacao/genetics, Chocolate, Chromosomes, Plant/genetics, Genome, Plant, Genomic Structural Variation, Trees/genetics, Biological Evolution, Cacao/growth & development, Phenotype, Plant Breeding, Trees/growth & development
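Comparing how much of the genome SVs cover (125 Mb) with SNPs and short indels (15 Mb) requires merging overlapping variants so that shared bases are counted once. A minimal sketch, assuming each variant is represented as a half-open (chromosome, start, end) interval on a reference; the function name is hypothetical, and how the study handled insertions, which span no reference bases, is not stated here.

```python
def covered_bases(intervals: list[tuple[str, int, int]]) -> int:
    """Total bases covered by the union of half-open (chrom, start, end) intervals."""
    by_chrom: dict[str, list[tuple[int, int]]] = {}
    for chrom, start, end in intervals:
        if end < start:
            raise ValueError("end must be >= start")
        by_chrom.setdefault(chrom, []).append((start, end))
    total = 0
    for spans in by_chrom.values():
        spans.sort()
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start <= cur_end:  # overlapping or abutting: extend the current block
                cur_end = max(cur_end, end)
            else:  # gap: close the current block and start a new one
                total += cur_end - cur_start
                cur_start, cur_end = start, end
        total += cur_end - cur_start
    return total


if __name__ == "__main__":
    svs = [("chr1", 0, 10), ("chr1", 5, 20), ("chr1", 30, 40), ("chr2", 0, 5)]
    print(covered_bases(svs))  # prints 35
```

Summing raw SV lengths instead would count the overlap between the first two intervals twice and overstate coverage.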
13.
Biochem Genet ; 2024 Jun 05.
Article in English | MEDLINE | ID: mdl-38836961

ABSTRACT

Panax japonicus Meyer, a perennial dicotyledonous herb of the family Araliaceae, is a rare traditional Chinese folk medicine known in China as "the king of herbal medicine". Transcriptomic analysis of P. japonicus is of vital importance for understanding the genes involved in secondary metabolic pathways under drought and salt stress. Transcriptomes of underground rhizomes, stems, and leaves of P. japonicus under drought and salt stress were sequenced using the Illumina HiSeq platform. After de novo assembly of transcripts, expression profiling was performed and differentially expressed genes (DEGs) were identified. Furthermore, putative functions of identified DEGs correlated with ginsenoside in P. japonicus were explored using Gene Ontology terms and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis. A total of 221,804 unigenes were obtained from the transcriptome of P. japonicus. Further analysis revealed that 10,839 unigenes were mapped to 91 KEGG pathways. In addition, two metabolic pathways of P. japonicus related to triterpene saponin synthesis and responsive to drought and salt stress were screened. The sesquiterpene and triterpene metabolic pathways were annotated, and correlation analysis between ginsenoside content and the expression of putatively involved genes identified four genes: ß-amyrin synthase, isoprene synthase, squalene epoxidase, and 1-deoxy-D-ketose-5-phosphate synthase. Our results pave the way for screening highly expressed genes and mining genes related to triterpenoid saponin synthesis. They also provide valuable references for the study of genes involved in ginsenoside biosynthesis and signaling pathways in P. japonicus.

14.
BMC Biol ; 21(1): 9, 2023 02 07.
Article in English | MEDLINE | ID: mdl-36747166

ABSTRACT

BACKGROUND: In 1975, the mummified body of a woman was found in the Franciscan church in Basel, Switzerland. Molecular and genealogic analyses revealed her identity as Anna Catharina Bischoff (ACB), a member of the upper class of post-Reformation Basel, who died at the age of 68 in 1787. The cause of her death is still a mystery, especially since toxicological analyses revealed high levels of mercury, a common treatment for infections at that time, in different body organs. Computed tomography (CT) and histological analysis showed bone lesions in the femurs, the rib cage, and the skull, suggesting a potential case of syphilis. RESULTS: Although we could not detect any molecular signs of the syphilis-causing pathogen Treponema pallidum subsp. pallidum, we detected a high prevalence of a nontuberculous mycobacterium (NTM) species in a brain tissue sample. Genome analysis of this NTM revealed an abundance of virulence genes and toxins, and similarity to other infectious NTM known to infect immunocompromised patients. In addition, it displayed potential resistance to mercury compounds, which might indicate a selective advantage against the applied treatment. This suggests that ACB might have suffered from an atypical mycobacteriosis during her life, which could explain the mummy's bone lesions and high mercury concentrations. CONCLUSIONS: The study of this mummy exemplifies the importance of employing differential diagnostic approaches in paleopathological analysis, by combining classical anthropological, radiological, histological, and toxicological observations with molecular analysis. It represents a proof of concept for the discovery of not-yet-described ancient pathogens in well-preserved specimens, using de novo metagenomic assembly.


Subjects
Mycobacterium Infections, Nontuberculous, Syphilis, Humans, Female, Aged, Nontuberculous Mycobacteria/genetics, Mycobacterium Infections, Nontuberculous/diagnosis, Mycobacterium Infections, Nontuberculous/epidemiology, Mycobacterium Infections, Nontuberculous/microbiology, Switzerland, Virulence
15.
Genomics ; 115(2): 110588, 2023 03.
Article in English | MEDLINE | ID: mdl-36841311

ABSTRACT

Gall oak (Quercus infectoria) is a native tree of Iran whose gall extract is used to treat many diseases. The presence of abundant secondary metabolites with various bioactivities in this plant has made it medically important. Despite its medicinal value, the biosynthetic pathways of these compounds in this species remain unknown owing to a lack of genomic information. The current research aimed to observe, characterize, and investigate the biosynthetic pathways of these compounds in Q. infectoria. De novo transcriptome assembly was conducted using RNA sequencing. A total of 89,335 unigenes were generated, of which 6928 showed differential expression in leaves compared to root tissue. Gene Ontology analysis of the DEGs revealed enrichment of terms related to cellular processes and enzyme activity. KEGG enrichment analysis of the DEGs showed that most unigenes were related to metabolic pathways and biosynthesis of secondary metabolites. Moreover, 39 families of transcription factors were identified, of which the C2H2, bZIP, bHLH, and ERF TFs had the highest frequency. In the absence of a reference genome, this transcriptome study provides a reference for future functional and comparative studies, and the data obtained from sequencing and de novo assembly can be a valuable scientific resource for Q. infectoria.


Subjects
Quercus, Quercus/genetics, Molecular Sequence Annotation, Gene Expression Profiling, Transcriptome, Metabolic Networks and Pathways, High-Throughput Nucleotide Sequencing/methods, Gene Expression Regulation, Plant
16.
BMC Genomics ; 24(1): 401, 2023 Jul 17.
Article in English | MEDLINE | ID: mdl-37460975

ABSTRACT

BACKGROUND: Bacteria of the Borrelia burgdorferi sensu lato (s.l.) complex can cause Lyme borreliosis. Different B. burgdorferi s.l. genospecies vary in their host and vector associations and human pathogenicity, but the genetic basis for these adaptations is unresolved and requires complete and reliable genomes for comparative analyses. The de novo assembly of a complete Borrelia genome is challenging due to the high level of complexity, represented by a high number of circular and linear plasmids that are dynamic, showing mosaic structure and sequence homology. Previous work demonstrated that even advanced approaches, such as a combination of short-read and long-read data, might lead to incomplete plasmid reconstruction. Here, using recently developed high-fidelity (HiFi) PacBio sequencing, we explored strategies to obtain gap-free, complete and high-quality Borrelia genome assemblies. Optimizing genome assembly, quality control and refinement steps, we critically appraised existing techniques to create a workflow that led to improved genome reconstruction. RESULTS: Despite the latest available technologies, stand-alone sequencing and assembly methods are insufficient for the generation of complete and high-quality Borrelia genome assemblies. We developed a workflow pipeline for the de novo genome assembly of Borrelia using several types of sequence data and incorporating multiple assemblers to recover the complete genome, including both circular and linear plasmid sequences. CONCLUSION: Our study demonstrates that, with HiFi data and an ensemble reconstruction pipeline with refinement steps, chromosomal and plasmid sequences can be fully resolved, even for complex genomes such as Borrelia. The presented pipeline may be of interest for the assembly of further complex microbial genomes.


Subjects
Borrelia burgdorferi Group, Borrelia burgdorferi, Borrelia, Lyme Disease, Humans, Borrelia/genetics, Genome, Bacterial, Phylogeny, Borrelia burgdorferi/genetics, Lyme Disease/microbiology, Borrelia burgdorferi Group/genetics
17.
BMC Biotechnol ; 23(1): 51, 2023 12 04.
Article in English | MEDLINE | ID: mdl-38049781

ABSTRACT

BACKGROUND: Goat rumen microbial communities are considered one of the most promising biochemical reservoirs of multi-functional enzymes, which could enhance a wide array of bioprocesses, such as the hydrolysis of cellulose and hemicellulose into fermentable sugars for biofuel and other value-added biochemical production. However, limited understanding of rumen microbial genetic diversity and the absence of effective culture-based screening methods have impeded the full use of these potential enzymes. In this study, we applied a culture-independent metagenomic sequencing approach to identify microbial communities in the goat rumen and to clone and functionally characterize novel cellulase and xylanase genes from goat rumen bacterial communities. RESULTS: Bacterial DNA samples were extracted from goat rumen fluid. Three genomic libraries were sequenced using Illumina HiSeq 2000 for paired-end 100-bp (PE100) reads and Illumina HiSeq 2500 for paired-end 125-bp (PE125) reads. A total of 435 Gb of raw reads were generated. Taxonomic analysis using Graphlan revealed that Fibrobacter, Prevotella, and Ruminococcus are the most abundant bacterial genera in the goat rumen. SPAdes assembly and prodigal annotation were performed. The contigs were also annotated using the DOE-JGI pipeline. In total, 117,502 CAZymes, comprising endoglucanases, exoglucanases, beta-glucosidases, xylosidases, and xylanases, were detected in all three samples. Two genes with predicted cellulolytic/xylanolytic activities were cloned and expressed in E. coli BL21(DE3). The endoglucanase and xylanase enzymatic activities of the recombinant proteins were confirmed using a substrate plate assay and dinitrosalicylic acid (DNS) analysis. The 3D structures of endoglucanase A and endo-1,4-beta xylanase were predicted using the Swiss Model. Based on the 3D structure analysis, the two enzymes isolated from the goat rumen metagenome are unique, with only 56-59% similarity to homologous proteins in the Protein Data Bank (PDB); their structures also displayed greater stability and higher catalytic activity. CONCLUSIONS: In summary, this study provided database resources of bacterial metagenomes from goat rumen fluid, including gene sequences with annotated functions, methods for gene isolation and overexpression of cellulolytic enzymes, and a wealth of genes in the metabolic pathways affecting the food and nutrition of ruminant animals.


Subjects
Cellulase, Cellulases, Animals, Cellulase/metabolism, Metagenome, Goats/genetics, Goats/metabolism, Goats/microbiology, Rumen/metabolism, Rumen/microbiology, Escherichia coli/genetics, Bacteria, Cellulases/genetics, Cellulose
18.
BMC Plant Biol ; 23(1): 94, 2023 Feb 14.
Article in English | MEDLINE | ID: mdl-36782126

ABSTRACT

The indica rice variety XYXZ carries elite traits including appearance and eating quality. Here, we report the de novo assembly of XYXZ using Illumina paired-end whole-genome shotgun sequencing and Nanopore sequencing. We annotated 39,722 protein-coding genes in the 395.04 Mb assembly. In comparison to other cultivars, XYXZ showed larger gene sizes, including transcripts and introns, and more exons per gene. Hundreds of ultra-long genes were also detected. A total of 4362 complete LTRs were annotated, and many of them were located next to or within protein-coding genes, including several genes related to rice quality. We observed different distributions of LTRs in these genes among XYXZ, Nipponbare, and R498, implying that these LTRs might affect the expression of the proximal genes and rice quality. Overall, this chromosome-length genome assembly of XYXZ provides a valuable resource for gene discovery, genetic variation and evolution, and the breeding of high-quality rice.


Subjects
Genome, Plant , Oryza , Oryza/genetics , Plant Breeding , Whole Genome Sequencing , Chromosomes
19.
Plant Biotechnol J ; 21(1): 202-218, 2023 01.
Article in English | MEDLINE | ID: mdl-36196761

ABSTRACT

Temperate japonica/geng (GJ) rice yield has improved significantly due to intensive breeding efforts, dramatically enhancing global food security. However, little is known about the genomic structural variations (SVs) responsible for this improvement. In the present study, we compared 58 long-read assemblies of cultivated and wild rice species and identified 156,319 SVs. Phylogenomic analysis based on the SV dataset detected the putatively selected region of GJ subpopulations. A significant portion of the detected SVs overlapped with genic regions and was found to influence the expression of the affected genes in GJ assemblies. Integrating the SVs and the causal genetic variants underlying agronomic traits into the analysis enabled precise identification of breeding signatures resulting from complex breeding histories aimed at stress tolerance, yield potential, and quality improvement. Further, the results provided genomic and genetic evidence that an SV in the promoter of LTG1 accounts for chilling sensitivity, and that increased copy number of GNP1 is associated with positive effects on grain number. In summary, this study provides genomic resources for retracing how SVs shaped agronomic traits during past breeding, which will assist future genetic, genomic, and breeding research on rice.


Subjects
Oryza , Oryza/genetics , Plant Breeding , Genomics/methods , Phenotype , Edible Grain
20.
Brief Bioinform ; 22(5)2021 09 02.
Article in English | MEDLINE | ID: mdl-33822878

ABSTRACT

BACKGROUND: Coronavirus Disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), became a global pandemic following its initial emergence in China. SARS-CoV-2 is a positive-sense single-stranded RNA virus with a genome of around 30 kb. Using next-generation sequencing technologies, SARS-CoV-2 genomes are being sequenced at an unprecedented rate and deposited in public repositories. A wide variety of assemblers is being used for the de novo assembly of these genomes, although their impact on assembly quality has not been characterized for this virus. In this study, we aimed to understand how the choice of assembler affects assembly quality. RESULTS: We performed 6,648 de novo assemblies of 416 SARS-CoV-2 samples using eight different assemblers with different k-mer lengths. We used Illumina paired-end sequencing reads and compared the assembly quality of these assemblers. We showed that the choice of assembler plays a significant role in reconstructing the SARS-CoV-2 genome. Two metagenomic assemblers, MEGAHIT and metaSPAdes, outperformed the others on most assembly quality metrics, including recovery of a larger fraction of the genome, construction of larger contigs, and higher N50 and NA50 values. We showed that at least 9% (259/2873) of the variants present in the MEGAHIT and metaSPAdes assemblies are unique to one of the two methods. CONCLUSION: Our analyses indicate the critical role of the assembly method in assembling the SARS-CoV-2 genome from short reads and its impact on variant characterization. This study could help guide future studies in selecting the best-suited assembler for the de novo assembly of virus genomes.
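For readers unfamiliar with the contiguity metrics compared in this study, the N50 calculation can be illustrated with a minimal Python sketch. It assumes contig lengths are supplied as a plain list of integers; the function name `n50` is hypothetical and is not taken from the study. NA50, as defined in tools such as QUAST, applies the same calculation to aligned blocks (contigs broken at misassemblies, with unaligned bases removed) and is not implemented here.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly (hypothetical helper for illustration).

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembly length.
    """
    # Sort contigs from longest to shortest.
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        # Stop at the first contig that brings cumulative length to >= 50%.
        if 2 * running >= total:
            return length
    return 0  # empty assembly


# Total = 280 bp; 100 + 80 = 180 bp already exceeds half, so N50 = 80.
print(n50([100, 80, 50, 30, 20]))
```

Integer arithmetic (`2 * running >= total`) is used instead of dividing by two to avoid rounding issues with odd totals.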


Subjects
Genome, Viral , Mutation , SARS-CoV-2/genetics , COVID-19/virology , Databases, Genetic , Tandem Repeat Sequences