Results 1 - 20 of 1,725
1.
BMC Bioinformatics ; 21(1): 358, 2020 Aug 14.
Article in English | MEDLINE | ID: mdl-32795263

ABSTRACT

BACKGROUND: The dramatic decrease in sequencing costs over the last decade has boosted the adoption of high-throughput sequencing applications as a standard tool for the analysis of environmental microbial communities. Nowadays even small research groups can easily obtain raw sequencing data. After that, however, non-specialists are faced with the double challenge of choosing among an ever-increasing array of analysis methodologies, and navigating the vast amounts of results returned by these approaches. RESULTS: Here we present a workflow that relies on the SqueezeMeta software for the automated processing of raw reads into annotated contigs and reconstructed genomes (bins). A set of custom scripts seamlessly integrates the output into the anvi'o analysis platform, allowing filtering and visual exploration of the results. Furthermore, we provide a software package with utility functions to expose the SqueezeMeta results to the R analysis environment. CONCLUSIONS: Altogether, our workflow allows non-expert users to go from raw sequencing reads to custom plots with only a few powerful, flexible and well-documented commands.


Subjects
Computational Biology/methods , Software , Contig Mapping , Databases, Factual , High-Throughput Nucleotide Sequencing , Metagenomics
2.
PLoS Comput Biol ; 16(7): e1008104, 2020 Jul.
Article in English | MEDLINE | ID: mdl-32735589

ABSTRACT

High levels of heterozygosity present a unique genome assembly challenge and can adversely impact downstream analyses, yet they are common in sequencing datasets obtained from non-model organisms. Here we show that by re-assembling a heterozygous dataset with varied parameters and different assembly algorithms, we are able to generate assemblies whose protein annotations are statistically enriched for specific gene ontology categories. While total assembly length was not significantly affected by the assembly methodologies tested, the assemblies generated varied widely in fragmentation level, and we show local assembly collapse or expansion underlying the enrichment or depletion of specific protein functional groups. We show that these statistically significant deviations in gene ontology groups can occur in seemingly high-quality assemblies, and result from difficult-to-detect local sequence expansions or contractions. Given the unpredictable interplay between assembly algorithm, parameter, and biological sequence data heterozygosity, we highlight the need for better measures of assembly quality than N50 value, including methods for assessing local expansion and collapse.
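
Since the abstract above argues that N50 alone is an insufficient quality measure, it may help to recall how N50 is usually computed. The following is a minimal Python sketch; the function name `n50` and the toy contig lengths are illustrative assumptions and do not come from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Two hypothetical assemblies with the same total length (100 units).
fragmented = [10] * 10
contiguous = [50, 30, 10, 5, 5]
print(n50(fragmented), n50(contiguous))
```

Both toy assemblies have the same total length but very different N50 values. Conversely, N50 says nothing about whether an individual region was locally collapsed or expanded, which is the limitation the abstract points out.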


Subjects
Contig Mapping , Genome, Helminth , Heterozygote , Molecular Sequence Annotation/methods , Nematoda/genetics , ATP Binding Cassette Transporter, Subfamily B, Member 1/metabolism , Algorithms , Animals , High-Throughput Nucleotide Sequencing/methods , Likelihood Functions , Proteome , Sequence Analysis, DNA
3.
PLoS One ; 15(5): e0225808, 2020.
Article in English | MEDLINE | ID: mdl-32396560

ABSTRACT

Peronospora effusa (previously known as P. farinosa f. sp. spinaciae, and here referred to as Pfs) is an obligate biotrophic oomycete that causes downy mildew on spinach (Spinacia oleracea). To combat this destructive disease, many resistant cultivars have been bred and used. However, new Pfs races rapidly break the employed resistance genes. To get insight into the gene repertoire of Pfs and identify infection-related genes, the genome of the first reference race, Pfs1, was sequenced, assembled, and annotated. Due to the obligate biotrophic nature of this pathogen, material for DNA isolation can only be collected from infected spinach leaves that, however, also contain many other microorganisms. The obtained sequences can, therefore, be considered a metagenome. To filter and obtain Pfs sequences, we utilized the CAT tool to taxonomically annotate ORFs residing on long sequences of a genome pre-assembly. This study is the first to show that CAT filtering performs well on eukaryotic contigs. Based on the taxonomy, determined on multiple ORFs, contaminating long sequences and corresponding reads were removed from the metagenome. Filtered reads were re-assembled to provide a clean and improved Pfs genome sequence of 32.4 Mbp consisting of 8,635 scaffolds. Transcript sequencing of a range of infection time points aided the prediction of a total of 13,277 gene models, including 99 RxLR(-like) effector and 14 putative Crinkler genes. Comparative analysis identified common features in the predicted secretomes of different obligate biotrophic oomycetes, regardless of their phylogenetic distance. Their secretomes are generally smaller than those of hemi-biotrophic and necrotrophic oomycete species. We observe a reduction in proteins involved in cell wall degradation, in Nep1-like proteins (NLPs), proteins with PAN/apple domains, and host translocated effectors. The genome of Pfs1 will be instrumental in studying downy mildew virulence and for understanding the molecular adaptations by which new isolates break spinach resistance.


Subjects
Metagenome , Peronospora/genetics , Plant Diseases/microbiology , Spinacia oleracea/microbiology , Contig Mapping/methods , Peronospora/pathogenicity , Virulence
4.
Cytogenet Genome Res ; 160(2): 85-93, 2020.
Article in English | MEDLINE | ID: mdl-32235117

ABSTRACT

From an economic point of view, Bovidae represent the most important family of the Ruminantia suborder. Thus, the mitochondrial and nuclear genomes of Bos taurus were among the first genomes to be sequenced after the sequencing of the human genomes. Over the millennia, the evolution of the genomes of the 3 main species belonging to the Bovidae family - B. taurus (BTA), Ovis aries (OAR), and Capra hircus (CHI) - has led to few chromosome rearrangements. Certainly, the availability and free access to the animal genomes significantly contributed to the improvement of animal genetics; however, some errors may exist due to the high automation in the genomic assembly construction process. In this work, some differences between the genomes of cattle, goat, and sheep highlighted by bioinformatics analysis have been verified by FISH, confirming that some errors persist even in the most recent genome assemblies. This type of approach has allowed us to detect a misassembly of a region belonging to BTA16 and to the homologues OAR12 and CHI16, a misassembly of a short tract in BTA22, OAR19, and CHI22, an incorrect mapping of a region of BTA21 and of CHI27 and OAR26, a discrepancy in the BTA26, OAR22, and CHI26 assemblies, a missed inversion in CHI1 compared to BTA1 and OAR1, and the exact assembly of a region of about 7 Mb in OAR10 and CHI12. Incorrect positioning of genomic tracts can cause unintended consequences in genetic analyses, especially when the data represent a starting point for the construction of genetic tools. In the new genomic assemblies published after the conclusion of our experiments, however, the accuracy in the construction of animal assemblies has been much improved, even if the new assemblies present more extended unmapped portions than the previous versions. The gap could be filled by comparative analyses between similar species or FISH.


Subjects
Cattle/genetics , Chromosomes, Mammalian/genetics , Computational Biology/methods , Goats/genetics , Sheep/genetics , Animals , Contig Mapping , Evolution, Molecular , Genetic Variation , Genomics , In Situ Hybridization, Fluorescence
5.
PLoS One ; 15(4): e0232005, 2020.
Article in English | MEDLINE | ID: mdl-32343733

ABSTRACT

Transcriptome resources can help to increase the yield and quality of walnuts. Finding the best transcriptome assembly has not yet been the subject of walnut research. This research generated 240,179,782 reads from cDNA libraries of 11 walnut leaves. The reads provided a complete de novo transcriptome assembly. Fifteen different transcriptome assemblies were constructed using five well-known assemblers from the scientific literature (Bridger, BinPacker, SOAPdenovo-Trans, Trinity and SPAdes) with different k-mer lengths, as well as two merging approaches (EvidentialGene and Transfuse). Based on four assembly quality metrics, the results indicated that merging the assemblies generated by the de novo assemblers was effective. Finally, EvidentialGene was recognized as the best assembler for the de novo assembly of the leaf transcriptome in walnut. Of the 183,191 transcripts generated by EvidentialGene, 109,413 (59.72%) had protein-coding potential and 104,926 (57.27%) were recognized as ORFs. In addition, 79,185 transcripts were predicted to have at least one hit in the Pfam database. A total of 3,931 transcription factors were identified by BLAST searching against PlnTFDB. Furthermore, 6,591 of the predicted peptide sequences contained signal peptides, while 92,704 contained transmembrane domains. Comparison of the assembled transcripts with transcripts of the walnut and the published genome assembly for the 'Chandler' cultivar using the BLAST algorithm identified a total of 27,304 and 19,178 homologous transcripts, respectively. De novo transcriptomes of walnut leaves can support future functional genomics and genetic studies of walnuts.


Subjects
Contig Mapping/methods , Gene Expression Profiling/methods , Juglans/genetics , Computational Biology/methods , Gene Library , High-Throughput Nucleotide Sequencing/methods , Molecular Sequence Annotation , Plant Leaves/genetics , Plant Proteins/genetics , Sequence Analysis, RNA/methods
6.
BMC Genomics ; 21(Suppl 3): 243, 2020 Apr 02.
Article in English | MEDLINE | ID: mdl-32241258

ABSTRACT

BACKGROUND: The common marmoset (Callithrix jacchus) is one of the most studied primate model organisms. However, the marmoset genomes available in the public databases are highly fragmented and filled with sequence gaps, hindering research advances related to marmoset genomics and transcriptomics. RESULTS: Here we utilize single-molecule, long-read sequence data to improve and update the existing genome assembly and report a near-complete genome of the common marmoset. The assembly is 2.79 Gb in size, with a contig N50 length of 6.37 Mb and a chromosomal scaffold N50 length of 143.91 Mb, representing the most contiguous and high-quality marmoset genome to date. Approximately 90% of the assembled genome was represented in contigs longer than 1 Mb, with an approximately 104-fold improvement in contiguity over the previously published marmoset genome. More than 98% of the gaps from the previously published genomes were filled successfully, which improved the mapping rates of genomic and transcriptomic data onto the assembled genome. CONCLUSIONS: Altogether, the updated, high-quality common marmoset genome assembly provides improvements at various levels over the previous versions of the marmoset genome assemblies. This will allow researchers working on primate genomics to apply the genome more efficiently for their genomic and transcriptomic sequence data.


Subjects
Callithrix/genetics , Chromosome Mapping/methods , Genome/genetics , Animals , Computational Biology/methods , Contig Mapping/methods , Genomics/methods , High-Throughput Nucleotide Sequencing/methods , Sequence Alignment
7.
Mol Plant Microbe Interact ; 33(6): 794-797, 2020 Jun.
Article in English | MEDLINE | ID: mdl-32129709

ABSTRACT

Phytophthora ramorum, P. kernoviae, and P. melonis are each species of current regulatory concern in the United States, the United Kingdom, and other areas of the world. Ex-type material are cultures and duplicates of the type that was used to describe each species and that are deposited in additional culture collections. Using these type specimens as references is essential to designing correct molecular identification and diagnostic systems. Here, we report a whole genome sequence for the Ex-type material of P. ramorum, P. kernoviae, and P. melonis generated using high-throughput sequencing via the MinION third generation platform from Oxford Nanopore Technology. We assembled the quality filtered reads into contigs for each species. We assembled the continuous contigs of P. ramorum, P. kernoviae, and P. melonis (1,322, 545, and 2,091 contigs, respectively). The ab initio prediction of genes from these species reveals that there are 16,838, 12,793, and 34,580 genes in P. ramorum, P. kernoviae, and P. melonis, respectively. Of the 34,580 P. melonis genes, 10,164 genes were conserved among all three of these Phytophthora species which may include pathogenicity genes. We compared the ex-type of P. ramorum EU1 lineage assembly with another selected isolate of EU1 available at the National Center for Biotechnology Information and found 251,859 single nucleotide polymorphisms (SNPs) genome-wide; the comparison with the EU2 lineage genome isolate revealed 441,859 SNPs genome-wide. This genome resource of the ex-types of P. ramorum, and P. kernoviae is a significant contribution as these species are among the most important pathogens of regulatory concern in different regions of the world.


Subjects
Genome , Nanopore Sequencing , Phytophthora/genetics , Plant Diseases/parasitology , Contig Mapping , High-Throughput Nucleotide Sequencing , Polymorphism, Single Nucleotide
8.
Mol Plant Microbe Interact ; 33(6): 782-786, 2020 Jun.
Article in English | MEDLINE | ID: mdl-32150511

ABSTRACT

Powdery mildew of sweet pepper (Capsicum annuum) is an economically important disease. It is caused by Leveillula taurica, an obligate biotrophic ascomycete with a partly endophytic mycelium and haustoria, i.e., feeding structures formed in the mesophyll cells of infected host plant tissues. The molecular basis of its pathogenesis is largely unknown because genomic resources only exist for epiphytically growing powdery mildew fungi with haustoria formed exclusively in epidermal cells of their plant hosts. Here, we present the first reference genome assembly for an isolate of L. taurica isolated from sweet pepper in Hungary. The short read-based assembly consists of 23,599 contigs with a total length of 187.2 Mbp; the scaffold N50 is 13,899 kbp and N90 is 3,522 kbp; and the average GC content is 39.2%. We detected at least 92,881 transposable elements covering 55.5 Mbp (30.4%). BRAKER predicted 19,751 protein-coding gene models in this assembly. Our reference genome assembly of L. taurica is the first resource to study the molecular pathogenesis and evolution of a powdery mildew fungus with a partly endophytic lifestyle.


Subjects
Ascomycota/genetics , Capsicum/microbiology , Genome, Fungal , Plant Diseases/microbiology , Base Composition , Contig Mapping , DNA Transposable Elements
9.
PLoS One ; 15(2): e0228199, 2020.
Article in English | MEDLINE | ID: mdl-32040520

ABSTRACT

In the present study, we identified salt stress tolerant genes from the marine bacterium Staphylococcus sp. strain P-TSB-70 through transcriptome sequencing. For whole-genome transcriptome profiling of Staphylococcus sp. strain P-TSB-70 (GenBank Accn. No. KP117091), which tolerated up to 20% NaCl stress, the strain was cultured under laboratory conditions with 20% NaCl stress. Transcriptome analyses were performed with SOLiD4.0 sequencing technology, from which 10,280 and 9,612 transcripts were obtained for the control and treated samples, respectively. The coverage per base (CPB) statistics were analyzed for both samples. Gene ontology (GO) analysis was categorized at varied graph levels based on the three primary ontologies, viz. cellular components, biological processes, and molecular functions. The KEGG analysis of the assembled transcripts using KAAS showed presumed components of diverse metabolic pathways that are perhaps implicated in salt tolerance, viz. glycolysis/gluconeogenesis, oxidative phosphorylation, glutathione metabolism, etc. Overall, 90 salt stress tolerant genes were identified from 186 salt-related transcripts. Several genes were found to function in the TCA cycle pathway, integral membrane proteins, generation of osmoprotectants, and enzymatic pathways associated with salt tolerance. The recognized genes fit diverse groups of salt stress genes, viz. ABC transporter, betaine, sodium antiporter, sodium symporter, trehalose, ectoine, and choline, that belong to different families of genes involved in the salt stress pathway. The control sample of the bacterium showed a high proportion of transcript contigs (29%), while the sample treated with up to 20% salt stress showed a higher percentage of transcript contigs (31.28%). A total of 1,288 and 1,133 transcript contigs were identified as entirely novel transcript contigs in the control and treated samples, respectively. The structure and function of 10 significant salt stress tolerant genes of Staphylococcus sp. were analyzed in this study. The information acquired in the present study could be used to recognize and clone the salt stress tolerant genes and to support the development of salt stress-tolerant plant varieties to expand agricultural productivity in saline systems.


Subjects
Salt Tolerance/genetics , Seawater/microbiology , Staphylococcus/genetics , Transcriptome , Bacterial Proteins/chemistry , Bacterial Proteins/genetics , Bacterial Proteins/metabolism , Contig Mapping , Gene Expression Profiling , Genome, Bacterial , India , Metabolic Networks and Pathways/genetics , Protein Structure, Tertiary , RNA, Bacterial/chemistry , RNA, Bacterial/metabolism , Staphylococcus/isolation & purification
10.
BMC Genomics ; 21(1): 148, 2020 Feb 11.
Article in English | MEDLINE | ID: mdl-32046653

ABSTRACT

BACKGROUND: RNA-Seq is the preferred method to explore transcriptomes and to estimate differential gene expression. When an organism has a well-characterized and annotated genome, reads obtained from RNA-Seq experiments can be directly mapped to that genome to estimate the number of transcripts present and relative expression levels of these transcripts. However, for unknown genomes, de novo assembly of RNA-Seq reads must be performed to generate a set of contigs that represents the transcriptome. These contig sets contain multiple transcripts, including immature mRNAs, spliced transcripts and allele variants, as well as products of close paralogs or gene families that can be difficult to distinguish. Thus, tools are needed to select a set of less redundant contigs to represent the transcriptome for downstream analyses. Here we describe the development of Compacta to produce contig sets from de novo assemblies. RESULTS: Compacta is a fast and flexible computational tool that allows selection of a representative set of contigs from de novo assemblies. Using a graph-based algorithm, Compacta groups contigs into clusters based on the proportion of shared reads. The user can determine the minimum coverage of the contigs to be clustered, as well as a threshold for the proportion of shared reads in the clustered contigs, thus providing a dynamic range of transcriptome compression that can be adapted according to experimental aims. We compared the performance of Compacta against state of the art clustering algorithms on assemblies from Arabidopsis, mouse and mango, and found that Compacta yielded more rapid results and had competitive precision and recall ratios. We describe and demonstrate a pipeline to tailor Compacta parameters to specific experimental aims. CONCLUSIONS: Compacta is a fast and flexible algorithm for the determination of optimum contig sets that represent the transcriptome for downstream analyses.
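
The idea of grouping contigs by the proportion of reads they share can be illustrated with a small Python sketch. This is not Compacta's implementation: the function name `cluster_contigs`, the parameters `min_reads` and `min_shared`, the choice of the smaller contig as the denominator, and the use of connected components are all assumptions made for illustration.

```python
from itertools import combinations


def cluster_contigs(reads_by_contig, min_reads=1, min_shared=0.5):
    """Group contigs whose shared-read proportion reaches min_shared.

    reads_by_contig maps a contig name to the set of read IDs mapped to it.
    Contigs with fewer than min_reads reads are dropped (assumed stand-in
    for a minimum-coverage filter).
    """
    kept = {c: set(r) for c, r in reads_by_contig.items() if len(r) >= min_reads}
    parent = {c: c for c in kept}  # union-find forest over contigs

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    # Link two contigs (a graph edge) when they share enough reads.
    for a, b in combinations(kept, 2):
        shared = len(kept[a] & kept[b])
        smaller = min(len(kept[a]), len(kept[b]))
        if smaller and shared / smaller >= min_shared:
            parent[find(a)] = find(b)

    # Connected components of the graph are the clusters.
    clusters = {}
    for c in kept:
        clusters.setdefault(find(c), []).append(c)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical read assignments for four contigs.
reads = {
    "c1": {"r1", "r2", "r3", "r4"},
    "c2": {"r3", "r4", "r5"},
    "c3": {"r9", "r10"},
    "c4": {"r11"},
}
print(cluster_contigs(reads, min_reads=2, min_shared=0.5))
```

Raising `min_shared` splits clusters apart and lowering it merges them, which mirrors the adjustable degree of transcriptome compression described in the abstract.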


Subjects
Contig Mapping/methods , RNA-Seq/methods , Software , Algorithms , Arabidopsis/genetics , Cluster Analysis
11.
Sci Data ; 7(1): 66, 2020 Feb 24.
Article in English | MEDLINE | ID: mdl-32094352

ABSTRACT

Vulnerable populations of wild yak (Bos mutus), the wild ancestral species of domestic yak, survive in extremely cold, harsh and oxygen-poor regions of the Qinghai-Tibetan Plateau (QTP) and adjacent high-altitude regions. In this study, we sequenced and assembled its genome de novo. In total, six different insert-size libraries were sequenced, and 662 Gb of clean data were generated. The assembled wild yak genome is 2.83 Gb in length, with an N50 contig size of 63.2 kb and a scaffold size of 16.3 Mb. BUSCO assessment indicated that 93.8% of the highly conserved mammal genes were completely present in the genome assembly. Annotation of the wild yak genome assembly identified 1.41 Gb (49.65%) of repetitive sequences and a total of 22,910 protein-coding genes, including 20,660 (90.18%) annotated with functional terms. This first construction of the wild yak genome provides a variable genetic resource that will facilitate further study of the genetic diversity of bovine species and accelerate yak breeding efforts.


Subjects
Cattle/genetics , Genome , Animals , Animals, Wild/genetics , Contig Mapping , Gene Library , Sequence Analysis, DNA
12.
BMC Genomics ; 21(1): 108, 2020 Jan 31.
Article in English | MEDLINE | ID: mdl-32005147

ABSTRACT

BACKGROUND: Siberian musk deer, one of the seven musk deer species, is distributed in coniferous forests of Asia. Worldwide, the Siberian musk deer population is threatened by severe illegal poaching for commercially valuable musk and meat, habitat loss, and forest fires. At present, this species is categorized as Vulnerable on the IUCN Red List. However, the genetic information of Siberian musk deer is largely unexplored. RESULTS: Here, we produced a 3.10 Gb draft assembly of wild Siberian musk deer with a contig N50 of 29,145 bp and a scaffold N50 of 7,955,248 bp. We annotated 19,363 protein-coding genes and estimated 44.44% of the genome to be repetitive. Our phylogenetic analysis reveals that wild Siberian musk deer is closer to Bovidae than to Cervidae. Comparative analyses showed genetic features of Siberian musk deer adapted to cold and high-altitude environments. We sequenced two additional genomes of Siberian musk deer, and the reconstructed demographic history indicated that changes in effective population size corresponded with recent glacial epochs. Finally, we identified several candidate genes that may play a role in musk secretion based on transcriptome analysis. CONCLUSIONS: Here, we present a high-quality draft genome of wild Siberian musk deer, which will provide a valuable genetic resource for further investigations of this economically important musk deer.


Subjects
Contig Mapping/veterinary , Deer/genetics , Gene Expression Profiling/veterinary , Whole Genome Sequencing/veterinary , Adaptation, Biological , Animals , Deer/classification , Evolution, Molecular , Female , Genome Size , Molecular Sequence Annotation , Phylogeny , Population Density , Sequence Analysis, RNA/veterinary
13.
Bioinformatics ; 36(8): 2359-2364, 2020 Apr 15.
Article in English | MEDLINE | ID: mdl-31913460

ABSTRACT

MOTIVATION: Linkage mapping provides a practical way to anchor de novo genome assemblies into chromosomes and to detect chimeric or otherwise erroneous contigs. Such anchoring improves with a higher number of markers and individuals, as long as the mapping software can handle all the information. The recent software Lep-MAP3 can robustly construct linkage maps for millions of genotyped markers and thousands of individuals, providing optimal maps for genome anchoring. For such large datasets, an automated and robust genome anchoring tool is especially valuable and can significantly reduce the intensive computational and manual work involved. RESULTS: Here, we present the software Lep-Anchor (LA) to anchor genome assemblies automatically using dense linkage maps. As the main novelty, it takes into account the uncertainty of the linkage map positions caused by low recombination regions, cross type or poor mapping data quality. Furthermore, it can automatically detect and cut chimeric contigs, and use contig-contig, single read or alternative genome assembly alignments as additional information on contig order and orientations and to collapse haplotype contigs. We demonstrate the performance of LA using real data and show that it outperforms ALLMAPS on anchoring completeness and speed. Accuracy-wise LA and ALLMAPS are about equal, but at the expense of lower completeness of ALLMAPS. The software Chromonomer was faster than the other two methods but has major limitations and is lower in accuracy. We also show that with additional information, such as contig-contig and read alignments, the anchoring completeness can be improved by up to 70% without significant loss in accuracy. Based on simulated data, we conclude that the anchoring accuracy can be improved by utilizing information about map position uncertainty. Accuracy is the rate of contigs in correct orientation and completeness is the number of contigs with inferred orientation. AVAILABILITY AND IMPLEMENTATION: Lep-Anchor is available with the source code under GNU general public license from http://sourceforge.net/projects/lep-anchor. All the scripts and code used to produce the reported results are included with Lep-Anchor.
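
The two evaluation measures defined at the end of the abstract can be written out as a short Python sketch. The function name `anchoring_metrics`, the dictionary layout, and the decision to compute accuracy only over contigs that received an orientation are assumptions for illustration; the abstract does not state the denominator.

```python
def anchoring_metrics(inferred, truth):
    """Return (accuracy, completeness) for contig orientations.

    inferred maps contig -> "+", "-" or None (no orientation inferred).
    truth maps contig -> "+" or "-".
    Completeness is the number of contigs with an inferred orientation;
    accuracy is the fraction of those whose orientation matches truth
    (assumed denominator).
    """
    oriented = {c: o for c, o in inferred.items() if o is not None}
    completeness = len(oriented)
    correct = sum(1 for c, o in oriented.items() if truth.get(c) == o)
    accuracy = correct / completeness if completeness else 0.0
    return accuracy, completeness


# Hypothetical result for four contigs.
inferred = {"a": "+", "b": "-", "c": None, "d": "+"}
truth = {"a": "+", "b": "+", "c": "-", "d": "+"}
accuracy, completeness = anchoring_metrics(inferred, truth)
print(f"accuracy={accuracy:.2f} completeness={completeness}")
```

Under these assumptions a tool can raise its accuracy by declining to orient uncertain contigs, at the cost of completeness, which is the trade-off the abstract describes between LA and ALLMAPS.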


Subjects
Genome , Software , Chromosome Mapping , Contig Mapping , Genetic Linkage , Haploidy , Humans
14.
Arch Virol ; 165(1): 227-231, 2020 Jan.
Article in English | MEDLINE | ID: mdl-31659444

ABSTRACT

Three viral contig sequences, which represented the complete genome of a novel virus with three dsRNAs of 1,712 nucleotides (nt) (dsRNA1), 1,504 nt (dsRNA2) and 1,353 nt (dsRNA3), were found in tea-oil camellia plants by high-throughput sequencing analysis. The three dsRNAs were re-sequenced by RT-PCR cloning. The largest dsRNA, dsRNA1, had a single open reading frame (ORF) that encoded a putative 52.7-kDa viral RNA-dependent RNA polymerase (RdRp). DsRNA2 and dsRNA3 were predicted to encode putative capsid proteins (CPs) of 40.47 kDa and 40.59 kDa, respectively. The virus, which is provisionally named "tea-oil camellia deltapartitivirus 1", shared RdRp amino acid sequence identities of 36.09-69.18% with members of the genus Deltapartitivirus. Phylogenetic analysis based on RdRp also placed the new virus and other deltapartitiviruses together in a group, suggesting that this virus should be considered a new member of the genus Deltapartitivirus.


Subjects
Camellia/virology , RNA Viruses/genetics , Whole Genome Sequencing/methods , Contig Mapping , Genome, Viral , High-Throughput Nucleotide Sequencing/methods , Open Reading Frames , Phylogeny , RNA Viruses/classification , RNA, Double-Stranded/genetics
15.
Gigascience ; 8(12), 2019 Dec 01.
Article in English | MEDLINE | ID: mdl-31794015

ABSTRACT

BACKGROUND: Long DNA reads produced by single-molecule and pore-based sequencers are more suitable for assembly and structural variation discovery than short-read DNA fragments. For de novo assembly, Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT) are the favorite options. However, PacBio's SMRT sequencing is expensive for a full human genome assembly and costs more than $40,000 US for 30× coverage as of 2019. ONT PromethION sequencing, on the other hand, is 1/12 the price of PacBio for the same coverage. This study aimed to compare the cost-effectiveness of ONT PromethION and PacBio's SMRT sequencing in relation to the quality. FINDINGS: We performed whole-genome de novo assemblies and comparison to construct an improved version of KOREF, the Korean reference genome, using sequencing data produced by PromethION and PacBio. With PromethION, an assembly using sequenced reads with 64× coverage (193 Gb, 3 flowcell sequencing) resulted in 3,725 contigs with N50s of 16.7 Mb and a total genome length of 2.8 Gb. It was comparable to a KOREF assembly constructed using PacBio at 62× coverage (188 Gb, 2,695 contigs, and N50s of 17.9 Mb). When we applied Hi-C-derived long-range mapping data, an even higher quality assembly for the 64× coverage was achieved, resulting in 3,179 scaffolds with an N50 of 56.4 Mb. CONCLUSION: The pore-based PromethION approach provided a high-quality chromosome-scale human genome assembly at a low cost with long maximum contig and scaffold lengths and was more cost-effective than PacBio at comparable quality measurements.


Subjects
Chromosomes, Human/genetics , Contig Mapping/economics , Whole Genome Sequencing/methods , Contig Mapping/methods , Cost-Benefit Analysis , Databases, Genetic , High-Throughput Nucleotide Sequencing/economics , High-Throughput Nucleotide Sequencing/methods , Humans , Republic of Korea , Single Molecule Imaging , Whole Genome Sequencing/economics
16.
Gigascience ; 8(12), 2019 Dec 01.
Article in English | MEDLINE | ID: mdl-31816089

ABSTRACT

BACKGROUND: We report an improved assembly and scaffolding of the European pear (Pyrus communis L.) genome (referred to as BartlettDHv2.0), obtained using a combination of Pacific Biosciences RSII long-read sequencing, Bionano optical mapping, chromatin interaction capture (Hi-C), and genetic mapping. The sample selected for sequencing is a doubled haploid derived from the same "Bartlett" reference pear that was previously sequenced. Sequencing of doubled-haploid plants makes assembly more tractable in highly heterozygous species such as P. communis. FINDINGS: A total of 496.9 Mb, corresponding to 97% of the estimated genome size, was assembled into 494 scaffolds. Hi-C data and a high-density genetic map allowed us to anchor and orient 87% of the sequence on the 17 pear chromosomes. Approximately 50% (247 Mb) of the genome consists of repetitive sequences. Gene annotation confirmed the presence of 37,445 protein-coding genes, which is 13% fewer than previously predicted. CONCLUSIONS: We showed that the use of a doubled-haploid plant is an effective solution to the problems presented by high levels of heterozygosity and duplication for the generation of high-quality genome assemblies. We present a high-quality chromosome-scale assembly of the European pear Pyrus communis and demonstrate its high degree of synteny with the genomes of Malus × domestica and Pyrus × bretschneideri.


Subjects
Chromosomes, Plant/genetics , Contig Mapping/methods , Pyrus/genetics , Genome Size , Haploidy , Molecular Sequence Annotation , Plant Breeding , Sequence Analysis, DNA , Synteny
17.
BMC Bioinformatics ; 20(Suppl 9): 367, 2019 Nov 22.
Article in English | MEDLINE | ID: mdl-31757198

ABSTRACT

MOTIVATION: Sequencing technologies allow the sequencing of microbial communities directly from the environment without prior culturing. Because assembly typically produces only genome fragments, also known as contigs, it is crucial to group them into putative species for further taxonomic profiling and downstream functional analysis. Taxonomic analysis of microbial communities requires contig clustering, a process referred to as binning, which is still one of the most challenging tasks when analyzing metagenomic data. The major problems are the lack of taxonomically related genomes in existing reference databases, the uneven abundance ratio of species, sequencing errors, and the limitations due to binning contigs of different lengths. RESULTS: In this context we present MetaCon, a novel tool for unsupervised metagenomic contig binning based on probabilistic k-mer statistics and coverage. MetaCon uses a signature based on k-mer statistics that accounts for the different probability of appearance of a k-mer in different species; in addition, contigs of different lengths are clustered in two separate phases. The effectiveness of MetaCon is demonstrated on both simulated and real datasets in comparison with state-of-the-art binning approaches such as CONCOCT, MaxBin and MetaBAT.
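
To make the notion of a k-mer signature concrete, here is a minimal Python sketch that turns a contig into a vector of plain k-mer frequencies and compares two such vectors. It is deliberately simpler than the probabilistic statistic the abstract describes and is not MetaCon's method; the function names `kmer_profile` and `euclidean`, the default `k=2`, and the toy sequences are assumptions for illustration.

```python
from collections import Counter
from itertools import product
import math


def kmer_profile(seq, k=2):
    """Return the relative frequency of every A/C/G/T k-mer in seq.

    The vector has 4**k entries in lexicographic k-mer order; k-mers
    containing other symbols (e.g. N) are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


def euclidean(p, q):
    """Euclidean distance between two equally long profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# Hypothetical contigs: two with similar composition, one very different.
short_contig = kmer_profile("ACGTACGT")
long_contig = kmer_profile("ACGTACGTACGT")
other_contig = kmer_profile("AAAAAAAA")
print(euclidean(short_contig, long_contig) < euclidean(short_contig, other_contig))
```

Because the profile is normalized by the number of k-mers, contigs of different lengths remain comparable, although short contigs give noisier estimates; this is one reason a binner might treat long and short contigs in separate phases.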


Subjects
Algorithms ; Contig Mapping ; Metagenome ; Metagenomics ; Probability ; Statistics as Topic ; Cluster Analysis ; Databases, Genetic ; Microbiota/genetics
18.
Gigascience ; 8(12)2019 12 01.
Article in English | MEDLINE | ID: mdl-31782791

ABSTRACT

BACKGROUND: Sugarcane cultivars are polyploid interspecific hybrids with giant genomes, typically with 10-13 sets of chromosomes from 2 Saccharum species. The ploidy, hybridity, and size of the genome, estimated at >10 Gb, pose a challenge for sequencing. RESULTS: Here we present a gene space assembly of SP80-3280, including 373,869 putative genes and their potential regulatory regions. The alignment of single-copy genes in diploid grasses to the putative genes indicates that we could resolve 2-6 (up to 15) putative homo(eo)logs that are 99.1% identical within their coding sequences. Dissimilarities increase in their regulatory regions, and gene promoter analysis shows differences in regulatory elements within gene families that are expressed in a species-specific manner. We exemplify these differences for sucrose synthase (SuSy) and phenylalanine ammonia-lyase (PAL), 2 gene families central to carbon partitioning. SP80-3280 has particular regulatory elements involved in sucrose synthesis not found in the ancestor Saccharum spontaneum. PAL regulatory elements are found in co-expressed genes related to fiber synthesis within gene networks defined during plant growth and maturation. Comparison with sorghum reveals predominantly bi-allelic variations in sugarcane, consistent with the formation of 2 "subgenomes" after their divergence ∼3.8-4.6 million years ago, and reveals single-nucleotide variants that may underlie their differences. CONCLUSIONS: This assembly represents a large step towards a whole-genome assembly of a commercial sugarcane cultivar. It includes a rich diversity of genes and homo(eo)logous resolution for a representative fraction of the gene space, relevant to improving biomass and food production.


Subjects
Contig Mapping/methods ; Glucosyltransferases/genetics ; Phenylalanine Ammonia-Lyase/genetics ; Saccharum/growth & development ; Biomass ; Crops, Agricultural/genetics ; Crops, Agricultural/growth & development ; Genetic Variation ; Genome Size ; Genome, Plant ; Multigene Family ; Plant Proteins/genetics ; Polyploidy ; Promoter Regions, Genetic ; Saccharum/genetics
19.
Nat Commun ; 10(1): 5360, 2019 11 25.
Article in English | MEDLINE | ID: mdl-31767853

ABSTRACT

The abundant repetitive sequences in complex eukaryotic genomes cause fragmented assemblies, which lose value as reference genomes, often due to incomplete gene sequences and unanchored or mispositioned contigs on chromosomes. Here we report a genome assembly method, HERA, which resolves repeats efficiently by constructing a connection graph from an overlap graph. We test HERA on the genomes of rice, maize, human, and Tartary buckwheat with single-molecule sequencing and mapping data. HERA correctly assembles most of the previously unassembled regions, resulting in dramatically improved, highly contiguous genome assemblies with newly assembled gene sequences. For example, the maize contig N50 size reaches 61.2 Mb and the Tartary buckwheat genome comprises only 20 contigs. HERA can also be used to fill gaps and fix errors in reference genomes. The application of HERA will greatly improve the quality of new or existing assemblies of complex genomes.
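The contig N50 quoted above is a standard contiguity statistic: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length. A minimal Python sketch of that definition follows. The function name `n50` is an assumption for illustration, and the code is unrelated to HERA itself.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig lengths.

    Hypothetical helper: walks the contigs from longest to shortest and
    returns the length at which the running total first reaches half of
    the summed assembly length.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no contig lengths given")
```

For contigs of lengths 2, 2, 2, 3, 3, 4, 8 and 8 (total 32), the two longest contigs already sum to 16, so the N50 is 8. Because the statistic depends only on the length distribution, it says nothing about whether the contigs are correctly assembled.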


Subjects
Contig Mapping/methods ; Genome/genetics ; Genomics/methods ; High-Throughput Nucleotide Sequencing/methods ; Repetitive Sequences, Nucleic Acid/genetics ; Sequence Analysis, DNA/methods ; Chromosomes, Plant/genetics ; Genome, Plant/genetics ; Humans ; Oryza/genetics ; Reproducibility of Results ; Zea mays/genetics
20.
G3 (Bethesda) ; 9(11): 3547-3554, 2019 11 05.
Article in English | MEDLINE | ID: mdl-31540974

ABSTRACT

Although normally a harmless commensal, Candida albicans is also one of the most common causes of bloodstream infections in the U.S. Candida albicans has long been considered an obligate commensal; however, recent studies suggest it can live outside animal hosts. Here, we have generated PacBio sequences and phased genome assemblies for three C. albicans strains from oak trees (NCYC 4144, NCYC 4145, and NCYC 4146). PacBio datasets are high depth (over 400-fold coverage), and more than half of the sequencing data are contained in reads longer than 15 kb. Primary assemblies showed high contiguity, with several chromosomes for each strain recovered as single contigs, and more than half of the alternative haplotype sequence was assembled in haplotigs at least 174 kb long. Using these assemblies we were able to identify structural polymorphisms, including a polymorphic inversion over 100 kb in length. These results show that phased de novo diploid assemblies for C. albicans can enable the study of genomic variation within and among strains of an important fungal pathogen.


Subjects
Candida albicans/genetics ; Genome, Fungal ; Quercus/microbiology ; Candida albicans/isolation & purification ; Centromere/genetics ; Contig Mapping ; Diploidy ; Haplotypes ; Sequence Analysis, DNA ; Telomere/genetics