Results 1 - 20 of 1,801
1.
BMC Bioinformatics ; 25(1): 241, 2024 Jul 16.
Article in English | MEDLINE | ID: mdl-39014300

ABSTRACT

BACKGROUND: Using next-generation sequencing technologies, scientists can sequence complex microbial communities directly from the environment. Significant insights into the structure, diversity, and ecology of microbial communities have resulted from the study of metagenomics. The assembly of reads into longer contigs, which are then binned into groups of contigs that correspond to different species in the metagenomic sample, is a crucial step in the analysis of metagenomics. It is necessary to organize these contigs into operational taxonomic units (OTUs) for further taxonomic profiling and functional analysis. For binning, which is synonymous with the clustering of OTUs, the tetra-nucleotide frequency (TNF) is typically used as a compositional feature for each OTU. RESULTS: In this paper, we present AFIT, a new l-mer statistic vector for each contig, and AFITBin, a novel method for metagenomic binning based on AFIT and a matrix factorization method. To evaluate the performance of the AFIT vector, the t-SNE algorithm is used to compare species clustering based on AFIT and TNF information. In addition, the efficacy of AFITBin is demonstrated on both simulated and real datasets in comparison to state-of-the-art binning methods such as MetaBAT 2, MaxBin 2.0, CONCOCT, MetaCon, SolidBin, BusyBee Web, and MetaBinner. To further analyze the performance of the proposed AFIT vector, we compare the barcodes of the AFIT vector and the TNF vector. CONCLUSION: The results demonstrate that AFITBin shows superior performance in taxonomic identification compared to existing methods, leveraging the AFIT vector for improved results in metagenomic binning. This approach holds promise for advancing the analysis of metagenomic data, providing more reliable insights into microbial community composition and function. AVAILABILITY: A Python package is available at: https://github.com/SayehSobhani/AFITBin .


Subjects
Algorithms; Metagenomics; Metagenomics/methods; Nucleotides/genetics; High-Throughput Nucleotide Sequencing/methods; Software; Microbiota/genetics; Sequence Analysis, DNA/methods; Cluster Analysis; Contig Mapping/methods; Metagenome/genetics
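The AFITBin abstract above benchmarks its AFIT vector against TNF, the standard compositional feature for binning. AFIT itself is not defined in the abstract, so the following is only a minimal sketch of the TNF baseline. It assumes the common convention of merging each tetramer with its reverse complement (136 canonical tetramers); the function names are hypothetical.

```python
from itertools import product
from typing import Dict, List

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    # Swap A<->T and C<->G, then reverse
    return seq.translate(_COMPLEMENT)[::-1]


def kmer_index(k: int, canonical: bool) -> Dict[str, int]:
    # Map every k-mer over ACGT to a column of the feature vector.
    # With canonical=True, a k-mer and its reverse complement share one column.
    index: Dict[str, int] = {}
    columns: Dict[str, int] = {}
    for letters in product("ACGT", repeat=k):
        kmer = "".join(letters)
        key = min(kmer, reverse_complement(kmer)) if canonical else kmer
        if key not in columns:
            columns[key] = len(columns)
        index[kmer] = columns[key]
    return index


def tnf(contig: str, k: int = 4, canonical: bool = True) -> List[float]:
    # Count overlapping k-mers; windows containing N or other symbols are skipped
    index = kmer_index(k, canonical)
    counts = [0.0] * (max(index.values()) + 1)
    seq = contig.upper()
    for i in range(len(seq) - k + 1):
        column = index.get(seq[i:i + k])
        if column is not None:
            counts[column] += 1
    total = sum(counts)
    # Normalise to frequencies so contigs of different lengths are comparable
    return [c / total for c in counts] if total else counts
```

Because strand is arbitrary for assembled contigs, the canonical form makes a contig and its reverse complement map to the same vector, e.g. `tnf("AAAAAC") == tnf("GTTTTT")`.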
2.
BMC Genomics ; 24(1): 117, 2023 Mar 16.
Article in English | MEDLINE | ID: mdl-36927511

ABSTRACT

BACKGROUND: Generating the most contiguous, accurate genome assemblies given available sequencing technologies is a long-standing challenge in genome science. With the rise of long-read sequencing, assembly challenges have shifted from merely increasing contiguity to correctly assembling complex, repetitive regions of interest, ideally in a phased manner. At present, researchers largely choose between two types of long-read data: longer, but less accurate sequences, or highly accurate, but shorter reads (i.e., >Q20 or 99% accurate). To better understand how these types of long-read data as well as scale of data (i.e., mean length and sequencing depth) influence genome assembly outcomes, we compared genome assemblies for a caddisfly, Hesperophylax magnus, generated with longer, but less accurate, Oxford Nanopore (ONT) R9.4.1 and highly accurate PacBio HiFi (HiFi) data. Next, we expanded this comparison to consider the influence of highly accurate long-read sequence data on genome assemblies across 6750 plant and animal genomes. For this broader comparison, we used HiFi data as a surrogate for highly accurate long reads in general, because their use could be identified from GenBank metadata. RESULTS: HiFi reads outperformed ONT reads in all assembly metrics tested for the caddisfly data set and allowed for accurate assembly of the repetitive ~20 kb H-fibroin gene. Across plants and animals, genome assemblies that incorporated HiFi reads were also more contiguous. For plants, the average HiFi assembly was 501% more contiguous (mean contig N50 = 20.5 Mb) than those generated with any other long-read data (mean contig N50 = 4.1 Mb). For animals, HiFi assemblies were 226% more contiguous (mean contig N50 = 20.9 Mb) versus other long-read assemblies (mean contig N50 = 9.3 Mb). In plants, we also found limited evidence that HiFi may offer a unique solution for overcoming genomic complexity that scales with assembly size.
CONCLUSIONS: Highly accurate long-reads generated with HiFi or analogous technologies represent a key tool for maximizing genome assembly quality for a wide swath of plants and animals. This finding is particularly important when resources only allow for one type of sequencing data to be generated. Ultimately, to realize the promise of biodiversity genomics, we call for greater uptake of highly accurate long-reads in future studies.


Subjects
Biodiversity; Genomics; High-Throughput Nucleotide Sequencing; Sequence Analysis, DNA; Genomics/methods; Genomics/standards; Genomics/trends; Insects/classification; Insects/genetics; Fibroins/genetics; Contig Mapping; Genome, Insect/genetics; Animals; Databases, Nucleic Acid; Reproducibility of Results; Meta-Analysis as Topic; Datasets as Topic; Sequence Analysis, DNA/methods; Sequence Analysis, DNA/standards; High-Throughput Nucleotide Sequencing/methods; High-Throughput Nucleotide Sequencing/standards; High-Throughput Nucleotide Sequencing/trends; Plants/genetics; Genome, Plant/genetics
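The HiFi comparison above ranks assemblies by contig N50: the length L such that contigs of length L or longer contain at least half of the assembled bases. The "mean contig N50" values in the abstract are averages of this per-assembly statistic. A minimal sketch of computing it from a list of contig lengths (the function name is hypothetical):

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    # Sort contigs from longest to shortest and return the length at which
    # the running total first reaches half of the total assembly size.
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return 0  # empty assembly
```

For contigs of length 2 through 10 (54 bp in total), the running sum reaches 27 bp after the contigs of length 10, 9, and 8, so `n50([2, 3, 4, 5, 6, 7, 8, 9, 10])` returns 8.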
3.
J Glob Antimicrob Resist ; 30: 155-162, 2022 09.
Article in English | MEDLINE | ID: mdl-35671989

ABSTRACT

BACKGROUND: Colibacillosis, caused by avian pathogenic Escherichia coli (APEC), is one of the most significant infectious diseases affecting poultry worldwide. OBJECTIVES: This study aimed to determine the genomic diversity, virulence factor genes (VFGs), and antimicrobial resistance genes (ARGs) in the APEC MTR_BAU02 strain isolated from a layer chicken using whole-genome sequencing (WGS). METHODS: Paired-end (2 × 250) WGS was performed using an Illumina MiSeq sequencer (Illumina, San Diego, CA) and de novo assembly was performed using SPAdes. Core genome multilocus sequence typing (cgMLST) analysis between APEC MTR_BAU02 and all of the ST1196 E. coli strains retrieved from the National Center for Biotechnology Information (NCBI) GenBank database was performed using the BacWGSTdb 2.0 server. We utilized different databases to detect ARGs, VFGs, and genomic functional features of the APEC MTR_BAU02 strain. RESULTS: The complete genome of APEC MTR_BAU02 consists of 94 contigs comprising 4,924,680 bp (51.1% guanine-cytosine [GC] content), including 4681 protein-coding sequences, one chromosome, and one plasmid, and was assigned to ST1196. The closest relatives of APEC MTR_BAU02 were four isolates originating from human clinical specimens (diarrhetic stool) in Bangladesh and two clinical isolates originating from chicken in India, which differed by 694 cgMLST alleles. One hundred and twenty-two ARGs and 92 VFGs were identified in the APEC MTR_BAU02 genome. Metabolic functional annotations detected 380 SEED subsystems including genes coding for carbohydrate metabolism, protein metabolism, cofactors, vitamins, prosthetic groups and pigments, respiration, membrane transport, stress response, motility and chemotaxis, and virulence, disease, and defense. CONCLUSION: This study reports the genome sequence of a multidrug-resistant APEC strain isolated from layer birds in Bangladesh.
The ARGs and VFGs, widespread in APEC MTR_BAU02, are similar to those found in human isolates, and highlight the growing threat of antimicrobial resistance in both poultry and humans.


Subjects
Drug Resistance, Multiple, Bacterial; Escherichia coli Infections; Escherichia coli; Poultry Diseases; Animals; Bangladesh; Chickens; Contig Mapping; Escherichia coli/genetics; Escherichia coli/isolation & purification; Escherichia coli/pathogenicity; Escherichia coli Infections/veterinary; Farms; Genetic Variation; Genome, Bacterial; Genomics; Humans; Poultry Diseases/microbiology; Virulence/genetics; Virulence Factors/genetics
4.
Nucleic Acids Res ; 50(13): e76, 2022 07 22.
Article in English | MEDLINE | ID: mdl-35536293

ABSTRACT

As of today, the majority of environmental microorganisms remain uncultured and are therefore referred to as 'microbial dark matter' (MDM). Hence, genomic insights into these organisms are limited to cultivation-independent approaches such as single-cell- and metagenomics. However, without access to cultured representatives for verifying correct taxon-assignments, MDM genomes may cause potentially misleading conclusions based on misclassified or contaminant contigs, thereby obfuscating our view on the uncultured microbial majority. Moreover, gradual contamination of databases by past genome submissions can propagate errors that affect present as well as future comparative genome analyses. Consequently, strict contamination detection and filtering need to be applied, especially in the case of uncultured MDM genomes. Current genome reporting standards, however, emphasize completeness over purity and the de facto gold standard genome assessment tool, checkM, discriminates against uncultured taxa and fragmented genomes. To tackle these issues, we present a novel contig classification, screening, and filtering workflow and corresponding open-source python implementation called MDMcleaner, which was tested and compared to other tools on mock and real datasets. MDMcleaner revealed substantial contaminations overlooked by current screening approaches and sensitively detects misattributed contigs in both novel genomes and the underlying reference databases, thereby greatly improving our view on 'microbial dark matter'.


Subjects
Environmental Microbiology; Metagenomics; Software; Workflow; Contig Mapping; Datasets as Topic; Genome; Metagenome; Single-Cell Analysis/methods
5.
Bioinformatics ; 38(10): 2675-2682, 2022 05 13.
Article in English | MEDLINE | ID: mdl-35561180

ABSTRACT

MOTIVATION: Crucial to the correctness of a genome assembly is the accuracy of the underlying scaffolds that specify the orders and orientations of contigs together with the gap distances between contigs. The current methods construct scaffolds based on the alignments of 'linking' reads against contigs. We found that some 'optimal' alignments are mistaken due to factors such as the contig boundary effect, particularly in the presence of repeats. Occasionally, the incorrect alignments can even overwhelm the correct ones. Detecting such incorrect linking information is challenging for existing methods. RESULTS: In this study, we present a novel scaffolding method RegScaf. It first examines the distribution of distances between contigs from read alignment by the kernel density. When multiple modes are shown in a density, orientation-supported links are grouped into clusters, each of which defines a linking distance corresponding to a mode. The linear model parameterizes contigs by their positions on the genome; then each linking distance between a pair of contigs is taken as an observation on the difference of their positions. The parameters are estimated by minimizing a global loss function, which is a version of trimmed sum of squares. The least trimmed squares estimate has such a high breakdown value that it can automatically remove the mistaken linking distances. The results on both synthetic and real datasets demonstrate that RegScaf outperforms some popular scaffolders, especially in the accuracy of gap estimates by substantially reducing extremely abnormal errors. Its strength in resolving repeat regions is exemplified by a real case. Its adaptability to large genomes and TGS long reads is validated as well. AVAILABILITY AND IMPLEMENTATION: RegScaf is publicly available at https://github.com/lemontealala/RegScaf.git. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Algorithms; Software; Contig Mapping/methods; Genome; High-Throughput Nucleotide Sequencing/methods; Sequence Analysis, DNA/methods
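The RegScaf abstract above describes a linear model in which each linking distance d between contigs i and j is an observation of x_j - x_i, fitted by least trimmed squares so that mistaken links are discarded. The following is a minimal sketch of that idea, not RegScaf's implementation. Assumptions: contig 0 is anchored at position 0, the kernel-density mode splitting is omitted, and the trimmed fit uses a generic concentration-step loop (refit on the h best-fitting links until the kept set stops changing). All names and the toy data are hypothetical.

```python
from typing import List, Sequence, Tuple

# A link (i, j, d) says contig j starts about d bp after contig i: x_j - x_i ~ d
Link = Tuple[int, int, float]


def solve_linear(matrix: List[List[float]], rhs: List[float]) -> List[float]:
    # Gaussian elimination with partial pivoting for a small dense system
    n = len(rhs)
    aug = [row[:] + [rhs[r]] for r, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def fit_positions(n_contigs: int, links: Sequence[Link]) -> List[float]:
    # Ordinary least squares via the normal equations, with x_0 fixed at 0.
    # Assumes the links connect every contig to contig 0 (else singular).
    k = n_contigs - 1
    normal = [[0.0] * k for _ in range(k)]
    rhs = [0.0] * k
    for i, j, d in links:
        row = [0.0] * k
        if j > 0:
            row[j - 1] += 1.0
        if i > 0:
            row[i - 1] -= 1.0
        for p in range(k):
            rhs[p] += row[p] * d
            for q in range(k):
                normal[p][q] += row[p] * row[q]
    return [0.0] + solve_linear(normal, rhs)


def residual(positions: List[float], link: Link) -> float:
    i, j, d = link
    return positions[j] - positions[i] - d


def lts_positions(n_contigs: int, links: Sequence[Link], h: int, max_iter: int = 50):
    # Trimmed fit: keep the h links with the smallest squared residuals,
    # refit on them, and stop once the kept subset no longer changes.
    kept = list(range(len(links)))
    positions = fit_positions(n_contigs, links)
    for _ in range(max_iter):
        ranked = sorted(range(len(links)),
                        key=lambda idx: residual(positions, links[idx]) ** 2)
        new_kept = ranked[:h]
        if set(new_kept) == set(kept):
            break
        kept = new_kept
        positions = fit_positions(n_contigs, [links[idx] for idx in kept])
    return positions, [links[idx] for idx in kept]
```

With true positions 0, 100, 250, and 400 and one spurious link claiming contig 2 lies 900 bp before contig 1, the ordinary fit is pulled off target. Trimming the single worst link (h = 6 of 7) recovers the true positions exactly.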
6.
Genome Biol ; 23(1): 29, 2022 01 20.
Article in English | MEDLINE | ID: mdl-35057847

ABSTRACT

Haplotype-resolved de novo assembly of highly diverse virus genomes is critical in prevention, control and treatment of viral diseases. Current methods either handle only relatively accurate short-read data or collapse haplotype-specific variations into a consensus sequence. Here, we present Strainline, a novel approach to assemble viral haplotypes from noisy long reads without a reference genome. Strainline is the first approach to provide strain-resolved, full-length de novo assemblies of viral quasispecies from noisy third-generation sequencing data. Benchmarking on simulated and real datasets of varying complexity and diversity confirms this novelty and demonstrates the superiority of Strainline.


Subjects
Contig Mapping/methods; Genome, Viral; Haplotypes; SARS-CoV-2/genetics; Software; Benchmarking; COVID-19/virology; High-Throughput Nucleotide Sequencing; Humans; SARS-CoV-2/classification; Sequence Analysis, DNA
7.
Nat Commun ; 12(1): 6858, 2021 11 25.
Article in English | MEDLINE | ID: mdl-34824214

ABSTRACT

Muntjac deer have experienced drastic karyotype changes during their speciation, making them an ideal model for studying mechanisms and functional consequences of mammalian chromosome evolution. Here we generated chromosome-level genomes for Hydropotes inermis (2n = 70), Muntiacus reevesi (2n = 46), female and male M. crinifrons (2n = 8/9) and a contig-level genome for M. gongshanensis (2n = 8/9). These high-quality genomes combined with Hi-C data allowed us to reveal the evolution of 3D chromatin architectures during mammalian chromosome evolution. We find that the chromosome fusion events of muntjac species did not alter the A/B compartment structure and topologically associated domains near the fusion sites, but new chromatin interactions were gradually established across the fusion sites. The recently born neo-Y chromosome of M. crinifrons, which underwent male-specific inversions, has dramatically restructured chromatin compartments, recapitulating the early evolution of canonical mammalian Y chromosomes. We also reveal that a complex structure containing unique centromeric satellite, truncated telomeric and palindrome repeats might have mediated muntjacs' recurrent chromosome fusions. These results provide insights into the recurrent chromosome tandem fusion in muntjacs, early evolution of mammalian sex chromosomes, and reveal how chromosome rearrangements can reshape the 3D chromatin regulatory conformations during species evolution.


Subjects
Chromosome Aberrations/veterinary; Chromosomes, Mammalian/genetics; Muntjacs/genetics; Animals; Chromatin/genetics; Chromosome Aberrations/statistics & numerical data; Contig Mapping; Deer/classification; Deer/genetics; Demography; Evolution, Molecular; Female; Genome/genetics; Male; Muntjacs/classification; Phylogeny; Sex Chromosomes/genetics; Synteny
8.
PLoS Biol ; 19(10): e3001428, 2021 10.
Article in English | MEDLINE | ID: mdl-34644300

ABSTRACT

To overcome CRISPR-Cas defense systems, many phages and mobile genetic elements (MGEs) encode CRISPR-Cas inhibitors called anti-CRISPRs (Acrs). Nearly all characterized Acrs directly bind Cas proteins to inactivate CRISPR immunity. Here, using functional metagenomic selection, we describe AcrIIA22, an unconventional Acr found in hypervariable genomic regions of clostridial bacteria and their prophages from human gut microbiomes. AcrIIA22 does not bind strongly to SpyCas9 but nonetheless potently inhibits its activity against plasmids. To gain insight into its mechanism, we obtained an X-ray crystal structure of AcrIIA22, which revealed homology to PC4-like nucleic acid-binding proteins. Based on mutational analyses and functional assays, we deduced that acrIIA22 encodes a DNA nickase that relieves torsional stress in supercoiled plasmids. This may render them less susceptible to SpyCas9, which uses free energy from negative supercoils to form stable R-loops. Modifying DNA topology may provide an additional route to CRISPR-Cas resistance in phages and MGEs.


Subjects
Bacterial Proteins/metabolism; CRISPR-Associated Protein 9/metabolism; DNA/metabolism; Bacterial Proteins/chemistry; Contig Mapping; DNA, Superhelical/metabolism; Genome, Bacterial; Metagenomics; Plasmids; Prophages/genetics; Protein Multimerization
9.
BMC Bioinformatics ; 22(1): 533, 2021 Oct 30.
Article in English | MEDLINE | ID: mdl-34717539

ABSTRACT

BACKGROUND: Optical maps record locations of specific enzyme recognition sites within long genome fragments. This long-distance information enables aligning genome assembly contigs onto optical maps and ordering contigs into scaffolds. The generated scaffolds, however, often contain a large number of gaps. To fill these gaps, a feasible way is to search the genome assembly graph for the best-matching contig paths that connect the boundary contigs of gaps. The combination of searching and evaluation procedures might be "searching followed by evaluation", which is infeasible for long gaps, or "searching by evaluation", which heavily relies on heuristics and thus usually yields unreliable contig paths. RESULTS: We here report an accurate and efficient approach to filling gaps of genome scaffolds with the aid of optical maps. Using simulated data from 12 species and real data from 3 species, we demonstrate the successful application of our approach in gap filling with improved accuracy and completeness of genome scaffolds. CONCLUSION: Our approach applies a sequential Bayesian updating technique to measure the similarity between optical maps and candidate contig paths. Using this similarity to guide path searching, our approach achieves higher accuracy than the existing "searching by evaluation" strategy that relies on heuristics. Furthermore, unlike the "searching followed by evaluation" strategy enumerating all possible paths, our approach prunes the unlikely sub-paths and extends the highly-probable ones only, thus significantly increasing searching efficiency.


Subjects
Algorithms; Genome; Bayes Theorem; Contig Mapping; Restriction Mapping; Sequence Analysis, DNA
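The optical-map gap-filling abstract above combines sequential Bayesian updating with pruning of unlikely sub-paths. The sketch below illustrates that combination as a beam search; it is not the authors' algorithm. Assumptions, all hypothetical: each contig carries in-silico restriction fragment lengths, fragments do not merge at contig junctions, sizing error is Gaussian with a 5% relative standard deviation, and the prior over paths is uniform, so each extension simply adds the log-likelihood of the newly explained fragments.

```python
import heapq
import math
from typing import Dict, List, Optional


def log_likelihood(predicted: float, observed: float, rel_sd: float = 0.05) -> float:
    # Gaussian sizing-error model; the sd scales with fragment length (assumed 5%)
    sd = rel_sd * observed
    return -0.5 * ((predicted - observed) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))


def fill_gap(graph: Dict[str, List[str]], fragments: Dict[str, List[float]],
             start: str, end: str, map_fragments: List[float],
             beam_width: int = 5, max_steps: int = 20) -> Optional[List[str]]:
    # Beam entry: (log-posterior, contig path, optical-map fragments explained so far)
    beams = [(0.0, [start], 0)]
    finished = []
    for _ in range(max_steps):
        candidates = []
        for score, path, used in beams:
            for nxt in graph.get(path[-1], []):
                if nxt == end:
                    if used == len(map_fragments):  # path explains the whole gap
                        finished.append((score, path + [nxt]))
                    continue
                predicted = fragments[nxt]
                if used + len(predicted) > len(map_fragments):
                    continue  # prune: more fragments than the optical map shows
                # Sequential update: add evidence from the newly aligned fragments
                update = sum(log_likelihood(p, o)
                             for p, o in zip(predicted, map_fragments[used:]))
                candidates.append((score + update, path + [nxt], used + len(predicted)))
        if not candidates:
            break
        # Keep only the most probable partial paths
        beams = heapq.nlargest(beam_width, candidates, key=lambda c: c[0])
    return max(finished, key=lambda f: f[0])[1] if finished else None
```

Pruning keeps the search cost bounded by the beam width rather than by the number of possible paths, which mirrors the efficiency argument in the abstract; the beam width trades speed against the risk of discarding the true path early.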
10.
Sci Rep ; 11(1): 15592, 2021 08 02.
Article in English | MEDLINE | ID: mdl-34341414

ABSTRACT

A near-complete diploid nuclear genome and accompanying circular mitochondrial and chloroplast genomes have been assembled from the elite commercial diatom species Nitzschia inconspicua. The 50 Mbp haploid size of the nuclear genome is nearly double that of model diatom Phaeodactylum tricornutum, but 30% smaller than closer relative Fragilariopsis cylindrus. Diploid assembly, which was facilitated by low levels of allelic heterozygosity (2.7%), included 14 candidate chromosome pairs composed of long, syntenic contigs, covering 93% of the total assembly. Telomeric ends were capped with an unusual 12-mer, G-rich, degenerate repeat sequence. Predicted proteins were highly enriched in strain-specific marker domains associated with cell-surface adhesion, biofilm formation, and raphe system gliding motility. Expanded species-specific families of carbonic anhydrases suggest potential enhancement of carbon concentration efficiency, and duplicated glycolysis and fatty acid synthesis pathways across cytosolic and organellar compartments may enhance peak metabolic output, contributing to competitive success over other organisms in mixed cultures. The N. inconspicua genome delivers a robust new reference for future functional and transcriptomic studies to illuminate the physiology of benthic pennate diatoms and harness their unique adaptations to support commercial algae biomass and bioproduct production.


Subjects
Biomass; Diatoms/genetics; Diploidy; Genome; Carbonic Anhydrases/genetics; Contig Mapping; Diatoms/classification; Genome Size; Genome, Chloroplast; Genome, Mitochondrial; Open Reading Frames/genetics; Phylogeny; Repetitive Sequences, Nucleic Acid/genetics; Sequence Analysis, DNA; Synteny/genetics
11.
Nucleic Acids Res ; 49(20): e117, 2021 11 18.
Article in English | MEDLINE | ID: mdl-34417615

ABSTRACT

Scaffolding, i.e. ordering and orienting contigs, is an important step in genome assembly. We present a method for scaffolding using second generation sequencing reads based on likelihoods of genome assemblies. A generative model for sequencing is used to obtain maximum likelihood estimates of gaps between contigs and to estimate whether linking contigs into scaffolds would lead to an increase in the likelihood of the assembly. We then link contigs if they can be unambiguously joined or if the corresponding increase in likelihood is substantially greater than that of other possible joins of those contigs. The method is implemented in a tool called Swalo with approximations to make it efficient and applicable to large datasets. Analysis on real and simulated datasets reveals that it consistently makes a similar or greater number of correct joins than other scaffolders while linking very few contigs incorrectly, thus outperforming other scaffolders and demonstrating that substantial improvement in genome assembly may be achieved through the use of statistical models. Swalo is freely available for download at https://atifrahman.github.io/SWALO/.


Subjects
Contig Mapping/methods; Sequence Analysis, DNA/methods; Software; Bacteria; Humans; Likelihood Functions
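The Swalo abstract above obtains maximum likelihood estimates of gaps between contigs from a generative sequencing model. Swalo's model is richer than what fits here, so the following is only the simplest special case, under stated assumptions: insert sizes are normally distributed with a known mean, and truncation at contig ends and mapping uncertainty are ignored. Each spanning read pair then satisfies insert = a + gap + b, and the Gaussian log-likelihood is maximised by a closed-form estimate. Function and parameter names are hypothetical.

```python
import math
from statistics import mean
from typing import List, Tuple

# (a, b) for one read pair spanning the gap:
#   a = bases from the left mate's start to the end of the left contig
#   b = bases from the start of the right contig to the right mate's end
Span = Tuple[int, int]


def log_likelihood(gap: float, spans: List[Span],
                   insert_mean: float, insert_sd: float) -> float:
    # Sum of Normal(insert_mean, insert_sd^2) log-densities of the implied inserts
    return sum(-0.5 * ((a + gap + b - insert_mean) / insert_sd) ** 2
               - math.log(insert_sd * math.sqrt(2 * math.pi))
               for a, b in spans)


def gap_mle(spans: List[Span], insert_mean: float) -> float:
    # Setting the derivative of the log-likelihood to zero gives
    # gap = insert_mean - mean(a + b); a negative value suggests overlapping contigs
    return insert_mean - mean(a + b for a, b in spans)
```

For three pairs with a + b equal to 300, 320, and 280 bp and a 500 bp mean insert, the estimate is 500 - 300 = 200 bp, and the log-likelihood at 200 exceeds that at 190 or 210.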
12.
Theor Appl Genet ; 134(11): 3577-3594, 2021 Nov.
Article in English | MEDLINE | ID: mdl-34365519

ABSTRACT

KEY MESSAGE: We propose to use the natural variation between individuals of a population for genome assembly scaffolding. In today's genome projects, multiple accessions get sequenced, leading to variant catalogs. Using such information to improve genome assemblies is attractive both cost-wise and scientifically, because the value of an assembly increases with its contiguity. We conclude that haplotype information is a valuable resource to group and order contigs toward the generation of pseudomolecules. Quinoa (Chenopodium quinoa) has been under cultivation in Latin America for more than 7500 years. Recently, quinoa has gained increasing attention due to its stress resistance and its nutritional value. We generated a novel quinoa genome assembly for the Bolivian accession CHEN125 using PacBio long-read sequencing data (assembly size 1.32 Gbp, initial N50 size 608 kbp). Next, we re-sequenced 50 quinoa accessions from Peru and Bolivia. This set of accessions differed at 4.4 million single-nucleotide variant (SNV) positions compared to CHEN125 (1.4 million SNV positions on average per accession). We show how to exploit variation in accessions that are distantly related to establish a genome-wide ordered set of contigs for guided scaffolding of a reference assembly. The method is based on detecting shared haplotypes and their expected continuity throughout the genome (i.e., the effect of linkage disequilibrium), as an extension of what is expected in mapping populations where only a few haplotypes are present. We test the approach using Arabidopsis thaliana data from different populations. After applying the method on our CHEN125 quinoa assembly we validated the results with mate-pairs, genetic markers, and another quinoa assembly originating from a Chilean cultivar.
We show consistency between these information sources and the haplotype-based relations as determined by us and obtain an improved assembly with an N50 size of 1079 kbp and ordered contig groups of up to 39.7 Mbp. We conclude that haplotype information in distantly related individuals of the same species is a valuable resource to group and order contigs according to their adjacency in the genome toward the generation of pseudomolecules.


Subjects
Chenopodium quinoa/genetics; Genetic Variation; Genome, Plant; Arabidopsis/genetics; Bolivia; Chile; Contig Mapping; Genetic Markers; Genetics, Population; Haplotypes; Peru
13.
PLoS Genet ; 17(8): e1009705, 2021 08.
Article in English | MEDLINE | ID: mdl-34437539

ABSTRACT

Whole-genome duplication and genome compaction are thought to have played important roles in teleost fish evolution. Ayu (or sweetfish), Plecoglossus altivelis, belongs to the superorder Stomiati, order Osmeriformes. Stomiati is phylogenetically classified as the sister taxon of Neoteleostei. Thus, ayu holds an important position in the fish tree of life. Although ayu is economically important for the food industry and recreational fishing in Japan, few genomic resources are available for this species. To address this problem, we produced a draft genome sequence of ayu by whole-genome shotgun sequencing and constructed linkage maps using a genotyping-by-sequencing approach. Syntenic analyses of ayu and other teleost fish provided information about chromosomal rearrangements during the divergence of Stomiati, Protacanthopterygii and Neoteleostei. The size of the ayu genome indicates that genome compaction occurred after the divergence of the family Osmeridae. Ayu has an XX/XY sex-determination system for which we identified sex-associated loci by a genome-wide association study by genotyping-by-sequencing and whole-genome resequencing using wild populations. Genome-wide association mapping using wild ayu populations revealed three sex-linked scaffolds (total, 2.03 Mb). Comparison of whole-genome resequencing mapping coverage between males and females identified male-specific regions in sex-linked scaffolds. A duplicate copy of the anti-Müllerian hormone type-II receptor gene (amhr2bY) was found within these male-specific regions, distinct from the autosomal copy of amhr2. Expression of the Y-linked amhr2 gene was male-specific in sox9b-positive somatic cells surrounding germ cells in undifferentiated gonads, whereas autosomal amhr2 transcripts were detected in somatic cells in sexually undifferentiated gonads of both genetic males and females. Loss-of-function mutation for amhr2bY induced male to female sex reversal.
Taken together with the known role of Amh and Amhr2 in sex differentiation, these results indicate that the paralog of amhr2 on the ayu Y chromosome determines genetic sex, and the male-specific amh-amhr2 pathway is critical for testicular differentiation in ayu.


Subjects
Contig Mapping/methods; Osmeriformes/genetics; Receptors, Peptide/genetics; Receptors, Transforming Growth Factor beta/genetics; Whole Genome Sequencing/methods; Animals; Female; Fish Proteins/genetics; Loss of Function Mutation; Male; Sex Characteristics; Synteny
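The ayu abstract above locates male-specific regions by comparing resequencing coverage between males and females. A minimal sketch of that comparison, assuming per-base depth arrays of equal length for pooled males and pooled females that are already normalised for sequencing effort; the window size and thresholds are hypothetical and would need tuning to real data.

```python
from typing import List, Sequence


def window_means(depth: Sequence[float], window: int) -> List[float]:
    # Mean depth in consecutive non-overlapping windows (the last may be shorter)
    return [sum(depth[s:s + window]) / len(depth[s:s + window])
            for s in range(0, len(depth), window)]


def male_specific_windows(male_depth: Sequence[float], female_depth: Sequence[float],
                          window: int = 1000, min_male: float = 5.0,
                          max_female: float = 0.5) -> List[int]:
    # Report start coordinates of windows covered in males but (nearly) absent in females,
    # the pattern expected for Y-specific sequence in an XX/XY system
    male = window_means(male_depth, window)
    female = window_means(female_depth, window)
    return [idx * window
            for idx, (m, f) in enumerate(zip(male, female))
            if m >= min_male and f <= max_female]
```

Under XX/XY sex determination, Y-specific sequence is present only in males, so those regions show normal male depth and near-zero female depth; shared X/Y or autosomal sequence is covered in both.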
14.
Sci Rep ; 11(1): 14944, 2021 07 22.
Article in English | MEDLINE | ID: mdl-34294764

ABSTRACT

Picrorhiza kurrooa is an endangered medicinal herb which is distributed across the Himalayan region at an altitude between 3000-5000 m above mean sea level. The medicinal properties of P. kurrooa are attributed to monoterpenoid picrosides present in leaf, rhizome and root of the plant. However, no genomic information is currently available for P. kurrooa, which limits our understanding of its molecular systems and associated responses. The present study brings the first assembled draft genome of P. kurrooa by using 227 Gb of raw data generated by Illumina and PacBio RS II sequencing platforms. The assembled genome has a size of n = ~1.7 Gb with 12,924 scaffolds. Four-pronged assembly quality validation studies, including experimentally reported ESTs mapping and directed sequencing of the assembled contigs, confirmed high reliability of the assembly. About 76% of the genome is covered by complex repeats alone. Annotation revealed 24,798 protein coding and 9789 non-coding genes. Using the assembled genome, a total of 710 miRNAs were discovered, many of which were found responsible for the molecular response to temperature changes. The miRNAs and targets were validated experimentally. The availability of draft genome sequence will aid in genetic improvement and conservation of P. kurrooa. Also, this study provided an efficient approach for assembling complex genomes while dealing with repeats when regular assemblers failed to progress due to repeats.


Subjects
Contig Mapping/methods; Genome, Plant; Picrorhiza/genetics; Sequence Analysis, DNA/methods; Endangered Species; Genome Size; High-Throughput Nucleotide Sequencing; Plants, Medicinal/genetics
16.
Genome Biol ; 22(1): 202, 2021 07 12.
Article in English | MEDLINE | ID: mdl-34253237

ABSTRACT

GRIDSS2 is the first structural variant caller to explicitly report single breakends: breakpoints in which only one side can be unambiguously determined. By treating single breakends as a fundamental genomic rearrangement signal on par with breakpoints, GRIDSS2 can explain 47% of somatic centromere copy number changes using single breakends to non-centromere sequence. On a cohort of 3782 deeply sequenced metastatic cancers, GRIDSS2 achieves an unprecedented 3.1% false negative rate and 3.3% false discovery rate and identifies a novel 32-100 bp duplication signature. GRIDSS2 simplifies complex rearrangement interpretation through phasing of structural variants with 16% of somatic calls phasable using paired-end sequencing.


Subjects
Chromosome Breakpoints; DNA Copy Number Variations; Neoplasms/genetics; Software; Contig Mapping; Databases, Genetic; Datasets as Topic; Genome, Human; Genomics; Humans; Neoplasm Metastasis; Neoplasms/pathology
17.
Genome Biol ; 22(1): 214, 2021 07 26.
Article in English | MEDLINE | ID: mdl-34311761

ABSTRACT

We introduce STrain Resolution ON assembly Graphs (STRONG), which identifies strains de novo, from multiple metagenome samples. STRONG performs coassembly and binning into metagenome assembled genomes (MAGs), and stores the coassembly graph prior to variant simplification. This enables the subgraphs and their unitig per-sample coverages, for individual single-copy core genes (SCGs) in each MAG, to be extracted. A Bayesian algorithm, BayesPaths, determines the number of strains present, their haplotypes or sequences on the SCGs, and abundances. STRONG is validated using synthetic communities and for a real anaerobic digestor time series generates haplotypes that match those observed from long Nanopore reads.


Subjects
Algorithms; Genome, Bacterial; Metagenome; Microbial Consortia/genetics; Software; Bayes Theorem; Contig Mapping; Haplotypes; Metagenomics/methods; Sequence Analysis, DNA
18.
Sci Rep ; 11(1): 15345, 2021 07 28.
Article in English | MEDLINE | ID: mdl-34321531

ABSTRACT

The Eurasian plant Stipa capillata is the most widespread species within feather grasses. Many taxa of the genus are dominants in steppe plant communities and can be used for their classification and in studies related to climate change. Moreover, some species are of economic importance mainly as fodder plants and can be used for soil remediation processes. Although large-scale molecular data has begun to appear, there is still no complete or draft genome for any Stipa species. Thus, here we present a single-molecule long-read sequencing dataset generated using the Pacific Biosciences Sequel System. A draft genome of about 1004 Mb was obtained with a contig N50 length of 351 kb. Importantly, here we report 81,224 annotated protein-coding genes, present 77,614 perfect and 58 unique imperfect SSRs, reveal the putative allopolyploid nature of S. capillata, investigate the evolutionary history of the genus, demonstrate structural heteroplasmy of the chloroplast genome and announce for the first time the mitochondrial genome in Stipa. The assembled nuclear, mitochondrial and chloroplast genomes provide a significant source of genetic data for further works on phylogeny, hybridisation and population studies within Stipa and the grass family Poaceae.


Subjects
Genome, Chloroplast , Genome, Mitochondrial , Genome, Plant , Plant Proteins/genetics , Poaceae/genetics , Contig Mapping , Europe , Genome Size , Heteroplasmy , Microsatellite Repeats , Phylogeny , Plant Breeding/methods , Plant Proteins/classification , Ploidies , Poaceae/classification
19.
PLoS Comput Biol ; 17(6): e1009078, 2021 06.
Article in English | MEDLINE | ID: mdl-34153026

ABSTRACT

It is computationally challenging to detect variation by aligning single-molecule sequencing (SMS) reads, or contigs from SMS assemblies. One approach to efficiently align SMS reads is sparse dynamic programming (SDP), where optimal chains of exact matches are found between the sequence and the genome. While straightforward implementations of SDP penalize gaps with a cost that is a linear function of gap length, biological variation is more accurately represented when gap cost is a concave function of gap length. We have developed a method, lra, that uses SDP with a concave-cost gap penalty, and used lra to align long-read sequences from PacBio and Oxford Nanopore (ONT) instruments as well as de novo assembly contigs. This alignment approach increases sensitivity and specificity for SV discovery, particularly for variants above 1 kb and when discovering variation from ONT reads, while having runtimes comparable (1.05-3.76×) to current methods. When applied to calling variation from de novo assembly contigs, there is a 3.2% increase in Truvari F1 score compared to minimap2+htsbox. lra is available in bioconda (https://anaconda.org/bioconda/lra) and github (https://github.com/ChaissonLab/LRA).
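Why a concave gap cost matters for long indels can be seen on a toy chaining problem. The sketch below is an assumption-laden illustration, not lra's implementation: it evaluates the co-linear chaining recurrence over exact-match anchors naively in O(n²) (lra uses a faster SDP formulation not reproduced here), and the cost functions `linear_gap` and `concave_gap` with their constants (5.0, 2.0, a log term) are hypothetical choices. Anchors are `(query_start, ref_start, length)` tuples, and the gap between two anchors is taken as the shift between their diagonals, i.e. the implied indel length.

```python
import math
from typing import Callable, List, Tuple

Anchor = Tuple[int, int, int]  # (query_start, ref_start, length) of an exact match


def linear_gap(g: int) -> float:
    # Hypothetical linear cost: grows in proportion to indel length.
    return 1.0 * g


def concave_gap(g: int) -> float:
    # Hypothetical concave cost: an opening penalty plus a slowly growing log term.
    return 0.0 if g == 0 else 5.0 + 2.0 * math.log(g)


def chain(anchors: List[Anchor], gap_cost: Callable[[int], float]) -> Tuple[float, List[Anchor]]:
    """Best co-linear chain of anchors under gap_cost (naive O(n^2) evaluation)."""
    if not anchors:
        return 0.0, []
    anchors = sorted(anchors)
    n = len(anchors)
    score = [float(a[2]) for a in anchors]  # every anchor can start its own chain
    prev = [-1] * n
    for i, (qi, ri, li) in enumerate(anchors):
        for j in range(i):
            qj, rj, lj = anchors[j]
            # Predecessor must end before anchor i starts on both sequences.
            if qj + lj > qi or rj + lj > ri:
                continue
            # Indel length = shift between the two anchors' diagonals.
            gap = abs((ri - rj) - (qi - qj))
            candidate = score[j] + li - gap_cost(gap)
            if candidate > score[i]:
                score[i] = candidate
                prev[i] = j
    best = max(range(n), key=score.__getitem__)
    path = []
    k = best
    while k != -1:  # backtrack through predecessors
        path.append(anchors[k])
        k = prev[k]
    return score[best], path[::-1]


# Hypothetical toy example: two 100-bp matches separated by a 1 kb deletion in the read.
anchors = [(0, 0, 100), (100, 1100, 100)]
linear_score, linear_path = chain(anchors, linear_gap)
concave_score, concave_path = chain(anchors, concave_gap)
print(linear_score, len(linear_path))
print(round(concave_score, 1), len(concave_path))
```

Under these assumptions the linear cost charges 1000 for the deletion, so the best chain keeps only one anchor (score 100), while the concave cost charges about 18.8 and chains both anchors across the deletion (score about 181.2). This mirrors the abstract's point that concave penalties help recover variants above 1 kb.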


Subjects
Contig Mapping/statistics & numerical data , Sequence Alignment/statistics & numerical data , Software , Cluster Analysis , Computational Biology , Computer Simulation , Databases, Nucleic Acid/statistics & numerical data , Genetic Variation , Genome, Human , High-Throughput Nucleotide Sequencing , Humans , Programming, Linear , Sequence Analysis, DNA
20.
Genes (Basel) ; 12(5)2021 05 06.
Article in English | MEDLINE | ID: mdl-34066304

ABSTRACT

Trachidermus fasciatus is a roughskin sculpin fish widespread across the coastal areas of East Asia. Due to environmental destruction and overfishing, the population of this species is under threat. To protect this endangered species, it is important to have its genome sequenced. Reference genomes are essential for studying population genetics, domestic farming, and genetic resource protection. However, no reference genome is currently available for Trachidermus fasciatus, and this has greatly hindered research on this species. In this study, we integrated nanopore long-read sequencing, Illumina short-read sequencing, and Hi-C methods to thoroughly assemble the Trachidermus fasciatus genome. Our results provided a chromosome-level high-quality genome assembly with a predicted genome size of 542.6 Mbp (2n = 40) and a scaffold N50 of 24.9 Mbp. The BUSCO value for genome assembly completeness was higher than 96%, and the single-base accuracy was 99.997%. Based on EVM-StringTie genome annotation, a total of 19,147 protein-coding genes were identified, including 35,093 mRNA transcripts. In addition, a novel gene-finding strategy named RNR was introduced, identifying in total 51 novel genes (82 novel transcripts). Lastly, we present here the first reference genome for Trachidermus fasciatus; this sequence is expected to greatly facilitate future research on this species.


Subjects
Fishes/genetics , Genome , Animals , Contig Mapping , Fish Proteins/genetics , Nanopore Sequencing , RNA, Messenger/genetics , Whole Genome Sequencing