Search | BVS Regional Portal

1.

GAEP: a comprehensive genome assembly evaluating pipeline.

Zhang, Yong; Lu, Hong-Wei; Ruan, Jue.

J Genet Genomics ; 2023 May 26.

Article in English | MEDLINE | ID: mdl-37245652

ABSTRACT

With the rapid development of sequencing technologies, especially the maturity of third-generation sequencing technologies, there has been a significant increase in the number and quality of published genome assemblies. The emergence of these high-quality genomes has raised higher requirements for genome evaluation. Although numerous computational methods have been developed to evaluate assembly quality from various perspectives, the selective use of these evaluation methods can be arbitrary and inconvenient for fairly comparing the assembly quality. To address this issue, we have developed the Genome Assembly Evaluating Pipeline (GAEP), which provides a comprehensive assessment pipeline for evaluating genome quality from multiple perspectives, including continuity, completeness, and correctness. Additionally, GAEP includes new functions for detecting misassemblies and evaluating the assembly redundancy, which perform well in our testing. GAEP is publicly available at https://github.com/zy-optimistic/GAEP under the GPL3.0 License. With GAEP, users can quickly obtain accurate and reliable evaluation results, facilitating the comparison and selection of high-quality genome assemblies.

2.

Recognition of the CCT5 di-Glu degron by CRL4^DCAF12 is dependent on TRiC assembly.

Pla-Prats, Carlos; Cavadini, Simone; Kempf, Georg; Thomä, Nicolas H.

EMBO J ; 42(4): e112253, 2023 02 15.

Article in English | MEDLINE | ID: mdl-36715408

ABSTRACT

Assembly Quality Control (AQC) E3 ubiquitin ligases target incomplete or incorrectly assembled protein complexes for degradation. The CUL4-RBX1-DDB1-DCAF12 (CRL4^DCAF12) E3 ligase preferentially ubiquitinates proteins that carry a C-terminal double glutamate (di-Glu) motif. Reported CRL4^DCAF12 di-Glu-containing substrates include CCT5, a subunit of the TRiC chaperonin. How DCAF12 engages its substrates and the functional relationship between CRL4^DCAF12 and CCT5/TRiC are currently unknown. Here, we present the cryo-EM structure of the DDB1-DCAF12-CCT5 complex at 2.8 Å resolution. DCAF12 serves as a canonical WD40 DCAF substrate receptor and uses a positively charged pocket at the center of the β-propeller to bind the C-terminus of CCT5. DCAF12 specifically reads out the CCT5 di-Glu side chains, and contacts other visible degron amino acids through van der Waals interactions. The CCT5 C-terminus is inaccessible in an assembled TRiC complex, and functional assays demonstrate that DCAF12 binds and ubiquitinates monomeric CCT5, but not CCT5 assembled into TRiC. Our biochemical and structural results suggest a previously unknown role for the CRL4^DCAF12 E3 ligase in overseeing the assembly of a key cellular complex.

Subjects

Carrier Proteins, Ubiquitin-Protein Ligases, Carrier Proteins/metabolism, Ubiquitin-Protein Ligases/metabolism, Chaperonin Containing TCP-1/metabolism

3.

BioKIT: a versatile toolkit for processing and analyzing diverse types of sequence data.

Steenwyk, Jacob L; Buida, Thomas J; Gonçalves, Carla; Goltz, Dayna C; Morales, Grace; Mead, Matthew E; LaBella, Abigail L; Chavez, Christina M; Schmitz, Jonathan E; Hadjifrangiskou, Maria; Li, Yuanning; Rokas, Antonis.

Genetics ; 221(3)2022 07 04.

Article in English | MEDLINE | ID: mdl-35536198

ABSTRACT

Bioinformatic analysis (such as genome assembly quality assessment, alignment summary statistics, relative synonymous codon usage, file format conversion, and processing and analysis) is integrated into diverse disciplines in the biological sciences. Several command-line pieces of software have been developed to conduct some of these individual analyses, but unified toolkits that conduct all these analyses are lacking. To address this gap, we introduce BioKIT, a versatile command line toolkit that has, upon publication, 42 functions, several of which were community-sourced, that conduct routine and novel processing and analysis of genome assemblies, multiple sequence alignments, coding sequences, sequencing data, and more. To demonstrate the utility of BioKIT, we conducted a comprehensive examination of relative synonymous codon usage across 171 fungal genomes that use alternative genetic codes, showed that the novel metric of gene-wise relative synonymous codon usage can accurately estimate gene-wise codon optimization, evaluated the quality and characteristics of 901 eukaryotic genome assemblies, and calculated alignment summary statistics for 10 phylogenomic data matrices. BioKIT will be helpful in facilitating and streamlining sequence analysis workflows. BioKIT is freely available under the MIT license from GitHub (https://github.com/JLSteenwyk/BioKIT), PyPI (https://pypi.org/project/jlsteenwyk-biokit/), and the Anaconda Cloud (https://anaconda.org/jlsteenwyk/jlsteenwyk-biokit). Documentation, user tutorials, and instructions for requesting new features are available online (https://jlsteenwyk.com/BioKIT).
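For readers unfamiliar with the metric named in this abstract, relative synonymous codon usage (RSCU) is commonly defined as a codon's observed count divided by the mean count of all codons that encode the same amino acid. The following is a minimal sketch of that textbook definition, not BioKIT's code: the function name `rscu` is hypothetical, and the sketch assumes the standard genetic code, whereas the study itself examined genomes that use alternative genetic codes.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1), codons in TCAG order.
# Assumption: the sequence uses the standard code; '*' marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def rscu(coding_sequence):
    """Return {codon: RSCU} for every amino acid observed in the sequence.

    RSCU = observed codon count / mean count across its synonymous codons.
    Stop codons and amino acids that never occur are omitted.
    """
    seq = coding_sequence.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    codons = (seq[i:i + 3] for i in range(0, usable, 3))
    counts = Counter(c for c in codons if c in CODON_TABLE)

    # Group codons into synonymous families, skipping stop codons.
    families = {}
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families.setdefault(aa, []).append(codon)

    values = {}
    for family in families.values():
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid absent: RSCU is undefined
        mean = total / len(family)
        for codon in family:
            values[codon] = counts[codon] / mean
    return values


if __name__ == "__main__":
    # Four alanine codons: GCT twice, GCC and GCA once, GCG never.
    for codon, value in sorted(rscu("GCTGCTGCCGCA").items()):
        print(codon, round(value, 2))
```

A value above 1 indicates a codon used more often than expected under uniform synonymous usage, and a value below 1 indicates one used less often.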

Subjects

Computational Biology, Software, Codon, Sequence Alignment, DNA Sequence Analysis

4.

EvalDNA: a machine learning-based tool for the comprehensive evaluation of mammalian genome assembly quality.

MacDonald, Madolyn L; Lee, Kelvin H.

BMC Bioinformatics ; 22(1): 570, 2021 Nov 27.

Article in English | MEDLINE | ID: mdl-34837948

ABSTRACT

BACKGROUND: To select the most complete, continuous, and accurate assembly for an organism of interest, comprehensive quality assessment of assemblies is necessary. We present a novel tool, called Evaluation of De Novo Assemblies (EvalDNA), which uses supervised machine learning for the quality scoring of genome assemblies and does not require an existing reference genome for accuracy assessment. RESULTS: EvalDNA calculates a list of quality metrics from an assembled sequence and applies a model created from supervised machine learning methods to integrate various metrics into a comprehensive quality score. A well-tested, accurate model for scoring mammalian genome sequences is provided as part of EvalDNA. This random forest regression model evaluates an assembled sequence based on continuity, completeness, and accuracy, and was able to explain 86% of the variation in reference-based quality scores within the testing data. EvalDNA was applied to human chromosome 14 assemblies from the GAGE study to rank genome assemblers and to compare EvalDNA to two other quality evaluation tools. In addition, EvalDNA was used to evaluate several genome assemblies of the Chinese hamster genome to help establish a better reference genome for the biopharmaceutical manufacturing community. EvalDNA was also used to assess more recent human assemblies from the QUAST-LG study completed in 2018, and its ability to score bacterial genomes was examined through application on bacterial assemblies from the GAGE-B study. CONCLUSIONS: EvalDNA enables scientists to easily identify the best available genome assembly for their organism of interest without requiring a reference assembly. EvalDNA sets itself apart from other quality assessment tools by producing a quality score that enables direct comparison among assemblies from different species.

Subjects

Genomics, Software, Animals, Bacterial Genome, High-Throughput Nucleotide Sequencing, Humans, Machine Learning, DNA Sequence Analysis

5.

Constructing a de novo transcriptome and a reference proteome for the bivalve Scrobicularia plana: Comparative analysis of different assembly strategies and proteomic analysis.

Amil-Ruiz, Francisco; Herruzo-Ruiz, Ana María; Fuentes-Almagro, Carlos; Baena-Angulo, Casimiro; Jiménez-Pastor, José Manuel; Blasco, Julián; Alhama, José; Michán, Carmen.

Genomics ; 113(3): 1543-1553, 2021 05.

Article in English | MEDLINE | ID: mdl-33774165

ABSTRACT

Scrobicularia plana is a coastal and estuarine bivalve widely used in ecotoxicological studies. However, the underlying molecular mechanisms for S. plana pollutant responses are hardly known due to the lack of molecular databases. Thus, in this study we present a holistic approach to assess a robust reference transcriptome and proteome of this clam. A mixture of control and metal-exposed individuals was used for mRNA isolation. Four sets of high quality filtered preprocessed reads were generated (two quality scores and two sequenced lengths) and assembled with Mira, Ray and Trinity algorithms. The sixty-four generated assemblies were refined, filtered and evaluated for their proteomic quality. Eight assemblies presented top Detonate scores but one was selected due to its compactness and biological representation, which was generated: (i) from the highest quality dataset (Q20L100), (ii) using Trinity algorithm with all k-mers (AtKa), (iii) removing redundancy by CD-HIT (RR80), and (iv) filtering out poor contigs (F), that was subsequently named Q20L100AtKaRR80F. S. plana proteomic analysis revealed 10,017 peptide groups that corresponded to 2066 proteins with a wide coverage of molecular functions and biological processes, confirming the strength of the database generated.

Subjects

Bivalvia, Proteome, Animals, Bivalvia/genetics, High-Throughput Nucleotide Sequencing, Proteomics, Transcriptome

6.

Signal, bias, and the role of transcriptome assembly quality in phylogenomic inference.

Spillane, Jennifer L; LaPolice, Troy M; MacManes, Matthew D; Plachetzki, David C.

BMC Ecol Evol ; 21(1): 43, 2021 03 16.

Article in English | MEDLINE | ID: mdl-33726665

ABSTRACT

BACKGROUND: Phylogenomic approaches have great power to reconstruct evolutionary histories, however they rely on multi-step processes in which each stage has the potential to affect the accuracy of the final result. Many studies have empirically tested and established methodology for resolving robust phylogenies, including selecting appropriate evolutionary models, identifying orthologs, or isolating partitions with strong phylogenetic signal. However, few have investigated errors that may be initiated at earlier stages of the analysis. Biases introduced during the generation of the phylogenomic dataset itself could produce downstream effects on analyses of evolutionary history. Transcriptomes are widely used in phylogenomics studies, though there is little understanding of how a poor-quality assembly of these datasets could impact the accuracy of phylogenomic hypotheses. Here we examined how transcriptome assembly quality affects phylogenomic inferences by creating independent datasets from the same input data representing high-quality and low-quality transcriptome assembly outcomes. RESULTS: By studying the performance of phylogenomic datasets derived from alternative high- and low-quality assembly inputs in a controlled experiment, we show that high-quality transcriptomes produce richer phylogenomic datasets with a greater number of unique partitions than low-quality assemblies. High-quality assemblies also give rise to partitions that have lower alignment ambiguity and less compositional bias. In addition, high-quality partitions hold stronger phylogenetic signal than their low-quality transcriptome assembly counterparts in both concatenation- and coalescent-based analyses. CONCLUSIONS: Our findings demonstrate the importance of transcriptome assembly quality in phylogenomic analyses and suggest that a portion of the uncertainty observed in such studies could be alleviated at the assembly stage.

Subjects

Genomics, Transcriptome, Bias, Biological Evolution, Phylogeny

7.

A linked-read approach to museomics: Higher quality de novo genome assemblies from degraded tissues.

Colella, Jocelyn P; Tigano, Anna; MacManes, Matthew D.

Mol Ecol Resour ; 20(4): 856-870, 2020 Jul.

Article in English | MEDLINE | ID: mdl-32153100

ABSTRACT

High-throughput sequencing technologies are a proposed solution for accessing the molecular data in historical specimens. However, degraded DNA combined with the computational demands of short-read assemblies has posed significant laboratory and bioinformatics challenges for de novo genome assembly. Linked-read or "synthetic long-read" sequencing technologies, such as 10× Genomics, may provide a cost-effective alternative solution to assemble higher quality de novo genomes from degraded tissue samples. Here, we compare assembly quality (e.g., genome contiguity and completeness, presence of orthogroups) between four new deer mouse (Peromyscus spp.) genomes assembled using linked-read technology and four published genomes assembled from a single shotgun library. At a similar price-point, these approaches produce vastly different assemblies, with linked-read assemblies having overall higher contiguity and completeness, measured by larger N50 values and greater number of genes assembled, respectively. As a proof-of-concept, we used annotated genes from the four Peromyscus linked-read assemblies and eight additional rodent taxa to generate a phylogeny, which reconstructed the expected relationships among species with 100% support. Although not without caveats, our results suggest that linked-read sequencing approaches are a viable option to build de novo genomes from degraded tissues, which may prove particularly valuable for taxa that are extinct, rare or difficult to collect.

Subjects

Genome/genetics, Genomics/methods, High-Throughput Nucleotide Sequencing/methods, Peromyscus/genetics, Animals, Computational Biology/methods, Gene Library, Molecular Sequence Annotation/methods, Phylogeny, DNA Sequence Analysis/methods

8.

Comparative Analysis of Strategies for De Novo Transcriptome Assembly in Prokaryotes: Streptomyces clavuligerus as a Case Study.

Caicedo-Montoya, Carlos; Pinilla, Laura; Toro, León F; Yepes-García, Jeferyd; Ríos-Estepa, Rigoberto.

High Throughput ; 8(4)2019 Nov 30.

Article in English | MEDLINE | ID: mdl-31801255

ABSTRACT

The performance of software tools for de novo transcriptome assembly greatly depends on the selection of software parameters. Up to now, the development of de novo transcriptome assembly for prokaryotes has not been as remarkable as that for eukaryotes. In this contribution, Rockhopper2 was used to perform a comparative transcriptome analysis of Streptomyces clavuligerus exposed to diverse environmental conditions. The study focused on assessing the influence of software parameters on software performance, with the identification of differentially expressed genes as the final goal. For this, a statistical optimization was performed using the Transrate Assembly Score (TAS). TAS was also used for evaluating the software performance and for comparing it with related tools, e.g., Trinity. Transcriptome redundancy and completeness were also considered for this analysis. Rockhopper2 and Trinity reached TAS values of 0.55092 and 0.58337, respectively. Trinity assembles transcriptomes with high redundancy, with 55.6% of transcripts having some duplicates. Additionally, we observed that the total number of differentially expressed genes (DEG) and their annotation greatly depend on the method used for removing redundancy and the tools used for transcript quantification. To our knowledge, this is the first work aimed at assessing de novo assembly software for prokaryotic organisms.

9.

dnAQET: a framework to compute a consolidated metric for benchmarking quality of de novo assemblies.

Yavas, Gokhan; Hong, Huixiao; Xiao, Wenming.

BMC Genomics ; 20(1): 706, 2019 Sep 11.

Article in English | MEDLINE | ID: mdl-31510940

ABSTRACT

BACKGROUND: Accurate de novo genome assembly has become reality with the advancements in sequencing technology. With the ever-increasing number of de novo genome assembly tools, assessing the quality of assemblies has become of great importance in genome research. Although many quality metrics have been proposed and software tools for calculating those metrics have been developed, the existing tools do not produce a unified measure to reflect the overall quality of an assembly. RESULTS: To address this issue, we developed the de novo Assembly Quality Evaluation Tool (dnAQET) that generates a unified metric for benchmarking the quality assessment of assemblies. Our framework first calculates individual quality scores for the scaffolds/contigs of an assembly by aligning them to a reference genome. Next, it computes a quality score for the assembly using its overall reference genome coverage, the quality score distribution of its scaffolds and the redundancy identified in it. Using synthetic assemblies randomly generated from the latest human genome build, various builds of the reference genomes for five organisms and six de novo assemblies for sample NA24385, we tested dnAQET to assess its capability for benchmarking quality evaluation of genome assemblies. For synthetic data, our quality score increased with decreasing number of misassemblies and redundancy and increasing average contig length and coverage, as expected. For genome builds, dnAQET quality score calculated for a more recent reference genome was better than the score for an older version. To compare with some of the most frequently used measures, 13 other quality measures were calculated. The quality score from dnAQET was found to be better than all other measures in terms of consistency with the known quality of the reference genomes, indicating that dnAQET is reliable for benchmarking quality assessment of de novo genome assemblies. 
CONCLUSIONS: The dnAQET is a scalable framework designed to evaluate a de novo genome assembly based on the aggregated quality of its scaffolds (or contigs). Our results demonstrated that the dnAQET quality score is reliable for benchmarking quality assessment of genome assemblies. The dnAQET can help researchers to identify the most suitable assembly tools and to select high quality assemblies generated.

Subjects

Genomics/methods, Benchmarking, Contig Mapping, Software

10.

From Short Reads to Chromosome-Scale Genome Assemblies.

Fletcher, Kyle; Michelmore, Richard.

Methods Mol Biol ; 1848: 151-197, 2018.

Article in English | MEDLINE | ID: mdl-30182236

ABSTRACT

A high-quality, annotated genome assembly is the foundation for many downstream studies. However, obtaining such an assembly is a complex, reiterative process that requires the assimilation of high-quality data and combines different approaches and data types. While some software packages incorporating multiple steps of genome assembly are commercially available, they may not be flexible enough to be routinely applied to all organisms, particularly to nonmodel species such as pathogenic oomycetes and fungi. If researchers understand and apply the most appropriate, currently available tools for each step, it is possible to customize parameters and optimize results for their organism of study. Based on our experience of de novo assembly and annotation of several oomycete species, this chapter provides a modular workflow from processing of raw reads, to initial assembly generation, through optimization, chromosome-scale scaffolding and annotation, outlining input and output data as well as examples and alternative software used for each step. The accompanying Notes provide background information for each step as well as alternative options. The final result of this workflow could be an annotated, high-quality, validated, chromosome-scale assembly or a draft assembly of sufficient quality to meet specific needs of a project.

Subjects

Chromosomes, Computational Biology, Genome, Genomics, High-Throughput Nucleotide Sequencing, Computational Biology/methods, Genomics/methods, Molecular Sequence Annotation, Reproducibility of Results, DNA Sequence Analysis, Software, Workflow

11.

Inferring synteny between genome assemblies: a systematic evaluation.

Liu, Dang; Hunt, Martin; Tsai, Isheng J.

BMC Bioinformatics ; 19(1): 26, 2018 01 30.

Article in English | MEDLINE | ID: mdl-29382321

ABSTRACT

BACKGROUND: Genome assemblies across all domains of life are being produced routinely. Initial analysis of a new genome usually includes annotation and comparative genomics. Synteny provides a framework in which conservation of homologous genes and gene order is identified between genomes of different species. The availability of human and mouse genomes paved the way for algorithm development in large-scale synteny mapping, which eventually became an integral part of comparative genomics. Synteny analysis is regularly performed on assembled sequences that are fragmented, neglecting the fact that most methods were developed using complete genomes. It is unknown to what extent draft assemblies lead to errors in such analysis. RESULTS: We fragmented genome assemblies of model nematodes to various extents and conducted synteny identification and downstream analysis. We first show that synteny between species can be underestimated up to 40% and find disagreements between popular tools that infer synteny blocks. This inconsistency and further demonstration of erroneous gene ontology enrichment tests raise questions about the robustness of previous synteny analysis when gold standard genome sequences remain limited. In addition, assembly scaffolding using a reference guided approach with a closely related species may result in chimeric scaffolds with inflated assembly metrics if a true evolutionary relationship was overlooked. Annotation quality, however, has minimal effect on synteny if the assembled genome is highly contiguous. CONCLUSIONS: Our results show that a minimum N50 of 1 Mb is required for robust downstream synteny analysis, which emphasizes the importance of gold standard genomes to the science community, and should be achieved given the current progress in sequencing technology.
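The N50 statistic referred to in this abstract is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length. A minimal sketch of that standard definition follows; the function names `n50` and `meets_n50_threshold` are hypothetical, and the 1 Mb default is taken from the abstract's conclusion rather than from any published tool.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Sequences are taken from longest to shortest until their cumulative
    length reaches at least half of the total; the length of the last
    sequence taken is the N50.
    """
    lengths = [length for length in lengths if length > 0]
    if not lengths:
        raise ValueError("at least one positive length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer-safe test for 'half the total'
            return length


def meets_n50_threshold(lengths, minimum=1_000_000):
    """Check an assembly against a minimum N50 (default 1 Mb, per the abstract)."""
    return n50(lengths) >= minimum


if __name__ == "__main__":
    scaffolds = [1_500_000, 1_200_000, 300_000]  # illustrative lengths in bp
    print(n50(scaffolds), meets_n50_threshold(scaffolds))
```

Because N50 depends only on the length distribution, it says nothing about correctness: a chimeric scaffold can raise N50 while making the assembly worse, which is the inflation risk the abstract describes for reference-guided scaffolding.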

Subjects

Genome, Genomics/methods, Algorithms, Animals, Caenorhabditis elegans/genetics, Nematoda/genetics

12.

The relationship between physical workload and quality within line-based assembly.

Ivarsson, Anna; Eek, Frida.

Ergonomics ; 59(7): 913-23, 2016 Jul.

Article in English | MEDLINE | ID: mdl-27626887

ABSTRACT

Reducing costs and improving product quality are considered important to ensure productivity within a company. Quality deviations during production processes and ergonomics have previously been shown to be associated. This study explored the relationship between physical workload and real (found during production processes) and potential (need of extra time and assistance to complete tasks) quality deviations in a line-based assembly plant. The physical workload on and the work rotation between 52 workstations were assessed. As the outcome, real and potential quality deviations were studied during 10 weeks. Results show that workstations with higher physical workload had significantly more real deviations compared to lower workload stations. Static work posture had significantly more potential deviations. Rotation between high and low workload was related to fewer quality deviations compared to rotation between only high workload stations. In conclusion, physical ergonomics seems to be related to real and potential quality deviation within line-based assembly. Practitioner Summary: To ensure good productivity in manufacturing industries, it is important to reduce costs and improve product quality. This study shows that high physical workload is associated with quality deviations and need of extra time and assistance to complete tasks within line-based assembly, which can be financially expensive for a company.

Subjects

Manufacturing Industry/organization & administration, Quality Control, Workload, Adult, Efficiency, Ergonomics, Female, Humans, Male, Manufacturing Industry/standards, Motor Vehicles, Task Performance and Analysis

13.

NxRepair: error correction in de novo sequence assembly using Nextera mate pairs.

Murphy, Rebecca R; O'Connell, Jared; Cox, Anthony J; Schulz-Trieglaff, Ole.

PeerJ ; 3: e996, 2015.

Article in English | MEDLINE | ID: mdl-26056623

ABSTRACT

Scaffolding errors and incorrect repeat disambiguation during de novo assembly can result in large scale misassemblies in draft genomes. Nextera mate pair sequencing data provide additional information to resolve assembly ambiguities during scaffolding. Here, we introduce NxRepair, an open source toolkit for error correction in de novo assemblies that uses Nextera mate pair libraries to identify and correct large-scale errors. We show that NxRepair can identify and correct large scaffolding errors, without use of a reference sequence, resulting in quantitative improvements in the assembly quality. NxRepair can be downloaded from GitHub or PyPI, the Python Package Index; a tutorial and user documentation are also available.

14.

Quality Assessment of Domesticated Animal Genome Assemblies.

Seemann, Stefan E; Anthon, Christian; Palasca, Oana; Gorodkin, Jan.

Bioinform Biol Insights ; 9(Suppl 4): 49-58, 2015.

Article in English | MEDLINE | ID: mdl-27279738

ABSTRACT

The era of high-throughput sequencing has made it relatively simple to sequence genomes and transcriptomes of individuals from many species. In order to analyze the resulting sequencing data, high-quality reference genome assemblies are required. However, this is still a major challenge, and many domesticated animal genomes still need to be sequenced deeper in order to produce high-quality assemblies. In the meanwhile, ironically, the extent to which RNAseq and other next-generation data is produced frequently far exceeds that of the genomic sequence. Furthermore, basic comparative analysis is often affected by the lack of genomic sequence. Herein, we quantify the quality of the genome assemblies of 20 domesticated animals and related species by assessing a range of measurable parameters, and we show that there is a positive correlation between the fraction of mappable reads from RNAseq data and genome assembly quality. We rank the genomes by their assembly quality and discuss the implications for genotype analyses.
