Search | BVS IEC

Pushing the limits of HiFi assemblies reveals centromere diversity between two Arabidopsis thaliana genomes.

Rabanal, Fernando A; Gräff, Maike; Lanz, Christa; Fritschi, Katrin; Llaca, Victor; Lang, Michelle; Carbonell-Bejerano, Pablo; Henderson, Ian; Weigel, Detlef.

Nucleic Acids Res; 50(21): 12309-12327, 2022 Nov 28.

Article in English | MEDLINE | ID: mdl-36453992

ABSTRACT

Although long-read sequencing can often enable chromosome-level reconstruction of genomes, it is still unclear how one can routinely obtain gapless assemblies. In the model plant Arabidopsis thaliana, other than the reference accession Col-0, all other accessions de novo assembled with long-reads until now have used PacBio continuous long reads (CLR). Although these assemblies sometimes achieved chromosome-arm level contigs, they inevitably broke near the centromeres, excluding megabases of DNA from analysis in pan-genome projects. Since PacBio high-fidelity (HiFi) reads circumvent the high error rate of CLR technologies, albeit at the expense of read length, we compared a CLR assembly of accession Eyach15-2 to HiFi assemblies of the same sample. The use of five different assemblers starting from subsampled data allowed us to evaluate the impact of coverage and read length. We found that centromeres and rDNA clusters are responsible for 71% of contig breaks in the CLR scaffolds, while relatively short stretches of GA/TC repeats are at the core of >85% of the unfilled gaps in our best HiFi assemblies. Since the HiFi technology consistently enabled us to reconstruct gapless centromeres and 5S rDNA clusters, we demonstrate the value of the approach by comparing these previously inaccessible regions of the genome between the Eyach15-2 accession and the reference accession Col-0.

Subjects

Arabidopsis; Sequence Analysis, DNA; Arabidopsis/genetics; High-Throughput Nucleotide Sequencing; Centromere/genetics; DNA, Ribosomal

Multicopper oxidases: modular structure, sequence space, and evolutionary relationships.

Gräff, Maike; Buchholz, Patrick C F; Le Roes-Hill, Marilize; Pleiss, Jürgen.

Proteins; 88(10): 1329-1339, 2020 Oct.

Article in English | MEDLINE | ID: mdl-32447824

ABSTRACT

Multicopper oxidases (MCOs) use copper ions as cofactors to oxidize a variety of substrates while reducing oxygen to water. MCOs have been identified in various taxa, with notable occurrences in fungi. The role of these fungal MCOs in lignin degradation sparked an interest due to their potential for application in biofuel production and various other industries. MCOs consist of different protein domains, which led to their classification into two-, three-, and six-domain MCOs. The previously established Laccase and Multicopper Oxidase Engineering Database (https://lcced.biocatnet.de) was updated and now includes 51 058 sequences and 229 structures of MCOs. Sequences and structures of all MCOs were systematically compared. All MCOs consist of cupredoxin-like domains. Two-domain MCOs are formed by the N- and C-terminal domain (domain N and C), while three-domain MCOs have an additional domain (M) in between, homologous to domain C. The six-domain MCOs consist of alternating domains N and C, each three times. Two standard numbering schemes were developed for the copper-binding domains N and C, which facilitated the identification of conserved positions and a comparison to previously reported results from mutagenesis studies. Two sequence motifs for the copper binding sites were identified per domain. Their modularity, depending on the placement of the T1-copper binding site, was demonstrated. Protein sequence networks showed relationships between two- and three-domain MCOs, allowing for family-specific annotation and inference of evolutionary relationships.

Subjects

Azurin/chemistry; Coenzymes/chemistry; Copper/chemistry; Fungal Proteins/chemistry; Oxidoreductases/chemistry; Amino Acid Sequence; Azurin/metabolism; Binding Sites; Coenzymes/metabolism; Copper/metabolism; Data Mining; Databases, Protein; Evolution, Molecular; Fungal Proteins/classification; Fungal Proteins/genetics; Fungal Proteins/metabolism; Fungi/chemistry; Fungi/enzymology; Models, Molecular; Oxidation-Reduction; Oxidoreductases/classification; Oxidoreductases/genetics; Oxidoreductases/metabolism; Oxygen/chemistry; Oxygen/metabolism; Protein Binding; Protein Engineering; Protein Interaction Domains and Motifs; Protein Structure, Secondary; Sequence Alignment; Sequence Homology, Amino Acid; Structure-Activity Relationship; Substrate Specificity; Water/chemistry; Water/metabolism

The Short-chain Dehydrogenase/Reductase Engineering Database (SDRED): A classification and analysis system for a highly diverse enzyme family.

Gräff, Maike; Buchholz, Patrick C F; Stockinger, Peter; Bommarius, Bettina; Bommarius, Andreas S; Pleiss, Jürgen.

Proteins; 87(6): 443-451, 2019 Jun.

Article in English | MEDLINE | ID: mdl-30714194

ABSTRACT

The Short-chain Dehydrogenases/Reductases Engineering Database (SDRED) covers one of the largest known protein families (168 150 proteins). Assignment to the superfamilies of Classical and Extended SDRs was achieved by global sequence similarity and by identification of family-specific sequence motifs. Two standard numbering schemes were established for Classical and Extended SDRs that allow for the determination of conserved amino acid residues, such as cofactor specificity determining positions or superfamily specific sequence motifs. The comprehensive sequence dataset of the SDRED facilitates the refinement of family-specific sequence motifs. The glycine-rich motifs for Classical and Extended SDRs were refined to improve the precision of superfamily classification. In each superfamily, the majority of sequences formed a tightly connected sequence network and belonged to a large homologous family. Despite their different sequence motifs and their different sequence length, the two sequence networks of Classical and Extended SDRs are not separate, but connected by edges at a threshold of 40% sequence similarity, indicating that all SDRs belong to a large, connected network. The SDRED is accessible at https://sdred.biocatnet.de/.
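The idea of two sequence networks being "connected by edges at a threshold of 40% sequence similarity" can be illustrated with a small sketch. This is not the SDRED pipeline, which the abstract does not describe; the function name `connected_components`, the node labels, and the similarity values are all hypothetical toy data chosen for illustration. The sketch links any pair of sequences whose similarity meets the threshold and reports the resulting connected groups.

```python
from itertools import combinations


def connected_components(nodes, similarity, threshold=0.40):
    """Group nodes joined by edges whose similarity is >= threshold.

    `similarity` maps unordered pairs (a, b) to a value in [0, 1];
    missing pairs are treated as having no similarity (assumption).
    """
    parent = {n: n for n in nodes}

    def find(n):
        # Union-find lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b in combinations(nodes, 2):
        s = similarity.get((a, b), similarity.get((b, a), 0.0))
        if s >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    return sorted(groups.values(), key=lambda g: sorted(g))


# Hypothetical toy data: two tight groups bridged by one weak edge.
nodes = ["classical_1", "classical_2", "extended_1", "extended_2"]
similarity = {
    ("classical_1", "classical_2"): 0.72,
    ("extended_1", "extended_2"): 0.68,
    ("classical_2", "extended_1"): 0.41,
}

components_40 = connected_components(nodes, similarity, threshold=0.40)
components_60 = connected_components(nodes, similarity, threshold=0.60)
```

With these toy values, the weak bridging edge keeps everything in one network at the 40% threshold, while a stricter 60% threshold splits the nodes into two separate groups.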

Subjects

Fatty Acid Synthases/metabolism; NADH, NADPH Oxidoreductases/metabolism; Animals; Databases, Genetic; Fatty Acid Synthases/genetics; Humans; NADH, NADPH Oxidoreductases/genetics; Protein Engineering/methods

The ω-transaminase engineering database (oTAED): A navigation tool in protein sequence and structure space.

Buß, Oliver; Buchholz, Patrick C F; Gräff, Maike; Klausmann, Peter; Rudat, Jens; Pleiss, Jürgen.

Proteins; 86(5): 566-580, 2018 May.

Article in English | MEDLINE | ID: mdl-29423963

ABSTRACT

The ω-Transaminase Engineering Database (oTAED) was established as a publicly accessible resource on sequences and structures of the biotechnologically relevant ω-transaminases (ω-TAs) from Fold types I and IV. The oTAED integrates sequence and structure data, provides a classification based on fold type and sequence similarity, and applies a standard numbering scheme to identify equivalent positions in homologous proteins. The oTAED includes 67 210 proteins (114 655 sequences) which are divided into 169 homologous families based on global sequence similarity. The 44 and 39 highly conserved positions which were identified in Fold type I and IV, respectively, include the known catalytic residues and a large fraction of glycines and prolines in loop regions, which might have a role in protein folding and stability. However, for most of the conserved positions the function is still unknown. Literature information on positions that mediate substrate specificity and stereoselectivity was systematically examined. The standard numbering schemes revealed that many positions which have been described in different enzymes are structurally equivalent. For some positions, multiple functional roles have been suggested based on experimental data in different enzymes. The proposed standard numbering schemes for Fold type I and IV ω-TAs assist with analysis of literature data, facilitate annotation of ω-TAs, support prediction of promising mutation sites, and enable navigation in ω-TA sequence space. Thus, it is a useful tool for enzyme engineering and the selection of novel ω-TA candidates with desired biochemical properties.
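A standard numbering scheme of the kind mentioned above assigns each residue of a protein the position number of the equivalent residue in a reference sequence. The abstract does not say how the oTAED computes this, so the following is only a generic sketch that assumes a gapped pairwise alignment is already available; the function name `standard_numbering` and the example sequences are hypothetical.

```python
def standard_numbering(aligned_reference, aligned_query):
    """Map 1-based query positions to 1-based reference positions.

    Both arguments are rows of one pairwise alignment, with '-' for gaps.
    Query residues aligned to a reference gap (insertions) map to None.
    """
    if len(aligned_reference) != len(aligned_query):
        raise ValueError("aligned sequences must have equal length")

    mapping = {}
    ref_pos = 0
    query_pos = 0
    for ref_char, query_char in zip(aligned_reference, aligned_query):
        if ref_char != "-":
            ref_pos += 1
        if query_char != "-":
            query_pos += 1
            mapping[query_pos] = ref_pos if ref_char != "-" else None
    return mapping


# Hypothetical toy alignment: the query has one insertion and one deletion.
numbering = standard_numbering("MK-LGA", "MKTL-A")
```

Two residues from different enzymes that receive the same reference number are treated as equivalent positions, which is what lets mutagenesis results reported for one enzyme be compared with those for another.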

Subjects

Bacterial Proteins/chemistry; Databases, Protein; Transaminases/chemistry; Transaminases/classification; Amino Acid Sequence; Amino Acids/chemistry; Bacteria; Catalytic Domain; Conserved Sequence; Models, Molecular; Mutation; Protein Conformation; Protein Folding; Structure-Activity Relationship; Substrate Specificity
