Search | Virtual Health Library (Biblioteca Virtual em Saúde)

1.

RECQL5 controls transcript elongation and suppresses genome instability associated with transcription stress.

Saponaro, Marco; Kantidakis, Theodoros; Mitter, Richard; Kelly, Gavin P; Heron, Mark; Williams, Hannah; Söding, Johannes; Stewart, Aengus; Svejstrup, Jesper Q.

Cell ; 157(5): 1037-49, 2014 May 22.

Article in English | MEDLINE | ID: mdl-24836610

ABSTRACT

RECQL5 is the sole member of the RECQ family of helicases associated with RNA polymerase II (RNAPII). We now show that RECQL5 is a general elongation factor that is important for preserving genome stability during transcription. Depletion or overexpression of RECQL5 results in corresponding shifts in the genome-wide RNAPII density profile. Elongation is particularly affected, with RECQL5 depletion causing a striking increase in the average rate, concurrent with increased stalling, pausing, arrest, and/or backtracking (transcription stress). RECQL5 therefore controls the movement of RNAPII across genes. Loss of RECQL5 also results in the loss or gain of genomic regions, with the breakpoints of lost regions located in genes and common fragile sites. The chromosomal breakpoints overlap with areas of elevated transcription stress, suggesting that RECQL5 suppresses such stress and its detrimental effects, and thereby prevents genome instability in the transcribed region of genes.

Subjects

Genomic Instability , RecQ Helicases/metabolism , Transcription Elongation, Genetic , Transcription, Genetic , Genome, Human , HEK293 Cells , Humans , RNA Polymerase II/metabolism

2.

A High-Throughput Screen for Transcription Activation Domains Reveals Their Sequence Features and Permits Prediction by Deep Learning.

Erijman, Ariel; Kozlowski, Lukasz; Sohrabi-Jahromi, Salma; Fishburn, James; Warfield, Linda; Schreiber, Jacob; Noble, William S; Söding, Johannes; Hahn, Steven.

Mol Cell ; 78(5): 890-902.e6, 2020 06 04.

Article in English | MEDLINE | ID: mdl-32416068

ABSTRACT

Acidic transcription activation domains (ADs) are encoded by a wide range of seemingly unrelated amino acid sequences, making it difficult to recognize features that promote their dynamic behavior, "fuzzy" interactions, and target specificity. We screened a large set of random 30-mer peptides for AD function in yeast and trained a deep neural network (ADpred) on the AD-positive and -negative sequences. ADpred identifies known acidic ADs within transcription factors and accurately predicts the consequences of mutations. Our work reveals that strong acidic ADs contain multiple clusters of hydrophobic residues near acidic side chains, explaining why ADs often have a biased amino acid composition. ADs likely use a binding mechanism similar to avidity where a minimum number of weak dynamic interactions are required between activator and target to generate biologically relevant affinity and in vivo function. This mechanism explains the basis for fuzzy binding observed between acidic ADs and targets.

Subjects

High-Throughput Screening Assays/methods , Transcription Factors/genetics , Transcriptional Activation/genetics , Amino Acid Sequence/genetics , Basic-Leucine Zipper Transcription Factors/genetics , DNA-Binding Proteins/metabolism , Deep Learning , Protein Binding , Protein Domains/genetics , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae Proteins/metabolism , Trans-Activators/genetics , Trans-Activators/metabolism , Transcription Factors/metabolism , Transcriptional Activation/physiology

3.

The Mre11:Rad50 structure shows an ATP-dependent molecular clamp in DNA double-strand break repair.

Lammens, Katja; Bemeleit, Derk J; Möckel, Carolin; Clausing, Emanuel; Schele, Alexandra; Hartung, Sophia; Schiller, Christian B; Lucas, Maria; Angermüller, Christof; Söding, Johannes; Strässer, Katja; Hopfner, Karl-Peter.

Cell ; 145(1): 54-66, 2011 Apr 01.

Article in English | MEDLINE | ID: mdl-21458667

ABSTRACT

The MR (Mre11 nuclease and Rad50 ABC ATPase) complex is an evolutionarily conserved sensor for DNA double-strand breaks, highly genotoxic lesions linked to cancer development. MR can recognize and process DNA ends even if they are blocked and misfolded. To reveal its mechanism, we determined the crystal structure of the catalytic head of Thermotoga maritima MR and analyzed ATP-dependent conformational changes. MR adopts an open form with a central Mre11 nuclease dimer and two peripheral Rad50 molecules, a form suited for sensing obstructed breaks. The Mre11 C-terminal helix-loop-helix domain binds Rad50 and attaches flexibly to the nuclease domain, enabling large conformational changes. ATP binding to the two Rad50 subunits induces a rotation of the Mre11 helix-loop-helix and Rad50 coiled-coil domains, creating a clamp conformation with increased DNA-binding activity. The results suggest that MR is an ATP-controlled transient molecular clamp at DNA double-strand breaks.

Subjects

Adenosine Triphosphate/metabolism , Bacterial Proteins/chemistry , DNA Repair Enzymes/chemistry , DNA Repair , DNA-Binding Proteins/chemistry , Thermotoga maritima/chemistry , Bacterial Proteins/genetics , Bacterial Proteins/metabolism , Crystallography, X-Ray , DNA Breaks, Double-Stranded , DNA Repair Enzymes/genetics , DNA Repair Enzymes/metabolism , DNA-Binding Proteins/genetics , DNA-Binding Proteins/metabolism , Endodeoxyribonucleases/chemistry , Endodeoxyribonucleases/metabolism , Exodeoxyribonucleases/chemistry , Exodeoxyribonucleases/metabolism , Models, Molecular , Saccharomyces cerevisiae/chemistry , Saccharomyces cerevisiae Proteins/chemistry , Saccharomyces cerevisiae Proteins/metabolism , Scattering, Small Angle , Thermotoga maritima/metabolism , X-Ray Diffraction

4.

DescribePROT in 2023: more, higher-quality and experimental annotations and improved data download options.

Basu, Sushmita; Zhao, Bi; Biró, Bálint; Faraggi, Eshel; Gsponer, Jörg; Hu, Gang; Kloczkowski, Andrzej; Malhis, Nawar; Mirdita, Milot; Söding, Johannes; Steinegger, Martin; Wang, Duolin; Wang, Kui; Xu, Dong; Zhang, Jian; Kurgan, Lukasz.

Nucleic Acids Res ; 52(D1): D426-D433, 2024 Jan 05.

Article in English | MEDLINE | ID: mdl-37933852

ABSTRACT

The DescribePROT database of amino acid-level descriptors of protein structures and functions has been substantially expanded since its release in 2020. This expansion includes a substantial increase in the size, scope, and quality of the underlying data, the addition of experimental structural information, the inclusion of new data download options, and an upgraded graphical interface. DescribePROT currently covers 19 structural and functional descriptors for proteins in 273 reference proteomes generated by 11 accurate and complementary predictive tools. Users can search our resource in multiple ways, interact with the data using the graphical interface, and download data at various scales including individual proteins, entire proteomes, and the whole database. The annotations in DescribePROT are useful for a broad spectrum of studies that include investigations of protein structure and function, development and validation of predictive tools, and efforts to understand the molecular underpinnings of diseases and to develop therapeutics. DescribePROT can be freely accessed at http://biomine.cs.vcu.edu/servers/DESCRIBEPROT/.

Subjects

Amino Acids , Proteome , Proteome/chemistry , Databases, Factual

5.

Genome-wide Analysis of RNA Polymerase II Termination at Protein-Coding Genes.

Baejen, Carlo; Andreani, Jessica; Torkler, Phillipp; Battaglia, Sofia; Schwalb, Bjoern; Lidschreiber, Michael; Maier, Kerstin C; Boltendahl, Andrea; Rus, Petra; Esslinger, Stephanie; Söding, Johannes; Cramer, Patrick.

Mol Cell ; 66(1): 38-49.e6, 2017 Apr 06.

Article in English | MEDLINE | ID: mdl-28318822

ABSTRACT

At the end of protein-coding genes, RNA polymerase (Pol) II undergoes a concerted transition that involves 3'-processing of the pre-mRNA and transcription termination. Here, we present a genome-wide analysis of the 3'-transition in budding yeast. We find that the 3'-transition globally requires the Pol II elongation factor Spt5 and factors involved in the recognition of the polyadenylation (pA) site and in endonucleolytic RNA cleavage. Pol II release from DNA occurs in a narrow termination window downstream of the pA site and requires the "torpedo" exonuclease Rat1 (XRN2 in human). The Rat1-interacting factor Rai1 contributes to RNA degradation downstream of the pA site. Defects in the 3'-transition can result in increased transcription at downstream genes.

Subjects

DNA, Fungal/metabolism , RNA 3' End Processing , RNA Polymerase II/metabolism , RNA Precursors/biosynthesis , RNA, Fungal/biosynthesis , RNA, Messenger/biosynthesis , Saccharomyces cerevisiae Proteins/metabolism , Saccharomyces cerevisiae/enzymology , Binding Sites , Chromosomal Proteins, Non-Histone/genetics , Chromosomal Proteins, Non-Histone/metabolism , DNA, Fungal/genetics , Exoribonucleases/genetics , Exoribonucleases/metabolism , Models, Genetic , Protein Binding , RNA Polymerase II/genetics , RNA Precursors/genetics , RNA, Fungal/genetics , RNA, Messenger/genetics , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae Proteins/genetics , Transcriptional Elongation Factors/genetics , Transcriptional Elongation Factors/metabolism , mRNA Cleavage and Polyadenylation Factors/genetics , mRNA Cleavage and Polyadenylation Factors/metabolism

6.

Modulations of DNA Contacts by Linker Histones and Post-translational Modifications Determine the Mobility and Modifiability of Nucleosomal H3 Tails.

Stützer, Alexandra; Liokatis, Stamatios; Kiesel, Anja; Schwarzer, Dirk; Sprangers, Remco; Söding, Johannes; Selenko, Philipp; Fischle, Wolfgang.

Mol Cell ; 61(2): 247-59, 2016 Jan 21.

Article in English | MEDLINE | ID: mdl-26778125

ABSTRACT

Post-translational histone modifications and linker histone incorporation regulate chromatin structure and genome activity. How these systems interface on a molecular level is unclear. Using biochemistry and NMR spectroscopy, we deduced mechanistic insights into the modification behavior of N-terminal histone H3 tails in different nucleosomal contexts. We find that linker histones generally inhibit modifications of different H3 sites and reduce H3 tail dynamics in nucleosomes. These effects are caused by modulations of electrostatic interactions of H3 tails with linker DNA and largely depend on the C-terminal domains of linker histones. In agreement, linker histone occupancy and H3 tail modifications segregate on a genome-wide level. Charge-modulating modifications such as phosphorylation and acetylation weaken transient H3 tail-linker DNA interactions, increase H3 tail dynamics, and, concomitantly, enhance general modifiability. We propose that alterations of H3 tail-linker DNA interactions by linker histones and charge-modulating modifications execute basal control mechanisms of chromatin function.

Subjects

DNA/metabolism , Histones/metabolism , Nucleosomes/metabolism , Protein Processing, Post-Translational , Acetylation , Amino Acid Sequence , Animals , Genome , Histones/chemistry , Molecular Sequence Data , Phosphorylation , Protein Binding , Xenopus laevis

7.

DescribePROT: database of amino acid-level protein structure and function predictions.

Zhao, Bi; Katuwawala, Akila; Oldfield, Christopher J; Dunker, A Keith; Faraggi, Eshel; Gsponer, Jörg; Kloczkowski, Andrzej; Malhis, Nawar; Mirdita, Milot; Obradovic, Zoran; Söding, Johannes; Steinegger, Martin; Zhou, Yaoqi; Kurgan, Lukasz.

Nucleic Acids Res ; 49(D1): D298-D308, 2021 01 08.

Article in English | MEDLINE | ID: mdl-33119734

ABSTRACT

We present DescribePROT, the database of predicted amino acid-level descriptors of structure and function of proteins. DescribePROT delivers a comprehensive collection of 13 complementary descriptors predicted using 10 popular and accurate algorithms for 83 complete proteomes that cover key model organisms. The current version includes 7.8 billion predictions for close to 600 million amino acids in 1.4 million proteins. The descriptors encompass sequence conservation, position specific scoring matrix, secondary structure, solvent accessibility, intrinsic disorder, disordered linkers, signal peptides, MoRFs and interactions with proteins, DNA and RNAs. Users can search DescribePROT by the amino acid sequence and the UniProt accession number and entry name. The pre-computed results are made available instantaneously. The predictions can be accessed via an interactive graphical interface that allows simultaneous analysis of multiple descriptors and can also be downloaded in structured formats at the protein, proteome and whole database scale. The putative annotations included in DescribePROT are useful for a broad range of studies, including investigations of protein function, applied projects focusing on therapeutics and diseases, and the development of predictors for other protein sequence descriptors. Future releases will expand the coverage of DescribePROT. DescribePROT can be accessed at http://biomine.cs.vcu.edu/servers/DESCRIBEPROT/.

Subjects

Amino Acids/chemistry , Databases, Protein , Genome , Proteins/genetics , Proteome/genetics , Software , Amino Acid Sequence , Amino Acids/metabolism , Animals , Archaea/genetics , Archaea/metabolism , Bacteria/genetics , Bacteria/metabolism , Binding Sites , Conserved Sequence , Fungi/genetics , Fungi/metabolism , Humans , Internet , Plants/genetics , Plants/metabolism , Prokaryotic Cells/metabolism , Protein Binding , Protein Structure, Secondary , Proteins/chemistry , Proteins/classification , Proteins/metabolism , Proteome/chemistry , Proteome/metabolism , Sequence Analysis, Protein , Viruses/genetics , Viruses/metabolism

8.

Protein-level assembly increases protein sequence recovery from metagenomic samples manyfold.

Steinegger, Martin; Mirdita, Milot; Söding, Johannes.

Nat Methods ; 16(7): 603-606, 2019 07.

Article in English | MEDLINE | ID: mdl-31235882

ABSTRACT

The open-source de novo protein-level assembler, Plass ( https://plass.mmseqs.com ), assembles six-frame-translated sequencing reads into protein sequences. It recovers 2-10 times more protein sequences from complex metagenomes and can assemble huge datasets. We assembled two redundancy-filtered reference protein catalogs, 2 billion sequences from 640 soil samples (soil reference protein catalog) and 292 million sequences from 775 marine eukaryotic metatranscriptomes (marine eukaryotic reference catalog), the largest free collections of protein sequences.
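
The six-frame translation step mentioned in this abstract can be illustrated with a minimal Python sketch. It is not Plass code: the function names are hypothetical, the standard genetic code is assumed, and the assembly itself is not shown. It only demonstrates how one nucleotide read yields six candidate protein fragments (three reading frames on each strand).

```python
# Illustrative sketch only; names are hypothetical and not taken from Plass.
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate codon by codon; unknown codons become 'X',
    and a trailing partial codon is dropped."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translate(read):
    """Return translations for frames 0-2 of the forward strand,
    followed by frames 0-2 of the reverse-complement strand."""
    read = read.upper()
    strands = (read, reverse_complement(read))
    return [translate(s[offset:]) for s in strands for offset in range(3)]


print(six_frame_translate("ATGGCC"))
# ['MA', 'W', 'G', 'GH', 'A', 'P']
```

Stop codons appear as `*` in the output; a real pipeline would typically split or filter fragments at stops, which this sketch leaves out.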

Subjects

Metagenomics , Proteins/chemistry , Amino Acid Sequence , Codon , Open Reading Frames

9.

Thermodynamic modeling reveals widespread multivalent binding by RNA-binding proteins.

Sohrabi-Jahromi, Salma; Söding, Johannes.

Bioinformatics ; 37(Suppl_1): i308-i316, 2021 07 12.

Article in English | MEDLINE | ID: mdl-34252974

ABSTRACT

MOTIVATION: Understanding how proteins recognize their RNA targets is essential to elucidate regulatory processes in the cell. Many RNA-binding proteins (RBPs) form complexes or have multiple domains that allow them to bind to RNA in a multivalent, cooperative manner. They can thereby achieve higher specificity and affinity than proteins with a single RNA-binding domain. However, current approaches to de novo discovery of RNA binding motifs do not take multivalent binding into account. RESULTS: We present Bipartite Motif Finder (BMF), which is based on a thermodynamic model of RBPs with two cooperatively binding RNA-binding domains. We show that bivalent binding is a common strategy among RBPs, yielding higher affinity and sequence specificity. We furthermore illustrate that the spatial geometry between the binding sites can be learned from bound RNA sequences. These discovered bipartite motifs are consistent with previously known motifs and binding behaviors. Our results demonstrate the importance of multivalent binding for RNA-binding proteins and highlight the value of bipartite motif models in representing the multivalency of protein-RNA interactions. AVAILABILITY AND IMPLEMENTATION: BMF source code is available at https://github.com/soedinglab/bipartite_motif_finder under a GPL license. The BMF web server is accessible at https://bmf.soedinglab.org. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

RNA-Binding Proteins , Software , Binding Sites , Protein Binding , RNA/metabolism , RNA-Binding Proteins/metabolism , Thermodynamics

10.

SpacePHARER: sensitive identification of phages from CRISPR spacers in prokaryotic hosts.

Zhang, Ruoshi; Mirdita, Milot; Levy Karin, Eli; Norroy, Clovis; Galiez, Clovis; Söding, Johannes.

Bioinformatics ; 37(19): 3364-3366, 2021 Oct 11.

Article in English | MEDLINE | ID: mdl-33792634

ABSTRACT

SUMMARY: SpacePHARER (CRISPR Spacer Phage-Host Pair Finder) is a sensitive and fast tool for de novo prediction of phage-host relationships via identifying phage genomes that match CRISPR spacers in genomic or metagenomic data. SpacePHARER gains sensitivity by comparing spacers and phages at the protein level, optimizing its scores for matching very short sequences, and combining evidence from multiple matches, while controlling for false positives. We demonstrate SpacePHARER by searching a comprehensive spacer list against all complete phage genomes. AVAILABILITY AND IMPLEMENTATION: SpacePHARER is available as an open-source (GPLv3), user-friendly command-line software for Linux and macOS: https://github.com/soedinglab/spacepharer. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

11.

Transcriptome maps of mRNP biogenesis factors define pre-mRNA recognition.

Baejen, Carlo; Torkler, Phillipp; Gressel, Saskia; Essig, Katharina; Söding, Johannes; Cramer, Patrick.

Mol Cell ; 55(5): 745-57, 2014 Sep 04.

Article in English | MEDLINE | ID: mdl-25192364

ABSTRACT

Biogenesis of eukaryotic messenger ribonucleoprotein complexes (mRNPs) involves the synthesis, splicing, and 3' processing of pre-mRNA, and the assembly of mature mRNPs for nuclear export. We mapped 23 mRNP biogenesis factors onto the yeast transcriptome, providing 10^4-10^6 high-confidence RNA interaction sites per factor. The data reveal how mRNP biogenesis factors recognize pre-mRNA elements in vivo. They define conserved interactions between splicing factors and pre-mRNA introns, including the recognition of intron-exon junctions and the branchpoint. They also identify a unified arrangement of 3' processing factors at pre-mRNA polyadenylation (pA) sites in yeast and human, which results from an A-U sequence bias at pA sites. Global data analysis indicates that 3' processing factors have roles in splicing and RNA surveillance, and that they couple mRNP biogenesis events to restrict nuclear export to mature mRNPs.

Subjects

Models, Genetic , RNA Precursors/metabolism , RNA, Messenger/metabolism , Ribonucleoproteins/biosynthesis , Active Transport, Cell Nucleus , Gene Expression Profiling , Humans , Introns , RNA Precursors/chemistry , RNA Splicing , RNA, Messenger/chemistry , Saccharomyces cerevisiae/genetics

12.

Bayesian multiple logistic regression for case-control GWAS.

Banerjee, Saikat; Zeng, Lingyao; Schunkert, Heribert; Söding, Johannes.

PLoS Genet ; 14(12): e1007856, 2018 12.

Article in English | MEDLINE | ID: mdl-30596640

ABSTRACT

Genetic variants in genome-wide association studies (GWAS) are tested for disease association mostly using simple regression, one variant at a time. Standard approaches to improve power in detecting disease-associated SNPs use multiple regression with Bayesian variable selection in which a sparsity-enforcing prior on effect sizes is used to avoid overtraining and all effect sizes are integrated out for posterior inference. For binary traits, the logistic model has not yielded clear improvements over the linear model. For multi-SNP analysis, the logistic model required costly and technically challenging MCMC sampling to perform the integration. Here, we introduce the quasi-Laplace approximation to solve the integral and avoid MCMC sampling. We expect the logistic model to perform much better than multiple linear regression except when predicted disease risks are spread closely around 0.5, because only close to its inflection point can the logistic function be well approximated by a linear function. Indeed, in extensive benchmarks with simulated phenotypes and real genotypes, our Bayesian multiple LOgistic REgression method (B-LORE) showed considerable improvements (1) when regressing on many variants in multiple loci at heritabilities ≥ 0.4 and (2) for unbalanced case-control ratios. B-LORE also enables meta-analysis by approximating the likelihood functions of individual studies by multivariate normal distributions, using their means and covariance matrices as summary statistics. Our work should make sparse multiple logistic regression attractive also for other applications with binary target variables. B-LORE is freely available from: https://github.com/soedinglab/b-lore.
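
The abstract's remark that the logistic function is well approximated by a linear function only near its inflection point can be checked numerically. The sketch below is not part of B-LORE; it compares the logistic function with its first-order Taylor expansion around zero, and the function names are illustrative.

```python
import math


def logistic(x):
    """Logistic function; its inflection point is at x = 0, where it equals 0.5."""
    return 1.0 / (1.0 + math.exp(-x))


def linear_approx(x):
    """First-order Taylor expansion of the logistic function around x = 0
    (the slope at the inflection point is 1/4)."""
    return 0.5 + x / 4.0


def approximation_error(x):
    """Absolute gap between the logistic function and its tangent line at 0."""
    return abs(logistic(x) - linear_approx(x))


for x in (0.1, 1.0, 3.0):
    print(f"x={x:>4}: logistic={logistic(x):.4f}  "
          f"linear={linear_approx(x):.4f}  error={approximation_error(x):.4f}")
```

The error is negligible for predicted risks close to 0.5 and grows quickly as the linear predictor moves away from zero, which matches the abstract's expectation about where a logistic model should outperform a linear one.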

Subjects

Bayes Theorem , Genome-Wide Association Study/statistics & numerical data , Logistic Models , Models, Genetic , Case-Control Studies , Computer Simulation , Coronary Artery Disease/genetics , Genetic Variation , Humans , Likelihood Functions , Multifactorial Inheritance , Phenotype , Polymorphism, Single Nucleotide , Software

13.

PROSSTT: probabilistic simulation of single-cell RNA-seq data for complex differentiation processes.

Papadopoulos, Nikolaos; Gonzalo, Parra R; Söding, Johannes.

Bioinformatics ; 35(18): 3517-3519, 2019 09 15.

Article in English | MEDLINE | ID: mdl-30715210

ABSTRACT

SUMMARY: Cellular lineage trees can be derived from single-cell RNA sequencing snapshots of differentiating cells. Currently, only datasets with simple topologies are available. To test and further develop tools for lineage tree reconstruction, we need test datasets with known complex topologies. PROSSTT can simulate scRNA-seq datasets for differentiation processes with lineage trees of any desired complexity, noise level, noise model and size. PROSSTT also provides scripts to quantify the quality of predicted lineage trees. AVAILABILITY AND IMPLEMENTATION: https://github.com/soedinglab/prosstt. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Software , Cell Differentiation , Gene Expression Profiling , RNA-Seq , Single-Cell Analysis

14.

MMseqs2 desktop and local web server app for fast, interactive sequence searches.

Mirdita, Milot; Steinegger, Martin; Söding, Johannes.

Bioinformatics ; 35(16): 2856-2858, 2019 08 15.

Article in English | MEDLINE | ID: mdl-30615063

ABSTRACT

SUMMARY: The MMseqs2 desktop and web server app facilitates interactive sequence searches through custom protein sequence and profile databases on personal workstations. By eliminating MMseqs2's runtime overhead, we reduced response times to a few seconds at sensitivities close to BLAST. AVAILABILITY AND IMPLEMENTATION: The app is easy to install for non-experts. GPLv3-licensed code, pre-built desktop app packages for Windows, MacOS and Linux, Docker images for the web server application and a demo web server are available at https://search.mmseqs.com. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Computers , Software , Amino Acid Sequence , Databases, Factual

15.

The BaMM web server for de-novo motif discovery and regulatory sequence analysis.

Kiesel, Anja; Roth, Christian; Ge, Wanwan; Wess, Maximilian; Meier, Markus; Söding, Johannes.

Nucleic Acids Res ; 46(W1): W215-W220, 2018 07 02.

Article in English | MEDLINE | ID: mdl-29846656

ABSTRACT

The BaMM web server offers four tools: (i) de-novo discovery of enriched motifs in a set of nucleotide sequences, (ii) scanning a set of nucleotide sequences with motifs to find motif occurrences, (iii) searching with an input motif for similar motifs in our BaMM database with motifs for >1000 transcription factors, trained from the GTRD ChIP-seq database and (iv) browsing and keyword searching the motif database. In contrast to most other servers, we represent sequence motifs not by position weight matrices (PWMs) but by Bayesian Markov Models (BaMMs) of order 4, which we showed previously to perform substantially better in ROC analyses than PWMs or first order models. To address the inadequacy of P- and E-values as measures of motif quality, we introduce the AvRec score, the average recall over the TP-to-FP ratio between 1 and 100. The BaMM server is freely accessible without registration at https://bammmotif.mpibpc.mpg.de.

Subjects

Nucleotide Motifs , Regulatory Sequences, Nucleic Acid , Software , Animals , Bayes Theorem , Databases, Nucleic Acid , Humans , Internet , Markov Chains , Mice , Rats , Sequence Analysis , Transcription Factors/metabolism

16.

HH-suite3 for fast remote homology detection and deep protein annotation.

Steinegger, Martin; Meier, Markus; Mirdita, Milot; Vöhringer, Harald; Haunsberger, Stephan J; Söding, Johannes.

BMC Bioinformatics ; 20(1): 473, 2019 Sep 14.

Article in English | MEDLINE | ID: mdl-31521110

ABSTRACT

BACKGROUND: HH-suite is a widely used open source software suite for sensitive sequence similarity searches and protein fold recognition. It is based on pairwise alignment of profile Hidden Markov models (HMMs), which represent multiple sequence alignments of homologous proteins. RESULTS: We developed a single-instruction multiple-data (SIMD) vectorized implementation of the Viterbi algorithm for profile HMM alignment and introduced various other speed-ups. These accelerated the search methods HHsearch by a factor of 4 and HHblits by a factor of 2 over the previous version 2.0.16. HHblits3 is ~10× faster than PSI-BLAST and ~20× faster than HMMER3. Jobs to perform HHsearch and HHblits searches with many query profile HMMs can be parallelized over cores and over cluster servers using OpenMP and message passing interface (MPI). The free, open-source, GPLv3-licensed software is available at https://github.com/soedinglab/hh-suite. CONCLUSION: The added functionalities and increased speed of HHsearch and HHblits should facilitate their use in large-scale protein structure and function prediction, e.g. in metagenomics and genomics projects.

Subjects

Molecular Sequence Annotation/methods , Sequence Alignment/methods , Sequence Analysis, Protein/methods , Software , Algorithms , Markov Chains

17.

Synthetic protein alignments by CCMgen quantify noise in residue-residue contact prediction.

Vorberg, Susann; Seemayer, Stefan; Söding, Johannes.

PLoS Comput Biol ; 14(11): e1006526, 2018 11.

Article in English | MEDLINE | ID: mdl-30395601

ABSTRACT

Compensatory mutations between protein residues in physical contact can manifest themselves as statistical couplings between the corresponding columns in a multiple sequence alignment (MSA) of the protein family. Conversely, large coupling coefficients predict residue contacts. Methods for de-novo protein structure prediction based on this approach are becoming increasingly reliable. Their main limitation is the strong systematic and statistical noise in the estimation of coupling coefficients, which has so far limited their application to very large protein families. While most research has focused on improving predictions by adding external information, little progress has been made to improve the statistical procedure at the core, because our lack of understanding of the sources of noise poses a major obstacle. First, we show theoretically that the expectation value of the coupling score assuming no coupling is proportional to the product of the square roots of the column entropies, and we propose a simple entropy bias correction (EntC) that subtracts out this expectation value. Second, we show that the average product correction (APC) includes the correction of the entropy bias, partly explaining its success. Third, we have developed CCMgen, the first method for simulating protein evolution and generating realistic synthetic MSAs with pairwise statistical residue couplings. Fourth, to learn exact statistical models that reliably reproduce observed alignment statistics, we developed CCMpredPy, an implementation of the persistent contrastive divergence (PCD) method for exact inference. Fifth, we demonstrate how CCMgen and CCMpredPy can facilitate the development of contact prediction methods by analysing the systematic noise contributions from phylogeny and entropy. Using the entropy bias correction, we can disentangle both sources of noise and find that entropy contributes roughly twice as much noise as phylogeny.
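
The two score corrections named in this abstract can be sketched in a few lines of Python. This is a simplified illustration, not CCMpredPy code. The APC function uses the commonly cited form (row mean times column mean divided by overall mean) and, as a simplifying assumption, averages over all matrix entries including the diagonal. The entropy correction subtracts a term proportional to the product of the square roots of the column entropies, as the abstract describes; the proportionality constant `alpha` is a hypothetical parameter whose estimation is not covered here.

```python
import math


def average_product_correction(scores):
    """Subtract (row mean * column mean / overall mean) from every entry.
    Assumption: means are taken over all entries, diagonal included."""
    n = len(scores)
    row_means = [sum(row) / n for row in scores]
    col_means = [sum(scores[i][j] for i in range(n)) / n for j in range(n)]
    overall = sum(row_means) / n
    return [
        [scores[i][j] - row_means[i] * col_means[j] / overall for j in range(n)]
        for i in range(n)
    ]


def entropy_correction(scores, entropies, alpha):
    """Subtract alpha * sqrt(H_i) * sqrt(H_j) from every entry.
    `alpha` is a hypothetical, user-supplied proportionality constant."""
    n = len(scores)
    roots = [math.sqrt(h) for h in entropies]
    return [
        [scores[i][j] - alpha * roots[i] * roots[j] for j in range(n)]
        for i in range(n)
    ]


# A pure background signal of the form a_i * a_j is removed entirely by APC.
a = [1.0, 2.0, 3.0]
background = [[x * y for y in a] for x in a]
print(average_product_correction(background))
```

Both corrections remove a rank-one background from the coupling matrix, which gives some intuition for the abstract's statement that APC includes the correction of the entropy bias.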

Subjects

Proteins/chemistry , Sequence Alignment , Algorithms , Amino Acid Sequence , Binding Sites , Entropy , Noise , Sequence Homology, Amino Acid

18.

Uniclust databases of clustered and deeply annotated protein sequences and alignments.

Mirdita, Milot; von den Driesch, Lars; Galiez, Clovis; Martin, Maria J; Söding, Johannes; Steinegger, Martin.

Nucleic Acids Res ; 45(D1): D170-D176, 2017 01 04.

Article in English | MEDLINE | ID: mdl-27899574

ABSTRACT

We present three clustered protein sequence databases, Uniclust90, Uniclust50, Uniclust30 and three databases of multiple sequence alignments (MSAs), Uniboost10, Uniboost20 and Uniboost30, as a resource for protein sequence analysis, function prediction and sequence searches. The Uniclust databases cluster UniProtKB sequences at the level of 90%, 50% and 30% pairwise sequence identity. Uniclust90 and Uniclust50 clusters showed better consistency of functional annotation than those of UniRef90 and UniRef50, owing to an optimised clustering pipeline that runs with our MMseqs2 software for fast and sensitive protein sequence searching and clustering. Uniclust sequences are annotated with matches to Pfam, SCOP domains, and proteins in the PDB, using our HHblits homology detection tool. Due to its high sensitivity, Uniclust contains 17% more Pfam domain annotations than UniProt. Uniboost MSAs of three diversities are built by enriching the Uniclust30 MSAs with local sequence matches from MMseqs2 profile searches through Uniclust30. All databases can be downloaded from the Uniclust server at uniclust.mmseqs.com. Users can search clusters by keywords and explore their MSAs, taxonomic representation, and annotations. Uniclust is updated every two months with the new UniProt release.

Subjects

Computational Biology/methods , Databases, Nucleic Acid , Software , Cluster Analysis , Gene Ontology , Molecular Sequence Annotation , Web Browser

19.

WIsH: who is the host? Predicting prokaryotic hosts from metagenomic phage contigs.

Galiez, Clovis; Siebert, Matthias; Enault, François; Vincent, Jonathan; Söding, Johannes.

Bioinformatics ; 33(19): 3113-3114, 2017 Oct 01.

Article in English | MEDLINE | ID: mdl-28957499

ABSTRACT

SUMMARY: WIsH predicts prokaryotic hosts of phages from their genomic sequences. It achieves 63% mean accuracy when predicting the host genus among 20 genera for 3 kbp-long phage contigs. Over the best current tool, WIsH shows much improved accuracy on phage sequences of a few kbp length and runs hundreds of times faster, making it suited for metagenomics studies. AVAILABILITY AND IMPLEMENTATION: OpenMP-parallelized GPL-licensed C++ code available at https://github.com/soedinglab/wish. CONTACT: clovis.galiez@mpibpc.mpg.de or soeding@mpibpc.mpg.de. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Bacteriophages/genetics , Metagenomics/methods , Software , Archaea/virology , Bacteria/virology , Contig Mapping

20.

DBIRD complex integrates alternative mRNA splicing with RNA polymerase II transcript elongation.

Close, Pierre; East, Philip; Dirac-Svejstrup, A Barbara; Hartmann, Holger; Heron, Mark; Maslen, Sarah; Chariot, Alain; Söding, Johannes; Skehel, Mark; Svejstrup, Jesper Q.

Nature ; 484(7394): 386-9, 2012 Mar 25.

Article in English | MEDLINE | ID: mdl-22446626

ABSTRACT

Alternative messenger RNA splicing is the main reason that vast mammalian proteomic complexity can be achieved with a limited number of genes. Splicing is physically and functionally coupled to transcription, and is greatly affected by the rate of transcript elongation. As the nascent pre-mRNA emerges from transcribing RNA polymerase II (RNAPII), it is assembled into a messenger ribonucleoprotein (mRNP) particle; this is the functional form of the nascent pre-mRNA and determines the fate of the mature transcript. However, factors that connect the transcribing polymerase with the mRNP particle and help to integrate transcript elongation with mRNA splicing remain unclear. Here we characterize the human interactome of chromatin-associated mRNP particles. This led us to identify deleted in breast cancer 1 (DBC1) and ZNF326 (which we call ZNF-protein interacting with nuclear mRNPs and DBC1 (ZIRD)) as subunits of a novel protein complex--named DBIRD--that binds directly to RNAPII. DBIRD regulates alternative splicing of a large set of exons embedded in (A + T)-rich DNA, and is present at the affected exons. RNA-interference-mediated DBIRD depletion results in region-specific decreases in transcript elongation, particularly across areas encompassing affected exons. Together, these data indicate that the DBIRD complex acts at the interface between mRNP particles and RNAPII, integrating transcript elongation with the regulation of alternative splicing.

Subjects

Alternative Splicing , Multiprotein Complexes/chemistry , Multiprotein Complexes/metabolism , RNA Polymerase II/metabolism , RNA, Messenger/biosynthesis , RNA, Messenger/genetics , Transcription, Genetic , Adaptor Proteins, Signal Transducing/genetics , Adaptor Proteins, Signal Transducing/metabolism , Animals , Carrier Proteins/genetics , Carrier Proteins/metabolism , Chromatin/genetics , Chromatin/metabolism , Exons/genetics , HEK293 Cells , Heterogeneous-Nuclear Ribonucleoproteins/deficiency , Heterogeneous-Nuclear Ribonucleoproteins/metabolism , Humans , Mice , Multiprotein Complexes/genetics , RNA Interference , RNA, Messenger/metabolism , Ribonucleoproteins/chemistry , Ribonucleoproteins/genetics , Ribonucleoproteins/metabolism
