Search | BVS Ecuador Search Portal

1.

RECQL5 controls transcript elongation and suppresses genome instability associated with transcription stress.

Saponaro, Marco; Kantidakis, Theodoros; Mitter, Richard; Kelly, Gavin P; Heron, Mark; Williams, Hannah; Söding, Johannes; Stewart, Aengus; Svejstrup, Jesper Q.

Cell ; 157(5): 1037-49, 2014 May 22.

Article in English | MEDLINE | ID: mdl-24836610

ABSTRACT

RECQL5 is the sole member of the RECQ family of helicases associated with RNA polymerase II (RNAPII). We now show that RECQL5 is a general elongation factor that is important for preserving genome stability during transcription. Depletion or overexpression of RECQL5 results in corresponding shifts in the genome-wide RNAPII density profile. Elongation is particularly affected, with RECQL5 depletion causing a striking increase in the average rate, concurrent with increased stalling, pausing, arrest, and/or backtracking (transcription stress). RECQL5 therefore controls the movement of RNAPII across genes. Loss of RECQL5 also results in the loss or gain of genomic regions, with the breakpoints of lost regions located in genes and common fragile sites. The chromosomal breakpoints overlap with areas of elevated transcription stress, suggesting that RECQL5 suppresses such stress and its detrimental effects, and thereby prevents genome instability in the transcribed region of genes.

Subject(s)

Genomic Instability; RecQ Helicases/metabolism; Transcription Elongation, Genetic; Transcription, Genetic; Genome, Human; HEK293 Cells; Humans; RNA Polymerase II/metabolism

2.

A High-Throughput Screen for Transcription Activation Domains Reveals Their Sequence Features and Permits Prediction by Deep Learning.

Erijman, Ariel; Kozlowski, Lukasz; Sohrabi-Jahromi, Salma; Fishburn, James; Warfield, Linda; Schreiber, Jacob; Noble, William S; Söding, Johannes; Hahn, Steven.

Mol Cell ; 78(5): 890-902.e6, 2020 Jun 04.

Article in English | MEDLINE | ID: mdl-32416068

ABSTRACT

Acidic transcription activation domains (ADs) are encoded by a wide range of seemingly unrelated amino acid sequences, making it difficult to recognize features that promote their dynamic behavior, "fuzzy" interactions, and target specificity. We screened a large set of random 30-mer peptides for AD function in yeast and trained a deep neural network (ADpred) on the AD-positive and -negative sequences. ADpred identifies known acidic ADs within transcription factors and accurately predicts the consequences of mutations. Our work reveals that strong acidic ADs contain multiple clusters of hydrophobic residues near acidic side chains, explaining why ADs often have a biased amino acid composition. ADs likely use a binding mechanism similar to avidity where a minimum number of weak dynamic interactions are required between activator and target to generate biologically relevant affinity and in vivo function. This mechanism explains the basis for fuzzy binding observed between acidic ADs and targets.

Subject(s)

High-Throughput Screening Assays/methods; Transcription Factors/genetics; Transcriptional Activation/genetics; Amino Acid Sequence/genetics; Basic-Leucine Zipper Transcription Factors/genetics; DNA-Binding Proteins/metabolism; Deep Learning; Protein Binding; Protein Domains/genetics; Saccharomyces cerevisiae/genetics; Saccharomyces cerevisiae Proteins/genetics; Saccharomyces cerevisiae Proteins/metabolism; Trans-Activators/genetics; Trans-Activators/metabolism; Transcription Factors/metabolism; Transcriptional Activation/physiology

3.

The Mre11:Rad50 structure shows an ATP-dependent molecular clamp in DNA double-strand break repair.

Lammens, Katja; Bemeleit, Derk J; Möckel, Carolin; Clausing, Emanuel; Schele, Alexandra; Hartung, Sophia; Schiller, Christian B; Lucas, Maria; Angermüller, Christof; Söding, Johannes; Strässer, Katja; Hopfner, Karl-Peter.

Cell ; 145(1): 54-66, 2011 Apr 01.

Article in English | MEDLINE | ID: mdl-21458667

ABSTRACT

The MR (Mre11 nuclease and Rad50 ABC ATPase) complex is an evolutionarily conserved sensor for DNA double-strand breaks, highly genotoxic lesions linked to cancer development. MR can recognize and process DNA ends even if they are blocked and misfolded. To reveal its mechanism, we determined the crystal structure of the catalytic head of Thermotoga maritima MR and analyzed ATP-dependent conformational changes. MR adopts an open form with a central Mre11 nuclease dimer and two peripheral Rad50 molecules, a form suited for sensing obstructed breaks. The Mre11 C-terminal helix-loop-helix domain binds Rad50 and attaches flexibly to the nuclease domain, enabling large conformational changes. ATP binding to the two Rad50 subunits induces a rotation of the Mre11 helix-loop-helix and Rad50 coiled-coil domains, creating a clamp conformation with increased DNA-binding activity. The results suggest that MR is an ATP-controlled transient molecular clamp at DNA double-strand breaks.

Subject(s)

Adenosine Triphosphate/metabolism; Bacterial Proteins/chemistry; DNA Repair Enzymes/chemistry; DNA Repair; DNA-Binding Proteins/chemistry; Thermotoga maritima/chemistry; Bacterial Proteins/genetics; Bacterial Proteins/metabolism; Crystallography, X-Ray; DNA Breaks, Double-Stranded; DNA Repair Enzymes/genetics; DNA Repair Enzymes/metabolism; DNA-Binding Proteins/genetics; DNA-Binding Proteins/metabolism; Endodeoxyribonucleases/chemistry; Endodeoxyribonucleases/metabolism; Exodeoxyribonucleases/chemistry; Exodeoxyribonucleases/metabolism; Models, Molecular; Saccharomyces cerevisiae/chemistry; Saccharomyces cerevisiae Proteins/chemistry; Saccharomyces cerevisiae Proteins/metabolism; Scattering, Small Angle; Thermotoga maritima/metabolism; X-Ray Diffraction

4.

DescribePROT in 2023: more, higher-quality and experimental annotations and improved data download options.

Basu, Sushmita; Zhao, Bi; Biró, Bálint; Faraggi, Eshel; Gsponer, Jörg; Hu, Gang; Kloczkowski, Andrzej; Malhis, Nawar; Mirdita, Milot; Söding, Johannes; Steinegger, Martin; Wang, Duolin; Wang, Kui; Xu, Dong; Zhang, Jian; Kurgan, Lukasz.

Nucleic Acids Res ; 52(D1): D426-D433, 2024 Jan 05.

Article in English | MEDLINE | ID: mdl-37933852

ABSTRACT

The DescribePROT database of amino acid-level descriptors of protein structures and functions has been substantially expanded since its release in 2020. This expansion includes a substantial increase in the size, scope, and quality of the underlying data, the addition of experimental structural information, the inclusion of new data download options, and an upgraded graphical interface. DescribePROT currently covers 19 structural and functional descriptors for proteins in 273 reference proteomes generated by 11 accurate and complementary predictive tools. Users can search our resource in multiple ways, interact with the data using the graphical interface, and download data at various scales including individual proteins, entire proteomes, and the whole database. The annotations in DescribePROT are useful for a broad spectrum of studies that include investigations of protein structure and function, development and validation of predictive tools, and efforts to understand the molecular underpinnings of diseases and to develop therapeutics. DescribePROT can be freely accessed at http://biomine.cs.vcu.edu/servers/DESCRIBEPROT/.

Subject(s)

Amino Acids; Proteome; Proteome/chemistry; Databases, Factual

5.

Genome-wide Analysis of RNA Polymerase II Termination at Protein-Coding Genes.

Baejen, Carlo; Andreani, Jessica; Torkler, Phillipp; Battaglia, Sofia; Schwalb, Bjoern; Lidschreiber, Michael; Maier, Kerstin C; Boltendahl, Andrea; Rus, Petra; Esslinger, Stephanie; Söding, Johannes; Cramer, Patrick.

Mol Cell ; 66(1): 38-49.e6, 2017 Apr 06.

Article in English | MEDLINE | ID: mdl-28318822

ABSTRACT

At the end of protein-coding genes, RNA polymerase (Pol) II undergoes a concerted transition that involves 3'-processing of the pre-mRNA and transcription termination. Here, we present a genome-wide analysis of the 3'-transition in budding yeast. We find that the 3'-transition globally requires the Pol II elongation factor Spt5 and factors involved in the recognition of the polyadenylation (pA) site and in endonucleolytic RNA cleavage. Pol II release from DNA occurs in a narrow termination window downstream of the pA site and requires the "torpedo" exonuclease Rat1 (XRN2 in human). The Rat1-interacting factor Rai1 contributes to RNA degradation downstream of the pA site. Defects in the 3'-transition can result in increased transcription at downstream genes.

Subject(s)

DNA, Fungal/metabolism; RNA 3' End Processing; RNA Polymerase II/metabolism; RNA Precursors/biosynthesis; RNA, Fungal/biosynthesis; RNA, Messenger/biosynthesis; Saccharomyces cerevisiae Proteins/metabolism; Saccharomyces cerevisiae/enzymology; Binding Sites; Chromosomal Proteins, Non-Histone/genetics; Chromosomal Proteins, Non-Histone/metabolism; DNA, Fungal/genetics; Exoribonucleases/genetics; Exoribonucleases/metabolism; Models, Genetic; Protein Binding; RNA Polymerase II/genetics; RNA Precursors/genetics; RNA, Fungal/genetics; RNA, Messenger/genetics; Saccharomyces cerevisiae/genetics; Saccharomyces cerevisiae Proteins/genetics; Transcriptional Elongation Factors/genetics; Transcriptional Elongation Factors/metabolism; mRNA Cleavage and Polyadenylation Factors/genetics; mRNA Cleavage and Polyadenylation Factors/metabolism

6.

Modulations of DNA Contacts by Linker Histones and Post-translational Modifications Determine the Mobility and Modifiability of Nucleosomal H3 Tails.

Stützer, Alexandra; Liokatis, Stamatios; Kiesel, Anja; Schwarzer, Dirk; Sprangers, Remco; Söding, Johannes; Selenko, Philipp; Fischle, Wolfgang.

Mol Cell ; 61(2): 247-59, 2016 Jan 21.

Article in English | MEDLINE | ID: mdl-26778125

ABSTRACT

Post-translational histone modifications and linker histone incorporation regulate chromatin structure and genome activity. How these systems interface on a molecular level is unclear. Using biochemistry and NMR spectroscopy, we deduced mechanistic insights into the modification behavior of N-terminal histone H3 tails in different nucleosomal contexts. We find that linker histones generally inhibit modifications of different H3 sites and reduce H3 tail dynamics in nucleosomes. These effects are caused by modulations of electrostatic interactions of H3 tails with linker DNA and largely depend on the C-terminal domains of linker histones. In agreement, linker histone occupancy and H3 tail modifications segregate on a genome-wide level. Charge-modulating modifications such as phosphorylation and acetylation weaken transient H3 tail-linker DNA interactions, increase H3 tail dynamics, and, concomitantly, enhance general modifiability. We propose that alterations of H3 tail-linker DNA interactions by linker histones and charge-modulating modifications execute basal control mechanisms of chromatin function.

Subject(s)

DNA/metabolism; Histones/metabolism; Nucleosomes/metabolism; Protein Processing, Post-Translational; Acetylation; Amino Acid Sequence; Animals; Genome; Histones/chemistry; Molecular Sequence Data; Phosphorylation; Protein Binding; Xenopus laevis

7.

DescribePROT: database of amino acid-level protein structure and function predictions.

Zhao, Bi; Katuwawala, Akila; Oldfield, Christopher J; Dunker, A Keith; Faraggi, Eshel; Gsponer, Jörg; Kloczkowski, Andrzej; Malhis, Nawar; Mirdita, Milot; Obradovic, Zoran; Söding, Johannes; Steinegger, Martin; Zhou, Yaoqi; Kurgan, Lukasz.

Nucleic Acids Res ; 49(D1): D298-D308, 2021 Jan 08.

Article in English | MEDLINE | ID: mdl-33119734

ABSTRACT

We present DescribePROT, the database of predicted amino acid-level descriptors of structure and function of proteins. DescribePROT delivers a comprehensive collection of 13 complementary descriptors predicted using 10 popular and accurate algorithms for 83 complete proteomes that cover key model organisms. The current version includes 7.8 billion predictions for close to 600 million amino acids in 1.4 million proteins. The descriptors encompass sequence conservation, position specific scoring matrix, secondary structure, solvent accessibility, intrinsic disorder, disordered linkers, signal peptides, MoRFs and interactions with proteins, DNA and RNAs. Users can search DescribePROT by the amino acid sequence and the UniProt accession number and entry name. The pre-computed results are made available instantaneously. The predictions can be accessed via an interactive graphical interface that allows simultaneous analysis of multiple descriptors and can also be downloaded in structured formats at the protein, proteome and whole database scale. The putative annotations included in DescribePROT are useful for a broad range of studies, including investigations of protein function, applied projects focusing on therapeutics and diseases, and the development of predictors for other protein sequence descriptors. Future releases will expand the coverage of DescribePROT. DescribePROT can be accessed at http://biomine.cs.vcu.edu/servers/DESCRIBEPROT/.

Subject(s)

Amino Acids/chemistry; Databases, Protein; Genome; Proteins/genetics; Proteome/genetics; Software; Amino Acid Sequence; Amino Acids/metabolism; Animals; Archaea/genetics; Archaea/metabolism; Bacteria/genetics; Bacteria/metabolism; Binding Sites; Conserved Sequence; Fungi/genetics; Fungi/metabolism; Humans; Internet; Plants/genetics; Plants/metabolism; Prokaryotic Cells/metabolism; Protein Binding; Protein Structure, Secondary; Proteins/chemistry; Proteins/classification; Proteins/metabolism; Proteome/chemistry; Proteome/metabolism; Sequence Analysis, Protein; Viruses/genetics; Viruses/metabolism

8.

Protein-level assembly increases protein sequence recovery from metagenomic samples manyfold.

Steinegger, Martin; Mirdita, Milot; Söding, Johannes.

Nat Methods ; 16(7): 603-606, 2019 Jul.

Article in English | MEDLINE | ID: mdl-31235882

ABSTRACT

The open-source de novo protein-level assembler, Plass (https://plass.mmseqs.com), assembles six-frame-translated sequencing reads into protein sequences. It recovers 2-10 times more protein sequences from complex metagenomes and can assemble huge datasets. We assembled two redundancy-filtered reference protein catalogs, 2 billion sequences from 640 soil samples (soil reference protein catalog) and 292 million sequences from 775 marine eukaryotic metatranscriptomes (marine eukaryotic reference catalog), the largest free collections of protein sequences.
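
For readers unfamiliar with the term, the six-frame translation mentioned in this abstract can be sketched as follows. This is a minimal illustration, not Plass code: the function names `translate` and `six_frame_translate` are hypothetical, the standard genetic code (NCBI translation table 1) is assumed, and codons containing ambiguous bases are rendered as `X`.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(seq):
    """Translate a DNA string codon by codon; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translate(dna):
    """Return the three forward-strand frames followed by the three reverse-complement frames."""
    dna = dna.upper()
    reverse_complement = dna.translate(COMPLEMENT)[::-1]
    return [
        translate(strand[offset:])
        for strand in (dna, reverse_complement)
        for offset in range(3)
    ]


if __name__ == "__main__":
    print(six_frame_translate("ATGGCC"))
```

Each read yields six candidate protein fragments; a protein-level assembler then works on these fragments rather than on the nucleotide reads.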

Subject(s)

Metagenomics; Proteins/chemistry; Amino Acid Sequence; Codon; Open Reading Frames

9.

Thermodynamic modeling reveals widespread multivalent binding by RNA-binding proteins.

Sohrabi-Jahromi, Salma; Söding, Johannes.

Bioinformatics ; 37(Suppl_1): i308-i316, 2021 Jul 12.

Article in English | MEDLINE | ID: mdl-34252974

ABSTRACT

MOTIVATION: Understanding how proteins recognize their RNA targets is essential to elucidate regulatory processes in the cell. Many RNA-binding proteins (RBPs) form complexes or have multiple domains that allow them to bind to RNA in a multivalent, cooperative manner. They can thereby achieve higher specificity and affinity than proteins with a single RNA-binding domain. However, current approaches to de novo discovery of RNA binding motifs do not take multivalent binding into account. RESULTS: We present Bipartite Motif Finder (BMF), which is based on a thermodynamic model of RBPs with two cooperatively binding RNA-binding domains. We show that bivalent binding is a common strategy among RBPs, yielding higher affinity and sequence specificity. We furthermore illustrate that the spatial geometry between the binding sites can be learned from bound RNA sequences. These discovered bipartite motifs are consistent with previously known motifs and binding behaviors. Our results demonstrate the importance of multivalent binding for RNA-binding proteins and highlight the value of bipartite motif models in representing the multivalency of protein-RNA interactions. AVAILABILITY AND IMPLEMENTATION: BMF source code is available at https://github.com/soedinglab/bipartite_motif_finder under a GPL license. The BMF web server is accessible at https://bmf.soedinglab.org. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

RNA-Binding Proteins; Software; Binding Sites; Protein Binding; RNA/metabolism; RNA-Binding Proteins/metabolism; Thermodynamics

10.

SpacePHARER: sensitive identification of phages from CRISPR spacers in prokaryotic hosts.

Zhang, Ruoshi; Mirdita, Milot; Levy Karin, Eli; Norroy, Clovis; Galiez, Clovis; Söding, Johannes.

Bioinformatics ; 37(19): 3364-3366, 2021 Oct 11.

Article in English | MEDLINE | ID: mdl-33792634

ABSTRACT

SUMMARY: SpacePHARER (CRISPR Spacer Phage-Host Pair Finder) is a sensitive and fast tool for de novo prediction of phage-host relationships via identifying phage genomes that match CRISPR spacers in genomic or metagenomic data. SpacePHARER gains sensitivity by comparing spacers and phages at the protein level, optimizing its scores for matching very short sequences, and combining evidence from multiple matches, while controlling for false positives. We demonstrate SpacePHARER by searching a comprehensive spacer list against all complete phage genomes. AVAILABILITY AND IMPLEMENTATION: SpacePHARER is available as an open-source (GPLv3), user-friendly command-line software for Linux and macOS: https://github.com/soedinglab/spacepharer. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

11.

Transcriptome maps of mRNP biogenesis factors define pre-mRNA recognition.

Baejen, Carlo; Torkler, Phillipp; Gressel, Saskia; Essig, Katharina; Söding, Johannes; Cramer, Patrick.

Mol Cell ; 55(5): 745-57, 2014 Sep 04.

Article in English | MEDLINE | ID: mdl-25192364

ABSTRACT

Biogenesis of eukaryotic messenger ribonucleoprotein complexes (mRNPs) involves the synthesis, splicing, and 3' processing of pre-mRNA, and the assembly of mature mRNPs for nuclear export. We mapped 23 mRNP biogenesis factors onto the yeast transcriptome, providing 10^4-10^6 high-confidence RNA interaction sites per factor. The data reveal how mRNP biogenesis factors recognize pre-mRNA elements in vivo. They define conserved interactions between splicing factors and pre-mRNA introns, including the recognition of intron-exon junctions and the branchpoint. They also identify a unified arrangement of 3' processing factors at pre-mRNA polyadenylation (pA) sites in yeast and human, which results from an A-U sequence bias at pA sites. Global data analysis indicates that 3' processing factors have roles in splicing and RNA surveillance, and that they couple mRNP biogenesis events to restrict nuclear export to mature mRNPs.

Subject(s)

Models, Genetic; RNA Precursors/metabolism; RNA, Messenger/metabolism; Ribonucleoproteins/biosynthesis; Active Transport, Cell Nucleus; Gene Expression Profiling; Humans; Introns; RNA Precursors/chemistry; RNA Splicing; RNA, Messenger/chemistry; Saccharomyces cerevisiae/genetics

12.

Bayesian multiple logistic regression for case-control GWAS.

Banerjee, Saikat; Zeng, Lingyao; Schunkert, Heribert; Söding, Johannes.

PLoS Genet ; 14(12): e1007856, 2018 Dec.

Article in English | MEDLINE | ID: mdl-30596640

ABSTRACT

Genetic variants in genome-wide association studies (GWAS) are tested for disease association mostly using simple regression, one variant at a time. Standard approaches to improve power in detecting disease-associated SNPs use multiple regression with Bayesian variable selection in which a sparsity-enforcing prior on effect sizes is used to avoid overtraining and all effect sizes are integrated out for posterior inference. For binary traits, the logistic model has not yielded clear improvements over the linear model. For multi-SNP analysis, the logistic model required costly and technically challenging MCMC sampling to perform the integration. Here, we introduce the quasi-Laplace approximation to solve the integral and avoid MCMC sampling. We expect the logistic model to perform much better than multiple linear regression except when predicted disease risks are spread closely around 0.5, because only close to its inflection point can the logistic function be well approximated by a linear function. Indeed, in extensive benchmarks with simulated phenotypes and real genotypes, our Bayesian multiple LOgistic REgression method (B-LORE) showed considerable improvements (1) when regressing on many variants in multiple loci at heritabilities ≥ 0.4 and (2) for unbalanced case-control ratios. B-LORE also enables meta-analysis by approximating the likelihood functions of individual studies by multivariate normal distributions, using their means and covariance matrices as summary statistics. Our work should make sparse multiple logistic regression attractive also for other applications with binary target variables. B-LORE is freely available from: https://github.com/soedinglab/b-lore.
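
The abstract's remark that the logistic function is well approximated by a linear function only near its inflection point can be checked numerically. The sketch below is an illustration only and is not taken from B-LORE; the helper names are hypothetical. It compares the logistic function with its first-order Taylor expansion at 0, where the function value is 0.5 and the slope is 1/4.

```python
import math


def logistic(x):
    """Logistic function, mapping a linear predictor to a risk in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def linear_at_inflection(x):
    """First-order Taylor expansion of the logistic function around x = 0."""
    return 0.5 + x / 4.0


def approximation_error(x):
    """Absolute gap between the logistic function and its linearization."""
    return abs(logistic(x) - linear_at_inflection(x))


if __name__ == "__main__":
    for x in (0.0, 0.1, 0.5, 1.0, 2.0, 4.0):
        print(f"x={x:4.1f}  risk={logistic(x):.4f}  error={approximation_error(x):.4f}")
```

The error is negligible while the predicted risk stays close to 0.5 and grows quickly as the risk approaches 0 or 1, which is the regime where the abstract expects the logistic model to outperform multiple linear regression.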

Subject(s)

Bayes Theorem; Genome-Wide Association Study/statistics & numerical data; Logistic Models; Models, Genetic; Case-Control Studies; Computer Simulation; Coronary Artery Disease/genetics; Genetic Variation; Humans; Likelihood Functions; Multifactorial Inheritance; Phenotype; Polymorphism, Single Nucleotide; Software

13.

PROSSTT: probabilistic simulation of single-cell RNA-seq data for complex differentiation processes.

Papadopoulos, Nikolaos; Gonzalo, Parra R; Söding, Johannes.

Bioinformatics ; 35(18): 3517-3519, 2019 Sep 15.

Article in English | MEDLINE | ID: mdl-30715210

ABSTRACT

SUMMARY: Cellular lineage trees can be derived from single-cell RNA sequencing snapshots of differentiating cells. Currently, only datasets with simple topologies are available. To test and further develop tools for lineage tree reconstruction, we need test datasets with known complex topologies. PROSSTT can simulate scRNA-seq datasets for differentiation processes with lineage trees of any desired complexity, noise level, noise model and size. PROSSTT also provides scripts to quantify the quality of predicted lineage trees. AVAILABILITY AND IMPLEMENTATION: https://github.com/soedinglab/prosstt. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Software; Cell Differentiation; Gene Expression Profiling; RNA-Seq; Single-Cell Analysis

14.

MMseqs2 desktop and local web server app for fast, interactive sequence searches.

Mirdita, Milot; Steinegger, Martin; Söding, Johannes.

Bioinformatics ; 35(16): 2856-2858, 2019 Aug 15.

Article in English | MEDLINE | ID: mdl-30615063

ABSTRACT

SUMMARY: The MMseqs2 desktop and web server app facilitates interactive sequence searches through custom protein sequence and profile databases on personal workstations. By eliminating MMseqs2's runtime overhead, we reduced response times to a few seconds at sensitivities close to BLAST. AVAILABILITY AND IMPLEMENTATION: The app is easy to install for non-experts. GPLv3-licensed code, pre-built desktop app packages for Windows, MacOS and Linux, Docker images for the web server application and a demo web server are available at https://search.mmseqs.com. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Computers; Software; Amino Acid Sequence; Databases, Factual

15.

The BaMM web server for de-novo motif discovery and regulatory sequence analysis.

Kiesel, Anja; Roth, Christian; Ge, Wanwan; Wess, Maximilian; Meier, Markus; Söding, Johannes.

Nucleic Acids Res ; 46(W1): W215-W220, 2018 Jul 02.

Article in English | MEDLINE | ID: mdl-29846656

ABSTRACT

The BaMM web server offers four tools: (i) de-novo discovery of enriched motifs in a set of nucleotide sequences, (ii) scanning a set of nucleotide sequences with motifs to find motif occurrences, (iii) searching with an input motif for similar motifs in our BaMM database with motifs for >1000 transcription factors, trained from the GTRD ChIP-seq database and (iv) browsing and keyword searching the motif database. In contrast to most other servers, we represent sequence motifs not by position weight matrices (PWMs) but by Bayesian Markov Models (BaMMs) of order 4, which we showed previously to perform substantially better in ROC analyses than PWMs or first order models. To address the inadequacy of P- and E-values as measures of motif quality, we introduce the AvRec score, the average recall over the TP-to-FP ratio between 1 and 100. The BaMM server is freely accessible without registration at https://bammmotif.mpibpc.mpg.de.

Subject(s)

Nucleotide Motifs; Regulatory Sequences, Nucleic Acid; Software; Animals; Bayes Theorem; Databases, Nucleic Acid; Humans; Internet; Markov Chains; Mice; Rats; Sequence Analysis; Transcription Factors/metabolism

16.

HH-suite3 for fast remote homology detection and deep protein annotation.

Steinegger, Martin; Meier, Markus; Mirdita, Milot; Vöhringer, Harald; Haunsberger, Stephan J; Söding, Johannes.

BMC Bioinformatics ; 20(1): 473, 2019 Sep 14.

Article in English | MEDLINE | ID: mdl-31521110

ABSTRACT

BACKGROUND: HH-suite is a widely used open source software suite for sensitive sequence similarity searches and protein fold recognition. It is based on pairwise alignment of profile Hidden Markov models (HMMs), which represent multiple sequence alignments of homologous proteins. RESULTS: We developed a single-instruction multiple-data (SIMD) vectorized implementation of the Viterbi algorithm for profile HMM alignment and introduced various other speed-ups. These accelerated the search methods HHsearch by a factor of 4 and HHblits by a factor of 2 over the previous version 2.0.16. HHblits3 is ~10× faster than PSI-BLAST and ~20× faster than HMMER3. Jobs to perform HHsearch and HHblits searches with many query profile HMMs can be parallelized over cores and over cluster servers using OpenMP and message passing interface (MPI). The free, open-source, GPLv3-licensed software is available at https://github.com/soedinglab/hh-suite. CONCLUSION: The added functionalities and increased speed of HHsearch and HHblits should facilitate their use in large-scale protein structure and function prediction, e.g. in metagenomics and genomics projects.

Subject(s)

Molecular Sequence Annotation/methods; Sequence Alignment/methods; Sequence Analysis, Protein/methods; Software; Algorithms; Markov Chains

17.

Synthetic protein alignments by CCMgen quantify noise in residue-residue contact prediction.

Vorberg, Susann; Seemayer, Stefan; Söding, Johannes.

PLoS Comput Biol ; 14(11): e1006526, 2018 Nov.

Article in English | MEDLINE | ID: mdl-30395601

ABSTRACT

Compensatory mutations between protein residues in physical contact can manifest themselves as statistical couplings between the corresponding columns in a multiple sequence alignment (MSA) of the protein family. Conversely, large coupling coefficients predict residue contacts. Methods for de-novo protein structure prediction based on this approach are becoming increasingly reliable. Their main limitation is the strong systematic and statistical noise in the estimation of coupling coefficients, which has so far limited their application to very large protein families. While most research has focused on improving predictions by adding external information, little progress has been made to improve the statistical procedure at the core, because our lack of understanding of the sources of noise poses a major obstacle. First, we show theoretically that the expectation value of the coupling score assuming no coupling is proportional to the product of the square roots of the column entropies, and we propose a simple entropy bias correction (EntC) that subtracts out this expectation value. Second, we show that the average product correction (APC) includes the correction of the entropy bias, partly explaining its success. Third, we have developed CCMgen, the first method for simulating protein evolution and generating realistic synthetic MSAs with pairwise statistical residue couplings. Fourth, to learn exact statistical models that reliably reproduce observed alignment statistics, we developed CCMpredPy, an implementation of the persistent contrastive divergence (PCD) method for exact inference. Fifth, we demonstrate how CCMgen and CCMpredPy can facilitate the development of contact prediction methods by analysing the systematic noise contributions from phylogeny and entropy. Using the entropy bias correction, we can disentangle both sources of noise and find that entropy contributes roughly twice as much noise as phylogeny.
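
The average product correction (APC) named in this abstract has a compact standard form: each coupling score is reduced by the product of its row and column means divided by the overall mean. The sketch below is a generic illustration of that formula and is not CCMpredPy code. The function name is hypothetical, and it is an assumption here that the means are taken over all entries of the matrix as given (implementations differ on whether the diagonal is included) and that the overall mean is non-zero.

```python
def average_product_correction(scores):
    """Return an APC-corrected copy of a square, symmetric score matrix.

    corrected[i][j] = scores[i][j] - mean_i * mean_j / overall_mean
    """
    n = len(scores)
    row_mean = [sum(row) / n for row in scores]   # equals the column means when symmetric
    overall_mean = sum(row_mean) / n              # assumed non-zero
    return [
        [scores[i][j] - row_mean[i] * row_mean[j] / overall_mean for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    # A matrix that is a pure product of per-column factors carries no
    # pair-specific signal, so APC removes it entirely.
    factors = [1.0, 2.0, 3.0]
    background = [[a * b for b in factors] for a in factors]
    print(average_product_correction(background))
```

The rank-one example shows why a per-column bias, such as the entropy-dependent background the abstract describes, is at least partly removed by APC: a score that factorizes over the two columns is subtracted out, leaving only the pair-specific part.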

Subject(s)

Proteins/chemistry; Sequence Alignment; Algorithms; Amino Acid Sequence; Binding Sites; Entropy; Noise; Sequence Homology, Amino Acid

18.

Uniclust databases of clustered and deeply annotated protein sequences and alignments.

Mirdita, Milot; von den Driesch, Lars; Galiez, Clovis; Martin, Maria J; Söding, Johannes; Steinegger, Martin.

Nucleic Acids Res ; 45(D1): D170-D176, 2017 Jan 04.

Article in English | MEDLINE | ID: mdl-27899574

ABSTRACT

We present three clustered protein sequence databases, Uniclust90, Uniclust50, Uniclust30 and three databases of multiple sequence alignments (MSAs), Uniboost10, Uniboost20 and Uniboost30, as a resource for protein sequence analysis, function prediction and sequence searches. The Uniclust databases cluster UniProtKB sequences at the level of 90%, 50% and 30% pairwise sequence identity. Uniclust90 and Uniclust50 clusters showed better consistency of functional annotation than those of UniRef90 and UniRef50, owing to an optimised clustering pipeline that runs with our MMseqs2 software for fast and sensitive protein sequence searching and clustering. Uniclust sequences are annotated with matches to Pfam, SCOP domains, and proteins in the PDB, using our HHblits homology detection tool. Due to its high sensitivity, Uniclust contains 17% more Pfam domain annotations than UniProt. Uniboost MSAs of three diversities are built by enriching the Uniclust30 MSAs with local sequence matches from MMseqs2 profile searches through Uniclust30. All databases can be downloaded from the Uniclust server at uniclust.mmseqs.com. Users can search clusters by keywords and explore their MSAs, taxonomic representation, and annotations. Uniclust is updated every two months with the new UniProt release.

Subject(s)

Computational Biology/methods; Databases, Nucleic Acid; Software; Cluster Analysis; Gene Ontology; Molecular Sequence Annotation; Web Browser

19.

WIsH: who is the host? Predicting prokaryotic hosts from metagenomic phage contigs.

Galiez, Clovis; Siebert, Matthias; Enault, François; Vincent, Jonathan; Söding, Johannes.

Bioinformatics ; 33(19): 3113-3114, 2017 Oct 01.

Article in English | MEDLINE | ID: mdl-28957499

ABSTRACT

SUMMARY: WIsH predicts prokaryotic hosts of phages from their genomic sequences. It achieves 63% mean accuracy when predicting the host genus among 20 genera for 3 kbp-long phage contigs. Over the best current tool, WIsH shows much improved accuracy on phage sequences of a few kbp length and runs hundreds of times faster, making it suited for metagenomics studies. AVAILABILITY AND IMPLEMENTATION: OpenMP-parallelized GPL-licensed C++ code available at https://github.com/soedinglab/wish. CONTACT: clovis.galiez@mpibpc.mpg.de or soeding@mpibpc.mpg.de. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Bacteriophages/genetics; Metagenomics/methods; Software; Archaea/virology; Bacteria/virology; Contig Mapping

20.

DBIRD complex integrates alternative mRNA splicing with RNA polymerase II transcript elongation.

Close, Pierre; East, Philip; Dirac-Svejstrup, A Barbara; Hartmann, Holger; Heron, Mark; Maslen, Sarah; Chariot, Alain; Söding, Johannes; Skehel, Mark; Svejstrup, Jesper Q.

Nature ; 484(7394): 386-9, 2012 Mar 25.

Article in English | MEDLINE | ID: mdl-22446626

ABSTRACT

Alternative messenger RNA splicing is the main reason that vast mammalian proteomic complexity can be achieved with a limited number of genes. Splicing is physically and functionally coupled to transcription, and is greatly affected by the rate of transcript elongation. As the nascent pre-mRNA emerges from transcribing RNA polymerase II (RNAPII), it is assembled into a messenger ribonucleoprotein (mRNP) particle; this is the functional form of the nascent pre-mRNA and determines the fate of the mature transcript. However, factors that connect the transcribing polymerase with the mRNP particle and help to integrate transcript elongation with mRNA splicing remain unclear. Here we characterize the human interactome of chromatin-associated mRNP particles. This led us to identify deleted in breast cancer 1 (DBC1) and ZNF326 (which we call ZNF-protein interacting with nuclear mRNPs and DBC1 (ZIRD)) as subunits of a novel protein complex, named DBIRD, that binds directly to RNAPII. DBIRD regulates alternative splicing of a large set of exons embedded in (A + T)-rich DNA, and is present at the affected exons. RNA-interference-mediated DBIRD depletion results in region-specific decreases in transcript elongation, particularly across areas encompassing affected exons. Together, these data indicate that the DBIRD complex acts at the interface between mRNP particles and RNAPII, integrating transcript elongation with the regulation of alternative splicing.

Subject(s)

Alternative Splicing; Multiprotein Complexes/chemistry; Multiprotein Complexes/metabolism; RNA Polymerase II/metabolism; RNA, Messenger/biosynthesis; RNA, Messenger/genetics; Transcription, Genetic; Adaptor Proteins, Signal Transducing/genetics; Adaptor Proteins, Signal Transducing/metabolism; Animals; Carrier Proteins/genetics; Carrier Proteins/metabolism; Chromatin/genetics; Chromatin/metabolism; Exons/genetics; HEK293 Cells; Heterogeneous-Nuclear Ribonucleoproteins/deficiency; Heterogeneous-Nuclear Ribonucleoproteins/metabolism; Humans; Mice; Multiprotein Complexes/genetics; RNA Interference; RNA, Messenger/metabolism; Ribonucleoproteins/chemistry; Ribonucleoproteins/genetics; Ribonucleoproteins/metabolism
