1.
Cell ; 157(5): 1037-49, 2014 May 22.
Article in English | MEDLINE | ID: mdl-24836610

ABSTRACT

RECQL5 is the sole member of the RECQ family of helicases associated with RNA polymerase II (RNAPII). We now show that RECQL5 is a general elongation factor that is important for preserving genome stability during transcription. Depletion or overexpression of RECQL5 results in corresponding shifts in the genome-wide RNAPII density profile. Elongation is particularly affected, with RECQL5 depletion causing a striking increase in the average rate, concurrent with increased stalling, pausing, arrest, and/or backtracking (transcription stress). RECQL5 therefore controls the movement of RNAPII across genes. Loss of RECQL5 also results in the loss or gain of genomic regions, with the breakpoints of lost regions located in genes and common fragile sites. The chromosomal breakpoints overlap with areas of elevated transcription stress, suggesting that RECQL5 suppresses such stress and its detrimental effects, and thereby prevents genome instability in the transcribed region of genes.


Subject(s)
Genomic Instability , RecQ Helicases/metabolism , Transcription Elongation, Genetic , Transcription, Genetic , Genome, Human , HEK293 Cells , Humans , RNA Polymerase II/metabolism
2.
Mol Cell ; 78(5): 890-902.e6, 2020 06 04.
Article in English | MEDLINE | ID: mdl-32416068

ABSTRACT

Acidic transcription activation domains (ADs) are encoded by a wide range of seemingly unrelated amino acid sequences, making it difficult to recognize features that promote their dynamic behavior, "fuzzy" interactions, and target specificity. We screened a large set of random 30-mer peptides for AD function in yeast and trained a deep neural network (ADpred) on the AD-positive and -negative sequences. ADpred identifies known acidic ADs within transcription factors and accurately predicts the consequences of mutations. Our work reveals that strong acidic ADs contain multiple clusters of hydrophobic residues near acidic side chains, explaining why ADs often have a biased amino acid composition. ADs likely use a binding mechanism similar to avidity where a minimum number of weak dynamic interactions are required between activator and target to generate biologically relevant affinity and in vivo function. This mechanism explains the basis for fuzzy binding observed between acidic ADs and targets.


Subject(s)
High-Throughput Screening Assays/methods , Transcription Factors/genetics , Transcriptional Activation/genetics , Amino Acid Sequence/genetics , Basic-Leucine Zipper Transcription Factors/genetics , DNA-Binding Proteins/metabolism , Deep Learning , Protein Binding , Protein Domains/genetics , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae Proteins/metabolism , Trans-Activators/genetics , Trans-Activators/metabolism , Transcription Factors/metabolism , Transcriptional Activation/physiology
3.
Cell ; 145(1): 54-66, 2011 Apr 01.
Article in English | MEDLINE | ID: mdl-21458667

ABSTRACT

The MR (Mre11 nuclease and Rad50 ABC ATPase) complex is an evolutionarily conserved sensor for DNA double-strand breaks, highly genotoxic lesions linked to cancer development. MR can recognize and process DNA ends even if they are blocked and misfolded. To reveal its mechanism, we determined the crystal structure of the catalytic head of Thermotoga maritima MR and analyzed ATP-dependent conformational changes. MR adopts an open form with a central Mre11 nuclease dimer and two peripheral Rad50 molecules, a form suited for sensing obstructed breaks. The Mre11 C-terminal helix-loop-helix domain binds Rad50 and attaches flexibly to the nuclease domain, enabling large conformational changes. ATP binding to the two Rad50 subunits induces a rotation of the Mre11 helix-loop-helix and Rad50 coiled-coil domains, creating a clamp conformation with increased DNA-binding activity. The results suggest that MR is an ATP-controlled transient molecular clamp at DNA double-strand breaks.


Subject(s)
Adenosine Triphosphate/metabolism , Bacterial Proteins/chemistry , DNA Repair Enzymes/chemistry , DNA Repair , DNA-Binding Proteins/chemistry , Thermotoga maritima/chemistry , Bacterial Proteins/genetics , Bacterial Proteins/metabolism , Crystallography, X-Ray , DNA Breaks, Double-Stranded , DNA Repair Enzymes/genetics , DNA Repair Enzymes/metabolism , DNA-Binding Proteins/genetics , DNA-Binding Proteins/metabolism , Endodeoxyribonucleases/chemistry , Endodeoxyribonucleases/metabolism , Exodeoxyribonucleases/chemistry , Exodeoxyribonucleases/metabolism , Models, Molecular , Saccharomyces cerevisiae/chemistry , Saccharomyces cerevisiae Proteins/chemistry , Saccharomyces cerevisiae Proteins/metabolism , Scattering, Small Angle , Thermotoga maritima/metabolism , X-Ray Diffraction
4.
Nucleic Acids Res ; 52(D1): D426-D433, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37933852

ABSTRACT

The DescribePROT database of amino acid-level descriptors of protein structures and functions has been substantially expanded since its release in 2020. This expansion includes a substantial increase in the size, scope, and quality of the underlying data, the addition of experimental structural information, the inclusion of new data download options, and an upgraded graphical interface. DescribePROT currently covers 19 structural and functional descriptors for proteins in 273 reference proteomes generated by 11 accurate and complementary predictive tools. Users can search our resource in multiple ways, interact with the data using the graphical interface, and download data at various scales including individual proteins, entire proteomes, and the whole database. The annotations in DescribePROT are useful for a broad spectrum of studies, including investigations of protein structure and function, development and validation of predictive tools, and efforts to understand the molecular underpinnings of diseases and to develop therapeutics. DescribePROT can be freely accessed at http://biomine.cs.vcu.edu/servers/DESCRIBEPROT/.


Subject(s)
Amino Acids , Proteome , Proteome/chemistry , Databases, Factual
5.
Mol Cell ; 66(1): 38-49.e6, 2017 Apr 06.
Article in English | MEDLINE | ID: mdl-28318822

ABSTRACT

At the end of protein-coding genes, RNA polymerase (Pol) II undergoes a concerted transition that involves 3'-processing of the pre-mRNA and transcription termination. Here, we present a genome-wide analysis of the 3'-transition in budding yeast. We find that the 3'-transition globally requires the Pol II elongation factor Spt5 and factors involved in the recognition of the polyadenylation (pA) site and in endonucleolytic RNA cleavage. Pol II release from DNA occurs in a narrow termination window downstream of the pA site and requires the "torpedo" exonuclease Rat1 (XRN2 in human). The Rat1-interacting factor Rai1 contributes to RNA degradation downstream of the pA site. Defects in the 3'-transition can result in increased transcription at downstream genes.


Subject(s)
DNA, Fungal/metabolism , RNA 3' End Processing , RNA Polymerase II/metabolism , RNA Precursors/biosynthesis , RNA, Fungal/biosynthesis , RNA, Messenger/biosynthesis , Saccharomyces cerevisiae Proteins/metabolism , Saccharomyces cerevisiae/enzymology , Binding Sites , Chromosomal Proteins, Non-Histone/genetics , Chromosomal Proteins, Non-Histone/metabolism , DNA, Fungal/genetics , Exoribonucleases/genetics , Exoribonucleases/metabolism , Models, Genetic , Protein Binding , RNA Polymerase II/genetics , RNA Precursors/genetics , RNA, Fungal/genetics , RNA, Messenger/genetics , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae Proteins/genetics , Transcriptional Elongation Factors/genetics , Transcriptional Elongation Factors/metabolism , mRNA Cleavage and Polyadenylation Factors/genetics , mRNA Cleavage and Polyadenylation Factors/metabolism
6.
Mol Cell ; 61(2): 247-59, 2016 Jan 21.
Article in English | MEDLINE | ID: mdl-26778125

ABSTRACT

Post-translational histone modifications and linker histone incorporation regulate chromatin structure and genome activity. How these systems interface on a molecular level is unclear. Using biochemistry and NMR spectroscopy, we deduced mechanistic insights into the modification behavior of N-terminal histone H3 tails in different nucleosomal contexts. We find that linker histones generally inhibit modifications of different H3 sites and reduce H3 tail dynamics in nucleosomes. These effects are caused by modulations of electrostatic interactions of H3 tails with linker DNA and largely depend on the C-terminal domains of linker histones. In agreement, linker histone occupancy and H3 tail modifications segregate on a genome-wide level. Charge-modulating modifications such as phosphorylation and acetylation weaken transient H3 tail-linker DNA interactions, increase H3 tail dynamics, and, concomitantly, enhance general modifiability. We propose that alterations of H3 tail-linker DNA interactions by linker histones and charge-modulating modifications execute basal control mechanisms of chromatin function.


Subject(s)
DNA/metabolism , Histones/metabolism , Nucleosomes/metabolism , Protein Processing, Post-Translational , Acetylation , Amino Acid Sequence , Animals , Genome , Histones/chemistry , Molecular Sequence Data , Phosphorylation , Protein Binding , Xenopus laevis
7.
Nucleic Acids Res ; 49(D1): D298-D308, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33119734

ABSTRACT

We present DescribePROT, the database of predicted amino acid-level descriptors of structure and function of proteins. DescribePROT delivers a comprehensive collection of 13 complementary descriptors predicted using 10 popular and accurate algorithms for 83 complete proteomes that cover key model organisms. The current version includes 7.8 billion predictions for close to 600 million amino acids in 1.4 million proteins. The descriptors encompass sequence conservation, position specific scoring matrix, secondary structure, solvent accessibility, intrinsic disorder, disordered linkers, signal peptides, MoRFs and interactions with proteins, DNA and RNAs. Users can search DescribePROT by the amino acid sequence and the UniProt accession number and entry name. The pre-computed results are made available instantaneously. The predictions can be accessed via an interactive graphical interface that allows simultaneous analysis of multiple descriptors and can also be downloaded in structured formats at the protein, proteome and whole database scale. The putative annotations included in DescribePROT are useful for a broad range of studies, including investigations of protein function, applied projects focusing on therapeutics and diseases, and the development of predictors for other protein sequence descriptors. Future releases will expand the coverage of DescribePROT. DescribePROT can be accessed at http://biomine.cs.vcu.edu/servers/DESCRIBEPROT/.


Subject(s)
Amino Acids/chemistry , Databases, Protein , Genome , Proteins/genetics , Proteome/genetics , Software , Amino Acid Sequence , Amino Acids/metabolism , Animals , Archaea/genetics , Archaea/metabolism , Bacteria/genetics , Bacteria/metabolism , Binding Sites , Conserved Sequence , Fungi/genetics , Fungi/metabolism , Humans , Internet , Plants/genetics , Plants/metabolism , Prokaryotic Cells/metabolism , Protein Binding , Protein Structure, Secondary , Proteins/chemistry , Proteins/classification , Proteins/metabolism , Proteome/chemistry , Proteome/metabolism , Sequence Analysis, Protein , Viruses/genetics , Viruses/metabolism
8.
Nat Methods ; 16(7): 603-606, 2019 07.
Article in English | MEDLINE | ID: mdl-31235882

ABSTRACT

The open-source de novo protein-level assembler, Plass (https://plass.mmseqs.com), assembles six-frame-translated sequencing reads into protein sequences. It recovers 2-10 times more protein sequences from complex metagenomes and can assemble huge datasets. We assembled two redundancy-filtered reference protein catalogs, 2 billion sequences from 640 soil samples (soil reference protein catalog) and 292 million sequences from 775 marine eukaryotic metatranscriptomes (marine eukaryotic reference catalog), the largest free collections of protein sequences.


Subject(s)
Metagenomics , Proteins/chemistry , Amino Acid Sequence , Codon , Open Reading Frames
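
The six-frame translation that the Plass abstract above refers to can be illustrated with a minimal sketch. This is not Plass code: the function names (`reverse_complement`, `translate`, `six_frame_translate`) and the frame labels are assumptions made for illustration, and the sketch uses the standard genetic code, maps any codon containing a non-ACGT character to `X`, and drops trailing bases that do not fill a codon.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering (64 codons).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translate(read):
    """Translate a read in three forward and three reverse-strand frames.

    Frame labels such as '+1' or '-3' are an illustrative convention.
    """
    read = read.upper()
    frames = {}
    for strand, seq in (("+", read), ("-", reverse_complement(read))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


frames = six_frame_translate("ATGGCCTAA")
```

Each read yields six candidate peptide strings; a protein-level assembler would then work on those peptides rather than on the nucleotide read.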
9.
Bioinformatics ; 37(Suppl_1): i308-i316, 2021 07 12.
Article in English | MEDLINE | ID: mdl-34252974

ABSTRACT

MOTIVATION: Understanding how proteins recognize their RNA targets is essential to elucidate regulatory processes in the cell. Many RNA-binding proteins (RBPs) form complexes or have multiple domains that allow them to bind to RNA in a multivalent, cooperative manner. They can thereby achieve higher specificity and affinity than proteins with a single RNA-binding domain. However, current approaches to de novo discovery of RNA binding motifs do not take multivalent binding into account. RESULTS: We present Bipartite Motif Finder (BMF), which is based on a thermodynamic model of RBPs with two cooperatively binding RNA-binding domains. We show that bivalent binding is a common strategy among RBPs, yielding higher affinity and sequence specificity. We furthermore illustrate that the spatial geometry between the binding sites can be learned from bound RNA sequences. These discovered bipartite motifs are consistent with previously known motifs and binding behaviors. Our results demonstrate the importance of multivalent binding for RNA-binding proteins and highlight the value of bipartite motif models in representing the multivalency of protein-RNA interactions. AVAILABILITY AND IMPLEMENTATION: BMF source code is available at https://github.com/soedinglab/bipartite_motif_finder under a GPL license. The BMF web server is accessible at https://bmf.soedinglab.org. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
RNA-Binding Proteins , Software , Binding Sites , Protein Binding , RNA/metabolism , RNA-Binding Proteins/metabolism , Thermodynamics
10.
Bioinformatics ; 37(19): 3364-3366, 2021 Oct 11.
Article in English | MEDLINE | ID: mdl-33792634

ABSTRACT

SUMMARY: SpacePHARER (CRISPR Spacer Phage-Host Pair Finder) is a sensitive and fast tool for de novo prediction of phage-host relationships via identifying phage genomes that match CRISPR spacers in genomic or metagenomic data. SpacePHARER gains sensitivity by comparing spacers and phages at the protein level, optimizing its scores for matching very short sequences, and combining evidence from multiple matches, while controlling for false positives. We demonstrate SpacePHARER by searching a comprehensive spacer list against all complete phage genomes. AVAILABILITY AND IMPLEMENTATION: SpacePHARER is available as an open-source (GPLv3), user-friendly command-line software for Linux and macOS: https://github.com/soedinglab/spacepharer. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

11.
Mol Cell ; 55(5): 745-57, 2014 Sep 04.
Article in English | MEDLINE | ID: mdl-25192364

ABSTRACT

Biogenesis of eukaryotic messenger ribonucleoprotein complexes (mRNPs) involves the synthesis, splicing, and 3' processing of pre-mRNA, and the assembly of mature mRNPs for nuclear export. We mapped 23 mRNP biogenesis factors onto the yeast transcriptome, providing 10^4-10^6 high-confidence RNA interaction sites per factor. The data reveal how mRNP biogenesis factors recognize pre-mRNA elements in vivo. They define conserved interactions between splicing factors and pre-mRNA introns, including the recognition of intron-exon junctions and the branchpoint. They also identify a unified arrangement of 3' processing factors at pre-mRNA polyadenylation (pA) sites in yeast and human, which results from an A-U sequence bias at pA sites. Global data analysis indicates that 3' processing factors have roles in splicing and RNA surveillance, and that they couple mRNP biogenesis events to restrict nuclear export to mature mRNPs.


Subject(s)
Models, Genetic , RNA Precursors/metabolism , RNA, Messenger/metabolism , Ribonucleoproteins/biosynthesis , Active Transport, Cell Nucleus , Gene Expression Profiling , Humans , Introns , RNA Precursors/chemistry , RNA Splicing , RNA, Messenger/chemistry , Saccharomyces cerevisiae/genetics
12.
PLoS Genet ; 14(12): e1007856, 2018 12.
Article in English | MEDLINE | ID: mdl-30596640

ABSTRACT

Genetic variants in genome-wide association studies (GWAS) are tested for disease association mostly using simple regression, one variant at a time. Standard approaches to improve power in detecting disease-associated SNPs use multiple regression with Bayesian variable selection in which a sparsity-enforcing prior on effect sizes is used to avoid overtraining and all effect sizes are integrated out for posterior inference. For binary traits, the logistic model has not yielded clear improvements over the linear model. For multi-SNP analysis, the logistic model required costly and technically challenging MCMC sampling to perform the integration. Here, we introduce the quasi-Laplace approximation to solve the integral and avoid MCMC sampling. We expect the logistic model to perform much better than multiple linear regression except when predicted disease risks are spread closely around 0.5, because only close to its inflection point can the logistic function be well approximated by a linear function. Indeed, in extensive benchmarks with simulated phenotypes and real genotypes, our Bayesian multiple LOgistic REgression method (B-LORE) showed considerable improvements (1) when regressing on many variants in multiple loci at heritabilities ≥ 0.4 and (2) for unbalanced case-control ratios. B-LORE also enables meta-analysis by approximating the likelihood functions of individual studies by multivariate normal distributions, using their means and covariance matrices as summary statistics. Our work should make sparse multiple logistic regression attractive also for other applications with binary target variables. B-LORE is freely available from: https://github.com/soedinglab/b-lore.


Subject(s)
Bayes Theorem , Genome-Wide Association Study/statistics & numerical data , Logistic Models , Models, Genetic , Case-Control Studies , Computer Simulation , Coronary Artery Disease/genetics , Genetic Variation , Humans , Likelihood Functions , Multifactorial Inheritance , Phenotype , Polymorphism, Single Nucleotide , Software
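
The B-LORE abstract above argues that the logistic function is well approximated by a linear function only near its inflection point, where the predicted risk is close to 0.5. A small numerical sketch of that statement follows. It is not part of B-LORE; the function names are assumptions for illustration. The linear function used is the first-order Taylor expansion of the logistic function at x = 0, namely 0.5 + x/4.

```python
import math


def logistic(x):
    """Logistic function mapping a linear predictor x to a risk in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def linear_at_inflection(x):
    """Tangent line of the logistic function at its inflection point x = 0."""
    return 0.5 + x / 4.0


def approximation_error(x):
    """Absolute gap between the logistic function and its tangent line."""
    return abs(logistic(x) - linear_at_inflection(x))


# Risk near 0.5: the tangent line is almost indistinguishable from the logistic curve.
near = approximation_error(0.2)
# Risk far from 0.5: the tangent line overshoots and even leaves the [0, 1] range.
far = approximation_error(3.0)
```

With a predictor of 0.2 the two functions differ by well under 0.001, whereas at 3.0 the tangent line gives 1.25, which is not a valid probability.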
13.
Bioinformatics ; 35(18): 3517-3519, 2019 09 15.
Article in English | MEDLINE | ID: mdl-30715210

ABSTRACT

SUMMARY: Cellular lineage trees can be derived from single-cell RNA sequencing snapshots of differentiating cells. Currently, only datasets with simple topologies are available. To test and further develop tools for lineage tree reconstruction, we need test datasets with known complex topologies. PROSSTT can simulate scRNA-seq datasets for differentiation processes with lineage trees of any desired complexity, noise level, noise model and size. PROSSTT also provides scripts to quantify the quality of predicted lineage trees. AVAILABILITY AND IMPLEMENTATION: https://github.com/soedinglab/prosstt. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Software , Cell Differentiation , Gene Expression Profiling , RNA-Seq , Single-Cell Analysis
14.
Bioinformatics ; 35(16): 2856-2858, 2019 08 15.
Article in English | MEDLINE | ID: mdl-30615063

ABSTRACT

SUMMARY: The MMseqs2 desktop and web server app facilitates interactive sequence searches through custom protein sequence and profile databases on personal workstations. By eliminating MMseqs2's runtime overhead, we reduced response times to a few seconds at sensitivities close to BLAST. AVAILABILITY AND IMPLEMENTATION: The app is easy to install for non-experts. GPLv3-licensed code, pre-built desktop app packages for Windows, MacOS and Linux, Docker images for the web server application and a demo web server are available at https://search.mmseqs.com. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Computers , Software , Amino Acid Sequence , Databases, Factual
15.
Nucleic Acids Res ; 46(W1): W215-W220, 2018 07 02.
Article in English | MEDLINE | ID: mdl-29846656

ABSTRACT

The BaMM web server offers four tools: (i) de-novo discovery of enriched motifs in a set of nucleotide sequences, (ii) scanning a set of nucleotide sequences with motifs to find motif occurrences, (iii) searching with an input motif for similar motifs in our BaMM database with motifs for >1000 transcription factors, trained from the GTRD ChIP-seq database and (iv) browsing and keyword searching the motif database. In contrast to most other servers, we represent sequence motifs not by position weight matrices (PWMs) but by Bayesian Markov Models (BaMMs) of order 4, which we showed previously to perform substantially better in ROC analyses than PWMs or first order models. To address the inadequacy of P- and E-values as measures of motif quality, we introduce the AvRec score, the average recall over the TP-to-FP ratio between 1 and 100. The BaMM server is freely accessible without registration at https://bammmotif.mpibpc.mpg.de.


Subject(s)
Nucleotide Motifs , Regulatory Sequences, Nucleic Acid , Software , Animals , Bayes Theorem , Databases, Nucleic Acid , Humans , Internet , Markov Chains , Mice , Rats , Sequence Analysis , Transcription Factors/metabolism
16.
BMC Bioinformatics ; 20(1): 473, 2019 Sep 14.
Article in English | MEDLINE | ID: mdl-31521110

ABSTRACT

BACKGROUND: HH-suite is a widely used open source software suite for sensitive sequence similarity searches and protein fold recognition. It is based on pairwise alignment of profile Hidden Markov models (HMMs), which represent multiple sequence alignments of homologous proteins. RESULTS: We developed a single-instruction multiple-data (SIMD) vectorized implementation of the Viterbi algorithm for profile HMM alignment and introduced various other speed-ups. These accelerated the search methods HHsearch by a factor of 4 and HHblits by a factor of 2 over the previous version 2.0.16. HHblits3 is ∼10× faster than PSI-BLAST and ∼20× faster than HMMER3. Jobs to perform HHsearch and HHblits searches with many query profile HMMs can be parallelized over cores and over cluster servers using OpenMP and message passing interface (MPI). The free, open-source, GPLv3-licensed software is available at https://github.com/soedinglab/hh-suite. CONCLUSION: The added functionalities and increased speed of HHsearch and HHblits should facilitate their use in large-scale protein structure and function prediction, e.g. in metagenomics and genomics projects.


Subject(s)
Molecular Sequence Annotation/methods , Sequence Alignment/methods , Sequence Analysis, Protein/methods , Software , Algorithms , Markov Chains
17.
PLoS Comput Biol ; 14(11): e1006526, 2018 11.
Article in English | MEDLINE | ID: mdl-30395601

ABSTRACT

Compensatory mutations between protein residues in physical contact can manifest themselves as statistical couplings between the corresponding columns in a multiple sequence alignment (MSA) of the protein family. Conversely, large coupling coefficients predict residue contacts. Methods for de-novo protein structure prediction based on this approach are becoming increasingly reliable. Their main limitation is the strong systematic and statistical noise in the estimation of coupling coefficients, which has so far limited their application to very large protein families. While most research has focused on improving predictions by adding external information, little progress has been made to improve the statistical procedure at the core, because our lack of understanding of the sources of noise poses a major obstacle. First, we show theoretically that the expectation value of the coupling score assuming no coupling is proportional to the product of the square roots of the column entropies, and we propose a simple entropy bias correction (EntC) that subtracts out this expectation value. Second, we show that the average product correction (APC) includes the correction of the entropy bias, partly explaining its success. Third, we have developed CCMgen, the first method for simulating protein evolution and generating realistic synthetic MSAs with pairwise statistical residue couplings. Fourth, to learn exact statistical models that reliably reproduce observed alignment statistics, we developed CCMpredPy, an implementation of the persistent contrastive divergence (PCD) method for exact inference. Fifth, we demonstrate how CCMgen and CCMpredPy can facilitate the development of contact prediction methods by analysing the systematic noise contributions from phylogeny and entropy. Using the entropy bias correction, we can disentangle both sources of noise and find that entropy contributes roughly twice as much noise as phylogeny.


Subject(s)
Proteins/chemistry , Sequence Alignment , Algorithms , Amino Acid Sequence , Binding Sites , Entropy , Noise , Sequence Homology, Amino Acid
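
The two corrections named in the abstract above, the average product correction (APC) and the entropy bias correction (EntC), can be sketched as follows. This is an illustrative sketch, not the CCMpredPy implementation. The APC is written in its commonly used form, subtracting the product of row and column means divided by the overall mean, and here the means are taken over all entries including the diagonal, which is an assumption; implementations differ on this detail. The entropy correction subtracts a term proportional to the product of the square roots of the column entropies, as the abstract states; the proportionality constant `alpha` is a hypothetical parameter, since the abstract does not say how it is determined.

```python
import math


def average_product_correction(scores):
    """Subtract row_mean[i] * col_mean[j] / total_mean from each score.

    `scores` is a square list of lists of coupling scores.
    """
    n = len(scores)
    row_mean = [sum(row) / n for row in scores]
    col_mean = [sum(scores[i][j] for i in range(n)) / n for j in range(n)]
    total_mean = sum(row_mean) / n
    return [
        [scores[i][j] - row_mean[i] * col_mean[j] / total_mean for j in range(n)]
        for i in range(n)
    ]


def entropy_correction(scores, entropies, alpha):
    """Subtract alpha * sqrt(H_i) * sqrt(H_j) from each score.

    `entropies` holds one column entropy per alignment column;
    `alpha` is a hypothetical scaling constant supplied by the caller.
    """
    n = len(scores)
    return [
        [
            scores[i][j] - alpha * math.sqrt(entropies[i]) * math.sqrt(entropies[j])
            for j in range(n)
        ]
        for i in range(n)
    ]


# A score matrix that is a pure product of per-column factors carries no
# pair-specific signal, so the APC removes it entirely.
factors = [1.0, 2.0, 3.0]
background = [[a * b for b in factors] for a in factors]
apc_result = average_product_correction(background)

ent_result = entropy_correction([[0.0, 3.0], [3.0, 0.0]], [1.0, 4.0], alpha=1.0)
```

The rank-one example shows why the two corrections are related: if the background score of a pair factorizes into per-column terms, as it does when it is proportional to the square roots of the column entropies, the APC subtracts that background as well.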
18.
Nucleic Acids Res ; 45(D1): D170-D176, 2017 01 04.
Article in English | MEDLINE | ID: mdl-27899574

ABSTRACT

We present three clustered protein sequence databases, Uniclust90, Uniclust50, Uniclust30 and three databases of multiple sequence alignments (MSAs), Uniboost10, Uniboost20 and Uniboost30, as a resource for protein sequence analysis, function prediction and sequence searches. The Uniclust databases cluster UniProtKB sequences at the level of 90%, 50% and 30% pairwise sequence identity. Uniclust90 and Uniclust50 clusters showed better consistency of functional annotation than those of UniRef90 and UniRef50, owing to an optimised clustering pipeline that runs with our MMseqs2 software for fast and sensitive protein sequence searching and clustering. Uniclust sequences are annotated with matches to Pfam, SCOP domains, and proteins in the PDB, using our HHblits homology detection tool. Due to its high sensitivity, Uniclust contains 17% more Pfam domain annotations than UniProt. Uniboost MSAs of three diversities are built by enriching the Uniclust30 MSAs with local sequence matches from MMseqs2 profile searches through Uniclust30. All databases can be downloaded from the Uniclust server at uniclust.mmseqs.com. Users can search clusters by keywords and explore their MSAs, taxonomic representation, and annotations. Uniclust is updated every two months with the new UniProt release.


Subject(s)
Computational Biology/methods , Databases, Nucleic Acid , Software , Cluster Analysis , Gene Ontology , Molecular Sequence Annotation , Web Browser
19.
Bioinformatics ; 33(19): 3113-3114, 2017 Oct 01.
Article in English | MEDLINE | ID: mdl-28957499

ABSTRACT

SUMMARY: WIsH predicts prokaryotic hosts of phages from their genomic sequences. It achieves 63% mean accuracy when predicting the host genus among 20 genera for 3 kbp-long phage contigs. Over the best current tool, WIsH shows much improved accuracy on phage sequences of a few kbp length and runs hundreds of times faster, making it suited for metagenomics studies. AVAILABILITY AND IMPLEMENTATION: OpenMP-parallelized GPL-licensed C++ code available at https://github.com/soedinglab/wish. CONTACT: clovis.galiez@mpibpc.mpg.de or soeding@mpibpc.mpg.de. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Bacteriophages/genetics , Metagenomics/methods , Software , Archaea/virology , Bacteria/virology , Contig Mapping
20.
Nature ; 484(7394): 386-9, 2012 Mar 25.
Article in English | MEDLINE | ID: mdl-22446626

ABSTRACT

Alternative messenger RNA splicing is the main reason that vast mammalian proteomic complexity can be achieved with a limited number of genes. Splicing is physically and functionally coupled to transcription, and is greatly affected by the rate of transcript elongation. As the nascent pre-mRNA emerges from transcribing RNA polymerase II (RNAPII), it is assembled into a messenger ribonucleoprotein (mRNP) particle; this is the functional form of the nascent pre-mRNA and determines the fate of the mature transcript. However, factors that connect the transcribing polymerase with the mRNP particle and help to integrate transcript elongation with mRNA splicing remain unclear. Here we characterize the human interactome of chromatin-associated mRNP particles. This led us to identify deleted in breast cancer 1 (DBC1) and ZNF326 (which we call ZNF-protein interacting with nuclear mRNPs and DBC1 (ZIRD)) as subunits of a novel protein complex, named DBIRD, that binds directly to RNAPII. DBIRD regulates alternative splicing of a large set of exons embedded in (A + T)-rich DNA, and is present at the affected exons. RNA-interference-mediated DBIRD depletion results in region-specific decreases in transcript elongation, particularly across areas encompassing affected exons. Together, these data indicate that the DBIRD complex acts at the interface between mRNP particles and RNAPII, integrating transcript elongation with the regulation of alternative splicing.


Subject(s)
Alternative Splicing , Multiprotein Complexes/chemistry , Multiprotein Complexes/metabolism , RNA Polymerase II/metabolism , RNA, Messenger/biosynthesis , RNA, Messenger/genetics , Transcription, Genetic , Adaptor Proteins, Signal Transducing/genetics , Adaptor Proteins, Signal Transducing/metabolism , Animals , Carrier Proteins/genetics , Carrier Proteins/metabolism , Chromatin/genetics , Chromatin/metabolism , Exons/genetics , HEK293 Cells , Heterogeneous-Nuclear Ribonucleoproteins/deficiency , Heterogeneous-Nuclear Ribonucleoproteins/metabolism , Humans , Mice , Multiprotein Complexes/genetics , RNA Interference , RNA, Messenger/metabolism , Ribonucleoproteins/chemistry , Ribonucleoproteins/genetics , Ribonucleoproteins/metabolism