1.
Proc Natl Acad Sci U S A ; 121(7): e2320240121, 2024 Feb 13.
Article in English | MEDLINE | ID: mdl-38315865

ABSTRACT

DNA structure can regulate genome function. Four-stranded DNA G-quadruplex (G4) structures have been implicated in transcriptional regulation; however, previous studies have not directly addressed the role of an individual G4 within its endogenous cellular context. Using CRISPR to genetically abrogate endogenous G4 structure folding, we directly interrogate the G4 found within the upstream regulatory region of the critical human MYC oncogene. G4 loss leads to suppression of MYC transcription from the P1 promoter that is mediated by the deposition of a de novo nucleosome alongside alterations in RNA polymerase recruitment. We also show that replacement of the endogenous MYC G4 with a different G4 structure from the KRAS oncogene restores G4 folding and MYC transcription. Moreover, we demonstrate that the MYC G4 structure itself, rather than its sequence, recruits transcription factors and histone modifiers. Overall, our work establishes that G4 structures are important features of transcriptional regulation that coordinate recruitment of key chromatin proteins and the transcriptional machinery through interactions with DNA secondary structure, rather than primary sequence.


Subject(s)
G-Quadruplexes , Proto-Oncogene Proteins c-myc , Humans , DNA/metabolism , Gene Expression Regulation , Promoter Regions, Genetic/genetics , Transcription Factors/metabolism , Proto-Oncogene Proteins c-myc/genetics
2.
Nat Genet ; 52(12): 1364-1372, 2020 12.
Article in English | MEDLINE | ID: mdl-33230297

ABSTRACT

Inappropriate stimulation or defective negative regulation of the type I interferon response can lead to autoinflammation. In genetically uncharacterized cases of the type I interferonopathy Aicardi-Goutières syndrome, we identified biallelic mutations in LSM11 and RNU7-1, which encode components of the replication-dependent histone pre-mRNA-processing complex. Mutations were associated with the misprocessing of canonical histone transcripts and a disturbance of linker histone stoichiometry. Additionally, we observed an altered distribution of nuclear cyclic guanosine monophosphate-adenosine monophosphate synthase (cGAS) and enhanced interferon signaling mediated by the cGAS-stimulator of interferon genes (STING) pathway in patient-derived fibroblasts. Finally, we established that chromatin without linker histone stimulates cyclic guanosine monophosphate-adenosine monophosphate (cGAMP) production in vitro more efficiently. We conclude that nuclear histones, as key constituents of chromatin, are essential in suppressing the immunogenicity of self-DNA.


Subject(s)
Chromatin/metabolism , Histones/metabolism , Interferon Type I/biosynthesis , RNA Precursors/metabolism , RNA-Binding Proteins/genetics , Ribonucleoprotein, U7 Small Nuclear/genetics , Autoimmune Diseases of the Nervous System/genetics , Autoimmune Diseases of the Nervous System/immunology , Cell Line , DNA/immunology , Gene Expression Regulation/genetics , Gene Expression Regulation/immunology , HCT116 Cells , HEK293 Cells , Hereditary Autoinflammatory Diseases/genetics , Hereditary Autoinflammatory Diseases/immunology , Humans , Membrane Proteins/metabolism , Nervous System Malformations/genetics , Nervous System Malformations/immunology , Nucleotides, Cyclic/biosynthesis , Nucleotidyltransferases/metabolism
3.
Mol Cell ; 76(4): 600-616.e6, 2019 11 21.
Article in English | MEDLINE | ID: mdl-31679819

ABSTRACT

Widespread antisense long noncoding RNA (lncRNA) overlap with many protein-coding genes in mammals and emanate from gene promoter, enhancer, and termination regions. However, their origin and biological purpose remain unclear. We show that these antisense lncRNA can be generated by R-loops that form when nascent transcript invades the DNA duplex behind elongating RNA polymerase II (Pol II). Biochemically, R-loops act as intrinsic Pol II promoters to induce de novo RNA synthesis. Furthermore, their removal across the human genome by RNase H1 overexpression causes the selective reduction of antisense transcription. Consequently, we predict that R-loops act to facilitate the synthesis of many gene proximal antisense lncRNA. Not only are R-loops widely associated with DNA damage and repair, but we now show that they have the capacity to promote de novo transcript synthesis that may have aided the evolution of gene regulation.


Subject(s)
Genome, Human , Promoter Regions, Genetic , R-Loop Structures , RNA, Antisense/biosynthesis , RNA, Long Noncoding/biosynthesis , Transcription, Genetic , Transcriptional Activation , HEK293 Cells , HeLa Cells , Humans , RNA, Antisense/genetics , RNA, Long Noncoding/genetics , Ribonuclease H/metabolism , Structure-Activity Relationship
4.
Elife ; 7, 2018 12 03.
Article in English | MEDLINE | ID: mdl-30507380

ABSTRACT

Replication-dependent (RD) core histone mRNA produced during S-phase is the only known metazoan protein-coding mRNA presenting a 3' stem-loop instead of the otherwise universal polyA tail. A metallo β-lactamase (MBL) fold enzyme, cleavage and polyadenylation specificity factor 73 (CPSF73), is proposed to be the sole endonuclease responsible for 3' end processing of both mRNA classes. We report cellular, genetic, biochemical, substrate selectivity, and crystallographic studies providing evidence that an additional endoribonuclease, MBL domain containing protein 1 (MBLAC1), is selective for 3' processing of RD histone pre-mRNA during the S-phase of the cell cycle. Depletion of MBLAC1 in cells significantly affects cell cycle progression thus identifying MBLAC1 as a new type of S-phase-specific cancer target.


Subject(s)
Endoribonucleases/chemistry , Histones/biosynthesis , RNA, Messenger/biosynthesis , Amino Acid Sequence , Binding Sites , Cloning, Molecular , Crystallography, X-Ray , Endoribonucleases/genetics , Endoribonucleases/metabolism , Escherichia coli/genetics , Escherichia coli/metabolism , Gene Expression , Genetic Vectors/chemistry , Genetic Vectors/metabolism , HEK293 Cells , HeLa Cells , Histones/genetics , Humans , Hydrolases , Kinetics , Models, Molecular , Mutagenesis, Site-Directed , Protein Binding , Protein Conformation, alpha-Helical , Protein Conformation, beta-Strand , Protein Interaction Domains and Motifs , RNA, Messenger/genetics , Recombinant Proteins/chemistry , Recombinant Proteins/genetics , Recombinant Proteins/metabolism , S Phase Cell Cycle Checkpoints , Sequence Alignment , Sequence Homology, Amino Acid , Substrate Specificity , beta-Lactamases/chemistry , beta-Lactamases/genetics , beta-Lactamases/metabolism
5.
Mol Cell ; 72(6): 970-984.e7, 2018 12 20.
Article in English | MEDLINE | ID: mdl-30449723

ABSTRACT

Extensive tracts of the mammalian genome that lack protein-coding function are still transcribed into long noncoding RNA. While these lncRNAs are generally short lived, length restricted, and non-polyadenylated, how their expression is distinguished from protein-coding genes remains enigmatic. Surprisingly, depletion of the ubiquitous Pol-II-associated transcription elongation factor SPT6 promotes a redistribution of H3K36me3 histone marks from active protein coding to lncRNA genes, which correlates with increased lncRNA transcription. SPT6 knockdown also impairs the recruitment of the Integrator complex to chromatin, which results in a transcriptional termination defect for lncRNA genes. This leads to the formation of extended, polyadenylated lncRNAs that are both chromatin restricted and form increased levels of RNA:DNA hybrid (R-loops) that are associated with DNA damage. Additionally, these deregulated lncRNAs overlap with DNA replication origins leading to localized DNA replication stress and a cellular senescence phenotype. Overall, our results underline the importance of restricting lncRNA expression.


Subject(s)
Cell Proliferation , Cellular Senescence , DNA Damage , DNA Replication , DNA, Neoplasm/biosynthesis , RNA, Long Noncoding/metabolism , RNA, Neoplasm/metabolism , Transcription Factors/metabolism , Uterine Neoplasms/metabolism , Animals , Chromatin Assembly and Disassembly , DNA Polymerase II/genetics , DNA Polymerase II/metabolism , DNA, Neoplasm/genetics , Drosophila melanogaster/genetics , Drosophila melanogaster/metabolism , Female , Gene Expression Regulation, Neoplastic , HeLa Cells , Histones/metabolism , Humans , Methylation , Nucleic Acid Conformation , Nucleic Acid Heteroduplexes/genetics , Nucleic Acid Heteroduplexes/metabolism , RNA Stability , RNA, Long Noncoding/genetics , RNA, Messenger/genetics , RNA, Messenger/metabolism , RNA, Neoplasm/genetics , Transcription Factors/genetics , Transcription, Genetic , Uterine Neoplasms/genetics
6.
Nature ; 560(7717): 238-242, 2018 08.
Article in English | MEDLINE | ID: mdl-30046113

ABSTRACT

Mitochondria are descendants of endosymbiotic bacteria and retain essential prokaryotic features such as a compact circular genome. Consequently, in mammals, mitochondrial DNA is subjected to bidirectional transcription that generates overlapping transcripts, which are capable of forming long double-stranded RNA structures [1,2]. However, to our knowledge, mitochondrial double-stranded RNA has not been previously characterized in vivo. Here we describe the presence of a highly unstable native mitochondrial double-stranded RNA species at single-cell level and identify key roles for the degradosome components mitochondrial RNA helicase SUV3 and polynucleotide phosphorylase PNPase in restricting the levels of mitochondrial double-stranded RNA. Loss of either enzyme results in massive accumulation of mitochondrial double-stranded RNA that escapes into the cytoplasm in a PNPase-dependent manner. This process engages an MDA5-driven antiviral signalling pathway that triggers a type I interferon response. Consistent with these data, patients carrying hypomorphic mutations in the gene PNPT1, which encodes PNPase, display mitochondrial double-stranded RNA accumulation coupled with upregulation of interferon-stimulated genes and other markers of immune activation. The localization of PNPase to the mitochondrial inter-membrane space and matrix suggests that it has a dual role in preventing the formation and release of mitochondrial double-stranded RNA into the cytoplasm. This in turn prevents the activation of potent innate immune defence mechanisms that have evolved to protect vertebrates against microbial and viral attack.


Subject(s)
Herpesvirus 1, Human/immunology , RNA, Double-Stranded/immunology , RNA, Mitochondrial/immunology , Animals , DEAD-box RNA Helicases/deficiency , DEAD-box RNA Helicases/genetics , DEAD-box RNA Helicases/metabolism , Endoribonucleases/metabolism , Exoribonucleases/deficiency , Exoribonucleases/genetics , Exoribonucleases/metabolism , Gene Expression Regulation/immunology , HeLa Cells , Herpesvirus 1, Human/genetics , Humans , Interferon Type I/antagonists & inhibitors , Interferon Type I/immunology , Interferon-Induced Helicase, IFIH1/metabolism , Mice , Mice, Inbred C57BL , Multienzyme Complexes/metabolism , Mutation , Polyribonucleotide Nucleotidyltransferase/metabolism , RNA Helicases/metabolism , Single-Cell Analysis , bcl-2 Homologous Antagonist-Killer Protein/metabolism , bcl-2-Associated X Protein/metabolism
7.
Mol Cell ; 70(4): 650-662.e8, 2018 05 17.
Article in English | MEDLINE | ID: mdl-29731414

ABSTRACT

Class switch recombination (CSR) at the immunoglobulin heavy-chain (IgH) locus is associated with the formation of R-loop structures over switch (S) regions. While these often occur co-transcriptionally between nascent RNA and template DNA, we now show that they also form as part of a post-transcriptional mechanism targeting AID to IgH S-regions. This depends on the RNA helicase DDX1 that is also required for CSR in vivo. DDX1 binds to G-quadruplex (G4) structures present in intronic switch transcripts and converts them into S-region R-loops. This in turn targets the cytidine deaminase enzyme AID to S-regions so promoting CSR. Notably R-loop levels over S-regions are diminished by chemical stabilization of G4 RNA or by the expression of a DDX1 ATPase-deficient mutant that acts as a dominant-negative protein to reduce CSR efficiency. In effect, we provide evidence for how S-region transcripts interconvert between G4 and R-loop structures to promote CSR in the IgH locus.


Subject(s)
Adenosine Triphosphatases/metabolism , DEAD-box RNA Helicases/physiology , G-Quadruplexes , Immunoglobulin Heavy Chains/genetics , Immunoglobulin Switch Region/genetics , RNA/chemistry , Adenosine Triphosphatases/genetics , Animals , B-Lymphocytes/cytology , B-Lymphocytes/metabolism , Cytidine Deaminase/genetics , Cytidine Deaminase/metabolism , DNA Replication , Immunoglobulin Class Switching , Immunoglobulin Heavy Chains/chemistry , Immunoglobulin Heavy Chains/metabolism , Mice , Mice, Inbred C57BL , Mice, Knockout , RNA/genetics , Recombination, Genetic
8.
Nat Commun ; 9(1): 1783, 2018 05 03.
Article in English | MEDLINE | ID: mdl-29725044

ABSTRACT

Small nucleolar RNA (snoRNA) are conserved and essential non-coding RNA that are transcribed by RNA Polymerase II (Pol II). Two snoRNA classes, formerly distinguished by their structure and ribonucleoprotein composition, act as guide RNA to target RNA such as ribosomal RNA, and thereby introduce specific modifications. We have studied the 5'end processing of individually transcribed snoRNA in S. cerevisiae to define their role in snoRNA biogenesis and functionality. Here we show that pre-snoRNA processing by the endonuclease Rnt1 occurs co-transcriptionally with removal of the m7G cap facilitating the formation of box C/D snoRNA. Failure of this process causes aberrant 3'end processing and mislocalization of snoRNA to the cytoplasm. Consequently, Rnt1-dependent 5'end processing of box C/D snoRNA is critical for snoRNA-dependent methylation of ribosomal RNA. Our results reveal that the 5'end processing of box C/D snoRNA defines their distinct pathway of maturation.


Subject(s)
Cell Nucleus/metabolism , RNA, Fungal/genetics , RNA, Small Nucleolar/metabolism , Saccharomyces cerevisiae/genetics , Cytoplasm/metabolism , Methylation , RNA Caps , RNA Processing, Post-Transcriptional , RNA, Fungal/metabolism , Ribonuclease III/genetics , Ribonuclease III/metabolism , Ribonucleoproteins, Small Nucleolar/genetics , Ribonucleoproteins, Small Nucleolar/metabolism , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae Proteins/metabolism
9.
Mol Cell ; 70(2): 312-326.e7, 2018 04 19.
Article in English | MEDLINE | ID: mdl-29656924

ABSTRACT

Many non-coding transcripts (ncRNA) generated by RNA polymerase II in S. cerevisiae are terminated by the Nrd1-Nab3-Sen1 complex. However, Sen1 helicase levels are surprisingly low compared with Nrd1 and Nab3, raising questions regarding how ncRNA can be terminated in an efficient and timely manner. We show that Sen1 levels increase during the S and G2 phases of the cell cycle, leading to increased termination activity of NNS. Overexpression of Sen1 or failure to modulate its abundance by ubiquitin-proteasome-mediated degradation greatly decreases cell fitness. Sen1 toxicity is suppressed by mutations in other termination factors, and NET-seq analysis shows that its overexpression leads to a decrease in ncRNA production and altered mRNA termination. We conclude that Sen1 levels are carefully regulated to prevent aberrant termination. We suggest that ncRNA levels and coding gene transcription termination are modulated by Sen1 to fulfill critical cell cycle-specific functions.


Subject(s)
DNA Helicases/metabolism , G1 Phase Cell Cycle Checkpoints , Gene Expression Regulation, Fungal , RNA Helicases/metabolism , RNA, Fungal/biosynthesis , RNA, Messenger/biosynthesis , RNA, Untranslated/biosynthesis , Saccharomyces cerevisiae Proteins/metabolism , Saccharomyces cerevisiae/metabolism , Transcription Termination, Genetic , DNA Helicases/genetics , Microbial Viability , Nuclear Proteins/genetics , Nuclear Proteins/metabolism , Proteasome Endopeptidase Complex/metabolism , Proteolysis , RNA Helicases/genetics , RNA, Fungal/genetics , RNA, Messenger/genetics , RNA, Untranslated/genetics , RNA-Binding Proteins/genetics , RNA-Binding Proteins/metabolism , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/growth & development , Saccharomyces cerevisiae Proteins/genetics , Ubiquitination
11.
Cell ; 161(3): 526-540, 2015 Apr 23.
Article in English | MEDLINE | ID: mdl-25910207

ABSTRACT

Transcription is a highly dynamic process. Consequently, we have developed native elongating transcript sequencing technology for mammalian chromatin (mNET-seq), which generates single-nucleotide resolution, nascent transcription profiles. Nascent RNA was detected in the active site of RNA polymerase II (Pol II) along with associated RNA processing intermediates. In particular, we detected 5'splice site cleavage by the spliceosome, showing that cleaved upstream exon transcripts are associated with Pol II CTD phosphorylated on the serine 5 position (S5P), which is accumulated over downstream exons. Also, depletion of termination factors substantially reduces Pol II pausing at gene ends, leading to termination defects. Notably, termination factors play an additional promoter role by restricting non-productive RNA synthesis in a Pol II CTD S2P-specific manner. Our results suggest that CTD phosphorylation patterns established for yeast transcription are significantly different in mammals. Taken together, mNET-seq provides dynamic and detailed snapshots of the complex events underlying transcription in mammals.


Subject(s)
Genome, Human , RNA Processing, Post-Transcriptional , Transcription, Genetic , HeLa Cells , Humans , MicroRNAs/metabolism , Phosphorylation , Protein Structure, Tertiary , RNA Polymerase II/chemistry , RNA Polymerase II/metabolism , Sequence Analysis, RNA/methods
12.
Nat Struct Mol Biol ; 22(4): 319-27, 2015 Apr.
Article in English | MEDLINE | ID: mdl-25730776

ABSTRACT

MicroRNAs (miRNAs) play a major part in the post-transcriptional regulation of gene expression. Mammalian miRNA biogenesis begins with cotranscriptional cleavage of RNA polymerase II (Pol II) transcripts by the Microprocessor complex. Although most miRNAs are located within introns of protein-coding transcripts, a substantial minority of miRNAs originate from long noncoding (lnc) RNAs, for which transcript processing is largely uncharacterized. We show, by detailed characterization of liver-specific lnc-pri-miR-122 and genome-wide analysis in human cell lines, that most lncRNA transcripts containing miRNAs (lnc-pri-miRNAs) do not use the canonical cleavage-and-polyadenylation pathway but instead use Microprocessor cleavage to terminate transcription. Microprocessor inactivation leads to extensive transcriptional readthrough of lnc-pri-miRNA and transcriptional interference with downstream genes. Consequently we define a new RNase III-mediated, polyadenylation-independent mechanism of Pol II transcription termination in mammalian cells.


Subject(s)
MicroRNAs/metabolism , Models, Genetic , RNA Processing, Post-Transcriptional , RNA, Long Noncoding/metabolism , Transcription, Genetic , Gene Expression Regulation , HeLa Cells , Humans , MicroRNAs/chemistry
13.
Curr Protein Pept Sci ; 11(7): 538-49, 2010 Nov.
Article in English | MEDLINE | ID: mdl-20887262

ABSTRACT

SBASE is a project initiated to detect known domain types and predict domain architectures using sequence similarity searching (Simon et al., Protein Seq Data Anal, 5: 39-42, 1992; Pongor et al., Nucl. Acids. Res. 21:3111-3115, 1992). The current approach uses a curated collection of domain sequences - the SBASE domain library - and standard similarity search algorithms, followed by postprocessing based on simple statistics of the domain similarity network (http://hydra.icgeb.trieste.it/sbase/). It is especially useful in detecting rare, atypical examples of known domain types which are sometimes missed even by more sophisticated methodologies. This approach does not require multiple alignment or machine learning techniques, and can be a useful complement to other domain detection methodologies. This article gives an overview of the project history as well as of the concepts and principles developed within the project.


Subject(s)
Data Mining , Databases, Protein , Proteins/chemistry , Algorithms , Humans , Neural Networks, Computer , Online Systems , Protein Structure, Tertiary , Proteins/classification , ROC Curve , Sequence Homology, Amino Acid
14.
Curr Protein Pept Sci ; 11(7): 515-22, 2010 Nov.
Article in English | MEDLINE | ID: mdl-20887264

ABSTRACT

The emerging role of internal dynamics in protein fold and function requires new avenues of structure analysis. We analyzed the dynamically restrained conformational ensemble of ubiquitin generated from residual dipolar coupling data, in terms of protruding and buried atoms as well as interatomic distances, using four proximity-based algorithms, CX, DPX, PRIDE and PRIDE-NMR (http://hydra.icgeb.trieste.it/protein/). We found that ubiquitin, a relatively rigid molecule, has a highly diverse dynamic ensemble. The environment of protruding atoms is highly variable across conformers; in contrast, only a subset of buried atoms tends to fluctuate. The variability of the ensemble cautions against the use of single conformers when explaining functional phenomena. We also give a detailed evaluation of PRIDE-NMR on a wide dataset and discuss its usage in the light of the features of available NMR distance restraint sets in public databases.


Subject(s)
Ubiquitin/chemistry , Animals , Computer Simulation , Magnetic Resonance Spectroscopy , Models, Molecular , Online Systems , Principal Component Analysis , Protein Conformation
15.
Bioinformatics ; 26(19): 2482-3, 2010 Oct 01.
Article in English | MEDLINE | ID: mdl-20679333

ABSTRACT

Multi-netclust is a simple tool that allows users to extract connected clusters of data represented by different networks given in the form of matrices. The tool uses user-defined threshold values to combine the matrices, and uses a straightforward, memory-efficient graph algorithm to find clusters that are connected in all or in either of the networks. The tool is written in C/C++ and is available either as a form-based or as a command-line-based program running on Linux platforms. The algorithm is fast: processing a network of > 10^6 nodes and 10^8 edges takes only a few minutes on an ordinary computer. AVAILABILITY: http://www.bioinformatics.nl/netclust/.


Subject(s)
Cluster Analysis , Software , Algorithms , Databases, Factual , User-Computer Interface
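
The clustering described in the Multi-netclust abstract above can be sketched in a few lines of Python. This is not the tool's own C/C++ code: the function name `connected_clusters`, the sparse `(i, j) -> weight` input format, the assumption that weights are similarities (an edge is kept when its weight is at or above the threshold), and the reading of "in all or in either of the networks" as edge intersection or edge union are all illustrative assumptions. Connected components are found with a plain union-find structure.

```python
from collections import defaultdict


def find(parent, x):
    """Return the representative of x, shortening the path as we go."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def connected_clusters(n_nodes, networks, thresholds, mode="union"):
    """Cluster nodes 0..n_nodes-1 from several thresholded networks.

    networks   : list of dicts mapping (i, j) -> weight (hypothetical format)
    thresholds : one cutoff per network; edges with weight >= cutoff are kept
    mode       : "union" keeps an edge present in any network,
                 "intersection" keeps an edge present in every network
    """
    if mode not in ("union", "intersection"):
        raise ValueError("mode must be 'union' or 'intersection'")

    # Threshold each network and normalise edges to (low, high) node order.
    kept = [
        {(min(i, j), max(i, j)) for (i, j), w in net.items() if w >= t}
        for net, t in zip(networks, thresholds)
    ]
    if not kept:
        edges = set()
    elif mode == "union":
        edges = set().union(*kept)
    else:
        edges = set.intersection(*kept)

    # Merge the endpoints of every surviving edge.
    parent = list(range(n_nodes))
    for i, j in edges:
        ri, rj = find(parent, i), find(parent, j)
        if ri != rj:
            parent[rj] = ri

    # Group nodes by their representative.
    clusters = defaultdict(list)
    for node in range(n_nodes):
        clusters[find(parent, node)].append(node)
    return sorted(sorted(members) for members in clusters.values())


net_a = {(0, 1): 0.9, (1, 2): 0.4, (3, 4): 0.8}
net_b = {(0, 1): 0.7, (1, 2): 0.9, (3, 4): 0.2}
union_clusters = connected_clusters(5, [net_a, net_b], [0.5, 0.5], mode="union")
shared_clusters = connected_clusters(5, [net_a, net_b], [0.5, 0.5], mode="intersection")
```

Union-find keeps only one integer per node in memory, which is one plausible way to stay memory-efficient on large edge lists; the abstract does not say which graph algorithm the tool itself uses.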
16.
Nucleus ; 1(1): 8-11, 2010.
Article in English | MEDLINE | ID: mdl-21327098

ABSTRACT

Retroviruses integrate their genome into the chromatin of the host cell and are subject to the same control mechanisms governing transcription in the nucleus. There is increasing evidence that the spatial position of a gene within the nucleus in time affects its activity. Therefore it becomes important to study the chromatin environment in space and time of the HIV-1 provirus, particularly in cells where a tight transcriptional control allows the virus to hide away from antiviral treatment and immune response. We recently showed that the HIV-1 provirus is found at the nuclear periphery of latently infected lymphocytes associated in trans with centromeric heterochromatin. After induction of transcription, this association was lost, although the location of the transcribing provirus remained peripheral. Our results reveal a novel mechanism of transcriptional silencing involved in HIV-1 post-transcriptional latency and open wider perspectives for the general organization of chromatin in the nucleus.


Subject(s)
Cell Nucleus/virology , HIV/genetics , Cell Nucleus/metabolism , Chromatin/metabolism , HIV/metabolism , HIV Infections/genetics , HIV Infections/virology , Humans , T-Lymphocytes/immunology , T-Lymphocytes/metabolism , T-Lymphocytes/virology , Transcription, Genetic , Virus Latency/genetics
17.
FEBS J ; 276(21): 6247-57, 2009 Nov.
Article in English | MEDLINE | ID: mdl-19780835

ABSTRACT

Notch signaling controls spatial patterning and cell-fate decisions in all metazoans. Mutations in JAG1, one of the five Notch ligands in man, have been associated with Alagille syndrome and with a familial form of tetralogy of Fallot. A specific G274D mutation in the second epidermal growth factor repeat of the Jagged-1 was found to correlate with tetralogy of Fallot symptoms but not with usual Alagille syndrome phenotypes. To investigate the effects of this mutation, we studied the in vitro oxidative folding of the wild-type and mutant peptides encompassing the second epidermal growth factor. We found that the G274D mutation strongly impairs the correct folding of the epidermal growth factor module, and folding cannot be rescued by compensative mutations. The 274 position displays very low tolerance to substitution because neither the G274S nor the G274A mutants could be refolded in vitro. A sequence comparison of epidermal growth factor repeats found in human proteins revealed that the pattern displayed by the second epidermal growth factor is exclusively found in Notch ligands and that G274 is absolutely conserved within this group. We carried out a systematic and comprehensive analysis of mutations found in epidermal growth factor repeats and show that specific residue requirements for folding, structural integrity and correct post-translational processing may provide a rationale for most of the disease-associated mutations.


Subject(s)
Alagille Syndrome/genetics , Calcium-Binding Proteins/chemistry , Epidermal Growth Factor/chemistry , Intercellular Signaling Peptides and Proteins/chemistry , Membrane Proteins/chemistry , Mutation , Protein Folding , Tetralogy of Fallot/genetics , Amino Acid Sequence , Animals , Calcium-Binding Proteins/genetics , Humans , Intercellular Signaling Peptides and Proteins/genetics , Jagged-1 Protein , Membrane Proteins/genetics , Mice , Molecular Sequence Data , NIH 3T3 Cells , Serrate-Jagged Proteins , Tandem Repeat Sequences
18.
BMC Struct Biol ; 9: 43, 2009 Jul 08.
Article in English | MEDLINE | ID: mdl-19586525

ABSTRACT

BACKGROUND: Notch signaling drives developmental processes in all metazoans. The receptor binding region of the human Notch ligand Jagged-1 is made of a DSL (Delta/Serrate/Lag-2) domain and two atypical epidermal growth factor (EGF) repeats encoded by two exons, exon 5 and 6, which are out of phase with respect to the EGF domain boundaries. RESULTS: We determined the 1H-NMR solution structure of the polypeptide encoded by exon 6 of JAG1 and spanning the C-terminal region of EGF1 and the entire EGF2. We show that this single, evolutionarily conserved exon defines an autonomous structural unit that, despite the minimal structural context, closely matches the structure of the same region in the entire receptor binding module. CONCLUSION: In eukaryotic genomes, exon and domain boundaries usually coincide. We report a case study where this assertion does not hold, and show that the autonomously folding, structural unit is delimited by exon boundaries, rather than by predicted domain boundaries.


Subject(s)
Calcium-Binding Proteins/chemistry , Exons , Intercellular Signaling Peptides and Proteins/chemistry , Membrane Proteins/chemistry , Amino Acid Sequence , Calcium-Binding Proteins/genetics , Computer Simulation , Crystallography, X-Ray , Epidermal Growth Factor/chemistry , Humans , Intercellular Signaling Peptides and Proteins/genetics , Intracellular Signaling Peptides and Proteins , Jagged-1 Protein , Membrane Proteins/genetics , Molecular Sequence Data , Protein Structure, Tertiary , Receptors, Notch/chemistry , Serrate-Jagged Proteins
19.
J Biochem Biophys Methods ; 70(6): 1215-23, 2008 Apr 24.
Article in English | MEDLINE | ID: mdl-17604112

ABSTRACT

Development and testing of protein classification algorithms are hampered by the fact that the protein universe is characterized by groups vastly different in the number of members, in average protein size, similarity within group, etc. Datasets based on traditional cross-validation (k-fold, leave-one-out, etc.) may not give reliable estimates on how an algorithm will generalize to novel, distantly related subtypes of the known protein classes. Supervised cross-validation, i.e., selection of test and train sets according to the known subtypes within a database has been successfully used earlier in conjunction with the SCOP database. Our goal was to extend this principle to other databases and to design standardized benchmark datasets for protein classification. Hierarchical classification trees of protein categories provide a simple and general framework for designing supervised cross-validation strategies for protein classification. Benchmark datasets can be designed at various levels of the concept hierarchy using a simple graph-theoretic distance. A combination of supervised and random sampling was selected to construct reduced size model datasets, suitable for algorithm comparison. Over 3000 new classification tasks were added to our recently established protein classification benchmark collection that currently includes protein sequence (including protein domains and entire proteins), protein structure and reading frame DNA sequence data. We carried out an extensive evaluation based on various machine-learning algorithms such as nearest neighbor, support vector machines, artificial neural networks, random forests and logistic regression, used in conjunction with comparison algorithms, BLAST, Smith-Waterman, Needleman-Wunsch, as well as 3D comparison methods DALI and PRIDE. The resulting datasets provide lower, and in our opinion more realistic estimates of the classifier performance than do random cross-validation schemes. 


Subject(s)
Algorithms , Proteins/analysis , Proteins/classification , Proteins/chemistry , Sequence Analysis, Protein
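
The supervised cross-validation idea in the abstract above (test and train sets chosen according to known subtypes rather than at random) can be illustrated with a short Python sketch. It is a simplified reading, not the authors' benchmark code: the record format `(item_id, group, subgroup)`, the function name `supervised_splits`, and the way negatives are divided (whole negative subgroups alternate between train and test) are assumptions made only for illustration.

```python
from collections import defaultdict


def supervised_splits(records, target_group):
    """Yield (held_out_subgroup, train, test) for one target group.

    records : list of (item_id, group, subgroup) tuples (hypothetical format)
    train/test are lists of (item_id, label); label 1 marks target_group.
    Each positive subgroup is held out once, so test positives always come
    from a subgroup that contributed no positives to training.
    """
    pos = defaultdict(list)
    neg = defaultdict(list)
    for item_id, group, subgroup in records:
        bucket = pos if group == target_group else neg
        bucket[(group, subgroup)].append(item_id)

    # Need at least two positive subgroups: one to train on, one to hold out.
    if len(pos) < 2:
        return

    # Illustrative assumption: whole negative subgroups alternate between
    # train and test, so negatives are never split inside a subgroup.
    neg_train, neg_test = [], []
    for k, key in enumerate(sorted(neg)):
        (neg_train if k % 2 == 0 else neg_test).extend(neg[key])

    for held_out in sorted(pos):
        train = [(i, 1) for key in sorted(pos) if key != held_out for i in pos[key]]
        test = [(i, 1) for i in pos[held_out]]
        train += [(i, 0) for i in neg_train]
        test += [(i, 0) for i in neg_test]
        yield held_out[1], train, test


records = [
    ("a1", "G", "f1"), ("a2", "G", "f1"), ("b1", "G", "f2"),
    ("c1", "H", "f3"), ("d1", "K", "f4"),
]
splits = list(supervised_splits(records, "G"))
```

A random k-fold split of the same records could place `a1` in training and `a2` in testing even though both belong to subgroup `f1`; the split above never does, which is why such schemes tend to give lower performance estimates.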
20.
Nucleic Acids Res ; 35(Database issue): D232-6, 2007 Jan.
Article in English | MEDLINE | ID: mdl-17142240

ABSTRACT

Protein classification by machine learning algorithms is now widely used in structural and functional annotation of proteins. The Protein Classification Benchmark collection (http://hydra.icgeb.trieste.it/benchmark) was created in order to provide standard datasets on which the performance of machine learning methods can be compared. It is primarily meant for method developers and users interested in comparing methods under standardized conditions. The collection contains datasets of sequences and structures, and each set is subdivided into positive/negative, training/test sets in several ways. There is a total of 6405 classification tasks, 3297 on protein sequences, 3095 on protein structures and 10 on protein coding regions in DNA. Typical tasks include the classification of structural domains in the SCOP and CATH databases based on their sequences or structures, as well as various functional and taxonomic classification problems. In the case of hierarchical classification schemes, the classification tasks can be defined at various levels of the hierarchy (such as classes, folds, superfamilies, etc.). For each dataset there are distance matrices available that contain all vs. all comparison of the data, based on various sequence or structure comparison methods, as well as a set of classification performance measures computed with various classifier algorithms.


Subject(s)
Artificial Intelligence , Databases, Protein , Proteins/classification , Algorithms , Internet , Protein Structure, Tertiary , Proteins/chemistry , Reproducibility of Results , Sequence Analysis, Protein , User-Computer Interface
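
As a rough illustration of how an all vs. all distance matrix and a training/test subdivision of the kind described above might be combined, the following Python sketch classifies each test item by the label of its nearest training item. The in-memory matrix layout (a list of lists indexed by item position), the function names, and the toy values are assumptions for illustration; they do not reflect the collection's actual file formats or its published performance measures.

```python
def nearest_neighbor_predict(dist, train_idx, train_labels, test_idx):
    """Predict a label for each test item from a precomputed distance matrix.

    dist         : square matrix, dist[i][j] = distance between items i and j
    train_idx    : positions of training items in the matrix
    train_labels : labels (1 = positive, 0 = negative) aligned with train_idx
    test_idx     : positions of test items in the matrix
    """
    predictions = []
    for t in test_idx:
        # Position (within train_idx) of the closest training item.
        best = min(range(len(train_idx)), key=lambda k: dist[t][train_idx[k]])
        predictions.append(train_labels[best])
    return predictions


def accuracy(predicted, expected):
    """Fraction of predictions that match the expected labels."""
    hits = sum(1 for p, e in zip(predicted, expected) if p == e)
    return hits / len(expected)


# Toy 4 x 4 all vs. all matrix: items 0 and 1 are close, items 2 and 3 are close.
dist = [
    [0, 1, 5, 6],
    [1, 0, 4, 5],
    [5, 4, 0, 1],
    [6, 5, 1, 0],
]
preds = nearest_neighbor_predict(dist, train_idx=[0, 2], train_labels=[1, 0], test_idx=[1, 3])
score = accuracy(preds, [1, 0])
```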