
CATH: increased structural coverage of functional space.

Sillitoe, Ian; Bordin, Nicola; Dawson, Natalie; Waman, Vaishali P; Ashford, Paul; Scholes, Harry M; Pang, Camilla S M; Woodridge, Laurel; Rauer, Clemens; Sen, Neeladri; Abbasian, Mahnaz; Le Cornu, Sean; Lam, Su Datt; Berka, Karel; Varekova, Ivana Hutarová; Svobodova, Radka; Lees, Jon; Orengo, Christine A.

Nucleic Acids Res ; 49(D1): D266-D273, 2021 01 08.

Article in English | MEDLINE | ID: mdl-33237325

ABSTRACT

CATH (https://www.cathdb.info) identifies domains in protein structures from wwPDB and classifies these into evolutionary superfamilies, thereby providing structural and functional annotations. There are two levels: CATH-B, a daily snapshot of the latest domain structures and superfamily assignments, and CATH+, with additional derived data, such as predicted sequence domains, and functionally coherent sequence subsets (Functional Families or FunFams). The latest CATH+ release, version 4.3, significantly increases coverage of structural and sequence data, with an addition of 65 351 fully classified domain structures (+15%), providing 500 238 structural domains, and 151 million predicted sequence domains (+59%) assigned to 5481 superfamilies. The FunFam generation pipeline has been re-engineered to cope with the increased influx of data. Three times more sequences are captured in FunFams, with a concomitant increase in functional purity, information content and structural coverage. FunFam expansion increases the structural annotations provided for experimental GO terms (+59%). We also present CATH-FunVar web-pages displaying variations in protein sequences and their proximity to known or predicted functional sites. We present two case studies: (1) putative cancer drivers and (2) SARS-CoV-2 proteins. Finally, we have improved links to and from CATH including SCOP, InterPro, Aquaria and 2DProt.

Subject(s)

Computational Biology/statistics & numerical data , Databases, Protein/statistics & numerical data , Protein Domains , Proteins/chemistry , Amino Acid Sequence , COVID-19/epidemiology , COVID-19/prevention & control , COVID-19/virology , Computational Biology/methods , Epidemics , Humans , Internet , Molecular Sequence Annotation , Proteins/genetics , Proteins/metabolism , SARS-CoV-2/genetics , SARS-CoV-2/metabolism , SARS-CoV-2/physiology , Sequence Analysis, Protein/methods , Sequence Homology, Amino Acid , Viral Proteins/chemistry , Viral Proteins/genetics , Viral Proteins/metabolism

CATH functional families predict functional sites in proteins.

Das, Sayoni; Scholes, Harry M; Sen, Neeladri; Orengo, Christine.

Bioinformatics ; 37(8): 1099-1106, 2021 05 23.

Article in English | MEDLINE | ID: mdl-33135053

ABSTRACT

MOTIVATION: Identification of functional sites in proteins is essential for functional characterization, variant interpretation and drug design. Several methods are available for predicting either a generic functional site, or specific types of functional site. Here, we present FunSite, a machine learning predictor that identifies catalytic, ligand-binding and protein-protein interaction functional sites using features derived from protein sequence and structure, and evolutionary data from CATH functional families (FunFams). RESULTS: FunSite's prediction performance was rigorously benchmarked using cross-validation and a holdout dataset. FunSite outperformed other publicly available functional site prediction methods. We show that conserved residues in FunFams are enriched in functional sites. We found FunSite's performance depends greatly on the quality of functional site annotations and the information content of FunFams in the training data. Finally, we analyze which structural and evolutionary features are most predictive for functional sites. AVAILABILITY AND IMPLEMENTATION: https://github.com/UCL/cath-funsite-predictor. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Machine Learning , Proteins , Amino Acid Sequence , Biological Evolution , Humans , Proteins/genetics

CATH: expanding the horizons of structure-based functional annotations for genome sequences.

Sillitoe, Ian; Dawson, Natalie; Lewis, Tony E; Das, Sayoni; Lees, Jonathan G; Ashford, Paul; Tolulope, Adeyelu; Scholes, Harry M; Senatorov, Ilya; Bujan, Andra; Ceballos Rodriguez-Conde, Fatima; Dowling, Benjamin; Thornton, Janet; Orengo, Christine A.

Nucleic Acids Res ; 47(D1): D280-D284, 2019 01 08.

Article in English | MEDLINE | ID: mdl-30398663

ABSTRACT

This article provides an update of the latest data and developments within the CATH protein structure classification database (http://www.cathdb.info). The resource provides two levels of release: CATH-B, a daily snapshot of the latest structural domain boundaries and superfamily assignments, and CATH+, which adds layers of derived data, such as predicted sequence domains, functional annotations and functional clustering (known as Functional Families or FunFams). The most recent CATH+ release (version 4.2) provides a huge update in the coverage of structural data. This release increases the number of fully classified domains by over 40% (from 308 999 to 434 857 structural domains), corresponding to an almost two-fold increase in sequence data (from 53 million to over 95 million predicted domains) organised into 6119 superfamilies. The coverage of high-resolution, protein PDB chains that contain at least one assigned CATH domain is now 90.2% (increased from 82.3% in the previous release). A number of highly requested features have also been implemented in our web pages: allowing the user to view an alignment between their query sequence and a representative FunFam structure and providing tools that make it easier to view the full structural context (multi-domain architecture) of domains and chains.

Subject(s)

Databases, Protein , Genome , Amino Acid Sequence , Animals , Conserved Sequence , Gene Ontology , Humans , Models, Molecular , Molecular Sequence Annotation , Multigene Family/genetics , Protein Conformation , Protein Domains/genetics , Sequence Alignment , Sequence Homology, Amino Acid , Structure-Activity Relationship

CD8+ T cell responses to lytic EBV infection: late antigen specificities as subdominant components of the total response.

Abbott, Rachel J M; Quinn, Laura L; Leese, Alison M; Scholes, Harry M; Pachnio, Annette; Rickinson, Alan B.

J Immunol ; 191(11): 5398-409, 2013 Dec 01.

Article in English | MEDLINE | ID: mdl-24146041

ABSTRACT

EBV elicits primary CD8(+) T cell responses that, by T cell cloning from infectious mononucleosis (IM) patients, appear skewed toward immediate early (IE) and some early (E) lytic cycle proteins, with late (L) proteins rarely targeted. However, L Ag-specific responses have been detected regularly in polyclonal T cell cultures from long-term virus carriers. To resolve this apparent difference between responses to primary and persistent infection, 13 long-term carriers were screened in ex vivo IFN-γ ELISPOT assays using peptides spanning the two IE, six representative E, and seven representative L proteins. This revealed memory CD8 responses to 44 new lytic cycle epitopes that straddle all three protein classes but, in terms of both frequency and size, maintain the IE > E > L hierarchy of immunodominance. Having identified the HLA restriction of 10 (including 7 L) new epitopes using memory CD8(+) T cell clones, we looked in HLA-matched IM patients and found such reactivities but typically at low levels, explaining why they had gone undetected in the original IM clonal screens. Wherever tested, all CD8(+) T cell clones against these novel lytic cycle epitopes recognized lytically infected cells naturally expressing their target Ag. Surprisingly, however, clones against the most frequently recognized L Ag, the BNRF1 tegument protein, also recognized latently infected, growth-transformed cells. We infer that BNRF1 is also a latent Ag that could be targeted in T cell therapy of EBV-driven B-lymphoproliferative disease.

Subject(s)

CD8-Positive T-Lymphocytes/immunology , Herpesvirus 4, Human/immunology , Infectious Mononucleosis/immunology , Amino Acid Sequence , CD8-Positive T-Lymphocytes/virology , Cells, Cultured , Enzyme-Linked Immunospot Assay , HLA Antigens/metabolism , Humans , Immunodominant Epitopes/immunology , Immunodominant Epitopes/metabolism , Interferon-gamma/metabolism , Molecular Sequence Data , Peptide Fragments/immunology , Peptide Fragments/metabolism , Protein Binding , Viral Envelope Proteins/immunology , Viral Envelope Proteins/metabolism , Virus Latency/immunology

Dynamic changes in the brain protein interaction network correlates with progression of Aβ42 pathology in Drosophila.

Scholes, Harry M; Cryar, Adam; Kerr, Fiona; Sutherland, David; Gethings, Lee A; Vissers, Johannes P C; Lees, Jonathan G; Orengo, Christine A; Partridge, Linda; Thalassinos, Konstantinos.

Sci Rep ; 10(1): 18517, 2020 10 28.

Article in English | MEDLINE | ID: mdl-33116184

ABSTRACT

Alzheimer's disease (AD), the most prevalent form of dementia, is a progressive and devastating neurodegenerative condition for which there are no effective treatments. Understanding the molecular pathology of AD during disease progression may identify new ways to reduce neuronal damage. Here, we present a longitudinal study tracking dynamic proteomic alterations in the brains of an inducible Drosophila melanogaster model of AD expressing the Arctic mutant Aβ42 gene. We identified 3093 proteins from flies that were induced to express Aβ42 and age-matched healthy controls using label-free quantitative ion-mobility data independent analysis mass spectrometry. Of these, 228 proteins were significantly altered by Aβ42 accumulation and were enriched for AD-associated processes. Network analyses further revealed that these proteins have distinct hub and bottleneck properties in the brain protein interaction network, suggesting that several may have significant effects on brain function. Our unbiased analysis provides useful insights into the key processes governing the progression of amyloid toxicity and forms a basis for further functional analyses in model organisms and translation to mammalian systems.

Subject(s)

Amyloid beta-Peptides/metabolism , Brain/metabolism , Peptide Fragments/metabolism , Protein Interaction Maps/physiology , Alzheimer Disease/metabolism , Alzheimer Disease/physiopathology , Amyloid beta-Peptides/physiology , Animals , Disease Models, Animal , Disease Progression , Drosophila Proteins/metabolism , Drosophila melanogaster/metabolism , Longitudinal Studies , Neurons/metabolism , Peptide Fragments/physiology , Proteomics/methods

