Results 1 - 20 of 30
1.
Nucleic Acids Res ; 49(D1): D266-D273, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33237325

ABSTRACT

CATH (https://www.cathdb.info) identifies domains in protein structures from wwPDB and classifies these into evolutionary superfamilies, thereby providing structural and functional annotations. There are two levels: CATH-B, a daily snapshot of the latest domain structures and superfamily assignments, and CATH+, with additional derived data, such as predicted sequence domains, and functionally coherent sequence subsets (Functional Families or FunFams). The latest CATH+ release, version 4.3, significantly increases coverage of structural and sequence data, with an addition of 65,351 fully-classified domain structures (+15%), providing 500 238 structural domains, and 151 million predicted sequence domains (+59%) assigned to 5481 superfamilies. The FunFam generation pipeline has been re-engineered to cope with the increased influx of data. Three times more sequences are captured in FunFams, with a concomitant increase in functional purity, information content and structural coverage. FunFam expansion increases the structural annotations provided for experimental GO terms (+59%). We also present CATH-FunVar web-pages displaying variations in protein sequences and their proximity to known or predicted functional sites. We present two case studies: (1) putative cancer drivers and (2) SARS-CoV-2 proteins. Finally, we have improved links to and from CATH including SCOP, InterPro, Aquaria and 2DProt.


Subject(s)
Computational Biology/statistics & numerical data , Databases, Protein/statistics & numerical data , Protein Domains , Proteins/chemistry , Amino Acid Sequence , COVID-19/epidemiology , COVID-19/prevention & control , COVID-19/virology , Computational Biology/methods , Epidemics , Humans , Internet , Molecular Sequence Annotation , Proteins/genetics , Proteins/metabolism , SARS-CoV-2/genetics , SARS-CoV-2/metabolism , SARS-CoV-2/physiology , Sequence Analysis, Protein/methods , Sequence Homology, Amino Acid , Viral Proteins/chemistry , Viral Proteins/genetics , Viral Proteins/metabolism
2.
Nucleic Acids Res ; 47(D1): D280-D284, 2019 01 08.
Article in English | MEDLINE | ID: mdl-30398663

ABSTRACT

This article provides an update of the latest data and developments within the CATH protein structure classification database (http://www.cathdb.info). The resource provides two levels of release: CATH-B, a daily snapshot of the latest structural domain boundaries and superfamily assignments, and CATH+, which adds layers of derived data, such as predicted sequence domains, functional annotations and functional clustering (known as Functional Families or FunFams). The most recent CATH+ release (version 4.2) provides a huge update in the coverage of structural data. This release increases the number of fully-classified domains by over 40% (from 308 999 to 434 857 structural domains), corresponding to an almost two-fold increase in sequence data (from 53 million to over 95 million predicted domains) organised into 6119 superfamilies. The coverage of high-resolution, protein PDB chains that contain at least one assigned CATH domain is now 90.2% (increased from 82.3% in the previous release). A number of highly requested features have also been implemented in our web pages: allowing the user to view an alignment between their query sequence and a representative FunFam structure and providing tools that make it easier to view the full structural context (multi-domain architecture) of domains and chains.


Subject(s)
Databases, Protein , Genome , Amino Acid Sequence , Animals , Conserved Sequence , Gene Ontology , Humans , Models, Molecular , Molecular Sequence Annotation , Multigene Family/genetics , Protein Conformation , Protein Domains/genetics , Sequence Alignment , Sequence Homology, Amino Acid , Structure-Activity Relationship
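The growth figures quoted in the abstract above can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the cited article: the helper name `percent_increase` is hypothetical, and "over 95 million" is taken as exactly 95 million, so the fold change is a lower bound.

```python
def percent_increase(old: float, new: float) -> float:
    """Return the relative growth from old to new, in percent."""
    return (new - old) / old * 100.0


# Figures as quoted in the CATH 4.2 abstract.
domain_growth = percent_increase(308_999, 434_857)   # structural domains
sequence_fold = 95_000_000 / 53_000_000               # predicted sequence domains
coverage_gain = 90.2 - 82.3                           # percentage points of PDB chain coverage

print(f"Structural domains: +{domain_growth:.1f}%")
print(f"Sequence domains: {sequence_fold:.2f}-fold")
print(f"Chain coverage: +{coverage_gain:.1f} percentage points")
```

The results are consistent with the abstract's wording: an increase of "over 40%" in classified domains and an "almost two-fold" increase in sequence data.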
3.
Nucleic Acids Res ; 46(D1): D435-D439, 2018 01 04.
Article in English | MEDLINE | ID: mdl-29112716

ABSTRACT

Gene3D (http://gene3d.biochem.ucl.ac.uk) is a database of globular domain annotations for millions of available protein sequences. Gene3D has previously featured in the Database issue of NAR and here we report a significant update to the Gene3D database. The current release, Gene3D v16, has significantly expanded its domain coverage over the previous version and now contains over 95 million domain assignments. We also report a new method for dealing with complex domain architectures that exist in Gene3D, arising from discontinuous domains. Amongst other updates, we have added visualization tools for exploring domain annotations in the context of other sequence features and in gene families. We also provide web-pages to visualize other domain families that co-occur with a given query domain family.


Subject(s)
Databases, Protein , Genome , Protein Domains , Proteins/chemistry , Software , Amino Acid Sequence , Animals , Computer Graphics , Humans , Internet , Molecular Sequence Annotation , Proteins/genetics , Proteins/metabolism , Sequence Analysis, Protein
4.
Nucleic Acids Res ; 45(D1): D289-D295, 2017 01 04.
Article in English | MEDLINE | ID: mdl-27899584

ABSTRACT

The latest version of the CATH-Gene3D protein structure classification database has recently been released (version 4.1, http://www.cathdb.info). The resource comprises over 300 000 domain structures and over 53 million protein domains classified into 2737 homologous superfamilies, doubling the number of predicted protein domains in the previous version. The daily-updated CATH-B, which contains our very latest domain assignment data, provides putative classifications for over 100 000 additional protein domains. This article describes developments to the CATH-Gene3D resource over the last two years since the publication in 2015, including: significant increases to our structural and sequence coverage; expansion of the functional families in CATH; building a support vector machine (SVM) to automatically assign domains to superfamilies; improved search facilities to return alignments of query sequences against multiple sequence alignments; the redesign of the web pages and download site.


Subject(s)
Computational Biology/methods , Databases, Protein , Models, Molecular , Proteins/chemistry , Proteins/metabolism , Software , Structure-Activity Relationship , Web Browser
5.
Nucleic Acids Res ; 44(D1): D404-9, 2016 Jan 04.
Article in English | MEDLINE | ID: mdl-26578585

ABSTRACT

Gene3D (http://gene3d.biochem.ucl.ac.uk) is a database of domain annotations of Ensembl and UniProtKB protein sequences. Domains are predicted using a library of profile HMMs representing 2737 CATH superfamilies. Gene3D has previously featured in the Database issue of NAR and here we report updates to the website and database. The current Gene3D (v14) release has expanded its domain assignments to ∼20,000 cellular genomes and over 43 million unique protein sequences, more than doubling the number of protein sequences since our last publication. Amongst other updates, we have improved our Functional Family annotation method. We have also improved the quality and coverage of our 3D homology modelling pipeline of predicted CATH domains. Additionally, the structural models have been expanded to include an extra model organism (Drosophila melanogaster). We also document a number of additional visualization tools in the Gene3D website.


Subject(s)
Databases, Protein , Protein Structure, Tertiary , Humans , Internet , Models, Molecular , Molecular Sequence Annotation , Protein Interaction Domains and Motifs , Protein Structure, Tertiary/genetics
6.
PLoS Comput Biol ; 12(6): e1004926, 2016 06.
Article in English | MEDLINE | ID: mdl-27332861

ABSTRACT

Beta-lactamases represent the main bacterial mechanism of resistance to beta-lactam antibiotics and are a significant challenge to modern medicine. We have developed an automated classification and analysis protocol that exploits structure- and sequence-based approaches and which allows us to propose a grouping of serine beta-lactamases that more consistently captures and rationalizes the existing three classification schemes: Classes (A, C and D, which vary in their implementation of the mechanism of action); Types (that largely reflect evolutionary distance measured by sequence similarity); and Variant groups (which largely correspond with the Bush-Jacoby clinical groups). Our analysis platform exploits a suite of in-house and public tools to identify Functional Determinants (FDs), i.e. residue sites, responsible for conferring different phenotypes between different classes, different types and different variants. We focused on Class A beta-lactamases, the most highly populated and clinically relevant class, to identify FDs implicated in the distinct phenotypes associated with different Class A Types and Variants. We show that our FunFHMMer method can separate the known beta-lactamase classes and identify those positions likely to be responsible for the different implementations of the mechanism of action in these enzymes. Two novel algorithms, ASSP and SSPA, allow detection of FD sites likely to contribute to the broadening of the substrate profiles. Using our approaches, we recognise 151 Class A types in UniProt. Finally, we used our beta-lactamase FunFams and ASSP profiles to detect 4 novel Class A types in microbiome samples. Our platforms have been validated by literature studies, in silico analysis and some targeted experimental verification. Although developed for the serine beta-lactamases, they could be used to classify and analyse any diverse protein superfamily where sub-families have diverged over both long and short evolutionary timescales.


Subject(s)
Algorithms , Molecular Docking Simulation/methods , Sequence Analysis, Protein/methods , Software , beta-Lactamases/chemistry , beta-Lactamases/ultrastructure , Binding Sites , Computer Simulation , Drug Resistance, Bacterial , Enzyme Activation , Protein Binding , Serine , Structure-Activity Relationship , Substrate Specificity , beta-Lactam Resistance , beta-Lactamase Inhibitors/chemistry
7.
Nucleic Acids Res ; 43(W1): W148-53, 2015 Jul 01.
Article in English | MEDLINE | ID: mdl-25964299

ABSTRACT

The widening function annotation gap in protein databases and the increasing number and diversity of the proteins being sequenced present new challenges to protein function prediction methods. Multidomain proteins complicate the protein sequence-structure-function relationship further as new combinations of domains can expand the functional repertoire, creating new proteins and functions. Here, we present the FunFHMMer web server, which provides Gene Ontology (GO) annotations for query protein sequences based on the functional classification of the domain-based CATH-Gene3D resource. Our server also provides valuable information for the prediction of functional sites. The predictive power of FunFHMMer has been validated on a set of 95 proteins where FunFHMMer performs better than BLAST, Pfam and CDD. Recent validation by an independent international competition ranks FunFHMMer as one of the top function prediction methods in predicting GO annotations for both the Biological Process and Molecular Function Ontology. The FunFHMMer web server is available at http://www.cathdb.info/search/by_funfhmmer.


Subject(s)
Molecular Sequence Annotation , Protein Structure, Tertiary , Software , Gene Ontology , Internet , Proteins/classification , Proteins/genetics , Proteins/physiology
8.
Nucleic Acids Res ; 43(Database issue): D376-81, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25348408

ABSTRACT

The latest version of the CATH-Gene3D protein structure classification database (4.0, http://www.cathdb.info) provides annotations for over 235,000 protein domain structures and includes 25 million domain predictions. This article provides an update on the major developments in the 2 years since the last publication in this journal including: significant improvements to the predictive power of our functional families (FunFams); the release of our 'current' putative domain assignments (CATH-B); a new, strictly non-redundant data set of CATH domains suitable for homology benchmarking experiments (CATH-40) and a number of improvements to the web pages.


Subject(s)
Databases, Protein , Molecular Sequence Annotation , Protein Structure, Tertiary , Genomics , Internet , Protein Structure, Tertiary/genetics , Proteins/classification
9.
Hum Mutat ; 37(4): 364-70, 2016 Apr.
Article in English | MEDLINE | ID: mdl-26703369

ABSTRACT

Inactivating mutations in TSC1 and TSC2 cause tuberous sclerosis complex (TSC). The 2012 international consensus meeting on TSC diagnosis and management agreed that the identification of a pathogenic TSC1 or TSC2 variant establishes a diagnosis of TSC, even in the absence of clinical signs. However, exons 25 and 31 of TSC2 are subject to alternative splicing. No variants causing clinically diagnosed TSC have been reported in these exons, raising the possibility that such variants would not cause TSC. We present truncating and in-frame variants in exons 25 and 31 in three individuals unlikely to fulfil TSC diagnostic criteria and examine the importance of these exons in TSC using different approaches. Amino acid conservation analysis suggests significantly less conservation in these exons compared with the majority of TSC2 exons, and TSC2 expression data demonstrates that the majority of TSC2 transcripts lack exons 25 and/or 31 in many human adult tissues. In vitro assay of both exons shows that neither exon is essential for TSC complex function. Our evidence suggests that variants in TSC2 exons 25 or 31 are very unlikely to cause classical TSC, although a role for these exons in tissue/stage specific development cannot be excluded.


Subject(s)
Exons , Genetic Association Studies , Mutation , Tuberous Sclerosis/diagnosis , Tuberous Sclerosis/genetics , Tumor Suppressor Proteins/genetics , Adult , Alleles , Alternative Splicing , Child , Child, Preschool , Computational Biology/methods , Databases, Genetic , Gene Expression , Genetic Variation , Humans , Phenotype , Tuberous Sclerosis Complex 2 Protein
10.
Bioinformatics ; 31(21): 3460-7, 2015 Nov 01.
Article in English | MEDLINE | ID: mdl-26139634

ABSTRACT

MOTIVATION: Computational approaches that can predict protein functions are essential to bridge the widening function annotation gap, especially since <1.0% of all proteins in UniProtKB have been experimentally characterized. We present a domain-based method for protein function classification and prediction of functional sites that exploits functional sub-classification of CATH superfamilies. The superfamilies are sub-classified into functional families (FunFams) using a hierarchical clustering algorithm supervised by a new classification method, FunFHMMer. RESULTS: FunFHMMer generates more functionally coherent groupings of protein sequences than other domain-based protein classifications. This has been validated using known functional information. The conserved positions predicted by the FunFams are also found to be enriched in known functional residues. Moreover, the functional annotations provided by the FunFams are found to be more precise than other domain-based resources. FunFHMMer currently identifies 110,439 FunFams in 2735 superfamilies which can be used to functionally annotate >16 million domain sequences. AVAILABILITY AND IMPLEMENTATION: All FunFam annotation data are made available through the CATH webpages (http://www.cathdb.info). The FunFHMMer webserver (http://www.cathdb.info/search/by_funfhmmer) allows users to submit query sequences for assignment to a CATH FunFam. CONTACT: sayoni.das.12@ucl.ac.uk SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms , Databases, Protein , Molecular Sequence Annotation , Protein Structure, Tertiary , Proteins/chemistry , Proteins/classification , Amino Acid Sequence , Humans , Molecular Sequence Data , Proteins/genetics , Proteins/metabolism , Sequence Analysis, Protein , Sequence Homology, Amino Acid , Structural Homology, Protein
11.
Nucleic Acids Res ; 42(Database issue): D240-5, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24270792

ABSTRACT

Gene3D (http://gene3d.biochem.ucl.ac.uk) is a database of protein domain structure annotations for protein sequences. Domains are predicted using a library of profile HMMs from 2738 CATH superfamilies. Gene3D assigns domain annotations to Ensembl and UniProt sequence sets including >6000 cellular genomes and >20 million unique protein sequences. This represents an increase of 45% in the number of protein sequences since our last publication. Thanks to improvements in the underlying data and pipeline, we see large increases in the domain coverage of sequences. We have expanded this coverage by integrating Pfam and SUPERFAMILY domain annotations, and we now resolve domain overlaps to provide highly comprehensive composite multi-domain architectures. To make these data more accessible for comparative genome analyses, we have developed novel search algorithms for searching genomes to identify related multi-domain architectures. In addition to providing domain family annotations, we have now developed a pipeline for 3D homology modelling of domains in Gene3D. This has been applied to the human genome and will be rolled out to other major organisms over the next year.


Subject(s)
Databases, Protein , Molecular Sequence Annotation , Protein Structure, Tertiary , Genome , Genomics , Internet , Models, Molecular , Protein Structure, Tertiary/genetics , Sequence Analysis, Protein
13.
Nucleic Acids Res ; 41(Database issue): D490-8, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23203873

ABSTRACT

CATH version 3.5 (Class, Architecture, Topology, Homology, available at http://www.cathdb.info/) contains 173 536 domains, 2626 homologous superfamilies and 1313 fold groups. When focusing on structural genomics (SG) structures, we observe that the number of new folds for CATH v3.5 is slightly less than for previous releases, and this observation suggests that we may now know the majority of folds that are easily accessible to structure determination. We have improved the accuracy of our functional family (FunFams) sub-classification method and the CATH sequence domain search facility has been extended to provide FunFam annotations for each domain. The CATH website has been redesigned. We have improved the display of functional data and of conserved sequence features associated with FunFams within each CATH superfamily.


Subject(s)
Databases, Protein , Protein Structure, Tertiary , Genomics , Internet , Molecular Sequence Annotation , Protein Folding , Proteins/chemistry , Proteins/classification , Proteins/genetics , Sequence Alignment , Sequence Analysis, Protein , Structural Homology, Protein
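The four levels named in the abstract above (Class, Architecture, Topology, Homology) are reflected in CATH's dotted four-number identifiers, with the first three numbers identifying the fold group and all four identifying the homologous superfamily. The following is a minimal sketch of parsing such an identifier; the `CathId` type and the helper names are hypothetical and are not part of any CATH software, and the identifier used is only an illustrative input.

```python
from typing import NamedTuple


class CathId(NamedTuple):
    """One number per level of the CATH hierarchy."""
    cath_class: int
    architecture: int
    topology: int
    homology: int


def parse_cath_id(text: str) -> CathId:
    """Parse a dotted identifier such as '3.40.50.300' into its four levels."""
    parts = text.strip().split(".")
    if len(parts) != 4 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected four dot-separated integers, got {text!r}")
    return CathId(*(int(p) for p in parts))


def fold_group(cid: CathId) -> str:
    """Return the Class.Architecture.Topology prefix (the fold group)."""
    return f"{cid.cath_class}.{cid.architecture}.{cid.topology}"


superfamily = parse_cath_id("3.40.50.300")
print(superfamily)
print(fold_group(superfamily))
```

Grouping superfamily identifiers by their `fold_group` prefix is one simple way to see how many superfamilies share a fold, which is the relationship behind the superfamily and fold-group counts reported in the abstract.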
14.
Biochim Biophys Acta ; 1834(5): 874-89, 2013 May.
Article in English | MEDLINE | ID: mdl-23499848

ABSTRACT

We present, to our knowledge, the first quantitative analysis of functional site diversity in homologous domain superfamilies. Different types of functional sites are considered separately. Our results show that most diverse superfamilies are very plastic in terms of the spatial location of their functional sites. This is especially true for protein-protein interfaces. In contrast, we confirm that catalytic sites typically occupy only a very small number of topological locations. Small-ligand binding sites are more diverse than expected, although in a more limited manner than protein-protein interfaces. In spite of the observed diversity, our results also confirm the previously reported preferential location of functional sites. We identify a subset of homologous domain superfamilies where diversity is particularly extreme, and discuss possible reasons for such plasticity, i.e. structural diversity. Our results do not contradict previous reports of preferential co-location of sites among homologues, but rather point at the importance of not ignoring other sites, especially in large and diverse superfamilies. Data on sites exploited by different relatives, within each well annotated domain superfamily, has been made accessible from the CATH website in order to highlight versatile superfamilies or superfamilies with highly preferential sites. This information is valuable for system biology and knowledge of any constraints on protein interactions could help in understanding the dynamic control of networks in which these proteins participate. 
The novelty of our work lies in the comprehensive nature of the analysis - we have used a significantly larger dataset than previous studies - and the fact that in many superfamilies we show that different parts of the domain surface are exploited by different relatives for ligand/protein interactions, particularly in superfamilies which are diverse in sequence and structure, an observation not previously reported on such a large scale. This article is part of a Special Issue entitled: The emerging dynamic view of proteins: Protein plasticity in allostery, evolution and self-assembly.


Subject(s)
Proteins/physiology , Protein Binding
15.
Mol Pharm ; 10(1): 127-41, 2013 Jan 07.
Article in English | MEDLINE | ID: mdl-23210981

ABSTRACT

Cationic peptide sequences, whether linear, branched, or dendritic, are widely used to condense and protect DNA in both polyplex and lipopolyplex gene delivery vectors. How these peptides behave within these particles and the consequences this has on transfection efficiency remain poorly understood. We have compared, in parallel, a complete series of cationic peptides, both branched and linear, coformulated with plasmid DNA to give polyplexes, or with plasmid DNA and the cationic lipid, DOTMA, mixed with 50% of the neutral helper lipid, DOPE, to give lipopolyplexes, and correlated the transfection efficiencies of these complexes to their biophysical properties. Lipopolyplexes formulated from branched Arg-rich peptides, or linear Lys-rich peptides, show the best transfection efficiencies in an alveolar epithelial cell line, with His-rich peptides being relatively ineffective. The majority of the biophysical studies (circular dichroism, dynamic light scattering, zeta potential, small angle neutron scattering, and gel band shift assay) indicated that all of the formulations were similar in size, surface charge, and lipid bilayer structure, and longer cationic sequences, in general, gave better transfection efficiencies. Whereas lipopolyplexes formulated from branched Arg-containing peptides were more effective than those formulated from linear Arg-containing sequences, the reverse was true for Lys-containing sequences, which may be related to differences in DNA condensation between Arg-rich and Lys-rich peptides observed in the CD studies.


Subject(s)
Genetic Vectors/administration & dosage , Genetic Vectors/genetics , Lipids/administration & dosage , Lipids/genetics , Peptides/administration & dosage , Peptides/genetics , Cations/administration & dosage , Cations/chemistry , Cell Line, Tumor , Cell Survival/drug effects , Cell Survival/genetics , Chemistry, Pharmaceutical/methods , Circular Dichroism/methods , DNA/administration & dosage , DNA/chemistry , DNA/genetics , Epithelial Cells/drug effects , Epithelial Cells/metabolism , Gene Transfer Techniques , Genetic Vectors/chemistry , Humans , Lipids/chemistry , Particle Size , Peptides/chemistry , Plasmids/administration & dosage , Plasmids/chemistry , Plasmids/genetics , Pulmonary Alveoli/drug effects , Pulmonary Alveoli/metabolism , Respiratory Mucosa/drug effects , Respiratory Mucosa/metabolism , Transfection/methods
17.
PLoS One ; 17(7): e0268594, 2022.
Article in English | MEDLINE | ID: mdl-35793337

ABSTRACT

Birdwatching is considered one of the fastest growing nature-based tourism sectors in the world. Tourists who identify as birdwatchers tend to be well-educated and wealthy travellers with a specific interest in the places they visit. Birdwatchers can bring economic resources to remote communities, diversifying their economies, and contribute to biodiversity conservation in areas of bird habitat with global significance. Alaska plays a critical role in understanding the link between bird conservation and bird tourism as it supports the world's largest concentration of shorebirds and is a global breeding hotspot for hundreds of migratory species, including many species of conservation concern for their decline across their ranges. Alaska is also a global destination for birders due to the large congregations of birds that occur during the spring, summer and fall seasons. Despite its global importance, relatively little information exists on the significance of bird tourism in Alaska or on opportunities for community development that align with conservation. This study used eBird data to look at trends in Alaska birdwatching and applied existing information from the Alaska Visitor Statistics Program to estimate visitor expenditures and the impact of that spending on Alaska's regional economies. In 2016, nearly 300,000 birdwatchers visited Alaska and spent $378 million, supporting approximately 4,000 jobs. The study describes bird tourism's contributions to local jobs and income in remote rural and urban economies and discusses opportunities for developing and expanding the nature-based tourism sector. The study points toward the importance of partnering with rural communities and landowners to advance both economic opportunities and biodiversity conservation actions. The need for new data collection addressing niche market development and economic diversification is also discussed.


Subject(s)
Rural Population , Tourism , Animals , Biodiversity , Birds , Ecosystem , Humans
19.
Methods Mol Biol ; 2112: 43-57, 2020.
Article in English | MEDLINE | ID: mdl-32006277

ABSTRACT

The functional diversity of proteins is closely related to their differences in sequence and structure. Despite variations in functional sites, global structural similarity is a valuable source of information when assessing potential functional similarities between proteins. The CATH database contains a well-established hierarchical classification of more than 430,000 protein domain structures and nearly 95 million protein domain sequences, with integrated functional annotations for each represented family. The present chapter provides an overview of the main features of CATH with emphasis on exploiting structural similarities to obtain functional information for proteins.


Subject(s)
Proteins/chemistry , Sequence Analysis, Protein , Sequence Homology, Amino Acid , Structural Homology, Protein , Databases, Protein , Protein Structure, Tertiary
20.
Sci Signal ; 12(594)2019 08 13.
Article in English | MEDLINE | ID: mdl-31409758

ABSTRACT

The 21st century is witnessing an explosive surge in our understanding of pseudoenzyme-driven regulatory mechanisms in biology. Pseudoenzymes are proteins that have sequence homology with enzyme families but that are proven or predicted to lack enzyme activity due to mutations in otherwise conserved catalytic amino acids. The best-studied pseudoenzymes are pseudokinases, although examples from other families are emerging at a rapid rate as experimental approaches catch up with an avalanche of freely available informatics data. Kingdom-wide analysis in prokaryotes, archaea and eukaryotes reveals that between 5 and 10% of proteins that make up enzyme families are pseudoenzymes, with notable expansions and contractions seemingly associated with specific signaling niches. Pseudoenzymes can allosterically activate canonical enzymes, act as scaffolds to control assembly of signaling complexes and their localization, serve as molecular switches, or regulate signaling networks through substrate or enzyme sequestration. Molecular analysis of pseudoenzymes is rapidly advancing knowledge of how they perform noncatalytic functions and is enabling the discovery of unexpected, and previously unappreciated, functions of their intensively studied enzyme counterparts. Notably, upon further examination, some pseudoenzymes have previously unknown enzymatic activities that could not have been predicted a priori. Pseudoenzymes can be targeted and manipulated by small molecules and therefore represent new therapeutic targets (or anti-targets, where intervention should be avoided) in various diseases. In this review, which brings together broad bioinformatics and cell signaling approaches in the field, we highlight a selection of findings relevant to a contemporary understanding of pseudoenzyme-based biology.


Subject(s)
Enzymes/classification , Enzymes/genetics , Evolution, Molecular , Signal Transduction/genetics