ABSTRACT
Accurate prediction of the impact of genomic variation on phenotype is a major goal of computational biology and an important contributor to personalized medicine. Computational predictions can lead to a better understanding of the mechanisms underlying genetic diseases, including cancer, but their adoption requires thorough and unbiased assessment. Cystathionine-beta-synthase (CBS) is an enzyme that catalyzes the first step of the transsulfuration pathway, from homocysteine to cystathionine; variations in CBS are associated with human hyperhomocysteinemia and homocystinuria. We have created a computational challenge under the CAGI framework to evaluate how well different methods can predict the phenotypic effect(s) of CBS single amino acid substitutions using a blinded experimental data set. CAGI participants were asked to predict yeast growth based on the identity of the mutations. The performance of the methods was evaluated using several metrics. The CBS challenge highlighted the difficulty of predicting the phenotype of an ex vivo system in a model organism when classification models were trained on human disease data. We also discuss the variations in difficulty of prediction for known benign and deleterious variants, and identify methodological and experimental constraints with lessons to be learned for future challenges.
Subjects
Amino Acid Substitution , Computational Biology/methods , Cystathionine beta-Synthase/genetics , Cystathionine/metabolism , Cystathionine beta-Synthase/metabolism , Homocysteine/metabolism , Humans , Phenotype , Precision Medicine
ABSTRACT
Knowledge about features distinguishing deleterious and neutral variations is crucial for interpretation of novel variants. Bruton tyrosine kinase (BTK) contains the highest number of unique disease-causing variations among the human protein kinases, yet these account for just 10% of all the possible single-nucleotide substitution-caused amino acid variations (SNAVs). Altogether, 1,495 SNAVs are possible in the BTK kinase domain (BTK-KD). We investigated them all with bioinformatic and protein structure analysis methods. Most disease-causing variations affect conserved and buried residues, disturbing protein stability. A minority of exposed residues are conserved, but these are strongly tied to pathogenicity. Sixty-seven percent of variations are predicted to be harmful. In 39% of the residues, all the variants are likely harmful, whereas in 10% of sites, all the substitutions are tolerated. The results indicate the importance of the entire kinase domain, its involvement in numerous interactions, and intricate functional regulation by conformational change. These results can be extended to other protein kinases and organisms.
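As an illustration of what a SNAV is, the following minimal Python sketch enumerates the amino acid substitutions reachable from each codon by a single nucleotide change, using the standard genetic code. The function names `snavs_for_codon` and `count_snavs` are hypothetical, and the counting convention (synonymous changes and changes to a stop codon are excluded) is an assumption; the abstract does not state how the figure of 1,495 was derived.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def snavs_for_codon(codon):
    """Return the amino acids reachable from `codon` by one base change.

    Assumption: synonymous changes and stop codons ("*") are not counted.
    """
    original = CODON_TABLE[codon]
    reachable = set()
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            aa = CODON_TABLE[mutant]
            if aa != original and aa != "*":
                reachable.add(aa)
    return reachable


def count_snavs(cds):
    """Sum the distinct missense substitutions over an in-frame DNA sequence."""
    usable = len(cds) - len(cds) % 3  # ignore any trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    return sum(len(snavs_for_codon(c)) for c in codons)


if __name__ == "__main__":
    print(sorted(snavs_for_codon("TGG")))  # tryptophan codon
    print(count_snavs("ATGTGG"))           # toy two-codon sequence
```

Applied to the coding sequence of a domain, `count_snavs` gives the size of the full substitution space against which the known disease-causing variations can be compared.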
Subjects
Amino Acid Substitution , Single Nucleotide Polymorphism , Protein Interaction Domains and Motifs/genetics , Protein-Tyrosine Kinases/genetics , Agammaglobulinaemia Tyrosine Kinase , Agammaglobulinemia/genetics , Conserved Sequence , Molecular Evolution , X-Linked Genes , Humans , Molecular Models , Protein Conformation , Protein-Tyrosine Kinases/chemistry , Genetic Selection , Structure-Activity Relationship
ABSTRACT
High-throughput sequencing data generation demands the development of methods for interpreting the effects of genomic variants. Numerous computational methods have been developed to assess the impact of variations because experimental methods are unable to cope with both the speed and volume of data generation. To harness the strength of currently available predictors, the Pathogenic-or-Not-Pipeline (PON-P) integrates five predictors to predict the probability that nonsynonymous variations affect protein function and may consequently be disease related. PON-P, which is based on random forest methodology, shows consistently improved performance in cross-validation tests and on independent test sets, providing ternary classification and a statistical reliability estimate of the results. Applied to missense variants in a melanoma cancer cell line, PON-P predicts variants in 17 genes to affect protein function. Previous studies implicate nine of these genes in the pathogenesis of various forms of cancer. PON-P may thus be used as a first step in screening and prioritizing variants to determine deleterious ones for further experimentation.
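One way a ternary call with a reliability estimate can be derived from an ensemble is sketched below. This is not PON-P's published procedure: the decision rule (a variant is left unclassified when the interval around the mean probability includes 0.5), the use of resampled probability estimates as input, and the names `ternary_call` and `z` are all assumptions made for illustration.

```python
from statistics import mean, stdev


def ternary_call(replicate_probs, z=1.96):
    """Classify a variant from replicate pathogenicity probabilities.

    `replicate_probs` is assumed to hold probability estimates for one
    variant from resampled (e.g. bootstrap) models. The spread of the
    replicates serves as the reliability estimate.
    Returns (label, mean_probability, (lower, upper)).
    """
    p = mean(replicate_probs)
    spread = stdev(replicate_probs)  # SD of replicates = bootstrap standard error
    lower, upper = p - z * spread, p + z * spread
    if lower > 0.5:
        label = "pathogenic"
    elif upper < 0.5:
        label = "neutral"
    else:
        label = "unclassified"  # interval straddles the 0.5 decision boundary
    return label, p, (lower, upper)


if __name__ == "__main__":
    for probs in ([0.90, 0.85, 0.95, 0.88],
                  [0.10, 0.15, 0.05, 0.12],
                  [0.40, 0.60, 0.50, 0.55]):
        label, p, interval = ternary_call(probs)
        print(label, round(p, 3), tuple(round(x, 3) for x in interval))
```

Compared with a plain binary threshold, the third class keeps low-confidence predictions out of the prioritized list at the cost of leaving some variants without a call.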
Subjects
Computational Biology/methods , Genetic Databases , Genetic Predisposition to Disease/genetics , Humans , Missense Mutation/genetics
ABSTRACT
BACKGROUND: STAT1 is an essential transcription factor for interferon-γ-mediated gene responses. A distinct sumoylation consensus site (ψKxE), 702IKTE705, is localized in the C-terminal region of STAT1, where Lys703 is a target for PIAS-induced SUMO modification. Several studies indicate that sumoylation has an inhibitory role on STAT1-mediated gene expression, but the molecular mechanisms are not fully understood. RESULTS: Here, we have performed a structural and functional analysis of sumoylation in STAT1. We show that deconjugation of SUMO by SENP1 enhances the transcriptional activity of STAT1, confirming a negative regulatory effect of sumoylation on STAT1 activity. Inspection of a molecular model indicated that the consensus site is well exposed to SUMO conjugation in the STAT1 homodimer and that the conjugated SUMO moiety is directed towards DNA, and is thus able to form a steric hindrance affecting promoter binding of dimeric STAT1. In addition, oligoprecipitation experiments indicated that the sumoylation-deficient STAT1 E705Q mutant has higher DNA-binding activity on STAT1-responsive gene promoters than wild-type STAT1. Furthermore, the sumoylation-deficient STAT1 E705Q mutant displayed enhanced histone H4 acetylation on an interferon-γ-responsive promoter compared to wild-type STAT1. CONCLUSIONS: Our results suggest that sumoylation participates in the regulation of STAT1 responses by modulating the DNA-binding properties of STAT1.
Subjects
DNA/metabolism , STAT1 Transcription Factor/metabolism , Acetylation , Amino Acid Sequence , Amino Acid Substitution , Animals , COS Cells , Chlorocebus aethiops , Chromatin Immunoprecipitation , Cysteine Endopeptidases , Dimerization , Endopeptidases/chemistry , Endopeptidases/metabolism , HeLa Cells , Histones/metabolism , Humans , Genetic Promoter Regions , Tertiary Protein Structure , Recombinant Proteins/biosynthesis , Recombinant Proteins/chemistry , Recombinant Proteins/genetics , STAT1 Transcription Factor/chemistry , STAT1 Transcription Factor/genetics , Sumoylation
ABSTRACT
PhenCode (Phenotypes for ENCODE; http://www.bx.psu.edu/phencode) is a collaborative, exploratory project to help understand phenotypes of human mutations in the context of sequence and functional data from genome projects. Currently, it connects human phenotype and clinical data in various locus-specific databases (LSDBs) with data on genome sequences, evolutionary history, and function from the ENCODE project and other resources in the UCSC Genome Browser. Initially, we focused on a few selected LSDBs covering genes encoding alpha- and beta-globins (HBA, HBB), phenylalanine hydroxylase (PAH), blood group antigens (various genes), androgen receptor (AR), cystic fibrosis transmembrane conductance regulator (CFTR), and Bruton's tyrosine kinase (BTK), but we plan to include additional loci of clinical importance, ultimately genomewide. We have also imported variant data and associated OMIM links from Swiss-Prot. Users can find interesting mutations in the UCSC Genome Browser (in a new Locus Variants track) and follow links back to the LSDBs for more detailed information. Alternatively, they can start with queries on mutations or phenotypes at an LSDB and then display the results at the Genome Browser to view complementary information such as functional data (e.g., chromatin modifications and protein binding from the ENCODE consortium), evolutionary constraint, regulatory potential, and/or any other tracks they choose. We present several examples illustrating the power of these connections for exploring phenotypes associated with functional elements, and for identifying genomic data that could help to explain clinical phenotypes.
Subjects
Genetic Databases , Mutation , Phenotype , Agammaglobulinaemia Tyrosine Kinase , Blood Group Antigens/genetics , Cooperative Behavior , Cystic Fibrosis Transmembrane Conductance Regulator/genetics , Genetic Databases/standards , Genotype , Globins/genetics , Humans , Internet , Phenylalanine Hydroxylase/genetics , Protein-Tyrosine Kinases/genetics , Androgen Receptors/genetics , Software Design , Systems Integration
ABSTRACT
Primary immunodeficiencies (IDs) are a heterogeneous group of inherited disorders of the immune system. Immunodeficiency patients have increased susceptibility to recurrent and persistent, even life-threatening, infections. Mutations in a large number of genes can cause defects in different cellular functions and lead to an impaired immune response. To date, approximately 150 IDs and more than 100 affected genes have been identified. ID-related genes are distributed throughout the genome, and diseases can be inherited in an X-linked, an autosomal recessive, or an autosomal dominant way. We have collected ID mutation data into locus-specific patient-related mutation databases, IDbases (http://bioinf.uta.fi/IDbases). Mutations are described at DNA, mRNA, and protein levels with links to reference sequences and reference articles. The mutation data has been collated into entries along with some clinical information. IDbases offer an easy way to, for example, find recently identified mutations, reveal genotype-phenotype correlations, locate a specific mutation, or examine the most common mutations in a single immunodeficiency-related gene. At the moment we have databases for 107 ID genes with 4,140 public patient entries. An exhaustive statistical analysis of mutation data from the IDbases was made. Missense and nonsense mutations are the most common mutation types, and the most common single substitution is a nonsense mutation from tryptophan to a stop codon. Arginine is the most mutated as well as the most abundant mutant amino acid.
Subjects
Genetic Databases , Immunologic Deficiency Syndromes/genetics , Mutation , Amino Acid Sequence , Humans , Molecular Sequence Data , Software
ABSTRACT
X-linked agammaglobulinemia (XLA) is a hereditary immunodeficiency caused by mutations in the gene encoding Bruton tyrosine kinase (BTK). XLA patients have a decreased number of mature B cells and a lack of all immunoglobulin isotypes, resulting in susceptibility to severe bacterial infections. XLA-causing mutations are collected in a mutation database (BTKbase), which is available at http://bioinf.uta.fi/BTKbase. For each patient the following information is given (when available): the identification of the entry, a plain English description of the mutation followed by a reference, formal characterization of the mutation, and the various parameters from the patient. BTKbase is implemented with the MUTbase program suite, which provides easy, interactive, and quality-controlled submission of information to mutation databases. BTKbase version 8 lists mutation entries of 1,111 patients from 973 unrelated families showing 602 unique molecular events. The localization of the mutations on the gene and protein for BTK can be analyzed by clicking sequences on the web pages. The distribution of the mutations in the five structural domains is approximately proportional to the length of the domains, except for the Tec homology (TH) domain. The most frequently affected sites are CpG dinucleotides. The majority of the missense mutations are structural, disturbing Bruton tyrosine kinase (Btk) folding or decreasing stability. Many of the mutations affect functionally significant, conserved residues. The structural consequences of the mutations in all the domains have been studied based on crystallographic and nuclear magnetic resonance (NMR) structures as well as computer-aided molecular modeling.
Subjects
Agammaglobulinemia/genetics , Genetic Databases , X-Linked Genetic Diseases/genetics , Mutation , Amino Acid Sequence , X-Linked Genetic Diseases/classification , Humans , Molecular Models , Molecular Sequence Data , Structure-Activity Relationship
ABSTRACT
The ImmunoDeficiency Resource (IDR), freely available at http://www.uta.fi/imt/bioinfo/idr/, is a comprehensive knowledge base on immunodeficiencies. It is designed for different user groups such as researchers, physicians and nurses as well as patients and their families and the general public. Information on immunodeficiencies is stored as fact files, which are disease- and gene-based information resources. We have developed an inherited disease markup language (IDML) data model, which is designed for storing disease- and gene-specific data in extensible markup language (XML) format. Fact files written in IDML can be used to present data in different contexts and on different platforms. All the information in the IDR is validated by expert curators.
Subjects
Factual Databases , Immunologic Deficiency Syndromes , Database Management Systems , Humans , Immunologic Deficiency Syndromes/diagnosis , Immunologic Deficiency Syndromes/immunology , Immunologic Deficiency Syndromes/therapy , Information Storage and Retrieval , Internet , Quality Control
ABSTRACT
A large number of disease-causing mutations have been identified in several protein kinases. KinMutBase is a comprehensive knowledge base for human disease-related mutations in protein kinase domains (http://bioinf.uta.fi/KinMutBase/). The latest version contains 582 different mutations for 1,790 cases in 1,322 families. KinMutBase entries are described at the DNA, mRNA, and protein level. Numbers of affected patients and families are also provided. KinMutBase has an extensive number of links and cross-references to literature, other databases, and information sources. There are numerous interactive pages about sequences, structures, mutation statistics, and diseases. A detailed statistical study was made of the frequencies of different types of mutations at both the DNA and protein level in serine/threonine kinases (PSKs) and tyrosine kinases (PTKs). Three-dimensional structures indicate clustering of disease-related mutations mainly in conserved subdomains and in substrate and coligand binding amino acids, although mutations appear throughout the sequences. CpG-containing codons, especially those for arginine, constitute the majority of mutational hotspots. There are certain clear differences in mutation patterns and types between PSKs and PTKs.
Subjects
Nucleic Acid Databases , Protein Databases , Mutation , Protein Kinases/genetics , Registries , Amino Acid Sequence , Genetic Predisposition to Disease , Humans , Molecular Models , Molecular Sequence Data , Protein Kinases/chemistry , Tertiary Protein Structure
ABSTRACT
BACKGROUND: Although biomedical information is growing rapidly, it is difficult to find and retrieve validated data, especially for rare hereditary diseases. There is an increased need for services capable of integrating and validating information as well as providing it in a logically organized structure. An XML-based language enables the creation of open source databases for storage, maintenance and delivery on different platforms. METHODS: Here we present a new data model called the fact file and an XML-based specification, Inherited Disease Markup Language (IDML), which were developed to facilitate disease information integration, storage and exchange. The data model was applied to primary immunodeficiencies, but it can be used for any hereditary disease. Fact files integrate biomedical, genetic and clinical information related to hereditary diseases. RESULTS: IDML and fact files were used to build a comprehensive Web- and WAP-accessible knowledge base, the ImmunoDeficiency Resource (IDR), available at http://bioinf.uta.fi/idr/. A fact file is a user-oriented interface, which serves as a starting point to explore information on hereditary diseases. CONCLUSION: IDML enables the seamless integration and presentation of genetic and disease information resources on the Internet. IDML can be used to build information services for all kinds of inherited diseases. The open source specification and related programs are available at http://bioinf.uta.fi/idml/.
Subjects
Database Management Systems , Genetic Databases , Inborn Genetic Diseases , Immunologic Deficiency Syndromes , Internet , Humans , Inborn Errors of Metabolism , Programming Languages , Systems Integration , User-Computer Interface
ABSTRACT
BACKGROUND: The ImmunoDeficiency Resource (IDR) is a knowledge base for the integration of the clinical, biochemical, genetic, genomic, proteomic, structural, and computational data of primary immunodeficiencies. The need for the IDR arises from the lack of structured and systematic information about primary immunodeficiencies on the Internet, and from the lack of a common platform that enables doctors, researchers, students, nurses and patients to find validated information about these diseases. DESCRIPTION: The IDR knowledge base, first released in 1999, has grown substantially. It contains information for 158 diseases, from both a clinical and a molecular point of view. The database and the user interface have been reformatted. This new IDR release has a richer and more complete breadth, depth and scope. The service provides the most complete and up-to-date dataset. The IDR has been integrated with several internal and external databases and services. The contents of the IDR are validated and selected for different types of users (doctors, nurses, researchers and students, as well as patients and their families). The search engine has been improved and allows either a detailed or a broad search from a simple user interface. CONCLUSION: The IDR is the first knowledge base specifically designed to capture in a systematic and validated way both clinical and molecular information for primary immunodeficiencies. The service is freely available at http://bioinf.uta.fi/idr and is regularly updated. The IDR facilitates primary immunodeficiency informatics and helps to parameterise in silico modelling of these diseases. The IDR is also useful as an advanced education tool for medical students and physicians.
ABSTRACT
Bruton's tyrosine kinase (Btk) is encoded by the gene that when mutated causes the primary immunodeficiency disease X-linked agammaglobulinemia (XLA) in humans and X-linked immunodeficiency (Xid) in mice. Btk is a member of the Tec family of protein tyrosine kinases (PTKs) and plays a vital, but diverse, modulatory role in many cellular processes. Mutations affecting Btk block B-lymphocyte development. Btk is conserved among species, and in this review, we present the sequence of the full-length rat Btk and find it to be analogous to the mouse Btk sequence. We have also analyzed the wealth of information compiled in the mutation database for XLA (BTKbase), representing 554 unique molecular events in 823 families and demonstrate that only selected amino acids are sensitive to replacement (P < 0.001). Although genotype-phenotype correlations have not been established in XLA, based on these findings, we hypothesize that this relationship indeed exists. Using short interfering-RNA technology, we have previously generated active constructs downregulating Btk expression. However, application of recently established guidelines to enhance or decrease the activity was not successful, demonstrating the importance of the primary sequence. We also review the outcome of expression profiling, comparing B lymphocytes from XLA-, Xid-, and Btk-knockout (KO) donors to healthy controls. Finally, in spite of a few genes differing in expression between Xid- and Btk-KO mice, in vivo competition between cells expressing either mutation shows that there is no selective survival advantage of cells carrying one genetic defect over the other. We conclusively demonstrate that for the R28C-missense mutant (Xid), there is no biologically relevant residual activity or any dominant negative effect versus other proteins.
Subjects
Agammaglobulinemia/genetics , Immunologic Deficiency Syndromes/genetics , Protein-Tyrosine Kinases/chemistry , Protein-Tyrosine Kinases/genetics , Agammaglobulinaemia Tyrosine Kinase , Amino Acid Sequence , Animals , Conserved Sequence , Gene Expression Profiling , Humans , Mice , Molecular Sequence Data , Mutation , Protein-Tyrosine Kinases/metabolism , Small Interfering RNA/genetics , Rats , Sequence Alignment
ABSTRACT
Primary immunodeficiencies (IDs) are caused by inherited genetic defects leading to intrinsic defects in cells of the immune system. Most IDs are rare diseases and can be difficult to diagnose because similar symptoms characterize several disorders. Mutation detection is the most reliable method in such cases. These tests are not available at most centers, and physicians can have difficulty finding laboratories that could analyze the genetic defects because certain genes may be analyzed by just one laboratory. The IDdiagnostics registry has been established to provide information for physicians and other health care professionals. The database at http://bioinf.uta.fi/IDdiagnostics currently contains information for the analysis of defects in 30 ID-related genes. Another part of IDdiagnostics is a database of clinical tests. Laboratories performing these analyses, either gene or clinical tests, are asked to submit their information to the database by using a printed form or electronic submission at http://bioinf.uta.fi/cgi-bin/submit/IDClini.cgi. The clinical test database contains information about tests for clinical data, immune status, and studies of function, antibody response, cell function, enzyme assays, clinical function, and apoptosis assays. Both services are freely available and regularly updated. The services aim to increase awareness of IDs and to help obtain an exact and early diagnosis.