ABSTRACT
An amendment to this paper has been published and can be accessed via a link at the top of the paper.
ABSTRACT
Human genomics is undergoing a step change from being a predominantly research-driven activity to one driven through health care as many countries in Europe now have nascent precision medicine programmes. To maximize the value of the genomic data generated, these data will need to be shared between institutions and across countries. In recognition of this challenge, 21 European countries recently signed a declaration to transnationally share data on at least 1 million human genomes by 2022. In this Roadmap, we identify the challenges of data sharing across borders and demonstrate that European research infrastructures are well-positioned to support the rapid implementation of widespread genomic data access.
Subject(s)
Biomedical Research, Human Genome, Human Genome Project, Europe, Humans
ABSTRACT
In the version of this article initially published, Lena Dolman's second affiliation was given as Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK. The correct second affiliation is Ontario Institute for Cancer Research, Toronto, Ontario, Canada. The error has been corrected in the HTML and PDF versions of the article.
ABSTRACT
A common Authentication and Authorisation Infrastructure (AAI) that would allow single sign-on to services has been identified as a key enabler for European bioinformatics. ELIXIR AAI is an ELIXIR service portfolio for authenticating researchers to ELIXIR services and assisting these services on user privileges during research usage. It relieves the scientific service providers from managing the user identities and authorisation themselves, enables the researcher to have a single set of credentials to all ELIXIR services and supports meeting the requirements imposed by the data protection laws. ELIXIR AAI was launched in late 2016 and is part of the ELIXIR Compute platform portfolio. By the end of 2017 the number of users reached 1000, while the number of relying scientific services was 36. This paper presents the requirements and design of the ELIXIR AAI and the policies related to its use, and how it can be used for serving some example services, such as document management, social media, data discovery, human data access, cloud compute and training services.
Subject(s)
Biomedical Research/methods, Computational Biology/methods, Computer Security, Database Management Systems, Software, Humans, Research Personnel, User-Computer Interface
ABSTRACT
The Global Alliance for Genomics and Health (GA4GH) proposes a data access policy model, "registered access", to increase and improve access to data requiring an agreement to basic terms and conditions, such as the use of DNA sequence and health data in research. A registered access policy would enable a range of categories of users to gain access, starting with researchers and clinical care professionals. It would also facilitate general use and reuse of data but within the bounds of consent restrictions and other ethical obligations. In piloting registered access with the Scientific Demonstration data sharing projects of GA4GH, we provide additional ethics, policy and technical guidance to facilitate the implementation of this access model in an international setting.
Subject(s)
Access to Information, Medical Genetics/standards, Genomics/standards, Information Dissemination, Medical Genetics/ethics, Medical Genetics/legislation & jurisprudence, Genomics/ethics, Genomics/legislation & jurisprudence, Humans, Licensure, Practice Guidelines as Topic
ABSTRACT
BACKGROUND: Translational researchers need robust IT solutions to access a range of data types, varying from public data sets to pseudonymised patient information with restricted access, provided on a case by case basis. The reason for this complication is that managing access policies to sensitive human data must consider issues of data confidentiality, identifiability, extent of consent, and data usage agreements. All these ethical, social and legal aspects must be incorporated into a differential management of restricted access to sensitive data. METHODS: In this paper we present a pilot system that uses several common open source software components in a novel combination to coordinate access to heterogeneous biomedical data repositories containing open data (open access) as well as sensitive data (restricted access) in the domain of biobanking and biosample research. Our approach is based on a digital identity federation and software to manage resource access entitlements. RESULTS: Open source software components were assembled and configured in such a way that they allow for different ways of restricted access according to the protection needs of the data. We have tested the resulting pilot infrastructure and assessed its performance, feasibility and reproducibility. CONCLUSIONS: Common open source software components are sufficient to allow for the creation of a secure system for differential access to sensitive data. The implementation of this system is exemplary for researchers facing similar requirements for restricted access data. Here we report experience and lessons learnt of our pilot implementation, which may be useful for similar use cases. Furthermore, we discuss possible extensions for more complex scenarios.
Subject(s)
Biological Specimen Banks/standards, Biomedical Research/standards, Computer Security/standards, Datasets as Topic, Translational Biomedical Research/standards, Humans, Pilot Projects
ABSTRACT
Much has changed in the last two years at DGVa (http://www.ebi.ac.uk/dgva) and dbVar (http://www.ncbi.nlm.nih.gov/dbvar). We are now processing direct submissions rather than only curating data from the literature, and our joint study catalog includes data from over 100 studies in 11 organisms. Human studies dominate, with data from control and case populations and tumor samples, as well as three large curated studies derived from multiple sources. During the processing of these data, we have made improvements to our data model, submission process and data representation. Additionally, we have made significant improvements in providing access to these data via web and FTP interfaces.
Subject(s)
Nucleic Acid Databases, Genomic Structural Variation, Genotype, Humans, Internet, Phenotype
ABSTRACT
The Human Variome Project (HVP) has established a pilot program with the International Society for Gastrointestinal Hereditary Tumours (InSiGHT) to compile all inherited variation affecting colon cancer susceptibility genes. An HVP-InSiGHT Workshop was held on May 10, 2010, prior to the HVP Integration and Implementation Meeting at UNESCO in Paris, to review the progress of this pilot program. A wide range of topics were covered, including issues relating to genotype-phenotype data submission to the InSiGHT Colon Cancer Gene Variant Databases (chromium.liacs.nl/LOVD2/colon_cancer/home.php). The meeting also canvassed the recent exciting developments in models to evaluate the pathogenicity of unclassified variants using in silico data, tumor pathology information, and functional assays, and made further plans for the future progress and sustainability of the pilot program.
Subject(s)
Colonic Neoplasms/genetics, Neoplasm Genes/genetics, Genetic Variation/genetics, Human Genome, Genetic Databases, Genetic Predisposition to Disease, Humans, Paris, United Nations
Subject(s)
Archives, Genetic Databases, Genomic Structural Variation, Public Sector, Humans
ABSTRACT
SIMBioMS is a web-based open source software system for managing data and information in biomedical studies. It provides a solution for the collection, storage, management and retrieval of information about research subjects and biomedical samples, as well as experimental data obtained using a range of high-throughput technologies, including gene expression, genotyping, proteomics and metabonomics. The system can easily be customized and has proven to be successful in several large-scale multi-site collaborative projects. It is compatible with emerging functional genomics data standards and provides data import and export in accepted standard formats. Protocols for transferring data to durable archives at the European Bioinformatics Institute have been implemented. AVAILABILITY: The source code, documentation and initialization scripts are available at http://simbioms.org.
Subject(s)
Computational Biology/methods, Database Management Systems, Information Management/methods, Information Storage and Retrieval/methods, Software, Factual Databases
ABSTRACT
The authors have made a genome-wide analysis of mutations in Src homology 2 (SH2) domains associated with human disease. Disease-causing mutations have been detected in the SH2 domains of cytoplasmic signaling proteins Bruton tyrosine kinase (BTK), SH2D1A, Ras GTPase activating protein (RasGAP), ZAP-70, SHP-2, STAT1, STAT5B, and the p85alpha subunit of the PIP3. Mutations in the BTK, SH2D1A, ZAP70, STAT1, and STAT5B genes have been shown to cause diverse immunodeficiencies, whereas the mutations in RASA1 and PIK3R1 genes lead to basal carcinoma and diabetes, respectively. PTPN11 mutations cause Noonan syndrome and different types of cancer, depending mainly on whether the mutation is inherited or sporadic. We collected and analyzed all known pathogenic mutations affecting human SH2 domains by bioinformatics methods. Among the investigated protein properties are sequence conservation and covariance, structural stability, side chain rotamers, packing effects, surface electrostatics, hydrogen bond formation, accessible surface area, salt bridges, and residue contacts. The majority of the mutations affect positions essential for phosphotyrosine ligand binding and specificity. The structural basis of the SH2 domain diseases was elucidated based on the bioinformatic analysis.
Subject(s)
Human Genome, Mutation, Protein-Tyrosine Kinases/genetics, src Homology Domains, Agammaglobulinaemia Tyrosine Kinase, Amino Acid Sequence, Humans, Molecular Sequence Data, Protein Conformation, Protein-Tyrosine Kinases/chemistry, Amino Acid Sequence Homology
ABSTRACT
A number of beta-sandwich immunoglobulin-like domains have been shown to fold using a set of structurally equivalent residues that form a folding nucleus deep within the core of the protein. Formation of this nucleus is sufficient to establish the complex Greek key topology of the native state. These nucleating residues are highly conserved within the immunoglobulin superfamily, but are less well conserved in the fibronectin type III (fnIII) superfamily, where the requirement is simply to have four interacting hydrophobic residues. However, there are rare examples where this nucleation pattern is absent. In this study, we have investigated the folding of a novel member of the fnIII superfamily whose nucleus appears to lack one of the four buried hydrophobic residues. We show that the folding mechanism is unaltered, but the folding nucleus has moved within the hydrophobic core.
Subject(s)
Immunoglobulins/chemistry, Protein Folding, Amino Acid Sequence, Aromatic Amino Acids/chemistry, Bacillus/enzymology, Bacillus/genetics, Chitinases/chemistry, Chitinases/genetics, Chitinases/isolation & purification, Chitinases/metabolism, Conserved Sequence, Fibronectins/chemistry, Fibronectins/genetics, Fibronectins/isolation & purification, Fibronectins/metabolism, Humans, Hydrogen Bonding, Immunoglobulins/genetics, Immunoglobulins/metabolism, Kinetics, Chemical Models, Molecular Models, Molecular Sequence Data, Mutation, Protein Denaturation, Quaternary Protein Structure, Secondary Protein Structure, Tertiary Protein Structure, Tenascin/chemistry, Tenascin/genetics, Tenascin/isolation & purification, Tenascin/metabolism, Thermodynamics
ABSTRACT
It has proved impossible to purify some proteins implicated in disease in sufficient quantities to allow a biophysical characterization of the effect of pathogenic mutations. To overcome this problem we have analyzed 37 different disease-causing mutations located in the L1 and IL2Rgamma proteins in well characterized related model proteins in which mutations that are identical or equivalent to pathogenic mutations were introduced. We show that data from these models are consistent and that changes in stability observed can be correlated to severity of disease, to correct trafficking within the cell and to in vitro ligand binding studies. Interestingly, we find that any mutations that cause a loss of stability of more than 2 kcal/mol are severely debilitating, even though some model proteins with these mutations can be easily expressed and analyzed. Furthermore, we show that the severity of mutation can be predicted by a ΔΔG(evolution) scale, a measure of conservation. Our results demonstrate that model proteins can be used to analyze disease-causing mutations when wild-type proteins are not stable enough to carry mutations for biophysical analysis.
Subject(s)
Molecular Models, Mutation, Neural Cell Adhesion Molecule L1/chemistry, Interleukin Receptors/chemistry, Genetic Predisposition to Disease, Humans, Immunoglobulins/chemistry, Immunoglobulins/genetics, Interleukin Receptor Common gamma Subunit, Neural Cell Adhesion Molecule L1/genetics, Predictive Value of Tests, Tertiary Protein Structure, Interleukin Receptors/genetics, Sequence Analysis, Structure-Activity Relationship
ABSTRACT
The PrsA protein of Bacillus subtilis is an essential membrane-bound lipoprotein that is assumed to assist post-translocational folding of exported proteins and stabilize them in the compartment between the cytoplasmic membrane and cell wall. This folding activity is consistent with the homology of a segment of PrsA with parvulin-type peptidyl-prolyl cis/trans isomerases (PPIase). In this study, molecular modeling showed that the parvulin-like region can adopt a parvulin-type fold with structurally conserved active site residues. PrsA exhibits PPIase activity in a manner dependent on the parvulin-like domain. We constructed deletion, peptide insertion, and amino acid substitution mutations and demonstrated that the parvulin-like domain as well as flanking N- and C-terminal domains are essential for in vivo PrsA function in protein secretion and growth. Surprisingly, none of the predicted active site residues of the parvulin-like domain was essential for growth and protein secretion, although several active site mutations reduced or abolished the PPIase activity or the ability of PrsA to catalyze proline-limited protein folding in vitro. Our results indicate that PrsA is a PPIase, but the essential role in vivo seems to depend on some non-PPIase activity of both the parvulin-like and flanking domains.
Subject(s)
Bacillus subtilis/chemistry, Lipoproteins/chemistry, Lipoproteins/physiology, Membrane Proteins/chemistry, Membrane Proteins/physiology, Protein Folding, Proteins/metabolism, Bacillus subtilis/metabolism, Bacterial Proteins/chemistry, Catalytic Domain, Lipoproteins/genetics, Membrane Proteins/genetics, Site-Directed Mutagenesis, NIMA-Interacting Peptidylprolyl Isomerase, Peptidylprolyl Isomerase/chemistry, Tertiary Protein Structure
ABSTRACT
Progression to hormone-refractory growth of prostate cancer has been suggested to be mediated by androgen receptor (AR) gene alterations. We analyzed AR for mutations and amplifications in 21 locally recurrent prostate carcinomas treated with orchiectomy, estrogens, or a combination of orchiectomy and estramustine phosphate using fluorescence in situ hybridization, single-strand conformation polymorphism, and DNA sequence analyses. Amplification was observed in 4 of 16 (25%) tumors and amino acid-changing mutations were observed in 7 of 21 (33%) tumors. Two (50%) of the tumors with AR amplification also had a missense mutation of the gene. Four of five (80%) cancers that were treated with a combination of orchiectomy and estramustine phosphate had a mutation clustered at codons 514 to 533 in the N-terminal domain of AR. In functional studies, these mutations did not render AR more sensitive to testosterone, dihydrotestosterone, androstenedione, or beta-estradiol. Tumors treated by orchiectomy had mutations predominantly in the ligand-binding domain. In summary, we found molecular alterations of AR in more than half of the prostate carcinomas that recurred locally. Some tumors developed both aberrations, possibly enhancing the ability of the cancer cell to respond efficiently to low levels of androgens. Furthermore, localization of point mutations in AR seems to be influenced by the type of treatment.
Subject(s)
Estramustine/therapeutic use, Mutation, Hormone-Dependent Neoplasms/genetics, Prostatic Neoplasms/genetics, Androgen Receptors/genetics, Humans, Male, Local Neoplasm Recurrence/genetics, Hormone-Dependent Neoplasms/therapy, Orchiectomy, Prostatic Neoplasms/therapy, Genetic Transcription
ABSTRACT
Mutations in the gene encoding a de novo methyltransferase, DNMT3B, lead to an autosomal recessive Immunodeficiency, Centromeric instability and Facial anomalies (ICF) syndrome. To analyse the protein structure and consequences of ICF-causing mutations, we modelled the structure of the DNMT3B methyltransferase domain based on the Haemophilus haemolyticus protein in complex with the cofactor AdoMet and the target DNA sequence. The structural model has a two-subdomain fold where the DNA-binding region is situated between the subdomains on a surface cleft having positive electrostatic potential. The smaller subdomains of the methyltransferases differ in length and sequence, and therefore only the target recognition domain loop was modelled to show the location of an ICF-causing mutation. Based on the model, DNMT3B recognizes the GC sequence and flips the cytosine from the double-stranded DNA to the catalytic pocket. The amino acids in the cofactor and target cytosine binding sites and also the electrostatic properties of the binding pockets are conserved. In addition, a registry of all known ICF-causing mutations, DNMT3Bbase, was constructed. The structural principles of the pathogenic mutations based on the modelled structure and the analysis of chi angle rotation changes of mutated side chains are discussed.