Search | VHL Regional Portal

1.

STIGMA: Single-cell tissue-specific gene prioritization using machine learning.

Balachandran, Saranya; Prada-Medina, Cesar A; Mensah, Martin A; Glaser, Juliane; Kakar, Naseebullah; Nagel, Inga; Pozojevic, Jelena; Audain, Enrique; Hitz, Marc-Phillip; Kircher, Martin; Sreenivasan, Varun K A; Spielmann, Malte.

Am J Hum Genet ; 111(3): 618, 2024 Mar 07.

Article in English | MEDLINE | ID: mdl-38458167

2.

STIGMA: Single-cell tissue-specific gene prioritization using machine learning.

Balachandran, Saranya; Prada-Medina, Cesar A; Mensah, Martin A; Kakar, Naseebullah; Nagel, Inga; Pozojevic, Jelena; Audain, Enrique; Hitz, Marc-Phillip; Kircher, Martin; Sreenivasan, Varun K A; Spielmann, Malte.

Am J Hum Genet ; 111(2): 338-349, 2024 Feb 01.

Article in English | MEDLINE | ID: mdl-38228144

ABSTRACT

Clinical exome and genome sequencing have revolutionized the understanding of human disease genetics. Yet many genes remain functionally uncharacterized, complicating the establishment of causal disease links for genetic variants. While several scoring methods have been devised to prioritize these candidate genes, these methods fall short of capturing the expression heterogeneity across cell subpopulations within tissues. Here, we introduce single-cell tissue-specific gene prioritization using machine learning (STIGMA), an approach that leverages single-cell RNA-seq (scRNA-seq) data to prioritize candidate genes associated with rare congenital diseases. STIGMA prioritizes genes by learning the temporal dynamics of gene expression across cell types during healthy organogenesis. To assess the efficacy of our framework, we applied STIGMA to mouse limb and human fetal heart scRNA-seq datasets. In a cohort of individuals with congenital limb malformation, STIGMA prioritized 469 variants in 345 genes, with UBA2 as a notable example. For congenital heart defects, we detected 34 genes harboring nonsynonymous de novo variants (nsDNVs) in two or more individuals from a set of 7,958 individuals, including the ortholog of Prdm1, which is associated with hypoplastic left ventricle and hypoplastic aortic arch. Overall, our findings demonstrate that STIGMA effectively prioritizes tissue-specific candidate genes by utilizing single-cell transcriptome data. The ability to capture the heterogeneity of gene expression across cell populations makes STIGMA a powerful tool for the discovery of disease-associated genes and facilitates the identification of causal variants underlying human genetic disorders.

Subject(s)

Heart Defects, Congenital; Transcriptome; Humans; Animals; Mice; Exome/genetics; Heart Defects, Congenital/genetics; Exome Sequencing; Machine Learning; Single-Cell Analysis/methods; Ubiquitin-Activating Enzymes/genetics

3.

Generation of ENSEMBL-based proteogenomics databases boosts the identification of non-canonical peptides.

Umer, Husen M; Audain, Enrique; Zhu, Yafeng; Pfeuffer, Julianus; Sachsenberg, Timo; Lehtiö, Janne; Branca, Rui M; Perez-Riverol, Yasset.

Bioinformatics ; 38(5): 1470-1472, 2022 02 07.

Article in English | MEDLINE | ID: mdl-34904638

ABSTRACT

SUMMARY: We have implemented the pypgatk package and the pgdb workflow to create proteogenomics databases based on ENSEMBL resources. The tools allow the generation of protein sequences from novel protein-coding transcripts by performing a three-frame translation of pseudogenes, lncRNAs and other non-canonical transcripts, such as those produced by alternative splicing events. They also include exonic out-of-frame translation from otherwise canonical protein-coding mRNAs. Moreover, the tools enable the generation of variant protein sequences from multiple sources of genomic variants including COSMIC, cBioPortal, gnomAD and mutations detected from sequencing of patient samples. pypgatk and pgdb provide multiple functionalities for database handling including optimized target/decoy generation by the algorithm DecoyPyrat. Finally, we have reanalyzed six public datasets in PRIDE by generating cell-type specific databases for 65 cell lines using the pypgatk and pgdb workflow, revealing a wealth of non-canonical or cryptic peptides amounting to >5% of the total number of peptides identified. AVAILABILITY AND IMPLEMENTATION: The software is freely available. pypgatk: https://github.com/bigbio/py-pgatk/ and pgdb: https://nf-co.re/pgdb. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
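The three-frame translation mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the pypgatk implementation: the function names `translate` and `three_frame_translation` are hypothetical, and the sketch assumes the standard genetic code, forward-strand frames only, and 'X' for any codon containing an ambiguous base.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code listed in TCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq: str) -> str:
    """Translate a nucleotide string codon by codon (hypothetical helper).

    Trailing bases that do not fill a codon are ignored; codons with
    ambiguous bases are reported as 'X'.
    """
    seq = seq.upper().replace("U", "T")
    codons = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def three_frame_translation(seq: str) -> dict:
    """Return the translation of each of the three forward reading frames."""
    return {frame: translate(seq[frame:]) for frame in range(3)}


if __name__ == "__main__":
    for frame, protein in three_frame_translation("ATGGCCTAA").items():
        print(frame, protein)
```

In a database-building setting, each frame would typically be split at stop codons and short fragments discarded before the sequences are written out; that step is omitted here.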

Subject(s)

Proteogenomics; Humans; Peptides/genetics; Software; Algorithms; Proteins

4.

A proteomics sample metadata representation for multiomics integration and big data analysis.

Dai, Chengxin; Füllgrabe, Anja; Pfeuffer, Julianus; Solovyeva, Elizaveta M; Deng, Jingwen; Moreno, Pablo; Kamatchinathan, Selvakumar; Kundu, Deepti Jaiswal; George, Nancy; Fexova, Silvie; Grüning, Björn; Föll, Melanie Christine; Griss, Johannes; Vaudel, Marc; Audain, Enrique; Locard-Paulet, Marie; Turewicz, Michael; Eisenacher, Martin; Uszkoreit, Julian; Van Den Bossche, Tim; Schwämmle, Veit; Webel, Henry; Schulze, Stefan; Bouyssié, David; Jayaram, Savita; Duggineni, Vinay Kumar; Samaras, Patroklos; Wilhelm, Mathias; Choi, Meena; Wang, Mingxun; Kohlbacher, Oliver; Brazma, Alvis; Papatheodorou, Irene; Bandeira, Nuno; Deutsch, Eric W; Vizcaíno, Juan Antonio; Bai, Mingze; Sachsenberg, Timo; Levitsky, Lev I; Perez-Riverol, Yasset.

Nat Commun ; 12(1): 5854, 2021 10 06.

Article in English | MEDLINE | ID: mdl-34615866

ABSTRACT

The amount of public proteomics data is rapidly increasing but there is no standardized format to describe the sample metadata and their relationship with the dataset files in a way that fully supports their understanding or reanalysis. Here we propose to develop the transcriptomics data format MAGE-TAB into a standard representation for proteomics sample metadata. We implement MAGE-TAB-Proteomics in a crowdsourcing project to manually curate over 200 public datasets. We also describe tools and libraries to validate and submit sample metadata-related information to the PRIDE repository. We expect that these developments will improve the reproducibility and facilitate the reanalysis and integration of public proteomics datasets.

Subject(s)

Data Analysis; Databases, Protein; Metadata; Proteomics; Big Data; Humans; Reproducibility of Results; Software; Transcriptome

5.

Correction to: Rare variants in KDR, encoding VEGF Receptor 2, are associated with tetralogy of Fallot.

Skoric-Milosavljevic, Doris; Lahrouchi, Najim; Bosada, Fernanda M; Dombrowsky, Gregor; Williams, Simon G; Lesurf, Robert; Tjong, Fleur V Y; Walsh, Roddy; El Bouchikhi, Ihssane; Breckpot, Jeroen; Audain, Enrique; Ilgun, Aho; Beekman, Leander; Ratbi, Ilham; Strong, Alanna; Muenke, Maximilian; Heide, Solveig; Muir, Alison M; Hababa, Mariam; Cross, Laura; Zhou, Dihong; Pastinen, Tomi; Zackai, Elaine; Atmani, Samir; Ouldim, Karim; Adadi, Najlae; Steindl, Katharina; Rauch, Anita; Brook, David; Wilsdon, Anna; Kuipers, Irene; Blom, Nico A; Mulder, Barbara J; Mefford, Heather C; Keren, Boris; Joset, Pascal; Kruszka, Paul; Thiffault, Isabelle; Sheppard, Sarah E; Roberts, Amy; Lodder, Elisabeth M; Keavney, Bernard D; Clur, Sally-Ann B; Mital, Seema; Hitz, Marc-Philip; Christoffels, Vincent M; Postma, Alex V; Bezzina, Connie R.

Genet Med ; 23(10): 2013, 2021 Oct.

Article in English | MEDLINE | ID: mdl-34522030

6.

Correction: Integrative analysis of genomic variants reveals new associations of candidate haploinsufficient genes with congenital heart disease.

Audain, Enrique; Wilsdon, Anna; Breckpot, Jeroen; Izarzugaza, Jose M G; Fitzgerald, Tomas W; Kahlert, Anne-Karin; Sifrim, Alejandro; Wünnemann, Florian; Perez-Riverol, Yasset; Abdul-Khaliq, Hashim; Bak, Mads; Bassett, Anne S; Benson, D Woodrow; Berger, Felix; Daehnert, Ingo; Devriendt, Koenraad; Dittrich, Sven; Daubeney, Piers Ef; Garg, Vidu; Hackmann, Karl; Hoff, Kirstin; Hofmann, Philipp; Dombrowsky, Gregor; Pickardt, Thomas; Bauer, Ulrike; Keavney, Bernard D; Klaassen, Sabine; Kramer, Hans-Heiner; Marshall, Christian R; Milewicz, Dianna M; Lemaire, Scott; Coselli, Joseph S; Mitchell, Michael E; Tomita-Mitchell, Aoy; Prakash, Siddharth K; Stamm, Karl; Stewart, Alexandre F R; Silversides, Candice K; Siebert, Reiner; Stiller, Brigitte; Rosenfeld, Jill A; Vater, Inga; Postma, Alex V; Caliebe, Almuth; Brook, J David; Andelfinger, Gregor; Hurles, Matthew E; Thienpont, Bernard; Larsen, Lars Allan; Hitz, Marc-Phillip.

PLoS Genet ; 17(9): e1009809, 2021 Sep.

Article in English | MEDLINE | ID: mdl-34547032

ABSTRACT

[This corrects the article DOI: 10.1371/journal.pgen.1009679.].

7.

Integrative analysis of genomic variants reveals new associations of candidate haploinsufficient genes with congenital heart disease.

Audain, Enrique; Wilsdon, Anna; Breckpot, Jeroen; Izarzugaza, Jose M G; Fitzgerald, Tomas W; Kahlert, Anne-Karin; Sifrim, Alejandro; Wünnemann, Florian; Perez-Riverol, Yasset; Abdul-Khaliq, Hashim; Bak, Mads; Bassett, Anne S; Benson, D Woodrow; Berger, Felix; Daehnert, Ingo; Devriendt, Koenraad; Dittrich, Sven; Daubeney, Piers Ef; Garg, Vidu; Hackmann, Karl; Hoff, Kirstin; Hofmann, Philipp; Dombrowsky, Gregor; Pickardt, Thomas; Bauer, Ulrike; Keavney, Bernard D; Klaassen, Sabine; Kramer, Hans-Heiner; Marshall, Christian R; Milewicz, Dianna M; Lemaire, Scott; Coselli, Joseph S; Mitchell, Michael E; Tomita-Mitchell, Aoy; Prakash, Siddharth K; Stamm, Karl; Stewart, Alexandre F R; Silversides, Candice K; Siebert, Reiner; Stiller, Brigitte; Rosenfeld, Jill A; Vater, Inga; Postma, Alex V; Caliebe, Almuth; Brook, J David; Andelfinger, Gregor; Hurles, Matthew E; Thienpont, Bernard; Larsen, Lars Allan; Hitz, Marc-Phillip.

PLoS Genet ; 17(7): e1009679, 2021 07.

Article in English | MEDLINE | ID: mdl-34324492

ABSTRACT

Numerous genetic studies have established a role for rare genomic variants in Congenital Heart Disease (CHD) at the copy number variation (CNV) and de novo variant (DNV) level. To identify novel haploinsufficient CHD disease genes, we performed an integrative analysis of CNVs and DNVs identified in probands with CHD including cases with sporadic thoracic aortic aneurysm. We assembled CNV data from 7,958 cases and 14,082 controls and performed a gene-wise analysis of the burden of rare genomic deletions in cases versus controls. In addition, we performed variation rate testing for DNVs identified in 2,489 parent-offspring trios. Our analysis revealed 21 genes which were significantly affected by rare CNVs and/or DNVs in probands. Fourteen of these genes have previously been associated with CHD while the remaining genes (FEZ1, MYO16, ARID1B, NALCN, WAC, KDM5B and WHSC1) have only been associated in small case series or show new associations with CHD. In addition, a systems-level analysis revealed affected protein-protein interaction networks involved in the Notch signaling pathway, heart morphogenesis, DNA repair and cilia/centrosome function. Taken together, this approach highlights the importance of re-analyzing existing datasets to strengthen disease association and identify novel disease genes and pathways.
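A gene-wise case-versus-control burden comparison of the kind described in this abstract can be sketched as a one-sided Fisher's exact test on a 2x2 table. The abstract does not name the statistical test the authors used, so the choice of Fisher's test is an assumption, as are the function name and the per-gene deletion counts in the example; only the cohort sizes (7,958 cases and 14,082 controls) come from the abstract.

```python
from math import comb


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a) under the hypergeometric null, i.e. the probability
    of seeing at least `a` carriers among cases if carrier status were
    independent of case/control status.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)
    upper = min(row1, col1)
    favourable = sum(
        comb(row1, k) * comb(row2, col1 - k) for k in range(a, upper + 1)
    )
    return favourable / total_tables


if __name__ == "__main__":
    n_cases, n_controls = 7958, 14082          # cohort sizes from the abstract
    del_cases, del_controls = 8, 1             # hypothetical counts for one gene
    p = fisher_exact_greater(
        del_cases, n_cases - del_cases,
        del_controls, n_controls - del_controls,
    )
    print(f"one-sided p-value: {p:.3g}")
```

In a genome-wide scan this test would be repeated for every gene, so the resulting p-values would need a multiple-testing correction before any gene is called significant.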

Subject(s)

DNA Copy Number Variations/genetics; Haploinsufficiency/genetics; Heart Defects, Congenital/genetics; Databases, Genetic; Gene Expression/genetics; Gene Expression Profiling/methods; Genetic Predisposition to Disease/genetics; Genomics/methods; Humans; Ion Channels/genetics; Membrane Proteins/genetics; Polymorphism, Single Nucleotide/genetics; Transcriptome/genetics

8.

Rare variants in KDR, encoding VEGF Receptor 2, are associated with tetralogy of Fallot.

Skoric-Milosavljevic, Doris; Lahrouchi, Najim; Bosada, Fernanda M; Dombrowsky, Gregor; Williams, Simon G; Lesurf, Robert; Tjong, Fleur V Y; Walsh, Roddy; El Bouchikhi, Ihssane; Breckpot, Jeroen; Audain, Enrique; Ilgun, Aho; Beekman, Leander; Ratbi, Ilham; Strong, Alanna; Muenke, Maximilian; Heide, Solveig; Muir, Alison M; Hababa, Mariam; Cross, Laura; Zhou, Dihong; Pastinen, Tomi; Zackai, Elaine; Atmani, Samir; Ouldim, Karim; Adadi, Najlae; Steindl, Katharina; Rauch, Anita; Brook, David; Wilsdon, Anna; Kuipers, Irene; Blom, Nico A; Mulder, Barbara J; Mefford, Heather C; Keren, Boris; Joset, Pascal; Kruszka, Paul; Thiffault, Isabelle; Sheppard, Sarah E; Roberts, Amy; Lodder, Elisabeth M; Keavney, Bernard D; Clur, Sally-Ann B; Mital, Seema; Hitz, Marc-Philip; Christoffels, Vincent M; Postma, Alex V; Bezzina, Connie R.

Genet Med ; 23(10): 1952-1960, 2021 10.

Article in English | MEDLINE | ID: mdl-34113005

ABSTRACT

PURPOSE: Rare genetic variants in KDR, encoding the vascular endothelial growth factor receptor 2 (VEGFR2), have been reported in patients with tetralogy of Fallot (TOF). However, their role in disease causality and pathogenesis remains unclear. METHODS: We conducted exome sequencing in a familial case of TOF and large-scale genetic studies, including burden testing, in >1,500 patients with TOF. We studied gene-targeted mice and conducted cell-based assays to explore the role of KDR genetic variation in the etiology of TOF. RESULTS: Exome sequencing in a family with two siblings affected by TOF revealed biallelic missense variants in KDR. Studies in knock-in mice and in HEK 293T cells identified embryonic lethality for one variant when occurring in the homozygous state, and a significantly reduced VEGFR2 phosphorylation for both variants. Rare variant burden analysis conducted in a set of 1,569 patients of European descent with TOF identified a 46-fold enrichment of protein-truncating variants (PTVs) in TOF cases compared to controls (P = 7 × 10⁻¹¹). CONCLUSION: Rare KDR variants, in particular PTVs, strongly associate with TOF, likely in the setting of different inheritance patterns. Supported by genetic and in vivo and in vitro functional analysis, we propose loss-of-function of VEGFR2 as one of the mechanisms involved in the pathogenesis of TOF.

Subject(s)

Tetralogy of Fallot; Vascular Endothelial Growth Factor Receptor-2; Animals; Genetic Predisposition to Disease; HEK293 Cells; Humans; Mice; Tetralogy of Fallot/genetics; Vascular Endothelial Growth Factor Receptor-2/genetics; Exome Sequencing

9.

Systems genetics analysis identifies calcium-signaling defects as novel cause of congenital heart disease.

Izarzugaza, Jose M G; Ellesøe, Sabrina G; Doganli, Canan; Ehlers, Natasja Spring; Dalgaard, Marlene D; Audain, Enrique; Dombrowsky, Gregor; Banasik, Karina; Sifrim, Alejandro; Wilsdon, Anna; Thienpont, Bernard; Breckpot, Jeroen; Gewillig, Marc; Brook, J David; Hitz, Marc-Phillip; Larsen, Lars A; Brunak, Søren.

Genome Med ; 12(1): 76, 2020 08 28.

Article in English | MEDLINE | ID: mdl-32859249

ABSTRACT

BACKGROUND: Congenital heart disease (CHD) occurs in almost 1% of newborn children and is considered a multifactorial disorder. CHD may segregate in families due to significant contribution of genetic factors in the disease etiology. The aim of the study was to identify pathophysiological mechanisms in families segregating CHD. METHODS: We used whole exome sequencing to identify rare genetic variants in ninety consenting participants from 32 Danish families with recurrent CHD. We applied a systems biology approach to identify developmental mechanisms influenced by accumulation of rare variants. We used an independent cohort of 714 CHD cases and 4922 controls for replication and performed functional investigations using zebrafish as in vivo model. RESULTS: We identified 1785 genes, in which rare alleles were shared between affected individuals within a family. These genes were enriched for known cardiac developmental genes, and 218 of these genes were mutated in more than one family. Our analysis revealed a functional cluster, enriched for proteins with a known participation in calcium signaling. Replication in an independent cohort confirmed increased mutation burden of calcium-signaling genes in CHD patients. Functional investigation of zebrafish orthologues of ITPR1, PLCB2, and ADCY2 verified a role in cardiac development and suggests a combinatorial effect of inactivation of these genes. CONCLUSIONS: The study identifies abnormal calcium signaling as a novel pathophysiological mechanism in human CHD and confirms the complex genetic architecture underlying CHD.

Subject(s)

Calcium Signaling; Calcium/metabolism; Genetic Predisposition to Disease; Heart Defects, Congenital/genetics; Heart Defects, Congenital/metabolism; Systems Biology/methods; Alleles; Animals; Computational Biology/methods; Databases, Genetic; Denmark; Female; Genetic Association Studies/methods; Genetic Variation; Humans; Male; Protein Interaction Mapping; Protein Interaction Maps; Registries; Exome Sequencing; Zebrafish

10.

The omics discovery REST interface.

Dass, Gaurhari; Vu, Manh-Tu; Xu, Pan; Audain, Enrique; Hitz, Marc-Phillip; Grüning, Björn A; Hermjakob, Henning; Perez-Riverol, Yasset.

Nucleic Acids Res ; 48(W1): W380-W384, 2020 07 02.

Article in English | MEDLINE | ID: mdl-32374843

ABSTRACT

The Omics Discovery Index is an open source platform that can be used to access, discover and disseminate omics datasets. OmicsDI integrates proteomics, genomics, metabolomics, models and transcriptomics datasets. Using an efficient indexing system, OmicsDI integrates different biological entities including genes, transcripts, proteins, metabolites and the corresponding publications from PubMed. In addition, it implements a group of pipelines to estimate the impact of each dataset by tracing the number of citations, reanalysis and biological entities reported by each dataset. Here, we present the OmicsDI REST interface (www.omicsdi.org/ws/) to enable programmatic access to any dataset in OmicsDI or all the datasets for a specific provider (database). Clients can perform queries on the API using different metadata information such as sample details (species, tissues, etc), instrumentation (mass spectrometer, sequencer), keywords and other provided annotations. In addition, we present two different libraries in R and Python to facilitate the development of tools that can programmatically interact with the OmicsDI REST interface.
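Programmatic access of the kind described here usually starts with building a query URL. The sketch below only constructs that URL and performs no network call. The base address is the one given in the abstract; the `dataset/search` path and the `query`, `start` and `size` parameter names are assumptions that should be checked against the OmicsDI API documentation, and `build_search_url` is a hypothetical helper, not part of the R or Python client libraries mentioned in the abstract.

```python
from urllib.parse import urlencode

# Base address as given in the abstract (www.omicsdi.org/ws/).
BASE_URL = "https://www.omicsdi.org/ws"


def build_search_url(query: str, start: int = 0, size: int = 20) -> str:
    """Build a dataset-search URL (hypothetical helper).

    ASSUMPTION: the endpoint path "dataset/search" and the parameter names
    "query", "start" and "size" are illustrative and must be verified
    against the service documentation before use.
    """
    params = urlencode({"query": query, "start": start, "size": size})
    return f"{BASE_URL}/dataset/search?{params}"


if __name__ == "__main__":
    # Free-text metadata query, e.g. species plus tissue.
    print(build_search_url("human liver", size=5))
```

The resulting string can be passed to any HTTP client; keeping URL construction separate from the request makes the query logic easy to test offline.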

Subject(s)

Gene Expression Profiling/methods; Proteomics/methods; Software; Databases, Genetic; Datasets as Topic; Genomics/methods; Metabolomics/methods; User-Computer Interface

11.

Loss of ADAMTS19 causes progressive non-syndromic heart valve disease.

Wünnemann, Florian; Ta-Shma, Asaf; Preuss, Christoph; Leclerc, Severine; van Vliet, Patrick Piet; Oneglia, Andrea; Thibeault, Maryse; Nordquist, Emily; Lincoln, Joy; Scharfenberg, Franka; Becker-Pauly, Christoph; Hofmann, Philipp; Hoff, Kirstin; Audain, Enrique; Kramer, Hans-Heiner; Makalowski, Wojciech; Nir, Amiram; Gerety, Sebastian S; Hurles, Matthew; Comes, Johanna; Fournier, Anne; Osinska, Hanna; Robins, Jeffrey; Pucéat, Michel; Elpeleg, Orly; Hitz, Marc-Phillip; Andelfinger, Gregor.

Nat Genet ; 52(1): 40-47, 2020 01.

Article in English | MEDLINE | ID: mdl-31844321

ABSTRACT

Valvular heart disease is observed in approximately 2% of the general population [1]. Although the initial observation is often localized (for example, to the aortic or mitral valve), disease manifestations are regularly observed in the other valves and patients frequently require surgery. Despite the high frequency of heart valve disease, only a handful of genes have so far been identified as the monogenic causes of disease [2-7]. Here we identify two consanguineous families, each with two affected family members presenting with progressive heart valve disease early in life. Whole-exome sequencing revealed homozygous, truncating nonsense alleles in ADAMTS19 in all four affected individuals. Homozygous knockout mice for Adamts19 show aortic valve dysfunction, recapitulating aspects of the human phenotype. Expression analysis using a lacZ reporter and single-cell RNA sequencing highlight Adamts19 as a novel marker for valvular interstitial cells; inference of gene regulatory networks in valvular interstitial cells positions Adamts19 in a highly discriminatory network driven by the transcription factor lymphoid enhancer-binding factor 1 downstream of the Wnt signaling pathway. Upregulation of endocardial Krüppel-like factor 2 in Adamts19 knockout mice precedes hemodynamic perturbation, showing that a tight balance in the Wnt-Adamts19-Klf2 axis is required for proper valve maturation and maintenance.

Subject(s)

ADAMTS Proteins/metabolism; Gene Expression Regulation, Developmental; Heart Valve Diseases/etiology; ADAMTS Proteins/genetics; Animals; Family; Female; Heart Valve Diseases/pathology; Humans; Kruppel-Like Transcription Factors/genetics; Kruppel-Like Transcription Factors/metabolism; Male; Mice; Mice, Knockout; Pedigree; Single-Cell Analysis; Wnt Signaling Pathway

12.

DNA methylation profiling allows for characterization of atrial and ventricular cardiac tissues and hiPSC-CMs.

Hoff, Kirstin; Lemme, Marta; Kahlert, Anne-Karin; Runde, Kerstin; Audain, Enrique; Schuster, Dorit; Scheewe, Jens; Attmann, Tim; Pickardt, Thomas; Caliebe, Almuth; Siebert, Reiner; Kramer, Hans-Heiner; Milting, Hendrik; Hansen, Arne; Ammerpohl, Ole; Hitz, Marc-Phillip.

Clin Epigenetics ; 11(1): 89, 2019 06 11.

Article in English | MEDLINE | ID: mdl-31186048

ABSTRACT

BACKGROUND: Cardiac disease modelling using human-induced pluripotent stem cell-derived cardiomyocytes (hiPSC-CM) requires thorough insight into cardiac cell type differentiation processes. However, current methods to discriminate different cardiac cell types are mostly time-consuming, are costly and often provide imprecise phenotypic evaluation. DNA methylation plays a critical role during early heart development and cardiac cellular specification. We therefore investigated the DNA methylation pattern in different cardiac tissues to identify CpG loci for further cardiac cell type characterization. RESULTS: An array-based genome-wide DNA methylation analysis using Illumina Infinium HumanMethylation450 BeadChips led to the identification of 168 differentially methylated CpG loci in atrial and ventricular human heart tissue samples (n = 49) from different patients with congenital heart defects (CHD). Systematic evaluation of atrial-ventricular DNA methylation pattern in cardiac tissues in an independent sample cohort of non-failing donor hearts and cardiac patients using bisulfite pyrosequencing helped us to define a subset of 16 differentially methylated CpG loci enabling precise characterization of human atrial and ventricular cardiac tissue samples. This defined set of reproducible cardiac tissue-specific DNA methylation sites allowed us to consistently detect the cellular identity of hiPSC-CM subtypes. CONCLUSION: Testing DNA methylation of only a small set of defined CpG sites thus makes it possible to distinguish atrial and ventricular cardiac tissues and cardiac atrial and ventricular subtypes of hiPSC-CMs. This method represents a rapid and reliable system for phenotypic characterization of in vitro-generated cardiomyocytes and opens new opportunities for cardiovascular research and patient-specific therapy.

Subject(s)

DNA Methylation; Heart Atria/cytology; Heart Defects, Congenital/pathology; Heart Ventricles/cytology; Myocytes, Cardiac/cytology; Cells, Cultured; CpG Islands; Female; Heart Atria/chemistry; Heart Defects, Congenital/genetics; Heart Ventricles/chemistry; Humans; Induced Pluripotent Stem Cells/chemistry; Induced Pluripotent Stem Cells/cytology; Male; Models, Biological; Myocytes, Cardiac/chemistry; Organ Specificity; Sequence Analysis, DNA; Tissue Engineering

13.

The PRIDE database and related tools and resources in 2019: improving support for quantification data.

Perez-Riverol, Yasset; Csordas, Attila; Bai, Jingwen; Bernal-Llinares, Manuel; Hewapathirana, Suresh; Kundu, Deepti J; Inuganti, Avinash; Griss, Johannes; Mayer, Gerhard; Eisenacher, Martin; Pérez, Enrique; Uszkoreit, Julian; Pfeuffer, Julianus; Sachsenberg, Timo; Yilmaz, Sule; Tiwary, Shivani; Cox, Jürgen; Audain, Enrique; Walzer, Mathias; Jarnuczak, Andrew F; Ternent, Tobias; Brazma, Alvis; Vizcaíno, Juan Antonio.

Nucleic Acids Res ; 47(D1): D442-D450, 2019 01 08.

Article in English | MEDLINE | ID: mdl-30395289

ABSTRACT

The PRoteomics IDEntifications (PRIDE) database (https://www.ebi.ac.uk/pride/) is the world's largest data repository of mass spectrometry-based proteomics data, and is one of the founding members of the global ProteomeXchange (PX) consortium. In this manuscript, we summarize the developments in PRIDE resources and related tools since the previous update manuscript was published in Nucleic Acids Research in 2016. In the last 3 years, public data sharing through PRIDE (as part of PX) has definitely become the norm in the field. In parallel, re-use of public proteomics data has increased enormously, with multiple applications. We first describe the new architecture of PRIDE Archive, the archival component of PRIDE. PRIDE Archive and the related data submission framework have been further developed to support the increase in submitted data volumes and additional data types. A new scalable and fault-tolerant storage backend, Application Programming Interface and web interface have been implemented as part of an ongoing process. Additionally, we emphasize the improved support for quantitative proteomics data through the mzTab format. Finally, we outline key statistics on the current data contents and volume of downloads, and how PRIDE data are starting to be disseminated to added-value resources including Ensembl, UniProt and Expression Atlas.

Subject(s)

Databases, Protein; Mass Spectrometry; Proteomics; Peptides/chemistry; Software

14.

Accurate and fast feature selection workflow for high-dimensional omics data.

Perez-Riverol, Yasset; Kuhn, Max; Vizcaíno, Juan Antonio; Hitz, Marc-Phillip; Audain, Enrique.

PLoS One ; 12(12): e0189875, 2017.

Article in English | MEDLINE | ID: mdl-29261781

ABSTRACT

We are moving into the age of 'Big Data' in biomedical research and bioinformatics. This trend can be encapsulated in a simple formula: D = S * F, where the volume of data generated (D) increases in both dimensions: the number of samples (S) and the number of sample features (F). Frequently, a typical omics classification includes redundant and irrelevant features (e.g. genes or proteins) that can result in long computation times, decreased model performance and the selection of suboptimal features (genes and proteins) after the classification/regression step. Multiple algorithms and reviews have been published to describe the existing methods for feature selection (FS), their strengths and weaknesses. However, the selection of the correct FS algorithm and strategy constitutes an enormous challenge. Despite the number and diversity of algorithms available, the proper choice of an approach for facing a specific problem often falls in a 'grey zone'. In this study, we select a subset of FS methods to develop an efficient workflow and an R package for bioinformatics machine learning problems. We cover relevant issues concerning FS, ranging from domain problems to algorithmic solutions and computational tools. Finally, we use seven different proteomics and gene expression datasets to evaluate the workflow and guide the FS process.
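The idea of discarding irrelevant and redundant features before model fitting can be sketched with a simple filter step. This is not the workflow or R package described in the abstract, which does not list its methods here; the function names, the variance threshold and the correlation cutoff are all illustrative assumptions. The sketch drops near-constant features first and then any feature that is highly correlated with one already kept.

```python
from statistics import mean, pvariance


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def filter_features(columns, min_variance=1e-8, max_abs_corr=0.9):
    """Return the names of features that survive two filter steps.

    columns: dict mapping feature name -> list of values (one per sample).
    Both thresholds are illustrative defaults, not recommended settings.
    """
    kept = []
    for name, values in columns.items():
        if pvariance(values) <= min_variance:
            continue  # irrelevant: (almost) constant across samples
        if any(abs(pearson(values, columns[k])) > max_abs_corr for k in kept):
            continue  # redundant: nearly collinear with a kept feature
        kept.append(name)
    return kept


if __name__ == "__main__":
    data = {
        "g1": [1.0, 2.0, 3.0, 4.0],
        "g2": [2.0, 4.0, 6.0, 8.1],   # almost a multiple of g1
        "g3": [5.0, 5.0, 5.0, 5.0],   # constant
        "g4": [1.0, -1.0, 1.0, -1.0],
    }
    print(filter_features(data))
```

With S samples and F features the pairwise correlation step costs on the order of S * F^2 operations in the worst case, which is one reason cheap filters are usually applied before wrapper or embedded methods.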

Subject(s)

Algorithms; Databases as Topic; Genomics/methods; Workflow; Humans; Multivariate Analysis; Principal Component Analysis; Support Vector Machine

15.

In-depth analysis of protein inference algorithms using multiple search engines and well-defined metrics.

Audain, Enrique; Uszkoreit, Julian; Sachsenberg, Timo; Pfeuffer, Julianus; Liang, Xiao; Hermjakob, Henning; Sanchez, Aniel; Eisenacher, Martin; Reinert, Knut; Tabb, David L; Kohlbacher, Oliver; Perez-Riverol, Yasset.

J Proteomics ; 150: 170-182, 2017 01 06.

Article in English | MEDLINE | ID: mdl-27498275

ABSTRACT

In mass spectrometry-based shotgun proteomics, protein identifications are usually the desired result. However, most of the analytical methods are based on the identification of reliable peptides and not the direct identification of intact proteins. Thus, assembling peptides identified from tandem mass spectra into a list of proteins, referred to as protein inference, is a critical step in proteomics research. Currently, different protein inference algorithms and tools are available for the proteomics community. Here, we evaluated five software tools for protein inference (PIA, ProteinProphet, Fido, ProteinLP, MSBayesPro) using three popular database search engines: Mascot, X!Tandem, and MS-GF+. All the algorithms were evaluated using a highly customizable KNIME workflow using four different public datasets with varying complexities (different sample preparation, species and analytical instruments). We defined a set of quality control metrics to evaluate the performance of each combination of search engines, protein inference algorithm, and parameters on each dataset. We show that the results for complex samples vary not only regarding the actual numbers of reported protein groups but also concerning the actual composition of groups. Furthermore, the robustness of reported proteins when using databases of differing complexities is strongly dependant on the applied inference algorithm. Finally, merging the identifications of multiple search engines does not necessarily increase the number of reported proteins, but does increase the number of peptides per protein and thus can generally be recommended. SIGNIFICANCE: Protein inference is one of the major challenges in MS-based proteomics nowadays. Currently, there are a vast number of protein inference algorithms and implementations available for the proteomics community. Protein assembly impacts in the final results of the research, the quantitation values and the final claims in the research manuscript. 
Even though protein inference is a crucial step in proteomics data analysis, a comprehensive evaluation of the many different inference methods has never been performed. The Journal of Proteomics has previously published several benchmarks of other bioinformatics algorithms in proteomics (PMID: 26585461; PMID: 22728601), underscoring the importance of such studies for the proteomics community and the journal audience. This manuscript presents a new bioinformatics solution based on the KNIME/OpenMS platform that aims at providing a fair comparison of protein inference algorithms (https://github.com/KNIME-OMICS). Five different algorithms (ProteinProphet, MSBayesPro, ProteinLP, Fido, and PIA) were evaluated using the highly customizable workflow on four public datasets with varying complexities. Three popular database search engines (Mascot, X!Tandem, and MS-GF+) and combinations thereof were evaluated for every protein inference tool. In total, >186 protein lists were analyzed and carefully compared using three metrics for quality assessment of the protein inference results: 1) the number of reported proteins, 2) the number of peptides per protein, and 3) the number of uniquely reported proteins per inference method. We also examined how many proteins were reported for each combination of search engines, protein inference algorithms and parameters on each dataset. The results show that: 1) PIA or Fido seems to be a good choice for the analyzed workflow, regarding not only the reported proteins and the high-quality identifications, but also the required runtime. 2) Merging the identifications of multiple search engines almost always gives more confident results and increases the number of peptides per protein group. 3) The usage of databases containing not only the canonical sequences, but also known isoforms of proteins has a small impact on the number of reported proteins. 
Depending on the question behind the study, the detection of specific isoforms could compensate for the slightly shorter protein lists obtained with parsimonious reports. 4) The current workflow can be easily extended to support new algorithms and search engine combinations.
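
To make the idea of a parsimonious protein report concrete, the following is a minimal sketch of protein inference as a greedy set cover: it repeatedly picks the protein that explains the most still-unexplained peptides. It is an illustration only, not the algorithm of PIA, ProteinProphet, Fido, ProteinLP, or MSBayesPro; the function name and the toy peptide and protein identifiers are hypothetical.

```python
from collections import defaultdict


def parsimonious_proteins(peptide_to_proteins):
    """Return a small protein list that explains every identified peptide.

    Greedy set cover: an approximation, not guaranteed to be the minimum list.
    """
    # Invert the mapping: protein -> set of peptides it could explain.
    protein_to_peptides = defaultdict(set)
    for peptide, proteins in peptide_to_proteins.items():
        for protein in proteins:
            protein_to_peptides[protein].add(peptide)

    unexplained = set(peptide_to_proteins)
    reported = []
    while unexplained:
        # Pick the protein covering the most unexplained peptides;
        # iterating in sorted order breaks ties by protein name.
        best = max(
            sorted(protein_to_peptides),
            key=lambda p: len(protein_to_peptides[p] & unexplained),
        )
        covered = protein_to_peptides[best] & unexplained
        if not covered:
            break  # remaining peptides map to no protein at all
        reported.append(best)
        unexplained -= covered
    return reported


# Hypothetical toy input: peptide sequence -> candidate proteins.
example = {
    "PEPA": ["P1", "P2"],
    "PEPB": ["P1"],
    "PEPC": ["P2", "P3"],
    "PEPD": ["P3"],
}
print(parsimonious_proteins(example))
```

In this toy input, P2 shares all of its peptides with P1 and P3, so a parsimonious report omits it. This is the kind of difference in group composition that the abstract describes between inference methods.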

Subject(s)

Algorithms; Computational Biology/methods; Databases, Protein; Proteomics/methods; Search Engine/methods; Humans; Peptides/chemistry; Protein Isoforms; Software; Tandem Mass Spectrometry

16.

Accurate estimation of isoelectric point of protein and peptide based on amino acid sequences.

Audain, Enrique; Ramos, Yassel; Hermjakob, Henning; Flower, Darren R; Perez-Riverol, Yasset.

Bioinformatics ; 32(6): 821-7, 2016 03 15.

Article in English | MEDLINE | ID: mdl-26568629

ABSTRACT

MOTIVATION: In any macromolecular polyprotic system (for example protein, DNA or RNA), the isoelectric point, commonly referred to as the pI, can be defined as the point of singularity in a titration curve, corresponding to the solution pH value at which the net overall surface charge, and thus the electrophoretic mobility, of the ampholyte sums to zero. Different modern analytical biochemistry and proteomics methods depend on the isoelectric point as a principal feature for protein and peptide characterization. Protein separation by isoelectric point is a critical part of 2-D gel electrophoresis, a key precursor of proteomics, where discrete spots can be digested in-gel, and proteins subsequently identified by analytical mass spectrometry. Peptide fractionation according to pI is also widely used in current proteomics sample preparation procedures prior to LC-MS/MS analysis. Therefore, accurate theoretical prediction of pI would expedite such analysis. While such pI calculation is widely used, it remains largely untested, motivating our efforts to benchmark pI prediction methods. RESULTS: Using data from the database PIP-DB and one publicly available dataset as our reference gold standard, we have undertaken the benchmarking of pI calculation methods. We find that methods vary in their accuracy and are highly sensitive to the choice of basis set. The machine-learning algorithms, especially the SVM-based algorithm, showed superior performance when studying peptide mixtures. In general, learning-based pI prediction methods (such as Cofactor, SVM and Branca) require a large training dataset, and their resulting performance will strongly depend on the quality of that data. In contrast to iterative methods, machine-learning algorithms have the advantage of being able to add new features to improve the accuracy of prediction. 
CONTACT: yperez@ebi.ac.uk AVAILABILITY AND IMPLEMENTATION: The software and data are freely available at https://github.com/ypriverol/pIR SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
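
As a rough illustration of the iterative family of pI methods mentioned in the abstract, the sketch below finds the pH at which the Henderson-Hasselbalch net charge of a sequence crosses zero, using bisection. It is not the code from the pIR repository. The pKa values are illustrative assumptions (published pK sets differ, which is one reason predictions vary between methods), and the function names are hypothetical.

```python
# Illustrative pKa values (an assumption; real tools use differing pK sets).
POSITIVE_PKA = {"K": 10.8, "R": 12.5, "H": 6.5}
NEGATIVE_PKA = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
N_TERM_PKA = 8.6
C_TERM_PKA = 3.6


def net_charge(sequence, ph):
    """Net charge of a peptide at a given pH (Henderson-Hasselbalch)."""
    # Basic groups carry +1 when protonated; acidic groups carry -1 when
    # deprotonated. Start with the two termini.
    charge = 1.0 / (1.0 + 10 ** (ph - N_TERM_PKA))
    charge -= 1.0 / (1.0 + 10 ** (C_TERM_PKA - ph))
    for residue in sequence:
        if residue in POSITIVE_PKA:
            charge += 1.0 / (1.0 + 10 ** (ph - POSITIVE_PKA[residue]))
        elif residue in NEGATIVE_PKA:
            charge -= 1.0 / (1.0 + 10 ** (NEGATIVE_PKA[residue] - ph))
    return charge


def isoelectric_point(sequence, low=0.0, high=14.0, tol=1e-4):
    """Bisection search for the pH where the net charge is zero."""
    # Net charge decreases monotonically with pH, so bisection converges.
    while high - low > tol:
        mid = (low + high) / 2.0
        if net_charge(sequence, mid) > 0:
            low = mid   # still positive: the pI lies at a higher pH
        else:
            high = mid  # zero or negative: the pI lies at a lower pH
    return (low + high) / 2.0


for peptide in ("DDDD", "GGGG", "KKKK"):
    print(peptide, round(isoelectric_point(peptide), 2))
```

With no ionizable side chains, the estimate for a peptide such as GGGG reduces to the midpoint of the two terminal pKa values. Acidic residues pull the estimate down and basic residues push it up.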

Subject(s)

Amino Acid Sequence; Isoelectric Focusing; Isoelectric Point; Peptides; Proteomics; Tandem Mass Spectrometry

17.

Bio-analytical method based on MALDI-MS analysis for the quantification of CIGB-300 anti-tumor peptide in human plasma.

Cabrales-Rico, Ania; de la Torre, Beatriz G; Garay, Hilda E; Machado, Yoan J; Gómez, Jose A; Audain, Enrique; Morales, Orlando; Besada, Vladimir; Marcelo, Jose Luis; Reyes, Vilcy; Perera, Yasser; Perea, Silvio E; Reyes, Osvaldo; González, Luis Javier.

J Pharm Biomed Anal ; 105: 107-114, 2015 Feb.

Article in English | MEDLINE | ID: mdl-25546027

ABSTRACT

A fully validated bio-analytical method based on Matrix-Assisted Laser Desorption/Ionization Time-of-Flight Mass Spectrometry was developed for quantitation of the anti-tumor peptide CIGB-300 in human plasma. An analog of this peptide, acetylated at the N-terminus, was used as internal standard for absolute quantitation. Acid treatment allowed efficient precipitation of plasma proteins as well as high recovery (approximately 80%) of the intact peptide. No other chromatographic step was required for sample processing before MALDI-MS analysis. Spectra were acquired in linear positive ion mode to ensure maximum sensitivity. The lower limit of quantitation was established at 0.5 µg/mL, which is equivalent to 160 fmol peptide. The calibration curve was linear from 0.5 to 7.5 µg/mL, with R² > 0.98, and permitted quantitation of highly concentrated samples evaluated by dilution integrity testing. All parameters assessed for five validation batches met the FDA guidelines for industry. The method was successfully applied to analysis of clinical samples obtained in a phase I clinical trial following intravenous administration of CIGB-300 at a dose of 1.6 mg/kg body weight. With the exception of Cmax and AUC, pharmacokinetic parameters were similar for ELISA and MALDI-MS methods.

Subject(s)

Antineoplastic Agents/blood; Peptides, Cyclic/blood; Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization/methods; Acetylation; Antineoplastic Agents/chemistry; Clinical Trials as Topic; Humans; Injections, Intravenous; Limit of Detection; Neoplasms/blood; Neoplasms/drug therapy; Peptides, Cyclic/chemistry; Reference Standards; Reproducibility of Results; Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization/instrumentation

18.

A survey of molecular descriptors used in mass spectrometry based proteomics.

Audain, Enrique; Sanchez, Aniel; Vizcaíno, Juan Antonio; Perez-Riverol, Yasset.

Curr Top Med Chem ; 14(3): 388-97, 2014.

Article in English | MEDLINE | ID: mdl-24304317

ABSTRACT

The field of proteomics has grown rapidly in recent years. This has been due fundamentally to technological improvements in instrumentation, methods, and easy-to-use software, which have made it possible to address a large number of biological questions and to deepen the study of the proteome of several organisms. This development has posed a challenge that still remains: the computational analysis of the large datasets commonly generated in a single proteomics experiment. One way to tackle this general issue has been to use auxiliary information generated during the proteomics experiment to validate the confidence of the identifications. In this manuscript we review the main molecular descriptors used for building predictor models for estimating retention time, isoelectric point and peptide "detectability", which are key tools in the design of several validation strategies based on these criteria. We also give an overview of the main open source tools and libraries used for computing molecular descriptors.

Subject(s)

Mass Spectrometry; Proteomics; Software

19.

Isoelectric point optimization using peptide descriptors and support vector machines.

Perez-Riverol, Yasset; Audain, Enrique; Millan, Aleli; Ramos, Yassel; Sanchez, Aniel; Vizcaíno, Juan Antonio; Wang, Rui; Müller, Markus; Machado, Yoan J; Betancourt, Lazaro H; González, Luis J; Padrón, Gabriel; Besada, Vladimir.

J Proteomics ; 75(7): 2269-74, 2012 Apr 03.

Article in English | MEDLINE | ID: mdl-22326964

ABSTRACT

IPG (Immobilized pH Gradient) based separations are frequently used as the first step in shotgun proteomics methods; they yield an increase in both the dynamic range and resolution of peptide separation prior to the LC-MS analysis. Experimental isoelectric point (pI) values can improve peptide identifications in conjunction with MS/MS information. Thus, accurate estimation of the pI value based on the amino acid sequence becomes critical to perform these kinds of experiments. Nowadays, pI is commonly predicted using the charge-state model [1] and/or the cofactor algorithm [2]. However, neither of these methods is capable of calculating the pI value for basic peptides accurately. In this manuscript, we present a new approach that can significantly improve the pI estimation by using Support Vector Machines (SVM) [3], an experimental amino acid descriptor taken from the AAIndex database [4], and the isoelectric point predicted by the charge-state model. Our results have shown a strong correlation (R² = 0.98) between the predicted and observed values, with a standard deviation of 0.32 pH units across the complete pH range.

Subject(s)

Models, Chemical; Peptides/chemistry; Support Vector Machine; Isoelectric Point
