Search | BVS Search Portal

1.

Burgess, Andrew; Vuong, Jenny; Marzec, Kamila A; Nicolai de Lichtenberg, Ulrik; O'Donoghue, Seán I; Jensen, Lars Juhl.

Cell ; 179(3): 802-802.e1, 2019 10 17.

Article in English | MEDLINE | ID: mdl-31626778

ABSTRACT

S-phase entry and exit are regulated by hundreds of protein complexes that assemble "just in time," orchestrated by a multitude of distinct events. To help understand their interplay, we have created a tailored visualization based on the Minardo layout, highlighting over 80 essential events. This complements our earlier visualization of M-phase, and both can be displayed together, giving a comprehensive overview of the events regulating the cell division cycle. To view this SnapShot, open or download the PDF.

Subject(s)

Cell Cycle/genetics, Mitosis/genetics, Multiprotein Complexes/genetics, S Phase/genetics, Cell Division/genetics, Cyclin B/genetics, Cyclin D/genetics, Cyclin-Dependent Kinases/genetics, G2 Phase/genetics, Humans, Phosphorylation/genetics, Proteasome Endopeptidase Complex/genetics

2.

A nondegenerate code of deleterious variants in Mendelian loci contributes to complex disease risk.

Blair, David R; Lyttle, Christopher S; Mortensen, Jonathan M; Bearden, Charles F; Jensen, Anders Boeck; Khiabanian, Hossein; Melamed, Rachel; Rabadan, Raul; Bernstam, Elmer V; Brunak, Søren; Jensen, Lars Juhl; Nicolae, Dan; Shah, Nigam H; Grossman, Robert L; Cox, Nancy J; White, Kevin P; Rzhetsky, Andrey.

Cell ; 155(1): 70-80, 2013 Sep 26.

Article in English | MEDLINE | ID: mdl-24074861

ABSTRACT

Although countless highly penetrant variants have been associated with Mendelian disorders, the genetic etiologies underlying complex diseases remain largely unresolved. By mining the medical records of over 110 million patients, we examine the extent to which Mendelian variation contributes to complex disease risk. We detect thousands of associations between Mendelian and complex diseases, revealing a nondegenerate, phenotypic code that links each complex disorder to a unique collection of Mendelian loci. Using genome-wide association results, we demonstrate that common variants associated with complex diseases are enriched in the genes indicated by this "Mendelian code." Finally, we detect hundreds of comorbidity associations among Mendelian disorders, and we use probabilistic genetic modeling to demonstrate that Mendelian variants likely contribute nonadditively to the risk for a subset of complex diseases. Overall, this study illustrates a complementary approach for mapping complex disease loci and provides unique predictions concerning the etiologies of specific diseases.

Subject(s)

Disease/genetics, Genetic Predisposition to Disease, Genome-Wide Association Study, Genetic Models, Personal Health Records, Humans, Penetrance, Single Nucleotide Polymorphism

3.

FAVA: high-quality functional association networks inferred from scRNA-seq and proteomics data.

Koutrouli, Mikaela; Nastou, Katerina; Piera Líndez, Pau; Bouwmeester, Robbin; Rasmussen, Simon; Martens, Lennart; Jensen, Lars Juhl.

Bioinformatics ; 40(2)2024 02 01.

Article in English | MEDLINE | ID: mdl-38192003

ABSTRACT

MOTIVATION: Protein networks are commonly used for understanding how proteins interact. However, they are typically biased by data availability, favoring well-studied proteins with more interactions. To uncover functions of understudied proteins, we must use data that are not affected by this literature bias, such as single-cell RNA-seq and proteomics. Due to data sparseness and redundancy, functional association analysis becomes complex. RESULTS: To address this, we have developed FAVA (Functional Associations using Variational Autoencoders), which compresses high-dimensional data into a low-dimensional space. FAVA infers networks from high-dimensional omics data with much higher accuracy than existing methods, across a diverse collection of real as well as simulated datasets. FAVA can process large datasets with over 0.5 million conditions and has predicted 4210 interactions between 1039 understudied proteins. Our findings showcase FAVA's capability to offer novel perspectives on protein interactions. FAVA functions within the scverse ecosystem, employing AnnData as its input source. AVAILABILITY AND IMPLEMENTATION: Source code, documentation, and tutorials for FAVA are accessible on GitHub at https://github.com/mikelkou/fava. FAVA can also be installed and used via pip/PyPI as well as via the scverse ecosystem https://github.com/scverse/ecosystem-packages/tree/main/packages/favapy.

Subject(s)

Proteomics, Single-Cell Gene Expression Analysis, Gene Expression Profiling, RNA Sequence Analysis/methods, Single-Cell Analysis/methods, Software

4.

Pharos 2023: an integrated resource for the understudied human proteome.

Kelleher, Keith J; Sheils, Timothy K; Mathias, Stephen L; Yang, Jeremy J; Metzger, Vincent T; Siramshetty, Vishal B; Nguyen, Dac-Trung; Jensen, Lars Juhl; Vidovic, Dusica; Schürer, Stephan C; Holmes, Jayme; Sharma, Karlie R; Pillai, Ajay; Bologa, Cristian G; Edwards, Jeremy S; Mathé, Ewy A; Oprea, Tudor I.

Nucleic Acids Res ; 51(D1): D1405-D1416, 2023 01 06.

Article in English | MEDLINE | ID: mdl-36624666

ABSTRACT

The Illuminating the Druggable Genome (IDG) project aims to improve our understanding of understudied proteins and our ability to study them in the context of disease biology by perturbing them with small molecules, biologics, or other therapeutic modalities. Two main products from the IDG effort are the Target Central Resource Database (TCRD) (http://juniper.health.unm.edu/tcrd/), which curates and aggregates information, and Pharos (https://pharos.nih.gov/), a web interface for users to extract and visualize data from TCRD. Since the 2021 release, TCRD/Pharos has focused on developing visualization and analysis tools that help reveal higher-level patterns in the underlying data. The current iterations of TCRD and Pharos enable users to perform enrichment calculations based on subsets of targets, diseases, or ligands and to create interactive heat maps and UpSet charts of many types of annotations. Using several examples, we show how to address disease biology and drug discovery questions through enrichment calculations and UpSet charts.

Subject(s)

Factual Databases, Molecular Targeted Therapy, Proteome, Humans, Biological Products, Drug Discovery, Internet, Proteome/drug effects

5.

Metastatic Infiltration of Nervous Tissue and Periosteal Nerve Sprouting in Multiple Myeloma-Induced Bone Pain in Mice and Human.

Diaz-delCastillo, Marta; Palasca, Oana; Nemler, Tim T; Thygesen, Didde M; Chávez-Saldaña, Norma A; Vázquez-Mora, Juan A; Ponce Gomez, Lizeth Y; Jensen, Lars Juhl; Evans, Holly; Andrews, Rebecca E; Mandal, Aritri; Neves, David; Mehlen, Patrick; Caruso, James P; Dougherty, Patrick M; Price, Theodore J; Chantry, Andrew; Lawson, Michelle A; Andersen, Thomas L; Jimenez-Andrade, Juan M; Heegaard, Anne-Marie.

J Neurosci ; 43(29): 5414-5430, 2023 07 19.

Article in English | MEDLINE | ID: mdl-37286351

ABSTRACT

Multiple myeloma (MM) is a neoplasia of B plasma cells that often induces bone pain. However, the mechanisms underlying myeloma-induced bone pain (MIBP) are mostly unknown. Using a syngeneic MM mouse model, we show that periosteal nerve sprouting of calcitonin gene-related peptide (CGRP+) and growth associated protein 43 (GAP43+) fibers occurs concurrent to the onset of nociception and its blockade provides transient pain relief. MM patient samples also showed increased periosteal innervation. Mechanistically, we investigated MM-induced gene expression changes in the dorsal root ganglia (DRG) innervating the MM-bearing bone of male mice and found alterations in pathways associated with cell cycle, immune response and neuronal signaling. The MM transcriptional signature was consistent with metastatic MM infiltration to the DRG, a never-before-described feature of the disease that we further demonstrated histologically. In the DRG, MM cells caused loss of vascularization and neuronal injury, which may contribute to late-stage MIBP. Interestingly, the transcriptional signature of a MM patient was consistent with MM cell infiltration to the DRG. Overall, our results suggest that MM induces a plethora of peripheral nervous system alterations that may contribute to the failure of current analgesics and suggest neuroprotective drugs as appropriate strategies to treat early onset MIBP. SIGNIFICANCE STATEMENT: Multiple myeloma (MM) is a painful bone marrow cancer that significantly impairs the quality of life of the patients. Analgesic therapies for myeloma-induced bone pain (MIBP) are limited and often ineffective, and the mechanisms of MIBP remain unknown. In this manuscript, we describe cancer-induced periosteal nerve sprouting in a mouse model of MIBP, where we also encounter metastasis to the dorsal root ganglia (DRG), a never-before-described feature of the disease. Concomitant to myeloma infiltration, the lumbar DRGs presented blood vessel damage and transcriptional alterations, which may mediate MIBP. Explorative studies on human tissue support our preclinical findings. Understanding the mechanisms of MIBP is crucial to develop targeted analgesics with better efficacy and fewer side effects for this patient population.

Subject(s)

Bone Diseases, Multiple Myeloma, Nerve Tissue, Humans, Mice, Male, Animals, Multiple Myeloma/complications, Multiple Myeloma/metabolism, Multiple Myeloma/pathology, Quality of Life, Pain/metabolism, Nerve Tissue/metabolism, Nerve Tissue/pathology, Spinal Ganglia/metabolism

6.

Opportunities and barriers in omics-based biomarker discovery for steatotic liver diseases.

Thiele, Maja; Villesen, Ida Falk; Niu, Lili; Johansen, Stine; Sulek, Karolina; Nishijima, Suguru; Espen, Lore Van; Keller, Marisa; Israelsen, Mads; Suvitaival, Tommi; Zawadzki, Andressa de; Juel, Helene Bæk; Brol, Maximilian Joseph; Stinson, Sara Elizabeth; Huang, Yun; Silva, Maria Camilla Alvarez; Kuhn, Michael; Anastasiadou, Ema; Leeming, Diana Julie; Karsdal, Morten; Matthijnssens, Jelle; Arumugam, Manimozhiyan; Dalgaard, Louise Torp; Legido-Quigley, Cristina; Mann, Matthias; Trebicka, Jonel; Bork, Peer; Jensen, Lars Juhl; Hansen, Torben; Krag, Aleksander.

J Hepatol ; 81(2): 345-359, 2024 Aug.

Article in English | MEDLINE | ID: mdl-38552880

ABSTRACT

The rising prevalence of liver diseases related to obesity and excessive use of alcohol is fuelling an increasing demand for accurate biomarkers aimed at community screening, diagnosis of steatohepatitis and significant fibrosis, monitoring, prognostication and prediction of treatment efficacy. Breakthroughs in omics methodologies and the power of bioinformatics have created an excellent opportunity to apply technological advances to clinical needs, for instance in the development of precision biomarkers for personalised medicine. Via omics technologies, biological processes from the genes to circulating protein, as well as the microbiome - including bacteria, viruses and fungi, can be investigated on an axis. However, there are important barriers to omics-based biomarker discovery and validation, including the use of semi-quantitative measurements from untargeted platforms, which may exhibit high analytical, inter- and intra-individual variance. Standardising methods and the need to validate them across diverse populations presents a challenge, partly due to disease complexity and the dynamic nature of biomarker expression at different disease stages. Lack of validity causes lost opportunities when studies fail to provide the knowledge needed for regulatory approvals, all of which contributes to a delayed translation of these discoveries into clinical practice. While no omics-based biomarkers have matured to clinical implementation, the extent of data generated has enabled the hypothesis-free discovery of a plethora of candidate biomarkers that warrant further validation. To explore the many opportunities of omics technologies, hepatologists need detailed knowledge of commonalities and differences between the various omics layers, and both the barriers to and advantages of these approaches.

Subject(s)

Biomarkers, Humans, Biomarkers/analysis, Biomarkers/metabolism, Fatty Liver/diagnosis, Fatty Liver/genetics, Proteomics/methods, Metabolomics/methods, Genomics/methods

7.

S1000: a better taxonomic name corpus for biomedical information extraction.

Luoma, Jouni; Nastou, Katerina; Ohta, Tomoko; Toivonen, Harttu; Pafilis, Evangelos; Jensen, Lars Juhl; Pyysalo, Sampo.

Bioinformatics ; 39(6)2023 06 01.

Article in English | MEDLINE | ID: mdl-37289518

ABSTRACT

MOTIVATION: The recognition of mentions of species names in text is a critically important task for biomedical text mining. While deep learning-based methods have made great advances in many named entity recognition tasks, results for species name recognition remain poor. We hypothesize that this is primarily due to the lack of appropriate corpora. RESULTS: We introduce the S1000 corpus, a comprehensive manual re-annotation and extension of the S800 corpus. We demonstrate that S1000 makes highly accurate recognition of species names possible (F-score =93.1%), both for deep learning and dictionary-based methods. AVAILABILITY AND IMPLEMENTATION: All resources introduced in this study are available under open licenses from https://jensenlab.org/resources/s1000/. The webpage contains links to a Zenodo project and three GitHub repositories associated with the study.

Subject(s)

Data Mining, Data Mining/methods

8.

Identifying the genes impacted by cell proliferation in proteomics and transcriptomics studies.

Locard-Paulet, Marie; Palasca, Oana; Jensen, Lars Juhl.

PLoS Comput Biol ; 18(10): e1010604, 2022 10.

Article in English | MEDLINE | ID: mdl-36201535

ABSTRACT

Hypothesis-free high-throughput profiling allows relative quantification of thousands of proteins or transcripts across samples and thereby identification of differentially expressed genes. It is used in many biological contexts to characterize differences between cell lines and tissues, identify drug mode of action or drivers of drug resistance, among others. Changes in gene expression can also be due to confounding factors that were not accounted for in the experimental plan, such as change in cell proliferation. We combined the analysis of 1,076 and 1,040 cell lines in five proteomics and three transcriptomics data sets to identify 157 genes that correlate with cell proliferation rates. These include actors in DNA replication and mitosis, and genes periodically expressed during the cell cycle. This signature of cell proliferation is a valuable resource when analyzing high-throughput data showing changes in proliferation across conditions. We show how to use this resource to help in interpretation of in vitro drug screens and tumor samples. It informs on differences of cell proliferation rates between conditions where such information is not directly available. The signature genes also highlight which hits in a screen may be due to proliferation changes; this can either contribute to biological interpretation or help focus on experiment-specific regulation events otherwise buried in the statistical analysis.

Subject(s)

Proteomics, Transcriptome, Transcriptome/genetics, Gene Expression Profiling, Cell Proliferation/genetics, Mitosis

9.

TCRD and Pharos 2021: mining the human proteome for disease biology.

Sheils, Timothy K; Mathias, Stephen L; Kelleher, Keith J; Siramshetty, Vishal B; Nguyen, Dac-Trung; Bologa, Cristian G; Jensen, Lars Juhl; Vidovic, Dusica; Koleti, Amar; Schürer, Stephan C; Waller, Anna; Yang, Jeremy J; Holmes, Jayme; Bocci, Giovanni; Southall, Noel; Dharkar, Poorva; Mathé, Ewy; Simeonov, Anton; Oprea, Tudor I.

Nucleic Acids Res ; 49(D1): D1334-D1346, 2021 01 08.

Article in English | MEDLINE | ID: mdl-33156327

ABSTRACT

In 2014, the National Institutes of Health (NIH) initiated the Illuminating the Druggable Genome (IDG) program to identify and improve our understanding of poorly characterized proteins that can potentially be modulated using small molecules or biologics. Two resources produced from these efforts are: The Target Central Resource Database (TCRD) (http://juniper.health.unm.edu/tcrd/) and Pharos (https://pharos.nih.gov/), a web interface to browse the TCRD. The ultimate goal of these resources is to highlight and facilitate research into currently understudied proteins, by aggregating a multitude of data sources, and ranking targets based on the amount of data available, and presenting data in machine learning ready format. Since the 2017 release, both TCRD and Pharos have produced two major releases, which have incorporated or expanded an additional 25 data sources. Recently incorporated data types include human and viral-human protein-protein interactions, protein-disease and protein-phenotype associations, and drug-induced gene signatures, among others. These aggregated data have enabled us to generate new visualizations and content sections in Pharos, in order to empower users to find new areas of study in the druggable genome.

Subject(s)

Factual Databases, Human Genome, Neurodegenerative Diseases/genetics, Proteomics/methods, Software, Virus Diseases/genetics, Animals, Anticonvulsants/chemistry, Anticonvulsants/therapeutic use, Antiviral Agents/chemistry, Antiviral Agents/therapeutic use, Biological Products/chemistry, Biological Products/therapeutic use, Data Mining/statistics & numerical data, Host-Pathogen Interactions/drug effects, Host-Pathogen Interactions/genetics, Humans, Internet, Machine Learning/statistics & numerical data, Mice, Knockout Mice, Molecular Targeted Therapy/methods, Neurodegenerative Diseases/classification, Neurodegenerative Diseases/drug therapy, Neurodegenerative Diseases/virology, Protein Interaction Mapping, Proteome/agonists, Proteome/antagonists & inhibitors, Proteome/genetics, Proteome/metabolism, Small Molecule Libraries/chemistry, Small Molecule Libraries/therapeutic use, Virus Diseases/classification, Virus Diseases/drug therapy, Virus Diseases/virology

10.

TIGA: target illumination GWAS analytics.

Yang, Jeremy J; Grissa, Dhouha; Lambert, Christophe G; Bologa, Cristian G; Mathias, Stephen L; Waller, Anna; Wild, David J; Jensen, Lars Juhl; Oprea, Tudor I.

Bioinformatics ; 37(21): 3865-3873, 2021 11 05.

Article in English | MEDLINE | ID: mdl-34086846

ABSTRACT

MOTIVATION: Genome-wide association studies can reveal important genotype-phenotype associations; however, data quality and interpretability issues must be addressed. For drug discovery scientists seeking to prioritize targets based on the available evidence, these issues go beyond the single study. RESULTS: Here, we describe rational ranking, filtering and interpretation of inferred gene-trait associations and data aggregation across studies by leveraging existing curation and harmonization efforts. Each gene-trait association is evaluated for confidence, with scores derived solely from aggregated statistics, linking a protein-coding gene and phenotype. We propose a method for assessing confidence in gene-trait associations from evidence aggregated across studies, including a bibliometric assessment of scientific consensus based on the iCite relative citation ratio, and meanRank scores, to aggregate multivariate evidence. This method, intended for drug target hypothesis generation, scoring and ranking, has been implemented as an analytical pipeline, available as open source, with public datasets of results, and a web application designed for usability by drug discovery scientists. AVAILABILITY AND IMPLEMENTATION: Web application, datasets and source code via https://unmtid-shinyapps.net/tiga/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Genome-Wide Association Study, Lighting, Genotype, Single Nucleotide Polymorphism, Phenotype
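
Illustrative note on the TIGA abstract above: it mentions meanRank scores for aggregating multivariate evidence but does not define them. The sketch below shows one plausible reading, in which each association is ranked separately on every evidence variable and the ranks are averaged. The function name `mean_rank`, the field names `n_study` and `rcr`, and the tie handling are assumptions made for illustration, not the published TIGA implementation.

```python
from typing import Dict, List, Sequence


def mean_rank(records: Sequence[Dict[str, float]], keys: Sequence[str]) -> List[float]:
    """Average per-variable ranks for each record (rank 1 = strongest evidence).

    Assumption: larger values mean stronger evidence for every key.
    Ties are broken by input order, which is a simplification.
    """
    n = len(records)
    totals = [0.0] * n
    for key in keys:
        # Indices of records sorted from strongest to weakest on this variable.
        order = sorted(range(n), key=lambda i: records[i][key], reverse=True)
        for rank, idx in enumerate(order, start=1):
            totals[idx] += rank
    return [total / len(keys) for total in totals]


# Hypothetical gene-trait associations with two evidence variables.
associations = [
    {"n_study": 3, "rcr": 2.0},
    {"n_study": 1, "rcr": 5.0},
    {"n_study": 2, "rcr": 1.0},
]
scores = mean_rank(associations, ["n_study", "rcr"])
```

A lower mean rank indicates an association that is consistently near the top across variables; a production pipeline would also need an explicit policy for ties and missing values.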

11.

CoCoScore: context-aware co-occurrence scoring for text mining applications using distant supervision.

Junge, Alexander; Jensen, Lars Juhl.

Bioinformatics ; 36(1): 264-271, 2020 01 01.

Article in English | MEDLINE | ID: mdl-31199464

ABSTRACT

MOTIVATION: Information extraction by mining the scientific literature is key to uncovering relations between biomedical entities. Most existing approaches based on natural language processing extract relations from single sentence-level co-mentions, ignoring co-occurrence statistics over the whole corpus. Existing approaches counting entity co-occurrences ignore the textual context of each co-occurrence. RESULTS: We propose a novel corpus-wide co-occurrence scoring approach to relation extraction that takes the textual context of each co-mention into account. Our method, called CoCoScore, scores the certainty of stating an association for each sentence that co-mentions two entities. CoCoScore is trained using distant supervision based on a gold-standard set of associations between entities of interest. Instead of requiring a manually annotated training corpus, co-mentions are labeled as positives/negatives according to their presence/absence in the gold standard. We show that CoCoScore outperforms previous approaches in identifying human disease-gene and tissue-gene associations as well as in identifying physical and functional protein-protein associations in different species. CoCoScore is a versatile text mining tool to uncover pairwise associations via co-occurrence mining, within and beyond biomedical applications. AVAILABILITY AND IMPLEMENTATION: CoCoScore is available at: https://github.com/JungeAlexander/cocoscore. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Computational Biology, Data Mining, Natural Language Processing, Publications, Computational Biology/methods, Humans, Proteins/genetics
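
Illustrative note on the CoCoScore abstract above: the distant-supervision step it describes (labeling co-mentions as positives or negatives according to their presence or absence in a gold standard) can be sketched as follows. This is a minimal sketch under stated assumptions, not the CoCoScore code: the function name `label_co_mentions`, the tuple layout, and the example sentences are hypothetical.

```python
from typing import Iterable, List, Tuple

CoMention = Tuple[str, str, str]  # (sentence, entity_a, entity_b)


def label_co_mentions(
    co_mentions: Iterable[CoMention],
    gold_standard: Iterable[Tuple[str, str]],
) -> List[Tuple[str, str, str, int]]:
    """Attach label 1 if the entity pair is in the gold standard, else 0.

    Pairs are treated as unordered, so (a, b) matches (b, a).
    """
    gold = {frozenset(pair) for pair in gold_standard}
    return [
        (sentence, a, b, int(frozenset((a, b)) in gold))
        for sentence, a, b in co_mentions
    ]


# Hypothetical sentences and gold-standard pair, for illustration only.
sentences = [
    ("BRCA1 is linked to breast cancer.", "BRCA1", "breast cancer"),
    ("TP53 and asthma were both mentioned.", "TP53", "asthma"),
]
gold_pairs = [("breast cancer", "BRCA1")]
labeled = label_co_mentions(sentences, gold_pairs)
labels = [row[3] for row in labeled]
```

The labeled sentences could then serve as training data for a sentence-level classifier; because labels come from the gold standard rather than manual annotation, some of them will be noisy, which is the usual trade-off of distant supervision.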

12.

Yield and Integrity of RNA from Brain Samples are Largely Unaffected by Pre-analytical Procedures.

Jensen, Pernille Søs Hovgaard; Johansen, Maja; Bak, Lasse K; Jensen, Lars Juhl; Kjær, Christina.

Neurochem Res ; 46(3): 447-454, 2021 Mar.

Article in English | MEDLINE | ID: mdl-33249516

ABSTRACT

Gene expression studies are reported to be influenced by pre-analytical factors that can compromise RNA yield and integrity, which in turn may confound the experimental findings. Here we investigate the impact of four pre-analytical factors on brain-derived RNA: time-before-collection, tissue specimen size, tissue collection method, and RNA isolation method. We report no significant differences in RNA yield or integrity between 20 mg and 60 mg tissue samples collected in either liquid nitrogen or the RNAlater stabilizing solution. Isolation of RNA employing the TRIzol reagent resulted in a higher yield compared to isolation via the QIAcube kit while the latter resulted in RNA of slightly better integrity. Keeping brain tissue samples at room temperature for up to 160 min prior to collection and isolation of RNA resulted in no significant difference in yield or integrity. Our findings have significant practical and financial consequences for clinical genomic departments and other laboratory settings performing large-scale routine RNA expression analysis of brain samples.

Subject(s)

Brain/metabolism, RNA/metabolism, Animals, Mice, RNA/isolation & purification, RNA Stability, Specimen Handling/methods, Temperature, Time Factors

13.

Inferring disease-associated long non-coding RNAs using genome-wide tissue expression profiles.

Pan, Xiaoyong; Jensen, Lars Juhl; Gorodkin, Jan.

Bioinformatics ; 35(9): 1494-1502, 2019 05 01.

Article in English | MEDLINE | ID: mdl-30295698

ABSTRACT

MOTIVATION: Long non-coding RNAs (lncRNAs) are important regulators in a wide variety of biological processes, which are linked to many diseases. Compared to protein-coding genes (PCGs), the association between diseases and lncRNAs is still not well studied. Thus, inferring disease-associated lncRNAs on a genome-wide scale has become imperative. RESULTS: In this study, we propose a machine learning-based method, DislncRF, which infers disease-associated lncRNAs on a genome-wide scale based on tissue expression profiles. DislncRF uses random forest models trained on expression profiles of known disease-associated PCGs across human tissues to extract general patterns between expression profiles and diseases. These models are then applied to score associations between lncRNAs and diseases. DislncRF was benchmarked against a gold standard dataset and compared to other methods. The results show that DislncRF yields promising performance and outperforms the existing methods. The utility of DislncRF is further substantiated on two diseases in which we find that top scoring candidates are supported by literature or independent datasets. AVAILABILITY AND IMPLEMENTATION: https://github.com/xypan1232/DislncRF. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Long Noncoding RNA/genetics, Genome, Humans, Machine Learning

14.

ProtFus: A Comprehensive Method Characterizing Protein-Protein Interactions of Fusion Proteins.

Tagore, Somnath; Gorohovski, Alessandro; Jensen, Lars Juhl; Frenkel-Morgenstern, Milana.

PLoS Comput Biol ; 15(8): e1007239, 2019 08.

Article in English | MEDLINE | ID: mdl-31437145

ABSTRACT

Tailored therapy aims to cure cancer patients effectively and safely, based on the complex interactions between patients' genomic features, disease pathology and drug metabolism. Thus, the continual increase in scientific literature drives the need for efficient methods of data mining to improve the extraction of useful information from texts based on patients' genomic features. An important application of text mining to tailored therapy in cancer encompasses the use of mutations and cancer fusion genes as moieties that change patients' cellular networks to develop cancer, and also affect drug metabolism. Fusion proteins, which are derived from the slippage of two parental genes, are produced in cancer by chromosomal aberrations and trans-splicing. Given that the two parental proteins for predicted fusion proteins are known, we used our previously developed method for identifying chimeric protein-protein interactions (ChiPPIs) associated with the fusion proteins. Here, we present a validation approach that receives fusion proteins of interest, predicts their cellular network alterations by ChiPPI and validates them by our new method, ProtFus, using an online literature search. This process resulted in a set of 358 fusion proteins and their corresponding protein interactions, as a training set for a Naïve Bayes classifier, to identify predicted fusion proteins that have reliable evidence in the literature and that were confirmed experimentally. Next, for a test group of 1817 fusion proteins, we were able to identify from the literature 2908 PPIs in total, across 18 cancer types. The described method, ProtFus, can be used for screening the literature to identify unique cases of fusion proteins and their PPIs, as means of studying alterations of protein networks in cancers. Availability: http://protfus.md.biu.ac.il/.

Subject(s)

Data Mining/methods, Oncogenic Fusion Proteins/genetics, Protein Interaction Mapping/methods, Algorithms, Bayes Theorem, Big Data, Computational Biology, Data Mining/statistics & numerical data, Genetic Databases, Humans, Mutation, Neoplasms/genetics, Neoplasms/therapy, Oncogenic Fusion Proteins/chemistry, Oncogenic Fusion Proteins/metabolism, Precision Medicine, Protein Interaction Mapping/statistics & numerical data, Protein Interaction Maps

15.

Standardized benchmarking in the quest for orthologs.

Altenhoff, Adrian M; Boeckmann, Brigitte; Capella-Gutierrez, Salvador; Dalquen, Daniel A; DeLuca, Todd; Forslund, Kristoffer; Huerta-Cepas, Jaime; Linard, Benjamin; Pereira, Cécile; Pryszcz, Leszek P; Schreiber, Fabian; da Silva, Alan Sousa; Szklarczyk, Damian; Train, Clément-Marie; Bork, Peer; Lecompte, Odile; von Mering, Christian; Xenarios, Ioannis; Sjölander, Kimmen; Jensen, Lars Juhl; Martin, Maria J; Muffato, Matthieu; Gabaldón, Toni; Lewis, Suzanna E; Thomas, Paul D; Sonnhammer, Erik; Dessimoz, Christophe.

Nat Methods ; 13(5): 425-30, 2016 05.

Article in English | MEDLINE | ID: mdl-27043882

ABSTRACT

Achieving high accuracy in orthology inference is essential for many comparative, evolutionary and functional genomic analyses, yet the true evolutionary history of genes is generally unknown and orthologs are used for very different applications across phyla, requiring different precision-recall trade-offs. As a result, it is difficult to assess the performance of orthology inference methods. Here, we present a community effort to establish standards and an automated web-based service to facilitate orthology benchmarking. Using this service, we characterize 15 well-established inference methods and resources on a battery of 20 different benchmarks. Standardized benchmarking provides a way for users to identify the most effective methods for the problem at hand, sets a minimum requirement for new tools and resources, and guides the development of more accurate orthology inference methods.

Subject(s)

Computational Biology/standards, Genomics/standards, Phylogeny, Proteomics/standards, Archaea/classification, Archaea/genetics, Bacteria/classification, Bacteria/genetics, Computational Biology/methods, Genetic Databases, Eukaryota/classification, Eukaryota/genetics, Gene Ontology, Genomics/methods, Genetic Models, Proteomics/methods, Protein Sequence Analysis, Sequence Homology, Species Specificity

16.

A comprehensive and quantitative comparison of text-mining in 15 million full-text articles versus their corresponding abstracts.

Westergaard, David; Stærfeldt, Hans-Henrik; Tønsberg, Christian; Jensen, Lars Juhl; Brunak, Søren.

PLoS Comput Biol ; 14(2): e1005962, 2018 02.

Article in English | MEDLINE | ID: mdl-29447159

ABSTRACT

Across academia and industry, text mining has become a popular strategy for keeping up with the rapid growth of the scientific literature. Text mining of the scientific literature has mostly been carried out on collections of abstracts, due to their availability. Here we present an analysis of 15 million English scientific full-text articles published during the period 1823-2016. We describe the development in article length and publication sub-topics during these nearly 250 years. We showcase the potential of text mining by extracting published protein-protein, disease-gene, and protein subcellular associations using a named entity recognition system, and quantitatively report on their accuracy using gold standard benchmark data sets. We subsequently compare the findings to corresponding results obtained on 16.5 million abstracts included in MEDLINE and show that text mining of full-text articles consistently outperforms using abstracts only.

Subject(s)

Abstracting and Indexing, Data Mining/methods, Information Storage and Retrieval, MEDLINE, Area Under Curve, Computational Biology/methods, False Positive Reactions, Genes, Periodicals as Topic, Proteins/genetics, ROC Curve, Software, Terminology as Topic

17.

Recalibrating Equus evolution using the genome sequence of an early Middle Pleistocene horse.

Orlando, Ludovic; Ginolhac, Aurélien; Zhang, Guojie; Froese, Duane; Albrechtsen, Anders; Stiller, Mathias; Schubert, Mikkel; Cappellini, Enrico; Petersen, Bent; Moltke, Ida; Johnson, Philip L F; Fumagalli, Matteo; Vilstrup, Julia T; Raghavan, Maanasa; Korneliussen, Thorfinn; Malaspinas, Anna-Sapfo; Vogt, Josef; Szklarczyk, Damian; Kelstrup, Christian D; Vinther, Jakob; Dolocan, Andrei; Stenderup, Jesper; Velazquez, Amhed M V; Cahill, James; Rasmussen, Morten; Wang, Xiaoli; Min, Jiumeng; Zazula, Grant D; Seguin-Orlando, Andaine; Mortensen, Cecilie; Magnussen, Kim; Thompson, John F; Weinstock, Jacobo; Gregersen, Kristian; Røed, Knut H; Eisenmann, Véra; Rubin, Carl J; Miller, Donald C; Antczak, Douglas F; Bertelsen, Mads F; Brunak, Søren; Al-Rasheid, Khaled A S; Ryder, Oliver; Andersson, Leif; Mundy, John; Krogh, Anders; Gilbert, M Thomas P; Kjær, Kurt; Sicheritz-Ponten, Thomas; Jensen, Lars Juhl.

Nature ; 499(7456): 74-8, 2013 Jul 04.

Article in English | MEDLINE | ID: mdl-23803765

ABSTRACT

The rich fossil record of equids has made them a model for evolutionary processes. Here we present a 1.12-times coverage draft genome from a horse bone recovered from permafrost dated to approximately 560-780 thousand years before present (kyr BP). Our data represent the oldest full genome sequence determined so far by almost an order of magnitude. For comparison, we sequenced the genome of a Late Pleistocene horse (43 kyr BP), and modern genomes of five domestic horse breeds (Equus ferus caballus), a Przewalski's horse (E. f. przewalskii) and a donkey (E. asinus). Our analyses suggest that the Equus lineage giving rise to all contemporary horses, zebras and donkeys originated 4.0-4.5 million years before present (Myr BP), twice the conventionally accepted time to the most recent common ancestor of the genus Equus. We also find that horse population size fluctuated multiple times over the past 2 Myr, particularly during periods of severe climatic changes. We estimate that the Przewalski's and domestic horse populations diverged 38-72 kyr BP, and find no evidence of recent admixture between the domestic horse breeds and the Przewalski's horse investigated. This supports the contention that Przewalski's horses represent the last surviving wild horse population. We find similar levels of genetic variation among Przewalski's and domestic populations, indicating that the former are genetically viable and worthy of conservation efforts. We also find evidence for continuous selection on the immune system and olfaction throughout horse evolution. Finally, we identify 29 genomic regions among horse breeds that deviate from neutrality and show low levels of genetic variation compared to the Przewalski's horse. Such regions could correspond to loci selected early during domestication.

Subject(s)

Evolution, Molecular , Genome/genetics , Horses/genetics , Phylogeny , Animals , Conservation of Natural Resources , DNA/analysis , DNA/genetics , Endangered Species , Equidae/classification , Equidae/genetics , Fossils , Genetic Variation/genetics , History, Ancient , Horses/classification , Proteins/analysis , Proteins/chemistry , Proteins/genetics , Yukon Territory

18.

Accurate Quantification of Site-specific Acetylation Stoichiometry Reveals the Impact of Sirtuin Deacetylase CobB on the E. coli Acetylome.

Weinert, Brian Tate; Satpathy, Shankha; Hansen, Bogi Karbech; Lyon, David; Jensen, Lars Juhl; Choudhary, Chunaram.

Mol Cell Proteomics ; 16(5): 759-769, 2017 05.

Article in English | MEDLINE | ID: mdl-28254776

ABSTRACT

Lysine acetylation is a protein posttranslational modification (PTM) that occurs on thousands of lysine residues in diverse organisms from bacteria to humans. Accurate measurement of acetylation stoichiometry on a proteome-wide scale remains challenging. Most methods employ a comparison of chemically acetylated peptides to native acetylated peptides; however, the potentially large differences in abundance between these peptides present a challenge for accurate quantification. Stable isotope labeling by amino acids in cell culture (SILAC)-based mass spectrometry (MS) is one of the most widely used quantitative proteomic methods. Here we show that serial dilution of SILAC-labeled peptides (SD-SILAC) can be used to identify accurately quantified peptides and to estimate the quantification error rate. We applied SD-SILAC to determine absolute acetylation stoichiometry in exponentially-growing and stationary-phase wild-type and Sirtuin deacetylase CobB-deficient cells. To further analyze CobB-regulated sites under conditions of globally increased or decreased acetylation, we measured stoichiometry in phosphotransacetylase (ptaΔ) and acetate kinase (ackAΔ) mutant strains in the presence and absence of the Sirtuin inhibitor nicotinamide. We measured acetylation stoichiometry at 3,669 unique sites and found that the vast majority of acetylation occurred at a low stoichiometry. Manipulations that cause increased nonenzymatic acetylation by acetyl-phosphate (AcP), such as stationary-phase arrest and deletion of ackA, resulted in globally increased acetylation stoichiometry. Comparison to relative quantification under the same conditions validated our stoichiometry estimates at hundreds of sites, demonstrating the accuracy of our method. Similar to Sirtuin deacetylase 3 (SIRT3) in mitochondria, CobB suppressed acetylation to lower than median stoichiometry in WT, ptaΔ, and ackAΔ cells. Together, our results provide a detailed view of acetylation stoichiometry in E. coli and suggest an evolutionarily conserved function of Sirtuin deacetylases in suppressing low stoichiometry acetylation.

Subject(s)

Escherichia coli Proteins/metabolism , Escherichia coli/metabolism , Proteome/metabolism , Sirtuins/metabolism , Acetylation , Isotope Labeling

19.

Pharos: Collating protein information to shed light on the druggable genome.

Nguyen, Dac-Trung; Mathias, Stephen; Bologa, Cristian; Brunak, Soren; Fernandez, Nicolas; Gaulton, Anna; Hersey, Anne; Holmes, Jayme; Jensen, Lars Juhl; Karlsson, Anneli; Liu, Guixia; Ma'ayan, Avi; Mandava, Geetha; Mani, Subramani; Mehta, Saurabh; Overington, John; Patel, Juhee; Rouillard, Andrew D; Schürer, Stephan; Sheils, Timothy; Simeonov, Anton; Sklar, Larry A; Southall, Noel; Ursu, Oleg; Vidovic, Dusica; Waller, Anna; Yang, Jeremy; Jadhav, Ajit; Oprea, Tudor I; Guha, Rajarshi.

Nucleic Acids Res ; 45(D1): D995-D1002, 2017 01 04.

Article in English | MEDLINE | ID: mdl-27903890

ABSTRACT

The 'druggable genome' encompasses several protein families, but only a subset of targets within them have attracted significant research attention and thus have information about them publicly available. The Illuminating the Druggable Genome (IDG) program, initiated in 2014, has the goal of developing experimental techniques and a Knowledge Management Center (KMC) that would collect and organize information about protein targets from four families, representing the most common druggable targets with an emphasis on understudied proteins. Here, we describe two resources developed by the KMC: the Target Central Resource Database (TCRD), which collates many heterogeneous gene/protein datasets, and Pharos (https://pharos.nih.gov), a multimodal web interface that presents the data from TCRD. We briefly describe the types and sources of data considered by the KMC and then highlight features of the Pharos interface designed to enable intuitive access to the IDG knowledgebase. The aim of Pharos is to encourage 'serendipitous browsing', whereby related, relevant information is made easily discoverable. We conclude by describing two use cases that highlight the utility of Pharos and TCRD.

Subject(s)

Databases, Genetic , Drug Discovery , Genomics , Pharmacogenetics , Search Engine , Cluster Analysis , Computational Biology/methods , Drug Discovery/methods , Genomics/methods , Humans , Obesity/drug therapy , Obesity/genetics , Obesity/metabolism , Pharmacogenetics/methods , Software , Web Browser

20.

Impact of acute alcohol consumption on circulating microbiome in asymptomatic alcohol-related liver disease.

Israelsen, Mads; Alvarez-Silva, Camila; Madsen, Bjørn Stæhr; Hansen, Camilla Dalby; Torp, Nikolaj Christian; Johansen, Stine; Hansen, Johanne Kragh; Prier Lindvig, Katrine; Insonere, Jeanlouis; Riviere, Virginie; Juel, Helene Bæk; Brejnrod, Asker; Jensen, Lars Juhl; Thiele, Maja; Lelouvier, Benjamin; Hansen, Torben; Arumugam, Manimozhiyan; Krag, Aleksander.

Gut ; 2023 Jun 21.

Article in English | MEDLINE | ID: mdl-37344168
