Results 1 - 20 of 118
1.
Database (Oxford) ; 2022, 2022 03 28.
Article in English | MEDLINE | ID: mdl-35348648

ABSTRACT

The scientific knowledge about which genes are involved in which diseases grows rapidly, which makes it difficult to keep up with new publications and genetics datasets. The DISEASES database aims to provide a comprehensive overview by systematically integrating and assigning confidence scores to evidence for disease-gene associations from curated databases, genome-wide association studies (GWAS) and automatic text mining of the biomedical literature. Here, we present a major update to this resource, which greatly increases the number of associations from all these sources. This is especially true for the text-mined associations, which have increased by at least 9-fold at all confidence cutoffs. We show that this dramatic increase is primarily due to adding full-text articles to the text corpus, secondarily due to improvements to both the disease and gene dictionaries used for named entity recognition, and only to a very small extent due to the growth in number of PubMed abstracts. DISEASES now also makes use of a new GWAS database, Target Illumination by GWAS Analytics, which considerably increased the number of GWAS-derived disease-gene associations. DISEASES itself is also integrated into several other databases and resources, including GeneCards/MalaCards, Pharos/Target Central Resource Database and the Cytoscape stringApp. All data in DISEASES are updated weekly and are available via a web interface at https://diseases.jensenlab.org, from which they can also be downloaded under open licenses. Database URL: https://diseases.jensenlab.org.


Subjects
Data Mining , Genome-Wide Association Study , Databases, Factual
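DISEASES assigns confidence scores to evidence coming from several sources (curated databases, GWAS and text mining). As a rough illustration of how per-source scores could be merged into one integrated score, the sketch below uses a noisy-OR rule, assuming each channel score is a probability in [0, 1] and that channels are independent. This combination rule is an assumption made for illustration, not the documented DISEASES scoring scheme, and the channel and gene names are hypothetical.

```python
from math import prod


def combine_confidences(channel_scores):
    """Noisy-OR combination: the chance that at least one independent channel is correct.

    channel_scores maps a (hypothetical) channel name to a probability in [0, 1].
    """
    for name, score in channel_scores.items():
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score for {name!r} must lie in [0, 1], got {score}")
    return 1.0 - prod(1.0 - s for s in channel_scores.values())


def passing_cutoff(associations, cutoff):
    """Keep the (disease, gene) pairs whose integrated score reaches the cutoff."""
    return {pair for pair, scores in associations.items()
            if combine_confidences(scores) >= cutoff}


associations = {
    ("Disease X", "GENE1"): {"text_mining": 0.5, "knowledge": 0.5},
    ("Disease X", "GENE2"): {"text_mining": 0.2},
}
print(combine_confidences(associations[("Disease X", "GENE1")]))  # 0.75
print(passing_cutoff(associations, 0.5))  # {('Disease X', 'GENE1')}
```

Noisy-OR never lowers the score when a channel is added, which matches the intuition that extra independent support should not reduce confidence; it overstates confidence when channels are correlated, for example when a curated database was built from the same papers the text miner reads.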
2.
Nat Biotechnol ; 40(5): 692-702, 2022 May.
Article in English | MEDLINE | ID: mdl-35102292

ABSTRACT

Implementing precision medicine hinges on the integration of omics data, such as proteomics, into the clinical decision-making process, but the quantity and diversity of biomedical data, and the spread of clinically relevant knowledge across multiple biomedical databases and publications, pose a challenge to data integration. Here we present the Clinical Knowledge Graph (CKG), an open-source platform currently comprising close to 20 million nodes and 220 million relationships that represent relevant experimental data, public databases and literature. The graph structure provides a flexible data model that is easily extendable to new nodes and relationships as new databases become available. The CKG incorporates statistical and machine learning algorithms that accelerate the analysis and interpretation of typical proteomics workflows. Using a set of proof-of-concept biomarker studies, we show how the CKG might augment and enrich proteomics data and help inform clinical decision-making.

3.
Microorganisms ; 10(2), 2022 Jan 26.
Article in English | MEDLINE | ID: mdl-35208748

ABSTRACT

To elucidate ecosystem functioning, it is fundamental to recognize what processes occur in which environments (where) and which microorganisms carry them out (who). Here, we present PREGO, a one-stop-shop knowledge base providing such associations. PREGO combines text mining and data integration techniques to mine such what-where-who associations from data and metadata scattered in the scientific literature and in public omics repositories. Microorganisms, biological processes, and environment types are identified and mapped to ontology terms from established community resources. Analyses of comentions in text and co-occurrences in metagenomics data/metadata are performed to extract associations, and a scoring scheme assigns a confidence level to each of them. The PREGO knowledge base contains associations for 364,508 microbial taxa, 1090 environmental types, 15,091 biological processes, and 7971 molecular functions, with a total of almost 58 million associations. These associations are available through a web portal, an Application Programming Interface (API), and bulk download. By exploring environments and/or processes associated with each other or with microbes, PREGO aims to assist researchers in the design and interpretation of experiments and their results. To demonstrate PREGO's capabilities, a thorough presentation of its web interface is given along with a meta-analysis of experimental results from a lagoon-sediment study of sulfur-cycle-related microbes.
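PREGO extracts associations from co-mentions and attaches a confidence level through a scoring scheme that the abstract does not spell out. The sketch below borrows the co-occurrence score used in JensenLab text mining for STRING, S = C^alpha * (C * N / (C_i * C_j))^(1 - alpha), where C is the pair count, C_i and C_j are the marginal counts, N is the total and alpha defaults to 0.6. Applying this formula to PREGO is an assumption, counting one co-mention per document is a simplification, and the organism and environment names are only examples.

```python
from collections import Counter
from itertools import product


def cooccurrence_scores(documents, alpha=0.6):
    """Score organism-environment pairs from document-level co-mentions.

    documents: iterable of (organisms, environments) tagged in each document.
    Uses S = C^alpha * (C * N / (C_org * C_env))^(1 - alpha), where the marginals
    C_org and C_env and the total N are sums over the pair counts.
    """
    pair_counts = Counter()
    for organisms, environments in documents:
        for pair in product(set(organisms), set(environments)):
            pair_counts[pair] += 1  # each pair counted at most once per document

    org_totals, env_totals = Counter(), Counter()
    for (org, env), count in pair_counts.items():
        org_totals[org] += count
        env_totals[env] += count
    total = sum(pair_counts.values())

    scores = {}
    for (org, env), count in pair_counts.items():
        enrichment = count * total / (org_totals[org] * env_totals[env])
        scores[(org, env)] = count ** alpha * enrichment ** (1 - alpha)
    return scores


docs = [
    ({"Desulfobacter"}, {"lagoon sediment"}),
    ({"Desulfobacter"}, {"lagoon sediment"}),
    ({"Escherichia coli"}, {"human gut"}),
]
for pair, score in sorted(cooccurrence_scores(docs).items()):
    print(pair, round(score, 3))
```

The first factor rewards pairs that are mentioned together often; the second rewards pairs mentioned together more often than their individual frequencies would predict, so very common terms do not dominate.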

4.
NAR Genom Bioinform ; 3(4): lqab090, 2021 Dec.
Article in English | MEDLINE | ID: mdl-34632381

ABSTRACT

Extracting and processing information from documents is of great importance, because many experimental results and findings are stored in local files. Automated extraction and analysis of biomedical terms from such files is therefore essential. In this article, we present OnTheFly2.0, a web application for extracting biomedical entities from individual files such as plain texts, office documents, PDF files or images. OnTheFly2.0 can generate informative summaries in popup windows containing knowledge related to the identified terms along with links to various databases. It uses the EXTRACT tagging service to perform named entity recognition (NER) for genes/proteins, chemical compounds, organisms, tissues, environments, diseases, phenotypes and gene ontology terms. Multiple files can be analyzed, and identified terms such as proteins or genes can be explored through functional enrichment analysis or be associated with diseases and PubMed entries. Finally, protein-protein and protein-chemical networks can be generated with the use of STRING and STITCH services. To demonstrate its capacity for knowledge discovery, we interrogated published meta-analyses of clinical biomarkers of severe COVID-19 and uncovered inflammatory and senescence pathways that impact disease pathogenesis. OnTheFly2.0 currently supports 197 species and is available at http://bib.fleming.gr:3838/OnTheFly/ and http://onthefly.pavlopouloslab.info.

5.
Bioinformatics ; 2021 Jun 04.
Article in English | MEDLINE | ID: mdl-34086846

ABSTRACT

MOTIVATION: Genome-wide association studies (GWAS) can reveal important genotype-phenotype associations; however, data quality and interpretability issues must be addressed. For drug discovery scientists seeking to prioritize targets based on the available evidence, these issues go beyond the single study. METHODS: Here, we describe rational ranking, filtering and interpretation of inferred gene-trait associations and data aggregation across studies by leveraging existing curation and harmonization efforts. Each gene-trait association is evaluated for confidence, with scores derived solely from aggregated statistics, linking a protein-coding gene and phenotype. We propose a method for assessing confidence in gene-trait associations from evidence aggregated across studies, including a bibliometric assessment of scientific consensus based on the iCite Relative Citation Ratio, and meanRank scores, to aggregate multivariate evidence. RESULTS: This method, intended for drug target hypothesis generation, scoring and ranking, has been implemented as an analytical pipeline, available as open source, with public datasets of results, and a web application designed for usability by drug discovery scientists. AVAILABILITY: Web application, datasets, and source code via: https://unmtid-shinyapps.net/tiga/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
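The meanRank idea can be illustrated by ranking genes separately on each evidence variable and averaging those ranks, so a gene that does well on many variables ends up near the top. The sketch below assumes every variable is "higher is better" and gives tied values the mean of their positions; the variable meanings in the example (study count, citation ratio) and the gene names are hypothetical, and TIGA's actual variables and tie handling may differ.

```python
def ranks_desc(values):
    """Rank so the largest value gets rank 1; tied values share their mean position."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_position = (start + end) / 2 + 1  # convert 0-based span to 1-based mean rank
        for k in range(start, end + 1):
            ranks[order[k]] = mean_position
        start = end + 1
    return ranks


def mean_rank(evidence):
    """evidence maps gene -> list of variable values (all 'higher is better').

    Returns gene -> mean rank across variables; lower means stronger overall evidence.
    Assumes a non-empty mapping with equally long value lists.
    """
    genes = list(evidence)
    n_vars = len(evidence[genes[0]])
    totals = dict.fromkeys(genes, 0.0)
    for v in range(n_vars):
        column = [evidence[g][v] for g in genes]
        for gene, rank in zip(genes, ranks_desc(column)):
            totals[gene] += rank
    return {gene: totals[gene] / n_vars for gene in genes}


# Hypothetical variables: [number of supporting studies, citation ratio]
evidence = {"GENE_A": [10, 0.5], "GENE_B": [5, 0.9], "GENE_C": [5, 0.1]}
print(mean_rank(evidence))  # {'GENE_A': 1.5, 'GENE_B': 1.75, 'GENE_C': 2.75}
```

Averaging ranks rather than raw values puts variables measured on very different scales (counts, ratios, p-values) on an equal footing without choosing weights.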

6.
Dev Cell ; 56(4): 461-477.e7, 2021 02 22.
Article in English | MEDLINE | ID: mdl-33621493

ABSTRACT

Homology-directed repair (HDR) safeguards DNA integrity under various forms of stress, but how HDR protects replicating genomes under extensive metabolic alterations remains unclear. Here, we report that besides stalling replication forks, inhibition of ribonucleotide reductase (RNR) triggers metabolic imbalance manifested by the accumulation of increased reactive oxygen species (ROS) in cell nuclei. This leads to a redox-sensitive activation of the ATM kinase followed by phosphorylation of the MRE11 nuclease, which in HDR-deficient settings degrades stalled replication forks. Intriguingly, nascent DNA degradation by the ROS-ATM-MRE11 cascade is also triggered by hypoxia, which elevates signaling-competent ROS and attenuates functional HDR without arresting replication forks. Under these conditions, MRE11 degrades daughter-strand DNA gaps, which accumulate behind active replisomes and attract error-prone DNA polymerases to escalate mutation rates. Thus, HDR safeguards replicating genomes against metabolic assaults by restraining mutagenic repair at aberrantly processed nascent DNA. These findings have implications for cancer evolution and tumor therapy.


Subjects
DNA Replication , Genome, Human , Metabolism , Recombinational DNA Repair , Ataxia Telangiectasia Mutated Proteins/metabolism , BRCA2 Protein/deficiency , BRCA2 Protein/metabolism , Cell Hypoxia , Cell Line, Tumor , DNA/metabolism , Humans , MRE11 Homologue Protein/metabolism , Models, Biological , Mutation/genetics , Neoplasms/genetics , Neoplasms/pathology , Polymerization , Reactive Oxygen Species/metabolism , Signal Transduction
7.
Nat Biotechnol ; 39(5): 555-560, 2021 05.
Article in English | MEDLINE | ID: mdl-33398153

ABSTRACT

Despite recent advances in metagenomic binning, reconstruction of microbial species from metagenomics data remains challenging. Here we develop variational autoencoders for metagenomic binning (VAMB), a program that uses deep variational autoencoders to encode sequence coabundance and k-mer distribution information before clustering. We show that a variational autoencoder is able to integrate these two distinct data types without any previous knowledge of the datasets. VAMB outperforms existing state-of-the-art binners, reconstructing 29-98% and 45% more near-complete (NC) genomes on simulated and real data, respectively. Furthermore, VAMB is able to separate closely related strains up to 99.5% average nucleotide identity (ANI), and reconstructed 255 and 91 NC Bacteroides vulgatus and Bacteroides dorei sample-specific genomes as two distinct clusters from a dataset of 1,000 human gut microbiome samples. We use 2,606 NC bins from this dataset to show that species of the human gut microbiome have different geographical distribution patterns. VAMB can be run on standard hardware and is freely available at https://github.com/RasmussenLab/vamb .


Subjects
Genome, Bacterial/genetics , Metagenome/genetics , Molecular Sequence Annotation , Software , Bacteroides/genetics , Humans , Metagenomics , Microbiota/genetics
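VAMB encodes two signals per contig before clustering: k-mer composition and coabundance across samples. The sketch below builds such an input vector for a single contig, assuming plain tetranucleotide (k = 4) frequencies over all 256 k-mers and per-sample depths normalised to sum to 1. VAMB's own preprocessing differs in detail, and the autoencoder and clustering steps are not shown.

```python
from itertools import product

# All 256 tetranucleotides in a fixed order, so every contig maps to the same vector layout.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]
TETRAMER_INDEX = {kmer: i for i, kmer in enumerate(TETRAMERS)}


def tetranucleotide_frequencies(sequence):
    """Relative frequency of each 4-mer; windows containing N or other symbols are skipped."""
    counts = [0] * len(TETRAMERS)
    seq = sequence.upper()
    for i in range(len(seq) - 3):
        index = TETRAMER_INDEX.get(seq[i:i + 4])
        if index is not None:
            counts[index] += 1
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


def abundance_profile(depths):
    """Per-sample read depth normalised to sum to 1 (all zeros stays all zeros)."""
    total = sum(depths)
    return [d / total if total else 0.0 for d in depths]


def contig_features(sequence, depths):
    """Concatenate composition and coabundance into one input vector."""
    return tetranucleotide_frequencies(sequence) + abundance_profile(depths)


features = contig_features("ACGTACGTNNACGT", depths=[2.0, 6.0])
print(len(features))  # 258 = 256 tetranucleotides + 2 samples
print(features[-2:])  # [0.25, 0.75]
```

Both parts are normalised so that neither contig length nor sequencing depth alone dominates the vector; the autoencoder then learns a joint representation of the two.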
8.
Nucleic Acids Res ; 49(D1): D1334-D1346, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33156327

ABSTRACT

In 2014, the National Institutes of Health (NIH) initiated the Illuminating the Druggable Genome (IDG) program to identify and improve our understanding of poorly characterized proteins that can potentially be modulated using small molecules or biologics. Two resources produced from these efforts are the Target Central Resource Database (TCRD) (http://juniper.health.unm.edu/tcrd/) and Pharos (https://pharos.nih.gov/), a web interface to browse the TCRD. The ultimate goal of these resources is to highlight and facilitate research into currently understudied proteins by aggregating a multitude of data sources, ranking targets based on the amount of data available, and presenting data in a machine-learning-ready format. Since the 2017 release, both TCRD and Pharos have produced two major releases, which have incorporated or expanded an additional 25 data sources. Recently incorporated data types include human and viral-human protein-protein interactions, protein-disease and protein-phenotype associations, and drug-induced gene signatures, among others. These aggregated data have enabled us to generate new visualizations and content sections in Pharos, in order to empower users to find new areas of study in the druggable genome.


Subjects
Databases, Factual , Genome, Human , Neurodegenerative Diseases/genetics , Proteomics/methods , Software , Virus Diseases/genetics , Animals , Anticonvulsants/chemistry , Anticonvulsants/therapeutic use , Antiviral Agents/chemistry , Antiviral Agents/therapeutic use , Biological Products/chemistry , Biological Products/therapeutic use , Data Mining/statistics & numerical data , Host-Pathogen Interactions/drug effects , Host-Pathogen Interactions/genetics , Humans , Internet , Machine Learning/statistics & numerical data , Mice , Mice, Knockout , Molecular Targeted Therapy/methods , Neurodegenerative Diseases/classification , Neurodegenerative Diseases/drug therapy , Neurodegenerative Diseases/virology , Protein Interaction Mapping , Proteome/agonists , Proteome/antagonists & inhibitors , Proteome/genetics , Proteome/metabolism , Small Molecule Libraries/chemistry , Small Molecule Libraries/therapeutic use , Virus Diseases/classification , Virus Diseases/drug therapy , Virus Diseases/virology
9.
Neurochem Res ; 46(3): 447-454, 2021 Mar.
Article in English | MEDLINE | ID: mdl-33249516

ABSTRACT

Gene expression studies are reported to be influenced by pre-analytical factors that can compromise RNA yield and integrity, which in turn may confound the experimental findings. Here we investigate the impact of four pre-analytical factors on brain-derived RNA: time-before-collection, tissue specimen size, tissue collection method, and RNA isolation method. We report no significant differences in RNA yield or integrity between 20 mg and 60 mg tissue samples collected in either liquid nitrogen or the RNAlater stabilizing solution. Isolation of RNA employing the TRIzol reagent resulted in a higher yield compared to isolation via the QIAcube kit while the latter resulted in RNA of slightly better integrity. Keeping brain tissue samples at room temperature for up to 160 min prior to collection and isolation of RNA resulted in no significant difference in yield or integrity. Our findings have significant practical and financial consequences for clinical genomic departments and other laboratory settings performing large-scale routine RNA expression analysis of brain samples.


Subjects
Brain/metabolism , RNA/metabolism , Animals , Mice , RNA/isolation & purification , RNA Stability , Specimen Handling/methods , Temperature , Time Factors
10.
F1000Res ; 9: 157, 2020.
Article in English | MEDLINE | ID: mdl-32399202

ABSTRACT

Cytoscape is an open-source software used to analyze and visualize biological networks. In addition to being able to import networks from a variety of sources, Cytoscape allows users to import tabular node data and visualize it onto networks. Unfortunately, such data tables can only contain one row of data per node, whereas omics data often have multiple rows for the same gene or protein, representing different post-translational modification sites, peptides, splice isoforms, or conditions. Here, we present a new app, Omics Visualizer, that allows users to import data tables with several rows referring to the same node, connect them to one or more networks, and visualize the connected data onto networks. Omics Visualizer uses the Cytoscape enhancedGraphics app to show the data either in the nodes (pie visualization) or around the nodes (donut visualization), where the colors of the slices represent the imported values. If the user does not provide a network, the app can retrieve one from the STRING database using the Cytoscape stringApp. The Omics Visualizer app is freely available at https://apps.cytoscape.org/apps/omicsvisualizer.


Subjects
Computational Biology/methods , Data Visualization , Software , Proteomics
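The core data-handling step Omics Visualizer describes is letting one network node own several table rows (for example one row per phosphosite of the same protein) and drawing each row as a coloured slice. The sketch below groups rows by a node key and maps each value onto a blue-white-red scale. The column names (`gene`, `site`, `log2fc`), the ±2 colour limit and the palette are assumptions for illustration, not the app's configuration or API.

```python
from collections import defaultdict


def group_by_node(rows, key="gene"):
    """Collect every row that refers to the same network node (one node, many rows)."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[key]].append(row)
    return dict(grouped)


def value_to_hex(value, limit=2.0):
    """Map a value onto blue (negative), white (zero), red (positive), clipped at +/-limit."""
    t = max(-1.0, min(1.0, value / limit))
    fade = round(255 * (1 - abs(t)))  # 255 at zero (white), 0 at the limit (full colour)
    if t >= 0:
        return f"#ff{fade:02x}{fade:02x}"
    return f"#{fade:02x}{fade:02x}ff"


rows = [
    {"gene": "TP53", "site": "S15", "log2fc": 2.4},
    {"gene": "TP53", "site": "S20", "log2fc": -3.0},
    {"gene": "MDM2", "site": "S166", "log2fc": 0.0},
]
for node, node_rows in group_by_node(rows).items():
    # One slice per row, as in a pie or donut chart drawn on the node.
    print(node, [(r["site"], value_to_hex(r["log2fc"])) for r in node_rows])
# TP53 [('S15', '#ff0000'), ('S20', '#0000ff')]
# MDM2 [('S166', '#ffffff')]
```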
11.
Bioinformatics ; 36(1): 264-271, 2020 01 01.
Article in English | MEDLINE | ID: mdl-31199464

ABSTRACT

MOTIVATION: Information extraction by mining the scientific literature is key to uncovering relations between biomedical entities. Most existing approaches based on natural language processing extract relations from single sentence-level co-mentions, ignoring co-occurrence statistics over the whole corpus. Existing approaches counting entity co-occurrences ignore the textual context of each co-occurrence. RESULTS: We propose a novel corpus-wide co-occurrence scoring approach to relation extraction that takes the textual context of each co-mention into account. Our method, called CoCoScore, scores the certainty of stating an association for each sentence that co-mentions two entities. CoCoScore is trained using distant supervision based on a gold-standard set of associations between entities of interest. Instead of requiring a manually annotated training corpus, co-mentions are labeled as positives/negatives according to their presence/absence in the gold standard. We show that CoCoScore outperforms previous approaches in identifying human disease-gene and tissue-gene associations as well as in identifying physical and functional protein-protein associations in different species. CoCoScore is a versatile text mining tool to uncover pairwise associations via co-occurrence mining, within and beyond biomedical applications. AVAILABILITY AND IMPLEMENTATION: CoCoScore is available at: https://github.com/JungeAlexander/cocoscore. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Computational Biology , Data Mining , Natural Language Processing , Publications , Computational Biology/methods , Humans , Proteins/genetics
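CoCoScore's distant supervision replaces a manually annotated corpus: each sentence that co-mentions two entities is labelled positive if the pair appears in a gold-standard set and negative otherwise. The sketch below shows only that labelling step, treating pairs as unordered; the sentence-scoring model and the corpus-wide aggregation are not reproduced, and the example sentences and gold pair are illustrative.

```python
def label_comentions(comentions, gold_pairs):
    """Distant supervision: label each (sentence, entity_a, entity_b) co-mention.

    A co-mention gets label 1 if the unordered pair is in the gold standard, else 0.
    """
    gold = {frozenset(pair) for pair in gold_pairs}
    return [
        (sentence, a, b, int(frozenset((a, b)) in gold))
        for sentence, a, b in comentions
    ]


gold_standard = [("BRCA1", "breast cancer")]
comentions = [
    ("BRCA1 mutations predispose to breast cancer.", "breast cancer", "BRCA1"),
    ("BRCA1 was measured alongside asthma markers.", "BRCA1", "asthma"),
]
for row in label_comentions(comentions, gold_standard):
    print(row[-1], row[0])
# 1 BRCA1 mutations predispose to breast cancer.
# 0 BRCA1 was measured alongside asthma markers.
```

Because labels come from the gold standard rather than from reading each sentence, some labels are noisy (a correct statement about a pair missing from the gold standard becomes a negative); distant supervision accepts that noise in exchange for scale.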
12.
Cell ; 179(3): 802-802.e1, 2019 10 17.
Article in English | MEDLINE | ID: mdl-31626778

ABSTRACT

S-phase entry and exit are regulated by hundreds of protein complexes that assemble "just in time," orchestrated by a multitude of distinct events. To help understand their interplay, we have created a tailored visualization based on the Minardo layout, highlighting over 80 essential events. This complements our earlier visualization of M-phase, and both can be displayed together, giving a comprehensive overview of the events regulating the cell division cycle. To view this SnapShot, open or download the PDF.


Subjects
Cell Cycle/genetics , Mitosis/genetics , Multiprotein Complexes/genetics , S Phase/genetics , Cell Division/genetics , Cyclin B/genetics , Cyclin D/genetics , Cyclin-Dependent Kinases/genetics , G2 Phase/genetics , Humans , Phosphorylation/genetics , Proteasome Endopeptidase Complex/genetics
14.
NPJ Syst Biol Appl ; 5: 27, 2019.
Article in English | MEDLINE | ID: mdl-31396397

ABSTRACT

Non-oncogene addiction (NOA) genes are essential for supporting the stress-burdened phenotype of tumours and thus vital for their survival. Although NOA genes are acknowledged to be potential drug targets, there has been no large-scale attempt to identify and characterise them as a group across cancer types. Here we provide the first method for the identification of conditional NOA genes and their rewired neighbours using a systems approach. Using copy number data and expression profiles from The Cancer Genome Atlas (TCGA) we performed comparative analyses between high and low genomic stress tumours for 15 cancer types. We identified 101 condition-specific differential coexpression modules, mapped to a high-confidence human interactome, comprising 133 candidate NOA rewiring hub genes. We observe that most modules lose coexpression in the high-stress state and that activated stress modules and hubs take part in homoeostasis maintenance processes such as chromosome segregation, oxidoreductase activity, mitotic checkpoint (PLK1 signalling), DNA replication initiation and synaptic signalling. We furthermore show that candidate NOA rewiring hubs are unique for each cancer type, but that their respective rewired neighbour genes are largely shared across cancer types.


Subjects
Computational Biology/methods , Neoplasms/genetics , Oncogene Addiction/genetics , Algorithms , Databases, Genetic , Gene Regulatory Networks , Genomics , Humans , Protein Interaction Mapping , Transcriptome
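The rewiring signal in this study comes from comparing coexpression between high- and low-genomic-stress tumours. A minimal proxy for one gene pair is sketched below, assuming a simple difference of Pearson correlations rather than the module-level method the authors used; a negative value means the pair loses coexpression in the high-stress state. Gene names and expression values are made up.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long, non-constant expression vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def coexpression_change(low_stress, high_stress, gene_a, gene_b):
    """Correlation in high-stress tumours minus correlation in low-stress tumours.

    low_stress / high_stress map gene -> expression values across tumours of that group.
    """
    r_high = pearson(high_stress[gene_a], high_stress[gene_b])
    r_low = pearson(low_stress[gene_a], low_stress[gene_b])
    return r_high - r_low


low = {"HUB": [1.0, 2.0, 3.0, 4.0], "PARTNER": [2.0, 4.0, 6.0, 8.0]}
high = {"HUB": [1.0, 2.0, 3.0, 4.0], "PARTNER": [4.0, 3.0, 2.0, 1.0]}
print(coexpression_change(low, high, "HUB", "PARTNER"))  # -2.0: coexpression flips sign
```

A real analysis would assess whether such a change is larger than expected by chance (for example by permuting tumour labels) before calling a pair rewired.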
15.
PLoS Comput Biol ; 15(8): e1007239, 2019 08.
Article in English | MEDLINE | ID: mdl-31437145

ABSTRACT

Tailored therapy aims to cure cancer patients effectively and safely, based on the complex interactions between patients' genomic features, disease pathology and drug metabolism. Thus, the continual increase in scientific literature drives the need for efficient methods of data mining to improve the extraction of useful information from texts based on patients' genomic features. An important application of text mining to tailored therapy in cancer encompasses the use of mutations and cancer fusion genes as moieties that change patients' cellular networks to develop cancer, and also affect drug metabolism. Fusion proteins, which are derived from the slippage of two parental genes, are produced in cancer by chromosomal aberrations and trans-splicing. Given that the two parental proteins for predicted fusion proteins are known, we used our previously developed method for identifying chimeric protein-protein interactions (ChiPPIs) associated with the fusion proteins. Here, we present a validation approach that receives fusion proteins of interest, predicts their cellular network alterations by ChiPPI and validates them by our new method, ProtFus, using an online literature search. This process resulted in a set of 358 fusion proteins and their corresponding protein interactions, as a training set for a Naïve Bayes classifier, to identify predicted fusion proteins that have reliable evidence in the literature and that were confirmed experimentally. Next, for a test group of 1817 fusion proteins, we were able to identify from the literature 2908 PPIs in total, across 18 cancer types. The described method, ProtFus, can be used for screening the literature to identify unique cases of fusion proteins and their PPIs, as a means of studying alterations of protein networks in cancers. Availability: http://protfus.md.biu.ac.il/.


Subjects
Data Mining/methods , Oncogene Proteins, Fusion/genetics , Protein Interaction Mapping/methods , Algorithms , Bayes Theorem , Big Data , Computational Biology , Data Mining/statistics & numerical data , Databases, Genetic , Humans , Mutation , Neoplasms/genetics , Neoplasms/therapy , Oncogene Proteins, Fusion/chemistry , Oncogene Proteins, Fusion/metabolism , Precision Medicine , Protein Interaction Mapping/statistics & numerical data , Protein Interaction Maps
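ProtFus trains a Naïve Bayes classifier on literature evidence. To show the mechanics only, the sketch below is a generic Bernoulli Naïve Bayes with Laplace smoothing over binary text features; the feature names (`fusion`, `binds`, `interacts`, `review`) and the toy training set are hypothetical, since the abstract does not describe ProtFus's actual features.

```python
import math


def train_bernoulli_nb(examples):
    """examples: list of (set_of_features, label). Laplace-smoothed Bernoulli Naive Bayes."""
    vocabulary = set().union(*(features for features, _ in examples))
    labels = {label for _, label in examples}
    n_by_label = {y: sum(1 for _, label in examples if label == y) for y in labels}
    priors = {y: n_by_label[y] / len(examples) for y in labels}
    # P(feature present | label), with add-one smoothing over the two outcomes.
    likelihood = {
        y: {
            w: (sum(1 for f, label in examples if label == y and w in f) + 1) / (n_by_label[y] + 2)
            for w in vocabulary
        }
        for y in labels
    }
    return priors, likelihood, vocabulary


def predict(model, features):
    """Return the label with the highest log posterior; absent features also count."""
    priors, likelihood, vocabulary = model
    best_label, best_score = None, -math.inf
    for y, prior in priors.items():
        score = math.log(prior)
        for w in vocabulary:
            p = likelihood[y][w]
            score += math.log(p if w in features else 1.0 - p)
        if score > best_score:
            best_label, best_score = y, score
    return best_label


# Label 1: hypothetical "reliable literature evidence for the fusion-protein PPI".
training = [
    ({"fusion", "binds"}, 1),
    ({"fusion", "interacts"}, 1),
    ({"review"}, 0),
    ({"review", "fusion"}, 0),
]
model = train_bernoulli_nb(training)
print(predict(model, {"fusion", "binds"}))  # 1
print(predict(model, {"review"}))  # 0
```

Working in log space avoids numerical underflow when the vocabulary is large, and smoothing keeps a feature never seen with a label from forcing that label's probability to zero.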
16.
Methods Mol Biol ; 1939: 73-89, 2019.
Article in English | MEDLINE | ID: mdl-30848457

ABSTRACT

PubMed contains more than 27 million documents, and this number is growing at an estimated 4% per year. Even within specialized topics, it is no longer possible for a researcher to read any field in its entirety, and thus nobody has a complete picture of the scientific knowledge in any given field at any time. Text mining provides a means to automatically read this corpus and to extract the relations found therein as structured information. Having data in a structured format is a huge boon for computational efforts to access, cross reference, and mine the data stored therein. This is increasingly useful as biological research is becoming more focused on systems and multi-omics integration. This chapter provides an overview of the steps that are required for text mining: tokenization, named entity recognition, normalization, event extraction, and benchmarking. It discusses a variety of approaches to these tasks and then goes into detail on how to prepare data for use specifically with the JensenLab tagger. This software uses a dictionary-based approach and provides the text mining evidence for STRING and several other databases.


Subjects
Computational Biology/methods , Data Mining/methods , Algorithms , Animals , Humans , PubMed , Software
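The chapter lists tokenization, named entity recognition and normalization as core text-mining steps, and the JensenLab tagger is dictionary-based. The sketch below chains these three steps in simplified form: a regular-expression tokenizer, greedy longest-match lookup of token n-grams in a case-insensitive dictionary, and normalization by mapping each matched name to an identifier. It is not the tagger's algorithm (which also handles orthographic variants, blocked names and more), and the identifiers are hypothetical.

```python
import re

# Tokens are runs of letters/digits, optionally joined by internal hyphens or slashes.
TOKEN = re.compile(r"[A-Za-z0-9]+(?:[-/][A-Za-z0-9]+)*")


def tokenize(text):
    """Return (token, start, end) triples with character offsets into text."""
    return [(m.group(), m.start(), m.end()) for m in TOKEN.finditer(text)]


def tag(text, dictionary, max_tokens=3):
    """Greedy longest-match NER plus normalization against a name -> identifier dictionary.

    Dictionary keys are lower-case names with single spaces between tokens.
    Returns (start, end, surface text, identifier) for each match.
    """
    tokens = tokenize(text)
    matches = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_tokens, len(tokens) - i), 0, -1):  # longest first
            name = " ".join(tok.lower() for tok, _, _ in tokens[i:i + n])
            if name in dictionary:
                start, end = tokens[i][1], tokens[i + n - 1][2]
                matches.append((start, end, text[start:end], dictionary[name]))
                i += n
                break
        else:  # no dictionary name starts at this token
            i += 1
    return matches


# Hypothetical identifiers, standing in for real gene and disease IDs.
dictionary = {
    "brca1": "GENE:BRCA1",
    "breast cancer": "DISEASE:breast_cancer",
    "cancer": "DISEASE:cancer",
}
for match in tag("Mutations in BRCA1 raise breast cancer risk.", dictionary):
    print(match)
# (13, 18, 'BRCA1', 'GENE:BRCA1')
# (25, 38, 'breast cancer', 'DISEASE:breast_cancer')
```

Longest-match-first is why "breast cancer" is tagged as one disease rather than yielding a separate match for "cancer".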
17.
J Cheminform ; 11(1): 19, 2019 Mar 08.
Article in English | MEDLINE | ID: mdl-30850898

ABSTRACT

Most BioCreative tasks to date have focused on assessing the quality of text-mining annotations in terms of precision and recall. Interoperability, speed, and stability are, however, other important factors to consider for practical applications of text mining. For about a decade, we have run named entity recognition (NER) web services, which are designed to be efficient, implemented using a multi-threaded queueing system to robustly handle many simultaneous requests, and hosted at a supercomputer facility. To participate in this new task, we extended the existing NER tagging service with support for the BeCalm API. The tagger suffered no downtime during the challenge and, as in earlier tests, proved to be highly efficient, consistently processing requests of 5000 abstracts in less than half a minute. In fact, the majority of this time was spent not on the NER task but rather on retrieving the document texts from the challenge servers. The latter was found to be the main bottleneck even when hosting a copy of the tagging service on a Raspberry Pi 3, showing that local document storage or caching would be desirable features to include in future revisions of the API standard.
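The service design mentioned here, a multi-threaded queueing system that handles many simultaneous requests, can be sketched with Python's standard `queue` and `threading` modules: requests go into one shared queue, a fixed pool of worker threads takes them off, and each caller waits on its own reply queue. The class name, the toy handler and the pool size are hypothetical; the actual tagging service is a separate implementation.

```python
import queue
import threading


class TaggingService:
    """Toy request queue served by a fixed pool of worker threads (hypothetical design)."""

    def __init__(self, handler, n_workers=4):
        self._handler = handler
        self._requests = queue.Queue()
        self._threads = [
            threading.Thread(target=self._work, daemon=True) for _ in range(n_workers)
        ]
        for thread in self._threads:
            thread.start()

    def _work(self):
        while True:
            item = self._requests.get()
            if item is None:  # shutdown sentinel: one per worker
                self._requests.task_done()
                return
            payload, reply = item
            try:
                reply.put(("ok", self._handler(payload)))
            except Exception as exc:  # report the failure instead of killing the worker
                reply.put(("error", exc))
            finally:
                self._requests.task_done()

    def submit(self, payload):
        """Enqueue a request and return a queue on which its single reply will arrive."""
        reply = queue.Queue(maxsize=1)
        self._requests.put((payload, reply))
        return reply

    def shutdown(self):
        """Send one sentinel per worker and wait for all workers to exit."""
        for _ in self._threads:
            self._requests.put(None)
        for thread in self._threads:
            thread.join()


service = TaggingService(handler=lambda text: text.upper(), n_workers=2)
replies = [service.submit(doc) for doc in ["abstract one", "abstract two"]]
print([reply.get() for reply in replies])  # [('ok', 'ABSTRACT ONE'), ('ok', 'ABSTRACT TWO')]
service.shutdown()
```

A fixed pool bounds resource use under load: extra requests wait in the queue instead of spawning unbounded threads, and a failing request returns an error to its caller without taking a worker down.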

18.
Bioinformatics ; 35(9): 1494-1502, 2019 05 01.
Article in English | MEDLINE | ID: mdl-30295698

ABSTRACT

MOTIVATION: Long non-coding RNAs (lncRNAs) are important regulators in a wide variety of biological processes, which are linked to many diseases. Compared to protein-coding genes (PCGs), the association between diseases and lncRNAs is still not well studied. Thus, inferring disease-associated lncRNAs on a genome-wide scale has become imperative. RESULTS: In this study, we propose a machine learning-based method, DislncRF, which infers disease-associated lncRNAs on a genome-wide scale based on tissue expression profiles. DislncRF uses random forest models trained on expression profiles of known disease-associated PCGs across human tissues to extract general patterns between expression profiles and diseases. These models are then applied to score associations between lncRNAs and diseases. DislncRF was benchmarked against a gold standard dataset and compared to other methods. The results show that DislncRF yields promising performance and outperforms the existing methods. The utility of DislncRF is further substantiated on two diseases in which we find that top scoring candidates are supported by literature or independent datasets. AVAILABILITY AND IMPLEMENTATION: https://github.com/xypan1232/DislncRF. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
RNA, Long Noncoding/genetics , Genome , Humans , Machine Learning
19.
Methods Mol Biol ; 1819: 175-196, 2018.
Article in English | MEDLINE | ID: mdl-30421404

ABSTRACT

Since cell regulation and protein expression can be dramatically altered upon infection by viruses, studying the mechanisms by which viruses infect cells and the regulatory networks they disrupt is essential to understanding viral pathogenicity. This line of study can also lead to discoveries about the workings of host cells themselves. Computational methods are rapidly being developed to investigate viral-host interactions, and here we highlight recent methods and the insights that they have revealed so far, with a particular focus on methods that integrate different types of data. We also review the challenges of working with viruses compared with traditional cellular biology, and the limitations of current experimental and informatics methods.


Subjects
Host-Pathogen Interactions/physiology , Models, Biological , Viral Proteins/metabolism , Virus Physiological Phenomena , Viruses/metabolism , Animals , Humans
...