1.
Proc Natl Acad Sci U S A ; 118(38)2021 09 21.
Article in English | MEDLINE | ID: mdl-34526403

ABSTRACT

The spleen contains phenotypically and functionally distinct conventional dendritic cell (cDC) subpopulations, termed cDC1 and cDC2, each of which can be divided into several smaller and less well-characterized subsets. Despite advances in understanding the complexity of cDC ontogeny by transcriptional programming, the significance of posttranslational modifications in controlling tissue-specific cDC subset immunobiology remains elusive. Here, we identified the cell-surface-expressed A-disintegrin-and-metalloproteinase 10 (ADAM10) as an essential regulator of cDC1 and cDC2 homeostasis in the splenic marginal zone (MZ). Mice with a CD11c-specific deletion of ADAM10 (ADAM10ΔCD11c) exhibited a complete loss of splenic ESAMhi cDC2A because ADAM10 regulated the commitment, differentiation, and survival of these cells. The major pathways controlled by ADAM10 in ESAMhi cDC2A are Notch, signaling pathways involved in cell proliferation and survival (e.g., mTOR, PI3K/AKT, and EIF2 signaling), and EBI2-mediated localization within the MZ. In addition, we discovered that ADAM10 is a molecular switch regulating cDC2 subset heterogeneity in the spleen, as the disappearance of ESAMhi cDC2A in ADAM10ΔCD11c mice was compensated for by the emergence of a Clec12a+ cDC2B subset closely resembling cDC2 generally found in peripheral lymph nodes. Moreover, in ADAM10ΔCD11c mice, terminal differentiation of cDC1 was abrogated, resulting in severely reduced splenic Langerin+ cDC1 numbers. In addition to the disturbed splenic cDC compartment, ADAM10 deficiency on CD11c+ cells led to an increase in marginal metallophilic macrophage (MMM) numbers. In conclusion, our data identify ADAM10 as a molecular hub on both cDC and MMM regulating their transcriptional programming, turnover, homeostasis, and ability to shape the anatomical niche of the MZ.


Subject(s)
ADAM10 Protein/metabolism, Amyloid Precursor Protein Secretases/metabolism, Dendritic Cells/metabolism, Membrane Proteins/metabolism, ADAM10 Protein/physiology, Amyloid Precursor Protein Secretases/physiology, Animals, Antigen-Presenting Cells/metabolism, CD11c Antigen/metabolism, Cell Differentiation, Cell Proliferation, Female, Homeostasis, Lymphoid Tissue/metabolism, Macrophages/metabolism, Male, Membrane Proteins/physiology, Mice, Mice, Inbred C57BL, Myeloid Cells/metabolism, Phosphatidylinositol 3-Kinases/metabolism, Protein Processing, Post-Translational/genetics, Protein Processing, Post-Translational/physiology, Signal Transduction, Spleen/cytology, Spleen/metabolism
2.
Curr Issues Mol Biol ; 45(12): 9904-9916, 2023 Dec 09.
Article in English | MEDLINE | ID: mdl-38132464

ABSTRACT

Lipids are important modifiers of protein function, particularly as parts of lipoproteins, which transport lipophilic substances and mediate cellular uptake of circulating lipids. As such, lipids are of particular interest as blood biomarkers for cardiovascular disease (CVD) as well as for conditions linked to CVD such as atherosclerosis, diabetes mellitus, obesity and dietary states. Notably, lipid research is particularly well developed in the context of CVD because of the importance of CVD and its multiple causes and risk factors. The advent of methods for high-throughput screening of biological molecules has recently resulted in the generation of lipidomic profiles that allow untargeted monitoring of lipid compositions in biological samples. These and other earlier advances in biomedical research have shaped our knowledge about lipids in CVD. To evaluate the knowledge acquired on the multiple biological functions of lipids in CVD and the trends in their research, we collected a dataset of references from the PubMed database of biomedical literature focused on plasma lipids and CVD in humans and mice. Using annotations from these records, we were able to categorize significant associations between lipids and particular types of research approaches, distinguish non-biological lipids used as markers, identify differences in research between human and mouse models, and detect the increasingly mechanistic nature of the results in this field. Using known associations between lipids and proteins that metabolize or transport them, we constructed a comprehensive lipid-protein network, which we used to highlight proteins strongly connected to lipids found in the CVD-lipid literature. Our approach points to a series of proteins for which lipid-focused research would bring insights into CVD, including Prostaglandin G/H synthase 2 (PTGS2, a.k.a. COX2) and Acylglycerol kinase (AGK).
In this review, we summarize our findings, putting them in a historical perspective of the evolution of lipid research in CVD.

3.
BMC Bioinformatics ; 23(Suppl 6): 279, 2022 Jul 14.
Article in English | MEDLINE | ID: mdl-35836114

ABSTRACT

BACKGROUND: The constant evolution and development of next-generation sequencing techniques have led to high-throughput data composed of datasets that include a large number of biological samples. Although large numbers of samples are usually processed experimentally in batches, scientific publications often omit this information, even though batch processing can greatly impact sample quality and confound further statistical analyses. Because dedicated bioinformatics methods developed to detect unwanted sources of variance in the data can wrongly detect real biological signals, such methods could benefit from a quality-aware approach. RESULTS: We recently developed statistical guidelines and a machine learning tool to automatically evaluate the quality of a next-generation sequencing sample. We leveraged this quality assessment to detect and correct batch effects in 12 publicly available RNA-seq datasets with available batch information. We were able to distinguish batches by our quality score and used it to correct for some batch effects in sample clustering. Overall, the correction was evaluated as comparable to or better than the reference method that uses a priori knowledge of the batches (in 10 and 1 datasets of 12, respectively; total = 92%). When coupled to outlier removal, the correction was more often evaluated as better than the reference (comparable or better in 5 and 6 datasets of 12, respectively; total = 92%). CONCLUSIONS: In this work, we show the capabilities of our software to detect batches in public RNA-seq datasets from differences in the predicted quality of their samples. We also use these insights to correct the batch effect and observe the relation between sample quality and batch effect. These observations reinforce our expectation that while batch effects do correlate with differences in quality, batch effects also arise from other artifacts and are more suitably corrected statistically in well-designed experiments.


Subject(s)
Algorithms, Software, Cluster Analysis, Machine Learning, RNA-Seq
4.
Bioinformatics ; 37(21): 3981-3982, 2021 11 05.
Article in English | MEDLINE | ID: mdl-34358314

ABSTRACT

SUMMARY: Lipids play an essential role in cellular assembly and signaling. Dysregulation of these functions has been linked with many complications, including obesity, diabetes, metabolic disorders, cancer and more. Investigating lipid profiles in such conditions can provide insights into cellular functions and possible interventions; hence, the field of lipidomics has been expanding in recent years. Even though the role of individual lipids in diseases has been investigated, there is no resource to perform disease enrichment analysis that considers the cumulative association of a lipid set. To address this, we have implemented the LipiDisease web server. The tool analyzes millions of records from the PubMed biomedical literature database discussing lipids and diseases, predicts their associations and ranks them according to false discovery rates generated by random simulations. The tool takes into account 4270 diseases and 4798 lipids. Since the tool extracts the information from PubMed records, the number of diseases and lipids will expand over time as the biomedical literature grows. AVAILABILITY AND IMPLEMENTATION: The LipiDisease web server can be freely accessed at http://cbdm-01.zdv.uni-mainz.de:3838/piyusmor/LipiDisease/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
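The ranking step described above (literature-derived lipid-disease associations ranked by false discovery rates from random simulations) can be illustrated with a minimal Python sketch. It assumes a hypothetical input table mapping each disease to the set of lipids co-mentioned with it in PubMed, scores a query lipid set by simple overlap, and estimates a permutation-based FDR from random lipid sets of the same size. This is not the LipiDisease implementation; the scoring rule, function names and parameters are illustrative assumptions.

```python
import random
from typing import Dict, List, Set


def overlap_scores(query: Set[str], disease_lipids: Dict[str, Set[str]]) -> Dict[str, int]:
    """Number of query lipids linked to each disease (hypothetical literature table)."""
    return {disease: len(query & lipids) for disease, lipids in disease_lipids.items()}


def permutation_fdr(
    query: Set[str],
    disease_lipids: Dict[str, Set[str]],
    universe: List[str],
    n_sim: int = 1000,
    seed: int = 0,
) -> Dict[str, float]:
    """Estimate an FDR per disease by comparing real overlaps with overlaps
    obtained for random lipid sets of the same size drawn from `universe`."""
    rng = random.Random(seed)
    observed = overlap_scores(query, disease_lipids)
    null_runs = [
        overlap_scores(set(rng.sample(universe, len(query))), disease_lipids)
        for _ in range(n_sim)
    ]
    fdr: Dict[str, float] = {}
    for disease, score in observed.items():
        if score == 0:
            fdr[disease] = 1.0  # no overlap: nothing to call
            continue
        # Diseases reaching this score with the real query ...
        n_real = sum(1 for s in observed.values() if s >= score)
        # ... versus the average number reaching it by chance.
        n_null = sum(sum(1 for s in run.values() if s >= score) for run in null_runs) / n_sim
        fdr[disease] = min(1.0, n_null / n_real)
    return fdr
```

In this sketch the FDR of a disease is the expected number of diseases reaching its overlap score under random query sets, divided by the number reaching it with the real query, capped at 1.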


Subject(s)
Lipids, Software, PubMed, Databases, Factual, Lipids/analysis, Data Mining
5.
Nucleic Acids Res ; 48(9): e53, 2020 05 21.
Article in English | MEDLINE | ID: mdl-32187374

ABSTRACT

Chromatin immunoprecipitation followed by sequencing (ChIP-seq) is used to identify genome-wide DNA regions bound by proteins. Given one ChIP-seq experiment with replicates, binding sites not observed in all the replicates will usually be interpreted as noise and discarded. However, the recent discovery of high-occupancy target (HOT) regions suggests that there are regions where binding of multiple transcription factors can be identified. To investigate ChIP-seq variability, we developed a reproducibility score and a method that identifies cell-specific variable regions in ChIP-seq data by integrating replicated ChIP-seq experiments for multiple protein targets on a particular cell type. Using our method, we found variable regions in human cell lines K562, GM12878, HepG2, MCF-7 and in mouse embryonic stem cells (mESCs). These variable-occupancy target regions (VOTs) are CG dinucleotide rich and show enrichment at promoters and R-loops. They overlap significantly with HOT regions, but are not blacklisted regions producing non-specific binding ChIP-seq peaks. Furthermore, in mESCs, VOTs are conserved among placental species, suggesting that they could have a function important for this taxon. Our method can be useful to point to such regions along the genome in a given cell type of interest, to improve the downstream interpretative analysis before follow-up experiments.
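To make the notion of replicate reproducibility concrete, here is a minimal Python sketch, assuming peaks have already been mapped to shared region identifiers. The score (fraction of replicates carrying a peak) and the rule for calling a region variable (partially reproduced for at least `min_targets` protein targets in the same cell type) are hypothetical simplifications, not the published reproducibility score.

```python
from typing import Dict, List, Set


def reproducibility(replicates: List[Set[str]], region: str) -> float:
    """Fraction of replicates whose peak set contains `region`."""
    return sum(region in peaks for peaks in replicates) / len(replicates)


def variable_regions(experiments: Dict[str, List[Set[str]]], min_targets: int = 2) -> List[str]:
    """Regions bound in some but not all replicates for at least `min_targets`
    protein targets profiled in the same cell type (hypothetical rule)."""
    all_regions: Set[str] = set()
    for replicates in experiments.values():
        for peaks in replicates:
            all_regions |= peaks
    variable = []
    for region in sorted(all_regions):
        n_partial = sum(
            1
            for replicates in experiments.values()
            if 0.0 < reproducibility(replicates, region) < 1.0
        )
        if n_partial >= min_targets:
            variable.append(region)
    return variable
```

Here `experiments` maps each protein target to a list of replicate peak sets; a region bound in every replicate of every target scores 1.0 throughout and is never called variable.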


Subject(s)
Chromatin Immunoprecipitation Sequencing, Transcription Factors/metabolism, Animals, Binding Sites, Cell Line, Cell Line, Tumor, Chromatin/metabolism, Embryonic Stem Cells/metabolism, Evolution, Molecular, Genetic Variation, Genomics/methods, Humans, K562 Cells, MCF-7 Cells, Mice, Nucleotides/analysis, Principal Component Analysis, Promoter Regions, Genetic, R-Loop Structures
6.
Genes Dev ; 27(17): 1932-46, 2013 Sep 01.
Article in English | MEDLINE | ID: mdl-24013505

ABSTRACT

Understanding how distinct cell types arise from multipotent progenitor cells is a major quest in stem cell biology. The liver and pancreas share many aspects of their early development and possibly originate from a common progenitor. However, how liver and pancreas cells diverge from a common endoderm progenitor population and adopt specific fates remains elusive. Using RNA sequencing (RNA-seq), we defined the molecular identity of liver and pancreas progenitors that were isolated from the mouse embryo at two time points, spanning the period when the lineage decision is made. The integration of temporal and spatial gene expression profiles unveiled mutually exclusive signaling signatures in hepatic and pancreatic progenitors. Importantly, we identified the noncanonical Wnt pathway as a potential developmental regulator of this fate decision and capable of inducing the pancreas program in endoderm and liver cells. Our study offers an unprecedented view of gene expression programs in liver and pancreas progenitors and forms the basis for formulating lineage-reprogramming strategies to convert adult hepatic cells into pancreatic cells.


Subject(s)
Cell Differentiation, Gene Expression Regulation, Developmental, Liver, Pancreas, Signal Transduction, Stem Cells/cytology, Animals, Cell Line, Cell Lineage, Endoderm/cytology, Gene Expression Profiling, Liver/cytology, Liver/embryology, Mice, Pancreas/cytology, Pancreas/embryology, Sequence Analysis, RNA, Time Factors, Wnt Proteins/genetics, Wnt Proteins/metabolism, Xenopus/embryology
7.
Methods ; 132: 57-65, 2018 01 01.
Article in English | MEDLINE | ID: mdl-28716510

ABSTRACT

Toxicity affecting humans is studied by observing the effects of chemical substances in animal organisms (in vivo) or in cultured animal and human cell lines (in vitro). Toxicogenomics studies collect gene expression profiles and histopathology assessment data for hundreds of drugs and pollutants in standardized experimental designs using different model systems. These data are an invaluable source for analyzing genome-wide drug response in biological systems. However, a problem remains: how to evaluate the suitability of heterogeneous in vitro and in vivo systems for modeling the many different aspects of human toxicity. We propose here that a given model system (cell type or animal organ) is supported as an appropriate model of a particular aspect of human toxicity if the set of compounds associated in the literature with that aspect of toxicity causes a change in expression of genes with a particular function in the tested model system. This approach provides candidate genes to explain the toxicity effect (the differentially expressed genes) and the compounds whose effect could be modeled (those that both change expression in the model system and are associated with the human phenotype in the literature). Here we present an application of this approach using a computational pipeline that integrates compound-induced gene expression profiles (from the Open TG-GATEs database) and biomedical literature annotations (from the PubMed database) to evaluate the suitability of (human and rat) in vitro systems as well as rat in vivo systems to model human toxicity.
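The support criterion proposed above can be phrased as an enrichment test. The sketch below, a hypothetical illustration rather than the authors' pipeline, asks whether compounds associated with a toxicity phenotype in the literature are over-represented among the compounds that deregulate a functional gene set in one model system, using a one-sided hypergeometric test. The input sets and the helper names `hypergeom_sf` and `model_support` are assumptions.

```python
from math import comb
from typing import Set


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) when n of N compounds are drawn and K of the N are 'marked'."""
    upper = min(K, n)
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def model_support(tested: Set[str], literature_toxic: Set[str], deregulating: Set[str]) -> float:
    """One-sided enrichment p-value of literature-associated compounds among the
    compounds that deregulate one functional gene set in one model system."""
    lit = literature_toxic & tested      # only compounds actually tested count
    der = deregulating & tested
    k = len(lit & der)                   # compounds meeting both criteria
    return hypergeom_sf(k, len(tested), len(lit), len(der))
```

A small p-value would indicate that, in this model system, the literature-linked compounds preferentially perturb genes of that function; the overlapping compounds are the ones whose effect could be modeled.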


Subject(s)
Drug Evaluation, Preclinical/methods, Animals, Cells, Cultured, Hepatocytes/drug effects, Hepatocytes/physiology, Humans, Rats, Toxicogenetics, Transcriptome
8.
Methods ; 74: 90-6, 2015 Mar.
Article in English | MEDLINE | ID: mdl-25484337

ABSTRACT

Clinical evaluation of patients and diagnosis of disorders are crucial for deciding on appropriate therapies. In addition, in the case of genetic disorders resulting from gene abnormalities, phenotypic effects may guide basic research on the mechanisms of a disorder to find the mutated gene and therefore to propose novel targets for drug therapy. However, this approach is complicated by two facts. First, the relationship between genes and disorders is not simple: one gene may be related to multiple disorders, and a disorder may be caused by mutations in different genes. Second, recognizing relevant phenotypes might be difficult for clinicians working with patients with closely related complex disorders. Neuropsychiatric disorders best illustrate these difficulties since their phenotypes range from metabolic to behavioral aspects, the latter being extremely complex. Based on our clinical expertise in five neurodegenerative disorders, and on the wealth of bibliographical data on neuropsychiatric disorders, we have built a resource to infer associations between genes, chemicals and phenotypes for a total of 31 disorders. An initial step of automated text mining of the literature related to the 31 disorders returned thousands of enriched terms. From these, clinicians manually selected a smaller set of phenotypic terms as relevant to the five neural disorders of their expertise, and these terms were used to analyze the complete set of disorders. Analysis of the data indicates general relationships between neuropsychiatric disorders, which can be used to classify and characterize them. Correlation analyses allowed us to propose novel associations of genes and drugs with disorders.
More generally, the results led us to uncover mechanisms of disease that span multiple neuropsychiatric disorders; for example, genes related to synaptic transmission and receptor functions tend to be involved in many disorders, whereas genes related to sensory perception and channel transport functions are associated with fewer disorders. Our study shows that, starting from expertise covering a limited set of neurological disorders and using text and data mining methods, meaningful and novel associations regarding genes, chemicals and phenotypes can be derived for an expanded set of neuropsychiatric disorders. Our results are intended to help clinicians evaluate patients and basic scientists propose new gene targets for drug therapies. This strategy can be extended to virtually all diseases and takes advantage of the ever-increasing amount of biomedical literature.


Subject(s)
Data Mining/methods, Databases, Genetic, Gene Regulatory Networks/genetics, Mental Disorders/genetics, Phenotype, Databases, Genetic/standards, Humans
9.
Nucleic Acids Res ; 42(Database issue): D950-8, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24304896

ABSTRACT

CellFinder (http://www.cellfinder.org) is a comprehensive one-stop resource for molecular data characterizing mammalian cells in different tissues and at different developmental stages. It is built from carefully selected data sets stemming from other curated databases and the biomedical literature. To date, CellFinder describes 3394 cell types and 50 951 cell lines. The database currently contains 3055 microscopic and anatomical images, 205 whole-genome expression profiles of 194 cell/tissue types from RNA-seq and microarrays, and 553 905 protein expression entries for 535 cells/tissues. Text mining of a corpus of >2000 publications followed by manual curation confirmed expression information on ∼900 proteins and genes. CellFinder's data model can seamlessly represent entities from single cells to the organ level, incorporate mappings between homologous entities in different species and describe processes of cell development and differentiation. Its ontological backbone currently consists of 204 741 ontology terms incorporated from 10 different ontologies unified under the novel CELDA ontology. CellFinder's web portal allows searching, browsing and comparing the stored data, interactive construction of developmental trees and navigation of the partonomic hierarchy of cells and tissues through a unique body browser designed for life scientists and clinicians.


Subject(s)
Cells/metabolism, Databases, Factual, Animals, Cell Line, Cell Physiological Phenomena, Cells/cytology, Cellular Structures/ultrastructure, Data Mining, Gene Expression Profiling, Humans, Internet, Kidney/cytology, Liver/cytology, Proteins/metabolism, RNA/metabolism
10.
Nucleic Acids Res ; 41(3): 1496-507, 2013 Feb 01.
Article in English | MEDLINE | ID: mdl-23275563

ABSTRACT

The yeast two-hybrid (Y2H) system is the most widely applied methodology for systematic protein-protein interaction (PPI) screening and the generation of comprehensive interaction networks. We developed a novel Y2H interaction screening procedure using DNA microarrays for high-throughput quantitative PPI detection. Applying a global pooling and selection scheme to a large collection of human open reading frames, proof-of-principle Y2H interaction screens were performed for the human neurodegenerative disease proteins huntingtin and ataxin-1. Using systematic controls for unspecific Y2H results and quantitative benchmarking, we identified and scored a large number of known and novel partner proteins for both huntingtin and ataxin-1. Moreover, we show that this parallelized screening procedure and the global inspection of Y2H interaction data are uniquely suited to define specific PPI patterns and their alteration by disease-causing mutations in huntingtin and ataxin-1. This approach takes advantage of the specificity and flexibility of DNA microarrays and of the existence of well-established statistical methods for the analysis of DNA microarray data, and allows a quantitative approach toward interaction screens in humans and in model organisms.


Subject(s)
Oligonucleotide Array Sequence Analysis/methods, Two-Hybrid System Techniques, Ataxin-1, Ataxins, Humans, Huntingtin Protein, Mutation, Nerve Tissue Proteins/genetics, Nerve Tissue Proteins/metabolism, Nuclear Proteins/genetics, Nuclear Proteins/metabolism, Open Reading Frames, Protein Interaction Maps, Yeasts/genetics
11.
PLoS Comput Biol ; 9(1): e1002860, 2013.
Article in English | MEDLINE | ID: mdl-23300433

ABSTRACT

Interactions of proteins regulate signaling, catalysis, gene expression and many other cellular functions. Therefore, characterizing the entire human interactome is a key effort in current proteomics research. This challenge is complicated by the dynamic nature of protein-protein interactions (PPIs), which are conditional on the cellular context: both interacting proteins must be expressed in the same cell and localized in the same organelle to meet. Additionally, interactions are subject to delicate control within signaling pathways, e.g. by post-translational modifications of the protein partners; hence, many diseases are caused by the perturbation of these mechanisms. Despite the high degree of cell-state specificity of PPIs, many interactions are measured under artificial conditions (e.g. yeast cells are transfected with human genes in yeast two-hybrid assays), or, even when they are detected in a physiological context, this information is missing from the common PPI databases. To overcome these problems, we developed a method that assigns context information to PPIs inferred from various attributes of the interacting proteins: gene expression, functional and disease annotations, and inferred pathways. We demonstrate that context consistency correlates with the experimental reliability of PPIs, which allows us to generate high-confidence tissue- and function-specific subnetworks. We illustrate how these context-filtered networks are enriched in bona fide pathways and disease proteins, demonstrating the ability of context filters to highlight meaningful interactions with respect to various biological questions. We use this approach to study the lung-specific pathways used by the influenza virus, pointing to IRAK1, BHLHE40 and TOLLIP as potential regulators of influenza virus pathogenicity, and to study the signaling pathways that play a role in Alzheimer's disease, identifying a pathway involving the altered phosphorylation of the Tau protein.
Finally, we provide the annotated human PPI network via a web frontend that allows the construction of context-specific networks in several ways.
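A minimal Python sketch of context filtering, assuming per-protein annotation sets (for example, the tissues in which a gene is expressed) are already available. The tissue filter and the Jaccard-based consistency score below are illustrative stand-ins; the published method combines several attribute types, and its exact scoring is not reproduced here. Protein identifiers in any example input are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def context_filter(edges: List[Edge], expressed: Dict[str, Set[str]], tissue: str) -> List[Edge]:
    """Keep only interactions whose two partners are both expressed in `tissue`."""
    return [
        (a, b)
        for a, b in edges
        if tissue in expressed.get(a, set()) and tissue in expressed.get(b, set())
    ]


def context_consistency(a: str, b: str, contexts: Dict[str, Set[str]]) -> float:
    """Jaccard overlap of the context annotations of two partners
    (0.0 = no shared context, 1.0 = identical contexts)."""
    ca, cb = contexts.get(a, set()), contexts.get(b, set())
    union = ca | cb
    return len(ca & cb) / len(union) if union else 0.0
```

Under this sketch, a tissue-specific subnetwork is simply the output of `context_filter` for that tissue, and `context_consistency` could be used to rank the remaining interactions.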


Subject(s)
Proteins/metabolism, Alzheimer Disease/metabolism, Biocatalysis, Humans, Phosphorylation, Protein Binding, Proteome, Signal Transduction, Viral Proteins/metabolism
12.
BMC Bioinformatics ; 14: 113, 2013 Mar 27.
Article in English | MEDLINE | ID: mdl-23537461

ABSTRACT

BACKGROUND: A popular query from scientists reading a biomedical abstract is to search for topic-related documents in bibliographic databases. Such a query is challenging because the amount of information attached to a single abstract is small, whereas classification-based retrieval algorithms are optimally trained with large sets of relevant documents. As a solution to this problem, we propose a query expansion method that extends the information related to a manuscript using its cited references. RESULTS: Data on cited references and text sections in 249,108 full-text biomedical articles were extracted from the Open Access subset of the PubMed Central® database (PMC-OA). Of the five standard sections of a scientific article, the Introduction and Discussion sections contained most of the citations (mean = 10.2 and 9.9 citations, respectively). A large proportion of articles (98.4%) and their cited references (79.5%) were indexed in the PubMed® database. Using the MedlineRanker abstract classification tool, cited references allowed accurate retrieval of the citing document in a test set of 10,000 documents and also of documents related to six biomedical topics defined by particular MeSH® terms from the entire PMC-OA (p-value<0.01). Classification performance was sensitive to the topic and also to the text sections from which the references were selected. Classifiers trained on the baseline (i.e., only text from the query document and not from the references) were outperformed in almost all cases. The best performance was often obtained when using all cited references, though using the references from the Introduction and Discussion sections led to similarly good results. This query expansion method performed significantly better than pseudo relevance feedback in 4 out of 6 topics. CONCLUSIONS: The retrieval of documents related to a single document can be significantly improved by using the references cited by this document (p-value<0.01).
Using references from the Introduction and Discussion performs almost as well as using all references, which might be useful for methods that require reduced datasets due to computational limitations. Cited references from particular sections might not be appropriate for all topics. Our method could be a better alternative to pseudo relevance feedback, though it is limited by full-text availability.
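The idea of expanding a single-document query with its cited references can be sketched as follows. For simplicity, the sketch ranks candidate documents by cosine similarity of word counts; this is a swapped-in technique, since the study itself used the MedlineRanker classifier. All function names are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List


def bag_of_words(text: str) -> Counter:
    """Lower-cased alphanumeric token counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank(query_abstract: str, cited_abstracts: List[str], candidates: List[str],
         expand: bool = True) -> List[int]:
    """Return candidate indices ordered by similarity to the query profile.
    With expand=True the profile also includes the text of the cited references;
    with expand=False it is the baseline (query document only)."""
    profile = bag_of_words(query_abstract)
    if expand:
        for ref in cited_abstracts:
            profile.update(bag_of_words(ref))  # Counter.update adds counts
    scores = [cosine(profile, bag_of_words(c)) for c in candidates]
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
```

Comparing `rank(..., expand=True)` with `rank(..., expand=False)` on the same candidates mirrors, in a very reduced form, the baseline-versus-expansion comparison described in the abstract.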


Subject(s)
Data Mining/methods, PubMed, Algorithms, Medical Subject Headings
13.
BMC Bioinformatics ; 14: 228, 2013 Jul 17.
Article in English | MEDLINE | ID: mdl-23865855

ABSTRACT

BACKGROUND: The need for detailed description and modeling of cells drives the continuous generation of large and diverse datasets. Unfortunately, there exists no systematic and comprehensive way to organize these datasets and their information. CELDA (Cell: Expression, Localization, Development, Anatomy) is a novel ontology for associating primary experimental data and derived knowledge with various types of cells of organisms. RESULTS: CELDA is a structure that can help to categorize cell types based on species, anatomical localization, subcellular structures, developmental stages and origin. It targets cells in vitro as well as in vivo. Instead of developing a novel ontology from scratch, we carefully designed CELDA in such a way that existing ontologies were integrated as much as possible, and only minimal extensions were made to cover those classes and areas not present in any existing model. Currently, ten existing ontologies and models are linked to CELDA through the top-level ontology BioTop. Together with 15,439 newly created classes, CELDA contains more than 196,000 classes and 233,670 relationship axioms. CELDA is primarily used as a representational framework for modeling, analyzing and comparing cells within and across species in CellFinder, a web-based data repository on cells (http://cellfinder.org). CONCLUSIONS: CELDA can semantically link diverse types of information about cell types. It has been integrated within the research platform CellFinder, where, as an example, it relates cell types from liver and kidney during development on the one hand and anatomical locations in humans on the other, integrating information on all spatial and temporal stages. CELDA is available from the CellFinder website: http://cellfinder.org/about/ontology.


Subject(s)
Cells/classification, Vocabulary, Controlled, Cells/metabolism, Cellular Structures, Embryonic Stem Cells, Gene Expression, Humans, Kidney/cytology
14.
Nucleic Acids Res ; 39(Web Server issue): W455-61, 2011 Jul.
Article in English | MEDLINE | ID: mdl-21609954

ABSTRACT

UNLABELLED: Biomedical literature is traditionally used as a way to inform scientists of the relevance of genes in relation to a research topic. However, many genes, especially from poorly studied organisms, are not discussed in the literature. Moreover, a manual and comprehensive summarization of the literature attached to the genes of an organism is in general impossible due to the high number of genes and abstracts involved. We introduce the novel Génie algorithm, which overcomes these problems by evaluating the literature attached to all genes in a genome and to their orthologs according to a selected topic. Génie showed high precision (up to 100%) and the best performance in comparison to other algorithms in most of the benchmarks, especially when high sensitivity was required. Moreover, the prioritization of zebrafish genes involved in heart development, using human and mouse orthologs, showed high enrichment in differentially expressed genes from microarray experiments. The Génie web server supports hundreds of species and millions of genes, and offers novel functionalities. Typical run times below a minute, even when analyzing the human genome with hundreds of thousands of literature records, allow the use of Génie in routine lab work. AVAILABILITY: http://cbdm.mdc-berlin.de/tools/genie/.


Subject(s)
Genes, Software, Algorithms, Animals, Gene Expression Profiling, Genomics, Heart/embryology, Humans, Internet, MEDLINE, Mice, Models, Animal, Zebrafish/embryology, Zebrafish/genetics
15.
Bioinformatics ; 27(17): 2414-21, 2011 Sep 01.
Article in English | MEDLINE | ID: mdl-21798963

ABSTRACT

MOTIVATION: Protein-protein interaction (PPI) databases are widely used tools to study cellular pathways and networks; however, several of the available databases still do not account for cell type-specific differences. Here, we evaluated the characteristics of six interaction databases, incorporated tissue-specific gene expression information and, finally, investigated whether the most popular proteins of the scientific literature are involved in good-quality interactions. RESULTS: We found that the evaluated databases are comparable in terms of node connectivity (i.e. proteins with few interaction partners in one database also have few interaction partners in other databases), but may differ in the identity of interaction partners. We also observed that the incorporation of tissue-specific expression information significantly altered the interaction landscape and, finally, we demonstrated that many of the most intensively studied proteins are engaged in interactions associated with low confidence scores. In summary, interaction databases are valuable research tools but may lead to different predictions on interactions or pathways. The accuracy of predictions can be improved by incorporating datasets on organ- and cell type-specific gene expression, and by obtaining additional interaction evidence for the most 'popular' proteins. CONTACT: kitano@sbi.jp SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
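One way to quantify the node-connectivity comparison described above is a Spearman rank correlation of protein degrees between two databases, restricted to proteins present in both. The sketch below assumes each database is given as a list of undirected edges; it is an illustration written for this listing, not the analysis code of the study, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def degrees(edges: List[Edge]) -> Dict[str, int]:
    """Node degrees of an undirected network (duplicate edges counted once)."""
    deg: Dict[str, int] = {}
    for a, b in {tuple(sorted(e)) for e in edges}:
        deg[a] = deg.get(a, 0) + 1
        deg[b] = deg.get(b, 0) + 1
    return deg


def average_ranks(values: List[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: List[float], y: List[float]) -> float:
    """Pearson correlation of the ranks (undefined if all values in x or y tie)."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


def degree_concordance(db1: List[Edge], db2: List[Edge]) -> float:
    """Spearman correlation of degrees for proteins present in both databases."""
    d1, d2 = degrees(db1), degrees(db2)
    shared = sorted(d1.keys() & d2.keys())
    return spearman([d1[p] for p in shared], [d2[p] for p in shared])
```

A high concordance together with a low overlap of neighbor sets would correspond to the pattern reported above: similar connectivity, but different interaction partners.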


Subject(s)
Databases, Protein, Protein Interaction Mapping, Gene Expression, Humans, Proteins/genetics, Proteins/metabolism
16.
PLoS One ; 17(7): e0270043, 2022.
Article in English | MEDLINE | ID: mdl-35776722

ABSTRACT

MOTIVATION: Single-cell Chromatin ImmunoPrecipitation DNA-Sequencing (scChIP-seq) analysis is challenging due to data sparsity. A high degree of sparsity in biological high-throughput single-cell data is generally handled with imputation methods that complete the data, but specific methods for scChIP-seq are lacking. We present SIMPA, a scChIP-seq data imputation method leveraging predictive information within bulk data from the ENCODE project to impute missing protein-DNA interacting regions of target histone marks or transcription factors. RESULTS: Imputations using machine learning models trained for each single cell, each ChIP protein target, and each genomic region accurately preserve cell type clustering and improve pathway-related gene identification on real human data. Results on bulk data simulating single cells show that the imputations are single-cell specific, as the imputed profiles are closer to the simulated cell than to other cells related to the same ChIP protein target and the same cell type. Simulations also show that 100 input genomic regions are already enough to train single-cell specific models for the imputation of thousands of undetected regions. Furthermore, SIMPA enables the interpretation of machine learning models by revealing interaction sites of a given single cell that are most important for the imputation model trained for a specific genomic region. The corresponding feature importance values derived from promoter-interaction profiles of H3K4me3, an activating histone mark, correlate highly with co-expression of genes that are present within the cell-type-specific pathways in two real human and mouse datasets. SIMPA's interpretable imputation method allows users to gain a deep understanding of individual cells and, consequently, of sparse scChIP-seq datasets. AVAILABILITY AND IMPLEMENTATION: Our interpretable imputation algorithm was implemented in Python and is available at https://github.com/salbrec/SIMPA.


Subject(s)
Genomics , Machine Learning , Animals , Cluster Analysis , DNA , Mice , Sequence Analysis, DNA/methods
17.
Genes (Basel) ; 13(5)2022 05 20.
Article in English | MEDLINE | ID: mdl-35627304

ABSTRACT

The gene family of insect olfactory receptors (ORs) has expanded greatly over the course of evolution. ORs enable insects to detect volatile chemicals and therefore play an important role in social interactions, enemy and prey recognition, and foraging. The sequences of several thousand ORs are known, but specific functions or ligands have been identified for only very few of them. To advance the functional characterization of ORs, we have assembled, curated, and aligned the sequences of 3902 ORs from 21 insect species, which we provide as an annotated online resource. Using functionally characterized proteins from the fly Drosophila melanogaster, the mosquito Anopheles gambiae, and the ant Harpegnathos saltator, we identified the amino acid positions that best predict ligand response. We examined the conservation of these predicted relevant residues in all OR subfamilies; the subfamilies that expanded strongly in social insects showed a high degree of conservation in their binding sites. This suggests that the ORs of social insect families are typically finely tuned and sensitive to very similar odorants. Our novel approach provides a powerful tool for exploiting functional information from a limited number of genes to study the functional evolution of large gene families.
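The abstract does not state which conservation measure was applied to the predicted binding-site residues. As a rough illustration only, the sketch below scores selected columns of a subfamily alignment with a normalized Shannon entropy, a common stand-in and not necessarily the authors' measure. The function name `column_conservation` and the toy sequences are hypothetical.

```python
import math
from collections import Counter


def column_conservation(aligned_seqs, positions):
    """Conservation score in [0, 1] for selected alignment columns.

    Hypothetical helper: score = 1 - H / log2(20), where H is the Shannon
    entropy of the residues observed in the column (gap characters '-' are
    ignored). 1.0 means the column is invariant within the subfamily.
    """
    max_entropy = math.log2(20)  # 20 standard amino acids
    scores = {}
    for pos in positions:
        residues = [seq[pos] for seq in aligned_seqs if seq[pos] != "-"]
        if not residues:
            # Column consists only of gaps: treat as unconserved.
            scores[pos] = 0.0
            continue
        counts = Counter(residues)
        total = len(residues)
        entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
        scores[pos] = 1.0 - entropy / max_entropy
    return scores
```

Averaging these scores over the predicted ligand-relevant positions gives one number per subfamily, which is one simple way to compare binding-site conservation between strongly expanded and other subfamilies.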


Subject(s)
Receptors, Odorant , Animals , Drosophila melanogaster/metabolism , Insect Proteins/metabolism , Insects/genetics , Insects/metabolism , Ligands , Receptors, Odorant/genetics , Receptors, Odorant/metabolism
18.
BMC Bioinformatics ; 12: 435, 2011 Nov 09.
Article in English | MEDLINE | ID: mdl-22070195

ABSTRACT

BACKGROUND: Biological function depends greatly on the interactions of proteins with other proteins and genes. Abstracts from the biomedical literature stored in NCBI's PubMed database can be used to derive interactions between genes and proteins by identifying the co-occurrences of their terms. Often, the number of interactions obtained through such an approach is large and may mix processes occurring in different contexts. Current tools do not allow these data to be studied with a focus on concepts of relevance to a user, for example, interactions related to a disease or to a biological mechanism such as protein aggregation. RESULTS: To support the concept-oriented exploration of such data, we developed PESCADOR, a web tool that extracts a network of interactions from a set of PubMed abstracts given by a user and allows the interaction network to be filtered according to user-defined concepts. We illustrate its use in exploring protein aggregation in neurodegenerative disease and in expanding pathways associated with colon cancer. CONCLUSIONS: PESCADOR is a platform-independent web resource available at: http://cbdm.mdc-berlin.de/tools/pescador/
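The co-occurrence idea described above can be sketched in a few lines. This is a simplified illustration, not PESCADOR's implementation: it assumes gene/protein names have already been recognized in each abstract, counts co-occurrence at the whole-abstract level, and treats a "concept" as a plain substring of the abstract text. The function name and the input layout are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_network(abstracts, concepts=None):
    """Build an undirected gene/protein co-occurrence network.

    Hypothetical input layout: `abstracts` maps a PubMed ID to a dict with
    "entities" (set of recognized gene/protein names) and "text" (the
    abstract text). If `concepts` is non-empty, only abstracts whose text
    mentions at least one concept (case-insensitive substring) add edges.
    Returns {(entity_a, entity_b): set of supporting PubMed IDs}, with each
    pair stored in sorted order.
    """
    concepts = [c.lower() for c in concepts] if concepts else None
    edges = defaultdict(set)
    for pmid, record in abstracts.items():
        if concepts is not None:
            text = record["text"].lower()
            if not any(c in text for c in concepts):
                continue  # abstract is outside the user's concepts of interest
        for a, b in combinations(sorted(record["entities"]), 2):
            edges[(a, b)].add(pmid)
    return dict(edges)
```

Keeping the supporting PubMed IDs on each edge, rather than only a count, lets a user trace every interaction back to the abstracts it came from, which matters when the network mixes processes from different contexts.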


Subject(s)
Data Mining , PubMed , Software , Colorectal Neoplasms/genetics , Colorectal Neoplasms/metabolism , Humans , Internet , Neurodegenerative Diseases/genetics , Neurodegenerative Diseases/metabolism , Proteins/genetics , Proteins/metabolism
19.
BMC Bioinformatics ; 12 Suppl 8: S3, 2011 Oct 03.
Article in English | MEDLINE | ID: mdl-22151929

ABSTRACT

BACKGROUND: Determining the usefulness of biomedical text mining systems requires realistic task definitions and data selection criteria without artificial constraints, measuring performance aspects that go beyond traditional metrics. The BioCreative III Protein-Protein Interaction (PPI) tasks were motivated by such considerations, addressing aspects such as how the end user would oversee the generated output, for instance by providing ranked results and textual evidence for human interpretation, or by measuring the time saved by using automated systems. Detecting articles that describe complex biological events like PPIs was addressed in the Article Classification Task (ACT), where participants were asked to implement tools for detecting PPI-describing abstracts. To this end, the BCIII-ACT corpus was provided, which includes a training, development, and test set of over 12,000 PPI-relevant and non-relevant PubMed abstracts manually labeled by domain experts, along with the human classification times. The Interaction Method Task (IMT) went beyond abstracts and required mining for associations between more than 3,500 full-text articles and the interaction detection method ontology concepts that had been applied to detect the PPIs reported in them. RESULTS: A total of 11 teams participated in at least one of the two PPI tasks (10 in the ACT and 8 in the IMT), and a total of 62 persons were involved either as participants or in preparing data sets and evaluating these tasks. Per task, each team was allowed to submit five runs offline and another five online via the BioCreative Meta-Server. Of the 52 runs submitted for the ACT, the highest Matthews Correlation Coefficient (MCC) score measured was 0.55 at an accuracy of 89%, and the best AUC iP/R was 68%. Most ACT teams explored machine learning methods; some also used lexical resources such as MeSH terms, PSI-MI concepts, or particular lists of verbs and nouns, and some integrated NER approaches. For the IMT, a total of 42 runs were evaluated by comparing systems against manually generated annotations made by curators from the BioGRID and MINT databases. The highest AUC iP/R achieved by any run was 53%, and the best MCC score was 0.55. For competitive systems with an acceptable recall (above 35%), the macro-averaged precision ranged between 50% and 80%, with a maximum F-Score of 55%. CONCLUSIONS: The ACT results of BioCreative III indicate that classifying large, unbalanced article collections that reflect the real class imbalance is still challenging. Nevertheless, text-mining tools that report ranked lists of relevant articles for manual selection can potentially reduce the time needed to identify half of the relevant articles to less than 1/4 of the time required with unranked results. Detecting associations between full-text articles and interaction detection method PSI-MI terms (IMT) is more difficult than might be anticipated. This is due to the variability of method term mentions, errors resulting from pre-processing of articles provided as PDF files, and the heterogeneity and differing granularity of the method term concepts encountered in the ontology. However, combining the sophisticated techniques developed by the participants with supporting evidence strings derived from the articles for human interpretation could result in practical modules for biological annotation workflows.
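Both tasks report the Matthews Correlation Coefficient alongside accuracy. The sketch below computes both from binary confusion-matrix counts using the standard MCC formula, and uses made-up counts (not BioCreative data) to show why MCC is informative on imbalanced collections: a classifier that labels every article as non-relevant can still score high accuracy while its MCC is 0.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts.

    Ranges from -1 (total disagreement) through 0 (no better than chance)
    to +1 (perfect prediction). Returns 0.0 when any marginal sum is zero,
    a common convention for the otherwise undefined case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical imbalanced collection: 100 relevant vs. 900 non-relevant
# articles, with a classifier that predicts "non-relevant" for everything.
print(accuracy(0, 900, 0, 100))  # 0.9
print(mcc(0, 900, 0, 100))       # 0.0
```

Because MCC uses all four cells of the confusion matrix, it only rewards a system that actually separates the minority (relevant) class from the majority class.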


Subject(s)
Algorithms , Data Mining , Proteins/metabolism , Animals , Databases, Protein , Humans , Periodicals as Topic , PubMed
20.
Nucleic Acids Res ; 37(Web Server issue): W141-6, 2009 Jul.
Article in English | MEDLINE | ID: mdl-19429696

ABSTRACT

The biomedical literature is represented by millions of abstracts available in the Medline database. These abstracts can be queried with the PubMed interface, which provides a keyword-based Boolean search engine. This approach has limitations in retrieving abstracts related to very specific topics, as it is difficult for a non-expert user to find all of the most relevant keywords related to a biomedical topic. Additionally, when searching for more general topics, the same approach may return hundreds of unranked references. To address these issues, text mining tools have been developed to help scientists focus on relevant abstracts. We have implemented the MedlineRanker webserver, which allows a flexible ranking of Medline for a topic of interest without expert knowledge. Given some abstracts related to a topic, the program automatically deduces the most discriminative words in comparison to a random selection. These words are used to score other abstracts, including those from recent publications not yet annotated, which can then be ranked by relevance. We show that our tool can be highly accurate and that it is able to process millions of abstracts in a practical amount of time. MedlineRanker is free to use and is available at http://cbdm.mdc-berlin.de/tools/medlineranker.
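The ranking scheme described above (learn discriminative words from topic abstracts versus a random background, then score unseen abstracts) can be illustrated with a naive-Bayes-style log-odds weight per word. This is an assumed, simplified scheme for illustration and not necessarily MedlineRanker's exact scoring; the tokenizer, the smoothing parameter `alpha`, and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Set of lower-case alphanumeric word tokens (deliberately simple)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def word_weights(topic_abstracts, background_abstracts, alpha=1.0):
    """Log-odds weight per word: how much more often a word appears (per
    document) in the topic set than in the random background set, with
    additive smoothing so unseen words do not produce log(0)."""
    topic_df = Counter(w for a in topic_abstracts for w in tokenize(a))
    back_df = Counter(w for a in background_abstracts for w in tokenize(a))
    n_t, n_b = len(topic_abstracts), len(background_abstracts)
    weights = {}
    for w in set(topic_df) | set(back_df):
        p_t = (topic_df[w] + alpha) / (n_t + 2 * alpha)
        p_b = (back_df[w] + alpha) / (n_b + 2 * alpha)
        weights[w] = math.log(p_t / p_b)
    return weights


def rank(abstracts, weights):
    """Score each abstract by summing its word weights; highest score first.
    Words never seen during training contribute 0."""
    scored = [(sum(weights.get(w, 0.0) for w in tokenize(a)), a) for a in abstracts]
    return [a for _, a in sorted(scored, key=lambda s: s[0], reverse=True)]
```

Positive weights mark words typical of the topic and negative weights mark words typical of the background, so a new abstract rises in the ranking only if its vocabulary leans toward the topic set; no expert-chosen keywords are needed.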


Subject(s)
Information Storage and Retrieval/methods , MEDLINE , Software , User-Computer Interface